Intelligent stress calculation and scheduling in segmented processor systems using buddy approach

Abstract

Parallel Processing has been a widely studied field, used and implemented in computational systems. Many different types of topologies of processors have been implemented and their performance has been analyzed. The processor technology keeps evolving so their computational capability must be utilized accordingly when employed in parallel systems. In this article, new intra-parallel processor architectures (segmented/heterogeneous) has been used and an intelligent co-operative protocol has been implemented to optimally utilize the parallel components of the parallel processor design. More precisely a friendship based intelligent load balancing strategy has been designed and implemented to maximally utilize the parallel processor, which takes care of overloading and starvation problems and makes intelligent decisions regarding job scheduling. Context switching policies must also be designed carefully to stop performance degradation and with intelligent techniques this switching time can be reduced considerably. Work proposed in this article performs and executes load stability with feasible priori information about processors utilization, depending upon and based on this metric value the entire process space is partitioned among different categories. Based on the load status and state of affairs, processors are categorized and labeled and a suitable set out of those is figured-out that act as buddy for others and handles incoming process queue for overloaded processors. Further history and statistics of each processors is maintained and is utilized to make intelligent future scheduling decisions.

Keywords

Intelligent systems parallel environment intra-processor parallelism load stability job scheduling strategy

1 Introduction

Practically, all standalone computers or even phones and other mobile devices nowadays backs parallel processing from hardware view point and many other systems are configured by a network interconnection to form parallel environment to perform parallel computing. In the present research article a new form of parallel processor architecture (segmented processors comprising of different processor cores [1]) has been used and work has been done to enhance its productivity by using intelligent computing techniques. The strategy discussed and developed can also be used to enhance the computation capability of other parallel architecture with slight modification. Many different techniques have been designed for small and Large Massive systems whether they are loosely coupled or tightly coupled to achieve maximum parallelism in computing. These strategies augment the performance of computing by effective load distribution. As may such framework are network based parallel systems so they require minimizing delay in message communication. Today, focus is on to achieve maximum efficiency from processor and their interconnections without performance degradation and by the use of intelligent methods. For achieving these goals many different scheduling strategies has been implemented and consistently modified along with an overhead of load stability but little work has been done on providing fuzzy or intelligence based computing to improve scheduling and job allocation strategies. Motivation of this research works it to provide agglomeration of conventional and intelligent computing techniques to ensure maximum performance gains.

Numerous factors are considered and taken into account for providing intelligent scheduling like load stability matrix. The Load stability metric is in fact meant for estimation of stress (Work Load) on processors in terms of workload assigned at regular intervals [1, 13]. Some strategies follows load stability at initial or earlier stages of load distribution also known as pre-load balancing mechanism but some other mechanisms consider later system stages are known as post load balancing mechanisms i.e. when the system gets overloaded [9]. Some of these policies are architectural dependent some are globally or universally moldable to any architecture. Some policy structures are based on shared memory interconnection where as some policy environment depends upon socket based message passing paradigm [2]. Effective load distributing strategies need to be developed so the load can be uniformly distributed all over the processors. In the Present work a fair load distribution scheme has been developed for intra-processor parallelism. This policy evenly distributes the load among different processors cores depending upon their workload. The highly loaded cores are relieved from extra or new load and redirecting load meant for them to other free processors cores.

The proposed work obtains its objective by using two algorithms one is a conventional scheduling algorithm and other is intelligent decision based algorithm for making optimal scheduling decisions.

1.2 Intelligent scheduling

To make intelligent decisions in scheduling or job allocation is the primary motivation of this research work. Incorporation of intelligent algorithm makes the job allocation easier and optimal and on the same lines intelligent computing techniques has been applied in the present work. One special algorithm has been used for used for making intelligent decisions which relies on past history and statistics of processors/cores. The algorithm uses greedy approach to calculate performance, stability matrices. The output of the algorithm can be used for forecasting the job allocations and making pro-active scheduling decisions. This intelligent algorithm presented later in the pper is the key algorithm of the proposed work.

2 Related work

Parallel computing has been used extensively in large systems to enhance processing capabilities. Lots of Literature is available for implementing and enhancing parallel systems. Many different parallel architecture has been developed and implemented with different configurations and constraints. In this present work one such intra processor parallel architectures has been taken [1, 3]. This architecture is highly flexible architecture and evolves with the type of application used. Figure 1(a) presents the core idea of processor as proposed by its inventors [1]. The processors architecture is segmented in to different sized cores each having its own CPU. All the cores share the same main memory and each have its own independent cache memory as well. This processor architecture is a hybrid type of processor architecture which is largely homogenous but slightly heterogeneous also, attributed to different sizes of different cores. In [1, 4] the segmented processor Fig. 1(b) has been used for MANET (Mobile Ad Hoc Networks) applications. The different components of the application have been distributed to different cores of the processor. When the load is distributed to the different cores of the processor then some of the cores may get overloaded and some may be idle at times. So this load can be equally distributed to other cores which are lying idle at this time. In this article one such policy has been designed which ensures equal load distribution. In literature five major dynamic scheduling strategies were proposed at very early stages for implementing load balancing in parallel system as listed below:

SID/RID (sender/receiver initiated diffusion)

HPC (High Performance Computing)

HBM (Hierarchical Balancing Method)

DEM (Dimension Exchange Method)

Gradient model (GM)

The approach used in the present article is similar to the gradient method like as it identifies over loaded and under loader processors for further schedule of incoming job arrival [5]. On the other hand heterogeneous processor interconnection gives one new dimensional aspect to the parallel scheduler implementation [6, 7]. Many different processor’s parameters can be considered for example million instructions per seconds (MIPS), frequency based load estimations and scheduling (FBS), floating point operations per second (FLOPS) [11].

3 Characterization of scheduling processes

Generally, developing or implementing a parallel scheduler requires lot of intermediary tasks which must be consistently accomplished on time. Management or synchronization among each nucleus/central activity is required with an improved level of abstraction [8]. Plenty of the work on parallel processing discipline has artificial or synthetic workload experimentation model to discover high level attributes of the system. Simulation designs are the finest display place to analyze the behavior of any parallel scheduling strategy. Even though actual implementation of simulation might require so many additional tasks to complete but gives an idea to structure a benchmark. In the following sub-sections are some key points that must be taken care while implementing a parallel scheduler.

3.1 Scheduling the processor

Generally, scheduling instantaneous arrival of tasks over parallel environment is a nucleus part of multi-processor scheduling algorithm [3]. Policies designed so far tries to place the processes on the processors in such a way that the effectiveness of processors is never compromised and the processes are completed on time without getting trapped in the processor’s queue. Few existing strategies that are followed have center of attention on the efficient tasks placement while some work on task reallocation and alteration of the jobs so as to balance the load afterward when the system state gets overloaded.

3.2 Thread and process level parallelism

Composition of the scheduling policies is designed analogous to the process behavior. The strategies in use may also think about the time sharing and space sharing aspects where the previous allocate many jobs to available processors and afterward assign multiple processors to a sole active job [8]. This provides a competence to realize process and thread level parallelism. Parallel jobs have numerous numbers of lively threads so this needs space sharing problem division or distribution. On the other side number of simultaneous arrivals currently ready to schedule makes task or process level parallelism.

3.3 Dynamic and static scheduling policies

The method of scheduling strategies further involves static and dynamic allocation [2]. The static method assign the jobs to the processors in a non-preemptive mode that means the processors will not surrender the control while a job is in execution over them. In difference to this the dynamic scheme works as a preemptive one where the jobs presently running can be pre-empted based on priorities [10 –12]. The static scheduling strategies considers only the basic characteristics of the design and configuration like the processor’s queue size i.e. the number of processes in the input buffer, number of jobs in the queue, number of waited jobs, the burst cycle requirement of tasks or jobs [3, 5]. The lively or dynamic policies in addition to the just mentioned parameters also takes into account the varying processor load, their clock frequency, their computing efficiency, maximum load all the processors can handle etc. The dynamic arrangement or scheduling is effective in terms of its adaptableness to the changing system conditions but is also intricate to implement [6, 10]. The fixed or static scheduling on the other hand is quite small in complexity and so is easy to implement. A dynamic scheme goes for task re-scheduling and task context switching after distribution for load balancing.

3.4 Heterogeneous processors and scheduling constraints

The processors used and connected in a parallel environment can have either heterogeneous homogeneous configuration. The homogeneous configurations have all the processors with same hardware resources like clock frequency whereas the heterogeneous configurations have all the processors with different clock frequencies and may have different hardware architecture [11]. So, the scheduling strategies are designed differently as per the configurations. The competence of the processors is least compromised if dynamic strategies and ‘heterogeneous’ configuration of processors is followed [18]. This kind of policies involves highest efforts in load allocation as clock cycles are dissimilar and load must be planned according to that. In some state of affairs it might be possible that high frequency processor may have low computation intensive workload or vice versa [19, 20].

3.5 Context switching and Process Rescheduling

Despite the complexity involved, context switching of jobs acts or plays a significant function among the processors queues with which the load robotically or automatically gets balanced among them but these policies or strategies have an operating cost associated with them as the jobs are reshuffled many times to establish stable the load among the processors. Sometimes task waiting time may rise or increases as frequent context switches may occurred throughout its execution life span [4]. The apprehension of load stability or load balancing arises in case of static policies and homogeneous configurations where all the processors comes with same speed to execute the jobs with different processors’ cycles’ requirements and where no preemption is allowed; the jobs once allocated are not changed. In that case there arises the need of following a load balancing course of action so that all the processors have almost same quantity of workload to execute and their competence would not get affected [1].

3.6 Queues and their use in scheduling

Apart from of all these aspects, scheduling queues is the chief component in a parallel environment. There are loads of queues in configurations. The higher or upper level scheduling is handled by a universal or global queue also known as ‘long term scheduling’ from where jobs entered in the system or controller and then controller filters the processes to select suitable jobs for dispatching them to the processors. Other queue used is ‘the local queue’ (short term scheduling) that is self-sufficient or autonomous to an individual processor and from where jobs are picked up by the processor for execution [8, 11]. In addition to it, several other queues are also maintained like priority, waiting etc. The usage of these different queues provides an occasion to implement different scheduling schemes on different processors configurations. Each policy used and implementation should be precisely and well defined and must realize performance according to the parameter given.

4 Comparison of different scheduling policies

Conventionally, process scheduling at operating system level involve multi-programming and time sharing systems. Focal point was on to select best job for the processor at any given time. Single controller and processing unit was available for which the priori task selection action was performed and a ready queue was maintained. In earlier days process scheduling was not the most important issue of concern; either a single job was executed or batch processing of jobs was there and carried out. But when parallel computing came into existence the task of job scheduling became a main concern while designing of a job controller unit. Scheduling jobs on a single processor is easy as it only considers which job to execute but on multiple processors, two dimensions are necessary that is which job to execute and on which processor that complicate the task of scheduling. Nonetheless, all the scheduling strategies try to improve the computational capability of the systems. The different scheduling strategies are:

4.1 First Come First Serve Scheduling (FCFS)

In this strategy the jobs are submitted to the processors in the order in which they appear in the global queue and are executed in non-preemptive style. This method does not take into account the parameters like the computation cycles a particular job will take and the processors configuration. This type of technique is easy in its realization but pay little interest to the effective scheduling of jobs to increase processor’s efficiency because the normal or average waiting time for the processes is quite high in this case.

4.2 MCM (Minimum Computation Mass)

This policy make up or structure the job whose weight mass is minimum in the given batch is allocated first. Every job has a given or predefined weight in terms of burst cycle, which is interpreted before or prior to the scheduling of jobs among processors. Job with the least computation obligation is scheduled first on the processors and so on. This strategy is more efficient in case of lone processor systems as it increases the competence of processor by improving the average waiting time for jobs. Shorter or shortest job also increases overall throughput of thesystem.

4.3 HCTS (Highly Critical Tasks Selection)

In HCTS technique the scheduler selects high critical task from a list of jobs and assigns a priority catalog to each of them. A priority order is maintained after resolving intermediate dependencies. This kind of jobs which are considered extremely critical must be finished according to a predefined order. In this type of real time environment, scheduling and dispatching is executed according to this precedence. The excuse or loophole in this technique is that it causes starvation which means low priority jobs may never get executed awaiting a higher priority task releases the CPU. Shortest Job First (SJF) is a kind of priority based scheduling in a non real timesystem.

4.4 FSBP (Fair Share Batch Partitioning)

In FSBP policy structure, scheduler divides or partitions the current batch according to the available processor space. Multiple jobs may be allocated to each processor. An equivalent division metric will be applied during allotment. These types of mechanisms are static in nature because consideration is on number of jobs in the present batch not on their size, computation requirements etc. Additionally at each individual CPU level, execution time is divided among processes in interleaved fashion. On the other facet scheduler distributes fair share of each batch to the available processors in the interconnection so that, concurrent operations are performed in an overlapped fashion [4, 16].

4.5 BBS (Bounded Buffer Schemes)

In BBS jobs are dispersed chronologically to the number of offered processors till there is a free space in processors buffer queue. Essentially this acts as comparable partitioning of currently available ready job set. But the division is performed chronologically or in order that schedule jobs according to their arrival or arrival time. This policy is static and is used in multiple processor systems where every processor implements RR (Round Robin) scheduling in its local queue. The Long term scheduling can either be FCFS or SJF in the global queue. In this a long term queue is maintained whenever a volume of job arrives; the scheduler performs allocation seriously against these queues; although pre-fetch buffers are also used and deployed in interaction with processors cache [8].

4.6 ESJE (Effective Single Job Execution Policy)

In ESJE short term scheduler of every processor pick up a sole job from global queue and execute it till its completion. The processors are lightly loaded in this method as they have only single job to execute in single unit of time, although scheduler may use pre-fetch buffers for early choice of next allocation [8]. This strategy is static non preemptive in character.

4.7 MJQL (Minimum Job Queue Length)

In MJQL selection of the suitable processor for the next inward bound job is based upon the queue size. Arriving job is given to that processor which has minimum number of pending jobs in its local ready queue. In this local scheduling is done using traditional Round Robin scheduling. This policy is dynamic in its manner and requires a priori information about number of jobs in the ready queue of each processor.

4.8 LRW (Least Remaining Work)

In LRW policy is essentially a load balancing scheduling strategy where the jobs are given or allotted to those processors that have bare minimum pending work to execute. LRW technique works in homogeneous environment and considers that all processors have similar internal configuration. The policy structure periodically approximate the current load of each processor and the processor which has minimum amount of work remaining is the projected processor for the current job [4, 14].

4.9 CBS (Cycle Based Scheduling)

The CBS strategy works well in heterogeneous configuration and is the improvement over LRW (Least Remaining Work) policy structure. The inward bound jobs are allotted to that processor which can handle them effortlessly and complete them swiftly.CBS policy firstly computes the current load on each processor and then finds remaining time necessary to compute that load plus incoming job load. The reaming time is computed with respect to total processor pending work and clock frequency [11]. After then the processor which requires minimum computation cycle will get that job.

4.10 QRA (Quicker Round Robin Access)

QRA strategy follows swift task allocation. It measures the round robin cycle length of each processor’s ready queue. Each of the cycle length measures from the given index of currently executed job till end of analogous ready queue. This metric provides swift access to next incoming job as the job is allotted to the processor which has smaller round robin cycle. QRA policy structure is different from MJQL policy discussed where the scheduler determine the number of jobs currently available in the ready queue but the given QRA determines the round robin cycle length from the index of presently executed job till the end of ready queue.

4.11 TAGS (Task Assignment by Guessing Size)

In TAGS strategy, jobs are allotted as and when they arrive in the system but if it is found that a job is running too long in a processor and consuming recourses, it is destroyed and restarted either from the beginning on some other processor or some of its modules are distributed to other processors for load stability. These types of tasks must have flexible structures. In other words, the conduct of the job can be changed to adapt present system needs [4, 17]. This technique estimates the size of the current job and based on that estimate, it performs the task shuffling.

4.12 ORBITA (On Demand Restriction for Big Tasks)

ORBITA policy is the enhancement over TAGS in the sense that it collects advanced or priori information about present ready jobs. In ORBITA, a processor will execute a job only if it will not take too long to execute or else it can refuse that particular job [14, 15]. All this information regarding to current job is collected from precedent execution history of similar kind of jobs. This method is fairly well in load balancing of jobs on multiple processors as the processors can someway decide which load to take and which to reject [4, 16]. Tables 1 and 2 presents the comparison of different scheduling policies. The X indicates absence of a feature and √ indicates presence.

5 Proposed work and discussion

5.1 Novel BSPI-scheduling policy

Two major algorithms are used to make optimal scheduling decisions. The first algorithm is based on conventional computing techniques and is called BSPI (Buddy Set Processor Identification). The other one is based on intelligent computational techniques and uses the first algorithm to construct profiles and statists of the processor cores. The BSPI policy for segmented processor design described in this research implement the Long term scheduling, which acts as global upper level or system level incoming queue where jobs are placed randomly in a frequent arrival stream i.e the arrival of jobs is non ordered. Now the idea here is to figure out the processors which are more apt or suitable to interact with this incoming queue. Its, worth noting that some policy structure selects the jobs for the processor but in this work the selection of processors is done for the jobs. At the outset all processors are free and able to interact with the global queue but as the jobs are captured, processors get busy or occupied and it might be possible that they get over loaded. In the Proposed work the buddy set out of them is recognized which handles global queue in place of their stress below or underneath processors. So, there should not be a task blockage at upper level queue. The given approach that is followed is a time sharing system All the processors are majorly homogenous and slightly heterogeneous in nature. The long term scheduling queue talks or interacts with the processors whose PUF (processor utilization factor) is greater than 50%. Set of or multiple jobs are allotted to processor till their PUF remains larger than 50%. Each time the PUF value goes down; the scheduler automatically itself stops the further allotment to that buddy set and figures out another buddy set again. Such process is performed regularly or periodically at each processor clocks. The advantage of this scheme is that intermediary processor interaction is no longer required, so decrease of context switching and task reallocation operation. The given policy interacts behaves dynamically by dividing processor space in different parts. As per or according to the processor under category the job may be efficiently scheduled, but needs workload characterization step that is priori information about job behavior and its structure. In Order or chronological job distribution is done which allot incoming jobs to processors in the given order in which they arrived. The processors internally itself implement RR (round robin algorithm) to execute the jobs by allocating or giving a time quantum of their clock frequency. The time quantum is calculated and computed according to the average of computation cycles of all jobs in the queue. This case provides a dynamic behavior to the existing in order Chronological long term scheduling strategy. The Strategy does not blindly allocate jobs to the processors but somewhat a parameter is calculated on the basis of which the load on each processor is projected and estimated and then it is deduced that whether that processor is ready to adapt further jobs or not.

The scheduling policy works as follows:

The given Jobs are queued in a global queue managed by a job controller that sends the jobs to the processors in a periodic interval of time.

In the beginning incoming jobs are allocated to processors in chronological order.

After this the stress on each processor is guessed in terms of Processor Utilization Factor (PUF) that is estimated or predicted from the work done by the processor in terms of tasks cycles assigned and tasks cycles finished.

The processors which have PUF below 50% are aptly considered under stress and they are filtered out from the global queue so that they are not allocated any new task, so that they will complete the work they already have.

The processors who have PUF above 50% becomes the friends or buddies of the under stress processors and leverage them by getting jobs from the global queue.

When the under stress processor’s PUF reaches more than or above 50% value, it is again allocated jobs from the global queue.

This entire process is repeated and jobs are executed by the stress free or underused processors so that the need of load stability is satisfied during scheduling of jobs.

The given scheme tries to overcome the following drawbacks of existing load balancing strategies:

Recurrent task adjustments

Tasks or load Rescheduling

Tasks walkout or Blockage

Context switching of process address space

The Processor ‘inefficiency’ due to excessive stress

5.2 Sequence flow diagram

Figure 2 shows the flow chart of the scheduling policy presently followed in this research work, idea beneath this is to stabilize the allocation process by selection of most promising processor set for the in-hand current distribution. The outcome after this processing is then fed to an intelligent algorithm which makes intelligent and up to date scheduling moves.

5.3 Metric and parameters used

The Processor work-load is calculated on the foundation of work assigned and then completed by the processor. The work allotment considers the processor speed and the total of computation cycles of it that the processes allocated will require. Let The Processor frequency is n GHZ (n can be 1,2,3,...) for example 1-Ghz.

Then Calculate the following metrics:

Cycles per second: $Pro_C yc_P er_S ec = n \times 10^{9}$ (1)

Seconds consumed for one processor cycle: $T_{{one}_{cyc}} = \frac{1}{n * 10^{9}}$ (2)Processor load: Let α_i specifies queue length at ith processor (i ranges from 1 to 8; there are 8 segmented processors in simulation). Work assigned to processor in current time unit: $P_{Work_a {ssig}_{i}} = \sum_{k = 1}^{α_{i}} (Job_k_comp_cyc_P_{i})$ (3)

The inbound jobs may have computation time greater than the computation cycles (cyc_per_sec) of the processor. This directly means that one cycle of processor may not be sufficient for the current job completion. In such case the next cycle of the processor will be allotted to the previous job rather than the next one. So, calculate the pending work of the processor as sum of remaining cycles the last or previous job require and the computation requirements of the next jobs of the batch. $Rem_job_c {ycle}_{k} = curr_job_c {yc}_{k} - avail_cyc_p_{i}$ (4)

Work pending that is to be done in next cycle: $P_{work_p {end}_{i}} \sum_{k = 1}^{α_{i}} ({Job}_{K}_comp_cyc_P_{i} - avail_cyc_p_{i})$ (5)

P_{work_pending_i} is the all or total work pending in ith processor queue. This is compulsory to calculate total processor utilization factor (PUF). On the reverse side job scheduler allots the jobs in chronological job distribution scheme, so load will be unstable. The Jobs are arranged or scheduled and completed side by side, utilization factor will be calculated based on the work completed and allotted periodically.

Work Completed: $P_{work_c {ompleted}_{i}} = \sum_{k = 1}^{α_{i}} (Proc_c {ycle}_{i}_given_j {ob}_{k})$ (6)

Now afterwards calculate the efficiency of the processor in terms of its utilization of the cycles given. Define this term as Processor Utilization Factor (PUF). It considers the number of cycles processor consumed in harmony with the work assigned to it and the work completed by it.

5.4 Processor Utilization Factor (PUF)

$PUF = \sum_{i = 1}^{p} \frac{P_{work_c {ompleted}_{i}}}{P_{work_a {ssigned}_{i}}} \times 100$ (7)

Depending on the above calculated metric PUF, the scheduling policy will identify the “stress underneath” processors that is processors which have PUF <50%. They will not be allocated the new job till they complete the already given jobs and their PUF reaches more than 50% value. Meanwhile, the processors who have PUF > =50% will continue receiving jobs from the scheduler.

5.5 Pseudo code

Initialize

Set BC:=0

Set SC:=0

For every Processor P_i do where i = 0 to n

Compute Processor Utilization Factor PUF by work assigned and work completed

[If Test PUF > =R1 then]

Set-FreeState(P_i,1)

Add-BuddySet(P_i)

BuddySetCount(BSC,1)

Alloc(pi,GlobalQueue)

[Else Test PUF > =R2 then]

Stress start or initiation state: still manageable can be assigned new job; add in buddy set

Set-InitiationState(P_i,2)

Add-BuddySet(P_i)

Buddy-SetCount(BSC,1)

Alloc(pi,GlobalQueue)

[Else Test PUF> =R3 then]

Stress start or beginning state: can be assigned new job but in next few seconds encounter stress; add in buddy set

Add-BuddySet(P_i)

Buddy-SetCount(BSC,1)

Alloc(pi,GlobalQueue)

[Else Test PUF <R4 then]

Stress underneath state: no new job assigned

Add-StressUnderneathSet(P_i)

Stress-UnderneathCount(SC,1);

Detach(P_i,GlobalQueue);

[End of If then Test]

[End of For construct]

Exit

See Table 3 for conditional rules R1-R4.

5.6 Layered architecture

In the Architecture Four layer (Fig. 3) are proposed containing load collector, stress estimation and a friend or buddy set identifier module etc. Load container or collector, regularly calculates the load of each processor. The load information so obtained is then send to the stress estimator module of load balancer which computes the PUF of every processor and figures out the buddy set that must work for stress underneath processors.

5.6.1 Layer 1: PLSC (Processor Load Status Collector)

The Processor Load Status Collector prepare a status table that contains processors utilization value, which afterward sent to BPSI module for the recognition of stress underneath buddy set among number of working processors so that global or system queue should be scheduled in such a way that processors having PUF <50% are not to be given any new assignment until they complete their pending tasks.

5.6.2 Layer 2: BPSI (Buddy Processor Set Identifier)

After getting the load information regarding processors from load status collector, the BPSI module figures out the stress underneath processors for which the scheduling of global queue must be aborted or stopped for current period. The BPSI finds out the main processors which have limited stress from the given processors space, which acts as friends or buddies for stressed underneath processor set. This buddy set now handles global queue in place of their other stressed buddy set.

5.6.3 Layer 3: JC (Job Controller)

JC the Job Controller module receives the buddy set rank from the BPSI and then schedules the jobs in such a fashion that only those processors stay in touch with the global queue which are stress free and the stress underneath processors are not given any new work-load until they complete their pending work. Once the job scheduling is decided, the job dispatcher sends the jobs to the concerned processors.

5.6.4 Layer 4: PI (Processor Interconnection)

The PI layer consists of the processor space having many homogeneous or heterogeneous processors that are connected in homogeneous configuration. The job controller sends the jobs to these processors depending upon their PUF value. The processors executes the jobs allotted to them in a round robin fashion where the time variable or quantum given to each job is same as to the one clock period of the processors. If a job requires computation time for its execution that is more than processors’ one clock cycle then it is added as remaining or pending work in processors next clock cycle and this keeps recurring until one particular job is completely executed.

5.7 Intelligent scheduling

To optimally enhance the performance of the proposed system a special algorithm called IPSI (Intelligent Processor Set identifier) has been developed and presented. The IPSI algorithm takes care of the statistics and history of each processor to predict optimal future job assignments. The algorithm is based on greedy approach and chooses optimal local solution from search space. The algorithm prepares a performance history sheet and uses statistics from this sheet to find most favorable processor set. The algorithm takes care of PI i.e. performance index and RANK. The rank defines the goodness of the processor’s performance index and stability computed overtime. The algorithm iterates many times to find out a suitable and stable set of processors from the pool. During each iteration the algorithm compares the previous processor parameters with the current parameters. The algorithm then prepares a stability matrix which can be used for optimal job allocation.

5.7.1 Execution scenario

The algorithm is executed at regular interval for assessing the performance stability of processors cores. RANK matrix is the final outcome of the algorithm and indicates the performance capability

IPSI (Intelligent Processor Set Identifier)

//RANK [Stability, Status], PUF, CSUM (Cumulative Sum)

Set CSUM =0;

Repeat until Exit criteria is met i.e. CSUM > 5

Repeat for processor 1 ... n and Iterate through step 3 until optimal solution is found (Greedy Approach)

IF (PUF_i (P)> PUF_i - 1(p))

Then Increase RANK of Processor (P) by differential of PUF

Status = Status+PUF_i (P) - PUF_i - 1(p)

CSUM = CSUM+1

Else IF(PUF_i (P) - PUF_i - 1(p))

Decrease RANK of Processor (P) by differential of PUF

Status = Status-(PUF_i - 1(p) - PUF_i (P))

CSUM = CSUM+1

End of Step 3 Loop

Set CSUM = 0

End of Step 2 Loop

Refer RANK Matrix of the processor job allocation

Exit

-of each processing core. The matrix is kept in sorted order (ascending) and is used as a priori information for scheduling. The stability value indicates the processing power of the processor/core computed over time. The status value indicates the availability of the processor. Both of the stability and status values are used for scheduling after the exit criteria of the IPSI algorithm is met. The variable CSUM i.e. cumulative sum controls the exit condition of the algorithm. The CSUM depends upon the performance fluctuations and when these fluctuations are trivial i.e. performance is stable then the algorithm terminates.

6 Results

The simulation environment and implemented presents the workload distribution to the multiple hybrid i.e. majorly homogeneous but slightly heterogeneous processors in such a way that load stability prevails at initial or very early stage of job scheduling so the later impacts of context switching are removed. The results are taken at regular time intervals and are verified with the help of setup and illustration produced in a simulation. The simulation view is described using 8 hybrid processors having same clock frequency. The job scheduler and buddy processor set identifier divides the load according to friendship or so called “buddy strategy”. The outcomes of IPSI algorithm can be utilized for making smart scheduling decisions. The benefit of buddy approach is that no in-between processor interaction is required which results in decrease in task shuffling operation. This research categorizes processor space in different categories and as per their PUF new jobs are scheduled. The processors which have more than 50 percent PUF will be added to the buddy set sequence respectively and handles the global queue till PUF is greater than 50 percent. Figure 4 presents utilization factors at a specific run time.

The essential parameter of processor’s categories used in the picture is described below, showing the processors with different colors according to occurred stress conditions. Processors identification marks are changed dynamically, they are constraint specific. Depending and based on above depicted calculations, P1 and P2 are in stress initiation or build up stage and all others are in stress free stage. The graph is as shown in Fig. 5.

	Stress Free (PUF >90%)
	Stress Initiation (PUF > =75% and <90%)
	Stress Beginning (PUF > =50% and <75%)
	Stress underneath (PUF <50%)

As presented above (Fig. 6), after some time span all the processors encounter stress underneath condition so, none of them accepts job from the job scheduler and completes the tasks assigned to them. The graph is as presented in Fig. 7.

When the time passes and the system timer reaches 4 second, the status is as processors P1, P3, P5 and P6 are in stress underneath stage i.e. not in stress (Fig. 8); P2 and P8 in stress beginning stage and P4 and P7 are in stress commencement stage. So the processors P2, P8, P4 and P7 turns out to becomes the “buddy set” and take jobs from the global queue while the processors P1, P3, P5 and P6 are under stress or high stressed condition therefore they do not take any new job from the global queue. These all processors complete their respective tasks assigned to them so that can reach the stress free condition as soon as possible. So, with the scheduling policy named friendship based or “buddy strategy” followed, as discussed above, the results produced verifies that the overhead of context switching and load stability by task adjustments are not needed as the policy implemented looks for the workload distribution at the initial stage of job scheduling and assigns jobs only to those processors which are stress free and those processors becomes friends or “buddies” of the stress underneath processors that need not be worried about the incoming jobs and focuses to finish their work easily. Figure 9 presents the graph as per the Present PUF values.

6.1 Impact of intelligent scheduling

The matrices constructed by IPSI algorithm significantly impact the performance of the system. The matrices are used as a reference by the scheduler to make smarter scheduling decisions. Due to the selection of best processor/core for each incoming process the performance of the systems is positivity impacted and results in increased performance gains. The IPSI algorithm is able to provides best and dynamic processors set selection and by capturing the past and current status and availability of the processor. Figure 10 presents an output snapshot and the impact of this intelligent scheduling. It can be seen from Fig. 10 that over long term the intelligent component of the IPSI algorithm yields significant performance improvements. The y-axis indicates the PUF or core utilization factors. The x-axis presents the different processors/cores.

7 Conclusions and future scope

With the ever rising computing needs of the users efficient utilization of underlying hardware resources is a prerequisite. Convention methods to utilize parallel computing resources has been widely studied and implemented. But there exist scope of intelligent algorithms to further improve the performance of parallel computing scheduling techniques. In the present work intelligent algorithm has been used together with conventional computing methods to optimally utilize the parallel hardware resources. The system proposed first utilizes a conventional computing based buddy method to solve load distribution problem. The other algorithm which further improves the performance gain is based on intelligent computing methods. This algorithm constructs matrices from the past history and statistics of the processor to forecast future job allocations. After the system matures after running for considerable time and statistics are well recorded then scheduling decision can be made from these matrices. The proposed systems achieve remarkable performance improvement by using intelligent scheduling algorithms. The processors utilization factor remains considerably good as per the load characteristics. Further improvements in the system can be made by incorporating more intelligent methods and by introducing evolutionary computing strategies.

References

Taneja

, et al., SPF: Segmented processor framework for energy efficient proactive routing based applications in MANET, 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS) (2016), 1–5.

Bhullar

R.K.

et al., Cross-Platform Application Development for Smartphones: Approaches and Implications, 3rd International Conference on Computing for Sustainable Global Development, IEEE (2016), 2571–2577.

Kumar

et al., Specialized hardware Architecture for Smartphones, Intl Journal of Engineering Research and Applications, ISSN: 2248–96224(5) (2014), 76–80.

Kumar

et al., Smartphones hardware Architectures and Their Issues, International Journal of Engineering Research and Applications, ISSN: 2248-96224(5) (2014), 76–80.

Kaur

et al., Energy Named Entity Recognition, Extraction and Classification using Conditional Random Field with Kernel Approach, IJEAR, ISSN: 0973-456210(15) (2015), 38193–38198.

Kaur

et al., Comparing Routing Protocols, IJEMER, ISSN: 2249-66454(2) (2014), 36–42.

Taneja

et al., EEGRP: Grid Based and Caching Equipped Energy Efficient Routing Protocol for MANET with Restricted Movement, Far East Journal of Electronics and Communications, ISSN: 0973-69993(Part I) (2016), 185–199.

Kumar

et al., Smartphone’s Hardware and Software Development Issues and Factors, IJES, ISSN: 2320-03323(12) (2014), 69–75.

Liao

W.H.

et al., GRID: A fully Location Aware Routing Protocols for MANET, Telecommunications Systems (2001), 1548–1557.

10.

Tseng

Y.-C.

et al., Power Saving Protocol for IEEE 802.11 based Multi-hop MANET, IEEE (2001), 37–60.

11.

Verma

S.K.

et al., Hybrid Image Fusion Algorithm using Laplace pyramid and PCA Method, ACM Conference March (2016), 1–6.

12.

Hsieh

T.-Y.

et al., An Architecture for Power-saving Communication in Wireless Mobile Ad-hoc Network Based on Location and Position Information, Micro-processors and Micro-systems (2004), 457–465.

13.

Royer

E.M.B.

et al., Distance Vector (AODV) Routing, IETF Internet draft, draft perkinsmanet aodvbis-OO txt (2003), 225–234.

14.

Ebert

J.P.

et al., Power Saving Mechanisms in Emerging Standards for Wireless LANs: MAC Layer Perspective, IEEE Personal Communications4 (1998), 40–48.

15.

Steenstrup

M.E.

et al., Distance-Vector Routing, Routing in Comm. Networks, Prentice-Hall (1995), 83–98.

16.

Sisodia

et al., Performance Evaluation of a Table Driven and On Demand Routing protocol in Energy Constrained MANETs, Proceedings of IEEE Conference ICCCI (2013), 76–80.

17.

Kumar

et al., A Novel Prime Numbers Based Hashing Technique for Minimizing Collisions, 2016 2nd International Conference on Next Generation Computing Technologies (NGCT-2016), IEEE (2016), 1–7.

18.

Pawar

et al., Optimized Route Selection On The Basis Of Discontinuity And Energy Consumption In Delay Tolerant Networks, Springer : Advances in Intelligent Systems and Computing, ISSN: 2194-5357, In Press.

19.

Arora

et al., Novel Stress Calculation in Parallel Processor Systems Using Buddy Approach with Enhanced Short Term CPU Scheduling, Proceedings of the International Conference on Communication and Computing Systems (ICCCS 2016), ISBN: 9781138029521 (2016), 181–197.

20.

Kaur

et al., Stochastic Approach for Energy-Efficient Clustering in WSN, Global Journal of Computer Science and Technology14(7) (2014), 1–8.