A new agent based load balancing model for improving the grid performance

Abstract

In grid computing, if the system load in each of resources is almost equal, this indicates correct resource usage. This paper presents an agent based load balancing model. Instead of balancing the load in the grid by job migration technique or by moving an entire process to underloaded resources. The proposed agent based load balancing model aims to take advantage of the agent’s characteristics to generate an autonomous system. It also has a distinct advantage over other agent-based load balancing models in terms of resource utilization. The performance evaluation shows that the proposed model can enhance the overall performance of grid computing.

Keywords

Grid computing job migration load estimation performance metrics grid agent resource utilization queue length cpu usage

1. Introduction

Grid computing has recently become one of the most vital research themes in the field of computing. The Grid paradigm has acquired popularity as a result of its capability to give facile access to geographically distributed resources functioning through various administrative domains. The grid environment looks as a combination of heterogeneous, dynamic, and shared resources in order to give reliable and faster access to Grid resources [7].

The main aim of load balancing is to enhance the application response time by which load would be distributed according to resources. This is even more vital in computational Grid where the main concern is to equitably submit jobs to resources and to reduce the difference between the overloaded and underloaded resources [7].

Multi-agent systems offer promising features for resource managers. The reactivity, proactivity, scalability, cooperation, robustness, flexibility and autonomy that characterize agents can help in the complex task of managing resources in dynamic and changing environments.

This paper presents a new Agent Based load balancing Model, called ABLBM. A hierarchical architecture with coordination is designed to ensure scalability and efficiency. In addition, a multi-agent approach is applied to improve the adaptability. This study aims to decrease the average response time of jobs submitted to the gridand maximize resource utilization whenever possible. The reminder of the paper is organized as follows: Section 2 briefly reviews some related works; Section 3 defines the proposed agent-based load balancing model and Section 4 describes the proposed load balancing algorithms; experimental results are presented in Section 5; finally, Section 6 summarizes the main contributions of this paper with a summary on future directions.

2. Literature review

The authors [5] proposed Agent Grid Load balancing model (AGLBM) by analyzing the load of compute nodes and the subsequent migration of virtual machines from overloaded nodes to underloaded nodes. The proposed system involves multiple nodes that interact to implement MapReduce jobs. The multi-agent system consists of a group of agents: node sensor agent, simulation model sensor agent, analysis agent, migration agent and distribution agent. Analysis and distribution agents are defined as reasoning agents. Sensors and migration agent are reactive agents. The sensor agent collects the necessary information on the status of the node. This information contains: node loading, communication line loading. When collecting information, use performance counters. The sensor agent observes the evolution of the state of the simulated model objects located on the node. As information that the sensor agent sends as output information, the frequency of the event, the frequency of receiving and sending messages from the poles, the frequency of the state change. The analysis agent interviews the analysis agents at certain intervals, decides, using the rules, whether balancing is necessary. If this is the case, the distribution agent is communicated.

The distribution agent is the source of “knowledge” about the surrounding environment (neighboring nodes, statistical data on neighboring nodes) for the analysis agent. This knowledge is proposed to clarify the rules according to which the agent decides on the need for balancing. The broker agent controls with all agents and controls the functioning of the nodes, when the node has completed its part of the work. The broker-agent takes some of the work from the other node and mixes it with a free node. This ensures that the entire node is still being processed. The rule-based distribution agent selects the model objects to be transferred and chooses the target network nodes. The distribution agent, when selecting objects for transfer to other nodes, refers to neighboring distribution agents. Then, the distribution agents are synchronized, which makes it possible to determine the main agent. After the execution of the synchronization algorithm, the lead agent completes the simulation process by sending a message corresponding to the simulation system. The agents then request the objects necessary for the system. The migration agent moves the objects to other nodes. The migration agent sends the objects from the simulation model to the distribution agents of the target nodes. Once the migration is complete, the distribution agents are synchronized again and the simulation system is launched. The modelling process is ongoing. In this study, system performance has not been studied, the implemented Hadoop platform is also simple, and the experimental installation of Hadoop may not be optimal and the results may be misleading to some extent.

In the study of [10], a distributed resource allocation protocol model (dRAPM ) is proposed to assign and schedule tasks on a distributed grid. Using the properties of multi-agent systems, the proposed distributed resource allocation protocol (dRAP) is described as follows: An agent in the system is simply a node. Each agent has a vector including the number of CPUs in its cluster and the residual time to complete the execution of its current process. Each agent is assured to be in exactly 1 out of 4 cases during the simulation.

In the first case, an agent has no assigned jobs to it and it is not part of a cluster. In this case, the agent analyzes the queue, it takes into account the CPU ${}_{\text{req}}$ resource requirements of the non-assigned jobs, and it manages the job that minimizes the following equation: $|$ CPU ${}_{\text{req}}-$ 1 $|$ .

In the second case, an agent has been assigned jobs and it is not part of a cluster.In this case,the agent continues executing the job and updates its information vector. If the job requirements are not totally satisfied, the agent will query its neighbors and attempt to form a cluster such as: CPU ${}_{\text{req}}=$ CPU ${}_{\text{cluster}}$ . When the agent terminates the execution of the job, it returns to first case.

In the third case, an agent is currently part of a cluster and has no assigned jobs.In this case the agent scans the queue, it considers the unassigned jobs and it takes on the job which minimizes the equation: $|$ CPU ${}_{\text{req}}-$ CPU ${}_{\text{cluster}}|$ .

In the fourth case, an agent has been assigned a job and it is currently part of a cluster. In this case,the agent continues the execution of the job and updates its information vector. When the job is completed, the agent disconnects from the cluster and returns to first case.

A main feature of this algorithm is that nodes ask their neighbors to form clusters. This reduces waiting time and communication costs. One optimization to consider would be to delay the disconnection of the cluster in state 4, which would guide learning or memory in the system where the planner would be able to remember the requirements of the past process. The problem with this algorithm is its decentralized nature, it is neither a centralized control nor a precise synchronization on nodes (agents).

The main contribution of this study [9] is the development of an agent-based model for managing network resources with defined operations so that the user can perform jobs efficiently and effectively and thus significantly improve management by a gLite Grid middleware. The agent-based resource management model (ABRMM) provides a platform based on a collection of agents in a virtual organization. The key aspects of this proposal architecture are: resource tracking, load balancing and agent hierarchy. The agents that make up this platform are defined as the agent user interface (AUI), scheduler agent (SA), resource search agent (RSA) and monitoring agent (MA). AUI collects basic information on the type of work resource requirements or special requests for resources using a web interface to communicate with the user. The SA is responsible for planning the execution of work with appropriate resources. He contacted RSA for a list of available resources. He finds the best resource according to the demands and availability of the machines. RSA gives a list of resources, RSA makes the link between uses and resources. Then, the RSA functionality switches to the WM gLite. The user can choose the resource or allow it to be executed in a transparent manner while respecting the established criteria. The MA is responsible for monitoring the status of resources to provide online information on availability and, at the same time, it controls the tasks performed by each task. The MA multi-agent system interacts through a communication interface, successfully maintaining the discovery of current information services and updating the gLite middleware.

In this paper [4], the authors proposed a new load balancing structure based on the moving agent and a technique for optimizing ant colonies. In the proposed Ant Colony Optimization Model (ACOM), a dispatcher agent is involved in distributing the tasks received to the worker agents according to the right decisions in order to minimize the overall execution time (makespan). The proposed framework is constructed using three layers which are the producer of user tasks, the scheduling load balancing layer and the workers’ layer. This study should be complemented by comparing their results with other methods, minimizing task movements and resulting in additional costs in the migration process.

Another study [6] presented the design and implementation of a priority based fuzzy load balancing model (PBFLBM) in a computing grid. In this grid template, the user sends his jobs to the grid agent, after the grid scheduler uses the priority-based scheduling algorithm to schedule jobs from the grid agent to the available resource. Load balancing is done using the fuzzy logic technique Propose, in which a set of fuzzy rules are produced using the resource and the work parameter. As fuzzy control rules are collected using linguistic variables, perceptual knowledge and inspection are easily integrated into the control mechanism.

3. Proposed model

3.1 Grid model

A Grid computing was modelled as a collection of clusters proposed in a multi-node platform. Each cluster was composed of nodes and belonged to a LAN local domain (Local Area Network). Every cluster was connected to the WAN global network (World Area Network) by a Switch [2].

The proposed load balancing model was based on mapping the Grid architecture into a tree structure. This tree was built by aggregation as follows: First, for each cluster, a two level subtree was created. The leaves of this sub-tree correspond to the cluster nodes, and its root, called cluster manager, represents a virtual node associated with the cluster. Secondly, sub-trees corresponding to all clusters were collected to generate a three level sub-tree whose root is a virtual node designated as a Grid manager. The concluding tree is referred to as C/N, where C is the number of clusters that constitute the Grid and N the number of worker nodes [2].

3.2 Agent based load balancing model

This study aims to develop a hierarchical load balancing model based on a multi-agent system, in addition to previous work on agent-based load balancing. There are two key challenges for Grid computing: heterogeneity and scalability. The authors propose a three-layer architecture to address the scalability issue (Fig. 1). Connecting or disconnecting resources (worker nodes or clusters) correspond to simple operations in a tree (adding or removing leaves or sub-trees). The proposed agent based load balancing model aims to take advantage of the agent’s characteristics to create an autonomous system.

Figure 1.

Proposed agent based load balancing model in grid.

In the proposed model, three levels are identified. The first is Level 0, at this level we find Grid Agent, the Grid users send their jobs to the Grid Agent. The latter is responsible for receiving jobs from Grid users and sending them to Node Agents. When a node fails, Grid Agent stops it and restarts it on a different set of nodes. Also, the Grid Agent starts up all Cluster Agents.

The second level is Level 1. At this level, Cluster Agent is associated with a physical grid cluster; this Agent is responsible for the maintenance of the load information relating to each of its Node Agents. With such information, the Cluster Agent estimates the load of the associated cluster and distributes information to other Cluster Agents. Moreover, Cluster Agents take the decision to start local load balancing and sending these decisions to Migration Agent, also Cluster Agent initiates a global load balancing process. Then, the Cluster Agent starts up its associated Migration Agent and all Node Agents of nodes under its control.

In addition, the Migration Agent is present at level 1. Its role is to start the migration process, by sending the migration decisions to the Node Agent, waiting for an acknowledgment from the receiver node, and ensuring that the migrated jobs are received and successfully resumed at the destination node.

The third level is Level 2. At this level, Node Agents and LBC Agents are present. It is necessary to have one Node Agent on each node. Each Node Agent at this level is responsible for receiving jobs sent by Grid Agent and maintaining its load information. Next, it sends this information to its associated Cluster Agent. Likewise, Node Agent works in cooperation with the Migration Agent to execute the migration process. The AgentNode starts up all its associated LBC Agents.

At Level 2, the LBC Agent is also present. It resides at each node and its role is to collect information about the jobs (number of jobs queued at node, arrival time, waiting time, submission time, start time, processing time and finish time of each job on the local node). It is concerned also with removing the terminated or migrated jobs from the queue of jobs, and calculating a total load of nodes and sending it to its related Node Agent.

3.3 Algorithms

According to the proposed model, two levels of load balancing are considered: Intra-cluster Agent based load balancing algorithm and Inter-Clusters Agent based load balancing algorithm.

There are certain specific events that change the load configuration in Grid computing and can be classified as follows: i) any new job is arrived, ii) accomplishment of execution of any job, iii) any new node is arrived, iv) any existing node is removed, v) failure of Machine at any node, vi) the node become overloaded.

When any of these events happen, the local load value is changed. Table 1 summarized the notations used for the proposed algorithms.

Table 1
Notations used

Parameter	Description
N	Node
Load ${}_{N}$	Load of node
Qlength	Queue length of node
CPU-U	CPU utilization of node
Mem	Memory utilization of node
TH ${}_{H}$	The higher threshold
TH ${}_{L}$	The lower threshold
OLD-list	Overloaded list
ULD-list	Underloaded list
BLD-list	Balanced list
Load ${}_{\text{avg}}$	Average load
NBR ${}_{N}$	Number of nodes of cluster
C	Cluster

3.3.1 Intra-cluster agent based load balancing algorithm

Depending on its current load, each Cluster Agent decides to start a Job Migration operation. In this case, the Cluster Agent tries, in priority, to balance its load among its nodes.

3.3.1.1. Load estimation

The node load at a given time was simply described by the CPU queue length. It indicates the number of processes awaiting execution. The proposed algorithm considers CPU-U (CPU Utilization), Q length (Queue length) and Mem (memory utilization) as load information parameters to measure the load of a node.

These parameters are calculated as follows:

$\displaystyle\text{Load (CPU-U)}=(U_{1}+U_{2}+\ldots+U_{T})/T,$ (1)

where $U_{1}$ , $U_{2}$ , $\ldots$ , $U_{T}$ is the value of CPU-U in a previous one second interval.

$\displaystyle\text{Load (Qlength)}=(Q_{1}+Q_{2}+\ldots+Q_{T})/T,$ (2)

where $Q_{1}$ , $Q_{2}$ , $\ldots$ , $Q_{T}$ is the value of Qlength in a previous one second interval.

$\displaystyle\text{Load (Mem)}=(M_{1}+M_{2}+\ldots+M_{T})/T$ (3)

where $M_{1}$ , $M_{2}$ , $\ldots$ , $M_{T}$ is the value of Mem in a previous one second interval. T is the number of time intervals.

The averaged information of CPU-U, Qlength and Mem are the load parameters used to describe the node load.

Algorithm 1 is LBC Agent deployed in the proposed framework.

Algorithm 1. An algorithm of LBC Agent
1: T $\leftarrow$ 5 seconds
2: Waiting for jobs;
3: Create jobs queue for related node;
4: In each one second of T intervals do
5: Calculate (CPU-U);
6: Calculate (Qlength);
7: Calculate (Mem);
8: End do
9: Load (CPU-U) $=$ ( $U_{0}+U_{1}+\ldots U_{T}$ )/ $T$ ;
10: Load (Qlength) $=$ ( $Q_{0}+Q_{1}+\ldots Q_{T}$ )/ $T$ ;
11: Load (Mem) $=$ ( $M_{0}+M_{1}+\ldots M_{T}$ )/ $T$ ;
12: Wait for load change//happening of any of defined events
13: If (events_happens () $=$ 1 or events_happens () $=$ 4) then // Termination or migration of job
14: Remove terminated or migrated job from the waiting queue
15: Subtract their load value from the total local load of node.
16: Send new load to its Node Agent associated;
17: End if
18: If (events_happens () $=$ 2 or events_happens () $=$ 3) then // new or incoming job
19: Add the newly created or incoming job for the waiting queue
20: Add their load value for the total local load of node
21: Send new load to its Node Agent associated;
22: End if

Function events_happens ()
output Type: integer
1: If (Job.state $=$ Termination) then events_happens () $=$ 1; End If
2: If (Job.state $=$ Start) then events_happens () $=$ 2; End If
3: If (Job.state $=$ Incoming Migrating) then events_happens () $=$ 3; End If
4: If (Job.state $=$ migrated) then events_happens () $=$ 4; End If
5: If (Arrival of any new resource) then events_happens () $=$ 5; End If
6: If (Cluster.state $=$ saturated )then events_happens () $=$ 6; End If
7: If (Cluster.state $=$ unbalanced) then events_happens () $=$ 7; End If

Algorithm 2 is Node Agent deployed in the proposed framework.

Algorithm 2. An algorithm for Node Agent
1: Startup its related LBC Agent
2: Receive load information from its LBC Agent
3: Send load information for related Cluster Agent
4: Perform the job migration process
5: Receive jobs from its related Cluster Agent

Figure 2.

Activity diagram for AgentLBC Algorithm.

3.3.1.2. Decision making

In the next step, the nodes must be classified according to their load. Three states were used for classification: overloaded, underloaded and balanced. First, Cluster Agent must calculate two threshold values, which are calculated as follows:

Cluster Agent calculates load average of each parameter (CPU-U and Qlength) over all related nodes.

$\displaystyle\text{Load}_{\text{avg}}(\text{Qlength})=(\text{load}_{1}+\text{% load}_{2}+\ldots\text{load}_{\text{NBRN}})/\text{NBRN},$ (4)

where $\text{Load}_{\text{avg}}(\text{Qlength})$ is the average load of Qlength over all related nodes. load ${}_{1}$ , load ${}_{2}$ , $\ldots$ load ${}_{n}$ are the current Qlength of each node calculated by LBC Agent.

$\displaystyle\text{Load}_{\text{avg}}(\text{CPU-U})=(\text{load}_{1}+\text{% load}_{2}+\ldots\text{load}_{\text{NBRN}})/\text{NBRN},$ (5)

where $\text{Load}_{\text{avg}}$ (CPU-U) is the average load of CPU-U over all related nodes. load ${}_{1}$ , load ${}_{2}$ , $\ldots$ load ${}_{\text{NBRN}}$ are the current load of CPU-U of each node calculated by LBC Agent.

Figure 3.

Activity diagram for AgentCluster algorithm.

Algorithm 3. An algorithm for Cluster Agent
1: Startup its related Node Agents
2: Startup its related Migration Agent
3: Receive load information(Load ${}_{N}$ (Qlength), Load ${}_{N}$ (CPU-U)) from its related nodes.
4: Calculate and send its load information for other Cluster Agents.
4: Receive and save load informations from other clusters
5: somme $\leftarrow$ 0; somme1 $\leftarrow$ 0;
6: For every Node N of cluster C do
7; Somme $\leftarrow$ Somme $+$ Load ${}_{N}$ (Qlength);
8; Somme1 $\leftarrow$ Somme1 $+$ Load ${}_{N}$ (CPU-U);
9: End For
10: Load ${}_{\text{avg}}$ (Qlength) $=$ somme1/NBR-N;
11: Load ${}_{\text{avg}}$ (CPU-U) $=$ somme/NBR-N;
12: TH ${}_{H}$ (Qlength) $=$ Load ${}_{\text{avg}}$ (Qlength)*H;
13: TH ${}_{L}$ (Qlength) $=$ Load ${}_{\text{avg}}$ (Qlength)*L;
14: TH ${}_{H}$ (CPU-U) $=$ Load ${}_{\text{avg}}$ (CPU-U)*H;
15: TH ${}_{L}$ (CPU-U) $=$ Load ${}_{\text{avg}}$ (CPU-U)*L;
16: Partition Nodes into overloaded list OLD-list, underloaded list ULD-list and balanced list BLD-list
17: OLD-list $\leftarrow\emptyset$ ; ULD-list $\leftarrow\emptyset$ ; BLD-list $\leftarrow\emptyset$ ;
18: For every Node N of cluster C do
19: If ((Load ${}_{N}$ (Qlength) $>$ TH ${}_{H}$ (Qlength)) or (Load ${}_{N}$ (CPU-U) $>$ TH ${}_{H}$ (CPU-U)) or (Load(Mem) $>$ 85%)) then
20: OLD-list $\leftarrow$ OLD-list $\cup$ N;
21: End If
22: Else If ((Load ${}_{N}$ (Qlength)) $<$ TH ${}_{L}$ (Qlength))or (Load ${}_{N}$ (CPU-U) $<$ TH ${}_{L}$ (CPU-U))) then
23: ULD-list $\leftarrow$ ULD-list $\cup$ N;
24: Else BLD-list $\leftarrow$ BLD-list $\cup$ N;
25: End If
26: End For
27: Sort OLD_list by descending order relative to their LoadN(Qlength).
28: Sort ULD_list by ascending order relative to Their Load ${}_{N}$ (Qlength).
29: If (events_happens () $=$ 7) then//cluster is unbalanced
30: While (OLD-list $\neq\emptyset$ . AND. ULD-list $\neq\emptyset$ ) do 31: For $i=$ 1 To ULD-list.Size() do
32: send the decision of migration for Migration Agent (with address of first sender node of OLD List and i ${}^{\rm th}$ receiver node of ULD-list);
33: If an Acknowledgment received from Migration Agent
34: Update the current Load ${}_{N}$ of receiver and sender nodes
35: Update OLD-list, ULD-list and BLD-list;
36: Sort OLD-list by descending order of their Load ${}_{N}$ ( Qlength).;
37: End For

3.3.1.3. Calculation of threshold values

The higher and lower threshold values of Qlength and CPU-U of parameters are calculated by multiplying the average load of (Qlength or CPU-U) and a constant value.

$\displaystyle\text{TH}_{H}(\text{Qlength})=\text{H*Load}_{\text{avg}}(\text{% Qlength})$ $\displaystyle\text{TH}_{L}(\text{Qlength})=\text{L* Load}_{\text{avg}}(\text{% Qlength})$ (6) $\displaystyle\text{TH}_{H}(\text{CPU-U})=\text{H*Load}_{\text{avg}}(\text{CPU-% U})$ $\displaystyle\text{TH}_{L}(\text{CPU-U})=\text{L* Load}_{\text{avg}}(\text{CPU% -U})$

where TH ${}_{H}$ is the high threshold and TH ${}_{L}$ is the low threshold. $H$ and $L$ are constants.

The next step is to divide the nodes for balanced, overloaded and underloaded nodes using the threshold values.

The node will be added for overloaded list if queue length is high, or CPU utilization is high, or memory usage is greater than 85%, then the node is classified as overloaded node.

The node will be added for underloaded list if queue length is low, or CPU utilization is low.

The node is not into the overloaded list or the underloaded list. The node is in a balanced load state. They are considered to be more loaded than the low state and less loaded than the high state.

Algorithm 3 is Cluster Agent deployed in the proposed framework.

3.3.1.4. Job migration

After classifying the nodes, in the next step Cluster Agent decide to transfer jobs from overloaded to underloaded nodes. It sends this decision for Migration Agent.

Algorithm 4 is Migration Agent deployed in the proposed framework.

Algorithm 4. An algorithm for Migration Agent
1: Receive decision of migration from its related Cluster Agent.
2: Sending the migration decisions to the Node Agents of sender and receiver node.
3: Wait for an Acknowledgment from the Node Agent of receiver node.
4: Send an Acknowledgment for its related Cluster Agent

Figure 4.

Activity diagram for AgentMigration algorithm.

3.3.2 Inter-cluster Agent based load balancing algorithm

This algorithm applies a global load balancing among all clusters of the Grid. The Inter-cluster load balancing at this level is made if Cluster Agent fails to balance its load among its associated nodes. In this case the Cluster Agents transfers Jobs to under loaded clusters based on the load information received by clusters. the following algorithms are proposed:

Algorithm 5 is Cluster Agent deployed in the proposed framework.

Algorithm 5. An algorithm for Cluster Agent
1: if (events_happens () $=$ 6) then//cluster is saturated
2: Create underloaded_clusters_table;
3: While (underloaded _clusters_table $\neq\Phi$ ) Do
4: Sort the clusters Cr of underloaded _clusters_table by ascending order of inter clusters (Ci-Cr) WAN bandwidth sizes.
5: Sort nodes of saturated cluster by descending order of their load
6: Sort Jobs of first node of saturated cluster by FCFS algorithm and communication cost
7: Migrate the selected job from the first node of saturated cluster to jth cluster of underloaded_clusters_table
8: Update ULD_clusters_table, and its load.

Algorithm 6 is Grid Agent deployed in the proposed framework.

Algorithm 6. An algorithm for Grid Agent
1: Startup all Cluster Agents
2: Receive jobs from grid user
3: Send jobs for Cluster Agents

Figure 5.

Activity diagram for AgentCluster algorithm.

The last algorithm is implemented in Cluster Agent which determines the way a receiver cluster is selected for a job migrated from overloaded cluster. Cluster Agent of Saturated Cluster calculates the minimum communication cost of sending jobs to receiver underloaded clusters based on the information collected in the last exchange interval. Cluster Agent selects the cluster that gives minimum overall cost.

3.4 Agents interactions

Figure 6.

UML Sequence Diagram describes agent interactions in intra cluster load balancing process.

The proposed agent based load balancing algorithm is intended to take advantage of the agent’s characteristic to create a self-adaptive and self-sustaining load balancing system. The proposed system consists of five types of agents, in unbalanced situations, and if the Cluster Agent finds that there is a load imbalance between the nodes under its control, it receives the load information from each Node Agents. On the basis of this information and the estimated equilibrium threshold, it analyses the current load of the cluster. Depending on the result of this analysis, it decides whether to start a local balancing in case of an unbalanced state, or simply inform the other Cluster Agents of its current load. Node Agent sends the updated local load value to Cluster Agent, which updates its load information. The local node load is calculated by the LBC Agent residing at each calculation node. LBC Agent creates the task queue at the local node and updates it if necessary, and sends the Node Agent load information based on the defined events. Migration Agent is responsible for migrating jobs to the selected underloaded node. There is an Migration Agent in each Cluster, who expects an acknowledgement of receipt from the receiving node once it receives the migrated job. The Migration Agent ensures that the work is successfully received and resumed or started at the destination node. The last agent is Grid Agent, it is the role of the distribution of work between clusters, all Cluster Agents are started by this type of agents.

4. Experimental environment

4.1 Workload

The complex data set was modelled from the national Grid of the Czech Republic’s MetaCentrum [11], which allowed to carry out very realistic simulations. It also provides information on machine failures and specific work requirements and this information influences the quality of solutions generated by scheduling algorithms. The job description includes (job ID, user, queue, number processors used, etc.).

The cluster description also includes detailed information such as RAM size, CPU speed, CPU architecture, operating system and list of supported properties (allowed queue(s), cluster location, network interface, etc.) In addition, the information machines were under maintenance (failure/restart). Finally, the list of queues containing their time limits and priorities is provided.

The proposed algorithms were implemented in Alea 2 (Job Scheduling Simulator based on GridSim) [3]. Alea 2 is an event-based simulator used to test planning algorithms under various conditions (different types of resources and dynamic task modifications in the environment, etc.). This simulator is based on GridSim Toolkit [8] and represents an extension that includes better tools for scheduling the implementation visualization skill and higher simulation speed.

4.2 Experimental results

To evaluate the effectiveness of the proposed algorithms, an experimental environment using Alea 2 as a grid simulator and JADE (Java Agent DEvelopment Framework) for agent implementation was set up. In the proposed infrastructure, management agents can communicate in a grid environment using the Jade agent platform. In addition to Alea 2, a class library was developed that simulates the activities of an agent platform. This library, called ABLB (Agent based load balancing), includes the classes: GridAgent, ClusterAgent, LBCAgent, MigrationAgent and NodeAgent. The simulation is initialized by the GridAgent class which creates instances of resources, jobs and other entities as required by the GridSim standard. GridAgent reads information describing the grid resources and jobs from a data file, reads the jobs from the data_set file and dynamically produces the jobs instances over time. each job (gridlet) attribute such as Gridlet ID, Gridlet length and input file size and output file size. Figure 7 specifies the Grid Resource parameters such as: resource ID, resource’s CPU speed and resource’s memory capacity. Next, GridAgent lists all the available grid resources within the grid environment. When the simulation time is equal to the job submission time, the GridAgent starts the ClusterAgent and dynamically sends the jobs created for the Node Agents over time.

Figure 7.

Grid resource characteristics.

On the basis of its own load and the estimated balance threshold, ClusterAgent analyzes the load state of the cluster. For an imbalance state, ClusterAgent determines the overloaded nodes (sources) and the under loaded ones (receivers), depending on their load information by using the threshold values. Decision making algorithm results are shown in Fig. 8.

Figure 8.

Result of decision making algorithm.

After classifying the nodes, in the next step ClusterAgent decide to migrate jobs from overloaded to under-loaded nodes, the proposed algorithm works in pair of node (sender and receiver), the first pair has the most overloaded node and the must under loaded node, second pair has the second most overloaded node and the second most under-loaded and so on. in the proposed example the pair of node (Resourceid21, Resourceid17), the first submitted job is selected to be migrated.

4.3 Performance evaluation

The important performance factors in estimating the proposed algorithm are: scalability, the number of jobs waiting in queues and maximizing resource utilization. Some experiments were carried out to evaluate the efficiency and performance of the proposed algorithm.

4.3.1 Experimentation 1

In the first experimentation, the focus was on the number of agents that the system can stabilize before it becomes out of control.

Figure 9.

Average response time (seconds) according to various Agentscluster numbers.

Figure 9 shows that the limit of the proposed algorithm is about 100 Cluster Agents. The simulation takes only a few minutes, but it creates a huge message. This is simply caused by the number of agents participating in the system. This problem increases as new agents are added, but around 1Grid Agent, 100 Cluster Agents, 100 Migration Agents, 462 Node Agents, 462 LBC Agents, the system still works for the most part correctly.

The hierarchical load balancing model solves scalability problems: a larger number of agents in the hierarchy will lead to greater activity without the intervention of a central agent. In this model, the system adapts correctly.

4.3.2 Experimentation 2

In the second experimentation, the use of resources was the main focus (%). The number of clusters was assumed to be 14, and each cluster was considered to be composed of different numbers of resources. The number of jobs was 3000. Figure 10 shows the use of the cluster with and without ALBA algorithm.

Figure 10.

Comparison of cluster utilization (%) with and without Agent Based Load Balancing using 14 clusters.

Figure 10 shows that the agent based load balancing algorithm is more effective in maximizing resource utilization.

The ALBA algorithm allows job to be scattered over the most available resources when there was no appropriate resource, unlike other traditional algorithms that try to select the best resource that resembles the work requirements; otherwise, the job will remain in the global queue, indicating an underutilization of those resources.

4.4 Comparison with related works

In this section we will show the differences and similarities between our proposed model and the existing models in [5, 4, 9, 6, 10].

Table 2
Comparison between the proposed Agent Based Load Balancing model and other models

Agent based models	Heterogeneity of grid resources	Scalability	Centralised/ decentralised/hierarchical	Performance metrics improvement	Software	Drawback
ABLBM	Heterogeneous	High	Hierarchical	Resource Utilization, Scaling Properties, Response Time	Alea 2 simulator $+$ JADE	Communication Costs
AGLBM [5]	Heterogeneous	Low	Hierarchical	System Performance was not studied	JADE	Instability, Non-Scalability, No Fault Tolerance
dRAPM [10]	Heterogeneous	High	Decentralise-d	Scaling Properties, Latency Costs	MASON	Security, Information Gathering Overhead, No Fault Tolerance
ABRMM [9]	Heterogeneous	Medium	Hierarchical	Resource Availability	Simevents Matlab	Communication Costs
ACOM [4]	Heterogeneous	Low	Hierarchical	Execution Time	JADE	Migration Costs, No Fault Tolerance
PBFLBM [6]	Heterogeneous	Medium	Hierarchical	Response Time, Resource’s Capacity	Gridsim Simulator	Imprecision Non-reliability

In Table 2, we compared some current Agent Based Load Balancing Models for grid computing environments..

Because all existing models which we considered, have been proposed for grid computing environments. Therefore all of them have applied dynamic approach which is suitable for systems like grid computing that have dynamic, unpredictable and various workloads. In addition, most of the proposed models which we considered, have been designed in hierarchical and heterogeneous forms. So, we can say that all of them are not fault tolerant solutions and this would be a main drawback for most of them. Also we observed that all of these existing models in this study are scalable.

After reviewing the existing models comprehensively, it can be stated that different models have considered different metrics for evaluation. Some of the papers have considered a single performance metric such as [9, 4] while some have considered multiple objectives for metrics such as [6, 10]. Those existing models do have limitations that need to be addressed. Thus there is need of an algorithm which can offer maximum resource utilization, maximum throughput, minimum response time, dynamic resource scheduling with scalability and reliability. The proposed model address most above issues, it supports heterogeneity, scalability and dynamics of grids. Moreover, it decreases the average response time and maximizes resource utilization.

There have been a number of efforts at developing Agent based Load Balancing Models for grid computing. It is often difficult to make comparisons between different efforts because each Agent Based Load Balancing Model is usually developed for a particular system environment or particularly greedy application with different suppositions and restrictions.

Load balancing models are generally multi-objective such as enhancing performance and reducing operational cost. Therefore, an optimal trade-off between various contradictory objectives needs to be preserved.

5. Conclusion

The algorithms proposed under the Alea 2 simulator written in Java were developed to test and estimate the performance of the load balancing model based on the proposed agent. Experimental results show that the proposed model allows a better balance of load and the correct use of resources.

There are many opportunities to improve resource utilization and reduce response time through coordination and cooperation among agents. Therefore, the proposed model supports heterogeneity, scalability and dynamics of grids. In addition, a multi-agent architecture for grid load balancing was suggested, as well as a job migration technique to reduce the difference between overloaded and underloaded nodes. Finally, to estimate node load, the combination of CPU usage, memory usage and queue length was applied.

The proposed model ABLBM has some unique features:

•

It is hierarchical, which facilitates the circulation of information through the tree and defines the flow of messages in the proposed model.

•

The model also provides fault tolerance through its self-configurability. Nodes leaving the system will be replaced by new nodes, also the tasks will be reallocated to other nodes when a node leaves without finishing its current task.

•

Furthermore, it supports the scalability of grids: insertion or elimination entities (nodes or clusters) are very simple operations in the proposed model (insertion or elimination nodes, subtrees.

•

In addition, the proposed model considers the heterogeneity of resources; thus, it is completely independent of any physical network architecture.

•

The proposed model considers the heterogeneity of resources so that it is completely independent of any physical network architecture.

•

It supports flexibility and expandability: various intelligent agents have been deployed to reduce system complexity by modularization. Moreover, it is easy to modify its components, and add more features and functions to it.

However, the problems of the model implemented, there is no certainty that migrating jobs will resume in the reception node. The sender node does not keep a copy of the job until it is left at its new receiver node. Other solutions must be found to offer more reliability for migrating jobs. Moreover, the time required to complete a migration process is not explicitly calculated.

Thus, based on the promising results of these experiments, the authors aim to compare the results obtained with other agent-based load-balancing algorithms in terms of performance metrics. Therefore, the authors will consider the cost of negotiation between agents. The authors will also use a mobile agent for load balancing and improve and use the proposed model in real grid environments

Footnotes

Authors’ Bios

Ali Wided is a Ph.D. student at the Faculty of computer science, university of biskra, Algeria. She received her master’s degree in Computer Systems Engineering (2011) from University of tebessa, Algeria, She worked as an associate professor at computer science department of tebessa university. Her main research interest include load balancing, grid computing and multi agents systems.

Pr. Okba Kazar was born at Biskra (Algeria) in 1962. He obtain his magister degree in 1997 from the Constantine University (Algeria) working on artificial intelligence field. He obtained his PhD degree from the same university in 2005. He is member of editorial board of some Journals. He is author of some publication in international journals and communication in international conference. He participate as session chair in international conferences, he also publish a book “Manual d’Intelligence artificielle”. He is interested to the multi-agents systems and their applications, PHM in medical and industrial fields, ERP, advanced information systems, Web services, semantic Web, bigdata, internet of things and cloud computing. Actually, Okba KAZAR is full professor at computer science department of Biskra University and director of intelligent computer science laboratory (LINFI).

References

Yagoubi

and Slimani

, Load Balancing Strategy For Grid Computing, Journal of Information Technology and Applications 1(4) (March 2007), 285–296.

Yagoubi

and Meddeber

, Distributed load balancing model for grid computing, ARIMA Journal 12 (2010), 43–60.

Klusáček

and Rudová

, Alea 2 job scheduling simulator, In Proceedings of the 3rd International Conference on Simulation Tools and Techniques, 2010, Torremolinos, MalaMA, Spain.

younes

et al., New load balancing Framework based on mobile AGENT and ant-colony optimization technique, Conference Intelligent Systems and Computer Vision (ISCV), 2017, Fès, morocco.

Satymbekov

M.N.

Pak

I.T.

Naizabayeva

and Nurzhanov

Ch.A.

, Multi-agent grid system Agent-GRID with dynamic load balancing of cluster nodes, Open Engineering 7(1) (December 2017), 485–490.

Rathore

, Efficient agent based priority scheduling and load balancing using fuzzy logic in grid computing, I-Manager’s Journal on Computer Science 3(3) (2015), 11–22.

Rathore

and Chana

, Job Migration Policies for Grid Environment, Wireless Personal Communication, Springer Publication-New-York (USA) 89(1) (March 2016), 241–269.

Buyya

and Murshed

, Gridsim: A toolkit for the modeling and simulation of distributed resource management and scheduling for grid computing, The Journal of Concurrency and Computation: Practice and Experience (CCPE) 14 (2002), 13–15.

Suros

and Serrano

J.F.

, Communication complexity in high-speed distributed computer network in an agent based architecture for grids service Ray Tracing View project, International Journal of Advanced Computer Research, 22 February 2018, Vol 8(35) ISSN (Print), 2249–7277.

10.

Banerjee

and Hecker

, Multi-agent system approach to load-balancing and resource allocation for distributed computing, First Complex Systems Digital Campus World E-Conference, 2015.

11.

http://www.cs.huji.ac.il/labs/parallel/workload/l_metacentrum/.