Multi-objective hybrid optimized task scheduling in cloud computing under big data perspective

Abstract

The emerging paradigm of cloud computing offers customers various options for task computation based on their needs and preferences. Customers receive services from cloud computing systems as a utility, and they expect low-cost service availability and minimal task completion times. To satisfy clients, the service provider must schedule jobs to the right resources when the cloud server receives many user requests. The rapid growth in data volume necessitates processing petabytes of data each day. This data, which may be unstructured, semi-structured, or structured, is growing rapidly in both volume and availability, and it must be processed appropriately to support correct and timely decisions. In this research, we present BWUJS (Black Widow Updated Jellyfish Search), a multi-objective hybrid optimization-based task scheduling algorithm. This work considers task generation from the big data perspective. Tasks are clustered via the MapReduce framework with an improved K-means clustering model. After task clustering, the task priority is estimated. Finally, scheduling is performed via BWUJS subject to constraints on priority, makespan, completion time, resource utilization, and degree of imbalance.

Keywords

Task scheduling, big data, BWUJS, MapReduce, cloud computing

1. Introduction

Cloud computing [1, 2] is becoming increasingly common in business, academia, and society due to ubiquitous Internet connectivity and the growth of big data in volume, velocity, and variety. Big data is produced by a variety of applications, including Facebook, Google, open-source websites, scientific research, business software, cloud computing, IoT devices, e-government software, bio-medical software, and many more. Although such data requires more processing power and storage capacity than a typical organization possesses, it accurately reflects the information that has been systematically collected. Processing petabytes of data per day is necessary due to the rapid increase in data volume [3]. This data, which may be unstructured, semi-structured, or structured, is growing rapidly in both volume and availability, and it must be processed appropriately to support correct and timely decisions. As a result, processing this data and extracting the valuable insights it contains is crucial; this process is referred to as big data analytics.

As a business computing model, cloud computing [4, 5] offers three resource service types: PaaS, SaaS, and IaaS. Even though cloud computing supports numerous application programs and offers a variety of services, it still has resource and task management issues. Task management in particular strongly affects the relationship between user service quality and operating expenses, resource utilization rates, and cloud service stability. As a result, multi-objective task scheduling [6, 7] for cloud computing has considerable theoretical and practical significance. Resources and their loads have many dynamic and unpredictable components in the cloud computing environment. For example, the demand for resource nodes changes over time, and quarterly, yearly, and holiday variations also affect resource requests. These elements may contribute to resource waste and a decline in service quality. Resources [8, 9] are wasted if the load on cloud resources is too low; on the other hand, if the load is excessively high, the system’s service performance is compromised. Task scheduling in cloud computing has been studied to address these problems.

However, task scheduling is a significant issue in the cloud computing environment [10, 11] due to factors such as task completion time, the overall cost of finishing all users’ activities, power consumption, resource utilization, and fault tolerance. Although many task scheduling strategies, including CR-PSO, SARO, and EMVO, are designed with different optimization objectives, they frequently share common functional mechanisms and use similar software engineering architectures in practice. Nevertheless, each new scheduling capability must be added to every scheduling method individually, which is tedious, expensive, and error-prone. Since task scheduling is an NP-hard problem, it is not realistic to enumerate all feasible task-scheduling strategies and select the optimum one for a secure cloud [12]. A heuristic method can instead iterate and improve until it finds a solution that is very close to the optimal one, which makes heuristics one of the better approaches to this kind of problem. In order to provide optimal task scheduling under big data constraints in the cloud, we propose a novel algorithm. The main contributions of this research are given below:

1. The MapReduce framework is adopted to handle the big data, in which improved K-Means clustering is introduced for the clustering of tasks.
2. Optimal task scheduling is ensured by the introduced BWUJS optimization algorithm under a set of constraints including priority, makespan, completion time, resource utilization, and degree of imbalance.

The rest of the paper is organized as follows: Section 2 reviews the state-of-the-art models. Section 3 presents the system model for task scheduling in cloud computing. Section 4 describes the suggested task scheduling model in the cloud. Section 5 validates the proposed BWUJS. Section 6 concludes the paper.

2. Literature review

In 2021, Xueying Guo [13] suggested a fuzzy self-defence algorithm-based multi-objective task scheduling optimization model for cloud computing. The three main objectives of multi-objective task scheduling in the cloud computing platform were least time selection, degree of resource load balancing, and cost of performing the multi-objective work. The target function for multi-objective task scheduling in cloud computing platforms is evaluated using a theoretical framework that is developed.

In 2021, Kalka Dubey and S.C. Sharma [14] developed the CR-PSO method, a new hybrid task scheduling technique, for the distribution of numerous independent tasks across the available VM. By combining the optimal schedule series features in which the processing of tasks was done based on the deadline and demand, the traditional PSO and CRO were enhanced for quality improvement for factors including makespan, energy and cost.

In 2021, Reza NoorianTalouki et al. [15] presented a novel task priority strategy, together with task duplication methods, to solve the scheduling of dependent tasks on heterogeneous cloud computing platforms. The study employs the OCTd and OCTu processes to sort tasks into a list efficiently, and then applies the HEFT-duplication method to duplicate tasks, which greatly shortens the makespan.

In 2021, Wanneng Shu et al. [16] suggested a SARO method based on the data centre’s peak energy use and the duration of task scheduling. The PDF of the task request queue’s overflow was explored to analyze the SARO model, and the suggested technique may be used to request a break in order to prevent network congestion from the task failure rate perspective.

In 2020, Sarah E. Shukri et al. [17] suggested EMVO, an improved task scheduler. The proposed EMVO was compared with the original PSO and MVO frameworks in cloud platforms. The findings demonstrate that EMVO performs much better than both the PSO and MVO algorithms at minimizing makespan and maximizing resource utilization. The literature uses a variety of meta-heuristic methods, including MVO and PSO, for task scheduling in cloud environments.

In 2020, Longxin Zhang et al. [18] suggested an EPRD technique to reduce task scheduling time for process applications with precedence restrictions while still adhering to the end-to-end deadline requirement. Two processes make up this algorithm: a task priority queue is first created. A VM is then assigned to tasks based on its relative distance. The scheduling and VM utilization performance can be greatly enhanced using the suggested approach.

In 2020, Ding Ding et al. [19] suggested a QEEC with two stages: The M/M/S queuing technique, which assigns the incoming user requests to each server in a cloud, is implemented in the initial phase using a central task dispatcher. The second step begins with each server’s Q-learning assisted scheduler prioritizing all requests based on laxity and lifetime of the tasks. Then, it uses a policy that is always being updated to assign tasks to the VM, delivering incentives to encourage the assignments that can speed up task responses and utilize more CPU power on each server.

In 2020, S. Velliangiri et al. [20] developed HESGA to enhance task scheduling activity under the consideration of factors like load balancing, makespan, cost of multi-cloud, and resource utilization. The suggested approach combines the benefits of an electro-search and a genetic algorithm. The superior global optimal solutions are produced by the electro-search approach, whereas the better local optimal solutions are produced by the genetic algorithm.

Table 1
Pros and cons of existing algorithms

Author [citation]	Methods	Pros	Cons
Xueying Guo [13]	Fuzzy self-defence model	Improved completion time, resource utilization of VM, and deadline violation rate	Adaptable assignment planning problem
Kalka Dubey and S.C. Sharma [14]	CR-PSO	Cost-effective and less energy consumption	High power consumption, high turnaround time
Reza NoorianTalouki et al. [15]	OCTd, OCTu	Reduced makespan, superior speedup and SLR	Power management is challenging
Wanneng Shu et al. [16]	SARO	Improved throughput and network congestion are avoided	Optimal resource usage is challenging
Sarah E. Shukri et al. [17]	EMVO	Minimum makespan time and improved resource utilization	Premature convergence and lack of diversity
Longxin Zhang et al. [18]	EPRD	Minimized parallel task completion time	Enhancing security and energy saving was challenging
Ding Ding et al. [19]	QEEC	Energy-efficient and minimized task response time	With a number of service nodes, evaluating QEEC in a large-scale cloud environment was challenging
S. Velliangiri et al. [20]	HESGA	Provides good local optimal solutions	Less energy efficiency and less degree of imbalance

2.1 Problem statement

Users like the cloud computing model because of its powerful processing capacity and practical services. Research on cloud data centre scheduling in a multi-cloud setting is currently focused on the difficulties posed by peak business demand. The task scheduling techniques reviewed in this research (Table 1) face the following difficulties. In the fuzzy self-defence model [13], the adaptable assignment planning problem was very challenging to handle. CR-PSO [14] suffered from high power consumption and high turnaround time. In the OCTd and OCTu model [15], power management was challenging. Optimal resource usage was challenging when employing the SARO model [16]. Premature convergence and lack of diversity were aggravated when using the EMVO model [17]. Enhancing security and energy saving was challenging in the EPRD method [18]. Evaluating QEEC [19] in a large-scale cloud environment with a number of service nodes was challenging. The HESGA model [20] faced lower energy efficiency and a lower degree of imbalance.

3. Task scheduling in cloud computing: System model

Figure 1.

Design of the task scheduling in cloud computing.

Let us consider the cloud $C$ with users $U_{i},\,i=\left\{1,2,\ldots,N\right\}$, where $N$ denotes the total number of users. The tasks from the users $U_{i}$ are denoted as $T_{j},\,j=\left\{1,2,\ldots,S\right\}$, where $S$ denotes the total number of tasks. The goal is to schedule the tasks to the resources, i.e., the available physical machines $\left\{\textit{pm}_{1},\textit{pm}_{2},\ldots,\textit{pm}_{M}\right\}$ and virtual machines $\left\{\textit{vm}_{1},\textit{vm}_{2},\ldots,\textit{vm}_{N}\right\}$. The scheduling of tasks $T_{j}$ is based on the priority defined in this model. Because handling big data is a current concern, this work focuses on task scheduling from the big data perspective. The task scheduling layout in cloud computing is shown in Fig. 1.

4. Proposed task scheduling model in the cloud

Figure 2.

MapReduce frameworks for task clustering.

The proposed task scheduling is carried out in a manner that minimizes the completion time while managing the big data. Considering the dataset $D$ with one million records (tasks), each task $T_{j}$ is created and assigned a priority. Scheduling is performed by the new algorithm, termed BWUJS, under constraints on priority, makespan, completion time, resource utilization, and degree of imbalance. To handle the big data, the MapReduce framework (illustrated in Fig. 2) is incorporated. The steps followed in the proposed task scheduling are given below:

• Task generation
• Task clustering
• Estimation of task priority
• Scheduling

4.1 Task generation

Initially, the tasks $T_{j}$ are generated by taking big data with one million records (tasks) as the dataset. The number of physical machines $\textit{pm}_{M}$ is set to 10, and the number of virtual machines $\textit{vm}_{N}$ is varied over 1000, 2000, 3000, 4000, and 5000.

4.2 Task clustering

In this phase, task $T_{j}$ clustering is done via the MapReduce framework with an improved K-Means clustering algorithm [21]. Figure 2 represents the MapReduce framework for task clustering.

4.2.1 Mapper phase (Improved K-Means clustering)

In the mapper phase, a pair (key, values) is created by the map function in which the cluster member is considered as key and the task identification number is considered as value. The primary objective of the improved K-Means algorithm is to iteratively cluster the points while minimizing the total distance between each point and its associated cluster centroid. Following are the steps for this clustering process:

Step 1: The first $K$ objects are selected as the initial cluster centres. The Euclidean distance between each cluster centre $m_{i}$ and each task $T_{j}$ is then computed, and every task $T_{j}$ is assigned to the group of its nearest centre, as defined in Eq. (1).

$\displaystyle S_{i}^{t}=\left\{T_{j}:\left\|T_{j}-m_{i}^{t}\right\|^{2}\leqslant\left\|T_{j}-m_{k}^{t}\right\|^{2}\;\forall k,\;1\leqslant k\leqslant K\right\}$ (1)

Step 2: Re-compute the centre $m_{i}$ of each newly formed cluster as in Eq. (2), and re-divide the tasks according to the new centres in the same way as before. As per the proposed logic, the new cluster centre is improved as $\textit{Im}_{i}^{t+1}=\textit{Con}^{\textit{Mean}}\left(C_{j}\right)$, where $C_{j}$ are the elements of cluster $S_{i}^{t}$ and $\textit{Con}^{\textit{Mean}}$ is the contra-harmonic mean, which is complementary to the harmonic mean and is calculated as in Eq. (3).

$\displaystyle m_{i}^{t+1}=\frac{1}{\left|S_{i}^{t}\right|}\sum\limits_{C_{j}\in S_{i}^{t}}C_{j}$ (2)

$\displaystyle\textit{Im}_{i}^{t+1}=\frac{\frac{1}{\left|S_{i}^{t}\right|}\sum\limits_{C_{j}\in S_{i}^{t}}\left(C_{j}\right)^{2}}{\frac{1}{\left|S_{i}^{t}\right|}\sum\limits_{C_{j}\in S_{i}^{t}}C_{j}}$ (3)

Step 3: The iterative process ends when the cluster centres stop changing.

Three standards for algorithm termination

Standard 1:

The newly created cluster centroids remain unchanged. If the centroids of all clusters stay the same even after numerous iterations, the model is not learning any new patterns, and training should be terminated.

Standard 2:

The points remain in the same cluster. If no point changes cluster even after several iterations, training should be terminated.

Standard 3:

The specified iteration count has been reached. For example, if the iteration count is set to 200, the procedure runs 200 times before halting.
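The clustering steps and termination standards above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `contraharmonic_mean` and `improved_kmeans` are hypothetical, tasks are assumed to be tuples of non-negative numeric features, the contra-harmonic centre of Eq. (3) is applied per feature, and an empty cluster simply keeps its previous centre.

```python
import math


def contraharmonic_mean(values):
    """Eq. (3): sum of squares divided by the plain sum of the values."""
    total = sum(values)
    if total == 0:
        return 0.0  # assumption: an all-zero feature keeps a zero centre
    return sum(v * v for v in values) / total


def improved_kmeans(tasks, k, max_iter=200):
    """Cluster task feature vectors; returns (centres, labels)."""
    centres = [list(t) for t in tasks[:k]]  # Step 1: first K objects
    labels = [-1] * len(tasks)
    for _ in range(max_iter):  # Standard 3: iteration cap
        # Eq. (1): assign every task to its nearest centre
        new_labels = [
            min(range(k), key=lambda i: math.dist(t, centres[i]))
            for t in tasks
        ]
        # Step 2 with Eq. (3): contra-harmonic centre per cluster and feature
        new_centres = []
        for i in range(k):
            members = [t for t, lab in zip(tasks, new_labels) if lab == i]
            if not members:
                new_centres.append(centres[i])  # assumption: keep old centre
            else:
                new_centres.append(
                    [contraharmonic_mean(col) for col in zip(*members)]
                )
        # Standards 1 and 2: centres and memberships both unchanged
        converged = new_centres == centres and new_labels == labels
        centres, labels = new_centres, new_labels
        if converged:
            break
    return centres, labels
```

Because the contra-harmonic mean weights larger values more heavily than the arithmetic mean of Eq. (2), the centres in this sketch are pulled toward the larger members of each cluster.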

4.2.2 Reducer phase

The output from the map phase, i.e., the (key, value) pairs, is given to the reducer phase. In the reducer phase, the pairs are grouped by key, so that each cluster contains a collection of task values.
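The mapper and reducer roles described above can be illustrated with plain Python functions. This is only a single-process sketch of the data flow, not a Hadoop job: the names `mapper` and `reducer`, the sample centres, and the task identifiers are all hypothetical.

```python
import math
from collections import defaultdict


def mapper(task_id, features, centres):
    """Emit a (key, value) pair: nearest cluster index and task id."""
    key = min(range(len(centres)),
              key=lambda i: math.dist(features, centres[i]))
    return key, task_id


def reducer(pairs):
    """Group the task ids emitted by the mapper under their cluster key."""
    clusters = defaultdict(list)
    for key, task_id in pairs:
        clusters[key].append(task_id)
    return dict(clusters)


# Hypothetical centres and tasks, used only to show the flow of pairs.
centres = [(1.0, 1.0), (10.0, 10.0)]
tasks = {101: (1.2, 0.8), 102: (9.8, 10.2), 103: (0.9, 1.1)}
pairs = [mapper(tid, feats, centres) for tid, feats in tasks.items()]
clusters = reducer(pairs)
```

In a real MapReduce deployment the grouping by key is performed by the framework's shuffle stage, so the reducer would receive one key with all of its values at a time.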

4.3 Estimation of task priority [22]

In this phase, the physical machine $\left(\textit{pm}_{M}\right)$ and virtual machine $\left(\textit{vm}_{N}\right)$ are considered for estimating the task priority defined in Eq. (5), where $T_{\textit{rc}}^{z}$ represents the task position in the physical machine $\textit{pm}_{M}$ and the cluster $C$, 3 represents the task resource parameter count, and $T_{44}$ is the overall task count. The decision matrix Dm in Eq. (4) defines the mapping between the set of possible criteria and the set of alternatives.

$\displaystyle\textit{Dm}=\begin{array}{c|cccc} & C_{1} & C_{2} & C_{3} & C_{4}\\ \hline \textit{pm}_{1} & T_{11} & T_{12} & T_{13} & T_{14}\\ \textit{pm}_{2} & T_{21} & T_{22} & T_{23} & T_{24}\\ \textit{pm}_{3} & T_{31} & T_{32} & T_{33} & T_{34}\\ \textit{pm}_{4} & T_{41} & T_{42} & T_{43} & T_{44}\end{array}$ (4)

$\displaystyle\textit{prio}_{i}=\frac{\sum\limits_{z\in M}T_{\textit{rc}}^{z}}{3\times T_{44}}\times 100\%$ (5)

4.4 Task scheduling

In cloud computing, task scheduling is the process of providing the best resources for task execution while taking a number of constraints into account, including priority $\left(\textit{pri}_{i}\right)$, makespan $\left(M_{s}\right)$, completion time $\left(\textit{CT}\right)$, resource utilization $\left(R\right)$, and degree of imbalance $\left(D_{I}\right)$. Ten clusters have been created here, and the corresponding virtual machine carries out the tasks of each cluster. The fitness function Fit is calculated as in Eq. (6), where $w_{1},\ldots,w_{5}$ are the weights of the individual objectives.

$\displaystyle\textit{Fit}=\left[\left(1-w_{1}\right)*\textit{prio}_{i}+w_{2}*\textit{CT}+w_{3}*M_{s}+w_{4}*R+w_{5}*D_{I}\right]$ (6)

4.4.1 Priority $\left(\textit{pri}_{i}\right)$

The priority of each cluster is calculated as in Eq. (7) using the cluster $C$ and the virtual machine $\left(\textit{vm}_{N}\right)$, where $T_{\textit{rc}}^{z}$ represents the task position in the virtual machine $\textit{vm}_{N}$ and the cluster $C$, 3 represents the task resource parameter count, and $T_{44}$ is the overall task count.

$\displaystyle\textit{pri}_{i}=\frac{\sum\limits_{z\in M}T_{\textit{rc}}^{z}}{3\times T_{44}}\times 100\%$ (7)

4.4.2 Completion time $\left(\textit{CT}\right)$

The completion time of a process is the moment when it has finished running. The wall time $W_{\textit{time}}^{i}$ of each task on its virtual machine is used to determine the completion time as defined in Eq. (8), where $n$ is the task count.

$\displaystyle\textit{CT}=\frac{\sum\limits_{i=1}^{n}W_{\textit{time}}^{i}}{n}$ (8)

4.4.3 Makespan $\left(M_{s}\right)$

Scheduling places a high priority on producing a timeline that limits the final completion time of the allocated work, also known as the makespan. It is calculated as the highest total wall time of the blocks in task $T_{j}$, as in Eq. (9).

$\displaystyle M_{s}\left(T_{j}\right)=\textit{Max}\left(\sum\limits_{B=1}^{N}W_{\textit{time}}\right)$ (9)

4.4.4 Resource utilization $\left(R\right)$

Resource utilization is taken as the total CPU usage needed to complete the specified tasks, calculated as in Eq. (10).

$\displaystyle R=\sum\textit{Count}_{\textit{CPU}\left(T_{1},T_{2},\ldots,T_{j}\right)}$ (10)

4.4.5 Degree of imbalance $\left(D_{I}\right)$

The degree of imbalance in Eq. (11) measures the imbalance among the virtual machines $\textit{vm}_{N}$, where $W_{t_{\max}}$, $W_{t_{\min}}$, and $W_{t_{\textit{avg}}}$ are the maximum, minimum, and average wall times, respectively.

$\displaystyle D_{I}=\frac{W_{t_{\max}}+W_{t_{\min}}}{W_{t_{\textit{avg}}}}$ (11)
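As a rough illustration of how Eqs. (6) and (8) to (11) fit together, the sketch below evaluates one candidate schedule. Several points are assumptions rather than details given in the text: the names `schedule_metrics` and `fitness` are hypothetical, the per-VM load is taken to be the summed wall time of the tasks assigned to that VM, the makespan is the largest such load, and the weights are free parameters. The degree of imbalance follows Eq. (11) exactly as printed (with a sum in the numerator); many scheduling papers use the difference between the maximum and minimum instead.

```python
def schedule_metrics(wall_time, cpu_count, assignment, n_vms):
    """Evaluate a schedule.

    wall_time[i], cpu_count[i]: per-task values; assignment[i]: VM index.
    """
    n = len(wall_time)
    ct = sum(wall_time) / n                    # Eq. (8): mean wall time
    load = [0.0] * n_vms                       # assumed per-VM summed wall time
    for w, vm in zip(wall_time, assignment):
        load[vm] += w
    makespan = max(load)                       # Eq. (9): largest per-VM total
    r = sum(cpu_count)                         # Eq. (10): total CPU count
    avg = sum(load) / n_vms
    di = (max(load) + min(load)) / avg         # Eq. (11) as printed
    return {"CT": ct, "Ms": makespan, "R": r, "DI": di}


def fitness(priority, metrics, weights):
    """Eq. (6): weighted sum of priority and the four schedule metrics."""
    w1, w2, w3, w4, w5 = weights
    return ((1 - w1) * priority + w2 * metrics["CT"] + w3 * metrics["Ms"]
            + w4 * metrics["R"] + w5 * metrics["DI"])
```

For example, `schedule_metrics([2.0, 4.0, 6.0], [1, 2, 1], [0, 1, 0], 2)` places the first and third tasks on VM 0 and the second on VM 1, and the resulting dictionary can be passed straight to `fitness` together with a priority value and five weights.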

4.4.6 BWUJS for optimal task scheduling

Figure 3.

Solution encoding of proposed BWUJS optimization model.

For optimal task scheduling, the virtual machine $\left({\textit{vm}_{N}}\right)$ assignment of the ten clusters is passed as the input solution (represented in Fig. 3) to the BWUJS optimization algorithm, which hybridizes JSO [23] and BWO [24]. The lower bound of $\left({\textit{vm}_{N}}\right)$ is set to one and the upper bound is set to the virtual machine variation. The distinctive mating habits of black widow spiders serve as the basis for BWUJS. Like other evolutionary methods, the BWUJS technique begins with an initial population of spiders, each representing a potential solution. These initial spiders attempt to reproduce in pairs to create the next generation. Either during or immediately after mating, the female black widow devours the male. She then releases the sperm stored in her spermathecae into the egg sacs. Within 11 days of being laid, spiderlings hatch from the egg sacs. They live together on the maternal web for a few days to a week, during which sibling cannibalism is observed. They then leave by being carried on the wind.

Initial population

To solve an optimization problem, the values of the problem variables must be arranged in a structure suited to the problem at hand. In BWUJS, every black widow spider encodes the values of the problem variables, and in this study the structure is an array. A widow is a $1\times N_{\textit{var}}$ array, $\textit{widow}=\left[x_{1},x_{2},\ldots,x_{N_{\textit{var}}}\right]$, that represents a solution to an $N_{\textit{var}}$-dimensional optimization problem. Each variable value $x_{1},x_{2},\ldots,x_{N_{\textit{var}}}$ is a floating-point number. The fitness of a widow is obtained by evaluating the fitness function $f$ at $\left[x_{1},x_{2},\ldots,x_{N_{\textit{var}}}\right]$, as in Eq. (12).

$\displaystyle\textit{fitness}=f\left({x_{1},x_{2},\ldots,x_{N_{\textit{var}}}}\right)$ (12)

Before the optimization starts, a candidate widow matrix of size $N_{\textit{pop}}\times N_{\textit{var}}$ is generated with the initial population of spiders. Parent pairs are then chosen at random to begin procreation through mating, during or immediately after which the female black widow consumes the male.

Proposed procreate

Because the parent pairs are independent of one another, they mate in parallel to reproduce the new generation, as in nature. In the real world, a single mating produces about 1000 eggs, but only some of the stronger spiderlings survive. In this process, offspring are produced using $\alpha$ in Eq. (13), where $\alpha$ is an array of random values with the same length as the widow array, $x_{1},x_{2}$ represent the parents, and $y_{1},y_{2}$ represent the offspring.

$\displaystyle\left\{\begin{array}{l}y_{1}=\alpha\times x_{1}+\left(1-\alpha\right)\times x_{2}\\ y_{2}=\alpha\times x_{2}+\left(1-\alpha\right)\times x_{1}\end{array}\right.$ (13)

As per the proposed logic, the update equation of the procreate process in Eq. (15) is obtained by combining the JSO position update equation (Eq. (14)) with the conventional procreate equation (Eq. (13)). In Eq. (14), $\beta$ is the distribution coefficient, $\mu$ is the mean position of the jellyfish, and $x^{*}$ is the current best jellyfish location.

$\displaystyle x_{i}\left(t+1\right)=x_{i}\left(t\right)+\textit{rand}\left(0,1\right)\times\left(x^{*}-\beta\times\textit{rand}\left(0,1\right)\times\mu\right)$ (14)

$\displaystyle\left\{\begin{array}{l}y_{1}=\frac{\alpha x_{1}+\left(1-\alpha\right)x_{2}+x+\textit{rand}\times\left(x^{*}-\beta\times\textit{rand}\times\mu\right)}{2}\\ y_{2}=\frac{\alpha x_{2}+\left(1-\alpha\right)x_{1}+x+\textit{rand}\times\left(x^{*}-\beta\times\textit{rand}\times\mu\right)}{2}\end{array}\right.$ (15)
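The hybrid procreate step of Eq. (15) can be sketched as follows. This is a hedged reading of the equation, not the authors' code: the function name `procreate` is hypothetical, the term $x$ is assumed to be the first parent for $y_{1}$ and the second parent for $y_{2}$, a fresh random $\alpha$ is drawn per variable, and the default `beta=3.0` is an assumed value for the distribution coefficient.

```python
import random


def procreate(x1, x2, x_best, mu, beta=3.0, rng=random):
    """Return two offspring of parents x1 and x2 following Eq. (15)."""
    y1, y2 = [], []
    for a, b, best, m in zip(x1, x2, x_best, mu):
        alpha = rng.random()
        bwo_1 = alpha * a + (1 - alpha) * b   # Eq. (13), first offspring
        bwo_2 = alpha * b + (1 - alpha) * a   # Eq. (13), second offspring
        # Eq. (14): JSO move of each parent toward the current best
        jso_1 = a + rng.random() * (best - beta * rng.random() * m)
        jso_2 = b + rng.random() * (best - beta * rng.random() * m)
        # Eq. (15): average of the BWO and JSO terms
        y1.append((bwo_1 + jso_1) / 2)
        y2.append((bwo_2 + jso_2) / 2)
    return y1, y2
```

Averaging the two terms keeps each offspring on the same scale as its parents while still biasing it toward the best solution found so far.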

This process is repeated $N_{\textit{var}}/2$ times, and the randomly chosen numbers should not be duplicated. The father and children are then placed in an array and sorted by fitness value; according to the cannibalism rating, some of the fittest individuals are then added to the newly generated population. These steps apply to every pair. The proposed algorithm is further improved by the Gaussian mutation in Eq. (16), which enhances local search capabilities such as direction and search range.

$\displaystyle\textit{Gauss}\left(x\right)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{\left(x-\alpha\right)^{2}}{2\sigma^{2}}\right)$ (16)

Here, $\alpha=\textit{Mean}\,\left({x\left(t\right)}\right)$ , $\sigma=\textit{SD}\,\left({x\left(t\right)}\right)$
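A small sketch of the Gaussian mutation may help make Eq. (16) concrete. It combines the density of Eq. (16) with the update $x=x\times\left(1-\textit{Gauss}\left(x\right)\right)$ that appears in the mutation step of Algorithm 1. The function names are hypothetical, the population standard deviation is an assumed choice for SD, and a constant vector (zero standard deviation) is left unchanged to avoid a division by zero.

```python
import math
import statistics


def gauss(x, mean, sd):
    """Eq. (16): Gaussian density with the given mean and standard deviation."""
    return (math.exp(-((x - mean) ** 2) / (2 * sd ** 2))
            / math.sqrt(2 * math.pi * sd ** 2))


def gaussian_mutation(solution):
    """Scale every variable by (1 - Gauss(x)), as in Algorithm 1."""
    mean = statistics.fmean(solution)
    sd = statistics.pstdev(solution)   # assumption: population SD
    if sd == 0:
        return list(solution)          # assumption: nothing to perturb
    return [x * (1 - gauss(x, mean, sd)) for x in solution]
```

Variables close to the mean of the solution receive the largest density value and are therefore changed the most, while variables far from the mean are left nearly untouched.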

Arithmetic crossover

As per the proposed concept, solutions are also generated using the arithmetic crossover in Eq. (17), which linearly combines two parent chromosomes, where $\delta$ is the offspring, $\gamma_{1},\gamma_{2}$ are parent 1 and parent 2, and $H\in\left({0,1}\right)$ is a random value. In this operator, two chromosomes are chosen randomly for the crossover process, and two offspring are generated by linearly combining them [25].

$\displaystyle\delta=H*\gamma_{1}+\left({1-H}\right)*\gamma_{2}$ (17)
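Eq. (17) translates almost directly into code. In the sketch below, the function name `arithmetic_crossover` is hypothetical, and the second offspring is assumed to be obtained by swapping the roles of the two parents, since Eq. (17) gives only one expression while the text mentions two offspring.

```python
import random


def arithmetic_crossover(parent1, parent2, h=None):
    """Eq. (17): linear combination of two parent chromosomes."""
    if h is None:
        h = random.random()  # random mixing value H
    child1 = [h * a + (1 - h) * b for a, b in zip(parent1, parent2)]
    # Assumption: the second offspring swaps the parents' roles.
    child2 = [h * b + (1 - h) * a for a, b in zip(parent1, parent2)]
    return child1, child2


offspring = arithmetic_crossover([0.0, 10.0], [10.0, 0.0], h=0.25)
```

With `h=0.25` the first offspring lies three quarters of the way toward the second parent, so every offspring stays inside the segment joining its parents and therefore inside the search bounds.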

Cannibalism

There are three different kinds of cannibalism. In the first, known as sexual cannibalism, a female black widow consumes her partner during or after mating. In this algorithm, males and females are distinguished by their fitness values. The second type of cannibalism involves stronger spiderlings eating their weaker siblings. This algorithm includes a cannibalism rating that determines how many survivors remain. There are sporadic reports of the third type of cannibalism, in which the young spiders devour their mother. The fitness value serves as a gauge of spiderling strength and weakness. Algorithm 1 shows the pseudo code of the suggested BWUJS algorithm.

Algorithm 1: Pseudo code of proposed BWUJS algorithm
Input: virtual machine $\left({\textit{vm}_{N}}\right)$, procreating rate, cannibalism rate and mutation rate
Output: Optimal solution for the objective function
// Initialization stage
Generate the initial population of BWUJS
Every pop (population) is a d-dimensional chromosome array for the d-dimensional problem
// Loop until the termination condition is met
    Calculate the reproduction count $\left(\textit{nr}\right)$ depending on the procreating rate
    Choose the best $\left(\textit{nr}\right)$ solutions from the population and store them in $\textit{pop}_{1}$
    for $i=1$ to nr do
        Choose two solutions randomly from $\textit{pop}_{1}$ as parents
        Generate $D$ children by using Eq. (15) as per BWUJS
        Destroy the father
        Destroy some children (newly obtained solutions) depending on the cannibalism rate
        Store the solutions that remain in $\textit{pop}_{2}$
    end for
    // Mutation
    Calculate the count of mutation children $\left(\textit{nm}\right)$ depending on the mutation rate
    for $i=1$ to nm do
        Choose solutions from $\textit{pop}_{1}$
        $x=x\times\left({1-\textit{Gauss}\left(x\right)}\right)$
        $x_{\textit{new}}=\textit{Arithmetic}\_\textit{crossover}\left(x\right)$
        Store new solutions in $\textit{pop}_{3}$
    end for
    Update $\textit{pop}=\textit{pop}_{2}+\textit{pop}_{3}$
Return the best solution from the pop
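To show how the steps of Algorithm 1 connect, the following Python sketch runs the whole loop on a generic minimization problem. It is a simplified interpretation under stated assumptions, not the authors' implementation: the function name `bwujs`, all default parameter values, the clipping to bounds, the truncation of the population back to `n_pop`, the choice of a random partner for the arithmetic crossover, and the use of the first and second parent as $x$ in Eq. (15) are assumptions. For the scheduling problem, the continuous values would additionally need rounding to VM indices.

```python
import math
import random
import statistics


def gauss(x, mean, sd):
    """Eq. (16): Gaussian density at x."""
    return (math.exp(-((x - mean) ** 2) / (2 * sd ** 2))
            / math.sqrt(2 * math.pi * sd ** 2))


def bwujs(fitness, dim, lower, upper, n_pop=20, n_iter=50, procreate_rate=0.6,
          cannibalism_rate=0.5, mutation_rate=0.4, beta=3.0, seed=0):
    """Minimise `fitness` over [lower, upper]^dim; return the best solution."""
    rng = random.Random(seed)

    def clip(vec):
        return [min(max(v, lower), upper) for v in vec]

    # Initialization stage
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_pop)]
    best = min(pop, key=fitness)
    nr = max(2, int(procreate_rate * n_pop))   # reproduction count
    nm = max(1, int(mutation_rate * n_pop))    # mutation count

    for _ in range(n_iter):
        pop.sort(key=fitness)
        pop1 = pop[:nr]                                     # best nr solutions
        mu = [statistics.fmean(col) for col in zip(*pop)]   # mean position
        pop2 = []
        for _ in range(nr):
            x1, x2 = rng.sample(pop1, 2)
            children = []
            for _ in range(max(1, dim // 2)):
                y1, y2 = [], []
                for a, b, xb, m in zip(x1, x2, best, mu):
                    alpha = rng.random()
                    drift = rng.random() * (xb - beta * rng.random() * m)
                    # Eq. (15): average of BWO procreation and JSO update
                    y1.append((alpha * a + (1 - alpha) * b + a + drift) / 2)
                    y2.append((alpha * b + (1 - alpha) * a + b + drift) / 2)
                children += [clip(y1), clip(y2)]
            # The father is discarded by not copying the parents to pop2.
            children.sort(key=fitness)
            keep = max(1, int((1 - cannibalism_rate) * len(children)))
            pop2.extend(children[:keep])       # cannibalism among siblings

        pop3 = []
        for _ in range(nm):
            x, partner = rng.sample(pop1, 2)
            mean, sd = statistics.fmean(x), statistics.pstdev(x)
            if sd > 0:                         # Gaussian mutation step
                x = [v * (1 - gauss(v, mean, sd)) for v in x]
            h = rng.random()                   # Eq. (17): arithmetic crossover
            pop3.append(clip([h * v + (1 - h) * p
                              for v, p in zip(x, partner)]))

        pop = sorted(pop2 + pop3, key=fitness)[:n_pop]
        if fitness(pop[0]) < fitness(best):
            best = pop[0]
    return best
```

A call such as `bwujs(lambda x: sum(v * v for v in x), dim=4, lower=-5.0, upper=5.0)` exercises the loop on a simple sphere function; in the scheduling setting, `fitness` would instead evaluate Eq. (6) for the VM assignment encoded by the solution.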

Table 2

Constraints in database

S. No.	Criterion	Data type
1	PM count	Original data
2	Task count	Original data
3	PM cost	Original data
4	Wall time	Original data
5	Security	Synthetic data
6	CPU count	Synthetic data

Figure 4.

Evaluation on BWUJS and the existing models for Big data task scheduling in cloud framework a) Completion time b) Degree of imbalance c) Fitness.

5. Results and discussion

5.1 Simulation procedure

The suggested big data task scheduling in the cloud architecture was implemented in Python. The dataset was collected from [26]. The effectiveness of the BWUJS model was compared with traditional schemes, namely PSO-BASED SCHEDULER [27], EBBO [28], BOA, WOA, SALPSO, BES, AOA and BWO. The evaluation was carried out with regard to completion time, degree of imbalance, fitness, makespan, priority and resource utilization by varying the number of virtual machines (VMs) from 1000 to 5000.

5.2 Dataset description

Table 2 lists the criteria of the PM database: PM count, task count, PM cost, wall time, security, and CPU count.

The VM database is constructed with the help of the PM database and includes the same criteria as the PM database.

5.3 Assessment on BWUJS and the conventional methods with respect to Completion Time, Degree of Imbalance and Fitness for big data task scheduling in cloud framework

The performance of BWUJS relative to the prior models (PSO-based scheduler, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO) in terms of completion time, degree of imbalance and fitness is shown in Fig. 4(a), 4(b) and 4(c). The evaluation is carried out for a varied number of VMs. For optimal scheduling of tasks, the model should attain the lowest completion time, degree of imbalance and fitness rate. For instance, the completion time of BWUJS with 1000 VMs is 3.8743, which is considerably lower than that of the conventional strategies: PSO-BASED SCHEDULER $=$ 7.8621, EBBO $=$ 9.3708, BOA $=$ 5.9836, WOA $=$ 6.2854, SALPSO $=$ 8.3621, BES $=$ 7.8659, AOA $=$ 8.6832 and BWO $=$ 7.4529. Regarding the degree of imbalance, BWUJS achieved the lowest value for almost all VM counts. In particular, with 4000 VMs, BWUJS offered a lower degree of imbalance (0.6319) than the PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO.

The highest fitness rate, scored by the PSO-BASED SCHEDULER, is 16.9783 for 5000 VMs, whereas the lowest fitness rate, attained by BWUJS, is 4.6872. BWUJS likewise recorded a lower fitness value than the other conventional strategies. The low completion time, degree of imbalance and fitness rate are therefore attributed to the new optimal task scheduling process introduced in the cloud via the BWUJS optimization strategy.

5.4 Assessment on BWUJS and the conventional methods with respect to Makespan, Resource Utilization and Priority for big data task scheduling in the cloud framework

Figure 5.

Examination on BWUJS and the traditional methods for Big data task scheduling in cloud framework a) Makespan b) Resource utilization and c) Priority.

Figure 5(a), 5(b) and 5(c) show the analysis of makespan, resource utilization and priority of BWUJS against the PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO for big data task scheduling in the cloud framework. The model should achieve minimal makespan and resource utilization together with a higher priority. The makespan of BWUJS with 3000 VMs is 6.38, which is superior to PSO-BASED SCHEDULER (11.2693), EBBO (11.6578), BOA (9.8541), WOA (10.2463), SALPSO (10.5790), BES (11.8968) and BWO (19.9753). Furthermore, with 4000 VMs, BWUJS utilized fewer resources than the conventional methods (PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO). Finally, as Fig. 5(c) shows, BWUJS scored a higher priority rate with 4000 VMs than with 1000 VMs, and it produced better results than the other schemes for all VM counts. In particular, the greatest priority, obtained with 5000 VMs, is 13.9846, whilst the conventional strategies hold lower priority ratings. Hence, the hybrid optimization strategy gives BWUJS an advantage for big data task scheduling in the cloud framework.

5.5 Convergence study on BWUJS and the conventional methods for big data task scheduling in cloud framework

Figure 6.

(a) Convergence analysis on BWUJS and the conventional methods for Big data task scheduling in cloud framework (b) local optima versus global optima.

The convergence of the BWUJS is assessed against the PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO for big data task scheduling in the cloud architecture, as shown in Fig. 6. A model with a lower cost value and faster convergence provides more efficient task scheduling. The BWUJS attained the lowest cost among the compared schemes from the initial to the final iterations, while the AOA approach performed poorly. Specifically, the BWUJS obtained a reduced cost (14.0758) from iteration 5 to 25, whereas the PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO recorded higher costs. Performing task scheduling in the cloud with a hybrid optimization algorithm that combines BWO and JS has thus lowered the cost.
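A convergence curve of this kind is usually obtained by tracking the best cost found so far after each iteration. The sketch below shows that bookkeeping only; the per-iteration costs are made-up numbers, and both helper names are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def convergence_curve(costs_per_iteration: Sequence[float]) -> List[float]:
    """Best (lowest) cost seen up to and including each iteration."""
    return list(accumulate(costs_per_iteration, min))


def iterations_to_reach(curve: Sequence[float], target: float) -> Optional[int]:
    """First 1-based iteration whose best-so-far cost is <= target, else None."""
    for iteration, cost in enumerate(curve, start=1):
        if cost <= target:
            return iteration
    return None


# Hypothetical best cost of the population in each iteration.
raw_costs = [20.0, 18.5, 19.0, 15.2, 15.9, 14.1]
curve = convergence_curve(raw_costs)
print(curve)
print(iterations_to_reach(curve, 16.0))
```

Comparing two optimizers then amounts to comparing the final value of their curves (solution quality) and the iteration at which each curve first reaches a common target (convergence speed).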

In Fig. 6(b), all IML optimization algorithms converge toward local optima rather than global optima. The highest local maximum of an objective function is referred to as the global maximum, while its smallest local minimum is known as the global minimum.
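The distinction can be illustrated on a sampled one-dimensional objective. In the sketch below, a greedy descent that only moves to a strictly better neighbour stops at whichever local minimum is nearest to its starting point, while the global minimum is the lowest of all local minima. The sample values and function names are hypothetical, and the greedy descent stands in for any purely exploitative search rather than for one of the compared algorithms.

```python
from typing import List, Sequence


def local_minima(values: Sequence[float]) -> List[int]:
    """Indices of interior samples strictly lower than both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


def greedy_descent(values: Sequence[float], start: int) -> int:
    """Move to the lowest neighbour until no neighbour improves the value."""
    current = start
    while True:
        neighbours = [j for j in (current - 1, current + 1)
                      if 0 <= j < len(values)]
        best = min(neighbours, key=lambda j: values[j])
        if values[best] >= values[current]:
            return current
        current = best


# Hypothetical objective sampled at seven points.
samples = [5.0, 3.0, 4.0, 1.0, 2.0, 2.5, 6.0]
minima = local_minima(samples)
global_min = min(minima, key=lambda i: samples[i])
print(minima, global_min)
print(greedy_descent(samples, start=0), greedy_descent(samples, start=6))
```

Started from the left end, the descent is trapped in the shallower minimum, which is the behaviour that exploration mechanisms in metaheuristics are meant to avoid.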

Table 3

Statistical analysis on BWUJS and the conventional methods for optimal task scheduling in the cloud with respect to completion time and degree of imbalance

Completion time
Statistical measures	Best	Mean	Worst	Standard deviation	Median
PSO-BASED SCHEDULER	7.4498	6.0900	7.1940	1.5518	2.4080
EBBO	8.0675	5.6396	8.8974	2.0012	4.0049
BOA	7.3591	5.5924	7.0424	1.3227	1.7494
WOA	7.9761	6.1655	8.0645	1.2787	1.6350
SALPSO	8.2732	6.5121	8.4642	0.9372	0.8784
BES	6.4240	5.6692	6.1924	0.6244	0.3898
AOA	7.6181	5.8169	6.5117	1.8967	3.5976
BWO	6.8939	5.7290	6.0366	1.5449	2.3867
BWUJS	3.7031	3.6888	3.7082	0.0084	0.0001
Degree of imbalance
Statistical measures	Best	Mean	Worst	Standard deviation	Median
PSO-BASED SCHEDULER	1.3240	1.0557	1.2138	0.2600	0.0676
EBBO	1.5690	1.2690	1.5224	0.2837	0.0805
BOA	1.2231	1.0748	1.2032	0.1062	0.0113
WOA	1.3531	1.0956	1.1170	0.3043	0.0926
SALPSO	1.4220	1.1388	1.4388	0.2126	0.0452
BES	1.3361	1.0583	1.2094	0.2856	0.0816
AOA	1.3407	1.1476	1.2780	0.1926	0.0371
BWO	1.3676	1.1882	1.3576	0.1317	0.0173
BWUJS	0.6974	0.6963	0.6968	0.0010	0.0000

Table 4

Statistical evaluation on BWUJS and the conventional methods for optimal task scheduling in the cloud with respect to fitness, makespan

Fitness
Statistical measures	Best	Mean	Worst	Standard deviation	Median
PSO-BASED SCHEDULER	12.1734	8.5681	11.0731	3.5146	12.3527
EBBO	10.9179	9.3007	10.5062	1.4826	2.1980
BOA	10.8475	8.5471	11.4523	1.8199	3.3123
WOA	8.3917	7.6379	8.19891	0.5608	0.3145
SALPSO	11.6136	8.1030	11.8151	2.3866	5.6962
BES	11.4501	8.5581	11.9556	1.4867	2.2105
AOA	13.5716	8.4337	16.1242	3.7896	14.3611
BWO	9.7471	8.0061	9.37283	1.3215	1.7465
BWUJS	4.9262	4.8720	4.9315	0.0324	0.0011
Makespan
Statistical measures	Best	Mean	Worst	Standard deviation	Median
PSO-BASED SCHEDULER	12.9123	10.8681	12.1368	1.9452	3.7839
EBBO	14.0861	11.8051	13.2762	2.6145	6.8356
BOA	13.6597	10.3415	11.8808	3.2957	10.8615
WOA	13.9786	10.8139	12.9436	2.8012	7.8466
SALPSO	13.1801	11.3555	11.5554	3.1298	9.7956
BES	12.5353	10.2616	13.0565	1.1768	1.3847
AOA	12.9564	10.1198	13.3702	2.5193	6.3466
BWO	13.8799	10.1757	10.7812	4.1351	17.0990
BWUJS	6.6487	6.62661	6.6297	0.0359	0.0013

Table 5

Statistical analysis on BWUJS and the conventional methods for optimal task scheduling in the cloud with respect to priority and resource utilization

Priority
Statistical measures	Best	Mean	Worst	Standard deviation	Median
PSO-BASED SCHEDULER	4.9201	1.2671	5.4406	2.8470	8.1052
EBBO	4.1220	1.5917	3.0377	2.3477	5.5118
BOA	4.2290	1.2705	3.0490	2.6487	7.0155
WOA	1.5465	1.0027	1.4202	0.4527	0.2049
SALPSO	4.7480	1.1986	4.9551	2.4059	5.7885
BES	1.6840	1.2833	1.6398	0.3141	0.0987
AOA	3.6722	1.0650	3.5133	1.8666	3.4843
BWO	3.6613	1.4913	3.9552	1.7954	3.2233
BWUJS	8.4283	2.8147	8.4562	3.9614	15.6928
Resource utilization
Statistical measures	Best	Mean	Worst	Standard deviation	Median
PSO-BASED SCHEDULER	9.0266	6.0067	10.4649	2.3342	5.4487
EBBO	7.9711	5.9652	8.2592	1.2991	1.6877
BOA	8.1659	6.8553	7.5060	1.4515	2.1068
WOA	8.6636	6.0895	8.8755	1.6245	2.6389
SALPSO	8.9706	7.5577	9.0519	1.1073	1.2262
BES	8.5948	6.8980	9.1718	1.2562	1.5781
AOA	7.4832	6.0771	6.6567	1.8747	3.5147
BWO	9.3886	7.4962	8.7452	1.6089	2.5887
BWUJS	3.8426	3.6915	3.8780	0.0756	0.0057

5.6 Statistical evaluation on BWUJS and the traditional methods for optimal task scheduling in the cloud with regard to completion time, degree of imbalance, fitness, makespan, priority and resource utilization

Table 6
Analysis on Wilcoxon and Friedman values

Methods	Friedman $p$ value	Wilcoxon $p$ value
PSO-BASED SCHEDULER	0.000912	7.63E-06
EBBO	4.54E-05	6.10E-05
BOA	0.000123	3.05E-05
WOA	6.14E-06	0.000244
SALPSO	1.67E-05	0.000122
BES	8.32E-07	0.000977
AOA	2.26E-06	0.000488
BWO	4.54E-05	6.10E-05
BWUJS	0.367879	1.19E-07
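For readers who want to reproduce a paired comparison of this kind, the sketch below computes an exact two-sided Wilcoxon signed-rank $p$ value by enumerating the null distribution of the positive-rank sum. It assumes no zero differences and no tied absolute differences. The function name and the paired samples are hypothetical, and the pairing used to obtain the values in Table 6 is not stated in this section.

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank_exact(x: Sequence[float],
                               y: Sequence[float]) -> Tuple[int, float]:
    """Return (W+, exact two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    if any(d == 0 for d in diffs) or len({abs(d) for d in diffs}) != n:
        raise ValueError("sketch assumes no zero or tied absolute differences")

    # Rank the absolute differences (1 = smallest), sum ranks of positive ones.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # counts[s] = number of the 2**n sign patterns whose positive-rank sum is s.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]

    # Two-sided p value: double the smaller tail, capped at 1.
    w = min(w_plus, max_sum - w_plus)
    p_value = min(1.0, 2.0 * sum(counts[:w + 1]) / 2 ** n)
    return w_plus, p_value


# Hypothetical paired costs of a baseline and a proposed scheduler over 5 runs.
baseline = [9.8, 10.2, 11.6, 12.1, 16.9]
proposed = [4.7, 4.8, 4.6, 4.9, 5.0]
print(wilcoxon_signed_rank_exact(baseline, proposed))
```

With every difference positive, only one sign pattern out of 2^5 is as extreme in each direction, so the smallest attainable two-sided $p$ value for five pairs is 2/32.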

Table 7

Analysis on $T$ test and $P$ test

Methods	$T$ test	$P$ test
PSO-BASED SCHEDULER	302.8655	2.01E-80
EBBO	291.3414	1.29E-79
BOA	315.9173	2.66E-81
WOA	981.9758	########
SALPSO	529.4619	4.60E-92
BES	252.1491	1.32E-76
AOA	5472.985	########
BWO	285.1624	3.62E-79
BWUJS	216.4254	2.01E-73

Tables 3, 4 and 5 summarize the statistical analysis of the BWUJS against the PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO in terms of completion time, degree of imbalance, fitness, makespan, priority and resource utilization. The BWUJS generated superior outcomes over the conventional approaches for efficient task scheduling. Regarding completion time, the BWUJS acquired a mean completion time of 3.6888, whereas the PSO-BASED SCHEDULER obtained 6.0900, EBBO 5.6396, BOA 5.5924, WOA 6.1655, SALPSO 6.5121, BES 5.6692, AOA 5.8169 and BWO 5.7290. Moreover, the priority of the BWUJS under the median statistical measure is 15.6928, while the PSO-BASED SCHEDULER, EBBO, BOA, WOA, SALPSO, BES, AOA and BWO obtained lower priority values. Under the worst statistical measure, the BWUJS scored Fitness $=$ 4.9315, Makespan $=$ 6.6297 and Resource Utilization $=$ 3.8780. Taken as a whole, the hybrid optimization method enables the suggested model to schedule the jobs in a more optimal manner.
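The five statistical measures can be computed from the objective values of repeated runs, as in the minimal sketch below. It assumes that "best" means the minimum for objectives to be minimized and the maximum for priority, and it uses the sample standard deviation; whether the reported figures use the sample or the population form is not stated. The function name and the run values are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float], minimize: bool = True) -> Dict[str, float]:
    """Best, mean, worst, standard deviation and median of repeated runs."""
    best, worst = (min, max) if minimize else (max, min)
    return {
        "best": best(values),
        "mean": statistics.mean(values),
        "worst": worst(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumed)
        "median": statistics.median(values),
    }


# Hypothetical completion times from five independent runs of one scheduler.
runs = [3.70, 3.69, 3.71, 3.68, 3.70]
summary = summarize(runs, minimize=True)
print(summary)
```

For an objective that should be maximized, such as priority, the same helper is called with `minimize=False` so that the largest value is reported as the best.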

5.7 Statistical test

The Friedman test is a non-parametric statistical test for the analysis of repeated-measures data. It is applied to within-subject designs and is an extension of the Wilcoxon signed-rank test. This type of analysis is employed for data with three or more correlated or repeated outcomes that are not normally distributed. Under the null hypothesis, the distribution remains the same across the series of measurements. Tables 6 and 7 report the Wilcoxon, Friedman, $P$ test and $T$ test values. Compared with the other methods, the suggested method yields better results.
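As an illustration of how such a test is evaluated, the sketch below computes the Friedman statistic from within-block ranks and converts it to a $p$ value with the usual chi-square approximation on k - 1 degrees of freedom. Ties receive average ranks, but the tie correction factor is omitted for brevity. All function names and the sample measurements are hypothetical, and the data layout behind Table 6 is not specified here.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(row: Sequence[float]) -> List[float]:
    """Ranks within one block (1 = smallest); ties share their average rank."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * pdf * total


def friedman_test(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman statistic and approximate p for n blocks x k treatments."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, chi2_sf(stat, k - 1)


# Hypothetical costs: rows are test settings, columns are three schedulers.
costs = [[4.7, 9.8, 16.9], [4.9, 10.1, 15.2], [5.0, 8.7, 14.3], [4.8, 9.9, 13.0]]
print(friedman_test(costs))
```

A small $p$ value indicates that at least one scheduler ranks consistently differently from the others across the settings; it does not by itself identify which one, which is why pairwise tests are normally reported alongside it.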

6. Conclusion

The cloud computing task scheduling system comprises three levels: task, scheduling, and VM. Because it directly affects cloud performance, task scheduling is one of the most significant issues in the cloud computing environment. In this work, we developed BWUJS, a multi-objective hybrid optimization-based task scheduling algorithm. Task generation from the big data perspective was the initial step of this research. The tasks were clustered via the MapReduce framework with an Improved K-means clustering model. After task clustering, task priority estimation was performed. Finally, scheduling was performed via BWUJS based on constraints such as priority, makespan, completion time, resource utilization, and degree of imbalance.

Footnotes

Nomenclature

Abbreviation	Description
OCTd	Optimistic Cost Table downward
PaaS	Platform as a Service
HEFT	Heterogeneous Earliest Finish Time
EMVO	Enhanced version of the Multi-Verse Optimizer
PSO	Particle Swarm Optimization
OCTu	Optimistic Cost Table upward
EPRD	Efficient Priority And Relative Distance
QEEC	Q-learning-based task scheduling framework for energy-efficient cloud computing
MVO	Multi-Verse Optimizer
IaaS	Infrastructure as a Service
SaaS	Software as a Service
HESGA	Hybrid Electro Search with a genetic algorithm
VM	Virtual Machine
CR-PSO	Chemical Reaction Optimization-Particle Swarm Optimization
CRO	Chemical Reaction Optimization
SLR	Schedule Length Ratio
SARO	Strong Agile Response Optimization
IT	Information Technology
IoT	Internet of Things
PDF	Probability Density Function
JSO	Jellyfish Search Optimizer
BWO	Black Widow Optimization
EBBO	Extended Biogeography-Based Optimization

References

1. Zhang, Han, Dong. OKCM: improving parallel task scheduling in high-performance computing systems using online learning. The Journal of Supercomputing. 2020. doi: 10.1007/s11227-020-03506-5.

2. Wei. Task scheduling optimization strategy using improved ant colony optimization algorithm in cloud computing. Journal of Ambient Intelligence and Humanized Computing. 2020. doi: 10.1007/s12652-020-02614-7.

3. Bai, Xie. The optimizing resource allocation and task scheduling based on cloud computing and Ant Colony Optimization Algorithm. Journal of Ambient Intelligence and Humanized Computing. 2021. doi: 10.1007/s12652-021-03445-w.

4. Sreenivasulu, Ilango. Hybrid optimization algorithm for task scheduling and virtual machine allocation in cloud computing. Evolutionary Intelligence. 2020. doi: 10.1007/s12065-020-00517-2.

5. Ashalatha, Jayashree, Patil Siddarama. Adaptive task scheduling method in multi-tenant cloud computing. Int J Inf Tecnol. 2019. doi: 10.1007/s41870-019-00389-5.

6. Laith, Muhammad. Amended hybrid multi-verse optimizer with genetic algorithm for solving task scheduling problem in cloud computing. The Journal of Supercomputing. 2021. doi: 10.1007/s11227-021-03915-0.

7. Kalka, Sharma. A hybrid multi-faceted task scheduling algorithm for cloud computing environment. Int J Syst Assur Eng Manag. 2021. doi: 10.1007/s13198-021-01084-0.

8. Sun, Wang. Task scheduling of cloud computing based on hybrid particle swarm algorithm and genetic algorithm. Cluster Computing. 2020. doi: 10.1007/s10586-020-03221-z.

9. Laith, Ali. A novel hybrid antlion optimization algorithm for multi-objective task scheduling problems in cloud computing environments. Cluster Computing. 2020. doi: 10.1007/s10586-020-03075-5.

10. Avinab, Sandeep, Parizi Reza, Kim-Kwang Raymond Choo, Zhiyong Liu. Classification-based and Energy-Efficient Dynamic Task Scheduling Scheme for Virtualized Cloud Data Center. IEEE Transactions on Cloud Computing. 1-1. doi: 10.1109/tcc.2019.2918226.

11. Chen, Zheng, Min. Computation Offloading and Task Scheduling for DNN-Based Applications in Cloud-Edge Computing. IEEE Access. 8: 115537-115547. doi: 10.1109/access.2020.3004509.

12. Chen, Cheng, Liu, Mao, John. A WOA-based optimization approach for task scheduling in cloud computing systems. IEEE Systems Journal. 14(3): 3117-3128. doi: 10.1109/jsyst.2019.2960088.

13. Guo. Multi-objective task scheduling optimization in cloud computing based on fuzzy self-defense algorithm. Alexandria Engineering Journal. 2021; 60.

14. Kalka, Sharma. A novel multi-objective CR-PSO task scheduling algorithm with deadline constraint in cloud computing. Sustainable Computing: Informatics and Systems. 2021; 32.

15. Reza, Mirsaeid Hosseini, Homayun. A heuristic-based task scheduling algorithm for scientific workflows in heterogeneous cloud computing platforms. Journal of King Saud University – Computer and Information Sciences. 2021.

16. Shu, Cai, Xiong. Research on strong agile response task scheduling optimization enhancement with optimal resource usage in green cloud computing. Future Generation Computer Systems. 2021; 124.

17. Shukri, Al-Sayyed, Hudaib, et al. Enhanced multi-verse optimizer for task scheduling in cloud computing environments. Expert Systems With Applications. 2020. doi: 10.1016/j.eswa.2020.114230.

18. Zhang, Zhou, Ahmad. Efficient scientific workflow scheduling for deadline-constrained parallel tasks in cloud computing environments. Information Sciences. 2020; 531.

19. Ding, Fan, Zhao, Kang, Yin, Zeng. Q-learning based dynamic task scheduling for energy-efficient cloud computing. Future Generation Computer Systems. 2020; 108.

20. Velliangiri, Karthikeyan, Arul Xavier, Baswaraj. Hybrid electro search with genetic algorithm for task scheduling in cloud computing. Ain Shams Engineering Journal. 2020.

21. Cui. Introduction to the K-means clustering algorithm based on the elbow method. Accounting, Auditing and Finance. 2020; 1: 5-8. doi: 10.23977/accaf.2020.010102.

22. Rjoub, Bentahar, Wahab. BigTrustScheduling: Trust-aware big data task scheduling approach in cloud computing environments. Future Generation Computer Systems. 2019. doi: 10.1016/j.future.2019.11.019.

23. Jui-Sheng, Dinh-Nhat. A novel metaheuristic optimizer inspired by behavior of jellyfish in ocean. Applied Mathematics and Computation. 2020; 389.

24. Vahideh, Ali Asghar Pourhaji. Black Widow Optimization Algorithm: A novel meta-heuristic approach for solving engineering optimization problems. Engineering Applications of Artificial Intelligence. 2020; 87.

25. Yılmaz KAYA, Murat UYAR, Ramazan TEKİN. A novel crossover operator for genetic algorithms: Ring crossover. Neural and Evolutionary Computing. arXiv:1105.0355. 2011. doi: 10.48550/arXiv.1105.0355.

26. https://www.kaggle.com/discdiver/clouds.

27. Huang, Chen. Task scheduling in cloud computing using particle swarm optimization with time varying inertia weight strategies. Cluster Computing. 2020; 23.

28. Xiao, Zhang, Zhuang. Game theory–based multi-task scheduling in cloud manufacturing using an extended biogeography-based optimization algorithm. Concurrent Engineering. 2019; 27.
