Abstract
Job scheduling has become one of the most challenging issues in the research field of data centers within the cloud computing sector. Green cloud computing is a paradigm that employs efficient techniques to significantly reduce carbon emissions, CPU frequency scaling, and overall energy consumption. The large volume of data processed necessitates the consideration of data center strategies to mitigate energy consumption, along with other factors such as environmental impact, operational cost, and system reliability. Thus, this research aims to develop a new job scheduling scheme in a big data-assisted green cloud environment by formulating a multi-objective optimization problem that considers makespan, power consumption, latency, and resource utilization. For handling big data, the MapReduce framework is employed. To implement the green cloud, a Hybrid Firefly Water Strider Optimization (HFWSO) algorithm is developed by combining the Firefly Algorithm (FFA) and Water Strider Algorithm (WSA). This algorithm optimizes the number of servers and jobs allocated based on data volume, ensuring efficient job distribution across servers. Task scheduling is performed to solve a multi-objective function under various constraints. The makespan of the developed model is 148.3 s, energy consumption is 6.72 kWh, and resource utilization is 76.50%. Thus, the findings demonstrate that it effectively minimizes the energy consumption in data centers and also appropriately allocates jobs to machines. These comparative findings also reinforce the suitability of HFWSO as a powerful metaheuristic for multi-objective optimization in energy-efficient cloud computing environments, making it a promising candidate for practical applications requiring balanced trade-offs among makespan, energy consumption, latency, and resource utilization.
Keywords
Introduction
The emergence of big data has significantly impacted recommendation models in business applications. Well-known frameworks such as Apache Spark and Apache Hadoop are widely employed to perform data analytics tasks that can handle enormous volumes of data in both industry and academia. 1 Since multiple jobs often require real-time access, efficient job scheduling becomes essential to enhance system performance and to optimize resource utilization. 2 As the term “big data” implies, it involves complex and large datasets, making it challenging to use conventional data processing or database management systems. Key challenges include data analysis, searching, capturing, transferring, visualization, storage, and data sharing. The advantage of possessing a large data source lies in the ability to gather critical information from numerous sources. Compared to individual datasets, big data supports various applications such as improving research quality, linking legal citations, estimating real-time traffic conditions, preventing diseases, identifying business trends, and solving crimes. To meet these demands, the primary factor in job scheduling is the average response time, defined as the time elapsed between job assignment and completion. 3 Recently, job scheduling has played a pivotal role in enhancing the performance of big data applications. 4 Some scheduling algorithms, like First-In-First-Out (FIFO) and fair scheduling, are limited in their performance improvements because job sizes vary widely. 5 Since jobs arrive continuously, scheduling must handle datasets of different sizes effectively. 6 From a system perspective, reducing job response times helps prioritize smaller jobs, minimizing delays caused by longer-running jobs. Similarly, from the user's perspective, lower response times lead to improved outcomes and satisfaction. 7
Cloud computing is a technology that enables both the delivery and access of cloud services via the Internet, as well as the use of hardware and software tools to provide these services in a data-centric manner, commonly referred to as Software as a Service (SaaS). 8 Cloud data centers differ significantly from conventional data centers. 9 Typically, cloud data centers consist of numerous network servers, along with networking subsystems, air-conditioning equipment, power distribution units, cooling infrastructure, and associated storage. However, these data centers consume substantial amounts of energy and produce significant carbon emissions because they operate hundreds or even thousands of network devices. 10 Another critical component is the network system in cloud computing, which contributes notably to overall power consumption. Both data and applications require reliable transfer and access through cloud resources over Internet-based services. 11 Therefore, a major challenge for traditional methodologies is maintaining energy efficiency while reducing network congestion and traffic within the cloud environment. 12
Generally, energy consumption is a major component of the operational costs of data centers. 13 Data centers consume a significant amount of energy, often measured in kilowatt-hours per year, and this consumption continues to rise. 14 Therefore, data centers must handle large tasks that involve various computational factors, such as communications, Central Processing Unit (CPU) usage, and memory demands. Depending on the nature of these tasks, each server may experience varying levels of energy consumption and different processing times. 15 To address this, data centers are actively working to reduce energy consumption, which has led to the emergence of green cloud computing as a significant research area. Green computing focuses on utilizing energy-efficient resources, including personal devices, to reduce power usage. The primary goals of reducing energy consumption include improving overall performance, extending battery and network lifetime, meeting performance requirements, lowering power consumption, optimizing resource utilization and profitability, and minimizing CO2 emissions. 16 With the increasing number of users, the volume of jobs that need to be scheduled also increases. Cloud servers typically operate one or more data centers to manage this workload. However, underutilized servers tend to waste energy unnecessarily. 17 Consequently, reducing the number of active servers is an effective approach for lowering energy consumption. To this end, various task scheduling methods have been designed and implemented to reduce energy usage by optimizing how tasks are allocated to servers through algorithmic procedures. Therefore, this paper proposes a scheduling technique designed to minimize energy consumption, network traffic, and congestion.
The primary contributions of this paper are as follows.
To design a job scheduling model based on a multi-objective function using big data within a green cloud framework, aimed at reducing energy consumption in cloud data centers.
To implement the MapReduce model for handling big data analytics in the green cloud environment, with the primary goal of reducing the dimensionality of large-scale datasets.
To develop a hybrid optimization algorithm, named HFWSO, which integrates the conventional WSA with the classical FFA to achieve optimal job allocation to respective servers or machines.
To formulate a multi-objective function optimized by the proposed HFWSO algorithm, incorporating various constraints such as energy consumption, queuing time, delay, number of active servers, and task completion time.
To analyze the performance of the proposed model through comprehensive validation, including comparative and statistical analyses against classical heuristic optimization methods under varying numbers of tasks and Virtual Machines (VMs).
The remainder of the paper is organized as follows. Section II reviews existing job scheduling work along with research gaps and open problems. Section III explores big data-assisted job scheduling in the green cloud, along with its system model. Section IV presents the architecture of the job scheduling process with the MapReduce framework for handling big data. Section V describes the novel HFWSO algorithm for optimizing task allocation and derives the multi-objective function. Section VI discusses the experimental results. Finally, Section VII concludes the paper.
Literature review
Existing works
Recently, numerous task scheduling approaches in cloud and fog computing models have been developed. However, energy efficiency, service quality, and scalability remain pressing challenges, and some of the models are explained in this section as follows:
Heuristic-based Methods: In 2019, You et al. 18 presented a fair scheduling process called Multi-resource Collapsed Hierarchies (MCH), which uses a hierarchical structure of a weighted flat tree. MCH ran within a feasible time frame. The simulations were performed using a Google cluster, and the results demonstrated that it achieved the least computation time. Though effective in minimizing computation time, MCH lacked flexibility in adapting to real-time workload variations. In 2018, Liu et al. 19 developed the Multiple Job Scheduling and Lightpath Provisioning (MJSLP) model for reducing task completion time. It was processed in two phases: first, it reduced the task completion time by applying a heuristic approach. The simulation results revealed that MJSLP outperformed existing methods in terms of slot utilization and completion time. While MJSLP improved slot utilization and reduced task completion time, its two-phase design was still rule-based and limited by the assumptions embedded in its heuristic formulations. To overcome these limitations, in 2018, Mao et al. 20 presented two approaches for task scheduling: a time-aware algorithm and an energy-aware algorithm. Since the key benefits of the two algorithms were complementary, they were integrated into a single model known as the Energy-Performance Trade-off Multi-resource Cloud Task Scheduling Algorithm (ETMCTSA). The enhanced model was implemented in MultiRECloudSim and compared against existing scheduling mechanisms. Owing to its parameter optimization, the suggested model demonstrated improved overall performance. However, although its energy consumption was relatively low, its processing time remained high. In 2016, Deng et al. 21 explored the EcoPower mechanism to schedule loads or jobs distributed across different cloud data centers. The simulation was carried out successfully, but the model faced complexity in addressing the Quality of Experience (QoE) holistically, particularly under heterogeneous and unpredictable loads.
Stochastic and Learning-based Methods: In 2020, Chen et al. 22 implemented a Quality of Service (QoS) model to estimate data center performance. In addition, a new model, Cross-Entropy-based Stochastic Scheduling (CESS), was deployed to optimally tune QoS parameters for all jobs within the cloud. The investigated work increased both the service rate and job arrival rate, providing assurance of improved scheduling performance. Yet, it raised concerns about convergence time and generalizability. Later, in 2021, Caviglione et al. 23 developed a Deep Reinforcement Learning (DRL) model to acquire the optimal solution for allocating jobs to respective machines. In contrast to other techniques, the optimization method achieved more accurate results and improved scheduling performance. However, it faced challenges with task security and data loss due to the high task density.
Recent Models: In 2025, Ergin et al. 26 developed a finite element-based model that helped reduce delay by eliminating stress concentration. Subsequently,
Thus, this work proposes an enhanced HFWSO model to address these gaps through a robust combination of the FFA and WSA, where FFA enhances local search and intensification, and WSA improves diversification and global search.
Problem statement
Existing job scheduling methods and their limitations are summarized in Table 1. The MCH 18 algorithm saves computational time, reduces total run time, and guarantees job separation through its hierarchical design. However, it requires more memory. The MJSLP algorithm 19 reduces job completion time and achieves superior efficiency in terms of average frequency slots and completion time. On the other hand, it consumes more power. The DRL method 23 efficiently handles workload fluctuations and provides a trade-off between costs, security requirements, and the performance of the data center. Conversely, it suffers from user mobility issues. ETMCTSA 20 offers a superior balance between efficiency management and performance, and reduces the energy consumption of VMs through optimized allocation. On the contrary, it does not focus on solving the workload imbalance.
Features and challenges of existing job scheduling in the green cloud.
Shifting big data from one data center to another requires additional energy. Green cloud computing addresses this with algorithms and techniques that consume less power while maintaining performance. An energy-efficient task scheduling scheme is therefore required for green computing, with the minimization of power consumption as its primary objective. Scheduling is also one of the most complicated activities within the cloud computing framework, as it determines the efficiency of cloud workloads. The major aim of task scheduling is to minimize the makespan while maximizing resource utilization, throughput, and overall profitability. This motivates the study of job scheduling mechanisms suitable for cloud data centers.
In a green cloud environment, many independent VMs are created and deployed on a single server. Scheduling tasks in data centers requires energy, and every data center incurs different energy costs, power consumption, and carbon emissions. Some earlier models also face persistent challenges, which are addressed in the proposed work. Thus, the primary objective of this study is to allocate jobs to an appropriate data center with lower energy cost and consumption, which helps achieve an efficient green cloud system.
System model of green cloud
Green cloud computing has emerged with potential services, applications, and computing capabilities that offer sustainable environmental benefits. It is referred to as a paradigm for server integration, manufacturing, and designing, where the setup enables the effective utilization of cloud resources in an environmentally friendly manner. The key role of employing a green cloud is to minimize energy consumption, leading to enhanced energy efficiency. It also aims to reduce hazardous particles that affect the environment. In general, the green cloud is designed to implement environment-aware computing in data centers. Since it is environmentally focused
Some of the significant components are described below.
Green brokers: In the cloud sector, brokers or consumers receive user requests and schedule them to the respective machines or cloud data centers.
Resource allocator: It serves as the interface between consumers and the cloud environment and is responsible for managing energy-efficient resources.
VMs: They are used for processing requests from users. They dynamically start and stop processes once requests are received and requirements are met.
Router: It is used to transfer requests and responses between the user and the internet.
Carbon emission directory: Once a request is sent, this directory retrieves information related to energy efficiency.
Datacenter: Due to the vast scale of cloud systems, numerous data or requests are received from various companies, industries, or organizations. To handle all this data, a datacenter is utilized, where information is stored securely.

Diagrammatic view of the green cloud system.
Big data analytics has become a pivotal component for job scheduling in green cloud environments, mainly due to the massive volume of information produced by the rapid growth of modern technologies. As the name suggests, “big data” involves vast and complex datasets. This data is organized into three layers: hardware-based infrastructure, software-based management, and big data applications that deliver diverse services.
Challenges of using big data in green cloud: Because big data involves large volumes of information spread across datacenters, it consumes more energy and power, which in turn increases the environmental impact. Typically, big data is acquired from cameras, sensors, and other data sources. Due to this large-scale data acquisition, the green cloud faces critical issues, particularly related to energy consumption. The data center is another major concern in big data analytics, as it needs to store large volumes of data, and this storage process results in higher power consumption. Furthermore, big data introduces a dimensionality issue, which can be mitigated by employing the Green Hadoop and MapReduce frameworks. The schematic illustration of big data analytics in the green cloud is depicted in Figure 2.

Pictorial depiction of the big data analytics model over a green cloud environment.
MapReduce framework for handling big data
Other than the Hadoop method, the MapReduce 31 process is now considered for handling big data analytics. Nowadays, analyzing big data has become a challenging task; in such scenarios, a dimensionality reduction model is employed to lessen the data size and complexity. Thus, this paper utilizes the MapReduce framework to manage big data in the cloud environment. The schematic model of the MapReduce approach is illustrated in Figure 3.

Diagrammatic Illustration of the MapReduce framework for handling big data.
The basic components of the MapReduce framework are described below.
Mapper: It is mainly used to generate intermediate key-value pairs from the input records.
Reducer: After the pairs are generated, the Reducer processes all intermediate values corresponding to their respective intermediate keys.
Partitioner: It divides the data based on the intermediate key space and ensures that the intermediate key-value pairs are properly assigned to the Reducers.
Combiner: It optimizes performance by aggregating mapped data locally before sorting, thereby saving bandwidth, energy, and power.
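The interaction of these four roles can be illustrated with a minimal, single-process sketch. It is not the implementation used in this work: the record format (job type, data size in MB), the function names, and the choice of a CRC32-based partitioner are all assumptions made only to show how data flows from the Mapper through the Combiner and Partitioner to the Reducer.

```python
from collections import defaultdict
import zlib


def mapper(record):
    """Emit one intermediate (key, value) pair per input record."""
    job_type, size_mb = record
    yield job_type, size_mb


def combiner(pairs):
    """Aggregate mapper output locally so less data is shuffled."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Assign a key to a reducer; crc32 keeps the choice deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Merge all intermediate values that share one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    """Run map, combine, partition, and reduce over the input splits."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        mapped = [pair for record in split for pair in mapper(record)]
        for key, value in combiner(mapped):
            buckets[partitioner(key, num_reducers)][key].append(value)
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            out_key, total = reducer(key, values)
            result[out_key] = total
    return result


# Hypothetical input: two splits of (job type, data size in MB) records.
splits = [
    [("video", 120), ("log", 5), ("video", 80)],
    [("log", 7), ("image", 40)],
]
totals = run_job(splits)
```

In this toy run the Combiner merges the two "video" records of the first split before they reach the Partitioner, which is the local aggregation step credited above with saving bandwidth and energy.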
Conversely, the model comprises both mapping and reducing functions. Initially, big data consists of large datasets containing irrelevant or redundant information in the form of data, tasks, or jobs. Hence, the mapping function is utilized to reduce the dimensionality of big data, as expressed in Equation (1).
Here, the term
In the above equation, the mapping is accomplished concerning big data that is annotated as
Here, the term
Recently, more individuals have started utilizing cloud services, including social networking, content delivery, streaming media

Architectural representation of the proposed job scheduling mechanism using big data in the green cloud.
The main objective of the proposed scheduling is to minimize energy consumption in big data-related green clouds using multi-objective optimization functions. In order to handle big data, the MapReduce framework is implemented to address large-scale dimensionality challenges. Subsequently, the HFWSO is developed to render the optimal solution to implement the green cloud. This novel algorithm optimizes server and job allocation while enhancing scheduling performance in big data environments. The optimal scheduling process aims to address multiple objectives, including reducing energy consumption in datacenters, minimizing job queuing time, decreasing task allocation delays, lowering task completion time, minimizing makespan, and improving resource utilization. Finally, the performance is validated under multiple constraints, and the results demonstrate that the proposed method effectively minimizes energy consumption while efficiently allocating jobs to machines.
Proposed HFWSO-based scheduling
The proposed algorithm is developed by combining two conventional algorithms, namely WSA and FFA, as they achieve a rapid convergence rate and effectively resolve local optimum issues. On the other hand, when using these algorithms separately, they have some drawbacks, such as difficulty in finding the global optimum, challenges in handling large-scale data, lack of adaptability,
HFWSO introduces a fitness-aware adaptive control mechanism that guides the switching behavior between the two algorithms. Unlike WSA's fixed random number generation within the range [0,1], the proposed HFWSO computes a dynamic switching parameter based on population statistics, specifically the mean fitness, best fitness, and worst fitness, as shown in Equation (4).
In the above equation, the random value
WSA 32 : The WSA is inspired by the behavior of water striders, an insect species known for their remarkable ability to move and survive on the water's surface. Due to this fascinating capability, they have attracted significant attention in the field of optimization. The behavior of water striders can be divided into several stages, including territory establishment, mating, foraging behavior, and death.
Mathematical model
Initialization: During the initialization phase, water striders are represented as agents created from eggs distributed across the lake's surface. This initialization process is mathematically expressed using Equation (5).
Here, the upper and lower bound value is indicated by
Establishing territory region: The territory establishment is required to survive, feed and mate. Let us consider the total territories as Y. Initially, with the aid of fitness value, it is divided into distinct groups as
Mating: Mating is accomplished through ripple signals and produces new striders. The male first sends ripple signals, and the female responds with either an attraction or a repulsion signal. The choice between attraction and repulsion is governed by a probability value b. When the female sends the attraction signal, mating takes place and a new position is obtained. When the female rejects the signal, the male strider moves away, which also generates a new location. It is formulated using Equation (6).
Here, the term
Feeding: After mating, the females consume more energy, which tends to increase the food intake. This behaviour relies on the objective function. If the objective value improves on the preceding state, the strider has found food. Otherwise, it moves toward the strider with the best fitness. The new position is then generated by Equation (8).
Here, the term
Strider death and larva succession: Once the foraging process is done, the new fitness of the strider is compared with its previous objective value. If the new fitness is lower, the keystone dies in its designated territory; otherwise, it survives. A new larva then matures and takes over as the keystone, with its position initialized randomly. It is shown in Equation (9).
Here, the upper and lower boundary values of the strider position in
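The WSA stages described above can be sketched in a few lines. The update rules below follow the commonly published form of the WSA and are only assumed to correspond to Equations (5) to (9), which are not reproduced here. The function names, the sphere cost function, the use of a cost to be minimized, and the uniform sampling of the new larva inside the territory's bounding box are illustrative assumptions.

```python
import random


def init_striders(n, lower, upper, rng):
    """Place n striders uniformly at random between the lower and upper bounds."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n)]


def update_keystone(male, female, best, territory, cost, b, rng):
    """Run one mating, feeding, and succession cycle for a keystone strider."""
    old_cost = cost(male)
    ripple = [f - m for f, m in zip(female, male)]  # vector from male to female
    if rng.random() < b:
        # Attraction: the male moves part of the way toward the female.
        step = rng.random()
    else:
        # Repulsion: the male takes a longer step and lands beyond the female.
        step = 1.0 + rng.random()
    candidate = [m + r * step for m, r in zip(male, ripple)]
    if cost(candidate) < old_cost:
        return candidate  # food found, keep the new position
    # Feeding: forage toward the best strider found so far.
    pull = 2.0 * rng.random()
    candidate = [c + pull * (g - c) for c, g in zip(candidate, best)]
    if cost(candidate) < old_cost:
        return candidate
    # Death and succession: a larva appears inside the territory's bounding box.
    lo = [min(col) for col in zip(*territory)]
    hi = [max(col) for col in zip(*territory)]
    return [l + rng.random() * (h - l) for l, h in zip(lo, hi)]


def sphere(z):
    """Hypothetical cost function to minimize."""
    return sum(v * v for v in z)


rng = random.Random(1)
pop = sorted(init_striders(6, [-5.0, -5.0], [5.0, 5.0], rng), key=sphere)
# Treat the whole population as one territory: best strider is the female.
new_male = update_keystone(pop[-1], pop[0], pop[0], pop, sphere, 0.5, rng)
```

The returned position is either an improvement on the old keystone or a freshly placed larva, so the territory never keeps a keystone that failed both the mating and feeding steps.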
FFA 35 : Due to its bright nature, a firefly can attract other fireflies. Higher brightness means higher attractiveness, so a low-brightness firefly moves towards a brighter one.
Based on these criteria, the FFA is carried out. The search agents are initialized with their positions, as expressed in Equation (10).
The
In order to update the new position, the
Here, the variable as
The term
Finally, a new solution is generated regarding the global best value determined by Equation (15).
In the aforementioned equation, the term

Flow chart representation of the suggested HFWSO algorithm.
The flow chart representation of the novel hybrid algorithm is shown in Figure 5.
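For the FFA part of the hybrid, the rule that a dimmer firefly moves towards a brighter one can be sketched as follows. This uses the standard firefly update, in which attractiveness decays exponentially with squared distance. The parameter names `beta0`, `gamma`, and `alpha`, their default values, and the function name are assumptions, and the sketch does not include the HFWSO switching step or the global-best update of Equation (15).

```python
import math
import random


def firefly_step(positions, brightness, beta0=1.0, gamma=1.0, alpha=0.2, rng=None):
    """Move each firefly toward every brighter one (higher brightness is better)."""
    rng = rng or random.Random()
    new_positions = [list(p) for p in positions]
    for i, xi in enumerate(new_positions):
        for j, xj in enumerate(positions):
            if brightness[j] > brightness[i]:
                # Attractiveness decays with the squared distance to firefly j.
                r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(len(xi)):
                    noise = alpha * (rng.random() - 0.5)
                    xi[d] += beta * (xj[d] - xi[d]) + noise
    return new_positions


# Two fireflies on a line; the second is brighter, so only the first moves.
moved = firefly_step([[0.0], [1.0]], brightness=[1.0, 2.0], alpha=0.0)
```

With the random term switched off (`alpha=0.0`), the first firefly moves a fraction `exp(-gamma * r^2)` of the way to the brighter one, which makes the intensification role of the FFA in the hybrid easy to see.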
The multi-objective function is derived by optimizing the task and machine allocation process. During job scheduling, the developed HFWSO algorithm optimally selects the jobs and VMs. For example, the model considers 50 tasks and 3 machines, and the optimization process is carried out to determine which job should be allocated to which machine. Due to these optimal allocation results, the proposed method can effectively minimize various datacenter performance factors. The key benefit of designing the multi-objective function lies in enhancing system performance by reducing energy consumption, execution time, resource utilization, makespan, queuing delay,
Here, the
Diverse constraints are employed for formulating the multi-objective function, which is described below.
Here, energy is measured through Kilowatt-hours and power
Here, the term
Experimental setup
The proposed task scheduling was implemented in MATLAB 2020a, and the experimental analysis was carried out. The performance of the proposed model was compared with conventional models in terms of convergence analysis, statistical analysis, run time, average completion time, percentage of used resources, QoS, waiting time, makespan, latency, and resource consumption. The proposed algorithm used a population size of 10 and a maximum of 100 iterations. The proposed work is compared against SSA, 33 JAYA, 34 WSA, 32 and FFA. 35
Generated Synthetic Data: A synthetic dataset was generated in this work to evaluate the robustness and effectiveness of the developed HFWSO algorithm. This dataset was carefully designed to mimic real-time cloud workload scenarios, as it encompasses variations in task arrivals, resource requirements, and execution times, thereby allowing for a systematic and controlled evaluation of the proposed scheduling model. Moreover, it replicates several cloud workload patterns, such as variable task arrival rates, heterogeneous resource demands, and fluctuations in task execution times. Using synthetic data enables a comprehensive experimental evaluation across a wide range of configurations and facilitates benchmarking the performance of the developed algorithm against related heuristic approaches under consistent testing conditions. The dataset includes details such as task limits, datacenter limits, number of datacenters, number of tasks, task weight, task size, task execution time, task waiting time, task energy consumption, datacenter size, datacenter busy time, and datacenter energy usage.
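A workload of this kind could be generated along the following lines. This is a hypothetical generator, not the one used in this work: the field names, value ranges, and the exponential inter-arrival model are assumptions chosen to reproduce the stated properties (variable arrival rates, heterogeneous demands, and fluctuating task sizes).

```python
import random


def make_workload(num_tasks, num_datacenters, seed=0):
    """Build a reproducible synthetic set of tasks and datacenters."""
    rng = random.Random(seed)
    clock = 0.0
    tasks = []
    for task_id in range(num_tasks):
        clock += rng.expovariate(1.0)  # variable gap between arrivals
        tasks.append({
            "id": task_id,
            "arrival_s": clock,
            "size_mi": rng.uniform(1_000, 20_000),  # assumed task size range
            "weight": rng.randint(1, 5),            # assumed priority weight
        })
    datacenters = [
        {
            "id": dc_id,
            "speed_mips": rng.choice([500, 1000, 2000]),  # assumed capacities
            "power_kw": rng.uniform(0.2, 0.6),            # assumed power draw
        }
        for dc_id in range(num_datacenters)
    ]
    return tasks, datacenters


tasks, datacenters = make_workload(num_tasks=100, num_datacenters=10, seed=42)
```

Fixing the seed gives every compared algorithm the same workload, which is what makes the benchmarking conditions consistent.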
Validation metrics
Validation metrics used in the developed model are given below.
Average computation time: It refers to the time taken to complete all tasks assigned to the active servers. The average computation time is calculated and used to analyze the performance of the model.
Makespan: It is defined as the total elapsed time from the start of processing to completion. Specifically, it denotes the duration from the moment a job is allocated to a server until the corresponding machine completes the assigned tasks.
QoS: It represents the overall performance of a service, such as a cloud computing service, from the user's perspective. It includes parameters like availability, reliability, response time, and throughput.
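To make the metrics concrete, the sketch below evaluates one task-to-VM allocation. It is a simplified model under stated assumptions, not the objective function of this work: makespan is taken as the busy time of the most loaded VM, idle VMs are assumed to draw no power, and the equal weights in `weighted_cost` are placeholders.

```python
def evaluate_schedule(task_lengths, vm_speeds, vm_power_kw, assignment):
    """Return makespan (s), energy (kWh), and mean VM utilization."""
    busy = [0.0] * len(vm_speeds)
    for length, vm in zip(task_lengths, assignment):
        busy[vm] += length / vm_speeds[vm]  # execution time on the chosen VM
    makespan = max(busy)
    # Energy = power (kW) x busy time (s), converted from kW*s to kWh.
    energy_kwh = sum(p * t / 3600.0 for p, t in zip(vm_power_kw, busy))
    utilization = sum(busy) / (makespan * len(busy)) if makespan > 0 else 0.0
    return makespan, energy_kwh, utilization


def weighted_cost(makespan, energy_kwh, utilization, weights=(1.0, 1.0, 1.0)):
    """Collapse the objectives into one value to minimize (assumed weights)."""
    w_m, w_e, w_u = weights
    return w_m * makespan + w_e * energy_kwh + w_u * (1.0 - utilization)


# Hypothetical instance: four tasks (million instructions) on two VMs (MIPS).
task_lengths = [4000, 2000, 6000, 2000]
vm_speeds = [1000, 2000]
vm_power_kw = [0.3, 0.5]
assignment = [0, 1, 1, 0]  # task i runs on VM assignment[i]

makespan, energy_kwh, utilization = evaluate_schedule(
    task_lengths, vm_speeds, vm_power_kw, assignment)
cost = weighted_cost(makespan, energy_kwh, utilization)
```

Here the first VM is busy for 6 s and the second for 4 s, so the makespan is 6 s and the mean utilization is 10/12. A scheduler such as HFWSO would search over `assignment` vectors to lower a cost of this kind.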
Setting the configurations
Five configurations with progressively more tasks and VMs are defined for the proposed job scheduling model, and Table 2 lists their settings. In the first case, 100 tasks and 10 VMs are used for allocation. In the second case, 200 tasks are allocated to 20 different VMs. In the third case, 30 VMs are used to allocate 300 jobs. In the fourth case, 400 tasks and 40 VMs are considered. Finally, in the fifth case, 500 jobs and 50 VMs are used for job scheduling.
Configuration settings for proposed job scheduling.

Convergence analysis of the suggested job scheduling process using big data in terms of different cases, as (a) configuration case 1, (b) configuration case 2, (c) configuration case 3, (d) configuration case 4 and (e) configuration case 5.
Figure 6 depicts the convergence analysis of the proposed model compared with other existing algorithms across different iterations. Figure 6(b) shows the convergence performance when the cloud executes 200 tasks distributed among 20 virtual machines. The cost function values of SSA, JAYA, WSA, and FFA are 10.1%, 23.5%, 6.7%, and 37% higher, respectively, than that of the proposed HFWSO algorithm. This demonstrates the improved convergence behavior achieved by the novel heuristic model. As a result, the proposed method effectively allocates jobs in a green cloud with big data while reducing energy consumption in datacenters.
Active server analysis on the recommended job scheduling model
Figure 7 illustrates the number of active servers utilized for job scheduling. By reducing the number of active servers

Active server analysis of the proposed job scheduling process over other algorithms.
Figure 8 illustrates the multi-objective analysis of the proposed job scheduling mechanism, while Table 3 presents the numerical evaluation of the developed HFWSO. With respect to different configuration cases, the objective constraints are analyzed and compared against classical heuristic algorithms. Figure 8(d) depicts the makespan analysis of the proposed scheduling mechanism. When the cloud network executes 300 tasks across 30 VMs, the proposed model reduces the makespan by 7.4% compared to SSA, 1.85% compared to JAYA, 11.1% compared to WSA, and 5.5% compared to FFA, respectively.
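Percentage reductions such as these are relative to the baseline algorithm's value. A small helper makes the calculation explicit. The function name is an assumption, and the numbers in the example are hypothetical values chosen for easy arithmetic, not results from Table 3.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical makespans in seconds: a baseline of 200 s against 185 s.
reduction = percent_reduction(200.0, 185.0)
```

A positive result means the proposed method achieves the lower value. A negative result would mean the baseline performed better on that metric.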

Comparison analysis on proposed job scheduling algorithm using big data with green cloud over other existing optimization algorithms by (a) computation time, (b) resource consumption, (c) energy consumption, (d) makespan, (e) QoS and (f) queueing time.
Numerical evaluation of the developed model.
Table 4 presents the statistical analysis of the proposed job scheduling method. This evaluation is performed using parameters such as best, worst, median, mean, and standard deviation. The mean is the average of the final cost values over all executions, while the median is the middle value when those results are sorted. The standard deviation indicates the degree of variation across multiple executions.
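The five statistics can be computed from the final cost of each independent run, as sketched below. The function name and the sample values are hypothetical, the cost is assumed to be minimized (so the best run is the smallest value), and the sample standard deviation is used.

```python
import statistics


def summarize_runs(final_costs):
    """Summarize the final cost values of repeated runs (lower is better)."""
    return {
        "best": min(final_costs),
        "worst": max(final_costs),
        "mean": statistics.mean(final_costs),
        "median": statistics.median(final_costs),
        "std": statistics.stdev(final_costs),  # sample standard deviation
    }


# Hypothetical final costs from five independent runs of one algorithm.
summary = summarize_runs([4.0, 2.0, 9.0, 3.0, 2.0])
```

In this example the mean (4.0) and the median (3.0) both differ from the midpoint of the best and worst values (5.5), because every run contributes to them, not only the two extremes.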
Statistical evaluation of the proposed job scheduling system.
Table 5 presents a comprehensive comparison of the proposed HFWSO algorithm against recent state-of-the-art metaheuristic approaches. Here, the energy consumption of the developed HFWSO is 48 kWh. The results prove that the developed model outperforms existing models such as DNN-EVARO, BSLO, IAFSA, and TLCO. It also demonstrates superior capability in providing a faster convergence rate, improved stability, and a lower error rate. This superiority of the model stems from its hybridized nature. When using only the FFA, the model may produce suboptimal results and may be prone to overfitting. Thus, employing the hybridized form enables the algorithm to achieve optimal solutions by leveraging the strengths of WSA and FFA, effectively reducing overfitting. Moreover, HFWSO faces challenges when applied to large-scale problems, particularly in distributed contexts. However, by integrating the MapReduce framework into the developed HFWSO model, this issue is effectively resolved, as MapReduce acts as a programming model for processing large datasets using parallel and distributed algorithms, offering efficient solutions to hybrid optimization challenges. By utilizing the exploration capability of WSA and the exploitation efficiency of FFA, the developed HFWSO achieves rapid and accurate solutions, allowing it to escape local optima. Furthermore, HFWSO demonstrates remarkable scalability and robustness in handling large-scale job scheduling within green cloud environments. These comparative findings reinforce the suitability of HFWSO as a powerful metaheuristic for multi-objective optimization
Comparison with recent metaheuristics.
This section discusses the implementation details, limitations, and future scope of the developed model.
Implementation Details: The simulation of the developed model was carried out using MATLAB 2020a on a Windows 11 operating system, which ensures stable and modern software execution. The system was equipped with an Intel Core i3 processor, a mid-range CPU capable of handling moderate computational workloads typical of scheduling simulations. Additionally, 8 GB of RAM was used to ensure smooth processing, while a 500 GB hard drive provided ample storage capacity, supporting extensive experimentation without storage constraints. This combination of hardware and software specifications reflects a practical, widely accessible, and cost-effective setup for implementing the proposed model under real-world conditions.
Limitations and Future Scope: Although the developed model demonstrates enhanced performance using a cost-effective implementation platform, it faces limitations when tested with synthetic data, as such data may not fully capture the unpredictability of real-world cloud environments. Furthermore, employing hybrid optimization for effectively distributing workloads across multiple machines remains a complex task, requiring careful consideration of communication overheads and data dependencies.
To address these issues, future research will incorporate benchmark datasets and real-world job traces for experimental evaluation, thereby improving the external validity of the study. This enhancement will also facilitate more comprehensive comparisons with other state-of-the-art scheduling algorithms using standardized devices, thus increasing the practical relevance and impact of the proposed work. Finally, the integration of a deep learning model with the hybrid optimization algorithm is envisioned to provide more accurate and optimal workload distribution in future implementations.
Conclusion
This study presented a novel job scheduling mechanism that utilized a hybridized heuristic approach for green cloud computing with big data. Initially, big data was distributed across the green cloud infrastructure, which required efficient management, a task that was accomplished through the MapReduce framework. Within the green cloud network, components such as the green broker, cloud offers, and datacenters played a crucial role in allocating jobs to servers or machines. To achieve this, a new hybrid optimization algorithm, termed HFWSO, was introduced by integrating the existing Water Strider Algorithm (WSA) and Firefly Algorithm (FFA). This hybrid algorithm was implemented to provide optimal scheduling results for varying numbers of tasks. The optimized values demonstrated how tasks were effectively selected and allocated to corresponding machines. The proposed algorithm addressed multiple objectives, including energy and resource consumption, latency, makespan, queueing time, and the number of active and inactive servers. These objectives were combined into a multi-objective function, whose optimization yielded superior results across all performance parameters. Finally, the proposed model was validated and evaluated under various constraints and was compared against classical models. When the cloud system employed 20 servers and 200 tasks, the energy consumption of the proposed HFWSO model was lower by 10.5% compared to SSA, 21% compared to JAYA, 31.5% compared to WSA, and 47.3% compared to FFA. Thus, the results indicate that the developed HFWSO model effectively reduced datacenter energy consumption by optimally allocating jobs to machines within a big data-assisted green cloud environment.
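The percentage reductions quoted above are assumed here to be relative reductions with respect to each baseline, that is, 100 × (baseline − proposed) / baseline. The following minimal sketch shows the arithmetic; the function name `percent_reduction` and the energy values are hypothetical and are not measurements from this study.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


# Hypothetical energy values in kWh (illustrative only).
baseline_energy = 20.0
proposed_energy = 17.9
reduction = percent_reduction(baseline_energy, proposed_energy)
print(f"reduction: {reduction:.1f}%")
```

Under this definition, a positive value means that the proposed scheduler consumed less energy than the baseline, and the same baseline-relative form applies to each of the four comparisons.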
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
