Abstract
Cloud infrastructure provides a real-time computing environment to customers and has wide applicability in healthcare, medical facilities, business, and several other areas. Most health data are recorded and saved on the cloud. However, cloud infrastructure is configured from several components, which makes it a complex structure, and high availability and reliability are essential for the satisfactory operation of such systems. The present study is therefore conducted with the primary objective of assessing the optimum availability of cloud infrastructure. For this purpose, a novel stochastic model is proposed and optimized using the dragonfly algorithm (DA) and grey wolf optimization (GWO). The Markovian approach is employed to develop the Chapman-Kolmogorov differential-difference equations associated with the system. All failure and repair times are assumed to be exponentially distributed, and repairs are perfect. Numerical results are derived to highlight the importance of the study and to identify the best algorithm. The system attains its optimum availability of 0.9998649 at population size 120 after 700 iterations with GWO. The results reveal that GWO performs better than DA in terms of availability, best-fitted parameter values, and execution time.
Keywords
Introduction
Over the last few years, cloud infrastructure has become very popular in multinational corporations (MNCs). Cloud computing is the term used for the provision of hosted services. The prominent objective of cloud infrastructure is to provide simple and scalable access to computing services. In the cloud, customers have the freedom to manage applications, data, and operating systems. These services are classified as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). Various hardware and software components are involved in cloud infrastructure, and they establish the connection between front-end and back-end devices. The central server relies on protocols for data exchange and uses software to establish connectivity between various clients and servers. It depends on virtualization and automation. The key advantages of the cloud are self-service, elasticity, cost-effectiveness through pay-per-use pricing, workload resilience, migration flexibility, and broad network access.
At present, security and performance are the key challenges faced by cloud management teams. Many functions such as networking, security, billing, recovery, and load balancing are managed by cloud services. Because cloud infrastructure is a very complex structure involving various physical and virtual components, it is very difficult to attain the desired performance. Complexity and performance issues are therefore the prominent motivation behind this study. These issues can only be addressed if the system operates with high reliability.
Several studies have been conducted to ensure the reliability and performance of cloud infrastructure systems. Vishwanath and Nagappan [19] characterized the hardware components of the cloud and their reliability. Xuejie et al. [51] used hybrid methods for the reliability evaluation of cloud systems. Wu et al. [50] suggested a fast optimization algorithm for cloud services to enhance reliability and performance. Tamura et al. [47] discussed and analysed an open-source cloud. Li et al. [18] used the coloured Petri nets methodology for reliable message queueing services in a cloud computing environment. Sharma et al. [46] conducted a survey and presented a taxonomy for reliable and energy-efficient cloud systems. Bai et al. [45] suggested a model for the reliability evaluation of the cloud based on a complex computing network. Alannsary and Tian [22] created a mechanism for monitoring and predicting SaaS dependability from web server logs. Tang and Tan [43] proposed an efficient reliability-aware scheduling algorithm for heterogeneous systems. Zhang et al. [16] suggested a mathematical model for the city distribution system using Bayesian networks and identified the influencing factors. Qiu et al. [42] suggested a correlation model and optimized the resources in cloud computing under the concept of fault recovery. Ahmad et al. [40] suggested a reliability model for communication networks. Li et al. [41] explored the service reliability of a cloud-based active data center based upon the IT infrastructure. Bai et al. [10] investigated a gear system and proposed a reliability model for the estimation of parameters. Mesbahi et al. [23] presented a roadmap for achieving high availability in cloud environments. Riza and Nugroho [20] developed an R package for eleven metaheuristic techniques based on natural events and animal behavior. Meng et al. [35] conducted a simulation study to evaluate the reliability of cloud infrastructure. Nguyen et al. [38] suggested hierarchical models to evaluate the reliability and availability of cloud-based data center networks.
Hu et al. [8] developed some cost-effective models for automotive production lines. Wang et al. [48] assessed the reliability of a multi-state pipeline by using cloud-based inferences. Sinwar et al. [9] studied the performance of a sewage treatment plant and optimized it using the genetic algorithm (GA) and particle swarm optimization (PSO). Li et al. [44] used the Petri net method for reliability evaluation of the service and management of cloud-based data centers. Amiri et al. [2] proposed a model for performance validation of dynamic routing in various cloud- and service-based architectures. Alamri et al. [28] proposed a stress-strength reliability model and estimated its characteristics using Rayleigh and half-normal distributions. Maciel et al. [30] presented a review of reliability evaluation in edge, fog, and cloud computing. Kumar et al. [4] investigated cyber-physical systems and discussed various reliability-critical perspectives and their impact. Liu et al. [49] proposed a mathematical modelling approach to evaluate the reliability of microservice-oriented cloud applications. Kumar et al. [6] performed a RAMD investigation along with Markov modelling for underground pipelines associated with tube wells.
The literature shows that most of the work carried out so far on reliability investigation, either for cloud infrastructure or for other systems, has been limited to the evaluation of a local solution. Few researchers have utilized nature-inspired algorithms for optimizing the reliability of industrial systems. Akyol and Alatas [31] summarized the algorithms and techniques related to plant intelligence along with classifications of metaheuristic optimization techniques. Alatas and Bingol [7] focused on new light-based algorithms and presented a comparative study of ray optimization (RO) and optics inspired optimization (OIO). The performance of light-based intelligent optimization algorithms was evaluated on unconstrained benchmark functions and constrained real engineering design problems. Ayyarao et al. [39] developed a new war strategy optimisation (WSO) algorithm, based on the tactical deployment of military forces during conflict, and compared it with other techniques. Kumar et al. [3, 27] suggested efficient metaheuristic-based stochastic models for performance optimization of a sludge digestion processing unit, cooling towers, and an E-waste management plant. Saini et al. [24–26] proposed stochastic optimization models based on nature-inspired algorithms for the biological and chemical processing unit of a sewage treatment plant, load haul dump machines, and the condenser unit of a steam turbine power plant. Math et al. [32] developed a proactive fault management system for software-defined IoT services. Wei [15] studied distributed networks and proposed a reliability model. Zheng and Qiao [17] proposed a reliability investigation method based on conditional random fields for rotating machinery. Zhao et al. [52] explored the applications of machine learning in the reliability investigation of industrial systems. Meng et al. [34] investigated the reliability of a public cloud and optimized the autonomous reliability. Gupta et al. [29] explored the applications of hybrid whale optimization for resource allocation in the cloud and its applications in the E-healthcare sector. Several advanced metaheuristic approaches, such as ant colony optimization (Dorigo et al. [21]), dragonfly optimization (Mirjalili [36]), elephant herding optimization (Wang et al. [13]), earthworm optimization (Wang et al. [12]), monarch butterfly optimization (Wang et al. [14]), Harris hawks optimization (Heidari et al. [1]), the slime mould algorithm (Li et al. [33]), and moth search optimization (Wang [11]), have been developed by researchers for performance optimization of industrial systems. The feasibility of these algorithms may be further explored in healthcare sectors. A summary of the system evaluation techniques is shown in Table 1.
Summarization of system evaluation techniques
It is observed that the reliability aspects of cloud infrastructure have not yet been extensively explored, although the application of metaheuristic algorithms in engineering has been explored by several researchers. Mirjalili et al. [37] developed grey wolf optimization (GWO) and benchmarked it on 29 well-known test functions. They showed that GWO outperforms traditional algorithms in terms of avoiding local optima and convergence rate. This motivated the choice of GWO for the performance evaluation of cloud infrastructure. For this purpose, a novel stochastic model is proposed and optimized using the grey wolf optimization (GWO) algorithm. For comparison, the results are compared with the outputs of the dragonfly algorithm.
All failure and repair times are assumed to be exponentially distributed, and repairs are perfect. Numerical results are derived to highlight the importance of the study and to identify the best algorithm. The results reveal that the grey wolf optimization algorithm performs better than the dragonfly algorithm in terms of availability, best-fitted parameter values, and execution time.
The manuscript is organized into six sections, including the current introduction. Section 2 is devoted to materials and methods, covering notations, assumptions, and the system description, while a stochastic model is proposed in Section 3. The optimization strategies are discussed in Section 4, and the numerical results are discussed in Section 5. Section 6 concludes the study.
Notations
The nomenclature used to develop the mathematical model for cloud infrastructure is appended below:
$S_i$ ($i = 0, 1, \dots, 48$) = states of the system.
$P_0(t)$ = probability that the system is in state 0 at time $t$.
$A_0$ = availability of the system.
Note: U/M = under maintenance and C.F. = complete failure.
System description
Over the last few years, data have been generated continuously and at a large scale, both in day-to-day life and in industry. It has become very difficult to handle all these data sets with traditional methods on traditional or personal devices, so there is a need to handle data at a large scale. This problem is solved by cloud services. A cloud contains a large amount of data belonging to many users, and users can handle their own data whenever they want. Authorised users of a cloud service can do several things, such as adding new data to the cloud, managing old data, and recovering data from the cloud to their own systems (laptops, PCs, mobile devices, etc.). A cloud service provides data security, virtually unlimited scalability, fast data access, and inter-connectivity. In a cloud service, data are stored on multiple machines to minimize the effect of data loss. Cloud infrastructure includes hardware and software components such as servers, storage, networking, services, and management tools. Owing to the complexity of a cloud infrastructure system, it becomes difficult to obtain the best performance. This can only be achieved through high reliability, availability, and maintainability of all the components of the cloud infrastructure.

Architecture of cloud service functioning.

State transition diagram of cloud Service providers system.
In the present study, a stochastic model is proposed for the cloud infrastructure to assess its performance. The proposed system contains various subsystems: client, database, cloud, service, and network. A standby database subsystem is used in case of failure of the main database; it provides users with secure and immediate access to backed-up versions. The client and network subsystems may face both software and hardware failures. The database and cloud subsystems may face software and hardware failures and may also enter an under-maintenance phase. The service subsystem may face only software failure. Other possible types of failure, such as those due to overload, cyber-attack, and hacking, are included in load failure; in this situation, the complete system may fail or crash. All hardware failures are handled by a repairman, while software is updated or upgraded upon failure. The architecture of cloud service functioning is shown in Fig. 1, and the stochastic model of the cloud is shown in Fig. 2.
Client
Any computer hardware or software device that seeks access to a service offered by a server is referred to as a client in home and business networks. In a client-server architecture, the client is the requesting program or user: it communicates with the server by requesting data or resources that it cannot provide itself. Clients and servers can be located apart and connected over a network, or they can reside on the same computer and communicate between processes. Smartphones, laptops, and desktop PCs are examples of client end-user devices. The server may be situated on-site or elsewhere.
Database
A database is an organized collection of data that is usually stored electronically in a computer system. A cloud database is a database designed to operate in a hybrid or public cloud environment and to help manage, organize, and store data for an organization. Cloud databases may be offered as a managed database-as-a-service (DBaaS) or may be set up on a virtual machine (VM) hosted in the cloud and self-managed by internal IT staff. This subsystem can face both hardware and software failures. In this study, a cold standby unit is used for the database subsystem. The entire system can collapse if both the main and the standby database fail.
Service
Choosing a cloud type or cloud service is a unique decision. Even when they are of the same type, no two clouds are exactly alike, and no two cloud services are used to solve the same problem. The three main types of cloud computing services are Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS). By recognizing their similarities, one can better understand how the limitations of each type of cloud computing and cloud service could affect a business.
Request-stage failure and execution-stage failure are the two basic categories of service failure. Alibaba Cloud, Amazon Web Services (AWS), Google Cloud, IBM Cloud, and Microsoft Azure are some of the biggest public cloud service providers.
Network
The network is made up of actual wires, switches, routers, and other gear. On top of these actual resources, virtual networks are created. A typical cloud network design consists of several subnetworks with varied visibility levels. In the cloud, virtual local area networks (VLANs) may be built, and all network resources can receive either static or dynamic addresses based on their requirements. The transmission of cloud resources to users through a network like the internet or an intranet allows you to use cloud services or apps remotely at any time. The four primary categories of cloud computing are private, public, hybrid, and multi-cloud.
Load failure
In cloud computing, load balancing is a crucial technique used to maximize resource use and ensure that no resource is overloaded with traffic. By preventing single points of failure and maximizing resource utilization, load balancing enhances the overall performance and dependability of cloud-based services. Additionally, it helps with on-demand application scalability and offers high availability and fault tolerance to resist traffic spikes or server failures.
Assumptions
The proposed model is developed under the following assumptions:
(1) The components of the cloud are configured in a series structure.
(2) No simultaneous failures occur.
(3) Maintenance, repairs, and switching devices are perfect.
(4) Failure and repair times are independent and identically distributed (IID) and follow the exponential law.
(5) Failures are uncorrelated with each other.
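To make the exponential and perfect-repair assumptions concrete, the following minimal Python sketch simulates a single repairable component and compares the simulated availability with the closed form β/(α+β) for failure rate α and repair rate β. The function name `simulate_availability` and the numerical rates are illustrative assumptions and are not taken from the study.

```python
import random


def simulate_availability(failure_rate: float, repair_rate: float,
                          horizon: float, seed: int = 0) -> float:
    """Fraction of [0, horizon] that one repairable component spends up.

    Up times ~ Exp(failure_rate), repair times ~ Exp(repair_rate).
    A repair is perfect: the component restarts as good as new.
    """
    rng = random.Random(seed)
    clock, up_time = 0.0, 0.0
    while clock < horizon:
        time_to_failure = rng.expovariate(failure_rate)
        # Count only the part of the up period that lies inside the horizon.
        up_time += min(time_to_failure, horizon - clock)
        clock += time_to_failure
        if clock >= horizon:
            break
        clock += rng.expovariate(repair_rate)  # down while under repair
    return up_time / horizon


# Hypothetical rates (per hour), chosen only for illustration.
alpha, beta = 0.1, 1.0
simulated = simulate_availability(alpha, beta, horizon=200_000.0)
closed_form = beta / (alpha + beta)
print(f"simulated={simulated:.4f}  closed form={closed_form:.4f}")
```

The two values should agree closely for a long horizon, which is the single-component analogue of the steady-state availability used in the model.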
Stochastic modeling of cloud infrastructure
In this section, a stochastic model of the cloud infrastructure is developed using a Markov birth-death process. The Chapman-Kolmogorov differential-difference equations are derived along with the initial conditions. The differential-difference equations are as follows:
Method
The transition probability expressions for all states are derived below.
Taking the limit $t \to \infty$, we get
Initial conditions:
Solving the linear system of Equations (1–49) using the initial conditions (50), the following state probabilities are derived:
By using the normalization condition,
The expression for P0 is derived using Equations (51, 52) and is shown in Equation (53) as follows:
Where
The system availability is defined as the sum of the up-state probabilities. Mathematically, the expression for system availability is derived as:
Equation (54) provides a local solution for the cloud infrastructure, obtained by either algebraic or numerical methods. The availability of the cloud infrastructure is influenced by the failure and repair rates of its subsystems. To attain the optimal solution, it is necessary to apply an optimization technique. The availability function (54) is the objective function, and all failure and repair rates are the decision variables. Mathematically, the optimization problem is defined as:
Objective function: Max. A0
Where i = 1, 2, . . . ,10,11.
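As a hedged illustration of how steady-state probabilities and availability follow from the balance equations, the sketch below solves πQ = 0 together with the normalization condition for a toy series system that has one up state and one failed state per subsystem. This is not the 49-state model of the study; the helper names (`series_generator`, `steady_state`, `availability`) and the rates are assumptions made for illustration only.

```python
from typing import List, Sequence


def series_generator(alpha: Sequence[float], beta: Sequence[float]) -> List[List[float]]:
    """Generator matrix Q of a toy series system.

    State 0: all subsystems up. State i (i >= 1): subsystem i failed, system down.
    While the system is down no further failure occurs (no simultaneous failures).
    """
    n = len(alpha) + 1
    Q = [[0.0] * n for _ in range(n)]
    for i, (a, b) in enumerate(zip(alpha, beta), start=1):
        Q[0][i] = a      # failure of subsystem i
        Q[i][0] = b      # perfect repair back to state 0
        Q[i][i] = -b
    Q[0][0] = -sum(alpha)
    return Q


def steady_state(Q: List[List[float]]) -> List[float]:
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(Q)
    # Row j holds the balance equation of state j: sum_i pi_i * Q[i][j] = 0.
    A = [[Q[i][j] for i in range(n)] + [0.0] for j in range(n)]
    # One balance equation is redundant; replace it by the normalization condition.
    A[n - 1] = [1.0] * n + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def availability(alpha: Sequence[float], beta: Sequence[float]) -> float:
    """Sum of up-state probabilities; in this toy model only state 0 is up."""
    return steady_state(series_generator(alpha, beta))[0]


# Hypothetical failure and repair rates for two subsystems.
alpha = [0.01, 0.02]
beta = [0.5, 0.4]
A0 = availability(alpha, beta)
closed_form = 1.0 / (1.0 + sum(a / b for a, b in zip(alpha, beta)))
print(A0, closed_form)
```

In this sketch `availability` plays the role of the objective function, with the entries of `alpha` and `beta` as decision variables.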
Nature-inspired algorithms are extensively used to attain the global solution of complicated real-world problems and complex systems. In the present study, two well-known algorithms, namely the dragonfly algorithm and grey wolf optimization, are used to attain the optimum availability of the cloud infrastructure.
Grey wolf optimization
Grey wolf optimization is a metaheuristic technique used to optimize the performance of systems. It is inspired by the social hierarchy and hunting behaviour of grey wolves, which are divided into four categories: alpha, beta, delta, and omega. The alpha wolf is the leader of the pack. The hierarchy of wolves is shown in Fig. 3. The mathematical equations of the algorithm, proposed by Mirjalili et al. (2014), are given below.

Hierarchy of grey wolves.
The main steps of grey wolf hunting are: (1) searching for the prey; (2) tracking, chasing, and approaching the prey; (3) pursuing, encircling, and harassing the prey until it stops moving; (4) attacking the prey.
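The encircling behaviour of GWO is modelled, following Mirjalili et al. (2014), as:
$$\vec{D} = \left|\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\right|, \qquad \vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}$$
where $t$ = current iteration; $\vec{A}$, $\vec{C}$ = coefficient vectors; $\vec{X}_p$ = position vector of the prey; $\vec{X}$ = position vector of a grey wolf. The coefficient vectors are computed as $\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}$ and $\vec{C} = 2\vec{r}_2$, where the components of $\vec{a}$ decrease linearly from 2 to 0 over the iterations and $\vec{r}_1$, $\vec{r}_2$ are random vectors in [0, 1].
The hunting behaviour of the grey wolf is modelled as:
$$\vec{D}_\alpha = \left|\vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}\right|, \quad \vec{D}_\beta = \left|\vec{C}_2 \cdot \vec{X}_\beta - \vec{X}\right|, \quad \vec{D}_\delta = \left|\vec{C}_3 \cdot \vec{X}_\delta - \vec{X}\right|$$
$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta$$
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$$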
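The encircling and hunting behaviour can be sketched in a few lines of Python. The block below is a minimal illustration of the standard GWO update published by Mirjalili et al. (2014), written as a maximizer with simple box constraints; it is not the R implementation used in this study, and the function name `gwo_maximize`, the clipping rule, and the quadratic test objective are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def gwo_maximize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 pop_size: int = 20, iterations: int = 200,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Maximize `objective` inside the box `bounds` with a basic grey wolf optimizer."""
    if pop_size < 3:
        raise ValueError("need at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    leaders: List[List[float]] = []
    for t in range(iterations):
        # Keep the three best positions found so far: alpha, beta, delta.
        pool = leaders + [list(w) for w in wolves]
        leaders = sorted(pool, key=objective, reverse=True)[:3]
        a = 2.0 * (1.0 - t / iterations)  # decreases linearly from 2 towards 0
        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a      # A = 2a*r1 - a
                    C = 2.0 * rng.random()              # C = 2*r2
                    D = abs(C * leader[d] - wolf[d])    # distance to this leader
                    estimate += leader[d] - A * D       # X_k = X_leader - A*D
                # New position: mean of the three estimates, clipped to the box
                # (clipping is an assumption of this sketch).
                wolf[d] = min(hi, max(lo, estimate / 3.0))
    best = max(leaders + wolves, key=objective)
    return best, objective(best)


def toy_objective(x: List[float]) -> float:
    """Hypothetical concave test function with its maximum at x = (0.3, 0.3, 0.3)."""
    return -sum((xi - 0.3) ** 2 for xi in x)


best, value = gwo_maximize(toy_objective, [(0.0, 1.0)] * 3, seed=1)
print(best, value)
```

Replacing `toy_objective` with an availability function and `bounds` with the ranges of the failure and repair rates gives the same structure as the optimization problem considered here.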
Dragonfly algorithm
The dragonfly algorithm was developed by Mirjalili (2016). It is also a metaheuristic algorithm, inspired by the behaviour of dragonflies in nature. Dragonflies are expert fliers that hunt small insects for their survival. The algorithm is inspired by their static and dynamic swarming behaviour. In a static swarm, they fly in a small group over a small area and hunt other flying insects such as mosquitoes and butterflies. In a dynamic swarm, they form a large group and migrate in one direction over a long distance. The mathematical expressions of the algorithm, as proposed by Mirjalili (2016), are given below.
There are three main stages in this model.
Separation:
$$S_d = -\sum_{j=1}^{N} (X - X_j)$$
where $X$ and $X_j$ indicate the position of the current individual and of the $j$th neighbouring individual, and $N$ is the total number of neighbouring individuals.
Alignment:
$$A_d = \frac{\sum_{j=1}^{N} V_j}{N}$$
where $V_j$ indicates the velocity of the $j$th neighbouring individual.
Cohesion:
$$C_d = \frac{\sum_{j=1}^{N} X_j}{N} - X$$
Attraction and repulsion are the other stages.
Attraction: $F_d = X^{+} - X$; Repulsion: $E_d = X^{-} + X$
Here $X$ = position of the current individual; $X^{+}$ = position of the food source; $X^{-}$ = position of the natural predator.
Step vector calculation:
$$\Delta X_{t+1} = (s S_d + a A_d + c C_d + f F_d + e E_d) + w \Delta X_t$$
where
$s$ = separation weight and $S_d$ = separation of the $i$th individual;
$a$ = alignment weight and $A_d$ = alignment of the $i$th individual;
$c$ = cohesion weight and $C_d$ = cohesion of the $i$th individual;
$f$ = food factor and $F_d$ = food source of the $i$th individual;
$e$ = enemy factor and $E_d$ = position of the enemy of the $i$th individual;
$w$ = inertia weight;
$t$ = iteration counter.
Position vector calculation:
$$X_{t+1} = X_t + \Delta X_{t+1}$$
where $t$ = current iteration.
The rule for updating the position of dragonflies by a random walk, which Mirjalili (2016) applies when a dragonfly has no neighbouring individuals, is defined as:
$$X_{t+1} = X_t + \mathrm{LFM}(D) \times X_t$$
Here $t$ = current iteration; $D$ = dimension of the position vectors; LFM = Lévy flight mechanism, calculated as
$$\mathrm{LFM}(x) = 0.01 \times \frac{r_1 \times \sigma}{|r_2|^{1/\beta}}$$
where $r_1$ and $r_2$ are random numbers in [0, 1], $\beta$ is a constant, and $\sigma$ is calculated as
$$\sigma = \left( \frac{\Gamma(1+\beta) \times \sin\left(\frac{\pi\beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{\frac{\beta-1}{2}}} \right)^{1/\beta}$$
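The separation, alignment, cohesion, attraction, and repulsion terms and the Lévy flight can be illustrated with a short Python sketch. It is a simplified reading of Mirjalili (2016), not the implementation used in this study: every other dragonfly is treated as a neighbour (no neighbourhood radius), the weights are passed in as fixed numbers, and the names `levy_sigma`, `levy_flight`, and `dragonfly_update` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def levy_sigma(beta: float = 1.5) -> float:
    """Scale factor sigma of the Levy flight for a given constant beta."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_flight(dim: int, rng: random.Random, beta: float = 1.5) -> Vector:
    """One Levy step per dimension: 0.01 * r1 * sigma / |r2|^(1/beta)."""
    sigma = levy_sigma(beta)
    steps = []
    for _ in range(dim):
        r1 = rng.random()
        r2 = max(rng.random(), 1e-12)  # guard against division by zero
        steps.append(0.01 * r1 * sigma / r2 ** (1.0 / beta))
    return steps


def dragonfly_update(positions: Sequence[Vector], steps: Sequence[Vector],
                     food: Vector, enemy: Vector,
                     s: float, a: float, c: float, f: float, e: float,
                     w: float) -> Tuple[List[Vector], List[Vector]]:
    """One synchronous step/position update for a swarm of at least two dragonflies."""
    n, dim = len(positions), len(food)
    new_positions, new_steps = [], []
    for i in range(n):
        x = positions[i]
        others = [j for j in range(n) if j != i]  # assumption: all others are neighbours
        dx_new, x_new = [], []
        for d in range(dim):
            separation = -sum(x[d] - positions[j][d] for j in others)
            alignment = sum(steps[j][d] for j in others) / len(others)
            cohesion = sum(positions[j][d] for j in others) / len(others) - x[d]
            attraction = food[d] - x[d]
            repulsion = enemy[d] + x[d]
            dx = (s * separation + a * alignment + c * cohesion
                  + f * attraction + e * repulsion + w * steps[i][d])
            dx_new.append(dx)
            x_new.append(x[d] + dx)
        new_steps.append(dx_new)
        new_positions.append(x_new)
    return new_positions, new_steps


# Two dragonflies in one dimension; only the food attraction is switched on.
positions = [[0.0], [1.0]]
steps = [[0.0], [0.0]]
new_pos, new_steps = dragonfly_update(positions, steps, food=[0.5], enemy=[2.0],
                                      s=0.0, a=0.0, c=0.0, f=1.0, e=0.0, w=0.0)
print(new_pos, levy_sigma(1.5))
```

With only the food factor active, both individuals move straight onto the food source, which makes the role of each weight easy to check in isolation.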
Numerical results and discussion
In this section, parameter estimation and performance optimization of the cloud infrastructure system are performed with the help of grey wolf optimization and the dragonfly algorithm. The optimum value of the availability of the cloud infrastructure and the best-fitted values of the failure and repair rates are explored in the search space. Because only a simulation test bed is available for the given system, the parameter values are selected arbitrarily in this simulation scenario. The arbitrary bounds of the search space are given in Table 2. As the main objective of the study is to optimize the availability of the cloud infrastructure, the availability function given in Equation (54) is treated as the objective function, with the failure and repair rates as decision variables. The well-known optimization approaches grey wolf optimization and the dragonfly algorithm are employed to attain the optimum values. The experiments were simulated using R and RStudio on a 64-bit Windows 10 operating system with 8 GB of RAM and an 8th-generation Intel Core i5 CPU. The “MetaheuristicOpt” package developed by Riza and Nugroho (2018) was used for optimization.
Failure and repair rates of sub systems of cloud service providers system
The availability of the cloud infrastructure is derived using the two metaheuristic techniques at different numbers of iterations and varying population sizes. The numerical value of availability is obtained for population sizes from 20 to 120 in steps of 20, and for 500 to 1500 iterations in steps of 200. The estimated values of all repair and failure rates are also obtained in all cases. It is observed that, with grey wolf optimization (GWO), the system availability increases slightly with the number of iterations, from 500 iterations onward, for the different population sizes.
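The experimental grid described above can be organized as a simple sweep over population sizes and iteration counts. The Python sketch below shows one possible layout; it uses a plain random search as a stand-in optimizer and a toy series-system availability as the objective, so the function names, the search bounds, and any numbers it produces are illustrative assumptions and do not reproduce the reported results.

```python
import random
import time
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def toy_availability(rates: Sequence[float]) -> float:
    """Toy objective: series system with rates = [alpha_1..alpha_k, beta_1..beta_k]."""
    k = len(rates) // 2
    return 1.0 / (1.0 + sum(a / b for a, b in zip(rates[:k], rates[k:])))


def random_search(objective: Callable[[List[float]], float], bounds: Bounds,
                  pop_size: int, iterations: int, rng: random.Random) -> float:
    """Stand-in optimizer: best of pop_size * iterations uniform samples."""
    best = float("-inf")
    for _ in range(pop_size * iterations):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        best = max(best, objective(x))
    return best


def sweep(optimizer, objective, bounds: Bounds, pop_sizes: Iterable[int],
          iteration_counts: Iterable[int],
          seed: int = 0) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Run the optimizer on every (population size, iterations) pair.

    Returns {(pop, iters): (best objective value, elapsed seconds)}.
    """
    iteration_counts = list(iteration_counts)
    results = {}
    for pop in pop_sizes:
        for iters in iteration_counts:
            rng = random.Random(seed)  # same seed for every cell of the grid
            start = time.perf_counter()
            best = optimizer(objective, bounds, pop, iters, rng)
            results[(pop, iters)] = (best, time.perf_counter() - start)
    return results


POP_SIZES = range(20, 121, 20)       # 20, 40, ..., 120
ITERATIONS = range(500, 1501, 200)   # 500, 700, ..., 1500
# Hypothetical search space: two failure rates followed by two repair rates.
BOUNDS = [(0.001, 0.1), (0.001, 0.1), (0.1, 1.0), (0.1, 1.0)]
```

Calling `sweep(random_search, toy_availability, BOUNDS, POP_SIZES, ITERATIONS)` returns one pair of best value and execution time per grid cell; a reduced grid is advisable for a quick check, and any optimizer with the same call signature can be substituted for `random_search`.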
The proposed cloud infrastructure system attained its optimum availability of 0.9998649 at population size 120 after 700 iterations using the grey wolf optimization algorithm. In contrast, the dragonfly algorithm shows rapid variation in the availability of the cloud infrastructure system: it attained an availability of 0.6879055 at population size 100 after 1300 iterations. Further results show that, with the dragonfly algorithm, the system reaches a maximum availability of 0.7771377 at population size 100 after 2500 iterations. Table 3 reveals that grey wolf optimization outperforms the dragonfly algorithm in terms of availability as well as convergence time.
Availability of cloud service providers system with respect to iterations at different population sizes by using GWO and DA
The estimated values of the failure and repair rates associated with the time-dependent random variables for population size 20 are given in Table 4. The parameters are estimated at 500, 700, 900, 1100, 1300, and 1500 iterations. It is observed that the failure rates α4 and α5 are highly sensitive to variation in population size in both algorithms.
The estimated values of the failure and repair rates associated with the time-dependent random variables for population size 40 are given in Table 5. The parameters are estimated at 500, 700, 900, 1100, 1300, and 1500 iterations. It is observed that the failure and repair rates α3, α4, and α5 are highly sensitive to variation in population size in the GWO algorithm, while the repair rates are more sensitive in the dragonfly algorithm. Table 6 shows the estimated parameter values for population size 60 at various iterations; most of the failure rates for population size 60 remain approximately stable in the GWO algorithm.
Parameter estimation of various failure and repair rates for population size 20 and different iterations by using GWO and DA
Parameter estimation of various failure and repair rates for population size 40 and different iterations by using GWO and DA
Parameter estimation of various failure and repair rates for population size 60 and different iterations by using GWO and DA
Table 7 gives the estimated parameter values for population size 80 at various iterations. Most of the failure rates for population size 80 remain approximately stable in the GWO algorithm, while the failure and repair rates in the dragonfly algorithm are sensitive to population size. A similar pattern is followed by the estimated parameter values for population size 100, which are given in Table 8.
Parameter estimation of various failure and repair rates for population size 80 and different iterations by using GWO and DA
Parameter estimation of various failure and repair rates for population size 100 and different iterations by using GWO and DA
The estimated parameter values for population size 120 at various iterations are given in Table 9. The failure and repair rates of the system for population size 120 become stable in the GWO algorithm, while in the dragonfly algorithm a few of the parameters show instability across population sizes.
Parameter estimation of various failure and repair rates for population size 120 and different iterations by using GWO and DA
Conclusion
The availability of any system, including computer networks, computing devices, and cloud infrastructure, is highly influenced by the failure and repair pattern of that system. Clouds have a crucial role in the operation and maintainability of health data. Here, a stochastic model for cloud infrastructure is developed using a Markov birth-death process, and its availability function is derived. The availability function (objective function) is optimized using the dragonfly and grey wolf optimization algorithms, along with estimation of the best values of the decision variables. A thorough investigation of availability optimization is carried out for different population sizes at various numbers of iterations.
The optimum availability of 0.9998649 is obtained by GWO for population size 120 and 700 iterations. The best estimated parameter values are shown in Table 9. The GWO algorithm outperforms DA in terms of the optimum value of the objective function as well as execution time. Tables 4–9 reveal that the software failures of services and databases are the most sensitive and need extra care. The results of the present study may be helpful for cloud infrastructure developers. Maintenance personnel may also use the study to observe the impact of preventive maintenance on system performance. The present work can be further extended to other optimization techniques such as ant colony optimization, the whale optimization algorithm, the grasshopper algorithm, the cuckoo search algorithm, the elephant herding algorithm, monarch butterfly optimization, the war strategy optimization algorithm, the slime mould algorithm, and Harris hawks optimization. As the present study is performed under specified assumptions such as constant failure and repair rates, results obtained beyond these assumptions may deviate from the findings. The assumptions of the proposed model are therefore the limitations of the study. As a future direction, the investigation of cloud infrastructure may be carried out by considering arbitrary failure and repair laws as well as the simultaneous occurrence of multiple failures. The present methodology can also be applied in other industries, such as process, automobile, health, and mechanical, and can be adopted for performance optimization of computer networks, distributed networks, and large-scale clouds. It can also be extended to large-scale and complex structured cloud infrastructures in future works.
Conflict of interest
The authors have no conflict of interest.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the author without undue reservation.
Funding statement
No funding was used in this study.
