FBQ-LA: Fuzzy based Q-Learning approach for elastic workloads in cloud environment

Abstract

Cloud computing provides on-demand storage of, and access to, data as a service over the Internet, available to any organization at any time. It delivers computing services such as servers, networking, storage and software. Companies known as cloud service providers (CSPs) offer these services and charge users for them. When a request is made, the service provider allocates a feasible number of virtual machines (VMs). Determining the optimum amount of resources required at runtime to satisfy a user’s request is not a trivial task. Therefore, a cardinal issue in the cloud ecosystem is managing resource allocation to applications so as to abide by the service level agreements (SLAs). The fundamental objective of cloud service management is to design a self-adjusting auto-scaler that responds to elastic workloads and optimizes resource allocation at reduced cost. The key questions are how and when resources should be allocated or de-allocated so that the agreed SLAs are met. In this paper, we propose a resource provisioning framework that integrates autonomic computing with Fuzzy Q-Learning and Chebyshev’s inequality. Auto-scaling is implemented in the four phases of the proposed autonomic MAPE loop framework: Monitoring, Analysis, Planning and Execution. The framework follows the MAPE control loop, using Chebyshev’s inequality for prediction in the analysis phase and fuzzy Q-learning in the planning phase, where human expertise encoded as fuzzy rules supports efficient provisioning of VMs.
A comparative analysis has been performed with different combinations of (i) a linear regression model (LRM) in the analysis phase with FBQ-LA in the planning phase, (ii) Chebyshev’s inequality in the analysis phase with FBQ-LA in the planning phase, and (iii) Chebyshev’s inequality in the analysis phase with Q-Learning in the planning phase. Experimental results show that the proposed autonomic model based on Chebyshev’s inequality and FBQ-LA outperforms the existing model in terms of VM provisioning, cost and response time.

Keywords

Resource provisioning; Autonomic computing; MAPE loop; Fuzzy Q-learning; Q-learning

1 Introduction

Cloud computing is a pioneering technology that provides bulk data storage at a remote location, easily accessible through the Internet. Its pay-per-use model has encouraged a large number of organizations to rent a variety of resources over the Internet [1]. With the phenomenal growth of cloud computing, innumerable application providers host their applications in the cloud, and cloud service providers (for example, Amazon with EC2) rent out their resources at a profit. In fact, the pay-per-use policy of cloud computing has revolutionized the business environment. End-user requests for services offered by a SaaS provider are fulfilled by renting infrastructure from an IaaS provider. Because user demand changes dynamically, static resource provisioning may not yield an optimal solution. With static provisioning, a surge in demand leads to under-provisioning, which disrupts or delays responses to users’ requests and causes SLA breaches. Conversely, when traffic decreases, resources are over-provisioned, which wastes resources and incurs higher costs for the CSP.

Managing resources for elastic workloads in a cloud environment is a challenging task for the CSP [2–5]. A better strategy is therefore needed, one that meets the SLA while also benefiting the CSP.

To strike a balance between minimizing SLA violations and reducing the CSP’s cost [6, 7], intelligent and automated strategies are needed for efficient auto-scaling, and autonomic computing provides a suitable paradigm for them. Autonomic computing is a model of systems that are self-managing, self-healing, self-configuring and self-protecting [8–10]. To achieve auto-scaling, IBM provides a framework called the MAPE (Monitor, Analyze, Plan, Execute) control loop. A system that adopts the MAPE loop repeatedly runs through its four phases: the monitoring phase iteratively collects information about the workload and resources; the analysis phase analyzes the information gathered by the monitoring phase; the planning phase decides whether to scale up or scale down; and the execution phase carries out the decision by adding or removing VMs. This work presents improved analysis and planning phases in terms of VM provisioning, cost and response time.

This research focuses on the design of an efficient autonomic framework. Our work follows the MAPE loop architecture, in which Chebyshev’s inequality is used in the analysis phase to forecast the workload from the workload history collected in the monitoring phase. The planning phase uses Fuzzy Q-Learning to improve the learning accuracy and convergence speed of Q-Learning. The research contributions of this paper are summarized as follows.

An autonomic framework, inspired by IBM’s MAPE loop model, is proposed to support auto-scaling of resources.

The analysis phase is enhanced with a prediction model based on Chebyshev’s inequality.

Reinforcement learning, specifically Fuzzy Q-Learning, is proposed as the decision maker in the planning phase.

A number of experiments have been conducted using real-world NASA workload data to evaluate the performance of the proposed work.

The rest of the paper is organized as follows. Section 2 provides background on autonomic computing and fuzzy Q-Learning. Section 3 presents related work. The proposed work is described in Section 4. Section 5 presents the results and discussion, and Section 6 concludes the paper and outlines future work.

2 Background

2.1 Autonomic computing

Autonomic computing refers to computing systems that are self-managing and adapt to their environment as required. The term autonomic computing was coined by IBM [11], which proposed a reference model named the MAPE control loop, shown in Fig. 1. MAPE stands for Monitoring, Analysis, Planning and Execution; the four phases share a common knowledge base, hence the name MAPE-K in Fig. 1. In the MAPE control loop, elements such as the operating system, VMs, CPU, storage and other services are treated as managed data center elements. Sensors observe the managed elements and collect information such as the waiting time and response time of cloud services. Effectors apply the required changes, such as adding or removing VMs. The autonomic manager consists of the four sub-phases (monitor, analysis, planning and execution): it collects information about resources from the sensors and applies the required changes through the effectors.

Fig. 1.

Autonomic computing MAPE-K Loop framework.

2.2 Q-learning and fuzzy Q-learning

Various machine-learning-based resource provisioning techniques have been used to date; among them, Neural Networks [12–14], Genetic Algorithms [15] and Markov Chains [16] are widely adopted. However, we opt for reinforcement learning (RL) for the following two reasons:

A training dataset of cloud workload is generally not available, and reinforcement learning does not require one.

RL is suitable for learning in dynamic and complex environments such as the cloud.

Q-Learning: Q-learning is a reinforcement learning technique in which the agent learns the value of each action from the reward received on moving from one state to another, and uses these values to choose future actions [17, 18]. In a dynamic environment, the optimal action is found by trial and error. An RL problem can be modelled as a Markov decision process (MDP) with a set of discrete states S, a set of actions A and a reward function R [19].
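As an illustration of the update rule behind Q-learning, the following is a minimal Python sketch of the standard tabular update Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]. The state labels and the three-element action set are hypothetical; α = 0.1 and γ = 0.8 are chosen only to match the learning and discount rates set later in Algorithm 4.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.8):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)  # greedy estimate of future value
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]

q = defaultdict(float)   # every unseen (state, action) pair starts at 0
actions = [-1, 0, 1]     # hypothetical: remove one VM, do nothing, add one VM
q_learning_update(q, "low", 1, reward=1.0, next_state="medium", actions=actions)
print(q[("low", 1)])     # 0.1
```

Because the table starts at zero, the first rewarded step moves Q(“low”, +1) by α × r; repeated updates propagate value backwards through γ.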

Fuzzy Q-Learning: Fuzzy Q-Learning is a reinforcement learning method grounded in dynamic programming. It takes its initial values from fuzzy rules and then optimizes these values using Q-Learning. In a fuzzy controller, the rules are constructed from expert knowledge and tuned at runtime with the knowledge gained by Q-Learning; the rules are refined continuously against the observed data until convergence is achieved.

3 Related works

This section reviews related work in autonomic computing and fuzzy Q-learning (FQ-L) in two parts: (i) autonomic computing and FQ-L for auto-scaling in the cloud, and (ii) FQ-L for tuning the rules of fuzzy controllers for knowledge evolution.

3.1 Autonomic computing and use of FQ-L for auto scaling

Xu et al. [20] proposed a two-level resource management framework to provision resources to each individual virtual container, where fuzzy logic is used in the local controller of each virtual container to handle the uncertainty of fluctuating workloads. Maurer et al. [21] presented a simulation engine embedded with Case-Based Reasoning (CBR) for knowledge management and decision making. Mao et al. [22] presented a methodology based on a monitor-control loop that aims to meet user-specified deadlines in a cost-efficient way. Ritter et al. [23] introduced a dynamic, cost-efficient autonomic provisioning framework for multi-tenant system topologies; the model provisions capacity to support client requests, uses resources cost-effectively and shares resources among multiple tenants. Frey et al. [24] discussed autonomic resource management in virtualized data centers using fuzzy logic. Jamshidi et al. [25] presented a type-2 fuzzy system that handles the uncertainty of elastic workloads, using fuzzy logic to specify elasticity rules. Jamshidi et al. [26] proposed FQL4KE, a self-learning fuzzy controller that modifies fuzzy rules automatically at runtime and supports dynamic elasticity management. Singh et al. [27] presented an energy-aware autonomic framework for scheduling resources efficiently in data centers. Amiri et al. [28] addressed resource and energy wastage using reinforcement learning and a fuzzy approach for dynamic resource distribution. Arani et al. [29] proposed a hybrid framework combining reinforcement learning and autonomic computing, which reduces SLA violations while minimizing total cost and increasing resource utilization. Arabnejad et al. [30] introduced two approaches, fuzzy SARSA learning and fuzzy Q-learning, for cloud auto-scaling. Aslanpour et al. [31] proposed a MAPE-K control loop architecture that focuses on reducing total cost through a cost-saving super professional executor.

3.2 FQ-L for tuning rules of fuzzy controller for knowledge evolution

Fuzzy Q-learning is not a new approach to decision making and knowledge evolution; a number of notable works exist in this field. Glorennec et al. [32] explored a dynamic version of fuzzy Q-learning (DFQ-L) and compared it with basic fuzzy Q-learning; DFQ-L addresses drawbacks of both Q-learning and fuzzy Q-learning. Berenji et al. [33] proposed GARIC-Q, a fuzzy learning methodology that uses intelligent agents controlled by FQL for incremental dynamic programming. Oh et al. [34] proposed a hybrid algorithm that combines the advantages of fuzzy Q-learning and conventional Q-learning. Jouffe et al. [35] discussed two methods, fuzzy actor-critic learning and fuzzy Q-learning, for online tuning of the conclusion part of a Fuzzy Inference System (FIS). In an FIS, interactions between fuzzy rules create problems in the learning process because the reinforcements attributed to individual rules can be incoherent; Bonarini et al. [36] introduced two strategies for distributing reinforcements to handle such interactions. Boumehraz et al. [37] proposed a reinforcement learning mechanism for tuning the rules of a fuzzy inference system. Er et al. [38] introduced dynamic fuzzy Q-learning (DFQL) for online rule tuning, with a novel self-organizing learning algorithm that identifies the structure automatically using Q-learning. Cabrerizo et al. [39] presented a methodology that addresses the challenges of group decision making and analyzed it using a fuzzy system.

4 Proposed work

4.1 Design of fuzzy controller

A fuzzy inference system (FIS) maps a set of inputs to an output through fuzzy rules. An FIS with N fuzzy rules can be defined as:

$\text{Rule}_{i}: \text{if } x_{1} \text{ is } A_{1}^{i(1)} \text{ and } x_{2} \text{ is } A_{2}^{i(2)} \text{ and } \dots \text{ and } x_{n} \text{ is } A_{n}^{i(n)} \text{ then } y \text{ is } B^{i}$ (1)

where $(A_{j}^{i(j)})_{j = 1, \dots, n}$ and $B^{i}$ ($i = 1, \dots, N$) are fuzzy subsets characterized by linguistic labels such as ‘small’, ‘medium’ and ‘large’, each with a membership function $x \mapsto \mu_{A}(x) \in [0, 1]$ quantifying the degree to which x belongs to A. Designing a fuzzy logic controller involves defining the fuzzy sets and membership functions of each input parameter, i.e., converting each crisp input into a fuzzy set through its membership values. The fuzzy inference system combines the rules with the membership functions to obtain a fuzzy output. Defuzzification converts the fuzzy output back into a crisp value. For input x, the defuzzified output is given by:

$y = \frac{\sum_{i = 1}^{N} \alpha_{i}(x) \times b^{i}}{\sum_{j = 1}^{N} \alpha_{j}(x)}$ (2)

where $\alpha_{i}(x)$ is the firing strength of rule i for input x and $b^{i}$ is the crisp consequent value of rule i.
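Eq. (2) is a firing-strength-weighted average of the rule consequents. A minimal Python sketch, assuming the consequents b^i are crisp numbers (here, hypothetical VM deltas):

```python
def defuzzify(firing_strengths, consequents):
    """Eq. (2): y = sum_i alpha_i(x) * b_i / sum_j alpha_j(x)."""
    total = sum(firing_strengths)
    if total == 0:
        raise ValueError("no rule fired for this input")
    return sum(a * b for a, b in zip(firing_strengths, consequents)) / total

# Two rules fire: one suggests +2 VMs (strength 0.75), the other 0 VMs (strength 0.25).
print(defuzzify([0.75, 0.25], [2, 0]))  # 1.5
```

The crisp result (1.5 VMs here) would still have to be rounded to a whole number of VMs before execution.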

Expert knowledge expressed as fuzzy rules can be applied to a given situation to choose appropriate actions. Fuzzy rules take an IF-THEN form that captures human knowledge and can handle complex situations.

To implement the fuzzy controller, in this work we use two inputs and one output. The first input is the predicted value obtained from the analysis phase and the second is the number of requests in a given time interval (i.e., the workload from the monitoring phase). The output is the scaling action to be performed; three scaling actions are defined: Scale In (removing VMs), Scale Out (adding VMs) and No Operation. As a first step, each input is fuzzified using a triangular membership function, which gives its degree of membership in each fuzzy set, so that every input value is associated with linguistic terms. The predicted value is labeled with three linguistic terms, ‘Low’, ‘Medium’ and ‘High’, and the workload uses the same three terms. Scale Out adds at most 4 VMs, i.e., Scale Out ∈ {+1, +2, +3, +4}; Scale In removes at most 4 VMs, i.e., Scale In ∈ {–4, –3, –2, –1}; and No Operation is {0}. The output therefore takes values in {–4, –3, –2, –1, 0, +1, +2, +3, +4}. The rules formed from expert knowledge using these parameters are shown below.

$\text{Rule}_{1}$: If the prediction is low and the workload is low, then no operation, with $Q(s_{1}, a_{i})$.

$\text{Rule}_{2}$: If the prediction is low and the workload is medium, then scale out, with $Q(s_{2}, a_{i})$.

$\text{Rule}_{3}$: If the prediction is low and the workload is high, then scale out, with $Q(s_{3}, a_{i})$.

$\text{Rule}_{4}$: If the prediction is medium and the workload is low, then scale in, with $Q(s_{4}, a_{i})$.

$\text{Rule}_{5}$: If the prediction is medium and the workload is medium, then no operation, with $Q(s_{5}, a_{i})$.

$\text{Rule}_{6}$: If the prediction is medium and the workload is high, then scale out, with $Q(s_{6}, a_{i})$.

$\text{Rule}_{7}$: If the prediction is high and the workload is low, then scale in, with $Q(s_{7}, a_{i})$.

$\text{Rule}_{8}$: If the prediction is high and the workload is medium, then scale in, with $Q(s_{8}, a_{i})$.

$\text{Rule}_{9}$: If the prediction is high and the workload is high, then no operation, with $Q(s_{9}, a_{i})$.

where $s_{i}$ ($1 \le i \le 9$) is the state and $a_{i}$ ($1 \le i \le 9$) is the action.
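The fuzzification step and the nine-rule table above can be sketched in Python as follows. The triangular breakpoints assume both inputs are normalized to a 0 to 100 scale, and the product t-norm is used for ‘and’; both are assumptions (min is an equally common choice for ‘and’), not details given in the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), peak 1 at b; a == b or b == c gives a shoulder."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical breakpoints on a 0-100 normalized scale, used for both inputs.
TERMS = {"Low": (0, 0, 50), "Medium": (0, 50, 100), "High": (50, 100, 100)}

def fuzzify(x):
    x = min(max(x, 0.0), 100.0)  # clamp to the assumed normalized range
    return {term: triangular(x, *abc) for term, abc in TERMS.items()}

# (prediction term, workload term, consequent) for Rule_1 .. Rule_9, as listed above.
RULES = [
    ("Low", "Low", "no operation"), ("Low", "Medium", "scale out"), ("Low", "High", "scale out"),
    ("Medium", "Low", "scale in"), ("Medium", "Medium", "no operation"), ("Medium", "High", "scale out"),
    ("High", "Low", "scale in"), ("High", "Medium", "scale in"), ("High", "High", "no operation"),
]

def firing_strengths(prediction, workload):
    """Firing strength of each rule, using the product t-norm for 'and' (an assumption)."""
    mu_p, mu_w = fuzzify(prediction), fuzzify(workload)
    return [mu_p[p] * mu_w[w] for p, w, _ in RULES]

print(fuzzify(75))                  # {'Low': 0.0, 'Medium': 0.5, 'High': 0.5}
strengths = firing_strengths(75, 90)
```

With these breakpoints the memberships of each input sum to one, so the nine firing strengths also sum to one and can be used directly as the normalized weights in the fuzzy Q-learning equations.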

4.2 Fuzzy Q-Learning for auto scaling

Dynamic resource provisioning can be achieved through auto-scaling, which is a decision-making problem: VMs are allocated to cloud-based applications by monitoring the current workload and its predicted future value. The goal is to keep the response time of the system below the desired response time specified in the SLA. To meet this constraint, we use Fuzzy Q-Learning in the planning phase of the autonomic MAPE loop. System characteristics such as workload and response time are continuously monitored, and the fuzzy rules defined above are continuously tuned by Q-Learning to achieve optimal results. A state is modeled by the pair (prediction, workload), for which FQ-L takes the most suitable action a, i.e., scaling VMs in or out. The FQ-L procedure consists of the following steps.

Initialize the q-values: The RL approach uses the past history of an application and stores what it learns in a q-value table. The q-values can be initialized to random values or to zeros. Each row of the q-table is associated with one rule, and its entries hold the values of that rule’s state-action pairs. The q-values are updated during learning.

Select an action: An action is chosen according to an exploration/exploitation strategy. Under the ɛ-greedy exploration policy, the action with the highest q-value is chosen with probability 1 − ɛ, and a random action is chosen with probability ɛ.

Calculation of the control action from the fuzzy logic controller: The fuzzy output is the weighted average of the actions proposed by the fired rules (with the firing strengths normalized to sum to one), calculated as:

$a = \sum_{i = 1}^{N} \mu_{i}(x) \times a_{i}$ (3)

where N is the number of rules, $\mu_{i}(x)$ is the firing strength of rule i for input signal x and $a_{i}$ is the action (consequent) selected for fired rule i.

Approximation of the Q-function: The Q-function is computed from the current q-values and the firing levels of the rules. Because the global action of a fuzzy inference system is composed from the actions of several rules, the Q-value of the state-action pair is calculated as:

$Q(s, a) = \sum_{i = 1}^{N} \mu_{i}(s) \times q[i, a_{i}]$ (4)

where Q(s, a) indicates how desirable it is to take action a in state s.

Calculation of the reward value: On receiving the current values of workload and prediction for the current state of the system, the controller calculates the reward by examining the amount of resources used and the SLA violations.

Calculation of the value of the new state s′: After taking action a and moving from state s to s′, the value of the new state is calculated as:

$V(s') = \sum_{i = 1}^{N} \mu_{i}(s') \cdot \max_{k} q[i, a_{k}]$ (5)

where $\max_{k} q[i, a_{k}]$ is the maximum q-value of rule i over all actions $a_{k}$, i.e., the best value that rule can achieve in state s′.

Calculation of the error signal: The error signal of FQ-L measures the deviation between the observed reward plus the discounted value of the new state and the current estimate Q(s, a). It is calculated as:

$\Delta Q_{FQL}(s, a) = r + \gamma \times V(s') - Q(s, a)$ (6)

where γ is the discount rate, which determines the importance of future rewards.

Update the q-values: At each step, the q-value of the action chosen by each rule is updated as:

$q[i, a_{i}] = q[i, a_{i}] + \eta \cdot \Delta Q_{FQL}(s, a) \cdot \mu_{i}(s(t))$ (7)

where η is the learning rate, taken between 0 and 1.
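The steps above (ɛ-greedy selection and Eqs. (3) to (7)) can be sketched in plain Python. This is a minimal illustration, not the authors’ MATLAB implementation: it assumes the firing strengths are normalized to sum to one, that each rule keeps one row of q-values over the nine VM deltas −4 to +4, and it uses γ = 0.8 and η = 0.1 as in Algorithm 4. The helper names are hypothetical.

```python
import random

ACTIONS = list(range(-4, 5))  # VM deltas -4..+4; index 4 is "no operation"

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best one for this rule."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda j: q_row[j])

def global_action(mu_s, chosen):
    """Eq. (3): a = sum_i mu_i(s) * a_i, where a_i is the VM delta chosen for rule i."""
    return sum(m * ACTIONS[c] for m, c in zip(mu_s, chosen))

def fql_update(q, mu_s, chosen, mu_next, reward, gamma=0.8, eta=0.1):
    """Eqs. (4)-(7), applied in place after action a was executed and (s', r) observed."""
    q_sa = sum(m * q[i][c] for i, (m, c) in enumerate(zip(mu_s, chosen)))  # Eq. (4)
    v_next = sum(m * max(q[i]) for i, m in enumerate(mu_next))             # Eq. (5)
    delta = reward + gamma * v_next - q_sa                                  # Eq. (6)
    for i, (m, c) in enumerate(zip(mu_s, chosen)):
        q[i][c] += eta * delta * m                                          # Eq. (7)
    return delta

# Toy step: 9 rules, all q-values zero, only Rule_5 (medium/medium) fires.
q = [[0.0] * len(ACTIONS) for _ in range(9)]
mu = [0.0] * 9
mu[4] = 1.0
chosen = [4] * 9                                   # every rule proposes "no operation"
a = global_action(mu, chosen)                      # 0.0 VMs
delta = fql_update(q, mu, chosen, mu, reward=1.0)  # error signal 1.0
print(a, delta, q[4][4])                           # 0.0 1.0 0.1
```

In a running controller, `global_action` is evaluated before the scaling action is executed, and `fql_update` is called only after the reward and the next state s′ have been observed; rules that did not fire (μ_i = 0) keep their q-values unchanged.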

4.3 Proposed framework

A resource provisioning framework that follows the MAPE control loop has been constructed. All three cloud service layers (SaaS, PaaS and IaaS) are included in the framework, whose operation is shown in Fig. 2. The SaaS layer provides cloud services to end users. The PaaS layer is responsible for provisioning resources to the cloud services offered by the SaaS layer. The IaaS layer contains the data center in which VMs are hosted and provides VMs to the SaaS layer. As shown in Fig. 2, the main units of the MAPE-based resource provisioning mechanism are Monitoring, Analysis, Planning and Execution. These four units are discussed below in the context of resource provisioning.

Fig. 2.

Framework of autonomic computing.

Monitor: This component collects metrics about both users and resources through two sub-components. The resource monitor collects information about the resources in use, such as computational, network and storage resources. The user monitor collects information about users’ requests, such as their number, type and size.

Analyzer: The analyzer uses the proposed prediction model based on Chebyshev’s inequality. It takes the data collected by the monitor component, such as the request arrival rate, and predicts the future workload so that appropriate action can be taken to meet the required QoS. The workload analyzer continuously forecasts the future workload and triggers the planning component to make the necessary changes.

Planner: The planner is the main unit of this framework; it determines the number of VMs allocated to each cloud service. VM allocation must be planned so that it abides by the SLA while optimizing cost. We use FBQ-LA to make the VM allocation decisions.

Executor: It consists of two sub-components: (i) the load balancer and (ii) the VM manager. The load balancer collects incoming user requests and distributes them across the VMs. The VM manager applies the scale-in or scale-out decision and allocates or de-allocates VMs accordingly.

4.4 Problem formulation

The notations used to formulate the problem are as follows. U is the number of users requesting a cloud service at a given time, and each user u issues $R_{u}$ requests. The total number of requests at a given time is:

$D = \sum_{u = 1}^{U} R_{u}$ (8)

Here, $req_{u}^{r}$ denotes that the u-th user requests r resources, together with the maximum cost budget that user u offers the SaaS provider for using a cloud service. Let $S = \{S_{1}, S_{2}, \dots, S_{x}\}$ be the set of cloud services offered by the SaaS provider. To execute these services, enough VMs must be allocated to optimize cost and minimize SLA violations. Let $N_{i}$ be the number of VMs required to execute service $S_{i}$ in a time interval Δt. VM utilization is calculated as:

$U_{i}(t) = \frac{\text{CloudletPE}_{k} \times \text{CloudletLength}_{k}}{\text{PE}_{k} \times \text{MIPS}}$ (9)

where the cloudlets are the tasks being executed, $\text{CloudletPE}_{k}$ is the number of processing elements required by task k, $\text{CloudletLength}_{k}$ is the size (length) of task k, $\text{PE}_{k}$ is the number of processing elements assigned to the requested VM and MIPS is the VM’s processing capacity in millions of instructions per second. The total cost incurred by the SaaS provider for processing all requested cloud services is:

$\text{TotalCost} = \text{VMCost} + \text{PenaltyCost}$ (10)

where the VM cost is the sum of the individual VM costs, each of which depends on the VM price and the duration for which the VM is active:

$\text{VMCost} = \sum_{n = 1}^{N} \text{VMCost}_{n}$ (11)

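$\text{VMCost}_{n} = (\text{VMPrice}_{s} \times \text{VMhour}_{n}) + (\text{VMInprice}_{s} \times T_{n})$ (12)

where the initiation price $\text{VMInprice}_{s}$ is expressed in dollars per hour and the setup time $T_{n}$ in minutes.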

The penalty cost is incurred for SLA violations. The objective is to minimize the total cost, i.e., minimize TotalCost = VMCost + PenaltyCost.
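Eqs. (10) to (12) translate into a short cost calculator. Two assumptions are made: the initiation price (quoted per hour) is charged for the setup time T_n converted from minutes to hours, and the penalty cost is passed in directly because its formula is not given. Prices are those of the Small VM in Table 3; the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VMType:
    price_per_hr: float        # VMPrice_s ($/hr)
    init_price_per_hr: float   # VMInprice_s ($/hr)
    init_time_min: float       # T_n, setup time (min)

def vm_cost(vm: VMType, active_hours: float) -> float:
    """Eq. (12); assumes the initiation price is charged for T_n converted to hours."""
    return vm.price_per_hr * active_hours + vm.init_price_per_hr * (vm.init_time_min / 60.0)

def total_cost(vms, penalty_cost: float) -> float:
    """Eqs. (10)-(11): sum of per-VM costs plus the SLA penalty cost."""
    return sum(vm_cost(vm, hours) for vm, hours in vms) + penalty_cost

small = VMType(price_per_hr=0.25, init_price_per_hr=0.37, init_time_min=3)  # Table 3
# Two small VMs, active for 2 h and 1 h, with no SLA penalty.
print(round(total_cost([(small, 2.0), (small, 1.0)], penalty_cost=0.0), 4))  # 0.787
```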

The proposed work follows the MAPE control loop, in which each component plays its role in avoiding SLA violations. The notations used in this work are summarized in Table 1.

Table 1

Model notations

Notation	Definition
U	Total number of users
u	A particular user
D	Total requests of all users
S_i	i-th cloud service
req_u^r	u-th user requests r resources
N_i	Number of VMs allocated to service S_i
Δt	Time interval
W_i(Δt)	Number of requests in time interval Δt
U_i(Δt)	VM utilization in time interval Δt
T_n	Setup time of VM n (min)
VMCost_n	Cost of the n-th VM
s	VM type (VM heterogeneity)
VMhour_n	Time during which the n-th VM is active
VMInprice_s ($/hr)	Initiation price of type-s VM
VMPrice_s ($/hr)	Price of type-s VM
SD	Standard deviation

4.5 Pseudo code for autonomic computing and various phases

Algorithm 1: Autonomic Computing

1. Initialization: Initialize an appropriate number of VMs.
2. while (there are end-user requests) do
3.   begin
4.     for (each cloud service provided by the CSP) do
5.     begin
6.       Monitoring(M);
7.       Analysis(A);
8.       Planning(P);
9.       Execution(E);
10.    end for
11.  end while
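Algorithm 1 can be sketched in Python as a loop over services that calls four phase functions. The callables `monitor`, `analyze`, `plan` and `execute`, and the `max_rounds` safety bound, are hypothetical placeholders for the components described in Section 4.3; they are not part of the paper’s implementation, which used CloudSim and MATLAB.

```python
from typing import Any, Callable, Sequence

def autonomic_loop(
    services: Sequence[str],
    has_requests: Callable[[], bool],
    monitor: Callable[[str], Any],
    analyze: Callable[[str, Any], Any],
    plan: Callable[[str, Any, Any], Any],
    execute: Callable[[str, Any], None],
    max_rounds: int = 1000,
) -> int:
    """Run Monitor-Analyze-Plan-Execute for every service while requests keep arriving."""
    rounds = 0
    while has_requests() and rounds < max_rounds:  # max_rounds guards against endless loops
        for service in services:
            metrics = monitor(service)                   # e.g. W_i(dt), N_i(dt), U_i(dt)
            prediction = analyze(service, metrics)       # e.g. Chebyshev-based forecast
            action = plan(service, prediction, metrics)  # e.g. FBQ-LA decision
            execute(service, action)                     # add/remove VMs
        rounds += 1
    return rounds

# Toy run with stub phases: requests arrive for two rounds, then stop.
log = []
arrivals = iter([True, True, False])
rounds = autonomic_loop(
    ["web", "db"],
    lambda: next(arrivals),
    monitor=lambda s: {"requests": 100},
    analyze=lambda s, m: m["requests"],
    plan=lambda s, p, m: 0,
    execute=lambda s, a: log.append((s, a)),
)
print(rounds, len(log))  # 2 4
```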

Algorithm 2: Monitoring Phase

1. begin
2.   User Monitor (workload in interval Δt, W_i(Δt)) /* users’ requests are continuously monitored */
3.   Resource Monitor (number of VMs N_i(Δt), resource utilization in interval Δt, U_i(Δt))
4. end

In the monitoring phase, both user metrics and resource metrics are monitored continuously. For a cloud service S_i in the Δt-th time interval, the user monitor collects the number of requests W_i(Δt), and the resource monitor records the number of leased VMs N_i(Δt) and their CPU utilization U_i(Δt).

Algorithm 3: Analysis Phase

1. begin
2.   Input: workload history from the monitoring phase (req_u^r of each user)
3.   Output: prediction value
4.   Prediction value = P(Mean(workload) – 6 × SD_workload < workload < Mean(workload) + 6 × SD_workload)
5.   return prediction value
6. end

This phase uses Chebyshev’s inequality to predict the future workload of cloud services from the output of the monitoring phase.

Chebyshev’s Inequality for predicting the future workload:

Chebyshev’s inequality guarantees that, for any probability distribution with a finite mean and variance, no more than a certain fraction of values can lie more than a certain distance from the mean [40, 41]. Specifically, no more than 1/k² of the distribution’s values can be more than k standard deviations away from the mean; equivalently, at least 1 – 1/k² of the values lie within k standard deviations of the mean. In other words, the probability that a random variable X deviates from its mean E(X) by less than k times its standard deviation is at least 1 – 1/k². In mathematical notation:

$P(|X - E(X)| < k\sigma_{X}) \geq 1 - \frac{1}{k^{2}}$ (13)

where E(X) is the expectation (mean) of X, $\sigma_{X}$ is its standard deviation and k > 1 is a constant, typically taken to be 3 or 6. The absolute value can be removed by considering two cases. If $X - E(X) \geq 0$:

$|X - E(X)| = X - E(X) < k\sigma_{X} \;\Rightarrow\; X < E(X) + k\sigma_{X}$ (14)

Similarly, if $X - E(X) < 0$:

$|X - E(X)| = E(X) - X < k\sigma_{X} \;\Rightarrow\; E(X) - k\sigma_{X} < X$ (15)

From Eqs. (14) and (15), we conclude that:

$P(E(X) - k\sigma_{X} < X < E(X) + k\sigma_{X}) \geq 1 - \frac{1}{k^{2}}$ (16)

Taking k = 6, we get:

$P(E(X) - 6\sigma_{X} < X < E(X) + 6\sigma_{X}) \geq 1 - \frac{1}{36} \approx 0.972$ (17)

Applied to workload prediction, this becomes:

$P(\text{Mean}(\text{Workload}) - 6\,\text{SD}_{\text{workload}} < \text{Workload} < \text{Mean}(\text{Workload}) + 6\,\text{SD}_{\text{workload}}) \geq \frac{35}{36}$ (18)
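Eqs. (16) to (18) can be computed directly from a workload history. The Python sketch below assumes the mean and standard deviation are estimated from the observed history (using the population standard deviation, which is our choice), so the 1 − 1/k² guarantee holds only approximately for such estimates. The paper does not spell out how a single prediction value is derived from this interval, so the function simply returns the bounds and the guaranteed coverage; the workload numbers are made up.

```python
import statistics

def chebyshev_interval(history, k=6.0):
    """Bounds (mean - k*sd, mean + k*sd) and the coverage 1 - 1/k**2 promised by Eq. (16)."""
    if k <= 1:
        raise ValueError("k must be greater than 1 for a non-trivial bound")
    mean = statistics.fmean(history)
    sd = statistics.pstdev(history)  # population SD of the observed history (our choice)
    return mean - k * sd, mean + k * sd, 1.0 - 1.0 / k**2

hourly_requests = [100, 110, 90, 100]  # made-up requests per hour
low, high, coverage = chebyshev_interval(hourly_requests, k=6)
print(round(low, 2), round(high, 2), round(coverage, 4))  # 57.57 142.43 0.9722
```

Choosing k = 3 instead would narrow the interval to ±3 SD at the cost of a weaker guarantee (at least 8/9 ≈ 0.889).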

We use a fuzzy Q-learning approach in the autonomic loop to choose actions that optimize the total cost by adjusting the number of resources. Based on the nine fuzzy rules, an action is chosen and the Q-value table is updated until an optimal solution is reached. A positive reward is given if the chosen action is appropriate; otherwise a negative or low reward is given. For each rule, the optimal action is the one with the maximum q-value in the Q-table. The execution phase then adds or removes VMs according to the action selected in the planning phase. The VM types considered for Q-Learning and Fuzzy Q-Learning are shown in Table 2 and Table 3.
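The reward function itself is not given in closed form. The sketch below is one hypothetical reward consistent with the description above (positive for an appropriate action, negative otherwise), based on the SLA response time and VM utilization; the utilization threshold and the reward magnitudes are assumptions, not values from the paper.

```python
def reward(response_time, sla_response_time, utilization, min_utilization=0.3):
    """Hypothetical reward for one control interval (thresholds are illustrative).

    response_time     : observed mean response time in the interval
    sla_response_time : response-time limit agreed in the SLA
    utilization       : mean VM utilization in [0, 1]
    """
    if response_time > sla_response_time:
        return -1.0   # SLA violated: the action under-provisioned
    if utilization < min_utilization:
        return -0.5   # SLA met but VMs mostly idle: the action over-provisioned
    return 1.0        # SLA met with reasonably used VMs: appropriate action

print(reward(0.8, 1.0, 0.6))  # 1.0
```

Penalizing SLA violations more heavily than idle capacity reflects the priority stated in the text (meet the SLA first, then reduce cost); other weightings are equally plausible.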

Algorithm 4: Planning Phase and Execution Phase

1. begin
2.   Input: prediction value from the analysis phase and number of requests for service S_i from the monitoring phase (W_i(Δt))
3.   Output: executed action (Scale In / Scale Out / No Operation)
4.   Set discount rate γ = 0.8 and learning rate η = 0.1
5.   Initialize the q-values arbitrarily
6.   repeat
7.     Determine the current state s from the fuzzy rules of the fuzzy controller, based on the prediction value and the number of requests
8.     Choose a partial action a_i for each fired rule of state s using the exploration/exploitation policy
9.     Compute the action a from the a_i (Eq. 3) and approximate Q(s, a) from the q-values and rule firing strengths (Eq. 4)
10.    Apply action a and observe the new state s′
11.    Receive the reinforcement signal (reward) r and compute the value of the new state (Eq. 5)
12.    Compute the error signal ΔQ_FQL(s, a) (Eq. 6)
13.    Update the q-values (Eq. 7)
14.    s ← s′
15.  until convergence
16.  Selected Action = action with the maximum q-value for the current state
17.  if (Selected Action == Scale In) then release the appropriate number of VMs
18.  else if (Selected Action == Scale Out) then assign the appropriate number of VMs
19.  else No Operation
20. end
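The final steps of Algorithm 4 (selecting the action with the maximum q-value, then scaling in, scaling out or doing nothing) can be sketched as below. The mapping of q-table columns to VM deltas −4 to +4 follows Section 4.1; the bounds `min_vms` and `max_vms` are hypothetical safeguards not specified in the paper.

```python
ACTIONS = list(range(-4, 5))  # q-table column j -> VM delta; column 4 is "no operation"

def select_action(q_row):
    """Greedy lookup: index of the largest q-value for the current state."""
    return max(range(len(q_row)), key=lambda j: q_row[j])

def execute(current_vms, action_index, min_vms=1, max_vms=50):
    """Scale in (negative delta), scale out (positive delta) or do nothing (0)."""
    delta = ACTIONS[action_index]
    return min(max(current_vms + delta, min_vms), max_vms)

q_row = [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.7, 0.0, 0.0]  # hypothetical learned q-values
print(execute(5, select_action(q_row)))  # 7 (column 6 -> +2 VMs)
```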

Table 2

Type of VMs considered for Q-Learning

VM Type	Extra Large	Large	Medium	Small
CPU(MIPS)	5000	3000	1500	800
DISK	1700	1000	800	200
VMPrice($/hr.)	3.67	2.71	1.81	0.25
VM Initiation price($/hr.)	4.13	3.81	1.67	0.37
VM Initiation time (min)	8	5	4	3

Table 3

Type of VMs considered for Fuzzy Q-Learning

VM Type	Small
CPU(MIPS)	800
DISK	200
VM price ($/hr.)	0.25
VM Initiation price ($/hr.)	0.37
VM Initiation time(min)	3

5 Results and discussions

This section presents the experimental results of the proposed work. First, the experimental setup and assumptions are described; then the proposed work is compared with existing work.

5.1 Experimental setup

The experiments in this section were run using the CloudSim 3.0 toolkit and MATLAB R2015a. The monitoring, analysis and execution phases are validated using CloudSim [42], whereas the planning phase is validated using MATLAB, in which the fuzzy controller was designed and Fuzzy Q-Learning and Q-Learning were implemented. Four heterogeneous VM types (Small, Medium, Large and Extra Large) are created when Q-Learning is used in the planning phase, whereas homogeneous Small VMs are used when the FBQ-LA methodology is implemented. The configurations of these VMs, with their different costs and capacities, are given in Table 2 and Table 3. Real-world workload traces from NASA [43] are used, as their varied characteristics give more realistic results. The workload is collected at 1-hour intervals over 12 hours.

5.2 Comparative analysis of proposed work with existing work

We compare our approach with the autonomic framework of Arani et al. [29], which uses a hybrid of autonomic computing and reinforcement learning. That framework is inspired by the MAPE-K loop and uses LRM in the analysis phase and Q-Learning for decision making in the planning phase. We first compare our prediction technique with LRM, and then analyze combinations of LRM-based or Chebyshev’s-inequality-based prediction in the analysis phase with Q-Learning or Fuzzy Q-Learning in the planning phase.

5.2.1 Chebyshev’s inequality method versus linear regression model (LRM) in analysis phase

In our framework, Chebyshev’s inequality is used to predict the next-hour workload, and its predictions are compared with those of LRM. Our analysis shows that the predictions obtained with Chebyshev’s inequality are very close to the actual workload, whereas LRM shows large deviations; prediction by Chebyshev’s inequality is therefore more accurate than LRM. The simplicity of Chebyshev’s inequality also makes it attractive for this prediction task. As shown in Fig. 3, LRM deviates considerably from the actual workload in the 2nd hour, whereas Fig. 4 shows how Chebyshev’s inequality reduces this difference.
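To make the comparison concrete, the sketch below contrasts a one-step LRM forecast (ordinary least squares of workload on the hour index, extrapolated one hour ahead) with one possible Chebyshev-based estimate. The Chebyshev variant is an assumption for illustration and not necessarily the formulation defined earlier in the paper: it returns the upper value μ + kσ with k = 1/√(1 − p), for which Chebyshev’s inequality guarantees P(|X − μ| ≥ kσ) ≤ 1/k² = 1 − p. The workload history and the confidence level p are hypothetical.

```python
import math
import statistics


def lrm_forecast(history):
    """Fit y = a + b*t by least squares over t = 0..n-1 and predict t = n."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(history)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept + slope * n


def chebyshev_forecast(history, p=0.75):
    """Upper estimate mu + k*sigma with k chosen so that 1/k^2 = 1 - p (0 < p < 1).

    Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2, so a value drawn from the same
    distribution exceeds this estimate with probability at most 1 - p.
    """
    mu = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    k = 1 / math.sqrt(1 - p)
    return mu + k * sigma


history = [100, 120, 140, 160]  # hypothetical requests per hour
print(lrm_forecast(history))        # 180.0
print(chebyshev_forecast(history))  # about 174.72
```

The LRM forecast follows the trend, while the Chebyshev estimate depends only on the mean and spread of the recent history, which is one reason it reacts less sharply to a single outlying hour.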

Fig. 3.

LRM based prediction.

Fig. 4.

Chebyshev’s Inequality based prediction.

5.2.2 Linear regression model (existing) in analysis phase and fuzzy Q learning (proposed) in planning phase

From the above, we conclude that Chebyshev’s inequality gives a better prediction than LRM. Because LRM predictions are not close to the actual workload, we apply fuzzy rules to the values predicted by LRM to manage fluctuations in the workload and obtain more realistic results. For this, we formulated 9 fuzzy rules corresponding to 9 states and constructed a 9x9 Q-matrix that shows how each rule determines the action to be taken: scale in, scale out or no operation. Each row corresponds to a rule, and each column gives the number of VMs to be removed or added: the first four columns are scale-in actions with values from –4 to –1, the 5th column is the no-operation action, and the last four columns are scale-out actions with values from +1 to +4. The first rule states that if the prediction is low and the workload is low, no operation should be taken, so the Q-value for no operation, Q (1, 5), should become the highest in its row. This was indeed observed: after 27 iterations the Q-values remained constant, the stopping criterion was met, and the Q-matrix was optimized, as shown in Fig. 5.
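The rule numbering can be reproduced with a small fuzzifier. The sketch assumes that prediction and workload are normalised to [0, 1], uses hypothetical triangular membership functions for low, medium and high, combines the two antecedents with the minimum t-norm, and numbers the rules as 3·(prediction level) + (workload level) + 1. This ordering is our assumption, chosen because it agrees with rules 1 (low, low), 3 (low, high) and 4 (medium, low) described in this section. Column c of the 9x9 Q-matrix maps to a VM change of c − 5.

```python
LEVELS = ("low", "medium", "high")


def memberships(x):
    """Hypothetical triangular memberships of a normalised value x in [0, 1]."""
    low = max(0.0, 1.0 - 2.0 * x)                 # peak at 0, zero from 0.5
    medium = max(0.0, 1.0 - 2.0 * abs(x - 0.5))   # peak at 0.5
    high = max(0.0, 2.0 * x - 1.0)                # zero up to 0.5, peak at 1
    return (low, medium, high)


def rule_strengths(prediction, workload):
    """Firing strength (min t-norm) of rules 1..9, keyed by rule number."""
    mu_p = memberships(prediction)
    mu_w = memberships(workload)
    return {3 * i + j + 1: min(mu_p[i], mu_w[j]) for i in range(3) for j in range(3)}


def dominant_rule(prediction, workload):
    strengths = rule_strengths(prediction, workload)
    return max(strengths, key=strengths.get)


def column_to_delta(column):
    """Columns 1-4: scale in by 4..1 VMs; 5: no operation; 6-9: scale out by 1..4."""
    if not 1 <= column <= 9:
        raise ValueError("column must be in 1..9")
    return column - 5


print(dominant_rule(0.1, 0.05))  # 1: prediction low, workload low
print(dominant_rule(0.1, 0.95))  # 3: prediction low, workload high
print(dominant_rule(0.5, 0.1))   # 4: prediction medium, workload low
```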

Fig. 5.

Optimized Q (1, 5).

In the third rule (prediction low and workload high), a large number of VMs must be added. Hence every scale-out column is reinforced: the 9th column of the third row gets the highest value, and the other scale-out columns converge to smaller values, which matches the observed behaviour. The Q-values for the 8th and 9th columns of the third rule are shown in Fig. 6(a) and 6(b). Since Q (3, 9) is greater than Q (3, 8), a scale-out operation adding 4 VMs is carried out. In this scenario too, the values are optimized after 27 iterations.
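The convergence behaviour described for rule 3 can be illustrated with a deterministic sketch of the Q-learning update Q(s, a) ← Q(s, a) + α[r + γ max Q(s′, a′) − Q(s, a)], applied to all nine columns of one row in each iteration. The reward r = 1/(1 + |Δ − Δ*|), which favours VM changes Δ close to the change Δ* suggested by the rule, the assumption that the same rule fires again (s′ = s), and the values of α, γ and the tolerance are all hypothetical. The loop stops when no Q-value changes by more than the tolerance, mirroring the "values remain constant" criterion, but it is not the paper’s MATLAB implementation and is not meant to reproduce the reported 27 iterations.

```python
def train_rule_row(target_delta, alpha=0.5, gamma=0.5, tol=1e-6, max_iter=1000):
    """Update all 9 actions of one rule until the Q-values stop changing.

    Column c (1..9) corresponds to a VM change of c - 5. Returns (q_row, iterations),
    where q_row[c - 1] holds Q(rule, c).
    """
    deltas = [c - 5 for c in range(1, 10)]
    rewards = [1.0 / (1.0 + abs(d - target_delta)) for d in deltas]
    q = [0.0] * 9
    for iteration in range(1, max_iter + 1):
        best_next = max(q)  # assumption: the same rule fires in the next step
        new_q = [qa + alpha * (r + gamma * best_next - qa) for qa, r in zip(q, rewards)]
        change = max(abs(a - b) for a, b in zip(new_q, q))
        q = new_q
        if change < tol:  # stopping criterion: Q-values are (numerically) constant
            return q, iteration
    return q, max_iter


# Rule 3 (prediction low, workload high) suggests adding 4 VMs.
q_row, iterations = train_rule_row(target_delta=4)
best_column = q_row.index(max(q_row)) + 1
print(best_column)          # 9
print(q_row[8] > q_row[7])  # True: Q(3, 9) > Q(3, 8)
```

Because every column shares the same bootstrap term γ·max Q, the ordering of the Q-values always follows the ordering of the rewards, so the column suggested by the rule ends up with the largest value while its neighbours settle just below it.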

Fig. 6.

(a) Optimized Q (3, 8). (b) Optimized Q (3, 9).

The fourth rule states that if the prediction is medium and the workload is low, the number of VMs should be decreased slightly. All scale-in columns are updated: the 3rd column of the fourth row gets the highest value, and the other scale-in columns converge to smaller values. The values are again optimized after the 27th iteration. Under this rule, the scale-in action of column 3 is selected, which, according to the column mapping given above, removes 2 VMs. The Q-values Q (4, 3) and Q (4, 4) are shown in Fig. 7(a) and 7(b).
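The execution step that follows for rule 4 can be sketched as greedy selection over the optimized row, followed by applying the VM change of the chosen column. The Q-values below are hypothetical numbers chosen so that column 3 is the maximum, and the lower bound of one running VM (`min_vms`) is our assumption rather than a parameter from the paper.

```python
def greedy_column(q_row):
    """Return the 1-based column with the largest Q-value (first one on ties)."""
    return q_row.index(max(q_row)) + 1


def apply_action(current_vms, column, min_vms=1):
    """Apply the VM change of column c (c - 5) without dropping below min_vms."""
    delta = column - 5
    return max(min_vms, current_vms + delta)


# Hypothetical optimized row for rule 4 (prediction medium, workload low).
q_rule4 = [0.20, 0.50, 0.90, 0.60, 0.10, 0.0, 0.0, 0.0, 0.0]
column = greedy_column(q_rule4)
print(column)                   # 3
print(apply_action(5, column))  # 3: five VMs scaled in by two
print(apply_action(1, column))  # 1: clamped at min_vms
```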

Fig. 7.

(a) Optimized Q (4, 3) (b) Optimized Q (4, 4).

With this framework, both the total cost and the penalty cost can be reduced. Applying fuzzy rules, and forcing each state to take the action prescribed by its rule, not only handles dynamic changes in the workload but also reduces the total cost incurred by both the user and the CSP.
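As a rough illustration of how the VM cost of a provisioning schedule could be tallied for the Small VM of Table 3, the sketch below charges $0.25 per VM-hour and, as an assumption of ours, interprets the initiation price of $0.37/hr as being charged over the 3-minute initiation time of every newly started VM. The SLA penalty cost is omitted and the schedule is hypothetical, so this need not match the cost model behind Fig. 9.

```python
VM_PRICE_PER_HOUR = 0.25    # Table 3, Small VM
INIT_PRICE_PER_HOUR = 0.37  # Table 3, Small VM
INIT_TIME_HOURS = 3 / 60    # Table 3: 3 minutes


def schedule_cost(vms_per_hour, initial_vms=0):
    """Rental cost plus an assumed start-up charge for every VM added."""
    rental = sum(vms_per_hour) * VM_PRICE_PER_HOUR
    launches = 0
    previous = initial_vms
    for count in vms_per_hour:
        launches += max(0, count - previous)  # only scale-out starts new VMs
        previous = count
    startup = launches * INIT_PRICE_PER_HOUR * INIT_TIME_HOURS
    return rental + startup


print(round(schedule_cost([2, 3, 3, 1]), 4))  # 2.3055
```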

5.2.3 Chebyshev’s inequality in analysis phase with fuzzy Q learning in planning phase

Chebyshev’s inequality already gives very accurate predictions, so there is no need to pass the predicted values through the fuzzy controller: the result is already near-optimal. Fuzzy controllers are useful mainly when there is considerable uncertainty. This case is left as an important topic for future investigation.

5.2.4 Chebyshev’s inequality (proposed) in the analysis phase with Q-learning in planning phase (existing)

Arani et al. [29] proposed a Q-Learning-based planning phase for the MAPE loop, in which the CPU utilization of the cloud service at a particular time is evaluated. They consider three states based on CPU utilization: Over Utilization, Under Utilization and Normal Utilization. We adopt the same states in our work. On this basis, a 3x3 matrix is constructed in the planning phase, whose rows are the three states and whose columns are the corresponding actions Scale Out, Scale In and No-Operation, as shown in Fig. 8(a), 8(b) and 8(c). The matrix is optimized by Q-Learning, and the action with the maximum Q-value is chosen in each state. The cost comparison of Fuzzy Q-Learning and Q-Learning in the planning phase is depicted in Fig. 9. From these validations, Q-Learning is observed to converge at a much slower rate, as shown by Fig. 7(a), (b) and Fig. 8(a), (b), (c).
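A minimal tabular version of this 3x3 planning matrix is sketched below. The utilization thresholds (below 30% is under-utilization, above 80% is over-utilization), the reward of +1 for the action that matches the state and −1 otherwise, the uniformly random utilization samples, and the learning parameters are all hypothetical choices for illustration and are not taken from Arani et al. [29]. The sketch uses ε-greedy exploration with a fixed random seed.

```python
import random

STATES = ("over", "under", "normal")          # rows, as in Fig. 8
ACTIONS = ("scale_out", "scale_in", "no_op")  # columns
EXPECTED = {"over": "scale_out", "under": "scale_in", "normal": "no_op"}


def classify(cpu_utilization, low=0.3, high=0.8):
    """Map CPU utilization in [0, 1] to a state index (hypothetical thresholds)."""
    if cpu_utilization > high:
        return 0
    if cpu_utilization < low:
        return 1
    return 2


def train(steps=3000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in STATES]
    state = classify(rng.random())
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(ACTIONS))      # explore
        else:
            action = q[state].index(max(q[state]))    # exploit
        reward = 1.0 if ACTIONS[action] == EXPECTED[STATES[state]] else -1.0
        next_state = classify(rng.random())           # next observed utilization
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train()
for s, row in zip(STATES, q_table):
    print(s, ACTIONS[row.index(max(row))])
# over scale_out / under scale_in / normal no_op
```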

Fig. 8.

(a) Normal Utilization with No-operation (b) Over-utilization with Scale Out (c) Under-utilization with Scale In.

Fig. 9.

Comparison of VMs cost by using LRM in analysis phase with Q-L or Fuzzy Q-L in planning phase.

We also observe that, with Q-Learning, the Q-values are optimized only after 80 iterations in all three cases. By contrast, Figs. 5–7 show that FQ-L reaches the optimum earlier, after 27 iterations, so Q-Learning needs roughly three times as many iterations as FQ-L.

Thus, we conclude that using FQ-L reduces response time. With FQ-L, the required number of VMs is obtained and these VMs are assigned when needed, and VM cost is also reduced by using homogeneous VMs of the Small type (Fig. 9). Q-Learning stores Q-values in a lookup table, which becomes impractical for large state and action spaces, whereas FQ-L stores Q-values per fuzzy rule and therefore handles large spaces more easily. FQ-L also shortens training by embedding prior knowledge in the rules. As a consequence, the proposed framework complies with the SLA.
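To illustrate why a rule-based representation scales better than a lookup table, the sketch below implements a simplified fuzzy Q-learning variant; it is not the exact algorithm of Glorennec [32] or Jouffe [35], nor the paper’s MATLAB code. Each of three hypothetical rules (load low, medium, high) keeps one q-value per action, and the Q-value of a continuous, normalised load x is the firing-strength-weighted sum Q(x, a) = Σᵢ φᵢ(x)·q[i][a], so the table has 3 × 3 entries however finely x varies. The actions (remove one VM, do nothing, add one VM), the reward −|a − a*(x)| with a* the preferred action for the load band, and all learning parameters are assumptions. Training uses uniformly random actions, which Q-learning permits because it is off-policy.

```python
import random

ACTIONS = (-1, 0, 1)  # remove one VM, no operation, add one VM


def firing_strengths(x):
    """Normalised strengths of hypothetical rules low/medium/high for x in [0, 1]."""
    low = max(0.0, 1.0 - 2.0 * x)
    medium = max(0.0, 1.0 - 2.0 * abs(x - 0.5))
    high = max(0.0, 2.0 * x - 1.0)
    total = low + medium + high
    return [low / total, medium / total, high / total]


def q_value(q, phi, a):
    """Global Q-value: rule q-values weighted by the rules' firing strengths."""
    return sum(phi[i] * q[i][a] for i in range(3))


def ideal_action(x):
    return -1 if x < 1 / 3 else (1 if x > 2 / 3 else 0)


def train_fql(steps=20000, eta=0.05, gamma=0.5, seed=1):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(3)]  # one row per rule
    x = rng.random()
    for _ in range(steps):
        phi = firing_strengths(x)
        a = rng.randrange(len(ACTIONS))            # random behaviour policy
        reward = -abs(ACTIONS[a] - ideal_action(x))
        x_next = rng.random()
        phi_next = firing_strengths(x_next)
        v_next = max(q_value(q, phi_next, b) for b in range(len(ACTIONS)))
        td_error = reward + gamma * v_next - q_value(q, phi, a)
        for i in range(3):                         # credit each rule by its strength
            q[i][a] += eta * td_error * phi[i]
        x = x_next
    return q


def greedy_action(q, x):
    phi = firing_strengths(x)
    return ACTIONS[max(range(len(ACTIONS)), key=lambda b: q_value(q, phi, b))]


q = train_fql()
print([greedy_action(q, x) for x in (0.05, 0.5, 0.95)])  # [-1, 0, 1]
```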

5.3 Findings on comparative analysis

We adopted the MAPE model proposed by Arani et al. [29] for the comparative analysis and compared our work with the existing approach phase by phase (analysis phase and planning phase). First, the proposed analysis model is compared with the existing LRM model. Then, three combinations are analyzed: (i) LRM in the analysis phase (existing) with FBQ-LA in the planning phase (proposed), (ii) Chebyshev’s Inequality in the analysis phase (proposed) with FBQ-LA in the planning phase (proposed), and (iii) Chebyshev’s Inequality in the analysis phase (proposed) with Q-Learning in the planning phase (existing). Finally, a cost comparison using LRM in the analysis phase with Q-Learning or Fuzzy Q-Learning in the planning phase is performed.

6 Conclusion and future work

In this paper, resource provisioning for handling elastic demand for cloud services has been considered. Handling such demand in a cloud environment is challenging and requires an efficient auto-scaling mechanism. To address this issue, we have proposed an autonomic framework based on Chebyshev’s inequality and a Fuzzy based Q-Learning approach (FBQ-LA) for cloud infrastructure management. The proposed work reduces both the wastage and the shortage of resources and limits SLA violations. The total cost is reduced by dynamically allocating the required number of homogeneous resources as and when needed. Implementing fuzzy rules for each state further increases the robustness of the MAPE loop, and the fluctuating workload of a cloud service is handled well by incorporating human knowledge in the form of fuzzy rules. Moreover, Fuzzy Q-Learning speeds up convergence and learning in the planning phase, which reduces response time. A limitation of the proposed approach is that, when FBQ-L is applied in the planning phase, it has been evaluated only with a homogeneous group of VMs.

In future work, heterogeneous VMs will be considered for dynamic allocation with FBQ-L to obtain more realistic outcomes. A fuzzy SARSA learning methodology will also be applied in autonomic computing, and the performance of both approaches will be evaluated against an optimal solution. The MAPE-K control loop can be made more dynamic by applying well-formulated fuzzy rules in every phase, and it can be further enhanced with dynamic fuzzy Q-Learning.


Acknowledgments

The authors wish to express their gratitude and heartiest thanks to the editor and anonymous reviewers for their valuable suggestions in improving the paper significantly. The authors also thank the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India for providing their research support.

References

1. Buyya, Vecchiola and S.T. Selvi, Mastering cloud computing: Foundations and applications programming, Newnes (2013).

2. Singh and Chana, Resource provisioning and scheduling in clouds: QoS perspective, The Journal of Supercomputing 72(3) (2016), 926–960.

3. Singh and Chana, QRSF: QoS-aware resource scheduling framework in cloud computing, The Journal of Supercomputing 71(1) (2015), 241–292.

4. Singh and Chana, A survey on resource scheduling in cloud computing: Issues and challenges, Journal of Grid Computing 14(2) (2016), 217–264.

5. R.R. Bane, Annappa and K.C. Shet, Survey of dynamic resource management approaches in virtualized data centers, In Proceedings of IEEE International Conference on Computational Intelligence and Computing Research, 2013, pp. 1–7.

6. Buyya, S.K. Garg and R.N. Calheiros, SLA-oriented resource provisioning for cloud computing: Challenges, architecture, and solutions, In Proceedings of IEEE International Conference on Cloud and Service Computing, 2011, pp. 1–10.

7. Chaisiri, B.S. Lee and Niyato, Optimization of resource provisioning cost in cloud computing, IEEE Transactions on Services Computing 5(2) (2012), 164–177.

8. Maurer, Breskovic, V.C. Emeakaroha and Brandic, Revealing the MAPE loop for the autonomic management of cloud infrastructures, In Proceedings of IEEE Symposium on Computers and Communications, 2011, pp. 147–152.

9. M.C. Huebscher and J.A. McCann, A survey of autonomic computing—degrees, models, and applications, ACM Computing Surveys (CSUR) 40(3) (2008), 7.

10. Buyya, R.N. Calheiros and Li, Autonomic cloud computing: Open challenges and architectural elements, In Proceedings of IEEE International Conference on Emerging Applications of Information Technology, 2012, pp. 3–10.

11. Jacob, Lanyon-Hogg, D.K. Nadgir and A.F. Yassin, A practical guide to the IBM autonomic computing toolkit, IBM Redbooks 4 (2004), 10.

12. Bahrpeyma, Haghighi and Zakerolhosseini, A bipolar resource management framework for resource provisioning in Cloud's virtualized environment, Applied Soft Computing 46 (2016), 487–500.

13. Moreno and Xu, Neural network-based overallocation for improved energy-efficiency in real-time cloud environments, In Proceedings of IEEE 15th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2012, pp. 119–126.

14. S.K. Garg, A.N. Toosi, S.K. Gopalaiyengar and Buyya, SLA-based virtual machine management for heterogeneous workloads in a cloud datacenter, Journal of Network and Computer Applications 45 (2014), 108–120.

15. A.F. Antonescu, Robinson and Braun, Dynamic SLA management with forecasting using multi-objective optimization, In Proceedings of IFIP/IEEE International Symposium on Integrated Network Management, 2013, pp. 457–463.

16. M.M. Al-Sayed, Khattab and F.A. Omara, Prediction mechanisms for monitoring state of cloud resources using Markov chain model, Journal of Parallel and Distributed Computing 96 (2016), 163–171.

17. P.Y. Glorennec, Reinforcement learning: An overview, In Proceedings of ESIT, 2000, pp. 17–35.

18. R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction, MIT Press, 1998.

19. C.J. Watkins and Dayan, Q-learning, Machine Learning 8(3–4) (1992), 279–292.

20. Xu, Zhao, Fortes, Carpenter and Yousif, Autonomic resource management in virtualized data centers using fuzzy logic-based approaches, Cluster Computing 11(3) (2008), 213–227.

21. Maurer, Brandic and Sakellariou, Simulating autonomic SLA enactment in clouds using case based reasoning, In European Conference on a Service-Based Internet, Springer, Berlin, Heidelberg, 2010, pp. 25–36.

22. Mao and Humphrey, Auto-scaling to minimize cost and meet application deadlines in cloud workflows, In Proceedings of IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, 2011, pp. 1–12.

23. Ritter, Mitschang and Mega, Dynamic provisioning of system topologies in the cloud, Enterprise Interoperability V, Springer, London, 2012, pp. 391–401.

24. Frey, Lüthje, Reich and Clarke, Cloud QoS scaling by fuzzy logic, In Proceedings of IEEE International Conference on Cloud Engineering, 2014, pp. 343–348.

25. Jamshidi, Ahmad and Cl. Pahl, Autonomic resource provisioning for cloud-based software, In Proceedings of ACM 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 2014, pp. 95–104.

26. Jamshidi, Sharifloo, Pahl, Metzger and Estrada, Self-learning cloud controllers: Fuzzy Q-learning for knowledge evolution, In Proceedings of IEEE International Conference on Cloud and Autonomic Computing, 2015, pp. 208–211.

27. Amiri, M.R. Feizi-Derakhshi and Mohammad-Khanli, IDS fitted Q improvement using fuzzy approach for resource provisioning in cloud, Journal of Intelligent & Fuzzy Systems 32(1) (2017), 229–240.

28. Singh and Chana, EARTH: Energy-aware autonomic resource scheduling in cloud computing, Journal of Intelligent & Fuzzy Systems 30(3) (2016), 1581–1600.

29. Ghobaei-Arani, Jabbehdari and M.A. Pourmina, An autonomic resource provisioning approach for service-based cloud applications: A hybrid approach, Future Generation Computer Systems 78 (2018), 191–210.

30. Arabnejad, Pahl, Jamshidi and Estrada, A comparison of reinforcement learning techniques for fuzzy cloud auto-scaling, In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017, pp. 64–73.

31. M.S. Aslanpour, Ghobaei-Arani and A.N. Toosi, Auto-scaling web applications in clouds: A cost-aware approach, Journal of Network and Computer Applications 95 (2017), 26–41.

32. P.Y. Glorennec, Fuzzy Q-learning and dynamical fuzzy Q-learning, In Proceedings of the Third IEEE Conference on Fuzzy Systems, IEEE World Congress on Computational Intelligence, 1994, pp. 474–479.

33. H.R. Berenji, Fuzzy Q-learning for generalization of reinforcement learning, In Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, Vol. 3, 1996, pp. 2208–2214.

34. C.H. Oh, Nakashima and Ishibuchi, Initialization of Q-values by fuzzy rules for accelerating Q-learning, In Neural Networks Proceedings, IEEE World Congress on Computational Intelligence, IEEE International Joint Conference, Vol. 3, 1998, pp. 2051–2056.

35. Jouffe, Fuzzy inference system learning by reinforcement methods, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 28(3) (1998), 338–355.

36. Bonarini, Lazaric, Montrone and Restelli, Reinforcement distribution in fuzzy Q-learning, Fuzzy Sets and Systems 160(10) (2009), 1420–1443.

37. Boumehraz, Benmahammed, M.L. Hadjili and Werzz, Fuzzy inference systems optimization by reinforcement learning, Courrier du Savoir 1(1) (2001), 9–15.

38. M.J. Er and Deng, Online tuning of fuzzy inference systems using dynamic fuzzy Q-learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(3) (2004), 1478–1489.

39. F.J. Cabrerizo, Chiclana, Al-Hmouz, Morfeq, A.S. Balamash and Herrera-Viedma, Fuzzy decision making and consensus: Challenges, Journal of Intelligent & Fuzzy Systems 29(3) (2015), 1109–1118.

40. A.A. Jaoude, The paradigm of complex probability and Chebyshev’s inequality, Systems Science & Control Engineering 4(1) (2016), 99–137.

41. Y.L. Tong, Relationship between stochastic inequalities and some classical mathematical inequalities, Journal of Inequalities and Applications 1(1) (1997), 85–98.

42. R.N. Calheiros, Ranjan, Beloglazov, A.F. De Rose and Buyya, CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software: Practice and Experience 41(1) (2011), 23–50.

43. NASA, One day HTTP data log, http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html (accessed 17.11.17).