Abstract
With the construction and application of large-scale datacenters, the issue of resource allocation in cloud computing becomes a serious concern. Although the current static allocation method can make applications get corresponding resources, there still exist some shortcomings such as resource surpluses or shortages. This kind of problem is more crucial in real-time requirements of mobile cloud computing service. Therefore, it is necessary to establish a forecasting model to predict the future resource demands, and then perform on-demand distribution, which can effectively reduce the unnecessary daily network management fees and address the issues mentioned above. This paper focuses on CPU resource forecasting, establishing three forecasting models including Markov chain, weighted Markov chain and stacking weighted Markov chain. By comparing and analyzing the experiment results, the most reasonable forecasting model is found and explained.
Introduction
With the developing of distributed computing, parallel computing and grid computing, a new computing mode gradually emerged called cloud computing, which help the Internet to conveniently share resource on-demand at anytime. Moreover, with the development of mobile devices, mobile cloud computing is also applied based on Android, IOS and other devices. At present, cloud computing permits renters to rent resource in a pay-as-you-go fashion, compared with in-house computing which needs to maintain complex computing infrastructures. Besides, cloud computing supplies a better solution to save spending. In order to achieve it, the right amount resources should be offered to the applications which are running in the cloud, and those resources are rarely static, while most of them are varying as the changes of the workload. However, under-provisioning resources will result in service level objective (SLO) violations, which are often associated with significant financial penalties. Over-provisioning will waste resources that could be put to other uses. These two cases are more severe in real-time requirements of mobile cloud computing service. In order to avoid these problems, it’s necessary to establish a reasonable resource allocation system that based on forecasting, in order to maximize the resource utilization.
Related work
Resource allocation in cloud computing
At present, there is acrucial challenge in cloud computing, which is resource allocation, particularly how do the cloud providers not only satisfy the service level objective but also maximize the resource utilization [1]. The multi-tier cloud applications based on SLA resource allocation concern in [2, 3] is thought to be the distributed solution to every processing capability, data storage and communication resources, and the point has been translated into a three-dimensional optimization problem. Li et al. [4] proposed the adaptive management structure of virtualization resources in a cloud computing environment, whose main component is a virtual resource adaptive configurator based on dynamic state space feedback control method, such as CPU, memory and I/O. Hu et al. [5] proposed a cloud computing PaaS platform resource optimization method named FC-LRU, which combines the advantages of Feedback Control and Least Recently Used to maintain low default rates and high CPU utilization of the tasks running on the PaaS. Koehler and Benkner in [6] build an adaptive framework for resource allocation architecture scientific application, the architecture relies on historical data and a cost produced by different configuration of resources ranking of the utility function to achieve effective allocation of cloud computing resources. In addition, cost-benefit trade-offs in cloud infrastructure-as-a-service services have been addressed [7], wherein the problems have been changed into multiple target optimization, and the fine-grained charging and the normalized performance model are the basis of the proposed model. The experimental results show that the model is effective. On the other hand, in most areas of static and special dynamic environments, there are a number of methods that use Particle Swarm Optimization (PSO), and the underlying PSO is an optimization technique in static environments. But in true-life, most applications are not stationary, it means that the environment and the overall global characteristics can be changed at any time. There are many PSO algorithms for non-stationary environment now [8]. The fast multi-swarm optimization (FMSO) is one of them, which employs two kinds of swarms: the one is used for searching for the hopeful area in the entire space, while the other is used to search for the local promising area and get the approximate solution. The other algorithm is adjusted PSO, which is based on a regular tracking of changes in the target. The tracking helps the particle memory to reset to current position, and allows the swarm to trace the varying goals with minimal spending. So as to improve the nature of the original PSO algorithm, the cooperative particle swarm optimization (CPSO) has been introduced, which uses multi-swarms to cooperatively optimize the variable components of the solution vector. The initial PSO utilizes a set of D-dimensional vectors, however, CPSO divides them into D groups of single dimensional vectors, with each swarm representing the vector of the original problem. Li et al. [9] proposed a cloud resource allocation method based on CDA framework and Nash equilibrium to achieve better resource allocation environment. The resource requirements usually show great waviness to guarantee the confidence of dynamic resources in cloud environment [10].
Forecasting and analysis
Since the forecasting techniques have been researched on for a long time, there are many proposed methods in the field of mathematics. Taylor et al. [11] studied some short-term prediction models, such as Auto-Regressive Integrated Moving Average (ARIMA) model and Auto-regressive (AR) model, Holt-Winters Exponential Smoothing model, etc. Sorjamaa et al. [12] provided both short-term and long-term forecasting methods by using fractional Auto-regressive integration moving average (FARIMA) model, which provides a global method for long-term forecasting. The conventional forecasting means are relatively complicated, taking into account the cost of the forecast to the system, and thus the cloud computing system cannot use those models directly and simply. With the developing of the cloud computing, in order to achieve effective management of resources, the predictive methods were introduced and applied to resource management. Currently, most of the forecasting models are based on time series, which can be divided into three categories, including basic forecasting model, feedback-based forecasting model and multi-time series forecasting model. Moreover, the basic forecasting model and the feedback-based forecasting model are both based on single time series. The classification of these models is as shown in Fig. 1.

Classification of forecasting models.
After analyzing those forecasting models we can find that basic forecasting model is the fundamental of feed-back-based forecasting and multi-time series forecasting model [13], hence it is necessary to study the basic forecasting model firstly, and the basic forecasting algorithm is also easy to be extended or combined with other algorithms to get a better experimental result. Besides, there are many algorithms for predicting at present such as multivariate statistics, regression analysis, exponential smoothing algorithm, mathematical induction and elasticity analysis, etc. These methods are all assuming the external environment is relatively stable or the variation trend is basically consistent with the historical data, so they can forecast by past principles. However, the variation of cloud computing resource is pretty dramatic which means its trend is not instability, so those methods are not vey suitable [14]. Among the various predictive model algorithms, as one of the basic forecasting model, the Markov chain has the advantages of simplicity and high accuracy, and is thus suitable for the scenario with large historical curve fluctuation, and it is convenient to combine with other algorithms. Besides, the resource usage of cloud computing actually is easily influenced by the external situation, so the long-term forecasting is useful. These features are also suitable for the environment of mobile cloud computing. In addition, Markov chain has is flexible for us to make a variety of improvements on it and increase the accuracy, and in this paper we have other two models based on the Markov chain. Hence, in this paper we attempt to get the most reasonable forecasting model based on the modification and optimization of the basic Markov chain model. The Markov process is widely used in scientific research, agricultural forecasting, market forecasting and other fields because of its unavailability, ergodic and time-homogeneity [15].
The artificial intelligence community has leveraged some random techniques for machine learning, and these techniques are on the base of the Bayesian rule, which could predict future incidents according to previous incidents. Markov chain forecasting model is one of these techniques. The probability of an incident occurring at any time is a function of the probability of the event occurred previously in Markov chain forecasting model [16].
Grade eigenvalue method
Similar to the conventional fuzzy comprehensive evaluation, the conventional Markov chain model uses the probability maximum membership principle to determine the state of the predicted object, so it can not get an accuracy forecasting value. Therefore, we adopt the grade eigenvalue method to process the predicted row vector so as to get the accuracy value. First of all, assign the corresponding weight to the state by computing and get the weight set, denoted as ω = {ω1, ω2, ω3, ⋯ ω
k
}, wherein K means the number of the states, and the weight values can be calculated as:
In Equation (1), η is the maximum probability of the action index, bigger the η it is, the performance of maximum probability is better, and the value usually chose as 2 or 4 [17]. In this paper we use 2. P i is the probability of statei.
Then, the grade eigenvalue H can be calculated as:
If H > i, the predicted value P is:
If H > i, the predicted value P is:
The T i , B i in Equation (3 and 4) represents the upper bound and lower bound of the state interval respectively.
Description of Markov chain algorithm
Markov chain model is a discrete-time stochastic process [18], usually characterized by Markov or memory less. That means, only the current state can influence the next state transition and there is no relationship with the preceding sequence of events.
Algorithm details
Markov property in Markov chain is shown:
P ij is the one-step probability matrix from state i to state j at a particular interval. These transition probabilities are placed in a K × K transition matrix, where K is the states’ quantity.
Supposing there is a transition matrix whose initial probability distribution is π, the probability of the chain in state i is Π n after n steps.
Establishing a state transition probability matrix is the essential of the Markov chain forecasting algorithm [19]. Therefore, the first thing is to determine the number of the states. This paper uses the mean-standard variance method to divide the data into 5 states, counts the number of the data in each state and computes the state transition probability by using Equation (5) to predict.
The Markov chain prediction model is executed as follows: Establish a grading standard and divide the data into different states, which represented by E1, E2, ⋯ En. Compute the state transition probability matrix using Equation (5), and get
Choose the value at the previous time slot before prediction as the initial value and obtain its state. Then a 1 × K row vector X is established with the corresponding bit set to 1. Finally, the vector X is used to multiply the P(1) in step (2), and then the forecasting state of the next moment can be calculated asfollows:
For example, assume that the data’s state at time n is selected as the initial value to establish the row vector (if the initial state is 2, the row vector is X = [0, 1, 1 … , 0]). Then multiply the row vector and the P(1) to get the forecasting value at time n + 1. Multiplying the row vector and the P(2) can get the expecting forecasting value at time n + 2. And so on, calculate the prediction probability of three moments after n times. Because of the forecasting result of Markov chain is a row vector consisting of 1 × K predicted probabilities. In order to get an exact value rather than a state we need to use the grade eigenvalue method introduced in Section 3.1. Repeat the above steps with three different data sets, and compare the result values.
Weighted Markov chain model on the basis of Markov chain model introduces the concept of “weight” to show the influence of diverse lengths of the step to the state transition probability matrix [20]. The algorithm allows the information contained in the initial data to be excavated as much as possible.
Algorithm overview
The fundamental of the weighted Markov chain forecasting model is establishing the transition probability matrix of each order, and regard the normalized autocorrelation coefficient of the input data as a weight. Then, the weight of each order and the state transition probability matrix of corresponding order should be multiplied to get the weighted forecasting probability of different step length. At last, all the probabilities in the same state are summed up as the final forecasting probabilities and the state of future moment could be obtained[21–23].
Algorithm details
The weighted Markov chain differs mainly in the “weight” from the Markov chain [24]. The ‘weight’ is autocorrelation coefficients of each step length. Multiply it and the related state transition probability matrix to get the prediction probability matrix.
The weighted Markov chain prediction model is executed as follows: The average±standard deviation analysis method is adopted to divide the data into n states, represented as E1, E2, ⋯ En. In this paper, we set n as 5, and calculated the number of samples in each state category. Compute the state transition probability matrix using Equation (5), and get
Compute the autocorrelation coefficients [25] of each order:
Normalize the autocorrelation coefficient of each order:
Select several values from previous time series and determine their states as the initial state x
f
respectively (f represents the step size). This paper chooses values from previous 5 points, denoted as X1, X2 ⋯ X5, according to the weights in step (4) and combining the state transition probability matrix of each step to compute the forecasting probability P
f
of the expecting point, as follows:
Sum all the forecasting probabilities in the same state as the expected forecasting probability. Use the grade eigenvalue method to get the last three forecasting values of each group, and compare the forecasting values with the real values. Repeat the above steps with three different data sets, and compare the resulted values.
The weighted Markov chain uses the normalized autocorrelation coefficient as the weights, which indicates the strength of the correlations between a series of random variables with different order. Therefore, it is suitable to predict using the data from previous points so as to summarize as much data information as possible. Besides, this forecasting model is simple, reasonable and accurate [26]. However, its result is still a probability matrix, so it also needs to use the grade eigenvalue method to get the exact forecasting values. In addition, although this model sums the absolute distribution probabilities of several orders, it ignores the different effects of the absolute distribution of the Markov chain in differentorders [27].
Superimposed Markov chain model based algorithm
Based on the basic Markov chain forecasting algorithm, by the superimposed Markov chain model based algorithm, the Markov chain is used with different orders to obtain the absolute distribution of one-dimensional random variables, and then the prediction analysis can be conducted.
Algorithm overview
The superimposed weighted Markov chain model based algorithm is a hybrid algorithm, and it is based on the weighted Markov chain. It includes both concepts of superimposition and weight. At first, the grading standard is used to calculate the mean vector, and then the prediction probability matrix of the weighted Markov chain can be computed. Finally, the product of the prediction probability matrix and the mean vector could be obtained as the forecasting value. To get the prediction value of the (n + 1)th period, we can repeat the above steps n times and sum the corresponding results.
Algorithm details
The superimposed weighted Markov chain is improved by the Markov chain and the weighted Markov chain. It needs to determine the states and creates the state transition probability matrix first. Then it normalizes the autocorrelation coefficient and calculates the prediction probability of the weighted Markov chain. After computing the mean vector according to the data, the prediction value can be derived. The flow diagram of the algorithm is in Fig. 2 as follow:

The flow diagram of the Superimposed Weighted Markov Chain.
The superimposed Markov chain model based algorithm is executed as following steps: Divide the data into different states according to the grading standard and establish the state transition probability matrix according to Equation (5). Calculate the normalized autocorrelation coefficient according to the Equations (9)and (10). Compute the Markov chain forecasting probability vectors of previous several points before expecting to forecast accordingEquation (11). Establish the column mean vector according to the grading standard, am = ((min + T1)/2 + (T1 + T2)/2 + … + (max + T
n
)/2), wherein min is the minimum value of the input data, and max is the maximum value of the input data [28], and T1, T2 … , T
n
means the upper and lower bounds of each interval that divided according to the grading standard. Multiply the weighted Markov chain forecasting probability vector from step (3) and the column mean vector from step (4) to get a forecasting value. Repeat calculations with different groups of data to get n forecasting values and sum them up. And the result is the forecasting value at the nth point. Repeat above steps with three different data sets, and compare the values.
This algorithm can get the forecasting value directly without using the grade eigenvalue method, which means it is easier than the previous algorithms and can improve the computing efficiency. Besides, this model uses the column mean vector to generalize the influence of the superimposed probability of each interval on the final forecasting result. In addition, this algorithm is based on the weighted Markov chain, which uses the normalized autocorrelation coefficient as the weight and has improved the accuracy. This model uses the column mean vector to restricted data range which means the forecasting value will be more accurate.
Experimental results and analysis
Simulation tools and data sources
In order to examine the accuracy of these three forecasting models, we simulated a testbed upon the MATLAB plat form, using four data sets to forecast and comparing the predicted values and actual values. In this experiment we focus on CPU resource and the dataset comes from Google which has been released in May 2011. This dataset contains 29 days’state information from a swarm which consists of 11,000 physical machines and these machines are considered as a single unit to be operated. The dataset is made of percentages of each task’ resource usage and allocation.
In this experiment, the data of the resource usage is our main concern, such as CPU usage, memory space usage, and some other measurements. For now, in order to simplify the problem, the maximum CPU is the main resource to be focused on. These measurements are placed in a table named “resource usage table”, and the measurement records every 300 seconds for each task. Our experiment was conducted based on the maximum CPU usage of eachmachine.
Comparison of three algorithms for the first data set
Comparison of three algorithms for the first data set
Comparison of three algorithms for the second data set
Comparison of three algorithms for the third data set
Comparison of three algorithms for the fourth data set
We mainly concern about two points on the experimental results. On the one hand, the forecasting values should not be lower than the actual value so as to avoid breaking the SLA; and on the other hand, the difference between the forecasting values and the actual values should be as small as possible.
This experiment selects four data sets to simulate randomly, each set includes 20 data entries of the maximum CPU usage. We use Markov chain algorithm, the weighted Markov chain algorithm and the superimposed weighted Markov chain algorithm to predict the last three values of each set respectively, and compare them with the actual values. The experimental data and the comparison are as follow and the simulation results are shown in Tables 1–4 and Figs. 3–6.

Comparison of the actual and predicted values of the Markov chain model based algorithm.

Comparison of the actual and predicted values of the Weighted Markov chain model based algorithm.

Comparison of the actual and predicted values of the Superimposed weighted Markov chain model based algorithm.

Comparison of the actual and forecasting values of three algorithms.
Take an example of the fourth set to illustrate the results of the Markov chain model, as shown in Fig. 3.
Figure 3 illustrates that the Markov chain based algorithm predicts fairly accurately at the initial point such as the 18th point, with the error of only 0.0023. However, after that, the error becomes bigger and bigger. The reason for the greater error is the greater dynamism of the data itself.
The result of the Weighted Markov chain forecasting model is shown as in Fig. 4.
Compared with the result of the Markov chain, the overall results of the weighted Markov chain prediction tend to be more realistic, the error of the three points are 0.0125, 0.0230 and 0.0311 respectively. Although, the errors are also increasing, the range is smaller than the results of Markov chain. Hence, we can see that the weight values make different, but it still cannot stop the error get bigger. Because the weight values are not enough to summarize the overall trend of the data.
The result of the superimposed weighted Markov chain forecasting model is as shown in Fig. 5.
Compared with the results of the previous two forecasting algorithms, the errors of the superimposed weighted Markov chain prediction are the smallest among the three methods. The errors of the three points are 0.0112, 0.0130 and 0.0155 respectively. From the figure we can find the trend of the forecasting curve and the actual value curve are consistent. So the superimposed weighted Markov chain forecasting model is the best among the threemodels.
The comparison of the actual and forecasting values of three models for four data sets is as shown in Fig. 6. The red curve describes the trend of the actual values from four data sets, and the blue, black and green curves represent the trend of Markov chain, the weighted Markov chain and the Superimposed weighted Markov chain. From the figure we can see the curve of Superimposed weighted Markov chain is the closest to the curve of actual values. Theoretically, the Superposition of Weighted Markov chain forecasting algorithm on the one hand considers normalized autocorrelation coefficients as weights, and on the other hand it uses the mean vector which is better for data mining, and finally to predict the result of overlapping value, which gets rid of the error caused by Grade level eigenvalue method on the simulation platform. However, both Markov chain and the Weighted Markov chain need to use the Grade eigenvalue method to deal the results as to get one accuracy forecasting value, and this processing will not only address the computational error, but also prolong the computation time. Above all, the Superimposed Weighted Markov chain is the best algorithm among the three to be used to forecastthe CPU resource.
As the appearance of large-scale data center, the issue of resource allocation in cloud computing becomes a serious concern. In order to solve the problems of resource surpluses or shortages, this paper proposed three kinds of forecasting algorithms to predict resource usage before allocation and compares these three algorithms by simulation experiments. After analyzing the results of the experiment, we find that the Superimposed weighted Markov chain forecasting model is most reasonable and accurate.
However, the superimposed weighted Markov based algorithm is only the best one among the three in this paper, so it is possible to further reduce the error values. Besides, because of the dynamic characteristics of cloud computing, there may be some variation points leading to larger errors. Therefore, we need some compensation measures, which will be explored in the future.
Footnotes
Acknowledgments
This paper is supported in part by the National Natural Science Foundation of China (Nos. 61762074, 61563044 and 61640206), Open Research Fund Program of State Key Laboratory of Hydroscience and Engineering (No. sklhse-2017-A-05), and National Natural Science Foundation of Qinghai Province (Nos. 2015-ZJ-725, 2017-ZJ-902).
