Abstract
Existing virtual machine deployment algorithms either leave physical machines underused or consume excessive energy in cloud data centers. This degrades the service quality of cloud data centers or raises operational costs, and ultimately reduces cloud service providers’ earnings. To address this problem, a resource clustering algorithm for cloud data centers is proposed. The algorithm systematically analyzes the cloud data center model and the use ratio of physical machines, establishes dynamic resource clustering rules with the k-means clustering algorithm, and deploys virtual machines according to the clustering results, so as to raise the use ratio of physical machines and reduce energy consumption in cloud data centers. The experimental results indicate that, for compute-intensive virtual machines, the algorithm improves the use ratio of physical machines by 12% on average and lowers the energy consumption of the cloud data center by 15% on average compared with the baseline algorithms. For general-purpose virtual machines, it improves the use ratio of physical machines by 14% on average and lowers energy consumption by 12% on average. These results demonstrate that the method is effective for resource management in cloud data centers and may serve as a reference for related work.
Introduction
As more and more enterprises deploy their businesses in cloud data centers, the demand for cloud data centers has grown considerably. To capture more markets and users, cloud service providers keep expanding their cloud data center deployments, and large-scale providers in particular are increasing investment to raise their market share. For example, Amazon has deployed 23 cloud data centers around the world to date, spanning America, Asia, Europe and Africa, to provide prompt and effective cloud services to global users. However, the growing number of cloud data centers further intensifies the competition among cloud service providers.
A mature cloud data center can optimize itself to reduce operating costs, mainly by improving the use ratio of physical machines and reducing energy consumption [1]. The most direct way to improve a physical machine’s use ratio is to deploy more virtual machines on it and keep it as close to full load as possible. When a cloud user requests a new virtual machine from the cloud data center, the cloud service provider deploys the virtual machine to a specific physical machine according to its parameters [2]. Since the virtual machines requested by different users may differ in parameters such as CPU and memory, different deployment algorithms may produce different mapping results. There are two approaches to reducing energy consumption in cloud data centers: static energy conservation and dynamic energy conservation [3]. The static approach targets low-power hardware design, starting from the processor architecture and circuit board structure. It uses low-power circuits and devices to reduce the operating power of physical machines and thereby the power consumption of the cloud data center [4, 5]. However, a physical machine runs through the joint operation of many devices, which both depend on and constrain one another, so this approach has only a limited effect on energy consumption [6]. The dynamic approach is more flexible and mainly includes the dynamic voltage and frequency scaling (DVFS) technique and optimized resource scheduling algorithms. Although DVFS can reduce the energy consumption of a single physical machine, it optimizes only locally and therefore cannot produce an optimal result for an entire cloud data center.
Optimizing the resource scheduling algorithm is an important means of dynamic energy conservation, because the algorithm can respond promptly to the real-time state of the cloud data center and produce an optimal scheduling result [7, 8, 9, 10]. In general, a scheduling algorithm has one or more optimization objectives, such as completion time, energy consumption of the cloud data center and load balancing [11, 12, 13, 14].
Previous studies have investigated the optimization of scheduling systems [15, 16, 17], but they have not considered the following issues. First, in practice, cloud service providers offer cloud services to users in fixed configurations, which are mainly reflected in the parameters of virtual machines. Normally, the ratio of a virtual machine’s CPU core count to its memory size is fixed: 1:2 for compute-intensive virtual machines and 1:4 for general-purpose virtual machines. When requesting a virtual machine, a cloud user may increase the number of CPU cores, in which case the memory size increases accordingly so that the ratio stays fixed. Cloud users can therefore choose an appropriate virtual machine according to their own needs, but the coexistence of virtual machines with different ratios makes deployment more difficult. Second, physical machines with different operating cycles and resource use ratios respond differently to the same deployment, and different deployments may lead to large differences in energy consumption; however, the deployment order can be adjusted to change how virtual machines are deployed. Third, energy consumption is another important factor restricting the development of cloud computing. Excessive energy consumption increases the operational costs of cloud data centers, which in turn raises cloud users’ expenses and reduces the use of cloud data centers. Fourth, the properties of virtual and physical machines involve several factors, such as CPU use ratio and memory size. Most previous algorithms consider only the CPU use ratio or treat it as the main factor, which leads to a large waste of the physical machines’ other resources.
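To make the fixed-ratio offering concrete, the following minimal Python sketch enumerates virtual machine sizes for the two families described above. Only the 1:2 and 1:4 core-to-memory ratios come from the text; treating the memory unit as GB, the family labels, the list of core counts and the `vm_catalog` helper are illustrative assumptions, not Amazon’s actual catalog.

```python
from __future__ import annotations

from dataclasses import dataclass

# Memory per CPU core for each family, following the 1:2 and 1:4 ratios
# in the text (GB per core is an assumed unit; family keys are hypothetical).
MEMORY_PER_CORE = {"compute-intensive": 2, "general-purpose": 4}


@dataclass(frozen=True)
class VmType:
    family: str
    cores: int
    memory_gb: int


def vm_catalog(family: str, core_options=(2, 4, 8, 16)) -> list[VmType]:
    """Hypothetical helper: list the sizes offered for one family.

    Memory grows with the core count, so the cores-to-memory ratio of
    every size in the family stays fixed.
    """
    per_core = MEMORY_PER_CORE[family]
    return [VmType(family, cores, cores * per_core) for cores in core_options]


for vm in vm_catalog("compute-intensive") + vm_catalog("general-purpose"):
    print(f"{vm.family:<18} {vm.cores:>2} cores {vm.memory_gb:>3} GB")
```

A user who doubles the core count of a general-purpose machine in this sketch automatically gets four times as many GB of memory per added core, which is exactly the constraint the deployment algorithm has to live with.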
For these reasons, this paper proposes a resource clustering algorithm for cloud data centers (RCDC). The algorithm systematically analyzes the cloud data center model and the use ratio of physical machines, establishes dynamic resource clustering rules with the k-means clustering algorithm, and deploys virtual machines according to the clustering results, so as to reduce energy consumption in cloud data centers.
Cloud data center model and rules
Cloud data center model
This section first introduces the framework of the cloud data center. The cloud data center studied in this paper is modeled on real data for Amazon’s virtual machines and physical machines, and the main attributes considered are the CPU core count, memory size and lease time of virtual machines. Assume that this cloud data center contains the physical machine set PM
Parameters of physical machines
Similarly, the virtual machines requested by cloud users can be represented as a virtual machine set VM
Parameters of virtual machines
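The model above can be sketched in code as follows. This is a minimal illustration under stated assumptions: a physical machine is described by its CPU cores and memory, a virtual machine by its cores, memory, establishing moment and destroy moment (its lease), and the use ratio of a physical machine is taken to be the mean of its CPU and memory utilization. The paper’s exact attribute names and use-ratio formula are not given in this section, so `PhysicalMachine`, `VirtualMachine`, `fits`, `use_ratio` and `release_time` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    cores: int
    memory_gb: float
    start: float  # establishing moment
    end: float    # destroy moment (start + lease time)


@dataclass
class PhysicalMachine:
    cores: int
    memory_gb: float
    vms: list = field(default_factory=list)  # hosted VirtualMachine objects

    def used(self):
        """CPU cores and memory currently allocated.

        Simplification: every hosted VM is treated as active.
        """
        return (sum(v.cores for v in self.vms),
                sum(v.memory_gb for v in self.vms))

    def fits(self, vm: VirtualMachine) -> bool:
        """True if the residual CPU and memory can hold the VM."""
        used_cores, used_mem = self.used()
        return (used_cores + vm.cores <= self.cores
                and used_mem + vm.memory_gb <= self.memory_gb)

    def use_ratio(self) -> float:
        """Assumed definition: mean of CPU and memory utilization."""
        used_cores, used_mem = self.used()
        return (used_cores / self.cores + used_mem / self.memory_gb) / 2

    def release_time(self) -> float:
        """Latest destroy moment among hosted VMs (0.0 if idle)."""
        return max((v.end for v in self.vms), default=0.0)
```

For instance, a 16-core, 64 GB machine hosting a single 4-core, 16 GB virtual machine would have a use ratio of 0.25 under this assumed definition.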
Clustering is an important research direction in unsupervised learning. The virtual machine mapping problem studied in this paper can be cast as a clustering problem, so clustering algorithms from unsupervised learning can be used to group the activated physical machines into clusters.
Once a cloud user submits a virtual machine lease request, the scheduling algorithm of the cloud data center selects suitable physical machines for deployment based on the current operating state of the physical machines. Take six physical machines in a cloud data center as an example. They are homogeneous physical machines based on the Intel Xeon P-8175M, so each has 16 CPU cores and 64 GB of memory. Their parameters are listed in Table 3. Applying the k-means algorithm with k = 2 yields the clustering results shown in Fig. 1.
Examples of physical machine clustering
Clustering results of physical machines.
The k-means clustering algorithm is an iterative clustering algorithm. To partition the data into k groups, it first randomly selects k objects as the initial cluster centers, then computes the distance between each object and each cluster center and assigns every object to its nearest center. Each cluster center, together with the objects assigned to it, forms a cluster. The center of each cluster is then recomputed from its members, and the assignment and update steps are repeated until the assignments no longer change.
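The k-means procedure described above can be sketched in a few lines of Python. This is a generic textbook implementation (random initial centers, then alternating assignment and update steps until the assignments stop changing), not the authors’ code. The example input assumes, hypothetically, that each of six physical machines is represented by a single feature, its release time (the latest destroy moment among its hosted virtual machines), which matches the clustering on destroy moments used by the rule below; the actual features and values of Table 3 are not reproduced here.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means on a list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k randomly chosen objects as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its nearest cluster center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster is empty
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels


# Hypothetical release times (latest destroy moment of hosted VMs) of six PMs.
release_times = [(2.0,), (3.0,), (4.0,), (20.0,), (21.0,), (23.0,)]
centers, labels = kmeans(release_times, k=2)
print(centers, labels)
```

With these well-separated values, k = 2 separates the three early-releasing machines from the three late-releasing ones regardless of which two objects are drawn as initial centers.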
Once the physical machines have been clustered, the scheduling algorithm can deploy newly requested virtual machines reasonably according to the clustering results. Both the operating time and the use ratio of physical machines must then be considered. Extending the operating time of physical machines increases their energy consumption and therefore that of the cloud data center, while a low use ratio causes more physical machines to be activated, wasting system resources. In view of this, we propose the following dynamic resource clustering rule.
Rule 1-Dynamic resource clustering rule
When a request for a new virtual machine is deployed, the resource management system of the cloud data center schedules it according to the clustering results of the activated physical machines. First, compute the distance between the destroy moment of the virtual machine and each physical machine cluster center, and assign the virtual machine to the nearest cluster. Second, search this cluster for a physical machine whose residual resources can meet the needs of the virtual machine. If such physical machines exist, deploy the virtual machine on the one that maximizes the use ratio. Otherwise, activate a new physical machine to host the virtual machine. Finally, update the physical machine cluster centers.
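Rule 1 can be sketched as a single placement function. The sketch assumes, for illustration, that each cluster center is a scalar destroy moment (the mean release time of the cluster’s physical machines), that the use ratio is the mean of CPU and memory utilization, and that a newly activated physical machine joins the virtual machine’s cluster. These details, the 16-core/64 GB default machine, and the names `PM`, `VM` and `place_vm` are assumptions rather than the authors’ implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    cores: int
    mem_gb: float
    end: float  # destroy moment


@dataclass(eq=False)  # compare PMs by identity
class PM:
    cores: int = 16
    mem_gb: float = 64.0
    vms: list = field(default_factory=list)

    def fits(self, vm):
        """Residual CPU and memory can hold vm (all hosted VMs treated as active)."""
        return (sum(v.cores for v in self.vms) + vm.cores <= self.cores
                and sum(v.mem_gb for v in self.vms) + vm.mem_gb <= self.mem_gb)

    def use_ratio_with(self, vm):
        """Assumed use ratio after adding vm: mean of CPU and memory utilization."""
        cores = sum(v.cores for v in self.vms) + vm.cores
        mem = sum(v.mem_gb for v in self.vms) + vm.mem_gb
        return (cores / self.cores + mem / self.mem_gb) / 2

    def release_time(self):
        return max((v.end for v in self.vms), default=0.0)


def place_vm(vm, clusters, centers):
    """Rule 1 sketch: clusters[j] is a list of PMs, centers[j] a scalar
    destroy-moment center. Mutates both and returns (j, chosen PM)."""
    # First: nearest cluster by distance between the VM's destroy moment and each center.
    j = min(range(len(centers)), key=lambda i: abs(vm.end - centers[i]))
    # Second: PMs in that cluster whose residual resources meet the VM's needs.
    candidates = [pm for pm in clusters[j] if pm.fits(vm)]
    if candidates:
        pm = max(candidates, key=lambda p: p.use_ratio_with(vm))
    else:
        pm = PM()  # activate a new physical machine (assumed to join cluster j)
        clusters[j].append(pm)
    pm.vms.append(vm)
    # Finally: update the center as the mean release time of the cluster's PMs.
    centers[j] = sum(p.release_time() for p in clusters[j]) / len(clusters[j])
    return j, pm
```

Picking the candidate with the highest resulting use ratio behaves like a best-fit choice restricted to one cluster, so a virtual machine is only packed next to others that are destroyed at a similar time.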
We again use the physical machine parameters in Table 3 for demonstration; their clustering results are shown in Fig. 1. Assume that the parameters of the virtual machines to be deployed are as listed in Table 4, which contains requests for two virtual machines, and that these virtual machines are clustered and deployed in order of their establishing moments. First, according to the destroy moment of vm
Parameters of virtual machines to be deployed
In the above example, the dynamic resource clustering rule is used to deploy the two virtual machines without extending the operating time of any physical machine, while improving the use ratio of the corresponding physical machines and thereby reducing energy consumption in the cloud data center. The next section introduces RCDC, the core algorithm of this paper.
The above section first analyzed the cloud data center model, then studied the dynamic k-means clustering algorithm, and finally proposed using the dynamic resource clustering rule to cluster and deploy virtual machines. Building on this rule and on machine learning theory, we put forward a Resource Clustering Algorithm for Cloud Data Centers (RCDC), which reduces energy consumption in cloud data centers by improving the use ratio of physical machines.
Based on the above, the RCDC algorithm consists of the following seven steps:
Step 1: Cluster the set of activated physical machines.
Step 2: Take the virtual machines one by one in order of their establishing moments in the virtual machine request queue, and cluster each one using the dynamic resource clustering rule.
Step 3: Check whether the cluster of this virtual machine contains a physical machine that can meet its needs. If there is none, go to Step 4; if there is exactly one, go to Step 5; if there is more than one, go to Step 6.
Step 4: Activate a new physical machine, deploy the virtual machine on it, and update the clustering results. Then go to Step 7.
Step 5: Deploy the virtual machine on this physical machine. Then go to Step 7.
Step 6: Deploy the virtual machine on the physical machine that maximizes the use ratio. Then go to Step 7.
Step 7: If any virtual machine remains undeployed, return to Step 2; otherwise, terminate the algorithm.
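The seven steps can be sketched as a single loop. This is a hedged illustration under the same kind of assumptions as a Rule 1 sketch would need: scalar destroy-moment cluster centers, a use ratio defined as the mean of CPU and memory utilization, hosted virtual machines treated as simultaneously active, and new physical machines joining the virtual machine’s cluster. Step 1 is taken as already done (for example with k-means), so the function receives the initial clusters; `rcdc`, `PM` and `VM` are hypothetical names, and, following the step list, the cluster center is updated only in Step 4.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    cores: int
    mem_gb: float
    start: float  # establishing moment
    end: float    # destroy moment


@dataclass(eq=False)  # PMs are compared by identity
class PM:
    cores: int = 16
    mem_gb: float = 64.0
    vms: list = field(default_factory=list)

    def load_with(self, vm):
        """CPU cores and memory in use if vm were added (hosted VMs treated as active)."""
        return (sum(v.cores for v in self.vms) + vm.cores,
                sum(v.mem_gb for v in self.vms) + vm.mem_gb)

    def fits(self, vm):
        cores, mem = self.load_with(vm)
        return cores <= self.cores and mem <= self.mem_gb

    def ratio_with(self, vm):
        # Assumed use ratio: mean of CPU and memory utilization after adding vm.
        cores, mem = self.load_with(vm)
        return (cores / self.cores + mem / self.mem_gb) / 2

    def release_time(self):
        return max((v.end for v in self.vms), default=0.0)


def rcdc(queue, clusters, centers):
    """Steps 2-7 of RCDC. Step 1 is assumed done: clusters[j] lists the
    activated PMs of cluster j and centers[j] is its scalar destroy-moment center."""
    placement = {}
    for vm in sorted(queue, key=lambda v: v.start):        # Step 2: by establishing moment
        j = min(range(len(centers)), key=lambda i: abs(vm.end - centers[i]))
        fit = [pm for pm in clusters[j] if pm.fits(vm)]    # Step 3
        if not fit:                                        # Step 4: activate a new PM
            pm = PM()
            clusters[j].append(pm)
        elif len(fit) == 1:                                # Step 5: the only candidate
            pm = fit[0]
        else:                                              # Step 6: maximize use ratio
            pm = max(fit, key=lambda p: p.ratio_with(vm))
        pm.vms.append(vm)
        placement[vm.name] = (j, clusters[j].index(pm))
        if not fit:                                        # Step 4: update clustering
            centers[j] = sum(p.release_time() for p in clusters[j]) / len(clusters[j])
    return placement                                       # Step 7: queue exhausted
```

The `placement` dictionary maps each virtual machine name to its cluster index and the position of its host within that cluster, which makes it easy to inspect where each request landed.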
From the above steps, it can be seen that RCDC deploys each virtual machine on the physical machine that maximizes the use ratio without extending the operating time of physical machines, thereby improving the use ratio of physical machines and further reducing energy consumption in cloud data centers. The next section validates its effectiveness experimentally.
Experimental parameters
The change of physical machines’ use ratio with the number of virtual machines (computing-intensive).
The change of physical machines’ use ratio with the number of virtual machines (general-purpose).
In this section, we implement the RCDC algorithm in CloudSim to verify its effect, and we also simulate and implement the FF, BF and EPS algorithms in CloudSim for comparison [18]. Based on the parameters of real virtual machines and physical machines provided by Amazon, physical machines based on Amazon’s custom AWS Graviton2 CPU are used to host the virtual machines; their power consumption is 0.256 W. Other parameters are listed in Table 5. This section evaluates the average use ratio of physical machines and the energy consumption of the cloud data center. The energy consumption of the cloud data center is given by Eq. (1). The definitions of
In this part, we examine how the use ratio of physical machines and the energy consumption of the cloud data center change with the number of virtual machines. Each virtual machine has 2 CPU cores, and both computing-intensive and general-purpose virtual machines are tested. Other parameters are listed in Table 5.
The change in energy consumption of cloud data centers with the number of virtual machines (computing-intensive).
The change in energy consumption of cloud data centers with the number of virtual machines (general-purpose).
Figs 2 and 3 show that the average use ratio of physical machines remains essentially stable for all four algorithms as the number of virtual machines changes. Compared with the FF, BF and EPS algorithms, RCDC improves the average use ratio by 13%, 13% and 8%, respectively. This is mainly because RCDC clusters virtual machines before deploying them, so that a virtual machine is deployed only on a physical machine with a similar destroy moment. In addition, the physical machine that maximizes the use ratio is selected first. In this way, no physical machine has its operating time extended or its use ratio lowered by a deployment, which ultimately raises the overall average use ratio of physical machines.
The change of physical machines’ use ratio with the capacity of virtual machines (computing-intensive).
The change of physical machines’ use ratio with the capacity of virtual machines (general-purpose).
Figs 4 and 5 show that the energy consumption of the cloud data center increases with the number of virtual machines for all four algorithms. Compared with the FF, BF and EPS algorithms, RCDC reduces energy consumption by 11%, 11% and 7%, respectively. Physical machines that have not been activated remain inactive, so their power consumption is not counted; energy consumption therefore depends mainly on the operating time of the activated physical machines. Because RCDC applies the dynamic resource clustering rule when deploying virtual machines, each virtual machine is placed in the cluster closest to its destroy moment, which minimizes the chance that a physical machine’s operating time is extended. As a result, both the operating time of physical machines and the energy consumption of the cloud data center are lower than under the comparison algorithms.
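The reasoning above can be illustrated with a simple energy model. Because Eq. (1) is not reproduced in this section, the sketch assumes, consistent with the explanation above, that an unactivated physical machine consumes nothing and an activated one draws a constant power P for its whole operating time, so that the energy is P times the sum of the operating times of activated machines; the 0.256 W default is the per-machine figure given in the experimental setup. The helper names `operating_time` and `datacenter_energy` and the two-machine scenario are hypothetical.

```python
def operating_time(vm_intervals):
    """Operating time of one PM hosting VMs given as (establishing, destroy) pairs.

    Assumes the PM stays on from its first VM's establishing moment until its
    last VM's destroy moment; a PM with no VMs is never activated.
    """
    if not vm_intervals:
        return 0.0
    return max(end for _, end in vm_intervals) - min(start for start, _ in vm_intervals)


def datacenter_energy(placement, power_w=0.256):
    """Assumed model: E = P * (sum of operating times of activated PMs)."""
    return power_w * sum(operating_time(vms) for vms in placement)


# PM 1 hosts a VM destroyed at t=2, PM 2 one destroyed at t=10.
# A new VM lives over [0, 10]; compare the two possible placements.
mixed = [[(0, 2), (0, 10)], [(0, 10)]]      # PM 1's operating time grows from 2 to 10
clustered = [[(0, 2)], [(0, 10), (0, 10)]]  # placed next to a similar destroy moment
print(datacenter_energy(mixed), datacenter_energy(clustered))
```

The clustered placement leaves the first machine’s operating time unchanged, so under this model the total is 12 time units of operation instead of 20, which is the effect the dynamic resource clustering rule aims for.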
The change of cloud data centers’ energy consumption with the capacity of virtual machines (computing-intensive).
The change of cloud data centers’ energy consumption with the capacity of virtual machines (general-purpose).
In this part, we examine how the use ratio of physical machines and the energy consumption of the cloud data center change with the capacity of virtual machines. The number of virtual machines is set to 50000, covering both computing-intensive and general-purpose types. Other parameters are listed in Table 5.
Figs 6 and 7 show that the average use ratio of physical machines remains essentially stable for all four algorithms as the capacity of virtual machines changes. Compared with the FF, BF and EPS algorithms, RCDC improves the use ratio by 23%, 23% and 18%, respectively. This is mainly because RCDC aims to maximize the use ratio of physical machines when deploying virtual machines within a cluster, which raises the overall average use ratio of physical machines as much as possible.
Figs 8 and 9 show that the energy consumption of the cloud data center increases with the capacity of virtual machines for all four algorithms. Compared with the FF, BF and EPS algorithms, RCDC reduces energy consumption by 18%, 18% and 17%, respectively. RCDC uses the dynamic resource clustering rule to place virtual machines in the cluster closest to their destroy moment, which shortens the operating time of physical machines and reduces the energy consumption of the cloud data center.
Conclusions
By systematically analyzing the cloud data center model and the use ratio of physical machines, using the k-means clustering algorithm to establish dynamic resource clustering rules, and deploying virtual machines according to the clustering results, we propose a Resource Clustering Algorithm for Cloud Data Centers (RCDC). When deploying virtual machines, this algorithm maximizes the use ratio of physical machines without extending their operating time, thereby improving utilization and reducing energy consumption in cloud data centers. We implemented the algorithm in CloudSim. Compared with existing algorithms, RCDC effectively improves the use ratio of physical machines and reduces energy consumption in cloud data centers.
Footnotes
Fund project
Youth Science Foundation Project of National Natural Science Foundation of China: Research on performance optimization of blockchain data communication considering trust and weight (61802301); Shaanxi Provincial Natural Science Basic Research Plan – General item (2019JQ-056); Xi’an Science and Technology Planning Project (2020KJRC0101).
