Abstract
Task clustering is an effective approach of improving cloud computing resource utilization, which includes other benefits such as better QoS, load balance and low energy consumption. Different existing clustering methods have sharp boundaries, three-way clustering as an application of three-way decision, uses core region and fringe region to represent a cluster. In this paper, we propose a novel idea of clustering weight algorithm called TWCW algorithm(Three-way clustering weight) based on three-way decision to overcome the low utilization aiming at improving energy-efficient. The algorithm encompasses two steps, the identified tasks are assigned into the core region and the uncertain tasks are assigned into the fringe region based on diversity of cloud tasks and the dynamic nature of resources using the three-way K-means clustering firstly. The cluster center of CS i , centroid i = {mips, ram, bw} is obtained from the result of three-way clustering. In the second step is to score clusters and schedule tasks. We define a scoring matrix to record scores of the weight between clusters and the preference of attributes within clusters according to the cluster center, and then schedule tasks based on scoring matrix. We validate the high utilization of resources of the proposed algorithm by using simulation of CloudSim. The experiment shows the proposed algorithms significantly reduce energy consumption while significant improving response time of tasks comparing with K-means algorithm and FCM algorithm.
Keywords
Introduction
Cloud computing has become a paradigm of on-demand services, while there are huge resources stay at low utilization rates. Furthermore, a large number of resources in the data center are in idle state, the usage rate is generally only 11% –15% [1]. Barrsos [2] found that the utilization of nodes only 10% to 50% by statistical analysis of running status of 5000 hosts in the past six months. Therefore, how to accurately and effectively allocate cloud resources and improve the utilization rate of cluster resources is an important research content of cloud computing. The data from real running in the Google cloud platform [17, 18] indicates cloud tasks have attributes of multidimensional heterogeneity, which include type of tasks, service characteristics, resource requirements and so on. Many researchers [21] use these features to study tasks clustering, and according to the characteristics of each cluster to take a scheduling strategy targeted. The experimental results show that clustering scheduling has obvious advantages for speeding up task response time and improving resource utilization [29]. However, usual clustering algorithm is a hard clustering, that is, for a specific cloud task, it belongs to either a cluster or another cluster. Actually, some cloud tasks maybe in a middle state according to a feature classification, that is, task clustering is usually not an either A or B question. Three-way clustering [4, 20] algorithm based on three-way decisions [6–8, 20] well meets this feature and provides a new perspective in task scheduling modeling and analysis in cloud computing.
In this paper, we propose a clustering weight algorithm based on three-way clustering for overcoming the low utilization. Firstly, we classify tasks into some clusters according to diversity of cloud tasks and dynamic requirement nature of resources, and then scheduling based on score matrix given on each cluster according to task attributes. The remainder of the paper is organized as follows. In Section 2 we review some clustering methods that have been applied to cloud computing to increase usage and three-way clustering. Section III we propose the TWCW(three-way clustering weight) algorithm. In Section IV, we validate the algorithm by cloud simulation-CloudSim. Finally, the conclusion and future work are presented in Section V.
Related work
Although it has relatively few historical studies in three-way decisions, there are a large volume of published studies describing the thought of three-way decisions, especially in recent years, there has been an increasing amount of literature on the application of three-way decisions. As a kind of method human problem solving and information processing, the basic ideas of three-way decisions are “to divide a universal set into three pair-wise disjoint regions, or more generally a whole into three distinctive parts, and to act upon each region or part by developing an appropriate strategy”(From Yiyu Yao [19]). The unique feature of three-way decisions is a type of three-way approaches, that is, problem solving and information processing using "the three division method". Since three-way decisions was proposed, it has been widely used such as three-way computing, three-way processing, three-way classification [19], three-way analysis [14], three-way clustering [10, 13], three-way recommendation [15], measure the model of TAO(Trisecting-acting-outcome) [11] and many others.
As an effective application of clustering algorithms stem from three-way decisions, the three-way clustering expand the relationship of belongs between objects and sets. For a given cloud task set U, let U = {x1, x2, . . . , x n }, n is the number of tasks of U. If we use general clustering, then a task x i , either belong to the cs j or not belong to the cs j , the result of clustering is CS = {cs1, cs2, . . . , cs k }, k is the number of clusters. And if we use the ideas of three-way decisions, the relationship between a task and a set are: belong, may belong, do not belong. Therefore, the set cs j is divided into three regions that do not intersect each other, L region, M region and R region, where L region represents the core objects in the set, M region is edge objects in the set and if a task is not in the cluster, it belongs the R region. The result of clustering is CS = {(L (cs1) , M (cs1)) , . . . , (L (cs k ) , M (cs k ))}. Obviously, the three-way clustering has made a further distinction between closeness of objects in a cluster based on degree of dispersion between clusters and degree of clustering within the cluster.
Clustering algorithm is widely used in cloud computing. In [22], the authors use k-means algorithm to cluster tasks into three types, that is, computing, storage and network. And using weight factors between three types resource to adjust the priority, thereby reducing quantity of candidate resource by clustering resources using ant colony algorithm. To improve the task response time and scheduling efficiency, Liu et al. [23] introduced the FCM algorithm to schedule tasks, according to CPU, memory, disk IO and bandwidth attributes. Feng et al. [24] drawing on the idea of threshold, defines the visibility of resources according to the multidimensional attributes of cluster resources, and then divides the levels by the static thresholds. Further, with this level and task execution time as the adaptive function in PSO scheduling algorithm, thereby achieve the purpose of improving load balance. In [25] clustering tasks is to adopt the length and importance of tasks, and taking the maximum number of tasks completed as the objective function to build LPM model. Experimental results show that under given conditions of user task budget, more tasks can be accomplished than classical scheduling algorithms. In [26], the author uses hierarchical clustering to pre-process cloud tasks. Scheduling task using minimize tasks completion time as the objective function as well as taking into account the resource load and system throughput. The results show that the task completion time is shortened while meeting the quality of service. Gao et al. [27] put forward a delay scheduling algorithm based on task classification, clustering tasks according to the task length, and then adjust the threshold of task’s waiting time for scheduling the task. Compared with DS and FIFO algorithms, the algorithm effectively shortens tasks response time.
A large number of studies are clustering tasks based on task attribute analysis. In these studies, each task either belongs to a class or belongs to another class. In fact, some objects may be in a fuzzy state, which has the characteristics of multiple clusters at the same time. The proposition of three-way decisions provides a new idea for the modelling and analysis of task scheduling.
TWCW algorithm
Scheduling of cloud tasks is a multi-objective optimization problem including service quality, load balancing and economy. The TWCW (three-way clustering weight) algorithm proposed in this paper is based on three-way clustering tasks with the goal of maximizing the utilization of cluster resources. Then, the scheduling strategy is set according to preference of the core state and fuzzy state task based on the clustering result. In this section, we describe in detail the scheduling strategy for improving resource utilization.
Schedule model
As shown in Fig. 1, the scheduling framework include the following modules, Scheduler, its works include cluster scheduling and select the appropriate nodes for cloud tasks. Computing node(Host): the physical node in which host virtual machine. VM(Virtual Machine): the basic unit of resource allocation, execute request tasks.

Schedule framework based on TWCW.
When tasks entering into the queue, the scheduler takes out tasks one by one and clustering them according to the tasks attributes into k clusters, {cs1, cs2, . . . cs k }. Then, the target host is selected for each cluster, and the tasks are executed by the virtual machine in the host, and finally the result is returned to the broker.
Given a set of n independent tasks U = {t1, t2, . . . , t n } in which each task t i , 1 < i < n is a quadruples, t i = {id, mips i , mem i , bw i } where id, mips i , mem i , bw i denote the task’s id, computing, memory, and bandwidth resources requested by the task, respectively. A set of m identical hosts H = {h1, h2, . . . , h m } in which each host h i , 1 < i < m is a triple of pair elements, h i = {(P i , p i ) , (M i , m i ) , (B i , b i )}, where P i , M i , B i represent maximum computation, memory and bandwidth resources of host h i can be used to allocate and p i , m i , b i represent allocated computing, memory, and bandwidth resources, respectively.
Based on three-way clustering, we propose formal descriptions of task clustering as follow,
If task t ∈ L (cs i ) then t belongs to cluster cs i ; If task t ∈ R (cs i ) then t does not belongs to cluster cs i ; If task t ∈ M (cs i ) then t may be belongs to cluster cs i or not; The result is, CS = {(L (cs1) , M (cs1)) , . . . , (L (cs k ) , M (cs k ))}, where k is the number of clusters.
Three-way clustering algorithm include two steps, one is to find the best number of clusters, the second is to determine the objects in each cluster, the algorithm as shown in Algorithm 1. In the process of finding the optimal number of clusters, we use K-means algorithm to clustering task set, and determine the clustering results according to degree of coupling between clusters and degree of cohesion within the cluster.
The process of determining cluster objects include two sub steps: determining objects of M region in the cluster by nearest neighbor algorithm and then determining objects of L region and M region by subtraction between two adjacent elements in turn, and then sorting the results. Specifically, for the cluster cs h , task t i , task t j and near neighborhood Neig q (t i ), where Neig q (t i ) represents the nearest q tasks away from t i in the Euclidean distance. let centroid h represents the center point in cs h . When t i ∉ cs h , t j ∈ Neig q (t i ), if t j ∈ cs h , then t i ∈ M (cs h ). In cs h , we calculate the distance between tasks t1, t2, . . . , t l and center point centroid h respectively. Next, we sort the results to find the first task tp-1, the second task t p . Finally, dividing t1, t2, . . . , tp-1 into L (cs h ), t p , tp+1, . . . , t l into M (cs h ).
In the CS = {(L (cs1) , M (cs2)) , . . . , (L (cs
k
) , M (cs
k
))} , the center of cs
i
is
For scoring clusters, we need to consider the weight between clusters and the preference of attributes within clusters according to the cluster center centroid, the scoring matrix is defined as follow to record the scores.
The scoring function determines value of {wj,1, wj,2, . . . , wj,M} by sorting the jth attribute of {centroid1, centroid2, . . . centroid
k
}. For example, suppose a scoring matrix as follow,
The rows of the matrix represent the value of attributes of centroid, that is, {mips, ram, bw} denote the weight of resources requesting between clusters. More specifically, for the mips in the matrix,
The columns of the matrix represent the number of clusters and each column denotes preference of resource requesting within the cluster. For example, the w1 = {4, 3, 2}, that means, the task in cs1 request 4 units mips, 3 units mem and 2 units bw.
Schedule
The schedule function need to be set for L region and M region respectively according to result of CS T and score matrix W after completing the task clustering and scoring. The TWCW algorithm proposed in this paper pursues the maximizing resource utilization, so the scheduling function needs to make full use of idle resources as well as balance load of clusters.
Similarity, we also define the overall amount of resources that can be allocated in host i as follow,
The core idea of TWCW algorithm is to select nodes that can allocate the most resources as destination nodes according to the remaining allocatable resource based on the task evaluation node list H for maximizing the utilization of free resources, that is, for the task t
i
, the selection strategy is,
For tasks in L (C
i
) region, the weighting factor μ
l
is determined by the column vector of the normalized scoring matrix W, namely,
For tasks in M (C i ) region, the first is to determine the main resource for this task request according to resources requested by the task in the proportion of cluster resources. The second is to map the multidimensional properties into one-dimensional resource space, finally determining the weight factor. The process of main resource classifier are as follow,
As mentioned above, TWCW algorithm includes tasks clustering, scoring and scheduling. Suppose the set has n tasks, the clustering clusters are k, number of neighbors are q, the number of iterations using K-means is 1. The time complexity of finding the best number of cluster is O (n2I + n2q). The time complexity of determining the objects in cluster region is O (n log n + knq).
Experiment and data analysis
In this section, we discuss the experimental details of the algorithm. The experiment was performed using the simulator-Cloudsim [31].
Experimental environment and parameter settings
The specific configuration of the experiment is 1000 hosts, the host is quad-core and single-core processing capabilities were 37,274 MIPs. Host RAM is 8GB, hard drive capacity is 1TB, the bandwidth is 1Gbit/s. The parameters of tasks setup as shown in Table 1,
Task parameters setup
Task parameters setup
We evaluate the performance of TWCW algorithm from two aspects, one is average response time and the other is the system resource utilization. The average response time includes waiting time and processing time, and the system resource utilization is denoted by average usage of the host list. The experiments simultaneously implement the clustering scheduling algorithm of k-means [21] and FCM [23] and examine their performance comparing with TWCW.
In the first experiment, we respectively implement TWCW, K-means and FCM algorithms to observe the influence of clustering method on task response time respectively. The experimental results are shown in Fig. 2. The TWCW algorithm has a certain reduction in the task response time compared with K-means or FCM. It also shows that TWCW can reduce the task running time by choosing more nodes with more preference attributes for tasks, so that computing operations are finished faster and the results are returned after clustering tasks with finer granularity. The average response time of the TWCW algorithm is reduced by about 7% compared with the FCM and K-means algorithms.

The average response time with requests increasing.

The average utilization with requests increasing.

The resource usage with requests increasing.
In the second experiment, we verify optimization effect of TWCW algorithm in resource utilization. We evaluate the performance of TWCW, K-means and FCM algorithms in cluster resource utilization respectively. The experimental results are shown in Figs. 3. and 4, respectively.
From the Fig. 3, the utilization of cluster resources increases with the load increasing. It is worth noting that three algorithms have little difference when resource requests between 1000 and 2000. Once the number of request increasing quickly, the optimization effect of TWCW algorithm is more remarkable, and the improvement is nearly 11.3% compared with other algorithms.
In the Fig. 4, the solid line indicates the median of resource utilization in the four quantile interval. It can be observed that the four quantile interval of TWCW algorithm is relatively concentrated, and the median of utilization is higher than other algorithms, As we all know, K-means or FCM need to set the number of clusters in advance. Compare with this, TWCW clustering can more effectively smoothing clusters load, and improve resource utilization based on finer granularity cluster partition and objects preference scheduling.
Conclusion
Aiming at the low utilization rate of cluster resources, this paper introduces three-way clustering into cloud platform for the first time, and proposes a three-way clustering weight algorithm-TWCW. Firstly, it analyses the diversification requirements of cloud tasks and the dynamic characteristics of resources, and uses three-way clustering algorithm to cluster task sets, and then combines them. Task attributes are used to score and schedule cluster objects. The simulation experiments reveal that our approach outperforms the K-means and FCM clustering scheduling, the three-way clustering scoring algorithm (TWCW) significantly improves the average task response time and resource utilization rate, and balances the cluster load at the same time.
In the future work, we’ll refine the task model and dynamic evaluation of resource load from the perspective of energy saving. And further consider the threshold setting in the task consolidation process, and design a better evaluation strategy for clustering.
Footnotes
Acknowledgments
This work was supported in part by the national natural science foundation of China (N0.61202458, No.61403109) and natural science foundation of Heilongjiang province (N0.F2017021).
