Data set replica placement strategy based on fuzzy evaluation in the cloud

Abstract

Replication technology can efficiently enhance data availability and thereby increase system reliability in cloud storage systems. However, one urgent challenge is how to select the correct replica hosting sites, as different data center configurations lead to entirely different storage service quality. Additionally, users usually have individual requirements; some may care more about reliability while others are more likely to be concerned about cost. This paper investigates choosing the most suitable replica storage sites using an analytic hierarchy process (AHP) model by applying fuzzy comprehensive evaluation to candidate data centers for different kinds of users. First, a novel four-dimensional Qos (quality of service) model of cloud storage service is proposed. Then, a Qos preference-aware algorithm is introduced to deal with individual Qos sensitivity (IQS) constraints. In order to evaluate the candidate replica storage sites and select the best choices from among them, an algorithm based on fuzzy comprehensive evaluation is designed and implemented. Simulation results indicate that the strategy serves users with various IQS constraints well, with improved effectiveness and practicality.

Keywords

Cloud, replica placement, analytic hierarchy process (AHP), fuzzy, Qos

1 Introduction

By placing multiple copies of data sets in different data centers, replication is a commonly used technique in cloud storage systems to reduce access latency and network bandwidth utilization, thereby enhancing system reliability and load balancing, especially for applied scientific and technical programs that require large-volume data sets [4, 7]. Additionally, it has been widely accepted as an important part of data-driven process monitoring or statistical process monitoring (SPM) that applying multivariate statistics and machine learning methods to fault detection and diagnosis for industrial process operations enhances production results [14].

However, many unsolved questions remain in data set replication technology. One question is how to select the correct sites for new replica storage that can satisfy not only the system requirements but also user requirements. On one hand, it is impossible to provide unlimited storage capacity in data centers. On the other hand, different configurations of different data centers lead to varying storage service quality, such as availability, performance and fault tolerance. Users of different cloud storage services may tolerate different Qos (quality of service). Each user typically has individual requirements; some care more about reliability while others are more likely to be concerned about cost. Therefore, it is necessary to provide a preference-aware replica selection strategy for different users with individual Qos sensitivity (IQS). In addition, all resources in the cloud carry certain costs; wherever the data set replica storage is located, resource consumption must be paid for. Each data center deploys a different price policy, and thereby provides a different storage cost ratio. Furthermore, data transfer also incurs cost, as the data sets must be transferred from remote data centers. These costs may differ significantly because application data sets vary in transfer price policies, usage frequencies and data size. Moreover, because of global uncertainty, there is no suitable mathematical model that characterizes network behavior well enough to predict accurate replica placement. As a result, the wide utilization of the pay-as-you-go model in the cloud makes the data set replica placement problem more complex than before.

Fuzzy comprehensive evaluation is a mathematical method to comprehensively evaluate things that are not clearly defined in the real world by using fuzzy mathematical methods [8, 19]. In this paper, fuzzy logic is used to identify suitable data centers for the replication of data sets, because fuzzy logic can deal with reasoning that is approximate rather than fixed and exact. In order to address the replica storage placement problem in such an environment, a Qos preference-aware replica selection strategy for different kinds of users with individual Qos sensitivity constraints in cloud computing data centers is proposed. First, a novel four-dimensional Qos model of replica selection is introduced, including reliability, time, cost and security. Next, a Qos preference-aware algorithm based on an analytic hierarchy process (AHP) is proposed to deal with the IQS constraints. In order to evaluate the candidate replica storage sites and choose the best from among them, an algorithm for replica placement selection based on fuzzy comprehensive evaluation is designed and implemented.

The rest of the paper is organized as follows: Section 2 presents related research on data replication in the cloud. Section 3 provides a novel four-dimensional Qos model of replica selection. Section 4 describes the candidate data center evaluation algorithm based on fuzzy logic. Section 5 outlines the simulation environments and presents the simulation results. Finally, Section 6 concludes and provides directions for future research.

2 Literature review

With the advancement and development of various technologies, data set replica placement in distributed systems has been studied in many works, which are referenced and adopted in cloud data set replication. Lee et al. [12] presented a modified bandwidth hierarchy-based replication (BHR) algorithm by minimizing data access time and avoiding unnecessary replication. Ren et al. [20] proposed replica placement based on storage used and makespan as evaluation parameters.

Unfortunately, current solutions are frequently focused on improvements in data access performance, and neglect the cost of data set replica management, such as storage cost. Simultaneously, new requirements and challenges continue to emerge for the deployment of scientific applications in the cloud along with the development of information technology. Nevertheless, very little has been done to consider the comprehensive factors in data set replica placement, such as reliability and security, and particularly the related cost.

Based on fuzzy evaluation, information about the priority of various alternatives can be achieved as a reference for decision makers. Some recent works have addressed the problem of data set replica placement using fuzzy evaluations. Table 1 summarizes the related technological reports in recent years.

As shown in Table 1, it can be concluded that: (i) current solutions primarily consider a simple linear combination of resources. In the proposed approach, comprehensive factors are additionally considered, such as reliability and security, to improve system performance; (ii) cost is an important element in deciding replica storage placement, but little research has been conducted considering the cost paid by users. In this paper, cost is regarded as one of the fundamental components in determining replica storage placement; (iii) with growing emphasis on cloud computing, data management systems for the cloud environment have emerged, including Google file system (GFS) and Hadoop distributed file system (HDFS) [1]. However, the employed replica selection features are still relatively simple, and do not consider the Qos preference of the user. These observations motivate the development of a replica placement strategy that optimizes the choice of storage sites while maintaining high system performance.

3 Qos based replica placement selection models

There are many factors that affect data set replica selection. This paper proposes a 4-dimensional Qos model, which addresses reliability, time, security and cost, as shown in Fig. 1. These variables describe data access quality in terms of reliability; access time elapsed since request was sent; security; and the cost of data set storage and transfer. These parameters are analyzed in detail.

Reliability. It is clear that any application system that satisfies all business requirements but fails to satisfy the reliability quality of service parameters will lead to dissatisfaction in cloud users. Primary parameters involving reliability include: (i) data center availability; (ii) data transfer reliability; and (iii) consistency of data sets replicas.

Time. Data centers, as an important infrastructure component of the cloud computing environment, must handle user requests as soon as possible. Network bandwidth consumption is an important factor used to identify cloud storage service quality [5]. However, the storage request queue and storage media speed were not addressed in previous work as factors that influence the data set response time. The primary parameters involving the time dimension include: (i) available bandwidth; (ii) data transfer delay; and (iii) storage access latency. Regarding data transfer delay, each data center receives many requests simultaneously, but can serve only one request at a time [2, 9]. Therefore, requests must wait in a queue. However, there are many unknown exceptions in practical applications, such as the interruption of a previous request due to an unstable or failed network, the timeout of a previous request, or a previous request that receives a response with an error code. To solve these problems, a time threshold t_maxtf is defined as the longest transmission delay; each data transfer request must complete within this threshold, which is proportional to the data set size. Regarding storage access latency, the storage media speed and the number of requests in the queue play major roles in determining the average response time experienced by applications [2]. Different storage media have different speeds (data transfer rates) in reading and writing operations. For example, the HP StorageWorks Ultrium 920 tape drive has a speed of 120 MBps, while the HP StorageWorks Ultrium 448 tape drive has a speed of 24 MBps. Consequently, data set access latency is the data volume divided by the storage speed [9].
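The storage access latency relation above (data volume divided by storage speed) can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the 1200 MB data set size are assumptions introduced here, while the two drive speeds are the ones quoted in the text.

```python
def storage_access_latency(data_size_mb: float, speed_mbps: float) -> float:
    """Return the access latency in seconds: data volume / storage speed."""
    if speed_mbps <= 0:
        raise ValueError("storage speed must be positive")
    return data_size_mb / speed_mbps


# Drive speeds quoted in the text, in MB per second.
ULTRIUM_920_MBPS = 120.0
ULTRIUM_448_MBPS = 24.0

# Hypothetical 1200 MB data set, chosen only for illustration.
size_mb = 1200.0
fast = storage_access_latency(size_mb, ULTRIUM_920_MBPS)  # 10.0 seconds
slow = storage_access_latency(size_mb, ULTRIUM_448_MBPS)  # 50.0 seconds
```

Under these assumptions the same data set takes five times longer to read from the slower drive, which is why storage media speed enters the time dimension.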

Cost. With a pay-as-you-go model, it is obvious that cost is one of the most important aspects in deciding whether to use cloud storage services or not, especially when large data sets or “big data” are common in the cloud. There are three typical types of cost consumption: data set storage cost, transfer cost and update cost.

Security. In a cloud environment with many data centers, it is necessary to consider not only the data set security itself, but also the safety of the data set environment. The primary parameters involving security include: host security, data transfer security and replica integrity.

4 Fuzzy based Qos preference-aware and replica placement algorithm

In this section, a new method of data sets replica placement is proposed using AHP, called the fuzzy based Qos preference-aware algorithm (FQPA).

4.1 Qos preference-aware algorithm

According to user Qos preference, a Qos preference-aware algorithm (QPA) is proposed, as described below.

Step 1. Constructing a data center storage service evaluation hierarchical framework using a 4-dimensional model: reliability, time, cost and security. As shown in Fig. 2, the ultimate goal is the performance evaluation of data center storage services. There are four sub-goals as well. The third level represents the important facets of the sub-goals, representing 12 criteria in total.

Step 2. Constructing fuzzy judgment matrices by pair comparison in the rules layer and attributes layer, creating a 1–9 rating scale for reference [13]. For example, if x and y are of equal importance, then the intensity of importance f (x, y) is 1; if x is absolutely more important than y, f (x, y) is 9. This scale offers a theoretical justification for the comparison of a set of homogenous elements, and has been validated for effectiveness by a large number of applications [17].

After the hierarchy structure is established, fuzzy pair-wise comparison matrices of the ultimate goal and the sub-goals can be obtained, written as matrix A and A_k (k = 1, 2, 3, 4) with respect to reliability, time, cost and security. Specifically, A = (a_ij)_4×4 holds the pair-wise comparison values among the four dimensions, where a_ij = f (x_i, x_j), (i, j = 1, 2, 3, 4), and x_i denotes the i-th of the four elements.

The following conclusions can be drawn:

Lemma 1. If a_ij > 1, then a_i is more important than a_j for achievement of the ultimate goal. If a_ij < 1, a_j is more important than a_i. Furthermore, if a_ij = 1, then a_i and a_j are of equal importance to the overall goal.

Lemma 2. The pair-wise matrix A, whose elements are a_ij values, is essentially a square positive reciprocal matrix, and a_ii = 1, a_ij > 0, a_ij = 1/a_ji, for any i, j = 1, 2, 3, 4, also referred to as the positive reciprocity principle.
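The positive reciprocity principle of Lemma 2 can be illustrated with a minimal Python sketch. The function names and the judgment values are hypothetical, chosen only for illustration; the paper does not prescribe an implementation.

```python
from fractions import Fraction


def build_reciprocal_matrix(n, upper_judgments):
    """Build an n x n pair-wise matrix from judgments f(x_i, x_j) with i < j.

    The diagonal is 1 and the lower triangle holds the reciprocals,
    so that a_ij = 1 / a_ji as stated in Lemma 2.
    """
    matrix = [[Fraction(1) for _ in range(n)] for _ in range(n)]
    for (i, j), value in upper_judgments.items():
        if not 0 <= i < j < n:
            raise ValueError("judgments must be given for 0 <= i < j < n")
        value = Fraction(value)
        if value <= 0:
            raise ValueError("judgments must be positive")
        matrix[i][j] = value
        matrix[j][i] = 1 / value
    return matrix


def is_positive_reciprocal(matrix):
    """Check a_ii = 1, a_ij > 0 and a_ij = 1 / a_ji for all i, j."""
    n = len(matrix)
    return all(
        matrix[i][j] > 0 and matrix[i][j] * matrix[j][i] == 1
        for i in range(n)
        for j in range(n)
    )


# Hypothetical judgments on the 1-9 scale.
# Index order: 0 reliability, 1 time, 2 cost, 3 security.
judgments = {
    (0, 1): 3, (0, 2): 5, (0, 3): 2,
    (1, 2): 2, (1, 3): Fraction(1, 2),
    (2, 3): Fraction(1, 3),
}
A = build_reciprocal_matrix(4, judgments)
```

Only the six upper-triangle judgments have to be elicited from the user; the remaining entries follow from reciprocity. Exact fractions are used so that the reciprocity check is not affected by rounding.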

Similarly, four matrices can be obtained in the attributes layer, one for each dimension, $A_{k} = (a_{ij}^{k})_{3 \times 3}$, where $a_{ij}^{k} = f (x_{i}, x_{j})$ compares the three attributes of the k-th dimension, k = 1, 2, 3, 4. These matrices also satisfy the above two lemmas.

Step 3. Calculating the relative weights for the five matrices in Step 2 with the geometric mean method. Next, the consistency of each judgment matrix must be checked. For example, matrix A yields the relative weights of reliability, time, cost and security, written as W = {w₁, w₂, w₃, w₄}.

Sub-step 1: Normalization. The eigenvector (a column vector written as a row to save space), also called the Relative Value Vector (RVV), can be approximated by the geometric mean of each row, normalized as in Equation (1). $w_{i} = \frac{\sqrt[n]{\prod_{j = 1}^{n} a_{ij}}}{\sum_{k = 1}^{n} \sqrt[n]{\prod_{j = 1}^{n} a_{kj}}} \quad (i = 1, 2, 3, 4)$ (1)
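The weight calculation of Sub-step 1 can be sketched as follows, assuming the geometric mean method named in Step 3, that is, the n-th root of each row product followed by normalization. The function name and the example matrix are hypothetical.

```python
import math


def geometric_mean_weights(matrix):
    """Return normalized weights from the n-th root of each row product."""
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [root / total for root in roots]


# Hypothetical 4 x 4 judgment matrix (reliability, time, cost, security).
A = [
    [1, 3, 5, 2],
    [1 / 3, 1, 2, 1 / 2],
    [1 / 5, 1 / 2, 1, 1 / 3],
    [1 / 2, 2, 3, 1],
]
W = geometric_mean_weights(A)
```

For this illustrative matrix the resulting weights sum to 1, and reliability receives the largest weight because its row dominates every pair-wise comparison.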

Sub-step 2: Consistency checking analysis. The fuzzy judgment matrices have been established, but may create a situation with inconsistent logic. For example, one evaluator indicates that “x is more important than y, y seems moderately more important when compared with z, and x is equally important compared with y”. To avoid or reduce the inconsistency, it is necessary to analyze the consistency of the evaluation.

During construction of the judgment matrix, strict transitivity is not required, that is, a_ij × a_jk = a_ik need not hold exactly, but the judgment matrix is generally required to be approximately consistent. The consistency ratio (CR) can be calculated according to Equation (2) [17, 18]. $CR = \frac{CI}{RI}$ (2)

RI is the average random consistency index, as shown in Table 2. CI is the consistency index, which can be approximately calculated using Equation (3), where n is the dimension of matrix A (n = 4), and λ_max is the maximum eigenvalue of the matrix. $CI = \frac{\lambda_{\max} - n}{n - 1}$ (3)

The maximum eigenvalue can be approximated from the weight vector using Equation (4). $\lambda_{\max} = \frac{1}{n} \sum_{i = 1}^{n} \frac{\sum_{j = 1}^{n} a_{ij} w_{j}}{w_{i}}$ (4)

If CR < 0.1, the consistency of the matrix is acceptable; otherwise, if CR ≥ 0.1, appropriate amendments should be made to the matrix and the above steps should be repeated until the value is less than 0.1.
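The consistency check of Equations (2) to (4) can be sketched as below. The random index values are the ones commonly quoted for Saaty's AHP and are an assumption here, since Table 2 is not reproduced; the function names and the example matrix are likewise hypothetical, and the weights are computed with the geometric mean method.

```python
import math

# Random index (RI) values commonly quoted for Saaty's AHP. They are an
# assumption in this sketch because Table 2 is not reproduced here.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def geometric_mean_weights(matrix):
    """Normalized n-th roots of the row products."""
    n = len(matrix)
    roots = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(roots)
    return [root / total for root in roots]


def lambda_max(matrix, weights):
    """Equation (4): average of (A w)_i / w_i over all rows."""
    n = len(matrix)
    ratios = [
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ]
    return sum(ratios) / n


def consistency_ratio(matrix):
    """Equations (2) and (3): CR = CI / RI, CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    weights = geometric_mean_weights(matrix)
    ci = (lambda_max(matrix, weights) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical 4 x 4 judgment matrix (reliability, time, cost, security).
A = [
    [1, 3, 5, 2],
    [1 / 3, 1, 2, 1 / 2],
    [1 / 5, 1 / 2, 1, 1 / 3],
    [1 / 2, 2, 3, 1],
]
cr = consistency_ratio(A)
acceptable = cr < 0.1  # the matrix is kept only if CR is below 0.1
```

If `acceptable` is false, the judgments would be revised and the calculation repeated, as the text describes.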

W is the weight vector once the value of CR meets the pre-defined index. Additionally, the weight vectors W₁, W₂, W₃ and W₄ of the attribute-layer matrices A₁ to A₄ can be obtained using the same method.

Using the QPA algorithm, the IQS of each cloud storage service user can be derived from fuzzy and non-quantitative IQS constraints. The resulting weight vector reflects the preferences of IQS cloud users and is the foundation for the following algorithm.

4.2 Replica placement selection algorithm

In this sub-section, a novel fuzzy comprehensive evaluation algorithm for replica placement selection in the cloud using fuzzy theory is designed. The fuzzy Qos preference-aware replica placement algorithm (FQPA) can be described as follows:

Step 1. Identify and divide the cloud user attribute set. According to the 4-dimensional Qos model in Fig. 2, all Qos attributes can be divided into four sub-sets, written as U = {U₁ (reliability), U₂ (time), U₃ (cost), U₄ (security)}, where $U_{1} = \{u_{1}^{(1)} \text{ (host availability)}, u_{2}^{(1)} \text{ (data transfer reliability)}, u_{3}^{(1)} \text{ (replicas' consistency)}\}$, $U_{2} = \{u_{1}^{(2)} \text{ (storage access speed)}, u_{2}^{(2)} \text{ (available bandwidth)}, u_{3}^{(2)} \text{ (network delay)}\}$, $U_{3} = \{u_{1}^{(3)} \text{ (data storage cost)}, u_{2}^{(3)} \text{ (data transfer cost)}, u_{3}^{(3)} \text{ (data update cost)}\}$ and $U_{4} = \{u_{1}^{(4)} \text{ (host security)}, u_{2}^{(4)} \text{ (data transfer security)}, u_{3}^{(4)} \text{ (replicas' integrity)}\}$.

Step 2. Determine the user Qos satisfaction evaluation set V. Regardless of the type of Qos attribute, the evaluation has a single objective: offering a top-level storage service. In this paper, a five-level evaluation set is used, that is, V = {I, II, III, IV, V}, where a higher grade represents greater user satisfaction.

Step 3. Determine the user preference using Qos aware technology. In this paper, the user Qos requirements are translated into comprehensive evaluation weight sets using the following sub-steps:

Sub-step 1: Calculate the weight of each dimension using Qos preference, then obtain the i-th dimension weight a_i (i = 1, 2, 3, 4);

Sub-step 2: Similarly, calculate the weight of attribute $a_{j}^{(i)}$ for each dimension.

Step 4. Calculate the 2nd level single judgment matrix corresponding to the candidate data center.

There are at least two difficulties in evaluating the 12 attributes. First, there is no public criterion; second, the upper and lower bounds of the values differ between attributes. In addition, some attributes are better with larger values, whereas others are better with smaller values.

In order to evaluate Qos attributes, the attribute values must be standardized using the membership function. In this paper, triangular membership functions (MF) are used to describe these variables. Each variable has five MFs: I, II, III, IV and V. Table 3 describes the membership function for each attribute.

In Table 3, the replica update frequency is measured in units of mpt and hpt; x-mpt (x minutes per time) indicates that the data set updates once every x minutes, and x-hpt (x hours per time) indicates that the data set updates once every x hours. Request wait time in the queue is used to describe data transfer delay. Because requests are served one at a time, a request has to wait for the total storage access latency of the prior requests in the queue. Thus, the data transfer delay is the sum of the access latencies of all prior requests.
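The queue-based delay described above can be sketched as follows, assuming every queued request is read from the same storage medium at a constant speed. The function name and the request sizes are hypothetical.

```python
def data_transfer_delay(prior_request_sizes_mb, speed_mbps):
    """Sum the access latencies (size / speed) of all prior queued requests."""
    if speed_mbps <= 0:
        raise ValueError("storage speed must be positive")
    return sum(size / speed_mbps for size in prior_request_sizes_mb)


# Three hypothetical requests already waiting in the queue, sizes in MB,
# served by a medium with a speed of 120 MBps.
queue_mb = [1200.0, 2400.0, 600.0]
delay_s = data_transfer_delay(queue_mb, 120.0)  # 10 + 20 + 5 = 35.0 seconds
```

A new request arriving behind these three would therefore wait 35 seconds before its own access begins, under the stated assumptions.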

In addition, the access control strategy, transfer security strategy and data security strategy are described in Table 4. The number of security strategies is used to describe the host security, data transfer security and replica integrity, respectively. A modern server CPU usually has multiple cores, while the CPU utilization ratio refers to the CPU resource utilization of the whole system. It is not difficult to acquire all the processor utilization ratios using functions in the C++ or Java programming language. For simplicity, the average CPU utilization ratio is used to represent the data center availability. For example, a data center whose four cores have utilization ratios of CPU-1: 23.98%, CPU-2: 17.43%, CPU-3: 16.02% and CPU-4: 8.15% has an average CPU utilization ratio of 16.395%.

As an example, the membership functions of the CPU utilization ratio are given in Equations (5–9). The membership functions of the other attributes are similar. $\mu_{\mathrm{I}} (x) = \begin{cases} 1, & x \geq 80 \\ \frac{1}{1 + 0.2 \times (80 - x)^{2}}, & x < 80 \end{cases}$ (5) $\mu_{\mathrm{II}} (x) = \begin{cases} \frac{x}{61}, & x \leq 60 \\ 1, & 60 < x < 80 \\ \frac{100 - x}{21}, & x \geq 80 \end{cases}$ (6) $\mu_{\mathrm{III}} (x) = \begin{cases} \frac{x}{41}, & x < 40 \\ 1, & 40 \leq x \leq 60 \\ \frac{100 - x}{41}, & x > 60 \end{cases}$ (7) $\mu_{\mathrm{IV}} (x) = \begin{cases} \frac{x}{21}, & x < 20 \\ 1, & 20 \leq x \leq 40 \\ \frac{100 - x}{61}, & x > 40 \end{cases}$ (8) $\mu_{\mathrm{V}} (x) = \begin{cases} 1, & x \leq 20 \\ \frac{1}{1 + 0.2 \times (x - 20)}, & x > 20 \end{cases}$ (9)

Its membership function is depicted in Fig. 3.
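The five membership functions of the CPU utilization ratio can be sketched in Python as below. The plateaus of levels III and IV (40 to 60 and 20 to 40) are an assumption inferred from the denominators 41, 21 and 61, so that the pieces of each function do not overlap; the function names are hypothetical. The input is the average utilization of the four-core example given earlier.

```python
def mu_I(x):
    return 1.0 if x >= 80 else 1.0 / (1.0 + 0.2 * (80 - x) ** 2)


def mu_II(x):
    if x <= 60:
        return x / 61
    if x < 80:
        return 1.0
    return (100 - x) / 21


def mu_III(x):
    # Assumed plateau on [40, 60].
    if x < 40:
        return x / 41
    if x <= 60:
        return 1.0
    return (100 - x) / 41


def mu_IV(x):
    # Assumed plateau on [20, 40].
    if x < 20:
        return x / 21
    if x <= 40:
        return 1.0
    return (100 - x) / 61


def mu_V(x):
    return 1.0 if x <= 20 else 1.0 / (1.0 + 0.2 * (x - 20))


def membership_row(x):
    """Membership degrees of x in levels I to V, i.e. one row of R_i."""
    return [mu_I(x), mu_II(x), mu_III(x), mu_IV(x), mu_V(x)]


# Per-core utilization ratios (percent) from the example in the text.
cores = [23.98, 17.43, 16.02, 8.15]
average = sum(cores) / len(cores)  # about 16.395
row = membership_row(average)
```

With a low average utilization, the membership degree is highest for level V and close to zero for level I, which matches the reading that a lightly loaded data center is more available.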

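Next, we evaluate each attribute according to its membership function, and obtain four judgment matrices as shown in Equation (10), where row k holds the membership degrees of the k-th attribute of dimension i in the five levels of V. $R_{i} = \begin{bmatrix} r_{11}^{(i)} & r_{12}^{(i)} & r_{13}^{(i)} & r_{14}^{(i)} & r_{15}^{(i)} \\ r_{21}^{(i)} & r_{22}^{(i)} & r_{23}^{(i)} & r_{24}^{(i)} & r_{25}^{(i)} \\ r_{31}^{(i)} & r_{32}^{(i)} & r_{33}^{(i)} & r_{34}^{(i)} & r_{35}^{(i)} \end{bmatrix}, (i = 1, 2, 3, 4)$ (10)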

Step 5. First level fuzzy comprehensive evaluation. Conduct the first level fuzzy comprehensive evaluation for each of the four Qos dimensions, and obtain four judgment results. The results can be obtained by Equation (11), where ∘ is the fuzzy operator, representing the evaluation function.

$\underset{\sim}{B}_{i} = \underset{\sim}{A}_{i} \circ R_{i} = (a_{1}^{(i)}, a_{2}^{(i)}, a_{3}^{(i)}) \circ \begin{bmatrix} r_{11}^{(i)} & r_{12}^{(i)} & r_{13}^{(i)} & r_{14}^{(i)} & r_{15}^{(i)} \\ r_{21}^{(i)} & r_{22}^{(i)} & r_{23}^{(i)} & r_{24}^{(i)} & r_{25}^{(i)} \\ r_{31}^{(i)} & r_{32}^{(i)} & r_{33}^{(i)} & r_{34}^{(i)} & r_{35}^{(i)} \end{bmatrix} = (b_{1}^{(i)}, b_{2}^{(i)}, \dots, b_{5}^{(i)})$ (11)

In this paper, the weighted averaging operator M(·,+), which takes all elements into account, is used, as shown by Equation (12). $b_{j}^{(i)} = \sum_{k = 1}^{3} a_{k}^{(i)} \cdot r_{kj}^{(i)}, \quad \sum_{k = 1}^{3} a_{k}^{(i)} = 1, \quad (j = 1, 2, \dots, 5)$ (12)
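The weighted averaging operator M(·,+) can be sketched as below for one dimension, assuming three attributes per dimension (12 attributes in total) and five evaluation levels. The function name, the attribute weights and the membership degrees are hypothetical values chosen for illustration.

```python
def weighted_average(weights, judgment_matrix):
    """M(.,+) operator: b_j = sum over k of a_k * r_kj."""
    if len(weights) != len(judgment_matrix):
        raise ValueError("one weight is needed per attribute row")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    levels = len(judgment_matrix[0])
    return [
        sum(a * row[j] for a, row in zip(weights, judgment_matrix))
        for j in range(levels)
    ]


# Hypothetical attribute weights for one dimension (three attributes).
A_i = [0.5, 0.3, 0.2]

# Hypothetical judgment matrix R_i: one row per attribute, one column per
# evaluation level I to V.
R_i = [
    [0.0, 0.1, 0.2, 0.5, 1.0],
    [0.0, 0.0, 0.5, 1.0, 0.5],
    [1.0, 0.5, 0.0, 0.0, 0.0],
]
B_i = weighted_average(A_i, R_i)  # approximately [0.2, 0.15, 0.25, 0.55, 0.65]
```

Unlike a max-min operator, the weighted average lets every attribute contribute to each level in proportion to its weight, which is the property the text relies on.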

Step 6. Second level comprehensive evaluation. Consider U_i as an element and use $\underset{\sim}{B}_{i}$ as its single element evaluation, and then obtain the second level matrix using Equation (13). $R = (\underset{\sim}{B}_{1}, \underset{\sim}{B}_{2}, \underset{\sim}{B}_{3}, \underset{\sim}{B}_{4})^{T}$ (13)

Afterwards, a second level comprehensive evaluation result can be obtained using the dimension weights $\underset{\sim}{A} = (a_{1}, a_{2}, a_{3}, a_{4})$ from Step 3, based on Equation (14).

$\underset{\sim}{B} = \underset{\sim}{A} \circ R = (a_{1}, a_{2}, a_{3}, a_{4}) \circ (\underset{\sim}{B}_{1}, \underset{\sim}{B}_{2}, \underset{\sim}{B}_{3}, \underset{\sim}{B}_{4})^{T} = (b_{1}, b_{2}, b_{3}, b_{4}, b_{5})$ (14)

Step 7. Calculate the value of each candidate replica storage site, and sort the results. First, assign the value set V_P = {10, 30, 50, 70, 90} corresponding to the level set V = {I, II, III, IV, V}, with higher scores indicating greater satisfaction. Then, repeat Steps 4 through 6 to evaluate each candidate replica storage site and obtain its second level comprehensive result. Finally, calculate the final scores using Equation (15). $P_{i} = \underset{\sim}{B}^{i} \times (V_{P})^{T}$ (15)

Then the most suitable replica storage site is the data center with the highest score, $\max_{i \in [1, m]} (P_{i})$.
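Steps 5 to 7 can be combined into a short end-to-end sketch. It assumes the weighted averaging operator at both levels, three attributes per dimension, and hypothetical weights and judgment matrices; the site names `dc_a` and `dc_b` and all function names are illustrative only.

```python
LEVEL_SCORES = [10, 30, 50, 70, 90]  # V_P for levels I to V


def weighted_average(weights, rows):
    """b_j = sum over k of weight_k * rows[k][j]."""
    columns = len(rows[0])
    return [
        sum(w * row[j] for w, row in zip(weights, rows))
        for j in range(columns)
    ]


def site_score(dimension_weights, attribute_weights, judgments):
    """Two-level evaluation of one site followed by scoring with V_P."""
    first_level = [
        weighted_average(attribute_weights[i], judgments[i])
        for i in range(len(dimension_weights))
    ]
    second_level = weighted_average(dimension_weights, first_level)
    return sum(b * v for b, v in zip(second_level, LEVEL_SCORES))


def best_site(dimension_weights, attribute_weights, candidates):
    """Score every candidate and return the best name with all scores."""
    scores = {
        name: site_score(dimension_weights, attribute_weights, judgments)
        for name, judgments in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical user preference weights (reliability, time, cost, security).
dimension_weights = [0.4, 0.3, 0.2, 0.1]
attribute_weights = [[0.5, 0.3, 0.2]] * 4

# Two hypothetical sites: every attribute of dc_a falls fully in level V,
# every attribute of dc_b falls fully in level III.
all_level_v = [[0, 0, 0, 0, 1]] * 3
all_level_iii = [[0, 0, 1, 0, 0]] * 3
candidates = {"dc_a": [all_level_v] * 4, "dc_b": [all_level_iii] * 4}

best, scores = best_site(dimension_weights, attribute_weights, candidates)
```

In this deliberately simple case the scores are about 90 for `dc_a` and 50 for `dc_b`, so `dc_a` is selected; real judgment matrices would spread membership over several levels.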

5 Performance evaluation

In order to verify the validity and reliability of the comprehensive evaluation algorithm with respect to different individual Qos sensitivities, a replica storage placement problem simulation platform is designed and implemented at the Network & Information Security Lab, Shandong University of Finance and Economics (SDUFE). This simulation system is constructed based on SwinDeW-C [3, 21], which contains 10 super data centers (servers) and 200 ordinary data centers. The system on each data center is installed with VMWare (http://ww.vmware.com), so that it can offer unified computing and storage resources.

There are five modules in the simulation platform:

Simulation conditions generator, which is responsible for generating various simulated conditions, such as data center transfer delay, data update frequency, disk throughput, data transfer strategy, etc.;

Replica locater, which is responsible for locating candidate storage sites, including receiving the data request and collecting the data center information;

Qos perceiver, whose main purpose is to turn the user Qos preferences into a comprehensive evaluation weight set;

Replica placement selector, whose main purpose is to simulate the algorithm as in Section 4;

Results display, whose main purpose is to depict the results on the screen.

In the performance evaluation, in addition to the proposed algorithms, the no replication (NREP) scheme is used as the baseline, and the random (RAND) scheme and greedy algorithm (GA) are used for comparison. As the name states, in NREP only the original objects in the root are kept, and no additional replicas of the data set are created.

Random simulations are conducted on randomly generated data sets of different sizes, generation times, and usage frequencies. In the simulations, 100 data sets are used, each with a random size from 100 GB to 1 TB. The usage frequency is also random, ranging from 1 to 10 uses.

Simulation 1: A comparison of available probability with different replica placement strategies. In simulation 1, CU_rel users (who express high reliability requirements for cloud storage service) are used as an example, and the strategies are compared by available probability. Generally speaking, users concerned with reliability can be satisfied with a high data set available probability. Figure 4(a) depicts the available probabilities of data sets with an increasing number of replicas, indicating that the available probability increases for all strategies except NREP. Moreover, with comprehensive evaluation of data centers, FQPA is always better than the other three studied algorithms and can optimize the replica storage placement, which further increases the available probability of data sets.

Simulation 2: A comparison of cost with different replica placement strategies. In simulation 2, CU_cost users (who are highly concerned about data management cost) are used as an example, and the strategies are compared by data set management cost. In order to compare the costs, the normalized cost (NC_k) is defined as the ratio of the difference between the cost of NREP and the cost of the feasible solution found by the algorithm to the cost of the NREP scheme. Figure 4(b) presents a comparison graph of the total data management costs. The graph clearly shows a cost reduction using the proposed replica strategy as compared to the other algorithms. The total cost of replicas becomes constant for a certain number of replicas. As a result, the proposed algorithm optimizes the cost of replication.
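The normalized cost defined above can be written as a one-line function. The function name and the cost figures are hypothetical and are not simulation results from the paper.

```python
def normalized_cost(cost_nrep, cost_algorithm):
    """NC_k = (cost of NREP - cost of algorithm k) / cost of NREP."""
    if cost_nrep <= 0:
        raise ValueError("the NREP cost must be positive")
    return (cost_nrep - cost_algorithm) / cost_nrep


# Hypothetical costs in arbitrary currency units.
nc = normalized_cost(cost_nrep=200.0, cost_algorithm=150.0)  # 0.25
```

A larger NC_k means a larger saving relative to the no-replication baseline, and a value of 0 means the algorithm costs exactly as much as NREP.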

Simulation 3: A comparison of average transfer times with different replica placement strategies. CU_time users (who are highly concerned about service time) are used as an example, and the strategies are compared by average transfer time. Generally, those concerned with time can be satisfied with less data transfer delay. Figure 4(c) depicts the average transfer time among different replica placement strategies. Similarly, the NREP average is defined as 100, and the other values are expressed as the ratio of the average time to that of NREP. As shown in Fig. 4(c), in all circumstances, the FQPA algorithm shows performance improvement over all other algorithms in terms of transfer time. As the number of replicas increases, the average transfer time decreases because the data sets are stored in more data centers and data transfer need only occur from a nearby site.

6 Conclusions and future works

Because the human decision-making process usually contains fuzziness and vagueness, fuzzy AHP (FAHP) is adopted to solve the problem in this paper. In order to find the correct replica storage sites for different kinds of users with individual Qos sensitivities in the cloud environment, this paper proposes an approach based on FAHP with comprehensive consideration of the system configurations and user Qos preference. The analytic hierarchy is structured by four major parameters: data center reliability, cost, time and security. Simulation results have shown that the selection strategy can satisfy users' IQS preferences and demonstrates improved effectiveness and practicality over traditional methods.

In future works, a knowledge-based expert system can be integrated to help decision-makers make more concise calculations and interpret the results in each step. Moreover, a proper evaluation approach as well as a suitable decision logic can be developed to make a correct final decision for data-driven framework systems, e.g., data-driven approaches for industrial process monitoring or improved PLS focused on key performance indicators related to fault diagnosis.

Acknowledgments

The work presented in this paper is partly supported by the Project of Shandong Provincial Natural Science Foundation (No. ZR2016FM01), China; the Doctor Foundation of Shandong University of Finance and Economics under Grant (No. 2010034); and the Project of Jinan High-tech Independent and Innovation (No. 201303015), China.

References

1. Higai, Takefusa, Nakada and Oguchi, A study of effective replica reconstruction schemes for the Hadoop distributed file system, IEICE Transactions on Information & Systems E98D(4) (2015), 872–882.

2. Silberschatz, Gagne and Galvin P.B., Operating System Concepts 8th Edition Binder Ready Version, Wiley Publishing, 2008.

3. Yuan, Yang, Liu and Chen, Local-optimizations based strategy for cost-effective datasets storage of scientific applications in the cloud, IEEE International Conference on Cloud Computing, Washington DC USA, 2011, pp. 179–186.

4. Sun D.W., Chang G.R., Gao, Jin L.Z. and Wang X.W., Modeling a dynamic data replication strategy to increase system availability in cloud computing environments, Journal of Computer Science and Technology 27(2) (2012), 256–272.

5. AL-Mistarihi H.H.E. and Yong C.H., Response time optimization for replica selection service in data grids, Journal of Computer Science 4 (2008), 487–493.

6. Wang, Zheng, Wang H.M. and Wu Q.Y., Fuzzy logic based replica management infrastructure for balanced resource allocation and efficient overload control of the complex service-oriented applications, Springer Berlin Heidelberg, 2007.

7. […], Ping, Ge, Wang and Fu, Cloud storage as the infrastructure of cloud computing, International Conference on Intelligent Computing and Cognitive Informatics (ICICCI), Kuala Lumpur, Malaysia, 2010, pp. 380–383.

8. Adjenughwure and Papadopoulos, A new hybrid fuzzy-statistical membership function based on fuzzy estimators, Journal of Intelligent & Fuzzy Systems 30 (2016), 2761–2771.

9. Ranganathan and Foster, Identifying dynamic replication strategies for a high-performance data grid, GRID 2001 Second International Workshop, Denver CO USA, 2001, pp. 75–86.

10. Wang L.X., Liu and Lin W.W., Research on data replication optimization model based on fuzzy forecasting, Computer Technology and Development 12(12) (2013), 82–91.

11. Beigrezaei, Kanan H.R. and Haghighat A.T., A new fuzzy based dynamic data replication algorithm in data grids, The 13th Iranian Conference on Fuzzy Systems (IFSC), Qazvin Iran, 2013, pp. 1–5.

12. Lee M.C., Leu F.Y. and Chen Y.P., PFRF: An adaptive data replication algorithm based on star-topology data grids, Future Generation Computer Systems 28 (2012), 1045–1057.

13. Xiong R.Q., Luo J.Z., Song A.B. and Jin J.H., QoS preference-aware replica selection strategy in cloud computing, Journal on Communications 32 (2011), 93–102.

14. Qin S.J., Survey on data-driven industrial process monitoring and diagnosis, Annual Reviews in Control 36(2) (2012), 220–234.

15. Vobugari, Somayajulu and Subaraya B.M., Dynamic replication algorithm for data replication to improve system availability: A performance engineering approach, IETE Journal of Research 61(2) (2015), 132–141.

16. Imam and Rahman R.M., Implementation and performance analysis of Fuzzy replica replacement algorithm in data grid, Springer Berlin Heidelberg (2011), 95–110.

17. Saaty T.L. and Vargas L.G., Models, Methods, Concepts & Applications of the Analytic Hierarchy Process, Kluwer Academic Publishers, Norwell, MA, 2001.

18. Saaty T.L., The analytic hierarchy process: Planning, priority setting, resource allocation, McGraw-Hill, NY USA, 1980.

19. Farag, El-Hosary, El-Metwally and Kamel, Design and implementation of a variable-structure adaptive fuzzy-logic yaw controller for large wind turbines, Journal of Intelligent & Fuzzy Systems 30 (2016), 2773–2785.

20. Ren X.Y., Wang R.C. and Kong, Using optorsim to efficiently simulate replica placement strategies, The Journal of China Universities of Posts and Telecommunications 17(1) (2010), 111–119.

21. Yang, Liu, Chen, Liu, Yuan and Jin, An algorithm in SwinDeW-C for scheduling transaction-intensive cost-constrained cloud workflows, IEEE Fourth International Conference on e-Science, Indianapolis Indiana USA, 2008, pp. 374–375.