Abstract
Distributed storage systems store data redundantly at multiple servers that are geographically spread throughout the world. This basic approach would be sufficient in handling server failure due to natural faults, because when one server fails, data from healthy servers can be used to restore the desired redundancy level. However, in a setting where servers are untrusted and can behave maliciously, data redundancy must be used in tandem with Remote Data Checking (RDC) to ensure that the redundancy level of the storage systems is maintained over time.
All previous RDC schemes for distributed systems impose a heavy burden on the data owner (client) during data maintenance: To repair data at a faulty server, the data owner needs to first download a large amount of data, re-generate the data to be stored at a new server, and then upload this data at a new healthy server. We work on a new concept, namely, server-side repair, in which the servers are responsible to repair the corruption, whereas the client acts as a lightweight repair coordinator during repair. We propose two novel RDC schemes for replication-based distributed storage systems,
Finally, we evaluate the performance of the two schemes. For the
Keywords
Introduction
The recent proliferation of cloud services has made it easier than ever to build distributed storage systems based on Cloud Storage Providers (CSPs). Traditionally, a distributed storage system stores data redundantly at multiple servers that are geographically spread throughout the world. In a benign setting where the storage servers always behave in a non-adversarial manner, this basic approach would be sufficient in order to deal with server failure due to natural faults. In this work however, we consider a setting in which the storage servers are untrusted and may act maliciously. In this setting, Remote Data Checking (RDC) [4,5,27,34] can be used to ensure that the data remains recoverable over time even if the storage servers are untrusted.
When a distributed storage system is used in tandem with remote data checking, we can distinguish several phases throughout the lifetime of the storage system:
In cloud storage outsourcing, a data owner stores data in a distributed storage system that consists of multiple cloud storage servers. The storage servers may belong to the same CSP (e.g., Amazon has multiple data centers in different locations), or to different CSPs. The ultimate goal of the data owner is that the data will be retrievable at any point of time in the future. Conforming to this notion of storage outsourcing, the data owner would like to outsource both the storage and the management of the data. In other words, after the
In this work, we ask the question: Is it possible to design an RDC scheme which can repair corrupted data with the least data owner intervention? We answer this question in the positive by exploring a model which minimizes the data owner’s involvement in the
We consider a new storage system architecture in which each storage server exposes an interface for data manipulation so that the data owner can coordinate the actions of the storage servers in the
the system stores t replicas of the data owner’s original file.
the system imposes a small load on the verifier during the
the system imposes a small management load on the data owner (by minimizing the involvement of the data owner during the
The first two properties alone can be achieved based on techniques proposed in previous work ([16] provides multiple replica guarantees, whereas RDC based on spot-checking [4,27,34] supports a lightweight verification mechanism in the
Solution overview
Two insights motivate the design of our solution:
Previous work [7,30] proposed to store identical replicas at storage servers which are in different locations. To check that each server stores a replica, they require servers to respond fast, thus relying on the network delay and bandwidth properties. While storing identical replicas has the advantage of simplicity, in Section 2.1 we show that this approach has major limitations. Moreover, we show that for real-world CSPs, one of the assumptions made by prior work [7] does not hold.
the servers are usually connected through premium network connections (high bandwidth), as opposed to the data owner’s connection which may have limited download/upload bandwidth. Our experiments in Table 8 (Appendix A) show that Amazon AWS has premium Internet connection of up to tens of MB/s between its data centers.
the computational burden during the
Previous RDC schemes for replication-based distributed storage systems ([16]) do not give the storage servers access to the original data owner’s file. Each replica is a masked/encrypted version of the original file. As a result, the
In this work, we propose to use a different paradigm, in which the data owner gives the servers both access to the original file2
If data confidentiality is a concern, data can be first encrypted and our approach can be applied on top of the encrypted data.
This basic approach is vulnerable to a potential attack, the replicate on the fly (ROTF) attack: During
To overcome the ROTF attack, we make replica creation to be time consuming. In this way, malicious servers cannot generate replicas on the fly during
We consider two types of economically-motivated adversaries, a static adversary and a dynamic adversary. A static adversary refers to an adversary who performs the ROTF attack by using a fixed amount of computational power. This captures a CSP who initially has a fixed budget for its computational power, and does not increase its budget over time. Our first scheme,
We point out limitations of a previous network delay-based model for establishing data geolocation and revise this model to suit our approach. Based on experiments on the Amazon cloud platform, we show that one of the assumptions made in the network delay-based model does not hold in practice. We further show that an RDC scheme built on such a model can only provide a very low data possession assurance (Section 2.1). We revise this model to include replica differentiation and time-consuming replica generation, in order to limit the ability of economically-motivated adversaries to cheat. Our new model for checking replica storage allows servers to generate new replicas and shifts the burden during the All previous distributed RDC schemes place a heavy burden on the client during repair. We propose We propose We provide guidelines on how to choose the parameters for For
In this section, we first review a previously proposed theoretical framework that relies purely on network delay to establish the geolocation of files at cloud providers, and point out several limitations of this model when used with a basic RDC protocol on the Amazon cloud service provider. The main limitation is that one of its assumptions does not hold in a practical setting, and thus a protocol that relies only on the network delay to detect server misbehavior can only offer a very low data possession guarantee. We then augment this model to include time-consuming replica generation in order to make RDC usable for geolocation of files in the context of a real-world cloud storage provider such as Amazon.
A network delay-based model and its limitations
Benson, Dowsley and Shacham proposed a theoretical model for verifying that copies of a client’s data are stored at different geographic locations [7] (we refer to it as the “BDS model”). This model allows to derive a condition which can be used to detect if a server at some location does not have a copy of the data. The idea behind the condition is that an auditor which challenges a storage server must receive an answer within a certain time, otherwise the server is considered malicious. The time is chosen such that a server that does not have the challenged data cannot provide an answer by using data from a server at a different geolocation.
The locations of all data centers of the cloud provider are known.
The cloud provider does not have any exclusive Internet connection between the data centers.
For each data center s, it is possible to have access to a machine that is located very close to s (i.e., very small network latency), such as in the same data center.
Consider the case when the client wants to check if

Auditing protocol: Client C checks if server
Let
If the data center
Based on Assumption 3 in the BDS model, the auditor can be located very close to
To ensure that
Using the numbers in Tables 8 and 9, with
The main problem with the basic PoR protocol based on the BDS model (cf. Section 2.1) is that all the file copies are identical and the auditor relies solely on the network delay to detect malicious server behavior. As a result, the protocol must assume that there is no exclusive Internet connection between the data centers (Assumption 2 in the BDS model). Having established in Section 2.1 that this assumption does not hold in a practical setting, we augment the BDS model to make it usable in a practical setting. Namely, we require that the file replicas stored at each server must be different and personalized for each server. Upon being challenged, each server must produce an answer that is dependent on its own replica. As a result, a server cannot answer a challenge by using another server’s replica. An economically-motivated server who does not possess its assigned replica may try to cheat by using another server’s replica. But to do this, the cheating server must first generate its own replica in order to successfully answer a challenge. As a result, our model does not rely purely on network delay to differentiate benign behavior from malicious behavior, but also on the time it takes to generate a file replica. This allows us to eliminate Assumption 2 from the BDS model, because we require that replica generation be time consuming.
We propose a model in which the client creates t different file replicas and stores them at t data centers owned by the same CSP (one replica at each data center). To illustrate the model for
We only need to make the following two assumptions (note that we do not assume there is no exclusive Internet connection between the data centers like in the BDS model):
The locations of all data centers of the cloud provider are known. For each data center s, it is possible to have access to a machine that is located very close to s (i.e., very small network latency), such as in the same data center.
System and adversarial model
System model
The client wants to outsource the storage of a file
We note that, given an individual file replica, say
The file
Adversarial model
We assume that the CSP is rational and economically motivated. The CSP will try to cheat only if cheating cannot be detected and if it achieves the economic benefit of using less storage than required by contract. An economically motivated adversary captures many practical settings in which malicious servers will not cheat and risk their reputation, unless they can achieve a clear financial gain. We also note that when the adversary is fully malicious, i.e., it tries to corrupt the client’s data without regard to its own resource consumption, there is no solution to the problem of building a reliable system with t replicas [9,16].
The ROTF attack
To illustrate the ROTF attack that was introduced in Section 1, consider the setting in Fig. 1(b), where
The α-cheating adversary
A CSP is obligated by contract to store t file replicas, which requires a total of
An α-cheating adversary is an economically-motivated adversary that can successfully pass a challenge by only using
Note that if
We consider two types of α-cheating adversaries. A
Replica generation in our model is time consuming. A dishonest CSP trying to cheat by storing less replicas and executing the ROTF attack to pass challenges is always better off by keeping a copy of the original file
Also, recall that most RDC schemes ensure efficiency by using spot checking [4,27,34]: The client challenges the server to prove possession of a randomly chosen subset of c blocks out of all the n file blocks. This can provide a high likelihood that the server is storing the entire file.
Theorem 3.2 shows that the best data distribution strategy for an α-cheating adversary that wants to remain undetected is to store in each of the t servers an equal fraction of the whole storage, i.e.,
Recall that we have assumed that the adversary always stores one original file copy
For an α-cheating adversary, the best data distribution strategy to remain undetected is to store in each of the t servers an equal fraction of the whole
The proof is provided in Appendix C.
In this section, we propose
The original file
Figure 2 provides a reference sheet with various parameters of

A reference sheet with various parameters of
During the
The
A detailed description of


Components of
In
In order to setup the system, the data owner must initially decide the type of adversary it wants to protect the data against. Concretely, by picking a value for α, the data owner seeks to protect its data against a CSP that is modeled as an α-cheating adversary. For example, by picking a small α, the data owner achieves protection against a CSP that will try to cheat by corrupting a large amount of the data. This type of corruption is easier to detect and, as a result, the data owner can afford to use a smaller masking factor. On the other hand, by picking a large α, the data owner seeks protection against a more stealthy CSP that only corrupts a small fraction of the data. As a result, the data owner needs to use a larger masking factor.
Once the data owner fixes α, it can derive the two parameters: η (the masking factor) and τ (the time threshold used to validate the audit). In the following, we first describe the best adversarial storage strategy in
Values of
if the client is located in an AWS S3 region
Values of
Let
It turns out it is not trivial to estimate x for the Amazon CSP. In our experiments, the value x exhibits some variation due to the fact that sampling a random block in Amazon S3 can be very large in some rare cases (in those cases it will be difficult to differentiate between benign and malicious CSP behavior). However, based on our experiments we observed that, out of 240 protocol executions, 95% of the values for x are within the range [0.025 sec, 0.034 sec] for the AWS Oregon region. Thus, the data owner should use the top value in this range (0.034 sec) to estimate x in the formula for τ if the data is stored in the Oregon S3 region. We propose three ways in which the data owner can acquire x: First of all, data owners can estimate x themselves by measuring it directly in the target data centers; Secondly, the CSP could determine such a range and publish it; Thirdly, it can be estimated by a trusted third party. Note that if x is estimated by data owners or trusted third parties, the CSP should not be able to differentiate the events of “data access to estimate x” and “regular data access”, thus it cannot influence the outcome of verification by artificially manipulating the value of x.
Our
Different from previous work on RDC, the paradigm we introduce in this paper allows the servers themselves to generate new replicas for repair purposes. This opens the door to a new attack, the replicate on the fly (ROTF) attack, in which the economically-motivated servers claim to store t replicas, but in reality they store less than
In
According to Section 4.2,
In
For fixed values of α, we can always choose c such that the probability that a server is cheating successfully without being detected becomes negligibly small. For example, if the server is storing only Per Definition 3.1, an α-cheating adversary is an economically-motivated adversary that only uses As described in Section 4.2, the time threshold τ in When a server is missing an According to Lemma 4.1, if we choose the parameters η and τ according to the guidelines in Section 4.2, the cheating server cannot generate the missing Evaluating Thus, Thus,
The design of
Design for
In

A reference sheet with various parameters of

β-butterfly encoding.
As shown in Fig. 6, to create a new replica we use the collection of original file blocks as input (at level 0), and apply an atomic cryptographic transformation to pairs of blocks in a sequence of
each bit of the pair of output blocks depends on each bit of the pair of input blocks.
each output block has the same size as the input block.
In Section 5.2.1, we provide an instantiation for the cryptographic transformation used in
In the following, we present the


Components of
We provide a detailed description of
In
In this section, we provide guidelines for using

A reference sheet for all the parameters used in
The cryptographic transformation E used in
View the two input RDC blocks as a collection of
Apply a butterfly encoding (as shown in Fig. 6) over this collection of 64-bit blocks (at level 0) in a sequence of
This full butterfly encoding achieves “strong mixing” [41], such that each bit of the output (two RDC blocks) depends on each bit of the input (two RDC blocks). In Fig. 10, we show a concrete example for the instantiation of a cryptographic transformation where each RDC block has 4 64-bit blocks. Correspondingly, D (Section 5) can be instantiated as the reverse process of E. Note that m should be chosen as a power of 2, so that

An example for the instantiation of cryptographic transformation.
In the following, we first describe the best adversarial storage strategy for
m is a power of 2.
We provide next an efficient algorithm (Algorithm 1) to find out the minimum value of m (i.e.,
After finding the minimum value of m, picking the exact m value is a trade-off between the storage overhead of verification tags and the computation/communication overhead during
We provide in Tables 2, 3, 4 and 5 the

Compute
By knowing

Estimate β
Once m is fixed, we can compute e based on Equation 3. Using the aforementioned example, when
Pick α, ρ and ϕ, which are known in practical applications; pick c according to PDP [4]. Follow the aforementioned guidelines to determine τ. Follow the aforementioned guidelines to determine the minimum value of m, and choose m such that it is larger than or equal to its minimum value. Compute n and e based on m.
Following the previous example, in which
According to Section 5.2, ϕ is a time period after
When ϕ expires, the client estimates a new set of τ and β parameters according to the guidelines in Section 5.2.2, and then retrieves the original file, pre-processes it again based on this new set of parameters, and outsources the replicas again. This will provide a similar security guarantee for the next time period. The client repeats this process until needed, thus obtaining a long-term security guarantee for its outsourced data. In According to Section 5.2.2, β is chosen so that the condition By choosing the β and τ parameters according to the guidelines in Section
5.2.2
, the probability that an α-cheating adversary can successfully execute the ROTF attack at time We have established in Section 5.2.2 that the best adversarial storage strategy for As described in Section 5.2.2, the set of parameters τ and β in When a server is missing a According to Lemma 5.1, if we choose the parameters β and τ according to Section 5.2.2, the cheating server cannot generate the missing
In this section, we first experimentally evaluate
Experimental evaluation for
Background on Amazon’s cloud services
We first provide some background for Amazon’s cloud services within the United States, called Amazon Web Services (AWS). EC2 is Amazon’s cloud computing service and S3 is Amazon’s cloud storage service. In the United States, Amazon has three EC2 regions (US East – Virginia, US West – North California, and US West – Oregon) and three S3 regions (US Standard, US West – North California, and US West – Oregon). Based on our measurements in Table 8 and 9 of Appendix A, the following EC2 and S3 regions are located extremely close to each other and have very high network connection between them, thus we consider them in the same region: Virginia (EC2 US East – Virginia and S3 US Standard), N. California (EC2 US West – North California and S3 US West – North California), and Oregon (EC2 US West – Oregon and S3 US West – Oregon).
Experimental results
We build and test our prototype for
We measure the time for masking, verification tag generation and total preprocessing for one masked replica under three sets of
We have several observations for Table 6: First, the throughput of masking operation decreases when α increases. This is expected because a larger α means that it is more difficult to detect the adversarial behavior, thus, we need a larger η, hence more computations are required for masking. Secondly, the throughput of verification tag computation is independent of α, due to the fact that the verification tags are computed over the masked replica, which is independent of η, hence independent of α. Thirdly, the throughput of total preprocessing, which includes masking and verification tag computation, is always close to but a little smaller than the throughput of masking, since the verification tag computation is very efficient (can generate verification tags for more than 5 MB data in one second) and only has a small impact to the total preprocessing time.
Preprocessing throughput
Preprocessing throughput
Benign case: The CSP is honest, i.e., it strictly stores the replicas in the corresponding regions according to the contract. Upon challenge, the server uses the data from the same region to pass the challenge. In this case, the total server computation includes sampling challenged blocks from S3 of the same region and computing the proof.
Adversarial case: The CSP is cheating by not storing all replicas in their entirety according to the contract. The malicious CSP adopts the best attack strategy described in Section 3.2. Because the server will only have an α fraction of the challenged blocks, it retrieves the other
We repeat the experiments for different sets of

Computational cost for both the server and the client in challenge phase (benign case).

Computational cost for the server and its various components in challenge phase (adversarial case).
For the benign case, we observe from Fig. 11 that the total server computation and its various components as well as the client (verifier) computation are independent of file size and of α. This is expected because: First of all, we rely on spot checking [4] which always randomly samples a fixed number of blocks from the masked replica, thus can maintain constant server/client computation. Secondly, during a challenge, the operations on both server and client are over the masked replica, which is independent of η, hence independent of α. Figure 11(d) shows that the time for the client to check the proof is less than 7 msec, which justifies our claim that the system imposes a small load on the verifier during the challenge phase.
For the adversarial case, we observe from Fig. 12 that the total server computation and its various components are independent of filesize. The reason has been explained in the benign case. For Fig. 12(c), we expected to see that the masking time is independent of α, because: The malicious server always stores only an α fraction of the corresponding data, and generates the
Figure 12(d) shows that the time for the server to compute the proof varies with α. However, we can still conclude that this time is independent of α given that the variance is quite small (around 1%). According to the guidelines for establishing the time threshold τ in Section 4.2, τ should be 13.7 sec (
One significant advantage in the repair phase is that the client can be kept lightweight, e.g., the client only needs to exchange a few messages to coordinate the repair procedure. This justifies our claim that the system imposes a small management load on the data owner during repair.

Computational cost for repairing a replica.
In this section, we provide an analytical performance analysis for
For
To evaluate the time for creating a new replica in When When
When
When
Concrete values for r by varying ϕ and ρ when
(recall that r is the ratio between the overall computational time of
and that of
)
Concrete values for r by varying ϕ and ρ when
During
In both
Related work
Remote data checking
Although RAFT and our work share the idea of using a time-based mechanism to detect malicious behavior, they are fundamentally different in their basic approach and goals, and in the system and adversarial models. First, while in
Benson et al. [7] propose another time-based model (BDS model) to guarantee that multiple replicas are distributed to different data centers of the CSP. Our work adapts this model to enable the server-side repair paradigm.
Watson et al. [45] propose
Gondree and Peterson [26] further relax the adversarial models and assumptions of previous PoL scheme [7], and propose a constraint-based data geolocation protocol that binds the latency-based geolocation techniques with PDP scheme.
Chen et al. [10] also explored the concept of server-side repair, but in the context of erasure code-based distributed storage systems that can function under an adversarial setting. Our focus in this work is on replication-based distributed storage systems.
Proofs of work (PoW)
Conclusion
In this paper, we propose two schemes for replication-based distributed storage systems,
The proposed schemes have several limitations. Our approach minimizes the load on the data owner during Repair by shifting the computational burden to the cloud servers. However, to overcome the “replicate on the fly” attack, we require that replica generation is time consuming, which in turn makes Repair time consuming. This limits the applicability of our schemes to scenarios in which Repair is a rare operation. Moreover, both of our schemes rely on a fixed time threshold to detect server misbehavior. While this approach proved to work well in our evaluation, we assumed an accurate estimation of the time needed to generate a proof by the server, a task that may not always be feasible. Finally,
Footnotes
Acknowledgments
This research was supported by the National Science Foundation (NSF) under Grants No. CNS 1054754, CNS 1409523, and DGE 1565478, and by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under Contract No. A8650-15-C-7521. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF, DARPA, and AFRL. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notice herein.
Measurements for the Amazon CSP
Tables 8 and 9 show the bandwidth and the propagation delay between Amazon S3 data centers (regions) and between our institution and different S3 data centers (regions). For measurements, we used an EC2 instance within the corresponding Amazon data centers. To measure bandwidth, we used Wget [47] to download a large file. To measure the propagation delay, we adopt the method introduced in [7] that is, we measure the time between sending a SYN packet and receiving a SYN-ACK packet of a TCP connection, half of which is considered as the propagation delay. All the results in Tables 8 and 9 are averaged over 20 runs.
Sampling blocks from Amazon S3
We wrote a program running in an EC2 instance (Amazon Virginia region) to randomly sample 4 KB blocks from S3 Virginia region. We collect the time in Table 10. All the results are averaged over 20 runs.
Proof of Theorem 3.2
Quantifying the computation required for generating one replica block from the original file blocks in ERDC - SR
From Fig. 6, we want to generate a block at level
Quantifying the computation needed to generate a replica block when the adversary only stores one intermediate block
In Fig. 6, there are
Let
Determining the minimum value of e
Intuitively, β and e together determine the amount of computational effort the α-cheating adversary needs to spend in order to cheat successfully without being detected. In other words, when β is larger, e can be smaller. However, β cannot exceed n. Thus, e should have a minimum value, i.e.,
Quantifying the probability that all the c · ( 1 − α ) missing challenged blocks depend on different sets of β original file blocks
We can see from Fig. 6 that each of the replica blocks depends on a set of β original file blocks. In other words, a replica block with index i depends on a set of β original file blocks with indices in the range
We see from Fig. 6 that all the replica blocks with indices in the range
Quantifying the minimum computation for an α -cheating server to generate the c · ( 1 − α ) missing challenged blocks
During
Computing all the
Based on the previous observation, we state Theorem H.1.
We are now ready to evaluate the lower bound of
We have:
Based on Theorem H.1, when
