Abstract
Nowadays, cloud storage has become an attractive storage scheme for a user to store his files. When a user stores his files on a remote cloud storage system, he cannot make sure whether his files are intact, so he must use some protocol to check the integrity of his files in the cloud storage. To guarantee high availability, some cloud storage servers provide a kind of highly-available service, which stores multiple copies of user files in the cloud storage, and the file owner cannot make sure whether all these copies are intact as well. Some cloud storage servers allow his users to operate their files online. As the file owner cannot always be online, he must entrust a trusted public data auditor to check his files in the cloud storage. In this work, we investigate the above issues about provable data possession with multi-copy and data dynamics supporting public verification in a cloud storage. We design a kind of authenticated 2-3 tree with ordered leaves and use this kind of tree to organize file block tags. We design a privacy preserving provable data possession scheme with multi-copy and data dynamics which supports public verification, and use a kind of RSA tag to construct this scheme. We apply our scheme to a cloud file backup system. Our theoretical proofs and experiments show that our scheme is feasible and reasonable.
Introduction
Motivation
With the advancement of network technology, many users have outsourced their storage and computing needs [27]. Cloud storage becomes an attractive storage scheme for a user to store his files, which can reduce his maintenance cost, prevent his files from data corruption, and share his files with other persons. To satisfy the market demands, some cloud storage productions have been deployed in recent years, e.g., Amazon S3 [1] and Google Cloud Storage [17] are the two most prestigious cloud storage systems. All these cloud storage servers are deployed in some remote data centers, which are controlled, managed and maintained by some cloud storage providers. A file owner cannot physically possess his files, which raises some formidable and challenging tasks related to file security in a cloud storage. According to an investigation in [30], cloud storage security has become a lion in the way for large-scale application of cloud storage. It is important for a cloud storage provider to provide some file security mechanisms for its users. Generally speaking, file security consists of privacy, integrity and availability. A cloud storage provider can provide some encryption softwares based on some off-the-shelf encryption algorithms for a user to protect his file privacy. However, to resolve the file integrity problem, a cloud storage provider must provide his users with some file integrity checking tools. Up to now, researchers have designed two kinds of integrity checking scheme in the literature, namely, PDP [12, 13] and POR [3]. In POR, it inserts some tags as sentinels in a file, and uses a kind of erasure code to code this file. All sentinels are arranged in the file before being sent to an untrusted server, so it can be used to check the file only limited times. Subsequently, many researchers have extended these two kinds of schemes for different application fields with different properties, such as support public verification [8, 42], data dynamics [5, 42], privacy preservation [8], resource-constrained mobile devices [23] or cloud storage [19, 51].
To guarantee file availability, we can store multiple copies of the file in multiple servers using replication technique. We refer to this kind of cloud storage as multi-copy cloud storage. Usually there exist two kinds of storage services which are highly-available service and ordinarily-available service in a multi-copy cloud storage. We can regard the ordinarily-available as a special case of highly-available service that we stores only one file copy in a cloud storage. If a user requires a highly-available storage service, cloud storage server will replicate the user’s files according to the service level and store all these copies in the cloud storage. Accordingly, the cloud storage providers charge the file owners according to his renting storage space and service level. The same file will be charged more when it is stored in a higher service level. As his files are stored in the cloud storage, the file owner cannot make sure whether there exist exact number of copies according to the service level, rather than store only one copy. To resolve this problem, Curtmola et al. [45] proposed a MR-PDP scheme. Hereafter, Barsoum et al. [2] brought forward a more efficient MR-PDP scheme. But, there are two shortcomings in their schemes. Firstly, all their schemes only check static file data, not supporting data dynamics. When a user wants to modify or delete some blocks of his files, he must download his files to his local computer, operate his files, regenerate all copies and retransmit these copies to the remote server. This will bring about tremendous overhead of network and computation. Secondly, their schemes don’t support public verification. A user must be online all the day and check these files by himself. If there are some lawsuits about file security between a cloud storage provider and a file user, there does not exist a public data auditor to arbitrate this affair fairly.
Our Contributions
To solve above problems, we design a provable data possession scheme with multi-copy and data dynamic in a cloud storage, which supports public verification. The contributions of this work can be summarized as follows. Firstly, we propose a privacy preserving provable data possession scheme with multi-copy and data dynamics in a cloud storage. In the following section, we call it P3DP for short. Secondly, we construct the P3DP with a kind of RSA tag, and apply it to a cloud file backup system. Thirdly, we prove that the construction of P3DP is secure and efficient from theoretical perspective. Fourthly, we conduct some experiments for the P3DP with a kind of RSA tag. Results of experiments show that P3DP is rational and feasible.
The paper is organized as follows: In Section 2, we describe and give some definitions about the problem. The authenticated 2-3 tree with ordered leaves is presented in Section 3. In Section 4, we introduce the privacy preserving provable data possession with multi-copy and data dynamics in a cloud storage and give a construction for the scheme with a kind of RSA tag. In Section 5, we give some rigorous proofs about security for the construction. In Section 6, we describe how P3DP can be applied to a cloud file backup system. Our implementation and experiment results are presented in Section 7. Finally, conclusions are given in Section 8.
The Problem and Definitions
The System Model
Our cloud storage system model is described in Fig. 1. It is made up of three different kinds of entities, a cloud storage provider, a file owner and a public data auditor (PDA). Each entity has different responsibility respectively. The file owner owns some files and stores them on the remote cloud storage servers owned by a cloud storage provider. The cloud storage provider stores multiple copies of the user’s files in his cloud storage servers. The file owner can entrust a PDA to check the integrity of his files in the cloud storage or do it by himself.
Problem Formalization
In this work, we study the problem of file integrity checking in a cloud storage, where there exist multiple copies for each file. The file owner can check the integrity of his files, and insert, delete, or modify his files remotely. If the file owner is not online, he can entrust a PDA to check the integrity of his files. We formalize the problem as follows: When a file owner decides to store a file F to a cloud storage, he generates m copies of F, we denote them as F i (1 ≤ i ≤ m). No one can distinguish F i from F j (i ≠ j) and make sure whether all the F i is generated by F except the file owner. Then the file owner generates a pair of public key and private key (p k , s k ), keeps the s k secret and makes the p k public. It uses the private key and public key to generate φ ij (1 ≤ j ≤ n) which are block tags of F i , where n = len (F)/L and L is the length of file blocks. If the length of the last block does not equal to L, it will be padded with some random values. For simplification, we will use φ i (1 ≤ i ≤ m) to denote all the tags of the F i . Finally, the file owner sends all the F i and φ i to the cloud storage, and sends φ i to the PDA. Hereafter, the file owner and PDA can audit the integrity of F i . If some blocks of F i are corrupted or deleted, PDA and file owner can detect it. If a PDA detects it, he will notify the file owner that some blocks are corrupted or deleted and the file owner will regenerate these blocks and sends them to the cloud storage. At the same time, the file owner can operate his files remotely, and the PDA can learn nothing about the inspected blocks in the audit process.
The Threat Model
From above discussion, we know that there are three threats in our file integrity checking scheme. Firstly, the cloud storage system provider only considers his economic benefits. He tries to delete some file blocks or preserves only one copy rather than m copies for his users to save his storage space, and the cloud storage system provider tries to use some kinds of skills to cheat the file owner or PDA that all these files are store correctly. Secondly, the cloud storage provider tries to get some information from all files stored in the cloud servers. Thirdly, PDA is a curious and honest adversary that tries to gain some knowledge from the audit process.
Some Notations
The security parameter throughout the paper is ν. We say that f (ν) is negligible if f (ν) = o (ν-c) for every fixed constant c. We write negl (ν) to denote a negligible function. When saying that a Turing machine A is p . p . t. we mean that A is a uniform probabilistic polynomial time machine. We use i to denote file number, where 1 ≤ i ≤ m, and use j to denote block number, where 1 ≤ j ≤ n.
The P3DP Scheme
The P3DP scheme consists of five protocols which are initiation, data possession checking, file block requirement, dynamic operation and file block restoration.
The initiation protocol generates the system security parameters, block tags, file copies and a signature of the root of the tree that we use to organize block tags. It consists of Setup, GenTag, and GenCopy three algorithms. Setup (ν) → {P
k
, S
k
, k, sgk, vk}. It is a probability algorithm which is run by the client. In the following sections, we use the client to denote the file owner. It takes the security parameter ν as input, and outputs a block encryption key k, a key pair (P
k
, S
k
), a signing key sgk and a verification key vk. S
k
denotes the private key and P
k
denotes the public key. The client keeps S
k
and sgk secretly, and releases P
k
and vk publicly. GenCopy (F, m, k) → {F
i
(i = 1, 2, …, m)}. This algorithm is run by the client. It takes the file F, replica number m and block encryption key k as input and outputs a set of copies F
i
(i = 1, 2, …, m). All the F
i
are different, without the block encryption key k, no adversary can use F
i
to generate F
j
where i ≠ j. GenTag (F
i
, S
k
, sgk, g) → {φ
i
, sigsgk (H (R
i
))}. This algorithm is run by the client. It takes the file F
i
, private key S
k
, signing key sgk and g as input and outputs a block tag set φ
i
. F
i
is an ordered collection of blocks {B
ij
} and φ
i
is a block tag set on {B
ij
}. It also outputs sigsgk (H (R
i
)) which is the signature of R
i
, where R
i
is the root of the tree that we use to organize φ
i
, and the leaves of the tree are hash values of B
ij
.
Challenge (P
k
, i) → {chal}. This algorithm is run by the client or PDA. It takes P
k
and the copy id i as input and outputs the challenge chal for copy F
i
. GenProof (F
i
, chal, φ
i
) → {γ
i
}. This algorithm is run by the server. It takes the challenge chal, file F
i
, and file block tag set φ
i
as input and outputs a data integrity proof γ
i
for the blocks of the file F
i
that specified by chal. CheckProof (P
k
, vk, chal, i, γ
i
)→{success, failure}. This algorithm is run by the client or PDA. It takes the public key P
k
, verification key vk, challenge chal, copy id i and the data integrity proof γ
i
which is returned from the server as input. If the integrity of copy F
i
is verified as correct, it outputs success, otherwise it outputs failure.
GenBlockNo (i, j) → {bn}. This algorithm is run by the client. It takes i and j as input and outputs a block number bn for F
i
. TransmitBlock (F
i
, bn, φ
i
) → {B
ij
, PF
ij
}. This algorithm is run by the cloud server. It takes F
i
, bn and φ
i
as input and outputs B
ij
. It also outputs PF
ij
which is the proof of B
ij
. RestoreBlock (B
ij
, PF
ij
) → {success, failure}. This algorithm is run by the client. The client checks the proof of block B
ij
, if it passes the proof checking, it decrypts B
ij
using the block encryption key k and outputs success, otherwise it outputs failure.
. This algorithm is run by the cloud server. T denotes the dynamic operation type. If T equals I, it denotes the insertion operation; if T equals M, it denotes the modification operation; if T equals D, it denotes the deletion operation. is the root of the tree to organize the block tags of , and is the signature of . Given F
i
, i, B
ij
, T, and φ
i
, it outputs and . It also outputs , which is obtained from the client.
GenerateDuplicateBlock (i, k, j) → {B
ij
}. This algorithm is run by the client. It first invokes the file block requirement protocol to get b
ij
, then uses block encryption key k and i to generate the block B
ij
, and sends B
ij
to the cloud storage server.
Note that if one wants to restore F
i
which has m blocks, it just invokes the File block restoration protocol m rounds.
Security Requirements
From above threat model, we can see that there are three security requirements in our P3DP. We refer to them as security against the cloud storage provider with public verifiability, privacy against the cloud storage provider and privacy against the public data auditor. In the following section, we will define these security requirements formally through some standard challenger-attacker attack games or simulation technique. In Game 1 and Game 2, we use the challenger to stand for either a file owner or a PDA, and use the adversary to stand for an untrusted cloud storage provider.
We first give a definition of security against the cloud storage provider with public verifiability using Game 1. In Game 1, a malicious adversary tries to delete some file blocks in its cloud storage to save storage space, and uses other block tags or other obtained information to generate a response R to pass the challenge for these deleted blocks. Game 1 has four phases: Setup, Query, Challenge and Forge.
If Checkproof (P k , B i , chal, R, i) = success, then the adversary wins Game 1.
then we say that the protocol is security against the cloud storage provider with public verifiability.
In Game 2, the malicious adversary try to play a game with challenger to distinguish a designated block from some random blocks in its cloud servers. If it can accomplish this task, then it can get some useful information from files that are stored in its cloud storage servers. Game 2 has five phases: Setup, Query1, Forge, Query2 and Guest.
We say that the adversary wins Game 2 when the probability that the adversary guesses j correctly is more than 1/2.
To make our scheme efficient, we must use some data structure to organize block tags. Erway et al. designed a ranked skip list [6] and Wang et al. used an ordinary Merkle tree to support data dynamics [42]. As the efficiency of the ranked skip list is low and the depth of the ordinary Merkle tree is too large, which will affect the efficiency of tree operation, they cannot be applicable to our setting. In our multi-copy cloud storage, if there are n copies for F i , there must exist n file block tag sets. If we use an ordinary Merkle tree to represent one block tag set, it will increase the cloud storage space and increase its search time. We must use some new data structure to organize file block tags. 2-3 tree is a kind of balanced tree that is proposed by Aho [4]. When blocks are modified, deleted or inserted, it should be more efficient in the hash 2-3 trees, as it has fewer depth than the one in a Merkle tree. As the hash 2-3 tree is a kind of searchable tree, it cannot be applicable to our setting directly. In order to support data dynamics and blockless verification efficiently, we design an authenticated 2-3 tree with ordered leaves.
All the leaves in the tree have a sequence number.
Each interior node of the tree has two or three children.
Each path from the root to the leaf of the tree has the same length.
The root of the tree authenticates the entire tree.
There are three kinds of nodes in an A2-3OL tree (see Fig. 3), different types of nodes have different structures. A leaf node is composed of BN and H (BBN), where BN stores block number and H (BBN) stores hash value of BBN. An inner node or a root node has six fields where first, second and third store a pointer that points its child respectively. If a node has two children then third will be null. If a node has one child then second and third will be null. H (W1, W2, W3) is the hash value of its three children tags, if there is no third child then W3 = R, where R denotes some random value. BNlow denotes the lowest block number of a leaf in its second subtree, and BNhigh denotes the lowest block number of a leaf in the third subtree, and all block numbers of the first subtree are less than those in the second subtree and so does it between the second subtree and the third subtree.
A Construction for P3DP
We have defined P3DP in Section 2.5. In this section, we use a kind of RSA tag and A2-3OL tree to construct the P3DP.
. It takes the security parameter ν as input and outputs a block encryption key k, a public key P
k
= (N, g, e), a secret key S
k
= d, a signing key sgk = x and a verification key . ed = 1 mod (p - 1) (q - 1). N is a RSA modulus which is the product of p and q. All these quadratic residues modulo N form a multiplicative cyclic group QR
N
where g is its generator. To make our scheme secure, we make p and q are two big prime numbers. As a general rule, we usually make p = 2p′ + 1 and q = 2q′ + 1 where p′ and q′ are also two big prime numbers. In our construction, we use DSA [39] to sign the root of A2-3OL tree. GenCopy (F, m, k) → {F
i
(i = 1, 2, …, m)}. It takes F, m and k as input and outputs a set of copies F
i
(i = 1, 2, …, m). F
i
is created by concatenating i behind each block of file F and encrypting them with AES. B
ij
= E
k
(B
j
i) 1≤i≤m,1≤j≤n. The client keeps the secret key k to decrypt each copy block in the later time to restore the original blocks of file F. As k was kept by the file owner secretly, no one other than the file owner can restore F. It is widely believed that modern block ciphers, such as AES, have the avalanche property that a small change to its input will result in a large change to the output. So F
i
and F
j
(i ≠ j) are different. GenTag (F
i
, d, sgk, g) → {{Ti1, Ti2, ⋯ , T
in
} , Sig
sgk
(H (R
i
))}. It takes F
i
, d, sgk and g as input and outputs a set of verification tags φ
i
= {Ti1, Ti2, ⋯ , T
in
} (1 ≤ i ≤ m) and a root signature Sig
sgk
(H (R
i
)). It sends Sig
sgk
(H (R
i
)) to PDA and sends φ
i
to the server. Sig
sgk
(H (R
i
)) is the signature of R
i
which is the root of the A2-3OL tree of the file F
i
. T
ij
is a function of the tagged block B
ij
which can be used by the server to prove possession of the corresponding block B
ij
of F
i
. T
ij
= (g
B
ij
)
d
mod N for 1 ≤ j ≤ n.
Challenge ((g, N, e) , i) → {i, k1, k2, g
s
, c}. In order to check whether all blocks of the ith copy F
i
are possessed in cloud servers exactly, the client or PDA challenges cloud servers to prove possession of a random subset blocks of file F
i
. The challenge is made up of c, i, k1, k2 and g
s
. k1, k2 and s are chosen randomly for each challenge. c is the number of challenged blocks and i denotes the ith copy of F. k1 is a key for pseudorandom permutation Π1 to determine the indexes of challenged blocks. k2 is a key for the pseudorandom permutation Π2 to determine some random numbers. g
s
= g
s
mod N, where . GenProof (F
i
, (i, k1, k2, g
s
, c) , φ) → {p, T, {H (B
ij
) , ω
j
} j∈J, Sigsgk (H (R
i
))}. After it receives a challenge from the client, the server uses a pseudorandom permutation Π1 keyed with k1 to determine j
t
which is the indexes of challenged blocks of F
i
, where j
t
= Π1k1 (t) (1 ≤ t ≤ c), and uses the pseudorandom permutation Π2 keyed with k2 to generate the pseudorandom coefficient a
j
, where a
j
= Π2k2 (j) (1 ≤ j ≤ c). Then the server generates a proof of possession PR = (p, T, {H (B
ij
) , ω
j
} j∈J, Sigsgk (H (R
i
))) for these blocks and returns PR to the challenger. p is computed by raising g
s
to the sum of product of the challenged blocks and pseudorandom coefficient. , and T is computed by multiplying the verification tags a
i
power (1 ≤ j ≤ c) corresponding to the challenged blocks. T = (T
j
1
)
a
1
× (T
j
2
)
a
2
× ⋯ × (T
j
c
)
a
c
. {ω
j
} j∈J are the node siblings on the path from the leaves {H (B
ij
)} j∈J to the root R
i
of the A2-3OL tree. The server responds to the verifier with PR = (p, T, {H (B
ij
) , ω
j
} j∈J, Sigsgk (H (R
i
))). R
i
is the root of the A2-3OL tree of F
i
and Sigsgk (H (R
i
)) is the signature of R
i
. CheckProof ((g, N, e) , vk, (i, k1, k2, g
s
, c) , (p, T, {H (B
ij
) , ω
j
} j∈J, Sigsgk (H (R
i
)) , H (R
i
)) , s)→ {success, failure}. The client or PDA uses vk to check Sig
sgk
(H (R
i
)). If this check fails, it halts and outputs failure, otherwise it generates the root using {H (B
ij
) , ω
j
} j∈J, and checks whether equals (H (R
i
)). If this check fails, it halts and outputs failure, otherwise it checks if (T
e
)
s
equals p. If (T
e
)
s
equals p, it outputs success, otherwise it outputs failure.
GenBlockNo (i, j) → {bn}. When the file owner requires the block j of F
i
, it generates a block number bn = (i, j), and sends bn to the server. TransmitBlock (F
i
, bn) → {B
ij
, {H (B
ij
) , ω
ij
} j∈J, Sigsgk (H (R
i
))}. Once the file server receives bn, it sends the B
ij
, {H (B
ij
), ω
ij
} j∈J and Sigsgk (H (R
i
)) to the file owner. RestoreBlock (B
ij
, {H (B
ij
) , ω
ij
} j∈J, Sigsgk (H (R
i
))} , k, vk) → {success, failure}. When the file owner receives B
ij
, {H (B
ij
) , ω
ij
} j∈J and Sigsgk (H (R
i
)), he uses vk to check Sigsgk (H (R
i
)). If this check fails, the file owner rejects it by outputting failure, otherwise he generates the root of the A2-3OL tree of F
i
, and checks if equals (H (R
i
)). If this check fails, the file owner rejects it by outputting failure, otherwise he decrypts the block using the privacy key k to restore B
ij
and outputs success.
. T denotes the dynamic operation type. If T equals I, it denotes insertion operation; if T equals M, it denotes modification operation; if T equals D, it denotes deletion operation. is the root of the A2-3OL tree of and is the signature of . Given F
i
, i, B
j
, φ
i
and T, it outputs and . It also outputs , which is obtained from the file owner.
GenerateDuplicateBlock (i, k, j) → {B
ij
}. It first invokes the file block requirement protocol to get b
ij
, then uses the block encryption key k and i to generate the block B
ij
, and sends B
ij
to the cloud storage server.
Note that if one wants to restore the entire file, which has m blocks, it just invokes the file block restoration protocol m rounds.
Security Analysis
Let N = pq be an RSA modulus where p = 2p′ + 1 and q = 2q′ + 1 are safe primes, QR N is the unique cyclic subgroup of where g is its generator. In this section, we will analyze the security of our construction of P3DP, and prove that it satisfies these security requirements that have been defined in Section 2.6.
B generates s, g, e, g
s
and N, and gives (g, g
s
, N) to A. g is the generator of QR
N
, g
s
= g
s
mod N. A adaptively selects some file blocks B
ij
(i = 1, 2, ⋯ , m, j = 1, 2, ⋯ , n) and queries the verification bock tags T
ij
from B. B computes T
ij
= (g
B
ij
)
d
mod N, and gives T
ij
to A. B generates a chal = (i, c, k1, k2, g
s
) for B
i
which is the block collection of F
i
, and sends chal to A. A computes a response R = (p, T, {H (B
ij
) , ω
j
} j∈J, Sigsgk (H (R
i
))) to prove data possession of these requested file blocks, where p, T, j
t
and a
j
satisfy following equations:
If checkproof (e, chal, R) = success, then adversary A wins Game 1.We use pr [WinA,Game1 (ν)] to denote the probability that A wins Game 1. A inputs (N, g, g s ) and outputs p,T,{H (B ij ) , ω j } j∈J and Sigsgk (H (R i ))). As the signature is secure, we know that A cannot use some incorrect value of {H (B ij ) , ω j } j∈J and Sigsgk (H (R i )) to pass the data possession checking protocol. According to Equation (3) and Equation (4), we can get P = T s as follows:
B passes x and tags in the previous proof to , along with the chal. first checks to see if there is any tag mismatch: T
ij
≠ g
B
ij
mod N. For some 1 ≤ j ≤ c, B
ij
are the original blocks. If this is the case, can output T
ij
and g
B
ij
mod N for particular j as a collision. If all the tags match these original blocks, uses the challenge and ids of these challenged blocks to compute linear combination d = (a1B
ij
1
+ a2B
ij
2
+ ⋯ + a
c
B
ij
c
) of the original blocks. From the previous proof, we know T = g
x
mod N. Since all the tags are matched, we have T
ij
= g
B
ij
mod N, and we obtain T = g
d
mod N where x ≠ d. Using miller’s lemma [16], can factor N.
We use pr [FactorN] to denote the probability that can factor N. But according to the factoring assumption, we can see that can’t factor N. Therefore we can gets
invokes to run the setup function. keeps k secret and gives ((g, N, e) , (d)) to . gives (g, N, e) to the adversary A. firstly constructs an empty file block list for F
i
. The list is consisted of two columns. The first column is used to store the file block and the second column is used to store its corresponding random value. When gives some file blocks B
ij
(j = 1, 2, ⋯ , n) of F
i
to , will firstly check its file block list. If the file block is not in its file block list, will query to get a random value and give this random value to A, then appends the file block and its corresponding random value in the file block list. If this file block is in the file block list, will return its corresponding random value in the list to the A. A gives two file blocks Bi1 and Bi2 to . selects j1 randomly and sends to . sends AES
k
(Bij1) to . sends AES
k
(Bij1) to A and invokes A. If A outputs the value j2 that equals j1 then outputs 1, otherwise outputs 0.
From the construction of adversary , if the runtime of A is polynomial time, then the runtime of is also polynomial time. The probability that adversary attacks AES can be computed asfollows:
If A wins Game 2 then . That means can attack AES successfully. But as we know that AES is secure, A cannot win Game 2. This completes the proof.□
selects one random number s′. randomly selects some numbers {j1, j2, ⋯, j
c
} and some coefficient numbers {a1, a2, ⋯, a
c
}. sends a challenge (k, {j1, j2, ⋯ , j
c
} , gs′, {a1, a2, ⋯ , a
c
}) to the server, and gets the output o. Using the knowledge of {T
ij
}(j = 1, 2, ⋯, n) computes T = ((T
ij
1
)
a
1
× (T
ij
2
)
a
2
× ⋯ × (T
ij
c
)
a
c
mod N, p′ = (T
e
)
s
mod N, and PR′ = (p′, T, {H (B
ij
) , ω
j
} j∈J, Sigsgk (H (R
i
)). outputs (x, PR′). x denotes the PDA’s input where x = {N, g, {T
ij
} j=1,2,⋯,n, k1, k2, s}.
We use View
PDA
to denote the PDA’s view during the execution of the protocol:
T ij (j = 1, 2, ⋯ , n) are the block tags which are available to anyone that are also included in the PDA’s view. In the simulator’s output (x, PR′) and the PDA’s view View PDA , s and s′ are identical distribution and the other values are identical, so the simulator’s output and the PDA’s view are identical distribution, which are computationally indistinguishable. Therefore, PDA cannot get any information about the file in the course of data possession checking. This completes the proof.□
In this section, we explain how to use P3DP in a cloud file backup system. As mentioned in the introduction, P3DP allows storing multiple copies of a file in the remote server, which are in the encrypted form. The file backup client software does not need to be modified much. Since all copies can be verified by the client in the encrypted form, using P3DP would provide stronger security and availability guarantee. A trusted client-side application that use P3DP client to setup the security parameter and replica number m (e.g. m = 2). The file backup server software need to be modified to implement the GenProof and TransmitBlock of P3DP.
The client application maintains separate P3DP client state (e.g., block encryption key) for every backup file. All P3DP algorithms are described in Section 4.
Performance
Implementation
In order to demonstrate the feasibility of P3DP, we implemented it in C over the OpenSSL cryptographic library(version 0.98g) [38]. All algorithms in our implementation are described in Section 4. The cryptographic primitives for our implementation use OpenSSL. Encryption for the file block is the OpenSSL of 128-bit AES-CBC [47], and the hash function is the OpenSSL of SHA-256 [11]. The RSA for the block tag is the OpenSSL of RSA-1024 [46], and the signature of the root of A2-3OL tree is the OpenSSL of DSA [39].System actions (e.g. filesystem access and network transmission) and cryptographic computations are two kinds of time-intensive operations in the implementation of P3DP. As the systems costs will vary between underlying systems, to evaluate P3DP fairly, we don’t consider the system costs in our test framework; all operations are performed inmemory.
Experiments
All experiments were performed in Fedora 8.0 with kernel 2.6.23.1. The CPU of the experiment machine is Intel® Pentium® Dual E2160 1.8G and the RAM is 4GB. All experiments ran single-threaded on the processors. The Disk of the machine is Western Digital Caviar SE Hard Drive that has 320GB capacity, 7200rpm with 8 MB Cache. In the following experiments, the file size is 4MB, block size is 4KB, the number of each challenge is 460, and all data represents the mean of 10 trials.
We first study the scalability of P3DP when we create multiple copies and compare its performance to that of MR-PDP [45] and EMC-PDP [2]. The experiment results are shown in Fig. 4. The x axis represents the copy number, and the y axis represents the copy generation time. The copy generation time of the three schemes is almost equivalent. When the number of copy is less than or equal to 8, the copy generation time is less than 0.26 second. The copy generation time increases quickly when the number of copy increases, and it increases to about 2.5 second. So we can choose 8 as the ideal copy number.
Since the data possession checking is one of key functions of our P3DP, we evaluated the performance of P3DP in the data possession checking. This challenges in these experiments represent 99% confidence that less than 1% of the data have been damaged [12]. Ateniese et al. have demonstrated that if the server misses a portion of a file, then the number of blocks that the challenger needs to check, so as to detect the misbehavior of the server with a certain probability, is a constant amount of blocks, independently of the total number of file blocks: e.g., if the server misses 1% of the file F i , then the challenger only wants to request proof of possession for 460 randomly chosen blocks of F i which detects server misbehavior at least 99% [12]. Figures 5 and 6 show the processing costs of the server and client respectively, when the client challenges 460 blocks to the server for the different number of copies. The x axis in Figures 5 and 6 represents the number of copies to be checked in the experiments, the y axis in Fig. 5 represents the computation time spent by server, and the y axis in Fig. 6 represents the computation time spent by client. From these figures, we can see that computation time increased with the copy number in the server and client, our P3DP scheme outperforms EMC-PDP and MR-PDP, it reduces the server computation time when a large number of verifiers are connected to the server causing a huge computation overhead over the server, it also reduces the verifier computation time.
As there is no multi-copy PDP that supports data dynamics in the literature, we do the experiment of our P3DP against the DPDP-Skiplist [6] which uses an authenticated skip list to organize their tags and DPOR-Merkle [42] which uses a Merkle tree to organize their tags. To compare the performance of these scheme fairly in this experiment, we use our P3DP as a single DPDP and compare their insertion, deletion and modification time in server when the verifier wants to insert, delete or modify some blocks in the server. From Figs. 7–9 we can see that the server insertion performance outperforms those of the other schemes, while the deletion and the modification performances of our P3DP scheme are slightly slower than those of the other schemes. The reason is that the authenticated 2-3 tree is a kind of balanced tree. When we use it to organize the block tags, its tree depth is smaller than those of the authenticated skip list and the merkle tree. When the server inserts a block, it can quickly locate its insertion location. When the server delete or modify a block, it will cause the authenticated 2-3 tree to invoke the tree balance operation, which will add its deletion or modification operation time, while the other two data structures need not this tree balance operation. If we use the stochastic scheduling technology [28] and the batch processing technology [29] for block deletion or block modification in the server, then the server can reduce the number of tree balance operation at all, and it will improve the efficiency of block deletion or block modification eventually.
Conclusions
Cloud storage has become an important storage scheme for a user to store his files, but this scheme makes the file owner not possess his files physically, and raises some issues related to file security. In this work we have studied the problem of creating multiple copies of a file over the cloud servers and auditing all these copies to verify their correctness and completeness. We have proposed a privacy preserving provable data possession with multi-copy and data dynamics (P3DP) to achieve our goals, and have used a kind of authenticated 2-3 tree with ordered leaves and a kind of RSA tag to construct our P3DP. The security analysis and experimental results have showed that our construction of the proposed scheme is proven secure. As Hadoop is the prevailing cloud system [43], we will integrate P3DP to Hadoop in the future work.
Footnotes
Acknowledgments
The research was partially funded by the Key Program of National Natural Science Foundation of China (Grant Nos. 61133005, 61432005), and the National Natural Science Foundation of China (Grant Nos. 61370095, 61472124, 61502163).
