PCPOR: Public and constant-cost proofs of retrievability in cloud

Abstract

For data storage outsourcing services, it is important to allow users to efficiently and securely verify that cloud storage servers store their data correctly. To address this issue, a number of Proof of Retrievability (POR) and Proof of Data Possession (PDP) schemes have been proposed wherein servers must prove to a verifier that data are stored correctly. While existing POR and PDP schemes offer decent solutions addressing various practical issues, they either have non-trivial (linear or quadratic) communication and computational complexity, or only consider private verification. In this paper, we propose the first POR scheme with public verifiability, constant communication and computational costs on users. In our scheme, messages exchanged between cloud servers and users are composed of a constant number of group elements and random numbers; computational tasks required on users are also constant; batch auditing of multiple tasks is also efficiently supported. We achieved these by a unique design based on our novel polynomial-based authenticators. Extensive experiments on Amazon EC2 cloud and different client devices (contemporary and mobile devices) show that our design allows a user to audit the integrity of a file of any size with a constant computational cost of 150 ms on PC (2.11 s on mobile device) and a communication cost of 2.34 kB for 99% error detection probability when employing an erasure coding with 1% fault tolerance rate. We prove the security of our scheme based on the Computational Diffie–Hellman problem, the t-Strong Diffie–Hellman problem and the Static Diffie–Hellman problem.

Keywords

Integrity checking, cloud storage, public verification

1. Introduction

The rapid development of cloud technologies is increasingly attracting customers including both organizations and individuals. Currently, millions of users are using cloud storage services including Amazon S3, Microsoft SkyDrive, Google Cloud Storage, iCloud and Dropbox. Despite the proliferation of cloud storage service, it also raises security concerns since outsourcing data to the cloud makes the data owner lose physical control over the storage sites. One significant concern, among the many, is data integrity, i.e., whether or not the cloud server indeed stores its users’ data correctly. Recently, a number of data loss events have been reported for well-known storage providers [2,3,9,13], including Amazon S3 and Dropbox. In addition, we observed that there have been large discrepancies between the numbers of data corruption events reported by users and those acknowledged by service providers [13], which also causes users to doubt whether or not their data on cloud are truly intact. To ensure users’ confidence in the integrity of data remotely stored in the cloud, a reliable proof-of-retrievability (POR) [18] and/or Proof of Data Possession (PDP) [4] system is desirable. Specifically, in a POR or PDP system the data storage server must prove that it indeed stores users’ data correctly, and the user shall be able to verify whether or not the proof generated by the server is valid. In practice, the performance of a POR or PDP system is mainly impacted by the following factors: (1) the communication cost between server and user during the verification process; (2) the computational cost introduced to the user in each verification; (3) the storage overhead on server side and user side; (4) the way of verification, i.e., private verifiability or public verifiability. 
Specifically, private verifiability only allows the data owner, who holds the system secret keys, to check data integrity, whereas public verifiability enables any entity with the public keys to check data integrity without the help of the data owner. In today’s cloud computing, a third-party Cloud Management Broker (CMB) [20] plays an important intermediary role between cloud computing providers and cloud users. As cloud users may not have the in-depth knowledge needed to deploy and manage their data and services in the cloud, CMBs are now employed by many users to handle their data and services in the cloud. In order to continually guarantee the integrity of outsourced data in the cloud, integrity verification should be performed periodically. In such a scenario, public verifiability is required if cloud users also want to delegate these periodic integrity verification tasks to a CMB. With public verifiability, a CMB can perform integrity verification on behalf of cloud users without knowing any of their secrets.

Toward designing a practical POR/PDP system, a body of techniques [4,7,14,18,22,24,25,27,28,30] has been proposed over the past years. While these schemes offer flexible verification (both private and public), most of them have communication cost linear in the data block size and computational cost on the user side linear in the number of challenged blocks. With an increasing number of mobile users, who manage their cloud data through mobile apps (e.g., iCloud, iAWS, etc.) and have constrained bandwidth and computational resources (e.g., mobile phones with limited data plans), such communication and computational complexity represents a considerable cost to the system and can even make the POR system unaffordable. To enhance existing schemes, Xu et al. [28] constructed a POR scheme with constant communication cost by utilizing a polynomial commitment technique. The main drawback of this scheme, however, is that it only supports private verification, which means that all verifications must be performed by the data owner himself/herself. To reduce the computational cost for users, Wang et al. [25] proposed a public integrity auditing scheme with batch verification. However, Ref. [25] assumes cloud servers to be semi-honest, i.e., they will honestly follow the protocol. Malicious (misbehaving or compromised) cloud servers [22,28] are not considered. To the best of our knowledge, there is no existing secure POR/PDP solution that offers public verifiability with constant communication cost and computational cost.

In this work, we propose a public POR scheme with constant communication cost and computational cost, namely PCPOR, under the assumption that cloud servers can be misbehaving or compromised. Thanks to our novel design of polynomial-based authentication tags, PCPOR lets the cloud server consolidate the proof information for integrity verification into three constant-size elements. To verify this constant-size proof information, PCPOR requires just a constant number of computational operations for the verifier, no matter how large the audited file is and what block size it uses. By supporting public verifiability, PCPOR allows any user or third-party auditor, e.g., a cloud management broker, to perform the verification process using public keys without contacting the data owner. PCPOR also enables aggregation of integrity auditing operations for multiple tasks (files) through our batch integrity auditing technique, which further improves its auditing efficiency. Notably, PCPOR achieves these with a storage overhead comparable to existing schemes. Thorough numerical analysis and extensive experimental results on Amazon EC2 cloud and various client devices (PC and mobile devices) demonstrate the scalability and efficiency of PCPOR. Based on the Computational Diffie–Hellman (CDH) problem [11], the t-Strong Diffie–Hellman (t-SDH) problem [6] and the Static Diffie–Hellman problem [8], we prove the security of our PCPOR scheme.

Our main contributions can be summarized as follows.

We construct the first secure and efficient public POR scheme with constant communication cost and computational cost considering malicious cloud servers; the proposed scheme can also efficiently handle multiple verification requests with batch operations.

Our proposed scheme obviates the need for a trade-off between communication cost and storage cost as claimed in the public POR scheme [22].

Our design of polynomial-based authentication tags can be used as an independent solution in other related fields, such as verifiable SQL queries and verifiable keyword search.

The rest of this paper is organized as follows. Section 2 describes the system model and security model. We introduce the detailed construction of PCPOR in Section 3, followed by the security proof in Section 4. We evaluate the performance of our scheme in Section 5. Section 6 introduces applications of our proposed authentication tag in other related fields. In Section 7, we review and discuss related work. We conclude the paper in Section 8.

Fig. 1. System model. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

2. Models

2.1. System model

In this work, we follow the POR model that is adopted by most existing POR schemes [7,18,22,28]. As shown in Fig. 1, we consider three participating entities: Data owner, User and Cloud server. The data owner has a collection of data and stores them on cloud servers together with the corresponding authentication tags. The owner may share the stored data with a group of users who may perform an integrity check upon each access. Note that data integrity checks can also be conducted by a Third-Party Auditor (TPA) who has the necessary resources/expertise for periodic integrity auditing. In practice, a cloud management broker [20] can perform the role of a TPA. Since the proposed scheme is intended for public verifiability, we do not differentiate ordinary users from a TPA. To check the integrity of data,² a user generates a challenging message and sends it to the cloud server. The cloud server then responds to the user with the computed proof information for the selected file blocks. After receiving the proof information, the user verifies the integrity of data through the verification algorithm. Our proposed scheme contains 5 algorithms: KeyGen, Setup, Challenge, Prove and Verify.

² W.l.o.g., data are assumed to be stored as data files.

KeyGen: Given a selected security parameter λ, the randomized KeyGen algorithm outputs the system public key and secret key as $(PK, SK)$ .

Setup: Given a file $F \in \{0, 1\}^{*}$ and the public-secret key pair $(PK, SK)$ , the Setup algorithm generates authentication tags $\{σ_{i}\}$ . The file and its tags will be stored on the server.

Challenge: Given the public key PK, a user generates a challenging message CM and sends it to the cloud server.

Prove: Given the public key PK, authentication tags $\{σ_{i}\}$ and a challenging message CM, the Prove algorithm produces a proof response Prf.

Verify: Given the public key PK and the proof information Prf, the Verify algorithm checks the data integrity and outputs result as either Accept or Reject.

2.2. Security model

We consider the storage server as untrusted and potentially malicious, which is consistent with existing POR schemes [18,22,28]. The cloud server can modify or drop users’ data to save cost. In terms of security, we have correctness and soundness as our design goals as defined by POR schemes. For the correctness, we require that our Verify algorithm accepts a valid proof generated from valid keys, authentic files and tags. For the soundness, if any malicious cloud server does not store a file F correctly, it only has negligible probability to generate proof information that can pass our Verify algorithm.

3. Construction of PCPOR

3.1. Notations and technique preliminaries

Let $e : G \times G \to G_{1}$ be a bilinear map with the following properties: bilinearity, i.e., for all $g_{1}, g_{2} \in G$ and $a, b \in Z_{p}^{*}$ , $e (g_{1}^{a}, g_{2}^{b}) = e {(g_{1}, g_{2})}^{a b}$ ; computability, i.e., there exists an algorithm that can compute e efficiently; non-degeneracy, i.e., for $g \in G$ , $e (g, g) \neq 1$ . G and $G_{1}$ are two multiplicative cyclic groups of order p, and $g, u \overset{R}{\leftarrow} G$ . H denotes a one-way hash function [16]. We define $f_{\vec{c}} (x)$ as a polynomial with coefficient vector $\vec{c}$ .

3.2. Single file verification

In this section, we first introduce our PCPOR scheme for the integrity verification of a single file. We then show how to efficiently support batch verification of multiple files in Section 3.3.

$KeyGen (1^{λ}) \to (PK, SK)$ : A data owner first generates a random signing key pair $(spk, ssk) \overset{R}{\leftarrow} SKG$ using the BLS signature scheme [6]. The owner then chooses two random numbers $α, ϵ \overset{R}{\leftarrow} Z_{p}^{*}$ and computes $ν \leftarrow g^{ϵ}$ , $κ \leftarrow g^{α ϵ}$ , and $\{g^{α^{j}}\}_{j = 0}^{s + 1}$ . The public and secret keys of the system are: $\begin{array}{rcl} PK & = & \{p, ν, κ, spk, u, \{g^{α^{j}}\}_{j = 0}^{s + 1}\}, \\ SK & = & \{ϵ, ssk, α\} . \end{array}$

$Setup (PK, SK, F) \to (σ, τ)$ : To process a file F, the owner splits F into n blocks after erasure coding, and each block into s elements: $\{m_{i j}\}$ , $1 ⩽ i ⩽ n$ , $0 ⩽ j ⩽ s - 1$ . The owner chooses a random file name $name \overset{R}{\leftarrow} Z_{p}^{*}$ . Let $τ^{'}$ be “ $name ∥ n$ ”; the file tag τ is $τ^{'}$ concatenated with a signature on $τ^{'}$ under ssk: $τ \leftarrow τ^{'} ∥ {SKG}_{ssk} (τ^{'})$ . For each data block, an authentication tag is computed as: $\begin{array}{rclr} σ_{i} & = & {(u^{H (name ∥ i)} \cdot \prod_{j = 0}^{s - 1} g^{m_{i j} α^{j + 2}})}^{ϵ} & \\ & = & {(u^{H (name ∥ i)} \cdot g^{f_{\vec{β_{i}}} (α)})}^{ϵ}, & (1) \end{array}$ where $1 ⩽ i ⩽ n$ and $\vec{β_{i}} = (0, 0, m_{i, 0}, m_{i, 1}, \dots, m_{i, s - 1})$ . F, τ and $\{σ_{i}\}$ are outsourced to the cloud server for storage.

$Challenge (PK, τ) \to CM$ : To challenge the integrity of a file stored on the cloud server, a user first retrieves the file tag τ from the cloud and verifies the signature on it: if the signature is not valid, the user rejects it and halts; otherwise, the user parses τ to recover name and n. Then, the user randomly chooses a number $r \overset{R}{\leftarrow} Z_{p}^{*}$ and a k-element subset K of the set $[1, n]$ (we will discuss the size of K in Section 3.4). The user challenges the cloud storage server with a challenging message $CM = \{K, r\}$ .

$Prove (PK, F, CM) \to Prf$ : The cloud server parses the challenging message CM as $\{K, r\}$ and generates $\{c_{i} = r^{i}\}_{i \in K}$ , $σ = \prod_{i \in K} σ_{i}^{c_{i}}$ and $y = f_{\vec{A}} (r) \bmod p$ , where $\vec{A} = (0, 0, \sum_{i \in K} c_{i} * m_{i, 0}, \dots, \sum_{i \in K} c_{i} * m_{i, s - 1})$ . The cloud server also divides the polynomial $f_{\vec{A}} (x) - f_{\vec{A}} (r)$ by $(x - r)$ using polynomial long division, and denotes the coefficient vector of the resulting quotient polynomial as $\vec{w} = (w_{0}, w_{1}, \dots, w_{s})$ , that is, $f_{\vec{w}} (x) \equiv \frac{f_{\vec{A}} (x) - f_{\vec{A}} (r)}{x - r}$ . Note that any polynomial $f (x) \in Z [x]$ has the algebraic property that $(x - r)$ divides $f (x) - f (r)$ without remainder for any $r \in Z_{p}^{*}$ . Then, the cloud server generates $ψ = \prod_{j = 0}^{s} {(g^{α^{j}})}^{w_{j}} = g^{f_{\vec{w}} (α)} .$ (2) Finally, the cloud server responds to the user with proof information $Prf = \{σ, ψ, y\}$ .
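The quotient coefficients $\vec{w}$ can be obtained with synthetic (Horner-style) division instead of general long division. The following is a minimal Python sketch under stated assumptions: coefficients are listed in ascending degree, arithmetic is modulo a prime p, and the function name `divide_by_linear` and the toy modulus 101 are illustrative choices that do not appear in the scheme itself.

```python
def divide_by_linear(coeffs, r, p):
    """Divide f(x) by (x - r) over Z_p.

    coeffs lists the coefficients of f in ascending degree (illustrative layout).
    Returns (quotient coefficients in ascending degree, remainder f(r) mod p).
    """
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    # Walk from the leading coefficient down to the linear term.
    for j in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[j] + r * carry) % p
        quotient[j - 1] = carry
    remainder = (coeffs[0] + r * carry) % p
    return quotient, remainder


# Toy check: f(x) = x^3 + 2x + 5 over Z_101 with r = 2.
quotient, remainder = divide_by_linear([5, 2, 0, 1], r=2, p=101)
```

Because the remainder equals $f (r)$, a single pass yields both y and $\vec{w}$; subtracting the constant $f_{\vec{A}} (r)$ before dividing does not change the quotient.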

$Verify (PK, Prf) \to Rst$ : On receiving the proof information Prf, the user first computes $ω = \sum_{i \in K} c_{i} * H (name ∥ i)$ and $η = u^{ω}$ . Then the user verifies the integrity of file F as $e (η, ν) \cdot e (ψ, κ \cdot ν^{- r}) \overset{?}{=} e (σ, g) \cdot e (ν^{- y}, g) .$ (3) If Eq. (3) holds, the user outputs Rst as Accept; otherwise, outputs Reject.

Correctness: Considering a cloud server that honestly responds to the challenge with $Prf = \{σ, ψ, y\}$ , we validate the correctness of our construction based on Eq. (3) as: $\begin{array}{rclr} & & e (η, ν) \cdot e (ψ, κ \cdot ν^{- r}) & \\ & = & e {(u, g)}^{ϵ (\sum_{i \in K} c_{i} * H (name ∥ i))} \cdot e (g^{f_{\vec{w}} (α)}, g^{ϵ (α - r)}) & \\ & = & e {(u, g)}^{ϵ (\sum_{i \in K} c_{i} * H (name ∥ i))} \cdot e {(g, g)}^{\frac{f_{\vec{A}} (α) - f_{\vec{A}} (r)}{α - r} \cdot ϵ (α - r)} & \\ & = & e {(u, g)}^{ϵ (\sum_{i \in K} c_{i} * H (name ∥ i))} \cdot e {(g, g)}^{ϵ (f_{\vec{A}} (α) - f_{\vec{A}} (r))} & \\ & = & e (u^{ϵ (\sum_{i \in K} c_{i} * H (name ∥ i))} \cdot g^{ϵ f_{\vec{A}} (α)}, g) \cdot e (ν^{- y}, g) & \\ & = & e (σ, g) \cdot e (ν^{- y}, g) . & (4) \end{array}$ From Eq. (4), it is easy to see that our scheme is correct.

3.3. Efficient verification of multiple files

As the data owner can store a large number of files on cloud servers, when a user wants to verify the integrity of these files, it is inefficient to process the verification requests one by one. Specifically, given integrity verification requests for L different files $\{F_{l}\}_{1 ⩽ l ⩽ L}$ , it is desirable for the user to efficiently aggregate these verification requests and reduce both communication cost and computational cost. For this purpose, we design the batch verification algorithm based on our single file construction. In the batch verification algorithm, the KeyGen and Setup algorithms are the same as those in the single file scenario. Here we focus on introducing the remaining three algorithms, namely Batch-Challenge, Batch-Prove and Batch-Verify, as follows.

$Batch-Challenge (PK, τ) \to CM$ : To challenge L files $\{F_{l}\}_{1 ⩽ l ⩽ L}$ at the same time, a user first retrieves the file tags $\{τ_{l}\}_{1 ⩽ l ⩽ L}$ for these files from the cloud server and verifies them. Note that, with the BLS batch verification technique proposed by Camenisch et al. [10], the computational load of verifying L signatures can be reduced to only 2 Pairing operations. If all file tags are valid, the user parses $\{τ_{l}\}_{1 ⩽ l ⩽ L}$ to recover $\{{name}_{l}\}_{1 ⩽ l ⩽ L}$ and $\{n_{l}\}_{1 ⩽ l ⩽ L}$ . Then, the user randomly chooses a number $r \overset{R}{\leftarrow} Z_{p}^{*}$ and a k-element subset K of the set $[1, n]$ and challenges the cloud storage server with the challenging message $CM = \{K, r\}$ .

$Batch-Prove (PK, CM, \{F_{l}\}) \to Prf$ : On receiving the challenge message CM, the cloud server first generates the proof information for each file separately, as in the single file scenario: $\begin{array}{rclr} y_{l} & = & f_{\vec{A_{l}}} (r), \quad 1 ⩽ l ⩽ L, & (5) \\ ψ_{l} & = & \prod_{j = 0}^{s_{l}} {(g^{α^{j}})}^{w_{l j}} = g^{f_{\vec{w_{l}}} (α)}, \quad 1 ⩽ l ⩽ L, & (6) \\ σ_{l} & = & \prod_{i \in K} σ_{l i}^{c_{i}}, \quad 1 ⩽ l ⩽ L . & (7) \end{array}$ Then the cloud server aggregates the proof information of each file as $\begin{array}{rcl} y & = & \sum_{l = 1}^{L} y_{l} \bmod p = f_{\vec{A}} (r) \bmod p, \\ ψ & = & \prod_{l = 1}^{L} ψ_{l}, \quad σ = \prod_{l = 1}^{L} σ_{l}, \\ \vec{A} & = & (0, 0, \sum_{l = 1}^{L} \sum_{i \in K} c_{i} * m_{l i, 0}, \dots, \sum_{l = 1}^{L} \sum_{i \in K} c_{i} * m_{l i, s_{l} - 1}) . \end{array}$ The cloud server finally sends the proof information $Prf = \{σ, ψ, y\}$ to the user.

$Batch-Verify (PK, Prf) \to Rst$ : On receiving the proof information Prf, the user first computes $ω = \sum_{l = 1}^{L} \sum_{i \in K} c_{i} * H ({name}_{l} ∥ i)$ and $η = u^{ω}$ . Then the user verifies the integrity of these L files together as $e (η, ν) \cdot e (ψ, κ \cdot ν^{- r}) \overset{?}{=} e (σ, g) \cdot e (ν^{- y}, g) .$ (8) If Eq. (8) holds, the user outputs Rst as Accept; otherwise, the user outputs Rst as Reject.

Correctness: We validate the correctness of our batch verification construction based on Eq. (8) as: $\begin{array}{rclr} & & e (η, ν) \cdot e (ψ, κ \cdot ν^{- r}) & \\ & = & e {(u, g)}^{ϵ (\sum_{l = 1}^{L} \sum_{i \in K} c_{i} * H ({name}_{l} ∥ i))} \cdot e (g^{\sum_{l = 1}^{L} f_{\vec{w_{l}}} (α)}, g^{ϵ (α - r)}) & \\ & = & e {(u, g)}^{ϵ (\sum_{l = 1}^{L} \sum_{i \in K} c_{i} * H ({name}_{l} ∥ i))} \cdot e {(g, g)}^{(f_{\vec{A}} (α) - f_{\vec{A}} (r)) ϵ} & \\ & = & e (u^{ϵ (\sum_{l = 1}^{L} \sum_{i \in K} c_{i} * H ({name}_{l} ∥ i))}, g) \cdot e (g^{ϵ f_{\vec{A}} (α)}, g) \cdot e {(g, g)}^{- ϵ f_{\vec{A}} (r)} & \\ & = & e (σ, g) \cdot e (ν^{- y}, g) . & (9) \end{array}$ From Eq. (9), it is easy to verify that our construction is correct if the storage server generates Prf honestly.
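The last step of Eq. (9) relies on expanding the aggregated tag σ. A worked version of that step is sketched below, assuming every per-block tag has the form of Eq. (1) and that the per-file polynomials are padded to a common degree; the symbol $\vec{β_{l i}}$ (the coefficient vector of block i of file l) is notation introduced here for illustration.

```latex
\begin{aligned}
σ &= \prod_{l=1}^{L} \prod_{i \in K} σ_{l i}^{c_i}
   = \prod_{l=1}^{L} \prod_{i \in K}
     \left( u^{H(name_l ∥ i)} \cdot g^{f_{\vec{β_{l i}}}(α)} \right)^{ϵ c_i} \\
  &= \left( u^{\sum_{l=1}^{L} \sum_{i \in K} c_i H(name_l ∥ i)}
     \cdot g^{\sum_{l=1}^{L} \sum_{i \in K} c_i f_{\vec{β_{l i}}}(α)} \right)^{ϵ}
   = \left( u^{ω} \cdot g^{f_{\vec{A}}(α)} \right)^{ϵ}, \\
e(σ, g) \cdot e(ν^{-y}, g)
  &= e(u, g)^{ϵ ω} \cdot e(g, g)^{ϵ f_{\vec{A}}(α)} \cdot e(g, g)^{-ϵ f_{\vec{A}}(r)},
  \qquad y = f_{\vec{A}}(r) .
\end{aligned}
```

The same linearity gives $\sum_{l = 1}^{L} f_{\vec{w_{l}}} (α) = \frac{f_{\vec{A}} (α) - f_{\vec{A}} (r)}{α - r}$ , which is the identity used between the second and third lines of Eq. (9).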

3.4. Error detection probability

In this section, we discuss the error detection probability of our PCPOR scheme and the size of set K. As mentioned in Section 3.2, instead of challenging all data blocks of a file to check its integrity, we can randomly choose k blocks as set K to save communication and computational costs while retaining an acceptable level of error detection probability, as shown in Refs [4,14,25]. Specifically, when we set the fault tolerance rate of erasure coding to $rate = 1 \%$ , the file can be recovered as long as less than 1% of the data blocks are corrupted. In this case, if more than 1% of the data blocks are corrupted, the error detection probability is at least $Pr [Detection] = 1 - {(1 - 0.01)}^{k}$ , so challenging 460 data blocks yields at least 99% detection probability. Similarly, when we set the fault tolerance rate of erasure coding to $rate = 2 \%$ , we only need to challenge 228 data blocks to achieve at least 99% error detection probability. Therefore, the size of set K can be considered a fixed number in our scheme once the required error detection probability is determined.
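The relationship between k and the detection probability can be reproduced numerically. The short Python sketch below evaluates $1 - {(1 - rate)}^{k}$ and solves it for the smallest k that reaches a target probability; the function names are illustrative, and the model assumes, as in the formula above, that each challenged block independently hits a corrupted block with probability equal to the fault tolerance rate.

```python
import math


def detection_probability(k, corrupted_fraction):
    """Probability that at least one of k challenged blocks is corrupted."""
    return 1.0 - (1.0 - corrupted_fraction) ** k


def min_challenged_blocks(corrupted_fraction, target):
    """Smallest k with 1 - (1 - corrupted_fraction)^k >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - corrupted_fraction))


p_460 = detection_probability(460, 0.01)      # 1% fault tolerance rate
k_two_percent = min_challenged_blocks(0.02, 0.99)  # 2% fault tolerance rate
```

Under this model, the 460 blocks quoted for a 1% rate sit at or slightly above the computed minimum, so the stated 99% bound holds with a small margin.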

4. Security analysis

In this section, we first introduce assumptions used in our proof, and then prove the security of our scheme based on these assumptions.

4.1. Assumptions

Definition 4.1 (Computational Diffie–Hellman (CDH) problem [11]).

Let $x, y \overset{R}{\leftarrow} Z_{p}^{*}$ . Given $(g, g^{x}, g^{y})$ , it is computationally intractable to compute the value of $g^{x y}$ , where G is a cyclic group of order p and g is a generator of G.

Definition 4.2 (Static Diffie–Hellman problem [8]).

Let $a \overset{R}{\leftarrow} Z_{p}^{*}$ . Given $(g, g^{a})$ and $h \in G$ , where g is a generator of a cyclic group G of order p, it is computationally intractable to compute the value $h^{a}$ .

Definition 4.3 (t-Strong Diffie–Hellman (t-SDH) problem [6]).

Let $α \overset{R}{\leftarrow} Z_{p}^{*}$ . Given as input a $(t + 1)$ -tuple $(g, g^{α}, \dots, g^{α^{t}}) \in G^{t + 1}$ , where g is a generator of a cyclic group G of order p, for any probabilistic polynomial time adversary ( $Adv$ ), the probability $Pr [Adv (g, g^{α}, \dots, g^{α^{t}}) = (c, g^{\frac{1}{α + c}})]$ is negligible for any value of $c \in Z_{p}^{*} \setminus \{- α\}$ .

4.2. Security proof

Theorem 4.4.
If $g^{f_{\vec{c}} (α)}$ can be forged by a probabilistic polynomial time adversary Adv, we can construct an algorithm B that uses Adv to efficiently solve the t-SDH problem.

Following the idea in Ref. [19], we prove Theorem 4.4 as follows.
Proof of Theorem 4.4.
Suppose there exists a probabilistic polynomial time adversary Adv that can forge a polynomial $f_{\vec{c_{1}}} (x)$ , distinct from $f_{\vec{c}} (x)$ , such that $g^{f_{\vec{c_{1}}} (α)} = g^{f_{\vec{c}} (α)}$ , where $f_{\vec{c}} (x)$ and $f_{\vec{c_{1}}} (x)$ are known to Adv. The Adv can construct another polynomial $f_{\vec{c_{2}}} (x) = f_{\vec{c}} (x) - f_{\vec{c_{1}}} (x) \in Z_{p} [x]$ and obtain $g^{f_{\vec{c_{2}}} (α)} = g^{f_{\vec{c}} (α)} / g^{f_{\vec{c_{1}}} (α)} = g^{f_{\vec{c}} (α) - f_{\vec{c_{1}}} (α)}$ . Since $f_{\vec{c_{1}}} (α) = f_{\vec{c}} (α)$ , we have $f_{\vec{c_{2}}} (α) = 0$ , i.e., α is a root of the polynomial $f_{\vec{c_{2}}} (x)$ . By factoring $f_{\vec{c_{2}}} (x)$ [23], B can easily find $SK = α$ and use $(c, g^{\frac{1}{α + c}})$ to solve the instance of the t-SDH problem given by the system parameters. Note that, in our design, all information about α known to the adversary is of the form $(g, g^{α}, \dots, g^{α^{t}})$ or is further blinded by elements independent of α; therefore, in our scheme the adversary’s view is essentially the same as that in the t-SDH problem. □

Based on the proof of Theorem 4.4, we now prove that the probability for a probabilistic polynomial time adversary Adv to bypass our verification algorithm (i.e., to break the soundness of PCPOR) is negligible. In our scheme, the Adv can interact with the data owner and make a polynomial number of oracle queries for the authentication tags $\{σ_{i j}\}$ of files $\{F_{i}\}$ that are chosen by itself. The Adv can also execute the PCPOR protocol a polynomial number of times with the corresponding authentication tags and files. These executions can be arbitrarily interleaved with each other and with the tag queries described above. Assuming the Adv finally generates a δ-admissible cheating prover $P^{'}$ that can convincingly pass a δ fraction of the verification challenges, i.e., $Pr [Verify (PK, P^{'}, F_{i}) = Accept] ⩾ δ$ , it can claim that it has δ-confidence to break our PCPOR scheme. We now give the detailed proof that δ is negligible.

Theorem 4.5.
If there exists a probabilistic polynomial time adversary Adv that can spoof the verifier using invalid proof information ${Prf}^{'}$ generated by the cheating prover with non-negligible probability δ (i.e., break the soundness of PCPOR), we can construct an algorithm B that uses the Adv to efficiently solve the CDH problem, the Static Diffie–Hellman problem or the t-SDH problem.

Proof.
Suppose a probabilistic polynomial time adversary Adv can use the cheating prover $P^{'}$ to generate ${Prf}^{'} = (ψ^{'}, σ^{'}, y^{'})$ with $(ψ^{'}, σ^{'}, y^{'}) \neq (ψ, σ, y)$ that passes the verification in our proposed scheme, where $(ψ, σ, y)$ is the valid proof information. Then we get the following two equations: $\begin{array}{rclr} e (η, ν) \cdot e (ψ, κ \cdot ν^{- r}) & = & e (σ, g) \cdot e (ν^{- y}, g), & (10) \\ e (η, ν) \cdot e (ψ^{'}, κ \cdot ν^{- r}) & = & e (σ^{'}, g) \cdot e (ν^{- y^{'}}, g) . & (11) \end{array}$ Dividing Eq. (10) by Eq. (11), we obtain: $\frac{e (ψ, κ \cdot ν^{- r})}{e (ψ^{'}, κ \cdot ν^{- r})} = \frac{e (σ, g)}{e (σ^{'}, g)} \cdot e (ν^{(y^{'} - y)}, g) .$ (12) Observe that, as ${Prf}^{'} \neq Prf$ , the two proofs differ in at least one element, i.e., $σ^{'} \neq σ$ , or $ψ^{'} \neq ψ$ , or $y^{'} \neq y$ . With this in mind, we now do a case analysis for ${Prf}^{'}$ .

Case 1 ( $σ \neq σ^{'}$ ).

If $σ \neq σ^{'}$ , we rewrite Eqs (11) and (12) as $\begin{array}{rclr} e (σ, g) & = & \frac{e (ψ, κ \cdot ν^{- r})}{e (ψ^{'}, κ \cdot ν^{- r})} \cdot e (ν^{(y - y^{'})}, g) \cdot e (σ^{'}, g), & \\ e (η, ν) & = & \frac{e (ν^{- y^{'}}, g) \cdot e (σ^{'}, g)}{e (ψ^{'}, κ \cdot ν^{- r})}, & (13) \\ e (η, ν) & = & \frac{e (g^{- y^{'}}, ν) \cdot e (σ^{' \frac{1}{ϵ}}, ν)}{e (ψ^{' (α - r)}, ν)}, & \end{array}$ where $η = u^{\sum_{i \in K} c_{i} * H (name ∥ i)}$ is known to the Adv. We denote $ψ^{'}$ as $g^{θ^{'}}$ , η as $g^{ρ}$ , and $σ^{'}$ as $g^{π^{'}}$ . Based on Eq. (13) we get $\begin{array}{rclr} e (g^{ρ}, ν) & = & \frac{e (g^{- y^{'}}, ν) \cdot e (g^{\frac{π^{'}}{ϵ}}, ν)}{e (g^{θ^{'} (α - r)}, ν)}, & \\ ρ & = & - y^{'} + \frac{π^{'}}{ϵ} - θ^{'} (α - r), & (14) \\ (θ^{'} (α - r) + ρ) ϵ & = & - y^{'} ϵ + π^{'} . & \end{array}$ In this case, the Adv can output $g^{(θ^{'} (α - r) + ρ) ϵ} = ν^{- y^{'}} \cdot σ^{'}$ . Now, if the Adv knows the value of $θ^{'}$ , it can get ${(κ \cdot ν^{- r})}^{θ^{'}} \cdot η^{ϵ} = ν^{- y^{'}} \cdot σ^{'}$ . That is, given g and $g^{ϵ}$ , where ϵ is unknown, the Adv solves the Static Diffie–Hellman problem instance with $u^{ϵ} = {(\frac{ν^{- y^{'}} \cdot σ^{'}}{{(κ \cdot ν^{- r})}^{θ^{'}}})}^{\frac{1}{\sum_{i \in K} c_{i} H (name ∥ i)}}$ . If the Adv does not know the value of $θ^{'}$ , the data owner can give the Adv $ψ^{' (α - r)} \cdot η = g^{(θ^{'} (α - r) + ρ)}$ and $ν = g^{ϵ}$ , where ϵ and $(θ^{'} (α - r) + ρ)$ are not known to the Adv. The Adv then solves the CDH problem instance with $ν^{- y^{'}} \cdot σ^{'}$ . Both outcomes contradict the hardness assumptions; therefore, $σ^{'} = σ$ .

Case 2 ( $y \neq y^{'}$ ).

Here, we denote ψ as $g^{θ}$ and $ψ^{'}$ as $g^{θ^{'}}$ . If $y^{'} \neq y$ , the Adv can use Eqs (10), (11) and $σ^{'} = σ$ to output $\begin{array}{rclr} {(\frac{e (ψ, ν)}{e (ψ^{'}, ν)})}^{(α - r)} & = & \frac{e {(g, ν)}^{- y}}{e {(g, ν)}^{- y^{'}}}, & \\ θ (α - r) + y & = & θ^{'} (α - r) + y^{'}, & (15) \\ \frac{(θ - θ^{'})}{y^{'} - y} & = & \frac{1}{α - r} . & \end{array}$ In this case, the Adv can compute ${(\frac{ψ}{ψ^{'}})}^{\frac{1}{y^{'} - y}} = g^{\frac{θ - θ^{'}}{y^{'} - y}} = g^{\frac{1}{α - r}}$ (16) and output $(- r, g^{\frac{1}{α - r}})$ as a solution to the t-SDH problem instance given by the system parameters. This contradicts the t-SDH assumption; therefore, $y^{'} = y$ .

Case 3 ( $ψ \neq ψ^{'}$ ).

If $ψ^{'} \neq ψ$ , the Adv can use Eqs (10), (11), $σ^{'} = σ$ and $y^{'} = y$ to output ${(\frac{e (ψ, g)}{e (ψ^{'}, g)})}^{ϵ (α - r)} = 1 .$ (17) As $ψ^{'} \neq ψ$ , the Adv can infer $α = r$ based on Eq. (17). In this case, the Adv can also output $(r, g^{\frac{1}{α + r}})$ as a solution to the t-SDH problem instance given by the system parameters. Therefore, $ψ^{'} = ψ$ . In addition, as proved in Theorem 4.4, $ψ = g^{f_{\vec{w}} (α)}$ cannot be forged. That is, when the Adv outputs $ψ^{'} = ψ$ , it has to be computed from the actual data blocks according to our PCPOR scheme.

Based on the above analysis, we have shown that no Adv can use invalid proof information to bypass the verification in our PCPOR scheme with non-negligible probability δ.

Theorem 4.5 is proved. □

5. Performance evaluation

5.1. Numerical analysis

In this section, we numerically evaluate the performance of our proposed PCPOR scheme in terms of communication cost, computational cost and storage overhead. We compare our PCPOR scheme with existing POR schemes [22,25,28] and summarize the result in Table 1. For simplicity, in the remainder of this paper, we denote the complexity of one multiplication operation on Group G as MUL and that of one exponentiation operation on Group G as EXP.³ Pairing is a bilinear pairing operation, λ is the security parameter, and $| G |$ is the size (in number of bits) of a group element on G. We ignore hash operations in our evaluation, since their cost is negligible compared to EXP, MUL and Pairing operations. For example, one EXP operation on a contemporary PC can be over 100,000 times more expensive than one SHA-1 hash operation required in our design.

³ When the operation is on an elliptic curve, EXP means one scalar multiplication operation and MUL means one point addition operation.

Table 1. Complexity summary

| | Ref. [22] | Ref. [28] | Ref. [25] | PCPOR |
|---|---|---|---|---|
| Public verifiability | Yes | No | Yes | Yes |
| Malicious cloud | Yes | Yes | No | Yes |
| Single task | | | | |
| Computational complexity (user) | $O (s + k) MUL + O (s + k) EXP + O (1) Pairing$ | $O (1) EXP$ | $O (k) EXP + O (k) MUL + O (1) Pairing$ | $O (1) EXP + O (1) MUL + O (1) Pairing$ |
| Communication complexity (user) | $O (s)$ | $O (1)$ | $O (1)$ | $O (1)$ |
| Multiple tasks | | | | |
| Computational complexity (user) | $O (L s + L k) MUL + O (L s + L k) EXP + O (L) Pairing$ | $O (L) EXP$ | $O (k) EXP + O (k) MUL + O (1) Pairing$ | $O (1) EXP + O (1) MUL + O (1) Pairing$ |
| Communication complexity (user) | $O (L s)$ | $O (L)$ | $O (L)$ | $O (L)$ |
| Storage overhead | | | | |
| Cloud server | $n \| G \|$ | $n λ$ | $n \| G \|$ | $n \| G \|$ |
| User | $3 \| G \|$ | $λ + \| G \|$ | $6 \| G \|$ | $6 \| G \|$ |

Notes: MUL is one multiplication operation on Group G, EXP is one exponentiation operation on Group G, λ is the security parameter, $| G |$ is the size (in number of bits) of a group element on G, n is the number of blocks in the file, s is the number of elements in each block, k is the number of blocks selected for verification, and L is the number of files for batch verification. Given a system security parameter, the values of λ and $| G |$ are fixed.

5.1.1. Communication cost

In our proposed PCPOR scheme, the communication cost comes from the challenging message CM, the file tag τ and the proof response Prf in each verification task. The challenging message consists of a k-element subset K and a random number $r \in Z_{p}^{*}$ . As discussed in Section 3.4, the user can randomly challenge 460 data blocks to assure at least 99% error detection probability when the fault tolerance rate of erasure coding is 1%. Once the error detection probability is determined according to the system requirement, the size of set K is fixed and can be considered constant, so the complexity of the challenging message CM is $O (1)$ . The file tag τ consists of the file signature and a file name, which have $| G |$ bits and λ bits, respectively. In the proof response, all information is aggregated into 3 elements ψ, σ and y, where ψ and σ are two group elements and y is the evaluation of a polynomial. The total size of the proof response is $2 | G | + λ$ bits. Therefore, the total complexity of communication cost in our PCPOR scheme is $O (1)$ once the security parameter is chosen and the underlying group is selected. Thanks to our design of batch verification, the communication cost for a user to simultaneously check multiple files is only $L (| G | + λ)$ bits, where L is the number of files to be checked.

We now compare existing POR schemes [22,25,28] with our PCPOR scheme and summarize the result in Table 1. In Ref. [22], the complexities of the challenge message and the proof response are $O (1)$ and $O (s)$ respectively, where s is the number of elements in each block. In contrast, our proposed PCPOR scheme only introduces a constant communication cost. Notably, Ref. [28] also has constant communication cost for single file verification. However, that scheme only supports private verification, which requires the data owner to perform all integrity verification by himself/herself. PCPOR, in contrast, allows any user or a TPA with public keys to conduct verification without the help of the data owner. Therefore, the data owner can go off-line after having outsourced data to the cloud. In Ref. [25], constant communication cost, public verifiability and batch verification are achieved at the same time. However, that scheme only considers an honest-but-curious cloud service provider. In comparison, our PCPOR considers malicious cloud servers and hence provides a higher level of security assurance.

5.1.2. Computational cost

As shown in Section 3.2, our PCPOR scheme is composed of 5 algorithms: KeyGen, Setup, Challenge, Prove and Verify. Among these algorithms, KeyGen and Setup are performed off-line by the data owner at the beginning of the system deployment. To generate the public key PK as well as the secret key SK for the system, the data owner performs $(s + 3)$ EXP operations using the KeyGen algorithm. During the Setup procedure for a file, $(s + 2) n$ EXP and $s n$ MUL operations are needed to generate authentication tags, where n is the number of blocks in the file and each block has s elements. In order to check the integrity of a file, the user first runs the Challenge algorithm to verify the file tag with 2 Pairing operations and to generate the challenge message CM by choosing a constant number of random numbers with negligible cost. On receiving the challenge message, the cloud server runs the Prove algorithm with $(k + s - 1)$ MUL and $(s + k)$ EXP operations. To run the Verify algorithm, the user spends 3 EXP, 3 MUL and 4 Pairing operations for the final verification. When the integrity of L files needs to be verified at the same time, our batch verification design aggregates these tasks into a single round, which reduces the $O (L)$ EXP and $O (L)$ Pairing operations on the user to constant ones.
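The operation counts above can be tabulated with a small helper. The following is a minimal sketch that only encodes the counts stated in this section; the names `OpCount`, `keygen_cost`, `setup_cost`, `challenge_cost`, `prove_cost` and `verify_cost` are hypothetical and are not part of the PCPOR implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpCount:
    """Number of group operations: exponentiations, multiplications, pairings."""
    exp: int = 0
    mul: int = 0
    pairing: int = 0


def keygen_cost(s: int) -> OpCount:
    # KeyGen: (s + 3) EXP, performed once by the data owner.
    return OpCount(exp=s + 3)


def setup_cost(s: int, n: int) -> OpCount:
    # Setup: (s + 2) * n EXP and s * n MUL to tag n blocks of s elements each.
    return OpCount(exp=(s + 2) * n, mul=s * n)


def challenge_cost() -> OpCount:
    # Challenge: 2 pairings to verify the file tag (random choices are ignored).
    return OpCount(pairing=2)


def prove_cost(s: int, k: int) -> OpCount:
    # Prove (server side): (s + k) EXP and (k + s - 1) MUL for k challenged blocks.
    return OpCount(exp=s + k, mul=k + s - 1)


def verify_cost() -> OpCount:
    # Verify (user side): constant, independent of n, s and k.
    return OpCount(exp=3, mul=3, pairing=4)
```

For instance, with s = 100 elements per block and k = 460 challenged blocks (illustrative values), `prove_cost(100, 460)` gives 560 EXP and 559 MUL on the server, while `verify_cost()` is the same for every file.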

We now compare our PCPOR scheme with existing POR schemes [22,25,28] and show the result in Table 1. Compared with the public POR scheme in Ref. [22], our PCPOR introduces $(s - 1)$ more MUL and s more EXP operations on the server side, but it reduces the computational complexity on the user side from $O (s + k) MUL + O (s + k) EXP + O (1) Pairing$ to $O (1) EXP + O (1) MUL + O (1) Pairing$. As the cloud server is always much more powerful than user devices (e.g., Amazon EC2 vs. mobile devices), the additional computational cost brought to the server side in our PCPOR can be easily handled in practical scenarios. Compared with the private POR scheme [28], our PCPOR scheme has the same constant computational complexity. However, it is notable that the private POR scheme [28] requires the data owner to stay online and process all verification tasks. In contrast, our PCPOR scheme allows the TPA to perform periodic integrity auditing for data owners. Compared with Ref. [25], which does not support malicious cloud servers and has a number of EXP operations on the user side that is linear in k, our PCPOR scheme is more secure and efficient.

5.1.3. Storage overhead

In this section, we first analyze the storage cost of our PCPOR scheme on both the user side and the server side, and then compare it with existing POR schemes [22,25,28]. On the user side, our PCPOR scheme only requires the user to store part of the public key, $PK : \{p, v, κ, spk, g, u\}$, in order to perform the challenge and verification processes. Thus, the storage cost for each user is $6 | G |$ bits. Compared with existing POR schemes [22,25,28], which require users to store $3 | G |$ bits, $λ + | G |$ bits and $6 | G |$ bits respectively, our PCPOR scheme achieves the same level of storage overhead. The storage overhead on the server side mainly comes from the authentication tags for data blocks. In our PCPOR scheme, each authentication tag is a group element of $| G |$ bits, so the total size of the tags is $n | G |$ bits, which is similar to the storage overhead on servers in Refs [12,22,28], as shown in Table 1.

5.1.4. Discussion

Integrity auditing performance of small files: As discussed in Section 3.4, users can randomly challenge 460 data blocks to achieve 99% error detection probability if the fault tolerance rate of the erasure coding is 1%. For small files that have fewer than 460 data blocks, users can simply challenge all data blocks to achieve 100% error detection probability. As shown in Table 1, the communication cost and computational cost on users of PCPOR are independent of the number of challenged data blocks. Specifically, a user can verify the integrity of any small file with proof information of 3 constant-size elements (ψ, σ and y) and 3 EXP, 3 MUL and 4 Pairing operations. In Ref. [22], which like our scheme supports public verifiability and malicious cloud servers, the communication cost and computational cost on users for small files (fewer than 460 data blocks) are proportional to the number of challenged data blocks, and can become tens or even hundreds of times those of our scheme.

Error detection probability versus system performance: As discussed in Section 3.4, the number of challenged data blocks needed to achieve a high error detection probability is mainly determined by the fault tolerance rate of the erasure coding employed in our scheme (this is also true for other POR schemes [22,25,28]). In particular, the error detection probability is $Pr [Detection] = 1 - {(1 - ETR)}^{k}$, where ETR is the fault tolerance rate of the erasure coding and k is the number of challenged data blocks. To achieve the same error detection probability, increasing ETR reduces the number of challenged data blocks. By reducing the number of challenged data blocks, we can reduce the proof generation cost of cloud servers, since they process fewer data blocks. However, increasing ETR also increases the total number of data blocks of a file after erasure coding, and thus introduces additional storage overhead on cloud servers. Therefore, there is a trade-off between the error detection probability and the storage overhead of cloud servers in PCPOR and in existing POR schemes [22,25,28].
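The relation between ETR, k and the detection probability can be checked numerically. The sketch below simply evaluates the formula above; the function names are hypothetical. Under this formula the smallest k that reaches 99% at ETR = 1% is 459, so the 460 blocks used in this paper satisfy the bound.

```python
import math


def detection_probability(etr: float, k: int) -> float:
    """Pr[Detection] = 1 - (1 - ETR)^k for k randomly challenged blocks."""
    return 1.0 - (1.0 - etr) ** k


def min_challenged_blocks(etr: float, target: float) -> int:
    """Smallest integer k with 1 - (1 - etr)^k >= target."""
    # Rearranged: k >= ln(1 - target) / ln(1 - etr); both logarithms are negative.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - etr))


if __name__ == "__main__":
    for etr in (0.01, 0.02, 0.05):
        k = min_challenged_blocks(etr, 0.99)
        print(f"ETR={etr:.0%}: k={k}, Pr={detection_probability(etr, k):.4f}")
```

Doubling ETR from 1% to 2% roughly halves the number of blocks the server has to process per challenge, at the price of a larger erasure-coded file.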

Block size versus system performance: Different cloud storage platforms require different block sizes for performance purposes. An example is Amazon Elastic Block Store [5], wherein block size is an important factor that can affect performance measures such as the IOPS (Input/Output Operations Per Second) rate [1,21]. As discussed in Sections 5.1.1 and 5.1.2, the integrity auditing performance of PCPOR is independent of the block size, so PCPOR can be efficiently applied to cloud storage platforms that prefer different block sizes. In contrast, the communication cost and computational cost on users in Ref. [22] increase linearly with the block size. Block size also influences the storage overhead of cloud servers. In PCPOR and Ref. [22], the storage overhead on the cloud server is linear in the number of data blocks of a file, because the number of authentication tags is equal to the number of data blocks. By increasing the size of each data block, we can reduce the number of blocks of a file and thus reduce the storage overhead of cloud servers. In Ref. [22], however, there is a trade-off between storage overhead and integrity verification cost, since reducing the storage overhead increases the integrity verification cost. PCPOR avoids this trade-off, since changing the block size does not influence its integrity verification performance.

5.2. Experimental results

To evaluate the performance of our proposed scheme, we implemented PCPOR on Amazon EC2 cloud using Java with the Java Pairing-Based Cryptography Library (jPBC) [17]. Our data owner is a laptop running OS X with a 2.4 GHz Intel Core i5 CPU and 16 GB memory. User side devices in our implementation include the same laptop as the data owner and a Motorola MB860 running Android 2.3 with a dual-core 1 GHz Cortex-A9 processor and 1 GB RAM. The cloud server in our implementation is an Amazon EC2 c3.8xlarge instance running Ubuntu Linux Server 14.04. We set the security parameter $λ = 160$ bits, which achieves security equivalent to 1024-bit RSA since our implementation is based on Elliptic Curve Cryptography (ECC). We set the number of challenged blocks to 460 to achieve 99% error detection probability. We vary the file size from 16 MB to 512 MB and the data block size from 2 kB to 24 kB. The parameters chosen in our experiment are consistent with existing works [25]. In order to compare our scheme with Ref. [22] and Ref. [25], which support public integrity auditing, we also implemented them under the same experiment environment. All experimental results represent the mean of 20 trials.

Fig. 2. (a) System setup time on different data size. (b) System setup time on different block size. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

5.2.1. System setup

In order to evaluate the system setup performance of our PCPOR scheme in terms of file size and block size, we first vary the file size from 16 MB to 512 MB with the block size fixed at 4 kB. As shown in Fig. 2(a), the system setup time in our scheme increases proportionally to the file size, from 12.12 s to 398.92 s.⁴ This is because the number of authentication tags we need to generate during the setup process increases linearly with the file size. We then change the block size from 2 kB to 24 kB with the file size fixed at 128 MB. Figure 2(b) shows that the system setup time is inversely proportional to the block size, since the total number of data blocks that require authentication tags decreases as the block size increases. Compared with Ref. [22] and Ref. [25], Fig. 2(a) and (b) show that our scheme achieves comparable system setup performance. Note that the system setup process is a one-time cost, which can be incurred off-line and does not influence the real-time integrity auditing performance.

⁴ To further enhance the performance of the system setup process, we can process it in parallel.

5.2.2. Real-time checking – Single file

In this section, we first measure the integrity checking performance for a single file. In our experiment, we vary the file size from 16 MB to 512 MB with the block size fixed at 4 kB, and the block size from 2 kB to 24 kB with the file size fixed at 128 MB. Figures 3 and 4 show that the verification on the user side only costs about 150 ms on laptops and 2.11 s on mobile phones. Figure 5(a) and (b) show that the communication cost required in PCPOR is only around 2336 bytes and can be easily handled by today’s mobile devices and Internet connections. In addition, as shown in Figs 3, 4 and 5, both the computational cost and the communication cost on users are constant in PCPOR, i.e., they are independent of the file size and block size, which is consistent with our previous analysis in Sections 5.1.1 and 5.1.2. Compared with PCPOR, Figs 3(a), 4(a) and 5(a) show that the computational cost and communication cost on users in Ref. [22] are tens of times higher. Furthermore, Ref. [22] has computational cost and communication cost proportional to the block size, as shown in Figs 3(b), 4(b) and 5(b), which can make its cost even hundreds of times that of PCPOR. As for Ref. [25], our experimental results show that it requires communication cost similar to that of PCPOR and computational cost about four times that of PCPOR. Note that Ref. [25] does not support malicious cloud servers, unlike PCPOR and Ref. [22].

Fig. 3. (a) Verification time on different data size on PC. (b) Verification time on different block size on PC. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

Fig. 4. (a) Verification time on different data size on mobile phone. (b) Verification time on different block size on mobile phone. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

Fig. 5. (a) Communication cost on different data size. (b) Communication cost on different block size. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

We now evaluate the latency of PCPOR using a laptop connected to a 20 Mbps network and a mobile phone connected to a 3G network. In particular, our communication latency includes the transmission of both the challenge message and the proof information between client devices and cloud servers. As shown in Fig. 6(a), the latency of PCPOR is about 376 ms for the PC and 1281 ms for the mobile phone, and is independent of the file size and data block size. Figure 6(a) also shows that the latency of Ref. [22] is about four times that of PCPOR for different data files with the block size fixed at 4 kB. Moreover, the latency of Ref. [22] increases linearly as the block size increases. The latency of Ref. [25] is comparable with that of PCPOR, as shown in Fig. 6(a) and (b).

Fig. 6. (a) Communication latency on different data size. (b) Communication latency on different block size. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

Fig. 7. (a) Average computational cost on different task number. (b) Average communication cost on different task number. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

Fig. 8. Worst case example of corrupted files in batch verification.

5.2.3. Real-time checking – Multiple files

To show the benefit of our batch verification design for multiple files scenario, we change the number of files for integrity checking from 16 to 256. These files have two combinations: (a) all challenged data files are 40 MB; (b) challenged data files are randomly selected from files of 2 MB to 512 MB. Here, we measure the average integrity verification cost per file and compare our batch verification design with linear verification (process tasks one by one). As shown in Fig. 7(a), the average computational cost per file on user side in our batch verification design is inversely proportional to the task number. Figure 7(b) shows that our batch verification design also reduces the average communication cost from about 2336 bytes to 818 bytes compared with linear verification.

We now discuss how efficiently users can identify invalid responses in batch verification using a recursive binary search approach. Suppose a user detects a failure in the batch verification of L files and wants to find out which of the challenged files are corrupted. The user splits these L files into two groups of $\frac{L}{2}$ files each and checks the integrity of the two groups using batch verification. For a group of files that fails the integrity verification, the user continues splitting it into two groups and verifying the integrity of the new file groups. The user repeats this process until all corrupted files are found. In our experiment, we aggregate 256 integrity tasks and corrupt a γ-fraction of these files. We vary the value of γ from 0% to 100%, and run around 20 experiments at each point to obtain a mean verification time. We also use the worst-case distribution of corrupted files for each corruption rate, i.e., no two corrupted files fall into the same 2-file group. Figure 8 gives an example for the batch verification of 16 files with 4 corrupted files. In this case, users need to recursively perform batch verifications for groups of 8 files, 4 files, 2 files and single files to find the corrupted files. In the worst-case scenario for the batch verification of $L = 2^{x}$ files with T corrupted files and $γ ⩽ 50 %$, the number of verifications needed to identify all corrupted files is $V_{num} = \sum_{t = 0}^{k} 2^{t} + (x - k - 1) T + 2 T,$ (18) where $2^{k} < T ⩽ 2^{k + 1}$ and $s = T - 2^{k}$. Since the number of corrupted files is $T = L * γ$ and L is fixed for each verification, increasing γ also increases $V_{num}$ and thus the verification time per file. When $γ = 50 %$, our batch verification reaches its maximum cost, since all files will be checked by the binary search. As demonstrated in Fig. 9, when $γ < 18.75 %$, our experimental results show that our batch verification can identify all corrupted files in the worst case and outperforms linear verification in terms of average verification time per file. When the file corruption rate $γ ⩾ 50 %$, the worst case of our batch verification costs about twice as much as linear verification.
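The recursive binary search described above can be simulated to see how the placement of corrupted files affects the number of batch verifications. This is a minimal sketch under stated assumptions: the batch check is replaced by a plain membership test, the initial check over all files is counted, and both halves of a failing group are always checked, so the counts are illustrative and are not meant to reproduce Eq. (18). The function name is hypothetical.

```python
def count_batch_verifications(num_files: int, corrupted: set[int]) -> tuple[int, list[int]]:
    """Simulate the recursive binary search over files 0..num_files-1.

    Returns (number of batch verifications performed, corrupted files found).
    """
    verifications = 0
    found: list[int] = []

    def check(lo: int, hi: int) -> None:
        nonlocal verifications
        verifications += 1  # one batch verification over files [lo, hi)
        if not any(lo <= f < hi for f in corrupted):
            return  # the whole group verifies, so this branch is pruned
        if hi - lo == 1:
            found.append(lo)  # a failing single file is a corrupted file
            return
        mid = (lo + hi) // 2
        check(lo, mid)
        check(mid, hi)

    check(0, num_files)
    return verifications, found


if __name__ == "__main__":
    # 16 files, 4 corrupted: spread out (as in a worst case) vs. grouped together.
    print(count_batch_verifications(16, {0, 4, 8, 12}))  # (23, [0, 4, 8, 12])
    print(count_batch_verifications(16, {0, 1, 2, 3}))   # (11, [0, 1, 2, 3])
```

With the corrupted files spread out, no branch can be pruned until the 2-file level; when they are grouped together, half of the tree is pruned after the first split.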

Fig. 9. Verification time on different corrupted file fractions. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

Fig. 10. Possible regular case of corrupted files in batch verification.

Note that, in practice, corrupted files might be grouped together within the batch, as shown in Fig. 10. In this case, we can avoid searching some branches when identifying corrupted files and thus achieve better verification performance. In addition, in practice cloud users can also ask cloud providers to find out which files are broken once they detect that their challenged data are corrupted. In such a case, our batch verification cost on users becomes constant. The cost of the linear verification approach, however, remains the same (i.e., linear in the number of files) in all cases.

5.2.4. Storage overhead

In our scheme, the storage overhead mainly comes from the authentication tags generated for each data block. Figure 11 shows that the storage overhead of PCPOR decreases from 7.42% to 0.62% when we change the block size from 2 kB to 24 kB. This is because our authentication tag is a group element with a fixed size of 152 bytes and the total number of blocks is inversely proportional to the block size. Figure 11 also shows that our scheme achieves storage overhead comparable with that of Ref. [22] and Ref. [25].
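The reported overhead figures follow directly from the tag size. The sketch below assumes that 1 kB means 1024 bytes and that the overhead is the ratio of one 152-byte tag to one data block; the function name is hypothetical.

```python
TAG_SIZE_BYTES = 152  # size of one authentication tag, as reported above


def tag_storage_overhead(block_size_bytes: int, tag_size_bytes: int = TAG_SIZE_BYTES) -> float:
    """Server-side overhead as a fraction of the stored data: one tag per block."""
    return tag_size_bytes / block_size_bytes


if __name__ == "__main__":
    for block_kb in (2, 4, 8, 16, 24):
        overhead = tag_storage_overhead(block_kb * 1024)
        print(f"{block_kb:>2} kB blocks: {overhead:.2%}")
```

Under these assumptions, 2 kB blocks give 152/2048 ≈ 7.42% and 24 kB blocks give 152/24576 ≈ 0.62%, matching the two endpoints reported for Fig. 11.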

Fig. 11. Storage overhead. (Colors are visible in the online version of the article; http://dx.doi.org/10.3233/JCS-150525.)

6. Applications of our authentication tag

In this section, we discuss applications of our proposed polynomial based authentication tag in other related fields. Particularly, we briefly introduce how to use our proposed authentication tag to support verifiable keyword search and verifiable SQL query.

6.1. Verifiable keyword search

Consider a text file F with a set of defined keywords $T = \{w_{1}, \dots, w_{t}\}$. We can construct a polynomial for it as $f_{\vec{c}} (x) = (x - w_{1}) (x - w_{2}) \dots (x - w_{t})$, where $\vec{c}$ is the coefficient vector. Clearly, each keyword $w_{i}$ is a root of $f_{\vec{c}} (x)$, i.e., given a keyword w, we accept it as a keyword of file F if we have $f_{\vec{c}} (w) = 0$.
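The keyword polynomial can be illustrated with plain modular arithmetic. The following is a minimal sketch, not the scheme's implementation: the modulus `P` is a toy prime standing in for the group order p, and mapping keywords to field elements with SHA-256 is an assumption made only for this example.

```python
import hashlib

P = 2**127 - 1  # toy prime modulus (assumption); the scheme works modulo its group order p


def keyword_to_int(word: str) -> int:
    """Map a keyword to an element of Z_P (hash-based mapping is an assumption)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


def poly_from_roots(roots: list[int]) -> list[int]:
    """Coefficients c[0] + c[1]*x + ... of prod_i (x - roots[i]) modulo P."""
    coeffs = [1]
    for r in roots:
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - r * c) % P      # contribution of the -r term
            nxt[i + 1] = (nxt[i + 1] + c) % P  # contribution of the x term
        coeffs = nxt
    return coeffs


def evaluate(coeffs: list[int], x: int) -> int:
    """Evaluate the polynomial at x modulo P using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def contains_keyword(coeffs: list[int], word: str) -> bool:
    """A word is accepted as a keyword exactly when it is a root of the polynomial."""
    return evaluate(coeffs, keyword_to_int(word)) == 0


if __name__ == "__main__":
    f = poly_from_roots([keyword_to_int(w) for w in ("cloud", "storage", "integrity")])
    print(contains_keyword(f, "storage"), contains_keyword(f, "audit"))
```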

To set up the verifiable keyword search, the file owner first runs our KeyGen algorithm to generate the public keys and secret keys. Then, the owner generates an authentication tag for file F as $σ = {(u^{H (FileName)} \cdot g^{f_{\vec{c}} (α)})}^{ϵ}$. Finally, the owner encrypts file F with a standard encryption scheme (e.g., AES) as $Enc (F)$, and outsources $Enc (F)$, its authentication tag σ, its polynomial $f_{\vec{c}} (x)$, and PK to the cloud server.

When a client wants to search files with a keyword w, the cloud server can perform the search without decrypting files. Specifically, the cloud server can check whether or not F contains keyword w by checking $f_{\vec{c}} (w) \overset{?}{=} 0$. If F contains the keyword w, the cloud computes $π = g^{ψ (α)}$, where $ψ (x) = \frac{f_{\vec{c}} (x) - f_{\vec{c}} (w)}{x - w}$. The cloud then returns $Enc (F)$ to the client together with the proof information $\{σ, π, f_{\vec{c}} (w)\}$. On receiving the proof information, the client can check $e (u^{H (FileName)}, ν) \cdot e (π, κ \cdot ν^{- w}) \overset{?}{=} e (σ, g) \cdot e {(ν, g)}^{f_{\vec{c}} (w)} .$ (19) If Eq. (19) holds, the client accepts the search result; otherwise, the client rejects it.
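The quotient polynomial $ψ (x)$ used in the proof can be obtained by synthetic division. The sketch below covers only this polynomial step over a toy prime modulus `P` (an assumption); computing $π = g^{ψ (α)}$ and the pairing check of Eq. (19) require a pairing library and are not shown. Function names are hypothetical.

```python
P = 2**127 - 1  # toy prime modulus (assumption); the scheme works modulo its group order p


def evaluate(coeffs: list[int], x: int) -> int:
    """Evaluate c[0] + c[1]*x + ... at x modulo P using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def quotient(coeffs: list[int], w: int) -> tuple[list[int], int]:
    """Return (psi, f(w)) such that f(x) - f(w) = psi(x) * (x - w) modulo P.

    Coefficients are stored lowest degree first; this is synthetic division by (x - w).
    """
    psi = [0] * (len(coeffs) - 1)
    carry = 0
    for i in range(len(coeffs) - 1, 0, -1):
        carry = (coeffs[i] + carry * w) % P
        psi[i - 1] = carry
    remainder = (coeffs[0] + carry * w) % P  # the remainder of the division equals f(w)
    return psi, remainder


if __name__ == "__main__":
    f = [2, P - 3, 1]           # f(x) = x^2 - 3x + 2 = (x - 1)(x - 2)
    psi, y = quotient(f, 1)     # 1 is a root, so y == 0 and psi(x) = x - 2
    print(psi == [P - 2, 1], y)
```

Because the remainder of dividing by $(x - w)$ is $f_{\vec{c}} (w)$, a single pass yields both the value the server reports and the coefficients it needs for the proof.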

6.2. Verifiable SQL query

Consider a table T of a relational database outsourced to cloud servers, which consists of n tuples $t_{i}$ , $1 ⩽ i ⩽ n$ and each tuple has m attributes $a_{j}$ , $1 ⩽ j ⩽ m$ . For an attribute in a tuple $t_{i}$ , we denote it as $t_{i} . a_{j}$ . T is ordered by attribute $a_{0}$ (it can also be ordered by any other attribute).

To set up verifiable SQL query, the database owner first runs our KeyGen algorithm to generate the public keys and secret keys. Then, for each tuple $t_{i}$, the owner generates an authentication tag as $σ_{i} = {(u^{H (T_{name} ∥ i)} \cdot \prod_{j = 0}^{s - 1} g^{a_{j} α^{j + 2}})}^{ϵ} = {(u^{H (T_{name} ∥ i)} \cdot g^{f_{\vec{β_{i}}} (α)})}^{ϵ},$ (20) where $T_{name}$ is the name of table T and $\vec{β_{i}} = \{0, 0, t_{i} . a_{1}, \dots, t_{i} . a_{m}\}$.

When a client wants to execute a verifiable selection query on T: $SELECT * FROM T WHERE L ⩽ a_{0} ⩽ U$, he/she can send a random number r and the query statement together to the cloud server. The cloud server obtains the query result set R and generates $σ = \prod_{t_{i} \in R} σ_{i}^{r^{i}}$. The cloud server also retrieves two tuples $t_{LI - 1}$ and $t_{UI + 1}$ and their corresponding authentication tags $σ_{LI - 1}$ and $σ_{UI + 1}$, where LI and UI are the lower bound index and upper bound index of the query results. The cloud server then sends $\{R, σ, t_{LI - 1}, t_{UI + 1}, σ_{LI - 1}, σ_{UI + 1}\}$ to the client. On receiving the query result and the proof information, the client can run our Prove algorithm to generate y and ψ, and verify the integrity of R, $t_{LI - 1}$ and $t_{UI + 1}$ with our Verify algorithm. The client can also verify the completeness of the query result by checking whether or not $t_{LI - 1} . a_{0} < L$ and $t_{UI + 1} . a_{0} > U$. The client accepts the query result if it passes both the integrity verification and the completeness verification.
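The completeness check on the two boundary tuples can be sketched as follows. This is an illustration only: the table is reduced to its sorted $a_{0}$ values, the integrity verification of the returned tuples with their authentication tags is not modelled, and treating a missing neighbour (`None`) as the edge of the table is an assumption that the text above does not cover. Function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Optional


def server_answer(keys: list[int], low: int, high: int) -> tuple[list[int], Optional[int], Optional[int]]:
    """keys: sorted a_0 values of table T. Returns (R, t_{LI-1}.a_0, t_{UI+1}.a_0)."""
    li = bisect_left(keys, low)         # index of the first key >= low
    ui = bisect_right(keys, high) - 1   # index of the last key <= high
    result = keys[li:ui + 1]
    left = keys[li - 1] if li > 0 else None
    right = keys[ui + 1] if ui + 1 < len(keys) else None
    return result, left, right


def client_completeness_check(result: list[int], left: Optional[int],
                              right: Optional[int], low: int, high: int) -> bool:
    """Accept only if every result is in range and both neighbours fall outside it."""
    in_range = all(low <= k <= high for k in result)
    left_ok = left is None or left < low      # t_{LI-1}.a_0 < L
    right_ok = right is None or right > high  # t_{UI+1}.a_0 > U
    return in_range and left_ok and right_ok


if __name__ == "__main__":
    keys = [1, 3, 5, 7, 9, 11]
    result, left, right = server_answer(keys, 4, 8)
    print(result, left, right, client_completeness_check(result, left, right, 4, 8))
    # A server that drops tuple 5 must present 5 as the left neighbour, which fails the check.
    print(client_completeness_check([7], 5, 9, 4, 8))
```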

Our authentication tag proposed in this work can also be utilized to support more complicated SQL queries, such as aggregated query.

7. Related work

To guarantee the integrity of data stored on cloud servers, a body of techniques [4,7,14,18,22,24–28,30] has been proposed in past years. In Ref. [18], Juels et al. first formally defined the POR model, which allows a storage server to convince a client that it can correctly retrieve a file previously stored at the server. In their proposed POR scheme, disguised blocks hidden among regular file blocks are utilized to detect data modified by the server. The number of challenges supported by this scheme is fixed a priori, which limits its application. With a purpose similar to that of Ref. [18], Ateniese et al. [4] proposed an efficient but weaker provable data possession model using homomorphic authentication tags. Nevertheless, an adversary was later introduced for this scheme by Shacham and Waters [22], which can answer a fraction of queries correctly with non-negligible probability. To remove the limitation in Juels et al.’s POR scheme [18], Shacham and Waters (SW) [22] proposed a fast public POR scheme based on the homomorphic linear authenticators [4], which enables the storage server to reduce the proof complexity by aggregating the authentication tags of individual file blocks. Compared with the scheme in Ref. [18], the communication cost for proof response in Ref. [22] is reduced to $\frac{1}{λ}$ of that in Ref. [18], and the scheme can support an unlimited number of challenges. At the same time, they were the first to provide a security proof against arbitrary adversaries in the formal POR model. However, in the SW scheme, the communication complexity for proof response is still linear in the number of elements in each erasure-coded file block. Following the SW scheme [22], several POR schemes have been proposed recently to improve its communication cost. In Ref. [12], by using a $(γ, δ)$-hitter introduced by Goldreich [15], Dodis et al. reduce the size of the challenge message to $\frac{1}{λ}$ of that in Ref. [22]. Nevertheless, no change is made to the response size in this scheme, which is still linear in the number of elements in a data block. To further improve POR schemes and overcome the limitations of previous ones, Xu et al. [28] proposed a private POR scheme with constant communication cost based on polynomial commitment techniques. However, their scheme requires the data owner to stay online and help users perform all verification tasks. These onerous tasks inevitably introduce heavy communication and computational costs for the data owner, especially when multiple users submit verification requests at the same time. Wang et al. [25] proposed a public integrity auditing scheme, which achieves constant communication cost and batch auditing. Nevertheless, this scheme only considers a semi-honest cloud server and achieves a lower security level than previous works [12,22,28]. Among other related works, Refs [24,26,27,30] support dynamic operations besides integrity checking. However, none of them achieves constant communication cost or computational cost, even when considering integrity checking without dynamic operations.

8. Conclusion

Proofs of Retrievability (POR) and Proof of Data Possession (PDP) techniques enable individuals and organizations to verify the integrity of their outsourced data on an untrusted server (e.g., a public cloud storage platform). While existing POR schemes have focused on various practical issues, they still have limitations, including non-trivial (linear or quadratic) communication and computational cost and a lack of support for public verifiability and malicious cloud servers. In this work, we proposed the first public POR scheme with constant communication cost and computational cost. By designing a novel polynomial-based authentication tag, our scheme achieves constant communication size and a constant number of computation operations on users at the same time. In addition, by supporting public verifiability, our scheme releases the data owner from onerous periodic auditing tasks, which have to be processed by the data owner in previous private POR schemes. Batch integrity auditing is also supported by our scheme, which greatly improves the efficiency of integrity auditing for multiple files. We proved the security of our scheme based on the CDH problem, the t-SDH problem and the Static Diffie–Hellman problem. Our thorough analysis and experimental results validate the efficiency and scalability of our scheme. Applications of our proposed authentication tag in other related fields are provided to show that it can be used as a general tool.

Footnotes

Acknowledgments

This work was supported by the US National Science Foundation award CNS-1338102 and Amazon AWS in Education Research Grant.

References

[1]Amazon EBS Volume Performance, available at: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSPerformance.html.

[2]Amazon Forum, Major outage for Amazon S3 and EC2, available at: https://forums.aws.amazon.com/thread.jspa?threadID=19714&start=15&tstart=0.

[3]Amazon Web Service, Summary of the Amazon EC2 and Amazon RDS service disruption in the US East region, available at: http://aws.amazon.com/message/65648/.

[4]

Ateniese,

Burns,

Curtmola,

Herring,

Kissner,

Peterson and

Song, Provable data possession at untrusted stores, in: Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS’07, ACM, New York, NY, USA, 2007, pp. 598–609.

[5]AWS-EBS, available at: http://aws.amazon.com/ebs/.

[6]

Boneh and

Boyen, Short signatures without random oracles, in: Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT’04, Interlaken, Switzerland, 2004, pp. 56–73.

[7]

K.D.

Bowers,

Juels and

Oprea, Proofs of retrievability: Theory and implementation, in: Proceedings of the 2009 ACM Workshop on Cloud Computing Security, CCSW’09, ACM, New York, NY, USA, 2009, pp. 43–54.

[8]

D.R.L.

Brown and

R.P.

Gallant, The static Diffie–Hellman problem, 2004, available at: Cryptology ePrint Archive: Report 2004/306.

[9]Business Insider, Amazon’s cloud crash disaster permanently destroyed many customers’ data, available at: http://www.businessinsider.com/amazon-lost-data-2011-4.

10.

[10]

Camenisch,

Hohenberger and

M.Ø.

Pedersen, Batch verification of short signatures, in: Advances in Cryptology – EUROCRYPT 2007, Lecture Notes in Computer Science, Vol. 4515, Springer, Heidelberg, 2007, pp. 246–263.

11.

[11]

Diffie and

Hellman, New directions in cryptography, IEEE Transactions on Information Theory 22(6) (1976), 644–654.

12.

[12]

Dodis,

Vadhan and

Wichs, Proofs of retrievability via hardness amplification, in: Proceedings of the 6th Theory of Cryptography Conference on Theory of Cryptography, TCC’09, Heidelberg, 2009, pp. 109–127.

13.

[13]Dropbox, Dropbox forums on data loss topic, available at: https://www.dropboxforum.com/hc/en-us/search?utf8=%E2%9C%93&query=data+loss&commit=Search.

14.

[14]

Erway,

Küpçü,

Papamanthou and

Tamassia, Dynamic provable data possession, in: Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS’09, ACM, New York, NY, USA, 2009, pp. 213–222.

15.

[15]

Goldreich, A sample of samplers – A computational perspective on sampling (survey), Electronic Colloquium on Computational Complexity (ECCC) 4(20) (1997).

16.

[16]NIST, Secure Hashing, available at: http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html.

17.

[17]jPBC, available at: http://gas.dia.unisa.it/projects/jpbc/.

18.

[18]

Juels and

B.S.

KaliskiJr., PORs: Proofs of retrievability for large files, in: Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS’07, ACM, New York, NY, USA, 2007, pp. 584–597.

19.

[19]

Kate,

G.M.

Zaverucha and

Goldberg, Constant-size commitments to polynomials and their applications, in: Proceedings of the 16th Annual International Conference on the Theory and Application of Cryptology and Information Security, ASIACRYPT’10, Singapore, 2010, pp. 177–194.

20.

[20]NIST, Cloud management broker, available at: http://www.nist.gov/itl/cloud/6_1.cfm.

21.

[21]Performance tuning Amazon elastic block store – IO block size, available at: http://harish11g.blogspot.com/2013/04/Understanding-Amazon-Elastic-block-store-Performance-Tuning-IO-Block-Size.html.

22.

[22]

Shacham and

Waters, Compact proofs of retrievability, in: Proceedings of the 14th International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology, ASIACRYPT’08, May 2008, Springer, Berlin, Heidelberg, 2008, pp. 90–107.

23.

[23]

Shoup, A Computational Introduction to Number Theory and Algebra, Cambridge Univ. Press, New York, NY, USA, 2005.

24.

[24]

Wang,

Li and

Li, Public auditing for shared data with efficient user revocation in the cloud, in: 2013 Proceedings IEEE INFOCOM (INFOCOM’2013), Turin, Italy, April 2013, 2013, pp. 2750–2758.

25.

[25]

Wang,

S.S.M.

Chow,

Wang,

Ren and

Lou, Privacy-preserving public auditing for secure cloud storage, IEEE Transactions on Computers 62(2) (2013), 362–375.

26.

[26]

Wang,

Ren and

Lou, Ensuring data storage security in cloud computing, in: Proceedings of the 17th IEEE International Workshop on Quality of Service, IWQoS’09, Charleston, SC, July 2009, 2009.

[27] Wang, Li, Ren and Lou, Enabling public verifiability and data dynamics for storage security in cloud computing, in: Proceedings of the 14th European Conference on Research in Computer Security, ESORICS’09, Springer, Berlin, Heidelberg, 2009, pp. 355–370.

[28] Xu and E.-C. Chang, Towards efficient proofs of retrievability, in: Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security, ASIACCS’12, Seoul, Korea, 2012.

[29] Yuan and Yu, Proofs of retrievability with public verifiability and constant communication cost in cloud, in: Proceedings of the 2013 International Workshop on Security in Cloud Computing, Cloud Computing’13, ACM, New York, NY, USA, 2013, pp. 19–26.

[30] Zhu, Wang, Hu, G.J. Ahn, Hu and S.S. Yau, Dynamic audit services for integrity verification of outsourced storages in clouds, in: Proceedings of the 2011 ACM Symposium on Applied Computing, SAC’11, ACM, New York, NY, USA, 2011, pp. 1550–1557.

Footnotes

² W.l.o.g., data are assumed to be stored as data files.

³ When the operation is on an elliptic curve, EXP denotes one scalar multiplication and MUL denotes one point addition.

⁴ To further speed up the system setup process, its computation can be carried out in parallel.