Abstract
For data storage outsourcing services, it is important to allow users to efficiently and securely verify that cloud storage servers store their data correctly. To address this issue, a number of Proof of Retrievability (POR) and Proof of Data Possession (PDP) schemes have been proposed wherein servers must prove to a verifier that data are stored correctly. While existing POR and PDP schemes offer decent solutions addressing various practical issues, they either have non-trivial (linear or quadratic) communication and computational complexity, or only consider private verification. In this paper, we propose the first POR scheme with public verifiability, constant communication and computational costs on users. In our scheme, messages exchanged between cloud servers and users are composed of a constant number of group elements and random numbers; computational tasks required on users are also constant; batch auditing of multiple tasks is also efficiently supported. We achieved these by a unique design based on our novel polynomial-based authenticators. Extensive experiments on Amazon EC2 cloud and different client devices (contemporary and mobile devices) show that our design allows a user to audit the integrity of a file of any size with a constant computational cost of 150 ms on PC (2.11 s on mobile device) and a communication cost of 2.34 kB for 99% error detection probability when employing an erasure coding with 1% fault tolerance rate. We prove the security of our scheme based on the Computational Diffie–Hellman problem, the t-Strong Diffie–Hellman problem and the Static Diffie–Hellman problem.
Introduction
The rapid development of cloud technologies is increasingly attracting customers including both organizations and individuals. Currently, millions of users are using cloud storage services including Amazon S3, Microsoft SkyDrive, Google Cloud Storage, iCloud and Dropbox. Despite the proliferation of cloud storage service, it also raises security concerns since outsourcing data to the cloud makes the data owner lose physical control over the storage sites. One significant concern, among the many, is data integrity, i.e., whether or not the cloud server indeed stores its users’ data correctly. Recently, a number of data loss events have been reported for well-known storage providers [2,3,9,13], including Amazon S3 and Dropbox. In addition, we observed that there have been large discrepancies between the numbers of data corruption events reported by users and those acknowledged by service providers [13], which also causes users to doubt whether or not their data on cloud are truly intact. To ensure users’ confidence in the integrity of data remotely stored in the cloud, a reliable proof-of-retrievability (POR) [18] and/or Proof of Data Possession (PDP) [4] system is desirable. Specifically, in a POR or PDP system the data storage server must prove that it indeed stores users’ data correctly, and the user shall be able to verify whether or not the proof generated by the server is valid. In practice, the performance of a POR or PDP system is mainly impacted by the following factors: (1) the communication cost between server and user during the verification process; (2) the computational cost introduced to the user in each verification; (3) the storage overhead on server side and user side; (4) the way of verification, i.e., private verifiability or public verifiability. Specifically, private verifiability only allows the data owner with system secret keys to check data integrity, and public verifiability enables any entities with public keys to check data integrity without the help of the data owner. In today’s cloud computing, third party Cloud Management Broker (CMB) [20] plays an important intermediary role between cloud computing providers and cloud users. As cloud users may not have in depth of knowledge to deploy and manage their data and services in cloud computing, CMB now is employed by many users to handle their data and services in cloud. In order to continually guarantee the integrity of outsourced data in cloud, integrity verification should be performed periodically. In such a scenario, public verifiability is required if cloud users also want to delegate these periodical integrity verification tasks to a CMB. With public verifiability, a CMB can perform integrity verification on behalf of cloud users without knowing any secrets of them.
Toward designing a practical POR/PDP system, in past years a body of techniques [4,7,14,18,22,24,25,27,28,30] have been proposed. While these schemes offer flexible verification – private and public verifications, most of them have communication cost linear to the data block size and computational cost on the user side linear to the number of challenging blocks. With an increasing number of mobile users, who manage their cloud data through mobile apps (e.g., iCloud, iAWS, etc.) and have constrained bandwidth and computational resources (e.g., mobile phones with limited data plan), such communication complexity and computational complexity represent considerable cost to the system and can even make the POR system unaffordable. To enhance existing schemes, Xu et al. [28] constructed a POR scheme with constant communication cost by utilizing a polynomial commitment technique. The main drawback of this scheme, however, is that it only supports private verification, which means that all verifications must be performed by the data owner himself/herself. To reduce the computational cost for users, Wang et al. [25] proposed a public integrity auditing scheme with batch verification. However, Ref. [25] assumes cloud servers to be semi-honest, i.e., they will honestly follow the protocol. Malicious (misbehaving or compromised) cloud servers [22,28] are not considered. To our best knowledge, there is no existing secure POR/PDP solution that offers public verifiability with constant communication cost and computational cost.
In this work, we propose a public POR scheme with constant communication cost and computational cost, namely PCPOR, under the assumption that cloud servers can be misbehaving or compromised. Thanks to our novel design of polynomial-based authentication tags, PCPOR makes the cloud server to consolidate proof information for integrity verification into three constant size elements. To verify these constant size proof information, PCPOR requires just a constant number of computational operations for the verifier, no matter how large the audited file is and what the block size is in the audited file. By supporting public verifiability, PCPOR allows any user or third-party auditor, e.g., cloud management broker, to perform the verification process using public keys without contacting the data owner. PCPOR also enables aggregation of integrity auditing operations for multiple tasks (files) through our batch integrity auditing technique, which further promotes its auditing efficiency. Notably, PCPOR achieves these with a storage overhead comparable to existing schemes. Thorough numerical analysis and extensive experimental results on Amazon EC2 cloud and various client devices (PC and mobile devices) demonstrate the scalability and efficiency of PCPOR. Based on the Computational Diffie–Hellman (CDH) problem [11], the t-Strong Diffie–Hellman (t-SDH) problem [6] and the Static Diffie–Hellman problem [8], we prove the security of our PCPOR scheme.
Our main contributions can be summarized as further.
We construct the first secure and efficient public POR scheme with constant communication cost and computational cost considering malicious cloud servers; the proposed scheme can also efficiently handle multiple verification requests with batch operations. Our proposed scheme obviates the need for a trade-off between communication cost and storage cost as claimed in the public POR scheme [22]. Our design of polynomial based authentication tag can be used as an independent solution for other related fields, such as verifiable SQL query, verifiable key word search, etc.
The rest of this paper is organized as follows. Section 2 describes the system model and security model; We introduce the detailed construction of PCPOR in Section 3, which is followed by the security proof in Section 4; We evaluate the performance of our scheme in Section 5; Section 6 introduces applications of our proposed authentication tag in other related fields; In Section 7, we review and discuss related works; We conclude the paper in Section 8.

System model. (Colors are visible in the online version of the article;
System model
In this work, we follow the POR model that is adopted by most existing POR schemes [7,18,22,28]. As shown in Fig. 1, we consider three participating entities: Data owner, User and Cloud server. Data owner has a collection of data and stores them on cloud servers together with corresponding authentication tags. The owner may share the stored data with a group of users who may perform integrity check upon each access. Note that data integrity check can also be conducted by a Third-Party Auditor (TPA) who has the necessary resource/expertise for periodical integrity auditing. In practice, a cloud management broker [20] can perform the role of as TPA. Since the proposed scheme is intended for public verifiability, we do not differentiate ordinary users from a TPA. To check the integrity of data,2
W.l.o.g., data are assumed to be stored data files.
KeyGen: Given a selected security parameter λ, the randomized KeyGen algorithm outputs the system public key and secret key as
Setup: Given a file
Challenge: Given the public key PK, a user generates a challenging message CM and sends it to the cloud server.
Prove: Given the public key PK, authentication tags
Verify: Given the public key PK and the proof information Prf, the Verify algorithm checks the data integrity and outputs result as either Accept or Reject.
We consider the storage server as untrusted and potentially malicious, which is consistent with existing POR schemes [18,22,28]. The cloud server can modify or drop users’ data to save cost. In terms of security, we have correctness and soundness as our design goals as defined by POR schemes. For the correctness, we require that our Verify algorithm accepts a valid proof generated from valid keys, authentic files and tags. For the soundness, if any malicious cloud server does not store a file F correctly, it only has negligible probability to generate proof information that can pass our Verify algorithm.
Construction of PCPOR
Notations and technique preliminaries
Let
Single file verification
In this section, we first introduce our PCPOR scheme for the integrity verification of a single file. We then show how to efficiently support batch verification of multiple files in Section 3.3.
Correctness: Considering a cloud server who honestly responds to the challenge with a
As the data owner can store a large number of files on cloud servers, when a user wants to verify the integrity of these files, it is inefficient to process these verification requests one by one. Specifically, given integrity verification requests for L different files
Correctness: We validate the correctness of our batch verification construction based on Eq. (8) as:
In this section, we discuss the error detection probability of our PCPOR scheme and the size of set K in our scheme. As mentioned in Section 3.2, instead of challenging all data blocks of a file to check its integrity, we can randomly choose k blocks as set K to save communication and computational costs while remaining an acceptable level of error detection probability, which is proved in Refs [4,14,25]. Specifically, when we set fault tolerance rate of erasure coding as
Security analysis
In this section, we first introduce assumptions used in our proof, and then prove the security of our scheme based on these assumptions.
Assumptions
(Computational Diffie–Hellman (CDH) problem [11]).
Let
(Static Diffie–Hellman problem [8]).
Let
(t-Strong Diffie–Hellman (t-SDH) problem [6]).
Let
Security proof
If
By following the idea in Ref. [19], we prove Theorem 4.4 as further.
Suppose there exists a probabilistic polynomial time adversary Adv that can forge
Based on the proof of Theorem 4.4, we are going to prove that the probability for a probabilistic polynomial time adversary Adv to bypass our verification algorithm (i.e., to break the soundness of PCPOR) is negligible. In our scheme, the Adv can interact with the data owner and make a polynomial number of oracle queries to authentication tags
If there exist a probabilistic polynomial time adversary Adv that can spoof the verifier using invalid proof information
Suppose a probabilistic polynomial time adversary Adv can use the cheating prover If Here, we denote ψ as If Based on our above analysis, we proved that there is no Adv that can use invalid proof information and bypass the verification in our PCPOR scheme with non-negligible probability δ. Theorem 4.5 is proved. □ (
Numerical analysis
In this section, we numerically evaluate the performance of our proposed PCPOR scheme in terms of communication cost, computational cost and storage overhead. We compare our PCPOR scheme with existing POR schemes [22,25,28] and summarize the result in Table 1. For simplicity, in the following part of this paper, we denote the complexity of one multiplication operation on Group G as MUL and that of one exponentiation operation on Group G as EXP.3
When the operation is on the elliptic curve, EXP means scalar multiplication operation and MUL means one point addition operation.
Complexity summary
Notes: MUL is one multiplication operation on Group G, EXP is one exponentiation operation on Group G, λ – the security parameter,
In our proposed PCPOR scheme, the communication cost comes from the challenging message CM, the file tag τ and the proof response Prf in each verification task. The challenging message consists of a k-element subset K and a random number
Now, we compare existing POR schemes [22,25,28] with our PCPOR scheme and summarize the result in Table 1. In Ref. [22], the complexity of challenging message and proof response are
Computational cost
As shown in Section 3.2, our PCPOR scheme is composed of 5 algorithms: KeyGen, Setup, Challenge, Prove and Verify. Among these algorithms, KeyGen and Setup are performed off-line by the data owner at the beginning of the system deployment. To generate the public key PK as well as the secret key SK for the system, the data owner performs
We now compare our PCPOR scheme with existing POR schemes [22,25,28] and show the result in Table 1. Compared with the public POR scheme in Ref. [22], while our PCPOR will introduce
Storage overhead
In this section, we first analyze the storage cost of our PCPOR scheme on both the user side and server side, and then compare it with existing POR schemes [22,25,28]. On the user side, our PCPOR scheme only requires the user to store partial public key
Discussion
Integrity auditing performance of small files: As discussed in Section 3.4, users can randomly challenge 460 data blocks to achieve 99% error detection probability if the fault tolerance rate of erasure coding is 1%. For small files that have less than 460 data blocks, users can simply challenge all data blocks to achieve 100% error detection probability. As shown in Table 1, the communication cost and computational cost on users of PCPOR are independent to the number of challenging data blocks. Specifically, a user can verify the integrity of any small file with a proof information of 3 constant size elements (ψ, σ and y) and 3 EXP, 3 MUL and 4 Pairing operations. Considering Ref. [22] which also supports public verifiability and malicious cloud as our scheme, its communication cost and computational cost on users for small files (less then 460 data blocks) is proportional to the number of challenging data blocks, which can become tens (even hundreds) times that of our scheme.
Error detection probability versus system performance: As discussed in Section 3.4, in order to achieve high error detection probability, the number of challenging data blocks is mainly determined by the fault tolerance rate of erasure coding employed in our scheme (also true for other POR schemes [22,25,28]). In particular, the error detection probability
Block size versus system performance: Different cloud storage platforms require different block sizes for performance purpose. An example is Amazon Elastic Block Store [5], wherein block size is an important factor that can affect the its performance such as the IOPS (Input/Output Operations Per Second) rate [1,21]. As discussed in Sections 5.1.1 and 5.1.2, the integrity auditing performance of PCPOR is independent to the block size, which makes PCPOR can be efficiently applied to cloud storage platforms that prefer different block sizes. In contrast, the communication cost and computational cost on users in Ref. [22] increase linearly to the block size. Another performance influence of block size is the storage overhead of cloud servers. In PCPOR and Ref. [22], the storage overhead on cloud server is linear to the number of data blocks of a file, because the number authentication tags is equal to the number of data blocks. When increasing the size of each data block, we can reduce the number of blocks of a file and thus reducing the storage overhead of cloud servers. In Ref. [22], however, the storage overhead and integrity verification cost is a trade-off, since the reduce of storage overhead will increase the integrity verification cost. Differently, PCPOR can obviate this trade-off since the change of block size does not influence the performance of integrity verification of PCPOR.
Experimental results
To evaluate the performance of our proposed scheme, we implemented PCPOR on Amazon EC2 cloud using JAVA with JAVA Pairing-Based Cryptography Library (jPBC) [17]. Our data owner is a laptop running OS X with 2.4 GHz Inter Core i5 CPU and 16 GB memory. User side devices in our implementation include the same laptop as the data owner and a Motorola MB860 running Android 2.3 with Dual-core 1 GHz Cortex-A9 Processor and 1 GB RAM. The cloud server in our implementation is Amazon EC2 c3.8xlarge instances running Ubuntu Linux Server 14.04. We set the security parameter

(a) System setup time on different data size. (b) System setup time on different block size. (Colors are visible in the online version of the article;
In order to evaluate the system setup performance of our PCPOR scheme in terms of file size and block size, we first vary the file size from 16 MB to 512 MB with block size fixed as 4 kB. As shown in Fig. 2(a), the system setup time in our scheme increases proportionally to file size, from 12.12 s to 398.92 s.4
To further enhance the performance of the system setup process, we can process it in parallel.
In this section, we first measure the integrity checking performance of a single file. In our experiment, we vary the file size from 16 MB to 512 MB with block size fixed as 4 kB, and the block size from 2 kB to 24 kB with the file size fixed as 128 MB. Figures 3 and 4 show that the verification on user side only costs about 150 ms on laptops and 2.11 s on mobile phones. Figure 5(a) and (b) show that the communication cost required in PCPOR is only around 2336 bytes and can be easily handled by today’s mobile devices and Internet. In addition, as shown in Figs 3, 4 and 5, both computational cost and communication cost on users are constant in PCPOR, i.e., the computational cost and communication cost on user side is independent to the file size and block size, which is consistent to our previous analysis in Sections 5.1.1 and 5.1.2. Compared with Ref. [22], Figs 3(a), 4(a) and 5(a) show its computational cost and communication cost on users is tens times that of PCPOR. Furthermore, Ref. [22] has computational cost and communication cost proportional to the block size as shown in Figs 3(b), 4(b) and 5(b), which can make their cost even hundreds times that of PCPOR. Considering Ref. [25], our experimental results show their scheme requires communication cost similar to PCPOR and computational cost about four times of PCPOR. Note that, Ref. [25] cannot support malicious cloud server as PCPOR and Ref. [22].

(a) Verification time on different data size on PC. (b) Verification time on different block size on PC. (Colors are visible in the online version of the article;

(a) Verification time on different data size on mobile phone. (b) Verification time on different block size on mobile phone. (Colors are visible in the online version of the article;

(a) Communication cost on different data size. (b) Communication cost on different block size. (Colors are visible in the online version of the article;
We now evaluate the latency of PCPOR using a laptop connected to a 20 Mbps network and a mobile phone connected to 3G network. In particular, our communication latency includes the transmission of both challenging message and proof information between client devices and cloud servers. As shown in Fig. 6(a), the latency of PCPOR is about 376 ms for PC and 1281 ms for mobile phone, which is independent to the file size and data block size. Figure 6(a) also shows that the latency of Ref. [22] is about four times that of PCPOR for different data files with fixed block size as 4 kB. However, when increasing the block size, the latency of Ref. [22] also increases linearly. The latency of Ref. [25] is comparable with PCPOR as shown in Fig. 6(a) and (b).

(a) Communication latency on different data size. (b) Communication latency on different block size. (Colors are visible in the online version of the article;

(a) Average computational cost on different task number. (b) Average communication cost on different task number. (Colors are visible in the online version of the article;

Worst case example of corrupted files in batch verification.
To show the benefit of our batch verification design for multiple files scenario, we change the number of files for integrity checking from 16 to 256. These files have two combinations: (a) all challenged data files are 40 MB; (b) challenged data files are randomly selected from files of 2 MB to 512 MB. Here, we measure the average integrity verification cost per file and compare our batch verification design with linear verification (process tasks one by one). As shown in Fig. 7(a), the average computational cost per file on user side in our batch verification design is inversely proportional to the task number. Figure 7(b) shows that our batch verification design also reduces the average communication cost from about 2336 bytes to 818 bytes compared with linear verification.
We now discuss the efficiency for users to sort out invalid responses in batch verification using recursive binary search approach. Suppose a user detects failures in batch verification, he wants to figure out which files among challenged files are corrupted. To sort out corrupted files by themselves when detects failures in the batch verification of L files, a user will split these L files into two groups and each group has

Verification time on different corrupted file fractions. (Colors are visible in the online version of the article;

Possible regular case of corrupted files in batch verification.
Note that, in practice corrupted files might be grouped together within the batch as shown in Fig. 10. In this case, we can avoid searching some branches to sort out corrupted files and thus achieving better verification performance. In addition, in practice cloud users can also ask cloud providers to figure out which files are broken once they detect their challenged data are corrupted. In such a case, our batch verification cost on users become constant. However, the cost for linear verification approach remains the same (i.e., linear to the number of files) in all the cases.
In our scheme, the storage overhead mainly comes from authentication tags generated for each data block. Figure 11 shows the storage overhead of PCPOR decreases linearly from 7.42% to 0.62% when we change the block size from 2 kB to 24 kB. This is because our authentication tag is a group element with fixed size as 152 bytes and the total number of blocks is inversely proportional to the block size. Figure 11 also shows that our scheme achieves comparable storage overhead with Ref. [22] and Ref. [25].

Storage overhead. (Colors are visible in the online version of the article;
In this section, we discuss applications of our proposed polynomial based authentication tag in other related fields. Particularly, we briefly introduce how to use our proposed authentication tag to support verifiable keyword search and verifiable SQL query.
Verifiable keyword search
Consider a text file F with a set of defined keywords
To setup the verifiable keyword search, the file owner first runs our KeyGen algorithm to generate the public keys and secret keys. Then, the owner generates an authentication tag for file F as
When a client wants to search files with a keyword w, the cloud server can perform a search without decrypting files. Specifically, the cloud server can check whether or not F contains keyword w by checking
Consider a table T of a relational database outsourced to cloud servers, which consists of n tuples
To setup verifiable SQL query, the database owner first runs our KeyGen algorithm to generate the public keys and secret keys. Then, for each tuple
When a client wants to execute a verifiable selection query on T:
Our authentication tag proposed in this work can also be utilized to support more complicated SQL queries, such as aggregated query.
To guarantee the integrity of data stored on cloud servers, in past years a body of techniques [4,7,14,18,22,24–28,30] have been proposed. In Ref. [18], Juels et al. first defined the POR model formally, which allows a storage server to convince a client that it can correctly retrieve a file previously stored at the server. In their proposed POR scheme, disguised blocks hidden among regular file blocks are utilized to detect data modified by the server. The number of challenges supported by this scheme is fixed a priori and thus limits its application. With similar purpose of Ref. [18], Ateniese et al. [4] proposed an efficient but weaker provable data possession model using homomorphic authentication tag. Nevertheless, an adversary was later introduced for this scheme by Shacham and Waters [22], which can answer a fraction of queries correctly with non-negligible probabilities. To omit the limitation in Juels et al.’s POR scheme [18], Shacham and Waters (SW) [22] proposed a fast public POR schemes based on the homomorphic linear authenticators [4], which enables the storage server to reduce the proof complexity by aggregating the authentication tags of individual file blocks. Compared with the scheme in Ref. [18], the communication cost for proof response in Ref. [22] is reduced to
Conclusion
Proofs of Retrievability (POR) and Proof of Data Possession (PDP) techniques enable individuals and organizations to verify the integrity of their outsourced data on an untrusted server (e.g., public cloud storage platform). While existing POR schemes have focused on various practical issues, they still have limitations including non-trivial (linear or quadratic) communication and computational cost, no support of public verifiability and malicious cloud server. In this work, we proposed the first public POR scheme with constant communication cost and computational cost. By uniquely designing a novel polynomial-based authentication tag, our scheme achieves constant communication size and constant computation operations on users at the same time. In addition, by supporting the public verifiability, our scheme releases the data owner from onerous periodical auditing tasks, which have to be processed by the data owner in previous private POR scheme. Batch integrity auditing is also supported by our scheme, which greatly promotes the efficiency of the integrity auditing for multiple files. We proved the security of our scheme based on the CDH problem, the t-SDH problem and the Static Diffie–Hellman Problem. Our thorough analysis and experimental results validate the efficiency and scalability of our scheme. Applications of our proposed authentication tag in other related fields are provided to show it can be used as a general tool.
Footnotes
Acknowledgments
This work was supported by the US National Science Foundation award CNS-1338102 and Amazon AWS in Education Research Grant.
