Abstract
In cloud storage, a data owner can store his data in a cloud and authorize some users to access his data. Since the data are outsourced to the cloud, the authorized users should check the data to assure integrity. The data owner does not allow unauthorized users to check integrity of his data. There are many private and public integrity check schemes. Nevertheless, this paper concerns the verification key spread problem. Delegated integrity check deals with the verification key spread problem and provides effective management of verifiers. A data owner can delegate a verifier to check his data and revoke the right of the verifier later. The verifier cannot re-delegate his integrity check capability to someone else. Delegated integrity check guarantees that only the delegated verifier can check integrity of data.
This paper provides the model of delegated integrity check, an application scenario of personal health records, and two delegated integrity check schemes for hierarchical cloud data. The first scheme allows a verifier to check data possession of a storage server. The second scheme allows a verifier to check data retrievability from a storage server. The schemes achieve proof unforgeability, proof indistinguishability and delegation key unforgeability in the random oracle model.
Introduction
By cloud storage, a user can store his data into a cloud for convenient data access, data sharing, etc. Nevertheless, security issues are the main concern for cloud data. When outsourcing the data to a cloud service provider, the user is concerned about data confidentiality, availability, integrity, etc. There have been many solutions to these issues. For example, private and public key encryption schemes protect data confidentiality, fault tolerance and data replica mechanisms enhance data availability, and private and public integrity check schemes assure data integrity.
Although private and public integrity check schemes are suitable for wide applications, there still are situations that need more delicate types of integrity check schemes. Consider that a user stores his data in a cloud and shares the data with his friends via the cloud. His friends should check the data to assure integrity because the cloud service provider could modify the data without the user’s permission. The user allows only his friends to check his data, and his friends are not allowed to delegate the integrity check capability to someone else. Neither private nor public integrity check schemes satisfy this type of requirement. Public integrity check schemes allow everyone with the public key to check data. The user cannot restrict the verifiers to his friends because he cannot control the spread of his public key. On the other hand, private integrity check schemes allow someone with the secret key to generate integrity tags and then check data. If the user provides his secret key to his friends, his friends can delegate someone else to check the data by providing the secret key. Furthermore, the friends can modify the data without the user’s awareness by generating new integrity tags. The failure of private and public integrity check schemes in this application scenario is due to the verification key spread problem.
The user needs a new type of integrity check schemes to satisfy the above-discussed scenario. This paper considers the delegated integrity check model. Figure 1 illustrates the system model. User

System model of delegated integrity check. Delegated integrity check is an analogous notion to proxy re-encryption.
Comparison with related work
Notes:
The properties of delegated integrity check are summarized as follows:
Effective management of verifiers. A user can delegate a verifier to check his data by generating the delegation key. The storage server can use the user’s and the verifier’s public keys to verify the delegation key. The verifier cannot re-delegate this integrity check capability to someone else unless he is willing to reveal his secret key. The user can revoke the verifier later by revealing the verifier’s key token.
No prior knowledge of verifiers for integrity tags. A user can generate integrity tags of his data without any information about verifiers. The integrity tags are independent of the verifiers. The user does not need to decide whom he will delegate in advance. He can delegate a verifier at any time and revoke the verifier later. Delegated integrity check supports a dynamic group of verifiers efficiently.
Efficient delegation for users and verifiers. In the delegation phase, a verifier generates his key token to a user (data owner). The user verifies the key token and generates the delegation key to the storage server. Neither of them need to access the stored data. The computation complexity is independent of the amount of stored data and the number of delegated verifiers. Delegated integrity check achieves large-scale delegation.
Efficient proof transformation for storage servers. A storage server can transform an integrity proof with a delegation key just in time. The computation complexity is independent of the amount of stored data and the number of delegated verifiers.
There are some related works, where a verifier can check data on behalf of a user. Table 1 provides the comparison with these works. In the proxy provable data possession (PPDP) model [51], user
This paper is an extension of the work on the delegable provable data possession (delegable PDP) model [44]. Delegated integrity check is a stronger model, which provides users with the capability to revoke delegated verifiers. This paper proposes the formal model of delegated integrity check and provides an application scenario of personal health records in Section 4. This paper proposes two delegated integrity check schemes for hierarchical cloud data. The scheme in Section 5 allows a user to check data possession of a storage server. The scheme in Section 6 allows a user to check data retrievability from a storage server. The schemes achieve proof unforgeability, proof indistinguishability and delegation key unforgeability in the random oracle model. Some related issues of delegated integrity check are discussed in Section 7, which provides further insight into curve selection, random sampling and dynamic data.
Ateniese et al. [2,3] defined the provable data possession (PDP) model, which ensures that stored data are intact in a storage server. Later on, Ateniese et al. [5] proposed the proof of storage (POS) framework to construct public PDP schemes. Their framework builds linear authenticators from homomorphic identification schemes. Purushothama and Amberker [41] proposed a public PDP scheme via polynomial interpolations. The integrity proof consists of one group element only, but the size of public information is linear to the size of data. Mohan and Katti [39] proposed a private PDP scheme via Sigma-protocols. They greatly reduced the number of exponentiations in an integrity proof.
Juels and Kaliski [33] defined the proofs of retrievability (POR) model, which ensures that stored data are retrievable from a storage server. Shacham and Waters [43] proposed two POR schemes which enable an unlimited number of verifications. They left the open problem of constructing a provably secure POR scheme which has linear communication complexity without using the random oracle model. Dodis et al. [22] solved the problem via the notion of hardness amplification in complexity theory. Bowers et al. [12] proposed a theoretical framework to construct POR schemes. Their framework is secure in the fully Byzantine adversarial model. Chang and Xu [15] proposed the remote integrity check (RIC) model, where stored data are not necessary to be retrievable from integrity proofs. They introduced the notion of trapdoor compression to capture the fundamental difference between the RIC model and the POR model. Xu and Chang [55] and Yuan and Yu [56] used constant size polynomial commitments to construct private and public POR schemes with constant communication complexity. Li et al. [34] proposed a public POR scheme, where a semi-honest verifier can generate integrity tags. Guang et al. [28] proposed a private POR scheme for fully homomorphic encrypted data. The integrity proof has constant size via securely outsourcing the operations of data blocks and private generators. Azraoui et al. [6] proposed StealthGuard, which is a POR scheme with hidden watchdogs. They used a privacy-preserving word search algorithm to protect the queried sentinels and achieve an unlimited number of verifications.
Since a user can perform dynamic operations on his data, integrity check schemes for dynamic data are considered. Ateniese et al. [4] proposed a private PDP scheme for dynamic data. The number of verifications is limited to the number of embedded tokens. The Merkle hash tree [36], authenticated hash table [40], and authenticated skip list [27] can be used to support dynamic data efficiently. Wang et al. [52,53], Zhu et al. [59] and Heitzmann et al. [30] used the above dynamic data structures to construct integrity check schemes for dynamic data. Erway et al. [24] proposed the dynamic provable data possession (DPDP) model. They also proposed the rank-based authenticated skip list. Huang et al. [31] proposed a systematic exact minimum bandwidth re-generating (SEMBR) code which can transform a dynamic PDP scheme into a dynamic POR scheme. Integrity check schemes for file systems are also considered. Goodrich et al. [26] and Stefanov et al. [46] proposed Athos and Iris, respectively. In recent years, efficiency of dynamic PDP/POR schemes has been improved. Mo et al. [38] proposed the cloud Merkle B+ tree (CMBT) to reduce the worst-cast communication complexity. The communication complexity is logarithmic in the size of data. Zhang and Blanton [57] proposed the balanced update tree, where each node is a range of data blocks that are applied to a dynamic operation. A user maintains his update tree so that he does not need to verify each dynamic operation. Cash et al. [14] and Shi et al. [45] proposed private and public POR schemes with polylogarithmic computation complexity and communication complexity.
To achieve data integrity and availability simultaneously, multiple replicas or coding methods are used with integrity check schemes. Curtmola et al. [20] proposed the multiple-replica provable data possession (MR-PDP) scheme, which ensures that each replica is stored in a storage server. Curtmola et al. [19] used forward error correction (FEC) codes to construct a robust RIC scheme. Bowers et al. [11] proposed the high-availability and integrity layer (HAIL), which uses an erasure code to ensure data retrievability among distributed storage servers. Chen et al. [18] proposed an integrity check scheme for network coding-based storage systems. Their scheme provides a data repair mechanism, but it needs a user to be involved. Cao et al. [13] proposed an integrity check scheme for Luby transform (LT) code-based storage systems. Their scheme provides a data repair mechanism, but it needs an extra backup server to recover the corrupted data. Sarkar and Safavi-Naini [42] and Han et al. [29] proposed POR schemes via fountain codes and maximum rank distance (MRD) codes, respectively. The integrity tags can be efficiently aggregated via XOR operations. High availability for dynamic data is also considered. Wang et al. [48,49] proposed a robust dynamic POR scheme via erasure codes. Chen and Curtmola [16,17] proposed the robust dynamic provable data possession (R-DPDP) model, which applies error correcting codes to dynamic data. Etemad and Küpçü [25] and Barsoum and Hasan [8] proposed dynamic PDP schemes for data replicas.
In recent years, various applications for integrity check schemes are considered. Zhu et al. [60,61] proposed the cooperative provable data possession (CPDP) scheme for hybrid clouds, where multiple clouds store data cooperatively. Wang et al. [47,50] proposed a privacy-preserving public integrity check scheme. They used a blinding technique to hide data information in integrity proofs. Shen and Tzeng [44] defined the delegable provable data possession (delegable PDP) model, where a user can authorize a verifier to check data integrity. The computation complexity of delegation is independent of the data size. Wang [51] proposed the proxy provable data possession (PPDP) model, where a verifier can check data on behalf of a user. Armknecht et al. [1] proposed the outsourced proofs of retrievability (OPOR) model, where a user can outsource integrity check tasks to a verifier. The user can validate the integrity check results. Zheng and Xu [58] and Du et al. [23] integrated the proofs of ownership (PoW) model with the POR model. They proposed public and private POR schemes with data deduplication, respectively.
Preliminary
Notation. Let
Bilinear map. Let q be a large prime. Let Bilinearity: Non-degeneration: Computability:
Complexity assumption
The truncated (decisional) bilinear Diffie–Hellman exponent assumption. Boneh et al. [9,10] introduced the bilinear Diffie–Hellman exponent (
The decisional version of the truncated ℓ-
The truncated (decisional)
The inverse computational Diffie–Hellman assumption. The inverse computational Diffie–Hellman (
The
The knowledge of exponent assumption. Damgård [21] introduced the knowledge of exponent assumption (
Delegated integrity check includes the notions of delegated PDP and delegated POR. The formal model and the threat model of delegated PDP/POR are defined in this section. An application scenario for personal health records (PHRs) is also provided in this section.
Formal model
Let
In the delegation phase,
In the integrity check phase,
Threat model
Delegated integrity check assumes that a verifier will not collude with a storage server. This assumption is natural because a verifier is in an opposite position to a storage server in an integrity check scheme. Delegated integrity check models the verifier and the storage server as follows:
A storage server rejects malformed delegation keys and malformed integrity challenges. The storage server may not store data and integrity tags correctly and will try to forge an integrity proof to pass the verification.
A verifier rejects invalid integrity proofs. The verifier may want to re-delegate his integrity check capability and will provide a malformed delegation key or a malformed integrity challenge to a storage server.
A delegated PDP/POR scheme is secure against the verifier and storage server if it satisfies the requirements of proof unforgeability, proof indistinguishability and delegation key unforgeability simultaneously.
Proof unforgeability
The proof unforgeability game models the notion that a storage server cannot modify stored data without being detected by a verifier. Challenger
The proof unforgeability game A delegated PDP/POR scheme Π satisfies the requirement of proof unforgeability if no probabilistic polynomial-time adversary has non-negligible advantage to win the proof unforgeability game
The proof indistinguishability game models the notion that only the verifiers can verify integrity proofs even if the network communication is eavesdropped. Challenger
The proof indistinguishability game A delegated PDP/POR scheme Π satisfies the requirement of proof indistinguishability if no probabilistic polynomial-time adversary has non-negligible advantage to win the proof indistinguishability game
The delegation key unforgeability game models the notion that only the user can generate delegation keys even if a verifier is corrupted. Challenger
The delegation key unforgeability game A delegated PDP/POR scheme Π satisfies the requirement of delegation key unforgeability if no probabilistic polynomial-time adversary has non-negligible advantage to win the delegation key unforgeability game
A patient’s personal health records (PHRs) contain systematic documentation of the patient’s medical histories and self-reporting data. The medical histories are provided by clinics and hospitals. The self-reporting data include daily physiological readings, drug-reaction records, etc. A patient has to share his PHRs among doctors in medical institutes for medical referral or research. PHRs are personal data and should not be checked by the public. A patient maintains his PHRs by himself, and he manages the verifiers of his PHRs. The patient can organize his PHRs in a hierarchical tree structure and categorize the PHRs according to the divisions of medical institutes (Fig. 2). The patient can authorize a doctor to access his PHRs in some categories. For example, the patient will authorize a dentist to access the PHRs in category

Example of the hierarchical structure for personal health records. A health record is stored in the category that the health record belongs to.
Consider that patient

The storing process of personal health records.
Figure 4 illustrates the delegation process of the PHRs. When

The delegation process of personal health records.
Figure 5 illustrates the integrity check process of the PHRs. When

The integrity check process of personal health records.
Figure 6 illustrates the retrieval process of the PHRs. When

The retrieval process of personal health records.
A delegated PDP scheme for hierarchical data is provided in this section. The security of the scheme is based on the truncated (decisional) bilinear Diffie–Hellman exponent assumption, the inverse computation Diffie–Hellman assumption, and the knowledge of exponent assumption in the random oracle model.
Construction

Example of two-degree hierarchical data. An internal node is a category, and an external node is a file.

Example of key derivation for two-degree hierarchical data.
To generate integrity tags Σ for M,
After generating the integrity tags for all files,
The integrity tags are aggregately verifiable. They can be aggregated together and verified at the same time. Nevertheless, the integrity tags are not homomorphically combinable. Combing tag
To check integrity,
Assume that
Thus, the three verification equations are derived as follows:
Performance
Computation cost
Computation cost
Notes: n is the number of integrity tags; b is the number of data identifiers; l is the number of categories.
Storage cost
Notes: k is the security parameter; p is the size of an element in
Communication cost
Notes: k is the security parameter; p is the size of an element in
Simulation result of the delegated PDP scheme
Notes: Security parameter
The number b of data identifiers affects the efficiency of the integrity check phase. If b is small, there will be too many integrity tags. Thus, computing the hash values of tag indexes will be a heavy duty. The user has to compute more hash values to generate integrity tags. The verifier has to compute more hash values to verify an integrity proof. If b is large, there will be too many data identifiers. Thus, the computation in the integrity check process will be inefficient. The verifier has to do more scalar exponentiations to generate an integrity challenge. The storage server has to do more scalar exponentiations to generate an integrity proof. The verifier has to do more bilinear maps to verify the integrity proof. We suggest that a user choose
The delegated PDP scheme satisfies the requirements of proof unforgeability, proof indistinguishability and delegation key unforgeability.
Proof unforgeability
The delegated PDP scheme is proof unforgeable under the truncated 1-
Let
Let Because (7) holds, Because (8) holds, Because (6) and
In the above reduction,
The delegated PDP scheme is proof indistinguishable under the truncated decisional 1-
Let
Let In the above reduction,
The delegated PDP scheme is delegation key unforgeable under the
Let
Let In the above reduction,
A delegated POR scheme for hierarchical data is provided in this section. The security of the scheme is based on the truncated (decisional) bilinear Diffie–Hellman exponent assumption, the inverse computation Diffie–Hellman assumption, and the knowledge of exponent assumption in the random oracle model.
Construction
The delegated POR scheme is based on the delegated PDP scheme in Section 5.1, but it has two major differences:
To enable data to be retrievable from integrity proofs, an integrity proof includes a linear combination of data. Storage server
To retrieve data from integrity proofs, an extraction method is provided. Verifier
The complete description of the integrity check phase is provided:
Correctness
Assume that
Thus, the two verification equations are derived as follows:
Performance
Computation cost
Computation cost
Notes: n is the number of integrity tags; b is the number of data identifiers; l is the number of categories.
Storage cost
Notes: k is the security parameter; p is the size of an element in
Communication cost
k is the security parameter; p is the size of an element in
Simulation result of the delegated POR scheme
Notes: Security parameter
The number b of data identifiers affects the efficiency of the integrity check phase. If b is small, the verifier has to compute more hash values to verify an integrity proof. The delegated POR scheme will have similar efficiency to the delegated PDP scheme. If b is large, the verifier has to do more scalar exponentiations to verify an integrity proof. Nevertheless, the delegated POR scheme will still be more efficient then the delegated PDP scheme. We suggest that a user choose
The delegated POR scheme satisfies the requirements of proof unforgeability, proof indistinguishability and delegation key unforgeability.
Proof unforgeability
The delegated POR scheme is proof unforgeable under the truncated 1-
Let
This proof is similar to the proof of Theorem 1, but it has two differences:
In the challenge phase, Integrity proof The reduced advantage is still
The delegated POR scheme is proof indistinguishable under the truncated decisional 1-
Let
This proof is similar to the proof of Theorem 2, but it has two differences:
The integrity challenge is in the form of The integrity proof is in the form of The reduced advantage is still
The delegated POR scheme is delegation key unforgeable under the
Let
This proof is the same as the proof of Theorem 3. □
The issues of curve suggestion, random sampling and dynamic data are discussed in this section.
Elliptic curve and block size
Suppose that element size
Elliptic curve suggestion
Elliptic curve suggestion
Notes: A type D curve is used for 256-bit block size, and the discriminant of the type D curve is 104,826,427; type A curves are used for the other block sizes.
In the integrity check phase, a verifier checks all data blocks to detect errors. The storage server has to access all data blocks to generate an integrity proof. The verifier has to compute all hash values of tag indexes to verify the integrity proof. The computation complexity is
The random sampling technique is a trade-off between efficiency and efficacy. Assume that there are n data blocks. Let α be the proportion of corrupted blocks, where

False negative rate of checking t-out-of-n data blocks. α is the proportion of corrupted blocks.
A verifier has to check a large number of data blocks when α is very small. For example, the verifier has to check
Given the number n of data blocks and the proportion β of error correcting, an
A user will update his data in many situations, and he has to update the integrity tags for the updated data. When the user inserts or deletes a data block, he has to maintain the tag indexes. When the user modifies a data block, he has to update the version number. Many integrity check schemes [8,16,17,24–26,30,31,38,45,46,52,53,57] use dynamic data structures to reduce the computation cost and the storage cost. The delegated PDP/POR scheme can use either the Merkle hash tree [36] or the authenticated skip list [27] to support dynamic data. We adapt the delegated PDP scheme for dynamic data via the Merkle hash tree as follows:
To reduce the computation cost of updating tag indexes, integrity tags do not include tag indexes anymore. User To assure integrity of the tags, a Merkle hash tree is built for each category. Let To check integrity of data, a verifier has to make sure that an integrity proof is generated with correct integrity tags. To check file
To prevent re-delegation, delegation keys have to be protected from leaks. If
To modify data, integrity tags and Merkle hash trees have to be updated accordingly. To insert file
and verifies
and verifies
Conclusion
Delegated integrity check is an efficient solution for a user to control the subject who can check his data. Delegated integrity check provides effective management of verifiers. A user can delegate a verifier to check his data and revoke the verifier later. The verifier cannot re-delegate his integrity check capability to other people. Delegated integrity check supports dynamic group of verifiers. A user can delegate a verifier to check his data. Delegated integrity check achieves large-scale delegation. The delegation process is lightweight and independent of the data size. The integrity check process is efficient.
This paper proposes the delegated integrity check model. This paper provides two delegated integrity check schemes for hierarchical data. The first scheme assures data possession of a storage server. The second scheme assures data retrievability from a storage server. The related issues of the delegated integrity check model are discussed.
Footnotes
Acknowledgments
This research is supported by parts of MOST projects MOST-101-2221-E-009-074-MY3, MOST-101-2219-E-009-016 and MOST-102-2219-E-009-010, Taiwan.
