Group ORAM for privacy and access control in outsourced personal records

Abstract

Cloud storage has rapidly become a cornerstone of many IT infrastructures, constituting a seamless solution for the backup, synchronization, and sharing of large amounts of data. Putting user data in the direct control of cloud service providers, however, raises security and privacy concerns related to the integrity of outsourced data, the accidental or intentional leakage of sensitive information, the profiling of user activities and so on. Furthermore, even if the cloud provider is trusted, users having access to outsourced files might be malicious and misbehave. These concerns are particularly serious in sensitive applications like personal health records and credit score systems.

To tackle this problem, we present $Π_{GORAM}$ , a definitional framework for Group Oblivious RAM, in which we formalize several security and privacy properties such as secrecy, integrity, anonymity, and obliviousness. $Π_{GORAM}$ allows per entry access control, as selected by the data owner. $Π_{GORAM}$ is the first framework to define such a wide range of security and privacy properties for outsourced storage. Regarding obliviousness, we tackle two different attacker models: our first definition protects against an honest-but-curious server while our second definition protects against such a server colluding with malicious clients.

In the latter model, we prove a server-side computational lower bound of $Ω (n)$ where n is the number of entries in the database, i.e., every operation requires processing a constant fraction of the database. Furthermore, we present two constructions: a purely cryptographic instantiation, which achieves an $O (\sqrt{n})$ amortized communication and computation complexity, and a construction based on a trusted proxy with logarithmic communication and server-side computational complexity. The second construction bypasses the previously established lower bound by leveraging a trusted party. Both schemes achieve secrecy, integrity, and obliviousness with respect to a server colluding with malicious clients, but not anonymity due to the deployed access control mechanism.

In the former model, we present a cryptographic system that achieves secrecy, integrity, obliviousness, and anonymity. In the process of designing an efficient construction, we developed three new, generally applicable cryptographic schemes, namely, batched zero-knowledge proof of shuffle correctness, the hash-and-proof paradigm, which even improves upon the former, and an accountability technique based on chameleon signatures, which we consider of independent interest.

We implemented our constructions in Amazon Elastic Compute Cloud (EC2) and ran a performance evaluation demonstrating the scalability and efficiency of our construction.

Keywords

Group ORAM, oblivious RAM, cloud storage, provable security, privacy-enhancing technologies

1. Introduction

Cloud storage has rapidly gained a central role in the digital society, serving as a building block of consumer-oriented applications (e.g., Dropbox, Microsoft SkyDrive, and Google Drive) as well as particularly sensitive IT infrastructures, such as personal record management systems. For instance, credit score systems rely on credit bureaus (e.g., Experian, Equifax, and TransUnion in the US) collecting and storing information about the financial status of users, which is then made available upon request. As a further example, personal health records (PHRs) are increasingly managed and accessed through web services (e.g., private products like Microsoft HealthVault and PatientsLikeMe in the US and national services like ELGA in Austria), since this makes PHRs readily accessible in case of emergency even without the physical presence of the e-health card and eases their synchronization across different hospitals.

Despite its convenience and popularity, cloud storage poses a number of security and privacy issues. The first problem is related to the secrecy of user data, which are often sensitive (e.g., PHRs give a complete picture of the health status of citizens) and, thus, should be concealed from the server. A crucial point to stress is that preventing the server from reading user data (e.g., through encryption) is necessary but not sufficient to protect the privacy of user data. Indeed, as shown in the literature [55,76], the capability to link consecutive accesses to the same file can be exploited by the server to learn sensitive information: for instance, it has been shown that the access patterns to a DNA sequence allow for determining the patient’s disease. Hence the obliviousness of data accesses is another fundamental property for sensitive IT infrastructures: the server should not be able to tell whether two consecutive accesses concern the same data or not, nor to determine the nature of such accesses (read or write). Furthermore, the server has in principle the possibility to modify client’s data, which can be harmful for several reasons: for instance, it could drop data to save storage space or modify data to influence the statistics about the dataset (e.g., in order to justify higher insurance fees or taxes). Therefore another property that should be guaranteed is the integrity of user data.

Finally, it is often necessary to share outsourced documents with other clients, yet in a controlled manner, i.e., selectively granting them read and write permissions: for instance, PHRs are selectively shared with the doctor before a medical treatment and a prescription is shared with the pharmacy in order to buy a medicine. Data sharing complicates the enforcement of secrecy and integrity properties, which have to be guaranteed not only against a malicious server but also against malicious clients. Notice that the simultaneous enforcement of these properties is particularly challenging, since some of them are in seeming contradiction. For instance, access control seems to be incompatible with the obliviousness property: if the server is not supposed to learn which file the client is accessing, how can he check that the client has the rights to do so?

1.1. Our contributions

In this work, we present $GORAM$ , a novel framework for privacy-preserving cloud-storage. Users can share outsourced data with other clients, selectively granting them read and write permissions, and verify the integrity of such data. These are hidden from the server and access patterns are oblivious. $GORAM$ is the first system to achieve such a wide range of security and privacy properties for storage outsourcing. More specifically, the contributions of this work are the following:

We formalize the problem statement by introducing the notion of Group Oblivious RAM (GORAM) in Section 2. GORAM extends the concept of Oblivious RAM [47] (ORAM)¹ by considering multiple, possibly malicious clients, with read and/or write access to outsourced data, as opposed to a single client. We propose a formal security model that covers a variety of security and privacy properties, such as data integrity, data secrecy, obliviousness of access patterns, and anonymity. For obliviousness, we consider two variants: first, we define obliviousness against malicious clients. Intuitively, none should be able to determine which entry is read by which client. However, write operations are oblivious only with respect to the server and to those clients who cannot read the modified entry, since clients with read access can obviously notice that the entry has changed. Second, we define obliviousness only against a malicious server that does not collude with clients. We find this slightly weaker obliviousness definition interesting, in particular, because it allows for significantly more efficient cryptographic constructions.

¹ ORAM is a technique originally devised to protect the access pattern of software on the local memory and then used to hide the data and the user’s access pattern in storage outsourcing services.

In the obliviousness against malicious clients setting, we establish an insightful computational lower bound (Section 3): if clients have direct access to the database, the number of operations on the server side has to be linear in the database size. Intuitively, the reason is that if a client does not want to access all entries in a read operation, then it must know where the required entry is located in the database. Since malicious clients can share this information with the server, the server can determine for each read operation performed by an honest client, which among the entries the adversary has access to might be the subject of the read, and which certainly not.

We present $PIR$ - $GORAM$ , the first cryptographic construction that ensures the obliviousness of data accesses against malicious clients as well as access control (Section 4). Our construction relies on Private Information Retrieval (PIR) [24] to achieve obliviousness, uses a new accumulation technique based on an oblivious gossiping protocol to reduce the communication in an amortized fashion, and combines public-key cryptography and zero-knowledge proofs for access control.

To bypass the aforementioned lower bound, we consider the recently proposed proxy-based setting [13,66,79,87,93], which assumes the presence of a trusted party mediating the accesses between clients and server. We show, in particular, that a simple variant of TaoStore [79] guarantees obliviousness in the malicious setting as well as access control (Section 5).

We then move to the setting in which obliviousness holds only against a malicious server who does not collude with clients. We first introduce a cryptographic instantiation based on a novel combination of ORAM [90], predicate encryption [58], and zero-knowledge (ZK) proofs (of shuffle correctness) [10,51] (Section 6). This construction is secure, but building on off-the-shelf cryptographic primitives is not practical. In particular, clients prove to the server that the operations performed on the database are correct through ZK proofs of shuffle correctness, which are expensive when the entries to be shuffled are tuples of data, as opposed to single entries.

As a first step towards a practical instantiation, we maintain the general design, but we replace the expensive ZK proofs of shuffle correctness with a new proof technique called batched ZK proofs of shuffle correctness (Section 6.5.1). A batched ZK proof of shuffle significantly reduces the number of ZK proofs by “batching” several instances and verifying them together. Since this technique is generically applicable in any setting where one is interested in performing a zero-knowledge proof of shuffle correctness over a list of entries, each of them consisting of a tuple of encrypted blocks, we believe that it is of independent interest. This second realization greatly outperforms the first solution and is suitable for databases with relatively small entries, accessed by a few users, but it does not scale to large entries and many users.

In a second step we present a novel technique based on universal pair-wise hash functions [19] that speeds up batched shuffle proofs even further (Section 6.5.2). As opposed to batched shuffle proofs, which are secure when repeated λ many times where λ is the security parameter, this construction computes a constant number of proofs. It is generally applicable and we show that it outperforms batched shuffle proofs by one order of magnitude.

To obtain a scalable solution, we explore some trade-offs between security and efficiency. First, we present a new accountability technique based on chameleon signatures (Section 7). The idea is to let clients perform arbitrary operations on the database, letting them verify each other’s operation a-posteriori and giving them the possibility to blame misbehaving parties. Secondly, we replace the relatively expensive predicate encryption, which enables sophisticated role-based and attribute-based access control policies, with the more efficient broadcast encryption, which suffices to enforce per-user read/write permissions, as required in the personal record management systems we consider (Section 8). This approach leads to a very efficient solution that scales to large files and thousands of users, with a combined communication-computation overhead of only 7% (resp. 8%) with respect to state-of-the-art, single-client ORAM constructions for reading (resp. writing) on a 1 GB storage with 1 MB block size (for larger datasets or block sizes, the overhead is even lower).

We have implemented our constructions and conducted a performance evaluation demonstrating the scalability and efficiency of our constructions (Section 10). Although Group ORAM is generically applicable, the large spectrum of security and privacy properties, as well as the efficiency and scalability of the system, make it particularly suitable for the management of large amounts of sensitive data, such as personal records. We exemplify that in a case study presented in Appendix A.

2. Definitional framework

We detail the problem statement by formalizing the concept of Group ORAM (Section 2.1), introducing the attacker model (Section 2.2), and presenting the security and privacy properties (Section 2.3).

2.1. Group ORAM

We consider a data owner $O$ outsourcing her database $DB = d_{1}, \dots, d_{m}$ to the server $S$ . A set of clients $C_{1}, \dots, C_{n}$ can access parts of the database, as specified by the access control policy set by $O$ . This is formalized as an n-by-m matrix $AC$ (where $| AC |_{r} = n$ and $| AC |_{c} = m$ denote the number of rows and columns, respectively), defining the permissions of the clients on the files in the database: $AC (i, j)$ (i.e., the jth entry of the ith row) denotes the access mode for client i on data $d_{j}$ . An entry in $AC$ can take one of the values ⊥ (no access), $R$ (read access), or $RW$ (read-write access).

At registration time, each client $C_{i}$ receives a capability ${cap}_{i}$ , which gives $C_{i}$ access to $DB$ as specified in the corresponding row of $AC$ . Furthermore, we assume the existence of a capability ${cap}_{O}$ , which grants permissions for all of the operations that can be executed by the data owner only.

In the following we formally characterize the notion of Group ORAM. Intuitively, a Group ORAM is a collection of two algorithms and four interactive protocols, used to set up the database, add clients, add an entry to the database, change the access permissions to an entry, read an entry, and overwrite an entry. In the sequel, we let $⟨ A, B ⟩$ denote a protocol between the ppt machines A and B, $| a |$ the length of the vector a of access modes, and $a (i)$ the element at position i in a. In all our protocols $| DB | = | AC |_{c}$ and we write $AC ∥_{r} a$ and $AC ∥_{c} a$ to denote row and column concatenation, respectively.

Definition 1 (Group ORAM).

A Group ORAM scheme is a tuple of (interactive) ppt algorithms $Π_{GORAM} = (gen, addCl, addE, chMode, read, write)$ , such that:

$({cap}_{O}, DB) \leftarrow gen (1^{λ}, n)$

on input a security parameter λ and an integer n indicating the maximum number of clients, this algorithm outputs $({cap}_{O}, DB)$ where $DB : = []$ is the database and ${cap}_{O}$ is the data owner’s capability. Additionally, the algorithm initializes the access control matrix $AC : = []$ , which is treated as a global variable hereafter.

$\{{cap}_{i}, deny\} \leftarrow addCl ({cap}_{O}, a)$

on input the data owner’s capability ${cap}_{O}$ and an access permission vector a, this algorithm checks whether $| a | = | AC |_{c}$ . In that case, $AC = AC ∥_{r} a$ and the algorithm outputs a client capability ${cap}_{i}$ that grants access permissions to $C_{i}$ according to a. Otherwise, it outputs $deny$ .

$\{{DB}^{'}, deny\} \leftarrow ⟨ C_{addE} ({cap}_{O}, a, d), S_{addE} (DB) ⟩$

on input the data owner’s capability ${cap}_{O}$ , an access permission vector a, and a data d, this protocol checks whether $| a | = | AC |_{r}$ . In that case, $AC = AC ∥_{c} a$ and the algorithm outputs a database ${DB}^{'}$ , which is equal to $DB$ augmented by d, granting clients access permissions according to a. Otherwise, it outputs $deny$ .

$⟨ C_{chMode} ({cap}_{O}, a, j), S_{chMode} (DB) ⟩$

on input the data owner’s capability ${cap}_{O}$ , an access permission vector a, and an index j, this protocol changes the access permissions for the jth entry as specified by a. If $j ⩽ | DB |$ and $| a | = | AC |_{r}$ , then the jth column of $AC$ is replaced by a.

$\{d, deny\} \leftarrow ⟨ C_{read} ({cap}_{i}, j), S_{read} (DB) ⟩$

on input a capability ${cap}_{i}$ and an index j, this protocol either outputs $d : = DB (j)$ or $deny$ if $| DB | < j$ or $AC (i, j) = ⊥$ .

$\{{DB}^{'}, deny\} \leftarrow ⟨ C_{write} ({cap}_{i}, j, d), S_{write} (DB) ⟩$

on input a capability ${cap}_{i}$ , an index j, and a data d, this protocol overwrites the jth entry of $DB$ with d. It succeeds and outputs ${DB}^{'}$ if and only if $AC (i, j) = RW$ , otherwise it outputs $deny$ .
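The interface of Definition 1 can be illustrated by a plaintext reference model that enforces only the checks on $AC$ and provides none of the cryptographic guarantees. The following Python sketch is such a model under stated assumptions: the class and method names are hypothetical, indices are 0-based, a capability is represented by the client's row index, the data owner's capability is implicit, and the refusal to register more than n clients is our reading of "maximum number of clients" rather than something the definition spells out.

```python
from enum import Enum


class Mode(Enum):
    NONE = "⊥"   # no access
    R = "R"      # read access
    RW = "RW"    # read-write access


DENY = "deny"


class IdealGroupORAM:
    """Plaintext model of Definition 1: access control only, no secrecy or obliviousness."""

    def __init__(self, max_clients):
        # gen: empty database DB and empty access control matrix AC
        self.max_clients = max_clients
        self.db = []   # entries d_1, ..., d_m (0-indexed here)
        self.ac = []   # one row per client, one column per entry

    def add_cl(self, a):
        # addCl: |a| must equal the number of columns of AC (= |DB|).
        # Assumption: registration beyond max_clients is refused.
        if len(a) != len(self.db) or len(self.ac) >= self.max_clients:
            return DENY
        self.ac.append(list(a))
        return len(self.ac) - 1   # stand-in for cap_i

    def add_e(self, a, d):
        # addE: |a| must equal the number of rows of AC.
        if len(a) != len(self.ac):
            return DENY
        self.db.append(d)
        for row, mode in zip(self.ac, a):
            row.append(mode)
        return self.db

    def ch_mode(self, a, j):
        # chMode: replace the j-th column of AC by a if the inputs are well-formed.
        if 0 <= j < len(self.db) and len(a) == len(self.ac):
            for row, mode in zip(self.ac, a):
                row[j] = mode

    def read(self, i, j):
        # read: deny if the index is out of range or client i has no access.
        if not 0 <= j < len(self.db) or self.ac[i][j] is Mode.NONE:
            return DENY
        return self.db[j]

    def write(self, i, j, d):
        # write: succeeds if and only if client i holds RW on entry j.
        if not 0 <= j < len(self.db) or self.ac[i][j] is not Mode.RW:
            return DENY
        self.db[j] = d
        return self.db
```

A real instantiation must realize the same input/output behaviour while hiding contents, permissions, and access patterns from the server; the model above only fixes what "correct" means for each operation.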

2.2. The attacker model

We consider an adversarial model in which the data owner $O$ is honest, the clients $C_{1}, \dots, C_{n}$ may be malicious, and the server $S$ is assumed to be honest-but-curious (HbC)² (and not to collude with clients). These assumptions are common in the literature (see, e.g., [3,23]) and are well justified in a cloud setting, since it is of paramount importance for service providers to keep a good reputation, which discourages them from visibly misbehaving, while they may have an incentive in passively gathering sensitive information given the commercial interest of personal data.

² I.e., the server is regarded as a passive adversary, following the protocol but seeking to gather additional information.

Although we could limit ourselves to reason about all security and privacy properties in this attacker model, we find it interesting to state and prove some of them even in a stronger attacker model, where the server can arbitrarily misbehave and collude with malicious clients. This allows us to characterize which properties unconditionally hold true in our different systems, i.e., even if the server gets compromised (cf. the discussion in the end of this section).

2.3. Security and privacy properties

We formalize the security and privacy requirements in the following as game-based definitions. Our framework models all of the properties in the presence of sequential accesses to the database. Security under concurrent executions and parallel accesses to the ORAM is an interesting avenue for future research.

Fig. 1.

Game for secrecy.

Secrecy. Intuitively, a Group ORAM preserves the secrecy of outsourced data if no party is able to deduce any information about the content of any entry she does not have access to. We formalize this intuition through a cryptographic game in the following definition, which is illustrated in Fig. 1.

Definition 2 (Secrecy).

A Group ORAM $Π_{GORAM}$ preserves secrecy, if $\Pr [{Exp}_{A, secrecy}^{Π_{GORAM}} (λ, b) = 1]$ is negligibly close to $1 / 2$ for every ppt adversary $A$ , where ${Exp}_{A, secrecy}^{Π_{GORAM}} (λ, b)$ is the following game:

Setup. The challenger runs $({cap}_{O}, DB) \leftarrow gen (1^{λ})$ , sets $AC : = []$ , and runs a black-box simulation of $A$ to which it hands over $DB$ .

Queries. The challenger provides $A$ with interactive interfaces $addCl$ , $addE$ , $chMode$ , $read$ , $write$ , and $corCl$ that $A$ may query adaptively and in any order. These interfaces are described below:

On input $addCl (a)$ by $A$ , the challenger executes $addCl ({cap}_{O}, a)$ locally and stores the capability ${cap}_{i}$ returned by the algorithm.

On input $addE (a, d)$ by $A$ , the challenger executes $⟨ C_{addE} ({cap}_{O}, a, d), S_{addE} (DB) ⟩$ in interaction with $A$ , where the former plays the role of the client while the latter plays the role of the server.

On input $chMode (a, j)$ by $A$ , the challenger executes $⟨ C_{chMode} ({cap}_{O}, a, j), S_{chMode} (DB) ⟩$ in interaction with $A$ .

On input $corCl (i)$ by $A$ , the challenger hands over the capability ${cap}_{i}$ related to the ith client in the access control matrix $AC$ .

On input $read (i, j)$ by $A$ , the challenger executes $⟨ C_{read} ({cap}_{i}, j), S_{read} (DB) ⟩$ in interaction with $A$ .

On input $write (i, j, d)$ by $A$ , the challenger executes $⟨ C_{write} ({cap}_{i}, j, d), S_{write} (DB) ⟩$ in interaction with $A$ .

Challenge. Finally, $A$ outputs $(j, (d_{0}, d_{1}))$ , where j is an index denoting the database entry on which $A$ wants to be challenged and $(d_{0}, d_{1})$ is a pair of entries such that $| d_{0} | = | d_{1} |$ . The challenger accepts the request only if $AC (i, j) = ⊥$ , for every i corrupted by $A$ in the query phase. Afterwards it invokes $⟨ C_{write} ({cap}_{O}, j, d_{b}), S_{write} (DB) ⟩$ in interaction with $A$ .

Output. In the output phase $A$ still has access to the interfaces except for $addCl$ on input a such that $a (j) \neq ⊥$ ; $corCl$ on input i such that $AC (i, j) \neq ⊥$ ; and $chMode$ on input $a, i$ with $a (i) \neq ⊥$ for some previously corrupted client i. Eventually, $A$ stops, outputting a bit $b^{'}$ . The challenger outputs 1 if and only if $b = b^{'}$ .
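The admissibility conditions of the secrecy game are simple predicates over $AC$ and the set of corrupted clients. The sketch below, with hypothetical function names and 0-based indices, spells out three of them: the challenge-phase check and the output-phase restrictions on $corCl$ and $addCl$. The restriction on $chMode$ is deliberately left out, and the matrix is represented as a list of rows purely for illustration.

```python
NONE, R, RW = "⊥", "R", "RW"


def challenge_allowed(ac, corrupted, j):
    """Challenge phase: accepted only if no corrupted client has any access to entry j."""
    return all(ac[i][j] == NONE for i in corrupted)


def corcl_allowed_after_challenge(ac, i, j):
    """Output phase: client i may be corrupted only if it has no access to entry j."""
    return ac[i][j] == NONE


def addcl_allowed_after_challenge(a, j):
    """Output phase: a new client may be added only if its vector grants no access to entry j."""
    return a[j] == NONE
```

Without these restrictions the adversary could trivially win by reading the challenge entry through a client it controls, so the checks exclude exactly the attacks that no scheme could prevent.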

Fig. 2.

Game for integrity.

Integrity. A Group ORAM preserves the integrity of its entries if none of the clients can modify an entry to which she does not have write permissions. The respective cryptographic game is depicted in Fig. 2 and we formalize it below.

Definition 3 (Integrity).

A Group ORAM $Π_{GORAM}$ preserves integrity, if $\Pr [{Exp}_{A, integrity}^{Π_{GORAM}} (λ) = 1]$ is negligible in λ for every ppt adversary $A$ , where ${Exp}_{A, integrity}^{Π_{GORAM}} (λ)$ is the following game:

Setup. The challenger runs $({cap}_{O}, DB) \leftarrow gen (1^{λ})$ , sets $AC : = []$ , and runs a black-box simulation of $A$ . Furthermore, the challenger initializes a second database ${DB}^{'} : = []$ which is managed locally.

Queries. The challenger provides $A$ with the same interfaces as in Definition 2, which $A$ may query adaptively and in any order. Since $DB$ is maintained on the challenger’s side, the queries to $addE$ , $chMode$ , $read$ , and $write$ are locally executed by the challenger, except for corrupted clients, for which the challenger executes $read$ and $write$ in collaboration with $A$ . Furthermore, the challenger updates ${DB}^{'}$ locally for every interface call that modifies the database.

Challenge. Finally, the adversary outputs an index $j^{*}$ which he wants to be challenged on. If there exists a capability ${cap}_{i}$ provided to $A$ with $AC (i, j^{*}) = RW$ , the challenger aborts. Otherwise it runs $d^{*} \leftarrow ⟨ C_{read} ({cap}_{O}, j^{*}), S_{read} (DB) ⟩$ locally.

Output. It outputs 1 if and only if $d^{*} \neq {DB}^{'} (j^{*})$ .
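The role of the second database ${DB}^{'}$ can be made concrete with a small sketch. The Python code below is an illustration under our own assumptions, not the challenger of the formal game: `ShadowDB` and `integrity_outcome` are hypothetical names, indices are 0-based, and ${DB}^{'}$ is modelled as recording a write only when the writing client holds $RW$ on the entry.

```python
RW = "RW"


class ShadowDB:
    """Challenger-side copy DB' tracking what the database should contain."""

    def __init__(self):
        self.entries = []

    def on_add_e(self, d):
        # addE always extends the database by one entry.
        self.entries.append(d)

    def on_write(self, ac, i, j, d):
        # Record a write only if client i holds RW on entry j.
        if ac[i][j] == RW:
            self.entries[j] = d


def integrity_outcome(ac, corrupted, shadow, j_star, d_star):
    """Return None on abort, 1 if the adversary wins, 0 otherwise."""
    # Abort if some corrupted client may legitimately write entry j*.
    if any(ac[i][j_star] == RW for i in corrupted):
        return None
    # The adversary wins if the value read by the owner deviates from DB'.
    return 1 if d_star != shadow.entries[j_star] else 0
```

The abort condition mirrors the challenge phase: an entry that a corrupted client is allowed to overwrite cannot serve as evidence of an integrity violation.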

Fig. 3.

Game for tamper-resistance.

Tamper-resistance. Intuitively, a Group ORAM is tamper-resistant if the server, even colluding with a subset of malicious clients, is not able to convince an honest client about the integrity of some maliciously modified data. Notice that this property refers to a strong adversarial model, where the adversary may arbitrarily misbehave and collude with clients. Naturally, tamper-resistance holds true only for entries which none of the corrupted clients has ever had access to. The respective cryptographic game is depicted in Fig. 3.

Definition 4 (Tamper resistance).

A Group ORAM $Π_{GORAM}$ is tamper-resistant, if for all ppt adversaries $A$ , $\Pr [{Exp}_{A, tam - res}^{Π_{GORAM}} (λ) = 1]$ is negligible in λ where ${Exp}_{A, tam - res}^{Π_{GORAM}} (λ)$ is the following game:

Setup. The challenger runs the Setup phase as in Definition 2. Furthermore, it forwards $DB$ to $A$ and initializes a second database ${DB}^{'}$ which is managed locally.

Queries. The challenger provides $A$ with the same interfaces as in Definition 2, which $A$ may query adaptively and in any order. Furthermore, it updates ${DB}^{'}$ locally for every interface call that modifies the database.

Challenge. Finally, the adversary outputs an index $j^{*}$ which he wants to be challenged on. If there exists a capability ${cap}_{i}$ that has ever been provided to $A$ such that $AC (i, j^{*}) = RW$ , then the challenger aborts. The challenger runs $d^{*} \leftarrow ⟨ C_{read} ({cap}_{O}, j^{*}), S_{read} (DB) ⟩$ in interaction with $A$ .

Output. It outputs 1 if and only if $d^{*} \neq {DB}^{'} (j^{*})$ .

Fig. 4.

Game for obliviousness.

Obliviousness. Intuitively, a Group ORAM is oblivious if the server cannot distinguish between two arbitrary query sequences which contain $read$ and $write$ operations. The cryptographic game is defined below and illustrated in Fig. 4.

Definition 5 (Obliviousness).

A Group ORAM $Π_{GORAM}$ is oblivious, if $\Pr [{Exp}_{A, obliv}^{Π_{GORAM}} (λ, b) = 1]$ is negligibly close to $1 / 2$ for every ppt adversary $A$ , where ${Exp}_{A, obliv}^{Π_{GORAM}} (λ, b)$ is the following game:

Setup. The challenger runs $({cap}_{O}, DB) \leftarrow gen (1^{λ})$ as in Definition 2 and it forwards $DB$ to $A$ .

Queries. The challenger provides $A$ with the same interfaces as in Definition 2 except $corCl$ , which $A$ may query adaptively and in any order. Additionally, $A$ is provided with the following interface:

On input $query (\{(i_{0}, j_{0}), (i_{0}, j_{0}, d_{0})\}, \{(i_{1}, j_{1}), (i_{1}, j_{1}, d_{1})\})$ by $A$ , the challenger checks whether $j_{0} ⩽ | DB |$ , $j_{1} ⩽ | DB |$ , and $i_{0}, i_{1}$ are valid clients. Furthermore, it checks that the operations requested by $A$ are allowed by $AC$ . If not it aborts. Otherwise it executes $⟨ C_{read} ({cap}_{i_{b}}, j_{b}), S_{read} (DB) ⟩$ or $⟨ C_{write} ({cap}_{i_{b}}, j_{b}, d_{b}), S_{write} (DB) ⟩$ depending on the input, in interaction with $A$ . Here the challenger plays the role of the client and $A$ plays the role of the server.

Output. Finally, $A$ outputs a bit $b^{'}$ . The challenger outputs 1 if and only if $b = b^{'}$ .

Fig. 5.

Game for obliviousness against malicious clients.

Obliviousness against malicious clients. Intuitively, a Group ORAM is oblivious against malicious clients if the server and an arbitrary subset of clients cannot get any information about the access patterns of honest clients, other than what is trivially leaked by the entries that the corrupted clients have read access to. The following definition, graphically supported by Fig. 5, is an extension of the previous one which allows the adversary to corrupt arbitrary clients. However, in order to avoid trivial attacks, we restrict the queries of the adversary to the $write$ oracle to indices that the set of corrupted clients cannot read.

Definition 6 (Obliviousness against Malicious Clients).

A Group ORAM $Π_{GORAM}$ is oblivious against malicious clients, if $\Pr [{Exp}_{A, m - obliv}^{Π_{GORAM}} (λ, b) = 1]$ is negligibly close to $1 / 2$ for all ppt adversaries $A$ , where ${Exp}_{A, m - obliv}^{Π_{GORAM}} (λ, b)$ is the following game:

Setup. The challenger runs $({cap}_{O}, DB) \leftarrow gen (1^{λ})$ as in Definition 2 and it forwards $DB$ to $A$ .

Queries. The challenger provides $A$ with the same interfaces as in Definition 2, which $A$ may query adaptively and in any order. Furthermore, $A$ is provided with the following additional interface:

On input $query (\{(i_{0}, j_{0}), (i_{0}, j_{0}, d_{0})\}, \{(i_{1}, j_{1}), (i_{1}, j_{1}, d_{1})\})$ by $A$ , the challenger first checks, in case $d_{0} \neq ⊥$ or $d_{1} \neq ⊥$ , whether there exists a corrupted client i for which $AC (i, j_{0}) \neq ⊥$ or $AC (i, j_{1}) \neq ⊥$ ; in this case, the challenger aborts. It also checks whether $j_{0} ⩽ | DB |$ , $j_{1} ⩽ | DB |$ , and $i_{0}, i_{1}$ are valid clients. Furthermore, it checks that the operations requested by $A$ are allowed by $AC$ . If not, it aborts. Otherwise it executes $⟨ C_{read} ({cap}_{i_{b}}, j_{b}), S_{read} (DB) ⟩$ or $⟨ C_{write} ({cap}_{i_{b}}, j_{b}, d_{b}), S_{write} (DB) ⟩$ depending on the input, in interaction with $A$ . Here the challenger plays the role of the client and $A$ plays the role of the server. In case $d_{0} \neq ⊥$ or $d_{1} \neq ⊥$ , from this moment on, the queries of $A$ to the interface $chMode$ on any corrupted i and $j_{0}$ or $j_{1}$ as well as calls to the $corCl$ interface on i with $AC (i, j_{0}) \neq ⊥$ or $AC (i, j_{1}) \neq ⊥$ are forbidden.

Output. Finally, $A$ outputs a bit $b^{'}$ . The challenger outputs 1 if and only if $b = b^{'}$ .
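The checks performed by the $query$ interface can be summarized in a few lines of code. The following Python sketch uses hypothetical names and conventions of our own: an operation is a triple `(i, j, d)` with 0-based indices, `d is None` encodes a read (i.e., $d = ⊥$), and $AC$ is a list of rows. With an empty set of corrupted clients the same predicate covers the query check of Definition 5. For simplicity the bounds and permission checks are evaluated before the collusion check; the order does not affect whether the challenger aborts.

```python
NONE, R, RW = "⊥", "R", "RW"


def permitted(ac, op):
    """Check that op = (i, j, d) names a valid client and entry and is allowed by AC."""
    i, j, d = op
    if not 0 <= i < len(ac):
        return False
    if not 0 <= j < len(ac[i]):
        return False
    # A read needs R or RW; a write needs RW.
    needed = (R, RW) if d is None else (RW,)
    return ac[i][j] in needed


def query_admissible(ac, corrupted, op0, op1):
    """Return True if the challenger would execute the query, False if it aborts."""
    if not (permitted(ac, op0) and permitted(ac, op1)):
        return False
    involves_write = op0[2] is not None or op1[2] is not None
    if involves_write:
        # No corrupted client may have access to either target entry.
        for i in corrupted:
            if ac[i][op0[1]] != NONE or ac[i][op1[1]] != NONE:
                return False
    return True


def challenge_op(b, op0, op1):
    """Left-or-right selection: the operation actually executed for challenge bit b."""
    return op1 if b else op0
```

The write restriction reflects the trivial attack mentioned above: a corrupted client with read access to a written entry could simply read it afterwards and observe the change.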

Fig. 6.

Game for anonymity.

Anonymity. A Group ORAM preserves anonymity if the data owner cannot efficiently link a given operation to a client, among the set of clients having access to the queried index. We formalize anonymity as a cryptographic game in the following definition, depicted in Fig. 6.

Definition 7 (Anonymity).

A Group ORAM $Π_{GORAM}$ preserves anonymity, if $\Pr [{Exp}_{A, anonymity}^{Π_{GORAM}} (λ, b) = 1]$ is negligibly close to $1 / 2$ for every ppt adversary $A$ , where ${Exp}_{A, anonymity}^{Π_{GORAM}} (λ, b)$ is the following game:

Setup. The challenger runs $({cap}_{O}, DB) \leftarrow gen (1^{λ})$ and it forwards ${cap}_{O}$ and $DB$ to $A$ .

Queries. The challenger provides $A$ with $read$ and $write$ interfaces that $A$ may query adaptively and in any order. The interfaces are described below:

On input $read ({cap}_{i}, j)$ by $A$ , the challenger executes $⟨ C_{read} ({cap}_{i}, j), S_{read} (DB) ⟩$ in interaction with $A$ , where the former plays the role of the server and the latter plays the role of the client.

On input $write ({cap}_{i}, j, d)$ by $A$ , the challenger executes $⟨ C_{write} ({cap}_{i}, j, d), S_{write} (DB) ⟩$ in interaction with $A$ , where the former plays the role of the server and the latter plays the role of the client.

Challenge. $A$ outputs $(({cap}_{i_{0}}, {cap}_{i_{1}}), \{j, (j, d)\})$ , where $({cap}_{i_{0}}, {cap}_{i_{1}})$ is a pair of capabilities, j is an index denoting the database entry on which $A$ wishes to be challenged, and d is some data. The challenger checks whether $AC (i_{0}, j) = AC (i_{1}, j)$ : if not, then it aborts, otherwise it executes $⟨ C_{read} ({cap}_{i_{b}}, j), S_{read} (DB) ⟩$ or $⟨ C_{write} ({cap}_{i_{b}}, j, d), S_{write} (DB) ⟩$ in interaction with $A$ .

Output. Finally, $A$ outputs a bit $b^{'}$ . The challenger outputs 1 if and only if $b = b^{'}$ .

Accountable integrity. Intuitively, a Group ORAM preserves accountable integrity, if every client who modifies an entry without holding write permissions on that entry will be detected by an honest client. Clients detected of misbehavior are blamed and there is evidence in form of audit logs that can be used to hold them accountable. The blaming of clients must be correct in a natural way: (1) honest parties are never blamed and (2) in case of misbehavior by some party, at least one dishonest party is blamed. The literature defines this notion of accountability as fairness (cf. 1) and completeness (cf. 2) [61]. We define the accountable integrity property through a cryptographic game, illustrated in Fig. 7.

Definition 8 (Accountable integrity).

A Group ORAM $Π_{GORAM}$ achieves accountable integrity, if $\Pr [{Exp}_{A, acc - int}^{Π_{GORAM}} (λ) = 1]$ is negligible in λ for every ppt adversary $A$ , where ${Exp}_{A, acc - int}^{Π_{GORAM}} (λ)$ is the following game:

Fig. 7.

Game for accountable integrity.

Setup. The challenger runs the Setup phase as in Definition 3.

Queries. The challenger runs the Query phase as in Definition 3.

Challenge. Finally, the adversary outputs an index $j^{*}$ which he wants to be challenged on. If there exists a capability ${cap}_{i}$ provided to $A$ such that $AC (i, j^{*}) = RW$ , then the challenger aborts. The challenger runs $d^{*} \leftarrow ⟨ C_{read} ({cap}_{O}, j^{*}), S_{read} (DB) ⟩$ and $L \leftarrow blame ({cap}_{O}, Log, j^{*})$ locally.

Output. It outputs 1 if and only if $d^{*} \neq {DB}^{'} (j^{*})$ and either $\exists i \in L$ that has not been queried by $A$ to the interface $corCl (\cdot)$ or $L = []$ .
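The winning condition of the game can be read as a predicate over the challenger's view. The following is a minimal sketch, not part of the formal definition. The names `read_value`, `expected_value`, `blamed`, and `corrupted` are hypothetical and stand for $d^{*}$, ${DB}^{'} (j^{*})$, $L$, and the set of clients queried to $corCl (\cdot)$. We assume the reading "the entry was tampered with, and the blame list is either empty or contains an honest client".

```python
def adversary_wins(read_value, expected_value, blamed, corrupted):
    """Return True iff the accountable-integrity game would output 1.

    read_value     -- d*, what the owner reads at index j*
    expected_value -- DB'(j*), the value held by the challenger's reference copy
    blamed         -- L, list of client ids returned by blame(...)
    corrupted      -- set of client ids the adversary queried to corCl
    """
    tampered = read_value != expected_value
    nobody_blamed = len(blamed) == 0                         # completeness fails
    honest_blamed = any(i not in corrupted for i in blamed)  # fairness fails
    return tampered and (nobody_blamed or honest_blamed)
```

Under this reading, an adversary that tampers with an entry loses the game exactly when the blame list is non-empty and contains only corrupted clients.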

Table 1

Security and privacy properties together with their minimal assumptions

Property	Server	Collusion
Secrecy	Malicious	✓
Integrity	HbC	✗
Accountable Integrity	HbC	✗
Tamper-resistance	Malicious	✓
Obliviousness	Malicious	✗
Obliviousness	Malicious	✓
Anonymity	Malicious	✓

Discussion. Table 1 summarizes the security and privacy properties presented in this section, along with the corresponding assumptions. The HbC assumption is in fact only needed for integrity, since the correctness of client operations is checked by the server, thus avoiding costly operations on the client side. We will see in Section 7 that the HbC assumption is still needed for the accountable integrity property, since the server maintains a log of accesses, which allows for blaming misbehaving parties. Secrecy, tamper-resistance, and anonymity hold true even if the server is malicious and colludes with clients. Furthermore, we treat two variants of obliviousness, which differ in the non-collusion assumption: the first variant holds only against a malicious server that does not collude with clients, while the second variant allows for collusion. The rest of the paper is organized with respect to these two variants: we first investigate the collusion-enabled variant in Section 3, Section 4, and Section 5. Then we focus on the non-collusion variant in Section 6, Section 7, and Section 8. This organization is motivated by the improvement in efficiency that we gain with each further construction.

3. Computational lower bound

In this section, we study how much computational effort is necessary to realize a Group ORAM where obliviousness should hold against a server colluding with malicious clients. Our result shows that any construction, regardless of the underlying computational assumptions, must access the entire memory (up to a constant factor) in every operation. Our lower bound can be seen as a generalization of the result on history independence of Roche et al. [78], in the sense that they consider a “catastrophic attack” where the complete state of the client is leaked to the adversary, whereas we allow only the corruption of a certain subset of clients. Note that, while the bound in [78] concerns the communication complexity, our result only bounds the computation complexity on the server side.

3.1. Formal result

In the following we state a formal lower bound on the computational complexity of any ORAM secure against malicious clients. The proof is postponed to Appendix D. By the physical addresses of a database we denote the memory addresses associated with each storage cell of the memory. Intuitively, the lower bound says that the server has to access a constant fraction of the entries of the dataset for every read and write operation.

Theorem 1.
Let n be the number of entries in the database and $Π_{GORAM}$ be a Group ORAM scheme. If $Π_{GORAM}$ accesses on average $o (n)$ physical addresses for each read and write operation (over the random coins of the read or write operation, respectively), $Π_{GORAM}$ is not oblivious against malicious clients (see Definition 6 ).

3.2. Discussion

Given the lower bound established in the previous section, we know that any Group ORAM scheme that is oblivious against malicious clients must read and write a fixed constant fraction of the database on every access. However, the bound does not impose any restriction on the required communication bandwidth. In fact, it does not exclude constructions with sublinear communication complexity, where the server performs a significant amount of computation. In particular, the aforementioned lower bound calls for the deployment of private information retrieval (PIR) [24] technologies, which allow a client to read an entry from a database without the server learning which entry has been read.

The problem of private database modification is harder. A naïve approach would be to let the client change each entry in the database $DB$ upon every access, which is however too expensive. Homomorphic encryption might be a natural candidate to outsource the computation to the server and to reduce the required bandwidth: unfortunately, Ostrovsky and Skeith III [73] showed that no private database modification (or PIR writing) scheme with sublinear communication (in the worst case) can be implemented using algebraic cryptographic constructions, such as linearly homomorphic encryption schemes. This result does not apply to schemes based on fully-homomorphic encryption, which is however hardly usable in practice due to the high computation cost associated with the currently known schemes.

The following sections describe our approach to bypass these two lower bounds. First, we show how to integrate non-algebraic techniques, specifically out-of-band communication among clients, so as to achieve sublinear amortized communication complexity (Section 4). Second, we show how to leverage a trusted proxy performing the access to the server on behalf of clients in order to reach a logarithmic overhead in communication and server-side computation, with constant client-side computation (Section 5).

4. PIR-GORAM

In this section, we present $PIR$ - $GORAM$ , a Group ORAM scheme based on PIR. Our construction is inspired by Franz et al. [42], who proposed to augment the database with a stack of modified entries, which is periodically flushed into the database by the data owner. In our construction, we let each client $C_{i}$ maintain its own temporary stack of entries $S_{i}$ that is stored on the server side in addition to the regular database $DB$ . These stacks contain recent changes to entries in $DB$ and to entries in other clients’ stacks, which are not yet propagated to $DB$ . In contrast to the approach by Franz et al. [42], clients themselves are responsible for flushing their stack once it is filled (i.e., after $| S_{i} |$ many operations), without requiring any intervention of the data owner. An oblivious gossiping protocol, which can be realized using standard techniques [36,59], allows clients to find the most up-to-date entry in the database, thereby obtaining a sublinear communication bandwidth even for write operations and thus bypassing the impossibility result by Ostrovsky and Skeith III [73].

More precisely, when operating on index j, the client performs a PIR read on $DB$ and on all stacks $S_{i}$ , which can easily be realized since all stacks are stored on the server. Thanks to the oblivious gossiping protocol, the client knows which index is the most current one. At this point, the client appends either a dummy entry (read) or a real entry (write) to its personal stack. If the stack is full, the client flushes it. Flushing means to apply all changes in the personal stack to the database. To be oblivious, the client has to ensure that all entries in $DB$ change. Moreover, for guaranteeing correctness, the client has to ensure that it does not overwrite entries which are more recent than those in its stack.

After explaining how to achieve obliviousness, we also need to discuss how to realize access control and how to protect the clients against the server. Data secrecy (i.e., read access control) is obtained via public-key encryption. Tamper-resistance (i.e., a-posteriori detection of illegal changes) is achieved by letting each client sign the modified entry so that others can check that this entry was produced by a client with write access. Data integrity (i.e., write access control) is achieved by further letting each client prove to the server that it is eligible to write the entry. As previously mentioned, data integrity is stronger than tamper-resistance, but assumes an honest-but-curious server: a malicious server may collude with malicious clients and thus store arbitrary information without checking integrity proofs.

4.1. Cryptographic preliminaries

$PIR$ - $GORAM$ relies on the following standard cryptographic primitives: public-key encryption, digital signatures, private information retrieval (PIR), and non-interactive zero-knowledge proofs of knowledge (NIZK). We summarize the notation in Table 2 while we postpone a detailed description to Appendix B. We require two different public-key cryptosystems: one that is IND-CPA secure [48], which also supports randomization and additively homomorphic operations, and one that is IND-CCA secure [11]. We index encryption and decryption keys with the respective acronym to make explicit which notion we use.

Table 2
Notation for cryptographic primitives

Primitive	Notation
Public-key encryption	$Π_{PKE}$
Key generation	$({ek}^{x}, {dk}^{x}) \leftarrow {Gen}_{PKE} (1^{λ})$ where $x \in \{CPA, CCA\}$
Encryption	$c \leftarrow E (ek, m)$
Decryption	$m \leftarrow D (dk, c)$
Randomization	$c^{'} \leftarrow Rnd (ek, c, r)$
Additive homomorphism	$D (dk, E (ek, m) \otimes E (ek, n)) = m + n$ , $D (dk, α \cdot E (ek, m)) = α m$
Digital signatures	$Π_{DS}$
Key generation	$(vk, sk) \leftarrow {Gen}_{DS} (1^{λ})$
Signing	$σ \leftarrow sign (sk, m)$
Verification	$\{⊤, ⊥\} \leftarrow vfy (vk, σ, m)$
Private information retrieval	$Π_{PIR}$
Query generation	$q \leftarrow prepRead (DB, i)$
Query execution	$r \leftarrow execRead (DB, q)$
Response decoding	$d \leftarrow decodeResp (r)$
NIZK	$P = PK \{(\vec{x}) : F (\vec{x}, \vec{y})\}$ , $\vec{x}$ hidden by P, $\vec{y}$ revealed by P
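To make the notation of Table 2 concrete, the following sketch instantiates the additively homomorphic scheme with a toy Paillier cryptosystem and builds the textbook linear-communication PIR on top of it. This is an assumption made for illustration: it is neither the PIR protocol used in our implementation nor secure (the primes are tiny), and the function names merely mirror $prepRead$, $execRead$, and $decodeResp$. Indices are 0-based in the sketch.

```python
import math
import random

# Toy Paillier parameters: tiny primes for illustration only, NOT secure.
P, Q = 1009, 1013
MODULUS = P * Q
MODULUS_SQ = MODULUS * MODULUS
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAM, -1, MODULUS)                          # private; generator is MODULUS + 1


def _rand_unit():
    """Random invertible element of Z_MODULUS."""
    while True:
        r = random.randrange(1, MODULUS)
        if math.gcd(r, MODULUS) == 1:
            return r


def enc(m):
    """E(ek, m) = (1 + MODULUS)^m * r^MODULUS mod MODULUS^2."""
    return pow(1 + MODULUS, m, MODULUS_SQ) * pow(_rand_unit(), MODULUS, MODULUS_SQ) % MODULUS_SQ


def dec(c):
    """D(dk, c), using the private values LAM and MU."""
    return (pow(c, LAM, MODULUS_SQ) - 1) // MODULUS * MU % MODULUS


def add(c1, c2):
    """E(m) (x) E(n): decrypts to m + n."""
    return c1 * c2 % MODULUS_SQ


def scale(alpha, c):
    """alpha . E(m): decrypts to alpha * m."""
    return pow(c, alpha, MODULUS_SQ)


def prep_read(db_len, i):
    """Client: encrypt the unit vector e_i, one ciphertext per entry."""
    return [enc(1 if k == i else 0) for k in range(db_len)]


def exec_read(db, query):
    """Server: homomorphically compute sum_k db[k] * q[k] without decrypting."""
    acc = enc(0)
    for value, q in zip(db, query):
        acc = add(acc, scale(value, q))
    return acc


def decode_resp(resp):
    """Client: decrypt the single response ciphertext."""
    return dec(resp)


db = [17, 42, 7, 99]
assert decode_resp(exec_read(db, prep_read(len(db), 2))) == 7
```

The server touches every entry of `db` while learning nothing about `i` beyond what the ciphertexts leak, which matches the shape of the lower bound in Section 3; practical PIR schemes compress the query so that communication becomes sublinear.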

4.2. System assumptions

Data structures. $DB$ stores up to N entries of size B each, hence the maximum capacity of $DB$ is $B N$ . The number of clients with access to $DB$ is at most M. We assume that the server $S$ has a storage capacity of $O (B N + \sum_{i = 1}^{M} B | S_{i} |)$ for the database and the client stacks; each $C_{i}$ has a storage capacity of $O (| S_{i} | B + N)$ to store the personal stack and a partial position map $posmap$ , which is used to find the most current version of each entry; finally, $O$ has a storage capacity of $O (N + B)$ to store the full position map and the access control matrix. While the position map is in general $O (N)$ , this is usually much less than the storage size of $O (N B)$ [13] and can also be decreased to $O (1)$ by storing it in recursive ORAMs [90]. The database $DB$ is accompanied by a private access control matrix $AC$ that lets $O$ manage the per-client permissions for each entry in $DB$ . The possible access rights are $R$ (read-only), $RW$ (read-write), and ⊥ (no access). Finally, we assume authenticated broadcast channels among clients so as to gossip position map updates using standard techniques [36,59].³

³ Gossiping is necessary since we do not trust the server for consistency.

Database layout. We represent the logical database $DB$ as a list of entries $DB = E_{1}, \dots, E_{N}$ and a list of stacks $S_{1}, \dots, S_{M}$ , one stack for every client. Both the database and the stacks are stored on the server. A stack is an entry list $S_{i} = E_{j + 1}, \dots, E_{j + | S_{i} |}$ where $j = \sum_{k = 1}^{i - 1} | S_{k} |$ . We denote by $S_{i} (ℓ)$ the ℓth entry of $S_{i}$ . We write $S_{1} ‖ \dots ‖ S_{M}$ to express the list of entries in all stacks. Similarly, we count from 1 to $\sum_{i = 1}^{M} | S_{i} |$ to index an entry in $S_{1} ‖ \dots ‖ S_{M}$ .
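The indexing convention for the concatenated stacks amounts to simple offset arithmetic. The sketch below is an illustration under our own naming (`stack_offsets`, `global_index`, and `locate` are hypothetical helpers); all positions are 1-based, as in the text.

```python
from itertools import accumulate


def stack_offsets(stack_lens):
    """offsets[i-1] is the sum of |S_k| for k < i, i.e. the j in S_i = E_{j+1}, ..."""
    return [0] + list(accumulate(stack_lens))[:-1]


def global_index(stack_lens, i, l):
    """1-based position of S_i(l) inside S_1 || ... || S_M."""
    if not (1 <= i <= len(stack_lens)) or not (1 <= l <= stack_lens[i - 1]):
        raise IndexError("no such stack slot")
    return stack_offsets(stack_lens)[i - 1] + l


def locate(stack_lens, g):
    """Inverse of global_index: map a 1-based global position to (i, l)."""
    offsets = stack_offsets(stack_lens)
    for i, (off, length) in enumerate(zip(offsets, stack_lens), start=1):
        if off < g <= off + length:
            return i, g - off
    raise IndexError("position outside all stacks")
```

For stacks of lengths 3, 2, and 4, the first slot of $S_{2}$ is position 4 of the concatenation, and the last slot of $S_{3}$ is position 9.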

Client capabilities. We assume that every client $C_{i}$ holds a key pair $({ek}_{i}^{CPA}, {dk}_{i}^{CPA})$ for a CPA-secure encryption scheme as well as a key pair $({ek}_{i}^{CCA}, {dk}_{i}^{CCA})$ for a CCA-secure encryption scheme where $({ek}_{i}^{CPA}, {ek}_{i}^{CCA})$ are publicly known and $({dk}_{i}^{CPA}, {dk}_{i}^{CCA})$ are $C_{i}$ ’s private keys. Moreover, each client stores a position map $posmap$ and version numbers $(vrs, {vrs}_{O})$ for every entry it holds permissions on. These version numbers are necessary to prevent roll-back attacks (intuitively, the former on data, the latter on the access control matrix) and they are broadcast together with new indices for an entry upon write operations or whenever the policy is changed. Finally, every client stores a mapping which maps stack positions to entry indices: we keep this mapping implicit.

Fig. 8.

The entry structure of an entry in the main database. If an entry resides on the stack, it contains only the part $c_{Data}$ .

Entry structure. An entry in the database $DB$ has the form $E = (c_{Data}, c_{BrCast}, c_{Auth}, σ_{O})$ where $c_{Data}$ , $c_{BrCast}$ , and $c_{Auth}$ are vectors of ciphertexts of length M and $σ_{O}$ is a signature of the data owner $O$ . We describe each component and its functionality in the following (see also Fig. 8).

$c_{Data}$ . The ciphertext $c_{Data}$ regulates read accesses. Specifically, it encrypts the data d of E for every client⁴ with at least $R$ permissions or a zero string for the others: $\begin{matrix} (1) & c_{Data}^{i} = \{\begin{matrix} E ({ek}_{i}^{CPA}, d ‖ vrs ‖ σ) & if AC (i, j) \neq ⊥ \\ E ({ek}_{i}^{CPA}, 0^{| d | + | vrs | + | σ |}) & otherwise \end{matrix} \end{matrix}$ where in addition to d, the ciphertext also contains the data version number $vrs$ as well as a signature σ such that $⊤ = Vrfy (vk, σ, d ‖ vrs)$ . By $vk$ we denote a verification key of a signature scheme which is different for every entry. An entry is valid if the verification of σ with $vk$ outputs ⊤ and $vk$ is the key authenticated by the data owner $O$ through $σ_{O}$ , and invalid otherwise.

⁴ This notation simplifies the presentation, but in the implementation we of course use hybrid encryption (cf. Section 10).

$c_{BrCast}$ . The ciphertext $c_{BrCast}$ is needed in the $Broadcast$ protocol (described below), which is used to obliviously propagate a new entry index and new version numbers to other clients with read access. Specifically, it encrypts either 1 for every client with at least $R$ permissions or zero for the others: $\begin{matrix} (2) & c_{BrCast}^{i} = \{\begin{matrix} E ({ek}_{i}^{CPA}, 1) & if AC (i, j) \neq ⊥ \\ E ({ek}_{i}^{CPA}, 0) & otherwise \end{matrix} \end{matrix}$

$c_{Auth}$ . This ciphertext contains the signing key corresponding to $vk$ for those clients with $RW$ permissions or the zero string for the others. The exact form is $\begin{matrix} (3) & c_{Auth}^{i} = \{\begin{matrix} E ({ek}_{i}^{CCA}, sk) & if AC (i, j) = RW \\ E ({ek}_{i}^{CCA}, 0^{| sk |}) & otherwise \end{matrix} \end{matrix}$

$σ_{O}$ . The signature $σ_{O}$ is created by $O$ on the entry index j, a version number ${vrs}_{O}$ , the verification key $vk$ , and $c_{BrCast}$ .

Note that one cannot store the signing key $sk$ in the entry $c_{Data}$ . The reason is that whenever an entry is updated, the client needs to update all entries in the vector. However, for all entries except for its own, it does not know the private decryption key ${dk}_{i}$ and thus, neither the corresponding private signing key nor the access rights for that entry. To update these entries, we exploit the homomorphic properties of the underlying encryption scheme, as explained below.
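The layout of the three ciphertext vectors can be summarized in a few lines. The sketch below is only meant to illustrate Equations (1)-(3): the class `Ct` is a stand-in that stores the plaintext in the clear instead of encrypting it, the function `build_entry` and its parameters are hypothetical names, and the owner signature $σ_{O}$ is omitted.

```python
from dataclasses import dataclass

R, RW = "R", "RW"  # access rights; a missing matrix entry plays the role of ⊥


@dataclass(frozen=True)
class Ct:
    """Stand-in ciphertext: records recipient and plaintext in the clear.

    A real scheme would hide the plaintext; this only illustrates the layout.
    """
    recipient: int
    plaintext: bytes


def build_entry(j, ac, payload, sk, num_clients):
    """Assemble (c_Data, c_BrCast, c_Auth) for entry j following Eqs. (1)-(3).

    ac      -- dict mapping (client, entry) to R or RW; absent means no access
    payload -- bytes standing for d || vrs || sigma
    sk      -- bytes standing for the per-entry signing key
    """
    c_data, c_brcast, c_auth = [], [], []
    for i in range(1, num_clients + 1):
        right = ac.get((i, j))
        can_read = right is not None
        # Eq. (1): payload for readers, zero string of equal length otherwise.
        c_data.append(Ct(i, payload if can_read else bytes(len(payload))))
        # Eq. (2): selector bit 1 for readers, 0 otherwise.
        c_brcast.append(Ct(i, b"\x01" if can_read else b"\x00"))
        # Eq. (3): signing key for writers, zero string otherwise.
        c_auth.append(Ct(i, sk if right == RW else bytes(len(sk))))
    return c_data, c_brcast, c_auth
```

All three vectors have one slot per client regardless of the access rights, so the size of an entry does not reveal how many clients may read or write it.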

Update. Entries residing on a client’s stack consist only of $c_{Data}$ in modified form where the old payload $D = d ‖ vrs ‖ σ$ has been replaced with $D^{'} = d^{'} ‖ {vrs}^{'} ‖ Sign (sk, d^{'} ‖ {vrs}^{'})$ . Indeed, leveraging the homomorphic property and the structure of $c_{BrCast}$ (note that $c_{BrCast}$ is like $c_{Data}$ , where D is replaced with 1) it is possible to generate $c_{Data}^{'}$ as follows: choose $r_{i}$ uniformly at random and compute $\begin{array}{l} (4) & c_{Data}^{i^{'}} = Rnd ({ek}_{i}^{CPA}, c_{BrCast}^{i} \cdot D^{'}, r_{i}) . \end{array}$
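The effect of Equation (4) can be checked with a toy additively homomorphic scheme. The sketch below again assumes a Paillier instance with tiny, insecure parameters, and, purely so that the result can be decrypted and inspected, uses a single key pair for all client slots (in the construction, slot i is encrypted under ${ek}_{i}^{CPA}$ and the writer cannot decrypt the other slots). The helper `update_component` is a hypothetical name.

```python
import math
import random

# Toy Paillier with tiny primes (illustration only, NOT secure).
P, Q = 1009, 1013
MODULUS = P * Q
MODULUS_SQ = MODULUS * MODULUS
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, MODULUS)


def rand_unit():
    while True:
        r = random.randrange(1, MODULUS)
        if math.gcd(r, MODULUS) == 1:
            return r


def enc(m):
    return pow(1 + MODULUS, m, MODULUS_SQ) * pow(rand_unit(), MODULUS, MODULUS_SQ) % MODULUS_SQ


def dec(c):
    return (pow(c, LAM, MODULUS_SQ) - 1) // MODULUS * MU % MODULUS


def rnd(c, r):
    """Rnd(ek, c, r): multiply by a fresh encryption of zero."""
    return c * pow(r, MODULUS, MODULUS_SQ) % MODULUS_SQ


def update_component(c_brcast_i, new_payload):
    """Eq. (4) for one slot: Rnd(ek_i, c_BrCast^i . D', r_i)."""
    return rnd(pow(c_brcast_i, new_payload, MODULUS_SQ), rand_unit())


# c_BrCast for three slots: slots 0 and 2 may read, slot 1 may not.
c_brcast = [enc(1), enc(0), enc(1)]
new_payload = 424242  # stands for D' encoded as an integer below MODULUS
c_data_new = [update_component(c, new_payload) for c in c_brcast]
assert [dec(c) for c in c_data_new] == [424242, 0, 424242]
```

The writer never learns which slots hold a 1: scaling an encryption of 1 yields an encryption of $D^{'}$, scaling an encryption of 0 yields an encryption of 0, and the rerandomization hides which case applied.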

Multiple data owners in one ORAM. The entry structure and database layout of $PIR$ - $GORAM$ can be easily extended in order to support multiple data owners storing their files in the same ORAM instance (think, e.g., of multiple patients storing their health record in the same ORAM), which is important to enhance user privacy (as the server does not even learn the owner of the accessed data). First, the signature $σ_{O}$ is constructed by the data owner to which the entry belongs. Most importantly, every entry might have a different set of potential readers and writers (e.g., not every patient visits the same doctors or pharmacies). As a consequence, an important invariant to maintain is that $c_{Data}$ , $c_{BrCast}$ , and $c_{Auth}$ are of equal length for every entry (i.e., the number of encryption keys used to construct them is the same), which can be easily achieved by padding. Otherwise, trivial entry-size based attacks against obliviousness are possible.

Obliviously broadcasting new indices. We propagate the updates of the entries to the clients with read access via broadcast. That is, if $E = (c_{Data}, c_{BrCast}, c_{Auth}, σ_{O})$ has the old index j in the database $DB$ and the new index ℓ on a stack or in $DB$ , then we broadcast a message to all clients that can only be decrypted by clients having access to that entry. To this end, we leverage the same idea as in Equation (4), that is, we embed the new index information into $c_{BrCast}$ . Clients compute $c_{BrCast}^{'}$ with $c_{BrCast}^{i^{'}} = Rnd ({ek}_{i}^{CPA}, c_{BrCast}^{i} \cdot (j ‖ ℓ ‖ {vrs}^{'}), r_{i})$ for some random values $r_{i}$ and a new version number ${vrs}^{'}$ , and broadcast $c_{BrCast}^{'}$ to all clients. We call this operation $Broadcast ((j ‖ ℓ ‖ {vrs}^{'}), c_{BrCast})$ .

Clients having read access update their position map as follows. Upon receiving such a message c, the client $C_{i}$ tries to decrypt the component corresponding to her identity with her private key ${dk}_{i}^{CPA}$ . If the result is $(j ‖ ℓ ‖ {vrs}^{'})$ and not 0 (which means that it has $R$ access at least), then $C_{i}$ updates its partial position map with the result. Otherwise it ignores the message. This protocol is oblivious since it is deterministically executed upon each operation and only clients with $R$ access (which, as previously discussed, are excluded by the definition of obliviousness) can extract knowledge from the received ciphertext thanks to the CPA-security of the underlying encryption scheme.
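The receiving side of the broadcast can be sketched as follows. The encoding of $j ‖ ℓ ‖ {vrs}^{'}$ as fixed-width integer fields, the field width `FIELD_BITS`, and the rule of ignoring updates whose version number is not strictly newer are assumptions made for this illustration; the function operates on the already decrypted component.

```python
FIELD_BITS = 16  # assumed fixed width per field


def pack(j, l, vrs):
    """Encode j || l || vrs' as one integer; j >= 1 guarantees a nonzero result."""
    limit = 2 ** FIELD_BITS
    assert 1 <= j < limit and 0 <= l < limit and 0 <= vrs < limit
    return (j << (2 * FIELD_BITS)) | (l << FIELD_BITS) | vrs


def unpack(x):
    """Split an encoded message back into (j, l, vrs')."""
    mask = 2 ** FIELD_BITS - 1
    return x >> (2 * FIELD_BITS), (x >> FIELD_BITS) & mask, x & mask


def on_broadcast(posmap, versions, decrypted_component):
    """Handle the plaintext of this client's slot of c'_BrCast.

    posmap   -- dict: database index j -> current location l
    versions -- dict: j -> highest version number seen so far
    Returns True if the position map was updated.
    """
    if decrypted_component == 0:      # no read access: ignore the message
        return False
    j, l, vrs = unpack(decrypted_component)
    if vrs <= versions.get(j, -1):    # stale or replayed update: ignore
        return False
    posmap[j], versions[j] = l, vrs
    return True
```

Every client performs the same decryption attempt on every broadcast, so the communication pattern does not depend on who holds read access to the affected entry.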

Since malicious clients could potentially send wrong gossip messages about entries, e.g., claiming that an entry is residing in a different place than it actually is, we require that clients upload their broadcast messages also onto an append-only log, e.g., residing on the cloud, which is accessible by everyone. If a client does not find an entry using the latest index information, due to the malicious behavior of another client, then it just looks up the previous index and tries it there, and so on. Such append-only logs can be realized securely both in centralized [52] and decentralized [28] fashion. Utilizing such logs also enables accountability since the client who announced a wrong index is identifiable and, hence, blamable.
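The fallback lookup on the append-only log can be sketched as a backward scan. In this illustration the log holds plaintext tuples rather than encrypted broadcast messages, and `find_entry`, `fetch_and_verify`, and `suspects` are hypothetical names; `fetch_and_verify` is assumed to return `None` when the announced location does not hold a valid copy of the entry.

```python
def find_entry(log, j, fetch_and_verify):
    """Walk the public log backwards until a valid copy of entry j is found.

    log              -- list of (sender, j, location) tuples in append order
    fetch_and_verify -- callable(location) -> entry, or None if missing/invalid
    Returns (entry, suspects), where suspects are the senders whose
    announcements for j did not lead to a valid entry.
    """
    suspects = []
    for sender, idx, location in reversed(log):
        if idx != j:
            continue
        entry = fetch_and_verify(location)
        if entry is not None:
            return entry, suspects
        suspects.append(sender)  # announced a location that does not hold j
    return None, suspects
```

For example, with `store = {("S2", 1): "v2"}` and a log in which client C3 later announces the location `("S3", 7)` for the same index, `find_entry(log, 4, store.get)` skips the bogus announcement, returns `"v2"`, and reports C3 as a suspect.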

4.3. Algorithmic description

Setup. The input to the setup algorithm is a list of data $d_{1}, \dots, d_{N}$ and a list of clients $C_{1}, \dots, C_{M}$ with an access control matrix $AC$ which has an entry for every entry-client pair. The data owner first generates her own signing key pair $({vk}_{O}, {sk}_{O}) \leftarrow {Gen}_{DS} (1^{λ})$ and generates two encryption key pairs $({ek}_{j}^{CPA}, {dk}_{j}^{CPA}) \leftarrow {Gen}_{PKE}^{CPA} (1^{λ})$ and $({ek}_{j}^{CCA}, {dk}_{j}^{CCA}) \leftarrow {Gen}_{PKE}^{CCA} (1^{λ})$ for every client $C_{j}$ . Second, the data owner prepares every entry separately as follows: she generates a fresh signing key pair $(vk, sk) \leftarrow {Gen}_{DS} (1^{λ})$ and sets up $c_{Data}$ as in Equation (1) using $d_{j}$ and a version number 0, attaching a signature $σ = Sign (sk, d_{j} ‖ 0)$ . $c_{BrCast}$ is generated as in Equation (2). Next, $c_{Auth}$ is generated as in Equation (3) using the just generated $sk$ . Finally, using a data owner version number 0, $O$ attaches $σ_{O} = Sign ({sk}_{O}, j ‖ 0 ‖ vk ‖ c_{BrCast})$ . $O$ uploads all entries to $S$ and broadcasts the client capabilities ${cap}_{i} = ({posmap}_{i}, \vec{ek}, {vk}_{O}, i_{S}, {len}_{S}, {dk}_{i}^{CPA}, {dk}_{i}^{CCA})$ where ${posmap}_{i}$ is the full position map $posmap$ restricted to those entries on which $C_{i}$ holds at least $R$ permissions, $\vec{ek}$ is a list of all clients’ encryption keys, $i_{S} = 0$ is $C_{i}$ ’s current stack pointer, and ${len}_{S}$ is the corresponding stack length. Notice that initially, $posmap$ is the identity mapping on the domain ${1, \dots, N}$ since all entries reside in the main database and the stacks are empty.

Algorithm 1

${d, deny} \leftarrow ⟨ C_{extData} ({cap}_{i}, j), S_{extData} (DB) ⟩$

Algorithm 2

$⟨ C_{repl} ({cap}_{i}, j, vrs, d^{'}, c_{BrCast}, c_{Auth}), S_{repl} (DB) ⟩$

Algorithm 3

$⟨ C_{addDummy} ({cap}_{i}, c_{BrCast}), S_{addDummy} (DB) ⟩$

Reading and writing. To read or write to the database, clients have to perform two steps: extracting the data (Algorithm 1) and appending an entry to the personal stack (Algorithm 2 for writing and Algorithm 3 for reading).

To extract the payload, the client performs two PIR queries: one on $DB$ for the desired index j and one on the concatenation of all stacks for either a more current version of j or an arbitrary one (lines 1.1–1.8): this is crucial to hide from the server the information on whether or not the client is retrieving a previously modified entry. It then checks the entry’s authenticity as provided by $σ_{O}$ and retrieves the verification key used for further verification (line 1.9). The client then extracts the overall payload (line 1.11) from the most current entry (either in $DB$ or on a stack (line 1.10)) and verifies its validity (line 1.12). Before returning the extracted data (line 1.17), the client flushes the personal stack if it is full (lines 1.13–1.16); we explain the flush algorithm below. We stress that data extraction is performed independently of whether the client reads or writes. Note that up to this point, since the server only sees PIR queries, it cannot distinguish a read from a write.

The next step (i.e., adding an entry to the stack), however, requires more care in order to retain obliviousness. In particular, when writing, the client appends an entry to its personal stack that replaces the actual entry in $DB$ (see Algorithm 2). In order to make read and write indistinguishable, when reading, the client appends an entry to its stack which is indistinguishable from a real entry since it is an entry on which no-one holds any permissions (see Algorithm 3). Finally, the client broadcasts the modified index information in $write$ or a zero string in $read$ .
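The claim that reads and writes look alike to the server can be illustrated with a toy model that records only the sequence of message types the server observes. The event names and the class `ToyClient` are hypothetical; the model abstracts away all cryptography and simply mirrors the order of steps described above.

```python
class ToyClient:
    """Records only what the server would observe, not the hidden contents."""

    def __init__(self, stack_len):
        self.stack_len = stack_len
        self.stack_ptr = 0
        self.trace = []

    def access(self, op, j, data=None):
        assert op in ("read", "write")
        # Data extraction: always two PIR queries, whatever the operation is.
        self.trace += ["pir(DB)", "pir(stacks)"]
        if self.stack_ptr == self.stack_len:  # stack full: flush first
            self.trace.append("flush(DB)")
            self.stack_ptr = 0
        # A write appends the real entry, a read appends a dummy; the server
        # sees one ciphertext upload and one broadcast in both cases.
        self.trace += ["append(stack)", "broadcast"]
        self.stack_ptr += 1


def server_view(ops, stack_len=2):
    """Return the server-visible trace for a sequence of (op, index) pairs."""
    client = ToyClient(stack_len)
    for op, j in ops:
        client.access(op, j, data=b"x" if op == "write" else None)
    return client.trace


reads = server_view([("read", 1), ("read", 2), ("read", 3)])
mixed = server_view([("write", 3), ("read", 1), ("write", 2)])
assert reads == mixed
```

In this model the trace depends only on the number of operations and the stack length, never on the operation type or the index.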

Flushing the stack (Algorithm 4). The flush algorithm pushes the elements in the stack that are up-to-date to $DB$ .⁵ In particular, the client first builds an index structure that contains all elements that are up-to-date (ϕ, lines 4.2–4.9) based on the mapping of stack indices to real indices that the client stores implicitly. The client then downloads the stack (line 4.10) and changes every entry of $DB$ (PIR writing). To this end, it downloads and uploads every entry $E_{j} \in DB$ (lines 4.12, 4.21, and 4.27).

⁵ Some elements may be outdated, since a different user may have the most recent version in its stack.

Algorithm 4

$⟨ C_{flush} ({cap}_{i}), S_{flush} (DB) ⟩$

If the currently downloaded entry is outdated, the client takes the locally stored data from $S_{i}$ and rerandomizes it (lines 4.14–4.18). Then it computes an integrity proof (technically, a NIZK) P that shows the following OR statement: either it is eligible to write the entry by proving that it knows the signing key (line 4.19) corresponding to the verification key (line 4.13) which is authenticated by the data owner, or it only rerandomized the data part (line 4.20). In that notation, the underscore $\_$ refers to hidden variables in the proof that the client does not know.

In case there is no entry in the stack that is more recent, it rerandomizes the current entry in $DB$ (line 4.25) and creates an integrity proof with the same statement as in the previous case, just that now the second part of the disjunction is true (line 4.26). In any case, the client broadcasts the new indices of all updated entries to all clients (line 4.22 for a real update and line 4.28 for a dummy update). We stress that the two proofs created in lines 4.20 and 4.26 are indistinguishable by the zero-knowledge property and hence do not reveal to the server whether the entry is updated or left unchanged, which is crucial for achieving the obliviousness of data accesses.
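The control flow of the flush step can be sketched without the cryptographic machinery. In the illustration below, ciphertexts are modelled as (payload, nonce) pairs, `rerandomize` is a stand-in that merely draws a fresh nonce, and the integrity proofs and broadcasts are omitted; the names `flush`, `stack_to_index`, and `is_current` are hypothetical and indices are 0-based.

```python
import secrets


def rerandomize(ciphertext):
    """Stand-in for Rnd: same payload, fresh randomness (modelled as a nonce)."""
    payload, _old_nonce = ciphertext
    return (payload, secrets.token_hex(8))


def flush(db, stack, stack_to_index, is_current):
    """Rewrite every entry of db; return how many were real updates.

    db             -- list of (payload, nonce) pairs
    stack          -- list of (payload, nonce) pairs held by this client
    stack_to_index -- stack_to_index[s] is the db index that stack[s] replaces
    is_current     -- is_current[s] is False if another client holds a newer copy
    """
    # phi: db index -> most recent up-to-date stack slot for that index
    phi = {}
    for s, j in enumerate(stack_to_index):
        if is_current[s]:
            phi[j] = s                       # later slots overwrite earlier ones
    real_updates = 0
    for j in range(len(db)):                 # touch every entry
        if j in phi:
            db[j] = rerandomize(stack[phi[j]])
            real_updates += 1
        else:
            db[j] = rerandomize(db[j])       # dummy update, content unchanged
    stack.clear()
    return real_updates
```

Every entry of `db` receives a fresh ciphertext, so an observer who cannot decrypt sees N changed entries whether the stack contained zero or many real updates.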

Adding new clients. To grant a new client $C_{i}$ access to entries in $DB$ , $O$ prepares a client capability ${cap}_{i}$ as described above in the setup phase. In general, if not all capabilities are created initially, every entry has to be adapted when adding a new client as well as every client’s capability. More precisely, for each entry, $O$ adds a ciphertext to $c_{Data}$ , $c_{BrCast}$ and $c_{Auth}$ and every client needs to learn ${ek}_{i}^{CPA}$ and ${ek}_{i}^{CCA}$ .

Adding new entries. To add a new entry to the database, $O$ prepares it according to the entry structure and sends it to $S$ . Finally, $O$ obliviously broadcasts the corresponding index information to all clients.

Changing access permissions. To change access permissions of a certain entry, $O$ modifies $c_{Data}$ , $c_{BrCast}$ and/or $c_{Auth}$ as well as $σ_{O}$ (with a new version number ${vrs}_{O}$ ) accordingly.

4.4. Complexity analysis

We elaborate on the communication complexity of our solution. We assume that $| DB | = N$ , that there are M clients, and we set the stack length ${len}_{S} = \sqrt{N}$ for every client. The worst case for an operation, hence, happens every $\sqrt{N}$ th operation for a client $C_{i}$ , meaning that besides extracting the data from the database and adding an entry to the personal stack, $C_{i}$ also has to flush the stack. We analyze the four algorithms independently: extracting data requires two PIR reads, one on $DB$ and the other on the concatenation of all stacks. Thus, the overall cost is $C_{PIR} (N) + C_{PIR} (M \sqrt{N})$ where $C_{PIR}$ is a function mapping input sizes to communication complexity; $C_{PIR}$ is to be replaced with concrete numbers when instantiating $Π_{PIR}$ . Adding an entry to the personal stack always requires uploading one entry, independently of whether this replacement is real or dummy.

Our flushing algorithm assumes that $C_{i}$ holds $\sqrt{N}$ entries and then down-and-uploads every entry of $DB$ . Thus, the overall complexity is $2 N + \sqrt{N}$ . A similar analysis shows that if the client holds only $O (1)$ many entries, then $C_{i}$ down-and-uploads $DB$ but additionally performs a PIR step for every downloaded entry in its own stack to retrieve a potential replacement, resulting in a complexity of $2 N + N \cdot C_{PIR} (\sqrt{N})$ .

To conclude, the construction achieves a worst-case complexity of $O (C_{PIR} (N) + C_{PIR} (M \sqrt{N}) + N)$ and $O (C_{PIR} (N) + C_{PIR} (M \sqrt{N}) + N C_{PIR} (\sqrt{N}))$ for $O (\sqrt{N})$ and $O (1)$ client-side memory, respectively. By amortizing the flush step over $\sqrt{N}$ many operations, we achieve an amortized complexity of $O (C_{PIR} (N) + C_{PIR} (M \sqrt{N}) + \sqrt{N})$ or $O (C_{PIR} (N) + C_{PIR} (M \sqrt{N}) + \sqrt{N} C_{PIR} (\sqrt{N}))$ , respectively. Since our construction is parametric over the specific PIR protocol, we can leverage the progress in this field: at present, the best $C_{PIR} (N)$ is $O (log log (N))$ [38] and, hence, the amortized cost becomes $O (log log (M \sqrt{N}) + \sqrt{N})$ or $O (log log (M \sqrt{N}) + \sqrt{N} log log (N))$ , respectively. Since, in most scenarios, $M \sqrt{N} < 2^{2^{\sqrt{N}}}$ , i.e., $log log (M \sqrt{N}) < \sqrt{N}$ , we get $O (\sqrt{N})$ and $O (\sqrt{N} log log (N))$ .
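The bounds above can be evaluated numerically for concrete parameters. The sketch below counts communication in entries, ignores constants, and takes the PIR cost as a parameter; the function `costs` and the log-log cost model are assumptions for illustration, not measurements.

```python
import math


def costs(n_entries, n_clients, c_pir, small_client=False):
    """Return (worst_case, amortized) communication, counted in entries.

    c_pir        -- callable giving the PIR cost for a database of a given size
    small_client -- False: client stores sqrt(N) entries; True: O(1) entries
    """
    root = math.isqrt(n_entries)                 # stack length sqrt(N)
    extract = c_pir(n_entries) + c_pir(n_clients * root)
    append = 1                                   # one uploaded stack entry
    if small_client:
        flush = 2 * n_entries + n_entries * c_pir(root)
    else:
        flush = 2 * n_entries + root
    worst = extract + append + flush
    amortized = extract + append + flush / root  # one flush per sqrt(N) operations
    return worst, amortized


def loglog(size):
    """Assumed PIR cost model: log log of the database size, constants ignored."""
    return math.log2(max(2.0, math.log2(max(2, size))))


worst, amortized = costs(10**6, 100, loglog)
```

For $N = 10^{6}$ and $M = 100$ , the flush term dominates both figures: roughly $2 \cdot 10^{6}$ entries in the worst case and roughly $2 \sqrt{N} \approx 2000$ entries amortized, while the PIR terms contribute only single digits under the assumed cost model.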

4.5. Variations

We discuss some variations of our constructions that achieve different assumptions and properties.

Malicious server. The construction, as we presented it, achieves integrity against an honest-but-curious server. If the server is malicious though, we cannot rely on it to verify the integrity proofs: a way to overcome this problem could be to force the server into honest behavior by letting it prove the correctness of its actions (e.g., using SMPC [20,46,63,95]). This is, unfortunately, prohibitively expensive. Furthermore, since the whole database is solely stored on the server, it is clearly impossible to guarantee that the read operation is always performed correctly (e.g., the server could just go offline and thereby effectively erase all the entries in the database) and thus to achieve any meaningful notion of integrity. However, we can still allow clients to detect illegal data changes a posteriori, a security property that we refer to as tamper resistance. For achieving this property, integrity proofs are useless and can be dropped.

Relaxed $AC$ privacy for better amortization. The analysis above shows that we achieve a $\sqrt{N}$ amortization factor, which is partially due to the fact that clients have to replace every entry of $DB$ , even if they do not change it or do not even have the rights to change it, in order to keep the access structure secret from the server. Assume that $C_{i}$ has $RW$ access to $K_{i}$ many entries. If we were fine with giving up the privacy of $AC$ and restricting obliviousness to those entries which may actually be changed by $C_{i}$ , then it would be sufficient in the flush algorithm to only exchange those $K_{i}$ entries instead of $DB$ in its entirety. Adapting the analysis and only considering the case where clients have $\sqrt{N}$ storage space, we get a worst case communication complexity of $O (C_{PIR} (N) + K_{i})$ and, since the flush cost $K_{i}$ is spread over $\sqrt{N}$ operations, an amortized communication complexity of $O (C_{PIR} (N) + K_{i} / \sqrt{N})$ . This means that the amortization factor is constant whenever $K_{i} = O (\sqrt{N})$ .

4.6. Discussion

The construction presented in this section leverages PIR for reading entries and an accumulated PIR writing technique to replace old entries with newer ones. Due to the nature of PIR, one advantage of the construction is that it allows multiple clients to concurrently read from the database and to append single entries to their stacks. This is no longer possible when a client flushes her personal stack since the database is entirely updated, which might lead to inconsistent results when reading from the database. To overcome this drawback, we present a fully concurrent, maliciously secure Group ORAM in Section 5. Another drawback of the flush algorithm is the cost of the integrity (zero-knowledge) proofs. Since we have to use public-key encryption as the top-layer encryption scheme for every entry to allow for proving properties about the underlying plaintexts, the number of proofs to be computed, naïvely implemented, is proportional to the block size. Varying block sizes require us to split an entry into chunks and encrypt every chunk separately since the message space of public-key encryption is a constant number of bits. The zero-knowledge proof then has to be computed on each of these encrypted chunks. Our later construction for obliviousness only against the server, called $GORAM$ , suffers from a very similar problem. Hence, to overcome this linear dependency, we present two new proof paradigms. We present them in the context of $GORAM$ , where the impact of integrity proofs is more severe than in $PIR$ - $GORAM$ . The first paradigm decreases the linear amount of zero-knowledge proofs to an amount that only depends on the security parameter (Section 6.5.1) while the second paradigm decreases this still linear amount (though in a different, much smaller parameter) to one proof (Section 6.5.2).

5. TAO-GORAM

Driven by the goal of building an efficient and scalable Group ORAM that achieves obliviousness against malicious users, we explore the usage of a trusted proxy mediating accesses between clients and the server, an approach advocated in recent parallel ORAM constructions [13,79,87]. In contrast to previous works, we are not only interested in parallel accesses, but also in handling access control and providing obliviousness against multiple, possibly malicious, clients.

TaoStore [79]. In a nutshell, trusted proxy-based ORAM constructions implement a single-client ORAM which is run by the trusted entity on behalf of clients, which connect to it with read and write requests in a parallel fashion. We leverage the state of the art, TaoStore [79], which implements a variant of a Path-ORAM [90] client on the proxy and allows for retrieving multiple paths from the server concurrently. More specifically, the proxy consists of the processor and the sequencer. The processor performs read and write requests to the untrusted server: this is the most complex part of TaoStore and we leave it untouched. The sequencer is triggered by client requests and forwards them to the processor which executes them in a concurrent fashion.

Our modifications. Since the proxy is trusted, it can enforce access control. In particular, we can change the sequencer so as to let it know the access control matrix and check for every client’s read and write requests whether they are eligible or not. As already envisioned by Sahin et al. [79], the underlying ORAM construction can be further refined in order to make it secure against a malicious server, either by following the approach based on Merkle-trees proposed by Stefanov et al. [90] or by using authenticated encryption as suggested by Sahin et al. [79]. In the rest of the paper, we call the system $TAO - GORAM$ .

6. GORAM

In this section, we first show how to realize a Group ORAM using a novel combination of ORAM, predicate encryption, and zero-knowledge proofs (Section 6.2 and Section 6.3). Since even the usage of the most efficient zero-knowledge proof system still yields an inefficient construction, we introduce a new proof technique called batched ZK proofs of shuffle (Section 6.5.1) and instantiate our general framework with this primitive.

6.1. Cryptographic preliminaries

Our construction relies on predicate encryption, public-key encryption, and NIZKs. We summarize the missing notation for predicate encryption in Table 3. Notice that $m \leftarrow D_{PE} ({psk}_{f}^{}, c)$ only returns a valid message m if $c \leftarrow E_{PE} (ppk, x, m)$ such that $f (x) = 1$ . Further details are postponed to Appendix B.

Table 3
Additional notation for cryptographic primitives

Primitive	Notation
Predicate encryption	$Π_{PE}$
Setup	$(pmsk, ppk) \leftarrow {Gen}_{PE} (1^{λ}, n)$
Key generation	${psk}_{f}^{} \leftarrow K_{PE} (pmsk, f)$
Encryption	$c \leftarrow E_{PE} (ppk, x, m)$
Decryption	$m \leftarrow D_{PE} ({psk}_{f}^{}, c)$
Randomization	$c^{'} \leftarrow R_{PE} (ppk, c, r)$
Multiplicative homomorphism	$m n \leftarrow D_{PE} ({psk}_{f}^{}, E_{PE} (ppk, x, m) \otimes E_{PE} (ppk, x, n))$
	$α m \leftarrow D_{PE} ({psk}_{f}^{}, α \cdot E_{PE} (ppk, x, m))$

6.2. System assumptions

Data structures and database layout. The layout of the database $DB$ follows the one proposed by Stefanov et al. [90]. To store N data entries, we use a binary tree T of depth $D = O (log N)$ , where each node stores a bucket of entries, say b entries per bucket. We denote a node at depth d and row index i by $T_{d, i}$ . The depth at the root ρ is 0 and increases from top to bottom; the row index increases from left to right, starting at 0. We often refer to the root of the tree as ρ instead of $T_{0, 0}$ . Moreover, Path-ORAM [90] uses a so-called stash as local storage to save entries that would overflow the root bucket. We assume the stash to be stored and shared on the server like every other node, but we leave it out for the algorithmic description. The stash can also be incorporated in the root node, which does not carry b but $b + s$ entries where s is the size of the stash. The extension of the algorithms is straightforward (only the number of downloaded entries changes) and does not affect their computational complexity. In addition to the database, there is an index structure $posmap$ that maps entry indices i to leaf indices $l_{i}$ . If an entry index i is mapped in $posmap$ to $l_{i}$ then the entry with index i can be found in some node on the path from the leaf $l_{i}$ to the root ρ of the tree. Finally, to initialize the database we fill it with dummy elements.
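The relation between $posmap$ and the tree can be made concrete with a small sketch. The following Python fragment is an illustration only, under the assumption that leaves sit at depth D and are numbered from 0 to $2^{D} - 1$ from left to right; the function names `path_nodes` and `candidate_slots` are hypothetical and do not appear in our algorithms.

```python
from __future__ import annotations


def path_nodes(leaf: int, depth: int) -> list[tuple[int, int]]:
    """Return the nodes T_{d,i} on the path from the root (d = 0) to a leaf.

    Leaves live at depth `depth` and are numbered 0 .. 2**depth - 1 from
    left to right, matching the row-index convention of the text.
    """
    if not 0 <= leaf < 2 ** depth:
        raise ValueError("leaf index out of range")
    # The ancestor of a leaf at depth d is obtained by dropping the
    # (depth - d) least significant bits of the leaf index.
    return [(d, leaf >> (depth - d)) for d in range(depth + 1)]


def candidate_slots(posmap: dict[int, int], entry: int, depth: int, b: int):
    """All (node, slot) positions in which entry `entry` may reside.

    `posmap` maps entry indices to leaf indices; every bucket holds b slots.
    """
    return [(node, slot)
            for node in path_nodes(posmap[entry], depth)
            for slot in range(b)]
```

For a tree of depth 3, the entry mapped to leaf 5 may thus only be found in the buckets $T_{0, 0}$ , $T_{1, 1}$ , $T_{2, 2}$ , and $T_{3, 5}$ , i.e., in $b (D + 1)$ slots overall.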

We assume that each client has a local storage of $O (log N)$ . Notice that the leaf index mapping has size $O (N)$ , but the local client storage can be decreased to $O (log N)$ by applying a standard ORAM construction recursively to it, as proposed by Shi et al. [85]. Additionally, the data owner stores a second database $ADB$ that contains the attributes $x_{w}$ and $x_{r}$ associated with every entry in $DB$ as well as the predicates $f_{i}$ associated with the client identities $C_{i}$ . Intuitively, $ADB$ implements the access control matrix $AC$ used in Definition 1. Since $ADB$ also has size $O (N)$ , we use the same technique as the one employed for the index structure. We further assume that clients establish authenticated channels with the server. These channels may be anonymous (e.g., by using anonymity networks [37] and anonymous credentials for the login [7–9,70]), but not necessarily.

Client capabilities. Every client $C_{i}$ holds a capability ${cap}_{i}$ containing three different cryptographic keys: a decryption key $dk$ for a public-key encryption scheme that serves as the top-layer encryption of an entry in the tree, and two predicate encryption secret keys ${psk}_{f_{i}}^{Auth}$ and ${psk}_{f_{i}}^{Data}$ for the predicate $f_{i}$ that regulates the client’s access permissions.

Structure of an entry and access control modes. Abstractly, database entries are tuples of the form $E = (c_{1}, c_{2}, c_{3})$ where $c_{1}, \dots, c_{3}$ are ciphertexts obtained using a public-key encryption scheme (see Fig. 9). In particular, $c_{1}$ is the encryption of an index j identifying the jth entry of the database; $c_{2}$ is the encryption of a predicate encryption ciphertext $c_{Auth}^{j}$ , which regulates the write access to the payload stored at j using the attribute $x_{w}$ ; $c_{3}$ is the encryption of a ciphertext $c_{Data}^{j}$ , which is in turn the predicate encryption of the data d stored at position j.⁶ We use the convention that an index $j > | DB |$ indicates a dummy entry and we maintain the invariant that such entries are writable by each client.

⁶ Since encrypting a long payload using predicate encryption is expensive, the concrete instantiation that we evaluate in Section 10 uses hybrid encryption instead.

Fig. 9. The entry structure of an entry in the database.

Intuitively, to implement the access control modes ⊥, $R$ , and $RW$ on a data index j with respect to a client $C_{i}$ ’s capability ${cap}_{i}$ , we have to assign the attributes of an entry such that the following conditions hold: if $C_{i}$ ’s mode for j is ⊥, then $f_{i} (x_{r}) = f_{i} (x_{w}) = 0$ , hence, $C_{i}$ can decrypt neither $c_{Auth}^{j}$ nor $c_{Data}^{j}$ ; if $C_{i}$ ’s mode for j is $R$ , then $f_{i} (x_{w}) = 0$ while $f_{i} (x_{r}) = 1$ ; finally, if $C_{i}$ ’s mode for j is $RW$ , then $f_{i} (x_{w}) = f_{i} (x_{r}) = 1$ . In order to replace an entry, a client has to successfully prove that she can decrypt the ciphertext $c_{Auth}^{j}$ and that the result of that decryption is 1.
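The three conditions above amount to a small truth table. The following Python sketch merely restates it; the names `Mode`, `predicate_values`, `can_decrypt_data`, and `can_decrypt_auth` are hypothetical and chosen for illustration, and no predicate encryption is actually performed.

```python
from enum import Enum


class Mode(Enum):
    NONE = "⊥"   # no access
    R = "R"      # read-only access
    RW = "RW"    # read and write access


def predicate_values(mode: Mode) -> tuple[int, int]:
    """Return the pair (f_i(x_r), f_i(x_w)) required for the given mode."""
    return {Mode.NONE: (0, 0), Mode.R: (1, 0), Mode.RW: (1, 1)}[mode]


def can_decrypt_data(mode: Mode) -> bool:
    """C_i can open c_Data^j exactly when f_i(x_r) = 1."""
    return predicate_values(mode)[0] == 1


def can_decrypt_auth(mode: Mode) -> bool:
    """C_i can open c_Auth^j (and hence write) exactly when f_i(x_w) = 1."""
    return predicate_values(mode)[1] == 1
```

Note that the table contains no mode with $f_{i} (x_{w}) = 1$ and $f_{i} (x_{r}) = 0$ : write access always comes with read access.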

Algorithm 5

$({cap}_{O}, DB) \leftarrow gen (1^{λ}, n)$

6.3. Algorithmic description

Implementation of $({cap}_{O}, DB) \leftarrow gen (1^{λ}, n)$ (Algorithm 5). Intuitively, the data owner initializes the cryptographic schemes (lines 5.1–5.3) as well as the rest of the infrastructure (lines 5.4–5.5), and finally outputs $O$ ’s capability (line 5.6).⁷ Notice that this algorithm takes as input the maximum number n of clients in the system, since this determines the size of the predicates ruling access control, which the predicate encryption schemes are parameterized by.

⁷ For simplifying the notation, we assume for each encryption scheme that the public key is part of the secret key.

Algorithm 6

$\{{cap}_{i}, deny\} \leftarrow addCl ({cap}_{O}, a)$

Implementation of $\{{cap}_{i}, deny\} \leftarrow addCl ({cap}_{O}, a)$ (Algorithm 6). This algorithm allows $O$ to register a new client in the system. Specifically, $O$ creates a new capability for the new client $C_{i}$ according to the given access permission list $a$ (lines 6.6–6.10). If $O$ wants to add more clients than n, the maximum number she initially decided, she can do so at the price of re-initializing the database. First, she has to set up new predicate encryption schemes, since these depend on n. Second, she has to distribute new capabilities to all clients. Finally, for each entry in the database, she has to re-encrypt the ciphertexts $c_{Auth}$ and $c_{Data}$ with the new keys.

Algorithm 7

$\{{DB}^{'}, deny\} \leftarrow ⟨ C_{addE} ({cap}_{O}, a, d), S_{addE} (DB) ⟩$

Implementation of $\{{DB}^{'}, deny\} \leftarrow ⟨ C_{addE} ({cap}_{O}, a, d), S_{addE} (DB) ⟩$ (Algorithm 7). In this algorithm, $O$ adds a new entry that contains the payload d to the database. Furthermore, the new entry is protected according to the given access permission list $a$ . Intuitively, $O$ assigns the new entry to a random leaf and downloads the corresponding path in the database (lines 7.5–7.6). It then creates the new entry and substitutes it for a dummy entry (lines 7.7–7.10). Finally, $O$ rerandomizes the entries so as to hide from $S$ which entry changes, and uploads the modified path to $S$ (lines 7.11–7.15).

Algorithm 8

$(E_{1}^{″}, \dots, E_{b (D + 1)}^{″}, π, [P]) \leftarrow Evict (E_{1}, \dots, E_{b (D + 1)}, s, j, k)$

Eviction. In all ORAM constructions, the client has to rearrange the entries in the database in order to make subsequent accesses unlinkable to each other. In the tree construction we use [90], this is achieved by first assigning a new, randomly picked, leaf index to the read or written entry. After that, the entry might no longer reside on the path from the root to its designated leaf index and, thus, has to be moved. This procedure is called eviction (Algorithm 8).

This algorithm assigns the entry to be evicted to a new leaf index (line 8.1). It then locally shuffles and rerandomizes the given path according to a permutation π (lines 8.2–8.4). After replacing the old path with a new one, the evicted entry is supposed to be stored in a node along the path from the root to the assigned leaf, which always exists since the root is part of the permuted nodes. A peculiarity of our setting is that clients are not trusted and, in particular, they might store a sequence of ciphertexts in the database that is not a permutation of the original path (e.g., they could store a path of dummy entries, thereby cancelling the original data).
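To see why a suitable node always exists, recall that an entry remapped to a new leaf may only stay in those buckets of the downloaded path that also lie on the path to the new leaf. The following Python sketch computes the deepest such bucket; it is an illustration under the assumption that leaves at depth D are numbered from left to right, and the helper name `deepest_common_depth` is hypothetical rather than part of Algorithm 8.

```python
def deepest_common_depth(leaf_a: int, leaf_b: int, depth: int) -> int:
    """Depth of the lowest node shared by the root-to-leaf paths of two leaves.

    Both leaves are indices in 0 .. 2**depth - 1. The node at depth d on the
    path to a leaf has row index `leaf >> (depth - d)`.
    """
    d = depth
    # Walk upwards until both leaves have the same ancestor. The loop always
    # terminates, at the latest at d = 0, because both paths share the root.
    while (leaf_a >> (depth - d)) != (leaf_b >> (depth - d)):
        d -= 1
    return d
```

In a tree of depth 3, an entry read along the path to leaf 5 and reassigned to leaf 4 can be placed as deep as depth 2, whereas reassigning it to leaf 2 leaves only the root.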

Integrity proofs. To tackle this problem, a first technical novelty in our construction is, in the $read$ and $write$ protocols, to let the client output the modified path along with a proof of shuffle correctness [10,21], which has to be verified by the server ( $s = 1$ , lines 8.6–8.7). As the data owner is assumed to be honest, she does not have to send a proof in the $chMode$ protocol ( $s = 0$ , line 8.9).

Algorithm 9

$⟨ C_{chMode} ({cap}_{O}, a, j), S_{chMode} (DB) ⟩$

Implementation of $⟨ C_{chMode} ({cap}_{O}, a, j), S_{chMode} (DB) ⟩$ (Algorithm 9). In this protocol, $O$ changes the access mode of the jth entry in $DB$ according to the new access permission list $a$ . Intuitively, she does so by downloading the path on which the entry resides (lines 9.5–9.6), changing the entry accordingly (lines 9.7–9.12), and uploading a modified and evicted path to the server (lines 9.13–9.14).

Algorithm 10

$\{d, deny\} \leftarrow ⟨ C_{read} ({cap}_{i}, j), S_{read} (DB) ⟩$

Implementation of $\{d, deny\} \leftarrow ⟨ C_{read} ({cap}_{i}, j), S_{read} (DB) ⟩$ (Algorithm 10). Intuitively, the client downloads the path to which index j is assigned and searches for the corresponding entry (lines 10.5–10.9). She then evicts the downloaded path, subject to the restriction that some dummy entry afterwards resides in the top position of the root node (lines 10.10–10.11). $C$ uploads the evicted path together with a proof of shuffle correctness to $S$ , who verifies the proof and, in case of successful verification, replaces the old path with the new one (line 10.12).

Obliviousness in presence of integrity proofs. $C$ could in principle stop here since she has read the desired entry. However, in order to fulfill the notion of obliviousness (Definition 5), the $read$ and $write$ operations must be indistinguishable. In single-client ORAM constructions, $C$ can make $write$ indistinguishable from $read$ by simply modifying the content of the desired entry before uploading the shuffled path to the server. This approach does not work in our setting, due to the presence of integrity proofs. Intuitively, in $read$ , it would suffice to produce a proof of shuffle correctness, but this proof would not be the same as the one used in $write$ , where one element in the path changes. Hence another technical novelty in our construction is the last part of the $read$ protocol (lines 10.13–10.17), which “simulates” the $write$ protocol despite the presence of integrity proofs. This is explained below, in the context of the $write$ protocol.

Algorithm 11

$\{{DB}^{'}, deny\} \leftarrow ⟨ C_{write} ({cap}_{i}, j, d), S_{write} (DB) ⟩$

Implementation of $\{{DB}^{'}, deny\} \leftarrow ⟨ C_{write} ({cap}_{i}, j, d), S_{write} (DB) ⟩$ (Algorithm 11). First, $C$ reads the element that she wishes to change (line 11.1). Second, $C$ evicts the path with the difference that here the first entry in the root node is the element that $C$ wants to change, as opposed to a dummy entry like in $read$ (line 11.10). It is important to observe that the shuffle proof sent to the server (line 8.6) is indistinguishable in $read$ and $write$ since it hides both the permutation and the randomness used to rerandomize the entries. So far, we have shown how $C$ can upload a shuffled and rerandomized path to the server without modifying the content of any entry.

In $write$ , $C$ can now replace the first entry in the root node with the entry containing the new payload (lines 11.12–11.13). In $read$ , this step is simulated by rerandomizing the first entry of the root node, which is a dummy entry (line 10.15).

The integrity proofs $P_{Auth}$ and $P_{Ind}$ produced in $read$ and $write$ are indistinguishable (lines 10.14 and 10.16 for both): in both cases, they prove that $C$ has the permission to write on the first entry of the root node and that the index has not changed. Notice that this proof can be produced also in $read$ , since all clients have write access to dummy entries.

Permanent entries. Some application scenarios of $GORAM$ might require certain entries of the database to be neither modifiable nor deletable, not even by the data owner herself (for instance, in the case of PHRs, the user should not be able to cancel diagnostic results in order to pay lower insurance fees). Even though we did not explicitly describe the construction, we mention that such a property can be achieved by assigning a binary attribute (modifiable or permanent) to each entry and storing a commitment to this attribute in the database. Every party that tries to modify a given entry, including the data owner, has to provide a proof that the respective attribute is set to modifiable. This can be efficiently instantiated using ElGamal encryption and Σ-protocols.

6.4. Complexity analysis

The computational and communication complexity of our construction, for both the server and the client, is $O ((B + G) log N)$ where N is the number of the entries in the database, B is the block size of the entries in the database, and G is the number of clients that have access to the database. The term $O (B log N)$ originates from the ORAM construction and we add $O (G log N)$ for the access structure. Hence, our solution only adds a small overhead to the standard ORAM complexity. The client-side storage is $O (B log N)$ , while the server has to store $O (B N)$ data.

6.5. Integrity proofs revisited

A zero-knowledge proof of shuffle correctness of a set of ciphertexts proves in zero-knowledge that a new set of ciphertexts contains the same plaintexts in permuted order. In our system the encryption of an entry, for reasonable block sizes, yields in practice hundreds of ciphertexts, which means that we have to perform hundreds of shuffle proofs. These are computable in polynomial-time but, even using the most efficient known solutions (e.g., [10,56]), not fast enough for practical purposes. This problem has been addressed in the literature but the known solutions typically reveal part of the permutation (e.g., [57]), which would break obliviousness and, thus, are not applicable in our setting.

General problem description. Let $Π_{PKE} = ({Gen}_{PKE}, E, D, Rnd)$ be a randomizable, additively homomorphic public-key encryption scheme with message space $M = F_{p}$ for some field $F_{p}$ , e.g., ElGamal [39] or Paillier [74]. Let furthermore $(P, V)$ be a zero-knowledge proof system ( $ZKP$ ) that takes as input two n-length ciphertext vectors $\vec{a} \in E (ek, \vec{m})$ and $\vec{b} \in E (ek, π (\vec{m}))$ , where π is a permutation applied to some message vector $\vec{m}$ , and outputs a proof for the statement $\exists \vec{r}, π . \forall 1 ⩽ i ⩽ n . b_{i} = Rnd (ek, a_{π^{- 1} (i)}, r_{i})$ . The goal is to construct a proof system $(P^{*}, V^{*})$ that takes as input two $n \times m$ -dimensional ciphertext matrices $A \in E (ek, M)$ and $B \in E (ek, N)$ , where N is M, column-wise permuted with a permutation π, and outputs a proof of the statement $\exists R, π . \forall 1 ⩽ i ⩽ n . \forall 1 ⩽ j ⩽ m . B_{i, j} = Rnd (ek, A_{π^{- 1} (i), j}, R_{i, j})$ .

We propose two solutions to this problem that we discuss in the sequel. The underlying idea of both solutions is to compress the data on which the shuffle proof is computed so as to lower the amount of proofs that have to be computed.

6.5.1. Batched zero-knowledge proofs of shuffle correctness

The first solution is a new proof technique that we call batched zero-knowledge proofs of shuffle correctness, based on the idea of “batching” several instances and verifying them together. Our interactive protocol takes advantage of the homomorphic property of the public-key encryption scheme in order to batch the instances.

Intuitively, the batching algorithm randomly selects a subset of columns (i.e., block indices) and computes the row-wise homomorphic sum of the corresponding blocks for each row. It then computes the proof of shuffle correctness on the resulting single-block ciphertexts. The property we would like to achieve is that modifying even a single block in a row should lead to a different sum and, thus, be detected. Notice that naïvely summing all blocks of a row does not achieve the intended property, as illustrated by the following counterexample: $\begin{pmatrix} E (pk, 3) & E (pk, 4) \\ E (pk, 5) & E (pk, 2) \end{pmatrix} \quad \begin{pmatrix} E (pk, 2) & E (pk, 5) \\ E (pk, 5) & E (pk, 2) \end{pmatrix}$ In the above matrices, the rows have not been permuted but rather changed. Still, the row-wise sum is preserved, i.e., 7 in the first row and 7 in the second row of both matrices. Hence, we cannot compute the sum over all columns. Instead, as proved in the long version, the intended property can be achieved with probability at least $\frac{1}{2}$ if each column is included in the homomorphic sum with probability $\frac{1}{2}$ . Although a probability of $\frac{1}{2}$ is not sufficient in practice, repeating the protocol k times increases the probability to $(1 - \frac{1}{2^{k}})$ .
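The counterexample can be replayed on plaintexts. The following Python sketch is an illustration only: it replaces the homomorphic sum over ciphertexts by a sum over the underlying plaintexts modulo a toy prime, and it replaces the zero-knowledge proof of shuffle correctness by a direct multiset comparison. The names `P`, `batch`, `looks_like_shuffle`, and `detection_rate` are hypothetical.

```python
from itertools import product

P = 101  # toy prime standing in for the message space F_p (assumption)


def batch(matrix, challenge):
    """Row-wise sum of the columns selected by the 0/1 challenge vector."""
    return [sum(v for v, c in zip(row, challenge) if c) % P for row in matrix]


def looks_like_shuffle(a, b, challenge):
    """True if the batched rows of b are a permutation of those of a."""
    return sorted(batch(a, challenge)) == sorted(batch(b, challenge))


def detection_rate(a, b):
    """Count the challenges that expose b as not being a shuffle of a."""
    challenges = list(product([0, 1], repeat=len(a[0])))
    detected = sum(not looks_like_shuffle(a, b, c) for c in challenges)
    return detected, len(challenges)


A = [[3, 4], [5, 2]]
B = [[2, 5], [5, 2]]  # first row modified, not permuted
```

Here the all-ones challenge, which sums over all columns, accepts the modified matrix `B`, whereas exactly two of the four possible challenges reject it. An honest row permutation of `A` is accepted under every challenge.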

Algorithm 12

Batched zero-knowledge proofs of shuffle correctness

The detailed construction is depicted in Algorithm 12. In line 12.1, $V^{*}$ picks a challenge, which indicates which columns to include in the homomorphic sum. Upon receiving the challenge, in line 12.2, $P^{*}$ and $V^{*}$ compute the row-wise homomorphic sum of the columns indicated by the challenge. Finally, $V^{*}$ and $P^{*}$ run an off-the-shelf shuffle proof protocol between $V$ and $P$ on the resulting ciphertext lists (line 12.3). The protocol can be made non-interactive by using the Fiat–Shamir heuristic [41].

Formal guarantees. We establish the following result for our new protocol and prove it in Appendix E.

Theorem 2 (Batched zero-knowledge proofs of shuffle correctness).

Let $Π_{PKE}$ be an additively homomorphic CPA-secure public-key encryption scheme and let $(P, V)$ be a $ZKP$ of shuffle correctness over $Π_{PKE}$ . Then $(P^{*}, V^{*})$ as defined in Algorithm 12 is a $ZKP$ of shuffle correctness over $Π_{PKE}$ , which is sound with probability at least $1 / 2$ .

6.5.2. The hash-and-proof paradigm

The second solution improves the integrity proofs even further. The high-level idea is to design a compression technique that is collision-resistant with overwhelming probability, which is in contrast to the previous compression technique, which can lead to collisions with probability 1/2. In a nutshell, a technique based on pairwise independent hash functions [19] applied on each row of the ciphertext matrix allows for reducing the number of computed proofs of shuffle correctness to one.

The detailed construction is reported in Algorithm 13, where we leverage the fact that the message space $M = F_{p}$ for some field $F_{p}$ . Moreover, we let $E (ek, z_{0}; z_{1})$ denote the encryption of $z_{0}$ with key $ek$ and randomness $z_{1}$ . In line 13.1, $V^{*}$ picks a challenge, which can be seen as the coefficients of the pairwise independent hash function. Upon receiving the challenge, in line 13.2, $P^{*}$ and $V^{*}$ compute the row-wise homomorphic sum of the columns as dictated by the challenge, additionally adding the encryption of $z_{0}$ using the randomness $z_{1}$ for both $\vec{a}$ and $\vec{b}$ . Finally, $V^{*}$ and $P^{*}$ run an off-the-shelf shuffle proof protocol between $V$ and $P$ on the resulting ciphertext lists (line 13.3). As before, the protocol can be made non-interactive by applying the Fiat–Shamir heuristic [41].
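The compression step can again be illustrated on plaintexts. The following Python sketch assumes, for illustration only, that the hash of a row is the affine combination $z_{0} + \sum_{j} z_{j} m_{j}$ over $F_{p}$ with the challenge as coefficients; the exact form used in Algorithm 13, the encryption layer, and the randomness $z_{1}$ are not modeled, and the names `P`, `compress_row`, `compress`, and `collision_count` are hypothetical.

```python
P = 101  # toy prime standing in for the message space F_p (assumption)


def compress_row(row, z0, coeffs):
    """Affine hash of one row: z0 + sum_j coeffs[j] * row[j] mod P."""
    return (z0 + sum(z * m for z, m in zip(coeffs, row))) % P


def compress(matrix, z0, coeffs):
    """Compress every row of the matrix to a single field element."""
    return [compress_row(row, z0, coeffs) for row in matrix]


def collision_count(row_a, row_b):
    """Number of coefficient pairs in F_p^2 on which two 2-block rows collide."""
    return sum(
        compress_row(row_a, 0, (z1, z2)) == compress_row(row_b, 0, (z1, z2))
        for z1 in range(P)
        for z2 in range(P)
    )
```

For the rows (3, 4) and (2, 5) of the earlier counterexample, only P of the $P^{2}$ coefficient pairs collide in this toy setting, i.e., a fraction of $1 / p$ , which becomes negligible for a cryptographically sized field. This is in contrast to the batching technique, where half of the challenges fail to detect the modification.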

Algorithm 13

Shuffle proofs based on the hash-and-proof paradigm

Formal guarantees. We establish the following result for our new protocol and prove it in Appendix E.

Theorem 3 (Hash-and-Proof).

Let $Π_{PKE}$ be an additively homomorphic CPA-secure public-key encryption scheme and let $(P, V)$ be a $ZKP$ of shuffle correctness over $Π_{PKE}$ . Then $(P^{*}, V^{*})$ is a $ZKP$ of shuffle correctness over $Π_{PKE}$ .

6.5.3. Discussion

The improvements presented in this section can not only be applied to $GORAM$ , but can also be used to speed up $PIR$ - $GORAM$ . The reason is that in the $flush$ algorithm, clients have to prove that they either randomized an entry or that they know a signing key allowing them to change the data. The first half of this proof is clearly subject to the same problems as the ones we face in $GORAM$ : entries with bigger block sizes require more than one public-key ciphertext and, consequently, the number of proofs to be computed is linear in the block size. Inspecting the new proof techniques, we observe that if π is the identity function and the number of rows in the matrices is one, then we have reproduced exactly the setting of $PIR$ - $GORAM$ for a single such proof. Hence, batched proofs of shuffle correctness and the hash-and-proof paradigm can also be applied to general plaintext-equivalence-proofs (PEPs).

7. GORAM with accountable integrity (A-GORAM)

In this section we relax the integrity property by introducing the concept of accountability. In particular, instead of letting the server check the correctness of client operations, we develop a technique that allows clients to detect a posteriori non-authorized changes on the database and blame the misbehaving party. Intuitively, each entry is accompanied by a tag (technically, a chameleon hash along with the randomness corresponding to that entry), which can only be produced by clients having write access. All clients can verify the validity of such tags and, eventually, determine which client inserted an entry with an invalid tag. This makes the construction more efficient and scalable, significantly reducing the computational complexity both on the client and on the server side, since zero-knowledge proofs are no longer necessary and, consequently, the outermost encryption can be implemented using symmetric, as opposed to asymmetric, cryptography. Such a mechanism is supposed to be paired with a data versioning protocol in order to avoid data losses: as soon as one of the clients detects an invalid entry, the misbehaving party is punished and the database is reverted to the last safe state (i.e., a state where all entries are associated with a valid tag).

7.1. Cryptographic preliminaries

Our construction relies on predicate encryption, private-key encryption, digital signatures, and chameleon hashes. We summarize the missing notation for private-key encryption and chameleon hashes in Table 4. A chameleon hash allows for computing an explicit collision for the hash value if one knows a secret trapdoor. Further details are postponed to Appendix B.

Table 4
Notation for cryptographic primitives

Primitive	Notation
Private-key encryption	$Π_{SE}$
Key generation	$k \leftarrow {Gen}_{SE} (1^{λ})$
Encryption	$c \leftarrow E (k, m)$
Decryption	$m \leftarrow D (k, c)$
Chameleon hash	$Π_{CH}$
Key generation	$(cpk, csk) \leftarrow {Gen}_{CHF} (1^{λ})$
Hashing	$t \leftarrow CH (cpk, m, r)$
Collision	$r^{'} \leftarrow Col (csk, m, r, m^{'})$ such that $CH (cpk, m, r) = CH (cpk, m^{'}, r^{'})$

7.2. System assumptions

Data structures and database layout. We assume the same layout and data structures as for $GORAM$ . Additionally, we use a log file $Log$ so as to detect who has to be held accountable in case of misbehavior. $Log$ is append-only and consists of the list of paths uploaded to the server, each of them signed by the respective client. Such an append-only log file can be realized either in a centralized [52] or in a decentralized way [28].

Client capabilities. As in $GORAM$ , every client $C_{i}$ holds a capability ${cap}_{i}$ containing a collection of keys: predicate encryption keys ${psk}_{f_{i}}^{Auth}$ and ${psk}_{f_{i}}^{Data}$ and a key for the top level encryption, which is now, however, replaced by a private-key version. Hence, $C_{i}$ holds also $K$ .

Fig. 10. The entry structure of an entry in the database.

Structure of an entry and access control modes. The structure of an entry in the database is depicted in Fig. 10. An entry E is protected by a top-level private-key encryption scheme with a key $K$ that is shared by the data owner $O$ and all clients $C_{1}, \dots, C_{n}$ . Under the encryption, E contains several elements, which we explain below:

j is the index of the entry;

$c_{Auth}$ is a predicate encryption ciphertext that encrypts the private key $csk$ of a chameleon hash function under an attribute $x_{w}$ , which regulates the write access;

$c_{Data}$ is unchanged;

$cpk$ is the public key of a chameleon hash function, i.e., the counterpart of $csk$ encrypted in $c_{Auth}$ ;

r is some randomness used in the computation of t;

t is a concatenation of hash tags: a chameleon hash tag produced by hashing $c_{Data}$ under randomness r, and a normal hash tag produced by hashing j, $c_{Auth}$ , and the public key of the chameleon hash function $cpk$ ; only $c_{Data}$ is hashed with the chameleon hash function, so that clients with write access can change $c_{Data}$ but no other part of the entry;

σ is a signature on the tag t, signed by the data owner $O$ .

Intuitively, only clients with write access are able to decrypt $c_{Auth}$ , and thus to retrieve the key $csk$ required to compute a collision for the new entry $d^{'}$ (i.e., to find a randomness $r^{'}$ such that the chameleon hash t for the old entry d and randomness r is the same as the one for $d^{'}$ and $r^{'}$ ). The fundamental observation is that the modification of an entry is performed without changing the respective tag. Consequently, the signature σ is the same for the old and for the new entry. Computing a collision is the only way to make the tag t, originally signed by the data owner, a valid tag also for the new entry $d^{'}$ . Therefore verifying the signature and the chameleon hash suffices to make sure that the entry has been only modified by authorized clients.
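The collision mechanism can be illustrated with a textbook discrete-logarithm-based chameleon hash. The following Python sketch is not the instantiation used in our construction: the scheme $CH (cpk, m, r) = g^{m} \cdot {cpk}^{r}$ , the toy parameters `P`, `Q`, and `G`, and the function names are assumptions made for illustration, and the parameters are far too small to offer any security. Messages are assumed to be already mapped to integers modulo `Q`.

```python
P = 2039  # toy safe prime, P = 2 * Q + 1 (assumption, insecure size)
Q = 1019  # prime order of the subgroup generated by G
G = 4     # generator of the order-Q subgroup of Z_P^*


def keygen(csk: int) -> int:
    """Derive the public key cpk = G^csk mod P from a trapdoor csk in [1, Q-1]."""
    return pow(G, csk, P)


def ch(cpk: int, m: int, r: int) -> int:
    """Chameleon hash CH(cpk, m, r) = G^m * cpk^r mod P."""
    return (pow(G, m % Q, P) * pow(cpk, r % Q, P)) % P


def col(csk: int, m: int, r: int, m_new: int) -> int:
    """Return r' such that CH(cpk, m, r) == CH(cpk, m_new, r').

    Solves m + csk * r = m_new + csk * r' (mod Q) for r', which requires
    the trapdoor csk. Uses the modular inverse pow(csk, -1, Q) (Python 3.8+).
    """
    return (r + (m - m_new) * pow(csk, -1, Q)) % Q
```

A client that recovers $csk$ from $c_{Auth}$ can thus compute `r_new = col(csk, m, r, m_new)` and store the new data together with `r_new` ; the tag t, and hence the data owner's signature σ on it, remains valid.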

7.3. Construction

Basic algorithms. The basic algorithms follow the ones defined in Section 6.3, except for natural adaptations to the new entry structure. Furthermore, the zero-knowledge proofs are no longer computed and the rerandomization steps are substituted by re-encryptions. Finally, clients upload signed paths to the server, which are stored in the $Log$ . We detail the differences to the algorithms of GORAM in Appendix G.

Entry verification. We introduce an auxiliary verification function that clients run in order to verify the integrity of an entry. During the execution of any protocol we maintain the invariant that, whenever a client $C_{i}$ (or the data owner herself) parses an entry j that she downloaded from the server, she executes Algorithm 14. If the result is ⊥, then the client runs $blame ({cap}_{i}, Log, j)$ . The client also runs $blame ({cap}_{i}, Log, j)$ if she does not find j on the downloaded path even though the index mapping says it should be there.

Algorithm 14

The pseudo-code for the verification of an entry in the database which is already decrypted

Blame. In order to execute the function $blame ({cap}_{i}, Log, j)$ , the client must first retrieve $Log$ from the server. Afterwards, she parses the history of modifications backwards by decrypting the paths present in the $Log$ . The client stops only when she finds the desired entry indexed by j in a consistent state, i.e., the data hashes to the associated tag t and the signature is valid. At this point the client moves forwards on the $Log$ until she finds an uploaded path on which the entry j is supposed to lie (the entry might be associated with an invalid tag or missing). The signature on the path uniquely identifies the client, whose identity is added to a list $L$ of misbehaving clients. Finally, all of the other clients that acknowledged the changes of the inconsistent entry are also added to $L$ , since they did not correctly verify its chameleon signature. If the entry is missing, we only add the client who removed it to $L$ . No other client can be deemed malicious since a missing entry is only detected when a client tries to actively read or write it.
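The backward and forward scan can be sketched as follows. This Python fragment is a strong simplification for illustration: it assumes that every path in $Log$ has already been decrypted and reduced to a record stating the signer and the state of entry j on that path, and it interprets "acknowledging" an inconsistent entry as uploading a later path that still contains it. The type `LogRecord`, its fields, and the state labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    signer: str  # client whose signature is on the uploaded path
    # State of entry j on this path: "valid", "invalid" (bad tag),
    # "missing" (should be on the path but is not), or "absent"
    # (the path is not one on which entry j is supposed to lie).
    state: str


def blame(log: list) -> list:
    """Return the list L of misbehaving clients for entry j."""
    # Step 1: walk backwards to the most recent consistent state of entry j.
    start = -1
    for i in range(len(log) - 1, -1, -1):
        if log[i].state == "valid":
            start = i
            break
    # Step 2: walk forwards over the paths that should contain entry j.
    misbehaving = []
    for record in log[start + 1:]:
        if record.state == "absent":
            continue
        if record.signer not in misbehaving:
            misbehaving.append(record.signer)
        if record.state == "missing":
            break  # only the client who removed the entry is blamed
    return misbehaving
```

For example, if the owner's upload is followed by an invalid upload of one client and a later upload of a second client that kept the invalid entry, both clients end up in $L$ ; if the entry was removed, only the removing client does.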

7.4. Discussion

As explained above, the accountability mechanism allows for the identification of misbehaving clients with a minimal computational overhead in the regular clients’ operation. However, it requires the server to store a log that is linear in the number of modifications to the database and logarithmic in the number of entries. This is required to revert the database to a safe state in case of misbehaviour. Consequently, the $blame$ algorithm is expensive in terms of computation and communication with the server, in particular for entries that are not regularly accessed. Nonetheless, $blame$ is supposed to be executed only occasionally; therefore, we believe this design is acceptable in terms of service usability. Furthermore, we can require all the parties accessing the database to synchronize on a regular basis so as to verify the content of the whole database and to reset the $Log$ , in order to reduce the storage on the server side and, thus, the amount of data to transfer in the $blame$ algorithm. Such an approach could be complemented by an efficient versioning algorithm on encrypted data, which is however beyond the scope of this work and left as future work. Finally, we also point out that the accountable-integrity property targeted by $A - GORAM$ sacrifices anonymity, since users have to sign the paths they upload to the server. This issue can be easily overcome by using any anonymous credential system that supports revocation [16].

8. Scalable solution (S-GORAM)

Even though the personal record management systems we consider rely on simple client-based read and write permissions, the predicate encryption scheme used in $GORAM$ and $A - GORAM$ supports in principle a much richer class of access control policies, such as role-based access control (RBAC) or attribute-based access control (ABAC) [58]. If we stick to client-based read and write permissions, however, we can achieve a more efficient construction that scales to thousands of clients. To this end, we replace the predicate encryption scheme with a broadcast encryption scheme [45], which guarantees that a specific subset of clients is able to decrypt a given ciphertext. This choice affects the entry structure as follows (cf. Fig. 10):

$c_{Data}$ is the broadcast encryption of d;

$c_{Auth}$ is the broadcast encryption of $csk$ .

The subset of clients that can decrypt $c_{Data}$ (resp. $c_{Auth}$ ) is then set to be the same subset that holds $R$ (resp. $RW$ ) permissions on the given entry. By applying the aforementioned modifications on top of $A - GORAM$ , we obtain a much more efficient and scalable instantiation, called $S - GORAM$ , that achieves a smaller constant in the computational complexity (linear in the number of clients). For more details on the performance evaluation and a comparison with $A - GORAM$ , we refer to Section 10.
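To illustrate how per-entry permissions translate into broadcast recipient sets, consider the following minimal Python sketch. The function `recipient_sets` and the string encoding of permissions are hypothetical, and we assume here that a client holding $RW$ permissions may also read the entry, so she is included in the recipient set of $c_{Data}$ as well.

```python
def recipient_sets(perms: dict) -> tuple:
    """Map per-entry permissions to the two broadcast recipient sets.

    perms maps a client identifier to "R", "RW", or "" (no access).
    Returns (recipients of c_Data, recipients of c_Auth).
    """
    # Assumption: RW holders can read, so they also receive c_Data.
    data_recipients = {c for c, p in perms.items() if p in ("R", "RW")}
    # Only RW holders obtain csk, i.e., can decrypt c_Auth.
    auth_recipients = {c for c, p in perms.items() if p == "RW"}
    return data_recipients, auth_recipients
```

A client without permissions appears in neither set and therefore cannot decrypt either component of the entry.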

9. Security and privacy results

In this section, we show that the Group ORAM instantiations presented in Section 4, in Section 5, in Section 6, in Section 7, and in Section 8 achieve the security and privacy properties stated in Section 2.3. The proofs are reported in Appendix F. A brief overview of the properties guaranteed by each construction is shown in Table 5. As previously discussed, relaxing the obliviousness property so as to consider only security against the server or assuming a trusted component in the system is required to enable constructions that are more efficient from a communication point of view. Hence, $TAO - GORAM$ and $GORAM$ are optimal with respect to communication. As opposed to $TAO - GORAM$ , which is oblivious with respect to malicious clients, $GORAM$ does not assume any trusted component in the system. Furthermore, dropping the computationally expensive integrity checks in favor of an accountability mechanism is crucial to achieve computational efficiency. It follows that $A - GORAM$ and $S - GORAM$ provide accountable integrity as opposed to integrity and tamper resistance. Having an accountable system trivially implies the loss of anonymity, as defined in Definition 7, although it is still possible to achieve pseudonym-based anonymity by employing anonymous credentials. The other privacy properties of our system, namely secrecy and obliviousness, are fulfilled by all of our instantiations. Moreover, by replacing predicate encryption with broadcast encryption ( $S - GORAM$ ), we sacrifice the possibility to enforce ABAC policies, although we can still handle client-based read/write permissions.

Table 5
Security and privacy properties achieved by each construction

Property	$PIR$ - $GORAM$	$TAO - GORAM$	$GORAM$	$A - GORAM$	$S - GORAM$
Secrecy	✓	✓	✓	✓	✓
Integrity	✓	✓	✓	Accountable	Accountable
Tamper-resistance	✓	✓	✓	✗	✗
Obliviousness ( $C + S$ )	✓	✓	✗	✗	✗
Obliviousness ( $S$ only)	✓	✓	✓	✓	✓
Anonymity	✗	✗	✓	✗	✗
Access control	R/W	R/W	ABAC	ABAC	R/W

The following theorems characterize the security and privacy properties achieved by each cryptographic instantiation presented in this paper. Interestingly enough, the properties of $TAO - GORAM$ are only conditioned by the trusted-component-based parallel ORAM construction that we base it on since our additions are not of cryptographic nature but simply modify the software component that synchronizes client accesses.

Theorem 4 ( $PIR$ - $GORAM$ ).

Let $Π_{PKE}$ be a CPA-secure encryption scheme, then $PIR$ - $GORAM$ achieves secrecy.

Let $Π_{DS}$ be an existentially unforgeable digital signature scheme, $ZKP$ be a zero-knowledge proof of knowledge protocol, and $Π_{PKE}$ be a CCA-secure encryption scheme, then $PIR$ - $GORAM$ achieves integrity.

Let $Π_{DS}$ be an existentially unforgeable digital signature scheme and let $Π_{PKE}$ be a CCA-secure encryption scheme, then $PIR$ - $GORAM$ achieves tamper resistance.

Let $Π_{PIR}$ be a private information retrieval scheme, let $Π_{PKE}$ be a CPA-secure encryption scheme, let $Π_{DS}$ be an existentially unforgeable digital signature scheme, and let $ZKP$ be a zero-knowledge proof of knowledge protocol, then $PIR$ - $GORAM$ is oblivious against malicious clients.

Theorem 5 ( $TAO - GORAM$ ).

Assume that TaoStore is a secure realization of a parallel ORAM. Then $TAO - GORAM$ achieves secrecy, integrity, tamper resistance, and obliviousness against malicious clients.

Theorem 6 ( $GORAM$ ).

Let $Π_{PE}$ be an attribute-hiding predicate encryption scheme. Then $GORAM$ achieves secrecy.

Let $ZKP$ be a zero-knowledge proof system. Then $GORAM$ achieves integrity.

Let $ZKP$ be a zero-knowledge proof system and $Π_{PE}$ be an attribute-hiding predicate encryption scheme. Then $GORAM$ achieves tamper-resistance.

Let $ZKP$ be a zero-knowledge proof system and $Π_{PKE}$ be a CPA-secure public-key encryption scheme. Then $GORAM$ achieves obliviousness.

Let $ZKP$ be a zero-knowledge proof system. Then $GORAM$ achieves anonymity.

Theorem 7 ( $A - GORAM$ ).

Let $Π_{PE}$ be an attribute-hiding predicate encryption scheme. Then $A - GORAM$ achieves secrecy.

Let $CH$ be a collision-resistant, key-exposure free chameleon hash function and $Π_{DS}$ be an existentially unforgeable digital signature scheme. Then $A - GORAM$ achieves accountable integrity.

Let $Π_{SE}$ be a CPA-secure private-key encryption scheme. Then $A - GORAM$ achieves obliviousness.

Theorem 8 ( $S - GORAM$ ).

Let $Π_{BE}$ be an adaptively secure broadcast encryption scheme and $Π_{SE}$ be a CPA-secure private-key encryption scheme. Then $S - GORAM$ achieves secrecy.

Let $CH$ be a collision-resistant, key-exposure free chameleon hash function and $Π_{DS}$ be an existentially unforgeable digital signature scheme. Then $S - GORAM$ achieves accountable integrity.

Let $Π_{SE}$ be a CPA-secure private-key encryption scheme. Then $S - GORAM$ achieves obliviousness.

10. Implementation and experiments

In this section, we present the concrete instantiations of the cryptographic primitives that we previously described (Section 10.1), we study their asymptotic complexity (Section 6.4), describe our implementation (Section 10.2), and discuss the experimental evaluation (Section 10.3).

10.1. Cryptographic instantiations

Encryption schemes. We use AES [29] as private-key encryption scheme with an appropriate message padding in order to achieve the elusive-range property [64].⁸

⁸ This property is formally necessary when proving the hybrid version of our constructions tamper-resistant. We refer to [69, Proof of Lemma 3] for the full proof in the hybrid version.

Furthermore, we employ the El Gamal encryption scheme [39] for public-key encryption. We use it for $PIR$ - $GORAM$ to construct an entry in the database (cf. $c_{Data}$ in (1) and $c_{BrCast}$ in (2)). It also fulfills all properties that we require for $GORAM$ , i.e., it is rerandomizable and supports zero-knowledge proofs. We review the scheme below.

${Gen}_{PKE} (1^{λ})$

Let $G$ be a cyclic group of prime order q and g be a generator of $G$ . Then, draw a random $x \in Z_{q}^{*}$ and compute $h = g^{x}$ . Output the pair $(dk, ek) = ((q, g, x), (q, g, h))$ .

$E (ek, m)$

In order to encrypt a message $m \in G$ using the public key $(q, g, h) \leftarrow ek$ , draw a random $r \in Z_{q}^{*}$ and output the ciphertext $c = (g^{r}, m h^{r})$ .

$D (dk, c)$

In order to decrypt a ciphertext $c = (c_{1}, c_{2})$ using the secret key $(q, g, x) \leftarrow dk$ , compute $m = c_{2} \cdot c_{1}^{- x}$ .

$Rnd (ek, c, r)$

In order to rerandomize a ciphertext $c = (c_{1}, c_{2})$ using the public key $(q, g, h) \leftarrow ek$ and randomness $r \in Z_{q}^{*}$ , output $c^{'} = (c_{1} \cdot g^{r}, c_{2} \cdot h^{r})$ .
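The four algorithms reviewed above can be exercised with a small Python sketch. It is purely illustrative and not our Java implementation: the toy parameters ($p = 2039$, $q = 1019$, $g = 4$, i.e., the subgroup of quadratic residues modulo a safe prime) are an assumption chosen for readability and offer no security, and the function names are hypothetical.

```python
import secrets

# Toy parameters (assumption): P = 2Q + 1 is a safe prime and GEN = 4
# generates the subgroup of prime order Q in Z_P^*. Not secure.
P, Q, GEN = 2039, 1019, 4


def rand_exponent() -> int:
    """Draw a random element of Z_q^* = {1, ..., q - 1}."""
    return secrets.randbelow(Q - 1) + 1


def gen():
    """Gen_PKE: output (dk, ek) = ((q, g, x), (q, g, h)) with h = g^x."""
    x = rand_exponent()
    return (Q, GEN, x), (Q, GEN, pow(GEN, x, P))


def enc(ek, m, r=None):
    """E(ek, m): output c = (g^r, m * h^r) for a message m in the subgroup."""
    q, g, h = ek
    r = rand_exponent() if r is None else r
    return pow(g, r, P), m * pow(h, r, P) % P


def dec(dk, c):
    """D(dk, c): output m = c2 * c1^(-x)."""
    q, g, x = dk
    c1, c2 = c
    # c1 has order q, hence c1^(-x) = c1^(q - x).
    return c2 * pow(c1, q - x, P) % P


def rnd(ek, c, r):
    """Rnd(ek, c, r): output c' = (c1 * g^r, c2 * h^r)."""
    q, g, h = ek
    c1, c2 = c
    return c1 * pow(g, r, P) % P, c2 * pow(h, r, P) % P
```

Rerandomizing a ciphertext changes both components but leaves the plaintext untouched, since $c^{'}$ is an encryption of m under the combined randomness; decrypting `rnd(ek, enc(ek, m), r)` with `dk` therefore returns m again.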

It is also possible to produce information with which one can decrypt a ciphertext $c = (c_{1}, c_{2})$ without knowing the secret key by sending $c_{1}^{- x}$ . This is necessary to give the server access to $c_{Auth}$ in $GORAM$ . In $PIR$ - $GORAM$ , we encrypt the signing keys of the Schnorr signature scheme [80] (cf. $c_{Auth}$ in (3)) using the Cramer–Shoup encryption scheme [27].

For $GORAM$ and $A - GORAM$ , we utilize the predicate encryption scheme introduced by Katz et al. [58]. Its ciphertexts are rerandomizable and we also show them to be compatible with the Groth–Sahai proof system [51]. For the details, we refer to Appendix C. Concerning the implementation, the predicate encryption scheme by Katz et al. [58] is not efficient enough since it relies on elliptic curves on composite-order groups. In order to reach a high security parameter, the composite-order setting requires us to use much larger group sizes than in the prime-order setting, rendering the advantages of elliptic curves practically useless. Therefore, we use a scheme transformation proposed by David Freeman [43], which works in prime-order groups and is more efficient. For implementing $S - GORAM$ we use an adaptively secure broadcast encryption scheme by Gentry and Waters [45].

Private information retrieval. We use XPIR [1], the state of the art in computational PIR.

Zero-knowledge proofs. We deploy several non-interactive zero-knowledge proofs. For $PIR$ - $GORAM$ , in order to implement the integrity proofs (cf. lines 4.20 and 4.26 in Section 4), we use an OR-proof [25] over a conjunction of plaintext-equivalence proofs [56] (PEP) on the El Gamal ciphertexts forming one entry and a standard discrete logarithm proof [80] showing that the client knows the signing key corresponding to the authenticated verification key. When improving the proof computation using our new technique based on the hash-and-proof paradigm,⁹ the conjunction of PEPs reduces to the computation of the homomorphic hash plus one PEP. As a matter of fact, since the public components necessary to verify a proof (the new and old ciphertexts and the verification key) and the secret components necessary to compute the proof (the randomness used for rerandomization or the signing key) are independent of the number of clients, the cost of all deployed proofs depends solely on the block size.

⁹ A careful analysis of the computation shows that the technique from batched shuffle proofs mapped to standard PEPs as we deploy them in $PIR$ - $GORAM$ , would end up in a worse solution.

In $GORAM$ , for proving that a predicate ciphertext validly decrypts to 1 without revealing the key, we use Groth–Sahai non-interactive zero-knowledge proofs¹⁰ [51]. More precisely, we apply them in the proofs created in line 10.14 ( $read$ and $write$ , see Algorithm 10 and Algorithm 11). We employ plaintext-equivalence proofs (PEPs) [56,80] for the proofs in line 10.16. Furthermore, we use a proof of shuffle correctness [10], batched shuffle proofs, and the hash-and-proof paradigm in lines 10.11 and 11.10.

¹⁰ Groth–Sahai proofs are generally not zero-knowledge. However, in our case the witnesses fulfill a special equation for which they are zero-knowledge.

Chameleon signatures. We use a chameleon hash function by Nyberg and Rueppel [5], which has the key-exposure freeness property. We complete the chameleon hash tags with SHA-256 for the ordinary hash function and combine both with RSA signatures [77].

Implementing permanent entries in $GORAM$ . We briefly outline how permanent entries can be implemented using El Gamal encryption and equality of discrete logarithm proofs [26]. Let $c_{p} = E (pk, permanent) = (G, H) = (g^{r}, g^{permanent} \cdot h^{r})$ be the ciphertext associated to the entry that is subject to change and $pk = (g, h)$ be the public key of the El Gamal scheme. If $permanent \neq 1$ then the entry may not be removed from the database completely. Hence, if $O$ attempts to remove an entry from the tree, she has to prove to $S$ that $permanent = 1$ . The following zero-knowledge proof serves this purpose, given that $permanent$ is encoded in the exponent of the message: $PK \{ (α) : H \cdot g^{- 1} = G^{α} \land h = g^{α} \}$ . Naturally, the re-randomization step as well as the shuffle proof step also apply to this ciphertext.
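The statement above holds exactly when $permanent = 1$ , since then $H \cdot g^{- 1} = h^{r} = G^{α}$ for the secret key $α$ with $h = g^{α}$ . The following Python sketch illustrates one way to realize such an equality-of-discrete-logarithms proof. It is a sketch under stated assumptions rather than our implementation: it uses a Chaum–Pedersen style protocol made non-interactive with a Fiat–Shamir hash (SHA-256), toy group parameters without any security, a challenge drawn from $\{1, \ldots, q - 1\}$ , and hypothetical function names.

```python
import hashlib
import secrets

# Toy parameters (assumption): g = 4 generates the subgroup of prime
# order Q = 1019 in Z_P^* for the safe prime P = 2039. Not secure.
P, Q, g = 2039, 1019, 4


def challenge(*vals) -> int:
    """Fiat-Shamir challenge in {1, ..., Q - 1} derived from the transcript."""
    data = "|".join(str(v) for v in vals).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (Q - 1) + 1


def encrypt_flag(h, permanent, r):
    """c_p = (G, H) = (g^r, g^permanent * h^r)."""
    return pow(g, r, P), pow(g, permanent, P) * pow(h, r, P) % P


def statement(H):
    """Y = H * g^(-1); g has order Q, so g^(-1) = g^(Q - 1)."""
    return H * pow(g, Q - 1, P) % P


def prove(alpha, h, G, H):
    """Prove knowledge of alpha with Y = G^alpha and h = g^alpha."""
    Y = statement(H)
    w = secrets.randbelow(Q - 1) + 1           # prover's random nonce
    a1, a2 = pow(G, w, P), pow(g, w, P)        # commitments
    e = challenge(g, h, G, Y, a1, a2)
    z = (w + e * alpha) % Q                    # response
    return a1, a2, z


def verify(h, G, H, proof) -> bool:
    """Check G^z = a1 * Y^e and g^z = a2 * h^e."""
    a1, a2, z = proof
    Y = statement(H)
    e = challenge(g, h, G, Y, a1, a2)
    return (pow(G, z, P) == a1 * pow(Y, e, P) % P
            and pow(g, z, P) == a2 * pow(h, e, P) % P)
```

If the flag encodes a value other than 1, the first verification equation fails for a prover who follows the protocol with the real key, because $H \cdot g^{- 1}$ then differs from $G^{α}$ by a non-trivial power of g.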

10.2. Java implementation

We implemented the six different versions of $GORAM$ in Java ( $PIR$ - $GORAM$ , $GORAM$ with off-the-shelf shuffle proofs, batched shuffle proofs, and shuffle proofs based on the hash-and-proof paradigm, $A - GORAM$ , and $S - GORAM$ ). Furthermore, we also implemented $A - GORAM$ and $S - GORAM$ on Amazon EC2. For the zero-knowledge proofs computed on predicate encryption, we build on a library [6] that implements Groth–Sahai proofs [51], which internally relies on jPBC/PBC [35,67].

Cryptographic setup. We use MNT curves [72] based on prime-order groups for primes of length 224 bits. This results in 112 bits of security according to different organizations [14]. We deploy AES with 128 bit keys and we instantiate the El Gamal and Cramer–Shoup encryption scheme, the RSA signature scheme, and the chameleon hash function with a security parameter of 2048 bits. According to NIST [14], this setup is secure until 2030.

10.3. Experiments

We evaluated the six different implementations. As a first experiment, we measured the computation times on client and server for the $read$ and $write$ operations for the constructions without accountable integrity. We performed these experiments on an Intel Xeon with 8 cores and 2.60 GHz in order to show the efficiency gained by using batched shuffle proofs instead of off-the-shelf zero-knowledge proofs of shuffle correctness. We vary different parameters: the database size from 128 MB to 2 GB ( $PIR$ - $GORAM$ ) and from 1 GB to 1 TB ( $GORAM$ , $A - GORAM$ , and $S - GORAM$ ), the block size from 4 KB to 1 MB, the number of clients from 1 to 10, the number of cores from 1 to 8, and for batched shuffle proofs also the number of iterations k from 1 to 128. For $GORAM$ , $A - GORAM$ , and $S - GORAM$ we fix a bucket size of 4 since Stefanov et al. [90] showed that this value is sufficient to prevent buckets from overflowing.

The second experiment focuses on the solution with accountability. Here we also measure the overhead introduced by our realization with respect to a state-of-the-art ORAM construction, i.e., the price we pay to achieve a wide range of security and privacy properties in a multi-client setting. Another difference from the first experiment is the hardware setup. We run the server side of the protocol in Amazon EC2 and the client side on a MacBook Pro with an Intel i7 and 2.90 GHz. We vary the parameters as in the previous experiment, except for the number of clients, which we vary from 1 to 100 for $A - GORAM$ and from 1 to 10000 for $S - GORAM$ , and the number of cores, which is limited to 4. In the experiments where the number of cores is not explicitly varied, we use the maximum number of cores available.

Fig. 11.

The end-to-end running time of an operation in $PIR$ - $GORAM$ .

10.4. Discussion

Figure 11 and Fig. 16 report the results for $PIR$ - $GORAM$ . Figure 11(a) shows the end-to-end and partial running times of an access to the ORAM when the flush algorithm is not executed, whereas Fig. 11(b) depicts the worst case running time (i.e., with flush operation). We assume a mobile LTE connection for the network, i.e., 100 Mbit/s downlink and 50 Mbit/s uplink in peak. For the example of the medical record which usually fits into 128 MB (resp. 256 MB for additional files such as X-ray images), the amortized times per access range from 11 (resp. 15) seconds for 4 KB up to 131 (resp. 198) seconds for 1 MB sized entries (see Fig. 11(c)).

Figure 16 shows the improvement as we compare the combined proof computation and proof verification time in the flush algorithm of $PIR$ - $GORAM$ , first as described in Section 4 and then with the integrity proof based on the hash-and-proof paradigm (see Section 6.5.2). We observe that our expectations are fulfilled: the larger the block size, the greater the effect of the hash computation, since the number of proofs to compute decreases. Concretely, with a 1 MB block size we gain a speed-up of about 4% for flush operations with respect to the construction without the homomorphic hash. The reason for this modest improvement is straightforward: the computations performed for computing the hash function and those performed for computing the single PEPs are the same. The only difference lies in the verification of the proof, which incurs one more modular exponentiation than the recomputation of the hash on the server side. Hence, there is an improvement, but it shows only for big block sizes, say, more than 1 MB. As we will see further below, the technique has a much larger effect on $GORAM$ .

Our solution $TAO - GORAM$ only adds access control to the actual computation of TaoStore’s trusted proxy [79]. Interestingly enough, TaoStore’s bottleneck is not computation, but communication. Hence, our modifications do not cause any noticeable slowdown on the throughput of TaoStore. Consequently, we end up with a throughput of about 40 operations per second when considering an actual deployment of $TAO - GORAM$ in a cloud-based setting [79].

Fig. 12.

The average execution time for the $read$ and $write$ protocol on client and server for varying B where $B N = 1$ GB and $G = 4$ .

Fig. 13.

The average execution time for the $read$ and $write$ protocol on client and server for varying $B N$ where $G = 4$ .

Fig. 14.

The average execution time for the $read$ and $write$ protocol on client and server for varying G where $B N = 1$ GB.

The results of the experiments for $GORAM$ , $A - GORAM$ , and $S - GORAM$ are reported in Fig. 12–19. As shown in Fig. 12(a), varying the block size has a linear effect in the construction without batched shuffle proofs. As expected, the batched shuffle proofs improve the computation time significantly (Fig. 12(b)). The new scheme even seems to be independent of the block size, at least for block sizes less than 64 KB. This effect is caused by the parallelization. Still, the homomorphic multiplication of the public-key ciphertexts before the batched shuffle proof computation depends on the block size (line 12.2). We do not depict individual results for $GORAM$ with proofs based on the hash-and-proof paradigm: the computation necessary is almost equivalent to that for batched shuffle proofs with $k = 1$ ; the only difference being the homomorphic pre-computation which is slightly more expensive. Hence, whenever we state something about batched shuffle proofs, the same holds for the hash-and-proof paradigm. Figure 12(c) and Fig. 12(d) show the results for $A - GORAM$ and $S - GORAM$ . Since the computation time is in practice almost independent of the block size, we can choose larger block sizes in the case of databases with large files, thereby allowing the client to read (resp. write) a file in one shot, as opposed to running multiple read (resp. write) operations. We identify a minimum computation time for 128 KB as this is the optimal trade-off between the index map size and the path size. The server computation time is low and varies between 15 ms and 345 ms, while client operations take less than 2 seconds for $A - GORAM$ and less than 1.3 seconds for $S - GORAM$ . As we obtained the best results for 4 KB in the experiments for $GORAM$ and 128 KB for the others, we use these block sizes in the sequel.

Fig. 15.

The average execution time for the $read$ and $write$ protocol on client and server for a varying number of cores where $B N = 1$ GB.

Fig. 16.

The improvement in percent when comparing the combined proof computation time on the client and proof verification time on the server for varying storage and block sizes, once without and once with the universal homomorphic hash.

Fig. 17.

Average execution time for the $read$ and $write$ protocol on client and server for $GORAM$ with batched shuffle proofs and varying k where $B N = 1$ GB, $B = 8$ KB, and $G = 4$ .

The results obtained by varying the storage size (Fig. 13) and the number of clients (Fig. 14) confirm what the computational complexity suggests. Nevertheless, it is interesting to see the tremendous improvement in computation time between $GORAM$ with and without batched shuffle proofs. The results obtained by varying the number of iterations k of the batched shuffle proof protocol are depicted in Fig. 17 and confirm the expected linear dependency. Smaller values of k are more efficient but higher values give a better soundness probability. Even more impressive, the technique based on the hash-and-proof paradigm speeds up $GORAM$ even further. As shown in Fig. 19(a), we gain one order of magnitude (14× on the client and 10.8× on the server for $k = 128$ ) since it is sufficient to compute a single proof rather than k. Notice that we do not gain a factor of 128 since the shuffle proofs are just one component of the $read$ and $write$ operations.

If we compare $A - GORAM$ and $S - GORAM$ in Fig. 14(c) and Fig. 14(d) we can see that $S - GORAM$ scales well to a large number of users as opposed to $A - GORAM$ . The good scaling behavior is due to the broadcast encryption scheme: for decryption, it only computes a constant number of pairings, independent of the number of users, while the opposite holds for predicate encryption. Nevertheless, we identify a linear growth in the times for $S - GORAM$ , which arises from the linear number of exponentiations that are computed. For instance, in order to write 128 KB in a 1 GB storage that is used by 100 users, $A - GORAM$ needs about 20 seconds while $S - GORAM$ only needs about 1 second. Even when increasing the number of users to 10000, $S - GORAM$ requires only about 4 seconds, a time that $A - GORAM$ needs for slightly more than 10 users.

Fig. 18.

The up-/download amount of data compared between Path-ORAM [90] and $S - GORAM$ for varying B while $B N = 1$ GB and $G = 4$ .

Figure 15 shows the results obtained by varying the number of cores. In $GORAM$ most of the computation, especially the zero-knowledge proof computation, can be easily parallelized. We observe this fact in both results (Fig. 15(a) and Fig. 15(b)). In the efficient construction we can parallelize the top-level encryption and decryption, the verification of the entries, and the predicate ciphertext decryption. Also in this case parallelization significantly improves the performance (Fig. 15(c) and Fig. 15(d)). Notice that we run the experiments in this case for 20 clients, as opposed to 4 as done for the other constructions, because the predicate ciphertext decryption takes the majority of the computation time and, hence, longer ciphertexts take longer to decrypt and the parallelization effect can be better visualized.

Finally, Fig. 19(b) compares $S - GORAM$ with the underlying Path-ORAM protocol. Naturally, since Path-ORAM only uses symmetric encryption, no broadcast encryption, and no verification with chameleon signatures, the computation time is much lower. However, the bottleneck of both constructions is actually the amount of data that has to be downloaded and uploaded by the client (Fig. 18). Given today’s bandwidths, uploading and downloading the data may take much longer than the computation. Here the overhead is only between 1.02% and 1.05%. For instance, assuming a mobile client using LTE (100 Mbit/s downlink and 50 Mbit/s uplink in peak), transferring 2 and 50 MB takes 480 ms and 12 s, respectively. Under these assumptions, considering a block size of 1 MB, we get a combined computation and communication overhead of 8% for $write$ and 7% for $read$ , which we consider a relatively low price to pay to get a wide range of security and privacy properties in a multi-client setting.
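The transfer times quoted above can be reproduced with a short calculation. The Python sketch below assumes decimal megabytes and that the same amount of data is first downloaded and then uploaded at the peak LTE rates; the function name and its parameters are hypothetical.

```python
def transfer_seconds(megabytes: float,
                     down_mbit_s: float = 100.0,
                     up_mbit_s: float = 50.0) -> float:
    """Time to download and then upload `megabytes` MB at the given rates."""
    megabits = megabytes * 8  # 1 MB = 8 Mbit
    return megabits / down_mbit_s + megabits / up_mbit_s
```

Under these assumptions, 2 MB amount to 16 Mbit, i.e., 0.16 s down plus 0.32 s up (480 ms in total), and 50 MB amount to 400 Mbit, i.e., 4 s down plus 8 s up (12 s in total).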

Fig. 19.

Comparison of the two integrity proof improvements and the overhead with respect to state-of-the-art ORAM.

11. Related work

Oblivious RAM. Oblivious RAM (ORAM) [47] is a technique originally devised to protect the access pattern of software on the local memory and thus to prevent the reverse engineering of that software. The observation is that encryption by itself prevents an attacker from learning the content of any memory cell but monitoring how memory is accessed and modified may still leak a great amount of sensitive information. While the first constructions were highly inefficient [47], recent groundbreaking research paved the way for a tremendous efficiency boost, exploiting ingenious tree based constructions [2,4,17,31,32,49,68,76,85,86,88], server side computations [53,71], and trusted hardware [13,54,66,79,87].

While a few ORAM constructions guarantee the integrity of user data [88,92], none of them is suitable to share data with potentially distrustful clients. Goodrich et al. [50] studied the problem of multi-client ORAM, but their attacker model does not include malicious, and potentially colluding, clients. Furthermore, their construction does not provide fine-grained access control mechanisms, i.e., either all members of a group have access to a given piece of data, or none does. Finally, this scheme does not allow the clients to verify the data integrity.

The fundamental problem in existing ORAM constructions is that all clients must have access to the ORAM key, which allows them to read and potentially disrupt the entire database. Hence, dedicated solutions tailored to the multi-client setting are required.

Multi-client ORAM. A few recent constructions gave positive answers to this question, devising ORAM constructions in the multi-client setting, which specifically allow the data owner to share data with other clients while imposing fine-grained access control policies. Although, at a first glance, these constructions share the same high-level goal, they actually differ in a number of important aspects. Therefore we find it interesting to draw a systematic comparison among these approaches (cf. Table 6). First of all, obliviousness is normally defined against the server, but in a multi-client setting it is important to consider it against the clients too (MC), since they might be curious or, even worse, collude with the server. This latter aspect is important, since depending on the application, the cloud administrator might create fake clients or just have common interests with one of the legitimate clients. Some constructions allow multiple data owners to operate on the same ORAM (MD), while others require them to use disjoint ORAMs: the latter are much less efficient, since if the client does not want to reveal the owner of the accessed entry (e.g., to protect her anonymity, think for instance of the doctor accessing the patient’s record), then the client has to perform a fake access to each other ORAM, thereby introducing a multiplicative factor of $O (m)$ , where m is the number of data owners. Some constructions require the data owner to periodically access the dataset in order to validate previous accesses (PI), some others rely on server-side client synchronization, which can be achieved for instance by a shared log on the server, a gossiping protocol among clients, etc. (CS), while others assume a trusted proxy (Pr). Among these, gossiping is the mildest assumption since it can be realized directly on the server side as described by [59]. Another aspect to consider is the possibility for the data owner to specify fine-grained access control mechanisms (AC). 
Finally, some constructions enable concurrent accesses to the ORAM (P). The final three columns compare the asymptotic complexity of server-side and client-side computations as well as communication.

Franz et al. pioneered the line of work on multi-client ORAM, introducing the concept of delegated ORAM [42]. The idea of this construction, based on simple symmetric cryptography, is to let clients commit their changes to the server and to let the data owner periodically validate them according to the access control policy, finally transferring the valid entries into the actual database. Assuming periodic accesses from the data owner, however, constrains the applicability of this technique. Furthermore, this construction does not support multiple data owners. Finally, it guarantees the obliviousness of access patterns with respect to the server as well as malicious clients, excluding however the accesses on data readable by the adversary. While excluding write operations is necessary (an adversary can clearly notice that the data has changed), excluding read operations is in principle not necessary and limits the applicability of the obliviousness definition: for instance, we would like to hide the fact that an oncologist accessed the PHR of a certain patient even from parties with read access to the PHR (e.g., the pharmacy, which can read the prescription but not the diagnosis).

Table 6
Comparison of the related work supporting multiple clients to our constructions. The abbreviations mean: MC: Oblivious against malicious clients, MD: Supports multiple data owners sharing their data in one ORAM, PI: Requires the periodic interaction with the data owner, CS: Requires synchronization among clients, AC: Access control, Pr: Trusted proxy, P: Parallel accesses, S comp.: Server computation complexity, C comp.: Client computation complexity, Comm.: Communication complexity

Work	MC	MD	PI	CS	Pr	AC	P	S comp.	C comp.	Comm.
Franz et al. [42]	✓	✗	✓	✗	✗	✓	✗	$O (\sqrt{n})$	$O (\sqrt{n})$	$O (\sqrt{n})$
( $A$ -/ $S$ -) $GORAM$ (this work)	✗	✗	✗	✓	✗	✓	✗	$O (log (n))$	$O (log (n))$	$O (log (n))$
$PIR$ - $GORAM$ (this work)	✓	✓	✗	✓	✗	✓	✗	$O (n)$	$O (\sqrt{n})$	$O (\sqrt{n})$
BCP-OPRAM [15]	✗	✗	✓	✓	✗	✗	✓	$Ω ({log}^{3} (n))$	$Ω ({log}^{3} (n))$	$Ω ({log}^{3} (n))$
CLT-OPRAM [22]	✗	✗	✓	✓	✗	✗	✓	$O ({log}^{2} (n))$	$O ({log}^{2} (n))$	$O ({log}^{2} (n))$
PrivateFS [93]	✗	✗	✗	✓	✗	✗	✓	$O ({log}^{2} (n))$	$O (1)$	$O ({log}^{2} (n))$
Shroud [66]	✗	✗	✗	✗	✓	✗	✓	$O ({log}^{2} (n))$	$O (1)$	$O ({log}^{2} (n))$
TaoStore [79]	✗	✗	✗	✗	✓	✗	✓	$O (log (n))$	$O (1)$	$O (log (n))$
$TAO - GORAM$ (this work)	✓	✓	✗	✗	✓	✓	✓	$O (log (n))$	$O (1)$	$O (log (n))$
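
As a rough sense of scale for the asymptotic terms in Table 6, the following sketch evaluates them at concrete database sizes. It assumes base-2 logarithms and drops all hidden constants; both are assumptions made here for illustration only, so the numbers indicate growth rates and not measured costs. The function name `table6_terms` is hypothetical.

```python
import math


def table6_terms(n: int) -> dict[str, float]:
    """Evaluate the asymptotic terms appearing in Table 6 at a concrete n.

    Assumption: logarithms are base 2 and every hidden constant is 1.
    """
    log_n = math.log2(n)
    return {
        "log n": log_n,
        "log^2 n": log_n ** 2,
        "log^3 n": log_n ** 3,
        "sqrt n": math.sqrt(n),
        "n": float(n),
    }


# Print the terms for a few database sizes (number of entries n).
for n in (2**10, 2**20, 2**30):
    terms = table6_terms(n)
    print(n, {name: round(value) for name, value in terms.items()})
```

For n = 2^20 the terms evaluate to 20, 400, 8000, 1024, and 1048576, respectively. Under these simplifying assumptions the cubic-logarithmic term still exceeds the square-root term at that size, which is a reminder that the asymptotic ordering in the table need not match the ordering at practical database sizes.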

Another line of work, summarized in the lower part of Table 6, focuses on the parallelization of client accesses, which is crucial to scale to a large number of clients, while retaining obliviousness guarantees. Most of these works [13,66,79,87] assume a trusted proxy performing accesses on behalf of users, with TaoStore [79] being the most efficient and secure among them. These constructions formally consider neither obliviousness against malicious clients nor access control, although a contribution of this work is to prove that a simple variant of TaoStore [79] guarantees both. Finally, instead of a trusted proxy, BCP-OPRAM [15] and CLT-OPRAM [22] rely on a gossiping protocol, while PrivateFS [93] assumes a client-maintained log on the server side; none of them achieves obliviousness against malicious clients or access control. Moreover, PrivateFS guarantees concurrent client accesses only if the underlying ORAM already does so.

Other multi-client approaches. Huang and Goldberg have recently presented a protocol for outsourced private information retrieval [53], which is obtained by layering a private information retrieval (PIR) scheme on top of an ORAM data layout. This solution is efficient and conceals client accesses from the data owner, but it does not give clients the possibility to update data. Moreover, it assumes ℓ non-colluding servers, which is due to the use of information-theoretic multi-server PIR.
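
To illustrate why information-theoretic multi-server PIR relies on non-colluding servers, the following is a minimal sketch of the classic two-server XOR-based scheme in the style of Chor et al. [24]. It is not the protocol of [53]; the database is simplified to a list of bits, and the function names `make_queries`, `answer`, and `reconstruct` are hypothetical.

```python
import secrets


def make_queries(n: int, index: int) -> tuple[list[int], list[int]]:
    """Client: build two selection vectors that differ only at `index`.

    Each vector on its own is uniformly random, so a single server
    learns nothing about `index`.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def answer(db: list[int], query: list[int]) -> int:
    """Server: XOR together the database bits selected by the query."""
    acc = 0
    for bit, selected in zip(db, query):
        if selected:
            acc ^= bit
    return acc


def reconstruct(a1: int, a2: int) -> int:
    """Client: the two answers differ exactly by the requested bit."""
    return a1 ^ a2


# Toy database of single-bit entries (assumption for illustration).
db = [1, 0, 1, 1, 0, 0, 1, 0]
q1, q2 = make_queries(len(db), 3)
assert reconstruct(answer(db, q1), answer(db, q2)) == db[3]
```

If the two servers compare their queries, the single position in which the vectors differ reveals the requested index, which is why the privacy guarantee holds only under the non-collusion assumption. Note also that in this sketch each server processes the entire database for every query.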

De Capitani di Vimercati et al. [33] proposed a storage service that uses a shuffle index structure to conceal access patterns over outsourced databases. The focus of their work is to study how indexing data in the storage can leak information to clients that are not allowed to access these data, although they are allowed to know the indices. In a follow-up work [34], the authors integrate access control into the shuffle index approach using selective encryption. However, this line of work achieves a weaker notion of obliviousness and does not consider verifiability.

Verifiable outsourced storage. Verifying the integrity of data outsourced to an untrusted server is a research problem that has recently received increasing attention in the literature. Schröder and Schröder introduced the concept of verifiable data streaming (VDS) and an efficient cryptographic realization thereof [81,82]. In a verifiable data streaming protocol, a computationally limited client streams a long string to the server, which stores the string in its database in a publicly verifiable manner. The client also has the ability to retrieve and update any element in the database. Papamanthou et al. [75] proposed a technique, called streaming authenticated data structures, that allows the client to delegate certain computations over streamed data to an untrusted server and to verify their correctness. Other related approaches are proofs-of-retrievability [84]–[89], which allow the server to prove to the client that it is actually storing all of the client’s data, verifiable databases [12], which differ from the previous ones in that the size of the database is fixed during the setup phase, and dynamic provable data possession [40]. None of the above considers the privacy of outsourced data. While some of the latest work has focused on guaranteeing the confidentiality of the data [91], to the best of our knowledge no existing paper in this line of research takes obliviousness into account.

Personal health records. Security and privacy concerns seem to be one of the major obstacles towards the adoption of cloud-based PHRs [18,30,94]. Different cloud architectures have been proposed [65], as well as database constructions [60,62], in order to overcome such concerns. However, none of these works takes into account the threat of a curious storage provider and, in particular, none of them enforces the obliviousness of data accesses.

12. Conclusion and future work

This paper introduces the concept of Group ORAM, which captures an unprecedented range of security and privacy properties in the cloud storage setting. Based on our definitional framework $Π_{GORAM}$ , we establish a lower bound on the server-side computational complexity, showing that any Group ORAM that is oblivious against malicious clients has to involve at least $Ω (n)$ computation steps, where $n$ is the number of entries in the database. We further present a novel cryptographic instantiation, which achieves an amortized communication overhead of $O (\sqrt{n})$ by combining private information retrieval technologies, a new accumulation technique, and an oblivious gossiping protocol. Access control is enforced by integrity proofs. We also show how to bypass our lower bound by leveraging a trusted proxy [79], thereby achieving logarithmic communication and server-side computational complexity. We then move to Group ORAM that is oblivious only against the server: the fundamental idea underlying our instantiation is to extend a state-of-the-art ORAM scheme [90] with access control mechanisms and integrity proofs while preserving obliviousness. To tackle the challenge of devising an efficient and scalable construction, we devise two novel zero-knowledge proof techniques for shuffle correctness as well as a new accountability technique based on chameleon signatures, both of which are generically applicable and thus of independent interest. Finally, we show how $Π_{GORAM}$ is an ideal solution for personal record management systems.

This work opens up a number of interesting research directions. Among those, it would be interesting to prove a lower bound on the communication complexity. Furthermore, we would like to relax the obliviousness property in order to bypass the lower bound established in this paper, coming up with more efficient constructions and quantifying the associated privacy loss. Finally, a further research goal is the design of cryptographic solutions allowing clients to learn only limited information (e.g., statistics) about the dataset.

Acknowledgments

This research is based upon work supported by the state of Bavaria at the Nuremberg Campus of Technology (NCT). NCT is a research cooperation between the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Technische Hochschule Nürnberg Georg Simon Ohm (THN). Dominique Schröder is supported by the German Federal Ministry of Education and Research (BMBF) through funding for the project PROMISE and by an Intel Early Career Faculty Honor Program Award. Furthermore, this work has been partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research (grant agreement No 771527-BROWSEC), by Netidee through the project EtherTrust (grant agreement 2158), and by the Austrian Research Promotion Agency through the Bridge-1 project PR4DLT (grant agreement 13808694) and COMET K1 SBA.

References

1. Aguilar-Melchor, Barrier, Fousse and M.-O. Killijian, XPIR: Private information retrieval for everyone, in: PETS’16, De Gruyter, 2016, pp. 155–174.

2. Ajtai, Oblivious RAMs without cryptographic assumptions, in: STOC’10, ACM, 2010, pp. 181–190. doi:10.1145/1806689.1806716.

3. I.E. Akkus, Chen, Hardt, Francis and Gehrke, Non-tracking Web analytics, in: CCS’12, ACM, 2012, pp. 687–698.

4. Apon, Katz, Shi and Thiruvengadam, Verifiable oblivious storage, in: PKC’14, LNCS, Springer, 2014, pp. 131–148.

5. Ateniese and de Medeiros, On the key exposure problem in Chameleon hashes, in: SCN’04, LNCS, Springer, 2004, pp. 165–179.

6. Backes, Lorenz and Pecina, Zero-knowledge Library, online at github.com/peloba/zk-library.

7. Backes, Lorenz, Maffei and Pecina, Anonymous Webs of trust, in: PETS’10, LNCS, Springer, 2010, pp. 130–148.

8. Backes, Maffei and Pecina, Automated synthesis of privacy-preserving distributed applications, in: NDSS’12, Internet Society, 2012.

9. Baldimtsi and Lysyanskaya, Anonymous credentials light, in: CCS’13, ACM, 2013, pp. 1087–1098. doi:10.1145/2508859.2516687.

10. Bayer and Groth, Efficient zero-knowledge argument for correctness of a shuffle, in: EUROCRYPT’12, LNCS, Springer, 2012, pp. 263–280.

11. Bellare, Desai, Pointcheval and Rogaway, Relations among notions of security for public-key encryption schemes, in: CRYPTO’98, LNCS, Springer, 1998, pp. 26–45.

12. Benabbas, Gennaro and Vahlis, Verifiable delegation of computation over large datasets, in: CRYPTO’11, LNCS, Springer, 2011, pp. 111–131.

13. Bindschaedler, Naveed, Pan, Wang and Huang, Practicing oblivious access on cloud storage: The gap, the fallacy, and the new way forward, in: CCS’15, ACM, 2015, pp. 837–849. doi:10.1145/2810103.2813649.

14. BlueKrypt, Cryptographic Key Length Recommendation, online at www.keylength.com.

15. Boyle, K.-M. Chung and Pass, Oblivious parallel RAM and applications, in: TCC’16, LNCS, Springer, 2016.

16. Camenisch, Kohlweiss and Soriente, An accumulator based on bilinear maps and efficient revocation for anonymous credentials, in: PKC’09, LNCS, Springer, 2009, pp. 481–500.

17. Carbunar and Sion, Regulatory compliant oblivious RAM, in: ACNS’10, LNCS, Springer, 2010, pp. 456–474.

18. Carrión Señor, L.J. Fernández-Alemán and Toval, Are personal health records safe? A review of free Web-accessible personal health record privacy policies, J. Med. Int. Res. 14(4) (2012), e114.

19. J.L. Carter and M.N. Wegman, Universal classes of hash functions (extended abstract), in: STOC’77, ACM, 1977, pp. 106–112. doi:10.1145/800105.803400.

20. Chaum, Crépeau and Damgård, Multiparty unconditionally secure protocols, in: STOC’88, ACM, 1988, pp. 11–19. doi:10.1145/62212.62214.

21. D.L. Chaum, Untraceable electronic mail, return addresses, and digital pseudonyms, Comm. ACM 24(2) (1981), 84–90. doi:10.1145/358549.358563.

22. Chen, Lin and Tessaro, Oblivious parallel RAM: Improved efficiency and generic constructions, in: TCC’16, LNCS, Springer, 2016.

23. Chen, Ekin Akkus and Francis, SplitX: High-performance private analytics, in: SIGCOMM’13, ACM, 2013, pp. 315–326. doi:10.1145/2486001.2486013.

24. Chor, Kushilevitz, Goldreich and Sudan, Private information retrieval, J. ACM 45(6) (1998), 965–981. doi:10.1145/293347.293350.

25. Cramer, Damgård and Schoenmakers, Proofs of partial knowledge and simplified design of witness hiding protocols, in: CRYPTO’94, LNCS, Springer, 1994, pp. 174–187.

26. Cramer, Gennaro and Schoenmakers, A secure and optimally efficient multi-authority election scheme, in: EUROCRYPT’97, LNCS, Springer, 1997, pp. 103–118.

27. Cramer and Shoup, A practical public key cryptosystem provably secure against adaptive chosen ciphertext attack, in: CRYPTO’98, LNCS, Springer, 1998, pp. 13–25.

28. Culnane and Schneider, A peered bulletin board for robust use in verifiable voting systems, in: CSF’14, IEEE Press, 2014, pp. 169–183.

29. Daemen and Rijmen, The Design of Rijndael, AES – the Advanced Encryption Standard, Springer, 2002.

30. Daglish and Archer, Electronic Personal Health Record Systems: A Brief Review of Privacy, Security, and Architectural Issues, World Congress on Privacy, Security, Trust and the Management of e-Business (2009), 110–120. doi:10.1109/CONGRESS.2009.14.

31. Damgård, Meldgaard and J.B. Nielsen, Perfectly secure oblivious RAM without random oracles, in: TCC’11, LNCS, Springer, 2011, pp. 144–163.

32. Dautrich, Stefanov and Shi, Burst ORAM: Minimizing ORAM response times for bursty access patterns, in: USENIX’14, USENIX Association, 2014, pp. 749–764.

33. De Capitani di Vimercati, Foresti, Jajodia, Paraboschi and Samarati, Private data indexes for selective access to outsourced data, in: WPES’11, ACM, 2011, pp. 69–80.

34. De Capitani di Vimercati, Foresti, Paraboschi, Pelosi and Samarati, Enforcing authorizations while protecting access confidentiality, Journal of Computer Security Preprint (2018), 1–33.

35. De Caro, jPBC – Java Library for Pairing Based Cryptography, online at http://gas.dia.unisa.it/projects/jpbc/.

36. Demers, Greene, Hauser, Irish, Larson, Shenker, Sturgis, Swinehart and Terry, Epidemic algorithms for replicated database maintenance, in: PODC’87, ACM, 1987, pp. 1–12.

37. Dingledine, Mathewson and Syverson, Tor: The second-generation onion router, in: USENIX’04, USENIX Association, 2004, pp. 303–320.

38. Dong and Chen, A fast single server private information retrieval protocol with low communication cost, in: ESORICS’14, LNCS, Vol. 8712, Springer, 2014, pp. 380–399.

39. El Gamal, A public key cryptosystem and a signature scheme based on discrete logarithms, in: CRYPTO’84, LNCS, Springer, 1985, pp. 10–18.

40. Erway, Küpçü, Papamanthou and Tamassia, Dynamic provable data possession, in: CCS’09, ACM, 2009, pp. 213–222. doi:10.1145/1653662.1653688.

41. Fiat and Shamir, How to prove yourself: Practical solutions to identification and signature problems, in: CRYPTO’86, Springer, 1987, pp. 186–194.

42. Franz, Carbunar, Sion, Katzenbeisser, Sotakova, Williams and Peter, Oblivious outsourced storage with delegation, in: FC’11, Springer, 2011, pp. 127–140.

43. D.M. Freeman, Converting pairing-based cryptosystems from composite-order groups to prime-order groups, in: EUROCRYPT’10, LNCS, Springer, 2010, pp. 44–61.

44. D.L. Gazzoni Filho and P.S.L.M. Barreto, Demonstrating Data Possession and Uncheatable Data Transfer, Cryptology ePrint Archive, Report 2006/150, 2006. http://eprint.iacr.org/.

45. Gentry and Waters, Adaptive security in broadcast encryption systems (with short ciphertexts), in: EUROCRYPT’09, LNCS, Springer, 2009, pp. 171–188.

46. Goldreich, Micali and Wigderson, How to play ANY mental game, in: STOC’87, ACM, 1987, pp. 218–229. doi:10.1145/28395.28420.

47. Goldreich and Ostrovsky, Software protection and simulation on oblivious RAMs, J. ACM 43(3) (1996), 431–473. doi:10.1145/233551.233553.

48. Goldwasser and Micali, Probabilistic encryption & how to play mental poker keeping secret all partial information, in: STOC’82, ACM, 1982, pp. 365–377. doi:10.1145/800070.802212.

49. M.T. Goodrich and Mitzenmacher, Privacy-preserving access of outsourced data via oblivious RAM simulation, in: ICALP’11, LNCS, Springer, 2011, pp. 576–587.

50. M.T. Goodrich, Mitzenmacher, Ohrimenko and Tamassia, Privacy-preserving group data access via stateless oblivious RAM simulation, in: SODA’12, SIAM, 2012, pp. 157–167.

51. Groth and Sahai, Efficient noninteractive proof systems for bilinear groups, SIAM J. Comp. 41(5) (2012), 1193–1232. doi:10.1137/080725386.

52. Heather and Lundin, The append-only web bulletin board, in: FAST’09, Springer, 2009, pp. 242–256.

53. Huang and Goldberg, Outsourced private information retrieval with pricing and access control, in: WPES’13, ACM, 2013.

54. Iliev and S.W. Smith, Protecting client privacy with trusted computing at the server, IEEE Security and Privacy 3(2) (2005), 20–28. doi:10.1109/MSP.2005.49.

55. Islam, Kuzu and Kantarcioglu, Access pattern disclosure on searchable encryption: Ramification, attack and mitigation, in: NDSS’12, Internet Society, 2012.

56. Jakobsson and Juels, Millimix: Mixing in Small Batches, Technical Report 99-33, DIMACS, 1999.

57. Jakobsson, Juels and R.L. Rivest, Making mix nets robust for electronic voting by randomized partial checking, in: USENIX’02, USENIX Association, 2002, pp. 339–353.

58. Katz, Sahai and Waters, Predicate encryption supporting disjunctions, polynomial equations, and inner products, in: EUROCRYPT’08, Springer, 2008, pp. 146–162.

59. B.H. Kim and Lie, Caelus: Verifying the consistency of cloud services with battery-powered devices, in: S&P’15, IEEE Press, 2015, pp. 880–896.

60. Korde, Panwar and Kalse, Securing Personal Health Records in Cloud using Attribute Based Encryption, Int. J. Eng. Adv. Tech. (2013).

61. Küsters, Truderung and Vogt, Accountability: Definition and relationship to verifiability, in: CCS’10, ACM, 2010, pp. 526–535. doi:10.1145/1866307.1866366.

62. Li, Yu, Ren and Lou, Securing personal health records in cloud computing: Patient-centric and fine-grained data access control in multi-owner settings, in: SECURECOMM’10, 2010.

63. Lindell and Pinkas, An efficient protocol for secure two-party computation in the presence of malicious adversaries, in: EUROCRYPT’07, LNCS, Springer, 2007, pp. 52–78.

64. Lindell and Pinkas, A proof of security of Yao’s protocol for two-party computation, J. Cryptology 22(2) (2009), 161–188. doi:10.1007/s00145-008-9036-8.

65. Löhr, A.-R. Sadeghi and Winandy, Securing the e-health cloud, in: IHI’10, ACM, 2010, pp. 220–229. doi:10.1145/1882992.1883024.

66. J.R. Lorch, Parno, Mickens, Raykova and Schiffman, Shroud: Ensuring private access to large-scale data in the data center, in: FAST’13, USENIX Association, 2013, pp. 199–214.

67. Lynn, PBC – C Library for Pairing Based Cryptography, online at http://crypto.stanford.edu/pbc/.

68. Maas, Love, Stefanov, Tiwari, Shi, Asanovic, Kubiatowicz and Song, PHANTOM: Practical oblivious computation in a secure processor, in: CCS’13, ACM, 2013, pp. 311–324. doi:10.1145/2508859.2516692.

69. Maffei, Malavolta, Reinert and D.S. Schröder, GORAM: Privacy, Access Control, and Verifiability in Group Outsourced Storage, 2014, Full version online at http://www.sps.cs.uni-saarland.de/publications/goram.pdf.

70. Maffei, Pecina and Reinert, Security and privacy by declarative design, in: CSF’13, IEEE Press, 2013, pp. 81–96.

71. Mayberry, E.-O. Blass and A.H. Chan, Efficient private file retrieval by combining ORAM and PIR, in: NDSS’14, Internet Society, 2013.

72. Miyaji, Nakabayashi and Takano, Characterization of elliptic curve traces under FR-reduction, in: ICISC’00, LNCS, Vol. 2015, Springer, 2001, pp. 90–108.

73. Ostrovsky and W.E. Skeith III, Algebraic Lower Bounds for Computing on Encrypted Data, ECCC 14(022) (2007).

74. Paillier, Public-key cryptosystems based on composite degree residuosity classes, in: EUROCRYPT’99, LNCS, Springer, 1999, pp. 223–238.

75. Papamanthou, Shi, Tamassia and Yi, Streaming authenticated data structures, in: EUROCRYPT’13, 2013.

76. Pinkas and Reinman, Oblivious RAM revisited, in: CRYPTO’10, LNCS, Springer, 2010, pp. 502–519.

77. R.L. Rivest, Shamir and Adleman, A method for obtaining digital signatures and public-key cryptosystems, Comm. ACM 21(2) (1978), 120–126. doi:10.1145/359340.359342.

78. D.S. Roche, Aviv and S.G. Choi, A practical oblivious map data structure with secure deletion and history independence, in: S&P’16, IEEE Press, 2016.

79. Sahin, Zakhary, El Abbadi, H.R. Lin and Tessaro, TaoStore: Overcoming asynchronicity in oblivious data storage, in: S&P’16, IEEE Press, 2016.

80. C.P. Schnorr, Efficient identification and signatures for smart cards, in: CRYPTO’89, LNCS, Springer, 1989, pp. 239–252.

81. Schröder and Schröder, Verifiable data streaming, in: CCS’12, ACM, 2012, pp. 953–964. doi:10.1145/2382196.2382297.

82. Schröder and Simkin, VeriStream – a framework for verifiable data streaming, in: FC’15, Springer, 2015.

83. Schwarz and E.L. Miller, Store, Forget, and Check: Using Algebraic Signatures to Check Remotely Administered Storage, 2006.

84. Shacham and Waters, Compact proofs of retrievability, in: ASIACRYPT’08, LNCS, Springer, 2008, pp. 90–107.

85. Shi, T.-H.H. Chan, Stefanov and Li, Oblivious RAM with O((log n)³) worst-case cost, in: ASIACRYPT’11, LNCS, Springer, 2011, pp. 197–214.

86. Stefanov and Shi, Multi-cloud oblivious storage, in: CCS’13, ACM, 2013, pp. 247–258. doi:10.1145/2508859.2516673.

87. Stefanov and Shi, ObliviStore: High performance oblivious cloud storage, in: S&P’13, IEEE Press, 2013, pp. 253–267.

88. Stefanov, Shi and Song, Towards practical oblivious RAM, in: NDSS’12, Internet Society, 2012.

89. Stefanov, van Dijk, Oprea and Juels, Iris: A Scalable Cloud File System with Efficient Integrity Checks, Cryptology ePrint Archive, Report 2011/585, 2011. http://eprint.iacr.org/.

90. Stefanov, van Dijk, Shi, Fletcher, Ren, Yu and Devadas, Path ORAM: An extremely simple oblivious RAM protocol, in: CCS’13, ACM, 2013.

91. van Dijk, Juels, Oprea, R.L. Rivest, Stefanov and Triandopoulos, Hourglass schemes: How to prove that cloud files are encrypted, in: CCS’12, ACM, 2012, pp. 265–280.

92. Williams, Sion and Carbunar, Building castles out of mud: Practical access pattern privacy and correctness on untrusted storage, in: CCS’08, ACM, 2008, pp. 139–148. doi:10.1145/1455770.1455790.

93. Williams, Sion and Tomescu, PrivateFS: A parallel oblivious file system, in: CCS’12, ACM, 2012, pp. 977–988. doi:10.1145/2382196.2382299.

94. K.T. Win, Susilo and Mu, Personal health record systems and their security protection, J. Med. Sys. 30(4) (2006), 309–315. doi:10.1007/s10916-006-9019-y.

95. A.C.-C. Yao, How to generate and exchange secrets, in: FOCS’86, IEEE Press, 1986, pp. 162–167.

Footnotes

¹ ORAM is a technique originally devised to protect the access pattern of software on the local memory and then used to hide the data and the user’s access pattern in storage outsourcing services.

² I.e., the server is regarded as a passive adversary, following the protocol but seeking to gather additional information.

³ Gossiping is necessary since we do not trust the server for consistency.

⁶ Since encrypting a long payload using predicate encryption is expensive, the concrete instantiation that we evaluate in Section 10 uses hybrid encryption instead.

⁷ For simplifying the notation, we assume for each encryption scheme that the public key is part of the secret key.

⁸ This property is formally necessary when proving the hybrid version of our constructions tamper-resistant. We refer to [69, Proof of Lemma 3] for the full proof in the hybrid version.