A multi-server oblivious dynamic searchable encryption framework

Abstract

Data privacy is one of the main concerns for data outsourcing on the cloud. Although standard encryption can provide confidentiality, it prevents the client from searching/retrieving meaningful information on the outsourced data, thereby degrading the benefits of using cloud services. To address this data utilization versus privacy dilemma, Dynamic Searchable Symmetric Encryption (DSSE) has been proposed. DSSE enables encrypted search and update functionality over the encrypted data via a secure index. However, the state-of-the-art DSSE constructions leak information from the access pattern, making them vulnerable to various attacks. While generic Oblivious Random Access Machine (ORAM) can hide the access pattern, it incurs a heavy communication overhead, which has been shown to be too costly for direct use in the DSSE setting. In this article, by exploiting the multi-cloud infrastructure, we develop a comprehensive Oblivious Distributed DSSE (ODSE) framework that allows oblivious search and updates on the encrypted index with high security and improved efficiency over the use of generic ORAM. Our framework contains a series of $ODSE$ schemes, each featuring different levels of performance and security required by various types of real-life applications. ODSE offers desirable security guarantees such as information-theoretic security and robustness in the presence of a malicious adversary. We fully implemented the $ODSE$ framework and evaluated its performance in a real cloud environment (Amazon EC2). Our experiments showed that ODSE schemes are $3 \times$ - $57 \times$ faster than using generic ORAMs on a DSSE encrypted index under real network settings.

Keywords

Searchable encryption, write-only ORAM, multi-server PIR, privacy-preserving clouds

1. Introduction

The concept of storage-as-a-service provides a comprehensive storage architecture for companies or individuals to store data on the cloud, thereby reducing the data management and maintenance cost. Despite its usefulness, recent data breach incidents on such systems have shown the importance of preserving the confidentiality of sensitive data stored on the cloud. Although standard encryption (e.g., AES) can preserve data privacy, it also prevents the users from searching or retrieving meaningful information from outsourced data, which invalidates some benefits of using cloud storage services. To address this data utilization vs. privacy dilemma, the concept of searchable symmetric encryption (SSE) was introduced [35]. Since then, many SSE schemes have been developed in attempts to offer practical query functionality while, at the same time, preserving user privacy and data confidentiality. In the following sections, we first review the ongoing research on SSE and outline the security limitations of state-of-the-art approaches.

1.1. State of the art and limitations

SSE. Song et al. were the first to propose the concept of SSE [35]. Later, Curtmola et al. [12] defined indistinguishability under the adaptive chosen keyword attack (IND-CKA2) as a formal security notion for SSE, and presented an IND-CKA2-secure scheme supporting single keyword search. The security is achieved by constructing a secure index (I) representing the relationship between keywords and encrypted files ( $F$ ), both of which ( $⟨ I, F ⟩$ ) are outsourced to the cloud. Several refinements based on this index model have been proposed to offer more functionality and query diversity such as ranked query [41] and/or multi-keyword search [8,39]. The main limitation of these constructions is that they are only static, meaning that they can only perform a search on the encrypted data with no update allowed after the setup. Kamara et al. were among the first to propose Dynamic Searchable Symmetric Encryption (DSSE) [25], which enables both search and update functionalities on encrypted files. After their studies, many DSSE schemes have been proposed, each offering distinct performance, functionality and security trade-offs [4,7,10,25,28,44,46].

Information leakages in SSE and limitations of other approaches. SSE without relying on the encrypted index has been shown to be vulnerable to many attacks [9,31]. On the other hand, although the encrypted index-based SSE is known to be more secure, it still leaks a lot of information that the adversary can exploit to conduct statistical attacks [23,27,45]. For instance, when the client performs a search on the encrypted index, the search token in DSSE might reveal the content of files to be updated in the future as well as all of the historical updates on files matching the token. Protection against these two leakages is referred to as forward-privacy and backward-privacy, respectively [37]. Zhang et al. showed that it is possible to learn which keywords have been searched in forward-insecure DSSE schemes through file-injection attacks [45]. Most efficient DSSE schemes [10,20,24,25] do not provide forward- and backward-privacy when searching on I. Although some forward- and backward-private DSSE schemes have been proposed recently (e.g., [6,7]), they rely on costly public key operations [17]. More severely, since the search, update and retrieval queries in DSSE are deterministic, all standard DSSE schemes leak access patterns on both I and $F$ . In particular, the client leaks the file-access pattern when updating a file or when retrieving a set of files matching the search query performed on the encrypted index. Similarly, the client leaks the index-access pattern when performing the search or update on the encrypted index. Liu et al. [27] demonstrated a practical attack that can determine which keywords are being searched by observing the search pattern.

To seal most of the access pattern leakage in DSSE, one can use a generic¹ Oblivious Random Access Machine (ORAM) technique [38] to conduct oblivious access on I and $F$ . Garg et al. [15] proposed TWORAM, which optimizes the use of ORAM to hide file access patterns in DSSE.² Despite its merits, prior studies such as [10,29] stated that generic ORAM [38] is too expensive to use in the DSSE setting due to its logarithmic client-bandwidth overhead. Although ORAM schemes with a constant bandwidth complexity have been introduced recently [14], they rely on costly cryptographic protocols (i.e., homomorphic encryption [30]), whose performance was shown to be worse than that of ORAMs with logarithmic bandwidth overhead [1]. Alternative solutions that try to avoid generic ORAM are either very costly or unable to seal access pattern leakage in DSSE [5,21].

¹ By generic ORAM, we mean a technique that can hide whether the access is a read or a write, as opposed to read-only Private Information Retrieval or Write-Only ORAM.

² It differs from the objective of this paper, where we focus on hiding access patterns on the encrypted index in DSSE (see Section 8 for clarification).

1.2. Our contributions

In DSSE, it is highly desirable to seal access pattern leakages when accessing the encrypted index (I) and encrypted files ( $F$ ). Since the size of individual files in $F$ can be arbitrarily large and each search/update query might involve a different number of files, to the best of our knowledge, generic ORAM seems to be the only option for oblivious access on $F$ . In this paper, we focus on oblivious access techniques on the index (I) that are more bandwidth-efficient than using generic ORAM (Fig. 1). Specifically, we propose $ODSE$ , a comprehensive oblivious encrypted index framework in the multi-server setting with application to DSSE. The framework contains three $ODSE$ schemes, ${ODSE}_{xor}^{wo}$ , ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ , each offering different performance and security properties as follows.

Fig. 1. Our research objective and high-level approach.

Full obliviousness with information-theoretic security : $ODSE$ seals information leakages when accessing the encrypted index that might lead to statistical attacks. Our constructions hide the index-access pattern, and therefore provide forward- and backward-privacy and secrecy of the query types (search/update). ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ offer computational security for the encrypted index as well as access operations on it. On the other hand, ${ODSE}_{it}^{wo}$ provides information-theoretic statistical security (see Section 5).

Low end-to-end delay : All $ODSE$ schemes offer low end-to-end delay and are $3 \times$ - $57 \times$ faster than using generic ORAM atop the DSSE encrypted index (with optimization [15]) under real network settings (see Section 8).

Robustness against malicious adversary : In the present work, we provide secure methods not only in the honest-but-curious setting but also in the malicious environment. Our $ODSE$ schemes offer various levels of robustness in the distributed setting. In the semi-honest setting, ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ are robust against corrupted servers that do not respond due to system/network failure. All $ODSE$ schemes can be extended to be secure against a malicious adversary. Specifically, the extended ${ODSE}_{xor}^{wo}$ scheme can detect whether there exists any malicious server in the system (but without knowing which server it is). The extended ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ schemes can not only detect which servers are malicious, but also remain robust against incorrect replies by malicious servers.

Full-fledged implementation and open-sourced framework : We fully implemented all the proposed $ODSE$ schemes, and evaluated their performance on real-cloud infrastructure. To the best of our knowledge, we are among the first to open-source an oblivious access framework for the encrypted index in DSSE (see Section 8). The code is available at https://github.com/thanghoang/ODSE.

Improvement over the IFIP DBSec’18 Conference Version [22]. This article is the extended version of [22], which includes the following improvements. First, we introduce a new $ODSE$ scheme called ${ODSE}_{ro}^{wo}$ , which is a hybrid scheme between ${ODSE}_{xor}^{wo}$ and ${ODSE}_{it}^{wo}$ originally presented in [22]. ${ODSE}_{ro}^{wo}$ inherits the best of both worlds, in that it features low search/update delay and robustness in the distributed setting simultaneously. Second, we extended all the proposed $ODSE$ schemes to the malicious setting, which was only discussed briefly in [22]. Third, we conducted more experiments to evaluate the performance of the new ${ODSE}_{ro}^{wo}$ scheme as well as all the extended $ODSE$ schemes in the malicious setting with different numbers of corrupted servers.

2. Preliminaries and building blocks

2.1. Notation

We denote a finite field as $F_{p}$ , where p is a prime. Operators $| |$ and ⊕ denote concatenation and XOR, respectively. ${⟨ \cdot ⟩}_{bin}$ denotes the binary representation. $[N]$ denotes $\{1, \dots, N\}$ . $u \cdot v$ denotes the dot product of two vectors u and v. $x \overset{\$}{\leftarrow} S$ denotes that x is selected uniformly at random from $S$ . Given I as a row/column of a matrix, $I [i]$ denotes accessing the i-th component of I. Given a matrix I, $I [*, j \dots j^{'}]$ denotes accessing columns j to $j^{'}$ of I. $I [i, *]$ and $I [*, j]$ denote accessing the entire row i and column j of I, respectively. Let $E = (Gen, Enc, Dec)$ be an IND-CPA symmetric encryption [26]: $κ \leftarrow E . Gen (1^{λ})$ generates a key with security parameter λ; $C \leftarrow E . {Enc}_{κ} (M, c)$ encrypts plaintext M with key κ and counter c; $M \leftarrow E . {Dec}_{κ} (C, c)$ decrypts ciphertext C with key κ and counter c.

Fig. 2. Shamir secret sharing (SSS) scheme [33].

2.2. Shamir secret sharing

We present the $(t, ℓ)$ -threshold Shamir Secret Sharing (SSS) scheme [33] in Fig. 2. Given a secret $b \in F_{p}$ to be shared, the dealer generates a random degree-t polynomial f with $f (0) = b$ and evaluates $f (x_{i})$ for each party $P_{i} \in \{P_{1}, \dots, P_{ℓ}\}$ at point $x_{i}$ , which is a non-zero element of $F_{p}$ . $x_{i}$ is used to identify party $P_{i}$ ( $SSS . CreateShare$ algorithm). We denote the share for $P_{i}$ as ${⟦ b ⟧}_{i}$ for $1 ⩽ i ⩽ ℓ$ . The secret can be reconstructed by combining at least $t + 1$ correct shares via Lagrange interpolation ( $SSS . Recover$ algorithm).

$SSS$ is a t-private secret sharing scheme in the sense that any combination of t shares leaks no information about the secret. $SSS$ offers homomorphic properties including addition, scalar multiplication, and partial multiplication. We extend the notion of an SSS-share of a value to indicate the share of a vector. That is, given a vector $v = (v_{1}, \dots, v_{n})$ , ${⟦ v ⟧}_{i} = ({⟦ v_{1} ⟧}_{i}, \dots, {⟦ v_{n} ⟧}_{i})$ indicates the share of v for party $P_{i}$ , in which each component of ${⟦ v ⟧}_{i}$ is the $SSS$ -share of the corresponding component of v.
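The share creation and recovery steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `create_shares` and `recover`, the prime `P`, and the choice of evaluation points are assumptions made for the example.

```python
import secrets

P = 2**61 - 1  # illustrative prime field size (assumption, not taken from the paper)

def create_shares(secret, t, xs, p=P):
    """Share `secret` using a random polynomial f of degree at most t with f(0) = secret."""
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t)]
    shares = {}
    for x in xs:
        y = 0
        for a in reversed(coeffs):  # Horner evaluation of f(x) mod p
            y = (y * x + a) % p
        shares[x] = y
    return shares

def recover(shares, p=P):
    """Lagrange interpolation at x = 0 from a dict {x_i: f(x_i)}."""
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = (num * (-xj)) % p
                den = (den * (xi - xj)) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret
```

With $t = 2$ and five parties, any three shares suffice, for example `recover({x: s[x] for x in (1, 3, 5)})` where `s = create_shares(42, 2, [1, 2, 3, 4, 5])`.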

2.3. Private information retrieval

The Private Information Retrieval (PIR) technique enables private retrieval of a data item from an (unencrypted) public database server. PIR in the distributed setting is defined as follows.

Fig. 3. XOR-based PIR [11].

Definition 1 (multi-server PIR [2,16]).

Let $b = (b_{1}, \dots, b_{n})$ be a database consisting of n items stored in ℓ servers. A multi-server PIR protocol consists of three algorithms as follows. Given an item $b_{j}$ in b to be retrieved, the client creates queries $(ρ_{1}, \dots, ρ_{ℓ}) \leftarrow PIR . CreateQuery (j)$ and distributes $ρ_{i}$ to server $S_{i}$ for each $i \in \{1, \dots, ℓ\}$ . Each server $S_{i}$ responds with an answer $r_{i} \leftarrow PIR . Retrieve (ρ_{i}, b)$ . Upon receiving the answers $R = \{r_{1}, \dots, r_{ℓ}\}$ , the client computes the value of item $b_{j}$ by invoking the reconstruction algorithm $b_{j} \leftarrow PIR . Reconstruct (R)$ .

A multi-server PIR is correct if the client can obtain the correct value of $b_{j}$ from ℓ answers via the $PIR . Reconstruct$ algorithm with probability 1. A multi-server PIR is t-private if $\forall j, j^{'} \in \{1, \dots, n\}$ , $\forall L \subseteq \{1, \dots, ℓ\}$ s.t. $| L | ⩽ t$ , the probability distributions of $\{ρ_{l \in L} : (ρ_{1}, \dots, ρ_{ℓ}) \leftarrow PIR . CreateQuery (j)\}$ and $\{ρ_{l \in L}^{'} : (ρ_{1}^{'}, \dots, ρ_{ℓ}^{'}) \leftarrow PIR . CreateQuery (j^{'})\}$ are identical.

We recall two efficient multi-server PIR protocols as follows.

XOR-based PIR [11]. It relies on the XOR trick to perform the private retrieval, in which the database b contains n items $b_{i}$ , each being interpreted as an m-bit string (Fig. 3).

$SSS$ -based PIR [2,16]. It relies on $SSS$ to improve the robustness of multi-server PIR, in which the database b contains n items $b_{i}$ , each being interpreted as an element of $F_{p}$ (Fig. 4).
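To make the XOR trick concrete, the following Python sketch shows one common way to realize an ℓ-server XOR-based PIR. It assumes the simple construction in which the per-server selection vectors XOR to the unit vector of the requested item; the details of Fig. 3 may differ. Items are modelled as Python integers standing in for m-bit strings, indices are 0-based, and all function names are hypothetical.

```python
import secrets
from functools import reduce

def create_query(j, n, num_servers):
    """Return one n-bit selection vector per server; their XOR is the unit vector e_j."""
    queries = [[secrets.randbelow(2) for _ in range(n)]
               for _ in range(num_servers - 1)]
    last = [0] * n
    last[j] = 1
    for q in queries:
        last = [a ^ b for a, b in zip(last, q)]
    return queries + [last]

def retrieve(query, db):
    """Server side: XOR together all items whose selection bit is 1."""
    return reduce(lambda acc, item: acc ^ item,
                  (item for bit, item in zip(query, db) if bit), 0)

def reconstruct(answers):
    """Client side: the XOR of all answers equals the requested item."""
    return reduce(lambda a, b: a ^ b, answers, 0)
```

Any ℓ − 1 of the vectors are independent uniform bit strings, which is the intuition behind $(ℓ - 1)$-privacy. All ℓ answers are needed to reconstruct, so a single unresponsive server breaks correctness.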

Fig. 4. $SSS$ -based PIR [2,16].
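The $SSS$ -based variant can be sketched along the same lines. In this illustrative Python example, the client secret-shares the unit vector of the requested index, each server returns the dot product of its share vector with the database over $F_{p}$ , and the client interpolates at zero. The helper names, the prime `P`, and the 0-based indexing are assumptions made for the example rather than details of Fig. 4.

```python
import secrets

P = 2**61 - 1  # illustrative prime (assumption)

def share(value, t, xs, p=P):
    """Shamir shares of `value` at the points xs, using a random polynomial of degree <= t."""
    coeffs = [value % p] + [secrets.randbelow(p) for _ in range(t)]
    return {x: sum(a * pow(x, k, p) for k, a in enumerate(coeffs)) % p for x in xs}

def recover(points, p=P):
    """Lagrange interpolation at 0 from {x_i: y_i}."""
    total = 0
    for xi, yi in points.items():
        num = den = 1
        for xj in points:
            if xj != xi:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

def create_query(j, n, t, xs):
    """Share each component of the unit vector e_j; return one share vector per server."""
    per_item = [share(1 if k == j else 0, t, xs) for k in range(n)]
    return {x: [s[x] for s in per_item] for x in xs}

def retrieve(query_vec, db, p=P):
    """Server side: dot product of the share vector with the (public) database."""
    return sum(q * b for q, b in zip(query_vec, db)) % p
```

Because the database entries are plain field elements, each answer is a point on a polynomial of degree at most t, so any $t + 1$ answers suffice. The remaining servers may stay silent without affecting correctness.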

Write-Only ORAM. ORAM allows the user to hide access patterns when accessing their encrypted data on the cloud. In contrast to generic ORAM where both read and write operations are hidden, Blass et al. [3] proposed a Write-Only ORAM scheme, which only hides the write pattern in the context of hidden volume encryption. Intuitively, $2 n$ memory slots are used to store n blocks, each assigned to a distinct slot and a position map is maintained to keep track of block’s location. Given a block to be rewritten, the client reads $O (λ)$ slots chosen uniformly at random and writes the block to a dummy slot among $O (λ)$ slots. Data in all slots are encrypted to hide which slot is updated. By selecting λ sufficiently large, one can achieve a negligible failure probability, which might occur when all λ slots are non-dummy. It is also possible to select a small λ. In this case, the client maintains a stash component S of size $O (log n)$ to temporarily store blocks that cannot be rewritten when all read slots are full.
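The write procedure described above can be sketched as follows. This is a simplified Python model and not the construction of [3]: encryption and re-encryption of the touched slots are omitted, the server is simulated by a local list, and the class and attribute names are hypothetical.

```python
import secrets

class WriteOnlyORAM:
    def __init__(self, n, lam):
        self.slots = [None] * (2 * n)      # simulated server storage: 2n slots for n blocks
        self.free = set(range(2 * n))      # client-side view of dummy (free) slots
        self.pos = {}                      # position map: block id -> slot index
        self.stash = {}                    # blocks that could not be written yet
        self.lam = lam
        self.rng = secrets.SystemRandom()

    def write(self, block_id, data):
        # The old copy becomes a dummy slot; this is client-side bookkeeping only.
        old = self.pos.pop(block_id, None)
        if old is not None:
            self.free.add(old)
        self.stash[block_id] = data
        # Touch lam distinct slots chosen uniformly at random, independent of block_id.
        touched = self.rng.sample(range(len(self.slots)), self.lam)
        for s in touched:
            if s in self.free and self.stash:
                bid, blk = self.stash.popitem()
                self.slots[s] = blk
                self.pos[bid] = s
                self.free.discard(s)
            # A real scheme re-encrypts every touched slot here, changed or not.
        return touched

    def read(self, block_id):
        if block_id in self.stash:
            return self.stash[block_id]
        return self.slots[self.pos[block_id]]
```

The set of touched slots is drawn before looking at which block is written, so the observable access pattern of a write does not depend on the block identifier. Blocks that find no free slot simply stay in the stash until a later write.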

3. System and security models

In this section, we present the system and security models of our framework.

3.1. System model

Our system model comprises a client and ℓ servers $S = (S_{1}, \dots, S_{ℓ})$ , each storing a version of the encrypted index. The encrypted files are stored on a separate server different from $S$ (as in [21]), which can be obliviously accessed via a generic ORAM scheme [32,38]. In this paper, we focus only on oblivious access on distributed encrypted index $I$ on $S$ . We present the definition of $ODSE$ as follows.

Definition 2.
An Oblivious Distributed Dynamic Searchable Symmetric Encryption ( $ODSE$ ) scheme is a tuple of one algorithm and two protocols $ODSE = (Setup, Search, Update)$ , where the inputs and outputs of the client and the servers are separated by a semicolon, such that:
$(σ, I) \leftarrow Setup (F)$ : Given a set of files $F$ as input, the algorithm outputs a distributed encrypted index $I$ and a client state σ.

$(R; ⊥) \leftarrow Search (w, σ; I)$ : The client inputs a keyword w to be searched and the state σ; the servers input the distributed encrypted index $I$ . The protocol outputs to the client a set $R$ containing file identifiers, in which w appears.

$(σ^{'}; I^{'}) \leftarrow Update (f_{i d}, σ; I)$ : The client inputs the updated file $f_{i d}$ and a state σ; the servers input the distributed encrypted index $I$ . The protocol outputs a new state $σ^{'}$ and the updated index $I^{'}$ to the client and servers, respectively.

3.2. Security model

In our system, the client is trusted and the set of servers $S$ are untrusted. We first consider the servers to be semi-honest, meaning that they follow the protocol faithfully, but can record the protocol transcripts to learn information regarding the client’s access pattern. Later, we show that our framework can be extended to be secure against malicious servers that can tamper with the input data to compromise the correctness and the security of the system (Section 6). We allow up to $t < ℓ$ (privacy parameter) servers among $S$ to be colluding, meaning that they can share their own recorded protocol transcripts with each other. Formally, the security of $ODSE$ in the semi-honest setting can be defined as follows.

Definition 3 ( $ODSE$ security w.r.t. semi-honest adversary).

Let $\vec{o} = ({op}_{1}, \dots, {op}_{q})$ be an operation sequence, where ${op}_{i} \in \{Search (w, σ; I), Update (f_{i d}, σ; I)\}$ , w is a keyword to be searched and $f_{i d}$ is a file with identifier $i d$ whose relationship with unique keywords in the distributed encrypted index $I$ needs to be updated, and σ denotes the client state information. Let ${ODSE}_{j} (\vec{o})$ represent the $ODSE$ client’s sequence of interactions with server $S_{j}$ , given an operation sequence $\vec{o}$ .

Correctness: An $ODSE$ is correct if for any operation sequence $\vec{o}$ , $\{{ODSE}_{1}, \dots, {ODSE}_{ℓ}\}$ returns data consistent with $\vec{o}$ , except with negligible probability.

t-security: An $ODSE$ is t-secure if $\forall L \subseteq \{1, \dots, ℓ\}$ such that $| L | ⩽ t$ , for any two operation sequences ${\vec{o}}_{1}$ and ${\vec{o}}_{2}$ where $| {\vec{o}}_{1} | = | {\vec{o}}_{2} |$ , the views $\{{ODSE}_{l \in L} ({\vec{o}}_{1})\}$ and $\{{ODSE}_{l \in L} ({\vec{o}}_{2})\}$ observed by a coalition of up to t servers are computationally indistinguishable.

$ODSE$ operation obliviousness. By Definition 2, keyword search and file update are the two main operations in searchable encryption. Given that these operations might incur different procedures, we can trigger both search and update protocols for any actual action to achieve the operation obliviousness according to Definition 3. In this case, the server can guess (at best) with a probability of $\frac{1}{2}$ what operation the client is performing “in real” i.e. either search or update.

4. The proposed (semi-honest) $ODSE$ schemes

Intuition. In DSSE, keyword search and file update on I are read-only and write-only operations, respectively. This property permits us to leverage specific bandwidth-efficient oblivious access techniques for each operation such as multi-server PIR (for search) and Write-Only ORAM (for update) rather than using a generic ORAM. The second requirement is to identify a suitable data structure for I so that these bandwidth-efficient techniques can be adapted. In DSSE, forward index and inverted index are the ideal choices for the file update and keyword search operations, respectively as proposed in [20]. However, performing search and update on two isolated indexes will lead to inconsistency. The server might perform a synchronization to make two indices consistent; however, this will leak significant information regarding the client query and file content. Therefore, to avoid this problem, it is mandatory to seek a data structure, where both search index and update index can be integrated together. Fortunately, this can be achieved by harnessing a two-dimensional index (i.e., matrix), which allows keyword search and file update to be performed in two separate dimensions without creating any inconsistency at their intersections. This strategy permits us to perform computation-efficient (multi-server) PIR on one dimension, and communication-efficient (Write-Only) ORAM on the other dimension to achieve oblivious search and update, respectively.

In the following, we first describe the data structures used in the $ODSE$ framework, and then present the semi-honest $ODSE$ schemes in detail. We analyze the security of the $ODSE$ schemes and present their extension to the malicious setting in Section 5 and Section 6, respectively.

4.1. $ODSE$ data structures

Table 1
$ODSE$ symbols and notation

Symbol	Description
$N, M$	Maximum number of files and keywords in DB.
I	Incidence Matrix Index
$N^{'}$	Number of $(⌈ {log}_{2} p ⌉ - 1)$ -bit blocks (i.e., $N^{'} = ⌈ \frac{N}{⌈ {log}_{2} p ⌉ - 1} ⌉$ ).
$T_{f}, T_{w}$	Static hash tables for files and keywords.
$D$	Set of dummy (empty) columns
S	Stash to (temporarily) store column data
c	Column counter vector

Our index to be stored at the server(s) is an incidence matrix (I), where each cell ( $I [i, j] \in \{0, 1\}$ ) represents the relationship between the keyword indexed at row i and the file indexed at column j. So, each row of I represents the search result of a keyword and each column represents the content (i.e., keywords) of a file. Since we use Write-Only ORAM for file update, the number of columns in I is twice the maximum number of files that can be stored in the outsourced database. In other words, given N distinct files and M unique keywords in the database, our index is of size $M \times 2 N$ . At the client side, we leverage two position maps $T_{w}$ , $T_{f}$ to keep track of the location of keywords and files in I, respectively. They are of structure $T : = ⟨ key, value ⟩$ , where $key$ is a keyword or file ID and $value \leftarrow T [key]$ is the (row/column) index of $key$ in I. Due to Write-Only ORAM, the client maintains a stash component S to temporarily store columns that might not be written back during the update due to overflow. Table 1 summarizes the notation and symbols used to describe our schemes.
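As a plaintext illustration of this data structure (no encryption, no servers), the following Python sketch builds an $M \times 2 N$ incidence matrix with randomly assigned rows and columns and answers a keyword search by reading one row. The function names and the dictionary-based position maps are assumptions made for the example.

```python
import random

def build_index(files, M, N, rng=random):
    """files: dict file_id -> set of keywords. Returns (I, T_w, T_f)."""
    keywords = sorted({w for ws in files.values() for w in ws})
    assert len(keywords) <= M and len(files) <= N
    rows = rng.sample(range(M), len(keywords))      # random row per keyword
    cols = rng.sample(range(2 * N), len(files))     # random column per file (2N columns)
    T_w = dict(zip(keywords, rows))                 # keyword position map
    T_f = dict(zip(files, cols))                    # file position map
    I = [[0] * (2 * N) for _ in range(M)]
    for fid, ws in files.items():
        for w in ws:
            I[T_w[w]][T_f[fid]] = 1                 # keyword w appears in file fid
    return I, T_w, T_f

def search(I, T_w, T_f, w):
    """Read the row of keyword w and map set columns back to file identifiers."""
    row = I[T_w[w]]
    return {fid for fid, col in T_f.items() if row[col] == 1}
```

Columns that are not referenced by $T_{f}$ play the role of dummy columns. A file update only rewrites one column, while a search only reads one row, which is why the two operations never conflict.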

4.2. ${ODSE}_{xor}^{wo}$ : Fast $ODSE$

We introduce ${ODSE}_{xor}^{wo}$ , an $ODSE$ scheme that offers a low search delay by using the XOR trick. We present the setup algorithm of ${ODSE}_{xor}^{wo}$ as well as its oblivious search and update protocols as follows.

Fig. 5. ${ODSE}_{xor}^{wo}$ setup algorithm.

Fig. 6. ${ODSE}_{xor}^{wo}$ search protocol.

Setup. Fig. 5 presents the setup algorithm to construct the encrypted index in $ODSE$ . Specifically, it first initializes an unencrypted incidence matrix ( $I^{'}$ ) of size $M \times 2 N$ (line 1), and generates a master key used to derive the row keys that encrypt each row of $I^{'}$ (line 3). It extracts unique keywords from input files (line 4), assigns each keyword and file to a randomly selected row and column of $I^{'}$ (lines 6, 9), and then sets the value of each cell of $I^{'}$ according to the relationship between keywords and files (line 10). Finally, the algorithm generates a distinct key for each row of $I^{'}$ from the master key (line 12), and encrypts each cell of $I^{'}$ with a distinct pair of row key and column counter, resulting in an encrypted index I (line 14). We encrypt the index bit-by-bit and the resulting ciphertext of each input bit is also one bit long. This can be implemented by, for example, AES in CTR mode, where we generate a 128-bit pseudorandom stream key from the row key ( $τ_{i}$ ) and the column counter ( $j | | c_{j}$ ), but only XOR the plaintext bit with the most significant bit of the stream key. At the end of setup, the client sends a replica of I to the ℓ servers and keeps some information (i.e., $κ, T_{w}, T_{f}, c$ ) private.
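The bit-by-bit encryption can be illustrated with a short Python sketch. To stay within the standard library, it swaps in HMAC-SHA256 as the pseudorandom function in place of AES-CTR, so it is a stand-in for the idea and not the authors' implementation. The function names and the byte encodings of i, j and $c_{j}$ are assumptions.

```python
import hashlib
import hmac

def prf(key: bytes, msg: bytes) -> bytes:
    """Stand-in PRF (HMAC-SHA256); the paper suggests AES in CTR mode instead."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def row_key(master_key: bytes, i: int) -> bytes:
    """Derive the row key tau_i from the master key and the row index i."""
    return prf(master_key, b"row" + i.to_bytes(8, "big"))

def pad_bit(tau_i: bytes, j: int, c_j: int) -> int:
    """Most significant bit of the pseudorandom stream derived from (j || c_j)."""
    stream = prf(tau_i, j.to_bytes(8, "big") + c_j.to_bytes(8, "big"))
    return stream[0] >> 7

def enc_bit(tau_i: bytes, j: int, c_j: int, bit: int) -> int:
    """One-bit ciphertext: plaintext bit XOR pad bit."""
    return bit ^ pad_bit(tau_i, j, c_j)

dec_bit = enc_bit  # XOR with the same pad bit inverts the encryption
```

Incrementing the column counter $c_{j}$ on every rewrite of column j yields a fresh pad bit for each cell, which is what allows re-encrypted columns to look unrelated to their previous versions.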

Search. ${ODSE}_{xor}^{wo}$ harnesses XOR-based PIR on the row dimension of I to conduct the oblivious keyword search as shown in Fig. 6. The client first looks up the keyword position map to get the row index of the searched keyword (line 1). The client then creates XOR-PIR queries (line 2) and sends them to corresponding servers, each answering the client with the output of the PIR retrieval algorithm (line 4). Notice that the data is IND-CPA encrypted rather than being public as in the standard PIR model. Therefore, after recovering the row from the PIR retrieval (line 6), the client generates the row key (line 7) and then decrypts the row to obtain the search result (line 9).

Update. Recall that the content (i.e., keywords) of a file is represented by a column in I. Given a file $f_{i d}$ to be updated, ${ODSE}_{xor}^{wo}$ applies Write-Only ORAM mechanism on the column dimension of I to update keyword-file pairs in $f_{i d}$ as shown in Fig. 7. The client creates a new column representing the relationship between the updated file and keywords in the database (lines 2–3), and stores it in the stash (line 4). The client then randomly selects λ column indexes and requests an arbitrary server to transmit the corresponding columns of I (lines 5–6). The client generates row keys and decrypts λ columns (lines 7–10). The client overwrites dummy columns among λ columns with columns stored in the stash (lines 11–12). Finally, the client re-encrypts λ columns and sends them to ℓ servers (lines 18–20).

Fig. 7. ${ODSE}_{xor}^{wo}$ update protocol.

4.3. ${ODSE}_{ro}^{wo}$ : Robust $ODSE$

The described ${ODSE}_{xor}^{wo}$ scheme requires all ℓ servers in the system to answer the client. If one server does not reply due to system/network failure, the correctness of ${ODSE}_{xor}^{wo}$ no longer holds. We propose ${ODSE}_{ro}^{wo}$ , an $ODSE$ scheme that achieves robustness against unresponsive servers. ${ODSE}_{ro}^{wo}$ harnesses the t-out-of-ℓ property of $SSS$ , which maintains correctness even if some servers (i.e., up to $ℓ - t - 1$ ) do not answer. We define the setup algorithm along with the oblivious search and update protocols of the ${ODSE}_{ro}^{wo}$ scheme as follows.

Setup. ${ODSE}_{ro}^{wo}$ works over the index encrypted with IND-CPA encryption. Therefore, the setup algorithm of ${ODSE}_{ro}^{wo}$ is identical to that of ${ODSE}_{xor}^{wo}$ scheme as shown in Fig. 8.

Fig. 8. ${ODSE}_{ro}^{wo}$ setup algorithm.

Fig. 9. ${ODSE}_{ro}^{wo}$ search protocol.

Fig. 10. ${ODSE}_{ro}^{wo}$ update protocol.

Search. ${ODSE}_{ro}^{wo}$ harnesses the $SSS$ -based PIR protocol on the row dimension of I to conduct keyword search as shown in Fig. 9. Specifically, the client first retrieves the row index of the searched keyword from the keyword position map (line 1). The client then creates $SSS$ -based PIR queries (line 2) and sends them to the corresponding servers, each replying with the output of the $SSS$ -based PIR retrieval algorithm. Notice that the $SSS$ -based PIR retrieval algorithm performs the dot product between the client query and the database input via the scalar multiplication and additive homomorphic properties of SSS. This requires the database input to consist of elements of $F_{p}$ . Since each row in I is a uniformly random binary string of length $2 N$ due to IND-CPA encryption, the servers split each row of I into $2 N^{'}$ chunks ( $c_{k}$ ) of equal size such that $| c_{k} | < {log}_{2} p$ (line 6). The dot product is performed iteratively between the search query and the chunks from all rows of I (lines 7–8). After receiving answers from the ℓ servers, the client recovers all chunks of the searched row (lines 10–12) and finally decrypts the row to obtain the search result (lines 13–17).
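The chunking step can be sketched as follows. This Python example assumes a chunk size of $⌊ {log}_{2} p ⌋$ bits and right-pads the last chunk with zeros; the function names and the padding convention are illustrative assumptions, not details taken from Fig. 9.

```python
def split_row(bits, p):
    """Split a list of 0/1 bits into chunks that fit in F_p; returns (chunks, chunk_size)."""
    size = p.bit_length() - 1          # floor(log2 p); every chunk value is < 2**size <= p
    chunks = []
    for k in range(0, len(bits), size):
        block = bits[k:k + size]
        block = block + [0] * (size - len(block))   # right-pad the last chunk with zeros
        chunks.append(int("".join(map(str, block)), 2))
    return chunks, size

def join_chunks(chunks, size, length):
    """Inverse of split_row: expand each chunk to `size` bits and drop the padding."""
    bits = []
    for c in chunks:
        bits.extend(int(b) for b in format(c, "0%db" % size))
    return bits[:length]
```

For instance, with $p = 13$ the chunk size is 3 bits, so an 11-bit row becomes four field elements, each smaller than p.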

Update. ${ODSE}_{ro}^{wo}$ harnesses Write-Only ORAM mechanism on the column dimension of I to perform file update. Since the index I in ${ODSE}_{ro}^{wo}$ is identical to ${ODSE}_{xor}^{wo}$ , the update protocol of ${ODSE}_{ro}^{wo}$ is also identical to that of ${ODSE}_{xor}^{wo}$ (Fig. 10).

4.4. ${ODSE}_{it}^{wo}$ : Robust and information-theoretically secure $ODSE$

${ODSE}_{ro}^{wo}$ scheme relies on IND-CPA encryption for the encrypted index so that it only offers (at most) computational security. In this section, we introduce ${ODSE}_{it}^{wo}$ , an $ODSE$ scheme that can achieve the highest level of security (i.e., information-theoretic) for the index as well as any operations (search and update) on it. The main idea is to share the index with $SSS$ , and harness $SSS$ -based PIR to conduct private search. We describe the algorithms of ${ODSE}_{it}^{wo}$ as follows.

Setup. Fig. 11 presents the setup algorithm to construct the distributed index in ${ODSE}_{it}^{wo}$ . Specifically, it first constructs an (unencrypted) index ( $I^{'}$ ) representing keyword-file relationships as in other $ODSE$ schemes. Instead of encrypting $I^{'}$ with an IND-CPA encryption scheme, it creates the shares of $I^{'}$ with $SSS$ and distributes them to the corresponding servers. As discussed above, $SSS$ operates over elements of $F_{p}$ . Therefore, each row of $I^{'}$ must be split into $⌊ {log}_{2} p ⌋$ -bit chunks (line 4), and an $SSS$ share is computed for each chunk (line 5). As a result, the “encrypted” index in ${ODSE}_{it}^{wo}$ contains ℓ $SSS$ -shares of $I^{'}$ for ℓ servers, each being a matrix $I_{l}$ of size $M \times 2 N^{'}$ , where $I_{l} [i, j] \in F_{p}$ and $N^{'} = N / ⌊ {log}_{2} p ⌋$ . At the end of setup, the client sends $I_{l}$ to server $S_{l}$ and keeps the position maps (i.e., $T_{w}, T_{f}$ ) private.

Fig. 11. ${ODSE}_{it}^{wo}$ setup algorithm.

Fig. 12. ${ODSE}_{it}^{wo}$ search protocol.

Search. Similar to ${ODSE}_{ro}^{wo}$ , ${ODSE}_{it}^{wo}$ harnesses the $SSS$ -based PIR protocol on the row dimension of I to conduct the keyword search as presented in Fig. 12. Generally speaking, the client gets the row index to be searched from the keyword position map, creates $SSS$ -based PIR queries and sends them to the corresponding servers, each replying with the outputs of the $SSS$ -based PIR retrieval algorithm (lines 1–6). Notice that since the index stored on $S_{l}$ is a share matrix, each dot product computation in the $SSS$ -based PIR retrieval algorithm results in a share represented by a $2 t$ -degree polynomial. Therefore, the client needs to call the $SSS$ -based recover algorithm with the privacy parameter of $2 t$ (vs. t as in ${ODSE}_{ro}^{wo}$ ) to obtain the correct search result (line 8).
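The degree argument can be spelled out with a short derivation. The notation here is assumed for illustration only: $q_{k}$ denotes the sharing polynomial of the k-th component of the query (the unit vector $e_{i}$ for the searched row i), and $g_{k}$ denotes the sharing polynomial of chunk j in row k of $I^{'}$ . Both have degree at most t, so the answer polynomial h satisfies

$$h (x) = \sum_{k = 1}^{M} q_{k} (x) \, g_{k} (x), \qquad \deg h ⩽ 2 t,$$

$$h (0) = \sum_{k = 1}^{M} q_{k} (0) \, g_{k} (0) = \sum_{k = 1}^{M} e_{i} [k] \cdot I^{'} [k, j] = I^{'} [i, j].$$

Server $S_{l}$ returns the evaluation $h (x_{l})$ . Since h has degree at most $2 t$ , interpolating $h (0)$ requires at least $2 t + 1$ correct answers, which in turn means the scheme needs $ℓ ⩾ 2 t + 1$ servers. In ${ODSE}_{ro}^{wo}$ the index entries are constants rather than shares, so the answer polynomial has degree at most t and $t + 1$ answers suffice.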

Fig. 13. ${ODSE}_{it}^{wo}$ update protocol.

Update. Similar to other $ODSE$ schemes, ${ODSE}_{it}^{wo}$ harnesses Write-Only ORAM mechanism on the column dimension of the index for the oblivious file update as outlined in Fig. 13. Specifically, the client creates a column representing the relationship between the updated file and keywords in the database, and temporarily stores it in the stash (lines 1–4). In ${ODSE}_{it}^{wo}$ , each column of the share index $I_{l}$ on $S_{l}$ actually contains the share of $⌊ {log}_{2} p ⌋$ columns of the unencrypted index $I^{'}$ . Therefore, it suffices to read $λ^{'} = ⌈ \frac{λ}{⌊ {log}_{2} p ⌋} ⌉$ random columns of $I_{l}$ from $t + 1$ arbitrary servers to reconstruct λ columns of $I^{'}$ (lines 5–10). The update is similar to other $ODSE$ schemes, in which the client aggressively over-writes dummy columns of $I^{'}$ with columns stored in the stash (lines 11–12). Finally, the client creates new $SSS$ shares for the retrieved columns (lines 13–16) and writes them back to ℓ servers (lines 18–20).

5. Security analysis

Remark 1.
One might observe that search and update operations in $ODSE$ schemes are performed on the row dimension and the column dimension of the encrypted index, respectively. This access structure might enable the adversary to learn whether the operation is search or update, even though each operation is secure. Therefore, to achieve security as in Definition 3, where the query type should also be hidden, we can trigger both search and update protocols (one of them is the dummy operation) regardless of whether the intended action is search or update.

We argue the security of our proposed schemes as follows.
Theorem 1.
${ODSE}_{xor}^{wo}$ scheme is computationally $(ℓ - 1)$ -secure by Definition 3 .
Proof.
(Sketch) (i) Oblivious Search: ${ODSE}_{xor}^{wo}$ leverages XOR-based PIR and therefore achieves ( $ℓ - 1$ )-privacy for keyword search as proven in [11]. (ii) Oblivious Update: ${ODSE}_{xor}^{wo}$ employs Write-Only ORAM, which achieves negligible write failure probability and therefore offers statistical security, not counting the encryption. The index in ${ODSE}_{xor}^{wo}$ is IND-CPA encrypted, which offers computational security. Overall, the update access pattern of the ${ODSE}_{xor}^{wo}$ scheme is therefore computationally indistinguishable. ${ODSE}_{xor}^{wo}$ performs Write-Only ORAM with an identical procedure on all ℓ servers (e.g., the indexes of accessed columns are the same on the ℓ servers), and therefore server collusion does not affect the update privacy of ${ODSE}_{xor}^{wo}$ . (iii) ODSE Security: By Remark 1, ${ODSE}_{xor}^{wo}$ performs both search and update regardless of the actual operation. As analyzed, search is $(ℓ - 1)$ -private and the update pattern is computationally secure. Therefore, ${ODSE}_{xor}^{wo}$ achieves computational $(ℓ - 1)$ -security by Definition 3. □
Theorem 2.
${ODSE}_{ro}^{wo}$ scheme is computationally t-secure by Definition 3 .
Proof.
(Sketch) (i) Oblivious Search: ${ODSE}_{ro}^{wo}$ leverages an $SSS$ -based PIR protocol and therefore achieves t-privacy for keyword search due to the t-privacy property of $SSS$ as proven in [2,16]. (ii) Oblivious Update: Similar to ${ODSE}_{xor}^{wo}$ , ${ODSE}_{ro}^{wo}$ leverages Write-Only ORAM over an IND-CPA encrypted database, which offers computational security as shown in [3]. (iii) ODSE Security: By Remark 1, for each actual operation, the client triggers both search and update protocols. Given that search is t-private and the update pattern is computationally oblivious, the access pattern in ${ODSE}_{ro}^{wo}$ is computationally indistinguishable in the presence of t colluding servers. □
Theorem 3.
${ODSE}_{it}^{wo}$ scheme is information-theoretically (statistically) t-secure by Definition 3 .
Proof.
(Sketch) (i) Oblivious Search: ${ODSE}_{it}^{wo}$ leverages an SSS-based PIR protocol and therefore, achieves t-privacy for keyword search due to the t-privacy property of $SSS$ [16]. (ii) Oblivious Update: The index in ${ODSE}_{it}^{wo}$ is $SSS$ -shared, which is information-theoretically secure in the presence of t colluding servers. ${ODSE}_{it}^{wo}$ also employs Write-Only ORAM, which offers statistical security due to negligible write failure probability. Therefore in general, the update access pattern of ${ODSE}_{it}^{wo}$ scheme is information-theoretically (statistically) indistinguishable in the coalition of up to t servers. (iii) ODSE Security: By Remark 1, ${ODSE}_{it}^{wo}$ performs both search and update protocols regardless of the actual operation. As analyzed above, search is t-private and update pattern is statistically t-indistinguishable. Therefore, ${ODSE}_{it}^{wo}$ is information-theoretically (statistically) t-secure by Definition 3. □

6. $ODSE$ in the malicious setting

In previous sections, we have shown that $ODSE$ schemes offer a certain level of collusion-resiliency and robustness in the semi-honest setting where the servers follow the protocol faithfully. In some privacy-critical applications, it is necessary to achieve data integrity and robustness in the malicious environment, where the adversary can tamper with the query and data to compromise the correctness and privacy of the protocol. In this section, we show that our proposed semi-honest $ODSE$ schemes can be extended to be secure and robust against malicious adversaries.

To achieve integrity of the index and the server-computation, our main idea is to harness computational and information-theoretic message authentication code (MAC) techniques. We first provide the definition of computational and information-theoretic MAC as follows.

∙ Computational MAC [26]: Let $Σ = (Gen, Mac, Vrfy)$ be a secure keyed MAC scheme, where $θ \leftarrow Σ . Gen (1^{λ})$ generates a MAC key with security parameter λ; $μ \leftarrow Σ . {Mac}_{θ} (m)$ generates a tag for message $m \in \{0, 1\}^{*}$ with key θ; and $\{0, 1\} \leftarrow Σ . {Vrfy}_{θ} (m, μ)$ verifies whether the tag (μ) associated with the message (m) is valid (1) or invalid (0).

∙ Information-theoretic MAC [13]: Let $θ \overset{\$}{\leftarrow} F_{p}$ be a global MAC key, which is known only by the client. The MAC tag (μ) for each data block (b) is computed as $μ = θ \cdot b$ (over $F_{p}$ ). Given that the client maintains a consistent relationship between μ, b and θ while keeping them hidden from the adversary, the adversary cannot change b without changing μ and/or θ. Therefore, μ is secret-shared among the servers along with the shares of b. The verification is done by reconstructing the block (b) as well as its tag (μ) from the shares, and checking whether $μ = θ \cdot b$ holds at the end.
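As a small worked illustration of the information-theoretic MAC relation $μ = θ \cdot b$ , the sketch below tags a block, verifies it, and checks that the tag is linear in the block, which is what lets tags pass through the same linear server-side operations as the data. The prime and the choice of a non-zero key are assumptions of this example, and the function names are hypothetical.

```python
import secrets

P = 65521  # assumed 16-bit prime field


def it_mac_keygen():
    """Sample a global MAC key from F_p (non-zero here; an assumption of this sketch)."""
    return secrets.randbelow(P - 1) + 1


def it_mac_tag(theta, block):
    """Tag a field element: mu = theta * b over F_p."""
    return theta * block % P


def it_mac_verify(theta, block, tag):
    """Accept iff mu = theta * b holds over F_p."""
    return tag == theta * block % P


theta = it_mac_keygen()
b1, b2 = 1234, 4321
mu1, mu2 = it_mac_tag(theta, b1), it_mac_tag(theta, b2)

# A modified block no longer matches the old tag.
tamper_detected = not it_mac_verify(theta, (b1 + 1) % P, mu1)

# Linearity: the sum of two tags is a valid tag for the sum of the blocks.
linear = it_mac_verify(theta, (b1 + b2) % P, (mu1 + mu2) % P)
```

An adversary who does not know θ can only forge a matching pair by guessing, which succeeds with probability about $1 / p$ per attempt; this bound, rather than a computational assumption, is what the tag relies on.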

In the ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ schemes, we leverage the computational MAC scheme to achieve the integrity of the index encrypted by IND-CPA encryption. On the other hand, ${ODSE}_{it}^{wo}$ offers information-theoretic security since its index is secret-shared instead of IND-CPA encrypted. Therefore, we apply the information-theoretic MAC to this scheme to preserve its security level. We now present the extensions of the $ODSE$ schemes to the malicious setting in detail as follows.

6.1. $MD- {ODSE}_{xor}^{wo}$ : Maliciously-detectable ${ODSE}_{xor}^{wo}$

We present $MD- {ODSE}_{xor}^{wo}$ , the extended version of ${ODSE}_{xor}^{wo}$ from Section 4.2, which offers security against malicious adversary using the computational MAC. The verification allows the client to abort the protocol if he/she detects any malicious behaviors attempting to tamper with the encrypted index and/or the search/update query. Our $MD- {ODSE}_{xor}^{wo}$ protocols are defined as follows.

Setup. Fig. 14 presents the setup of the $MD- {ODSE}_{xor}^{wo}$ scheme with the MAC tag generation for the encrypted index. Generally speaking, it first generates the encrypted index I similar to the semi-honest ${ODSE}_{xor}^{wo}$ (line 1), and then generates a MAC key (line 2), followed by computing a matrix T containing the MAC tag for each $| μ |$ -bit block of each row of I (lines 3–5). In this context, each server in the system stores two matrices: the encrypted index I and the MAC matrix T.

Search. Fig. 15 presents the search protocol of $MD- {ODSE}_{xor}^{wo}$ , which is extended from the search protocol of semi-honest ${ODSE}_{xor}^{wo}$ to be secure against malicious adversary. Specifically, the client generates XOR-PIR queries for ℓ servers similar to the semi-honest ${ODSE}_{xor}^{wo}$ scheme (line 1). Each server performs the XOR-PIR retrieval on both the encrypted index (line 3) and the MAC components (line 4) using the same query received, and sends the result to the client. The client recovers the row of the encrypted index (line 6) as well as its corresponding tag (line 7). The client verifies each $| μ |$ -bit block with its corresponding tag (lines 8–10). If all the tags are valid, the client continues to decrypt the row to obtain the search result as in the semi-honest ${ODSE}_{xor}^{wo}$ scheme (line 11). Otherwise, the client aborts and notifies that at least one of the servers is malicious (line 10).
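The verified search flow above can be sketched as follows. This is an illustrative sketch and not the protocol of Fig. 15 verbatim: HMAC-SHA256 truncated to 8 bytes stands in for the computational MAC, tags are bound to the row and block position, and the block size and all function names are assumptions of the example.

```python
import hashlib
import hmac
import secrets

BLOCK = 4  # assumed block size in bytes
TAG = 8    # assumed (truncated) tag size in bytes


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def row_tags(key, row_index, row):
    """Concatenated tags, one per BLOCK-byte block, bound to the block position."""
    out = b""
    for j in range(0, len(row), BLOCK):
        msg = row_index.to_bytes(4, "big") + j.to_bytes(4, "big") + row[j:j + BLOCK]
        out += hmac.new(key, msg, hashlib.sha256).digest()[:TAG]
    return out


def pir_queries(row, num_rows, num_servers):
    """Random bit vectors whose XOR is the unit vector selecting `row`."""
    queries = [[secrets.randbelow(2) for _ in range(num_rows)] for _ in range(num_servers - 1)]
    last = [int(i == row) for i in range(num_rows)]
    for q in queries:
        last = [a ^ b for a, b in zip(last, q)]
    return queries + [last]


def pir_answer(query, matrix):
    """Server side: XOR of all rows selected by the query bits."""
    acc = bytes(len(matrix[0]))
    for bit, r in zip(query, matrix):
        if bit:
            acc = xor_bytes(acc, r)
    return acc


def search(row, key, servers):
    """servers: list of (I, T) pairs, one per server. Returns the verified encrypted row."""
    index0, tags0 = servers[0]
    queries = pir_queries(row, len(index0), len(servers))
    enc_row, tag_row = bytes(len(index0[0])), bytes(len(tags0[0]))
    for q, (index, tags) in zip(queries, servers):
        enc_row = xor_bytes(enc_row, pir_answer(q, index))
        tag_row = xor_bytes(tag_row, pir_answer(q, tags))
    if not hmac.compare_digest(tag_row, row_tags(key, row, enc_row)):
        raise ValueError("invalid MAC: at least one server misbehaved")
    return enc_row
```

The same query is applied to both I and T, so the XOR of the tag answers is exactly the tag row of the requested row; a mismatch tells the client that some reply was tampered with, but not which server produced it.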

Fig. 14.

$MD- {ODSE}_{xor}^{wo}$ setup algorithm. Extensions from its semi-honest version are highlighted.

Fig. 15.

$MD- {ODSE}_{xor}^{wo}$ search protocol. Extensions from its semi-honest version are highlighted.

Fig. 16.

$MD- {ODSE}_{xor}^{wo}$ update protocol. Extensions from its semi-honest version are highlighted.

Update. Fig. 16 presents the update protocol of $MD- {ODSE}_{xor}^{wo}$ extended from the semi-honest ${ODSE}_{xor}^{wo}$ for malicious security. Instead of downloading λ random 1-bit columns as in the semi-honest ${ODSE}_{xor}^{wo}$ , the client downloads λ random $| t |$ -bit columns as well as their corresponding MAC tags. Before decryption, the client verifies the integrity of the retrieved data using the MAC (lines 5–8). If any tag is invalid, the client aborts and notifies that at least one server is malicious (line 8). Otherwise, the client performs the update in the same way as the semi-honest ${ODSE}_{xor}^{wo}$ (line 9). Finally, the client creates new MAC tags for the re-encrypted columns and sends all of them to the ℓ servers to be updated (lines 10–14).

6.2. $MR- {ODSE}_{ro}^{wo}$ : Maliciously-robust ${ODSE}_{ro}^{wo}$

Since ${ODSE}_{ro}^{wo}$ relies on $SSS$ for oblivious search, we can extend it in various ways to not only detect but also be robust against a malicious adversary. One straightforward extension is to consider $SSS$ as a particular instance of a Reed-Solomon code, and then implement Reed-Solomon decoding techniques [19,42] to handle incorrect server replies. However, this approach can only handle a small number of malicious servers in the system (e.g., $t < ℓ / 3$ if using [42]), which might increase the deployment cost. Another approach is to harness the t-out-of-ℓ threshold property of $SSS$ along with the MAC technique presented in the previous section. The main idea is to select $(t + 1)$ answers among the ℓ answers from the servers to recover the encrypted search result and its MAC tags. If any MAC is invalid, we repeat the recovery process with a different set of $(t + 1)$ answers until all the tags are valid. This strategy offers detection capability and robustness against malicious behaviors given that the majority of the servers are honest (i.e., $t < ℓ / 2$ ). Therefore, we opt for this approach to design $MR- {ODSE}_{ro}^{wo}$ , the maliciously-robust version of ${ODSE}_{ro}^{wo}$ , as follows.

Setup. The index structure of $MR- {ODSE}_{ro}^{wo}$ is identical to that of $MD- {ODSE}_{xor}^{wo}$ . Thus, its setup algorithm is identical to that of $MD- {ODSE}_{xor}^{wo}$ , where the MAC tag is created for each $| t |$ -bit blocks in each row of the encrypted index (Fig. 17).

Search. Fig. 18 outlines the search protocol of $MR- {ODSE}_{ro}^{wo}$ extended from that of ${ODSE}_{ro}^{wo}$ for malicious security. For each oblivious keyword search, the client creates an $SSS$ -based PIR query as in the semi-honest ${ODSE}_{ro}^{wo}$ (line 1), and the servers perform the $SSS$ -based PIR retrieval on both the encrypted index (line 3) and the MAC components (line 4). Once it has received answers from the ℓ servers, the client picks $t + 1$ out of the ℓ replies (lines 6–7), and performs the $SSS$ recovery via Lagrange interpolation to obtain the encrypted search row (line 8) as well as its MAC tag (lines 9–14). The client verifies the integrity of the encrypted row and decrypts it if all MAC tags are valid. If any tag is invalid, the client selects another set of $t + 1$ replies and repeats the verification process. If the client has tried all possible sets, which incurs (in total) $\binom{ℓ}{t + 1}$ verification tests, and none produces all valid tags, the client aborts the protocol and notifies that a majority of servers ( $t > ℓ / 2$ ) is corrupted (line 13).

Update. The update protocol in $MR- {ODSE}_{ro}^{wo}$ is similar to that of $MD- {ODSE}_{xor}^{wo}$ (Fig. 19). To improve the robustness against malicious adversary, the client can request ℓ servers to transfer λ $| t |$ -bit columns, and selects one of ℓ replies to verify the integrity and performs the update.

Fig. 17.

$MR- {ODSE}_{ro}^{wo}$ setup algorithm.

6.3. $MR- {ODSE}_{it}^{wo}$ : Maliciously-robust and IT-secure ${ODSE}_{it}^{wo}$

In this section, we present $MR- {ODSE}_{it}^{wo}$ , the extended version of ${ODSE}_{it}^{wo}$ that inherits all properties of ${ODSE}_{it}^{wo}$ (e.g., information-theoretic security) along with the robustness against malicious adversary. To preserve the information-theoretic security, we use the information-theoretic MAC as defined above for each block. The details are as follows.

Fig. 18.

$MR- {ODSE}_{ro}^{wo}$ search protocol. Extensions from its semi-honest version are highlighted.

Fig. 19.

$MR- {ODSE}_{ro}^{wo}$ update protocol.

Setup. $MR- {ODSE}_{it}^{wo}$ follows the principles in the semi-honest ${ODSE}_{it}^{wo}$ scheme to create the share index (Fig. 20, line 1). It then creates a global MAC key by selecting a random element in $F_{p}$ (line 2). It multiplies the representative element in $F_{p}$ of each index block with the global MAC key over $F_{p}$ yielding the MAC tag, and then creates the $SSS$ shares for each tag (line 3). The $SSS$ shares of MAC tags are distributed along with the share index across ℓ servers.

Search. Fig. 21 presents the search protocol of $MR- {ODSE}_{it}^{wo}$ extended from that of ${ODSE}_{it}^{wo}$ for malicious security. The extension follows the line of the $MR- {ODSE}_{ro}^{wo}$ scheme. Specifically, the servers perform $SSS$ -based PIR retrieval on both the index and the MAC components (lines 3–4). The client picks $2 t + 1$ out of the ℓ replies to recover and verify the integrity of the search result (lines 6–7). If none of the $\binom{ℓ}{2 t + 1}$ trials with different subsets produces valid tags, the client aborts the protocol and notifies that more than $ℓ / 3$ servers are malicious (line 7). Otherwise, the client continues to process the recovered data as in the semi-honest ${ODSE}_{it}^{wo}$ scheme to obtain the final search result (line 15).

Fig. 20.

$MR- {ODSE}_{it}^{wo}$ setup algorithm. Extensions from its semi-honest version are highlighted.

Fig. 21.

$MR- {ODSE}_{it}^{wo}$ search protocol. Extensions from its semi-honest version are highlighted.

Fig. 22.

$MR- {ODSE}_{it}^{wo}$ update protocol. Extensions from its semi-honest version are highlighted.

Update. Fig. 22 presents the update protocol of $MR- {ODSE}_{it}^{wo}$ . Basically, the client downloads λ columns of the share index and their corresponding MAC from ℓ servers. The client selects $t + 1$ replies to recover and verify the integrity of downloaded data before performing update. If all tags are valid, the client performs the write-only ORAM procedure as in ${ODSE}_{it}^{wo}$ scheme, re-calculates the MAC tag for each block, and then creates new $SSS$ shares for each tag. Otherwise, the client aborts the protocol and notifies that a majority of servers is malicious.

7. Implementation

We fully implemented all $ODSE$ schemes in C++. We used Google Sparsehash library [36] to implement position maps $T_{f}$ and $T_{w}$ . We utilized Intel AES-NI library [18] to implement AES-CTR encryption/decryption in ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ schemes. We leveraged Shoup NTL library [34] for pseudo-random number generator and arithmetic operations over finite field. We used ZeroMQ library [43] for client-server communication. We used multi-threading technique to accelerate PIR computation at the server. The code is available at https://github.com/thanghoang/ODSE.

8. Performance evaluation

8.1. Configurations

Hardware and network settings. We used Amazon EC2 with r4.4xlarge instance for server(s), each equipped with 16 vCPUs Intel Xeon @ 2.3 GHz and 122 GB RAM. We used a laptop with Intel Core i5 @ 2.90 GHz and 16 GB RAM as the client. All machines ran Ubuntu 16.04. The client established a network connection with the server via WiFi connection. We used a real network setting, in which the download/upload throughput is 27/5 Mbps, respectively.

Dataset. We used subsets of the Enron dataset to build I containing from millions to billions of keyword-file pairs. The largest dataset contains around 300,000 files with 320,000 unique keywords. Our tokenization is identical to [29] so that our keyword distribution and query pattern are similar to [29].

Instantiation of compared techniques. We compared $ODSE$ with a standard DSSE scheme [10], and with the use of generic ORAM atop the DSSE encrypted index. The performance of all schemes was measured under the same setting and configuration. We configured the $ODSE$ schemes and their counterparts as follows.

ODSE : For the semi-honest setting, we deployed two servers for the ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ schemes, and three servers for the ${ODSE}_{it}^{wo}$ scheme. We selected $λ = 4$ for ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ , and $λ^{'} = 4$ with $F_{p}$ where p is a 16-bit prime for the ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ schemes. We note that selecting a larger p (e.g., $| p | = 64$ bits) can reduce the PIR computation time at the cost of bandwidth overhead due to the increased query size. We chose a 16-bit prime field to balance the computation and communication overhead. For the malicious setting, we first fixed the number of servers for the $MD- {ODSE}_{xor}^{wo}$ , $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ schemes to be two, three and four, respectively, to handle one adversary. We then increased the number of servers to allow more malicious servers (see Section 8.6 for details).

Standard DSSE : We selected one of the most efficient DSSE schemes by Cash et al. in [10] (i.e., the $Π_{2 lev}^{dyn}$ variant) to demonstrate the performance gap between $ODSE$ and the standard DSSE. We estimated the performance of $Π_{2 lev}^{dyn}$ using the same software/hardware environments and optimizations as $ODSE$ (e.g., parallelization, AES-NI acceleration). Note that we did not use the Java implementation of this scheme available in the Clusion library [40] for comparison due to its lack of hardware acceleration support (i.e., no AES-NI) and the difference between running environments (Java VM vs. C). Our estimation is conservative in that we used numbers that would be better than those of the Clusion library.

Using generic ORAM atop DSSE encrypted index : We selected non-recursive Path-ORAM [38] and Ring-ORAM [32], as $ODSE$ counterparts since they are the most efficient generic ORAM schemes for data outsourcing to date. Since we focus on encrypted index rather than encrypted files in DSSE, we did not explicitly compare our schemes with TWORAM [15] but instead used one of their techniques to optimize the performance of using generic ORAM on DSSE encrypted index. Specifically, we applied the selected ORAMs on the dictionary index as in [29] along with the round-trip optimization as in [15]. Note that these estimates are also conservative, where memory access delays were excluded, and cryptographic operations were optimized and parallelized for an objective comparison.

Fig. 23.

Latency of semi-honest $ODSE$ schemes and their counterparts.

8.2. Overall end-to-end delay in the semi-honest setting

Figure 23 presents the end-to-end delays of $ODSE$ schemes and their counterparts, where we performed both search and update protocols in $ODSE$ schemes to hide the actual type of operation (see Remark 1). $ODSE$ offers a higher security than standard DSSE at the cost of a longer delay. Nevertheless, $ODSE$ schemes are $3 \times$ – $57 \times$ faster than the use of generic ORAMs atop DSSE encrypted index to hide the access patterns. Specifically, with an encrypted index consisting of ten billions of keyword-file pairs, $Π_{2 lev}^{dyn}$ cost 36 milliseconds and 600 milliseconds to finish a search and update operation, respectively. ${ODSE}_{xor}^{wo}$ and ${ODSE}_{it}^{wo}$ , respectively, took 2.8 seconds and 8.6 seconds to accomplish both keyword search and file update operations, compared with 160 seconds by using Path-ORAM with the round-trip optimization [15].

We present the separate delay for the search and update operations in $ODSE$ schemes in Table 2. ${ODSE}_{xor}^{wo}$ is the most efficient in terms of search, whose delay was less than 1 second. This is due to the fact that ${ODSE}_{xor}^{wo}$ only triggers XOR operations and the size of the search query is minimal (i.e., a binary string). ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ are more robust (e.g., malicious tolerant) and one of which is more secure (e.g., information-theoretic security) than ${ODSE}_{xor}^{wo}$ at the cost of higher search delay (i.e., 4 seconds) due to its larger search query and $SSS$ arithmetic computations. ${ODSE}_{it}^{wo}$ is the slowest among the three $ODSE$ schemes since it requires three servers and, therefore, the client needs to transmit more data.

Table 2
Comparison of $ODSE$ and its counterparts for oblivious access on I

Scheme	Forward privacy	Backward privacy	Hidden access pattern ‡	Encrypted index ∗	Search delay (s)	Update delay (s)	Privacy level †	Improved robustness †
Standard DSSE [10]	✗	✗	✗	Computational	0.036	0.62	–	–
Path-ORAM [38]	✓	✓	Computational	Computational	160.6 (combined with update)	(combined)	–	–
Ring-ORAM [32]	✓	✓	Computational	Computational	137.4 (combined with update)	(combined)	–	–
${ODSE}_{xor}^{wo}$	✓	✓	Computational	Computational	0.48	2.32	$ℓ - 1$	✗
${ODSE}_{ro}^{wo}$	✓	✓	Computational	Computational	3.45	1.85	$< ℓ$	✓
${ODSE}_{it}^{wo}$	✓	✓	Information theoretic	Information theoretic	4.54	4.08	$< ℓ / 2$	✓

This delay is for semi-honest setting with encrypted index containing 300,000 files and 320,000 keywords under the network and configuration presented in Section 8.1.

† ℓ is the number of servers in the system. We define the robustness in the distributed setting as the ability to tolerate unresponsive server(s) in the semi-honest setting or incorrect replies in the malicious setting. In ${ODSE}_{it}^{wo}$ , the encrypted index and the search query are $SSS$ -shared with the same privacy level. Generic ORAM solutions have a stronger adversarial model than ours because they are not vulnerable to collusion that arises in the distributed setting.

‡ All $ODSE$ schemes perform search and update protocols to hide the actual query type. In ${ODSE}_{ro}^{wo}$ , search is IT-secure due to $SSS$ -based PIR and update is computationally secure due to IND-CPA encryption. Hence, its overall security is computational.

∗ The encrypted index in ${ODSE}_{it}^{wo}$ is information-theoretically secure because it is $SSS$ -shared. Other schemes employ IND-CPA encryption so that their index is computationally secure (see Section 5).

For the oblivious file update, ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ achieved a similar delay since they have the same number of servers and incurred the same amount of data to be transmitted. ${ODSE}_{it}^{wo}$ is slightly slower than ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ because the client transmitted data to three servers instead of two. We can see that in many cases where it is not necessary to hide the operation type (search/update), using $ODSE$ to conduct individual oblivious operations, especially the keyword search, is much more efficient than generic ORAMs. We further provide a comparison of the $ODSE$ schemes with their counterparts in Table 2. In the following section, we dissect the end-to-end delay of the $ODSE$ schemes to understand which factors contribute the most to their performance.

Fig. 24.

Detailed Search (S) and Update (U) costs of semi-honest $ODSE$ schemes.

8.3. Detailed cost analysis

Figure 24 presents the detailed delays of separate keyword search and file update operations in $ODSE$ schemes. There are three main factors impacting the end-to-end delay of $ODSE$ schemes as follows.

Client processing : As shown in Fig. 24, the client computation contributes the least to the overall search delay (less than $10\%$ ) in all $ODSE$ schemes. It comprises the following operations: (i) Generate search queries with PRF in ${ODSE}_{xor}^{wo}$ or $SSS$ in the ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ schemes; (ii) $SSS$ recovery (in ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ ) and/or IND-CPA decryption (in ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ ); (iii) Filter dummy columns and collect columns in the stash. Note that the client delay of $ODSE$ schemes can be further reduced (by at least 50%–60%) via pre-computation of some values such as row keys and PIR queries (which only contain shares of 0 or 1). For the file update, the client performs either decryption followed by re-encryption on λ columns (in ${ODSE}_{xor}^{wo}$ and ${ODSE}_{ro}^{wo}$ ), or $SSS$ over $λ^{'}$ blocks (in ${ODSE}_{it}^{wo}$ ). Since we used crypto acceleration (i.e., Intel AES-NI) and highly optimized number theory libraries (i.e., NTL), all these computations only contributed a small fraction of the total delay.

Client-server communication : Data transmission is the dominant factor in the delay of $ODSE$ schemes. The communication cost of ${ODSE}_{xor}^{wo}$ is the smallest among all $ODSE$ schemes since the search query and the data transmitted from the servers are only binary strings. In the ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ schemes, each component of the search query vector is 16 bits. Their communication overhead can be reduced by using a smaller finite field at the cost of increased PIR computation on the server side.

Server processing : The cost of PIR operations in ${ODSE}_{xor}^{wo}$ is negligible as it uses XOR tricks. The PIR computation overhead in ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ is reasonable because it operates on a considerably large amount of 16-bit values. For the file update operations, the server-side cost is mainly due to memory accesses to overwrite some columns of the encrypted index. ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ schemes are highly memory access-efficient since we store their matrix-based index column-wise in the memory. This memory layout organization allows the inner product in PIR to access contiguous memory blocks thereby, minimizing the memory access delay not only in the update but also in the search. In ${ODSE}_{xor}^{wo}$ , we stored the matrix row-wise for row-friendly access to permit efficient XOR operations during search. However, this requires file update to access non-contiguous memory blocks. Hence, the file update in ${ODSE}_{xor}^{wo}$ incurred a higher memory access delay than that of ${ODSE}_{ro}^{wo}$ and ${ODSE}_{it}^{wo}$ as shown in Fig. 24.

8.4. Storage overhead

The main limitation of $ODSE$ schemes is the size of the encrypted index, whose asymptotic cost is $O (N \cdot M)$ , where N and M are the number of files and unique keywords, respectively. For the largest database in our experiments, the size of our encrypted index is 23 GB. The client storage includes two position maps of size $O (M \log M)$ and $O (N \log N)$ , the stash of size $O (M \cdot \log N)$ , a counter vector of size $Ω (N)$ and a master key (in the ${ODSE}_{xor}^{wo}$ scheme). Empirically, with the same database size discussed above, the client requires approximately 22 MB in all $ODSE$ schemes.
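A back-of-the-envelope check of the index size can be done as follows. The calculation assumes one bit per keyword-file cell and a doubled column count ( $2 N$ , mirroring the $M \times 2 N^{'}$ share matrix of the setup algorithm); both are assumptions of this sketch, and the dataset sizes are the approximate figures quoted in Section 8.1.

```python
def index_size_bytes(num_files, num_keywords, bits_per_cell=1, column_factor=2):
    """Size of an M x (column_factor * N) matrix with bits_per_cell bits per entry."""
    total_bits = num_keywords * column_factor * num_files * bits_per_cell
    return total_bits // 8


size = index_size_bytes(num_files=300_000, num_keywords=320_000)
size_gb = size / 10**9   # decimal gigabytes
size_gib = size / 2**30  # binary gibibytes
```

Under these assumptions the estimate is 24.0 GB (about 22.4 GiB), which is in line with the reported 23 GB given that the file and keyword counts are stated only approximately.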

Fig. 25.

Delay of semi-honest $ODSE$ schemes and their counterparts with different fraction of keywords/files involved in a search/update.

8.5. Experiment with various query sizes

We studied the performance of our schemes and their counterparts in the context of various keyword and file numbers involved in search and update operations that we refer to as “query size”. As shown in Fig. 25, $ODSE$ schemes are more efficient than using generic ORAMs when more than 5% of keywords/files in the database are involved in the search/update operations. Since the complexity of $ODSE$ schemes is linear to the number of keywords and files (i.e., $O (M + N)$ ), their delay is constant and independent from the query size. The complexity of ORAM approaches is $O (r {log}^{2} (N \cdot M))$ , where r is the query size. Although the bandwidth cost of $ODSE$ schemes is asymptotically linear, their actual delay is much lower than using generic ORAM, whose cost is poly-logarithmic to the total number of keywords/files but linear to the query size. This confirms the results of Naveed et al. in [29] on the performance limitations of generic ORAM and DSSE composition, wherein we used the same dataset for our experiments.

Fig. 26.

End-to-end delay of maliciously-secure $ODSE$ schemes in the presence of one malicious adversary.

8.6. $ODSE$ performance in the presence of malicious adversary

In this section, we present the performance of the maliciously-secure $ODSE$ schemes described in Section 6. Figure 26 presents the search and update delay of the $MD- {ODSE}_{xor}^{wo}$ , $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ schemes in the presence of one malicious adversary, compared with their corresponding semi-honest versions. Recall that in this setting, we set the number of servers in the system for the $MD- {ODSE}_{xor}^{wo}$ , $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ schemes to be two, three and four, respectively. We can see that the search delays of the maliciously-secure $ODSE$ schemes are around two times those of their semi-honest versions. This is mainly due to the additional processing and network transmission overhead for the MAC components stored at the server side, which have the same size as the encrypted index. The updates of the $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ schemes are around three times slower than those of their semi-honest versions. The main reason is that $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ require an extra server in the system to detect one malicious adversary, which increases the client bandwidth overhead.

Fig. 27.

Delay of maliciously-secure $ODSE$ schemes with varied number of malicious servers.

We also explored the performance of the maliciously-secure $ODSE$ schemes when the number of malicious servers increases. Allowing more servers to be malicious requires deploying more servers in the system. Specifically, the $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ schemes need $2 t + 1$ and $3 t + 1$ servers in total, respectively, to be robust against t malicious servers. Figure 27 presents the performance of the maliciously-secure $ODSE$ schemes with a varied number of corrupted servers. We can see that it is expensive to offer robustness against multiple malicious servers in the system. This is because it incurs not only client bandwidth overhead to communicate with more servers, but also client computation overhead. In the worst case, $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ require the client to perform $\binom{ℓ}{t + 1}$ and $\binom{ℓ}{2 t + 1}$ MAC verifications, respectively, to find an authentic $| t |$ -bit data block in the presence of (less than) t malicious servers. Since $MD- {ODSE}_{xor}^{wo}$ can only detect malicious behavior (without knowing which server is responsible), its overhead only increases slightly when allowing more servers to be malicious. This is because it only requires deploying more servers in the system, and the client aborts the protocol immediately when he/she finds an invalid MAC tag (without trying aggressively to find an alternative authentic block as in the $MR- {ODSE}_{ro}^{wo}$ and $MR- {ODSE}_{it}^{wo}$ schemes).
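The growth of the worst-case number of MAC verification rounds can be tabulated directly from the two binomial expressions above, taking the minimum deployments $ℓ = 2 t + 1$ and $ℓ = 3 t + 1$ . The function name below is a hypothetical helper for this worked example.

```python
from math import comb


def worst_case_tests(t):
    """Worst-case subset trials for MR-ODSE_ro (l = 2t + 1) and MR-ODSE_it (l = 3t + 1)."""
    servers_ro = 2 * t + 1
    servers_it = 3 * t + 1
    return comb(servers_ro, t + 1), comb(servers_it, 2 * t + 1)


for t in (1, 2, 3):
    print(t, worst_case_tests(t))
# prints:
# 1 (3, 4)
# 2 (10, 21)
# 3 (35, 120)
```

Even for three tolerated malicious servers the client may need dozens of recovery-and-verify rounds, which is consistent with the client computation overhead noted above.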

9. Conclusion

In this article, we present a novel Oblivious Distributed DSSE framework called $ODSE$ , which offers access pattern obliviousness, hidden size pattern, and low end-to-end delay for index access. These properties are achieved by exploiting unique characteristics of the index data structure and searchable encryption, which allows deploying computation- and bandwidth-efficient techniques (i.e., multi-server PIR and Write-Only ORAM) to conduct oblivious search and update separately. Our framework contains a series of $ODSE$ schemes, each featuring different levels of performance and security in terms of data confidentiality and access pattern obliviousness. Specifically, ${ODSE}_{xor}^{wo}$ offers the lowest end-to-end delay, the smallest bandwidth overhead and the highest resiliency against colluding servers. ${ODSE}_{it}^{wo}$ offers robustness and information-theoretic security for access patterns and the encrypted index. ${ODSE}_{ro}^{wo}$ inherits the best of both the ${ODSE}_{xor}^{wo}$ and ${ODSE}_{it}^{wo}$ schemes: low end-to-end delay and robustness in the distributed setting. All these schemes can also be extended to be secure/robust against a malicious adversary.

References

1. Abraham, C.W. Fletcher, Nayak, Pinkas and Ren, Asymptotically tight bounds for composing ORAM with PIR, in: IACR Public Key Cryptography, Springer, 2017, pp. 91–120.

2. Beimel and Stahl, Robust information-theoretic private information retrieval, in: International Conference on Security in Communication Networks, Springer, 2002, pp. 326–341.

3. E.-O. Blass, Mayberry, Noubir and Onarlioglu, Toward robust hidden volumes using write-only oblivious RAM, in: Proceedings of the 2014 ACM CCS, ACM, 2014, pp. 203–214.

4. Bösch, Hartel, Jonker and Peter, A survey of provably secure searchable encryption, ACM Computing Surveys (CSUR) 47(2) (2015), 18.

5. Bösch, Peter, Leenders, H.W. Lim, Tang, Wang, Hartel and Jonker, Distributed searchable symmetric encryption, in: Privacy, Security and Trust (PST), 12th International Conference on, IEEE, 2014, pp. 330–337.

6. Bost, Sophos – forward secure searchable encryption, in: Proceedings of the 2016 ACM Conference on Computer and Communications Security, ACM, 2016.

7. Bost, Minaud and Ohrimenko, Forward and backward private searchable encryption from constrained cryptographic primitives, Technical Report, IACR Cryptology ePrint Archive 2017, 2017.

8. Cao, Wang, Li, Ren and Lou, Privacy-preserving multi-keyword ranked search over encrypted cloud data, IEEE Transactions on Parallel and Distributed Systems 25(1) (2014), 222–233. doi:10.1109/TPDS.2013.45.

9. Cash, Grubbs, Perry and Ristenpart, Leakage-abuse attacks against searchable encryption, in: Proceedings of the 22nd ACM CCS, 2015, pp. 668–679.

10. Cash, Jaeger, Jarecki, C.S. Jutla, Krawczyk, M.-C. Rosu and Steiner, Dynamic searchable encryption in very-large databases: Data structures and implementation, IACR Cryptology ePrint Archive 2014 (2014), 853.

11. Chor, Kushilevitz, Goldreich and Sudan, Private information retrieval, Journal of the ACM (JACM) (1998).

12. Curtmola, Garay, Kamara and Ostrovsky, Searchable symmetric encryption: Improved definitions and efficient constructions, in: Proceedings of the 13th ACM CCS, ACM, 2006, pp. 79–88.

13. Damgård, Pastro, Smart and Zakarias, Multiparty computation from somewhat homomorphic encryption, in: Annual Cryptology Conference, Springer, 2012, pp. 643–662.

14. Devadas, van Dijk, C.W. Fletcher, Ren, Shi and Wichs, Onion ORAM: A constant bandwidth blowup oblivious RAM, in: Theory of Cryptography Conference, Springer, 2016, pp. 145–174. doi:10.1007/978-3-662-49099-0_6.

15. Garg, Mohassel and Papamanthou, TWORAM: Round-optimal oblivious RAM with applications to searchable encryption, IACR Cryptology ePrint Archive 2015 (2015), 1010.

16. Goldberg, Improving the robustness of private information retrieval, in: IEEE Symposium on Security and Privacy, IEEE, 2007, pp. 131–148.

17. M.D. Green and Miers, Forward secure asynchronous messaging from puncturable encryption, in: Security and Privacy (SP), 2015 IEEE Symposium on, IEEE, 2015, pp. 305–320. doi:10.1109/SP.2015.26.

18. Gueron, White paper: Intel Advanced Encryption Standard (AES) new instructions set, Document Revision 3.01, September 2012.

19. Guruswami and Sudan, Improved decoding of Reed–Solomon and algebraic-geometric codes, in: Foundations of Computer Science, 1998. Proceedings. 39th Annual Symposium on, IEEE, 1998, pp. 28–37.

20. Hahn and Kerschbaum, Searchable encryption with secure and efficient updates, in: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2014, pp. 310–320.

21. Hoang, Yavuz and Guajardo, Practical and secure dynamic searchable encryption via oblivious access on distributed data structure, in: Proceedings of the 32nd Annual Computer Security Applications Conference (ACSAC), ACM, 2016.

22. Hoang, A.A. Yavuz, F.B. Durak and Guajardo, Oblivious dynamic searchable encryption on distributed cloud systems, in: IFIP Annual Conference on Data and Applications Security and Privacy, Springer, 2018, pp. 113–130.

23. M.S. Islam, Kuzu and Kantarcioglu, Access pattern disclosure on searchable encryption: Ramification, attack and mitigation, in: NDSS, 2012.

24. Kamara and Papamanthou, Parallel and dynamic searchable symmetric encryption, in: Financial Cryptography and Data Security, Springer, 2013, pp. 258–274. doi:10.1007/978-3-642-39884-1_22.

25. Kamara, Papamanthou and Roeder, Dynamic searchable symmetric encryption, in: Proceedings of the 2012 ACM Conference on Computer and Communications Security, ACM, 2012, pp. 965–976.

26. Katz and Lindell, Introduction to Modern Cryptography, CRC Press, 2014.

27. Liu, Zhu, Wang and Y.-a. Tan, Search pattern leakage in searchable encryption: Attacks and new construction, Information Sciences (2014).

28. Moataz, Ray, Shikfa, Cuppens and Cuppens, Substring search over encrypted data, Journal of Computer Security (2018), 1–30.

29. Naveed, The fallacy of composition of oblivious RAM and searchable encryption, in: Cryptology ePrint Archive, Report 2015/668, 2015.

30. Paillier, Public-key cryptosystems based on composite degree residuosity classes, in: International Conference on the Theory and Applications of Cryptographic Techniques, Springer, 1999, pp. 223–238.

31. Pouliot and C.V. Wright, The shadow nemesis: Inference attacks on efficiently deployable, efficiently searchable encryption, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016, pp. 1341–1352.

32. Ren, C.W. Fletcher, Kwon, Stefanov, Shi, van Dijk and Devadas, Ring ORAM: Closing the gap between small and large client storage oblivious RAM, IACR Cryptology ePrint Archive (2014).

33. Shamir, How to share a secret, Communications of the ACM (1979).

34. Shoup, NTL: A library for doing number theory, 2016.

35. D.X. Song, Wagner and Perrig, Practical techniques for searches on encrypted data, in: Proceedings of the 2000 IEEE Symposium on Security and Privacy, IEEE Computer Society, 2000, pp. 44–55.

36. sparsehash: An extremely memory efficient hash_map implementation, February 2012.

37. Stefanov, Papamanthou and Shi, Practical dynamic searchable encryption with small leakage, in: NDSS, San Diego, California, USA, 2014.

38. Stefanov, van Dijk, Shi, Fletcher, Ren, Yu and Devadas, Path ORAM: An extremely simple oblivious RAM protocol, in: Proceedings of the 2013 ACM CCS, ACM, 2013, pp. 299–310.

39. Sun, Wang, Cao, Li, Lou, Y.T. Hou and Li, Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking, in: ACM SIGSAC AsiaCCS, ACM, 2013, pp. 71–82.

40. The Clusion Library.

41. Wang, Cao, Li, Ren and Lou, Secure ranked keyword search over encrypted cloud data, in: IEEE 30th International Conference on Distributed Computing Systems, IEEE, 2010, pp. 253–262.

42. L.R. Welch and E.R. Berlekamp, Error correction for algebraic block codes, Google Patents, 1986, US Patent 4,633,470.

43. ZeroMQ library, 2016.

44. Zhang, Xue, Yu and Liu, Dynamic and efficient private keyword search over inverted index-based encrypted data, ACM Transactions on Internet Technology (TOIT) 16(3) (2016), 21. doi:10.1145/2940328.

45. Zhang, Katz and Papamanthou, All your queries are belong to us: The power of file-injection attacks on searchable encryption, in: 25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 707–720.

46. Zhou, Li, A.X. Liu, Lin and Xu, Integrity preserving multi-keyword searchable encryption for cloud computing, in: International Conference on Provable Security, Springer, 2016, pp. 153–172.


¹ By generic ORAM, we mean the technique that can hide whether the access is to read or to write as opposed to read-only Private Information Retrieval or Write-Only ORAM.
