Abstract
Software-based approaches for search over encrypted data are still either challenged by lack of proper, low-leakage encryption or slow performance. Existing hardware-based approaches do not scale well due to hardware limitations and software designs that are not specifically tailored to the hardware architecture, and are rarely well analyzed for their security (e.g. the impact of side channels). Additionally, existing hardware-based solutions often have a large code footprint in the trusted environment susceptible to software compromises. In this paper we present HardIDX: a hardware-based approach, leveraging Intel’s SGX, for search over encrypted data. It implements only the security critical core, i.e., the search functionality, in the trusted environment and resorts to untrusted software for the remainder. HardIDX is deployable as a highly performant encrypted database index: it is logarithmic in the size of the index and searches are performed within a few milliseconds rather than seconds. We formally model and prove the security of our scheme showing that its leakage is equivalent to the best known searchable encryption schemes. Our implementation has a very small code and memory footprint yet still scales to virtually unlimited search index sizes, i.e., size is limited only by the general – non-secure – hardware resources.
Introduction
Outsourcing the storage and processing of sensitive data to untrusted cloud environments introduces new threats, compared to local data processing, due to possible data leakage, government intrusion, and legal liability. Cryptographic solutions such as Secure Multiparty Computation (MPC) [8,34] and in particular Fully Homomorphic Encryption (FHE) [30] offer a high degree of protection by allowing arbitrary computation on encrypted data. However, MPC and FHE schemes are still impractical for adoption in large distributed systems [19,31].
Moreover, there are a number of useful applications that only require a small set of operations rather than the universal solutions offered by MPC/FHE. A prime example of such operations is the search and retrieval in encrypted databases without the need to download all data from the cloud. For searching over encrypted data, different cryptographic schemes have been proposed such as property-preserving encryption [7,10], or functional encryption [12] and its special case searchable encryption [22,41,48,65]. In this context, performing efficient and secure range queries are commonly considered to be very challenging. CryptDB [59] resorts to order-preserving encryption for this purpose which is susceptible to simple ciphertext-only attacks as shown by Naveed et al. [54].
As we will elaborate in our related work section, many schemes for search over encrypted data supporting range queries require search time linear in the number of database records. Recently, schemes with polylogarithmic search time, based on an index structure, have been proposed [23,26,48]. Nonetheless, the first scheme in [48] is not yet practical, because it applies pairing-based cryptography and also leaks sensitive information about the plaintext, namely the order of the plaintexts. In contrast, [23] and [26] presented approaches with polylogarithmic search time that utilize only lightweight cryptography, that is, pseudorandom functions and symmetric encryption. Out of the many schemes presented in [23], the most secure approach, without false positives and bearable storage cost achieves practical deployability. However, it still leaks much sensitive information, e.g. the search pattern and the range size of each query. Hence, designing an efficient searchable encryption scheme with minimal leakage on the queried ranges remains an open challenge.
Another line of research [4,5] leverages the developments in hardware-assisted Trusted Execution Environments (TEEs) for search over encrypted data. Although Intel’s recently introduced Software Guard Extension (SGX) [2,20,36,38,40,50] has inspired new interest in TEEs, related technologies have been available before, e.g. in ARM processors known as ARM TrustZone [3] as well as in academic research [14,44,66]. Also, AMD has recently announced a TEE for their CPUs [42] rising the hope that TEEs will be widely available in x86 processors, and thus in many relevant environments such as clouds, in the near future.
TEEs provide the secure isolation of sensitive data and computation in hostile environments. For instance, SGX loads application code and data from main memory into an isolated memory, which assures confidentiality and integrity of its data against malicious software on the same system, including privileged software like operating system (OS) or hypervisor. These properties are only guaranteed as long as the TEE operates completely self-contained. However, TEEs have to interact with untrustworthy components within the same computer system for various reasons. In particular, TEEs often have limited resources and need to utilize untrusted resources of their host system. For instance, in case of limited memory it swaps out data to untrusted memory or storage. Hence, in order to achieve comprehensive security, information leakage through those channels has to be considered and taken care of. Previous solutions for isolating databases with SGX load and execute the entire unmodified database management system (DBMS) into an enclave [4,5], but they do not consider possible side channels occurring from the use of untrusted resources as well as resource sharing between TEE and untrusted host. In particular, previous works do not formally consider information leakage. Additionally, they do not scale well due to the limited memory size of SGX’s enclaves and the large footprint of the code they require in the TEE.
Our goal and contribution. We present an efficient scheme for search over encrypted data using SGX that can directly be deployed as a database index. We utilize SGX’s protection characteristics to achieve an outstanding tradeoff between security, performance and functionality. The currently fastest software-based schemes that support range queries are [23] and [26]. Our solution significantly improves over these approaches in terms of performance and storage. Compared to the latest hardware-based schemes [4,5], we improve in terms of security and scalability. Our scheme organizes data in a
We provide two constructions of our search scheme differing in the management of the
We implemented and extensively evaluated both constructions on SGX-enabled hardware (see Section 7 and Section 8). The first construction loads the whole
Enhancements compared to conference version. Our solution displays its strength when it is deployed in the cloud. In this scenario, the cloud server has full control over the software. For that reason, internal or external attackers (e.g. administrators or governments) can launch active attacks on the data. Many related papers only consider a passive attacker, but there is no reason why an attacker with full control should refrain from active manipulations. We enhance the work in [27] by a thorough treatment of an active attacker in Section 6. In this extended version, we repeat the relevant background sections from [27] and extend them in many details, e.g. a more formal description of our first construction, a proof for the first construction to highlight the differences on a formal level and a detailed implementation section.
Our main contributions are as follows:
Our scheme has logarithmic complexity in the size of index and searches are performed within a few milliseconds. We formally model and prove our scheme secure showing that its security (leakage) is comparable to the best known searchable encryption schemes. We present a scheme that provides security under an active attacker. We provide an implementation and evaluate the performance and functional bottleneck of SGX on the basis of two different constructions that are designed specifically for SGX to reduce the Trusted Computing Base.
Background
We will start by briefly introducing SGX towards which we design HardIDX. Note, our solution is not strictly limited to SGX and could be used with any other system having a TEE with capabilities similar to SGX (e.g., Sanctum [21]). Afterwards we explain the basics of side channel attacks, which are a major concern for HardIDX.
Intel Software Guard Extensions (SGX)
SGX is an extension of the x86 instruction set architecture (ISA) introduced with the 6th Generation Intel Core processors (code name Skylake). We now present a high level overview of SGX’s features utilized by HardIDX (see [2,20,36,38,40,50] for more details).
Memory isolation. On SGX enabled platforms, programs can be divided into two parts, an untrusted part and an isolated, trusted part. The trusted part, called enclave in SGX terminology, is located in a dedicated portion of the physical RAM. The SGX hardware enforces additional protection on this part of the memory. In particular, all other software on the system, including privileged software like OS, hypervisor, firmware and code in system management mode (SMM) cannot access the enclave memory.
The untrusted part is executed as an ordinary process within the virtual memory address space and the enclave memory is mapped into the virtual memory of the untrusted host process. This mapping allows the enclave to access the entire virtual memory of its host process, while the (untrusted) host process can invoke the enclave only through a well-defined interface. Furthermore, all isolated code and data is encrypted while residing outside of the CPU package. Decryption and integrity checks are performed when the data is loaded inside the CPU.
Memory management. SGX dedicates a fixed amount of the system’s main memory (RAM) for enclaves and related metadata. For current systems this memory is limited to 128 MB which is used for both, SGX metadata and the memory for the enclaves themselves. The latter is called Enclave Page cache (EPC) and is about 96 MB according to our evaluations. The SGX memory is reserved in the early boot phase and is static throughout the runtime of the system. As the number of enclaves which may be loaded and executed in parallel is virtually unlimited, the OS has to manage the enclave memory dynamically. The OS can allocate (parts of) the memory to individual enclaves and change these allocation during the runtime of the enclaves. In particular, the OS can swap out enclave pages. SGX ensures integrity, confidentiality and freshness of swapped-out pages.
Attestation. SGX has a remote attestation feature which allows to verify the correct creation of an enclave on a remote system. During enclave creation the initial code and data loaded into the enclave are measured. This measurement can be provided to an external party to prove the correct creation of an enclave. The authenticity of the measurement as well as the fact that the measurement originates from a benign enclave is ensured by a signature, provided by SGX’s attestation feature (refer to [2] for details). This signature is provided by a component of SGX, called quoting enclave (QE). The QE accepts only measurements from the hardware and the hardware ensures that only correct enclaves are measured. Furthermore, the remote attestation feature allows for establishing a secure channel between an external party and an enclave.
Side channel attacks
Side channel attacks allow an adversary to extract sensitive information without having direct access to the information source by observing effects of the processing of the sensitive information. Side channel attacks have been known for a long time and various variants have been studied in the past, including hardware side channels like ground potential, EM or power consumption [29]; software timing side channels that can be observed remotely [15] as well as locally, e.g. through cache timing side channels [47,56,72]. However, all these side channel attacks are noisy and require repeated execution and measurements to extract the sensitive information.
In the context of SGX there exist a new class of side channels, called deterministic side channel [71]. As the OS is untrusted, yet still manages the enclave’s resources, including enclave memory, it can observe the enclaves behavior. In particular, the OS can generate a precise trace of the enclave’s code and data accesses at the granularity of pages. In [71] it is shown that this allows to extract sensitive information from an SGX enclave.
High level design
Now, we will give a high level description of our design and describe the general working of our scheme. Afterwards we explain our attacker model.
HardIDX overview
The high level design of our solution is shown in Fig. 1(a). The design involves three entities: the client (who is the data owner and therefore trusted), the untrusted SGX enabled server and the trusted SGX enclave within the server.

System model.
Initially, a client prepares its data values by augmenting it with (index) search keys. We abbreviate data values as values and search keys as keys throughout this paper. All other values and keys (e.g. cryptographic keys) are clearly differentiated if ambiguous. The values are stored at pseudo-random position. The keys are then inserted into a
in Fig. 1(a)1
For visualization purposes, the tree nodes and values are shown to be encrypted as a block. In reality each node and value is encrypted individually.
In the second step, a secure connection is established between the client and the enclave. The client uses the SGX attestation feature for authenticating the enclave (see Section 7 for details). Through this secure connection, the client provisions
). This step completes the setup of our scheme, which needs to be executed only once. Even when the enclave is unloaded, e.g., due to a reboot, the state (including
From now on the client can send (index) search queries to the server. Randomized encryption with the key
).
In step
, the enclave loads the
).
However, as the tree size can exceed the memory available inside the enclave we provide a second design. In this case, only a subset of tree nodes is loaded into the enclave. The tree is traversed starting from the root node. When the search reaches an edge to a node, which is currently not present in the enclave, it is fetched from the untrusted storage.
In both cases the search algorithm eventually reaches a set of leaf nodes, which holds pointers to values matching the query. This list of pointers, representing the search result, is passed to the untrusted part (see step
). The untrusted part learns nothing, except for the cardinality of the result set, from this interaction, because the values are stored in a randomized order.
The result of the index search could be processed further, e.g. in combination with additional SQL operators, in the SGX enclave at the server. In order to complete the end-to-end secure search we assume that the server uses the pointers, in step
, to fetch the encrypted values from untrusted storage and sends them to the client. The client uses
Notably, the plaintext data values are never available on the server. They are encrypted with strong standard cryptography methods (AES-128 in GCM mode in our case) and never decrypted on the server, not even inside the SGX enclave. The key to decrypt the values is only know to the client.
We assume a system which provides SGX (or any TEE with capabilities similar to SGX). In particular, code and data inside the TEE are protected with respect to their integrity and confidentiality. We further assume that the TEE provides means to establish a secure channel to the client which allows secure communication and secure provisioning.
Our attacker model is illustrated in Fig. 1(b). Due to SGX’s protection, the attacker cannot directly access the enclave. However, side channels exist, which could potentially be used by the attacker to extract sensitive information. In particular, the attacker aims to learn the structure of the
We assume the attacker has full control over all software on the system running HardIDX and he can perform active attacks. (1) The attacker can observe all interaction of the enclave with resources outside the enclave. In particular, the attacker can observe the access pattern to
Hardware attacks (in particular hardware/physical side channels) are out of scope in this paper. Furthermore, we consider denial of service (DoS) attacks out of scope, as the untrusted server could always refuse to serve the enclave. Similarly, the network connection between the client and the server is vulnerable to DoS attacks.
Notation and definitions
-Tree
A

Three node types are differentiated in a
We use unchained
In our constructions, it is not necessary to define the key domain in advance as in many other approaches. It is not even necessary that the domain is a range of integers. Instead, D can be an arbitrary domain with a defined order relation and a defined minimal and a maximal element recognizable by the algorithms. These two elements, denoted as
The branching factor b specifies a
A probabilistic authenticated symmetric encryption consists of three probabilistic polynomial-time algorithms
Hardware secured
-tree (
)
Based on the presented definition of a
(
).
A secure hardware
Algorithms executed at the client:
Take the security parameter λ as input and output a secret key Take the secret key Take the secret key Take the secret key Take a search token τ and optionally an encrypted tree γ as input and call the secure hardware function Take a search token τ and optionally nodes
In the following definitions and constructions, we assume a passive attacker. This is important, as especially the correctness can easily be thwarted by an active attacker. For instance, he can drop results before they are transferred to the client. We present implications of an active attacker and countermeasures in Section 6.
(Correctness).
Let
Our security model, which we define next, is based on a three step framework introduced by Curtmola et al. in [22]. The first step is to formulate a leakage, i.e. an upper bound of the information that an adversary can gather from the protocol. Secondly, one defines the
At security models of searchable encryption schemes so far, the leakage only covers the transaction between the client and server. In our scenario, there is an additional transaction between the server and the secure hardware that can be viewed by the adversary. Therefore, we extend the CKA2-security to CKA2-HW-security by introducing a new type of leakage denoted as Let the challenger runs the adversary We say (CKA2-HW-security).
Search algorithms
In this section, we will present two different constructions that enable a client to the search for a single value or a range of values based on keys. We use
Construction 1
We describe our first correct (according to Definition 2) and secure (according to Definition 3) construction in this section. The guiding idea of the construction is that the entire data should be stored and processed inside the enclave. The client starts by generating
The SGX application gets deployed to an SGX capable server at a cloud provider (see Section 2.1 for details). The remote attestation protocol of Intel SGX (see Section 2.1) is used between client and server to prove to the client that the correct software is deployed on an SGX enabled CPU. During the deployment, the application reserves an SGX protected memory region. The secure channel that is established during remote attestation is used to deploy
Next, the clients sends the
Since the values (e.g., full staff records) can be very large, the data transfer to the enclave can cause a severe performance overhead. We, therefore, store the values outside of the enclave. A second reason is the limited size of protected memory inside an enclave. Storing the values outside of the enclave has no security implications as the values are encrypted with an authenticated IND-CCA secure encryption scheme and they are stored in a randomized order. The untrusted part receives only pointers to the values from the trusted part and loads them from memory or disk by itself.
We now describe the
Algorithms executed at the client:
Use input λ to execute Take Use input Use input
Take the search token τ as input, pass τ to the trusted part and return the result values (by dereferencing the pointers):

Executed at the server on secure hardware:

Take the search token τ as input, decrypt the token, perform a
The construction is correct according to Definition 2, because it performs a textbook
A substantial advantage of the approach to put the whole functionality in an enclave is that SGX provides a high level of protection for the enclave memory and thus the
As mentioned before, we assume that an attacker is able to reveal the accessed pages during the
Next, we will prove the security of Construction 1. The first step is to define the leakage functions that are based on the attack model described in Section 3.2.
Given the key-value pairs Given the key-value pairs Loosely speaking, the pages access pattern The values access pattern

Illustration of page access pattern leakage: (a) example
The secure hardware
We describe a polynomial-time simulator Setup: Simulating γ: All described operations are possible, because the amount n of values, the size of each value and the amount of nodes Simulating τ: The simulator The simulated τ is indistinguishable from the output of Simulating secure hardware: At time
□
Cloud computing enables the fast and cost-effective processing of large amounts of data. Unfortunately, Construction 1 suffers from the substantial problem mentioned before: the memory reserved for SGX is limited to 128 MB, and only about 96 MB can be used for data and code. SGX supports a larger enclave size, but enclave pages have to be swapped in and out in this case. Our evaluation (see Section 8) shows that even
In this section, we describe our second correct (according to Definition 2) and secure (according to Definition 3) construction. Instead of loading all nodes into the enclave the main idea is to only load the nodes required to traverse the tree. The challenge is to optimize the communication bottleneck between the untrusted part and the enclave. We performed extensive benchmarking and algorithm engineering in order to identify and minimize the most important run-time consuming tasks, such as switching between the untrusted part and the enclave. The decisive advantage of our second construction is that the required memory space inside the enclave is
The setup of the
(
All algorithms but
Take the search token τ and the encrypted tree γ as input. At the beginning, pass only the root node to the trusted part and receive pointers to nodes that should be traversed next. The trivial solution is to pass one node after another to the enclave. A severe problem with this design is that every context switch from the untrusted to the trusted part or back causes a substantial overhead. We therefore optimized the number of context switches: transfer as many nodes as currently in the queue, but not more than fit into
Take a search token τ and nodes


The construction is correct according to Definition 2, because it is based on a textbook
We prove the security of Construction 2 by again defining the leakage functions that are based on the attack model described in Section 3.2.
Given the key-value pairs
Given the key-value pairs
The nodes access pattern
The value pointers access pattern

Illustration of nodes access pattern leakage: (a) example
The secure hardware
The simulator Setup: Simulating γ: All described operations can be executed by Simulating τ: The simulator The simulated τ is indistinguishable from the output of Simulating secure hardware: At time x is leaf: This output is indistinguishable from the output of
□
The main difference in the leakages of Construction 1 and Construction 2 is the granularity of the tree and value pointers access pattern. In the second construction, the attacker is able to reveal accesses on a node level. In contrast, the attacker is able to reveal accesses on a page level in the first construction, because SGX inherently leaks the page access pattern.
So far, we considered a setup comprising one user, but multiple user are directly supported by HardIDX. Multiple users are able to concurrently query data without limitations, as concurrent tree traversals do not influence each other. The only requirement is that each user has access to the key
Search algorithms under an active attacker
Constructions 1 and 2 are secure and correct for a passive attacker, but our overall goal is to enable the outsourcing of data to untrusted cloud providers. Therefore, it is important to consider active attackers. This attacker type tries to thwart the correctness and tries to gain additional sensitive information by not following the defined protocol. We omit concrete definitions of correctness and CKA2-HW-security under a active attacker, because they are easily deducible from Definition 2 and Definition 3. In the following, we consider only Construction 2, but the arguments and techniques can be applied to Construction 1 with minor modifications.
Attack vectors. We identified two basic attack vectors that cover a wide range of possible attacks. Firstly, the attacker can try to attack the protection mechanisms of SGX to gain insights about data and algorithm execution not under his control. We rely on SGX’s protection mechanism that guarantees security and correctness under an active attacker. However, as defined in our attacker model (see Section 3.2), we consider various side channels. Protection mechanisms against these are implementation details and thus described in our implementation section (Section 7). Secondly, the active attacker can try to influence the data and protocol execution that is under his control. Namely, all encrypted nodes, encrypted values, encrypted tokens and the
For now, we assume that there is a mechanism that guarantees the following to the client: if it gets a result, all results in the response match the query and the response contains all matching results. This mechanism is presented later in this section, but we first consider the correctness and security implications of this assumption. We show that there are no security implications by introducing an active attacker instead of an passive attacker.
Unprotected static data. The only static data influenceable by the active attacker are values and nodes. The security of this data is guaranteed by using an authenticated IND-CPA secure encryption scheme. The authenticated encryption also thwarts attacks on the correctness that try to modify or add static data, because these actions are noticed by the decryption algorithm. We do not consider the deletion of values or nodes, because it would lead to incomplete results and thus contradict the assumption stated above.
Unprotected dynamic data. The only dynamic data influenceable by the attacker is the search token. Again, the security of tokens and the prevention of modifications and additions is provided by the authenticated IND-CPA secure encryption scheme. A replay attack does also not give any additional information to the attacker as the tree is static and the leakage stays the same for a replayed token. The correctness could only be influenced by a denial of service (DoS) attack by dropping tokens, but DoS is out of scope according to our attacker model.
Unprotected algorithms.
Passing one node after another is already covered by the defined leakage.
One straightforward way to protect against deviations (2)–(4) is the usage of a Merkle tree structure (like done in [48]). It would be possible by transferring all values of a leaf node that contains any matching value, the hashes of the matching nodes and the hashes of any sibling of all matching nodes. Together with a stored root hash, the client can check the hashes bottom-up. However, this requires to transfer more data than necessary, leaks unnecessary information by touching sibling nodes and requires additional processing at the client.
A further method to prevent against deviations (2) and (3) is to store a list of requested node ids inside the enclave. For every incoming node, the enclave removes the corresponding id from the list and can be sure that it received all and only correct nodes when the list is empty. However, this approach requires
Deviations (4) can be mitigated by using the secure channel created during provisioning to transfer the results. Technically, it is possible, because the enclave has direct access to the values. Nevertheless, it would also require
We now present modifications to Construction 2 that do not exhibit the drawbacks of the Merkle tree approach and do not require
The construction of the
In the following modifications, we abbreviate the two algorithms of Construction 2 (i.e.
For every nonce,
The attacker does not gain additional information from the processing of wrong nodes. The reason is that the algorithm would not request them and thus already leaks that they do not contain values from the range.
Summarizing, the client is guaranteed the stated assumption through the presented mechanism. Performance measurements showed that the described protection mechanisms introduce an overhead of about
Subsequently, we elaborate on implementation details, which are important with respect to performance or security.
Platform. Our
Provisioning. As described before, the client initially provisions
Next, the enclave sends its just created
Side channels. Our implementation is concerned with three types of (side) channels: external resource access, page-fault side channel and cache timing side channel (see Section 3.2). In particular, by means of all three channels an adversary can observe access patterns to memory with the goal of inferring sensitive information from the observed access patterns.
In HardIDX, access to external resources by the enclave is limited to
While the external resources used by HardIDX store only data, page-fault side channel and cache timing side channels allow to observe access to enclave memory which contains both, data and code. These side channels do not reveal sensitive data directly. However, they might reveal access to memory locations, which can reveal sensitive information if the memory access to code or data differs depending on sensitive data.
As before, the attacker’s goal might be to extract information about the tree structure and the order of values stored and searched. Additionally the attacker might try to extract cryptographic secrets from the enclave. For instance, the attack might aim at learning the secret key
The page-fault side channel allows the attacker to reliably observe memory access patterns, however, the granularity is relatively coarse (4 kB). All accesses within the same page are indistinguishable for the attacker and, thus, are not exploitable. Construction 1 explicitly considers the leakage of the tree structure through this side channel. In Construction 2 the page-fault side channel does not leak additional information, as nodes are smaller than memory pages and the nodes access pattern is leaked anyway by storing the
Cache timing side channel allow finer grained memory access observations while being less reliable. Nevertheless, assuming an adversary who is able to observe accesses within a node, the attacker needs to determine which links to child nodes are followed. Our algorithm, however, accesses every key and pointer, whether the pointer is followed or not. By this and other fine grained implementation details, we achieve data independent accesses and thwart the cache timing side channel.
Leakage of cryptographic keys are thwarted for page-fault and cache timing side channel by using leakage resilient implementations and hardware features [9]. For instance, the AES implementation used in HardIDX uses AES-NI hardware which holds the S-Boxes in CPU registers instead of RAM, thus hampering cache side channel attacks [51,70].
Memory management. We implemented both constructions of HardIDX. In particular Construction 2 is optimized with respect to memory transfer operations, and context switches between untrusted and trusted part. To reduce the number of context switches a list of requested nodes is hold by the untrusted part. Nodes from this list are transfered and processed at once, i.e. with only one switch. The memory transfer is optimized by exploiting the fact that the enclave can access the memory of its host process. The
Performance evaluation
In this section, we present our evaluation results collected in a number of experiments. First, we compare our two constructions described in Section 5. Then, we examine the effects of the different ways of memory management for the constructions. Finally, we compare our solution against the currently fastest polylogarithmic range query search algorithm presented in [23,26].
Platform. Our evaluation system was equipped with an Intel Core i7-6700 processor at
Construction 1 vs. Construction 2
First, we compare the performance of our two constructions. The parameters of the

Comparison of constructions.
The performance difference can be explained by the following effects:
Construction 2, therefore, is slower than Construction 1 by a small factor at any result size. For an increasing size of the result set, both algorithms search a linearly increasing part of the tree. Figure 5 shows that the run-times of our two constructions converge (on a logarithmic scale). This shows that the effects described above diminish compared to the search time of the algorithm.
In order to identify the limiting parameters in the memory management of our two constructions, we evaluate

Effect of different branching factors in (a) Construction 1 and (b) Construction 2.
In Fig. 6(a), we see a sharp increase of the run-time above a tree size of
We see a significant difference in the impact of paging between the different branching factors. This becomes clear by considering the number of required page swaps. The lower the branching factor, the higher the number of nodes in a
In Fig. 6(b), we see that Construction 2 is not affected by paging, albeit supporting an unlimited tree size. Our data also shows that, as expected, higher branching factors result in better performance. Disregarding the paging problem of Construction 1 above a tree size of
In this section, we compare our Construction 2 against the currently fastest approach with comparable security features and a security proof presented by Demertzis et al. in [23]. The authors present seven different constructions that support range queries. The constructions have different tradeoffs regarding security, query size, search time, storage and false positives. We do not compare against the highly secure scheme with prohibitive storage cost and also exclude the approaches with false positives as our construction does not lead to false positives. Instead, we compare against the most secure approach without these problems: Logarithmic-URC.
We assume that the OXT construction from [16] is used as underlying symmetric searchable encryption scheme (SEE) by Logarithmic-URC. Fundamentally, the SSE scheme is changeable, but the authors of [23] also utilize OXT for the security and performance evaluation. One has to note that a quite equal construction as Logarithmic-URC was presented independently by Faber et al. in [26]. We implemented the algorithm of [23], but a security and performance comparison to [26] would lead to comparable results.
Table 1 compares our Construction 2 and Logarithmic-URC. In this evaluation, we use a branching factor of 100 for Construction 2 and search for a randomly chosen range that contains 100 results. Every test for the four different tree sizes (100,
Time comparison of random range queries with Logarithmic-URC [23] and our Constr. 2
Time comparison of random range queries with Logarithmic-URC [23] and our Constr. 2
Our construction runs in about a tenth of a millisecond and with very moderate increase for all tree sizes. In contrast, Logarithmic-URC requires at least multiple milliseconds up to a seconds for bigger trees. A reason for the performance difference might be that OXT construction itself is less efficient then our construction. Furthermore, the search time of OXT depends on the number of entries. Logarithmic-URC fills the OXT construction with elements from a binary tree over the domain for every stored key. An increasing domain severely increases the tree height of a binary tree and thus the number of entries for OXT. In contrast, the height of the
A functional difference between Logarithmic-URC and Construction 2 is that Logarithmic-URC requires to fix the search key domain beforehand. Cover a huge domain does negatively influence the setup and search time. In our construction, it is not necessary to fix the domain and the domain size has no performance implications.
It is not trivial to compare Logarithmic-URC and Construction 2 regarding security. The access pattern leakage and the leakage of the internal data structure of Logarithmic-URC is comparable to our access pattern leakages. However, Logarithmic-URC additionally leaks the domain size, the search range size and the search pattern. The search pattern reveals whether the same search was performed before, which might be sensitive information. Furthermore, our construction only requires index storage in
Our work is related to searchable encryption, encrypted databases and secure implementations based on a TEE (e.g. SGX).
Searchable encryption
Song et al. introduced in [65] the first searchable encryption schemes for single plaintexts. In order to improve performance, Goh [33] and Curtmola et al. [22] introduced encrypted (inverted) indices. However, these encryption schemes can only search for keyword equality and not ranges.
Comparison of range-searchable encryption schemes. n is the number of keys, D is the size of the plaintext domain and R is the query range size
Comparison of range-searchable encryption schemes. n is the number of keys, D is the size of the plaintext domain and R is the query range size
Searchable encryption scheme supporting range queries are rare. Table 2 shows a comparison of different searchable encryption schemes and other schemes that support range queries. Note that all existing range-searchable encryption schemes leak the access pattern – including ours. The first range-searchable scheme by Boneh and Waters in [13] encrypt every entry linear in the size of the plaintext domain. The first scheme with logarithmic storage size per entry in the domain was proposed by Shi et al. in [63]. Their security model (match-revealing) is somewhat weaker than standard searchable encryption. The construction is based on inner-product predicate encryption which has been made fully secure by Shen et al. in [62]. All of these schemes have linear search time.
Lu built the range-searchable encryption from [62] into an index in [48], thereby enabling polylogarithmic search time. However, his encrypted inverted index tree reveals the order of the plaintexts and is hence only as secure as order-preserving encryption. Wang et al. [68] proposed a multi-dimensional extension of Lu [48], but it suffers from the same problem of order leakage. There is no known searchable encryption schemes for ranges – until ours – that has polylogarithmic search time and leaks only the access pattern.
A Lu implementation done by us requires several seconds or minutes for a single range search, even with a security parameter much weaker than ours. Hence, we not only improve asymptotic search time, but more importantly reduce the constants in order to open application to much larger data sets.
ORAM can in principle be used to hide the access pattern of searchable encryption. However, Naveed shows that the combination of the two is not straightforward [53]. Special ORAM techniques, like TWORAM [28], are needed.
Encrypted databases, such as CryptDB [59], use property-preserving encryption for efficient search. Property-preserving encryption has very low deployment and runtime overhead due to the ability to use internal index structures of the database engine in the same way as on plain data. Order-preserving encryption [1,10,11,43] allows range queries on the ciphertexts as on the plaintexts. With order-revealing encryption [17,45] a generalization of order-preserving encryption has been published recently. However, Naveed et al. [54] initiated the research direction of practical ciphertext-only attacks on property-preserving encryption, in particular order-preserving encryption, which recover the plaintext in many cases with very high probability (close to
There have been a number of attempts to build indices for range queries based on deterministic encryption. Bucketization of ciphertexts [37] groups ciphertexts on the server and filters results at the client. Wang et al. [69] uses distance-revealing encryption in order to build an r-tree. Li and Omiecinski [46] use prefix-preserving encryption in order to build a prefix tree for range searches. However, all of these approaches are susceptible to the same attacks (and worse) as those by Naveed et al.
Four further approaches for a secure DBMS allowing range query evaluation have been published: Firstly, Cash et al. [16] introduce a new protocol called OXT that allows evaluation of boolean queries on encrypted data. Faber et al. [26] extend this data structure to support range queries but either leak additional information on the queried range or the result set contains false positives. In [23], Demertzis et al. present several approaches for range queries. The authors also evaluate the security and performance based on the OXT protocol. The scheme that is most comparable to ours, Logarithmic-URC, is quite equal to the range query approach without false positives from [26] and thus exhibits equal additional leakage. We provide an experimental comparison in Section 8.3. Secondly, Pappas et al. [57] evaluate encrypted bloom filters using Secure Multiparty Computation. However, in order to achieve practical efficiency they propose to split the server into two non-colluding parties. Our approach does not require any additional party. Thirdly, Egorov et al. [25] presented ZeroDB. A database system that enables a client to perform equality and range searches with the help of
TEE-based applications
Trusted Database System (TDB) uses a trusted execution environment (TEE) to operate the entire database in a hostile environment [49]. While TDB encrypts the entire database storage and metadata (e.g. tree structures to organize the data), TDB is not concerned with information leakage from the TEE. Neither does TDB aim at hiding access patterns nor does it consider side channels attacks against the TEE. Furthermore, since the entire DB operates in the TEE the trusted computing base is very large exposing a very large attack surface.
Haven is an approach to shield application on an untrusted system using SGX [5]. The goal of Haven is to enable the execution of unmodified applications inside an SGX enclave. This technique could be used to isolated off-the-shelf databases with SGX, however, Haven does not consider information leakages through memory access patterns or interactions with the untrustworthy outside world. Furthermore, this approach limits the size of the database due to limited enclave size.
VC3 (short for verifiable confidential cloud computing) adapts the MapReduce computing paradigm to SGX [61]. Mapper and Reducer entities are executed in individual enclaves, this means the data flow between them can leak sensitive information. While VC3 is tailored towards SGX they exclude information leakage from their adversary model. In contrast, we provide the first work on SGX that specifically focuses on information leakage in the interaction of an enclave with other entities.
Data-oblivious machine learning for SGX was presented in [55]. Four machine learning algorithm have been adapted by the authors in order to prevent the exploitation of side channels. All data and code accesses that are dependent on sensitive data are transferred into data-oblivious accesses by using a library providing a set of data-oblivious primitives. Access to external data, specifically input data, is addressed by randomizing the data and always accessing all data, i.e., their solution has an complexity of
Conclusion
In this paper, we introduce HardIDX – an approach to search for ranges and values over encrypted data using hardware support making it deployable as a secure index in an encrypted database. We provide a formal security proof explicitly including side channels and an implementation on Intel SGX. Additionally, we show how to secure HardIDX in an active attacker environment. Our solution compares favorably with existing software- and hardware-based approaches. We require few milliseconds even for range searches on large data and scale to almost arbitrarily large indices. We only leak the access pattern and our trusted code protected by SGX hardware is very small exposing a small attack surface.
Footnotes
Acknowledgments
This research was co-funded by the German Science Foundation, as part of project P3 within CRC 1119 CROSSING, the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 644412 (TREDISEC) and No. 643964 (SUPERCLOUD), and the Intel Collaborative Research Institute for Secure Computing (ICRI-SC).
