Abstract
Driven by the cloud-first initiative taken by various governments and companies, it has become a common practice to outsource spatial data to cloud servers for a wide range of applications such as location-based services and geographic information systems. Searchable encryption is a common practice for outsourcing spatial data which enables search over encrypted data by sacrificing the full security via leaking some information about the queries to the server. However, these inherent leakages could equip the server to learn beyond what is considered in the scheme, in the worst-case allowing it to reconstruct of the database. Recently, a novel form of database reconstruction attack against such kind of outsourced spatial data was introduced (Markatou and Tamassia, IACR ePrint 2020/284), which is performed using common leakages of searchable encryption schemes, i.e., access and search pattern leakages. An access pattern leakage is utilized to achieve an order reconstruction attack, whereas both access and search pattern leakages are exploited for the full database reconstruction attack. In this paper, we propose two novel schemes for outsourcing encrypted spatial data supporting dynamic range search. Our proposed schemes leverage R+tree to partition the dataset and binary secret sharing to support secure range search. They further provide backward and content privacy and do not leak the access pattern, therefore being resilient against the above mentioned database reconstruction attacks. The evaluations and results on the real-world dataset demonstrate the practicality of our schemes, due to (a) the minimal round-trip between the client and server, and (b) the low computation and storage overhead on the client side.
Introduction
The information retrieval community has been studying geometric range search (GRS) for decades [1,24] and it has a wide range of applications in geosciences, location-based services, geographical information system, geo-medical engineering, and so on. Besides its use in applications assisting in our daily life activities such as taking an Uber, finding nearby locations on Google Maps or friends on Facebook, GRS can be used in some significant emerging public health and safety applications. For instance, with the current COVID-19 outbreak, governments and researchers need to collect information (e.g. number of the test taken, confirmed cases, death toll, etc.) in a specific geometric area. The need is the same in other emergency situations, e.g., a bushfire emergency situation.
Driven by the cloud-first policy of many companies and governments, outsourcing the spatial data to a cloud server is a common practice around the world. The cloud provides the scalable infrastructure to handle large datasets and supports on-demand access through its highly available services. Data privacy is a necessity in such scenarios. Although public cloud providers are trusted in providing their services, they cannot be fully trusted for data privacy. One obvious solution is to only store encrypted data in the cloud. However, downloading and decrypting large datasets every time a search or update operation needs to be performed is completely impractical. Hence, searchable encryption (SE) is considered as a solution to correctly perform queries (search/update) over outsourced encrypted data.
Searchable Symmetric Encryption (SSE) efficiently enables search over encrypted data at the cost of revealing some well-defined information to the server, known as the leakage. The most common SSE leakage functions are access pattern and search pattern. Access pattern leaks all file identifiers that are matching a search query. In contrast, search pattern leaks the repetition of search queries (i.e., it is possible to determine if two search tokens correspond to the same query). Exploiting SSE leakages might enable an adversary (often an honest-but-curious cloud server) to infer information about the database beyond what is considered in an SSE scheme (e.g. leakage abuse attacks [8,26]).
Most of the existing SSE schemes that support geometric range search are designed in the static setting (i.e., updates of the database records after the setup are not possible or come at the cost of re-encryption and re-upload of the database). Although the dynamic setting provides more flexibility to the schemes and supports more real-world applications, it introduces more leakages. To capture new leakages in a dynamic setting, Bost et al. [5] introduced security notions for dynamic SSE, so called forward and backward privacy. Recently, Kasra-Kermanshahi et al. [18] showed that there might be additional leakages when dealing with geometric data that are not captured by Bost’s forward and backward privacy models, and introduced a new security notion for dynamic SSE over spatial data (called content privacy) that hides the access pattern both in search and update operations.
Different cryptographic primitives have been used to support secure range search over geometric data such as order-preserving encryption (OPE), somewhat/fully homomorphic encryption, Geohash, and so on [18,22,30–32,34–36]. However, due to the inherent leakages associated with geometric range search, the majority of them fail to resist the newly developed leakage abuse attacks that target SSE schemes designed for GRS [23,27].
Our contributions
In this paper, we propose two dynamic searchable symmetric encryption schemes to support geometric range search, Geo-DRS and Geo-DRS+. The first scheme illustrates a novel approach to support geometric range search using R+tree where more round trips between the client and the server are required to achieve content privacy (alternatively homomorphic encryption can be used at higher computational cost). Our Geo-DRS+ scheme provides an efficient dynamic range search by leveraging R+tree and secret sharing in
It is worth noting that this paper is an extension of our work published in European Symposium on Research in Computer Security (ESORICS’21) [16]. In this version, several sections are added to facilitate the understanding of the work presented. Furthermore, we conduct experiments on a real-world dataset, demonstrating the effectiveness of Geo-DRS+ in practice, and showing the significant improvement of efficiency by our design compared with state-of-the-art schemes.
Motivation and related works
Order Preserving Encryption (OPE) [2] is one of the most popular approaches to perform range search over encrypted data due to its efficiency. However, several studies have shown that it is possible to perform inference attacks on one-dimensional datasets using OPE leakages [11,17,19,20]. The search and access pattern leakages are the most common leakages used in performing inference attacks. For example, Naveed et al. [26] used frequency analysis to perform sorting and cumulative attack. Later, Durak et al. [8] discovered two more types of attacks (Inter-column correlation-based attacks and Inter+Intra-column correlation-based attack) using OPE leakages that have not been considered by Naveed’s work. Grubbs et al. [11] designed a leakage abuse attack which takes advantages of both frequency and order leakage of OPE. Grubbs’s attack is faster, with a higher recovery rate in comparison with Naveed’s cumulative attack. Furthermore, a passive adversary is also able to perform FDR without requiring auxiliary information, as discussed by Kellaris et al. [17].
The above discussed attacks mainly focused on one-dimensional data. Recently, Pan et al. [27] investigated data inference attacks against multi-dimensional OPE-encrypted databases. They designed a greedy and polynomial-time algorithm with approximation guarantees. The FDR attacks for geometric datasets were introduced recently by Markatou and Tamassia [23]. They utilized access pattern leakage to reconstruct the horizontal and vertical order of the points, and both access and search pattern leakages to recover the coordinates of the points.
Several studies have begun to support range search over encrypted spatial data [18,22,30–32,34–36]. For example, Wang et al. [30–32] proposed several constructions for geometric range search using SSW2
Shen-Shi-Waters.
Unlike other existing works in the area of geometric range search, Kasra-Kermanshahi et al. [18] proposed two constructions which consider forward and backward privacy. Moreover, they have defined a new security notion for spatial data named content privacy. Their constructions utilize binary tree and a special type of additive symmetric homomorphic encryption (ASHE). To the best of our knowledge, only three of the state-of-the-art symmetric searchable encryption schemes that support geometric range search are presented in a dynamic setting. Only one of them, Kasra-Kermanshahi et al. [18], considered forward, backward, and content privacy. However, the constructions are not scalable as the size of the utilized binary tree grows linearly with the number of grid points in each dimension of the environment.
Notation
Some of the notations that are used more frequently in the work are given in Table 1.
Table of notations
Table of notations
In this section, Dynamic Symmetric Searchable Encryption (DSSE) is briefly reviewed. Let Setup Search Update
R-tree and R+tree
R-tree was first introduced by Antonin Guttman in 1984 [13], to handle spatial data efficiently. This data structure is a height-balanced tree-structure with index records in its leaf nodes containing pointers to data objects. In this paper, we use R+tree [28], a variation of R-tree in which overlapping rectangles in intermediate nodes are avoided. Moreover, R+trees have better searching performance compared to R-trees [28].
We briefly review the example from [28] as shown in Figs 1, 2 and 3 to see how a R+tree is formed (for the sake of simplicity, the values of the bounding boxes (

The sample dataset.

The rectangles of Fig. 1 grouped to form an R+tree.

The R+tree built for Fig. 2.
In R+trees leaf nodes consist of
For each entry
There is no overlap in any two entries in an intermediate node.
The root has at least two children unless it is a leaf.
All leaves are at the same level/height.
We use the inverted index to facilitate the storage and search of the dataset. For instance, to build the inverted index of the R+tree shown in Fig. 3, we first label the R+tree nodes, where
For the sake of simplicity, the bounding boxes (
Inverted index
Inverted index
This work uses secure two-party computation based on bitwise secret sharings. An additively secret sharing of
In this work, secure multiplications of secret shared values are performed in a standard way using pre-distributed multiplications triples [3,7], which consist of (
For performing secure bitwise comparison, we use the same protocol as De Cock et al. [6], which is a variant of the protocol of Garay et al. [10] using bitwise secret sharings in the field

Comparison protocols
Syntax of our geometric dynamic range search (Geo-DRS+)
Our geometric dynamic range search (Geo-DRS+) scheme consists of the following algorithms:
Note that, in our model we assume that the data owner sets m large enough according to the size of the environment such that the insertion of new objects would not require node splitting (see [28] for more details). Thus, to add/delete an object only the corresponding leaf nodes would be updated. Moreover, even if the number of objects in a leaf node become larger than m, the data owner can proceed with splitting the corresponding leaf node and updating the encrypted records accordingly.
The Setup phase also generates the multiplications triples (that will be needed for the executions of Protocol
The leakage function
Range search leakage functions
We denote the leakage function of our Geo-DRS+ scheme by
Search pattern
Number of updates
Range search size
Range update size
R+tree structure (
Therefore, Geo-DRS+ leakage consists of
Security notions and definitions
Kasra-Kermanshahi et al. [18] introduced a new security notion for spatial data called content privacy. They formulated a leakage that was not captured in previous definitions such as forward/backward privacy [4,5]. In short, there should be no leakage on updated points neither in the search phase nor during the update. Content privacy and backward privacy (Type-II) have some common properties: both protect the content and do not leak anything about the documents’ identifiers in the update queries. However, backward privacy (Type-II) leaks information about the content in the search queries via the access pattern.
Backward privacy (Type-II) reveals all of the information contained in Backward privacy (Type-I)3
the document identifiers matching the issued search keyword when they were inserted, and the total number
A
(Content Privacy for Spatial Dataset).
A
Security model
The security model of the proposed constructions is formulated using two games; The scheme Σ is
This section first presents the Geo-DRS scheme to address the challenge of secure range search on spatial data in a dynamic manner. Figure 4 demonstrates the overview of Geo-DRS scheme. This base scheme imposes a logarithmic number of communication rounds between the client and the server to perform the search. One possible solution to avoid this communication overhead is to store the R+tree structure from root to the leaf nodes on the client side and put the rest on the server. However, this is not desirable as it contradicts the main goal of outsourcing the data and also is not appropriate for resource constrained devices. Therefore, we design Geo-DRS+, an enhanced version of the Geo-DRS scheme in which the single-server model of Geo-DRS is replaced with a two non-colluding server model, see Fig. 5. This enables us to shift the communication between the client and a server to the communication between the two non-colluding servers. To enable the servers to perform secure computation over the outsourced data and achieve backward and content privacy, we utilize binary secret sharing in Geo-DRS+.

The system model of Geo-DRS scheme.

The system model of Geo-DRS+ scheme.

Geo-DRS construction
To explain the ideas underlying our main construction (Geo-DRS+), we first describe the details of the Geo-DRS scheme in Algorithm 2. This scheme consists of three main algorithms: On input the dataset Encrypt each of the tree nodes and outsource it to the server. Client: Given the desired range query Server: Given the encrypted dataset It is also possible to use additive homomorphic encryption to perform the update at the server side (e.g. update in [18]), here we want to show only a basic scenario.
Data Owner: Given the update query
Server: The server replaces the corresponding entry for
In our model, we use a R+tree to categorise the data before creating the inverted index. We applied the technique of De Cock et al. [6] with the secret sharing of [10] in the field

Geo-DRS+ scheme.
The data owner can initially distribute some reasonable number of multiplication triples, and once the servers are about to run out of triples, they can request more triples to the data owner.
In our construction, each search result is a share of a list associated with a leaf node and client is the one who reconstructs the final result using these shares. To insert or delete an object within a list, the client generates the new shares of the list and the servers will replace the old shares with the new ones. Thus, (1) there is no leakage regarding the content of the dataset (object’s identifier), (2) it is impossible to distinguish which object was being updated, (3) the search queries do not leak matching objects after they have been deleted. As a result, our construction is content and backward private as proved below. Let Who follows the protocol instructions correctly, but try to learn additional information.
The proof proceeds using a hybrid argument, by game hopping, starting from the real-world game
As a result, our construction satisfies content and backward privacy as the search leakage does not include
We consider that the dataset objects are represented in a metre scale where coordinate values are 64 bits (
Our scheme requires the pre-distribution of random binary multiplication triples by the data owner to the servers in the setup phase which are needed for the secure comparisons during the search. This enables the servers to perform the search without further online interaction with the data owner. With the optimization explained in Section 2.5, the communication cost for pre-distributing each multiplication triple is a single bit. To compare the search query with each bonding box, four comparisons are required. As mentioned earlier each comparison costs less than
Comparison
Comparison
SE: Symmetric Encryption; ASHE: Additive Symmetric Homomorphic Encryption; PBPKE: Pairing-based Public Key Encryption; OPE: Order Preserving Encryption; Geohash:public domain geocoding system [25]; ASPE: Asymmetric Scalar-product-Preserving Encryption; SS: Secret Sharing R: Radius of the circle query; t: Bit length of coordinates (x and y); N: Number of the data points in the dataset;
Comparison
Table 3a and Table 3b illustrate the comparison between our Geo-DRS+ scheme with the state-of-the-art schemes supporting spatial range queries of encrypted data from different aspects. Except our scheme and Wang-2017, the search complexity on the server side in all of the existing related works is linearly dependent to the number of data points/records in the database. The token generation (search on client side) complexity is constant only in Geo-DRS+, Zhu-2015, and Zheng-2020, whereas in the rest of the related works it varies from scheme to scheme and depends on different factors such as radius of the circle query, bit length of coordinates, and number of data points/records in the database.
Beside of our Geo-DRS+ scheme, about half of the proposed schemes for geometric range search are presented in the dynamic setting, the rest have limited application as the update of the database cost the re-encryption and re-uploading the entire database. Among the dynamic schemes in this domain only our construction, Xu-2019, and Kasra-II-2020 have only one round of communication between the client and the server for search and update queries.
In terms of the leakages, the search pattern is inherent and unavoidable in all of the discussed schemes. Both constructions of Kasra-2020 and Geo-DRS+ support content privacy as they are not leaking the access pattern. More importantly the access pattern leakage is required to perform the order reconstruction attack, whereas both access and search pattern leakages are exploited for the full database reconstruction attack [23].
This section presents the experimental evaluation on the performance of the proposed constructions. All algorithms were implemented in Java (Nodejs v10.10.0, Typescript v3.4.3) on a 64-bit machine with 3.1GHz Intel®Core(i5) processor 8GB RAM and 256GB SSD. We implemented PRF evaluations with SHA-256. We conduct experiments on real-world datasets seqFISH+ [9] and STSCC [15] with 5000 records.
Memory cost
Memory cost
Performance (
Performance (
Performance (
The update operations for insertion and deletion perform the same. Thus, the cost of setup/update is fixed at 5.74 ms and 12.75 ms at client and server, respectively. The size of the encrypted database is affected by two parameters; the maximum number of entries per node and the total number of records in the dataset. That is, the distribution of the objects in the environment will result in different height of tree structures based on the limit on the maximum number of objects per node. As shown in Table 4, the encrypted dataset requires 114 MB of memory at the server for a dataset with 1000 records while the maximum number of objects per node is 30. The larger dataset with 4000 records requires 7.96 GB of memory if the maximum number of objects per node is set to 50.
We have tested the search time (both at client and server) as well as the communication overhead between client and server and between the two servers. The results for different settings are given in Table 5, 6, and 7.
The results indicate that while the increase in the number of records naturally increases the search time, the decrease in the maximum number of objects per node has the same effect. The reason for the increase in the search time, in this case, is that there are more round trips required to complete the search. For instance, as shown in Fig. 9 (similarly in Fig. 7 and 8) for the same number of objects per node (50), the search time increases from 128.75 ms to 180.1 ms when increasing the number of the dataset records from 1000 to 4000. On the other hand, when the number of dataset records is fixed, for example at 4000, the overall search time increases from 180.1 ms to 269.9 ms by decreasing the number of objects per node in the tree.

Search time of Geo-DRS+ scheme for

Search time of Geo-DRS+ scheme for

Search time of Geo-DRS+ scheme for
As shown in Table 5, 6, and 7, the communication cost between the client and the server is constant at 16 bytes which is the size of the token. However, the communications between the two servers vary from 501 KB (1000 records, 30 objects per node) to 1.1 MB (4000 records, 40/50 objects per node).
We first proposed a dynamic scheme for secure range search over spatial data and then extend it to a more efficient (in terms of client storage and round trips between client and server) version which we named Geo-DRS+. In terms of security and data privacy, Geo-DRS+ scheme has backward and content privacy. As Geo-DRS+ does not leak access pattern and does not rely on OPE, it is resilient against recently developed ADR and FDR attacks targeting the searchable encryption schemes supporting geometric range search. The comparisons between Geo-DRS+ and state-of-the-art schemes indicates that it is more appealing in practice due to lower computation and communication overhead.
