Homomorphic encryption approach for exploration of sensitive information retrieval

Abstract

This paper presents the core algorithms behind DB-Query-Encryption, a proposal that supports private information retrieval (PIR). DB-Query-Encryption permits users to selectively retrieve information from a cloud database while keeping sensitive search terms secret. As an example use case, a medical research institute may, as part of a sensitive data exploration, need to look up facts about an individual in a cloud database without revealing that person’s identity. The basic idea behind DB-Query-Encryption is to use homomorphic encryption, which allows the cloud server to fulfill the request while making it infeasible for the database owner (or a hacker who might have compromised the server) to determine either the name being searched for or which records are retrieved. The query remains secret even if an observer is able to inspect all the data on the cloud server and all the operations as they are executed. At the same time, the query response produced by the cloud server is considerably smaller than the whole cloud database, which makes the approach useful when it is not feasible or appropriate for the user to transfer the entire database.

Keywords

Cloud database; homomorphic encryption; privacy; sensitive information retrieval; information security

1 Introduction

Sensitive information retrieval (SIR) is a powerful technique that permits users to retrieve data from an untrusted cloud server without the server, or any malicious eavesdropper, being able to determine what data the user is retrieving.

SIR has many potential applications besides the medical research use case mentioned in the abstract. An enterprise may not want to disclose its confidential research while it prepares to register a patent. Queries into a map database may reveal a geographical position that is of interest to an oil drilling business or a military unit. An investor querying a stock market database for the value of a certain stock may wish to keep the identity of that stock secret. Businesses that hold an enormous volume of proprietary data and make it accessible to customers could offer, as an additional confidentiality feature, the ability to search it using Sensitive Information Retrieval.

SIR was first proposed in 1995 [1]. According to [1], an information-theoretically secure SIR solution always requires communicating an amount of data at least proportional to the size of the database. In view of this, a number of solutions were proposed in [1, 2] that first replicate the database among more than one server, which makes the communication complexity between the user and the databases much lower. All these solutions require that the servers do not collude with each other in order to keep the query protected. This may not be a realistic assumption. Therefore a new concept was introduced in [3, 4]: single-server, computationally private SIR, in which the secrecy of the query depends on computational hardness assumptions rather than on information-theoretic security. Since then, Sensitive Information Retrieval (and, more generally, sensitive search) has been a very active field of research, and numerous SIR systems have been proposed in recent years [5–8].

Organization. The rest of the paper is structured as follows. Section 2 presents the basic concepts of DB-Query-Encryption and how a secure query is generated and answered. Section 3 introduces homomorphic encryption and the Paillier cryptosystem, one of the key homomorphic encryption schemes. Section 4 presents sensitive information retrieval using homomorphic encryption over a cloud database. Section 5 describes how DB-Query-Encryption extends the basic Sensitive Information Retrieval solution of Section 4 to handle multiple matching records and multiple targeted selectors, as described in Section 2. Section 6 presents the distributed processing of Sensitive Information Retrieval queries in the DB-Query-Encryption system. Section 7 concludes the paper.

2 DB-query-encryption concepts

In the DB-Query-Encryption [9] model, a Query user wishes to selectively retrieve data from a Query responder, such as a cloud database, that has access to a stream of records. The Query user, who is assumed to know the structure of the data, tells the Query responder how each record is to be processed by means of a query schema file. This file specifies to the Query responder how to extract the selector and the return data from each record. The Query user has a secret list of specific selector values, known as the targeted selectors, and wishes to obtain the return data for all records whose selector matches one of the targeted selectors, without revealing the targeted selectors to the Query responder.

For example, if the records are the rows of a table or view in a relational database, then the query schema specifies which table column is to be used as the selector field and which table columns are to be returned as data. This information is not hidden from the Query responder, because the Query responder needs it in order to process the records. Only the values of the targeted selectors are kept secret.

DB-Query-Encryption proceeds in the following steps:

2.1 Query generation

The Query user produces an encrypted query file that is sent to the Query responder. The bulk of this file is a list of ciphertexts that securely encode information about the targeted selectors. The query file also contains the query schema in plaintext. A decryption key is generated along with the query file. The Query user keeps this key secret and uses it later to decrypt the response.

2.2 Response generation

Once the Query responder receives the query file, it processes the records in the stream against it. For every record, the Query responder extracts the selector S and the return information I according to the query schema, and then executes a cryptographic operation that “inserts” I into an “encrypted buffer” that either keeps or rejects the data based on the value of S. The Query responder executes the same operation on every record, and the homomorphic encryption prevents it from learning whether the data was kept or rejected. Finally, the encrypted buffer is returned to the Query user in a response file. To limit memory and disk usage when processing an unbounded stream or a large database, the Query responder may send back response files and flush the encrypted buffer from time to time. Because non-matching data is rejected, the response files sent back to the Query user are smaller than the total amount of data processed.

2.3 Response decryption

When the Query user receives the response files, it decrypts them with the decryption key and extracts the return data of the records that passed the filtering process. The result includes all records whose selector S matches one of the targeted selectors. A small percentage of non-matching records may also be returned as false positives. Supplementary information (e.g. a checksum of the record selector, or the selector itself) can be included in the return data to permit the Query user to recognize and reject these false positives.

3 Homomorphic encryption

Encryption translates a plaintext p into a ciphertext $E(p)$ from which it is hard to recover information about p without the decryption key. With traditional encryption schemes such as AES [10], it is difficult to modify a ciphertext in any way that brings about a meaningful, controlled modification of the decrypted value. In contrast, homomorphic encryption schemes permit certain operations to be performed on ciphertexts that translate into meaningful operations on the underlying plaintexts, without requiring the ciphertexts to be decrypted first. Several such encryption schemes have been proposed for use in Sensitive Information Retrieval. DB-Query-Encryption currently uses one such scheme, the Paillier cryptosystem [11].

3.1 Paillier encryption

DB-Query-Encryption includes the optimizations to the Paillier cryptosystem that are described in [12]. In this section, we describe only the basic system, without those optimizations.

The Paillier cryptosystem uses two large random prime numbers p and q of approximately the same size to compute the public key N = p·q. The decryption key consists of the factors p and q. The public key N is sufficient to encrypt plaintexts, whereas the decryption key is required to decrypt ciphertexts. This arrangement is similar to that of the widely used RSA cryptosystem [13].

In the Paillier cryptosystem, plaintexts take values in the range {0, …, N − 1}, whereas ciphertexts take values in the larger range {0, …, N² − 1}. In other words, a ciphertext is twice as long in bits as a plaintext (expansion factor 2). Since the DB-Query-Encryption protocol involves transferring ciphertexts, this ratio is a measure of how efficiently bandwidth is used to transmit data.

The bit length of N determines the security of the Paillier system; this parameter is called the (Paillier) key size. Here “security” refers to the difficulty of reversing the encryption process (i.e. recovering information about the underlying plaintexts from ciphertexts) without the decryption key. (In particular, it should be hard to determine the decryption key given only the public modulus N.) Table 1 lists the key sizes recommended in [14, 15], with the corresponding plaintext and ciphertext sizes, for three different security levels.

Encrypting a plaintext x involves choosing a random value $r \overset{\$}{\leftarrow} \{1, \dots, N - 1\}$ and then computing $E(x) \leftarrow (1 + xN)\, r^{N} \bmod N^{2}$ (1)

The “noise factor” $r^{N}$ effectively hides the plaintext value x. For the plaintext to be recoverable, r must be an invertible element of $\mathbb{Z}/N^{2}\mathbb{Z}$. We can assume that this is the case, since a randomly chosen r is extremely unlikely to be divisible by p or q. Once $E(x)$ has been computed, r is no longer needed and may be discarded.

Because encryption is randomized, encrypting the same plaintext twice produces different ciphertexts. More precisely, the Paillier cryptosystem provides the following crucial semantic security property: an unauthorized party (without the decryption key) cannot determine whether two ciphertexts were generated from the same plaintext or from different plaintexts.

The Paillier scheme has the following homomorphic property. Given two ciphertexts $E_{1} = E(x_{1})$ and $E_{2} = E(x_{2})$ that encrypt the plaintexts $x_{1}$ and $x_{2}$, the product $E_{1} \cdot E_{2} \bmod N^{2}$ is a new ciphertext that encrypts the plaintext $x_{1} + x_{2} \bmod N$. This can be written as follows: $E(x_{1}) \cdot E(x_{2}) \bmod N^{2} \sim E(x_{1} + x_{2} \bmod N)$ (2)

We write “∼” instead of “=” because $E$ on the right-hand side is a randomized algorithm, so $E(x_{1} + x_{2})$ does not have a single well-defined value. In general, multiplication of ciphertexts corresponds to addition of plaintexts. To verify that equation 2 always holds, we expand its left-hand side and show that it has the form of a ciphertext $E(x_{1} + x_{2} \bmod N)$: $\begin{aligned} E_{1} \cdot E_{2} &\equiv (1 + x_{1} N)\, r_{1}^{N} \cdot (1 + x_{2} N)\, r_{2}^{N} \pmod{N^{2}} \\ &\equiv (1 + x_{1} N)(1 + x_{2} N)\, r_{1}^{N} r_{2}^{N} \pmod{N^{2}} \\ &\equiv (1 + (x_{1} + x_{2}) N + x_{1} x_{2} N^{2})\, (r_{1} r_{2})^{N} \pmod{N^{2}} \\ &\equiv (1 + (x_{1} + x_{2}) N)\, (r_{1} r_{2})^{N} \pmod{N^{2}} \\ &\sim E(x_{1} + x_{2}) \end{aligned}$

For later use, we also need the following second homomorphic property: $E(x)^{r} \bmod N^{2} \sim E(r \cdot x \bmod N)$ (3)

This is essentially a consequence of equation 2, since $E(x)^{r}$ is repeated multiplication by $E(x)$ and r · x is repeated addition of x.

Decryption of a ciphertext uses the decryption exponent $λ = \mathrm{lcm}(p - 1, q - 1)$, which can be computed from the private key. We also require the value $w = λ^{-1} \bmod N$. The values λ and w may be precomputed and stored as part of the private decryption key. The formula for decrypting a given ciphertext $E = (1 + xN)\, r^{N} \bmod N^{2}$ is: $u \leftarrow E^{λ} \bmod N^{2}$ (4) $D(E) \leftarrow \left(w \cdot \frac{u - 1}{N}\right) \bmod N$ (5)

These equations work because Nλ is the “exponent” of the group of units $(\mathbb{Z}/N^{2}\mathbb{Z})^{\times}$ [16], so that raising to the λ-th power cancels the noise factor $r^{N}$: $\begin{aligned} u &\equiv (1 + xN)^{λ}\, r^{Nλ} \pmod{N^{2}} \\ &\equiv (1 + xλN) \cdot 1 \pmod{N^{2}} \\ &\equiv 1 + xλN \pmod{N^{2}} \end{aligned}$ Hence $(u - 1)/N \equiv xλ \pmod{N}$, and multiplying by $w = λ^{-1} \bmod N$ recovers x.
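As a minimal illustration of equations 1 to 5, the following Python sketch implements the basic Paillier operations with toy parameters. The primes 1009 and 1013, the function names, and the sample plaintexts are assumptions chosen for readability; practical key sizes are far larger, and the optimizations of [12] are not shown.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy key pair from two primes: public N, private (lam, w)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # decryption exponent lambda
    w = pow(lam, -1, n)            # w = lambda^-1 mod N
    return n, (lam, w)


def encrypt(n, x):
    """Equation 1: E(x) = (1 + x*N) * r^N mod N^2 with a fresh random r."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1       # r in {1, ..., N-1}
    while math.gcd(r, n) != 1:             # r must be invertible mod N^2
        r = secrets.randbelow(n - 1) + 1
    return (1 + x * n) * pow(r, n, n2) % n2


def decrypt(n, key, c):
    """Equations 4 and 5: u = c^lambda mod N^2, x = w * (u - 1)/N mod N."""
    lam, w = key
    u = pow(c, lam, n * n)
    return w * ((u - 1) // n) % n


n, key = keygen(1009, 1013)        # toy primes, far too small for real use
n2 = n * n
c1, c2 = encrypt(n, 1234), encrypt(n, 5678)

sum_plain = decrypt(n, key, c1 * c2 % n2)        # property 2: 1234 + 5678
scaled_plain = decrypt(n, key, pow(c1, 7, n2))   # property 3: 7 * 1234
print(decrypt(n, key, c1), sum_plain, scaled_plain)
```

Multiplying the two ciphertexts yields an encryption of the sum, and raising a ciphertext to the power 7 yields an encryption of seven times the plaintext, without the key being used until the final decryption.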

3.2 Other homomorphic encryption schemes

Homomorphic encryption is an active field of research, e.g. [7, 18], and many homomorphic encryption schemes proposed since the Paillier cryptosystem are relevant to Sensitive Information Retrieval. Among these proposals, two that are receiving serious attention are the Brakerski-Fan-Vercauteren [19] and Brakerski-Gentry-Vaikuntanathan [20] schemes. Both are based on mathematical objects known as lattices. While the lattice-based schemes are still maturing, they promise several possible benefits over the Paillier scheme:

3.2.1 Flexibility

The newer schemes are fully homomorphic, or at least somewhat homomorphic, so that users can perform both addition and multiplication on the underlying plaintexts just by operating on ciphertexts. This can potentially permit more complex SIR queries.

3.2.2 Speed

Methods such as the Number Theoretic Transform allow faster plaintext and ciphertext operations in certain lattice-based schemes [21].

3.2.3 Post-quantum security

The development of a working full-scale quantum computer, believed by some researchers to be attainable within two decades [22, 23], would completely break all current factoring-based cryptography, including the Paillier cryptosystem. Despite significant research effort, there are at present no known efficient quantum attacks on well-designed lattice-based systems such as those proposed in [19] and [20].

On the other hand, setting aside the possibility of a quantum computer, the Paillier scheme has a few benefits over existing lattice-based homomorphic encryption schemes, so there may be interesting tradeoffs to consider when choosing between them:

3.2.4 Known security

Factoring-based cryptography has been around for much longer and has withstood significant scrutiny for a much longer time.

3.2.5 Lower expansion ratio

The expansion ratio of existing lattice-based schemes is at least several times larger than that of the Paillier scheme; the Paillier scheme therefore consumes less bandwidth to transfer the same quantity of information.

While DB-Query-Encryption currently uses only the Paillier cryptosystem, support for a lattice-based scheme is being developed.

4 SIR with homomorphic encryption

With the Paillier scheme in hand, we can now address the basic problem in SIR: given a Query responder with a list of indexed data elements $x_{1}, \dots, x_{n}$, how can a Query user retrieve the i-th element $x_{i}$ from the Query responder without disclosing i to the Query responder? We exclude the trivial solution in which the Query responder simply sends the complete list to the Query user. We assume that the list elements are representable as plaintexts in the Paillier scheme, i.e. $0 \le x_{j} \le N - 1$. The solution described here derives from [24] and is the starting point of the DB-Query-Encryption protocol.

The Query user first generates a Paillier modulus N and the associated decryption key, and then prepares a list of n ciphertext values $E_{1}, \dots, E_{n}$, where $E_{i}$ (corresponding to the secret index i) is an encryption of the plaintext value 1 and all the other values are encryptions of zero: $\begin{aligned} E_{i} &:= E(1) \\ E_{j} &:= E(0) \quad \text{for } j \neq i \end{aligned}$

We call these the query elements.

The Query user sends the public modulus N and the ordered list of query elements $E_{1}, \dots, E_{n}$ to the Query responder, while keeping the decryption key (the factors p and q) private. Because Paillier encryption is randomized, the $E_{j}$ are indistinguishable to the Query responder from entirely random ciphertexts. Note the importance of semantic security here: the same plaintext (zero) is encrypted many times, but the resulting ciphertexts are all different, seemingly random values. In particular, the Query responder cannot tell which query element corresponds to the plaintext value 1. The Query responder now initializes an accumulator R ← 1 (a Paillier ciphertext value) and updates it for each element in the list $x_{1}, \dots, x_{n}$: $R \leftarrow E_{j}^{x_{j}} \cdot R \bmod N^{2} \quad \text{for } j = 1, 2, \dots, n$ Note that every list element is handled in the same way. The Query responder sends the final value of R to the Query user.

When the Query user decrypts this value, it obtains the desired list element $x_{i}$. To see why, we can use Properties 2 and 3 to follow what happens to the accumulator. The initial value of R is 1, which is a valid encryption of the plaintext zero (equation 1 with x = 0 and r = 1). For each iteration of the loop other than the i-th, the query element $E_{j}$ being used is $E(0)$, and $E_{j}^{x_{j}} = E(0)^{x_{j}} \sim E(x_{j} \cdot 0) = E(0)$ by Property 3, so updating the accumulator R by $E_{j}^{x_{j}} \cdot R \bmod N^{2}$ does not change the underlying plaintext, by Property 2. In the i-th step (when j = i) we have $E_{j}^{x_{j}} = E(1)^{x_{i}} \sim E(x_{i} \cdot 1) = E(x_{i})$, so multiplying R by $E_{j}^{x_{j}}$ in this case has the effect of adding $x_{i}$ to the underlying plaintext.

We can visualize this procedure as in Figure 1. A box is used as a metaphor for a ciphertext containing data. Figure 1(a) shows a single box representing the ciphertext accumulator R from the Query responder’s point of view. This box has n different cells, with each element $x_{j}$ being directed to its own cell by its corresponding query element $E_{j}$. Only the i-th cell is “real” (i.e. can hold data, because $E_{i} = E(1)$), but the Query responder does not know which one; the rest are “virtual” cells that silently reject data (because $E_{j} = E(0)$). When the Query user receives the final value of R (Figure 1(b)), the ciphertext contains only the data from the real cell, which the Query user can extract after decrypting the ciphertext.
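The accumulator argument above can be checked with a short Python sketch of the whole exchange. It is a toy under stated assumptions: the primes 1009 and 1013, the five-element list, and the helper names `make_query` and `respond` are illustrative and not part of the protocol specification, and indices are 0-based in the code whereas the text counts from 1.

```python
import math
import secrets


def encrypt(n, x):
    """Paillier encryption E(x) = (1 + x*N) * r^N mod N^2."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + x * n) * pow(r, n, n2) % n2


def decrypt(p, q, c):
    """Paillier decryption with lambda = lcm(p-1, q-1), w = lambda^-1 mod N."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    u = pow(c, lam, n * n)
    return pow(lam, -1, n) * ((u - 1) // n) % n


def make_query(n, size, i):
    """Query user: E(1) at the secret index i, E(0) everywhere else."""
    return [encrypt(n, 1 if j == i else 0) for j in range(size)]


def respond(n, query, data):
    """Query responder: fold every list element into the accumulator R."""
    n2 = n * n
    acc = 1                                   # R <- 1, an encryption of zero
    for e_j, x_j in zip(query, data):
        acc = pow(e_j, x_j, n2) * acc % n2    # R <- E_j^{x_j} * R mod N^2
    return acc


p, q = 1009, 1013                 # toy primes (assumption)
n = p * q
data = [11, 22, 33, 44, 55]       # responder's list; needs 0 <= x_j < N
secret_index = 3                  # known only to the Query user
response = respond(n, make_query(n, len(data), secret_index), data)
retrieved = decrypt(p, q, response)
print(retrieved)
```

The responder performs the same modular exponentiation and multiplication for every element, so nothing in its control flow depends on the secret index.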

Fig. 1

Basic SIR Filtering.

5 SIR in DB-query-encryption

This section describes how DB-Query-Encryption extends the basic Sensitive Information Retrieval solution of Section 4 to handle multiple matching records and multiple targeted selectors, as described in Section 2. We build upon the exposition in [25], which lays out the algorithms of the original Pirk scheme. Throughout this section, we treat the records as a stream of selector-return data pairs (T, D).

For the reader’s convenience, we include a table of notation (Table 2). Notation that has not yet been introduced is explained in Table 2 and in the subsections that follow.

Table 1

Paillier cryptosystem key sizes and the corresponding plaintext and ciphertext sizes for three security levels

Security Level (Bits)	Key Size (Bits)	Plaintext (Bytes)	Ciphertext (Bytes)
80	1024	128	256
112	2048	256	512
128	3072	384	768

Table 2

Table of Notation

Notation	Description
N	the Paillier modulus/public key
p, q	the Paillier private key (N = pq)
$E, D$	Paillier encryption and decryption
$E_{N}, D_{p, q}$	Paillier encryption and decryption, with explicit keys
ℓ	hash length, in bits
E_j	the query element corresponding to records with H(T) = j
H	selector hash function (with ℓ-bit output)
τ	number of selectors
b	chunk size, in bits
T_j	the j-th targeted selector
k	hash key
H_ℓ,k	selector hash function, with explicit output length and hash key
T	the selector for a general record
D	the return data for a general record
d_i	a chunk of the return data D, stored in the i-th position within a plaintext
H_aux	auxiliary selector hash function

5.1 Handling large selector space via hashing

The basic solution in Section 4 handles data indexed from 1 to n. To handle more general selector spaces in which the number of possible selector values is enormous or unbounded (e.g. full names of people), DB-Query-Encryption uses a hash function H to map selector values to numbers, similar to the method proposed in [27]. The outputs of H are ℓ-bit numbers, where the parameter ℓ is called the hash length. The encrypted query file now contains 2^ℓ ciphertexts E₀, …, E_{2^ℓ-1}, each corresponding to one possible output value of H. This is shown in Figure 2. For each record, the hash of the selector (the full name in this example) now determines which query element E_j to use, and thus which cell the return data goes into. In the figure, the list of query elements was prepared with “Smith, John” as the targeted selector, so E₇ is an encryption of 1 and corresponds to the only “real” cell in the image.

Since each cell can hold only one data item, ℓ should be chosen large to reduce the probability of hash collisions. However, since each E_j is typically several hundred bytes long (Section 3.1), ℓ cannot be made too large without making the query file excessively large. There is therefore a practical upper bound on ℓ of about 20.
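To make the bound on ℓ concrete, the following Python sketch estimates the size of the list of query elements. It assumes a 3072-bit key (so each ciphertext is 2 × 3072 bits) and ignores the query schema and any file-format overhead; the function name is illustrative.

```python
def query_file_bytes(hash_length_bits, key_size_bits):
    """Size of 2^l query elements, each a ciphertext of 2 * key_size bits."""
    ciphertext_bytes = 2 * key_size_bits // 8
    return (1 << hash_length_bits) * ciphertext_bytes


for ell in (12, 16, 20, 24):
    size = query_file_bytes(ell, 3072)
    print(f"l = {ell:2d}: {size / 2**20:10.1f} MiB")
```

Under these assumptions the query grows from 3 MiB at ℓ = 12 to 768 MiB at ℓ = 20, and each further bit of hash length doubles it.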

5.2 Handling multiple selectors via slicing

DB-Query-Encryption makes efficient use of bandwidth and permits greater query flexibility by allowing several targeted selectors in the same query. To handle τ targeted selectors, we choose a bit width b (the chunk size) and use plaintexts of the form $d_{0} + d_{1} \cdot 2^{b} + d_{2} \cdot 2^{2b} + \dots + d_{τ - 1} \cdot 2^{(τ - 1) b}$ (6) to store τ pieces of data, each from a different record. In effect, the available storage space in a plaintext is divided into τ consecutive b-bit ranges. The pieces d_i in equation 6 are also the digits of the plaintext when written in base 2^b. The parameters τ and b are constrained by the condition $2^{bτ} < N$.

Query generation is modified to handle slicing as follows. Denote the τ targeted selectors by T₀, …, T_τ-1. For now, assume that their ℓ-bit hash values are distinct. The query elements E₀, …, E_{2^ℓ-1} are now chosen so that each E_i is either an encryption of a power of two 2^{jb} (i.e. a 1 in the j-th plaintext “digit” position) or an encryption of zero: $E_{i} = \begin{cases} E(2^{jb}), & \text{if } i = H(T_{j}) \\ E(0), & \text{otherwise} \end{cases}$ The modified procedure is illustrated in Figure 3. The ciphertext now contains not just one but τ “real” cells (Figure 3a). We use smaller boxes to suggest that, while the plaintext and ciphertext sizes remain the same as before, a smaller value can now be stored in each cell. For example, taking b = 8, we store at most one byte (any value between 0 and 255) in each cell. The selector values “Smith, John” and “Smith, Jane” are two of the τ targeted selectors, and their hash values 7 and 17 correspond to real cells, i.e. byte positions within the plaintext. Figure 3b shows the response decryption and decoding step. Decrypting the ciphertext results in a large integer, written here in hexadecimal form. The data for the different selectors are read off from different byte positions within the integer.
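The effect of slicing is easiest to see at the plaintext level. The Python sketch below leaves out encryption entirely: ordinary integer multiplication and addition stand in for the ciphertext exponentiation and multiplication that the Query responder actually performs. The hash values 7 and 17 come from the example in the text; the third hash value, the record data, and the helper names are made up for illustration.

```python
def decode_digits(plaintext, b, tau):
    """Read the tau base-2^b digits d_0, ..., d_{tau-1} of equation 6."""
    mask = (1 << b) - 1
    return [(plaintext >> (j * b)) & mask for j in range(tau)]


b, tau = 8, 3
# Hash value of targeted selector T_j -> digit position j.
position_of_hash = {7: 0, 17: 1, 4: 2}


def element_plaintext(i):
    """Plaintext hidden in query element E_i: 2^(j*b) if targeted, else 0."""
    j = position_of_hash.get(i)
    return 0 if j is None else 1 << (j * b)


# (selector hash, one-chunk return data) for a few records.
records = [(7, 0x41), (12, 0x99), (17, 0x42), (4, 0x43), (30, 0x7F)]

# E_h^d multiplies the hidden plaintext by d; multiplying ciphertexts adds.
accumulated = sum(element_plaintext(h) * d for h, d in records)
print(hex(accumulated), decode_digits(accumulated, b, tau))
```

Records whose hash is not targeted contribute zero, and each targeted record lands in its own byte of the accumulated plaintext.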

Fig. 2

Handling large selector space via hashing (Responder view).

Fig. 3

Handling multiple selectors.

This scheme would still work even if some of the targeted selectors have the same hash value, but the Query user would need additional information to determine which selector the recovered data corresponds to, since that information is no longer determined by the digit position alone. DB-Query-Encryption tries to ensure that different targeted selectors have different hash values if possible by using a randomly generated 32-bit hash key k as part of the input to H. If a hash collision occurs, a different hash key k′ is tried, up to a certain maximum number of tries.
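The retry loop for the hash key can be sketched as follows. The keyed hash used here (the low ℓ bits of SHA-256 over the key and the selector) is only a stand-in chosen so that the example runs; it is not claimed to be the hash function that DB-Query-Encryption uses, and the function names and the limit of 16 tries are likewise assumptions.

```python
import hashlib
import secrets


def selector_hash(selector, ell, k):
    """Stand-in for H_{l,k}: low l bits of SHA-256(k || selector)."""
    data = k.to_bytes(4, "big") + selector.encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") & ((1 << ell) - 1)


def pick_hash_key(selectors, ell, max_tries=16):
    """Draw random 32-bit keys until all targeted selectors hash differently."""
    for _ in range(max_tries):
        k = secrets.randbits(32)
        hashes = [selector_hash(t, ell, k) for t in selectors]
        if len(set(hashes)) == len(hashes):
            return k, hashes
    raise RuntimeError("no collision-free hash key found")


k, hashes = pick_hash_key(["Smith, John", "Smith, Jane"], ell=12)
print(k, hashes)
```

With few selectors relative to 2^ℓ the first key almost always succeeds; retries matter mainly when τ approaches the square root of 2^ℓ.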

5.3 Chunking up the data

To handle return data longer than b bits, the data is split into b-bit chunks that are handled sequentially with separate ciphertext accumulators. New accumulators are created as necessary. A counter is kept for each of the 2^ℓ possible selector hash values to track which accumulator to use next to store a chunk of data for that hash value (Figure 4).
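A possible way to split return data into b-bit chunks is sketched below in Python. The chunk order (most significant bits first) and the zero-padding of the first chunk are assumptions made for this example, since the text does not fix them; the function names are illustrative.

```python
def split_chunks(data, b):
    """Split a byte string into s chunks D_0, ..., D_{s-1} of b bits each."""
    total_bits = 8 * len(data)
    s = -(-total_bits // b)                   # ceil(total_bits / b)
    value = int.from_bytes(data, "big")
    mask = (1 << b) - 1
    # D_0 holds the most significant bits; it is zero-padded if needed.
    return [(value >> (b * (s - 1 - t))) & mask for t in range(s)]


def join_chunks(chunks, b, length):
    """Inverse of split_chunks for data of a known byte length."""
    value = 0
    for chunk in chunks:
        value = (value << b) | chunk
    return value.to_bytes(length, "big")


record = b"\x12\x34\x56"
chunks = split_chunks(record, 8)
print([hex(c) for c in chunks])
```

Each chunk D_t would then be folded into its own accumulator, so a record of s chunks advances the counter for its hash value by s.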

Fig. 4

Chunking (Query Responder view).

5.4 Handling hash collisions

If (T, D) is a data record whose selector T is not one of the targeted selectors but H(T) happens to match the hash value of one of the targeted selectors T_j, then the return data D will be included in the DB-Query-Encryption response. This is called a false positive. Since the selector hash function H has a limited output range (ℓ-bit values only), this is expected to occur in practice for a small percentage of records. The rate at which this is expected to happen for non-matching records (T ∉ {T_j}), known as the false positive rate, depends on the various parameters and the nature of the database. If we expect selector values to be very diverse, so that the hash value is more or less uniformly random, then a random non-matching selector has a τ/2^ℓ chance of having the same hash value as one of the targeted selectors. (For example, for τ = 100 targeted selectors and hash length ℓ = 20 we would expect roughly 1 in 10,000 of all non-matching records to be retrieved.) In general, the false positive rate can be reduced by increasing ℓ (thus increasing the size of the query file), or (with lesser effect) decreasing τ.
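The arithmetic behind this estimate is short enough to check directly. The Python sketch below assumes, as in the text, uniformly random hash values and distinct hashes for the targeted selectors; the one million non-matching records are a made-up workload.

```python
def false_positive_rate(tau, ell):
    """Chance that a non-matching selector hits one of tau targeted hashes."""
    return tau / 2**ell


rate = false_positive_rate(100, 20)
expected_hits = 1_000_000 * rate          # hypothetical stream size
print(f"rate = {rate:.3e}, about 1 in {round(1 / rate):,}")
print(f"expected false positives per million records: {expected_hits:.1f}")
```

Doubling τ doubles the rate, while each additional bit of ℓ halves it at the cost of doubling the query size.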

The user should ensure that the return data allows the Query user to decide whether or not it is a false positive. DB-Query-Encryption provides an option to prepend to the return data a 32-bit auxiliary hash value H_aux (T) of the selector. During response decryption, the Query user compares this second hash value included in the return data with the expected value H_aux (T_j) and silently drops the data if the hash values do not match.

5.5 Putting it all together

We now put the preceding ideas together and encapsulate the primary algorithms used in DB-Query-Encryption. The Query user first generates a Paillier key pair N, p, q as in Section 3.1, then runs Algorithm 1 to create the query. The input to Algorithm 1 includes the Paillier public key N and a list of targeted selectors {T₀, … , T_τ-1 }, and the process returns a hash key and a list of query elements. The algorithm uses the Paillier encryption operation $E$ which is described in Section 3.1.

Algorithm 1 Query Generation

procedure QueryGen(b, ℓ, N, {T₀, …, T_τ-1})
    repeat
        $k \overset{\$}{\leftarrow} \{0, \dots, 2^{32} - 1\}$
        H ← H_ℓ,k
        for j ← 0, …, τ - 1 do
            h_j ← H(T_j)
        end for
    until h₀, …, h_τ-1 are distinct
    for i ← 0, …, 2^ℓ - 1 do
        if i = h_j for some j then
            $E_{i} \leftarrow E_{N}(2^{jb})$
        else
            $E_{i} \leftarrow E_{N}(0)$
        end if
    end for
    return k, {E₀, …, E_{2^ℓ-1}}
end procedure

Algorithm 2 Response Generation

procedure ResponseGen(b, ℓ, N, k, {E₀, …, E_{2^ℓ-1}}, 𝒯 = {(T, D)})
    H ← H_ℓ,k
    c ← 0
    for i ← 0, …, 2^ℓ - 1 do
        c_i ← 0
    end for
    for (T, D) in 𝒯 do
        h ← H(T)
        D₀ ∥ … ∥ D_s-1 ← D
        if c < c_h + s then
            for t ← c, …, c_h + s - 1 do
                Y_t ← 1
            end for
            c ← c_h + s
        end if
        for t ← 0, …, s - 1 do
            $Y_{c_{h} + t} \leftarrow E_{h}^{D_{t}} \cdot Y_{c_{h} + t} \bmod N^{2}$
        end for
        c_h ← c_h + s
    end for
    return {Y₀, …, Y_c-1}
end procedure

Algorithm 2 summarizes the response generation process, in which the Query responder processes the query against the data stream $\mathcal{T}$ of selector-return data pairs (T, D). We present here the basic “batch” version that returns a single response file after processing the entire data stream. Finally, Algorithm 3 summarizes the response decryption process, which uses the Paillier decryption operation from Section 3.1. The algorithm returns τ separate result sets R₀, …, R_τ-1, where R_j contains the return data for the records matching the targeted selector T_j. This basic version assumes that the hash values H_k(T₀), …, H_k(T_τ-1) are all distinct, and that all the return data have the same length, consisting of s chunks of b-bit values. We include the checking of an auxiliary hash value to reduce false positives.

Algorithm 3 Response Decryption

procedure ResponseDec(b, ℓ, {p, q}, {T₀, …, T_τ-1}, k, {Y₀, …, Y_c-1})
    H ← H_ℓ,k
    for t ← 0, …, c - 1 do
        $X_{t} \leftarrow D_{p, q}(Y_{t})$
        $\sum_{j = 0}^{τ - 1} X_{t, j} \cdot 2^{bj} \leftarrow X_{t}$
    end for
    for j ← 0, …, τ - 1 do
        R_j ← {}
        for t ← 0, …, c/s - 1 do
            D ← X_{st,j} ∥ X_{st+1,j} ∥ ⋯ ∥ X_{st+s-1,j}
            if D is not zero then
                extract included auxiliary hash value h_aux from D
                if h_aux = H_aux(T_j) then
                    R_j ← R_j ∪ {D}
                end if
            end if
        end for
    end for
    return {R₀, …, R_τ-1}
end procedure

6 Distributed Processing

The Responder’s work can be distributed amongst multiple processors to increase throughput. The method described here is an elaboration of the method in [25, 28].

By regarding the power $E_{i}^{d_{i}} \bmod N^{2}$ as a ciphertext-plaintext multiplication operation $E_{i} \otimes d_{i}$, one can regard the key SIR computation $Y = \prod_{i = 0}^{2^{ℓ} - 1} E_{i}^{d_{i}} \bmod N^{2}$

(where the d_i are plaintext values coming from data chunks) as a matrix product of a row vector times a column vector: $Y = \begin{bmatrix} E_{0} & \dots & E_{2^{ℓ} - 1} \end{bmatrix} \cdot \begin{bmatrix} d_{0} \\ \vdots \\ d_{2^{ℓ} - 1} \end{bmatrix}$ To handle multiple columns of plaintext chunks, we simply replace the column vector with a multi-column matrix D: $\begin{aligned} Y &= \begin{bmatrix} E_{0} & \dots & E_{2^{ℓ} - 1} \end{bmatrix} \cdot D \\ &= \begin{bmatrix} E_{0} & \dots & E_{2^{ℓ} - 1} \end{bmatrix} \cdot \begin{bmatrix} d_{0, 0} & \dots & d_{0, r - 1} \\ \vdots & \ddots & \vdots \\ d_{2^{ℓ} - 1, 0} & \dots & d_{2^{ℓ} - 1, r - 1} \end{bmatrix} \end{aligned}$

The work to compute this vector-matrix product can be parallelized by partitioning the query element indices 0, …, 2^ℓ - 1 and/or the column indices 0, …, r - 1 into subranges. The work for separate subranges can be performed independently.

Algorithm 4 Response Generation (Distributed)

procedure ResponseGenDist(b, ℓ, N, k, {E₀, …, E_{2^ℓ-1}}, 𝒯 = {(T, D)}, m, r)
    H ← H_ℓ,k
    w ← ⌈2^ℓ/m⌉
    for j ← 0, …, m - 1 do
        H_j ← {jw, …, min{(j + 1)w - 1, 2^ℓ - 1}}
    end for
    for (T, D) in 𝒯 in parallel do
        i ← H(T)
        j ← ⌊i/w⌋
        emit (j, i, D)
    end for
    for j ← 0, …, m - 1 in parallel do
        initialize M to a $|H_{j}| \times r$ all-zero matrix
        for $i \in H_{j}$ do
            c_i ← 0
        end for
        for each emitted (j, i, D) do
            D₀ ∥ … ∥ D_s-1 ← D
            if r < c_i + s then
                skip this record (it is reprocessed in the next iteration)
            end if
            for t ← 0, …, s - 1 do
                M_{i,c_i+t} ← D_t
            end for
            c_i ← c_i + s
        end for
        for t ← 0, …, r - 1 in parallel do
            $Y_{t}^{(j)} \leftarrow \prod_{i \in H_{j}} E_{i}^{M_{i, t}} \bmod N^{2}$
            emit $(t, j, Y_{t}^{(j)})$
        end for
    end for
    for t ← 0, …, r - 1 in parallel do
        Y_t ← 1
        for each emitted $(t, j, Y_{t}^{(j)})$ do
            $Y_{t} \leftarrow Y_{t} \cdot Y_{t}^{(j)} \bmod N^{2}$
        end for
    end for
    return {Y₀, …, Y_r-1}
end procedure

The modified procedure is shown as Algorithm 4. This version also assumes that the return data has a fixed size of s b-bit chunks. To divide the work amongst m processors P₀, …, P_{m-1}, we split the range of possible hash values 0, …, 2^ℓ - 1 into m disjoint ranges $H_{0}, \dots, H_{m - 1}$ and assign one range to each processor. Processor P_j is responsible only for those records (T, D) whose hash value $H_{k}(T)$ lies in the range $H_{j}$. The outputs from the various processors can then be combined simply by multiplying all m ciphertexts together at each position. If the output of processor P_j is $Y_{0}^{(j)}, Y_{1}^{(j)}, \dots$, then the combined result is Y₀, Y₁, … where $Y_{t} = \prod_{j = 0}^{m - 1} Y_{t}^{(j)} \bmod N^{2}$ for each column index t.
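The following sketch walks through this division of work end to end on toy values. It rests on several assumptions that are not taken from DB-Query-Encryption: a textbook Paillier instance with tiny primes and generator N + 1, a query in which the element for the target hash value encrypts 1 and all others encrypt 0, precomputed hash values, one single-chunk record per hash value (so a single output column), and half-open index ranges so that the m ranges are disjoint. The names `enc`, `dec` and `processor` are hypothetical, and the "processors" run sequentially here.

```python
import random
from math import gcd

# Toy Paillier parameters (assumption: real deployments use primes of 1024+ bits).
p, q = 1009, 1013
N, N2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, N)                           # inverse of lam modulo N


def enc(msg, rng):
    """Encrypt msg as (1 + N)^msg * r^N mod N^2 for a random unit r."""
    while True:
        r = rng.randrange(1, N)
        if gcd(r, N) == 1:
            break
    return (1 + msg * N) % N2 * pow(r, N, N2) % N2


def dec(c):
    """Decrypt via L(c^lam mod N^2) * mu mod N, where L(x) = (x - 1) / N."""
    return (pow(c, lam, N2) - 1) // N * mu % N


ell, m = 3, 3                         # 2^ell = 8 hash values, m = 3 processors
size = 1 << ell
w = -(-size // m)                     # w = ceil(2^ell / m)
ranges = [range(j * w, min((j + 1) * w, size)) for j in range(m)]

rng = random.Random(0)
target = 5                            # hash value of the selector being queried
query = [enc(1 if i == target else 0, rng) for i in range(size)]

records = {0: 111, 2: 222, 5: 555, 7: 777}   # hash value -> data chunk (< N)


def processor(j):
    """Partial product over the hash values assigned to processor P_j."""
    y = 1
    for i in ranges[j]:
        if i in records:
            y = y * pow(query[i], records[i], N2) % N2
    return y


# Combine the m partial ciphertexts by multiplying them modulo N^2.
Y = 1
for y in (processor(j) for j in range(m)):
    Y = Y * y % N2
result = dec(Y)
```

With these values `result` is 555, the chunk stored under the target hash value. The records under the other hash values are multiplied by encryptions of zero and so vanish from the decrypted sum.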

This time we use a fixed number r of ciphertext accumulators (the matrix width). Records (T, D) that do not fit into the accumulators ($c_{H(T)} + s > r$) are discarded for this iteration of the algorithm and must be reprocessed by the next iteration.
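The bookkeeping for the fixed-width accumulators can be sketched as below. This is a minimal illustration under assumed conventions: records arrive as pairs of a precomputed hash value i and a list of s chunks, the matrix rows are kept in a dictionary, and `place_records` is a hypothetical helper name.

```python
def place_records(records, r, s):
    """Place each record's s chunks into row i of an r-column matrix.

    records: iterable of (i, chunks) with len(chunks) == s.
    Returns (rows, deferred), where deferred holds the records that did not
    fit (c_i + s > r) and must be reprocessed in the next iteration.
    """
    rows = {}        # hash value i -> list of r chunk slots (row i of M)
    counters = {}    # hash value i -> c_i, the next free column in row i
    deferred = []
    for i, chunks in records:
        c = counters.get(i, 0)
        if c + s > r:
            deferred.append((i, chunks))   # no room left in row i
            continue
        row = rows.setdefault(i, [0] * r)
        row[c:c + s] = chunks              # M[i][c + t] = D_t for t = 0..s-1
        counters[i] = c + s
    return rows, deferred


# Example values only: r = 4 accumulators, records of s = 2 chunks each.
records = [(4, [1, 2]), (4, [3, 4]), (7, [5, 6]), (4, [7, 8])]
rows, deferred = place_records(records, r=4, s=2)
```

In this example the third record with hash value 4 does not fit and ends up in `deferred`. Passing `deferred` to `place_records` again models the next iteration.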

7 Conclusion

In this paper, we have introduced the core concepts and algorithms behind the DB-Query-Encryption system, which supports Sensitive Information Retrieval queries. These algorithms include the Paillier homomorphic encryption scheme as well as the algorithms for query generation, response generation, and response decryption. The algorithms are highly parallelizable, so DB-Query-Encryption can handle large databases with high throughput on distributed systems. The algorithms are also tunable via parameter selection to enable optimizations for specific query conditions. With support for a lattice-based encryption scheme to be added in the near future, DB-Query-Encryption is expected to become more efficient in addition to achieving post-quantum security.

Footnotes

The algorithms used in DB-Query-Encryption can be adjusted to handle more complex data flows with numerous sources.

This could be a finite stream, e.g. the rows of a database table, or a stream that runs indefinitely, e.g. a Medical Research topic.

The usable decryptable ciphertexts are the values in {1, …, N² - 1} that are relatively prime to N.

For instance, fully recovering one or more plaintexts, or deciding whether two different ciphertexts represent the same plaintext.

The security level is measured in terms of the expected computational effort it takes to break the system. Although not approved by the U.S. NIST for protecting U.S. federal government information [], security levels below 112 bits might nonetheless be adequate for some applications.

The value lcm(p - 1, q - 1) is known as the Carmichael function of N [].

This hash function does not require cryptographic security properties; it just needs to map arbitrary-length input strings to fixed bit-length output values and have reasonable collision resistance. Currently DB-Query-Encryption computes H by taking ℓ bits from an MD5 output [].

References

1. Chor, Goldreich, Kushilevitz and Sudan, Private information retrieval, in: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, FOCS ’95, IEEE Computer Society, Washington, DC, USA, (1995), pp. 41.

2. Rauthan and Vaisla, Vrs-db: Preserve confidentiality of users’ data using encryption approach, Digital Communications and Networks (2019).

3. Kushilevitz and Ostrovsky, Replication is not needed: Single database, computationally-private information retrieval, in: Proceedings of the 38th Annual Symposium on Foundations of Computer Science, FOCS ’97, IEEE Computer Society, Washington, DC, USA, (1997), pp. 364.

4. Rauthan J.S. and Vaisla K.S., Vrs-db: Computation exploration on encrypted database, in: 2019 International Conference on Big Data and Computational Intelligence (ICBDCI), IEEE, (2019), pp. 1–6.

5. Angel and Setty, Unobservable communication over fully untrusted infrastructure, in: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, OSDI’16, USENIX Association, Berkeley, CA, USA, (2016), pp. 551–569.

6. Angel, Chen, Laine and Setty, PIR with compressed queries and amortized query processing, in: 2018 IEEE Symposium on Security and Privacy (SP), IEEE Computer Society, Los Alamitos, CA, USA, (2018), pp. 962–979. doi:10.1109/SP.2018.00062

7. Rauthan and Vaisla D.K., Privacy and security of user’s sensitive data: A viable analysis, in: International Conference on Research in Intelligent and Computing in Engineering 10 (2017), pp. 67–71. doi:10.15439/2017R45

8. Rauthan J.S. and Vaisla K.S., Scrambled database with encrypted query processing: CryptDB a computational analysis, in: 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM), (2017), pp. 199–211. doi:10.1109/ICISIM.2017.8122174

9. Rauthan J.S. and Singh Vaisla, Vrs-db: Computation exploration on encrypted database, in: 2019 International Conference on Big Data and Computational Intelligence (ICBDCI), (2019), pp. 1–6. doi:10.1109/ICBDCI.2019.8686098

10. Miller F.P., Vandome A.F. and McBrewster, Advanced Encryption Standard, Alpha Press, 2009.

11. Paillier, Public-key cryptosystems based on composite degree residuosity classes, in: Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques, EUROCRYPT’99, Springer-Verlag, Berlin, Heidelberg, (1999), pp. 223–238.

12. Jost, Lam, Maximov and Smeets B.J.M., Encryption performance improvements of the Paillier cryptosystem, IACR Cryptology ePrint Archive 2015 (2015), 864.

13. Rivest R.L., Shamir and Adleman, A method for obtaining digital signatures and public-key cryptosystems, Commun ACM 21(2) (1978), 120–126. doi:10.1145/359340.359342

14. Barker E.B., Chen, Regenscheid A.R. and Smid M.E., SP 800-56B. Recommendation for pair-wise key establishment schemes using integer factorization cryptography, Tech. rep., Gaithersburg, MD, United States (2009).

15. Barker E.B., Barker W.C., Burr W.E., Polk W.T. and Smid M.E., SP 800-57. Recommendation for key management, part 1: General (revised), Tech. rep., Gaithersburg, MD, United States (2007).

16. Weisstein E.W., Carmichael function (2009).

17. E. Union, Homomorphic encryption, applications and technology (HEAT). URL https://heat-project.eu

18. Homomorphic encryption standardization: An open industry/government/academic consortium to advance secure computation. URL http://homomorphicencryption.org

19. Fan and Vercauteren, Somewhat practical fully homomorphic encryption, IACR Cryptology ePrint Archive 2012 (2012), 144.

20. Brakerski, Gentry and Vaikuntanathan, (Leveled) fully homomorphic encryption without bootstrapping, in: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS ’12, ACM, New York, NY, USA, (2012), pp. 309–325. doi:10.1145/2090236.2090262

21. Bernstein D.J., Fast multiplication and its applications (2003).

22. Moody, Chen, Jordan, Liu Y.-K., Smith, Perlner and Peralta, NIST report on post-quantum cryptography (04 2016). doi:10.6028/NIST.IR.8105

23. Greenemeier, How close are we really to building a quantum computer?

24. Stern J.P., A new and efficient all-or-nothing disclosure of secrets protocol, in: K. Ohta, D. Pei (Eds.), Advances in Cryptology – ASIACRYPT’98, Springer Berlin Heidelberg, Berlin, Heidelberg, (1998), pp. 357–371.

25. Williams E.A., Friends, Wideskies: Scalable private information retrieval.

26. Rivest, The MD5 message-digest algorithm (1992).

27. Blass E.-O., Di Pietro, Molva and Önen, PRISM – privacy-preserving search in MapReduce, in: S. Fischer-Hübner, M. Wright (Eds.), Privacy Enhancing Technologies, Springer Berlin Heidelberg, Berlin, Heidelberg, (2012), pp. 180–200.

28. Mayberry, Blass E.-O. and Chan A.H., PIRMAP: Efficient private information retrieval for MapReduce, in: A.-R. Sadeghi (Ed.), Financial Cryptography and Data Security, Springer Berlin Heidelberg, Berlin, Heidelberg, (2013), pp. 371–385.

Table 1
Paillier cryptosystem key size and the corresponding plaintext and ciphertext sizes for three different security levels

| Security level (bits) | Key size (bits) | Plaintext (bytes) | Ciphertext (bytes) |
|---|---|---|---|
| 80 | 1024 | 128 | 256 |
| 112 | 2048 | 256 | 512 |
| 128 | 3072 | 384 | 768 |