PrivGRU: Privacy-preserving GRU inference using additive secret sharing

Abstract

The Gated Recurrent Unit (GRU) is widely applied in fields such as sentiment analysis, speech recognition, and other sequential data processing. For efficient prediction, a growing number of model owners deploy their trained GRU models through machine-learning-as-a-service (MLaaS). However, deploying a GRU model in the cloud raises privacy issues for both model owners and prediction clients. This paper presents the architecture of PrivGRU and designs privacy-preserving protocols for secure inference. The protocols comprise base protocols and principal protocols. Base protocols define basic linear and non-linear computations, while principal protocols construct the gating mechanisms of GRUs. The main benefit of PrivGRU is that it addresses privacy problems while retaining the efficiency and convenience of MLaaS. The overall secure inference is performed on shares and satisfies two security properties: correctness and privacy. To prove security, this work adopts the Universal Composability (UC) framework with the honest-but-curious corruption model. As each protocol is proved to UC-realize its ideal functionality, the protocols can be composed arbitrarily. This strong security feature makes PrivGRU more flexible and practical for future implementation.

Keywords

Privacy-preserving MLaaS; gated recurrent unit; additive secret sharing; UC framework

1 Introduction

Different types of machine learning algorithms have been widely implemented in many fields, such as finance, healthcare, and security [2]. As deep learning algorithms emerge, Deep Neural Networks (DNNs) have achieved promising results in various applications. DNNs include three main types: Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Multi-Layer Perceptron (MLP). CNN has had great success in image-related tasks, but it has shortcomings when processing sequential data. In natural language processing tasks, CNN struggles to learn contextual information precisely because natural language has variable length [4]. RNN can deal with text sequences of variable length in natural language processing tasks and is equipped with an internal memory mechanism.

In 2014, GRU, one of the improved versions of RNN, was introduced in [12] to improve on the memory consumption and efficiency of long short-term memory (LSTM). Beyond natural language processing (NLP) tasks, other applications also adopt GRU instead of CNN. Many works have shown that GRU performs better than other neural networks in specific fields, such as sentiment analysis [4], spam detection [30], traffic flow prediction [17], and even malware classification [1].

To bring trained models into practical usage, machine-learning-as-a-service (MLaaS) has become a new buzzword in recent years. Its low deployment costs and high computational performance are appealing to model owners. However, while model owners benefit from effective computational service through MLaaS, they risk leaking their intellectual property, such as valuable model weights. On the other hand, it is convenient for clients to upload their data and get prediction results, but the data may contain confidential or private information that individuals or companies do not want to reveal. Taking malware classification as an example, the malware data may contain confidential information about a security breach or hacker campaign that individuals or companies want to keep secret. Therefore, the main purpose of this paper is to eliminate data privacy concerns when deploying a GRU model in the cloud.

This paper presents PrivGRU, a system implementing the privacy-preserving inference of GRU. To ensure the privacy of data and model weights, the real values are additively shared between parties so that no individual party owns the private data. PrivGRU extends the scalability of SecureNN [35], which has been empirically shown to be the most efficient state-of-the-art work on privacy-preserving neural networks. Originally, the protocols in SecureNN [35] were mainly designed for CNN. As GRU is composed of different gates with various linear and non-linear computations [41], this paper designs base protocols for basic computations and principal protocols for GRU gating mechanisms. Base protocols include Π _HP1 , Π _HP2 , Π _Sigmoid , and Π _Tanh , which are designed for the Hadamard product (Π _HP1 and Π _HP2 ), Sigmoid activation, and Tanh activation respectively. Both Π _HP1 and Π _HP2 support element-wise matrix multiplication, but they require different types of input shares. Π _Sigmoid is used for the update gate and reset gate, while Π _Tanh contributes to computing the current memory state. Principal protocols define the gating mechanisms of the update gate, reset gate, current memory, and activation of the current unit, which are represented as Π _UG , Π _RG , Π _CM , and Π _AU respectively. This paper ensures security at the level of atomic operations and derives their secure combination using the UC framework [6, 7].

In summary, the main contributions are listed below:

Present the PrivGRU architecture to address the contradiction between data privacy and efficiency while deploying trained GRU model in cloud.

Design base protocols and principal protocols for secure inference of GRU model using additive secret sharing.

Show our protocols UC-realize corresponding ideal functionalities and infer that the arbitrarily-composed GRU model is also UC-secure.

The remainder of this paper is organized as follows. Section 2 reviews previous works for privacy-preserving machine learning methods. Section 3 describes preliminaries about GRU, additive secret sharing, and UC framework. The architecture of PrivGRU is presented in Section 4 and the privacy-preserving protocols are explained in Section 5. Section 6 demonstrates the security analysis. Finally, the last section concludes the paper and suggests ideas for future works.

2 Related works

2.1 Privacy-preserving machine learning techniques

With the development of cloud computing, protecting sensitive data has become an important issue [9, 38]. In particular, privacy-preserving computation has been studied in many papers, using either perturbation methods or cryptographic protocols. Differential privacy [16] is one of the main perturbation methods for privacy-preserving machine learning, which can be further categorized into output perturbation and objective perturbation mechanisms [21]. The former methods add noise to data before publishing, while the latter mechanisms utilize the noisy function as the machine learning model. However, both methods may reduce the accuracy of data mining or machine learning [37].

Cryptographic protocols can preserve the original model accuracy. State-of-the-art research tends to mix different cryptographic primitives to support linear and non-linear computations. Commonly adopted techniques fall into three categories: homomorphic encryption, secret sharing [5, 34], and garbled circuits [39]. Garbled circuits are largely used for computing secure non-linear functions in machine learning algorithms [22, 32]. Although garbled circuits can achieve secure non-linear function evaluation, their heavy computation makes them inefficient. SecureNN [35] improves the efficiency of non-linear computations by replacing garbled circuits with secret sharing. In most of these papers, the authors implement homomorphic encryption [22, 25] or secret sharing [25, 35] for secure linear computations.

2.2 State-of-the-art privacy-preserving neural networks

Many state-of-the-art privacy-preserving neural networks have implemented the operations of CNN and perform well on MNIST data for hand-written digit image recognition [25, 35]. However, for text representation learning scenarios such as NLP, RNN is often adopted. In [40], the authors conduct a systematic comparison of CNN and RNN on a wide range of NLP tasks and conclude that the more important the semantics of the sentence are, the more suitable it is to adopt RNN [19, 33]. In addition, RNN algorithms are stated as the backend techniques of Apple Siri [8] and Google voice transcription [11].

In [13], the authors present a framework for privacy-preserving detection of hate speech in text messages with secure multi-party computation. However, they mainly design the protocols for Logistic Regression (LR) or AdaBoost models instead of GRU. Other papers focusing on text representation learning evaluate GRU on differentially private data to ensure privacy [3, 23]. Using ε-differential privacy requires striking a balance between utility and privacy protection through the choice of ε. To the best of our knowledge, there is still little research designing cryptographic protocols for privacy-preserving GRU.

As efficiency and accuracy are the important factors for future implementation, this paper presents PrivGRU using additive secret-sharing methods. Protocols in SecureNN [35] are mainly designed for three layers of CNN (convolutional layer, pooling layer, and fully-connected layer) and the ReLU activation function. GRU has different requirements for secure protocols, which are not defined in SecureNN paper [35]. Therefore, our main purpose is to design suitable building blocks for GRU that can preserve privacy in MLaaS fashion.

3 Preliminaries

3.1 Gated recurrent unit (GRU)

GRU is one of the variants of RNN [12]. Its designers propose a new type of hidden unit controlled by an update gate and a reset gate. The hidden unit remembers context information in sequential data, and the architecture of GRU addresses the vanishing gradient problem. A complete GRU layer is composed of a series of units, and Eq. 1 to Eq. 4 describe the basic components within the j-th unit. Initially, the current feature and the previous hidden state act as the unit input to calculate the current hidden state. The update gate Z [j] is a variable that allows the model to control how much of the current state is referenced from the previous state, which is defined by Equation 1 [12, 41]:

$Z [j] = σ (D [j] W_{z} [j] + H [j - 1] U_{z} [j] + b_{z} [j])$ (1)

where D [j] and W_z [j] are the current feature and weight of the j-th unit, while H [j - 1] and U_z [j] stand for the previous hidden state and its weight. σ represents the Sigmoid activation function and outputs the current update gate value Z [j]. On the other hand, the reset gate is used to decide how much of the previous state to abandon, which is given by Equation 2 [12, 41]:

$R [j] = σ (D [j] W_{r} [j] + H [j - 1] U_{r} [j] + b_{r} [j])$ (2)

The equation of the reset gate seems similar to the update gate, but the usage and the weights are different. The reset gate is utilized in calculating the current memory content H [j] ′, and the actual activation of the current unit H [j] is mainly determined by update gate and H [j] ′. The functions are shown as Eq. 3 and Eq. 4 respectively [12, 41]:

$H [j]^{'} = \tanh (D [j] W [j] + (R [j] ⊙ H [j - 1]) U [j] + b [j])$ (3)

$H [j] = Z [j] ⊙ H [j - 1] + (1 - Z [j]) ⊙ H [j]^{'}$ (4)

where ⊙ denotes the Hadamard product, also known as element-wise matrix multiplication. The result H [j] is the output of the j-th unit, and the output dimension of the GRU layer depends on the number of units.
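As a plaintext reference for Eq. 1 to Eq. 4, the following minimal sketch evaluates one GRU unit on ordinary floating-point values, with no secret sharing involved. The function names, the parameter dictionary `p`, and the restriction to batch size 1 (so that each bias has the same shape as the pre-activation) are illustrative assumptions and not part of the paper.

```python
import math

def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(*Ms):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def hadamard(A, B):
    """Element-wise (Hadamard) product."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def apply(f, M):
    """Apply a scalar function to every matrix element."""
    return [[f(v) for v in row] for row in M]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_unit(D, H_prev, p):
    """One GRU unit following Eq. 1 to Eq. 4 (batch size 1 assumed)."""
    # Eq. 1: update gate
    Z = apply(sigmoid, add(matmul(D, p["Wz"]), matmul(H_prev, p["Uz"]), p["bz"]))
    # Eq. 2: reset gate
    R = apply(sigmoid, add(matmul(D, p["Wr"]), matmul(H_prev, p["Ur"]), p["br"]))
    # Eq. 3: current memory content
    H_cand = apply(math.tanh, add(matmul(D, p["W"]),
                                  matmul(hadamard(R, H_prev), p["U"]), p["b"]))
    # Eq. 4: activation of the current unit
    one_minus_Z = apply(lambda z: 1.0 - z, Z)
    return add(hadamard(Z, H_prev), hadamard(one_minus_Z, H_cand))

# Toy example with d = 3 features and h = 2 hidden units, all weights zero.
d, h = 3, 2
zeros = lambda r, c: [[0.0] * c for _ in range(r)]
p = {"Wz": zeros(d, h), "Uz": zeros(h, h), "bz": zeros(1, h),
     "Wr": zeros(d, h), "Ur": zeros(h, h), "br": zeros(1, h),
     "W": zeros(d, h), "U": zeros(h, h), "b": zeros(1, h)}
H_new = gru_unit([[1.0, 2.0, 3.0]], [[0.2, -0.4]], p)
```

With all weights set to zero, both gates evaluate to 0.5 and the current memory content is 0, so the new state is half of the previous state; this makes the toy example easy to check by hand.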

3.2 Additive secret sharing

Secret sharing was introduced by Shamir [34] and Blakley [5] in 1979. It has been implemented in many scenarios, such as bitcoin signatures [36], digital rights management [26], and privacy-preserving machine learning [25, 35]. The overall computations of GRU can be divided into linear computations and non-linear computations. For linear computations, addition and subtraction between secrets, or between a secret and a plaintext value, can be done locally by the share owners; multiplication or division between shares depends on secure computation protocols [35]. For non-linear computations, neural networks require different activation functions. Following the gating mechanisms in GRU, this paper designs privacy-preserving protocols for Sigmoid and Tanh activation based on 2-out-of-2 additive secret sharing.

3.3 Universal composability (UC) framework

The UC framework was presented by Canetti et al. [6, 7]. It gives very strong definitions of security that support the modularity of protocols. The main difference between the UC framework and a stand-alone model such as the simulation paradigm lies in the concurrent interaction during protocol execution. The UC framework considers threats coming from the execution environment, any malicious requests from the adversary, and concurrent execution with other protocols. An additional entity, the environment, is added as a distinguisher to determine whether the protocols are UC-secure. To prove that a protocol Π UC-realizes the corresponding ideal functionality $F$ , one first needs to prove that Π UC-emulates $F$ (Definition 1 [7]). Therefore, an ideal world running $F$ with the dummy parties and a simulator needs to be constructed so that it conforms to Definition 2 [7].

Definition 1. Protocol Π UC-realizes ideal functionality $F$ if Π UC-emulates $F$ .

Definition 2. Let Π( $P_{0}$ , $P_{1}$ , $P_{2}$ ) be a multi-party protocol that emulates the computation of ideal functionality $F$ ( $P_{0}$ , $P_{1}$ , $P_{2}$ ) which is secure by definition. If for every possible real-world adversary $A$ during the execution of Π( $P_{0}$ , $P_{1}$ , $P_{2}$ ), there exists an ideal-world simulator $S$ perfectly simulating $A$ and $F$ ( $P_{0}$ , $P_{1}$ , $P_{2}$ ) such that the view of environment $Z$ between real-world $V_{Z, A, Π}$ and ideal-world $V_{Z, S, F}$ is indistinguishable, then Π( $P_{0}$ , $P_{1}$ , $P_{2}$ ) UC-emulates $F$ ( $P_{0}$ , $P_{1}$ , $P_{2}$ ).

Protocols that UC-realize their corresponding functionalities can be securely combined with other UC-secure protocols regardless of the environment or other concurrently executing processes. The formal description of the universal composition theorem is presented in Definition 3 [7]. Typically, the universal composition operation can be viewed as "subroutine substitution" [7]. If $Π_{B}$ calls $F_{A}$ as its subroutine and $Π_{A}$ UC-realizes $F_{A}$ , the parties executing $Π_{B}$ can substitute a call to an instance of $Π_{A}$ for an instance of $F_{A}$ because they achieve the same security under arbitrary composition. Accordingly, consider a protocol Π that invokes other ideal functionalities ( $F_{A}$ , $F_{B}, \dots, F_{N}$ ) as subroutines. Π, formalized in the ( $F_{A}$ , $F_{B}, \dots, F_{N}$ )-hybrid model, can securely invoke the designed privacy-preserving protocols by the universal composition theorem.

Definition 3. (Universal Composition Theorem [7]) Let $Π_{A}$ be a protocol that UC-realizes $F_{A}$ , and let $Π_{B}$ be a composed protocol that UC-realizes $F_{B}$ with a call to $F_{A}$ as its subroutine. If $Π_{A}$ is identity-compatible with $F_{A}$ and $F_{B}$ , protocol $Π_{B}^{F_{A} \to Π_{A}}$ UC-realizes $F_{B}$ .

4 PrivGRU architecture

This paper explores the techniques in [35] and designs the protocols especially for GRU. The architecture of PrivGRU is shown in Fig. 1. This figure takes sentiment analysis as an example to explain the secure inference process. The main roles include three parts: the model owner, prediction client, and the secure inference servers in cloud. Most of the computations in PrivGRU are based on massive matrix computations, and each element in matrices is represented over the ring $ℤ_{L}$ , where $L = 2^{l}, l \in ℤ^{+}$ . The model weights (W), data from clients (D), and the results of prediction (r) remain secret in this architecture. Initially, the prediction client and the model owner will split the secrets through $Share (X) = 〈 X 〉_{0}^{L}$ , $〈 X 〉_{1}^{L}$ , where $〈 X 〉_{i}^{L}$ is denoted as the share of X over the ring $ℤ_{L}$ sent to $P_{i}$ . The process of Share (X) is to first choose $R \in ℤ_{L}$ randomly, where the dimension of R is the same as X. Then, the system sets the $〈 X 〉_{i}^{L}$ as R and $〈 X 〉_{1 - i}^{L}$ as X - R (mod L) [35]. $P_{0}$ , $P_{1}$ , $P_{2}$ cooperate with each other to perform GRU inference through privacy-preserving protocols (represented by dashed double-headed arrows). When all of the shares are collected finally by the prediction client, the final result r is reconstructed through Reconstruct( $〈 r 〉_{0}^{L}$ , $〈 r 〉_{1}^{L}$ ) = $〈 r 〉_{0}^{L}$ + $〈 r 〉_{1}^{L} (mod L)$ [14].
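The Share and Reconstruct operations described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the bit length l = 64, the use of Python's `secrets` module as the randomness source, and the convention that the first returned share is R are all assumptions made for the example.

```python
import secrets

L_BITS = 64          # assumed bit length l; the text only requires L = 2^l
L = 1 << L_BITS      # ring size

def share(X):
    """Split matrix X over Z_L into two additive shares (R, X - R mod L)."""
    R = [[secrets.randbelow(L) for _ in row] for row in X]
    X_minus_R = [[(x - r) % L for x, r in zip(rx, rr)] for rx, rr in zip(X, R)]
    return R, X_minus_R

def reconstruct(X0, X1):
    """Recombine two additive shares: X = X0 + X1 (mod L)."""
    return [[(a + b) % L for a, b in zip(ra, rb)] for ra, rb in zip(X0, X1)]

def add_shares(A, B):
    """Local addition of two shares held by the same party."""
    return [[(a + b) % L for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

X = [[1, 2], [3, L - 1]]
Y = [[10, 20], [30, 40]]
X0, X1 = share(X)
Y0, Y1 = share(Y)

# Each party adds its own shares locally; no communication is needed.
S0, S1 = add_shares(X0, Y0), add_shares(X1, Y1)
S = reconstruct(S0, S1)
```

Because addition is linear, each server can add the shares it holds without interaction, and the reconstructed result equals X + Y modulo L. A single share on its own is a uniformly random matrix and therefore reveals nothing about X.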

Fig. 1

Our PrivGRU high-level architecture.

4.1 Model owner

A basic GRU model may contain the embedding layer, GRU layer, and fully-connected layer shown in Fig. 2. The trained weights from these layers are generally denoted as W. The model owner generates the shares of the model weights $〈 W 〉_{0}^{L}$ , $〈 W 〉_{1}^{L}$ locally and sends the corresponding part to $P_{0}$ , $P_{1}$ to deploy the model in cloud. The model owners can benefit from MLaaS not only for the efficiency but also for the convenience of updating maintenance. Moreover, the privacy issues are addressed by applying PrivGRU architecture and privacy-preserving protocols.

Fig. 2

Atomic operations of GRU secure inference.

4.2 Prediction client

When prediction clients have the requirements of model prediction, they convert the documents to mathematical representation and generate the shares $〈 D 〉_{0}^{L}$ and $〈 D 〉_{1}^{L}$ before sending to $P_{0}$ and $P_{1}$ . All protocols of model inference are designed for shares, so that none of the $P_{i}$ can gain the complete secret from their part of data. Finally, the prediction result r is reconstructed by the client after receiving the shares $〈 r 〉_{0}^{L}$ and $〈 r 〉_{1}^{L}$ .

4.3 Secure inference servers in cloud

The secure inference is based on three independent servers denoted as $P_{0}$ , $P_{1}$ , and $P_{2}$ , shown in Fig. 1. $P_{0}$ and $P_{1}$ get the shares of model weights and data from the model owner and the prediction client respectively, while $P_{2}$ acts as the trusted initializer to generate correlated randomness. A simple example of the inference flow is demonstrated in Fig. 2, which includes the embedding layer, GRU layer, and fully-connected layer. This paper designs base and principal protocols for the computations of a GRU model. Basic operations include matrix multiplication, Hadamard product, Sigmoid activation function, and Tanh activation function. Matrix multiplication on shares has already been defined in SecureNN as Π _MatMul [35]. Accordingly, this paper develops new base protocols for the remaining operations. Principal protocols define the gating mechanisms within the GRU unit. As the protocols are designed in a modular manner, PrivGRU is flexible in model structure for different scenarios.

5 Privacy-preserving protocols

In this section, four privacy-preserving protocols are introduced as base protocols: Π _HP1 , Π _HP2 , Π _Sigmoid , and Π _Tanh . Then, the principal protocols for gating mechanisms in GRU layer are presented: Π _UG , Π _RG , Π _CM , and Π _AU . These protocols act as the building blocks to construct the secure inference of GRU layer and the whole network.

5.1 Base protocols

5.1.1 Hadamard product

This paper designs two protocols for securely evaluating element-wise matrix multiplication. They differ in their inputs. In Π _HP1 , $P_{0}$ and $P_{1}$ each hold shares of both X and Y ( $P_{0}$ holds $〈 X 〉_{0}^{L}, 〈 Y 〉_{0}^{L}$ and $P_{1}$ holds $〈 X 〉_{1}^{L}, 〈 Y 〉_{1}^{L}$ ); in Π _HP2 , by comparison, the inputs of $P_{0}$ and $P_{1}$ are individual matrices rather than the typical shares ( $P_{0}$ holds X and $P_{1}$ holds Y). Π _HP1 is adapted from Π _MatMul in SecureNN [35] and is used to calculate the GRU current memory and the activation of the current unit. On the other hand, Π _HP2 is adapted from the A-SS Engine in Chameleon [31] to support Π _Sigmoid and Π _Tanh computations. Our protocols rely on the Du-Atallah protocol [15] to multiply each matrix element of additively shared values. In addition to generating matrix-sized multiplication triples (MT) C = A ⊙ B [20], $P_{2}$ also generates their shares before sending them to $P_{0}$ and $P_{1}$ . The shares of the zero matrix U serve as common randomness and are added to the final results, which ensures that the outputs are fresh shares. The main reason for separating Π _HP1 and Π _HP2 is efficiency. The two protocols are compared in Table 1.

Table 1
Comparison of Π _HP1 and Π _HP2

Protocol	Usage	Rounds	Communication
Hadamard product 1 Π _HP1	Current memory Π _CM and activation of unit Π _AU	2	12mnl
Hadamard product 2 Π _HP2	Π _Sigmoid and Π _Tanh	2	8mnl
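One plausible accounting that reproduces the communication figures in Table 1 is sketched below. It assumes that communication counts every m × n matrix of l-bit ring elements sent during both trusted initialization and the exchange between $P_{0}$ and $P_{1}$, and that in Π _HP2 the initializer sends one matrix in the clear plus one share of C and one share of U to each party. These counting rules and the function names are assumptions made for illustration; the paper's own derivation may differ.

```python
def hp1_matrices_sent():
    """Number of m x n matrices transmitted in Pi_HP1 under the assumed accounting."""
    init = 2 * 4     # P2 sends shares of A, B, C, U to each of P0 and P1
    exchange = 2 * 2 # P0 and P1 each send their shares of D and E
    return init + exchange

def hp2_matrices_sent():
    """Number of m x n matrices transmitted in Pi_HP2 under the assumed accounting."""
    init = 2 * 3     # assumed: one mask in the clear, a share of C, a share of U per party
    exchange = 2     # P0 sends D, P1 sends E
    return init + exchange

def communication_bits(matrices, m, n, l):
    """Total bits for the given number of m x n matrices of l-bit elements."""
    return matrices * m * n * l

hp1_cost = communication_bits(hp1_matrices_sent(), m=2, n=3, l=64)
hp2_cost = communication_bits(hp2_matrices_sent(), m=2, n=3, l=64)
```

Under these assumptions, Π _HP1 transmits 12 matrices and Π _HP2 transmits 8, matching the 12mnl and 8mnl entries, and both protocols take one initialization round plus one exchange round.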

5.1.2 Sigmoid activation function

Sigmoid activation function is one of the non-linear functions in the GRU layer. Both update gate and reset gate apply Sigmoid activation function to determine how much the previous data can be kept or discarded. It is a non-linear function that transforms input values into the interval of 0 and 1.

Protocol 1 Hadamard Product 1 Π _HP1 ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{0}$ holds $〈 X 〉_{0}^{L}, 〈 Y 〉_{0}^{L} \in ℤ_{L}^{m \times n}$ and $P_{1}$ holds $〈 X 〉_{1}^{L}, 〈 Y 〉_{1}^{L} \in ℤ_{L}^{m \times n}$ .

Output: $P_{0}$ , $P_{1}$ obtain $〈 X ⊙ Y 〉_{0}^{L}$ and $〈 X ⊙ Y 〉_{1}^{L}$ respectively.

1: TrustedInitialization: $P_{2}$ picks a zero matrix $U \in ℤ_{L}^{m \times n}$ and random matrices A, $B \in ℤ_{L}^{m \times n}$ , where C = A ⊙ B. Then, $P_{2}$ generates shares by calling $Share (U) = 〈 U 〉_{0}^{L}$ , $〈 U 〉_{1}^{L}$ ; $Share (A) = 〈 A 〉_{0}^{L}$ , $〈 A 〉_{1}^{L}$ ; $Share (B) = 〈 B 〉_{0}^{L}$ , $〈 B 〉_{1}^{L}$ ; $Share (C) = 〈 C 〉_{0}^{L}$ , $〈 C 〉_{1}^{L}$ . Finally, $P_{2}$ sends $〈 A 〉_{0}^{L}$ , $〈 B 〉_{0}^{L}$ , $〈 C 〉_{0}^{L}$ , $〈 U 〉_{0}^{L}$ to $P_{0}$ and sends $〈 A 〉_{1}^{L}$ , $〈 B 〉_{1}^{L}$ , $〈 C 〉_{1}^{L}$ , $〈 U 〉_{1}^{L}$ to $P_{1}$ .

2: $P_{0}$ sets $〈 D 〉_{0}^{L} = 〈 X 〉_{0}^{L} + 〈 A 〉_{0}^{L}$ , $〈 E 〉_{0}^{L} = 〈 Y 〉_{0}^{L} + 〈 B 〉_{0}^{L}$ and $P_{1}$ sets $〈 D 〉_{1}^{L} = 〈 X 〉_{1}^{L} + 〈 A 〉_{1}^{L}$ , $〈 E 〉_{1}^{L} = 〈 Y 〉_{1}^{L} + 〈 B 〉_{1}^{L}$ .

3: $P_{0}$ sends $〈 D 〉_{0}^{L}$ , $〈 E 〉_{0}^{L}$ to $P_{1}$ , while $P_{1}$ sends $〈 D 〉_{1}^{L}$ , $〈 E 〉_{1}^{L}$ to $P_{0}$ . Then, $P_{0}$ and $P_{1}$ can set D = Reconstruct( $〈 D 〉_{0}^{L}$ , $〈 D 〉_{1}^{L}$ ) and E = Reconstruct( $〈 E 〉_{0}^{L}$ , $〈 E 〉_{1}^{L}$ ).

4: $P_{0}$ sets $〈 X ⊙ Y 〉_{0}^{L} = - D ⊙ E + 〈 X 〉_{0}^{L} ⊙ E + D ⊙ 〈 Y 〉_{0}^{L} + 〈 C 〉_{0}^{L} + 〈 U 〉_{0}^{L}$ and $P_{1}$ sets $〈 X ⊙ Y 〉_{1}^{L} = 〈 X 〉_{1}^{L} ⊙ E + D ⊙ 〈 Y 〉_{1}^{L} + 〈 C 〉_{1}^{L} + 〈 U 〉_{1}^{L}$ .
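The four steps of Protocol 1 can be simulated in a single process to check correctness. The sketch below runs all three parties inside one function, so it demonstrates only the arithmetic and not the network separation or the privacy guarantee. The ring size with l = 64 and the helper names are assumptions.

```python
import secrets

L = 1 << 64  # ring size 2^l; l = 64 is an assumed bit length

def rand_matrix(m, n):
    return [[secrets.randbelow(L) for _ in range(n)] for _ in range(m)]

def ew(f, *Ms):
    """Apply f element-wise across equally shaped matrices, reducing mod L."""
    return [[f(*vals) % L for vals in zip(*rows)] for rows in zip(*Ms)]

def share(X):
    R = rand_matrix(len(X), len(X[0]))
    return R, ew(lambda x, r: x - r, X, R)

def reconstruct(X0, X1):
    return ew(lambda a, b: a + b, X0, X1)

def hp1(X0, Y0, X1, Y1):
    """Single-process simulation of Protocol 1 on shares of X and Y."""
    m, n = len(X0), len(X0[0])
    # Step 1 (P2): zero matrix U, random A and B, triple C = A ⊙ B, all shared.
    A, B = rand_matrix(m, n), rand_matrix(m, n)
    C = ew(lambda a, b: a * b, A, B)
    (A0, A1), (B0, B1), (C0, C1) = share(A), share(B), share(C)
    U0, U1 = share([[0] * n for _ in range(m)])
    # Step 2: each party masks its input shares locally.
    D0, E0 = ew(lambda x, a: x + a, X0, A0), ew(lambda y, b: y + b, Y0, B0)
    D1, E1 = ew(lambda x, a: x + a, X1, A1), ew(lambda y, b: y + b, Y1, B1)
    # Step 3: the masked shares are exchanged and D, E are reconstructed.
    D, E = reconstruct(D0, D1), reconstruct(E0, E1)
    # Step 4: local computation of the output shares.
    Z0 = ew(lambda d, e, x, y, c, u: -d * e + x * e + d * y + c + u,
            D, E, X0, Y0, C0, U0)
    Z1 = ew(lambda d, e, x, y, c, u: x * e + d * y + c + u,
            D, E, X1, Y1, C1, U1)
    return Z0, Z1

X = [[3, 5], [7, 11]]
Y = [[2, 4], [6, 8]]
(X0, X1), (Y0, Y1) = share(X), share(Y)
Z0, Z1 = hp1(X0, Y0, X1, Y1)
product = reconstruct(Z0, Z1)
```

Summing the two output shares gives −D⊙E + X⊙E + D⊙Y + C with D = X + A and E = Y + B, which expands to X⊙Y; the shares of the zero matrix U cancel and only re-randomize the outputs.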

Protocol 2 Hadamard Product 2 Π _HP2 ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{0}$ holds $X \in ℤ_{L}^{m \times n}$ and $P_{1}$ holds $Y \in ℤ_{L}^{m \times n}$ .

Output: $P_{0}$ , $P_{1}$ obtain $〈 X ⊙ Y 〉_{0}^{L}$ and $〈 X ⊙ Y 〉_{1}^{L}$ respectively.

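1: TrustedInitialization: $P_{2}$ picks a zero matrix $U \in ℤ_{L}^{m \times n}$ and random matrices A, $B \in ℤ_{L}^{m \times n}$ , where C = A ⊙ B. Then, $P_{2}$ generates shares by calling $Share (U) = 〈 U 〉_{0}^{L}$ , $〈 U 〉_{1}^{L}$ ; $Share (C) = 〈 C 〉_{0}^{L}$ , $〈 C 〉_{1}^{L}$ . Finally, $P_{2}$ sends A, $〈 C 〉_{0}^{L}$ , $〈 U 〉_{0}^{L}$ to $P_{0}$ and sends B, $〈 C 〉_{1}^{L}$ , $〈 U 〉_{1}^{L}$ to $P_{1}$ .

2: $P_{0}$ sets D = X + A and sends D to $P_{1}$ . Similarly, $P_{1}$ sets E = Y + B and sends E to $P_{0}$ .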

3: $P_{0}$ sets $〈 X ⊙ Y 〉_{0}^{L} = - A ⊙ E + 〈 C 〉_{0}^{L} + 〈 U 〉_{0}^{L}$ and $P_{1}$ sets $〈 X ⊙ Y 〉_{1}^{L} = Y ⊙ D + 〈 C 〉_{1}^{L} + 〈 U 〉_{1}^{L}$ .
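Protocol 2 can be checked with a similar single-process simulation. The sketch assumes that trusted initialization mirrors Protocol 1, except that $P_{0}$ receives the mask A and $P_{1}$ receives the mask B in the clear, since the later steps use A and B unshared. That setup, the bit length l = 64, and the helper names are assumptions.

```python
import secrets

L = 1 << 64  # ring size 2^l; l = 64 is an assumed bit length

def rand_matrix(m, n):
    return [[secrets.randbelow(L) for _ in range(n)] for _ in range(m)]

def ew(f, *Ms):
    """Apply f element-wise across equally shaped matrices, reducing mod L."""
    return [[f(*vals) % L for vals in zip(*rows)] for rows in zip(*Ms)]

def share(X):
    R = rand_matrix(len(X), len(X[0]))
    return R, ew(lambda x, r: x - r, X, R)

def reconstruct(X0, X1):
    return ew(lambda a, b: a + b, X0, X1)

def hp2(X, Y):
    """Single-process simulation of Protocol 2: P0 holds X, P1 holds Y."""
    m, n = len(X), len(X[0])
    # Assumed setup (P2): A goes to P0, B goes to P1, C = A ⊙ B and zero U are shared.
    A, B = rand_matrix(m, n), rand_matrix(m, n)
    C0, C1 = share(ew(lambda a, b: a * b, A, B))
    U0, U1 = share([[0] * n for _ in range(m)])
    # P0 masks X with A and sends D; P1 masks Y with B and sends E.
    D = ew(lambda x, a: x + a, X, A)
    E = ew(lambda y, b: y + b, Y, B)
    # Local computation of the output shares.
    Z0 = ew(lambda a, e, c, u: -a * e + c + u, A, E, C0, U0)
    Z1 = ew(lambda y, d, c, u: y * d + c + u, Y, D, C1, U1)
    return Z0, Z1

Z0, Z1 = hp2([[3, 5]], [[2, 4]])
product = reconstruct(Z0, Z1)
```

The sum of the output shares is −A⊙(Y + B) + Y⊙(X + A) + A⊙B, which simplifies to X⊙Y. Compared with Protocol 1, each party sends a single masked matrix instead of two shares, which is consistent with the lower communication listed for Π _HP2.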

Protocol 3 Sigmoid Π _Sigmoid ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{0}$ , $P_{1}$ hold $〈 X 〉_{0}^{L}$ , $〈 X 〉_{1}^{L} \in ℤ_{L}^{m \times n}$ respectively.

Output: $P_{0}$ , $P_{1}$ get $〈 R 〉_{0}^{L}$ and $〈 R 〉_{1}^{L}$ respectively.

1: for k ∈ {1, ⋯ , m × n} do

2: $P_{0}$ computes $α [k] = e^{〈 X [k] 〉_{0}^{L}}$ and $P_{1}$ computes $β [k] = e^{〈 X [k] 〉_{1}^{L}}$ .

3: end for

4: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{HP 2} (P_{0}, P_{1}, P_{2})$ with $P_{0}$ and $P_{1}$ having input α and β respectively. Then, $P_{0}$ learns $〈 Y 〉_{0}^{L}$ and $P_{1}$ learns $〈 Y 〉_{1}^{L}$ .

5: for t ∈ {1, ⋯ , m × n} do

6: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{DIV} (P_{0}, P_{1}, P_{2})$ with P_i, i ∈ {0, 1} having input $(〈 Y [t] 〉_{i}^{L}, 〈 Y [t] 〉_{i}^{L} + i)$ and output $〈 R [t] 〉_{0}^{L}$ and $〈 R [t] 〉_{1}^{L}$ .

7: end for

8: Finally, $P_{0}$ , $P_{1}$ get $〈 R 〉_{0}^{L}$ and $〈 R 〉_{1}^{L}$ respectively.

Π _Sigmoid is a $(F_{HP 2}$ , $F_{DIV})$ -hybrid model. At first, $P_{0}$ holds $〈 X 〉_{0}^{L}$ and $P_{1}$ holds $〈 X 〉_{1}^{L}$ . After each party computes the exponential of its own input share, the parties multiply the results using $F_{HP 2}$ , which recombines the additive shares in the exponent. In the final step, $P_{0}$ and $P_{1}$ get $〈 R 〉_{0}^{L}$ and $〈 R 〉_{1}^{L}$ respectively by calling $F_{DIV}$ from SecureNN [35].
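The idea behind Protocol 3 can be illustrated on real numbers: additive shares of x become multiplicative factors after exponentiation, and Sigmoid is then a single division. The sketch below uses floating-point values instead of ring elements and replaces $F_{HP 2}$ and $F_{DIV}$ with plain arithmetic, so it shows only why the inputs to the division are (Y, Y + 1). The fixed split value `r` and the function name are assumptions.

```python
import math

def sigmoid_via_shares(x0, x1, r=0.25):
    """Real-valued walk-through of Protocol 3 for one element, x = x0 + x1."""
    # Step 2: each party exponentiates its own share locally.
    alpha = math.exp(x0)   # computed by P0
    beta = math.exp(x1)    # computed by P1
    # Step 4: stand-in for F_HP2, yielding additive shares of Y = alpha * beta = e^x.
    y = alpha * beta
    y0, y1 = r, y - r
    # Step 6: stand-in for F_DIV with per-party inputs (Y_i, Y_i + i), i in {0, 1}.
    numerator = (y0, y1)            # shares sum to Y
    denominator = (y0 + 0, y1 + 1)  # shares sum to Y + 1
    return sum(numerator) / sum(denominator)

x0, x1 = 1.9, -1.2               # additive shares of x = 0.7
result = sigmoid_via_shares(x0, x1)
reference = 1.0 / (1.0 + math.exp(-(x0 + x1)))
```

Since e^x / (e^x + 1) equals 1 / (1 + e^(−x)), the result agrees with the usual Sigmoid definition up to floating-point error. How exponentials of ring-encoded shares are represented is not covered by this sketch.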

5.1.3 Tanh activation function

The computation flow of the Tanh activation function is similar to that of the Sigmoid activation function. The difference is that Tanh transforms the values into the interval between -1 and 1 instead of between 0 and 1.

Protocol 4 Tanh Π _Tanh ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{0}$ , $P_{1}$ hold $〈 X 〉_{0}^{L}$ , $〈 X 〉_{1}^{L} \in ℤ_{L}^{m \times n}$ respectively.

Output: $P_{0}$ , $P_{1}$ get $〈 R 〉_{0}^{L}$ and $〈 R 〉_{1}^{L}$ respectively.

1: for k ∈ {1, ⋯ , m × n} do

2: $P_{0}$ sets $α [k] = e^{〈 X [k] 〉_{0}^{L} \times 2}$ and $P_{1}$ sets $β [k] = e^{〈 X [k] 〉_{1}^{L} \times 2}$ .

3: end for

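4: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{HP 2} (P_{0}, P_{1}, P_{2})$ with $P_{0}$ and $P_{1}$ having input α and β respectively. Then, $P_{0}$ learns $〈 Y 〉_{0}^{L}$ and $P_{1}$ learns $〈 Y 〉_{1}^{L}$ .

5: for t ∈ {1, ⋯ , m × n} do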

6: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{DIV} (P_{0}, P_{1}, P_{2})$ with P_i, i ∈ {0, 1} having input $(〈 Y [t] 〉_{i}^{L} - i, 〈 Y [t] 〉_{i}^{L} + i)$ and output $〈 R [t] 〉_{0}^{L}$ and $〈 R [t] 〉_{1}^{L}$ .

7: end for

8: Finally, $P_{0}$ , $P_{1}$ get $〈 R 〉_{0}^{L}$ and $〈 R 〉_{1}^{L}$ respectively.
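The inputs passed to $F_{DIV}$ in Protocol 4 follow from a standard identity for Tanh. The short derivation below treats the shares as real numbers with x = x_0 + x_1 and ignores the ring encoding, which is a simplifying assumption; the symbols x_0, x_1, Y_0, and Y_1 are introduced only for this illustration.

```latex
\begin{aligned}
\tanh(x) &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
          = \frac{e^{2x} - 1}{e^{2x} + 1}, \\
Y &= \alpha \cdot \beta = e^{2x_0} \cdot e^{2x_1} = e^{2(x_0 + x_1)} = e^{2x}, \\
\sum_{i \in \{0,1\}} (Y_i - i) &= Y - 1, \qquad
\sum_{i \in \{0,1\}} (Y_i + i) = Y + 1, \\
R &= \frac{Y - 1}{Y + 1} = \tanh(x).
\end{aligned}
```

The same reasoning with exponent x instead of 2x and numerator Y instead of Y − 1 gives the Sigmoid case, since e^x / (e^x + 1) is the Sigmoid of x.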

5.2 Principal protocols

As the base protocols for GRU are defined, it is convenient to construct principal secure inference protocols in GRU based on these protocols. Protocol 5 to Protocol 8 design secure inference within one unit of GRU and the GRU layer is composed of a sequence of units.

5.2.1 Update gate and reset gate

The privacy-preserving protocols of the update gate and reset gate are given in Protocol 5 and Protocol 6 respectively. Both Protocol 5 and Protocol 6 are ( $F_{MatMul}$ [35], $F_{Sigmoid}$ )-hybrid models. The output dimension of the embedding layer is denoted as d and the batch size is denoted as n. The number of hidden units is represented as h. The inputs of Π _UG and Π _RG are defined by Eq. 1 and Eq. 2.

Protocol 5 Update Gate Π _UG ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{i}$ holds $〈 D 〉_{i}^{L} \in ℤ_{L}^{n \times d}, 〈 H 〉_{i}^{L} \in ℤ_{L}^{n \times h}, 〈 W_{z} 〉_{i}^{L} \in ℤ_{L}^{d \times h}, 〈 U_{z} 〉_{i}^{L} \in ℤ_{L}^{h \times h}, 〈 b_{z} 〉_{i}^{L} \in ℤ_{L}^{1 \times h}$ , for i ∈ {0, 1}.

Output: $P_{0}$ , $P_{1}$ get $〈 Z 〉_{0}^{L}$ and $〈 Z 〉_{1}^{L}$ respectively.

1: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{MatMul} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 D 〉_{i}^{L}$ and $〈 W_{z} 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 Z 1 〉_{0}^{L}$ and $〈 Z 1 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

2: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{MatMul} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 H 〉_{i}^{L}$ and $〈 U_{z} 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 Z 2 〉_{0}^{L}$ and $〈 Z 2 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

3: $P_{i}$ sets $〈 Z 3 〉_{i}^{L}$ = $〈 Z 1 〉_{i}^{L}$ + $〈 Z 2 〉_{i}^{L}$ + $〈 b_{z} 〉_{i}^{L}$ , for i ∈ {0, 1}.

4: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{Sigmoid} (P_{0}, P_{1}, P_{2})$ with $P_{0}$ , $P_{1}$ having input $〈 Z 3 〉_{0}^{L}, 〈 Z 3 〉_{1}^{L}$ respectively. Finally $P_{0}$ learns $〈 Z 〉_{0}^{L}$ and $P_{1}$ learns $〈 Z 〉_{1}^{L}$ .

Protocol 6 Reset Gate Π _RG ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{i}$ holds $〈 D 〉_{i}^{L} \in ℤ_{L}^{n \times d}, 〈 H 〉_{i}^{L} \in ℤ_{L}^{n \times h}, 〈 W_{r} 〉_{i}^{L} \in ℤ_{L}^{d \times h}, 〈 U_{r} 〉_{i}^{L} \in ℤ_{L}^{h \times h}, 〈 b_{r} 〉_{i}^{L} \in ℤ_{L}^{1 \times h}$ , for i ∈ {0, 1}.

Output: $P_{0}$ , $P_{1}$ get $〈 R 〉_{0}^{L}$ and $〈 R 〉_{1}^{L}$ respectively.

1: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{MatMul} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 D 〉_{i}^{L}$ and $〈 W_{r} 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 R 1 〉_{0}^{L}$ and $〈 R 1 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

2: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{MatMul} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 H 〉_{i}^{L}$ and $〈 U_{r} 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 R 2 〉_{0}^{L}$ and $〈 R 2 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

3: $P_{i}$ sets $〈 R 3 〉_{i}^{L}$ = $〈 R 1 〉_{i}^{L}$ + $〈 R 2 〉_{i}^{L}$ + $〈 b_{r} 〉_{i}^{L}$ , for i ∈ {0, 1}.

4: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{Sigmoid} (P_{0}, P_{1}, P_{2})$ with $P_{0}$ , $P_{1}$ having input $〈 R 3 〉_{0}^{L}, 〈 R 3 〉_{1}^{L}$ respectively. Finally $P_{0}$ learns $〈 R 〉_{0}^{L}$ and $P_{1}$ learns $〈 R 〉_{1}^{L}$ .

5.2.2 Current memory

Protocol 7 is a $(F_{MatMul}$ [35], $F_{HP 1}$ , $F_{Tanh})$ -hybrid model. One of the input shares $〈 R 〉_{i}^{L}$ comes from the results of Π _RG , and the other inputs are the weights and data that are defined by Eq. 3.

Protocol 7 Current Memory Π _CM ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{i}$ holds $〈 D 〉_{i}^{L} \in ℤ_{L}^{n \times d}, 〈 H 〉_{i}^{L} \in ℤ_{L}^{n \times h}, 〈 W 〉_{i}^{L} \in ℤ_{L}^{d \times h}, 〈 R 〉_{i}^{L} \in ℤ_{L}^{n \times h}, 〈 U 〉_{i}^{L} \in ℤ_{L}^{h \times h}, 〈 b 〉_{i}^{L} \in ℤ_{L}^{1 \times h}$ , for i ∈ {0, 1}.

Output: $P_{0}$ , $P_{1}$ get $〈 H^{'} 〉_{0}^{L}$ and $〈 H^{'} 〉_{1}^{L}$ respectively.

1: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{MatMul} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 D 〉_{i}^{L}$ and $〈 W 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 H 1^{'} 〉_{0}^{L}$ and $〈 H 1^{'} 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

2: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{HP 1} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 R 〉_{i}^{L}$ and $〈 H 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 H 2^{'} 〉_{0}^{L}$ and $〈 H 2^{'} 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

3: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{MatMul} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 H 2^{'} 〉_{i}^{L}$ and $〈 U 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 H 3^{'} 〉_{0}^{L}$ and $〈 H 3^{'} 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

4: $P_{i}$ sets $〈 H 4^{'} 〉_{i}^{L}$ = $〈 H 1^{'} 〉_{i}^{L}$ + $〈 H 3^{'} 〉_{i}^{L}$ + $〈 b 〉_{i}^{L}$ , for i ∈ {0, 1}.

5: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{Tanh} (P_{0}, P_{1}, P_{2})$ with $P_{0}$ , $P_{1}$ having input $〈 H 4^{'} 〉_{0}^{L}, 〈 H 4^{'} 〉_{1}^{L}$ respectively. Finally $P_{0}$ learns $〈 H^{'} 〉_{0}^{L}$ and $P_{1}$ learns $〈 H^{'} 〉_{1}^{L}$ .

5.2.3 Activation of current unit

Protocol 8 is a $F_{HP 1}$ -hybrid model. The output of $F_{AU}$ becomes one of the inputs of the next GRU unit. This recurrence is the core mechanism by which a GRU network learns the context of data.

Protocol 8 Activation of Unit Π _AU ( $P_{0}$ , $P_{1}$ , $P_{2}$ )

Input: $P_{i}$ holds $〈 Z 〉_{i}^{L} \in ℤ_{L}^{n \times h}, 〈 H 〉_{i}^{L} \in ℤ_{L}^{n \times h}, 〈 H^{'} 〉_{i}^{L} \in ℤ_{L}^{n \times h}$ , for i ∈ {0, 1}.

Output: $P_{0}$ , $P_{1}$ get $〈 C 〉_{0}^{L}$ and $〈 C 〉_{1}^{L}$ respectively.

1: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{HP 1} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 Z 〉_{i}^{L}$ and $〈 H 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 C 1 〉_{0}^{L}$ and $〈 C 1 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

2: $P_{0}$ , $P_{1}$ , and $P_{2}$ call $F_{HP 1} (P_{0}, P_{1}, P_{2})$ with $P_{i}$ having input $〈 1 - Z 〉_{i}^{L}$ and $〈 H^{'} 〉_{i}^{L}$ , for i ∈ {0, 1}. Then, $P_{0}$ and $P_{1}$ learn $〈 C 2 〉_{0}^{L}$ and $〈 C 2 〉_{1}^{L} \in ℤ_{L}^{n \times h}$ respectively.

3: $P_{i}$ sets $〈 C 〉_{i}^{L}$ = $〈 C 1 〉_{i}^{L}$ + $〈 C 2 〉_{i}^{L}$ , for i ∈ {0, 1}.
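The three steps of Protocol 8 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the ring size L = 2^64, the helper names, and the stand-in `f_hp1` (which simply reconstructs, multiplies, and re-shares, as an ideal functionality would) are assumptions. The way each party derives its share of 1 - Z (only $P_{0}$ adds the public constant) is also an assumption, and plain integers are used instead of an encoding of real numbers.

```python
import secrets

L = 2**64  # ring modulus (assumed value; the protocol only names "ring L")

def share(x):
    """Split a matrix (list of rows) into two additive shares modulo L."""
    s0 = [[secrets.randbelow(L) for _ in row] for row in x]
    s1 = [[(v - r) % L for v, r in zip(row, r_row)] for row, r_row in zip(x, s0)]
    return s0, s1

def reconstruct(s0, s1):
    """Add two shares element-wise modulo L."""
    return [[(a + b) % L for a, b in zip(r0, r1)] for r0, r1 in zip(s0, s1)]

def f_hp1(x0, y0, x1, y1):
    """Hypothetical stand-in for F_HP1: reconstruct, Hadamard product, re-share."""
    x, y = reconstruct(x0, x1), reconstruct(y0, y1)
    z = [[(a * b) % L for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
    return share(z)

def one_minus(z_share, party):
    """Local share of 1 - Z (assumed rule: only party 0 adds the constant 1)."""
    const = 1 if party == 0 else 0
    return [[(const - v) % L for v in row] for row in z_share]

def pi_au(z0, h0, hc0, z1, h1, hc1):
    """Steps 1-3 of Protocol 8 on the shares of Z, H and H'."""
    c1_0, c1_1 = f_hp1(z0, h0, z1, h1)                                # step 1
    c2_0, c2_1 = f_hp1(one_minus(z0, 0), hc0, one_minus(z1, 1), hc1)  # step 2
    return reconstruct(c1_0, c2_0), reconstruct(c1_1, c2_1)           # step 3: local addition

# Toy inputs: with Z in {0, 1}, C picks H where Z = 1 and H' where Z = 0.
Z = [[1, 0], [0, 1]]
H = [[10, 20], [30, 40]]
H_cand = [[5, 6], [7, 8]]
(z0, z1), (h0, h1), (hc0, hc1) = share(Z), share(H), share(H_cand)
c0, c1 = pi_au(z0, h0, hc0, z1, h1, hc1)
C = reconstruct(c0, c1)
```

In step 3 each party only adds values it already holds, so `reconstruct` is reused there purely as element-wise addition modulo L.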

5.2.4 Putting it all together

In a GRU unit, $P_{0}$ , $P_{1}$ , and $P_{2}$ compute the update gate, reset gate, current memory, and activation through Π _UG , Π _RG , Π _CM , and Π _AU . Therefore, Π _GRU works in the ( $F_{UG}$ , $F_{RG}$ , $F_{CM}$ , $F_{AU}$ )-hybrid model. The same modular construction extends to the whole GRU layer: the base protocols support the principal protocols, and the principal protocols are stacked to compose the whole GRU network.
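For reference, the plaintext computation that one GRU unit performs, and that the four principal protocols evaluate on shares, can be sketched as below. The formulas follow the ideal functionalities $F_{UG}$ , $F_{RG}$ , $F_{CM}$ , and $F_{AU}$ ; the helper names, the parameter dictionary, and the use of floating-point numbers instead of ring elements are assumptions made only for illustration.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def ewise(f, a, b):
    """Apply a binary function element-wise to two same-size matrices."""
    return [[f(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add_bias(m, bias):
    """Add a 1 x h bias row to every row of an n x h matrix."""
    return [[v + bias[0][j] for j, v in enumerate(row)] for row in m]

def apply(f, m):
    """Apply a unary function to every entry of a matrix."""
    return [[f(v) for v in row] for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(D, W, H, U, b):
    """Compute D W + H U + b."""
    return add_bias(ewise(lambda x, y: x + y, matmul(D, W), matmul(H, U)), b)

def gru_unit(D, H, p):
    Z = apply(sigmoid, affine(D, p["Wz"], H, p["Uz"], p["bz"]))       # update gate
    R = apply(sigmoid, affine(D, p["Wr"], H, p["Ur"], p["br"]))       # reset gate
    RH = ewise(lambda r, h: r * h, R, H)                              # R ⊙ H
    H_cand = apply(math.tanh, affine(D, p["W"], RH, p["U"], p["b"]))  # current memory
    keep = ewise(lambda z, h: z * h, Z, H)                            # Z ⊙ H
    new = ewise(lambda z, hc: (1.0 - z) * hc, Z, H_cand)              # (1 - Z) ⊙ H'
    return ewise(lambda x, y: x + y, keep, new)                       # activation C

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Toy check with n = 1, d = 2, h = 2 and all-zero parameters:
# both gates equal 0.5 and the candidate memory is 0, so C is half of H.
params = {"Wz": zeros(2, 2), "Uz": zeros(2, 2), "bz": zeros(1, 2),
          "Wr": zeros(2, 2), "Ur": zeros(2, 2), "br": zeros(1, 2),
          "W": zeros(2, 2), "U": zeros(2, 2), "b": zeros(1, 2)}
C_out = gru_unit([[1.0, 2.0]], [[0.4, -0.8]], params)
```

The returned matrix plays the role of H in the next unit, which is how the units chain into a layer.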

6 Security analysis

This paper uses the UC framework [6, 7] to prove the security of the privacy-preserving protocols, which should conform to Definition 3.3 and Definition 3.3. UC is among the strictest simulation-based security notions and allows arbitrary modular composition and concurrent execution. As PrivGRU is a framework that defines protocols for atomic operations, real-world implementations may compose the protocols in different ways. With the universal composition theorem (Definition 3.3), the privacy-preserving protocols can be arbitrarily composed and still remain UC-secure. This paper defines security through two requirements: correctness and privacy [24]. Correctness is proved by comparing the result reconstructed from the protocol output with the output of the corresponding ideal functionality. Privacy is established by showing that, from the view of the environment, the real-world execution is indistinguishable from the ideal-world execution.

6.1 Security model

In this work, the honest-but-curious model (semi-honest adversary) is adopted to simulate corruption in the UC framework [6 , 18]. In particular, Π _HP1 and Π _HP2 use trusted initialization as the setup assumption. As $P_{2}$ is the trusted initializer in our setting, the adversary can only corrupt either $P_{0}$ or $P_{1}$ and observe all of its internal states during computation. The previous section designed the base and principal protocols for the real world, while the ideal-world functionalities are defined in the following sub-sections. In the ideal world, the simulator $S$ executing the functionalities runs an internal copy of the real-world adversary. In the UC framework, by corrupting either $P_{0}$ or $P_{1}$ , $S$ gets access to the internal state of the corrupted party as well as all the messages transmitted in the network, except for those sent during trusted initialization. In addition, the adversary can interact with the environment $Z$ at any time. $Z$ is the interactive distinguisher between the real world and the ideal world; it judges from all of the messages in its view, including the view of $S$ and the input/output values.

6.2 Security of base protocols

Functionality $F_{HP 1} (P_{0}, P_{1}, P_{2})$

$F_{HP 1}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix size (m, n).

Input: Upon receiving ( $〈 X 〉_{0}^{L}$ , $〈 Y 〉_{0}^{L}$ ) from $P_{0}$ and ( $〈 X 〉_{1}^{L}$ , $〈 Y 〉_{1}^{L}$ ) from $P_{1}$ , verify if all the shares $\in ℤ_{L}^{m \times n}$ .

Process:

X = Reconstruct( $〈 X 〉_{0}^{L}$ , $〈 X 〉_{1}^{L}$ )

Y = Reconstruct( $〈 Y 〉_{0}^{L}$ , $〈 Y 〉_{1}^{L}$ )

Z = X ⊙ Y

( $〈 Z 〉_{0}^{L}$ , $〈 Z 〉_{1}^{L}$ ) = Share(Z)

Output: Send $〈 Z 〉_{0}^{L}$ to $P_{0}$ and $〈 Z 〉_{1}^{L}$ to $P_{1}$ .

Functionality $F_{HP 2} (P_{0}, P_{1}, P_{2})$

$F_{HP 2}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix size (m, n).

Input: Upon receiving X from $P_{0}$ and Y from $P_{1}$ , verify if X, Y $\in ℤ_{L}^{m \times n}$ .

Process:

Z = X ⊙ Y

( $〈 Z 〉_{0}^{L}$ , $〈 Z 〉_{1}^{L}$ ) = Share(Z)

Output: Send $〈 Z 〉_{0}^{L}$ to $P_{0}$ and $〈 Z 〉_{1}^{L}$ to $P_{1}$ .

Theorem 1. Π _HP1 and Π _HP2 UC-realize $F_{HP 1}$ and $F_{HP 2}$ respectively with the setup of trusted initialization.

Proof 1. First, to prove the correctness of Π _HP1 and Π _HP2 , the reconstruction of the shared results should be equal to X ⊙ Y. In the final step of Π _HP1 , $P_{0}$ holds $〈 X ⊙ Y 〉_{0}^{L} = - A ⊙ E + 〈 C 〉_{0}^{L} + U_{0}$ and $P_{1}$ holds $〈 X ⊙ Y 〉_{1}^{L} = Y ⊙ D + 〈 C 〉_{1}^{L} + U_{1}$ . Accordingly, $〈 X ⊙ Y 〉_{0}^{L} + 〈 X ⊙ Y 〉_{1}^{L} = - A ⊙ Y - A ⊙ B + 〈 A ⊙ B 〉_{0}^{L} + U_{0} + Y ⊙ X + Y ⊙ A + 〈 A ⊙ B 〉_{1}^{L} + U_{1} = X ⊙ Y$ . Π _HP2 can also be proved correct by reconstructing $〈 X ⊙ Y 〉_{0}^{L}$ and $〈 X ⊙ Y 〉_{1}^{L}$ .
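The cancellation in this correctness argument can be checked numerically. The sketch below evaluates the two share expressions exactly as they are written in the proof. It assumes D = X + A, E = Y + B, C = A ⊙ B, and $U_{0} + U_{1} = 0$, which is how the expansion above reads; it does not model which party holds which mask in the actual protocol. The ring size 2^64 and all function names are illustrative.

```python
import secrets

L = 2**64  # ring modulus (assumed value)

def rand_vec(n):
    """Uniformly random vector over Z_L."""
    return [secrets.randbelow(L) for _ in range(n)]

def had(a, b):
    """Hadamard (element-wise) product modulo L."""
    return [(x * y) % L for x, y in zip(a, b)]

def add(*vectors):
    """Element-wise sum of any number of vectors modulo L."""
    return [sum(col) % L for col in zip(*vectors)]

def neg(a):
    """Additive inverse modulo L."""
    return [(-x) % L for x in a]

def proof_shares(X, Y):
    n = len(X)
    A, B = rand_vec(n), rand_vec(n)   # random masks of the multiplication triple
    C0 = rand_vec(n)
    C1 = add(had(A, B), neg(C0))      # C0 + C1 = A ⊙ B
    U0 = rand_vec(n)
    U1 = neg(U0)                      # assumed: U0 + U1 = 0
    D = add(X, A)                     # assumed: D = X + A
    E = add(Y, B)                     # assumed: E = Y + B
    share0 = add(neg(had(A, E)), C0, U0)   # -A ⊙ E + <C>_0 + U_0
    share1 = add(had(Y, D), C1, U1)        #  Y ⊙ D + <C>_1 + U_1
    return share0, share1

X, Y = [3, 5, 7], [2, 4, 6]
s0, s1 = proof_shares(X, Y)
product = add(s0, s1)   # reconstruction of the two shares
```

Each share on its own is a random-looking ring element; only their sum equals X ⊙ Y.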

Second, the privacy of Π _HP1 and Π _HP2 can be proved using Definition 3.3 and Definition 3.3. For the simulation of Π _HP1 , $S$ is assumed to control the network transmission; therefore, $S$ can view $〈 D 〉_{0}^{L}$ , $〈 D 〉_{1}^{L}$ , $〈 E 〉_{0}^{L}$ , and $〈 E 〉_{1}^{L}$ . As all of the secrets are masked by multiplication triples, these values appear random to $S$ . For the corruption model, if $S$ corrupts $P_{0}$ , $S$ cannot infer any secrets from the observed values because $P_{0}$ only holds one share of each secret during computation. The simulation of corrupting $P_{1}$ is the same as the case of corrupting $P_{0}$ . No matter which party $S$ corrupts, $S$ cannot infer the secrets from the shares it holds and the transmitted messages. Note that the multiplication triples, transmitted matrices, inputs, and final results held by $P_{0}$ and $P_{1}$ are all either randomly chosen or randomized by a mask. As the views of the environment $Z$ in the real world and the ideal world are indistinguishable, Π _HP1 UC-emulates $F_{HP 1}$ using Definition 3.3 and therefore Π _HP1 UC-realizes $F_{HP 1}$ using Definition 3.3. The proof of Π _HP2 is similar to the case of Π _HP1 and is omitted here.
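The masking argument rests on a standard fact: adding a uniformly random ring element to a secret yields a value whose distribution does not depend on the secret. The following sketch checks this exhaustively on a toy ring of size 16. The small modulus and the function name are chosen only for illustration; the check covers a single additive mask and does not replace the simulation argument.

```python
from collections import Counter

def masked_distribution(secret, modulus):
    """Distribution of (secret + mask) mod modulus over every possible mask."""
    return Counter((secret + mask) % modulus for mask in range(modulus))

MODULUS = 16  # toy ring size (assumed, far smaller than a practical L)
distributions = [masked_distribution(x, MODULUS) for x in range(MODULUS)]

# Every secret produces the same distribution, and that distribution is uniform.
same_for_all_secrets = all(d == distributions[0] for d in distributions)
is_uniform = all(count == 1 for count in distributions[0].values())
```

Because the observed value has the same distribution for every secret, a simulator can reproduce it by sampling a fresh random ring element.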

Functionality $F_{Sigmoid} (P_{0}, P_{1}, P_{2})$

$F_{Sigmoid}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix size (m, n).

Input: Upon receiving $〈 X 〉_{0}^{L}$ from $P_{0}$ and $〈 X 〉_{1}^{L}$ from $P_{1}$ , verify if the shares $\in ℤ_{L}^{m \times n}$ .

Process:

X = Reconstruct( $〈 X 〉_{0}^{L}$ , $〈 X 〉_{1}^{L}$ )

R [k] = $\frac{1}{1 + e^{- X [k]}}$ for k = {1, ⋯ , m × n}

( $〈 R 〉_{0}^{L}$ , $〈 R 〉_{1}^{L}$ ) = Share(R)

Output: Send $〈 R 〉_{0}^{L}$ to $P_{0}$ and $〈 R 〉_{1}^{L}$ to $P_{1}$ .

Functionality $F_{Tanh} (P_{0}, P_{1}, P_{2})$

$F_{Tanh}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix size (m, n).

Input: Upon receiving $〈 X 〉_{0}^{L}$ from $P_{0}$ and $〈 X 〉_{1}^{L}$ from $P_{1}$ , verify if the shares $\in ℤ_{L}^{m \times n}$ .

Process:

X = Reconstruct( $〈 X 〉_{0}^{L}$ , $〈 X 〉_{1}^{L}$ )

R [k] = $\frac{e^{X [k]} - e^{- X [k]}}{e^{X [k]} + e^{- X [k]}}$ for k = {1, ⋯ , m × n}

( $〈 R 〉_{0}^{L}$ , $〈 R 〉_{1}^{L}$ ) = Share(R)

Output: Send $〈 R 〉_{0}^{L}$ to $P_{0}$ and $〈 R 〉_{1}^{L}$ to $P_{1}$ .
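The two activation functionalities share the same shape: reconstruct, apply the function element-wise, and re-share. A minimal sketch of that shape is given below. Because sigmoid and tanh produce real numbers while shares live in a ring, the sketch assumes a fixed-point encoding with 16 fractional bits over a ring of size 2^64. This encoding, the flattened list representation of the matrices, and all names are assumptions of the example rather than details taken from the functionalities above.

```python
import math
import secrets

L = 2**64        # ring modulus (assumed value)
FRAC_BITS = 16   # fixed-point fractional bits (assumed encoding)

def encode(v):
    """Map a real number to a ring element (negatives wrap around)."""
    return round(v * (1 << FRAC_BITS)) % L

def decode(u):
    """Map a ring element back to a real number; the upper half is negative."""
    if u >= L // 2:
        u -= L
    return u / (1 << FRAC_BITS)

def share(u):
    """Split one ring element into two additive shares."""
    s0 = secrets.randbelow(L)
    return s0, (u - s0) % L

def reconstruct(s0, s1):
    return (s0 + s1) % L

def ideal_activation(f, x0, x1):
    """Input check, Reconstruct, element-wise f, Share, as in F_Sigmoid and F_Tanh."""
    if len(x0) != len(x1) or not all(0 <= s < L for s in x0 + x1):
        raise ValueError("shares must be same-size matrices over Z_L")
    r0, r1 = [], []
    for a, b in zip(x0, x1):
        x = decode(reconstruct(a, b))
        s0, s1 = share(encode(f(x)))
        r0.append(s0)
        r1.append(s1)
    return r0, r1

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def share_vector(values):
    pairs = [share(encode(v)) for v in values]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def open_vector(s0, s1):
    return [decode(reconstruct(a, b)) for a, b in zip(s0, s1)]

x0, x1 = share_vector([0.0, 1.0, -1.0])
sig_out = open_vector(*ideal_activation(sigmoid, x0, x1))
tanh_out = open_vector(*ideal_activation(math.tanh, x0, x1))
```

With this encoding the opened results agree with the floating-point functions up to the rounding error of the fixed-point representation.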

Theorem 2. Π _Sigmoid and Π _Tanh UC-realize $F_{Sigmoid}$ and $F_{Tanh}$ respectively in the ( $F_{HP 2}$ , $F_{DIV}$ )-hybrid model.

Proof 2. For the correctness of Π _Sigmoid , the reconstruction of the final result needs to be $\frac{1}{1 + e^{- x}}$ . Π _Sigmoid generates the final shares from the value $\frac{e^{x}}{e^{x} + 1}$ , which equals $\frac{1}{1 + e^{- x}}$ after dividing the numerator and denominator by $e^{x}$ . The Tanh activation function is defined as $\frac{e^{x} - e^{- x}}{e^{x} + e^{- x}}$ . Multiplying its numerator and denominator by $e^{x}$ gives $\frac{e^{2 x} - 1}{e^{2 x} + 1}$ , which is the value reconstructed from the final results of Π _Tanh , so Π _Tanh is also correct.
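Both identities used in this correctness argument can be confirmed numerically. The short check below compares each pair of expressions on a grid of sample points; the function names and the grid are illustrative choices.

```python
import math

def sigmoid_definition(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_protocol_form(x):
    # Form used for the final shares: e^x / (e^x + 1)
    return math.exp(x) / (math.exp(x) + 1.0)

def tanh_definition(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def tanh_protocol_form(x):
    # Form used for the final shares: (e^{2x} - 1) / (e^{2x} + 1)
    return (math.exp(2.0 * x) - 1.0) / (math.exp(2.0 * x) + 1.0)

grid = [k / 2.0 for k in range(-10, 11)]  # -5.0, -4.5, ..., 5.0

sigmoid_forms_agree = all(
    math.isclose(sigmoid_definition(x), sigmoid_protocol_form(x),
                 rel_tol=1e-9, abs_tol=1e-12)
    for x in grid
)
tanh_forms_agree = all(
    math.isclose(tanh_definition(x), tanh_protocol_form(x),
                 rel_tol=1e-9, abs_tol=1e-12)
    for x in grid
)
```

The exponential-only forms matter because they express each activation as one exponentiation followed by additions and a single division.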

Π _Sigmoid and Π _Tanh work in the ( $F_{HP 2}$ , $F_{DIV}$ )-hybrid model. Since Π _HP2 and Π _DIV have been proved to UC-realize $F_{HP 2}$ and $F_{DIV}$ in Theorem 1 and SecureNN [35] respectively, (Π _HP2 , Π _DIV ) can be substituted for ( $F_{HP 2}$ , $F_{DIV}$ ) in Π _Sigmoid and Π _Tanh under arbitrary composition using Definition 3.3. Other than the subroutines, no additional values are transmitted between $P_{0}$ and $P_{1}$ . If $S$ corrupts either $P_{0}$ or $P_{1}$ , $S$ cannot recover the secrets because all the computations are performed on shares. As all of the values are random from the point of view of $Z$ , the simulation is perfect, and the probability that $Z$ distinguishes between the real world and the ideal world is negligible. Thus, Theorem 2 is proved.

6.3 Security of principal protocols

Theorem 3. Π _UG , Π _RG , Π _CM , and Π _AU UC-realize $F_{UG}$ , $F_{RG}$ , $F_{CM}$ , and $F_{AU}$ respectively.

Proof 3. Principal protocols invoke base protocols as subroutines, which have already been proved UC-secure in Theorem 1 and Theorem 2. Apart from the subroutines, the only operations are additions of shares computed locally by $P_{0}$ and $P_{1}$ . The correctness of Π _UG , Π _RG , Π _CM , and Π _AU therefore follows directly from the correctness of the subroutines.

For the proof of privacy, the subroutines are perfectly simulated using Definition 3.3 and no other messages are transmitted. If $S$ corrupts either $P_{0}$ or $P_{1}$ , all the intermediate values it observes are kept in shares. $Z$ therefore cannot get any clue to differentiate between the real-world execution and the ideal-world execution. Hence, all of the principal protocols UC-realize their corresponding ideal functionalities.

Functionality $F_{UG} (P_{0}, P_{1}, P_{2})$

$F_{UG}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix sizes (n, d), (n, h), (d, h), (h, h), (1, h).

Input: Upon receiving ( $〈 D 〉_{0}^{L}$ , $〈 H 〉_{0}^{L}$ , $〈 W_{z} 〉_{0}^{L}$ , $〈 U_{z} 〉_{0}^{L}$ , $〈 b_{z} 〉_{0}^{L}$ ) from $P_{0}$ and ( $〈 D 〉_{1}^{L}$ , $〈 H 〉_{1}^{L}$ , $〈 W_{z} 〉_{1}^{L}$ , $〈 U_{z} 〉_{1}^{L}$ , $〈 b_{z} 〉_{1}^{L}$ ) from $P_{1}$ , verify if all the shares are in the correct sizes.

Process:

D = Reconstruct( $〈 D 〉_{0}^{L}$ , $〈 D 〉_{1}^{L}$ )

H = Reconstruct( $〈 H 〉_{0}^{L}$ , $〈 H 〉_{1}^{L}$ )

W_z = Reconstruct( $〈 W_{z} 〉_{0}^{L}$ , $〈 W_{z} 〉_{1}^{L}$ )

U_z = Reconstruct( $〈 U_{z} 〉_{0}^{L}$ , $〈 U_{z} 〉_{1}^{L}$ )

b_z = Reconstruct( $〈 b_{z} 〉_{0}^{L}$ , $〈 b_{z} 〉_{1}^{L}$ )

Z = σ (DW_z + HU_z + b_z)

( $〈 Z 〉_{0}^{L}$ , $〈 Z 〉_{1}^{L}$ ) = Share(Z)

Output: Send $〈 Z 〉_{0}^{L}$ to $P_{0}$ and $〈 Z 〉_{1}^{L}$ to $P_{1}$ .

Functionality $F_{RG} (P_{0}, P_{1}, P_{2})$

$F_{RG}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix sizes (n, d), (n, h), (d, h), (h, h), (1, h).

Input: Upon receiving ( $〈 D 〉_{0}^{L}$ , $〈 H 〉_{0}^{L}$ , $〈 W_{r} 〉_{0}^{L}$ , $〈 U_{r} 〉_{0}^{L}$ , $〈 b_{r} 〉_{0}^{L}$ ) from $P_{0}$ and ( $〈 D 〉_{1}^{L}$ , $〈 H 〉_{1}^{L}$ , $〈 W_{r} 〉_{1}^{L}$ , $〈 U_{r} 〉_{1}^{L}$ , $〈 b_{r} 〉_{1}^{L}$ ) from $P_{1}$ , verify if all the shares are in the correct sizes.

Process:

D = Reconstruct( $〈 D 〉_{0}^{L}$ , $〈 D 〉_{1}^{L}$ )

H = Reconstruct( $〈 H 〉_{0}^{L}$ , $〈 H 〉_{1}^{L}$ )

W_r = Reconstruct( $〈 W_{r} 〉_{0}^{L}$ , $〈 W_{r} 〉_{1}^{L}$ )

U_r = Reconstruct( $〈 U_{r} 〉_{0}^{L}$ , $〈 U_{r} 〉_{1}^{L}$ )

b_r = Reconstruct( $〈 b_{r} 〉_{0}^{L}$ , $〈 b_{r} 〉_{1}^{L}$ )

R = σ (DW_r + HU_r + b_r)

( $〈 R 〉_{0}^{L}$ , $〈 R 〉_{1}^{L}$ ) = Share(R)

Output: Send $〈 R 〉_{0}^{L}$ to $P_{0}$ and $〈 R 〉_{1}^{L}$ to $P_{1}$ .

Functionality $F_{CM} (P_{0}, P_{1}, P_{2})$

$F_{CM}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix sizes (n, d), (n, h), (d, h), (n, h), (h, h), (1, h).

Input: Upon receiving ( $〈 D 〉_{0}^{L}$ , $〈 H 〉_{0}^{L}$ , $〈 W 〉_{0}^{L}$ , $〈 R 〉_{0}^{L}$ , $〈 U 〉_{0}^{L}$ , $〈 b 〉_{0}^{L}$ ) from $P_{0}$ and ( $〈 D 〉_{1}^{L}$ , $〈 H 〉_{1}^{L}$ , $〈 W 〉_{1}^{L}$ , $〈 R 〉_{1}^{L}$ , $〈 U 〉_{1}^{L}$ , $〈 b 〉_{1}^{L}$ ) from $P_{1}$ , verify if all the shares are in the correct sizes.

Process:

D = Reconstruct( $〈 D 〉_{0}^{L}$ , $〈 D 〉_{1}^{L}$ )

H = Reconstruct( $〈 H 〉_{0}^{L}$ , $〈 H 〉_{1}^{L}$ )

W = Reconstruct( $〈 W 〉_{0}^{L}$ , $〈 W 〉_{1}^{L}$ )

R = Reconstruct( $〈 R 〉_{0}^{L}$ , $〈 R 〉_{1}^{L}$ )

U = Reconstruct( $〈 U 〉_{0}^{L}$ , $〈 U 〉_{1}^{L}$ )

b = Reconstruct( $〈 b 〉_{0}^{L}$ , $〈 b 〉_{1}^{L}$ )

H′ = tanh (DW + (R ⊙ H) U + b)

( $〈 H^{'} 〉_{0}^{L}$ , $〈 H^{'} 〉_{1}^{L}$ ) = Share(H′)

Output: Send $〈 H^{'} 〉_{0}^{L}$ to $P_{0}$ and $〈 H^{'} 〉_{1}^{L}$ to $P_{1}$ .

Functionality $F_{AU} (P_{0}, P_{1}, P_{2})$

$F_{AU}$ interacts with $P_{0}, P_{1}, P_{2}$ and adversary $S$ , and is parameterized by ring L and matrix sizes (n, h), (n, h), (n, h).

Input: Upon receiving ( $〈 Z 〉_{0}^{L}$ , $〈 H 〉_{0}^{L}$ , $〈 H^{'} 〉_{0}^{L}$ ) from $P_{0}$ and ( $〈 Z 〉_{1}^{L}$ , $〈 H 〉_{1}^{L}$ , $〈 H^{'} 〉_{1}^{L}$ ) from $P_{1}$ , verify if all the shares are in the correct sizes.

Process:

Z = Reconstruct( $〈 Z 〉_{0}^{L}$ , $〈 Z 〉_{1}^{L}$ )

H = Reconstruct( $〈 H 〉_{0}^{L}$ , $〈 H 〉_{1}^{L}$ )

H′ = Reconstruct( $〈 H^{'} 〉_{0}^{L}$ , $〈 H^{'} 〉_{1}^{L}$ )

C = Z ⊙ H + (1 - Z) ⊙ H′

( $〈 C 〉_{0}^{L}$ , $〈 C 〉_{1}^{L}$ ) = Share(C)

Output: Send $〈 C 〉_{0}^{L}$ to $P_{0}$ and $〈 C 〉_{1}^{L}$ to $P_{1}$ .

6.4 Security of PrivGRU

The PrivGRU model is composed of base protocols and principal protocols, and its architecture need not be fixed. As each individual protocol has been proved UC-secure, implementations of the network can differ by combining these protocols arbitrarily. The security of PrivGRU can be inferred using Theorem 3.3. This security guarantee increases the practical value of this work.

7 Conclusions and future works

This paper presents PrivGRU, a framework for privacy-preserving GRU inference that can be deployed in the cloud. The security of each protocol has been proved in the UC framework; therefore, PrivGRU can retain the privacy of both model owners and prediction clients when the UC-secure protocols are composed arbitrarily. This flexibility makes PrivGRU useful for many practical scenarios at a time when people increasingly value their privacy. In future work, we will extend PrivGRU with secure training protocols and conduct experiments as a proof of concept [28, 37].

Acknowledgment

This research was supported by the Ministry of Science and Technology, Taiwan (ROC), under Project Numbers MOST 108-2218-E-004-001-, and by Taiwan Information Security Center at National Sun Yat-sen University (TWISC@NSYSU).

References

1. Agarap A.F., Pepito F.J.H., Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine (SVM) for Malware Classification. arXiv preprint arXiv:1801.00318, 2017.

2. Barni, Failla, Kolesnikov, Lazzeretti, Sadeghi A.-R., Schneider, Secure evaluation of private linear branching programs with medical applications. In European Symposium on Research in Computer Security, pages 424–439. Springer, 2009.

3. Beigi, Shu, Guo, Wang, Liu, Privacy Preserving Text Representation Learning. Proceedings of the 30th on Hypertext and Social Media (HT '19). ACM, 2019.

4. Biswas, Chadda, Ahmad, Sentiment Analysis with Gated Recurrent Units. Department of Computer Engineering. Annual Report Jamia Millia Islamia New Delhi, India, 2015.

5. Blakley G.R., et al. Safeguarding Cryptographic Keys. In Proceedings of the National Computer Conference, volume 48, 1979.

6. Canetti, Security and composition of multiparty cryptographic protocols, Journal of Cryptology 13(1) (2000), 143–202.

7. Canetti, Universally Composable Security: A new Paradigm for Cryptographic Protocols. In Proceedings 42nd IEEE Symposium on Foundations of Computer Science, pages 136–145. IEEE, 2001.

8. Capes, Coles, Conkie, Golipour, Hadjitarkhani, Hu, Huddleston, Hunt, Li, Neeracher, et al. Siri on-device deep learning-guided unit selection text-to-speech system. In INTERSPEECH, pages 4011–4015, 2017.

9. Chen C.-M., Wang K.-H., Yeh K.-H., Xiang, Wu T.-Y., Attacks and solutions on a three-party password-based authenticated key exchange protocol for wireless communications, Journal of Ambient Intelligence and Humanized Computing 10(8) (2019), 3133–3142.

10. Chen C.-M., Xiang, Liu, Wang K.-H., A secure authentication protocol for internet of vehicles, IEEE Access 7 (2019), 12047–12057.

11. Chiu C.-C., Sainath T.N., Wu, Prabhavalkar, Nguyen, Chen, Kannan, Weiss R.J., Rao, Gonina, et al. State-of-the-art Speech Recognition with Sequence-to-sequence Models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4774–4778. IEEE, 2018.

12. Cho, Van Merriënboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio, Learning Phrase Representations using RNN Encoder-decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078, 2014.

13. De Cock, Dowsley, Nascimento A.C., Reich, Todoki, Privacy-preserving classification of personal text messages with secure multi-party computation: An application to hate-speech detection. arXiv preprint arXiv:1906.02325, 2019.

14. Demmler, Schneider, Zohner, ABY - A framework for efficient mixed-protocol secure two-party computation. In NDSS, 2015.

15. …, Atallah M.J., Protocols for Secure Remote Database Access with Approximate Matching. In E-Commerce Security and Privacy, pages 87–111. Springer, 2001.

16. Dwork, Differential Privacy. Encyclopedia of Cryptography and Security, pages 338–340, 2011.

17. …, Zhang, Li, Using LSTM and GRU neural network methods for traffic flow prediction. In 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), pages 324–328. IEEE, 2016.

18. Goldreich, Foundations of Cryptography: Volume 1, Basic Tools. Cambridge University Press, 2007.

19. …, Lu, Liu, A novel recurrent neural network algorithm with long short-term memory model for futures trading, Journal of Intelligent & Fuzzy Systems, (Preprint):1–8.

20. Huang, Practical Secure Two-party Computation. PhD thesis, Citeseer, 2012.

21. …, Lipton Z.C., Elkan, Differential Privacy and Machine Learning: A Survey and Review. arXiv preprint arXiv:1412.7584, 2014.

22. Juvekar, Vaikuntanathan, Chandrakasan, GAZELLE: A Low Latency Framework for Secure Neural Network Inference. In 27th USENIX Security Symposium (USENIX Security 18), pages 1651–1669, 2018.

23. …, Baldwin, Cohn, Towards Robust and Privacy-preserving Text Representations. arXiv preprint arXiv:1805.06093, 2018.

24. Lindell, How to Simulate It – A Tutorial on the Simulation Proof Technique. In Tutorials on the Foundations of Cryptography, pages 277–346. Springer, 2017.

25. Liu, Juuti, Lu, Asokan, Oblivious Neural Network Predictions via MiniONN Transformations. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 619–631. ACM, 2017.

26. …, Liu, Wang, A DRM model based on Proactive Secret Sharing Scheme for P2P Networks. In 9th IEEE International Conference on Cognitive Informatics (ICCI'10), pages 859–862. IEEE, 2010.

27. Mohassel, Zhang, SecureML: A System for Scalable Privacy-preserving Machine Learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19–38. IEEE, 2017.

28. Pan J.-S., Kong, Sung T.-W., Tsai P.-W., Snášel, α-Fraction first strategy for hierarchical model in wireless sensor networks, Journal of Internet Technology 19(6) (2018), 1717–1726.

29. Pan J.-S., Lee C.-Y., Sghaier, Zeghid, Xie, Novel systolization of subquadratic space complexity multipliers based on Toeplitz matrix-vector product approach. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019.

30. Pumrapee Poomka N.K., Pongsena Wattana, Kerdprasop, SMS spam detection based on long short-term memory and gated recurrent unit, International Journal of Future Computer and Communication 8 (2019).

31. Riazi M.S., Weinert, Tkachenko, Songhori E.M., Schneider, Koushanfar, Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pages 707–721. ACM, 2018.

32. Rouhani B.D., Riazi M.S., Koushanfar, DeepSecure: Scalable Provably-secure Deep Learning. In Proceedings of the 55th Annual Design Automation Conference, page 2. ACM, 2018.

33. Saleem, Irfan Khattak, Qazi A.B., Supervised speech enhancement based on deep neural network, Journal of Intelligent & Fuzzy Systems, (Preprint):1–15, 2019.

34. Shamir, How to share a secret, Communications of the ACM 22(11) (1979), 612–613.

35. Wagh, Gupta, Chandran, SecureNN: 3-Party secure computation for neural network training, Proceedings on Privacy Enhancing Technologies 1 (2019), 24.

36. Wang, Shen, Li, Shao, Yang, Cryptographic primitives in blockchains, Journal of Network and Computer Applications 127 (2019), 43–58.

37. … J.M.-T., Lin J.C.-W., Tamrakar, High-utility itemset mining with effective pruning strategies, ACM Trans. Knowl. Discov. Data 13(6) (2019), 58:1–58:22. ISSN 1556–4681. doi: 10.1145/3363571. URL http://doi.acm.org/10.1145/3363571.

38. … T.-Y., Chen C.-M., Wang K.-H., Meng, Wang E.K., A provably secure certificateless public key encryption with keyword search, Journal of the Chinese Institute of Engineers 42(1) (2019), 20–28.

39. Yao A.C.-C., How to Generate and Exchange Secrets. In 27th Annual Symposium on Foundations of Computer Science (sfcs 1986), pages 162–167. IEEE, 1986.

40. Yin, Kann, Yu, Schütze, Comparative Study of CNN and RNN for Natural Language Processing. arXiv preprint arXiv:1702.01923, 2017.

41. Zhang, Lipton Z.C., Li, Smola A.J., Dive into Deep Learning, 2019.

Table 1
Comparison of Π _HP1 and Π _HP2

| Protocol | Usage | Rounds | Communication |
|---|---|---|---|
| Hadamard product 1 Π _HP1 | Current memory Π _CM and activation of unit Π _AU | 2 | 12mnl |
| Hadamard product 2 Π _HP2 | Π _Sigmoid and Π _Tanh | 2 | 8mnl |