Privacy-preserving, verifiable and efficient outsourcing algorithm for regression analysis to a malicious cloud

Abstract

Cloud computing has become ubiquitous and offers an economical solution for convenient, on-demand access to computing resources, enabling resource-constrained clients to execute extensive computations. However, outsourcing data and computation to a cloud server raises serious concerns, such as the confidentiality of the input/output and the verifiability of the result. This paper addresses the problem of designing an outsourcing algorithm for linear regression analysis (LR), an important data analysis technique that is widely applied across multiple domains. The outsourcing framework is illustrated by the following scenario: a client holds a large dataset and needs to perform regression analysis, but is unable to process it due to a lack of computing resources. Therefore, the client outsources the computation to the cloud server. In the proposed LR outsourcing algorithm, the client outsources the LR problem to the cloud server without revealing either the input dataset or the output. The algorithm is non-interactive for the client: it sends only the input and receives the output along with a proof of verification from the cloud server. The client is able to verify the correctness of the result with an optimal probability. The analysis shows that the algorithm meets the challenges of correctness, security, verifiability, and efficiency. The experimental evaluation validates the proposed algorithm, shows that it is highly efficient, and supports its practical usability.

Keywords

Regression analysis, computation outsourcing, cloud computing, linear transformation

1 Introduction

The rising popularity of cloud computing and of mobile devices such as smartphones, notebooks, and other handheld battery-operated devices creates an ideal situation in which a weak device outsources its computation to more powerful cloud servers. This is a strong motivation for researchers to provide outsourcing solutions for more real-world applications. Resource-constrained devices often need to execute computationally intensive tasks, which makes outsourcing computation to the cloud server a promising solution. The outsourcing paradigm enables resource-constrained clients to execute large computation tasks by offloading their computation load to massive cloud servers. Because cloud servers are easily available, clients are no longer restricted by their limited CPU, storage, and bandwidth; instead, they leverage the abundance of computing resources through seamless access to cloud servers [1]. Moreover, outsourcing provides a significant economic benefit to clients, as it brings down capital and operational costs [2, 3]. Suppose somebody wants to analyze seismic, astronomic, or financial data. These datasets are often gigantic and involve large computations. Maintaining a data center for such computation requires critical human resources and significant capital investment [4]. Therefore, owning a data center incurs significant capital and operational costs for the client [5]. Hence, outsourcing the computation to a third-party cloud server is a beneficial option for a client [6, 7]. Despite its tremendous advantages, this promising paradigm brings many security and privacy concerns, such as the confidentiality of data (input and output) and the integrity of the result, which make a client reluctant to outsource its computation to the cloud server. The client’s data is confidential in nature. It may contain personal, medical, or financial information, stock trends, scientific research records, etc.
Therefore, the data needs to be encrypted before being outsourced in order to maintain its confidentiality and integrity. One way to address this security concern is to apply an encryption scheme, but a traditional encryption scheme would not work, since performing meaningful computation on ciphertext is very difficult. The second concern is the correctness of the result, because the operations performed inside the cloud are not transparent. A client cannot trust the result computed by the cloud server. The cloud may return a wrong result due to a flaw or bug in the logic, or it may intentionally deviate from the algorithm instructions (malicious cloud). Therefore, there is no guarantee of the integrity of the result, and an outsourcing algorithm must be capable of providing privacy for confidential data (input and output) and of verifying the correctness of the result. Other important challenges are the correctness and efficiency of the outsourcing algorithm. The client system needs to perform some pre-processing, such as the transformation operation for preserving input and output privacy, verification to check the correctness of the result, and finally the retransformation to obtain the final result. Therefore, the computation performed on the client side (viz., transformation, verification, and retransformation) must be substantially less than the cost of solving the original problem locally. Otherwise, outsourcing is not beneficial [8–10]. Thus an outsourcing algorithm must satisfy the following four design goals: correctness, security, verifiability, and efficiency [11].

Linear regression analysis is an important data analysis technique with applications in many domains, such as financial modeling (predicting stock prices based on previous rates) [12, 13], weather prediction (predicting the weather to decide which type of crop maximizes the yield) [14], machine learning (training and prediction on datasets) [15], sensor node deployment (helping to analyze the correct positions of sensor nodes) [16], and traffic management [17], to list just a few. Thus, a variety of customers from many domains need a solution for the LR problem. Moreover, when the client is resource-constrained and LR deals with a large dataset, a useful option is to outsource the LR problem to the cloud server. Even if the dataset is of moderate scale, performing such a computation is a huge job for resource-constrained clients such as mobile phones, laptops, and portable hand-held devices. Therefore, outsourcing LR is a convenient option.

A typical regression model is expressed as the linear function Y = Xβ, where (X) is the observed variable, (Y) is the dependent variable and (β) is the regression parameter. The value of the regression parameter (β) makes the linear function the best fit for the dataset D (X : Y). However, computing (β) is computationally intensive, since the procedure requires matrix multiplication and inversion operations, which have O (n³) complexity, and when LR is applied to a large dataset, memory is another bottleneck. The wide applicability of the LR problem, its computational complexity, and the inability of resource-constrained clients to solve it motivate us to design a privacy-preserving and efficient outsourcing algorithm for a malicious cloud environment. The paper offers a solution for the linear regression problem via a completely new approach: the client outsources the complex computation of LR to the cloud server while maintaining the privacy of the input/output and the verifiability of the result. The main idea of the algorithm is to apply an efficient linear transformation to the input dataset. The transformation must preserve input/output privacy and also allow the cloud server to perform meaningful operations. In this way, the cloud server is entirely unaware of “what exactly is computed” on its hardware. Further, the result verification step should not involve any complex operation. The proposed algorithm ensures the correctness of the result: the verification phase is very efficient, requires only a matrix-vector multiplication, and enables the client to detect server misconduct with an optimal probability. Finally, the client performs the retransformation to obtain the original solution of the LR problem.

The contributions of this paper are summarized as follows:

A privacy-preserving transformation technique is introduced, which provides security for the input/output dataset. It does not reveal any sensitive information to the cloud server, yet it allows the cloud to perform an operation (regression analysis) on the transformed dataset. It is also suitable for any permitted dimension of (X_(m*n), Y_(m)).

The client in the proposed algorithm verifies the correctness of the result with an optimal probability of 1. The client is able to verify the encrypted result directly, and therefore saves the expensive computation of result retransformation if the result obtained from the cloud server fails the verification test. Further, the verification algorithm is efficient and does not involve any complex computation.

The paper discusses the nuances and methods involved in designing the outsourcing algorithm. The analytical and experimental performance demonstrates the practical usability of the proposed LR outsourcing algorithm in a malicious cloud environment.

The rest of the paper is organized as follows. Section 2 presents related work in the area of secure computation. Section 3 provides a detailed discussion of the regression analysis outsourcing problem, its mathematical elements, and the system model used to solve it. Section 4 presents the proposed regression analysis outsourcing algorithm, along with the nuances and methods required to design it. Section 5 analyzes the proposed algorithm with respect to correctness, security, verifiability, and efficiency. Section 6 presents the experimental analysis of the proposed algorithm. Finally, Section 7 gives concluding remarks and future directions.

2 Related work

The literature offers various algorithms for the secure outsourcing of core problems of linear algebra. These secure outsourcing algorithms fall into two categories: the semi-trusted computing model and the untrusted computing model. A detailed classification of secure outsourcing algorithms is presented in Fig. 1. In the semi-trusted environment, the cloud server follows the algorithm instructions and produces the correct result, but it secretly records all information to which it has access and attempts to retrieve secret information. Three approaches exist in the semi-trusted environment. The first is audit-based [18–20]: the client or the trusted workers recompute some part of the computation done by the untrusted workers. This approach is infeasible for a computationally weak client, because if the client were capable of performing such computation, there would be no need for help from the cloud server. This method also requires that some workers be honest or at least non-colluding. The second is the secure co-processor [21–23], or Trusted Platform Module (TPM), method, which requires the deployment of trusted hardware on the server to provide an isolated execution environment. The last is multiparty computation, in which the computation is divided among two or more workers without allowing any participating worker to view another’s private data. The result of the computation is the union of the outputs of all workers.

Fig. 1

Taxonomy of secure computation.

In the untrusted (malicious) model, the server can deviate from the algorithm instructions and behave arbitrarily; the solution is to verify the correctness of the result in addition to securing the computation. Verifiable outsourcing is of two types: interactive proof and non-interactive proof. In an interactive proof, a weak verifier actively challenges a server (prover). The prover replies with a probabilistic proof to convince the client (verifier) of the truth of a statement that the verifier is unable to compute itself [24–26]. In the non-interactive approach, a weak client outsources the computation to a powerful server, and the server returns the result along with a proof of verification [9, 27–37].

The proposed solution for LR applies a non-interactive proof to provide verifiability of computation. Therefore, our attention is mainly on non-interactive verifiable algorithms. In this direction, Gentry proposed a novel Fully Homomorphic Encryption (FHE) scheme [38]. This scheme allows the server to perform arbitrary computation on encrypted input, but the solution suffers from complexity and inefficiency that make it far from practical, and it does not guarantee that the server performs the computation correctly [10, 39]. Gennaro et al. [27] proposed non-interactive proof-based solutions for the secure outsourcing of polynomial functions. First, they model the polynomial function as a Yao’s garbled circuit [40, 41]. They then encrypt this circuit homomorphically [42] and send it to the cloud server for execution. The cloud server carries out the execution and returns a computationally sound non-interactive proof that can be verified in O (m) time. Another elegant solution for secure outsourcing was proposed by Chung et al. [43]. The basic idea of this algorithm is that the client carries out pre-processing and creates hundreds of problem instances, mostly of the same type, and then applies homomorphic encryption for privacy. The cloud server computes these functions without knowing the actual inputs. Finally, the client verifies the solutions of the same type of problem to ensure their correctness. The main drawback of these two algorithms is that they incur a huge computation load on both the client and the cloud server due to complex homomorphic encryption. These algorithms also require modeling the problem as circuits, which involves a large number of parameters. However, their main advantage is that they need constant time for verification. In contrast to these general-purpose approaches, many algorithms address specific problems.
Lei et al. [33, 44] present outsourcing algorithms for matrix multiplication and matrix inversion. They apply a monomial-matrix-based transformation to preserve the privacy of the input/output matrices in both algorithms. These algorithms efficiently outsource matrix multiplication and matrix inversion to the cloud server while maintaining correctness and input/output privacy and verifying the result efficiently. Another algorithm addresses linear programming (LP) [45]. This method splits the LP computation into two parts: the LP solver on the cloud and the LP parameters on the client side. The client transforms the input data using a transformation technique and outsources it to the LP solver on the cloud server. The cloud server then solves the linear programming problem with the LP solver and returns the result. The client verifies the result using the fundamental theorem of duality. Further, an algorithm has also been presented for the LP problem that reduces the computation and communication time significantly [9]. In [46], Blanton et al. addressed large-scale biometric computation. The implementation leverages the individual structures of distance computation and random sampling. The result verification method can verify the result with modest overhead and high probability. Many algorithms have been published recently for the “system of linear equations” [9, 47]. The algorithm proposed in [32] requires (n) rounds of communication between the client and the cloud before the client obtains the convergent solution. The algorithm encrypts the input in O (n²) in the setup phase and verifies the output in O (n²). The (n) rounds of communication between the client and the cloud are the main drawback of this method, as they increase the communication overhead and add security vulnerabilities. Next, the algorithm in [9] identified a new solution for linear equations that employs special linear transformations using diagonal matrices.
This algorithm requires one round of communication between the client and the cloud. Further, the algorithm in [11] utilizes sparse matrices to propose a new secure outsourcing algorithm for large-scale linear equations in the fully malicious model. The algorithm requires only O (1) rounds of communication between the client and the cloud server. In [34], Chen et al. propose an outsourcing algorithm for modular exponentiation in the malicious cloud model. They use this algorithm as a subroutine for outsourcing Cramer-Shoup encryption and Schnorr signatures, and further extend it to the outsourcing of simultaneous modular exponentiation. Recently, in [35], Lei et al. addressed determinant computation using block-matrix and monomial-based transformation operations for a malicious cloud environment. The outsourcing algorithm is an efficient implementation of determinant computation that reduces the complexity of computing the determinant from O (n^2.373) to O (n²). Our work extends these algorithms by addressing an important data analysis problem, regression analysis, via a completely new transformation operation.

3 Details of regression analysis outsourcing algorithm

3.1 System model

The system model of the secure outsourcing algorithm for regression analysis is shown in Fig. 2. The client wants to analyze a large dataset to build a regression model, but is unable to perform such analysis due to a lack of computing resources. Therefore, the client outsources the computation to the cloud servers. The client first performs a linear transformation on the input problem φ (X, Y) using a secret key (k); this operation transforms the problem into φ_k (X′, Y′). The transformed regression analysis problem is then outsourced to the cloud server. The cloud servers are expected to compute the correct result for the outsourced regression problem (φ_k). However, to be assured of the correctness of the result, the client first verifies the encrypted result received from the cloud server. Only if the result is found to be correct does the client perform the retransformation to obtain the decrypted result.

Fig. 2

System model for secure outsourcing of linear regression.

3.2 Threat model

The security threats in the secure outsourcing system model come from the suspicious behavior of the cloud server. Previous work on secure outsourcing defines three threat models: the “Trusted Model”, the “Semi-Trusted Model”, and the “Untrusted Model” [11, 44].

Trusted Model: The cloud server follows the algorithm correctly and does not deviate from the algorithm instructions. It does not record input or output data. Therefore, there is no need to encrypt the data or verify the output.

Semi-Trusted Model: In this model, the cloud behaves as “honest but curious”, “lazy but honest”, or both. Goldreich et al. first introduced the “honest but curious” model [48], in which the cloud server follows the algorithm instructions and produces the correct result, but secretly records all information to which it has access and attempts to retrieve sensitive information. In the “lazy but honest” model, the cloud server is honest in that it does not record any such information, but it is lazy in that it might not perform at the agreed service level. It may send a random invalid result to save its computing resources and share them with other clients to increase its financial gain. The proposed algorithm is designed to handle such security threats. Even if a curious cloud server attempts to retrieve the original information, the privacy-preserving transformation prevents it from succeeding. Further, if the cloud behaves lazily, the client can detect an incorrect result with an optimal probability.

Untrusted Model: The untrusted model arises from the malicious behavior of the cloud server and is the strongest adversarial model. In this model, the server could be lazy, curious, and dishonest. The cloud server may deviate from the algorithm instructions and behave arbitrarily. It may return a random, indistinguishable result and try to escape detection. It may secretly record all information that comes into its possession. The proposed algorithm can address any such malicious adversary.

3.3 Outsourcing algorithm framework

A secure and verifiable outsourcing algorithm framework has five sub-steps in the following order: KeyGen, ProbTrans, Compute, Verify, and Retransform. Table 1 presents the notations and symbols used in the paper.

KeyGen (1^λ): The algorithm takes the security parameter λ as input and generates the key for the transformation operation. The key generation step runs for every new problem submission.

ProbTrans (φ, k): The client encrypts the input problem φ (X, Y) with the key (k) and generates the transformed problem φ_k (X′, Y′).

Compute φ_k (X′, Y′): The client outsources the transformed problem φ_k (X′, Y′) to the cloud server. The cloud server then computes the regression parameter (β′) on the transformed dataset.

Verify (β′, k): The client verifies the encrypted regression parameter (β′) obtained from the cloud server.

Retransform (β′, k): If the verification step is passed, the client retransforms (decrypts) the regression parameter (β′) to obtain the regression parameter (β).

Table 1
List of common notations and symbols

Notation/Symbol	Description
D (X : Y)	Input dataset
φ (X, Y)	Linear regression without encryption
φ_k (X′, Y′)	Encrypted linear regression
(β)	Regression parameter
(β′)	Transformed regression parameter
(π₁, π₂)	Permutation functions
(γ, δ)	Non-zero random numbers
(I₁, I₂)	Identity matrices
k	Key (Inv₁, Inv₂) for the transformation
Inv₁, Inv₂	Key matrices (generalized permutation matrices)

3.4 Basic idea of regression analysis outsourcing

Regression analysis is an important data analysis technique. It has applications in the modeling and analysis of numerical datasets across many domains [49]. A regression model estimates the values of the dependent variable (Y) from the set of observed values (X). A linear relation exists between the observed variable and the estimated variable, i.e.,

$Y = X β$ (1)

where (β) is the regression parameter. The main objective is to find the (β) that best fits the regression model Y = Xβ. However, finding such a regression parameter (β) is not a trivial task, since there are differences between the actual value (Y) and the value estimated by the model $(\tilde{Y})$. The difference between (Y) and $(\tilde{Y})$ is known as the residual error. The objective of the best-fit regression model is to find the regression parameter (β) that minimizes the sum of squared residual errors. A standard way to find (β) is the least squares method, which gives a good approximation of (β) based on the values of (X, Y). The least squares estimate of the regression parameter (β) is

$β = (X^{T} X)^{- 1} X^{T} Y$ (2)

where (Y) is an m × 1 vector in $ℝ^{m}$, (X) is an m × n matrix in $ℝ^{m \times n}$, (β) is an n × 1 vector in $ℝ^{n}$, and m > n.
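As a small illustration of Equation (2), the following sketch computes (β) for a toy dataset in pure Python with exact rational arithmetic. The helper names (`transpose`, `matmul`, `inverse`, `least_squares`) and the data values are assumptions made for this example only; they are not part of the proposed algorithm, and a practical implementation would normally rely on a numerical linear algebra library.

```python
from fractions import Fraction


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on exact Fractions."""
    n = len(a)
    # Build the augmented matrix [A | I] and reduce the left half to I.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Pick any row at or below the diagonal with a non-zero pivot.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def least_squares(x, y):
    """Equation (2): beta = (X^T X)^-1 X^T Y."""
    xt = transpose(x)
    return matmul(matmul(inverse(matmul(xt, x)), xt), y)


# Toy data (assumed): the first column of X is a constant 1 for the intercept.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]   # m = 4 observations, n = 2 parameters
Y = [[6], [5], [7], [10]]
beta = least_squares(X, Y)
print(beta)   # intercept 7/2 and slope 7/5, as exact Fractions
```

The inversion of the n × n matrix X^T X and the surrounding matrix products are the steps the text identifies as too expensive for a resource-constrained client.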

The basic idea of the regression analysis outsourcing algorithm is as follows. A resource-constrained client needs to perform regression analysis on a large dataset D (X : Y), but is unable to perform this computation due to a lack of computing resources. Therefore, the client outsources the regression analysis problem to a third-party cloud server, which performs the computation on its behalf. Before outsourcing the LR problem, the client first applies a transformation to the input dataset D (X : Y) to provide privacy, and then outsources the transformed dataset D (X′ : Y′) to the cloud server. To the cloud server, this transformed dataset looks the same as any other dataset: the cloud server can read it but cannot recover the original values. In this way, the privacy-preserving transformation protects the input dataset from the cloud server.

Our goal is to find a new transformation technique for the regression analysis outsourcing algorithm. We therefore investigated matrix properties to find a candidate solution and concluded that an invertible matrix may be a possible solution. However, inverting a general matrix is a complex operation, which would make the transformation process, and hence the outsourcing algorithm, inefficient. Searching further for an efficient solution led us to the generalized permutation matrix. A square matrix is a generalized permutation matrix if it has exactly one non-zero entry in each row and each column. This particular structure makes the transformation operation very efficient. The next section discusses the construction of the generalized permutation matrix and the transformation operation.

4 Secure linear regression analysis outsourcing algorithm construction

This section starts with a discussion of the generation of the generalized permutation matrix, which is used as the key for the privacy-preserving transformation operation. The analysis of the least squares method gives insight and motivation for the design of the privacy-preserving transformation method for linear regression analysis.

4.1 Creation of generalized permutation matrix

This discussion explains how to generate the security key that is used in the transformation operation. The privacy-preserving transformation method for linear regression analysis uses a generalized permutation matrix. This matrix has exactly one non-zero element in each row and column, and it can be generated in time linear in the size of the input. The construction of the permutation matrix is inspired by [50]. Cauchy’s permutation function [51] generates random permutations. The permuted values are used to shuffle the matrix values row-wise and column-wise. Cauchy’s permutation function can be written as

$π = (\begin{matrix} 1 & \dots & n \\ μ_{1} & \dots & μ_{n} \end{matrix})$ (3)

When Cauchy’s permutation function is applied to the rows of an identity matrix, it randomizes the positions of the rows. The row-permuted identity matrix is then masked with non-zero random numbers γ → {γ₁, γ₂, …, γ_n}, where 0 ∉ γ. In this way we obtain the final key matrix, for example $[\begin{matrix} 0 & γ_{2} & 0 \\ 0 & 0 & γ_{3} \\ γ_{1} & 0 & 0 \end{matrix}]$.

This key matrix has some useful properties: it is square and non-singular, and thus invertible, and multiplying it with a general matrix costs only O (n²), since each row has a single non-zero entry and no additions are needed. For simplicity, we denote the key matrices as Inv₁, Inv₂ and their inverses as ${Inv}_{1}^{- 1}, {Inv}_{2}^{- 1}$ in the paper.

Let the key matrices be ${Inv}_{1 (π (i), j)} = [\begin{matrix} 0 & γ_{2} & 0 \\ 0 & 0 & γ_{3} \\ γ_{1} & 0 & 0 \end{matrix}]$ and ${Inv}_{2 (π (i), j)} = [\begin{matrix} 0 & 0 & δ_{3} \\ δ_{1} & 0 & 0 \\ 0 & δ_{2} & 0 \end{matrix}]$. Their inverses can be found in O (n) time: ${Inv}_{1}^{- 1} = [\begin{matrix} 0 & 0 & \frac{1}{γ_{1}} \\ \frac{1}{γ_{2}} & 0 & 0 \\ 0 & \frac{1}{γ_{3}} & 0 \end{matrix}]$ and ${Inv}_{2}^{- 1} = [\begin{matrix} 0 & \frac{1}{δ_{1}} & 0 \\ 0 & 0 & \frac{1}{δ_{2}} \\ \frac{1}{δ_{3}} & 0 & 0 \end{matrix}]$.
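The 3 × 3 example above can be reproduced with a short sketch. It assumes 0-based indices and the convention read off the example, namely that entry (π(i), i) of the key matrix holds γ_i, so that entry (i, π(i)) of the inverse holds 1/γ_i. The function names `key_matrix` and `key_inverse` and the numeric values of γ are hypothetical choices for illustration.

```python
from fractions import Fraction


def key_matrix(perm, scale):
    """Generalized permutation matrix with entry (perm[i], i) = scale[i]."""
    n = len(perm)
    mat = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        mat[perm[i]][i] = Fraction(scale[i])
    return mat


def key_inverse(perm, scale):
    """Inverse of key_matrix(perm, scale): entry (i, perm[i]) = 1 / scale[i].

    Only n reciprocals are computed. The dense zero matrix is allocated
    purely so the result can be printed and multiplied like any matrix.
    """
    n = len(perm)
    mat = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        mat[i][perm[i]] = 1 / Fraction(scale[i])
    return mat


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# 0-based form of the permutation pi(1) = 3, pi(2) = 1, pi(3) = 2.
pi1 = [2, 0, 1]
gamma = [2, 3, 5]          # assumed non-zero masking values
inv1 = key_matrix(pi1, gamma)
inv1_inverse = key_inverse(pi1, gamma)

identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
print(matmul(inv1, inv1_inverse) == identity)
```

Because each key matrix is fully described by one permutation and one list of scalars, a client could store the key as two length-n lists and never materialize the dense matrix at all.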

When the permutation matrix is multiplied from the left, it performs a row-wise permutation, and when it is multiplied from the right, it performs a column-wise permutation. Further, each value of the transformed matrix is masked by a multiplicative factor γ_i/δ_j.

The final transformed matrix of the input X can be written as

$X^{'} = {Inv}_{1} * X * {Inv}_{2}^{- 1} = [\begin{matrix} \frac{γ_{1}}{δ_{1}} x_{π_{1} (1), π_{2} (1)} & \dots & \frac{γ_{1}}{δ_{n}} x_{π_{1} (1), π_{2} (n)} \\ \vdots & & \vdots \\ \frac{γ_{k}}{δ_{1}} x_{π_{1} (k), π_{2} (1)} & \dots & \frac{γ_{k}}{δ_{n}} x_{π_{1} (k), π_{2} (n)} \\ \vdots & & \vdots \\ \frac{γ_{n}}{δ_{1}} x_{π_{1} (n), π_{2} (1)} & \dots & \frac{γ_{n}}{δ_{n}} x_{π_{1} (n), π_{2} (n)} \end{matrix}]$ (4)

4.2 Privacy preserving regression analysis

The privacy-preserving transformation operation has three objectives. First, it should provide privacy for the input and output. Second, it should allow meaningful computation on the encrypted data. Third, the client should be able to retransform the output β′ to β efficiently.

The least squares formula for the regression parameter, $β = (X^{T} X)^{- 1} X^{T} Y$, gives us insight into how to transform (X) into (X′). Suppose the dataset (X : Y) is transformed into $X^{'} = {Inv}_{1} X {Inv}_{2}^{- 1}$ and $Y^{'} = {Inv}_{1} Y$. Substituting (X′, Y′) into Equation (2) gives the transformed regression parameter

$\begin{array}{l} β^{'} = {(X^{' T} X^{'})}^{- 1} X^{' T} Y^{'} \\ = {({({Inv}_{1} X {Inv}_{2}^{- 1})}^{T} * ({Inv}_{1} X {Inv}_{2}^{- 1}))}^{- 1} * {({Inv}_{1} X {Inv}_{2}^{- 1})}^{T} * {Inv}_{1} Y \end{array}$ (5)

This method provides privacy for the input data D (X : Y). However, Equation (5) contains too many extra terms, which makes it difficult to establish a meaningful relationship between (β′) and (β). Without such a relationship, we cannot perform the retransformation to recover the original value (β). Thus, this transformation fails to meet the third objective.

To eliminate the additional terms, we re-examine the operation. On close observation, the extra terms cancel if the transformation is applied in the following order.

First, transform (X) into $X^{'} = {Inv}_{1} X {Inv}_{2}^{- 1}$, then transform (X^T) into $X^{T^{'}} = {Inv}_{2} X^{T} {Inv}_{1}^{- 1}$, and finally (Y) into Y′ = Inv₁Y. Substituting these values into Equation (2) gives

$β^{'} = (({Inv}_{2} X^{T} {Inv}_{1}^{- 1}) * ({Inv}_{1} X {Inv}_{2}^{- 1}))^{- 1} * ({Inv}_{2} X^{T} {Inv}_{1}^{- 1}) * {Inv}_{1} Y$ (6)

Since ${Inv}_{1}^{- 1} * {Inv}_{1} = I$ (the identity matrix), the regression parameter reduces to

$β^{'} = ({Inv}_{2} X^{T} X {Inv}_{2}^{- 1})^{- 1} * ({Inv}_{2} X^{T} {Inv}_{1}^{- 1}) * {Inv}_{1} Y$ (7)

Further, inverting a product of invertible matrices reverses the order of multiplication, and inverting an inverse returns the original matrix, i.e.,

Property 1: $({Inv}_{1} * {Inv}_{2})^{- 1} = {Inv}_{2}^{- 1} * {Inv}_{1}^{- 1}$
Property 2: $({Inv}_{2}^{- 1})^{- 1} = {Inv}_{2}$

Applying Properties 1 and 2 to Equation (7) reduces it to

$β^{'} = {Inv}_{2} * {(X^{T} X)}^{- 1} * {Inv}_{2}^{- 1} * ({Inv}_{2} X^{T} {Inv}_{1}^{- 1}) * {Inv}_{1} Y$ (8)

Simplifying the terms, Equation (8) further reduces to

$β^{'} = {Inv}_{2} * {(X^{T} X)}^{- 1} X^{T} Y$ (9)

A clear relationship now exists between (β′) and (β): comparing Equation (9) with Equation (2) gives β′ = Inv₂β, so the client can efficiently retransform the result as $β = {Inv}_{2}^{- 1} β^{'}$ to obtain the original regression parameter (β).
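The relationship between (β′) and (β) can be checked numerically on a toy instance. The sketch below is only an illustration under stated assumptions: the data, the permutations, and the masking values are made up; the helper names are hypothetical; exact rational arithmetic is used so that equality can be tested directly; and the cloud is assumed to receive X′, the separately transformed X^T, and Y′.

```python
from fractions import Fraction


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def inverse(a):
    """Gauss-Jordan inversion on exact Fractions (general-purpose helper)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def key_matrix(perm, scale):
    """Generalized permutation matrix with entry (perm[i], i) = scale[i]."""
    n = len(perm)
    mat = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        mat[perm[i]][i] = Fraction(scale[i])
    return mat


# Assumed toy problem: m = 4 observations, n = 2 parameters.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
Y = [[6], [5], [7], [10]]

inv1 = key_matrix([2, 0, 3, 1], [2, 3, 5, 7])   # m x m key (assumed values)
inv2 = key_matrix([1, 0], [11, 13])             # n x n key (assumed values)
# For brevity the general inverse is used here; the structured O(n)
# inverse of a generalized permutation matrix would give the same result.
inv1_inv, inv2_inv = inverse(inv1), inverse(inv2)

# Client side: transform X, X^T and Y. Note that xt_t is computed from X^T
# separately and is not, in general, the transpose of x_t.
x_t = matmul(matmul(inv1, X), inv2_inv)
xt_t = matmul(matmul(inv2, transpose(X)), inv1_inv)
y_t = matmul(inv1, Y)

# Cloud side: the least squares formula evaluated on the transformed inputs.
beta_t = matmul(matmul(inverse(matmul(xt_t, x_t)), xt_t), y_t)

# Reference solution computed directly from Equation (2).
beta = matmul(matmul(inverse(matmul(transpose(X), X)), transpose(X)), Y)

# Client side: retransformation.
recovered = matmul(inv2_inv, beta_t)
print(beta_t == matmul(inv2, beta), recovered == beta)
```

In this sketch the client's retransformation is a single product with the inverse of Inv₂, which has only n non-zero entries.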

4.3 Details of LR algorithm

Five sub-algorithms have been developed for the proposed LR outsourcing algorithm: KeyGen, ProbTrans, Compute, Verify, and Retransform, as follows:

a. KeyGen (1^λ): The client invokes this algorithm with a security parameter λ as input. The algorithm generates two sets of non-zero random numbers γ → { γ₁, γ₂, …, γ_m }, δ → { δ₁, δ₂, …, δ_n }, where 0 ∉ γ ∪ δ, and two random permutations {π₁, π₂}. This step is very efficient and takes only O (m + n) time. The key generation algorithm is inspired by the algorithm in [33], which was developed for the secure outsourcing of matrix inversion.

Algorithm 1. KeyGen
1.	Input: λ
2.	The client creates two sets of non-zero random numbers,
γ → { γ₁, γ₂, …, γ_m },
δ → { δ₁, δ₂, …, δ_n }.
3.	The client generates two random permutations {π₁, π₂}
4.	Output: {π₁, π₂, γ, δ}

Algorithm 2. KMG (Key Matrix Generation)
1.	Input: I_n (Identity Matrix), {π₁, π₂, γ, δ}
2.	for i = 1 : m
Inv₁ = γ_i * I_{(π_1(i),i)}
end
3.	for j = 1 : n
Inv₂ = δ_j * I_{(π_2(j),j)}
end
4.	Output: key matrices (Inv₁, Inv₂)
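
Algorithms 1 and 2 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' Matlab code: the function names keygen and key_matrix, the multiplier range [1, 10] with random sign, and the fixed seed are assumptions, and the placement of each multiplier at position (i, π(i)) is one possible index convention, chosen here so that the entries agree with Equations (13) and (14). The security parameter λ is not modeled.

```python
import random

def keygen(m, n, rng):
    """Algorithm 1 (sketch): non-zero multipliers and two random permutations."""
    # Assumed range: magnitude in [1, 10] with a random sign, so 0 never occurs.
    gamma = [rng.choice([-1, 1]) * rng.uniform(1.0, 10.0) for _ in range(m)]
    delta = [rng.choice([-1, 1]) * rng.uniform(1.0, 10.0) for _ in range(n)]
    pi1 = list(range(m))
    rng.shuffle(pi1)
    pi2 = list(range(n))
    rng.shuffle(pi2)
    return pi1, pi2, gamma, delta

def key_matrix(perm, scale):
    """Algorithm 2 (sketch): dense key matrix with scale[i] at (i, perm[i])."""
    size = len(perm)
    K = [[0.0] * size for _ in range(size)]
    for i in range(size):
        K[i][perm[i]] = scale[i]
    return K

rng = random.Random(2024)   # fixed seed only to make the sketch reproducible
pi1, pi2, gamma, delta = keygen(4, 3, rng)
Inv1 = key_matrix(pi1, gamma)   # 4 x 4
Inv2 = key_matrix(pi2, delta)   # 3 x 3
```

The module-level generator of `random` is not cryptographically secure; a real deployment would more plausibly draw the key from `random.SystemRandom()`, which exposes the same `choice`, `uniform`, and `shuffle` methods.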

b. ProbTrans (φ, k): Given a new LR problem as input, this algorithm transforms the problem to preserve the privacy of the input.

Algorithm 3. ProbTrans
1.	Input: (Inv₁, Inv₂), X_(mn), Y_(m1)
2.	The client computes
$X^{'} = {Inv}_{1} * X * {Inv}_{2}^{- 1}$
$X^{T^{'}} = {Inv}_{2} * X^{T} * {Inv}_{1}^{- 1}$
Y′ = Inv₁ * Y
3.	Output: encrypted matrices X′, Y′
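
Because each key matrix has a single non-zero entry per row and column, the three products of Algorithm 3 can be evaluated entry by entry without ever forming Inv₁ or Inv₂. The sketch below assumes the index convention Inv₁(i, π₁(i)) = γ_i and Inv₂(j, π₂(j)) = δ_j, which reproduces Equations (13) and (14); the function name prob_trans, the zero-based indices, and the sample data are illustrative assumptions.

```python
from fractions import Fraction

def prob_trans(X, Y, pi1, pi2, gamma, delta):
    """Entrywise X' = Inv1*X*Inv2^-1, X^T' = Inv2*X^T*Inv1^-1, Y' = Inv1*Y."""
    m, n = len(X), len(X[0])
    # X'(i, j) = (gamma_i / delta_j) * X(pi1(i), pi2(j)), as in Equation (13)
    X_p = [[gamma[i] / delta[j] * X[pi1[i]][pi2[j]] for j in range(n)]
           for i in range(m)]
    # X^T'(j, i) = (delta_j / gamma_i) * X(pi1(i), pi2(j))
    Xt_p = [[delta[j] / gamma[i] * X[pi1[i]][pi2[j]] for i in range(m)]
            for j in range(n)]
    # Y'(i) = gamma_i * Y(pi1(i)), as in Equation (14)
    Y_p = [gamma[i] * Y[pi1[i]] for i in range(m)]
    return X_p, Xt_p, Y_p

# Hypothetical toy instance with m = 3, n = 2.
X = [[1, 2], [3, 4], [5, 6]]
Y = [1, 2, 3]
pi1, pi2 = [2, 0, 1], [1, 0]
gamma = [Fraction(2), Fraction(3), Fraction(5)]
delta = [Fraction(7), Fraction(11)]

X_p, Xt_p, Y_p = prob_trans(X, Y, pi1, pi2, gamma, delta)
```

Each output entry costs one multiplication and one division, so the whole transformation touches every entry of X a constant number of times, in line with the O (mn) bound claimed for ProbTrans.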

c. Compute φ_k (X′, Y′): This sub-algorithm is the computation performed by the cloud server on the transformed problem.

Algorithm 4. Compute
1.	Input: φ_k (X′, Y′)
2.	The cloud invokes the least square algorithm and computes β′
3.	$β^{'} = (({Inv}_{2} X^{T} {Inv}_{1}^{- 1}) * ({Inv}_{1} X {Inv}_{2}^{- 1}))^{- 1} * ({Inv}_{2} X^{T} {Inv}_{1}^{- 1}) * {Inv}_{1} Y$
4.	Output: β′

d. Verify (β′, k): The result computed on the cloud server is returned to the client and becomes the input to this sub-algorithm. The client computes (X′ * β′) and compares it with Y′. If the cloud server has executed the algorithm Compute φ_k (X′, Y′) correctly, the relation (Y′ = X′ * β′) should hold.

Algorithm 5. Verify
1.	Input: encrypted matrices (X′, Y′, β′)
2.	The client computes D = (Y′ – X′ * β′)
3.	If (D == (0, 0, 0, …, 0)^T)
return(1)
else
return(0)
end if
4.	Output: The client accepts the result β′ if it gets (1), else rejects it.
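
Algorithm 5 amounts to a single matrix-vector product followed by a comparison with the zero vector. The sketch below is a hedged illustration: the function name verify and the tol parameter are assumptions (the algorithm as stated compares against exact zero, which corresponds to tol = 0, whereas a floating-point implementation would likely need a small positive tolerance), and the toy data are constructed so that Y′ = X′ * β′ holds exactly.

```python
def verify(X_p, Y_p, beta_p, tol=0.0):
    """Return 1 if every entry of D = Y' - X' * beta' is within tol of zero, else 0."""
    for row, y in zip(X_p, Y_p):
        d = y - sum(x * b for x, b in zip(row, beta_p))  # one entry of D
        if abs(d) > tol:
            return 0
    return 1

# Hypothetical transformed instance with m = 3, n = 2.
X_p = [[2, 1], [1, 3], [4, 5]]
beta_p = [3, -2]
Y_p = [4, -3, 2]            # equals X' * beta' row by row

accepted = verify(X_p, Y_p, beta_p)     # honest server
rejected = verify(X_p, Y_p, [3, -1])    # tampered result
assert accepted == 1 and rejected == 0
```

The loop performs one multiplication per entry of X′ and one subtraction per row, which matches the O (mn + m) cost attributed to the verification step.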

e. Retransform (β′, k): This step is executed only if the regression parameter (β′) has passed the result verification algorithm; otherwise it is omitted.

Algorithm 6. Retransform
1.	Input: $({Inv}_{2}^{- 1}, β^{'})$
2.	The client computes $β = {Inv}_{2}^{- 1} * β^{'}$
3.	Output: The client gets the result β
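
Since ${Inv}_{2}$ has only n non-zero entries, the retransformation does not require a general matrix inverse. A minimal sketch, assuming the index convention Inv₂(i, π₂(i)) = δ_i so that β′_i = δ_i * β_{π₂(i)}; the function name retransform, the zero-based indices, and the sample values are illustrative assumptions.

```python
def retransform(beta_p, pi2, delta):
    """Recover beta from beta' = Inv2 * beta using n divisions."""
    beta = [None] * len(beta_p)
    for i, b in enumerate(beta_p):
        # beta'_i = delta_i * beta_{pi2(i)}  =>  beta_{pi2(i)} = beta'_i / delta_i
        beta[pi2[i]] = b / delta[i]
    return beta

# Hypothetical key and true parameter vector.
pi2 = [2, 0, 1]
delta = [2, 5, 10]
beta_true = [10, 20, 30]

# What an honest cloud would return under the assumed convention.
beta_p = [delta[i] * beta_true[pi2[i]] for i in range(3)]

beta = retransform(beta_p, pi2, delta)
assert beta == beta_true
```

The loop runs once per parameter, which is consistent with the O (n) cost stated for Retransform.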

Further, the workflow of the LR outsourcing algorithm is presented in Fig. 3. The algorithm has the following phases: KeyGen, ProbTrans, Compute, Verify, and Retransform.

Fig.3

Flowchart of LR outsourcing algorithm.

5 Analysis of the proposed algorithm

5.1 Correctness analysis

The proposed algorithm performs correctly only if the client and the cloud server follow the algorithm's instructions and compute the result correctly. Equation (9) provides the value of (β′). A relation can easily be established between the regression parameter (β) and the transformed regression parameter (β′).

Since β′ = Inv₂β, the regression parameter (β) can be computed using $β = {Inv}_{2}^{- 1} * β^{'}$ (10)

Thus, the proposed outsourcing algorithm is correct, and retransformation using Equation (10) could be performed to obtain the regression parameter (β) from (β′).

5.2 Security analysis

The original input dataset D (X : Y) is transformed to D (X′, Y′) by the ProbTrans algorithm. The proposed algorithm provides privacy to the input/output pair only if the cloud server cannot recover the original information.

Theorem 1. The proposed LR outsourcing algorithm can provide privacy to the input D (X : Y) and the output (β) in a malicious cloud environment.

Proof. The cloud server is considered a malicious cloud, which is the strongest adversarial model: it may behave lazily, curiously, and dishonestly. The cloud server may attempt to record the complete client information (input and output) and then try to retrieve the original information from these records, but it will not be able to recover the original client information (input and output). The following mathematical discussion justifies the claim.

Step 1: First, each entry in the original matrix is randomly permuted row-wise and then column-wise using the permutation functions {π₁, π₂}, where π_i ∈ {1, …, n} and i ∈ {1, 2}, i.e.,

$X_{(i, j)}^{'} = X_{(π_{1} (i), π_{2} (j))}$ (11)

Similarly, for Y, $Y_{(i, 1)}^{'} = Y_{(π_{1} (i), 1)}$ (12)

Step 2: Each entry in the matrices (X, Y) is masked by a multiplicative factor taken from $γ \to \{γ_{1}, γ_{2}, \dots, γ_{m}\}$ and $δ \to \{δ_{1}, δ_{2}, \dots, δ_{n}\}$, where $0 \notin γ \cup δ$, i.e.,

$X_{(i, j)}^{'} = (γ_{i} / δ_{j}) X_{(π_{1} (i), π_{2} (j))}$ (13)

Similarly, for Y, $Y_{(i, 1)}^{'} = γ_{i} (Y_{(π_{1} (i), 1)})$ (14)

In step 1, the entries are randomly permuted by the two permutation functions {π₁, π₂}, so there are (m) ! * (n) ! possible arrangements of the matrix, each occurring with a probability of 1/((m) ! * (n) !). The expected time of a brute-force attack on this key space to recover the original matrix (X) is ((m) ! * (n) !)/2, which is not polynomially bounded in (m, n).

In step 2, the matrix produced in step 1, X′ (i, j), is further multiplied by the factors γ → { γ₁, γ₂, …, γ_m }, δ → { δ₁, δ₂, …, δ_n }, where 0 ∉ γ ∪ δ. The approximate time for a brute-force search of the key space (γ & δ) is (|K_γ|^m * |K_δ|ⁿ)/2. A large range of choices for (γ & δ) sufficiently reduces the chance of guessing them. Therefore, the total complexity of this method is |K_γ|^2m * |K_δ|ⁿ * (m !) ²n ! .
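
To give a feel for the magnitudes involved, the size of the two key spaces can be expressed in bits. The sketch below is purely illustrative: it evaluates log₂(m! * n!) for the permutation pair and log₂(|K_γ|^m * |K_δ|ⁿ) for the multipliers, and the choice |K_γ| = |K_δ| = 2¹⁶ is a hypothetical assumption, since the paper does not fix the size of the multiplier sets. The function names are likewise assumptions.

```python
import math

def permutation_keyspace_bits(m, n):
    """log2(m! * n!): bits needed to index every (pi1, pi2) pair."""
    # lgamma(k + 1) = ln(k!), which avoids building the huge factorial itself.
    return (math.lgamma(m + 1) + math.lgamma(n + 1)) / math.log(2)

def multiplier_keyspace_bits(m, n, k_gamma, k_delta):
    """log2(|K_gamma|^m * |K_delta|^n) for finite multiplier sets."""
    return m * math.log2(k_gamma) + n * math.log2(k_delta)

# Smallest and largest problem sizes used in the experiments.
for m, n in [(1000, 500), (10000, 5000)]:
    perm_bits = permutation_keyspace_bits(m, n)
    mult_bits = multiplier_keyspace_bits(m, n, 2 ** 16, 2 ** 16)  # assumed set sizes
    print(m, n, round(perm_bits), round(mult_bits))
```

These figures only count candidate keys for an exhaustive search; they say nothing about attacks that exploit the structure of the transformed data.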

Further, the proposed algorithm protects the output in the same way as the input: the cloud server cannot recover (β) from (β′). Moreover, the client generates a new pair of secret keys (Inv₁ and Inv₂) for every new problem submission. The proposed encryption system is therefore similar to a one-time pad, so there is no chance of a known-plaintext or chosen-plaintext attack. Thus, the proposed outsourcing algorithm provides security to the client's information and does not leak any information to the cloud server.

5.3 Verifiability analysis

In the malicious threat model, the cloud server may deviate from the actual algorithm instructions and return arbitrary results. Thus, the proposed algorithm must be equipped with a rigorous result verification process, which verifies the correctness of the result. The client receives the regression parameter (β′) from the cloud server and already has the transformed matrices (X′, Y′). The equation (Y′ = X′ * β′) holds only if the regression parameter (β′) is correctly computed by the cloud server. If the equation (Y′ = X′ * β′) holds, then

$D = Y^{'} - X^{'} * β^{'} = {(0, 0, 0, \dots, 0)}^{T}$ (15)

Further, a mathematical discussion is presented to prove that a false result does not pass the verification test and that the client detects server misconduct with an optimal probability of one.

Theorem 2. The proposed verification algorithm detects the server misconduct and cheating with an optimal probability of one.

Proof. The client verifies whether the result produced by the cloud server is correct. To do so, the client checks whether the equation (Y′ = X′ * β′) holds. The verification is efficient, since it costs the client only O (mn). If (Y′ - X′ * β′ ≠ 0), there must be at least one row that is not equal to zero.

$\begin{matrix} \Rightarrow D = (Y^{'} - X^{'} * β^{'}) \\ \Rightarrow D = {(d_{1}, \dots, d_{k}, \dots, d_{m})}^{T} \\ \Rightarrow \text{Let some } d_{i} \neq 0, \text{ then} \\ \Rightarrow d_{i} = y_{i}^{'} - (X_{i 1}^{'} * β_{1}^{'} + X_{i 2}^{'} * β_{2}^{'} + \dots + X_{i n}^{'} * β_{n}^{'}) \\ \Rightarrow d_{i} = y_{i}^{'} - \sum_{j = 1}^{n} X_{i j}^{'} * β_{j}^{'} \\ \Rightarrow d_{i} = y_{i}^{'} - (X_{i 1}^{'} * β_{1}^{'} + \sum_{j = 2}^{n} X_{i j}^{'} * β_{j}^{'}) \\ \Rightarrow X_{i 1}^{'} * β_{1}^{'} = y_{i}^{'} - d_{i} - \sum_{j = 2}^{n} X_{i j}^{'} * β_{j}^{'} \\ \Rightarrow β_{1}^{'} = \frac{y_{i}^{'} - d_{i} - \sum_{j = 2}^{n} X_{i j}^{'} * β_{j}^{'}}{X_{i 1}^{'}} \end{matrix}$ (16)

The value of $(β_{1}^{'})$ could be any real number, but if the server correctly computes (β′), only one value satisfies Equation (16). Since $(β_{1}^{'})$ has only one admissible value, the probability that an arbitrary value passes the check is (Pr = 1/n), where $n \in ℝ$, so Pr ≅ 0. The server misconduct is therefore detected with a probability of (Pr = 1 - 1/n) ≅ 1. Thus, the result verification step verifies the result with a probability of (1).

5.4 Efficiency analysis

The outsourcing algorithm for regression analysis has two parts: the client-side computation and the cloud-side computation. The client performs the following sub-operations: KeyGen, ProbTrans, Verify, and Retransform. The cloud server, in contrast, runs the complex computation of the regression parameter on the transformed input, Compute φ_k (X′, Y′).

Theorem 3. For the client, the secure outsourcing is O (n) times more efficient than the direct implementation of linear regression.

Proof. To execute the proposed LR outsourcing algorithm, the client needs to carry out matrix multiplications in ProbTrans (φ, k), Verify (β′, k), and Retransform (β′, k). The problem transformation requires the following matrix multiplications: $X^{'} = {Inv}_{1} X {Inv}_{2}^{- 1}$, $X^{T^{'}} = {Inv}_{2} X^{T} {Inv}_{1}^{- 1}$, and Y′ = Inv₁Y. Owing to the special structure of the key matrices Inv₁ and Inv₂, which have exactly one non-zero element in each row and column, each of these multiplications costs no more than a matrix-vector multiplication, because no addition operations are needed. So, the asymptotic upper bound of this operation is only O (mn). In the result verification, the client needs to check (Y′ = X′ * β′), where X′ is a general (m × n) matrix and β′ is a column vector (n × 1). This single matrix-vector multiplication costs only O (mn). Finally, the retransformation requires computing $β = {Inv}_{2}^{- 1} * β^{'}$. The multiplication of the key matrix ${Inv}_{2}^{- 1}$ with a column vector costs only O (n). Thus, the client-side computation never exceeds the asymptotic upper bound of O (mn) during the entire execution of the LR outsourcing algorithm. In contrast, solving a linear regression problem with the naïve method costs O (n²m + n³ + mn + n²) ≈ O (n²m), where m > n. Therefore, the client has a computational saving of O (n), and the LR outsourcing algorithm is O (n) times more efficient than the direct method. The running times of the sub-algorithms of the LR outsourcing algorithm are summarized in Table 2.
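
The O (n) gap can be illustrated by plugging problem sizes into the cost expressions of Table 2. This is a rough sketch only: it treats the asymptotic expressions as if they were exact operation counts with unit constants, which is an assumption, so the ratios indicate the growth trend rather than predicted speed-ups. The function names client_ops and direct_ops are hypothetical.

```python
def client_ops(m, n):
    """Sum of the client-side cost expressions from Table 2."""
    keygen = m + n
    prob_trans = 4 * m * n + m
    verify = m + m * n
    retransform = n
    return keygen + prob_trans + verify + retransform

def direct_ops(m, n):
    """Cost expression of the direct least square computation from Table 2."""
    return n * n * m + n ** 3 + m * n + n * n

# Smallest and largest problem sizes used in the experiments.
for m, n in [(1000, 500), (10000, 5000)]:
    ratio = direct_ops(m, n) / client_ops(m, n)
    print(m, n, round(ratio, 1))
```

With m = 2n, as in the experiments, the ratio grows roughly linearly in n, which is the behavior the theorem describes.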

Table 2
Theoretical performance analysis of regression analysis outsourcing algorithm

Client Side Computation				Cloud Server
KeyGen	ProbTrans (φ, k)	Verify (β′, k)	Retransform (β′, k)	Compute φ_k (X′, Y′)
O (m + n)	O (4mn + m)	O (m + mn)	O (n)	O (n²m + n³ + mn + n²)

6 Experimental analysis

The experimental analysis of the proposed algorithm is based on the mathematical and theoretical analysis discussed in the previous sections. The regression analysis outsourcing algorithm is implemented in Matlab, version 2014a. The client and the cloud are both implemented on a system with the same computing capacity: an Intel® Xeon® CPU E5-2620 v2 @ 2.10 GHz with 8 GB RAM. The reason for implementing the client and the server on the same node is to show the actual efficiency and performance gain of the outsourcing algorithm. If the computing performance of the client and the server differed, the performance of the algorithm would be case specific. In reality, however, the cloud server always has more computing resources than the client.

To measure the performance of the algorithm, the experimental analysis uses three standard parameters: the efficiency, the performance gain of the client, and the relative extra cost incurred by the outsourcing paradigm. $Efficiency (η) = OT / CSPT$ (17) where OT is the original execution time of the LR problem without encryption and CSPT is the cloud server processing time. Ideally, the efficiency of the algorithm should be close to one, which indicates that the execution times of the original problem and the encrypted problem are almost the same.

The second parameter is the Performance Gain (PG) for the client, which represents the actual speed-up the client gains from outsourcing the problem. $PG = OT / CPT$ (18) where CPT is the client processing time. Theoretically, the performance gain for the client should always be greater than one.

REC: The relative extra cost (REC) is defined as the amount of extra work done by the client and the cloud server in the outsourcing paradigm compared with the direct method. Ideally, REC should be close to zero, that is, the outsourcing paradigm should incur no extra burden. $REC = (CPT + CSPT - OT) / OT$ (19)
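
Equations (17) to (19) are simple ratios of the three measured times. As a small worked example, the sketch below evaluates them for the timings reported in Table 5 for the largest problem (10000 * 5000); the function name metrics and the dictionary keys are illustrative assumptions.

```python
def metrics(ot, cspt, cpt):
    """Efficiency, performance gain, and relative extra cost, Equations (17)-(19)."""
    return {
        "efficiency": ot / cspt,              # Equation (17)
        "performance_gain": ot / cpt,         # Equation (18)
        "rec": (cpt + cspt - ot) / ot,        # Equation (19)
    }

# Timings in seconds, taken from the last row of Table 5.
result = metrics(ot=88.5563, cspt=87.6284, cpt=2.7689)
for name, value in result.items():
    print(name, round(value, 4))
```

Rounded to four decimal places, the three values agree with the Efficiency, Performance Gain, and REC columns of that row.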

Further, the notations used to calculate the performance of the algorithm are presented in Table 3.

Table 3

Terms & Notations

Notations	Descriptions
t_keygen	Time for Key Generation
t_transformation	Time for Problem Transformation
t_verify	Time for Result Verification
t_retransform	Time for Retransformation
Original Time (OT)	Time to execute regression analysis without transformation
Cloud Server Processing Time (CSPT)	Time to execute transformed/encrypted regression analysis on the cloud server
Client Processing Time (CPT)	Client processing time(t_keygen + t_transformation+ t_verify + t_retransform)
η	Efficiency
PG	Performance Gain/Client Speedup
REC	Relative extra cost

6.1 Experiments for proposed regression analysis outsourcing algorithm

This section presents the experimental analysis of the proposed linear regression outsourcing algorithm. First, the client generates random input instances for the experiments on the LR algorithm. Then, the client initiates the LR outsourcing algorithm. In the first stage, the client generates a new secret key; after that, the client transforms the input using the ProbTrans (φ, k) algorithm. This operation transforms the input matrices to D (X′ : Y′).

6.1.1 Computation of problem transformation

Problem transformation cost dominates the client-side computation. This sub-algorithm requires matrix multiplication, but the operation is efficient compared with executing the regression analysis on the client system (see details in Section 4.2), since the asymptotic complexity of ProbTrans (φ, k) is only O (mn). Table 4 presents the time required to perform problem transformation for problems of different sizes. The problem size varies from m * n = 1000 * 500 to 10000 * 5000, and the time required to transform the largest problem in our experiment is 2.672 seconds on the client's system. This is much less than the time needed to perform the LR analysis itself, which is shown in Table 5.

Table 4
Client side computation cost of proposed LR outsourcing algorithm

Matrix Dimension (m)	Matrix Dimension (n)	t_keygen + t_transformation (second)	t_verify + t_retransform (second)
1000	500	0.02047	0.00107
2000	1000	0.08016	0.00344
3000	1500	0.20371	0.00816
4000	2000	0.35334	0.01378
5000	2500	0.61406	0.02324
6000	3000	0.9297	0.03305
7000	3500	1.24628	0.0482
8000	4000	1.62968	0.06375
9000	4500	2.13453	0.0803
10000	5000	2.67216	0.09675

Table 5

Performance analysis of proposed LR outsourcing algorithm

Matrix Dimension (m)	Matrix Dimension (n)	Original Time (OT) (second)	CSPT (second)	CPT (second)	Performance Gain (OT/CPT)	Efficiency (OT/CSPT)	REC
1000	500	0.0776	0.0687	0.0215	3.6093	1.1295	0.1624
2000	1000	0.4371	0.3908	0.0836	5.2285	1.1185	0.0853
3000	1500	2.4656	2.4523	0.2119	11.6357	1.0054	0.0805
4000	2000	5.6028	5.6357	0.3671	15.2623	0.9942	0.0714
5000	2500	11.1627	11.1917	0.6373	17.5156	0.9974	0.0597
6000	3000	19.3503	19.3865	0.9627	20.1	0.9981	0.0516
7000	3500	30.7696	30.6297	1.2945	23.7695	1.0046	0.0375
8000	4000	45.5757	45.4601	1.6935	26.9121	1.0025	0.0346
9000	4500	65.165	64.5714	2.2149	29.4212	1.0092	0.0249
10000	5000	88.5563	87.6284	2.7689	31.9825	1.0106	0.0208

6.1.2 Computation of cloud-side

The cloud server performs the computation of the regression parameter on the transformed input D (X′ : Y′) by executing the least square method of Equation (6). The cloud server needs to perform some matrix multiplication operations, but the elegance of the proposed method (the time complexity of the multiplication operation is in the order of O (mn)) keeps the cloud-side complexity the same as that of the direct method. The upper-bound time complexity of the direct least square method for linear regression and that of the proposed LR outsourcing algorithm are the same. As shown in Fig. 4, most of the work is accomplished by the cloud server, and very little work is performed by the client, namely the operations KeyGen, ProbTrans, Verify, and Retransform.

Fig.4

Execution time comparison between client and cloud.

6.1.3 Computation of verification and retransformation

Once the cloud server finishes executing the regression analysis problem, it returns the results to the client. The client first verifies the correctness of the result, following the procedure discussed in Section 4.3 (Algorithms 5, 6) and Section 5.3. For the result to be correct, the equation (Y′ = X′ * β′) must hold. The computation cost of the verification process is dominated by a matrix-vector multiplication, which costs only O (mn + m). Only if the result passes the verification process does the outsourcing algorithm proceed to retransformation. For the retransformation, the client computes $β = {Inv}_{2}^{- 1} * β^{'}$. This operation is very efficient and costs only O (n) (linear time), because there are only n non-zero elements in ${Inv}_{2}^{- 1}$ and β′ is a column vector of n rows. This matrix-vector multiplication requires only n multiplications.

The proposed algorithm was executed multiple times for each problem instance to obtain stable measurements. The main performance analysis is presented in Table 5. As shown in Fig. 5, the performance gain increases with the dimension of the problem. The performance gain is in double digits for larger LR problems and reaches a client speed-up of 31.98 times, which is a strong motivation to use the linear regression outsourcing algorithm in real-world scenarios.

Fig.5

Client performance gain.

Table 5 shows that the efficiency parameter remains close to one, which means that the outsourcing paradigm adds no extra burden on the cloud server for executing an encrypted problem.

Finally, REC decreases monotonically as the problem size increases, which means that the combined extra work done by the cloud and the client shrinks as the problem grows, as presented in Fig. 6. Table 5 shows that REC is 0.1624 for the (1000*500) problem size and 0.0208 for (10000*5000), which is an 87.19% decrease in relative extra cost.

Fig.6

Relative extra cost.

Interestingly, the experimental performance depends on the problem dimension and the underlying execution platform. If the cloud exploits faster matrix operation algorithms, the client speed-up will decrease to some extent. However, as long as the input is sufficiently large, the client obtains a performance gain owing to the computation gap of O (n) between the client-side and server-side computation.

7 Conclusion

This work presents a privacy-preserving, verifiable, and efficient outsourcing algorithm for linear regression analysis in a malicious cloud environment. The analytical analysis shows that the proposed algorithm meets the design goals of input/output privacy, correctness, verifiability, and efficiency. The proposed LR algorithm requires a one-time amortizable setup phase (KeyGen + Transformation) with O (4mn + m) cost; afterwards, verification costs only O (mn + m) and retransformation costs O (n). In contrast, executing the regression analysis problem directly costs O (n²m + n³ + mn + n²). Therefore, the client has a computational saving of O (n) over the direct algorithm for LR analysis. The paper also investigated the least square method and the algebraic properties of matrix-vector multiplication to develop an efficient result verification algorithm, which verifies the correctness of the result with an optimal probability. Further, the experimental analysis shows that the proposed algorithm achieves a client speed-up of 31.98 times. The efficiency parameter remains close to one, which means that the outsourcing paradigm adds no extra burden on the cloud server for executing an encrypted problem. Finally, REC decreases monotonically as the problem size increases, which means that the combined extra work done by the cloud and the client shrinks as the problem grows. The analytical and experimental results demonstrate the practical usability of the proposed LR outsourcing algorithm in a malicious cloud environment. In future, it would be worthwhile to identify further computationally expensive mathematical, scientific, and engineering problems and to design outsourcing algorithms for them.

References

1. Shiraz, Gani, Khokhar R.H. and Buyya, A review on distributed application processing frameworks in smart mobile devices for mobile cloud computing, IEEE Commun Surv Tutorials 15(3) (2013), 1294–1313.

2. Xiao and Xiao, Security and privacy in cloud computing, IEEE Commun Surv Tutorials 15(2) (2013), 843–859.

3. Marston, Bandyopadhyay and Ghalsasi, Cloud Computing - The Business Perspective, 44th Hawaii Int Conf Syst Sci, 2011, pp. 1–11.

4. Greenberg, Hamilton, Maltz D.A. and Patel, The cost of a cloud: Research problems in data center networks, ACM SIGCOMM Comput Commun Rev 39(1) (2009), 68–73.

5. Murugesan, Harnessing green IT: Principles and practices, IT Professional 10(1) (2008), 24–33.

6. Atallah M.J. and Frikken K.B., Securely Outsourcing Linear Algebra Computations, Proc 5th ACM Symp Information, Comput Commun Secur - ASIACCS, 2010, pp. 48–59.

7. Bednarz, Bean and Roughan, Hiccups on the road to privacy-preserving linear programming, Proc 8th ACM Work Priv Electron Soc - WPES, 2009, pp. 117–120.

8. Lin and Chen M.S., Privacy-preserving outsourcing support vector machines with random transformation, Proc 16th ACM SIGKDD Int Conf Knowl Discov Data Min, 2010, pp. 363–372.

9. Chen, Xiang and Yang, Privacy-preserving and verifiable protocols for scientific computation outsourcing to the cloud, J Parallel Distrib Comput 74(3) (2014), 2141–2151.

10. Mohassel, Efficient and secure delegation of linear algebra, IACR Cryptol ePrint Arch (2011), 1–33.

11. Chen, Huang, Li, Ma, Lou and Wong D.S., New algorithms for secure outsourcing of large-scale systems of linear equations, IEEE Trans on Infor Forens and Secur 10(1) (2015), 69–78.

12. Beale C.M., Lennon J.J., Yearsley J.M., Brewer M.J. and Elston D.A., Regression analysis of spatial data, Ecol Lett 13(2) (2010), 246–264.

13. Harrell, Regression modeling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis. Springer, 2015.

14. De Wolf E.D., Madden L.V. and Lipps P.E., Risk assessment models for wheat Fusarium head blight epidemics based on within-season weather data, Phytopathology 93(4) (2003), 428–435.

15. Chu, Kim S.K., Lin Y.A., Yu, Bradski, Ng A.Y. and Olukotun, Map-reduce for machine learning on multicore, Adv Neural Inf Process Syst 19 (2007), 281.

16. Guestrin, Bodik, Thibaux, Paskin and Madden, Distributed regression: An efficient framework for modeling sensor network data, Third International Symposium on in Information Processing in Sensor Networks, 2004, pp. 1–10.

17. Shankar, Mannering and Barfield, Effect of roadway geometrics and environmental factors on rural freeway accident frequencies, Accid Anal Prev 27(3) (1995), 371–389.

18. Seshadri, Luk, Shi, Perrig, Van Doorn and P. Khosla, Pioneer: Verifying integrity and guaranteeing execution of code on legacy platforms, In Proceedings of ACM Symposium on Operating Systems Principles (SOSP), 2005, 173.

19. Belenkiy, Chase, Erway C.C., Jannotti, Küpçü and Lysyanskaya, Incentivizing outsourced computation, In Proceedings of the 3rd International Workshop on Economics of Networked Systems, 2008, pp. 85–90.

20. Monrose, Wyckoff and Rubin, Distributed execution with remote audit, Ndss 99 (1999), 3–5.

21. Yee, Using secure coprocessors, PhD Dissertation, CMU, 1994.

22. Smith S.W. and Weingart, Building a high-performance, programmable secure coprocessor, Comput Networks 31(8) (1999), 831–860.

23. Bajikar, Trusted platform module (tpm) based security on notebook pcs-white paper, Mob Platforms Gr Intel Corp (2002), 1–20.

24. Goldwasser, Kalai Y.T. and Rothblum G.N., Delegating computation: Interactive proofs for muggles, Stoc 62(4) (2008), 113–122.

25. Goldwasser, Micali and Rackoff, The knowledge complexity of interactive proof systems, SIAM J Comput 18(1) (1989), 186–208.

26. Fortnow and Lund, Interactive proof systems and alternating time-space complexity, Theor Comput Sci 113(1) (1993), 55–73.

27. Gennaro, Gentry and Parno, Non-interactive verifiable computing: Outsourcing computation to untrusted workers, Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 6223 (2010), 465–482.

28. Wang, Ren, Wang and Wang, Harnessing the cloud for securely outsourcing large-scale systems of linear equations, IEEE Trans Parallel Distrib Syst 24(6) (2013), 1172–1181.

29. Laud and Pankova, Transformation-based outsourcing of linear equation systems over real numbers, IACR Cryptology ePrint Archive (2015), 322.

30. Hong, Vaidya and Lu, Secure and efficient distributed linear programming, J Comput Secur 20(5) (2012), 583–634.

31. Chen, Xiang, Lei and Chen, Highly efficient linear regression outsourcing to a Cloud, IEEE Trans Cloud Comput 2(4) (2014), 499–508.

32. Wang, Ren, Wang and Urs K.M., Harnessing the Cloud for Securely Solving Large-Scale Systems of Linear Equations, 31st Int Conf Distrib Comput Syst, 2011, pp. 549–558.

33. Lei, Liao, Huang, Li and Hu, Outsourcing large matrix inversion computation to a public cloud, IEEE Trans Cloud Comput 1(1) (2013), 78–87.

34. Chen, Li and Ma, New algorithms for secure outsourcing of modular exponentiations, IEEE Trans Parallel Distrib 25(9) (2014), 2386–2396.

35. Lei, Liao, Huang and Li, Cloud Computing Service: The case of large matrix determinant computation, IEEE Trans. on Services Comput 8(5) (2015), 688–700.

36. and Tang, Secure outsourced computation of the characteristic polynomial and eigenvalues of matrix, J Cloud Comput 4(1) (2015), 4–9.

37. Zhou and Li, Outsourcing eigen-decomposition and singular value decomposition of large matrix to a public cloud, in IEEE Access 4 (2016), 869–879. doi: 10.1109/ACCESS.2016.2535103

38. Gentry, Computing arbitrary functions of encrypted data, Commun ACM 53(3) (2010), 97–105.

39. Van Dijk, Gentry, Halevi and Vaikuntanathan, Fully homomorphic encryption over the integers, Adv Cryptology–EUROCRYPT 2010 (2010), 24–43.

40. Yao A.C., Protocols for secure computations, 23rd Annu Symp Found Comput Sci (sfcs 1982), 1982, pp. 1–5.

41. Yao A.C., How to generate and exchange secrets, 27th Annu Symp Found Comput Sci (sfcs 1986) 1 (1986), 162–167.

42. Gentry, A fully homomorphic encryption scheme, PhD Dissertation, Stanford University, 2009.

43. Chung K.M., Kalai and Vadhan, Improved delegation of computation using fully homomorphic encryption, Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 6223 (2010), 483–501.

44. Lei, Liao, Huang and Heriniaina, Achieving security, robust cheating resistance, and high-efficiency for outsourcing large matrix multiplication computation to a malicious cloud, Inf Sci (Ny) 280 (2014), 205–217.

45. Wang, Ren and Wang, Secure and Practical Outsourcing of Linear Programming in Cloud Computing, Proc IEEE INFOCOM, 2011, pp. 820–828.

46. Blanton, Zhang and Frikken K.B., Secure and verifiable outsourcing of large-scale biometric computations, ACM Trans Inf Syst Secur 16(3) (2016), 1–33.

47. Troncoso-Pastoriza J.R., Comesana and González, Secure Direct and Iterative Protocols for Solving Systems of Linear Equations, In Proc of 1st International Workshop on Signal Processing in the EncryptEd Domain (SPEED), 2009, pp. 122–141.

48. Goldreich, Micali and Wigderson, How To Play Any Mental Game, or A Completeness Theorem for Protocols with Honest Majority, Proceedings of 19th Annual ACM Symposium on Theory of Computing, 1987, pp. 218–229.

49. , Chen and Han Y.S., Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification, Proc 4th SIAM Int Conf Data Min, 2004, pp. 222–233.

50. Knuth D.E., The Art of Computer Programming: Sorting And Searching, 3, Pearson Education, 1998.

51. Anderson M.J. and Robinson, Permutation tests for linear models, Aust N Z J Stat 43(1) (2001), 75–88.

52. Dreier and Kerschbaum, Practical secure and efficient multiparty linear programming based on problem transformation, IACR Cryptology ePrint Archive, 2011, p. 108.

53. Lindell and Pinkas, Secure multiparty computation for privacy-preserving data mining, J Priv Confidentiality, Berkeley Electronic Press 1(1) (2009), 59–98.

54. López-Alt, Tromer and Vaikuntanathan, On-the-fly multiparty computation on the cloud via multikey fully homomorphic encryption, Proc 44th Symp Theory Comput–STOC, 2012, pp. 1219–1234.
