New SVD-based collaborative filtering algorithms with differential privacy

Abstract

In the era of big data, real and reliable user data is an important factor in recommendation technology; therefore, the disclosure of personal privacy has become a significant user concern. Differential privacy is a proven and very strict privacy protection technology that is particularly good at protecting against privacy leakage through indirect inference. Singular Value Decomposition (SVD) is one of the common matrix factorization techniques used in collaborative filtering for recommender systems, and it takes user and item biases into account. This paper develops a flexible application that implements differential privacy in SVD. On one hand, our algorithms do not need to perform any pre-processing of the raw input matrix. On the other hand, experimental results on two real datasets show that our algorithms not only protect the private information in the raw data but also preserve recommendation accuracy. Finally, a trade-off scheme is used that can, to a certain extent, balance privacy protection and recommendation accuracy.

Keywords

Recommender system; collaborative filtering; privacy information; differential privacy; matrix factorization

1 Introduction

To offer useful recommendations, two elements are essential in recommender system (RS) research: the collection of users’ massive personal information, such as purchase and browsing records on the Internet, and the RS technology itself. The core technology of recommender systems is collaborative filtering, particularly Matrix Factorization (MF), which is based on the latent factor model, is widely used, and was a key component of the Netflix Prize-winning solution. In the real world, ratings, comments and other data provided by users often contain bias. This bias may stem from a user’s personal preferences, differences in consumer personality, or properties of the item. Singular Value Decomposition (SVD) [1, 2] is a type of MF method. When used in recommender systems, SVD takes user and item bias into account, which often improves predictive accuracy over traditional MF methods. However, while users enjoy the convenience of recommender systems, they also face a risk of privacy disclosure.

Collaborative filtering (CF) based on the items a user chooses in transactions updates lists of similar items according to the user’s previous purchases. An attacker can therefore track the similar-item lists related to a target user: when a new item appears in these lists, the attacker can deduce that the item was added to the target user’s record. Calandrino et al. [3] investigated the privacy risks of CF-based recommender systems and showed that it is feasible to draw meaningful inferences about the transactions of specific users from the public outputs of recommender systems. Dwork proposed three types of inference attacks based on user purchase records [4], i.e., attacks in which the attacker has some background knowledge about a user. Such attacks are significant threats to user privacy.

One of the goals of the RS field, and of the whole machine learning field [5, 6], is to improve the Quality of Service (QoS) [7]. Ensuring personal privacy and security, and eliminating user concerns over data protection, encourages users to provide true and reliable information; this requires effective information system safeguards and security guidelines. In 2006, Dwork et al. [4, 9] proposed Differential Privacy (DP), which has a very strict definition whose guarantee does not depend on the attacker’s background knowledge.

Our contributions are summarized as follows. First, we propose three new DP algorithms for SVD: Private ALS with Input Perturbation, Private ALS with Objective Perturbation and Private ALS with Output Perturbation. Second, we provide key mathematical proofs that the new algorithms satisfy the definition of DP. Third, we compare the predictive accuracy of our algorithms on two real datasets against a baseline and related methods from the literature; the results show that our algorithms achieve better accuracy. Finally, to address the trade-off between the strength of privacy protection and predictive accuracy, we use a scheme that selects a reasonable range for the DP parameter ɛ.

2 Related work

Research on applying DP methods in RS to address privacy concerns has become a popular topic in recent years. Zhu et al. [10] applied DP to neighbourhood-based CF methods and proposed a private neighbour collaborative filtering (PNCF) algorithm. Hua et al. [11] proposed a method that prevents untrusted recommenders from making unauthorized use of user ratings while allowing users to abstain from or join the MF process; DP protection was implemented by perturbing the MF objective function. Liu et al. [12] applied DP to Bayesian posterior sampling via a Stochastic Gradient Langevin Dynamics (SGLD) step, thus avoiding the influence of Gaussian noise on the entire parameter space. Yan et al. [13] used the DP principle and social relationships to adaptively modify user-rating histories so that exact user information is not leaked. Balu et al. [14] used sketching techniques to provide DP guarantees implicitly by exploiting the inherent randomness of the data structures; this approach is well suited to large-scale applications. Javidbakht et al. [15] used DP as a metric to quantify the privacy of an intended destination and investigated optimal probabilistic routing schemes under unicast and multicast paradigms. Boutet et al. [16] proposed an original obfuscation scheme and a randomized dissemination protocol to preserve privacy while leveraging user profiles in distributed recommender systems. Berlioz et al. [17] applied DP to the latent factor model at each step of MF; however, their approach lacked rigorous mathematical proofs and required pre-processing of the raw data, and their experiments needed a large DP parameter (i.e., weak privacy) to obtain good recommendation accuracy.

Chaudhuri et al. [18] proposed general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). They proposed DP models based on output perturbation and objective perturbation, but applied them to SVMs and logistic regression. In this paper, we propose new algorithms to preserve privacy in an RS by applying DP to SVD. We provide rigorous proofs that these algorithms satisfy DP, and experimental results show that our algorithms also achieve good predictive accuracy.

3 Preliminaries

3.1 Differential privacy

Differential privacy [3, 8] is fundamentally different from traditional privacy protection models. DP requires that the removal or addition of a single database record should not significantly affect the outcome of any analysis based on the database. It assumes an extremely strong attack model and provides a rigorous, quantitative representation and proof of the disclosure risk of private information. DP provides good protection even if the attacker has extensive background information.

Definition 3.1 (ɛ-differential privacy [4, 9]). A randomized algorithm A provides ɛ-differential privacy if, for any adjacent datasets D and D′ that differ in at most one record (i.e., $|D \triangle D'| \leq 1$), and for any subset S of possible outcomes in Range(A), $\Pr[A(D) \in S] \leq \exp(ɛ) \times \Pr[A(D') \in S]$ (3.1) where Pr[·] denotes probability taken over the randomness of algorithm A. Note that this guarantee does not depend on the background knowledge of the attacker. The parameter ɛ indicates the degree of privacy protection: smaller values indicate stronger protection.

The key technique of DP protection is to add noise via the Laplace or the exponential mechanism [7, 18]. The former is used for numerical outputs and the latter for non-numerical outputs. The amount of noise depends directly on the function’s sensitivity and on the privacy parameter ɛ: it grows with the sensitivity and shrinks as ɛ increases.

Definition 3.2 (global sensitivity [19]). For any function $f: ℝ^{n \times m} \to ℝ^{n \times m}$, the L_k-sensitivity of f is $S_{k}(f) = \max_{D, D'} \|f(D) - f(D')\|_{k}$ (3.2) where the maximum is taken over all adjacent datasets D and D′, d is the dimension of the output of f, and ‖·‖_k denotes the L_k-norm.

Laplace mechanism. Dwork et al. [8, 19] showed that the Laplace mechanism can be used to obtain ɛ-differential privacy. The main idea is to add noise sampled from a Laplace distribution with a calibrated scale b. The probability density function of the Laplace distribution with mean 0 and scale b is $f(x \mid b) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right)$ (3.3)

We will sometimes write Lap (b) to denote a random variable X ∼ Lap (b) [3], where b is determined by both S_k (f) and the privacy parameter ɛ.

Theorem 3.3. For any function $g: ℝ^{n \times m} \to ℝ^{n \times m}$, if algorithm A computes $A(x) = g(x) + (\mathrm{Lap}(S_{1}(g)/ɛ))^{d}$ (3.4), i.e., it adds d independent Lap(S_1(g)/ɛ) variables, one per output coordinate, then algorithm A provides ɛ-differential privacy.
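As a minimal sketch of Theorem 3.3 in standard-library Python (the function names are hypothetical and not taken from the paper), the Laplace mechanism can be written as follows. Noise is drawn by inverse-CDF sampling, one independent draw per output coordinate, and `laplace_density` implements Equation (3.3) so that the bound of Definition 3.1 can be checked numerically.

```python
import math
import random


def laplace_sample(scale, rng=random):
    """Draw one sample from Lap(scale) by inverse-CDF sampling."""
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # max(...) guards against log(0) in the measure-zero case u == -0.5
    return -scale * math.copysign(1.0, u) * math.log(max(1e-300, 1.0 - 2.0 * abs(u)))


def laplace_density(x, scale):
    """Density of Lap(scale) at x, as in Equation (3.3)."""
    return math.exp(-abs(x) / scale) / (2.0 * scale)


def laplace_mechanism(values, l1_sensitivity, epsilon, rng=random):
    """Release g(x) + (Lap(S_1(g) / epsilon))^d: one independent draw per coordinate."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = l1_sensitivity / epsilon
    return [v + laplace_sample(scale, rng) for v in values]
```

For two outputs that differ by S_1(g), the density ratio exp((|x + S_1| − |x|)/b) never exceeds exp(S_1/b) = exp(ɛ) when b = S_1(g)/ɛ, which is exactly the bound in (3.1).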

In this work, we also rely on the K-norm mechanism [20], which makes it possible to calibrate noise to the L₂-sensitivity of the evaluated function. Since the outputs of our algorithms are all numerical, we use the Laplace mechanism to achieve DP.

3.2 Matrix factorization

MF is one of the effective methods for predicting the missing entries of a sparse matrix (e.g., a ratings matrix). Briefly, MF factorizes a matrix, that is, it finds two (or more) matrices whose product approximates the raw matrix. From an application point of view, MF can be used to discover latent features underlying the interactions between two different kinds of entities.

Let the input of MF be a rating matrix R_n×m containing the ratings of n users for m items, where each element r_ui is the rating of user u for item i. Given a low dimension d, MF factorizes R_n×m into two latent matrices, a user-factor matrix P_n×d and an item-factor matrix Q_d×m, such that R is approximated by the product of P and Q. That is, each known rating r_ui is approximated by ${\bar{r}}_{ui} = p_{u}^{T} q_{i}$, where $p_{u} \in ℝ^{d}$ is the u-th row of P and represents the coordinates of user u in the d-dimensional latent space, and $q_{i} \in ℝ^{d}$ is the i-th column of Q and represents the coordinates of item i in this space. To obtain P and Q, MF minimizes the regularized squared error as follows [1]:

$(P, Q) = \underset{P, Q}{\arg\min} \sum_{r_{ui} \in R} (r_{ui} - p_{u}^{T} q_{i})^{2} + λ \left(\sum_{u} \|p_{u}\|_{2}^{2} + \sum_{i} \|q_{i}\|_{2}^{2}\right)$ (3.5) where λ is the regularization parameter, which regularizes the factors and prevents over-fitting.

3.3 Singular Value Decomposition (SVD)

In general, the predicted rating $p_{u}^{T} q_{i}$ captures only the interaction between the user and the item. In the real world, however, much of the variation in observed ratings is due to effects of the users or items alone (known as biases), independent of their interaction. Examples are the systematic tendency of some users to give higher ratings than others, and the tendency of some items to receive higher ratings than others. Taking user and item biases into account therefore reflects the true rating more objectively. SVD is a typical MF technique that does so (its bias terms are known as baseline predictors in some literature). The predicted rating becomes ${\tilde{r}}_{ui} = μ + b_{u} + b_{i} + p_{u}^{T} q_{i}$ (3.6) where μ is the global average rating, and b_u and b_i are the observed deviations of user u and item i, respectively, from the average. Hence, the objective function of SVD changes from formula (3.5) to

$(P, Q) = \underset{P, Q}{\arg\min} \sum_{r_{ui} \in R} e_{ui}^{2} + λ \left(\sum_{u} b_{u}^{2} + \sum_{i} b_{i}^{2} + \sum_{u} \|p_{u}\|_{2}^{2} + \sum_{i} \|q_{i}\|_{2}^{2}\right)$ (3.7) where $e_{ui} = r_{ui} - {\tilde{r}}_{ui} = r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i}$.
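As a minimal sketch (NumPy is an assumption, and `predict` and `svd_objective` are hypothetical names), Equations (3.6) and (3.7) can be evaluated for observed ratings given as (u, i, r_ui) triples, with P of shape n × d and Q of shape d × m as in Section 3.2.

```python
import numpy as np


def predict(mu, b_user, b_item, P, Q, u, i):
    """Biased SVD prediction, Equation (3.6): mu + b_u + b_i + p_u^T q_i.

    P has shape (n, d) (row u is p_u); Q has shape (d, m) (column i is q_i).
    """
    return mu + b_user[u] + b_item[i] + P[u] @ Q[:, i]


def svd_objective(ratings, mu, b_user, b_item, P, Q, lam):
    """Regularized squared error of Equation (3.7) over observed (u, i, r) triples."""
    sq_err = sum((r - predict(mu, b_user, b_item, P, Q, u, i)) ** 2 for u, i, r in ratings)
    # sum_u ||p_u||^2 and sum_i ||q_i||^2 are the squared Frobenius norms of P and Q
    reg = lam * (np.sum(b_user ** 2) + np.sum(b_item ** 2) + np.sum(P ** 2) + np.sum(Q ** 2))
    return sq_err + reg
```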

Ref. [1] provides two methods for estimating b_u and b_i: one is a direct empirical estimate (Equation (3.8)); the other is Stochastic Gradient Descent (SGD). In this paper, we adopt the first method because of its convergence speed and because it avoids the influence of the error in each SGD iteration.

$b_{i} = \frac{\sum_{u \in R(i)} (r_{ui} - μ)}{λ_{1} + |R(i)|}, \quad b_{u} = \frac{\sum_{i \in R(u)} (r_{ui} - μ - b_{i})}{λ_{2} + |R(u)|}$ (3.8) where |R(i)| is the number of users who rated item i and |R(u)| is the number of items rated by user u. The regularization parameters λ₁ and λ₂, which are determined by cross-validation, shrink these averages toward zero.
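Equation (3.8) can be computed in one pass per bias. The sketch below uses standard-library Python with ratings given as (u, i, r_ui) triples; the input format and the name `baseline_biases` are assumptions for illustration. Note that b_i is computed first because b_u depends on it.

```python
from collections import defaultdict


def baseline_biases(ratings, lam1, lam2):
    """Regularized bias estimates of Equation (3.8).

    ratings: list of (user, item, rating) triples for the observed entries.
    Returns (mu, b_item, b_user), with the biases stored in dicts.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # b_i: shrunken average deviation of item i's ratings from mu
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_item = {i: item_sum[i] / (lam1 + item_cnt[i]) for i in item_sum}

    # b_u: shrunken average of what remains after removing mu and b_i
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_item[i]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (lam2 + user_cnt[u]) for u in user_sum}
    return mu, b_item, b_user
```

Larger λ₁ and λ₂ pull the biases of items and users with few ratings more strongly toward zero.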

4 Applying DP to SVD

If an attacker has some background knowledge, he can obtain a user’s private data from the raw rating matrix. For example, an attacker can infer that a user likes certain types of movies, which the user may not want other people to know. Our goal is therefore to protect the raw rating matrix through a reasonable use of DP. In this paper, we apply DP to SVD. Following the principle of MF, the MF process can be divided into four stages: inputting the “user-item” ratings matrix, solving SVD by ALS, outputting the latent user and item matrices, and predicting ratings. Berlioz et al. [17] proposed applying DP to these four stages and needed to pre-process the raw matrix. In contrast, our algorithms do not perform any pre-processing of the raw matrix. In addition, our algorithms consider user and item bias information to improve recommendation accuracy. We propose three new DP algorithms for SVD: Private ALS with Input Perturbation, Private ALS with Objective Perturbation and Private ALS with Output Perturbation. The overall framework of the three algorithms is shown in Fig. 1.

Fig. 1

The overall framework of our three algorithms.

4.1 Alternating Least Squares (ALS)

ALS is one of the common methods for solving the resulting non-convex optimization problem iteratively. In each iteration, one latent matrix (say P) is fixed; the objective function of SVD (Equation (3.7)) then becomes a convex optimization problem whose solution (for Q) can be found efficiently. The other latent matrix is found in the same way, and these steps are repeated until convergence.

According to the principle of ALS, the raw objective function (Equation (3.7)) splits into two convex optimization problems: $J_{Q}(p_{u}, R) = \sum_{R_{u}} e_{ui}^{2} + n_{u} λ \|p_{u}\|_{2}^{2}$ (4.9) $J_{P}(q_{i}, R) = \sum_{R_{i}} e_{ui}^{2} + n_{i} λ \|q_{i}\|_{2}^{2}$ (4.10) where n_u = |R(u)| and n_i = |R(i)| (Section 3.3).

Then, each user vector p_u and item vector q_i can be obtained by solving the ERM [18] problem as follows $p_{u} (R, Q) = \underset{p_{u}}{arg min} J_{Q} (p_{u}, R)$ (4.11) $q_{i} (R, P) = \underset{q_{i}}{arg min} J_{P} (q_{i}, R)$ (4.12)

We first give a function, named alsSVD(), that solves SVD by ALS.

Function: alsSVD(R, k) // SVD solved by ALS
// R is the input matrix, k is the number of iterations
1: For k iterations do
2: For each user u, given matrix Q, do
3: $p_{u}(R, Q) = \underset{p_{u}}{\arg\min} J_{Q}(p_{u}, R)$;
4: End For
5: For each item i, given matrix P, do
6: $q_{i}(R, P) = \underset{q_{i}}{\arg\min} J_{P}(q_{i}, R)$;
7: End For
8: End For
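The alsSVD() function can be sketched in Python with NumPy (an assumption; the paper names no implementation language). The sketch assumes the ratings are stored as a dense array `R` with a boolean `mask` of observed entries, holds μ, b_u and b_i fixed (for example, computed by Equation (3.8)), and solves each subproblem (4.11) and (4.12) exactly as a ridge regression. The name `als_svd` and its signature are hypothetical.

```python
import numpy as np


def als_svd(R, mask, d, lam, k, mu, b_user, b_item, seed=0):
    """Sketch of alsSVD(R, k): alternating ridge updates for the biased SVD model.

    R, mask: (n, m) arrays; mask[u, i] is True where r_ui is observed.
    mu, b_user (n,), b_item (m,): biases, held fixed during ALS.
    Each step minimizes J_Q (4.9) or J_P (4.10) exactly in closed form.
    """
    n, m = R.shape
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n, d))
    Q = 0.1 * rng.standard_normal((d, m))
    # residual target r_ui - mu - b_u - b_i (only observed entries are used)
    Y = R - mu - b_user[:, None] - b_item[None, :]
    for _ in range(k):
        for u in range(n):  # fix Q, solve each p_u
            idx = mask[u]
            n_u = idx.sum()
            if n_u == 0:
                continue
            Qu = Q[:, idx]  # (d, n_u): vectors of the items rated by u
            A = Qu @ Qu.T + n_u * lam * np.eye(d)
            P[u] = np.linalg.solve(A, Qu @ Y[u, idx])
        for i in range(m):  # fix P, solve each q_i
            idx = mask[:, i]
            n_i = idx.sum()
            if n_i == 0:
                continue
            Pi = P[idx]  # (n_i, d): vectors of the users who rated i
            A = Pi.T @ Pi + n_i * lam * np.eye(d)
            Q[:, i] = np.linalg.solve(A, Pi.T @ Y[idx, i])
    return P, Q
```

Because every update minimizes its block of the objective exactly, the regularized training objective does not increase from one ALS sweep to the next.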

4.2 Private ALS with input perturbation

The idea of Algorithm 1 is to perturb the raw ratings matrix with Laplace noise, train the algorithm on the noisy input ratings, and then use ALS to solve SVD.

Algorithm 1: DPSVDALSIn
Input: R_n×m = {r_ui} - “user-item” ratings matrix
d - number of factors
λ - regularization parameter of SVD objective function
k - number of ALS iterations
ɛ - differential privacy budget
Output: Latent factor matrices P_n×d and Q_d×m
1: Initialize random latent factor matrices P and Q;
2: Compute the perturbed ratings matrix:
3: Let $R' = \{r'_{ui}\} = \{r_{ui} + b \mid r_{ui} \in R\}$;
4: (where b is drawn with density $υ(b) \propto \exp\left(-\frac{ɛ \|b\|}{Δr}\right)$ and Δr = r_max - r_min);
5: Clamp each rating $r'_{ui}$ in R′ to the range [r_min, r_max];
6: Call Function alsSVD(R′, k);
7: Return P_n×d and Q_d×m

Algorithm 1 requires the following explanations, which are provided here:

r_max and r_min are the maximum and minimum values in the rating matrix. Since the raw ratings lie in [r_min, r_max] (r_min = 1 and r_max = 5 in our experiments), each perturbed rating $r_{ui}^{'}$ is clamped to the same range.

In Algorithm 1, the predicted rating (Equation (3.6)) is computed from the perturbed rating matrix, so μ, b_u and b_i are all recomputed on R′. Accordingly, the error is the difference between a perturbed rating and its predicted rating.

Δr (Δr = r_max - r_min) is the difference between the maximum value and the minimum value.

Theorem 4.1. Given the DP budget ɛ and the maximum value (r_max) and minimum value (r_min) in the “user-item” rating matrix, let Δr = r_max - r_min. If the noise has density $υ(b) \propto \exp\left(-\frac{ɛ \|b\|}{Δr}\right)$, then Algorithm 1 provides ɛ-differential privacy.

Proof. First, the global sensitivity of a rating, $GS_{r_{ui}}$, is the largest possible change in its value, so $GS_{r_{ui}} = Δr = r_{max} - r_{min}$.

Second, b is a noise vector added to the ratings r_ui, with probability density $υ(b) \propto \exp\left(-\frac{ɛ \|b\|}{Δr}\right)$. By the Laplace mechanism (see Section 3.1), the new rating is $r_{ui}^{'} = r_{ui} + \mathrm{Lap}\left(\frac{GS_{r_{ui}}}{ɛ}\right) = r_{ui} + \mathrm{Lap}\left(\frac{Δr}{ɛ}\right)$. Clamping in step 5 and the subsequent ALS training only post-process R′, so they do not weaken the guarantee. Thus, Algorithm 1 provides ɛ-differential privacy.
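Steps 2 to 5 of Algorithm 1 can be sketched as follows, assuming NumPy and a dense rating array with a boolean mask of observed entries (both assumptions, as is the name `perturb_ratings`). Each observed rating receives an independent Lap(Δr/ɛ) draw, matching the proof above, and is then clamped.

```python
import numpy as np


def perturb_ratings(R, mask, epsilon, r_min=1.0, r_max=5.0, seed=None):
    """Input perturbation of Algorithm 1 (steps 2-5), as a sketch.

    Adds independent Lap(delta_r / epsilon) noise to every observed rating,
    where delta_r = r_max - r_min, then clamps the result to [r_min, r_max].
    Unobserved entries (mask == False) are left untouched.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng(seed)
    delta_r = r_max - r_min
    noise = rng.laplace(loc=0.0, scale=delta_r / epsilon, size=R.shape)
    R_noisy = R.astype(float).copy()
    R_noisy[mask] = np.clip(R[mask] + noise[mask], r_min, r_max)
    return R_noisy
```

The returned matrix R′ would then be passed to alsSVD(R′, k), with μ, b_u and b_i recomputed on R′ as described above.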

4.3 Private ALS with objective perturbation

Chaudhuri et al. [18] proposed two approaches for designing privacy-preserving algorithms with DP: objective perturbation and output perturbation. In particular, their experiments showed that objective perturbation achieves a better balance between privacy protection and accuracy than output perturbation. In this subsection, we apply this approach to the ALS objective function and show that the resulting method satisfies DP.

The idea of Algorithm 2 is to add noise to the objective function (Equation (4.9) or (4.10)). From the predicted rating (Equation (3.6)), the inner product of P and Q is perturbed if at least one of the two matrices is perturbed. We therefore add noise only to the ALS objective function for P, which gives $J_{Q}^{priv}(p_{u}, R) = J_{Q}(p_{u}, R) + \frac{1}{N} b^{T} p_{u}$ (4.13) where b is a noise vector with d components, d is the number of latent features of P or Q, and N is the number of samples. To solve this convex optimization problem, we use ERM [18]. From (4.13), we obtain $p_{u}^{priv} = \underset{p_{u}}{\arg\min} J_{Q}^{priv}(p_{u}, R) + \frac{1}{2} Δ \|p_{u}\|_{2}^{2}$ (4.14)

Following Algorithm 2 of Ref. [18], the regularization term $\frac{1}{2} Δ \|p_{u}\|_{2}^{2}$ is added to avoid over-fitting after perturbation, where Δ is determined by the privacy parameter ɛ and the slack-term parameter C.

The ALS objective functions of SVD are convex and differentiable, so they satisfy the conditions for applying Algorithm 2 of Ref. [18]. Our Algorithm 2 applies DP through ALS objective perturbation to solve for the latent factors of SVD.

Algorithm 2: DPSVDALSObj
Input: R_n×m = {r_ui} - “user-item” ratings matrix
d - number of factors
λ - regularization parameter of SVD objective function
k - number of ALS iterations
ɛ - differential privacy parameter
C - parameter for computing the slack term
Output: Latent factor matrices P_n×d and Q_d×m
1: Initialize random latent factor matrices P and Q;
2: Call Function alsSVD(R, k - 1);
3: For each user u, given matrix Q, do
4: Let $ɛ' = ɛ - \log\left(1 + \frac{2C}{N n_{u} λ} + \frac{C^{2}}{N^{2} (n_{u} λ)^{2}}\right)$;
5: If ɛ′ > 0 then Δ = 0;
6: Else $Δ = \frac{C}{N (e^{ɛ/4} - 1)} - λ$ and ɛ′ = ɛ/2;
7: Generate a random noise vector b with pdf $υ(b) \propto \exp\left(-\frac{ɛ' \|b\|}{2}\right)$;
8: $p_{u}^{priv} = \underset{p_{u}}{\arg\min} J_{Q}^{priv}(p_{u}, R) + \frac{1}{2} Δ \|p_{u}\|_{2}^{2}$;
9: End For
10: For each item i, given matrix P, do
11: $q_{i}(R, P) = \underset{q_{i}}{\arg\min} J_{P}(q_{i}, R)$;
12: End For
13: Return P_n×d and Q_d×m
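Steps 4 to 7 of Algorithm 2 can be sketched as below (NumPy is assumed; `objective_perturbation_params` and `sample_norm_noise` are hypothetical names). The noise with density proportional to exp(−ɛ′‖b‖/2) is drawn with a standard construction: its norm from a Gamma(d, 2/ɛ′) distribution and its direction uniformly on the unit sphere.

```python
import numpy as np


def objective_perturbation_params(epsilon, C, N, n_u, lam):
    """Steps 4-6 of Algorithm 2: adjusted budget eps' and extra regularizer Delta."""
    reg = n_u * lam  # regularization weight of J_Q in Equation (4.9)
    eps_prime = epsilon - np.log(1 + 2 * C / (N * reg) + C ** 2 / (N * reg) ** 2)
    if eps_prime > 0:
        return eps_prime, 0.0
    delta = C / (N * (np.exp(epsilon / 4) - 1)) - lam
    return epsilon / 2, delta


def sample_norm_noise(d, beta, rng):
    """Draw b in R^d with density proportional to exp(-beta * ||b||_2).

    The norm follows Gamma(shape=d, scale=1/beta) and the direction is uniform
    on the unit sphere, which yields exactly this density.
    """
    norm = rng.gamma(shape=d, scale=1.0 / beta)
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    return norm * direction
```

For step 7, the sampler would be called with beta = ɛ′/2, and the resulting b enters the perturbed objective (4.13).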

Our Algorithm 2 requires the following explanations:

First, we need to determine the value of parameter C used in steps 4 and 6. Here, C = 2. Following [18], the value of C is derived as follows:

From the objective function J_Q(p_u, R) for SVD (Equation (4.9)), the loss function of J_Q(p_u, R) is $ℓ_{p_{u}}(e_{ui}) = e_{ui}^{2}$ (4.15) where $e_{ui} = r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i}$. Since this loss is convex and twice differentiable, the first and second derivatives of $ℓ_{p_{u}}(e_{ui})$ are as follows:

$(ℓ_{p_{u}}(e_{ui}))' = \frac{\partial ℓ(e_{ui})}{\partial e_{ui}} = 2 e_{ui}, \quad (ℓ_{p_{u}}(e_{ui}))'' = \frac{\partial^{2} ℓ(e_{ui})}{\partial e_{ui}^{2}} = 2$ (4.16)

In addition, the regularization term $| | p_{u} | |_{2}^{2}$ is 1 - strongly convex and doubly differentiable.

According to Theorem 2 of [18], the loss must satisfy $|ℓ''_{p_{u}}| \leq C$. Since $(ℓ_{p_{u}}(e_{ui}))'' = 2$, we obtain C = 2.

We now solve for p_u after objective perturbation by setting the partial derivatives of the objective in Equation (4.14) to zero, where n is the number of users, m is the number of items in the raw matrix and N is the number of samples.

For all 1 ≤ u ≤ n and 1 ≤ k ≤ d, we obtain

$\frac{1}{2} \frac{\partial p_{u}^{priv}}{\partial p_{uk}} = \sum_{i} (μ + b_{u} + b_{i} + p_{u}^{T} q_{i} - r_{ui}) q_{ik} + λ n_{u} p_{uk} + \frac{1}{N} b_{k} + \frac{1}{2} Δ p_{uk}$ (4.17)

And then, we have

$\frac{1}{2} \frac{\partial p_{u}^{priv}}{\partial p_{u}} = \frac{1}{2} \left(\frac{\partial p_{u}^{priv}}{\partial p_{u1}}, \dots, \frac{\partial p_{u}^{priv}}{\partial p_{ud}}\right) = p_{u} \left[Q^{T} Q + \left(λ n_{u} + \frac{1}{2} Δ\right) I\right] - R_{u} Q + b_{u} Q + b_{i} Q + μ Q + \frac{1}{N} b$ (4.18) where I is a d × d identity matrix.

Then, fixing Q and solving $\frac{\partial p_{u}^{priv}}{\partial p_{u}} = 0$ , we have

$p_{u} = \left(R_{u} Q - b_{u} Q - b_{i} Q - μ Q - \frac{1}{N} b\right) \left[Q^{T} Q + \left(λ n_{u} + \frac{1}{2} Δ\right) I\right]^{-1}$ (4.19)
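Equation (4.19) can be evaluated with the sketch below (NumPy is assumed; `private_user_update` is a hypothetical name). It writes (4.19) in column-vector form over the items rated by user u, so the d × d system matrix is built from the vectors q_i of those items. As a quick check, substituting the result into the right-hand side of (4.17) should give a zero vector.

```python
import numpy as np


def private_user_update(r_u, Q_u, mu, b_u, b_items, lam, delta, b, N):
    """Perturbed closed-form update of p_u following Equation (4.19).

    r_u: (n_u,) ratings given by user u; Q_u: (d, n_u) vectors q_i of those items;
    b_items: (n_u,) item biases b_i; b: (d,) noise vector from Algorithm 2.
    Solves [Q_u Q_u^T + (lam * n_u + delta / 2) I] p_u
           = Q_u (r_u - mu - b_u - b_items) - b / N.
    """
    d, n_u = Q_u.shape
    A = Q_u @ Q_u.T + (lam * n_u + 0.5 * delta) * np.eye(d)
    rhs = Q_u @ (r_u - mu - b_u - b_items) - b / N
    return np.linalg.solve(A, rhs)
```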

Theorem 4.2. Given the DP budget ɛ and the slack-term parameter C, if $\|p_{u}\|_{2}$ and the loss function of J_Q(p_u, R) are convex and differentiable, then Algorithm 2 provides ɛ-differential privacy.

Proof. Our Algorithm 2 satisfies the application condition of Algorithm 2 in Ref. [18], which was proven to provide ɛ - differential privacy; thus, our Algorithm 2 also provides ɛ - differential privacy. Space constraints prevent a detailed description in this paper.

4.4 Private ALS with output perturbation

The main idea of Algorithm 3 is to guarantee DP by adding a random noise vector b to the output of the ALS step, i.e., to the minimizer p_u(R, Q) of J_Q(p_u, R) (Equation (4.11)) or to q_i(R, P) (Equation (4.12)). As in Algorithm 2, we add noise only to the output for p_u.

Algorithm 3: DPSVDALSOut
Input: R_n×m = {r_ui} - “user-item” ratings matrix
d - number of factors
λ - regularization parameter of SVD objective function
k - number of ALS iterations
e_max, e_min - upper and lower bounds on per-rating error
ɛ - differential privacy budget
Output: Latent factor matrices P_n×d and Q_d×m
1: Initialize random latent factor matrices P and Q;
2: Call Function alsSVD(R, k - 1);
3: For each user u, given matrix Q, do
4: Generate a random noise vector b with pdf
5: $f(b) \propto \exp\left(-\frac{ɛ \|b\|}{2k} \cdot \frac{n_{u} λ}{2 q_{max} Δr}\right)$;
6: $p_{u}(R, Q) = \left(\underset{p_{u}}{\arg\min} J_{Q}(p_{u}, R)\right) + b$;
7: End For
8: For each item i, given matrix P, do
9: $q_{i}(R, P) = \underset{q_{i}}{\arg\min} J_{P}(q_{i}, R)$;
10: End For
11: Return P_n×d and Q_d×m

Our Algorithm 3 requires the following explanations:

To calibrate the noise, we use the L₂-sensitivity of the minimizer of J_Q(p_u, R), which is at most $Δ p_{u} = \frac{2 q_{max} Δr}{n_{u} λ}$ (4.20) where q_max is an upper bound on ‖q_i‖₂ and Δr is (still) the difference between the maximum and minimum ratings.

According to the Laplace mechanism (Section 3.1), for a fixed matrix Q, we generate a random noise vector b with pdf $f(b) \propto \exp\left(-\frac{ɛ \|b\|}{2k} \cdot \frac{n_{u} λ}{2 q_{max} Δr}\right)$ (4.21)
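Steps 4 to 6 of Algorithm 3 can be sketched as follows (NumPy is assumed; `output_perturbation` is a hypothetical name, and `p_u` stands for the non-private minimizer of J_Q). The sketch computes the sensitivity (4.20) and draws b with density (4.21), i.e. proportional to exp(−β‖b‖) with β = (ɛ/2k)/Δp_u, using a Gamma-distributed norm and a uniform direction.

```python
import numpy as np


def output_perturbation(p_u, n_u, lam, q_max, delta_r, epsilon, k, rng):
    """Steps 4-6 of Algorithm 3: add norm-calibrated noise to a non-private p_u.

    Sensitivity (4.20): 2 * q_max * delta_r / (n_u * lam).
    Noise density (4.21): proportional to exp(-beta * ||b||) with
    beta = (epsilon / (2 * k)) / sensitivity.
    """
    d = p_u.shape[0]
    sensitivity = 2.0 * q_max * delta_r / (n_u * lam)
    beta = epsilon / (2.0 * k) / sensitivity
    # norm ~ Gamma(d, 1/beta), direction uniform on the unit sphere
    norm = rng.gamma(shape=d, scale=1.0 / beta)
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    return p_u + norm * direction, sensitivity
```

The expected noise norm is d/β, so it grows linearly with the number of iterations k and shrinks as n_uλ grows, which shows how strongly the 2k factor in (4.21) affects accuracy.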

From the J_Q (p_u, R) objective function for solving SVD (Equation (4.9)), we provide Corollary 4.1 and Theorem 4.3 as follows.

Corollary 4.1. If $N(\cdot) = \|p_{u}\|_{2}^{2}$ is differentiable and 1-strongly convex, and the loss function of J_Q(p_u, R), $ℓ_{p_{u}}(e_{ui}) = (r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i})^{2}$, is convex and differentiable, then the L₂-sensitivity of J_Q(p_u, R) is at most $\frac{2 q_{max} Δr}{n_{u} λ}$.

Proof. Let

$R = \begin{pmatrix} r_{11} & \cdots & r_{1m} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nm} \end{pmatrix}$ and $R' = \begin{pmatrix} r_{11} & \cdots & r_{1m} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r'_{nm} \end{pmatrix}$ be two matrices whose ratings differ only in the value of the jth entry, which we denote by r_ui in R and r′_ui in R′. Moreover, let $G(p_{u}) = J_{Q}(p_{u}, R), \quad g(p_{u}) = J_{Q}(p_{u}, R') - J_{Q}(p_{u}, R)$ (4.22) $p_{u1} = \underset{p_{u}}{\arg\min} J_{Q}(p_{u}, R), \quad p_{u2} = \underset{p_{u}}{\arg\min} J_{Q}(p_{u}, R')$ (4.23) $g(p_{u}) = (r'_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i})^{2} - (r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i})^{2}$ (4.24)

We observe that, due to the convexity of ℓ and the 1-strong convexity of $N(\cdot) = \|p_{u}\|_{2}^{2}$, G(p_u) = J_Q(p_u, R) is n_uλ-strongly convex. In addition, since $N(\cdot)$ and $ℓ_{p_{u}}$ are differentiable, G(p_u) and g(p_u) are differentiable at all points. Then, we have

$\nabla g(p_{u}) = -2(r'_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i}) q_{i} + 2(r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i}) q_{i} = 2 q_{i} (r_{ui} - r'_{ui})$ (4.25)

Since $|r_{ui} - r'_{ui}| \leq Δr$ and $\|q_{i}\| \leq q_{max}$, we find that $\|\nabla g(p_{u})\| = 2 |r_{ui} - r'_{ui}| \, \|q_{i}\| \leq 2 q_{max} Δr.$ (4.26)

The proof now follows by an application of Lemma 1 of Ref. [18].
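Corollary 4.1 can also be sanity-checked numerically. The sketch below (NumPy is assumed; all names are hypothetical) holds μ, b_u and b_i fixed so that they cancel in the difference, rescales every q_i to norm q_max, changes one rating as far as the range [r_min, r_max] allows, and compares the largest observed ‖p_u1 − p_u2‖ with the bound 2q_maxΔr/(n_uλ). It is an empirical check under these assumptions, not a proof.

```python
import numpy as np


def ridge_user_vector(y, Q_u, lam):
    """argmin_p sum_i (y_i - p^T q_i)^2 + n_u * lam * ||p||^2 (biases already removed from y)."""
    d, n_u = Q_u.shape
    A = Q_u @ Q_u.T + n_u * lam * np.eye(d)
    return np.linalg.solve(A, Q_u @ y)


def check_sensitivity_bound(d=5, n_u=30, lam=0.125, q_max=0.5, r_min=1.0, r_max=5.0,
                            trials=200, seed=0):
    """Return (largest observed ||p_u1 - p_u2||, bound 2 * q_max * delta_r / (n_u * lam))."""
    rng = np.random.default_rng(seed)
    delta_r = r_max - r_min
    bound = 2 * q_max * delta_r / (n_u * lam)
    worst = 0.0
    for _ in range(trials):
        Q_u = rng.standard_normal((d, n_u))
        # rescale every column so that ||q_i|| <= q_max
        Q_u *= q_max / np.maximum(np.linalg.norm(Q_u, axis=0), q_max)
        r = rng.uniform(r_min, r_max, n_u)
        r2 = r.copy()
        j = rng.integers(n_u)
        # move one rating to the farther end of [r_min, r_max]
        r2[j] = r_max if r[j] - r_min < r_max - r[j] else r_min
        diff = np.linalg.norm(ridge_user_vector(r, Q_u, lam) - ridge_user_vector(r2, Q_u, lam))
        worst = max(worst, diff)
    return worst, bound
```

In this fixed-bias setting the ridge solution moves by at most q_maxΔr/(n_uλ), because the smallest eigenvalue of the system matrix is at least n_uλ, so the stated bound is conservative by a factor of 2.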

Theorem 4.3. If $N(\cdot) = \|p_{u}\|_{2}^{2}$ is differentiable and 1-strongly convex, and the loss function of J_Q(p_u, R), $ℓ_{p_{u}}(e_{ui}) = (r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i})^{2}$, is convex and differentiable, then Algorithm 3 provides ɛ-differential privacy.

Proof. The proof of Theorem 4.3 follows from Corollary 4.1 and Dwork et al. [2]. The proof is provided here for completeness.

By Corollary 4.1, if the conditions on $N(\cdot) = \|p_{u}\|_{2}^{2}$ and $ℓ_{p_{u}}(e_{ui}) = (r_{ui} - μ - b_{u} - b_{i} - p_{u}^{T} q_{i})^{2}$ hold, the L₂-sensitivity of J_Q(p_u, R) with regularization parameter n_uλ is at most $\frac{2 q_{max} Δr}{n_{u} λ}$. When ‖b‖ is drawn from the distribution $υ(b) = \frac{1}{α} e^{-β \|b\|}$ with $β = \frac{n_{u} λ ɛ}{2 q_{max} Δr}$, the density at any specific vector $b_{0} \in ℝ^{d}$ is proportional to $e^{-β \|b_{0}\|}$. Let R_n×m and R′_n×m be any two matrices whose ratings differ only in the value of the jth entry, let b₁ and b₂ be the corresponding noise vectors, and let g(p_u | R) (respectively g(p_u | R′)) be the density of the output of Algorithm 3 at p_u when the input is R (respectively R′). For any p_u, we have

$\frac{g(p_{u} \mid R)}{g(p_{u} \mid R')} = \frac{υ(b_{1})}{υ(b_{2})} = \frac{\frac{1}{α} e^{-β \|b_{1}\|}}{\frac{1}{α} e^{-β \|b_{2}\|}} = e^{-β (\|b_{1}\| - \|b_{2}\|)} = e^{-\frac{n_{u} λ ɛ}{2 q_{max} Δr} (\|b_{1}\| - \|b_{2}\|)}$ (4.27)

If p_u1 and p_u2 are the respective solutions of the non-private regularized problem J_Q(p_u, R) for inputs R and R′, then, since both inputs produce the same output p_u, ‖b₁ − b₂‖ = ‖p_u1 − p_u2‖. From Corollary 4.1 and the triangle inequality,

$\|b_{1}\| - \|b_{2}\| \leq \|b_{1} - b_{2}\| = \|p_{u1} - p_{u2}\| \leq \frac{2 q_{max} Δr}{n_{u} λ}$ (4.28)

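By symmetry, the directions of b₁ and b₂ are uniformly distributed, so the density ratio depends only on their norms. Substituting (4.28) into (4.27) gives $\frac{υ(b_{1})}{υ(b_{2})} \leq \exp\left(β \cdot \frac{2 q_{max} Δr}{n_{u} λ}\right) = e^{ɛ}.$ (4.29)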

So, according to the definition of DP, Algorithm 3 provides ɛ - differential privacy.

5 Experiments

5.1 Experiment datasets

In our experiments, we use two datasets to verify that our algorithms are not tailored to a single kind of dataset. One is the Movielens dataset from http://grouplens.org/datasets/movielens/, which includes the 100K, 1M and 10M datasets. The other is the Netflix dataset, which was constructed to support participants in the Netflix Prize. To compare the performance of our algorithms on different datasets, we choose the Movielens-1M dataset and a partial Netflix dataset (called Netflix-1M in this paper), which we extracted from the original Netflix dataset. Some statistical properties of the Movielens-1M and Netflix-1M datasets are shown in Table 1.

Table 1
Statistical properties of the two datasets

Property	Movielens-1M	Netflix-1M
Users	6040	4996
Movies	3952	3999
Density	4.19%	0.19%
Average rating	3.5816	3.5956
Variance rating	1.2479	1.2208

5.2 Experimental settings and evaluation metric

Following common practice in machine learning and data mining, we use ten-fold cross-validation to train and evaluate our algorithms. The training data are divided into training and validation sets with a 90/10 split. We measure the accuracy of the predicted ratings ${\tilde{r}}_{ui}$ using the Root Mean Square Error (RMSE); the smaller the RMSE, the more accurate the prediction. The RMSE is computed by $RMSE = \sqrt{\sum_{R} (r_{ui} - {\tilde{r}}_{ui})^{2} / |R|}$ (4.30) where |R| is the number of effective ratings. Because the added noise makes results vary from run to run, the final RMSE is averaged over multiple runs.
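Equation (4.30) and the averaging over noisy runs can be sketched in standard-library Python. Here `run_once` stands for a hypothetical callable that trains one private model with a given seed and returns its test RMSE; the names are illustrative, not from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error, Equation (4.30), over the |R| test ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((r - p) ** 2 for r, p in zip(actual, predicted))
    return math.sqrt(sq / len(actual))


def averaged_rmse(run_once, n_runs):
    """Average the RMSE of n_runs noisy runs; run_once(seed) returns one RMSE."""
    return sum(run_once(seed) for seed in range(n_runs)) / n_runs
```

Passing the seed explicitly keeps each noisy run reproducible while still averaging over different noise draws.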

5.3 Experimental results and comparison

The algorithms proposed in Ref. [17] require pre-processing of the raw input matrix. We also apply DP to MF, but our algorithms are based on SVD, which does not require any pre-processing because it considers the user and item bias. In particular, comparison with the experimental results from Ref. [17] shows that ALS objective perturbation obtains better results.

5.3.1 Experimental parameter settings

We first briefly describe how the parameters of each algorithm were selected.

In Algorithms 1, 2 and 3, we empirically set the number of factors to d = 5, the learning rate to γ = 0.001 and the regularization parameter to λ = 0.125.

In Algorithms 1, 2 and 3, the regularization parameters used to compute the user bias and the item bias are set to λ₁ = 10 and λ₂ = 25, respectively, following Ref. [1].
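The bias terms can be sketched as regularized (shrunk) averages. We assume here the baseline estimates described in Ref. [1], in which the item bias is computed first and the user bias is then fitted to the remaining residual; this form is an assumption for illustration, and the function name `baseline_biases` is hypothetical.

```python
from collections import defaultdict


def baseline_biases(ratings, lam1=10.0, lam2=25.0):
    """Assumed shrunk-mean bias estimates after Ref. [1].

    ratings: iterable of (user, item, rating) triples.
    lam2 shrinks item biases and lam1 shrinks user biases toward 0.
    """
    ratings = list(ratings)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # Item bias: b_i = sum_u (r_ui - mu) / (lam2 + |R(i)|)
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_item = {i: item_sum[i] / (lam2 + item_cnt[i]) for i in item_sum}
    # User bias on the residual: b_u = sum_i (r_ui - mu - b_i) / (lam1 + |R(u)|)
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_item[i]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (lam1 + user_cnt[u]) for u in user_sum}
    return mu, b_user, b_item
```

Large λ₁ and λ₂ pull the biases of users and items with few ratings toward zero, which is why they are much larger than the factor regularization λ.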

In Algorithms 1, 2 and 3, the number of iterations is capped at k = 20, and the iteration stops earlier once the error falls below 0.0001.
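A minimal, non-private sketch of an ALS loop with this iteration control follows, assuming pure Python, the bias-adjusted residuals $r_{ui} - \mu - b_u - b_i$ as input, and an unweighted ridge penalty λ‖v‖². Two points are our own assumptions and are labeled in the code: we read "the error falls below 0.0001" as the decrease in training RMSE between sweeps falling below 0.0001, and the learning rate γ plays no role here because each ALS step is solved in closed form. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ridge(observations, fixed, d, lam):
    """argmin_v sum_j (y_j - v . w_j)^2 + lam * |v|^2 with w_j = fixed[j] held constant."""
    A = [[lam if a == c else 0.0 for c in range(d)] for a in range(d)]
    rhs = [0.0] * d
    for key, y in observations:
        w = fixed[key]
        for a in range(d):
            rhs[a] += y * w[a]
            for c in range(d):
                A[a][c] += w[a] * w[c]
    return solve(A, rhs)


def als(residuals, d=5, lam=0.125, max_iter=20, tol=1e-4, seed=0):
    """Non-private ALS on residuals {(u, i): r_ui - mu - b_u - b_i}."""
    rng = random.Random(seed)
    by_user, by_item = defaultdict(list), defaultdict(list)
    for (u, i), y in residuals.items():
        by_user[u].append((i, y))
        by_item[i].append((u, y))
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(d)] for u in by_user}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(d)] for i in by_item}
    prev_err = math.inf
    for sweep in range(1, max_iter + 1):  # upper limit k = max_iter sweeps
        for u, obs in by_user.items():    # fix Q, solve each user's ridge problem
            P[u] = ridge(obs, Q, d, lam)
        for i, obs in by_item.items():    # fix P, solve each item's ridge problem
            Q[i] = ridge(obs, P, d, lam)
        err = math.sqrt(sum((y - dot(P[u], Q[i])) ** 2
                            for (u, i), y in residuals.items()) / len(residuals))
        # Assumed reading of the stopping rule: stop when RMSE improves by less than tol.
        if prev_err - err < tol:
            break
        prev_err = err
    return P, Q, err, sweep
```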

In Algorithm 3, we bound the L₂-norm of each user vector by p_max = 0.5 and of each item vector by q_max = 0.5; both values were chosen empirically.
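Bounding the L₂-norm in this way amounts to projecting each latent vector onto an L₂ ball of radius p_max (or q_max). The sketch below is illustrative only and the helper name `clip_l2` is hypothetical; in output-perturbation schemes generally, such a bound is what gives the learned factors a finite sensitivity against which noise can be calibrated.

```python
import math


def clip_l2(vec, max_norm):
    """Project vec onto the L2 ball of radius max_norm (returned unchanged if already inside)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm:
        return list(vec)
    return [x * max_norm / norm for x in vec]
```

For example, a user vector [3, 4] (norm 5) is rescaled to [0.3, 0.4] when p_max = 0.5.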

5.3.2 Experimental results and analysis

The meanings of the legend terms in the experimental results are shown in Table 2.

Table 2
Meanings of the legend terms

Name	Meaning
SVDALSBase	Without DP protection, no pre-processing, ALS for SVD
IALS	Algorithm 3 of Ref. [17] (Differentially Private Input Perturbation), with pre-processing, ALS for MF
PALS	Algorithm 5 of Ref. [17] (Differentially Private ALS with Output Perturbation), with pre-processing, ALS for MF
PSGD	Algorithm 4 of Ref. [17] (Differentially Private SGD), with pre-processing, SGD for MF
DPSVDALSIn	Our Algorithm 1, no pre-processing, ALS input perturbation for SVD
DPSVDALSObj	Our Algorithm 2, no pre-processing, ALS objective perturbation for SVD
DPSVDALSOut	Our Algorithm 3, no pre-processing, ALS output perturbation for SVD

Fig. 2 compares the results of our three algorithms with their baseline (without DP protection) on the two datasets. Figs. 3 and 4 compare the results of our algorithms with the related algorithms of Ref. [17] on the two datasets.

Fig.2

Comparing our algorithms with their Baseline.

From Fig. 2, we observe that the RMSEs of our DP-protected SVD algorithms remain within an acceptable range on both datasets; that is, none of them deviates greatly from its baseline. Moreover, the larger the DP parameter ɛ, the more accurate the prediction. In general, our algorithms perform better on the Movielens-1M dataset, because the Netflix-1M dataset has fewer training samples and is also much sparser (a density of 0.19% versus 4.19%; see Table 1). This suggests that prediction accuracy is closely related to dataset size and sparsity, even under DP. In particular, Algorithm 3 achieves the best prediction accuracy on the Movielens-1M dataset whether ɛ is large or small; that is, its results under DP are the most stable. In addition, Algorithm 2 outperforms Algorithm 1 on both datasets.

The predictive accuracy of Algorithm 1 is almost as good as that of the other two algorithms when ɛ ≥ 2 (especially in panel (b)), but it degrades when ɛ < 2. This is because a smaller ɛ means that larger noise is added to the raw rating matrix, which reduces the usefulness of the training data. In addition, the prediction accuracy of ALS output perturbation becomes poor when ɛ < 0.1. The reason is that the latent factor matrices are perturbed after decomposition: the smaller the value of ɛ, the more noise is added, so the inner product of a user factor and an item factor deviates greatly from its true value.
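The effect of ɛ described above can be illustrated with a short sketch of the Laplace mechanism. This is an assumption-laden illustration, not our Algorithm 1 or Algorithm 3: we assume ratings in [1, 5], an input-perturbation sensitivity of r_max − r_min with clamping afterwards, and a caller-supplied, hypothetical output-perturbation `sensitivity`. In both cases the noise scale is sensitivity/ɛ, so shrinking ɛ directly inflates the perturbation. All function names are hypothetical.

```python
import random


def laplace(scale, rng):
    """One Laplace(0, scale) draw: scale times the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb_ratings(ratings, epsilon, r_min=1.0, r_max=5.0, rng=None):
    """Input-perturbation sketch (assumed form): noise on every observed rating, then clamp.

    Assumed sensitivity: r_max - r_min, i.e. one rating moving across the whole scale.
    """
    if rng is None:
        rng = random.Random(0)
    scale = (r_max - r_min) / epsilon
    return {key: min(r_max, max(r_min, r + laplace(scale, rng)))
            for key, r in ratings.items()}


def perturb_factors(factors, epsilon, sensitivity, rng=None):
    """Output-perturbation sketch: noise on every entry of every learned factor vector.

    `sensitivity` is a hypothetical, caller-supplied bound (e.g. derived from p_max, q_max).
    """
    if rng is None:
        rng = random.Random(0)
    scale = sensitivity / epsilon
    return {key: [x + laplace(scale, rng) for x in vec] for key, vec in factors.items()}
```

For example, with ɛ = 0.1 the input noise scale is 4/0.1 = 40, so almost every perturbed rating is pushed to 1 or 5, whereas with ɛ = 2 the scale is only 2.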

Fig. 3 compares our algorithms with the related algorithms of Ref. [17] on the two datasets. In Ref. [17], Berlioz et al. also proposed input perturbation (called IALS in our experiments) and ALS output perturbation (called PALS in our experiments). However, their algorithms require DP pre-processing of the raw rating matrix, and pre-processing the raw matrix, or adding noise to it, affects the result of SVD.

Fig.3

Comparing our algorithm with the ALS-based DP algorithms of Ref. [17].

By contrast, our algorithms not only omit the pre-processing step but also achieve better prediction accuracy on both test datasets. Even ALS input perturbation, the weakest of our three algorithms, performs better than the input-perturbation algorithm of Ref. [17], and the advantage of our ALS objective perturbation is even more pronounced. Furthermore, Fig. 3 shows that the algorithms of Ref. [17] reach better prediction accuracy as ɛ grows larger (with ɛ as high as 20); however, such large values of ɛ are unreasonable under the meaning of DP, since a larger ɛ gives weaker privacy protection.

To examine the choice of optimization algorithm, Fig. 4 compares our algorithms with the SGD-based DP algorithm (SGD gradient perturbation) of Ref. [17]. Overall, all three of our ALS-based DP algorithms achieve better predictive accuracy than the SGD-based DP algorithm of Ref. [17] on both datasets. We attribute this to the fact that each SGD update depends strongly on the current prediction error, whereas each ALS update is computed directly from the training data; in other words, ALS itself outperforms SGD in this setting, even when both are processed by DP.

Fig.4

Comparing our algorithm with SGD-based DP algorithm of Ref. [17].

In summary, our three DP algorithms for SVD protect the raw data to a certain extent without significantly affecting recommendation accuracy, thus balancing privacy with recommendation accuracy. Among them, the ALS objective perturbation algorithm obtains a better trade-off between privacy and recommendation accuracy.

5.3.3 Selection of DP parameter ɛ

To balance privacy strength against recommendation accuracy, we propose a scheme for selecting the DP parameter ɛ. The steps are as follows:

First, choose the target user who will receive recommendations (the user ID must exist in the dataset).

Second, compute the set of items (in this paper, movies) recommended to the target user twice: once with the DP process under evaluation and once without any DP process.

Third, compute the intersection of the two recommended item sets obtained in the second step.

Finally, divide the size of this intersection by the size of a recommended item set to obtain a percentage (the coincidence degree). The greater this percentage, the smaller the influence on recommendation accuracy, and thus, from an accuracy standpoint, the more reasonable the value of ɛ.

Note that this scheme provides only a reasonable range for the DP parameter ɛ. Normally, if the percentage is below 20%, we consider the recommendations to be seriously affected, even though privacy protection is very strong. If the percentage is above 80%, we consider privacy protection too weak, even though the recommendations are better. Therefore, ɛ is considered reasonable when the percentage lies between 20% and 80%. To verify this scheme, we compare our Algorithms 2 and 3 with Algorithm 5 of Ref. [17] (PALS). Fig. 5 shows the selection of the DP parameter ɛ for the Movielens-1M dataset. All parameters are set as described in Section 5.3.1; in addition, each recommended movie set contains 20 movies, and the target user is selected at random.
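The four steps and the 20%/80% thresholds can be sketched as follows. The score dictionaries stand in for the predicted ratings that a model, with and without DP, assigns to candidate movies for the target user; how those scores are produced, the tie-breaking rule and all function names are our own illustrative assumptions.

```python
def top_n(scores, n=20):
    """Ids of the n items with the highest predicted score (ties broken by item id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


def coincidence_degree(scores_without_dp, scores_with_dp, n=20):
    """Steps 2-4: size of the intersection of the two top-n sets over the set size."""
    base = set(top_n(scores_without_dp, n))
    private = set(top_n(scores_with_dp, n))
    return len(base & private) / len(base)


def judge_epsilon(degree, low=0.20, high=0.80):
    """Apply the 20% / 80% thresholds of the selection scheme."""
    if degree < low:
        return "recommendations seriously affected"
    if degree > high:
        return "privacy protection too weak"
    return "reasonable epsilon"
```

For instance, if 10 of the 20 movies recommended without DP also appear in the DP list, the coincidence degree is 50% and the corresponding ɛ falls in the acceptable range.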

From Fig. 5, we can see that ɛ has a smaller impact on the recommendation results of our algorithms (especially Algorithm 2) than on those of Algorithm 5 from Ref. [17]. Notably, our two algorithms reach coincidence degrees of 20% and 80% at ɛ = 2 and ɛ = 11, respectively. In other words, values of ɛ between 2 and 11 are acceptable.

Fig.5

Comparison of the impact of the privacy parameter ɛ on the recommendation results.

6 Conclusion

In recent years, recommender systems have become an essential service for Internet businesses and are used by many online users. However, in both academia and industry, most research focuses on improving recommendation accuracy while ignoring the privacy of the raw data. In this paper, we investigated the application of DP to ALS for SVD through input perturbation, ALS objective perturbation and ALS output perturbation, and provided rigorous mathematical proofs that all three algorithms satisfy differential privacy. Experiments on two real datasets, including comparisons with other methods, show that our private SVD algorithms obtain better results. We believe our approaches can be applied to protect data in recommender systems effectively, because they maintain the accuracy of the recommendation algorithm after DP processing. In addition, our scheme for selecting the DP parameter yields a reasonable range for ɛ, balancing privacy and recommendation accuracy.

With the rapid development of the Internet, an increasing number of users expect personalized recommendation services, and privacy is consequently becoming a growing concern for them. The healthy development of recommender systems and data mining is therefore inseparable from in-depth research on privacy protection.

In this paper, we studied privacy protection with regard to the optimization algorithms of SVD. Future work could study the following aspects in more depth.

SVD parameter tuning. Typically, SVD parameters, such as the number of factors, the regularization parameter and the learning rate, are tuned to increase prediction accuracy, while preventing over-fitting and ensuring convergence.

Selection of the DP parameter ɛ. In this paper, we provide only a selection interval for ɛ and cannot determine an optimal value within it; after all, the Laplace noise itself is random.

Comparison with other CF algorithms. In this paper, we proposed new approaches that apply DP to the optimization algorithms of SVD. To extend the application of DP, other CF algorithms could be studied and their recommendation results compared.

Evaluation measures. Multiple evaluation measures beyond RMSE could be used to verify our algorithms.


Acknowledgments

This work is partially supported by the Natural Science Foundation of Guangdong Province (No. 2014A030313662), the Special Funds for Welfare Research and Capacity Building Project of Guangdong Province (No. 2015A030402003), the Funds for Science and Technology Project of Guangdong Province (No. 2016ZC0039), the Funds for Philosophy and Social Science Project of Guangdong Province (No. GD15CGL05) and the Fundamental Research Funds for the Central Universities of SCUT (No. 2015QNXM20). We would like to express our sincere appreciation to the anonymous reviewers for their insightful comments, which have greatly aided us in improving the quality of the paper.

References

1. Ricci, Rokach and Shapira, Recommender Systems Handbook, Springer, 2011.

2. Koren, Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08), Las Vegas, Nevada, USA, 2008, pp. 426–434.

3. Calandrino J.A., Kilzer, Narayanan, Felten E.W. and Shmatikov, “You Might Also Like:” Privacy Risks of Collaborative Filtering, Proceedings of the 2011 IEEE Symposium on Security and Privacy (SP’11), Washington, DC, USA, 2011, pp. 231–246.

4. Dwork, Differential Privacy, Proceedings of the 33rd International Conference on Automata, Languages and Programming, Part II (ICALP’06), Venice, Italy, 2006, pp. 1–12.

5. Liu Q.X., Wu Q.R., Zhang Q.Y. and Wang L.X., POSTER: Recommendation-based Third-Party Tracking Monitor to Balance Privacy with Personalization, Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security (CCS’14), Scottsdale, Arizona, USA, 2014, pp. 1472–1474.

6. Dandekar, Fawaz and Ioannidis, Privacy auctions for recommender systems, ACM Transactions on Economics and Computation 3(2) (2012), 1–22.

7. …, Ma L.L., Xiao and Zhang H.Q., Web service QoS prediction by neighbor information combined non-negative matrix factorization, Journal of Intelligent & Fuzzy Systems 30(6) (2016), 3593–3604.

8. Dwork and Roth, The algorithmic foundations of differential privacy, Foundations and Trends® in Theoretical Computer Science 9(3) (2013), 211–407.

9. Dwork, Differential privacy: A survey of results, Proceedings of the 5th International Conference on Theory and Applications of Models of Computation (TAMC’08), Xi’an, China, 2008, pp. 1–19.

10. Zhu T.Q., Li, Ren Y.L., Zhou W.L. and Xiong, Differential privacy for neighborhood-based collaborative filtering, International Conference on Advances in Social Networks Analysis and Mining (ASONAM’13), Niagara, Ontario, Canada, 2013, pp. 752–759.

11. Hua J.Y., Xia and Zhong, Differentially private matrix factorization, Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI’15), Buenos Aires, Argentina, 2015, pp. 1763–1770.

12. Wang Z.Q. Y.X. and Smola, Fast Differentially Private Matrix Factorization, Proceedings of the 9th ACM Conference on Recommender Systems (RecSys’15), Vienna, Austria, 2015, pp. 171–178.

13. Yan, Pan S.R., Zhu W.T. and Chen K.K., DynaEgo: Privacy-Preserving Collaborative Filtering Recommender System Based on Social-Aware Differential Privacy, International Conference on Information and Communications Security (ICICS’16), 2016, pp. 347–357.

14. Balu and Furon, Differentially Private Matrix Factorization using Sketching Technique, Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec’16), Vigo, Galicia, Spain, 2016, pp. 57–62.

15. Javidbakht and Venkitasubramaniam, Differential privacy in networked data collection, Conference on Information Science & Systems (CISS’16), 2016, pp. 117–122.

16. Boutet, Frey, Guerraoui, Jégou and Kermarrec A.M., Privacy-preserving distributed collaborative filtering, Computing 98(8) (2016), 827–846.

17. Berlioz, Friedman, Kaafar M.A., Boreli and Berkovsky, Applying Differential Privacy to Matrix Factorization, Proceedings of the 9th ACM Conference on Recommender Systems (RecSys’15), Vienna, Austria, 2015, pp. 107–114.

18. Chaudhuri, Monteleoni and Sarwate, Differentially private empirical risk minimization, The Journal of Machine Learning Research 12 (2011), 1069–1109.

19. Dwork, McSherry, Nissim and Smith, Calibrating noise to sensitivity in private data analysis, Theory of Cryptography Conference (TCC’06), 2006, pp. 265–284.

20. Hardt and Talwar, On the geometry of differential privacy, Proceedings of the Forty-second ACM Symposium on Theory of Computing (STOC’10), Cambridge, Massachusetts, USA, 2010, pp. 705–714.
