Abstract
In the era of big data, real and reliable user data information is an important factor in the recommendation technology; therefore, the disclosure of personal privacy has become a significant problem user concern. Differential privacy protection is a proven and very strict privacy protection technology, which is particularly good at protecting the privacy of indirect derivation. Singular Value Decomposition (SVD) is one of the common matrix factorization techniques used in collaboration filtering for recommender systems and it considers the user and item bias. This paper will develop a flexible application that implements differential privacy in SVD. As part of the development process, on one hand, our algorithms do not need to perform any pre-processing of the raw input matrix. On the other hand, the experimental results, using two real datasets, show that our algorithms not only protect privacy information in the raw data but also ensure the accuracy of recommendations. Finally, a trade-off scheme is used, which can balance the privacy protection and the recommendation accuracy to a certain extent.
Keywords
Introduction
To offer useful recommendations, two critical elements are essential in recommender system (RS) research: one element is the collection of users’ massive personal information, such as purchase and browsing information on the Internet; another element is RS technology. The core technology of recommender systems uses a collaborative filtering algorithm, particularly the Matrix Factorization (MF), which is based on the latent factor model and is a widely used algorithm and winner of the Netflix Prize. In the real world, ratings, comments and other data provided by users often have some bias. This bias may stem from a user’s personal preference, consumer personality differences, or an item. Singular Value Decomposition (SVD) [1, 2] is a type of MF method. When used for recommender systems, SVD considers the user and item bias information, which can often improve the better predictive accuracy over traditional MF methods. However, while a user enjoys the convenience of recommender systems, there is also a risk of disclosure of user privacy.
A Collaboration Filtering (CF), based on items chosen by a user during a transaction, will increase the similarity commodity list based on a user’s previous transaction commodities. Thus, an attacker can track similar commodity lists related to the targeted user and then determine a new commodity. When similar commodities appear in these lists, the attacker can deduce the item added to the target user’s records. Calandrino et al. [3] investigated the privacy risks of recommender systems based on CF and showed that it is feasible to draw meaningful inferences about transactions of specific users from the public outputs of recommender systems. Dwork proposed three types of inference attacks for attack modes based on user purchase records [4], i.e., when an attacker has some background knowledge about a user. Such attacks are significant threats to user privacy.
One of the goals of the RS field, and even the whole machine learning field [5, 6], is the improvement of the Quality of Service (QoS) [7]. To ensure personal privacy and security, and eliminate user concerns over data protection, users are encouraged to provide true and reliable information to ensure the establishment of effective information system safeguards and security guidelines. In 2006, Dwork et al. [4, 9] proposed the Differential Privacy (DP) method; it has a very strict definition and has nothing to do with background knowledge.
Our contributions are summarized as follows. First, we propose three new DP algorithms for SVD: Private ALS with Input perturbation, Private ALS with Objective Perturbation and Private ALS with Objective Perturbation. Second, some key mathematical proofs are provided to ensure that the new algorithms satisfy the definition of DP. Third, we compare the predictive accuracy of our algorithms, over two real datasets, to a baseline and related methods found in the relevant literature. The results show that our algorithms achieve better results. Finally, to address the trade-off between the strength of privacy protection and predictive accuracy, we use a scheme that selects a reasonable range of DP protection parameter ɛ. A reasonable range of the DP parameter can be obtained by this scheme.
Related work
Work investigating the implementation of DP methods in RS, due to privacy protection concerns, has become a popular research topic in recent years. Zhu et al. [10] applied DP to the neighbourhood-based CF methods and addressed privacy concerns, in this context, by proposing a private neighbour collaborative filtering (PNCF) algorithm. Hua et al. [11] proposed a method to prevent the unauthorized use of user ratings by untrusted recommenders while allowing the user to abstain from or join the MF process. The DP protection was implemented by disturbing the MF objective function. Liu et al. [12] proposed the application of DP to Bayesian posterior sampling via a Stochastic Gradient Langevin Dynamics (SGLD) step, thus avoiding the influence of Gaussian noise on the entire parameter space. Yan et al. [13] proposed the utilization of the DP principle and social relationships to adaptively modify user-rating histories to prevent exact user information from being leaked. Balu et al. [14] proposed using sketching techniques to implicitly provide DP guarantees by taking advantage of the inherent randomness of data structures; this approach is well suited for large-scale applications. Javidbakht et al. [15] proposed using DP as a metric to quantify the privacy of an intended destination; optimal probabilistic routing schemes are investigated under unicast and multicast paradigms. Boutet et al. [16] proposed an original obfuscation scheme and a randomized dissemination protocol to preserve privacy while leveraging user profiles in distributed recommender systems. Berlioz et al. [17] proposed applying DP to latent factor model in each step of MF; however, they lacked the rigorous mathematical proofs and required for pre-processing raw data, and thus, the experimental results determined a DP parameter that was too large when good recommender accuracy was obtained.
Chaudhuri et al. [18] proposed general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). They proposed an output perturbation and objective perturbation based DP model, but they were applied to SVM and Logistic regression. In this paper, we propose new algorithms to preserve the privacy of a RS. We apply DP to SVD. Rigorous proofs will be provided to ensure these algorithms satisfy DP. Finally, some experimental results show that our algorithms can also obtain good predictive accuracy.
Preliminaries
Differential privacy
Differential privacy [3, 8] is essentially different from the traditional privacy protection model. DP requires that the removal or addition of a single database record should not significantly affect the outcome of any analysis based on the database. It defines an extremely strict attack model and can provide a rigorous, quantitative representation and proof of the disclosure risk of private information. DP can provide good protection even if the attacker has an extensive background information.
The key technology of DP protection adds noise that satisfies the Laplace or Exponent mechanism [7, 18]. The former is applied to the results for numerical protection and the latter for non-numerical protection. The amount of noise has a direct correlation with the function’s sensitivity and the privacy protection parameter ɛ.
We will sometimes write Lap (b) to denote a random variable X ∼ Lap (b) [3], where b is determined by both S k (f) and the privacy parameter ɛ.
In this work, we also rely on the K - norm mechanism [20], which makes it possible to calibrate noise of the L2 - sensitivity for the evaluated function. The outputs of our algorithms are all numerical, so we use the Laplace mechanism to achieve DP.
MF is one of the effective methods used to predict the missing ratings of a sparse matrix (e.g. ratings matrix). Briefly, MF implies factorizing a matrix, that is, to find two (or more) matrices, such that when we multiply them, we can obtain the approximate matrix of the raw matrix. From an application point-of-view, MF can be used to discover latent features underlying the interactions between two different kinds of entities.
Let the input of MF be a rating matrix Rn×m containing the ratings of n users for m items. Each matrix element r
ui
refers to the rating of user u for item i. Assume a low-dimensional d and that MF factorizes the raw matrix Rn×m into two latent matrices: a user-factor matrix Pn×d and an item-factor matrix Qd×m. The factorization is done such that R is approximated as the product of P and Q; that is, each known rating r
ui
is approximated by
In general, the predictive rating
Ref. [1] provides two methods to solve b U and b i . One is an empirical likelihood estimation (the formula (8)), the other is Stochastic Gradient Descent (SGD). In this paper, we adopt the first method due to the rate of convergence and the influence of the error in each iteration.
If an attacker has some background knowledge, he can obtain a user’s private data from the raw rating matrix. For example, an attacker can infer that a user who likes certain types of movies may not want other people to know. Thus, our goal is to protect the raw rating matrix by using DP reasonably. In this paper, we apply DP to SVD. According to the principle of MF, we can see that the MF process can be divided into the four stages: input the “user-item” ratings matrix; SVD solved by ALS; output the latent user matrix and item matrix; rating prediction. Berlioz et al. [17] proposed to apply DP to these four stages and needed to perform pre-processing in the raw matrix. In contrast, our algorithms do not perform any pre-processing with DP. On the other hand, our algorithms consider user and item bias information to improve recommender accuracy. We propose three new DP algorithms for SVD: Private ALS with Input perturbation, Private ALS with Objective Perturbation and Private ALS with Objective Perturbation. The whole framework of our three new algorithms is shown in Fig. 1.

The whole framework of our three algorithms.
ALS is one of the common methods for solving the resulting non-convex optimization problem. In ALS, we can solve the optimization problem iteratively. One latent matrix (say P) in each iteration is fixed, then the objective function of the SVD (Equation (3.7)) is converted into a convex optimization problem, where the solution (for Q) can be found efficiently. Similarly, we can find another latent matrix in this way. Finally, these steps are repeated until convergence is achieved.
According to the principle of ALS, the raw objective function (Equation 3.7) can become two convex optimization problems as follows.
Then, each user vector p
u
and item vector q
i
can be obtained by solving the ERM [18] problem as follows
We first give a function (named alsSVD ()) that solving SVD by ALS.
The premise of Algorithm 1 is to perturb the raw-ratings matrix with Laplace noise; then, the algorithm is trained using the noisy input ratings; after this, we use ALS to solve SVD.
Algorithm 1 requires the following explanations, which are provided here: rmax is the maximum value and rmin is the minimum value in the rating matrix. The perturbed rating In Algorithm 1, the predictive rating (according to Equation (3.6)) should be computed by using the perturbed rating matrix. Here, μ, b
u
and b
i
are all new values. Thus, the error also should be the value between the perturbed rating and it’s predictive rating. Δr (Δr = rmax - rmin) is the difference between the maximum value and the minimum value.
Second, b is a noise vector that is added to each r
ui
and its probability density is
Private ALS with objective perturbation
Chaudhuri et al. [18] proposed two new approaches: objective perturbation and output perturbation using DP for the design of privacy-preserving algorithms. Specifically, their experiments showed that the results of objective perturbation are optimal when balancing privacy protection and recommendation accuracy. In this subsection, we apply this approach to an ALS objective function and provide a reasonable proof of this method.
The premise of Algorithm 2 is to add noise to the objective function (Equations (4.9 or 4.10)). Furthermore, from the predictive rating (Equation (3.6)), we infer that the inner product of matrix P and Q is disturbed if at least one of two matrices is perturbed. So, we only add noise to the ALS objective function for solving P, and can obtain Equation (4.13),
According to Algorithm 2 of Ref. [18], the regularizing terms
The ALS objective functions of the SVD are convex and differentiable, so they satisfy the application conditions of the Algorithm 2 of Ref. [18]. Our Algorithm 2 describes the DP protection of ALS objective perturbation to solve the latent factors of SVD.
Our Algorithm 2 requires the following explanations: First, we need to deduce and compute the value of parameter C in steps 6 and 8. Here, we determine that the value of C is 2. Refer to [18], the value of C is deduced as follows: From the J
Q
(p
u
, R) objective function for solving SVD (Equation (4.9)), we know the loss function of J
Q
(p
u
, R) is
In addition, the regularization term According to the Theorem 2 of [18], | ℓ
p
u
″| ≤ C. Here, we can obtain C = 2 from (ℓ
p
u
(e
ui
))″ = 2. We solve the value of p
u
after objective function perturbation; that is, we solve the partial derivative for Equation (4.14), where n indicates the number of users, m indicates the number of items in the raw matrix and N indicates the number of samples.
When ∀1 ≤ u ≤ n, and 1 ≤ k ≤ d, we obtain
And then, we have
Then, fixing Q and solving
The main idea of Algorithm 3 is that it guarantees DP by adding a random noise vector b to the output of J Q (p u , R) (Equation(4.9)) or q i (R, P) (Equation(4.12)). Similar to Algorithm 2, we only add noise to the output of J Q (p u , R).
Our Algorithm 3 requires the following explanations: J
Q
(p
u
, R) is the L2 - sensitivity. We can achieve the sensitivity of J
Q
(p
u
, R):
According to the Laplace mechanism (Section 3.1.), for a fixed matrix Q, we generate a random noise vector b with pdf
From the J
Q
(p
u
, R) objective function for solving SVD (Equation (4.9)), we provide Corollary 4.1 and Theorem 4.3 as follows.
We observe that due to the convexity of ℓ and the 1 - strongly convexity of
So, we find that
The proof now follows by an application of Lemma 1 of Ref. [18].
According to the proof from Corollary 4.1, if the conditions on
If pu1 and pu2 are the respective solutions to non-private regularized J Q (p u , R) when the input is R and R′, then b1 - b2 = pu1 - pu2. From Corollary 4.1 and using a triangle inequality,
By symmetry, the densities of the directions of b1 and b2 are uniform. Therefore, by construction,
So, according to the definition of DP, Algorithm 3 provides ɛ - differential privacy.
Experiment datasets
In our experiments, we use two datasets to verify that our algorithms do not only fit a single kind of dataset. One dataset is a Movielens dataset from http://grouplens.org/datasets/movielens/[movielens/], which include 100 k, 1 M and 10 M datasets. The other is Netflix, which was constructed to support participants in the Netflix Prize. To compare the performance of our algorithms on different datasets, we choose the Movielens-1M dataset and the partial Netflix dataset (called Netflix-1M in this paper). It must be explained that the Netflix dataset is a partial dataset we captured from the original Netflix dataset. Some statistical properties of the Movielens-1M and the Netflix-1M datasets used are shown in Table 1.
Statistical properties of the two datasets
Statistical properties of the two datasets
As a frequently used methodology in machine learning and data mining, we use ten-fold cross-validation to train and evaluate the performance of our algorithms. The training datasets are divided into training and validation set with an 90/10 splitting ratio. We measure the accuracy of the predicted ratings
In Ref. [17], the algorithms proposed needed to perform some pre-processing for the raw input matrix. We also applied DP to MF, but our algorithms introduced SVD that did not require any pre-processing as it considers the user and item bias. Particularly, we propose that the ALS objective perturbation obtained better results by comparing them with some experimental results from Ref. [17].
Experimental parameter settings
Some parameters were used in our algorithms. First, the selection of the parameters in each algorithm is introduced briefly. In Algorithms 1, 2 and 3, we set the number of factors to d = 5, the learning rate to γ = 0.001 and the regularization parameter to λ = 0.125 by experience. In Algorithms 1, 2 and 3, we set the regularization parameter used to compute user’s bias and item’s bias to λ1 = 10 and λ2 = 25 by referring to Ref. [1]. In Algorithms 1, 2 and 3, the method setting the number of iterations is such that the algorithm gave an upper limit (k = 20) first and then the iteration stopped when the error was less than 0.0001. In Algorithm 3, we bounded the L2 - norm of the user vectors to pmax = 0.5 and the item vectors to qmax = 0.5 by experience.
Experimental results and analysis
The meanings of the legend terms in the experimental results are shown in Table 2.
The meanings of the legend
The meanings of the legend
Fig. 2 shows how the results of our three algorithms compare with their Baseline (without DP protection) for two datasets. Figs. 3 and 4 shows how the results of our algorithms compare with the correlation algorithms of Ref. [17] for two datasets.

Comparing our algorithms with their Baseline.
From Fig. 2, we can observe that the RMSE of our algorithms with DP protection for SVD are acceptable within a certain range of the two datasets, that is, none of the RMSEs deviate from their Baseline. Moreover, the larger the value of DP parameter ɛ is the more accurate the prediction. In general, the results of our algorithms on Movielens-1M dataset are better, this is because the training samples of the Netflix-1M dataset are fewer than in the Movielens-1M, and the Netflix-1M dataset is also sparser than Movielens-1M. Thus, we can draw a conclusion that the prediction accuracy is closely related to the data set size and scarcity, even when processing by DP. Particularly, Algorithm 3 obtains the best prediction accuracy on Movielens-1M dataset whether the privacy parameter ɛ is large or small; that is, the results of this approach processed by DP are the most stable. Second, the result of Algorithm 2 is better than Algorithm 1 on the two datasets.
The predictive accuracy of Algorithm 1 is almost as good as the other two algorithms when ɛ ≥ 2 (especially in (b)), but it becomes worse when ɛ < 2. This result implies that the availability of training data will be deduced due to the smaller DP parameter ɛ, the larger noise added to the raw rating matrix. In addition, the prediction accuracy of ALS output perturbation becomes poor when ɛ < 0.1. The reason is that the latent factor matrices are perturbed after decomposition and the smaller the value of ɛ, the more noise is added, resulting in the inner product of two latent factors deviating greatly from its true value.
Fig. 3 shows the results of our algorithms compared with the correlative algorithm of Ref. [17] for the two datasets. In Ref. [17], Berlioz et al. also proposed Input perturbation (called IALS in our experiment) and ALS output perturbation (called PALS in our experiment). However, their algorithms needed to do some DP pre-processing of the raw rating matrix. In fact, pre-processing of the raw matrix, or adding noise to it, will affect the result of SVD.

Comparing our algorithm with the ALS-based DP algorithms of Ref. [17].
However, our algorithms not only omit pre-processing steps but also obtain better prediction accuracy on two test datasets. Even if the result of ALS input perturbation is the worst result in our three algorithms, its result is better than the input-perturbed algorithms of Ref. [17]. Particularly, the advantage of our ALS objective function perturbation is more obvious. Furthermore, it can be seen from Fig. 3, that their algorithms achieve better prediction accuracy when the value of ɛ is larger (value of ɛ even up to 20); however, when the value of ɛ is too large, it would be unreasonable to use according to the meaning of DP.
Considering the optimization algorithm selection, Fig. 4 shows the results of our algorithms compared with SGD-based DP algorithm (SGD gradient perturbation) of Ref. [17]. Overall, we see that the predictive accuracy of our three ALS-based DPs are better than the SGD-based DP algorithm of Ref. [17] based on the two datasets. This is because the update at each iteration of SGD has a significant relationship with the error and each iteration of ALS is directly related to the training data set; namely, the ALS method itself is better than SGD, even if they are processed by DP.

Comparing our algorithm with SGD-based DP algorithm of Ref. [17].
In summary, our three DP algorithms proposed for SVD not only protect the information of raw data to a certain extent but also do not significantly affect the recommendation accuracy, thus balancing privacy with recommendation efficiency. The ALS objective perturbation algorithm obtains a better trade-off between privacy and recommendation efficiency.
To balance the strength of privacy and recommendation accuracy, this paper proposes a selection scheme of DP protection parameter ɛ. The specific steps are described as follows: First, determine the recommended object user (User ID must exist in the dataset). Second, for a certain DP process and not for any other DP process, we compute the recommended item set (this paper is a recommended movie set) to the object user. Third, compute the intersection of two recommended item sets obtained by the second step. Finally, compute a percentage by the intersection dividing the total number of recommended item sets. The greater the value of this percentage is, the smaller the influence of recommendation accuracy is and thus the value of ɛ should be relatively reasonable.
For further explanation, this scheme can only provide a reasonable range of the DP parameter ɛ. Normally, if this percentage is less than 20%, we consider the recommended results to be seriously affected despite the privacy protection being very strong. If this percentage is more than 80%, we think the power of privacy protection is too weak despite the recommendation results being better. Therefore, the value of DP parameter ɛ is relatively reasonable when this percentage is between 20% and 80%. To verify this scheme, our Algorithm 2 and Algorithm 3 are compared with Algorithm 5 of Ref. [17] (PALS). Fig. 3 shows the selection of DP parameter ɛ for the Movielens-1M dataset. Each parameter in this experiment is still set in accordance with what is described in Section 5.3.1. In addition, the number of recommended movie sets is set to 20 and the recommended user is selected randomly.
From Fig. 5, we can see that the impact of the privacy parameter ɛ on the recommendation results in our algorithms (especially Algorithm 2) is smaller than that of Algorithm 5 from Ref. [17]. It is noteworthy that our two algorithms obtain the percentage of coincidence degree of 20% and 80% when the value of the privacy parameter ɛ is 2 and 11, respectively. In other words, those values of ɛ in this percentage range can be acceptable.

The impact comparison of the privacy parameter ɛ on the recommendation results.
In recent years, recommender systems have become one of many types of necessary services for Internet businesses and the service is subject to many online users. However, from the perspective of both academic and commercial industries, most of the research considers how to improve the recommendation accuracy while ignoring the problem of privacy of the raw data. In this paper, we investigated the application of DP to ALS for SVD through Input perturbation, ALS objective perturbation and ALS Output perturbation. Rigorous mathematical proofs are provided to ensure that all three algorithms maintain the differential privacy. Finally, through experimental verification and comparison with other methods using two real datasets, it is shown that our privacy algorithms for SVD obtain better results. We think that our approaches could be applied to effectively protect data in recommendation system due to they maintain the accuracy of recommendation system algorithm after processed by DP. In addition, a scheme for the selection of DP parameters can obtain a reasonable range for the DP parameter, balancing privacy and recommendation accuracy.
Due to the rapid development of the Internet, an increasing number of users are expecting personalized recommendation services, and consequently, privacy issues are becoming a worrying problem for them. Recommender systems and the field of data mining require healthy development and so they are inseparable from the protection of privacy in in-depth research.
In this paper, we studied the privacy-protection issue with regards to the SVD optimization algorithm. In the future, a more in-depth study of the following aspects can be expected. SVD parameter tuning. Typically, SVD parameters, such as the number of factors, the regularization parameter and the learning rate, are tuned to increase prediction accuracy, while preventing over-fitting and ensuring convergence. The selection of the DP parameter ɛ. In this paper, we only provide a selection interval of ɛ, which could not determine the optimal interval. After all, the Laplace noise itself is random. Comparison of other CF algorithms. In this paper, we proposed new approaches that apply DP in optimal algorithms of SVD. To extend the application of DP, other CF algorithms could be studied and their recommendation results could be compared with each other. Multiple evaluation measurements might be used to verify our algorithms.
Footnotes
Acknowledgments
This work is partially supported by the Natural Science Foundation of Guangdong Province (No. 2014A030313662), the Special Funds for Welfare Research and Capacity Building Project of Guangdong Province (No. 2015A030402003), the Funds for Science and Technology Project of Guangdong Province (No. 2016ZC0039), the Funds for Philosophy and Social Science Project of Guangdong Province (No. GD15CGL05) and the Fundamental Research Funds for the Central Universities of SCUT (No. 2015QNXM20). We would like to express our sincere appreciation to the anonymous reviewers for their insightful comments, which have greatly aided us in improving the quality of the paper.
