Wide & deep generative adversarial networks for recommendation system

Abstract

Generative Adversarial Networks (GANs) have achieved great success in computer vision tasks such as image inpainting and image super-resolution, and many researchers have applied them to improve the effectiveness of recommendation systems. However, existing GANs-based methods capture users’ preferences with a single neural network in the generative model, so these preferences may not be fully mined. Furthermore, most GANs-based algorithms adopt a cross-entropy loss to obtain pair-wise feedback, which does not reflect the global data distribution when the data are sparse. Both problems degrade the accuracy of the algorithm. To address them, we introduce Wide & Deep Generative Adversarial Networks for Recommendation System (W&DGAN). On the one hand, we employ Wide & Deep Learning as the generative model, which can extract both explicit and implicit information about user preferences. On the other hand, we combine the Cross-Entropy loss in G with the Wasserstein loss in D, so that the joint loss receives training feedback from the data distribution. Empirical results on three public benchmarks show that W&DGAN significantly outperforms state-of-the-art methods.

Keywords

Wide & deep learning, Generative Adversarial Networks, joint loss, recommendation system

1. Introduction

As an important means of information filtering, the Recommendation System (RS) plays a key role in our daily lives. The recommendation task is to produce a list of items that a user may be interested in. Collaborative Filtering (CF) [27], a core algorithm of RS, supports applications such as music and movie recommendation. However, CF and its extensions, such as Matrix Factorization [18], Probabilistic Matrix Factorization (PMF) [22], and Bayesian Probabilistic Matrix Factorization [26], do not achieve satisfactory results. In recent years, deep learning [20] has achieved great success in computer vision and natural language processing [5]. Thus, many researchers use Deep Neural Networks (DNNs) [19] to obtain better recommendation accuracy. For example, DNNs have been applied to rating prediction, where the core idea is to learn user and item vector representations with DNNs; Zhang et al. [43] use two parallel neural networks to learn the hidden features of users and items. However, when the data are sparse, these methods cannot obtain satisfactory results.

Goodfellow et al. [9] propose Generative Adversarial Networks (GANs), which consist of two parts, a Generator (G) and a Discriminator (D), that play a min-max game: G creates an image and sends it to D, D judges whether the image generated by G is real or fake, and the result is fed back to G. Based on this idea, many GANs-based algorithms have been proposed, such as Conditional Generative Adversarial Nets (CGAN) [21], Deep Convolutional Generative Adversarial Networks (DCGAN) [24], Wasserstein GAN (WGAN) [1], Large Scale GAN Training for High Fidelity Natural Image Synthesis [3], and so on [23]. All of them are applied to computer vision with good results. Subsequently, many researchers have adapted GANs to RS and designed GANs-based recommendation methods, for example Unifying Generative and Discriminative Information Retrieval Models (IRGAN) [35] and Graph Representation Learning with Generative Adversarial Nets (GraphGAN) [34]. These methods are based on pair-wise theory: given a user, G generates the item sequences that the user might be interested in, and D discriminates them from the user’s ground truth. However, these methods rely on Reinforcement Learning (RL) [17] to guide the whole process, and they use the Cross-Entropy loss to receive feedback, which cannot express the global data distribution over the training process. As a result, their recommendation results are not satisfactory, because they are ineffective at mining useful information. In summary, GANs-based methods have two disadvantages: (1) they train G and D as plain neural networks, which yields poor results because G does not fully mine users’ preferences; (2) a pair-wise loss is used as the target function in GANs for RS, but it does not capture the global data distribution, which impairs the information feedback.

To solve these problems, we propose Wide & Deep Generative Adversarial Networks for Recommendation System (W&DGAN). First, because of the memorization and generalization capabilities of Wide & Deep Learning [7], we introduce it into GANs as the generative model so that it can fully mine the preferences of users. Second, we take advantage of the Cross-Entropy loss in G and the Wasserstein loss in D, so that W&DGAN not only obtains the pair-wise results but also captures the global data distribution, which improves recommendation accuracy even when the data are sparse. The contributions of this paper can be summarized as follows:

• We propose a new GANs-based recommendation framework called W&DGAN. As far as we know, this is the first work to combine the Wide & Deep Learning model with GANs for Top- $N$ recommendation. Extensive experiments on several real-world datasets demonstrate the effectiveness and rationality of the proposed W&DGAN framework.
• The generator of W&DGAN, a Wide & Deep Learning model, can fully mine users’ preferences from user-item interaction records because of its memorization and generalization capabilities. Thus, W&DGAN receives useful preference information and recommends the sequence of items that users may like.
• The Cross-Entropy loss in G and the Wasserstein loss in D are used together to capture the data distribution, so that the joint loss provides training feedback and yields better experimental results.

The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 presents the problem definition and Wide & Deep Learning. Section 4 describes the W&DGAN algorithm. Section 5 describes the datasets, evaluation metrics, experimental results and analysis. Section 6 gives the conclusion and future work.

2. Related work

In recent years, deep learning [20] has achieved great success in computer vision and natural language processing [5]. Thus, many researchers use Deep Neural Networks to obtain better recommendation accuracy. He et al. [12] propose Neural Collaborative Filtering (NCF), which learns representations of the input with a Multi-Layer Perceptron [13]. Deep Matrix Factorization [40] takes the user-item matrix as input and uses a deep structure learning architecture to learn a common low-dimensional space for the representations of users and items. In addition, many researchers have employed GANs to obtain better accuracy in recommendation systems. A Generic Collaborative Filtering Framework based on Generative Adversarial Networks (CFGAN) [4] redefines GANs for recommendation with CF and abandons the RL mechanism, but it does not fully mine users’ preferences from the interaction information. Recurrent Generative Adversarial Networks for Recommendation Systems (RecGAN) [2] is based on Recurrent Neural Networks [28] and, relying on the Cross-Entropy loss, also does not obtain satisfactory recommendations. Furthermore, some researchers have designed adversarial network methods for RS, such as an adversarial collaborative neural network for robust recommendation (FG-ACAE) [41], Prioritize Long And Short-Term Information in Top- $N$ recommendation using adversarial training (PLASTIC) [42], and so on [6, 41, 31].

3. Preliminaries

In this section, we introduce background on the problem definition and Wide & Deep Learning, both of which are important for understanding our proposed method.

3.1 Problem definition

Given a set of users $U=(u_{1},u_{2},u_{3},u_{4},\ldots,u_{n})$ and a set of items $I=(i_{1},i_{2},i_{3},i_{4},\ldots,i_{n})$ in a social network, users rate specified items. The rating matrix records the rating that a user gives to an item, for example an integer from 1 to 5. The interaction matrix is derived from the rating matrix: whenever an item is rated by a user, the user has interacted with the item and the corresponding entry is 1. In this way, the user’s behavior yields a feedback list of interactions. Following He et al. [12], an entry $y_{ui}=1$ of the interaction matrix indicates that the user is interested in the item, whereas $y_{ui}=0$ means that the user has not interacted with the item and is treated as not interested. This can be expressed by the following formula:

$\displaystyle y_{ui}=\left\{\begin{array}{ll}1,&u_{k}\textit{ has an interaction with }i_{k},\\ 0,&u_{k}\textit{ has no interaction with }i_{k}.\end{array}\right.$ (1)

Therefore, Top- $N$ recommendation aims to recommend to each consumer a small set of $N$ items from a large collection of items. For example, Netflix may want to recommend $N$ appealing movies to each consumer.
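
As a minimal illustration of Eq. (1) and the Top- $N$ task, the following NumPy sketch binarizes a small rating matrix and then picks the $N$ highest-scoring items per user. The toy `ratings` and `scores` arrays, the function names, and the choice to exclude items the user has already rated are assumptions made for this example only.

```python
import numpy as np

def to_interaction_matrix(ratings: np.ndarray) -> np.ndarray:
    """Eq. (1): y_ui = 1 if user u rated item i (rating > 0), else 0."""
    return (ratings > 0).astype(np.int64)

def top_n(scores: np.ndarray, seen: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n highest-scoring items per user.

    Items already seen by the user are excluded (an assumption of this sketch).
    """
    masked = np.where(seen == 1, -np.inf, scores)
    # Sort in descending order of score and keep the first n columns.
    return np.argsort(-masked, axis=1, kind="stable")[:, :n]

# Toy rating matrix: 3 users x 4 items, 0 means "not rated".
ratings = np.array([[5, 0, 3, 0],
                    [0, 4, 0, 0],
                    [1, 0, 0, 2]])
y = to_interaction_matrix(ratings)

# Hypothetical preference scores, e.g. produced by some generator.
scores = np.array([[0.9, 0.2, 0.8, 0.6],
                   [0.1, 0.7, 0.5, 0.3],
                   [0.4, 0.9, 0.3, 0.6]])
recs = top_n(scores, seen=y, n=2)
print(y)
print(recs)
```

Here each row of `recs` is one user's Top-2 list, ordered from the most to the least preferred item.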

3.2 Wide & deep learning

Cheng et al. [7] analyzed users’ data and found that it contains not only implicit representations but also correlations (user preferences). They refer to the corresponding abilities as memorization and generalization. By combining the generalization ability of deep neural networks with the memorization ability of a linear model, the user’s preferences can be obtained.

The Wide model has a linear structure. Part of the input is used to construct hand-crafted features (feature engineering). The following equations describe the details.

$\displaystyle y=\sigma(w^{T}x+b)$ (2)

$\displaystyle\varphi(x)=\prod_{i=1}^{d}x_{i}^{c_{ki}},\quad c_{ki}\in\{0,1\}$ (3)

where $x=[x_{1},x_{2},\ldots,x_{d}]$ is a $d$-dimensional input vector, $w=[w_{1},w_{2},\ldots,w_{d}]$ contains the weight parameters of the model, and $b$ is the bias. $\varphi(x)$ represents a feature-engineering transformation, $\sigma$ is the activation function, and $y$ is the final output. $c_{ki}$ is a Boolean variable that equals 1 if the $i$-th feature takes part in the $k$-th transformation and 0 otherwise.

The Deep model is a deep neural network. The input is processed by an embedding layer and converted into a vector representation, which gives the Deep model the ability to extract features. The vector representation is then sent to the hidden layers, and the implicit representation used for the final prediction is obtained through training. The following equation shows the computation:

$\displaystyle a^{(l+1)}=f(a^{(l)}w^{(l)}+b^{(l)})$ (4)

where $a^{(l)}$ , $w^{(l)}$ and $b^{(l)}$ are the vector representation, the weight and the bias of the $l$-th layer, respectively, and $f$ is the activation function.

Wide & Deep Learning combines the two models above and outputs the prediction after joint training. The following equation gives the details.

$\displaystyle P(Y=1|x)=\sigma(w_{\textit{wide}}^{T}[x,\varphi(x)]+w_{\textit{deep}}^{T}a^{(l)}+b)$ (5)

where $x$ is the input vector, $\varphi(x)$ represents feature engineering, and $Y$ is the binary class label. $w_{\textit{wide}}^{T}$ is the weight of the wide model, $w_{\textit{deep}}^{T}$ is the weight of the deep model, and $b$ is the bias.
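
The forward pass defined by Eqs (2)–(5) can be sketched in NumPy as follows. This is only an illustrative sketch: the layer sizes, the random weights, the ReLU choice for $f$ and all function names are assumptions, not settings reported in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_product(x, c):
    """Eq. (3): phi_k(x) = prod_i x_i ** c_ki, one value per row k of c."""
    # x has shape (d,), c has shape (K, d) with entries in {0, 1}.
    # NumPy evaluates 0 ** 0 as 1, so features with c_ki = 0 are ignored.
    return np.prod(np.power(x, c), axis=1)

def wide_and_deep(x, c, w_wide, deep_layers, w_deep, b):
    """Eq. (5): sigma(w_wide^T [x, phi(x)] + w_deep^T a + b)."""
    phi = cross_product(x, c)
    wide_in = np.concatenate([x, phi])
    a = x
    for weight, bias in deep_layers:
        a = np.maximum(0.0, a @ weight + bias)  # Eq. (4), f = ReLU (assumed)
    return sigmoid(w_wide @ wide_in + w_deep @ a + b)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 1.0])            # binary input features, d = 3
c = np.array([[1, 0, 1],                 # phi_1 = x_1 * x_3
              [1, 1, 0]])                # phi_2 = x_1 * x_2
phi = cross_product(x, c)

deep_layers = [(rng.normal(size=(3, 4)), np.zeros(4)),
               (rng.normal(size=(4, 2)), np.zeros(2))]
w_wide = rng.normal(size=5)              # d + K = 5 wide inputs
w_deep = rng.normal(size=2)
p = wide_and_deep(x, c, w_wide, deep_layers, w_deep, b=0.0)
print(phi, float(p))
```

For binary inputs, each cross-product feature equals 1 only when all of its constituent features are 1, which is how the wide part memorizes feature co-occurrences.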

4. W&DGAN

Typically, GANs-based methods (e.g. CFGAN) share the same training procedure in recommendation systems, as shown in Fig. 1: G generates items for a specific user at each step under the guidance of D, and D distinguishes the real input from the output of G. However, this causes two problems: (1) D distinguishes user-item pairs but does not consider the global data distribution; (2) GANs-based models cannot exploit some of the latent interaction information. Thus, we propose the W&DGAN algorithm to solve these problems and introduce it in detail below.

Figure 1.

The item generation process of GAN-based algorithms.

Figure 2.

W&DGAN framework.

In this section, we introduce the W&DGAN method, whose framework is shown in Fig. 2. Before introducing the model, we explain the preprocessing of the input, which makes the proposed algorithm easier to understand.

Input Preprocessing. According to Eq. (1), the dataset has been transformed into an interaction matrix. We then convert it into a vector representation, which can be defined as follows:

$\displaystyle x_{\textit{emb}}=\textit{Embedding}(x)$ (6)

Here $x_{\textit{emb}}$ is the vector representation of the input $x$ , which is the user-item interaction matrix, and Embedding denotes the embedding function.

Following GANs theory, we apply a masking operation before the data enter the model. The masking operation can be represented by the following formula:

$\displaystyle x_{0}=x_{\textit{emb}}\otimes\textit{mask}$ (7)

Here, mask is the random noise sequence applied to the input, and $\otimes$ is the dot product.
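
A minimal sketch of the masking step in Eq. (7) is given below. It assumes that the mask is a random binary (Bernoulli) vector applied entry by entry, so that some observed entries change from 1 to 0, and it takes the embedding of Eq. (6) to be the identity. The `keep_prob` parameter and the function name are hypothetical.

```python
import numpy as np

def apply_random_mask(x, keep_prob, rng):
    """Return x0 = x * mask for a random binary mask (sketch of Eq. (7))."""
    # Each entry is kept with probability keep_prob and zeroed otherwise.
    mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)
    return x * mask, mask

rng = np.random.default_rng(42)
# Toy interaction matrix: 2 users x 5 items.
x = np.array([[1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1]], dtype=np.float32)
x0, mask = apply_random_mask(x, keep_prob=0.8, rng=rng)
print(x0)
```

Entries that were 0 in the input stay 0 after masking; only observed interactions can be hidden.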

4.1 Wide & deep generative learning (G)

Wide & Deep Learning mines users’ preferences through its memorization and generalization capabilities. Therefore, we employ Wide & Deep Learning to generate item sequences for specific users. The whole process can be written as:

$\displaystyle x_{\textit{wide}}=f_{\textit{linear}}(w^{T}_{\textit{wide}}x_{0}+b_{\textit{wide}})$ (8)

$\displaystyle x^{(l+1)}_{\textit{deep}}=f(x^{(l)}_{\textit{deep}}w^{(l)}_{\textit{deep}}+b^{(l)}_{\textit{deep}})$ (9)

where $x_{\textit{wide}}$ is the linear representation vector, $x^{(l+1)}_{\textit{deep}}$ is the vector representation in the deep model, and $x^{(0)}_{\textit{deep}}=x_{0}$ . $f_{\textit{linear}}$ is the linear activation. $w^{T}_{\textit{wide}}$ , $w^{(l)}_{\textit{deep}}$ , $b_{\textit{wide}}$ and $b^{(l)}_{\textit{deep}}$ are the Wide model weight, the Deep model weight, the Wide model bias and the Deep model bias, respectively. Note that we omit feature engineering because it is complex and requires professional knowledge. Thus, W&DGAN is easy to understand and extend.

We fuse the two models to output the recommendation results, which can be written as:

$\displaystyle\hat{x}=\textit{concat}(x_{\textit{wide}},x_{\textit{deep}})$ (10)

$\displaystyle y=\sigma(\hat{w}\hat{x}+\hat{b})$ (11)

where $\hat{x}$ is the fused information of the Wide and Deep models, concat is the concatenation function, $y$ is the output, and $\sigma$ is the sigmoid function. $\hat{w}$ and $\hat{b}$ are the weight and bias of the output layer.

The target function of G, $V^{G}$ , can be expressed as:

$\displaystyle\underset{\mu}{V(G)}=\sum_{\mu}[\log(1-D(y|u_{k}\odot x|u_{k}))]$ (12)

where $\mu$ denotes the parameters of G, $V^{G}$ is the total loss of G, $\odot$ denotes element-wise multiplication, $y|u_{k}$ is the item sequence generated for the specific user $u_{k}$ , and $x|u_{k}$ is the original input information of user $u_{k}$ .
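
The generator described by Eqs (8)–(12) can be sketched for a single user as follows. This is an illustrative sketch under stated assumptions: the hidden sizes, the random initialization, the use of tanh for $f$, and the names `generator_forward` and `generator_loss` are all hypothetical, and the discriminator outputs passed to the loss are assumed to lie in (0, 1).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def generator_forward(x0, p):
    """Wide & Deep generator for one user vector x0 of length n_items."""
    x_wide = p["w_wide"].T @ x0 + p["b_wide"]         # Eq. (8), linear activation
    x_deep = x0
    for weight, bias in p["deep"]:
        x_deep = np.tanh(x_deep @ weight + bias)      # Eq. (9), f = tanh (assumed)
    x_hat = np.concatenate([x_wide, x_deep])          # Eq. (10)
    return sigmoid(p["w_out"] @ x_hat + p["b_out"])   # Eq. (11)

def generator_loss(d_out):
    """Eq. (12): sum of log(1 - D(.)) over the given discriminator outputs."""
    return float(np.sum(np.log(1.0 - d_out)))

rng = np.random.default_rng(0)
n_items, h_wide, h1, h2 = 6, 3, 8, 4                  # hypothetical sizes
params = {
    "w_wide": rng.normal(scale=0.1, size=(n_items, h_wide)),
    "b_wide": np.zeros(h_wide),
    "deep": [(rng.normal(scale=0.1, size=(n_items, h1)), np.zeros(h1)),
             (rng.normal(scale=0.1, size=(h1, h2)), np.zeros(h2))],
    "w_out": rng.normal(scale=0.1, size=(n_items, h_wide + h2)),
    "b_out": np.zeros(n_items),
}

x0 = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])         # masked interactions of one user
y = generator_forward(x0, params)                     # scores in (0, 1), one per item
masked_y = y * x0                                     # the product y ⊙ x fed to D
loss = generator_loss(np.array([0.5, 0.5]))           # toy discriminator outputs
print(y.round(3), loss)
```

Multiplying the generated scores by the input keeps only the entries of observed items, so D compares the generated and real vectors on the same support.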

4.2 Adversarial nets (D)

In this part, we describe the Discriminator (D), which is implemented as a deep neural network. The reason is that deep neural networks encode the input sequences and focus on distinguishing the sequences generated by G from the ground truth. Therefore, D can provide feedback for G to update its parameters. Traditionally, GANs-based methods employ the Cross-Entropy loss for D in Top- $N$ recommendation. The loss function can be expressed as:

$\displaystyle L=-\frac{1}{n}\sum_{k=1}^{n}\left((x|u_{k})\ln(y|u_{k})+(1-x|u_{k})\ln(1-y|u_{k})\right)$ (13)

However, the Cross-Entropy loss does not capture the global distribution because of its insufficient capacity [1]. Therefore, we use the Wasserstein loss as the loss function of D to capture the global distribution. It can be expressed as:

$\displaystyle\underset{\theta}{V(D)}=\underset{\theta}{\sum}[D(x)-D(y)]$ (14)

Here, $\theta$ denotes the parameters of D.
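
To make the difference between Eq. (13) and Eq. (14) concrete, the sketch below evaluates both on toy values. It is a simplified illustration: the cross-entropy is averaged over the entries of a single vector rather than over users, the small `eps` term is added only for numerical safety, and the constraint that WGAN [1] places on the critic is not shown. All names and numbers are assumptions.

```python
import math

def cross_entropy_loss(x, y, eps=1e-12):
    """Eq. (13) on flat lists: x holds 0/1 targets, y holds predictions in (0, 1)."""
    terms = [xi * math.log(yi + eps) + (1 - xi) * math.log(1 - yi + eps)
             for xi, yi in zip(x, y)]
    return -sum(terms) / len(terms)

def wasserstein_objective(d_real, d_fake):
    """Eq. (14): sum of D(x) - D(y) over paired real and generated samples."""
    return sum(r - f for r, f in zip(d_real, d_fake))

ce = cross_entropy_loss([1, 0], [0.5, 0.5])
w = wasserstein_objective([0.9, 0.8], [0.1, 0.3])
print(ce, w)
```

The cross-entropy scores each entry separately, whereas the Wasserstein objective only depends on the gap between the critic's scores for real and generated samples.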

4.3 Training

W&DGAN combines G and D to achieve better results, so the training process can be expressed by:

$\displaystyle\underset{\mu}{\min}\underset{\theta}{\max}V(D,G)=\sum_{\mu}[\log(1-D(y|u_{k}\odot x|u_{k}))]+\underset{\theta}{\sum}[D(x)-D(y)]$ (15)

We also adopt the ZR loss and OR loss [4] to construct a triplet loss, which constrains the generated vectors to be neither too sparse nor too dense. That is, the vectors should not contain too many negative or too many positive samples. The target function of the triplet loss can be expressed by the following formulas:

$\displaystyle\textit{loss}1=(y|u_{k}-a)^{2}$ (16)

$\displaystyle\textit{loss}2=(y|u_{k}-b)^{2}$ (17)

$\displaystyle\textit{loss}=\sum[\alpha\,\textit{loss}1+\beta\,\textit{loss}2]=\sum[\alpha(y|u_{k}-a)^{2}+\beta(y|u_{k}-b)^{2}]$ (18)

Here $a$ is the negative value 0 and $b$ is the positive value 1. $\alpha$ and $\beta$ are penalty coefficients.
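
The penalty of Eq. (18) can be evaluated with a few lines of Python. The sketch applies the formula literally to every entry of a generated vector; the function name and the example values of $\alpha$, $\beta$ and $y$ are assumptions chosen only for illustration.

```python
def penalty_loss(y, alpha, beta, a=0.0, b=1.0):
    """Eq. (18): sum over entries of alpha * (y - a)^2 + beta * (y - b)^2."""
    return sum(alpha * (v - a) ** 2 + beta * (v - b) ** 2 for v in y)

# Toy generated scores for one user.
value = penalty_loss([0.2, 0.9], alpha=0.1, beta=0.5)
print(value)
```

With $a=0$ and $b=1$, each entry's penalty is smallest at $\beta/(\alpha+\beta)$, so the two coefficients balance how strongly the generated scores are pulled toward 0 or toward 1.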

Therefore, the W&DGAN target function of training process can be shown as:

$\displaystyle\underset{\mu}{\min}\underset{\theta}{\max}V(D,G)=\underset{\mu}{\sum}[\log(1-D(y|u_{k}\odot x|u_{k}))]+\underset{\theta}{\sum}[D(x)-D(y)]+\textit{loss}=\sum_{u_{k}}[\log(1-D(y|u_{k}\odot x|u_{k}))+\alpha(y|u_{k}-a)^{2}+\beta(y|u_{k}-b)^{2}+D(x)-D(y)]$ (19)

$\mu$ and $\theta$ are the parameters of G and D, respectively. G of W&DGAN generates the sequences of items that might be of interest to specific users. D discriminates the item sequences generated by G from the ground truth and feeds the information back to G. In addition, we use the Adam optimizer to optimize the model and update the parameters.

Algorithm 1: W&DGAN
Input: Interaction matrix $M$ , Batch_size, learning parameters of W&DGAN.
Initialize parameters $\mu$ and $\theta$ .
Training:
repeat
  Sample user-item pair ( $x|u_{k}$ ) from $M$ .
  Mask $x|u_{k}$ , so that some values change from 1 to 0.
until all data sampling is finished
do
  do
    Generate items the user may like: $y|u_{k}=\{\hat{i_{1}},\hat{i_{2}},\ldots,\hat{i_{n}}\}\leftarrow$ G.
    Update parameters of G: $\mu\leftarrow\mu-\nabla{V^{G}_{\mu}}$ .
    Take real items from the ground truth: $\{i_{1},i_{2},\ldots,i_{k},\ldots,i_{n}\}\leftarrow M$ .
    D ( $y|u_{k}\approx M$ )? and get ${V^{D}_{\theta}}$ .
    Update parameters of D: $\theta\leftarrow\theta-\nabla{V^{D}_{\theta}}$ .
  while D has not converged
while G has not converged
Output: Generated item sequences for a specific user.

In summary, the W&DGAN algorithm works as follows: the input is processed by the embedding and mask operations and then sent to G of W&DGAN. G uses the Wide & Deep model to generate item sequences for a specific user. D judges whether the item sequences generated by G are real or not against the ground truth, and feeds the result back to guide G. The whole procedure of W&DGAN is given in Algorithm 1.

5. Experiments and analysis

In this section, we introduce the datasets and the experimental environment, and analyze the experimental results. To evaluate our proposed model, we conduct experiments to answer the following research questions:

RQ1
How does W&DGAN perform compared with state-of-the-art algorithms for Top- $N$ recommendation?
RQ2
Are the components of W&DGAN (i.e., the joint loss and Wide & Deep Learning) useful for improving recommendation results?
RQ3
How stable is W&DGAN over training epochs?

5.1 Datasets

The datasets include Ciao, MovieLens 100K, and MovieLens 1M,¹ which are very popular in Top- $N$ recommendation. The characteristics of the three datasets are summarized in Table 1. We randomly split each dataset into two parts: 80% for training and the remaining 20% for testing. Note that we never use any auxiliary context information about users and items.

¹ https://grouplens.org/datasets/movielens/.

Ciao

It contains 18,648 interaction records of 996 users on 1,927 items. The fewer the interaction records, the sparser the data.

MovieLens 100K

It contains 100,000 interaction records of 943 users on 1,682 items.

MovieLens 1M

It contains 1,000,209 interaction records from 6,040 users on 3,883 items. As the scale of the dataset increases, the sparseness of the data increases as well.

All information about the datasets is shown in Table 1. In addition, we preprocess the datasets: the value is 1 if an item is rated by a user, and 0 otherwise. The input of W&DGAN is (user, item, rating) and the output is (user, item1, item2, …, itemN).

Table 1

Statistics of the experimental datasets

	Ciao	MovieLens 100K	MovieLens 1M
Users	996	943	6,040
Items	1,927	1,682	3,883
Records	18,648	100,000	1,000,209
Sparsity	98.72%	93.7%	95.8%
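
The preprocessing and the random 80%/20% split described in this subsection can be sketched in plain Python. The toy `records` list, the fixed random seed and the function names are assumptions for illustration and do not reproduce the exact splitting code used in the experiments.

```python
import random

def binarize(records):
    """Map (user, item, rating) triples to implicit feedback (user, item, 1)."""
    return [(user, item, 1) for (user, item, _rating) in records]

def random_split(records, train_ratio=0.8, seed=0):
    """Shuffle the records and split them into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Toy data: 5 users x 2 items with ratings between 1 and 5.
records = [(u, i, 1 + (u + i) % 5) for u in range(5) for i in range(2)]
implicit = binarize(records)
train, test = random_split(implicit)
print(len(train), len(test))
```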

5.2 Metrics and implementation details

5.2.1 Metrics

We use four popular accuracy metrics to evaluate W&DGAN in Top- $N$ recommendation: Precision, Recall, Normalized Discounted Cumulative Gain (NDCG), and Mean Reciprocal Rank (MRR). They are defined as follows:

$\displaystyle\textit{DCG}_{u}@N=\sum_{i=1}^{N}{\frac{r(i)}{\log_{2}(i+1)}}$ (20)

$\displaystyle\textit{NDCG}_{u}@N=\frac{\textit{DCG}_{u}@N}{\textit{IDCG}_{u}}$ (21)

$\displaystyle\textit{NDCG}@N=\frac{1}{|U|}\sum_{u\in U}\textit{NDCG}_{u}@N$ (22)

$\displaystyle\textit{Precision}@N=\frac{TP}{TP+FP}$ (23)

$\displaystyle\textit{Recall}@N=\frac{TP}{TP+FN}$ (24)

$\displaystyle\textit{MRR}@N=\frac{1}{|U|}\sum_{i=1}^{|U|}\frac{1}{\textit{rank}_{i}}$ (25)

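Here $N$ is 5 or 20. TP is the number of recommended items that are relevant (true positives), FP is the number of recommended items that are not relevant (false positives), and FN is the number of relevant items that are not recommended (false negatives). Thus TP $+$ FP is the number of recommended items, and TP $+$ FN is the number of relevant items. IDCG is the maximum possible DCG, $r(i)$ is the relevance of the item at rank $i$ , $\textit{rank}_{i}$ is the position of the first relevant item in the list recommended to the $i$-th user, and $|U|$ is the number of users.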
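
The four metrics can be computed for a single user as in the sketch below, which assumes binary relevance and a ranked recommendation list. The function names and the toy example are assumptions; the reported values would be obtained by averaging the per-user results over all test users.

```python
import math

def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n recommended items that are relevant."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / n

def recall_at_n(ranked, relevant, n):
    """Fraction of the relevant items that appear in the top-n list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)

def ndcg_at_n(ranked, relevant, n):
    """DCG of the top-n list divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, item in enumerate(ranked[:n], start=1)
              if item in relevant)
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(pos + 1) for pos in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

def mrr_at_n(ranked, relevant, n):
    """Reciprocal rank of the first relevant item in the top-n list."""
    for pos, item in enumerate(ranked[:n], start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0

ranked = [3, 1, 4, 2, 5]      # recommended items, best first
relevant = {1, 5, 9}          # held-out test items of the user
print(precision_at_n(ranked, relevant, 5),
      recall_at_n(ranked, relevant, 5),
      ndcg_at_n(ranked, relevant, 5),
      mrr_at_n(ranked, relevant, 5))
```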

5.2.2 Implementation details

We implement our model with the deep learning framework TensorFlow 1.13 and deploy it on an NVIDIA Tesla P100 GPU with 16 GB of memory. The server runs Ubuntu 16.04.5 LTS and has 128 GB of memory.

5.2.3 Comparison algorithms

To better answer RQ1, the experimental settings follow the IRGAN [4] algorithm; five cross-validation experiments are performed for the comparison, and the average value is reported. We compare with the following algorithms:

(1) ItemPop [27]. Items are recommended according to their popularity.
(2) BPR [25]. This method builds on Matrix Factorization with implicit feedback.
(3) FISM [16]. Factored item similarity models, a CF-based method.
(4) CDAE [37]. Collaborative Denoising AutoEncoder.
(5) IRGAN [35]. Information Retrieval Generative Adversarial Nets for recommendation.
(6) GraphGAN [44]. Graph Generative Adversarial Nets for recommendation.
(7) CAAE [6]. Collaborative Adversarial AutoEncoder.
(8) CFGAN [4]. Collaborative Filtering algorithm based on conditional Generative Adversarial Nets.
(9) PLASTIC [42]. Prioritize Long And Short-Term Information in Top- $N$ recommendation using adversarial training.
(10) FG-ACAE [41]. Adversarial collaborative neural network for robust recommendation.
(11) CollaGAN [31]. Collaborative generative adversarial network for recommendation systems.
(12) NGCF [36]. Neural Graph Collaborative Filtering.
(13) GAT [32]. Graph Attention Networks.
(14) MCRec [14]. Leveraging Meta-path based Context with A Neural Co-Attention Model.
(15) DGRBR [39]. Deep Generative Recommendation Based on List-Wise Ranking.
(16) SEMI-FL-MV-DSSM [15]. A Federated Multi-View Deep Learning Framework for Privacy-Preserving Recommendations.
(17) FED-MVMF [8]. Federated multi-view matrix factorization for personalized recommendations.

5.3 Performance comparison with baselines (RQ1)

Table 2 shows the experimental results on the Ciao dataset (bold indicates the best result on each dataset). W&DGAN obtains better results, with an average improvement of 0.3%. According to the dataset description in Table 1 and the metric definitions, the Ciao dataset contains little user-item interaction information, yet W&DGAN still obtains a better recommendation effect. This indicates that W&DGAN can mine useful information from the user-item interaction matrix even when the data are sparse.

Table 3 shows the experimental performance on the MovieLens 100K dataset. As in Table 2, W&DGAN is better than the other baselines: it improves by 0.3% on average for both top-5 and top-20 recommendation. This demonstrates that W&DGAN can efficiently mine the latent interaction information between users and items and achieve good recommendation results.

Table 2

Experimental performance of W&DGAN and baselines on the Ciao dataset

	Precision@5	Precision@20	Recall@5	Recall@20	NDCG@5	NDCG@20	MRR@5	MRR@20
ItemPop	0.031	0.024	0.040	0.127	0.047	0.065	0.056	0.067
BPR	0.036	0.025	0.040	0.141	0.052	0.066	0.066	0.078
FISM	0.062	0.040	0.072	0.178	0.079	0.109	0.127	0.147
CDAE	0.061	0.042	0.075	0.185	0.081	0.108	0.127	0.151
GraphGAN	0.026	0.017	0.1041	0.100	0.041	0.058	0.057	0.068
IRGAN	0.035	0.023	0.042	0.111	0.046	0.066	0.082	0.088
CAAE	0.067	0.042	0.079	0.187	0.086	0.120	0.144	0.164
CFGAN	0.072	0.045	0.081	0.194	0.092	0.124	0.154	0.167
W&DGAN	0.073	0.045	0.084	0.198	0.094	0.127	0.161	0.186

Table 3

Experimental performance of W&DGAN and baselines on the MovieLens 100K dataset

	Precision@5	Precision@20	Recall@5	Recall@20	NDCG@5	NDCG@20	MRR@5	MRR@20
ItemPop	0.181	0.138	0.102	0.251	0.163	0.195	0.254	0.292
BPR	0.348	0.236	0.116	0.287	0.370	0.380	0.556	0.574
FISM	0.426	0.285	0.140	0.353	0.462	0.429	0.674	0.685
CDAE	0.433	0.287	0.144	0.353	0.465	0.425	0.664	0.674
GraphGAN	0.212	0.151	0.102	0.260	0.183	0.249	0.282	0.312
IRGAN	0.312	0.221	0.107	0.275	0.342	0.368	0.536	0.523
CAAE	0.435	0.289	0.151	0.348	0.475	0.432	0.686	0.697
PLASTIC	0.312	–	–	–	0.331	–	–	–
CFGAN	0.444	0.294	0.152	0.360	0.476	0.433	0.683	0.693
W&DGAN	0.451	0.298	0.158	0.362	0.484	0.442	0.686	0.697

Table 4

Experimental performance of W&DGAN and baselines on the MovieLens 1M dataset

	Precision@5	Precision@20	Recall@5	Recall@20	NDCG@5	NDCG@20	MRR@5	MRR@20
ItemPop	0.157	0.121	0.076	0.197	0.154	0.181	0.252	0.297
BPR	0.341	0.252	0.077	0.208	0.349	0.362	0.537	0.556
FISM	0.420	0.302	0.107	0.270	0.443	0.399	0.637	0.651
CDAE	0.419	0.307	0.108	0.272	0.439	0.401	0.629	0.644
GraphGAN	0.178	0.194	0.070	0.179	0.205	0.184	0.281	0.316
IRGAN	0.263	0.214	0.072	0.166	0.264	0.246	0.301	0.338
CFGAN	0.432	0.309	0.108	0.272	0.455	0.406	0.647	0.660
FG-ACAE	–	–	–	–	0.458	–	–	–
CollaGAN	0.428	–	–	–	0.417	–	–	–
W&DGAN	0.437	0.314	0.110	0.275	0.461	0.412	0.652	0.666

Table 5

Experimental performance of W&DGAN in NDCG@10 on the MovieLens 100K dataset

	MovieLens 100K NDCG@10
BMF	0.408
NMF	0.234
GAT	0.256
MCRec	0.262
NGCF	0.418
DGRBR	0.327
SEMI-FL-MV-DSSM	0.317
FED-MVMF	0.280
W&DGAN	0.453

The experimental results on the MovieLens 1M dataset are shown in Table 4. As in Table 3, W&DGAN performs better than the baselines, with a 0.4% improvement on average. This illustrates that W&DGAN captures the latent interaction information between users and items on a large dataset.

Recently, Graph Neural Networks (GNNs) [38] have shown advantages in representing social networks, so many researchers have adopted them in RS [36, 32, 14]. Because of the sparsity of the data, this comparison is measured by NDCG@10 on MovieLens 100K, and the results are shown in Table 5. Some GNNs-based algorithms such as NGCF obtain good recommendation results. Nevertheless, W&DGAN achieves the best recommendation accuracy. MF-based methods (BMF, NMF, etc. [45]) cannot obtain satisfactory recommendation results because of the sparsity of the data. This also indicates that W&DGAN is efficient in mining social network data.

Figure 3.

Results of the different W&DGAN components on the Ciao dataset.

Figure 4.

Results of the different W&DGAN components on the MovieLens 100K dataset.

Figure 5.

Results of the different W&DGAN components on the MovieLens 1M dataset.

5.4 Wide & deep learning and joint loss (RQ2)

The results of the different components of W&DGAN on the three datasets are shown in Figs 3–5, respectively. W&DGAN_Wide_Model and W&DGAN_Deep_Model denote W&DGAN using the Wide model or the Deep model as the generator. We discuss the results in detail below.

Figure 3 shows the results on the Ciao dataset. W&DGAN has an advantage over W&DGAN without the Wasserstein loss. In addition, compared with the other component variants, W&DGAN maintains good results. This clearly shows that the individual components affect the recommendation accuracy on the Ciao dataset.

The results of the different components on MovieLens 100K are shown in Fig. 4. W&DGAN obtains an average improvement of 0.1%, which is similar to the Ciao dataset. We think that, because MovieLens 100K is denser than Ciao, more interaction information is available. The results also indicate that the Wasserstein loss can capture user preferences from the global distribution of the data.

Figure 5 shows the experimental results of W&DGAN with different components on MovieLens 1M. The average improvement of W&DGAN is less than 0.1%. According to Table 1, MovieLens 1M is larger than the other two datasets, so the latent preference information can be mined more easily by recommendation algorithms. That is why most of the methods obtain better results on MovieLens 1M.

It should be noted that the performance of W&DGAN_Wide_Model is not satisfactory. We think that, because the interaction matrix is already a linear representation, W&DGAN_Wide_Model cannot directly obtain a better representation when mapping from $n$ dimensions to 1 dimension. Even so, W&DGAN, which combines the Wide and Deep models, fully mines the effective preferences of the user through the information representation from $n+1$ dimensions to $n$ dimensions.

5.5 Stability of W&DGAN (RQ3)

Figures 6–11 show the stability of W&DGAN over training epochs on the three datasets. W&DGAN produces stable results rather than fluctuating ones. First, Precision, NDCG, Recall, and MRR on the Ciao dataset reach their best values at 300 epochs; on MovieLens 100K they reach the best recommendation accuracy at about 100 epochs; and on MovieLens 1M they reach their best values at 150 epochs. Second, the W&DGAN method not only obtains good recommendation results but also possesses good stability. When training continues beyond the best epoch, the results on MovieLens 100K and MovieLens 1M remain good, but those on the Ciao dataset degrade. We think the reason is that the dataset is small and thus prone to overfitting.

Figure 6.

MRR and NDCG over training epochs on the Ciao dataset.

Figure 7.

Recall and Precision over training epochs on the Ciao dataset.

Figure 8.

MRR and NDCG over training epochs on the MovieLens 100K dataset.

Figure 9.

Recall and Precision over training epochs on the MovieLens 100K dataset.

Figure 10.

MRR and NDCG over training epochs on the MovieLens 1M dataset.

Figure 11.

Recall and Precision over training epochs on the MovieLens 1M dataset.

6. Conclusion

In this paper, we propose a new framework for recommendation systems, Wide & Deep Generative Adversarial Networks for Recommendation System (W&DGAN), which obtains implicit preference information and improves recommendation accuracy. W&DGAN uses Wide & Deep Learning to learn users’ preferences from user-item interaction information. The Cross-Entropy loss in the generator (G) and the Wasserstein loss in the discriminator (D) of W&DGAN provide information feedback from the distribution loss of the data. The experimental results show that W&DGAN achieves better recommendation accuracy on several datasets.

Many GANs (BigGANs, WGAN-GP [10], etc.) have greatly improved convergence properties and optimization stability, and the attention mechanism [30] has been applied in recommendation. In future work, we will apply these methods to W&DGAN to obtain better recommendation accuracy.

Acknowledgments

This work was partially supported by the University-level key projects of Anhui University of Science and Technology (Grants #xjzd2020-15), Collaborative Innovation Project of Anhui Province (Grants #GXXT-2019-018), and the Provincial Artificial Intelligence and Robot Experimental Training Center Project (Grants # 2020sxzx08). The authors also would like to thank the anonymous reviewers for their valuable comments.

References

1. Martin, Soumith and Léon, Wasserstein gan, arXiv preprint arXiv:1701.07875, 2017.

2. Homanga, Homin and Brian, Recgan: recurrent generative adversarial networks for recommendation systems, in: Proceedings of the 12th ACM Conference on Recommender Systems, 2018, pp. 372–376.

3. Andrew, Jeff and Karen, Large scale gan training for high fidelity natural image synthesis, in: Proceedings of International Conference on Learning Representations, 2018, pp. 1–11.

4. Dong, Jin, Sang and Jung, Cfgan: A generic collaborative filtering framework based on generative adversarial networks, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 137–146.

5. Mao, Wang and Zhao, Automatic image detection of multi-type surface defects on wind turbine blades based on cascade deep learning network, Intelligent Data Analysis 25(2) (2021), 463–482.

6. Dong, Jung and Sang, Collaborative adversarial autoencoders: An effective collaborative filtering model under the gan framework, IEEE Access 7 (2019), 37650–37663.

7. Heng, Levent, Jeremiah and Mustafa, Wide & deep learning for recommender systems, in: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 7–10.

8. Adrian, Were, Alexander and Muhammad, Federated multi-view matrix factorization for personalized recommendations, in: Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2020.

9. Goodfellow, Jean and Yoshua, Generative adversarial nets, in: Proceedings of Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.

10. Ishaan, Faruk, Martin, Vincent and Aaron, Improved training of wasserstein gans, in: Proceedings of Advances in Neural Information Processing Systems, 2017, pp. 5767–5777.

11. Zhang, Ren and Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

12. Liao, Zhang and Chua, Neural collaborative filtering, in: Proceedings of the 26th International Conference on World Wide Web, 2017, pp. 173–182.

13. Kurt, Maxwell and Halbert, Multilayer feedforward networks are universal approximators, Neural Networks 2(5) (1989), 359–366.

14. Shi, Zhao and Philip S.Y., Leveraging meta-path based context for top-n recommendation with a neural co-attention model, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1531–1540.

15. Huang, Bai and Wang, A federated multi-view deep learning framework for privacy-preserving recommendations, arXiv preprint arXiv:2008.10808, 2020.

16. Santosh, Xia and George, Fism: factored item similarity models for top-n recommender systems, in: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013, pp. 659–667.

17. Lesli, Michael and Andrew, Reinforcement learning: A survey, Journal of Artificial Intelligence Research 4 (1996), 237–285.

18. Yehuda, Robert and Chris, Matrix factorization techniques for recommender systems, Computer 8 (2009), 30–37.

19. Hugo, Yoshua, Jérôme and Pascal, Exploring strategies for training deep neural networks, Journal of Machine Learning Research 10(1) (2009), 1–40.

20. Yann, Yoshua and Geoffrey, Deep learning, Nature 521(7553) (2015), 436.

21. Mehdi and Simon, Conditional generative adversarial nets, arXiv preprint arXiv:1411.1784, 2014.

22. Andriy and Ruslan, Probabilistic matrix factorization, in: Proceedings of Advances in Neural Information Processing Systems, 2008, pp. 1257–1264.

23. Qian and Zhang, Generative image inpainting for link prediction, Applied Intelligence, 2020, 1–13.

24. Alec, Luke and Soumith, Unsupervised representation learning with deep convolutional generative adversarial networks, arXiv preprint arXiv:1511.06434, 2015.

25. Steffen, Christoph, Zeno and Lars, Bpr: Bayesian personalized ranking from implicit feedback, arXiv preprint arXiv:1205.2618, 2012.

26. Ruslan and Andriy, Bayesian probabilistic matrix factorization using markov chain monte carlo, in: Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 880–887.

27. Badrul, George, Joseph and John, Item-based collaborative filtering recommendation algorithms, 2001, 285–295.

28. Mike and Kuldip, Bidirectional recurrent neural networks, IEEE Transaction on Signal Processing, 1997, 2673–2681.

29. Shi, Wang, Zhang and Zhou, Heterogeneous graph neural network for recommendation, arXiv preprint arXiv:2009.00799, 2020.

30. Shi, Shen, Kou, Nie and Yu, Attentional memory network with correlation-based embedding for time-aware poi recommendation, Knowledge-Based Systems, 2021, 106747.

31. Tong, Luo, Zhang, Sadiq and Cui, Collaborative generative adversarial network for recommendation systems, in: Proceedings of the IEEE 35th International Conference on Data Engineering Workshops (ICDEW), 2019, pp. 161–168.

32. Petar, Adriana, Pietro and Yoshua, Graph attention networks, in: Proceedings of International Conference on Learning Representations, 2018.

33. Wang and Yeung, Collaborative deep learning for recommender systems, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1235–1244.

34. Wang, Zhang, Xie and Guo, Graphgan: Graph representation learning with generative adversarial nets, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018, pp. 2508–2515.

35. Wang, Zhang and Dell, Irgan: A minimax game for unifying generative and discriminative information retrieval models, in: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 515–524.

36. Wang and Chua, Neural graph collaborative filtering, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 165–174.

37. DuBois and Ester, Collaborative denoising auto-encoders for top-n recommender systems, in: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016, pp. 153–162.

38. Pan, Chen and Philip, A comprehensive survey on graph neural networks, IEEE Transaction on Neural Networks and Learning Systems, 2020, 1–21.

39. Sun, Liu, Jing and Yu, Deep generative recommendation based on list-wise ranking, Journal of Computer Research and Development, 2020, 1697–1706.

40. Xue, Dai, Zhang and Chen, Deep matrix factorization models for recommender systems, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, pp. 3203–3209.

41. Yuan, Yao and Boualem, Adversarial collaborative neural network for robust recommendation, in: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019, pp. 1065–1068.

42. Zhao, Wang and Chen, Plastic: prioritize long and short-term information in top-n recommendation using adversarial training, in: Proceedings of the 27th International Joint Conference on Artificial Intelligence, 2018, pp. 3676–3682.

43. Zheng, Tang, Ding and Zhou, A neural autoregressive approach to collaborative filtering, in: Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 764–773.

44. Wang and Zhao, Graphgan: Graph representation learning with generative adversarial nets, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

45. Mikkel, Ole and Kai, Bayesian non-negative matrix factorization, in: Proceedings of International Conference on Independent Component Analysis and Signal Separation, 2009, pp. 540–547.
