Abstract
Recommender systems have become widely used in recent years, so improving recommendation performance is both important and meaningful. Recommendation methods generally use users’ historical ratings on items to predict ratings on unrated items and then make recommendations. However, as the numbers of users and items grow, the data become sparser and the quality of recommendations decreases sharply. To alleviate the sparsity problem, auxiliary information is combined with ratings to mine users’ preferences and raise recommendation quality. Like rating data, review data contain rich information about users’ preferences on items. This paper proposes a novel recommendation model that uses adversarial learning among auto-encoders to improve recommendation quality by minimizing the gap between the rating relation and the review relation of a user and an item. Empirical studies on real-world datasets show that the proposed method improves recommendation performance.
Introduction
Recommender systems provide users with items they are likely to prefer, which helps to solve the information overload problem caused by the explosive growth of Web information. They play an increasingly important role in Web applications across a variety of areas. Collaborative Filtering (CF) is a widely used recommendation approach [1]. The matrix factorization (MF) method [11] predicts ratings by the inner product of the user and item latent feature vectors learned from the historical user-item rating information. CF algorithms usually use users’ historical ratings on items to predict ratings on unrated items and then make recommendations. However, the rating data are sparse because a user may rate only a few items. To alleviate the sparsity problem, other data sources are needed to mine users’ preferences for higher recommendation quality. As user-generated reviews contain valuable information, including users’ preferences on items and item features, many methods have been proposed to combine users’ historical reviews with their ratings on items to generate high-quality recommendations [5].
The main idea of these methods is to generate recommendations by transforming reviews, users, and items into a unified latent feature space, which can be obtained from user-item rating information and user-item review information. In the rating prediction task for review-based recommendations, we can draw a user-item rating relation from the latent feature representations of a user and an item, and we can also extract a user-item review relation from the content of the user’s review on the item. The crucial problem is how to fuse the two relations to make recommendations. As both the rating relation and the review relation between a user and an item reflect the same preference of the user on the item, we should minimize their gap and make them indistinguishable.
In generative adversarial nets (GANs) [6], a discriminator is trained to distinguish real data from generated data, and a generator is trained to produce high-quality data that fool the discriminator. When the discriminator fails to tell real data from fake data, the fake data can be regarded as indistinguishable from the real data. It is therefore intuitive to employ this kind of min-max game to fuse the user-item rating relation and review relation by making them indistinguishable. In this paper, we train a discriminator to discriminate between the two relations. As a result, the feature extractors, represented by auto-encoders, are optimized in an adversarial manner to minimize the difference between the two relations. Finally, we use the learned user and item representations to predict ratings for review-based recommendations.
The remainder of this paper is organized as follows: Section 2 gives a brief overview of related work; Section 3 details the proposed model; Section 4 introduces the experimental setup and reports the results. Finally, Section 5 concludes the paper.
Related work
Rating and review-based recommendation methods
Traditional methods
In traditional collaborative filtering algorithms, rating information is usually used to calculate the similarity of users or items. Under the assumption that a user tends to select items similar to those the user has bought in the past, item-based collaborative filtering (ICF) recommends items by using the similarities between items. Under the assumption that similar users have similar tastes, user-based collaborative filtering (UCF) recommends items by using the similarities between users [1]. The matrix factorization (MF) method [11] predicts ratings by the inner product of the user and item latent feature vectors learned from the historical user-item rating matrix. Many subsequent works added auxiliary information, including users’ reviews on items, to improve recommendation performance under the MF framework. HFT [19] assumed that the topic factors of an item share the topics of all review documents of the item. Specifically, the topic probability distribution of all reviews of the item can be measured by the topic factors of the item. TopicMF [2] further combined user and item topic factors with reviews simultaneously.
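As a minimal illustration of the inner-product prediction used by MF, the sketch below computes one predicted rating from a user vector and an item vector. The function name `predict_rating` and the latent vectors are hypothetical and chosen only for illustration; in practice the vectors are learned from the observed rating matrix.

```python
def predict_rating(user_vec, item_vec):
    """Predict a rating as the inner product of two latent feature vectors."""
    if len(user_vec) != len(item_vec):
        raise ValueError("latent vectors must have the same dimension")
    return sum(p * q for p, q in zip(user_vec, item_vec))


# Hypothetical 3-dimensional latent vectors (illustrative values only).
user_u = [1.0, 0.5, 2.0]
item_j = [2.0, 1.0, 0.5]

predicted = predict_rating(user_u, item_j)
print(predicted)
```

The prediction depends only on the two latent vectors, which is why sparse rating data, from which these vectors must be learned, limit the quality of MF-based recommendations.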
Neural network-based methods
Neural networks for traditional recommendations can be divided into two categories [23]. The first category uses multi-layer non-linear transformation functions to calculate the matching score of user and item embeddings. For example, NeuMF [8] concatenated the user and item vectors and then fed them into a multi-layer neural network to obtain the score values. However, because of the large semantic differences between users and items, they cannot be matched in the initial space. To solve this problem, the second category uses neural networks for collaborative filtering based on representation learning. These methods focus on how to map users and items into a unified space by using neural networks and then match them. For example, ReDa [24] simultaneously learned the latent representations of users and items using auto-encoders only from historical user-item rating information. In ReDa, the encoder part transformed the user or item rating vector into a low-dimensional latent feature space, and the decoder part reconstructed the user or item information.
Recently, deep neural networks have also been successfully applied to predicting ratings for review-based recommendations. These methods can be roughly divided into two categories. The first category uses user-item review information to generate latent feature representations of users and items to improve recommendation quality. For example, ConvMF [9] employed a convolutional neural network to learn item features from item review documents for rating prediction. TransNet [3] learned the latent representations of users and items from their respective review documents by convolutional neural networks, and then concatenated the learned latent representations to predict ratings. TARMF [16] co-learned matrix factorization and an attention-based GRU network to represent user and item features from ratings and user review documents. Following an adversarial training strategy, MT [17] employed a neural architecture of sequence-to-sequence learning to obtain user and item representations from user review documents and item review documents. DAML [15] utilized local and mutual attention of the convolutional neural network to jointly learn the user and item representations from user review documents and item review documents. The second category generates reviews by combining user-item rating information to enhance the quality of the latent feature representations of users and items, and thereby achieves better rating prediction performance. For example, NRT [13] employed gated recurrent neural networks to transform user and item latent representations into a concise abstractive review while simultaneously predicting precise ratings.
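The encoder-decoder idea described above can be sketched as follows. This is a minimal single-hidden-layer auto-encoder forward pass in plain Python, not the ReDa implementation: the layer sizes, the sigmoid activation, the random initialization, the rating vector, and the choice to measure the error only on observed entries are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, bias):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
n_items, n_latent = 6, 2
enc_w, enc_b = init_layer(n_latent, n_items, rng)
dec_w, dec_b = init_layer(n_items, n_latent, rng)

# Hypothetical rating vector of one user, scaled to [0, 1]; 0 marks "unrated".
ratings = [1.0, 0.0, 0.6, 0.0, 0.8, 0.0]

latent = dense(ratings, enc_w, enc_b)         # encoder: low-dimensional code
reconstruction = dense(latent, dec_w, dec_b)  # decoder: rebuilt rating vector

# Mean squared reconstruction error over the observed entries only.
observed = [i for i, r in enumerate(ratings) if r > 0]
loss = sum((ratings[i] - reconstruction[i]) ** 2 for i in observed) / len(observed)
print(len(latent), len(reconstruction), loss)
```

Training would adjust the weights to reduce `loss`; the vector `latent` then serves as the learned representation of the user.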
Generative adversarial networks for recommendations
GANs have been widely studied as generative models in recent years. The basic idea is to play a min-max game between two adversarial models: the generator is trained to generate high-quality data to fool the discriminator, while the discriminator is trained to discriminate between real data and generated data [6]. [18] proposed an adversarial auto-encoder (AAE) in which the encoder learned, through adversarial learning, a latent representation that follows an arbitrary prior distribution, while the decoder mapped the imposed prior to the reconstructed data distribution by minimizing the reconstruction loss.
Researchers have tried to exploit GANs to improve the performance of recommender systems by using the discriminator to distinguish true instances from fake instances in either the data space or the latent space. In the data space, [22] used the GAN mechanism to generate an item that a user may prefer, and obtained impressive results; [7] enhanced the pairwise ranking method by performing adversarial training on the user and item latent representations. In the latent space, [21] used adversarial learning to seek a common subspace of different models for retrieval; [12] used the discriminator to discriminate between different latent representations of the same user generated by two generators. [4] followed the direction of real-valued vector-wise GAN in order to fully exploit the advantage of adversarial training for higher recommendation accuracy in CF. Unlike these GAN-based recommendation methods, our model plays a min-max game between the user-item rating relation and review relation, instead of between different latent representations of a user or an item.
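For reference, the min-max game of [6] is commonly written as the value function below. The symbols are the standard ones from the GAN literature rather than notation defined in this paper: $G$ is the generator, $D$ the discriminator, $p_{\mathrm{data}}$ the real data distribution, and $p_z$ the noise prior.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator maximizes $V$ by assigning high probability to real samples and low probability to generated ones, while the generator minimizes $V$ by producing samples that the discriminator scores as real.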
Model
As shown in Fig. 1, our model (aae-RS for short) is composed of three auto-encoders, namely

The framework of our model aae-RS.
Let Y be a triplet set in which each triplet
Taking the obtained binary vector
In the above method, we actually employ a deep auto-encoder to construct the review representations. The encoder part consists of non-linear hidden layers from the first hidden layer to the
Obviously, we do not need to recommend an item to a user if the user has already written a review on the item, so a review alone cannot be used to generate recommendations. In addition, we need the representations of users and items. In the next section, we use the rating information of users on items to represent users and items.
Representations of users and items
The rating matrix is denoted
Taking the vector
Fusion
Given user u, item j, and a review of the user
on the item
Rating prediction
The obtained latent feature representations of user u and item
j are fed into
At last, the predicted rating of user u on item
j is obtained via a regression layer with weight matrix
As our task is rating prediction, the square loss between the predicted ratings
and the real ratings is used for training. The loss function is
Model inference
By fixing
We train
By fixing
Furthermore, in order to avoid the overfitting problem, we use the
Experiments
In this section, we conduct experiments on two real-world datasets to answer the following questions, which verify the effectiveness of our model. Q1: How does our proposed model aae-RS perform compared to other recommendation methods? Q2: How do the key components influence the performance of aae-RS? Q3: Is aae-RS sensitive to the key hyper-parameters?
Experimental settings
Data set
We adopt the Yelp dataset and the Amazon video game dataset to perform a fair evaluation of our model. Yelp is the largest review website in the USA. Users can rate merchants, submit reviews, and exchange shopping experiences on the Yelp website. The Yelp dataset used in this paper includes
Statistics of the datasets
We randomly select
Implementation details
In our experiments, the

Comparisons of aae-RS with different methods.
Performance of aae-RS compared with its two variants
We compare our aae-RS with ReDa, mDA-CF, ConvMF- and HFT on these data sets.
ReDa [24] simultaneously learns latent representations of users and items using auto-encoders only from historical user-item rating information, and the values of their dot product are the predicted ratings.
mDA-CF [14] learns latent representations of users and items using auto-encoders only from review information, and the values of their dot product are the predicted ratings.
ConvMF- [9] learns item latent features from item review documents that are close to those from the matrix factorization method. ConvMF- refers to the ConvMF model that uses a fully connected network instead of a convolutional neural network.
HFT [19] co-learns topic features from review documents and matrix factorization techniques for rating prediction.
All comparison results on RMSE are summarized in Fig. 2. In this figure, the performance of each method generally improves as the proportion of sampled training data increases on the two datasets. On Yelp, aae-RS and ReDa, both of which use an encoder-decoder to reconstruct rating information, achieve better performance. In particular, our model aae-RS achieves the best results by additionally using review information. Furthermore, we can observe that the results of mDA-CF, which only reconstructs review information, and ConvMF-, which does not reconstruct review information, are not stable across different proportions of training data. Therefore, we can conclude that it is necessary for aae-RS to reconstruct both rating and review information. On Amazon, ReDa and mDA-CF, which reconstruct only the rating information or only the review information, are not as effective as ConvMF- and HFT, which only consider the items’ review information. Our method achieves the best performance by comprehensively considering users’ ratings, items’ ratings, users’ reviews and items’ reviews.
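For readers unfamiliar with the metric, RMSE (root mean squared error) can be computed as in the short sketch below; lower values indicate better rating prediction. The function name `rmse` and the rating values are hypothetical and are not taken from the experiments.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and real ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and real ratings on a 1-5 scale.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 4.5]

score = rmse(predicted, actual)
print(round(score, 4))
```

Because the errors are squared before averaging, a single large error (the third rating above) contributes more to the score than several small ones.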
Effect of adversarial learning and noise (Q2)
To further demonstrate the contributions of adversarial learning and noise to aae-RS, we compare the performance of aae-RS with its two variants, namely aae-RS without D and aae-RS without

Training loss trend.
On Yelp, we can also see that the recommendation performance improves
significantly by adversarial learning and noise for the
We further explore the trend of the training loss to show the effect of adversarial learning in aae-RS. Let the other loss denote the sum of the three reconstruction losses and the prediction loss. We sample values of the adversarial loss and the other loss from epoch 1 to 500 on the two datasets, as shown in Fig. 3. If the adversarial loss showed a remarkable increasing or decreasing trend, the discriminator would be able to distinguish the two relations well, and the two relations could not be fused to achieve effective recommendations. Fortunately, this does not happen in Fig. 3. We can see that the other loss decreases almost monotonically and converges smoothly during training, while the adversarial loss decreases only during roughly the first 10 epochs and then stabilizes. We can therefore conclude that the results in Fig. 3 are in accordance with the expectation of the fusion strategy in our aae-RS framework.
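The link between a stable adversarial loss and indistinguishable relations can be illustrated with a generic binary cross-entropy discriminator loss. This sketch does not reproduce the exact adversarial loss of aae-RS: the function `discriminator_loss`, the labelling of the rating relation as 1 and the review relation as 0, and the scores are assumptions for illustration only.

```python
import math


def discriminator_loss(score_rating_relation, score_review_relation):
    """Binary cross-entropy of a discriminator on one pair of relations.

    Assumption: the discriminator outputs a probability in (0, 1); the
    rating relation is labelled 1 and the review relation is labelled 0.
    """
    return -(math.log(score_rating_relation)
             + math.log(1.0 - score_review_relation))


# A discriminator that separates the two relations well has a low loss.
confident = discriminator_loss(0.95, 0.05)

# A discriminator that cannot tell them apart outputs 0.5 for both,
# which gives the chance-level loss 2 * ln(2).
confused = discriminator_loss(0.5, 0.5)

print(confident, confused)
```

Under these assumptions, a discriminator loss that keeps falling means the two relations remain separable, whereas a loss that settles near the chance level suggests that the feature extractors have made them hard to distinguish.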
In this section, we investigate the influence of the hyper-parameters

The study of parameter influence of aae-RS on Yelp.

The study of parameter influence of aae-RS on Amazon.
Conclusion
In general, user-item rating information is used to make recommendations, and auxiliary information can be used to improve recommendation performance. In this paper, we propose the aae-RS model to fuse user-item rating and review information. In aae-RS, we simultaneously learn the latent factors of users and items from rating and review information. In particular, through adversarial learning, rating and review information are better fused for recommendations when the discriminator cannot distinguish them. Experiments on real-world datasets validate the superiority of our proposed framework. In future work, we will equip aae-RS with more effective neural networks, such as convolutional neural networks (CNNs), to represent user review information and make better recommendations.
Acknowledgement
This work was partly supported by the National Natural Science Foundation of China (61562009, 61420106005) and the Program of Guizhou Provincial Science and Technology Department (No. [2019]2502). This paper is extended from our paper in the 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2018).
