Heterogeneous information fusion based graph collaborative filtering recommendation

Abstract

With the rollout of 5G, graph-based recommendation algorithms have become a research hotspot. Graph neural networks encode graph structure information into node representations through iterative neighbor aggregation, which can effectively alleviate the problem of data sparsity. In addition, more and more information graphs can be used in collaborative filtering recommendation, such as user social information graphs and user or item attribute information graphs. In this paper, we propose a novel heterogeneous information fusion based graph collaborative filtering method, which models graph data from different heterogeneous graphs and combines them to enhance representation learning. Through information propagation and aggregation, our model learns latent embeddings effectively and improves recommendation performance. Experimental results on different datasets show that the proposed framework outperforms the compared baselines.

Keywords

Heterogeneous information, collaborative filtering, graph neural network, recommender systems

1. Introduction

With the application of 5G and Web 3.0, the problem of information overload has become more and more serious. In recent years, recommender systems, which provide users with personalized and accurate information by analyzing their behaviors, have played an increasingly important role in various online services. Traditional recommendation methods are roughly divided into three categories: collaborative filtering (CF)-based [11], content-based [24] and hybrid [27]. User-item interaction data can be explicit feedback (ratings, reviews, etc.) or implicit feedback (browsing, clicks, etc.). CF-based methods use historical interactions to construct and factorize the interaction matrix, obtain the latent features of users and items, and then make predictions and recommendations [40].

In recent years, graph learning (GL) methods have developed rapidly and have shown good application prospects [8]. Graph learning is an emerging artificial intelligence technology, which is essentially machine learning on graph-structured data. Wang et al. [33] surveyed the application of graph learning methods in recommender systems. Most data in recommender systems have a graph structure: for example, users’ ratings (or clicks) on items essentially form a bipartite graph, and attribute information relates users (or items) to one another. These structural properties determine the complex relations between objects (users and items) and should be considered when designing recommendation algorithms. As one of the most promising machine learning approaches, graph learning has natural advantages in learning complex relations between objects and has shown great potential for acquiring knowledge from various kinds of graph-structured data. For example, methods based on random walks [23], graph representation learning [19], and graph neural networks [41] have been widely used to learn specific types of relationships on graph-structured data. Therefore, it is natural to apply graph learning methods to recommender systems.

Graph neural networks (GNNs) [5, 34], which combine node information with topological structure, have attracted strong interest in academia and industry in recent years. Owing to their excellent performance on graph-structured data, GNNs have been widely used in many fields [16, 25, 38, 42, 43, 44], such as knowledge graphs, recommender systems, computer vision, natural language processing and social networks. With good interpretability and superior performance, GNNs have become a popular graph analysis method with broad research prospects. In recommender systems, in addition to the problem of data sparsity, the following challenges remain: how to learn users’ latent preferences for items effectively, and how to extract latent embeddings from interaction history and side information (e.g., social relations or item attributes). Since most data in recommender systems are essentially graph-structured and GNNs have great advantages in representation learning on such data, GNNs are a natural fit for these challenges.

Although graph learning technology has made some progress, how to apply it to recommender systems remains an active research topic, and how to use different heterogeneous graphs to build a unified model of users’ preferences is still underexplored.

Figure 1. Heterogeneous information graph.

To sum up, in order to build an efficient personalized recommender system, this paper integrates graph neural networks into recommendation scenarios. It aims to address key problems of traditional recommender systems, namely cold start (for both users and items) and data sparsity (user-item interaction data are very sparse), and to integrate heterogeneous information (user social relations and correlation information between items) into the recommender system to improve recommendation performance, as shown in Fig. 1.

To tackle these problems, we construct a novel heterogeneous information fusion-based graph collaborative filtering method, which fuses heterogeneous information with user-item interactions to enhance recommendation accuracy. The major contributions are as follows:

• We construct a new heterogeneous information fusion-based graph collaborative filtering method, which integrates social information (e.g., friends or colleagues) and item attribute information (e.g., price or category) with interaction information (e.g., ratings or clicks) to learn latent embeddings effectively and accurately. Social information helps model users’ preferences by diffusing social influence, and item attribute information helps infer users’ preferences by diffusing item attribute relations.

• We adopt information propagation and aggregation to integrate heterogeneous information with interactions and extract the latent embeddings, which enhances recommendation performance.

• We conduct an empirical analysis on different datasets and compare the proposed framework with its variants. Extensive results verify its effectiveness.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes the proposed framework. Section 4 presents extensive experiments. Finally, Section 5 concludes and discusses future directions.

2. Related works

In this section, we review the literature related to our work: network representation learning, social recommendation and graph neural networks.

2.1 Representation learning-based recommendation

Network representation learning is an important branch of graph data mining and complex network analysis. It has developed rapidly in recent years and has attracted many researchers in the field of recommendation, where recommendation based on network representation learning has become an emerging research direction [3]. Most traditional recommendation methods generate recommendations based on user-item interaction data, item features and other factors such as time and location [28]. CF-based or content-based methods can perform general recommendation tasks, while hybrid methods integrate multi-source heterogeneous side information to alleviate the cold start and matrix sparsity problems to a certain extent.

The essence of network representation learning is to map network nodes to a low-dimensional vector space while preserving the network topology [31]. Many works have shown that recommendation based on network representation learning significantly improves over traditional models. At present, such methods mainly include factorization machine models [35], tree models for recommendation [45] and deep learning models [39]. For instance, RNNs model user interaction sequences for sequential recommendation [20]; CNNs model users’ latent preferences by treating user interaction data or features as images [13]; and GNNs build a graph and propagate information over it to obtain node embeddings for recommendation [17, 26].

2.2 Social-based recommendation

Social-based recommendation methods exploit users’ social relationships: by using the preferences of a user’s neighbors, they reduce the sparsity of user behavior data and learn more accurate user preferences. Social recommendation [15, 32] refers to obtaining social behavior data from social networks, social media, personal blogs, online communities and other media, and then integrating these data with users’ historical ratings in a recommender system to generate recommendations.

Guo et al. proposed the TrustSVD model, which introduces both the explicit and implicit influence of user trust [6]. The model is based on the classic recommendation method SVD++ [14]: it adds the trust relationship to SVD++ and factorizes the user trust matrix to improve recommendation performance. Jamali et al. proposed the TrustMF model [2]. They argued that connected users tend to have similar interests, so a user’s embedding should be close to the embeddings of the users he or she trusts, which further improves the accuracy of social recommendation. Ma et al. proposed the SocialMF model [12]. They introduced social matrix factorization into the traditional matrix factorization method and simultaneously factorized the rating matrix and the user trust matrix to represent users’ latent preferences. Because user social relations naturally form graph-structured data, some scholars have also applied graph neural networks to social recommendation. Wu et al. proposed the influence diffusion and interest diffusion neural network (DiffNet++) [18], which uses a graph neural network to simulate the diffusion of social influence and user-item interest in social recommendation, making fuller use of social relations and improving the accuracy of recommendation results.

2.3 GNN-based recommendation

In recent years, graph neural networks have shown superior performance on graph learning tasks and have been studied in both theory and application. Traditional deep learning algorithms have made great progress in extracting features from data in Euclidean space, and many deep learning-based recommendation algorithms have been proposed, such as DeepFM [7], CDL [36] and AutoRec [29]. However, the complexity of graphs prevents traditional deep learning algorithms from learning graph representations efficiently. The key point of graph learning is how to describe graph topology and node attributes in the form of vectors.

GCN [16] uses graph structure information for node representation learning. It adopts a multilayer graph convolutional network for information propagation and made a breakthrough in semi-supervised graph learning. GCN assigns edge weights based on the out-degree and in-degree of the nodes in the graph. The GAT model [4] uses multi-head self-attention to give different weights to different edges: the representation vectors of two nodes are fed into a self-attention network to compute the weight of the edge between them. LightGCN [9] keeps only the neighborhood aggregation operation of GCN, propagates embeddings linearly on the interaction graph to extract the latent embeddings of nodes, and then fuses the embeddings of all layers to obtain the final features for recommendation, thereby reducing model complexity. In 2017, Berg et al. proposed the GC-MC (graph convolutional matrix completion) model [1], which regards users and items as nodes and ratings as labels on the interaction edges. The model casts rating prediction as a link prediction task and builds a graph autoencoder-based recommendation method: the encoder uses GCN to aggregate the neighbor information of nodes, and the decoder predicts the edge labels.

3. Methodology

In this section, we present the details of our framework and describe the process of fusing different heterogeneous information. Then, we give the complexity analysis.

3.1 Model overview

Our model uses three kinds of heterogeneous information: rating information, social information and item attribute information. Let $R$ denote the rating information, i.e., the user-item rating matrix of users’ ratings on items. In real-world scenarios, we usually use implicit feedback (e.g., click, watch, or purchase) to build the rating matrix, where $r=1$ if the user has clicked or watched the item and $r=0$ otherwise. From the rating matrix we construct the interaction graph. Let $S$ denote the social information, which describes the social relationships among users: $s=1$ if two users have a social relationship and $s=0$ otherwise. From the social information we construct the user social graph. Let $X$ denote the item attribute information, which describes the relationships among items: $x=1$ if two items share the same category or price and $x=0$ otherwise. From the item attribute information we construct the item attributed graph.

Then, three heterogeneous information graphs can be defined as follows:

Definition 1: Interaction Graph $G_{r}$. We convert the historical rating matrix into a bipartite graph $G_{r}=\{U,V\}$ that represents the interactions between users and items, where $U$ denotes the user set and $V$ denotes the item set. $r_{ij}=1$ indicates that user $u_{i}$ has rated item $v_{j}$; otherwise $r_{ij}=0$.

Definition 2: User Social Graph $G_{s}$. $G_{s}=\{U,E_{u}\}$ represents the social relationships among users, where $U$ denotes the user set and $E_{u}$ denotes the edge set. $e_{ij}=1$ indicates that $u_{i}$ has a social relation with $u_{j}$; otherwise $e_{ij}=0$.

Definition 3: Item Attributed Graph $G_{v}$. $G_{v}=\{V,E_{v}\}$ represents the relationships among items, where $V$ denotes the item set and $E_{v}$ denotes the edge set. $e_{ij}=1$ indicates that items $v_{i}$ and $v_{j}$ share an attribute such as category or price.

We formulate the task as follows: given the heterogeneous graphs $G_{s}$, $G_{r}$ and $G_{v}$, our goal is to predict users’ preferences for items, i.e., $(G_{r},G_{s},G_{v})\to\hat{R}_{ui}$.
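As a minimal sketch of Definitions 1 to 3, the three graphs can be stored as binary adjacency matrices. The toy matrices and the helper name `neighbors` are illustrative assumptions and are not data or code from the paper.

```python
import numpy as np

# Toy implicit-feedback matrix R (3 users x 4 items): r_ij = 1 if u_i interacted with v_j.
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]])

# Toy social matrix S (3 x 3): entry (i, j) is 1 if u_i has a social relation with u_j.
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Toy item-attribute matrix X (4 x 4): entry (i, j) is 1 if v_i and v_j share an attribute.
X = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])

def neighbors(adj, node):
    """Return the indices of the nodes adjacent to `node` in a binary adjacency matrix."""
    return np.flatnonzero(adj[node]).tolist()

N_u0 = neighbors(R, 0)      # items that user 0 interacted with in G_r
N_i1 = neighbors(R.T, 1)    # users who interacted with item 1 in G_r
S_u1 = neighbors(S, 1)      # social neighbors of user 1 in G_s
A_i0 = neighbors(X, 0)      # attribute neighbors of item 0 in G_v
```

Real datasets of the size used in the experiments would normally use sparse matrices instead of dense arrays; dense arrays are used here only to keep the example short.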

Figure 2. The overview of framework.

The model framework is shown in Fig. 2. The proposed model consists of four main parts: 1) the embedding layer, which parameterizes each node while maintaining the topology of the graph; 2) the aggregation layer, which updates node representations by propagating embeddings from their neighbors. In this component, three aggregations extract latent embeddings from the three graphs: the interaction graph aggregation extracts the latent embeddings of users and items from the interaction graph, the user social graph aggregation extracts the latent embeddings of users from the user social graph, and the item attributed graph aggregation extracts the latent embeddings of items from the item attributed graph; 3) the combination layer, which fuses the latent embeddings of users and items from the different heterogeneous graphs into their final latent embeddings; and 4) the prediction layer, which predicts the rating of a user-item pair.

3.2 Embedding layer

We map the IDs of user $u$ and item $i$ to embedding vectors $e_{u}\in\mathbb{R}^{D}$ and $e_{i}\in\mathbb{R}^{D}$, where $D$ is the embedding dimension. These embeddings are fed into the model and refined by the aggregation layer on the three graphs; this refinement step fuses different collaborative information into the embeddings. Note that the user and item nodes in $G_{r}$, $G_{s}$ and $G_{v}$ share parameters.

3.3 Aggregation layer

3.3.1 Interaction graph aggregation

Generally speaking, graph data have node features and structural features, and need to consider both node information and structural information. Graph convolutional network can automatically learn not only node features, but also the association information between nodes. The core idea of graph convolution is to use “edge information” to “aggregate” “node information” to generate a new “node representation”. We can define the aggregation as:

$\displaystyle e_{u}^{l}=\textit{AGG}\left(e_{u}^{l-1},\{e_{i}^{l-1}\mid i\in N_{u}\}\right)$ (1)

where $\textit{AGG}(\cdot)$ is the message aggregation function, which aggregates the embeddings propagated from a node’s neighbors, $N_{u}$ represents the neighbor set of user $u$, $e_{u}^{l}$ denotes the embedding of user $u$ at the $l$-th layer, and $l\geqslant 1$ indexes the embedding propagation layers.

In the graph $G_{r}$, we adopt GCN to extract the neighborhood relationship features between item $i$ and user $u$. A user’s preference is directly influenced by the items he or she has interacted with; conversely, an item’s features can be constructed from the users who have interacted with it. Embedding propagation between directly connected users and items can be seen as first-order propagation, which yields the representations of user $u$ and item $i$ from the messages aggregated over their directly connected neighbors. Let $e_{u\leftarrow i}^{0}$ and $e_{i\leftarrow u}^{0}$ denote the messages propagated along edge $(u,i)$; they are defined as:

$\displaystyle e_{u\leftarrow i}^{0}=\frac{1}{|N_{u}|^{0.5}|N_{i}|^{0.5}}\left(W_{1}^{0}e_{i}^{0}+W_{2}^{0}(e_{i}^{0}\odot e_{u}^{0})\right)$ (2)

$\displaystyle e_{i\leftarrow u}^{0}=\frac{1}{|N_{u}|^{0.5}|N_{i}|^{0.5}}\left(W_{1}^{0}e_{u}^{0}+W_{2}^{0}(e_{u}^{0}\odot e_{i}^{0})\right)$ (3)

where $e_{u}^{0}$ and $e_{i}^{0}$ denote the ID embeddings of user $u$ and item $i$, respectively, and $\odot$ represents the element-wise product. The term $e_{i}^{0}\odot e_{u}^{0}$ boosts the representation ability by propagating the interaction information between them. $W_{1}^{0}$ and $W_{2}^{0}$ denote the weight matrices used to extract useful information for propagation, $|N_{u}|$ is the number of items that user $u$ has interacted with, and $|N_{i}|$ is the number of users who have interacted with item $i$.
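The message construction in Eqs. (2) and (3) can be sketched in a few lines of NumPy. The function name `message`, the identity weight matrices, the random seed and the neighbor counts are assumptions chosen to keep the toy example easy to check; they are not values from the paper.

```python
import numpy as np

def message(e_src, e_dst, W1, W2, n_u, n_i):
    """Message sent from the source node to the destination node along edge (u, i).

    Computes (W1 e_src + W2 (e_src * e_dst)) / sqrt(|N_u| * |N_i|),
    where * is the element-wise product.
    """
    norm = 1.0 / np.sqrt(n_u * n_i)
    return norm * (W1 @ e_src + W2 @ (e_src * e_dst))

D = 4
rng = np.random.default_rng(0)              # arbitrary seed
e_u, e_i = rng.normal(size=D), rng.normal(size=D)
W1, W2 = np.eye(D), np.eye(D)               # identity weights, for illustration only

m_u_from_i = message(e_i, e_u, W1, W2, n_u=2, n_i=3)   # Eq. (2): item i -> user u
m_i_from_u = message(e_u, e_i, W1, W2, n_u=2, n_i=3)   # Eq. (3): user u -> item i
```

The symmetric factor $1/\sqrt{|N_{u}||N_{i}|}$ keeps the scale of the aggregated messages comparable for nodes with very different numbers of neighbors.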

Then we can get the representations of user and item by propagating message from their neighbors. The process can be defined as:

$\displaystyle e_{u}^{1}=g\left(W_{1}^{1}e_{u}^{0}+\sum_{i\in N_{u}}e_{u\leftarrow i}^{0}\right)$ (4)

$\displaystyle e_{i}^{1}=g\left(W_{1}^{1}e_{i}^{0}+\sum_{u\in N_{i}}e_{i\leftarrow u}^{0}\right)$ (5)

where $e_{u}^{1}$ and $e_{i}^{1}$ represent the first-layer embeddings of the user and the item, respectively. $g(\cdot)$ denotes the nonlinear function applied to the aggregated messages, such as LeakyReLU, and $W_{1}^{1}$ denotes the first-layer weight matrix used to extract useful information for propagation. $e_{u}^{0}$ and $e_{i}^{0}$ denote the ID embeddings of user $u$ and item $i$, respectively, and $N_{u}$ and $N_{i}$ denote the neighbor sets of user $u$ and item $i$.

Then, we stack more embedding propagation layers so that messages from higher-order neighbors can also be propagated. The representations of user $u$ and item $i$ are defined as:

$\displaystyle e_{u}^{l}=g\left(W_{1}^{l-1}e_{u}^{l-1}+\sum_{i\in N_{u}}e_{u\leftarrow i}^{l-1}\right)$ (6)

$\displaystyle e_{i}^{l}=g\left(W_{1}^{l-1}e_{i}^{l-1}+\sum_{u\in N_{i}}e_{i\leftarrow u}^{l-1}\right)$ (7)

where $l\geqslant 1$ indexes the propagation layer.

The representations of users and items in different propagation layers emphasize messages passed over different connections and contribute differently to the user’s preference. After $L$ layers of propagation, we sum the representations of all layers to obtain the final embeddings of the user and the item:

$\displaystyle e_{u}=\sum_{l=0}^{L}e_{u}^{l},\quad e_{i}=\sum_{l=0}^{L}e_{i}^{l}$ (8)
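The layer-wise propagation of Eqs. (6) and (7) and the layer summation of Eq. (8) can be sketched in matrix form as follows. This is a sketch under stated assumptions: each layer uses one pair of weight matrices (`W1`, `W2`) for both the self-connection and the neighbor messages, the LeakyReLU slope is 0.2, and the toy data, seed and names (`propagate`, `weights`) are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def propagate(R, E_u, E_i, weights):
    """Run len(weights) propagation layers on the interaction graph and sum all layers."""
    deg_u = np.maximum(R.sum(axis=1), 1)        # |N_u|, clipped to avoid division by zero
    deg_i = np.maximum(R.sum(axis=0), 1)        # |N_i|
    A = R / np.sqrt(np.outer(deg_u, deg_i))     # 1 / sqrt(|N_u||N_i|) on each observed edge
    sum_u, sum_i = E_u.copy(), E_i.copy()       # start the layer sum with the ID embeddings
    for W1, W2 in weights:
        agg_u = A @ E_i                         # normalized sum over item neighbors, per user
        agg_i = A.T @ E_u                       # normalized sum over user neighbors, per item
        new_u = leaky_relu((E_u + agg_u) @ W1.T + (agg_u * E_u) @ W2.T)
        new_i = leaky_relu((E_i + agg_i) @ W1.T + (agg_i * E_i) @ W2.T)
        E_u, E_i = new_u, new_i
        sum_u, sum_i = sum_u + E_u, sum_i + E_i
    return sum_u, sum_i

rng = np.random.default_rng(0)                  # arbitrary seed
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)       # toy interaction matrix
D, L = 8, 3
E_u0 = rng.normal(size=(3, D))                  # layer-0 user embeddings
E_i0 = rng.normal(size=(4, D))                  # layer-0 item embeddings
weights = [(rng.normal(scale=0.1, size=(D, D)),
            rng.normal(scale=0.1, size=(D, D))) for _ in range(L)]

e_u, e_i = propagate(R, E_u0, E_i0, weights)
```

Because $e_{u}$ is constant inside the sum over $i\in N_{u}$, the element-wise term can be computed once per node as `agg_u * E_u`, which avoids forming one message per edge.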

3.3.2 User social graph aggregation

According to common sociological observations, users’ interests and hobbies are usually influenced, directly or indirectly, by their friends, and users in turn influence their friends’ interests. Users thus propagate their latent preferences to their friends. In the social graph, we also adopt a graph convolutional network to model this influence diffusion effect when learning user embeddings.

We use social information to model user embeddings, defined as:

$\displaystyle e_{s_{u}}^{l}=\textit{AGG}_{\textit{social}}\left(\{e_{b}^{l-1},\forall b\in S_{u}\}\right)$ (9)

where $e_{s_{u}}^{l}$ denotes the social embedding of user $u$ at the $l$-th layer, and $\textit{AGG}_{\textit{social}}$ denotes the social information aggregation function, such as max or average aggregation, which aggregates the latent embeddings of the user’s social neighbors. $e_{b}^{l-1}$ denotes the embedding of user $b$ at the $(l-1)$-th layer, and $S_{u}$ denotes the social neighbors of user $u$ in the social graph.

Then, we concatenate $e_{s_{u}}^{l}$ and $e_{u}^{l-1}$ to obtain the final embedding of user $u$, defined as:

$\displaystyle e_{u}^{s}=f\left(W_{s}\times[e_{s_{u}}^{l}\oplus e_{u}^{l-1}]+b_{s}\right)$ (10)

where $e_{u}^{s}$ denotes the final embedding of user $u$ in the user social graph, $f(\cdot)$ is the ReLU function, $e_{u}^{l-1}$ is the latent embedding of user $u$ at the $(l-1)$-th layer, $W_{s}$ and $b_{s}$ denote the weight and bias, respectively, and $\oplus$ denotes the concatenation operation.
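One social diffusion step, Eqs. (9) and (10), might look as follows when average aggregation is chosen for $\textit{AGG}_{\textit{social}}$. The function name `social_aggregate`, the toy social matrix, the weight initialization and the shape of `W_s` (mapping the concatenated $2D$-dimensional vector back to $D$ dimensions) are assumptions made for this sketch.

```python
import numpy as np

def social_aggregate(S, E_prev, W_s, b_s):
    """Average the social neighbors' embeddings, then fuse them with the user's own embedding."""
    deg = np.maximum(S.sum(axis=1, keepdims=True), 1)     # number of social neighbors, at least 1
    e_social = (S @ E_prev) / deg                         # Eq. (9) with average aggregation
    fused = np.concatenate([e_social, E_prev], axis=1)    # concatenation, shape (m, 2D)
    return np.maximum(fused @ W_s.T + b_s, 0.0)           # Eq. (10) with ReLU

rng = np.random.default_rng(0)                            # arbitrary seed
S = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)                    # toy social graph of 3 users
D = 4
E_prev = rng.normal(size=(3, D))                          # user embeddings of the previous layer
W_s = rng.normal(scale=0.1, size=(D, 2 * D))
b_s = np.zeros(D)

E_social = social_aggregate(S, E_prev, W_s, b_s)          # shape (3, D)
```

Replacing the mean with an element-wise maximum over the neighbor rows would give the max aggregation variant mentioned above.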

Through the layer-wise GCN operation, we simulate the social relationship diffusion process of user preferences in the user social graph and obtain the user’s final embedding.

3.3.3 Item attributed graph aggregation

In the item attributed graph, items are connected when they share an attribute such as category or price, which means that they are similar or related to each other. For example, a user who likes Huawei notebooks may also like a Huawei mouse because these items have common attributes. Thus, the relations between items provide additional information to enhance the items’ latent embeddings.

We also adopt aggregation operation to learn the item’s latent embedding from similar or related items, defined as:

$\displaystyle e_{i}^{o}=\textit{AGG}_{\textit{attribute}}\left(\{e_{j}^{o},\forall j\in A_{a}\}\right)$ (11)

where $e_{i}^{o}$ denotes the embedding of item $i$ after the aggregation operation, and $\textit{AGG}_{\textit{attribute}}$ denotes the attribute information aggregation function, such as max or average aggregation, which aggregates the latent embeddings of the item’s neighbors. $A_{a}$ denotes the neighbors of the item that share a common attribute with it in the item attributed graph.

Then, we obtain the final embedding of item, defined as:

$\displaystyle e_{i}^{a}=f({W_{a}\times e_{i}^{o}+b_{a}})$ (12)

where $e_{i}^{a}$ denotes the final embedding of the item in the item attributed graph, $f(\cdot)$ denotes the ReLU activation function, and $W_{a}$ and $b_{a}$ denote the weight and bias, respectively.
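Eqs. (11) and (12) can be sketched with a selectable aggregator, since the text allows either max or average aggregation. The function name `attribute_aggregate`, the `how` parameter, the zero vector used for items without neighbors and the toy data are assumptions of this sketch.

```python
import numpy as np

def attribute_aggregate(X, E_item, W_a, b_a, how="mean"):
    """Pool the embeddings of attribute neighbors (Eq. 11), then apply a ReLU layer (Eq. 12)."""
    pooled = np.zeros_like(E_item)
    for i in range(X.shape[0]):
        nbrs = np.flatnonzero(X[i])                 # items sharing an attribute with item i
        if nbrs.size == 0:
            continue                                # isolated item: keep a zero vector
        rows = E_item[nbrs]
        pooled[i] = rows.mean(axis=0) if how == "mean" else rows.max(axis=0)
    return np.maximum(pooled @ W_a.T + b_a, 0.0)

rng = np.random.default_rng(0)                      # arbitrary seed
X = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]])                        # toy item attributed graph
D = 4
E_item = rng.normal(size=(4, D))
W_a, b_a = np.eye(D), np.zeros(D)                   # identity weights, for illustration only

E_attr_mean = attribute_aggregate(X, E_item, W_a, b_a, how="mean")
E_attr_max = attribute_aggregate(X, E_item, W_a, b_a, how="max")
```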

3.4 Combination layer & model prediction

Given these representations of users and items from different perspectives, i.e., $e_{u}$, $e_{i}$, $e_{u}^{s}$ and $e_{i}^{a}$, we combine them to obtain the final latent embeddings, defined as:

$\displaystyle e_{u}^{\ast}=e_{u}\oplus e_{u}^{s},\quad e_{i}^{\ast}=e_{i}\oplus e_{i}^{a}$ (13)

Having obtained the final latent embeddings of users and items through embedding propagation and aggregation, we use the inner product to predict a user’s preference for an item, defined as:

$\displaystyle\hat{r}_{ui}=e_{u}^{*T}e_{i}^{*}$ (14)
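A small worked example of Eqs. (13) and (14), with hand-picked vectors so that the arithmetic can be followed by eye. The function name `predict` and the numbers are illustrative assumptions.

```python
import numpy as np

def predict(e_u, e_u_s, e_i, e_i_a):
    """Concatenate the two views of the user and of the item, then take the inner product."""
    e_u_star = np.concatenate([e_u, e_u_s])     # Eq. (13), user side
    e_i_star = np.concatenate([e_i, e_i_a])     # Eq. (13), item side
    return float(e_u_star @ e_i_star)           # Eq. (14)

e_u = np.array([1.0, 2.0])       # user embedding from the interaction graph
e_u_s = np.array([0.5, 0.0])     # user embedding from the social graph
e_i = np.array([3.0, 1.0])       # item embedding from the interaction graph
e_i_a = np.array([2.0, 4.0])     # item embedding from the item attributed graph

score = predict(e_u, e_u_s, e_i, e_i_a)
print(score)  # 6.0 = (1*3 + 2*1) + (0.5*2 + 0*4)
```

With concatenation, the score decomposes into the interaction-graph inner product plus the inner product between the social user embedding and the attribute item embedding, so both embeddings of a pair must have matching dimensions.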

3.5 Optimization

We focus on users’ implicit feedback, so we adopt the pairwise BPR loss as the objective function to learn the model parameters. The core idea of the BPR loss is that an observed user-item pair should receive a higher predicted value than its unobserved counterparts:

$\displaystyle{\cal L}_{\textit{BPR}}=\sum_{(u,i,j)\in O}-\ln\sigma(\hat{r}_{ui}-\hat{r}_{uj})+\lambda\|\Theta\|_{2}^{2}$ (15)

where $O=\{(u,i,j)\mid(u,i)\in R^{+},(u,j)\notin R^{+}\}$ represents the pairwise training data, $R^{+}$ denotes the observed interactions, $\sigma(\cdot)$ denotes the sigmoid function, i.e., $\sigma(x)=1/(1+\exp(-x))$, $\Theta$ denotes the trainable model parameters, and $\lambda$ is the coefficient that controls the strength of the $\ell_{2}$ regularization.
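The BPR objective of Eq. (15) can be evaluated for a batch of sampled triples as sketched below. The function name `bpr_loss`, the default value of `lam` and the example scores are assumptions; the identity $-\ln\sigma(x)=\ln(1+e^{-x})$ is used so that the loss stays numerically stable for large score differences.

```python
import numpy as np

def bpr_loss(r_pos, r_neg, params, lam=1e-4):
    """Sum of -ln(sigmoid(r_ui - r_uj)) over the batch plus lam * ||Theta||_2^2."""
    diff = np.asarray(r_pos, dtype=float) - np.asarray(r_neg, dtype=float)
    ranking = np.logaddexp(0.0, -diff).sum()          # ln(1 + exp(-diff)) = -ln(sigmoid(diff))
    reg = lam * sum(float(np.sum(p ** 2)) for p in params)
    return float(ranking + reg)

# Predicted scores for observed items i and sampled unobserved items j of the same users.
r_ui = [2.0, 0.5, 1.0]
r_uj = [0.0, 1.0, 1.0]
theta = [np.ones((2, 2)), np.ones(3)]                 # stand-in for the trainable parameters

loss = bpr_loss(r_ui, r_uj, theta)
```

A triple whose observed item is scored well above the unobserved one contributes almost nothing to the loss, while a wrongly ordered triple contributes roughly the size of the score gap.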

We adopt dropout and the Adam (adaptive moment estimation) optimizer to train the model and update the parameters. During training, dropout randomly drops some messages and nodes, so that only part of the parameters is updated in each step.

3.6 Complexity analysis

Next, we present the complexity analysis. For the three heterogeneous graphs, the main computational cost lies in the aggregation process. Given $m$ users and $n$ items, let $|\varepsilon_{1}|$, $|\varepsilon_{2}|$ and $|\varepsilon_{3}|$ denote the numbers of edges in the interaction graph, the social graph and the item attributed graph, respectively, let $D$ denote the embedding size, $b$ the size of each training batch, $s$ the number of training epochs, and $L$ the number of GCN layers. For the user-item interaction graph and the user social graph, the time complexities of the GCN process are $O\left(2LDs\frac{|\varepsilon_{1}|}{b}\right)$ and $O\left(LDs\frac{|\varepsilon_{2}|}{b}\right)$, respectively. For the item attributed graph, the time complexity of the aggregation process is $O\left(Ds\frac{|\varepsilon_{3}|}{b}\right)$. In total, the complexity of the proposed model is approximately $O\left(2LDs\frac{|\varepsilon_{1}|}{b}+LDs\frac{|\varepsilon_{2}|}{b}+Ds\frac{|\varepsilon_{3}|}{b}\right)$.
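To see how the three terms compare, the expression inside the $O(\cdot)$ can be evaluated directly. The function name `aggregation_cost` and the edge counts below are hypothetical round numbers chosen for easy arithmetic; the result is a relative operation count, not a runtime.

```python
def aggregation_cost(L, D, s, b, e1, e2, e3):
    """Evaluate 2*L*D*s*|e1|/b + L*D*s*|e2|/b + D*s*|e3|/b."""
    interaction = 2 * L * D * s * e1 / b    # user-item interaction graph (users and items)
    social = L * D * s * e2 / b             # user social graph
    attribute = D * s * e3 / b              # item attributed graph (single aggregation)
    return interaction + social + attribute

# Hypothetical setting: 3 layers, 64-dimensional embeddings, 1 epoch, batch size 1024.
cost = aggregation_cost(L=3, D=64, s=1, b=1024, e1=1024, e2=2048, e3=4096)
print(cost)  # 1024.0 = 384 + 384 + 256
```

The cost grows linearly in the number of edges of each graph, in the embedding size and in the number of layers, so denser side-information graphs add cost proportionally.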

4. Experiment

In this section, to validate the overall effectiveness of the proposed framework, we conduct a series of experiments. First, three datasets are introduced, and then the evaluation metrics and comparison algorithms are given. Finally, the experimental results and analysis are presented.

4.1 Experimental settings

4.1.1 Datasets

We select three different datasets to validate the effectiveness of our framework. Table 1 shows the statistics of the three datasets.

Table 1
Statistics of Epinions, Yelp and Ciao, where rating density and social density are calculated by using #ratings/(#users $\times$ #items) and #social relations/(#users $\times$ #users), respectively

Dataset	#Users	#Items	#Ratings	#Social relations	Rating density (%)	Social density (%)
Epinions	18,202	47,449	298,173	355,813	0.035	0.041
Yelp	17,237	38,342	185,869	143,765	0.048	0.028
Ciao	7,317	104,195	283,319	111,781	0.037	0.21

Ciao is a consumer review website with a social network, where users can rate and comment on products and establish social relationships with each other. The dataset contains 7,317 users and 104,195 items, with a total of 283,319 user ratings and 111,781 social connections. The rating density and social density are 0.037% and 0.21%, respectively.
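The density formulas given in the caption of Table 1 can be checked against the Ciao counts with a few lines of Python. The function name `density_percent` is an assumption; the counts are those reported for Ciao.

```python
def density_percent(count, rows, cols):
    """Density in percent: count / (rows * cols) * 100."""
    return 100.0 * count / (rows * cols)

# Ciao: 7,317 users, 104,195 items, 283,319 ratings, 111,781 social relations.
ciao_rating = density_percent(283_319, 7_317, 104_195)   # #ratings / (#users x #items)
ciao_social = density_percent(111_781, 7_317, 7_317)     # #social relations / (#users x #users)

print(f"{ciao_rating:.3f}% {ciao_social:.2f}%")  # 0.037% 0.21%
```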

Epinions is a review website where users can rate items and establish social relationships. After registering, users can post their own comments on products, movies or other users’ reviews. Each user maintains a “trust” list and a “distrust” list, which capture the social information between users. Based on this social information, the website ranks users’ comments on items so that the opinions of the most trusted users are shown first. In this experiment, the trust relationships are used and the distrust relationships are ignored. The dataset contains 18,202 users and 47,449 items, with a total of 298,173 user ratings and 355,813 social connections. The rating density and social density are 0.035% and 0.041%, respectively.

Yelp is the largest review website in the United States. It provides users with review services for restaurants, hotels and travel, and users can rate and comment on businesses through the website. The Yelp dataset collects users’ ratings, comment records and social relationships. The dataset contains 17,237 users and 38,342 items, with a total of 185,869 user ratings and 143,765 social connections. The rating density and social density are 0.048% and 0.028%, respectively.

4.1.2 Evaluation metrics

To evaluate the Top-k recommendation quality of our framework, we adopt two commonly used metrics: NDCG@k and HR@k. The former measures the positions of the items that the user prefers in the ranking list, and the latter measures the number of items that the user prefers in the ranking list. To further assess the performance of our framework, we add another metric, Recall@k, which is the proportion of a user’s positive samples that are correctly predicted as positive and mainly measures the ranking performance of the model. Larger values of these metrics indicate better performance. For each user, we randomly select 100 negative samples and combine them with the user’s positive samples to calculate the evaluation metrics.
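The three metrics can be computed per user from a ranked candidate list as sketched below. The definitions are one common convention and are assumptions here: HR@k is taken as a hit indicator (1 if any preferred item is in the top k), NDCG@k uses binary relevance, and `positives` is assumed to be non-empty. The function names and the example list are illustrative.

```python
import math

def hr_at_k(ranked, positives, k):
    """1.0 if at least one positive item appears in the top-k list, else 0.0."""
    return float(any(item in positives for item in ranked[:k]))

def recall_at_k(ranked, positives, k):
    """Fraction of the user's positive items that appear in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in positives)
    return hits / len(positives)

def ndcg_at_k(ranked, positives, k):
    """DCG of the top-k list with binary relevance, divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in positives)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(positives))))
    return dcg / ideal

ranked = ["a", "b", "c", "d", "e"]      # candidates sorted by predicted score, best first
positives = {"b", "e"}                  # items the user actually prefers

hr = hr_at_k(ranked, positives, k=3)            # "b" is in the top 3
recall = recall_at_k(ranked, positives, k=3)    # 1 of the 2 positives is in the top 3
ndcg = ndcg_at_k(ranked, positives, k=3)        # "b" sits at rank 2 instead of rank 1
```

The reported value of each metric is then the average over all test users.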

4.1.3 Baselines

To verify the performance of our framework, we consider the following compared baselines:

• BPRMF [30]: Proposed by Rendle et al., it adopts the BPR loss to optimize MF, a frequently used collaborative filtering method.
• SoRec [21]: A PMF-based model designed by Ma et al. that factorizes the user social information and the user-item interaction information at the same time and shares the users’ latent preference features.
• TrustSVD [6]: Proposed by Guo et al., it uses direct social information to mine the latent preference features of different users. It extends SVD++ to integrate social trust relationship data.
• TrustMF [2]: Building on SoRec, it uses user social information to model and share users’ latent preference features from the perspectives of trusters and trustees.
• NFM [10]: A hybrid recommendation method that integrates user implicit feedback and item content information through a factorization machine and a neural network.
• NeuMF [37]: Proposed by He et al., it uses a neural network to extract the latent features of historical interactions.
• ContextMF [22]: A collective MF method that combines social information and context information for recommendation.

4.1.4 Parameter setting

The embedding dimension is selected from {16, 32, 64, 128, 256}, the learning rate is set to 0.001 and the mini-batch size is set to 1024. The number of layers of the neural component and the regularization coefficient are set to 3 and 1e-4, respectively. We adopt the early stopping strategy and the Adam optimizer. The parameters of the comparison methods are set following the corresponding papers and tuned to achieve their best performance.

For the three datasets, we keep items with more than 2 rating records and users with more than 2 social connections and 3 rating records, and remove users and items with no rating records or social connections. We randomly sample 80% of the data for training, 10% for testing and 10% for validation.
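A random 80/10/10 split of the interaction records can be sketched as follows. The function name `split_interactions`, the fixed seed and the toy list of (user, item) pairs are assumptions; the filtering step described above is not included.

```python
import random

def split_interactions(pairs, seed=0):
    """Shuffle (user, item) pairs and split them 80% / 10% / 10% into train, validation, test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)          # seeded shuffle for reproducibility
    n_train = len(pairs) * 8 // 10
    n_test = len(pairs) // 10
    train = pairs[:n_train]
    test = pairs[n_train:n_train + n_test]
    valid = pairs[n_train + n_test:]            # the remainder goes to validation
    return train, valid, test

# Toy data: 10 users x 10 items, every pair observed.
pairs = [(u, i) for u in range(10) for i in range(10)]
train, valid, test = split_interactions(pairs)
```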

4.2 Experimental results

4.2.1 Overall performance

To verify the performance of our framework, we compare it with the baselines. The experimental results are shown in Fig. 3. In this experiment, Top-K is set to {5, 10, 20, 30} and the embedding dimension is set to 64.

From Fig. 3, we have the following observations:

•
BPRMF uses only the user-item interactions for factorization and adopts the inner product for recommendation, which prevents it from learning the latent factors accurately. Its recommendation performance is therefore the weakest among all baselines, which indicates the significance of side information.
•
The social-based methods (SoRec, TrustSVD and TrustMF) utilize social relationships as side information to reduce data sparsity and enhance the effectiveness of feature extraction, so their recommendation performance is better than that of BPRMF. SoRec and TrustMF are based on the MF method, while TrustSVD extends SVD++ and incorporates social trust relationship data, so it performs better than SoRec and TrustMF.
•
Although NEUMF uses only user-item interactions for recommendation, it models them with neural networks and performs better than SoRec, TrustSVD and TrustMF, which we attribute to the strong modeling ability of neural networks. NFM is a hybrid recommendation method that combines users’ implicit feedback and items’ content information through a factorization machine and a neural network, so it achieves better performance than NEUMF.
•
ContextMF is a collective MF method, which combines social information and context information to recommend. Compared with the other baselines, it achieves the best performance.
•
On all three datasets, our framework outperforms the baselines and obtains the best results, which demonstrates the importance of mining higher-order user social relationships and item attribute information. For example, compared with SoRec, TrustSVD and TrustMF on the Epinions dataset, the NDCG@20 of our proposed model improves by 21.72%, 13.87% and 16.66%, respectively. Compared with the best results of NFM and NEUMF on the Ciao dataset, the NDCG@20 of our proposed model improves by 10.94% and 12.21%, respectively. Compared with the strongest baseline, ContextMF, on the Yelp dataset, our proposed model obtains an improvement of 18.34% on NDCG@20. Our proposed model can learn more accurate latent embeddings through information propagation and aggregation, and this effective use of propagation and aggregation matters even more when user-item interactions are sparse.

Figure 3.
Performance comparison on three datasets.

4.2.2 The effect of different heterogeneous information

In this subsection, we compare the proposed model with its variants. Our proposed model mines information from different heterogeneous graphs. To compare the impact of the different kinds of heterogeneous information, we set up three variants of our proposed model, defined as follows:

•
w/o I: it removes the item attributed graph from our proposed model.
•
w/o S: it removes the user social graph from our proposed model.
•
w/o S&I: it removes both the user social graph and the item attributed graph, making it equivalent to a graph convolution network that captures only higher-order user-item information.
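
One way to organize these variants in code is with two switches that decide which graphs feed the model. This is a minimal sketch; `AblationConfig`, `VARIANTS` and the graph labels are hypothetical names, and the paper does not describe how its implementation toggles the graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Switches for the two auxiliary graphs; the interaction graph is always used."""
    use_social_graph: bool = True
    use_item_attr_graph: bool = True

    def active_graphs(self):
        graphs = ["user-item interaction"]
        if self.use_social_graph:
            graphs.append("user social")
        if self.use_item_attr_graph:
            graphs.append("item attributed")
        return graphs


# The full model and the three variants named in the text.
VARIANTS = {
    "full": AblationConfig(),
    "w/o I": AblationConfig(use_item_attr_graph=False),
    "w/o S": AblationConfig(use_social_graph=False),
    "w/o S&I": AblationConfig(use_social_graph=False, use_item_attr_graph=False),
}

for name, config in VARIANTS.items():
    print(name, "->", ", ".join(config.active_graphs()))
```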

We conduct experiments on three different datasets, and the experimental results are shown in Fig. 4. In this experiment, K is again set to {5, 10, 20, 30} and the embedding dimension is set to 64.

Figure 4.
Effect of different heterogeneous information on three datasets.

Figure 5.
Effect of embedding size on three datasets.

Figure 6.
Effect of regularization coefficient $\lambda$.

From Fig. 4, we can find:

•
Our proposed model consistently achieves better performance than its variants, which indicates that integrating the social graph and the item attributed graph helps extract better user and item latent embeddings and therefore enhances recommendation performance. For example, compared with w/o S, w/o I and w/o S&I on the Ciao dataset, the HR@10 of our proposed model improves by 10.61%, 8.2% and 21.99%, respectively.
•
The w/o S&I variant performs worst in all comparisons, which demonstrates the importance of fusing different information and verifies that different information has a different impact on extracting embeddings and improving recommendation accuracy.
•
w/o S is slightly worse than w/o I on all datasets, which indicates that the social graph carries more useful information for extracting user latent embeddings and boosting recommendation performance. For example, on the Yelp dataset, the HR@10 of the w/o I variant is 12.69% higher than that of the w/o S variant.
•
The w/o I variant performs worse than our proposed model but better than the other two variants, which means that item attribute information plays a smaller role than social information in extracting users’ latent preferences for items. Removing the item attributed information has only a small effect on recommendation performance; e.g., on the Epinions dataset, the NDCG@10 of our proposed model is 4.29% higher than that of the w/o I variant.

4.2.3 The effect of embedding size

In this subsection, we conduct comparative experiments on the impact of the embedding size, varying the dimensionality D of the embeddings on three datasets. In this experiment, K is set to 10. The results are shown in Fig. 5.

From Fig. 5, we can find:

•
Firstly, the embedding size has a certain impact on recommendation performance: as the dimension increases, the performance first improves and then degrades.
•
Secondly, as the dimension increases from 16 to 64, the predictive accuracy increases significantly and reaches its best when the embedding size is 64. As the dimension increases further from 64 to 256, the recommendation performance degrades. This indicates that a larger dimension improves performance up to a point. However, if the dimension is too large, the complexity of the model increases and the recommendation performance drops. Therefore, we need a suitable dimension to balance performance and complexity.
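
A sensitivity study of this kind amounts to a one-dimensional sweep over candidate values. The sketch below is illustrative only: `sweep` and the `train_and_eval` callback are hypothetical names, and the callback is assumed to train a model with the given embedding size and return a validation metric where higher is better.

```python
def sweep(values, train_and_eval):
    """Evaluate every candidate value and return (best_value, all_scores)."""
    scores = {value: train_and_eval(value) for value in values}
    best_value = max(scores, key=scores.get)
    return best_value, scores


if __name__ == "__main__":
    # Toy stand-in for real training: it only demonstrates the call pattern
    # and does not reproduce any number reported in the paper.
    def toy_metric(dim):
        return -abs(dim - 64)

    best_dim, all_scores = sweep([16, 32, 64, 128, 256], toy_metric)
    print("selected embedding size:", best_dim)
```

The same helper could be reused for other hyperparameters, since it only requires a list of candidates and a scoring callback.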

4.2.4 The effect of regularization coefficient $\lambda$

In this subsection, we conduct comparative experiments on the impact of the regularization coefficient $\lambda$ on the Epinions dataset. In this experiment, we set K to 20 and the embedding size to 64. The results are shown in Fig. 6.

From Fig. 6, we can find:

•
Firstly, our proposed framework is relatively sensitive to the regularization coefficient: as $\lambda$ increases, the recommendation performance first improves and then degrades.
•
Secondly, as $\lambda$ increases from 1e-6 to 1e-3, the predictive accuracy increases slightly and reaches its best when $\lambda$ is 1e-3. When $\lambda$ exceeds 1e-3, the predictive accuracy degrades. This means that a small regularization coefficient yields good performance, whereas if the coefficient is too large, the predictive accuracy of our framework degrades rapidly. A large $\lambda$ therefore has a negative impact on model performance and is not encouraged, so we set the regularization coefficient $\lambda$ to 1e-3.

5. Conclusion

In this work, we presented a new heterogeneous information fusion-based graph collaborative filtering recommendation method, which extracts embeddings from the social graph, the historical interaction graph and the item attributed graph. Through information propagation and aggregation, our model obtains the embeddings of users and items effectively and enhances recommendation performance. Firstly, we adopted a graph convolution network to capture the latent relations in the historical interaction graph. Then we modeled the influence diffusion effect on the user social graph to enhance users’ latent factors, and modeled items’ latent embeddings through aggregation operations to enhance their latent representations. Finally, we conducted comparative experiments on three different datasets, and the results demonstrated the excellent performance of our framework.

In future work, we plan to utilize more heterogeneous data, such as review information or sequential information, to learn the latent preferences of users. In addition, group recommendation is a trend in the development of recommender systems. How to analyze complex group relationships, group scenarios and group preferences, and how to build a more accurate and intelligent group recommendation model, will be a future research direction. Finally, our framework assumes that the user’s preference is static, but the user’s preference shifts over time. How to model dynamic graphs to enhance recommendation accuracy and improve user satisfaction and experience will be another future research direction.

Acknowledgments

We thank the anonymous referees for their helpful comments and suggestions on the initial version of this paper.

References

1. Berg, Kipf and Welling, Graph convolutional matrix completion, in: International Conference on Learning Representations, 2017, pp. 123–138.

2. Yang, Lei, Liu and Li, Social collaborative filtering by trust, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(8) (2017), 1633–1647.

3. Cheng et al., Wide & Deep learning for recommender systems, in: Deep Learning for Recommender System, ACM, 2016, pp. 7–10.

4. et al., Graph contextualized self-attention network for session-based recommendation, in: IJCAI, 2019, pp. 3940–3946.

5. Fan et al., Graph neural networks for social recommendation, in: The World Wide Web Conference, 2019, pp. 1–14.

6. Guo, Zhang and Smith N.Y., TrustSVD: Collaborative filtering with both the explicit and implicit influence of user trust and of item ratings, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 29(1), 2015, pp. 123–129.

7. Guo et al., DeepFM: A Factorization-Machine based Neural Network for CTR Prediction, in: IJCAI, 2017, pp. 2782–2788.

8. Hamilton W.L., Graph representation learning, Synthesis Lectures on Artificial Intelligence and Machine Learning 14(3) (2020), 1–159.

9. et al., LightGCN: Simplifying and powering graph convolution network for recommendation, in: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2020, pp. 356–379.

10. and Chua, Neural factorization machines for sparse predictive analytics, in: Proceeding of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 355–364.

11. Ilhami and Su, Film recommendation systems using matrix factorization and collaborative filtering, in: International Conference on Information Technology Systems & Innovation (ICITSI), 2014, pp. 1–6.

12. Jamali and Ester, A matrix factorization technique with trust propagation for recommendation in social networks, in: Proceedings of the Fourth ACM Conference on Recommender Systems, 2010, pp. 135–142.

13. Kim et al., Convolutional matrix factorization for document context-aware recommendation, in: Proceedings of the 10th ACM Conference on Recommender Systems, ACM, 2016, pp. 233–240.

14. Koren, Factor in the neighbors: scalable and accurate collaborative filtering, in: ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 4(1), 2010, pp. 1–24.

15. King, Lyu M.R. and Ma, Introduction to social recommendation, in: Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 1355–1356.

16. Kipf and Welling, Semi-supervised classification with graph convolutional networks, in: International Conference on Learning Representations, 2017, pp. 1–14.

17. Liu, Enhancing graph neural networks for recommender systems, in: The 43rd International ACM SIGIR Conference, ACM, 2020, pp. 356–367.

18. et al., DiffNet++: A neural influence and interest diffusion network for social recommendation, IEEE Transactions on Knowledge and Data Engineering 34(10) (2020), 4753–4766.

19. Kartik et al., A survey of graph neural networks for social recommender systems, ACM Transactions on Recommender Systems 1(1) (2022), 1–31.

20. Quadrana, Cremonesi and Jannach, Sequence-aware recommender systems, ACM Computing Surveys (CSUR) 51(4) (2018), 1–36.

21. Yang, Lyu and King, SoRec: social recommendation using probabilistic matrix factorization, in: Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 931–940.

22. Jiang et al., Scalable recommendation with social contextual information, in: TKDE 26, Vol. 26, 2014, pp. 2789–2802.

23. Nikolakopoulos and Karypis, RecWalk: nearly uncoupled random walks for top-n recommendation, in: International Conference on Web Search and Data Mining, ACM, 2019, pp. 150–158.

24. Pazzani M.J. and Billsus, Content-based recommendation systems, The Journal of Adaptive Web, Springer, Berlin, Heidelberg 4321 (2007), 325–341.

25. et al., 3D graph neural networks for RGBD semantic segmentation, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5199–5208.

26. and Zeng, Auxiliary stacked denoising autoencoder based collaborative filtering recommendation, KSII Transactions on Internet and Information Systems 14(6) (2020), 2310–2332.

27. Smith, Weeks, Jacob, Freeman and Magerko, Towards a hybrid recommendation system for a sound library, in: IUI Workshops, 2019, pp. 1–6.

28. Smirnova and Vasile, Contextual sequence modeling for recommendation with recurrent neural networks, in: Proceedings of ACM Recommender Systems Conference, Como, Italy, 2017, pp. 1–8.

29. Sedhain, Menon and Xie, AutoRec: Autoencoders meet collaborative filtering, in: Proceedings of the 24th International Conference on World Wide Web, ACM, 2015, pp. 111–112.

30. Rendle et al., BPR: Bayesian personalized ranking from implicit feedback, in: Association for Uncertainty in Artificial Intelligence, 2009, pp. 452–461.

31. Tang et al., Line: Large-scale information network embedding, in: The 24th International Conference on World Wide Web, ACM, 2015, pp. 123–138.

32. Tang and Liu, Social recommendation: A review, Social Network Analysis and Mining 3 (2013), 1113–1133.

33. Wang et al., Graph Learning Approaches to Recommender Systems: A Review, in: Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), 2020, pp. 4644–4652.

34. et al., Session-based recommendation with graph neural networks, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI, 2019, pp. 1–12.

35. and Qin, Graph convolutional network for recommendation with low-pass collaborative filters, in: ICML, 2020, pp. 10936–10945.

36. Wang and Yeung, Collaborative deep learning for recommender systems, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2015, pp. 1235–1244.

37. et al., Neural collaborative filtering, in: Proceedings of the 26th International Conference on World Wide Web, WWW, 2017, pp. 173–182.

38. Yao, Mao and Luo, Graph convolutional networks for text classification, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 7370–7377.

39. Yang and Dong, HAGERec: Hierarchical attention graph convolutional network incorporating knowledge graph for explainable recommendation, Knowledge-Based Systems 24 (2020), 106–194.

40. Zhou, Yang S.H. and Zha, Functional matrix factorizations for cold-start recommendation, in: International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2011, pp. 315–324.

41. et al., A comprehensive survey on graph neural networks, IEEE Transactions on Neural Networks and Learning Systems 32 (2021), 4–24.

42. Zhang et al., Efficient probabilistic logic reasoning with graph neural networks, in: International Conference on ICLR 2020, 2020, pp. 1–20.

43. Zhang and Chen, Inductive Matrix Completion Based on Graph Neural Networks, in: International Conference on Learning Representations, 2020, pp. 1–10.

44. Tang and Mei, Deep collaborative embedding for social image understanding, IEEE Transactions on Pattern Analysis and Machine Intelligence 41(9) (2018), 2070–2083.

45. Wang et al., Joint representation learning with ratings and reviews for recommendation, Neurocomputing 425 (2021), 181–190.
