Dual attentive graph convolutional networks for cross-domain recommendation

Abstract

Cross-domain recommendation aims to alleviate the data sparsity problem in a target domain by leveraging knowledge from a source domain. Existing GCN-based approaches perform graph convolution in each domain separately. As a result, they neglect the direct effect of the source domain's item features and topological structure on modeling user preferences in the target domain. In this paper, we propose a novel Dual Attentive Graph Convolutional Network for Cross-Domain Recommendation (DAG4CDR). Specifically, we integrate the interaction data of the source and target domains into a unified user-item bipartite graph and perform GCN propagation over it, so that the interactions from both domains are leveraged to learn user and item embeddings. In the embedding aggregation phase, the messages passed from items of the two domains to users are weighted by a dual attention mechanism, which considers the contributions of different items at both the node level and the domain level. Extensive experiments on several publicly available datasets demonstrate the superiority of our model in preference modeling for both common and non-common users.

Keywords

Cross-domain recommendation; graph convolutional network; attention mechanism

1 Introduction

In the information era, the amount of information on the internet has grown explosively, and hundreds of millions of users go online every day. While benefiting from this vast amount of data, users find it increasingly difficult to obtain the information they want. Recommender systems have therefore become increasingly important for tackling the information overload problem on various online platforms. For example, Amazon¹ recommends products that may appeal to users based on their purchase history, and YouTube² recommends videos based on their viewing history. Among existing recommendation methods, Collaborative Filtering (CF)-based approaches have achieved great success owing to their simplicity and effectiveness in modeling user preferences and item characteristics from interaction data. However, their performance degrades sharply when interactions are sparse [1, 2].

Cross-Domain Recommendation (CDR) methods have been widely investigated in recent years to alleviate the data sparsity problem by exploiting helpful information from a related domain (i.e., the source domain) to enhance performance in a target domain. The underlying assumption of CDR is that similar users share similar preferences across different domains. For example, users who purchased soccer balls among sports products are likely to prefer soccer socks among clothing. Existing methods can be broadly classified into two categories according to how they exploit information from different domains. The first category transfers knowledge from the source domain to the target domain via collective matrix factorization [3] or by mapping user representations between the two domains [4]. Despite their success, these approaches keep the domains independent and fail to connect them directly through users or items. The second category aggregates knowledge from multiple domains by embedding all users and items into the same latent space and using the users and items shared between domains as a bridge [5]. However, these approaches do not exploit the higher-order connections between users and items. Recently, Graph Convolutional Networks (GCNs) have drawn increasing attention in CDR owing to their strong capability in modeling higher-order relations, and several GCN-based CDR methods have been proposed. For example, BiTGCF [2] exploits the higher-order connectivity between nodes in each domain and shares information between the common users of the two domains through bidirectional knowledge transfer.

Despite this progress, the direct effect of the source domain's item features and topological structure has not been explored by existing methods. Current GCN-based CDR methods first perform graph convolution in each domain separately to obtain domain-specific user and item representations, and then adopt a feature transfer technique to fuse the embeddings from different domains into the final representations of the common users [2]. We argue that this approach cannot fully exploit the information of common users when learning user and item embeddings: the items that common users interacted with are still used separately within each domain during embedding learning, and this information only updates the embeddings of the common users through late fusion. A basic assumption of CDR models is that users share some general interests across domains. Therefore, the items that common users interacted with in different domains can be leveraged to mutually enhance embedding learning in each domain, and this information should also indirectly influence the embeddings of non-common users and items through collaborative learning. Besides, most existing GCN-based CDR methods treat items equally within each domain [2, 6], passing the messages of all items to a user node without differentiating their importance. For example, PPGN [7] constructs a unified cross-domain preference graph from the user-item interactions of two domains to capture the propagation of user preferences, but it treats all items (from both domains) equally in the aggregation phase. We argue that some items are more important than others for a given user [8]. More importantly, when recommending target-domain items to a user, items from different domains may contribute differently to the prediction.
Generally, items in the target domain (e.g., Cloth) should contribute more than items from the source domain (e.g., Sport). It should also be noted that the relative contributions of the two domains may vary across users.

Inspired by these considerations, we propose a Dual Attentive Graph Convolutional Network for Cross-Domain Recommendation (DAG4CDR for short). Specifically, to leverage the collaborative information of two domains, we construct a unified interaction graph from the interaction data of both domains. On this graph, our model can simultaneously exploit the interacted nodes from the source and target domains to learn embeddings, and can therefore naturally model the direct effect of the source domain's item features and topological information. To model how different items, especially those from different domains, affect a user's preference for target-domain items, we design a novel dual attention mechanism that estimates the importance of item nodes at both the node level and the domain level. We conducted extensive experiments on several large-scale real-world datasets, comparing our method against a set of competitive single-domain and cross-domain recommendation methods. Experimental results show that our method consistently outperforms all competitors. Additional experiments on users with different sparsity levels demonstrate the advantage of leveraging auxiliary-domain information in recommendation and the effectiveness of our model. In addition, ablation studies validate the importance of differentiating the contributions of different items and domains in embedding learning.

2 Related work

Collaborative Filtering (CF) has achieved great success in recommender systems owing to its simplicity: it models user preferences purely from user-item interaction data. A representative CF-based method is matrix factorization (MF) [9–11], which learns user and item representations as latent feature vectors from the user-item interaction matrix and infers a user's preference for an item from the inner product of their feature vectors. Owing to its success, various variants have been proposed, such as probabilistic matrix factorization (PMF) [9], weighted regularized matrix factorization (WRMF) [10], and Bayesian personalized ranking (BPR) [11]. Given the powerful capability of deep learning demonstrated in various tasks, it has also been introduced to model non-linear user behaviors in recommendation [12, 13]. Besides, metric learning-based approaches replace the dot product with the Euclidean distance to capture fine-grained user preferences [14, 15].

Despite their remarkable success, the above methods suffer from data sparsity because they exploit only the first-order proximity of interaction data in a single domain. Exploiting the high-order proximity of interaction data or leveraging information from an auxiliary domain are effective ways to improve the learned feature vectors. Next, we review graph convolutional networks for recommendation and cross-domain recommendation methods.

2.1 Graph convolutional networks

In the last few years, graph convolutional networks (GCNs) have achieved great success in various tasks owing to their capability of handling non-Euclidean data [1, 16–19]. The core of a GCN is how it iteratively aggregates messages from neighboring nodes.

More recently, GCNs have also been widely applied in recommendation. For example, graph convolutional matrix completion (GC-MC) [16] performs graph convolution to exploit the first-order connectivity between neighboring nodes. To capture information from high-order neighbors, neural graph collaborative filtering (NGCF) [1] propagates embeddings over the interaction graph and achieves promising results. Observing that the feature transformation and nonlinear activation contribute little to embedding propagation in GCNs, LightGCN [18] simplifies NGCF by removing them. Besides, stacking multiple embedding propagation layers generally leads to the over-smoothing problem. To address this problem, IMP-GCN [19] performs high-order graph convolution over sub-graphs consisting of users with similar interests.

Despite their success, these recommender systems still inevitably suffer from the data sparsity problem. A practical solution is to transfer knowledge from related domains. In this paper, we propose a novel GCN-based CDR method that learns user and item representations by performing GCN operations over a unified interaction graph constructed from the interaction data of different domains.

2.2 Cross-domain recommendation

CDR aims to utilize information from a source domain to alleviate the data sparsity problem in a target domain. Most existing CDR methods bridge the two domains with transfer techniques [2, 20].

Knowledge transfer can happen explicitly through inter-domain similarities and common attributes. For example, Chung et al. [21] selected relevant items in the source domain based on attributes shared with the target domain. Alternatively, the transfer can be performed implicitly via shared user/item latent features or rating patterns. For example, Chen et al. [5] encoded users and items into domain-specific latent representations and used the common users and items to transfer knowledge between the two domains. Hu et al. [6] designed a neural network that connects two base networks through cross mappings to learn complex user-item interactions across two domains, effectively refining users' embedding vectors in the target domain with features extracted from both domains. A drawback of this approach is that knowledge transfer becomes inadequate when common users and items are sparse.

With the success of GCN techniques in modeling user preferences, several GCN-based CDR models have also been developed [2, 7]. For instance, PPGN [7] models the higher-order interactions in different domains by constructing a cross-domain preference matrix. However, existing GCN-based CDR methods often treat all neighboring nodes equally when learning the embedding of a target node, which may degrade performance. For example, different items may contribute differently to modeling a user's preferences, and items from different domains should also affect the user's preference for target-domain items differently. Therefore, we propose a GCN-based CDR model with a dual attention mechanism to capture such effects.

3 Proposed model

3.1 Preliminaries

Before describing the details of DAG4CDR, we first describe the problem setup. For a source domain S and a target domain T, we denote their user sets as $U_{S}$ and $U_{T}$, respectively. $N_t$ and $N_s$ are the numbers of users in domains T and S, and $N_i$ and $N_j$ are the numbers of items in domains T and S, respectively. The set of common users of the two domains is $U_{C} = U_{S} \cap U_{T}$, and $N_c$ denotes the number of common users. $I$ and $R_{T} \in \mathbb{R}^{N_t \times N_i}$ denote the item set and the rating matrix of domain T, and $J$ and $R_{S} \in \mathbb{R}^{N_s \times N_j}$ denote those of domain S. Taking $R_{T}$ as an example, it records the users' historical interactions in the target domain T: if a user $u \in U_{T}$ interacted with an item $i \in I$, the entry $r_{ui}$ of $R_{T}$ is non-zero; otherwise, it is zero. By exploiting the interaction data of both domains, we aim to improve the recommendation performance for users in domain T by introducing information from domain S.

We construct a unified interaction graph $G = (N, E)$ from all interaction data in domains S and T; Fig. 1 gives a toy example of how the unified graph is constructed. $N$ is the node set and $E$ is the edge set. $N$ consists of two types of nodes: users $u_{k} \in U_{S} \cup U_{T}$ with $k \in \{1, \ldots, N_s + N_t - N_c\}$ (each common user appears only once, hence the $-N_c$ term), and items, namely $i_{m} \in I$ with $m \in \{1, \ldots, N_i\}$ and $j_{n} \in J$ with $n \in \{1, \ldots, N_j\}$. Here, k indexes the users of both domains, and m and n index the items of domains T and S, respectively.
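The graph construction described above can be sketched as follows. This is a minimal illustration rather than the authors' released code: the function name `build_unified_graph`, the node keys `("u", k)`, `("T", item)` and `("S", item)`, and the assumption that a common user carries the same raw ID in both domains are hypothetical choices made for the example.

```python
from collections import defaultdict


def build_unified_graph(target_edges, source_edges):
    """Merge two domains' (user, item) interactions into one bipartite graph.

    Users are keyed by raw ID, so a user present in both domains (a common
    user) becomes a single node. Items are tagged with their domain ("T" or
    "S") because the item sets I and J are disjoint.
    Returns (user_index, adj): raw user ID -> index k, and node -> neighbour set.
    """
    user_index = {}
    adj = defaultdict(set)
    for domain, edges in (("T", target_edges), ("S", source_edges)):
        for user, item in edges:
            k = user_index.setdefault(user, len(user_index))  # reuse k for common users
            u_node, i_node = ("u", k), (domain, item)
            adj[u_node].add(i_node)  # undirected edge: store both directions
            adj[i_node].add(u_node)
    return user_index, adj


users, adj = build_unified_graph(
    target_edges=[("alice", "sock"), ("bob", "shirt")],
    source_edges=[("alice", "ball"), ("carol", "racket")],
)
print(len(users))  # 3
```

With two users per domain and one common user, the graph has 2 + 2 - 1 = 3 user nodes, matching the $N_s + N_t - N_c$ count, and the common user's node is linked to items of both domains.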

Fig. 1

A toy example of a unified interaction graph based on user interaction data in target domain T and source domain S.

3.2 Model overview

Most existing CF-based recommendation methods suffer from data sparsity, and a promising solution is to leverage information from a source domain to enhance recommendation performance for users in the target domain. In fact, different users have different preferences for items across domains [8]. Moreover, items from different domains may contribute differently to user representation learning. Therefore, a desirable model should capture users' attention on items across the target and source domains to model their preferences more accurately.

In this paper, we bridge the two domains by constructing a unified interaction graph with the common users as connectors. Given the powerful representation learning capability of GCNs, especially on non-Euclidean structured data, we use a GCN to learn the representation of each node on the constructed unified interaction graph. Specifically, we design a dual attentive embedding propagation approach to capture users' attention on items within the same domain and across domains. Its core is a novel dual attention mechanism, consisting of node-level and domain-level attention, that estimates the weights of messages passed from the items of the two domains.

3.3 Dual attentive embedding propagation

Let $e_{u} \in \mathbb{R}^{d}$, $e_{i} \in \mathbb{R}^{d}$, and $e_{j} \in \mathbb{R}^{d}$ be the embedding vectors of user $u \in U_{C}$, item $i \in I$, and item $j \in J$, respectively, where d denotes the embedding size. On the constructed interaction graph, we adopt the message-passing strategy [16] to propagate messages from high-order neighbors. Next, we detail the dual attentive embedding propagation from the perspective of the target domain T.

3.3.1 Message passing

The representation of a user $u \in U_{C}$ is obtained by accumulating the incoming messages from all item neighbors $i \in N_{i}^{u}$ and $j \in N_{j}^{u}$ on the unified interaction graph $G$, where $N_{i}^{u}$ and $N_{j}^{u}$ are the item neighbor sets of user u in domains T and S, respectively. In our model, the message passed from an item $i \in N_{i}^{u}$ of domain T to user u is defined as:

$$m_{u \leftarrow i} = \gamma_{u \leftarrow i} e_{i}, \tag{1}$$

where $\gamma_{u \leftarrow i}$ is the weight that controls the information passed from item i to user u. It is computed by the proposed dual attention mechanism, which is described in detail below. As in [18], we remove the feature transformation and the nonlinear activation function from our model.

Likewise, the message passed from user u to item i is defined as:

$$m_{i \leftarrow u} = \gamma_{i \leftarrow u} e_{u}, \tag{2}$$

where $\gamma_{i \leftarrow u}$ is computed in the same manner as $\gamma_{u \leftarrow i}$ in Eq. 1. Analogously, for an item $j \in N_{j}^{u}$, the messages $m_{j \leftarrow u}$ and $m_{u \leftarrow j}$ are formulated in the same way.
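Because the messages in Eqs. 1 and 2 are weighted copies of the sender's embedding, with no weight matrix applied, they can be sketched in a few lines. The helper names and the dictionary-based representation below are illustrative assumptions, and the weights γ are taken as given here, since they come from the dual attention mechanism.

```python
def message(gamma, sender_embedding):
    """m_{receiver <- sender} = gamma * e_sender (Eqs. 1 and 2)."""
    return [gamma * x for x in sender_embedding]


def incoming_messages(neighbours, embeddings, gammas):
    """Messages from every neighbour of one receiving node.

    neighbours: iterable of sender node IDs
    embeddings: dict sender ID -> embedding (list of floats)
    gammas: dict sender ID -> edge weight gamma for sender -> receiver
    """
    return {s: message(gammas[s], embeddings[s]) for s in neighbours}


msgs = incoming_messages(
    ["i1", "i2"],
    {"i1": [1.0, 2.0], "i2": [0.0, 4.0]},
    {"i1": 0.5, "i2": 0.25},
)
print(msgs)  # {'i1': [0.5, 1.0], 'i2': [0.0, 1.0]}
```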

Fig. 2

Node-level attention.

3.3.2 Dual attention mechanism

For the common users of the source and target domains, we design a dual attention mechanism that considers both node-level and domain-level attention when learning their embeddings, in order to differentiate the contributions of different neighbor nodes and different domains to the embedding update. Specifically, given the node-level attention weight $\gamma_{u \leftarrow i}^{N}$ and the domain-level attention weight $\gamma_{u \leftarrow I}^{D}$ for user u in domain T, the final weight is computed as:

$$\gamma_{u \leftarrow i} = \gamma_{u \leftarrow i}^{N}\, \gamma_{u \leftarrow I}^{D}. \tag{3}$$

Note that non-common users and all items have neighbors in only a single domain, so only node-level attention is used when updating their embeddings. Taking item i as an example, its attention weight is computed as:

$$\gamma_{i \leftarrow u} = \gamma_{i \leftarrow u}^{N}, \tag{4}$$

where $\gamma_{i \leftarrow u}^{N}$ denotes the node-level attention weight of user u to item i. For an item j of domain S, the node-level attention weights are denoted $\gamma_{u \leftarrow j}^{N}$ and $\gamma_{j \leftarrow u}^{N}$, and the domain-level attention weight is denoted $\gamma_{u \leftarrow J}^{D}$. The node-level and domain-level attention methods are described next.

Node-level Attention. We assume that different neighbors have different influences on the current node (user u or item i) within a domain. Based on this assumption, we develop a node-level attention method to estimate the influence of same-domain neighbor nodes on the current node. For the target domain T, the weight reflecting the contribution of each item node to the user node is formulated as:

$$\gamma_{u \leftarrow i}^{N} = \frac{\exp\left(h(e_{u}, e_{i})\right)}{\sum_{i' \in N_{i}^{u}} \exp\left(h(e_{u}, e_{i'})\right)}, \tag{5}$$

where h(·) is a similarity function between the current node and its neighbor nodes; in this work, we adopt cosine similarity. The attention weight $\gamma_{i \leftarrow u}^{N}$ of user u to item i is computed in the same way.
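Eq. 5 is a softmax over cosine similarities between the current node and its same-domain neighbors. A minimal sketch in plain Python follows; the function names are hypothetical, and subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves the softmax weights unchanged.

```python
import math


def cosine(a, b):
    """Cosine similarity h(a, b); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def node_level_attention(e_u, neighbour_embs):
    """Eq. 5: softmax of h(e_u, e_i) over the neighbours of u in one domain.

    neighbour_embs: non-empty dict item ID -> embedding.
    Returns dict item ID -> attention weight (weights sum to 1).
    """
    scores = {i: cosine(e_u, e_i) for i, e_i in neighbour_embs.items()}
    m = max(scores.values())  # shift by the max score for numerical stability
    exps = {i: math.exp(s - m) for i, s in scores.items()}
    z = sum(exps.values())
    return {i: v / z for i, v in exps.items()}


w = node_level_attention([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]})
print(round(w["a"], 4), round(w["b"], 4))  # 0.7311 0.2689
```

Because cosine similarity lies in [-1, 1], the weights in this sketch stay fairly smooth: even an orthogonal neighbor keeps a non-trivial share of the attention.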

Fig. 3

Domain-level attention.

Domain-level Attention. To fully capture users’ attention on domains T and S, we present the domain-level attention method to estimate the influence of different domains while passing messages between nodes.

To compute the domain-level attention of a domain (e.g., T) for a user, we aggregate the influence of all neighboring nodes from this domain on the user and then normalize across the two domains. This yields the domain-level attention weight $\gamma_{u \leftarrow I}^{D}$ of domain T for user u and, likewise, $\gamma_{u \leftarrow J}^{D}$ of domain S. Here, the influence of each neighbor is measured with the similarity function h(·) of Eq. 5.

3.3.3 Message aggregation

We aggregate the messages passed from neighbor nodes to update the embeddings $e_u$, $e_i$, and $e_j$ of user u and items i and j:

$$e_{u} = \mathrm{LeakyReLU}\Big(\sum_{i \in N_{i}^{u}} m_{u \leftarrow i} + \sum_{j \in N_{j}^{u}} m_{u \leftarrow j}\Big), \tag{7}$$

$$e_{i} = \mathrm{LeakyReLU}\Big(\sum_{u \in N_{u}^{i}} m_{i \leftarrow u}\Big), \tag{8}$$

$$e_{j} = \mathrm{LeakyReLU}\Big(\sum_{u \in N_{u}^{j}} m_{j \leftarrow u}\Big), \tag{9}$$

where $N_{u}^{i}$ and $N_{u}^{j}$ are the user neighbor sets of items i and j. As in previous studies, we adopt the LeakyReLU activation function because of its ability to encode both positive and small negative signals.
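The dual attentive update of a common user's embedding (Eqs. 1, 3, 5, and 7) can be sketched as below. The text describes the domain-level weight only in words, so the form used here is an assumption: we sum h(e_u, ·) over each domain's neighbors and apply a softmax across the two domains (averaging instead of summing would be an equally plausible reading). The LeakyReLU slope of 0.01 and all function names are likewise assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Similarity h(., .) of Eq. 5; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def softmax(scores):
    """Normalise a non-empty dict of scores into weights that sum to 1."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def leaky_relu(vec, slope=0.01):  # slope value is an assumption
    return [x if x > 0 else slope * x for x in vec]


def update_common_user(e_u, target_items, source_items):
    """One dual attentive update of a common user's embedding.

    target_items / source_items: dict item ID -> embedding for N_i^u and N_j^u;
    a common user has at least one neighbour in each domain.
    Returns (new embedding, domain-level weights).
    """
    # ASSUMED domain-level form: summed similarity per domain, softmax over domains.
    gamma_d = softmax({
        "T": sum(cosine(e_u, e) for e in target_items.values()),
        "S": sum(cosine(e_u, e) for e in source_items.values()),
    })
    agg = [0.0] * len(e_u)
    for domain, items in (("T", target_items), ("S", source_items)):
        gamma_n = softmax({i: cosine(e_u, e) for i, e in items.items()})  # Eq. 5
        for i, e_i in items.items():
            gamma = gamma_n[i] * gamma_d[domain]  # Eq. 3
            agg = [a + gamma * x for a, x in zip(agg, e_i)]  # sum of Eq. 1 messages
    return leaky_relu(agg), gamma_d  # Eq. 7


new_e_u, gamma_d = update_common_user([1.0, 0.0], {"a": [1.0, 0.0]}, {"b": [0.0, 1.0]})
print([round(x, 4) for x in new_e_u], round(gamma_d["T"], 4))  # [0.7311, 0.2689] 0.7311
```

Because the domain-level weights multiply the node-level weights, the combined weights over all of a common user's neighbors still sum to 1 in this sketch, so the domain that is more similar to the user dominates the aggregated message.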

Finally, the model prediction is defined as the inner product of the user and item embeddings:

$$\hat{y}_{ui} = e_{u}^{\top} e_{i}. \tag{10}$$

3.4 Model learning

In this work, we focus on the top-n recommendation task. For model training, we adopt the commonly used BPR loss as the objective function, as in previous work [13, 19]. Formally, the objective function is defined as:

$$L_{T} = \sum_{(u, i^{+}, i^{-}) \in O} -\ln \varphi\left(\hat{y}_{ui^{+}} - \hat{y}_{ui^{-}}\right) + \lambda \lVert \Theta \rVert_{2}^{2}, \tag{11}$$

where $O = \{(u, i^{+}, i^{-}) \mid (u, i^{+}) \in R^{+}, (u, i^{-}) \in R^{-}\}$ denotes the training set, $R^{+}$ is the set of observed interactions between users and items $i^{+}$ in the training data, and $R^{-}$ is the set of sampled unobserved interactions. λ is the regularization coefficient that controls the regularization strength, Θ denotes the model parameters, and φ is the activation function, for which we adopt the sigmoid function. The loss function $L_{S}$ of the source domain is defined likewise. It should be noted that the item embedding is shared in these two recommendation domains, and different user embeddings are learned for each user in each domain.
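Eq. 11 can be sketched as follows, with φ taken as the sigmoid as stated in the text. The helper names, the flat-list parameter representation, and the toy embeddings scored with the inner product of Eq. 10 are illustrative choices; −ln σ(x) is computed in an overflow-safe form that is algebraically equal to ln(1 + e^(−x)).

```python
import math


def neg_log_sigmoid(x):
    """-ln(sigmoid(x)) = ln(1 + exp(-x)), written to avoid overflow for large |x|."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(triples, score, params, lam):
    """Eq. 11: BPR loss over (u, i_pos, i_neg) triples plus L2 regularisation.

    score: function (u, i) -> predicted preference y_hat_ui
    params: flat list of all model parameters (Theta)
    lam: regularisation coefficient lambda
    """
    data_term = sum(neg_log_sigmoid(score(u, p) - score(u, n)) for u, p, n in triples)
    return data_term + lam * sum(w * w for w in params)


# Toy example: made-up embeddings scored with the inner product of Eq. 10.
user_emb = {"u1": [1.0, 0.0]}
item_emb = {"i_pos": [2.0, 0.0], "i_neg": [0.0, 1.0]}


def inner_product(u, i):
    return sum(a * b for a, b in zip(user_emb[u], item_emb[i]))


flat = user_emb["u1"] + item_emb["i_pos"] + item_emb["i_neg"]
print(round(bpr_loss([("u1", "i_pos", "i_neg")], inner_product, flat, lam=1e-4), 4))  # 0.1275
```

The loss is small here because the positive item already scores 2 points above the negative one; swapping the two items would raise the data term to about 2.13.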

Message dropout and node dropout have been successfully applied in previous GCN-based models [1], and we adopt both techniques in DAG4CDR to mitigate overfitting. Message dropout randomly drops some of the messages passed between nodes during training, whereas node dropout randomly removes some nodes, blocking all messages passing through them. The dropout ratios are carefully tuned in our experiments. Besides, mini-batch Adam is employed for model optimization and parameter updates.
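The two dropout variants can be sketched as follows. This is illustrative only: the text does not specify whether surviving messages are rescaled, so the inverted-dropout rescaling by 1/(1 − ratio) below is an assumption, as are the function names and the dict-based graph and message representations.

```python
import random


def node_dropout(adj, ratio, rng):
    """Drop each node with probability `ratio`; a dropped node sends and receives nothing.

    adj: dict node -> set of neighbour nodes (every node appears as a key).
    """
    kept = {n for n in adj if rng.random() >= ratio}
    return {n: {m for m in nbrs if m in kept} for n, nbrs in adj.items() if n in kept}


def message_dropout(messages, ratio, rng):
    """Drop each message (edge -> vector) independently with probability `ratio`.

    Survivors are scaled by 1 / (1 - ratio) (ASSUMED inverted-dropout convention).
    Intended for training only; at test time all messages are kept.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    scale = 1.0 / (1.0 - ratio)
    return {e: [scale * x for x in v] for e, v in messages.items() if rng.random() >= ratio}


rng = random.Random(0)
adj = {"u0": {"i0", "i1"}, "i0": {"u0"}, "i1": {"u0"}}
print(node_dropout(adj, 0.3, rng))
```

Node dropout removes a node together with every edge that touches it, while message dropout removes individual edges, which is why the former is the more aggressive regularizer.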

4 Experiment

In this section, we conduct extensive experiments on four real datasets to validate our model’s effectiveness and answer the following four questions:

RQ1: Does DAG4CDR outperform the SOTA single-domain and cross-domain methods for common users on the recommendation task?

RQ2: How is the performance of DAG4CDR for non-common users on the recommendation task?

RQ3: How do the interaction levels in the two domains affect the performance of DAG4CDR?

RQ4: Can the proposed dual attention mechanisms improve performance?

4.1 Experimental setup

4.1.1 Datasets

The public Amazon review dataset has been widely adopted in previous studies [2]. In our experiments, we use four categories of this dataset: Cell Phones and Accessories (Cell_Phone for short), Clothing, Shoes and Jewelry (Cloth for short), Electronics (Elec for short), and Sports and Outdoors (Sport for short); their statistics are shown in Table 1. Moreover, as shown in Table 2, we evaluate DAG4CDR on four couple datasets: Cloth & Sport, Cell_Phone & Elec, Cloth & Cell_Phone, and Sport & Cell_Phone. For all four couple datasets, we use only user IDs, item IDs, and the implicit feedback between them.

Table 1
Basic statistics of the datasets

Dataset	#user	#items	#interactions	sparsity
Cloth	39,387	23,033	278,677	99.97%
Sport	35,598	18,357	296,377	99.95%
Cell_Phone	27,879	10,429	194,439	99.93%
Elec	192,439	63,001	1,689,188	99.99%
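The sparsity column matches the standard definition, one minus the fraction of observed entries in the user-item matrix. The Cloth row can be checked as follows (the helper name is ours); the other rows check out the same way.

```python
def sparsity(n_users, n_items, n_interactions):
    """Fraction of the user-item matrix with no observed interaction."""
    return 1.0 - n_interactions / (n_users * n_items)


print(f"{sparsity(39_387, 23_033, 278_677):.2%}")  # 99.97%
```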

Table 2

Basic statistics of the couple datasets with common users

Dataset	#user	#items	#interactions	sparsity
Cloth	9,928	41,303	97,757	99.98%
Sport	9,928	32,310	102,540	99.97%
Cell_Phone	20,448	28,657	163,238	99.97%
Elec	20,448	60,756	324,344	99.97%
Cloth	5,860	30,870	55,876	99.97%
Cell_Phone	5,860	17,685	53,531	99.95%
Sport	4,998	22,101	55,556	99.95%
Cell_Phone	4,998	14,618	47,444	99.94%

In our experiments, each dataset in a pair (e.g., Cloth and Sport) can be used as either the source or the target domain. For example, we can use Cloth as the source and Sport as the target, and vice versa.

4.1.2 Evaluation metrics

To evaluate the performance of DAG4CDR and its competitors, we adopt the leave-one-out (LOO) evaluation method as in previous studies [2, 12]. Specifically, for each user, we randomly hold out one interaction as the test set and use the remaining data for training. For each user, we also randomly sample 999 items from the unobserved items as negative samples [22].

In our evaluation, we adopt Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) as the evaluation metrics. All metrics are computed on the top-20 results and averaged over all test users.
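Under leave-one-out evaluation with a single held-out item per user, HR@20 is 1 if that item appears in the top 20 of the 1,000 ranked candidates (the held-out item plus 999 sampled negatives) and 0 otherwise, and NDCG@20 reduces to 1/log2(rank + 2) for a 0-based rank. The sketch below illustrates the per-user computation; the function names are hypothetical, ties in scores are broken by insertion order here, and how the original code handles ties is not stated. The reported numbers would be the averages of these per-user values.

```python
import math
import random


def hr_ndcg_at_k(test_item, scores, k=20):
    """HR@k and NDCG@k for one user under leave-one-out.

    scores: dict item ID -> predicted score for the held-out item and the
    sampled negatives. With one relevant item, NDCG@k = 1 / log2(rank + 2).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if test_item not in ranked:
        return 0.0, 0.0
    rank = ranked.index(test_item)  # 0-based position in the top-k list
    return 1.0, 1.0 / math.log2(rank + 2)


def sample_candidates(test_item, unobserved, n_neg=999, rng=random):
    """Held-out item plus n_neg items sampled without replacement from the unobserved set."""
    return [test_item] + rng.sample(sorted(unobserved), n_neg)


rng = random.Random(42)
cands = sample_candidates("held_out", set(range(5000)), n_neg=999, rng=rng)
scores = {c: rng.random() for c in cands}  # stand-in for model predictions
hr, ndcg = hr_ndcg_at_k("held_out", scores, k=20)
print(len(cands), hr, round(ndcg, 4))
```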

4.1.3 Baselines

To validate the effectiveness of DAG4CDR, we compare it with several SOTA competitors. Among them, BPR-MF, NeuMF, and NGCF model users' preferences and items' characteristics relying solely on the interactions in a single domain, whereas TMCDR, CoNet, and PPGN leverage information from the auxiliary domain. These competitors are summarized as follows:

BPR-MF [23]: A classical MF-based method that learns from the implicit feedback in users' historical interactions and is optimized with the BPR loss over positive and negative samples.

NeuMF [12]: A classical neural collaborative filtering method that captures non-linear feature interactions on top of the concatenated user and item vectors with a multi-layer neural network.

NGCF [1]: This is a SOTA GCN-based method for top-n recommendation. It explicitly encodes the collaborative signals from higher-order neighbors by propagating embeddings over the interaction graph.

CoNet [6]: It is a deep learning-based CDR method, which enables knowledge transfer between domains through cross-connected units between the base networks.

PPGN [7]: A GCN-based cross-domain recommendation model. It fuses the user-item interactions of two domains into one large graph, learns user and item representations with graph convolutional networks, and finally feeds the learned representations into an MLP for prediction.

TMCDR [4]: This method learns user representations in the source domain through matrix factorization and implicitly transforms them into the target-domain feature space with a task-oriented meta-network.

4.1.4 Implementation details

For a fair comparison, we start from the best hyperparameter settings reported in the original papers of the baselines and fine-tune them with grid search. We use the Xavier method to initialize the user and item embeddings. By default, all models use an embedding size of 64 and a mini-batch size of 1024, and are optimized with Adam at a learning rate of 0.0001. We implemented DAG4CDR with PyTorch³ and carefully tuned its key parameters; by default, it uses 3 layers. The L₂ regularization coefficient is searched in $\{10^{-6}, 10^{-5}, \ldots, 10^{-1}\}$. Our model is evaluated every 5 epochs, and the best parameters are saved. Each model is trained for at most 1,000 epochs with the same early stopping strategy as NGCF [1]. For all models, we ran five experiments and report their average as the final result. The code of our model is released to support reproducibility.⁴

4.2 Performance comparison w.r.t common users (RQ1)

Table 3 reports the results of our model and all competitors for all common users on the four couple datasets. The best and second-best results are highlighted in bold and underlined, respectively. From the results, we make the following observations.

Table 3
Performance comparison in terms of HR and NDCG to common users

Dataset	Metrics	BPR-MF	NeuMF	NGCF	CoNet	PPGN	TMCDR	Ours	Improv.
Cloth	HR	0.1121	0.1623	0.1643	0.1658	0.1647	0.1627	0.2148	22.81% ↑
	NDCG	0.0568	0.0739	0.0787	0.0759	0.0756	0.0756	0.1273	38.18% ↑
Sport	HR	0.1747	0.1907	0.2140	0.1711	0.1703	0.1864	0.2623	18.41% ↑
	NDCG	0.0862	0.0855	0.1045	0.0729	0.0725	0.0832	0.1531	31.74% ↑
Cell_Phone	HR	0.3261	0.3304	0.3777	0.3142	0.3126	0.3422	0.3827	1.31% ↑
	NDCG	0.1751	0.1635	0.2019	0.1507	0.1496	0.1706	0.2151	6.14% ↑
Elec	HR	0.3215	0.3265	0.3250	0.3103	0.3076	0.3260	0.3442	6.14% ↑
	NDCG	0.1656	0.1667	0.1676	0.1586	0.1563	0.1693	0.1853	8.63% ↑
Cloth	HR	0.0662	0.1091	0.0821	0.1123	0.1106	0.0973	0.1276	11.99% ↑
	NDCG	0.0325	0.0425	0.0356	0.0490	0.0487	0.0432	0.0500	2.00% ↑
Cell_Phone	HR	0.2171	0.2394	0.2397	0.2075	0.2035	0.2335	0.2447	2.04% ↑
	NDCG	0.1085	0.1113	0.1282	0.0992	0.0986	0.1102	0.1346	4.75% ↑
Sport	HR	0.1987	0.1967	0.2169	0.1821	0.1803	0.2053	0.2213	1.99% ↑
	NDCG	0.1011	0.0910	0.1103	0.0791	0.0786	0.1097	0.1235	10.69% ↑
Cell_Phone	HR	0.2255	0.2467	0.2543	0.2319	0.2296	0.2542	0.2599	2.39% ↑
	NDCG	0.1147	0.1203	0.1268	0.1047	0.1035	0.1256	0.1298	2.37% ↑
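The percentages in the last column are consistent with the gap between our score and the best baseline divided by our own score, rather than by the baseline's score; this reading is inferred from the numbers, not stated in the text. The sketch below (helper name ours) reproduces the Cloth HR entry of the Cloth & Sport pair.

```python
def gain_over_best(ours, baselines):
    """(ours - best baseline) / ours, the ratio that reproduces the table entries."""
    best = max(baselines)
    return (ours - best) / ours


cloth_hr = [0.1121, 0.1623, 0.1643, 0.1658, 0.1647, 0.1627]  # Cloth HR baselines, Table 3
print(f"{gain_over_best(0.2148, cloth_hr):.2%}")  # 22.81%
```

Dividing by the best baseline (0.1658) instead would give the larger figure of about 29.55% for the same entry.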

Firstly, we focus on the single-domain methods. As shown in Table 3, NeuMF generally achieves better results than BPR-MF because its neural network can better model user-item interactions. NGCF surpasses BPR-MF and NeuMF in most cases, demonstrating the importance of exploiting high-order connectivity over an interaction graph.

Secondly, the CDR baselines do not consistently outperform the single-domain baselines in Table 3. Benefiting from dual knowledge transfer across domains, CoNet outperforms BPR-MF and NeuMF on Cloth in both the Cloth & Sport and Cloth & Cell_Phone pairs, but it falls behind NeuMF on all the other datasets. These results indicate that leveraging information from the auxiliary domain can help, but its benefit depends on how knowledge is transferred. TMCDR outperforms PPGN in most cases, as it can implicitly transform the user embedding of the source domain into the target domain. The comparatively weak performance of PPGN may be because it cannot differentiate the contributions of neighbor nodes.

Finally, our proposed DAG4CDR outperforms all competitors across all datasets. Notably, DAG4CDR improves the recommendation performance for both the common and non-common users of the two domains. We attribute this to three factors: (1) transferring knowledge from the auxiliary domain improves recommendation performance in the target domain; (2) performing GCN operations on the constructed unified interaction graph captures the direct effect of item features and topological structure information; (3) the proposed dual attention mechanism models the influence of different items at both the node level and the domain level.

4.3 Performance comparison w.r.t. non-common users (RQ2)

Table 4 shows the results of our model and the competitors for all non-common users on the four paired datasets. For convenience, we highlight the best and second-best results in bold and underlined form, respectively.

Table 4
Performance comparison in terms of HR and NDCG for non-common users

Dataset	Metrics	BPR-MF	NeuMF	NGCF	Ours	Improv.
Cloth	HR	0.0947	0.0642	0.1024	0.1459	29.81% ↑
	NDCG	0.0387	0.0258	0.0432	0.0661	34.64% ↑
Sport	HR	0.1146	0.1292	0.1600	0.1926	16.93% ↑
	NDCG	0.0563	0.0520	0.0667	0.0830	19.64% ↑
Cell_Phone	HR	0.1874	0.1732	0.2123	0.2624	19.09% ↑
	NDCG	0.0822	0.0715	0.0846	0.1109	23.72% ↑
Elec	HR	0.2141	0.2236	0.2397	0.2517	4.77% ↑
	NDCG	0.1073	0.1091	0.1101	0.1199	8.17% ↑
Cloth	HR	0.0947	0.0642	0.1024	0.1491	31.32% ↑
	NDCG	0.0387	0.0258	0.0432	0.0651	33.64% ↑
Cell_Phone	HR	0.1874	0.1732	0.2123	0.2532	16.15% ↑
	NDCG	0.0822	0.0715	0.0846	0.1103	23.30% ↑
Sport	HR	0.1146	0.1292	0.1600	0.2033	21.30% ↑
	NDCG	0.0563	0.0520	0.0667	0.0897	25.64% ↑
Cell_Phone	HR	0.1874	0.1732	0.2123	0.2616	18.85% ↑
	NDCG	0.0822	0.0715	0.0846	0.1195	29.21% ↑
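Reading the improvement column of Table 4 against the scores, the percentages appear to be computed relative to our model's score, i.e. (ours - best baseline) / ours, rather than relative to the best baseline; this reading reproduces every Table 4 entry to two decimals. The sketch below computes both conventions side by side so readers can convert between them. The helper names are hypothetical and not taken from the released code.

```python
from typing import Sequence


def gain_over_ours(ours: float, baselines: Sequence[float]) -> float:
    """Gain over the best baseline as a percentage of our score (inferred Table 4 convention)."""
    best = max(baselines)
    return round(100.0 * (ours - best) / ours, 2)


def gain_over_baseline(ours: float, baselines: Sequence[float]) -> float:
    """More common convention: gain as a percentage of the best baseline's score."""
    best = max(baselines)
    return round(100.0 * (ours - best) / best, 2)


# Non-common users, Table 4: scores of [BPR-MF, NeuMF, NGCF] and of Ours.
cloth_hr = ([0.0947, 0.0642, 0.1024], 0.1459)
sport_ndcg = ([0.0563, 0.0520, 0.0667], 0.0830)

for baselines, ours in (cloth_hr, sport_ndcg):
    # First value: percentage of our score; second: percentage of the best baseline.
    print(gain_over_ours(ours, baselines), gain_over_baseline(ours, baselines))
```

For the Cloth HR row, the first convention gives the reported 29.81%, while the baseline-relative convention would give 42.48%.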

The results show that the performance of BPR-MF, NeuMF, and NGCF for non-common users is similar to their performance for common users. Owing to the unified cross-domain interaction graph, our model enhances the representation learning of non-common users in both domains and significantly outperforms BPR-MF, NeuMF, and NGCF across all paired datasets. We attribute this to two factors. Firstly, the representations of non-common users indirectly benefit from source-domain nodes, whose information reaches them through the common users they are connected to. Secondly, the representation learning of non-common users also benefits from the node-level attention mechanism, which differentiates the influence of target-domain items on user preference (see Sec. 3.3.2). Overall, the results demonstrate the effectiveness of DAG4CDR in modeling non-common users’ preferences.
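The indirect path in the first factor can be checked with hop distances alone: in an L-layer GCN, a node's output embedding depends only on nodes within L hops. The sketch below runs plain breadth-first search on a hypothetical toy unified graph (it does not run the model) and shows that a non-common user reaches a source-domain item in three hops through a common user. The layer count `num_layers = 3` is an assumption, not a setting reported here.

```python
from collections import deque

# Hypothetical toy unified bipartite graph. u1 and u2 are common users (they
# interact in both domains); u3 is a non-common user seen only in the target
# domain. Items prefixed "s:" belong to the source domain, "t:" to the target.
edges = [
    ("u1", "s:i1"), ("u1", "t:j1"),
    ("u2", "s:i2"), ("u2", "t:j2"),
    ("u3", "t:j1"),
]


def build_adjacency(edge_list):
    """Undirected adjacency sets for the user-item graph."""
    adj = {}
    for user, item in edge_list:
        adj.setdefault(user, set()).add(item)
        adj.setdefault(item, set()).add(user)
    return adj


def hops_from(adj, start):
    """Breadth-first search: shortest hop count from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


adj = build_adjacency(edges)
dist = hops_from(adj, "u3")
num_layers = 3  # assumed propagation depth
# Source-domain items whose information can reach u3 within num_layers layers.
reachable_source = sorted(
    n for n, d in dist.items() if n.startswith("s:") and d <= num_layers
)
print(reachable_source)  # ['s:i1']
```

Because a non-common user has only target-domain items as direct neighbors, the shortest path to any source-domain item has the form user, target item, common user, source item; with `num_layers = 2` the list above would be empty.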

4.4 Effects of data sparsity (RQ3)

An important advantage of CDR is that source-domain information can help alleviate data sparsity. To verify the effectiveness of DAG4CDR for target-domain users with limited interactions, we study its performance on the Sport & Cloth pair at different sparsity levels.

In particular, we group the common users of Sport & Cloth by their number of interactions in Sport, which is the target domain in this case. As shown in Fig. 4(a), users are split into four groups with boundaries at 5, 8, and 15 interactions; the figure reports both the performance on each group and the number of users in it. DAG4CDR outperforms all the other competitors across all user groups of Sport, which demonstrates that source-domain information helps alleviate data sparsity in the target domain.

To further investigate how source-domain information affects inactive users (x ≤ 5) and active users (x > 15) of the target domain differently, we split each of these two Sport groups into four subgroups according to their interaction levels in Cloth (see Fig. 4(b) and Fig. 4(c)). As Fig. 4(b) shows, our model yields a substantial performance improvement for inactive users as their number of source-domain interactions increases, which further verifies its effectiveness in leveraging source-domain information to tackle the sparsity problem. Fig. 4(c) shows the results for active users of Sport, who also benefit from source-domain information. However, their performance degrades once the number of source-domain interactions exceeds 8 (y > 8). The reason might be that excessive source-domain interactions inject noisy information into the representation learning process, which hurts the final performance.
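The grouping protocol behind Fig. 4 can be sketched as follows, assuming leave-one-out evaluation in which each test user has one held-out target item and HR@20 is the fraction of users whose held-out item ranks in the top 20. The user statistics are toy values, treating 8 and 15 as inclusive bounds is an assumption, and the source-domain boundaries (`SOURCE_BOUNDS`) are hypothetical, since the text does not state them.

```python
from bisect import bisect_left
from collections import defaultdict

# Fig. 4(a) groups: x <= 5, 5 < x <= 8, 8 < x <= 15, x > 15 (inclusive bounds assumed).
TARGET_BOUNDS = [5, 8, 15]
# Hypothetical: the y boundaries of Fig. 4(b)/(c) are not given, so we reuse these.
SOURCE_BOUNDS = [5, 8, 15]


def group_index(count, bounds):
    """Return the group a count falls into; bisect_left makes each bound inclusive."""
    return bisect_left(bounds, count)


def hr_by_group(users, key, bounds, k=20):
    """HR@k per group, assuming one held-out test item per user.

    users maps a user id to {"x": target count, "y": source count,
    "rank": 1-based rank of the held-out item in the recommendation list}.
    """
    hits = defaultdict(list)
    for stats in users.values():
        g = group_index(stats[key], bounds)
        hits[g].append(1.0 if stats["rank"] <= k else 0.0)
    return {g: sum(h) / len(h) for g, h in sorted(hits.items())}


# Toy statistics (hypothetical, not from the paper's data).
users = {
    "a": {"x": 3, "y": 2, "rank": 7},
    "b": {"x": 4, "y": 12, "rank": 25},
    "c": {"x": 7, "y": 6, "rank": 1},
    "d": {"x": 20, "y": 9, "rank": 40},
    "e": {"x": 18, "y": 3, "rank": 19},
}

# Fig. 4(a): group all common users by target-domain interactions.
print(hr_by_group(users, "x", TARGET_BOUNDS))  # {0: 0.5, 1: 1.0, 3: 0.5}

# Fig. 4(b): keep inactive users (x <= 5), then regroup them by source-domain interactions.
inactive = {u: s for u, s in users.items() if s["x"] <= 5}
print(hr_by_group(inactive, "y", SOURCE_BOUNDS))  # {0: 1.0, 2: 0.0}
```

Fig. 4(c) follows the same pattern with the filter `s["x"] > 15`.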

Fig. 4

Performance comparison of user groups with different interaction levels. The lines denote performance in terms of HR@20; x denotes the number of interactions in the target domain and y the number of interactions in the source domain. (a) Users grouped by their interaction levels in the target domain; (b) users with at most five interactions in the target domain (x ≤ 5), grouped by their interaction levels in the source domain; (c) users with more than 15 interactions in the target domain (x > 15), grouped by their interaction levels in the source domain.

4.5 Effects of our attention mechanism (RQ4)

In this section, we examine the effectiveness of the proposed attention mechanisms. Due to space limitations, we only report results on the Cloth & Sport pair. The analysis compares DAG4CDR with the following variants.

DAG_wa: This variant removes both the node-level and the domain-level attention from DAG4CDR.

DAG_n: This variant removes the domain-level attention from DAG4CDR.

DAG_d: This variant removes the node-level attention from DAG4CDR.
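To make the ablation concrete, the sketch below shows one way the two attention levels can compose for a single user and what removing each level could mean. It is an illustration under stated assumptions, not the formulation DAG4CDR defines in its method section: dot-product scores followed by softmax are stand-ins, and an ablated level is replaced by uniform weights (the paper's variants may instead fall back to standard GCN normalization). The flags in `VARIANTS` mirror the variant definitions listed above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Sum of vectors scaled by the matching weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def aggregate(user, items_by_domain, node_att=True, domain_att=True):
    """Hypothetical dual-attention aggregation of neighbor-item embeddings for one user.

    items_by_domain maps a domain name to the embeddings of the items the user
    interacted with in that domain. Disabling a level swaps its softmax weights
    for uniform weights, our stand-in for removing that attention.
    """
    domain_msgs = []
    for embs in items_by_domain.values():
        if node_att:  # node level: score each item against the user
            alphas = softmax([dot(user, e) for e in embs])
        else:
            alphas = [1.0 / len(embs)] * len(embs)
        domain_msgs.append(weighted_sum(alphas, embs))
    if domain_att:  # domain level: score each per-domain message against the user
        betas = softmax([dot(user, m) for m in domain_msgs])
    else:
        betas = [1.0 / len(domain_msgs)] * len(domain_msgs)
    return weighted_sum(betas, domain_msgs)


VARIANTS = {
    "DAG_wa": {"node_att": False, "domain_att": False},
    "DAG_n": {"node_att": True, "domain_att": False},
    "DAG_d": {"node_att": False, "domain_att": True},
    "DAG4CDR": {"node_att": True, "domain_att": True},
}

# Toy embeddings: the user aligns with the first source item.
user = [1.0, 0.0]
items = {"source": [[1.0, 0.0], [0.0, 1.0]], "target": [[0.0, 1.0]]}
for name, flags in VARIANTS.items():
    print(name, [round(v, 3) for v in aggregate(user, items, **flags)])
```

In this toy setting, enabling either attention level shifts weight toward the item that matches the user, while DAG_wa simply averages within and across domains.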

The results of the variants and DAG4CDR are reported in Tables 5 and 6. DAG_n, DAG_d, and DAG4CDR all outperform DAG_wa, demonstrating the importance of differentiating the influence of neighbor nodes in GCN. Moreover, DAG_d outperforms DAG_n, which indicates that on Sport & Cloth, differentiating the influence of the two domains matters more for modeling user preference than differentiating individual items. Finally, our model outperforms all three variants, which validates the effectiveness of combining both attention mechanisms to capture user preferences in cross-domain recommendation.

Table 5
Performance of DAG4CDR and its variants for common users

Dataset	Metrics	DAG_wa	DAG_n	DAG_d	DAG4CDR
Cloth	HR	0.2010	0.2098	0.2110	0.2148
	NDCG	0.1114	0.1265	0.1270	0.1273
Sport	HR	0.2503	0.2601	0.2612	0.2623
	NDCG	0.1487	0.1519	0.1528	0.1531

Table 6
Performance of DAG4CDR and its variants for non-common users

Dataset	Metrics	DAG_wa	DAG_n	DAG_d	DAG4CDR
Cloth	HR	0.1424	0.1447	0.1451	0.1459
	NDCG	0.0609	0.0628	0.0634	0.0661
Sport	HR	0.1871	0.1915	0.1924	0.1926
	NDCG	0.0810	0.0824	0.0828	0.0830

5 Conclusion

In this paper, we presented DAG4CDR, a novel Dual Attentive Graph Convolutional Network for Cross-Domain Recommendation, which performs embedding propagation over a unified graph constructed from the interactions of both the source and target domains to learn user and item embeddings. Through this propagation, our model not only enables the embedding learning of common users to benefit directly from the information in both domains, but also enhances the embedding learning of non-common users indirectly. In addition, it adopts a dual attention mechanism to differentiate the contributions of neighboring nodes and of different domains to the target prediction. Extensive experiments on four real-world datasets, combined into four paired datasets, demonstrate that our model significantly improves performance for both common and non-common users compared with state-of-the-art CDR and single-domain recommendation models. Further experiments also validate the effectiveness of the dual attention mechanism. In the future, we will try to disentangle the features learned in the target and source domains to distill the features they share and those distinct to each, so as to better exploit the information extracted from the source domain for recommendation in the target domain.

Acknowledgments

This research is supported in part by the Key R&D Program of Shandong (Major Scientific and Technological Innovation Projects), No. 2022CXGC020107; the National Natural Science Foundation of China, Nos. 61902223 and 61906108; the Young Creative Team in Universities of Shandong Province, No. 2020KJN012; and the NSF of Shandong Province, No. ZR2021MF040.

Footnotes

1. www.amazon.com

2. www.youtube.com

3. https://pytorch.org

4. https://github.com/zhangyucs/DAG4CDR

References

1. Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng and Tat-Seng Chua, Neural graph collaborative filtering, In SIGIR, pages 165–174, 2019.

2. Meng Liu, Jianjun Li, Guohui Li and Peng Pan, Cross domain recommendation via bi-directional transfer graph collaborative filtering networks, In CIKM, pages 885–894, 2020.

3. Ajit P. Singh and Geoffrey J. Gordon, Relational learning via collective matrix factorization, In SIGKDD, pages 650–658, 2008.

4. Yongchun Zhu, Kaikai Ge, Fuzhen Zhuang, Ruobing Xie, Dongbo Xi, Xu Zhang, Leyu Lin and Qing He, Transfer meta framework for cross-domain recommendation to cold start users, In SIGIR, pages 1813–1817, 2021.

5. Leihui Chen, Jianbing Zheng, Ming Gao, Aoying Zhou, Wei Zeng and Hui Chen, TLRec: Transfer learning for cross domain recommendation, In ICBK, pages 167–172. IEEE, 2017.

6. Guangneng Hu, Yu Zhang and Qiang Yang, CoNet: Collaborative cross networks for cross-domain recommendation, In CIKM, pages 667–676, 2018.

7. Cheng Zhao, Chenliang Li and Cong Fu, Cross-domain recommendation via preference propagation GraphNet, In CIKM, pages 2165–2168, 2019.

8. Xiangnan He, Zhankui He, Jingkuan Song, Zhenguang Liu, Yu-Gang Jiang and Tat-Seng Chua, NAIS: Neural attentive item similarity model for recommendation, TKDE 30 (2018), 2354–2366.

9. Andriy Mnih and Russ R. Salakhutdinov, Probabilistic matrix factorization, NIPS 20 (2007).

10. Yifan Hu, Yehuda Koren and Chris Volinsky, Collaborative filtering for implicit feedback datasets, In ICDM, pages 263–272. IEEE, 2008.

11. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner and Lars Schmidt-Thieme, BPR: Bayesian personalized ranking from implicit feedback, In UAI, pages 452–461. AUAI, 2009.

12. Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu and Tat-Seng Chua, Neural collaborative filtering, In WWW, pages 173–182, 2017.

13. Zhiyong Cheng, Fan Liu, Shenghan Mei, Yangyang Guo, Lei Zhu and Liqiang Nie, Feature-level attentive ICF for recommendation, ACM Transactions on Information Systems (TOIS) 40(4) (2022), 1–24.

14. Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie and Deborah Estrin, Collaborative metric learning, In WWW, pages 193–201. IW3C2, 2017.

15. Fan Liu, Zhiyong Cheng, Changchang Sun, Yinglong Wang, Liqiang Nie and Mohan Kankanhalli, User diverse preference modeling by multimodal attentive metric learning, In MM, pages 1526–1534. ACM, 2019.

16. Rianne van den Berg, Thomas N. Kipf and Max Welling, Graph convolutional matrix completion, arXiv preprint arXiv:1706.02263, 2017.

17. Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu and Kilian Weinberger, Simplifying graph convolutional networks, In ICML, pages 6861–6871. PMLR, 2019.

18. Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang and Meng Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, In SIGIR, pages 639–648, 2020.

19. Fan Liu, Zhiyong Cheng, Lei Zhu, Zan Gao and Liqiang Nie, Interest-aware message-passing GCN for recommendation, In WWW, pages 1296–1305. ACM, 2021.

20. Babak Loni, Yue Shi, Martha Larson and Alan Hanjalic, Cross-domain collaborative filtering with factorization machines, In ECIR, pages 656–661. Springer, 2014.

21. Ronald Chung, David Sundaram and Ananth Srinivasan, Integrated personal recommender systems, In ICEC, pages 65–74. ACM, 2007.

22. Wayne Xin Zhao, Junhua Chen, Pengfei Wang, Qi Gu and Ji-Rong Wen, Revisiting alternative experimental settings for evaluating top-N item recommendation algorithms, In CIKM, pages 2329–2332. ACM, 2020.

23. Yehuda Koren, Robert Bell and Chris Volinsky, Matrix factorization techniques for recommender systems, Computer 42(8) (2009), 30–37.