Abstract
Cross-domain recommendation aims to alleviate the data sparsity problem in the target domain by leveraging knowledge from a source domain. Existing GCN-based approaches perform graph convolution in each domain separately, and thus neglect the direct effect of item features and topological structure information in the source domain on user preference modeling in the target domain. In this paper, we propose a novel Dual Attentive Graph Convolutional Network for Cross-Domain Recommendation (DAG4CDR). Specifically, we integrate the interaction data of the source and target domains into a unified user-item bipartite graph and perform GCN propagation on this graph, so that the interaction data from both domains are leveraged to learn user and item embeddings via information propagation. In the embedding aggregation phase, the messages passed from items of the two domains to users are weighted by a dual attention mechanism, which considers the contributions of different items at both the node level and the domain level. We conducted extensive experiments on several publicly available datasets, and the results demonstrate the superiority of our model in preference modeling for both common and non-common users.
Introduction
In the information era, the amount of information on the internet has grown explosively, and hundreds of millions of users go online every day. While users benefit from this vast amount of data, it has become increasingly challenging for them to find the information they want. Recommender systems have therefore become increasingly important in tackling this information overload problem on various online platforms. For example, Amazon (footnote 1) recommends products that may appeal to users based on their purchase history; YouTube (footnote 2) recommends videos to users based on their viewing history. Among existing recommendation methods, Collaborative Filtering (CF)-based approaches have achieved great success due to their simplicity and effectiveness in modeling user preferences and item characteristics from interaction data alone. Yet, recommendation performance degrades sharply when the interactions are sparse [1, 2].
Cross-Domain Recommendation (CDR) methods have been widely investigated in recent years to alleviate the data sparsity problem by exploiting helpful information from a related domain (i.e., the source domain) to enhance the performance in a target domain. The underlying assumption of CDR is that similar users share similar preferences across different domains. For example, users who purchased soccer balls in sports products would mostly prefer soccer socks in clothing. We broadly classify existing methods into two approaches according to how they exploit information from different domains. One approach is to transfer knowledge from the source domain to the target domain by collective matrix factorization [3] or to map between the respective user representations of the two domains [4]. Despite the success of these approaches, their drawback is that they still keep the domains independent and fail to connect different domains directly through users or items. Another approach is to aggregate knowledge from multiple domains by embedding all users and items into the same latent space and using the users and items shared between domains as a bridge to connect them [5]. However, these approaches do not exploit the higher-order connections between users and items. Recently, Graph Convolutional Networks (GCNs) have drawn increasing attention in CDR because of their capability in modeling higher-order relations, and several GCN-based CDR methods have been proposed. For example, BiTGCF [2] exploits the higher-order connectivity between nodes in each domain and shares information between the common users of the two domains through a bidirectional knowledge transfer method.
Despite the progress, the direct effect of item features and topological structure information in the source domain has not been explored in existing methods. Current GCN-based CDR methods first perform graph convolution in each domain separately to obtain domain-specific user and item representations, and then adopt a feature transfer technique to fuse the embeddings from different domains into the final representations of the common users [2]. We argue that this approach cannot fully exploit the information of common users when learning user and item embeddings, because the items that common users interacted with are still used separately in each domain during embedding learning, and the information is only used to update the embeddings of the common users via a late fusion strategy. A basic assumption of CDR models is that users share some general interests across different domains. Therefore, the items that common users interacted with across domains can be leveraged to mutually enhance embedding learning in both domains, and they should be able to influence the embeddings of non-common users and items indirectly through the collaborative learning process. Besides, most existing GCN-based CDR methods treat items equally in each domain [2, 6]. In those methods, the messages of all items are passed to the user node without differentiating their importance. For example, PPGN [7] constructs a unified cross-domain preference graph from the user-item interactions of two domains to capture the propagation of user preferences in the graph; however, it treats all items (from different domains) equally in the aggregation phase. We deem that some items are more important to a user than others [8]. More importantly, when recommending items to a user in the target domain, items from different domains may contribute differently to the prediction. Generally, items in the target domain (e.g., Cloth) should contribute more than items from the source domain (e.g., Sport). It should be noted that the contributions of the two domains may also differ from user to user.
Inspired by the above considerations, we propose a Dual Attentive Graph Convolutional Network for Cross-Domain Recommendation (DAG4CDR for short) in this paper. Specifically, to leverage the collaborative information from two domains, we construct a unified interaction graph using the interaction data of both domains. Based on this graph, our model can simultaneously exploit the interacted nodes from both the source and target domains to learn the embeddings. Accordingly, our proposed method can naturally model the direct effect of item features and topological information in the source domain. To model the effects of different items, especially those from different domains, on a user’s preference for items in the target domain, we design a novel dual attention mechanism that estimates the importance of item nodes at both the node level and the domain level. We conducted extensive experiments on several large-scale real-world datasets. A set of competitive methods designed for single-domain and cross-domain recommendation were adopted for comparison. Experimental results show that our method achieves substantial improvement over all the adopted competitors. Additional experiments on users with different sparsity levels demonstrate the advantage of leveraging information from auxiliary domains and the effectiveness of our model. In addition, ablation studies validate the importance of differentiating the contributions of different items and domains in embedding learning.
Related work
Collaborative Filtering (CF) has achieved great success in recommender systems due to its simplicity: it models user preference purely from user-item interaction data. A representative CF-based method is matrix factorization (MF) [9–11], which learns user and item representations as feature vectors from the user-item interaction matrix and infers a user’s preference for an item by multiplying their feature vectors. Due to its great success, various variants have been proposed, such as probabilistic matrix factorization (PMF) [9], weighted regularized matrix factorization (WRMF) [10] and Bayesian personalized ranking (BPR) [11]. Deep learning techniques, whose capability has been demonstrated in various tasks, have been introduced to model non-linear user behaviors in recommendation [12, 13]. Besides, metric learning-based approaches replace the dot product with Euclidean distance to capture fine-grained user preferences [14, 15].
Despite their remarkable success, the above recommendation methods suffer from the problem of data sparsity, because they rely on the first-order proximity of interaction data in a single domain. Exploiting the high-order proximity of interaction data and leveraging information from an auxiliary domain are two effective ways to assist in learning feature vectors. Next, we focus on graph convolutional networks in recommendation and on cross-domain recommendation methods.
Graph convolutional networks
In the last few years, graph convolutional networks (GCNs) have achieved great success for their capability on non-Euclidean data in various tasks [1, 16–19]. The core of GCNs is how to aggregate the messages of neighboring nodes iteratively.
More recently, the GCNs have also been widely applied in recommendation. For example, graph convolutional matrix completion (GC-MC) [16] performs the graph convolutional operation to exploit the first-order connectivities in neighbor nodes. To capture the information from high-order neighbors, neural graph collaborative filtering (NGCF) [1] propagates embeddings over the interaction graph and achieves promising results. Given that the transformation function and nonlinear activation function contribute less to embedding propagation in GCNs, LightGCN [18] simplifies the model structure by removing them in NGCF. Besides, stacking multiple embedding propagation layers generally leads to the over-smoothing problem. To deal with this problem, IMP-GCN [19] performs high-order graph convolution over the sub-graphs consisting of users with similar interests.
Despite their success, GCN-based recommender systems still inevitably encounter the data sparsity problem. A practical solution is to transfer knowledge from other related domains. In this paper, we propose a novel GCN-based CDR method, which learns the representations of users and items by performing GCN operations over a unified interaction graph constructed from the interaction data of different domains.
Cross-domain recommendation
CDR aims to utilize the information of the source domain to alleviate the data sparsity problem in the target domain. Most existing CDR methods typically bridge these two domains by using transferring techniques [2, 20].
The transfer can happen explicitly through inter-domain similarities and common attributes. For example, Chung et al. [21] picked the relevant items in the source domain based on the attributes they have in common with the target domain. The transfer can also be performed implicitly via shared user/item latent features or rating patterns. For example, Chen et al. [5] encoded users and items into the latent representations of their domains and used the common users and items to transfer knowledge between the two domains. Hu et al. [6] designed a neural network in which two base networks are connected by cross mappings to learn complex user-item interactions between two domains. This effectively refines users’ embedding vectors in the target domain with the features extracted from the two domains. Its drawback is that adequate knowledge transfer is difficult when the common users and items are sparse.
With the success of applying GCN techniques to modeling users’ preferences, some GCN-based CDR models have also been developed over the years [2, 7]. For instance, PPGN [7] models the higher-order interaction data in different domains by constructing a cross-domain preference matrix. However, existing GCN-based CDR methods often treat the neighboring nodes in the graph equally when learning the embedding of a target node, which may degrade performance. For example, different items may contribute differently to modeling a user’s preferences, and items from different domains should also affect the user’s preference for target-domain items differently. Therefore, we present a dual attention-based GCN model for CDR to capture such effects.
Proposed model
Preliminaries
Before describing the details of DAG4CDR, we first describe the problem setup. For a source domain S and a target domain T, we denote the user set of them as
We constructed a unified interaction graph

A toy example of a unified interaction graph based on user interaction data in target domain T and source domain S.
Most existing CF-based recommendation methods suffer from the problem of data sparsity, and a promising solution is to leverage information from the source domain to enhance the recommendation performance for users in the target domain. In fact, different users have different preferences for items across domains [8]. Moreover, for the representation learning of users, the items of different domains may contribute differently. Therefore, the desired model should capture users’ attention on items across the target and source domains to model users’ preferences more accurately.
In this paper, we bridge two domains by constructing a unified interaction graph using the common users as the connector. Because GCNs have powerful representation learning capabilities, especially on non-Euclidean structured data, we use them to learn the representation of each node on the constructed unified interaction graph. We present a newly designed dual attentive embedding propagation approach to capture users’ attention on items from the same or different domains. In particular, we present a novel dual attention mechanism consisting of node-level and domain-level attention to estimate the weights of messages passed from items of the two domains.
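The graph construction described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_unified_graph` and `common_users`, the `("S", item)` / `("T", item)` tagging of item nodes, and the toy interactions are all hypothetical.

```python
from collections import defaultdict


def build_unified_graph(source_pairs, target_pairs):
    """Merge (user, item) interactions of two domains into one bipartite graph.

    Item nodes are tagged with their domain ("S" or "T") so ids from the two
    domains never collide. Each user is a single node, so a user who appears
    in both domains connects them.
    """
    user_neighbors = defaultdict(set)  # user -> {(domain, item)}
    item_neighbors = defaultdict(set)  # (domain, item) -> {user}
    for domain, pairs in (("S", source_pairs), ("T", target_pairs)):
        for user, item in pairs:
            node = (domain, item)
            user_neighbors[user].add(node)
            item_neighbors[node].add(user)
    return user_neighbors, item_neighbors


def common_users(user_neighbors):
    """Users that have at least one neighbor in each of the two domains."""
    return {
        user
        for user, nodes in user_neighbors.items()
        if {domain for domain, _ in nodes} == {"S", "T"}
    }


# Toy data (hypothetical): u2 interacted in both domains.
source = [("u1", "ball"), ("u2", "ball"), ("u2", "racket")]
target = [("u2", "socks"), ("u3", "socks"), ("u3", "jersey")]
users, items = build_unified_graph(source, target)
print(sorted(common_users(users)))  # ['u2']
```

In this toy graph, u1 and u3 never interact in the same domain, yet they are connected through the path u1, ball, u2, socks, u3, which is the kind of indirect link that propagation over the unified graph can exploit.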
Dual attentive embedding propagation
Let
Message passing
The representation of a user
Likewise, the message passing from a user u to an item i is defined as:

Node-level attention.
For common users between the source and target domains, we design a dual attention mechanism to consider both the node-level and domain-level attentions when learning their embeddings. The dual attention mechanism is to differentiate the contributions of different neighbor nodes and different domains to their embedding update. Specifically, given a node-level attention weight
Note that non-common users and all items have neighbors in only a single domain, so only the node-level attention is used when updating their embeddings. Taking an item as an example, its attention weight is computed as:

Domain-level attention.
To compute the domain-level attention for a domain (e.g., T) to a user, we aggregate the influence of all the neighboring nodes from this domain to the target user, and then make a normalization between the two domains. Specifically, for a user u, the domain-level attention weight of domain T is formulated as follows:
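As a rough illustration only, and not the paper's exact formulation, the sketch below shows one way to combine a node-level weight with a domain-level weight that is normalized between the two domains. Every concrete choice here is an assumption: scores are plain dot products, the influence of a domain is the mean score of its neighbors, both normalizations are softmaxes, and `dual_attention_weights` is a hypothetical name.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_attention_weights(user_vec, neighbors):
    """Weight each neighboring item of a user.

    neighbors: list of (domain, item_vec) with domain in {"S", "T"}.
    Returns (weights, domain_weights); weights[k] is the domain-level weight
    of neighbor k's domain times its node-level weight inside that domain.
    """
    scores = [dot(user_vec, vec) for _, vec in neighbors]
    by_domain = {}
    for idx, (domain, _) in enumerate(neighbors):
        by_domain.setdefault(domain, []).append(idx)
    domains = sorted(by_domain)

    # Domain level (assumption): average the scores of a domain's neighbors,
    # then normalize across the domains the user actually has neighbors in.
    influence = [
        sum(scores[i] for i in by_domain[d]) / len(by_domain[d]) for d in domains
    ]
    domain_weights = dict(zip(domains, softmax(influence)))

    # Node level (assumption): softmax over the neighbors of the same domain.
    weights = [0.0] * len(neighbors)
    for d in domains:
        idxs = by_domain[d]
        for i, w in zip(idxs, softmax([scores[i] for i in idxs])):
            weights[i] = domain_weights[d] * w
    return weights, domain_weights


user = [1.0, 0.0]
neighbors = [("T", [2.0, 0.0]), ("T", [1.0, 1.0]), ("S", [0.5, 0.0])]
weights, domain_weights = dual_attention_weights(user, neighbors)
print(weights, domain_weights)
```

For a user whose neighbors all lie in one domain, the domain weight is 1 and the result reduces to node-level attention alone, which matches the treatment of non-common users described in the text.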
We aggregate the messages passed from neighbor nodes to update the embeddings $e_u$, $e_i$, and $e_j$ for user u and items i and j:
Finally, the model prediction is defined as an inner product of the user and item embeddings, which is computed as:
In this work, we focus on the top-n recommendation task. For model training, we adopt the commonly used BPR loss as the objective function, following previous work [13, 19]. Formally, the objective function is defined as:
Message dropout and node dropout have been successfully applied in previous GCN-based models [1], and we adopt both techniques in DAG4CDR to mitigate overfitting. In particular, message dropout randomly ignores some of the messages passed between nodes during training, while node dropout randomly removes some nodes and thereby blocks all messages passing through them. The drop ratios are carefully tuned in our experiments. Besides, mini-batch Adam is employed for model optimization and parameter updates.
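To make the training objective concrete, the following sketch computes inner-product predictions and the standard BPR pairwise loss in plain Python. It is a simplified illustration under assumptions: the names `predict` and `bpr_loss`, the dictionary-based embedding tables, and the regularization over all embeddings (rather than only those in a mini-batch) are choices made here, not details taken from the paper.

```python
import math


def predict(user_vec, item_vec):
    """Preference score as the inner product of user and item embeddings."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def softplus(x):
    """ln(1 + e^x), written so that large |x| does not overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bpr_loss(triples, user_emb, item_emb, l2=1e-5):
    """BPR loss over (user, observed item, unobserved item) triples.

    Each triple contributes -ln(sigmoid(score_pos - score_neg)), which equals
    softplus(score_neg - score_pos). The L2 term covers all embeddings here
    for simplicity (an assumption of this sketch).
    """
    ranking = 0.0
    for u, pos, neg in triples:
        diff = predict(user_emb[u], item_emb[pos]) - predict(user_emb[u], item_emb[neg])
        ranking += softplus(-diff)
    reg = sum(
        x * x
        for table in (user_emb, item_emb)
        for vec in table.values()
        for x in vec
    )
    return ranking + l2 * reg


user_emb = {"u": [1.0, 0.0]}
item_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
good = bpr_loss([("u", "a", "b")], user_emb, item_emb)  # observed item ranked higher
bad = bpr_loss([("u", "b", "a")], user_emb, item_emb)   # observed item ranked lower
print(good < bad)  # True
```

The loss is smaller when the observed item scores above the sampled unobserved item, which is the ranking behavior that top-n recommendation needs.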
In this section, we conduct extensive experiments on four real datasets to validate our model’s effectiveness and answer the following four questions:
RQ1: Does DAG4CDR outperform the SOTA single-domain and cross-domain methods for common users on the recommendation task?
RQ2: How is the performance of DAG4CDR for non-common users on the recommendation task?
RQ3: How does the interaction level in the two domains affect the performance of DAG4CDR?
RQ4: Can the proposed dual attention mechanisms improve performance?
Experimental setup
Datasets
The public Amazon review dataset has been widely adopted in previous studies [2]. In our experiments, we use four categories of this dataset, including Cell Phones and Accessories (
Basic statistics of the datasets
Basic statistics of the couple datasets with common users
In our experiments, given two datasets (e.g., Cloth and Sport), each dataset can be used as either the source or the target dataset. More specifically, we can use Cloth as the source and Sport as the target dataset for evaluation, and we can also use Sport as the source and Cloth as the target dataset.
To evaluate the performance of DAG4CDR and its competitors, we adopt the Leave-One-Out (LOO) evaluation method, following previous studies [2, 12]. Specifically, for each user we randomly sample one interaction from the dataset as the test set and use the rest of the data as the training set. For each user, we randomly select 999 items from the unobserved items as negative samples [22].
In our evaluation, we adopted Hit Ratio (HR for short) and Normalized Discounted Cumulative Gain (NDCG for short) as the evaluation metrics for the recommendation task. All metrics are computed on the top 20 results, and the reported results are averaged across all test users.
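The evaluation protocol above (one held-out item ranked against sampled negatives, HR and NDCG at a cutoff of 20) can be sketched as follows. The function names are hypothetical, and the handling of ties (a negative only outranks the held-out item if its score is strictly higher) is an assumption of this sketch rather than something stated in the paper.

```python
import math


def rank_of_positive(pos_score, neg_scores):
    """0-based rank of the held-out item among itself and the negatives."""
    return sum(1 for s in neg_scores if s > pos_score)


def hr_ndcg_at_k(pos_score, neg_scores, k=20):
    """HR@k and NDCG@k for one user with a single held-out item."""
    rank = rank_of_positive(pos_score, neg_scores)
    if rank >= k:
        return 0.0, 0.0
    # With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 2).
    return 1.0, 1.0 / math.log2(rank + 2)


def evaluate(cases, k=20):
    """Average HR@k and NDCG@k over a list of (pos_score, neg_scores) cases."""
    hits, gains = zip(*(hr_ndcg_at_k(p, n, k) for p, n in cases))
    return sum(hits) / len(cases), sum(gains) / len(cases)


# Toy scores (hypothetical): the first user's held-out item is ranked second,
# the second user's held-out item falls outside the top 20.
cases = [
    (0.9, [0.95, 0.1, 0.2]),
    (0.0, [1.0] * 25),
]
hr, ndcg = evaluate(cases, k=20)
print(hr, ndcg)
```

In the setting described in the text, each `neg_scores` list would hold the model scores of the 999 sampled unobserved items for that user.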
Baselines
To validate the effectiveness of our model, we compare DAG4CDR with several SOTA competitors. Among these competitors, BPR-MF, NeuMF, and NGCF model users’ preferences and items’ characteristics relying merely on the interactions in a single domain; TMCDR, CoNet, and PPGN leverage information from the auxiliary domain. Those competitors are summarized as follows:
Implementation details
For a fair comparison, we referred to the best hyperparameter settings reported in the original papers of the baselines and fine-tuned them with grid search. We adopt the Xavier method to initialize the user and item embeddings and use a default embedding size of 64. For other settings, we use a default mini-batch size of 1024 and use Adam to optimize all the models with a learning rate of 0.0001. We implemented DAG4CDR with PyTorch (footnote 3) and carefully tuned its key parameters, using the default embedding size of 64 and a default layer number of 3. Specifically, the L2 regularization coefficient is searched in the range {1e-6, 1e-5, ⋯, 1e-1}. Besides, our model is tested every 5 epochs, and the best parameters are saved. The model is trained for a maximum of 1,000 epochs and adopts the same early stopping strategy as NGCF [1]. For all models, we performed five runs and report the average over all runs as the final result. The code of our model is released for reproducibility of the experiments (footnote 4).
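The training schedule described above (evaluate every 5 epochs, keep the best result, stop early, cap at 1,000 epochs) can be sketched as a small loop. This is a hedged illustration: `train_epoch` and `evaluate` are hypothetical callables supplied by the caller, and the `patience` value is an assumption, since the exact early stopping criterion is deferred to NGCF [1] and not spelled out here.

```python
def train_with_early_stopping(train_epoch, evaluate, max_epochs=1000,
                              eval_every=5, patience=10):
    """Run training with periodic evaluation and early stopping.

    train_epoch(epoch) performs one epoch of training; evaluate(epoch) returns
    a validation score where higher is better. Training stops after `patience`
    consecutive evaluations without improvement (hypothetical criterion).
    Returns (best_score, best_epoch).
    """
    best_score, best_epoch, stale = float("-inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_epoch(epoch)
        if epoch % eval_every != 0:
            continue
        score = evaluate(epoch)
        if score > best_score:
            # A real implementation would also save the model parameters here.
            best_score, best_epoch, stale = score, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_score, best_epoch


# Toy run (hypothetical): the validation score peaks at epoch 20.
trained = []
result = train_with_early_stopping(
    train_epoch=trained.append,
    evaluate=lambda epoch: -abs(epoch - 20),
    patience=3,
)
print(result, len(trained))  # (0, 20) 35
```

In the toy run, evaluations at epochs 25, 30, and 35 fail to improve on epoch 20, so training stops after 35 epochs and the result from epoch 20 is kept.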
Performance comparison w.r.t common users (RQ1)
The results of our model and all competitors to all common users over four couple datasets are reported in Table 3. The best and second best results are highlighted in bold and underlined form, respectively. From the experiment results, we have some interesting observations.
Performance comparison in terms of HR and NDCG to common users
Firstly, we focus on the performance of the single-domain methods. As shown in Table 3, NeuMF generally achieves better results than BPR-MF because it adopts a neural network that can better model user-item interactions. NGCF surpasses BPR-MF and NeuMF in all cases, demonstrating the importance of exploiting high-order connectivities over an interaction graph.
Secondly, the results in Table 3 show that the CDR models perform better than BPR-MF and NeuMF in most cases, which demonstrates the effectiveness of leveraging information from the auxiliary domain. Benefiting from dual knowledge transfer across domains, CoNet performs well on Cloth & Sport and Cloth & Cell_Phone. TMCDR performs better than PPGN, as it can implicitly transform the user embedding of the source domain to the target domain. The capability of PPGN, however, is somewhat insufficient. The reason might be that it cannot differentiate the contributions of neighbor nodes.
Finally, our proposed DAG4CDR outperforms all competitors across all datasets. Notably, DAG4CDR can improve the recommendation performance for both the common and the unique users of the two domains. This should be credited to the following reasons: (1) Transferring knowledge from the auxiliary domain improves the target domain’s recommendation performance; (2) The direct effect of item features and topological structure information can be captured by performing GCN operations on the constructed unified interaction graph; (3) The proposed dual attention mechanism models the influences of different items at both the node level and the domain level.
Table 4 shows the results of our model and the competitors to all non-common users over four couple datasets. For convenience, we highlight the best and second best results in bold and underlined form, respectively.
Performance comparison in terms of HR and NDCG to non-common users
The results show that the performance of BPR-MF, NeuMF, and NGCF to non-common users is similar to their performance to common users. Owing to the constructed unified cross-domain interaction graph, our proposed model can enhance the representation learning of non-common users in both domains. Furthermore, it significantly outperforms BPR-MF, NeuMF, and NGCF across all couple datasets. This should be credited to the following reasons. Firstly, the representation of the non-common users indirectly benefits from the neighbor nodes of the source domain by receiving information from the common users. Secondly, the non-common users’ representation learning process can also benefit from the attention mechanism on the node-level, which differentiates the influence of items of the target domain on user preference (See Sec. 3.3.2). Overall, the results demonstrate the effectiveness of DAG4CDR in modeling non-common users’ preferences.
An important advantage of CDR is that the source domain information can help alleviate the problem of data sparsity. To verify the effectiveness of DAG4CDR for target-domain users with limited interactions, we conducted experiments to study its performance on the couple dataset Sport&Cloth with different sparsity levels.
In particular, we cluster the common users of the couple dataset Sport&Cloth based on their interaction levels in Sport, which is the target domain in this case. As shown in Fig. 4 (a), users are split into four groups by their number of interactions, with group boundaries at 5, 8, and 15 interactions. This figure shows the performance for each user group together with the number of users in the group. The results show that DAG4CDR outperforms all the other competitors across all user groups of Sport. This demonstrates that the information of the source domain is beneficial in alleviating the problem of data sparsity in the target domain.
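The grouping step can be sketched as a simple bucketing of users by interaction count. The thresholds 5, 8, and 15 come from the text; treating each threshold as an inclusive upper bound (so that the first group is x ≤ 5 and the last is x > 15) is an assumption made here, and the names `group_of` and `LABELS` and the toy counts are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Group labels, assuming each threshold is an inclusive upper bound.
LABELS = ["x<=5", "5<x<=8", "8<x<=15", "x>15"]


def group_of(num_interactions, bounds=(5, 8, 15)):
    """Return the sparsity group for a user's target-domain interaction count."""
    return LABELS[bisect_left(bounds, num_interactions)]


# Toy interaction counts per user (hypothetical).
counts = {"u1": 3, "u2": 5, "u3": 6, "u4": 12, "u5": 40}
group_sizes = Counter(group_of(n) for n in counts.values())
print(group_sizes)
```

Per-group metrics such as HR@20 would then be averaged over the users that fall into each bucket.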
To further investigate how the improvement from source-domain information differs between inactive users (x ≤ 5) and active users (x > 15) in the target domain, we split each of these two user groups in Sport into four groups based on their interaction levels in Cloth (see Fig. 4 (b) and Fig. 4 (c)). As illustrated in Fig. 4 (b), our model yields a substantial performance improvement for inactive users as their interactions in the source domain increase. This also verifies our model’s effectiveness in leveraging information from the source domain to tackle the sparsity problem. In contrast, Fig. 4 (c) illustrates the results for active users of Sport. These users can also benefit from information from the source domain. However, the performance degrades as the source-domain interactions increase (from x > 8). The reason might be that excessive interactions in the source domain inject noisy information into the representation learning process, which hurts the final performance.

Performance Comparison of various user groups with different interaction levels. Specifically, the lines denote the performance w.r.t HR@20. Note that x denotes the interaction number in the target domain, and y denotes the interaction number in the source domain. (a) Users are grouped based on the interaction levels in the target domain; (b) Users, who have less than 6 (x ≤ 5) interactions in the target domain, are grouped based on the interaction levels in the source domain; (c) Users, who have more than 15 (x > 15) interactions in the target domain, are grouped based on the interaction levels in the source domain.
In this section, we examine the effectiveness of the presented attention mechanisms. Due to space limitations, we only show experimental results collected on Cloth & Sport. The analysis is based on the performance comparison of the following variants of our DAG4CDR.
The results of the variants and DAG4CDR are reported in Tables 5 and 6. DAG_n, DAG_d, and DAG4CDR all outperform DAG_wa, demonstrating the importance of differentiating the influence of neighbor nodes in GCN. Besides, DAG_d outperforms DAG_n, which indicates that on Sport&Cloth, differences between domains influence users’ preferences more than differences between individual items. Finally, our model outperforms all three variants, which validates the effectiveness of the dual attention mechanism. In other words, our proposed model can effectively capture user preferences in cross-domain recommendation.
Performance of DAG4CDR and its variant for common users
Performance of DAG4CDR and its variant for non-common users
In this paper, we presented a novel Dual Attentive Graph Convolutional Network for Cross-Domain Recommendation, which performs embedding propagation over a unified graph constructed from the interactions of both the source and target domains to learn user and item embeddings. With embedding propagation on this graph, our model not only enables the embedding learning of common users to benefit directly from the information in both domains, but also enhances the embedding learning of non-common users indirectly. In addition, it adopts a dual attention mechanism to differentiate the contributions of neighboring nodes and of different domains to the target prediction. Extensive experimental results on four real-world datasets and four couple datasets demonstrate that our model can significantly improve the performance for both common and non-common users compared with the SOTA CDR and single-domain recommendation models. Further experiments also validate the effectiveness of the dual attention mechanism. In the future, we will try to disentangle the features learned in the target and source domains to distill the common and distinct features of both domains, so as to better utilize the information extracted from the source domain to facilitate recommendation in the target domain.
Acknowledgments
This research is supported in part by the Key R&D Program of Shandong (Major scientific and technological innovation projects), No.:2022CXGC020107, the National Natural Science Foundation of China, No.:61902223, No.:61906108, Young creative team in universities of Shandong Province, No.:2020KJN012 and the NSF of Shandong Province, No.:ZR2021MF040.
Footnotes
1. www.amazon.com
2. www.youtube.com
3. https://pytorch.org
4. https://github.com/zhangyucs/DAG4CDR
