Asymmetric multilevel interactive attention network integrating reviews for item recommendation

Abstract

Recently, most studies in the field have focused on integrating the reviews behind ratings to improve recommendation performance. However, two main problems remain: (1) most works use a unified data form and the same processing method for user and item reviews, regardless of their essential differences; (2) most works adopt only a simple concatenation operation when constructing the user-item interaction, thus ignoring the multilevel relationship between the user and the item, which may lead to suboptimal recommendation performance. In this paper, we propose a novel Asymmetric Multi-Level Interactive Attention Network (AMLIAN) that integrates reviews for item recommendation. AMLIAN predicts precise ratings to help users make better and faster decisions. Specifically, to address the essential difference between user and item reviews, AMLIAN uses an asymmetric network that constructs user and item features from different data forms (review-level for users and document-level for items). To learn more personalized user-item interactions, the user ID, the item ID and processed features of the user and item reviews are used to model multilevel relationships. Experiments on five real-world datasets show that AMLIAN significantly outperforms state-of-the-art methods.

Keywords

Recommender systems; neural network; attention mechanism; review analysis

1. Introduction

In recent years, recommender systems have been widely used on e-commerce platforms [1, 2], e.g., Amazon and Taobao. Traditional recommendation methods are based on collaborative filtering [3, 4], which has achieved great success due to its simplicity and effectiveness. However, it also has limitations. Specifically, such methods [5, 6, 7] use only the users' rating information, which makes it difficult to provide reliable recommendations for users or items with few ratings (the cold-start problem).

Figure 1.

Examples of user and item historical reviews.

To address this problem, some studies [1, 8, 9, 10, 11] have begun to exploit the reviews written by users. Compared to ratings, a user's reviews of items reflect the user's interests intuitively and accurately across many aspects. The reviews received by an item all share the same target, so they can reflect the item's salient attributes. In Fig. 1, the left side shows a user's reviews of a purchased mobile phone, pair of headphones and clothes. We find that the user prefers items from specific brands, with high cost performance, and that are comfortable (aspect features). The right side shows reviews written by several users about a specific mobile phone, from which system fluency, battery capacity and signal strength emerge as the salient attributes of this item. In addition, studies have shown that using review information in recommender systems can effectively improve prediction accuracy, especially for users and items with few ratings [12, 13, 14, 15, 16].

In recent years, deep learning has made remarkable achievements in various fields, such as natural language processing. Inspired by these works [17, 18], we leverage the strong fitting ability of deep learning to improve recommendation accuracy. From the perspective of problem modeling, recent works exploiting review information can be divided into two types [19]: document-level methods and review-level methods. The former combine all reviews into a document before processing. For example, DeepCoNN [1] uses two identical convolutional neural networks (CNNs) to extract semantic features from a user document and an item document, and then fuses the two features to generate a rating prediction. The latter model each review separately and then fuse them. For example, NARRE [20] uses two identical CNNs and a specific attention mechanism to extract the features of user and item reviews at the review level, and integrates these features with the user ID and item ID to predict the rating. Although these approaches have advanced the state of the art, some problems remain:

The types of user and item reviews are essentially different. User reviews are written by a user about a number of purchased items; they reflect the user's interests across a wide range of aspects, and their importance necessarily varies. Review-level data and methods are therefore appropriate for them. Item reviews are written by different users about the attributes of a single item and thus share a consistent aspect style, so document-level data and methods are more suitable. However, most works use the same data form and method for user and item reviews, which may fail to capture accurate user and item features.

There is a lack of multilevel user-item interaction. For example, when estimating whether a user would buy a pair of headphones, we should assess how relevant this product is to the user's reviews of digital products. Furthermore, it is necessary to learn, for each user, the specific multilevel features of the target item (e.g., from coarse-grained color to fine-grained price ratio). However, most approaches ignore multilevel interaction when capturing user and item features, which may lead to suboptimal recommendation performance.

To address these limitations, we propose a novel model named Asymmetric Multi-Level Interactive Attention Network (AMLIAN). AMLIAN uses user (item) review information and user (item) IDs to generate more accurate rating predictions. Specifically, the model uses an asymmetric multilevel interactive network and attention mechanisms to fuse the different types of reviews with the user and item IDs. The main contributions of this paper are summarized as follows:

Our proposed AMLIAN adopts four attention layers. The user representation network includes a review aspect level attention layer and a review level attention layer, which progressively select the useful aspect features within a single review and then across all reviews. The item representation network includes a word level attention layer and a document aspect level attention layer, which progressively select useful words and then useful aspect features at the document level.

We propose a unified asymmetric multilevel interactive deep learning model that fuses ratings and reviews. The model considers both coarse- and fine-grained interactions between the user and the item when constructing the user-item interaction. In addition, to the best of our knowledge, AMLIAN is the first model to adopt different data forms according to the characteristics of user and item reviews.

Experiments are performed on five real-world datasets, and the experimental results show that the proposed AMLIAN model achieves better rating prediction accuracy than the existing state-of-the-art methods.

2. Related work

Many approaches have been developed to improve recommendation performance. Our work is related to two lines of literature focused on document-level and review-level methods. Document-level methods: concatenate all user or item reviews to form a document, and then extract the features of the document as the feature representation of the user or item. Review-level methods: model each review separately first and then construct the feature representation of the user or item according to the importance of each review.

2.1 Document-level methods

Compared to review-level methods, document-level methods are coarse-grained. For example, D-Att [21] followed the premise that different words have different importance for modeling users and items, and introduced two word-level attention mechanisms to find more informative words. DAML [22] focused on the importance of different words and on the relationships between feature interactions by using two kinds of attention, and then fed the features into neural factorization machines to generate the predicted score. ANR [23] used user and item documents to model user and item features from the perspective of aspects via a co-attention mechanism. CARL [24] argued that the same word carries different semantic information in different contexts, so paired, interrelated and dynamic features should be learned with CNNs.

However, this paradigm is not suitable for reviews written by a user. These reviews reflect a wide range of user interests, so their importance necessarily varies across target items. Treating them in the same way as item reviews hinders the hierarchical construction of user features.

2.2 Review-level methods

Compared to document-level methods, review-level methods are fine-grained. For example, NRPA [25] selected important words and reviews for different users and items through an attention mechanism, learning review representations from words and user or item representations from reviews. TAERT [26] used a temporal convolutional network to obtain the feature representations of user and item reviews and then used three interrelated attention mechanisms to generate rating predictions and explanations. EDMF $+$ [27] used the attention mechanism of CNNs to obtain word-level feature representations for users and items; considering the sparseness of review information, the authors used the L0 norm to constrain it. DARMH [5] constructed local and interactive attention to extract a user's personalized preference for a target item. In addition, to help learn item review features, review helpfulness features were used as part of the attention vector of item reviews.

However, this paradigm is not suitable for reviews received by an item. Such reviews are written by different users, but all of them evaluate the same item according to its attributes, so they share a consistent aspect style. Treating them in the same way as user reviews does not help construct a unified representation of the item's attributes.

3. Problem formalization

Let $u_{i}$ and $U=\{u_{1},u_{2},\ldots,u_{i},\ldots,u_{M}\}$ denote a user and the user set, respectively; similarly, $i_{j}$ and $I=\{i_{1},i_{2},\ldots,i_{j},\ldots,i_{N}\}$ denote an item and the item set, respectively. $R\in R^{M\times N}$ denotes the interaction matrix, which can include explicit real-valued ratings or implicit binary 0/1 feedback. Here, we focus on explicit real-valued ratings $r_{u,i}\in R$ .

Let $s_{i}^{u}$ and $S^{u}=\{s_{1}^{u},s_{2}^{u},\ldots,s_{i}^{u},\ldots,s_{m}^{u}\}$ denote a single review text of user $u$ and the set of all $m$ review texts of user $u$, respectively. Let $t_{j}^{i}$ and $T^{i}=\{t_{1}^{i},t_{2}^{i},\ldots,t_{l}^{i}\}$ denote a word in the document formed by concatenating all reviews of item $i$ and that document itself (of $l$ words), respectively.

The AMLIAN model can be formalized as follows:

Input: The interaction data consist of user and item identities, represented by the one-hot vectors $v_{u}^{U}$ and $v_{i}^{I}$ for user $u\in U$ and item $i\in I$, respectively. The review data consist of the user review texts $s_{i}^{u}\in S^{u}$, where $S^{u}$ is the review set of user $u$, and the words $t_{j}^{i}\in T^{i}$ of the item document $T^{i}$, which concatenates all reviews of item $i$.

Output: The whole model can be expressed as a function $f_{u}:U,I,S,T\rightarrow\hat{R}$, whose output is the predicted rating matrix $\hat{R}$. That is, for any user $u$ and item $i$, we obtain the predicted rating $\hat{r}_{u,i}$ via $f_{u}:u,i,s_{i}^{u},t_{j}^{i}\rightarrow\hat{r}_{u,i}$.

4. The proposed AMLIAN model

The goal of our model is to predict the rating that a given user would give to a given item. For example, as shown in Fig. 1, the user “katrina Malat” has reviewed headphones, underwear and other items, while the target item “Apple iPhone 12” has received reviews from different users. We construct the user feature representation from the user's reviews and the item feature representation from the target item's reviews. The two representations are then fused and fed into the prediction layer (NFM) to predict the rating that “katrina Malat” would give to “Apple iPhone 12”. The overall architecture of the proposed AMLIAN is shown in Fig. 2. It consists of three key modules:

Figure 2.

The AMLIAN model architecture.

User representation network: This network uses the user's review-level data to learn the feature representation of the user's reviews and consists of the review aspect level attention layer and the review level attention layer. Specifically, the review aspect level attention layer uses an attention mechanism over the item ID, the single aspect features extracted by CNNs and single review features to learn the weight distribution over the multiaspect features in each user review, making them more relevant to the aspects of the target item. According to the characteristics of the target item, the review level attention layer uses an attention mechanism over the item ID, all aspects extracted by CNNs and all review features to learn the weight distribution over all reviews of the user. Finally, all features of the user are fused according to the attention scores to form the user feature representation.

Item representation network: This network uses document-level data of the target item to learn the feature representation of the target item, including the word level attention layer and document aspect level attention layer. Specifically, the word level attention layer uses the attention mechanism of the word sequence, user ID and word set around the center word to learn the weight distribution of words in the document to make it more relevant to the user’s preferences. The document aspect level attention layer uses the attention mechanism of Euclidean distance between the fitting features of aspects of the weighted document and the fitting features of aspects of the user’s weighted reviews to learn the weight distribution of the aspect features of the target item. Finally, according to the attention score, all features of the target item are fused as the feature representation of the target item.

Prediction layer: The user feature representation, item feature representation, user ID and item ID are integrated and fed into the NFM prediction layer to generate the final predicted rating. NFM is a classic recommendation model that adds a Bi-Interaction pooling layer on top of Factorization Machines (FM), enabling it to model high-order and nonlinear feature interactions.

4.1 User representation network

Review aspect level attention layer

This layer is shown in detail in Fig. 3a. Because a user's interests are broad, this layer aims to extract, from each of the user's reviews, the aspects most relevant to the target item, guided by the characteristics of the target item.

Embedding look up layer

Consider a review text $s_{i}^{u}\in S^{u}$ composed of $v$ words, $s_{i}^{u}=\{t_{1}^{u},t_{2}^{u},\ldots,t_{v}^{u}\}$. In the embedding layer, the word vector model GloVe [28] maps each word in $s_{i}^{u}$, in its original order, to the word vector matrix $E^{u}=\{e_{1}^{u},e_{2}^{u},\ldots,e_{v}^{u}\}\in R^{d\times v}$, where $d$ is the embedding dimension of each word and $e_{i}^{u}$ is the word vector of the $i$-th word in $s_{i}^{u}$.

Figure 3.

The architecture of four attention modules: (a) Review aspect level attention layer, (b) Review level attention layer, (c) Word level attention layer and (d) Document aspect level attention layer.

Convolution layer

We use CNNs with $k$ convolution filters to extract multiaspect features from $E^{u}$. Each filter is a parameter matrix $f_{k}\in R^{\tau\times d}$, where $\tau$ is the sliding window size. The convolution result of filter $f_{k}$ over the $i$-th window is:

$\displaystyle c_{k,i}^{u}=f_{k}\odot E_{(:,i:(i+\tau-1))}^{u}+b_{n},$ (1)

where $\odot$ denotes the inner product, $E_{(:,i:(i+\tau-1))}^{u}$ is the slice of $E^{u}$ covered by the sliding window from the $i$-th to the $(i+\tau-1)$-th position, and $b_{n}$ is a bias term. With zero padding, the feature vector produced by $f_{k}$ is:

$\displaystyle c_{k}^{u}=[c_{k,1}^{u},c_{k,2}^{u},\ldots,c_{k,v}^{u}],$ (2)

where $c_{k}^{u}\in R^{v\times 1}$ . Then, we stack the features generated by all filters $f_{k}$ to form a matrix, which is expressed as:

$\displaystyle C^{u}=[c_{1}^{u},c_{2}^{u},\ldots,c_{k}^{u}],$ (3)

where $C^{u}\in R^{v\times k}$, and the $i$-th row of $C^{u}$ represents the multiaspect features of the $i$-th word in the review.
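The convolution of Eqs. (1) to (3) can be illustrated with a small, dependency-free Python sketch. It assumes the zero padding is appended at the end of the review so that each window starts at word $i$ as in Eq. (1), stores the embeddings row-wise (one row per word, i.e., the transpose of $E^{u}$), and applies no nonlinearity because none appears in Eq. (1). The function name `conv1d_zero_padded` is hypothetical; a practical implementation would use a deep learning framework's convolution layer.

```python
from typing import List

Matrix = List[List[float]]


def conv1d_zero_padded(E: Matrix, filters: List[Matrix], biases: List[float]) -> Matrix:
    """Sketch of Eqs. (1)-(3): convolve k filters over one review and stack the results.

    E: v x d word-embedding matrix, one row per word.
    filters: k filter matrices f_k, each tau x d.
    biases: one bias per filter.
    Returns C of shape v x k; row i holds the multiaspect features of word i.
    """
    v, d = len(E), len(E[0])
    tau = len(filters[0])
    # Append tau - 1 zero rows so that every start position i yields a full window.
    padded = E + [[0.0] * d for _ in range(tau - 1)]
    C: Matrix = []
    for i in range(v):
        window = padded[i:i + tau]  # words i .. i + tau - 1, as in Eq. (1)
        row = []
        for f_k, b_k in zip(filters, biases):
            # Inner product of the filter with the window, plus the bias.
            inner = sum(f_k[t][e] * window[t][e] for t in range(tau) for e in range(d))
            row.append(inner + b_k)
        C.append(row)
    return C
```

With three 2-dimensional word vectors and a single $2\times 2$ identity filter, the output has one row per word and one column per filter, matching $C^{u}\in R^{v\times k}$.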

Attention over review aspect level

We have obtained the aspect feature vectors of all words through CNNs. However, as mentioned above, the aspect features in a user's reviews are not all equally relevant to the target item. Therefore, we use the review aspect attention mechanism to learn the importance distribution of the user's aspect features according to the aspect features of the target item. The attention weight $\alpha$ of the aspect feature of the $i$-th word of a review is expressed as follows:

$\displaystyle\alpha_{i}^{*u}=\frac{\exp\left(V^{T}\tanh\left(W_{a}^{u}c_{i}^{u};q_{i}^{*u}\right)\right)}{\sum_{i'=1}^{k}\exp\left(V^{T}\tanh\left(W_{a}^{u}c_{i'}^{u};q_{i'}^{*u}\right)\right)},$ (4)

$\displaystyle q_{i}^{*u}=\textit{ReLU}\left(f\left(E_{i}^{u}W_{b}^{u}\right)+W_{c}^{u}i_{id}+b\right),$ (5)

where $W_{*}^{u}$ denotes the user-side attention parameter matrices (likewise below) and $b$ is the bias. $q_{i}^{*u}$ combines the word-level projection of the review text matrix $E_{i}^{u}$ with the embedding of the corresponding item ID. Then, by weighting the aspect features of the $i$-th review of the user, we obtain the following review representation:

$\displaystyle s_{i}^{*u}=[\alpha_{1}^{*u}c_{1}^{u},\alpha_{2}^{*u}c_{2}^{u},\ldots,\alpha_{k}^{*u}c_{k}^{u}].$ (6)
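The review aspect level attention of Eqs. (4) and (6) is a form of additive attention followed by reweighting. The sketch below assumes that ";" in Eq. (4) denotes vector concatenation (so $V$ has as many entries as $W_{a}^{u}c_{i}^{u}$ and $q_{i}^{*u}$ combined) and that the queries of Eq. (5) are computed elsewhere and passed in. The function and argument names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def aspect_attention(aspects: List[Vector], queries: List[Vector],
                     W_a: Matrix, V: Vector) -> Tuple[Vector, List[Vector]]:
    """Sketch of Eqs. (4) and (6): score each of the k aspect vectors, softmax, reweight.

    aspects: the k aspect feature vectors c_1 .. c_k of one review.
    queries: one precomputed query q_i per aspect (stands in for Eq. (5)).
    """
    scores = []
    for c, q in zip(aspects, queries):
        projected = [sum(w * x for w, x in zip(row, c)) for row in W_a]
        h = projected + q  # list concatenation plays the role of [W_a c_i ; q_i]
        scores.append(sum(v * math.tanh(x) for v, x in zip(V, h)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    weighted = [[a * x for x in c] for a, c in zip(alphas, aspects)]  # Eq. (6)
    return alphas, weighted
```

When $V$ is all zeros every aspect receives the same weight, so any preference between aspects comes entirely from the learned parameters and the item-conditioned queries.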

Review level attention layer

Having obtained the weighted representation vectors of all reviews of the user, we now explore how to aggregate them to construct the user feature representation. The reviews written by a user are varied and express different preferences for different items, so their importance to the target item varies. As shown in Fig. 3b (which has the same structure as Fig. 3a, with the starred symbols replaced by their unstarred counterparts), we propose a review level attention mechanism to learn the user representation. Given the review feature set $S^{*u}=[s_{1}^{*u},s_{2}^{*u},\ldots,s_{m}^{*u}]$, we calculate the weight $\alpha_{i}^{u}$ of the $i$-th review of the user as follows:

$\displaystyle\alpha_{i}^{u}=\frac{\exp\left(V^{T}\tanh\left(W_{d}^{u}s_{i}^{*u};q_{i}^{u}\right)\right)}{\sum_{i'=1}^{m}\exp\left(V^{T}\tanh\left(W_{d}^{u}s_{i'}^{*u};q_{i'}^{u}\right)\right)},$ (7)

$\displaystyle q_{i}^{u}=\textit{ReLU}\left(f\left(C^{u}W_{e}^{u}\right)+W_{f}^{u}i_{id}+b\right),$ (8)

where $q_{i}^{u}$ combines the review-level projection of the all-aspect matrix $C^{u}$ with the embedding of the corresponding item ID. Then, according to these weights, we aggregate all reviews of the user to obtain the review level feature representation:

$\displaystyle r^{u}=\sum_{i=1}^{m}\alpha_{i}^{u}s_{i}^{*u}.$ (9)
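The review level step of Eqs. (7) to (9) scores each weighted review and pools the reviews into a single user vector. The sketch below makes the same assumptions as a plain additive-attention layer: ";" is concatenation, the queries of Eq. (8) are precomputed, and each review representation $s_{i}^{*u}$ is treated as one flat vector. All names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def review_attention_pool(reviews: List[Vector], queries: List[Vector],
                          W_d: Matrix, V: Vector) -> Tuple[Vector, Vector]:
    """Sketch of Eqs. (7)-(9): attention weights over m reviews, then a weighted sum.

    reviews: the m weighted review vectors s_i^{*u}, each flattened to one vector.
    queries: one precomputed query q_i^u per review (stands in for Eq. (8)).
    """
    scores = []
    for s, q in zip(reviews, queries):
        projected = [sum(w * x for w, x in zip(row, s)) for row in W_d]
        h = projected + q  # [W_d s_i ; q_i]
        scores.append(sum(v * math.tanh(x) for v, x in zip(V, h)))
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # Eq. (7)
    dim = len(reviews[0])
    r_u = [sum(a * s[t] for a, s in zip(alphas, reviews)) for t in range(dim)]  # Eq. (9)
    return alphas, r_u
```

Because the weights sum to one, $r^{u}$ is a convex combination of the review vectors, so it stays on the same scale as a single review regardless of how many reviews the user has written.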

4.2 Item representation network

Word level attention layer

Inspired by the work of [9], as shown in Fig. 3c, the purpose of this layer is to better combine document-level information and filter out unnecessary words.

Embedding look up layer

Given item document text $T^{i}$ , similar to the user embedding look-up layer, we can obtain $E^{i}=[e_{1}^{i},e_{2}^{i},\ldots,e_{l}^{i}]$ .

Attention over word level

The item document contains considerable noise, and not every word in it is equally important. Therefore, we use the word level attention mechanism to learn the importance distribution of words according to the characteristics of the item itself and those of the user associated with the target item.

The $j$-th word in the word matrix $E^{i}$ is used as the central word, and the filter serves as a sliding window of length $\omega$. The words covered by the window, $e_{\omega,j}^{i}$, are those associated with the central word. The attention weight $\alpha_{j}^{*i}$ of the $j$-th word is calculated as follows:

$\displaystyle\alpha_{j}^{*i}=\frac{\exp\left(V^{T}\tanh\left(W_{a}^{i}D_{j}^{T};q_{j}^{i}\right)\right)}{\sum_{j'=1}^{l}\exp\left(V^{T}\tanh\left(W_{a}^{i}D_{j'}^{T};q_{j'}^{i}\right)\right)},$ (10)

$\displaystyle D_{j}=E^{i}\cdot e_{\omega,j}^{i},$ (11)

$\displaystyle e_{\omega,j}^{i}=\left(e_{j+(-\omega+1)/2}^{i},e_{j+(-\omega+3)/2}^{i},\ldots,e_{j}^{i},\ldots,e_{j+(\omega-3)/2}^{i},e_{j+(\omega-1)/2}^{i}\right),$ (12)

$\displaystyle q_{j}^{i}=\textit{ReLU}\left(f\left(E^{i}W_{b}^{i}\right)+W_{c}^{i}u_{id}+b\right),$ (13)

where $W_{*}^{i}$ denotes the item-side attention parameter matrices (likewise below), $D$ (Eq. (11)) captures the interaction between the document matrix $E^{i}$ and the window of words around the $j$-th word, and $q_{j}^{i}$ combines the projected document matrix $E^{i}$ with the embedding of the corresponding user ID. According to the attention weight, the weighted word vector of the $j$-th word is computed as follows:

$\displaystyle\hat{e}_{j}^{i}=\alpha_{j}^{*i}e_{j}^{i},$ (14)

where $e_{j}^{i}$ denotes the word vector of the $j$-th word. The word vector matrix weighted by word attention is:

$\displaystyle\hat{E}^{i}=[\hat{e}_{1}^{i},\hat{e}_{2}^{i},\ldots,\hat{e}_{l}^{i}].$ (15)
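The central-word window of Eq. (12) and the reweighting of Eqs. (14) and (15) can be sketched as follows. Because the exact product in Eq. (11) is not fully specified, this sketch substitutes the flattened window around word $j$ for $D_{j}$, pads windows that cross the document boundary with zero vectors, and assumes $\omega$ is odd; these choices and all names are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def center_window(E: List[Vector], j: int, omega: int) -> List[Vector]:
    """Eq. (12): the omega word vectors centred on word j (omega odd), zero-padded at edges."""
    half = (omega - 1) // 2
    d = len(E[0])
    return [E[p] if 0 <= p < len(E) else [0.0] * d for p in range(j - half, j + half + 1)]


def word_level_attention(E: List[Vector], omega: int, W_a: Matrix, V: Vector,
                         q: Vector) -> Tuple[Vector, List[Vector]]:
    """Sketch of Eqs. (10), (14), (15); the flattened window stands in for D_j.

    W_a rows must have omega * d entries, and len(V) == len(W_a) + len(q).
    """
    scores = []
    for j in range(len(E)):
        flat = [x for vec in center_window(E, j, omega) for x in vec]
        h = [sum(w * x for w, x in zip(row, flat)) for row in W_a] + q  # [W_a D_j ; q]
        scores.append(sum(v * math.tanh(x) for v, x in zip(V, h)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # softmax over the l words, Eq. (10)
    E_hat = [[a * x for x in e] for a, e in zip(alphas, E)]  # Eqs. (14)-(15)
    return alphas, E_hat
```

Note that the softmax runs over all $l$ words of the document, so the weights shrink as the document grows; $\hat{E}^{i}$ therefore rescales words relative to one another rather than masking them.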

Document aspect level attention layer

Exploring the relationship between the item's aspect features and the user from all angles helps build item features that better match the preferences of the target user. Having obtained the purified document information of the item, we can explore the relationship between the aspect features of the purified target item and those of the user's purified review information, as shown in Fig. 3d.

Convolution layer

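The process is similar to that of the user convolution layer. Feeding $\hat{E}^{i}$ into the convolution layer yields the item-side counterparts of Eqs. (1) to (3): the window-level convolution results, the per-filter feature vectors $c_{k}^{i}$, and the stacked aspect matrix $C^{i}$.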

Attention over document aspect level

Then, we map the item and user features to the same feature space and calculate the Euclidean distance between the filtered item aspect features and filtered user features. The filtered user features are as follows:

$\displaystyle S^{*u}=[\alpha_{1}^{u}s_{1}^{*u},\alpha_{2}^{u}s_{2}^{*u},\ldots,\alpha_{m}^{u}s_{m}^{*u}],$ (16)

We apply $k$ convolution filters to these weighted reviews to obtain the purified multiaspect features:

$\displaystyle c_{k}^{*u}=[c_{k,1}^{*u},c_{k,2}^{*u},\ldots,c_{k,mv}^{*u}],$ (17)

where $c_{k}^{*u}\in R^{mv}$. Next, we stack all features into a matrix:

$\displaystyle C^{*u}=[c_{1}^{*u},c_{2}^{*u},\ldots,c_{k}^{*u}],$ (18)

where $C^{*u}\in R^{mv\times k}$, and the $i$-th row of $C^{*u}$ represents the multiaspect features of the $i$-th word across the user's reviews. Specifically, we define a correlation scoring function $g_{j}^{*i}$ based on the Euclidean distance between the filtered item review features and the filtered user review features, both mapped to the same feature space. The correlation score is calculated as follows:

$\displaystyle g_{j}^{*i}=\frac{1}{\left\|f\left(c_{j}^{i}W_{d}^{i}\right)-f\left(c_{j}^{*u}W_{e}^{i}\right)\right\|_{2}+\epsilon},$ (19)

where $\epsilon$ is a small constant that prevents division by zero. The weight of the $j$-th document aspect feature is:

$\displaystyle\alpha_{j}^{i}=\frac{\exp(g_{j}^{*i})}{\sum_{j'=1}^{k}\exp(g_{j'}^{*i})},$ (20)

Then, we obtain the aspect-level feature $r^{i}$ of item $i$ by aggregating all $k$ aspects according to their weights:

$\displaystyle r^{i}=\sum_{j=1}^{k}\alpha_{j}^{i}c_{j}^{i}.$ (21)
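Eqs. (19) to (21) turn a distance into an attention weight: the closer an item aspect is to the corresponding user aspect, the larger its score. The sketch below takes the projections $f(\cdot W)$ to be the identity (an assumption made only to keep the example small), uses the Euclidean norm, and pairs the $j$-th item aspect with the $j$-th user aspect. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def document_aspect_attention(item_aspects: List[Vector], user_aspects: List[Vector],
                              eps: float = 1e-8) -> Tuple[Vector, Vector]:
    """Sketch of Eqs. (19)-(21) with identity projections.

    Returns the aspect weights alpha_j and the aggregated item feature r^i.
    """
    # Eq. (19): inverse distance, so near-identical aspects get large scores.
    g = [1.0 / (euclidean(ci, cu) + eps) for ci, cu in zip(item_aspects, user_aspects)]
    top = max(g)  # stable softmax, Eq. (20)
    exps = [math.exp(x - top) for x in g]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(item_aspects[0])
    # Eq. (21): weighted sum of the item aspect vectors.
    r_i = [sum(a * c[t] for a, c in zip(alphas, item_aspects)) for t in range(dim)]
    return alphas, r_i
```

Because $g_{j}^{*i}$ grows without bound as the distance approaches zero, the value of $\epsilon$ effectively limits how sharply the softmax can concentrate on a single aspect.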

4.3 Fusion and prediction layer

Because some users or items have few reviews, we combine the ID embedding vectors of the user and item into their final feature representation with the following formula:

$\displaystyle u=r^{u}\oplus u_{id},$ (22)

$\displaystyle i=r^{i}\oplus i_{id},$ (23)

where $\oplus$ is the concatenation operator, $u$ is the final user feature representation and $i$ is the final target item feature representation. We combine these two features as follows:

$\displaystyle z=\delta[u,i],$ (24)

where $z$ denotes the user-item interaction, $\delta$ is the activation function, and $[\cdot,\cdot]$ concatenates the two features in the latent layer. Inspired by [22], we use NFM to capture high-order nonlinear feature interactions; the prediction function is:

$\displaystyle\hat{r}_{u,i}(z)=m_{0}+\sum_{j=1}^{|z|}m_{j}z_{j}+\Gamma(f(z)),$ (25)

$\displaystyle f(z)=\frac{1}{2}\left[\left(\sum_{j=1}^{|z|}z_{j}v_{j}\right)^{2}-\sum_{j=1}^{|z|}\left(z_{j}v_{j}\right)^{2}\right],$ (26)

$\displaystyle\Gamma(f(z))=h^{T}\delta_{L}\left(W_{L}\left(\cdots\delta_{1}\left(W_{1}f(z)+b_{1}\right)\cdots\right)+b_{L}\right),$ (27)

where $\hat{r}_{u,i}(z)$ is the predicted rating, $m_{0}$ is the global bias, $m_{j}$ is the weight of the $j$-th feature, $z_{j}\in z$ is the value of feature $j$, and $f(z)$ is the Bi-Interaction pooling that captures pairwise feature interactions (the squares are taken element-wise). $z_{j}$, $z_{k}\in z$ denote the $j$-th and $k$-th entries of the user-item feature vector, $v_{j}$, $v_{k}\in R^{s}$ denote their embedding vectors, and $s$ is the embedding dimension. $h$ is the projection vector of the output layer, the activation functions $\delta_{\ast}$ are ReLU, $\{W_{L},b_{L}\}$ are the learnable weights and biases of the hidden layers that model high-order interactions, $L$ is the number of hidden layers of the MLP, and $\theta=\{m_{0},\{m_{j},v_{j}\},h,\{W_{L},b_{L}\}\}$ denotes the model parameters.
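The NFM prediction of Eqs. (25) to (27) can be checked on a toy input. The sketch below implements the standard NFM Bi-Interaction form, $\frac{1}{2}[(\sum_{j} z_{j}v_{j})^{2}-\sum_{j}(z_{j}v_{j})^{2}]$ taken element-wise, and compares it with an explicit sum over all pairs $j<k$, which is the identity that makes the pooling efficient. It assumes a single hidden layer ($L=1$) with ReLU; all names are hypothetical.

```python
from itertools import combinations
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bi_interaction(z: Vector, V: Matrix) -> Vector:
    """Eq. (26): 0.5 * [(sum_j z_j v_j)^2 - sum_j (z_j v_j)^2], element-wise."""
    s = len(V[0])
    scaled = [[zj * x for x in vj] for zj, vj in zip(z, V)]
    square_of_sum = [sum(row[t] for row in scaled) ** 2 for t in range(s)]
    sum_of_squares = [sum(row[t] ** 2 for row in scaled) for t in range(s)]
    return [0.5 * (a - b) for a, b in zip(square_of_sum, sum_of_squares)]


def pairwise_reference(z: Vector, V: Matrix) -> Vector:
    """Brute-force check: sum over j < k of (z_j v_j) * (z_k v_k), element-wise."""
    s = len(V[0])
    out = [0.0] * s
    for j, k in combinations(range(len(z)), 2):
        for t in range(s):
            out[t] += z[j] * V[j][t] * z[k] * V[k][t]
    return out


def nfm_predict(z: Vector, m0: float, m: Vector, V: Matrix,
                W1: Matrix, b1: Vector, h: Vector) -> float:
    """Eq. (25) with one ReLU hidden layer standing in for the MLP of Eq. (27)."""
    linear = m0 + sum(mj * zj for mj, zj in zip(m, z))
    f = bi_interaction(z, V)
    hidden = [max(0.0, sum(w * x for w, x in zip(row, f)) + b) for row, b in zip(W1, b1)]
    return linear + sum(hj * x for hj, x in zip(h, hidden))
```

If the second term of the pooling were the squared sum again instead of the sum of squares, the two terms would cancel and $f(z)$ would always be zero; the brute-force reference makes that kind of error easy to catch.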

To train the parameters of the AMLIAN model, we use regression with a squared loss as the objective function:

$\displaystyle J=\sum_{(u,i)\in R}\left(\hat{r}_{u,i}-r_{u,i}\right)^{2}+\lambda_{\theta}\|\theta\|^{2},$ (28)

where $R$ denotes the user-item rating matrix (the sum runs over its observed entries), $r_{u,i}$ denotes the actual rating given by user $u$ to item $i$, $\hat{r}_{u,i}$ is the predicted rating, and $\theta$ denotes the AMLIAN model parameters. The regularization term $\lambda_{\theta}\|\theta\|^{2}$ controls the degree of fitting and helps prevent overfitting.
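Eq. (28) can be evaluated directly once predictions are available. The sketch below treats all model parameters as one hypothetical flat list `params` and sums (rather than averages) the squared errors, as written in Eq. (28); the function name is likewise hypothetical.

```python
from typing import Sequence


def squared_loss_objective(predicted: Sequence[float], actual: Sequence[float],
                           params: Sequence[float], lam: float) -> float:
    """Eq. (28): sum of squared errors over observed ratings plus an L2 penalty.

    `params` is a hypothetical flat list standing in for all parameters theta.
    """
    sse = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    l2 = lam * sum(w * w for w in params)
    return sse + l2
```

For example, predictions $(4, 3)$ against ratings $(5, 3)$ with parameters $(1, 2)$ and $\lambda_{\theta}=0.5$ give $1 + 0.5\times 5 = 3.5$.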

4.4 Time complexity analysis

For the review aspect level attention layer, the time complexity is $O(dvk_{u})$, where $d$ is the embedding dimension of each word, $v$ is the number of words in each review, and $k_{u}$ is the number of aspect features in the review. For the review level attention layer, the time complexity is $O(dvm)$, where $m$ is the number of user reviews. For the word level attention layer, it is $O(dl)$, where $l$ is the number of words in the document. For the document aspect level attention layer, it is $O(dk_{i})$, where $k_{i}$ is the number of aspect features in the document. For the CNN layers, it is $O(dnp(vm+l))$, where $n$ is the number of user and item reviews and $p$ is the dimension of the extracted feature vectors. For the feature interaction layer, it is $O(d(M+N))$, where $M$ and $N$ denote the numbers of user and item contextual features, respectively. The complexity of the prediction layer is $O(d_{L})$. The overall time complexity for evaluating an NFM is $O(kN_{x}+\sum_{\ell=1}^{L}d_{\ell-1}d_{\ell})$, where $N_{x}$ denotes the number of nonzero entries in the input feature vector, $k$ is the size of the embedding vector, and $d_{\ell}$ is the dimension of the $\ell$-th hidden layer. Thus, the total time complexity is $O(d(vk_{u}+vm+l+k_{i}+np(vm+l)+M+N)+kN_{x}+\sum_{\ell=1}^{L}d_{\ell-1}d_{\ell})$. The time complexity of AMLIAN is therefore mainly determined by the dimension and number of the latent features.

5. Experiments

To comprehensively evaluate the performance of our proposed AMLIAN model, we conduct experiments to answer the following questions:

(RQ1) How does our proposed AMLIAN model compare to state-of-the-art recommendation models?

(RQ2) How do the model hyperparameters affect AMLIAN?

(RQ3) How do the four proposed attention layers affect the experimental results?

5.1 Datasets

We conduct experiments on the Amazon product data,1 which include user reviews and ratings. Specifically, we use the 5-core version, in which every user and item has at least 5 reviews. To alleviate the long-tail effect, we adopt the same preprocessing method as [20] to adjust the length of reviews. The statistics of these datasets are presented in Table 1.

Table 1
Statistics of datasets used in this paper

Dataset	#Users	#Items	#Ratings&Reviews	Sparsity
Office products	4,905	2,420	53,237	0.448%
Digital music	5,541	3,568	64,706	0.327%
Baby	19,445	7,050	160,732	0.117%
Toys and games	19,412	11,924	167,597	0.072%
Cell phones and accessories	27,879	10,429	194,493	0.067%
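The Sparsity column in Table 1 matches the density of the user-item matrix, i.e., #Ratings divided by #Users × #Items, expressed as a percentage. The short sketch below recomputes it from the first three columns of Table 1; the names `density_percent` and `TABLE_1` are hypothetical.

```python
from typing import Dict, Tuple


def density_percent(n_users: int, n_items: int, n_ratings: int) -> float:
    """Share of observed entries in the user-item rating matrix, in percent."""
    return 100.0 * n_ratings / (n_users * n_items)


# (#Users, #Items, #Ratings&Reviews) as listed in Table 1.
TABLE_1: Dict[str, Tuple[int, int, int]] = {
    "Office products": (4905, 2420, 53237),
    "Digital music": (5541, 3568, 64706),
    "Baby": (19445, 7050, 160732),
    "Toys and games": (19412, 11924, 167597),
    "Cell phones and accessories": (27879, 10429, 194493),
}

for name, (users, items, ratings) in TABLE_1.items():
    print(f"{name}: {density_percent(users, items, ratings):.3f}%")
```

Rounded to three decimals, the recomputed values agree with the reported ones, e.g., 0.448% for Office products.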

Note that we randomly divide each dataset into a training set (80%), a validation set (10%) and a test set (10%). All model comparisons are reported on the test set.
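A random 80/10/10 split of the rating records can be sketched as follows. It assumes the split is done over individual (user, item, rating, review) records rather than per user, uses a fixed seed so that repeated runs are reproducible, and assigns any rounding remainder to the test set; the function name and these choices are assumptions, not details given in the text.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_10_10(records: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records with a seeded RNG and cut them into train/validation/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # a private RNG leaves global state untouched
    n = len(shuffled)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder after rounding goes to the test set
    return train, val, test
```

Each of the five repeated runs could use a different seed, so that the reported standard deviation also reflects variation in the split.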

5.2 Evaluation metric

To evaluate the performance of the proposed algorithm, we use the mean absolute error (MAE) and the mean squared error (MSE) as standard metrics.

$\displaystyle\textit{MAE}=\frac{1}{N}\sum_{(u,i)\in R}\left|r_{u,i}-\hat{r}_{u,i}\right|,$ (29)

$\displaystyle\textit{MSE}=\frac{1}{N}\sum_{(u,i)\in R}\left(r_{u,i}-\hat{r}_{u,i}\right)^{2},$ (30)

where $r_{u,i}$ denotes the actual rating, $\hat{r}_{u,i}$ denotes the prediction rating value, and $N$ denotes the number of tested ratings. For fair comparison, we repeat each experiment five times and report mean ( $\pm$ std) for model comparison.
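Eqs. (29) and (30) and the "mean ($\pm$ std)" reporting can be computed with the Python standard library. The sketch assumes the sample standard deviation over the five runs (the text does not say whether the sample or population form is used), and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (29): mean absolute error over the N test ratings."""
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (30): mean squared error over the N test ratings."""
    return sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual)


def mean_std(run_scores: List[float]) -> Tuple[float, float]:
    """Summarise repeated runs as (mean, sample standard deviation)."""
    return statistics.mean(run_scores), statistics.stdev(run_scores)
```

Because MSE squares each error, a single badly mispredicted rating moves it more than it moves MAE, which is why the two metrics are reported together.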

5.3 Baselines

To evaluate the performance of the proposed model, eleven baselines are selected, including two classical methods: PMF [29] and HFT [30]; three document-level baseline methods: DeepCoNN [1], CARL [24], and DAML [22]; five review-level baseline methods: NARRE [20], NRPA [25], TAERT [26], EDMF $+$ [27], and ARPCNN [31]; and one mixed document- and review-level method: NRCA [19].

1. PMF [29] places Gaussian priors on the latent features of users and items and predicts each rating from the corresponding user and item latent features.

2. HFT [30] is a classic review-based recommendation algorithm. It uses LDA to obtain the topic distributions of user and item reviews and then learns latent features from these topic distributions to make rating predictions.

3. DeepCoNN [1] utilizes two parallel CNNs to extract the representative features of users and items from their reviews and then combines these two features and feeds them into a factorization machine to generate the rating prediction.

4. NARRE [20] explores CNNs and an attention mechanism to extract the feature representations of users and items, and the embedded values of the user and the item ID are integrated into them. Finally, the predicted values are generated by the latent factor model.

5. CARL [24] utilizes CNNs and a specific attention mechanism to derive the features of user-item pairs and then introduces a dynamic linear fusion mechanism to make predictions.

6. NRPA [25], when constructing personalized representations of users and items, first considers the importance of the word level of users and items through an attention mechanism and then considers the attention of the review level.

7. DAML [22] uses local and global attention layers to learn the dynamic interactive features of users and item reviews. Then, ratings are fused into features to capture users’ preferences.

8. NRCA [19] explores three encoders to select different information words and reviews for users and target items from the document and review levels, respectively, to form a comprehensive representation of users and items.

9. TAERT [26] uses a temporal convolutional network to extract the feature representation of user and item reviews. Then, three related attention mechanisms are used to optimize user and item features to generate the final feature representation and interpretability.

10. EDMF $+$ [27] uses CNNs and a word-level attention mechanism to capture features from user and item reviews. Because review information is sparse, it then applies an L0-norm constraint to the review features.

11. ARPCNN [31] uses two parallel CNNs with personalized word-level and review-level attention mechanisms to process user and item reviews. It also applies a personalized attention mechanism to the extended latent factors. Finally, these features are fused to generate the rating prediction.

5.4 Parameter setting

The parameters of the baseline methods are set following the strategies reported in their original papers. For the proposed AMLIAN model, we examine learning rates in [0.00001, 0.00002, 0.0005, 0.006, 0.06], dropout ratios in [0.1, 0.2, 0.3, 0.4, 0.5] to avoid overfitting, training batch sizes in [32, 64, 128, 256], and latent factor dimensions in [25, 50, 100, 150, 200]. After tuning, the learning rate is set to 0.006, the latent feature dimension to 50, the batch size to 128, and the word embedding size to 64. The number of convolution filters is set to 100 with a sliding window size of 3, and the output dimension of the CNNs is 50. The vocabulary size is 50,000, the maximum input text length is 1,000, the regularization parameter is 0.0009, and the latent feature dimension for user u and item i is 8.
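To make the search space concrete, the sketch below records the candidate values and the selected settings as a Python configuration and enumerates the full grid. The dictionary keys are hypothetical names, and exhaustive grid search is an assumption: the text does not state whether the parameters were tuned jointly or one at a time, nor which dropout ratio was finally chosen.

```python
from itertools import product

# Candidate values examined for AMLIAN (key names are hypothetical).
SEARCH_SPACE = {
    "learning_rate": [0.00001, 0.00002, 0.0005, 0.006, 0.06],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "batch_size": [32, 64, 128, 256],
    "latent_dim": [25, 50, 100, 150, 200],
}

# Settings reported as selected after tuning (the chosen dropout is not stated).
SELECTED = {
    "learning_rate": 0.006,
    "latent_dim": 50,
    "batch_size": 128,
    "word_embedding_dim": 64,
    "num_filters": 100,
    "kernel_size": 3,
    "cnn_output_dim": 50,
    "vocab_size": 50_000,
    "max_text_len": 1_000,
    "reg_lambda": 0.0009,
    "user_item_latent_dim": 8,
}

def grid(space):
    """Yield every combination of candidate values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(SEARCH_SPACE))
print(len(configs))  # 5 * 5 * 4 * 5 = 500 combinations
```

A full grid of 500 runs per dataset is costly, which is one reason tuning one parameter at a time is also common; the sketch only illustrates the size of the space.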

5.5 Performance comparison (RQ1)

Tables 2 and 3 report the rating prediction performance of AMLIAN and the baselines in terms of MAE and MSE, respectively. We draw the following conclusions:

Table 2
Performance comparison of five datasets for all methods by MAE. Boldface and underlining are used to highlight the top two results. $\Delta$ % indicates how much better AMLIAN performed than the best baseline. All the results are reported as “mean ( $\pm$ std)” across 5 random runs

MAE	Office products	Digital music	Baby	Toy and games	Cell phone and accessories
PMF	0.8327 $\pm$ 3e-4	0.8669 $\pm$ 2e-4	1.1054 $\pm$ 4e-4	0.8541 $\pm$ 4e-4	1.1125 $\pm$ 3e-4
HFT	0.8124 $\pm$ 4e-4	0.8577 $\pm$ 5e-4	1.1121 $\pm$ 7e-4	0.8485 $\pm$ 6e-4	1.0212 $\pm$ 5e-4
DeepCoNN	0.6277 $\pm$ 8e-4	0.6634 $\pm$ 2e-3	0.8145 $\pm$ 3e-3	0.6468 $\pm$ 2e-3	0.8993 $\pm$ 1e-3
CARL	0.6563 $\pm$ 2e-3	0.7603 $\pm$ 3e-3	0.8681 $\pm$ 6e-3	0.7409 $\pm$ 2e-3	0.9115 $\pm$ 1e-3
DAML	0.6187 $\pm$ 2e-3	0.6679 $\pm$ 2e-3	0.8312 $\pm$ 1e-3	0.6703 $\pm$ 1e-3	0.9255 $\pm$ 1e-3
NARRE	0.6263 $\pm$ 2e-3	0.6677 $\pm$ 1e-3	0.8417 $\pm$ 1e-3	0.6482 $\pm$ 1e-3	0.8845 $\pm$ 1e-3
NRPA	0.6213 $\pm$ 3e-3	0.6458 $\pm$ 2e-3	0.8081 $\pm$ 1e-3	0.6315 $\pm$ 2e-3	0.8716 $\pm$ 1e-3
NRCA	0.6145 $\pm$ 2e-3	0.6396 $\pm$ 4e-3	0.7818 $\pm$ 3e-3	0.6263 $\pm$ 4e-3	0.8624 $\pm$ 2e-3
TAERT	0.6105 $\pm$ 1e-3	0.6331 $\pm$ 2e-3	0.7793 $\pm$ 2e-3	0.6156 $\pm$ 2e-3	0.8597 $\pm$ 1e-3
EDMF $+$	0.5622 $\pm$ 5e-4	0.6180 $\pm$ 2e-3	0.7232 $\pm$ 1e-3	0.5746 $\pm$ 7e-4	0.7437 $\pm$ 1e-3
ARPCNN	0.5756 $\pm$ 2e-3	0.6133 $\pm$ 4e-4	0.7318 $\pm$ 5e-4	0.5789 $\pm$ 1e-3	0.7391 $\pm$ 1e-3
AMLIAN	0.5269 $\pm$ 2e-3	0.5896 $\pm$ 2e-3	0.7080 $\pm$ 7e-4	0.5592 $\pm$ 6e-4	0.7178 $\pm$ 4e-4
$\Delta$ %	6.28	3.86	2.10	2.68	2.88

Table 3

Performance comparison of five datasets for all methods by MSE. Boldface and underlining are used to highlight the top two results. $\Delta$ % indicates how much better AMLIAN performed than the best baseline. All the results are reported as “mean ( $\pm$ std)” across 5 random runs

MSE	Office products	Digital music	Baby	Toy and games	Cell phone and accessories
PMF	0.9425 $\pm$ 4e-4	1.0523 $\pm$ 3e-4	1.3126 $\pm$ 5e-4	0.9192 $\pm$ 5e-4	1.7454 $\pm$ 4e-4
HFT	0.9321 $\pm$ 5e-4	1.0145 $\pm$ 5e-4	1.3154 $\pm$ 1e-3	0.9213 $\pm$ 1e-3	1.7037 $\pm$ 6e-4
DeepCoNN	0.7444 $\pm$ 1e-1	0.8111 $\pm$ 2e-3	1.1783 $\pm$ 3e-3	0.8160 $\pm$ 1e-3	1.4687 $\pm$ 2e-3
CARL	0.7429 $\pm$ 3e-3	0.9214 $\pm$ 3e-3	1.2328 $\pm$ 4e-3	0.9132 $\pm$ 2e-3	1.5163 $\pm$ 2e-3
DAML	0.7384 $\pm$ 3e-3	0.8514 $\pm$ 3e-3	1.2374 $\pm$ 2e-3	0.8795 $\pm$ 2e-3	1.3976 $\pm$ 2e-3
NARRE	0.7188 $\pm$ 3e-3	0.8010 $\pm$ 2e-3	1.1807 $\pm$ 2e-3	0.8239 $\pm$ 2e-3	1.3731 $\pm$ 2e-3
NRPA	0.7049 $\pm$ 3e-3	0.8223 $\pm$ 4e-3	1.1618 $\pm$ 4e-3	0.8234 $\pm$ 3e-3	1.3431 $\pm$ 2e-3
NRCA	0.6912 $\pm$ 3e-3	0.7954 $\pm$ 4e-3	1.1591 $\pm$ 4e-3	0.7811 $\pm$ 2e-3	1.3386 $\pm$ 2e-3
TAERT	0.7117 $\pm$ 2e-3	0.7859 $\pm$ 3e-3	1.1524 $\pm$ 3e-3	0.7765 $\pm$ 6e-4	1.3354 $\pm$ 2e-3
EDMF $+$	0.6598 $\pm$ 1e-3	0.7912 $\pm$ 3e-3	1.1560 $\pm$ 6e-4	0.7541 $\pm$ 2e-3	1.2794 $\pm$ 2e-3
ARPCNN	0.7044 $\pm$ 3e-3	0.7767 $\pm$ 1e-3	1.1425 $\pm$ 2e-3	0.7411 $\pm$ 1e-3	1.2673 $\pm$ 2e-3
AMLIAN	0.6180 $\pm$ 2e-3	0.7484 $\pm$ 3e-3	1.0999 $\pm$ 1e-3	0.7149 $\pm$ 1e-3	1.2103 $\pm$ 1e-3
$\Delta$ %	6.34	3.64	3.73	3.54	4.50

PMF generally performs worst among all methods because it relies only on ratings, and the user-item rating matrices of these datasets are highly sparse. HFT generally outperforms PMF, likely because the topic distributions learned from user and item reviews help uncover latent features.

All of the review-based deep learning methods outperform HFT. This demonstrates that reviews are a crucial information source for improving recommendation effectiveness. For example, DeepCoNN extracts semantic features from review text, which helps alleviate data sparsity and yields better recommendations.

Among both document-level and review-level methods, attention-based approaches (e.g., NARRE, CARL, NRPA, DAML, and NRCA) typically outperform methods without attention (e.g., DeepCoNN). A likely reason is that reviews and documents contain noisy words and sentences that hinder the learning of user and item features; the attention mechanism helps the model focus on more informative words or reviews.
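How attention "chooses" informative words or reviews can be illustrated with generic additive attention pooling: each feature vector receives a relevance score, the scores are normalized with a softmax, and the output is the weighted sum. This is a minimal sketch of the common technique, assuming a tanh scoring layer with hypothetical parameters `W`, `b`, and `h`; it is not the exact formulation of any baseline above or of AMLIAN.

```python
import numpy as np

def attention_pool(features, W, b, h):
    """Additive attention pooling over a set of feature vectors.

    features: (n, d) array, one row per word (or per review).
    W: (d, a), b: (a,), h: (a,) are learnable parameters (hypothetical).
    Returns the (d,) weighted sum and the (n,) attention weights.
    """
    scores = np.tanh(features @ W + b) @ h            # (n,) relevance scores
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the n rows
    return weights @ features, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))   # e.g., 6 words with 4-dimensional features
W = rng.normal(size=(4, 3))
b = np.zeros(3)
h = rng.normal(size=3)
pooled, alpha = attention_pool(feats, W, b, h)
```

Noisy rows receive small weights after training, so they contribute little to `pooled`; the same pooling applies unchanged whether the rows are words or whole reviews.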

Among document-level baseline methods, DAML generally outperforms CARL, mainly because it uses a neural factorization machine (NFM), which fully models the extracted user-item interactions.
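The core NFM operation that models pairwise feature interactions is bi-interaction pooling. The sketch below shows this generic operation, not DAML's or AMLIAN's exact input wiring; `x` and `V` are hypothetical feature values and embeddings. The O(nk) identity is checked against the naive pairwise sum.

```python
import numpy as np

def bi_interaction(x, V):
    """Bi-interaction pooling of a neural factorization machine.

    x: (n,) feature values; V: (n, k) embedding for each feature.
    Returns the (k,) vector sum_{i<j} (x_i v_i) * (x_j v_j), computed in
    O(nk) via 0.5 * [(sum_i x_i v_i)^2 - sum_i (x_i v_i)^2].
    """
    xv = x[:, None] * V                               # (n, k) scaled embeddings
    return 0.5 * (xv.sum(axis=0) ** 2 - (xv ** 2).sum(axis=0))

def bi_interaction_naive(x, V):
    """Reference O(n^2 k) version that loops over all feature pairs."""
    out = np.zeros(V.shape[1])
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            out += (x[i] * V[i]) * (x[j] * V[j])
    return out
```

Because every pair of features interacts through an element-wise product, the pooled vector captures second-order user-item interactions before any MLP layers are applied.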

Among review-level baseline methods, NRPA generally outperforms NARRE because it considers not only the importance of each review but also the importance of the words within each review. TAERT outperforms NARRE on all five datasets, possibly because it applies the attention mechanism thoroughly at every review level. EDMF $+$ generally achieves higher prediction accuracy than TAERT: because review information is sparse, it constrains the review features with an L0 norm, yielding more reasonable representations. ARPCNN generally outperforms EDMF $+$ , particularly in terms of MSE. This indicates that choosing attention mechanisms suited to the feature attributes benefits recommendation performance.

We observe no clear performance gap between baselines that use only review-level modeling and those that use only document-level modeling. Our model instead applies each form according to the characteristics of user and item reviews. Compared with the best baseline, AMLIAN improves MAE and MSE by 3.56% and 4.35% on average, respectively. This result validates the effectiveness of our method and indicates the benefit of using review-level modeling for users and document-level modeling for items, which facilitates learning more accurate feature representations.
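The Δ% rows and the two averages can be reproduced from the tables. This sketch assumes Δ% = 100 × (best baseline − AMLIAN) / best baseline, rounded per dataset before averaging; on these numbers that assumption matches every reported value. Only EDMF + and ARPCNN are listed because one of them is the best baseline on every dataset for both metrics.

```python
# MAE and MSE values copied from Tables 2 and 3.
DATASETS = ["Office products", "Digital music", "Baby", "Toy and games",
            "Cell phone and accessories"]
MAE = {
    "EDMF+":  [0.5622, 0.6180, 0.7232, 0.5746, 0.7437],
    "ARPCNN": [0.5756, 0.6133, 0.7318, 0.5789, 0.7391],
    "AMLIAN": [0.5269, 0.5896, 0.7080, 0.5592, 0.7178],
}
MSE = {
    "EDMF+":  [0.6598, 0.7912, 1.1560, 0.7541, 1.2794],
    "ARPCNN": [0.7044, 0.7767, 1.1425, 0.7411, 1.2673],
    "AMLIAN": [0.6180, 0.7484, 1.0999, 0.7149, 1.2103],
}

def delta_percent(table, baselines=("EDMF+", "ARPCNN"), ours="AMLIAN"):
    """Relative error reduction (%) of `ours` over the best baseline, per dataset."""
    deltas = []
    for col, our_err in enumerate(table[ours]):
        best = min(table[b][col] for b in baselines)  # lower error is better
        deltas.append(round(100.0 * (best - our_err) / best, 2))
    return deltas

mae_delta = delta_percent(MAE)
mse_delta = delta_percent(MSE)
mae_avg = round(sum(mae_delta) / len(mae_delta), 2)
mse_avg = round(sum(mse_delta) / len(mse_delta), 2)
print(mae_delta, mae_avg)  # [6.28, 3.86, 2.1, 2.68, 2.88] 3.56
print(mse_delta, mse_avg)  # [6.34, 3.64, 3.73, 3.54, 4.5] 4.35
```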

5.6 Hyperparameter sensitivity analysis (RQ2)

In this section, we examine the sensitivity of AMLIAN to its key hyperparameters on the validation sets, using MAE as the evaluation metric.

Figure 4.

The impact of the ID embedding dimension.

Users and items obtain rating-based feature representations through the ID embedding layer, so we study how the ID embedding dimension affects the AMLIAN model. As Fig. 4 shows, as the ID embedding dimension increases, the MAE first decreases, reaches its minimum, and then increases. When the dimension is too small, the rating features cannot capture the diversity of users and items; when it is too large, the model may overfit. Figure 4 indicates that 50 is the best choice for the ID embedding dimension.

Figure 5.

The influence of the number of CNN filters.

CNNs are used to extract feature representations from user and item reviews, so we investigate how the number of CNN filters affects recommendation performance. As Fig. 5 shows, the model’s performance improves steadily as the number of filters increases and then levels off. We therefore set the number of CNN filters to 50 in our experiments.
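What the filter count controls can be seen in a generic text-CNN layer: each filter slides over windows of consecutive word embeddings, and max-over-time pooling keeps one value per filter, so the number of filters sets the size of the extracted feature vector. This is a minimal sketch assuming a ReLU activation and max pooling (common choices in review-based CNNs, not confirmed here for AMLIAN); the function name and random weights are illustrative, while the window size 3 and embedding size 64 follow Section 5.4.

```python
import numpy as np

def text_cnn(embedded, filters, bias):
    """1-D convolution over a word sequence with max-over-time pooling.

    embedded: (T, d) word embeddings of one text (assumes T >= w).
    filters:  (F, w, d) weights for F filters with window size w.
    bias:     (F,)
    Returns an (F,) feature vector: one pooled value per filter.
    """
    T, _ = embedded.shape
    _, w, _ = filters.shape
    # Every window of w consecutive words: (T - w + 1, w, d).
    windows = np.stack([embedded[t:t + w] for t in range(T - w + 1)])
    conv = np.einsum("nwd,fwd->nf", windows, filters) + bias  # (T - w + 1, F)
    conv = np.maximum(conv, 0.0)                              # ReLU
    return conv.max(axis=0)                                   # max over positions

rng = np.random.default_rng(0)
doc = rng.normal(size=(20, 64))                     # 20 words, 64-d embeddings
filters = rng.normal(scale=0.1, size=(100, 3, 64))  # 100 filters, window size 3
bias = np.zeros(100)
features = text_cnn(doc, filters, bias)
```

Adding filters adds detectors for more n-gram patterns; once the useful patterns are covered, extra filters mostly duplicate existing ones, which is consistent with performance levelling off.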

Figure 6.

The impact of the number of MLP layers in the NFM structure.

As Fig. 6 shows, the MAE gradually rises as the number of MLP layers in the NFM structure increases, suggesting that too many layers may cause overfitting. We therefore use two MLP layers in the NFM.
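The "number of MLP layers" refers to the hidden stack that NFM places on top of its bi-interaction vector. This minimal sketch assumes ReLU hidden layers of equal width and a linear output unit; the widths, initialization, and function names are illustrative rather than taken from AMLIAN, and the linear and bias terms of a full NFM are omitted. The random vector `z` merely stands in for a bi-interaction vector.

```python
import numpy as np

def init_mlp(input_dim, num_layers, hidden_dim, rng):
    """Weights for `num_layers` ReLU hidden layers plus a scalar output unit."""
    dims = [input_dim] + [hidden_dim] * num_layers
    hidden = [
        (rng.normal(scale=0.1, size=(dims[i], dims[i + 1])), np.zeros(dims[i + 1]))
        for i in range(num_layers)
    ]
    output = (rng.normal(scale=0.1, size=dims[-1]), 0.0)
    return hidden, output

def mlp_predict(z, hidden, output):
    """Feed an interaction vector z through the hidden stack to a rating score."""
    for W, b in hidden:
        z = np.maximum(z @ W + b, 0.0)
    w_out, b_out = output
    return float(z @ w_out + b_out)

def num_params(hidden, output):
    """Count trainable parameters, including the output bias."""
    return sum(W.size + b.size for W, b in hidden) + output[0].size + 1

rng = np.random.default_rng(0)
z = rng.normal(size=8)  # stands in for a bi-interaction vector
for depth in (1, 2, 3):
    hidden, output = init_mlp(8, depth, 16, rng)
    print(depth, num_params(hidden, output), mlp_predict(z, hidden, output))
```

Each extra hidden layer of width 16 adds 16 × 17 = 272 parameters here; more capacity on sparse rating data raises the risk of overfitting.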

5.7 Ablation experiments (RQ3)

In this section, we analyze four attention layers, namely review aspect level attention (U1), review level attention (U2), word level attention (I1), and document aspect level attention (I2), and verify their impact on recommendation performance. Table 4 reports the results, from which we observe the following:

Table 4
Impact of the attention layers and data forms (MAE). No_U1: the combination of U2, I1 and I2 attention. No_U2: the combination of U1, I1 and I2 attention. No_I1: the combination of U1, U2 and I2 attention. No_I2: the combination of U1, U2 and I1 attention. No_A: no attention layers. All_review: both users and items use review-level data. All_document: both users and items use document-level data

MAE	Office products	Digital music	Baby	Toy and games	Cell phone and accessories
AMLIAN	0.5269 $\pm$ 2e-3	0.5896 $\pm$ 2e-3	0.7080 $\pm$ 7e-4	0.5592 $\pm$ 6e-4	0.7178 $\pm$ 4e-4
No_A	0.5472 $\pm$ 2e-3	0.6148 $\pm$ 3e-3	0.7390 $\pm$ 2e-3	0.5821 $\pm$ 3e-3	0.7459 $\pm$ 3e-3
All_review	0.5344 $\pm$ 2e-3	0.5977 $\pm$ 2e-3	0.7203 $\pm$ 3e-3	0.5757 $\pm$ 3e-3	0.7279 $\pm$ 2e-3
All_document	0.5365 $\pm$ 2e-3	0.5934 $\pm$ 2e-3	0.7226 $\pm$ 3e-3	0.5723 $\pm$ 2e-3	0.7213 $\pm$ 2e-3
No_I1	0.5379 $\pm$ 3e-3	0.6065 $\pm$ 2e-3	0.7299 $\pm$ 3e-3	0.5762 $\pm$ 2e-3	0.7342 $\pm$ 1e-3
No_I2	0.5359 $\pm$ 2e-3	0.6046 $\pm$ 2e-3	0.7275 $\pm$ 3e-3	0.5785 $\pm$ 2e-3	0.7313 $\pm$ 1e-3
No_U1	0.5344 $\pm$ 3e-3	0.6033 $\pm$ 2e-3	0.7200 $\pm$ 3e-3	0.5698 $\pm$ 2e-3	0.7311 $\pm$ 1e-3
No_U2	0.5307 $\pm$ 3e-3	0.5944 $\pm$ 2e-3	0.7217 $\pm$ 2e-3	0.5742 $\pm$ 3e-3	0.7248 $\pm$ 1e-3

Compared with all variants that contain attention layers, No_A performs worst on every dataset. This demonstrates the effectiveness of combining the four attention mechanisms we designed. Next, we analyze the effects of these attention mechanisms on the performance of the whole model in more detail.

We compare All_review and All_document with the full model, which uses different data forms for users and items. On the one hand, using document-level data for users (All_document) or review-level data for items (All_review) degrades performance on all datasets. On the other hand, All_review and All_document achieve similar results, which suggests that when users and items share the same data form, the choice between the document level and the review level has little impact on recommendation performance. By contrast, combining review-level user features with document-level item features further improves performance, confirming the value of choosing the data form according to the respective characteristics of users and items.
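The two data forms differ only in how a set of tokenized reviews is packed before encoding. This sketch assumes reviews are already mapped to integer token ids with 0 as a padding id; the function names, `max_reviews`, and `max_words` are hypothetical, and only the maximum text length of 1,000 comes from Section 5.4.

```python
from typing import List

PAD = 0  # hypothetical padding token id

def to_review_level(reviews: List[List[int]], max_reviews: int,
                    max_words: int) -> List[List[int]]:
    """User side: keep reviews separate as a (max_reviews, max_words) grid."""
    grid = []
    for tokens in reviews[:max_reviews]:
        row = tokens[:max_words]
        grid.append(row + [PAD] * (max_words - len(row)))
    while len(grid) < max_reviews:          # pad missing reviews
        grid.append([PAD] * max_words)
    return grid

def to_document_level(reviews: List[List[int]], max_len: int = 1000) -> List[int]:
    """Item side: concatenate all reviews into one sequence of length max_len."""
    doc = [tok for tokens in reviews for tok in tokens][:max_len]
    return doc + [PAD] * (max_len - len(doc))

reviews = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
print(to_review_level(reviews, 2, 3))   # [[5, 6, 7], [8, 9, 0]]
print(to_document_level(reviews, 6))    # [5, 6, 7, 8, 9, 10]
```

The review-level grid preserves review boundaries, which review-level attention (U2) needs in order to weight individual reviews; the document-level sequence discards them and lets word-level attention (I1) operate over the item’s text as a whole.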

We also compare No_U1, No_U2, No_I1, and No_I2, each of which removes one of the four attention layers and keeps the other three. We draw the following conclusions. 1) No_I1 performs worst on most datasets. This demonstrates that I1 helps select more informative words and thereby reduces noise at the document level. 2) No_I2 ranks second worst on most datasets, indicating that the I2 attention layer improves recommendation performance by identifying information relevant to each user-item pair; I2 can focus on useful review aspects related to the target user. 3) No_U1 also performs poorly, which shows that U1 helps select more informative word aspects from user reviews. 4) No_U2 generally performs best among these four variants, yet it still falls short of the full AMLIAN model, so adding U2 further improves rating prediction. This supports our assumption that reviews vary in usefulness and that different reviews should contribute differently to the representations of user preferences and item features. Our U2 layer learns such representations effectively and leads to better recommendation performance.
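The ordering claims above can be checked directly against Table 4. This sketch copies the MAE values of the four single-layer ablations, ranks them per dataset from worst to best, and computes each variant’s relative MAE increase over the full model; the helper names are illustrative.

```python
# MAE values copied from Table 4.
DATASETS = ["Office products", "Digital music", "Baby", "Toy and games",
            "Cell phone and accessories"]
TABLE4_MAE = {
    "AMLIAN": [0.5269, 0.5896, 0.7080, 0.5592, 0.7178],
    "No_I1":  [0.5379, 0.6065, 0.7299, 0.5762, 0.7342],
    "No_I2":  [0.5359, 0.6046, 0.7275, 0.5785, 0.7313],
    "No_U1":  [0.5344, 0.6033, 0.7200, 0.5698, 0.7311],
    "No_U2":  [0.5307, 0.5944, 0.7217, 0.5742, 0.7248],
}
VARIANTS = ["No_I1", "No_I2", "No_U1", "No_U2"]

def rank_variants(table, variants):
    """Per dataset, order the variants from worst (highest MAE) to best."""
    return {
        name: sorted(variants, key=lambda v: table[v][col], reverse=True)
        for col, name in enumerate(DATASETS)
    }

def degradation(table, variant):
    """Relative MAE increase (%) of a variant over the full AMLIAN model."""
    return [round(100 * (v - a) / a, 2)
            for v, a in zip(table[variant], table["AMLIAN"])]

ranks = rank_variants(TABLE4_MAE, VARIANTS)
for name, order in ranks.items():
    print(f"{name}: worst={order[0]}, best={order[-1]}")
```

On these values, No_I1 is worst on four of the five datasets (No_I2 on Toy and games), No_U2 is best on three, and every variant has a positive degradation, so removing any single layer hurts.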

6. Conclusion and future work

In this paper, we propose a novel neural recommendation method that combines review-level and document-level features, extracting user and item representations according to their respective characteristics. Review-level features are learned by a user representation network that includes review aspect level attention and review level attention layers. Document-level features are learned by an item representation network that includes word level attention and document aspect level attention layers. This asymmetric design helps capture more accurate user and item representations. Experiments on five real-world datasets from Amazon demonstrate that our method consistently outperforms existing state-of-the-art methods. In future work, we plan to incorporate the time dimension to model users’ long- and short-term interests so that the recommender system can better match user preferences.

Footnotes

Available at: http://jmcauley.ucsd.edu/data/amazon/links.html.

Acknowledgments

This work is supported by Tianjin “Project $+$ Team” Key Training Project under Grant No. XC202022.

References

1. Zheng, Noroozi and P.S. Yu, Joint deep modeling of users and items using reviews for recommendation, in: WSDM, 2017, pp. 425–434.

2. W.-D., Huang, C.-D. Wang, Y.-Y. Zheng and J.-H. Lai, Deep rating and review neural network for item recommendation, IEEE Transactions on Neural Networks and Learning Systems 33(11) (2021), 6726–6736.

3. J.B. Schafer, Frankowski, Herlocker and Sen, Collaborative filtering recommender systems, in: The adaptive web, Springer, 2007, pp. 291–324.

4. Koren, Bell and Volinsky, Matrix factorization techniques for recommender systems, Computer 42(8) (2009), 30–37.

5. Liu, Yuan and Ma, A multi-task dual attention deep recommendation model using ratings and review helpfulness, Applied Intelligence 52(5) (2022), 5595–5607.

6. Shen, Liu, Zhang, Liu and Xiong, Deep variational matrix factorization with knowledge embedding for recommendation system, IEEE Transactions on Knowledge and Data Engineering 33(5) (2019), 1906–1918.

7. Wang, Chen, Wang, Chen, Liu and Gong, Trust-enhanced collaborative filtering for personalized point of interests recommendation, IEEE Transactions on Industrial Informatics 16(9) (2019), 6124–6132.

8. Dong and Smyth, Coevolutionary recommendation model: Mutual learning between ratings and reviews, in: WWW, 2018, pp. 773–782.

9. Tay, A.T. Luu and S.C. Hui, Multi-pointer co-attention networks for recommendation, in: SIGKDD, 2018, pp. 2309–2318.

10. Liu, Zhang, Lin, Fang and N.N. Xiong, CARM: Confidence-aware recommender model via review representation learning and historical rating behavior in the online platforms, Neurocomputing 455 (2021), 283–296.

11. Wang, Zhang and Deng, CAME: Content- and context-aware music embedding for recommendation, IEEE Transactions on Neural Networks and Learning Systems 32(3) (2020), 1375–1388.

12. Xia, Wang, Zhang and Chen, Leveraging ratings and reviews with gating mechanism for recommendation, in: CIKM, 2019, pp. 1573–1582.

13. Guan, Cheng, Zhang, Zhu, Peng and T.-S. Chua, Attentive aspect modeling for review-aware recommendation, TOIS 37(3) (2019), 1–27.

14. Huang, Luo and Wu, Semi-supervised Factorization Machines for Review-Aware Recommendation, in: DASFAA, 2021, pp. 85–99.

15. Yang, Xiao, Zheng, Jiao, Zhu, Sun and Liu, MAN: Main-auxiliary network with attentive interactions for review-based recommendation, Applied Intelligence, 2022, 1–16.

16. Deng, Wang, Zhang and Wang, LightGCN: Simplifying and powering graph convolution network for recommendation, in: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 639–648.

17. Xiao, Liu, Zheng, Wang and C.-H. Hsu, A feature interaction learning approach for crowdfunding project recommendation, Applied Soft Computing 112 (2021), 107777.

18. Liu, Xiao, Zheng and C.-H. Hsu, SIGA: Social influence modeling integrating graph autoencoder for rating prediction, Applied Intelligence 53(6) (2023), 6432–6447.

19. Liu, Wang, Peng and Jiao, Neural Unified Review Recommendation with Cross Attention, in: SIGIR, 2020, pp. 1789–1792.

20. Chen, Zhang, Liu and Ma, Neural attentional rating regression with review-level explanations, in: WWW, 2018, pp. 1583–1592.

21. Seo, Huang, Yang and Liu, Interpretable convolutional neural networks with dual local and global attention for review rating prediction, in: RecSys, 2017, pp. 297–305.

22. Liu and Chang, DAML: Dual attention mutual learning between ratings and reviews for item recommendation, in: SIGKDD, 2019, pp. 344–352.

23. J.Y. Chin, Zhao, Joty and Cong, ANR: Aspect-based neural recommender, in: CIKM, 2018, pp. 147–156.

24. Quan, Wang, Zheng and Luo, A context-aware user-item representation learning for item recommendation, TOIS 37(2) (2019), 1–29.

25. Liu, Wang, Jiao and Xie, NRPA: neural recommendation with personalized attention, in: SIGIR, 2019, pp. 1233–1236.

26. Guo, Wang, Yuan, Huang, Chen and Wang, TAERT: Triple-attentional explainable recommendation with temporal convolutional network, Information Sciences 567 (2021), 185–200.

27. Liu, Zheng, Shen, Lin, Wang and Zhang, EDMF: Efficient deep matrix factorization with review feature learning for industrial recommender system, IEEE Transactions on Industrial Informatics 18(7) (2021), 4361–4371.

28. Pennington, Socher and Manning, GloVe: Global Vectors for Word Representation, in: EMNLP, 2014, pp. 1532–1543.

29. Mnih and R.R. Salakhutdinov, Probabilistic matrix factorization, Advances in Neural Information Processing Systems 20 (2007).

30. McAuley and Leskovec, Hidden factors and hidden topics: understanding rating dimensions with review text, in: RecSys, 2013, pp. 165–172.

31. Chen, Deng, Liu and Liu, ARPCNN: Auxiliary Review-Based Personalized Attentional CNN for Trustworthy Recommendation, IEEE Transactions on Industrial Informatics 19(1) (2022), 1018–1029.

