Next point-of-interest recommendation by sequential feature mining and public preference awareness

Abstract

With the widespread adoption of location-based social networks (LBSNs), the amount of check-in data is growing rapidly, which facilitates next point-of-interest (POI) recommendation. Because human movement in LBSNs exhibits sequential patterns, extracting such patterns from check-in data has become an effective approach to next POI recommendation. However, check-in data are sparse, which makes the learned sequential patterns unreliable and makes exploiting them a challenging issue. Inspired by the fact that auxiliary information can be incorporated to alleviate this situation, in this paper we model sequential transitions based on both item-wise check-in sequences and region-wise spatial information. In addition, we propose an attention-aware recurrent neural network (ATTRNN) to learn the contribution of different time steps. Furthermore, considering that users’ decision-making is influenced to some extent by the public’s common preference, we design a novel framework, namely HSP (short for “Hybrid model based on Sequential feature mining and Public preference awareness”), to recommend POIs for a given user. We conduct a comprehensive performance evaluation of HSP on two real-world datasets. Experimental results demonstrate that, compared with other state-of-the-art techniques, the proposed HSP achieves significant improvements.

Keywords

Point-of-interest recommendation, sequential pattern, public preference

1 Introduction

Recent years have witnessed the increasing popularity of location-based social networks (LBSNs), such as Foursquare, Gowalla and Yelp. These platforms help users explore their surrounding environment and share life experiences in the form of check-ins in the physical world. The large volume of check-in data provides a great opportunity to better analyze users’ mobility behavior and preferences, based on which next point-of-interest (POI) recommendation becomes valuable. Therefore, developing recommender systems for LBSNs has recently attracted increased research attention.

Compared to general recommendation tasks (e.g., movie recommendation), next POI recommendation is highly dependent on the locations of spatial items [1]. Some existing next POI recommendation methods [2, 3] focus on exploring the influence of combined factors to improve recommendation accuracy. Auxiliary factors, including the spatial proximity of locations, temporal context and auxiliary meta-data (such as type and social relationship), clearly influence users’ movement behaviors. Nevertheless, recent research on human mobility [4] has observed that human movement exhibits sequential patterns and that successive check-ins are usually correlated (as shown in Figure 1(a)). For example, some users tend to check in at restaurants after visiting shopping malls, since they would like to eat after shopping. Furthermore, according to the analysis [5] of successive check-ins’ transitive patterns on two real-life datasets, Foursquare and Gowalla, the distribution of transition probabilities between spatial items is nonuniform, which verifies the existence of users’ sequential patterns. Therefore, it is necessary to model the sequential patterns of users’ check-ins for next POI recommendation.

Fig. 1

An example of sequential pattern and public preference effects.

However, due to the data sparsity problem, it is highly challenging to utilize the sequential factor for next POI recommendation. The study in [6] analyzed check-in records collected from Gowalla and found that the density of the data that can be utilized for next POI recommendation is usually about 0.1%, which indicates that check-ins in LBSN data are infrequent. Since it is more difficult to learn sequential patterns from sparse check-in sequences, next POI recommendation is sensitive to data sparsity.

Earlier methods for next POI recommendation explore users’ sequential patterns with Markov chains [7]. However, existing Markov chain methods are limited by their high computational complexity and by the difficulty of capturing longer dependencies. Hence, recurrent neural network (RNN) based approaches [8] are exploited to model users’ mobility patterns, because they capture longer sequential context effectively. Subsequently, some methods [9] incorporate auxiliary factors into the RNN to mitigate the data sparsity problem. However, none of the previous sequential recommendation methods considers the impact of region-wise sequential patterns and the public’s common preference.

The region-wise sequences are check-in sequences at the region level (as shown in Figure 1(b)). While users’ sequential transitions between precise POIs are highly random, their regional transitions often show stronger sequential dependencies. For instance, suppose that a user wants to have dinner in a restaurant near home after work. The movement from the office to a specific restaurant tends to be unreliable as a pattern. In contrast, the region-to-region movement (e.g., from the user’s working area to the user’s living area) presents a more stable sequential pattern. Furthermore, users usually refer to the public preference before deciding to visit a spatial item. For example, as shown in Figure 1(c), when choosing a restaurant for dinner, a user is likely to prefer a locally popular restaurant. A recommendation model that only exploits sequential transitions may therefore not produce ideal results, and integrating sequential transition patterns and public preference in a unified model can provide better performance for next POI recommendation.

To tackle the problems mentioned above, we propose a hybrid model based on sequential feature mining and public preference awareness.

Our primary contributions can be summarized as follows.

We model sequential transitions based on both item-wise check-in sequences and region-wise spatial information, which helps to alleviate the data sparsity problem. Furthermore, an attention-aware recurrent neural network, named ATTRNN, is designed to model the contribution of different time steps.

We propose a novel framework for next POI recommendation, called HSP. HSP embeds two features (i.e., sequential transition pattern and public’s common preference) and combines them together for the prediction task.

We evaluate the proposed method by detailed experiments on two real-life datasets. Experimental results demonstrate the competitiveness of HSP, outperforming state-of-the-art next POI recommendation techniques. We further measure the impact of different components in HSP on the overall performance through ablation tests.

The rest of this paper is organized as follows. In Section 2, we review the related work. In Section 3, we introduce the definitions and preliminaries of our method. We describe our HSP model in Section 4. Next, we report our empirical study in Section 5. Finally, we conclude this paper in Section 6.

2 Related work

In this section, we review previous studies related to next POI recommendation, including general POI recommendation and sequential POI recommendation.

2.1 General POI recommendation

POI recommendation has recently attracted widespread attention in academia. Exploiting only check-in records often results in poor performance due to the sparsity of users’ check-ins. Therefore, some methods exploit combinations of factors, since auxiliary information can be incorporated to alleviate this situation.

Since there is a strong correlation between users’ activities and geographical distance, some existing studies focus on exploiting spatial information [10]. Feng et al. [5] model the geographical distance between POIs and users. Wang et al. [1] incorporate spatial influence, spatial susceptibility, and distance into the model to learn the POI-specific geographical influence. Ye et al. [11] propose a spatial feature based Poisson Factor Model to jointly capture spatial preferences and user preferences. Feng et al. [4] design a model that places nearby POIs in the same cluster due to the high correlation between them. Sarwat et al. [12] propose a method based on region-wise popular location calculations using coarse-grained spatial information. Li et al. [13] directly calculate the attraction of a target POI by considering the attraction of its geographical neighbors.

In addition, other factors have been utilized to improve recommendation performance [14]. Temporal context is an important factor in POI recommendation. Gao et al. [15] design a novel model based on latent factors, in which temporal influence and various types of contexts are introduced to capture user preferences. Furthermore, auxiliary meta-data has a significant influence on users’ check-in behaviors. For example, Ding et al. [16] utilize a deep neural network to integrate social information, geographical influence and categorical content. Yu et al. [17] take users’ influence and authority under various categories into consideration. Chang et al. [18] design a content-aware POI embedding model that employs various context information to learn the POIs’ characteristics. Such works are complementary to our work.

2.2 Sequential POI recommendation

Modeling sequential patterns is meaningful for capturing user preferences. Therefore, researchers have recently paid more attention to next POI recommendation.

Some works leverage the Markov chain property to model sequential influence. Cheng et al. [7] develop a method that takes the latest check-in in a user’s check-in sequence into consideration. Nevertheless, the next POI may depend on both the last check-in and the user’s earlier behaviors. Zhang et al. [3] leverage an additive Markov chain to predict the sequential probability, but this approach cannot assign transition probabilities to unobserved data. Ye et al. [6] consider transitive patterns of categories in sequential check-ins. Since the hidden states learn the user’s sequential patterns, user preferences can be better modeled on sparse data. Recently, inspired by studies in deep learning, some researchers have utilized neural networks to model check-in sequences. Liu et al. [19] extend the recurrent neural network (RNN) and propose a method that models local temporal and geographical influence with two types of transition matrices. Liu et al. [20] capture sequential patterns with the word2vec framework. Kong et al. [9] incorporate geographical and temporal influence into a long short-term memory (LSTM) network to mitigate the data sparsity problem. Zhao et al. [21] design a new spatial-temporal gated network that enhances LSTM to control the updates of a user’s interests. Li et al. [22] propose temporal and multi-level context attention mechanisms for next POI prediction by extending an LSTM-based framework. He et al. [23] exploit the category transition pattern for next POI recommendation. Yu et al. [24] develop an LSTM-based deep model that incorporates categorical information and spatial influence to reduce candidate results. Sun et al. [25] design a novel model for next POI recommendation, which learns users’ long-term preferences with a nonlocal network and captures their short-term preferences with a geo-dilated RNN. Nevertheless, all previous sequential models neglect coarse-grained spatial information and public preferences.

2.3 Summary

Our work differs from previous studies in two respects. First, although existing research [4] has also exploited the influence of POIs in the same geographical region, it does not consider the sequential transition pattern of POI regions when predicting future check-ins. In contrast, our proposed HSP model extends the RNN technique with a novel method for modeling the sequential pattern of POI regions. Second, we automatically integrate sequential patterns and public preference in a unified model.

3 Definitions and preliminaries

In this section, we formulate the problem in terms of key notations and introduce the techniques used in this paper.

3.1 Problem formulation

Assume that there is a set of users $U = \{u_{1}, u_{2}, \dots, u_{|U|}\}$ and a set of spatial items $V = \{v_{1}, v_{2}, \dots, v_{|V|}\}$, where $|\cdot|$ denotes the size of an arbitrary set. A check-in is defined as a tuple $(u, v, t)$, meaning that user $u$ checked in at spatial item $v$ at time step $t$. The check-in sequence of user $u \in U$ is defined as $S_{u} = \{(v_{1}, t_{1}), (v_{2}, t_{2}), \dots, (v_{T}, t_{T})\}$, where $v_{T}$ and $t_{T}$ denote the last spatial item checked in by this user and the corresponding time, respectively, and $T$ is the length of the historical check-in trajectory of user $u$. We extract the region sequence $R_{u} = \{r_{1}, r_{2}, \dots, r_{T}\}$ from $S_{u}$ based on the spatial information of each check-in, where $r_{t} \in R$ and $R = \{r_{1}, r_{2}, \dots, r_{|R|}\}$ denotes the set of regions.

In this work, we focus on extracting information from check-in sequence S_u and public data (i.e., check-ins of other users in the same region). Therefore, we define our problem as follows.

Definition 1. (Next POI Recommendation): For a user $u \in U$ , given the spatial item set $V$ and the historical check-in sequence, the target of next POI recommendation is to recommend a list of POIs that user u would be interested in at time T + 1, with the help of sequential feature mining and public preference awareness.

3.2 Geographical binary tree

A geographical binary tree structure is developed to divide the search space into several regions. In the binary tree, nearby POIs should be grouped closely together, since they are highly relevant to each other [4]. In this paper, POIs are divided into a hierarchy of binary regions, so that geographically adjacent POIs have a higher probability of being clustered in the same region. To build the geographical binary tree, we recursively divide each region into two equal-sized subregions until at least one edge of the resulting regions is below the region size threshold. In the resulting binary tree, we assign each POI to a single region.
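
The recursive partition described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say which edge is halved at each step, so the sketch halves the longer edge; coordinates are treated as planar, and the function name `build_regions`, the toy POIs and the use of the 0/1 path from the root as a region identifier are all hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]               # (x, y) in hypothetical planar units
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def build_regions(pois: Dict[str, Point], box: Box, threshold: float,
                  path: str = "") -> Dict[str, str]:
    """Return {poi_id: region_id}; region_id is the 0/1 path from the root."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0, y1 - y0
    # Stop when at least one edge is below the threshold (or no POI is left).
    if width < threshold or height < threshold or not pois:
        return {pid: path or "root" for pid in pois}
    # Assumption: halve the longer edge so that both children have equal size.
    if width >= height:
        mid = (x0 + x1) / 2
        first = {p: c for p, c in pois.items() if c[0] < mid}
        second = {p: c for p, c in pois.items() if c[0] >= mid}
        boxes = ((x0, y0, mid, y1), (mid, y0, x1, y1))
    else:
        mid = (y0 + y1) / 2
        first = {p: c for p, c in pois.items() if c[1] < mid}
        second = {p: c for p, c in pois.items() if c[1] >= mid}
        boxes = ((x0, y0, x1, mid), (x0, mid, x1, y1))
    regions = build_regions(first, boxes[0], threshold, path + "0")
    regions.update(build_regions(second, boxes[1], threshold, path + "1"))
    return regions


# Toy data: "a" and "b" are close to each other, "c" and "d" are far away.
pois = {"a": (0.5, 0.5), "b": (0.6, 0.4), "c": (3.5, 3.5), "d": (3.4, 0.2)}
regions = build_regions(pois, (0.0, 0.0, 4.0, 4.0), threshold=1.5)
```

In this toy run the two nearby POIs end up in the same leaf region, while the distant ones are assigned to different leaves, which is the property the region sequences rely on.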

Definition 2. (Neighbors): Given a target user $u \in U$ and $v_{T}$ located in region $r_{T}$, the neighbors of user $u$, denoted as $P_{u} = \{u_{1}, u_{2}, \dots, u_{N}\}$, are the set of users who have check-ins in region $r_{T}$, where $N$ is the number of neighbors in that region.
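
As a small illustration of Definition 2, the neighbor set can be collected in one pass over the check-ins. The names `neighbors` and `region_of` and the toy data are hypothetical, and excluding the target user from their own neighbor set is an assumption that the definition does not state.

```python
from typing import Dict, List, Set, Tuple

CheckIn = Tuple[str, str, int]  # (user, POI, time step)


def neighbors(target: str, checkins: List[CheckIn],
              region_of: Dict[str, str]) -> Set[str]:
    """Users with at least one check-in in the region of the target's last POI."""
    own = [c for c in checkins if c[0] == target]
    last_poi = max(own, key=lambda c: c[2])[1]   # v_T, the latest check-in
    r_T = region_of[last_poi]
    # Assumption: the target user is not counted as their own neighbor.
    return {u for u, v, _ in checkins if region_of[v] == r_T and u != target}


region_of = {"cafe": "r1", "mall": "r1", "gym": "r2"}
checkins = [("u1", "gym", 1), ("u1", "cafe", 2),
            ("u2", "mall", 1), ("u3", "gym", 5)]
result = neighbors("u1", checkins, region_of)
```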

3.3 Latent representation learning

Similar to word symbols in natural language processing, the original check-ins, regions and users (i.e., one-hot representations) have very limited representation capacity. In this paper, these one-hot encodings are embedded into low-dimensional dense vectors, i.e., item vector ${\vec{v}}_{i}$ , region vector ${\vec{v}}_{c}$ and user vector $\vec{u}$ .

Intuitively, the latent representations of two spatial items (or regions) that are proximate in the geographical space should be similar to each other. Similarly, if two users have similar preferences, their latent representations should be similar to each other. Based on this observation, we leverage the Skip-gram model [26] to characterize similarity. Taking the latent representation learning of users as an example, we treat the similar users of each given user as “words” in a “sentence”, and use the Skip-gram model to learn user vectors. The learned user vector represents user preferences. The same is true for the latent representation learning of regions and spatial items. Thus, we train the embedding vectors of POIs, regions and users by maximizing the following objective functions: $\begin{matrix} L (v_{j}) = \sum_{v_{k} \in C_{v_{j}}} \left( {\vec{v}}_{i_{j}}^{\top} {\vec{v}}_{i_{k}} - \log \sum_{v_{k'} \in V} \exp ({\vec{v}}_{i_{j}}^{\top} {\vec{v}}_{i_{k'}}) \right) \\ L (r_{j}) = \sum_{r_{k} \in C_{r_{j}}} \left( {\vec{v}}_{c_{j}}^{\top} {\vec{v}}_{c_{k}} - \log \sum_{r_{k'} \in R} \exp ({\vec{v}}_{c_{j}}^{\top} {\vec{v}}_{c_{k'}}) \right) \\ L (u_{j}) = \sum_{u_{k} \in C_{u_{j}}} \left( {\vec{u}}_{j}^{\top} {\vec{u}}_{k} - \log \sum_{u_{k'} \in U} \exp ({\vec{u}}_{j}^{\top} {\vec{u}}_{k'}) \right) \end{matrix}$ (1) where $C_{v_{j}}$ denotes the set of spatial items whose distance from $v_{j}$ does not exceed a given threshold, and $C_{r_{j}}$ denotes the regions surrounding $r_{j}$. Users are grouped according to their historical check-in information: $C_{u_{j}}$ denotes the set of users similar to $u_{j}$, where a user is considered similar to $u_{j}$ if the two have checked in at more than 10 identical spatial items.
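
To make the Skip-gram objective concrete, the sketch below evaluates it for one item and its context set using a full softmax over all items. It is illustrative only: the embedding values are random toy numbers, and `skipgram_objective` is a hypothetical helper, not the authors' training code.

```python
import numpy as np


def skipgram_objective(emb: np.ndarray, j: int, context: list) -> float:
    """Sum over context items k of (v_j . v_k - log sum_k' exp(v_j . v_k'))."""
    scores = emb @ emb[j]                       # dot product of v_j with every item
    shift = scores.max()                        # shift for numerical stability
    log_norm = np.log(np.sum(np.exp(scores - shift))) + shift
    return float(sum(scores[k] - log_norm for k in context))


rng = np.random.default_rng(0)
emb = rng.normal(scale=0.1, size=(5, 3))        # 5 toy items, 3 dimensions
value = skipgram_objective(emb, j=0, context=[1, 2])
```

Each summand is the log-probability of one context item under a softmax, so the objective is never positive; maximizing it pulls the vectors of neighboring items (or regions, or similar users) together.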

4 The HSP model

In this part, we first provide an overview of our solution, then we present the details of the proposed HSP model, which integrates sequential transition pattern and public preference. The framework is presented in Figure 2.

Fig. 2

Illustration of HSP model.

4.1 Overview

To address the issues mentioned above, HSP consists of three components: region-wise spatial context, ATTRNN, and the public’s common preference. More specifically, since check-ins are highly position-dependent, we introduce spatial context into the model for the recommendation task. To mitigate the data sparsity problem, we use coarse-grained spatial context, i.e., POI regions. Subsequently, we leverage a Gated Recurrent Unit (GRU) [8] based model to learn long-range dependencies. To model the dynamics of check-in sequences, we further utilize an attention mechanism that models the contribution of different time steps: the sequential feature vector is built as a weighted sum of the hidden states of all time steps, each with its own attention weight. Afterwards, we exploit a deep neural network (DNN) to learn the public’s common preference in check-in behaviors. The final prediction vector is generated by feeding the sequential feature vector and the public preference vector into a fully-connected layer. The details of each component of HSP are explained as follows.

4.2 Region-wise spatial context

Merely considering the dependencies between spatial items is not enough to predict the next spatial item accurately, simply because of the data sparsity. To solve this problem, we introduce region-wise spatial context into the HSP model.

Once the geographical binary tree is generated, sequential transitions can be learned based on both item-wise check-in sequences and region-wise spatial information. Taking the spatial item sequence and the region sequence as input, HSP first encodes the sparse input (i.e., one-hot representations) via an embedding layer, which maps it into low-dimensional dense vectors. The combination of the spatial item vector and the region vector is used as a new representation sequence $\{{\vec{v}}_{t}\}_{t = 1}^{T}$. In this paper, we combine the item vector ${\vec{v}}_{i_{t}}$ and the region vector ${\vec{v}}_{c_{t}}$ by simple concatenation, since the combination method does not affect the recommendation effectiveness. Formally: $\begin{matrix} {\vec{v}}_{i_{t}} = Embed (v_{t}) \\ {\vec{v}}_{c_{t}} = Embed (r_{t}) \\ {\vec{v}}_{t} = [{\vec{v}}_{i_{t}}, {\vec{v}}_{c_{t}}] \end{matrix}$ (2) where $Embed$ is the embedding layer.

We employ GRU to capture the long-range dependency via the following formulation: $\begin{matrix} {\vec{z}}_{t} = \sigma ({\vec{W}}_{z} \cdot [{\vec{h}}_{t - 1}, {\vec{v}}_{t}]) \\ {\vec{r}}_{t} = \sigma ({\vec{W}}_{r} \cdot [{\vec{h}}_{t - 1}, {\vec{v}}_{t}]) \\ {\vec{\tilde{h}}}_{t} = \tanh ({\vec{W}}_{h} \cdot [{\vec{r}}_{t} * {\vec{h}}_{t - 1}, {\vec{v}}_{t}]) \\ {\vec{h}}_{t} = (1 - {\vec{z}}_{t}) * {\vec{h}}_{t - 1} + {\vec{z}}_{t} * {\vec{\tilde{h}}}_{t} \end{matrix}$ (3) where ${\vec{z}}_{t}$, ${\vec{r}}_{t}$ and ${\vec{\tilde{h}}}_{t}$ are the update gate, the reset gate, and the candidate hidden state, respectively. $\vec{W}$ denotes a weight matrix, $[\cdot, \cdot]$ denotes vector concatenation, and $*$ denotes element-wise multiplication. The recurrent activation $\sigma$ is the sigmoid function. Taking the vector sequence $\{{\vec{v}}_{t}\}_{t = 1}^{T}$ as input, the GRU encodes it into the hidden states $\{{\vec{h}}_{t}\}_{t = 1}^{T}$.
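
A minimal NumPy sketch of Eqs. (2) and (3) is given below: item and region embeddings are concatenated and passed through one GRU step per check-in. The dimensions, the random toy weights and the helper name `gru_step` are assumptions made for illustration; biases are omitted because Eq. (3) does not include them.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(h_prev, v_t, W_z, W_r, W_h):
    """One GRU update following Eq. (3)."""
    hv = np.concatenate([h_prev, v_t])
    z = sigmoid(W_z @ hv)                                        # update gate
    r = sigmoid(W_r @ hv)                                        # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, v_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                      # element-wise mix


rng = np.random.default_rng(0)
n_items, n_regions, d_item, d_region, d_hidden = 6, 3, 4, 2, 5
item_emb = rng.normal(size=(n_items, d_item))        # stands in for Embed(v_t)
region_emb = rng.normal(size=(n_regions, d_region))  # stands in for Embed(r_t)
d_in = d_hidden + d_item + d_region
W_z = rng.normal(scale=0.1, size=(d_hidden, d_in))
W_r = rng.normal(scale=0.1, size=(d_hidden, d_in))
W_h = rng.normal(scale=0.1, size=(d_hidden, d_in))

item_seq, region_seq = [0, 3, 5], [0, 1, 1]          # one toy check-in sequence
h = np.zeros(d_hidden)
states = []
for item_id, region_id in zip(item_seq, region_seq):
    v_t = np.concatenate([item_emb[item_id], region_emb[region_id]])  # Eq. (2)
    h = gru_step(h, v_t, W_z, W_r, W_h)
    states.append(h)
hidden_states = np.stack(states)                     # shape (T, d_hidden)
```

Because each new state is a convex combination of the previous state and a `tanh` output, every entry of `hidden_states` stays strictly between -1 and 1 when the initial state is zero.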

4.3 Modeling sequential patterns via attention mechanism

Existing methods only use the attention mechanism to describe users’ appetites for different items. However, these methods cannot reflect the dynamic features of users’ sequential check-in preferences at different time steps. In fact, the prediction results of different time steps contribute differently to the sequential feature vector of the target user.

Therefore, we propose an attention-aware recurrent neural network, called ATTRNN, which aims to capture the sequential feature from the user’s historical check-ins. ATTRNN is illustrated in Figure 2. ATTRNN introduces the attention mechanism to obtain sequential features and to model the contribution of different time steps. These features are integrated into the sequential feature representation of the user’s historical check-in data.

ATTRNN calculates the weight of each time step in the sequence and learns a probability distribution over all time steps, so that each time step makes a different contribution. A time step with a higher probability is considered more significant for next POI recommendation. The weights of all time steps sum to 1. The weight calculation is given in Eq. (4), where $\alpha_{t}$ is the weight that measures the impact of ${\vec{h}}_{t}$ on the sequential feature vector ${\vec{v}}_{S}$. The vectors $\{{\vec{h}}_{t}\}_{t = 1}^{T}$ are first passed through a hidden layer with a weight matrix ${\vec{W}}_{t}$ and a bias vector ${\vec{b}}_{t}$, which yields the hidden vector ${\vec{e}}_{t}$. Then, we utilize the softmax function to obtain the probability distribution over the hidden vectors $\{{\vec{e}}_{t}\}_{t = 1}^{T}$: $\begin{matrix} {\vec{e}}_{t} = \tanh ({\vec{W}}_{t} {\vec{h}}_{t} + {\vec{b}}_{t}) \\ \alpha_{t} = \frac{\exp ({\vec{e}}_{t})}{\sum_{j = 1}^{T} \exp ({\vec{e}}_{j})} \end{matrix}$ (4)

As shown in Figure 2, once the attention scores $\alpha_{t}$ have been computed for all $T$ time steps, we build the final sequential feature representation vector ${\vec{v}}_{S}$ by aggregating the weighted features, i.e., as a weighted sum over all $T$ time steps: ${\vec{v}}_{S} = \sum_{t = 1}^{T} \alpha_{t} {\vec{h}}_{t}$ (5)
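
The attention pooling can be sketched in a few lines of NumPy. The sketch follows the softmax normalization described in the text and assumes, for simplicity, that each time step receives a single scalar score (a weight vector `w` and scalar bias `b` instead of a full matrix); `attention_pool` and the toy inputs are hypothetical.

```python
import numpy as np


def attention_pool(H, w, b):
    """H: (T, d) hidden states. Returns attention weights and the pooled vector."""
    scores = np.tanh(H @ w + b)                # one scalar score per time step (assumption)
    weights = np.exp(scores - scores.max())    # softmax over the T time steps
    alpha = weights / weights.sum()
    v_S = alpha @ H                            # weighted sum of hidden states, Eq. (5)
    return alpha, v_S


rng = np.random.default_rng(0)
H = rng.normal(size=(4, 5))                    # toy hidden states, T = 4, d = 5
w = rng.normal(size=5)
alpha, v_S = attention_pool(H, w, b=0.1)
```

Since the weights are positive and sum to 1, `v_S` is a convex combination of the hidden states; replacing the softmax with a one-hot weight on the last step gives the non-attentional behaviour.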

4.4 Learning public preference

As mentioned above, the assumption that users choose spatial items based only on their personalized sequential patterns may be inaccurate in many social application scenarios [27, 28]. After investigating some LBSNs, we find that users’ check-in behaviors are generally affected by two factors: their personalized sequential patterns and the public’s common preference. Therefore, taking public preference into consideration is helpful for next POI recommendation. We design a hybrid neural network to learn the impact of sequential patterns and public preference on users’ check-in behaviors. More specifically, it consists of an RNN and a DNN, the former applied to the sequential feature representation vector and the latter to the public preference vector.

Public preference extraction based on the DNN is shown in Figure 2. The DNN is a feed-forward neural network with multiple hidden layers between its input and output. Its layers are fully connected and activated by the sigmoid function. For each hidden layer $l$, the sigmoid maps the weighted input from the lower layer, ${\vec{Z}}_{l}$, to the state ${\vec{a}}_{l}$, which is then fed to the upper layer. We formulate the DNN as follows: $\begin{matrix} {\vec{Z}}_{l} = {\vec{W}}_{l} {\vec{a}}_{l - 1} + {\vec{b}}_{l} \\ {\vec{a}}_{l} = \sigma ({\vec{Z}}_{l}) \end{matrix}$ (6) where ${\vec{Z}}_{1} = {\vec{W}}_{1} \vec{\tilde{u}} + {\vec{b}}_{1}$, $\vec{\tilde{u}} = [{\vec{u}}_{1}, \dots, {\vec{u}}_{N}]$ and ${\vec{u}}_{n}$ denotes the embedded vector of the $n$-th neighbor of the target user. ${\vec{b}}_{l}$ denotes the bias of layer $l$ and ${\vec{W}}_{l}$ denotes the weight matrix on the connection between layer $l$ and layer $l - 1$. Assuming there are $L$ layers in the DNN, the public preference vector ${\vec{v}}_{P}$ is obtained by: ${\vec{v}}_{P} = \sigma ({\vec{W}}_{L} {\vec{a}}_{L - 1} + {\vec{b}}_{L})$ (7)
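
The forward pass of Eqs. (6) and (7) can be sketched as below. The layer sizes, the random toy weights and the function name `public_preference` are assumptions; concatenating the neighbor vectors also presumes a fixed number of neighbors N, which the text does not discuss.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def public_preference(neighbor_vecs, layers):
    """Feed the concatenated neighbor embeddings through sigmoid layers."""
    a = np.concatenate(neighbor_vecs)          # u~ = [u_1, ..., u_N]
    for W, b in layers:                        # Eq. (6) for each layer; Eq. (7) is the last
        a = sigmoid(W @ a + b)
    return a


rng = np.random.default_rng(0)
N, d_user, d_hidden, d_out = 3, 4, 6, 5        # toy sizes (assumed)
neighbor_vecs = [rng.normal(size=d_user) for _ in range(N)]
layers = [
    (rng.normal(scale=0.1, size=(d_hidden, N * d_user)), np.zeros(d_hidden)),
    (rng.normal(scale=0.1, size=(d_out, d_hidden)), np.zeros(d_out)),
]
v_P = public_preference(neighbor_vecs, layers)
```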

4.5 Prediction task and model learning

In HSP, ATTRNN is used to capture the user’s sequential feature and the DNN is used to extract public preference. Simply concatenating the sequential feature vector and the public preference vector would leave them independent of each other for next POI recommendation. In practice, the check-in sequence is closely related to the public’s common preference, since users’ decision-making usually depends on public preference. Therefore, to capture the overall preference, we combine the output of ATTRNN, ${\vec{v}}_{S}$, and the output of the DNN, ${\vec{v}}_{P}$, and project them through a fully-connected layer as follows: $\vec{x} = {\vec{W}}_{o} \begin{bmatrix} {\vec{v}}_{S} \\ {\vec{v}}_{P} \end{bmatrix} + {\vec{b}}_{o}$ (8) where ${\vec{W}}_{o}$ and ${\vec{b}}_{o}$ are the weight matrix and the bias vector of the output layer, respectively. By integrating both ${\vec{v}}_{S}$ and ${\vec{v}}_{P}$ in a fully-connected layer, HSP can capture both the sequential patterns and the public preference for next POI recommendation.

The recommendation task is usually handled by a multi-class classifier. At the target time, softmax is employed as the activation function. The predicted probability distribution $\vec{y}$ is obtained via the following formulation: $\begin{matrix} \vec{y} = [y_{1}, y_{2}, \dots, y_{|V|}] \\ y_{i} = softmax ({\vec{W}}_{i} {\vec{x}}_{i} + {\vec{b}}_{i}) \end{matrix}$ (9) where ${\vec{W}}_{i}$ and ${\vec{b}}_{i}$ are the weight matrix and the bias vector, respectively. Each $y_{i} \in \vec{y}$ in the output layer corresponds to the probability that the user will check in at spatial item $v_{i}$ at the next time step.
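
The fusion and prediction steps of Eqs. (8) and (9) might look as follows in NumPy. This is a sketch under assumptions: a single output matrix `W_y` over all items is used for the softmax layer, all weights are random toy values, and `predict` and `top_k` are hypothetical helper names.

```python
import numpy as np


def predict(v_S, v_P, W_o, b_o, W_y, b_y):
    """Fuse both feature vectors and return a distribution over all POIs."""
    x = W_o @ np.concatenate([v_S, v_P]) + b_o     # Eq. (8)
    logits = W_y @ x + b_y
    exp = np.exp(logits - logits.max())            # numerically stable softmax, Eq. (9)
    return exp / exp.sum()


def top_k(y, k):
    """Indices of the k POIs with the highest predicted probability."""
    return [int(i) for i in np.argsort(-y)[:k]]


rng = np.random.default_rng(0)
d_S, d_P, d_x, n_items = 5, 5, 8, 10               # toy sizes (assumed)
v_S, v_P = rng.normal(size=d_S), rng.normal(size=d_P)
W_o, b_o = rng.normal(size=(d_x, d_S + d_P)), np.zeros(d_x)
W_y, b_y = rng.normal(size=(n_items, d_x)), np.zeros(n_items)
y = predict(v_S, v_P, W_o, b_o, W_y, b_y)
recommended = top_k(y, 3)
```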

Algorithm 1 HSP

Input: The check-in sequences

Output: The recommended list

1: for each $u \in U$ do

2: Divide check-in sequences by 24 hours

3: Map each check-in sequence into a region transition sequence

4: end for

5: Initialize model parameters Θ

6: while not convergent do

7: for each sequence in training set do

8: Calculate user’s sequential preferences by using Eq. (5)

9: Calculate public preference by using Eq.(7)

10: Calculate the predicting score y_i by using Eq.(9)

11: Update model parameters Θ using Eq.(11)

12: end for

13: end while

14: for each sequence in test set do

15: Calculate the score according to Eq.(9)

16: Recommend top-k POIs

17: end for

Thus, a predicted vector $\vec{y}$ is built, which indicates the probability that the target user will visit each spatial item in the future. Training also requires a label vector over all spatial item candidates. We therefore define the real probability distribution as $\vec{r} = [r_{1}, \dots, r_{|V|}]$, where each $r_{i} \in \vec{r}$ denotes the frequency of spatial item $i$. Cross-entropy loss with $L_{2}$ regularization is used as the cost function. The goal of HSP in the training process is to minimize the loss $L_{HSP}$, i.e., the regularized cross entropy between the predicted and real probability distributions: $\begin{matrix} E_{d} (\vec{r}, \vec{y}) = - \sum_{i = 1}^{|V|} r_{i, d} \log y_{i, d} \\ L_{HSP} = \frac{1}{D} \sum_{d = 1}^{D} E_{d} (\vec{r}, \vec{y}) + \frac{\lambda}{2} \| \Theta \|^{2} \end{matrix}$ (10) where $|V|$ is the number of spatial items, $D$ is the number of sequences used in training, $r_{i, d}$ and $y_{i, d}$ are the real and predicted probabilities of the $i$-th spatial item candidate for the $d$-th sequence, $\Theta$ denotes all model parameters, and $\lambda$ is the regularization coefficient.

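We utilize stochastic gradient descent (SGD) to optimize the objective function. Since minimizing the cross-entropy term is equivalent to maximizing $r_{i} \log y_{i}$, the parameters are updated along the ascending gradient direction of this term, with the gradient of the regularization term subtracted: $\begin{matrix} \Theta \leftarrow \Theta + \eta \left( \frac{\partial}{\partial \Theta} (r_{i} \log y_{i}) - \lambda \Theta \right) \end{matrix}$ (11) where $\eta$ is the learning rate.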
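
The loss and one SGD update can be checked on a toy problem. The sketch below replaces the full HSP network with a single softmax layer (an assumption made to keep the gradient explicit), uses one training sequence (D = 1), and relies on the facts that the gradient of the cross entropy with respect to the logits is y - r when r sums to 1 and that the gradient of (λ/2)‖Θ‖² is λΘ.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def loss(W, x, r, lam):
    """Cross entropy between r and softmax(W x), plus L2 regularization."""
    y = softmax(W @ x)
    return float(-np.sum(r * np.log(y)) + 0.5 * lam * np.sum(W ** 2))


def sgd_step(W, x, r, eta, lam):
    """Descend the loss, i.e. ascend r . log y minus the regularization gradient."""
    y = softmax(W @ x)
    grad = np.outer(y - r, x) + lam * W        # d(loss)/dW for the softmax layer
    return W - eta * grad


x = np.array([1.0, -0.5, 0.25])                # toy input feature vector
r = np.array([0.5, 0.5, 0.0, 0.0])             # toy label distribution over 4 POIs
W = np.zeros((4, 3))
eta, lam = 0.005, 0.001                        # values reported in Section 5.1
before = loss(W, x, r, lam)
W = sgd_step(W, x, r, eta, lam)
after = loss(W, x, r, lam)
```

With zero initial weights the prediction is uniform, so the initial loss equals log 4; a single small step already lowers it.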

Algorithm 1 describes the detailed flow of our HSP algorithm. The first step (lines 1–4) is data processing, followed by the training module (lines 5–13) and, finally, the prediction module (lines 14–17).

5 Experiment

In this section, we evaluate the proposed method by detailed experiments on two real-life datasets to show the competitiveness of HSP in next POI recommendation task. We also measure the performance gain from the three main components in HSP via a series of tests.

5.1 Datasets

We conduct experiments on two real-world datasets. One consists of the Foursquare check-ins within Singapore [29]; it contains 342,850 check-ins made between Aug. 2010 and Jul. 2011 by users who live in Singapore. The other consists of the Gowalla check-ins within California and Nevada [30]; it contains 736,148 check-ins made between Feb. 2009 and Oct. 2010. Each check-in in both datasets contains the user ID, the POI ID, the POI location in the form of latitude and longitude, and the check-in time. To obtain check-in sequences, we use the same method as in [31], i.e., we take the set of check-ins made by a user within 24 hours as one check-in sequence. We remove users who have interacted with fewer than 5 POIs and POIs that have been visited by fewer than 5 users in the two datasets. After pre-processing, the Foursquare dataset comprises 194,108 check-ins made by 2,321 users at 5,596 POIs, and the Gowalla dataset comprises 456,988 check-ins made by 10,162 users at 24,250 POIs. The statistics of the datasets are shown in Table 1.
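
One way to cut a user's check-ins into 24-hour sequences is sketched below. The exact windowing rule of [31] is not given here, so the sketch assumes that a new sequence starts whenever a check-in falls 24 hours or more after the first check-in of the current sequence; `split_sequences` and the toy timestamps (in seconds) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

DAY = 24 * 3600  # window length in seconds


def split_sequences(checkins: List[Tuple[str, str, int]]) -> Dict[str, List[List[str]]]:
    """checkins: (user, POI, timestamp in seconds). Returns per-user POI sequences."""
    by_user = defaultdict(list)
    for user, poi, ts in sorted(checkins, key=lambda c: c[2]):
        by_user[user].append((poi, ts))
    sequences = {}
    for user, events in by_user.items():
        seqs, start = [], None
        for poi, ts in events:
            # Assumption: open a new window 24 hours after the current one began.
            if start is None or ts - start >= DAY:
                seqs.append([])
                start = ts
            seqs[-1].append(poi)
        sequences[user] = seqs
    return sequences


toy = [("u1", "cafe", 0), ("u1", "mall", 3600), ("u1", "gym", 90000),
       ("u2", "park", 100)]
sequences = split_sequences(toy)
```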

Table 1
Statistics of two datasets

Statistic	Foursquare	Gowalla
Number of users	2,321	10,162
Number of locations (POIs)	5,596	24,250
Number of check-ins	194,108	456,988
Density	$6.35 \times 10^{-3}$	$9.85 \times 10^{-4}$

In the experiments, 5-fold cross validation is performed on 90% of each dataset, and the remaining 10% is used as the test set to evaluate the effectiveness of the recommendation methods. Based on the cross validation, the learning rate is set to 0.005 and the regularization coefficient λ is set to 0.001.

5.2 Compared approaches

To evaluate the performance of the proposed method HSP, we conduct experiments against the following 7 state-of-the-art recommendation methods in the task of predicting next POI:

BPR: A matrix factorization model based on Bayesian Personalized Ranking [32], which employs BPR-MF for implicit feedback and optimizes the differences of user preferences for positive and negative POIs.

RNN: Recurrent Neural Network [8] is effective for next POI recommendation. In this work, GRU is utilized as an autoregressive model to predict user’s next behavior by learning user’s temporal preferences.

FPMC: Factorizing personalized Markov chains [7], which introduces a first-order Markov chain and linearly combines user preference with the Markov transition. Neighbor items are used as negative samples.

LORE: An additive Markov chain (AMC) based method [3], which employs both the nth-order additive Markov chain and two-dimensional check-in probability distribution.

PRME: Personalized ranking metric embedding [5], which is a metric embedding approach that uses spatial distance as the weight. It integrates sequential information, user preferences, and spatial influence.

SITAR: A context-aware sequential recommender model [33] using stacked RNNs that model the dynamics of input and temporal patterns. We apply GRU to implement SITAR.

POI2Vec: A latent representation model [4], which utilizes spatial information for recommendation. It uses a binary tree to cluster nearby POIs into the same region.

In order to validate the performance gain brought by considering the region context, considering the contribution of different time steps and exploiting the public preference influence, we design three variants of HSP.

HSP-NC: the first variant, which does not consider the coarse-grained spatial context, i.e., POI regions.

HSP-NA: the second variant, a non-attentional model that only considers the contribution of the last time step.

HSP-NP: the last variant, in which the public preference unit is removed.
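To make the difference between the full model and the non-attentional variant concrete, the sketch below contrasts a last-step summary of a hidden-state sequence with an attention-weighted summary. It uses a generic softmax dot-product attention swapped in purely for illustration; the scoring function actually used by ATTRNN is defined in the model section and may differ. The function names, the `query` vector and the toy hidden states are hypothetical.

```python
import math

def last_step_summary(hidden_states):
    """Non-attentional summary: keep only the final time step."""
    return list(hidden_states[-1])

def attention_summary(hidden_states, query):
    """Weighted sum over all time steps with softmax dot-product weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    max_score = max(scores)        # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    summary = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return summary, weights

# Toy hidden states for three time steps (hypothetical values).
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
summary, weights = attention_summary(states, query=[1.0, 0.0])
last_only = last_step_summary(states)
```

In this toy case the first time step receives the largest weight, whereas the last-step summary discards it entirely.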

5.3 Evaluation metrics

In general, next POI recommendation techniques compute a score for each POI with respect to a target user and return the POIs with the top-k highest scores. To evaluate the effectiveness of the proposed methods, we employ four standard metrics, i.e., precision (pre@k), recall (rec@k), normalized discounted cumulative gain (NDCG@k) and mean reciprocal rank (MRR@k), where k is the number of recommendation results. The pre@k indicates the ratio of the number of discovered POIs to the k recommended POIs, and the rec@k is the ratio of the number of discovered POIs to the number of positive POIs, i.e., those visited by the target user in the testing set. NDCG@k measures the ranking quality and assigns higher scores when relevant POIs appear at higher ranks in the top-k list. MRR@k is the average of the reciprocal ranks of the desired POIs.
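The top-k selection step can be illustrated with a short sketch. The function name `recommend_top_k`, the dictionary input format and the tie-breaking by POI identifier are assumptions made for the example; any scoring model could supply the scores.

```python
import heapq

def recommend_top_k(scores, k):
    """Return the k POI ids with the highest scores, best first.

    scores: dict mapping POI id -> predicted score for one target user.
    Ties are broken by POI id so the output is deterministic (an assumption).
    """
    best = heapq.nsmallest(k, scores.items(),
                           key=lambda item: (-item[1], item[0]))
    return [poi for poi, _ in best]

# Hypothetical scores for four POIs.
top3 = recommend_top_k({"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.9}, k=3)
```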

The pre@k and rec@k are defined as: $\begin{matrix} \mathrm{pre}@k = \frac{1}{|U|} \sum_{u \in U} \frac{\#hit_{u}@k}{k} \\ \mathrm{rec}@k = \frac{1}{|U|} \sum_{u \in U} \frac{\#hit_{u}@k}{TP} \end{matrix}$ (12) where #hit_u@k denotes the number of hits in the test set for user u, and TP denotes the number of positive POIs.
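A minimal sketch of Eq. (12) follows. It reads TP as the number of positive POIs of each individual user, which matches the verbal definition of rec@k but is an interpretation; the function name, the input format and the choice to skip users without test POIs are likewise assumptions.

```python
def precision_recall_at_k(recommended, positives, k):
    """Average pre@k and rec@k over users.

    recommended: dict user -> ranked list of POI ids (best first).
    positives:   dict user -> set of POIs the user visited in the test set.
    """
    users = [u for u in recommended if positives.get(u)]  # skip users with no test POIs
    if not users:
        raise ValueError("no user has positive POIs in the test set")
    pre_sum = 0.0
    rec_sum = 0.0
    for u in users:
        hits = len(set(recommended[u][:k]) & positives[u])  # #hit_u@k
        pre_sum += hits / k
        rec_sum += hits / len(positives[u])                 # TP read per user
    return pre_sum / len(users), rec_sum / len(users)

# Hypothetical toy data for two users.
pre, rec = precision_recall_at_k(
    {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]},
    {"u1": {"a", "x"}, "u2": {"e"}},
    k=2,
)
```

Here each user has one hit among two recommendations, so pre@2 is 0.5, while rec@2 averages 1/2 and 1/1.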

The NDCG@k is defined as: $\begin{matrix} \mathrm{NDCG}@k = \frac{1}{|U|} \sum_{u \in U} \frac{\mathrm{DCG}_{u}@k}{\mathrm{IDCG}_{u}@k} \\ \mathrm{DCG}_{u}@k = \sum_{i=1}^{k} \frac{2^{rel_{i}} - 1}{\log_{2}(i + 1)} \end{matrix}$ (13) where rel_i refers to the graded relevance of the result ranked at position i. In this paper, rel_i = 1 if the result is in the test set, and 0 otherwise. IDCG_u@k is the largest possible DCG_u@k value, i.e., the value obtained by an ideal ranking.
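Eq. (13) can be sketched for a single user as below; NDCG@k is then the mean of these per-user values. The ideal ranking is taken to place all of the user's positive POIs first, so it contains min(k, number of positives) relevant items. That is a common convention and an assumption here, since the text only says that IDCG@k is the largest DCG@k value.

```python
import math

def dcg_at_k(relevances, k):
    """DCG over the first k relevance values, positions counted from 1."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked, positives, k):
    """Per-user NDCG@k with binary relevance (1 if the POI is in the test set)."""
    rels = [1 if poi in positives else 0 for poi in ranked[:k]]
    ideal = [1] * min(k, len(positives))   # assumed ideal ranking
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(rels, k) / idcg if idcg > 0 else 0.0

# The only relevant POI sits at rank 2, so NDCG@3 = 1 / log2(3).
score = ndcg_at_k(["a", "b", "c"], {"b"}, k=3)
perfect = ndcg_at_k(["b", "a", "c"], {"b"}, k=3)
```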

The MRR@k is defined as: $\mathrm{MRR}@k = \frac{1}{|D_{test}|} \sum_{(u, v, t) \in D_{test}} \begin{cases} \frac{1}{r_{t}(u, v)}, & r_{t}(u, v) < k \\ 0, & r_{t}(u, v) \geq k \end{cases}$ (14) where D_test is the test set and r_t(u, v) denotes the top rank of the relevant POI.
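The following sketch mirrors Eq. (14). It keeps the strict inequality exactly as printed and assumes ranks are counted from 1; the function name and the representation of each test check-in as a (ranked list, target POI) pair are illustrative choices.

```python
def mrr_at_k(test_cases, k):
    """Mean reciprocal rank over test check-ins.

    test_cases: list of (ranked POI list, target POI) pairs, one per check-in.
    """
    if not test_cases:
        raise ValueError("test_cases must not be empty")
    total = 0.0
    for ranked, target in test_cases:
        if target in ranked:
            rank = ranked.index(target) + 1   # 1-based rank (assumed)
            if rank < k:                      # strict inequality, as printed in Eq. (14)
                total += 1.0 / rank
    return total / len(test_cases)

# Hypothetical cases: target at rank 1, at rank 2, and not recommended at all.
ranking = ["a", "b", "c"]
mrr = mrr_at_k([(ranking, "a"), (ranking, "b"), (ranking, "z")], k=3)
```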

5.4 Results on next POI recommendation

5.4.1 Overview

We present the experimental results of our proposed model HSP and the seven comparison methods on the Foursquare and Gowalla datasets. To be specific, the effectiveness of the different methods on the two datasets, evaluated by pre@k, rec@k, NDCG@k and MRR@k, is shown in Figure 3 and Figure 4. In addition to recommendation effectiveness, we provide an efficiency comparison among the methods in Table 2. A statistical test is further used to compare the methods.

Fig. 3

The result of methods on Foursquare.

Fig. 4

The result of methods on Gowalla.

Table 2

The execution time (ks) of all the comparison methods

	BPR	RNN	FPMC	LORE	PRME	SITAR	POI2Vec	HSP
Foursquare	1.26	25.37	8.25	9.37	2.72	28.75	17.82	12.03
Gowalla	27.82	601.55	218.98	243.31	53.91	722.54	432.40	315.16

The embedding size is set to 200, and the number of GRU layers is set to 2 for our proposed model. For the parameters of the other comparison methods, we follow the best settings in their papers. Note that k in all the tested methods is set between 5 and 20. Greater values of k are usually ignored for the top-k recommendation task.

5.4.2 Performance analysis

Based on the results shown in Figure 3 and Figure 4, we can note the following interesting findings.

Firstly, as k increases, the recall and MRR gradually increase, while the precision and NDCG decrease steadily on both datasets. The reason is that returning more POIs makes it easier to discover the next POI that a user prefers to visit. However, the additional recommended POIs are less likely to be liked by users because of their lower visiting probabilities, since the recommendation techniques return the locations with the top-k highest scores. The recommendation precision of all the compared methods is low, since the check-in data in both datasets are very sparse.

Secondly, the results show that utilizing spatial information is necessary for next POI recommendation. Methods that take spatial influence into consideration generally perform better than those that ignore it. As shown in Figure 3 and Figure 4, PRME and FPMC are better than BPR, and POI2Vec and our proposed HSP are better than SITAR, since PRME, FPMC, POI2Vec and HSP effectively capture the spatial influence on user mobility behavior.

Thirdly, modeling the sequential patterns of user behaviors can improve recommendation performance, and correct sequential modeling is important for recommendation. The performance of BPR is worse than that of the other comparison methods, which demonstrates the effectiveness of sequential pattern modeling. Additionally, the performance of methods based on sequential pattern modeling is influenced by disorder in the check-in sequences. Specifically, sequence-based methods do not perform as well on Foursquare as on Gowalla. The reason is that multiple behaviors occur at the same time in the Foursquare dataset, and we cannot know the true order of these behaviors.

Fourthly, social influence affects the performance of POI recommendation. It is clear from Figure 3 and Figure 4 that LORE is always superior to FPMC. Moreover, HSP outperforms POI2Vec, which indicates that incorporating the public preference helps to recommend POIs.

Fifthly, the data sparsity problem can be alleviated by combining multiple factors. The baselines BPR, RNN, FPMC and LORE perform better on the Foursquare dataset than on the Gowalla dataset. This is because Gowalla has much sparser check-in data than Foursquare. In contrast, the other comparison methods have stable performance, since they combine multiple factors such as spatial information, sequential patterns and social influence. Additionally, owing to the use of region-wise spatial information, HSP and POI2Vec further alleviate data sparsity and achieve performance improvements over other methods.

Overall, HSP achieves the best results on both datasets in all these cases. In terms of rec@10 on the Foursquare dataset, HSP outperforms BPR, RNN, FPMC, LORE, PRME, SITAR and POI2Vec by 27%, 33%, 20%, 21%, 19%, 35%, and 7%, respectively. In terms of rec@10 on the Gowalla dataset, HSP outperforms BPR, RNN, FPMC, LORE, PRME, SITAR and POI2Vec by 72%, 60%, 61%, 53%, 19%, 16%, and 18%, respectively. As shown in Table 2, HSP is also efficient: its execution time is lower than that of the other deep-learning-based methods, e.g., roughly half that of RNN and SITAR on both datasets. These results clearly demonstrate the competitiveness of HSP with respect to the other methods and show that region-based sequential patterns and public preference are powerful signals for next POI recommendation.
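The efficiency comparison can be made explicit by expressing each method's execution time relative to HSP. The numbers below are copied from Table 2; the helper name `time_relative_to_hsp` is illustrative. A ratio above 1 means the method took longer than HSP.

```python
# Execution times in kiloseconds, copied from Table 2.
execution_time_ks = {
    "Foursquare": {"BPR": 1.26, "RNN": 25.37, "FPMC": 8.25, "LORE": 9.37,
                   "PRME": 2.72, "SITAR": 28.75, "POI2Vec": 17.82, "HSP": 12.03},
    "Gowalla": {"BPR": 27.82, "RNN": 601.55, "FPMC": 218.98, "LORE": 243.31,
                "PRME": 53.91, "SITAR": 722.54, "POI2Vec": 432.40, "HSP": 315.16},
}

def time_relative_to_hsp(times):
    """Return each baseline's execution time divided by that of HSP."""
    return {method: t / times["HSP"]
            for method, t in times.items() if method != "HSP"}

ratios = {dataset: time_relative_to_hsp(times)
          for dataset, times in execution_time_ks.items()}
```

On both datasets RNN, SITAR and POI2Vec have ratios above 1, while the lighter models BPR, FPMC, LORE and PRME run faster than HSP.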

5.4.3 Statistical analysis

Although the recommendation methods differ in performance, it is necessary to assess whether the differences are statistically significant. For this reason, the Friedman statistical test [34] is used to compare the recommendation methods.

The statistical analysis is based on the precision values of each method. According to the mean rank obtained by each recommendation method, HSP performs best on both datasets, with a mean rank of 1. The Friedman statistic, computed with seven degrees of freedom, is 15.79 at significance level α = 0.05. Therefore, the null hypothesis that all methods behave the same is rejected, and the differences among the comparison methods are significant. Figure 5 shows the Friedman test results. From the results, certain deep learning based methods, such as HSP and POI2Vec, perform better than the matrix factorization based method BPR.
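For reference, the Friedman statistic for N measurement blocks and m methods with mean ranks R_j is 12N / (m(m+1)) · (Σ_j R_j² − m(m+1)²/4), and it is compared against a chi-square distribution with m − 1 degrees of freedom. With eight methods this gives seven degrees of freedom, whose 0.05 critical value is approximately 14.07, so the reported value of 15.79 exceeds it. The sketch below computes the statistic without tie correction; the function name and the toy scores are hypothetical and are not the paper's precision values.

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic and mean ranks (no tie correction).

    scores: list of rows, one per block (e.g. a dataset and k setting),
            with one column per method. Higher scores get better (lower) ranks.
    """
    n_blocks = len(scores)
    n_methods = len(scores[0])
    rank_sums = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda j: -row[j])
        i = 0
        while i < n_methods:
            j = i
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1     # tied methods share the average rank
            for pos in range(i, j + 1):
                rank_sums[order[pos]] += avg_rank
            i = j + 1
    mean_ranks = [s / n_blocks for s in rank_sums]
    stat = 12 * n_blocks / (n_methods * (n_methods + 1)) * (
        sum(r * r for r in mean_ranks) - n_methods * (n_methods + 1) ** 2 / 4)
    return stat, mean_ranks

# Toy example: three methods ranked identically in four blocks.
toy_scores = [[0.30, 0.20, 0.10]] * 4
stat, mean_ranks = friedman_statistic(toy_scores)
```

In the toy example the ordering is perfectly consistent, so the statistic reaches its maximum N(m − 1) = 8.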

Fig. 5

Friedman test results.

5.5 Results on different variants of HSP

5.5.1 Overview

We investigate the performance gain from three components: region context, ATTRNN, and the hybrid neural network. To be specific, we remove region context, ATTRNN, and public preference from HSP to obtain HSP-NC, HSP-NA and HSP-NP, respectively, and then compare their results with those of HSP. We use pre@k, rec@k, NDCG@k and MRR@k to illustrate the performance of the different components, as shown in Figure 6 and Figure 7. Note that k is set between 5 and 20. To further measure the impact of the different components, we calculate the ratio of the average performance of each variant to that of HSP, as shown in Figure 8.
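The ratio used to summarize each component's influence can be sketched as follows. Averaging a metric over the tested values of k before taking the ratio is an assumption about how the "average performance" is formed, and the numbers in the example are hypothetical rather than taken from the figures.

```python
def average_ratio_to_full(variant_scores, full_scores):
    """Ratio of a variant's mean metric value to the full model's mean value.

    Both lists hold the same metric measured at the same settings of k.
    """
    if not full_scores or len(variant_scores) != len(full_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    variant_mean = sum(variant_scores) / len(variant_scores)
    full_mean = sum(full_scores) / len(full_scores)
    return variant_mean / full_mean

# Hypothetical metric values at three settings of k.
ratio = average_ratio_to_full([0.08, 0.10, 0.12], [0.10, 0.125, 0.15])
```

A ratio further below 1 indicates that removing the corresponding component hurts performance more.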

Fig. 6

Effectiveness of different components on Foursquare.

Fig. 7

Effectiveness of different components on Gowalla.

Fig. 8

The degree of influence from the different components.

5.5.2 The impact of region context

We validate the influence of region context via HSP-NC. Region context is an important component of HSP; without it, HSP-NC can only learn item-level sequential preferences. The significant drop in precision, recall and NDCG shown in Figure 6 and Figure 7 verifies that the proposed region context component contributes positively to the performance gain. Besides, Figure 8 shows that on the Gowalla dataset, HSP-NC has the largest performance degradation among the three variants of HSP. The reason is that the Gowalla dataset has sparser check-in data than the Foursquare dataset. HSP-NC captures sequential preferences from the sparse check-ins and neglects the uneven distribution of a user’s check-ins across regions, so the model is unable to capture reliable sequential patterns and varied behavioral semantics. Overall, this further validates the importance of region context in next POI recommendation.

5.5.3 ATTRNN for modeling user sequential patterns

HSP-NA, which ignores the attention mechanism, is used to examine how ATTRNN models users’ check-in sequential patterns. As shown in Figure 6 and Figure 7, there is an obvious performance decrease on both datasets, which shows the substantial effect of the attention mechanism in next POI recommendation. The attention mechanism can capture the dynamics of user check-in sequences and detect which time steps are important for modeling users’ check-in sequential preferences. As shown in Figure 8, HSP-NA experiences the largest performance decrease among the three variants on the Foursquare dataset. The reason is that more than one behavior can occur at the same time in the Foursquare dataset, which makes a prediction based only on the last time step unreliable, and HSP-NA ignores the contribution of the other time steps. Moreover, with the attention mechanism the model can still perform well, especially in scenarios where some drop-in check-in locations interrupt the user’s check-in sequential patterns.

5.5.4 The impact of public preference

We validate the impact of public preference via HSP-NP, which replaces the hybrid neural network unit with a single neural network. As shown in Figure 6 and Figure 7, similar to HSP-NA and HSP-NC, HSP-NP, which ignores the public preference, also suffers a significant drop in recommendation performance. In terms of rec@10, the public preference component contributes performance gains of 4.39% and 3.73% on the Foursquare and Gowalla datasets, respectively. This is because the hybrid neural network learns the public preference in check-in patterns. The results indicate that the proposed hybrid neural network unit benefits model performance in next POI recommendation.

5.6 Performance study on GRU layer and embedding size

We investigate the performance of HSP with varying numbers of GRU layers and embedding sizes, as shown in Figure 9 and Figure 10. In this paper, we use the same embedding size for users, regions, and check-ins. Achieving optimal performance while maintaining reasonable training efficiency is important for next POI recommendation methods. We therefore study the sensitivity of HSP to the number of GRU layers and the embedding size, paying attention to the changes in execution time and rec@10. As can be inferred from Figure 9, nearly all optimal rec@10 results are obtained with a 2-layer GRU structure. Where the 3-layer GRU structure performs slightly better than the 2-layer structure, its execution time is much longer. Also, Figure 10 shows that when the embedding size is less than 200, the execution time grows relatively slowly, whereas when the embedding size is above 200, the execution time increases sharply. Therefore, we make a tradeoff between performance and execution time and set the embedding size to 200.

Fig. 9

Execution time and the recommendation performance with varying GRU layer.

Fig. 10

Execution time and the recommendation performance with varying embedding size.

6 Conclusion

In this paper we have presented a new method, called HSP, for next POI recommendation. Our method absorbs the merits of recurrent neural network, region-wise spatial information and public preference. The central idea of our method is to learn user preferences, by exploiting sequential patterns and public preference. Extensive experiments based on two public datasets have demonstrated the effectiveness and efficiency of our method.

For future work, we would like to incorporate more content information such as POI’s text content to further improve our method. Moreover, we will further exploit category information in next POI recommendation task. In addition, we are also interested in exploring the impact of user temporal patterns on user movement behaviors.

Acknowledgments

This work is supported by the National Key R&D Program of China (2018YFB1003404) and the National Natural Science Foundation of China (61672142, 62072086, 62072084, U1811261).

References

1. Wang, Shen, Ouyang and Cheng, Exploiting poi-specific geographical influence for point-of-interest recommendation, In IJCAI, 2018, pp. 3877–3883.

2. Yin, Sadiq S.W., Chen, Xie and Zhou, SPORE: A sequential personalized spatial item recommender system, In ICDE, 2016, pp. 954–965.

3. Zhang, Chow and Li, LORE: exploiting sequential influence for location recommendations, In SIGSPATIAL, 2014, pp. 103–112.

4. Feng, Cong, An and Chee Y.M., Poi2vec: Geographical latent representation for predicting future visitors, In AAAI, 2017, pp. 102–108.

5. Feng, Li, Zeng, Cong, Chee Y.M. and Yuan, Personalized ranking metric embedding for next new POI recommendation, In IJCAI, 2015, pp. 2069–2075.

6. Cheng, Ye and Zhu, What’s your next move: User activity prediction in location-based social networks, In SIAM, 2013, pp. 171–179.

7. Cheng, Yang, Lyu M.R. and King, Where you like to go next: Successive point-of-interest recommendation, In IJCAI, 2013, pp. 2605–2611.

8. Cho, van Merrienboer, Ç. Gülçehre, Bahdanau, Bougares, Schwenk and Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, In EMNLP, 2014, pp. 1724–1734.

9. Kong and Wu, HST-LSTM: A hierarchical spatial-temporal long-short term memory network for location prediction, In IJCAI, 2018, pp. 2341–2347.

10. Rehman, Khalid and Madani S.A., A comparative study of location-based recommendation systems, Knowledge Eng. Review 32 (2017), e7.

11. Liu, Xiong, Papadimitriou, Fu and Yao, A general geographical probabilistic factor model for point of interest recommendation, IEEE Trans Knowl Data Eng 27(5) (2015), 1167–1179.

12. Sarwat, Levandoski J.J., Eldawy and Mokbel M.F., Lars*: An efficient and scalable location-aware recommender system, IEEE Trans Knowl Data Eng 26(6) (2014), 1384–1399.

13. …, Cong, Li, Pham T.N. and Krishnaswamy, Rank-geofm: A ranking based geographical factorization method for point of interest recommendation, In SIGIR, 2015, pp. 433–442.

14. …, Zhang, Gao, Bian and Cui, GARG: Anonymous recommendation of point-of-interest in mobile networks by graph convolution network, Data Sci Eng 5(4) (2020), 433–447.

15. Gao, Yang, Wu, Qiao, Chen, Yang and Chen, Exploiting location-based context for POI recommendation when traveling to a new region, IEEE Access 8 (2020), 52404–52412.

16. Ding and Chen, Recnet: a deep neural network for personalized POI recommendation in location-based social networks, Int J Geogr Inf Sci 32(8) (2018), 1631–1648.

17. …, Li, Jiang and Lin, Point-of-interest recommendation for location promotion in location-based social networks, In MDM, IEEE Computer Society, 2017, pp. 344–347.

18. Chang, Park, Kim and Kang, Content-aware hierarchical point-of-interest embedding model for successive POI recommendation, In IJCAI, 2018, pp. 3301–3307.

19. Liu, Wu, Wang and Tan, Predicting the next location: A recurrent model with spatial and temporal contexts, In AAAI, 2016, pp. 194–200.

20. Liu, Liu and Li, Exploring the context of locations for personalized location recommendations, In IJCAI, 2016, pp. 1188–1194.

21. Zhao, Zhu, Liu, Xu, Li, Zhuang, Sheng V.S. and Zhou, Where to go next: A spatio-temporal gated network for next POI Recommendation, In AAAI, 2019, pp. 5877–5884.

22. …, Shen and Zhu, Next point-of-interest recommendation with temporal and multi-level context attention, In ICDM, 2018, pp. 1110–1115.

23. …, Li, Liao, Song and Cheung W.K., Inferring a personalized next point-of-interest recommendation model with latent behavior patterns, In AAAI, 2016, pp. 137–143.

24. …, Cui, Guo, Lu, Li and Lu, A category-aware deep model for successive POI recommendation on sparse check-in data, In WWW, 2020, pp. 1264–1274.

25. Sun, Qian, Chen, Liang, Nguyen Q.V.H. and Yin, Where to go next: Modeling long- and short-term user preferences for point-of-interest recommendation, In AAAI, 2020, pp. 214–221.

26. Mikolov, Chen, Corrado and Dean, Efficient estimation of word representations in vector space. In Y. Bengio and Y. LeCun (Editors), ICLR, 2013.

27. …, Zhang, Wu and Yang, Modeling user posting behavior on social media, In SIGIR, 2012, pp. 545–554.

28. Cha and Cho, Social-network analysis using topic models, In SIGIR, 2012, pp. 565–574.

29. Yuan, Cong, Ma, Sun and Magnenat-Thalmann, Time-aware point-of-interest recommendation, In SIGIR, 2013, pp. 363–372.

30. Cho, Myers S.A. and Leskovec, Friendship and mobility: user movement in location-based social networks, In SIGKDD, 2011, pp. 1082–1090.

31. Zhao, Zhao, King and Lyu M.R., Geo-teaser: Geo-temporal sequential embedding rank for point-of-interest recommendation, In WWW, ACM, 2017, pp. 153–162.

32. Rendle, Freudenthaler, Gantner and Schmidt-Thieme, BPR: bayesian personalized ranking from implicit feedback, CoRR, 2012.

33. Rakkappan and Rajan, Context-aware sequential recommendations with stacked recurrent neural networks, In WWW, 2019, pp. 3172–3178.

34. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Publications of the American Statistical Association 32(200) (1939), 675–701.

Table 1

Statistics of two datasets

	Foursquare	Gowalla
Number of users	2321	10162
Number of locations (POIs)	5596	24250
Number of check-ins	194108	456988
Density	6.35 × 10^-3	9.85 × 10^-4