Abstract
Point-of-Interest (POI) recommendation is one of the most important tasks in social network analysis, and many methods have been proposed in recent years to improve its performance. Existing studies have revealed that temporal and geographical factors are two crucial contextual factors that influence user decisions. However, these methods learn representations of POIs and users from each contextual factor separately and fuse the learned representations only in the final stage. This ignores the interactions between different contextual factors and leads to suboptimal representations of POIs and users. To close this gap, we propose a novel Temporal-Geographical Attention-based Transformer (TGAT) for the POI recommendation task. Specifically, TGAT develops a hybrid sequence sampling strategy that samples POI sequences from contextual POI graphs built from users’ check-in records. In this way, the interactions between different contextual factors are carefully preserved. TGAT then employs a Transformer-based neural network backbone to learn representations of POIs from the sampled sequences. In addition, a weighted aggregation strategy is proposed to fuse the representations learned from different contextual factors. Extensive experiments on real-world datasets demonstrate the effectiveness of TGAT.
Introduction
The advancement of smartphones has led to the rapid development of location-based social networks (LBSNs) [1] such as Foursquare and Yelp. These platforms enable users to share their life experiences, resulting in massive numbers of check-in records. This abundance of location data offers an opportunity to understand user preferences, which can bring substantial economic benefits to the platforms [2]. As one typical application, Point-of-Interest (POI) recommendation systems [3–5] are particularly useful for handling massive data and helping users filter out irrelevant information, while also supporting tourism planning, business marketing, and scenic tourism services. By utilizing check-in data, location comments, and other information, these systems suggest POIs that users may be interested in, and they have attracted considerable attention in the field of recommender systems.
A key issue for POI recommender systems is how to precisely capture user preferences [6]. Unlike conventional recommendation tasks such as movie recommendation, POI recommendation faces more complex situations, because users’ check-in behaviors can be influenced by various contextual factors. Temporal factors [7–10] and geographical factors [11–15] are two crucial factors for understanding user preferences. Temporal influence is critical for building an effective POI recommendation system: recent studies [7, 10] have found that user check-in behaviors in LBSNs display specific temporal patterns. For instance, people tend to be in their offices on Monday afternoons and in bars at night.
In addition, the geographical factor is a significant factor in POI recommendation. Tobler’s first law of geography [16] states that near things are more related than distant things; applied to POI recommendation, it suggests that a user’s interest in a POI decreases as the distance between the user and the POI increases. This relationship is similar to the inverse correlation between the probability of purchasing an item and its cost. However, additional factors such as utility also play a role. Utility is an economic term that refers to the satisfaction or preference users have for different POIs. For example, users may be willing to travel to a remote POI if it provides a higher level of satisfaction than a nearby one. Furthermore, users of LBSNs have varying mobility patterns, which complicates the modeling of check-in decisions. Accounting for all these factors can help produce more accurate and tailored POI recommendations.
Many studies [17–20] have been proposed to learn user preferences from temporal and geographical factors, and they have shown remarkable performance. Despite their effectiveness, existing methods first capture user preferences from each contextual factor separately and then apply a fusion function to combine the representations learned from different contextual information. This mechanism ignores the relevance between different types of contextual information, leading to suboptimal estimates of user preferences.
To overcome this limitation, we propose a novel POI recommendation model called the Temporal-Geographical Attention-based Transformer (TGAT). TGAT employs a hybrid sequence sampling strategy to sample POI sequences from contextual POI graphs created from users’ check-in records, which effectively preserves the interactions between various contextual factors. Next, TGAT uses a Transformer-based neural network backbone to learn representations of POIs from the sampled sequences. To fully capture the influences of temporal and geographical factors, TGAT also introduces a weighted aggregation strategy that fuses the representations learned from different contextual factors. To evaluate TGAT against other state-of-the-art POI recommendation models, we conduct extensive experiments on real-world datasets. The experimental results demonstrate that TGAT outperforms previous methods on the POI recommendation task.
The main contributions of this paper are summarized as follows. First, we propose a novel model, TGAT, for the POI recommendation task in LBSNs, which effectively preserves the influences of different contextual factors when learning representations of POIs and users. Second, we propose a hybrid sequence sampling strategy to sample POIs involving different contextual factors, and we further develop a weighted aggregation strategy to fuse the representations of POIs learned from different contexts. Third, we conduct extensive experiments on real-world datasets. The experimental results show that TGAT consistently surpasses recent representative POI recommendation methods, indicating its effectiveness for the POI recommendation task.
Related work
In this section, we review recent studies that consider the influences of temporal and geographical factors on the POI recommendation task.
Temporal influence
The temporal factor is one of the most important factors for describing the check-in behaviors of users and the properties of POIs. For instance, if most of the target user’s check-ins occur in the morning, the user should be recommended POIs that are open at that time. Similarly, a bar only opens at night, so it is better to recommend it to users who are often active at night.
Chen et al. [7] propose TeSP-TMF, which combines grey relational analysis and matrix factorization to mine inherent temporal relationships. Zhao et al. [8] consider three temporal characteristics of user mobility and propose an Aggregated Temporal Tensor Factorization (ATTF) model, which captures these temporal features jointly and at different time scales using temporal tensor factorization and a linear combination operator. Ying et al. [9] highlight the challenge of temporal influence on POI recommendation and propose a system consisting of two components: context-aware tensor decomposition (CTD) for modeling user preferences and weighted HITS-based POI rating (WHBPR). The system models user preferences with a three-dimensional (user-category-time) tensor, fills in missing entries using CTD, and recovers user preferences for different time slots. Si et al. [10] propose an adaptive POI recommendation method (CTF-ARA) that combines check-in and temporal features with user-based collaborative filtering. The method mines user activity and similarity features using probabilistic statistical analysis, together with features describing the variability and consecutiveness of the temporal factor.
Geographical influence
The geographical factor is also important for understanding users’ check-in activities. Since nearby POIs may share similar features, the geographical factor is usually used to measure user preferences for particular areas.
Ren et al. [11] propose a novel framework called MPGI (Mining Preferences from Geographical and Interactive Correlations), which includes a POI correlation modeling layer that captures geographical distances and interactive correlations between all POI pairs and fuses relevant signals from highly correlated POIs into the target POI to obtain high-quality POI representations. Liu et al. [12] propose an ensemble learning framework called PG-PRE, which constructs multiple groups of similar users using a roulette-selection-based sampling method to improve their diversity and provides each group with a POI recommendation. Huo et al. [13] propose a geographical location privacy-preserving algorithm (GLP) and a friend relationship privacy-preserving algorithm (FRP) to prevent the disclosure of users’ private geographical location and friend relationship information. Zhang et al. [14] propose a personalized geographical influence modeling method called PGIM, which models geographical preference from user global tolerance, user local tolerance, and spatial distance, and extracts user diversity preference from interactions among users to promote diversity in recommendations. Liu et al. [15] propose local collaborative ranking (LCR) to mitigate the sparsity of check-in data by assuming that the user-POI matrix is locally low-rank instead of globally low-rank. Liu et al. [16] propose a general geographical probabilistic factor model (Geo-PFM) framework that captures geographical influences on user check-ins and leverages user mobility behaviors.
Both temporal and geographical influence
Many studies also consider the influences of both temporal and geographical factors to precisely capture users’ check-in habits. Wang et al. [17] propose a spatial-temporal and text representation learning (STaTRL) method that incorporates geographical and temporal information to improve performance. Cao et al. [13] propose a method that considers graph-structured information, collaborative signals from other users, and dynamic user preferences. Dai et al. [19] propose PPR, a framework that jointly models User-POI relations, sequential patterns, geographical influence, and social ties in a heterogeneous graph to learn user and POI representations, and then uses an LSTM-based spatio-temporal neural network to model users’ personalized sequential patterns for POI recommendation. Wang et al. [20] propose GSTN, which incorporates users’ spatial and temporal dependencies via a Graph-based Spatial Dependency modeling (GSD) module that explicitly models complex geographical influences by leveraging graph embedding. Zhao et al. [21] propose a Spatio-Temporal Gated Network (STGN) to model personalized sequential patterns for users’ long- and short-term preferences in next POI recommendation. Davtalab et al. [22] propose the social spatio-temporal probabilistic matrix factorization (SSTPMF) model, which integrates social space, geographical space, and POI category space in similarity modeling. Huang et al. [23] propose a Spatio-Temporal effect based on Purpose Ranking (STPR) model that considers trip purpose and historical check-in behavior for POI recommendation. Liu et al. [24] propose a geographical-temporal awareness hierarchical attention network (GT-HAN) that explicitly captures POI-POI interactions across a user’s check-in history and distinguishes relevant check-ins from irrelevant ones. Ma et al. [25] model users’ periodic and repetitive daily activities for sequence-based prediction of visit probability; in addition, geographical preference is represented by kernel density estimation (KDE), and category preference is used to predict POI check-in probability. Wang et al. [26] propose two lightweight approaches, the Time Aware Position Encoder (TAPE) and the Interval Aware Attention Block (IAAB), which add scaled spatial-temporal intervals to the attention map, improving the attention mechanism while adhering to time constraints and providing more explainable recommendations. Li et al. [27] propose a unified approach that calculates context-aware similarities between users by considering both temporal and spatial features, and dynamically generates different POI recommendation lists for a user based on the current context.
Although many methods consider the influence of both temporal and geographical factors on user preferences, their working mechanism requires them to first learn representations from each contextual factor separately. This inevitably weakens the model’s ability to capture the relevance between different types of contextual information and further degrades performance. To overcome this gap, we propose TGAT, which not only captures the influence of temporal and geographical factors but also learns the relevance between these contextual factors to better estimate user preferences.
Preliminaries
In this section, we provide several key definitions used in this paper.
In this paper, we use lowercase letters to represent elements, uppercase letters to represent sets, and bold letters to represent vectors or matrices. The main notations of this paper are summarized in Table 1.
Notations and their descriptions
In this section, we detail our proposed temporal-geographical attention-based transformer, TGAT. We first introduce the proposed temporal-geographical sequence sampling. Then we introduce the implementation details of TGAT. Finally, we introduce the objective function and the learning algorithm of TGAT. The overall framework of TGAT is shown in Fig. 1.

The overall framework of TGAT.
Previous studies have revealed that exploiting the influences of temporal and geographical factors is beneficial for capturing each user’s unique preferences from complex check-in activities. However, existing methods ignore the relevance between the temporal and geographical factors, which may weaken model performance for POI recommendation. Hence, we propose a novel temporal-geographical attention mechanism (TGAM) that jointly learns user preferences from the temporal factor, the geographical factor, and their combination.
TGAM is derived from the self-attention mechanism in the Transformer [28]. Since the Transformer learns the dependencies among input objects from input sequences, the key problem in applying it to learn POI representations is how to construct POI sequences associated with the temporal and geographical factors. In this paper, we leverage graph-structured data to preserve the complex relations between POIs.
Hence, we first generate two graphs, one for the temporal factor and one for the geographical factor, to preserve the interactions between POIs under each type of contextual information. Specifically, for the temporal POI graph $G_t$, we generate edges based on check-in timestamps: POIs that have been visited at the same timestamp are connected by an edge in $G_t$. Similarly, POIs located in the same area are connected by an edge in the geographical POI graph $G_g$.
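The graph construction can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes check-ins arrive as (user, poi, time_slot) tuples in which time_slot is an already-discretized timestamp, and that each POI carries an area id (for example, a grid cell). How timestamps and areas are discretized is not specified in this text, and the function name `build_context_graphs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_context_graphs(checkins, poi_area):
    """Build the temporal graph G_t and geographical graph G_g as adjacency sets.

    checkins: iterable of (user, poi, time_slot); time_slot is assumed to be an
              already-discretized timestamp (hypothetical preprocessing).
    poi_area: dict mapping poi -> area id (hypothetical, e.g. a grid cell).
    """
    g_t = defaultdict(set)
    g_g = defaultdict(set)

    # Temporal graph: connect every pair of POIs visited in the same time slot.
    by_slot = defaultdict(set)
    for _user, poi, slot in checkins:
        by_slot[slot].add(poi)
    for pois in by_slot.values():
        for a, b in combinations(sorted(pois), 2):
            g_t[a].add(b)
            g_t[b].add(a)

    # Geographical graph: connect every pair of POIs located in the same area.
    by_area = defaultdict(set)
    for poi, area in poi_area.items():
        by_area[area].add(poi)
    for pois in by_area.values():
        for a, b in combinations(sorted(pois), 2):
            g_g[a].add(b)
            g_g[b].add(a)

    # POIs without any edge simply do not appear as keys.
    return dict(g_t), dict(g_g)
```

For example, two check-ins at POIs `p1` and `p2` in the same slot produce the single temporal edge `p1`–`p2`, while `p1` and `p3` sharing area `A` produce the geographical edge `p1`–`p3`.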
After constructing the graphs, we need to generate POI sequences as the input for the Transformer-based model. A naïve strategy is to sample sequences from the two graphs separately for each POI. However, this strategy ignores the relation between the temporal and geographical factors. Therefore, we develop a new method to construct context-aware sequences.
For each POI v, we first sample two sequences
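Because the description of the sampling procedure is incomplete here, the following is only a hypothetical illustration of how temporal, geographical, and hybrid sequences could be drawn from $G_t$ and $G_g$. It assumes uniform random walks on each graph and, for the hybrid sequence, a walk that alternates between the two graphs at every step (falling back to the other graph at a dead end). Neither choice is confirmed by the text, and all function names are illustrative.

```python
import random


def random_walk(graph, start, length, rng):
    """Uniform random walk of up to `length` POIs on one graph (assumed sampler)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break  # dead end: return a shorter sequence
        walk.append(rng.choice(sorted(neighbors)))
    return walk


def hybrid_walk(g_t, g_g, start, length, rng):
    """Hypothetical hybrid walk: alternate between G_t and G_g at each step so that
    consecutive POIs are linked by temporal and geographical edges in turn."""
    walk = [start]
    graphs = (g_t, g_g)
    step = 0
    while len(walk) < length:
        neighbors = graphs[step % 2].get(walk[-1])
        if not neighbors:
            # Fall back to the other graph when the current one has no neighbors.
            neighbors = graphs[(step + 1) % 2].get(walk[-1])
        if not neighbors:
            break
        walk.append(rng.choice(sorted(neighbors)))
        step += 1
    return walk


def sample_sequences(g_t, g_g, poi, length, seed=0):
    """Return the three sequence types for one POI, each starting at that POI."""
    rng = random.Random(seed)
    return {
        "temporal": random_walk(g_t, poi, length, rng),
        "geographical": random_walk(g_g, poi, length, rng),
        "hybrid": hybrid_walk(g_t, g_g, poi, length, rng),
    }
```

Starting every sequence at the target POI is a convenience assumption that lets a later encoder read the target's representation from a fixed position.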
After constructing the input sequences, we detail the implementation of TGAT for POI recommendation. TGAT is built on the standard Transformer architecture. We first describe how the input features of POIs are constructed and then introduce the overall neural network architecture.
In this paper, we generate the input features for each POI from two perspectives, semantic features and graph topology features. For the semantic features
Then we leverage the self-attention mechanism to learn the representations of POIs from different sampling sequences. Since we have obtained three types of sampling sequences, the temporal sequence
For POIs in
Then, we can obtain the final representations
Similar to
In this way, we can learn the representations of POIs from both the temporal context and the geographical context. The aggregation weights are treated as hyper-parameters, and their values are determined by grid search.
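The encoding and fusion steps can be sketched in NumPy as below. This is a simplified, hypothetical illustration rather than TGAT itself: it uses a single attention head without feed-forward layers, separate projection matrices per context, and assumes the target POI sits at position 0 of each sampled sequence. The context keys `temporal`, `geographical`, and `hybrid` are chosen for illustration. The fusion weights are fixed scalars, consistent with treating them as hyper-parameters.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence.

    x: (l_s, d_h) embeddings of the POIs in one sampled sequence.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (l_s, l_s)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)          # row-wise softmax
    return attn @ v                                   # (l_s, d_h)


def encode_poi(seqs, params):
    """Assumption: the target POI is at position 0 of every sampled sequence, and
    its output at that position is its context-specific representation."""
    return {ctx: self_attention(x, *params[ctx])[0] for ctx, x in seqs.items()}


def weighted_aggregate(reps, weights):
    """Fuse context-specific representations with fixed, non-learnable weights."""
    return sum(weights[ctx] * reps[ctx] for ctx in reps)


rng = np.random.default_rng(0)
d_h, l_s = 8, 5
contexts = ("temporal", "geographical", "hybrid")
seqs = {ctx: rng.normal(size=(l_s, d_h)) for ctx in contexts}
params = {ctx: tuple(rng.normal(size=(d_h, d_h)) for _ in range(3)) for ctx in contexts}
reps = encode_poi(seqs, params)
h = weighted_aggregate(reps, {"temporal": 0.4, "geographical": 0.3, "hybrid": 0.3})
```

Here `h` is a single `d_h`-dimensional POI vector; the weights 0.4, 0.3, and 0.3 are placeholder values, not values reported in this paper.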
After we obtain the representations of POIs, we can generate the representations of users according to the learned POIs’ representations. In this paper, we utilize the following strategy to obtain the users’ representations:
Here we provide a complexity analysis of the proposed method. The main computational cost of TGAT comes from the Transformer backbone, whose self-attention cost is well known to grow quadratically with the length of the input sequence. Suppose the length of a sampled sequence is $l_s$ and the dimension of the hidden representations is $d_h$; then the computational complexity of self-attention over one sampled sequence is $O(l_s^2 d_h)$. Hence, the overall training cost can be controlled by choosing a suitable sampling-sequence length.
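A quick calculation shows how the quadratic term scales. The sketch counts only the multiply-adds in the $QK^\top$ score matrix of one sequence ($l_s \times l_s$ entries, each a $d_h$-dimensional dot product) and ignores the linear projections and feed-forward layers, whose cost grows only linearly in $l_s$. The value $d_h = 128$ is taken from the hyper-parameter grid; the function name is illustrative.

```python
def attention_score_cost(l_s, d_h):
    """Multiply-adds for the Q K^T score matrix of one sequence: l_s * l_s * d_h."""
    return l_s * l_s * d_h


for l_s in (5, 10, 15, 20, 25):
    print(l_s, attention_score_cost(l_s, 128))
```

Doubling $l_s$ from 10 to 20 quadruples this term, from 12,800 to 51,200 multiply-adds.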
Objective function and learning process
We adopt the Bayesian Personalized Ranking (BPR) loss [29] to learn model parameters, which is widely used to model implicit feedback in recommender systems. We also follow the negative sampling settings in [29]: POIs that a user has not visited are regarded as negative samples. The objective function of TGAT is represented as follows:
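For reference, the standard BPR objective from [29] minimizes $-\sum_{(u,i,j)} \ln \sigma(\hat{y}_{ui} - \hat{y}_{uj}) + \lambda \lVert \Theta \rVert^2$ over triples of a user $u$, a visited POI $i$, and an unvisited POI $j$. The sketch below computes this loss for one triple. It is not the paper's exact objective: the score $\hat{y}_{ui}$ is assumed to be a dot product, the L2 penalty covers only the vectors in the triple, and the user representation is assumed (hypothetically) to be the mean of the user's visited POI representations, since the text does not give TGAT's user-representation formula.

```python
import numpy as np


def user_representation(poi_emb, visited):
    """Hypothetical: mean of the representations of the POIs the user visited."""
    return poi_emb[list(visited)].mean(axis=0)


def sample_negative(num_pois, visited, rng):
    """Draw a POI id the user has not visited (the negative-sample rule in the text)."""
    visited = set(visited)
    if len(visited) >= num_pois:
        raise ValueError("user has visited every POI; no negative sample exists")
    while True:
        j = int(rng.integers(num_pois))
        if j not in visited:
            return j


def bpr_loss(user_vec, poi_emb, pos, neg, reg=1e-4):
    """BPR loss for one (u, i, j) triple: -ln sigmoid(y_ui - y_uj) + L2 penalty."""
    x_uij = user_vec @ poi_emb[pos] - user_vec @ poi_emb[neg]
    # -ln(sigmoid(x)) == ln(1 + exp(-x)); logaddexp(0, -x) evaluates it stably.
    loss = np.logaddexp(0.0, -x_uij)
    # L2 penalty on the user and POI vectors involved in this triple.
    penalty = reg * (user_vec @ user_vec
                     + poi_emb[pos] @ poi_emb[pos]
                     + poi_emb[neg] @ poi_emb[neg])
    return float(loss + penalty)
```

When the positive and negative scores are equal, the data term equals $\ln 2$, and it shrinks toward zero as the visited POI is scored increasingly higher than the unvisited one.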
In this section, we detail the experiments of this paper. Specifically, we first introduce the benchmark datasets. Then we introduce the adopted baseline models and the evaluation metrics. Finally, we report the experimental results and provide the corresponding analysis.
Benchmark datasets
In this paper, we adopt two widely used datasets, NYC [30] and TKY [30], to evaluate model performance on the POI recommendation task. Both are collected from the well-known social network platform Foursquare. Each dataset consists of users’ check-in records, and each check-in record contains three main elements: the user, the POI, and the check-in timestamp. Each POI is also associated with additional information, such as its location and category. Table 2 reports the statistics of the datasets. In the experiments, we sort the check-in records by their timestamps. For each user, we use the first 60% of the records as the training set, the next 20% as the validation set, and the remaining 20% as the test set.
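The chronological per-user split can be sketched as follows, assuming check-ins are (user, poi, timestamp) tuples. Rounding each split size down with `int` is an assumption, since the text does not say how fractional counts are handled; the function name is hypothetical.

```python
from collections import defaultdict


def chronological_split(checkins, train=0.6, valid=0.2):
    """Per-user chronological split: first 60% train, next 20% validation, rest test.

    checkins: iterable of (user, poi, timestamp) tuples.
    """
    per_user = defaultdict(list)
    for user, poi, ts in checkins:
        per_user[user].append((ts, poi))

    splits = {"train": [], "valid": [], "test": []}
    for user, records in per_user.items():
        records.sort()  # chronological order (ties broken by POI id)
        n = len(records)
        n_train = int(n * train)  # assumption: round down
        n_valid = int(n * valid)
        for idx, (ts, poi) in enumerate(records):
            if idx < n_train:
                part = "train"
            elif idx < n_train + n_valid:
                part = "valid"
            else:
                part = "test"
            splits[part].append((user, poi, ts))
    return splits
```

With round-down sizes, a user with very few check-ins (for example, a single one) ends up entirely in the test set, so such users may need filtering beforehand.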
Statistics of datasets
We select baseline models from three categories: temporal factor-based methods, geographical factor-based methods, and temporal-geographical methods. For the first category, we select TeSP-TMF [7] and ATTF [8]. For the second category, we adopt PGIM [14] and Geo-PFM [16]. For the last category, we choose SSTPMF [22] and LSTM-KDE [25]. Brief descriptions of these baseline models are as follows:
TeSP-TMF [7] is a matrix factorization-based method that investigates different contextual information to construct the rating matrix and then leverages matrix factorization techniques to learn user preferences.
ATTF [8] is also a matrix factorization-based method that employs a tensor factorization method to learn user preferences at different time scales.
PGIM [14] is a neural network-based method that jointly learns users’ geographical preference and diversity preference for POI recommendation.
Geo-PFM [16] is a probabilistic factor model based on the Poisson distribution that captures the geographical influences on user preferences.
SSTPMF [22] is a probabilistic matrix factorization model that integrates diverse contextual influence factors to capture the similarity between users and POIs.
LSTM-KDE [25] is a hybrid method that combines the LSTM model and the kernel density estimation technique to learn user preferences from complex contextual information.
We adopt Precision and Recall as the evaluation metrics. Specifically, for the Top-k recommendation task, the metrics are calculated as follows:
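Because the metric formulas are missing here, the following sketch uses the standard definitions, which we assume match the paper's: Precision@k is the number of hits among the top-k recommended POIs divided by k, and Recall@k is the same number of hits divided by the number of POIs the user visited in the test set. In practice, both values are averaged over all test users. The function name is illustrative.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Standard Precision@k and Recall@k for one user.

    ranked:   POI ids ordered by predicted score, best first.
    relevant: POIs the user actually visited in the test period.
    """
    relevant = set(relevant)
    hits = len(set(ranked[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For a ranked list `[3, 1, 7, 5]` and test POIs `{1, 5, 9}`, k = 2 yields one hit, giving Precision@2 = 0.5 and Recall@2 = 1/3.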
For the hyper-parameter settings, we use grid search to determine the final values of the hyper-parameters. Specifically, we search the dimension of the hidden representations in {64, 128, 256}, the length of the sampling sequences in {5, 15, …, 25}, and the learning rate in {0.01, 0.005, 0.001}.
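The search loop can be sketched with `itertools.product`. The callback `evaluate`, which would train TGAT with one configuration and return a validation score, is hypothetical. Because the sequence-length grid above is written with an ellipsis, the sketch assumes the {5, 10, 15, 20, 25} grid used in the sequence-length study.

```python
from itertools import product

HIDDEN_DIMS = (64, 128, 256)
SEQ_LENGTHS = (5, 10, 15, 20, 25)  # assumption: grid from the sequence-length study
LEARNING_RATES = (0.01, 0.005, 0.001)


def grid_search(evaluate):
    """Return the configuration with the highest validation score.

    `evaluate` is a hypothetical callback that trains the model with the given
    settings and returns a validation metric (e.g. Recall@10).
    """
    best_cfg, best_score = None, float("-inf")
    for d_h, l_s, lr in product(HIDDEN_DIMS, SEQ_LENGTHS, LEARNING_RATES):
        cfg = {"d_h": d_h, "l_s": l_s, "lr": lr}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The full grid has 3 × 5 × 3 = 45 configurations, each requiring one training run.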
Performance comparison
To validate the performance of each model on the POI recommendation task, we run each model ten times with different random seeds and report the mean values of the metrics. The results are shown in Figs. 2–5. Generally speaking, considering the influence of temporal and geographical factors simultaneously can significantly enhance model performance on the POI recommendation task. For instance, SSTPMF and LSTM-KDE achieve more competitive performance than Geo-PFM and PGIM, which consider only geographical factors and therefore learn suboptimal representations of POIs and users. In addition, methods based on temporal factors outperform those based on geographical factors, likely because check-in records from Foursquare are more closely related to temporal factors. Finally, TGAT consistently outperforms all baselines across different recommendation list lengths. For instance, TGAT outperforms the second-best method by around 5% in terms of Precision on the NYC dataset at k = 1. This indicates that the proposed designs effectively capture the influence of temporal and geographical factors and thereby improve model performance.

Mean Precision (%) of each model with different k on NYC.

Mean Recall (%) of each model with different k on NYC.

Mean Precision (%) of each model with different k on TKY.

Mean Recall (%) of each model with different k on TKY.
There are two key modules in TGAT: the hybrid sampling strategy and the weighted aggregation strategy. The former preserves the influence of temporal and geographical factors, and the latter helps the model better learn the representations of POIs from different contextual information. To investigate their contributions to model performance, we construct two variants of TGAT, named TGAT-H and TGAT-A. TGAT-H removes the hybrid sampling strategy, and TGAT-A removes the weighted aggregation strategy. We fix k = 10 and run each model on both datasets. The results are reported in Tables 3 and 4. TGAT-A beats TGAT-H on both datasets, indicating that the hybrid sampling strategy has more influence than the aggregation strategy. This also shows that the proposed hybrid sampling strategy significantly enhances model performance on the POI recommendation task.
Performance of TGAT and its variants on NYC
Performance of TGAT and its variants on TKY
As mentioned in Section 5.4, the hybrid sampling strategy can largely improve model performance, and the length of the sampling sequence is a key parameter that strongly influences performance. Hence, we conduct the following experiments to study the influence of the sampling sequence length. Specifically, we vary the sequence length m in {5, 10, 15, 20, 25}, fix k = 10, and report the performance of the model with different sampling sequence lengths on both datasets. The experimental results are shown in Figs. 6 and 7. The length of the sampling sequence clearly affects the effectiveness of the model: both small and large values of m lead to poor performance. This is because short sequences contain less information, while long sequences include more irrelevant information. We also observe that different datasets require different sequence lengths to achieve the best performance. This is because the datasets contain different numbers of POIs, and a larger number of POIs requires longer sequences to preserve POI information.

Performance of TGAT with different m on NYC.

Performance of TGAT with different m on TKY.
In this paper, we propose a novel model, TGAT, for the POI recommendation task. TGAT introduces a hybrid sampling strategy that effectively preserves the interactions of POIs across temporal and geographical contexts. Moreover, TGAT leverages the Transformer architecture to learn POI representations based on different contextual information. In addition, a weighted aggregation strategy fuses the POI features extracted from different contexts to obtain the final representations of POIs. In this way, the influences of different contextual information are carefully preserved. Extensive experiments on real-world datasets demonstrate the superiority of TGAT over representative POI recommendation models.
Although TGAT uses the weighted aggregation strategy to learn the representations of POIs from different contextual information, it cannot learn POI representations adaptively, since the aggregation weights are not learnable. In future work, we will try to introduce adaptive feature learning modules.
