Abstract
Recommender systems recommend products by analyzing the interests and habits of users. To make recommendations more effective, contextual information should be incorporated into recommendation algorithms. In restaurant recommendation, the location and the current time of customers should also be considered, so that restaurants can find potential customers and customers receive accurate and timely recommendations. However, existing recommendation approaches often neglect the influence of time and location. In addition, data sparsity is an inherent problem of collaborative filtering. To address these problems, this paper proposes a recommendation approach that combines contextual information including time, price and location. Instead of constructing the user-restaurant scoring matrix, the proposed approach clusters restaurants by price tag and generates a user-price scoring matrix to alleviate the sparsity of data. An experiment on the Foursquare dataset shows that the proposed approach performs better than traditional ones.
Introduction
The era of information explosion and the development of information technology have increased the importance of recommender systems [1]. Collaborative filtering is the most prevalent technique in recommender systems; it produces high-quality recommendations by identifying similar users based on their logged history of prior transactions [2]. Owing to its effectiveness, collaborative filtering has attracted considerable attention and has been developed and adopted by successful commercial companies such as Amazon and Netflix.
However, collaborative filtering should be revised when it is applied to specific settings. In restaurant recommendation, classical collaborative filtering assumes that customers are likely to go anywhere, whereas in fact they usually have their own range of activities. The willingness of a customer to go to a particular restaurant gradually decreases as the distance increases, in line with Tobler’s first law of geography [3]. Time is also an important factor: customers will not choose a restaurant whose business hours do not match their available time. In addition, people at the same consumption level may prefer the same restaurants, which means that price should also be considered in restaurant recommendation.
The purpose of this paper is to propose an approach that uses contextual information to make more accurate recommendations in the restaurant domain. Specifically, the approach 1) combines the value generated by an appropriate distance-decay function with the user-based collaborative filtering algorithm to account for the geographical effect in recommending restaurants; 2) filters the recommended restaurant list by the overlap between business hours and the current time in order to make just-in-time recommendations; 3) groups restaurants according to their price tags to alleviate the data sparsity in the recommendation.
The related work
Some research on recommender systems has taken contextual information into account. With the support of Location-based Social Networks (LBSNs), online social networks that record users’ check-in data at different locations, considerable effort has been devoted to developing Point-of-Interest (POI) recommendations [7]. In this section, POI recommendation methods are classified into three types according to the additional information used.
The POI recommendation method based on the check-in data
The check-in frequency can indirectly reflect users’ preferences for a location. By treating check-in data as ratings of a commodity, POI recommendation can be carried out with the traditional user-based collaborative filtering [4] and item-based collaborative filtering [5].
However, it should be pointed out that collaborative filtering alone, whether user-based or item-based, cannot handle the data sparsity problem very well. To address this problem, model-based collaborative filtering was proposed, which uses data mining techniques to build the recommendation model. For example, geographical information can be used to make POI recommendations based on the regularized matrix factorization (RPF) [8]. Cheng, Yang and Lyu [9] proposed a POI recommendation algorithm based on the probabilistic matrix factorization (PMF) and the probabilistic factor model (PFM). Compared with the PMF, the PFM has a slight advantage in the experiment on the Gowalla dataset.
The POI recommendation method based on the geographical information
Users’ check-in activity exhibits a spatial clustering phenomenon. Tobler’s first law of geography [3] states that everything is related to everything else, but near things are more related than distant things. Thus, a user intuitively tends to visit nearby locations. Ye et al. [6] modeled users’ check-in activities according to the power law distribution and developed a collaborative recommendation algorithm based on the geographical influence. Yuan, Cong and Ma [10] assumed that the willingness of a user to move from one place to another is influenced by the distance between them. Zhang, Chow and Li [11] used kernel density estimation to personalize the location distribution and model the geographical impact for each user. The article [12] proposed an algorithm that focuses on directly optimizing the ranking loss based on users’ preferences for locations and activities. Besides, by combining the geographical influence with the Bayesian non-negative matrix factorization (BNMF), a novel geographical probabilistic factor analytical framework was presented [13].
The POI recommendation method based on the time factor
The time factor plays a critical role in the location recommendation due to the availability of check-in activities.
Assuming that users visit different locations on different occasions, the user-based location recommendation algorithm was extended by incorporating the time factor when calculating the similarity between users [10], although only a certain period was considered rather than the entire period. Therefore, both the inconsistency (a user’s preferences differ at different times of the day) and the continuity (check-in preferences in consecutive periods of time are related) should be considered [14].
In conclusion, various factors, such as time and location, can influence the preferences of users. Previous research has lacked an integrated analysis of the joint effect of multiple factors. Our approach to supporting restaurant recommendation services in LBSNs is to take these different factors into consideration together. The data sparsity problem is alleviated by clustering the restaurants in the dataset according to their price tags, which also reflect user preference to some degree.
The location-price-based collaborative filtering algorithm
To make recommendations more accurate, the experiments in this paper rely on the following preconditions: (1) The frequency of check-ins can indirectly reflect users’ interest in a location [8]. Specifically, the frequency of check-ins has a positive effect on the level of interest. (2) The users’ location can be fed back to the system via GPS in real time.
To make a clearer statement, this paper introduces a multi-layer model. As shown in Fig. 1, the model has four layers:

The multi-layer restaurant recommendation model.
(1) The Data Acquisition Layer (DAL)
(2) The Reasoning Layer (RL)
(3) The Recommendation Generation Layer (RGL)
(4) The User Interface Layer (UIL)
The purpose of the DAL is to gather, organize and manage the data. Foursquare is an LBSN that allows people to share their location with friends by ’checking-in’ at a given place using their smartphone. The public Foursquare check-in dataset is adopted to verify the proposed algorithm. The dataset used in this article includes 700,000 users, 206,000 venues, and 420,000 check-in records in New York. According to the data sparsity calculation formula, its sparsity is almost 100%.
Therefore, we preprocess the data and exclude the users with fewer than 50 check-in records. Finally, 4,328 records are preserved, covering 51 users and 903 restaurants. All descriptions are shown in Table 1.
The description of the user-restaurant scoring matrix
The relevant data, including the check-in records as well as the longitude and the latitude of each restaurant, are extracted from the Foursquare dataset. The per capita price and the business hours of the restaurants are obtained by combining these data with the Yelp dataset. After that, the contextual data are formed and stored in the database, where they can be invoked by the RL.
The RL mainly consists of two parts: one clusters the contextual data according to the price tag, and the other uses the geographical information in the underlying database to calculate users’ travel probability. After these two steps, the contextual data are transformed into the contextual information.
The price grouping
It can be seen from Table 1 that the dataset used in the experiments is a user-restaurant scoring matrix with a density of $10^{-2}$. The number of restaurants visited by each user is very small compared with the total number of restaurants, which leads to the data sparsity problem and reduces the accuracy of the recommendation result. Therefore, this paper adopts a data preprocessing method that uses the price tag to construct the user-price scoring matrix so as to reduce the sparsity. Restaurants are divided into five classes: low-end restaurants, with a per capita consumption of less than 20 yuan; medium and low-end restaurants, between 20 and 50 yuan; mid-range restaurants, between 50 and 100 yuan; medium and high-end restaurants, between 100 and 200 yuan; and high-end restaurants, more than 200 yuan [19].
In this section, the user-restaurant scoring matrix is transformed into the user-price scoring matrix. Specifically, in the user-restaurant scoring matrix, there are m users $U = \{u_1, u_2, \ldots, u_m\}$ and n restaurants $R = \{r_1, r_2, \ldots, r_n\}$. Each user visits a set of restaurants with a certain frequency, which serves as an explicit rating on a numerical scale. This information is gathered in a user-restaurant scoring matrix $A = (a_{ur})_{m \times n}$, in which $a_{ur}$ represents the frequency with which user u visits restaurant r. Meanwhile, supposing that $T = \{t_1, t_2, \ldots, t_p\}$ is the set of restaurant price tags, each restaurant belongs to a non-empty subset of T. The restaurant-price matrix with n restaurants and p price tags is denoted as $B = (b_{rt})_{n \times p}$, and its elements $b_{rt}$ are defined as follows:
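One reading that is consistent with the dimensions given above, offered here as an assumption rather than as the paper's exact definition, treats $b_{rt}$ as a membership indicator and obtains the user-price scoring matrix $C = (c_{ut})_{m \times p}$ by multiplying the two matrices:

```latex
% Assumed form: binary tag membership and aggregation of visit frequencies.
b_{rt} =
\begin{cases}
1, & \text{if restaurant } r \text{ carries price tag } t,\\
0, & \text{otherwise,}
\end{cases}
\qquad
C = AB, \quad c_{ut} = \sum_{r=1}^{n} a_{ur}\, b_{rt}.
```

Under this reading, $c_{ut}$ is the total number of visits of user u to restaurants with price tag t.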
Afterwards, it is necessary to normalize the user-price scoring matrix C because users differ significantly in their visiting frequency. We can map the ratings $c_{ut}$ into the interval [0, 1] with the following formula:
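The price grouping and the normalization step can be sketched as follows. This is a minimal illustration under stated assumptions: the handling of boundary prices (for example exactly 20 yuan), the per-user min-max scaling, and the helper names `price_tag`, `user_price_matrix` and `normalize_rows` are all hypothetical choices, and the toy data are invented.

```python
from bisect import bisect_right

# Class boundaries (yuan per capita) taken from the five price classes in the text.
# Assigning a boundary value such as exactly 20 to the higher class is an assumption.
PRICE_BOUNDS = [20, 50, 100, 200]


def price_tag(per_capita_price):
    """Return a tag index 0..4 (low-end .. high-end) for a per capita price."""
    return bisect_right(PRICE_BOUNDS, per_capita_price)


def user_price_matrix(A, prices, n_tags=5):
    """Aggregate visit frequencies A (users x restaurants) into users x price tags."""
    C = [[0] * n_tags for _ in A]
    for u, row in enumerate(A):
        for r, freq in enumerate(row):
            C[u][price_tag(prices[r])] += freq
    return C


def normalize_rows(C):
    """Min-max scale each user's row into [0, 1] (assumed form); constant rows map to 0."""
    out = []
    for row in C:
        lo, hi = min(row), max(row)
        span = hi - lo
        out.append([(c - lo) / span if span else 0.0 for c in row])
    return out


# Toy data: 2 users, 4 restaurants (values invented for illustration).
A = [[3, 0, 0, 1],
     [0, 2, 0, 0]]
prices = [15, 35, 35, 250]

C = user_price_matrix(A, prices)
C_norm = normalize_rows(C)
```

Scaling per user rather than over the whole matrix is one way to compensate for the differences in visiting frequency mentioned above; a global scaling would be an equally simple alternative.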
After this, the sparsity of the scoring matrix can be reduced. For the Foursquare dataset, the density of the scoring matrix before and after processing is compared in Fig. 2. The density of the dataset before processing is 0.09398, and it decreases to 0.0549, a variation of 41.6%.

The comparison of the density between the original data and the processed data.
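The reported figures can be checked with a few lines of arithmetic. The sketch below assumes that each of the 4,328 preserved records occupies a distinct user-restaurant cell; the function names are illustrative.

```python
def matrix_density(n_records, n_users, n_items):
    """Share of user-item cells that hold a record, assuming one record per cell."""
    return n_records / (n_users * n_items)


def relative_change(before, after):
    """Relative variation between two values, expressed as a fraction of `before`."""
    return (before - after) / before


# Counts reported for the preprocessed dataset: 4,328 records, 51 users, 903 restaurants.
d = matrix_density(4328, 51, 903)

# Variation between the two densities reported in the text.
change = relative_change(0.09398, 0.0549)
```

With these inputs, `d` rounds to 0.09398 and `change` to 0.416, which agrees with the values quoted above.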
In this paper, the Rayleigh distribution (the second-order Weibull distribution) is used to calculate the travel probability of users. It is a travel distance probability distribution function of urban residents proposed by Shi and Lu [18]. The travel probability first increases and then decreases with the distance, which is similar to the actual distribution characteristics of travel distance. The travel probability p (u, r) of user u to restaurant r is given as follows:

The curve fitting.
As can be seen from Table 2, the goodness of fit is 0.7637, which means that the dependent variable is reasonably well explained by the independent variable. The root mean square error is 0.02799, which means that the deviation between the fitted values and the observed values is small.
The fitting result
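As an illustration of the distance-decay idea, the sketch below evaluates the standard Rayleigh density over the great-circle distance between a user and a restaurant. The scale parameter `sigma`, the use of the haversine distance and the coordinates are assumptions for this example; the fitted parameters of the paper are not reproduced here.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rayleigh_density(d, sigma):
    """Rayleigh density f(d) = d / sigma^2 * exp(-d^2 / (2 sigma^2)) for d >= 0."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return d / sigma ** 2 * math.exp(-d * d / (2 * sigma ** 2))


# Hypothetical user and restaurant coordinates; sigma = 2 km is an arbitrary choice.
distance = haversine_km(40.7580, -73.9855, 40.7484, -73.9857)
p_travel = rayleigh_density(distance, sigma=2.0)
```

The density is zero at distance zero, peaks at `d = sigma` and decays afterwards, which reproduces the rise-then-fall shape described for the travel probability.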
The RGL is the core of the TLP-CF algorithm. As shown in Fig. 4, the information inferred by the RL, including the user-price scoring matrix and the travel probability, is sent to the processor. The processor uses the data to run the user-based collaborative filtering algorithm and obtain the ratings of candidate price tags, which are combined with the travel probability to generate a list of pre-selected restaurants. Meanwhile, it sends a request with the current time to the database. The database responds by finding the restaurants that are within their business hours and sends the result back to the processor. After the pre-selected restaurants that are not in the result set have been filtered out, the list of restaurant recommendations is generated.

The recommendation algorithm flow chart.
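The time-based filtering step in this flow can be sketched as follows. The data layout (a dictionary mapping restaurant identifiers to opening and closing times) and the function names are assumptions for illustration, not the system's actual database interface.

```python
from datetime import time


def is_open(opening, closing, now):
    """True if `now` lies in [opening, closing); supports hours that pass midnight."""
    if opening <= closing:
        return opening <= now < closing
    # Closing time is on the next day, e.g. 18:00 to 02:00.
    return now >= opening or now < closing


def filter_open(candidates, hours, now):
    """Keep candidate ids whose business hours include `now`, preserving their order."""
    return [r for r in candidates if r in hours and is_open(*hours[r], now)]


# Hypothetical business hours for three restaurants.
hours = {
    "r1": (time(11, 0), time(22, 0)),
    "r2": (time(18, 0), time(2, 0)),
    "r3": (time(7, 0), time(10, 30)),
}

open_at_noon = filter_open(["r3", "r1", "r2"], hours, time(12, 30))
open_at_night = filter_open(["r3", "r1", "r2"], hours, time(23, 0))
```

Because the filter preserves the order of the candidate list, it can be applied after ranking without changing the relative position of the remaining restaurants.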
Generally, the computation of the predicted rating $r_{ut}$ of user u for price tag t is the weighted sum of other users’ ratings [4]:
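A common form of this weighted sum is sketched below. Cosine similarity, the neighbourhood size `k` and normalization by the sum of absolute similarities are assumptions of this sketch; the exact similarity measure of the paper may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two rating vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_tag_ratings(C, u, k=5):
    """Predict user u's rating for every price tag from the k most similar users."""
    sims = [(cosine(C[u], C[v]), v) for v in range(len(C)) if v != u]
    neighbours = sorted(sims, reverse=True)[:k]
    total = sum(abs(s) for s, _ in neighbours)
    if total == 0:
        return [0.0] * len(C[u])
    n_tags = len(C[u])
    return [sum(s * C[v][t] for s, v in neighbours) / total for t in range(n_tags)]


# Toy normalized user-price matrix: 3 users, 3 price tags (invented values).
C = [[1.0, 0.5, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]

scores = predict_tag_ratings(C, 0, k=2)
```

In the toy data, user 0 resembles user 1 and shares nothing with user 2, so the prediction is driven entirely by user 1's ratings.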
However, this correlation metric does not achieve satisfactory results in restaurant recommendation, because it only considers the similarity of price preferences between users and neglects the geographical influence. Therefore, this paper proposes a rating calculation method that takes into account users’ check-in information as well as the distance between restaurants and users. The corrected rating $cr_{ur}$ of user u for restaurant r can be calculated as follows:
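One simple way to combine the two signals is sketched below: each restaurant is scored by the predicted rating of its price tag multiplied by the travel probability. The multiplicative form is an assumption made for illustration and should not be read as the paper's formula; all names and values are hypothetical.

```python
def corrected_ratings(tag_scores, restaurant_tag, travel_prob):
    """Score each restaurant as (predicted rating of its price tag) x (travel probability)."""
    return {r: tag_scores[restaurant_tag[r]] * p for r, p in travel_prob.items()}


def top_k(scores, k):
    """Return the k restaurant ids with the highest scores; ties are broken by id."""
    return sorted(scores, key=lambda r: (-scores[r], r))[:k]


# Invented inputs: predicted tag ratings for one user, each restaurant's tag,
# and the user's travel probability to each restaurant.
tag_scores = [0.2, 0.9, 0.4]
restaurant_tag = {"a": 1, "b": 1, "c": 2}
travel_prob = {"a": 0.1, "b": 0.5, "c": 0.5}

scores = corrected_ratings(tag_scores, restaurant_tag, travel_prob)
preselected = top_k(scores, 2)
```

Restaurants "a" and "b" share the preferred price tag, yet "a" drops out of the list because it is unlikely to be reached, which is the geographical effect the corrected rating is meant to capture.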
The UIL is the top layer of the restaurant recommendation model. This layer is designed for the system to interact with users. It consists of two parts:
(1) User interface controls are visual components, such as buttons and menus, which can fetch data from users and display results to them.
(2) User interface components are non-visual components. They receive data from the lower layers and send commands to them (for example, search actions).
Combining these controls and components provides a smooth experience when using the recommendation system. The UIL is mainly responsible for displaying the user interface in the client’s browser, and it does not include any logic processing. In the UIL, many methods can be adopted to visually display the results transmitted by the RL; the presentation is implemented mainly with HTML and CSS. At the same time, the UIL is also responsible for obtaining the data entered by the user through the interface. After validating the relevant data, it transmits these data, together with the data stored in the underlying DAL, to the RL.
The overview of model operation
The model works as follows. The restaurant recommendation algorithm recommends restaurants by integrating the contextual data in the DAL. These data are sent to the RL and converted into the contextual information. Then, the contextual information is passed to the RGL, in which the processor sends queries to the database to get the restaurants that are available. After that, the TLP-CF algorithm is executed with the contextual information from the RL and the list of available restaurants from the database, and it generates the final list of restaurant recommendations.
Eventually, the results are delivered to the UIL for display. Our research and findings complement a recent study [16], which combines the business hours of the restaurant, the activity duration and the personal preference to make recommendations. However, in that study the calculation of the preference only takes the type of restaurant into account, which means that only similar restaurants will be recommended. The potential needs of users cannot be recognized, which leads to the long tail effect and may go against market development. Besides, the lack of data preprocessing makes the sparsity problem serious. Our proposal may help to fix this by clustering restaurants according to the price tag, an important factor when choosing a restaurant. The experimental results in Section 4 show that the proposed recommendation approach can effectively recommend the most appropriate restaurants to users and improve on the performance of the traditional user-based collaborative filtering.
More importantly, most of the research that incorporates the time factor into the recommendation process uses a user-time or time-category scoring matrix [17], which increases the sparsity of the data because meal times are concentrated in a few periods of the day, usually at noon or at night. We suggest filtering the list of restaurants according to the current time, rather than constructing a time matrix, to reduce the sparsity of the data.
The experiment
In this section, we present experiments that demonstrate the advantages of the TLP-CF algorithm.
The dataset
As mentioned above, this paper preprocesses the Foursquare dataset by excluding users with fewer than 50 check-in records, which leaves 51 users, 903 restaurants and 4,328 records in total. Finally, information such as the per capita price and the business hours of each restaurant is added by combining the data with the Yelp dataset.
Benchmarks and experimental settings
We report the experimental results on recall, precision and F-measure for our recommendation task. For recall, we show the performance where the value of K is 5, 10 and 20. To verify the effectiveness of our model, we compare it with the following state-of-the-art recommendation methods: UBCF [4] refers to the user-based collaborative filtering, which is a primary collaborative filtering method that does not include enhancing influences. LARS [15] refers to a location-aware collaborative filtering recommendation approach which takes the distance penalty into account. PBSAA [16] refers to a person-based spatiotemporal accessibility analysis method, which measures individual access to urban opportunities in space and time. TLP-CF refers to our method. The dataset was divided into two parts in the experiment. One part is used as the training set to extract users’ preferences, and the other part is used as the test set to evaluate the restaurant recommendations. The training set contains 90% of the initial dataset (3,896 ratings), while the test set contains the remaining 10%.
The evaluation metrics for the recommendation
Specifically, the recommendation algorithm calculates a prediction score for each restaurant for every user and then returns the K candidate restaurants with the highest prediction scores as the recommendation result. Therefore, this paper evaluates the algorithm with the precision, recall and F1-measure, which measure how closely the recommendations match users’ actual visits. The precision is the proportion of restaurants in the recommendation list that are actually visited by the user in the test set; the larger the value, the better the results. The recall is the proportion of restaurants actually visited by the user in the test set that appear in the recommendation list; it is also positively correlated with recommendation performance. The precision and recall sometimes conflict, so they need to be considered together. The most common combined measure is the F1-measure, a comprehensive evaluation metric based on the precision and the recall; the larger the F1-measure, the better the recommendation results. The precision $P_u(L)$ and recall $R_u(L)$ of user u are calculated as follows [22]:
However, the precision and recall are negatively correlated and depend on the length of the recommended list. When the system does not have a fixed recommended list length, a two-dimensional vector containing the precision and the recall is needed to reflect the performance of the system. The F1-measure was proposed to combine them and is defined as [21]:
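The three metrics can be sketched with their standard set-based definitions: precision is the number of hits divided by the length of the list, recall is the number of hits divided by the number of visited restaurants, and the F1-measure is their harmonic mean. Function names and toy data are illustrative.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(recommended, visited):
    """Precision, recall and F1 of one user's recommendation list against the test set."""
    rec, vis = set(recommended), set(visited)
    hits = len(rec & vis)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(vis) if vis else 0.0
    return precision, recall, f1_measure(precision, recall)


# Toy example: a top-5 list with one hit among three visited restaurants.
p, r, f = precision_recall_f1(["a", "b", "c", "d", "e"], ["a", "x", "y"])

# F1 values implied by the precision and recall reported in the results discussion.
f_at_5 = f1_measure(0.1, 0.038502)
f_at_20 = f1_measure(0.05833, 0.0999)
```

Rounded to four decimals, `f_at_5` and `f_at_20` give 0.0556 and 0.0737, the same F1 values that the results discussion reports for the proposed algorithm.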
In this section, the algorithm proposed in this paper is compared with two other existing algorithms, namely the user-based collaborative filtering (UBCF) [5] and a location-aware collaborative filtering recommendation approach (LARS) in which the distance penalty is considered. The calculation formula of the scoring
The recommendation result for user u is affected by two factors, namely the number of neighbors for calculating similarity and the number of recommended restaurants. Figs. 5, 6, and 7 present the performance of the three recommendation algorithms when the number of neighbors is 5 and the number of recommended restaurants is 5, 10, 15, 20, respectively.

The comparison of the precisions.

The comparison of the recalls.

The comparison of the F1-measure.
According to Fig. 5, the proposed algorithm reaches its highest precision, 0.1, when the number of recommended restaurants is 5. As the top-K value increases, the precision gradually decreases to 0.05833. A decrease in precision means that a smaller proportion of the recommended restaurants are actually visited, that is, a decrease in recommendation performance; in terms of precision, K = 5 is therefore the best choice. Fig. 6 reveals that the trend of the recall is opposite to that of the precision: as the number of recommended restaurants increases, the recall rises from 0.038502 to 0.0999. This indicates that the precision and the recall constrain each other. To measure the recommendation more accurately, the F1-measure, which combines the two metrics, is used. As shown in Fig. 7, the F1-measure of the proposed algorithm rises from 0.0556 to 0.0737.
Based on the above analysis, we can conclude that the proposed algorithm is superior to the other two algorithms in all metrics. More specifically, when the number of recommended restaurants is 5, the precision of the TLP-CF algorithm is 83.25% higher and the recall is 10% higher than those of LARS. This suggests that the distance-decay function is better suited to modeling the geographical influence. The comparison between TLP-CF and UBCF illustrates that combining the contextual information can effectively improve the accuracy of the recommendation.
In this paper, an approach to restaurant recommendation has been proposed based on the contextual information of users. The user-based collaborative filtering is combined with the distance-decay function to account for the geographical influence on the choice of restaurant. As for the impact of time, the algorithm filters out restaurants that are outside their business hours to improve the performance of recommendations. To alleviate the data sparsity of collaborative filtering, restaurants are grouped by their price tags. Experiments on the Foursquare dataset show that the algorithm can effectively deal with the data sparsity and performs better than the traditional algorithms. However, the proposed method does not consider the different genre attributes of the places where restaurants are located. Future research will examine the semantic features of the geographical location, which can be classified to improve the performance of the recommendation.
Acknowledgments
This paper was supported by the National Social Science Foundation of China (No. 18ZDA086), the National Natural Science Foundation of China (No. 71661167009) and the Beijing Natural Science Foundation (No. M21025).
