Abstract
When a person visits an unfamiliar large city with many interesting locations, it is not easy to find a location that is lively and convenient to visit in a given time-frame. To overcome this problem, this paper proposes to make use of two technologies: smartphones, equipped with sensors for reading GPS coordinates; and multi-agent systems, providing assistance to users and gathering collective knowledge. Data collected by means of devices are analysed and organised so as to find locations that could be of immediate interest to people. The proposed agents gather opinions from several users, in terms of scores quantifying the level of satisfaction with visiting some place in a given time-frame. While such opinions are gathered, a solution is put in place to preserve user privacy (i.e. the user location). Suggestions are made to potentially interested users by selecting locations for them according to closeness and satisfaction scores. In this approach, interesting locations emerge from the analysis of the gathered data, hence scores and suggestions can be available for any large city, provided that enough people hand data to the system. Moreover, such places are found dynamically according to people's behaviour and preferences.
Introduction
Current recommendation systems for tourist attractions or services available in a city are generally maintained manually, e.g. a city council releases a list of public services, or a publisher gives the top spots, or points of interest (POIs), for a place or region [2]. A static list of POIs can fall short of the support that people expect, since for a large city or region there could be hundreds of points, each with its own specific offer, and each point could vary over time, sometimes with a high frequency. Manually curating such a list is cumbersome, and in some cases the result is still only a surrogate of an up-to-date list. Moreover, each manually identified POI is usually not associated with an indication of the most advantageous time-frame for visitors, nor with the real-time conditions of the spot [2, 15].
Conversely, thanks to the availability of mobile devices, users can provide their comments on places they visit in real time. Such comments can be collected and be useful for other people seeking advice. Therefore, the proposed approach makes use of an agent-based solution to gather data on locations and the related user satisfaction, in order to propose to a person the next place to visit according to the current location, and hence the time needed to reach the destination. The proposed solution composes a custom list of POIs for each user according to the user position and the trajectories of many users, by selecting real-time data, gathered on a server, about user experiences at some locations. While users provide their comments and rate a place, their precise location and identity are protected, as they are only kept on the user smartphone, whereas peer agents and the server do not handle such data.
For extracting interesting POIs, especially when people have not given their comments yet, the DBSCAN algorithm was used, which automates the aggregation of data related to locations that people have visited [13]. Moreover, flows of people are discovered automatically, and these are useful, e.g. to find common itineraries, to suggest reachable spots to users, or to suggest improvements to the public infrastructure [25].
The main contributions of this paper are: (i) automating the extraction of the list of places of interest; (ii) updating such a list in real time; (iii) characterising places of interest with time slots; (iv) catering for user comments on places of interest; (v) selecting comments and places of interest according to user preference and position; (vi) preserving privacy for the user.
The proposed approach was validated by analysing location data to elicit the meaningful places, starting from available data on taxi, truck, and people movements. The resulting POIs have been manually validated, and the results have shown that the automatic extraction worked well. This work builds on our previous one [8] by examining more datasets, analysing the experimental results, and comparing them.
The rest of the paper is organised as follows. The next section introduces and compares the related works. Section 3 describes the software architecture of the proposed multi-agent system. Section 4 discusses the solution for extracting POIs from the available data. Section 5 introduces the solution for detecting flows of people and trajectories useful for suggestions. Section 6 reports the experiments and results on the available datasets. Finally, Section 7 draws some conclusions.
Related work
The proposed approach has some similarities to several types of recommendation systems. Recommendation systems can be grouped into three main categories [5]: Collaborative Filtering (CF) [31], Content-Based filtering (CB) [24], and Hybrid Filtering (HF). Collaborative filtering lets users give ratings about a set of elements, so that when enough information is stored in the system, it is possible to make recommendations to each user based on information provided by those users that have the most in common with them. E.g., if Bob and Alan have seen the same horror films, one of the films of the same category seen by Bob is suggested to Alan and vice versa. Content-based filtering makes recommendations based on user choices made in the past. E.g., for a person who likes a carbonated drink such as cola, the system recommends similar soda drinks. The hybrid technique is a mixture of the first two and suggestions are given according to the experiences and choices of other users, falling back into the CF systems. Such experiences are documented by agents observing user behaviour in real time.
In the literature, several studies use CF to suggest itineraries or POIs. In [37], itineraries were recommended on the basis of multiple user-generated GPS trajectories. In [15], time-sensitive trip routes are proposed, which consist of a sequence of locations with associated timestamps. In [16], a user-based ranking for visited places is the basis to offer a recommended itinerary. In [11], data on Foursquare were used to find clusters considering both spatial and social proximity, and the results were useful to characterise the number of people and their flows in portions of a city. In [30], photos on Flickr were analysed to suggest routes that can be pleasant, beautiful or quiet, according to the geo-location of photos and user comments. In [9], the services offered by a city are investigated from the point of view of locality. Moreover, in [4], data gathered from both Flickr and Foursquare were used to identify POIs in Milan (Italy).
Compared to the proposed approach, such systems have the following shortcomings: i) the points closest to the user are not identified; ii) data are not updated in real-time; iii) POIs are not automatically determined. This work includes the automatic calculation of POIs, starting from a set of trajectories. Furthermore, while other works offer a complete itinerary, urging the user to choose an origin and a destination, in this proposal, a user can immediately see the closest recommended points, and can dynamically change the next point of the itinerary according to the current needs. That is, it allows users to more effectively search through travel information and arrange their trip.
In [1, 25], the identification of relevant places is based on data provided by a telecom operator that had recorded events such as calls or text messages. This yields a statistic on the number of people near some place of interest (e.g. a supermarket). For these approaches, data are only available to telecom operators or very popular internet service providers. Instead, the analysis proposed in this paper can be performed on data gathered from user devices, hence without the external support given by big telecom operators. Moreover, our approach suggests locations representing meaningful places for people, rather than just popular places.
In [17], places of interest are gathered according to an analysis of geotagged photos, and by clustering data according to DBSCAN. Such a work differs from the proposed approach in that it does not consider: time slots for places of interest, people comments on such places, and data updated in real time.
In [22], places of interest are determined by using clustering techniques on the geolocations of people. Our work brings additional contributions in terms of privacy for the users, while also providing users with real time comments and the classification of useful time slots for the selected places.
CF recommendation systems have been based on multi-agent systems, and agent-based recommender systems have been proposed in several scenarios [3, 34]. The authors in [33, 34] proposed a system that produces recommendations for both individuals and groups. In [3], a system based on multi-agent technology was proposed to list the recommendations on personalised tour attraction, highlighting the usefulness of finding points near the user. Similarly, [10] described an application to better plan travel decisions based on a multi-agent system. In [20], a multi-agent system allows users to optimise the energy consumption of their smart homes. Each electrical device is configured as a virtual agent. These agents work simultaneously and together to reduce consumption while ensuring user comfort, energy costs and maximum energy savings.
In the literature, user privacy is a much discussed topic. The unconditional use of smartphones and the many downloaded applications put a strain on the protection of user rights. Existing systems have mainly taken three approaches to improving user privacy: (a) introducing uncertainty or error into location data [14, 26], (b) relying on trusted servers or intermediaries to apply anonymisation [19, 36], and (c) using cryptography techniques [27, 29]. However, each approach has weaknesses: the first loses accuracy because the uncertainty degrades the data; the second can be risky, since private data are exposed to proxies that could be compromised; and the third is often computationally expensive. Hence, hybrid approaches are often a proper compromise. This work offers a hybrid system that preserves privacy in two ways: i) by using a centralised server, so that agents do not directly know each other and cannot exchange information (as for systems (b) above); ii) by extracting only information on POIs and an approximate position of the user, and only when the user expresses an opinion on a POI (as for systems (a) above). In addition, to the best of our knowledge, this is the first approach in the tourism sector that deals with preserving user privacy.
Table 1 summarises the differences between the approach described in this paper and other existing works in location recommendation. Note that only the proposed approach makes use of gathered data to compute points of interest, which therefore emerge and change while users interact with the provided system; additionally, several precautions have been taken to ensure the anonymity of users and of their location.
Differences between this approach and other existing works suggesting itineraries or popular destinations
In this approach, an agent is an app installed on an Android device that tracks the user movements around Points Of Interest (POIs), in order to suggest new places to visit [2]. A Multi-Agent System (MAS) is formed by a network of computational agents that, directly or indirectly, interact with each other [12]. In this sense, in this work, apps on smartphones are agents that indirectly communicate among themselves and take some decisions on behalf of the user. That is, the agent determines: (i) when geolocation data are sent, (ii) which displacement is added to geolocation data to preserve user privacy, (iii) how to rank incoming data. For the latter, since the agent shields the user preferences from the outside world, the agent is entrusted with the task of muting alerts when the user has low interest in them or they are not related to the actual flow of the user, and with ranking incoming comments and selected places of interest so as to match user preferences. Each agent sends data to a centralised server, and is identified by an app id that the server has provided. User identity is protected since: the server does not disclose such ids to the agents; agents cannot directly communicate with each other; and the user does not provide personal data.
In an initial phase, each agent receives from a known server a set of POIs, computed by means of the DBSCAN algorithm [13] (as described in the following Section 4). Starting from such a list, each agent collects information about the user activities on the device running it, i.e. visited POIs and potential user comments on them.
Data collected are processed and cleaned up on the user device, in order to preserve anonymity, and then the processed data are sent to the server. In detail, each agent analyses only user data located in the vicinity (about one kilometre) of known points of interest, and adds a displacement to the data before making them available to the server. In this way, it preserves the user's privacy in two ways: 1) the user's position is not always shared with the server; 2) when the position is shared, it is affected by a variable, uncertain displacement.
The server analyses received data to offer to other users detailed and updated information on POIs, suggesting POIs recently visited and appreciated. Points are computed by an algorithm that: (i) analyses all data coming from several mobile devices as skimmed by agents; (ii) computes POIs that are well-known and constantly visited by users, therefore creating the recommendation system for the users. To suggest POIs, the server considers: the list of points of interest visited by other users, the time slots of visits and a score assigned by users as soon as their visit to the point of interest is over.
Agents take the suggestions from the server and filter them, as the exact position of the user is known only to the local agent. The suggested points are sorted according to the user's current position, and filtered for the time slot matching the current time. Moreover, points not yet visited by the user are marked. As for the time slots, the hours in a day have been organised into six 4-hour slots: slot1 = [00:00:00, 03:59:59], slot2 = [04:00:00, 07:59:59], slot3 = [08:00:00, 11:59:59], etc. Therefore, data gathered for a POI are associated with a time slot based on the start time of the visit. E.g., if the visit starts at 10 a.m., this point will be labelled with the 3rd time slot.
Figure 1 shows the overview of the recommendation system; each agent, Alice and Bob, collects information related to the POIs visited by the user. For each POI visited, the data forwarded to the server are: the inaccurate GPS position of the device, the visiting hours (through a timestamp), the name of the place, and an evaluation score of the place (i.e. a number from 0 to 5). To forward such data, the user needs to authorise the app to share the GPS position.
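The slot assignment can be sketched as follows. This is a minimal illustration, assuming 1-based slot numbering as in the text; the function name `time_slot` is hypothetical and not taken from the described app.

```python
from datetime import datetime

SLOT_HOURS = 4  # six 4-hour slots per day, as described in the text


def time_slot(visit_start: datetime) -> int:
    """Return the 1-based slot index (1..6) for the start time of a visit."""
    return visit_start.hour // SLOT_HOURS + 1


if __name__ == "__main__":
    # A visit starting at 17:30 falls in [16:00:00, 19:59:59].
    print(time_slot(datetime(2010, 11, 1, 17, 30)))
```

Only the start time of the visit is used, so a visit that spans two slots is attributed to the slot in which it began.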

Recommendation system consisting of agents Alice and Bob gathering opinions, a server, and agent Claire receiving a recommendation.
After receiving feedback, the server processes it, creating for each point a list containing: the time slots preferred by the users, an approximation of the number of people (that is, agents) visiting the site at that time, and the user feedback as the computed average score. Such data are used to drive the recommendation system.
Therefore, the recommendation system consists of giving each user the most significant POIs for him, based on the user previous experiences. Finally, these points are sorted according to the estimated user distances. E.g. if there is a POI near him in which the number of people is adequate (i.e. according to the preferences of the user), the time slot matches the current time, and users have appreciated the visit (having assigned a positive score), then such a POI is suggested to the user by alerting agent Claire (see Fig. 1). Otherwise, if the place had not been appreciated by other users or the current time is not within the time slot referred by available data, then the place is labelled as not recommendable.
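The server-side bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout of a feedback record, the function names, and the score threshold `min_score` used to decide whether a visit was "appreciated" are all hypothetical, since the text does not specify them.

```python
from collections import defaultdict
from statistics import mean


def aggregate(feedback):
    """Summarise feedback per (POI, time slot).

    feedback: iterable of (poi_name, slot, app_id, score) tuples.
    Returns {(poi_name, slot): {"visitors": n, "avg_score": s}}.
    """
    scores = defaultdict(list)
    agents = defaultdict(set)
    for poi, slot, app_id, score in feedback:
        scores[(poi, slot)].append(score)
        agents[(poi, slot)].add(app_id)  # count each agent once per POI and slot
    return {
        key: {"visitors": len(agents[key]), "avg_score": mean(scores[key])}
        for key in scores
    }


def recommendable(stats, poi, current_slot, min_score=3.0):
    """A POI is suggested only if data exist for the current slot and the
    average score reaches min_score (an assumed threshold)."""
    entry = stats.get((poi, current_slot))
    return entry is not None and entry["avg_score"] >= min_score
```

Sorting by distance is deliberately left out of this sketch, because in the described architecture the exact user position is known only to the local agent.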
Each agent observes the user behaviour and visited areas, and offers recommendations based on current GPS positions and the experiences gained by other agents (see Fig. 2). The most important properties of the agents in the proposed recommendation system are the following. Agents are independent: if an agent stops working, the others continue their work without consequences. Agents are mobile: they are easily transportable thanks to their integration in the mobile application. Agents are reliable: the GPS coordinates are taken directly from the device, which avoids obtaining false data. Agents work by preserving user privacy: there is no continuous sending of positions; instead, the server only receives data when agents are near POIs, and the coordinates sent to the server are displaced by a bounded random amount. More specifically, two important steps are taken to obtain privacy: (i) users share their location only if they are close to a well-known POI, i.e. their position falls within 300 m of the POI; (ii) the position is processed through data masking techniques, i.e. a variable and unknown displacement is added before the position is sent to the central server storing it. Altering the position does not affect the recommendation system and guarantees better protection for the user.
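The two privacy steps can be sketched as follows. This is a minimal illustration, not the app's actual code: the 300 m radius comes from the text, the 100 m bound is the maximum displacement stated in the description of trajectory reading, and sampling the offset uniformly over a disc is an assumption. The function names are hypothetical.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def displace(lat, lon, max_offset_m=100.0, rng=random):
    """Shift a position by a random bearing and a random distance <= max_offset_m."""
    distance = max_offset_m * math.sqrt(rng.random())  # uniform over the disc (assumption)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_M
    dlon = distance * math.sin(bearing) / (EARTH_RADIUS_M * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def share_position(lat, lon, pois, radius_m=300.0, rng=random):
    """Return a displaced position if the user is near some POI, otherwise None."""
    for poi_lat, poi_lon in pois:
        if haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m:
            return displace(lat, lon, rng=rng)
    return None  # nothing is sent to the server
```

Because the offset is bounded, the displaced position still falls close enough to the POI for the server-side statistics, which is consistent with the remark that altering the position does not affect the recommendation.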

A representation of the dialog windows on the proposed app, showing the list of points of interest, and recent comments from users.
In this context, security and robustness are features embedded in all the agents, which work together by means of a centralised server. Thus, thanks to multiple agents and a server, it was possible to create a recommendation system based on the experience and appreciation of users, able to suggest points of interest to users while ensuring user privacy.
An extensive series of experiments was performed in order to study the movements of users in different situations.
Firstly, a method to extract StayPoints (SPs) is described, and then a clustering algorithm is used to find important places, called POIs, as it can be easily seen that places that are of interest attract several persons at the same time. The experiments confirm that users stop in the same areas for some common reasons, such as visiting a tourist attraction or taking advantage of the same service and remain in certain areas in common time slots. Additionally, these tests reveal that people move together from one POI to another.
The first step of the proposed solution concerns the identification of SPs. If a region is detected in which a user stays for at least the Time Threshold (TimeThr) within a maximum spatial extent given by the Distance Threshold (DistThr), then the centroid of the set of registered points in that region represents an SP for the user and the trajectory in question (see the pseudo-code in Fig. 3 for finding SPs).

Pseudo-code for SPs detection.
A GPS trajectory consists of a list of points $\{p_0 = (x_0, y_0, t_0), p_1 = (x_1, y_1, t_1), \ldots, p_n = (x_n, y_n, t_n)\}$, where $\forall i \in [0, n]$, $p_i = (x_i, y_i, t_i)$, with $t_i < t_{i+1}$ for $i < n$, and $x_i$, $y_i$ and $t_i$ represent longitude, latitude, and timestamp, respectively.
For reading trajectories, users are asked to confirm their opt-in; then coordinates are read at regular intervals, i.e. every five minutes, on the user device. A variable displacement with a maximum value of 100 metres is added to the coordinates before they are sent to the server. The server collects a list of points and timestamps for the app id, to perform the analysis detailed in the following.
The algorithm for SP detection analyses each trajectory individually, scanning its points in the order of registration. For each point $p_i$, the pairs formed with its successors $p_j$ are considered, and for each pair the spatial distance between the two points (line 9 of the pseudo-code in Fig. 3) and the time span between them (line 12) are calculated. All the points that are located in an area bounded by the DistThr distance threshold, and for which the time span is greater than TimeThr, form an SP region. The average of their coordinates (line 15) determines the centre of the area, and represents a StayPoint (SP) of the analysed trajectory. This captures the behaviour of a user who is probably going around some building for a while: the aim is to study mainly the sections of the path in which the user dwells for some reason within a limited area. The arrival time, i.e. the time relative to point $p_i$, and the leaving time, i.e. the time relative to point $p_j$, are also extracted (lines 16 and 17).
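The SP detection step can be sketched in Python as follows. This is an illustration written from the prose description, not a transcription of the pseudo-code in Fig. 3: here a window collects the successors that stay within DistThr of the anchor point, and the leaving time is taken from the last point of that window, which is an assumption. Timestamps are assumed to be in seconds and the default thresholds are only examples.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def haversine_m(p, q):
    """Distance in metres between two trajectory points (lon, lat, t) in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def detect_stay_points(traj, dist_thr_m=200.0, time_thr_s=300.0):
    """Return a list of (lon, lat, arrival_time, leaving_time) stay points.

    traj: list of (lon, lat, t) tuples sorted by increasing t.
    """
    stay_points = []
    i, n = 0, len(traj)
    while i < n:
        j = i + 1
        # Extend the window while successors stay within dist_thr_m of p_i.
        while j < n and haversine_m(traj[i], traj[j]) <= dist_thr_m:
            j += 1
        last = j - 1
        if traj[last][2] - traj[i][2] >= time_thr_s:
            window = traj[i:j]
            lon = sum(p[0] for p in window) / len(window)  # centroid of the region
            lat = sum(p[1] for p in window) / len(window)
            stay_points.append((lon, lat, traj[i][2], traj[last][2]))
            i = j  # resume after the stay region
        else:
            i += 1
    return stay_points
```

Anchoring the distance test on the first point of the window keeps the scan linear in practice; a stricter variant could bound the diameter of the whole window instead.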
POIs, which are places that are expected to offer an interest or utility for one or more users, are obtained from SPs: a POI is the centre of an SP area when the user lingers for a considerable time around it, which is an attraction or site providing services.
DBSCAN, which is the acronym for "Density-based spatial clustering of applications with noise", discovers clusters as dense areas in space, surrounded by sparser areas [13, 32]. This clustering algorithm is appropriate for geospatial data because it forms clusters of GPS points as areas of high density separated by areas of low density. Points in the sparse areas are usually considered noise for this analysis.
DBSCAN has two parameters, min_points and ɛ, which formally define density. High values for min_points or low values for ɛ determine a high density for a cluster. A core point is a point in the dataset such that there exist min_points other points within distance ɛ, which are defined as neighbours of the core point. Parameter ɛ is to be chosen appropriately for the dataset and distance function.
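For the geospatial dataset the Haversine formula was employed: the distance $d(p_i, p_j)$ between point $p_i$ and point $p_j$, when their coordinates are given by latitude ($lt$) and longitude ($lg$), is defined as follows:

$$d(p_i, p_j) = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{lt_j - lt_i}{2}\right) + \cos(lt_i)\cos(lt_j)\sin^2\left(\frac{lg_j - lg_i}{2}\right)}\right)$$

where $R$ is the radius of the Earth and angles are expressed in radians.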
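The Haversine distance can be computed as in the following sketch. The value used for the Earth radius is an assumption (the commonly used mean radius), and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed value)


def haversine_m(lt_i, lg_i, lt_j, lg_j):
    """Haversine distance in metres between two points given as
    latitude (lt) and longitude (lg) in decimal degrees."""
    phi_i, phi_j = math.radians(lt_i), math.radians(lt_j)
    d_phi = phi_j - phi_i
    d_lambda = math.radians(lg_j - lg_i)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_i) * math.cos(phi_j) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
```

With this radius, one degree of latitude corresponds to roughly 111 km, so a 200 m threshold is about 0.0018 degrees of latitude.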
Geospatial clustering depends on geographic information domain knowledge and the context of the users. 200 m was chosen as a spatial parameter because it is the average space between an intersection or a square around an attraction. Both parameters control the local neighbourhood of the points: making better use of geospatial and clustering knowledge to select suitable constraints and parameters is likely to yield better and more meaningful clusters.
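To make the role of ɛ and min_points concrete, the following is a small pure-Python sketch of DBSCAN over (latitude, longitude) points with the Haversine distance. It follows the definition of core point given in the text (at least min_points other points within ɛ), which differs slightly from libraries that count the point itself. It is a quadratic-time illustration, not the implementation used for the experiments, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def haversine_m(p, q):
    """Distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def dbscan(points, eps_m=200.0, min_points=8):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if j != i and haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_points:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_points:
                queue.extend(neighbours[j])  # core point: keep expanding
    return labels


def centroids(points, labels):
    """Mean (lat, lon) of each cluster, i.e. the candidate POIs."""
    groups = {}
    for p, label in zip(points, labels):
        if label != -1:
            groups.setdefault(label, []).append(p)
    return {
        label: (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
        for label, g in groups.items()
    }
```

For datasets of the size reported in the experiments, a spatial index would be needed to avoid the quadratic neighbour search.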
An alternative technique to identify POIs is based on the significant flows of people within the city. Inspired by the method in [35], people or taxi flows can be identified and then POIs can be determined, starting from a set of GPS trajectories. For each trajectory, the three basic elements are: longitude, latitude and timestamp. Two trajectories are spatially similar if they pass through the same n points in the same temporal order. The distance between two points is given by the Haversine formula above. A flow is a continuous and uniform movement of entities (like people, cars or trucks) in one direction (an example is shown in Fig. 4), and a flow density is the number of entities travelling through the same flow. Finally, spatially similar trajectories are said to be temporally similar if the movements of the entities occur in the same time slot.

Example of flow with a flow density equal to 43.
The first necessary step is data preprocessing, in which each trajectory is cleaned up by removing noise and outliers. Noise removal cleans the trajectory of points that are too close to each other, so as to work with more homogeneous trajectories. Outlier removal compares each point with the preceding and following points (about five points are needed), looking for abrupt changes in speed. In case of an abrupt change, the corresponding point is labelled as anomalous and is deleted.
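The two preprocessing steps can be sketched as follows. The text does not give the exact rules, so everything beyond the general idea is an assumption: noise removal keeps a point only if it is at least `min_dist_m` from the last kept point, and outlier removal flags an interior point when the speeds of both adjacent segments exceed `factor` times the median speed of the surrounding segments. All names and default values are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def haversine_m(p, q):
    """Distance in metres between two trajectory points (lon, lat, t) in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def remove_noise(traj, min_dist_m=250.0):
    """Drop points closer than min_dist_m to the previously kept point."""
    kept = []
    for p in traj:
        if not kept or haversine_m(kept[-1], p) >= min_dist_m:
            kept.append(p)
    return kept


def remove_outliers(traj, half_window=2, factor=5.0, min_baseline_mps=1.0):
    """Drop interior points reached and left at an abnormally high speed.

    Assumes strictly increasing timestamps (in seconds).
    """
    n = len(traj)
    if n < 3:
        return list(traj)
    # speeds[k] is the speed of the segment from point k to point k + 1.
    speeds = [haversine_m(traj[k], traj[k + 1]) / (traj[k + 1][2] - traj[k][2])
              for k in range(n - 1)]
    cleaned = [traj[0]]
    for k in range(1, n - 1):
        lo = max(0, k - 1 - half_window)
        hi = min(len(speeds) - 1, k + half_window)
        local = [speeds[s] for s in range(lo, hi + 1) if s not in (k - 1, k)]
        if local:
            baseline = max(median(local), min_baseline_mps)
            if min(speeds[k - 1], speeds[k]) > factor * baseline:
                continue  # abrupt jump in and out of this point: treat as anomalous
        cleaned.append(traj[k])
    cleaned.append(traj[-1])
    return cleaned
```

With `half_window=2`, each decision looks at up to four neighbouring segments, which roughly matches the remark that about five points are needed.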
Once the data preprocessing step is complete, an algorithm calculates all flows and their density, finding spatially similar trajectories. The search for similarities uses Java
Using this approach, relevant flows are extracted in the city. Then, for such flows it is possible to determine POIs by matching POIs of the analysed city, taken from datasets available on the web, and the points close to each detected flow (with a maximum distance of 100 m).
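The matching between known POIs and a detected flow can be sketched as follows. It is a minimal illustration: the flow is assumed to be represented by a list of (latitude, longitude) points and the known POIs by a name-to-coordinates mapping, both hypothetical representations; only the 100 m threshold comes from the text.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def pois_near_flow(flow_points, known_pois, max_dist_m=100.0):
    """Return the names of the known POIs lying within max_dist_m of any flow point.

    flow_points: list of (lat, lon); known_pois: dict name -> (lat, lon).
    """
    return [
        name
        for name, (plat, plon) in known_pois.items()
        if any(haversine_m(plat, plon, flat, flon) <= max_dist_m
               for flat, flon in flow_points)
    ]
```

The check is made against the sampled points of the flow rather than the segments between them, so a sparsely sampled flow could miss a POI that lies close to the path but far from every sample.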
This section discusses the experiments performed using the approaches presented above (Section 4 and Section 5). Three datasets were used to test each approach, namely Taxi trajectories data [23], Truck trajectories data [23], and GeoLife GPS trajectories [38–40]. Table 2 gives the main characteristics of such datasets.
Datasets of trajectories used for the experiments
We used three datasets in order to show the generality and robustness of our approach in finding points of interest and flows of people. Moreover, the availability and analysis of several datasets let us show that it is worth organising the hours of a day into different time slots. Finally, the analysis reveals how close the main flows of people are to the real points of interest, which were taken from a curated list. Hence, we can show that the suggested points of interest can actually be convenient for users, who need not travel long distances if they accept the suggestion.
Each dataset was individually tested and analysed. The following describes the results for each set and their validation, and then it reports a way for the POIs to be used in the proposed multi-agent system.
Taxi trajectories data [23] consist of the GPS trajectories collected by 101 taxis equipped with GPS sensors in the Beijing area ([115.421387, 39.437614] x [117.321785, 40.609333] of longitude and latitude) during a month, from 30 October 2010 to 30 November 2010. The mobile GPS devices saved about one position per minute. Figure 5 shows the heatmap of Taxi trajectories. In the available dataset, the data of each taxi were saved as a separate text file. In each file, a data point was saved as a line with the format: id; timestamp; latitude; longitude; speed; direction. For the experiments, the last two fields were not used.

Heatmap of Taxi trajectories data, Beijing overview.
Firstly, the SP detection algorithm was applied to the taxi trajectory data with a distance threshold of 200 m and a time threshold of 5 minutes. It yielded 31,621 SPs, with an execution time of 51 minutes and 9 seconds. Secondly, for POI detection in this dataset, the DBSCAN parameters were set to 200 m for ɛ and 8 (SPs) for min_points. This algorithm produced 560 clusters whose centroids are the POIs, with an execution time of 10.2 seconds. Thirdly, the resulting POIs were filtered to find the popular POIs, i.e. the POIs shared by at least 8 taxis. The total number of such popular POIs was 257. Out of a total of 101 taxis, the popular POIs obtained were visited by 8 to 95 taxis.
The identified popular POIs can be found in the area [116.093765, 39.79077] x [116.608337, 40.092962]. They are in Beijing, and the most distant pairs of points are about 45 km apart in longitude and about 35 km apart in latitude.
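The filtering of popular POIs can be sketched as follows, assuming that each SP carries the id of the vehicle that produced it and the cluster label assigned by DBSCAN (with -1 for noise). The function name and data layout are hypothetical; the threshold of 8 vehicles is the one reported in the text.

```python
from collections import defaultdict


def popular_clusters(labels, vehicle_ids, min_vehicles=8):
    """Return {cluster_label: number_of_distinct_vehicles} for popular clusters.

    labels[k] is the cluster of the k-th SP (-1 for noise) and
    vehicle_ids[k] is the vehicle that produced that SP.
    """
    vehicles = defaultdict(set)
    for label, vehicle in zip(labels, vehicle_ids):
        if label != -1:
            vehicles[label].add(vehicle)  # a vehicle counts once per cluster
    return {
        label: len(ids)
        for label, ids in vehicles.items()
        if len(ids) >= min_vehicles
    }
```

Counting distinct vehicles rather than SPs prevents a single taxi that repeatedly stops at the same place, such as its depot, from making that place look popular.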
Detecting trajectories
In the first test, the algorithm was run on the set of Taxi trajectories. The first step was data pre-processing. Thanks to this step, about 400 outliers were removed, and about 70% to 80% of the data of each trajectory were removed as noise.
Once the data pre-processing was complete, the algorithm started with the identification of flows. With the aim of validating the results found using DBSCAN, the temporal flows were initially ignored. About 80 flows were found with a minimum length of 1 km and a minimum number of taxis equal to 8% (i.e. 8 out of 101). Then, an estimate was made of the useful time slot to suggest to potentially interested agents. According to the available data, it was found that the preferred time slot was between 00:00 and 07:59. Of course, all the POIs found, even the ones that are not popular, have been associated with their own time slot.
Finally, in general, the algorithm takes a trajectory as a reference to find flows. In the case at hand, checks to find flows were performed taking about 10% of the trajectories (randomly chosen) as references, and all the results were assembled together. Table 3 shows the details for all the datasets considered; the first row of data is about taxis. In the table, each row contains: the name of the dataset used (Dataset); the minimum distance between two points used for noise removal in the preprocessing phase (Distance); the percentage reduction of the points thanks to step 1 (Noise) and the percentage reduction of the points thanks to step 2 (Outliers); the time slot most chosen by users (Time Slot); the number of flows found (Number of flows); the maximum density detected among flows (Maximum Density); the length in metres of the longest flow found (Maximum distance). All the flows detected involve at least 8% of the total number of users and have a minimum length of 1 kilometre.
Overview of the tests performed using three different datasets and the number of flows found
Truck trajectory data, referred to as Truck, consist of GPS trajectories collected by 101 trucks equipped with GPS sensors in China during a period from August 2015 to October 2015. The space covered by the registered paths is the large area [86.882817, 0.230753] × [172.467424, 43.405276].
As in the previous dataset, the data of each truck were saved as a separate text file, and each data point was saved as a line having: id; timestamp; latitude; longitude; speed; direction. Again, for the experiments, the last two fields were excluded.
Detecting POIs using DBSCAN
To look for SPs in the Truck dataset, the parameters DistThr = 200 m and TimeThr = 10 minutes were chosen. A total of 54,962 SPs were identified, in 5 hours, 1 minute and 5 seconds.
Figure 6 shows the trajectories recorded by trucks and their SPs. They pass through these provinces: Shānxī Shěng, Shǎnxī Shěng, Gansu, Henan, Hubei, Hebei, Beijing Municipality, Hunan, Sichuan, Guizhou, Yunnan, Guangxi Zhuang Autonomous Region, Guangdong, Jiangxi, Anhui, Fujian, Zhejiang and Shanghai Municipality.

Trajectories of Truck dataset and their detected SPs.
Table 4 shows the results of the SP detection algorithm. The different values of the time threshold set in the SP detection are due to the different nature of the three datasets. For taxis, a reasonable time of stay is 5 minutes; for trucks, 10 minutes, to take the bays into account; for the GeoLife dataset, which includes routes of users on foot, TimeThr was set to 20 minutes. These parameters were validated by the average speed value relative to the flows found in the vicinity of the popular POIs.
StayPoints (SPs) obtained for the analysed datasets
For POI detection, the spatial threshold in DBSCAN remained unchanged (ɛ = 200 m), with min_points = 8 as in the previous case. The clustering execution time was 5.31 seconds, with 1,065 POIs detected. According to the minimum number of trucks (8 out of 101), the popular POIs obtained were 127: they are shared by a minimum of 8 trucks and a maximum of 24 trucks.
These popular POIs fall within the counties Shangsi, Longzhou, Tiandong, Long’an, Mashan, Pingnan, Teng and Yunan, the prefecture cities Chongzuo, Wúzhōu and Yunfu, the districts Jinchengjiang and Yun’an, the Luoding city and the Kunming Subdistrict (see Fig. 7).

Popular POIs obtained for Truck dataset.
The obtained Popular POIs cover the area [106.880591, 21.609655] x [113.670971, 24.685619], which spans about 700 km in longitude and 350 km in latitude between the two most distant pairs of points. From west to east, they touch the cities of Chongzuo, Nanning, Guigang, Qinzhou, Fangchenggang, Beihai, Yulin, Wuzhou, Zhaoqing, Foshan, Canton and Dongguan.
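The extent of such a bounding box can be checked with a short calculation. The sketch below uses an equirectangular approximation on a spherical Earth (an assumption made here for simplicity, not necessarily the method used for the figures above); the function name `bbox_extent_km` is hypothetical.

```python
import math

def bbox_extent_km(lon_min, lat_min, lon_max, lat_max):
    """Approximate east-west and north-south extent of a [lon, lat] box in km,
    measuring the east-west width along the box's middle latitude."""
    r_km = 6371.0  # mean Earth radius in km
    mid_lat = math.radians((lat_min + lat_max) / 2)
    width = r_km * math.radians(lon_max - lon_min) * math.cos(mid_lat)
    height = r_km * math.radians(lat_max - lat_min)
    return width, height

width_km, height_km = bbox_extent_km(106.880591, 21.609655, 113.670971, 24.685619)
print(f"width: {width_km:.0f} km, height: {height_km:.0f} km")
```

For the box reported for the Truck dataset, the result should be close to the rounded values of 700 km and 350 km.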
In the second case study, the proposed solution was tested on a set of truck trajectories. The first step was data pre-processing. For each trajectory, this step removed approximately 0.001% to 0.007% of the data as outliers, and then around 70% to 80% as noise. In this case, the minimum distance between two points was set to 250 m for noise elimination. This choice was made because of the high number of points in each trajectory, and it almost halves the response time of the algorithm with a minimum number of flows.
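One plausible reading of the minimum-distance rule is sketched below: a point is kept only if it lies at least 250 m away from the last point kept, so dense runs of nearly identical positions are thinned out. This interpretation, and the names `haversine_m` and `thin_trajectory`, are assumptions made for illustration.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def thin_trajectory(points, min_dist_m=250.0):
    """points: list of (lat, lon) in recording order.
    Keeps the first point, then every point at least min_dist_m from the last kept one."""
    kept = []
    for lat, lon in points:
        if not kept or haversine_m(kept[-1][0], kept[-1][1], lat, lon) >= min_dist_m:
            kept.append((lat, lon))
    return kept
```

With densely sampled trajectories, a filter of this kind removes a large share of the points, which is in line with the reduction reported above.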
Subsequently, the algorithm was started by taking 10 trajectories at random and using them as references for the comparison with all other trajectories. It obtained about 114 flows with a minimum distance of 1 km and a minimum number of users equal to 8% (8 out of 101). For these experiments, the preferred time slot was found to be between 16:00 and 19:59, though each POI was associated with its own time slot. Table 3 gives the details of the experiments.
Dataset 3: GeoLife
The GeoLife GPS trajectory dataset was collected in the framework of Microsoft Research Asia; it contains data from 182 users and covers a period of over five years (from April 2007 to August 2012) [39]. The tests were performed in the metropolitan area of Beijing, selecting the following range of longitude and latitude: [116.1, 39.7] x [116.7, 40.13].
Each user GPS trajectory is saved as a separate text file and in these files, a data point is saved as a line with the format: id; timestamp; latitude; longitude; altitude; timestamp. For tests, the fields used are: timestamp; latitude; longitude.
Detecting POIs using DBSCAN
The trajectories of this dataset were split into 6 time slots of 4 hours each: Slot1 = [00:00:00, 03:59:59], Slot2 = [04:00:00, 07:59:59] and so on. The algorithms for SP and POI detection were run on each time interval. For the first phase, TimeThr was set to 20 minutes, while DistThr (200 m) was unchanged.
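The slot assignment can be expressed compactly. The following sketch maps a timestamp to its slot number and groups records accordingly; the function names and the record layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def time_slot(ts, slot_hours=4):
    """Return the 1-based slot index: Slot1 = [00:00:00, 03:59:59], Slot2 = [04:00:00, 07:59:59], ..."""
    return ts.hour // slot_hours + 1

def group_by_slot(records):
    """records: iterable of (timestamp, lat, lon), with timestamp a datetime.
    Returns {slot_index: [records in that slot]}."""
    slots = defaultdict(list)
    for record in records:
        slots[time_slot(record[0])].append(record)
    return dict(slots)
```

SP and POI detection can then be run separately on each group.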
The execution time of the SP detection algorithm for 100 trajectories was about 16 minutes. The DBSCAN clustering algorithm determined clusters for each time slot, with min_points equal to 10 or 15 and ɛ ranging from 200 m to 400 m. The execution times ranged from 240 ms to 1.44 s. On average, 20 clusters were found for every time slot (120 POIs in total, from a minimum of 9 to a maximum of 29 per slot); these represent significant places for users, i.e. the centroids of such clusters are POIs. For example, for time Slot 3, with a minimum of 15 SPs per cluster and ɛ equal to 200 m, 29 clusters were found, hence 29 POIs. Among such POIs, those considered Popular POIs were the ones shared by more than 10 users: for time Slot 3, 9 POIs were shared by a minimum of 11 and a maximum of 80 individuals.
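To make the roles of ɛ and min_points concrete, the following is a textbook DBSCAN written in plain Python with a great-circle metric, followed by the computation of cluster centroids as POIs. It is a sketch under stated assumptions: the neighbour search is quadratic, the neighbourhood of a point includes the point itself, and the function names are hypothetical. It is not the implementation used for the timings reported above.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """points: list of (lat, lon). Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbours(i):
        return [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core point: expand the cluster through it
                queue.extend(nb_j)
    return labels

def centroids(points, labels):
    """Mean (lat, lon) of each cluster; these centroids play the role of POIs."""
    sums = {}
    for (lat, lon), label in zip(points, labels):
        if label == -1:
            continue
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += lat
        s[1] += lon
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}
```

For example, `dbscan(stay_points, eps_m=200.0, min_pts=15)` corresponds to the setting quoted for time Slot 3.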
The total number of Popular POIs over all 6 time slots was 36. Between the most distant pairs of points there are 8 km of longitude difference and 25 km of latitude difference, within the area [116.262296, 39.908309] x [116.368098, 40.128495]. These places of interest range from Yangyang Paradise (amusement park) in the north to the Cultural Palace of Nationalities on Fuxingmen Inner Street in the south, crossing Changping District, Haidian District with Tsinghua Park, Beijing Shi and Zhongguancun.
It is possible to read more details of this approach in our previous work [8].
Detecting trajectories
The analysis of the Geolife dataset found 25 flows, shared by a minimum of 18 users and a maximum of 53 users. The paths covered by these flows are between 2 and 4 kilometres long.
As for the previous datasets, the pre-processing phase was carried out for this case study; it removed approximately 400 outliers and about 60% of the values of each trajectory as noise. A minimum distance of 150 m was found between the trajectories. Furthermore, the longest one was taken as the reference trajectory, finding fewer flows (compared to the first two approaches) but still enough to validate the first approach (see Section 4).
Validation of results by means of Google Maps
The points of interest found were verified by means of Google Maps and compared with a curated list provided by a native of the analysed places. For each dataset used, consisting of data referring to different categories of people and movements, it was shown that the points of interest are compatible with the nature of the dataset and the behaviour of the users who collected the data. In fact, points of interest were found in service areas, supplies and spare parts sales outlets for users with trucks; in service buildings and companies for businessmen travelling by taxi; and in places of study and work for participants of the Microsoft Research Asia Geolife project.
With reference to the analysis using Taxi dataset, a 68% correspondence was found between popular POIs and the shared flows, as can be seen in Fig. 8, showing both flows (black lines) and POIs (black diamonds). Such places correspond to real useful sites and attractions, like: World park, Grand View Garden, Muzhi Gongdian amusement park, Ancestral Temple, Dongcheng Chongwen Science & Technology Museum, financial buildings, airport area, Long-distance Passenger Transport Terminal, Chaoyangmen SOHO shopping center, UK Embassy and Silver Bridge on Shichaha river.

Popular POIs in correspondence of Taxi dataset’s flows.
The density of the flows detected in this test ranged from a minimum of 9 taxis to a maximum of 70 taxis (out of a total of 101 taxis).
The intersections between the identified popular flows and POIs were found in the Wanliu, Haidian, Fengtai Districts, in the Tianzhu, Cuige, Lai Guang Yin, Shibalidianxiang Villages, in the Dongfeng area and in Nanmofang residential district (see Fig. 8).
Experiments on the Truck dataset have shown that there are 56 occurrences of popular POIs near the 114 identified flows. This indicates a meaningful correspondence of about 50%, where a point is considered near a flow when its latitude and longitude are within 100 m of it. Points and flows were computed on a minimum number of 8 trucks sharing them.
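The matching between flows and popular POIs can be sketched as follows. The sketch assumes that a flow is represented by a list of (lat, lon) points, that proximity is measured as great-circle distance to the nearest flow point, and that the correspondence is the fraction of flows with at least one popular POI nearby. All names are hypothetical, and the exact matching rule used in the experiments may differ.

```python
import math

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flow_has_nearby_poi(flow, pois, max_dist_m=100.0):
    """True if any popular POI lies within max_dist_m of any point of the flow."""
    return any(haversine_m(p, poi) <= max_dist_m for p in flow for poi in pois)

def correspondence(flows, pois, max_dist_m=100.0):
    """Fraction of flows that have at least one popular POI nearby."""
    matched = sum(1 for flow in flows if flow_has_nearby_poi(flow, pois, max_dist_m))
    return matched / len(flows)
```

With the counts reported for the Truck dataset, 56 / 114 ≈ 0.49, which is rounded to 50% in the text.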
Popular places of interest found near the flows correspond to: service areas, parking lots, grocery stores, tollgates, ATMs, auto parts stores, lottery ticket dealer, Driver Examination Center, hotel, restaurants, repair service area, gas station, lubricating oil sale service and Port of Fangcheng. They belong to Nanning, the capital of the Guangxi Zhuang Region, to the prefecture cities: Fangchenggang, Qinzhou, Guigang, Zhaoqing, to the Guiping county-level city and to the district Wuming.
As for the Geolife dataset, of the 25 flows found, 13 matched popular POIs (hence a 52% correspondence). Fig. 9 shows the longest route found, recorded by 35 users, together with the popular POIs discovered in the same area.

Popular POIs of 5 time slots near Geolife’s flow 21.
Such places are found on 5 of 6 different time slots, from 00:00:00 to 19:59:59 (near flow n. 21), and they correspond to the sites: the Tsinghua Park for time Slot 1, Tsinghua University West Stepping Classroom for time Slot 2, the Tsinghua University Human Resources Service & Employment Center for time Slot 3, the Tsinghua University Biomedical Library for time Slot 4 and the High School Attached to Tsinghua University for time Slot 5.
Other popular POIs near the flows detected in the Geolife dataset are: China Academy of Space Technology, China Aerospace Zhongguancun Astronautics Community, Satellite Building Parking Lot, Jade Palace Hotel Office Building and Kangtuo Science & Technology Mansion.
The above analysis shows that the correspondence between flows and popular destinations is between 50% and 68%. Such values are high when we consider that we matched the flows of common people, rather than simply tourists. For the considered flows, many people would go to their homes and workplaces, many of which are not popular destinations. Moreover, the list of POIs used in the above matching was a manually curated one, which can be deemed limited.
POIs provide the initial knowledge base to agents. At the beginning, when agents have not yet collected and shared the user opinions, the multi-agent system is based on data obtained from the detection of POIs and flows, as explained above. Accordingly, POIs were computed by means of the approach detailed in Section 1 and used as a knowledge base for the multi-agent system. The POIs are defined by their name and GPS coordinates. Such points were identified by using the SPs; other details (i.e. the name of the place) can be obtained via the APIs of web services such as OpenStreetMap or Google Maps.
Experiments have shown how POIs vary according to the needs of the users. For the available data, by analysing the flows of people in the different datasets, we could determine that users moving on foot mainly want to visit banks, bars and restaurants, whereas users moving by taxi (generally tourists) mainly want to visit popular places like churches and museums. This allows us to offer a more efficient recommendation system, since the POIs emerge from the gathered user positions and their trajectories, and hence are related to the users. In addition, thanks to the GPS position taken by the agent, POIs are recommended based on the user’s distance from them. Near a POI, the user’s position is monitored, taking care to protect his privacy, as explained in the multi-agent section (Section 3).
Finally, data and suggestions offered by means of the proposed multi-agent system are enriched by using the second approach, shown in Section 5. Firstly, such an approach allowed us to validate the POIs previously found, and then to highlight the time slot that is generally preferred by users. The analysis was performed for six time slots (see Section 5); in this way, the recommendation system improves its efficiency by offering information on time slots.
Suggestions to users are always updated and improved by the exchange of data between the independent agents, guaranteeing a reliable and updated service to the end user.
Conclusions
The proposed approach has given a solution for sharing collective knowledge about useful spots in a city or region. Users can share their comments on a place they visit thanks to an app available on their smartphones, and can receive suggestions on the next place to visit. By analysing available datasets that have recorded people's positions, it was shown that it is possible to find both points of interest and people flows. Moreover, it is possible to relate the flows found to the points that were identified independently. In some cases, it was found that a vehicle used to reach some highly popular place is then used to visit another destination that is also a popular place. The discovery of flows in cities has a wide range of applications: it can help the planning and optimisation of transport services, and in mass events it can be used to signal traffic.
Acknowledgments
The authors thank Mr Hongxiang Tang for his contribution. Moreover, the authors acknowledge the partial support provided by a PO FSE 2014-2020 grant by Regione Siciliana.
