Inferring popular locations in urban for professional education

Abstract

With the rapid development of the urban-centered economy, urban areas have undergone strong but heterogeneous sprawl. In such complex urban systems, it is impossible to establish night-school teaching centers for continuing education programs in every district of a city. Part-time students tend to study at popular locations in the city because of their convenience. Since the call logs and geographical nature of mobile phone data provide an opportunity to measure human behavior and social dynamics, we investigate how to infer popular urban locations with a large-scale quasi-social network, which avoids the limitations of data collection and even privacy problems. The large-scale quasi-social network model is built by measuring the number of shared users between zones, which differs from previous social network models. We first verify whether this model can also reveal the social structure of the given data by ranking the places in the model with the eigenvector centrality metric. To understand the connections between popular locations of human activity and spatial structure, we present a method to infer the core zones in a given region, and then use a simple metric to evaluate the most popular locations of human activity.

Keywords

Education programs, social network, popular locations, case study

1 Introduction

Since smart cities have been embraced by many countries in recent years, urban spatial structure [1, 2], which is measured by the degree of spatial concentration of population and employment, has many meaningful applications in a great variety of fields, including urban planning [3, 4], public transport planning [5–7] and location-based recommendation [8, 9]. Because the organization, functions and roles of the different areas of a city [10] in people’s lives can be described more accurately with insight into the city's spatial structure, the study of urban spatial structure offers great potential for location-based mobile applications such as WeChat and Groupon to improve the user experience of existing services. On the other hand, understanding the spatial concentration of population and employment would help urban planners to optimize the value of existing infrastructure in the city [11]. In particular, understanding popular urban locations would help educators to establish teaching centers for popularizing continuing education programs.

Two key features of urban spatial structure are core zones and hot locations, defined respectively as the sets of popular locations where people like to go and the most popular location within each core zone of a given area. Consequently, inferring the core zones and hot locations in a city is an important research goal.

Since urban spatial structure, which consists of these places, represents the spatial concentration of the various types of activities generated by individual people, a general principle for inferring the core zones or hot locations could rely on simple statistical metrics. For example, the total number of individuals or human activities near a given place would be a direct index for ranking the top destinations for individuals in the area. Nonetheless, owing to the limitations of data collection and even privacy problems, the available data lack completeness and credibility, which makes the ranking results appear subjective. The place entropy [12, 13], which represents the diversity of individuals at a given place, can also be used to rank popular places; however, because of the same lack of available data, whether the place entropy can infer the core zones and hot spots remains unknown. More specifically, one remaining challenge is to infer the core zones and hot locations within static partitions of an entire region by modeling limited data and thus avoiding privacy problems. For example, telecom providers or location-based commercial application providers have sufficient incentive to partition the urban spatial structure and further discover the core zones and hot spots within the partitions for their business.

Another challenge is to infer the hot locations in different partitions dynamically. For example, a cellphone user may want to search on his mobile phone for the most popular place within a 1000-meter radius of his current location. However, the public data generated from regional demographic surveys usually provide only a static perspective for ranking hot places, whereas service providers need a dynamic approach to calculate the hottest place in the selected dataset. Consequently, extrapolating the hot spots formed by human activity patterns across a given dynamic region requires quantitative models that describe local activity patterns using as few types of available data as possible.

There is a set of studies that examine the links between urban spatial structure and human activity patterns. The results in [14] imply that community formation between locations in a mobile phone communication network is related to geographic context, including social structure, wealth distribution, economic production and land use. In [1, 15], the urban spatial-temporal structure was analyzed using activity-based travel survey data. These studies reveal that spatial proximity can promote community formation not only in networks of individual people but also in large-scale social networks in which each node represents a population of people at a given location. Moreover, individual attributes such as homophily and focus constraints in location-based social networks [16] and the communication interactions among mobile towers [17] have been used to shape the edges of such large-scale social networks, where each node represents co-locations between users or mobile towers [18–20].

2 The methodology

2.1 Dataset

We take Wuchang, a district of Wuhan, as an example for studying the inference of hot urban locations. Wuhan is the most populous city in Central China and is sometimes referred to as the “Chicago of China”. Wuchang District is one of the seven central districts that were merged into Wuhan. In our research, an anonymous dataset was used to evaluate the shape of human activity patterns. This dataset, which accounts for approximately 25% of the population and was collected by a Chinese telecom operator, is composed of the lists of users who made calls from 219 mobile phone towers (Fig. 1) in Wuchang District, which cover about 20% of the total land area of Wuhan. As shown, the 219 mobile phone towers are distributed over most of Wuchang District, an area of about 3.5 square miles. There are five known central business districts (CBDs) in this area: 1 represents Zhongnan, 2 is Jiedaokou, 3 is Huquan, 4 is Luxiang, and 5 is Jinronggang. Many functional areas such as university campuses and residential settlements are located among these CBDs. As common sense suggests, heavy traffic can be observed among the mobile phone towers in this area.

The dataset contains about 36 million (35610100) mobile phone communications that occurred over 2 weeks between August 2012 and September 2012. For each call, the tower used by the phone initiating the call and the tower used by the phone receiving the call were recorded. The records also contain the ID of the mobile phone tower and the call time. As mentioned in the previous section, the communication traffic records of towers reflect, to some extent, human activity patterns in the spatial dimension. Since they do represent some types of individual activity, we first consider two basic characteristics of the communication traffic records of each tower during the 2 weeks, namely the total number of communication records and the total number of mobile phone users. As shown in Fig. 2, there is an obvious positive correlation between the two characteristics. The vertical axis shows the total number of communication records (C1, blue line) and the total number of mobile phone users (C2, red line). About 8 towers have higher levels than the other towers. More specifically, Table 1 shows that the top 8 towers by number of users overlap with the top 8 by number of calls, and these towers are located in different CBDs such as Zhongnan, Jiedaokou and Luxiang. These characteristics agree with reason and common sense.
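The two basic characteristics can be computed in a single pass over the call records. The following is a minimal sketch, assuming each record has been reduced to a hypothetical `(user_id, tower_id)` pair; the function names and the toy records are illustrative and are not taken from the dataset.

```python
from collections import defaultdict


def tower_statistics(records):
    """Return {tower: (C1, C2)} where C1 is the number of call records
    and C2 is the number of distinct users observed at the tower."""
    calls = defaultdict(int)
    users = defaultdict(set)
    for user_id, tower_id in records:
        calls[tower_id] += 1
        users[tower_id].add(user_id)
    return {tower: (calls[tower], len(users[tower])) for tower in calls}


def top_towers(stats, index, n=8):
    """Rank towers by C1 (index=0) or C2 (index=1) and keep the top n."""
    return sorted(stats, key=lambda t: stats[t][index], reverse=True)[:n]


# Toy records (hypothetical): (user_id, tower_id)
records = [("u1", "A"), ("u2", "A"), ("u1", "A"),
           ("u3", "B"), ("u1", "B"), ("u2", "C")]
stats = tower_statistics(records)
busiest = top_towers(stats, index=0, n=2)
```

Comparing `top_towers(stats, 0)` with `top_towers(stats, 1)` gives the kind of overlap check summarized in Table 1.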

2.2 User diversity and Eigenvector Centrality of the locations

We also consider a potentially better metric for discovering the popular locations in this area. In [21], the author proposes that the entropy of “venues” [22] can be applied to capture the user diversity generated by the different visitors to those venues. We therefore calculate the entropy e_i of each mobile phone tower from the communication traffic records in a similar way: $e_{i} = - \sum_{u \in U,\, P_{i} (u) > 0} P_{i} (u) \log P_{i} (u)$ where i is a tower in the set C of all mobile phone towers, U is the set of all N mobile phone users observed in the dataset, and P_i (u) is the fraction of communication records at tower i contributed by user u. Moreover, let U_i ⊂ U, i ∈ C, be the set of mobile phone users observed at tower i. The human activities observed at tower i can then be represented by s_i = { (u_k, f_k) | u_k ∈ U_i }, where f_k is the frequency with which user u_k was recorded by the tower. We define the number of co-users between towers i and j as follows: $w (i, j) = | U_{i} \cap U_{j} |$
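As an illustration of the two quantities just defined, the sketch below computes the place entropy of a tower and the co-user count of a tower pair. It assumes each tower's activity s_i is stored as a mapping from user to frequency f_k; the names `place_entropy` and `co_users` and the toy towers are hypothetical.

```python
import math
from collections import Counter


def place_entropy(user_counts):
    """Entropy of a tower: -sum of P_i(u) * log P_i(u) over users with P_i(u) > 0."""
    total = sum(user_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total)
                for c in user_counts.values() if c > 0)


def co_users(users_i, users_j):
    """w(i, j): number of users observed at both towers."""
    return len(set(users_i) & set(users_j))


# Toy towers (hypothetical): user -> number of records at the tower
tower_a = Counter({"u1": 2, "u2": 2})   # two equally active users
tower_b = Counter({"u1": 4})            # a single user
entropy_a = place_entropy(tower_a)
entropy_b = place_entropy(tower_b)
shared = co_users(tower_a, tower_b)
```

A tower visited evenly by many users gets a high entropy, while a tower dominated by a single user gets an entropy of zero.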

Since the co-users of mobile phone towers can represent the social interaction among the places corresponding to those towers, we define a large-scale quasi-social network whose node set is the set C of all mobile phone towers. The edge between nodes i and j is $e_{ij} = \begin{cases} 0, & w (i, j) < p \\ 1, & w (i, j) \geq p \end{cases}$

where p is a threshold chosen from the distribution of the co-user numbers between all pairs of nodes.

This undirected binary network is thus used to infer the popular locations in the dataset via shared individuals (co-users). We choose the threshold p = 2000 and obtain the adjacency matrix of the binary network. To check that the network model supports the analysis that follows, we analyze the basic properties of this undirected binary network under different values of p. When p = 2000, the mean degree (MD) is 58.174 and the maximum modularity is about 0.16–0.17 (when p = 500, 1000 and 1500, the corresponding MD is 137, 98 and 74.8, and the corresponding maximum modularity is about 0.07, 0.11 and 0.14). Furthermore, we also consider the undirected weighted network in which the edge weights are determined by the number of co-users between nodes. We found that the edge weights in the network structure do not have a heavy influence on the performance of community partition.
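The thresholding step and the mean degree can be sketched as follows. This is a minimal illustration, assuming each tower's user set U_i is available as a Python set; `build_network`, `mean_degree` and the toy data are hypothetical names and values, and the threshold used here is far smaller than the p = 2000 of the study.

```python
from itertools import combinations


def build_network(tower_users, p):
    """Binary quasi-social network: towers i and j are linked
    when they share at least p users, i.e. w(i, j) >= p."""
    adjacency = {tower: set() for tower in tower_users}
    for i, j in combinations(tower_users, 2):
        if len(tower_users[i] & tower_users[j]) >= p:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def mean_degree(adjacency):
    """Average number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


# Toy user sets (hypothetical)
tower_users = {
    "A": {"u1", "u2", "u3"},
    "B": {"u2", "u3", "u4"},
    "C": {"u4", "u5"},
}
net = build_network(tower_users, p=2)
md = mean_degree(net)
```

Rebuilding the network for several values of p and comparing `mean_degree` is one way to reproduce the kind of sensitivity check reported for p = 500, 1000, 1500 and 2000.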

Since there will always be some noise (e.g., the activity of deliverymen) and data limitations as mentioned above, we use several well-known measures from social network theory to discover the hot locations in a given area based on the proposed quasi-social network model.

An intuitive metric is eigenvector centrality [23], which is an effective measure of the influence of a node in a network (especially an undirected network). It assigns relative scores to all nodes in the network based on the concept that connections to high-scoring nodes contribute more to the score of the node in question than equal connections to low-scoring nodes. Google’s PageRank algorithm is a variant of the eigenvector centrality measure. In our large-scale quasi-social network, the top-ranking nodes by eigenvector centrality represent the relatively popular locations, which have more active calls [24, 25].
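Eigenvector centrality can be approximated by power iteration. The sketch below is one possible implementation, not the procedure used in the study (which relies on Gephi); it assumes the network is stored as a dictionary of neighbour sets, and it adds each node's own score at every step, a common shift that leaves the leading eigenvector unchanged while avoiding oscillation on bipartite graphs.

```python
import math


def eigenvector_centrality(adjacency, max_iter=500, tol=1e-10):
    """Power iteration on (A + I); returns scores normalized to unit length."""
    nodes = list(adjacency)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # A node's new score is its own score plus the scores of its neighbours.
        updated = {n: score[n] + sum(score[m] for m in adjacency[n])
                   for n in nodes}
        norm = math.sqrt(sum(v * v for v in updated.values()))
        updated = {n: v / norm for n, v in updated.items()}
        change = max(abs(updated[n] - score[n]) for n in nodes)
        score = updated
        if change < tol:
            break
    return score


# Toy star network (hypothetical): hub "Z" linked to three leaves
star = {"Z": {"A", "B", "C"}, "A": {"Z"}, "B": {"Z"}, "C": {"Z"}}
scores = eigenvector_centrality(star)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In the star example the hub ranks first, which matches the intuition that a location sharing users with many other well-connected locations is more central.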

2.3 Core zones and popular locations

Furthermore, a feasible approach for inferring the hot locations of the different zones in the area deserves attention. First, modularity maximization algorithms [26] for detecting communities in social networks are applied to divide the given area into a certain number of areas (about 3–7), each of which can be identified as a cluster of locations that have more social interaction among them, since they share a large enough number of co-users in the large-scale quasi-social network.

The modularity Q can be written in the following form: $Q = \frac{1}{2 m} \sum_{i, j} [A_{i, j} - \frac{k_{i} k_{j}}{2 m}] δ (c_{i}, c_{j}) .$

where A_i,j is the weight of the edge between nodes i and j, k_i is the degree of node i, c_i is the community membership of node i, δ (u, v) equals 1 if u = v and 0 otherwise, and m is the total number of edges in the network.
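To make the definition concrete, the sketch below evaluates Q for a given partition of a binary network. It is only an evaluation of the formula, not a maximization algorithm; the function name and the toy graph (two triangles joined by one edge) are hypothetical.

```python
def modularity(adjacency, membership):
    """Q = (1/2m) * sum over i, j of [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)
    for an undirected binary network given as {node: set of neighbours}."""
    two_m = sum(len(nbrs) for nbrs in adjacency.values())  # equals 2m
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in adjacency:
        for j in adjacency:
            if membership[i] != membership[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adjacency[i] else 0.0
            q += a_ij - len(adjacency[i]) * len(adjacency[j]) / two_m
    return q / two_m


# Toy network (hypothetical): two triangles joined by the edge 3-4
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
                 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
split = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
merged = {n: 0 for n in two_triangles}
q_split = modularity(two_triangles, split)
q_merged = modularity(two_triangles, merged)
```

Splitting the toy graph into its two triangles gives Q = 5/14, whereas putting every node in one community gives Q = 0, so the split is the better partition under this measure.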

Then, to infer the core zones of the activity areas across different partitions, we propose an improved modularity-based approach that repeatedly calculates the partitions of activity areas under various parameters. The approach can be described as follows:

Let the partition obtained in the x-th run be $C_{x} = (c_{1}^{x}, c_{2}^{x}, \dots, c_{k_{x}}^{x})$ , where k_x = |C_x| is the total number of activity areas. After executing the SA method [27] X times, we obtain a membership matrix M = K × X, where K = max(k_x), x = 1, 2, …, X. Its element M_ij is the community label assigned to tower i in the j-th run, and M_ij < K.

We further obtain a sequence of core zones $S_{c} = (s_{c}^{1}, s_{c}^{2}, \dots, s_{c}^{m}) \subset C$ , where C is the set of all locations. Each core zone s_c satisfies $\forall i, j \in s_{c}: \sum_{k = 1}^{X} \delta_{k} (i, j) \geq x_{0}, \quad \delta_{k} (i, j) = \begin{cases} 1, & M_{ik} = M_{jk} \\ 0, & M_{ik} \neq M_{jk} \end{cases}, \quad k = 1, 2, \dots, X$ where the threshold value x₀ ≤ X. In this study, we set x₀ = X.
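With x₀ = X, two towers belong to the same core zone only if they receive the same label in every run, so the core zones are the groups of towers with identical rows in M. The sketch below illustrates this special case; it assumes M is stored as a mapping from tower to its list of labels, and the `min_size` parameter (which drops singleton groups) is a hypothetical addition, not part of the method as described.

```python
from collections import defaultdict


def co_assignment(memberships, i, j):
    """Sum over runs k of delta_k(i, j): runs in which i and j share a label."""
    return sum(1 for a, b in zip(memberships[i], memberships[j]) if a == b)


def core_zones(memberships, min_size=2):
    """Core zones for x0 = X: towers grouped together in every run.
    min_size is an illustrative filter that discards isolated towers."""
    groups = defaultdict(list)
    for tower, labels in memberships.items():
        groups[tuple(labels)].append(tower)
    return [sorted(group) for group in groups.values() if len(group) >= min_size]


# Toy membership matrix (hypothetical): tower -> label in each of X = 3 runs
memberships = {
    "A": [0, 1, 0],
    "B": [0, 1, 0],
    "C": [0, 2, 0],   # separated from A and B in the second run
    "D": [1, 0, 1],
    "E": [1, 0, 1],
}
zones = core_zones(memberships)
```

For a threshold x₀ < X, `co_assignment` gives the count that would be compared with x₀ for each pair of towers.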

Finally, we use a new metric called KCM, based on k-means and cosine similarity, to verify the above analysis and to identify the popular locations more clearly. Specifically, the cosine similarity provides a reasonable measure of the similarity of human activity between two places.

Given any two towers i and j, their co-users are U_ij = U_i ∩ U_j. The cosine similarity between the two towers can be represented as

$s (i, j) = \sum_{u \in U} \frac{p (i, u) \times p (j, u)}{\| p (i, \cdot) \| \times \| p (j, \cdot) \|}$ where $p (i, u_{k}) = f_{k} / \sum_{l = 1}^{N} f_{l}$ is the probability that user u_k is active at tower i.

For each partition, the sequence of popular locations $({ct}_{1}^{x}, {ct}_{2}^{x}, \dots, {ct}_{k_{x}}^{x})$ can be obtained by the following rule: ${ct}_{i}^{x} = \arg\min_{v \in c_{i}^{x}} \frac{1}{| c_{i}^{x} |} \sum_{y \in c_{i}^{x}, y \neq v} s (v, y), \quad i = 1, 2, \dots, k_{x}$ .
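The selection rule can be sketched as follows. This is an illustration under stated assumptions: each tower's activity is a non-empty mapping from user to frequency f_k, the names `activity_profile`, `cosine_similarity` and `select_location` are hypothetical, and the selection follows the arg min exactly as the rule is written above.

```python
import math


def activity_profile(freq):
    """p(i, u_k) = f_k / sum of all f at the tower (assumes a non-empty tower)."""
    total = sum(freq.values())
    return {user: f / total for user, f in freq.items()}


def cosine_similarity(p_i, p_j):
    """Cosine similarity of two user-activity profiles."""
    dot = sum(p_i[u] * p_j[u] for u in p_i.keys() & p_j.keys())
    norm_i = math.sqrt(sum(v * v for v in p_i.values()))
    norm_j = math.sqrt(sum(v * v for v in p_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def select_location(zone, profiles):
    """Tower minimizing the mean similarity to the other towers of its zone."""
    def mean_similarity(v):
        return sum(cosine_similarity(profiles[v], profiles[y])
                   for y in zone if y != v) / len(zone)
    return min(zone, key=mean_similarity)


# Toy zone (hypothetical): user -> frequency at each tower
raw = {"A": {"u1": 1, "u2": 1}, "B": {"u1": 1, "u2": 1}, "C": {"u3": 2}}
profiles = {tower: activity_profile(freq) for tower, freq in raw.items()}
similarity_ab = cosine_similarity(profiles["A"], profiles["B"])
chosen = select_location(["A", "B", "C"], profiles)
```

Only shared users contribute to the dot product, so two towers with no co-users have a similarity of zero.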

3 Results

Although the location of a subject was collected only when the subject was connected to the cellular network, location traces from mobile phone data have been shown to be a reasonable proxy for individual human mobility. We first check whether the entropy of each location, measured by the diversity of visitors at the corresponding mobile phone tower, can indicate the hot locations in this region. As shown in Fig. 3 and Table 2, the popular locations ranked by the entropy of venues are similar to those ranked by the two basic characteristics C1 and C2, although there are slight differences between these measurements. Given the principle of place entropy, we hold that the level of place entropy depends to a large extent on the popularity of the corresponding location. However, the lack of data caused by the market share of our mobile phone operator and by subjects’ call plans inevitably limits the usefulness of place entropy.

Therefore, applying eigenvector centrality to the binary network in Gephi [28], we found that the top-ranking hot locations were almost exactly the same as those obtained with the previous three metrics. Furthermore, as Fig. 4 shows, the towers corresponding to the top-ranking hot regions are distributed mostly near three popular CBDs, namely Zhongnan, Jiedaokou and Luxiang. However, as described in the above section, this region has five known CBDs that can be further identified, and the different areas in the region have their own hot locations.

Based on our large-scale quasi-social network, the different partitions of activity areas were calculated using the modularity maximization algorithms. More specifically, we calculated the activity areas under various values of the parameter K; the results are shown in Fig. 5. The partitions of activity areas with K = 3, 5, 7 effectively reflect the distribution of activity at different scales in this region. In Fig. 5.1, the green dots cover the activity area near Luxiang and Jinronggang, the red dots cover the activity zone near Huquan, and the blue dots cover the activity zone near Zhongnan and Jiedaokou. At higher scales, the partitions of activity zones presented in Figs. 5.2 and 5.3 also imply that, to some extent, the activity areas, which are composed of different regions, form around core regions such as CBDs.

Using the above approach, we calculated the core zones in this region, as shown in Fig. 6. The core zones are distributed basically within the range of the five CBDs. In general, a popular location such as a CBD can be considered a popular spot in every activity zone; that is, the residents of an activity zone have some locations that hold their interest, for example for shopping and entertainment.

4 Discussions

Mobile phones provide an opportunity to measure human behavior and social dynamics; as previous work has shown, call logs and location traces allow researchers to undertake large-scale objective studies of social phenomena.

Here, we investigate how to infer popular urban locations with a large-scale quasi-social network. The large-scale quasi-social network model is built by measuring the number of shared users between zones, which differs from previous social network models. We therefore first verify whether this model can also reveal the social structure of the given data by ranking the places in the model with the eigenvector centrality metric. To understand the connections between popular locations of human activity and spatial structure, we present a method to infer the core zones in a given region, and then use a simple metric to evaluate the most popular locations of human activity. As a case study, the results can be used to plan night-school teaching centers in a city.

The first limitation of our study concerns the correlation between the call logs of a connected cell tower and the human activities that occurred in the cell tower’s coverage region. Generally, this correlation is affected by the geographical location of the cell tower, since the market share of the given telecom operator differs somewhat across the cell towers’ coverage regions. However, we hope that the considerable scale of our aggregation of call activities and the geographical nature of the collected data compensate for this.

Another inevitable limitation lies in the premise of this research. Our original intention is to develop an acceptable method that uses limited types of individual activity data, and thus avoids privacy problems, to identify popular urban locations. Such extreme cases are relatively rare, and many equally valid approaches can obtain the same results from various types of activity data. However, we believe that the significance of the presented study will grow in the future, since it provides a new perspective for inferring popular urban locations using a large-scale quasi-social network. More specifically, we believe that modeling the large-scale quasi-social network requires less stringent social interaction among nodes, which improves the practical value of our approach.

References

1. Jiang, Ferreira Jr and Gonzalez M.C., Discovering urban spatial-temporal structure from human activity patterns, In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, 2012, pp. 95–102. ACM.

2. Wang, Lv, Li et al., Virtual Reality Based GIS Analysis Platform, Neural Information Processing, Springer International Publishing, 2015, pp. 638–645.

3. Becker R.A., Cáceres, Hanson, Loh J.M., Urbanek, Varshavsky and Volinsky, A tale of one city: Using cellular network data for urban planning, Pervasive Computing, IEEE 10(4) (2011), 18–26.

4. Zhang, Han, Hao D.S. et al., ARPPS: Augmented Reality Pipeline Prospect System, Neural Information Processing, Springer International Publishing, 2015, pp. 647–656.

5. Toole J.L., Ulm, González M.C. and Bauer, Inferring land use from mobile phone activity, In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, 2012, pp. 1–8. ACM.

6. …, Lv, Hu et al., Xearth: A 3d gis platform for managing massive city information, Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 2015 IEEE International Conference on, 2015, pp. 1–6.

7. … and Su, 3d seabed modeling and visualization on ubiquitous context, SIGGRAPH Asia 2014 Posters, 2014, ACM, 33.

8. Zheng V.W., Zheng, Xie and Yang, Towards mobile intelligence: Learning from GPS history data for collaborative recommendation, Artificial Intelligence 184 (2012), 17–37.

9. …, Lv, Gao et al., 3d seabed: 3d modeling and visualization platform for the seabed, Multimedia and Expo Workshops (ICMEW), 2014 IEEE International Conference on, 2014, p. 6.

10. Yuan, Zheng and Xie, Discovering regions of different functions in a city using human mobility and POIs, In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 186–194. ACM.

11. …, Lv, Zheng et al., Assessment of lively street network based on geographic information system and space syntax, Multimedia Tools and Applications (2015), 1–19.

12. Pelechrinis and Krishnamurthy, Location affiliation networks: Bonding social and spatial information, In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2012, pp. 531–547.

13. Yang, He, Lin et al., Multimedia cloud transmission and storage system based on internet of things, Multimedia Tools and Applications (2015), 1–16.

14. Caughlin T.T., Ruktanonchai, Acevedo M.A., Lopiano K.K., Prosper, Eagle and Tatem A.J., Place-based attributes predict community membership in a mobile phone communication network, PloS One 8(2) (2013), e56057.

15. Jiang, Ying, Han et al., Collaborative multi-hop routing in cognitive wireless networks, Wireless Personal Communications 86(2) (2015), 901–923.

16. Liu, He, Tian, Lee W.C., McPherson and Han, Event-based social networks: Linking the online and offline social worlds, In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012, pp. 1032–1040. ACM.

17. Caughlin T.T., Ruktanonchai, Acevedo M.A., Lopiano K.K., Prosper, Eagle and Tatem A.J., Place-based attributes predict community membership in a mobile phone communication network, PloS One 8(2) (2013), e56057.

18. Jiang, Xu and Lv, A multicast delivery approach with minimum energy consumption for wireless multi-hop networks, Telecommunication Systems (2015), 1–12.

19. Zhang, Zhou, Jin, Wang and Cichocki, Frequency recognition in SSVEP-based BCI using multiset canonical correlation analysis, International Journal of Neural Systems 24(4) (2014), 1450013.

20. Chen, Huang and Lv, Towards a face recognition method based on uncorrelated discriminant sparse preserving projection, Multimedia Tools and Applications (2015), 1–15.

21. …, Rehman S.U. and Chen, Webvrgis, Webgis based interactive online 3d virtual community. In Virtual Reality and Visualization (ICVRV), 2013 International Conference on, 2013, pp. 94–99.

22. Cranshaw, Toch, Hong, Kittur and Sadeh, Bridging the gap between physical location and online social networks, In Proceedings of the 12th ACM International Conference on Ubiquitous Computing, 2010, pp. 119–128. ACM.

23. Lohmann, Margulies D.S., Horstmann, Pleger, Lepsien, Goldhahn and Turner, Eigenvector centrality mapping for analyzing connectivity patterns in fMRI data of the human brain, PloS One 5(4) (2010), e10232.

24. …, Lv, Wang et al., WebVRGIS based traffic analysis and visualization system, Advances in Engineering Software 93 (2016), 1–8.

25. Lin, Yang, Lv et al., A self-assessment stereo capture model applicable to the internet of things, Sensors 15(8) (2015), 20925–20944.

26. Good B.H., de Montjoye Y.A. and Clauset, Performance of modularity maximization in practical contexts, Physical Review E 81(4) (2010), 046106.

27. Liu and Liu, Detecting community structure in complex networks using simulated annealing with k-means algorithms, Physica A: Statistical Mechanics and its Applications 389(11) (2010), 2300–2309.

28. Bastian, Heymann and Jacomy, Gephi: An open source software for exploring and manipulating networks, In ICWSM (2009).