Inferring student social link from spatiotemporal behavior data via entropy-based analyzing model

Abstract

Social links are an important indicator for understanding students' mental health and social ability in educational management, and extracting hidden social strength from students' rich daily behaviors has become an active research topic. Devices with positioning functions record large amounts of students' spatiotemporal behavior data, from which students' social links can be inferred. However, because students' daily activities are governed by school regulations, they show clear regularity and periodicity. Traditional methods usually compare the co-occurrence frequency of two users to infer social association, but they do not account for the location density and time sensitivity of campus scenes. For the campus environment, we propose a Spatiotemporal Entropy-Based Analyzing (S-EBA) model for inferring students' social strength. The model uses students' multi-source heterogeneous behavioral data to calculate co-occurrence frequency weighted by time intervals. Three features, diversity, spatiotemporal hotspots and behavioral similarity, are then introduced to calculate social strength. Experiments show that our method outperforms traditional methods on several evaluation criteria. Finally, the inferred social strength is used as the edge weight to construct a social network, which is further analyzed for its implications for student education management.

Keywords

Social link; social network; campus big data; data mining; AI for education

1. Introduction

Social link analysis is widely used in resource recommendation, extracurricular activities, education management and other fields. Relevant studies have shown that students' social behavior is highly correlated with their mental health [11] and academic performance [5, 19]. Among college students, mental disorders related to social relationships are becoming increasingly common [21, 20]. To better educate and manage college students, school administrators need to assess students' mental health and academic performance by inferring their social skills and interpersonal relationships [4, 9]. With the wide use of mobile devices and social applications with positioning functions, users are also increasingly willing to share their daily lives. Conveniently, almost all students carry a student campus card, which they use for payments, gate access control, library borrowing, showering and other activities. In addition to campus card records, other activity records such as gateway logins and online payments are stored in the school's server logs. These multi-source heterogeneous data contain rich spatiotemporal information, which provides data support for inferring students' social links.

In real life, when two people frequently take part in the same activities at the same time, they are likely to have a close relationship. Sociologists often call this phenomenon homophily. Many researchers predict social links based on this property [13, 29, 30, 26]. Specifically, when two users appear at the same location at the same time, they are considered to co-occur, and the higher their co-occurrence frequency, the greater their social strength [16, 24, 22, 10, 15, 1, 12, 17, 6, 18]. Therefore, some scholars extract co-occurrence frequency from spatiotemporal data to infer social links. However, this alone may not fully solve the problem: counting co-occurrences ignores random encounters caused by the different crowding levels of locations. For example, the social relationship reflected by two people meeting ten times in a cafeteria versus once in a private place cannot be judged by frequency alone, because the cafeteria is more crowded and chance encounters there are more likely. To address this issue, Pham et al. [18] proposed an entropy-based model (EBM) that accounts for the effect of location crowding and diversity on co-occurrence frequency. EBM uses Rényi entropy to measure the diversity of co-occurrence locations, taking into account how the encounter frequencies at different locations affect diversity. Meanwhile, the popularity of each location is measured by location entropy, and different weights are assigned to co-occurrences at different locations.

The EBM model provides a good solution for inferring social associations from spatiotemporal data. However, EBM is designed for people's activities in general social scenes and cannot be applied well to campus scenes, whose particular characteristics require several crucial adjustments. First, students' activity sites are mainly distributed across a relatively small, fixed set of school buildings. For example, a teaching building can usually accommodate many students, which dramatically increases the chance of co-occurrence. Therefore, over-reliance on highly crowded locations will significantly interfere with the inference of social strength. Second, classes are scheduled according to the school calendar, so students' overall class time is regular. However, students' activities are not limited to classes; they also include rich after-class activities, such as eating in the canteen, reading in the library and exercising in the gym, whose timing students arrange themselves. Social links between students are therefore more sensitive to the timing of these activities. For example, two students who appear together in the canteen at 12 p.m. reflect a different social strength than two who appear there at 3 p.m., because most students have lunch at noon but few go to the canteen at 3 p.m.

Further, regarding the time dimension of campus scenes, some previous studies have inferred social relations by refining the weighting of time. Liu et al. [14] used a time slicing method to divide college students' card-swiping data into time intervals; if two students appear at the same location within the same interval, they are considered to co-occur in time and space. However, time slicing may split two nearby visits into different intervals, so that a genuine spatiotemporal co-occurrence is cut apart. Therefore, researchers [27, 31] proposed a more flexible sliding time window: the check-in times of two students are recorded, and if the gap between them is less than a certain threshold, the two students are considered to co-occur. However, these methods still do not consider coincidental co-occurrences caused by peak times. For example, the canteen at noon and the study room in the evening are peak times for student activities, so co-occurrences during these periods are more likely to be coincidental.

Thus, for the particular campus environment, we propose a Spatiotemporal Entropy-Based Analyzing (S-EBA) model that uses campus behavior data to infer students' social strength. Diversity, frequency weights and behavioral similarity are introduced to measure social links. The main contributions of this paper are as follows:

1)
A refined model, S-EBA, is proposed for inferring social links in campus scenes. The model uses students' multi-source heterogeneous behavior data to calculate students' social strength;
2)
To address the location density and time sensitivity of campus scenes, location-time entropy is introduced to measure social links, and the effects of coincidence and crowding are captured by multiple feature weights;
3)
Real student activity data are collected to evaluate the model on several metrics. The inferred social strength is used as edge weights to construct social networks, and real cases show how the inferred student social interactions can support the evaluation of teaching management.

The structure of this article is as follows: Section 2 reviews the related work of inferring social networks based on spatiotemporal data. Section 3 gives an overview of the model. Section 4 introduces the data processing and analysis. Section 5 puts forward an expression method of spatiotemporal co-occurrence. Section 6 introduces a calculation algorithm of the social strength. Section 7 performs experimental verification and analysis. Section 8 summarizes and suggests future work.
2. Related work

This section reviews related work on analyzing social links using spatiotemporal data and on inferring students' social interactions in campus scenes.

2.1 Analyzing social link

Early methods inferred the social relationship between two users mainly from historical location information: if the historical activity trajectories of two users are similar, the two users are considered more likely to know each other [13, 29, 30, 26]. Li et al. [13] built users' visit sequences and measured user similarity by comparing these sequences. Based on spatiotemporal data recording location information, Yang et al. [29] proposed LP-Mine to formally describe individuals' general lifestyles and the regularities that can be found in their location histories. Zhang et al. [30] measured the similarity between users based on their location histories and recommended a set of potential friends in a GIS community to each user. Furthermore, Xiao et al. [26] used the MTM algorithm to estimate the similarity between users from their physical location histories; this method incorporates semantic information that reflects users' interests.

According to the “similarity breeds connection” principle, friends tend to appear at the same locations. Therefore, inferring social strength from users' spatiotemporal co-occurrence data has received increasing attention [16, 24, 22, 10, 15, 1, 12]. Pham et al. [16] used a quad-tree data structure to describe spatiotemporal co-occurrence by building users' visit vectors and co-occurrence vectors. Sun et al. [22] inferred the semantics of social relationships from digital social footprints, taking a binary perspective on social ties. Jung et al. [10] proposed a model that infers social connections between smart objects and predicts connection strength from the objects' co-usage data. Some scholars have identified specific data features from which social relations can be inferred. Njoo et al. [15] extracted four key features from users' spatiotemporal behavior data, namely diversity, universality, persistence and stability, and built a social-relevance reasoning framework on these four features to distinguish real friends from familiar strangers. Li et al. [12] inferred social relationships from individual vehicle mobility data, which is likewise an instance of inferring social networks from spatiotemporal data. Pham et al. [17] inferred social connections by analyzing users' location information, which is useful in application domains ranging from sales and marketing to intelligence analysis. In particular, they proposed an entropy-based model (EBM) that infers social connections and estimates their strength by analyzing users' co-occurrences in space and time. User activities can be expressed in two dimensions, time and space, and Pham et al. described spatiotemporal co-occurrence with a quad-tree structure: the space is divided into quadrant cells, and the cells in each quadrant are subdivided recursively until the whole space is partitioned. Each cell represents a location, and larger cells indicate that the location is very active. Each location contains different users co-occurring at different times.

In previous studies, indicators such as activity trajectory, co-occurrence frequency, and behavioral similarity have all been used to measure the social relationship between two people, providing a good research foundation for our work. However, these methods do not consider the particular factors of the campus scene and are not suitable for inferring students’ social strength.

2.2 Inferring student social strength

These methods of inferring social connections can also be applied in education. Many scholars use the various data generated by student activities to recommend social connections between students. Yao et al. [28] used each student's consecutive check-in records to infer the student's social network, and new methods of inferring social connections continue to be proposed on this basis. Liu et al. [14] obtained social associations among students by analyzing the card consumption data stream of 17,795 students and summarized the characteristics of students' social interaction, such as the small-world phenomenon and the tendency of lower-grade students to form large communities while higher-grade students form small ones. Xu et al. [27] proposed a hierarchical encounter model based on association rules to correct the errors caused by the homogeneity of students' behaviors within majors and grades in social association analysis, and achieved good results. However, these methods do not fully consider the trajectories of students' activities. Ebadi et al. [7] proposed an activity-mobility trajectory construction algorithm based on prior knowledge of student behavior patterns and campus card consumption data, which can effectively predict students' movement trajectories and provides a basis for comparing the similarity of students' trajectories.

Time slicing is a common and fairly reliable method for inferring students' social strength. However, measuring co-occurrence in the time dimension only through time slices has limitations that can lead to inaccurate results: the time interval between two students' visits affects how well a co-occurrence reflects social interaction, and the above methods do not take this into account.

3. Method overview

Given the location density and time sensitivity of campus scenes, we propose S-EBA to infer students' social links. The framework is shown in Fig. 1. First, the spatiotemporal information in students' multi-source heterogeneous behavior data is extracted and analyzed to obtain each student's activity sequence. A visit matrix is then constructed over the two dimensions of time and space, and the co-occurrence matrix is weighted to account for time intervals. Based on the co-occurrence matrix, three features, diversity, spatiotemporal hotspots and behavioral similarity, are introduced to measure social interaction. Finally, the social situation of student groups and individuals is analyzed through social networks. The whole method therefore consists of four parts.

(1)
Behavior data analysis: Students' actual behavior data are collected from their daily activities, including check-in data, consumption data and gateway login data. Because the data are multi-source and heterogeneous, they must be sorted and cleaned to extract the spatiotemporal information. Statistical methods are also applied to the student behavior data, revealing regular patterns that further demonstrate the particularity of the campus scene. The process of behavioral data analysis is shown in Fig. 1(1).
(2)
Co-occurrence frequency expression: The model infers social interaction mainly from spatiotemporal co-occurrence. First, the visit matrix of each student is established from the activity sequence: if the student appears at location $l$ at time $t$, the entry at coordinate $(l,t)$ is 1. To measure co-occurrence time more accurately, a weighted time slice method is introduced, so that co-occurrences with different time intervals contribute different social strength. Therefore, the visit time interval (VTI) and the co-occurrence time interval (CTI) are considered when generating students' co-occurrence matrix. The process of co-occurrence frequency expression is shown in Fig. 1(2).
(3)
Social strength calculation: The co-occurrence matrix captures the raw social links between students, but the campus scene requires accounting for several particular circumstances. This model considers diversity, spatiotemporal hotspots and behavioral similarity, and calculates students' social strength from the co-occurrence matrix using all three features together. In particular, this paper proposes location-time entropy to measure spatiotemporal hotspots. The process of social strength calculation is shown in Fig. 1(3).
(4)
Visualization: A social network can be constructed by using social strength as edge weights. Visualizations such as chord diagrams and force-directed graphs are used to analyze how close or distant students' relationships are and to provide guidance for teaching management. This part is shown in Fig. 1(4).

Figure 1.
The S-EBA model framework.

4. Behavioral data analysis

This section details the collection and analysis of student behavior data. In many universities, each student has a campus card with a unique number, which is the primary medium for eating, shopping, showering, etc., and the system records the card's usage history. In addition, the gateway records students' online browsing information, and the library reservation system records students' library visits. Based on these sources, student behavior can be divided into many types, including consumption behavior, online behavior and learning behavior. The multi-source heterogeneous data are sorted and cleaned to obtain the spatiotemporal information of each record, as shown in Table 1.

Table 1
Sign-in data sample

Student ID	Sign-in time	Sign-in location	Consumption amount	Activity type
17****01	2019-3-12 11:12:32	Canteen	10.5	Dining
17****01	2019-3-12 14:40:04	Library	*	Learning
18****01	2019-3-13 21:32:43	Bathroom	3	Showering
…	…	…	…	…

4.1 Data source

To obtain the activity sequences, we clean the data and extract features from the logs of students' behavior. Each record includes time, location, frequency, activity type and consumption amount.

1) Consumption

The student campus card is the primary payment medium for campus dining, shopping and other consumption. When a student uses the card, the system records the time, location and amount of the transaction. Based on the school's daily schedule and the function of each location, consumption behavior can be divided into several types, including breakfast, lunch, dinner, shopping and showering.

2) Entering the library

The library is the main place for students to study and read. When students enter the library, they usually authenticate through face recognition or their campus card, and the entry time and library name are recorded. Students' seat reservations and book borrowing and returns are also recorded.

3) Gateway login

Most colleges and universities have set up their own campus LAN, and students must log in to the gateway system to access the Internet through it. The gateway is deployed between the Internet and the campus LAN; it performs protocol conversion between the two networks, interconnects them, and records each student's login time, login IP, access time and network traffic. Every gateway login generates a record, from which students' online time, web browser type and other information can be obtained.

4.2 Privacy protection

Students' privacy must be protected during data collection. Therefore, after obtaining students' consent, personal information is encrypted throughout data collection and processing. Specifically, basic information masking and one-hot encoding are used to replace all students' personal information.

We encode basic information such as student IDs by creating a mapping table between real student IDs and ID codes, in which each real identity is encoded as a unique, anonymous alphanumeric identifier. The coded ID replaces the actual student ID in every data source to ensure anonymity throughout the experiment. To simplify the experimental process, a mapping table between activity locations and codes is also created, and each location is assigned its code. The activity time is recorded in the format “year-month-day hour:minute:second”. For ease of calculation, we convert it to the format “DD:mm”, where DD is the day of the semester and mm is the minute of the day (minutes elapsed since midnight). The mapping method is shown in Fig. 2.

Figure 2.

Time mapping table. The figure shows a way to map time to numbers. SITIME represents the sign-in time, SIDAY represents the day of the sign-in time in the data set, and SIMIN represents the minute of the sign-in time in the data set. For example, 2019/2/18 is 1, 2019/2/19 is 2; 09:14 is 9 $\times$ 60 $+$ 14 $=$ 554.
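The time mapping above can be sketched in Python. This is a minimal illustration, assuming that day 1 of the semester is 2019-02-18 (as in the Fig. 2 example) and that raw sign-in times follow the Table 1 layout `YYYY-M-D HH:MM:SS`; the function name `map_sign_in_time` is hypothetical.

```python
from datetime import date, datetime

# Assumption taken from the Fig. 2 example: 2019-02-18 is SIDAY 1.
SEMESTER_START = date(2019, 2, 18)


def map_sign_in_time(sign_in: str) -> tuple[int, int]:
    """Map a raw sign-in time to (SIDAY, SIMIN).

    SIDAY: 1-based day of the semester.
    SIMIN: minutes elapsed since midnight (seconds are dropped).
    """
    # strptime accepts non-zero-padded months/days such as "2019-3-12".
    ts = datetime.strptime(sign_in, "%Y-%m-%d %H:%M:%S")
    siday = (ts.date() - SEMESTER_START).days + 1
    simin = ts.hour * 60 + ts.minute
    return siday, simin


print(map_sign_in_time("2019-2-18 09:14:00"))  # (1, 554), as in Fig. 2
print(map_sign_in_time("2019-3-12 11:12:32"))  # (23, 672), first row of Table 1
```

Seconds are truncated here to match the minute-level granularity of the DD:mm format.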

4.3 Student behavior pattern discovery

A detailed analysis of the spatiotemporal information in students' activities reveals clear patterns in campus activity. Figure 3 is a calendar heat map of students' daily activity counts over a semester, and Figure 4 further breaks down the frequency of students' activities by week and by day. Over the semester, students' activities follow a cycle of $T=7$ days: in the first five days of each cycle, activity frequency is markedly higher than in the following two days, because students are more active on weekdays than on rest days (see Fig. 4a and b). Within a day, activity peaks are concentrated in the morning, noon and evening meal periods (see Fig. 4c and d), indicating that students are most active during these three periods. This analysis illustrates that student activities exhibit spatiotemporal hotspots.
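The weekly and daily patterns described above can be checked with simple frequency counts over the mapped (SIDAY, SIMIN) values. The sketch below is an illustrative assumption, not the authors' analysis code; it relies on the fact that SIDAY 1 (2019-02-18) was a Monday, so `(SIDAY - 1) % 7` gives the weekday index. The function names are hypothetical.

```python
from collections import Counter

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_profile(sidays):
    """Count activity records per weekday (SIDAY 1 = Monday, 2019-02-18)."""
    counts = Counter(WEEKDAYS[(d - 1) % 7] for d in sidays)
    return {w: counts[w] for w in WEEKDAYS}


def hourly_profile(simins):
    """Count activity records per hour of the day from SIMIN values."""
    counts = Counter(m // 60 for m in simins)
    return [counts[h] for h in range(24)]


# Hypothetical records: days 1 and 8 are Mondays; day 6 is a Saturday.
print(weekday_profile([1, 2, 8, 6]))
print(hourly_profile([554, 560, 720])[9:13])  # [2, 0, 0, 1]
```

A weekday profile in which Monday to Friday counts clearly exceed Saturday and Sunday corresponds to the $T=7$ cycle in Fig. 3, and hourly peaks around meal times correspond to Fig. 4c and d.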

Figure 3.

Calendar chart of student activity frequency. The figure shows the frequency of student activities over the 21 weeks of the semester in order. Students are more active in the first five days of each week than in the last two, and activity frequency over the semester follows a cycle of T $=$ 7 (days). However, the last few weeks do not follow this periodicity; on inspection, activities such as internships and examinations disrupted the regularity of activity frequency.

Figure 4.

Student activity frequency. (a) and (b) show that activity frequency in the first 5 days of the week is significantly higher than in the last 2 days, because the last 2 days are weekends, when most students go home or leave campus; (c) and (d) show the distribution of activities on the first and second days, where the peak periods fall mainly in the meal periods.

Figure 5.

Model details of co-occurrence frequency and social strength.

5. Co-occurrence frequency expression

This section describes how the co-occurrence frequency of two students is computed. In particular, we consider the influence of the visit time interval (VTI) and the co-occurrence time interval (CTI) on the co-occurrence frequency. The details of the model process are shown in Fig. 5a.

5.1 Visit frequency

The student set is represented by $S=(s_{1},s_{2},\ldots,s_{M})$, the activity location set by $L=(l_{1},l_{2},\ldots,l_{N})$, and the activity time set by $T=(t_{1},t_{2},\ldots,t_{K})$, where $M$ is the total number of students, $N$ is the total number of locations, and $K$ is the total number of time periods. The social strength of students can be inferred from the set of activity records $\{\langle s,l,t\rangle \mid s\in S,\ l\in L,\ t\in T\}$.

The visit vector of student $i$ can be expressed as $(\langle v_{i,l_{1},t_{1}},v_{i,l_{1},t_{2}},\ldots,v_{i,l_{1},t_{K}}\rangle,\langle v_{i,l_{2},t_{1}},v_{i,l_{2},t_{2}},\ldots,v_{i,l_{2},t_{K}}\rangle,\ldots,\langle v_{i,l_{N},t_{1}},v_{i,l_{N},t_{2}},\ldots,v_{i,l_{N},t_{K}}\rangle)$, where $\langle v_{i,l_{n},t_{1}},v_{i,l_{n},t_{2}},\ldots,v_{i,l_{n},t_{K}}\rangle$ is the time list of student $i$'s visits to location $l_{n}$, and $v_{i,l_{n},t_{k}}$ is the visit frequency of student $i$ at location $l_{n}$ during time period $t_{k}$. Arranging the visit vector as a matrix gives the visit matrix:

$\displaystyle V_{i}=\left[\begin{array}{cccc}v_{i,l_{1},t_{1}}&v_{i,l_{1},t_{2}}&\cdots&v_{i,l_{1},t_{K}}\\ v_{i,l_{2},t_{1}}&v_{i,l_{2},t_{2}}&\cdots&v_{i,l_{2},t_{K}}\\ \vdots&\vdots&\ddots&\vdots\\ v_{i,l_{N},t_{1}}&v_{i,l_{N},t_{2}}&\cdots&v_{i,l_{N},t_{K}}\end{array}\right]$ (1)

The visit matrix $V$ of all students can be expressed as:

$\displaystyle V=[V_{1},\ldots,V_{i},\ldots,V_{M}],\quad i\in[1,M]$ (2)
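Eqs. (1) and (2) can be built directly from the activity records. The sketch below is a minimal plain-Python illustration, assuming each record has already been reduced to a `(student, location, period_index)` triple with 0-based period indices (so $t_1$ maps to index 0); the function names are hypothetical. Each entry counts visits, matching the definition of $v_{i,l_n,t_k}$ as a visit frequency; the binary version described in the method overview is `min(count, 1)`.

```python
def visit_matrix(records, student, locations, num_periods):
    """Return V_i as an N x K list of lists (Eq. 1).

    V_i[n][k] counts how often `student` appeared at locations[n]
    during period k.
    """
    row_of = {loc: n for n, loc in enumerate(locations)}
    V_i = [[0] * num_periods for _ in locations]
    for s, loc, k in records:
        if s == student:
            V_i[row_of[loc]][k] += 1
    return V_i


def all_visit_matrices(records, students, locations, num_periods):
    """Return V = [V_1, ..., V_M] (Eq. 2), one matrix per student."""
    return [visit_matrix(records, s, locations, num_periods) for s in students]


records = [
    ("s1", "canteen", 0), ("s1", "canteen", 0),
    ("s1", "library", 2), ("s2", "canteen", 1),
]
V = all_visit_matrices(records, ["s1", "s2"], ["canteen", "library"], 3)
print(V[0])  # [[2, 0, 0], [0, 0, 1]]
print(V[1])  # [[0, 1, 0], [0, 0, 0]]
```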

5.2 Co-occurrence frequency

If the interval between the visit times of student $i$ and student $j$ at the same location is at most $\gamma$, the two students have a spatiotemporal co-occurrence (Fig. 6). The more such co-occurrences, the closer the social relationship between the two students.

The spatiotemporal co-occurrence matrix can be constructed based on the visit matrix of student $i$ and student $j$ , which is defined as follows:

$\displaystyle C_{ij}=\left[\begin{array}{cccc}c_{ij,l_{1},t_{1}}&c_{ij,l_{1},t_{2}}&\cdots&c_{ij,l_{1},t_{K}}\\ c_{ij,l_{2},t_{1}}&c_{ij,l_{2},t_{2}}&\cdots&c_{ij,l_{2},t_{K}}\\ \vdots&\vdots&\ddots&\vdots\\ c_{ij,l_{N},t_{1}}&c_{ij,l_{N},t_{2}}&\cdots&c_{ij,l_{N},t_{K}}\end{array}\right]$ (3)

where the day is divided into $K$ time periods, and $c_{ij,l_{n},t_{k}}$ is the spatiotemporal co-occurrence frequency of student $i$ and student $j$ at location $l_{n}$ during time period $t_{k}$. The spatiotemporal co-occurrence matrices $C$ and $\hat{C}$ for all students can be expressed as:

$\displaystyle C=[C_{12},\ldots,C_{ij}],\quad i\in[1,M],\ j\in[1,M],\ i\neq j$ (4)

$\displaystyle\hat{C}=[C_{12},\ldots,C_{ji}],\quad j\in[1,M],\ i\in[1,M],\ j\neq i$ (5)
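A minimal sketch of how the entries $c_{ij,l,t}$ and the collections in Eqs. (4) and (5) could be computed. It assumes visits are given as `(location, minute)` pairs with minutes counted from the start of the semester (for example $(\text{SIDAY}-1)\times 1440+\text{SIMIN}$), that a co-occurrence is attributed to the period containing the earlier of the two visits, and that periods have a fixed, hypothetical length `period_len`. Because $|t_i-t_j|=|t_j-t_i|$, $C_{ij}=C_{ji}$, so only unordered pairs need to be computed.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(visits_i, visits_j, gamma, period_len):
    """Return {(location, period): c_ij} for one pair of students.

    Every pair of visits at the same location whose times differ by at
    most gamma counts as one co-occurrence.
    """
    times_j = defaultdict(list)
    for loc, t in visits_j:
        times_j[loc].append(t)
    counts = defaultdict(int)
    for loc, ti in visits_i:
        for tj in times_j[loc]:
            if abs(ti - tj) <= gamma:
                counts[(loc, min(ti, tj) // period_len)] += 1
    return dict(counts)


def all_pair_cooccurrences(visits, gamma, period_len):
    """Return {(i, j): C_ij} for all ordered pairs i != j (Eqs. 4 and 5)."""
    C = {}
    for i, j in combinations(sorted(visits), 2):
        C[(i, j)] = cooccurrence_counts(visits[i], visits[j], gamma, period_len)
        C[(j, i)] = C[(i, j)]  # symmetric: both keys share one result dict
    return C


visits = {
    "s1": [("canteen", 700), ("library", 900)],
    "s2": [("canteen", 705), ("library", 1000)],
}
C = all_pair_cooccurrences(visits, gamma=10, period_len=1440)
print(C[("s1", "s2")])  # {('canteen', 0): 1}
```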

5.3 Visit time interval

A sliding time window of width $\gamma$ is introduced: if the visit time interval $\Delta t$ between student $i$ and student $j$ at the same location $l$ is at most $\gamma$, the two students are considered to appear at the same time. The smaller $\Delta t$, that is, the smaller the visit time interval (VTI) between the two students, the closer their social relationship. Therefore, the influence of the visit time interval, $\overline{VT}(ij,l,t)$, is introduced as a weight on the co-occurrence frequency. As shown in Fig. 7, the visit time interval $\Delta t^{(v)}$ between student $i$ and student $j$ at location $l$ can be expressed as:

$\displaystyle\Delta{{{t}}^{(v)}}=|{{t_{v}}_{i,l}-{t_{v}}_{j,l}}|$ (6)

The influence of visit time interval can be expressed as:

$\displaystyle\overline{VT}(ij,l,t)=e^{\left(-\frac{\Delta t^{(v)}}{\tau}\right)}$ (7)

where $\overline{VT}(ij,l,t)\in(0,1]$ , ${t_{v}}_{i,l}$ is the visit time of student $i$ at location $l$ , ${t_{v}}_{j,l}$ is the visit time of student $j$ at location $l$ , and $\tau$ is the expected time delay, which can be taken as the average time delay between two users visiting the same location. In this article, $\tau$ is 1, and the formula is simplified as:

$\displaystyle\overline{VT}({ij,l,t})={e^{(-(\Delta{{{t}}^{(v)}}))}}$ (8)
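Eqs. (7) and (8) translate into a one-line weight function. This sketch assumes $\tau$ and $\Delta t^{(v)}$ are measured in the same unit; the unit is not stated here, and it matters when $\tau=1$: with minutes, a 5-minute gap already gives $e^{-5}\approx 0.0067$. The function name `vt_weight` is hypothetical.

```python
import math


def vt_weight(dt_v, tau=1.0):
    """Influence of the visit time interval, exp(-dt_v / tau) (Eq. 7).

    With tau = 1 this is Eq. (8). Mathematically the result lies in
    (0, 1] for dt_v >= 0.
    """
    if dt_v < 0 or tau <= 0:
        raise ValueError("dt_v must be non-negative and tau positive")
    return math.exp(-dt_v / tau)


print(vt_weight(0))         # 1.0: simultaneous visits get full weight
print(vt_weight(2, tau=2))  # exp(-1), about 0.368
```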


Figure 6.

Student co-occurrence. The check-in time of student $i$ at location $l$ is recorded as ${t_{v}}_{i,l}$, and the length of the color bar is the activity time of student $i$. If ${\Delta}t_{(i,j)}<\gamma$, the two students have a spatiotemporal co-occurrence.

When the VTI is large, $\overline{VT}(ij,l,t)$ is small. For example, suppose the co-occurrence interval of students $i$ and $j$ is $\Delta t_{1}$, that of students $p$ and $q$ is $\Delta t_{2}$, and $\Delta t_{1}<\Delta t_{2}$. Then the co-occurrence of students $i$ and $j$ receives a larger weight than that of students $p$ and $q$. The VTI-weighted spatiotemporal co-occurrence can be expressed as:

$\displaystyle c_{ij,l,t}^{(\omega)}=\left\{\begin{array}{ll}\overline{VT}(ij,l,t)\times c_{ij,l,t},&\text{if }\Delta t^{(v)}\leqslant\gamma\\ 0,&\text{otherwise}\end{array}\right.$ (9)

where $\gamma$ is the specified maximum interval. When $\Delta t^{(v)}\leqslant\gamma$, the co-occurrence frequency is weighted by the VTI influence; otherwise, it is 0.
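Combining the window condition with the VTI weight gives a sketch of Eq. (9). As an assumption, each matching pair of visits is treated as one co-occurrence ($c=1$) carrying its own $\overline{VT}$ weight, and the weights of co-occurrences falling in the same (location, period) cell are summed. Visits are `(location, minute)` pairs, and `period_len` is a hypothetical fixed period length.

```python
import math
from collections import defaultdict


def weighted_cooccurrence(visits_i, visits_j, gamma, period_len, tau=1.0):
    """Return {(location, period): sum of VT-weighted co-occurrences} (Eq. 9).

    Visit pairs at the same location with |t_i - t_j| <= gamma contribute
    exp(-|t_i - t_j| / tau); all other pairs contribute 0.
    """
    times_j = defaultdict(list)
    for loc, t in visits_j:
        times_j[loc].append(t)
    c_w = defaultdict(float)
    for loc, ti in visits_i:
        for tj in times_j[loc]:
            dt_v = abs(ti - tj)
            if dt_v <= gamma:
                c_w[(loc, min(ti, tj) // period_len)] += math.exp(-dt_v / tau)
    return dict(c_w)


result = weighted_cooccurrence(
    [("canteen", 700)], [("canteen", 701), ("canteen", 800)],
    gamma=10, period_len=1440)
print(result)  # one cell, ('canteen', 0), with weight exp(-1), about 0.368
```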

Figure 7.

Effect of the visit time interval. The figure shows the visit time series of four students, $i$, $j$, $p$ and $q$. Assume that $i$ and $j$ visit the same location, with $i$ arriving at time ${t_{v}}_{i,l}$ and $j$ at time ${t_{v}}_{j,l}$; the visit time interval between them is ${{\Delta}t_{(i,j)}}^{(v)}$. Similarly, the visit time interval between students $p$ and $q$ is ${{\Delta}t_{(p,q)}}^{(v)}$, with ${{\Delta}t_{(i,j)}}^{(v)}<{{\Delta}t_{(p,q)}}^{(v)}$. Hence the co-occurrence weight of students $i$ and $j$ is greater than that of students $p$ and $q$.

Figure 8.

Influence of the interval between two co-occurrences. The figure shows the co-occurrence time series of students $i$, $j$ and students $p$, $q$. $t_{c_{ij,t_{x-1}}}$ is the time of the $(x-1)$-th co-occurrence of students $i$ and $j$, and $t_{c_{ij,t_{x}}}$ is the time of their next ($x$-th) co-occurrence. ${{\Delta}t_{(i,j)}}^{(c)}$ is the co-occurrence time interval between these two adjacent co-occurrences of students $i$ and $j$; likewise, ${{\Delta}t_{(p,q)}}^{(c)}$ is the interval between two adjacent co-occurrences of students $p$ and $q$. Since ${{\Delta}t_{(i,j)}}^{(c)}<{{\Delta}t_{(p,q)}}^{(c)}$, the co-occurrence weight of students $p$ and $q$ is greater than that of students $i$ and $j$.

5.4 Co-occurrence time interval

We also consider the influence of time on co-occurrence. Given the co-occurrence sequence $\langle c_{ij,l,t_{1}},c_{ij,l,t_{2}},\ldots,c_{ij,l,t_{k}}\rangle$, where $c_{ij,l,t}$ denotes a co-occurrence between student $i$ and student $j$, $l$ is the location where it occurs, $t$ is the time when it occurs, and $t_{1}<t_{2}<\ldots<t_{k}$, co-occurrences are ordered by the time at which they are established. The more scattered the co-occurrences are along the time axis, the greater their influence on the co-occurrence frequency. The co-occurrence time interval (CTI) $\Delta t^{(c)}$ is defined as the absolute difference between the timestamps of two adjacent co-occurrences $c_{ij,t_{x-1}}$ and $c_{ij,t_{x}}$. Note that these two co-occurrences are not necessarily at the same location. As shown in Fig. 8, $\Delta t^{(c)}$ can be expressed as:

$\displaystyle\Delta t^{(c)}=\left|t_{c_{ij,t_{x-1}}}-t_{c_{ij,t_{x}}}\right|$ (10)

where $t_{c_{ij,t_{x-1}}}$ is the time of the $(x-1)$-th co-occurrence of students $i$ and $j$, and $t_{c_{ij,t_{x}}}$ is the time of their $x$-th co-occurrence.

Then the influence of co-occurrence time interval can be expressed as:

$\displaystyle\overline{CT}({ij,l,t})=\textit{norm}(a\times\Delta{{{t}}^{(c)}})$ (11)

where $\textit{norm}(\cdot)$ is min-max normalization (MinMaxScaler), which maps $\overline{CT}(ij,l,t)$ into $(0,1]$, and $a$ is a scaling parameter determined by the co-occurrence time intervals. For example, suppose the interval between two spatiotemporal co-occurrences of students $i$ and $j$ is ${{\Delta t}}_{1}$, the corresponding interval for students $p$ and $q$ is ${{\Delta t}}_{2}$, and ${{\Delta t}}_{1}>{{\Delta t}}_{2}$. Then the co-occurrence frequency of students $i$ and $j$ receives a larger weight than that of students $p$ and $q$. In this article, $a$ is set to the reciprocal of the maximum of all co-occurrence time intervals, namely $a=\frac{1}{\max\Delta t^{(c)}}$, so the formula simplifies to:

$\displaystyle\overline{CT}(ij,l,t)=\textit{norm}\left(\frac{\Delta t^{(c)}}{\max\Delta t^{(c)}}\right)$ (12)
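As a rough illustration of Eqs. (10)-(12), the sketch below computes the interval between adjacent co-occurrences of each student pair, scales every interval by the largest one observed across all pairs (that is, $a=1/\max\Delta t^{(c)}$), and then applies min-max normalization. The function name `cti_weights` and the input layout (a dictionary from a student pair to its co-occurrence timestamps) are assumptions for illustration. Note that plain min-max scaling maps the smallest interval to 0, whereas the text states the range $(0,1]$; neither that lower-end handling nor the weight of a pair's first co-occurrence (which has no predecessor) is specified here.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def cti_weights(co_times: Dict[Pair, List[float]]) -> Dict[Pair, List[float]]:
    """Hypothetical helper sketching Eqs. (10)-(12).

    co_times maps a student pair to the timestamps of its co-occurrences.
    Returns, per pair, one weight for each co-occurrence after the first,
    i.e. for each adjacent pair of co-occurrences (x-1, x).
    """
    # Eq. (10): absolute gap between adjacent co-occurrences of the same pair
    gaps: Dict[Pair, List[float]] = {}
    for pair, times in co_times.items():
        ts = sorted(times)
        gaps[pair] = [abs(ts[x] - ts[x - 1]) for x in range(1, len(ts))]

    all_gaps = [g for gs in gaps.values() for g in gs]
    if not all_gaps:
        return {pair: [] for pair in gaps}

    # Eq. (12): a = 1 / max gap over all pairs, so each scaled gap lies in [0, 1]
    top = max(all_gaps)
    scaled = {p: [g / top if top else 0.0 for g in gs] for p, gs in gaps.items()}

    # norm(.): min-max scaling over all scaled gaps; a constant input maps to 1.0
    flat = [s for ss in scaled.values() for s in ss]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return {
        p: [(s - lo) / span if span else 1.0 for s in ss]
        for p, ss in scaled.items()
    }
```

For instance, with co-occurrence times [0, 10, 30] for students $i$, $j$ and [0, 5] for students $p$, $q$, the longer gaps of $i$ and $j$ receive the larger weights, in line with the example after Eq. (11).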

The co-occurrence frequency under the influence of time weight can be obtained from Eqs. (9) and (12):

$\displaystyle c_{ij,l,t}^{(\omega^{\prime})}=\begin{cases}\overline{VT}(ij,l,t)\times\overline{CT}(ij,l,t)\times c_{ij,l,t}, & \text{if }\Delta t^{(v)}\leqslant\gamma\\ 0, & \text{otherwise}\end{cases}$ (13)

The procedure for acquiring the spatiotemporal co-occurrence matrix under the influence of time weight is summarized in Algorithm 5.4.

Algorithm 5.4: Acquiring the spatiotemporal co-occurrence matrix with the influence of time weight
Input: activity sequence ${(<s,l,t>,s\in S,l\in L,t\in T)}$; fixed time interval $\gamma$; $M$, $N$, $K$
Output: spatiotemporal co-occurrence matrix $C_{ij}$, $i\in[1,M]$, $j\in[1,M]$, $i\neq j$
Create a zero matrix $V$ with dimensions $(M, N, K)$
while not activity sequence.empty() do
    if $S:<s,l,t>\neq 0$ then
        $V_{i,l,t}=S:<s,l,t>$
    $s-=1$
for $i$ in $1$ to $M-1$ do
    for $j$ in $i+1$ to $M$ do
        for $l$ in $1$ to $N$ do
            for $t$ in $1$ to $K$ do
                if $|{t_{v}}_{i,l}-{t_{v}}_{j,l}|\leqslant\gamma$ then
                    calculate $\overline{VT}(ij,l,t)$ by using Eq. (8)
                    $c_{ij,l,t}=\overline{VT}(ij,l,t)\times v_{i,l,t}$
                else
                    $c_{ij,l,t}=0$
                calculate $\overline{CT}(ij,l,t)$ by using Eq. (11), which uses the last co-occurrence time
                $c_{ij,l,t}=\overline{CT}(ij,l,t)\times c_{ij,l,t}$
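The pair loop of Algorithm 5.4 and the gating in Eq. (13) can be sketched as follows. Eq. (8) for $\overline{VT}$ is defined earlier in the paper and is not reproduced here, so it is passed in as a callable `vt`; the `lambda` in the usage lines is a placeholder, not the paper's formula. Each shared visit counts as $c_{ij,l,t}=1$, arrival times are assumed to be in minutes, and the names `gated_co_occurrences` and `apply_ct` are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Tuple

Slot = Tuple[int, int]  # (location index l, time-slot index t)


def gated_co_occurrences(
    arrivals: Dict[str, Dict[Slot, float]],
    gamma: float,
    vt: Callable[[float], float],
) -> Dict[Tuple[str, str], Dict[Slot, float]]:
    """For each pair i < j, keep the slots where both students arrived within
    gamma of each other and weight each by vt(delta_t_v); all other slots
    contribute 0 and are simply omitted."""
    result: Dict[Tuple[str, str], Dict[Slot, float]] = {}
    for i, j in combinations(sorted(arrivals), 2):
        weights: Dict[Slot, float] = {}
        for slot in sorted(set(arrivals[i]) & set(arrivals[j])):
            dt_v = abs(arrivals[i][slot] - arrivals[j][slot])
            if dt_v <= gamma:                 # co-occurrence condition of Eq. (13)
                weights[slot] = vt(dt_v)      # VT x c, with c = 1 per shared visit
        result[(i, j)] = weights
    return result


def apply_ct(weights: Dict[Slot, float], ct: Dict[Slot, float]) -> Dict[Slot, float]:
    """Eq. (13): multiply each surviving co-occurrence by its CT weight."""
    return {slot: w * ct[slot] for slot, w in weights.items()}


# Usage with a placeholder VT (NOT Eq. (8)): arrivals in minutes, gamma = 10
arrivals = {
    "i": {(0, 8): 480.0, (1, 12): 720.0},
    "j": {(0, 8): 483.0, (1, 12): 770.0},
}
pairs = gated_co_occurrences(arrivals, gamma=10.0, vt=lambda dt: 1.0 / (1.0 + dt))
weighted = apply_ct(pairs[("i", "j")], ct={(0, 8): 0.5})
```

Only the first slot survives the gate here, because the two students arrived 3 minutes apart there but 50 minutes apart at the second location.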

6. Social strength calculation

After obtaining the time-weighted co-occurrence frequency, we consider the factors that affect students’ social links and select three features: diversity, spatiotemporal hotspot, and behavior similarity. The details of the model process are shown in Fig. 5b.

6.1 Diversity

Students with closer relationships tend to meet at different times and locations, so their co-occurrence times and locations are more diverse. As shown in Fig. 9a, 10 pairs of students are arranged by co-occurrence frequency score, and co-occurrence frequency and diversity are roughly linearly related. However, there are exceptions: the diversity of student pairs D and I does not increase with their co-occurrence frequency. To analyze this phenomenon, we select two pairs of students, G and I. As shown in Fig. 9b, student pair G co-occurs at 6 different locations, whereas student pair I co-occurs at only 2. This leads to a difference in location diversity between the two pairs (G $=$ 14, I $=$ 9), even though the co-occurrence frequency of G is lower than that of I. Shannon entropy can be used to measure spatiotemporal diversity. However, two students may co-occur very frequently at a hot spot during peak hours, and Shannon entropy gives such co-occurrences a high weight. To overcome this problem, our method uses Renyi entropy instead.

Figure 9.

Score distribution of co-occurrence frequency and diversity in different locations.

We calculate the associated Renyi entropy of students $i$ and $j$ over the two dimensions of time and location, defined as follows:

$\displaystyle H_{ij}=\frac{-\log\sum_{l}\sum_{t}\left(\frac{c_{ij,l,t}^{(\omega^{\prime})}}{f_{ij}}\right)^{q}}{q-1}$ (14)

where $f_{ij}=\sum_{l}\sum_{t}c_{ij,l,t}$. By adjusting the parameter $q$, Renyi entropy can flexibly control the influence of local frequencies on the entropy value. Setting $q<1$ effectively reduces the impact of two users’ high-frequency co-occurrences at hot locations and peak times on the entropy value. For example, two students who always eat in the same restaurant at the same time will have a very high co-occurrence frequency, even if they do not know each other.

Diversity is obtained as the exponential of the Renyi entropy. We therefore calculate the spatiotemporal co-occurrence diversity of students $i$ and $j$ by Eq. (15):

$\displaystyle D_{ij}=e^{H_{ij}}=\left(\sum_{l}\sum_{t}\left(\frac{c_{ij,l,t}^{(\omega^{\prime})}}{f_{ij}}\right)^{q}\right)^{1/(1-q)}$ (15)
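A minimal sketch of Eqs. (14)-(15), assuming the time-weighted co-occurrence counts of one pair are given as a flat list over all (location, time) cells. Here $f_{ij}$ is taken as the sum of these weighted counts so that the ratios form a probability distribution; this is an assumption, since the text defines $f_{ij}$ over the unweighted counts. The $q\to 1$ branch uses the Shannon limit, where the closed form of Eq. (15) is undefined. The function name is hypothetical.

```python
import math
from typing import Iterable


def renyi_diversity(weighted_counts: Iterable[float], q: float = 0.1) -> float:
    """Sketch of Eq. (15): D = exp(H_q), the exponential of the order-q Renyi entropy."""
    counts = [c for c in weighted_counts if c > 0]
    if not counts:
        return 0.0
    total = sum(counts)                      # assumed normaliser f_ij
    probs = [c / total for c in counts]
    if math.isclose(q, 1.0):
        # The q -> 1 limit of Renyi entropy is Shannon entropy
        return math.exp(-sum(p * math.log(p) for p in probs))
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))
```

For the skewed counts [9, 1], i.e. one dominant hotspot cell and one rare cell, $q=0.1$ gives a diversity of about 1.90 versus about 1.38 in the Shannon limit, so the rare cell counts for more when $q<1$. For a uniform distribution over $n$ cells the result is $n$ for every $q$.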

Figure 10.

The location-time entropy of 49 locations on campus. The entropy of locations such as restaurants and supermarkets is significantly higher than that of stadiums and shower rooms. The entropy of the same location also differs across time periods; for example, the restaurants peak during the three dining periods.

6.2 Spatiotemporal hotspot

The spatiotemporal co-occurrence of students at different locations and times has different effects on inferring social strength. For example, students $i$ and $j$ appear together in a coffee shop several times during off-peak periods, while students $p$ and $q$ appear together in a restaurant dozens of times during the peak dining period. Nevertheless, we would usually consider the social strength of students $i$ and $j$ to be higher than that of students $p$ and $q$. Location-time entropy is therefore introduced to weight the co-occurrence frequency. Figure 10 shows the location-time entropy heat map of a university, where the abscissa represents time and the ordinate represents location. The larger the entropy value, the more crowded the location at that moment. For example, many students go to a restaurant during the peak dining period, so the entropy of the restaurant during this period is relatively large; once the peak dining period ends, the entropy decreases. $P_{s,l,t}$ is defined as the probability that a visit to location $l$ at time $t$ is made by student $s$:

$\displaystyle P_{s,l,t}=\frac{V_{s,l,t}}{\sum_{s}V_{s,l,t}}$ (16)

where $V_{s,l,t}$ is the number of visits by student $s$ to location $l$ at time $t$, so the denominator is the total number of visits to location $l$ at time $t$.

The location-time entropy is as shown in Eq. (17):

$\displaystyle H_{l,t}=-\sum_{s:\,P_{s,l,t}\neq 0}P_{s,l,t}\log P_{s,l,t}$ (17)

Algorithm 6.2: Acquiring location-time entropy
Input: visit matrices $V_{i}$, $i\in[1,M]$; $M$, $N$, $K$
Output: location-time entropy $H_{l,t}$
$V=V_{1}+V_{2}+\ldots+V_{i}+\ldots+V_{M}$
for $l$ in $1$ to $N$ do
    for $t$ in $1$ to $K$ do
        for $s$ in $1$ to $M$ do
            if $v_{l,t}\neq 0$ then
                calculate $P_{s,l,t}$ by using Eq. (16)
        calculate $H_{l,t}$ by using Eq. (17)
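Algorithm 6.2 and Eqs. (16)-(17) can be sketched as below. Following the loop structure of Algorithm 6.2, which sums the per-student visit matrices and then iterates over students for each (location, time) cell, $P_{s,l,t}$ is computed as student $s$'s share of all visits to that cell. The visit matrices are stored as sparse dictionaries; this layout and the function name are assumptions. Natural logarithms are used.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Slot = Tuple[int, int]  # (location index l, time-slot index t)


def location_time_entropy(visits: Dict[str, Dict[Slot, int]]) -> Dict[Slot, float]:
    """Sketch of Algorithm 6.2 / Eqs. (16)-(17).

    visits[s][(l, t)] is the number of visits by student s to location l in slot t.
    """
    # V = V_1 + ... + V_M: total visits to each (location, time) cell
    totals: Dict[Slot, int] = defaultdict(int)
    for per_student in visits.values():
        for slot, n in per_student.items():
            totals[slot] += n

    entropy: Dict[Slot, float] = {}
    for slot, total in totals.items():
        if total == 0:
            continue
        h = 0.0
        for per_student in visits.values():
            n = per_student.get(slot, 0)
            if n:                       # only students with P_{s,l,t} != 0
                p = n / total           # Eq. (16): share of the cell's visits made by s
                h -= p * math.log(p)    # Eq. (17)
        entropy[slot] = h
    return entropy
```

A cell visited once by each of four students gets $\ln 4$, while a cell visited only by a single student gets 0, however often that student goes there.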

Algorithm 6.2 computes the location-time entropy. Using it to weight the co-occurrence frequency, we calculate the spatiotemporal hotspot feature of students $i$ and $j$, as shown in Eq. (18):

$\displaystyle F_{ij}=\sum_{l}\sum_{t}c_{ij,l,t}^{(\omega^{\prime})}\times e^{-H_{l,t}}$ (18)

where $e^{(-H_{l,t})}\in(0,1]$ serves as the weight of the co-occurrence frequency: the larger the location-time entropy, the smaller $e^{(-H_{l,t})}$. For example, the canteen has a high location-time entropy during the peak dining period, so weighting by $e^{(-H_{l,t})}$ reduces the influence of the co-occurrences of students $i$ and $j$ at this spatiotemporal hotspot. Conversely, when students $i$ and $j$ co-occur in the restaurant during an off-peak period, the lower entropy gives their co-occurrence a larger weight.
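Eq. (18) reduces to a weighted sum. In the sketch below, a pair's time-weighted co-occurrences and the location-time entropies are both assumed to be dictionaries keyed by (location, time slot), and the numbers in the usage lines are made up to mirror the coffee-shop versus restaurant example in Sec. 6.2. If each of $n$ students visits a cell exactly once, $H_{l,t}=\ln n$ and the weight $e^{-H_{l,t}}$ is simply $1/n$.

```python
import math
from typing import Dict, Tuple

Slot = Tuple[int, int]  # (location index l, time-slot index t)


def hotspot_score(co: Dict[Slot, float], entropy: Dict[Slot, float]) -> float:
    """Eq. (18): sum of time-weighted co-occurrences, each damped by exp(-H_{l,t})."""
    return sum(c * math.exp(-entropy[slot]) for slot, c in co.items())


# Illustrative numbers only: 3 off-peak coffee-shop co-occurrences (H = 0)
# versus 20 peak-hour restaurant co-occurrences where H = ln(30).
coffee = hotspot_score({(5, 15): 3.0}, {(5, 15): 0.0})
canteen = hotspot_score({(2, 12): 20.0}, {(2, 12): math.log(30)})
```

Despite having far fewer raw co-occurrences, the coffee-shop pair ends up with the higher hotspot score, which is the behavior the text argues for.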

6.3 Behavioral similarity

Two students with strong social strength tend to have similar behavioral characteristics. To describe the distribution of the quantitative attribute values in time-series behavior data, we calculate the mean, range and mode together with the minimum, first quartile, median, third quartile and maximum, which jointly express the central tendency and dispersion of the distribution. These statistics serve as the students' behavioral characteristics, and the behavioral difference between students $i$ and $j$ is measured by the squared Euclidean distance:

$\displaystyle D_{ij}^{k}=\sum\limits_{n}{{{(A_{i}^{k,n}-A_{j}^{k,n})}^{2}}}$ (19)

where $D_{ij}^{k}$ represents the difference between students $i$ and $j$ in the $k$-th behavior, and $A_{i}^{k,n}$ represents the $n$-th feature of student $i$ in the $k$-th behavior. We use $e^{(-D_{ij}^{k})}$ to express the behavioral similarity and hence its contribution to the social strength between students $i$ and $j$: the more similar the behavior, the smaller $D_{ij}^{k}$ and the stronger the social strength.
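A sketch of Sec. 6.3 for a single behavior $k$ (for instance, daily breakfast times in hours), using only the Python standard library. The eight statistics follow the list in the text, with `statistics.quantiles` (default settings) supplying the quartiles. Features are compared on their raw scale here, which is an assumption; standardizing them first would keep any single statistic from dominating Eq. (19). The function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def behaviour_features(values: Sequence[float]) -> List[float]:
    """Statistics from Sec. 6.3 for one behaviour of one student (needs >= 2 values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return [
        statistics.fmean(values),       # mean
        max(values) - min(values),      # range
        statistics.mode(values),        # mode
        min(values),
        q1,                             # first quartile
        median,
        q3,                             # third quartile
        max(values),
    ]


def behaviour_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (19): squared Euclidean distance D_ij^k between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def behaviour_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """exp(-D_ij^k): 1.0 for identical features, tending to 0 as they diverge."""
    return math.exp(-behaviour_distance(a, b))
```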

6.4 Social strength

Diversity, spatiotemporal hotspot (the entropy-weighted co-occurrence frequency) and behavioral similarity are the three features used to calculate social strength. These three independent features are standardized and summed to express the social strength of students:

$\displaystyle{S_{ij}}={D_{ij}}+{F_{ij}}+\sum\limits_{k}{D_{ij}^{k}}$ (20)

To determine the weights of these three independent features by multiple regression, we rewrite Eq. (20) in explicit parametric form:

$\displaystyle{S_{ij}}=\alpha\cdot{D_{ij}}+\beta\cdot{F_{ij}}+\sum\limits_{k}{{% \gamma_{k}}\cdot D_{ij}^{k}}+\varepsilon$ (21)

where $\alpha$, $\beta$, $\gamma_{k}$ and $\varepsilon$ are determined by linear regression. In subsequent experiments, we set $\gamma_{k}=1$ and $\varepsilon=0$. The algorithm for inferring social strength is shown in Algorithm 6.4.

Algorithm 6.4: Calculating social strength
Input: spatiotemporal co-occurrence matrix $C_{ij}$, $i\in[1,M]$, $j\in[1,M]$, $i\neq j$; location-time entropy $H_{l,t}$; parameter $q$; $k$
Output: social strength $S_{ij}$
for $i$ in $1$ to $M-1$ do
    for $j$ in $i+1$ to $M$ do
        calculate the co-occurrence diversity $D_{ij}$ by using Eq. (15)
        calculate the spatiotemporal hotspot $F_{ij}$ by using Eq. (18)
        calculate the behavioral similarity $D_{ij}^{k}$ by using Eq. (19)
        calculate the social strength $S_{ij}$ by using Eq. (21)
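Eq. (21) with $\gamma_k=1$ and $\varepsilon=0$ leaves two free coefficients. The sketch below fits $\alpha$ and $\beta$ by ordinary least squares through the 2×2 normal equations, assuming the regression target is a vector `y` of ground-truth labels (the text does not state the target) and that `b` already holds the summed behavioral term of each pair. The synthetic data in the usage lines are generated with $\alpha=0.15$, $\beta=0.85$ only to show that the fit recovers them; all names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_alpha_beta(
    d: Sequence[float], f: Sequence[float], b: Sequence[float], y: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares alpha, beta for y ~ alpha*d + beta*f + b (gamma_k = 1, eps = 0)."""
    r = [yi - bi for yi, bi in zip(y, b)]   # move the fixed behaviour term to the target
    sdd = sum(x * x for x in d)
    sff = sum(x * x for x in f)
    sdf = sum(x * z for x, z in zip(d, f))
    sdr = sum(x * z for x, z in zip(d, r))
    sfr = sum(x * z for x, z in zip(f, r))
    det = sdd * sff - sdf * sdf
    if det == 0:
        raise ValueError("d and f are collinear; alpha and beta are not identifiable")
    alpha = (sdr * sff - sfr * sdf) / det
    beta = (sfr * sdd - sdr * sdf) / det
    return alpha, beta


def social_strength(d: float, f: float, b: float, alpha: float, beta: float) -> float:
    """Eq. (21) with gamma_k = 1 and epsilon = 0."""
    return alpha * d + beta * f + b


# Synthetic check: targets built with alpha = 0.15, beta = 0.85
d = [1.0, 0.0, 1.0, 2.0]
f = [0.0, 1.0, 1.0, 1.0]
b = [0.1, 0.2, 0.0, 0.3]
y = [social_strength(di, fi, bi, 0.15, 0.85) for di, fi, bi in zip(d, f, b)]
alpha, beta = fit_alpha_beta(d, f, b, y)
```

In practice a library least-squares solver would do the same job; the explicit normal equations just make the two-parameter case transparent.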

7. Experimental results and analysis

We infer and analyze the social strength of college students and the patterns of their activities. The reliability of the model is verified by comparison with other state-of-the-art models, namely EBM [17], walk2friends [2] and PGT [23]. Finally, to help student affairs staff understand the structure of students' social networks, these networks are examined and visualized using chord diagrams and a force-directed layout algorithm.

7.1 Data set

We collected the campus card sign-in logs of 9,353 students from three grades at a university in China. The logs record students’ behavior in the spring semester of 2019, including breakfast, lunch, dinner, shopping, library entry, showering, etc. The logging period is one teaching semester (145 days) and covers 49 locations for different types of activities, such as dining, shopping, exercise, learning, surfing the Internet, bathing, recharging and payment.

For the spatiotemporal activity sequence of student $i$ on campus, we use a two-dimensional matrix over location and time, called the activity matrix, as shown in Fig. 11a. An element value of 1 or 0 indicates whether student $i$ appeared at that location and time. To measure students’ co-occurrence along the time dimension in more detail, we further split the behavior time into a date scale and a time-of-day scale, indicating on which day and at what time of day the behavior occurred. The two-dimensional activity matrix is thus transformed into a three-dimensional activity matrix, as shown in Fig. 11b. Based on this tensor, we count the co-occurrences of students $i$ and $j$: when the interval between their appearances at the same location is below the threshold $\tau$, they are considered to co-occur.
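The activity tensor and the co-occurrence test described above can be sketched in Python as follows. Records are assumed to be (student, location, timestamp) tuples; each record sets one cell (location, date, time slot) of that student's three-dimensional activity matrix, stored sparsely as a set. The slot length `slot_minutes` plays the role of the parameter $p$ tuned in Sec. 7.2 (15 minutes by default, the value reported as best there), and `threshold_minutes` stands for $\tau$; both names and the pairwise comparison strategy are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, datetime]      # (student, location, timestamp)
Cell = Tuple[str, date, int]            # (location, date, time-slot index)


def activity_tensor(records: Iterable[Record], slot_minutes: int = 15) -> Dict[str, Set[Cell]]:
    """Sparse 3-D activity matrix: a cell present in the set has value 1."""
    cells: Dict[str, Set[Cell]] = defaultdict(set)
    for student, location, ts in records:
        slot = (ts.hour * 60 + ts.minute) // slot_minutes
        cells[student].add((location, ts.date(), slot))
    return cells


def co_occurrence_count(records: Iterable[Record], i: str, j: str,
                        threshold_minutes: float) -> int:
    """Count visits of i and j to the same location that are less than the threshold apart."""
    visits = defaultdict(list)
    for student, location, ts in records:
        if student in (i, j):
            visits[student].append((location, ts))
    count = 0
    for loc_i, t_i in visits[i]:
        for loc_j, t_j in visits[j]:
            gap = abs((t_i - t_j).total_seconds()) / 60.0
            if loc_i == loc_j and gap < threshold_minutes:
                count += 1
    return count
```

The nested loop is quadratic in the number of visits per student; sorting visits by location and time would be the natural refinement for campus-scale logs.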

7.2 Prediction performance comparison

Because the experimental results involve student privacy, only a small number of the more than 9,000 students are selected for verification, namely students who agreed to take part in the research and to have their social relationships surveyed. We use 125 students from 5 classes of different grades in the data set as experimental samples. Considering only within-class relationships, there are 1,775 relationships. Classes are referred to by number to protect students' privacy. To determine the social relations of each sampled student, the students are surveyed through questionnaires and interviews, and the results are converted into labels: a pair is marked 1 if the two students are friends and 0 otherwise. In the end, we obtain 426 positive samples (friends) and 1,349 negative samples (non-friends) out of the 1,775 relationships. Table 2 shows the basic information of the survey samples, where Nodes is the number of students in the class, Edges is the number of student pairs in the class, and Percentage indicates the proportion of the sampled students who belong to this class.

Table 2
The information of the selected students

Class	Grade	Nodes	Edges	Percentage
1101	17	29	406	23.2%
1102	17	30	435	24%
1103	18	28	378	22.4%
1104	18	22	231	17.6%
1105	18	26	325	20.8%
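The Edges column of Table 2 follows from the Nodes column: a class of $n$ students has $\binom{n}{2}=n(n-1)/2$ within-class pairs, and the five classes together give the 1,775 relationships mentioned above. A quick check in Python:

```python
from math import comb

# Nodes column of Table 2: number of students per class
class_sizes = {"1101": 29, "1102": 30, "1103": 28, "1104": 22, "1105": 26}

# Each class of n students contributes n(n-1)/2 within-class pairs (the Edges column)
edges = {cls: comb(n, 2) for cls, n in class_sizes.items()}
total_pairs = sum(edges.values())
```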

Figure 11.

Student visit matrix expression.

(1) Method comparison

The new method is compared with EBM to measure the performance improvement. In addition, more recent methods that estimate social relationships from spatiotemporal data and appeared after EBM are also chosen. We also conduct an ablation experiment that tests a restricted variant of the new model.

walk2friends [2]: Backes et al. proposed an advanced feature learning technique that automatically summarizes a user’s mobility features, including the locations he or she has visited and the other users who have visited these locations.

PGT [23]: Wang et al. developed three co-occurrence features, namely personal, global and temporal; the temporal factor is the same as the time interval we propose, but its weight is not considered.

LTE: An ablation variant of our method that considers only the co-occurrence frequency under the influence of time.

We build the time-weighted student co-occurrence matrix according to Algorithm 5.4, calculate the location-time entropy according to Algorithm 6.2, and calculate the strength of students’ social relationships according to Algorithm 6.4. The distribution of student social strength for each method is shown in Fig. 12. The inflection point lies at about 12% of all pairs (12.9% by precise calculation), so we treat the top 10% of the predicted social strengths as friend relationships. Unfortunately, the proportion of non-friends cannot be inferred from Fig. 12, so the bottom 10%, 20%, 30% and 40% of student pairs are chosen as non-friends for the experiment. These percentages represent the proportion of all relationship pairs treated as non-friends.
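The selection rule above (top 10% of pairs by social strength as friends, bottom 10-40% as non-friends) amounts to a ranking step, sketched below. Truncating the counts with `int` and leaving the middle band out of the evaluation are assumptions about details the text does not give; the function name is hypothetical.

```python
from typing import Dict, Sequence


def select_pairs(strengths: Sequence[float], friend_share: float = 0.10,
                 non_friend_share: float = 0.20) -> Dict[int, int]:
    """Map pair index -> 1 (top friend_share by strength) or 0 (bottom non_friend_share).

    Pairs in the middle band are left out of the evaluation.
    """
    n = len(strengths)
    order = sorted(range(n), key=lambda k: strengths[k], reverse=True)
    n_friend = int(n * friend_share)
    n_non_friend = int(n * non_friend_share)
    selected: Dict[int, int] = {}
    for rank, k in enumerate(order):
        if rank < n_friend:
            selected[k] = 1
        elif rank >= n - n_non_friend:
            selected[k] = 0
    return selected
```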

Figure 12.

Social strength distribution. The social strengths are ranked in descending order; the horizontal axis is the student pair and the vertical axis is the social strength. The red dotted line marks the friends, and the green lines mark the four proportions chosen as non-friends.

Since the calculated social strength is a continuous value, the evaluation can be treated as a regression problem. According to the proportion of non-friends, the data set is divided into 4 groups. The social strength calculated by each model is used as the predicted value to compute three indicators: mean squared error (MSE), mean absolute error (MAE) and $R^{2}$. $R^{2}$ is the proportion of variance explained by the independent variables in the model; it indicates goodness of fit and therefore how well unseen samples are likely to be predicted by the model. To reduce the influence of chance on the results, calculated social strengths below 0.05, which are very close to 0, are uniformly set to 0. The results are shown in Table 3. The MSE and MAE of the S-EBA model are the smallest in all four settings, and its $R^{2}$ is the largest in three of them (PGT has a higher $R^{2}$ at 30% non-friends), indicating that our model is more accurate and explains more variance than the other methods.
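The three indicators can be written out directly. The sketch mirrors the definitions behind scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score`, and applies the 0.05 floor described above before scoring; using the binary survey labels as `y_true` is an assumption, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def regression_scores(y_true: Sequence[float], y_pred: Sequence[float],
                      floor: float = 0.05) -> Tuple[float, float, float]:
    """Return (MSE, MAE, R^2); predicted strengths below `floor` are set to 0 first."""
    pred = [0.0 if p < floor else p for p in y_pred]
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    ss_res = sum(r * r for r in residuals)
    r2 = 1.0 - ss_res / ss_tot    # proportion of variance explained
    return mse, mae, r2
```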

Table 3

Regression evaluation index comparison. The calculated social strengths are arranged in descending order. The first 10% are regarded as friends, and the last 10%, 20%, 30% and 40% are non-friends, respectively. (Bold font indicates best performance)

	10% non-friends			20% non-friends			30% non-friends			40% non-friends
Method	MSE	MAE	$R^{2}$	MSE	MAE	$R^{2}$	MSE	MAE	$R^{2}$	MSE	MAE	$R^{2}$
EBM	0.205	0.207	0.146	0.197	0.265	0.166	0.190	0.246	0.149	0.176	0.219	0.133
walk2friends	0.164	0.198	0.147	0.164	0.198	0.147	0.164	0.198	0.147	0.164	0.198	0.147
PGT	0.088	0.139	0.546	0.085	0.132	0.545	0.095	0.135	0.469	0.101	0.134	0.383
LTE	0.078	0.121	0.576	0.077	0.117	0.560	0.094	0.126	0.441	0.104	0.131	0.336
S-EBA	0.063	0.090	0.600	0.062	0.089	0.589	0.082	0.105	0.445	0.080	0.101	0.413

The same classifier can have different recall and precision at different thresholds, which are usually taken in descending order of the predicted values or probabilities of all samples. Therefore, the precision-recall (P-R) curve is chosen as an evaluation index, and P-R curves are obtained for all five methods. As shown in Fig. 13, our method outperforms EBM: its best area under the P-R curve reaches 0.917, 18.3 percentage points higher than EBM's 0.734. Compared with walk2friends, PGT and the ablation method LTE, the improvements are 8.4%, 2.5% and 4.1%, respectively.
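The P-R points and one common definition of the area under the P-R curve can be sketched as follows. The text does not say whether its area is computed by trapezoidal interpolation or step-wise; this sketch uses the step-wise sum $\sum_k (R_k-R_{k-1})P_k$ (average precision, as in scikit-learn's `average_precision_score`), with tied scores treated as one threshold. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def pr_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(recall, precision) at each distinct threshold, scanning scores in descending order."""
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    tp = fp = 0
    points: List[Tuple[float, float]] = []
    idx = 0
    while idx < len(ranked):
        threshold = ranked[idx][0]
        # Consume all samples tied at this threshold before emitting a point
        while idx < len(ranked) and ranked[idx][0] == threshold:
            if ranked[idx][1]:
                tp += 1
            else:
                fp += 1
            idx += 1
        points.append((tp / positives, tp / (tp + fp)))
    return points


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the P-R curve: sum of (R_k - R_{k-1}) * P_k."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in pr_curve(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area
```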

Figure 13.

P-R curve of different proportions of non-friends.

(2) Parameter verification

As mentioned earlier, unlike Shannon entropy, Renyi entropy can control the contribution of local co-occurrence frequencies to the entropy through the parameter $q$, thereby reducing misjudgments of diversity caused by students’ chance encounters on campus. In addition, while the location dimension of the location-time entropy is naturally discrete, the time dimension must be divided manually. We therefore introduce a parameter $p$ to control the time granularity of the location-time entropy. For example, $p=15$ means that adjacent time points in the student activity sequence are 15 minutes apart; this parameter can be set manually. We run 5-fold cross-validation for each of four values of $p$: 15, 30, 45 and 60. Figure 14 shows how precision varies with the hyperparameters $p$ and $q$ at different recall levels. In all cases, when $q$ increases from 0 to 0.1, precision improves significantly and reaches its highest value, and precision is highest when $p=15$. Therefore, the optimal values are $q=0.1$ and $p=15$.

Next, we determine the regression parameters $\alpha$ and $\beta$. The EBM paper [18] recommends three sets of regression coefficients, (0.441, 0.550), (0.476, 0.521) and (0.483, 0.520), for plotting the proportion of real friendships against social strength. Building on this, we evaluate more parameter sets over a wider range via cross-validation; the results are shown in Fig. 15. The lower the bucket index (that is, the lower the social strength), the smaller the proportion of real friends, so ideally the relationship is proportional with a slope of 1. The effect is best when $\alpha=0.15$ and $\beta=0.85$.

Table 4

Information of students in class 1102

ID	Sex	Dormitory number	Number of friends	Class leader	Performance
1	Girl	3	3	Yes	Medium
2	Boy	0	0	No	Medium
3	Girl	3	2	No	Medium
4	Boy	1	7	No	Medium
5	Girl	4	5	No	Medium
6	Girl	4	2	Yes	Excellent
…	…	…	…	…	…
29	Boy	2	3	No	Medium
30	Girl	4	4	No	Excellent

Figure 14.

Precision at different recall levels for different values of the parameters $q$ and $p$.

Figure 15.

Percentage of real friendships versus the social strength of buckets. The 1,775 pairs are divided into 89 groups in ascending order of social strength, with 20 pairs in each group (the last group holds the remaining 15). The proportion of real friends in each group is then computed. The results show that the lower the social strength, the lower the proportion of real friends.
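The bucketing behind Fig. 15 can be sketched as follows; with 1,775 pairs and 20 pairs per bucket it yields 89 buckets, the last holding 15 pairs. The function name and arguments are illustrative.

```python
from typing import List, Sequence


def friend_share_by_bucket(strengths: Sequence[float], labels: Sequence[int],
                           bucket_size: int = 20) -> List[float]:
    """Sort pairs by ascending social strength, cut them into buckets of
    `bucket_size` (the last bucket may be smaller) and return the share of
    real friends (label 1) in each bucket."""
    order = sorted(range(len(strengths)), key=lambda k: strengths[k])
    shares: List[float] = []
    for start in range(0, len(order), bucket_size):
        bucket = order[start:start + bucket_size]
        shares.append(sum(labels[k] for k in bucket) / len(bucket))
    return shares
```

Plotting these shares against the bucket index gives the curve whose ideal shape, per the text, is proportional with slope 1.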

7.3 Analysis of students’ social network and behavior pattern

To verify that the inferred social network is effective for analyzing students’ social interactions, we analyze the weighted social network using node degree analysis and the Louvain algorithm [3]. We select the social network of class 1102 as the experimental object and analyze the social relationships between students from both the group and the individual perspectives. Table 4 shows the gender, dormitory number, number of friends, class-leader status and academic performance of the 30 students in the class, which help explain the results of the social relationship analysis.

Figure 16.

Using chord diagrams to visualize individual social relationships among students.

Figure 17.

Using Louvain algorithm to divide the student community.

Case 1: Individual Social Link Analysis

This case uses chord diagrams to visually analyze individual students’ social relationships. The betweenness centrality [25] of each student node in the social network is used as the node size in the chord diagram. Node color is set by dormitory, so students living in the same dormitory share a color. The thickness of each arc represents the calculated social strength. To facilitate understanding, each node is labeled in the format “student number $\_$ number of real friends”; for example, “1 $\_$ 3” means that student No. 1 has three friends. Figure 16 visualizes the social relationships among students with chord diagrams. Figure 16a shows the social connections among all students. The strongest connection is between students No. 12 and 23, who live in the same dormitory and have a close relationship. Figure 16b highlights the social connections between student No. 4 and other students. His social range is wide, and he is closest to students No. 10 and 22, which is consistent with the survey results that he has seven real friends and a lively personality. Figure 16c highlights the social situation of student No. 29, whose social connections with other classmates are weak; this may reflect a withdrawn personality. Figure 16d highlights the social situation of student No. 25, who has a limited social range but is closely connected with students No. 22 and 26. We found that they all come from outside Beijing and often participate in various activities together.

Case 2: Class Social Relationship Analysis

This case uses the Louvain algorithm to decompose the social network of class 1102 and understand the characteristics of the small communities in the class, which helps reveal why communities form, how they are associated, and how they merge. The modularity stabilizes after three iterations. To make the decomposition results intuitive, the result of each iteration is visualized using a force-directed layout [8].
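Louvain alternates between moving single nodes to neighboring communities and aggregating communities into super-nodes, at each stage seeking to increase the weighted modularity $Q=\sum_c\left[W_c/m-\left(d_c/2m\right)^2\right]$, where $m$ is the total edge weight, $W_c$ the weight of edges inside community $c$, and $d_c$ the summed weighted degree of its nodes. The sketch below only evaluates $Q$ for a given partition, which is enough to monitor how modularity stabilizes across iterations; a full Louvain implementation is available in libraries such as NetworkX (`louvain_communities`). Edge weights would be the inferred social strengths; the function name and input format are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable, float]   # (u, v, weight), undirected, no self-loops


def modularity(edges: Iterable[Edge], community: Dict[Hashable, int]) -> float:
    """Weighted modularity Q = sum_c [W_c / m - (d_c / 2m)^2] of a partition."""
    internal: Dict[int, float] = defaultdict(float)   # W_c: edge weight inside c
    degree: Dict[int, float] = defaultdict(float)     # d_c: summed weighted degree of c
    m = 0.0                                            # total edge weight
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    if m == 0:
        return 0.0
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For two disconnected triangles, splitting them into two communities gives $Q=0.5$, while putting every node in one community gives $Q=0$.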

Figure 17a shows the result of the first iteration. The class is divided into six communities: the No. 1 Purple Community, the No. 2 Dark Green Community, the No. 3 Orange Community, the No. 4 Green Community, the No. 5 Blue Community and the No. 6 Pink Community. Combining accommodation and gender data reveals three phenomena. First, each community is single-gender: no community contains both male and female students. Second, members of the female communities largely share dormitories. For example, the members of communities No. 3 and No. 5 live in the same dormitory, and five of the seven students in community No. 4 live in the same dormitory. This indicates that students within a dormitory are relatively close and that female students' socializing is strongly dormitory-centered. Third, male students' socializing is less dormitory-centered than that of female students. For example, the ten male students in the No. 1 Purple Community come from dormitories No. 0, 1 and 2, so their social interactions cross dormitory boundaries. However, the two students in community No. 2 are not in the same community as the other four students from their dormitory. The No. 6 Pink Community contains only student No. 15, who transferred from another major in the spring semester of 2019 and did not live with his classmates; his social ties with classmates are therefore weak.

Figure 17b shows the result of the second iteration, which contains four communities: the No. 1 Orange Community, the No. 2 Blue Community, the No. 3 Dark Green Community and the No. 4 Pink Community. The No. 1 Orange Community merges communities No. 1, No. 2 and No. 3 from the first iteration with students No. 19 and No. 20 from community No. 4. This community includes all male students and six female students; our investigation found that these six female students usually interact more with male classmates. The community contains 18 students, 60% of the class, and shows cross-dormitory and cross-gender characteristics. By contrast, the No. 2 Blue Community and the No. 3 Dark Green Community still consist of students from the same dormitory.

Figure 17c shows the result of the third iteration, which contains three communities: the No. 1 Blue Community, the No. 2 Orange Community and the No. 3 Pink Community. The No. 1 Blue Community merges the No. 1 Orange Community and the No. 3 Dark Green Community from the second iteration, while the other two communities remain unchanged. This further shows that the social life of the students in the No. 2 Orange Community revolves mainly around their dormitory, and their social interaction with other students in the class is limited. However, the students' performance data reveal an interesting phenomenon. Four out of the five students in the No. 2 community, two students’ scores are excellent, accounting for 66.7% of the students with outstanding grades in the class.

The above analysis shows that Louvain decomposition of the class social network clarifies the distribution and characteristics of the small communities in the class and provides helpful information for building class cohesion.

8. Conclusion

Aiming at mining campus students’ social links, we propose the S-EBA model, which calculates students’ social strength from spatiotemporal behavior data. To account for the location-intensive and time-sensitive nature of campus scenes, Renyi entropy and location-time entropy are used to suppress errors caused by spatiotemporal hotspots and chance encounters and thus ensure accuracy. Compared with traditional methods, our method calculates the social strength of students more accurately. Two case studies, at the individual and class levels, examine how the social associations inferred by the model relate to students’ mental health and performance. The method can also be extended to other fields, such as social connections in social networks and the hidden relation proximity of passengers in urban public transportation. In future work, we will analyze other campus features that affect social strength to improve the model’s performance and further enhance its interpretability for teaching management.


Acknowledgments

The research project is partially supported by the National Natural Science Foundation of China under Grant No. 62072015, 61906011, U19B2039, U1811463, 61632006, U21B2038, the Beijing Municipal Science and Technology Project No. KM202010005014, the Special Project of China Higher Education Association “Higher Education Informatization Research” No. 2020XXHYB16, and the Beijing Natural Science Foundation No. 4202004.

References

Adriaens

De Bie

Gionis

Lijffijt

Matakos

and Rozenshtein

, Relaxing the strong triadic closure problem for edge strength inference, Data Mining and Knowledge Discovery, 2020, 1–41.

Backes

Humbert

Pang

and Zhang

, walk2friends: Inferring social links from mobility profiles, in: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 1943–1957.

Blondel

V.D.

Guillaume

J.-L.

Lambiotte

and Lefebvre

, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment 2008(10) (2008), P10008.

Brook

C.A.

and Willoughby

, The social ties that bind: Social anxiety and academic achievement across the university years, Journal of Youth and Adolescence 44(5) (2015), 1139–1152.

Cook

K.S.

Cheshire

Rice

E.R.

and Nakagawa

, Social exchange theory, Handbook of social psychology, 2013, 61–88.

Crandall

D.J.

Backstrom

Cosley

Suri

Huttenlocher

and Kleinberg

, Inferring social ties from geographic coincidences, Proceedings of the National Academy of Sciences 107(52) (2010), 22436–22441.

Ebadi

Kang

J.E.

and Hasan

, Constructing activity-mobility trajectories of college students based on smart card transaction data, International Journal of Transportation Science and Technology 6(4) (2017), 316–329.

Fruchterman

T.M.

and Reingold

E.M.

, Graph drawing by force-directed placement, Software: Practice and Experience 21(11) (1991), 1129–1164.

Hristova

Musolesi

and Mascolo

, Keep your friends close and your facebook friends closer: A multiplex network approach to the analysis of offline and online social ties, in: Proceedings of the International AAAI Conference on Web and Social Media, Vol. 8, 2014.

10.

Jung

Chun

Jin

and Lee

K.-H.

, Quantitative computation of social strength in social internet of things, IEEE Internet of Things Journal 5(5) (2018), 4066–4075.

11.

Kelleher

K.J.

McInerny

T.K.

Gardner

W.P.

Childs

G.E.

and Wasserman

R.C.

, Increasing identification of psychosocial problems: 1979–1996, Pediatrics 105(6) (2000), 1313–1321.

12. Zeng, Xiao, Jiang, Zheng, Liu and Ren, Drive2friends: Inferring social relationships from individual vehicle mobility data, IEEE Internet of Things Journal 7(6) (2020), 5116–5127.

13. Zheng, Xie, Chen, Liu and Ma W.-Y., Mining user similarity based on location history, in: Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2008, pp. 1–10.

14. Liu, Yang, Liu and Ge, Inferring and analysis of social networks using RFID check-in data in China, PLoS ONE 12(6) (2017), e0178492.

15. Njoo G.S., Understanding human behavior through sensory data and location based services, in: 2019 20th IEEE International Conference on Mobile Data Management (MDM), IEEE, 2019, pp. 389–390.

16. Pham and Shahabi, Towards integrating real-world spatiotemporal data with social networks, in: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011, pp. 453–457.

17. Pham, Shahabi and Liu, EBM: An entropy-based model to infer social strength from spatiotemporal data, in: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 2013, pp. 265–276.

18. Pham, Shahabi and Liu, Inferring social strength from spatiotemporal data, ACM Transactions on Database Systems (TODS) 41(1) (2016), 1–47.

19. Segrin, Indirect effects of social skills on health through stress and loneliness, Health Communication 34(1) (2019), 118–124.

20. Segrin, Hanzal, Donnerstein, Taylor and Domschke T.J., Social skills, psychological well-being, and the mediating role of perceived stress, Anxiety, Stress, and Coping 20(3) (2007), 321–329.

21. Strahan E.Y., The effects of social anxiety and social skills on academic performance, Personality and Individual Differences 34(2) (2003), 347–366.

22. Sun W.-J. and Liu X.F., Inferring relationship semantics in social networks with dual-view features semi-supervised learning, in: 2019 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2019, pp. 1–5.

23. Wang and Lee W.-C., PGT: Measuring mobility relationship using personal, global and temporal factors, in: 2014 IEEE International Conference on Data Mining, IEEE, 2014, pp. 570–579.

24. Wang, Kong, Xia and Sun, Urban human mobility: Data-driven modeling and prediction, ACM SIGKDD Explorations Newsletter 21(1) (2019), 1–19.

25. White D.R. and Borgatti S.P., Betweenness centrality measures for directed graphs, Social Networks 16(4) (1994), 335–346.

26. Xiao, Zheng, Luo and Xie, Inferring social ties between users with human location history, Journal of Ambient Intelligence and Humanized Computing 5(1) (2014), 3–19.

27. J.-Y., Liu, Yang L.-T., Davison M.L. and Liu S.-Y., Finding college student social networks by mining the records of student ID transactions, Symmetry 11(3) (2019), 307.

28. Yao, Nie, Xia and Lian, Predicting academic performance via semi-supervised learning with constructed campus social network, in: International Conference on Database Systems for Advanced Applications, Springer, 2017, pp. 597–609.

29. Zheng, Chen, Feng and Xie, Mining individual life pattern based on location history, in: 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, IEEE, 2009, pp. 1–10.

30. Zheng, Zhang, Xie and Ma W.-Y., Recommending friends and locations based on individual location history, ACM Transactions on the Web (TWEB) 5(1) (2011), 1–44.

31. Zhong-Ming, Sheng-Nan, Chen-Ye, Da-Gao and Wei-Jie, Link prediction model based on dynamic network representation, Acta Physica Sinica 69(16) (2020).

