CaFIRST: A context-aware hybrid fuzzy inference system for the similarity measure of multivariate trajectories

Abstract

Movements of objects take place in different contexts and their trajectories are highly influenced by the contexts. Several studies have been conducted in the last decade on similarity measuring of raw trajectories, but very few have used context information in this process. Because the context information is collected from multifarious sources, it is qualitatively and quantitatively heterogeneous and uncertain. Therefore, the current distance functions are unable to measure the similarities between trajectories by considering the heterogeneous context information. This article presents a new context-aware hybrid fuzzy model, named CaFIRST, to measure the similarity of trajectories by considering not only the spatial footprints of moving objects but also various types of internal and external context information. CaFIRST is able to handle multi-size trajectories that are contextually enriched by both quantitative (numeric) and qualitative (descriptive) values. The performance of CaFIRST was examined using two real data sets, obtained from pedestrians and cyclists in New York City, USA. The results showed the robustness of CaFIRST for quantifying the commonalities in multivariate trajectories and its sensitivity to small alterations in context information. Furthermore, the effects of internal and external context information on similarity values are shown to be remarkable.

Keywords

1. Introduction

Owing to developments in navigation, positioning and tracking technologies, an unprecedented quantity of moving point data, (e.g., human, vehicle, animal) are being produced. The sequential positions of a moving point object (MPO) over the course of time generate a trail, which is known as a trajectory [28]. Nowadays, studying the movement of point objects and analysing their trajectories has become of great interest to researchers in various scientific fields, such as geographic information science, ecology and transportation. Specifically, measurement of the similarities between trajectories as a data mining technique in MPO databases has received considerable attention, because it can be the basis for understanding the objects’ behaviours, extracting movement patterns and predicting future movement trends [1, 23].

Finding the commonalities between trajectories and measuring their similarities is conventionally performed using distance functions. However, the current distance functions are only able to manipulate the spatial and spatio-temporal dimensions of trajectories, while they neglect the underlying situations and conditions during the movement, hereinafter termed the context. In other words, the movement of an entity is highly influenced by the context, (e.g., weather conditions, traffic, people nearby), and therefore the context should be incorporated in the movement analysis to obtain realistic results. A review of the literature discloses the utility and significance of accounting for context information in movement studies [3, 9], including in the similarity measurement process for trajectories [6 , 24].

Multiple definitions exist for the term context in the literature. Sharif and Alesheikh define context in movement studies as “that part of a situation or data that influences movement or is influenced by movement” [25]. In addition, context is generally categorized as internal, (i.e., related to the moving object) or external, (i.e., related to the environment where the movement takes place). Contextual information can be obtained from various sources, (e.g., sensors, user-generated content and inference systems) and at different accuracy levels [16]; thus, it can be recorded in quantitative (numeric) or qualitative (descriptive) forms. For example, slope can be represented as a percentage or the degree of satisfaction in a traversed path can be described as low, medium or high. Very few studies have developed or extended the current distance functions to take advantage of quantitative context information in similarity measurements of trajectories to date [5 , 23]. To the best of our knowledge, the present similarity measurement methods are unable to handle both quantitative and qualitative contextual information and the underlying uncertainty in the context information itself.

Fuzzy logic methods are able to deal with approximate and inaccurate values and are capable of capturing non-linear relationships between them [21], and so they can be applied to moving point object data sets [30]. The plethora of research contributions on exploiting fuzzy logic approaches demonstrates their applicability in movement applications [26] such as classifying human activity patterns [30], detecting travel transport modes [8], clustering mobile object trajectories [15] and eye movement [7] to name but a few. In this context, because of the ability of fuzzy inference systems (FISs) to handle heterogeneous and uncertain data [29], this research has developed a novel context-aware FIS for the similarity measure of trajectories which takes into account a wide variety of quantitative and qualitative information. The new contributions to the literature that this research can offer are threefold: (1) enriching trajectories with both multivariate internal and external contextual information, (2) employing qualitative and quantitative information in the similarity measure of trajectories and (3) introducing a context-aware hybrid fuzzy model and integrating the above variables in a broad sense.

The remainder of the paper is organized as follows. In Section 2, a review of the literature in terms of the current distance functions and the dimensions in the similarity search is discussed. Section 3 is dedicated to the proposed context-aware similarity measure model. The proposed model is assessed by applying it to two real data sets of pedestrian and cyclist trajectories in Section 4. Section 5 summarizes the research findings and provides potential future directions.

2. Related work

The challenging issue in similarity measurement of trajectories is how to define “similarity”. One perception is that the more two objects have common features, the more similar they are, and vice versa [18]. Specifically, two trajectories are similar if they have similar shapes, share similarities in their movement parameters, visit the same places, demonstrate identical trends [17] or are generated in similar contexts [23]. Such definitions have caused the similarity measures of trajectories to be explored from methodological and dimensional perspectives, which are described as follows.

From the methodological perspective, Dodge et al. claimed that the majority of distance functions fall into either the time series, (e.g., Minkowski distance, Dynamic Time Warping, Edit distance, Longest Common Subsequence) or the computational geometry, (e.g., Fréchet distance, Hausdorff distance) families [10]. Yuan and Raubal categorized the similarity measurement methods into geometry-based and time-based classes. The former concentrates only on the spatial aspects of trajectories and neglects the temporal dimension, while the latter incorporates the time characteristic into the similarity search on trajectories [35]. The time-based class can be considered as an extension to the time series family. However, new similarity functions have been developed recently [11, 31]. Because (1) each function has its own advantage(s) and disadvantage(s) and (2) the similarity measure of trajectories is an application-dependent task, there is no single standard method for all applications.

The introduced distance functions have limitations with respect to all types of data. On the one hand, the complex relationship between trajectories causes similarity searches to have a level of uncertainty. On the other hand, various heterogeneous parameters are used in the similarity search process. These parameters may come from different measurement devices or from individuals’ opinions. For example, speed, direction and slope variables are normally measured by instruments, while rating a path in terms of its difficulty, attractiveness and desirability is done by the users. Therefore, the context information can be diverse, descriptive, uncertain and heterogeneous, and must be modelled by a multivariate function.

From a dimensional standpoint, the spatial and spatio-temporal similarity measures of trajectories have been widely used by research studies for different applications. For example, the similarities for vehicle trajectories were extracted in a road network [33]. The similarities of mobile phone user trajectories were measured in an urban space [35]. The commonalities between North Atlantic hurricane trajectories were compared using a multidimensional similarity measure, not only for the spatio-temporal dimensions but also employing the movement parameters of the hurricane [10]. Moreover, as well as the physical properties, the semantic dimension was used to add meaning to raw trajectories [20, 37]. For instance, describing a moving object in terms of where (place), when (time), how (mode) and why (reason) can enrich the sequence of the object. This is known as a semantic trajectory [14]. In very few of the studies conducted, the semantic dimension was considered in the similarity measure of trajectories of visited places [12, 34]. However, context goes far beyond stops and visited places and covers a wide range of information that influence movement or are influenced by movement.

Incorporating contextual information provides a realistic perception of the similarity measure of trajectories. To this end, Buchin et al. modified existing distance functions (i.e., Hausdorff, Fréchet and equal time distances) to make them context-aware [6]. Specifically, geographical contexts were modelled as a polygonal subdivision of land use (sea/land) and as meteorological parameters for a North Atlantic hurricane data set and for albatross tracking data in Euclidean space. In another attempt, Sharif and Alesheikh proposed a multidimensional similarity measurement method based on Euclidean distance and applied it to real commercial aeroplane trajectories. The method was able to reveal the commonalities between the trajectories in the spatial, temporal and contextual dimensions, individually and collectively [22]. Despite the effectiveness of the developed method in handling both internal and external contexts, it can only be applied to trajectories of the same size (length). As a precursor to this work, a context-based dynamic time warping (CDTW) method was introduced for analysing trajectories of different lengths by including not only their spatial footprints but also a notion of internal and external contexts. The effectiveness of CDTW was examined in several experiments on a real commercial airplane data set [23].

All the aforementioned studies used quantitative (numeric) information in finding the similarities between trajectories. Despite contextual information being available in a qualitative (descriptive) form, no research has been conducted so far on incorporating both quantitative and qualitative information about moving objects for similarity assessment of their trajectories.

3. Methodology

A new hierarchal fuzzy inference system based on a genetic algorithm has been developed to measure the similarity of trajectories by employing a wide range of contextual information. Due to the complexity of the relationship between trajectories, designing fuzzy systems based on expert knowledge is a challenging task. In addition, the large number of contextual parameters means that the process of determining similarity measures of trajectories a high-dimensional problem. The advantages of using conventional fuzzy inference systems for solving such problems are fewer than the deficiencies, because of the exponential increase in their rules and decrease in their accuracy.

In order to establish an accurate, compact and adaptable system to determine the similarity measure of trajectories, a hybrid hierarchical fuzzy inference system (HFIS) has been designed. This system is named Context-aware Fuzzy InfeRence SysTem (CaFIRST), and it controls the rules by converting an HFIS into a number of simpler subsystems that are linked to each other hierarchically. The topology of CaFIRST is selected according to the problem, which is also confirmed by Zajaczkowski and Verma [36]. CaFIRST does not consider the problem as a single unit. Instead, each subsystem deals with a subset of the parameters. Therefore, the rules of the subsystems are much simpler than when the variables are set up in a single system.

The design of CaFIRST enables us to measure the similarity of trajectories for two sets of variables simultaneously. As shown in Fig. 1, the first group is a set of variables (hereinafter termed V_a) assigned to each sampling point along the trajectories, such as coordinates, speed and direction. The second group is a set of variables (hereinafter termed V_b) assigned to the entire trajectory, such as the difficulty of the path and the interest in the path. It should be noted that any of these variables can be in quantitative (numeric) or qualitative (descriptive) form.

Fig. 1

A sample multivariate trajectory.

In CaFIRST’s topology, the V_a and V_b variables are compared to each other separately. As shown in Fig. 2, for the variables V_a, the distance between two trajectories A and B is measured through the first fuzzy inference system (FIS 1) in layer 1 via pairwise comparison of samplingpoints a_i and b_j. The number of parameters attributed to each sampling point is given by k. By the same token, the similarity values for all other trajectories of group V_a are calculated by other FISs in the same layer. For the V_b set of variables, which are attributed to whole trajectories, the distances between pairs of trajectories are measured through other FISs in separate layers. The output of layer 1 and the similarity values from layer 2 are imported into the FIS in the next layer. This cumulative hierarchical process is continued until all the trajectories have been compared with each other.

Fig. 2

Hierarchical structure of a hybrid fuzzy inference system for similarity measure of trajectories.

For generating a training data set, we use the recently developed method named context-based dynamic time warping (CDTW) [23]. CDTW has been shown to be robust enough to quantify the commonalities between trajectories with different sizes. In this research, CDTW is used for assessing the similarities of trajectories, including not only their spatial footprints but also a notion of their quantitative contexts. Given two trajectories R [r₁, r₂, …, r_n] and S [s₁, s₂, …, s_m] of sizes n and m, respectively, it is possible for each element of any trajectory to be annotated by movement parameters (x, y, t) and internal and/or external context(s) (c) (see Equation (1)).

In other words, r₁ ={ x_r1, y_r1, t_r1, c_r1 } and s₁ ={ x_s1, y_s1, t_s1, c_s1 }. The CDTW distance is determined by ∀i, j, k : i = 1, …, n, j = 1, …, m, $\begin{matrix} CDTW (φ, φ) = 0 \\ CDTW (r_{i}, φ) = CDTW (φ, s_{j}) = \infty \\ {CDTW}_{k} (r_{i}, s_{j}) = {(C_{k} (r_{i}) - C_{k} (s_{j}))}^{2} + \\ \min {\begin{matrix} {CDTW}_{k} (r_{i}, s_{j - 1}) \\ {CDTW}_{k} (r_{i - 1}, s_{j}) \\ {CDTW}_{k} (r_{i - 1}, s_{j - 1}) \end{matrix} \end{matrix}$ (1) where k is the number of contexts attributed to each sampling point and φ is the empty trajectory.

In order to define and optimize the fuzzy inference system, the multi-objective non-dominated sorting genetic algorithm II (NSGA-II) was applied. The simultaneous generation of membership functions and tuning of rules through learning helped to optimize the accuracy and complexity of the system. To this end, the HFIS structure is encoded using NSGA-II, and each chromosome is considered equivalent to an HFIS. The algorithm is then applied for finding the optimal solutions (optimal HFISs). The objective of the exploratory algorithm for learning HFIS is to minimize the number of system rules and the root-mean-square error (RMSE). Equation (2) demonstrates the first objective function (F1), in which n^s is the number of fuzzy subsystems, m_j is the number of membership functions and v_i is the number of variables belonging to each subsystem. $F_{1} : Minimize (\sum_{i = 1}^{n^{s}} \prod_{j = 1}^{v_{i}} m_{j})$ (2)

Equation (3) relates to the second objective function (F2), where n is the number of samples, F (x_i) is the trajectory similarity of the fuzzy inference system for the ith vector of the data and y_i is the trajectory similarity of the observations. $F_{2} : Minimize (\sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n^{c}} {(F (x_{i}) - y_{i})}^{2}})$ (3)

To encode the proposed HFIS, a triple-coding scheme is used. C_L determines the number of membership functions for each variable, C_T tunes the membership functions and C_C optimizes the consequence parts of the rules (Equation (4)). $C = C_{L} + C_{r} + C_{c}$ (4)

To encode the C_L part, a string of integers with size n + n^s - 1 is used, where n is the number of input variables and n^s - 1 is equal to the number of linking variables that are created for connecting different subsystems. Each of the genes (c_l (i)) is assigned one of the values {1, … , 7 }. These numbers represent the number of fuzzy membership functions used for each variable. The variables with a value of 1 are removed (Equation (5)). $C_{L} = {c_{i} (1); \dots; c_{i} (n + n^{s} - 1)}$ (5)

To encode the C_T part, a string of real numbers is used. These real numbers are determined in a predefined domain of displacement. The number of variables is n + n^s - 1, and each of them is described by a (c_l (i)) fuzzy membership function. For tuning membership functions, the lateral displacement method [2] is used (Equation (6)). $\begin{matrix} C_{T} = {c_{i} (1; 1); \dots; c_{i} (1; c_{l} (1)); \dots; \\ c_{i} (n + n^{s} - 1; c_{l} (n + n^{s} - 1))} \end{matrix}$ (6)

A real-value vector is exploited to encode the C_C part (Equation (7)). The output value of each rule is defined by a number between zero and one. $C_{c} = {c_{c} (1; 1); \dots; c_{c} (1; m_{1}^{v_{1}}); \dots; c_{c} (n^{s}; m_{n^{s}}^{v_{n^{s}}})}$ (7)

The initial population is randomly generated. To generate the C_L part of each primary population, one of the values {1, … , 7 } is randomly assigned to the corresponding gene. After creating the C_L part of each chromosome, two different versions are created in the C_T part. In these versions random integer numbers are allocated in the same way, as ((c_t (i, j + 1) - c_t (i, j))/2) in the positive and negative directions of the centre of each partition. The value c_t (i, j) indicates the centre of the jth partition of the ith variable. In the C_C part of each chromosome, a random number between zero and one is assigned to each cell. As demonstrated in Fig. 3, for each sample of C_L (in the blue part) two versions of C_T(yellow or orange) and C_C(green or purple) are generated.

Fig. 3

Sample chromosome structures.

Crossover and mutation operators are used for preventing premature convergence, until the best possible configuration is achieved. The crossover operator is selected for each part of the chromosome individually. The BLX - α method is used for the C_T and C_C parts that are encoded using real numbers [13, 29]. This crossover method has been extensively used in encoding real numbers in exploratory algorithms. In this method, the offspring genes are generated around the parent genes. Let us assume that $C^{i} = C_{L}^{i} + C_{T}^{i} + C_{C}^{i}$ and $C^{j} = C_{L}^{j} + C_{T}^{j} + C_{C}^{j}$ are two chromosomes selected from the parent pool to be crossed. The BLX operator (with α= 0.5) generates two offspring, where each gene in the C_T and C_c parts are randomly chosen from the interval $[min (c_{k}^{i}, c_{k}^{j}) - α d_{k}, max (c_{k}^{i}, c_{k}^{j}) + α d_{k}]$ , with $d_{k} = | c_{k}^{i} - c_{k}^{j} |$ . For the C_L sector, a two-point standard crossover method is used, i.e., the $C_{C}^{i}$ and $C_{C}^{j}$ parts. The genes between the two points are swapped between the parent chromosomes. The mutation operator applies to all the resulting offspring. Finally, after applying the two operators, the more optimized offspring form the new generation.

The parameters used in all the experiments included a maximum number of iterations of 500, an initial population of 200, a mutation rate of 0.2 and a crossover rate of 0.8. In addition, the centre-of-gravity and minimum operators were used for defuzzification and implication respectively. The sensitivity of these parameters will be investigated in future studies. In this study, the parameters of the algorithms are selected based on the HFIS learning studies using exploratory algorithms [4 , 38]. Algorithm 1 provides the pseudo code of CaFIRST. The computational time of CaFIRST is O (n * m), which is directly related to the lengths of the trajectories.

Algorithm 1:

CaFIRST

Input: N ← (population Size);

M ← (maximum number of iterations);

(C_p, M_p) ← (Crossover and Mutation probability);

Ensure: Set of non-dominated solutions with different

number of rules/accuracy trade off;

Initialize Population(PP);

C ← C_L + C_T + C_c

C_L ← Random {1, … , 7}

C_T ←±Random (c_t (i, j + 1) c_t (i, j))/2)

C_C ← Random (0 1)

Evaluate Population (RMSE, number of rules);

While (non-stop condition) do:

P_N ← P_P+P_O (Combine parent and offspring population)

P_F ← (Non-dominated fronts of P_N)

Calculate crowding distance

Choose the first N elements of P_F

Random [(C_Ti – α (C_Tj -C_Ti)) (C_Tj + α (C_Tj -C_Ti))]

and Random [(C_Ci - α (C_Cj – C_Ci)) (C_Cj + α (C_Cj – C_Ci))];

BLX – α crossover operator is applied to CT and CC parts

assume C_Ti < C_Tj and C_Ci < C_Cj

Two-point crossover is applied to C_C

Mutation is applied to 8 offspring

P_O ← a new population

End while

4. Implementation

In this section, we validate our methodology and demonstrate the applicability of CaFIRST in contextually measuring and analysing the similarity of multivariate trajectories. For this purpose, CaFIRST was initially applied to several pedestrian and cyclist movement data sets, taking into account contextual information. Secondly, the functionality of CaFIRST was assessed by representing the control surfaces and Pareto fronts.

4.1. Data

The area studied was Central Park in Manhattan, New York City, USA, as depicted in Fig. 4. The park, with an area of approximately 3.41 km² is one of the most famous parks in the United States, with millions of visitors per year. Due to the attractions and facilities of this park, various activities take place, the most common of which are walking, running, skating and cycling. In this research, a total of 400 real traces of individuals were examined, including 204 traces for pedestrians and 196 traces for cyclists [32].

Fig. 4

Study area: Central Park, Manhattan, New York City, USA.

For every moving object, as well as its spatial footprints, a series of items of information was recorded to provide internal context, (i.e., related to the moving object) and external context, (i.e., related to the environment). Table 1 illustrates a 2D plot of an exemplar pedestrian trajectory followed by the movement parameters and the contextual information in terms of quantitative and qualitative values.

Table 1

Recorded information of a sample trajectory [32]

	Quantitative value	Qualitative value
Internal	Speed * (5 km/h)	Path difficulty **
		(moderate)
context	Direction * (120°)	Degree of satisfaction **
		(high)
External	Slope * (2%)
context	Path length ** (12680 m)
Movement	Position * (40.7, – 73.9)
parameter	Path duration ** (150 min)

^*for each sampling point ^**for whole trajectory.

The variables for position (x, y), speed, direction and slope are recorded for every constituent sampling point, and the path length, path difficulty and degree of satisfaction variables are used to label the entire trajectory. The temporal resolutions of the pedestrian and cyclist data are approximately 20 s and 5 s respectively. The angles are measured in a clockwise direction relative to north, in degrees.

4.2. Results and discussion

In this section, the proposed CaFIRST was used for developing multidimensional models capable of using quantitative and qualitative internal and external context information, along with the spatial dimension, to measure the similarity of trajectories of different lengths. Using CaFIRST, four different models were developed for each pedestrian and cyclist data set. The main purpose of each model was to study the effects of the various internal and external contexts on finding the commonalities between trajectories. In the first model (S), the degree of similarity between the trajectories was measured for the spatial dimension. In the second model (ST), the time parameter was added to the first model and the commonalities between trajectories were measured in the spatio-temporal dimension. By enriching the second model with internal context information, the third model (STIC) explored the added value of this information with respect to the similarity results. Finally, the fourth model (STIEC) was designed, in which all the spatial, temporal, internal context and external context values together were employed for investigating the similarities between trajectories. Figure 5 represents the initial and final partitions for each membership function of the input, intermediate and output variables, as the results of a single trial of the fourth model (STIEC) after optimization. Although it is possible to select up to seven partitions for the different variables, due to the multi-objective optimization and in order to control the number of rules, most of the variables are described using five partitions.

Fig. 5

Initial and tuned database for of the STIEC model.

4.2.1 Evaluation of the similarity results

The similarity results obtained by applying the aforementioned four models to both the pedestrian and cyclist datasets were normalized to a scale of unity, to ensure they were invariant with respect to the amplitude of the scale as well as to ease interpretation and comparison. Then, a series of box plots was generated, aiming to explore the variations in the similarity results and to examine the role of the contextual information (see Fig. 6).

Fig. 6

Boxplots of variations of similarity searches between trajectories using quantitative and qualitative information.

The interquartile ranges (IQRs), or the values between the first and third quartiles, represent the magnitudes of the effective similarity values and their medians. The whiskers demonstrate the similarity domains, (i.e., the ranges between the maximum and minimum similarity values).

Applying the first model (S), the similarity values for the spatial dimension of pedestrian and cyclist trajectories range from 0.55 to 0.93 and 0.65 to 0.95 respectively, indicating that the distributions of the similarity values in the spatial dimension are very close to each other. By adding the time variable, the similarity ranges in the spatio-temporal dimension for both data sets increase, compared to the spatial dimension. This issue has a direct relationship to the length of the traversed paths and the elapsed time.

Applying the third model to the data sets, the remarkable effects of the quantitative and qualitative internal context variables become evident in the similarity domain. Although the median decreases, there are substantial increases in the spatio-temporal-internal context dimension. Comparing the similarity domains of pedestrians and cyclists, the effects of the internal context on the pedestrian movements is greater than for the cyclist movements. The combinatorial dimension of the spatial, temporal, internal and external contexts represents a greater similarity domain and IQR compared to the previous three experiments.

It can be seen from Fig. 4 that there are steadily decreasing trends in the magnitudes of the median values for both data sets as a result of adding more parameters into the models. These values range from 0.72 to 0.40 and 0.81 to 0.58 for the pedestrian and cyclist data sets respectively. Meanwhile, although the pairwise comparisons of IQRs demonstrate almost equal values for the same model, these ranges increase slightly as more parameters are added to the models. According to the outputs of the four models, it is apparent that both the internal and external contexts have considerable effects on the similarity results. However, the cyclist trajectories are less affected by the contexts compared to the pedestrian trajectories. In addition, we contend that although incorporating the context information reduces the number of trajectories that share common features, these trajectories are more realistically similar to each other.

To statistically evaluate the effects of each context on the similarity values and to determine whether there are any significant differences between their means, a one-way analysis of variance (ANOVA) was applied to the results. For this purpose, a confidence interval of 95% (α= 0.05) was used. The outputs of the ANOVA tests demonstrate meaningful statistical variations using various types of context information (Table 2). In other words, the effects of contextual information on the similarity results are shown to be meaningful (p < 0.001). In addition, in Table 2, the F-test results show that the between-group variance is much greater than the within-group variance (also called error group or error variance), indicating the considerable effect of the context parameters on the similarity results of the trajectories.

Table 2

One-way analysis of variance to compare the similarity of trajectories with different contexts

Model	Parameter	Pedestrian			Cyclist
Model	Parameter	Mean±SD	F	P	Mean±SD	F	P
S	Space	0.747±0.075			0.804±0.061
ST	Space, Time	0.564±0.115			0.711±0.102
STIC	Space, Time, Internal context	0.449±0.134	236.345	p < 0.001	0.627±0.127	237.359	p < 0.001
STIEC	Space, Time, Internal context, External context	0.406±0.160			0.538±0.139

A sensitivity analysis was used to determine the influence of the trajectory similarity results from various movement parameters and items of context information, using CaFIRST. Part of the output of this implementation for both the pedestrian and cyclist data sets is shown in Fig. 7. The bar charts represent the almost normal distribution and frequency of the variables time, degree of satisfaction and path length. In addition, the histograms demonstrate the impact of each model input on the outputs, (i.e., the similarity values). For example, the influence values of the similarity results of the trajectories from the temporal perspective for the pedestrian and cyclist data sets are 0.63 and 0.84, respectively, which indicates that there is a strong relationship between similarity values and the time parameter. However, the correlation values between similarity values and the degree of satisfaction parameter are 0.36 and 0.42 for the pedestrian and cyclist data sets respectively, which indicates less effect on the similarity results from this parameter. This can also be inferred from the distribution of the sampling points. Regarding the traversed path, the similarity correlation for pedestrian trajectories is 0.58 and for cyclist trajectories is 0.67. By reviewing all the movement parameters and context information, it can be shown that the effect on the similarity of the cyclist trajectories of these variables was higher than the effect on the pedestrian trajectories.

Fig. 7

Sensitivity analysis of CaFIRST to movement parameters and context information: a) Pedestrians trajectories, b) Cyclists trajectories.

4.2.2 Evaluation of CaFIRST performance

In this subsection, the performance of CaFIRST was evaluated by performing a set of analyses on the data sets. Control surface plots are not only able to visually demonstrate the relationship between inputs and outputs, but they are also able to address the consistency of the knowledge used in the FISs. Figure 8 illustrates three examples of the control surfaces, namely, time, distance and the degree of satisfaction for the fourth model, (i.e., STIEC), as this is the most complete and complex model relating to the similarity measure of the pedestrian trajectories. Each plot illustrates the relation between two inputs and one output. It can be seen from the plots that all the surfaces have fully covered the spaces, with no notches or abrupt changes. In other words, while CaFIRST prevents the exponential increase of rules, it produces rules in the process of learning and the optimization of the system that effectively establish the connection between inputs and outputs [27].

Fig. 8

Control surfaces of CaFIRST for the STIEC model applied on pedestrians trajectories: a) duration, b) distance, c) degree of satisfaction.

In addition, Fig. 9 illustrates the Pareto fronts generated by a single trial of CaFIRST for all the models of the pedestrian and cyclist data sets. The proposed algorithm for both data sets shows good coverage for all the Pareto fronts. However, the cyclists Pareto fronts show more uniformity than the pedestrian Pareto fronts. In other words, except for the third model (STIC) applied to the pedestrian data set, the diversity, (i.e., the distribution of the points on the Pareto frontier) is uniform.

Fig. 9

Pareto fronts derived from simultaneous learning through the two-tier optimization algorithm for trajectory similarity measure of the four models of pedestrians and cyclists.

As can be seen from Fig. 9, the ranges of the number of rules in these two sets of data are not significantly different. However, the more complex the model and the greater the use of input parameters, the more rules are generated to describe the system. The similarity of the number of rules in these two data sets can be considered as an indicator demonstrating the stability and repeatability of the proposed system for a variety of data sets [29].

5. Conclusion and future work

In this research, a novel hybrid method based on fuzzy inference systems was proposed to measure the multidimensional similarities of non-identical moving object trajectories with the use of internal and external context information. The proposed method, which was named CaFIRST, improves the existing distance functions in terms of using movement data and heterogeneous, descriptive and uncertain context information.

During the design of CaFIRST, to overcome the exponential increase of the rules to promote the accuracy of the system, two solutions were proposed. The first solution was using a hierarchical fuzzy inference system with a topology according to the problem and the second solution was simultaneous generation of membership functions and tuning of rules through learning. The combination of these two solutions led to a better association between the rules of the models and the optimization of the accuracy and complexity of the system.

CaFIRST improves upon the current distance functions by taking into account both internal and external context information, along with movement parameters. CaFIRST is able to handle both the quantitative and qualitative information attributed to each sampling point and to the entire trajectory. The performance of CaFIRST was evaluated by running several models on two real data sets consisting of pedestrian and cyclist trajectories in a constrained Euclidean space, accounting for internal and external context information. The results not only confirmed the efficiency of CaFIRST in producing a context-aware similarity measure of trajectories with various sizes, but also showed CaFIRST to be a system sensitive to any small alteration in the context information.

There are multiple directions for future work. We will use the similarity results of the trajectories for mining patterns and predicting the behaviour of the MPOs. Furthermore, we are interested in exploiting context information for other aspects of analysing moving object traces, such as classification and clustering. CaFIRST has been introduced to measure the similarity of movement between single MPO trajectories. Future research could examine the movements of groups of moving objects, such as tourist movements or the migrations of birds. Another promising future research direction relates to the interactions, (i.e., attraction or avoidance) of the MPOs with each other and with the geographical context. In addition, one can determine the place of interests (POIs) based on the outcomes of the similarity models.

References

Abbasi ,

Alesheikh and

Sharif , Ranking the City: The Role of location-based social media check-ins in collective human mobility prediction, ISPRS International Journal of Geo-Information 6(5) (2017), (136.

Alcalá ,

Alcalá-Fdez and

Herrera , A proposal for the genetic lateral tuning of linguistic fuzzy systems and its interaction with rule selection, IEEE Transactions on Fuzzy Systems 15(4) (2007), 616–635.

Amouzandeh ,

Goudarzi and

Karimipour , Contextual analysis of spatio-temporal walking observations, Cham: Springer International Publishing, 2018.

A.D.

Benítez and

Casillas , Multi-objective genetic learning of serial hierarchical fuzzy systems for large-scale problems, Soft Computing 17(1) (2013), 165–194.

V.S.

Brum-Bastos ,

J.A.

Long and

Demšar , Weather effects on human mobility: A study using multi-channel sequence analysis, Computers, Environment and Urban Systems 71 (2018), 131–152.

Buchin ,

Dodge and

Speckmann , Similarity of trajectories taking into account geographic context, Journal of Spatial Information Science 2014(9), 101–124.

Chandrika and

Amudha , A fuzzy inference system to recommend skills for source code review using eye movement data, Journal of Intelligent & Fuzzy Systems 34(3) (2018), 1743–1754.

R.D.

Das and

Winter , A fuzzy logic based transport mode detection framework in urban environment, Journal of Intelligent Transportation Systems 22(6) (2018), 478–489.

Demšar , et al., Analysis and visualisation of movement: An interdisciplinary review, Movement Ecology 3(1) (2015), 1–24.

10.

Dodge ,

Laube and

Weibel , Movement similarity assessment using symbolic representation of trajectories, International Journal of Geographical Information Science 26(9) (2012), 1563–1588.

11.

A.S.

Furtado ,

L.L.

Pilla and

Bogorny , A branch and bound strategy for fast trajectory similarity measuring, Data & Knowledge Engineering 115 (2018), 16–31.

12.

A.S.

Furtado , et al., Multidimensional Similarity measuring for semantic trajectories, Transactions in GIS, 2015.

13.

Herrera ,

Lozano and

J.L.

Verdegay , Tackling real-coded genetic algorithms: Operators and tools for behavioural analysis, Artificial Intelligence Review 12(4) (1998), 265–319.

14.

Ilarri ,

Stojanovic and

Ray , Semantic management of moving objects: A vision towards smart mobility, Expert Systems with Applications 42(3) (2015), 1418–1435.

15.

Izakian ,

M.S.

Mesgari and

Abraham , Automated clustering of trajectory data using a particle swarm optimization, Computers, Environment and Urban Systems 55 (2016), 55–65.

16.

Kaffash-Charandabi ,

A.A.

Alesheikh and

Sharif , A ubiquitous asthma monitoring framework based on ambient air pollutants and individuals’ contexts, Environmental Science and Pollution Research, 8 (2019), 7525–7539.

17.

Laube , Computational Movement Analysis. 1st ed. Springer Briefs in Computer Science. 2014: Cham: Springer International Publishing, p. 87.

18.

Lin , An information-theoretic definition of similarity, in ICML, 1998.

19.

López , et al., A hierarchical genetic fuzzy system based on genetic programming for addressing classification with highly imbalanced and borderline data-sets, Knowledge-Based Systems 38 (2013), 85–104.

20.

Parent , et al., Semantic trajectories modeling and analysis, ACM Computing Surveys 45(4) (2013), 1–32.

21.

Sadrykia ,

M.R.

Delavar and

Zare , A GIS-based decision making model using fuzzy sets and theory of evidence for seismic vulnerability assessment under uncertainty (case study: Tabriz), Journal of Intelligent & Fuzzy Systems 33(3) (2017), 1969–1981.

22.

Sharif and

A.A.

Alesheikh , Similarity measurement of trajectories based on contextual data in constrained euclidean space, Journal of Geomatics Science and Technology 5(4) (2016), 113–125.

23.

Sharif and

A.A.

Alesheikh , Context-awareness in similarity measures and pattern discoveries of trajectories: A context-based dynamic time warping method, GIScience & Remote Sensing 54(3) (2017), 426–452.

24.

Sharif and

A.A.

Alesheikh , Multi-dimensional pattern discovery of trajectories using contextual information, in 12th 3D Geoinfo Conference 2017, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences: Melbourne, Australia, 2017.

25.

Sharif and

A.A.

Alesheikh , Context-aware movement analytics: Implications, taxonomy, and design framework, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2018), e1233.

26.

Sharif ,

A.A.

Alesheikh and

Tashayo , Similarity measure of trajectories using contextual information and fuzzy approach, in Adjunct Proceedings of the 14th International Conference on Location Based Services, 2018, ETH Zurich.

27.

R.K.

Sharma ,

Kumar and

Kumar , Systematic failure mode effect analysis (FMEA) using fuzzy linguistic modelling, International Journal of Quality & Reliability Management 22(9) (2005), 986–1004.

28.

Spaccapietra , et al., A conceptual view on trajectories, Data & Knowledge Engineering 65(1) (2008), 126–146.

29.

Tashayo ,

Alimohammadi and

Sharif , A hybrid fuzzy inference system based on dispersion model for quantitative environmental health impact assessment of urban transportation planning, Sustainability 9(1) (2017), (134.

30.

Wan and

Lin , Classifying human activity patterns from smartphone collected GPS data: A Fuzzy classification and aggregation approach, Transactions in GIS 20(6) (2016), 869–886.

31.

Wang , et al., A new method for measuring topological structure similarity between complex trajectories, IEEE Transactions on Knowledge and Data Engineering (2018), 1–14.

32.

Wikiloc. 2017; Available from: https://www.wikiloc.com/.

33.

Xia , et al., Research of spatio-temporal similarity measure on network constrained trajectory data, in Rough Set and Knowledge Technology: in Proceedings 5th International Conference, RSKT 2010, Beijing, China,

Yu , et al., Editors. 2010, Springer Berlin Heidelberg: Berlin, Heidelberg, 2010, pp. 491–498.

34.

Xiao , et al., Inferring social ties between users with human location history, Journal of Ambient Intelligence and Humanized Computing 5(1) (2014), 3–19.

35.

Yuan and

Raubal , Measuring similarity of mobile phone user trajectories–a Spatio-temporal Edit Distance method, International Journal of Geographical Information Science 28(3) (2014), 496–520.

36.

Zajaczkowski and

Verma , Selection and impact of different topologies in multi-layered hierarchical fuzzy systems, Applied Intelligence 36(3) (2012), 564–584.

37.

Zhang , et al., Splitter: Mining fine-grained sequential patterns in semantic trajectories, Proceedings of the VLDB Endowment 7(9) (2014), 769–780.

38.

Zhang , et al., Hierarchical fuzzy rule-based system optimized with genetic algorithms for short term traffic congestion prediction, Transportation Research Part C: Emerging Technologies 43 (2014), 127–142.