Trajectory outlier detection method based on group division

Abstract

Trajectory outlier detection can be used to discover fraudulent behaviour by taxi drivers during operation. Existing detection methods typically consider each trajectory as a whole, which results in low accuracy and slow detection. In this study, a trajectory outlier detection method based on group division is proposed. First, the urban vector region is divided into a series of fixed-size grids, and the density of each grid is calculated from the urban road network. Second, the grids are classified as high- or low-density according to their density, and the code sequence of each trajectory is obtained from the grid codes and densities. Third, the trajectory dataset is divided into several groups based on the number of low-density grids through which each trajectory passes. Finally, within each trajectory group, a regular sub-trajectory dataset is obtained from the high-density grid sequences and used to calculate the trajectory deviation and detect outlying trajectories. Experiments on real trajectory datasets show that the proposed method detects abnormal trajectories more accurately and in less time than comparable methods.

Keywords

Trajectory outlier detection, group division, grid density, trajectory group, regular sub-trajectories

1. Introduction

The development of vehicle communication and networks has generated a large amount of movement trajectory data. Analysis of these data can assist in urban planning [1], congestion detection [2], fraud detection [3], road flow prediction [4], and travel time estimation [5]. In this context, trajectory data mining and its applications have become important research topics for urban transportation systems.

Trajectory data analysis includes trajectory clustering [6, 7], trajectory classification [8], trajectory pattern mining [9], and trajectory outlier detection [10, 11, 12, 13]. This study focuses on the detection of taxi trajectory outliers. Trajectories that deviate from the norm spatially or in terms of distance are considered trajectory outliers (i.e., outlying trajectories) [14]. Many scholars have conducted research on trajectory outlier detection and achieved corresponding results, including clustering-, distance- [15] and grid-based [16, 17] methods.

However, most existing methods have two limitations. First, they neglect the impact of the road network environment on trajectory outlier detection, even though detour behaviour tends to occur in regions with dense road networks, where more roads are available for selection. Second, they treat the trajectory dataset as a whole and ignore the deviations between different types of normal trajectories. To solve these problems, this study proposes a trajectory outlier detection method based on group division (TODG), which detects outliers in a dataset of trajectories with the same source (S) and destination (D).

The main contributions of this paper can be summarised as follows:

A method for defining grid types based on grid density calculations is proposed. The grids in the urban region are divided into high- and low-density grids, and trajectory groups are obtained based on the number of low-density grids through which each trajectory passes.

A trajectory outlier detection method based on group division is proposed. In each trajectory group, a regular sub-trajectory dataset is obtained based on high-density grid sequences. This prevents excessive spatiotemporal deviation among normal trajectories from degrading the detection results and improves detection efficiency and accuracy.

A series of comparative experiments were conducted on real taxi trajectory datasets to verify the proposed method. The experimental results demonstrate that TODG outperforms the compared methods in both running time and detection quality.

The remainder of this paper is organised as follows: Section 2 introduces existing methods related to trajectory-outlier detection. Section 3 provides the definitions of terms used here and related statements. The proposed method is described in detail in Section 4. The experimental evaluation and results are presented in Section 5. Section 6 summarises the study and provides suggestions for future work.

2. Related work

Existing trajectory outlier detection algorithms can be classified into clustering-, distance-, and grid-based methods.

2.1 Clustering-based methods

Clustering-based methods determine outlying trajectories according to clustering results. Ying et al. [18] proposed a method for identifying outliers in trajectories using the DBSCAN algorithm to cluster trajectories and identify trajectories in lower-density clusters as outliers; however, they ignored features other than position. To better express the local feature information of trajectories, Lu et al. [19] proposed a trajectory division strategy based on multi-motion features and a trajectory structure similarity measurement method, based on which they designed a distributed clustering algorithm for flow trajectories and detected outlying trajectories using the clustering results. Focusing on the problem of outlier detection in urban traffic flow, Wang et al. [20] proposed a framework that uses the fuzzy C-means clustering algorithm with optimal $k$ -clustering centres to cluster roads and extract similar road traffic flow patterns.

Clustering-based trajectory-outlier detection methods can be run in unsupervised mode. Some clustering-based algorithms require the distances between the trajectory points to be calculated. Therefore, they are similar to distance-based outlier detection methods.

2.2 Distance-based methods

Distance-based methods calculate the distance between data objects and obtain outlying trajectories based on Euclidean distance, Manhattan distance, dynamic time warp distance, etc. [21]. Lee et al. [22] proposed a partitioning and detection framework that first segmented trajectories into a set of line segments and then detected outliers using Hausdorff distance and density-based methods. To improve the accuracy of outlier detection, Hu et al. [23] analysed a trajectory-outlier detection method based on the Hausdorff distance and proposed an Improved Moving Euclidean Distance (IMED) to replace the Hausdorff distance. Belhadi et al. [24] proposed a two-phase outlier detection model to detect group trajectory outliers. In the first stage, individual taxi trajectory outliers are determined by calculating the distance between each point. In the second stage, feature selection and sliding window strategies were used to identify group trajectory outliers.

Distance-based methods require the selection of an appropriate distance metric, on which their performance closely depends.

2.3 Grid-based methods

Grid-based methods quantify trajectories into a finite number of cells and convert each trajectory into a series of grid codes used for outlier detection. Based on the fact that abnormal features are easily affected by isolation mechanisms, Zhang et al. [17] proposed an iBAT algorithm based on isolation forests and isolation mechanisms. In response to the low efficiency of the iBAT algorithm, Chen et al. [10] proposed an isolation-based online outlier detection method that utilised an inverted indexing mechanism to quickly retrieve relevant trajectories. To determine different types of outlying trajectories, Wang et al. [25] proposed a trajectory-outlier detection and classification (ATDC) method that classifies trajectories into five categories based on trajectory anomaly scores. In addition to the aforementioned methods that detect outliers from a dataset of trajectories with the same source and destination, Bhattacharjee et al. [26] proposed a grid-based outlier detection technique that calculates the point density using kernel density estimation instead of distance measurement.

The trajectory outlier detection results of clustering-based methods largely depend on the selection of clustering numbers and the existence of outliers. Distance-based methods require an appropriate distance metric and repeated calculation of the distances between trajectories, which is time-consuming. Grid-based methods have the advantages of high processing speeds and efficiency in trajectory retrieval. In addition, grid-based methods are typically used to detect outliers from a dataset of trajectories with the same source and destination places; therefore, this study adopts a grid-based method to detect outlying trajectories. Existing grid-based methods generally convert trajectories into unit sequences to detect outliers, ignoring the road network environment and deviations between different types of normal trajectories. In reality, some grids have a large number of roads, whereas others have a small number, and abnormal behaviours such as detours often occur in grids with more optional roads. Therefore, this study proposes a trajectory-outlier detection method based on group division (TODG) to address these issues.

3. Problem description

This section defines the relevant concepts used in this paper.

Definition 1 (Trajectory). A trajectory refers to the GPS point sequence during the driving process of a taxi. The $i^{\text{th}}$ trajectory is denoted by $T_{i}=\langle{Id}_{i},(\textit{lon}_{1}^{i},\textit{lat}_{1}^{i}),(\textit{lon}_{2}^{i},\textit{lat}_{2}^{i}),\ldots,(\textit{lon}_{n}^{i},\textit{lat}_{n}^{i})\rangle$, where $Id_{i}$ denotes the trajectory identifier, $\textit{lon}_{j}^{i}$ and $\textit{lat}_{j}^{i}$ are the longitude and latitude of the $j^{\text{th}}$ point ${p}_{j}^{i}$ of the $i^{\text{th}}$ trajectory, respectively, and $n$ denotes the number of points.

Definition 2 (SD-pair trajectories). All trajectories with the same source (S) and destination (D) are defined as SD-pair trajectories.

Definition 3 (Trajectory dataset). A set consisting of multiple SD-pair trajectories is defined as a trajectory dataset and is represented as follows:

$\displaystyle TD=\{{T_{1},T_{2},T_{3},\ldots,T_{n}}\}.$ (1)

The subject of this study is trajectories with fixed source and destination.

Definition 4 (Grid density). The urban vector region is divided into cells of equal length and width, called road network grids. The density of each road network grid was calculated using Eq. (2).

$\displaystyle\textit{Density}_{G_{S}}=\frac{\sum\nolimits_{i=1}^{k}{L_{i}}}{G_{S}},$ (2)

where $\mathop{\sum}\limits_{{i=1}}^{k}{L}_{i}$ denotes the total length of roads within the grid, and ${G}_{S}$ denotes the area of the grid.
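As a minimal sketch of Eq. (2), the following Python snippet computes the density of a single square grid from the lengths of the road segments clipped to it. The function name `grid_density`, the choice of metres as the unit, and the example segment lengths are illustrative assumptions and are not taken from the study.

```python
def grid_density(road_lengths_m, cell_side_m):
    """Eq. (2): total road length inside a square grid divided by the grid area."""
    if cell_side_m <= 0:
        raise ValueError("cell_side_m must be positive")
    area = cell_side_m * cell_side_m
    return sum(road_lengths_m) / area


# Hypothetical 300 m x 300 m grid containing three clipped road segments.
density = grid_density([300.0, 150.0, 450.0], 300.0)
print(density)  # 900 m of road over 90,000 m^2, i.e. 0.01 m per m^2
```

Clipping the road segments to the grid boundary is assumed to have been done beforehand, for example in a GIS tool.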

Definition 5 (Trajectory code sequence). Taking trajectory $T_{i}$ as an example, ${TS}_{h}^{i}=\langle{{g}_{{11}}^{i},g_{{12}}^{i},\ldots,g_{{1m}}^{i}}\rangle$ and ${TS}_{l}^{i}=\langle{{g}_{{21}}^{i},g_{{22}}^{i},\ldots,g_{{2r}}^{i}}\rangle$ are the encoding sequences of the high- and low-density grids that $T_{i}$ passes through. The trajectory code sequence of $T_{i}$, denoted as $TS^{i}$, is defined as

$\displaystyle TS^{i}=\langle g_{11}^{i},g_{12}^{i},\ldots,g_{1m}^{i},g_{21}^{i},g_{22}^{i},\ldots,g_{2r}^{i}\rangle,$ (3)

where the number of trajectory codes in ${TS}_{h}^{i}$ is $m$ , and that in ${TS}_{l}^{i}$ is $r$ .
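Definition 5 can be illustrated with a short Python sketch that splits the grid codes visited by one trajectory into the high- and low-density parts and concatenates them as in Eq. (3). The function name, the dictionary `density_by_code`, the example densities, and the convention that a density equal to the threshold counts as high are assumptions made for illustration only.

```python
def code_sequences(visited_codes, density_by_code, eta):
    """Split the grid codes a trajectory visits into high- and low-density parts."""
    # Assumption: a grid whose density equals the threshold eta is treated as high-density.
    ts_h = [g for g in visited_codes if density_by_code[g] >= eta]
    ts_l = [g for g in visited_codes if density_by_code[g] < eta]
    return ts_h, ts_l, ts_h + ts_l  # TS lists the codes of TS_h, then those of TS_l


# Hypothetical grid codes and densities for one trajectory.
densities = {11: 0.020, 12: 0.015, 27: 0.003, 28: 0.004, 40: 0.030}
ts_h, ts_l, ts = code_sequences([11, 12, 27, 28, 40], densities, eta=0.01)
print(ts_h, ts_l, ts)
```

In this example $m=3$ and $r=2$, so the trajectory would be placed in the group of trajectories that pass through two low-density grids.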

Definition 6 (Trajectory group). A set of trajectories with the same number of low-density grids is defined as a trajectory group.

Figure 1.

Division of trajectory groups.

As shown in Fig. 1, the grids marked in blue are low-density grids, and the rest are high-density grids. A trajectory dataset can be divided into one or more trajectory groups according to the number of low-density grids through which each trajectory passes. In Fig. 1, $T_{1}$ and $T_{2}$ each pass through eight low-density grids; therefore, $T_{1}$ and $T_{2}$ are in the same trajectory group. Similarly, $T_{3}$, $T_{4}$, and $T_{5}$ are in the same trajectory group; $T_{6}$, $T_{7}$, and $T_{8}$ are in the same trajectory group; and $T_{9}$ belongs to a separate trajectory group. The trajectory dataset is thus divided into four groups.

Figure 2.

Division of trajectory.

Definition 7 (Compatible trajectory). In the same trajectory group, if the $TS_{h}$ of $T_{i}$ and $T_{j}$ are the same, then $T_{i}$ and $T_{j}$ are called compatible trajectories. Two trajectories with completely different $TS_{h}$ are considered incompatible. Whether any two trajectories in a trajectory group are compatible or incompatible is determined using Eq. (4).

$\displaystyle\rho=\frac{|{TS_{h}^{i}\cup TS_{h}^{j}}|-|{TS_{h}^{i}\cap TS_{h}^{j}}|}{|{TS_{h}^{i}\cup TS_{h}^{j}}|},$ (4)

where ${TS}_{h}^{i}$ and ${TS}_{h}^{j}$ are the encoding sequences of the high-density grids that $T_{i}$ and $T_{j}$ pass through, respectively. If $\rho=0$, the trajectories $T_{i}$ and $T_{j}$ are compatible; if $\rho=1$, $T_{i}$ and $T_{j}$ are incompatible.
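Eq. (4) can be sketched in Python as follows. Because the equation is written with union and intersection, the sketch treats each high-density code sequence as a set of grid codes, which is an assumption; the function name `incompatibility` and the example codes are likewise illustrative.

```python
def incompatibility(ts_h_i, ts_h_j):
    """Eq. (4): share of high-density grid codes not common to both trajectories."""
    a, b = set(ts_h_i), set(ts_h_j)
    union = a | b
    if not union:
        raise ValueError("both code sequences are empty")
    return (len(union) - len(a & b)) / len(union)


print(incompatibility([1, 2, 3], [1, 2, 3]))  # 0.0: compatible
print(incompatibility([1, 2], [3, 4]))        # 1.0: incompatible
print(incompatibility([1, 2, 3], [2, 3, 4]))  # 0.5: partially overlapping
```

Values strictly between 0 and 1 correspond to trajectories that share some, but not all, high-density grids; under Definition 7 such pairs are not compatible.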

Definition 8 (Trajectory cluster). A set of compatible trajectories is defined as a trajectory cluster.

In the example shown in Fig. 1, $T_{4}$ and $T_{5}$ are compatible trajectories in the same cluster.

Definition 9 (Trajectory code rate). Let $TD$ be a trajectory dataset and $TZ_{i}$ be the $i^{\text{th}}$ trajectory cluster. The trajectory code rate of $TZ_{i}$ is defined as the ratio of the number of trajectories within $TZ_{i}$ to the number of trajectories within $TD$, and is calculated using Eq. (5).

$\displaystyle\zeta_{i}=\frac{\textit{num}(TZ_{i})}{\textit{num}(TD)},$ (5)

where $\textit{num}(TZ_{i})$ is the number of trajectories within $TZ_{i}$ and $\textit{num}(TD)$ is the number of trajectories within $TD$.

Definition 10 (Regular sub-trajectory). A trajectory cluster whose trajectory code rate exceeds a specified threshold is called a regular trajectory set, in which the trajectories are defined as regular subtrajectories. Regular subtrajectories are part of normal trajectories. In addition to regular subtrajectories, other trajectories in the trajectory group are called pseudo-anomalous trajectories. The relationships between regular subtrajectories, pseudo-anomalous trajectories, trajectory clusters, trajectory groups, and trajectory datasets are shown in Fig. 2.
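The split described in Definition 10 can be sketched in Python as below. Each cluster's trajectory code rate is computed with Eq. (5) and compared with a threshold; the function name, the example trajectory identifiers, the dataset size of 10, and the threshold of 0.15 are hypothetical, and the use of "greater than or equal to" follows the comparison in Algorithm 3.

```python
def split_clusters(clusters, dataset_size, zeta):
    """Separate regular sub-trajectories from pseudo-anomalous ones using Eq. (5)."""
    regular, pseudo = [], []
    for cluster in clusters:
        rate = len(cluster) / dataset_size  # trajectory code rate of this cluster
        (regular if rate >= zeta else pseudo).extend(cluster)
    return regular, pseudo


# Hypothetical clusters from one trajectory group of a 10-trajectory dataset.
clusters = [["T1", "T2", "T3"], ["T4"], ["T5", "T6"]]
regular, pseudo = split_clusters(clusters, dataset_size=10, zeta=0.15)
print(regular, pseudo)
```

Here the clusters with rates 0.3 and 0.2 become regular sub-trajectories, whereas the single-trajectory cluster with rate 0.1 is pseudo-anomalous.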

4. Trajectory outlier detection

TODG consists of two stages: trajectory pre-processing and trajectory outlier detection. The working model of the proposed method is illustrated in Fig. 3.

Figure 3.

Trajectory outlier detection framework.

4.1 Trajectory pre-processing

Before conducting trajectory outlier detection, each trajectory must be converted into a trajectory code sequence via trajectory preprocessing.

First, the urban vector region is divided into grids of fixed size. Each grid is assigned a different code. Subsequently, based on the urban road network data, the density of each grid was calculated using Eq. (2).
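One possible way to assign grid codes is sketched below in Python. It is not the ArcGIS workflow used in this study: the flat-earth conversion from degrees to metres, the row-major numbering from the south-west corner, the function name `grid_code`, and the example coordinates are all assumptions made for illustration.

```python
import math


def grid_code(lon, lat, origin_lon, origin_lat, cell_m, n_cols):
    """Return a row-major integer code for the grid cell containing (lon, lat)."""
    # Assumption: a flat-earth approximation is adequate over a single city.
    m_per_deg_lat = 111_320.0
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(origin_lat))
    col = int((lon - origin_lon) * m_per_deg_lon // cell_m)
    row = int((lat - origin_lat) * m_per_deg_lat // cell_m)
    if not (0 <= col < n_cols) or row < 0:
        raise ValueError("point lies outside the gridded region")
    return row * n_cols + col


# Hypothetical region origin (south-west corner) and a 100-column grid of 300 m cells.
code = grid_code(-122.40, 37.765, -122.52, 37.70, cell_m=300.0, n_cols=100)
print(code)
```

Applying the function to every GPS point of a trajectory and removing consecutive duplicates would give the sequence of grid codes that the trajectory passes through.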

In this study, a natural interruption-point grading method was used to determine the grid density threshold $\eta$. This method clusters the density values so as to maximise the similarity within each class and the dissimilarity between classes. Unlike plain clustering, which does not consider the number and range of elements in each class, the natural interruption-point grading method also keeps the density range and number of elements of the different classes as similar as possible. The grids in the urban region were divided into high- and low-density grids based on this threshold $\eta$. Fixed SD-pair trajectories were then coded to obtain trajectory code sequences.
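The idea of choosing $\eta$ from the data can be sketched as a two-class split that minimises the within-class sum of squared deviations. This Python sketch assumes that the grading method behaves like a two-class natural-breaks classification; it is not the GIS routine used in the study, and the function name and example densities are hypothetical.

```python
def two_class_break(values):
    """Return a threshold that minimises the within-class sum of squared deviations."""
    xs = sorted(values)
    if len(xs) < 2:
        raise ValueError("need at least two values")

    def sse(part):
        mean = sum(part) / len(part)
        return sum((x - mean) ** 2 for x in part)

    best_cost, best_k = None, None
    for k in range(1, len(xs)):  # xs[:k] is the low class, xs[k:] the high class
        cost = sse(xs[:k]) + sse(xs[k:])
        if best_cost is None or cost < best_cost:
            best_cost, best_k = cost, k
    return xs[best_k]  # smallest value of the high-density class


# Hypothetical grid densities with an obvious gap between sparse and dense grids.
eta = two_class_break([0.001, 0.002, 0.003, 0.012, 0.014, 0.015])
print(eta)
```

The exhaustive search is quadratic in the number of grids, which is acceptable for a sketch; grids with a density of at least the returned value would be labelled high-density.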

4.2 Trajectory outlier detection

In the trajectory outlier detection stage, trajectory groups are first obtained according to the number of low-density grids. Trajectory clusters are then obtained within each trajectory group for outlier detection. If, apart from the start and end codes, the number of high-density grids in a trajectory group is zero, outliers are detected based on the code sequence of low-density grids (this situation almost never arises here because the region studied is an urban centre).

4.2.1 Trajectory grouping

Detours mostly occur in dense areas of the road network, where the road network is complex and there are many optional roads. However, vehicles may also pass through sparse areas of the road network with low grid density while driving. For trajectory outlier detection, the trajectory dataset is first divided into trajectory groups based on the number of low-density grids through which each trajectory passes. The trajectory grouping method is presented in Algorithm 1.

Algorithm 1: Trajectory grouping
Input:
Trajectory code sequence set $TP$, trajectory code number matrix $Gn$
Output:
Trajectory groups ${TG}_{1}$, ${TG}_{2}$, …, ${TG}_{p}$
sortrows($Gn$, 3); // Sort $Gn$ in ascending order by the element value of the third column
$j\leftarrow 1$; $p\leftarrow 1$; $t\leftarrow Gn_{1,3}$; $TG_{1}\leftarrow\emptyset$; $i\leftarrow 1$;
while $i\leq$ row($Gn$) // row($Gn$) is the number of rows of $Gn$
    if $t==Gn_{i,3}$
        $h\leftarrow Gn_{i,1}$;
        for $r\leftarrow 1$ to row($TP$) // row($TP$) is the number of trajectories within $TP$
            if $TP\{r,1\}_{1,1}==h$
                $TG_{p}\{j,1\}\leftarrow TP\{r,1\}$;
                $j\leftarrow j+1$;
            end if
        end for
    else
        $p\leftarrow p+1$;
        $TG_{p}\leftarrow\emptyset$; $j\leftarrow 1$;
        $t\leftarrow Gn_{i,3}$;
        $i\leftarrow i-1$; // Row $i$ is examined again as the first member of the new group
    end if
    $i\leftarrow i+1$;
end while
return ${TG}_{1},{TG}_{2},\ldots,{TG}_{p}$;

In the initialisation stage, $G n$ recorded the numbers of high- and low-density grids for each trajectory. Each row represents a trajectory. The first column records the identification of the corresponding trajectory, and the second and third columns record the numbers of high- and low-density grids, respectively. $T P$ is the set of trajectory code sequences obtained after preprocessing, and each row records the full information of one trajectory, including trajectory identification and trajectory high- and low-density codes.

The time complexity of Algorithm 1 is $O(n^{2})$ and the space complexity is $O(n)$ , where $n$ is the number of trajectories.
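The grouping step can also be sketched in Python with a dictionary keyed by the number of low-density grids. This is an alternative formulation for illustration rather than the MATLAB implementation of Algorithm 1; the function name and the example trajectories are hypothetical, and each low-density grid is assumed to appear once in a trajectory's code list.

```python
from collections import defaultdict


def group_by_low_density_count(trajectories):
    """Group trajectory ids by the number of low-density grids they pass through.

    `trajectories` maps a trajectory id to a (ts_h, ts_l) pair of grid-code lists.
    """
    groups = defaultdict(list)
    for traj_id, (_ts_h, ts_l) in trajectories.items():
        groups[len(ts_l)].append(traj_id)
    # Order the groups by low-density count, mirroring the sort in Algorithm 1.
    return [groups[count] for count in sorted(groups)]


# Hypothetical trajectories: T1 and T2 pass two low-density grids, T3 passes one.
example = {
    "T1": ([1, 2], [7, 8]),
    "T2": ([1, 3], [8, 9]),
    "T3": ([1, 2, 3], [7]),
}
print(group_by_low_density_count(example))
```

With a hash-based dictionary, the grouping takes a single pass over the trajectories plus a sort over the distinct counts, instead of the nested loops of Algorithm 1.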

4.2.2 Trajectory clustering

Before a trajectory group is divided into trajectory clusters, it is necessary to calculate the trajectory group rate to determine whether clustering is necessary. The trajectory group rate, denoted by $\gamma$ , was calculated using Eq. (6).

$\displaystyle\gamma_{i}=\frac{\textit{num}(TG_{i})}{\textit{num}(TD)},$ (6)

where $\textit{num}(TG_{i})$ is the number of trajectories within the trajectory group $TG_{i}$, and $\textit{num}({TD})$ is the number of trajectories within the trajectory dataset $TD$. If the trajectory group rate is less than the trajectory code rate threshold $\zeta$, all trajectories in this trajectory group are outlying trajectories. If the trajectory group rate is greater than or equal to $\zeta$, the trajectory group is divided into trajectory clusters.

For a trajectory group that can be divided, the first step in trajectory-outlier judgment is the acquisition of trajectory clusters. Compatible trajectories within the trajectory group, identified using Eq. (4), are placed in the same trajectory cluster, such that the trajectory group is divided into one or more trajectory clusters.

The trajectory-clustering process is presented in Algorithm 2. Let us consider trajectory group $TG_{1}$ as an example.

Algorithm 2: Trajectory clustering
Input:
Trajectory dataset $TD$, trajectory group $TG_{1}$, trajectory code rate threshold $\zeta$
Output:
Trajectory clusters $TZ_{1}$, $TZ_{2}$, …, $TZ_{t-1}$ or outlying trajectory dataset $TF$
$TF\leftarrow\emptyset$;
Calculate $\gamma_{1}$ using Eq. (6);
if $\gamma_{1}<\zeta$
    $TF\leftarrow TF\cup TG_{1}$;
    return $TF$;
else
    $t\leftarrow 1$;
    while $\sim$isempty($TG_{1}$)
        $j\leftarrow 1$;
        $TZ_{t}\leftarrow\emptyset$;
        for $i\leftarrow 1$ to row($TG_{1}$) // row($TG_{1}$) is the number of trajectories in trajectory group $TG_{1}$
            $p\leftarrow|TG_{1}\{1,1\}_{:,2}\cup TG_{1}\{i,1\}_{:,2}|$;
            $q\leftarrow|TG_{1}\{1,1\}_{:,2}\cap TG_{1}\{i,1\}_{:,2}|$;
            $\rho\leftarrow(p-q)/p$; // Using Eq. (4)
            if $\rho==0$ // The $i$th trajectory is compatible with the first remaining trajectory
                $TZ_{t}\{j,1\}\leftarrow TG_{1}\{i,1\}$;
                $j\leftarrow j+1$;
            end if
        end for
        $TG_{1}\leftarrow TG_{1}-TZ_{t}$;
        $t\leftarrow t+1$;
    end while
    return $TZ_{1}$, $TZ_{2}$, …, $TZ_{t-1}$;
end if

The time complexity of Algorithm 2 is $O(f\times v)$ , and the space complexity is $O(q)$ , where $f=t-1$ is the number of trajectory clusters, $v=\max\{|TZ_{1}|,|TZ_{2}|,\ldots,|TZ_{t-1}|\}$ , and $q$ is the number of trajectories within $TG_{1}$ .
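The clustering step can be sketched in Python as follows. Instead of the pairwise comparison loop of Algorithm 2, the sketch hashes each trajectory's set of high-density grid codes, which yields the same clusters under the assumption that compatibility means identical code sets ($\rho=0$ in Eq. (4)). The function name, the example group, and the dataset size of 9 are hypothetical.

```python
def cluster_group(group, dataset_size, zeta):
    """Return (clusters, outliers) for one trajectory group.

    `group` maps a trajectory id to its list of high-density grid codes.
    """
    if len(group) / dataset_size < zeta:  # trajectory group rate, Eq. (6)
        return [], list(group)  # the whole group is reported as outlying
    clusters = {}
    for traj_id, ts_h in group.items():
        # Identical code sets give rho = 0 in Eq. (4), i.e. compatible trajectories.
        clusters.setdefault(frozenset(ts_h), []).append(traj_id)
    return list(clusters.values()), []


# Hypothetical group in which T4 and T5 share the same high-density grids.
group = {"T3": [1, 2, 5], "T4": [1, 2, 6], "T5": [1, 2, 6]}
print(cluster_group(group, dataset_size=9, zeta=0.1))
```

A group whose rate falls below the threshold is returned unchanged as a list of outlying trajectory identifiers, matching the early return of Algorithm 2.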

4.2.3 Outlier detection

After trajectory clustering, a set of regular subtrajectories is obtained using Eq. (5). At this point, the trajectory group is divided into a regular subtrajectory dataset and a pseudo-anomalous trajectory dataset. Each trajectory in the pseudo-anomalous trajectory dataset is then judged as normal or abnormal against the regular subtrajectory set. In this study, a trajectory-scoring function was designed to judge outlying trajectories. Taking any trajectory group as an example, the trajectory score calculation is given by Eq. (7).

$\displaystyle\varpi_{j}=\frac{|{TS^{i}\cap TS^{j}}|}{|{TS^{i}\cup TS^{j}}|},$ (7)

where $TS^{i}$ is the code sequence of trajectory $T_{i}$ in the regular sub-trajectory dataset and $TS^{j}$ is the code sequence of trajectory $T_{j}$ in the pseudo-anomalous trajectory dataset. The calculation result satisfies 0 $\leqslant\varpi_{j}\leqslant$ 1; the larger the $\varpi_{j}$ , the more likely the trajectory $T_{j}$ is a normal trajectory.
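A minimal Python sketch of the score in Eq. (7) is given below. As with Eq. (4), the code sequences are treated as sets of grid codes, which is an assumption; the function name and the example codes are illustrative.

```python
def trajectory_score(ts_regular, ts_candidate):
    """Eq. (7): shared grid codes divided by all grid codes of the two trajectories."""
    a, b = set(ts_regular), set(ts_candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sequences that differ in a single grid (4 versus 5).
score = trajectory_score([1, 2, 3, 4, 7, 8], [1, 2, 3, 5, 7, 8])
print(round(score, 3))  # 5 shared codes out of 7 distinct codes
```

A candidate that follows a regular sub-trajectory closely shares most grid codes with it and therefore scores near 1, whereas a detour through different grids lowers the score.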

After trajectory scoring, the pseudo-anomalous trajectory dataset was divided into quasi-normal and outlying datasets. The set formed by the quasi-normal trajectory dataset and regular subtrajectory dataset is the normal trajectory dataset in the trajectory group, and the remaining trajectories in the group are the outlying trajectories. The outlying trajectory determination results of the trajectory groups were not affected by each other.

The acquisition of a regular subtrajectory dataset and judgment of the outlying trajectories are shown in Algorithm 3 (taking trajectory group $TG_{1}$ as an example).

Algorithm 3: Regular sub-trajectory dataset acquisition and trajectory outlier detection
Input:
Trajectory dataset $TD$, trajectory clusters $TZ_{1}$, $TZ_{2}$, …, $TZ_{t-1}$, trajectory code rate threshold $\zeta$, trajectory score threshold $\varpi$, trajectory group $TG_{1}$
Output:
Outlying trajectory dataset $TF$
$TN\leftarrow\emptyset$; $PT\leftarrow\emptyset$; $TF\leftarrow\emptyset$;
for $i\leftarrow 1$ to $t-1$
    Calculate $\zeta_{i}$ using Eq. (5);
    if $\zeta_{i}\geqslant\zeta$
        $TN\leftarrow TN\cup TZ_{i}$; // $TN$ represents the regular sub-trajectory dataset
    else
        $PT\leftarrow PT\cup TZ_{i}$; // $PT$ represents the pseudo-anomalous trajectory dataset
    end if
end for
for $j\leftarrow 1$ to row($PT$) // row($PT$) is the number of trajectories within $PT$
    Calculate $\varpi_{j}$ using Eq. (7);
    if $\varpi_{j}<\varpi$
        $TF\leftarrow TF\cup\{{{T}_{j}}\}$;
    end if
end for
return $TF$;

Figure 4.

San Francisco Bay Area and trajectories.

The time complexity of Algorithm 3 is $O(h)$ and the space complexity is $O(n)$ , where $h$ is the number of trajectories in the pseudo-anomalous trajectory dataset $P T$ , and $n$ is the number of trajectories. The final output $T F$ is the detection result of the TODG algorithm.
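The second loop of Algorithm 3 can be sketched in Python as below. The text does not state which regular sub-trajectory each pseudo-anomalous trajectory is scored against, so the sketch assumes that the best score over all regular sub-trajectories is used; this choice, the function name, and the example data are assumptions.

```python
def detect_outliers(regular, pseudo, score_threshold):
    """Return ids of pseudo-anomalous trajectories whose best score is below the threshold.

    `regular` and `pseudo` map trajectory ids to lists of grid codes.
    """
    def score(a, b):  # Eq. (7) on sets of grid codes
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    outliers = []
    for traj_id, codes in pseudo.items():
        # Assumption: compare against every regular sub-trajectory and keep the best match.
        best = max((score(ref, codes) for ref in regular.values()), default=0.0)
        if best < score_threshold:
            outliers.append(traj_id)
    return outliers


# Hypothetical group: T2 deviates by one grid, T3 takes a long detour.
regular = {"T1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
pseudo = {
    "T2": [1, 2, 3, 4, 5, 6, 7, 8, 9, 11],
    "T3": [1, 20, 21, 22, 23, 24, 25, 26, 27, 10],
}
print(detect_outliers(regular, pseudo, score_threshold=0.8))
```

In this example T2 scores 9/11 and is kept as quasi-normal, whereas T3 scores 2/18 and is reported as outlying.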

5. Results and discussion: Experimental analysis

5.1 Experimental setup and dataset

The experiments were run on a machine with a 3.10 GHz Intel Core i5 processor running Windows 10. The proposed algorithm was implemented using MATLAB 2020a and ArcGIS 10.7.

This study used the dataset provided by Piorkowski et al. (see Footnotes), which contains the trajectories of 536 taxis in the urban area of San Francisco, USA, over a 30-day period with an average sampling interval of 100 s. The GPS trajectory data record the location (latitude and longitude) of each taxi as well as the corresponding time and occupancy. Some of the trajectory points and a schematic map of the San Francisco Bay Area are shown in Fig. 4. In this study, only the trajectories between the San Francisco Urban Airport and the central residential location were extracted, and the performance of the proposed trajectory-outlier detection method was measured using the F-measure, Accuracy, Precision and Recall, which are defined in Eqs (8)–(11).

$\displaystyle\textit{F-measure}=\frac{2\times\textit{Recall}\times\textit{Precision}}{\textit{Recall}+\textit{Precision}},$ (8)

$\displaystyle\textit{Accuracy}=\frac{TP+TN}{TP+FN+FP+TN},$ (9)

$\displaystyle\textit{Precision}=\frac{TP}{TP+FP},$ (10)

$\displaystyle\textit{Recall}=\frac{TP}{TP+FN},$ (11)

where $T P$ (true positive) represents the number of normal trajectories that are correctly detected, $T N$ (true negative) represents the number of abnormal trajectories that are correctly detected, $F P$ (false positive) represents the number of abnormal trajectories that are mistakenly detected as normal, and $F N$ (false negative) represents the number of normal trajectories that are mistakenly detected as abnormal.
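Eqs (8)–(11) can be computed from the four counts as in the following Python sketch. The function name and the example counts are hypothetical and do not correspond to any result reported in this paper; as in the definitions above, normal trajectories are treated as the positive class.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Compute Eqs (8)-(11) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f_measure = 2 * recall * precision / (recall + precision)
    return {
        "f_measure": f_measure,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical counts for a 100-trajectory dataset.
for name, value in evaluation_metrics(tp=80, tn=15, fp=3, fn=2).items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against zero denominators, which would occur if no trajectory were labelled or predicted as normal.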

5.2 Grid density analysis

Based on the experimental analyses of ATDC [10] and iBAT [17], the regional vector map of San Francisco was first divided into fixed-size (300 m $\times$ 300 m) grids, as shown in Fig. 5a and b. The urban road data and divided vector graphics were superimposed, as shown in Fig. 5c. Then, the density of each grid was calculated. Based on the grid density calculation results, the grid density threshold $\eta$ was obtained using the natural-interruption-point grading method. The grids in the urban region were divided into two categories, high- and low-density, based on the threshold $\eta$ . As shown in Fig. 5d–f, the trajectory dataset was input into the divided urban region, and the grid codes passed by each trajectory were obtained to form the trajectory code sequence.

Figure 5.

Grid division.

Figure 6.

Road network information.

Table 1

Trajectory dataset information

$TD$	$|TD|$	Percentage of normal trajectories (%)	Percentage of abnormal trajectories (%)
T-1	1080	81.9	18.1
T-2	1150	81.7	18.3
T-3	1125	81.8	18.2

Figure 7.

F-measure, Accuracy, Precision, and Recall values on T-1 for different parameters.

Figure 8.

Results of other datasets.

5.3 Trajectory outlier detection

Regular subtrajectory acquisition and outlier judgment require two parameters: the trajectory code rate threshold $\zeta$ and the trajectory score threshold $\varpi$. To verify the performance of the proposed TODG method, three SD-pair (residential region to airport) trajectory sets, T-1 to T-3, were selected from 463,860 trajectories extracted from 1.122 million GPS points on 536 taxis. The roads near residential places or airports are more complex, and the roads between them are sparse, as shown in Fig. 6. T-1, T-2, and T-3 contain 1080, 1150, and 1125 trajectories, respectively. Some of the datasets are available at https://github.com/TUD-DD/ATDSO. The percentages of normal and outlying trajectories in each trajectory dataset are listed in Table 1, where $|TD|$ represents the number of trajectories in $TD$. T-1 was used as the test set to obtain appropriate values of $\zeta$ and $\varpi$. The F-measure, Accuracy, Precision and Recall for different $\zeta$ ($\zeta_{1}=0.006$, $\zeta_{2}=0.012$, $\zeta_{3}=0.018$, $\zeta_{4}=0.024$) and $\varpi$ ($\varpi_{1}=0.80$, $\varpi_{2}=0.87$, $\varpi_{3}=0.93$, $\varpi_{4}=0.98$) values are shown in Fig. 7.

From Fig. 7, the F-measure, Accuracy, Precision and Recall values are higher at ($\zeta_{2}$, $\varpi_{3}$); therefore, these parameters were used to verify the evaluation values on datasets T-2 and T-3. As shown in Fig. 8, good results for the F-measure, Accuracy, Precision and Recall were also achieved on these datasets, so ($\zeta_{2}$, $\varpi_{3}$) was chosen for the comparative evaluation.

5.4 Comparative evaluation

5.4.1 Efficiency of trajectory outlier detection

This section compares the time cost of TODG, Two Phase [24], ATDC [25], and iBAT [17]; the results are listed in Table 2. The running time of TODG on the three experimental datasets was always shorter than that of Two Phase, ATDC, and iBAT (0.86–0.96 s versus 2.52–2.74 s for ATDC, 7.39–8.56 s for Two Phase, and 107.94–121.40 s for iBAT). The Two Phase algorithm detects outliers by calculating the distance between adjacent trajectory points and requires repeated calculations of the Euclidean distance between different trajectory points. ATDC first calculates the abnormal scores of all trajectories in the trajectory dataset using a single trajectory as the standard. It then finds the largest set of trajectories that share the same score, uses this set as the standard to repeat the above operation, and calculates the abnormal scores of the remaining trajectories in the trajectory set. iBAT identifies outlying trajectories based on isolation: it constructs multiple decision trees and isolates the trajectories cyclically. Because the Two Phase, ATDC and iBAT algorithms have a higher time complexity, their running times are longer than that of the proposed TODG method.

5.4.2 Accuracy of trajectory outlier detection

Based on the parameter settings in Section 5.3, this section compares the F-measure, Accuracy, Precision, and Recall values of TODG with those of Two Phase, ATDC, and iBAT to test the effectiveness of the proposed TODG method for trajectory-outlier detection. Figures 9–12 show the F-measure, Accuracy, Precision, and Recall results of the four methods, TODG, Two Phase, ATDC, and iBAT, on trajectory sets T-1, T-2, and T-3.

Table 2
Running time (sec)

Dataset	TODG	Two Phase	ATDC	iBAT
T-1	0.86	7.39	2.74	107.94
T-2	0.96	8.56	2.57	121.40
T-3	0.96	7.49	2.52	119.21

Figure 9.

F-measure comparison.

Figure 10.

Accuracy comparison.

Figure 11.

Precision comparison.

Figure 12.

Recall comparison.

Figure 13.

Visualization of the original trajectories.

Figure 14.

Visualization of TODG results.

As shown in Figs 9–12, although Two Phase has higher Precision on the T-3 dataset, its F-measure is smaller than that of TODG owing to its smaller Recall. The outlier detection results of iBAT were worse than those of the other three methods because iBAT detects outlying trajectories from selected subsamples, so its results depend on which subsamples are selected. The Two Phase algorithm calculates the trajectory point density and identifies individual outliers based on a trajectory point density threshold; its detection results are therefore affected by the density values of individual trajectory points. The ATDC algorithm converts the driving distance of a trajectory into a number of grids, so its detection results are influenced by the number of grids. Moreover, Two Phase, ATDC, and iBAT ignore the facts that normal trajectories may contain multiple spatial types and that normal trajectories may deviate from one another. Therefore, the F-measure, Accuracy, Precision and Recall results of the other methods were, overall, inferior to those of TODG.

5.4.3 Visualization of trajectory outliers

Based on the parameter settings, the detection effect of the proposed TODG method was verified on the T-1, T-2, and T-3 trajectory sets. The visualization results are shown in Fig. 13.

The detection results of TODG on T-1, T-2 and T-3 datasets are shown in Fig. 14a–c respectively.

The outlying trajectories detected by the TODG are marked in green in Fig. 14. The experimental results show that the proposed TODG algorithm performs well in trajectory-outlier detection.

6. Conclusions

Most existing trajectory outlier detection methods treat each trajectory as a whole when detecting outlying trajectories and do not consider situations in which the deviation between individual trajectories in the dataset is too large. To address these issues, this study proposes a trajectory outlier detection method based on group division (TODG). The method divides the trajectory dataset into multiple trajectory groups according to the trajectory features and detects outlying trajectories within each group, which prevents deviations among trajectories in the dataset from influencing the judgment of outlying trajectories. In addition, the road length is considered under fixed grid conditions, and the trajectories are coded and divided into trajectory groups according to the number of codes with low grid density. Experimental results show that the proposed method has better detection accuracy and operational efficiency than similar methods.
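As a rough illustration of the group-division step summarized above, the following sketch groups trajectories by how many low-density grid cells they pass through. It is a simplified sketch under stated assumptions, not the implementation used in the experiments: the function name, the grid codes, and the choice to count distinct low-density cells are all hypothetical.

```python
from collections import defaultdict


def group_by_low_density_count(trajectories, low_density_cells):
    """Group trajectory ids by the number of low-density cells they visit.

    trajectories: dict mapping a trajectory id to its sequence of grid codes
    low_density_cells: set of grid codes classified as low density
    Assumption: each low-density cell is counted once per trajectory.
    """
    groups = defaultdict(list)
    for traj_id, cells in trajectories.items():
        n_low = len(set(cells) & low_density_cells)
        groups[n_low].append(traj_id)
    return dict(groups)


# Hypothetical grid-coded trajectories between the same source and destination.
trajectories = {
    "t1": ["g1", "g2", "g3"],
    "t2": ["g1", "g2", "g3", "g2"],
    "t3": ["g1", "g7", "g8", "g3"],
}
low_density_cells = {"g7", "g8"}

groups = group_by_low_density_count(trajectories, low_density_cells)
print(groups)
```

In this toy input, "t1" and "t2" fall into the group with zero low-density cells, while "t3" forms its own group, so a deviation check would be carried out within each group rather than across the whole dataset.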

However, TODG has certain limitations. Although it can distinguish between normal and outlying trajectories, it cannot identify outlying subsegments within a trajectory. In addition, when detour behaviour is collective, detection performance may degrade. In future work, we will conduct a more detailed classification of normal and outlying trajectories, integrate temporal and semantic attributes, analyse road traffic conditions, and further improve the accuracy of trajectory outlier detection.

Footnotes

https://doi.org/10.15783/C7J010.

Acknowledgments

The work was supported by the National Natural Science Foundation of China under Grant Nos. 62272006 and 61972439, the Anhui Provincial Natural Science Foundation of China under Grant Nos. 2208085MF164 and 2108085MF214, the University Natural Science Research Program of Anhui Province under Grant No. KJ2021A0125, and the Key Research and Development Project of Wuhu under Grant No. 2022yf55.

References

1. Ruan, Zou, Chen and Shen, Monitoring the spatiotemporal trajectory of urban area hotspots using the SVM regression method based on NPP-VIIRS imagery, ISPRS International Journal of Geo-Information 10(6) (2021), 415–435.

2. Luo, Chen and Zheng, Road congestion detection based on trajectory stay-place clustering, ISPRS International Journal of Geo-Information 8(6) (2019), 264–284.

3. Ding, Zhang, Zhou, Liao, Luo and L.M. Ni, FraudTrip: Taxi fraudulent trip detection from corresponding trajectories, IEEE Internet of Things Journal 8(16) (2021), 12505–12517.

4. Cheng and Peng, Short-term traffic forecasting by mining the non-stationarity of spatiotemporal patterns, IEEE Transactions on Intelligent Transportation Systems 22(10) (2021), 6365–6383.

5. Fan, Stewart and Zhang, Using big GPS trajectory data analytics for vehicle miles traveled estimation, Transportation Research Part C: Emerging Technologies 103 (2019), 298–307.

6. Yang, Cai, Yang, Zhang and Zhao, TAD: A trajectory clustering algorithm based on spatial-temporal density analysis, Expert Systems with Applications 139 (2020), 112846–112862.

7. Lin, Breugelmans, Wanga, Wang, Gao and Tang, A spatial-temporal trajectory clustering algorithm for eye fixations identification, Intelligent Data Analysis 20 (2016), 377–393.

8. Bian, Tian, Tang and Tao, Trajectory data classification: A review, ACM Transactions on Intelligent Systems and Technology 10 (2019).

9. Alatrista-Salas, Bringay, Flouvat, Selmaoui-Folcher and Teisseire, Spatio-sequential patterns mining: Beyond the boundaries, Intelligent Data Analysis 20 (2016), 293–316.

10. Chen, Zhang, P.S. Castro, Sun and Wang, IBOAT: Isolation-based online anomalous trajectory detection, IEEE Transactions on Intelligent Transportation Systems 14 (2013), 806–818.

11. Qian, Cheng, Cao, Xue, Zhu and Zhang, Detecting taxi trajectory anomaly based on spatio-temporal relations, IEEE Transactions on Intelligent Transportation Systems 23 (2022), 6883–6894.

12. Zhao, Cai, Yang and Xi, Vehicle anomalous trajectory detection algorithm based on road network partition, Applied Intelligence 52 (2021), 8820–8838.

13. Luo, Chen and Bian, Neighborhood relevant outlier detection approach based on information entropy, Intelligent Data Analysis 20 (2016), 1247–1265.

14. Zhang, Chang, Yuan, Tan and Chen, Continuous trajectory similarity search for online outlier detection, IEEE Transactions on Knowledge and Data Engineering 34 (2020), 4690–4704.

15. Luo, Chen and Wang, Trajectory outlier detection approach based on common slices sub-sequence, Applied Intelligence 48 (2018), 2661–2680.

16. Chen, Gong, Shi, Liu and Chen, Abnormal-trajectory detection method based on variable grid partitioning, ISPRS International Journal of Geo-Information 12 (2023).

17. Zhang, Z.H. Zhou, Chen, Sun and Li, iBAT: Detecting anomalous taxi trajectories from GPS traces, in: Proceedings of the 2011 ACM Conference on Ubiquitous Computing, China, 2011, pp. 99–108.

18. Ying and W.G. Yin, Cluster-based congestion outlier detection method on trajectory data, in: Proceedings of the 6th International Conference on Fuzzy Systems and Knowledge Discovery, 2009, pp. 243–247.

19. Cheng, Xiong, Duan and Xiao, Distributed anomaly detection algorithm for spatio-temporal trajectories of vehicles, in: Proceedings of the 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, 2018, pp. 590–598.

20. Wang, Zeng, Zou, Huang and Jin, A highly efficient framework for outlier detection in urban traffic flow, IET Intelligent Transport Systems 15 (2021), 1494–1507.

21. E.M. Knorr, R.T. Ng and Tucakov, Distance-based outliers: Algorithms and applications, VLDB Journal 8 (2000), 237–253.

22. J.G. Lee, Han and Li, Trajectory outlier detection: A partition-and-detect framework, in: Proceedings of International Conference on Data Engineering, 2008, pp. 140–149.

23. ... and Guo, Trajectory anomaly detection based on the mean distance deviation, in: Proceedings of the 27th International Conference on Neural Information Processing, 2020, pp. 140–147.

24. Belhadi, Djenouri, Srivastava, Djenouri, Cano and J.C.W. Lin, A two-phase anomaly detection model for secure intelligent transportation ride-hailing trajectories, IEEE Transactions on Intelligent Transportation Systems 22 (2021), 4496–4506.

25. Wang, Yuan, Liu and Shen, Anomalous trajectory detection and classification based on difference and intersection set distance, IEEE Transactions on Vehicular Technology 23 (2020), 6883–6894.

26. Bhattacharjee, Garg and Mitra, KAGO: An approximate adaptive grid-based outlier detection approach using kernel density estimate, Pattern Analysis and Applications 24 (2021), 1825–1846.

