Map-matching approach based on link factor and hidden Markov model

Abstract

An accurate map matching is an essential but difficult step in mapping raw float car trajectories onto a digital road network. This task is challenging because of the unavoidable positioning errors of GPS devices and the complexity of the road network structure. Aiming to address these problems, in this study, we focus on three improvements over the existing hidden Markov model: (i) The direction feature between the current and historical points is used for calculating the observation probability; (ii) With regard to the reachable cost between the current road section and the destination, we overcome the shortcoming of feature rarefaction when calculating the transition probability with low sampling rates; (iii) The directional similarity shows a good performance in complex intersection environments. The experimental results verify that the proposed algorithm can reduce the error rate in intersection matching and is suitable for GPS devices with low sampling rates.

Keywords

Map-matching hidden Markov model link factor reachable cost directional similarity

1 Introduction

With the development of the global economy and the improvement in living standards, car navigation systems are now part of almost all new family cars [1]. As a high-quality and low-cost location sensor, GPS has been widely employed in the personal navigation services and vehicle safety monitoring industries, emerging as the most common core device in terminal positioning equipment. With the popularization of car navigation systems, various applications driven by trajectory data have emerged [2]. For example, tens of thousands of taxis are equipped with GPS sensors to determine the traffic flow of the entire city and the movement of people. Specific applications include predicting the traffic flow at the entire city level and designing practical, fast, and feasible driving routes for drivers, recommending locations where passengers may find empty taxis, and detecting unusual events in real-time (such as road collapse, temporary traffic control, and large gatherings), in order to provide early warning or timely action [3 –7]. Through the analysis of long-term data, we can determine the drawbacks in urban planning and verify the effectiveness of the implemented plans to a certain extent [8]. In some cities, such as Copenhagen, researchers have installed sensors on bicycles to detect the air quality and temperature in the different parts of the city, and these data are then shared with information centers or relevant people through mobile phones. In Singapore, researchers have used cell phone signals to detect the flow of people, hotspots, and unusual events in the city. In the above examples, multiple computing units were used, such as cars, mobile phones, various types of sensors, roads and points of interest, and the data generated by these sources [5]. The results obtained reflect the rhythm and dynamics of the entire city. These applications require a necessary preprocessing step that matches each raw trajectory collected by the GPS with the road network. Research on effectively improving the matching accuracy to serve fields, such as intelligent transportation and smart cities, has attracted wide interest given the unavoidable errors in navigation systems [1].

Map matching typically implies correcting a deviated trajectory collected by the GPS from the map. Today’s GPS civilian positioning accuracy can reach up to 14 m, which meets the civilian positioning requirements. The GPS system can be stabilized in a short period of time. In this case, there are two main problems in using existing equipment to improve the matching accuracy: 1) Efficiency of the map matching algorithm. Especially at the initial moments, intersections, loops, viaducts, and other special locations, these locations have geographic complexity, which is the main reason for the mismatch of existing algorithms, i.e., a vehicle is matched to the wrong path or jumps or reciprocates. Under this circumstance, users cannot accurately judge their location, and the realization of service functions, such as traffic guidance and route planning, is affected. 2) Efforts to improve the accuracy of digital maps. The digital map is the basis of the navigation system and is the most intuitive matching result that users can perceive. If the map accuracy is low, the map matching algorithm will inevitably require a higher computational cost, even causing false matching. For the first problem, the existing methods are divided into two modes: online matching and offline matching. This article focuses on offline map matching. Typical techniques include point-to-segment projection [9], Frechette distance analysis [10], weight-based methods [11 –13], hidden Markov model (HMM) [14, 15], Bayesian inference [16, 17], confidence function theory [18], and fuzzy control theory [19]. However, when the sampling frequency is low, and the intersection is complicated, the accuracy of the matching algorithm is significantly reduced. Developing an algorithm that can overcome the problems of low sampling frequency and complex intersections in map matching remains challenging. The second problem in the map matching algorithm will be addressed in our future work. Herein, we mainly study the first question and assume that the road network data obtained by OpenStreetMap (OSM) are correct.

To improve the matching accuracy, this paper proposes an algorithm based on the link factor and the hidden Markov model (HMM). Considering that there is no turning requirement for the trajectory points of non-intersections, one road section can contain a large number of trajectory points. Therefore, in the preprocessing stage, the trajectory points are divided into intersection points (intersection points are the trajectory points at the road intersections) and non-intersections (trajectory points that are not intersection points). This approach not only reduces the time cost, but also maintains the matching accuracy. We use different matching methods for different trajectory points. In the calculation of the observation probability, it is necessary to construct an R-tree with the trajectory point as the node to obtain the k-nearest candidate road sections. Only the distance feature is considered in the matching of the non-intersections, and in the matching of the intersection points, the distance and direction characteristics of the historical-state and current-state candidate road sections are combined. In the calculation of the transition probability, for the non-intersection points, the connectivity of the road network and the distance characteristics between the projection points in the candidate road sections are mainly considered. For intersection points that may require turning, the overall trend in the trajectory should be considered. The main contributions of this study to map matching are summarized as follows:

Based on the temporal and spatial characteristics of the trajectory points, the trajectory points are divided into intersections and non-intersections. Different trajectory points have different matching methods.

We outline two problems in the application of the conventional hidden Markov model in map matching and propose corresponding solutions.

To overcome the difficulty in matching intersections, new calculation methods for the observation and transition probabilities are derived, and an intersection matching algorithm suitable for low-frequency sampling data is proposed.

The algorithm is compared with the classical HMM-based matching algorithm and the AntMapper algorithm on public and simulated datasets.

The rest of this paper is organized as follows. In Section 2, we introduce existing related work on abnormal trajectory detection. The matching algorithm based on the link factor and hidden Markov model is detailed in Section 3. The experimental setup and results are presented in Section 4. Finally, the conclusions are presented in Section 5.

2 Related work

Several map-matching approaches have been developed so far, and new ones are being explored. The existing approaches can be mainly classified into two categories [20]: weight-based methods and geometry-based methods.

A geometry-based approach depends on the point-to-point, point-to-line, and line-to-line distances. Yang et al. [21] presented a graph-based algorithm by assuming the trajectory as a line segment; they determined a candidate road set by checking whether the line segments intersect with the polygons in the graph. However, in the case of complex two-way and round roads, this method often fails. Researchers prefer weight-based approaches.

Different weight-based algorithms take different feature weights of the sampling points, such as the distance, direction, speed, and other features. The time cost of this method is relatively high despite its high accuracy. To reduce the cost, some scholars have reduced the pathfinding time [15 , 22–24], whereas others simplified the trajectory and map data [24, 25]. A map matching algorithm based on the HMM was proposed by Newson et al. in 2009 [15]. They used a Markov chain to find candidate links and to effectively control the number of candidate links. This method showed a good accuracy and time cost. However, the algorithm employed the distance difference feature, which is less advantageous in the dense parts of the road network. An extended algorithm [26] has been presented by combining the average speed difference feature and the driver path preference (DPP) to define the transition probability. The use of the average speed and DPP provided a more accurate description of the context relationship between the two candidates. The matching accuracy was more satisfactory when compared with Newson¡¯s algorithm. However, the matching accuracy is low when the number of trajectories marked with the road segments is insufficient. Qi et al. [27] proposed a junction decision-domain model. Although they concluded that the key to matching performance is the interaction matching, their model increases the set of candidate road segments without improving the decision-making process in specific instances.

Many existing approaches have focused on searching for more suitable features to calculate the observation probability or transition probability in order to improve the matching accuracy. However, with respect to dense road networks, the context information of the candidates used in calculating the transition or observation probabilities is weak. Therefore, further research should be conducted to improve the intersection matching accuracy.

3 Map-matching algorithm based on link factor and hidden Markov model

This section introduces the preliminary knowledge of map matching, and elaborates the design of the observation probability and transition probability model we proposed. For clarity, the symbols used are summarized in Table 1.

Table 1
Main notations used in this paper

Symbol Description

X Hidden state set

Y Observation state set

π Initial-state probability vector

A Hidden-state transition probability matrix

B Observed-state transition probability matrix

a _ij Hidden-state transition probability

b _ij Observed-state transition probability

p _i Trajectory point

G (V, E) Road network

it Intersection point

IT Set of intersection points in a trajectory

r Distance threshold between the track point and road node

x _t Observed object at time t

y _i i-th road segment

z₍t, i) Projection point of x_t on the road segment y_i

Δθ Direction deviation

β Coefficient of exponential distribution

α Standard deviation of Gaussian distribution

θ (·) Angle function between two vectors

Rec (·) Reachable cost of two road segments

hs_t (·) Directional similarity between the trajectory point and the road

DS (·) Directional similarity between the trajectory segment and the road

Symbol	Description
X	Hidden state set
Y	Observation state set
π	Initial-state probability vector
A	Hidden-state transition probability matrix
B	Observed-state transition probability matrix
a _ij	Hidden-state transition probability
b _ij	Observed-state transition probability
p _i	Trajectory point
G (V, E)	Road network
it	Intersection point
IT	Set of intersection points in a trajectory
r	Distance threshold between the track point and road node
x _t	Observed object at time t
y _i	i-th road segment
z₍t, i)	Projection point of x_t on the road segment y_i
Δθ	Direction deviation
β	Coefficient of exponential distribution
α	Standard deviation of Gaussian distribution
θ (·)	Angle function between two vectors
Rec (·)	Reachable cost of two road segments
hs_t (·)	Directional similarity between the trajectory point and the road
DS (·)	Directional similarity between the trajectory segment and the road

3.1 Introduction to HMM

The HMM is a statistical analysis model used to describe a Markov process with hidden parameters. This model has been applied in many fields including language recognition, natural language recognition, and pattern recognition. The main problems that it can solve have the following characteristics: 1) Sequence-based problems such as time series or state sequences; 2) There are two types of data in this type of problem: one is observable, which then becomes an observation sequence, and the other is the hidden sequence, which cannot be directly observed. A Markov process is a type of random process, which means that with a given current state, only the current state can be used for the future evolution, whereas the previous historical state is irrelevant to the future state. This process can be expressed as follows: $\begin{matrix} P (X_{n + 1} | X_{1} = x_{1}, X_{2} = x_{2}, . . ., \\ X_{n} = x_{n}) = P (X_{n + 1} | X_{n}) \end{matrix}$ (1) where X_n+1 is only related to the given X_n. In the HMM, X_n is a hidden state. As shown in Fig. 1, each hidden state x_i corresponds to an observable state y_i.

Fig. 1

Hidden Markov state chain.

From the above description, five basic elements of the HMM are introduced:

Hidden state set: X = {x₁, x₂, ⋯ , x_n}, where n is the number of hidden states;

Observation state set: Y = {y₁, y₂, ⋯ , y_m}, where m is the number of observation states;

Initial state probability vector: π = {π₁, π₂, ⋯ , π_n}, π_i = P {x_t = i} (1 ≤ i ≤ n);

Hidden state transition probability matrix: $A = [\begin{matrix} a_{11}, a_{12}, \dots, a_{1 n} \\ a_{21}, a_{22}, . . ., a_{2 n} \\ \dots \\ a_{n 1}, a_{n 2}, . . ., a_{nn} \end{matrix}],$ where a_ij = P (x_t+1 = j|x_t = i) (1 ≤ i, j ≤ n).

Observed state transition probability matrix: $B = [\begin{matrix} b_{11}, b_{12}, \dots, b_{1 m} \\ b_{21}, b_{22}, \dots, b_{2 m} \\ \dots \\ b_{n 1}, b_{n 2}, \dots, b_{nm} \end{matrix}],$ where b_ij = P (y_t = j|x_t = i) (1 ≤ j ≤ m, 1 ≤ i ≤ n).

If this model is applied to map matching, the observation state set Y contains the trajectory points collected by the GPS. Because the trajectory point is determined, the size of the observation set for a single trajectory point is one. The hidden state set X contains the road section where the trajectory point is actually located, and is also called the candidate set. The locations of the trajectory points are different in real time, so the candidate set will change accordingly. The calculation method for the observation probability matrix is typically determined using the Gaussian distribution of the distances from the GPS points to the road sections. In [15], the observation probability was calculated using the following equation: $P (x_{t} | y_{i}) = \frac{1}{\sqrt{2 π} σ} exp (- 0.5 (\frac{∥ x_{t} - z_{t, i} ∥}{σ})^{2})$ (2) where x_t is the position of the trajectory point at time t, y_i represents the candidate road sections, z_t,i is the projection point of x_t on the road section y_i, and σ is the accuracy of the GPS. Newson et al. [15] concluded that, the shorter the distance from a trajectory point to a road section, the greater the probability that the trajectory point is on that road section. The hidden state transition probability can be derived from the calculation combining the road network analysis with the trajectory point or from the road network. In [15], this probability was calculated as follows: $\begin{matrix} P (y_{t + 1, j} | y_{t, i}) = \frac{1}{β} \times exp (- \frac{1}{β} \times | ∥ \\ x_{t} - x_{t + 1} ∥ - ∥ z_{t, i} - z_{t + 1, j} ∥ |) \end{matrix}$ (3) where P (y_t+1,j|y_t,i) is the transition probability that the driver moves from the road section y_i at time t to the road y_j at time t + 1; β is the coefficient obtained by training based on a large number of sampling points. ∥x_t - x_t+1∥ is the distance between the trajectory point at time t and the point at time t + 1; ∥z_t,i - z_t+1,j∥ is the distance between the projection point on y_i at time t and the point on y_j at time t + 1. In Equation (3), the calculation of this probability is only related to the trajectory point. The authors in [15] concluded that, the closer the above two distances, the greater the probability that a driver moves from the current road section to the next section. This type of assumption is likely to cause a misjudgment at complex roundabouts. When roundabouts have multiple coinciding sections, the transition probabilities of the coincident sections are similar, leading to a difficulty in decision-making at the intersections. Some scholars have continued to modify Equation (3). For example, Raymond et al. [23] concluded that the distance between the GPS sampling points should not be considered when calculating the transition probability of the road sections. Szwed et al. [28] analyzed the complete map data to calculate the transition probability between road sections. However, the problem of complex intersection matching remains unresolved.

3.2 System framework

To solve the aforementioned problem, we improve the method proposed in [15] and consider the matching of the complex roundabouts separately. Figure 2 shows the framework of our proposed approach, which is composed of two major components: a preprocessing stage and a matching stage.

Fig. 2

Algorithm flowchart.

The preprocessing stage mainly involves extracting the intersection and preparing the candidate sets. In the matching stage, based on the characteristics of the HMM, we need to calculate the observation probability of each trajectory point based on the direction feature and the transition probability based on the road network connectivity. Each trajectory point is checked: if it is an intersection, another method is used to calculate the observation and transition probabilities. How to calculate the transition and observation probability of intersection points will be described in Section 3.4.

3.3 Preprocessing of the trajectory

The trajectory data are obtained by sampling the movement process of one or more moving objects in the space-time environment, including the sampling point position, sampling time, and speed. The sampling point data information constitute the trajectory based on the sequence of the sampling time data.

Definition 1 (Trajectory point) Let t be a timestamp and (x, y) be a location in R². A trajectory point is defined as a triple p (x, y, t), implying that an object is at location (x, y) at time t.

Definition 2 (Trajectory) A trajectory, denoted by T, represents a set of trip trajectory points. T = {Tid, p₁, p₂, ⋯ , p_n} , p_i = (x_i, y_i, t_i) (1 ≤ i ≤ n), n is the number of sampled points in the trajectory. Due to the complexity of the road network, the amount of storage required and the calculation time are considerable; therefore, the network should be simplified such that only the nodes representing the intersection of multiple streets are retained. Figure 3 shows the simplified network. Figure 3(a) shows the original road network. As shown, this kind of complex road network is very time-consuming to store and read. Figure 3(b) shows the simplified road network. After simplifying the road network, the trajectory points need to be classified.

Fig. 3

Road network.

Definition 3 (Road network) The road network, denoted by G (V, E), is a directed graph, where V = {v₁, v₂, ¡, v_nv} represents the node set, and E = {e₁, e₂, ¡, e_ne} represents the directed edge set formed by the road sections between the end points of the road sections. Each directed edge is written as $e_{i} = {{Eid}_{i}, < v_{s}^{i}, v_{d}^{i} >}$ , 1 ≤ i ≤ ne, where Eid_i records the identification of the directed edge, and $v_{s}^{i}, v_{d}^{i} \in V$ respectively record the position points at both ends of the directed edge (road section).

Definition 4 (Intersection trajectory point) There is a road network G (V, E). If the distance between the trajectory point p_i = (x_i, y_i, t_i) and road node v_i is less than r, the trajectory point is considered to be near the road junction, and the point is considered the intersection trajectory point. Moreover, we believe that the starting and ending points of the trajectory are also intersection points, denoted by it, and the set of intersection points of a trajectory is represented by IT = {it₁, it₂, ⋯ , it_m}.

As shown in Fig. 4, the red dots, green dots, and five-pointed stars indicate the intersection point, non-intersection points, and road node, respectively. The overall trend in the trajectory can be roughly determined by the intersection points. Therefore, we can solve the junction matching problem through the temporal and spatial characteristics of the intersection points. We introduce this aspect in detail in Section 3.4.

Fig. 4

Example of intersection points.

We use the idea of the binary search algorithm to find the intersection trajectory points. The specific algorithm steps are as follows:

The trajectory points are represented in the form of a 1D-ordered linear table.

The distance between the keyword recorded in the middle position of the linear table and the road section point should be calculated. If the distance is less than the set threshold r, the search is said to be successful.

Otherwise, we judge whether the distance between the keyword of the previous position of the middle position and the road section is less than r, and whether the distance between the keyword of the next position of the middle position and the road section is less than r, until the intersection point is found.

Otherwise, it always looks forward along the corresponding direction; after a successful search, the remaining unsearched tables are divided into two sub-tables. Steps 2 ∼ 4 are repeated until the child table no longer exists. The specific example is shown in Fig. 5.

Fig. 5

Specific search example.

The green square in Fig. 5 represents the intersection trajectory point, and its search process is as follows:

In the first round of the search, we check whether Box 7 is an intersection trajectory point. The search is successful. The linear table is divided into two sub-tables: one is Box 1 to Box 6, and the other is Box 8 to Box 14. In each sub-table, we first check whether Box 3 is an intersection trajectory point. If the search fails, we find the squares before and after Box 3. Box 2 is not an intersection point, and the sub-table cannot be divided; in this case, the search ends. Box 4 is an intersection point, so Boxes 5 and 6 are used as a sub-table for the next round of query; in the second sub-table, Box 11 is not an intersection point, so we check the nearby boxes in turn, and find Box 10 as an intersection point, so Boxes 8 and 9 are also used for the next round of search. There is no intersection trajectory point in a square with a number greater than 11, so the search process ends. In the third round of the search, none of the sub-tables are intersection trajectory points, and the sub-tables cannot be divided, so the search process ends.

3.4 Map-matching approach based on link factor and hidden Markov modelm

This study mainly improves the HMM model from three aspects. First, the trajectory points are divided into two types: intersecting and non-intersecting. Second, we use different methods to calculate the observation and transition probabilities for different types of points. Third, we determine the specific direction of the intersection through the overall direction of the trajectory. Considering the need for turning at intersections, when calculating the observation and transition probabilities, the direction difference and the inter-section reachability cost are increased.

We judge whether a trajectory point is an intersection point mainly based on the distance between the trajectory point and the node of the candidate road section. The trajectory points within 100 m of the crossroad are considered intersections.

3.4.1 Observation probability matrix

The conventional HMM only considers the distance between the current point and the road segment in the observation probability calculation. For non-intersection trajectory points, the observation probability in the conventional HMM is calculated accurately, so we use the observation probability calculation method proposed by Newson et al. [15] for the non-intersections. When the distance between the candidate road sections of the historical and current observation objects is short, the probability that the current observation object matches with the road section is high.

However, we believe that it is necessary to consider the matching of the historical points for roads in complex environments, particularly at road intersections. As shown in Fig. 6, p_i-1, p_i, and p_i+1 denote three consecutive trajectory points. If the p_i trajectory point is at an intersection, the position of p_i is very close to the sections of rs₁ and rs₂ because of the error in the GPS points. If only the distance from the trajectory point to the road section is considered, there may be an error when matching p_i with rs₂.

Fig. 6

Example of a problem that is difficult to solve using the conventional algorithm.

Therefore, this study considers two factors when calculating the observation probability at intersections:

The absolute distance between the current observation object x_t and the candidate road section.

The correlation between the trajectory direction formed by the current observation point x_t and the previous observation point x_t-1 and the direction of the candidate road section at the current observation point.

Definition 5 (the directional similarity between the trajectory point and the road) Given the current observation object x_t, an R-tree with x_t as the node is constructed, and we obtain k-nearest candidate road segment sets RS_t = {rs_i|i = 1, 2, ⋯ , k}. Given the observation point x_t-1 at the previous moment, the directional similarity between the trajectory point and the road can be described as follows:

$\begin{matrix} {hs}_{t} = \frac{1}{2} (cos (min {Δ θ, 180^{\circ}}) + 1 + α) \\ Δ θ = | θ (x_{t - 1}, x_{t}) - θ (r s_{i} . start, r s_{i} . end) | \end{matrix}$ (4) where θ (x_t-1, x_t) represents the angle between the two points x_t-1, x_t and the x-axis, θ (rs_i . start, rs_i . end) represents the straight line inclination angle of the candidate road section rs_i, and Δθ represents the direction deviation of the trajectory point and the road. To prevent the cosine function value from being 0, we set α to a very small number.

Similarly, assuming that the distance between the observation object and the road section obeys a normal distribution, we can express the observation probability of the intersection point as follows: $P (x_{t} | {rs}_{i}, x_{t - 1}) = P (x_{t} | {rs}_{i}) \times {hs}_{t}$ (5)

3.4.2 Transition probability matrix

A trajectory is a line segment composed of continuous points, so the matching road sections should also be connected. Therefore, it is necessary to consider the reachability between sections in the road network, and a road network directed graph can be used to describe this feature. Therefore, we consider two factors when calculating the transition probability for non-intersection points:

The difference in the distance between consecutive trajectory points and that between candidate road section projection points;

The connectivity between the candidate road sections at the consecutive sampling points.

We define the reachable cost between the links based on their connectivity.

Definition 6 (Reachable cost): If two road sections are connected, we call them directly reachable. If a link exists rs₁, rs₂, ⋯ , rs_n, rs₁ = p, rs_n = q, rs_i ∈ D (1 ≤ i ≤ n),where any two adjacent road sections are directly reachable, then p and q are reachable. The reachable cost between p and q is defined as the shortest path from p to q: $Rec (p, q) = shorestpath (p \to q)$ (6) The shortest path is obtained using Dijkstra’s algorithm. The calculation method for the state transition matrix is typically determined by the exponential distribution of the cost of one road section to another. Combining the reachable cost of the links and the distance between the links, we can define the state transition probability as follows: $P ({rs}_{t + 1, j} | {rs}_{t, i}) = \frac{1}{β} \times exp (- \frac{1}{β} \times ζ)$ (7) $\begin{matrix} ζ = | ∥ x_{t} - x_{t + 1} ∥ - ∥ z_{t, i} - \\ z_{t + 1, j} ∥ | + Rec ({rs}_{t + 1, j}, {rs}_{t, i}) \end{matrix}$ (8)

In Equation (7), β refers to the difference between the driving distance and the distance from the earth’s surface. The calculation of ζ is given in Equation (8), which evaluates the cost of the current road section with the next road section. ∥z_t,i - z_t+1,j∥ is the distance between the projection point on y_i at time t and the projection point on y_j at time t + 1. The closer the distance, the more likely the observation object is on the link. Moreover, the lower the reachable cost between the links, the closer the target is to the destination.

The intersection point matching decision is a difficult problem in map matching, as there is no accurate theory for guidance. We consider the next trajectory point of the direction only if there is a crossroad. However, as many roads are close by and overlap each other, it is difficult to make decisions for circular viaduct intersections.

As shown in Fig. 7, the green and yellow dots respectively represent the trajectory intersections and road points, and ir_jk is the number of road sections with the starting point ir_j and the ending point ir_k. Two green points, ix₁ and ix₂, indicate the intersections extracted in the preprocessing stage. x is not only the non-intersection trajectory point, but is also the next GPS sample point of ix₁ in the original trajectory. When the current matching object is ix₁, ∥z_1,23 - z₂₃∥ is very close to ∥z_1,24 - z₂₄∥, and a matching error is likely under this condition.

Fig. 7

Elevated intersection matching example.

Given the above problems, we consider the similarity between the direction of the trajectory point and the direction of the road section.

Definition 7 (Directional similarity) Given an intersection sequence of trajectory T: it₁ → it₂ → ⋯ → it_n with it_j (j = 2, 3, …, n), it_j-1it_j, and it_jit_j+1 constituting a trajectory segment, the difference in the direction between the trajectory segment and the road section, denoted by φ, is calculated as $φ = | 〈 \vec{i t_{j - 1} i t_{j}}, \vec{i t_{j} i t_{j + 1}} 〉 | - 〈 \vec{d_{i r_{ck}}}, \vec{d_{i r_{kl}}} 〉$ , where $\vec{d_{i r_{ck}}}$ is a vector representing a road direction. Hence, the directional similarity between the trajectory segment and the road section is recorded as DS, and the calculation is as follows: $DS = cos φ$ (9) The transition probability at the intersection is as follows: $P (r s_{t + 1, j} | r s_{t, i}, x_{t - 1}, x_{t}, x_{t + 1}) = P (r s_{t + 1, j} | r s_{t, i}) \times DS$ (10)

From the definition, when 0 ≤ Δθ ≤ 90°, and 0 ≤ DS ≤ 1 the greater the difference in the direction between the track and road segments, the lesser the directional similarity. In addition, when 90° ≤ Δθ ≤ 180° and -1 ≤ DS ≤ 0 and when the range of the candidate links is wide, DS may be less than 0.

We summarize the observation probability calculation, transition probability calculation, and intersection matching algorithm introduced earlier to obtain the LF-Mapper matching algorithm.

Evidently, since the node information of the road section obtained from OpenStreetMap is complicated, the road section should be simplified by a line segment comprising two points in the pre-processing stage. In Line 1 of Algorithm 1, the intersection points in a trajectory are found, and in Line 2 of Algorithm 1, the candidate for each trajectory point is found. In this study, the trajectory points are divided into intersections and non-intersections. Because a crossroad is an important factor affecting the matching accuracy in map matching, the observation and transition probabilities for different types of trajectory points are calculated differently. Lines 4 to 12 of Algorithm 1 represent the calculation of the observation and transition probabilities. Finally, the transition probability matrix and the observation probability matrix are used as the input of the Viterbi algorithm, which outputs the optimal road sections.

Algorithm 1. Matching algorithm based on link factor and hidden Markov model(LF-Mapper)

Input: trajectory T = {x₁, x₂, x₃, ⋯ , x_n}, road network G (V, E)

Output: matched road segments RS = {rs₁, rs₂, rs₃, ⋯ , rs_n}

1: IT = Find the intersection points in a trajectory

2: CN = Calculate the candidate nodes for all trajectory points

3: for i in range(len(T)) do

4: if x_i not in IT then

5: Calculate the observation probability using (2)

6: Calculate the transition probability using (6,7,8)

7: else:

8: Calculate the observation probability using (4,5)

9: Calculating the transition probability using (9,10)

10: endif

11: endfor

12: RS = Viterbi // Find optimal path

13: return RS;

4 Experimental analysis

In this section, we report experiments conducted on real and degraded datasets to study the performance of the proposed algorithm. The experimental result of LF-Mapper is compared with those of a representative algorithm based on the HMM [15] and an ant colony-based map matching approach (AntMapper) [22]. We list all the parameters used along with the ranges in Table 2. In the following, we introduce the datasets and parameters used in this study and describe the experimental results in detail.

Table 2
Parameters used along with the ranges

Parameter Description Range

Δt Sampling frequencies [0,470]

k The number of candidates [1,8]

Δσ Gaussian white noise [5,90]

Δrs The number of road sections [4,10]

Parameter	Description	Range
Δt	Sampling frequencies	[0,470]
k	The number of candidates	[1,8]
Δσ	Gaussian white noise	[5,90]
Δrs	The number of road sections	[4,10]

4.1 Dataset

4.1.1 Trajectory datasets

This section mainly introduces the datasets. We have two real trajectory datasets. One of the datasets (hereinafter referred to as SD) was reported in [15]. SD was tested in Seattle, Washington, USA for approximately 80 km, and 7,351 GPS points were collected at a sampling frequency of 1 Hz. The ground truth is a sequence of edges in the map data. The other dataset has been published in [24] (hereinafter referred to as MD); it reflects a journey in Melbourne that took approximately 40 min, with 2512 GPS points.

We simulate low-fidelity data by adding a Gaussian noise. Fortunately, a Gaussian noise is not only mutually uncorrelated between random variables at any two different moments, but is also statistically independent. In short, it satisfies the equations below. Therefore, we simulate a random Gaussian noise with standard deviations of 1, 10, 15, 20, 30, 40, 50, 75, and 100. $\begin{matrix} X_{1} \sim N (μ_{1}, σ_{1}^{2}) \\ X_{2} \sim N (μ_{2}, σ_{2}^{2}) \\ (X_{1} + X_{2}) \sim N (μ_{1} + μ_{2}, σ_{1}^{2} + σ_{2}^{2}) \end{matrix}$ (11)

4.1.2 Road network datasets

In addition to GPS data, we require road network data. Figure 8(a) shows the actual road network map obtained of Seattle. The coverage area is [-37.8269, -37.7221, 144.9316, 145.0753]. The actual condition of the road cannot be clearly seen in the figure because the complete road network data are complex, with 82,674 nodes and 188,046 edges. Reading and searching is very time-consuming, so the network should be simplified. The simplified road network diagram has 38,102 edges and 16,198 nodes. Figure 8(b) shows the actual road network diagram of Melbourne. The coverage area is [47.563166, 47.671083, -122.357316, -122.086483]. The complete road network data has 79,351 nodes and 151,085 edges, whereas the simplified road network data has only 36,576 edges and 14,067 nodes.

Fig. 8

Road network datasets.

4.2 Parameter estimation

We need to estimate two probability-related parameters and one parameter of the number of candidates k. One of the probability-related parameters is σ, estimated from the Gaussian distribution using Equation (13). From the notation given in Section 3, the sampling point on the road section y_i nearest to x_t is z_t,i. We calculate the perpendicular distance ∥x_t - z_t,j∥, which is considered an estimate of the GPS sampling error. The median absolute deviation (MAD) method is used to estimate σ, which is resilient against outliers. For our datasets, we set σ to 5.05 m. $σ = \sqrt{2} {median}_{t} (∥ x_{t} - z_{t, j} ∥)$ (12) The other parameter is β, estimated from the exponential distribution in Equation (7), which describes the distance difference between two adjacent trajectory sampling points and two candidate points corresponding to these sampling points. We consider this parameter and estimate it using a robust method reported by Gather and Schultze [29]:

$β = \frac{1}{ln 2} media n_{t} ∣ ∥ x_{t} - x_{t + 1} ∥ - ∥ z_{t, i} - z_{t + 1, j} ∥ ∣$ (13)

Considering the number of k candidate values, we set up a group of experiments to compare the matching errors corresponding to different k. Figure 9 shows the impact of the number of candidates on the matching error of LF-Matching. We set the range of k between 1 and 8, and compare the influence of k on the matching error rate under different sampling frequencies. The sampling frequencies are set to 1, 5, 10, 20, 30, 50, 70, and 100 s. Fig. 9(a) shows the matching error rate with different sampling rates and k on the SD dataset, and Fig. 9(b) shows the matching error rate with different sampling rates and k on the MD dataset. Figure 9 shows the matching error rate with different numbers of candidates and sampling frequencies on the SD and MD datasets. As shown in Fig. 9, with increasing k value, the error rate decreases. When k increases to 6, the error rate decreases less evidently. Considering that, the greater the k value, the higher the time cost of searching for the shortest path, we believe that setting a k value of 6 is appropriate.

Fig. 9

Comparison of the error rates with different sampling rates and k on the SD and MD datasets.

4.3 Experimental results

All the experiments were conducted using Python3.7, with the software and hardware environment being Intel Core i5 @ 2.30 GHz quad-core CPU, 16 GB memory, and Windows 10 operating system. The simulated map data are more complicated than the real map, which can help verify the advantages of LF-Mapper for the intersection decision. Two main methods are used for calculating the accuracy of the map matching algorithm: calculating the number of correctly sampled points or the number of correctly matched road sections. In the first method, finding the actual locations of each sampling point is difficult. The latter is evidently simpler than the former. This method will obscure the position of the intersection. From a macroperspective, as long as the direction of the entire trajectory turning is correct, it is irrelevant which road the intersection belongs to. Therefore, we use the second method to calculate the matching accuracy. We calculate the precision suggested by Newson et al. [15]. Fig. 10 shows the detailed calculation method.

Fig. 10

Measurement of the matching error.

4.3.1 Comparison of error rates at different sampling intervals

First, we use the real datasets to change the sampling frequency by adding and deleting trajectory points. The sampling frequencies are set to 1, 5, 10, 20, 30, 50, 70, 100, 130, 170, 210, 250, 300, 350, 410, and 470 s. Fig. 11(a) shows the matching error rate with different sampling rates to LF-Mapper, HMM, and AntMapper on the SD dataset, and Fig. 11(b) shows the matching error rate with different sampling rates to LF-Mapper, HMM, and AntMapper on the MD dataset. As shown in Fig. 11, when the sampling frequency is increased to 470 s, the LF-Mapper error rate changes smoothly and has an advantage after 300s. The HMM algorithm only considers the relationship between the sampling point and the distance of the road section, without adding the direction factor. In addition, the AntMapper algorithm normalizes the geometric and direction errors, thus reducing the influence on the direction error at the intersection. As the sampling frequency gradually increases, the directional similarity between trajectory sections and road sections and road connectivity become critical. Distance feature has a good matching effect on the trajectory of high-frequency sampling, but loses its advantage for low-frequency sampling.

Fig. 11

Comparison of the error rates with different sampling rates to LF-Mapper, HMM, and AntMapper on the SD and MD datasets.

4.3.2 Comparison of matching error rate after adding noise

We simulate the real ground situation by adding Gaussian white noise to the trajectory. In Section 4.1, we introduced the mechanism whereby noise is added. Moreover, we set four different sampling intervals, namely 30, 70, 100, and 130, to check the matching error rate. Figs. 12 and 13 show comparisons of the map matching accuracy on the SD and MD datasets, respectively. The x-axis represents the standard deviation (in m) in a zero-mean Gaussian distribution. For setting the Gaussian white noise, we follow the suggestion provided by Newson et al. [15]. As shown in Figs. 12 and 13, when the added noise is less than 100 m, the error rate comparison of LF-Mapper is better than those of the other algorithms. With increasing noise and sampling frequency, the matching error increases. When the noise data value increases to 75, the matching error rate of LF-Mapper is lower than those of HMM and AntMapper. Because of the increasing noise, the distance feature loses its efficiency to some extent. At this time, the matching efficiency mainly depends on the connectivity of the direction features and road network data, which can ensure that the macroscopic trend in the vehicle trajectory is correct.

Fig. 12

Comparison of the error rates with different sampling rates and standard deviation of Gaussian noise to LF-Mapper, HMM, and AntMapper on the SD dataset.

Fig. 13

Comparison of the error rates with different sampling rates and standard deviation of Gaussian noise to LF-Mapper, HMM, and AntMapper on the MD dataset.

4.3.3 Comparison of matching error rates to increase road section complexity

We edited the real map data and added road sections in different directions by overlapping the real map intersections, which increase the complexity of the road sections. When four real road sections converge at an intersection, we added one or two roads to make five or six road sections converge at one point. We set the number of road sections to 4, 5, 6, 7, 8, 9, and 10. Similarly, when adding roads, we also set different sampling times to compare the matching error rates. We set four different sampling intervals, namely 30, 70, 100, and 130, to check the matching error rate. Figs. 14 and 15 show the comparisons of the map matching accuracy on the SD and MD datasets, respectively. From the matching error rates of the different algorithms applied to the Seattle and Melbourne datasets, we find that when the number of road sections increases to 8, the matching error rate changes abruptly at different sampling frequencies. The error rates of the other two algorithms are similar. With the increase in the number of road sections, the angle between the road sections decreases, so the directional similarity between the roads is higher. When matching depends on the directional similarity between the trajectory segment and road section, the matching error rate will suddenly change. However, considering the actual situation, it is rare for eight road sections to overlap at an intersection, so this error rate can be tolerated.

Fig. 14

Comparison of the error rates with different sampling rates and different numbers of road sections to LF-Mapper, HMM, and AntMapper on the SD dataset.

Fig. 15

Comparison of the error rates with different sampling rates and different numbers of road sections to LF-Mapper, HMM, and AntMapper on the MD dataset.

5 Conclusions

As a pre-processing step for many applications, map matching has increasingly higher requirements on the matching accuracy of the algorithms used, in complex road environments. We propose a map matching algorithm based on link factor and hidden Markov model: 1) Based on the temporal and spatial characteristics of the trajectory points, the trajectory points are divided into intersections and non-intersections. Different trajectory points have different matching methods. 2)For a trajectory with a low sampling frequency, the cubic spline interpolation method is used for interpolation, so that there are intersection trajectory points at the road intersection. 3) We outline two problems in the application of the conventional hidden Markov model in map matching and propose corresponding solutions. 4) To overcome the difficulty in matching intersections, new calculation methods for the observation and transition probabilities are derived, and an intersection matching algorithm suitable for low-frequency sampling data is proposed Finally, we tested our algorithm on real and simulated datasets, and confirmed its effectiveness. A limitation of the proposed algorithm is that it is only suitable for offline states, not for online matching. This is because online matching requires considering the real-time performance, and it is necessary to preprocess a user’s trajectory in advance, which will be addressed in our future work.

Footnotes

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61972439, 61672039, 61702010).

References

Lim

, Lee

, Sunwoo

and Jo

, Hierarchical trajectory planning of an autonomous car based on the integration of a sampling and an optimization method, IEEE Transactions on Intelligent Transportation Systems 19(2) (2018), 613–626.

Wang

, Cao

and Yu

, Deep learning for spatio-temporal data mining: A survey, IEEE Transactions on Knowledge and Data Engineering, 2020.

Besse

P.C.

, Guillouet

, Loubes

J.-M.

and Royer

, Destination prediction by trajectory distribution-based model, IEEE Transactions on Intelligent Transportation Systems 19(8) (2017), 2470–2481.

Sodemann

A.A.

, Ross

M.P.

and Borghetti

B.J.

, A review of anomaly detection in automated surveillance, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42(6) (2012), 1257–1272.

Chai

, Savvaris

, Tsourdos

, Chai

and Xia

, Stochastic spacecraft trajectory optimization with the consideration of chance constraints, IEEE Transactions on Control Systems Technology 28(4) (2020), 1550–1559.

Jiang

, Barnett

and Gosselin

, Periodic trajectory planning beyond the static workspace for 6-dof cable-suspended parallel robots, IEEE Transactions on Robotics 34(4) (2018), 1128–1140.

Qiao

, Shen

, Wang

, Han

and Zhu

, A self-adaptive parameter selection trajectory prediction approach via hidden markov models, IEEE Transactions on Intelligent Transportation Systems 16(1) (2015), 284–296.

Liu

, Zhang

and Rogers

, Trajectory tracking control for autonomous underwater vehicles based on fuzzy re-planning of a local desired trajectory, IEEE Transactions on Vehicular Technology 68(12) (2019), 11657–11667.

White

C.E.

, Bernstein

and Kornhauser

A.L.

, Some map matching algorithms for personal navigation assistants, Transportation Research Part c: Emerging Technologies 8(1-6) (2000), 91–108.

10.

Brakatsoulas

, Pfoser

, Salas

and Wenk

, On map-matching vehicle tracking data, in Proceedings of the 31st international conference on Very large data bases, pp. 853–864, 2005.

11.

Wei

, Wang

, Forman

, Zhu

and Guan

, Fast viterbi map matching with tunable weight functions, in Proceedings of the 20th International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’12, (New York, NY, USA), p. 613–616, Association for Computing Machinery, 2012.

12.

Quddus

and Washington

, Shortest path and vehicle trajectory aided map-matching for low frequency gps data, Transportation Research Part C: Emerging Technologies 55 (2015), 328–339.

13.

Hashemi

and Karimi

H.A.

, A weight-based map-matching algorithm for vehicle navigation in complex urban networks, Journal of Intelligent Transportation Systems 20(6) (2016), 573–590.

14.

Wang

and Zimmermann

, Eddy: An error-bounded delay-bounded real-time map matching algorithm using hmm and online viterbi decoder, in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL ’14, (New York, NY, USA), p. 33–42, Association for Computing Machinery, 2014.

15.

Newson

and Krumm

, Hidden markov map matching through noise and sparseness, in Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, pp. 336–343, 2009.

16.

Fouque

and Bonnifait

, Matching raw gps measurements on a navigable map without computing a global position, IEEE Transactions on Intelligent Transportation Systems 13(2) (2012), 887–898.

17.

Hunter

, Abbeel

and Bayen

, The path inference filter: Model-based low-latency map matching of probe vehicle data, IEEE Transactions on Intelligent Transportation Systems 15(2) (2014), 507–529.

18.

Abdallah

, Nassreddine

and Denoeux

, A multiple-hypothesis map-matching method suitable for weighted and box-shaped state estimation for localization, IEEE Transactions on Intelligent Transportation Systems 12(4) (2011), 1495–1510.

19.

Ren

and Karimi

H.A.

, A fuzzy logic map matching for wheelchair navigation, GPS Solutions 16(3) (2012), 273–282.

20.

Hashemi

and Karimi

H.A.

, A critical review of real-time map-matching algorithms: Current issues and future directions, Computers, Environment and Urban Systems 48 (2014), 153–165.

21.

Yang

, Wang

and Yang

, A map-matching algorithm based on graphics, in Proceedings of the 32nd Chinese Control Conference, pp. 5046–5051, IEEE, 2013.

22.

Gong

Y.-J.

, Chen

, Zhang

, Ni

L.M.

and Zhang

, Antmapper: An ant colony-based map matching approach for trajectory-based applications, IEEE Transactions on Intelligent Transportation Systems 19(2) (2017), 390–401.

23.

Raymond

, Morimura

, Osogami

and Hirosue

, Map matching with hidden markov model on sampled road network, in Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), pp. 2242–2245, IEEE, 2012.

24.

, Kulik

and Ramamohanarao

, Spatio-temporal trajectory simplification for inferring travel paths, in Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 63–72, 2014.

25.

, Kulik

and Ramamohanarao

, Robust inferences of travel paths from gps trajectories, International Journal of Geographical Information Science 29(12) (2015), 2194–2222.

26.

Song

, Yan

, Stephen

and Khan

A.A.

, Hidden markov model and driver path preference for floating car trajectory map matching, IET Intelligent Transport Systems 12(10) (2018), 1433–1441.

27.

, Di

and Li

, Map-matching algorithm based on the junction decision domain and the hidden markov model, PloS One 14(5) (2019).

28.

Szwed

and Pekala

, Map-matching in a real-time traffic monitoring service, in Beyond Databases, Architectures, and Structures (S. Kozielski, D. Mrozek, P. Kasprowski, B. Małysiak-Mrozek, and D. Kostrzewa, eds.), (Cham), pp. 425–434, Springer International Publishing, 2014.

29.

Gather

and Schultze

, Robust estimation of scale of an exponential distribution, Statistica Neerlandica 53(3) (1999), 327–341.

Map-matching approach based on link factor and hidden Markov model

Abstract

Keywords

1 Introduction

2 Related work

3 Map-matching algorithm based on link factor and hidden Markov model

3.4.1 Observation probability matrix

Table 2 Parameters used along with the ranges Parameter Description Range Δt Sampling frequencies [0,470] k The number of candidates [1,8] Δσ Gaussian white noise [5,90] Δrs The number of road sections [4,10]

4.1.1 Trajectory datasets

Footnotes

Acknowledgments

References

Table 2
Parameters used along with the ranges

Parameter Description Range

Δt Sampling frequencies [0,470]

k The number of candidates [1,8]

Δσ Gaussian white noise [5,90]

Δrs The number of road sections [4,10]