Abstract
An accurate map matching is an essential but difficult step in mapping raw float car trajectories onto a digital road network. This task is challenging because of the unavoidable positioning errors of GPS devices and the complexity of the road network structure. Aiming to address these problems, in this study, we focus on three improvements over the existing hidden Markov model: (i) The direction feature between the current and historical points is used for calculating the observation probability; (ii) With regard to the reachable cost between the current road section and the destination, we overcome the shortcoming of feature rarefaction when calculating the transition probability with low sampling rates; (iii) The directional similarity shows a good performance in complex intersection environments. The experimental results verify that the proposed algorithm can reduce the error rate in intersection matching and is suitable for GPS devices with low sampling rates.
Introduction
With the development of the global economy and the improvement in living standards, car navigation systems are now part of almost all new family cars [1]. As a high-quality and low-cost location sensor, GPS has been widely employed in the personal navigation services and vehicle safety monitoring industries, emerging as the most common core device in terminal positioning equipment. With the popularization of car navigation systems, various applications driven by trajectory data have emerged [2]. For example, tens of thousands of taxis are equipped with GPS sensors to determine the traffic flow of the entire city and the movement of people. Specific applications include predicting the traffic flow at the entire city level and designing practical, fast, and feasible driving routes for drivers, recommending locations where passengers may find empty taxis, and detecting unusual events in real-time (such as road collapse, temporary traffic control, and large gatherings), in order to provide early warning or timely action [3–7]. Through the analysis of long-term data, we can determine the drawbacks in urban planning and verify the effectiveness of the implemented plans to a certain extent [8]. In some cities, such as Copenhagen, researchers have installed sensors on bicycles to detect the air quality and temperature in the different parts of the city, and these data are then shared with information centers or relevant people through mobile phones. In Singapore, researchers have used cell phone signals to detect the flow of people, hotspots, and unusual events in the city. In the above examples, multiple computing units were used, such as cars, mobile phones, various types of sensors, roads and points of interest, and the data generated by these sources [5]. The results obtained reflect the rhythm and dynamics of the entire city. These applications require a necessary preprocessing step that matches each raw trajectory collected by the GPS with the road network. Research on effectively improving the matching accuracy to serve fields, such as intelligent transportation and smart cities, has attracted wide interest given the unavoidable errors in navigation systems [1].
Map matching typically implies correcting a deviated trajectory collected by the GPS from the map. Today’s GPS civilian positioning accuracy can reach up to 14 m, which meets the civilian positioning requirements. The GPS system can be stabilized in a short period of time. In this case, there are two main problems in using existing equipment to improve the matching accuracy: 1) Efficiency of the map matching algorithm. Especially at the initial moments, intersections, loops, viaducts, and other special locations, these locations have geographic complexity, which is the main reason for the mismatch of existing algorithms, i.e., a vehicle is matched to the wrong path or jumps or reciprocates. Under this circumstance, users cannot accurately judge their location, and the realization of service functions, such as traffic guidance and route planning, is affected. 2) Efforts to improve the accuracy of digital maps. The digital map is the basis of the navigation system and is the most intuitive matching result that users can perceive. If the map accuracy is low, the map matching algorithm will inevitably require a higher computational cost, even causing false matching. For the first problem, the existing methods are divided into two modes: online matching and offline matching. This article focuses on offline map matching. Typical techniques include point-to-segment projection [9], Frechette distance analysis [10], weight-based methods [11–13], hidden Markov model (HMM) [14, 15], Bayesian inference [16, 17], confidence function theory [18], and fuzzy control theory [19]. However, when the sampling frequency is low, and the intersection is complicated, the accuracy of the matching algorithm is significantly reduced. Developing an algorithm that can overcome the problems of low sampling frequency and complex intersections in map matching remains challenging. The second problem in the map matching algorithm will be addressed in our future work. Herein, we mainly study the first question and assume that the road network data obtained by OpenStreetMap (OSM) are correct.
To improve the matching accuracy, this paper proposes an algorithm based on the link factor and the hidden Markov model (HMM). Considering that there is no turning requirement for the trajectory points of non-intersections, one road section can contain a large number of trajectory points. Therefore, in the preprocessing stage, the trajectory points are divided into intersection points (intersection points are the trajectory points at the road intersections) and non-intersections (trajectory points that are not intersection points). This approach not only reduces the time cost, but also maintains the matching accuracy. We use different matching methods for different trajectory points. In the calculation of the observation probability, it is necessary to construct an R-tree with the trajectory point as the node to obtain the k-nearest candidate road sections. Only the distance feature is considered in the matching of the non-intersections, and in the matching of the intersection points, the distance and direction characteristics of the historical-state and current-state candidate road sections are combined. In the calculation of the transition probability, for the non-intersection points, the connectivity of the road network and the distance characteristics between the projection points in the candidate road sections are mainly considered. For intersection points that may require turning, the overall trend in the trajectory should be considered. The main contributions of this study to map matching are summarized as follows: Based on the temporal and spatial characteristics of the trajectory points, the trajectory points are divided into intersections and non-intersections. Different trajectory points have different matching methods. We outline two problems in the application of the conventional hidden Markov model in map matching and propose corresponding solutions. To overcome the difficulty in matching intersections, new calculation methods for the observation and transition probabilities are derived, and an intersection matching algorithm suitable for low-frequency sampling data is proposed. The algorithm is compared with the classical HMM-based matching algorithm and the AntMapper algorithm on public and simulated datasets.
The rest of this paper is organized as follows. In Section 2, we introduce existing related work on abnormal trajectory detection. The matching algorithm based on the link factor and hidden Markov model is detailed in Section 3. The experimental setup and results are presented in Section 4. Finally, the conclusions are presented in Section 5.
Related work
Several map-matching approaches have been developed so far, and new ones are being explored. The existing approaches can be mainly classified into two categories [20]: weight-based methods and geometry-based methods.
A geometry-based approach depends on the point-to-point, point-to-line, and line-to-line distances. Yang et al. [21] presented a graph-based algorithm by assuming the trajectory as a line segment; they determined a candidate road set by checking whether the line segments intersect with the polygons in the graph. However, in the case of complex two-way and round roads, this method often fails. Researchers prefer weight-based approaches.
Different weight-based algorithms take different feature weights of the sampling points, such as the distance, direction, speed, and other features. The time cost of this method is relatively high despite its high accuracy. To reduce the cost, some scholars have reduced the pathfinding time [15, 22–24], whereas others simplified the trajectory and map data [24, 25]. A map matching algorithm based on the HMM was proposed by Newson et al. in 2009 [15]. They used a Markov chain to find candidate links and to effectively control the number of candidate links. This method showed a good accuracy and time cost. However, the algorithm employed the distance difference feature, which is less advantageous in the dense parts of the road network. An extended algorithm [26] has been presented by combining the average speed difference feature and the driver path preference (DPP) to define the transition probability. The use of the average speed and DPP provided a more accurate description of the context relationship between the two candidates. The matching accuracy was more satisfactory when compared with Newson¡¯s algorithm. However, the matching accuracy is low when the number of trajectories marked with the road segments is insufficient. Qi et al. [27] proposed a junction decision-domain model. Although they concluded that the key to matching performance is the interaction matching, their model increases the set of candidate road segments without improving the decision-making process in specific instances.
Many existing approaches have focused on searching for more suitable features to calculate the observation probability or transition probability in order to improve the matching accuracy. However, with respect to dense road networks, the context information of the candidates used in calculating the transition or observation probabilities is weak. Therefore, further research should be conducted to improve the intersection matching accuracy.
Map-matching algorithm based on link factor and hidden Markov model
This section introduces the preliminary knowledge of map matching, and elaborates the design of the observation probability and transition probability model we proposed. For clarity, the symbols used are summarized in Table 1.
Main notations used in this paper
Main notations used in this paper
The HMM is a statistical analysis model used to describe a Markov process with hidden parameters. This model has been applied in many fields including language recognition, natural language recognition, and pattern recognition. The main problems that it can solve have the following characteristics: 1) Sequence-based problems such as time series or state sequences; 2) There are two types of data in this type of problem: one is observable, which then becomes an observation sequence, and the other is the hidden sequence, which cannot be directly observed. A Markov process is a type of random process, which means that with a given current state, only the current state can be used for the future evolution, whereas the previous historical state is irrelevant to the future state. This process can be expressed as follows:

Hidden Markov state chain.
From the above description, five basic elements of the HMM are introduced: Hidden state set: X = {x1, x2, ⋯ , x
n
}, where n is the number of hidden states; Observation state set: Y = {y1, y2, ⋯ , y
m
}, where m is the number of observation states; Initial state probability vector: π = {π1, π2, ⋯ , π
n
}, π
i
= P {x
t
= i} (1 ≤ i ≤ n); Hidden state transition probability matrix: Observed state transition probability matrix:
If this model is applied to map matching, the observation state set Y contains the trajectory points collected by the GPS. Because the trajectory point is determined, the size of the observation set for a single trajectory point is one. The hidden state set X contains the road section where the trajectory point is actually located, and is also called the candidate set. The locations of the trajectory points are different in real time, so the candidate set will change accordingly. The calculation method for the observation probability matrix is typically determined using the Gaussian distribution of the distances from the GPS points to the road sections. In [15], the observation probability was calculated using the following equation:
To solve the aforementioned problem, we improve the method proposed in [15] and consider the matching of the complex roundabouts separately. Figure 2 shows the framework of our proposed approach, which is composed of two major components: a preprocessing stage and a matching stage.

Algorithm flowchart.
The preprocessing stage mainly involves extracting the intersection and preparing the candidate sets. In the matching stage, based on the characteristics of the HMM, we need to calculate the observation probability of each trajectory point based on the direction feature and the transition probability based on the road network connectivity. Each trajectory point is checked: if it is an intersection, another method is used to calculate the observation and transition probabilities. How to calculate the transition and observation probability of intersection points will be described in Section 3.4.
The trajectory data are obtained by sampling the movement process of one or more moving objects in the space-time environment, including the sampling point position, sampling time, and speed. The sampling point data information constitute the trajectory based on the sequence of the sampling time data.

Road network.
As shown in Fig. 4, the red dots, green dots, and five-pointed stars indicate the intersection point, non-intersection points, and road node, respectively. The overall trend in the trajectory can be roughly determined by the intersection points. Therefore, we can solve the junction matching problem through the temporal and spatial characteristics of the intersection points. We introduce this aspect in detail in Section 3.4.

Example of intersection points.
We use the idea of the binary search algorithm to find the intersection trajectory points. The specific algorithm steps are as follows: The trajectory points are represented in the form of a 1D-ordered linear table. The distance between the keyword recorded in the middle position of the linear table and the road section point should be calculated. If the distance is less than the set threshold r, the search is said to be successful. Otherwise, we judge whether the distance between the keyword of the previous position of the middle position and the road section is less than r, and whether the distance between the keyword of the next position of the middle position and the road section is less than r, until the intersection point is found. Otherwise, it always looks forward along the corresponding direction; after a successful search, the remaining unsearched tables are divided into two sub-tables. Steps 2 ∼ 4 are repeated until the child table no longer exists. The specific example is shown in Fig. 5.

Specific search example.
The green square in Fig. 5 represents the intersection trajectory point, and its search process is as follows:
In the first round of the search, we check whether Box 7 is an intersection trajectory point. The search is successful. The linear table is divided into two sub-tables: one is Box 1 to Box 6, and the other is Box 8 to Box 14. In each sub-table, we first check whether Box 3 is an intersection trajectory point. If the search fails, we find the squares before and after Box 3. Box 2 is not an intersection point, and the sub-table cannot be divided; in this case, the search ends. Box 4 is an intersection point, so Boxes 5 and 6 are used as a sub-table for the next round of query; in the second sub-table, Box 11 is not an intersection point, so we check the nearby boxes in turn, and find Box 10 as an intersection point, so Boxes 8 and 9 are also used for the next round of search. There is no intersection trajectory point in a square with a number greater than 11, so the search process ends. In the third round of the search, none of the sub-tables are intersection trajectory points, and the sub-tables cannot be divided, so the search process ends.
This study mainly improves the HMM model from three aspects. First, the trajectory points are divided into two types: intersecting and non-intersecting. Second, we use different methods to calculate the observation and transition probabilities for different types of points. Third, we determine the specific direction of the intersection through the overall direction of the trajectory. Considering the need for turning at intersections, when calculating the observation and transition probabilities, the direction difference and the inter-section reachability cost are increased.
We judge whether a trajectory point is an intersection point mainly based on the distance between the trajectory point and the node of the candidate road section. The trajectory points within 100 m of the crossroad are considered intersections.
Observation probability matrix
The conventional HMM only considers the distance between the current point and the road segment in the observation probability calculation. For non-intersection trajectory points, the observation probability in the conventional HMM is calculated accurately, so we use the observation probability calculation method proposed by Newson et al. [15] for the non-intersections. When the distance between the candidate road sections of the historical and current observation objects is short, the probability that the current observation object matches with the road section is high.
However, we believe that it is necessary to consider the matching of the historical points for roads in complex environments, particularly at road intersections. As shown in Fig. 6, pi-1, p i , and pi+1 denote three consecutive trajectory points. If the p i trajectory point is at an intersection, the position of p i is very close to the sections of rs1 and rs2 because of the error in the GPS points. If only the distance from the trajectory point to the road section is considered, there may be an error when matching p i with rs2.

Example of a problem that is difficult to solve using the conventional algorithm.
Therefore, this study considers two factors when calculating the observation probability at intersections: The absolute distance between the current observation object x
t
and the candidate road section. The correlation between the trajectory direction formed by the current observation point x
t
and the previous observation point xt-1 and the direction of the candidate road section at the current observation point.
Similarly, assuming that the distance between the observation object and the road section obeys a normal distribution, we can express the observation probability of the intersection point as follows:
A trajectory is a line segment composed of continuous points, so the matching road sections should also be connected. Therefore, it is necessary to consider the reachability between sections in the road network, and a road network directed graph can be used to describe this feature. Therefore, we consider two factors when calculating the transition probability for non-intersection points: The difference in the distance between consecutive trajectory points and that between candidate road section projection points; The connectivity between the candidate road sections at the consecutive sampling points.
We define the reachable cost between the links based on their connectivity.
In Equation (7), β refers to the difference between the driving distance and the distance from the earth’s surface. The calculation of ζ is given in Equation (8), which evaluates the cost of the current road section with the next road section. ∥zt,i - zt+1,j∥ is the distance between the projection point on y i at time t and the projection point on y j at time t + 1. The closer the distance, the more likely the observation object is on the link. Moreover, the lower the reachable cost between the links, the closer the target is to the destination.
The intersection point matching decision is a difficult problem in map matching, as there is no accurate theory for guidance. We consider the next trajectory point of the direction only if there is a crossroad. However, as many roads are close by and overlap each other, it is difficult to make decisions for circular viaduct intersections.
As shown in Fig. 7, the green and yellow dots respectively represent the trajectory intersections and road points, and ir jk is the number of road sections with the starting point ir j and the ending point ir k . Two green points, ix1 and ix2, indicate the intersections extracted in the preprocessing stage. x is not only the non-intersection trajectory point, but is also the next GPS sample point of ix1 in the original trajectory. When the current matching object is ix1, ∥z1,23 - z23∥ is very close to ∥z1,24 - z24∥, and a matching error is likely under this condition.

Elevated intersection matching example.
Given the above problems, we consider the similarity between the direction of the trajectory point and the direction of the road section.
From the definition, when 0 ≤ Δθ ≤ 90°, and 0 ≤ DS ≤ 1 the greater the difference in the direction between the track and road segments, the lesser the directional similarity. In addition, when 90° ≤ Δθ ≤ 180° and -1 ≤ DS ≤ 0 and when the range of the candidate links is wide, DS may be less than 0.
We summarize the observation probability calculation, transition probability calculation, and intersection matching algorithm introduced earlier to obtain the LF-Mapper matching algorithm.
Evidently, since the node information of the road section obtained from OpenStreetMap is complicated, the road section should be simplified by a line segment comprising two points in the pre-processing stage. In Line 1 of Algorithm 1, the intersection points in a trajectory are found, and in Line 2 of Algorithm 1, the candidate for each trajectory point is found. In this study, the trajectory points are divided into intersections and non-intersections. Because a crossroad is an important factor affecting the matching accuracy in map matching, the observation and transition probabilities for different types of trajectory points are calculated differently. Lines 4 to 12 of Algorithm 1 represent the calculation of the observation and transition probabilities. Finally, the transition probability matrix and the observation probability matrix are used as the input of the Viterbi algorithm, which outputs the optimal road sections.
Algorithm 1. Matching algorithm based on link factor and hidden Markov model(LF-Mapper)
Input: trajectory T = {x1, x2, x3, ⋯ , x n }, road network G (V, E)
Output: matched road segments RS = {rs1, rs2, rs3, ⋯ , rs n }
1: IT = Find the intersection points in a trajectory
2: CN = Calculate the candidate nodes for all trajectory points
3: for i in range(len(T)) do
4: if x i not in IT then
5: Calculate the observation probability using (2)
6: Calculate the transition probability using (6,7,8)
7: else:
8: Calculate the observation probability using (4,5)
9: Calculating the transition probability using (9,10)
10: endif
11: endfor
12: RS = Viterbi // Find optimal path
13: return RS;
In this section, we report experiments conducted on real and degraded datasets to study the performance of the proposed algorithm. The experimental result of LF-Mapper is compared with those of a representative algorithm based on the HMM [15] and an ant colony-based map matching approach (AntMapper) [22]. We list all the parameters used along with the ranges in Table 2. In the following, we introduce the datasets and parameters used in this study and describe the experimental results in detail.
Parameters used along with the ranges
Parameters used along with the ranges
Trajectory datasets
This section mainly introduces the datasets. We have two real trajectory datasets. One of the datasets (hereinafter referred to as SD) was reported in [15]. SD was tested in Seattle, Washington, USA for approximately 80 km, and 7,351 GPS points were collected at a sampling frequency of 1 Hz. The ground truth is a sequence of edges in the map data. The other dataset has been published in [24] (hereinafter referred to as MD); it reflects a journey in Melbourne that took approximately 40 min, with 2512 GPS points.
We simulate low-fidelity data by adding a Gaussian noise. Fortunately, a Gaussian noise is not only mutually uncorrelated between random variables at any two different moments, but is also statistically independent. In short, it satisfies the equations below. Therefore, we simulate a random Gaussian noise with standard deviations of 1, 10, 15, 20, 30, 40, 50, 75, and 100.
In addition to GPS data, we require road network data. Figure 8(a) shows the actual road network map obtained of Seattle. The coverage area is [-37.8269, -37.7221, 144.9316, 145.0753]. The actual condition of the road cannot be clearly seen in the figure because the complete road network data are complex, with 82,674 nodes and 188,046 edges. Reading and searching is very time-consuming, so the network should be simplified. The simplified road network diagram has 38,102 edges and 16,198 nodes. Figure 8(b) shows the actual road network diagram of Melbourne. The coverage area is [47.563166, 47.671083, -122.357316, -122.086483]. The complete road network data has 79,351 nodes and 151,085 edges, whereas the simplified road network data has only 36,576 edges and 14,067 nodes.

Road network datasets.
We need to estimate two probability-related parameters and one parameter of the number of candidates k. One of the probability-related parameters is σ, estimated from the Gaussian distribution using Equation (13). From the notation given in Section 3, the sampling point on the road section y
i
nearest to x
t
is zt,i. We calculate the perpendicular distance ∥x
t
- zt,j∥, which is considered an estimate of the GPS sampling error. The median absolute deviation (MAD) method is used to estimate σ, which is resilient against outliers. For our datasets, we set σ to 5.05 m.
Considering the number of k candidate values, we set up a group of experiments to compare the matching errors corresponding to different k. Figure 9 shows the impact of the number of candidates on the matching error of LF-Matching. We set the range of k between 1 and 8, and compare the influence of k on the matching error rate under different sampling frequencies. The sampling frequencies are set to 1, 5, 10, 20, 30, 50, 70, and 100 s. Fig. 9(a) shows the matching error rate with different sampling rates and k on the SD dataset, and Fig. 9(b) shows the matching error rate with different sampling rates and k on the MD dataset. Figure 9 shows the matching error rate with different numbers of candidates and sampling frequencies on the SD and MD datasets. As shown in Fig. 9, with increasing k value, the error rate decreases. When k increases to 6, the error rate decreases less evidently. Considering that, the greater the k value, the higher the time cost of searching for the shortest path, we believe that setting a k value of 6 is appropriate.

Comparison of the error rates with different sampling rates and k on the SD and MD datasets.
All the experiments were conducted using Python3.7, with the software and hardware environment being Intel Core i5 @ 2.30 GHz quad-core CPU, 16 GB memory, and Windows 10 operating system. The simulated map data are more complicated than the real map, which can help verify the advantages of LF-Mapper for the intersection decision. Two main methods are used for calculating the accuracy of the map matching algorithm: calculating the number of correctly sampled points or the number of correctly matched road sections. In the first method, finding the actual locations of each sampling point is difficult. The latter is evidently simpler than the former. This method will obscure the position of the intersection. From a macroperspective, as long as the direction of the entire trajectory turning is correct, it is irrelevant which road the intersection belongs to. Therefore, we use the second method to calculate the matching accuracy. We calculate the precision suggested by Newson et al. [15]. Fig. 10 shows the detailed calculation method.

Measurement of the matching error.
First, we use the real datasets to change the sampling frequency by adding and deleting trajectory points. The sampling frequencies are set to 1, 5, 10, 20, 30, 50, 70, 100, 130, 170, 210, 250, 300, 350, 410, and 470 s. Fig. 11(a) shows the matching error rate with different sampling rates to LF-Mapper, HMM, and AntMapper on the SD dataset, and Fig. 11(b) shows the matching error rate with different sampling rates to LF-Mapper, HMM, and AntMapper on the MD dataset. As shown in Fig. 11, when the sampling frequency is increased to 470 s, the LF-Mapper error rate changes smoothly and has an advantage after 300s. The HMM algorithm only considers the relationship between the sampling point and the distance of the road section, without adding the direction factor. In addition, the AntMapper algorithm normalizes the geometric and direction errors, thus reducing the influence on the direction error at the intersection. As the sampling frequency gradually increases, the directional similarity between trajectory sections and road sections and road connectivity become critical. Distance feature has a good matching effect on the trajectory of high-frequency sampling, but loses its advantage for low-frequency sampling.

Comparison of the error rates with different sampling rates to LF-Mapper, HMM, and AntMapper on the SD and MD datasets.
We simulate the real ground situation by adding Gaussian white noise to the trajectory. In Section 4.1, we introduced the mechanism whereby noise is added. Moreover, we set four different sampling intervals, namely 30, 70, 100, and 130, to check the matching error rate. Figs. 12 and 13 show comparisons of the map matching accuracy on the SD and MD datasets, respectively. The x-axis represents the standard deviation (in m) in a zero-mean Gaussian distribution. For setting the Gaussian white noise, we follow the suggestion provided by Newson et al. [15]. As shown in Figs. 12 and 13, when the added noise is less than 100 m, the error rate comparison of LF-Mapper is better than those of the other algorithms. With increasing noise and sampling frequency, the matching error increases. When the noise data value increases to 75, the matching error rate of LF-Mapper is lower than those of HMM and AntMapper. Because of the increasing noise, the distance feature loses its efficiency to some extent. At this time, the matching efficiency mainly depends on the connectivity of the direction features and road network data, which can ensure that the macroscopic trend in the vehicle trajectory is correct.

Comparison of the error rates with different sampling rates and standard deviation of Gaussian noise to LF-Mapper, HMM, and AntMapper on the SD dataset.

Comparison of the error rates with different sampling rates and standard deviation of Gaussian noise to LF-Mapper, HMM, and AntMapper on the MD dataset.
We edited the real map data and added road sections in different directions by overlapping the real map intersections, which increase the complexity of the road sections. When four real road sections converge at an intersection, we added one or two roads to make five or six road sections converge at one point. We set the number of road sections to 4, 5, 6, 7, 8, 9, and 10. Similarly, when adding roads, we also set different sampling times to compare the matching error rates. We set four different sampling intervals, namely 30, 70, 100, and 130, to check the matching error rate. Figs. 14 and 15 show the comparisons of the map matching accuracy on the SD and MD datasets, respectively. From the matching error rates of the different algorithms applied to the Seattle and Melbourne datasets, we find that when the number of road sections increases to 8, the matching error rate changes abruptly at different sampling frequencies. The error rates of the other two algorithms are similar. With the increase in the number of road sections, the angle between the road sections decreases, so the directional similarity between the roads is higher. When matching depends on the directional similarity between the trajectory segment and road section, the matching error rate will suddenly change. However, considering the actual situation, it is rare for eight road sections to overlap at an intersection, so this error rate can be tolerated.

Comparison of the error rates with different sampling rates and different numbers of road sections to LF-Mapper, HMM, and AntMapper on the SD dataset.

Comparison of the error rates with different sampling rates and different numbers of road sections to LF-Mapper, HMM, and AntMapper on the MD dataset.
As a pre-processing step for many applications, map matching has increasingly higher requirements on the matching accuracy of the algorithms used, in complex road environments. We propose a map matching algorithm based on link factor and hidden Markov model: 1) Based on the temporal and spatial characteristics of the trajectory points, the trajectory points are divided into intersections and non-intersections. Different trajectory points have different matching methods. 2)For a trajectory with a low sampling frequency, the cubic spline interpolation method is used for interpolation, so that there are intersection trajectory points at the road intersection. 3) We outline two problems in the application of the conventional hidden Markov model in map matching and propose corresponding solutions. 4) To overcome the difficulty in matching intersections, new calculation methods for the observation and transition probabilities are derived, and an intersection matching algorithm suitable for low-frequency sampling data is proposed Finally, we tested our algorithm on real and simulated datasets, and confirmed its effectiveness. A limitation of the proposed algorithm is that it is only suitable for offline states, not for online matching. This is because online matching requires considering the real-time performance, and it is necessary to preprocess a user’s trajectory in advance, which will be addressed in our future work.
Footnotes
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61972439, 61672039, 61702010).
