Map-Matching Algorithm Based on Hidden Markov and Constraint Value Pruning

Abstract

Map matching is the process of matching global positioning system (GPS) trajectory data with map data. Its purpose is to determine the actual route of the moving object. Because of factors such as positioning devices and the environment, the GPS trajectory data obtained may not be accurate. In location-based services, map matching can be used to address the accuracy and reliability issues of GPS location data. There are currently many map-matching algorithms, for example, spatio-temporal-based (STD) matching algorithm, improved interactive voting-based map-matching (IIVMM) algorithm, and turning-point-based (TPB) offline map-matching algorithm, but existing algorithms have some shortcomings, such as low matching accuracy on complex roads or low sampling rates, and routing calculation time bottlenecks. Therefore, this paper proposes a fast matching algorithm based on constrained value pruning that is suitable for complex roads. The algorithm considers the multi-directional features within the road and uses secondary calculations to determine candidate points, improving the accuracy of candidate point selection. Additionally, two pruning strategies based on constrained values are introduced to reduce the majority of the routing calculation process and improve the matching efficiency. Finally, comparative experiments are conducted on a real trajectory dataset. The results show that, compared with STD, IIVMM, and TPB algorithms, the algorithm accuracy is improved by about 2% to 4%, and the running time is reduced by about 30%.

Keywords

map matching, location-based services, fast matching, pruning strategy, GPS data

Map matching is the process of matching global positioning system (GPS) trajectory data with map data to determine the actual position and movement trajectory of vehicles or pedestrians on the road network. Because of the effects of multipath, environmental noise, and low-power design on the mobile receiver module, GPS positioning technology has a low trajectory sampling rate and large positioning error, and cannot be used directly. Map matching can project the initial positioning point onto the actual road, complete the missing trajectory, and minimize the deviation of the moving target from the road.

With the rapid development of mobile internet and the Internet of Things, vehicles equipped with positioning modules generate massive trajectory data. Researchers are committed to quickly and effectively obtaining traffic information from these data and further applying them to many location-based services (LBS), such as map services, traffic management, and route searching. LBS services and map matching are closely related because LBS relies on location data to provide services, while map matching is the process of matching location data with map data to determine their accurate position. Map matching is a necessary preprocessing step for trajectory data and has strong practical application value for LBS.

Given a road network $G (V, E)$ and a GPS trajectory, the goal of map matching is to map each trajectory point to the actual position of the moving object on the road network. Most existing advanced map-matching algorithms are built on hidden Markov models (HMM), which predict the matching process well and are widely applied. An HMM is a Markov process that infers hidden unknown parameters from observable parameters; here the true route of the moving object is the hidden state sequence and the GPS trajectory points are the observable sequence. First, a set of candidate points is determined for each GPS point as hidden states on the Markov chain, and observation probabilities are assigned based on how well the candidate road segments match the GPS points. Then, the connectivity of candidate points is modeled according to certain rules, and state transition probabilities are computed for adjacent vertices on the Markov chain. Finally, the algorithm searches for the candidate route with the highest probability on the Markov chain as the matching result. HMM-based map matching has produced many representative results, including spatio-temporal-based (STD) matching and improved interactive voting-based map-matching (IIVMM) ( 1 , 2 ). The matching processes of the two algorithms are similar. First, a candidate point set is selected for each GPS point, as shown in Figure 1: after the road segment projection process, a set of candidate points $\{c_{i}^{1}, c_{i}^{2}\}$ within the error range of GPS point $p_{i}$ is determined. Second, a candidate graph is generated. In the global route matching stage, a scoring function is defined using spatiotemporal and directional features, and the candidate route with the maximum cumulative probability is calculated through the $Viterbi$ algorithm.

Figure 1.

A sample of candidate point set of global positioning system point $p_{i}$ .

In addition, complex road networks and computational time are challenges common to all new map-matching algorithms. Complex roads generate many potential matching points, which can lead to incorrect path matches (an issue elaborated on in subsequent sections). Furthermore, the dataset used in this paper consists of taxi trajectories, which are typically collected on intricate urban road systems where complexity is inherent. Existing HMM-based matching algorithms incorporate a variety of feature factors into the calculation, which improves the matching accuracy. However, such algorithms still have shortcomings. First, the change of direction inside a road segment is not considered in the candidate set selection stage, so the calculated observation probability is inaccurate: in Figure 1, the projection point $c'_{i}$ at the turn of $e_{i}^{2}$ is obviously closer to the real position of $p_{i}$ . Second, the decoding process of existing map-matching algorithms uses the $Viterbi$ algorithm, which requires many shortest-route searches on the road network and therefore has a high time cost.

To alleviate the limitations brought by the above problems, this paper is dedicated to improving the accuracy and computational efficiency of the map-matching algorithm from two aspects that have been neglected by related studies. In summary, the main contributions of this paper are as follows:

1) For map matching on complex roads, the algorithm in this paper considers the distance and direction factors in the candidate point extraction stage and uses the intermediate point information of each road to determine the candidate points by secondary computation, thus improving the matching accuracy.

2) For the problem of excessive computation time, a fast matching strategy based on constraint value pruning is proposed, together with an improved Dijkstra (IDijkstra) route-finding algorithm, which removes unnecessary routing computations and greatly improves the efficiency of matching.

Combining contributions 1 and 2 yields the Pruning Features Map Matching (PFMM) algorithm. Extensive experiments on real datasets show that the matching accuracy of PFMM for GPS trajectories can reach 92.6%. Compared with three existing methods, PFMM achieves higher accuracy at both high and low sampling frequencies and reduces the running time by more than 30%, so the matching efficiency is greatly improved.

Related Work

Map matching is the process of aligning trajectory data with a road network. The concept was first proposed by Tanaka et al. and has evolved from simple search methods to complex and accurate route matching ( 3 ). Map-matching algorithms can be classified into local matching algorithms and global matching algorithms according to the range of the trajectory input. Local matching algorithms use only some of the current trajectory points to calculate a partial solution, which gives a faster computation speed and suits real-time and streaming trajectory data, supporting real-time services such as vehicle navigation and driving route prediction. Local matching algorithms are better suited to trajectory data with a high sampling rate, and their effectiveness drops for low-sampling-rate trajectory data. Global matching algorithms, on the other hand, process the stored complete trajectory and find the best matching route in the road network, which serves trajectory-based services such as travel route recommendation and abnormal taxi trajectory detection. Because converting and storing raw data is burdensome and device batteries are limited, the sampling rate of trajectory data is usually reduced. Map matching on low-sampling-rate GPS trajectories is therefore an important problem, and it has been addressed with increasing success in recent map-matching research.

Simple Map-Matching Algorithms

Early simple map-matching algorithms returned the vertices or edges that were geometrically or topologically closest to the trajectory as the matching result. Geometry-based map-matching algorithms include point-to-point matching, point-to-line matching, and line-to-line matching (4 –6). These methods are relatively simple to implement, but their computational load is large, and they are strongly affected by trajectory noise and sparsity. Topology-based map matching mainly relies on the topology of the road network and considers vehicle historical data and road topology to determine the vehicle’s position (7 –9). The weighted topology matching method in Quddus et al. considers different types of data such as position, speed, direction, and travel time to improve matching accuracy ( 9 ). However, it is still susceptible to GPS trajectory noise and sparse roads, and is not suitable for complex road networks.

The key to a simple map-matching algorithm is how to measure distance similarity. The most commonly used distance function is the Fréchet distance, which is suitable for measuring the similarity of curves that are spatio-temporal sequences, but is sensitive to trajectory outliers ( 5 ). The longest common subsequence (LCSS) is an alternative to the Fréchet distance: it uses the similar parts of two trajectories as the measure of similarity and skips trajectory points whose matching distance exceeds a threshold, which makes it robust to noise. Cui et al. proposed an improved LCSS, known as the longest common subsequence with heading penalty (LCSS-HP) method, which incorporates direction label information for trajectory segments and allows candidate road segments to be matched with multiple consecutive GPS points, further improving accuracy ( 10 ).

Advanced Map-Matching Algorithms

Advanced map-matching algorithms refer to those that use Kalman filtering, fuzzy logic, Bayesian inference, conditional random fields (CRF), HMM, and neural networks, among others (1, 2, 11 –29). Advanced map-matching algorithms often consider more comprehensive information, resulting in higher accuracy.

A CRF models the interaction between observations and uses feature functions to express the relationship between states more freely. An earlier CRF-based map-matching algorithm, aimed at low-sampling-rate trajectories, feeds the candidate points and paths of the GPS points into a CRF model and finds the best matching candidate points through a forward-backward recursive algorithm. The map-matching algorithm proposed by Hunter et al. combines the observation model and the driver model into a CRF to obtain the most probable trajectory ( 20 ). CRF-based map matching normalizes over the global observation sequence and includes all possible candidate paths in the calculation, resulting in a huge search space and low efficiency.

HMM is one of the most widely used methods. HMM can be used to model sequential data of vehicle driving trajectories and calculate transition probabilities by comparing factors such as distance and direction between GPS points and road segments on the road network. It can utilize contextual information, whereby the current observation value is not only influenced by the current state, but also by previous states, and use dynamic programming based on the $Viterbi$ algorithm for inference and matching. The spatial and temporal matching (ST) algorithm in Lou et al. first applied HMM to map-matching, constructing a scoring function with temporal and spatial features and transforming observation probabilities using zero-mean Gaussian distribution for GPS noise ( 23 ). The difference between GPS displacement distance and candidate route distance was transformed into transition probabilities, since the route of correctly matched points is always short. Finally, the $Viterbi$ algorithm was used to calculate the optimal route. Later, the STD algorithm and the IIVMM algorithm improved the ST algorithm by defining feature functions through spatiotemporal and directional analysis, considering real-time movement direction and speed constraints of trajectory points to improve matching accuracy ( 1 , 2 ). In recent years, there have been some map-matching algorithms based on trajectory segmentation, which to some extent improve matching efficiency compared with point-by-point calculations. Zhang et al. proposed a map-matching algorithm based on turning points, which introduces the concept of vehicle turning points and implements segmented matching of the map ( 24 ). The algorithm first separates the entire trajectory into multiple sub-trajectories using identified turning points, and then selects the best matching route for each sub-trajectory from the corresponding multiple shortest routes.

The HMM-based algorithm uses projection once to determine candidate points in the candidate point selection stage, without considering the internal curvature of road segments, which may lead to inaccuracies in some cases. In addition, the large amount of routing calculation required for global matching in such algorithms leads to high computational costs.

In addition, some research results are based on Bayesian rules to perform iterative prediction and update on current matching results. Taguchi et al. proposed a matching method based on a probabilistic path prediction model ( 12 ). The algorithm is based on a Bayesian network observation model and a route prediction model to match GPS trajectory data with a map road network. The route prediction model generates a candidate set of routes, and the observation model is used to calculate the matching probability for each route and select the best matching route. Hao et al. ( 31 ) proposed an improved Kalman filter and GPS error correction method, which uses historical flight paths and road map information to update the Kalman filter state vector and related parameters ( 19 ). The Kalman filter can predict future states based on current states and measurements. During the modeling process, the white noise assumption of the Kalman filter theory was considered, and the incremental error between two adjacent points was modeled as an approximation of Gaussian error and then incorporated into the Kalman filter.

Fuzzy logic can score matching candidate objects and select the candidate with the highest score as the matching result. Quddus et al. proposed a high-precision map-matching algorithm based on fuzzy logic ( 13 ). This algorithm first calculates candidate routes based on vehicle sensor data and then uses fuzzy logic to score the routes, ultimately determining the best matching route. In recent years, neural networks have been increasingly applied to map-matching problems. Neural networks combine multiple neurons in a nonlinear manner to calculate the score of candidate paths, and encoder-decoder frameworks are used to map trajectories to corresponding paths. Jin et al. ( 32 ) proposed a map-matching model based on deep learning ( 12 ). They constructed a transformer-based map-matching model using transfer learning: generated trajectory data were used to pre-train the transformer model, which was then fine-tuned on real data to minimize model development costs and reduce the gap between real and virtual data. Model performance was evaluated at the point and segment levels with three metrics, namely average Hamming distance, F-score, and Bilingual Evaluation Understudy (BLEU), and the transformer’s attention weights were visualized to show how the model correctly matches road segments. Neural-network-based models are sensitive to data size and may overfit without sufficient data. These models also rely heavily on the traffic patterns behind the training data, and different cities may have different numbers of roads and driving patterns. Therefore, a model trained for one city cannot be directly applied to another city.

Other deep-learning-based map-matching methods include that of Feng et al., who built a “trajectory2road” model with an attention mechanism to map sparse and noisy trajectories onto the accurate road network based on the “seq2seq” learning framework ( 25 ). Jiang et al. proposed a general and robust deep-learning-based model, Learning TO Map Matching (L2MM), to tackle these issues ( 26 ). First, high-quality representations of low-quality trajectories are learned by two representation enhancement methods, that is, enhancement with high-frequency trajectories and enhancement with the data distribution. Second, to embrace more heuristic clues, typical mobility patterns are recognized in the latent space and further incorporated into the map-matching task. Finally, based on the available representations and patterns, a mapping from trajectories to corresponding paths is constructed through a joint optimization method. Liu et al. presented GraphMM, a graph-based approach that exploits the graph nature of map matching and incorporates graph neural networks and conditional models to leverage both road and trajectory graph topology, while aligning road segments and trajectories in latent space ( 27 ). They formally analyzed the expressive power of their model in capturing various correlations and proposed efficient algorithms for model training and inference.

This paper proposes a fast matching algorithm based on HMM. The algorithm takes into account the multi-directional characteristics of roads and uses a secondary calculation to identify candidate points, which improves the precision of their selection. Moreover, the fast matching algorithm PFMM, which uses constraint value pruning, is introduced; it removes most of the routing computation and thus improves the overall matching efficiency.

Problem Definition

Given an initial GPS coordinate point and a digital vector map, the goal of map matching is to find the path the vehicle is most likely to travel on the map. In this paper, map matching based on HMM is studied.

In this section, we give the definition of the map matching problem. First, we give some basic definitions of map matching, and then we give relevant definitions of candidate set and candidate path probability in specific matching.

A GPS point $p_{i}$ is a latitude-longitude coordinate collected at a given moment by the positioning module on board the vehicle. Each GPS point is a quadruple $p_{i}$ 〈 lon, lat, time, d 〉 whose attributes are the longitude $p_{i} . lon$ , latitude $p_{i} . lat$ , timestamp $p_{i} . time$ , and real-time movement direction $p_{i} . d$ . The movement direction of a GPS point is the horizontal angle, measured clockwise, from the line pointing north at the point to the line of the target direction.

A GPS trajectory $Traj$ consists of a sequence of consecutive GPS points in which the time interval between any two adjacent GPS points cannot exceed the sampling interval $\Delta time$ , that is, $Traj$ : $p_{1} \to p_{2} \to \dots \to p_{n}$ , and $0 < p_{i} . time - p_{i - 1} . time < \Delta time$ .

A road network can be viewed as a directed graph $G (V, E)$ , where $V$ is the set of vertices, that is, the endpoints or intersections of roads, and $E$ is the set of directed edges, that is, the set of all road segments in the road network. A road segment $e \in E$ denotes a directed edge in the road network. Each road segment contains a start point $e . start$ , an end point $e . end$ , a length $e . len$ , and a list of intermediate points, which are the other points lying between the two endpoints of the segment.

A subsegment $e^{m}$ is one of the small segments obtained by splitting a road segment $e$ at its intermediate points, so each subsegment joins two consecutive points among the endpoints and intermediate points of $e$ .

A route is a sequence of consecutive road segments starting at $v_{i}$ and ending at $v_{j}$ , that is, Route: $e_{1} \to e_{2} \to \dots \to e_{z}$ , where $e_{1} . start = v_{i}$ and $e_{z} . end = v_{j}$ .

The map-matching problem is defined as follows. Given the road network $G (V, E)$ and vehicle GPS trajectory $Traj$ , find the route with the highest matching probability from $G$ as the matching result.
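The definitions above can be written down as a small data model. The following Python sketch is only an illustration: the class and function names (`GPSPoint`, `RoadSegment`, `is_valid_trajectory`, `is_route`) are hypothetical, coordinates are treated as planar so that lengths are plain Euclidean distances, and the check that consecutive route segments share an endpoint is an assumed reading of "consecutive".

```python
from dataclasses import dataclass, field
from math import hypot

Coord = tuple[float, float]  # (lon, lat), treated here as planar coordinates


@dataclass(frozen=True)
class GPSPoint:
    lon: float
    lat: float
    time: float  # timestamp in seconds
    d: float     # movement direction in degrees, clockwise from north


@dataclass
class RoadSegment:
    start: Coord
    end: Coord
    intermediate: list[Coord] = field(default_factory=list)

    def vertices(self) -> list[Coord]:
        """Start point, intermediate points, end point, in travel order."""
        return [self.start, *self.intermediate, self.end]

    def subsegments(self) -> list[tuple[Coord, Coord]]:
        """Split the segment at its intermediate points."""
        pts = self.vertices()
        return list(zip(pts[:-1], pts[1:]))

    @property
    def length(self) -> float:
        """Plays the role of e.len: the sum of the subsegment lengths."""
        return sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in self.subsegments())


def is_valid_trajectory(traj: list[GPSPoint], delta_time: float) -> bool:
    """Check 0 < p_i.time - p_{i-1}.time < delta_time for every adjacent pair."""
    return all(0 < b.time - a.time < delta_time for a, b in zip(traj[:-1], traj[1:]))


def is_route(segments: list[RoadSegment]) -> bool:
    """Assumed connectivity check: each segment ends where the next one starts."""
    return all(a.end == b.start for a, b in zip(segments[:-1], segments[1:]))
```

A road segment with one intermediate point therefore yields two subsegments, and its length is the sum of both.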

Candidate Set Preparation

Data Preprocessing

Before the map-matching algorithm is executed, the raw data are preprocessed: outlier filtering and dwell point detection are applied to the GPS trajectories to remove data that carry no value or that may interfere with the results. The preprocessing pseudocode is shown in Algorithm 1, and the specific process is as follows.

Algorithm 1: GPS trajectory preprocessing

Input: Trajectory $Traj : p_{1} \to p_{2} \to \dots \to p_{n}$ , speed threshold $vc$ , distance threshold $Dc$ , time threshold $Tc$

Output: GPS trajectory after preprocessing

$GSet \leftarrow \varnothing$ , $GroupSet \leftarrow \varnothing$ , $CSet \leftarrow \varnothing$
for each $p_{i}$ in $Traj$ do
  $speed_{i} = \frac{dis (p_{i - 1}, p_{i})}{p_{i} . time - p_{i - 1} . time}$
  if $speed_{i} > vc$ then
    $p_{i} \leftarrow$ mean point of $p_{i - 1}$ and $p_{i + 1}$
  end
  $GSet$.add( $p_{i}$ )
end
for each anchor point $p_{i}$ in $GSet$ do
  $GroupSet \leftarrow$ cluster set of $p_{i}$ under $Dc$ and $Tc$
  $p_{M} \leftarrow$ mean point of $GroupSet$
  $CSet$.add( $p_{M}$ )
  continue from the first point after $GroupSet$ as the next anchor
end
return $CSet$

Outlier Filtering Strategy

Outliers are the few GPS points that deviate abnormally from the rest of the data, typically as abrupt jumps. To preserve the accuracy of the algorithm, these trajectory outliers need to be removed. The outlier filtering strategy is described below.

1) Specify the speed threshold ( $vc$ ) and calculate the average speed between $p_{i - 1}$ and $p_{i}$ . The term $dis (p_{i - 1}, p_{i})$ indicates the Euclidean distance between the two points, so: $speed_{i} = \frac{dis (p_{i - 1}, p_{i})}{p_{i} . time - p_{i - 1} . time}$ .

2) If $speed_{i} \le vc$ , then $p_{i}$ is added to the result set.

3) If $speed_{i} > vc$ , $p_{i}$ is an outlier, and the average point $p'_{i}$ of $p_{i - 1}$ and $p_{i + 1}$ is calculated. Each attribute of $p'_{i}$ takes the mean of the corresponding attributes of $p_{i - 1}$ and $p_{i + 1}$ , and $p'_{i}$ is added to the result set in place of $p_{i}$ .

Take Figure 2 as an example, which shows part of the trajectory $p_{1} \to p_{2} \to \dots \to p_{9}$ with a road speed limit $vc$ of 60 km/h. Since $speed_{2} = 375\,\mathrm{m} / 30\,\mathrm{s} = 45\,\mathrm{km/h} < vc$ , $p_{2}$ is not an outlier, whereas $speed_{4} = 1200\,\mathrm{m} / 30\,\mathrm{s} = 144\,\mathrm{km/h} > vc$ , so $p_{4}$ is an outlier. Fitting $p_{3}$ and $p_{5}$ yields the intermediate point $p'_{4}$ , which replaces $p_{4}$ as the result point.
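The outlier filtering steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: points are hypothetical `(x, y, t)` tuples in planar metres and seconds, the speed is measured from the last point already placed in the result set (so that a replaced outlier does not make its successor look like an outlier too), and a too-fast final point is kept because it has no successor to average with.

```python
from math import hypot


def filter_outliers(traj, vc):
    """Replace speed outliers by the mean of their neighbours.

    traj: list of (x, y, t) tuples with strictly increasing t
          (planar metres and seconds; an assumption of this sketch).
    vc:   speed threshold in metres per second.
    """
    if not traj:
        return []
    result = [traj[0]]
    for i in range(1, len(traj)):
        prev = result[-1]          # last accepted point, standing in for p_{i-1}
        x, y, t = traj[i]
        speed = hypot(x - prev[0], y - prev[1]) / (t - prev[2])
        if speed > vc and i + 1 < len(traj):
            nxt = traj[i + 1]
            # Outlier: every attribute becomes the mean of the two neighbours.
            result.append(((prev[0] + nxt[0]) / 2,
                           (prev[1] + nxt[1]) / 2,
                           (prev[2] + nxt[2]) / 2))
        else:
            result.append(traj[i])
    return result


# Hypothetical data in the spirit of the Figure 2 discussion: the fourth point
# jumps 1200 m in 30 s (144 km/h), above the 60 km/h threshold.
trajectory = [(0, 0, 0), (375, 0, 30), (750, 0, 60), (750, 1200, 90), (1500, 0, 120)]
cleaned = filter_outliers(trajectory, vc=60 / 3.6)
```

In this example only the fourth point is replaced, by the midpoint of the third and fifth points; all other points pass the speed check.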

Figure 2.

Examples of outliers and dwell points.

Dwell Point Detection Strategy

A dwell point indicates a geographical location where the vehicle stays for a period of time. When the vehicle stops at a certain location, the GPS points recorded by the positioning device usually do not share the same coordinates but form a series of very close trajectory points, so, to reduce unnecessary computation, these aggregated GPS points are clustered.

Let $p_{i}$ be the anchor point, and initialize its cluster set as $grouplist_{i} = \{p_{i}\}$ . Given the distance threshold ( $Dc$ ) and the time threshold ( $Tc$ ), the dwell point detection conditions are:

1) $dis (p_{i}, p_{i + 1}) \le Dc$ , where $dis (p_{i}, p_{i + 1})$ is the Euclidean distance between $p_{i}$ and $p_{i + 1}$ ;

2) $\Delta t \ge Tc$ , where $\Delta t = p_{i + 1} . time - p_{i} . time$ .

If $p_{i + 1}$ satisfies the above conditions, it is added to $grouplist_{i}$ , and the following trajectory points $\{p_{i + 2}, p_{i + 3}, \dots, p_{n}\}$ are tested against the dwell point condition in turn until some $p_{i + m}$ fails it. Detection then stops, $grouplist_{i}$ is clustered to obtain the average point $p_{M}$ , and $p_{i + m}$ becomes the anchor point for the next round of dwell point detection.

As shown in Figure 2, the sampling interval is 30 s, and $Dc = 200$ m and $Tc = 30$ s are specified. With $p_{5}$ as the anchor point, the subsequent trajectory points $\{p_{6}, p_{7}, p_{8}\}$ meet the dwell point condition, so $grouplist_{5} = \{p_{5}, p_{6}, p_{7}, p_{8}\}$ , and the average point $p_{M}$ of $grouplist_{5}$ replaces the points in the set.
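A possible Python sketch of the dwell point detection loop is given below. It rests on assumptions that the text leaves open: points are hypothetical `(x, y, t)` tuples in planar metres and seconds, and each following point is compared with the anchor point (distance to the anchor at most `dc`, time elapsed since the anchor at least `tc`). Comparing with the immediately preceding point instead would be an equally plausible reading.

```python
from math import hypot


def detect_dwell_points(traj, dc, tc):
    """Collapse each run of dwell points into its mean point.

    traj: list of (x, y, t) tuples (planar metres and seconds; assumption).
    dc:   distance threshold in metres.
    tc:   time threshold in seconds.
    """
    result = []
    i, n = 0, len(traj)
    while i < n:
        anchor = traj[i]
        group = [anchor]
        j = i + 1
        while j < n:
            x, y, t = traj[j]
            near = hypot(x - anchor[0], y - anchor[1]) <= dc
            long_enough = t - anchor[2] >= tc
            if not (near and long_enough):
                break              # p_{i+m}: first point failing the condition
            group.append(traj[j])
            j += 1
        if len(group) == 1:
            result.append(anchor)  # no dwell detected, keep the point as is
        else:
            m = len(group)
            # p_M: attribute-wise mean of the cluster set
            result.append((sum(p[0] for p in group) / m,
                           sum(p[1] for p in group) / m,
                           sum(p[2] for p in group) / m))
        i = j                      # the failing point becomes the next anchor
    return result


# Hypothetical trajectory sampled every 30 s with a stop around x = 1000 m.
trajectory = [(0, 0, 0), (1000, 0, 30), (1010, 0, 60),
              (1020, 0, 90), (1030, 0, 120), (2000, 0, 150)]
merged = detect_dwell_points(trajectory, dc=200, tc=30)
```

With `dc = 200` and `tc = 30`, the four points near x = 1000 m collapse into a single mean point, while the first and last points are kept unchanged.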

Candidate Points Extraction

This section introduces the candidate point extraction method, which uses a secondary (two-stage) computation to find a set of candidate points for each GPS point before the matching calculation. Because a real road may contain several turns and therefore has multi-directional features, the intermediate point information of the road segment is considered, for the first time, when extracting preselected points and then candidate points in two separate steps.

Candidate Road Segment Extraction

For each GPS point $p_{i}$ in the given trajectory Traj: $p_{1} \to p_{2} \to \dots \to p_{n}$ , all road network segments contained in or intersecting the circular region with $p_{i}$ as the center and a radius of length r are called the candidate road segments of point $p_{i}$ . The jth candidate road segment of $p_{i}$ is denoted as $e_{i}^{j}$ .

Preselected Points Extraction

Given a GPS point $p_{i}$ and its candidate road segment, $p_{i}$ is projected onto each subsegment. If the projection point lies within the subsegment, it is considered a preselected point. Otherwise, the subsegment endpoint closest to $p_{i}$ is considered a preselected point. Taking Figure 3 as an example, $e_{i}^{1}$ is the candidate road segment of $p_{i}$ . $p_{i}$ is projected onto the subsegments $e_{i}^{1, 1}, e_{i}^{1, 2}, e_{i}^{1, 3}$ , and $e_{i}^{1, 4}$ of $e_{i}^{1}$ . According to the definition of preselected points, the set of preselected points $PreCandi (p_{i}, e_{i}^{1}) = \{c_{1}, c_{2}, c_{3}, c_{4}\}$ is extracted.

Figure 3.

A sample of extracting preselected sets for global positioning system point $p_{i}$ .
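The projection rule for preselected points can be illustrated with a short Python sketch. The function names are hypothetical, and coordinates are assumed to be planar so that the projection is an ordinary orthogonal projection onto a line segment.

```python
def project_onto_subsegment(p, a, b):
    """Preselected point of p on subsegment a->b.

    Returns the orthogonal projection if it falls within the subsegment,
    otherwise the subsegment endpoint closest to p.
    """
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a                    # degenerate subsegment: a single point
    # Position of the projection along a->b (0 at a, 1 at b).
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    if 0 <= t <= 1:
        return (ax + t * dx, ay + t * dy)
    return a if t < 0 else b        # outside: the nearer endpoint


def preselected_points(p, polyline):
    """One preselected point per subsegment of a road segment.

    polyline: start point, intermediate points, and end point in order.
    """
    return [project_onto_subsegment(p, a, b)
            for a, b in zip(polyline[:-1], polyline[1:])]


# Hypothetical road segment with one right-angle turn at (4, 0).
road = [(0, 0), (4, 0), (4, 4)]
points = preselected_points((2, 1), road)
```

For the GPS point (2, 1), both projections fall inside their subsegments; a point such as (-1, 1) would instead snap to the start vertex of the first subsegment.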

The similarity between preselected points and trajectory points is measured by two criteria: distance and direction. The distance criterion is the Euclidean distance between the two points, and the direction criterion is the difference in direction between the trajectory point and the preselected point. The direction value of a preselected point is determined by the direction value of the subsegment it belongs to. The formula for calculating the direction of subsegment $e_{i}^{j, m}$ is:

$$\theta_{e_{i}^{j, m}} = \arctan \left( \frac{e_{i}^{j, m} . end . lat - e_{i}^{j, m} . start . lat}{e_{i}^{j, m} . end . lon - e_{i}^{j, m} . start . lon} \right) \tag{1}$$

where

$e_{i}^{j, m} . end . lat$ = the latitude of the end point of the subsegment, and

$e_{i}^{j, m} . start . lon$ = the longitude of the start point of the subsegment.

To reduce the error in direction calculation, we adopt an optimal range rule to assign direction values to preselected points:

1) If the preselected point itself is not an intermediate point or endpoint of the road segment, the three adjacent subsegments are included in the range, and the direction closest to $p_{i} . d$ is selected as the direction value of the preselected point. In Figure 3, the adjacent subsegments of preselected point $c_{3}$ are $\{e_{i}^{1, 2}, e_{i}^{1, 3}, e_{i}^{1, 4}\}$ , and the direction of $e_{i}^{1, 3}$ is closest to $p_{i} . d$ , so the direction of $c_{3}$ is assigned as the direction of $e_{i}^{1, 3}$ .

2) If the preselected point is an intermediate point or endpoint of the road segment, the two adjacent subsegments before and after the preselected point are included in the range, and the direction value is determined in the same way. In Figure 3, the adjacent subsegments of preselected point $c_{4}$ are $\{e_{i}^{1, 3}, e_{i}^{1, 4}\}$ , and the direction of $e_{i}^{1, 3}$ is closest to $p_{i} . d$ , so the direction of $c_{4}$ is the same as that of $e_{i}^{1, 3}$ .

The definition of the direction difference between trajectory points and preselected points is as follows:

$$\Delta \theta_{c_{m}} = | \theta_{e_{i}^{j, m}} - p_{i} . d | \tag{2}$$

where

$p_{i} . d$ = the direction value of $p_{i}$ , and

$\theta_{e_{i}^{j, m}}$ = the direction value of the preselected point.
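The optimal range rule and the direction difference can be sketched in Python as follows. Two choices here are assumptions of the sketch rather than statements of the paper: subsegment directions are computed as compass bearings with `atan2` (clockwise from north, so that they are directly comparable with $p_{i} . d$ and remain defined for north-south subsegments) instead of the plain arctangent of Equation 1, and the difference is folded into the range 0 to 180 degrees, whereas Equation 2 takes the plain absolute difference. All function names are hypothetical.

```python
from math import atan2, degrees


def bearing(a, b):
    """Compass bearing of a->b in degrees, clockwise from north.

    Points are (lon, lat) pairs treated as planar coordinates (assumption).
    """
    return degrees(atan2(b[0] - a[0], b[1] - a[1])) % 360.0


def direction_difference(theta, d):
    """Absolute angular difference, folded into [0, 180] (assumed wrap-around)."""
    diff = abs(theta - d) % 360.0
    return min(diff, 360.0 - diff)


def assign_direction(polyline, heading, m=None, vertex=None):
    """Direction value of a preselected point under the optimal range rule.

    polyline: vertices of the road segment; subsegment k joins
              polyline[k] and polyline[k + 1].
    heading:  p_i.d, the movement direction of the GPS point.
    m:        index of the subsegment containing the point (rule 1).
    vertex:   index of the vertex the point coincides with (rule 2).
    """
    last = len(polyline) - 2                 # index of the last subsegment
    if vertex is None:
        # Rule 1: the subsegment itself plus its neighbours on both sides.
        ks = range(max(0, m - 1), min(last, m + 1) + 1)
    else:
        # Rule 2: the subsegments immediately before and after the vertex.
        ks = range(max(0, vertex - 1), min(last, vertex) + 1)
    directions = [bearing(polyline[k], polyline[k + 1]) for k in ks]
    return min(directions, key=lambda th: direction_difference(th, heading))


# Hypothetical road: north, then east, then south.
road = [(0, 0), (0, 1), (1, 1), (1, 0)]
theta_rule1 = assign_direction(road, heading=100.0, m=1)
theta_rule2 = assign_direction(road, heading=100.0, vertex=2)
```

For a heading of 100 degrees, both rules pick the eastbound subsegment (bearing 90 degrees), since it is the closest of the directions in range.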

Candidate Points Extraction

For a GPS point $p_{i}$ and its candidate road segment $e_{i}^{j}$ , each point $c_{m}$ in the preselected point set $PreCandi (p_{i}, e_{i}^{j}) = \{c_{1}, c_{2}, \dots\}$ is scored by its similarity to $p_{i}$ in distance and in direction. In general, GPS measurements have errors, but the true location can still be considered to lie within a certain range of the measured GPS position. Therefore, the distance similarity function can be reasonably formalized as a zero-mean normal distribution $N (\mu, \sigma_{1})$ with a standard deviation of $\sigma_{1}$ , and the direction similarity function can be similarly formalized as $N (\mu, \sigma_{2})$ . Based on the previous descriptions, the following formulas can be derived:

$$O (c_{m}) = \frac{1}{\sqrt{2 \pi \sigma_{1}^{2}}} \exp \left( - \frac{(dis (p_{i}, c_{m}) - \mu)^{2}}{2 \sigma_{1}^{2}} \right) \tag{3}$$

$$D (c_{m}) = \frac{1}{\sqrt{2 \pi \sigma_{2}^{2}}} \exp \left( - \frac{(\Delta \theta_{c_{m}} - \mu)^{2}}{2 \sigma_{2}^{2}} \right) \tag{4}$$

where

$dis (p_{i}, c_{m})$ = the Euclidean distance between $p_{i}$ and $c_{m}$ , and

$\Delta \theta_{c_{m}}$ = the difference in direction between the two.

The overall similarity function $Sim (p_{i}, c_{m})$ is defined from Equations 3 and 4 as follows:

$$Sim (p_{i}, c_{m}) = O (c_{m}) \times D (c_{m}) \tag{5}$$

For a set of preselected points $PreCandi (p_{i}, e_{i}^{j}) = \{c_{1}, c_{2}, \dots\}$ , the point with the highest similarity is called the optimal preselected point, that is:

$$PreCandi_{opt} (p_{i}, e_{i}^{j}) = \arg\max_{c_{m} \in PreCandi (p_{i}, e_{i}^{j})} Sim (p_{i}, c_{m}),$$

and the optimal preselected point is called the candidate point, denoted as $c_{i}^{j}$ , that is:

$$c_{i}^{j} = PreCandi_{opt} (p_{i}, e_{i}^{j}) .$$

Candidate points are extracted for all candidate segments of each GPS point $p_{i}$ , and the candidate point set $Candi (p_{i})$ is generated.
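Equations 3 to 5 and the selection of the optimal preselected point can be sketched in a few lines of Python. The values of `sigma1` and `sigma2` and the sample distances and direction differences are hypothetical, chosen only for illustration; the function names are likewise assumptions of this sketch.

```python
from math import exp, pi, sqrt


def gaussian(x, mu, sigma):
    """Normal density, the common form of Equations 3 and 4."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)


def similarity(dist, dtheta, sigma1, sigma2, mu=0.0):
    """Sim = O x D (Equation 5) for one preselected point."""
    return gaussian(dist, mu, sigma1) * gaussian(dtheta, mu, sigma2)


def select_candidate(preselected, sigma1=20.0, sigma2=30.0):
    """Return the preselected point with the highest similarity.

    preselected: list of (label, distance in metres, direction difference
                 in degrees); sigma1 and sigma2 are illustrative values.
    """
    return max(preselected, key=lambda c: similarity(c[1], c[2], sigma1, sigma2))


# Hypothetical preselected set on one candidate road segment.
pre_candi = [("c1", 30.0, 80.0), ("c2", 12.0, 70.0),
             ("c3", 15.0, 5.0), ("c4", 25.0, 5.0)]
candidate = select_candidate(pre_candi)
```

Here `c2` is the closest point but points in a very different direction, so the product of the two densities favors `c3`, which is both near and well aligned.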

From the processed underlying road network data, the endpoints, length, and intermediate points of each road are extracted and stored in a grid index, which records the objects falling in each grid cell and reduces the search time when looking for candidate road segments.
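A uniform grid index of this kind might look as follows in Python. This is a sketch under assumptions: the class name `GridIndex`, the cell size, and the bounding-box registration of subsegments are illustrative choices, and coordinates are planar. The query returns a superset of the candidate road segments, so an exact distance check against the radius $r$ would still be needed afterwards.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform grid over planar coordinates, mapping cells to segment ids."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, seg_id, polyline):
        """Register a road segment in every cell touched by the bounding box
        of each of its subsegments."""
        for (x1, y1), (x2, y2) in zip(polyline[:-1], polyline[1:]):
            cx1, cy1 = self._cell(min(x1, x2), min(y1, y2))
            cx2, cy2 = self._cell(max(x1, x2), max(y1, y2))
            for cx in range(cx1, cx2 + 1):
                for cy in range(cy1, cy2 + 1):
                    self.cells[(cx, cy)].add(seg_id)

    def query(self, x, y, r):
        """Ids of segments registered in cells overlapping the square that
        encloses the circle of radius r around (x, y)."""
        cx1, cy1 = self._cell(x - r, y - r)
        cx2, cy2 = self._cell(x + r, y + r)
        found = set()
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                found |= self.cells.get((cx, cy), set())
        return found


# Hypothetical usage with 100 m cells and two road segments.
index = GridIndex(cell_size=100)
index.insert("e1", [(0, 0), (250, 0)])
index.insert("e2", [(1000, 1000), (1100, 1000)])
nearby = index.query(120, 30, r=50)
```

Only the cells around the GPS point are inspected, so the lookup cost depends on the search radius and cell size rather than on the total number of road segments.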

Candidate Route Probability Analysis

This section introduces candidate point probability calculation, candidate route probability calculation, and candidate graph construction. After candidate set extraction, each GPS point in the trajectory has a set of candidate points. In this section, first, the observation probability is defined, and the probability for candidate points is analyzed and calculated. Then, considering road connectivity, the transition probability is defined and the probability calculation for candidate routes is performed. Finally, the candidate graph of a trajectory is generated based on the candidate points and candidate routes.

The similarity function is described as the observation probability and has the following definitions based on the distance similarity and directional similarity.

The $observation$ $probability$ $pr (c_{i}^{j})$ denotes the probability that the candidate point $c_{i}^{j}$ is the actual location of the GPS point $p_{i}$ , and is determined by the distance similarity function and the direction similarity function, which are calculated as follows:

pr (c_{i}^{j}) = O (c_{i}^{j}) \times D (c_{i}^{j}),

(6)

The larger the distance $dis (p_{i}, c_{i}^{j})$ between the candidate point $c_{i}^{j}$ and the GPS point $p_{i}$ , the smaller the value of $O (c_{i}^{j})$ ; the larger $Δ θ (c_{i}^{j})$ , the smaller $D (c_{i}^{j})$ , that is, the value of the observation probability is inversely correlated with the distance and direction difference between the two points. The observation probability is calculated for each candidate point.

In some cases, considering only the observation probability is likely to give a wrong candidate point the larger observation probability, leading to a wrong matching route. For example, in Figure 4, $pr (c_{1}^{1}) > pr (c_{1}^{2})$ and $pr (c_{2}^{1}) > pr (c_{2}^{2})$ . If the candidate with the highest observation probability is matched at each point, the matching sequence is $\{c_{1}^{1}, c_{2}^{1}\}$ , but the corresponding matching route $e_{1} \to e_{5} \to e_{4} \to e_{3}$ has an obvious detour, which is not consistent with normal driving behavior. If both the observation probability and the length of the matching route are considered, the best candidate sequence is $\{c_{1}^{2}, c_{2}^{1}\}$ , whose matching route $e_{2} \to e_{3}$ is shorter and more consistent with normal driving behavior.

Figure 4.

A sample of parallel road.

A candidate route denotes a shortest route on a road network starting at $\forall c_{i - 1}^{t} \in Candi (p_{i - 1})$ and ending at $\forall c_{i}^{s} \in Candi (p_{i})$ , denoted as $CandiRoute (c_{i - 1}^{t}, c_{i}^{s}) : e_{1} \to e_{2} \to \dots \to e_{z}$ , where $c_{i - 1}^{t} \in e_{1}$ and $c_{i}^{s} \in e_{z}$ .

The transition probability, denoted as $tr (c_{i - 1}^{t}, c_{i}^{s})$ , is the probability that the candidate route $CandiRoute (c_{i - 1}^{t}, c_{i}^{s})$ is the true route of the vehicle. It is calculated from the ratio of the Euclidean distance from $p_{i - 1}$ to $p_{i}$ to the road network distance of $CandiRoute (c_{i - 1}^{t}, c_{i}^{s})$ as follows:

routesim = \frac{dis (p_{i - 1}, p_{i})}{length (c_{i - 1}^{t}, c_{i}^{s})},

(7)

tr (c_{i - 1}^{t}, c_{i}^{s}) = \frac{1}{\sqrt{2 π σ_{3}^{2}}} \exp (- \frac{{(1 - routesim)}^{2}}{2 σ_{3}^{2}}),

(8)

where

$dis (p_{i - 1}, p_{i})$ = the Euclidean distance between GPS points $p_{i - 1}$ and $p_{i}$ ,

$length (c_{i - 1}^{t}, c_{i}^{s})$ = the $CandiRoute (c_{i - 1}^{t}, c_{i}^{s})$ route length, and

$routesim$ = the ratio of the two.

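By Equation 8, the closer the route ratio $routesim$ is to 1, that is, the closer the route length is to the Euclidean distance between the two GPS points, the larger the transition probability value and the larger the matching probability of the candidate route.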
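To make Equations 7 and 8 concrete, the short Python sketch below evaluates the transition probability for a nearly direct route and for a detour between the same pair of GPS points. The distances are hypothetical, and $σ_{3} = 0.25$ is the value given later in the parameter settings.

```python
import math

def transition_probability(gps_dist, route_length, sigma3=0.25):
    """Equation 8 evaluated at routesim = gps_dist / route_length (Equation 7)."""
    routesim = gps_dist / route_length
    return math.exp(-((1 - routesim) ** 2) / (2 * sigma3 ** 2)) / math.sqrt(2 * math.pi * sigma3 ** 2)

# Hypothetical example: the two GPS points are 300 m apart.
direct = transition_probability(300.0, 310.0)   # route length close to the straight-line distance
detour = transition_probability(300.0, 900.0)   # route three times as long
print(direct > detour)  # True
```

The detour is penalized because its ratio is far from 1, which is the behavior that lets the transition probability correct a misleading observation probability, as in the parallel-road example of Figure 4.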

According to the candidate points and candidate routes of trajectory, the candidate graph of trajectory can be generated, and the later calculations about trajectory matching are based on the candidate graph. Figure 5 shows the candidate graph $G_{Traj}$ of trajectory $Traj : p_{1} \to p_{2} \to p_{3}$ . The candidate graph has the characteristics shown in Figure 5.

Figure 5.

A sample of $p_{1} \to p_{2} \to p_{3}$ candidate graph.

The final matching sequence can be found in the candidate graph, because the graph contains the candidate routes between the candidate points of all consecutive trajectory points. Since GPS points are ordered in time, the final matching route can be calculated incrementally along the topology of the candidate graph.

Map-Matching Algorithm

Overview of Map-Matching Algorithm

According to the candidate graph definition, the cumulative probability measures how likely the corresponding candidate route is to be matched. Figure 5 shows that, except for the candidate points of $p_{1}$ , each candidate point is reached by multiple directed edges (candidate routes), but only the candidate route with the best cumulative probability is retained, and only that candidate route can become part of the final matching route.

The cumulative probability $cumu (c_{i - 1}^{t}, c_{i}^{s})$ denotes the probability that the candidate route $CandiRoute (c_{i - 1}^{t}, c_{i}^{s})$ becomes part of the optimal matching route, and is calculated as follows:

cumu (c_{i - 1}^{t}, c_{i}^{s}) = mc (c_{i - 1}^{t}) \times tr (c_{i - 1}^{t}, c_{i}^{s}) \times pr (c_{i}^{s}), \quad 1 < i \leq n,

(9)

where

$tr (c_{i - 1}^{t}, c_{i}^{s})$ = the transition probability of $CandiRoute (c_{i - 1}^{t}, c_{i}^{s})$ ,

$pr (c_{i}^{s})$ = the observation probability of $c_{i}^{s}$ , and

$mc (c_{i - 1}^{t})$ = the optimal cumulative probability of the candidate point $c_{i - 1}^{t}$ .

The optimal cumulative probability of a candidate point $c_{i}^{s}$ , denoted as $mc (c_{i}^{s})$ , is the maximum of the cumulative probabilities $cumu (c_{i - 1}^{t}, c_{i}^{s})$ over all candidate points $c_{i - 1}^{t}$ of $p_{i - 1}$ , and is computed as follows:

mc (c_{i}^{s}) = \begin{cases} pr (c_{i}^{s}), & i = 1 \\ \max_{t} cumu (c_{i - 1}^{t}, c_{i}^{s}), & 1 < i \leq n \end{cases}

(10)

where

for $i = 1$ , the optimal cumulative probability of a candidate point of the first point $p_{1}$ of the trajectory is its observation probability value,

$n$ = the total number of points of the trajectory,

$c_{i - 1}$ = the predecessor node of $c_{i}$ ,

$t$ = the index of any candidate point of $p_{i - 1}$ , and

$s$ = the index of any candidate point of $p_{i}$ .

Table 1 shows the cumulative probabilities of all candidate routes for the trajectory $Traj : p_{1} \to p_{2} \to p_{3}$ . For example, the candidate point $c_{2}^{1}$ has three candidate routes $CandiRoute (c_{1}^{1}, c_{2}^{1}),$ $CandiRoute (c_{1}^{2}, c_{2}^{1}), CandiRoute (c_{1}^{3}, c_{2}^{1})$ , the cumulative probabilities are 0.53, 0.43, and 0.27, the optimal cumulative probability is 0.53, and the optimal candidate route is $CandiRoute (c_{1}^{1}, c_{2}^{1})$ .

Table 1.

Candidate Routes of $p_{1} \to p_{2} \to p_{3}$

Candidate routes	Transition probability	Cumulative probability
$(c_{1}^{1}, c_{2}^{1})$	0.75	0.53
$(c_{1}^{2}, c_{2}^{1})$	0.68	0.43
$(c_{1}^{3}, c_{2}^{1})$	0.54	0.27
$(c_{1}^{1}, c_{2}^{2})$	0.91	0.47
$(c_{1}^{2}, c_{2}^{2})$	0.71	0.33
$(c_{1}^{3}, c_{2}^{2})$	0.66	0.24
$(c_{2}^{1}, c_{3}^{1})$	0.68	0.31
$(c_{2}^{2}, c_{3}^{1})$	0.89	0.36
$(c_{2}^{1}, c_{3}^{2})$	0.77	0.28
$(c_{2}^{2}, c_{3}^{2})$	0.78	0.35
$(c_{2}^{1}, c_{3}^{3})$	0.72	0.27
$(c_{2}^{2}, c_{3}^{3})$	0.52	0.19

The optimal cumulative probability of candidate points at the last point $p_{n}$ of the trajectory takes into account the cumulative probability factors of candidate points of all previous GPS points, so the candidate points with the maximum $mc (c_{n}^{z})$ value in $p_{n}$ are part of the final matching route, and the sequence of candidate points from $c_{n}^{z}$ backward to the starting trajectory point is the sequence of optimal candidate points.

As shown in Table 1, the candidate point $c_{3}^{1}$ of $p_{3}$ has $mc (c_{3}^{1}) = 0.36 > mc (c_{3}^{2}) > mc (c_{3}^{3})$ , and the optimal candidate point sequence $\{c_{1}^{1}, c_{2}^{2}, c_{3}^{1}\}$ is obtained by going back from $c_{3}^{1}$ . The goal of map matching is to find the road network route that is closest to the driving trajectory: the returned result is the optimal candidate point sequence, and the matched driving route is composed of this sequence and its corresponding candidate routes.
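The backward recovery can be reproduced directly from the cumulative probabilities in Table 1. The Python sketch below is only an illustration: it keeps, for each candidate point, the predecessor with the highest cumulative probability (Equation 10) and then traces back from the best candidate of $p_{3}$. The string labels such as "c2^1" are a notation chosen for this sketch.

```python
# Cumulative probabilities from Table 1, keyed by (predecessor, candidate).
cumu = {
    ("c1^1", "c2^1"): 0.53, ("c1^2", "c2^1"): 0.43, ("c1^3", "c2^1"): 0.27,
    ("c1^1", "c2^2"): 0.47, ("c1^2", "c2^2"): 0.33, ("c1^3", "c2^2"): 0.24,
    ("c2^1", "c3^1"): 0.31, ("c2^2", "c3^1"): 0.36,
    ("c2^1", "c3^2"): 0.28, ("c2^2", "c3^2"): 0.35,
    ("c2^1", "c3^3"): 0.27, ("c2^2", "c3^3"): 0.19,
}

# Equation 10: keep the best incoming route for every candidate point.
mc, pre = {}, {}
for (prev, cur), value in cumu.items():
    if value > mc.get(cur, -1.0):
        mc[cur], pre[cur] = value, prev

# Start from the best candidate of the last point and follow the predecessors.
node = max(["c3^1", "c3^2", "c3^3"], key=mc.get)
sequence = [node]
while node in pre:
    node = pre[node]
    sequence.append(node)
sequence.reverse()
print(sequence)  # ['c1^1', 'c2^2', 'c3^1']
```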

The most time-consuming part of the map-matching process is calculating the transition probabilities of candidate routes, as many shortest route queries have to be performed on the road network. In the traditional HMM-based map-matching algorithm, neighboring candidate points are fully connected and all transition probabilities have to be calculated, which is very time-consuming. As shown in Table 1, the trajectory $Traj : p_{1} \to p_{2} \to p_{3}$ requires the transition probabilities of 12 candidate routes to be calculated and 12 shortest route queries to be performed.

To address this computational bottleneck, this paper proposes the PFMM algorithm, which reduces most of the shortest route query process through pruning optimisation. As shown in Figure 6, the algorithm can prune four candidate routes, which can reduce part of the computational cost. The performance improvement of this algorithm is especially significant in complex dense road networks. Algorithm 2 shows the map-matching algorithm framework.

Figure 6.

A sample of $p_{1} \to p_{2} \to p_{3}$ candidate graph after pruning.

Algorithm 2: Algorithm framework

Input: Trajectory $Traj : p_{1} \to p_{2} \to \dots \to p_{n}$ , road network $G (V, E)$ , search radius $r$
Output: Matching result sequence $c_{1}^{s_{1}} \to c_{2}^{s_{2}} \to \dots \to c_{n}^{s_{n}}$

$GCSet \leftarrow Ø$ , $TargetSet \leftarrow Ø$
for each $p_{i}$ in $Traj$ do
    /* Extract candidate set */
    $Candi \leftarrow ExtractCandidateSet (p_{i}, r, G)$
    $GCSet . add (Candi)$
end
$G_{Traj} \leftarrow$ Construct candidate graph from $GCSet$
$TargetSet \leftarrow PFMM (G_{Traj}, G)$
return $TargetSet$

PFMM Algorithm Process

The PFMM algorithm is able to filter a portion of unnecessary candidate route probability calculations. Its algorithm is divided into three main steps: 1) Determine the baseline candidate route for candidate points, 2) Candidate route pruning optimization, and 3) Optimal matching route recovery. Algorithm 3 shows the pseudocode of PFMM.

Algorithm 3: PFMM

Input: $G_{Traj}$ , road network $G (V, E)$
Output: Matching result sequence $c_{1}^{s_{1}} \to c_{2}^{s_{2}} \to \dots \to c_{n}^{s_{n}}$

$pre \leftarrow null$ , $TargetSet \leftarrow Ø$
for each $c_{1}^{s}$ of $p_{1}$ do
    $mc (c_{1}^{s}) \leftarrow pr (c_{1}^{s})$
end
for each $p_{i}$ in $Traj$ with $i > 1$ do
    $PathSet \leftarrow Ø$
    for each $c_{i}^{s}$ of $p_{i}$ do
        $\max \leftarrow - 1$
        for each $c_{i - 1}^{t}$ of $p_{i - 1}$ in descending order of $mc (c_{i - 1}^{t})$ do
            $tr_{\min} (c_{i - 1}^{t}, c_{i}^{s}) \leftarrow$ Transition probability limit value
            if $tr_{\min} (c_{i - 1}^{t}, c_{i}^{s}) \geq 1$ then
                break /* pruning strategy 1 */
            end
            $TH (c_{i - 1}^{t}, c_{i}^{s}) \leftarrow$ Candidate route distance limit value
            if $CandiRoute (c_{i - 1}^{t}, c_{i}^{s}) \notin PathSet$ then
                $PathSet \leftarrow PathSet \cup IDijkstra (v_{start}, v_{end}, list_{i - 1}, list_{i}, G, dist, TH)$
            end
            $length (c_{i - 1}^{t}, c_{i}^{s}) \leftarrow$ Length of the shortest route $c_{i - 1}^{t} \to c_{i}^{s}$
            if $length (c_{i - 1}^{t}, c_{i}^{s}) > TH (c_{i - 1}^{t}, c_{i}^{s})$ then
                continue /* pruning strategy 2 */
            end
            $cumu (c_{i - 1}^{t}, c_{i}^{s}) \leftarrow mc (c_{i - 1}^{t}) \times tr (c_{i - 1}^{t}, c_{i}^{s}) \times pr (c_{i}^{s})$
            if $cumu (c_{i - 1}^{t}, c_{i}^{s}) > \max$ then
                $\max \leftarrow cumu (c_{i - 1}^{t}, c_{i}^{s})$
                $pre [c_{i}^{s}] \leftarrow c_{i - 1}^{t}$
            end
        end
        $mc (c_{i}^{s}) \leftarrow \max$
    end
end
$c \leftarrow \arg\max_{c_{n}^{s}} mc (c_{n}^{s})$
while $c \neq null$ do
    $TargetSet . add (c)$
    $c \leftarrow pre [c]$
end
return $TargetSet$

Determine the Baseline Candidate Route for Candidate Points

For each candidate point in the candidate graph for $p_{i} (i > 1)$ , first select the candidate point with the largest value of the optimal cumulative probability $mc$ in $p_{i - 1}$ , set it as $c_{i - 1}^{t}$ , take $CandiRoute (c_{i - 1}^{t}, c_{i}^{s})$ as the baseline candidate route and also the current optimal candidate route, and calculate the current maximum cumulative probability $cumu (c_{i - 1}^{t}, c_{i}^{s})$ .

Candidate Route Pruning Optimization

For the remaining unvisited candidate points of $p_{i - 1}$ , the candidate point with the largest $mc$ value is selected and set as $c_{i - 1}^{w}$ , and the candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ enters the pruning strategy. Based on the current baseline candidate route, limit values on the transition probability and on the candidate route distance of $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ are used for pruning, and the cumulative probability $cumu (c_{i - 1}^{w}, c_{i}^{s})$ is calculated only for a candidate route that does not satisfy the pruning condition. If $cumu (c_{i - 1}^{w}, c_{i}^{s}) > cumu (c_{i - 1}^{t}, c_{i}^{s})$ , then the candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ replaces the current baseline route. Step 2 is then repeated until all candidate points of $p_{i - 1}$ have been visited, which yields the final baseline candidate route of $c_{i}^{s}$ , that is, the optimal candidate route.

In particular, for the candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ , the current maximum cumulative probability value of the candidate point $c_{i}^{s}$ is known, so a limit value on the transition probability of the candidate route can be obtained by inverting Equation 9, denoted as $tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ . If the candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ is to replace the current optimal candidate route, then its transition probability value must be greater than the transition probability limit. The transition probability limit is calculated as follows:

t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) = \frac{cumu (c_{i - 1}^{t}, c_{i}^{s})}{pr (c_{i}^{s}) \times mc (c_{i - 1}^{w})}, 1 < i \leq n,

(11)

where

$pr (c_{i}^{s})$ = the observation probability of candidate point $c_{i}^{s}$ ,

$cumu (c_{i - 1}^{t}, c_{i}^{s})$ = the current maximum cumulative probability value of candidate point $c_{i}^{s}$ , and

$mc (c_{i - 1}^{w})$ = the optimal cumulative probability of $c_{i - 1}^{w}$ .

Pruning strategy 1 and pruning strategy 2 are proposed based on the transition probability limit values.
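A small numerical illustration of Equation 11 may help. In the Python sketch below, all probabilities are hypothetical; the point is only that, for a fixed baseline, a predecessor with a smaller $mc$ value needs a larger transition probability to win.

```python
def tr_min(best_cumu, pr_cur, mc_prev):
    """Equation 11: transition probability a challenger route must exceed."""
    return best_cumu / (pr_cur * mc_prev)

# Hypothetical values for one candidate point c_i^s.
best_cumu = 0.30   # cumulative probability of the current baseline route
pr_cur = 0.8       # observation probability pr(c_i^s)

limit_w = tr_min(best_cumu, pr_cur, mc_prev=0.5)   # about 0.75, still reachable
limit_z = tr_min(best_cumu, pr_cur, mc_prev=0.3)   # about 1.25, not below 1
```

The second limit is not below 1, which is the situation that pruning strategy 1 exploits.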

Pruning Strategy 1

If the transition probability is bounded to $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) \geq 1$ , the candidate route and the remaining uncomputed candidate routes are directly pruned, as shown in Theorem 1.

Theorem 1: Given a candidate point $c_{i}^{s}$ and a candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ , if the transition probability of this candidate route is limited to $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) \geq 1$ , then this route and the candidate routes computed after it must not replace the current optimal candidate route.

Proof: The $p_{i - 1}$ candidate points are known to be visited in descending order according to the maximum cumulative probability $mc$ value, so the candidate points $c_{i - 1}^{z}$ computed after the candidate point $c_{i - 1}^{w}$ have $mc (c_{i - 1}^{z}) < mc (c_{i - 1}^{w})$ . From Equation 11 we get $t r_{\min} (c_{i - 1}^{z}, c_{i}^{s}) > t r_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ and, since $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) \geq 1$ , there must be $t r_{\min} (c_{i - 1}^{z}, c_{i}^{s}) \geq 1$ . By Equation 8, the transition probabilities take values less than 1, so, if $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) \geq 1$ , it is not possible to satisfy $tr (c_{i - 1}^{w}, c_{i}^{s}) > t r_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ , and the candidate route must not replace the current optimal candidate route.

If the transition probability is limited to $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) < 1$ , then the transition probability must be further evaluated, and the route is retained only if $tr (c_{i - 1}^{w}, c_{i}^{s}) > t r_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ , as shown in Theorem 2.

Theorem 2: Given a candidate point and a candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ , if the candidate route wants to replace the baseline candidate route so that it is likely to be part of the final matching result, then the transition probability value of $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ must be greater than its transition probability limit $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ .

Proof: The current maximum cumulative probability value $cumu (c_{i - 1}^{t}, c_{i}^{s})$ of the candidate point $c_{i}^{s}$ is known. For a candidate point $c_{i - 1}^{w}$ visited after $c_{i - 1}^{t}$ , if $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ is to replace the baseline candidate route, then its cumulative probability must satisfy $cumu (c_{i - 1}^{w}, c_{i}^{s}) > cumu (c_{i - 1}^{t}, c_{i}^{s})$ . From Equation 9 we get $mc (c_{i - 1}^{w}) \times tr (c_{i - 1}^{w}, c_{i}^{s}) \times pr (c_{i}^{s}) > cumu (c_{i - 1}^{t}, c_{i}^{s})$ , so $tr (c_{i - 1}^{w}, c_{i}^{s}) > \frac{cumu (c_{i - 1}^{t}, c_{i}^{s})}{mc (c_{i - 1}^{w}) \times pr (c_{i}^{s})}$ , and from Equation 11 we get $tr (c_{i - 1}^{w}, c_{i}^{s}) > tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ .

To determine whether the transition probability satisfies $tr (c_{i - 1}^{w}, c_{i}^{s}) > tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ , it is still necessary to compute the transition probability $tr (c_{i - 1}^{w}, c_{i}^{s})$ , which requires a shortest route query. Therefore, for further optimisation, a candidate route distance limit, denoted as $TH (c_{i - 1}^{w}, c_{i}^{s})$ , is derived backward from the known transition probability limit of Equation 11. If the candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ can replace the baseline candidate route, then its shortest route distance must be less than the candidate route distance limit. From Equations 7 and 8, the transition probability is determined by the ratio of the Euclidean distance from $p_{i - 1}$ to $p_{i}$ to the road network distance, so the condition becomes $\frac{1}{\sqrt{2 π σ_{3}^{2}}} \exp (- \frac{{(1 - \frac{dis (p_{i - 1}, p_{i})}{length (c_{i - 1}^{w}, c_{i}^{s})})}^{2}}{2 σ_{3}^{2}}) > tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ . Solving this inequality for the route length, under the assumption that the road network distance is not shorter than the Euclidean distance, yields an upper bound on the route length.

The candidate route distance limit is calculated as follows:

TH (c_{i - 1}^{w}, c_{i}^{s}) = \frac{dis (p_{i - 1}, p_{i})}{1 - σ_{3} \sqrt{2 \ln [\frac{1}{\sqrt{2 π} \times σ_{3} \times tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})}]}}

(12)

where

$t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) < 1$ ,

$dis (p_{i - 1}, p_{i})$ = the Euclidean distance between GPS points $p_{i - 1}$ and $p_{i}$ ,

$t r_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ = the transition probability limit, and

$σ_{3}$ = the standard deviation of the transition probability.
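One way to check a distance limit of this kind is to invert Equation 8 numerically. The Python sketch below solves the inequality of Equation 8 for the route length, assuming that the route is never shorter than the Euclidean distance (that is, $routesim \leq 1$), and then verifies that a route of exactly the limit length reproduces the transition probability limit. The function names, the 300 m distance, and the handling of the two boundary cases are assumptions of this sketch.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)

def transition_probability(gps_dist, route_length, sigma3):
    """Equation 8 evaluated at routesim = gps_dist / route_length."""
    routesim = gps_dist / route_length
    return math.exp(-((1 - routesim) ** 2) / (2 * sigma3 ** 2)) / (SQRT_2PI * sigma3)

def distance_limit(gps_dist, tr_limit, sigma3):
    """Longest route whose Equation 8 value still reaches tr_limit (assumes routesim <= 1)."""
    log_term = math.log(1.0 / (SQRT_2PI * sigma3 * tr_limit))
    if log_term < 0:
        return 0.0          # tr_limit lies above the peak of Equation 8: no route qualifies
    slack = sigma3 * math.sqrt(2 * log_term)
    if slack >= 1:
        return math.inf     # the bound is vacuous: every route of this kind qualifies
    return gps_dist / (1 - slack)

# Hypothetical case: GPS points 300 m apart, limit 0.84, sigma3 = 0.25.
th = distance_limit(300.0, 0.84, 0.25)
at_limit = transition_probability(300.0, th, 0.25)   # equals 0.84 up to rounding
```

Routes longer than `th` fall below the limit and routes between 300 m and `th` stay above it, which is the property that pruning strategy 2 relies on.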

Pruning Strategy 2

If the transition probability limit $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) < 1$ and the candidate route distance $length (c_{i - 1}^{w}, c_{i}^{s}) > TH (c_{i - 1}^{w}, c_{i}^{s})$ , the candidate route is pruned, as shown in Theorem 3.

Theorem 3: Given a candidate point $c_{i}^{s}$ and a candidate route $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ , and that candidate route transition probability limited to $t r_{\min} (c_{i - 1}^{w}, c_{i}^{s}) < 1$ , if that candidate route is to replace the baseline candidate route, then its road network distance $length (c_{i - 1}^{w}, c_{i}^{s})$ must be less than the candidate route distance limit $TH (c_{i - 1}^{w}, c_{i}^{s})$ , otherwise the candidate route must not become a baseline candidate route.

Proof: Theorem 2 shows that $CandiRoute (c_{i - 1}^{w}, c_{i}^{s})$ can only replace the baseline candidate route when $tr (c_{i - 1}^{w}, c_{i}^{s}) > tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ , that is, when $\frac{1}{\sqrt{2 π σ_{3}^{2}}} \exp (- \frac{{(1 - routesim)}^{2}}{2 σ_{3}^{2}}) > tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})$ . Solving for the route length gives

$length (c_{i - 1}^{w}, c_{i}^{s}) < \frac{dis (p_{i - 1}, p_{i})}{1 - σ_{3} \sqrt{2 \ln [\frac{1}{\sqrt{2 π} \times σ_{3} \times tr_{\min} (c_{i - 1}^{w}, c_{i}^{s})}]}}$ , that is,

$length (c_{i - 1}^{w}, c_{i}^{s}) < TH (c_{i - 1}^{w}, c_{i}^{s})$ . Therefore, a candidate route with $length (c_{i - 1}^{w}, c_{i}^{s}) > TH (c_{i - 1}^{w}, c_{i}^{s})$ cannot replace the baseline candidate route.
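The interplay of the two pruning strategies for a single candidate point can be sketched in Python as follows. This is a simplified illustration, not the PFMM implementation: route lengths are supplied in a dictionary instead of being queried with IDijkstra, the distance limit is obtained by inverting Equation 8 under the assumption $routesim \leq 1$, and all numbers are hypothetical. The example uses a hypothetical $σ_{3} = 0.5$ so that Equation 8 stays below 1, the property on which Theorem 1 relies.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)

def tr(gps_dist, length, sigma3):
    """Equation 8 applied to routesim = gps_dist / length (Equation 7)."""
    r = gps_dist / length
    return math.exp(-((1 - r) ** 2) / (2 * sigma3 ** 2)) / (SQRT_2PI * sigma3)

def distance_limit(gps_dist, limit, sigma3):
    """Route length at which Equation 8 equals limit, assuming routesim <= 1."""
    if limit <= 0:
        return math.inf                      # no baseline yet, nothing to beat
    log_term = math.log(1.0 / (SQRT_2PI * sigma3 * limit))
    if log_term < 0:
        return 0.0                           # limit above the peak of Equation 8
    slack = sigma3 * math.sqrt(2 * log_term)
    return math.inf if slack >= 1 else gps_dist / (1 - slack)

def best_predecessor(prev_mc, pr_cur, gps_dist, lengths, sigma3):
    """Return (best predecessor, its cumulative probability, predecessors fully evaluated)."""
    best_prev, best_cumu, evaluated = None, -1.0, []
    for prev in sorted(prev_mc, key=prev_mc.get, reverse=True):   # descending mc
        limit = best_cumu / (pr_cur * prev_mc[prev])              # Equation 11
        if limit >= 1:
            break                                                 # pruning strategy 1
        if lengths[prev] > distance_limit(gps_dist, limit, sigma3):
            continue                                              # pruning strategy 2
        value = prev_mc[prev] * tr(gps_dist, lengths[prev], sigma3) * pr_cur  # Equation 9
        evaluated.append(prev)
        if value > best_cumu:
            best_prev, best_cumu = prev, value
    return best_prev, best_cumu, evaluated

# Hypothetical predecessors with their mc values and route lengths in metres.
prev_mc = {"a": 0.6, "b": 0.5, "c": 0.2}
lengths = {"a": 450.0, "b": 700.0, "c": 310.0}
best_prev, best_cumu, evaluated = best_predecessor(prev_mc, 0.9, 300.0, lengths, 0.5)
print(best_prev, evaluated)  # a ['a']
```

In this run, route b is discarded by the distance limit and route c by the transition probability limit, so only one cumulative probability is computed, and the result agrees with evaluating all three routes.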

Optimal Matching Route Recovery

When the candidate points of the last trajectory point $p_{n}$ have all been calculated, the candidate point with the highest optimal cumulative probability among the candidate points of $p_{n}$ is selected, and the route is traced back from it until the complete matching route is obtained.

Table 2 shows the matching procedure for $Traj : p_{1} \to p_{2} \to p_{3}$ based on limit value pruning. The computational process is described below.

Table 2.

Global Matching Calculation of $p_{1} \to p_{2} \to p_{3}$

Candidate routes	Transition probability limit value	Candidate route distance limit value	Candidate route distance	Transition probability	Cumulative probability
$(c_{1}^{1}, c_{2}^{1})$	$- \infty$	$+ \infty$	320	0.75	0.53
$(c_{1}^{2}, c_{2}^{1})$	0.84	309	> 309	NA	NA
$(c_{1}^{3}, c_{2}^{1})$	1.07	NA	NA	NA	NA
$(c_{1}^{1}, c_{2}^{2})$	$- \infty$	$+ \infty$	300	0.91	0.47
$(c_{1}^{2}, c_{2}^{2})$	0.74	367	> 367	0.75	NA
$(c_{1}^{3}, c_{2}^{2})$	1.29	NA	NA	NA	NA
$(c_{2}^{1}, c_{3}^{1})$	$- \infty$	$+ \infty$	450	0.68	0.31
$(c_{2}^{2}, c_{3}^{1})$	0.77	434	412	0.89	0.36
$(c_{2}^{1}, c_{3}^{2})$	$- \infty$	$+ \infty$	433	0.77	0.28
$(c_{2}^{2}, c_{3}^{2})$	0.85	417	420	0.78	0.35
$(c_{2}^{1}, c_{3}^{3})$	$- \infty$	$+ \infty$	436	0.72	0.27
$(c_{2}^{2}, c_{3}^{3})$	1.02	NA	NA	0.52	0.19

Note: NA = not available.

First, the point $c_{2}^{1}$ with the largest observation probability value among the candidate points of $p_{2}$ is selected, the initial transition probability limit of each candidate point is set to $tr_{\min} = - \infty$ , the point $c_{1}^{1}$ with the largest $mc$ value among the candidate points of $p_{1}$ is selected, and $CandiRoute (c_{1}^{1}, c_{2}^{1})$ is used as the baseline candidate route. The current maximum cumulative probability is $cumu (c_{1}^{1}, c_{2}^{1}) = 0.53$ .

Then, the point $c_{1}^{2}$ with the largest $mc$ value among the remaining candidate points of $p_{1}$ is selected, and $tr_{\min} (c_{1}^{2}, c_{2}^{1}) = 0.84 < 1$ is calculated for $CandiRoute (c_{1}^{2}, c_{2}^{1})$ . The distance limit $TH (c_{1}^{2}, c_{2}^{1}) = 309$ m is then calculated, and the shortest route query gives $length (c_{1}^{2}, c_{2}^{1}) > 309$ m, so the route is pruned according to pruning strategy 2.

Next, the point $c_{1}^{3}$ with the largest $mc$ value among the remaining candidate points of $p_{1}$ is selected, and $tr_{\min} (c_{1}^{3}, c_{2}^{1}) = 1.07 > 1$ is calculated for $CandiRoute (c_{1}^{3}, c_{2}^{1})$ , so the route is pruned according to pruning strategy 1. If there were still unvisited candidate points of $p_{1}$ at this time, they would be pruned as well.

The candidate point $c_{2}^{1}$ is computed, and its optimal candidate route is obtained as $CandiRoute (c_{1}^{1}, c_{2}^{1})$ with optimal cumulative probability $mc (c_{2}^{1}) = 0.53$ .

Next, the point $c_{2}^{2}$ with the largest observation probability value among the remaining candidate points of $p_{2}$ is calculated. $CandiRoute (c_{1}^{2}, c_{2}^{2})$ is pruned according to pruning strategy 2 and $CandiRoute (c_{1}^{3}, c_{2}^{2})$ is pruned according to pruning strategy 1. Finally, the optimal candidate route for $c_{2}^{2}$ is obtained as $CandiRoute (c_{1}^{1}, c_{2}^{2})$ , with the optimal cumulative probability $mc (c_{2}^{2}) = 0.47$ .

At this point, the computation between $p_{1} \to p_{2}$ ends and, similarly, the computation of $p_{2} \to p_{3}$ continues, as shown in Table 2. When the computation is completed at the last point $p_{3}$ of the trajectory, the optimal matching route is recovered: the candidate point $c_{3}^{1}$ has the highest cumulative probability of 0.36, and the optimal candidate point sequence is obtained from this point backward: $\{c_{1}^{1}, c_{2}^{2}, c_{3}^{1}\}$ .
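The decisions described above for the step $p_{1} \to p_{2}$ can be replayed from the corresponding rows of Table 2. The Python sketch below encodes only what the table reports for these six routes (the transition probability limit and whether the queried distance exceeds the distance limit); the labels and the helper function are illustrative.

```python
NEG_INF = float("-inf")

# (candidate route, transition probability limit, distance exceeds the limit?)
rows_p1_p2 = [
    ("c1^1 -> c2^1", NEG_INF, False),   # baseline route of c2^1
    ("c1^2 -> c2^1", 0.84, True),       # Table 2 reports a distance > 309
    ("c1^3 -> c2^1", 1.07, None),       # distance never queried
    ("c1^1 -> c2^2", NEG_INF, False),   # baseline route of c2^2
    ("c1^2 -> c2^2", 0.74, True),       # Table 2 reports a distance > 367
    ("c1^3 -> c2^2", 1.29, None),       # distance never queried
]

def decide(limit, exceeds):
    """Apply the limit test first (strategy 1), then the distance test (strategy 2)."""
    if limit >= 1:
        return "strategy 1"
    if exceeds:
        return "strategy 2"
    return "evaluated"

decisions = {route: decide(limit, exceeds) for route, limit, exceeds in rows_p1_p2}
pruned = [route for route, d in decisions.items() if d != "evaluated"]
print(len(pruned))  # 4
```

Four of the six candidate routes between $p_{1}$ and $p_{2}$ are pruned, two by each strategy, and only the two baseline routes need a cumulative probability.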

IDijkstra Algorithm

When the starting and ending points are near to each other, according to dwell point detection strategy, in data preprocessing, these points merge into a single stopping point. Furthermore, when discarding nodes, according to $Dc$ , if the Euclidean distance between $p_{i}$ and $p_{i + 1}$ is less than or equal to $Dc$ , then $p_{i}$ and $p_{i + 1}$ are considered the same point, and physical connectivity can be maintained. For some outliers, the velocity threshold $Vc$ is set according to the outlier filtering strategy. If the velocity of point $p_{i}$ is greater than the velocity threshold $Vc$ , point $p_{i}$ is discarded. The discarded outliers will not affect the connectivity between road segments. The impact of $Dc$ and velocity threshold $Vc$ changes on the algorithms will be explained in detail in the experiment.
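The two preprocessing rules can be sketched in Python as follows, using $Dc$ = 10 m and $Vc$ = 50 m/s from the parameter settings. The point format, the coordinates, and the choice to compare each point with the last retained point (rather than with its immediate neighbor) are simplifying assumptions of this sketch.

```python
import math

def preprocess(points, dc=10.0, vc=50.0):
    """points: time-ordered list of (x, y, speed) in metres and m/s."""
    kept = []
    for x, y, speed in points:
        if speed > vc:
            continue                      # outlier filtering: implausible velocity
        if kept and math.dist((x, y), kept[-1][:2]) <= dc:
            continue                      # dwell point: merged with the previous retained point
        kept.append((x, y, speed))
    return kept

# Hypothetical trajectory: two dwell points near the start and one velocity outlier.
points = [(0.0, 0.0, 5.0), (4.0, 3.0, 0.0), (6.0, 0.0, 0.0),
          (200.0, 0.0, 12.0), (260.0, 0.0, 80.0), (400.0, 0.0, 15.0)]
kept = preprocess(points)
print(len(kept))  # 3
```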

This section describes the route-finding algorithm used, the IDijkstra algorithm, which reduces the search effort for shortest routes on the road network. Algorithm 4 shows the algorithm pseudocode. The basic steps of the IDijkstra algorithm are described as follows.

Algorithm 4: IDijkstra

Input: $v_{start}, v_{end}, list_{i - 1}, list_{i}, G (V, E), dist [v], TH$
Output: $RouteSet$

$RouteSet \leftarrow Ø$ , $pre [] \leftarrow null$
$G_{0} \leftarrow$ Build sub road network
$O \leftarrow \{v_{start}\}$ , $U \leftarrow G_{0}$
while $v_{end} \notin O$ do
    $v_{m} \leftarrow \arg\min_{v \in U} dist [v]$
    if $dist [v_{m}] \geq TH$ then
        return
    end
    $O . add (v_{m})$
    $U . delete (v_{m})$
    for each $v_{w}$ in $U$ do
        if $dist [v_{w}] > (dist [v_{m}] + adjacent [v_{m}] [v_{w}])$ then
            $dist [v_{w}] \leftarrow dist [v_{m}] + adjacent [v_{m}] [v_{w}]$
            $pre [v_{w}] \leftarrow v_{m}$
        end
    end
end
for each $v_{x}$ in $list_{i}$ do
    $Route \leftarrow Ø$
    $Route . add (v_{x})$
    $v \leftarrow pre [v_{x}]$
    while $v \neq null$ do
        $Route . add (v)$
        if $v \in list_{i - 1}$ then
            $RouteSet . add (Route)$
            break
        end
        $v \leftarrow pre [v]$
    end
end
return $RouteSet$

Construct the Subroad Network

The subroad network is constructed around the target start and end points to limit the scope of subsequent searches. The method is as follows:

1) Denote the Euclidean distance between the target start point $v_{start}$ and end point $v_{end}$ as $a$ , search the saved road network $G$ for the road nodes that lie within distance $a$ of the start and end points, and store them in the set $G_{0}$ .

2) To search more potential road sections, continue to search the first-adjacent and second-adjacent road nodes of the nodes in $G_{0}$ . As shown in Figure 7, $v_{0}$ is a node in $G_{0}$ whose first-adjacent nodes are $G_{1} = \{v_{2}, v_{4}, v_{5}, v_{7}\}$ , that is, the nodes directly connected to $v_{0}$ by a road section; $G_{2} = \{v_{1}, v_{3}, v_{6}, v_{8}\}$ are the second-adjacent nodes of $v_{0}$ , that is, the nodes first-adjacent to at least one node in the set $G_{1}$ .

3) $G_{0} = G_{1}$ $\cup$ $G_{2}$ , $G_{0}$ is the subroad network.

Figure 7.

A sample of $v_{0}$ sub road network.
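The three construction steps can be sketched in Python as below. The data layout (a coordinate dictionary and an adjacency dictionary) and the reading of step 1 as "within distance $a$ of the start point or of the end point" are assumptions of this sketch. The example network is a hypothetical straight road with nodes every 100 m.

```python
import math

def build_subnetwork(coords, adjacency, v_start, v_end):
    """coords: {node: (x, y)}; adjacency: {node: {neighbor: edge length}}."""
    a = math.dist(coords[v_start], coords[v_end])
    # Step 1: nodes within distance a of the start point or the end point.
    g0 = {v for v, xy in coords.items()
          if math.dist(xy, coords[v_start]) <= a or math.dist(xy, coords[v_end]) <= a}
    # Step 2: first-adjacent and second-adjacent nodes.
    g1 = {w for v in g0 for w in adjacency.get(v, {})}
    g2 = {w for v in g1 for w in adjacency.get(v, {})}
    # Step 3: in an undirected network, g1 | g2 already contains every connected node of g0.
    return g1 | g2

# Hypothetical network: nodes 0..8 on a line, 100 m apart, consecutive nodes connected.
coords = {i: (100.0 * i, 0.0) for i in range(9)}
adjacency = {i: {} for i in range(9)}
for i in range(8):
    adjacency[i][i + 1] = 100.0
    adjacency[i + 1][i] = 100.0

sub = build_subnetwork(coords, adjacency, v_start=3, v_end=4)
print(sorted(sub))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Node 8 lies more than two hops beyond the initial set and is therefore left out of the search space.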

Shortest Route Search

Based on the original Dijkstra algorithm, which starts the computation from the target starting point, the route-finding process consists of the following three parts:

1) Introducing the sets $O$ and $U$ ( $O$ records the vertices for which the shortest route has been found, while $U$ records the vertices that have not yet been visited), and defining the $dist []$ array to store the shortest route distance of each vertex.

2) Selecting the vertex $v_{m}$ with the minimum distance in $U$ . If $dist [v_{m}] \geq TH$ , stop the search; otherwise, move $v_{m}$ from $U$ to $O$ . For any $v_{w} \in U$ , if $dist [v_{w}] > dist [v_{m}] + adjacent [v_{m}] [v_{w}]$ , then set $dist [v_{w}] = dist [v_{m}] + adjacent [v_{m}] [v_{w}]$ and the precursor node $pre [v_{w}] = v_{m}$ , where $adjacent$ is the road network adjacency matrix.

3) Looping step 2 until the destination end point is added to the visited set; defining sets $list_{i - 1}$ and $list_{i}$ to store the unvisited candidate points of $p_{i - 1}$ and $p_{i}$ , respectively; backtracking the $pre$ array; marking the shortest routes with the points in the $list_{i - 1}$ and $list_{i}$ sets as the start and end points, respectively; and returning the shortest route set.
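The early-stopping idea of the three parts above can be sketched in Python as a distance-bounded Dijkstra search. This is an illustration under stated assumptions rather than the IDijkstra implementation: it uses a binary heap from the standard `heapq` module instead of the sets $O$ and $U$ with an adjacency matrix, it searches from a single source to several targets, and the small network is hypothetical.

```python
import heapq
import math

def bounded_dijkstra(adjacency, source, targets, th):
    """Shortest routes from source to targets; stops once the nearest open vertex is at distance >= th."""
    dist = {source: 0.0}
    pre = {source: None}
    settled = set()
    heap = [(0.0, source)]
    remaining = set(targets)
    while heap and remaining:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue                       # stale heap entry
        if d >= th:
            break                          # every remaining route would exceed the limit
        settled.add(v)
        remaining.discard(v)
        for w, length in adjacency.get(v, {}).items():
            nd = d + length
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                pre[w] = v
                heapq.heappush(heap, (nd, w))
    routes = {}
    for t in targets:
        if t in settled:                   # backtrack the predecessor array
            route, v = [], t
            while v is not None:
                route.append(v)
                v = pre[v]
            routes[t] = (dist[t], route[::-1])
    return routes

# Hypothetical undirected network with edge lengths in metres.
edges = [("s", "a", 100), ("a", "t1", 100), ("s", "b", 150), ("a", "b", 50), ("b", "t2", 400)]
adjacency = {}
for u, v, w in edges:
    adjacency.setdefault(u, {})[v] = w
    adjacency.setdefault(v, {})[u] = w

routes = bounded_dijkstra(adjacency, "s", ["t1", "t2"], th=300)
print(routes)  # {'t1': (200.0, ['s', 'a', 't1'])}
```

The route to t2 has length 550, so the search is abandoned before t2 is settled and no route is returned for it, which corresponds to a candidate route removed by the distance limit.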

Experiments

In this section, the PFMM algorithm in this paper is implemented in Java language and tested against the STD, IIVMM, and turning-point-based (TPB) map-matching algorithms. The experimental environment is AMD Ryzen 7 5700U CPU @ 1.80 GHz; 16 GB RAM; 1TB HDD, and Windows 10 OS.

Dataset Description

The dataset employed for this experiment consists of real-world road networks and vehicle trajectories. We utilized Shanghai, China, road network data, obtained from OpenStreetMap, which encompasses 215,070 road segments and 213,734 road nodes ( 32 ). The GPS data were sourced from the Shanghai taxi trajectory dataset provided by the Smart City Research Group at the Hong Kong University of Science and Technology. Each GPS point within this dataset carries attributes such as timestamp, longitude, latitude, direction, speed, and occupancy status. To evaluate the map-matching algorithm’s performance, we extracted 10 full-day taxi trajectories from the dataset. These trajectories collectively consist of 14,443 GPS points. Table 3 provides a summary of the trajectory information, detailing the number of GPS points in each trajectory and their corresponding travel times. A graphical representation of selected trajectory points is displayed in Figure 8.

Table 3.

Trajectory Information

ID	Point count	Driving time	Right (%)	Time (s)
Traj1	1,052	9 h 36 min	93	185
Traj2	1,147	12 h 25 min	89	187
Traj3	1,322	17 h 47 min	91.3	190
Traj4	1,334	19 h 26 min	88	192
Traj5	1,383	14 h 34 min	92	196
Traj6	1,404	15 h 12 min	91.1	200
Traj7	1,492	12 h 48 min	94	205
Traj8	1,614	20 h 16 min	90	210
Traj9	1,760	18 h 42 min	89	213
Traj10	1,925	21 h 37 min	91	220

Figure 8.

Point plot of partial global positioning system trajectory.

To obtain reference routes, the real trajectories were matched using the ST algorithm and then refined manually. Experiments indicated that trajectory length has a negligible impact on matching accuracy. To illustrate this, default parameter values were used, and the matching accuracy (Right) and processing time (Time) for the 10 trajectories are presented in Table 3. Specific GPS trajectory points were selected for map matching, with the results depicted in Figure 9. In the figure, the red trajectory denotes the unprocessed GPS points, the blue trajectory represents the GPS roadmap after preprocessing, and the green trajectory depicts the actual route. The figure shows that the algorithm-derived trajectories closely align with the actual trajectories, with minimal errors.

Figure 9.

Example plot of trajectory matching.

Parameter Settings

In this paper, the sampling interval of GPS points is within 10–50 s. In the trajectory preprocessing, the velocity threshold $vc$ = 50 m/s is set in the outlier filtering stage, $Dc$ = 15 m is set in the dwell point detection stage, and $Tc$ is consistent with the trajectory sampling interval. In the candidate set extraction stage, the search radius of GPS points is set to $r$ = 200 m and only the top $k$ candidate points are retained. The effect of parameter changes on the experimental results is described in detail in later sections. During the candidate route probability analysis, the distance analysis function follows a normal distribution with $μ = 0$ and $σ_{1} = 10 m$, the direction analysis function follows a normal distribution with $μ = 0$ and $σ_{2} = 30°$, and the transfer probability follows a normal distribution with $μ = 0$ and $σ_{3} = 0.25$, according to the GPS positioning device error. Table 4 shows the values of the parameters used in the experiments.

Table 4.

The Values of the Parameters Used in the Experiments

Parameter	Description	Value
r	Search radius	200 m
k	Number of candidate points	5
vc	Speed threshold	50 m/s
Dc	Distance threshold	15 m
$σ_{1}$, $σ_{2}$, $σ_{3}$	Standard deviations (distance, direction, transfer probability)	10 m, 30°, 0.25
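As an illustration of how the zero-mean normal distributions in Table 4 weight a candidate, the following Python sketch evaluates the normal density for a distance offset and a heading difference. The function names and the way the heading difference is wrapped are assumptions made for this example; the sketch does not reproduce how the paper combines the individual terms.

```python
import math

# Standard deviations taken from Table 4.
SIGMA_DIST_M = 10.0    # distance analysis, meters
SIGMA_DIR_DEG = 30.0   # direction analysis, degrees
SIGMA_TRANS = 0.25     # transfer probability


def normal_pdf(x: float, sigma: float, mu: float = 0.0) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def heading_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two headings, in [0, 180] degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


# A candidate 5 m from the GPS point scores higher than one 20 m away.
near = normal_pdf(5.0, SIGMA_DIST_M)
far = normal_pdf(20.0, SIGMA_DIST_M)

# Headings of 350 and 10 degrees differ by 20 degrees, not 340.
direction_score = normal_pdf(heading_difference(350.0, 10.0), SIGMA_DIR_DEG)
```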

In the experiments conducted for this paper, the PFMM algorithm is benchmarked against STD, IIVMM, TPB, and L2MM algorithms (1, 2, 24, 26). For the STD, IIVMM, and L2MM algorithms, the observation probability parameters are initialized to their default settings as prescribed in this work. Meanwhile, the remaining parameters for these three algorithms are fine-tuned in accordance with the specifications detailed in the pertinent literature.

Evaluation Criteria Setting

This section describes three criteria for measuring the performance of the proposed algorithm and the comparison algorithms, and a fourth criterion that applies only to the PFMM algorithm.

1) Percentage of correctly matched sections: $Right$

$$Right = \frac{\text{Number of correct match points}}{\text{Number of points to be matched}} \times 100\%$$

(13)

2) Running time of the algorithm on the same platform: Time

3) The length of the matching result route: Glength

4) The number of prunings during the matching process of the PFMM algorithm: pruning

The performance of the algorithms can be measured from several aspects through the above four criteria. The first criterion, $Right$, is determined by the number of correctly matched road segments, and its maximum value is 100%. The second criterion, $Time$, represents the total time for trajectory matching to complete on the same platform. The third criterion, $Glength$, represents the length of the final matched route, which reflects the degree of detour of the matching result. The fourth criterion, $pruning$, applies only to the PFMM algorithm and represents the total number of candidate paths pruned during matching by pruning strategy 1 and pruning strategy 2.
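The first and third criteria can be computed as in the following Python sketch. The data layout (one matched segment identifier per GPS point and a lookup table of segment lengths) and the function names are assumptions made for illustration only.

```python
from typing import Dict, Sequence


def right_percentage(matched: Sequence[str], truth: Sequence[str]) -> float:
    """Share of points whose matched segment equals the reference, in %."""
    if len(matched) != len(truth):
        raise ValueError("matched and truth must have the same length")
    if not truth:
        return 0.0
    correct = sum(1 for m, t in zip(matched, truth) if m == t)
    return 100.0 * correct / len(truth)


def route_length(segment_ids: Sequence[str], length_m: Dict[str, float]) -> float:
    """Glength: total length in meters of the segments on the matched route."""
    return sum(length_m[s] for s in segment_ids)


# Three of four points are matched to the reference segment.
accuracy = right_percentage(["a", "b", "c", "d"], ["a", "b", "x", "d"])
glength = route_length(["a", "b"], {"a": 120.0, "b": 80.5})
```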

Impact of Changing the Speed Threshold (vc)

In this section, the performance of the PFMM, STD, IIVMM, TPB, and L2MM algorithms under different values of vc in the trajectory preprocessing outlier filtering stage is compared (1, 2, 24, 26). In general, when the vc value is too large, it is difficult to filter outliers, while, when the vc value is too small, some normal points will be filtered, affecting the matching accuracy.
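The role of the speed threshold can be seen in a small Python sketch of outlier filtering. This is one plausible reading rather than the paper's exact procedure: it assumes points are `(timestamp_s, latitude, longitude)` tuples, measures speed against the last retained point with the haversine distance, and uses hypothetical function names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (timestamp in s, latitude, longitude)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def filter_speed_outliers(points: List[Point], vc: float = 50.0) -> List[Point]:
    """Drop points whose implied speed from the last kept point exceeds vc."""
    kept: List[Point] = []
    for t, lat, lon in points:
        if kept:
            t0, lat0, lon0 = kept[-1]
            dt = t - t0
            if dt <= 0:
                continue  # duplicate or out-of-order timestamp
            if haversine_m(lat0, lon0, lat, lon) / dt > vc:
                continue  # implied speed above the threshold: outlier
        kept.append((t, lat, lon))
    return kept
```

With a very large `vc` nothing is removed, and with a very small `vc` ordinary movement is discarded as well, which matches the trade-off described above.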

Figure 10a compares the accuracy of the algorithms under different vc values; on average, the accuracy of the PFMM algorithm is about 3.7% higher than that of the STD, IIVMM, and TPB algorithms (1, 2, 24). However, relative to the L2MM algorithm, the accuracy of the PFMM algorithm differs by about 15%. Figure 10b shows the influence of different vc values on the matching time. On average, the running time of the PFMM algorithm is reduced by about 34.1% compared with the comparison algorithms. Figure 10c shows the influence of different vc values on the matching path length; the matching result path of the proposed algorithm is shorter, and the degree of detour is reduced by about 2.8%. Figure 10d shows the pruning of the PFMM algorithm, where the number of prunings increases as the value of vc increases. According to the experimental results, the algorithm performs better in all aspects when the vc value is 50 m/s, so the default vc value is 50 m/s in the following experiments.

Figure 10.

The impact of the speed threshold (vc) value on the algorithms: (a) Right; (b) Time; (c) Glength; and (d) pruning.

Impact of Changing the Distance Threshold (Dc)

In this section, the performance of the PFMM, STD, IIVMM, TPB, and L2MM algorithms under different values of $Dc$ in the stop point detection stage of trajectory preprocessing is compared (1, 2, 24, 26). In general, when the $Dc$ value is too large, normal trajectory points may be clustered and the accuracy will decrease, whereas, when the $Dc$ value is too small, some stop points will not be clustered and the amount of calculation becomes larger.
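A simplified Python sketch shows how the distance threshold controls the clustering of stop points. It assumes points are already projected to planar coordinates in meters, collapses every point that lies within `dc` of the last retained point, and leaves out the time threshold $Tc$; the function name and data layout are hypothetical.

```python
import math
from typing import List, Tuple

PlanarPoint = Tuple[float, float, float]  # (timestamp in s, x in m, y in m)


def collapse_dwell_points(points: List[PlanarPoint], dc: float = 15.0) -> List[PlanarPoint]:
    """Keep one representative point for each run of points within dc meters."""
    kept: List[PlanarPoint] = []
    for t, x, y in points:
        if kept:
            _, x0, y0 = kept[-1]
            if math.hypot(x - x0, y - y0) <= dc:
                continue  # still inside the current stop cluster
        kept.append((t, x, y))
    return kept


track = [(0, 0.0, 0.0), (10, 5.0, 0.0), (20, 8.0, 3.0), (30, 100.0, 0.0), (40, 104.0, 2.0)]
reduced = collapse_dwell_points(track, dc=15.0)
```

In this toy track, raising `dc` to 150 m would merge the two stops into one point, while lowering it to 1 m would keep all five points and leave more work for the matching stage.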

Figure 11a shows the influence of different $Dc$ values on the accuracy of the algorithms; on average, the accuracy of the PFMM algorithm is improved by about 2.1% compared with that of the STD, IIVMM, and TPB algorithms. However, relative to the L2MM algorithm, the accuracy of the PFMM algorithm differs by about 13%. Figure 11b shows that the running time of the PFMM algorithm is reduced by about 35.2% compared with the other algorithms. Figure 11c shows the matching path length of the algorithms; the matching path length of the PFMM algorithm is approximately 2.2% shorter than that of the other algorithms. Figure 11d shows that the number of prunings of the PFMM algorithm decreases as the $Dc$ value increases. According to the experimental results, the algorithm performs better when the $Dc$ value is 15 m, so the default value of $Dc$ in this paper is set to 15 m.

Figure 11.

The impact of the distance threshold (Dc) value on the algorithms: (a) Right; (b) Time; (c) Glength; and (d) pruning.

Impact of Number of Candidate Points (k)

This section compares the performance of the PFMM, IIVMM, STD, and L2MM algorithms for different values of the number of candidate points ($k$). The TPB algorithm is not included in the comparison because it is based on the matching of trajectory segments. Figure 12a compares the correct matching rates of these algorithms under different $k$ values. On average, the algorithm presented in this paper is 3.4% more accurate than the comparison algorithms. Within a certain range, the accuracy increases as $k$ grows, because the algorithms consider not only distance information but also road topology information, so additional candidate points improve the match. However, accuracy decreases when $k$ is excessively large; for instance, candidate points from parallel road sections that are farther away may receive higher similarity scores and distort the matching result. In Figure 12b, the matching time increases with $k$ because the amount of shortest route computation grows with the number of candidate points. The increase in Time for the PFMM algorithm is significantly smaller than that of the comparison algorithms because candidate points are pruned based on constraint values. Figure 12c shows the variation in the route length of the matching results; the PFMM algorithm reduces the route length by an average of 3.5% compared with the comparison algorithms. Figure 12d shows the pruning of the PFMM algorithm: the larger the value of $k$, the greater the number of prunings. The experimental results show that the algorithm performs better when $k$ is 5, so the default value of $k$ in this paper is 5.

Figure 12.

The impact of the number of candidate points (k) on the algorithms: (a) Right; (b) Time; (c) Glength; and (d) pruning.

Impact of Candidate Search Radius (r)

This section describes the effect of changing the candidate search radius $r$ on the matching time of the PFMM, IIVMM, and STD algorithms. As TPB is a trajectory-segment-based matching algorithm, it has no candidate point search process and is therefore not included in the comparison experiment. As can be seen from Figure 13, the Time value increases with the $r$ value because, as $r$ increases, the number of candidate segments within the range becomes larger, which increases the computation. In the experiments in this paper, the $r$ value is set to 200 m. The top $k$ candidate points are searched within this radius; if fewer than $k$ candidate points lie within the current $r$, the $r$ value is increased and the search continues until $k$ candidate points are found.
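The radius-expansion rule described above can be sketched in Python as follows. The sketch ranks candidates by distance only, whereas the paper selects the top $k$ after its secondary calculation; the growth step `step`, the cap `r_max`, and the function name are assumptions added so that the loop always terminates.

```python
from typing import Dict, List, Tuple


def select_candidates(
    distances: Dict[str, float],
    k: int = 5,
    r: float = 200.0,
    step: float = 50.0,
    r_max: float = 1000.0,
) -> Tuple[List[str], float]:
    """Return up to k nearest segment ids and the radius that was used.

    distances maps a segment id to the distance in meters between the
    GPS point and its projection on that segment.
    """
    radius = r
    while True:
        within = sorted((d, s) for s, d in distances.items() if d <= radius)
        if len(within) >= k or radius >= r_max:
            return [s for _, s in within[:k]], radius
        radius += step  # too few candidates: widen the search


offsets = {"a": 20.0, "b": 150.0, "c": 230.0, "d": 260.0, "e": 400.0}
chosen, used_radius = select_candidates(offsets, k=3, r=200.0)
```

Here only two segments fall within 200 m, so the radius grows once to 250 m before three candidates are available.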

Figure 13.

The impact of search radius (r) value on algorithms.

Algorithm Performance at Different Sampling Frequencies

In this section, the trajectory dataset is resampled and the algorithm performance of the PFMM, IIVMM, TPB, STD, and L2MM algorithms is compared for trajectories with different sampling frequencies.
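One simple way to resample a trajectory to a coarser interval is sketched below in Python. The paper does not state its resampling procedure, so this is only an assumed approach: a point is kept when at least `interval_s` seconds have passed since the last kept point, and the function name is hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp in s, latitude, longitude)


def resample(points: List[Sample], interval_s: float) -> List[Sample]:
    """Thin a time-ordered trajectory to roughly one point per interval_s."""
    kept: List[Sample] = []
    for p in points:
        if not kept or p[0] - kept[-1][0] >= interval_s:
            kept.append(p)
    return kept


# Eleven points sampled every 10 s, thinned to a 30 s interval.
dense = [(10.0 * i, 31.23 + 0.0005 * i, 121.47) for i in range(11)]
sparse = resample(dense, 30.0)
```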

Figure 14a compares the Right values of the PFMM algorithm and the comparison algorithms for trajectories with different sampling frequencies. The PFMM algorithm has a 2.2% higher Right than the other existing algorithms, and the difference between the Right of the PFMM algorithm and that of the IIVMM algorithm is smaller. The PFMM algorithm performs better at low sampling rates, and its performance in complex road conditions is improved by the inclusion of pre-selected point analysis. The PFMM algorithm also reduces the cost of shortest route computation by pruning with a bounded value rule: as shown in Figure 14b, its running time is reduced by 29.6% on average, and the matching efficiency is significantly improved. Figure 14c shows the variation of the matching route length at different sampling frequencies. When the sampling interval is too large, the algorithm directly queries the shortest route between two points and considers fewer trajectory points; on average, the matching route length of the PFMM algorithm is reduced by 4.6% compared with the other algorithms. Figure 14d shows the change in the number of prunings for the PFMM algorithm: as the sampling interval increases, there are fewer trajectory points and fewer prunings.

Figure 14.

The impact of the sampling interval on the algorithms: (a) Right; (b) Time; (c) Glength; and (d) pruning.

Conclusion

This paper investigates the map-matching problem based on GPS trajectories. Compared with existing work, it provides a faster and more accurate map-matching method, proposing a secondary-computation candidate point extraction strategy and a fast matching strategy based on constraint value pruning to address the accuracy problems and the routing computation bottleneck of existing map-matching algorithms. The algorithm makes use of road intermediate point information and considers more factors in the candidate point extraction stage to improve the matching accuracy; meanwhile, two pruning strategies and the IDijkstra route-finding algorithm are proposed to reduce unnecessary routing computations and thus improve the matching efficiency. In addition, the initial trajectory is processed by outlier filtering and dwell point detection, which reduce the number of input GPS points and lower the computational cost, providing a more efficient and accurate solution for map-matching applications. This helps to improve the efficiency and accuracy of specific tasks in the Geographic Information System (GIS) field, such as navigation processing and traffic management. The map-matching algorithm proposed in this paper is suitable for offline scenarios; in subsequent work, a real-time matching prediction model can be added to make it applicable to real-time scenarios and reduce latency. Through this article, we provide readers with a novel approach to solving the map-matching problem. In future work, we will extend this method to big data environments, using parallel and distributed methods to solve the map-matching problem on large-scale data and to cope with growing traffic data. The code of our model is available in our GitHub repository (https://github.com/bugerror404/mapmatch.git).

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: Bai Mei, Niu Yujing; data collection: Li Chunye, Wang Xite; analysis and interpretation of results: Ma Qian, Li Guanyu; draft manuscript preparation: Bai Mei, Niu Yujing. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Natural Science Foundation of China (Grant numbers 62002039, 61702072, 61602076, and 61976032).

ORCID iD

Li Chunye

References

1. Hsueh Y.-L., Chian. Map Matching for Low-Sampling-Rate GPS Trajectories by Exploring Real-Time Moving Directions. Information Sciences, Vol. 433–434, 2018, pp. 55–69.

2. Shenglong, Juan, Houpan. IIVMM: An Improved Interactive Voting-Based Map Matching Algorithm for Low-Sampling-Rate GPS Trajectories. Computer Science, Vol. 46, No. 9, 2019, pp. 325–332.

3. Tanaka, Hirano, Nobuta, Itoh, Tsunoda. Navigation System with Map-Matching Method. SAE Technical Paper 900471, Society of Automotive Engineers, Detroit, 1990, pp. 45–50.

4. Bernstein, Kornhauser. An Introduction to Map Matching for Personal Navigation Assistants. Geometric Distributions, Vol. 122, 1996, pp. 1082–1083.

5. White C. E., Bernstein, Kornhauser A. L. Some Map Matching Algorithms for Personal Navigation Assistants. Transportation Research Part C: Emerging Technologies, Vol. 8, No. 1–6, 2000, pp. 91–108.

6. Taylor, Blewitt, Steup, Corbett, Car. Road Reduction Filtering for GPS-GIS Navigation. Transactions in GIS, Vol. 5, No. 2, 2001, pp. 193–207.

7. Alt, Efrat, Rote, Wenk. Matching Planar Maps. Journal of Algorithms, Vol. 49, No. 2, 2003, pp. 262–283.

8. Greenfeld J. S. Matching GPS Observations to Locations on a Digital Map. Presented at 81st Annual Meeting of the Transportation Research Board, Washington, D.C., 2002.

9. Quddus M. A., Ochieng W. Y., Zhao, Noland R. B. A General Map Matching Algorithm for Transport Telematics Applications. GPS Solutions, Vol. 7, No. 3, 2003, pp. 157–167.

10. Cui, Bian, Wang. Hidden Markov Map Matching Based on Trajectory Segmentation with Heading Homogeneity. GeoInformatica, Vol. 25, 2021, pp. 179–206.

11. Obradovic, Lenz, Schupfner. Fusion of Sensor Data in Siemens Car Navigation System. IEEE Transactions on Vehicular Technology, Vol. 56, 2007, pp. 43–50.

12. Taguchi, Koide, Yoshimura. Online Map Matching with Route Prediction. IEEE Transactions on Intelligent Transportation Systems, Vol. 20, No. 1, 2019, pp. 338–347.

13. Quddus M. A., Noland R. B., Ochieng W. Y. A High Accuracy Fuzzy Logic Based Map Matching Algorithm for Road Transport. Journal of Intelligent Transportation Systems, Vol. 10, 2006, pp. 103–115.

14. Liu, Tan C. W., Bao. Development and Application of an Enhanced Kalman Filter and Global Positioning System Error-Correction Approach for Improved Map-Matching. Journal of Intelligent Transportation Systems, Vol. 14, 2010, pp. 27–36.

15. Kim. Adaptive Fuzzy-Network Based C-Measure Map-Matching Algorithm for Car Navigation System. IEEE Transactions on Industrial Electronics, Vol. 48, No. 2, 2001, pp. 432–440.

16. Chen. A Adaptive Map Matching Algorithm Based on Fuzzy-Neural-Network for Vehicle Navigation System. Proc. 7th World Congress on Intelligent Control and Automation (WCICA), Chongqing, June 25–27, 2008, IEEE, New York, pp. 4448–4452.

17. Haibin, Jiansheng, Chaozhen. A Integrated Map Matching Algorithm Based on Fuzzy Theory for Vehicle Navigation System. Proc. International Conference on Computational Intelligence and Security (ICCIAS), Guangzhou, China, November 3–6, 2006, IEEE, New York, pp. 916–919.

18. Pyo J.-S., Shin D.-H., Sung T.-K. Development of a Map Matching Method Using the Multiple Hypothesis Technique. Proc. IEEE Intelligent Transportation Systems, Oakland, CA, August 25–29, 2001, IEEE, New York, pp. 23–27.

19. Wylie, Zhu. Following a Curve with the Discrete Fréchet Distance. Theoretical Computer Science, Vol. 556, 2014, pp. 34–44.

20. Hunter, Herring, Abbeel, Bayen A. M. The Path Inference Filter: Model-Based Low-Latency Map Matching of Probe Vehicle Data. IEEE Transactions on Intelligent Transportation Systems, Vol. 15, 2011, pp. 507–529.

21. Liu. A ST-CRF Map-Matching Method for Low-Frequency Floating Car Data. IEEE Transactions on Intelligent Transportation Systems, Vol. 18, 2017, pp. 1241–1254.

22. Zhou. Map Matching Based on Conditional Random Fields and Route Preference Mining for Uncertain Trajectories. Mathematical Problems in Engineering, Vol. 2015, 2015, pp. 1–13.

23. Lou, Zhang, Zheng, Xie, Wang, Huang. Map-Matching for Low-Sampling-Rate GPS Trajectories. Proc. 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Washington, SA, November 4–6, 2009, ACM, New York, pp. 352–361.

24. Zhang, Dong, Guo. A Turning Point-Based Offline Map Matching Algorithm for Urban Road Networks. Information Sciences, Vol. 565, No. 2, 2021, pp. 32–45.

25. Feng, Zhao, Xia, Zhang, Jin. DeepMM: Deep Learning Based Map Matching with Data Augmentation. IEEE Transactions on Mobile Computing, Vol. 21, No. 7, 2020, pp. 2372–2384.

26. Jiang, Chen C.-X., Chen. L2MM: Learning to Map Matching with Deep Models for Low-Quality GPS Trajectory Data. ACM Transactions on Knowledge Discovery from Data, Vol. 17, No. 3, 2023, pp. 1–25.

27. Liu, Luo, Huang, Zou, Wang, Liu. GraphMM: Graph-Based Vehicular Map Matching by Leveraging Trajectory and Road Correlations. IEEE Transactions on Knowledge and Data Engineering, Vol. 36, No. 1, 2023, pp. 184–198.

28. Hashemi, Karimi H. A. A Machine Learning Approach to Improve the Accuracy of GPS-Based Map-Matching Algorithms (Invited Thesis). Proc. IEEE 17th International Conference on Information Reuse and Integration (IRI), Pittsburgh, PA, July 28–30, 2016, IEEE, New York, pp. 77–86.

29. Shen, Zhao, Zou. DMM: Fast Map Matching for Cellular Data. Proc. 26th Annual International Conference on Mobile Computing and Networking, London, September 21–25, 2020, ACM, New York, pp. 1–14.

30. Hao X. U., Liu, Tan C. W., Bao. Development and Application of an Enhanced Kalman Filter and Global Positioning System Error-Correction Approach for Improved Map-Matching. Journal of Intelligent Transportation Systems, Vol. 14, 2010, pp. 27–36.

31. Jin, Kim, Yeo, Choi. Transformer-Based Map Matching Model with Limited Ground-Truth Data Using Transfer-Learning Approach. arXiv, abs/2108.00439, 2021.

32. Open Street Map. OpenStreetMap. 2007. https://www.openstreetmap.org.