Abstract
We proposed an estimator for traffic volumes at signalized intersections using only sparse trajectory data. Each pair of consecutive observed trajectories defined a statistical event, from which traffic volume was inferred. The interaction between trajectory stop distances, arrival speeds, and the signal plan defined different classes of statistical events, with distinct likelihood expressions. Unlike recent approaches found in the literature, our method addressed oversaturation and residual queues. We also proposed a method to estimate the stopbar location, which is crucial for properly estimating stop distances and queue lengths. Interarrival times were assumed to follow a negative exponential distribution, and the method was compatible with any kind of control. The signal plan was assumed to be known, but an estimated signal plan could also be used. The estimation was formulated as a maximum marginal likelihood problem. We showed the problem to be globally concave and, thus, optimally solvable by standard gradient-based methods. The estimator was first validated by an event-based Lighthill–Whitham–Richards simulation, suppressing any measurement errors from trajectories and uncertainty from driver behavior. The estimator was found to consistently show bias below 10% in low-penetration-rate situations, when the
Traffic volume is a fundamental variable in traffic signal control. Typically, volume counts are obtained from loop detectors placed in fixed locations on the road. More recently, vehicle counts and speeds can also be obtained from cameras and radar. However, physical detection is still expensive, and measurements are available only where such equipment is installed. Therefore, full-network coverage is generally not available.
Another widely available data source is trajectory data. These data readily provide information about partial queue lengths and measurements of vehicle delays and speeds. However, they do not directly provide traffic volume information, which must be inferred. On the other hand, probe vehicles provide better coverage of large networks at a much lower cost.
As a consequence, traffic volume estimation from trajectory data has recently received interest in the literature. Zheng and Liu proposed a maximum likelihood estimation method that assumed a time-dependent Poisson distribution for undersaturated conditions ( 1 ). Zhao et al. proposed a volume and queue length estimation method using a closed formula based on Bayes' rule, with isotonic regression as the first step ( 2 ). Wang et al. presented a Bayesian-network method to estimate volumes at undersaturated junctions, solved by expectation-maximization, that used trajectory delays and imposed a Lighthill–Whitham–Richards (LWR) model for shockwave estimation ( 3 ). Seo et al. proposed fundamental diagram estimation for freeways using probe-vehicle data ( 4 ). Tang et al. presented a method to estimate volumes based on tensor decomposition ( 5 ). Yao et al. combined shockwave theory and maximum likelihood to estimate cycle-by-cycle volumes ( 6 ). However, cycle-by-cycle shockwave estimation is unreliable for low sample ratios, that is, when at most one trajectory per cycle is observed. The balance between free-flow and stopped trajectories is calculated via the platoon ratio, defined in the Highway Capacity Manual ( 7 ).
In addition to work on volume estimation, there is extensive literature on queue length estimation and other indices that use probe-vehicle data and other mobile sensors. Hao et al. ( 8 ) and Hao et al. ( 9 ) proposed a method based on Bayesian networks to estimate the order of probe-vehicle positions in vehicle arrivals and departures. Hao and Ban explored queue length estimation based on mobile data ( 10 ). Sun and Ban applied variational traffic flow theory to retrieve a population of vehicle trajectories based on probe-vehicle data ( 11 ). Their objective was to acquire volume information, assuming uniform arrivals between each pair of observed trajectories. More recently, Zhang et al. ( 12 ) and Mei et al. ( 13 ) estimated queue lengths via frequentist and Bayesian methods, respectively.
In this article, we propose a method for estimating volumes in signalized approaches using only trajectory data. The goal is to confidently estimate average volumes over a period of time, requiring only sparse trajectory data and signal plan information. Specifically, the features required are stop locations, vehicle arrival times, stopbar crossing times, and signal plan information.
A maximum likelihood estimation was performed, in which each observation captured the traffic characteristics of an interval defined by two consecutive trajectories. We assumed exponentially distributed interarrival times, since arrivals in the experimental setting were independent. However, the method could incorporate arrival patterns from adjacent junctions if necessary, as in research undertaken by Zheng and Liu ( 1 ). We present the case for pretimed control, but the method could also be applied to adaptive signal control as long as the signal plan is known.
The contributions of this article are both methodological and experimental. First, we extend the work carried out by Zheng and Liu ( 1 ) to include oversaturation events. Second, we are the first to incorporate stopbar location estimation in a volume estimation problem, an issue that has been completely ignored in the literature. Stopbar location is a necessary parameter to correctly measure partially observed queue lengths and stop locations. Third, unlike Yao et al.’s research ( 6 ), this method does not rely on shockwave information. In low-penetration-rate conditions or on approaches with few lanes, the number of trajectories may not be large enough to estimate shockwaves confidently.
Our method was thoroughly evaluated, first via simulation on a one-lane approach and then experimentally on two distinct one-lane approaches with trajectory data from a Transportation Network Company (TNC) service provider. One-lane approaches are more statistically challenging than the multilane settings used in research by Zheng and Liu ( 1 ), Tang et al. ( 5 ), and Yao et al. ( 6 ), since there is at most one observable queue per cycle.
The next section describes the preprocessing steps, including stopbar location estimation and signal plan clock shift estimation. After that, we present the maximum likelihood estimation method. The simulation results are then described, followed by the field test results. Finally, our conclusions and suggestions for further research are provided.
Preliminaries
The algorithm required two data inputs: map-matched trajectory data and the signal plan of the intersection. Jam density and saturation headway were assumed to be known. Jam density is the average number of vehicles per unit of distance while queued. Saturation headway is the time interval between successive queued departures. Before the estimation, we executed four preprocessing steps: stopbar location estimation, signal plan clock shift estimation, stop identification and filtering, and stop cycle labeling. The preprocessing steps produced trajectory data enriched with stop information and signal plan-related features, ready to be used for volume estimation.
Stopbar Location Estimation
Every queue at a signal has a start and end point. When queues are partially observed via trajectory data, we are not certain of either. Therefore, to estimate queue lengths and volumes we need to first estimate the stopbar location—the most probable start point of the queues. Estimation of this feature has been overlooked in the literature to date. Researchers usually sample trajectory data from a population of microsimulation-generated trajectories or simply assume a particular stopbar location manually. However, in large-scale industrial applications such assumptions are unsuitable and do not generalize to new data. Therefore, stopbar location must be estimated carefully.
Stopbar location estimation is necessary for the following reasons. First, the first vehicle at every queue is not always observed. Second, the first vehicles in a queue do not always stop at the same location, owing to shifts in the location of the in-vehicle detector and detection inaccuracies. This generates a distribution of stopbar locations. Third, and most importantly, different movements can have different stopbar distances with respect to the junction origin defined by the map data. Not estimating the stopbar location will consistently bias queue lengths and volumes across movements. As a side note, the stopbar location derived from the data will not necessarily correspond to the white line painted at the intersection for the aforementioned reasons.
The location of the stopbar is defined as the point along a link before which, and not after which, vehicles are expected to stop. Thus, we needed to find the most statistically significant location where the change between the stop densities before and after it was maximized. As a first step, if a trajectory had multiple stops, we retained only the last stop before crossing the intersection, since earlier stops were not relevant to stopbar location estimation. Then, from 0 (the junction node in the map data) to a fixed upper limit (e.g., 50 m), we evaluated at every point
Since we did not know the actual value of jam spacing, but had a broad knowledge of its range, we solved the above score function for several values of
This method could also be applied at junctions with left-turn waiting areas, which are quite common in Chinese cities. In that case, two stopbar locations must be estimated. The advanced stopbar location is first found as described above. The second stopbar can be estimated similarly by using the distribution of the second-to-last stops. Such waiting areas can also exist for through movements.
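The score function used in this stopbar search is not reproduced in this section, so the following Python sketch substitutes a hypothetical one, only to make the search concrete. For each candidate distance `b` from the junction node, it counts last-stop distances in a window of width `w` just upstream of `b` and subtracts those in the window just downstream; ties keep the rightmost candidate, the one closest to the stops. The function name `estimate_stopbar`, the window score, the 0.5 m grid, and the tie-breaking rule are all assumptions, not the authors' definition.

```python
def estimate_stopbar(last_stop_dists, w, upper=50.0, step=0.5):
    """Hypothetical stopbar search (not the paper's exact score).

    last_stop_dists: distance (m) upstream of the junction node of each
        trajectory's last stop before crossing the intersection.
    w: window width (m), e.g., one assumed value of jam spacing.
    Returns the candidate b in [0, upper] maximizing
    (#stops in [b, b + w)) - (#stops in [b - w, b)).
    """
    best_b, best_score = 0.0, float("-inf")
    n_steps = int(round(upper / step))
    for i in range(n_steps + 1):
        b = i * step
        upstream = sum(1 for x in last_stop_dists if b <= x < b + w)
        downstream = sum(1 for x in last_stop_dists if b - w <= x < b)
        score = upstream - downstream
        # ">=" keeps the rightmost maximizer, i.e., the candidate closest to the stops
        if score >= best_score:
            best_b, best_score = b, score
    return best_b
```

Because jam spacing is only known within a range, the same search can be repeated for several window widths, for example `for w in (6.0, 7.0, 8.0): estimate_stopbar(stops, w)`, and the resulting locations compared.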
Clock Shift
Signal controllers and smartphones may not share the same time reference. We refer to this misalignment as a clock shift. As a result, we might find crossings on red or queues during green. The clock shift is estimated by finding the offset value that minimizes the number of trajectories crossing on red. Alternatively, if no signal plan is available, the signal plan itself can be estimated from trajectory data ( 14 ). Since green times are not necessarily fully utilized, several offset values may yield the same number of crossings on red. In that case, we choose the rightmost such value, where the queue departure process is more likely to start.
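The clock shift search can be sketched as a grid search over offsets. This Python sketch assumes a pretimed plan with a single green window `[green_start, green_end)` per cycle that does not wrap around the cycle boundary, and that controller time equals trajectory time plus the offset; the function name, the 1 s grid, and this parameterization are illustrative assumptions.

```python
def estimate_clock_shift(crossing_times, cycle, green_start, green_end, step=1.0):
    """Offset s (s) minimizing stopbar crossings on red; ties keep the largest s.

    crossing_times: stopbar crossing times in the trajectory (smartphone) clock.
    Controller time is assumed to be trajectory time + s.
    """
    def on_red(t):
        phase_t = t % cycle  # position within the cycle, always in [0, cycle)
        return not (green_start <= phase_t < green_end)

    best_s, best_count = 0.0, None
    for i in range(int(round(cycle / step))):
        s = i * step
        count = sum(on_red(t + s) for t in crossing_times)
        # "<=" replaces earlier ties, so the rightmost minimizer is kept
        if best_count is None or count <= best_count:
            best_s, best_count = s, count
    return best_s
```

For instance, with a 60 s cycle, green on `[0, 30)`, and crossings at 0, 5, 10, 15, 20, 55, and 56 s, every offset from 5 to 9 s removes all crossings on red, and the rightmost value, 9 s, is returned.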
Stop Identification and Filtering
Once the clock shift has been estimated, we can start extracting features from the trajectories. One of the advantages of using trajectory data is the simple identification of stops. Since the whole trajectory is observed, the stops are those first points where the speed is lower than a given threshold,
The filtering is carried out by applying a sequence of rules:
If two successive potential stops are closer than the merge stop threshold (e.g., 4 s), the stop point becomes the candidate closer to the stopbar, and the new stop delay is the sum of the two old stop delays. This is generally the case when a vehicle that has already queued switches to a shorter queue.
The remaining stops’ delays should be longer than the minimal stop delay threshold (e.g., 6 s).
Optionally, stops occurring too far from the stopbar can be filtered out by a quantile rule or by setting a distance threshold, such as the distance to the closest upstream unsignalized junction.
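The identification and filtering rules above can be sketched in Python as follows. The speed threshold value, the `(t, x, v)` point format, the `Stop` fields, and the convention that a smaller `x` (distance upstream of the junction node) means closer to the stopbar are assumptions made for illustration; the merge and delay thresholds use the example values given in the rules.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    t: float      # time of the stop point (s)
    t_end: float  # time the speed rises above the threshold again (s)
    x: float      # distance upstream of the junction node (m)
    delay: float  # accumulated stopped time (s)


def identify_stops(points, v_stop=1.0):
    """points: time-ordered (t, x, v) tuples; v_stop is a hypothetical threshold (m/s)."""
    stops, current = [], None
    for t, x, v in points:
        if v < v_stop and current is None:
            current = (t, x)  # first low-speed point opens a potential stop
        elif v >= v_stop and current is not None:
            stops.append(Stop(current[0], t, current[1], t - current[0]))
            current = None
    if current is not None:  # trajectory ends while still stopped
        t_last = points[-1][0]
        stops.append(Stop(current[0], t_last, current[1], t_last - current[0]))
    return stops


def filter_stops(stops, merge_gap=4.0, min_delay=6.0, max_dist=None):
    merged = []
    for s in stops:
        if merged and s.t - merged[-1].t_end < merge_gap:
            prev = merged[-1]
            keep = s if s.x < prev.x else prev  # candidate closer to the stopbar
            merged[-1] = Stop(keep.t, s.t_end, keep.x, prev.delay + s.delay)
        else:
            merged.append(s)
    kept = [s for s in merged if s.delay > min_delay]  # minimal stop delay rule
    if max_dist is not None:  # optional distance rule
        kept = [s for s in kept if s.x <= max_dist]
    return kept
```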
Stop Cycle Labeling
After filtering stops, we can confidently assign a cycle to each stop. This feature is critical in our model. Under normal conditions, a trajectory is expected to stop at most once per red phase. We first assigned a cycle to each departure point. Then, starting from the departure cycle, we subtracted one cycle per additional stop. Note that we did not use any shockwave information. First, the 3 s sampling interval of our data prevented us from accurately detecting shockwave points. Second, for low penetration rates, for which there is usually at most one trajectory per cycle and movement, cycle-by-cycle shockwave estimation is too unreliable. This is especially true for oversaturated trajectories, in which the departure process exhibits growing variance as the distance from the estimated stopbar location increases.
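The labeling rule can be sketched in a few lines of Python. The sketch assumes that cycles are indexed from 1 and that cycle `k` spans `[t0 + (k - 1) * cycle, t0 + k * cycle)` in the clock-shift-corrected signal time; where each cycle starts (for example, at the beginning of red) is an assumption, and the function name is hypothetical.

```python
import math


def label_stop_cycles(t_departure, n_stops, cycle, t0=0.0):
    """Cycle labels for a trajectory's stops, in chronological order.

    The last stop gets the departure cycle, and each earlier stop is placed
    one cycle before the next one.
    """
    c_dep = math.floor((t_departure - t0) / cycle) + 1
    return [c_dep - (n_stops - 1 - i) for i in range(n_stops)]
```

For example, with a 90 s cycle, a vehicle departing at 200 s leaves in cycle 3; if it stopped three times, its stops are labeled cycles 1, 2, and 3.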
Marginal Likelihood Estimation
In a signalized intersection approach, incoming vehicles arrive following a Poisson process of mean rate,
The approach was considered long enough for arrivals to be independent, where incoming vehicles were first detected at a distance,
The mean rate,
The event classes must be, first, exhaustive, that is, cover the whole process distribution, and second, mutually exclusive, that is, the classes must not overlap. These statistical events can be broadly classified by the number of stops each trajectory has and the number of cycles spanning the departure of the first trajectory and the arrival of the second:
Both first and second trajectories have a stop during the same cycle;
The two trajectories do not have a stop during the same cycle, and (a) The first trajectory is free-flow (zero stops), the second one has at least one stop; (b) Both trajectories are free-flow (zero stops); (c) The first trajectory has at least one stop and the second one is free-flow (zero stops); and (d) Both trajectories have at least one stop, but not during the same cycle.
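The top-level classification listed above can be expressed as a small Python function operating on the cycle labels of each trajectory's stops. It covers only the classes 1 and 2(a) to 2(d) named in the list; the finer split by cycle number difference that yields the event types of Table 1 is not reproduced. The function name and the string labels are hypothetical.

```python
def classify_event(first_stop_cycles, second_stop_cycles):
    """Classify a pair of consecutive trajectories.

    Each argument lists the cycle labels of a trajectory's stops
    (an empty list means a free-flow trajectory).
    Returns "1", "2a", "2b", "2c", or "2d".
    """
    if set(first_stop_cycles) & set(second_stop_cycles):
        return "1"  # both trajectories stop during the same cycle
    first_free = not first_stop_cycles
    second_free = not second_stop_cycles
    if first_free and not second_free:
        return "2a"
    if first_free and second_free:
        return "2b"
    if second_free:
        return "2c"  # first stopped, second free-flow
    return "2d"  # both stopped, never in the same cycle
```

Because every pair falls into exactly one branch, the function also illustrates the requirement that the classes be exhaustive and mutually exclusive.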
The cycle number difference is the main concept in the event definition. Let us assume that cycles within a time of day (TOD) are indexed from 1 to
Table 1 enumerates all the statistical event types. There are 10 different types, defined by all the possible combinations of stops and cycle number differences. However, these 10 events can be expressed probabilistically with just four types of likelihood expression, since some events are similar and differ only in the definition of the input parameters. Figure 1 illustrates six of these statistical events. Blue dashed lines represent departure shockwaves, and orange dashed lines represent the projected beginnings of red times at free-flow speed.
Statistical Event Types and Their Likelihood Expressions
Note: Distrib. = Distribution; PMF = Probability Mass Function; Convol. = Convolution; CDF = Cumulative Distribution Function.

Example diagrams of some statistical event cases: (a) both trajectories stopped in the same cycle, (b) second trajectory stopped with a cycle difference larger than one, (c) first trajectory stopped and second free-flow in contiguous cycles, (d) both trajectories stopped in contiguous cycles, (e) both trajectories free-flow in contiguous cycles, and (f) first trajectory free-flow and second stopped in contiguous cycles. Shockwave lines are shown for reference only and are not used in the event definition.
Suppose that either the cycle number difference is equal to 0 or both trajectories have a stop in the same cycle (see Figure 1a); then, the arrival interval length is equal to
The value of
When the cycle number difference is equal to 1, the queue in the second cycle may depend on the arrivals during the first cycle after the first trajectory whenever there is oversaturation. Thus, if the first trajectory has stops (see Figure 1a, 1c, and 1d), the arrival interval length is equal to
Let
where
However, since exponential arrivals are memoryless, we can break the dependency on
where
Let us assume
If the first trajectory is free-flow, arrivals during both cycles are considered independent, since oversaturation is unlikely to happen (see Figure 1e and 1f). Therefore, the arrival interval will start at
The total marginal likelihood expression is
where
Proof:
Since
For situations where the mean arrival flow is time-varying, that is, a nonhomogeneous Poisson process, the estimator may be used for shorter periods of time. Consecutive periods can be estimated independently, that is, as a homogeneous Poisson process, by using either overlapping or nonoverlapping windows. The simulation sensitivity results detailed in the next section can be used as an upper bound on the efficiency of the estimator when using real-world data, assisting in the selection of an appropriate period length.
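Since the exact event likelihoods of Table 1 are not reproduced in this section, the following Python sketch only illustrates the general structure of the estimation under stated assumptions. It uses two hypothetical term types: an exact Poisson count `n` over an interval `dt` (a PMF term) and a lower-bounded count, at least `k` arrivals over `dt` (a tail term). Both log terms are concave in the rate, so the derivative of the total log-likelihood is decreasing and its single root can be found by bisection. This is a stand-in for the paper's gradient-based solver, not the authors' implementation.

```python
import math


def log_poisson_pmf(n, mu):
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def poisson_sf(k, mu):
    """P(N >= k) for N ~ Poisson(mu)."""
    if k <= 0:
        return 1.0
    if mu > k:  # upper tail is large, so the complement is numerically safe
        return 1.0 - sum(math.exp(log_poisson_pmf(j, mu)) for j in range(k))
    term = math.exp(log_poisson_pmf(k, mu))  # sum the tail directly
    total, j = 0.0, k
    while term > total * 1e-16:
        total += term
        j += 1
        term *= mu / j  # pmf(j) = pmf(j - 1) * mu / j
    return total


def loglik_gradient(lam, exact_events, at_least_events):
    """d/d(lam) of the log-likelihood; lam in veh/s, dt in s."""
    g = sum(n / lam - dt for n, dt in exact_events)
    for k, dt in at_least_events:
        if k > 0:
            mu = lam * dt
            # d/d(mu) P(N >= k) = pmf(k - 1; mu), times the chain-rule factor dt
            g += dt * math.exp(log_poisson_pmf(k - 1, mu)) / poisson_sf(k, mu)
    return g


def estimate_rate(exact_events, at_least_events=(), lo=1e-3, hi=2.0, tol=1e-10):
    """Bisection on the decreasing gradient of a concave log-likelihood."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if loglik_gradient(mid, exact_events, at_least_events) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With only PMF terms, the maximizer is the closed-form ratio of total counts to total interval length; for example, events `[(3, 60), (5, 90), (2, 30)]` give about 0.0556 veh/s, or 200 veh/h. Adding a tail term can only push the estimate upward. The same function can be applied window by window to approximate a time-varying rate, as described above.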
Last but not least, for situations in which the arrival pattern is dependent on an upstream junction, the simplest way to include such dependence is to do as in Zheng and Liu’s study ( 1 ). First, the arrival timestamps of the trajectories are binned by time slices,
Simulation Results
The method was tested via an event-based simulation, in which both time and space dimensions were continuous. The traffic dynamics followed an LWR model ( 15 ). Saturation headway was set to 2.15 s/veh, saturation spacing to 7 m/veh, and free-flow speed to 10 m/s. The signal had a cycle length of 90 s with an effective green of 30 s for that movement. Interarrival times were exponentially distributed with rate parameter

Event-based simulated trajectories.
The purpose of this event-based simulation was threefold. First, we avoided any bias in the generation of the arrival process owing either to time discretization or to Poisson approximations via negative-binomial distributions, commonly employed in microsimulation software (see the specific warning in the SUMO documentation [ 17 ]). Second, this simulation was free from trajectory measurement errors and signal plan estimation inaccuracy. Third, driver behavior had no added uncertainty, such as the yellow light dilemma or anticipated deceleration when facing red lights. In summary, our event-based simulation isolated the stochasticity of the upstream arrival process from other uncertainties of the indirect observational process, providing a best-case benchmark for the bias and consistency of the estimator.
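The continuous-time arrival generation described above can be sketched in Python: arrivals are produced by accumulating exponential interarrival times, so no time step or discrete approximation is involved, and each vehicle is independently marked as a probe with probability equal to the penetration rate. The sketch covers only arrival and probe sampling, not the LWR queue dynamics; the function name and the Bernoulli probe-sampling scheme are assumptions.

```python
import random


def simulate_arrivals(flow_vph, horizon_s, penetration, seed=0):
    """Poisson arrivals over [0, horizon_s] with Bernoulli-sampled probes."""
    rng = random.Random(seed)
    lam = flow_vph / 3600.0  # arrival rate in veh/s
    arrivals, probes = [], []
    t = rng.expovariate(lam)  # exponential interarrival time with mean 1/lam
    while t <= horizon_s:
        arrivals.append(t)
        if rng.random() < penetration:  # each vehicle is a probe with this probability
            probes.append(t)
        t += rng.expovariate(lam)
    return arrivals, probes
```

Because independent thinning of a Poisson process is again a Poisson process, probe arrivals in this setup have exponential interarrival times with rate equal to the penetration rate times the flow rate.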
The simulation was run over several parameter values: four flow-rate values of 200, 300, 400, and 500 vph; penetration rates of 5%, 10%, 20%, and 50%; and time horizons of 0.5, 1, 2, and 10 h. This range of parameters was designed to explore efficiency and bias from a quasi-real-time setting to a TOD with several days of data (5 weekdays). The resulting
Volume estimator accuracy was measured by the absolute percentage error:
Figure 3 shows the relative centered (by subtracting 1) estimator bias versus the sample size, as a function of the

Flow-rate relative bias against absolute sample size, time horizon length in hours, penetration rate, and
We also observed that the estimator was generally consistent. The larger the sample, the lower the variance of the individual biases, until average biases of less than 5% were reached for periods of 10 h.
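The two accuracy measures used in this section can be written as one-line Python functions: the standard absolute percentage error, and the centered relative bias of Figure 3, that is, the estimate divided by the true value, minus 1. The function names are hypothetical.

```python
def absolute_percentage_error(estimated, true):
    """|estimate - true| / true, in percent."""
    return abs(estimated - true) / true * 100.0


def centered_relative_bias(estimated, true):
    """Relative bias centered at zero (estimate / true - 1)."""
    return estimated / true - 1.0
```

For example, an estimate of 220 vph against a true flow of 200 vph has an absolute percentage error of 10% and a centered relative bias of 0.1.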
Table 2 analyzes our estimator’s bias and efficiency as a function of the main sensitivity parameters:
Estimator Bias and Efficiency Comparison
Note: b = bias (in %); std = standard deviation of bias (in %); Δb = bias difference (in %); Δstd = difference in efficiency (in %); prop = proposed method; zheng = Zheng and Liu (2017).
In
Concerning the other sensitivity variables, as penetration rate
Field Test Results
We proceeded to test the estimator at a real-world experimental junction managed by the authors in Haidian district, Beijing, China. This was a four-leg intersection with one lane per approach, with left-turn, through, and right-turn movements allowed. The location of the intersection is shown in Figure 4. Video captures of the north and east approaches are shown in Figure 5.

Test junction location map.

Video detail of (a) north and (b) east approaches.
We recorded traffic on the north approach for 55 min, from 16:35 to 17:30 on September 10, 2019. Vehicle counts were done manually. Figure 6 shows the signal plan that was active during that period of time. Cycle length was equal to 60 s. To minimize conflicts in crossings and delays, one approach started 5 s later than the other for each pair of parallel approaches, whereas the other terminated 5 s earlier.

Experimental intersection signal plan.
The first step was to estimate stopbar locations. For completeness, we estimated the stopbar location for all four approaches. Figure 7 shows the results. The estimated score peaks were sharp for all approaches. However, for the north approach, the peak was slightly less clear, perhaps because of its high right-turn ratio.

Stopbar location estimation score against stopbar distance. Optimal locations marked by the red line.
After calibrating the clock shift of the signal plan, we needed to estimate the other traffic-related parameters: jam density, saturation headway, and free-flow speed. From observations of the video, we set the jam spacing (the inverse of jam density) equal to 6.5 m/veh. Average saturation headway was estimated by measuring the queue discharge headways of all trajectories that stopped more than once and averaging them. Estimating this parameter in this way was particularly important, since speed bumps slowed down the queue discharge process. Queue discharge measurements were taken from the original 6-h TOD data. The final value was 3.8 s/veh, significantly higher than the standard values used under ideal conditions. The free-flow speed parameter was set to the 95th percentile of all observed individual trajectory speeds entering the link. The link length of the study approach was limited to 150 m, since a side street at that distance connected to the studied approach. Moreover, right-turn trajectories crossing on red were deleted, as were any other trajectories with significant issues.
Table 3 shows the original vehicle counts, measured volumes, penetration rates, and volume estimates for the north and east approaches for both estimators. The v/c ratio of the north approach was 0.77, whereas the east approach was less congested, with a v/c ratio close to 0.49. The north approach had four oversaturated cycles, whereas the east approach had none. The final relative biases of our proposed method were 9.3% and 13.2%, which fell within the expected range of the simulation results. Our estimated biases were also smaller than those of Zheng and Liu’s method ( 1 ).
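The parameter calibration described above can be sketched with Python's `statistics` module: the free-flow speed as the 95th percentile of incoming speeds and the saturation headway as the mean of measured discharge headways. The `vc_ratio` helper uses the common capacity formula (3600 divided by saturation headway, scaled by green over cycle); this section does not state how the reported v/c ratios were computed, so that formula, the green time in the example, and all function names are assumptions.

```python
import statistics


def free_flow_speed(incoming_speeds):
    """95th percentile of individual incoming trajectory speeds."""
    return statistics.quantiles(incoming_speeds, n=100, method="inclusive")[94]


def saturation_headway(discharge_headways):
    """Average of measured queue discharge headways (s/veh)."""
    return statistics.fmean(discharge_headways)


def vc_ratio(volume_vph, sat_headway_s, green_s, cycle_s):
    """Assumed capacity: (3600 / saturation headway) * (green / cycle)."""
    capacity_vph = 3600.0 / sat_headway_s * green_s / cycle_s
    return volume_vph / capacity_vph
```

For example, with the 3.8 s/veh headway and 60 s cycle reported above, a hypothetical 30 s green gives a capacity of about 474 vph, so a volume of 300 vph yields a v/c ratio of about 0.63.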
Field Test Statistics and Volume Estimates
Note:
Conclusions
We proposed a new method to estimate traffic volumes at signalized intersections using only probe-vehicle trajectories, which effectively addresses oversaturated conditions and residual queues. The flow-rate estimate was obtained via maximum marginal likelihood, in which each observation corresponded to the interval between two consecutive trajectories. Several types of statistical events were specified in the likelihood as a function of the observed stop positions, their position relative to the signal plan, and the difference in cycles in which these stops were located.
The performance of the estimator was first evaluated via simulation. We concluded that the estimator presented low bias and was consistent enough to provide estimates with less than 10% error in most situations. The estimator performed worst at low saturation ratios, where an identifiability issue became prevalent. The method was also tested at a real-world intersection with validated camera traffic counts, with relative errors of 9.3% and 13.2% on the test approaches.
We identify several directions for further research. First, the algorithm was tested using the Poisson process for convenience. However, it could be adapted to a more general renewal process, capturing potential under- or overdispersion in the arrivals. Although inferring the actual parameters of complex arrival models may be challenging, an alternative would be to test more flexible count distributions. One good candidate is the generalized Poisson distribution, which has only two parameters.
To validate its generalization, the estimator could be tested at more intersections, especially intersections with multiple lanes and coordinated approaches. We plan to explore using convolutions for cases in which there are long periods of missing data, especially when the following trajectory is oversaturated. To date, cycles with missing data have been omitted from the analysis, as well as right-turn on-red trajectories, which could provide useful information on queues.
Finally, the method currently incorporates information only on stop positions, arrival speeds, and the signal plan. Adding information on delays and shockwave lines could help improve the accuracy, especially in undersaturated settings, by estimating the headway distribution from the data.
Footnotes
Author Contributions
The authors confirm contribution to the paper as follows: study conception and design: Roger Lloret-Batlle, Zi-Hao Wang; data collection: Zi-Hao Wang, Roger Lloret-Batlle; analysis and interpretation of results: Roger Lloret-Batlle; draft manuscript preparation: Roger Lloret-Batlle, Jianfeng Zheng. All authors reviewed the results and approved the final version of the manuscript.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
