Traffic Volume Estimation for both Undersaturated and Oversaturated Signalized Intersections With Stopbar Location Estimation Using Trajectory Data

Abstract

We proposed an estimator for traffic volumes at signalized intersections using only sparse trajectory data. Each pair of observed trajectories defined a statistical event, from which traffic volume was inferred. The interaction between trajectory stop distances, arrival speeds, and signal plan defined different classes of statistical events, with distinct likelihood expressions. Contrary to recent approaches found in the literature, our method addressed oversaturation and residual queues. We also proposed a method to estimate stopbar location, crucial to properly estimating stop distances and queue lengths. Interarrival times were assumed to follow a negative exponential distribution, and the method was compatible with any kind of control. The signal plan was assumed to be known, but an estimated signal plan could also be used. The estimation was formulated as a maximum marginal likelihood problem. We showed the problem to be globally concave and, thus, optimally solvable by standard gradient-based methods. The estimator was first validated by an event-based Lighthill–Whitham–Richards simulation, suppressing any measurement errors from trajectories and uncertainty from driver behavior. The estimator was found to consistently show bias below 10% in low-penetration-rate situations, when the $υ / c$ ratio was medium to high. Finally, the estimator was tested at a real-world intersection set by the authors in Beijing, China, where it obtained a bias of similar magnitude.

Keywords

data and data science, data science, operations, traffic flow

Traffic volume is a fundamental variable in traffic signal control. Typically, volume counts are obtained from loop detectors placed in fixed locations on the road. More recently, vehicle counts and speeds have also been obtainable from cameras and radar. However, physical detection is still expensive, and measurements are available only where such equipment is installed. Therefore, full-network coverage is generally not available.

A different source of widely available data is trajectory data. These data readily provide information about incomplete queue lengths and measurements of vehicle delays and speeds. However, they do not directly provide traffic volume information; this must be inferred. In return, probe vehicles are able to provide better coverage on large networks at a much lower cost.

As a consequence, traffic volume estimation from trajectory data has recently received interest in the literature. Zheng and Liu proposed a maximum likelihood estimation that assumed a time-dependent Poisson distribution for undersaturated conditions ( 1 ). Zhao et al. proposed a volume and queue length estimation method using a Bayes rule-based closed formula with an isotonic regression as the first step ( 2 ). Wang et al. presented a Bayesian-network method to estimate volumes in undersaturated junctions, solved by expectation-maximization, using trajectory delays, and imposing a Lighthill–Whitham–Richards (LWR) model for shockwave estimation ( 3 ). Seo et al. proposed fundamental diagram estimation for the freeway context using probe-vehicle data ( 4 ). Tang et al. presented a method to estimate volumes based on tensor decomposition ( 5 ). Yao et al. combined shockwave theory and maximum likelihood to estimate cycle-by-cycle volumes ( 6 ). However, cycle-by-cycle shockwave estimation is unreliable for low sample ratios, that is, when at most one trajectory per cycle is observed. The balance between free-flow and stopped trajectories is calculated via the platoon ratio, defined in the Highway Capacity Manual ( 7 ).

In addition to work on volume estimation, there is extensive literature on queue length estimation and other indices that use probe-vehicle data and other mobile sensors. Hao et al. ( 8 ) and Hao et al. ( 9 ) proposed a method based on Bayesian networks to estimate the order of probe-vehicle positions in vehicle arrivals and departures. Hao and Ban explored queue length estimation based on mobile data ( 10 ). Sun and Ban applied variational traffic flow theory to retrieve a population of vehicle trajectories based on probe-vehicle data ( 11 ). Their objective was to acquire volume information, assuming uniform arrivals between each pair of observed trajectories. More recently, Zhang et al. ( 12 ) and Mei et al. ( 13 ) estimated queue lengths via frequentist and Bayesian methods, respectively.

In this article, we propose a method for estimating volumes in signalized approaches using only trajectory data. The goal is to confidently estimate average volumes over a period of time, requiring only sparse trajectory data and signal plan information. Specifically, the features required are stop locations, vehicle arrival times, stopbar crossing times, and signal plan information.

A maximum likelihood estimation was performed, in which each observation captured the traffic characteristics of an interval defined by two consecutive trajectories. We assumed exponential arrivals since the experimental setting had independent arrivals. However, the method could incorporate arrival patterns from adjacent junctions if necessary, as in research undertaken by Zheng and Liu ( 1 ). We present the case for pretimed control, but the method could also be applied for adaptive signal control as long as the signal plan is known.

The contributions of this article are both methodological and experimental. First, we extend the work carried out by Zheng and Liu ( 1 ) to include oversaturation events. Second, we are the first to incorporate stopbar location estimation in a volume estimation problem, an issue that has been completely ignored in the literature. Stopbar location is a necessary parameter to correctly measure partially observed queue lengths and stop locations. Third, unlike Yao et al.’s research ( 6 ), this method does not rely on shockwave information. In low-penetration-rate conditions, or on approaches with a low number of lanes, the number of trajectories may not be large enough to confidently estimate shockwaves.

Our method was thoroughly evaluated, first via simulation on a one-lane approach and then experimentally tested in two distinct one-lane approaches with trajectory data from a Transportation Network Company (TNC) service provider. One-lane approaches are more statistically challenging than the multilane settings used in research by Zheng and Liu ( 1 ), Tang et al. ( 5 ), and Yao et al. ( 6 ) since there is at most only one observable queue every cycle.

The next section provides relevant information on stopbar location and signal plan estimation. After that, we present the maximum likelihood estimation method. The simulation results are then described, followed by the field test results. Finally, our conclusions and suggestions for further research are provided.

Preliminaries

The algorithm required two data inputs: mapmatched trajectory data and the signal plan of the intersection. Jam density and saturation headway were assumed to be known. Jam density is the average number of vehicles per unit of distance while queued. Saturation headway is defined as the time interval between queued departures. Before the estimation, we executed four preprocessing steps: stopbar location estimation, signal plan clock shift estimation, stop identification and filtering, and stop cycle labeling. The preprocessing steps produced trajectory data enriched with stop information and signal plan-related features, ready to be used for volume estimation.

Stopbar Location Estimation

Every queue at a signal has a start and end point. When queues are partially observed via trajectory data, we are not certain of either. Therefore, to estimate queue lengths and volumes we need to first estimate the stopbar location—the most probable start point of the queues. Estimation of this feature has been overlooked in the literature to date. Researchers usually sample trajectory data from a population of microsimulation-generated trajectories or simply assume a particular stopbar location manually. However, in large-scale industrial applications such assumptions are unsuitable and do not generalize to new data. Therefore, stopbar location must be estimated carefully.

Stopbar location estimation is necessary for the following reasons. First, the first vehicle at every queue is not always observed. Second, the first vehicles in a queue do not always stop at the same location, owing to shifts in the location of the in-vehicle detector and detection inaccuracies. This generates a distribution of stopbar locations. Third, and most importantly, different movements can have different stopbar distances with respect to the junction origin defined by the map data. Not estimating the stopbar location will consistently bias queue lengths and volumes across movements. As a side note, the stopbar location derived from the data will not necessarily correspond to the white line painted at the intersection for the aforementioned reasons.

The location of the stopbar is defined as a point along a link before which, and not after which, vehicles are expected to stop. Thus, we needed to find the location where the change between the before and after stop densities was maximized. As a first step, if a trajectory had multiple stops, we only retained the last stop before crossing the intersection, since any earlier stops were not relevant to stopbar location estimation. Then, from 0 (the junction node in the map data) to a fixed upper limit (e.g., 50 m), we evaluated at every point $x$ the difference between the number of stops found in $[x, x + s]$ (before) and in $[x - s, x]$ (after), $s$ being an a priori set saturation spacing. Given $s$ , the optimal stopbar location is

x^{*} = \arg\max_{x} \left( N_{before} (x; s) - N_{after} (x; s) \right)

(1)

Since we did not know the actual value of jam spacing, but had a broad knowledge of its range, we solved the above score function for several values of $s$ , averaging the score pointwise. For instance, in the field test section of this paper, our prior information of jam spacing was considered to be uniform between 6 and 9 m/veh.
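The scoring procedure around Equation 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that distances are measured upstream from the junction node (so that "before" is the interval $[x, x + s]$), that candidates lie on a 0.5 m grid, and that the prior range of jam spacing is sampled at 6, 7, 8, and 9 m/veh. The function and parameter names are hypothetical.

```python
def stopbar_location(last_stops, spacings=(6.0, 7.0, 8.0, 9.0),
                     x_max=50.0, step=0.5):
    """Return the candidate distance x (m) with the highest average score.

    last_stops: distance (m, upstream of the junction node) of the last stop
    of each trajectory before it crosses the intersection.
    """
    best_x, best_score = 0.0, float("-inf")
    for k in range(int(x_max / step) + 1):
        x = k * step
        score = 0.0
        for s in spacings:
            # Stops just upstream of x (vehicles waiting before the stopbar).
            n_before = sum(1 for d in last_stops if x <= d < x + s)
            # Stops just downstream of x (vehicles that overshot x).
            n_after = sum(1 for d in last_stops if x - s <= d < x)
            score += n_before - n_after
        score /= len(spacings)  # pointwise average over the spacing values
        if score > best_score:  # ties keep the smallest x (an assumption)
            best_x, best_score = x, score
    return best_x


# Toy data: first, second, and third queue positions, about 7 m apart.
stops = [10.0] * 5 + [17.0] * 3 + [24.0]
print(stopbar_location(stops))
```

With real data, overshooting vehicles and detection noise spread the stops, which is why the score is averaged over several spacing values instead of relying on a single one.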

This method could also be applied at junctions where there are left-turn waiting areas, which are quite common in Chinese cities. If this is the case, there will be two stopbar locations to be estimated. The advanced stopbar location is first found as described above. The second stopbar could be similarly estimated by using the distribution of the second-to-last stops. These waiting areas can also exist for through movements.

Clock Shift

Signal controllers and smartphones may not have the same time reference. We call this maladjustment a clock shift. Because of it, we might find crossings on red or queues during green. Clock shift is estimated by finding the offset value that yields the minimum number of trajectories crossing on red. Alternatively, if no signal plan is available, it can be estimated from trajectory data ( 14 ). Since green times are not necessarily fully utilized, we might encounter several offset values with the same number of crossings on red. If that is the case, we would choose the rightmost value, when the queue departure process is more likely to start.
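The offset search can be illustrated with a short sketch. It assumes that the shift is added to the green start of the signal plan, that candidate shifts are scanned on a 1 s grid over one cycle, and that the tie interval does not wrap around the cycle boundary. All names are hypothetical.

```python
def crossings_on_red(crossing_times, shift, cycle, green_start, green):
    """Count stopbar crossings that fall outside the shifted green window."""
    count = 0
    for t in crossing_times:
        phase = (t - (green_start + shift)) % cycle  # time since green start
        if phase >= green:
            count += 1
    return count


def estimate_clock_shift(crossing_times, cycle, green_start, green, step=1.0):
    """Return the shift with the fewest crossings on red (rightmost on ties)."""
    best_shift, best_count = 0.0, None
    for k in range(int(cycle / step)):
        shift = k * step
        count = crossings_on_red(crossing_times, shift, cycle,
                                 green_start, green)
        # "<=" keeps the rightmost shift among ties, so the green start moves
        # as close as possible to the earliest observed departures.
        if best_count is None or count <= best_count:
            best_shift, best_count = shift, count
    return best_shift


# Crossings cluster 22 to 30 s after the nominal green start (cycle of 90 s).
print(estimate_clock_shift([22, 30, 115, 205, 300], 90.0, 0.0, 30.0))
```

In this toy case every shift from 1 to 22 s leaves no crossing on red, and the rightmost one aligns the green start with the earliest observed crossing.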

Stop Identification and Filtering

Once the clock shift has been estimated, we can start extracting features from the trajectories. One of the advantages of using trajectory data is the simple identification of stops. Since the whole trajectory is observed, the stops are those first points where the speed is lower than a given threshold, $α$ . Stop delays are the time differences between each stop and the first subsequent point within a trajectory to exceed a speed threshold, $β$ . These values are closely related to the sampling frequency of the trajectories. In this study, $α$ was set to 0.67 m/s and $β$ was set to 1.5 m/s. These values proved adequate for the time granularity of 3 s found in our data. Unfortunately, not all detected stops are caused by the presence of the signal. Sometimes vehicles stop owing to an obstacle (e.g., a vehicle joining the road from street parking) or because they switch to a lane with a shorter queue. For these reasons, detected stops need to be filtered.

The filtering is carried out by applying a sequence of rules:

If two successive potential stops are closer in time than the merge stop threshold (e.g., 4 s), they are merged: the stop point becomes the candidate closer to the stopbar, and the new stop delay is the sum of the two old stop delays. This is generally the case when a vehicle switches to a shorter queue after having already queued.

The delays of the remaining stops should be longer than the minimal stop delay threshold (e.g., 6 s).

Optionally, stops occurring too far from the stopbar can be filtered out by a quantile rule or by setting a distance threshold, for example, the distance to the closest upstream unsignalized junction.
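The stop detection and the first two filtering rules can be sketched as below. This is an illustrative reading of the rules, not the authors' code: it assumes each trajectory is a time-ordered list of (time, distance to stopbar, speed) samples, and that the merge threshold applies to the time gap between the end of one stop and the start of the next. The optional distance rule is omitted, and all names are hypothetical.

```python
ALPHA, BETA = 0.67, 1.5          # speed thresholds (m/s) quoted in the text
MERGE_GAP, MIN_DELAY = 4.0, 6.0  # example thresholds (s) quoted in the text


def detect_stops(samples):
    """samples: time-ordered (time_s, dist_to_stopbar_m, speed_mps) tuples."""
    stops, start = [], None
    for t, d, v in samples:
        if start is None and v < ALPHA:
            start = (t, d)  # first point below alpha opens a stop
        elif start is not None and v > BETA:
            stops.append({"start": start[0], "end": t,
                          "dist": start[1], "delay": t - start[0]})
            start = None    # first point above beta closes it
    return stops            # a stop still open at the end is dropped


def filter_stops(stops):
    merged = []
    for s in stops:
        if merged and s["start"] - merged[-1]["end"] < MERGE_GAP:
            prev = merged[-1]
            prev["dist"] = min(prev["dist"], s["dist"])  # closer to stopbar
            prev["delay"] += s["delay"]                  # sum of both delays
            prev["end"] = s["end"]
        else:
            merged.append(dict(s))
    return [s for s in merged if s["delay"] > MIN_DELAY]


trajectory = [(0, 60, 8.0), (3, 40, 0.3), (6, 40, 0.0), (9, 39, 2.0),
              (12, 33, 0.2), (15, 33, 0.0), (18, 33, 0.0), (21, 33, 0.0),
              (24, 30, 3.0), (27, 10, 9.0)]
print(filter_stops(detect_stops(trajectory)))
```

In the toy trajectory the vehicle stops 40 m from the stopbar, creeps forward, and stops again at 33 m after a 3 s gap, so both stops are merged into a single stop at 33 m.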

Stop Cycle Labeling

After filtering stops, we can now confidently assign a cycle to each stop. This feature is critical in our model. In normal conditions, a trajectory is expected to stop only once every red light. We first assigned a cycle to each departure point. Then, starting from the departure cycle, we discounted one cycle per additional stop. Note that we did not use any shockwave information. First, the sampling frequency of 3 s found in our data prevented us from accurately detecting shockwave points. Second, for low penetration rates, for which there is usually, at most, one trajectory per cycle and movement, cycle-by-cycle shockwave estimation is too unreliable. This is especially true for oversaturated trajectories in which the departure process exhibits growing variance as we distance ourselves from the estimated stopbar location.
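A minimal sketch of this labeling rule follows. It assumes that a cycle is indexed from the start of red, so that a vehicle stopped on red departs during the green of the same cycle index. The function name and arguments are hypothetical.

```python
import math


def label_stop_cycles(departure_time, n_stops, cycle, red_start):
    """Return one cycle index per stop, ordered from first to last stop.

    The last stop receives the departure cycle; each earlier stop is
    assigned one cycle less, without using any shockwave information.
    """
    dep_cycle = math.floor((departure_time - red_start) / cycle)
    return [dep_cycle - (n_stops - 1 - k) for k in range(n_stops)]


# A vehicle that stopped twice and departed at t = 340 s (cycle of 90 s).
print(label_stop_cycles(340.0, 2, 90.0, 0.0))  # [2, 3]
```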

Marginal Likelihood Estimation

In a signalized intersection approach, incoming vehicles arrive following a Poisson process of mean rate, $λ$ , with Probability Mass Function (PMF) $f (q | λ, t)$ and Cumulative Distribution Function (CDF) $F (q | λ, t)$ , where $q$ is the number of arrivals during a period of time of length $t$ ,

f (q | λ, t) = \frac{e^{- λ t} {(λ t)}^{q}}{q!}

(2)

F (q | λ, t) = \sum_{k = 0}^{q} f (k | λ, t)

(3)
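Equations 2 and 3 translate directly into code. The sketch below evaluates the PMF in log space, which is an implementation choice made here to avoid overflow in the factorial; the function names are hypothetical.

```python
import math


def poisson_pmf(q, lam, t):
    """Probability of q arrivals in an interval of length t at mean rate lam."""
    mu = lam * t
    if mu == 0.0:
        return 1.0 if q == 0 else 0.0
    # exp(-mu) * mu**q / q!, computed through logarithms for stability.
    return math.exp(-mu + q * math.log(mu) - math.lgamma(q + 1))


def poisson_cdf(q, lam, t):
    """Probability of at most q arrivals in an interval of length t."""
    return sum(poisson_pmf(k, lam, t) for k in range(q + 1))


lam = 400 / 3600  # 400 veh/h expressed in veh/s
print(poisson_pmf(5, lam, 60.0), poisson_cdf(5, lam, 60.0))
```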

The approach was considered long enough for arrivals to be independent, where incoming vehicles were first detected at a distance, $L$ , from the signal. Only a proportion, $p$ , of trajectories was observed. The approach had a stopbar located at a distance, $δ$ , from such a signal. We assumed that the signal had a pretimed control, and the signal plan was known. Therefore, cycle length, $C$ , effective green length, $g$ , and green start, $g_{s}$ , values were known. Nonetheless, the method was also applicable for adaptive or actuated control cases.

The mean rate, $λ$ , was estimated by maximum marginal likelihood. Before specifying the likelihood function, we first needed to define the full set of distinct statistical events that described the partially observed stochastic process. This process was inferred from the interaction of the observed trajectories with the signal plan. Each event was defined by a pair of consecutive observed trajectories, $i, i'$ , their stop locations and durations, and the relative position of the signal plan with regard to such trajectories. For every pair of consecutive observed trajectories, the first arrived trajectory, $i$ , is referred to as “first,” and the second arrived trajectory, $i'$ , as “second.”

The definition of such event classes must first be exhaustive, that is, cover the whole process distribution, and second, mutually exclusive, that is, the probability of such classes must not overlap. These statistical events can be broadly classified by the number of stops each trajectory has and the number of cycles spanning the departure of the first trajectory and the arrival of the second:

Both first and second trajectories have a stop during the same cycle;

The two trajectories do not have a stop during the same cycle, and

(a) The first trajectory is free-flow (zero stops), the second one has at least one stop;

(b) Both trajectories are free-flow (zero stops);

(c) Both trajectories have at least one stop, but not during the same cycle.

The cycle number difference is the main concept in the event definition. Let us assume that cycles within a time of day (TOD) are indexed from 1 to $m$ . We define the cycle number difference as the difference between the departure cycle index of the first trajectory and the arrival cycle index of the second trajectory. If two trajectories arrive or have a stop during the same cycle, the cycle number difference is zero. Cycle number difference is calculated using the previously described stop cycle label feature.

Table 1 enumerates all the statistical event types. There are 10 different types, defined by all the possible combinations of stops and cycle number differences. However, these 10 events can be expressed probabilistically in just four likelihood expression types, since some of these events have similarities and only the definition of the input parameters changes. Figure 1 illustrates six of these statistical events. Blue dashed lines represent departure shockwaves and orange dashed lines represent projected beginning of red times at free-flow speed.

Table 1.

Statistical Event Types and Their Likelihood Expressions

First traj.	Second traj.	Cycle number difference	Likelihood case	Likelihood expression
Stop	Stop	(Have stop in same cycle)	Distrib. PMF	$f (n | λ, τ)$
		1	Convol. PMF	$g (n | λ, t_{1}, t_{2}, \tilde{n})$
		$> 1$	Distrib. PMF	$f (n | λ, t_{2})$
	Free-flow	0	Distrib. CDF	$F (n | λ, τ) - f (0 | λ, τ)$
		1	Convol. CDF	$G (n | λ, t_{1}, t_{2}, \tilde{n}) - g (0 | λ, t_{1}, t_{2}, \tilde{n})$
		$> 1$	Distrib. CDF	$F (n | λ, t_{2}) - f (0 | λ, t_{2})$
Free-flow	Stop	0	$∄$	1
	Stop	$\geq 1$	Distrib. PMF	$f (n | λ, t_{2})$
	Free-flow	0	Distrib. CDF	$F (n | λ, τ) - f (0 | λ, τ)$
	Free-flow	$\geq 1$	Distrib. CDF	$F (n | λ, t_{2}) - f (0 | λ, t_{2})$

Note: Distrib. = Distribution; PMF = Probability Mass Function; Convol. = Convolution; CDF = Cumulative Distribution Function.

Figure 1.

Example diagrams of some statistical event cases: (a) both trajectories stopped in the same cycle, (b) second trajectory stopped with cycle difference larger than one, (c) first trajectory stopped and second free-flow in contiguous cycles, (d) both trajectories stopped in contiguous cycles, (e) both trajectories free-flow in contiguous cycles, and (f) first trajectory free-flow and second stopped in contiguous cycles. Shockwave lines are shown for reference only and are not used in the event definition.

Suppose that either cycle number difference is equal to 0 or both trajectories have a stop in the same cycle (see Figure 1a); then, the arrival interval length is equal to $τ$ , the difference between the two arrival times. If we observe a (partial) queue of $n$ vehicles during a time interval, the likelihood of such an event is equal to the PMF, $f$ , of the arrival process distribution over that interval. If we observe instead an upper bound, the likelihood term is the CDF, $F$ , of the arrival process distribution.

The value of $n$ is obtained by dividing the distance between the stopbar and the stop location or, alternatively, the distance between the two stops (if they are in the same cycle), by the jam spacing, that is, the inverse of the jam density. When $n$ is an upper bound from an observed nonstopped trajectory, it is calculated by dividing the observed used green time interval by the saturation headway, assumed to be known a priori.
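As a small worked example of these two conversions, the sketch below turns a stop distance into a vehicle count through the jam spacing (the inverse of the jam density) and a used green interval into an upper bound through the saturation headway. Rounding to the nearest integer is an assumption made here, and the function names are hypothetical.

```python
def queue_from_distance(distance_m, jam_spacing_m):
    """Observed (partial) queue: vehicles that fit in the given distance."""
    return int(round(distance_m / jam_spacing_m))


def upper_bound_from_green(used_green_s, sat_headway_s):
    """Upper bound on the queue implied by the green time already used."""
    return int(round(used_green_s / sat_headway_s))


# A stop 21 m upstream of the stopbar with 7 m/veh of jam spacing.
print(queue_from_distance(21.0, 7.0))       # 3
# A free-flow crossing 8.6 s into green with a 2.15 s/veh headway.
print(upper_bound_from_green(8.6, 2.15))    # 4
```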

When the cycle number difference is equal to 1, the queue in the second cycle may depend on the arrivals during the first cycle after the first trajectory, whenever there is oversaturation. Thus, if the first trajectory has stops (see Figure 1, a , c and d ), the arrival interval length is equal to $τ$ . The relationship with the observed queue is now more complex since some of the arrivals will contribute to the first cycle’s queue formation and others to the second one. If there is oversaturation, a residual queue will form at the beginning of the second cycle. The arrival interval is now split into two subintervals, $t_{1} = t_{C} - a_{i}$ and $t_{2} = a_{i'} - t_{C}$ , where $t_{C}$ is the projected start time of the second cycle. Arrivals during $t_{1}$ will either depart during the first cycle or build a residual queue equal to $\max (n_{1} - s \tilde{g}, 0)$ during the second cycle, where $n_{1}$ is the number of arrivals during $t_{1}$ ; $\tilde{g}$ is the observed residual green; and $\tilde{n}$ denotes the number of vehicles departing during $\tilde{g}$ . $\tilde{n}$ is obtained by dividing the residual green time interval by the saturation headway.

Let $N_{1} \sim f (λ, t_{1})$ be the random number of arrivals during $t_{1}$ . Let $R_{1} = \max (N_{1} - s \tilde{g}, 0)$ be the random variable of the residual queue of the following cycle. The PMF of the residual queue, $R_{1}$ , is defined by

f_{R_{1}} (r | λ, t_{1}, \tilde{n}) = \begin{cases} \sum_{k = 0}^{\tilde{n}} \frac{e^{- λ t_{1}} {(λ t_{1})}^{k}}{k!} & r = 0 \\ \frac{e^{- λ t_{1}} {(λ t_{1})}^{r + \tilde{n}}}{(r + \tilde{n})!} & r > 0 \end{cases}

(4)

where $f_{R_{1}}$ is the density, $f$ , of the arrival process shifted by $\tilde{n}$ , and censored at 0; and $s$ is the saturation flow-rate. Let $N_{2} | N_{1} \sim f (λ, t_{2}, S_{N_{1} = n_{1}})$ be the random number of arrivals during $t_{2}$ , given $n_{1}$ arrivals during $t_{1}$ , the latest arrival occurring at $S_{N_{1} = n_{1}}$ . The random variable corresponding to the queue, $Q$ , during the next cycle is the sum of the residual queue plus the conditional number of arrivals: $Q = R_{1} + (N_{2} | N_{1})$ .

However, since exponential arrivals are memoryless, we can break the dependency on $N_{1}$ . The distribution of $Q$ on the next cycle, $g (q | λ, t_{1}, t_{2}, \tilde{n})$ , then becomes

\begin{aligned} g (q | λ, t_{1}, t_{2}, \tilde{n}) & = f_{N_{2}} * f_{R_{1}} \\ & = \sum_{r = 0}^{q} f_{N_{2}} (q - r | λ, t_{2}) \, {F (\tilde{n} | λ, t_{1})}^{I (r = 0)} \, {f (r + \tilde{n} | λ, t_{1})}^{1 - I (r = 0)} \end{aligned}

(5)

where $G$ is its CDF,

G (q | λ, t_{1}, t_{2}, \tilde{n}) = \sum_{i = 0}^{q} g (i | λ, t_{1}, t_{2}, \tilde{n})

(6)
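Equations 4 to 6 can be checked numerically with a short sketch. It reads the residual queue PMF as the Poisson CDF at $\tilde{n}$ when $r = 0$ and as the Poisson PMF at $r + \tilde{n}$ when $r > 0$, and then convolves it with the arrivals during $t_{2}$. The function names and the numeric values are hypothetical.

```python
import math


def pois_pmf(q, lam, t):
    mu = lam * t
    if mu == 0.0:
        return 1.0 if q == 0 else 0.0
    return math.exp(-mu + q * math.log(mu) - math.lgamma(q + 1))


def pois_cdf(q, lam, t):
    return sum(pois_pmf(k, lam, t) for k in range(q + 1))


def residual_pmf(r, lam, t1, n_tilde):
    """PMF of the residual queue: arrivals shifted by n_tilde, censored at 0."""
    if r == 0:
        return pois_cdf(n_tilde, lam, t1)   # all arrivals were served
    return pois_pmf(r + n_tilde, lam, t1)   # r vehicles left over


def conv_pmf(q, lam, t1, t2, n_tilde):
    """PMF of the queue: residual queue plus arrivals during t2."""
    return sum(pois_pmf(q - r, lam, t2) * residual_pmf(r, lam, t1, n_tilde)
               for r in range(q + 1))


def conv_cdf(q, lam, t1, t2, n_tilde):
    return sum(conv_pmf(i, lam, t1, t2, n_tilde) for i in range(q + 1))


lam, t1, t2, n_tilde = 0.12, 40.0, 50.0, 3
print(conv_pmf(6, lam, t1, t2, n_tilde), conv_cdf(6, lam, t1, t2, n_tilde))
```

When $\tilde{n}$ is large, no residual queue can form and the convolution collapses to the plain Poisson PMF over $t_{2}$, which gives a quick sanity check.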

Let us assume $Q$ follows the convolution distribution. If $q$ is an observed partial queue, the likelihood term will be the PMF, $g (q)$ , of the convolution distribution. If what we observe is an upper bound, the likelihood term will equal the CDF of the convolution distribution at $q$ , $G (q)$ , minus $g (0)$ , since only one trajectory is observed.

If the first trajectory is free-flow, arrivals during both cycles are considered independent since oversaturation is unlikely to happen (see Figure 1, e and f ). Therefore, the arrival interval will start at $t_{C}$ , the projected start time of the second trajectory’s arrival cycle. The arrival interval length is $t_{2} = a_{i'} - t_{C}$ . If there are additional cycles between the observed trajectories (see Figure 1b; trajectory $i$ not shown), the start time of the interval will be $t_{C}$ , and its length $t_{2} = a_{i'} - t_{C}$ . The use of PMF or CDF is analogous to that of previous cases. We truncated this for two reasons: first, multiple cycles imply multiple levels of convolutions, which make the computation too complicated; second, the likelihood value of the expression becomes very small, and dominated by the $f, F, g$ , and $G$ terms. Finally, if the first trajectory is free-flow, the second one has stops and the cycle number difference is equal to zero, this means that the trajectories belong to different lanes and can be modeled as independent intervals, starting from the preceding trajectory.

The total marginal likelihood expression is

\begin{aligned} L (λ) = & \prod_{j \in J_{f}} f (n | λ, θ_{j}) \prod_{j \in J_{F}} \left[ F (n | λ, θ_{j}) - f (0 | λ, θ_{j}) \right] \\ & \times \prod_{j \in J_{g}} g (n | λ, θ_{j}) \prod_{j \in J_{G}} \left[ G (n | λ, θ_{j}) - g (0 | λ, θ_{j}) \right] \end{aligned}

(7)

where $J_{f}, J_{F}, J_{g}, J_{G}$ are the sets of intervals corresponding to each likelihood case; and $θ_{j} = (τ^{j}, t_{1}^{j}, t_{2}^{j}, {\tilde{n}}_{j}), \forall j \in J^{•}$ are the vectors of observed parameters for each interval, $j$ . Before maximizing the loglikelihood, $LL (λ)$ , we make the following observation:

Proposition 1. If interarrival times are exponentially distributed, $LL (λ)$ is globally concave.

Proof:

$LL (λ)$ is composed of the sum of four families of terms. Given that $λ > 0$ , the log-Poisson PMF terms are concave. The logarithms of the convolution PMF terms are globally concave since the sum of products of Poisson terms preserves concavity. The Poisson CDF terms are concave as well since they are composed of sums of Poisson PMF expressions. The convolution CDF terms are concave for the same reason. Within each family of terms, the sum does not alter concavity. Thus, $LL (λ)$ is concave.

Since $LL (λ)$ is globally concave and univariate, it can easily be optimally solved by any gradient-based method, or even the bisection method; there is no need for approximate variational procedures such as the expectation-maximization algorithm as in research by Zheng and Liu ( 1 ).
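To illustrate how simple the one-dimensional maximization is, the sketch below maximizes a reduced loglikelihood that contains only the distribution PMF and CDF terms. It uses a golden-section search, a derivative-free method chosen here for brevity in place of the gradient-based or bisection methods mentioned above, and it relies on the concavity stated in Proposition 1. The convolution terms are omitted, and all names and data are hypothetical.

```python
import math


def pois_pmf(q, lam, t):
    mu = lam * t
    return math.exp(-mu + q * math.log(mu) - math.lgamma(q + 1))


def pois_cdf(q, lam, t):
    return sum(pois_pmf(k, lam, t) for k in range(q + 1))


def log_likelihood(lam, pmf_obs, cdf_obs):
    """pmf_obs: (n, t) observed queues; cdf_obs: (n, t) upper bounds, n >= 1."""
    ll = 0.0
    for n, t in pmf_obs:
        ll += math.log(pois_pmf(n, lam, t))
    for n, t in cdf_obs:
        ll += math.log(pois_cdf(n, lam, t) - pois_pmf(0, lam, t))
    return ll


def golden_max(fn, lo, hi, tol=1e-8):
    """Golden-section search for the maximizer of a unimodal function."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = fn(c), fn(d)
    while b - a > tol:
        if fc > fd:                      # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = fn(c)
        else:                            # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = fn(d)
    return (a + b) / 2.0


def estimate_rate(pmf_obs, cdf_obs, lo=1e-4, hi=1.0):
    return golden_max(lambda lam: log_likelihood(lam, pmf_obs, cdf_obs), lo, hi)


queues = [(3, 30.0), (5, 45.0), (2, 25.0)]   # (vehicles, seconds)
bounds = [(4, 20.0)]                         # upper bound from a free-flow pass
print(estimate_rate(queues, bounds) * 3600)  # estimated volume in veh/h
```

With PMF terms only, the maximizer has the closed form of total vehicles divided by total time, which provides a convenient check of the search.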

For situations where the mean arrival flow is time-varying, that is, a nonhomogeneous Poisson process, the estimator may be used for shorter periods of time. Consecutive periods can be estimated independently, that is, as a homogeneous Poisson process, by using either overlapping or nonoverlapping windows. The simulation sensitivity results detailed in the next section can be used as an upper bound on the efficiency of the estimator when using real-world data, assisting in the selection of an appropriate period length.

Last but not least, for situations in which the arrival pattern depends on an upstream junction, the simplest way to include such dependence is to follow Zheng and Liu’s study ( 1 ). First, the arrival timestamps of the trajectories are binned by time slices, $t$ (e.g., 5 or 10 s long), modulo cycle length, obtaining a normalized cumulative frequency distribution, $P (i) = \int_{t_{i}}^{t_{i'}} p (t) dt$ , where $p (t)$ is the normalized frequency distribution modulo cycle length. We express this as an integral for notational convenience. Then, the distribution, $P (i)$ , is scaled by the volume parameter, $λ$ . In other words, the $λ t_{i}$ terms in the likelihood are replaced by $λ P (i)$ .
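The binning step can be sketched as follows. This is only an illustration of the idea: it assumes that the cycle length is a multiple of the bin length and treats the profile as piecewise constant within each bin. The function names are hypothetical.

```python
def arrival_profile(arrival_times, cycle, bin_len):
    """Normalized frequency of arrival timestamps, binned modulo cycle length."""
    n_bins = int(round(cycle / bin_len))
    counts = [0] * n_bins
    for t in arrival_times:
        counts[int((t % cycle) // bin_len) % n_bins] += 1
    total = sum(counts)
    return [c / total for c in counts]


def profile_mass(profile, t_start, t_end, bin_len):
    """Integrate the piecewise-constant profile between two timestamps."""
    mass, t = 0.0, float(t_start)
    n_bins = len(profile)
    while t < t_end:
        idx = int(t // bin_len)
        if (idx + 1) * bin_len <= t:   # guard against floating-point stalls
            idx += 1
        seg_end = min((idx + 1) * bin_len, t_end)
        mass += profile[idx % n_bins] * (seg_end - t) / bin_len
        t = seg_end
    return mass


profile = arrival_profile([5, 95, 185, 50, 140], cycle=90, bin_len=10)
print(profile)
print(profile_mass(profile, 85, 100, bin_len=10))
```

The returned mass plays the role of $P (i)$ for the interval between two trajectories and would replace the interval length in the $λ t$ terms.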

Simulation Results

The method was tested via an event-based simulation, in which both time and space dimensions were continuous. The traffic dynamics followed an LWR model ( 15 ). Saturation headway was set to 2.15 s/veh, saturation spacing to 7 m/veh, and free-flow speed to 10 m/s. The signal had a cycle length of 90 s with an effective green of 30 s for that movement. Interarrival times were exponentially distributed with rate parameter $λ$ . Figure 2 shows an example of the simulated trajectories, in which the travel time has been subtracted from the time axis, as in research by Newell ( 16 ). Trajectories marked in blue are the observed trajectories.

Figure 2.

Event-based simulated trajectories.

The purpose of this event-based simulation was threefold. First, we avoided any biases in the generation of the arrival process owing either to time discretization or to Poisson approximations via negative-binomial distributions, commonly employed in microsimulation software (see the specific warning in SUMO documentation [ 17 ]). Second, this simulation was free from trajectory measurement errors and signal plan estimation inaccuracy. Third, driver behavior added no uncertainty, such as the yellow light dilemma or anticipated deceleration when facing red lights. In summary, our event-based simulation isolated the stochasticity of the upstream arrival process from the other uncertainties of the indirect observational process, suggesting a best-case performance guarantee on the bias and the consistency of the estimator.

The simulation was run over several parameter values: four flow-rate values of 200, 300, 400, and 500 vph; penetration rates of 5%, 10%, 20%, and 50%; and time horizons of 0.5, 1, 2, and 10 h. This range of parameters was designed to explore efficiency and bias from a quasi-real-time setting to a TOD with several days of data (5 weekdays). The resulting $υ / c$ ratios were 0.36, 0.54, 0.72, and 0.9. The percentages of oversaturated cycles were 19% for a $υ / c$ equal to 0.9; 2% for a $υ / c$ equal to 0.72; and 0% for the rest. Scenarios with a larger $υ / c$ were omitted because the volume estimation in continuous oversaturation spanning multiple cycles ( $υ / c \geq$ 1) was trivial: not only was the departure process roughly deterministic, but the volumes could be obtained from directly measuring the difference between stop positions. For each distinct scenario, 20 different seeds were simulated. In total, we carried out 4 × 4 × 4 × 20 = 1,280 experiments.
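The reported $υ / c$ ratios follow from the simulation settings, as the short check below shows. It assumes that capacity is the green ratio multiplied by the saturation flow, the latter being the inverse of the saturation headway. The function name is hypothetical.

```python
def vc_ratio(flow_vph, cycle_s, green_s, sat_headway_s):
    """Volume-to-capacity ratio of a single-lane signalized movement."""
    saturation_flow_vph = 3600.0 / sat_headway_s
    capacity_vph = (green_s / cycle_s) * saturation_flow_vph
    return flow_vph / capacity_vph


ratios = [round(vc_ratio(q, 90.0, 30.0, 2.15), 2) for q in (200, 300, 400, 500)]
print(ratios)            # [0.36, 0.54, 0.72, 0.9]
print(4 * 4 * 4 * 20)    # 1280 experiments
```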

Volume estimator accuracy was measured by the absolute percentage error: $\frac{λ^{*}}{λ} - 1$ against the sample size, instead of the unknown penetration rate as is normally reported in the literature. Given a study period of length $T$ , estimator consistency was evaluated against sample size $n_{s}$ which the practitioner or researcher could measure directly from the sample. In fact, sample density is the sample estimator of the probe-vehicle trajectory volume, which is the product of the two unknown random variables,

\frac{n_{s}}{T} = p λ

(8)
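Equation 8 can be illustrated by thinning a simulated arrival stream. The sketch below draws exponential interarrival times, keeps each vehicle as a probe with probability $p$ , and compares the sample density with $p λ$ . It is a toy check under these assumptions, not the event-based LWR simulation used in the experiments, and the names are hypothetical.

```python
import random


def simulate_observed_count(lam_vph, p, hours, seed=0):
    """Count probe trajectories in a Poisson stream thinned with probability p."""
    rng = random.Random(seed)
    rate = lam_vph / 3600.0          # mean arrival rate in veh/s
    horizon = hours * 3600.0
    t, n_obs = 0.0, 0
    while True:
        t += rng.expovariate(rate)   # exponential interarrival time
        if t > horizon:
            break
        if rng.random() < p:         # vehicle is an observed probe
            n_obs += 1
    return n_obs


n_s = simulate_observed_count(lam_vph=400, p=0.1, hours=10.0)
print(n_s / 10.0, "observed veh/h versus p * lambda =", 0.1 * 400)
```

Only the product $p λ$ is identified by the sample density, so the stop information in the likelihood is what separates the volume from the penetration rate.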

Figure 3 shows the relative centered (by subtracting 1) estimator bias versus the sample size, as a function of the $υ / c$ ratio, penetration rate, and length of study period. The two horizontal red dashed lines represent +/−10% bias bands and the blue ones +/−5% bias bands. We observed that for sample sizes over 20 trajectories, most of the instances for medium to high $υ / c$ ratio fell within the 10% error range. For low volumes, most trajectories provided an upper bound of queues, and bias became more polarized at the extremes. Had only free-flow trajectories been observed, it would not have been possible to estimate volumes at all since there was an identification problem: infinite combinations of $p$ and $λ$ values could produce the same sample several times.

Figure 3.

Flow-rate relative bias against absolute sample size, time horizon length in hours, penetration rate, and $υ / c$ ratio.

We also observed that the estimator was generally consistent. The larger the sample, the lower the variance of the individual biases, until average biases of less than 5% were reached for periods of 10 h.

Table 2 analyzes our estimator’s bias and efficiency as a function of the main sensitivity parameters: $υ / c$ ratio, penetration rate, $p$ , and time horizon length, $T$ . Time horizon is the time interval length (in hours) over which the volume is estimated. Our proposed method is compared with Zheng and Liu’s ( 1 ) for reference since ours extends it. We observed that the two estimators performed very similarly for very low volumes. Since data in such situations were sparse, the statistical events that composed the likelihood were practically the same, boiling down to a combination of only distribution PMF and CDF terms. However, as the $υ / c$ ratio increased, the new statistical event categories in the proposed method, represented by the convolution PMF and CDF, started gaining weight. In Zheng and Liu’s estimator, such situations were simply ignored ( 1 ). As a result, bias was reduced in the proposed method.

Table 2.

Estimator Bias and Efficiency Comparison

$υ / c$	$p$	$T$ (h)	$b_{prop}$ (%)	$std_{prop}$ (%)	$b_{zheng}$ (%)	$std_{zheng}$ (%)	$Δ b$ (%)	$Δ$ std (%)
0.36	0.05	1	20.6	24.3	20.7	24.4	−0.0	−0.0
	0.05	10	11.7	6.8	11.7	6.8	0.0	0.0
	0.10	1	19.9	15.5	19.9	15.5	−0.0	0.0
	0.10	10	18.2	4.4	18.3	4.4	−0.1	−0.0
	0.20	1	17.3	9.2	17.3	9.2	0.0	0.0
	0.20	10	21.9	4.2	21.9	4.2	0.0	0.0
0.54	0.05	1	13.1	15.5	13.5	15.9	−0.4	−0.4
	0.05	10	4.3	3.6	4.5	3.7	−0.2	−0.1
	0.10	1	11.8	8.6	12.5	9.4	−0.7	−0.7
	0.10	10	6.1	3.2	6.3	3.3	−0.2	−0.1
	0.20	1	8.3	7.8	8.8	8.1	−0.5	−0.3
	0.20	10	8.3	1.7	8.6	1.8	−0.3	−0.1
0.72	0.05	1	7.0	11.8	8.1	11.2	−1.1	0.5
	0.05	10	4.2	3.5	5.8	3.6	−1.6	−0.1
	0.10	1	8.2	7.1	11.1	7.6	−2.9	−0.5
	0.10	10	3.8	1.8	6.2	2.1	−2.4	−0.4
	0.20	1	5.7	2.8	8.4	3.0	−2.8	−0.2
	0.20	10	3.6	1.8	6.1	2.2	−2.5	−0.4
0.90	0.05	1	11.7	14.1	21.5	15.6	−9.8	−1.5
	0.05	10	12.8	4.2	23.6	6.2	−10.8	−2.0
	0.10	1	5.8	4.6	15.2	5.9	−9.4	−1.4
	0.10	10	4.8	1.1	13.9	1.5	−9.2	−0.4
	0.20	1	2.5	1.9	8.4	2.7	−5.8	−0.8
	0.20	10	1.4	0.5	7.8	1.0	−6.4	−0.5

Note: b = bias (in %); std = standard deviation of bias (in %); Δ b = bias difference (in %); Δ std = difference in efficiency (in %); prop = proposed method; zheng = Zheng and Liu ( 1 ).
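As a reading aid for Table 2, the following sketch shows one way such quantities could be computed from repeated simulation runs. It assumes that the bias is the mean of the relative errors $\hat{λ} / λ - 1$, that std is their sample standard deviation, and that the Δ columns are the proposed method minus the reference method. The estimates below are hypothetical and are not the data behind Table 2.

```python
import statistics


def bias_and_std(estimates, true_volume):
    """Return the mean relative bias and its standard deviation, both in %."""
    rel = [100.0 * (est / true_volume - 1.0) for est in estimates]
    return statistics.mean(rel), statistics.stdev(rel)


true_volume = 500.0  # veh/h, hypothetical ground truth
est_prop = [510.0, 530.0, 495.0, 520.0]   # hypothetical proposed-method runs
est_zheng = [540.0, 560.0, 515.0, 550.0]  # hypothetical reference-method runs

b_prop, std_prop = bias_and_std(est_prop, true_volume)
b_zheng, std_zheng = bias_and_std(est_zheng, true_volume)

# Negative deltas favor the proposed method (lower bias, lower spread).
delta_b = b_prop - b_zheng
delta_std = std_prop - std_zheng
print(f"b_prop={b_prop:.2f}%  b_zheng={b_zheng:.2f}%  delta_b={delta_b:.2f}")
```

Differences computed from the rounded table entries can deviate from the listed Δ values by 0.1 because the published columns were presumably rounded after subtraction.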

Within the $υ / c = 0.9$ cases, bias was largest at $p = 0.05$ for both estimators, though it was much lower in the proposed estimator than in Zheng and Liu’s ( 1 ). As the penetration rate decreased, the percentage of cycles without observed trajectories increased. Moreover, time intervals between the observed trajectories became longer and, as a consequence, the average difference in cycle number between consecutive trajectories increased. For computational reasons, we did not fully develop the convolution terms for differences of more than one cycle. Instead, we truncated the time interval at the closest signal plan start, so the likelihood terms representing these events introduced bias. For undersaturated conditions with low penetration rates this was not an issue: arrivals during different cycles could be seen as independent. For oversaturated cycles, however, this was no longer the case and bias increased.

Concerning the other sensitivity variables, as penetration rate $p$ increased, both estimators generally became less biased and more accurate. The same trend was observed when increasing the time horizon, $T$ . Last but not least, there was some marginal gain in efficiency and accuracy (expressed in the two rightmost columns of Table 2). In conclusion, the proposed estimator empirically dominated Zheng and Liu’s ( 1 ): its bias was lower or equal in every test situation, and its standard deviation was lower or equal in all but one ( $υ / c = 0.72$ , $p = 0.05$ , $T = 1$ h).

Field Test Results

We proceeded to test the estimator at a real-world experimental junction managed by the authors in Haidian district, Beijing, China. This was a four-leg intersection with one lane per approach, on which left-turn, through, and right-turn movements were allowed. The location of the intersection is shown in Figure 4. Video captures of the north and east approaches are shown in Figure 5.

Figure 4.

Test junction location map.

Figure 5.

Video detail of (a) north and (b) east approaches.

We recorded traffic on the north approach for 55 min, from 16:35 to 17:30 on September 10, 2019. Vehicle counts were done manually. Figure 6 shows the signal plan that was active during that period of time. Cycle length was equal to 60 s. To minimize conflicts in crossings and delays, one approach started 5 s later than the other for each pair of parallel approaches, whereas the other terminated 5 s earlier.

Figure 6.

Experimental intersection signal plan.

The first step was to estimate stopbar locations. For completeness, we estimated the stopbar location for the four approaches. Figure 7 shows the results. The score peak was sharp for all approaches. For the north approach, however, the peak was slightly less clear, perhaps because of its high right-turn ratio.

Figure 7.

Stopbar location estimation score against stopbar distance. Optimal locations marked by the red line.

After calibrating the clock shift of the signal plan, we needed to estimate the other traffic-related parameters: jam density, saturation headway, and free-flow speed. From observations of the video, we set the jam density to the equivalent of a 6.5 m/veh jam spacing. The average saturation headway was estimated by measuring the queue discharge of all trajectories that stopped more than once and averaging the results. Estimating this parameter in this way was particularly important since the speed bumps slowed down the queue discharge process. Queue discharge measurements were taken from the original 6-h time-of-day (TOD) data. The final value was 3.8 s/veh, significantly longer than the standard values used for ideal conditions; that is, the saturation flow was significantly lower. The free-flow speed was set to the 95th percentile of the speeds at which the observed trajectories entered the link. The link length of the study approach was limited to 150 m since a side street joined the studied approach at that distance. Moreover, right-turn trajectories crossing on red were deleted, as were any other trajectories with significant issues.

Table 3 shows the original vehicle counts, measured volumes, penetration rates, and volume estimates for the north and east approaches for both estimators. The $υ / c$ ratio of the north approach was 0.77; the east approach was less congested, with a $υ / c$ ratio close to 0.49. The north approach had four oversaturated cycles whereas the east approach had none. The final relative biases of our proposed method were 9.3% and 13.2%, which fell within the expected range of the simulation results. They were also smaller than those of Zheng and Liu’s estimator ( 1 ), which were 10.7% and 15.4%.
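The parameter choices described above can be made concrete with a small sketch. It converts the measured 3.8 s/veh saturation headway into a saturation flow, treats the 6.5 m/veh value as the spacing per queued vehicle, and takes a 95th percentile of entry speeds. The helper name `vehicles_ahead` and the speed sample are hypothetical; only the two constants come from the text.

```python
import statistics

SAT_HEADWAY_S = 3.8  # s/veh, measured value reported in the text
JAM_SPACING_M = 6.5  # m/veh, value reported in the text

# Saturation flow implied by the measured discharge headway.
sat_flow_veh_h = 3600.0 / SAT_HEADWAY_S  # roughly 947 veh/h


def vehicles_ahead(stop_distance_m, spacing_m=JAM_SPACING_M):
    """Whole vehicles that fit between the stopbar and a stopped probe."""
    return int(stop_distance_m // spacing_m)


# Hypothetical link-entry speeds of probe trajectories (m/s).
entry_speeds = [8.0, 9.5, 10.0, 11.0, 12.5, 13.0]

# 95th percentile: last of the 19 cut points splitting the data into 20 parts.
free_flow_speed = statistics.quantiles(
    entry_speeds, n=20, method="inclusive"
)[-1]

print(f"saturation flow: {sat_flow_veh_h:.0f} veh/h")
print(f"vehicles ahead of a probe stopped 40 m upstream: {vehicles_ahead(40.0)}")
print(f"free-flow speed: {free_flow_speed:.2f} m/s")
```

The `inclusive` method is used here so that the percentile stays within the observed speed range for a small sample; this is a choice of the sketch, not something stated in the text.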

Table 3.

Field Test Statistics and Volume Estimates

Direction	North	East
Counts	301	161
$λ$	328.4	210.0
$p$ , %	7.1	4.6
$λ_{proposed}$	358.8	237.7
$λ_{zheng}$	363.4	242.3
$b_{proposed}$ , %	9.3	13.2
$b_{zheng}$ , %	10.7	15.4

Note: $λ$ = average volume (in veh/h); $p$ = penetration rate; $λ_{proposed}$ and $λ_{zheng}$ = estimated average volumes (in veh/h) from the proposed method and from Zheng and Liu ( 1 ), respectively; $b_{proposed}$ and $b_{zheng}$ = bias of the proposed method and of Zheng and Liu ( 1 ), respectively (in %).
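The arithmetic behind Table 3 can be checked with a short sketch. It assumes that the measured volume is the manual count scaled to one hour and that the bias is the relative difference between the estimated and measured volumes. Only the north volume is recomputed from its count, since the 55-min recording period is stated for that approach. The function names are illustrative.

```python
def hourly_volume(count_veh, duration_min):
    """Convert a vehicle count over duration_min minutes to veh/h."""
    return count_veh * 60.0 / duration_min


def relative_bias_pct(estimate, reference):
    """Relative bias of an estimate against a reference volume, in %."""
    return 100.0 * (estimate / reference - 1.0)


# North approach: 301 vehicles counted over the 55-min recording.
north_volume = hourly_volume(301, 55)
print(f"{north_volume:.1f}")  # 328.4 veh/h, as in Table 3

# Biases recomputed from the rounded volumes listed in Table 3.
biases = {
    ("north", "proposed"): relative_bias_pct(358.8, 328.4),
    ("north", "zheng"): relative_bias_pct(363.4, 328.4),
    ("east", "proposed"): relative_bias_pct(237.7, 210.0),
    ("east", "zheng"): relative_bias_pct(242.3, 210.0),
}
for key, value in biases.items():
    print(key, f"{value:.1f}%")  # 9.3%, 10.7%, 13.2%, 15.4%
```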

Conclusions

We proposed a new method to estimate traffic volumes at signalized intersections using only probe-vehicle trajectories, which was able to effectively address oversaturated conditions and residual queues. The flow-rate estimate was obtained via maximum marginal likelihood, in which each observation corresponded to an interval defined between two consecutive trajectories. Several types of statistical events were specified in the likelihood as a function of the observed stop positions, their position relative to the signal plan, and the difference in cycles in which these stops were located.

The performance of the estimator was first evaluated via simulation. We concluded that the estimator presented low bias and was consistent enough to provide estimates with less than 10% error in most situations. The estimator performed worst for low saturation ratios, in which an identifiability issue became prevalent. The method was also tested at a real-world intersection with validated camera traffic counts, with relative errors of 9.3% and 13.2% on the test approaches.

We now identify several points for further research. First, the algorithm was tested assuming a Poisson arrival process for convenience. However, it could be adapted to a more general renewal process, capturing potential under- or overdispersion in the arrivals. Although inferring the actual parameters of complex arrival models may be challenging, an alternative could be to test more flexible count distributions. One good candidate would be the generalized Poisson distribution, which has only two parameters.
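To illustrate the kind of count distribution meant here, the sketch below evaluates one common two-parameter form of the generalized Poisson distribution, usually attributed to Consul, with $P(X = k) = θ (θ + k δ)^{k-1} e^{-θ - k δ} / k!$ . The symbols $θ$ and $δ$ are chosen for this sketch to avoid clashing with the volume $λ$ ; the restriction $0 ≤ δ < 1$ covers only the overdispersed case, and the parameter values are hypothetical.

```python
import math


def generalized_poisson_pmf(k, theta, delta):
    """Generalized Poisson PMF for theta > 0 and 0 <= delta < 1.

    With delta = 0 this reduces to the ordinary Poisson PMF.
    """
    log_p = (
        math.log(theta)
        + (k - 1) * math.log(theta + k * delta)
        - theta
        - k * delta
        - math.lgamma(k + 1)
    )
    return math.exp(log_p)


theta, delta = 3.0, 0.3  # hypothetical parameter values
support = range(200)     # truncation; the remaining tail mass is negligible
pmf = [generalized_poisson_pmf(k, theta, delta) for k in support]

mean = sum(k * q for k, q in zip(support, pmf))
var = sum((k - mean) ** 2 * q for k, q in zip(support, pmf))

# A variance-to-mean ratio above 1 indicates overdispersion relative to Poisson.
print(f"mean={mean:.3f}  variance={var:.3f}  ratio={var / mean:.3f}")
```

For this parameterization the mean is $θ / (1 - δ)$ and the variance is $θ / (1 - δ)^3$ , so a single extra parameter controls the dispersion.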

To validate its generalizability, the estimator could be tested at more intersections, especially intersections with multiple lanes and coordinated approaches. We plan to explore using convolutions for cases in which there are long periods of missing data, especially when the following trajectory is oversaturated. To date, cycles with missing data have been omitted from the analysis, as have right-turn-on-red trajectories, which could provide useful information on queues.

Finally, the method currently incorporates information only on stop positions, arrival speeds, and the signal plan. Adding information on delays and shockwave lines could help improve the accuracy, especially in undersaturated settings, by estimating the headway distribution from the data.

Footnotes

Author Contributions

The authors confirm contribution to the paper as follows: study conception and design: R. Lloret-Batlle, Zi-Hao Wang; data collection: Zi-Hao Wang, Roger Lloret-Batlle; analysis and interpretation of results: Roger Lloret-Batlle; draft manuscript preparation: Roger Lloret-Batlle, Jianfeng Zheng. All authors reviewed the results and approved the final version of the manuscript.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Roger Lloret-Batlle

Zi-Hao Wang

References

1. Zheng, Liu, H. X. Estimating Traffic Volumes for Signalized Intersections Using Connected Vehicle Data. Transportation Research Part C: Emerging Technologies, Vol. 79, 2017, pp. 347–362. https://doi.org/10.1016/j.trc.2017.03.007.

2. Zhao, Zheng, Wong, Wang, Meng, Liu, H. X. Estimation of Queue Lengths, Probe Vehicle Penetration Rates, and Traffic Volumes at Signalized Intersections Using Probe Vehicle Trajectories. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 660–670.

3. Wang, Huang, H. K. Traffic Parameters Estimation for Signalized Intersections Based on Combined Shockwave Analysis and Bayesian Network. Transportation Research Part C: Emerging Technologies, Vol. 104, No. September 2018, 2019, pp. 22–37. https://doi.org/10.1016/j.trc.2019.04.023.

4. Seo, Kawasaki, Kusakabe, Asakura. Fundamental Diagram Estimation by Using Trajectories of Probe Vehicles. Transportation Research Part B: Methodological, Vol. 122, 2019, pp. 40–56. https://doi.org/10.1016/j.trb.2019.02.005.

5. Tang, Tan, Cao, Yao, Sun. A Tensor Decomposition Method for Cycle-Based Traffic Volume Estimation Using Sampled Vehicle Trajectories. Transportation Research Part C: Emerging Technologies, Vol. 118, No. August, 2020, p. 102739. https://doi.org/10.1016/j.trc.2020.102739.

6. Yao, Tang, Jian. Sampled Trajectory Data-Driven Method of Cycle-Based Volume Estimation for Signalized Intersections by Hybridizing Shockwave Theory and Probability Distribution. IEEE Transactions on Intelligent Transportation Systems, Vol. 21, No. 6, 2020, pp. 2615–2627. https://doi.org/10.1109/TITS.2019.2921478.

7. National Research Council Transportation Research Board. HCM 2010: Highway Capacity Manual. Transportation Research Board, Washington, D.C., 2010.

8. Hao, Sun, Ban, X. J., Guo. Vehicle Index Estimation for Signalized Intersections Using Sample Travel Times. Procedia - Social and Behavioral Sciences, Vol. 80, No. Isttt, 2013, pp. 473–490. https://doi.org/10.1016/j.sbspro.2013.05.026.

9. Hao, Ban, Guo. Cycle-by-Cycle Intersection Queue Length Distribution Estimation Using Sample Travel Times. Transportation Research Part B: Methodological, Vol. 68, 2014, pp. 185–204. https://doi.org/10.1016/j.trb.2014.06.004.

10. Hao, Ban. Long Queue Estimation for Signalized Intersections Using Mobile Data. Transportation Research Part B: Methodological, Vol. 82, 2015, pp. 54–73. http://doi.org/10.1016/j.trb.2015.10.002.

11. Sun, Ban, X. J. Vehicle Trajectory Reconstruction for Signalized Intersections Using Mobile Traffic Sensors. Transportation Research Part C: Emerging Technologies, Vol. 36, 2013, pp. 268–283. https://doi.org/10.1016/j.trc.2013.09.002.

12. Zhang, Liu, H. X., Chen, Wang. Cycle-Based End of Queue Estimation at Signalized Intersections Using Low-Penetration-Rate Vehicle Trajectories. IEEE Transactions on Intelligent Transportation Systems, Vol. 21, No. 8, 2020, pp. 3257–3272. https://doi.org/10.1109/TITS.2019.2925111.

13. Mei, Chung, E. C., Tang. A Bayesian Approach for Estimating Vehicle Queue Lengths at Signalized Intersections Using Probe Vehicle Data. Transportation Research Part C: Emerging Technologies, Vol. 109, No. October, 2019, pp. 233–249. https://doi.org/10.1016/j.trc.2019.10.006.

14. Yan, Zhu, Sun. Signal Timing Parameters Estimation for Intersections Using Floating Car Data. Transportation Research Record: Journal of the Transportation Research Board, 2019. 2673: 189–201.

15. Lighthill, M. J., Whitham, G. B. On Kinematic Waves. II. A Theory of Traffic Flow on Long Crowded Roads. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, Vol. 229, No. 1178, 1955, pp. 317–345. https://www.jstor.org/stable/99769.

16. Newell. Theory of Highway Traffic Signals. University of California at Berkeley, Institute of Transportation Studies, 1989. https://escholarship.org/uc/item/7zn2b9bc.

17. SUMO. SUMO Documentation, 2018. https://sumo.dlr.de/docs/Simulation/Randomness.html.