Multiple object tracking using A^* association algorithm with dynamic weights

Abstract

Persistently tracking multiple objects is very challenging when occlusions exist. We present a tracking association approach based on the A^* algorithm. We first formulate multiple object tracking as an integer programming problem on a flow network. Under this framework, the integer assumption is relaxed to yield a standard linear programming problem, whose global optimal solution can be obtained quickly using the A^* algorithm with dynamic weights. The proposed method avoids the difficulties of integer programming and, more importantly, has a lower worst-case complexity than competing methods together with better tracking accuracy and robustness in complex environments. Experimental results show that the proposed method achieves state-of-the-art time costs and can operate in real time.

Keywords

Multi-object tracking; flow network model; integer programming

1 Introduction

Multiple object tracking is important for many computer vision applications, such as video surveillance, human-computer interaction, intelligent navigation, and others [1, 2]. Apart from a high-performance detection algorithm as an auxiliary, high-quality multi-object tracking also needs the support of a tracking algorithm that can address certain types of complex cases, e.g., occlusion, illumination, clutter, and so on [3]. The data association (DA) method is a popular choice for multi-object tracking. The techniques used often include the nearest neighbor method, joint probability data association (JPDA), methods based on neural networks, and so on [4, 5].

The effect of the DA methods mentioned above is closely related to the detection accuracy in adjacent frames. These typical approaches are resilient to false positives and false negatives: if an object is not detected in a frame but is detected in the preceding and following frames, it is a false negative. A false positive is mistaking the tracking object “A” as object “B”. Although this problem can be solved using targeted design in a statistical trajectory model with filtering [6, 7], the calculation method that provides the maximum posterior probability is NP-complete.

To deal with this problem, some recent papers have proposed different approaches. Giebel [8] uses sampling and particle filtering to remove clutter from the same object and reduce the probability of NP-completeness. This method obtains a relatively accurate tracking trajectory but requires a sufficient number of sampling points. Perera [9] divides a long sequence into several short ones, yielding many tracklets, and links them using Kalman filtering. This can avoid NP-completeness. The accuracy of this method is inversely related to the length of the tracklets: the shorter the length, the better the tracking result. However, excessive division increases computation time; therefore, the method cannot track objects for long. Fleuret [6] processes trajectories individually over long sequences using greedy dynamic programming (DP) to choose the order. These approaches, while effective, cannot attain the global optimum. Xi [10] improves the SPFA algorithm to relax the integer assumption and to successfully identify the global optimal solution. However, this algorithm is not easy to understand.

Zhang’s approach [11] relies on a min-cost network flow framework-based optimization method to find the global optimum for multi-object tracking. However, the two algorithms he proposes have defects, and their complexity is polynomial. Under this framework, Berclaz [12] formulates multi-object tracking as an integer programming (IP) problem and reduces it to linear programming (LP). By relying on the k-shortest paths (KSP) algorithm for the optimization of the LP problem, this approach reduces the complexity enough to perform robust multi-object tracking in real time. However, because of KSP’s lack of a motion model over DP, the tendency of the latter to ignore fragmentary trajectories makes it more robust. Pirsiavash [13] continues the work of Zhang and uses his method to obtain the global optimal solution with the greedy algorithm for K = 1 in O (N), but only obtains approximate solutions for K > 1 in O (KN), where K is the unknown optimal number of unique tracks.

By contrast, we effectively combine the models proposed by Zhang and Berclaz to devise a more efficient A^* association algorithm with dynamic weights (DW-A^*). Not only can the DW-A^* algorithm directly obtain the global solution without greedy optimization, but it also shows better performance with respect to both the worst-case complexity and the run time than the above-mentioned state-of-the-art algorithms. The main contributions of this paper include the following:

Based on the min-cost network framework, we introduce a novel general mathematical IP formulation for multi-object tracking. The proposed IP method is more convenient and easier to comprehend than the state-of-the-art methods stated in Refs. [12] and [13]. Furthermore, it is conducive to naturally filtering out false positives and false negatives using DW-A^*.

To solve the integer LP formulation of the proposed framework, and to obtain the global optimum, we propose a novel, faster and more efficient DW-A^* algorithm that improves the A^* algorithm. Compared with the state-of-the-art methods of Refs. [12] and [13], the proposed algorithm clearly reduces the running time and improves the MOTA and MOTP results.

Extensive experimental validations.

The rest of the paper is organized as follows. In Section 2, we formulate an IP problem using the min-cost network flow framework and relax it to a continuous LP. Section 3 contains our proposed DW-A^* algorithm for the relaxation of the original integer assumption. We introduce methods for target localization and long sequence segmentation processing in Section 4. Section 5 shows the experimental results and a complete set of evaluation metrics, and Section 6 concludes the paper.

2 Network flow model

The target motion in multi-object tracking can be described better by the relationship between neighborhood locations, as used by the DP method, in a min-cost network flow model. We define an objective function for multi-object tracking in the same manner as Ref. [12]. The likelihood of the objective presence is estimated by the marginal posterior probability in every frame, thereby obtaining the potential trajectory of the moving object.

2.1 Formalization

We formulate multi-object tracking as a process where the objective location of each target discretely changes in continuous time. A directed 3D spatiotemporal group with random variable k is used to describe the video sequence. $k = (x, y, t), k \in V$ (1) where k denotes the location of an object in this spatiotemporal group at time t, V is the set of all space-time locations in a sequence, and x and y are the pixel positions of the target along the transverse and longitudinal axes, respectively.

For any location k at time t, the neighborhood N (k)⊂ { 1, 2, ⋯ , K } denotes the locations that an object can reach at time t + 1. A single track is defined as an ordered set of state vectors T = (k ₁, ⋯ , k _N), and X = (T ₁, ⋯ , T _L) is a set of tracks. We assume that the tracks are independent of each other and describe the network flow model of multi-object tracking using the dynamic model as follows: $P (X) = \prod_{T \in X} P (T)$ (2) where $P (T) = P_{source} (k_{1}) (\prod_{n = 1}^{N - 1} P (k_{n + 1} | k_{n})) P_{sink} (k_{N})$ (3)

P _source (k ₁) is the probability of a track starting at location k ₁ and P _sink (k _N) is the probability of a track ending at location k _N.
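As a small illustration of Equations (2) and (3), the following Python sketch evaluates the log-probability of a track and of a set of independent tracks. The function names and the callables `p_source`, `p_sink`, and `p_trans` are hypothetical placeholders for whatever source, sink, and transition models are in use; they are not taken from the paper.

```python
import math

def track_log_prob(track, p_source, p_sink, p_trans):
    """log P(T) following Eq. (3): source term, transition terms, sink term.

    track    -- ordered list of locations (k_1, ..., k_N)
    p_source -- hypothetical callable, probability that a track starts at k
    p_sink   -- hypothetical callable, probability that a track ends at k
    p_trans  -- hypothetical callable, p_trans(k_next, k) = P(k_next | k)
    """
    logp = math.log(p_source(track[0])) + math.log(p_sink(track[-1]))
    for k, k_next in zip(track, track[1:]):
        logp += math.log(p_trans(k_next, k))
    return logp

def tracks_log_prob(tracks, p_source, p_sink, p_trans):
    """log P(X) following Eq. (2): tracks are assumed independent, so logs add."""
    return sum(track_log_prob(t, p_source, p_sink, p_trans) for t in tracks)

# Toy models with constant probabilities (illustrative values only).
toy_source = lambda k: 0.5
toy_sink = lambda k: 0.5
toy_trans = lambda k_next, k: 0.25
example_track = [(0, 0, 0), (1, 0, 1), (1, 1, 2)]
example_logp = track_log_prob(example_track, toy_source, toy_sink, toy_trans)
```

Working in the log domain turns the products of Equations (2) and (3) into sums, which is the form used by the cost function later in this section.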

In the spatial coordinate set V, a binary indicator variable φ _i,k represents the directed flow from location i to location k, i.e., it stands for the number of objects moving from i to k. φ _i,k is 1 when the space-time locations i and k are included in some track, given that the object is at location i at time t and location k at time t + 1; when i and k share the same spatial coordinates, this means that the object stays at the same spatial location between times t and t + 1. The variable φ _i,k is subject to the following constraints. $\forall k, \sum_{i, k \in N (i)} φ_{i, k} = φ_{k} = \sum_{j \in N (k)} φ_{k, j}$ (4) $\forall i, k, \sum_{k \in N (i)} φ_{i, k} \leq 1$ (5)

Let a random variable M _k stand for the true presence of an object at location k in space-time. At every instant t, the detector is used to check every location of the tracking zone. The marginal posterior probability of an existing object is calculated as $ρ_{k} = \hat{P} (M_{k} = 1 | I_{t})$ (6) where I _t is the single image at frame t. We write m ={ m _k } for a feasible set of the likelihood probability distributions for the existence of objects in V, obtained by the method detailed in Section 4.1. M is the spatial set of M _k. The likelihood probability of the existence of an object in the given set of tracks X is $P (M = m | X) = \prod_{k \in X} P (M_{k} = m_{k} | X)$ (7)

Since the M _k are conditionally independent given X, we infer the maximum a posteriori estimate of tracks from the probability distributions of the existence of objects: $X^{*} = \underset{X}{arg max} P (X) P (M = m | X)$ (8) $X^{*} = \underset{X}{arg max} \prod_{T \in X} P (T) \prod_{k \in X} P (M_{k} = m_{k} | X)$ (9)

$X^{*} = \underset{X}{arg max} \sum_{T \in X} log P (T) + \sum_{k \in X} log P (M_{k} = m_{k} | X)$ (10)

$X^{*} = \underset{X}{arg max} \sum_{T \in X} log P (T) + \sum_{k} [(1 - m_{k}) log P (M_{k} = 0 | X) + m_{k} log P (M_{k} = 1 | X)]$ (11)

$X^{*} = \underset{X}{arg max} \sum_{T \in X} log P (T) + \sum_{k} m_{k} log \frac{P (M_{k} = 1 | X)}{P (M_{k} = 0 | X)}$ (12)

$X^{*} = \underset{X}{arg max} \sum_{T \in X} log P (T) + \sum_{k} m_{k} log (\frac{ρ_{k}}{1 - ρ_{k}})$ (13) where Equation (11) holds because m _k is 0 or 1 according to Equation (5). Ignoring the term of Equation (11) that does not depend on m _k, we obtain Equation (12). The cost value of a directed flow between the neighborhood locations of any adjacent frames is defined as $c (e_{k, n}) = - log (\frac{ρ_{k}}{1 - ρ_{k}})$ (14) where e _k,n is a directed edge from location k at time t to location n at time t + 1, and the total cost value between any two locations in V is $C (e_{i, j}) = \sum_{e_{k, n} \in e_{i, j}, n \in N (k)} c (e_{k, n})$ (15)
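The edge cost of Equation (14) can be sketched in a few lines of Python. The function name `edge_cost` and the clamping constant `eps` are assumptions added here to keep the logarithm finite when a detector returns exactly 0 or 1; the paper does not describe such a safeguard.

```python
import math

def edge_cost(rho, eps=1e-6):
    """Cost of an edge leaving a location with detection probability rho, Eq. (14).

    rho is clamped to [eps, 1 - eps] (an assumption of this sketch) so that
    the log-odds stay finite.
    """
    rho = min(max(rho, eps), 1.0 - eps)
    return -math.log(rho / (1.0 - rho))

# Confident detections give negative costs, unlikely ones give positive costs.
confident_cost = edge_cost(0.9)
unlikely_cost = edge_cost(0.1)
```

A location with ρ > 0.5 therefore lowers the total cost of any path that passes through it, which is why the network contains negative edge costs.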

2.2 Integer programming formulation

In our model, because the objects can enter and leave the tracking area, we introduce additional nodes for the source and sink that have been defined in the model proposed by [12]. Equations (8–13) can then be naturally translated into an integer program:

$\begin{aligned} \text{Minimize} \quad & C (φ) = C (e_{i, j}) \sum_{j \in N (i)} φ_{i, j} + C (e_{source, i}) \sum_{i} φ_{source, i} + C (e_{i, sink}) \sum_{i} φ_{i, sink} \\ \text{Subject to} \quad & \forall k, \sum_{i, k \in N (i)} φ_{i, k} = φ_{k} = \sum_{j \in N (k)} φ_{k, j} \\ & \forall i, k, \sum_{k \in N (i)} φ_{i, k} \leq 1 \end{aligned}$ (16) where the constraint conditions are the same as in Equations (4) and (5), and φ * = arg min C (φ) is the optimal solution of the IP. C (e _source,i) is the total cost of a directed flow from the source node to the locations of the track, and C (e _i,sink) is that from the locations of the track to the sink node. Figure 1 shows a simple flow network constructed from multi-object tracking, where the costs c _i,j are represented by solid lines, and c _source,i and c _i,sink are represented by dashed lines.

The costs are defined as follows: $c (e_{source, i}) = - log P_{source} (k_{i})$ (17) $c (e_{i, sink}) = - log P_{sink} (k_{i})$ (18)

Solving an integer program with standard methods is NP-complete. In general, variants of the simplex algorithm [14] or interior point-based methods [15] can be used to solve this problem. However, these algorithms have very high worst-case time complexities. In [12] and [13], although the KSP and successive shortest path (SSP) methods can successfully relax the IP to a continuous linear program, both have their own deficiencies. We use the proposed DW-A^* algorithm to compensate for the deficiencies of these methods.

3 A^* association algorithm with dynamic weights

In this paper, we propose an A^* algorithm with dynamic weights to relax the IP in the network flow model. The worst-case complexity of this algorithm is O (KN). The global optimal solution of the proposed algorithm makes the tracking more reliable and more efficient. The network flow model needs two particular properties to realize our algorithm:

All edges and nodes are independent of each other and all edges are unit capacity.

The network is a directed acyclic graph (DAG).

3.1 A^* algorithm

Let C be the total cost of any location in space V, and let E be the set of the edges between adjacent frames of any neighborhood location. The state transition between any pair of nodes of the model can be attained by E, and the DAG G (V, E, C) can completely describe the flow activity of an object of the min-cost flow model.

Let G _r (φ) be the residual graph of G (V, E, C) that denotes all locations from the current node to the ending node. We can then find the shortest path between both nodes by the A^* algorithm in G _r (φ).

Because some tracking targets may appear inside the tracking area and others may leave, we introduce two additional virtual nodes, source and sink, into our DAG. These two virtual nodes denote potential positions: source and sink denote the positions where a target appears and disappears, respectively. Then, we use the neighbors of birth and end to replace the original position and form a new DAG with virtual nodes source and sink, as shown in Fig. 2.

We create the Open list and the Closed list. The Open list records all the nodes that are considered to find the shortest path, and the Closed list records the nodes that are no longer considered. In the proposed min-cost flow model, we can obtain the shortest path through the following steps:

1) We put the initial position l _birth into the Open list and calculate the total cost of any path f _birth,i from l _birth to the current position l _i: $g (l_{i}) = cost (f_{birth, i}) = \sum_{e_{k, n} \in f_{birth, i}} c (e_{k, n})$ (19)

The cost from l _i to the potential position l _j, l _j ∈ N (l _i) of the next frame is calculated as follows: $g^{'} (l_{j}) = cost (f_{i, j}) = c (e_{i, j})$ (20)

The estimate of the total cost of any path f _j,end from l _j to the potentially terminal position is calculated by Equation (21), and l _birth is put into the Closed list. $h (l_{end}) = cost (f_{j, end}) = \sum_{e_{k^{'}, n^{'}} \in f_{j, end}} c (e_{k^{'}, n^{'}})$ (21)

2) Putting the neighborhood of l _i, l _j ∈ N (l _i), into the Open list, we obtain the position $l_{j}^{*}$ that satisfies Equation (22) and put $l_{j}^{*}$ into the Closed list. $l_{j}^{*} = \underset{l_{j} \in N (l_{i})}{arg min} F (l_{j}), \quad F (l_{j}) = g (l_{i}) + g^{'} (l_{j}) + h (l_{end})$ (22)

3) We empty the Open list and update the current location to $l_{j}^{*}, l_{j}^{*} \in N (l_{i})$ .

Steps 1)-3) are iterated until l _end is added into the Open list (because l _birth has been added into the Closed list, the Open list no longer adds it). We output all the searched locations in the Closed list in first in, first out (FIFO) order, which gives the shortest path.

$f_{birth, end}^{*} = (l_{birth}, \dots l_{i}^{*}, l_{j}^{*}, \dots, l_{end}), j \in N (i)$ (23)
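As an illustration of the search in steps 1)-3), the following Python sketch runs a textbook A^* search on a small layered DAG. It is not the authors' implementation: the graph, the node names, the function `a_star_dag`, and the heuristic (number of remaining frames times the smallest edge cost, which never overestimates the remaining cost when every edge links consecutive frames) are assumptions made for this example. Unlike the procedure above, the sketch keeps the Open list as a priority queue across iterations rather than emptying it.

```python
import heapq

def a_star_dag(edges, layer, start, goal):
    """Textbook A* on a layered DAG whose edges link frame t to frame t + 1.

    edges -- dict: node -> list of (neighbor, cost); costs may be negative
    layer -- dict: node -> frame index of that node
    Returns (path, total_cost), or (None, inf) if goal is unreachable.
    """
    c_min = min(cost for outs in edges.values() for _, cost in outs)
    last = layer[goal]

    def h(node):
        # Lower bound on the remaining cost: each remaining edge costs >= c_min.
        return (last - layer[node]) * c_min

    open_heap = [(h(start), 0.0, start, [start])]  # entries: (F, g, node, path)
    closed = set()
    while open_heap:
        _, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in edges.get(node, []):
            if nxt not in closed:
                g_next = g + cost
                heapq.heappush(open_heap, (g_next + h(nxt), g_next, nxt, path + [nxt]))
    return None, float("inf")

# Toy network: two candidate locations (a, b) in each of two frames.
toy_edges = {
    "birth": [("a1", -1.0), ("b1", -0.5)],
    "a1": [("a2", 0.2), ("b2", -0.3)],
    "b1": [("a2", -2.0), ("b2", 0.1)],
    "a2": [("end", 0.0)],
    "b2": [("end", 0.0)],
}
toy_layer = {"birth": 0, "a1": 1, "b1": 1, "a2": 2, "b2": 2, "end": 3}
best_path, best_cost = a_star_dag(toy_edges, toy_layer, "birth", "end")
```

In this toy network the cheapest first step (birth to a1) does not lie on the cheapest complete path (birth, b1, a2, end), so the example shows why the estimate of the remaining cost matters when choosing the next node.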

Figure 2 shows the simple processing steps of the A^* algorithm in our proposed model. Here, birth represents the node where an object was first discovered, and end represents the node where it was last discovered. Each relaxation operation using the A^* algorithm is a heuristic searching process for the objective presence of likelihood in the next frame. The nth relaxation operation ensures that the path is the shortest in n. As the length of the edge for the shortest path in the residual graph does not exceed N - 1, the path that we obtain using the A^* algorithm is the shortest one. Compared with the method in [13], which uses the SSP algorithm with the additional greedy method, the A^* algorithm can find the global optimum. The next subsection proves its convergence.

3.2 The proof of the optimal solution

Lemma 1. [16] Let A be an m × n matrix. Then, for each integer vector b ∈ R ^m, the vertices of the polyhedron {x : Ax≤ b, x ≥ 0 } are integer vectors if and only if A is a totally unimodular matrix.

Lemma 2. [17] For an arbitrary node n in the A ^* algorithm, h (n) is an estimated distance from n to the terminal node. If h (n) is no more than the actual distance between the two nodes, the globally optimal solution can be obtained.

The IP problem U ₁ and the corresponding relaxed LP problem U ₂ are considered as follows:

U ₁ : min cx ; s . t . Ax = b, x ≥ 0, and x is an integer vector.

U ₂ : min cx ; s . t . Ax = b, x ≥ 0.

where, c, b and A are the known appropriate dimension vectors and constraint matrix, respectively.

Theorem. In the DAG, if A is a totally unimodular matrix, then the relaxed linear programming problem U ₂ can be solved by the A ^* algorithm and the global optimal solution of the integer programming problem U ₁ can also be obtained.

Proof. Let the set {x : Ax≤ b, x ≥ 0 } denote a bounded polyhedron of the feasible solutions, in which a single vertex represents the optimal solution. Lemma 1 shows that the vertices must be nonnegative integer vectors. Considering the specific elements of A and that the vertex coordinates must lie between 0 and 1, we can conclude that the vertex coordinates of the polyhedron are either 0 or 1. In fact, the A^* algorithm accelerates the iterative process of obtaining the optimal solution by solving the LP step by step. Since, in the DAG, the estimated distance h (n) is no more than the actual distance from n to the terminal node, we can obtain the globally optimal solution using the A^* algorithm by Lemma 2. The total unimodularity of A ensures that each basic feasible solution is integral, which means that the relaxed LP always converges to the optimal solution of the original IP.

The total unimodularity of A has been proved in [12]. To improve the screening speed of the A^* algorithm for the large number of nodes in the initial stage and then ignore some objective movement in the later stage, we use the dynamic weight on the A^* algorithm.

3.3 Dynamic weights

To help find the optimal solution quickly and accurately, we can prioritize speed in the initial stage of the search and increase the precision priority in the later stage, which can be achieved by adding a dynamic weight ω in Equation (22).

$F (l_{t + n + 1, u}) = g (l_{t + n, j}) + g^{'} (l_{t + n + 1, u}) + ω * h (l_{t + n + 1, u})$ (24) where $ω = 1 + \frac{x - (n + 1)}{x}$ (25) x is the number of nodes in one search, and n is the number of frames between the tracking initial node and the terminal node. In the initial stage of the search, the current position of the object is far from the objective position and ω is relatively large. Thus, the A^* algorithm proceeds quickly at first, and the LP quickly converges to near the objective position. In the later stage, the objective position is closer and ω is closer to 1. Thus, the A^* algorithm prioritizes precision to reduce the searching blindness.
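The behavior of the dynamic weight can be checked numerically with a short Python sketch of Equations (24) and (25). The function names `dynamic_weight` and `weighted_f` are hypothetical, and the values of x and n below are illustrative only.

```python
def dynamic_weight(x, n):
    """Dynamic weight of Eq. (25): omega = 1 + (x - (n + 1)) / x."""
    return 1.0 + (x - (n + 1)) / x

def weighted_f(g_prev, g_step, h, x, n):
    """Weighted evaluation of Eq. (24): F = g + g' + omega * h."""
    return g_prev + g_step + dynamic_weight(x, n) * h

# With x = 10 the weight decreases from about 1.9 (n = 0) to 1.0 (n = 9).
weights = [dynamic_weight(10, n) for n in range(10)]
```

For a fixed x, ω decreases linearly in n and reaches 1 when n = x - 1, at which point Equation (24) reduces to the unweighted evaluation of Equation (22).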

3.4 The worst-case complexity

Berclaz and Pirsiavash propose the KSP and SSP algorithms, respectively. The worst-case complexity of both algorithms is O (KN log N), where K is the unknown optimal number of unique tracks, and N is the frame number of the sequence. Because of the different values of K, Pirsiavash uses different methods to obtain the solution. The specific complexity of this algorithm is related to the value of K.

The Dijkstra algorithm is recognized as an effective method to compute the shortest path in O (N log N) time. Unfortunately, in our proposed min-cost flow network, there are negative costs, which contradict the precondition of the Dijkstra algorithm. Fortunately, the simpler A^* algorithm can be adopted in this network. For the DAG G (V, E, C), the worst case will appear when the number of extended sub-nodes from the current node is up to 3 sub-nodes that are not in the Open list. Because the Open list is emptied in each iteration, the number of nodes of the Open list will not increase. Therefore, for the N frame sequence, the worst-case complexity of the DW-A^* algorithm is O (KN), where K is the number of optimal paths using the A^* algorithm in the DAG. Generally, K ≤ 3. In fact, achieving the tracking curves using the A^* algorithm involves obtaining the optimal solution using the heuristic Dijkstra algorithm. The heuristic feature of the A^* algorithm makes the search direction more objective and reduces unnecessary calculations.

4 Object localization and sequence processing

High quality multi-object tracking requires a reliable tracker, a detector that can accurately segment and locate multiple objects, and a pre-processing method that can improve the performance of the algorithm.

4.1 Object detection and localization

To obtain an accurate target for the tracker, we establish a background model with the improved codebook algorithm and extract the observed characteristic information of the tracking object by the foreground/background subtraction method of [18]. Using the method from [19], we segment objects that were initially merged together. Then, we obtain the probability distributions of the planes of the objects from the detector, and these can serve as the input to the DW-A^* algorithm. A few selected frames of target localization are illustrated in Fig. 3.

Full range tracking in a camera field of view increases the processing time of the algorithm and consumes a significant portion of the limited memory resources. Because most of the probabilities of the objective presence are 0, we can exploit this characteristic to reduce the number of nodes and the computational cost. In addition, we limit the potential birth area of targets to reduce the amount of computation. The proposed method also checks the maximum detection probability of each location k within a given spatiotemporal neighborhood of each frame t. $max_{∥ j - k ∥ < ɛ_{1}, t - ɛ_{2} < μ < t + ɛ_{2}} ρ_{j}^{μ}$ (26)

If the value at a location is below the set threshold, an object represented by the value is considered not able to reach the location, and all flows from and to it are removed from the model. This method can reduce by an order of magnitude the number of required variables and constraints. In our experiment, we pruned the graph by a radius of ɛ ₁ = ɛ ₂ = 3.
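The pruning rule of Equation (26) can be sketched as follows in Python. The data layout (a dictionary mapping (x, y, t) to a detection probability), the Euclidean norm, the function names, and the threshold value are assumptions of this example; the paper only specifies the radii ɛ ₁ = ɛ ₂ = 3.

```python
import math

def max_neighborhood_prob(rho, x, y, t, eps1=3, eps2=3):
    """Largest detection probability within the spatiotemporal window of Eq. (26).

    rho -- dict mapping (x, y, t) to a detection probability; locations that
           are absent are treated as probability 0 (assumption of this sketch).
    """
    best = 0.0
    for (jx, jy, mu), p in rho.items():
        if math.hypot(jx - x, jy - y) < eps1 and t - eps2 < mu < t + eps2:
            best = max(best, p)
    return best

def prune_locations(rho, locations, threshold, eps1=3, eps2=3):
    """Keep only the locations whose neighborhood maximum reaches the threshold."""
    return [k for k in locations
            if max_neighborhood_prob(rho, *k, eps1, eps2) >= threshold]

# Toy detections: one strong response near the origin, one weak response far away.
toy_rho = {(0, 0, 0): 0.9, (10, 10, 0): 0.05}
toy_locations = [(1, 1, 1), (10, 10, 1), (20, 20, 1)]
kept = prune_locations(toy_rho, toy_locations, threshold=0.1)
```

Only the location close to the strong detection survives; the flows into and out of the other two locations would be removed from the model.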

4.2 Sequence processing

In theory, processing a long video sequence as a whole using the DW-A^* algorithm can obtain the global optimum over the entire tracking time, but requires a long computation time. To address this issue, we split the long sequence into segments of 100 frames each, which yields good results with a delay of less than 0.5 s between input and output and can be performed in real time.

To maintain temporal consistency across segments, we use the method of multi-frame overlay, as shown in Fig. 4, and add the last 10 frames of the previously optimized segment to the first 10 frames of the current one. We then force the sum of the flows of every location of the first 10 frames of the current segment to be consistent with the total number of flows of the last locations of the object in the last 10 frames of the previous one. This effectively solves the problem of the missing target on the piecewise point. $\forall k \in \{1, \dots, K\}, \sum_{j \in N (i)} φ_{i, j} = \sum_{i \in N (k)} φ_{k, i} = θ_{k}$ (27) where θ _k is the total flow of the last position k of the object appearing in the last 10 frames of the previous segment. For the corresponding first position j of an object appearing in the first 10 frames of the current segment, the flow into it is equal to the flow out of position k, and is also equal to the total flow out of any potential position i of any object between k and j. This is implemented as an additional constraint in our model.

If we cannot find the tracking object in the first 10 frames of the current segment, our method searches for the object in t′ frames after the current one. In our experiment, we let t′ = 10. If we find the object in a frame within t′, this frame becomes the first frame of the current segment; otherwise, the tracking fails.
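The segmentation scheme can be sketched in Python as follows. The function name `split_into_segments` is hypothetical, and the sketch assumes one possible reading of the overlay: each 100-frame window starts 10 frames before the previous window ends, so that the shared frames can carry the flow constraint of Equation (27).

```python
def split_into_segments(num_frames, segment_len=100, overlap=10):
    """Return half-open frame ranges (start, end) with a fixed overlap.

    Assumes each window of segment_len frames shares its first `overlap`
    frames with the end of the previous window.
    """
    if overlap >= segment_len:
        raise ValueError("overlap must be smaller than segment_len")
    if num_frames <= 0:
        return []
    step = segment_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + segment_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start += step
    return segments

# A 250-frame sequence yields three overlapping windows.
example_segments = split_into_segments(250)
```

Each window can then be optimized independently, with the flows in the shared frames constrained to agree with the solution of the previous window.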

5 Experimental results

In our simulation, sequences with different characteristics were selected from the CAVIAR, BEHAVEDATA, PETS09 and ETHMS datasets. The challenges for each of these are summarized in Table 1. The selected sequences cover almost all problems that commonly occur in multi-object tracking.

5.1 Parameter setting

In the training period, a detector is established by the background subtraction method of the improved codebook algorithm model. We combine the detection result with the activity scope of the object by foreground/background segment update in real time, and calculate the location of the object with a high probability. Because the size of the activity scope of the object and the number of the pixels of the object are not identical in every sequence, our method can generate about 900 detections per frame in each video sequence. We set the log-likelihood ratio of each detection to be the negative score as the results of the linear detector.

We used a bounded values dynamic model. We define the cost c _i,j between two locations in consecutive frames in the case of spatial overlap (i.e., an object remains at a location) as 0. The costs from the virtual position to the neighborhood of birth and end are c _source,birth = 10, c _end,sink = 10, respectively.

5.2 Evaluation metrics

Let GT_i,t be the i-th ground truth bounding box for the t-th frame, and TR_i,t be the tracked bounding box. C _i,t for the t-th frame and i-th object is defined as the ratio between the area of intersection GT_i,t ∩ TR_i,t and the area of union GT_i,t ∪ TR_i,t [20]. $C_{i, t} = \frac{AREA ({GT}_{i, t} \cap {TR}_{i, t})}{AREA ({GT}_{i, t} \cup {TR}_{i, t})}$ (28)

In our experiment, we set the threshold of C _i,t to 0.5, which means that the tracking is successful when the overlapping areas of the ground truth bounding box and the tracked bounding box exceed 0.5.
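The overlap ratio of Equation (28) and the success test can be sketched in Python as follows. Boxes are assumed to be axis-aligned and given as (x1, y1, x2, y2) corner coordinates; this representation and the function names are assumptions of the example.

```python
def overlap_ratio(gt, tr):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2), Eq. (28)."""
    inter_w = max(0.0, min(gt[2], tr[2]) - max(gt[0], tr[0]))
    inter_h = max(0.0, min(gt[3], tr[3]) - max(gt[1], tr[1]))
    inter = inter_w * inter_h
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    area_tr = (tr[2] - tr[0]) * (tr[3] - tr[1])
    union = area_gt + area_tr - inter
    return inter / union if union > 0 else 0.0

def is_tracked(gt, tr, threshold=0.5):
    """Tracking counts as successful when the overlap ratio exceeds the threshold."""
    return overlap_ratio(gt, tr) > threshold

# Two 2 x 2 boxes shifted by one pixel share half of their area.
shifted_ratio = overlap_ratio((0, 0, 2, 2), (1, 0, 3, 2))
```

The shifted pair has an intersection of 2 and a union of 6, so its ratio of 1/3 falls below the 0.5 threshold even though half of each box is covered.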

Our results are evaluated using the multiple object tracking accuracy (MOTA) and multiple object tracking precision (MOTP) metrics of the standard CLEAR2006 metrics [21]. $MOTA = 1 - \frac{\sum_{t} (c_{m} (m_{t}) + c_{f} ({fp}_{t}) + c_{s})}{\sum_{t} g_{t}}$ (29) $MOTP = \frac{\sum_{i, t} C_{i, t}}{\sum_{t} {Nm}_{t}}$ (30) where, g _t is the number of ground truth objects in the t-th frame, Nm _t refers to the number of mapped objects in the t-th frame, m _t represents the missed detection count, and fp _t is the false positive count for each frame. c _s = log SWITCHES _t, where SWITCHES _t is the number of ID mismatches in t considering the mapping in frame t - 1. We started the count from 1 because of the log function. c _m and c _f represent, respectively, the cost functions for missed detections and false positives. The values used for the weighting functions in Equation (29) are c _m = c _f = 1. Figure 5 shows the histograms of MOTA and MOTP in the experiment using the DW-A^* algorithm.
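Equations (29) and (30) can be evaluated with the short Python sketch below. The function names and the per-frame list inputs are assumptions of this example. The text does not state the base of the logarithm in c _s, so the natural logarithm is assumed here, and the switch counts are expected to start from 1 as described above.

```python
import math

def mota(misses, false_positives, switches, ground_truth, c_m=1.0, c_f=1.0):
    """MOTA of Eq. (29) from per-frame counts.

    switches must hold counts starting from 1 (so that log(1) = 0 means no
    ID switch); the natural logarithm is an assumption of this sketch.
    """
    errors = sum(c_m * m + c_f * fp + math.log(s)
                 for m, fp, s in zip(misses, false_positives, switches))
    return 1.0 - errors / sum(ground_truth)

def motp(overlaps_per_frame, mapped_per_frame):
    """MOTP of Eq. (30): total overlap ratio divided by the number of mapped objects."""
    total_overlap = sum(sum(frame) for frame in overlaps_per_frame)
    return total_overlap / sum(mapped_per_frame)

# Toy two-frame example with one miss, one false positive and no ID switch.
toy_mota = mota([1, 0], [0, 1], [1, 1], [4, 4])
toy_motp = motp([[0.8, 0.6], [0.7]], [2, 1])
```

In the toy example, two errors over eight ground truth objects give a MOTA of 0.75, and a total overlap of 2.1 over three mapped objects gives a MOTP of 0.7.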

5.3 Analysis of the experimental results

To ensure the unique identification of each tracking target, we use different colors to indicate the order. The video sequences used in our experiment are from Table 1. The detection results are obtained by the process described in Section 4.1 as the input of our algorithm. We then conduct a performance test of the multi-object tracking of false positives, false negatives and a dynamic background, respectively.

Test for false negatives: The sequences use Multipleflow and CrowdS2view8 from the PETS09 dataset. We show typical results in Figs. 6 and 7. In particular, the former uses bright yellow coats worn by pedestrians as the tracking target. Although the probability of false negatives increases significantly because of the occlusion with non-tracking targets, the proposed algorithm can ensure persistent tracking for each object in the entire tracking process. The experiment for CrowdS2view8 verifies the robustness of the proposed algorithm when the targets leave the area of non-restricted departure and reappear soon.

Test for false positives: The sequences use the Fightmargaret of the BEHAVEDATA dataset and the OneStopMoveEnter of the CAVIAR dataset. Typical results are shown in Figs. 8 and 9. We used the method from Section 4.1 for detection and localization. Because of the superior solution and anti-interference of the DW-A^* algorithm, we can stably track multiple targets in a timely fashion in case of false positives.

Test for dynamic background: There are two conditions that must be satisfied by the sequence of the experiment:

The available probability distribution of the dynamic background of the sequence needs to be relatively consistent. Only in this way can the algorithm quickly obtain the location of an object for tracking.

The targets should have fixed access areas in the tracking ground. Because the tracking ground is moving, the potential area in which the objects can enter and exit changes. We require the borders of the camera field of view to be the area through which all objects can enter and exit.

The sequence uses Seq04left from the ETHMS dataset. We obtain object characteristics by the method of combining the skin color and [22], and show the typical results in Fig. 10. The method of detection and localization in Section 4.1 only considers the available probability distribution of the object characteristic in the tracking ground, and does not relate to the background conditions. Therefore, the sequence for our experiment requires a consistent probability distribution. This constraint, in a way, limits the experimental conditions of performance for a dynamic background, but does not affect the conclusion that multi-object tracking using DW-A^* in a dynamic background is robust.

5.4 Simulation analysis

All the above experiments were performed on a Windows XP PC equipped with a 2.7 GHz Pentium(R) Dual-Core CPU and 8 GB of memory. The software platform uses Visual Studio 2010 and OpenCV2.2.

We contrast the proposed algorithm with Zhang’s method 2 [11], Berclaz’s KSP [12] and Pirsiavash’s SSP [13] in S2L1view8 of the PETS09 dataset and Fightmargaret of the BEHAVEDATA dataset with regard to the average tracking errors. The results are shown in Fig. 11. We also compared the algorithms with respect to tracking accuracy. Figure 12 shows false positives per image (FPPI) versus detection rate for all algorithms.

Figure 11 shows that the tracking errors of these algorithms are not significantly different in cases not involving clutter and occupancy. However, when tracking an object in the case of false negatives and false positives for a long time, our proposed algorithm exhibits clear superiority. Although the occupancy problem in the case of simple assumptions can be satisfied by Zhang’s method 2, the required assumptions result in omission and eventually lead to tracking failure when several false positives and false negatives frequently occur. In Fig. 12, when the above state-of-the-art algorithms have the same target detection rates, the DW-A^* algorithm performs better than KSP and SSP in controlling FPPI. The superiority of the proposed algorithm is due to its faster relaxation method with the dynamic constraint, and to more quickly finding the global optimal solution.

With the same object detection algorithm as above, we compared the false positives generated using DW-A^* with those from the other methods on the ETHMS dataset and the CAVIAR dataset, as shown in Table 2. The results show that DW-A^* tracks better. Further, as shown in Fig. 13, DW-A^* has a significantly shorter run time than the other three algorithms.

5.5 Run time

We evaluated the speed of our DW-A^* tracking algorithm on the sequences of the BEHAVEDATA dataset at 25 fps. The run-time curves for DW-A^* and the above algorithms are shown in Fig. 13, where the vertical axis (run time) is plotted on a log scale. The solution of Zhang’s method 2 does not converge even after a significant run time. For a video of 1000 frames, the KSP solver takes approximately 20 seconds and SSP takes 0.9 seconds, whereas our proposed DW-A^* solver takes only 0.15 seconds.
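The reported timings can be put in perspective with a short calculation. The Python sketch below uses only the figures quoted above (1000 frames, 25 fps, and the three solver times) and assumes that the 1000-frame video is recorded at the same 25 fps. The variable names are illustrative, and the timings cover the association solver only, not detection.

```python
frames = 1000
fps = 25.0
solver_seconds = {"KSP": 20.0, "SSP": 0.9, "DW-A*": 0.15}

# Duration of the video itself: the budget for real-time operation.
video_seconds = frames / fps

# How many times faster DW-A* is than each other solver.
speedup = {
    name: seconds / solver_seconds["DW-A*"]
    for name, seconds in solver_seconds.items()
    if name != "DW-A*"
}

# Average solver time spent per frame, in milliseconds.
per_frame_ms = {
    name: 1000.0 * seconds / frames for name, seconds in solver_seconds.items()
}

print(f"video duration: {video_seconds:.0f} s")
for name, factor in speedup.items():
    print(f"DW-A* is about {factor:.0f}x faster than {name}")
for name, ms in per_frame_ms.items():
    print(f"{name}: {ms:.2f} ms per frame (frame period {1000.0 / fps:.0f} ms)")
```

Under these assumptions the DW-A^* solver is roughly 133 times faster than KSP and 6 times faster than SSP, and its average cost of 0.15 ms per frame is a small fraction of the 40 ms frame period.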

6 Conclusions

To handle false positives and false negatives in multi-object tracking in cluttered environments, we proposed a reliable tracker based on a flow network model. Within the min-cost flow framework established by integer programming theory, we combined the A^* algorithm with dynamic weights to develop the DW-A^* algorithm. We relaxed the integer program to a linear program (LP) and used this algorithm to quickly find the global optimal solution of the relaxed LP. The resulting algorithm better handles short-term false negatives and false positives in multi-object tracking, and is more robust than state-of-the-art algorithms.
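As a reminder of the general idea behind weighting the A^* heuristic dynamically, the following Python sketch implements the classic depth-based scheme, in which the heuristic is inflated by a factor 1 + eps * (1 - depth / max_depth) that shrinks as the search deepens. It is a generic illustration and not the DW-A^* formulation of Section 3; the toy graph, the heuristic values, and the names `dynamic_weight_a_star`, `eps` and `max_depth` are assumptions made for this example.

```python
import heapq


def dynamic_weight_a_star(graph, h, start, goal, eps, max_depth):
    """Search graph {node: [(neighbor, cost), ...]} with heuristic table h.

    The heuristic weight is 1 + eps * (1 - depth / max_depth), so the
    search is greedier near the start and closer to plain A* near the goal.
    max_depth must be positive. Returns (cost, path) or None.
    """
    # Heap entries: (f, g, depth, node, path)
    open_heap = [((1.0 + eps) * h[start], 0.0, 0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        _, g, depth, node, path = heapq.heappop(open_heap)
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        if node == goal:
            return g, path
        for nxt, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                d2 = depth + 1
                weight = 1.0 + eps * max(0.0, 1.0 - d2 / max_depth)
                f2 = g2 + weight * h[nxt]
                heapq.heappush(open_heap, (f2, g2, d2, nxt, path + [nxt]))
    return None


# Toy layered graph: two candidate detections (a, b) in each of two frames,
# with hypothetical association costs on the edges.
graph = {
    "s": [("a1", 1.0), ("b1", 2.0)],
    "a1": [("a2", 1.0), ("b2", 4.0)],
    "b1": [("a2", 3.0), ("b2", 1.0)],
    "a2": [("t", 1.0)],
    "b2": [("t", 1.0)],
}
# Admissible heuristic: remaining edges times the minimum edge cost (1.0).
h = {"s": 3.0, "a1": 2.0, "b1": 2.0, "a2": 1.0, "b2": 1.0, "t": 0.0}

cost, path = dynamic_weight_a_star(graph, h, "s", "t", eps=0.5, max_depth=3)
print(cost, path)
```

Setting `eps=0` reduces the sketch to ordinary A^* with the same heuristic.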

Experimental results indicate that the proposed algorithm improves trajectory consistency, helps to resolve serious occlusion between multiple targets, and can satisfy real-time requirements. Compared with the other algorithms, DW-A^* shows clear advantages in both tracking accuracy and run time. Tracking multiple types of targets with a dynamic background in real time will be the focus of our future research.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61372090).

References

1. Kim, Choi, Kong (2010) Intelligent visual surveillance-a survey. International Journal of Control, Automation, and Systems 8(5):926-939

2. Liu, Yang (2014) Multiple object tracking using shortest path faster association algorithm. The Scientific World Journal, Article ID 481719

3. Jiang, Wang, Liu (2012) A multi-target tracking algorithm based on multiple cameras. Acta Automatica Sinica 38(4):531-539

4. Medioni (2009) Multiple target tracking by spatiotemporal Monte Carlo Markov chain data association. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(12):2196-2210

5. Serratosa, Alquezar, Amezquita (2012) A probabilistic integrated object recognition and tracking framework. Expert Systems with Applications 39(8):7302-7318

6. Fleuret, Berclaz, Lengagne, Fua (2008) Multi-camera people tracking with a probabilistic occupancy map. IEEE Transactions on Pattern Analysis and Machine Intelligence 30(2):267-282

7. Sharp, Sathyan (2012) Positional accuracy measurement and error modeling for mobile tracking. IEEE Transactions on Mobile Computing 11(6):1021-1032

8. Giebel, Gavrila, Schnorr (2004) A Bayesian framework for multi-cue 3D object tracking. Proceedings of the 8th European Conference on Computer Vision, pp 241-252

9. Perera AGA, Srinivas, Hoogs, Brooksby, W-S (2006) Multi-object tracking through simultaneous long occlusions and split-merge conditions. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, pp 666-673

10. Liu, Zheng (2015) Dynamic shortest path association for multiple object tracking in video sequence. Journal of Electronic Imaging 24(1):013009

11. Zhang, Nevatia (2008) Global data association for multi-object tracking using network flows. Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, pp 342-349

12. Berclaz, Fleuret, Tueretken, Fua (2011) Multiple object tracking using k-shortest paths optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(9):1806-1819

13. Pirsiavash, Ramanan, Fowlkes (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. Proceedings of 2011 IEEE Conference on Computer Vision and Pattern Recognition, pp 1201-1208

14. Gonzalez-Lima, Oliveira ARL, Oliveira (2013) A robust and efficient proposal for solving linear systems arising in interior-point methods for linear programming. Computational Optimization and Applications 56(3):573-597

15. Khan, Ahmad, Maan (2013) A simplified novel technique for solving fully fuzzy linear programming problems. Journal of Optimization Theory and Applications 159(2):536-546

16. Dimitri (2011) Convex Optimization Theory. Tsinghua University Press, Beijing

17. Wang, Ren (2011) Graph Theory, Implementation and Application. Peking University Press, Beijing

18. Sigari, Fathy (2008) Real-time background modeling/subtraction using two-layer codebook model. Proceedings of the International Multi-conference of Engineers and Computer Scientists 2008, 19-21

19. Bugeau, Perez (2008) Track and cut: Simultaneous tracking and segmentation of multiple objects with graph cuts. Proceedings of EURASIP Journal on Image and Video, 317278

20. Liu, Yuan, Sun, Zhang (2014) Spatial neighborhood constrained linear coding for visual object tracking. IEEE Transactions on Industrial Informatics 10(1):469-480

21. Rangachar, Dmitry, Padmanabhan, Vasant, John, Rachel, Matthew, Valentina, Zhang (2009) Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(2):319-335

22. Guan, Juang, Chen (2012) Face localization using fuzzy classifier with wavelet-localized focus color features and shape features. Digital Signal Processing 22(6):961-970
