Abstract
Persistently tracking multiple objects is very challenging when there exit occlusions. We present a tracking association approach based on the A* algorithm. We first formulate the multiple object tracking as an integer programming problem of the flow network. Under this framework, the integer assumption is relaxed to a standard linear programming problem. Therefore, the global optimal solution can quickly be obtained using the A* algorithm with dynamic weights. The proposed method avoids the difficulties of integer programming and more importantly, it has a lower worst-case complexity than competing methods but a better tracking accuracy and robustness in complex environments. Experiment results revealed that our proposed method achieved state-of-the-art time costs and can operate in real-time.
Introduction
Multiple object tracking is important for many computer vision applications, such as video surveillance, human-computer interaction, intelligent navigation, and others [1, 2]. Apart from a high performance detection algorithm as an auxiliary, high quality multi-object tracking should also track the algorithm for support, which can address certain types of complex cases, e.g., occlusion, illumination, clutter, and so on [3]. The data association (DA) method is a favorite method for multi-object tracking. The utilized techniques often incluode the nearest neighbor method, joint probability data association (JPDA), methods based on neural networks, and so on [4, 5].
The effect of the DA methods mentioned above is closely related to the detection accuracy in adjacent frames. These typical approaches are resilient to false positives and false negatives: if an object is not detected in a frame but is detected in the preceding and following frames, it is a false negative. A false positive is mistaking the tracking object “A” as object “B”. Although this problem can be solved using targeted design in a statistical trajectory model with filtering [6, 7], the calculation method that provides the maximum posterior probability is NP-complete.
To deal with this problem, some recent papers have proposed different approaches: Giebel [8] uses sampling and particle filtering to remove clutter from the same object and reduce the probability of NP-completeness. This method obtains a relatively accurate tracking trajectory but requires a sufficient number of sampling points. Perera [9] divides a long sequence into several short ones, yielding many tracklets, and links them using Kalman filtering. This can avoid NP-completeness. The accuracy of this method is inversely proportional to the length of the tracklets; the shorter the length, the better the tracking result. However, the excessive division increases computation time, therefore, the method cannot track objects for long. Fleuret [6] processes trajectories individually over long sequences using greedy dynamic programming (DP) to choose the order. These approaches, while effective, cannot attain the global optimum. Xi [10] improves the SPFA algorithm to relax the integer assumption and to successfully identify the global optimal soluteion. However, this algorithm is not easy to understand.
Zhang’s approach [11] relies on a min-cost network flow framework-based optimization method to find the global optimum for multi-object tracking. However, the two algorithms he proposes have defects and his complexity is polynomial. Under this framework, Berclaz [12] formulates multi-object tracking as an Integer Programming (IP) problem and reduces it to linear programming (LP). By relying on the k-shortest paths (KSP) algorithm for the optimization of the LP problem, this approach reduces the complexity to perform robust multi-object tracking in time. However, because of KSP’s lack of a motion model over DP, the tendency of the latter to ignore fragmentary trajectories makes it more robust. Pirsiavash [13] continues the work of Zhang and uses his method to obtain the global optimal solution with the greedy algorithm for K = 1 in O (N), but only obtains the approximate solutions for K > 1 in O (KN), where K is the unknown optimal number of unique tracks.
By contrast, we effectively combine the models proposed by Zhang and Berclaz, to devise a more efficient A* association algorithm with dynamic weights(DW-A*). Not only can the DW-A* algorithm directly obtain the global solution without greedy optimization, but it can show better performance with respect to both the worst-case complexity and the run time than the above-mentioned state-of-the-art algorithms. The main contributions of this paper include the following: Based on the min-cost network framework, we introduce a novel general mathematical IP formulation for multi-object tracking. The proposed IP method convenient and easier to comprehend than the state-of-the-art methods stated in Refs. [12] and [13]. Furthermore, it is conducive to naturally filtering out false positives and false negatives using DW-A*. To solve the integer LP formulation of the proposed framework, and to obtain the global optimum, we propose a novel more rapid and more efficient DW-A* algorithm that improves the A* algorithm. Compared with the state-of-the-art methods of Refs. [12] and [13], the proposed algorithm can obviously reduce the running time and the results of MOTA and MOTP have improved. Extensive experimental validations.
The rest of paper is organized as follows. InSection 2, we formulate an IP problem using the min-cost network flow framework and relax it to a continuous LP. Section 3 contains our proposed DW-A* algorithm for the relaxation of the original integer assumption. We introduce methods to target localization and long sequence segmentation processing in Section 4. Section 5 shows the experimental results and a complete evaluation metrics, and Section 6 concludes the paper.
Network flow model
The target motion of multi-objet tracking can be described better by the relationship between the neighborhood locations that uses the DP method in a min-cost network flow model. We define an objective function for multi-object tracking in the same manner as Ref. [12]. The objective presence of likelihood will be estimated by the marginal posterior probability in every frame, thereby obtaining the potential trajectory of the moving object.
Formalization
We formulate multi-object tracking as a process where the objective location of each target discretely changes in continuous time. A directed 3D spatiotemporal group with random variable k is used to describe the video sequence.
For any location k at time t, the neighborhood N (k)⊂ { 1, 2, ⋯ , K } denotes the locations that an object can reach at time t + 1. A single track as an ordered set of state vectors T = (k
1, ⋯ , k
N
), and X = (T
1, ⋯ , T
L
) is a set of tracks. We assume that the tracking tracks are independently of each other and describe the network flow model of multi-object tracking using the dynamic model as follows:
P source (k 1) is the probability of a tracking trackstarting at location k 1 and P sink (k N ) is the probability of a tracking track ending at location k N .
In the spatial coordinate set V, a binary indicator variable φ
i,k represents the directed flow from location i to location k, i.e., it stands for the number of objectsmoving from i to k. φ
i,k is 1 when the space-time locations i and k are included in some track, given that the object is at location i at time t and location k at time t + 1, which means that an object stays at the same spatial location between times t and t + 1. Some constraint conditions are executed for the variable φ
i,k.
Let a random variable M
k
stands for the true presence of an object at location k in space-time. For every instant of t, the detector is used to check every location of the tracking zone. The marginal posterior probability of an existing object is calculated as
M
k
is conditional independence in X, we infer the maximum a posteriori estimate of tracks by the probability distributions of the existence of objects:
In our model, because the objects can enter and leave the tracking area, we introduce additional nodes for the source and sink that have been defined in the model proposed by [12]. Equations (8–13) can then be naturally translated into an integer program:
”, c
source,i and c
i,sink are represented by dash line “
”.
The costs are defined as follows:
The relaxation of integer program using standard methods is NP complete. In general, the variants of the simple algorithm [14] or the interior point-based methods [15] can be used to solve this problem. However, these algorithms have very high worst case time complexities. In [12] and [13], although the methods for KSP and the stochastic shortest path (SSP) can successfully relax the IP to a continuous linear program, both of them have their own deficiency. We used the proposed DW-A* algorithm to compensate for the deficiencies of these methods.
In this paper, we propose an A* algorithm with dynamic weights to relax the IP by the network flow mode, the worst-case complexity of this algorithm is O (KN). The global optimal solution of the proposed algorithm makes the tracking more the reliabe and more efficient. The network flow model needs two particular properties to realize our algorithm: All edges and nodes are independent of each other and all edges are unit capacity. The network is a directed acyclic graph (DAG).
A* algorithm
Let C be the total cost of any location in space V, and let E be the set of the edges between adjacent frames of any neighborhood location. The state transition between any pair of nodes of the model can be attained by E, and the DAG G (V, E C) can completely describe the flow activity of an object of the min-cost flow model.
Let G r (φ) as the residual graph of G (V, E, C) that denotes all locations from the current node to the ending node. We can than find the shortest path between both nodes by the A* algorithm in G r (φ).
Because the tracking targets may appear inside the tracking area and others may leave, we introduce two additional virtual nodes, source and sink into our DAG. These two virtual nodes denote the potential position, source and sink here denote the position where a target appeares and disappeares, respectively. Then, we use the neighbors of birth and end to replace the original position and form a new DAG with virtual nodes source and sink, as shown in Fig. 2.
We create the Open list and the Closed list. The Open list records all the nodes that are considered to find the shortest path, and the Closed list records the nodes that are no longer considered. In the proposed min-cost flow model, we can obtain the shortest path througth the following steps:
1) We put the initial position l
birth
into the Open list, calculating the total cost of any path f
i,birth from l
birth
to current position l
i
:
The cost from l
i
to the potential position l
j
, l
j
∈ N (l
i
) of the next frame is calculated as follows:
The estimate of total cost of any path f
j,end from l
j
to the potentially terminal position is calculated by Equation (21), l
birth
is put into the Closed list.
2) Putting the neighborhood of l
i
, l
j
∈ N (l
i
) into the Open list, we obtain the position , which can be satisfied by Equation (22), putting into the Closed list.
We empty the Open list and update the current location to . Steps 1)-3) are iterated until l
end
is added into the Open list (because l
birth
has been add into the Closed list, the Open list no longer adds it). We output all the searched locations in the Closed list according to first in first out (FIFO), which is the shortest path.
Figure 2 shows the simple processing steps of the A* algorithm in our proposed mode. Here,
The IP problem U 1 and the corresponding relaxed LP problem U 2 are considered as follows:
U 1 : min cx ; s . t . Ax = b, x ≥ 0 and as an integer vector.
U 2 : min cx ; s . t . Ax = b, x ≥ 0.
where, c, b and A are the known appropriate dimension vectors and constraint matrix, respectively.
The total unimodularity of A has been proved in [12]. To improve the screening speed of the A* algorithm for the large number of nodes in the initial stage and then ignore some objective movement in the later stage, we use the dynamic weight on the A* algorithm.
Dynamic weights
To help find the optimal solution quickly and accurately, we can prioritize speed in the initial stage of the search and increase the precision priority in the later stage, which can be achieved by adding a dynamic weight ω in Equation (22).
Pirsiavash and Berclaz propose the KSP and SSP algorithms. The worst-case complexity of both algorithms is O (KN log N), where K is the unknown optimal number of unique tracks, and N is the frame number of the sequence. Because of the different values of K, Pirsiavash uses different methods to obtain the solution. The specific complexity of this algorithm is related to the value of K.
The Dijkstra algorithm is recognized as an effective method to compute the shortest path in O (N log N) time. Unfortunately, in our proposed min-cost flow network, there are negative costs, which contrdict the precondition of the Dijkstra algorithm. Fortunately, the simpler A* algorithm can be adopted in this network. For the DAG G (V, E, C), the worst case will appear when the number of extended sub-nodes from the current node is up to 3 sub-nodes that are not in the Open list. While the Open list will be empty in each iteration, the number of nodes of the Open list will not increase. Therefore, for the N frame sequence, the worst-case complexity of the DW-A* algorithm is O (KN), where K is the number of optimal paths using the A* algorithm in the DAG. Generally, K ≤ 3. In fact, achieving the tracking curves using the A* algorithm involves obtaining the optimal solution using the heuristic Dijkstra algorithm. The heuristic feature of the A* algorithm makes the search direction more objective and reduces unnecessary calculations.
Object localization and sequence processing
High quality multi-object tracking requires a reliable tracker, a detector that can accurately segment and locate multiple objects, and a pre-processing method that can improve the performance of the algorithm.
Object detection and localization
To obtain an accurate target for the tracker, we establish a background model with the improved codebook algorithm and extract the observed characteristic information of the tracking object by the foreground/background subtraction method of [18]. Using the method from [19], we segment objects that were initially merged together. Then, we obtain the probability distributions of the planes of the objects from the detector, and these can serve as the input to the DW-A* algorithm. A few selected frames of target localization are illustrated in Fig. 3.
Full range tracking in a camera field of view increases the processing time of the algorithm and consumes a significant portion of the limited memory resources. For this reason, because most of the probabilities of the objective presence are 0, we can reduce the number of nodes and computational cost by this characteristic. On the other hand, we limit the potential birth area of targets to reduce the amount of computation. The proposed method also checks the maximum detection probability of each location k within a given spatiotemporal neighborhood of each frame t.
If the value at a location is below the set threshold, an object represented by the value is considered not able to reach the location, and all flows from and to it are removed from the model. This method can reduce by an order of magnitude the number of required variables and constraints. In our experiment, we pruned the graph by a radius of ɛ 1 = ɛ 2 = 3.
In theory, processing a long video sequence using the DW-A* algorithm can obtain the global optimum for tracking time, but requires lots of operation time. To address this issue, we split the long sequence into segments of 100 frames each, which yields good results with a delay of less than 0.5 s between input and output and can be performed in real time.
For each segment maintaining temporal consistency, we use the method of multi-frame overlay, as shown in Fig. 4, and add the last 10 frames of the previously optimized segmentation to the first 10 frames of the current one. We then force the sum of the flows of every location of the first 10 frames of the current frame to be consistent with the total number of flows of the last locations of the object in the last 10 frames of the previous one. This effectively solves the problem of the missing target on the piecewise point.
If we cannot find the tracking object in the first 10 frames of the current segment, our method searches for the object in t′ frames after the current one. In our experiment, we let t′ = 10. If we find the object in a frame within t′, this frame is the first frame of the current segment, the tracking fails otherwise.
In our simulation, sequences with differentcharacteristics were selected from the CAVIAR, BEHAVEDATA, PETS09 and ETHMS datasets. The challenges for each of these are summarized in Table 1. The selected sequences cover almost all problems that commonly occur in multi-object tracking.
Parameter setting
In the training period, a detector is established by the background subtraction method of the improved codebook algorithm model. We combine the detection result with the activity scope of the object by foreground/background segment update in real time, and calculate the location of the object with a high probability. Because the size of the activity scope of the object and the number of the pixels of the object are not identical in every sequence, our method can generate about 900 detections per frame in each video sequence. We set the log-likelihood ratio of each detection to be the negative score as the results of the linear detector.
We used a bounded values dynamic model. We define the cost c i,j between two locations in consecutive frames in the case of spatial overlap (i.e., an object remains at a location) as 0. The costs from the virtual position to the neighborhood of birth and end are c source,birth = 10, c end,sink = 10, respectively.
Evaluation metrics
Let GTi,t be the i-th ground truth bounding box for the t-th frame, and TRi,t be the tracked bounding box. C
i,t for the t-th frame and i-th object is defined as the ratio between the area of intersection GTi,t ∩ TRi,t and the area of union GTi,t ∪ TRi,t [20].
In our experiment, we set the threshold of C i,t to 0.5, which means that the tracking is successful when the overlapping areas of the ground truth bounding box and the tracked bounding box exceed 0.5.
Our results are evaluated using the multiple object tracking accuracy (MOTA) and multiple object tracking precision (MOTP) metrics of the standard CLEAR2006 metrics [21].
To ensure the unique identification of each tracking target, we use different colors to indicate the order. The video sequences used in our experiment are from Table 1. The detection results are obtained by the process described in Section 4.1 as the input of our algorithm. We then conduct a performance test of the multi-object tracking of false positives, false negatives and a dynamic background, respectively.
The available probability distribution of the dynamic background of the sequence needs to be relatively consistent. Only in this way can the algorithm quickly obtain the location of an object for tracking. The targets should be fixed access areas in the tracking ground. Because the tracking ground is moving, the potential area in which the objects can enter and exit changes. We require the borders of the camera field of view to be the area for all objects that can enter and exit.
The sequence uses Seq04left from the ETHMS dataset. We obtain object characteristics by the method of combining the skin color and [22], and show the typical results in Fig. 10. The method of detection and localization in Section 4.1 only considers the available probability distribution of the object characteristic in the tracking ground, and does not relate to the background conditions. Therefore, the sequence for our experiment requires a consistent probability distribution. This constraint, in a way, limits the experimental conditions of performance for a dynamic background, but does not affect the conclusion that multi-object tracking using DW-A* in a dynamic background is robust.
Simulation analysis
All the above experiments were performed on a Windows XP PC equipped with a 2.7 GHz Pentium(R) Dual-Core CPU and 8 GB of memory. The software platform uses Visual Studio 2010 and OpenCV2.2.
We contrast the proposed algorithm with Zhang’s method 2 [11], Berclaz’s KSP [12] and Pirsiavash’s SSP [13] in S2L1view8 of the PETS09 dataset and Fightmargaret of the BEHAVEDATA dataset with regard to the average tracking errors. The results are shown in Fig. 11. We also compared the algorithms with respect to tracking accuracy. Figure 12 shows false positives per image (FPPI) versus detection rate for all algorithms.
Figure 11 shows that the tracking errors of these algorithms are not significantly different in cases not involving clutter and occupancy. However, when tracking an object in the case of false negatives and false positives for a long time, our proposed algorithm exhibits clear superiority. Although the occupancy problem in the case of simple assumptions can be satisfied by Zhang’s method 2, the required assumptions result in omission and eventually lead to tracking failure when several false positives and false negatives frequently occur. In Fig. 12, when the above state-of-the-art algorithms have the same target detection rates, the DW-A* algorithm performs better than KSP and SSP in controlling FPPI. The superiority of the proposed algorithm is due to its faster relaxation method with the dynamic constraint, and to more quickly finding the global optimal solution.
With the same object detection algorithm as above, we compared the false positives generated using DW-A* with those from the other methods on the ETHMS dataset and the CAVIAR dataset, as shown in Table 2. The results show that the DW-A* can track better. Futher, as shown in Fig. 13, the run time of the DW-A* significantly outperforms the other three algorithms.
Run time
We evaluated the speed of our DW-A* tracking algorithm on the sequences of the BEHAVEDATA dataset at 25 fps. The curves of the run time for DW-A* and the above algorithms have been shown in Fig. 13. The vertical axis representing run time is plotted on a log scale. The solution of Zhang’s method 2 does not converge for a significant run time. When dealing with a video of 1000 frames, the KSP solver takes approximately 20 seconds and SSP takes 0.9 seconds, but our proposed DW-A* solver only takes 0.15 seconds.
Conclusions
To solve the false positives and false negatives of the multi-object tracking in the clutter environments, we proposed a reliable tracker with a flow network model. In the min-cost flow framework established by the theory of integer program, we combined the A* algorithm with dynamic weights to develop the DW-A* algorithm. We used this novel algorithm to relax the integer program and to successfully identify the global optimal solution. The resulting algorithm can better solve the problems of short-time false negatives and false positives in multi-object tracking, and is more robust than state-of-the-art algorithms. The DW-A* algorithm can quickly find the global optimal solution of the relaxed LP.
Experimental results indicate that the proposed algorithm is helpful in improving trajectory consistency and solving serious occlusion problems between multiple targets, and can satisfy real-time measurement requirements. Compared with other algorithms, there are obvious advantages of DW-A*. Tracking multiple types of targets with a dynamic background in real-time will be the focus of our future research.
Footnotes
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grants No: 61372090).
