Abstract
Visual tracking is of great importance in multimedia technology enhanced learning. Many learning and teaching activities based on human-machine interaction require tracking of specific objects. The particle filter has grown to be a standard framework for visual tracking in the past decades. One of its key issues is the design of the proposal distribution, which can greatly affect the performance of particle trackers. In this paper we propose an enhanced particle filter for robust visual tracking. First, we propose a new particle filter that uses two proposal distributions to generate particles, that is, the unscented Kalman filter and the transition prior. Second, we introduce an appearance model based on the locality sensitive histogram (LSH) and color to deal with appearance variation within the particle filter tracking framework. Third, by combining our new particle filter with the LSH and color based appearance model, we develop a robust tracking algorithm. Experimental results on several public sequences show that our tracking algorithm performs at least as well as several other tracking algorithms.
Introduction
Visual tracking is an active research area in the computer vision community and has been applied successfully to a variety of areas, including augmented reality, human-machine interaction, and intelligent surveillance [1]. Such applications are of great importance to multimedia technology enhanced learning (TEL) [2–5]. In many video-based teaching activities, teachers need special devices to analyze the contents, such as the trajectory of a moving object (a ball, a car, a human face or a player), so that the students can better understand and acquire knowledge. Consequently, tracking such objects is of great help to teachers and students.
In the past decades, researchers have studied visual tracking extensively and proposed many tracking methods [6–8]. These methods can be divided into two categories, that is, stochastic sampling based methods and deterministic methods.
As a well-known stochastic sampling based method, the particle filter has grown to be a standard tracking framework. It recursively approximates the posterior probability distribution using a set of weighted particles (or samples). Since its first successful application in visual tracking, the CONDENSATION algorithm [9], many improved particle filter based tracking algorithms have been proposed to address different challenging situations in visual tracking [6, 10–13]. One of the key issues in particle filtering is choosing an appropriate proposal distribution to generate particles. The most popular proposal distribution is the transition density, which was first applied in the CONDENSATION algorithm, and many existing particle trackers adopt it. However, the transition density does not make use of the current observations, which can lead to tracking failure in complex scenarios. Rui [14] used the unscented Kalman filter (UKF) to generate the proposal distribution for face tracking. The resulting tracker, named the unscented particle filter (UPF), takes the current observations into account and showed superior performance in face tracking. Sun [11] designed an iterated unscented particle filter (IUPF), which uses the iterated unscented Kalman filter (IUKF) as the proposal distribution for visual tracking. The IUKF makes better use of the current observations to achieve better estimates of the object states, and the IUPF achieved better tracking performance in handling partial occlusion and rotation.
Another important issue in designing a particle filter based tracker is the appearance model, that is, the likelihood model. Many existing particle trackers are based on the color based appearance model [15], which uses the color histogram to model the object appearance. During tracking, each particle is evaluated based on the distance between the color histogram of the reference template and that of the particle. However, the color based appearance model is easily influenced by illumination variation, which leads to tracking failure under severe illumination change. Some researchers have proposed fusing different visual cues in the particle filtering framework in order to better handle different challenging tracking scenarios. Brasnett [16] proposed using the basic particle filter and the Gaussian sum particle filter for visual tracking, in which the likelihood distribution is designed based on color and texture features under the assumption that the features are independent. The proposed algorithms showed improved tracking accuracy and robustness. Dore [13] adopted shape and color cues in the particle filtering framework for tracking deformable objects. A new importance density function was designed for the prediction phase, which exploited multiple Mean Shift trackers to infer the global motion of the target and its deformation. This tracker is robust to non-rigid objects and partial occlusions. Xiao [17] proposed using color and histogram of gradients (HOG) features in a particle filter based tracker, which showed superior performance in handling partial occlusions. Xin [18] proposed an adaptive multiple cue integration approach in the particle filter framework, aiming at robust visual tracking of outdoor vehicles. In this approach, the reliability of the observation likelihood of each cue is estimated according to the uncertainty metric factor of the cue and the corresponding spatial distribution of particles. It shows robustness in handling partial occlusions and background clutter.
In general, many existing particle filter based trackers rely on a single proposal distribution and a color based appearance model, which often leads to tracking failure when severe appearance change occurs. In order to solve this problem, we propose a new tracking framework based on the particle filtering algorithm. First, we use two distributions in the particle filtering framework, that is, the unscented Kalman filter and the transition prior, each of which corresponds to a visual feature. Second, we choose color and LSH [19, 20] based features to construct the appearance model. The color feature is associated with the transition prior, while the LSH feature is associated with the UKF. The features are assumed to be independent. The proposed tracking method is evaluated on several typical video sequences, and the results show that our method is superior to several other particle filter trackers.
The rest of the paper is organized as follows. In Section 2, we state the problem of object tracking within the particle filtering framework. Section 3 details our proposed tracking method. The experimental results are shown in Section 4. The last section draws the conclusion.
Particle filtering framework
Bayesian tracking
Visual tracking can be considered a Bayesian filtering problem. As the aim of Bayesian filtering is to estimate the system states recursively based on the incoming observations, visual tracking under the Bayesian framework refers to estimating the value of the object state variable using the image observations. Thus, the task of our tracking method is to find the state that maximizes the posterior probability density function (PDF) p(x_t|y_{1:t}). Here, we define the object state x_t as a 3D vector whose three components represent the position coordinates and the scale of the object. The PDF can be acquired by the following Bayesian filter,
$$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1} \qquad (1)$$
It is well known that an analytical solution is difficult to obtain because the integral in Equation (1) is generally intractable. Thus, many real-world applications resort to approximating p(x_t|y_{1:t}) with Monte Carlo sampling methods, one of which is the particle filter.
The basic idea underlying the particle filter is to approximate the PDF using a set of N weighted particles.
As discussed in Section 1, the design of both the proposal distribution and the likelihood model is of great importance for particle filter performance. We detail our contributions in Section 3.
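The weighted-particle approximation can be illustrated with a minimal sketch. It assumes the particles are stored as an N x 3 NumPy array of (position, position, scale) states and that the state estimate is the weighted mean of the particles; the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def normalize_weights(weights):
    """Scale the weights so they sum to one (uniform if all are zero)."""
    w = np.asarray(weights, dtype=float)
    total = w.sum()
    if total <= 0.0:
        # Degenerate case: no particle explains the observation.
        return np.full(w.shape, 1.0 / w.size)
    return w / total

def estimate_state(particles, weights):
    """Weighted mean of the particle states (assumed estimator)."""
    w = normalize_weights(weights)
    return w @ np.asarray(particles, dtype=float)

# Two particles; the second one carries three times the weight of the first.
particles = [[0.0, 0.0, 1.0], [2.0, 4.0, 1.0]]
print(estimate_state(particles, [1.0, 3.0]))
```

Other estimators, such as the highest-weight particle, would also fit the framework; the weighted mean is used here only because it is the most common choice.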
Motion model
In order to robustly track an object, we initially select a region that defines the object and provides the prior information for tracking. The motion model adopted is a second-order constant acceleration model.
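A constant acceleration model can be sketched as follows for a single coordinate. This is an illustration under stated assumptions, not the exact model of the paper: each particle is assumed to carry a hypothetical (position, velocity, acceleration) triple for the coordinate, the time step `dt` is one frame, and Gaussian noise is added to the position only.

```python
import numpy as np

def constant_acceleration_matrix(dt=1.0):
    """Transition matrix for one coordinate with state [p, v, a]."""
    return np.array([[1.0, dt, 0.5 * dt * dt],
                     [0.0, 1.0, dt],
                     [0.0, 0.0, 1.0]])

def propagate_coordinate(kinematics, dt=1.0, noise_std=0.0, rng=None):
    """Propagate an (N, 3) array of [p, v, a] rows by one time step."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.asarray(kinematics, dtype=float)
    predicted = k @ constant_acceleration_matrix(dt).T
    # Process noise on the position only (an assumption of this sketch).
    predicted[:, 0] += rng.normal(0.0, noise_std, size=predicted.shape[0])
    return predicted

# p = 0, v = 1, a = 2 becomes p = 0 + 1 + 0.5 * 2 = 2, v = 1 + 2 = 3, a = 2.
print(propagate_coordinate([[0.0, 1.0, 2.0]]))
```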
Many particle filter based tracking methods use the color histogram in designing the appearance model. As discussed above, although it is simple and efficient, the color based appearance model cannot handle severe appearance changes induced by illumination change. Thus, we combine another efficient histogram based feature, LSH, with the color histogram in order to better model the object appearance change. Under the assumption that the adopted features are independent, the overall appearance model is the product of the likelihoods of the individual features.
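LSH has been applied successfully in visual tracking and image filtering [19, 20]. It is also very simple and easy to implement. For an image I, the LSH computed at pixel r is a histogram in which every pixel contributes with a weight that decays exponentially with its distance from r.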
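A minimal one-dimensional sketch of such a histogram is given below. It assumes the usual exponentially weighted definition, in which the histogram at pixel r accumulates each pixel q with weight alpha^|r - q|, and it assumes the pixel values have already been quantized to bin indices. The parameter `alpha` and the function names are illustrative choices, not values from the paper.

```python
import numpy as np

def lsh_1d(bin_indices, n_bins, alpha=0.9):
    """Locality sensitive histograms of a 1D signal in O(W * n_bins).

    Row r of the result is the histogram at pixel r, where pixel q
    contributes alpha ** |r - q| to the bin it falls into.
    """
    idx = np.asarray(bin_indices)
    width = idx.size
    onehot = np.zeros((width, n_bins))
    onehot[np.arange(width), idx] = 1.0
    # Left pass: contributions from pixels q <= r.
    left = onehot.copy()
    for r in range(1, width):
        left[r] += alpha * left[r - 1]
    # Right pass: contributions from pixels q >= r.
    right = onehot.copy()
    for r in range(width - 2, -1, -1):
        right[r] += alpha * right[r + 1]
    # Pixel r itself was counted in both passes, so subtract it once.
    return left + right - onehot

def lsh_1d_bruteforce(bin_indices, n_bins, alpha=0.9):
    """Direct O(W^2) evaluation of the same definition, for checking."""
    idx = np.asarray(bin_indices)
    hist = np.zeros((idx.size, n_bins))
    for r in range(idx.size):
        for q in range(idx.size):
            hist[r, idx[q]] += alpha ** abs(r - q)
    return hist

signal = [0, 2, 1, 1, 3]
print(np.allclose(lsh_1d(signal, 4), lsh_1d_bruteforce(signal, 4)))
```

The two-pass recursion is what makes the feature cheap to compute: each pass reuses the histogram of the neighbouring pixel instead of summing over the whole signal again.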
For the LSH, an appropriate distance metric for measuring the similarity between the candidate object region and the template region is the earth mover's distance (EMD).
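For one-dimensional histograms with equal total mass, the EMD has a simple closed form: the sum of the absolute differences of the two cumulative histograms. The sketch below assumes that setting, with a ground distance of one between adjacent bins; it is an illustration and not necessarily the exact formulation used in the paper.

```python
import numpy as np

def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two 1D histograms.

    Both histograms are normalized to unit mass first; the ground
    distance between bins i and j is assumed to be |i - j|.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return float(np.abs(np.cumsum(a - b)).sum())

# All mass has to move two bins to the right, so the distance is 2.
print(emd_1d([1, 0, 0], [0, 0, 1]))
```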
For the color based likelihood model, we define a likelihood function based on the HSV color histogram.
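A color likelihood of this kind is often built from the Bhattacharyya distance between the candidate and template histograms. The sketch below assumes that choice, together with a hypothetical scale parameter `lam`, and shows how two independent feature likelihoods multiply into the overall appearance likelihood. The exact functions of the paper may differ.

```python
import numpy as np

def bhattacharyya_distance(hist_p, hist_q):
    """Distance in [0, 1] between two histograms (normalized internally)."""
    p = np.asarray(hist_p, dtype=float)
    q = np.asarray(hist_q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    coefficient = np.sum(np.sqrt(p * q))
    return float(np.sqrt(max(0.0, 1.0 - coefficient)))

def color_likelihood(candidate_hist, template_hist, lam=20.0):
    """Likelihood that decays with the squared histogram distance."""
    d = bhattacharyya_distance(candidate_hist, template_hist)
    return float(np.exp(-lam * d * d))

def combined_likelihood(color_lik, lsh_lik):
    """Product of the feature likelihoods under the independence assumption."""
    return color_lik * lsh_lik

template = [4, 3, 2, 1]
print(color_likelihood(template, template))   # identical histograms
print(color_likelihood([1, 0], [0, 1]))       # disjoint histograms
```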
In our method, we use two distributions to generate particles, the UKF and the transition prior. The motivation behind this contribution has two aspects. On the one hand, although the transition prior is simple and easy to implement, it does not incorporate the most recent observations, which contain valuable information about the object. On the other hand, the UPF uses the UKF as the proposal distribution, but its very high computational cost prohibits extensive application in practice. Thus we propose to combine these two proposal distributions in the particle filtering framework. We denote the transition prior as q1 and the UKF as q2. The final proposal distribution can be formulated as follows,
As q1 is the transition prior p(x_t|x_{t-1}), according to (6) and (13), the particle weights in (4) can be updated as follows.
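The role of the proposal in the weight update can be illustrated with the generic importance-sampling rule, in which each weight is multiplied by likelihood x transition density / proposal density. The sketch below is that textbook rule and is not a reconstruction of Equation (14); the function and argument names are hypothetical. It shows why the update simplifies for particles drawn from the transition prior: the density ratio equals one, so only the likelihood remains.

```python
import numpy as np

def update_weights(prev_weights, likelihood, transition_pdf, proposal_pdf):
    """Generic sequential importance-sampling weight update.

    All arguments are length-N arrays evaluated at the new particles.
    """
    prev = np.asarray(prev_weights, dtype=float)
    lik = np.asarray(likelihood, dtype=float)
    ratio = np.asarray(transition_pdf, dtype=float) / np.asarray(proposal_pdf, dtype=float)
    weights = prev * lik * ratio
    return weights / weights.sum()

# Particles drawn from the transition prior: proposal == transition,
# so the new weights are proportional to the likelihood alone.
density = np.array([0.3, 0.7])
print(update_weights([0.5, 0.5], [0.2, 0.6], density, density))
```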
Figure 1 shows the schematic view of our proposed particle filter method. At time step t - 1, we use the designed proposal distribution to generate particles and calculate the particle weights. The main procedure contains the following steps:
Predicting step. Use the particle set of time t - 2 to generate the predicted particle set through the proposal distribution designed in (13).
Update step. For each predicted particle, calculate the likelihood score according to Equation (6) and then the particle weight according to Equation (4).
Estimation step. Use the updated particle set to estimate the object state with Equation (2).
Resampling step. Remove the particles with low weights, duplicate the particles with high weights, and assign each particle the equal weight 1/N. Return the resampled particle set.
The above steps are executed iteratively until the last frame has been processed. We summarize our proposed tracker in Algorithm 1.
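The resampling step can be sketched as follows. The text does not name a particular resampling scheme, so this example assumes systematic resampling, a common low-variance choice; the function names are hypothetical.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Return N particle indices drawn by systematic resampling."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = w.size
    # One random offset, then N evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(w)
    cumulative[-1] = 1.0  # guard against round-off in the last entry
    return np.searchsorted(cumulative, positions, side="right")

def resample(particles, weights, rng=None):
    """Duplicate heavy particles, drop light ones, reset weights to 1/N."""
    indices = systematic_resample(weights, rng)
    n = indices.size
    return np.asarray(particles)[indices], np.full(n, 1.0 / n)

# Only the third particle has non-zero weight, so it is copied N times.
print(systematic_resample([0.0, 0.0, 1.0, 0.0]))
```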
Input: Particle set of p(x_{t-1}|y_{1:t-1}), image based observation y_t.
Output: Particles of p(x_t|y_{1:t}), estimated object state.
1:
2: – Propagate particle through the motion model to generate the predicted particle set.
3: – Compute the color feature and the LSH feature with the current frame.
4: – Calculate the particle weight using Equation (14).
5:
6: – Obtain the current particle set
7: – Calculate the estimated object state using Equation (2).
8: – Resample the particle set to obtain
9:
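One iteration of a particle filter with this overall structure might look like the sketch below. It is a simplified illustration under several assumptions: the motion model and the likelihood are passed in as hypothetical callables, the weights are updated with the likelihood only (the transition-prior case), the estimate is the weighted mean, and multinomial resampling is used. The UKF proposal and the feature extraction of the proposed tracker are not modelled.

```python
import numpy as np

def particle_filter_step(particles, weights, propagate, likelihood, rng):
    """Predict, weight, estimate and resample once.

    particles: (N, 3) array; weights: (N,) array;
    propagate(particles, rng) -> (N, 3) array; likelihood(state) -> float.
    """
    n = len(particles)
    predicted = propagate(particles, rng)                  # prediction
    lik = np.array([likelihood(x) for x in predicted])     # feature scores
    new_weights = weights * lik                            # weight update
    total = new_weights.sum()
    new_weights = new_weights / total if total > 0 else np.full(n, 1.0 / n)
    estimate = new_weights @ predicted                     # state estimate
    indices = rng.choice(n, size=n, p=new_weights)         # resampling
    return predicted[indices], np.full(n, 1.0 / n), estimate

# Toy usage: static motion and a likelihood peaked at a known target state.
rng = np.random.default_rng(0)
target = np.array([10.0, 10.0, 1.0])
particles = np.array([[0.0, 0.0, 1.0], [10.0, 10.0, 1.0]])
weights = np.array([0.5, 0.5])
_, _, estimate = particle_filter_step(
    particles, weights,
    lambda p, r: p,
    lambda x: np.exp(-np.sum((x - target) ** 2)),
    rng,
)
print(estimate)
```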
This section evaluates the proposed tracking method from qualitative and quantitative perspectives. We compare the performance of our method with that of several other tracking methods: PF [9], the UPF tracker [14], the IUPF tracker [11], and the SSAMC tracker [7]. We select six typical video sequences, listed in Table 1, to test the trackers' performance. Such video sequences may appear in different teaching or learning scenarios.
Qualitative analysis
For qualitative comparison, we use 100 particles for UPF, our method and IUPF, 600 for PF and 300 samples for SSAMC by default.
Face. In this sequence, a man moves his face quickly and suddenly changes direction. The tracking results are shown in Fig. 2. It is clear that our tracker could capture the face successfully with fewer particles compared to the other four trackers. The IUPF and UPF could also track the man's face, but their performance is clearly worse than that of our tracker. The SSAMC and PF frequently lose the face throughout the sequence. The results on the Face sequence demonstrate that our method is efficient in dealing with fast motion.
Girlface. This sequence shows a girl's face moving with rotation, pose change and partial occlusions. It is a long sequence containing different challenging factors. As shown in Fig. 3, the proposed tracker could efficiently deal with the problems induced by rotation (#177), pose change (#306) and partial occlusions (#458 and #468). The SSAMC and IUPF show moderately good performance, but their tracking accuracy is not as good as ours. The UPF and PF show the worst performance: they cannot capture the girl's face during rotation and occlusion. These results indicate that our tracker could efficiently handle the appearance variation induced by rotation and partial occlusions.
SeqMS. The SeqMS sequence contains a man who frequently moves his hands in front of his face, which leads to frequent partial and total occlusions. 300 particles are used for all methods except the PF, which uses 600. Sample frames of the tracking results are shown in Fig. 4. When the man's face is occluded by his hands, our method could accurately localize its position before and after the occlusion, and it shows consistently better performance. The SSAMC shows better performance in dealing with partial occlusions compared to the IUPF and UPF. However, as for total occlusions, only our method could successfully handle the problem, while the other four methods fail.
David and Singer. In this experiment, 300 particles are used for all methods except the PF, which uses 600. The David and Singer sequences mainly contain severe illumination variation, which causes severe appearance changes. As stated previously, we adopted the LSH based appearance model, which can deal with illumination change efficiently. Our tracker is therefore expected to show much better performance on these two sequences than the other trackers. Tracking results on the David sequence are shown in Fig. 5. Although severe illumination change occurs throughout the sequence, our method could consistently capture the object. The SSAMC could track the object at certain frames, but its tracking accuracy is lower than that of our method. The IUPF, UPF and PF methods frequently lose the object because they adopt the color based appearance model, which cannot deal with the severe appearance variation in complex scenarios. Figure 6 shows the tracking results on the Singer sequence. The results are similar to those on the David sequence: our method shows the best performance compared to the other four methods. The results on these two sequences demonstrate that our proposed tracking method could efficiently deal with appearance changes induced by severe illumination change.
Hockey. In this sequence, the hockey player moves among many similar players wearing the same clothes. The results are shown in Fig. 7. When the object interacts with similar players, our tracker could give steady and accurate tracking results (#44 and #52). The PF tracker could also capture the object when interaction occurred, but its accuracy is worse than ours. The IUPF and UPF lose the object after interaction occurred, while the SSAMC frequently loses the object throughout the sequence.
Quantitative analysis
Success rate
For quantitative comparison, we first evaluate the trackers' success rates on the different sequences. Following [21], we define a frame as successfully tracked when the F-measure at that frame is larger than 0.5. For a video sequence, the success rate (SR) is defined as the ratio between the number of successfully tracked frames and the total number of frames in the sequence. Table 2 shows the trackers' SR over the six video sequences. It is observed that our method outperforms the other four methods on all of the sequences. The SSAMC and IUPF are superior to the UPF and PF except on the Hockey sequence. For the Hockey sequence, PF and our method are better than the other methods.
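The success rate computation can be sketched as below. The definition of the F-measure is given in [21] and is not reproduced in the text, so this example assumes the common bounding-box form: precision is the overlap area divided by the tracked box area, recall is the overlap area divided by the ground-truth box area, and the F-measure is their harmonic mean. Boxes are assumed to be (x, y, width, height) tuples.

```python
def f_measure(tracked, truth):
    """Harmonic mean of box precision and recall (assumed definition)."""
    tx, ty, tw, th = tracked
    gx, gy, gw, gh = truth
    overlap_w = min(tx + tw, gx + gw) - max(tx, gx)
    overlap_h = min(ty + th, gy + gh) - max(ty, gy)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0
    overlap = overlap_w * overlap_h
    precision = overlap / (tw * th)
    recall = overlap / (gw * gh)
    return 2.0 * precision * recall / (precision + recall)

def success_rate(tracked_boxes, truth_boxes, threshold=0.5):
    """Fraction of frames whose F-measure exceeds the threshold."""
    hits = sum(
        f_measure(t, g) > threshold
        for t, g in zip(tracked_boxes, truth_boxes)
    )
    return hits / len(truth_boxes)

tracked = [(0, 0, 10, 10), (0, 0, 10, 10), (100, 100, 5, 5)]
truth = [(0, 0, 10, 10), (5, 0, 10, 10), (0, 0, 10, 10)]
print(success_rate(tracked, truth))  # only the first frame counts as a success
```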
Time consumption
In this section, we compare the time cost of the trackers. For a fair comparison, we run the five trackers on the Singer sequence using 300 particles or samples each. The frame size is 320 × 240 pixels. All the trackers are run on a personal computer with CPU: Intel Core i5-2540M, Memory: 12.0 GB, OS: Windows 7 64 Bit. As shown in Table 3, the time costs of the IUPF and UPF are higher than those of the other trackers. The time costs of the SSAMC and our method are almost the same, while the PF needs the least time.
Conclusion
We have proposed an enhanced particle filter for visual tracking. The proposal distribution we adopted is a hybrid distribution that is composed of the UKF and the transition prior. For the appearance model, we construct an efficient model to deal with appearance variation. The color histogram and LSH are used in this appearance model. Experimental results demonstrate that the proposed method shows superior performance to several other methods. Our future work will focus on updating the template model during tracking on the basis of this work.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Nos. 61300082, 61272369, 61402069), Program for Liaoning Excellent Talents in University (No. LJQ2015006) and Liaoning Natural Science Foundation (No. 2015020015).
