Tracking objects in video-based education using an enhanced particle filter

Abstract

Visual tracking is of great importance in multimedia technology enhanced learning. Many learning and teaching activities based on human-machine interaction need to track a specific object. The particle filter has become a standard framework for visual tracking over the past decades. One of its key issues is the design of the proposal distribution, which can greatly affect the performance of particle trackers. In this paper we propose an enhanced particle filter for robust visual tracking. First, we propose a new particle filter that uses two proposal distributions to generate particles, namely the unscented Kalman filter and the transition prior. Second, we introduce an appearance model based on the locality sensitive histogram (LSH) and color to deal with appearance variation within the particle filter tracking framework. Third, by combining the new particle filter with the LSH and color based appearance model, we develop a robust tracking algorithm. Experimental results show that our tracking algorithm performs at least as well as several other tracking algorithms on several public sequences.

Keywords

Video-based learning, visual tracking, particle filter, proposal distribution

1 Introduction

Visual tracking is an active research area in the computer vision community and has been applied successfully to a variety of areas, including augmented reality, human-machine interaction, and intelligent surveillance [1]. Such applications are of great importance to multimedia technology enhanced learning (TEL) [2–5]. In many video-based teaching activities, teachers need special devices to analyze the contents, such as the trajectory of a moving object (a ball, a car, a human face or a player), to help students better understand and acquire knowledge. Consequently, tracking such objects is of great help to teachers and students.

In the past decades, researchers have done extensive work on visual tracking and proposed many tracking methods [6–8]. These methods can be divided into two categories: stochastic sampling based methods and deterministic methods.

As a well-known stochastic sampling based method, the particle filter has become a standard tracking framework. It recursively approximates the posterior probability distribution using a set of weighted particles (or samples). Since its first successful application in visual tracking, the CONDENSATION algorithm [9], many improved particle filter based tracking algorithms have been proposed to address different challenging situations in visual tracking [6, 10–13]. One of the key issues in the particle filter is choosing an appropriate proposal distribution to generate particles. The most popular proposal distribution is the transition density, which was first applied in the CONDENSATION algorithm and is used by many existing particle trackers. However, the transition density does not make use of the current observations, which can lead to tracking failure in complex scenarios. Rui [14] used the unscented Kalman filter (UKF) to generate the proposal distribution for face tracking. The tracker, named the unscented particle filter (UPF), takes the current observations into account and showed superior performance in face tracking. Sun [11] designed an iterated unscented particle filter (IUPF), which uses the iterated unscented Kalman filter (IUKF) as the proposal distribution for visual tracking. The IUKF makes better use of the current observations to obtain a better estimate of the object state, and the IUPF achieved better tracking performance in handling partial occlusion and rotation.

Another important issue in designing a particle filter based tracker is the appearance model, that is, the likelihood model. Many of the existing particle trackers rely on the color based appearance model [15], which uses the color histogram to model the object appearance. During tracking, each particle is evaluated by the distance between the color histogram of the reference template and the color histogram of the particle. However, the color based appearance model is easily affected by illumination variation, which leads to tracking failure under severe illumination change. Some researchers have proposed fusing different visual cues in the particle filtering framework to better handle different challenging tracking scenarios. Brasnett [16] proposed using the basic particle filter and the Gaussian sum particle filter for visual tracking, with a likelihood distribution built on color and texture features under the assumption that the features are independent. The proposed algorithms showed improved tracking accuracy and robustness. Dore [13] adopted shape and color cues in the particle filtering framework for tracking deformable objects. A new importance density function was designed for the prediction phase, which exploited multiple Mean Shift trackers to infer the global motion of the target and its deformation. This tracker is robust to non-rigid objects and partial occlusions. Xiao [17] proposed using color and histogram of gradients (HOG) features in a particle filter based tracker, which showed superior performance in handling partial occlusions. Xin [18] proposed an adaptive multiple cue integration approach in the particle filter framework, aimed at robust visual tracking of outdoor vehicles. In this approach, the reliability of the observation likelihood of each cue is estimated according to the uncertainty metric factor of the cue and the corresponding spatial distribution of particles. It shows robustness in handling partial occlusions and background clutter.

In general, many of the existing particle filter based trackers rely on a single proposal distribution and a color based appearance model, which often leads to tracking failure when severe appearance change occurs. To solve this problem, we propose a new tracking framework based on the particle filtering algorithm. First, we use two distributions in the particle filtering framework, namely the unscented Kalman filter and the transition prior, each of which corresponds to a visual feature. Second, we choose color and LSH [19, 20] based features to construct the appearance model. The color feature is associated with the transition prior, while the LSH feature is associated with the UKF. The features are assumed to be independent. The proposed tracking method is evaluated on several typical video sequences, and the results show that our method is superior to several other particle filter trackers.

The rest of the paper is organized as follows. In Section 2, we state the problem of object tracking within the particle filtering framework. Section 3 presents the details of our proposed tracking method. The experimental results are shown in Section 4. The last section concludes the paper.

2 Particle filtering framework

2.1 Bayesian tracking

Visual tracking can be considered a Bayesian filtering problem. As the aim of Bayesian filtering is to estimate the system states recursively from the incoming observations, visual tracking under the Bayesian framework amounts to estimating the value of the object state variable from the image observations. Thus, the task of our tracking method is to find the state that maximizes the posterior probability density function (PDF) $p (x_{t} | y_{1 : t})$. Here, we define the object state as a 3D vector, that is, $x_{t} = (x_{t}^{x}, x_{t}^{y}, x_{t}^{s})$ . The three components represent the position coordinates and scale of the object, respectively. The PDF can be obtained from the following Bayesian filter,

$p (x_{t} | y_{1 : t}) \propto p (y_{t} | x_{t}) \int p (x_{t} | x_{t - 1}) p (x_{t - 1} | y_{1 : t - 1}) d x_{t - 1}$ (1) where $p (x_{t} | x_{t - 1})$ denotes the motion model that produces the predicted state based on $x_{t - 1}$; $p (y_{t} | x_{t})$ represents the appearance model, also called the likelihood model, which measures the similarity between the target object and the observation at the proposed state. With the computed $p (x_{t} | y_{1 : t})$, we obtain the estimated state through the maximum a posteriori (MAP) estimate over a sample set containing N samples, $x_{t}^{MAP} = \arg \max_{x_{t}^{(i)}} p (x_{t}^{(i)} | y_{1 : t}), i = 1, 2, \ldots, N$ (2)

It is well known that an analytical solution is difficult to obtain because the integral in Equation (1) is generally intractable. Thus, many real-world applications resort to approximating $p (x_{t} | y_{1 : t})$ with Monte Carlo sampling methods, one of which is the particle filter.

2.2 Particle filtering

The basic idea underlying the particle filter is to approximate the PDF using a set of weighted particles $\{x_{t}^{i}, ϖ_{t}^{i}\}_{i = 1}^{N}$ , $p (x_{t} | y_{1 : t}) \approx \sum_{i = 1}^{N} ϖ_{t}^{i} δ (x_{t} - x_{t}^{i})$ (3) where δ is the Dirac delta function. The particles used in this approximation are drawn from a proposal distribution q, and the particle weights $ϖ_{t}^{i}$ satisfy $\sum_{i = 1}^{N} ϖ_{t}^{i} = 1$ and can be computed recursively using the following equation, $ϖ_{t}^{i} \propto ϖ_{t - 1}^{i} \frac{p (y_{t} | x_{t}^{i}) p (x_{t}^{i} | x_{t - 1}^{i})}{q (x_{t}^{i} | x_{t - 1}^{i}, y_{t})}$ (4)
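As an illustration of the recursion in Equation (4), the following minimal Python sketch updates and normalizes a set of particle weights. The function name `update_weights` and the toy numbers are assumptions made for this example only; the densities are passed in as precomputed values per particle rather than evaluated from images.

```python
def update_weights(prev_weights, likelihoods, transition_probs, proposal_probs):
    """Equation (4): w_t ∝ w_{t-1} * p(y_t|x_t) * p(x_t|x_{t-1}) / q(x_t|x_{t-1}, y_t).

    Each argument is a list with one value per particle (illustrative interface).
    """
    raw = [
        w * lik * trans / prop
        for w, lik, trans, prop in zip(
            prev_weights, likelihoods, transition_probs, proposal_probs
        )
    ]
    total = sum(raw)
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights (a choice made for this sketch).
        n = len(raw)
        return [1.0 / n] * n
    # Normalize so that the weights sum to one.
    return [r / total for r in raw]


# Toy example: when the proposal equals the transition prior, the two terms
# cancel and the weights become proportional to the likelihood.
weights = update_weights(
    prev_weights=[0.25, 0.25, 0.25, 0.25],
    likelihoods=[0.9, 0.1, 0.5, 0.5],
    transition_probs=[0.2, 0.2, 0.2, 0.2],
    proposal_probs=[0.2, 0.2, 0.2, 0.2],
)
```

The toy call shows why the transition prior is a convenient proposal: the ratio in Equation (4) reduces to the likelihood alone.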

As discussed in Section 1, both the proposal distribution and the likelihood model are of great importance for particle filter performance. We present the details of our contribution in Section 3.

3 Our method

3.1 Motion model

To track an object robustly, we initially select a region that defines the object and provides the prior information for tracking. The motion model adopted is a second-order constant acceleration model, $x_{t} = A_{1} x_{t - 1} + A_{2} x_{t - 2} + Γ v_{t}$ (5) where A₁, A₂ and Γ are coefficient matrices and $v_{t}$ is the process noise. This motion model is widely adopted in the literature.
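The propagation in Equation (5) can be sketched as follows. The paper does not give the coefficient matrices, so this example assumes A₁ = 2I, A₂ = -I and a diagonal Γ holding per-component noise standard deviations; these values, the function name `propagate` and the noise levels are illustrative assumptions only.

```python
import random


def propagate(x_prev, x_prev2, noise_std=(4.0, 4.0, 0.01), rng=random):
    """One draw from x_t = A1 x_{t-1} + A2 x_{t-2} + Γ v_t.

    Assumed for illustration: A1 = 2I, A2 = -I, Γ = diag(noise_std),
    and v_t is standard Gaussian noise. The state is (x, y, scale).
    """
    return tuple(
        2.0 * a - b + s * rng.gauss(0.0, 1.0)
        for a, b, s in zip(x_prev, x_prev2, noise_std)
    )


# With zero noise the state is extrapolated along its previous displacement.
predicted = propagate((10.0, 20.0, 1.0), (8.0, 19.0, 1.0), noise_std=(0.0, 0.0, 0.0))

# With noise, a seeded generator makes the draw reproducible.
sample = propagate((10.0, 20.0, 1.0), (8.0, 19.0, 1.0), rng=random.Random(0))
```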

3.2 Appearance model

Many particle filter based tracking methods use the color histogram to design the appearance model. As discussed above, although it is simple and efficient, the color based appearance model cannot handle severe appearance changes induced by illumination change. Thus, we combine another efficient histogram based feature, LSH, with the color histogram to better model the object appearance change. Under the assumption that the adopted features are independent, the overall appearance model is the product of the likelihoods of the individual features, $p (y_{t} | x_{t}) = p_{lsh} (y_{lsh, t} | x_{t}) \cdot p_{color} (y_{color, t} | x_{t})$ (6)

3.2.1 Locality sensitive histogram

LSH has been applied successfully in visual tracking and image filtering [19, 20]. It is also simple and easy to implement. For an image I, the LSH computed at pixel r is $H_{r}^{I} (b) = \sum_{j = 1}^{R} τ^{| r - j |} \cdot L (I_{j}, b), \quad b = 1, 2, \ldots, B$ (7) where R is the number of pixels in image I and B is the number of bins. If the intensity value at pixel location j belongs to bin b, then $L (I_{j}, b)$ is 1; otherwise it is 0. τ ∈ (0, 1) is a parameter that reduces the weight of a pixel as it moves farther from the center location r. More details of LSH can be found in [19].
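To make Equation (7) concrete, the sketch below evaluates it directly on a one-dimensional row of pixel intensities. It is a brute-force illustration with quadratic cost, not the implementation used in the experiments; the function name, the uniform binning rule and the 8-bit intensity range are assumptions of this example.

```python
def locality_sensitive_histogram(intensities, num_bins, tau, max_value=256):
    """Brute-force evaluation of Equation (7) for a 1D signal.

    Returns one histogram (a list of num_bins floats) per pixel location r.
    Uniform binning of integer intensities in [0, max_value) is assumed.
    """
    n = len(intensities)
    # L(I_j, b) = 1 exactly for the bin that pixel j falls into.
    bins = [min(v * num_bins // max_value, num_bins - 1) for v in intensities]
    histograms = [[0.0] * num_bins for _ in range(n)]
    for r in range(n):
        for j in range(n):
            # Pixels far from r contribute with exponentially smaller weight.
            histograms[r][bins[j]] += tau ** abs(r - j)
    return histograms


# Three pixels, two bins: two dark pixels followed by one bright pixel.
lsh = locality_sensitive_histogram([0, 0, 255], num_bins=2, tau=0.5)
```

At the first pixel the dark bin collects 1 + 0.5 and the bright bin only 0.25, which shows how the histogram stays local to the pixel at which it is computed.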

For the LSH, an appropriate distance metric for measuring the similarity between the candidate object region and the template region is the earth mover's distance (EMD), computed as follows.

$d (T_{1}, T_{2}) = \sum_{b = 1}^{B} | C_{1} (b) - C_{2} (b) |$ (8) where T₁ and T₂ denote the template region and candidate region, respectively; C₁ (b) and C₂ (b) are the cumulative histograms of LSHs. The likelihood based on LSH is defined as:

$p_{lsh} (y_{lsh, t} | x_{t}) = e^{- λ_{1} d (T_{1}, T_{2})}$ (9) where λ₁ is a parameter.
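Equations (8) and (9) can be sketched in a few lines of Python. The example assumes each region is summarized by a single histogram given as a list of bin values; the function names and the value of λ₁ are illustrative assumptions, since the paper does not report the parameter.

```python
import math
from itertools import accumulate


def emd_distance(hist_a, hist_b):
    """Equation (8): L1 distance between the cumulative histograms."""
    cum_a = accumulate(hist_a)
    cum_b = accumulate(hist_b)
    return sum(abs(a - b) for a, b in zip(cum_a, cum_b))


def lsh_likelihood(template_hist, candidate_hist, lam=1.0):
    """Equation (9): exp(-λ1 * d(T1, T2)); lam is an assumed value."""
    return math.exp(-lam * emd_distance(template_hist, candidate_hist))


# Moving all mass from the first bin to the last bin of three costs 2.
d = emd_distance([1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
same = lsh_likelihood([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])
```

Unlike a bin-by-bin difference, the cumulative form penalizes mass that moves across several bins more than mass that moves to a neighboring bin.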

3.2.2 Color based likelihood model

For the color based likelihood model, we define a likelihood function based on the HSV color histogram as follows. $p_{color} (y_{color, t} | x_{t}) = e^{- λ_{2} D (h_{1}, h_{2})}$ (10) where λ₂ is a predefined parameter, h₁ is the color histogram of the reference model, and h₂ is the color histogram of the candidate region. The closeness of the two histograms, D (h₁, h₂), is measured by the Bhattacharyya distance, defined as $D (h_{1}, h_{2}) = \sqrt{1 - ρ (h_{1}, h_{2})}$ (11) where ρ is the Bhattacharyya coefficient, $ρ (h_{1}, h_{2}) = \sum_{i} \sqrt{h_{1, i} \cdot h_{2, i}}$ (12)
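A minimal sketch of Equations (10) to (12) is given below. It assumes both histograms are already normalized to sum to one and are passed as flat lists; the function names and the value of λ₂ are assumptions for illustration, as the paper does not report the parameter.

```python
import math


def bhattacharyya_distance(h1, h2):
    """Equations (11) and (12) for two normalized histograms."""
    rho = sum(math.sqrt(a * b) for a, b in zip(h1, h2))
    # Clamp at zero to guard against rounding that pushes rho slightly above 1.
    return math.sqrt(max(0.0, 1.0 - rho))


def color_likelihood(h_ref, h_cand, lam=20.0):
    """Equation (10): exp(-λ2 * D(h1, h2)); lam is an assumed value."""
    return math.exp(-lam * bhattacharyya_distance(h_ref, h_cand))


identical = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
disjoint = bhattacharyya_distance([1.0, 0.0], [0.0, 1.0])
```

Identical histograms give a distance of 0 and a likelihood of 1, while histograms with no overlapping bins give the maximum distance of 1.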

3.3 Multiple proposal distribution framework

In our method, we use two distributions to generate particles: the UKF and the transition prior. The motivation is twofold. On the one hand, although the transition prior is simple and easy to implement, it does not incorporate the most recent observations, which contain valuable information about the object. On the other hand, the UPF uses the UKF as the proposal distribution, but its high computational cost limits its use in practice. Thus we propose to combine these two proposal distributions in the particle filtering framework. We denote the transition prior as q₁ and the UKF as q₂. The final proposal distribution can be formulated as follows, $q (x_{t} | x_{t - 1}, y_{t}) = q_{1} (x_{t} | x_{t - 1}, y_{t}) \cdot q_{2} (x_{t} | x_{t - 1}, y_{t})$ (13)

As q₁ is the transition prior $p (x_{t} | x_{t - 1})$, according to (6) and (13), the particle weights in (4) can be updated as follows. $ϖ_{t}^{(i)} \propto ϖ_{t - 1}^{(i)} \frac{p_{lsh} (y_{lsh, t} | x_{t}^{i}) \cdot p_{color} (y_{color, t} | x_{t}^{i})}{q_{2} (x_{t}^{i} | x_{t - 1}^{i}, y_{t})}$ (14) When a candidate particle is evaluated, its weight is calculated by fusing the color feature and the LSH based feature through Equation (14). Integrating the two features makes better use of the observation information in the current frame, which in turn benefits the tracking results.

Figure 1 shows the schematic view of our proposed particle filter method. At time step t - 1, we use the designed proposal distribution to generate particles and calculate the particle weights. The main procedure contains the following steps:

Prediction step. Use the particle set $\{x_{t - 2}^{i}, 1 / N\}_{i = 1}^{N}$ from time t - 2 to generate the predicted particle set $\{x_{t - 1 | t - 2}^{i}, 1 / N\}_{i = 1}^{N}$ . This particle set is generated through the proposal distribution designed in (13).

Update step. For each predicted particle from the prediction step, calculate the likelihood score according to Equation (6) and then calculate the particle weight according to Equation (4).

Estimation step. Use the updated particle set to estimate the object state with Equation (2).

Resampling step. Remove the particles with low weights, replicate the particles with high weights, and assign each particle the equal weight 1/N.

Return the resampled particle set.

The above steps will be executed iteratively until the tracking process is ended (the last frame). We summarize our proposed tracker as in Algorithm 1.
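The resampling step described above can be sketched as follows. The paper does not state which resampling scheme is used, so systematic resampling is assumed here purely for illustration; the function name is hypothetical.

```python
import random


def systematic_resample(particles, weights, rng=random):
    """Resample N particles in proportion to their normalized weights.

    Systematic resampling is an assumption of this sketch. Returns the new
    particle list and uniform weights 1/N.
    """
    n = len(particles)
    start = rng.random() / n  # single random offset in [0, 1/N)
    resampled = []
    cumulative = weights[0]
    i = 0
    for k in range(n):
        u = start + k / n  # evenly spaced pointers over [0, 1)
        # Advance to the particle whose cumulative weight interval contains u.
        while u >= cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
    return resampled, [1.0 / n] * n


# A particle holding all the weight is replicated N times.
new_particles, new_weights = systematic_resample(
    ["a", "b", "c", "d"], [0.0, 0.0, 1.0, 0.0], rng=random.Random(0)
)
```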

Algorithm 1 Enhanced Particle Tracker

Input:

Particle set of $p (x_{t - 1} | y_{1 : t - 1})$, image based observation $y_{t}$.

Output:

Particles of $p (x_{t} | y_{1 : t})$, estimated object state $x_{t}^{MAP}$ ;

1: for (i = 1; i <= N; i++) do

2: – Propagate particle $x_{t - 1}^{i}$ through the motion model to generate the predicted particle $x_{t}^{i}$.

3: – Compute the color feature and the LSH feature with the current frame.

4: – Calculate the particle weight $ϖ_{t}^{(i)}$ using Equation (14).

5: end for

6: – Obtain the current particle set $\{x_{t}^{i}, ϖ_{t}^{(i)}\}_{i = 1}^{N}$

7: – Calculate the estimated object state $x_{t}^{MAP}$ using Equation (2).

8: – Resample the particle set to obtain $\{x_{t}^{i}, 1 / N\}_{i = 1}^{N}$

9: return Particle set $\{x_{t}^{i}, 1 / N\}_{i = 1}^{N}$ , $x_{t}^{MAP}$ ;
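The control flow of Algorithm 1 can be sketched in Python as below. This is a simplified illustration rather than the authors' implementation: the motion model, the two likelihoods and the UKF proposal density q₂ are passed in as hypothetical callables, the MAP estimate is taken as the particle with the largest weight, and multinomial resampling via `random.choices` is an assumed choice.

```python
import math
import random


def tracker_step(particles, weights, propagate, p_lsh, p_color, q2_density, rng=random):
    """One iteration of Algorithm 1 with user-supplied (hypothetical) callables."""
    n = len(particles)
    # Step 2: propagate every particle through the motion model.
    predicted = [propagate(x) for x in particles]
    # Steps 3-4: fuse both features and weight each particle with Equation (14).
    raw = [
        w * p_lsh(x) * p_color(x) / q2_density(x)
        for w, x in zip(weights, predicted)
    ]
    total = sum(raw)
    new_weights = [r / total for r in raw] if total > 0.0 else [1.0 / n] * n
    # Step 7: MAP estimate over the particle set, Equation (2).
    best = max(range(n), key=lambda i: new_weights[i])
    x_map = predicted[best]
    # Step 8: resample and reset the weights to 1/N.
    resampled = rng.choices(predicted, weights=new_weights, k=n)
    return resampled, [1.0 / n] * n, x_map


# Toy 1D example: the object sits at 2.0 and both likelihoods peak there.
resampled, uniform_weights, x_map = tracker_step(
    particles=[0.0, 1.0, 2.0],
    weights=[1.0 / 3.0] * 3,
    propagate=lambda x: x + 1.0,
    p_lsh=lambda x: math.exp(-((x - 2.0) ** 2)),
    p_color=lambda x: math.exp(-abs(x - 2.0)),
    q2_density=lambda x: 1.0,
    rng=random.Random(0),
)
```

In a real tracker the callables would evaluate Equations (5), (9), (10) and the UKF density on image data; here they are toy functions so that the step can be run in isolation.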

4 Experimental results

This section evaluates the proposed tracking method from qualitative and quantitative perspectives. We select several other tracking methods to compare their performance with our proposed method. The methods are: PF [9], UPF tracker [14], IUPF tracker [11], and SSAMC tracker [7]. We select six typical video sequences to test the trackers’ performance which are shown in Table 1. Such video sequences may appear in different teaching or learning scenarios.

4.1 Qualitative analysis

For qualitative comparison, we use 100 particles for UPF, our method and IUPF, 600 for PF and 300 samples for SSAMC by default.

Face. In this sequence, a man is moving his face quickly and suddenly changes moving direction. The tracking results are shown in Fig. 2. It is clear that our tracker could capture the face successfully with fewer particles compared to the other four trackers. The IUPF and UPF could also track the man’s face, but their performance is obviously worse than our tracker. The SSAMC and PF frequently lose the face throughout the sequence. The results on Face sequence demonstrate that our method is efficient in dealing with fast motions.

Girlface. This sequence shows a girl's face moving with rotation, pose change and partial occlusions. It is a long sequence containing different challenging factors. As shown in Fig. 3, the proposed tracker deals efficiently with the difficulties induced by rotation (#177), pose change (#306) and partial occlusions (#458 and #468). The SSAMC and IUPF show moderately good performance, but their tracking accuracy is not as good as ours. The UPF and PF show the worst performance; they cannot capture the girl's face during rotation and occlusion. These results show that our tracker can efficiently handle the appearance variation induced by rotation and partial occlusions.

SeqMS. The SeqMS sequence contains a man who repeatedly moves his hands in front of his face, which leads to frequent partial and total occlusions. 300 particles are used for all methods except the PF, which uses 600. Sample frames of the tracking results are shown in Fig. 4. When the man's face is occluded by his hands, our method accurately localizes its position before and after the occlusion, and it shows consistently better performance. The SSAMC performs better than the IUPF and UPF in dealing with partial occlusions. However, for total occlusions, only our method successfully handles the problem, while the other four methods fail.

David and Singer. In this experiment, 300 particles are used for all methods except the PF, which uses 600. The David and Singer sequences mainly contain severe illumination variation, which causes severe appearance changes. As stated previously, we adopt the LSH based appearance model, which can deal with illumination change efficiently, so our tracker is expected to perform much better on these two sequences than the other trackers. Tracking results on the David sequence are shown in Fig. 5. Although severe illumination change occurs throughout the sequence, our method consistently captures the object. The SSAMC tracks the object in certain frames, but its tracking accuracy is lower than that of our method. The IUPF, UPF and PF methods frequently lose the object because they adopt the color based appearance model, which cannot deal with severe appearance variation in complex scenarios. Figure 6 shows the tracking results on the Singer sequence. The results are similar to those on the David sequence: our method shows the best performance among the five methods. The results on these two sequences demonstrate that our proposed tracking method deals efficiently with appearance changes induced by severe illumination change.

Hockey. In this sequence, the hockey player moves among many similar players wearing the same clothes. The results are shown in Fig. 7. When the object interacts with similar players, our tracker gives steady and accurate tracking results (#44 and #52). The PF tracker also captures the object when interaction occurs, but its accuracy is worse than ours. The IUPF and UPF lose the object after the interaction, while the SSAMC frequently loses the object throughout the sequence.

4.2 Quantitative analysis

4.2.1 Success rate

For quantitative comparison, we first evaluate the trackers' success rates on different sequences. Following [21], we count a frame as successfully tracked when the F-measure is larger than 0.5. For a video sequence, the success rate (SR) is defined as the ratio of the number of successfully tracked frames to the total number of frames in the sequence. Table 2 shows the trackers' SR over the six video sequences. Our method outperforms the other four methods on all of the sequences. The SSAMC and IUPF are superior to the UPF and PF except on the Hockey sequence, where the PF and our method are better than the other methods.
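The success rate computation can be sketched as follows. The paper only cites [21] for the F-measure, so this example assumes the common bounding-box form in which precision and recall are the overlap area divided by the tracked box area and by the ground-truth box area, respectively. Boxes are given as (x, y, width, height); all names are illustrative.

```python
def f_measure(box, gt):
    """F-measure of a tracked box against a ground-truth box (assumed definition)."""
    overlap_w = max(0.0, min(box[0] + box[2], gt[0] + gt[2]) - max(box[0], gt[0]))
    overlap_h = max(0.0, min(box[1] + box[3], gt[1] + gt[3]) - max(box[1], gt[1]))
    overlap = overlap_w * overlap_h
    if overlap == 0.0:
        return 0.0
    precision = overlap / (box[2] * box[3])
    recall = overlap / (gt[2] * gt[3])
    return 2.0 * precision * recall / (precision + recall)


def success_rate(tracked, ground_truth, threshold=0.5):
    """Fraction of frames whose F-measure exceeds the threshold."""
    hits = sum(1 for b, g in zip(tracked, ground_truth) if f_measure(b, g) > threshold)
    return hits / len(ground_truth)


# Two frames: a perfect hit followed by a complete miss.
sr = success_rate(
    tracked=[(0, 0, 10, 10), (0, 0, 10, 10)],
    ground_truth=[(0, 0, 10, 10), (20, 20, 10, 10)],
)
```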

4.2.2 Time consumption

In this section, we compare the time cost of the trackers. For a fair comparison, we run the five trackers on the Singer sequence using 300 particles or samples each. The frame size is 320 × 240 pixels. All the trackers are run on a personal computer with CPU: Intel Core i5-2540M, Memory: 12.0 GB, OS: Windows 7 64-bit. As shown in Table 3, the time costs of the IUPF and UPF are higher than those of the other trackers. The time costs of the SSAMC and our method are almost the same, while the PF needs the least time.

5 Conclusion

We have proposed an enhanced particle filter for visual tracking. The proposal distribution we adopted is a hybrid distribution that is composed of the UKF and the transition prior. For the appearance model, we construct an efficient model to deal with appearance variation. The color histogram and LSH are used in this appearance model. Experimental results demonstrate that the proposed method shows superior performance to several other methods. Our future work will focus on updating the template model during tracking on the basis of this work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61300082, 61272369, 61402069), Program for Liaoning Excellent Talents in University (No. LJQ2015006) and Liaoning Natural Science Foundation (No. 2015020015).

References

1. Maggio and Cavallaro, Video Tracking, John Wiley & Sons, 2011.

2. Khamparia and Pandey, Knowledge and intelligent computing methods in e-learning, International Journal of Technology Enhanced Learning 7 (2015), 221–242.

3. Leontidis, Halatsis and Grigoriadou, Using an affective multimedia learning framework for distance learning to motivate the learner effectively, International Journal of Learning Technology 6 (2011), 223–250.

4. […], Halawani, Feng, Réhman and Li, Touch-less interactive augmented reality game on vision-based wearable device, Personal and Ubiquitous Computing 19 (2015), 551–567.

5. […], Wearable Smartphone: Wearable Hybrid Framework for Hand and Foot Gesture Interaction on Smartphone, Proceedings of International Conference on Computer Vision Workshops, 2013, pp. 436–443.

6. Wang and Lu, Robust particle tracker via Markov chain Monte Carlo posterior sampling, Multimedia Tools and Applications 72 (2014), 573–589.

7. Zhou, Lu, Di, Zhao and Zhang, Abrupt motion tracking via nearest neighbor field driven stochastic sampling, Neurocomputing 165 (2015), 350–360.

8. Xie, Pei, Zhang, Meng and Jia, Tracking pedestrian with incrementally learned representation and classification model, Journal of Information Science and Engineering 30 (2014), 1035–1052.

9. Isard and Blake, CONDENSATION-conditional density propagation for visual tracking, International Journal of Computer Vision 29 (1998), 5–28.

10. Nummiaro, Koller-Meier and Van Gool, An adaptive color-based particle filter, Image and Vision Computing 21 (2003), 99–110.

11. Sun, Li, Qiu and Wang, Iterated unscented Kalman particle filter for visual tracking, Journal of Computational Information Systems 10 (2014), 681–689.

12. Martinez-del-Rincon, Orrite and Medrano, Rao-blackwellised particle filter for color-based tracking, Pattern Recognition Letters 32 (2011), 210–220.

13. Dore, Beoldo and Regazzoni C.S., Multiple Cue Adaptive Tracking of Deformable Objects with Particle Filter, Proceedings of International Conference on Image Processing, 2008, pp. 237–240.

14. Rui and Chen, Better Proposal Distributions: Object Tracking using Unscented Particle Filter, Proceedings of International Conference on Computer Vision and Pattern Recognition 2 (2001), 786–793.

15. Perez, Hue, Vermaak and Gangnet, Color-based Probabilistic Tracking, Proceedings of European Conference on Computer Vision, LNCS 2350 (2002), 661–675.

16. Brasnett, Mihaylova, Canagarajah and Bull, Particle Filtering with multiple cues for object tracking in video sequences, Proceedings of SPIE Image and Video Communications and Processing 5685 (2005), 430–441.

17. Xiao, Stolkin, Oussalah and Leonardis, Continuously adaptive data fusion and model relearning for particle filter tracking with multiple features, IEEE Sensors Journal 16 (2016), 2639–2649.

18. Xin, Liu, Ran and Liu, Adaptive Multiple Cues Integration for Robust Outdoor Vehicle Visual Tracking, Proceedings of Chinese Control Conference, 2015, pp. 4913–4918.

19. […], Yang, Lau R.W.H., Wang and Yang M.-H., Visual Tracking Via Locality Sensitive Histograms, Proceedings of Computer Vision and Pattern Recognition, 2013, pp. 2427–2434.

20. […], Yang, Lau R.W.H. and Yang M.-H., Fast weighted histograms for bilateral filtering and nearest neighbor searching, IEEE Transactions on Circuits and Systems for Video Technology (2015), 1–12. doi: 10.1109/TCSVT.2015.2430671

21. Kwon and Lee K.M., Tracking of Abrupt Motion using Wang-Landau Monte Carlo Estimation, Proceedings of European Conference on Computer Vision, 2008, pp. 387–400.