Robust object tracking with scene-adaptive scheme in occlusion

Abstract

Occlusion handling is a challenging problem in object tracking, and most existing methods fail to handle it well in complex image sequences. This paper presents a scene-adaptive tracking algorithm for occlusion. We decompose tracking into target translation prediction and scale prediction. A kernelized correlation filter with an adaptive update scheme is adopted to estimate the target position. The adaptive online update scheme exploits the sensitivity of the confidence score to occlusion and reduces false updates while the target is occluded. The target scale is estimated by a correlation filter trained with ridge regression. Extensive experimental results on 29 challenging occlusion sequences show that the proposed tracking approach achieves an average overlap precision (OP) of 72.2%, which is 7.6 percentage points higher than DSST. On the OTB-50 dataset, our tracking approach also outperforms several state-of-the-art trackers.

Keywords

Object tracking, correlation filter, Fast Fourier Transform (FFT)

1 Introduction

Object tracking is one of the fundamental problems in computer vision, with a wide range of applications such as surveillance, security and motion analysis. Although many visual tracking algorithms have been proposed, visual tracking is still a challenging task because of factors such as occlusion, scale variation, deformation, background clutter and out-of-plane rotation.

Current tracking algorithms can generally be categorized into generative and discriminative methods. Generative methods [1–3] learn an appearance model and treat tracking as finding the regions that are most similar to the generative model. Ross et al. [1] proposed an incremental visual tracking (IVT) method that models the target appearance as a low-dimensional PCA subspace, where the subspace is updated adaptively with the historical and sequential appearance variations. Kwon et al. [2] proposed the visual tracking decomposition (VTD) method, in which the observation models are built by sparse principal component analysis (SPCA) of a set of feature templates. Comaniciu et al. [3] proposed the mean-shift tracker, which tracks the object by calculating the Bhattacharyya similarity between the histogram of the candidate object and the histogram of the object model. The mean-shift tracker is well known for its robustness against partial occlusion and scale variations when using the color histogram. However, these generative methods model only the appearance of the object and neglect the background information, which is important for distinguishing the target from the background.

Unlike generative trackers, discriminative approaches formulate tracking as a classification problem that distinguishes the tracked target from the background [4–6]. Zhang et al. [4] proposed a real-time compressive tracker that formulates the task as binary classification in the compressed domain. Departing from the tracking-by-detection framework, Kalal et al. [5] proposed the TLD tracker, which decomposes the tracking task into three sub-tasks: tracking, learning and detection. The TLD tracker builds a long-term tracking system based on P-N learning while running in real time. Babenko et al. [6] proposed the MIL tracker, which needs little parameter tweaking and uses multiple instance learning instead of traditional supervised learning. Recently, trackers based on correlation filters have proven to be highly efficient and to achieve robust tracking performance. Bolme et al. [7] proposed to learn a minimum output sum of squared error (MOSSE) filter for visual tracking. With the use of the fast Fourier transform (FFT), the MOSSE tracker is highly efficient, running at hundreds of frames per second. Several extensions have been proposed that considerably improve tracking accuracy, including kernelized correlation filters (KCF) [8], the color name tracker (CN) [9] and the discriminative scale space tracker (DSST) [10, 11].

Although the existing correlation filter trackers are highly efficient, they cannot handle occlusion well. In this paper, we aim to build a correlation filter tracker that is able to handle partial occlusion and run in real time. Our key idea is to employ the fast discriminative KCF to estimate the target location and to construct a new adaptive update scheme that takes occlusion into consideration. In addition, we decompose the tracking task into translation and scale estimation, which improves the performance considerably. The experiments demonstrate the effectiveness of our tracking approach.

The paper is organized as follows. We describe the related work in Section 2. Our algorithm framework and the adaptive update scheme are discussed in Section 3. Section 4 gives the experimental results of our method, and Section 5 concludes the paper.

2 Related work

In recent years, correlation filters have been applied to object tracking. The MOSSE tracker is a classic correlation filter algorithm that performs well under changes in rotation and lighting. Henriques et al. [8] proposed the KCF tracker, a correlation filter that exploits the circulant structure of tracking-by-detection with kernels and multi-channel features. The DSST tracker [10, 11] learns multi-scale correlation filters on HOG features to track the object. However, these trackers do not adapt the model update to occlusion, so they drift easily when the target is occluded.

To handle occlusion, many trackers divide the target into separate parts to obtain a stable target position. Kwon et al. [12] presented a local patch-based appearance model and provided an efficient algorithm to evolve the topology between local patches by online update. Zhang et al. [13] presented a tracker that handles partial occlusion by robust part matching among multiple frames. Zhao et al. [14] proposed an adaptive template update scheme that takes advantage of local sparse representation to detect occlusions during the tracking sequence. Yu et al. [15] proposed a robust PCA algorithm that selects a subset of image pixels to compute coefficients, which avoids false updates under occlusion and noise. Liu et al. [16] proposed a part-based visual tracker based on adaptive correlation filters. However, the computational complexity of these methods is high, which makes them difficult to run in real time. Instead of separate parts, we use a global response value to represent the occlusion while running in real time.

3 Scene adaptive tracking

3.1 The target translation prediction

As the basis of our tracker, the KCF is used to train the translation filter, which predicts the target location. Here we use HOG features and intensity features to learn the KCF filter. The KCF tracker is trained on an image patch x of size M × N and considers all cyclic shifts x_m,n, (m, n) ∈ {0, ⋯ , M - 1} × {0, ⋯ , N - 1}, as the training examples. The goal of KCF is to train a linear model f (z) = 〈w, z〉, which indicates the probability of image patch z being the tracked target, by minimizing the squared error over the samples $\min_{w} \sum_{m, n} (f (x_{m, n}) - y_{m, n})^{2} + λ_{0} \| w \|^{2}$ (1) where y_m,n is the desired output, which follows a Gaussian function. Here w is the matrix of model parameters, λ₀ is a regularization parameter and 〈·, ·〉 represents the inner product. After mapping the inputs of the linear problem to a non-linear feature space, the model can be rewritten as [8]

$f (z) = 〈 w, z 〉 = \sum_{m, n} α_{m, n} 〈 z, x_{m, n} 〉 = \sum_{m, n} α_{m, n} g (z, x_{m, n})$ (2) where g() denotes the Gaussian kernel function. The solution to the problem is

$α = F^{- 1} (\frac{F (y)}{F (k^{x, x}) + λ_{0}})$ (3)

Here F() denotes the Fourier transform, computed with the FFT, and α is a matrix consisting of the coefficients α_m,n. The kernel correlation k^x,x′ is defined as

$k^{x, x^{'}} = \exp \left( - \frac{1}{σ^{2}} \left( \| x \|^{2} + \| x^{'} \|^{2} - 2 F^{- 1} \left( \sum_{c} F (\bar{x_{c}}) ⊙ F (x_{c}^{'}) \right) \right) \right)$ (4)

Here the bar $\bar{x_{c}}$ denotes complex conjugation, ⊙ denotes the element-wise product and the subscript c denotes the cth channel of the image feature patch. The confidence map on an image patch z in the new frame is computed as $y = F^{- 1} (F (k^{x, z}) ⊙ F (α))$ (5)

Therefore, the new position of the target is found by searching for the location of the maximal confidence score in y.
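As an illustration of Eqs. (3)–(5), the following NumPy sketch trains the filter on one patch and locates the peak of the confidence map on a second patch. It is a minimal example under stated assumptions and not the implementation used in the experiments: the function names, the label bandwidth `sigma_y` and the division of the kernel distance by the number of elements (a normalisation found in public KCF code, not part of Eq. (4)) are all assumptions made for the example.

```python
import numpy as np


def gaussian_labels(shape, sigma_y):
    """Desired output y: a Gaussian peak at (0, 0) with wrap-around distances."""
    rows = np.arange(shape[0])
    cols = np.arange(shape[1])
    rows = np.minimum(rows, shape[0] - rows)
    cols = np.minimum(cols, shape[1] - cols)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.exp(-(rr ** 2 + cc ** 2) / (2.0 * sigma_y ** 2))


def gaussian_correlation(x, z, sigma):
    """Kernel correlation k^{x,z} of Eq. (4) for patches of shape (M, N, C)."""
    xf = np.fft.fft2(x, axes=(0, 1))
    zf = np.fft.fft2(z, axes=(0, 1))
    # Sum over channels of conj(F(x_c)) * F(z_c), then back to the spatial domain.
    cross = np.real(np.fft.ifft2(np.sum(np.conj(xf) * zf, axis=2)))
    dist = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * cross
    # Assumption: clip round-off negatives and normalise by the patch size.
    return np.exp(-np.maximum(dist, 0.0) / (sigma ** 2 * x.size))


def train(x, y, sigma, lam):
    """Return F(alpha) following Eq. (3)."""
    k = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)


def detect(alpha_f, x, z, sigma):
    """Confidence map of Eq. (5) and the location of its maximum."""
    k = gaussian_correlation(x, z, sigma)
    response = np.real(np.fft.ifft2(np.fft.fft2(k) * alpha_f))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return response, peak
```

Because the labels peak at the origin, the position of the maximum returned by `detect` can be read as the cyclic displacement of the target between the training patch and the new patch.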

3.2 Adaptive online update scheme

We update the KCF filter by simple linear interpolation with an adaptive learning rate η₀. The update scheme is defined as $F (α)^{t} = (1 - η_{0}) F (α)^{t - 1} + η_{0} F (α)$ (6) where η₀ is the learning rate and t is the frame index. In the KCF tracker, the learning rate is a fixed parameter, which means that the correlation filter is updated without adaptation. Adaptive strategies are often used to deal with dynamic problems [17, 18]. Here we set an adaptive learning rate based on the occlusion.

Occlusion is one of the main challenges in visual tracking and often causes the tracker to fail to relocate the object. In general, the maximal confidence score drops when the target is suddenly partially occluded. Therefore, the maximal confidence score can reflect changes in occlusion. Let ymax denote the maximal confidence score of the confidence map on an image patch. Figure 1 shows how the maximal confidence score changes when the target is occluded in the Faceocc2 sequence.

Fig.1

The confidence maps for the Faceocc2 image sequence. Images (b), (d) and (f) are the confidence maps of the sequence images (a), (c) and (e), respectively. Here ymax is the maximal confidence score.

However, the maximal confidence score cannot be used directly as the learning rate, since it only reflects the absolute amount of occlusion. Here we use the ratio of the maximal confidence scores of adjacent frames to denote the relative amount of occlusion. The learning rate η₀ is defined as $η_{0} = b (c (y_{max}^{t} / y_{max}^{t - 1}) + y_{max}^{t})$ (7)

Here $y_{max}^{t}$ denotes the maximal confidence score in the tth frame, b is the basic factor and c adjusts the weight between the term $y_{max}^{t} / y_{max}^{t - 1}$ and the term $y_{max}^{t}$ . The term $y_{max}^{t} / y_{max}^{t - 1}$ , the ratio of the maximal confidence scores of adjacent frames, can be regarded as the relative amount of occlusion. In this paper, b is set to 0.02 and c is set to 0.5. The quantity $y_{max}^{t}$ is a positive number less than 1 and the ratio $y_{max}^{t} / y_{max}^{t - 1}$ usually varies between 0 and 5 in a real environment, so the learning rate η₀ stays below 0.02 × (0.5 × 5 + 1) = 0.07 and is therefore a positive number less than 1. The online update is thus adapted to occlusion through the maximal confidence scores of adjacent frames.
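The adaptive rate of Eq. (7) and the interpolation of Eq. (6) can be sketched in a few lines of Python. The function names are hypothetical, the previous score is assumed to be strictly positive, and the confidence scores used in the note that follows are illustrative values, not measurements from the experiments.

```python
def adaptive_learning_rate(y_max_t, y_max_prev, b=0.02, c=0.5):
    """Eq. (7): eta_0 = b * (c * y_max^t / y_max^{t-1} + y_max^t).

    Assumes y_max_prev > 0; b and c default to the values given in the text.
    """
    return b * (c * (y_max_t / y_max_prev) + y_max_t)


def update_filter(alpha_f_prev, alpha_f_new, eta0):
    """Eq. (6): linear interpolation of the filter in the Fourier domain."""
    return (1.0 - eta0) * alpha_f_prev + eta0 * alpha_f_new
```

With a stable score (0.6 in both frames) the rate is 0.02 × (0.5 × 1 + 0.6) = 0.022, whereas a drop from 0.6 to 0.2 gives roughly 0.0073, so the filter absorbs much less of the occluded appearance.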

3.3 The target scale prediction

Here a 1-dimensional scale correlation filter is used to estimate the target scale. We use HOG features to learn the scale filter. Let H × W be the target size and N the number of scales in the set $S = \{ s_{i} = z^{i} \mid i \in \{ \lfloor - \frac{N - 1}{2} \rfloor, \dots, \lfloor \frac{N - 1}{2} \rfloor \} \}$ . Here z denotes the scale factor. For each s_i ∈ S, we extract an image patch of size s_iH × s_iW centered at the estimated location. Unlike the DSST algorithm, we resize all patches back to size H × W. Let f denote the resulting feature representation and f^j its feature dimension j ∈ {1, 2, ⋯ , d}. The task is to learn a correlation filter h, consisting of one filter h^j per feature dimension. The scale filter is obtained by minimizing the cost function [11] $\min_{h} \| \sum_{j = 1}^{d} h^{j} * f^{j} - g \|^{2} + λ_{1} \sum_{j = 1}^{d} \| h^{j} \|^{2}$ (8) where g is the 1-dimensional Gaussian-shaped desired output, peaked at the current scale, and the star * denotes circular correlation. The solution to (8) is $H^{j} = \frac{\bar{G} ⊙ F^{j}}{\sum_{k = 1}^{d} F^{k} ⊙ \bar{F^{k}} + λ_{1}}$ (9) where the upper-case variables F^j, G and H^j denote the Fourier transforms of their lower-case counterparts. Here ⊙ is the element-wise product and the bar $\bar{G}$ represents complex conjugation. The scale filter $H_{t}^{j}$ is updated through its numerator $A_{t}^{j}$ and denominator B_t: $A_{t}^{j} = (1 - η_{1}) A_{t - 1}^{j} + η_{1} \bar{G} ⊙ F_{t}^{j}$ (10) $B_{t} = (1 - η_{1}) B_{t - 1} + η_{1} \sum_{k = 1}^{d} F_{t}^{k} ⊙ \bar{F_{t}^{k}}$ (11)

where t is the frame index and η₁ is the learning rate. For each s_i ∈ S, the confidence score is calculated as $y^{s_{i}} = \frac{\sum_{j = 1}^{d} \bar{A_{t}^{j}} ⊙ F_{s_{i}}^{j}}{B_{t} + λ_{1}}$ (12) where $F_{s_{i}}^{j}$ denotes the target features at scale s_i. The optimal scale $\hat{s}$ of the target is then

$\hat{s} = \underset{s}{\arg \max} (\max (y^{s_{1}}), \dots, \max (y^{s_{i}}), \dots, \max (y^{s_{N}}))$ (13)
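The scale filter can be illustrated with a short NumPy sketch. It follows the DSST formulation [11], in which the features of all N scales form a d × N array, the response over the scale dimension is obtained with one inverse FFT, and its maximum selects the scale. How the per-scale scores of (12) and (13) are evaluated in detail is not spelled out in the text, so this reading, the function names and the array layout are assumptions of the example.

```python
import numpy as np


def train_scale_filter(f, g):
    """Numerator A (d, N) and denominator B (N,) of the scale filter, Eq. (9).

    f holds one d-dimensional feature vector per scale, shape (d, N);
    g is the desired 1-D Gaussian output over the N scales.
    """
    F = np.fft.fft(f, axis=1)
    G = np.fft.fft(g)
    A = np.conj(G)[np.newaxis, :] * F            # conj(G) * F^j
    B = np.real(np.sum(F * np.conj(F), axis=0))  # sum over feature dimensions
    return A, B


def scale_response(A, B, f_new, lam):
    """Response over the N scales and the index of its maximum."""
    F_new = np.fft.fft(f_new, axis=1)
    y = np.real(np.fft.ifft(np.sum(np.conj(A) * F_new, axis=0) / (B + lam)))
    return y, int(np.argmax(y))
```

If the new features equal the training features, the response reproduces g and the selected index is the centre of the scale set; a shift of the features along the scale axis moves the peak by the same amount.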

We present an outline of our method in Algorithm 1.

Algorithm 1

Proposed tracking algorithm

Input: Frame t of the video sequence, initial target position p_0 and scale s_0

Output: Estimated target position p_t and scale s_t

repeat

1. Crop out the search region in frame t at target position p_{t-1} and scale s_{t-1}

Translation estimation:

2. Extract the target features around target position p_{t-1} at scale s_{t-1}

3. Compute the translation filter using (3)

4. Find the target position p_t which maximizes (5)

Scale estimation:

5. Extract the target features around target position p_t at scale s_{t-1}

6. Compute the scale filter using (9)

7. Find the target scale s_t using (13)

Model update:

8. Update the translation model using (6)

9. Update the scale model using (10) and (11)

until the end of the video sequence

4 Experimental results

4.1 Parameter setup

We name our proposed tracker “SAT” (Scene Adaptive Tracking). In the translation filter, the standard deviation of the Gaussian kernel is set to 0.5. In the scale filter, the standard deviation of the desired correlation output is set to 0.25 of the target size. The regularization parameters in SAT are set to λ₀ = 10^-2 in (1) and λ₁ = 10^-4 in (8). The size of the search window for translation estimation is set to 1.4 times the target size. The scale learning rate η₁ is set to 0.03 in (10) and (11). The number of scales is |S| = 33 with a self-adaptive scale factor z. Given a target of size H × W, the self-adaptive scale factor is set to $z = \begin{cases} 1.06, & \text{if } \min (H, W) \leq 20 \\ 1.02, & \text{if } \min (H, W) \geq 100 \\ 1.04, & \text{otherwise} \end{cases}$ (14)

This strategy adapts the scale factor to the target size.
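For concreteness, the rule of Eq. (14) and the resulting scale set of Section 3.3 can be written as the following Python sketch. The function names are hypothetical, and the number of scales is assumed to be odd (as with |S| = 33) so that the exponents are symmetric around zero.

```python
import numpy as np


def scale_factor(H, W):
    """Self-adaptive scale factor z of Eq. (14)."""
    m = min(H, W)
    if m <= 20:
        return 1.06
    if m >= 100:
        return 1.02
    return 1.04


def scale_set(H, W, n_scales=33):
    """Scales s_i = z**i for i = -(N-1)/2, ..., (N-1)/2 (N assumed odd)."""
    half = (n_scales - 1) // 2
    exponents = np.arange(-half, half + 1)
    return scale_factor(H, W) ** exponents
```

A small target thus gets a coarser scale step than a large one, while the middle entry of the set always corresponds to the unchanged scale 1.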

We evaluate the proposed algorithm on a large benchmark dataset [19] that contains 50 videos. From the benchmark dataset, we take the 29 sequences annotated with “occlusion” as the occlusion dataset. We compare our algorithm with 9 state-of-the-art trackers: DSST [10, 11], KCF [8], Struck [20], VTD [2], TLD [5], IVT [1], CT [4], MIL [6], CXT [21]. For the occlusion dataset, we report the overlap precision at a threshold of 0.5, which corresponds to the PASCAL evaluation criterion. In addition, we provide two kinds of plots, the Precision Plot and the Success Plot, to evaluate all trackers, where trackers are ranked using the area under the curve (AUC). The Precision Plot shows the ratio of frames whose center location error (CLE) is below a given threshold. The Success Plot [10] is based on the overlap precision (OP), the percentage of frames in which the bounding box overlap exceeds a threshold t, plotted over all thresholds t ∈ [0, 1]. All trackers in this paper are implemented in Matlab2013 on an Intel I5-3210 2.50 GHz CPU with 4 GB RAM.
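The evaluation measures described above can be sketched as follows. The box format (x, y, w, h), the function names and the 21 evenly spaced thresholds used to approximate the area under the success curve are assumptions of this example and are not taken from the benchmark toolkit.

```python
import numpy as np


def iou(a, b):
    """Bounding box overlap (intersection over union) of boxes (x, y, w, h)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def overlap_precision(pred, gt, threshold=0.5):
    """Fraction of frames whose overlap exceeds the threshold (OP)."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred, gt)])
    return float(np.mean(overlaps > threshold))


def center_location_error(pred, gt):
    """Per-frame Euclidean distance between box centers (CLE)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_c - gt_c, axis=1)


def success_auc(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """Mean OP over the thresholds, an approximation of the success-plot AUC."""
    return float(np.mean([overlap_precision(pred, gt, t) for t in thresholds]))
```

The precision at a given pixel threshold follows in the same way as the fraction of frames whose `center_location_error` is below that threshold.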

4.2 Experiment 1: Evaluation on occlusion dataset

The occlusion dataset consists of 29 sequences, which also exhibit other challenges such as illumination variation, deformation and background clutter. Table 1 shows the per-video OP at a threshold of 0.5 for SAT and the 9 state-of-the-art trackers. Our SAT algorithm achieves the highest average OP of 72.2%, which is 7.6 percentage points above the DSST algorithm. In the sequences david3, faceocc1, jogging-1, tiger1 and tiger2, the main challenge is occlusion, and the SAT algorithm handles the occlusion changes well on those sequences.

Table 1
Per-video overlap precision (OP) (%) on the occlusion dataset. The red fonts indicate the best performance, the blue fonts indicate the second best ones, and the green fonts indicate the third best ones

	SAT	DSST	KCF	Struck	VTD	TLD	IVT	CT	MIL	CXT
basketball	88.1	69.8	89.8	10.2	92.4	2.5	9.5	25.9	27.4	2.5
bolt	98.6	100	94.3	1.7	23.4	14.6	1.4	0.6	1.1	1.7
carScale	98.8	84.5	44.4	43.3	48.0	43.7	70.2	44.8	44.8	78.2
coke	79.4	83.2	72.2	94.2	13.7	28.9	13.1	9.3	11.7	59.1
david	83.0	100.0	62.2	23.6	67.7	97.0	79.4	42.7	22.9	83.4
david3	100.0	52.8	99.2	33.7	48.4	10.3	63.5	34.9	68.3	13.9
doll	96.8	99.7	55.2	68.8	81.1	62.4	42.4	53.1	43.3	97.5
dudek	99.8	98.1	97.6	98.0	100.0	84.2	96.8	85.2	85.7	92.4
faceocc1	100.0	100.0	100.0	100.0	92.5	83.4	97.5	85.4	76.5	77.1
faceocc2	99.3	100	99.6	100.0	99.4	82.9	91.4	74.4	93.6	94.6
football	66.0	79.0	70.2	66.0	76.8	41.2	71.5	78.5	73.8	65.2
freeman4	50.2	41.7	18.4	15.5	13.8	26.9	19.8	0.4	2.1	18.0
girl	86.0	30.6	74.2	98.0	65.2	76.4	18.6	17.8	29.4	64.2
ironman	14.5	13.3	15.1	4.8	15.7	6.6	5.4	9.0	4.8	3.0
jogging-1	96.7	22.5	22.5	22.5	21.5	96.7	22.5	22.5	22.5	95.4
jogging-2	18.2	18.2	16.0	24.8	16.3	83.1	19.2	14.0	16.3	15.3
lemming	26.6	27.2	44.2	64.0	49.3	59.4	16.7	68.0	81.1	61.0
liquor	64.6	40.9	98.1	40.6	58.0	58.2	20.7	20.9	20.1	21.0
matrix	14.0	18.0	13.0	12.0	7.0	7.0	2.0	2.0	11.0	4.0
singer1	60.1	100.0	27.6	29.9	43.0	99.1	48.1	24.8	27.6	32.2
skating1	47.5	52.3	36.3	37.0	56.8	22.8	7.5	10.0	10.3	12.0
soccer	16.1	38.8	39.3	15.6	22.7	12.2	17.3	20.2	15.6	12.8
subway	99.4	22.3	100.0	90.9	21.1	22.9	21.1	76.6	79.4	22.9
suv	98.3	98.4	98.4	57.5	55.0	83.9	44.3	23.1	13.0	91.5
tiger1	87.1	59.3	85.7	18.3	11.7	45.6	8.0	24.6	9.7	27.8
tiger2	76.2	29.6	36.4	64.9	16.7	17.3	7.7	37.0	44.7	28.8
walking	82.0	99.8	51.5	56.6	81.3	38.3	99.8	50.2	54.1	21.8
walking2	55.6	100.0	38.0	43.4	40.2	34.0	100.0	38.4	38.0	39.8
woman	92.0	93.3	93.6	93.5	18.1	16.6	18.4	15.9	18.8	20.6
Average OP	72.2	64.6	61.8	49.3	46.8	46.8	39.1	34.8	36.1	43.4

The resulting precision plots and success plots of the one-pass evaluation (OPE) [19] are shown in Fig. 2. In the precision plot, the precision score of the SAT algorithm is 0.7, which outperforms the KCF algorithm by 2.5% and the DSST algorithm by 5.3%. In the success plot, the proposed SAT algorithm achieves a score of 0.565, which outperforms the DSST algorithm by 3.3% and the KCF algorithm by 5.1%. This indicates that the scene-based adaptive update strategy of the filter improves accuracy under occlusion. On the occlusion dataset, our algorithm provides promising results compared to several trackers in the literature in both the success plot and the precision plot.

Fig.2

Precision and success plots on the occlusion dataset. The legend contains the AUC score for each tracker. The proposed SAT tracker performs favorably against the state-of-the-art trackers on the occlusion dataset.

4.3 Experiment 2: Evaluation on benchmark dataset

To further evaluate the robustness of our SAT, we set up a comparison on the benchmark dataset (OTB-50) [19] with challenging attributes such as occlusion, out-of-plane rotation, deformation and background clutter. The resulting precision plots and success plots of OPE are shown in Fig. 3, which shows that our tracker outperforms the state-of-the-art trackers on the OTB-50 dataset. In the precision plot, the precision score of the SAT algorithm is 0.705, which outperforms the DSST algorithm by 3.0% and the KCF algorithm by 3.1%. In the success plot, the success score of the SAT algorithm is 0.575, which also outperforms the DSST algorithm.

Fig.3

Precision and success plots on the OTB-50 benchmark dataset.

In addition, we report results for the deformation attribute in Fig. 4. On the deformation sequences, the KCF method performs well, with a precision score of 0.671 and a success score of 0.534, while the SAT algorithm achieves 0.749 and 0.605 and thus outperforms the KCF algorithm. Therefore, the SAT algorithm is also robust to deformation.

Fig.4

Precision and success plots on the deformation sequences of the benchmark dataset.

Figure 5 visualizes the tracking results of our method and of the trackers DSST, KCF, Struck and CT on the challenging sequences carscale, david3, freeman4, jogging-1 and tiger2. It shows that our tracker adapts better to occlusion changes of the target while keeping high precision. On the OTB-50 dataset, our SAT algorithm runs at 39.4 frames per second, which indicates that the algorithm can run in real time in most cases.

Fig.5

A visualized comparison of our tracker with four state-of-the-art trackers. From top to bottom, the frames are from carscale, david3, freeman4, jogging-1 and tiger2.

5 Conclusion

In this paper, we propose a scene adaptive tracking algorithm based on correlation filters in a tracking-by-detection framework. The tracking task is decomposed into target translation prediction and scale prediction. A kernelized correlation filter based on multidimensional features is adopted to estimate the target position. We present an adaptive online update scheme for the kernelized correlation filter. The target scale is estimated by a correlation filter trained with ridge regression. Experimental results show that the SAT algorithm performs favorably against several state-of-the-art trackers on both the occlusion dataset and the OTB-50 dataset while running in real time. Moreover, the SAT algorithm also adapts to deformation.

Acknowledgments

This work was supported by the Shandong Natural Science Foundation (ZR2013FL018) and the National Natural Science Foundation of China (61773244, 61772319, 61472227).

References

1. Ross, Lim, Lin R.S. and Yang M.H., Incremental learning for robust visual tracking, International Journal of Computer Vision 1 (2008), 125–141.

2. Kwon, Lee K.M., Visual tracking decomposition, IEEE Conference on Computer Vision and Pattern Recognition (2010), 1269–1276, IEEE Computer Society Press, California, USA.

3. Comaniciu, Ramesh and Meer, Kernel-based object tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 5 (2003), 564–575.

4. Zhang, Zhang, Yang M.-H., Real-time compressive tracking, European Conference on Computer Vision (2012), 864–877, ECCV Press, Firenze, Italy.

5. Kalal, Mikolajczyk and Matas, Tracking-learning-detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 7 (2012), 1409–1422.

6. Babenko, Yang M.-H. and Belongie, Robust object tracking with online multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (2011), 1619–1632.

7. Bolme D.S., Beveridge J.R., Draper B.A., Lui Y.M., Visual object tracking using adaptive correlation filters, IEEE Conference on Computer Vision and Pattern Recognition (2010), 2544–2550, IEEE Computer Society Press, Florida, USA.

8. Henriques J.F., Caseiro, Martins and Batista, High-speed tracking with kernelized correlation filters, IEEE Transactions on Pattern Analysis and Machine Intelligence 3 (2015), 583–596.

9. Danelljan, Khan F.S., Felsberg, Adaptive color attributes for real-time visual tracking, IEEE Conference on Computer Vision and Pattern Recognition (2014), 1090–1097, IEEE Computer Society Press, Columbus, USA.

10. Danelljan, Hager, Khan F.S., Felsberg, Accurate scale estimation for robust visual tracking, Proceedings of British Machine Vision Conference (2014), 1–11, IEEE Computer Society Press, Nottingham, UK.

11. Danelljan, Hager and Khan F.S., Discriminative scale space tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (2017), 1561–1575.

12. Kwon, Lee K.M., Tracking of a non-rigid object via patch-based dynamic appearance modeling and adaptive basin hopping Monte Carlo sampling, IEEE Conference on Computer Vision and Pattern Recognition (2009), 1208–1215, IEEE Computer Society Press, Florida, USA.

13. Zhang, Jia, Xu, Ma, Ahuja, Partial occlusion handling for visual tracking via robust part matching, IEEE Conference on Computer Vision and Pattern Recognition (2014), 1258–1265, IEEE Computer Society Press, Columbus, USA.

14. Zhao, Wang and Liu, Robust object tracking with occlusion handling based on local sparse representation, International Journal of Signal Processing, Image Processing and Pattern Recognition 3 (2014), 407–420.

15. Yu, Hu, Lu and Li, Robust object tracking with occlusion handle, Neural Computing and Applications 7 (2011), 1027–1034.

16. Liu, Wang, Yang, Real-time part-based visual tracking via adaptive correlation filters, IEEE Conference on Computer Vision and Pattern Recognition (2015), 4902–4911, IEEE Computer Society Press, Boston, USA.

17. Bing and Ning, Research on the dynamic evolution behavior of group loitering air vehicles, Applied Mathematics and Nonlinear Sciences 2 (2016), 353–358.

18. Balibrea, On problems of Topological Dynamics in non-autonomous discrete systems, Applied Mathematics and Nonlinear Sciences 2 (2016), 391–404.

19. Wu, Lim, Yang M.H., Online object tracking: A benchmark, IEEE Conference on Computer Vision and Pattern Recognition (2013), 2411–2418, Portland, USA.

20. Hare, Saffari, Torr P.H.S., Struck: Structured output tracking with kernels, IEEE International Conference on Computer Vision (2011), 263–270, IEEE Computer Society Press, Barcelona, Spain.

21. Dinh T.B., Vo, Medioni, Context tracker: Exploring supporters and distracters in unconstrained environments, IEEE Conference on Computer Vision and Pattern Recognition (2011), 1177–1184, IEEE Computer Society Press, Colorado Springs, USA.