An improved kernel correlation filtered image target tracking algorithm

Abstract

The kernel correlation filtering (KCF) tracking algorithm cannot handle changes in target scale or loss of the target during tracking. To address this, an improved kernel correlation filtering (IKCF) target tracking algorithm is proposed in this paper. A scale filter is added alongside the trained displacement filter to handle target scale changes. To address target loss, an occlusion-handling mechanism is incorporated: when only a small area of the target is occluded, a support vector machine (SVM) is trained online on the current sample; when the target is heavily occluded, the re-detection classifier is used to locate it. The experimental results show that the tracking accuracy of this method is significantly better than that of the other tracking algorithms compared.

Keywords

Kernel correlation filter, target loss, scale change, occlusion, support vector machine

1. Introduction

Target tracking is an important topic in machine vision, with a wide range of applications such as robotics, video surveillance and intelligent transportation [1, 2, 3, 4]. Although target tracking has improved significantly in recent years with the introduction of machine learning, it still faces many challenges, such as scale changes, illumination changes, target deformation and target occlusion [5].

Tracking algorithms based on detection show good tracking performance [6, 7, 8] and have been the mainstream approach in recent years. These algorithms usually treat tracking as a classification problem: a classifier is trained offline or online on existing video frames, and the trained classifier is then used to determine the target position in the next frame. Examples include Structured Output Tracking with Kernels (Struck) [9], Tracking-Learning-Detection (TLD) [10] and Multiple Instance Learning tracking (MIL) [11]. These algorithms generally use sparse sampling, which limits both their tracking accuracy and their computational efficiency.

Correlation filters have been widely used in target detection and recognition. Bolme et al. [12] proposed the Minimum Output Sum of Squared Error (MOSSE) tracking algorithm, which applied correlation filters to target tracking for the first time and achieved good results. Subsequently, Henriques et al. proposed the Circulant Structure of Tracking-by-Detection with Kernels (CSK) tracking algorithm [13], which introduced dense sampling through a circulant structure and trained a nonlinear regularized least squares (RLS) classifier with the kernel method. The Kernelized Correlation Filter (KCF) tracking algorithm [14] then improved CSK by using Histogram of Oriented Gradients (HOG) features [15]. Danelljan et al. proposed the Discriminative Scale Space Tracker (DSST) [16], which builds on the MOSSE tracking algorithm to handle target scale changes during tracking [17, 18, 19].

Therefore, building on the KCF tracking algorithm, this paper first adopts the scale estimation of the DSST tracking algorithm to handle target scale changes during tracking. Because the resulting algorithm still cannot deal effectively with large-area occlusion or loss of the target in long-term tracking, a re-detection method is then proposed to recover the target after it is lost, which further improves the accuracy and robustness of the tracking algorithm.

2. KCF tracking algorithm improvement

The KCF tracking algorithm uses cyclic sampling to train the classifier. This dense sampling gives better results than sparse-sampling trackers, and tracking is also faster because the computation is carried out in the frequency domain. However, the tracking box cannot adapt to changes in target scale during tracking, and the target cannot be re-acquired once it is lost, which lowers tracking performance.

2.1 Discriminant correlation filter

The main idea behind training the displacement filter in the KCF tracking algorithm is to learn a discriminant correlation filter that locates the target in a new frame. Specifically, a set of gray image patches $x_{1}$, $x_{2}$, $\ldots$, $x_{t}$ is extracted from the target and its background as training samples, and each patch $x_{j}$ has a desired output $g_{j}$. In general, $g_{j}$ is a Gaussian function whose peak is at the center of $x_{j}$. In this paper, a scale filter is added to the displacement filter. The scale filter is also a discriminant correlation filter; the main difference is that its training samples are extracted by building a scale pyramid of the target. The two filters are independent of each other, and different features are selected to train each. Ridge regression is used to find the optimal correlation filter $h_{t}$:

$\displaystyle\varepsilon=\sum\limits_{j=1}^{t}{||h_{t}\ast x_{j}-g_{j}||^{2}}+\lambda\sum\limits_{j=1}^{t}{||h_{t}||^{2}}$ (1)

The operation is converted to the frequency domain according to the Parseval theorem:

$\displaystyle\min\frac{1}{MN}\left({\sum\limits_{j=1}^{t}{||\bar{H}_{t}X_{j}-G_{j}||^{2}}+\lambda\sum\limits_{j=1}^{t}{||H_{t}||^{2}}}\right)$ (2)

Here $x_{j}$, $g_{j}$ and $h_{t}$ are all of size $M\times N$, capital letters denote the discrete Fourier transforms of the corresponding lowercase quantities, $\bar{H}_{t}$ is the complex conjugate of $H_{t}$, and $\lambda\geqslant 0$ is a regularization parameter that prevents overfitting. Solving Eq. (2) gives Eq. (3):

$\displaystyle H_{t}=\frac{\sum\limits_{j=1}^{t}{\bar{G}_{j}X_{j}}}{\sum\limits_{j=1}^{t}{\bar{X}_{j}X_{j}+\lambda}}$ (3)
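As an illustration only, the closed-form solution of Eq. (3) can be sketched in a few lines of NumPy. This is a minimal single-channel sketch, not the authors' MATLAB/C implementation: the function names, the Gaussian width `sigma`, the value of `lam`, and the choice to add the regularization once to the accumulated denominator are all assumptions made for the example.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired output g: a 2-D Gaussian whose peak is at the patch centre."""
    m, n = shape
    rr, cc = np.meshgrid(np.arange(m) - m // 2, np.arange(n) - n // 2, indexing="ij")
    return np.exp(-(rr**2 + cc**2) / (2.0 * sigma**2))

def train_filter(patches, labels, lam=1e-2):
    """Accumulate the numerator and denominator of Eq. (3) over all samples."""
    num = 0.0
    den = 0.0
    for x, g in zip(patches, labels):
        X = np.fft.fft2(x)
        G = np.fft.fft2(g)
        num = num + np.conj(G) * X        # conj(G_j) * X_j
        den = den + (np.conj(X) * X).real  # conj(X_j) * X_j, real and non-negative
    # Assumption: lambda is added once to the summed denominator.
    return num / (den + lam)

def apply_filter(H, x):
    """Correlation output conj(H) * X of Eq. (2), mapped back to the spatial domain."""
    return np.real(np.fft.ifft2(np.conj(H) * np.fft.fft2(x)))
```

Applying the trained filter to one of its own training patches should reproduce something close to the Gaussian label, with the maximum at the patch centre.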

2.2 Dense sampling

Evaluating Eq. (3) directly is computationally expensive, which greatly affects the real-time performance of the tracking algorithm. The remedy is to apply cyclic (dense) sampling to the target area, which improves both computational efficiency and tracking accuracy. Unlike the sparse sampling used by other algorithms, correlation filtering does not strictly distinguish between positive and negative samples. The algorithm uses the transformation matrix $P$ to cyclically shift the target image patch $x$. For a one-dimensional image $x=[x_{1},x_{2},\ldots,x_{n}]$, the transformation matrix is given by Eq. (4):

$\displaystyle P=\left[\begin{array}{ccccc}0&0&0&\ldots&1\\ 1&0&0&\ldots&0\\ 0&1&0&\ldots&0\\ \vdots&\vdots&&&\vdots\\ 0&0&\cdots&1&0\\ \end{array}\right]$ (4)

The shifted images together form the circulant matrix of Eq. (5):

$\displaystyle X=C(x)=\left[\begin{array}{ccccc}{x_{1}}&{x_{2}}&{x_{3}}&\ldots&{x_{n}}\\ {x_{n}}&{x_{1}}&{x_{2}}&\ldots&{x_{n-1}}\\ {x_{n-1}}&{x_{n}}&{x_{1}}&\ldots&{x_{n-2}}\\ \vdots&\vdots&&&\vdots\\ {x_{2}}&{x_{3}}&{x_{4}}&\ldots&{x_{1}}\\ \end{array}\right]$ (5)

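Circulant matrices have a very useful property: regardless of the form of $x$, the circulant matrix $X$ can be expressed as $X=F\,\mathrm{diag}(\hat{x})F^{H}$, where $F$ is the discrete Fourier transform matrix, $\hat{x}$ is the discrete Fourier transform of $x$ and $F^{H}$ is the Hermitian transpose of $F$. Substituting this into Eq. (3) greatly simplifies the calculation.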
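The practical consequence of this property is that a product with the circulant matrix never needs the matrix itself. The NumPy sketch below is an illustration with hypothetical function names: it builds $C(x)$ explicitly as in Eq. (5) and compares it with the equivalent FFT computation for real-valued inputs, which costs $O(n\log n)$ instead of $O(n^{2})$.

```python
import numpy as np

def circulant(x):
    """Stack all cyclic shifts of x as rows, matching the layout of Eq. (5)."""
    return np.stack([np.roll(x, k) for k in range(len(x))])

def circulant_times(x, v):
    """Compute C(x) @ v for real x and v using FFTs, without forming C(x)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(v)))
```

For example, `circulant(np.array([1., 2., 3., 4.]))` has second row `[4, 1, 2, 3]`, and `circulant(x) @ v` agrees with `circulant_times(x, v)` up to floating-point error.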

2.3 Filter response

To simplify the calculation further, the numerator $A_{j}$ and the denominator $B_{j}$ of Eq. (3) are updated separately:

$\displaystyle A_{j}=(1-\theta)A_{j-1}+\theta\bar{G}_{j}X_{j}$ (6)

$\displaystyle B_{j}=(1-\theta)B_{j-1}+\theta\sum{\bar{X}_{j}X_{j}}$ (7)

This gives a robust approximation, where $\theta$ is the learning rate. For a new input frame $z$, the target position is then found at the maximum of the correlation filter response $y$, given by Eq. (8):

$\displaystyle y=F^{-1}\left\{{\frac{\sum{\bar{A}Z}}{B+\lambda}}\right\}$ (8)
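The running update of Eqs (6) and (7) and the response of Eq. (8) can be sketched as a small NumPy class. This is a hedged single-channel illustration, so the sums over feature channels in Eqs (7) and (8) reduce to one term. The class name and the default values of `theta` and `lam` are assumptions for the example, since the paper does not state them.

```python
import numpy as np

class DisplacementFilter:
    """Single-channel correlation filter with a running numerator and denominator."""

    def __init__(self, x, g, theta=0.025, lam=1e-2):
        self.theta = theta                      # learning rate (assumed value)
        self.lam = lam                          # regularization (assumed value)
        self.G_conj = np.conj(np.fft.fft2(g))   # conj(G), fixed Gaussian label
        X = np.fft.fft2(x)
        self.A = self.G_conj * X                # numerator, first frame
        self.B = (np.conj(X) * X).real          # denominator, first frame

    def update(self, x):
        """Blend the new frame into A and B, following Eqs (6) and (7)."""
        X = np.fft.fft2(x)
        self.A = (1 - self.theta) * self.A + self.theta * self.G_conj * X
        self.B = (1 - self.theta) * self.B + self.theta * (np.conj(X) * X).real

    def respond(self, z):
        """Response map y of Eq. (8); its argmax gives the new target position."""
        Z = np.fft.fft2(z)
        return np.real(np.fft.ifft2(np.conj(self.A) * Z / (self.B + self.lam)))
```

If the new patch `z` is a cyclically shifted copy of the training patch, the peak of `respond(z)` moves away from the label centre by the same shift, which is how the displacement is read off.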

2.4 Re-detection

As noted above, the KCF tracking algorithm performs well on various tracking performance indicators, but it cannot track effectively in scenes where the target is lost. This is because the KCF tracking algorithm does not clearly distinguish between positive and negative samples during sampling, and the trained classifier simply takes the point with the largest confidence value as the target. When the target is completely occluded, all samples are therefore negative, the trained classifier loses its ability to distinguish the target from the background, and tracking fails. To address this problem, this paper adds a support vector machine classifier to the KCF algorithm. The classifier re-detects the target when it is lost, so that tracking can resume after target loss.

The support vector machine is a two-class model defined by the maximum margin in the feature space [20, 21]. The basic idea is to find the separating hyperplane that correctly divides the training data set with the largest geometric margin. In machine vision, SVMs are often used for recognition and classification. During training, features are usually extracted from the training images, and each image is then represented by its feature vector. When pixels are used as features, the image is scanned in lexicographic order to form the feature vector. Given $N$ column vectors $x_{i}\in R^{d}$ with class labels $t_{i}\in\{-1,1\}$, $\forall i\in\{1\ldots N\}$, the SVM classifier finds the hyperplane that solves Eq. (9):

$\displaystyle\mathop{\min}\limits_{w,b}\;w^{T}w+C\sum\limits_{i=1}^{N}{\varepsilon_{i}}\quad{s.t.}\quad{t_{i}(x_{i}^{T}w+b)}\geqslant 1-\varepsilon_{i}$ (9)

The superscript $T$ denotes transposition; $w$ and $b$ define the hyperplane ($w$ is its normal vector and $b$ is the bias); $\varepsilon_{i}$ is the slack variable; and $C>0$ is the penalty parameter, which is generally chosen according to the application and sets how heavily misclassification is penalized.
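To make the objective of Eq. (9) concrete, the sketch below minimizes its equivalent hinge-loss form, $w^{T}w+C\sum_{i}\max(0,\,1-t_{i}(x_{i}^{T}w+b))$, with plain batch subgradient descent in NumPy. This is an illustrative stand-in and not the online SVM trainer used in the paper, which is not specified; the function names, step size `lr` and iteration count are assumptions.

```python
import numpy as np

def svm_objective(w, b, X, t, C=1.0):
    """Value of Eq. (9) with each slack set to its smallest feasible value."""
    slack = np.maximum(0.0, 1.0 - t * (X @ w + b))
    return w @ w + C * slack.sum()

def train_linear_svm(X, t, C=1.0, lr=0.01, epochs=200):
    """Batch subgradient descent on the soft-margin objective (illustrative only)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = t * (X @ w + b)
        viol = margins < 1.0                      # samples with non-zero slack
        grad_w = 2.0 * w - C * (t[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * t[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Rows of `X` are feature vectors $x_{i}$ and `t` holds the labels in $\{-1,1\}$; a new sample is classified by the sign of `x @ w + b`.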

The improved algorithm uses two thresholds, threshold1 and threshold2. Threshold1 is used to judge how heavily the target is occluded; its value in the experiments is 0.4. When the confidence value is greater than threshold1, the target in that frame is not occluded over a large area, so the frame can be used to train the SVM re-detection classifier. Threshold2 is used to decide whether re-detection is required; its value in the experiments is 0.2. When the response is less than threshold2, the confidence in the detected target is low and the SVM classifier is needed for re-detection.

2.5 Improved algorithm flow

Based on the analysis above, the improved kernel correlation filter tracking algorithm is built on the KCF tracking algorithm as follows:

SVM-based filtering target tracking algorithm

(1) Parameter initialization.
(2) Read the $i$-th frame of the image sequence.
(3) If $i>1$:
(4) Crop the search area according to the target position ($x_{i-1}$, $y_{i-1}$) and size $M\times N$ estimated in the previous frame, and extract its HOG features.
(5) Calculate the maximum confidence value $y^{t}_{i}$ of the displacement correlation filter and estimate the new target position ($x_{i}$, $y_{i}$).
(6) Build a target pyramid at ($x_{i}$, $y_{i}$), calculate the maximum confidence value $y^{s}_{i}$ of the scale correlation filter, and estimate the target scale.
(7) If $y^{t}_{i}\leqslant$ threshold2:
(8) Use the SVM classifier to detect the target.
(9) Update the displacement filter and the scale filter according to Eqs (6) and (7).
(10) If $y^{t}_{i}>$ threshold1:
(11) Train the SVM classifier according to Eq. (9).
(12) Return to step 2 to process the next frame.
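The two threshold tests in steps (7) and (10) can be written as a small decision function. The sketch below is illustrative only: the function name is hypothetical, the default values 0.4 and 0.2 are the experimental values given in Section 2.4, and the inequalities follow the step list ($\leqslant$ threshold2 triggers re-detection, $>$ threshold1 allows SVM training).

```python
THRESHOLD1 = 0.4  # above this, the frame is trusted for training the SVM
THRESHOLD2 = 0.2  # at or below this, fall back to SVM re-detection

def plan_frame(peak_response, threshold1=THRESHOLD1, threshold2=THRESHOLD2):
    """Decide which optional steps run for one frame.

    peak_response is the maximum displacement-filter confidence of step (5).
    Returns (redetect, train_svm): whether step (8) and step (11) should run.
    """
    redetect = peak_response <= threshold2
    train_svm = peak_response > threshold1
    return redetect, train_svm
```

A peak between the two thresholds triggers neither branch: the filters are still updated in step (9), but the SVM is neither consulted nor retrained on that frame.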

3. Experiment and analysis

To verify the effectiveness of the proposed algorithm, 10 video sequences were selected from the benchmark in [5]. These sequences cover illumination changes, scale changes, target occlusion, target loss, rotation and other challenges (see Table 1). The proposed algorithm is compared with several well-known tracking algorithms, including Struck, TLD, MIL, KCF and DSST. The KCF algorithm with an added scale filter (denoted KCF+S) is also included in the comparison. All experiments were carried out in the same experimental environment, using the code published by the authors of each algorithm.

Table 1
Video sequence

Video sequence	Number of frames	Main challenge
Sylvester	1 345	Light change, rotation
Dudek	1 145	Scale change, rotation, target loss, fast movement
Lemming	1 336	Light changes, scale changes, rotation, target loss
Suv	946	Occlusion, rotation, target loss
Tiger2	365	Light changes, rotation, lost target, fast moving
Carscale	254	Scale change, occlusion, rotation, fast movement
Dog1	1 350	Scale change, rotation
Fish	476	Illumination change
Coke	292	Light changes, occlusion, rotation, fast movement
Bolt	350	Occlusion, rotation, target deformation

3.1 Experimental environment and evaluation indicators

For efficiency, the program is written in a mix of MATLAB and C. The software platform is MATLAB R2014b, configured with the OpenCV 3.0 and VLFeat libraries. The hardware is an Intel Core i7-4790 CPU with a clock speed of 3.6 GHz and 4 GB of memory. To analyze the performance of the algorithm quantitatively, three evaluation indexes from [5] were used in the experiments: Center Location Error (CLE), Distance Precision (DP) and Overlap Precision (OP). CLE is the Euclidean distance, in pixels, between the target center of the tracking result and the manually annotated target center; smaller values are better. DP is the percentage of frames in the video sequence whose CLE is less than a threshold (typically 20 pixels).

OP is based on the overlap score, score $=$ area($B_{t}\cap B_{g}$)/area($B_{t}\cup B_{g}$) $\in$ [0,1]: it is the percentage of frames in the tracking sequence whose score is greater than a threshold (0.5 in this paper). Here $B_{t}$ is the tracking box in frame $t$, $B_{g}$ is the ground-truth annotated box, $\cap$ denotes the overlapping area and $\cup$ denotes the total covered area.
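The three indexes can be computed directly from per-frame bounding boxes. The NumPy sketch below is an illustration under one assumption not stated in the paper: boxes are stored as rows of (x, y, width, height) with (x, y) the top-left corner. The function names are hypothetical; the thresholds default to the 20 pixels and 0.5 given above.

```python
import numpy as np

def center_location_error(pred, gt):
    """Per-frame Euclidean distance in pixels between box centres."""
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def overlap_score(pred, gt):
    """Per-frame area(Bt ∩ Bg) / area(Bt ∪ Bg), a value in [0, 1]."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def distance_precision(pred, gt, threshold=20.0):
    """DP: percentage of frames whose CLE is below the threshold."""
    return 100.0 * np.mean(center_location_error(pred, gt) < threshold)

def overlap_precision(pred, gt, threshold=0.5):
    """OP: percentage of frames whose overlap score exceeds the threshold."""
    return 100.0 * np.mean(overlap_score(pred, gt) > threshold)
```

For a two-frame sequence in which the first prediction matches the annotation exactly and the second misses it entirely, both DP and OP evaluate to 50%.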

3.2 Experimental results and analysis

In the experiment, the proposed algorithm and the other tracking algorithms were run on the video sequences in Table 1, and the averages of CLE, DP, OP and frames per second (FPS) were computed. The results are shown in Table 2. The best results are bolded in the table, and the suboptimal ones are underlined.

Table 2
Test results

Algorithm	CLE/pixel	DP/%	OP/%	FPS
MIL	73.3	43.5	38.3	23
TLD	19.6	74	57.2	14
Struck	70.27	63.3	62.9	11
KCF	21	81.6	72.5	176
DSST	22.1	79.3	79.4	43
KCF+S	21.8	80.5	78.8	46
IKCF	9.9	89.4	82.2	27

Table 2 shows that the IKCF algorithm is the best of the compared algorithms on all three evaluation indicators. Compared with the KCF tracking algorithm, the average CLE is reduced by 11.1 pixels, DP is increased by 7.8 percentage points and OP is increased by 9.7 percentage points, a clear improvement. The scale filter and the re-detection classifier inevitably add computation and slow tracking down (27 FPS against 176 FPS for KCF), but the algorithm still tracks in real time. The tracking accuracy curves (DP curves) of six of the video sequences are plotted in Fig. 1.

Figure 1. DP curve.

Figure 2. Tracking results of several algorithms.

The test results in Table 2 also show that the CLE and DP of the KCF tracking algorithm are slightly better than those of the KCF $+$ S tracking algorithm, indicating that KCF locates the target center slightly more accurately. On the other hand, the OP of the KCF tracking algorithm is worse than that of KCF $+$ S (72.5% against 78.8%). This shows that the tracking box of KCF $+$ S overlaps the annotated box more closely, and hence that KCF $+$ S does adapt to changes in target scale during tracking.

The method in this paper builds on the KCF tracking algorithm to address scale change and target loss, and its handling of scale change follows the DSST tracking algorithm. Accordingly, the DP curves in Fig. 1 show that for the dog1, fish and dudek sequences, whose main challenges are scale change, illumination change and partial occlusion, the proposed algorithm performs similarly to the DSST tracking algorithm. For the coke, lemming and tiger2 sequences, however, which involve occlusion or target loss (Table 1), the DP curve of the proposed method is significantly better than that of DSST. This shows that the method not only inherits the advantages of the original algorithms but also improves on them effectively.

For a more intuitive comparison, Fig. 2 shows the tracking results on selected frames. The target is fully occluded in frames 254 $\sim$ 266 of the coke video, frames 337 $\sim$ 362 of the lemming video and frames 677 $\sim$ 686 of the suv video. In these cases the proposed method quickly finds the target again once it reappears, so it performs better on these sequences than the other tracking algorithms. This shows that the re-detection function works and improves the tracking accuracy of the algorithm.

4. Conclusion

Based on the KCF tracking algorithm, an improved kernel correlation filtering target tracking algorithm is proposed. A scale filter and a re-detection classifier are added to the original algorithm to address its inability to handle scale change and target loss. The experimental results show that the proposed method outperforms the other tracking algorithms compared on CLE, DP and OP, and that it is robust to some degree to illumination changes, scale changes and rotation, which gives it research and application value. The next step is to optimize the performance of each classifier, simplify the calculations and further improve tracking performance.

We validate our object tracking and feature point tracking. Our formulation enables the integration of multi-resolution feature maps. In addition, our approach is capable of accurate sub-pixel localization. Experiments on object tracking benchmarks demonstrate that our approach achieves superior performance compared to the state-of-the-art. Further, our method obtains substantially improved accuracy and robustness for real-time feature point tracking. In this work, we do not use any video data to learn an application specific deep feature representation. This is expected to further improve the performance of our object tracking framework. Another research direction is to incorporate motion-based deep features into our framework.

Acknowledgments

This work is partially supported by Key Research Project of Hunan Provincial Department of Education (395-16A121).

References

1. Chen, Hong Z.B. and Tao D.C., An experimental survey on correlation filter-based tracking, Computer Science 53 (2015), 68–85.

2. Smeulders A.W.M., Chu D.M., Cucchiara et al., Visual tracking: An experimental survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 36(7) (2014), 1442–1468.

3. Yilmaz, Javed and Shah, Object tracking: A survey, ACM Computing Surveys 38(4) (2006), 81–93.

4. Niu C.F., Chen D.F. and Liu Y.S.H., Target tracking method based on SIFT feature and particle filter, Robot 32(2) (2010), 241–247.

5. Lim and Yang M.H., Object tracking benchmark, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9) (2015), 1834–1848.

6. Bertinetto, Valmadre, Golodetz, Miksik and Torr P.H.S., Staple: Complementary learners for real-time tracking, In: CVPR, 2016.

7. Gladh, Danelljan, Shahbaz Khan and Felsberg, Deep motion features for visual tracking, In: ICPR, 2016.

8. Liang, Blasch and Ling, Encoding color information for visual tracking: Algorithms and benchmark, TIP 24(12) (2015), 5630–5644.

9. Hare, Saffari and Torr P.H.S., Struck: Structured output tracking with kernels, 2011 IEEE International Conference on Computer Vision (ICCV), 2011, pp. 263–270.

10. Kalal, Mikolajczyk and Matas, Tracking-learning-detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 34(7) (2012), 1409–1422.

11. Babenko, Yang M.H. and Belongie, Robust object tracking with online multiple instance learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 33(8) (2011), 1619–1632.

12. Bolme D.S., Beveridge J.R., Draper B.A. et al., Visual object tracking using adaptive correlation filters, 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), 2010, pp. 2544–2550.

13. Henriques J.F., Caseiro, Martins et al., Exploiting the circulant structure of tracking-by-detection with kernels, European Conference on Computer Vision (ECCV 2012), Berlin Heidelberg: Springer, 2012, pp. 702–715.

14. Henriques J.F., Caseiro, Martins et al., High-speed tracking with kernelized correlation filters, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(3) (2015), 583–596.

15. Dalal and Triggs, Histograms of oriented gradients for human detection, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA, 2005, pp. 886–893. DOI: 10.1109/CVPR.2005.177.

16. Danelljan, Häger, Khan et al., Accurate scale estimation for robust visual tracking, Proceedings of British Machine Vision Conference, Nottingham, September 1–5, 2014.

17. Danelljan, Häger, Khan F.S. and Felsberg, Accurate Scale Estimation for Robust Visual Tracking, In Proceedings of the British Machine Vision Conference (BMVC), 2014. http://www.cvl.isy.liu.se/research/objrec/visualtracking/scalvistrack/index.html

18. Danelljan, Häger, Khan and Felsberg, Learning spatially regularized correlation filters for visual tracking, In Proceedings of the International Conference in Computer Vision (ICCV), 2015. http://www.cvl.isy.liu.se/research/objrec/visualtracking/regvistrack/index.html

19. Danelljan, Häger, Khan and Felsberg, Convolutional features for correlation filter based visual tracking, ICCV workshop on the Visual Object Tracking (VOT) Challenge, 2015. http://www.cvl.isy.liu.se/research/objrec/visualtracking/regvistrack/index.html

20. Rodriguez, Boddeti V.N., Kumar B.V. et al., Maximum margin correlation filter: A new approach for localization and classification, IEEE Transactions on Image Processing 22(2) (2013), 631–643.

21. Rodriguezperez A.F., Maximum margin correlation filters, Carnegie Mellon University, 2012.
