Abstract
The kernel correlation filtering (KCF) tracking algorithm cannot handle target scale variation or target loss during tracking. To address this, an improved kernel correlation filtering (IKCF) target tracking algorithm is proposed in this paper. A scale filter is added alongside the trained displacement filter to handle target scale change. To solve the problem of target loss, an occlusion processing mechanism is incorporated: when only a small area of the target is occluded, a support vector machine (SVM) is trained online on the sample; when the target is occluded over a large area or lost, the re-detection classifier is used for detection. The experimental results show that the tracking accuracy of this method is significantly improved compared with other excellent tracking algorithms.
Introduction
Target tracking is an important topic in the field of machine vision, with a wide range of application scenarios such as robotics, video surveillance, and intelligent transportation [1, 2, 3, 4]. In recent years, although target tracking technology has improved significantly with the introduction of machine learning, it still faces many challenges, such as scale changes, illumination changes, target deformation, and target occlusion [5].
Detection-based target tracking algorithms show good tracking performance [6, 7, 8] and have been the mainstream tracking approach in recent years. These algorithms usually treat tracking as a classification problem: a classifier is trained offline or online on existing video frames, and the trained classifier is then used to determine the target position in the next frame. Examples include the Structured Output Tracking with Kernels algorithm (Struck) [9], the Tracking-Learning-Detection algorithm (TLD) [10], and the Multiple Instance Learning tracking algorithm (MIL) [11]. These algorithms generally use sparse sampling, which gives them obvious shortcomings in tracking accuracy and computational efficiency. Correlation filters have been widely used in target detection and recognition. Bolme et al. proposed the Minimum Output Sum of Squared Error (MOSSE) tracking algorithm [12], which applied correlation filters to target tracking for the first time and achieved good results. Subsequently, Henriques et al. proposed the Circulant Structure of Tracking-by-Detection with Kernels (CSK) tracking algorithm [13], which innovatively used a circulant structure to encode dense sampling and trained a nonlinear regularized least squares (RLS) classifier with the kernel method. The Kernelized Correlation Filter tracking algorithm (KCF) [14] then improved CSK by using the Histogram of Oriented Gradients (HOG) feature [15]. Danelljan et al. proposed the Discriminative Scale Space Tracker (DSST) [16], which builds on the MOSSE tracking algorithm to solve the problem of target scale change during tracking [17, 18, 19].
Therefore, based on the KCF tracking algorithm, this paper first adopts the scale estimation of the DSST tracking algorithm to handle target scale change during tracking. Because the resulting algorithm still cannot deal effectively with large-area occlusion or loss of the target in long-term tracking, a re-detection method is then proposed to recapture the target after it is lost, which further improves the accuracy and robustness of the tracking algorithm.
KCF tracking algorithm improvement
The KCF tracking algorithm uses cyclic sampling to train the classifier. This dense sampling is superior to the sparse sampling used by other tracking algorithms, and the tracking speed is also improved because the computation is carried out in the frequency domain. However, during tracking, the tracking box cannot adapt to changes in target scale, and once the target is lost it cannot be re-acquired, which lowers tracking performance.
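As a rough illustration of why cyclic sampling pairs well with the frequency domain, the following minimal NumPy sketch compares the responses of a filter on all cyclic shifts of a one-dimensional sample with the same responses computed by FFT. The function names, the random test signal and the linear (non-kernel) filter are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np


def dense_responses(x, w):
    """Responses of filter w on every cyclic shift of x (spatial domain)."""
    # Row k of the circulant matrix is x cyclically shifted by k positions.
    shifts = np.stack([np.roll(x, k) for k in range(len(x))])
    return shifts @ w


def fft_responses(x, w):
    """The same responses computed element-wise in the frequency domain."""
    # For real x, correlation with all shifts is conj(FFT(x)) * FFT(w).
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))


rng = np.random.default_rng(0)
x = rng.standard_normal(8)  # hypothetical 1-D base sample
w = rng.standard_normal(8)  # hypothetical linear filter

print(np.allclose(dense_responses(x, w), fft_responses(x, w)))
```

The spatial version builds an N x N matrix, while the FFT version only needs a few length-N transforms, which is the source of the speed-up described above.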
Discriminant correlation filter
The main idea of training the displacement filter in the KCF tracking algorithm is to learn a discriminant correlation filter that locates the target in a new frame. The specific method is to extract a set of gray image blocks
The operation is converted to the frequency domain according to the Parseval theorem:
Where
Evaluating Eq. (3) directly is computationally very expensive, which greatly affects the real-time performance of the tracking algorithm. The improvement is to perform cyclic (dense) sampling on the target area, which improves both computational efficiency and tracking accuracy. Unlike the sparse sampling of other algorithms, correlation filtering does not strictly distinguish between positive and negative samples. The algorithm uses the transformation matrix P to cyclically shift the target image block x. For a one-dimensional image
The shifted images constitute the circulant matrix of Eq. (5):
The circulant matrix has a very useful property: regardless of the form of x, its circulant matrix
In order to further simplify the calculation, the numerator
A robust approximation can be obtained, where
Although the KCF tracking algorithm performs well on various tracking performance indicators, it cannot track effectively in scenes where the target is lost. This is because the KCF tracking algorithm does not clearly distinguish between positive and negative samples during sampling, and the trained classifier simply takes the point with the largest confidence value as the target. Therefore, when the target is completely occluded, all samples are negative samples, the trained classifier loses its ability to distinguish the target from the background, and tracking fails. To address this problem, this paper adds a support vector machine classifier to the KCF algorithm; it re-detects the target when the target is lost, so that tracking can be re-acquired.
The support vector machine is a binary classification model defined by the maximum margin in the feature space [20, 21]. The basic idea is to solve for the separating hyperplane that correctly divides the training data set with the largest geometric margin. In the machine vision field, SVM is often used for identification and classification. During training, features are usually extracted from the training image, and the feature vector is then used to represent the image. When pixels are used as features, the images are scanned in lexicographic order to form feature vectors. Given the N-column vector
The superscript
In the improved algorithm, two thresholds, threshold1 and threshold2, are used. Threshold1 determines the degree of occlusion of the target and is set to 0.4 in the experiment. When the confidence value is greater than threshold1, the target in the frame is not occluded over a large area, and the frame can be used to train the SVM re-detection classifier. Threshold2 determines whether re-detection is required and is set to 0.2 in the experiment. When the classifier response is less than threshold2, the confidence of the detected target is low, and the SVM classifier is needed for re-detection.
Based on the preceding analysis, the improved kernel correlation filter tracking algorithm built on the KCF tracking algorithm is constructed as follows:
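The two-threshold rule can be sketched as a small decision function. The values 0.4 and 0.2 come from the text; the function name, the returned dictionary keys and the assumption that both tests use the same confidence value are illustrative choices, not the authors' implementation.

```python
def occlusion_decision(confidence, threshold1=0.4, threshold2=0.2):
    """Decide what to do with a frame given the filter confidence value.

    Assumption: the same confidence value is compared with both thresholds.
    """
    return {
        # Above threshold1: no large-area occlusion, frame may train the SVM.
        "train_svm": confidence > threshold1,
        # Below threshold2: low confidence, run the SVM re-detection.
        "redetect": confidence < threshold2,
    }


for value in (0.55, 0.30, 0.10):
    print(value, occlusion_decision(value))
```

Confidence values between the two thresholds trigger neither action: the frame is treated as partially occluded, so it is neither used for SVM training nor sent to re-detection.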
SVM-based filtering target tracking algorithm
1. Parameter initialization.
2. Read the … If …, the HOG feature is extracted by cutting the search area according to the estimated target position (…).
3. Calculate the maximum displacement correlation filter confidence value ….
4. Establish a target pyramid at (…).
5. If the confidence value is less than threshold2, use the SVM classifier to detect the target.
6. The shift filter and the scale filter are updated according to Eqs (6) and (7).
7. If the confidence value is greater than threshold1, train the SVM classifier according to Eq. (2.4).
8. Return to step 2 to start the next frame tracking.
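The control flow of the steps above can be sketched as a per-frame loop. This is only a skeleton under stated assumptions: every callable passed in (locate, estimate_scale, redetect, update_filters, train_svm) is a hypothetical placeholder for a component described in the text, and the order of operations follows the step order as far as it can be read from the listing.

```python
def track_sequence(frames, init_box, locate, estimate_scale, redetect,
                   update_filters, train_svm,
                   threshold1=0.4, threshold2=0.2):
    """Run the per-frame loop; all callables are hypothetical placeholders.

    locate(frame, box)         -> (box, confidence) from the displacement filter
    estimate_scale(frame, box) -> box rescaled by the scale filter
    redetect(frame, box)       -> box proposed by the SVM re-detection classifier
    update_filters(frame, box) -> None, updates displacement and scale filters
    train_svm(frame, box)      -> None, adds an online SVM training sample
    """
    box = init_box
    boxes = [box]                    # the first frame uses the given box
    for frame in frames[1:]:
        box, confidence = locate(frame, box)
        box = estimate_scale(frame, box)
        if confidence < threshold2:  # low confidence: target likely lost
            box = redetect(frame, box)
        update_filters(frame, box)
        if confidence > threshold1:  # no large-area occlusion: safe to learn
            train_svm(frame, box)
        boxes.append(box)
    return boxes
```

Passing the components as callables keeps the sketch independent of any particular feature extractor or classifier library.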
To verify the effectiveness of the proposed algorithm, 10 video sequences were selected from the literature [5]. These video sequences include illumination changes, scale changes, target occlusion, target loss, rotation, etc. (as shown in Table 1). For the comparative experiment, this paper selects several excellent tracking algorithms, including Struck, TLD, MIL, KCF and DSST. The KCF algorithm with an added scale filter (denoted KCF+S) was also included in the comparison. All experiments were carried out in the same experimental environment, using the code published with the authors' papers.
Video sequence
Considering operational efficiency, the program is implemented in a mix of MATLAB and C. The experimental software platform is MATLAB R2014b, configured with two libraries, OpenCV 3.0 and VLFeat. The hardware is an Intel Core i7-4790 CPU with a clock speed of 3.6 GHz and 4 GB of memory. To analyze the performance of the algorithm quantitatively, three evaluation indexes from the literature [5] are used in the experiment: Center Location Error (CLE), Distance Precision (DP), and Overlap Precision (OP). CLE is the Euclidean distance, in pixels, between the target center of the tracking result and the manually annotated target center; the smaller the value, the better. DP is the percentage of frames in the video sequence whose CLE is less than a threshold (typically 20 pixels).
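The CLE and DP definitions above translate directly into a few lines of NumPy. The sketch below is illustrative: the function names and the three example center coordinates are made up for demonstration and are not data from the experiments.

```python
import numpy as np


def center_location_error(pred_centers, gt_centers):
    """Per-frame Euclidean distance (pixels) between predicted and annotated centers."""
    pred = np.asarray(pred_centers, dtype=float)
    gt = np.asarray(gt_centers, dtype=float)
    return np.linalg.norm(pred - gt, axis=1)


def distance_precision(cle, threshold=20.0):
    """Percentage of frames whose CLE is below the threshold (20 pixels by default)."""
    return 100.0 * np.mean(np.asarray(cle) < threshold)


# Hypothetical centers for three frames, as (x, y) in pixels.
pred = [(10, 10), (50, 50), (100, 100)]
gt = [(13, 14), (50, 50), (130, 140)]

cle = center_location_error(pred, gt)
print(cle)                      # per-frame errors: 5, 0 and 50 pixels
print(distance_precision(cle))  # two of three frames are within 20 pixels
```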
OP is the tracking score: score
Experimental results and analysis
In the experiment, the proposed algorithm and the other tracking algorithms are tested on the video sequences in Table 1, and the averages of CLE, DP, OP and FPS are obtained. The results are shown in Table 2. The best results are shown in bold in the table, and the second-best results are underlined.
Test results
It can be seen from Table 2 that, compared with the other algorithms, the IKCF algorithm achieves the best results on all three evaluation indicators. Compared with the KCF tracking algorithm, the average CLE is reduced by 11.1 pixels, DP is increased by 7.8%, and OP is increased by 9.7%, which is a clear improvement. However, the introduction of the scale filter and the re-detection classifier inevitably increases the computational complexity and results in slower tracking, although real-time tracking is still guaranteed. The tracking accuracy curves (DP curves) of six of the video sequences were also plotted, as shown in Fig. 1.
DP curve.
Tracking results of several algorithms.
It can be seen from the test results in Table 2 that the CLE and DP evaluation indexes of the KCF tracking algorithm are slightly better than the KCF
The method in this paper builds on the KCF tracking algorithm to solve the problems of scale change and target loss, and its handling of scale change is based on the DSST tracking algorithm. Accordingly, the DP curves in Fig. 1 show that for the video sequences dog1, fish and dudek, whose main challenges are scale change, illumination change and partial occlusion, the proposed algorithm performs similarly to the DSST tracking algorithm. However, for the coke, lemming, and tiger2 sequences, which pose major challenges, the DP curve of this method is significantly better than that of the DSST tracking algorithm. This shows that the method in this paper not only inherits the advantages of the original algorithm but also improves on it effectively and significantly.
In order to facilitate a more intuitive comparison, Fig. 2 shows the tracking results of some frames. In the experiment, we can observe the 254
Based on the KCF tracking algorithm, an improved kernel correlation filtering target tracking algorithm is proposed. A scale filter and a re-detection classifier are added to the original algorithm to solve its problems with scale change and target loss. The experimental results show that, compared with several other excellent tracking algorithms, the proposed method performs best on the evaluation indicators used and has a certain robustness to illumination changes, scale changes and rotation, which gives it research and application value. The next step is to optimize the performance of each classifier, simplify the calculations, and further improve the tracking performance of the algorithm.
Acknowledgments
This work is partially supported by the Key Research Project of the Hunan Provincial Department of Education (395-16A121).
