Abstract
The Kernel Correlation Filter (KCF) tracker has shown great potential in precision, robustness and efficiency. However, the candidate region used to train the correlation filter is fixed, so tracking becomes difficult when the target escapes from the search window due to fast motion. In this paper, an improved KCF is put forward for long-term tracking. First, the moth-flame optimization (MFO) algorithm is introduced into tracking to search for a lost target. Then, the candidate sampling strategy of the KCF tracker is adjusted by the MFO algorithm to give it the capability of tracking fast motion. Finally, a conservatively updated correlation filter is used to judge the motion state of the target and is combined with the improved KCF tracker to form a unified tracking framework. The proposed algorithm is tested on a self-made fast-motion dataset. Moreover, our method obtains distance precision scores of 0.891 and 0.842 and overlap success scores of 0.631 and 0.601 on the OTB-2013 and OTB-2015 datasets, respectively. The results demonstrate its feasibility and effectiveness compared with state-of-the-art methods, especially in dealing with fast or uncertain motion.
Introduction
Visual tracking is an important and widely applied task in computer vision, playing a key role in applications ranging from military reconnaissance to intelligent traffic control. Although significant progress has been achieved in recent years, visual object tracking still faces challenging problems such as deformation, background clutter and fast motion. To overcome these problems, a large number of methods have been proposed; they can generally be classified into two categories: generative and discriminative methods [1, 3].
Among discriminative methods, tracking algorithms based on correlation filters (CF) have shown outstanding performance. CF tracking starts from Bolme et al. [4]. The data matrix formed by densely sampling a base sample has a circulant structure, which allows the discrete Fourier transform (DFT) to be exploited for efficient visual tracking. Henriques et al. [5] then utilized this circulant structure to devise the kernel correlation filter (KCF). Owing to its high accuracy and computational efficiency, the KCF tracker quickly attracted considerable attention. Liu et al. [6] fuse a multi-scale search strategy into KCF to strengthen its adaptability to changes in object scale. Qian et al. [7] combine KCF with the Gaussian Curvature Filter (GCF) to eliminate the excursion problem. Zhang et al. [8] propose a co-trained KCF (COKCF) to cope with complex surroundings and large appearance variations of the target. KCF and its variants perform relatively well in tracking. However, the traditional KCF algorithm may struggle with abrupt motion, because its search window covers only a small local neighborhood in order to limit drift and keep the computational cost low. Directly increasing the search radius introduces background interference into the tracker, and the greatly enlarged negative sample space burdens the update of the discriminative classifier. Moreover, enlarging the search radius in every frame only to handle the small number of frames lost to abrupt motion would slow the tracker down.
To deal with tracking failures caused by the factors mentioned above, an extended KCF tracker based on moth-flame optimization (MFO) is presented. The goal is to enable the basic KCF tracker to track fast motion successfully with the help of MFO. The main contributions of this work are threefold. 1) MFO is introduced to visual tracking. Candidate regions can be obtained from the entire image rather than from a small local region, so the proposed method can predict fast and uncertain target motion. 2) The MFO algorithm is integrated into the basic KCF for fast-motion tracking. Specifically, MFO predicts the possible location of the target and then guides the generation of candidate samples for KCF, improving the effectiveness of the KCF framework in fast-motion scenarios. 3) A unified fast-motion tracking framework is proposed. The MFO-guided KCF and a conservatively updated correlation filter together form our moth-flame optimization kernel correlation filter (MFOKCF) tracking framework. Because the conservatively updated filter determines whether the target has been lost due to fast motion, the framework can handle both smooth and fast target motion. Extensive experiments on a self-made dataset and the OTB-2013 and OTB-2015 benchmarks demonstrate the effectiveness of the method.
Related work
Trackers based on KCF
Recently, kernel correlation filters for visual tracking have drawn much attention because of their computational efficiency, and trackers based on them have made considerable progress. Jeong et al. [9] employ a scale space filter and a multi-block scheme on top of a kernelized correlation filter tracker to address scale variation and occlusion. Uzkent et al. [10] design an ensemble of KCFs to handle variations in the scale and translation of moving objects; the KCFs are deployed in turn, and a particle filter is developed to guarantee run-time performance and to reduce potential drift during transitions between individual KCFs. Gao et al. [11] introduce a spatial regularization component into the ridge regression model of KCF, which accounts for the prior spatial constraint on the feature distribution of the target that traditional KCF ignores, keeping real-time speed while improving tracking performance. Li et al. [12] present an online tracking framework that combines shallow convolutional neural networks with KCF, and experiments show that the method achieves excellent performance. Zhang et al. [13] propose an output constraint transfer (OCT) method that mitigates the drifting problem of KCF by modeling the distribution of the correlation response in a Bayesian optimization framework. Ding et al. [14] propose a scalable visual tracking algorithm based on kernelized correlation filters, called quadrangle kernelized correlation filters (QKCF), to handle scale variation of the object. Li et al. [15] propose a scale adaptive kernelized correlation filter tracker (SKCF), which estimates an accurate scale and models the distribution of the correlation response to address template drift. Zhou et al. [16] improve KCF in several ways: they fuse HOG, color-naming and HSV features to boost tracking performance; enlarge the scale space from a countable integer space to an uncountable floating-point space to handle scale variations; and propose an adaptive learning rate and an occlusion detection mechanism to update the target appearance model under occlusion. Wang et al. [17] propose multi-scale superpixels and color feature guided kernelized correlation filters (MSSCF-KCF) to deal with fast motion and scale variation. Zhang et al. [18] introduce simulated annealing into KCF and design a unified framework for tracking objects with abrupt motion.
Trackers based on swarm intelligence
In a sense, tracking can be treated as an optimization problem of searching for the optimal solution in a search space. Among the many optimization methods, swarm intelligence algorithms have attracted increasing attention and have been applied successfully to tracking. Zhang et al. [19] incorporate temporal continuity information into Particle Swarm Optimization (PSO) to form a multilayer importance sampling scheme within the particle filter framework; the resulting tracker performs better, especially when the object moves arbitrarily or undergoes large appearance changes. Kang et al. [20] propose a hybrid gravitational search algorithm (HGSA) that combines the gravitational update component of GSA with the cognitive and social components of PSO to make better use of particle information and to facilitate exploitation. Wang et al. [21] propose an iterative particle filter based on ant colony optimization (ACO), in which ACO is incorporated into the particle filtering framework to overcome the well-known problem of particle impoverishment. Nguyen et al. [22] present a modified bacterial foraging optimization (BFO) algorithm and design a BFO-based visual tracking system to handle several tracking challenges. Gao et al. [23] propose a tracker based on the bat algorithm (BA), and their results show that the BA-based tracking architecture is a potentially powerful tracking method.
Although the above methods are successful in tracking, a considerable number of them predict the position of the target under the assumption that it moves smoothly, and for efficiency they typically use a small local search region. Such methods no longer work when the target moves rapidly. In contrast, in this paper we use an optimization algorithm to dynamically adjust the tracker's search mode so that it can adapt to both fast and smooth motion.
MFOKCF tracking framework
In this work, we propose a novel MFOKCF tracking framework to address the fast motion tracking problem. There exist two fundamental modules: KCF and MFO. The basic KCF tracker aims to conduct robust object regression in a local search region; when fast motion occurs, the MFO algorithm can provide the most likely candidate regions. These components are described in detail below.
The base KCF tracker
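Using ridge regression as the filtering model, KCF aims to find a function $f(\mathbf{x}_i) = \mathbf{w}^{T}\mathbf{x}_i$ that minimizes the squared error over the samples $\mathbf{x}_i$ and their regression targets $y_i$:

$$\min_{\mathbf{w}} \sum_i \left( f(\mathbf{x}_i) - y_i \right)^2 + \lambda \lVert \mathbf{w} \rVert^2 \qquad (1)$$

where $\mathbf{w}$ is the filter template and $\lambda$ is a regularization parameter.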
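As a small numerical check of the ridge regression model above, which minimizes $\sum_i (f(\mathbf{x}_i) - y_i)^2 + \lambda\lVert\mathbf{w}\rVert^2$, the following Python/NumPy sketch solves it in the primal, $\mathbf{w} = (X^T X + \lambda I)^{-1} X^T \mathbf{y}$, and in the dual with a linear kernel, $\boldsymbol{\alpha} = (K + \lambda I)^{-1}\mathbf{y}$ with $\mathbf{w} = X^T\boldsymbol{\alpha}$, and confirms that both give the same template. The random data and the value of $\lambda$ are illustrative assumptions, not settings from this paper.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^-1 X^T y (rows of X are samples)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ridge_dual(X, y, lam):
    """Dual form with a linear kernel K = X X^T: alpha = (K + lam I)^-1 y, then w = X^T alpha."""
    n = X.shape[0]
    K = X @ X.T
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    return X.T @ alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))   # 8 samples with 5 features each
y = rng.standard_normal(8)        # regression targets
lam = 1e-2

w_primal = ridge_primal(X, y, lam)
w_dual = ridge_dual(X, y, lam)
print(np.allclose(w_primal, w_dual))  # True
```

The dual route only needs inner products between samples, which is what allows the linear inner product to be replaced by a kernel.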
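The kernel trick is used to further improve performance by allowing classification in a rich high-dimensional feature space. The inputs are mapped to this space by $\varphi(\mathbf{x}_i)$, and the kernel is defined as $\kappa(\mathbf{x}_j, \mathbf{x}_i) = \varphi(\mathbf{x}_j)^{T}\varphi(\mathbf{x}_i)$, which reduces to $\mathbf{x}_j^{T}\mathbf{x}_i$ for a linear kernel. The solution can be expressed as $\mathbf{w} = \sum_i \alpha_i \varphi(\mathbf{x}_i)$, so that

$$f(\mathbf{z}) = \mathbf{w}^{T}\varphi(\mathbf{z}) = \sum_i \alpha_i \kappa(\mathbf{z}, \mathbf{x}_i) \qquad (2)$$

Let $K$ be the kernel matrix with elements $K_{ij} = \kappa(\mathbf{x}_i, \mathbf{x}_j)$. In general, the dual coefficients are

$$\boldsymbol{\alpha} = (K + \lambda I)^{-1}\mathbf{y} \qquad (3)$$

If the kernel matrix $K$ is circulant, $\boldsymbol{\alpha}$ can be learnt efficiently in the Fourier domain:

$$\hat{\boldsymbol{\alpha}} = \frac{\hat{\mathbf{y}}}{\hat{\mathbf{k}}^{xx} + \lambda} \qquad (4)$$

where $\hat{\cdot}$ denotes the DFT, the division is element-wise, and $\mathbf{k}^{xx}$ is the kernel auto-correlation of the base sample $\mathbf{x}$, i.e. the first row of $K$.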
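The Fourier-domain solution for a circulant kernel matrix can be checked numerically. The sketch below works under illustrative assumptions (a 1-D base sample, a Gaussian kernel whose squared distance is normalized by the sample length, and arbitrary $\sigma$, $\lambda$ and labels): it builds the dense kernel matrix over all cyclic shifts of the base sample, solves $(K + \lambda I)\boldsymbol{\alpha} = \mathbf{y}$ directly, and compares the result with the element-wise division $\hat{\mathbf{y}} / (\hat{\mathbf{k}}^{xx} + \lambda)$.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    # kappa(a, b) = exp(-||a - b||^2 / (sigma^2 * n)), normalized by the sample length n
    return np.exp(-np.sum((a - b) ** 2) / (sigma ** 2 * a.size))

n, lam = 16, 1e-2
rng = np.random.default_rng(1)
x = rng.standard_normal(n)                       # 1-D base sample
wrapped = np.minimum(np.arange(n), n - np.arange(n))
y = np.exp(-0.5 * (wrapped / 2.0) ** 2)          # labels peaked at zero shift, wrapped around

# Dense route: K[i, j] = kappa(P^i x, P^j x) over all cyclic shifts, then a linear solve
shifts = [np.roll(x, i) for i in range(n)]
K = np.array([[gaussian_kernel(si, sj) for sj in shifts] for si in shifts])
alpha_dense = np.linalg.solve(K + lam * np.eye(n), y)

# Fourier route: only the kernel auto-correlation k^{xx}[m] = kappa(x, P^m x) is needed
kxx = np.array([gaussian_kernel(x, np.roll(x, m)) for m in range(n)])
alpha_fft = np.real(np.fft.ifft(np.fft.fft(y) / (np.fft.fft(kxx) + lam)))

print(np.allclose(alpha_dense, alpha_fft))  # True
```

The dense route costs a cubic-time solve, while the Fourier route needs only FFTs of length n, which is the source of KCF's speed.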
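In a new frame, the target can be located with the trained coefficients $\boldsymbol{\alpha}$ and the base image patch $\mathbf{x}$. Given a new sample $\mathbf{z}$, the confidence map $f(\mathbf{z})$ is calculated by

$$f(\mathbf{z}) = \mathcal{F}^{-1}\left(\hat{\mathbf{k}}^{xz} \odot \hat{\boldsymbol{\alpha}}\right) \qquad (5)$$

where $\mathcal{F}^{-1}$ is the inverse DFT, $\odot$ is the element-wise product and $\mathbf{k}^{xz}$ is the kernel correlation between the samples $\mathbf{x}$ and $\mathbf{z}$. The position of the maximum value of $f(\mathbf{z})$ is taken as the new position of the tracked object.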
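To make the training and detection steps concrete, here is a compact 2-D sketch of Gaussian-kernel KCF in Python/NumPy. Assumptions: a random array stands in for a feature patch; the Gaussian kernel correlation is computed with FFTs in the way popularized by Henriques et al. [5], with squared distances normalized by the patch size; the labels are a Gaussian peaked at zero shift and wrapped around the borders; and $\sigma$, $\lambda$ and the label width are illustrative values. Shifting the patch cyclically moves the response peak by the same amount, which is how the new target position is read off.

```python
import numpy as np

def gaussian_correlation(x, z, sigma):
    """k^{xz} for all cyclic shifts at once: kappa(x, z shifted by -m), via FFTs."""
    c = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))
    d = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * c        # squared distances for every shift
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2 * x.size))

def wrapped_gaussian_labels(shape, sigma_y):
    """Regression targets: a Gaussian peaked at zero shift, wrapped around the borders."""
    rows, cols = shape
    di = np.minimum(np.arange(rows), rows - np.arange(rows))
    dj = np.minimum(np.arange(cols), cols - np.arange(cols))
    return np.exp(-0.5 * (di[:, None] ** 2 + dj[None, :] ** 2) / sigma_y ** 2)

def train(x, y, sigma, lam):
    # alpha_hat = y_hat / (k_hat^{xx} + lambda)
    kxx = gaussian_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(kxx) + lam)

def detect(alpha_hat, x, z, sigma):
    # response = F^{-1}(k_hat^{xz} * alpha_hat); the arg-max is the estimated shift
    kxz = gaussian_correlation(x, z, sigma)
    response = np.real(np.fft.ifft2(np.fft.fft2(kxz) * alpha_hat))
    return np.unravel_index(np.argmax(response), response.shape)

rng = np.random.default_rng(2)
x = rng.standard_normal((32, 32))                 # stand-in for the target's feature patch
y = wrapped_gaussian_labels(x.shape, sigma_y=2.0)
alpha_hat = train(x, y, sigma=1.0, lam=1e-4)

z = np.roll(x, shift=(3, -5), axis=(0, 1))        # target moved 3 rows down and 5 columns left
dy, dx = detect(alpha_hat, x, z, sigma=1.0)
print(int(dy), int(dx))  # 3 27  (a column shift of -5 wraps to 27 on a 32-wide grid)
```

In a real tracker, z would be the patch cropped at the previous target position, and peak indices beyond half the patch size would be mapped back to negative displacements.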
Inspired by the navigation method of moths in nature known as transverse orientation, Mirjalili [24] proposed the Moth-Flame Optimization (MFO) algorithm. Its key components are moths and flames. Both represent solutions, but they are handled and updated differently: the moths are the actual search agents moving through the search space, while the flames are the best solutions the moths have obtained so far. Spiral flight of moths. Moths fly spirally around flames and finally converge toward them. To model this convergence behavior mathematically, MFO defines a logarithmic spiral:

$$S(M_i, F_j) = D_i \cdot e^{bt} \cdot \cos(2\pi t) + F_j, \qquad D_i = \lvert F_j - M_i \rvert \qquad (6)$$

where $M_i$ is the $i$-th moth, $F_j$ is the $j$-th flame, $D_i$ is the distance between them, $b$ is a constant defining the shape of the spiral and $t$ is a random number in $[-1, 1]$.
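The number of flames. An adaptive mechanism reduces the number of flames to strengthen the exploitation of the most promising solutions, as shown in equation (7):

$$\text{flame no} = \operatorname{round}\left(N - l \cdot \frac{N - 1}{T}\right) \qquad (7)$$

where $N$ is the maximum number of flames, $l$ is the current iteration and $T$ is the maximum number of iterations.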
In the MFO algorithm, the gradual decrease in the number of flames balances global exploration and local exploitation of the search space, and the algorithm has shown strong performance on challenging optimization problems with unknown search spaces.
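The following minimal sketch of MFO in Python/NumPy follows the moth and flame description above: moths move along the logarithmic spiral around flames, the flames keep the best solutions found so far, and the number of flames shrinks according to the adaptive rule of equation (7). As in Mirjalili's reference description, $t$ is drawn from $[r, 1]$ with $r$ decreasing linearly from $-1$ to $-2$. The objective (squared distance to a hypothetical target position), the bounds and the population settings are assumptions for illustration, not the configuration used in the experiments of this paper.

```python
import numpy as np

def mfo(objective, lb, ub, n_moths=30, max_iter=200, b=1.0, seed=0):
    """Minimal Moth-Flame Optimization for minimization; returns (best position, best value)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    moths = lb + rng.random((n_moths, dim)) * (ub - lb)
    flames, flame_fit = None, None
    for it in range(1, max_iter + 1):
        moths = np.clip(moths, lb, ub)
        fitness = np.array([objective(m) for m in moths])
        if flames is None:
            pool, pool_fit = moths, fitness
        else:
            # flames keep the n_moths best solutions found so far
            pool = np.vstack([flames, moths])
            pool_fit = np.concatenate([flame_fit, fitness])
        order = np.argsort(pool_fit)[:n_moths]
        flames, flame_fit = pool[order].copy(), pool_fit[order].copy()

        # adaptive number of flames: round(N - l * (N - 1) / T)
        flame_no = int(round(n_moths - it * (n_moths - 1) / max_iter))
        # r falls linearly from -1 to -2; t in [r, 1] pulls moths closer to flames over time
        r = -1.0 - it / max_iter
        for i in range(n_moths):
            j = min(i, flame_no - 1)  # surplus moths share the last remaining flame
            dist = np.abs(flames[j] - moths[i])
            t = (r - 1.0) * rng.random(dim) + 1.0
            moths[i] = dist * np.exp(b * t) * np.cos(2.0 * np.pi * t) + flames[j]
    return flames[0], flame_fit[0]

target = np.array([3.0, -2.0])  # hypothetical target position in a 2-D search space
best_pos, best_fit = mfo(lambda p: float(np.sum((p - target) ** 2)), lb=[-10, -10], ub=[10, 10])
print(best_pos, best_fit)
```

Keeping many flames early lets moths spread over the whole space, while the final single flame concentrates all moths on the best solution found.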
KCF-based approaches have shown impressive performance on tracking benchmarks, but some deficiencies remain. Because of its localized search strategy, KCF may fail when the target undergoes fast or abrupt motion, and directly expanding the search area brings problems such as difficulty in updating the filter and background interference. We therefore exploit the strength of optimization approaches in global exploration and introduce the MFO algorithm to solve the problem of abrupt-motion tracking. Having MFO provide a rough estimate of the target position compensates, to some extent, for the shortcomings of traditional KCF, while the efficiency of KCF makes this combination computationally practical. On this basis, we build the MFOKCF fast-motion tracking framework shown in Fig. 1.

Fig. 1 The flowchart of the MFOKCF tracker.
The MFOKCF tracking framework consists of two parts: the basic KCF module and the MFO module. In frame t, the tracker first determines the search window based on the target location in the previous frame. Features are then extracted from the candidate samples and filtered to obtain the target response, from which a preliminary target is identified. Next, the reliability of the preliminary target is tested to check whether the target has been lost due to fast motion. For this purpose, the MFOKCF tracker uses a conservatively updated correlation filter (the memorizer) to maintain a long-term memory of the target. The memorizer adopts a smaller patch to reduce the amount of background introduced during training, and it is updated only when the target is in smooth motion. A threshold Thr is defined to decide whether this update is necessary. If the response of the memorizer is larger than Thr, the predicted target is considered consistent with the target memory. Otherwise, the target is considered to have escaped the search range of the backbone tracker; in that case, the MFO module searches for the estimated target location in the global context, and the search patch found by MFO is passed to KCF for more accurate localization.
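The per-frame switching logic described above can be summarized in a short Python sketch. Everything here is hypothetical scaffolding rather than the authors' code: `kcf_track`, `memory_response`, `mfo_search` and `update_models` are placeholder callables standing in for the KCF detector, the conservatively updated memorizer, the global MFO search and the model update. Whether KCF is also updated after an MFO recovery is left out, because the text does not specify it.

```python
from typing import Callable, Tuple

Position = Tuple[float, float]

def mfokcf_step(frame, prev_pos: Position,
                kcf_track: Callable,        # hypothetical: (frame, center) -> refined center
                memory_response: Callable,  # hypothetical: (frame, center) -> memorizer peak response
                mfo_search: Callable,       # hypothetical: frame -> coarse center from a global MFO search
                update_models: Callable,    # hypothetical: (frame, center) -> None
                thr: float) -> Tuple[Position, bool]:
    """One frame of the switching logic; returns the new center and whether MFO was used."""
    pos = kcf_track(frame, prev_pos)          # local search around the previous location
    if memory_response(frame, pos) > thr:     # consistent with the long-term target memory
        update_models(frame, pos)             # update only during smooth motion
        return pos, False
    coarse = mfo_search(frame)                # target escaped the window: search the whole frame
    return kcf_track(frame, coarse), True     # refine inside the patch proposed by MFO

# Toy usage with stub components: the target has jumped far from its previous position.
TARGET = (200.0, 150.0)

def near(c, radius):
    return abs(c[0] - TARGET[0]) <= radius and abs(c[1] - TARGET[1]) <= radius

def toy_kcf(frame, center):
    return TARGET if near(center, 30.0) else center   # finds the target only within +/-30 px

def toy_memory(frame, center):
    return 0.9 if near(center, 10.0) else 0.1

def toy_mfo(frame):
    return (205.0, 146.0)                              # coarse global estimate

updates = []
pos, used_mfo = mfokcf_step(None, (40.0, 60.0), toy_kcf, toy_memory, toy_mfo,
                            lambda f, c: updates.append(c), thr=0.5)
print(pos, used_mfo)  # (200.0, 150.0) True
```

Keeping the memorizer update inside the smooth-motion branch is what prevents background from a failed frame from polluting the long-term memory.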
Compared with the traditional correlation filter, our method uses optimization to improve the global exploration ability of the tracker. When fast motion occurs, tracking drift not only causes tracking failure but also introduces a great deal of background information into the correlation filter update, which makes it difficult to recover the target in subsequent frames. We first use a conservatively updated correlation filter to detect the occurrence of fast motion. The MFO algorithm then provides a new candidate region from which the target can be recovered. Moreover, the framework prevents, to some extent, further errors from being introduced into the correlation filter.
When the MFO algorithm is performed to search for the target, a fitness function is designed to measure the similarity between a candidate and the target. In this paper, the similarity value of a candidate is computed as

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{D(X)\, D(Y)}}$$

where $D(\cdot)$ denotes the variance, $\operatorname{cov}(\cdot)$ denotes the covariance, and $X$ and $Y$ denote the HOG features of the target and the candidate, respectively. In addition, an objective function E is designed as follows:
The similarity values guide the search behavior of the moths and the updating of the flame positions.
On this basis, the MFO method takes the flame with the maximum similarity value as the tracking output.
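A sketch of the similarity-based fitness, assuming the similarity is the correlation coefficient $\operatorname{cov}(X, Y) / \sqrt{D(X) D(Y)}$ between flattened feature maps. Random arrays stand in for HOG features (31 channels, as in FHOG, is an assumption), and `similarity` and `best_candidate` are hypothetical helper names; the objective E used in the paper is not reproduced here.

```python
import numpy as np

def similarity(target_feat, cand_feat, eps=1e-12):
    """Correlation coefficient cov(X, Y) / sqrt(D(X) D(Y)) of two flattened feature maps."""
    x = np.ravel(target_feat).astype(float)
    y = np.ravel(cand_feat).astype(float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / np.sqrt(x.var() * y.var() + eps)   # eps guards against flat patches

def best_candidate(target_feat, candidate_feats):
    """Index of the candidate (e.g., the patch under a flame) with the maximum similarity."""
    scores = [similarity(target_feat, c) for c in candidate_feats]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(3)
target_feat = rng.random((8, 8, 31))                       # stand-in for the target's HOG map
candidates = [
    rng.random((8, 8, 31)),                                # unrelated patch
    target_feat + 0.05 * rng.standard_normal((8, 8, 31)),  # slightly perturbed target
    -target_feat,                                          # anti-correlated pattern
]
idx, scores = best_candidate(target_feat, candidates)
print(idx)  # 1
```

Because the coefficient is normalized by both variances, it is insensitive to global brightness and contrast changes of the candidate patch.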
In the next frame, KCF is executed to track the target if the maximum response of the memorizer is greater than Thr. Otherwise, the motion model is replaced by MFO, and obtaining the base image can be formulated as the following optimization problem:
where $M_i$ denotes a candidate, $S(\cdot)$ is the mapping function and $N_i$ is the corresponding value. The minimization finds the best candidate to serve as the base image. This makes it possible to cover uncertain motion when KCF, which searches only a local area, fails to track the target.
Experimental setup
To verify the performance of the proposed tracking method, our MFOKCF tracker is evaluated on three datasets: a self-made fast-motion dataset, OTB-2013 and OTB-2015. The proposed method was implemented in MATLAB R2016a, and the experiments were run on a PC with an Intel Core i5 2.50 GHz CPU and 8 GB RAM. The threshold Thr determines whether fast motion has occurred; we adopt an adaptive threshold to control when MFO is invoked. After testing, Thr is set to 0.38 times the average memorizer response. The settings of the basic KCF module follow [5]. To cover the whole image, the number of flames is set to 2 and the number of moths to 500. The parameters of the tracking method are kept the same in every experiment.
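The adaptive threshold can be sketched as a running average. The ratio 0.38 comes from the text; which frames contribute to the average memorizer response is not specified, so this hypothetical `AdaptiveThreshold` class assumes that only frames passing the test (smooth motion) are averaged and that the first frame is always accepted.

```python
class AdaptiveThreshold:
    """Thr = ratio * average memorizer response over accepted (smooth-motion) frames."""

    def __init__(self, ratio=0.38):
        self.ratio = ratio
        self.total = 0.0
        self.count = 0

    def value(self):
        # current threshold; 0.0 before any response has been recorded
        return self.ratio * self.total / self.count if self.count else 0.0

    def check(self, response):
        """True: smooth motion, the response is recorded. False: fast motion, invoke MFO."""
        if self.count == 0 or response > self.value():
            self.total += response
            self.count += 1
            return True
        return False

thr = AdaptiveThreshold()
for r in (0.6, 0.5, 0.55):
    thr.check(r)            # all accepted; the average response is now 0.55
print(thr.check(0.15))      # False: below 0.38 * 0.55, so MFO would be called
```

Tying Thr to the recent response level, rather than fixing an absolute value, keeps the test meaningful for sequences whose typical response strength differs.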
Fast motion sequences
We use 10 challenging sequences to show that the proposed method adapts to visual tracking in general and to fast motion in particular. The sequences are divided into two groups according to the target's displacement between adjacent frames (listed in Table 1). The first group, in which the displacement is less than 50 pixels, contains the CARDARK, MOUNTAINBIKE, TRELLIS, MAN and DEER sequences. The second group, in which the displacement exceeds 50 pixels, contains the BLURCAR3, ZT, BLURFACE, FACE2 and FLEETFACE sequences. FACE2 and ZT were captured by us; the other sequences are from http://visualtracking.net. In addition, to mimic the large displacement produced by fast target motion, frames 306-310 of BLURFACE, frames 401-410 of FLEETFACE and frames 26-35 of MAN are removed from the sequences.
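Grouping a sequence by displacement can be sketched from its ground-truth boxes. The helper names are hypothetical, boxes are assumed to be (x, y, w, h) rectangles as in OTB annotations, and a sequence is assumed to fall into the large motion group when its maximum center displacement between adjacent frames exceeds 50 pixels; the text does not state whether the maximum or a typical displacement was used.

```python
import math

def max_displacement(boxes):
    """Largest center displacement (pixels) between adjacent frames; needs at least two boxes."""
    centers = [(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes]
    return max(math.dist(a, b) for a, b in zip(centers, centers[1:]))

def motion_group(boxes, limit=50.0):
    """'large' if the target ever jumps more than `limit` pixels between frames, else 'slight'."""
    return "large" if max_displacement(boxes) > limit else "slight"

boxes = [(10, 10, 20, 20), (13, 14, 20, 20), (100, 14, 20, 20)]  # toy ground truth
print(max_displacement(boxes), motion_group(boxes))  # 87.0 large
```

Removing a run of frames, as done for BLURFACE, FLEETFACE and MAN, raises this maximum without altering the remaining images.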
Table 1 The image sequences
We compare our tracker with other state-of-the-art trackers: High Speed Tracking with Kernelized Correlation Filters (KCF) [5], Accurate Scale Estimation for Robust Visual Tracking (DSST) [25], Fast Compressive Tracking (FCT) [26], Fast Tracking via Spatio-Temporal Context Learning (STC) [27], Least Soft-threshold Squares Tracking (LSST) [28] and Context-Aware Correlation Filter Tracking (CACF) [29]. The same set of MFO parameters is applied to all sequences.
The slight motion group. In the CARDARK sequence, shown in Fig. 2 (row #1), a car moves quickly along a dark road with background clutter and varying lighting. The target moves relatively smoothly, and most trackers perform well except FCT, which fails at frame #95 due to the illumination changes and the low contrast between foreground and background. In the MOUNTAINBIKE sequence, shown in Fig. 2 (row #2), the object undergoes in-plane and out-of-plane rotations against a cluttered background. LSST fails to locate the target after the out-of-plane rotation at frame #85, while, as frames #85, #158 and #195 show, the other trackers track the object accurately throughout the sequence. The TRELLIS sequence, shown in Fig. 2 (row #3), is captured outdoors, and the object's appearance changes significantly because of cast shadows, motion and head pose variation. FCT, STC and LSST start to lose the face after frame #412. In the MAN sequence, shown in Fig. 2 (row #4), large illumination changes and a large displacement occur between frames #25 and #26. As a consequence, all competing trackers drift or even fail to different degrees, whereas our tracker successfully tracks the face until the end of the sequence. The DEER sequence, shown in Fig. 2 (row #5), shows several deer running and jumping in a river. The target undergoes large motion, some frames are blurred (for example, frame #32), and similar targets appear at frame #43 and interfere with tracking. DSST, CACF and KCF lose the target from frame #26, whereas our tracker produces accurate results despite heavy motion blur and water occlusion.

The large motion group. In the BLURCAR3 sequence, shown in Fig. 3 (row #1), the target is severely blurred against a cluttered background due to fast motion of the camera or of the target itself. The results show that our tracker handles these challenges well, whereas KCF, LSST and FCT fail to locate the target at frame #119 and STC breaks down at frame #274. In the FACE2 sequence, shown in Fig. 3 (row #2), fast motion produces a displacement of up to 88 pixels between adjacent frames, accompanied by target blur. Most trackers lose the target at frames #0153 and #0160, whereas our method and CACF achieve the best tracking results. In the FLEETFACE sequence, shown in Fig. 3 (row #3), the displacement reaches 125 pixels. At frame #0365, FCT begins to deviate significantly from the target, and at frame #411 the sequence undergoes its largest displacement. As frames #0411 and #0635 show, only our tracker completes the entire sequence. In the BLURFACE sequence, shown in Fig. 3 (row #4), severe motion blur at frames #150 and #272 reduces the discriminative information in the feature vectors, making the target location difficult to predict. All trackers perform well at the beginning of the sequence, but only our tracker keeps the target when it moves fast at frame #311, so our tracker outperforms the others on this sequence. In the ZT sequence, shown in Fig. 3 (row #5), a human face moves rapidly left and right, with displacements of up to 256 pixels between consecutive frames. FCT, LSST and STC are the first to deviate from the target, at frame #0011, and KCF, STC and DSST then fail at frame #0033. CACF and our tracker perform better, with ours slightly ahead of CACF.

Fig. 2 The tracking results with slight motion.

Fig. 3 The tracking results with large motion.
The tracking results show that, in the slight motion group, our tracker maintains stable tracking, indicating that it adapts to smooth motion or small target displacements. The competing algorithms maintain tracking in most cases but still drift or even lose the target when displacement occurs. As the displacement grows in the large motion group, most algorithms drift seriously, leading to tracking failure. Our algorithm tracks well because it introduces MFO to estimate the possible target position over the whole frame. In particular, in the face sequences with simulated large displacements, only our method completes the tracking. Overall, our method adapts well to both smooth and large-displacement tracking scenes.
Fig. 4 shows the distance precision and overlap success plots over the 10 sequences. Tables 2 and 3 compare the MFOKCF tracker with DSST, LSST, KCF, STC, FCT and CACF: Table 2 reports the average overlap rate and Table 3 the average center error. In the tables, red and blue fonts mark the two best results. Fig. 4 and Tables 2 and 3 show that the MFOKCF tracker outperforms the other six trackers when the displacement between consecutive frames is large, and it also performs well on the remaining sequences. Overall, the method handles fast motion better than the six competing trackers.
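For reference, the quantities behind Fig. 4 and Tables 2 and 3 can be computed as in the following sketch. It assumes the usual OTB conventions (distance precision at a 20-pixel center-error threshold, and success rates over 21 overlap thresholds from 0 to 1 averaged into an AUC score); the helper names are hypothetical, and small details such as pixel-boundary conventions may differ from the official OTB toolkit.

```python
import numpy as np

def center_errors(pred, gt):
    """Euclidean distance between predicted and ground-truth centers; boxes are (x, y, w, h)."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    pc = pred[:, :2] + pred[:, 2:] / 2.0
    gc = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def overlaps(pred, gt):
    """Intersection-over-union of each predicted box with its ground-truth box."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision_at(errors, threshold=20.0):
    # distance precision: fraction of frames whose center error is within the threshold
    return float(np.mean(errors <= threshold))

def success_auc(ious, n_thresholds=21):
    # success plot: fraction of frames with IoU > t for each t in [0, 1]; score = mean (AUC)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    return float(np.mean([np.mean(ious > t) for t in thresholds]))

gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
errors = center_errors(pred, gt)   # center errors of 0 and 5 pixels
ious = overlaps(pred, gt)          # overlaps of 1 and 1/3
print(precision_at(errors), success_auc(ious), errors.mean(), ious.mean())
```

The per-sequence means of `errors` and `ious` correspond to the average center error and average overlap rate reported in the tables.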

Fig. 4 Precision plots and success plots of OPE.
Table 2 Average overlap rate
Table 3 Average center error rate
To test the overall performance of our tracker, we compare MFOKCF with the backbone KCF on OTB-2013 [30] and OTB-2015 [31]. The object tracking benchmark (OTB) is one of the most important datasets in the tracking community: OTB-2013 contains 50 sequences with different challenges, and OTB-2015 contains 100. For comparison, we also report the results of several baseline trackers: HCFT [33], SRDCF [34], Staple [35], MEEM [36], LCT [37], SAMF [38], DCFnet [39], KCF [5] and DSST [25]. Unlike the basic KCF tracker, which uses HOG features, the MFOKCF tracker in this section uses VGGNet [32] deep features to further improve performance.
OTB-2013 results
The one-pass evaluation (OPE) results on OTB-2013, in terms of distance precision (DPR) and overlap success (OSR), are shown in Fig. 5. The proposed MFOKCF tracker achieves the best or second-best performance on both metrics. Compared with the basic KCF tracker, MFOKCF obtains a 16.8% gain in DPR and a 12.7% gain in OSR. This indicates that obtaining the base samples with the MFO algorithm is more accurate than directly applying the localized search window; the ability to respond to fast or uncertain target motion makes our tracker perform better. Compared with the deep-feature tracker DCFnet, MFOKCF obtains a 9.6% gain in DPR. Compared with HCFT, which uses multilayer convolutional features, MFOKCF achieves a similar DPR and a 2.6% gain in OSR. This shows that the base sample selection strategy implemented by MFO helps KCF improve tracking performance. In addition, LCT, which has long-term tracking ability, can cope with fast target motion to a certain extent; compared with LCT, MFOKCF obtains a 4.3% gain in DPR.

Fig. 5 Average overall performance on OTB-2013 (left: precision plot; right: success plot).
Building on OTB-2013, OTB-2015 increases the total number of sequences to 100, and our tracker again obtains the best overall performance. The OPE results on OTB-2015 in terms of DPR and OSR are shown in Fig. 6. Compared with HCFT in these more complex tracking scenarios, MFOKCF obtains a 0.05% gain in DPR and a 3.9% gain in OSR, which suggests that our method generalizes better. Compared with the basic KCF tracker, our method obtains a 15.5% gain in DPR and a 12.9% gain in OSR, and it also outperforms LCT, which has long-term tracking ability.

Fig. 6 Average overall performance on OTB-2015 (left: precision plot; right: success plot).
OTB-2015 annotates 11 attributes: out-of-view (OV), deformation (DEF), background clutter (BC), in-plane rotation (IPR), illumination variation (IV), motion blur (MB), fast motion (FM), scale variation (SV), low resolution (LR), out-of-plane rotation (OPR) and occlusion (OCC). We evaluate our tracker on OTB-2015 under each of these attributes. Our framework improves tracking performance in most scenarios, and some categories benefit more than others. As shown in Fig. 7, the most significant improvements are achieved for motion blur, out-of-plane rotation, background clutter and fast motion. In particular, our framework is very beneficial when the object undergoes fast motion, and it also improves significantly on videos with motion blur. This is largely because MFO allows the search region to be relocated: when the target is lost, MFOKCF retrieves it through a global search. Notably, our method also performs best under background clutter.

Fig. 7 Average performance on OTB-2015 (OTB-100) for four attributes.
By combining the MFO algorithm with correlation filtering, we propose a fast-motion tracking framework. The MFO algorithm identifies potential targets across the whole frame and provides search patches, so that the KCF tracker can better recognize the target in specific sequences. On the fast-motion dataset, our tracker performs well on all test sequences, and as the displacement increases, only our method adapts to the fast motion of the target. This is because the introduced MFO algorithm effectively captures fast-moving targets globally. Thanks to global sampling, our method also adapts to some degree to uncertain target motion (for example, in BLURFACE and FLEETFACE). With the conservatively trained correlation filter, our method can track both smooth and fast motion sequences. Experimental results on the OTB-2013 dataset show that the proposed tracking algorithm outperforms existing methods. Among the 11 challenges of OTB-2015, our approach performs well for fast motion, motion blur, out-of-plane rotation and background clutter. The MFO algorithm not only provides a more accurate search window but also enables KCF to use a more accurate target template during filter training, further improving accuracy. In summary, our method can track targets undergoing either fast or smooth motion.
Conclusion
In this paper, a unified framework based on the correlation filter is designed to handle fast-motion tracking, in which MFO is used to enhance tracking performance. If the target moves relatively smoothly, the KCF tracker achieves the desired result; if the target undergoes abrupt motion, MFO is employed to generate the base image. Therefore, the proposed method can track both smooth and fast motion. The experimental results show that the method improves tracking accuracy, especially when the target undergoes fast motion. However, as the motion intensifies, the target usually also undergoes large appearance changes, for which hand-crafted features and single-layer learned features struggle to provide a good appearance representation. In future work, we plan to design a better representation for the tracker to cope with the drastic appearance changes that occur during fast target motion.
Compliance with ethical standards
Funding: This study was funded by the National Natural Science Foundation of China (grant numbers 61873246 and 61503173).
Conflict of Interest: Authors Yibin Chen et al. declare that they have no conflict of interest.
