Abstract
Snowboarding is a sport in which athletes use a snowboard to turn and glide rapidly down a specified slope line and perform a variety of difficult maneuvers in the air. Because athletes move at high speed, it is difficult to give direct guidance during a run, which makes it hard for athletes to find and correct problems, so it is necessary to track the trajectory of the snowboarder. Target tracking algorithms are the main solution to this task, but existing algorithms still have many unsolved problems; in particular, their tracking accuracy in complex scenes is insufficient. Therefore, building on the advantages of the mean shift algorithm and the Kalman filter, this paper proposes a multi-algorithm fusion tracking algorithm for moving snowboard targets. First, the SIFT feature algorithm is used for rough matching to determine the approximate position of the target. Then, the mean shift algorithm is used to refine the match and determine the exact position of the target. Finally, the Kalman filtering algorithm is used to predict the target trajectory under occlusion, completing the design of the target trajectory tracking algorithm for snowboarding.
Introduction
Snowboarding [1, 2] is a competitive snow sport in which athletes use a snowboard to turn and slide rapidly down a specified slope line, take off into the air from the slope of a special U-shaped course or from a platform of a certain height, and compete on snow courses with various obstacles in events that involve racing, flying, jumping, and tumbling. Snowboarding became an official event at the 1998 Nagano Winter Olympics in Japan. Because snowboarders move at high speed, it is very important to predict their sliding trend in advance. During practice, effective tracking of an athlete's trajectory can reveal deficiencies in training, guide technique, and improve ability. Traditional manual judgment, however, lacks a factual basis: it relies only on the coach's individual ability and lacks objectivity. An intelligent system that tracks the trajectory of the snowboarder can provide a more objective analysis of the athlete's track.
Target tracking [3–5] is one of the core problems of machine vision. It is an advanced technology that integrates achievements from different fields [6, 7], and it is important in many applications such as military guidance, video surveillance, medical diagnosis, product inspection, and virtual reality. In a typical target tracking scene, a box identifies one or more targets in the first frame of the video, and the specified targets are then detected and tracked continuously in the subsequent frames. Many target tracking algorithms have been proposed, but robust target tracking in complex scenes remains very challenging. Occlusion is common in target tracking: the target may be occluded by a static object in the background, or multiple targets may occlude each other. A robust target tracking algorithm should accurately judge when occlusion occurs and use the remaining information of the target to keep tracking it during occlusion, without losing the target even when it is completely occluded. Whether occlusion, especially severe and complete occlusion, can be handled effectively is a basic criterion for evaluating a target tracking algorithm and is of great significance for improving its robustness. Researchers at home and abroad have done a great deal of work and proposed many effective algorithms, some of which achieve good tracking results in specific settings [8, 9].
The mean shift algorithm [10, 11] is a nonparametric method for estimating the density gradient. It has good properties for tracking, such as good real-time performance, but its tracking performance weakens when the target moves too fast. To solve this problem and better track moving snowboard targets, this paper draws on the strengths of SIFT [12] and the Kalman filter [13] and proposes a multi-algorithm fusion algorithm for snowboard moving target tracking. In this algorithm, the SIFT feature algorithm is used for rough matching to find the approximate position of the moving target, and the mean shift algorithm is then used to locate the exact position of the target. The rest of this paper is organized as follows. The second chapter introduces existing research on moving object detection and some existing moving object tracking algorithms. The third chapter introduces the multi-target matching tracking algorithm and the target tracking algorithm under occlusion. The fourth chapter presents the multi-algorithm fusion moving target tracking algorithm formed by fusing the mean shift algorithm, the SIFT operator, and the Kalman filtering algorithm. The fifth chapter analyzes the matching points of the SIFT image matching algorithm and compares matching performance with and without SIFT; it also gives the comparison results of the various algorithms under different occlusion conditions and verifies the superiority of fusing the mean shift and Kalman algorithms. The sixth chapter summarizes the whole paper.
Related works
Many target detection algorithms have been proposed in recent years, including detection based on background subtraction and detection based on a background model. Typical algorithms include the two-frame difference method, the three-frame difference method, the optical flow method [14], the simple background subtraction method, and the Gaussian modeling method. These algorithms have their own advantages and disadvantages, and their detection performance differs between static and dynamic backgrounds. The two-frame difference, three-frame difference, background subtraction, and Gaussian modeling methods extract moving objects well against a static background, but against a dynamic background their detection results are poor and the background noise is large. The optical flow method is effective for dynamic background scenes, but its complexity is high, and without special hardware support it is difficult to meet real-time detection requirements.
Therefore, target detection in dynamic backgrounds has become a difficult and active topic in recent years. Li Yue et al. [15] proposed a binary hypothesis testing method based on agglomerative hierarchical clustering of features. In a laboratory environment, multiple homogeneous infrared sensors with different orientations were used to test the method under changing ambient light intensity. The experimental results show that target detection based on feature-level sensor fusion is better than that based on decision-level sensor fusion. Li Yue et al. [16] then proposed a binary hypothesis testing algorithm based on multi-sensor feature clustering. First, features are extracted from the time-series signals of the different sensors using symbolic dynamic filtering, a new feature extraction tool; these features are then grouped in the feature space to evaluate the uniformity of the sensor responses; finally, the target detection decision is made according to the distance measurement between sensor clusters.
In research on target detection and tracking, complex environments, such as changes in background lighting, swaying trees, and rainy or snowy weather, strongly affect algorithm performance and make moving target detection more difficult. To address this, Guo et al. [17] proposed a dynamic background generation algorithm that constructs the background image dynamically, reducing the impact of environmental changes on moving object detection to a certain extent. They also proposed an algorithm for calculating pan/tilt/zoom (PTZ) rotation angles, which computes the horizontal and vertical rotation angles of the PTZ unit and realizes moving target detection and automatic tracking. Because traditional moving object detection algorithms have difficulty detecting moving objects accurately in a dynamic background, Lu et al. [18] proposed a moving object detection algorithm for dynamic backgrounds based on cellular automata. In this method, the SLIC algorithm is first used to segment the video image, and a multi-mode hybrid dynamic texture model is used to model the background. Spatiotemporal saliency detection is then combined with a saliency map optimized by the automatic update mechanism of cellular automata. Finally, the optimized saliency image is segmented to obtain the moving object in the video image. The simulation results show that, in a dynamic background, the algorithm effectively suppresses the influence of non-moving objects on the detection results, detects moving objects with high accuracy, and has a certain robustness.
Principle of multi-target matching tracking algorithm
First, the HOG features of the training samples are taken as the input features of the classifier, and the prepared data are passed to a support vector machine (SVM) to learn a hyperplane that effectively separates athletes from background. In target detection, a sliding detection window scans the whole image to search for all targets. Each time the window slides to a new position, the classifier outputs a classification result that indicates whether there is an athlete in the window. A multi-scale scanning method is also used so that targets of different sizes are not missed. The detection results are recorded as a set of candidate boxes.
All tracking targets are represented by a set, recorded as $O_s = \{O_1, O_2, \cdots, O_z\}$, where $z$ is the total number of tracking targets. The track of each tracking target is recorded as a sequence.
In the continuous video frame, the motion of the target is continuous in time and space, so it is considered that the predicted target position and the target space of the previous frame meet the distance constraint, that is, the coordinate position of the same target in the adjacent frame will not be too far away. Therefore, the search range can be narrowed according to the distance constraint between the tracking target and the candidate target center point, and the set of candidate boxes satisfying the spatial locality can be filtered out.
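The multi-scale sliding-window scan and the distance constraint described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the window size, stride, scale factors, and distance threshold are assumed values, and `toy_classifier` is a hypothetical stand-in for the trained HOG + SVM classifier.

```python
import math
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int, scales: List[float]) -> Iterator[Box]:
    """Yield detection windows covering the image at several scales."""
    for s in scales:
        w, h = round(win_w * s), round(win_h * s)
        if w > img_w or h > img_h:
            continue  # the window does not fit at this scale
        step = max(1, round(stride * s))
        for y in range(0, img_h - h + 1, step):
            for x in range(0, img_w - w + 1, step):
                yield (x, y, w, h)


def detect(img_w: int, img_h: int, classify: Callable[[Box], bool],
           win_w: int = 64, win_h: int = 128, stride: int = 16,
           scales: Tuple[float, ...] = (1.0, 1.5)) -> List[Box]:
    """Keep every window that the classifier labels as containing an athlete."""
    windows = sliding_windows(img_w, img_h, win_w, win_h, stride, list(scales))
    return [b for b in windows if classify(b)]


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def gate(prev_center: Tuple[float, float], boxes: List[Box],
         max_dist: float) -> List[Box]:
    """Distance constraint: keep candidates whose center is near prev_center."""
    px, py = prev_center
    kept = []
    for b in boxes:
        cx, cy = center(b)
        if math.hypot(cx - px, cy - py) <= max_dist:
            kept.append(b)
    return kept


# Hypothetical stand-in for the HOG + SVM classifier (assumption):
# a window is "positive" when it contains the pixel (100, 80).
def toy_classifier(box: Box) -> bool:
    x, y, w, h = box
    return x <= 100 < x + w and y <= 80 < y + h


candidates = detect(320, 240, toy_classifier)
nearby = gate(prev_center=(100.0, 80.0), boxes=candidates, max_dist=25.0)
```

Only the boxes in `nearby` need to be compared with the tracked target in the matching step, which is where the saving in computation comes from.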
Filtering reduces the computation of the matching process and improves the efficiency of the tracking algorithm. In this paper, the Bhattacharyya distance (BD) is chosen as the distance measurement function. Because the color histogram reflects the color distribution of an image, the BD measures the similarity of two images well. The BD is defined through the Bhattacharyya coefficient, computed by the function $g(\cdot)$, of the color distributions $h_1$ and $h_2$ of the two targets. Each distribution is computed by the color distribution function $f(\cdot)$, and nbins denotes the number of bins of the color histogram.
After finding the most similar candidate target, it is added to the track sequence of the tracking object. Then we do the same operation for other tracking objects to complete the matching of all tracking objects and candidate targets between adjacent frames.
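The histogram matching step can be sketched as follows. The Bhattacharyya coefficient is the standard sum of square roots of bin products; the distance form `sqrt(1 - coefficient)` is an assumption, since several variants are in common use, and the function names and toy histograms are illustrative only.

```python
import math
from typing import List, Sequence


def normalize(hist: Sequence[float]) -> List[float]:
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [v / total for v in hist]


def bhattacharyya_coefficient(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum over the nbins bins of sqrt(h1[i] * h2[i]) for normalized histograms."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(h1), normalize(h2)
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Assumed form sqrt(1 - coefficient): 0 means identical distributions."""
    bc = bhattacharyya_coefficient(h1, h2)
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding


def best_match(target_hist: Sequence[float],
               candidate_hists: Sequence[Sequence[float]]) -> int:
    """Index of the candidate whose color histogram is closest to the target."""
    dists = [bhattacharyya_distance(target_hist, h) for h in candidate_hists]
    return min(range(len(dists)), key=dists.__getitem__)


# Toy 4-bin color histograms (illustrative values).
target = [8, 2, 0, 0]
candidate_hists = [[0, 0, 5, 5], [7, 3, 0, 0], [2, 8, 0, 0]]
idx = best_match(target, candidate_hists)
```

In this toy example the second candidate is selected, because its color distribution overlaps the target's most strongly.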
The mean shift algorithm represents the target image well, but its ability to track a fast-moving target is poor. SIFT, in contrast, tracks fast-moving targets well but fails when the image has few features. The two therefore complement each other, and they are combined into a moving target tracking algorithm based on SIFT features and mean shift. The algorithm takes the color features of the target image as the target model, uses the invariance of SIFT features to predict the approximate position of the target, and then refines the location of the target. When the target is occluded, the Kalman filtering algorithm is used to predict the target trajectory, completing the algorithm design for snowboard target trajectory tracking.
Mean shift algorithm
The kernel-weighted histogram distribution of the search window is calculated, and the histogram distribution of the corresponding window in the current frame is calculated in the same way. Following the principle of maximizing the similarity of the two distributions, the search window moves toward the real position of the target along the direction of maximum density increase. In the initial frame, for the search window containing the target, the probability of the $u$-th eigenvalue is
$$q_u = C \sum_{i=1}^{n} k\left(\left\| \frac{x_i - x_0}{h} \right\|^2\right) \delta\left[b(x_i) - u\right]$$
where $x_0$ is the coordinate of the central pixel of the search window, $x_i$ is the coordinate of the $i$-th pixel, and $n$ is the number of pixels in the window; $k(\|x\|^2)$ is the kernel function and $h$ is its bandwidth, generally equal to half of the window width; the functions $b$ and $\delta$ judge whether the color value at $x_i$ belongs to the eigenvalue $u$; and $C$ is a normalization constant that makes the probabilities of all eigenvalues sum to 1.
The mean shift vector is derived by maximizing the similarity function. The mean shift algorithm then iterates repeatedly to obtain the optimal position of the target in the current frame.
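The iteration can be sketched on a toy image as follows. This is a minimal sketch in the style of standard kernel-based tracking, not necessarily the exact update used in this paper: it assumes an Epanechnikov kernel profile (so the new position is a plain weighted mean over the window), pixel weights of the form sqrt(q_u / p_u), and a two-bin toy image; all names and parameter values are illustrative.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # image[y][x] holds the quantized color bin b(x)


def kernel_histogram(img: Image, cx: float, cy: float, h: float,
                     nbins: int) -> List[float]:
    """Kernel-weighted color histogram of the window centered at (cx, cy)."""
    hist = [0.0] * nbins
    for y, row in enumerate(img):
        for x, b in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= h * h:
                hist[b] += 1.0 - d2 / (h * h)  # Epanechnikov profile (assumed)
    total = sum(hist)  # plays the role of the normalization constant C
    return [v / total for v in hist] if total > 0 else hist


def mean_shift_step(img: Image, q: List[float], cx: float, cy: float,
                    h: float) -> Tuple[float, float]:
    """One iteration: weighted mean of the window pixels, w = sqrt(q_u / p_u)."""
    p = kernel_histogram(img, cx, cy, h, len(q))
    sw = sx = sy = 0.0
    for y, row in enumerate(img):
        for x, b in enumerate(row):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if d2 <= h * h and p[b] > 0:
                w = math.sqrt(q[b] / p[b])
                sw += w
                sx += w * x
                sy += w * y
    return (sx / sw, sy / sw) if sw > 0 else (cx, cy)


def mean_shift(img: Image, q: List[float], start: Tuple[float, float],
               h: float, eps: float = 0.1,
               max_iter: int = 30) -> Tuple[float, float]:
    """Iterate until the window center moves by less than eps pixels."""
    cx, cy = start
    for _ in range(max_iter):
        nx, ny = mean_shift_step(img, q, cx, cy, h)
        if math.hypot(nx - cx, ny - cy) < eps:
            return nx, ny
        cx, cy = nx, ny
    return cx, cy


def make_frame(target_x: int, size: int = 20) -> Image:
    """Toy frame: background bin 0 and a 5x5 target of bin 1 at (target_x, 10)."""
    return [[1 if abs(x - target_x) <= 2 and abs(y - 10) <= 2 else 0
             for x in range(size)] for y in range(size)]


frame1, frame2 = make_frame(8), make_frame(12)
q = kernel_histogram(frame1, 8.0, 10.0, h=4.0, nbins=2)  # target model
fx, fy = mean_shift(frame2, q, start=(8.0, 10.0), h=4.0)
```

Starting from the previous position (8, 10), the window drifts toward the target, which has moved to (12, 10) in the second frame. Because the window contains background pixels as well as the target, each step covers only part of the remaining distance.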
SIFT image matching algorithm
The SIFT image matching algorithm first uses the concept of scale space to build the scale space of the image and searches it for local extreme points, which serve as candidate key points. It then removes key points with low contrast and unstable edge response points, determines the main direction of each key point, and generates key point feature descriptors, so that each key point carries position, scale, and direction information. Finally, the Euclidean distance between feature descriptors is used to measure the matching degree between two feature points.
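Matching by Euclidean distance can be sketched as a nearest-neighbor search over descriptors. The ratio test used here to reject ambiguous matches, and its threshold of 0.8, are assumptions borrowed from common SIFT practice; the text above only specifies the Euclidean distance. The three-dimensional toy descriptors stand in for real 128-dimensional ones.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(query: List[Sequence[float]],
                      library: List[Sequence[float]],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (query index, library index) pairs that pass the ratio test."""
    matches = []
    for qi, qd in enumerate(query):
        dists = sorted((euclidean(qd, ld), li) for li, ld in enumerate(library))
        if not dists:
            continue
        # Accept the nearest neighbor only if it is clearly better than
        # the second nearest (assumed ratio test).
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            matches.append((qi, dists[0][1]))
    return matches


query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
library = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
matches = match_descriptors(query, library)
```

The third query descriptor lies almost equally close to two library entries, so it is rejected as ambiguous rather than matched.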
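Feature point detection can be divided into three steps: detection of stable key points in scale space, accurate location of key points, and assignment of key point directions. The scale space $L(x, y, \sigma)$ of a two-dimensional image $I(x, y)$ can be expressed as
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}$$
where $G(x, y, \sigma)$ is the Gaussian kernel with scale $\sigma$ and $*$ denotes convolution.
Detection of stable key points in scale space: the difference of Gaussians (DoG) operator is used to approximate the scale-normalized Laplacian of Gaussian (LoG):
$$D(x, y, \sigma) = \left(G(x, y, k\sigma) - G(x, y, \sigma)\right) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$
where $k$ is the constant multiplicative factor between adjacent scales. The Gaussian pyramid is obtained by Gaussian smoothing and downsampling, and the DoG pyramid that forms the scale space is then generated by subtracting images at adjacent scales.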
Accurate location of key points: to improve the noise resistance and stability of matching, the location and scale of the key points are determined precisely. Because the DoG operator produces a strong edge response, unstable edge response points are removed by computing the principal curvatures with the Hessian matrix and comparing them against a set threshold. The descriptor is then generated as follows: take a 16×16 pixel window centered on the key point, divide it into 4×4 seed regions, calculate the accumulated gradient values in 8 directions within each seed region to draw its gradient direction histogram, and finally obtain a 4×4×8 = 128-dimensional feature description vector.
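The 4×4×8 layout of the descriptor can be sketched as follows. This is a simplified illustration, not a full SIFT descriptor: it assumes central-difference gradients and hard binning, and it omits the Gaussian weighting, rotation to the main direction, and interpolation between bins that a complete implementation would use.

```python
import math
from typing import List


def sift_like_descriptor(patch: List[List[float]]) -> List[float]:
    """128-D vector from a 16x16 patch: 4x4 seed regions x 8 direction bins."""
    if len(patch) != 16 or any(len(row) != 16 for row in patch):
        raise ValueError("patch must be 16x16")
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            # Central differences, clamped at the patch border.
            gx = patch[y][min(x + 1, 15)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, 15)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            obin = int(angle / (2 * math.pi) * 8) % 8   # one of 8 directions
            cell = (y // 4) * 4 + (x // 4)              # one of 4x4 seed regions
            desc[cell * 8 + obin] += mag                # accumulate gradient
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Toy patch: a horizontal intensity ramp, so every gradient points along +x.
ramp = [[float(x) for x in range(16)] for _ in range(16)]
d = sift_like_descriptor(ramp)
```

For the ramp patch, only the first direction bin of each seed region is non-zero, which shows how the descriptor encodes the local gradient direction.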
Kalman filtering algorithm
As a numerical estimation and optimization method, the Kalman filter is closely tied to the background of its application field. Using it to solve a practical problem therefore involves more than implementing and optimizing the algorithm: the acquired domain knowledge must be used to describe the system formally and to build an accurate mathematical model, and the filter is then designed and implemented from this model.
The Kalman filter is a software filtering method. Its basic idea is to take the minimum mean square error as the optimal estimation criterion and, using a state-space model of signal and noise, to update the estimate of the state variable from the estimate at the previous time and the observation at the current time, obtaining the estimate at the current time. Based on the established system equation and observation equation, the algorithm produces the minimum mean square error estimate of the signal to be processed. The Kalman filter introduces state-space theory into the mathematical modeling of physical systems and assumes that the system state can be represented by a vector $X \in \mathbb{R}^n$ in an $n$-dimensional space.
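The predict and update cycle can be illustrated with the simplest case, a scalar state. The sketch below assumes a random-walk state model with process noise variance `q` and a direct observation with noise variance `r`; these names and the toy measurements are illustrative and are not taken from the paper.

```python
from typing import Iterable, List, Tuple


def kalman_1d(measurements: Iterable[float], x0: float, p0: float,
              q: float, r: float) -> List[Tuple[float, float]]:
    """Scalar Kalman filter; returns (estimate, variance) after each update."""
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is assumed to persist, uncertainty grows by q.
        p = p + q
        # Update: blend the prediction and the observation with gain k.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        history.append((x, p))
    return history


# Noisy observations of a constant position near 10 (illustrative values).
zs = [10.2, 9.8, 10.1, 9.9, 10.0]
history = kalman_1d(zs, x0=0.0, p0=1e6, q=0.0, r=1.0)
estimate, variance = history[-1]
```

With a very uncertain initial estimate and no process noise, the filter output approaches the average of the observations, and the variance shrinks with every new observation.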
The basic Kalman filter (KF) is limited to the linear condition. In most nonlinear cases, the extended Kalman filter (EKF) is used to estimate the system state. In order to understand the Kalman filter more intuitively, the application diagram of the Kalman filter is given, as shown in Fig. 1:

Fig. 1. Application diagram of the Kalman filter.
Multi-algorithm fusion tracking algorithm
First, a SIFT feature library is established for the target image. The library is updated continuously during tracking by adding new features, replacing matched features, and deleting old features. The target is framed manually, the gray-level or color statistical features of the target image are used to establish the target model, and the SIFT features of the target image are extracted and added to the feature library. In the next frame, the center of the target search box is kept fixed while its length and width are enlarged. SIFT features are then extracted from the search box area and matched against the feature library. The approximate position of the target is predicted from the matching results, and the search box is reduced to its original size. Taking the predicted position as the starting position, the mean shift algorithm performs an accurate search. Finally, the feature library is updated according to the feature matching results and the accurate position of the target.
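The library maintenance and the coarse position estimate can be sketched as follows. Both parts rest on assumptions that the text does not specify: features are dropped after going unmatched for `max_age` frames, and the approximate target position is the previous center shifted by the median displacement of the matched key points. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class FeatureLibrary:
    max_age: int = 5  # assumed: frames a feature may stay unmatched
    feats: Dict[int, List[float]] = field(default_factory=dict)
    age: Dict[int, int] = field(default_factory=dict)
    next_id: int = 0

    def add(self, desc: List[float]) -> int:
        """Add a new feature and return its id."""
        fid = self.next_id
        self.next_id += 1
        self.feats[fid] = desc
        self.age[fid] = 0
        return fid

    def update(self, matched: Dict[int, List[float]],
               new: List[List[float]]) -> None:
        """Replace matched features, age and delete old ones, add new ones."""
        for fid in list(self.feats):
            if fid in matched:
                self.feats[fid] = matched[fid]
                self.age[fid] = 0
            else:
                self.age[fid] += 1
                if self.age[fid] > self.max_age:
                    del self.feats[fid]
                    del self.age[fid]
        for desc in new:
            self.add(desc)


def coarse_position(prev_center: Point,
                    pairs: List[Tuple[Point, Point]]) -> Point:
    """Shift prev_center by the median displacement of matched key points."""
    if not pairs:
        return prev_center
    dx = median(b[0] - a[0] for a, b in pairs)
    dy = median(b[1] - a[1] for a, b in pairs)
    return (prev_center[0] + dx, prev_center[1] + dy)


lib = FeatureLibrary(max_age=1)
a = lib.add([1.0])
b = lib.add([2.0])
lib.update({a: [1.5]}, new=[[3.0]])   # a replaced, b ages, one feature added
lib.update({}, new=[])                # b exceeds max_age and is deleted

# (old position, new position) of matched key points; the last is an outlier.
pairs = [((10.0, 10.0), (14.0, 11.0)), ((20.0, 15.0), (24.0, 16.0)),
         ((30.0, 5.0), (80.0, 40.0))]
approx = coarse_position((50.0, 50.0), pairs)
```

The median is used here so that a single wrong match does not pull the estimate away; the mean shift search then refines `approx`.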
Then, within the Kalman filtering prediction and estimation framework, the target is represented by a multi-scale comprehensive histogram, and the search is optimized by the mean shift algorithm. Compared with the traditional histogram tracking algorithm, tracking accuracy is improved; compared with the SIFT tracking algorithm, similar tracking accuracy is achieved while the requirements of real-time processing are met. The Kalman filter filters the historical position information of the target to estimate its motion parameters, which narrows the search range for candidate target matching and reduces the interference of similar objects in the comprehensive histogram.
The algorithm is described as follows:
(1) The Kalman filter is used to calculate the predicted position $X_k$ of the moving target in frame $k$, which provides the initial position.
(2) Calculate the multi-scale comprehensive histogram of the candidate target area and the model.
(3) Calculate the weights $\{w_i\}$, $i = 1, 2, \cdots, n$, and the next new position.
(4) Recalculate the target model.
(5) When the
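One way to organize the per-frame loop is sketched below for a single coordinate. It is a sketch under stated assumptions, not the authors' implementation: the constant-velocity state model, the noise values, and the rule that a low similarity score signals occlusion are all assumptions, and `toy_locate` is a hypothetical stand-in for the SIFT plus mean shift search started at the predicted position.

```python
from typing import Callable, List, Tuple


class ConstVelKalman:
    """Constant-velocity Kalman filter for one coordinate; state [pos, vel]."""

    def __init__(self, pos: float, q: float = 0.01, r: float = 1.0):
        self.x = [pos, 0.0]
        self.P = [[10.0, 0.0], [0.0, 10.0]]
        self.q, self.r = q, r

    def predict(self) -> float:
        p, v = self.x
        (a, b), (c, d) = self.P
        self.x = [p + v, v]                          # x = F x, F = [[1, 1], [0, 1]]
        self.P = [[a + b + c + d + self.q, b + d],   # P = F P F^T + q I
                  [c + d, d + self.q]]
        return self.x[0]

    def update(self, z: float) -> float:
        (a, b), (c, d) = self.P
        s = a + self.r                # innovation variance, H = [1, 0]
        k0, k1 = a / s, c / s         # Kalman gain
        y = z - self.x[0]
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
        return self.x[0]


def track(start: float, n_frames: int,
          locate: Callable[[int, float], Tuple[float, float]],
          min_similarity: float = 0.6) -> List[float]:
    """Predict, search from the prediction, then update or fall back."""
    kf = ConstVelKalman(start)
    path = []
    for k in range(1, n_frames + 1):
        predicted = kf.predict()
        measured, similarity = locate(k, predicted)
        if similarity >= min_similarity:
            path.append(kf.update(measured))
        else:
            path.append(predicted)  # assumed occlusion: keep the prediction
    return path


# Hypothetical search: the target moves 3 pixels per frame and is
# occluded (low similarity) in frames 8 to 10.
def toy_locate(k: int, predicted: float) -> Tuple[float, float]:
    if 8 <= k <= 10:
        return predicted, 0.1
    return 3.0 * k, 0.95


path = track(0.0, 12, toy_locate)
```

During the occluded frames the track follows the filter's prediction, and the filter resumes updating as soon as the similarity recovers.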
Analysis of algorithm results
In the SIFT image matching algorithm, a histogram is used to count the gradient directions of the pixels in the neighborhood of each key point. The peak of the histogram represents the main direction of the neighborhood gradient, which is taken as the direction of the key point. Figure 2 is the direction histogram corresponding to an image matching point pair, and Figure 3 compares matching with and without the SIFT algorithm. Figure 2 shows that the SIFT algorithm yields the direction histogram of the image, whose peak is used as the direction of the key point to complete feature point detection, so that each feature point carries location, scale, and direction information. As shown in Fig. 3: (1) under rotation and scaling, the mismatch rate with SIFT is lower than that of the non-SIFT algorithm, which indicates that matching performance improves after the SIFT algorithm is added; (2) under noise and blur, the SIFT algorithm suppresses their effect and reduces the mismatch rate, which again shows that matching performance improves. In summary, adding the SIFT algorithm improves performance under rotation and scaling and under noise and blur, and thus improves the matching performance of the algorithm.

Fig. 2. Direction histogram of matching point pair.

Fig. 3. Comparison of the SIFT image matching algorithm.
The mean shift algorithm has good real-time performance and is insensitive to edge occlusion, target rotation, deformation, and background motion, so it works well in target tracking. However, if the target moves too fast, its tracking performance weakens. To solve this problem, this paper combines the mean shift algorithm, the SIFT algorithm, and the Kalman filter algorithm into a multi-algorithm fusion moving target tracking algorithm. The algorithm uses SIFT to determine the approximate position of the target and then uses the mean shift algorithm to determine its exact position. To reduce the influence of occlusion on tracking, the Kalman filtering algorithm is used to further improve the tracking algorithm, completing the design of the target tracking algorithm for snowboarding. Figure 4 compares matching performance before and after the improvement: Fig. 4(a) compares the algorithms without occlusion, and Fig. 4(b) compares them with occlusion. Fig. 4 shows that the algorithm designed in this paper has a lower mismatch rate than the SIFT algorithm or the mean shift algorithm, with or without occlusion, which shows that it performs better and can track the snowboard trajectory well.

Fig. 4. Comparison of various algorithms under different occlusion conditions.
Using the Kalman filter reduces the number of iterations of the mean shift algorithm in each frame. Because the computational cost of the Kalman algorithm is very small, the running time of the improved algorithm is less than that of the mean shift algorithm alone. Table 1 gives the analysis of the mean shift algorithm, and Table 2 gives the analysis of the mean shift plus Kalman algorithm (the algorithm in this paper). In each table, the second row is the number of iterations of the mean shift algorithm in each frame, and the third row is the total operation time of the algorithm in each frame. The tables show that the algorithm in this paper reduces the number of mean shift iterations in each frame and that the total calculation time of each frame is reduced accordingly. The average running time over the first 15 frames is 0.0185 s, while the corresponding average time of the mean shift algorithm is 0.0284 s. Fig. 5 shows the corresponding results of the two algorithms: the mean shift algorithm and the Kalman algorithm perform significantly better when used together than when used alone.
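Why a better starting point saves iterations can be illustrated with a one-dimensional toy problem. The sketch below is not the paper's experiment: it assumes a flat-kernel mean shift over an artificial similarity profile that peaks at position 50, and it compares a start far from the peak with a start close to it, as a predicted position would be.

```python
import math
from typing import Tuple

# Toy 1-D similarity profile over integer positions, peaking at 50 (assumed).
WEIGHTS = {x: math.exp(-0.5 * ((x - 50) / 10.0) ** 2) for x in range(0, 101)}


def mean_shift_1d(start: float, h: float = 8.0, eps: float = 1e-3,
                  max_iter: int = 200) -> Tuple[float, int]:
    """Flat-kernel mean shift; returns (final position, iterations used)."""
    c = start
    for it in range(1, max_iter + 1):
        window = [(x, w) for x, w in WEIGHTS.items() if abs(x - c) <= h]
        new_c = sum(x * w for x, w in window) / sum(w for _, w in window)
        if abs(new_c - c) < eps:
            return new_c, it
        c = new_c
    return c, max_iter


far_pos, far_iters = mean_shift_1d(start=30.0)    # e.g. the previous position
near_pos, near_iters = mean_shift_1d(start=48.0)  # e.g. a predicted position
```

Both runs stop close to the peak, but the run that starts nearer to it needs fewer iterations, which is the effect the per-frame iteration counts describe.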
Analysis of mean-shift algorithm
Analysis of algorithm in this paper

Fig. 5. Comparison of the per-frame running time of the two algorithms.
Conclusion
Snowboarding is attracting more and more attention, and how to train athletes to achieve better results has become an active topic in snowboarding research. Because snowboarding movements are complex and fast, they are difficult to coach by eye, so an intelligent system that effectively tracks snowboarding movement is very important for athletes. Traditional target tracking algorithms perform unsatisfactorily in snowboarding, so this paper proposes a multi-algorithm fusion target tracking algorithm. The SIFT algorithm is used to determine the approximate position of the target, and the mean shift algorithm is then used to determine its exact position. To avoid the interference of occlusion, the Kalman filtering algorithm is used to further improve the tracking algorithm. In the simulation analysis, the performance of the SIFT algorithm is analyzed first, and the results show that the SIFT algorithm improves the performance of the target tracking algorithm under rotation, scaling, and noise and blur. Further analysis shows that, with or without occlusion, the target tracking algorithm designed in this paper outperforms the SIFT algorithm and the mean shift algorithm before the improvement and achieves a better tracking effect.
Acknowledgments
This work was supported by The Key Special Orientation Project of the National Science and Technology Ministry’s Key R & D Program “Technology Winter Olympics” under Grant No. 2018YFF0300506 and 2018YFF0300601, and was supported by Key Projects of Higher Education Teaching Reform in Heilongjiang Province under Grant No. SJGZ20190033.
