Abstract
Earlier research on target tracking mostly addressed tracking against a static background, where scenes were stable and targets were specific. In practice, however, target tracking must cope with realistic complex scenes in which both the target and the scene are more complex. The target tracking algorithm therefore still faces many challenges in practical applications, especially in sports visual feature recognition. Based on the needs of sports feature recognition, this study combines the EIA algorithm to construct a feature recognition model. Moreover, to address the shortcoming that the compressed sensing tracking algorithm cannot accurately and comprehensively describe the target shape through a single target feature, a multi-feature adaptive fusion method is used to visualize the target appearance model, thus improving the accuracy of target tracking. In addition, this study designs experiments to analyze the performance of the algorithm model. The results show that the proposed algorithm model is effective for recognition.
Introduction
The rapid development of modern science has brought tremendous progress to sports science and technology. High-tech means are now widely used to support and guide sports training, which makes training more rational and scientific and ultimately yields a steady improvement in athletes’ technical movement and better performance in competition. As scientific capability continues to improve, more scientific training and monitoring methods will be applied to various sports and will play an increasingly important role in raising the level of sports training and improving sports performance. So far, the scientific aids used to correct and improve athletes’ technical movements are mainly motion image analysis systems. With the rapid development of computers, motion image analysis systems have also advanced quickly in shooting speed, video quality, and shooting methods [1]. Many countries have independently developed their own motion image analysis systems, for example, the SIMI system developed by Germany, the BASS system, the Motion Analysis system developed by the United States, the Ariel system, the Qualisys system in Sweden, the NAC system in Japan, and the Aijie and Gold coaching systems independently developed by the Chinese Academy of Sciences, which have independent intellectual property rights [2].
Target tracking is a technical means of distinguishing the target from the background and obtaining target motion information (position, angle, trajectory, contour, etc.) to provide data support for deeper processing and analysis. With the rapid development of optoelectronic imaging technology and computers, image-based target tracking technology has found important applications [3]. In warfare, accurately and quickly identifying and tracking military targets such as aircraft, ships, and missiles is the key to seizing the initiative and is of great significance for improving the performance of modern combat systems. In traffic, positioning, identifying, and tracking targets in the scene, and analyzing and judging target behavior on this basis, can help prevent traffic accidents. In modern medicine, the analysis of the heart’s motion can not only help doctors diagnose the heart’s condition, but also support in-depth research on the heart’s operating mechanism. However, although target tracking has been studied for several decades and has made great progress, changes in the target itself and in the surrounding environment mean that long-term stable tracking still faces challenges such as posture changes, illumination changes, target occlusion, and frame loss [4]. Based on the above analysis, this study applies a visual recognition algorithm to the motion recognition of athletes and promotes the practical development of machine learning related algorithms.
Related work
Moving target detection and segmentation is a research hotspot in the field of computer vision. Detection and segmentation become difficult when the weather or illumination in the scene changes, when branches in the background sway, when objects are occluded by the tracked targets, or when target self-occlusion occurs. Many international journals (Image Process, IJCV, PAMI, PR, PR Letters, Signal Processing, Computer Vision and Image Process, Digital Signal Process, etc.) treat the detection and segmentation of targets as an important research topic. Important international conferences such as ICCV, ACCV, CVPR, ECCV, ICPR, and ICIP also treat the target tracking problem as a separate topic. As a frontier of multidisciplinary research, the field of target detection and target tracking has produced a large number of high-quality, high-impact, and wide-ranging research results. Computer-based laboratories have been established in well-known universities at home and abroad [5].
As research has deepened, researchers from various countries have designed many computer vision-based target detection algorithms. Three detection algorithms are commonly used: the difference method, the background modeling method, and the optical flow method. The difference method is a relatively simple and effective algorithm for detecting moving targets. For target detection in harsh environments, Nawabi D H et al. [6] used the inter-frame difference method in combination with the codebook model. First, a motion model is generated by the inter-frame difference method; then the codebook model is updated and modified with this model to reduce the interference of the complex background on the moving target and overcome the shortcomings of the codebook model. Even when the codebook model detects the target for a long time, the sensitivity of the detection does not decrease significantly. However, this method may produce detection errors in scenes where multiple moving targets occlude each other and cross paths. Literature [7] detected targets in a complex environment by using an improved Gaussian mixture background modeling method. The Gaussian components of each pixel are adaptively adjusted by constructing them from the distribution of the pixel in the time domain. New parameters are introduced in the model parameter update so that the background model adjusts adaptively as pixels change, which realizes adaptive detection of the target. The effectiveness and convergence of the algorithm are demonstrated through experiments.
The optical flow method [8] determines the movement trend of the target at the next moment by establishing a statistical model. In recent years, research on the reliability and real-time performance of target detection based on the optical flow method has made some progress. The gradient-based methods currently studied focus on improvements to the Lucas-Kanade (LK) and Horn-Schunck (HS) algorithms. The advantage of HS is that its estimation of the optical flow field is simple and highly reliable; its shortcoming is that it requires repeated iterative calculation and is time-consuming.
In order to overcome the shortcomings of the background modeling algorithm, Providencia R [9] combined the optical flow velocity with the background modeling method to construct a scene model based on optical flow velocity, and then used this model to detect the target. In addition to the above basic methods, many other algorithms have been proposed. Literature [10] proposed fusing multiple algorithms for target detection. The authors combine contrast with a priori salient target detection to improve the overall brightness of the target, which also suppresses noise and yields an ideal image. Tracking moving targets through computer vision is one of the important applications of this method, and it developed from image-based target tracking. Under the leadership of Ribet S [11], researchers developed a tracking system that can be used in real life. This system tracks moving targets based on image information: it performs correlation operations between the real-time radar echo image and a pre-obtained image, thus realizing the verification and tracking of the dynamic target. This method is a prototype of applying computer vision technology to the tracking of moving targets. Wang M [12] proposed the general principle of target tracking, which was recognized by researchers worldwide and opened up dedicated study of moving target tracking. In the same period, many novel research ideas appeared in the field of image tracking, such as the two-dimensional autocorrelation calculation method, and many new theories were proposed on this basis. Modern tracking technology began with the introduction of the Kalman filter theory. In the 1960s, data association theory and multi-target tracking technology achieved substantial leap-forward development.
In the early 1980s, along with research on intelligent tracking and adaptive tracking algorithms, computer technology developed further in the field of target tracking. At the end of the last century, a new algorithm, particle filtering, was proposed for target tracking in nonlinear, non-Gaussian, multimodal scenarios. As the theory has continued to evolve, many researchers have proposed improved methods for different tracking algorithms to overcome the shortcomings of the algorithms themselves. Researchers abroad have proposed many innovations and improvements for computer vision-based moving target tracking algorithms. They mainly address tracking problems in complex scenes, such as real-time tracking of targets, changes in target motion state, and occlusion of targets. These algorithms are more efficient, faster, and more stable than traditional algorithms. For complex and varied scenes, the literature [13] uses the multi-task tracking (MTT) method within the framework of particle filter theory to solve sparse learning problems in the tracking process, such as changes in illumination intensity, changes in target scale, and occlusion of the target. The algorithm adapts well to complex environments, but it is complex, and the real-time performance of the system is poor. In [14], the combination of mean shift and weighted histogram is used to track the moving target, which has the advantage of accurate tracking.
Plappert M et al. [15] proposed kernel tracking of targets in four-dimensional space, covering the scale, the translation, and the rotation of the moving target. Renyun Z [16] improved the kernel tracking algorithm and proposed two improved variants: one is the EMD-based kernel tracking method, and the other is the multi-view based kernel tracking algorithm. The article [17] implemented an IoT-based smart city by exploiting IoT and big data analytics using the Hadoop ecosystem in real-time environments. The article [18] reflects on IoT and its main role in the development of human behaviors and actions; it also deals with the compilation of various data from different databases connected to the Internet. The literature [19] addresses numerous issues in the field of vehicle communication by suggesting a mutual unified and dispersed spectrum sensing model. The introduction of a mutual cognitive paradigm minimizes conflict and multiple unknown problems. The literature [20] discusses issues such as the large volume of big data and introduces the SmartBuddy framework for creating smart and adaptive ecosystems using human behaviors and human dynamics. The article [21] discusses the development of a directed acyclic graph for video coding algorithms for motion estimation in parallel reconfigurable computing frameworks [22]. The partitioning algorithm also plays a key role in optimizing the encoding of images [23].
Common adaptive window adjustment methods
Method based on kernel width change
The kernel window width is selected by a data-driven method. The general flow of the method is as follows. We assume that h_old is the kernel window width of the tracking frame used by the target tracking algorithm in the previous frame of the image. Then, we can express the new kernel width as:
In the formula, β is the kernel width increase coefficient, which is generally 0.2. The similarity coefficient of the image features, that is, the Bhattacharyya coefficient, is calculated separately for each tracking window. Then, the tracking window for which the Bhattacharyya coefficient is the largest is used as the new tracking window h_new. The updated tracking window can be expressed as:
However, the tracking window size update of Equation (2) relies excessively on the value of the Bhattacharyya coefficient; when the coefficient falls into a local optimum, the resulting tracking window is inaccurate. To address this shortcoming, this paper introduces an update parameter r, so that the update of the tracking window is linked with the window of the previous frame. The update of the kernel window then no longer depends excessively on the Bhattacharyya coefficient and becomes an adaptive learning process. The updated formula is as follows:
As shown in Fig. 1, we divide the change of the target scale during tracking into three categories: the target becomes larger, the target becomes smaller, and the target remains unchanged. When the target becomes larger or smaller, the three new kernel width parameters h1, h2, h3 are calculated from the kernel width of the previous frame, and the Bhattacharyya coefficient of each of the three kernel window widths is calculated. Then, according to formula (3), the current tracking window h_new is updated. The parameter r is very important in the update of the tracking window: different values of r give the tracking window a different update ability and therefore affect the tracking effect. Next, we discuss the experiment.

Fig. 1. Schematic diagram of the change of the kernel window.
This paper determines the value of the parameter r experimentally, together with its impact on the tracking effect, in scenes where the target size changes. Figure 2 shows the influence of the value of parameter r on the tracking success rate.

Fig. 2. Influence of the value of parameter r on the success rate.
The value of the parameter r represents the update speed and learning ability of the tracking frame. It can be found through experimental comparison that the target tracking success rate is higher when the r values are 0.5, 0.6 and 0.7. At the same time, the validity of the introduced parameters is also proved, and the improved formula (3) can correctly update the tracking frame size and enable it to vary with the target size.
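The kernel-width update discussed in this subsection can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three candidate widths are assumed to be h_old, (1 + β)h_old and (1 − β)h_old, formula (3) is assumed to be the weighted combination h_new = r·h_opt + (1 − r)·h_old, and the function names and the `score` callback are hypothetical.

```python
import numpy as np


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two histograms (normalized here)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))


def update_kernel_width(h_old, score, beta=0.2, r=0.6):
    """Pick the best-scoring candidate width, then blend it with h_old.

    score(h) is a hypothetical callback that returns the Bhattacharyya
    coefficient between the target model and the region of width h.
    The candidate set and the blending rule are assumptions.
    """
    candidates = [h_old, (1.0 + beta) * h_old, (1.0 - beta) * h_old]
    h_opt = max(candidates, key=score)
    # r close to 1 follows the best candidate; r close to 0 keeps h_old.
    return r * h_opt + (1.0 - r) * h_old
```

With r = 1 the sketch reduces to picking the candidate with the largest coefficient; smaller r damps the update with the previous width, which is the role the text assigns to r.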
The method based on feature point matching and affine transformation updates the tracking window as follows. First, corner detection is performed on the target in each frame of the image. Then, the corner points detected in successive frames are matched to predict and estimate the affine parameters. Finally, the target tracking window is adjusted by the estimated parameters. In the target tracking process, the target changes are generally divided into two types: scaling and translation. The affine model is as follows:
In the formula, (x, y) and (x′, y′) are the positions of the same feature point of the target in the i-th frame and the (i + 1)-th frame image, respectively. The translation parameter is e = {e_x, e_y}, and the stretch amplitude of the target in the horizontal and vertical directions is s = {s_x, s_y}. Then, the kernel window width h_new can be updated as follows:
In the formula, h_old is the kernel window width in the image of the previous frame.
This method of window adaptive adjustment is more complicated. During the tracking process, the corner points of the target need to be detected in each frame of the image, and the affine model becomes inaccurate when occlusion or background interference occurs, which seriously affects the adaptive adjustment of the tracking window.
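The affine-based adjustment above can be illustrated with a short Python sketch. It assumes the scaling-plus-translation model x′ = s_x·x + e_x, y′ = s_y·y + e_y suggested by the parameter definitions, fits each axis by least squares over matched corner points, and assumes the window is rescaled as h_new = s·h_old. The function names are hypothetical, and corner detection and matching are taken as given.

```python
import numpy as np


def estimate_scale_translation(pts_prev, pts_curr):
    """Least-squares fit of x' = s_x * x + e_x and y' = s_y * y + e_y.

    pts_prev, pts_curr: (N, 2) arrays of matched corner coordinates in
    two consecutive frames (at least two distinct values per axis).
    Returns (s, e), each a length-2 array ordered as (x, y).
    """
    pts_prev = np.asarray(pts_prev, dtype=float)
    pts_curr = np.asarray(pts_curr, dtype=float)
    s = np.empty(2)
    e = np.empty(2)
    for axis in range(2):
        # Design matrix [coordinate, 1] for a one-dimensional linear fit.
        A = np.column_stack([pts_prev[:, axis], np.ones(len(pts_prev))])
        sol, *_ = np.linalg.lstsq(A, pts_curr[:, axis], rcond=None)
        s[axis], e[axis] = sol
    return s, e


def update_window(h_old, s):
    """Assumed update rule: scale the (width, height) window by (s_x, s_y)."""
    return np.asarray(h_old, dtype=float) * s
```

Because the fit uses every matched corner, a few corners that fall on an occluder or on the background bias s and e, which is the weakness noted above.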
In the tracking process, the method based on image information measurement calculates the information amount of the first-type and second-type feature points of the target area in each frame of the picture to determine the target size change and update the size of the tracking window.
According to Marr’s visual theory, the number of feature pixels in each frame of the image can be used as a measure of the amount of image information. Therefore, it is necessary to introduce a definition of image feature points, which can be roughly divided into a first type of feature points and a second type of feature points. The number of image feature points within the tracking frame in each frame of the image represents the size of the target. If the target becomes smaller during the tracking process, the number of feature points of the target is reduced, and vice versa.
We assume that in a two-dimensional space, for a fixed pixel point p (x, y) of an image, two pixel points are selected from the 8 adjacent pixel points around it.
(a) If ∇ I 1 p (x, y) . ∇ I 2 p (x, y) > 0, then p (x, y) is called the first type of feature point.
(b) If
p (x, y) is called the second type of feature point. We take an alarm clock as an example and give two types of feature points corresponding to the alarm image. As shown in Fig. 3, (a) is the original image, (b) is the first type of feature point, and (c) is the second type of feature point. It can be seen that these two types of feature points reflect the important detail features of the original picture.

Fig. 3. Two types of feature points of the alarm clock.
We assume that the number of feature points of the first type of the target to be tracked is I_1, the number of feature points of the second type is I_2, and the amount of information J of the target image is the sum of I_1 and I_2. During the tracking process, the amount of information in the target tracking window in the previous frame image is defined as I_old. For a target within each frame of the image, the size of the target can become larger or smaller. Then, as shown in Fig. 1, the amount of information of the target image in the outermost frame can be calculated. If the tracking window of the current frame is set to be larger than the tracking window of the previous frame, the expression of the current frame tracking window is h_new = (1 + α) h_old (α ∈ [0, 0.2]). Through the formula, the amount of feature information in the current frame target tracking window is obtained, which is I_new. We assume that h_old is the size of the previous frame tracking window. Then, the target tracking window h_new in the current frame image can be expressed as:
The tracking window adjustment method based on the amount of image information excessively depends on the number of the first and second types of feature points of the tracking target. In a real scene, there will be various kinds of interference around the target, which will affect the statistics of the feature points. For example, in an occluded target tracking environment, the feature points of the occlusion affect the statistics of the feature points of the real target, which leads to inaccurate changes in the size of the tracking window.
Establishment of adaptive window adjustment model
The window adjustment model proposed in this paper depends on two modules: the measurement of the amount of image information and the number of frames between target size updates. In the image information measurement module, instead of using the number of feature points as the measure of the amount of image information, we directly use the information amount of the compressed features in the positive samples, which avoids a large number of feature calculations and greatly saves time. In the interval frame number module, we do not count the amount of information for each frame. We define the number of interval frames as N and define the amount of compressed feature information of the target positive samples in the tracking window in the current n-th frame image as I1. The current frame tracking window is multiplied by a transform coefficient defined as 1 ± α (0 ⪡ α ⪡ 0.5), which represents an increase or decrease of the tracking window. After the tracking window is increased, the compressed feature information amount of the target positive samples in the tracking window is defined as I2. After the tracking window is reduced, it is defined as I3. Similarly, the information amounts of the three tracking frames in the (n + N)-th frame are defined as I4, I5 and I6, respectively. At this time, the size of the tracking window in the (n + N)-th frame is set equal to the size of the tracking frame in the n-th frame. As shown in Fig. 4, we can determine the change in target size from the change in the amount of information of the positive sample compressed features within the maximum tracking frame.

Fig. 4. Simplified schematic diagram of the relationship between the amount of information and the size of the window.
We assume that the window scaling factor is T. The calculation method is as follows. When I5 - I2 > 0, we judge that the size of the target may have increased and update the scale coefficient T according to Equation (7). When I5 - I2 < 0, we judge that the size of the target may have decreased and update the scale coefficient T according to Equation (8).
When the tracking scene is complex and there is occlusion or background interference, it will affect the measurement of the amount of image information and lead to inaccuracy of the tracking window update. Therefore, we need to eliminate the influence of background or occlusion on the amount of information. In view of this, we introduce a new parameter β, and its value is different when the target size becomes larger or smaller. According to the analysis of formula (7) and formula (8), when the target size becomes larger, we choose the value of β to be greater than 1. When the target size becomes smaller, we choose the value of β to be less than 1.
The scale variation coefficient T is calculated according to formula (7) and formula (8) after we determine the amount of change in the target positive sample compression feature. Finally, the size of the new tracking window is calculated according to the following formula (9) and formula (10).
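The decision logic of the model can be sketched as follows. The sketch covers only what the text states: the three measurement windows built with the coefficients 1 and 1 ± α, and the comparison of I5 with I2. The forms of Equations (7) to (10) are not assumed here, so the scale coefficient itself is left out; the function names and the default α are hypothetical.

```python
def candidate_windows(width, height, alpha=0.2):
    """Return the current, enlarged and reduced window sizes.

    The enlarged and reduced windows use the coefficients (1 + alpha)
    and (1 - alpha); alpha = 0.2 is only an illustrative default.
    """
    return [
        (width, height),
        ((1.0 + alpha) * width, (1.0 + alpha) * height),
        ((1.0 - alpha) * width, (1.0 - alpha) * height),
    ]


def size_trend(i2, i5):
    """Compare the enlarged-window information at frame n (I2) and n+N (I5)."""
    if i5 - i2 > 0:
        return "larger"      # target may have grown
    if i5 - i2 < 0:
        return "smaller"     # target may have shrunk
    return "unchanged"
```

A tracker would call `size_trend` once every N frames and compute the scale coefficient only when the result is not "unchanged".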
The steps of the adaptive-window compressive tracking algorithm are as follows. We set the k-th frame as n = 0, the tracking window scaling factor as T = 0, and the number of frames between tracking window updates as N. A series of candidate target samples is acquired around the target position l_k in the k-th frame image, their feature values v_z are extracted in a low-dimensional space, the height H and width W of the tracking window are extracted, and the information amounts I_1(n), I_2(n), I_3(n) are calculated. When the number of interval frames is less than N, the classifier parameters are updated according to formula (8). When the number of interval frames is equal to N, the low-dimensional features of these candidate target samples are classified using a classifier, and the target position l_{k+N} in the (k + N)-th frame image is found by the maximum classification response. According to Equation (8), the classifier parameters are updated, and the algorithm proceeds to the next iteration.
Experimental results and analysis
The algorithm was verified on a machine with an i3 CPU at 2.4 GHz and 4 GB of memory, and was implemented on the Matlab 2012 development platform under the Windows 7 operating system. It was tested on five challenging image sequences.
Each video sequence is run 5 times, and two performance metrics are used to measure the tracking performance of the algorithm. The first indicator is the success rate (sr), defined as the ratio of the number of successfully tracked frames n_suc to the total number of frames n_total of the test video, as given in Equation (11). A frame counts as successfully tracked when the overlap ratio between the tracking frame produced by the algorithm and the real target frame is greater than 0.5.
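The success rate can be computed as in the following sketch. The overlap ratio is assumed here to be the intersection over union of the two boxes, a common choice in tracking benchmarks that the text does not spell out. The (x, y, width, height) box layout and the function names are also assumptions.

```python
def overlap_ratio(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(tracked, truth, threshold=0.5):
    """sr = n_suc / n_total, counting frames whose overlap exceeds threshold."""
    n_suc = sum(overlap_ratio(t, g) > threshold for t, g in zip(tracked, truth))
    return n_suc / len(truth)
```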
The second indicator is Center Location Error (CLE), which is defined as the deviation between the center coordinates of the tracking frame and the center coordinate of the real frame, as shown in Equation (12).
In the formula, ROI x is the x coordinate of the center of the tracking frame, and ROI y is the y coordinate of the center of the tracking frame.
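A minimal sketch of the center location error follows, assuming the deviation is the Euclidean distance between the two box centers and that boxes are stored as (x, y, width, height) with (x, y) the top-left corner. Both conventions and the function name are assumptions.

```python
import math


def center_location_error(box_t, box_g):
    """Euclidean distance between the centers of the tracked and real boxes."""
    tx = box_t[0] + box_t[2] / 2.0
    ty = box_t[1] + box_t[3] / 2.0
    gx = box_g[0] + box_g[2] / 2.0
    gy = box_g[1] + box_g[3] / 2.0
    return math.hypot(tx - gx, ty - gy)
```

Averaging this value over all frames of a sequence gives a single center deviation figure per tracker.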
The algorithm of this paper is compared with three representative algorithms: the Compressive Tracking (CT) algorithm, the MIL algorithm, and the STC tracking algorithm. The parameters of these algorithms are set according to the values given in their authors’ original papers, and the code used is the source code publicly released by the authors. The success rate and center deviation are shown in Tables 1 and 2. The tracking effect diagrams are shown in Figs. 5 to 9.
Table 1. Success rate
Table 2. Center deviation

Fig. 5. Effect images of CarScale tracking.

Fig. 6. Effect images of Singer1 tracking.

Fig. 7. Effect images of David2 tracking.

Fig. 8. Effect images of Girl tracking.
Experiment analysis:
In the CarScale video sequence, the tracked target moves from far to near, the background changes significantly, and the target size grows from small to large. This video sequence mainly verifies the tracking effect of the proposed algorithm under background changes and scale changes. A few frames of tracking effect images are selected. As shown in Fig. 5, the red box corresponds to the algorithm of this paper, the blue box represents the CT algorithm, the green box represents the STC algorithm, and the pink box represents the MIL algorithm. In the 38th frame, all four algorithms track the target because the target size has not changed much. In the 82nd frame, the STC algorithm drifts due to a slight change in the target size, while the other three can still track the target. Because the tracking frame size of the CT, STC and MIL algorithms is always constant, negative samples are collected too close to the target area, which reduces the distinguishability of the background and the target and causes the tracker to drift or even lose the target. Only the tracking frame of the proposed algorithm adjusts its size according to the target size, which ensures the tracking effect.
In the Singer1 video sequence, the background illumination intensity of the image changes significantly, and the target scale changes from large to small. As shown in the tracking effect diagram, there is no change in the target size and background in the 1st and 7th frames, so none of the four tracking algorithms drifts. However, from the 153rd frame to the 251st frame, the target size becomes smaller, causing some background information to be introduced into the parametric model. At this point, the CT, STC and MIL algorithms drift, and only the algorithm of this paper tracks the target accurately.
In the David2, Girl, and Tiger1 video sequences, the target scale does not change much, only the rotation and displacement in the plane are changed, so the difference between the four algorithms is small. However, through the analysis of the success rate and center position error table, it is found that the algorithm of this study adopts the method of adaptive window adjustment, so the target is more accurately tracked.
Extracting athletes from complex backgrounds is the key to the design of sports competition tracking systems. It can be used for self-starting of tracking, as well as modeling of moving targets and template updating. Based on the motion statistics of the background, the motion area in the scene is segmented, including the background of the stadium and the athletes. Thereafter, based on the statistical characteristics of the background color, the background area in the motion area is detected, and the remaining motion area is the athlete. In the initial frame, since the motion area does not contain moving targets, it can be used for color statistical modeling.
First, in the initial frame, which does not contain the athlete target, the largest motion area is extracted using the image difference. Secondly, a color model is established based on the observation that the pixels cluster into a long and narrow ellipse in the RGB color space. Thirdly, when the athlete appears in the video, the largest motion area includes the athlete target. According to the color model, the background image is removed from the largest motion area, and the remaining image is the image of the athlete, thereby completing the extraction of the target.
By eliminating noise interference such as small holes generated by image differential processing and smoothing the outer contour of the target, a foreground moving image can be obtained, including moving objects, fields, and the like, as shown in Fig. 10. It can be seen that the athlete and the background part are separated as a separate area. Since athletes are the main research object, the proportion of the image occupied by this area is very high, much larger than that of the other motion areas.

Fig. 9. Effect images of Tiger1 tracking.

Fig. 10. Example of moving target tracking image.
Based on the above analysis, we select the largest area in Fig. 10 as the motion area for the next analysis, including the background and athlete targets, as shown in Fig. 11. Figure 11(a) is a preliminary processing feature image, and 11(b) is a feature extraction image after noise elimination.

Fig. 11. Segmentation results of feature recognition motion.
Because the sports field, the athletes, and the outdoor environment differ across image sequences, it is a challenging task to design a system that is robust and can quickly track the athlete target. The difficulty in system design is that the events are outdoor sports with complex backgrounds, so the tracks and the surrounding natural environment vary. When the camera calibrates the target, there will be certain errors, and accurate calibration of the target’s three-dimensional world coordinates is a prerequisite for accurate tracking. In future work, it is necessary to find a better and more effective calibration method to complete the three-dimensional calibration of the scene. In target extraction, a detection algorithm based on richer, more stable and reliable feature points can be designed for detecting and extracting targets. In the tracking phase, a stable and fast tracking algorithm can be designed to improve tracking efficiency and reduce the time spent searching for targets. In terms of target detection, the motion region is first extracted using image difference. Then, based on the statistical properties of the color of the complex background, a complex background color model is created in the RGB space. By calculating the squared Mahalanobis distance between the color feature vector of each pixel and the sample center and setting a reasonable threshold, the complex background image can be extracted from the motion region without any prior information. Then, the complex background image is subtracted from the image of the motion area to obtain an image of the athlete, thereby completing the target detection.
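The color-model step can be sketched with NumPy as follows. This is an illustration under assumptions rather than the implementation used in the experiments: the background model is taken to be a single Gaussian (mean and covariance) fitted to RGB samples from the initial frame, and the function names, the ridge term, and the threshold are hypothetical.

```python
import numpy as np


def fit_color_model(samples):
    """Mean and inverse covariance of background RGB samples, shape (N, 3)."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # A small ridge keeps the covariance invertible for near-flat colors.
    inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(3))
    return mean, inv_cov


def background_mask(image, mean, inv_cov, threshold):
    """True where the squared Mahalanobis distance is below the threshold."""
    diff = np.asarray(image, dtype=float) - mean  # shape (H, W, 3)
    d2 = np.einsum("...i,ij,...j->...", diff, inv_cov, diff)
    return d2 < threshold
```

Given a boolean mask of the largest motion area, the athlete pixels would then be those inside the motion area that are not flagged as background.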
In recent years, many new theories and algorithms have appeared in the field of target tracking, and these algorithms have achieved good tracking results. However, how to design a robust real-time tracking algorithm in complex scenes (light changes, occlusion, background interference, etc.) is a challenging problem. This paper mainly studies the tracking algorithm based on compressed sensing and makes theoretical and experimental analysis of the compressed sensing tracking algorithm and proposes two improvements to its shortcomings. The specific improvements are as follows:
Because the fixed tracking window size of the compressed sensing tracking algorithm easily causes tracking drift, this paper analyzes the common window adaptive adjustment algorithms. Moreover, combined with the characteristics of the compressed sensing tracking algorithm, this paper proposes a compressed sensing tracking algorithm based on window adaptive adjustment. The method establishes a mathematical model of tracking window adjustment according to the change in the amount of image information. This method can adjust the size of the tracking frame in time, thus improving the effectiveness of the tracking algorithm. The experimental results show that the proposed algorithm achieves better tracking results in scenes where the target scale changes.
Aiming at the problem that the single target feature cannot accurately and comprehensively describe the target model in the compressed sensing tracking algorithm, a multi-feature adaptive fusion based compressed sensing tracking algorithm is proposed by analyzing the common multi-feature fusion method. The method combines texture and color features to adaptively adjust the fusion coefficients in different scenarios, thereby increasing the reliability of describing the target model. The experimental results show that the proposed algorithm achieves good tracking results in various tracking scenarios.
