Abstract
Football matches not only showcase athletes’ skills and spirit but also foster sports culture and social cohesion. This study designs an improved target detection model based on the Faster Region-based Convolutional Neural Network to extract football players’ movement routes. The model enhances feature extraction using residual and feature pyramid networks and optimizes Anchor boxes with binary K-means clustering. A similarity matrix integrating motion, detection, and appearance features is then used for movement route extraction. The results show that the detection model reaches a maximum accuracy of 98.73% and a maximum recall of 98.92%, and that its intersection over union and average precision are also better than those of the comparison models. Meanwhile, in various scenarios, the accuracy of the movement route extraction algorithm is at a high level, and the average frame rate performs well. The results demonstrate the effectiveness of the detection model and extraction algorithm in analyzing player movement, providing valuable insights for tactical analysis.
Introduction
On the passionate and strategic stage of football, every athlete’s movement is like a dynamic brush outlining the ever-changing game on the green field. The movement route of football players not only contains their personal technical and tactical awareness but also reflects the strategic layout and cooperation of the entire team.1,2 Accurately extracting these movement routes can reveal the hidden codes behind football matches and provide a valuable basis for tactical analysis, player performance evaluation, training mode optimization, and many other aspects. It has become a key to exploring the mysteries of football and promoting the development of the football industry.3,4 Therefore, extracting the movement routes of football players is particularly important. Research on this problem mainly starts with target detection and tracking. Commonly used methods currently include wearable sensors, implantable sensors, monocular and multi-visual tracking, and radio frequency identification technology.5,6 In addition, many scholars both domestically and internationally have explored this topic.
Jiang J B et al. built a Transformer single target tracking algorithm relying on spatiotemporal information fusion to improve the accuracy of tracking algorithms under multiple similar targets. Preliminary tracking results were obtained through a MixFormer tracker. A target state calculation module was designed to calculate and store target state information such as acceleration and motion direction. Compared with the benchmark algorithm MixFormer, the Area Under the Curve (AUC) of the designed algorithm increased by 2.8%, the PNorm index increased by 2.6%, and the processing speed reached an average of 28 frames per second.7 Li X et al. designed a hybrid system for detecting and recognizing basketball players’ movements. The system used an enhanced YOLO algorithm to detect players within frames and introduced fuzzy logic for the final basketball action classification. In addition, the study introduced a residual network model for multi-feature extraction to improve the recognition rate of players. The method achieved a recognition accuracy of 99.3% on 8 basketball movements and had good robustness.8 Born Z et al. used a fine-tuned You Only Look Once version 4 (YOLOv4) to detect players and trained a Convolutional Neural Network (CNN) for target tracking to clarify the on-field positions of professional Australian style football players. The target detection and athlete tracking accuracies were 0.94 and 0.98, respectively, demonstrating good performance and providing technical and information support for analyzing the collective team behavior of opponents.9 Wang B designed an improved YOLOv5 and a deep simple online real-time tracking algorithm to enhance the effectiveness of football target detection and tracking. Firstly, the YOLOv5 model was improved using a lightweight network architecture, and an attention mechanism was introduced to enhance feature extraction capability and object detection accuracy. Secondly, an unscented Kalman filter was introduced in deep simple online real-time tracking. The results showed that the average accuracy of object detection using the improved YOLOv5 model exceeded 90%, and the AUC values of the object tracking algorithm in different scenarios all exceeded 85%.10
Ong P et al. built a tracking method using the flower pollination algorithm to address the tracking problem of athletes, aiming to track their movements from sports videos. This method represented the current position of the athlete through a search window and evaluated the hue and saturation within that window. This method had significant advantages in detection rate, tracking accuracy, and processing time, far surpassing existing methods.11 Zhang L et al. designed a computer vision technique to track the movement trajectory of athletes and provide scientific physical data analysis for improving their athletic performance. This technique was based on a universal background eliminator detection model to obtain athlete targets and included a tracking algorithm based on kernel correlation filters. When facing partially occluded targets, this technique also solved the problem by obtaining depth information of moving targets and background segmentation thresholds. The results showed that the designed method could effectively reduce trajectory prediction errors and improve operational efficiency in trajectory tracking.12 Zhong Y et al. designed a model combining a CNN and the Visual Geometry Group 16-layer network for tracking tennis balls. This model could recognize tennis balls in single-frame images, learn patterns from consecutive frames, and locate tennis balls in images of size 640×360. The highest accuracy was 99.6%, and its performance was significantly better than the standard method, with good ball tracking performance.13 Maglo A et al. used an encoder-decoder architecture to extract the position of individual football players on the field. The attention mechanism of the encoder was based on a visual transformer, capturing global features in video frames. In addition, the multiplayer tracker associated player detections between the current frame and the previous frame through binary matching, thereby generating trajectories in frame space. On public datasets, this method outperformed previous works on most metrics.14 Wu C et al. proposed a new online target tracking algorithm based on motion features to track football players. This algorithm extracted a set of standardized local images from the target areas of visible and infrared images as the target convolution filter. It also used the Hue, Saturation, Value color space and a non-uniform quantization algorithm to remove the main color of the stadium and extract histograms of the main colors. The results showed that the algorithm had strong robustness and could better adapt to the player tracking requirements in different scenes of football videos.15
However, current research and methods still face challenges, such as player obstruction, tracking errors, and limited accuracy in high-speed motion scenes. To extract the movement routes of football players, a detection model based on Faster Region-based CNN (Faster R-CNN) is designed. The study also constructs a movement route extraction model on the basis of feature similarity that takes player obstruction into account. The research objective is to optimize the accuracy of player detection and tracking, reduce errors in movement routes, and enhance the accuracy of football tactical analysis.
The objective of the research is to accurately extract the dynamic trajectories of football players during matches, providing strong data support for tactical analysis, performance evaluation, and training optimization, thereby improving the efficiency and depth of match analysis and assisting coaching teams in developing more scientific training plans and match strategies.
The research method is to use deep learning and computer vision technology to analyze football match videos. Firstly, an improved deep learning Faster R-CNN model is used for object detection and recognition, accurately locating football players. Next, by combining feature extraction and data fusion techniques, player position changes are tracked in real-time. Finally, a movement route extraction model based on similarity matrix is constructed, and the Hungarian algorithm is applied to solve the data association problem, extracting football players’ movement routes while dealing with issues such as target occlusion, disappearance, and addition.
The research innovatively combines ResNet50, Feature Pyramid Network (FPN), and binary K-means clustering to optimize the Faster R-CNN model and improve the detection accuracy of football players. In addition, a similarity matrix that integrates motion, detection, and apparent features is constructed, which effectively handles occlusion, reduces tracking errors, and improves the accuracy and robustness of movement route extraction.
Methods and materials
A detection model based on an optimized Faster R-CNN is first designed for extracting the movement routes of football players. Then, a football player movement route extraction model based on a similarity matrix is designed, in which the detection features of the similarity matrix are obtained from the improved Faster R-CNN.
Improved Faster R-CNN model
Target recognition and detection are prerequisites for target tracking. In order to extract the movement routes of football players, the study starts with target recognition and tracking. A football player target detection model on the basis of an optimized Faster R-CNN is designed. A football player movement route extraction model based on a similarity matrix is also constructed. The Faster R-CNN, as a deep learning object detection model, has three parts: a feature extraction network, a Region Proposal Network (RPN), and a Region of Interest (ROI) detection network. It has the advantage of balancing detection accuracy and speed.16–18 However, this model suffers from missed detections and false detections on small targets. To better recognize football player targets, the feature extraction and Anchor box design of the Faster R-CNN model are improved, as displayed in Figure 1.
Figure 1. Structure of the improved Faster R-CNN.
From Figure 1, the model involves input images, Residual Network 50 (ResNet50), FPN, ROI Pooling, RPN, fully connected layers, classification tasks, and bounding box (Bbox) regression. The specific improvements are explained in detail as follows. In the feature extraction network, the original network is Visual Geometry Group Network 16 (VGG16).19 However, in the VGG16 model, the gradient vanishing problem is prone to arise as the number of network layers increases. Therefore, the first improvement of the Faster R-CNN is to replace VGG16 with ResNet50, introduce FPN, and combine ResNet50 and FPN to obtain multi-scale features. ResNet50 is a deep CNN based on the residual network architecture, which has powerful feature extraction ability and can effectively alleviate gradient vanishing.20,21 The output of a residual unit is the sum of the residual mapping learned by its stacked layers and the unit’s input, which is passed through a shortcut connection.
Figure 2. Optimization steps for Anchor box.
From Figure 2, the first step of Anchor box optimization is to input the widths and heights of the bounding boxes, and the second step is to obtain the initial clustering prior boxes. In the third step, the Intersection over Union (IoU) values between all real boxes and prior boxes are computed. The fourth step is to compute the distance between each bounding box and each cluster. The fifth step is to assign each real box to the nearest initial prior box based on the obtained distance, forming one cluster. In the sixth step, the formed cluster is divided into two. The seventh step is to compute the mean of the bounding boxes in each cluster, the eighth step is to select a cluster that can be decomposed and divide it into two, and the ninth step is to determine whether K clusters have been obtained. If so, the Anchor boxes are obtained and output. Otherwise, the process returns to step four. The IoU is displayed in equation (4).25
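The clustering steps above can be illustrated with a minimal sketch. It assumes that the distance between a bounding box and a cluster is 1 − IoU computed on widths and heights only, and that the cluster chosen for division is the one whose boxes lie farthest from their own mean box. Both choices, along with the function names and sample boxes, are assumptions made for illustration and are not taken from the study.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def mean_box(boxes):
    """Mean width and height of the boxes in one cluster."""
    return (sum(w for w, _ in boxes) / len(boxes),
            sum(h for _, h in boxes) / len(boxes))


def split_in_two(boxes, rng, iters=20):
    """Divide one cluster into two with 2-means under the distance 1 - IoU."""
    centers = rng.sample(sorted(set(boxes)), 2)  # two distinct starting boxes
    split = None
    for _ in range(iters):
        groups = ([], [])
        for box in boxes:
            # Larger IoU means smaller distance, so pick the nearer center.
            nearer = 0 if wh_iou(box, centers[0]) >= wh_iou(box, centers[1]) else 1
            groups[nearer].append(box)
        if not groups[0] or not groups[1]:
            break  # keep the last split in which both halves were non-empty
        split = groups
        centers = [mean_box(groups[0]), mean_box(groups[1])]
    return list(split)


def cluster_anchor_boxes(boxes, k, seed=0):
    """Return up to k anchor (width, height) pairs by repeated bisection."""
    rng = random.Random(seed)
    clusters = [list(boxes)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        # Assumed selection rule: split the cluster with the largest total
        # distance between its boxes and its own mean box.
        worst = max(splittable,
                    key=lambda c: sum(1 - wh_iou(b, mean_box(c)) for b in c))
        clusters.remove(worst)
        clusters.extend(split_in_two(worst, rng))
    return sorted(mean_box(c) for c in clusters)


# Hypothetical (width, height) pairs: three distant players, three near ones.
boxes = [(10, 20), (11, 21), (12, 19), (50, 100), (52, 98), (48, 102)]
anchors = cluster_anchor_boxes(boxes, k=2)
```

Using an IoU-based distance rather than Euclidean distance keeps large boxes from dominating the clustering, which matters when most player boxes are small.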
Construction of similarity matrix for target movement route extraction
With the football player detection model in place, multi-target tracking of football players is converted into a data association problem for extracting their movement routes, and constructing a similarity matrix is a crucial step. The similarity matrix quantitatively represents the degree of similarity between different elements. Here, constructing a similarity matrix makes it possible to analyze how the existing trajectory set matches the detected targets. In the similarity matrix, the study considers motion, detection, and apparent features and designs a different method for each. When predicting motion features, the Kalman filter method is adopted in the study. The Kalman filter is a mathematical method that uses linear state equations to filter and estimate a variable. It is often used in target tracking, and the state equation of its linear system is shown in equation (7).26,27
Figure 3. Motion feature prediction process based on Kalman filter.
From Figure 3, the first step in motion feature prediction is to initialize the state vector of the target, which usually includes information such as the initial position and velocity of the target. The second step is to set the velocity information to 0, which is usually used as the starting point for the prediction process. The third step is to calculate the time difference between adjacent frames, that is, the time interval between the current frame and the previous frame. In the fourth step, the current motion state is predicted, using the Kalman filter prediction equation to predict the position and velocity of the target in the current frame based on the previous state and the time difference. In the fifth step, the similarity of motion states is computed. The sixth step is to determine the position information of the target. The seventh step is to refresh the system state of the Kalman filter, which updates the internal state of the filter by using the predicted state vector as a new prior estimate. The eighth step is to output the predicted motion features, including the new position and velocity of the target in the current frame. These predicted motion features are used in subsequent tracking steps, such as data association and trajectory updates. To obtain detection features, the improved Faster R-CNN is applied to obtain multi-scale features, which are used as detection features. Cosine similarity is then adopted to measure the similarity between detection features. However, detection features alone discriminate poorly between similar targets, so the research adds apparent feature acquisition to distinguish similar targets. To obtain apparent features, twin networks are used in the study. Twin networks, also known as Siamese neural networks, are mainly used for tasks that involve learning the similarity or correlation between two inputs. A twin network is typically composed of two identical sub-networks with the same parameters and weights.29,30 The sub-network in the twin network is shown in Figure 4.
Figure 4. Structure of sub-networks in twin networks.
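As an illustration of the prediction step and of the cosine similarity described above, the following sketch assumes a constant-velocity state [x, y, vx, vy] and a single process-noise value q added to the diagonal of the covariance. The state layout, the noise value, and all function names are assumptions for illustration, since the state equation of the study is not reproduced here.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def kalman_predict(state, cov, dt, q=1e-2):
    """Predict the next state and covariance for a constant-velocity target.

    state is [x, y, vx, vy]; cov is its 4x4 covariance; dt is the time
    difference between adjacent frames (step three of the flow).
    """
    f = [[1, 0, dt, 0],
         [0, 1, 0, dt],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
    new_state = [sum(f[i][k] * state[k] for k in range(4)) for i in range(4)]
    new_cov = matmul(matmul(f, cov), transpose(f))  # F P F^T
    for i in range(4):
        new_cov[i][i] += q                          # assumed process noise
    return new_state, new_cov


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
predicted, predicted_cov = kalman_predict([10, 5, 2, -1], identity, dt=0.5)
```

In this example a target at (10, 5) moving with velocity (2, −1) is predicted at (11, 4.5) half a time unit later, while the velocity estimate is carried over unchanged.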
From Figure 4, the sub-network involves an input layer, five feature extraction networks, hidden units, and an output layer. The input layer involves convolutional layers and max pooling. “32@64×64” indicates that the feature extraction network has 32 feature maps, and each feature map is 64×64. The output of this network is a multidimensional vector, which is the apparent feature. The loss function is shown in equation (9).
Design of target movement route extraction algorithm
The study constructs a similarity matrix. By solving this matrix, the movement routes of football players can be extracted. To design the algorithm for extracting the target movement route, the Hungarian algorithm is applied to solve the data association problem, namely, the similarity matrix, and target obstruction is considered. The Hungarian algorithm is a combinatorial optimization algorithm for solving task allocation problems. It guarantees the optimal solution to the allocation problem and has wide applicability.31,32 The main process of the Hungarian algorithm is displayed in Figure 5.
Figure 5. Main process of Hungarian algorithm.
From Figure 5, first, the minimum value of each row is subtracted from every element of that row of the coefficient matrix. Second, the minimum value of each column is subtracted from every element of that column. The third step is to determine whether the minimum number of lines needed to cover all 0 elements is equal to the order of the matrix. If it is equal, the minimum-cost matching is found and output based on the positions of the 0 elements in the matrix. Otherwise, more 0 elements are created and the process returns to the second step. In this algorithm, the original benefit matrix is converted into a reduction matrix, as shown in equation (14).
Figure 6. Processing flow for target obstruction.
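The Hungarian procedure described above can be sketched as follows. The first function performs the row and column reduction. The second solves the assignment with the potentials-based form of the Hungarian algorithm, which maintains the same reduced costs implicitly instead of drawing covering lines. Converting a similarity matrix to a cost matrix as 1 − similarity is an assumption for illustration, as are all names and sample values.

```python
def reduce_rows_and_columns(cost):
    """Subtract each row minimum, then each column minimum."""
    rows = [[c - min(row) for c in row] for row in cost]
    col_min = [min(col) for col in zip(*rows)]
    return [[c - m for c, m in zip(row, col_min)] for row in rows]


def hungarian(cost):
    """Minimum-cost assignment of rows to columns; needs rows <= columns.

    Returns a sorted list of (row, column) pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials
    v = [0.0] * (m + 1)      # column potentials
    p = [0] * (m + 1)        # p[j] = row matched to column j (1-based, 0 = free)
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta   # this update creates new zero entries
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                      # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
reduced = reduce_rows_and_columns(cost)
matches = hungarian(cost)

# Hypothetical similarity matrix: rows are trajectories, columns detections.
similarity = [[0.9, 0.2], [0.1, 0.8]]
sim_matches = hungarian([[1 - s for s in row] for row in similarity])
```

Because the algorithm minimizes total cost, a similarity matrix has to be converted to costs first. Subtracting each entry from 1 is one simple option when similarities lie in [0, 1].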
From Figure 6, the processing flow for target obstruction involves obtaining the trajectory of the disappearing target, determining whether the distance is less than the threshold, determining whether the area ratio is less than the threshold, and adding the target to the trajectory set. The threshold values used for the area ratio and distance are based on relevant research.34,35 By handling the three special situations of target obstruction, disappearance, and addition during movement route extraction, the tracking performance of the algorithm can be improved. Finally, the main process of the designed algorithm for extracting the target movement routes of football players is shown in Figure 7.
Figure 7. Main process of the target movement route extraction algorithm for football players.
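The two threshold checks in the obstruction flow can be sketched as a single predicate. The sketch assumes boxes in (x1, y1, x2, y2) form, measures distance between box centers, and defines the area ratio as the larger area divided by the smaller one. These definitions and the threshold values in the example are hypothetical, since the study takes its thresholds from prior work without listing them here.

```python
import math


def center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def area(box):
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def can_resume(last_box, det_box, max_dist, max_area_ratio):
    """Decide whether a detection may continue the trajectory of a lost target.

    Both checks from the flow must pass: the center distance and the area
    ratio must each stay below their thresholds.
    """
    (ax, ay), (bx, by) = center(last_box), center(det_box)
    dist = math.hypot(ax - bx, ay - by)
    small, large = sorted((area(last_box), area(det_box)))
    if small == 0:
        return False  # a degenerate box cannot be matched
    return dist < max_dist and large / small < max_area_ratio


last_seen = (100, 100, 120, 160)   # last box of the disappearing target
nearby = (104, 103, 124, 163)      # same size, center shifted by 5 pixels
far_away = (300, 100, 320, 160)    # same size, 200 pixels away
much_larger = (100, 100, 160, 280) # nine times the area
```

A detection that passes both checks is appended to the lost trajectory, and a detection that fails either check is left for the new-target step.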
From Figure 7, first, the overall similarity matrix is solved. Second, the association between the trajectory set and the object detection set is calculated. The third step is to determine whether each association is successful based on a threshold. If successful, the detected target is added to the corresponding trajectory. Otherwise, the association is canceled. The fourth step is to determine whether any unassociated trajectories remain. If not, the process goes to the seventh step. Otherwise, the target obstruction processing method is used to handle the remaining unassociated trajectories. The fifth step is to handle disappearing targets, and the sixth step is to handle newly added targets. The seventh step is to output the updated trajectory set, which is the set of target movement routes, and end the process.
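The association flow above can be sketched in two small functions. The first fuses the three feature similarities into one matrix as a weighted sum, which is an assumed form because the fusion equation of the study is not reproduced here. The second applies the threshold test to pairs returned by an assignment solver and reports which trajectories and detections remain unassociated. The weights, threshold, and names are hypothetical.

```python
def fuse_similarity(motion, detection, appearance, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of three similarity matrices of identical shape."""
    wm, wd, wa = weights
    return [[wm * m + wd * d + wa * a for m, d, a in zip(mr, dr, ar)]
            for mr, dr, ar in zip(motion, detection, appearance)]


def gate_associations(similarity, pairs, threshold):
    """Keep assigned pairs whose similarity reaches the threshold.

    similarity: rows are trajectories, columns are detections.
    pairs: (trajectory, detection) index pairs from an assignment solver.
    Returns (matched pairs, unassociated trajectories, unassociated detections).
    """
    matched = [(t, d) for t, d in pairs if similarity[t][d] >= threshold]
    seen_t = {t for t, _ in matched}
    seen_d = {d for _, d in matched}
    lost = [t for t in range(len(similarity)) if t not in seen_t]
    new = [d for d in range(len(similarity[0])) if d not in seen_d]
    return matched, lost, new


motion = [[0.9, 0.1], [0.2, 0.3]]
detection = [[0.8, 0.2], [0.1, 0.4]]
appearance = [[1.0, 0.0], [0.3, 0.2]]
fused = fuse_similarity(motion, detection, appearance, weights=(0.4, 0.3, 0.3))
matched, lost, new = gate_associations(fused, [(0, 0), (1, 1)], threshold=0.5)
```

In this example the second pair is assigned by the solver but rejected by the threshold, so trajectory 1 goes on to the obstruction and disappearance handling, and detection 1 is treated as a candidate new target.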
Results
To validate the improved Faster R-CNN and the target movement route extraction algorithm, the experimental environment is set up, the model parameters are specified, and the dataset used in the experiment is described. The comparison models and evaluation indicators used in the experiment are also explained.
Performance validation of improved Faster R-CNN
Table 1. Ablation experimental results of the improved Faster R-CNN.
From Table 1, when the Faster R-CNN model was not improved, the corresponding mAP, recall, and MSE values were 0.701, 68.72%, and 3.567, respectively, indicating no significant advantage. The model optimized by combining ResNet50, FPN, and Anchor box outperformed other single or incomplete improvement combinations in mAP, recall, and MSE values. For example, the mAP of the model optimized with ResNet50, FPN, and Anchor box was 0.987, which was 0.035 and 0.074 higher than that of the model optimized with ResNet50 and FPN and that of the model optimized with ResNet50 alone, respectively. Overall, the improvements made to the Faster R-CNN in the study are effective and allow football player targets to be recognized reliably. The accuracy and IoU comparison of different models are shown in Figure 8.
Figure 8. Comparison of accuracy and intersection over union ratio of different models. (a) Comparison of accuracy. (b) Comparison of intersection over union.
From Figure 8(a), the maximum accuracy of the proposed model was 98.73%, and the minimum was 95.13%. The maximum accuracy of YOLOv5, Faster R-CNN model, combination model A, combination model B, and FPGA model was 89.71%, 87.11%, 93.55%, 94.79%, and 93.21%, respectively, which were 9.02, 11.62, 5.18, 3.94, and 5.52 percentage points lower than 98.73%. From Figure 8(b), in the comparison of IoU, the value of the proposed method was closest to 1, with an average IoU of 0.954. The average IoU of the other five comparison models was 0.843, 0.801, 0.912, 0.925, and 0.903, all of which were lower than 0.954. Overall, the improved Faster R-CNN performs better. This may be because the ResNet50 model introduced in the study alleviated the gradient vanishing problem of the original Faster R-CNN, FPN was introduced to obtain multi-scale features, and the Anchor box was optimized. The recall and precision are displayed in Figure 9.
Figure 9. Comparison of recall and precision of different models. (a) Comparison of recall. (b) Comparison of precision.
From Figure 9(a), when comparing the recall rates of different models, the proposed method performed best, followed by combination model B, combination model A, and the FPGA model, while the Faster R-CNN and YOLOv5 performed worse. The maximum recall rate of the proposed method was 98.92%, and the minimum was 96.33%. The maximum recall rates of the YOLOv5, Faster R-CNN, combination model A, combination model B, and FPGA model were 90.23%, 85.76%, 93.79%, 95.40%, and 92.66%, respectively. From Figure 9(b), the average precision of the proposed model and the five comparison models was 98.21%, 88.23%, 86.42%, 91.13%, 92.85%, and 90.04%, respectively. Compared with the proposed method, the precision of the other models was 9.98, 11.79, 7.08, 5.36, and 8.17 percentage points lower, respectively. Overall, the improved Faster R-CNN performs better, with good precision and recall in target recognition for football players, providing a solid technical foundation for extracting their movement routes. This good performance also depends on the introduced ResNet50 model, FPN, and binary K-means.
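For reference, precision and recall for one frame can be computed as in the following sketch. It assumes that a detection counts as a true positive when its IoU with a not-yet-matched ground-truth box is at least 0.5 and that detections are processed in the order given. The threshold, the greedy matching, and the sample boxes are assumptions for illustration and are not taken from the study.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of detections to ground-truth boxes."""
    remaining = list(ground_truth)
    true_pos = 0
    for det in detections:
        best = max(remaining, key=lambda g: box_iou(det, g), default=None)
        if best is not None and box_iou(det, best) >= iou_threshold:
            remaining.remove(best)   # each ground-truth box is used once
            true_pos += 1
    false_pos = len(detections) - true_pos
    false_neg = len(remaining)
    precision = true_pos / (true_pos + false_pos) if detections else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    return precision, recall


truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
found = [(1, 1, 10, 10), (50, 50, 60, 60)]
precision, recall = precision_recall(found, truth)
```

Here one detection overlaps a labeled player with an IoU of 0.81 and the other matches nothing, so one labeled player is missed and both precision and recall equal 0.5.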
The improved Faster R-CNN model outperforms the comparison models on accuracy and IoU, as well as recall and precision. This may be because ResNet50 is introduced to alleviate the gradient vanishing problem of the original Faster R-CNN model, FPN is introduced to obtain multi-scale features, and the Anchor boxes are optimized. The multi-scale features obtained by FPN help the model maintain high detection accuracy on targets of different sizes, while the optimized Anchor boxes reduce false positives and false negatives, making the detection results more accurate. These may be the reasons for the performance gap between the improved Faster R-CNN model and the comparison models.
Performance verification of target movement route extraction algorithm
Table 2. Performance changes of models under different weight values.
Table 3. Distribution of the soccer dataset.
From Table 3, there were 80 video sequences in this dataset, with 19,908 frames in total. The resolution of the videos in this dataset is 624×352. The experiment marked the positions of 80 players and recorded them in the text file groundtruth.txt. In addition, the obstruction frequency in the dataset lies in the range [1,7]. Meanwhile, according to the degree of obstruction, the data were divided into partial obstruction and complete obstruction. The convergence comparison of different models is shown in Figure 10.
Figure 10. Comparison of convergence of different models.
From Figure 10, the proposed movement route extraction model outperformed the comparison models in terms of convergence speed and final performance. In the early stages of training, the loss value of the model decreased rapidly and dropped to 0.018 within the first 30 iterations, demonstrating high learning efficiency. Afterwards, the loss value of the model remained stable at a minimum of 0.016, demonstrating its excellent convergence performance. In contrast, other models such as DLT and Siamese had a slower decrease in loss values, and their final loss values were also higher than that of the research model, indicating that their convergence was relatively poor. Overall, the research model converges faster and more stably during the training process. The accuracy comparison of different models in different scenes is shown in Figure 11.
Figure 11. Comparison of accuracy of various models in different scenes. (a) Unobstructed scene. (b) Same team players obstructing the scene. (c) The scene is obstructed by players from different teams. (d) Mixed dense scene.
Table 4. Comparison of maximum trajectory displacement errors of different models in different scenes (m).
From Table 4, in different scenes, the maximum trajectory displacement error of the designed movement route extraction algorithm was significantly smaller than that of the comparison models. For example, in a mixed dense scene, the maximum trajectory displacement error of the designed movement route extraction algorithm ranged over [0.41 m, 0.79 m], with an average of 0.548 m. The average maximum trajectory displacement errors of the comparison models were 3.675 m, 2.067 m, 2.405 m, 1.190 m, and 0.805 m, which were 3.127 m, 1.519 m, 1.857 m, 0.642 m, and 0.257 m higher than that of the research method, respectively. In summary, the trajectory displacement error of the designed movement route extraction algorithm is smaller, and the extracted football player’s movement route is closer to the actual trajectory, resulting in better performance. The tracking speeds of different models in different scenes are shown in Figure 12.
Figure 12. Comparison of tracking speed of different models in different scenes. (a) Unobstructed scene. (b) Same team players obstructing the scene. (c) The scene is obstructed by players from different teams. (d) Mixed dense scene.
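The maximum trajectory displacement error can be illustrated with a short sketch. It assumes that the error of one sequence is the largest Euclidean distance, in meters, between the extracted position and the labeled position at the same frame, and that the per-scene value is the mean over sequences. This definition and the sample coordinates are assumptions for illustration.

```python
import math


def max_displacement_error(extracted, actual):
    """Largest per-frame distance between two trajectories of (x, y) points."""
    if len(extracted) != len(actual) or not extracted:
        raise ValueError("trajectories must be non-empty and of equal length")
    return max(math.hypot(ex - ax, ey - ay)
               for (ex, ey), (ax, ay) in zip(extracted, actual))


def mean_max_error(sequences):
    """Average of the maximum errors over (extracted, actual) sequence pairs."""
    errors = [max_displacement_error(e, a) for e, a in sequences]
    return sum(errors) / len(errors)


# Hypothetical field coordinates in meters for two short sequences.
seq_a = ([(0, 0), (1, 1), (2, 2)], [(0, 0), (1, 1.3), (2.4, 2.3)])
seq_b = ([(5, 5), (6, 5)], [(5, 5), (6, 5.1)])
worst_a = max_displacement_error(*seq_a)
scene_error = mean_max_error([seq_a, seq_b])
```

For the first sequence the largest deviation occurs at the last frame, where offsets of 0.4 m and 0.3 m give an error of 0.5 m.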
From Figure 12(a), in an unobstructed scene, the average frame rates of the designed extraction algorithm and the comparison models were 1.23 fps, 7.45 fps, 6.39 fps, 1.37 fps, 2.17 fps, and 2.35 fps, respectively. The designed movement route extraction algorithm has a smaller average frame rate, indicating that its tracking speed is faster. According to Figure 12(b), in the scene obstructed by same-team players, the average frame rate of the designed movement route extraction algorithm was 1.36 fps, which was significantly lower than those of the comparison models. From Figure 12(c), in the scene obstructed by players from different teams, the designed movement route extraction algorithm has a lower average frame rate, with a value of 1.05 fps. The DLT model and Siamese model had higher average frame rates, with values of 7.68 fps and 6.82 fps, respectively. According to Figure 12(d), in a mixed dense scene, the average frame rates of the six models were 1.17 fps, 7.98 fps, 6.11 fps, 1.53 fps, 2.09 fps, and 2.67 fps, respectively. In summary, the designed movement route extraction algorithm has faster tracking speeds and can extract the movement routes of football players more quickly. The current frame rate of the movement route extraction model designed in the paper was relatively low (1.05 fps to 1.36 fps), indicating that there was still room for improvement in processing speed and real-time applications. The low frame rate is mainly due to the complexity of feature extraction and data association calculation in this model, resulting in longer processing time. This also indicates that the model is mainly suitable for offline tactical analysis, as offline analysis does not require high real-time performance, and the accuracy of the model is relatively high. In addition, the model can be optimized for real-time use in the following ways. The first is to optimize the model structure, for example through pruning and quantization, to reduce computational complexity. The second is to adopt more efficient feature extraction algorithms. The third is to use hardware acceleration technology, such as graphics processor acceleration. These optimizations can increase frame rates and bring model performance closer to the requirements of real-time applications, especially in situations where the real-time requirement is not very strict. The AUC of different models in different scenes is shown in Figure 13.
Figure 13. Comparison of AUC of different models in different scenes. (a) Unobstructed scene. (b) Same team players obstructing the scene. (c) The scene is obstructed by players from different teams. (d) Mixed dense scene.
From Figure 13(a), in unobstructed scenes, the maximum AUC was achieved by the research algorithm, with a value of 0.987. The AUC values of DLT, Siamese, CCOT, combination model C, and combination model D were 0.627, 0.823, 0.759, 0.923, and 0.956, respectively. From Figure 13(b), in the scene obstructed by same-team players, the designed algorithm still had significant advantages in AUC, with a value that was 0.362, 0.171, 0.187, 0.053, and 0.038 higher than those of the comparison models, respectively. In Figure 13(c), the DLT model had the lowest AUC, with a value of 0.617. The designed algorithm had the highest AUC, with a value of 0.991. According to Figure 13(d), in a mixed dense scene, the AUC values of the designed movement route extraction algorithm and the comparison models were 0.993, 0.625, 0.845, 0.766, 0.930, and 0.952, respectively. The research algorithm has better AUC values in different scenes, indicating better performance.
The performance of the target movement route extraction algorithm is superior to the comparison model in terms of accuracy and trajectory displacement error, as well as tracking speed and AUC value. This may be because the algorithm constructs a similarity matrix that integrates motion, detection, and appearance features and uses the Hungarian algorithm to solve the data association problem. The constructed similarity matrix enables the algorithm to comprehensively consider the position changes, apparent features, and motion trends of the target, thereby more accurately matching and tracking the target. The Hungarian algorithm effectively handles problems such as target occlusion, disappearance, and addition, improving the accuracy and robustness of tracking. In addition, the algorithm has developed specialized processing strategies for occlusion situations, such as using Boolean labels to determine the disappearance of targets and initializing new target trajectories, further improving tracking performance in complex scenes. These improvement measures work together to make the target movement route extraction algorithm superior to the comparison model in all performance indicators.
Practical application
The improved Faster R-CNN model and target movement route extraction algorithm designed in this study have significant value in practical applications, especially for football match analysis and tactical research. The improved Faster R-CNN model, which combines ResNet50 and FPN structures, achieves a leap in detection accuracy and feature extraction capability, with fast detection speed and strong stability, effectively supporting real-time event analysis. The target movement route extraction algorithm innovatively integrates a similarity matrix with multiple features and the Hungarian algorithm, effectively solving the data association problem and maintaining high tracking accuracy in complex scenes. However, there is still room for improvement in the real-time performance of the model, and the current processing speed is more suitable for offline tactical analysis. In practical applications, high-performance hardware is required to accelerate data processing, and algorithm parameters should be optimized based on the competition environment to enhance their applicability.
In addition, by introducing ResNet50 and FPN, the improved Faster R-CNN model extracts multi-scale features and adapts well to targets of different sizes, which should enable it to detect targets in other sports scenes (such as badminton and basketball) and suggests good universality. The model is expected to be robust to camera angle changes, mainly because the FPN structure captures multi-scale features, which to some extent offsets the impact of angle changes. The algorithm for extracting target movement routes relies on a similarity matrix constructed by integrating motion, detection, and apparent features, which can adapt to tracking requirements in different motion scenarios. Even when facing changes in camera angles, it can achieve more accurate target tracking by balancing these features. Therefore, the improved model is expected to generalize well across different sports and camera angles.
Discussion and conclusion
This study designed a target detection method based on an optimized Faster R-CNN for extracting the movement routes of football players, together with a movement route extraction model based on a similarity matrix. In the ablation experiment, the detection model that combined the ResNet50, FPN, and Anchor box optimizations was significantly better in mAP, recall, and MSE than the variants with only one or some of these improvements. Its mAP was 0.035 higher than that of the ResNet50 + FPN combination and 0.074 higher than that of the ResNet50-only model. This indicates that each improvement contributes and that the model can reliably recognize football players. Meanwhile, the maximum accuracy of the model was 98.73%, which was 9.02%, 11.62%, 5.18%, 3.94%, and 5.52% higher than the five comparison models, respectively. The movement route extraction algorithm performed well in both unobstructed and obstructed scenes, and its accuracy and average frame rate were significantly better than those of the comparison models. The average values of its maximum trajectory displacement error in the different scenes were 0.586, 0.594, 0.535, and 0.548, respectively, all lower than those of the comparison models, so the extracted movement routes were closer to the actual trajectories. Overall, the designed detection model and trajectory extraction algorithm can detect football players and extract their movement routes quickly and accurately.
Limitations
This research has several shortcomings. First, the real-time performance of the movement route extraction model is insufficient, so it is currently suitable only for offline tactical analysis. Second, identity information is not sufficiently differentiated: the detection model cannot distinguish roles such as players and referees, which limits its application in complex football scenes. Third, the adaptability of the model needs to be enhanced, because detection accuracy and tracking quality may degrade in complex and changeable competition scenes. Fourth, feature fusion is insufficient. Although a similarity matrix has been constructed to fuse multiple features, the fusion method and its depth can still be improved, and the gain in tracking performance is limited in situations such as occlusion.
Future research can proceed in several directions. First, the model structure can be optimized, more efficient feature extraction algorithms adopted, or hardware acceleration used to improve processing speed and meet real-time application requirements. Second, an identity recognition mechanism, combined with an identity preservation algorithm, can be introduced so that different roles such as players and referees are accurately distinguished and tracked over the long term. Third, multi-view integration can be studied: fusing data from different camera perspectives would overcome the limitations of a single view, improve tracking accuracy and robustness, and broaden the perspectives available for tactical analysis. Fourth, Transformer-based tracking methods can be explored, using the self-attention mechanism to better model changes in player features and improve tracking accuracy and stability. Fifth, a scenario-adaptive model that incorporates scene information and adjusts its strategy automatically can be developed to enhance generality and adaptability. Sixth, deep feature fusion methods can be investigated further to mine correlations between features, improve tracking performance in complex situations, and provide deeper insights for football match analysis.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
