Football player movement route extraction based on improved Faster R-CNN and similarity matrix

Abstract

Football matches not only showcase athletes’ skills and spirit but also foster sports culture and social cohesion. This study designs a target detection model based on an improved Faster Region-based Convolutional Neural Network (Faster R-CNN) to extract football players’ movement routes. The model strengthens feature extraction with a residual network and a feature pyramid network and optimizes the Anchor boxes with binary K-means clustering. A similarity matrix integrating motion, detection, and appearance features is then used to extract the movement routes. The results show that the detection model achieves high accuracy and recall and outperforms the comparison models in intersection over union and average precision. In various scenarios, the movement route extraction algorithm maintains high accuracy and a good average frame rate. These results demonstrate the effectiveness of the detection model and extraction algorithm in analyzing player movement and provide valuable insights for tactical analysis.

Keywords

Introduction

On the passionate and strategic stage of football, every athlete’s movement is like a dynamic brush outlining the ever-changing game on the green field. Football players’ movement routes reflect not only each player’s technical and tactical awareness but also the strategic layout and coordination of the whole team.^1,2 Accurately extracting these routes can reveal the hidden patterns behind football matches and provide a valuable basis for tactical analysis, player performance evaluation, training optimization, and many other applications; it has become key to understanding football and promoting the development of the football industry.^3,4 Research on this task mainly starts from target detection and tracking. Commonly used methods include wearable sensors, implantable sensors, monocular and multi-camera visual tracking, and radio frequency identification (RFID).^5,6 Many scholars, both domestically and internationally, have also explored this topic.

Jiang J B et al. built a Transformer-based single-target tracking algorithm that fuses spatiotemporal information to improve tracking accuracy when multiple similar targets are present. Preliminary tracking results were obtained with a MixFormer tracker, and a target state calculation module was designed to compute and store target state information such as acceleration and motion direction. Compared with the baseline MixFormer, the Area Under the Curve (AUC) of the algorithm increased by 2.8%, the normalized precision (P_Norm) increased by 2.6%, and the processing speed reached 28 frames per second.⁷ Li X et al. designed a hybrid system for detecting and recognizing basketball players’ movements. The system used an enhanced YOLO algorithm to detect players in each frame and fuzzy logic for the final classification of basketball actions; a residual network model was also introduced for multi-feature extraction to improve player recognition. The method achieved a recognition accuracy of 99.3% on eight basketball movements and showed good robustness.⁸ Born Z et al. used a fine-tuned You Only Look Once version 4 (YOLOv4) to detect players and trained a Convolutional Neural Network (CNN) for target tracking to determine the on-field positions of professional Australian football players. The target detection and player tracking accuracies were 0.94 and 0.98, respectively, providing technical and informational support for analyzing the collective behavior of opposing teams.⁹ Wang B combined an improved YOLOv5 with the Deep Simple Online and Realtime Tracking (DeepSORT) algorithm to improve football target detection and tracking. First, YOLOv5 was rebuilt on a lightweight network architecture, and an attention mechanism was introduced to strengthen feature extraction and detection accuracy. Second, an unscented Kalman filter was introduced into DeepSORT. The average accuracy of object detection with the improved YOLOv5 exceeded 90%, and the AUC of the tracking algorithm exceeded 85% in all tested scenarios.¹⁰

Ong P et al. built a tracking method based on the flower pollination algorithm to track athletes’ movements in sports videos. The method represented the athlete’s current position with a search window and evaluated the hue and saturation within that window; it clearly outperformed existing methods in detection rate, tracking accuracy, and processing time.¹¹ Zhang L et al. designed a computer vision technique that tracks athletes’ movement trajectories to provide scientific physical data for improving athletic performance. It used a universal background subtraction model to detect athletes and a tracking algorithm based on kernelized correlation filters; for partially occluded targets, it obtained the depth information of moving targets and the background segmentation thresholds. The method effectively reduced trajectory prediction errors and improved the efficiency of trajectory tracking.¹² Zhong Y et al. combined a CNN with the Visual Geometry Group 16-layer network (VGG16) to track tennis balls. The model could recognize tennis balls in single frames, learn patterns from consecutive frames, and locate the ball in 640×360 images. Its highest accuracy was 99.6%, significantly better than the standard method.¹³ Maglo A et al. used an encoder-decoder architecture to extract the positions of individual football players on the field. The attention mechanism of the encoder was based on a vision transformer, which captures global features in video frames. A multi-player tracker then associated player detections between the current and previous frames through bipartite matching, generating trajectories in frame space. On public datasets, this method outperformed previous work on most metrics.¹⁴ Wu C et al. proposed an online target tracking algorithm based on motion features to track football players. The algorithm extracted a set of normalized local image patches from the target regions of visible and infrared images as convolution filters for the target. It also used the Hue, Saturation, Value (HSV) color space and a non-uniform quantization algorithm to remove the dominant color of the pitch and extract histograms of the main colors. The algorithm proved robust and adapted well to player tracking in different football video scenes.¹⁵

However, current research still faces challenges such as player occlusion, tracking errors, and reduced accuracy in high-speed motion scenes. To extract the movement routes of football players, this study designs a detection model based on Faster Region-based CNN (Faster R-CNN) and constructs a movement route extraction model based on feature similarity that accounts for player occlusion. The aim is to improve the accuracy of player detection and tracking, reduce errors in the extracted movement routes, and thereby support more accurate football tactical analysis.

The objective of the research is to accurately extract the dynamic trajectories of football players during matches, providing strong data support for tactical analysis, performance evaluation, and training optimization, thereby improving the efficiency and depth of match analysis and assisting coaching teams in developing more scientific training plans and match strategies.

The research method uses deep learning and computer vision to analyze football match videos. First, an improved Faster R-CNN model is used for object detection and recognition to locate football players accurately. Next, feature extraction and data fusion are combined to track changes in player positions in real time. Finally, a movement route extraction model based on a similarity matrix is constructed, and the Hungarian algorithm is applied to solve the data association problem, extracting the players’ movement routes while handling target occlusion, disappearance, and the appearance of new targets.

The research innovatively combines ResNet50, Feature Pyramid Network (FPN), and binary K-means clustering to optimize the Faster R-CNN model and improve the detection accuracy for football players. In addition, a similarity matrix that integrates motion, detection, and appearance features is constructed to address occlusion, reduce tracking errors, and improve the accuracy and robustness of movement route extraction.

Methods and materials

A detection model based on an optimized Faster R-CNN is first designed to detect football players. A movement route extraction model based on a similarity matrix is then designed, in which the detection features used by the similarity matrix are obtained from the improved Faster R-CNN.

Improved Faster R-CNN model

Target recognition and detection are prerequisites for target tracking, so the study first designs a football player detection model based on an optimized Faster R-CNN and then constructs a movement route extraction model based on a similarity matrix. Faster R-CNN is a deep learning object detection model with three parts: a feature extraction network, a Region Proposal Network (RPN), and a Region of Interest (ROI) detection network. It offers a good balance between detection accuracy and speed.^16–18 However, it is prone to missed and false detections of small targets. To recognize football players better, the feature extraction network and the Anchor box design of Faster R-CNN are improved, as displayed in Figure 1.

Figure 1.

Structure of the improved Faster R-CNN.

From Figure 1, the model consists of the input image, Residual Network 50 (ResNet50), FPN, RPN, ROI Pooling, fully connected layers, and the classification and bounding-box (Bbox) regression branches. The specific improvements are as follows. The original feature extraction network is Visual Geometry Group Network 16 (VGG16).¹⁹ However, as the number of layers in VGG16 increases, the vanishing gradient problem readily arises. The first improvement is therefore to replace VGG16 with ResNet50 and to combine ResNet50 with FPN to obtain multi-scale features. ResNet50 is a deep CNN built on the residual network architecture; it has strong feature extraction ability and effectively alleviates vanishing gradients.^20,21 The output $H(x)$ of a residual unit is shown in equation (1).²²

H(x) = F(x) + x

(1)

In equation (1), $x$ is the input and $F(x)$ is the residual mapping to be fitted. The output feature obtained when propagating from a shallow unit $l$ to a deep unit $L$ is shown in equation (2).

x_{L} = x_{l} + \sum_{i = l}^{L - 1} F(x_{i}, w_{i})

(2)

In equation (2), $i$ is the index of the residual unit and $w_{i}$ is the weight of the $i$-th residual unit. Because the identity term $x_{l}$ carries the signal and its gradient directly between shallow and deep units, ResNet50 alleviates the vanishing gradient problem faced by VGG16. FPN consists of feature acquisition, feature fusion, and multi-scale merging. During feature fusion and multi-scale merging, the study applies 2× up-sampling so that the feature maps to be fused have consistent sizes, and the results are merged with the outputs of the corresponding ResNet50 layers at multiple scales. A convolution is then applied to reduce the aliasing caused by up-sampling, and the final feature maps are output. The pyramid level $G$ whose feature map is used for a given ROI is given by equation (3).

G = G_{sum} + \log_{2} \frac{\sqrt{a h}}{m}

(3)

In equation (3), $a$ and $h$ are the width and height of the ROI, $G_{sum}$ is the reference level of the output feature maps, and $m$ is the reference image size; larger ROIs are therefore assigned to coarser pyramid levels. The second improvement of Faster R-CNN is to optimize the Anchor boxes with binary (bisecting) K-means clustering. Its basic idea is to split the dataset step by step into smaller subsets, each split dividing one cluster into two sub-clusters.^23,24 The algorithm reduces the risk of converging to a poor local optimum, is easy to understand and implement, and suits clusters of different shapes and sizes. The optimization steps for the Anchor box are shown in Figure 2.

Figure 2.

Optimization steps for Anchor box.

From Figure 2, the first step of Anchor box optimization is to input the widths and heights of the labeled bounding boxes, and the second is to obtain the initial cluster prior boxes. Third, the Intersection over Union (IoU) between every ground-truth box and every prior box is computed. Fourth, the distance between each bounding box and each cluster is calculated. Fifth, each ground-truth box is assigned to its nearest prior box according to this distance, forming clusters. Sixth, the resulting cluster is split into two. Seventh, the mean width and height of the boxes in each cluster are computed. Eighth, a cluster that can still be decomposed is selected and split into two. Ninth, the algorithm checks whether K clusters have been obtained. If so, the Anchor boxes are output; otherwise, the process returns to the fourth step. The IoU is displayed in equation (4).²⁵

\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}

(4)

In equation (4), $A$ is the ground-truth box, $B$ is the prior box, and $\cap$ and $\cup$ denote the intersection and union operations, respectively. In the eighth step, the Sum of Squared Errors (SSE) of each cluster is used to decide which cluster to split. The $SSE_{R}$ of the Anchor box width $R$ is shown in equation (5).

SSE_{R} = \sum_{j = 1}^{P} C_{j} {(R_{j} - \bar{R})}^{2}

(5)

In equation (5), $C_{j}$ is the weight, $j$ is the index of the Anchor box, $P$ is the total number of Anchor boxes, and $\bar{R}$ is the mean width. The $SSE_{D}$ of the Anchor box height $D$ is shown in equation (6).

SSE_{D} = \sum_{j = 1}^{P} C_{j} {(D_{j} - \bar{D})}^{2}

(6)

In equation (6), $\bar{D}$ is the mean height.
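The anchor optimization of Figure 2 and equations (4) to (6) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the study’s implementation: boxes are compared by width and height only (aligned at a common corner), the distance between a box and a cluster centre is taken as 1 − IoU, the cluster with the largest $SSE_{R} + SSE_{D}$ is the one bisected, and all weights $C_{j}$ default to 1. The function names are hypothetical.

```python
import numpy as np


def wh_iou(boxes, centers):
    """Equation (4) for (w, h) pairs aligned at a common corner; shape (n, k)."""
    inter = (np.minimum(boxes[:, None, 0], centers[None, :, 0])
             * np.minimum(boxes[:, None, 1], centers[None, :, 1]))
    areas_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    areas_c = (centers[:, 0] * centers[:, 1])[None, :]
    return inter / (areas_b + areas_c - inter)


def weighted_sse(values, weights):
    """Equations (5)/(6): sum of C_j * (value_j - mean)^2 over one cluster."""
    return float(np.sum(weights * (values - values.mean()) ** 2))


def bisect(boxes, iters=50):
    """Split one cluster into two with 2-means under the 1 - IoU distance."""
    # Deterministic start: the first box and the box least similar to it.
    far = np.argmin(wh_iou(boxes, boxes[:1])[:, 0])
    labels = np.argmax(wh_iou(boxes, np.stack([boxes[0], boxes[far]])), axis=1)
    for _ in range(iters):
        centers = np.stack([boxes[labels == c].mean(axis=0) for c in (0, 1)])
        new = np.argmax(wh_iou(boxes, centers), axis=1)  # nearest = highest IoU
        if np.array_equal(new, labels) or len(np.unique(new)) < 2:
            break  # converged, or one side would become empty
        labels = new
    return labels


def bisecting_kmeans_anchors(boxes, k, weights=None):
    """Return up to k anchor (w, h) pairs, sorted by area."""
    boxes = np.asarray(boxes, dtype=float)
    w = np.ones(len(boxes)) if weights is None else np.asarray(weights, dtype=float)

    def cost(idx):  # SSE_R + SSE_D of one cluster
        return weighted_sse(boxes[idx, 0], w[idx]) + weighted_sse(boxes[idx, 1], w[idx])

    clusters = [np.arange(len(boxes))]  # start with a single cluster
    while len(clusters) < k:
        splittable = [i for i, idx in enumerate(clusters) if cost(idx) > 0]
        if not splittable:
            break  # every remaining cluster holds identical boxes
        idx = clusters.pop(max(splittable, key=lambda i: cost(clusters[i])))
        labels = bisect(boxes[idx])
        clusters += [idx[labels == 0], idx[labels == 1]]
    anchors = np.array([boxes[idx].mean(axis=0) for idx in clusters])
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Splitting the highest-SSE cluster first spends the anchor budget where box sizes vary most; calling `bisecting_kmeans_anchors(wh, k=9)` on a hypothetical array `wh` of labeled player box sizes would return nine anchors ordered from smallest to largest.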

Construction of similarity matrix for target movement route extraction

With the detection model in place, multi-target tracking of football players is converted into a data association problem, in which constructing a similarity matrix is a crucial step. A similarity matrix quantifies the degree of similarity between elements; here it is used to analyze how well each existing trajectory matches each detected target. The study considers motion, detection, and appearance features in the similarity matrix and designs a different method for each. Motion features are predicted with the Kalman filter, a method that uses linear state equations to estimate a system’s state from noisy observations. It is widely used in target tracking, and the state equations of its linear system are shown in equation (7).^26,27

\begin{cases} y_{t} = E y_{t - 1} + F u_{t} + q_{t} \\ z_{t} = O y_{t} + v_{t} \end{cases}

(7)

In equation (7), $t$ denotes time; $y_{t}$ and $y_{t-1}$ are the state values at times $t$ and $t-1$; $O$ is the output (observation) matrix; $F$ is the control matrix; $E$ is the state transition matrix; and $u_{t}$, $v_{t}$, $q_{t}$, and $z_{t}$ are the input variable, observation noise, process noise, and observation value at time $t$, respectively. The state update equations are shown in equation (8).²⁸

\begin{cases} K_{t} = P_{t}^{-} H^{T} {(H P_{t}^{-} H^{T} + V)}^{-1} \\ \hat{y}_{t} = \hat{y}_{t}^{-} + K_{t} (z_{t} - H \hat{y}_{t}^{-}) \\ P_{t} = (I - K_{t} H) P_{t}^{-} \end{cases}

(8)

In equation (8), $K_{t}$ is the Kalman gain; $P_{t}^{-}$ and $P_{t}$ are the prior and posterior estimate covariances at time $t$; $H$ is the observation matrix (the matrix denoted $O$ in equation (7)); $T$ denotes matrix transposition; $V$ is the covariance of the observation noise; $\hat{y}_{t}^{-}$ and $\hat{y}_{t}$ are the prior and posterior state estimates at time $t$; $z_{t}$ is the actual observation at time $t$; and $I$ is the identity matrix. The superscript $-$ marks a prior (predicted) quantity, whereas the superscript $-1$ denotes the matrix inverse. The motion feature prediction process based on the Kalman filter is shown in Figure 3.
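The Figure 3 process and equations (7) and (8) can be illustrated with a constant-velocity Kalman filter for one player’s image position, written in Python with NumPy. This sketch rests on assumptions the study does not state: the state is $(X, Y, v_X, v_Y)$, only the position is observed, there is no control input ($F u_{t} = 0$), and the covariance values, the parameters `q` and `r`, and the class name `ConstantVelocityKalman` are hypothetical. The prediction step shown is the standard one that produces the priors $\hat{y}_{t}^{-}$ and $P_{t}^{-}$ consumed by equation (8).

```python
import numpy as np


class ConstantVelocityKalman:
    """Kalman filter for one player's image position; state = (X, Y, vX, vY)."""

    def __init__(self, x0, y0, q=1.0, r=4.0):
        self.state = np.array([x0, y0, 0.0, 0.0])   # velocity initialised to 0
        self.P = np.eye(4) * 10.0                   # initial covariance (assumed)
        self.H = np.array([[1.0, 0.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0, 0.0]])   # only the position is observed
        self.Q = np.eye(4) * q                      # process-noise covariance (assumed)
        self.V = np.eye(2) * r                      # observation-noise covariance (assumed)

    def predict(self, dt):
        """Prior step: y_t^- = E y_{t-1}, P_t^- = E P_{t-1} E^T + Q."""
        E = np.eye(4)
        E[0, 2] = E[1, 3] = dt                      # constant-velocity transition
        self.state = E @ self.state
        self.P = E @ self.P @ E.T + self.Q
        return self.state[:2]

    def update(self, z):
        """Equation (8): gain, posterior state, posterior covariance."""
        S = self.H @ self.P @ self.H.T + self.V
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ (np.asarray(z, dtype=float) - self.H @ self.state)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]
```

Calling `predict(dt)` with the time difference between adjacent frames yields the predicted position $(X, Y)$ that enters the motion similarity, and `update(z)` then corrects the state with the associated detection.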

Figure 3.

Motion feature prediction process based on Kalman filter.

From Figure 3, the first step in motion feature prediction is to initialize the state vector of the target, which usually contains its initial position and velocity. Second, the velocity is set to 0 as the starting point of the prediction. Third, the time difference between adjacent frames is calculated as the interval between the current and previous frames. Fourth, the current motion state is predicted: the Kalman filter prediction equations estimate the target’s position and velocity in the current frame from the previous state and the time difference. Fifth, the similarity of the motion states is computed. Sixth, the position of the target is determined. Seventh, the Kalman filter state is refreshed, using the predicted state vector as the new prior estimate. Eighth, the predicted motion features, namely the new position and velocity of the target in the current frame, are output for the subsequent tracking steps, such as data association and trajectory updates. Detection features are taken from the multi-scale features of the improved Faster R-CNN, and the similarity between features is measured with cosine similarity. Because detection features are not discriminative enough for similar-looking targets, the study adds appearance features to measure the similarity between targets. Appearance features are extracted with twin networks, also known as Siamese neural networks, which are mainly used for tasks that learn the similarity or correlation between two inputs. A twin network typically consists of two identical sub-networks that share the same parameters and weights.^29,30 The structure of one sub-network is shown in Figure 4.

Figure 4.

Structure of sub-networks in twin networks.

From Figure 4, the sub-network consists of an input layer, five feature extraction blocks, hidden units, and an output layer. The input layer contains convolutional and max pooling layers. The label “32@64×64” means that the feature extraction block has 32 feature maps, each of size 64×64. The output of the network is a multidimensional vector, which serves as the appearance feature. The loss function is shown in equation (9).

L\left(\partial, (β, χ_{1}, χ_{2})^{ξ}\right) = (1 - β) \frac{2}{ϕ} φ^{2} + 2 β ϕ e^{- \frac{2.77}{ϕ} φ}

(9)

In equation (9), $\partial$ denotes the parameters the network learns; $χ_{1}$ and $χ_{2}$ are the two training samples of a pair, and $(β, χ_{1}, χ_{2})^{ξ}$ is the $ξ$-th input pair; $β$ is the pair label; $ϕ$ is a constant; $φ$ is the energy function; and $e$ is the natural constant. With $β = 0$ only the first term remains, which pulls the outputs of a matching pair together, while with $β = 1$ only the second term remains, which decreases as the energy grows and thus pushes a non-matching pair apart. The similarity matrix $θ^{t}$ is displayed in equation (10).

θ^{t} = λ^{t} \circ η^{t} \circ δ^{t}

(10)

In equation (10), $λ^{t}$, $η^{t}$, and $δ^{t}$ are the similarity matrices of the motion, detection, and appearance features, respectively, and $\circ$ denotes element-wise multiplication. $λ^{t}$ is given by equation (11).

λ^{t} = e^{- ϖ_{1} \left[ \left(\frac{X - X'}{R'}\right)^{2} + \left(\frac{Y - Y'}{D'}\right)^{2} \right]}

(11)

In equation (11), $ϖ_{1}$ is the contribution weight of $λ^{t}$, $(X, Y)$ is the position estimated by the Kalman filter, $(X', Y')$ is the position of the detection, and $(R', D')$ are the width and height of the detection box. $η^{t}$ is given by equation (12).

η^{t} = ϖ_{2} \times \mathrm{cosine}(ε_{t - 1}, ϑ_{t})

(12)

In equation (12), $ϑ_{t}$ is the detection feature of the target at time $t$, $ϖ_{2}$ is the contribution weight of $η^{t}$, $ε_{t-1}$ is the detection feature of the trajectory at time $t-1$, and $\mathrm{cosine}(\cdot)$ is the cosine similarity function. $δ^{t}$ is given by equation (13).

δ^{t} = ϖ_{3} \times \mathrm{cosine}(Φ_{t - 1}, Θ_{t})

(13)

In equation (13), $Φ_{t-1}$ is the appearance feature of the trajectory at time $t-1$, $ϖ_{3}$ is the contribution weight of $δ^{t}$, and $Θ_{t}$ is the appearance feature of the target at time $t$.
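Equations (10) to (13) can be sketched in Python with NumPy as follows. It is an illustration under assumptions: the detection location $(X', Y')$ is taken as the box centre, feature vectors are compared with ordinary cosine similarity, and the default weights are the values used in the experiments ($ϖ_{1} = 0.3$, $ϖ_{2} = ϖ_{3} = 1.2$). The function and argument names are hypothetical.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between rows of a (trajectories) and b (detections)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def similarity_matrix(pred_xy, det_boxes, traj_det_feat, det_feat,
                      traj_app_feat, det_app_feat, w1=0.3, w2=1.2, w3=1.2):
    """Equation (10): one row per trajectory, one column per detection.

    pred_xy:   (T, 2) Kalman-predicted positions (X, Y) of the trajectories
    det_boxes: (N, 4) detections as (X', Y', R', D') = centre x, centre y, width, height
    traj_det_feat / det_feat:     detection features (epsilon_{t-1}, vartheta_t)
    traj_app_feat / det_app_feat: appearance features (Phi_{t-1}, Theta_t)
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    det_boxes = np.asarray(det_boxes, dtype=float)
    dx = (pred_xy[:, None, 0] - det_boxes[None, :, 0]) / det_boxes[None, :, 2]
    dy = (pred_xy[:, None, 1] - det_boxes[None, :, 1]) / det_boxes[None, :, 3]
    lam = np.exp(-w1 * (dx ** 2 + dy ** 2))                   # motion term, equation (11)
    eta = w2 * cosine_matrix(traj_det_feat, det_feat)         # detection term, equation (12)
    delta = w3 * cosine_matrix(traj_app_feat, det_app_feat)   # appearance term, equation (13)
    return lam * eta * delta                                  # element-wise product
```

Because $ϖ_{2}$ and $ϖ_{3}$ exceed 1, entries can exceed 1 (up to 1.44 with these weights), which is why a threshold of 0.9 remains meaningful. The product also implicitly assumes non-negative features: two negative cosines would multiply to a positive score.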

Design of target movement route extraction algorithm

Solving the similarity matrix yields the movement routes of the football players. To design the route extraction algorithm, the Hungarian algorithm is applied to the data association problem defined by the similarity matrix, and target occlusion is taken into account. The Hungarian algorithm is a combinatorial optimization algorithm for assignment problems; it is guaranteed to find an optimal assignment and is widely applicable.^31,32 The main process of the Hungarian algorithm is displayed in Figure 5.

Figure 5.

Main process of Hungarian algorithm.

From Figure 5, the minimum element of each row is first subtracted from that row of the coefficient matrix. Second, the minimum element of each column is subtracted from that column. Third, the algorithm checks whether the minimum number of lines needed to cover all zero elements equals the order of the matrix. If it does, the optimal (minimum-cost) assignment is read off from the positions of the independent zero elements and output. Otherwise, the smallest uncovered element is subtracted from every uncovered element and added to every element covered by two lines, which creates new zero elements, and the covering test is repeated. In this algorithm, the original coefficient matrix is converted into a reduced matrix, as shown in equation (14).

Γ_{τ, υ}^{'} = Γ_{τ, υ} - \min_{k} Γ_{τ, k} - \min_{l} \left( Γ_{l, υ} - \min_{k} Γ_{l, k} \right)

(14)

In equation (14), $Γ_{τ, υ}$ and $Γ_{τ, υ}^{'}$ are the matrix elements before and after reduction, respectively. The second term is the minimum of row $τ$, and the third term is the minimum of column $υ$ taken after the row reduction, so every reduced element remains non-negative. The independent zero elements are then located as shown in equation (15).³³

Ξ_{τ, υ} = \begin{cases} 1, & \text{if } (τ, υ) \in Ω \\ 0, & \text{otherwise} \end{cases}

(15)

In equation (15), $Ξ_{τ, υ}$ marks whether the element in row $τ$ and column $υ$ is selected, and $Ω$ is the set of positions of the independent zero elements. In addition, to avoid false associations, the target movement route extraction algorithm sets a similarity threshold: if the similarity between a trajectory and a target is below this threshold, they are not associated. If, after these steps, elements of the trajectory set or the detection set remain unassociated, this indicates situations such as target disappearance, newly added targets, or occlusion, and the study develops a dedicated handling strategy for each. First, a Boolean label is used to mark a target as disappeared. Second, each newly added target initializes a new trajectory, which is added to the trajectory set. The processing flow for target occlusion is shown in Figure 6.
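Equations (14) and (15) and the threshold rule can be illustrated with a small, dependency-free Python sketch. It is not the study’s code: it implements the Hungarian (Kuhn–Munkres) method in its shortest-augmenting-path form with row and column potentials, which finds an optimal assignment just as the line-covering procedure of Figure 5 does but is shorter to write. Because the Hungarian method minimizes cost, similarities are converted to costs by subtracting them from the largest similarity; matched pairs whose similarity is below the threshold (0.9 in the experiments) are then rejected. The function names are hypothetical.

```python
def hungarian_min(cost):
    """Minimum-cost assignment for a rectangular cost matrix; (row, col) pairs sorted by row."""
    cost = [[float(x) for x in row] for row in cost]
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n == 0 or m == 0:
        return []
    if n > m:  # the routine needs rows <= columns, so solve the transpose
        return sorted((r, c) for c, r in hungarian_min([list(col) for col in zip(*cost)]))
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row (1-based) assigned to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:  # grow a shortest augmenting path from row i
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials, creating a new zero reduced cost
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached an unassigned column
                break
        while j0:  # flip the assignment along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


def associate(theta, threshold=0.9):
    """Match trajectories (rows) to detections (columns) by maximum total similarity."""
    theta = [[float(s) for s in row] for row in theta]
    if not theta or not theta[0]:
        return [], list(range(len(theta))), []
    top = max(max(row) for row in theta)
    cost = [[top - s for s in row] for row in theta]  # max similarity -> min cost
    matches = [(r, c) for r, c in hungarian_min(cost) if theta[r][c] >= threshold]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_tracks = [r for r in range(len(theta)) if r not in matched_r]
    unmatched_dets = [c for c in range(len(theta[0])) if c not in matched_c]
    return matches, unmatched_tracks, unmatched_dets
```

For example, `associate([[1.44, 0.1, 0.0], [0.2, 0.3, 1.3]])` pairs trajectory 0 with detection 0 and trajectory 1 with detection 2, leaving detection 1 as a candidate new target.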

Figure 6.

Processing flow for target obstruction.

From Figure 6, occlusion handling obtains the trajectory of the disappeared target, checks whether the distance is below a threshold and whether the area ratio is below a threshold, and then adds the trajectory to the trajectory set. The distance and area-ratio thresholds follow relevant research.^34,35 Handling these three special situations (disappearance, new targets, and occlusion) improves the tracking performance of the algorithm. Finally, the main process of the designed movement route extraction algorithm for football players is shown in Figure 7.

Figure 7.

Main process of extracting the target movement route algorithm for football players.

From Figure 7, the overall similarity matrix is first solved. Second, the associations between the trajectory set and the detection set are computed. Third, each association is accepted or rejected according to the threshold: if accepted, the target is appended to the corresponding trajectory; otherwise, the association is canceled. Fourth, the algorithm checks whether any unassociated trajectories remain. If none remain, the process goes to the seventh step; otherwise, the remaining trajectories are handled with the occlusion processing method. Fifth, disappeared targets are handled, and sixth, newly added targets are handled. Seventh, the updated trajectory set, which is the set of target movement routes, is output, and the process ends.
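The bookkeeping in Figure 7 can be sketched as a per-frame update that consumes an association result of the form (matches, unmatched trajectories, unmatched detections). This is a simplified sketch, not the study’s implementation: the distance and area-ratio occlusion test of Figure 6 is replaced by a hypothetical `max_missed` counter that keeps a lost trajectory, flagged with the Boolean label, alive for a few frames so it can be re-associated after an occlusion. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    track_id: int
    points: list = field(default_factory=list)  # (frame, x, y) samples of the route
    lost: bool = False                           # Boolean "disappeared" label
    missed: int = 0                              # consecutive frames without a match


def update_trajectories(trajs, detections, matches, unmatched_tracks,
                        unmatched_dets, frame, next_id, max_missed=5):
    """Apply one frame's association result to the trajectory set.

    detections: list of (x, y) positions in this frame; the indices in matches,
    unmatched_tracks and unmatched_dets refer to trajs and detections.
    max_missed is a hypothetical stand-in for the Figure 6 occlusion test.
    Returns (active trajectories, finished trajectories, next free id).
    """
    for r, c in matches:                   # successful associations extend routes
        t = trajs[r]
        t.points.append((frame, *detections[c]))
        t.lost, t.missed = False, 0
    for r in unmatched_tracks:             # disappeared or occluded targets
        trajs[r].lost = True
        trajs[r].missed += 1
    active = [t for t in trajs if t.missed <= max_missed]
    finished = [t for t in trajs if t.missed > max_missed]
    for c in unmatched_dets:               # newly added targets start new routes
        active.append(Trajectory(next_id, [(frame, *detections[c])]))
        next_id += 1
    return active, finished, next_id
```

Collecting the `finished` trajectories together with the final `active` ones at the end of a video gives the complete set of movement routes.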

Results

To validate the improved Faster R-CNN and the target movement route extraction algorithm, this section describes the experimental environment, model parameters, and datasets, as well as the comparison models and evaluation indicators used in the experiments.

Performance validation of improved Faster R-CNN

To validate the improved Faster R-CNN, the experiments run on the Windows 10 operating system with an Intel Core i5-12600K processor (3.7 GHz base frequency, 20 MB L3 cache). The study collected 2600 publicly available football match images from the SoccerNet dataset and the FIFA World Cup dataset and annotated them with the LabelMe software. The annotations cover the bounding boxes and motion trajectories of the players: the bounding box annotations give the precise coordinates of each player in the image, and the trajectory annotations are based on each player’s position changes across consecutive frames, recording motion direction and speed. The dataset was split into training and testing sets at a ratio of 7:3. The optimized Faster R-CNN is trained for at most 300 iterations with the Adam optimizer and an initial learning rate of 0.0002. The comparison models are YOLOv5; the Faster R-CNN before improvement; combination model A, which combines multi-attention with YOLOv5; combination model B, which combines a weighted bidirectional FPN, a coordinate attention mechanism, and YOLOv5; and an algorithm based on an improved Feature Pyramid and Generative Adversarial Network (FPGAN). The evaluation metrics are mean Average Precision (mAP), accuracy, IoU, recall, and precision. Table 1 displays the ablation experimental results.

Table 1.

Ablation experimental results of the improved Faster R-CNN.

Module			Evaluation indicators
ResNet50	FPN	Anchor box optimization	mAP	Recall	Mean squared error (MSE)
√	√	√	0.987	98.89%	0.557
√	√	×	0.952	96.25%	0.653
√	×	×	0.913	92.73%	0.698
×	×	×	0.701	68.72%	3.567

From Table 1, the unimproved Faster R-CNN achieved an mAP of 0.701, a recall of 68.72%, and an MSE of 3.567, the weakest results in the ablation. The model combining ResNet50, FPN, and Anchor box optimization outperformed every partial combination in mAP, recall, and MSE. For example, its mAP of 0.987 was 0.035 and 0.074 higher than those of the ResNet50 + FPN model and the ResNet50-only model, respectively. Overall, the improvements to Faster R-CNN are effective for recognizing football players. The accuracy and IoU of different models are compared in Figure 8.

Figure 8.

Comparison of accuracy and intersection over union ratio of different models. (a) Comparison of accuracy. (b) Comparison of intersection over union.

From Figure 8(a), the accuracy of the proposed model ranged from 95.13% to 98.73%. The maximum accuracies of YOLOv5, the original Faster R-CNN, combination model A, combination model B, and the FPGAN model were 89.71%, 87.11%, 93.55%, 94.79%, and 93.21%, respectively, which are 9.02, 11.62, 5.18, 3.94, and 5.52 percentage points lower than 98.73%. From Figure 8(b), the IoU of the proposed method was closer to 1, with an average of 0.954, whereas the average IoU values of the five comparison models were 0.843, 0.801, 0.912, 0.925, and 0.903, all lower than 0.954. Overall, the improved Faster R-CNN performs better, probably because ResNet50 alleviates the vanishing gradient problem of the original Faster R-CNN, FPN provides multi-scale features, and the Anchor boxes are optimized. Recall and precision are compared in Figure 9.

Figure 9.

Comparison of recall and precision of different models. (a) Comparison of recall. (b) Comparison of precision.

From Figure 9(a), the proposed method achieved the best recall, followed by combination model B, combination model A, and the FPGAN model, while the original Faster R-CNN and YOLOv5 performed worst. The recall of the proposed method ranged from 96.33% to 98.92%, while the maximum recall rates of YOLOv5, the original Faster R-CNN, combination model A, combination model B, and the FPGAN model were 90.23%, 85.76%, 93.79%, 95.40%, and 92.66%, respectively. From Figure 9(b), the average precision of the proposed model and the five comparison models was 98.21%, 88.23%, 86.42%, 91.13%, 92.85%, and 90.04%, respectively, so the comparison models were 9.98, 11.79, 7.08, 5.36, and 8.17 percentage points below the proposed method. Overall, the improved Faster R-CNN achieves good precision and recall in recognizing football players, providing a solid technical foundation for extracting their movement routes; this performance also depends on the introduced ResNet50, FPN, and binary K-means.

The improved Faster R-CNN model outperforms the comparison models in accuracy, IoU, recall, and precision. Besides the ResNet50 backbone, which alleviates vanishing gradients, the multi-scale features obtained by FPN help the model maintain high detection accuracy for targets of different sizes, and the optimized Anchor boxes reduce false positives and false negatives. These factors may explain the performance gap between the improved Faster R-CNN model and the comparison models.

Performance verification of target movement route extraction algorithm

To verify the performance of the designed target movement route extraction algorithm, the public Soccer Dataset is adopted. This dataset contains football videos in four scenes: an unobstructed scene, a scene with occlusion between players of the same team, a scene with occlusion between players of different teams, and a mixed dense scene.³⁶ All video frames were resized to 624×352 pixels and normalized so that pixel values lie in the range 0–1. Frames that were blurry, had low resolution, or had incomplete annotations were removed to ensure data quality. In addition, random flipping, rotation, and color jitter were applied to increase the diversity of the dataset and improve the model's generalization ability. After preprocessing, the dataset was divided into a training set and a validation set in a 7:3 ratio, with 70% of the data used for training and 30% for validation. The model was trained with a contrastive loss function for 300 iterations, which took approximately 12 hours. The weights $ϖ_{1}$, $ϖ_{2}$, and $ϖ_{3}$ in the movement route extraction algorithm were set to 0.3, 1.2, and 1.2, respectively, with a similarity threshold of 0.9. The hardware was the same as in the detection-model performance validation. The comparison algorithms were the Direct Linear Transformation (DLT) algorithm, the Siamese algorithm, Continuous Convolution Operators for Tracking (CCOT), combination model C, which combines model predictive control with second-order oscillatory particle swarm optimization, and combination model D, which combines a Hidden Markov Model with the Viterbi algorithm. The evaluation indicators were accuracy, trajectory displacement error, tracking speed, and AUC. Following related research, six combinations of $ϖ_{1}$, $ϖ_{2}$, and $ϖ_{3}$ were tested during training and validation.³⁷ The optimal weight configuration was determined by comparing the model's performance indicators across these combinations. The results are shown in Table 2.
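The preprocessing steps described above can be sketched as follows. This is a minimal NumPy version under stated assumptions: nearest-neighbour resizing stands in for whatever interpolation the study used, only horizontal flipping is shown (rotation and color jitter are omitted), and the 7:3 split is made at the video-sequence level so that frames of one clip never appear in both sets. The sequence-level split is an assumption, since the text does not say how frames were divided.

```python
import numpy as np

TARGET_W, TARGET_H = 624, 352  # frame size stated in the text


def resize_nearest(frame, width=TARGET_W, height=TARGET_H):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) frame."""
    h, w = frame.shape[:2]
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return frame[rows][:, cols]


def preprocess(frame, rng, flip_prob=0.5):
    """Resize, scale uint8 pixel values to [0, 1], and randomly flip horizontally."""
    out = resize_nearest(frame).astype(np.float32) / 255.0
    if rng.random() < flip_prob:
        out = out[:, ::-1]  # horizontal flip; box x-coordinates must be mirrored too
    return out


def split_sequences(seq_ids, train_frac=0.7, seed=0):
    """Randomly split video-sequence IDs into training and validation lists."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.asarray(list(seq_ids)))
    n_train = int(round(train_frac * len(ids)))
    return sorted(ids[:n_train].tolist()), sorted(ids[n_train:].tolist())
```

Applied to the 80 sequences of the Soccer Dataset, split_sequences(range(1, 81)) yields 56 training and 24 validation sequences.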

Table 2.

Performance changes of models under different weight values.

Weight combination	$ϖ_{1}$	$ϖ_{2}$	$ϖ_{3}$	Accuracy/%	Trajectory displacement error/m	AUC
Combination 1	0.1	1.0	1.0	89.52	0.65	0.952
Combination 2	0.2	1.0	1.0	92.34	0.59	0.961
Combination 3	0.3	1.0	1.0	94.12	0.53	0.966
Combination 4	0.3	1.1	1.1	95.67	0.49	0.969
Combination 5	0.3	1.2	1.2	96.46	0.45	0.978
Combination 6	0.3	1.3	1.3	95.83	0.47	0.971

From Table 2, combination 5 ($ϖ_{1}$ = 0.3, $ϖ_{2}$ = 1.2, $ϖ_{3}$ = 1.2) achieved the highest accuracy, 96.46%, followed by combination 6 ($ϖ_{1}$ = 0.3, $ϖ_{2}$ = 1.3, $ϖ_{3}$ = 1.3) at 95.83%. Combination 5 also had the lowest trajectory displacement error of the six combinations, 0.45 m. In addition, its AUC of 0.978 was 0.026, 0.017, 0.012, 0.009, and 0.007 higher than those of combinations 1, 2, 3, 4, and 6, respectively. Because combination 5 performed best on all three indicators, it was adopted as the weight configuration. The distribution of the Soccer Dataset is displayed in Table 3.
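This section gives the selected weights and the 0.9 similarity threshold but does not restate how the weights enter the similarity matrix. The sketch below is therefore hypothetical: it assumes a weighted average of three similarity matrices with values in [0, 1], and it assigns $ϖ_{1}$ to motion, $ϖ_{2}$ to detection, and $ϖ_{3}$ to appearance, a mapping chosen only for illustration.

```python
import numpy as np

# Combination 5 from Table 2. Which cue each weight multiplies is an assumption here.
W_MOTION, W_DETECTION, W_APPEARANCE = 0.3, 1.2, 1.2
THRESHOLD = 0.9


def fuse_similarity(s_motion, s_detection, s_appearance,
                    w=(W_MOTION, W_DETECTION, W_APPEARANCE)):
    """Weighted average of three (tracks x detections) similarity matrices in [0, 1].

    Dividing by the weight sum keeps the fused score in [0, 1], so a fixed
    threshold such as 0.9 stays meaningful (an assumed normalization).
    """
    mats = [np.asarray(m, dtype=float) for m in (s_motion, s_detection, s_appearance)]
    fused = sum(wi * m for wi, m in zip(w, mats))
    return fused / sum(w)
```

Under this assumed formula, a pair that is perfect in detection and appearance but has zero motion similarity scores 2.4/2.7 ≈ 0.889 and falls just below the 0.9 threshold, which illustrates how the threshold interacts with the weights.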

Table 3.

Distribution of the soccer dataset.

Serial number	Scene	Video sequence number	Number of video sequences	Number of frames
1	Unobstructed scene	1–20	20	5385
2	Same team players obstructing the scene	21–40	20	3954
3	The scene is obstructed by players from different teams	41–60	20	5060
4	Mixed dense scene	61–80	20	5509
Total		1–80	80	19,908

From Table 3, the dataset contains 80 video sequences with a total of 19,908 frames at a resolution of 624×352. The positions of 80 players were annotated and stored in the text file groundtruth.txt. In sequences with occlusion, the occlusion frequency ranges over [1, 7]. According to the degree of occlusion, the data were further divided into partial occlusion and complete occlusion. The convergence comparison of different models is shown in Figure 10.

Figure 10.

Comparison of convergence of different models.

From Figure 10, the proposed movement route extraction model outperformed the comparison models in both convergence speed and final loss. Early in training, its loss decreased rapidly, reaching 0.018 within the first 30 iterations, which indicates efficient learning. The loss then stabilized at a minimum of 0.016. In contrast, models such as DLT and Siamese showed a slower decrease in loss, and their final loss values were higher than that of the proposed model, indicating poorer convergence. Overall, the proposed model converges faster and more stably during training. The accuracy comparison of different models in different scenes is shown in Figure 11.
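The curves in Figure 10 come from training with a contrastive loss, whose exact form is not given in this section. A minimal sketch of the standard margin-based contrastive loss (Hadsell, Chopra and LeCun, 2006), applied to pairs of appearance embeddings, is shown below; the embedding inputs, the label convention (1 = same player), and the margin of 1.0 are assumptions.

```python
import numpy as np


def contrastive_loss(emb_a, emb_b, same_identity, margin=1.0):
    """Margin-based contrastive loss averaged over a batch of embedding pairs.

    emb_a, emb_b: (N, D) embeddings; same_identity: (N,) array of 0/1 labels.
    Positive pairs are pulled together (d^2); negative pairs are pushed apart
    until their distance exceeds the margin (max(0, margin - d)^2).
    """
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    y = np.asarray(same_identity, dtype=float)
    d = np.linalg.norm(a - b, axis=1)  # Euclidean distance per pair
    per_pair = y * d**2 + (1.0 - y) * np.maximum(0.0, margin - d) ** 2
    return float(per_pair.mean() / 2.0)  # the 1/2 factor follows the usual form
```

Negative pairs that are already farther apart than the margin contribute nothing, so training effort concentrates on players whose embeddings are still easy to confuse.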

Figure 11.

Comparison of accuracy of various models in different scenes. (a) Unobstructed scene. (b) Same team players obstructing the scene. (c) The scene is obstructed by players from different teams. (d) Mixed dense scene.

As shown in Figure 11(a), in the unobstructed scene the maximum accuracy of the proposed movement route extraction algorithm was 96.37%, a clear advantage. The maximum accuracies of DLT, Siamese, CCOT, combination model C, and combination model D were 60.17%, 87.81%, 87.25%, 90.33%, and 91.17%, respectively, which were 36.20, 8.56, 9.12, 6.04, and 5.20 percentage points lower than that of the proposed algorithm. From Figure 11(b), in the scene with occlusion between players of the same team, the proposed algorithm again achieved the highest accuracy, with a maximum of 93.51%. The maximum accuracies of the five comparison models were 44.52%, 68.97%, 69.42%, 84.55%, and 86.73%, respectively; among them, combination model D performed best and DLT worst. According to Figure 11(c), in the scene with occlusion between players of different teams, the maximum accuracies of the proposed algorithm and the comparison models were 94.17%, 48.13%, 73.11%, 86.03%, 86.42%, and 88.93%, respectively, so the proposed algorithm remained the most effective tracking method. According to Figure 11(d), in the mixed dense scene the maximum accuracies of the comparison models were 48.91%, 77.03%, 63.41%, 83.56%, and 86.77%, respectively. The maximum accuracy of the proposed algorithm was 95.17%, which was 46.26, 18.14, 31.76, 11.61, and 8.40 percentage points higher than those of the comparison methods. In summary, the proposed algorithm tracks well across scenes and extracts the movement routes of football players more accurately. This is attributed to the similarity matrix, which accounts for the motion, detection, and appearance features of the players, and to the dedicated handling of occlusion.
The maximum trajectory displacement errors of different models in various scenes are displayed in Table 4.

Table 4.

Comparison of maximum trajectory displacement errors of different models in different scenes/m.

Unobstructed scene
Trajectory number	2	4	6	8	10	12	14	16	18	20
DLT	3.55	3.73	3.88	3.64	3.67	3.58	3.71	3.47	3.54	3.88
Siamese	1.90	1.87	2.11	2.21	1.79	1.77	1.85	1.76	1.77	1.93
CCOT	2.24	2.62	2.18	2.73	2.33	2.16	2.21	2.40	2.38	2.26
Combination model C	0.78	1.06	0.97	0.77	1.09	1.11	0.99	0.74	1.12	0.91
Combination model D	1.08	1.06	1.03	0.63	0.82	0.96	0.95	1.07	0.90	0.92
Proposed algorithm	0.40	0.64	0.51	0.50	0.79	0.51	0.71	0.75	0.45	0.60
Same team players obstructing the scene
Trajectory number	22	24	26	28	30	32	34	36	38	40
DLT	3.77	3.55	3.83	3.30	3.58	3.60	3.90	3.47	3.32	3.39
Siamese	1.80	1.98	2.08	2.19	2.29	1.78	1.95	1.82	1.89	2.24
CCOT	2.50	2.49	2.63	2.12	2.62	2.57	2.63	2.60	2.24	2.20
Combination model C	1.02	0.94	1.23	1.13	1.44	1.04	0.75	1.25	1.16	1.01
Combination model D	1.08	1.15	0.65	0.77	0.85	0.70	0.97	0.65	0.64	1.04
Proposed algorithm	0.76	0.40	0.47	0.79	0.56	0.64	0.72	0.60	0.56	0.44
The scene is obstructed by players from different teams
Trajectory number	42	44	46	48	50	52	54	56	58	60
DLT	3.59	3.41	3.67	3.70	3.38	3.62	3.74	3.66	3.39	3.57
Siamese	1.77	1.73	1.92	1.91	1.85	2.15	1.99	1.83	2.14	2.05
CCOT	2.67	2.64	2.72	2.51	2.25	2.44	2.47	2.76	2.31	2.29
Combination model C	1.23	0.76	0.80	0.75	1.45	1.17	1.11	0.71	0.95	0.92
Combination model D	0.91	0.81	1.05	0.95	0.82	0.97	0.65	0.91	0.60	1.13
Proposed algorithm	0.53	0.43	0.47	0.64	0.43	0.41	0.76	0.57	0.47	0.64
Mixed dense scene
Trajectory number	62	64	66	68	70	72	74	76	78	80
DLT	3.50	3.83	3.70	3.53	3.52	3.76	3.72	3.62	3.83	3.74
Siamese	2.17	2.22	2.23	2.28	1.99	1.72	2.22	2.02	2.07	1.75
CCOT	2.16	2.23	2.14	2.43	2.37	2.72	2.28	2.35	2.69	2.68
Combination model C	1.38	1.05	1.30	1.42	1.04	1.43	1.16	0.74	1.31	1.07
Combination model D	0.87	0.70	0.63	0.62	0.82	0.86	1.07	0.67	1.19	0.62
Proposed algorithm	0.41	0.72	0.47	0.47	0.42	0.60	0.53	0.41	0.66	0.79

From Table 4, in every scene the maximum trajectory displacement error of the proposed movement route extraction algorithm was, on average, clearly smaller than those of the comparison models. For example, in the mixed dense scene, the maximum trajectory displacement error of the proposed algorithm ranged over [0.41 m, 0.79 m], with an average of 0.548 m. The corresponding averages of the comparison models were 3.675 m, 2.067 m, 2.405 m, 1.190 m, and 0.805 m, which were 3.127 m, 1.519 m, 1.857 m, 0.642 m, and 0.257 m higher than that of the proposed method, respectively. In summary, the proposed algorithm has a smaller trajectory displacement error, so the extracted movement routes of football players are closer to the actual trajectories. The tracking speeds of different models in different scenes are shown in Figure 12.
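The averages quoted above can be reproduced from Table 4. The sketch assumes that each table entry is the largest per-frame Euclidean distance, in metres, between an extracted route and its ground-truth trajectory; the text does not restate this definition, so the helper max_displacement_error is illustrative.

```python
import numpy as np


def max_displacement_error(pred_xy, gt_xy):
    """Largest per-frame Euclidean distance (m) between predicted and true positions."""
    diff = np.asarray(pred_xy, dtype=float) - np.asarray(gt_xy, dtype=float)
    return float(np.linalg.norm(diff, axis=1).max())


# Maximum errors in the mixed dense scene, trajectories 62-80 (Table 4).
mixed_dense_proposed = [0.41, 0.72, 0.47, 0.47, 0.42, 0.60, 0.53, 0.41, 0.66, 0.79]
mixed_dense_dlt = [3.50, 3.83, 3.70, 3.53, 3.52, 3.76, 3.72, 3.62, 3.83, 3.74]

avg_proposed = np.mean(mixed_dense_proposed)  # about 0.548 m
avg_dlt = np.mean(mixed_dense_dlt)            # about 3.675 m
gap = avg_dlt - avg_proposed                  # about 3.127 m
```

The same averaging applied to the other three scenes gives 0.586 m, 0.594 m, and 0.535 m for the proposed algorithm.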

Figure 12.

Comparison of tracking speed of different models in different scenes. (a) Unobstructed scene. (b) Same team players obstructing the scene. (c) The scene is obstructed by players from different teams. (d) Mixed dense scene.

From Figure 12(a), in the unobstructed scene, the average frame rates of the proposed extraction algorithm and the five comparison models were 1.23 fps, 7.45 fps, 6.39 fps, 1.37 fps, 2.17 fps, and 2.35 fps, respectively. Because frame rate counts the frames processed per second, the lower average frame rate of the proposed algorithm means that it processes frames more slowly than the comparison models. According to Figure 12(b), in the scene with occlusion between players of the same team, the average frame rate of the proposed algorithm was 1.36 fps, lower than those of the comparison models. From Figure 12(c), in the scene with occlusion between players of different teams, the proposed algorithm again had a lower average frame rate, 1.05 fps, whereas DLT and Siamese reached 7.68 fps and 6.82 fps, respectively. According to Figure 12(d), in the mixed dense scene, the average frame rates of the six models were 1.17 fps, 7.98 fps, 6.11 fps, 1.53 fps, 2.09 fps, and 2.67 fps, respectively. In summary, the proposed algorithm trades processing speed for accuracy: its frame rate (1.05–1.36 fps) is relatively low, leaving room for improvement in processing speed and real-time use. The low frame rate stems mainly from the complexity of feature extraction and data-association computation, which lengthens processing time. The model is therefore mainly suited to offline tactical analysis, where real-time performance is not required and its higher accuracy is the priority. Several optimizations could move it toward real-time use. The first is to streamline the model structure, for example through pruning and quantization, to reduce computational complexity.
The second is to adopt more efficient feature extraction algorithms. The third is to use hardware acceleration, such as graphics processing unit (GPU) acceleration. These optimizations could raise the frame rate and bring the model closer to the requirements of real-time applications, at least where latency constraints are relaxed. The AUC of different models in different scenes is shown in Figure 13.
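Because frame rate and per-frame latency are reciprocals, the reported 1.05–1.36 fps corresponds to roughly 0.7–1.0 s per frame. The sketch below makes that relationship explicit; the per-frame timings are hypothetical inputs, and the 25 fps reference for live video is an assumed broadcast rate, not a value from the study.

```python
def throughput(frame_times_s):
    """Return (frames per second, mean seconds per frame) from per-frame timings."""
    if not frame_times_s:
        raise ValueError("need at least one timing")
    total = sum(frame_times_s)
    n = len(frame_times_s)
    return n / total, total / n


def realtime_factor(fps, video_fps=25.0):
    """Share of the source frame rate achieved; 1.0 or more keeps pace with live video."""
    return fps / video_fps
```

For example, three frames taking 0.8 s each give about 1.25 fps, and realtime_factor(1.25) is 0.05, meaning processing runs about twenty times slower than an assumed 25 fps feed. A higher frame rate therefore always means faster tracking.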

Figure 13.

Comparison of AUC of different models in different scenes. (a) Unobstructed scene. (b) Same team players obstructing the scene. (c) The scene is obstructed by players from different teams. (d) Mixed dense scene.

From Figure 13(a), in the unobstructed scene, the proposed algorithm achieved the highest AUC, 0.987, while the AUC values of DLT, Siamese, CCOT, combination model C, and combination model D were 0.627, 0.823, 0.759, 0.923, and 0.956, respectively. From Figure 13(b), in the scene with occlusion between players of the same team, the proposed algorithm retained a clear advantage, with AUC values 0.362, 0.171, 0.187, 0.053, and 0.038 higher than those of the comparison models, respectively. In Figure 13(c), DLT had the lowest AUC, 0.617, and the proposed algorithm the highest, 0.991. According to Figure 13(d), in the mixed dense scene, the AUC values of the proposed algorithm and the comparison models were 0.993, 0.625, 0.845, 0.766, 0.930, and 0.952, respectively. Across all scenes, the proposed algorithm achieves the highest AUC, indicating better tracking performance.
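The section does not define how AUC is computed for tracking. One common definition, assumed here, is the area under the OTB-style success plot: the fraction of frames whose tracked-box IoU exceeds a threshold, averaged over thresholds from 0 to 1 in steps of 0.05. A minimal sketch follows; it illustrates that definition and is not necessarily the one used to produce Figure 13.

```python
import numpy as np


def success_auc(ious, thresholds=None):
    """Area under the success plot: mean success rate over overlap thresholds.

    ious: per-frame IoU between the tracked box and the ground-truth box.
    Success rate at threshold t = fraction of frames with IoU > t.
    """
    ious = np.asarray(ious, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)  # 0, 0.05, ..., 1.0
    success = [(ious > t).mean() for t in thresholds]
    return float(np.mean(success))
```

Under this definition even a perfect tracker scores 20/21 ≈ 0.952 with the default thresholds, because no IoU can exceed 1.0; scores are therefore most meaningful when compared across trackers evaluated the same way.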

The target movement route extraction algorithm outperforms the comparison models in accuracy, trajectory displacement error, and AUC, at the cost of a lower frame rate. This may be because the algorithm constructs a similarity matrix that integrates motion, detection, and appearance features and solves the data association problem with the Hungarian algorithm. The similarity matrix lets the algorithm jointly consider a target's position changes, appearance, and motion trend, so targets are matched and tracked more accurately. The Hungarian algorithm finds the optimal one-to-one assignment between existing tracks and new detections, and the tracks and detections left unmatched signal occlusion, disappearance, or the arrival of new targets, which improves tracking accuracy and robustness. In addition, the algorithm applies dedicated strategies for occlusion, such as using Boolean labels to mark disappeared targets and initializing trajectories for new targets, further improving tracking in complex scenes. Together, these measures give the algorithm its advantage over the comparison models in tracking quality.
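The association and occlusion handling described above can be sketched with SciPy's linear_sum_assignment, which solves the same linear assignment problem as the Hungarian algorithm (SciPy implements it with a modified Jonker–Volgenant method). This is a simplified illustration under assumptions: the similarity matrix is given with tracks as rows and detections as columns, the 0.9 threshold from the experimental settings rejects weak matches, and the track dictionary layout and the functions associate and update_tracks are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(similarity, threshold=0.9):
    """Match tracks (rows) to detections (columns) by maximising total similarity.

    Assigned pairs below the threshold are rejected afterwards.
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    sim = np.asarray(similarity, dtype=float)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    matches = [(int(r), int(c)) for r, c in zip(rows, cols) if sim[r, c] >= threshold]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_t = [r for r in range(sim.shape[0]) if r not in matched_r]
    unmatched_d = [c for c in range(sim.shape[1]) if c not in matched_c]
    return matches, unmatched_t, unmatched_d


def update_tracks(tracks, detections, similarity, threshold=0.9):
    """Extend matched tracks, flag unmatched ones as lost, start tracks for new detections."""
    matches, lost, new = associate(similarity, threshold)
    for r, c in matches:
        tracks[r]["route"].append(detections[c])
        tracks[r]["lost"] = False
    for r in lost:
        tracks[r]["lost"] = True  # Boolean label: target occluded or out of view
    for c in new:
        tracks.append({"route": [detections[c]], "lost": False})
    return tracks
```

A full tracker would also retire tracks that stay lost for several frames and try to re-identify them when they reappear; those steps are omitted here.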

Practical application

The improved Faster R-CNN model and target movement route extraction algorithm designed in this study have practical value, especially for football match analysis and tactical research. By combining ResNet50 and FPN, the improved Faster R-CNN substantially improves detection accuracy and feature extraction capability, with fast detection and stable performance that support match analysis. The movement route extraction algorithm integrates a multi-feature similarity matrix with the Hungarian algorithm, solving the data association problem while maintaining high tracking accuracy in complex scenes. However, the real-time performance of the model still needs improvement, and its current processing speed is better suited to offline tactical analysis. In practice, high-performance hardware is needed to accelerate data processing, and the algorithm parameters should be tuned to the competition environment to improve applicability.

In addition, by introducing ResNet50 and FPN, the improved Faster R-CNN extracts multi-scale features and adapts well to targets of different sizes, which suggests that it could also detect targets in other sports scenes, such as badminton and basketball. The model is expected to be robust to changes in camera angle, mainly because the FPN captures multi-scale features, which partly offsets the effect of viewpoint changes. The movement route extraction algorithm relies on a similarity matrix that integrates motion, detection, and appearance features, so it can adapt to the tracking requirements of different sports scenarios. Even when the camera angle changes, balancing these complementary features should still allow accurate target tracking. The improved model therefore has good potential to generalize across sports and camera angles.

Discussion and conclusion

A target detection method based on an improved Faster R-CNN was designed to extract the movement routes of football players, together with a movement route extraction model based on a similarity matrix. In the ablation experiment, the detection model improved with ResNet50, FPN, and Anchor box optimization was significantly better than the single or incomplete combinations of improvements in mAP, recall, and MSE. Its mAP was 0.035 and 0.074 higher than those of the ResNet50 + FPN model and the ResNet50 model, respectively. This indicates that the improvements are effective and that the model can reliably recognize football players. Meanwhile, the maximum accuracy of this model was 98.73%, which was 9.02, 11.62, 5.18, 3.94, and 5.52 percentage points higher than those of the comparison models, respectively. The movement route extraction algorithm performed well in both unobstructed and occluded scenes, with accuracy clearly higher than that of the comparison models, although its average frame rate was lower. The averages of its maximum trajectory displacement error in the four scenes were 0.586 m, 0.594 m, 0.535 m, and 0.548 m, respectively, all lower than those of the comparison models, so the extracted movement routes are closer to the actual trajectories. The designed detection model and trajectory extraction algorithm can accurately detect football players and extract their movement routes.

Limitations

The study has several limitations. First, the real-time performance of the movement route extraction model is insufficient, so it is currently suitable only for offline tactical analysis. Second, identity information is not differentiated: the detection model cannot distinguish identities such as players and referees, which limits its application in complex football scenes. Third, the adaptability of the model needs to be enhanced; in complex and variable match scenes, detection accuracy and tracking may degrade. Fourth, feature fusion is limited. Although a similarity matrix has been constructed to fuse multiple features, the fusion method and depth can still be improved, and tracking gains in situations such as occlusion remain limited.

Future research could proceed in several directions. First, optimize the model structure, adopt efficient feature extraction algorithms, or use hardware acceleration to improve real-time performance and meet the requirements of real-time applications. Second, introduce an identity recognition mechanism combined with an identity preservation algorithm to distinguish different identities, such as players and referees, and track them accurately over long periods. Third, study multi-view integration, fusing data from different camera perspectives to overcome the limitations of a single view, improve tracking accuracy and robustness, and broaden tactical analysis. Fourth, explore Transformer-based tracking, using self-attention to better model changes in player features and improve tracking accuracy and stability. Fifth, develop a scene-adaptive model that uses scene information to adjust its strategy automatically, enhancing generality and adaptability. Sixth, investigate deep feature fusion methods that mine correlations among features, improving tracking in complex situations and providing deeper insights for football match analysis.

Footnotes

ORCID iD

Lingrui Li

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix

References

1. Anzer, Bauer. Determining the difficulty of a pass in football (soccer) using spatio-temporal data. Data Min Knowl Discov 2022; 36(1): 295–317. DOI: 10.1007/s10618-021-00810-3.

2. Wang, Jia, Bai, et al. Research on the location of railway train in tunnel based on factor graph optimization. Appl Comput Lett 2023; 7(1).

3. Raabe, Nabben, Memmert. Graph representations for the analysis of multi-agent spatiotemporal sports data. Appl Intell 2023; 53(4): 3783–3803. DOI: 10.1007/s10489-022-03631-z.

4. Elugbadebo, Orunsolu, Akinyele, et al. An efficient and secured graphical authentication system. Acta Inform Malays 2022; 6(1): 17–21. DOI: 10.26480/aim.01.2022.17.21.

5. Zhang, Shan. A multi-sensor-system cooperative scheduling method for ground area detection and target tracking. Front Inform Technol Electron Eng 2023; 24(2): 245–258. DOI: 10.1631/FITEE.2200121.

6. Roshan, Shanker, Bal. Study of chronological order in intersecting printed and pen strokes with the help of chromaticity diagram. Acta Scientifica Malaysia 2022; 6(2): 38–42.

7. Jiang, Xuan. Transformer single target tracking algorithm integrating spatio-temporal information. Comput Eng Appl 2024; 60(19): 230–241. DOI: 10.3778/j.issn.1002-8331.2307-0069.

8. Luo, Islam. Tracking and detection of basketball movements using multi-feature data fusion and hybrid YOLO-T2LSTM network. Soft Comput 2024; 28(2): 1653–1667. DOI: 10.1007/s00500-023-09512-y.

9. Born, Mundt, Mian, et al. The eye in the sky: a method to obtain on-field locations of Australian rules football athletes. AI 2024; 5(2): 733–745. DOI: 10.3390/ai5020038.

10. Wang. Football sports video tracking and detection technology based on YOLOv5 and DeepSORT. Discov Appl Sci 2025; 7(6): 1–17. DOI: 10.1007/s42452-025-07116-9.

11. Ong, Chong, Ong, et al. Tracking of moving athlete from video sequences using flower pollination algorithm. Vis Comput 2022; 38(3): 939–962. DOI: 10.1007/s00371-021-02060-2.

12. Zhang, Dai. Motion trajectory tracking of athletes with improved depth information-based KCF tracking method. Multimed Tools Appl 2023; 82(17): 26481–26493. DOI: 10.1007/s11042-023-14929-6.

13. Zhong, Liang. Using CNN-VGG 16 to detect the tennis motion tracking by information entropy and unascertained measurement theory. Adv Nano Res 2022; 12(2): 223–239. DOI: 10.12989/anr.2022.12.2.223.

14. Maglo, Orcesi, Denize, et al. Individual locating of soccer players from a single moving view. Sensors 2023; 23(18): 7938. DOI: 10.3390/s23187938.

15. Zhao, Jin. Application of intelligent analysis technology of football video based on online target tracking algorithm of motion characteristics in football training. Comput Intell Neurosci 2022; 2022(1): 4739712. DOI: 10.1155/2022/4739712.

16. Raphael, Mery. Mapping fire blight cankers and autumn blooming in pear trees using Faster R-CNN. Precis Agric 2024; 25(1): 396–411. DOI: 10.1007/s11119-023-10077-x.

17. Wang, Bai, Song, et al. Optimized faster R-CNN for oil wells detection from high-resolution remote sensing images. Int J Remote Sens 2023; 44(22): 6897–6928. DOI: 10.1080/01431161.2023.2275322.

18. Mubarak Suud. An image processing approach for monitoring soil plowing based on drone RGB images. Big Data Agr 2022; 5(1): 01–05.

19. Zhai, Sun, Huyan, et al. Feature representation improved Faster R-CNN model for high-efficiency pavement crack detection. Can J Civ Eng 2022; 50(2): 114–125. DOI: 10.1139/cjce-2022-0137.

20. Anand, Lakshmi, Pandey, DPBK, et al. An enhanced ResNet-50 deep learning model for arrhythmia detection using electrocardiogram biomedical indicators. Evol Syst 2024; 15(1): 83–97. DOI: 10.1007/s12530-023-09559-0.

21. Giri, Prasad Chimouriya, Ram Ghimire. Crossing strokes examination from cromaticity diagram. Sci Herit J 2023; 7(1): 01–08. DOI: 10.26480/gws.01.2023.01.08.

22. Karthika, Durgadevi, Rani. Enhancing diabetic retinopathy diagnosis with ResNet-50-based transfer learning: a promising approach. Ann Data Sci 2024; 11(1): 1–24. DOI: 10.1007/s40745-023-00494-0.

23. Zarif, Morad, Amin, et al. Video summarization approach based on binary robust invariant scalable keypoints and bisecting K-means. Comput Mater Contin 2024; 78(3): 3565–3583. DOI: 10.32604/cmc.2024.046185.

24. Peng, Tang, Dai, et al. Prediction of loom machine status based on binary K-means theory. Fangzhi Xuebao/J Textile Res 2023; 44(5): 112–118. DOI: 10.13475/j.fzxb.20220100801.

25. Budiarsa, Wardoyo, Musdholifah. Face recognition with occluded face using improve intersection over union of region proposal network on Mask region convolutional neural network. Int J Electr Comput Eng 2024; 14(3): 3256–3265. DOI: 10.11591/ijece.v14i3.pp3256-3265.

26. Zhang, Lou, Song, et al. Performance enhancement of PPP/SINS tightly coupled navigation based on improved robust maximum correntropy Kalman filtering. Adv Space Res 2024; 74(5): 2078–2091. DOI: 10.1016/j.asr.2024.05.072.

27. Shao. Distributed consensus Kalman filtering for asynchronous multi-rate sensor networks. Signal Image Video Process 2024; 18(8): 6419–6429. DOI: 10.1007/s11760-024-03326-7.

28. Akrami, Mohsenian-Rad. Event-triggered distribution system state estimation: sparse Kalman filtering with reinforced coupling. IEEE Trans Smart Grid 2024; 15(1): 627–640. DOI: 10.1109/TSG.2023.3270421.

29. Hung. On intelligent placement decision-making algorithms for wireless digital twin networks via bandit learning. IEEE Trans Veh Technol 2024; 73(6): 8889–8902. DOI: 10.1109/TVT.2024.3360959.

30. Wang, Ming, Liu, et al. Secure and flexible data sharing with dual privacy protection in vehicular digital twin networks. IEEE Trans Intell Transport Syst 2024; 25(9): 12407–12420. DOI: 10.1109/TITS.2024.3368342.

31. Groumpos. A critical historic overview of artificial intelligence: issues, challenges, opportunities, and threats. AIA 2023; 1(4): 197–213. DOI: 10.47852/bonviewAIA3202689.

32. Lin. A systematic weighted-Hungarian-algorithm for optimization and postoptimal analysis of transportation problem. J Stat Manag Syst 2023; 26(4): 843–866. DOI: 10.47974/JSMS-936.

33. Yao, Lou. Path planning for multiple unmanned surface vehicles using Glasius bio-inspired neural network with Hungarian algorithm. IEEE Syst J 2023; 17(3): 3906–3917. DOI: 10.1109/JSYST.2022.3222357.

34. Merzah, Croock, Rashid. Football player tracking and performance analysis using the OpenCV library. Math Model Eng Probl 2024; 11(1): 123–132. DOI: 10.18280/mmep.110113.

35. Chen, Liu. Constructing a basic training system for football in universities based on object detection and tracking algorithms. Revista Multidisciplinar De Las Ciencias Del Deporte 2024; 24(96): 120–145. DOI: 10.15366/rimcafd2024.96.008.

36. Liu. Mathematical method to construct the linear programming of football training. Appl Math Nonlinear Sci 2023; 8(1): 437–442. DOI: 10.2478/amns.2022.2.00026.

37. Ötting, Karlis. Football tracking data: a copula-based hidden Markov model for classification of tactics in football. Ann Oper Res 2023; 325(1): 167–183. DOI: 10.1007/s10479-022-04660-0.