From data to prevention: A systematic review of artificial intelligence applications in sports injury prediction

Abstract

Purpose

Artificial intelligence (AI) shows considerable potential for sports injury prediction, yet a comprehensive methodological review of its empirical applications remains limited. This study aimed to systematically review the empirical literature on the use of AI and machine learning (ML) for sports injury prediction.

Methods

Following the PRISMA 2020 guidelines, a systematic search was conducted across PubMed, IEEE Xplore, SPORTDiscus, Web of Science, and Scopus for literature published between January 2015 and March 2026. In addition to the structured database search, a small number of database-recommended articles identified through platform recommendation functions were also screened for eligibility. After a multi-stage screening process, 18 empirical studies were included in the final qualitative synthesis.

Results

Risk of bias assessment using PROBAST indicated that only one study (5.6%) had a low overall risk of bias, whereas most studies were judged to be high risk, primarily because of weaknesses in the analysis domain and reliance on internal validation. Across the included studies, AI-based models demonstrated potential for handling multidimensional training load, physiological, biomechanical, and psychological data; however, most prediction models relied exclusively on internal validation, limiting confidence in their generalizability.

Conclusion

AI demonstrates clear promise for sports injury prediction, but the current evidence base remains constrained by limited external validation, inconsistent reporting practices, and the continued overrepresentation of male cohorts. Future research should prioritize methodological rigor, broader geographic and demographic representation, standardized reporting, and explainable AI (XAI) approaches to enhance the trustworthiness and practical utility of these models for coaches and clinicians.

Keywords

Artificial intelligence; injury prediction; machine learning; sports injury

1. Introduction

Sports injuries are a prevalent challenge inherent to athletic training and competition. They can cause physical pain and functional limitations and may also lead to long-term psychological consequences. Furthermore, injuries can disrupt participation, necessitate tactical adjustments, and reduce a team's competitive advantage. Beyond their performance-related impact, sports injuries also impose substantial economic and resource burdens. Goodlin et al. reported that annual medical expenditures for sports injuries in the United States amount to at least $160 billion and that, between 2008 and 2013, Major League Baseball lost approximately $1.6 billion in salaries due to player injuries.¹ Edouard et al. further found that approximately two-thirds of track and field athletes sustain at least one injury per season and that around 100 injuries occur per 1000 registered athletes during international championships.² These findings underscore that sports injuries remain a significant and persistent challenge in competitive sports.

Recent research suggests that the relationship between psychological state and sports injury may be bidirectional. Not only can injuries negatively affect mental health, but an athlete's psychological state—such as stress, anxiety, mood fluctuations, and sleep quality—may also contribute to injury risk. These psychological factors may indirectly influence concentration, decision-making, and neuromuscular control, thereby increasing the likelihood of injury occurrence.

Accordingly, recent studies have begun to incorporate subjective psychological indicators into AI-based prediction models, integrating these variables with objective physiological and training-load data to establish more comprehensive injury risk prediction systems. For instance, Bergeron et al. incorporated psychological variables such as sleep quality and academic stress as key features in a machine learning model for injury risk prediction, highlighting the potential value of integrating psychological data into sports injury forecasting.³ Sports injuries are widely regarded as a key factor hindering sustained training and performance development⁴; therefore, effective prevention has become a major focus in sports medicine and sports science.

Despite the growing emphasis on injury prevention, traditional prediction and assessment methods face substantial practical challenges. Sports injuries are often sudden and multifactorial, with etiologies involving complex interactions among intrinsic characteristics, extrinsic exposures, and inciting events, making accurate assessment based on single variables difficult. Furthermore, coaches often rely on intuition and experience, while athletes may overestimate or underestimate their physical condition during self-assessment, leading to subjective data bias.⁵ Traditional statistical models, such as linear regression, may be less effective in capturing the multifactorial, non-linear, and dynamic mechanisms of injury risk, and may struggle to adequately model the complex interactions among workload, recovery, biomechanics, and psychological status. For example, factors such as the acute:chronic workload ratio, sleep quality, and psychological stress are highly interrelated and may influence injury risk in ways that are difficult for linear models to fully capture.^6,7,8 Van Eetvelde et al. similarly noted that injury prediction requires the integration of intrinsic and extrinsic risk factors with inciting events to build more explanatory risk models.⁹ In the sports injury literature, and particularly in football-related research, recent methodological reviews have further emphasized that current injury prediction studies remain highly heterogeneous in terms of variable selection, data imbalance handling, validation approaches, and performance reporting, thereby complicating interpretation and limiting robust comparison across studies.
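The acute:chronic workload ratio is itself a derived feature, and its value depends on how it is computed. For example, one included study (Ren et al.) used an exponentially weighted moving average (EWMA) variant. The following is a minimal sketch of two common formulations. It assumes daily load values in arbitrary units (AU), a 7-day acute window, and a 28-day chronic window. The rolling-average version is the "coupled" form, in which the acute week is part of the chronic window. The EWMA decay constant λ = 2/(N + 1) is a commonly used choice. Seeding both averages with the first day's load is an assumption, and the function names are illustrative.

```python
import numpy as np


def rolling_acwr(daily_load, acute=7, chronic=28):
    """Coupled rolling-average ACWR: mean of the last `acute` days / mean of the last `chronic` days."""
    load = np.asarray(daily_load, dtype=float)
    ratio = np.full(load.shape, np.nan)  # undefined until a full chronic window exists
    for t in range(chronic - 1, len(load)):
        acute_mean = load[t - acute + 1 : t + 1].mean()
        chronic_mean = load[t - chronic + 1 : t + 1].mean()
        if chronic_mean > 0:
            ratio[t] = acute_mean / chronic_mean
    return ratio


def ewma_acwr(daily_load, acute=7, chronic=28):
    """EWMA-based ACWR with decay constant lambda = 2 / (N + 1) for each window length N."""
    load = np.asarray(daily_load, dtype=float)
    lam_a, lam_c = 2 / (acute + 1), 2 / (chronic + 1)
    ewma_a = np.empty_like(load)
    ewma_c = np.empty_like(load)
    ewma_a[0] = ewma_c[0] = load[0]  # assumption: seed both averages with the day-1 load
    for t in range(1, len(load)):
        ewma_a[t] = lam_a * load[t] + (1 - lam_a) * ewma_a[t - 1]
        ewma_c[t] = lam_c * load[t] + (1 - lam_c) * ewma_c[t - 1]
    ratio = np.full(load.shape, np.nan)
    np.divide(ewma_a, ewma_c, out=ratio, where=ewma_c > 0)  # leave NaN where chronic load is 0
    return ratio


# Hypothetical athlete: 4 weeks at 300 AU/day, then one week at 600 AU/day.
loads = [300.0] * 28 + [600.0] * 7
print(round(rolling_acwr(loads)[-1], 2), round(ewma_acwr(loads)[-1], 2))
```

The two formulations respond differently to the same load spike, because the EWMA weights recent days more heavily. This is one reason ACWR-based predictors are not directly comparable across studies unless the computation is reported.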

Against this background, AI has demonstrated significant potential in sports medicine by offering new opportunities for sports injury prediction. Through high-performance computing and machine learning techniques, AI can learn risk patterns from large historical datasets and model non-linear relationships that are difficult to detect using conventional approaches.¹⁰ Common algorithms, such as Random Forest and Support Vector Machine, are capable of handling multidimensional and highly interdependent variables to build predictive risk models that may support medical teams and coaches in making timely intervention decisions. Bahr and Krosshaug emphasized the need for a multifactorial approach that integrates risk factors and inciting mechanisms, and this complexity supports the application of advanced computational methods in injury prediction.¹¹ In parallel, recent biomechanics research has highlighted the increasing importance of objective movement analysis, wearable technologies, and neuromuscular assessment for enhancing sports performance, mitigating injury risks, and optimizing rehabilitation, further supporting the integration of biomechanical and physiological data into AI-based injury prediction frameworks.
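The following is a minimal sketch of why non-linear learners can help, using entirely synthetic data. It assumes a hypothetical U-shaped relationship in which injury risk rises when a workload ratio is either too low or too high. It also includes an uninformative "sleep" feature as noise. A main-effects logistic regression cannot represent the U shape, whereas a Random Forest can. The data, thresholds, and probabilities are invented for illustration and are not taken from any included study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 2000
acwr = rng.uniform(0.0, 2.0, n)          # hypothetical workload ratio
sleep = rng.normal(7.0, 1.0, n)          # hypothetical noise feature (hours)
outside_band = np.abs(acwr - 1.0) > 0.6  # risk rises at both extremes (U shape)
p_injury = np.where(outside_band, 0.6, 0.05)
y = rng.binomial(1, p_injury)
X = np.column_stack([acwr, sleep])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
lr_auc = cross_val_score(LogisticRegression(), X, y, cv=cv, scoring="roc_auc").mean()
rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0)
rf_auc = cross_val_score(rf, X, y, cv=cv, scoring="roc_auc").mean()

print(f"logistic regression AUC: {lr_auc:.2f}")
print(f"random forest AUC:       {rf_auc:.2f}")
```

A logistic regression with a squared term would also capture this particular shape. The advantage of tree ensembles is that such terms do not have to be specified in advance.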

Building on this rationale, numerous recent studies have attempted to apply AI and ML techniques to sports injury risk prediction and have shown considerable potential. Previous reviews have examined the broader application of AI and ML in sports medicine or injury-related research; however, many have combined heterogeneous tasks such as injury diagnosis, image-based classification, rehabilitation assessment, and risk prediction. This makes it difficult to isolate the methodological strengths and limitations of AI-based models specifically intended for prospective sports injury prediction. The existing literature also faces several limitations, including constrained application scenarios, inconsistent metric definitions, limited external validation, and a lack of comprehensive synthesis. In addition, the current evidence base appears to be geographically concentrated, with some football regions and populations remaining underrepresented in AI-based injury prediction research, which may restrict the global generalizability of currently available models. Accordingly, the present review differs from previous reviews by focusing exclusively on empirical studies that used AI or ML for prospective or prognostic sports injury prediction, while excluding diagnostic imaging and purely classification-based studies. This review also updates the evidence through March 2026, synthesizes model types, predictor domains, validation strategies, and performance metrics, and evaluates the methodological quality of included prediction models using PROBAST. By doing so, this review seeks to provide a focused and systematic assessment of both the predictive potential and the current methodological limitations of AI-based injury prediction models in sports settings, while helping practitioners better understand the applied boundaries of AI in sports medicine.

2. Methods

This study was conducted as a systematic review to identify, evaluate, and synthesize empirical research on the application of AI and ML in sports injury prediction. A structured review approach was used to improve the transparency, reproducibility, and consistency of study identification, screening, and synthesis.

2.1 Search strategy

The literature search strategy for this study was adapted from keywords used in previous high-quality systematic reviews,^12,13 with modifications based on the current research question. The final search terms were developed through team discussion and expert consultation to ensure both breadth and specificity.

The search strategy was designed to retrieve studies specifically focused on sports injury prediction using artificial intelligence (AI) or machine learning (ML). To improve search sensitivity and precision, Medical Subject Headings (MeSH) were used in PubMed, and equivalent controlled vocabulary terms were applied where appropriate in other indexed databases. The search terms were structured around three conceptual domains:

AI and predictive modeling: Including terms such as “Artificial Intelligence,” “Machine Learning,” “Deep Learning,” “Neural Networks,” and “Predictive Analytics.”

Sports injury outcomes: Including terms such as “Athletic Injuries,” “Sports Injuries,” “Musculoskeletal Injury,” “Injury Risk,” “Risk Assessment,” and “Injury Forecasting.”

Monitoring and biomechanical context: Including terms such as “Biosensing Techniques,” “Wearable Electronic Devices,” “Biomechanical Phenomena,” “Sensor,” “GPS,” “IMU,” “Force Plate,” and “Kinematic.”

To improve specificity and maintain a focus on predictive models rather than diagnostic classification, negative filters (NOT logic) were applied to exclude diagnostic imaging and purely clinical identification studies, such as MRI-, CT-, or fracture-classification-based studies. Full database-specific search strategies are provided in Supplementary Table S1.
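The following sketch shows how the three conceptual domains and the NOT filter can be composed into a Boolean query string. It assumes the usual convention of OR-ing synonyms within a domain and AND-ing domains together. This section does not state how the third domain was combined, so AND-ing all three domains is an assumption. The exclusion terms are likewise hypothetical stand-ins, and the database-specific syntax actually used is given in Supplementary Table S1.

```python
def or_block(terms):
    """Quote each term and join synonyms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(domains, exclusions=()):
    """AND the OR-blocks of each conceptual domain, then apply a NOT filter."""
    query = " AND ".join(or_block(terms) for terms in domains)
    if exclusions:
        query = f"({query}) NOT {or_block(exclusions)}"
    return query


ai_terms = ["Artificial Intelligence", "Machine Learning", "Deep Learning",
            "Neural Networks", "Predictive Analytics"]
injury_terms = ["Athletic Injuries", "Sports Injuries", "Musculoskeletal Injury",
                "Injury Risk", "Risk Assessment", "Injury Forecasting"]
context_terms = ["Biosensing Techniques", "Wearable Electronic Devices",
                 "Biomechanical Phenomena", "Sensor", "GPS", "IMU",
                 "Force Plate", "Kinematic"]
exclusion_terms = ["MRI", "CT", "fracture classification"]  # hypothetical NOT terms

query = build_query([ai_terms, injury_terms, context_terms], exclusion_terms)
print(query)
```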

The search strategy was systematically applied across PubMed, IEEE Xplore, Web of Science, Scopus, and SPORTDiscus. The search period covered January 2015 to March 2026 in order to capture recent developments in AI-based sports injury prediction. In addition to the structured database search, a small number of database-recommended articles identified through platform recommendation functions were screened for eligibility.

2.2 Inclusion criteria

Studies were included if they met the following criteria: (1) they were peer-reviewed journal articles with full text available; (2) they were published in English; (3) they explicitly applied AI or ML methods; (4) they focused on sports injury prediction, including prospective or prognostic prediction of injury occurrence or related outcomes; and (5) the study participants were human athletes or comparable populations engaged in structured physical training. Studies based on animal models or simulated data were not included.

2.3 Exclusion criteria

To enhance the relevance and quality of the included literature, studies were excluded if they met any of the following criteria: (1) publication types such as reviews, editorials, book chapters, dissertations, or theses; (2) non-English publications; (3) studies that did not explicitly apply AI or ML methods, or were not directly related to sports injury prediction as a prospective or prognostic task; (4) studies involving animal models or simulated data rather than human participants; and (5) articles for which the full text was unavailable for screening and quality assessment.

2.4 Selection of sources of evidence

The literature screening process was conducted independently by two researchers. First, all retrieved records were screened by title and abstract based on the predefined inclusion and exclusion criteria. Full-text articles that passed the initial screening were then reviewed by both researchers to confirm their eligibility. Any disagreements during the screening process were resolved through review and arbitration by a senior expert in the field to reach a final consensus.

The conduct and reporting of this systematic review followed the PRISMA 2020 statement.¹⁴

2.5 Data extraction

During the data extraction phase, key study characteristics were recorded, including injury type, primary study aim, sample size, participant demographics, sport type, data source, AI/ML model type, study design, country of origin, and publication year. Performance metrics such as accuracy, precision, recall, sensitivity, specificity, and area under the curve (AUC) were also extracted where available. When multiple metrics or time points were reported, those identified by the original authors as primary or most relevant to the study aim were prioritized. Missing or unclear data were recorded as “not reported,” and no assumptions were made to infer unavailable values.
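The extracted metrics are linked by simple definitions. These definitions explain why some rows of Table 2 are not contradictory, for example an accuracy of 85.6% reported alongside a recall of 0.0. The following is a minimal sketch. It computes the threshold-based metrics from a 2×2 confusion matrix and computes AUC as the Mann-Whitney probability that a randomly chosen injured case receives a higher score than a randomly chosen uninjured one. The counts and scores in the usage lines are hypothetical.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from a 2x2 confusion matrix; NaN when a ratio is undefined."""
    def ratio(num, den):
        return num / den if den else math.nan

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall and sensitivity are the same quantity
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC: share of (injured, uninjured) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical cohort: 10 of 100 athletes injured, and the model flags nobody.
print(confusion_metrics(tp=0, fp=0, fn=10, tn=90))
print(auc_mann_whitney([0.9, 0.4], [0.3, 0.4]))
```

In the hypothetical cohort, accuracy is 0.90 even though no injury is detected. For this reason, recall, precision, and threshold-free metrics such as AUC were extracted alongside accuracy.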

2.6 Quality assessment

The risk of bias and applicability of the included prediction models were assessed using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).¹⁵ PROBAST was specifically developed to assess the risk of bias and applicability of studies that develop, validate, or update multivariable prediction models.^15,16 The tool is organized into four key domains: (1) participants, (2) predictors, (3) outcome, and (4) analysis.¹⁵ These domains comprise 20 signaling questions that guide structured and transparent judgments of risk of bias.¹⁶ Following PROBAST guidance, each domain was rated as ‘low’, ‘high’, or ‘unclear’ risk of bias, and an overall high risk of bias was assigned if at least one domain was judged to be at high risk.^15,16
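The aggregation from domain ratings to an overall judgement can be written as a short rule. The text above states the "high" branch. The "unclear" and "low" branches follow the published PROBAST guidance: the overall rating is unclear when at least one domain is unclear and none is high, and low only when all four domains are low. The class and function names are hypothetical, and this is a sketch of the rule rather than a tool used in this review.

```python
from enum import Enum


class Rob(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


PROBAST_DOMAINS = ("participants", "predictors", "outcome", "analysis")


def overall_risk_of_bias(ratings):
    """Combine the four PROBAST domain judgements into an overall judgement."""
    missing = [d for d in PROBAST_DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    values = [ratings[d] for d in PROBAST_DOMAINS]
    if Rob.HIGH in values:
        return Rob.HIGH     # any high-risk domain makes the model high risk overall
    if Rob.UNCLEAR in values:
        return Rob.UNCLEAR  # no high domain, but at least one unclear
    return Rob.LOW          # all four domains low


example = {"participants": Rob.LOW, "predictors": Rob.LOW,
           "outcome": Rob.LOW, "analysis": Rob.HIGH}
print(overall_risk_of_bias(example).value)
```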

3. Results

Based on the aforementioned literature screening process, a total of 18 eligible empirical studies were included in this review. The following sections provide a comprehensive analysis of their study characteristics, applied AI/ML technologies, model predictive performance, and identified physiological and biomechanical risk factors associated with sports injuries.

3.1 Study selection

The systematic search was finalized in March 2026, covering literature published between January 2015 and March 2026. A total of 955 potentially relevant records were retrieved from PubMed (n = 645), IEEE Xplore (n = 122), Web of Science (n = 51), Scopus (n = 76), SPORTDiscus (n = 55), and other methods (database-recommended articles; n = 6). After an initial screening of titles and abstracts, 924 articles were excluded. The remaining 31 articles underwent a full-text review, of which 13 were excluded based on predefined criteria (e.g., methodological guidelines or commentaries, qualitative surveys/perceptions, and imaging-based or non-sports dataset studies). Ultimately, 18 empirical studies were included in the final analysis (Figure 1).
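The flow counts reported above can be checked for internal consistency with a few lines of arithmetic. Following the text, the sketch does not model a separate duplicate-removal step. If duplicates were removed before title and abstract screening, that step would need its own line.

```python
records = {
    "PubMed": 645, "IEEE Xplore": 122, "Web of Science": 51,
    "Scopus": 76, "SPORTDiscus": 55, "Other methods": 6,
}
identified = sum(records.values())
excluded_title_abstract = 924
full_text_reviewed = identified - excluded_title_abstract
excluded_full_text = 13
included = full_text_reviewed - excluded_full_text

print(identified, full_text_reviewed, included)  # prints: 955 31 18
```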

Figure 1.

PRISMA flow diagram.

3.2 Risk of bias assessment

The risk of bias assessment for the 18 included studies is summarized in Table 1. Based on the PROBAST criteria, only one study (5.6%) was classified as having a low overall risk of bias.¹⁷ The remaining 17 studies (94.4%) were classified as high risk, mainly because of limitations in the Analysis domain.

Table 1.
PROBAST risk of bias assessment.

Specifically, most risk prediction models relied exclusively on internal validation (e.g., k-fold cross-validation) without testing on an external, independent dataset, which often leads to optimistic performance estimates. A further methodological concern was the lack of detail regarding data splitting; many studies did not report whether stratification was used or how temporal leakage was prevented in longitudinal training-load datasets. Additionally, reporting quality was generally low: only two studies^17,30 provided 95% confidence intervals (CIs) for performance metrics, which limits the assessment of model precision.
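Both weaknesses described above are straightforward to address in code. The following is a minimal illustration and does not reconstruct any included study's pipeline. It assumes a hypothetical long-format dataset with several rows per athlete. It uses scikit-learn's GroupKFold so that no athlete contributes rows to both the training and the test folds, then reports a percentile bootstrap 95% CI for AUC by resampling whole athletes. Grouping by athlete prevents athlete-level leakage but not temporal leakage. For longitudinal load data, a forward-chaining split such as scikit-learn's TimeSeriesSplit is one common complement. The logistic regression model and all data are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold


def grouped_oof_probabilities(X, y, groups, n_splits=5):
    """Out-of-fold injury probabilities; all rows of one athlete stay in the same fold."""
    oof = np.full(len(y), np.nan)
    for train_idx, test_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        model = LogisticRegression().fit(X[train_idx], y[train_idx])
        oof[test_idx] = model.predict_proba(X[test_idx])[:, 1]
    return oof


def cluster_bootstrap_auc_ci(y, scores, groups, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling whole athletes with replacement."""
    if y.min() == y.max():
        raise ValueError("AUC needs both injured and uninjured observations")
    rng = np.random.default_rng(seed)
    athletes = np.unique(groups)
    rows = {a: np.flatnonzero(groups == a) for a in athletes}
    aucs = []
    while len(aucs) < n_boot:
        drawn = rng.choice(athletes, size=len(athletes), replace=True)
        idx = np.concatenate([rows[a] for a in drawn])
        if y[idx].min() == y[idx].max():
            continue  # this resample lacks one class, so AUC is undefined
        aucs.append(roc_auc_score(y[idx], scores[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y, scores), lower, upper


# Hypothetical data: 60 athletes x 20 weekly observations, one informative feature.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(60), 20)
X = rng.normal(size=(groups.size, 2))
y = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * X[:, 0] - 2.0))))

scores = grouped_oof_probabilities(X, y, groups)
auc, lower, upper = cluster_bootstrap_auc_ci(y, scores, groups)
print(f"AUC {auc:.2f} [95% CI {lower:.2f} to {upper:.2f}]")
```

Resampling athletes rather than individual rows respects the correlation between repeated observations of the same person. Resampling rows would tend to make the interval too narrow.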

3.3 Study characteristics and model performance

The characteristics and performance of the 18 included studies are presented in Table 2. Consistent with the scope of this review, all included studies focused on prospective or prognostic prediction models that aimed to forecast injury occurrence or related outcomes (e.g., recovery time after sport-related concussion).

Table 2.
Characteristics and performance of included studies on AI-based sports injury prediction.

Author (Year) Population (N, level, sex) Injury / outcome definition Data source / predictors Model algorithm Validation strategy Performance [95% CI]

Bergeron et al.³
N = 1611

High school student-athletes

(Mixed)

Sport-related concussion

Symptom resolution time

(prediction thresholds at 7, 14, and 28 days)
NATION injury surveillance data; 17-item yes/no symptom checklist, injury circumstances, sex, class year, and other injury-related variables Naive Bayes
Internal

(10-fold cross-validation, repeated 10 times)

AUC: 0.727 [NR]

(Naive Bayes; 7-day threshold)

Chu et al.²⁰
N = 655

Pediatric Patients

(362 Male, 293 Female)

Protracted Recovery

(>21 days to medical clearance)
VOMS, King-Devick test, C3 Logix Trails Test CatBoost
Internal

(65% training, 15% validation, 20% testing; model comparison results generated using 5-fold cross-validation)

AUC (Male): 0.84 [NR]

AUC (Female): 0.78 [NR]

Jauhiainen et al.¹⁸
N = 314

Youth Basketball & Floorball

(152 Male, 162 Female)
Moderate to severe acute non-contact knee and ankle injuries; time-loss at least 8 days Physical test data including 3D motion analysis (VDJ), anthropometrics, strength, joint laxity, flexibility, and other physical measures L1-regularized logistic regression
Internal

(10-fold cross-validation, repeated 100 times)

AUC: 0.65 [NR]

Random Forest AUC: 0.63 [NR]

Lu et al.¹⁷
N = 2103

NBA athletes
Time-loss lower extremity muscle strains (LEMSs); defined as major muscle strains leading to loss of playing time, including hamstring, quadriceps, calf, and groin strains Publicly sourced online platforms; demographic characteristics, prior injury history, and performance metrics XGBoost
Internal

(0.632 bootstrap, 1000 resamples; model training used 10-fold CV repeated 3 times)

AUC: 0.840 [0.831–0.845]

Brier score: 0.030

Rommers et al.¹⁹
N = 734

Elite-level youth football players; U10–U15 (Male)
Medical attention injury; first occurring injury during the season was used in the analysis; injuries were also classified as overuse or acute Preseason anthropometric, motor coordination, physical fitness, and demographic measures Extreme Gradient Boosting (XGBoost)
Internal

(random split with 80% training data and 20% hold-out test data; model also evaluated with cross-validation and grid search for hyperparameter optimization)

Precision: 85% [NR]

Recall: 85% [NR]

F1: 85% [NR]

Alzahrani et al.²²
N = 50

28 Amateur, 22 Professional (incl. 5 National-level)

(70% Male, 30% Female)
Real-time injury-risk assessment and rehabilitation tracking / optimization; early detection of emerging ACL and muscle-strain risks Wearable IMUs and sEMG; joint angles, angular velocities, muscle forces, and asymmetry-related biomechanical features Hybrid IMU–sEMG framework with LSTM-based dynamic modeling
Internal

(Subject-wise 10-fold cross-validation)

AUC: 0.93 [NR]

Accuracy: 92.3%[NR]

Precision: 88.1% [NR]

Recall: 90.5%[NR]

Weng et al.²¹
N = 98

Baseball players spanning youth to professional levels

(Male)
Baseball-related upper extremity injuries during a baseball season; shoulder or elbow complaints causing the player to miss at least one practice or game; non-baseball and non-throwing injuries were excluded Clinical parameters including glenohumeral internal/external rotation, posterior capsule thickness, supraspinatus tendon thickness, acromiohumeral distance, and occupation ratio GIRD, Logistic Regression, Random Forest, SVM, and CatBoost;
Internal

(Train/test split (80:20); bootstrapping (100 rounds) plus SMOTE; final model validated on an independent test set)

Bootstrapping (CatBoost):

AUC 0.66 ± 0.05;

Accuracy 0.70 ± 0.05; Precision 0.52 ± 0.09; Recall 0.55 ± 0.09; F1-score 0.53 ± 0.08; Specificity 0.77 ± 0.07

Independent test set (CatBoost): AUC 0.62;

Accuracy 0.70; Precision 0.33;

Recall 0.50;

F1-score 0.40;

Specificity 0.75;

Castellanos et al.³⁰
N = 15,682

Participants from 21 US academic institutions and military service academies during the 2015–2016 academic year

(65.8% Male, 34.2% Female)
Clinician-diagnosed sport-related concussion sustained during sport participation between Aug 1, 2015 and Jul 31, 2016 Baseline CARE Consortium data; 176 baseline covariates mapped to 957 binary features, including demographics, medical/sport/academic/family history, SCAT symptom checklist, SAC, BESS, computerized neurocognitive assessment, and other baseline measures Linear Support Vector Machine (SVM)
Internal

(Repeated train/test splitting; approximately 80% training / 20% held-out test set, repeated 20,000 times)

AUROC: 0.73 [0.70–0.76]

Jauhiainen et al.²⁸
N = 791

Female elite handball and soccer players

(Female)
Noncontact / indirect contact ACL injuries Extensive preseason screening data including demographic, neuromuscular, biomechanical, anatomic, and genetic variables; 3D motion analysis, anthropometrics, strength, flexibility, balance, questionnaire/history data; 283 variables Linear support vector machine (SVM) was the best-performing model
Internal

(5-fold cross-validation, repeated 100 times, with permutation tests)

AUC-ROC: 0.63 [NR]

Saberisani et al.²⁹
N = 25

Professional soccer players from one team in the Persian Gulf Pro League (Iran)

(Male)
Any collision or non-collision injury that prevented a player from participating in at least one training session or match GPS-based external load variables; acute-to-chronic workload ratio (ACWR) derived from: total distance covered, average total distance covered, distances covered at high and moderate speeds, total distance load covered, accelerations, decelerations Decision Tree Classifier
Internal

(Random split: 80% training / 20% testing)

Best AUC model: Decelerations — AUC: 0.91 [NR]

Accuracy: 94.7% [NR]

Precision: 58.3% [NR]

Recall: 87.5% [NR]

Claros et al.²³
N = 194

Collegiate student-athletes (NCAA) from a single institution

(73 Male, 121 Female)
Subsequent musculoskeletal (MSK) injury risk within 1 year after return to play (RTP) following sport-related concussion (SRC); MSK injury was defined as an injury requiring treatment by athletic training staff or team physicians and resulting in at least 1 day of limited activity Single-institution cohort data; predictors included demographics and anthropometrics, medical and athletic history, concussion injury characteristics, and multidomain concussion assessments collected at baseline, acute (<48 h), asymptomatic, and RTP time points; 135 variables entered into analysis Weight of evidence (WoE) transformation + L1-penalized logistic regression for feature selection + final L2-regularized logistic regression
Internal

(Stratified random split into training set (n = 155) and test set (n = 39) using a held-out test set; additional stability analysis across 20 random training-test splits)

AUC: 0.82 [NR]

Average Precision: 0.85 [NR]

Precision (PPV): 95% [NR]

Sensitivity/Recall: 79% [NR] at a false-positive rate of 6.67%;

Haller et al.²⁷
N = 25

Elite under-18 academy / elite European youth soccer players

(Male)
Non-contact injury prediction; injury statistics included time-loss and medical-attention injuries, but only non-contact injuries were included in the current analysis Comprehensive monitoring approach including training and game data / training load, blood biomarkers, questionnaires, neuromuscular performance (CMJ), hamstring strength, hip adductor/abductor strength, and fitness-related variables Linear Support Vector Machine (SVM)
Internal

(18 players randomly selected for training and 5 for testing; model trained with two-fold cross-validation)

Accuracy: 96.3% [NR]

Precision: 11.1% [NR]

Recall: 25.0% [NR]

Cohen's Kappa: 0.138 [NR]

Merrigan et al.²⁶
N = 155

NCAA Division I female athletes (basketball, volleyball, soccer, lacrosse, and field hockey)

(Female)
Noncontact lower-extremity injuries occurring within 3 months following baseline CMJ testing; occurred during practice or competition; resulted in modified or unavailable training status according to medical staff Injury surveillance records and baseline countermovement jump (CMJ) force-time metrics collected using dual force plates (ForceDecks); predictors included previous injury history, jump height, eccentric mean power, and minimum eccentric force Logistic regression
Internal

(80:20 train-test split)

ROC AUC: 0.659 [NR]

Accuracy: 85.6% [NR]

Recall: 0.0 [NR]

Ren et al.²⁵
N = 63

Professional rugby union players

(Male)
Noncontact injuries / time-loss injury prediction GPS data and derived metrics: total workload in the 1, 2, and 3 weeks prior to injury, EWMA ACWR over different time windows, monotony, and strain Random Forest (RF) (best for forwards overall)
Internal

(10-fold cross-validation; 70/30 train-test split; process repeated 20 times)

AUC: 0.86 [NR]

Precision: 0.74 [NR]

F1: 0.61 [NR]

Tabben et al.²⁴
N = 1258

Adult male professional footballers from Qatar Stars League / domestic professional football

(Male)
Subsequent injury defined as any new time-loss injury occurring to the same player within the same competitive season as the index injury; contact injuries excluded in the analytic dataset Prospectively recorded time-loss injury surveillance data across 8 seasons (July 2013–May 2021); variables included diagnosis, onset, severity, injury type, body part, index injury / re-injury status, training vs match injury, and contact vs non-contact First-order Markov model NR
Subsequent injuries: 34.0% (1599/4700)

Hamstring→ Hamstring: 7.5% (±1.3%)

Groin→ Hamstring: 2.9% (±0.82%)

Tsilimigkras et al.³³
N = 25

Professional soccer players;

(Male)
First-time, non-contact muscle injury Internal load (heart rate-related metrics) and external load (GPS/accelerometer-derived metrics: speed, distance, accelerations, decelerations, sprint-related variables) over a 28-day pre-injury epoch, compared with a 28-day baseline epoch Support Vector Machine (SVM) with RBF kernel
Internal

(leave-one-subject-out cross-validation, LOOCV)

AUC: 0.747 [NR]

Accuracy: 0.78 [NR]

Sensitivity: 0.73 [NR]

Specificity: 0.85 [NR]

Cohan et al.³¹
N = 856

Professional basketball players (NBA)

(Male)
Injury classification / injury risk prediction based on past injuries and game activity; injuries attributed to the most recent game within 30 days Publicly available NBA injury and game-related data; predictors included current game features, past 5 games, and past injuries. Examples: coach, player position, age, height, weight, game location, time played, time since game, body area, age during injury, time since injury METIC (Multiple bidirectional Encoder Transformers for Injury Classification); deep learning transformer-based model
Internal

(Data split into 85% training / 15% validation using an iterative stratified approach for injury pairs)

ROC AUC: 0.80 [NR]

Accuracy: 93.4% [NR]

Average Precision: 0.0087 [NR]

Recall: 0.20 [NR]

F1: 0.020 [NR]

Guo et al.³²
N = 104

Collegiate basketball players

(Male)
Incidence of ACL injury during 12-month follow-up; injury outcome confirmed clinically by a positive Lachman test and MRI Athletes’ profile, physical functions, basketball-specific skills, biomechanics, and EMG of seven lower-limb muscles during unanticipated side-cutting maneuvers; only variables with significant between-group differences were entered into modeling Random Forest (RF) (among SVM, RF, XGBoost, and logistic regression; RF was the best-performing model)
Internal

(90/10 train-test split + stratified 10-fold cross-validation; model performance also evaluated on an independent test set; SMOTE applied within training subsets)

AUC: 0.7974 ± 0.2273 [NR]

Accuracy: 0.9619 ± 0.0376 [NR]

Precision: 0.6667 ± 0.4714 [NR]

Sensitivity: 0.6021 ± 0.4595 [NR]

F1: 0.6133 ± 0.4431 [NR]

Specificity: 0.9947 ± 0.0166 [NR]

Author (Year)	Population (N, level, sex)	Injury / outcome definition	Data source / predictors	Model algorithm	Validation strategy	Performance [95% CI]
Bergeron et al.³	N = 1611 High school student-athletes (Mixed)	Sport-related concussion Symptom resolution time (prediction thresholds at 7, 14, and 28 days)	NATION injury surveillance data; 17-item yes/no symptom checklist, injury circumstances, sex, class year, and other injury-related variables	Naive Bayes	Internal (10-fold cross-validation, repeated 10 times)	AUC: 0.727 [NR] (Naive Bayes; 7-day threshold)
Chu et al.²⁰	N = 655 Pediatric Patients (362 Male, 293 Female)	Protracted Recovery (>21 days to medical clearance)	VOMS, King-Devick test, C3 Logix Trails Test	CatBoost	Internal (65% training, 15% validation, 20% testing; model comparison results generated using 5-fold cross-validation)	AUC (Male): 0.84 [NR] AUC (Female): 0.78 [NR]
Jauhiainen et al.¹⁸	N = 314 Youth Basketball & Floorball (152 Male, 162 Female)	Moderate to severe acute non-contact knee and ankle injuries; time-loss at least 8 days	Physical test data including 3D motion analysis (VDJ), anthropometrics, strength, joint laxity, flexibility, and other physical measures	L1-regularized logistic regression	Internal (10-fold cross-validation, repeated 100 times)	AUC: 0.65 [NR] Random Forest AUC: 0.63 [NR]
Lu et al.¹⁷	N = 2103 NBA athletes	Time-loss lower extremity muscle strains (LEMSs); defined as major muscle strains leading to loss of playing time, including hamstring, quadriceps, calf, and groin strains	Publicly sourced online platforms; demographic characteristics, prior injury history, and performance metrics	XGBoost	Internal (0.632 bootstrap, 1000 resamples; model training used 10-fold CV repeated 3 times)	AUC: 0.840 [0.831–0.845] Brier score: 0.030
Rommers et al.¹⁹	N = 734 Elite-level youth football players; U10–U15 (Male)	Medical attention injury; first occurring injury during the season was used in the analysis; injuries were also classified as overuse or acute	Preseason anthropometric, motor coordination, physical fitness, and demographic measures	Extreme Gradient Boosting (XGBoost)	Internal (random split with 80% training data and 20% hold-out test data; model also evaluated with cross-validation and grid search for hyperparameter optimization)	Precision: 85% [NR] Recall: 85% [NR] F1: 85% [NR]
Alzahrani et al.²²	N = 50 Athletes: 28 Amateur, 22 Professional (incl. 5 National-level) (70% Male, 30% Female)	Real-time injury-risk assessment and rehabilitation tracking / optimization; early detection of emerging ACL and muscle-strain risks	Wearable IMUs and sEMG; joint angles, angular velocities, muscle forces, and asymmetry-related biomechanical features	Hybrid IMU–sEMG framework with LSTM-based dynamic modeling	Internal (Subject-wise 10-fold cross-validation)	AUC: 0.93 [NR] Accuracy: 92.3% [NR] Precision: 88.1% [NR] Recall: 90.5% [NR]
Weng et al.²¹	N = 98 Baseball players spanning youth to professional levels (Male)	Baseball-related upper extremity injuries during a baseball season; shoulder or elbow complaints causing the player to miss at least one practice or game; non-baseball and non-throwing injuries were excluded	Clinical parameters including glenohumeral internal/external rotation, posterior capsule thickness, supraspinatus tendon thickness, acromiohumeral distance, and occupation ratio	CatBoost (among GIRD, Logistic Regression, Random Forest, SVM, and CatBoost; CatBoost was the best-performing model)	Internal (80:20 train/test split; bootstrapping (100 rounds) plus SMOTE; final model validated on an independent test set)	Bootstrapping (CatBoost): AUC 0.66 ± 0.05; Accuracy 0.70 ± 0.05; Precision 0.52 ± 0.09; Recall 0.55 ± 0.09; F1-score 0.53 ± 0.08; Specificity 0.77 ± 0.07. Independent test set (CatBoost): AUC 0.62; Accuracy 0.70; Precision 0.33; Recall 0.50; F1-score 0.40; Specificity 0.75
Castellanos et al.³⁰	N = 15,682 Participants from 21 US academic institutions and military service academies during the 2015–2016 academic year (65.8% Male, 34.2% Female)	Clinician-diagnosed sport-related concussion sustained during sport participation between Aug 1, 2015 and Jul 31, 2016	Baseline CARE Consortium data; 176 baseline covariates mapped to 957 binary features, including demographics, medical/sport/academic/family history, SCAT symptom checklist, SAC, BESS, computerized neurocognitive assessment, and other baseline measures	Linear Support Vector Machine (SVM)	Internal (Repeated train/test splitting; approximately 80% training / 20% held-out test set, repeated 20,000 times)	AUROC: 0.73 [0.70–0.76]
Jauhiainen et al.²⁸	N = 791 Female elite handball and soccer players (Female)	Noncontact / indirect contact ACL injuries	Extensive preseason screening data including demographic, neuromuscular, biomechanical, anatomic, and genetic variables; 3D motion analysis, anthropometrics, strength, flexibility, balance, questionnaire/history data; 283 variables	Linear support vector machine (SVM) was the best-performing model	Internal (5-fold cross-validation, repeated 100 times, with permutation tests)	AUC-ROC: 0.63 [NR]
Saberisani et al.²⁹	N = 25 Professional soccer players from one team in the Persian Gulf Pro League (Iran) (Male)	Any collision or non-collision injury that prevented a player from participating in at least one training session or match	GPS-based external load variables; acute-to-chronic workload ratio (ACWR) derived from: total distance covered, average total distance covered, distances covered at high and moderate speeds, total distance load covered, accelerations, decelerations	Decision Tree Classifier	Internal (Random split: 80% training / 20% testing)	Best AUC model: Decelerations — AUC: 0.91 [NR] Accuracy: 94.7% [NR] Precision: 58.3% [NR] Recall: 87.5% [NR]
Claros et al.²³	N = 194 Collegiate student-athletes (NCAA) from a single institution (73 Male, 121 Female)	Subsequent musculoskeletal (MSK) injury risk within 1 year after return to play (RTP) following sport-related concussion (SRC); MSK injury was defined as an injury requiring treatment by athletic training staff or team physicians and resulting in at least 1 day of limited activity	Single-institution cohort data; predictors included demographics and anthropometrics, medical and athletic history, concussion injury characteristics, and multidomain concussion assessments collected at baseline, acute (<48 h), asymptomatic, and RTP time points; 135 variables entered into analysis	Weight of evidence (WoE) transformation + L1-penalized logistic regression for feature selection + final L2-regularized logistic regression	Internal (Stratified random split into a training set (n = 155) and a held-out test set (n = 39); additional stability analysis across 20 random training-test splits)	AUC: 0.82 [NR] Average Precision: 0.85 [NR] Precision (PPV): 95% [NR] Sensitivity/Recall: 79% [NR] at a false-positive rate of 6.67%
Haller et al.²⁷	N = 25 Elite under-18 academy / elite European youth soccer players (Male)	Non-contact injury prediction; injury statistics included time-loss and medical-attention injuries, but only non-contact injuries were included in the current analysis	Comprehensive monitoring approach including training and game data / training load, blood biomarkers, questionnaires, neuromuscular performance (CMJ), hamstring strength, hip adductor/abductor strength, and fitness-related variables	Linear Support Vector Machine (SVM)	Internal (18 players randomly selected for training and 5 for testing; model trained with two-fold cross-validation)	Accuracy: 96.3% [NR] Precision: 11.1% [NR] Recall: 25.0% [NR] Cohen's Kappa: 0.138 [NR]
Merrigan et al.²⁶	N = 155 NCAA Division I female athletes (basketball, volleyball, soccer, lacrosse, and field hockey) (Female)	Noncontact lower-extremity injuries occurring within 3 months following baseline CMJ testing; occurred during practice or competition; resulted in modified or unavailable training status according to medical staff	Injury surveillance records and baseline countermovement jump (CMJ) force-time metrics collected using dual force plates (ForceDecks); predictors included previous injury history, jump height, eccentric mean power, and minimum eccentric force	Logistic regression	Internal (80:20 train-test split)	ROC AUC: 0.659 [NR] Accuracy: 85.6% [NR] Recall: 0.0 [NR]
Ren et al.²⁵	N = 63 Professional rugby union players (Male)	Noncontact injuries / time-loss injury prediction	GPS data and derived metrics: total workload in the 1, 2, and 3 weeks prior to injury, EWMA ACWR over different time windows, monotony, and strain	Random Forest (RF) (best for forwards overall)	Internal (10-fold cross-validation; 70/30 train-test split; process repeated 20 times)	AUC: 0.86 [NR] Precision: 0.74 [NR] F1: 0.61 [NR]
Tabben et al.²⁴	N = 1258 Adult male professional footballers from Qatar Stars League / domestic professional football (Male)	Subsequent injury defined as any new time-loss injury occurring to the same player within the same competitive season as the index injury; contact injuries excluded in the analytic dataset	Prospectively recorded time-loss injury surveillance data across 8 seasons (July 2013–May 2021); variables included diagnosis, onset, severity, injury type, body part, index injury / re-injury status, training vs match injury, and contact vs non-contact	First-order Markov model	NR	Subsequent injuries: 34.0% (1599/4700) Hamstring → Hamstring: 7.5% (±1.3%) Groin → Hamstring: 2.9% (±0.82%)
Tsilimigkras et al.³³	N = 25 Professional soccer players (Male)	First-time, non-contact muscle injury	Internal load (heart rate-related metrics) and external load (GPS/accelerometer-derived metrics: speed, distance, accelerations, decelerations, sprint-related variables) over a 28-day pre-injury epoch, compared with a 28-day baseline epoch	Support Vector Machine (SVM) with RBF kernel	Internal (leave-one-subject-out cross-validation, LOOCV)	AUC: 0.747 [NR] Accuracy: 0.78 [NR] Sensitivity: 0.73 [NR] Specificity: 0.85 [NR]
Cohan et al.³¹	N = 856 Professional basketball players (NBA) (Male)	Injury classification / injury risk prediction based on past injuries and game activity; injuries attributed to the most recent game within 30 days	Publicly available NBA injury and game-related data; predictors included current game features, past 5 games, and past injuries. Examples: coach, player position, age, height, weight, game location, time played, time since game, body area, age during injury, time since injury	METIC (Multiple bidirectional Encoder Transformers for Injury Classification); deep learning transformer-based model	Internal (Data split into 85% training / 15% validation using an iterative stratified approach for injury pairs)	ROC AUC: 0.80 [NR] Accuracy: 93.4% [NR] Average Precision: 0.0087 [NR] Recall: 0.20 [NR] F1: 0.020 [NR]
Guo et al.³²	N = 104 Collegiate basketball players (Male)	Incidence of ACL injury during 12-month follow-up; injury outcome confirmed clinically by a positive Lachman test and MRI	Athletes’ profile, physical functions, basketball-specific skills, biomechanics, and EMG of seven lower-limb muscles during unanticipated side-cutting maneuvers; only variables with significant between-group differences were entered into modeling	Random Forest (RF) (among SVM, RF, XGBoost, and logistic regression; RF was the best-performing model)	Internal (90/10 train-test split + stratified 10-fold cross-validation; model performance also evaluated on an independent test set; SMOTE applied within training subsets)	AUC: 0.7974 ± 0.2273 [NR] Accuracy: 0.9619 ± 0.0376 [NR] Precision: 0.6667 ± 0.4714 [NR] Sensitivity: 0.6021 ± 0.4595 [NR] F1: 0.6133 ± 0.4431 [NR] Specificity: 0.9947 ± 0.0166 [NR]

3.3.1 Population diversity and geographic context

The study populations showed a trend toward greater demographic and geographic diversity. While earlier research predominantly featured professional male athletes, more recent studies have expanded to include female and other non-traditional cohorts. For example, Jauhiainen et al. developed an ACL injury prediction model using data from female elite handball and soccer players, whereas Merrigan et al. examined the predictability of noncontact lower-extremity injuries in female NCAA Division I athletes using countermovement jump force-time metrics.^26,28 In addition, the geographic scope of recent studies has broadened beyond the traditionally dominant North American and European settings. For instance, Saberisani et al. investigated football injury prediction in Iranian professional players, Tabben et al. analyzed subsequent injury patterns in professional football players from the Qatar Stars League, and Guo et al. examined ACL injury prediction in male collegiate basketball players in an Asian context.^24,29,32 Furthermore, the competitive scope has expanded beyond elite professionals to include youth and collegiate athletes. Overall, these developments suggest a gradual move toward greater contextual diversity, although male athletes remain overrepresented in the current literature.

3.3.2 Methodological quality and validation strategies

A substantial validation gap persists within the field of injury prediction. Most included studies relied solely on internal validation strategies, such as k-fold cross-validation or random data splitting, rather than external validation in independent cohorts. For example, Jauhiainen et al. explicitly reported repeated cross-validation procedures and emphasized the instability of prediction estimates across repetitions, ultimately concluding that statistical predictive ability did not necessarily translate into clinical usefulness.²⁸ Only a small minority of studies reported more rigorous testing procedures or uncertainty estimates. Castellanos et al., for instance, evaluated concussion prediction using a held-out test set and reported AUROC with 95% confidence intervals, while the study by Lu et al. was one of the few in the review to report confidence intervals for model performance metrics.^17,30 Reporting quality was also often limited, with many studies failing to clearly describe stratification procedures, temporal separation strategies, or safeguards against data leakage in longitudinal datasets. Collectively, these issues suggest that many reported performance metrics may be optimistic and may not reflect real-world generalizability.
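To make the leakage concern concrete, the minimal Python sketch below illustrates one common safeguard for longitudinal datasets: subject-wise (grouped) k-fold splitting, in which every record from a given athlete is assigned to the same fold, so no athlete contributes to both training and testing. Subject-wise cross-validation of this kind was reported by Alzahrani et al.; however, the record layout and the `athlete_id` field here are hypothetical, and the code illustrates the general technique rather than the procedure of any included study.

```python
import random


def grouped_kfold(records, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no athlete appears on both sides.

    `records` is a list of dicts with a hypothetical "athlete_id" key; the
    unit of assignment is the athlete, not the individual row.
    """
    athletes = sorted({r["athlete_id"] for r in records})
    if len(athletes) < k:
        raise ValueError("need at least k distinct athletes")
    random.Random(seed).shuffle(athletes)
    # Assign shuffled athletes to folds round-robin so fold sizes stay balanced.
    fold_of = {a: i % k for i, a in enumerate(athletes)}
    for fold in range(k):
        train_idx = [i for i, r in enumerate(records) if fold_of[r["athlete_id"]] != fold]
        test_idx = [i for i, r in enumerate(records) if fold_of[r["athlete_id"]] == fold]
        yield train_idx, test_idx


# Hypothetical longitudinal data: 10 athletes, each observed over 4 weeks.
records = [{"athlete_id": f"A{a:02d}", "week": w} for a in range(10) for w in range(4)]
folds = list(grouped_kfold(records, k=5))
for train_idx, test_idx in folds:
    train_ids = {records[i]["athlete_id"] for i in train_idx}
    test_ids = {records[i]["athlete_id"] for i in test_idx}
    assert train_ids.isdisjoint(test_ids)  # no athlete leaks across the split
```

By contrast, a row-wise random split of the same data would typically place some weeks from the same athlete in both the training and test sets, allowing the model to benefit from athlete-specific patterns it would not have for a genuinely new athlete.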

3.3.3 Technical frameworks and predictors

To provide a more structured synthesis of the heterogeneous evidence, the included studies can be broadly grouped according to model type, predictor domain, and sport context. In terms of model type, tree-based ensemble methods, including Random Forest, XGBoost, and CatBoost, were commonly used for handling non-linear injury risk factors in tabular datasets involving workload, injury history, clinical, and biomechanical variables. For example, Guo et al. reported that Random Forest achieved the highest predictive performance for ACL injury risk among male basketball players,³² while Weng et al. identified CatBoost as the best-performing model in baseball-related upper extremity injury prediction.²¹ Support Vector Machine and logistic-regression-based approaches were also frequently applied, particularly in smaller datasets or studies emphasizing interpretable classification. Emerging research has also begun to apply deep learning and temporal modeling approaches. Cohan et al., for instance, used a deep learning framework to forecast injuries in NBA basketball based on longitudinal injury and game-related data.³¹

In terms of predictor domains, the included studies generally drew on four overlapping categories: workload and exposure variables, previous injury and clinical history, biomechanical or neuromuscular measures, and subjective or contextual indicators such as sleep quality, academic stress, or questionnaire-based measures. Previous injury history and training load metrics remained among the most frequently reported and consistent predictors across studies. Tsilimigkras et al. further extended this line of work by incorporating both external load and internal load variables, including heart-rate-related measures, into a soccer injury risk model, highlighting the importance of physiological overload alongside mechanical demand.³³ Meanwhile, more complex models increasingly incorporated biomechanical and neuromuscular indicators. Guo et al. integrated sport-specific side-cutting biomechanics and electromyography,³² while Bergeron et al. demonstrated the added value of subjective psychological indicators such as sleep quality and academic stress.³
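Because several included studies derived workload predictors from the acute-to-chronic workload ratio (ACWR), including the EWMA-based ACWR used by Ren et al., a short sketch may help clarify the calculation. It assumes a commonly cited exponentially weighted moving average with decay lambda = 2 / (N + 1), a 7-day acute window, and a 28-day chronic window, and it seeds both averages with the first day's load. These choices are assumptions; the included studies may have used different windows, seeding rules, or load units.

```python
def ewma(loads, span):
    """Exponentially weighted moving average with decay lambda = 2 / (span + 1)."""
    if not loads:
        raise ValueError("loads must be non-empty")
    lam = 2.0 / (span + 1)
    avg = loads[0]  # assumption: seed the average with the first day's load
    out = []
    for load in loads:
        avg = lam * load + (1 - lam) * avg
        out.append(avg)
    return out


def ewma_acwr(loads, acute_span=7, chronic_span=28):
    """Daily acute:chronic workload ratio from two EWMAs (NaN when chronic load is 0)."""
    acute = ewma(loads, acute_span)
    chronic = ewma(loads, chronic_span)
    return [a / c if c > 0 else float("nan") for a, c in zip(acute, chronic)]


# Hypothetical daily loads (arbitrary units): 28 steady days, then a 7-day spike.
daily_load = [400.0] * 28 + [800.0] * 7
acwr = ewma_acwr(daily_load)
print(round(acwr[27], 2), round(acwr[-1], 2))  # prints: 1.0 1.34
```

In this example, doubling the load for one week raises the final ratio to roughly 1.34, because the fast-reacting acute average outpaces the slower chronic average.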

Finally, sport contexts varied across the included evidence, with football/soccer, basketball, floorball, handball, baseball, rugby, and collegiate multi-sport cohorts represented. This structured grouping suggests that model choice was often shaped by data structure and sport context, and that the field is placing increasing emphasis on holistic, multimodal injury prediction frameworks. However, the lack of standardized predictor definitions and validation procedures limits direct comparison across studies.

3.3.4 Model performance metrics

Reported model performance was variable, with AUC values in many studies falling approximately between 0.65 and 0.85, suggesting modest to moderate discriminative ability in most cases. For example, Jauhiainen et al. reported a mean AUC-ROC of 0.63 for ACL injury prediction in female elite athletes, emphasizing that statistically significant prediction above chance may remain insufficient for clinical implementation.²⁸ By contrast, Claros et al. reported an AUC of 0.82 for predicting post-concussion musculoskeletal injury risk in collegiate athletes, suggesting that more comprehensive and clinically structured variable sets may improve predictive discrimination.²³

However, performance metrics should be interpreted cautiously. Merrigan et al. reported a high overall model accuracy of 85.6% but a low AUC of 0.659 and a recall of 0.0%, indicating that no injuries in the testing dataset were correctly classified.²⁶ Similarly, Haller et al. observed apparently high classification accuracy in a holistic monitoring framework, yet also acknowledged low precision and a considerable number of false positives.²⁷ These examples highlight that accuracy alone is insufficient for evaluating injury prediction models, particularly in imbalanced datasets, and that clinically meaningful interpretation requires consideration of recall, precision, AUC, and generalizability.
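The accuracy paradox described above can be reproduced in a few lines of Python. The test set below is entirely hypothetical (100 athletes, 14 of whom are injured), and the "model" simply predicts no injury for everyone. It is not a reconstruction of any included study's data.

```python
import math


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = injured, 0 = not injured)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else math.nan,
        # Precision is undefined when the model flags nobody.
        "precision": tp / (tp + fp) if tp + fp else math.nan,
        "specificity": tn / (tn + fp) if tn + fp else math.nan,
    }


# Hypothetical imbalanced test set: 14 injured and 86 uninjured athletes.
y_true = [1] * 14 + [0] * 86
y_pred = [0] * 100  # a degenerate model that never predicts injury
metrics = classification_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.86, 'recall': 0.0, 'precision': nan, 'specificity': 1.0}
```

Accuracy here simply mirrors the proportion of uninjured athletes. Recall and precision expose the fact that the model identifies none of the athletes it is meant to protect.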

4. Discussion

4.1 Summary of principal findings

This systematic review synthesized 18 empirical studies that applied artificial intelligence and machine learning techniques to sports injury prediction. Overall, the findings suggest that this field has expanded rapidly in recent years, with increasing diversity in study populations, sports contexts, and data sources. Across the included studies, tree-based ensemble methods such as Random Forest, XGBoost, and CatBoost were the most frequently used algorithms, although deep learning approaches have also begun to emerge in more complex or longitudinal datasets.

A central finding of this review is that, despite promising performance in some individual studies, most AI-based sports injury prediction models remain methodologically underdeveloped. Based on the PROBAST assessment, only one study demonstrated a low overall risk of bias, with most judged to be high risk primarily because of weaknesses in the analysis domain, particularly the reliance on internal validation alone. These findings suggest that much of the current literature remains at a proof-of-concept stage rather than being ready for routine clinical or field deployment.

In addition, this review reaffirmed that previous injury history and training load remain the most frequently reported and consistent predictors across sports and settings. At the same time, more recent work has begun to integrate broader predictor domains, including biomechanical variables, wearable sensor outputs, electromyography, and subjective psychological indicators such as sleep or stress. These developments suggest that the field is gradually moving from single-domain screening toward more holistic and multimodal injury prediction frameworks.

4.2 Interpretation and comparison with existing literature

4.2.1 Methodological quality and validation gap

A major finding of this review is the persistence of a clear validation gap. Most included studies used internal validation strategies such as random train-test splits, k-fold cross-validation, or bootstrap-based resampling, whereas external validation across independent cohorts was uncommon. This is important because internally validated models may perform well within the development dataset yet fail to generalize when applied to new teams, seasons, institutions, or athlete populations. Jauhiainen et al., for example, explicitly demonstrated that although prediction performance was statistically better than chance, the resulting ACL injury model still lacked sufficient clinical utility.²⁸

This concern is consistent with the broader methodological literature. Majumdar et al. noted that football injury prediction studies are characterized by substantial heterogeneity in variable selection, data treatment, balancing procedures, and validation approaches, making unified interpretation difficult and limiting practical translation.³⁴ They also emphasized that progress in the field will require not only better-performing algorithms but also multiple seasons of data, careful handling of class imbalance, and more interpretable analytical pipelines.

The validation gap also affects how high reported performance values should be interpreted. Some studies reported strong accuracy or discrimination metrics, but these estimates were often derived from small samples, event-sparse datasets, or single-center cohorts. Under such conditions, even apparently strong model performance may partly reflect overfitting, optimistic sampling, or hidden leakage rather than robust predictive ability. Accordingly, future studies should treat external validation as a core design requirement rather than an optional enhancement.

4.2.2 Reporting quality and interpretation of performance metrics

A second important issue concerns reporting quality. Across the included studies, model reporting was inconsistent, particularly with respect to uncertainty estimates, calibration, data preprocessing steps, and the handling of missing values. Only a very limited number of studies reported confidence intervals for key performance metrics. This omission substantially limits the ability to judge the precision, stability, and reproducibility of reported models.
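As one way to attach uncertainty to a discrimination estimate, the sketch below computes AUC as the probability that a randomly chosen injured case receives a higher risk score than a randomly chosen uninjured case (ties count as one half). It then derives a percentile bootstrap 95% confidence interval by resampling athletes with replacement. The labels and scores are hypothetical, and resamples that contain only one class are skipped. In practice, the resampling unit (athlete, team, or season) should match the study design.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that an injured case outscores an uninjured one (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC requires at least one injured and one uninjured case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap (1 - alpha) confidence interval."""
    point = auc(y_true, scores)  # also validates that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_true[i] for i in idx]
        if 0 < sum(y_b) < n:  # skip resamples containing only one class
            stats.append(auc(y_b, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper


# Hypothetical predicted risks for 12 athletes, 4 of whom were later injured.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
risks = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.35, 0.55, 0.25, 0.40, 0.60, 0.70]
point, lower, upper = bootstrap_auc_ci(labels, risks)
print(f"AUC {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With so few events, the interval is wide. This is exactly the information that a bare point estimate conceals.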

Equally important, this review highlights that model performance cannot be interpreted on the basis of accuracy alone. In injury prediction, class imbalance is often substantial because injury events are much less frequent than non-injury observations. Under these conditions, a model may achieve high overall accuracy while still performing poorly in identifying injured athletes. The study by Merrigan et al. provides a clear example: despite relatively high accuracy, the model's AUC was modest and recall was 0.0%, meaning that no injuries in the testing dataset were correctly classified.²⁶ Haller et al. likewise reported apparently strong classification results but also acknowledged low precision and a notable false-positive burden.²⁷

This interpretation aligns closely with the concerns raised by both Cohan et al. and Majumdar et al., who emphasized that sports injury datasets are typically imbalanced and that inappropriate evaluation metrics can overstate practical usefulness.^31,34 Therefore, future research should routinely report a broader range of metrics, including AUC, recall/sensitivity, specificity, precision, F1-score, and calibration-related information, ideally with confidence intervals.
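Calibration was among the least frequently reported of these properties. The minimal sketch below covers three simple checks: the Brier score (the mean squared difference between predicted risk and the observed 0/1 outcome, as reported by Lu et al.), calibration-in-the-large (the observed event rate minus the mean predicted risk), and a binned table comparing mean predicted risk with the observed injury rate. The bin count and example probabilities are assumptions for illustration. Calibration slopes or smoothed calibration curves would normally complement these summaries.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_in_the_large(y_true, probs):
    """Observed event rate minus mean predicted risk (0 = no systematic over/under-prediction)."""
    return sum(y_true) / len(y_true) - sum(probs) / len(probs)


def calibration_table(y_true, probs, n_bins=5):
    """(bin, count, mean predicted risk, observed injury rate) for each non-empty equal-width bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))  # p = 1.0 goes in the last bin
    rows = []
    for i, members in enumerate(bins):
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((i, len(members), mean_pred, observed))
    return rows


# Hypothetical outcomes and predicted injury risks for 8 athletes.
outcomes = [0, 0, 0, 1, 0, 1, 0, 1]
predicted = [0.05, 0.10, 0.15, 0.30, 0.35, 0.62, 0.70, 0.90]
print(round(brier_score(outcomes, predicted), 4))
print(round(calibration_in_the_large(outcomes, predicted), 4))
for row in calibration_table(outcomes, predicted):
    print(row)
```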

4.2.3 Predictor domains: From foundational variables to multimodal frameworks

Although algorithmic diversity has increased, the main predictor domains used in injury prediction models have remained relatively consistent. Previous injury history and workload-related measures continue to emerge as foundational predictors across multiple studies and sports. This pattern is biologically plausible and practically meaningful: prior injury likely reflects residual impairment, incomplete recovery, or persistent vulnerability, whereas workload variables capture cumulative and acute mechanical and physiological stress.

At the same time, the literature is increasingly moving beyond these core variables. Tsilimigkras et al. incorporated both external and internal load, including heart-rate-derived variables, suggesting that physiological overload may add predictive value beyond purely mechanical training metrics.³³ Guo et al. integrated sport-specific biomechanics and EMG during unanticipated side-cutting, illustrating the value of capturing movement patterns at biomechanically critical phases.³² Claros et al. combined demographic, injury-specific, and concussion-assessment variables to model subsequent musculoskeletal injury after concussion, while Bergeron et al. highlighted the importance of psychological and contextual variables such as sleep quality and academic stress.^3,23

These developments are consistent with a broader shift in sports biomechanics and athlete monitoring. Dhahbi emphasized that biomechanical analysis, wearable technologies, and neuromuscular monitoring are increasingly central not only to performance optimization but also to injury mitigation and rehabilitation planning.³⁵ In this sense, the growing move toward multimodal models is theoretically coherent. However, the inclusion of more variables does not automatically guarantee better real-world performance. Larger and more heterogeneous datasets also create new challenges related to missingness, feature redundancy, dimensionality, and transportability. Thus, the field should pursue multimodal integration cautiously and with strong methodological discipline.
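A greedy correlation filter is one simple way to limit feature redundancy before model fitting. Predictors are examined in a fixed order, and any predictor whose absolute Pearson correlation with an already retained predictor exceeds a threshold is dropped. The sketch below is illustrative only: the feature names, values, and the 0.9 threshold are hypothetical, and to avoid leakage the filter should be fitted on training data alone.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| <= threshold with every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical training-set predictors for five athletes.
training_features = {
    "total_distance_m": [5000, 6200, 4800, 7000, 5500],
    "total_distance_km": [5.0, 6.2, 4.8, 7.0, 5.5],  # same information, different unit
    "sleep_quality": [3, 4, 5, 2, 4],
}
kept = drop_redundant(training_features)
print(kept)  # ['total_distance_m', 'sleep_quality']
```

Because the filter is greedy, the result depends on the order in which predictors are listed. Placing clinically preferred or more reliably measured variables first is one way to make that dependence explicit.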

4.2.4 Gender imbalance and population generalizability

Another important observation is the continued overrepresentation of male athletes. Although more recent studies have begun to include female elite athletes and NCAA women's sport cohorts, the literature remains predominantly male-centered. This imbalance matters because female athletes may differ from male athletes in hormonal milieu, neuromuscular control, biomechanics, exposure patterns, and injury mechanisms. Models derived primarily from male cohorts may therefore not generalize well to female populations without dedicated validation.

This issue extends beyond sex to broader questions of generalizability. The geographical profile of the literature has become more diverse, with recent studies from Iran, Qatar, Greece, and broader Asian contexts, yet representation remains uneven. Gaddour et al. argued that AI-driven football injury prediction research remains geographically incomplete, particularly in underrepresented regions such as Africa, where contextual, infrastructural, and athlete-specific factors may differ substantially from those of Europe or North America.³⁶ This point is relevant more broadly: the external validity of injury prediction models likely depends not only on sport type but also on local training culture, competition structure, environmental conditions, and resource availability. Similar considerations apply to structured, physically demanding populations outside elite sport, such as tactical cohorts. For example, Dhahbi et al. reported a retrospective cohort of 979 newly recruited male police cadets during an initial training phase, showing that musculoskeletal injury profiling in tactical cohorts is both feasible and clinically relevant, with nearly half of recruits sustaining at least one injury and most injuries occurring between Weeks 2 and 5.³⁷ Taken together, these observations suggest that future research should prioritize more inclusive and context-sensitive datasets rather than assuming that models developed in one setting will be universally portable.

4.2.5 Clinical translation, explainability, and implementation challenges

From a practical standpoint, the current evidence suggests that AI models should be viewed as decision-support tools rather than replacements for clinician or practitioner judgment. This is particularly true for injury prediction, where model outputs must be interpreted in context and where false positives and false negatives may carry meaningful consequences for both athlete welfare and performance planning. The implementation challenge is therefore not simply to maximize discrimination, but to develop models that are interpretable, operationally usable, and trustworthy.

Explainability is especially important here. Some recent studies have already begun to incorporate explainable AI techniques, such as SHAP-based model interpretation, to clarify which variables contribute most strongly to risk classification. This approach is valuable not only for building user trust, but also for generating clinically meaningful hypotheses and identifying modifiable intervention targets. Weng et al., for instance, used SHAP to interpret model predictions in baseball-related injury forecasting, illustrating how explainability can bridge predictive analytics and individualized prevention.²¹
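SHAP-style explanations are built on Shapley values from cooperative game theory. For a model with only a few predictors, these values can be computed exactly by enumerating every coalition of features, as in the sketch below. This is neither the `shap` package nor the procedure used by Weng et al. The toy logistic risk model, its coefficients, the feature names, and the choice to represent "absent" features by a reference athlete's values are all assumptions made for illustration. Practical tools use approximations or model-specific algorithms to make this computation tractable for larger models.

```python
import math
from itertools import combinations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take the baseline athlete's values (one of
    several possible conventions). Cost grows as 2**n, so this is practical
    only for a handful of predictors.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        z = {k: x[k] if k in coalition else baseline[k] for k in names}
        return predict(z)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_risk(z):
    """Hypothetical logistic injury-risk model with made-up coefficients."""
    logit = -2.0 + 0.9 * z["prior_injury"] + 1.2 * (z["acwr"] - 1.0) + 0.3 * z["poor_sleep_days"]
    return 1.0 / (1.0 + math.exp(-logit))


athlete = {"prior_injury": 1, "acwr": 1.5, "poor_sleep_days": 3}
reference = {"prior_injury": 0, "acwr": 1.0, "poor_sleep_days": 0}
phi = shapley_values(toy_risk, athlete, reference)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The attributions sum to the difference between this athlete's predicted risk and the reference prediction. This additivity is what allows an individual risk estimate to be decomposed into per-predictor contributions that a practitioner can inspect.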

In addition, the increasing use of wearable devices, computer vision, force plates, and other sensor-based platforms introduces a separate but related challenge: technological standardization. This concern has been emphasized in recent methodological commentary on AI and sport technologies. In “The Algorithmic Athlete,” Dhahbi and Chamari argued that AI-driven and sensor-based systems require more rigorous and standardized evaluation extending beyond model performance alone to include assessment of the underlying measurement technologies, including verification, analytical validation, and practical or clinical validation.³⁸ This point is highly relevant to sports injury prediction. Even a theoretically strong model may be difficult to interpret or reproduce if the underlying measurement system lacks standardization, validity, or consistent reporting. Accordingly, progress in injury prediction will depend not only on better algorithms, but also on better measurement infrastructure and stronger reporting standards.

4.3 Strengths and limitations of this review

This review has several strengths. First, the search strategy was intentionally interdisciplinary, covering both medical and engineering-oriented databases. This increased the likelihood of capturing studies across sports medicine, biomechanics, data science, and wearable technology domains. Second, the review process followed PRISMA principles, improving transparency in study selection and reporting. Third, and most importantly, the review prioritized methodological appraisal through PROBAST rather than relying solely on descriptive summaries of model performance. This allowed critical methodological weaknesses—especially those related to analysis, validation, and reporting—to be identified systematically rather than being obscured by performance metrics alone.

Nevertheless, several limitations should be acknowledged. First, although the final set of 18 studies allowed for qualitative synthesis, the evidence base remains relatively small and heterogeneous. Second, the restriction to English-language publications introduces potential language bias. Third, because of substantial heterogeneity in injury definitions, study populations, predictor sets, model architectures, and evaluation metrics, a formal meta-analysis was not appropriate. Finally, as with any systematic review in a rapidly evolving field, some newer studies may reflect emerging methodological norms that differ from older ones, making direct comparison imperfect.

4.4 Directions for future research and practical implications

To move the field from proof-of-concept toward robust real-world application, several priorities should guide future work. First, external validation must become routine. Models should be tested across independent cohorts, institutions, teams, seasons, and preferably across different geographical settings. Where feasible, temporal validation using later seasons or prospective validation in newly recruited cohorts should be prioritized over random split-sample validation alone, particularly for longitudinal training-load and injury surveillance datasets. Without this step, claims of predictive utility remain provisional. Second, reporting practices should become more standardized. Future AI-based prediction studies should align more closely with the updated TRIPOD+AI guidance for prediction model reporting.³⁹ In particular, studies should clearly report the intended use of the model, target population, predictor and outcome definitions, missing-data handling, model development procedures, and validation strategy. Greater standardization of datasets is also needed, including harmonized definitions of injury outcomes, exposure time, training-load variables, biomechanical indicators, and return-to-play status. In addition, performance reporting should include uncertainty estimates, such as 95% confidence intervals for discrimination, calibration, and classification metrics. Greater methodological transparency may enhance reproducibility and allow more credible comparison across studies. Third, explainability should be incorporated more systematically. As models become more complex, black-box predictions alone are unlikely to gain sustained practitioner trust. XAI techniques such as SHAP or LIME can help clarify individualized risk patterns and support the translation of model output into practical prevention strategies. Fourth, future research should prioritize more representative datasets. This includes female athletes, underrepresented regions, youth cohorts, and non-traditional but physically comparable cohorts where injury burden is high. Broader representation will improve both scientific fairness and real-world generalizability.
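The temporal validation recommended above can be organized as a forward-chaining split by season, in which each model is trained only on earlier seasons and evaluated on the next. The sketch below assumes a hypothetical `season` field whose labels sort chronologically. It illustrates only the splitting logic and omits model fitting.

```python
def forward_chaining_by_season(records, min_train_seasons=1):
    """Yield (train_seasons, test_season, train_idx, test_idx), training only on earlier seasons.

    Assumes a hypothetical "season" field whose labels sort chronologically.
    """
    seasons = sorted({r["season"] for r in records})
    for j in range(min_train_seasons, len(seasons)):
        train_seasons, test_season = seasons[:j], seasons[j]
        train_set = set(train_seasons)
        train_idx = [i for i, r in enumerate(records) if r["season"] in train_set]
        test_idx = [i for i, r in enumerate(records) if r["season"] == test_season]
        yield train_seasons, test_season, train_idx, test_idx


# Hypothetical athlete-week records spanning three seasons.
records = [
    {"athlete_id": a, "season": s, "week": w}
    for s in ("2019/20", "2020/21", "2021/22")
    for a in ("A01", "A02", "A03")
    for w in range(10)
]
for train_seasons, test_season, train_idx, test_idx in forward_chaining_by_season(records):
    print(f"train on {train_seasons}, test on {test_season}: {len(train_idx)} / {len(test_idx)} rows")
# train on ['2019/20'], test on 2020/21: 30 / 30 rows
# train on ['2019/20', '2020/21'], test on 2021/22: 60 / 30 rows
```

In this example, the same athletes appear in both training and test seasons. A season-based split therefore tests generalization forward in time but not to new athletes, which may also call for athlete-level grouping or an external cohort.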

For practitioners, the present evidence supports cautious optimism. In real-world sports team settings, AI-based injury prediction models could be integrated into athlete-monitoring systems that combine workload data, injury history, wellness questionnaires, wearable sensor outputs, and biomechanical assessments. For example, a model could flag athletes showing elevated risk patterns, prompting coaches, athletic trainers, or clinicians to review training load and recovery status and to consider individualized prevention strategies. In rehabilitation environments, prediction models could support return-to-play decision-making by tracking recovery trajectories and identifying athletes who may require additional assessment. In clinical workflows, these tools may help prioritize follow-up evaluation or guide shared decision-making when combined with professional assessment. However, successful implementation depends on high-quality data, interoperable infrastructure, appropriate privacy safeguards, clinician and coach training, and user trust in model outputs. Given the present limitations in validation and reporting, these tools should support, not replace, clinical reasoning, coach expertise, and individualized athlete management.

5. Conclusion

This systematic review synthesized 18 empirical studies on AI- and ML-based sports injury prediction, with particular attention to methodological quality. Overall, the field shows considerable promise, but current evidence remains constrained by a widespread reliance on internal validation, inconsistent reporting practices, and a generally high risk of bias. As a result, many published models should still be viewed as preliminary rather than clinically established.

Across the included studies, previous injury history and training load were the most frequently reported and most consistent predictor domains, while newer approaches increasingly incorporated biomechanics, wearable-derived signals, neuromuscular variables, and psychological indicators. These developments suggest a meaningful shift toward more holistic, multimodal injury prediction. However, stronger predictive frameworks will require not only richer data but also better study design, clearer reporting, and more rigorous external validation.

The review also identified persistent structural issues in the literature, including male-dominated study populations, uneven geographical representation, variable injury definitions, and insufficient attention to uncertainty and calibration. Future progress should therefore prioritize methodological rigor over algorithmic novelty alone. Standardized reporting, independent validation, explainable AI, and better evaluation of underlying sensor and monitoring technologies will be essential if AI-based injury prediction is to become trustworthy, interpretable, and clinically actionable.

In summary, AI has clear potential to enhance sports injury prevention, but the field has not yet reached a level of evidence sufficient for routine deployment. Advancing toward personalized, equitable, and practically useful injury prevention systems will depend on combining computational sophistication with transparent methods, representative datasets, and strong validation in real-world contexts.

Supplemental Material

sj-docx-1-thc-10.1177_09287329261455797 - Supplemental material for From data to prevention: A systematic review of artificial intelligence applications in sports injury prediction

Supplemental material, sj-docx-1-thc-10.1177_09287329261455797 for From data to prevention: A systematic review of artificial intelligence applications in sports injury prediction by Chu-Hsuan Lee, Ming-Hui Chang and Zheng-Hao Li in Technology and Health Care

Footnotes


Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

CRediT statement / author contributions

C.-H.L.: Conceptualization, methodology, investigation, funding acquisition, writing – original draft, and writing – review and editing.

M.-H.C.: Conceptualization, methodology support, and writing – review and editing.

Z.-H.L.: Data extraction, data curation, investigation, formal analysis, and writing – original draft.

All authors read and approved the final version of the manuscript.

Funding

The author is grateful to the National Science and Technology Council, R.O.C., for supporting this research under grant 115-2420-H-239-002-.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Availability of data and materials

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Supplemental material

Supplemental material for this article is available online.

ORCID iDs

Chu-Hsuan Lee

Ming-Hui Chang

Zheng-Hao Li

References

1. Goodlin, Roos, et al. The dawning age of genetic testing for sports injuries. Clin J Sport Med 2015; 25: 1–5.

2. Edouard, Dandrieux, Iatropoulos, et al. Injuries in athletics (track and field): a narrative review presenting the current problem of injuries. Dtsch Z Sportmed 2024; 75: 132–141.

3. Bergeron, Landset, Maugans, et al. Machine learning in modeling high school sport concussion symptom resolve. Med Sci Sports Exerc 2019; 51: 1362–1371.

4. Gulanes, Fadare, Pepania, et al. Preventing sports injuries: a review of evidence-based strategies and interventions. Salud Cienc Tecnol 2024; 4: 51.

5. Duignan, Doherty, Caulfield, et al. Single-item self-report measures of team-sport athlete wellbeing and their relationship with training load: a systematic review. J Athl Train 2020; 55: 944–953.

6. Adi, Aliriad, Kusuma DWY, et al. Athletes’ stress and anxiety before the match. Indones J Phys Educ Sport Sci 2024; 4: 11–21.

7. Gabbett. The training-injury prevention paradox: should athletes be training smarter and harder? Br J Sports Med 2016; 50: 273–280.

8. Milewski, Skaggs, Bishop, et al. Chronic lack of sleep is associated with increased sports injuries in adolescent athletes. J Pediatr Orthop 2014; 34: 129–133.

9. Van Eetvelde, Mendonça, Ley, et al. Machine learning methods in sport injury prediction and prevention: a systematic review. J Exp Orthop 2021; 8: 27.

10. Guelmami, Fekih-Romdhane, Mechraoui, et al. Injury prevention, optimized training and rehabilitation: how is AI reshaping the field of sports medicine. N Asian J Med 2023; 1: 30–34.

11. Bahr, Krosshaug. Understanding injury mechanisms: a key component of preventing injuries in sport. Br J Sports Med 2005; 39: 324–329.

12. Leckey, van Dyk, Doherty, et al. Machine learning approaches to injury risk prediction in sport: a scoping review with evidence synthesis. Br J Sports Med 2025; 59: 491–500.

13. Amendolara, Pfister, Settelmayer, et al. An overview of machine learning applications in sports injury prediction. Cureus 2023; 15: e46170.

14. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. Br Med J 2021; 372: n71.

15. Wolff, Moons KGM, Riley, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019; 170: 51–58.

16. Moons KGM, Wolff, Riley, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019; 170: W1–W33.

17. Pareek, Lavoie-Gagne, et al. Machine learning for predicting lower extremity muscle strain in National Basketball Association athletes. Orthop J Sports Med 2022; 10: 23259671221111742.

18. Jauhiainen, Kauppi, Leppänen, et al. New machine learning approach for detection of injury risk factors in young team sport athletes. Int J Sports Med 2021; 42: 175–182.

19. Rommers, Rössler, Verhagen, et al. A machine learning approach to assess injury risk in elite youth football players. Med Sci Sports Exerc 2020; 52: 1745–1751.

20. Chu, Knell, Brayton, et al. Machine learning to predict sports-related concussion recovery using clinical data. Ann Phys Rehabil Med 2022; 65: 101626.

21. Weng, Chang, et al. Enhanced personalized prediction of baseball-related upper extremity injuries through novel features and explainable artificial intelligence. J Sports Sci 2025; 43: 719–727.

22. Alzahrani, Aljohany, Alsirhani. Real-time wearable biomechanics framework for sports injury prevention and rehabilitation optimization. Sci Rep 2026; 16: 4436.

23. Claros, Anderson, Qian, et al. A machine learning model for post-concussion musculoskeletal injury risk in collegiate athletes. Sports Med 2025; 55: 1971–1982.

24. Tabben, Chamari, Alkhelaifi, et al. Longitudinal cohort study on subsequent injury risk in professional football players in the Qatar Stars League: a probabilistic approach using basic learning. Biol Sport 2026; 43: 489–498.

25. Ren, Boisbluche, Philippe, et al. Global positioning system-derived metrics and machine learning models for injury prediction in professional rugby union players. Eur J Sport Sci 2025; 25: e70057.

26. Merrigan, Stone, Kraemer, et al. Female National Collegiate Athletic Association Division-I athlete injury prediction by vertical countermovement jump force-time metrics. J Strength Cond Res 2024; 38: 783–786.

27. Haller, Kranzinger, et al. Predicting injury and illness with machine learning in elite youth soccer: a comprehensive monitoring approach over 3 months. J Sports Sci Med 2023; 22: 476–487.

28. Jauhiainen, Kauppi, Krosshaug, et al. Predicting ACL injury using machine learning on data from an extensive screening test battery of 880 female elite athletes. Am J Sports Med 2022; 50: 2917–2924.

29. Saberisani, Barati, Zarei, et al. Prediction of football injuries using GPS-based data in Iranian professional football players: a machine learning approach. Front Sports Act Living 2025; 7: 1425180.

30. Castellanos, Phoo, Eckner, et al. Predicting risk of sport-related concussion in collegiate athletes and military cadets: a machine learning approach using baseline data from the CARE consortium study. Sports Med 2021; 51: 567–579.

31. Cohan, Schuster, Fernandez. A deep learning approach to injury forecasting in NBA basketball. J Sports Anal 2021; 7: 277–289.

32. Guo, Cui, Loh, et al. Prediction of ACL injury incidence and analysis of key features in basketball players based on multi-algorithm models. PeerJ 2025; 13: e20141.

33. Tsilimigkras, Kakkos, Matsopoulos, et al. Enhancing sports injury risk assessment in soccer through machine learning and training load analysis. J Sports Sci Med 2024; 23: 537–547.

34. Majumdar, Bakirov, Hodges, et al. Machine learning for understanding and predicting injuries in football. Sports Med Open 2022; 8: 73.

35. Dhahbi. Editorial: advancing biomechanics: enhancing sports performance, mitigating injury risks, and optimizing athlete rehabilitation. Front Sports Act Living 2025; 7: 1556024.

36. Gaddour, Nticha, Mtawaa, et al. Injury prediction in football: how artificial intelligence is shaping the present and transforming the future in Africa. Sports Health 2026; 18: 247–249.

37. Dhahbi, Ben Saad, Dergaa, et al. Injury profiling in male police cadets during initial training phase: a retrospective cohort study. Am J Mens Health 2024; 18: 15579883241304584.

38. Dhahbi, Chamari. The algorithmic athlete: a call to standardize assessment of sensor technologies and artificial intelligence. Int J Sports Physiol Perform 2026; 1: –2.

39. Collins, Moons KGM, Dhiman, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. Br Med J 2024; 385: e078378.


Table 1. PROBAST risk of bias assessment.