A Machine Learning-Based Prognostic Model for Lower Extremity Deep Vein Thrombosis Following Acute Stroke

Abstract

Objectives

To develop a machine learning (ML)-based prognostic model for predicting the risk of lower extremity deep vein thrombosis (DVT) after acute stroke, with an emphasis on limb functional assessments.

Methods

We conducted a retrospective analysis of 225 acute stroke patients admitted within 15 days of onset between December 1, 2015, and April 30, 2025. Predictor variables were selected using collinearity diagnostics and Least Absolute Shrinkage and Selection Operator (LASSO) regression. Three machine learning survival models—Gradient Boosting Machine (GBM), Random Survival Forest (RSF), and Generalized Linear Model (GLM)—were employed to identify the most effective model. The performance of the optimal ML model was compared with that of the traditional Cox proportional hazards model using the concordance index (C-index), cumulative/dynamic area under the curve (C/D AUC), and integrated Brier score (IBS). The optimal model was further interpreted using SurvSHAP(t).

Results

Six variables were selected for the model: age, stroke type, gender, tension of the muscle, Brunnstrom stage (lower limb), and sitting balance. The RSF model, implemented using the Ranger algorithm, demonstrated superior performance, with an integrated Brier score (IBS) of 0.081 and a C-index of 0.841. Age and Brunnstrom stage (lower limb) were identified as the most influential predictors.

Conclusion

We developed an ML-based prognostic model for predicting the risk of lower limb DVT after acute stroke. Age and Brunnstrom stage (lower limb) were the most significant predictors. This model shows promise for risk stratification in clinical practice.

Keywords

machine learning, stroke, deep vein thrombosis, prognostic model, rehabilitation

Introduction

Stroke is a leading cause of acquired disability worldwide, with substantial consequences for individuals and society.¹ Deep vein thrombosis (DVT) is a common and potentially fatal condition in which a blood clot forms in the deep veins, typically of the lower limbs. It usually starts in the calf veins and can obstruct venous return.^2,3 Current guidelines for preventing DVT after stroke lack specific measures targeting early rehabilitation, despite evidence supporting the crucial role of early activity and rehabilitation in risk reduction.^4-6 Moreover, there is limited evidence on how to tailor activity levels to different functional stages or how to incorporate rehabilitation assessments to predict post-stroke DVT.

The selection of predictive variables is crucial for developing a risk assessment model. Factors such as advanced age, female sex, diabetes mellitus, and hypertension have been linked to post-stroke DVT.^7-13 Additionally, laboratory indicators including elevated D-dimer, elevated C-reactive protein, and homocysteine serve as significant biochemical predictors.^9-11,13-15 Low activity levels raise DVT risk.¹⁶ Gait function and the Brunnstrom stage are connected in post-stroke patients, impacting walking ability.^17-19 The Barthel Index (BI) evaluates performance in activities of daily living (ADL) with 10 items. Multiple studies have shown a correlation between a low total BI score and the occurrence of DVT.^13,20-23 However, the effect of Brunnstrom stage and BI on DVT after stroke has not been reported.

Machine Learning (ML) is a prominent methodology utilized within the realm of data mining.^24,25 ML techniques play a crucial role in the transformation of extensive data sets, thereby enhancing decision-making processes, increasing efficacy and efficiency, and mitigating human error and costs.²⁶ Particularly within the healthcare sector, ML methods present novel prospects for the precise evaluation of DVT based on clinical medical records, surpassing conventional statistical approaches. However, there is a scarcity of prognostic models utilizing ML algorithms to forecast DVT occurrence post-stroke.

Rehabilitation plays a crucial role in restoring functional impairments, enhancing the quality of life for patients, and expediting their reintegration into their familial and societal environments. DVT poses a significant obstacle to the rehabilitation process, resulting in elevated healthcare expenditures, underscoring the importance of timely detection and prevention. The objective of this study is to employ ML techniques to develop a predictive model for the risk assessment of post-stroke DVT across varying levels of patient functionality, thereby empowering healthcare providers to make more informed prognoses for individuals at risk of post-stroke DVT.

Material and Methods

Ethical Approval and Patient Selection

This study conforms to the TRIPOD statement and was approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University in accordance with the principles of the Declaration of Helsinki (Ethics No. 2022-SR-479). Because all data were de-identified, informed consent was waived. Inclusion criteria: stroke confirmed by CT or MRI, age 18-85 years, admission within 15 days of onset, and complete medical records. Exclusion criteria: severe systemic diseases (including but not limited to end-stage renal disease requiring dialysis, decompensated liver cirrhosis, and severe chronic obstructive pulmonary disease); prior history of DVT or vena cava filter placement; pre-admission use of anticoagulant or antiplatelet agents; chronic vascular diseases (including arterial and venous lesions such as varicose veins, peripheral artery occlusive disease, and venous ulcers); and incomplete medical records.

Data Collection for Predictor Variables

A retrospective review was carried out to identify all patients with acute stroke who were admitted to the Rehabilitation Medical Center between December 1, 2015, and April 30, 2025. Demographic and clinical characteristics, including age, gender, stroke subtype and location, history of hypertension, diabetes mellitus, smoking history, alcohol use history, and previous stroke history, were systematically documented for each patient within 24 hours of admission. Rehabilitation functional assessment parameters, such as Brunnstrom stage (hand, arm, and lower limb), calf muscle tone (Ashworth Scale), the Barthel Index (total score and sub-items: feeding, transfers, mobility, stairs), and sitting and standing balance, were also recorded within 24 hours of admission. Information on DVT prevention measures, covering anticoagulant and antiplatelet therapy, was also collected. The primary anticoagulant used for venous thromboembolism (VTE) prophylaxis was low molecular weight heparin, administered at standard prophylactic doses. All antithrombotic medications were initiated after admission in line with clinical guidelines.

The Endpoint Event

The outcome variables were defined as time to thrombosis and thrombosis status. The study’s endpoint was defined as the occurrence of lower extremity DVT within 15 days after the onset date. Diagnostic criteria for lower extremity DVT included the presence of thrombus echoes in the lumen on two-dimensional ultrasonography, characterized by either uniformly low or heterogeneously enhanced echoes.²⁷

Predictive Variables Screening

To identify predictors, a combination of collinearity diagnosis and the least absolute shrinkage and selection operator (LASSO) regression techniques was employed. Collinearity diagnosis involved assessing the variance inflation factor (VIF), with any variable exhibiting a VIF exceeding 10 being excluded.²⁸ Subsequently, the remaining variables were deemed as potential predictors and underwent LASSO regression analysis.²⁹
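The collinearity screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: each VIF is read off the diagonal of the inverse of the predictor correlation matrix, and values above the threshold of 10 are flagged. The function names and the toy predictors are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear predictors: VIF is infinite")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def variance_inflation_factors(columns):
    """VIF_j = 1 / (1 - R_j^2), the j-th diagonal entry of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}

# Hypothetical toy predictors: "c" nearly duplicates "a"; "b" is unrelated to both.
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, -1.0, -1.0, 1.0],
    "c": [1.1, 1.9, 3.1, 3.9],
}
vifs = variance_inflation_factors(toy)
kept = [name for name, value in vifs.items() if value <= 10]  # VIF > 10 is excluded
```

In this toy example only the uncorrelated predictor survives the screen. In practice, near-duplicate variables are often removed one at a time and the VIFs recomputed, so that one member of a collinear pair can be retained.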

Construction, Evaluation, and Explanation of Predictive Models

We constructed prediction models using three machine learning survival algorithms, Gradient Boosting Machine (GBM), Random Survival Forest (RSF), and Generalized Linear Model (GLM), for comparative analysis. The best ML model was then compared with the classical Cox proportional hazards (Cox) model to select the optimal model. Predictive performance was evaluated using a range of metrics, including accuracy, precision, sensitivity, specificity, F1 score, kappa, concordance index (C-index), cumulative/dynamic AUC (C/D AUC), and integrated Brier score (IBS). SurvSHAP(t) was used for global and local interpretation of the model. SHAP uses the Shapley value from game theory to distribute a prediction fairly among features.³⁰ It calculates each feature’s contribution to a prediction and quantifies its impact on the model’s output.^31,32 SurvSHAP(t) extends this method to interpret time-dependent black-box models.³³
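To make the discrimination metric concrete, the sketch below computes Harrell's concordance index, one common definition of the C-index. Whether the software used in this study applies exactly this estimator is an assumption; the function name and the toy data are hypothetical, and tied event times are simply treated as non-comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event (1 = event,
            # 0 = censored) strictly before patient j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Hypothetical data: days to DVT or censoring, event indicator, predicted risk.
times = [2, 5, 7, 10]
events = [1, 1, 0, 0]
perfect = concordance_index(times, events, [0.9, 0.6, 0.4, 0.1])
partial = concordance_index(times, events, [0.9, 0.1, 0.4, 0.6])
```

A value of 1 means every comparable pair is ranked correctly, and 0.5 corresponds to random ranking. The second toy prediction misranks the two pairs involving the second patient and therefore scores lower.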

Statistical Analysis

The random forest method was used to impute missing values.³⁴ Normality tests were performed on the dataset, with continuous variables reported as mean ± standard deviation or median (interquartile range [IQR]) and categorical variables presented as frequency (percentage). One-way ANOVA and the Kruskal-Wallis test were used to compare normally and non-normally distributed continuous variables, respectively, while Pearson’s chi-square test was used to compare categorical variables. Model accuracy assessment and hyperparameter optimization were conducted using 10-fold cross-validation during training. R software (version 4.2.2, https://www.r-project.org/) was used for all statistical analyses, and P < .05 was considered statistically significant.
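The 10-fold cross-validation mentioned above can be illustrated with a short index-splitting sketch. It assumes plain (unstratified) random folds, which may differ from the resampling scheme the authors actually used; the function name and the seed are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# With 225 patients and k = 10, each test fold holds 22 or 23 patients.
splits = list(k_fold_indices(225, k=10))
fold_sizes = [len(test) for _, test in splits]
```

Each model is fitted on the training indices and scored on the held-out fold, and the fold-level scores are then averaged. With only 65 events, stratifying folds by DVT status would be a reasonable refinement.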

Results

Baseline Data Comparison

A total of 225 patients met the inclusion criteria for acute stroke. All patients were hospitalized within 15 days of the onset of their stroke. Of these patients, 160 were categorized in the non-DVT group and 65 in the DVT group. The average age of the patients was 63.71±12.13 years, with males comprising 70.70% and females 29.30% of the total sample. The baseline data for demographic characteristics and rehabilitation function assessment parameters for both groups are presented in Table 1.

Table 1.

Comparison of Demographic and Clinical Characteristics Between Patients With and Without DVT

Characteristics	Non-DVT (n=160)	DVT (n=65)	Overall (n=225)	P-value
Age (years), mean ± SD	62.44 ± 12.38	66.82 ± 10.97	63.71 ± 12.13	0.014
Stroke type, n (%)				1.000
Hemorrhagic	49 (30.6)	20 (30.8)	69 (30.7)
Ischemic	111 (69.4)	45 (69.2)	156 (69.3)
Gender, n (%)				0.016
Male	121 (75.6)	38 (58.5)	159 (70.7)
Female	39 (24.4)	27 (41.5)	66 (29.3)
Hypertension, n (%)				0.922
Yes	116 (72.5)	46 (70.8)	162 (72.0)
No	44 (27.5)	19 (29.2)	63 (28.0)
Diabetes mellitus, n (%)				0.377
Yes	43 (26.9)	22 (33.8)	65 (28.9)
No	117 (73.1)	43 (66.2)	160 (71.1)
Smoking history, n (%)				1.000
Yes	46 (28.7)	19 (29.2)	65 (28.9)
No	114 (71.3)	46 (70.8)	160 (71.1)
Alcohol use history, n (%)				1.000
Yes	36 (22.5)	15 (23.1)	51 (22.7)
No	124 (77.5)	50 (76.9)	174 (77.3)
Previous stroke, n (%)				0.816
Yes	21 (13.1)	10 (15.4)	31 (13.8)
No	139 (86.9)	55 (84.6)	194 (86.2)
Basal ganglia involvement, n (%)				0.580
Yes	87 (54.4)	32 (49.2)	119 (52.9)
No	73 (45.6)	33 (50.8)	106 (47.1)
Feeding, n (%)				0.129
0	34 (21.2)	22 (33.8)	56 (24.9)
1	78 (48.8)	25 (38.5)	103 (45.8)
2	48 (30.0)	18 (27.7)	66 (29.3)
Transfers, n (%)				0.002
0	52 (32.5)	39 (60.0)	91 (40.4)
1	44 (27.5)	13 (20.0)	57 (25.3)
2	51 (31.9)	11 (16.9)	62 (27.6)
3	13 (8.1)	2 (3.1)	15 (6.7)
Mobility, n (%)				0.001
0	89 (55.6)	53 (81.5)	142 (63.1)
1	19 (11.9)	1 (1.5)	20 (8.9)
2	42 (26.2)	11 (16.9)	53 (23.6)
3	10 (6.2)	0 (0.0)	10 (4.4)
Stairs, n (%)				0.017
0	123 (76.9)	60 (92.3)	183 (81.3)
1	27 (16.9)	5 (7.7)	32 (14.2)
2	10 (6.2)	0 (0.0)	10 (4.4)
BI, mean ± SD	44.97 ± 22.13	30.92 ± 20.21	40.91 ± 22.47	<0.001
Brunnstrom stage (arm), n (%)				0.065
I	17 (10.6)	11 (16.9)	28 (12.4)
II	38 (23.8)	23 (35.4)	61 (27.1)
III	30 (18.8)	15 (23.1)	45 (20.0)
IV	27 (16.9)	7 (10.8)	34 (15.1)
V	32 (20.0)	7 (10.8)	39 (17.3)
VI	16 (10.0)	2 (3.1)	18 (8.0)
Brunnstrom stage (hand), n (%)				0.038
I	68 (42.5)	40 (61.5)	108 (48.0)
II	15 (9.4)	4 (6.2)	19 (8.4)
III	11 (6.9)	7 (10.8)	18 (8.0)
IV	23 (14.4)	3 (4.6)	26 (11.6)
V	27 (16.9)	9 (13.8)	36 (16.0)
VI	16 (10.0)	2 (3.1)	18 (8.0)
Brunnstrom stage (lower limb), n (%)				0.003
I	9 (5.6)	11 (16.9)	20 (8.9)
II	23 (14.4)	15 (23.1)	38 (16.9)
III	39 (24.4)	21 (32.3)	60 (26.7)
IV	46 (28.7)	10 (15.4)	56 (24.9)
V	29 (18.1)	7 (10.8)	36 (16.0)
VI	14 (8.8)	1 (1.5)	15 (6.7)
Tension of the muscle, n (%)				0.354
0	148 (92.5)	62 (95.4)	210 (93.3)
1	7 (4.4)	3 (4.6)	10 (4.4)
2	5 (3.1)	0 (0.0)	5 (2.2)
Anticoagulant therapy, n (%)				0.035
Yes	6 (3.8)	8 (12.3)	14 (6.2)
No	154 (96.2)	57 (87.7)	211 (93.8)
Antiplatelet therapy, n (%)				0.356
Yes	122 (76.2)	45 (69.2)	167 (74.2)
No	38 (23.8)	20 (30.8)	58 (25.8)
Sitting balance, n (%)				0.025
0	34 (21.2)	25 (38.5)	59 (26.2)
1	11 (6.9)	7 (10.8)	18 (8.0)
2	48 (30.0)	15 (23.1)	63 (28.0)
3	67 (41.9)	18 (27.7)	85 (37.8)
Standing balance, n (%)				<0.001
0	73 (45.6)	51 (78.5)	124 (55.1)
1	35 (21.9)	2 (3.1)	37 (16.4)
2	31 (19.4)	9 (13.8)	40 (17.8)
3	21 (13.1)	3 (4.6)	24 (10.7)

Screening of Prognostic Factors

All 22 candidate variables retained after collinearity diagnostics were entered into the LASSO regression analysis. Six variables with non-zero coefficients were selected: age, stroke type, gender, tension of the muscle, Brunnstrom stage (lower limb), and sitting balance. These features were then used to build the ML models.

Comprehensive Analysis of ML Algorithms

Three ML models (GBM, RSF, and GLM) were evaluated on the dataset. Following ten-fold cross-validation, the models were assessed using accuracy, precision, sensitivity, specificity, F1 score, and kappa. The RSF model performed best overall, with a sensitivity of 0.964, an F1 score of 0.837, a precision of 0.740, and an accuracy of 0.738. A detailed summary of model performance is provided in Table 2.

Table 2.

Performance of the GBM, RSF, and GLM Models

Model	Sensitivity	Specificity	Precision	F1	Accuracy	Kappa
GBM	0.913	0.176	0.750	0.824	0.714	0.110
RSF	0.964	0.208	0.740	0.837	0.738	0.216
GLM	0.946	0.946	0.736	0.828	0.725	0.191
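For readers less familiar with these classification metrics, the sketch below shows how each one follows from a 2×2 confusion matrix. The counts are hypothetical and are not taken from this study, and the function name is an illustrative assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive the metrics reported in Table 2 from confusion-matrix counts."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
        "kappa": kappa,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
```

Because kappa corrects accuracy for chance agreement, a model can combine high sensitivity and acceptable accuracy with a low kappa when its specificity is poor.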

Comparison of the RSF and Cox Model

Evaluation of three survival models (ranger, rfsrc, coxph) for DVT risk prediction showed that the machine learning methods consistently outperformed the traditional approach (Figures 1A and B). The random survival forest implemented with the Ranger algorithm (ranger) achieved the best performance, with the highest discrimination (C-index 0.841; C/D AUC 0.590) and the best predictive accuracy (IBS 0.081). The random survival forest implemented with the RandomForestSRC algorithm (rfsrc) performed comparably, while the Cox proportional hazards model (coxph) showed substantially inferior results (C-index 0.676; IBS 0.107). The performance hierarchy was ranger ≥ rfsrc > coxph, supporting the preference for the Ranger algorithm in DVT risk stratification.

Figure 1.

Model performance of the Rfsrc, Ranger, and Cox models. The line chart (A) and bar chart (B) display the evaluation metrics of the models over time. The Ranger model outperforms the others in terms of the IBS (lower is better), the C/D AUC (higher is better), and the concordance index (higher is better).

Features Importance for the RSF and Cox Model

Figure 2 shows the feature rankings for the three models, based on the mean C-index loss after permutation. The top three variables in the Cox model were Brunnstrom stage (lower limb), sitting balance, and tension of the muscle. The top three variables based on the Ranger algorithm (Ranger model) and the RandomForestSRC algorithm (Rfsrc model) were age, Brunnstrom stage (lower limb), and sitting balance.

Figure 2.

Feature importance of the Rfsrc, Ranger, and Cox models. X: one minus C-index loss after permutations. Y: models and variable names.

Explanations of the Ranger Model

Global Feature Importance and Dependence for DVT Risk Prediction

The integration of time-dependent variable importance analysis and partial dependence analysis provides comprehensive insights into the factors predicting DVT risk in stroke patients.

Time-dependent variable importance analysis (Figure 3) demonstrated that age was the most critical predictor throughout the 15-day observation period, consistently maintaining the highest importance value (approximately 0.06). Brunnstrom stage (lower limb) emerged as the second most significant factor, showing increased importance during the later phase (days 10-15), while sitting balance capability ranked as the third most influential clinical variable. Other factors, including stroke type, muscle tension, and gender, demonstrated substantially lower and stable contributions to the prediction model.

Figure 3.

Time−dependent feature importance plot. X: Time; Y: Brier score loss after permutations with loss of full model subtracted.
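The permutation-based importance behind this plot can be sketched at a single time point as follows. This is a simplified illustration that ignores censoring, whereas a time-dependent Brier score for survival data would normally apply censoring weights. The toy model, the data, and all names are hypothetical.

```python
import random

def brier_score(predicted_event_prob, observed_event):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    pairs = zip(predicted_event_prob, observed_event)
    return sum((p - y) ** 2 for p, y in pairs) / len(observed_event)

def permutation_importance(predict, rows, outcomes, feature, n_repeats=20, seed=0):
    """Average Brier score after shuffling one feature, minus the full-model loss."""
    rng = random.Random(seed)
    baseline = brier_score([predict(r) for r in rows], outcomes)
    losses = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)  # break the link between this feature and the outcome
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        losses.append(brier_score([predict(r) for r in permuted], outcomes))
    return sum(losses) / n_repeats - baseline

def toy_predict(row):
    """Hypothetical model: event probability by day 15 depends on age only."""
    return 0.9 if row["age"] >= 65 else 0.1

ages = [52, 58, 61, 67, 72, 80]
noise = [3, 1, 2, 3, 1, 2]
rows = [{"age": a, "noise": z} for a, z in zip(ages, noise)]
outcomes = [0, 0, 0, 1, 1, 1]  # 1 = DVT by day 15 (illustrative)

age_importance = permutation_importance(toy_predict, rows, outcomes, "age")
noise_importance = permutation_importance(toy_predict, rows, outcomes, "noise")
```

Shuffling a feature the model ignores leaves the loss unchanged, so its importance is zero, while shuffling an informative feature raises the loss. Repeating the calculation at each evaluation time gives a curve of the kind shown in Figure 3.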

Complementary partial dependence analysis (Figure 4) revealed the specific nature of these predictive relationships. Age exhibited a pronounced negative relationship with survival probability, where higher age values corresponded to significantly steeper declines in survival curves, indicating substantially increased DVT risk among older patients. Both Brunnstrom stage (lower limb) and sitting balance capability showed strong protective effects, with better functional performance associated with markedly improved survival probabilities throughout the observation period. The separation between curves for different functional capacity levels was particularly evident in the later observation period, consistent with the time-dependent importance patterns observed in Figure 3.

Figure 4.

Partial survival dependence plot. This plot provides an overall interpretation of the most important predictive features, showing the marginal effect of each feature on the survival predictions of the Ranger model.
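The marginal effect shown in a partial dependence plot can be sketched as follows: one feature is fixed at a grid value for every patient, and the model's predictions are averaged. The toy survival function below is a hypothetical stand-in for the fitted model, and all names and values are assumptions made for illustration.

```python
import math

def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve

def toy_survival_at_day_15(row):
    """Hypothetical DVT-free probability: falls with age, rises with Brunnstrom stage."""
    return math.exp(-0.02 * row["age"] / row["brunnstrom"])

patients = [
    {"age": 55, "brunnstrom": 2},
    {"age": 63, "brunnstrom": 4},
    {"age": 71, "brunnstrom": 5},
]
age_curve = partial_dependence(toy_survival_at_day_15, patients, "age", [50, 60, 70, 80])
stage_curve = partial_dependence(toy_survival_at_day_15, patients, "brunnstrom", [1, 2, 3, 4, 5, 6])
```

For a survival model the same averaging is repeated at each time point, which yields one averaged survival curve per grid value of the feature.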

Discussion

This study developed and validated an ML-based prognostic model for predicting lower limb DVT risk after acute stroke, incorporating early functional rehabilitation assessments. Through retrospective analysis, we found that the RSF model, particularly its implementation based on the Ranger algorithm, demonstrated superior overall discriminatory ability (C-index: 0.841) and predictive accuracy (IBS: 0.081) in predicting DVT risk, substantially outperforming the traditional Cox proportional hazards model (C-index: 0.676; IBS: 0.107). Age and lower limb Brunnstrom stage were identified as the most important predictors.

Six predictive variables were identified through multicollinearity diagnostics and LASSO regression, all with clear clinical relevance: age, stroke type, gender, tension of the muscle, Brunnstrom stage (lower limb), and sitting balance. From a rehabilitation perspective, the innovation of this study lies in systematically incorporating functional assessment indicators into predictive models, highlighting the unique value of rehabilitation evaluation in the early identification of patients at high risk for DVT.

Time-dependent variable importance analysis showed that age was the most important predictor throughout the 15-day observation period. Studies have shown a strong link between advanced age and DVT, with our research confirming that age was a significant factor in DVT development in acute stroke patients.^7,9,35,36 Aging causes blood to become more prone to clotting, increasing the risk of DVT.³⁷

The Brunnstrom stage (lower limb) was identified as the second most important predictor, with its predictive value further increasing during the later observation period (days 10-15). This indicates that the severity of lower limb motor dysfunction is closely related to DVT risk: worse motor function leads to more severe venous stasis and higher thrombosis risk. This highlights the importance of early rehabilitation interventions, particularly those targeting lower limb motor recovery.³⁸ Partial dependence analysis showed that higher Brunnstrom stages were associated with markedly improved survival probability, supporting a protective effect of motor function recovery against DVT. The Brunnstrom stages, as a classic assessment tool for post-stroke motor recovery, demonstrate a strong association with DVT risk, suggesting that rehabilitation assessment holds prospective value in thrombosis prevention.

Sitting balance ability was the third most important predictor. Although standing balance showed stronger discriminative power in univariate analysis (P < .001), sitting balance maintained predictive value in the multivariate model. This may be because sitting balance better reflects the early mobility potential and trunk control ability of patients in the acute phase of stroke, serving as the foundation for whether patients can perform bed activities, transfers, and early sitting training.^39,40 Poor sitting balance often indicates that patients require prolonged bed rest, further increasing the risk of DVT. In contrast, standing balance primarily reflects dynamic balance capabilities during the mid-to-late rehabilitation phase, and its discriminatory power within the first 15 days of the acute phase may be limited. Therefore, sitting balance, as an indicator more closely aligned with the actual functional status of acute-phase patients, may hold greater clinical predictive utility.

Notably, while gender showed significant differences in univariate analysis, its contribution in the multivariate model was relatively low. This partially aligns with findings by Pan et al,¹¹ suggesting that gender differences may be confounded by other factors (such as age and activity level) and require further research validation.⁴¹

Stroke type was retained as a predictor of DVT, but its contribution was relatively limited. Time-dependent analysis indicates that the importance value of stroke type is markedly lower than that of primary indicators such as age and motor function. This relatively weak association may stem from indirect mechanisms through which different stroke subtypes influence DVT: patients with hemorrhagic stroke require prolonged immobilization due to the risk of rebleeding, while ischemic stroke patients often present with systemic hypercoagulability.⁴² However, these effects are largely masked by confounding factors such as functional status and activity level, diminishing the direct predictive value of stroke type itself.

Tension of muscle shows an inverse relationship with DVT risk, where higher tone correlates with lower risk, though it is less predictive than age or motor function. This may be due to better preservation of neurological function and muscle pump activity, which supports venous return. In contrast, low muscle tone (flaccidity) leads to loss of pump function and increases venous stasis risk.^43,44 Although muscle tone has limited predictive value, its clinical relevance suggests that patients with low tone should receive closer monitoring and early interventions, such as muscle activation training and positional management.⁴⁴ Further studies should use objective measures to confirm this relationship and refine interventions.

In terms of model comparison, machine learning methods (particularly RSF) significantly outperformed traditional statistical methods, likely due to their ability to better handle complex interaction relationships and nonlinear effects between variables. This finding supports the use of machine learning algorithms in clinical prediction models, especially when dealing with complex data such as rehabilitation function assessment.^45,46

Compared to previous studies, the strength of this research lies in the systematic inclusion of rehabilitation function assessment indicators into the prediction model and the use of advanced interpretability methods (such as SurvSHAP(t)) to reveal time-dependent effects of predictors. While Kelly et al.²² found correlations between the BI and DVT, they did not establish an effective prediction system; the prediction model developed by Liu et al.⁴⁷ lacked rehabilitation function indicators. This study fills this gap and provides a practical risk assessment tool for clinical use.

However, a critical finding that warrants in-depth discussion is that despite the model’s high overall discriminatory ability (C-index), its time-dependent discriminatory ability (Cumulative/Dynamic AUC) remains relatively low (approximately 0.590). This phenomenon of high C-index, low time-specific AUC may stem from several causes: First, the most likely reason is the limited sample size in this study, particularly the relatively small number of DVT events (n=65). For complex machine learning models like RSF, a finite number of events may lead to instability in predictions at specific time points, thereby affecting the estimation of time-dependent AUC, even though the model maintains good overall ranking capability. Second, the effect strength of predictive factors may dynamically change over time. For instance, certain factors may be strong predictors in the early post-stroke period but lose influence later. The RSF model captures this complex, nonlinear overall predictive pattern, but its discriminative accuracy fluctuates at each specific time point. This finding also partially reflects the clinical reality of early post-stroke DVT occurrence: within the acute 15-day window, event occurrence may be influenced by numerous unmeasured transient factors. Consequently, while models based on baseline admission indicators can effectively distinguish high-risk from low-risk populations, achieving extremely precise daily-level risk prediction remains challenging.

Limitations

This study has several limitations. First, this was a single-center retrospective study with a limited sample size, especially the small number of DVT events. This likely contributed to the model’s lower time-specific discriminatory accuracy (C/D AUC) and increased the risk of overfitting. Future large-scale, multicenter prospective studies are needed to validate and optimize the model, thereby laying the groundwork for interventional clinical trials. Second, data constraints precluded assessment of established thrombotic biomarkers (notably D-dimer) and pertinent clinical risk factors, including atrial fibrillation and carotid artery plaques, potentially compromising model discrimination. Moreover, the study did not include patients with major established risk factors for venous thromboembolism (e.g., active malignancy, recent surgery or fractures, and oral contraceptive use), resulting in a highly selective study population and precluding assessment of the influence of these factors. In addition, the potential impact of stroke localization (anterior/posterior circulation) on DVT risk was not further analyzed. Future research should incorporate these laboratory and clinical indicators to further enhance the model’s predictive efficacy. Finally, the risk prediction window of this model is limited to within 15 days post-acute stroke; extending follow-up periods would aid in further validating and refining the model’s long-term predictive capabilities.

Conclusion

We successfully developed an ML-based predictive model for acute post-stroke DVT risk incorporating rehabilitation functional assessments. The model demonstrated robust overall predictive performance, with age and lower limb Brunnstrom stage emerging as the primary contributing factors. Although the model exhibits limitations in time-point-specific discrimination, it provides clinicians with a valuable tool for early identification of high-risk patients and the formulation of individualized rehabilitation and prevention strategies. Future work should focus on validating this model in larger prospective cohorts and incorporating laboratory indicators to further enhance its performance.

Footnotes

Acknowledgments

We acknowledge Mr. Chen and his sons for their unconditional support.

ORCID iDs

Lingling Liu

Qian Ye

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The participants of this study did not give written consent for their data to be shared publicly; because of the sensitive nature of the research, supporting data are not available.

References

GBD 2016 Stroke Collaborators . Global, regional, and national burden of stroke, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016. Lancet Neurol. 2019;18(5):439-458.

Chen

Feng

Jiang

, et al. Stent patency rates and prognostic factors of endovascular intervention for iliofemoral vein occlusion in post-thrombotic syndrome. BMC Surg. 2022;22(1):269.

Huang

Wang

Zhang

. Association Between Blood Lipid Levels and Lower Extremity Deep Venous Thrombosis: A Population-Based Cohort Study. Clin Appl Thromb Hemost. 2022;28:10760296221121282.

Brønnum-Hansen

Davidsen Mthorvaldsen

Danish MONICA Study Group . Long-term survival and causes of death after stroke. Stroke. 2001;32(9):2131-2136.

Zhang

Zhao

, et al. Chinese Stroke Association guidelines for clinical management of cerebrovascular disorders: executive summary and 2019 update of clinical management of stroke rehabilitation. Stroke Vasc Neurol. 2020;5(3):250-259.

Powers

Rabinstein

Ackerson

, et al. Guidelines for the Early Management of Patients With Acute Ischemic Stroke: 2019 Update to the 2018 Guidelines for the Early Management of Acute Ischemic Stroke: A Guideline for Healthcare Professionals From the American Heart Association/American Stroke Association. Stroke. 2019;50(12):e344-e418.

Cheng

Huang

, et al. Individualized predictions of early isolated distal deep vein thrombosis in patients with acute ischemic stroke: a retrospective study. BMC Geriatr. 2021;21(1):140.

Chen

Yliao

. The Incidence of Deep Vein Thrombosis in Asian Patients With Chronic Obstructive Pulmonary Disease. Medicine (Baltimore). 2015;94(44):e1741.

Pan

Wang

Fang

Deng

. A nomogram based on easily obtainable parameters for distal deep venous thrombosis in patients after acute stroke. Clin Neurol Neurosurg. 2021;205:106638.

10.

Wang

Shi

Dong

Fang

. Clinical Risk Factors of Asymptomatic Deep Venous Thrombosis in Patients With Acute Stroke. Clin Appl Thromb Hemost. 2019;25:1076029619868534.

11.

Pan

Wang

Chen

Fang

. Development and Validation of a Nomogram for Lower Extremity Deep Venous Thrombosis in Patients after Acute Stroke. J Stroke Cerebrovasc Dis. 2021;30(5):105683.

12.

Zheng

Liu

Sun

, et al. Prophylaxis of deep venous thrombosis and adherence to guideline recommendations among inpatients with acute stroke: results from a multicenter observational longitudinal study in China. Neurol Res. 2008;30(4):370-376.

13.

Lin

Han

Zhou

Wang

. The incidence of venous thromboembolism following stroke and its risk factors in eastern China. J Thromb Thrombolysis. 2012;34(2):269-275.

14.

Zhu

Zhang

Zhou

Yin

Dong

. Stratification of venous thromboembolism risk in stroke patients by Caprini score. Ann Palliat Med. 2020;9(3):631-636.

15.

Hara

. Deep venous thrombosis in stroke patients during rehabilitation phase. Keio J Med. 2008;57(4):196-204.

16.

Kakkos

Gohel

Baekgaard

, et al. Editor's Choice - European Society for Vascular Surgery (ESVS) 2021 Clinical Practice Guidelines on the Management of Venous Thrombosis. Eur J Vasc Endovasc Surg. 2021;61(1):9-82.

17. Cho, Lee, Kang. Factors Related to Gait Function in Post-stroke Patients. J Phys Ther Sci. 2014;26(12):1941-1944.

18. Naghdi, Ansari, Mansouri, Hasson. A neurophysiological and clinical study of Brunnstrom recovery stages in the upper limb following stroke. Brain Inj. 2010;24(11):1372-1378.

19. Huang, Lin, Huang, et al. Improving the utility of the Brunnstrom recovery stages in patients with stroke: Validation and quantification. Medicine (Baltimore). 2016;95(31):e4508.

20. Khan, Ikram, Saeed, et al. Deep Vein Thrombosis in Acute Stroke - A Systemic Review of the Literature. Cureus. 2017;9(12):e1982.

21. Tang, Sun, et al. Evaluation and analysis of incidence and risk factors of lower extremity venous thrombosis after urologic surgeries: A prospective two-center cohort study using LASSO-logistic regression. Int J Surg. 2021;89:105948.

22. Kelly, Rudd, Lewis, Coshall, Moody, Hunt. Venous thromboembolism after acute ischemic stroke: a prospective study using magnetic resonance direct thrombus imaging. Stroke. 2004;35(10):2320-2325.

23. Eom, Chung, Shim. The effects of squat exercises in postures for toilet use on blood flow velocity of the leg vein. J Phys Ther Sci. 2014;26(9):1485-1487.

24. Zou. A narrative review of the application of machine learning in venous thromboembolism. Vascular. 2024;32(3):698-704.

25. Santilli. Application of machine learning techniques to physical and rehabilitative medicine. Ann Ig. 2022;34(1):79-83.

26. Bates, Saria, Ohno-Machado, Shah, Escobar. Big data in health care: using analytics to identify and manage high-risk and high-cost patients. Health Aff (Millwood). 2014;33(7):1123-1131.

27. Naringrekar, Sun, Rodgers. It's Not All Deep Vein Thrombosis: Sonography of the Painful Lower Extremity With Multimodality Correlation. J Ultrasound Med. 2019;38(4):1075-1089.

28. Kim. Multicollinearity and misleading statistical results. Korean J Anesthesiol. 2019;72(6):558-569.

29. Tibshirani. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58(1):267-288.

30. Singh, Lanchantin, Sekhon. Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin. Adv Neural Inf Process Syst. 2017;30:6785-6795.

31. Zou, Shi, Sun, et al. Extreme gradient boosting model to assess risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: Individual prediction using SHapley Additive exPlanations. Comput Methods Programs Biomed. 2022;225:107038.

32. Zhang, Shi, Yin, et al. A machine learning model based on ultrasound image features to assess the risk of sentinel lymph node metastasis in breast cancer patients: Applications of scikit-learn and SHAP. Front Oncol. 2022;12:944569.

33. Krzyziński, Spytek, Baniecki, Biecek. SurvSHAP(t): Time-dependent explanations of machine learning survival models. Knowledge-Based Systems. 2023;262:110234.

34. Sterne, White, Carlin, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393.

35. Chen, Xiong, Zhou. Differences in limb, age and sex of Chinese deep vein thrombosis patients. Phlebology. 2015;30(4):242-248.

36. Chopard, Albertsen, Piazza. Diagnosis and Treatment of Lower Extremity Venous Thromboembolism: A Review. JAMA. 2020;324(17):1765-1776.

37. Wei, Huo, Liu. Predictors of deep-vein thrombosis for acute stroke at admission to a rehabilitation unit: A retrospective study. Front Neurol. 2023;14:1137485.

38. Brunnstrom. Motor testing procedures in hemiplegia: based on sequential recovery stages. Phys Ther. 1966;46(4):357-375.

39. Hugues, Di Marco, Ribault, et al. Limited evidence of physical therapy on balance after stroke: A systematic review and meta-analysis. PLoS One. 2019;14(8):e0221700.

40. Thilarajah, Mentiplay, Bower, et al. Factors Associated With Post-Stroke Physical Activity: A Systematic Review and Meta-Analysis. Arch Phys Med Rehabil. 2018;99(9):1876-1889.

41. Holst, Jensen, Prescott. Risk factors for venous thromboembolism: results from the Copenhagen City Heart Study. Circulation. 2010;121(17):1896-1903.

42. Ageno, Becattini, Brighton, Selby, Kamphuisen. Cardiovascular risk factors and venous thromboembolism: a meta-analysis. Circulation. 2008;117(1):93-102.

43. van der Worp, van Gijn. Clinical practice. Acute ischemic stroke. N Engl J Med. 2007;357(6):572-579.

44. Kahn, Lim, Dunn, et al. Prevention of VTE in nonsurgical patients: Antithrombotic Therapy and Prevention of Thrombosis, 9th ed: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines. Chest. 2012;141(2 Suppl):e195S-e226S.

45. Dietrich, Floegel, Troll, et al. Random Survival Forest in practice: a method for modelling complex metabolomics data in time to event analysis. Int J Epidemiol. 2016;45(5):1406-1420.

46. Wright, Ziegler. ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software. 2017;77(1):1-17.

47. Liu, Zheng, Wang, et al. Risk assessment of deep-vein thrombosis after acute stroke: a prospective study using clinical factors. CNS Neurosci Ther. 2014;20(5):403-410.