Abstract
Accurately predicting functional outcomes in patients with spinal cord injury (SCI) helps clinicians set realistic functional recovery goals and improve the home environment after discharge. The present study aimed to develop and validate machine learning (ML) models to predict functional outcomes in patients with SCI and deploy the models within a web application. The study included data from the Japan Association of Rehabilitation Database from January 1, 1991, to December 31, 2015. Patients with SCI who were admitted to an SCI center or transferred to a participating post-acute rehabilitation hospital after receiving acute treatment were enrolled in this database. The primary outcome was functional ambulation at discharge from the rehabilitation hospital. The secondary outcome was the total motor Functional Independence Measure (FIM) score at discharge. We used binary classification models to predict whether functional ambulation was achieved, as well as regression models to predict total motor FIM scores at discharge. In the training dataset (70% random sample) using demographic characteristics and neurological and functional status as predictors, we compared the predictive performance of multiple ML models and selected the best one for each outcome. We validated each model's predictive performance in the test dataset (the remaining 30%). Among the 4181 patients, 3827 were included in the prediction model for the total motor FIM score. The mean (standard deviation [SD]) age was 50.4 (18.7) years, and 3211 (83.9%) patients were male. There were 3122 patients included in the prediction model for functional ambulation. The CatBoost Classifier and CatBoost Regressor showed the best performance in the training dataset. On the test dataset, the CatBoost Classifier had an area under the receiver operating characteristic curve of 0.8572 and an accuracy of 0.7769 for predicting functional ambulation.
Likewise, the CatBoost Regressor performed well, with an R2 of 0.7859, a mean absolute error of 9.2957, and a root mean square error of 13.4846 for predicting the total motor FIM score. The final models were deployed in a web application to provide functional predictions. The application can be found at http://3.138.174.54:8501. In conclusion, our prediction models developed using ML successfully predicted functional outcomes in patients with SCI and were deployed in an open-access web application.
Introduction
Traumatic spinal cord injury (SCI) is a devastating neurologic condition with substantial socioeconomic effects on patients and their caregivers. Patients with SCI also experience a high rate of unemployment and a decreased quality of life. 1 Therefore, functional recovery is one of the main goals for patients with SCI and their families. Accurate prediction of functional outcomes can help clinicians and patients set realistic functional recovery goals and improve the home environment after discharge. Moreover, accurate prediction may help stratify patients in interventional trials. 2,3 Several studies have reported prognostic models for patients with SCI that use traditional statistical methods such as linear or logistic regression, with patient characteristics as predictors. 4–9
Machine learning (ML) is a mathematical model approach that finds patterns within large amounts of sample data called “training data” and makes predictions. In certain cases, ML models outperform traditional statistical methods, 10 given their ability to detect nonlinear relationships and interactions between variables. Although the use of ML models to generate clinical predictions is promising, one drawback is that clinicians cannot readily access and use these models, unlike simpler tools such as risk scores and nomograms. A well-performing and more powerful ML model may be less attractive if it cannot be practically accessed by healthcare professionals.
The present study aimed to develop and validate an ML model for predicting independent ambulation and the total motor Functional Independence Measure (FIM) score in patients with SCI utilizing data from patients enrolled in the Japan Association of Rehabilitation Database (JARD). We also aimed to provide healthcare professionals with an open-access web application that effectively communicates functional outcomes of SCI patients predicted by the ML models. In addition, we sought to identify factors associated with good functional outcomes using the ML approach.
Methods
Patients
The study was approved by the Institutional Review Board of Chiba University Graduate School of Medicine. The study included data from the JARD from January 1, 1991, to December 31, 2015. JARD enrollment included both patients with SCI who were transferred to a participating post-acute rehabilitation hospital after receiving acute treatment and patients who were admitted to a participating SCI center immediately after injury and subsequently underwent post-acute rehabilitation (Fig. 1). Thus, this database contains a mixture of acute and subacute patients with SCI. The database project itself was approved by the Institutional Review Board of JARD. The requirement for informed patient consent was waived because the data are anonymized and cannot be traced back to individual patients.

Three-quarters of patients received acute care at one hospital before being transferred to a rehabilitation hospital. In this case, the number of days from injury to admission was calculated as the length of the hospital stay during the acute phase. One quarter of patients were admitted to a spinal cord injury center immediately after the injury and went directly from acute treatment to post-acute rehabilitation. In this case, the number of days from injury to admission was counted as one.
Demographic data, including age, sex, occupation, educational background, marital status, and comorbidities (hypertension, cardiovascular disease, diabetes, chronic kidney disease, etc.) were collected. The following characteristics of the SCI were recorded: cause of injury; presence of vertebral fracture or dislocation; whether surgical treatment was performed; other associated injuries; whether a blood transfusion was received; number of days from injury to admission; length of stay in the rehabilitation hospital; neurological level of injury; and, at admission and discharge from the rehabilitation hospital, scores on each item in the FIM, the American Spinal Injury Association (ASIA) Impairment Scale (AIS), and the ASIA motor and sensory assessments. 11 Surgical indications and timing of surgery were not standardized due to the multi-center nature of the study. We excluded patients with non-traumatic injuries, those who died during hospitalization, and patients presenting with AIS E. For the total motor FIM score prediction model, we further excluded patients with missing FIM motor scores at discharge. For the functional ambulation prediction model, we additionally excluded patients who had a FIM locomotion score of 6 or more at admission, indicating that they could already walk independently.
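The exclusion rules described above can be sketched as a simple eligibility filter. This is a minimal illustration rather than the authors' code: the record fields (`traumatic`, `died_in_hospital`, `ais_admission`, `motor_fim_discharge`, `fim_locomotion_admission`) are hypothetical names, and the sketch assumes that the ambulation-model exclusion is applied on top of the motor FIM exclusions, which is one reading of "additionally".

```python
from typing import Dict, Optional, Union

# Hypothetical patient record; field names are illustrative only.
Patient = Dict[str, Union[bool, str, int, None]]


def passes_common_exclusions(p: Patient) -> bool:
    """Shared criteria: traumatic injury, survived the stay, not AIS E."""
    return bool(p["traumatic"]) and not p["died_in_hospital"] and p["ais_admission"] != "E"


def eligible_for_motor_fim_model(p: Patient) -> bool:
    """Regression cohort: also requires a motor FIM score at discharge."""
    return passes_common_exclusions(p) and p["motor_fim_discharge"] is not None


def eligible_for_ambulation_model(p: Patient) -> bool:
    """Classification cohort (assumed to build on the regression cohort):
    drops patients already scoring 6 or more on FIM locomotion at admission."""
    score: Optional[int] = p["fim_locomotion_admission"]  # type: ignore[assignment]
    # Assumption: a missing admission score does not trigger this exclusion.
    already_independent = score is not None and score >= 6
    return eligible_for_motor_fim_model(p) and not already_independent
```

Keeping the shared and model-specific rules in separate functions mirrors the two cohort sizes reported for the two models.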
Rehabilitation setting
Rehabilitation programs for patients with SCI focused on gait and exercises related to activities of daily living. The gym exercise program comprised 40-80 min of physical therapy 5-7 days per week. The program included range-of-motion, muscle strengthening, and basic motion exercises (e.g., rolling over, standing up, and walking).
Outcome variables
The FIM score is an established index of disability severity that is widely used in rehabilitation settings. 12 The instrument comprises 18 items, each assessed on a 7-point ordinal scale; higher scores indicate greater independence. Motor FIM consists of 13 activity items (eating, grooming, bathing, upper body dressing, lower body dressing, toileting, bladder management, bowel management, bed-to-chair transfer, toilet transfer, tub/shower transfer, locomotion [walk/wheelchair], and stairs), with scores ranging from 13 (totally dependent) to 91 (totally independent). Cognitive FIM consists of five items (comprehension, expression, social interaction, problem solving, and memory).
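As a small illustration of how the total motor FIM score follows from the 13 items listed above, the sketch below sums item scores and checks the 1 to 7 range. The item keys are hypothetical identifiers chosen for this example, not field names from the database.

```python
from typing import Dict

# The 13 motor FIM items named in the text (keys are illustrative).
MOTOR_FIM_ITEMS = (
    "eating", "grooming", "bathing", "upper_body_dressing",
    "lower_body_dressing", "toileting", "bladder_management",
    "bowel_management", "bed_chair_transfer", "toilet_transfer",
    "tub_shower_transfer", "locomotion", "stairs",
)


def total_motor_fim(scores: Dict[str, int]) -> int:
    """Sum the 13 motor items, each scored 1 (dependent) to 7 (independent)."""
    missing = [item for item in MOTOR_FIM_ITEMS if item not in scores]
    if missing:
        raise ValueError(f"missing motor FIM items: {missing}")
    for item in MOTOR_FIM_ITEMS:
        if not 1 <= scores[item] <= 7:
            raise ValueError(f"{item} must be between 1 and 7")
    return sum(scores[item] for item in MOTOR_FIM_ITEMS)
```

With every item at 1 the total is 13, and with every item at 7 it is 91, matching the range stated above.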
The primary outcome in this study was functional ambulation at discharge from the rehabilitation hospital, evaluated by the FIM locomotion (walk/wheelchair) score. “Functional ambulation” was defined as a score of 6 (modified independence) or 7 (complete independence) where the mode of ambulation was either walking or walking and using a wheelchair equally. “Not functional ambulation” was defined as a score of 1 to 5 for any mode of locomotion. 13 The secondary outcome was the total motor FIM score at discharge from the rehabilitation hospital.
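The binary outcome definition above can be written as a short labeling function. This is a sketch under stated assumptions: the mode labels `"walk"`, `"wheelchair"`, and `"both"` are hypothetical encodings, and a score of 6 or 7 achieved by wheelchair only is treated as not functional ambulation because the positive definition requires walking.

```python
VALID_MODES = {"walk", "wheelchair", "both"}  # hypothetical encoding


def is_functional_ambulation(locomotion_score: int, mode: str) -> bool:
    """Label the primary outcome from the FIM locomotion item at discharge."""
    if not 1 <= locomotion_score <= 7:
        raise ValueError("FIM locomotion score must be between 1 and 7")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown locomotion mode: {mode}")
    # Score 6 (modified independence) or 7 (complete independence),
    # with walking as the sole or an equal mode of locomotion.
    return locomotion_score >= 6 and mode in {"walk", "both"}
```

For example, a score of 6 with mode `"both"` is labeled as functional ambulation, whereas a score of 5 with mode `"walk"` is not.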
Machine learning models
We used PyCaret version 2.3.10 (https://pycaret.org) to apply ML to our dataset. PyCaret is an open-source, low-code ML library in Python that automates ML workflows. All ML algorithms were implemented in Python 3.8 using Visual Studio Code version 1.77.3 (Microsoft Corporation, WA, USA).
The list of features input into the model is displayed in Table 1. The proportion of missing values for each feature is shown in Table 2, and any feature with more than 30% of its values missing was removed. Iterative imputation was used for the remaining missing values. LightGBM and Random Forest were set as the numeric iterative imputer and the categorical iterative imputer, respectively.
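The 30% missingness rule can be illustrated with a few lines of plain Python. This is a minimal sketch, not the PyCaret preprocessing the study used: rows are assumed to be dictionaries with `None` marking a missing value, and the iterative imputation step is not shown.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


def missing_fraction(rows: Sequence[Row], feature: str) -> float:
    """Share of rows in which the feature is absent or None."""
    return sum(1 for row in rows if row.get(feature) is None) / len(rows)


def drop_sparse_features(
    rows: Sequence[Row], features: Sequence[str], threshold: float = 0.30
) -> List[str]:
    """Keep features whose missing fraction does not exceed the threshold."""
    return [f for f in features if missing_fraction(rows, f) <= threshold]
```

A feature missing in exactly 30% of rows is kept here, since the text removes only features with more than 30% missing.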
Table 1. List of Features Input into the Machine Learning Model
OPLL, ossification of the posterior longitudinal ligament; OLF, ossification of the ligamentum flavum; AIS, American Spinal Injury Association Impairment Scale; ASIA, American Spinal Injury Association; FIM, Functional Independence Measure.
Table 2. Baseline Characteristics at Admission of Patients (N = 3827) Who Were Included in the Prediction Model for the Total Motor FIM Score
AIS, American Spinal Injury Association Impairment Scale; ASIA, American Spinal Injury Association; C, cervical; FIM, Functional Independence Measure; L, lumbar; S, sacral; SD, standard deviation; T, thoracic.
Using the Boruta algorithm, 14 the feature dimensions were reduced to eight features in the binary classification model and 12 features in the regression model. Boruta takes the “all-relevant” approach to feature selection and is available as a Python package.
After preprocessing the dataset, we compared the ML models using the compare_models function in PyCaret. This function trains all the models in the model library with default hyperparameters and evaluates performance metrics using 10-fold cross-validation on the training dataset. We used binary classification models to predict whether functional ambulation was achieved, as well as regression models to predict total motor FIM scores at discharge. The models used in this step are listed in Table 3 and Table 4 for functional ambulation and total motor FIM score, respectively. Then, we selected the best-performing model on the training dataset and optimized its hyperparameters using the tune_model function in PyCaret.
Table 3. Comparing the Performance of Binary Classification Models for Predicting the Achievement of Functional Ambulation
AUC, area under the receiver operating characteristic curve; Prec., precision.
Table 4. Comparing the Performance of Regression Models for Predicting the Total Motor FIM Score
FIM, Functional Independence Measure; MAE, mean absolute error; RMSE, root mean square error.
Model explainability
Shapley additive explanation (SHAP) values were computed for the top-performing models to allow for model explainability. SHAP, which is characterized as a “game-theoretic technique for understanding the output of any ML model,” was used to assess the impact of each variable on the model. 15
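To make the idea behind Shapley values concrete, the sketch below computes them exactly for a toy model by averaging each feature's marginal contribution over all feature orderings. It is an illustration of the underlying definition, not the SHAP library or the study's CatBoost models; the `toy_predict` function, feature names, and all numbers are hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(
    predict: Callable[[Features], float], instance: Features, baseline: Features
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)  # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its real value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_predict(x: Features) -> float:
    """Hypothetical scoring function with one interaction term."""
    return 0.5 * x["motor_score"] - 0.2 * x["age"] + 0.01 * x["motor_score"] * x["admission_fim"]


instance = {"motor_score": 50.0, "age": 40.0, "admission_fim": 60.0}
baseline = {"motor_score": 30.0, "age": 50.0, "admission_fim": 40.0}
contributions = shapley_values(toy_predict, instance, baseline)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that lets SHAP attribute a single prediction to individual features. Exact enumeration grows factorially with the number of features, so practical tools rely on faster algorithms.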
Model deployment to web application
The final algorithms for independent ambulation and total motor FIM score prediction were deployed within an open-access web application. Utilizing Streamlit (https://streamlit.io), an open-source app framework for ML and data science projects, the application was deployed on an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) server.
Statistical analysis
The dataset was randomly split 7:3 into training and testing subsets. The training set was used for model training to solve binary classifications (n = 2185) and regression problems (n = 2678). Then, we trained each model using 10-fold cross-validation in the training set, and the model performance was assessed on the test set (n = 937 and n = 1149 for binary classifications and regression problems, respectively).
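The split described above can be sketched with the standard library. This is an illustrative stand-in for the split performed inside PyCaret: the seed, the truncation rule for the training size, and the interleaved fold assignment are assumptions made for this example.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    ids: Sequence[int], train_fraction: float = 0.7, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle the ids and split them into training and test subsets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # truncate toward zero
    return shuffled[:n_train], shuffled[n_train:]


def k_folds(ids: Sequence[int], k: int = 10) -> List[List[int]]:
    """Partition the training ids into k disjoint validation folds."""
    return [list(ids[i::k]) for i in range(k)]


# Cohort sizes taken from the text: 3122 (classification), 3827 (regression).
clf_train, clf_test = split_train_test(range(3122))
reg_train, reg_test = split_train_test(range(3827))
folds = k_folds(clf_train, k=10)
```

Under these assumptions the subset sizes agree with those reported in the text (2185 and 937 for classification, 2678 and 1149 for regression). In each cross-validation round, one fold serves as the validation set and the other nine are used for fitting.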
Accuracy, area under the curve (AUC), recall, precision, and F1 values were calculated to evaluate the binary classification models. R2, mean absolute error (MAE), and root mean square error (RMSE) were used to compare the performance of the regression models. All analyses were conducted using PyCaret version 2.2.3 in Python 3.8.
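For reference, the threshold-based classification metrics and the regression metrics named above can be computed from their standard definitions as follows. This is a minimal sketch with hypothetical function names; AUC is omitted because it requires ranked prediction scores rather than confusion-matrix counts.

```python
from math import sqrt
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, recall, precision, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE, and R2 (coefficient of determination)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(ss_res / n),
        "r2": 1 - ss_res / ss_tot,
    }
```

As a consistency check, applying the F1 formula to the test-set precision (0.7645) and recall (0.8529) reported in the Results gives approximately 0.8063, which matches the reported F1 score.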
Results
Baseline patient characteristics
Of the 4181 patients in JARD, 3827 patients met the study inclusion criteria for the total motor FIM score prediction model and 3122 patients were included for the functional ambulation prediction model. A flowchart of the patient selection process is presented in Figure 2. Table 2 shows baseline characteristics of the 3827 patients who were included in the prediction model for the total motor FIM score; of these, 1266 (33.1%) were AIS A, 386 (10.1%) AIS B, 921 (24.0%) AIS C, and 1052 (27.5%) AIS D at admission to inpatient rehabilitation. There were 951 (24.8%) patients who were admitted on the day of injury.

Figure 2. Flowchart of patient selection. AIS, American Spinal Injury Association Impairment Scale; FIM, Functional Independence Measure.
Evaluation metrics
The mean ± SD FIM locomotion (walk/wheelchair) score was 2.4 ± 2.1 at admission and 5.0 ± 2.1 at discharge. The mean ± SD total motor FIM score was 30.5 ± 22.3 at admission and 58.4 ± 28.8 at discharge. The average ± SD length of stay was 168.2 ± 138.9 days.
Model performance
Table 3 shows the results of compare_models in PyCaret with a summary of the accuracy, AUC, recall, precision, and F1 score for each binary classification model predicting the achievement of functional ambulation. In the training dataset, the CatBoost Classifier showed the highest AUC (0.8407), with an accuracy of 0.7615, for predicting functional ambulation. On the test dataset, the tuned CatBoost Classifier had an AUC of 0.8572, accuracy of 0.7769, recall of 0.8529, precision of 0.7645, and F1 score of 0.8063. The receiver operating characteristic curve and confusion matrix of the CatBoost Classifier for predicting functional ambulation are shown in Figure 3A and 3B, respectively.

Figure 3. The performance matrix of the best model for predicting the achievement of functional ambulation.
Table 4 shows the results of compare_models in PyCaret with a summary of the MAE, RMSE, and R2 of each regression model predicting the total motor FIM score. In the training dataset, the CatBoost Regressor exhibited the second lowest MAE (9.4766), the lowest RMSE (13.8682), and the highest R2 (0.7635) for predicting the total motor FIM score. On the test dataset, the tuned CatBoost Regressor had an MAE of 9.2957, RMSE of 13.4846, and R2 of 0.7859. The prediction error plot of the best model for predicting the total motor FIM score at discharge from the rehabilitation hospital is shown in Figure 4.

Figure 4. Prediction error plot of the best model for predicting the total motor FIM score. FIM, Functional Independence Measure.
Model explainability
ASIA motor score, age, total FIM score, number of days from injury to admission, and neurological level of injury had high SHAP values and were identified as factors that strongly influence the model's output. SHAP values in the functional ambulation and total motor FIM score models are shown in Figure 5A and 5B, respectively.

Figure 5. SHAP values for prediction models.
Model deployment to web application
The final prediction models are available as a web application at http://3.138.174.54:8501 (Fig. 6).

Figure 6. Web application interface for the two final algorithms. Left panel, input tab for multiple features; main frame, prediction results.
Discussion
In the present study, we showed that using a large multi-center dataset from JARD, ML models could successfully predict functional outcomes. Further, we found that functional outcomes predicted by the model are strongly influenced by certain factors, including ASIA motor score, age, total FIM score, number of days from injury to admission, and neurological level of injury. We deployed these ML models in an open-access web application.
Our ML models could stratify the prognosis of walking ability and predict total motor FIM score in patients with SCI based on neurological and functional status and demographic data at admission to inpatient rehabilitation. Existing studies on the application of ML to predict functional outcomes in SCI are limited. Four studies have sought to predict neurological or functional outcomes of patients with SCI using an ML approach. Among them, Inoue and colleagues analyzed data from 165 patients with SCI and applied a two-class discrimination model using XGBoost to predict AIS (A, B, C or D, E) 6 months after SCI with an accuracy of 81.1% and an AUC of 0.867. 10 Facchinello and colleagues applied a regression tree algorithm for predicting the long-term functional outcome of 172 patients using the Spinal Cord Independence Measure following traumatic SCI. 16 Regression tree models demonstrated R2 values of 0.517 and 0.632 for the simplified and complete models, respectively. 16 The prediction models from these first two studies may be less stable given their small sample sizes. Belliveau and colleagues used an artificial neural network (ANN) and logistic regression to predict walking recovery following SCI. The AUC for determining who could walk 150 ft at 1 year after hospital discharge was 0.8801 and 0.8754 for ANN and logistic regression, respectively. 17 DeVries and colleagues applied unsupervised ML models to predict functional ambulation 1 year after injury using FIM locomotion from a dataset of 862 SCI patients. 13 The proposed unsupervised ML model demonstrated an AUC of 0.86, showing no differences compared to the earlier conventional models.
More recently, studies have combined novel imaging and ML approaches to predict functional outcomes after SCI. McCoy and colleagues demonstrated that injury volume of the spinal cord as observed in magnetic resonance images derived from a deep learning segmentation model significantly correlated with motor scores at admission and discharge. 18 Okimatsu and colleagues reported that the combination of deep learning radiomics and random forest algorithms could distinguish between five grades of the ASIA impairment scale 1 month after injury with an accuracy of 0.715. 19
Despite the complexity of SCI pathology and the variability in our patient cohort, we achieved a favorable AUC for predicting functional ambulation and a high R2 for predicting the total motor FIM score. Regarding the prediction models for functional ambulation in patients with SCI, Van Middendorp and colleagues used a logistic regression model based on 492 patients and reported an excellent AUC of 0.956. 2 A similar AUC (0.939) was achieved by external validation of Van Middendorp and colleagues' model on 184 patients by Van Silfhout and colleagues. 20 Hicks and colleagues reported a logistic regression model based on 278 patients that achieved an AUC of 0.889. 21 Another logistic regression model by Phan and colleagues based on 675 patients achieved AUCs ranging from 0.516 to 0.730 for each AIS grade from A to D at admission. 22 The AUC of the present ML model for ambulation was slightly inferior to those previously reported. 2,20,21 This may be due to the heterogeneous cohort in the present study, which includes both acute and subacute patients. Moreover, having a higher proportion of patients with AIS A or AIS D could inflate a model's predictive accuracy, 21 because patients with AIS A typically experience unfavorable gait outcomes and patients with AIS D typically experience good gait outcomes. The proportion of patients with AIS A or AIS D at admission was 57% in our classification model, 71% in Van Middendorp and colleagues' model, 2 68% in Hicks and colleagues' model, 21 and 76% in Phan and colleagues' model. 22
Regarding prediction models for the total motor FIM score, Wilson and colleagues reported a linear regression using four predictors 1 year after injury with an R2 value of 0.52. 7 An R2 value of 0.72 was reported by Abdul-Sattar for a linear regression between the motor FIM score and five predictors in the acute phase. 6 Another linear regression model was described by Post and colleagues, reporting an R2 value of 0.49 for model training. 23 The R2 of 0.7859 in our predictive model for total motor FIM score was better than those previously reported. 6,7,23 This may be because ML can uncover relationships within complicated datasets that standard linear regression may overlook.
The model we built here identified ASIA motor score, age, total FIM score, number of days from injury to admission, and neurological level of injury at admission as important features for predicting functional outcomes in patients with SCI. These results are consistent with findings presented previously. Recent systematic reviews 24–26 suggest that the initial severity (based on AIS) of a traumatic SCI was the main predictor of functional outcomes. The ASIA motor score also was a significant predictive factor of functional outcomes because it directly correlates with AIS grade. 27 However, the ASIA motor score does not reflect neurologic sacral evaluation, which is considered a crucial factor in neurological and functional recovery. 28 Age is also consistently found to affect long-term functional status. 24,25,29 The final functional score was correlated with baseline functional status at discharge from acute care or at admission to the rehabilitation hospital. 24 This is consistent with the final functional outcome correlating with the total FIM score on admission in the present study. A shorter stay in acute care hospitals was similarly associated with improved functional recovery after inpatient rehabilitation. 6,8 Shorter length of stay in acute care may also reflect underlying factors such as fewer complications, comorbidities, or associated injuries that could lead to functional recovery. 24
In addition, we identified the number of days from injury to hospitalization as an important feature that may be considered for use in algorithms stratifying patients into acute and subacute phases. Neurological level of injury was another important predictor of functional recovery, especially when patients were divided into groups based on quadriplegia and paraplegia, 6 or when considering cases with the same severity of injury. 30–32 While the trend toward early surgical management for SCI is recognized, there is uncertainty regarding the impact of the timing of surgical decompression following SCI, 4,8,33 and the timing of surgery was not recorded in the JARD.
This study created a modern ML model with the largest dataset to date of patients with SCI to predict functional outcomes, and this model was made publicly available in the form of a web application. An early and precise prediction of functional outcomes in patients using ML models can help healthcare professionals promote efficient care, optimize treatments, set realistic goals, and assess the effect of cutting-edge therapies in clinical trials. 34 Given these benefits, ML is expected to be an essential tool for making personalized medicine more common.
Limitations
There are several limitations in this study. First, the final motor FIM measurement was taken at discharge, and discharge timing varied between patients. Nonetheless, patients in our study population, based in a country with low hospitalization costs, often remain hospitalized for extended rehabilitation and are discharged after approximately 6 months on average, when neurologic and functional outcomes are known to plateau. 35 Second, the patients included in this study were a mixture of those in the acute and post-acute phases, making it difficult to perform a uniform initial evaluation at the time of admission. However, the number of days from injury to admission was included as a factor in our model, so the model can be applied to both acute and post-acute phases and is therefore usable for a larger population. Third, because deceased cases were excluded from the analysis, our model may not accurately predict the prognosis of cases that could potentially result in death. Fourth, some of the injury details were unavailable in the database, such as the timing of surgery and certain MRI characteristics of the spinal cord. Although these elements could have further enhanced the prediction accuracy of our ML model, these data were not available because of the retrospective, database-driven design of the study.
Conclusions
To conclude, in the present study we showed that prediction models using ML could successfully predict functional outcomes, including ambulation and total motor FIM score. Further, our ML models identified ASIA motor score, age, total FIM score, number of days from injury to admission, and neurological level of injury as factors that strongly impact functional outcomes. We deployed the proposed ML models in an open-access web application with a user-friendly interface for healthcare professionals.
Footnotes
Acknowledgments
We acknowledge the Japan Association of Rehabilitation Database for establishing the Japan Rehabilitation Database, which served as a core resource for this study.
Disclaimer: The views presented here are those of the authors and do not necessarily represent the views of the Japan Association of Rehabilitation Database. The registration data are not a representative sample of rehabilitation in Japan or of rehabilitation in the web application user's country because most of the facilities that participated in the Japan Association of Rehabilitation Database were actively engaged in the rehabilitation of SCIs.
Authors' Contributions
Satoshi Maki had full access to all the data in the study and took responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Satoshi Maki.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Satoshi Maki.
Critical revision of the manuscript for important intellectual content: All authors.
Statistical analysis: Satoshi Maki.
Obtained funding: Satoshi Maki.
Administrative, technical, or material support: All authors.
Supervision: Seiji Ohtori.
Funding Information
This work was supported by a research grant funded by the JOA-Subsidized Science Project Research 2020-1 and JSPS KAKENHI Grant Number JP20K18052.
Author Disclosure Statement
No competing financial interests exist.
