Abstract
Study Design
Retrospective Cohort Study.
Objective
De novo spinal infections (DNSI) require prompt treatment decisions, especially on conservative or surgical management. The Spinal Infection Treatment Evaluation (SITE) score was developed in 2022 to standardize treatment recommendations, but external validation remains limited.
Methods
In a retrospective, single-center study, we analyzed consecutive patients with DNSI treated between January 2016 and December 2022. We calculated SITE scores based on admission parameters and compared score-based treatment recommendations with actual clinical decisions. The primary outcome was concordance between SITE recommendations and treatment decisions. Secondary outcomes included individual component performance and discriminatory ability, as assessed using receiver operating characteristic analysis.
Results
We identified 95 patients with DNSI (mean age 66.4 years, 66.3% male). SITE scores ranged from 4 to 13 points. Treatment concordance with SITE recommendations was observed in 82.1% of cases. In multivariate analysis, neurological status (OR 0.48, P = 0.006) and host comorbidities (OR 0.34, P = 0.035) were significant predictors of surgical treatment. The complete SITE score demonstrated a moderate ability (AUC 0.70) to discriminate between conservative and surgical treatment.
Conclusions
The SITE score demonstrated moderate predictive accuracy with good treatment concordance (82.1%) and clinically useful discriminatory performance (AUC 0.70). Its neurological component showed particular clinical value. However, clinicians were frequently found to override score recommendations for reasons not captured in the current SITE scoring system.
Level of Evidence
IV/3.
Highlights
• First external validation of the SITE score demonstrates moderate predictive accuracy (AUC 0.70) and good treatment concordance (82.1%) for surgical decision-making in de novo spinal infections
• Neurological status emerges as the strongest predictor of surgical intervention (OR 0.48, P = 0.006), with all patients presenting acute plegia or bladder/bowel dysfunction undergoing surgery
• Clinicians frequently override SITE score recommendations based on factors not captured in the scoring system, indicating the need for refined assessment tools incorporating broader clinical parameters
Introduction
De novo spinal infections (DNSI) are increasingly prevalent globally and are potentially life-threatening conditions requiring prompt treatment decisions.1-5 Mortality is influenced by comorbidities, immunodeficiency, and risk factors such as intravenous drug abuse (IVDA), alcohol abuse, and advanced age, with progression potentially leading to sepsis.6 Management typically involves multidisciplinary collaboration based on clinical assessment, laboratory testing, and imaging studies. Treatment focuses on infection control through pathogen identification via CT-guided biopsy, intraoperative sampling, and blood cultures to guide antibiotic therapy.7 The critical decision between conservative treatment (antibiotics and analgesics) and surgical therapy (debridement, sampling, and instrumentation/fusion) remains challenging due to limited standardized guidelines.8-10
To address this clinical uncertainty, various treatment algorithms and scoring systems have been established.6,11-13 The Spinal Infection Treatment Evaluation (SITE) Score was developed and published in 2022, incorporating statistically significant parameters such as neurological status, infection location, radiological characteristics (including spinal deformity), pain levels, and host comorbidities (Figure 1A).13
While initial validation with 30 patients showed promising inter- and intra-observer reliability, subsequent external validation studies have provided mixed results regarding the score’s generalizability across diverse patient populations and clinical settings. Two external validation studies from the United States and one from Asia have demonstrated varying performance characteristics, highlighting potential regional differences in treatment approaches.14-17 However, European validation data remains lacking, representing a significant gap in understanding the SITE score’s applicability across different healthcare systems.
Figure 1. SITE Score and distribution of the frequency of the SITE Score in the cohort. (A) Tabular representation of the composition of the SITE Score according to Pluemer et al. (left panel). (B) Distribution of SITE Score values in the study cohort: histogram showing the frequency distribution of SITE Scores among 95 patients, with the complete range spanning 4-14 points. Cases of patients with acute plegia or bladder/bowel dysfunction are marked in red (n = 5), all other cases are shown in blue (right panel).
Hence, this study aimed to provide the first European external validation of the SITE score by assessing its accuracy in predicting the need for surgical intervention in DNSI patients.
Methods
Study Design and Setting
This retrospective, single-center study was conducted at the Interdisciplinary Spine Center, HOCH Health Ostschweiz, Cantonal Hospital St. Gallen. We included consecutive patients with DNSI between January 2016 and December 2022, who had infectious spondylodiscitis, spondylitis, epidural abscess, or paraspinal abscess confirmed by microbiological culture, or histopathological tissue analysis. Patients were identified through hospital database review and Diagnosis Related Groups (DRG) coding indicating spinal infection. Inclusion criteria were: (1) confirmed spinal infection through microbiological or histopathological analysis, (2) complete medical records available, and (3) general consent for deidentified research. Exclusion criteria included: (1) spine surgery within 90 days before presentation, (2) non-infectious inflammatory, degenerative, or neoplastic spine conditions in the affected segment, and (3) incomplete data preventing SITE score calculation.
Data Collection
Medical records were systematically reviewed, and data from four timepoints were entered into a SecuTrial® database: hospital admission, surgery (if applicable), outpatient consultation at approximately 90 days post-discharge, and 12 months post-discharge.
Variables collected included demographics (age, sex, body mass index (BMI), drug abuse), clinical indices (Charlson Comorbidity Index (CCI),18 Canadian Frailty Index (CFI),19,20 ASA grade,21 modified McCormick grade22,23), infection characteristics (anatomical region, segments affected, infection type, bacterial cultures, antibiotic resistance), laboratory values (CRP, leukocytes, thrombocytes, erythrocytes, hemoglobin, hematocrit, and eGFR), treatment details, complications, and outcome parameters including MacNab criteria.24 Additional assessments included ASIA Impairment Scale (AIS), neurological status and pain, imaging parameters, comorbidities, and presence of sepsis.25
SITE Score Calculation
The SITE score evaluates spinal infections across five categories: neurological deficits (1-3 points), spinal location (1-4 points), radiological findings (1-5 points), pain severity (0-2 points), and host comorbidities (0-1 points). Total scores range from 3-15 points, stratifying patients into three categories: severe (3-8 points, surgery recommended), moderate (9-12 points, surgery optional), and mild infections (13-15 points, conservative management recommended). 13 All patients presenting with acute plegia or bladder/bowel dysfunction underwent surgical intervention in the development cohort, representing an absolute surgical indication regardless of other clinical parameters according to the authors. For complete information regarding validation methods and practical clinical application, we refer to the original research published by Pluemer and colleagues in 2023. 13 Our group recently evaluated the SITE score applicability and reliability among non-spine surgeons. 10
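The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the original SITE authors: the component keys, the function names `site_total` and `site_recommendation`, and the flag `acute_plegia_or_sphincter` are hypothetical labels introduced here. Only the point ranges, the category cutoffs, and the absolute surgical indication are taken from the text.

```python
# Point ranges per component as stated in the text: (minimum, maximum).
# The dictionary keys are hypothetical labels chosen for this sketch.
COMPONENT_RANGES = {
    "neurology": (1, 3),
    "location": (1, 4),
    "radiology": (1, 5),
    "pain": (0, 2),
    "comorbidity": (0, 1),
}


def site_total(components):
    """Validate each component against its range and return the sum (3-15)."""
    if set(components) != set(COMPONENT_RANGES):
        raise ValueError("exactly the five SITE components are required")
    for name, (low, high) in COMPONENT_RANGES.items():
        if not low <= components[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return sum(components.values())


def site_recommendation(total, acute_plegia_or_sphincter=False):
    """Map a total score to the recommendation category described in the text."""
    # Acute plegia or bladder/bowel dysfunction is treated as an absolute
    # surgical indication regardless of the score.
    if acute_plegia_or_sphincter:
        return "surgery recommended"
    if not 3 <= total <= 15:
        raise ValueError("total SITE score must be between 3 and 15")
    if total <= 8:
        return "surgery recommended"   # severe infection
    if total <= 12:
        return "surgery optional"      # moderate infection
    return "conservative management recommended"  # mild infection


# Hypothetical patient, for illustration only.
example = {"neurology": 2, "location": 3, "radiology": 3, "pain": 1, "comorbidity": 1}
example_total = site_total(example)
example_category = site_recommendation(example_total)
```

Note that lower totals indicate more severe infection, so the surgical cutoff is an upper bound (8 points or fewer) rather than a lower one.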
Treatment Decision Making
The treatment decisions for patients with DNSI at our institution involve interdisciplinary collaboration between spine surgery and infectious disease services. Decisions are individualized based on neurological status, extent of infection, spinal instability, pain control, patient mobility, and comorbidities. For critically ill septic patients, treatment decisions consider both the risk of conservative management and surgical intervention. 26
Statistical Analysis
Statistical analysis was performed using Stata v18 SE (StataCorp LLC, College Station, TX). Patient characteristics, disease-specific variables, and treatment outcomes were described using frequencies for categorical variables and mean ± standard deviation for continuous variables. Associations between SITE score components and treatment decisions were examined using Pearson’s chi-square tests and logistic regression models. We evaluated the concordance between the SITE score recommendation and actual treatment decisions. Mean SITE scores were compared between groups using Student's t-test to enable comparison with previous external validation studies that reported mean values. The discriminatory ability was assessed using receiver operating characteristic (ROC) curve analysis, with area under the curve (AUC) values of 0.7-0.8 considered acceptable, 0.8-0.9 excellent, and >0.9 outstanding.27 To address the prediction issue with new-onset severe neurological deficits, we developed a hierarchical model that assigned a 99% predicted probability of surgery to all patients with acute plegia or bladder/bowel dysfunction and used the SITE score-based prediction for all other patients. Probability values of <0.05 were considered statistically significant.
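The hierarchical model and the AUC can be illustrated with a short Python sketch. This is not the Stata analysis used in the study. The logistic coefficients in `score_model` and the six toy patients are hypothetical values chosen only to make the example run, and the function names are introduced here for illustration. The sketch computes the AUC as the probability that a randomly chosen surgical patient receives a higher predicted probability than a randomly chosen conservatively treated patient, with ties counted as one half.

```python
import math


def auc(probabilities, had_surgery):
    """Rank-based AUC: share of (surgical, conservative) pairs in which the
    surgical case has the higher predicted probability (ties count 0.5)."""
    pos = [p for p, y in zip(probabilities, had_surgery) if y]
    neg = [p for p, y in zip(probabilities, had_surgery) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case in each treatment group")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score_model(site_score):
    """HYPOTHETICAL logistic curve; the coefficients are illustrative only.
    Lower SITE scores map to a higher predicted probability of surgery."""
    return 1.0 / (1.0 + math.exp(-(4.0 - 0.5 * site_score)))


def hierarchical_probability(site_score, acute_deficit):
    """Assign 99% to acute plegia or bladder/bowel dysfunction; otherwise
    fall back to the score-based prediction."""
    return 0.99 if acute_deficit else score_model(site_score)


# HYPOTHETICAL toy patients: (SITE score, acute deficit, had surgery).
# These are not study data.
toy = [(5, False, True), (7, False, True), (8, False, False),
       (9, False, False), (10, True, True), (11, False, False)]

surgery = [s for _, _, s in toy]
auc_score_only = auc([score_model(x) for x, _, _ in toy], surgery)
auc_hierarchical = auc([hierarchical_probability(x, a) for x, a, _ in toy], surgery)
```

In the toy data, the patient with an acute deficit has a high SITE score and would be ranked low by the score alone. Overriding that prediction with 0.99 moves the patient to the top of the ranking, which is why the hierarchical variant yields the higher AUC here.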
Ethical Considerations
The institutional review board of Eastern Switzerland approved the study (BASEC ID 2023-01343) with waiver for informed consent due to the retrospective design and use of anonymized data.
Results
Demographics and Clinical Characteristics
Basic Demographic Data of n = 95 Consecutive Patients, Presenting With De-Novo Spinal Infection. Results Are Displayed as Mean (Standard Deviation; Range) or Count (Percent)
Abbreviations: BMI, body mass index; CCI, Charlson comorbidity index; CFI, Canadian frailty index; CRP, C-reactive protein; GFR, Glomerular filtration rate.
*Daily consumption of at least 3 beverages.
**Includes intake of central nervous system (CNS) depressants, CNS stimulants, hallucinogens, dissociative anesthetics, narcotic analgesics, inhalants, cannabis.
Disease-Specific Characteristics
Disease-Specific Data of n = 95 Consecutive Patients With De-Novo Spinal Infection
Abbreviations: CRP, C-reactive protein; GFR, Glomerular filtration rate; spp, species.
*Other bacteria include: Veillonella spp. (n = 1), Campylobacter spp. (n = 1), Citrobacter spp. (n = 1), Enterococcus spp. (n = 1), Haemophilus spp. (n = 1), Klebsiella spp. (n = 1), and Mycobacterium spp. (n = 1).
**Among patients requiring surgery, instrumented spine surgery was performed in 18 patients (18.9%).
Note. Results are displayed as mean ± standard deviation (range) or count (percent).
SITE Score Distribution and Component Analysis
Distribution of the SITE Score Variables Among n = 95 Consecutive Patients, Presenting With De-Novo Spinal Infection. Results Are Displayed as Count (Percent). Significant values (p<0.05) are marked in bold.
The full descriptions of the radiology options are: (1) Spinal canal stenosis w/impingement of central neural elements w/or w/o de novo deformity, (2) Segmental angulation or translation w/de novo deformity/foraminal stenosis OR erosion of vertebral body on CT >50% OR PL involvement on both sides, (3) Visible endplate erosion on CT OR edema of vertebral body >50% on MRI OR intervertebral disc involvement on MRI OR PL involvement on 1 side, (4) None of these radiological findings.
SITE Score Comparison Between Treatment Groups
Patients who underwent surgical treatment had significantly lower SITE scores compared to those managed conservatively. The surgical group had a mean SITE score of 7.9 ± 1.8 (median 8.0, IQR 6-9) vs 9.1 ± 1.9 (median 9.0, IQR 8-11) in the conservative group (P = 0.001). The Mann-Whitney U test confirmed this significant difference in SITE score distributions between treatment groups (P < 0.001).
Predictors of Surgical Intervention
Analysis of SITE Score Components and Correlation With Surgical Decision. Results are Displayed as Odds Ratio (OR) with 95% Confidence Interval (CI). Significant values (p<0.05) are marked in bold.
Abbreviations: IVDA, Intravenous Drug Abuse.
aPearson’s chi-square test.
bLogistic regression.
cPerfect prediction (no statistical test performed).
Estimation of the Effect Size Between the Singular Items or the Summary Score of the SITE Score and the Likelihood of n = 95 Consecutive Patients, Presenting With De-Novo Spinal Infection, to Require Surgical Treatment. Results are Displayed as Odds Ratio (OR) with 95% Confidence Interval (CI). Significant values (p<0.05) are marked in bold.
Abbreviation: SITE: Spinal Infection Treatment Evaluation.
Treatment Concordance and Score Performance
Overall treatment concordance with SITE score recommendations occurred in 82.1% of cases. In the surgery-recommended category, 66.0% received surgery; in the optional category, 39.5% received surgery; and 100% of patients (n = 2) in the conservative category were managed conservatively (Figures 2 and 3). The association between SITE score recommendations and actual treatment was statistically significant (χ2 = 8.76, P = 0.012). Using the original SITE score cutoff of ≤8 points (including patients with acute plegia or bladder/bowel dysfunction as surgical candidates regardless of score), the scoring system demonstrated a sensitivity of 66.0% (33/50 surgical cases correctly identified).
Figure 2. Comparison of concordance to the SITE Score treatment recommendation based on the score value. Treatment concordance by SITE Score value: bar chart comparing the proportion of patients treated according to SITE Score recommendations across different score values, demonstrating variable adherence patterns.
Figure 3. Comparison of surgical versus non-surgical treatment of the cohort based on the recommendation by the SITE Score. Treatment patterns by SITE Score recommendation categories: bar chart showing the distribution of surgical (red bars) versus conservative (green bars) treatment within each SITE Score recommendation category (surgery recommended: 3-8 points or patients with acute plegia or bladder/bowel dysfunction, optional: 9-12 points, conservative: 13-15 points).
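The concordance, sensitivity, and chi-square figures reported in this section can be checked with a short Python sketch. The cell counts below are an assumption: they are back-calculated from the reported percentages (66.0%, 39.5%, n = 2, 33/50 surgical cases, 95 patients in total) and are not copied from the study's tables. The sketch also assumes that any treatment counts as concordant in the "surgery optional" category. The closed-form p-value used here is exact only for 2 degrees of freedom, which is the case for a 3 × 2 table.

```python
import math

# ASSUMED counts, reconstructed from the reported percentages:
# category -> (treated surgically, treated conservatively)
table = {
    "surgery recommended": (33, 17),        # 33/50 = 66.0% received surgery
    "surgery optional": (17, 26),           # 17/43 = 39.5% received surgery
    "conservative recommended": (0, 2),     # both managed conservatively
}


def concordance(t):
    """Share of patients whose treatment matched the recommendation.
    Assumption: either treatment is concordant in the optional category."""
    total = sum(s + c for s, c in t.values())
    agree = (t["surgery recommended"][0]
             + sum(t["surgery optional"])
             + t["conservative recommended"][1])
    return agree / total


def sensitivity(t):
    """Surgical cases flagged as 'surgery recommended' over all surgical cases."""
    all_surgical = sum(s for s, _ in t.values())
    return t["surgery recommended"][0] / all_surgical


def chi_square(t):
    """Pearson chi-square statistic for the category-by-treatment table."""
    rows = list(t.values())
    n = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    stat = 0.0
    for r in rows:
        for j, observed in enumerate(r):
            expected = sum(r) * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail chi-square probability; this form holds only for df = 2."""
    return math.exp(-stat / 2.0)


conc = concordance(table)
sens = sensitivity(table)
chi2 = chi_square(table)
p = p_value_df2(chi2)
```

Under these reconstructed counts, the results agree with the reported values to the stated precision, which suggests that the 82.1% concordance counts all patients in the optional category as concordant.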

Discriminatory Performance
ROC curve analysis of individual SITE score components showed AUC ranges of 0.11-0.29 when assessed separately. Excluding cases with acute plegia or bladder/bowel dysfunction resulted in AUC 0.67 (95% CI 0.56-0.78). Multivariate logistic regression incorporating all SITE score components demonstrated an AUC 0.71 (95% CI 0.60-0.82). The hierarchical model demonstrated acceptable discriminatory performance with an AUC of 0.70 (95% CI 0.60-0.81; Figure 4).
Figure 4. ROC curve for SITE Score predicting surgery. ROC curve analysis of SITE Score performance in predicting surgical intervention: the receiver operating characteristic curve demonstrates poor discriminatory ability of the SITE Score (AUC 0.67, 95% CI 0.54-0.88) when excluding patients with acute plegia or bladder/bowel dysfunction (left panel). Hierarchical model incorporating acute plegia cases (assigned 99% predicted probability of surgery) shows improved discriminatory performance (AUC 0.70, 95% CI 0.60-0.81) (right panel).
Discussion
This external validation study demonstrates that the SITE score showed acceptable predictive accuracy for surgical decision-making in our cohort of 95 consecutive patients with DNSI. With an overall concordance rate of 82.1% and a moderate discriminatory ability (AUC 0.67 for the complete score excluding acute plegia cases, and AUC 0.70 for the hierarchical model), our findings suggest that while the SITE score shows some clinical utility, its performance may vary across different clinical settings and patient populations. 27 The hierarchical approach, which assigned 99% predicted probability to patients with acute plegia or bladder/bowel dysfunction and used the SITE score for the remaining patients, demonstrated the best discriminatory performance (AUC 0.70), approaching acceptable clinical utility thresholds. The discrepancy between our external validation results and the original study with internal validation highlights the challenges inherent in developing universally applicable clinical decision tools for complex conditions like spinal infections, where treatment decisions involve multiple interacting factors that may vary significantly between institutions and clinical contexts.5,7,11,12
Performance of Individual SITE Score Components
The neurological component emerged as the most consistently significant predictor of surgical intervention (OR 0.48, 95% CI 0.28-0.81, P = 0.006). All patients with acute plegia or bladder/bowel dysfunction underwent surgery, reflecting the universal recognition of severe neurological compromise as a surgical emergency.28-30 However, only 73.3% of patients with motor dysfunction received surgical treatment, suggesting that the severity and progression of neurological deficits, rather than their mere presence, may drive surgical decisions or the aggressiveness of treatment. This finding aligns with fundamental surgical principles and supports the inclusion of neurological status as a critical component in any treatment algorithm for spinal infections. The strong predictive value of this component suggests it should carry significant weight in clinical decision-making.30,31
The host comorbidity component showed a significant association with surgical intervention (OR 0.34, 95% CI 0.13-0.93, P = 0.035), with patients having IVDA or diabetes mellitus undergoing surgery more frequently (69.0% vs 45.5%). This finding aligns with clinical expectations, as these high-risk patients may present with more aggressive infections requiring surgical drainage, or institutional protocols may favor early surgical debridement in immunocompromised patients to prevent treatment failure.32-35 The narrow scope of the comorbidity component, limited to only IVDA and diabetes mellitus, may inadequately capture the complexity of patient comorbidity burden. Other scoring systems like the ASA classification or CCI, 23 which were collected by us but were not incorporated into the original SITE score, might provide more comprehensive risk stratification.
While radiological findings showed initial significance in univariate analysis (P = 0.026), this was not maintained in the multivariate model (P = 0.114), suggesting confounding by other clinical factors. The subgroup analysis revealed important patterns: patients with spinal canal stenosis had a relatively high surgical rate (76.9%), while those with segmental angulation (40.0%) or endplate erosion (40.5%) were less likely to undergo surgery. This pattern indicates that not all radiological abnormalities carry equal surgical weight, and the current SITE score may not adequately differentiate between radiological findings that mandate immediate intervention vs those that can be managed conservatively with close monitoring. Our differing results – compared to the original internal validation cohort – may also indicate differences in management strategies, in particular regarding instrumented fusion for suspected instability, between Europe and the United States. 36
Neither spinal location nor pain severity demonstrated significant associations with surgical decisions in our cohort, contrasting with the original SITE development study. This discrepancy may also reflect differences in institutional treatment philosophies or patient selection criteria. The lack of significance for these components questions their universal applicability and suggests that local practice patterns may significantly influence their predictive value.
Comparison With External Validation Studies
Our study represents the first European external validation of the SITE score, adding crucial geographic diversity to the validation literature. Three prior studies from the United States and Asia demonstrate varying performance characteristics, highlighting regional differences in treatment approaches.15-17
Xiong et al (2025) evaluated 213 patients and reported superior discriminatory performance with an AUC of 0.74 (95% CI 0.67-0.81) compared to our AUC of 0.70. 17 Mean SITE scores in surgical patients were significantly lower in the Stanford cohort compared to our study (5.6 vs 7.9), indicating a more severely affected patient population, which was more likely to undergo surgery. In both studies, conservatively managed patients had significantly higher SITE scores (7.5 and 9.1 respectively), reflecting the expected pattern of less severe infections being less likely to require surgery. Using the established cutoff of 8 points for “severe” infection, they achieved a sensitivity of 81.3%, while our study demonstrated a sensitivity of 66.0%. The higher sensitivity in the Stanford cohort suggests a potentially lower threshold for surgical decision-making, while the philosophy in our clinics remains “conservative” whenever possible. The Quiceno et al. study from another US tertiary center (2025) included 194 patients over 19 years and reported the lowest discriminatory performance (AUC 0.66), inferior to both the Stanford study (0.74) and our European cohort (0.70). 16 Mean SITE scores showed less difference between surgical and conservative patients (7.2 vs 8.2) compared to other studies, potentially explaining the reduced discriminatory ability. Lastly, an Iranian validation study (2024) reported the highest discriminatory performance with an AUC of 0.86, which is higher than for all other external validation studies. 15 This variability in discriminatory performance across different centers and continents emphasizes the influence of institutional factors and regional treatment protocols on the practical utility of the SITE score.
The concordance rates also varied significantly across studies: our European cohort achieved 82.1% concordance, while the other studies reported variable concordance depending on the scoring methodology employed. These differences suggest that while the SITE score captures many important factors influencing treatment decisions, regional variations in clinical practice, healthcare systems, and treatment philosophies significantly impact its practical performance.
Comparison With Original SITE Score Development
Methodological differences between our retrospective validation and the original expert panel-based development may explain performance variations. Differences in data collection methodology could introduce scoring inconsistencies and affect the accuracy of component assessments. Additionally, institutional differences in treatment thresholds, surgical expertise, and multidisciplinary care protocols may significantly impact the applicability of any scoring system. Our institution’s collaborative approach between spine surgery and infectious disease services may result in treatment decisions that differ systematically from those at other centers, affecting score performance.37,38
The good concordance rate (82.1%) suggests the SITE score captures many important treatment factors, though clinically relevant elements like comprehensive comorbidity assessment, infection severity markers, and patient preference remain unaccounted for.18,21,22,39 In the category where surgery is recommended, only 66.0% of patients received surgery, indicating some clinical override of score recommendations. This pattern suggests that clinicians consider additional factors beyond those included in the SITE score when making treatment decisions. This finding is consistent across the North American external validation studies, suggesting that the current SITE score, while useful, does not capture all clinically relevant decision-making factors.16,17 The small sample in the conservative-recommended category (n = 2) limits interpretation, but both patients were successfully managed conservatively, consistent with score recommendations.
Strengths and Limitations
Our article represents an external validation of the SITE score, providing crucial evidence about its generalizability. The consecutive patient inclusion minimizes selection bias, and the comprehensive analysis of individual components offers insights into their relative clinical relevance. Our cohort’s demographic and clinical characteristics align well with published spinal infection epidemiology, supporting the validity of our findings.40,41 As the first European validation study, our work fills an important geographic gap in the external validation literature and demonstrates that the SITE score maintains acceptable performance across different healthcare systems.
The retrospective design introduces potential information bias and limits the quality of certain assessments.37,38 Missing data for key variables (particularly CCI and ASA classification) reduced analytical power. The single-center design, while providing consistency in treatment protocols, may limit generalizability to other healthcare settings. The limited sample size, particularly regarding certain subgroups, results in limited statistical power.
Clinical Implications and Recommendations
Based on our findings, we recommend using the SITE score as a useful clinical tool in decision support. The neurological component appears particularly valuable and should be weighted heavily in clinical decision-making. Clinicians should recognize that while most treatment decisions align with score recommendations, some cases may appropriately deviate based on factors not captured by the scoring system. The score demonstrates particular value as a structured framework for non-spine specialists to evaluate key parameters and determine when spine surgery consultation is warranted, even though a prior work from our group showed that “non spine surgery physicians” have difficulties with SITE score grading. 10 The moderate discriminatory performance (AUC 0.70) suggests clinical utility, especially when combined with clinical judgment. The SITE score provides a standardized approach to assessing neurological status, radiological findings, and clinical severity that can help identify high-risk patients requiring urgent evaluation by spine surgeons, while providing a common terminology for interdisciplinary communication. Large-scale prospective validation studies across multiple centers are needed to better understand the score’s performance across diverse clinical settings. Such studies should ideally include standardized training for score assessment and detailed documentation of factors influencing treatment decisions beyond those included in the current SITE score. Additionally, refinement of the scoring system to include more comprehensive comorbidity assessment and updated radiological criteria may further improve its predictive accuracy. This should also consider regional variations in treatment approaches and healthcare system factors that influence surgical decision-making in spinal infections.
Conclusion
This external validation study reveals that the SITE score demonstrated acceptable predictive accuracy for surgical decision-making in patients with DNSI, with moderate discriminatory performance (AUC 0.70) and strong overall concordance with actual clinical decisions (82.1%). The neurological component proved particularly valuable in guiding treatment selection. While the score serves as a useful structured clinical decision support tool, clinicians frequently override recommendations based on factors not captured in the original scoring system. These findings support broader implementation of the SITE score while highlighting the need for future refinement to incorporate additional decision factors relevant across different healthcare settings.
Footnotes
Author Contributions
Each author made a substantial contribution to this article. Conception and design: all authors. Acquisition of data: L.B. Statistical analysis: F.S. and M.S. Analysis and interpretation of data: L.B., F.S., M.S. Drafting the article: L.B., F.S., M.S. Reviewed submitted version of manuscript: all authors. Approved the final version of the manuscript: all authors. Administrative/technical/material support: all authors. Study supervision: F.S. and M.S.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
