External Validation of the SITE Score for De Novo Spinal Infections: A Retrospective Cohort Study

Abstract

Study Design

Retrospective Cohort Study.

Objective

De novo spinal infections (DNSI) require prompt treatment decisions, particularly regarding conservative versus surgical management. The Spinal Infection Treatment Evaluation (SITE) score was developed in 2022 to standardize treatment recommendations, but external validation remains limited.

Methods

In a retrospective, single-center study, we analyzed consecutive patients with DNSI treated between January 2016 and December 2022. We calculated SITE scores based on admission parameters and compared score-based treatment recommendations with actual clinical decisions. The primary outcome was concordance between SITE recommendations and treatment decisions. Secondary outcomes included individual component performance and discriminatory ability, as assessed using receiver operating characteristic analysis.

Results

We identified 95 patients with DNSI (mean age 66.4 years, 66.3% male). SITE scores ranged from 4-13 points. Treatment was concordant with SITE recommendations in 82.1% of cases. In multivariate analysis, neurological status (OR 0.48, P = 0.006) and host comorbidities (OR 0.34, P = 0.035) were significant predictors of surgical treatment. The complete SITE score demonstrated moderate ability (AUC 0.70) to discriminate between conservative and surgical treatment.

Conclusions

The SITE score demonstrated moderate predictive accuracy, with good treatment concordance (82.1%) and acceptable discriminatory performance (AUC 0.70). Its neurological component showed particular clinical value. However, clinicians frequently overrode score recommendations for reasons not captured by the current SITE scoring system.

Level of Evidence

IV/3.

Keywords

spinal infection; de novo; spine; discitis; epidural abscess; treatment algorithm

Highlights

• First European external validation of the SITE score demonstrates moderate predictive accuracy (AUC 0.70) and good treatment concordance (82.1%) for surgical decision-making in de novo spinal infections

• Neurological status emerges as the strongest predictor of surgical intervention (OR 0.48, P = 0.006), with all patients presenting acute plegia or bladder/bowel dysfunction undergoing surgery

• Clinicians frequently override SITE score recommendations based on factors not captured in the scoring system, indicating the need for refined assessment tools incorporating broader clinical parameters

Introduction

De novo spinal infections (DNSI) are increasingly prevalent globally and present potentially life-threatening conditions requiring prompt treatment decisions.^1-5 Mortality is influenced by comorbidities, immunodeficiency, and risk factors such as intravenous drug abuse (IVDA), alcohol abuse, and advanced age, with progression potentially leading to sepsis.⁶ Management typically involves multidisciplinary collaboration based on clinical assessment, laboratory testing, and imaging studies. Treatment focuses on infection control through pathogen identification via CT-guided biopsy, intraoperative sampling, and blood cultures to guide antibiotic therapy.⁷ The critical decision between conservative treatment (antibiotics and analgesics) vs surgical therapy (debridement, sampling, and instrumentation/fusion) remains challenging due to limited standardized guidelines.^8-10

To address this clinical uncertainty, various treatment algorithms and scoring systems have been established.^6,11-13 The Spinal Infection Treatment Evaluation (SITE) Score was developed and published in 2022, incorporating statistically significant parameters such as neurological status, infection location, radiological characteristics (including spinal deformity), pain levels, and host comorbidities (Figure 1A).¹³ While initial validation with 30 patients showed promising inter- and intra-observer reliability, subsequent external validation studies have yielded mixed results regarding the score’s generalizability across diverse patient populations and clinical settings. Two external validation studies from the United States and one from Asia have demonstrated varying performance characteristics, highlighting potential regional differences in treatment approaches.^14-17 However, European validation data remain lacking, representing a significant gap in understanding the SITE score’s applicability across different healthcare systems.

Figure 1.

SITE Score and distribution of the frequency of the SITE Score in the cohort. (A) Tabular representation of the composition of the SITE Score according to Pluemer et al. (left panel). (B) Histogram showing the frequency distribution of SITE Score values among the 95 patients in the study cohort, with observed scores spanning 4-13 points. Cases of patients with acute plegia or bladder/bowel dysfunction are marked in red (n = 5); all other cases are shown in blue (right panel).

Hence, this study aimed to provide the first European external validation of the SITE score by assessing its accuracy in predicting the need for surgical intervention in DNSI patients.

Methods

Study Design and Setting

This retrospective, single-center study was conducted at the Interdisciplinary Spine Center, HOCH Health Ostschweiz, Cantonal Hospital St. Gallen. We included consecutive patients treated for DNSI between January 2016 and December 2022 who had infectious spondylodiscitis, spondylitis, epidural abscess, or paraspinal abscess confirmed by microbiological culture or histopathological tissue analysis. Patients were identified through hospital database review and Diagnosis Related Groups (DRG) coding indicating spinal infection. Inclusion criteria were: (1) spinal infection confirmed by microbiological or histopathological analysis, (2) complete medical records, and (3) general consent for deidentified research. Exclusion criteria were: (1) spine surgery within 90 days before presentation, (2) non-infectious inflammatory, degenerative, or neoplastic spine conditions in the affected segment, and (3) incomplete data preventing SITE score calculation.

Data Collection

Medical records were systematically reviewed, and data from four timepoints were entered into a SecuTrial® database: hospital admission, surgery (if applicable), outpatient consultation approximately 90 days after discharge, and 12 months after discharge.

Variables collected included demographics (age, sex, body mass index (BMI), drug abuse), clinical indices (Charlson Comorbidity Index (CCI),¹⁸ Canadian Frailty Index (CFI),^19,20 ASA grade,²¹ modified McCormick grade^22,23), infection characteristics (anatomical region, segments affected, infection type, bacterial cultures, antibiotic resistance), laboratory values (CRP, leukocytes, thrombocytes, erythrocytes, hemoglobin, hematocrit, and eGFR), treatment details, complications, and outcome parameters including MacNab criteria.²⁴ Additional assessments included the ASIA Impairment Scale (AIS), neurological status and pain, imaging parameters, comorbidities, and presence of sepsis.²⁵

SITE Score Calculation

The SITE score evaluates spinal infections across five categories: neurological deficits (1-3 points), spinal location (1-4 points), radiological findings (1-5 points), pain severity (0-2 points), and host comorbidities (0-1 points). Total scores range from 3-15 points and stratify patients into three categories: severe (3-8 points, surgery recommended), moderate (9-12 points, surgery optional), and mild infections (13-15 points, conservative management recommended).¹³ In the development cohort, all patients presenting with acute plegia or bladder/bowel dysfunction underwent surgery. According to the authors, these findings therefore represent an absolute surgical indication regardless of other clinical parameters. For complete information on validation methods and practical clinical application, we refer to the original research published by Pluemer and colleagues in 2023.¹³ Our group recently evaluated the applicability and reliability of the SITE score among non-spine surgeons.¹⁰
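The scoring logic described above can be expressed as a small Python function. This is only an illustrative sketch. It assumes the component points have already been assigned from the original SITE table (Figure 1A); the mapping from clinical findings to points is not reproduced here. The names `SiteInput`, `site_total`, and `site_recommendation` are hypothetical and are not part of any published software. The plegia flag reflects the absolute surgical indication described by the original authors.

```python
from dataclasses import dataclass

# Point range (min, max) for each SITE component, as stated in the text.
COMPONENT_RANGES = {
    "neurology": (1, 3),
    "location": (1, 4),
    "radiology": (1, 5),
    "pain": (0, 2),
    "host": (0, 1),
}


@dataclass
class SiteInput:
    """Hypothetical container for component points already read off the SITE table."""
    neurology: int
    location: int
    radiology: int
    pain: int
    host: int
    acute_plegia_or_bladder_bowel: bool = False


def site_total(x: SiteInput) -> int:
    """Sum the five component points, rejecting values outside the stated ranges."""
    total = 0
    for name, (low, high) in COMPONENT_RANGES.items():
        points = getattr(x, name)
        if not low <= points <= high:
            raise ValueError(f"{name}={points} is outside {low}-{high}")
        total += points
    return total


def site_recommendation(x: SiteInput) -> str:
    """Return the SITE treatment category; acute plegia or bladder/bowel
    dysfunction is treated as an absolute surgical indication."""
    if x.acute_plegia_or_bladder_bowel:
        return "surgery recommended"
    total = site_total(x)
    if total <= 8:                     # severe infection, 3-8 points
        return "surgery recommended"
    if total <= 12:                    # moderate infection, 9-12 points
        return "surgery optional"
    return "conservative recommended"  # mild infection, 13-15 points


print(site_recommendation(SiteInput(neurology=3, location=2, radiology=3, pain=1, host=1)))
# surgery optional
```

For example, component points of 3, 2, 3, 1, and 1 sum to 10 points, which falls in the "surgery optional" band. Setting the plegia flag overrides any total.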

Treatment Decision Making

Treatment decisions for patients with DNSI at our institution involve interdisciplinary collaboration between the spine surgery and infectious disease services. Decisions are individualized based on neurological status, extent of infection, spinal instability, pain control, patient mobility, and comorbidities. For critically ill septic patients, treatment decisions weigh the risks of conservative management against those of surgical intervention.²⁶

Statistical Analysis

Statistical analysis was performed using Stata v18 SE (StataCorp LLC, College Station, TX). Patient characteristics, disease-specific variables, and treatment outcomes were described using frequencies for categorical variables and mean ± standard deviation for continuous variables. Associations between SITE score components and treatment decisions were examined using Pearson’s chi-square tests and logistic regression models. We evaluated the concordance between SITE score recommendations and actual treatment decisions. Mean SITE scores were compared between groups using Student’s t-test, to enable comparison with previous external validation studies that reported mean values. Score distributions were additionally compared using the Mann-Whitney U test. Discriminatory ability was assessed using receiver operating characteristic (ROC) curve analysis, with area under the curve (AUC) values of 0.7-0.8 considered acceptable, 0.8-0.9 excellent, and >0.9 outstanding.²⁷ To address the prediction problem posed by new-onset severe neurological deficits, we developed a hierarchical model. This model assigned a 99% predicted probability of surgery to all patients with acute plegia or bladder/bowel dysfunction and used the SITE score-based prediction for all other patients. P values <0.05 were considered statistically significant.
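The ROC and hierarchical-model logic can be made concrete with a short Python sketch. It computes a rank-based AUC, which equals the Mann-Whitney U statistic divided by the product of the two group sizes, and applies a hypothetical version of the hierarchical rule. This is not the Stata procedure used in the study. The intercept of 3.0 is an arbitrary placeholder, the slope is derived from the summary-score OR of 0.69 reported in the Results, and the six patients are invented toy data.

```python
import math
from itertools import product


def auc(pos_scores, neg_scores):
    """Rank-based AUC: probability that a randomly chosen surgical patient
    scores higher than a randomly chosen conservative patient (ties count
    one half). Equals Mann-Whitney U / (n_pos * n_neg)."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hierarchical_probability(site_score, plegia, intercept, slope):
    """Hypothetical hierarchical rule: fixed 0.99 for acute plegia or
    bladder/bowel dysfunction, otherwise a logistic prediction from the SITE
    score. intercept/slope are placeholders, not the study's fitted values;
    plegia cases rank first only if every logistic prediction stays below 0.99."""
    if plegia:
        return 0.99
    return 1.0 / (1.0 + math.exp(-(intercept + slope * site_score)))


def auc_label(value):
    """Descriptive AUC thresholds as quoted in the Methods."""
    if value > 0.9:
        return "outstanding"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    return "below acceptable"


# Toy, hypothetical patients: (SITE score, acute plegia or bladder/bowel dysfunction)
surgical = [(5, False), (7, False), (9, True)]
conservative = [(8, False), (10, False), (13, False)]

# Score alone: negate so that higher values mean "more likely surgery"
raw_auc = auc([-s for s, _ in surgical], [-s for s, _ in conservative])

slope = math.log(0.69)  # per-point OR of the summary SITE score (Results)
hier_auc = auc(
    [hierarchical_probability(s, p, 3.0, slope) for s, p in surgical],
    [hierarchical_probability(s, p, 3.0, slope) for s, p in conservative],
)
print(f"score-only AUC {raw_auc:.3f}, hierarchical AUC {hier_auc:.3f}")
# score-only AUC 0.889, hierarchical AUC 1.000
```

In the toy data, the one surgical patient with plegia has a relatively high SITE score (9). With the score alone, that patient is outranked by a conservatively treated patient, giving an AUC of 8/9. Forcing the plegia case to the top with the 0.99 rule lifts the AUC to 1.0, which illustrates in miniature the rationale for the hierarchical model.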

Ethical Considerations

The institutional review board of Eastern Switzerland approved the study (BASEC ID 2023-01343) with a waiver of informed consent owing to the retrospective design and use of anonymized data.

Results

Demographics and Clinical Characteristics

We identified 95 consecutive patients with DNSI treated at a Swiss spine center between January 2016 and December 2022 who met the inclusion criteria. The mean age was 66.4 ± 18.4 years, with a male predominance (66.3%). Most patients (66.3%) presented as emergency admissions. Notable risk factors included active smoking (23.2%) and abuse of other drugs (7.4%). Comorbidity burden varied considerably. About 18% had moderate or severe CCI scores and about 39% were classified as ASA III-IV, although data for both indices were missing in more than a third of patients. At presentation, 72.6% were neurologically intact according to the modified McCormick grade, while the remainder had varying degrees of motor and sensory deficits. Detailed demographic and clinical characteristics are presented in Table 1.

Table 1.

Basic Demographic Data of n = 95 Consecutive Patients, Presenting With De-Novo Spinal Infection. Results Are Displayed as Mean (Standard Deviation; Range) or Count (Percent)

Age, in years	66.4 (±18.4; 16 - 91)
Sex
Female	32 (33.7%)
Male	63 (66.3%)
BMI, in kg/m²	26.0 (±5.1; 16.9 - 48.9)
Drug abuse
Active smoking	22 (23.2%)
Active drinking*	16 (16.8%)
Abuse of other drugs**	7 (7.4%)
CCI category
Very low (0)	26 (27.4%)
Mild (1-2)	16 (16.8%)
Moderate (3-4)	11 (11.6%)
Severe (>4)	6 (6.3%)
Missing data	36 (37.9%)
CFI
Very fit or well	5 (5.3%)
Managing well or vulnerable	9 (9.6%)
Mildly, moderately or severely frail	6 (6.4%)
Very severely frail or terminally ill	0 (0.0%)
Missing data	74 (78.7%)
ASA grade
1 (no morbidity)	4 (4.2%)
2 (mild/moderate morbidity)	21 (22.1%)
3 (severe morbidity)	32 (33.7%)
4 (life threatening)	5 (5.3%)
Missing data	33 (34.7%)
Modified McCormick grade
1 (neurologically intact)	69 (72.6%)
2 (mild motor or sensory deficits, independent)	15 (15.8%)
3 (moderate deficit, independent with external aid)	6 (6.3%)
4 (severe motor or sensory deficit, care required)	4 (4.2%)
5 (paraplegic or quadriplegic)	1 (1.1%)
Type of hospital admission
Emergency	63 (66.3%)
Elective	32 (33.7%)
Total	n = 95 (100%)

Abbreviations: BMI, body mass index; CCI, Charlson comorbidity index; CFI, Canadian frailty index; CRP, C-reactive protein; GFR, Glomerular filtration rate.

*Daily consumption of at least 3 beverages.

**Includes intake of central nervous system (CNS) depressants, CNS stimulants, hallucinogens, dissociative anesthetics, narcotic analgesics, inhalants, cannabis.

Disease-Specific Characteristics

The lumbosacral spine was most frequently affected (62.1%), and the majority of infections spanned two segments (72.6%). Spondylodiscitis without an epidural or paravertebral abscess was present in 61.3% of cases, and spondylodiscitis with an abscess in 32.3%. Bacterial pathogens were identified in 57.9% of cases, with Staphylococcus species being most prevalent (24.2%). Surgical treatment was performed in 52.6% of patients. Multi-resistant organisms were detected in 6.3% of cases. Complete disease-specific data are provided in Table 2.

Table 2.

Disease-Specific Data of n = 95 Consecutive Patients With De-Novo Spinal Infection

Parameter	Result
Anatomical Distribution
(Most severely affected spinal region)
Cervical	9 (9.6%)
Thoracic	26 (27.4%)
Lumbo-sacral	59 (62.1%)
Infection Extent
(Infection spanning spinal segments)
One segment	5 (5.3%)
Two segments	69 (72.6%)
Three or more segments	21 (22.1%)
Infection Classification
Spondylodiscitis without abscess	57 (61.3%)
Spondylodiscitis with abscess	30 (32.3%)
Isolated Spondylitis	2 (2.2%)
Isolated Abscess	4 (4.3%)
Bacteria Identified in Culture
Staphylococcus spp.	23 (24.2%)
Escherichia spp.	6 (6.3%)
Streptococcus spp.	5 (5.3%)
Propionibacterium spp.	4 (4.2%)
Parvimonas spp.	3 (3.2%)
Pseudomonas spp.	2 (2.1%)
Enterobacter spp.	2 (2.1%)
Other bacteria*	7 (7.4%)
Mixed flora	3 (3.2%)
None identified	40 (42.1%)
Antibiotic Resistance
Multi-resistant organisms	6 (6.3%)
No resistance detected	89 (93.7%)
Laboratory Values at Admission
Leucocyte count (G/l)	12.5 ± 22.9 (4.1 - 228)
Thrombocyte count (G/l)	323.2 ± 129.8 (120 - 814)
Erythrocyte count (T/l)	3.9 ± 0.7 (2.3 - 5.5)
Hemoglobin (g/l)	113.4 ± 25.3 (67 - 265)
Hematocrit (%)	34.8 ± 5.7 (21.2 - 49.5)
CRP (mg/l)	111.5 ± 88.1 (3 - 405)
GFR (ml/min/1.73 m²)	68.5 ± 28.6 (5 - 118)
Surgical Treatment
Required**	50 (52.6%)
Not required	45 (47.4%)
Total	n = 95 (100%)

Abbreviations: CRP, C-reactive protein; GFR, Glomerular filtration rate; spp, species.

*Other bacteria include: Veillonella spp. (n = 1), Campylobacter spp. (n = 1), Citrobacter spp. (n = 1), Enterococcus spp. (n = 1), Haemophilus spp. (n = 1), Klebsiella spp. (n = 1), and Mycobacterium spp. (n = 1). **Among patients requiring surgery, instrumented spine surgery was performed in 18 patients (18.9% of the total cohort; 36.0% of surgically treated patients). Note. Results are displayed as mean ± standard deviation (range) or count (percent).

SITE Score Distribution and Component Analysis

SITE scores ranged from 4-13 points (median 8.0, mean 8.5 ± 1.9 points) (Figure 1B). Most patients were neurologically intact (70.5%) and presented with pain that still allowed ambulation (69.5%). Junctional spinal regions were the most common infection location (47.4%). Visible endplate erosion on imaging was the most common radiological finding (38.9%), and 30.5% had either IVDA or diabetes mellitus as host comorbidities (Table 3).

Table 3.

Distribution of the SITE Score Variables Among n = 95 Consecutive Patients, Presenting With De-Novo Spinal Infection. Results Are Displayed as Count (Percent). Significant values (p<0.05) are marked in bold.

SITE score item	Total	Surgery	No surgery	p-value
Neurology				0.033
Acute plegia OR bladder/bowel dysfunction	5 (5.3%)	5 (100.0%)	0 (0.0%)
Motor dysfunction	15 (15.8%)	11 (73.3%)	4 (26.7%)
Sensory dysfunction	8 (8.4%)	4 (50.0%)	4 (50.0%)
Neurologically intact	67 (70.5%)	30 (44.8%)	37 (55.2%)
Location				0.541
Junctional (C0-C2, C7-T2, T11-L1, L5-S1)	45 (47.4%)	24 (53.3%)	21 (46.7%)
Mobile (C3-6, L2-4)	33 (34.7%)	19 (57.6%)	14 (42.4%)
Semirigid (T3-10)	17 (17.9%)	7 (41.2%)	10 (58.8%)
Rigid (S2-5)	0 (0.0%)	0 (0.0%)	0 (0.0%)
Radiology				0.026
Spinal canal stenosis	26 (27.4%)	20 (76.9%)	6 (23.1%)
Segmental angulation or translation	15 (15.8%)	6 (40.0%)	9 (60.0%)
Visible endplate erosion on CT	37 (38.9%)	15 (40.5%)	22 (59.5%)
None of these radiological findings	17 (17.9%)	9 (52.9%)	8 (47.1%)
Pain				0.622
Standing axial pain OR inability to ambulate	15 (15.8%)	8 (53.3%)	7 (46.7%)
Other pain w/the ability to ambulate	66 (69.5%)	33 (50.0%)	33 (50.0%)
No pain	14 (14.7%)	9 (64.3%)	5 (35.7%)
Host Comorbidities				0.035
IVDA OR diabetes mellitus	29 (30.5%)	20 (69.0%)	9 (31.0%)
Other comorbidities OR no comorbidities	66 (69.5%)	30 (45.5%)	36 (54.5%)
Total	95 (100%)	50 (52.6%)	45 (47.4%)

The full descriptions of the radiology options are: (1) Spinal canal stenosis w/impingement of central neural elements w/or w/o de novo deformity, (2) Segmental angulation or translation w/de novo deformity/foraminal stenosis OR erosion of vertebral body on CT >50% OR PL involvement on both sides, (3) Visible endplate erosion on CT OR edema of vertebral body >50% on MRI OR intervertebral disc involvement on MRI OR PL involvement on 1 side, (4) None of these radiological findings

SITE Score Comparison Between Treatment Groups

Patients who underwent surgical treatment had significantly lower SITE scores compared to those managed conservatively. The surgical group had a mean SITE score of 7.9 ± 1.8 (median 8.0, IQR 6-9) vs 9.1 ± 1.9 (median 9.0, IQR 8-11) in the conservative group (P = 0.001). The Mann-Whitney U test confirmed this significant difference in SITE score distributions between treatment groups (P < 0.001).
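As a rough arithmetic check, Student’s t statistic can be recomputed from the summary statistics reported above. The published means and standard deviations are rounded, so the recomputed value is only approximate. The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's two-sample t statistic (equal variances assumed),
    computed from group means, standard deviations, and sizes."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / se, df


# Surgical group (7.9 ± 1.8, n = 50) vs conservative group (9.1 ± 1.9, n = 45)
t_stat, df = pooled_t(7.9, 1.8, 50, 9.1, 1.9, 45)
print(f"t = {t_stat:.2f}, df = {df}")  # t = 3.16, df = 93
```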

Predictors of Surgical Intervention

In univariate analysis, neurology (P = 0.033), radiology (P = 0.026), and host comorbidities (P = 0.035) were significantly associated with surgical treatment. Patients with acute plegia or bladder/bowel dysfunction had the highest surgical rate (100%). Patients with spinal canal stenosis (76.9%) or motor dysfunction (73.3%) also frequently underwent surgery. Among patients with IVDA or diabetes, 69.0% underwent surgery. The location (P = 0.541) and pain (P = 0.622) components were not significantly associated with the decision to operate (Table 4).

Table 4.

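Analysis of SITE Score Components and Correlation With Surgical Decision. Results are Displayed as Odds Ratio (OR) with 95% Confidence Interval (CI). For reference categories (Ref), the value shown is the odds of surgery within that category; values for the remaining categories are odds ratios relative to the reference. Significant values (p<0.05) are marked in bold.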

Component	Odds ratio (95% CI)	Surgical rate	p-value
Neurology			0.033 ^a
Neurologically intact (Ref)	0.81 (0.50-1.31)	44.78%	0.393^b
Sensory dysfunction	1.23 (0.28-5.35)	50.00%	0.779^b
Motor dysfunction	3.39 (0.98-11.74)	73.33%	0.054^b
Acute plegia/bladder dysfunction	−	100.00%	−^c
Location			0.541^a
Junctional (Ref)	1.14 (0.64-2.05)	53.33%	0.655^b
Mobile	1.19 (0.48-2.94)	57.58%	0.710^b
Semirigid	0.61 (0.20-1.90)	41.18%	0.395^b
Radiology			0.026 ^a
Spinal canal stenosis (Ref)	3.33 (1.34-8.30)	76.92%	0.010 ^b
Segmental angulation	0.20 (0.05-0.79)	40.00%	0.022 ^b
Visible endplate erosion	0.20 (0.07-0.63)	40.54%	0.006 ^b
None of these findings	0.34 (0.09-1.26)	52.94%	0.106^b
Pain			0.622^a
Standing Pain (Ref)	1.14 (0.41-3.15)	53.33%	0.796^b
Other Pain	0.88 (0.28-2.69)	50.00%	0.816^b
No Pain	1.58 (0.35-7.00)	64.29%	0.816^b
Host Comorbidities			0.035 ^a
IVDA/diabetes	2.22 (1.01-4.88)	68.97%	0.037 ^b

Abbreviations: IVDA, Intravenous Drug Abuse.

^aPearson’s chi-square test.

^bLogistic regression.

^cPerfect prediction (no statistical test performed).

Multivariate logistic regression confirmed neurology (OR 0.48, 95% CI 0.28-0.81, P = 0.006) and host comorbidities (OR 0.34, 95% CI 0.13-0.93, P = 0.035) as the only significant independent predictors of surgery. The location, radiology, and pain components were not significant predictors. The summary SITE score was significantly associated with surgical intervention (OR 0.69, 95% CI 0.54-0.88, P = 0.003), with higher SITE scores associated with a lower likelihood of surgery (Table 5). The multivariate model was statistically significant overall (LR χ²(5) = 16.64, P = 0.005) and had a pseudo R² of 0.127.
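The odds ratios above can be interpreted on the log-odds scale. The sketch below assumes Wald-type 95% confidence intervals that are symmetric on the log scale. Under that assumption, it recovers the standard error and an approximate two-sided P value from a reported OR and CI. It also shows how the per-point OR of the summary score compounds over a multi-point difference, assuming the log-odds of surgery are linear in the SITE score. The helper names are hypothetical.

```python
import math

Z_975 = 1.959964  # standard-normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """SE of log(OR), assuming a symmetric Wald 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def wald_p(odds_ratio, lower, upper):
    """Approximate two-sided Wald P value recovered from an OR and its 95% CI."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


def odds_multiplier(or_per_point, points):
    """Multiplicative change in the odds of surgery across a score difference,
    assuming the log-odds are linear in the SITE score."""
    return or_per_point ** points


p = wald_p(0.69, 0.54, 0.88)  # summary SITE score, Table 5
four_points = odds_multiplier(0.69, 4)
print(f"p ~ {p:.3f}; odds x {four_points:.2f} per 4 points")
# p ~ 0.003; odds x 0.23 per 4 points
```

For the summary score, the recovered P value rounds to 0.003, matching Table 5. A SITE score 4 points higher corresponds to roughly 0.23 times the odds of surgery (about 77% lower odds). This is a reduction in odds, not a 77% lower probability.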

Table 5.

Estimation of the Effect Size Between the Singular Items or the Summary Score of the SITE Score and the Likelihood of n = 95 Consecutive Patients, Presenting With De-Novo Spinal Infection, to Require Surgical Treatment. Results are Displayed as Odds Ratio (OR) with 95% Confidence Interval (CI). Significant values (p<0.05) are marked in bold.

SITE score item	OR	95% CI	p-value
Neurology	0.48	0.28 - 0.81	0.006
Location	0.82	0.44 - 1.49	0.509
Radiology	0.76	0.54 - 1.07	0.114
Pain	0.96	0.43 - 2.17	0.93
Host comorbidities	0.34	0.13 - 0.93	0.035
Summary SITE score	0.69	0.54 - 0.88	0.003

Abbreviation: SITE: Spinal Infection Treatment Evaluation.

Treatment Concordance and Score Performance

Overall treatment concordance with SITE score recommendations occurred in 82.1% of cases. In the surgery-recommended category, 66.0% received surgery; in the optional category, 39.5% received surgery; and 100% of patients (n = 2) in the conservative category were managed conservatively (Figures 2 and 3). The association between SITE score recommendations and actual treatment was statistically significant (χ² = 8.76, P = 0.012). Using the original SITE score cutoff of ≤8 points (including patients with acute plegia or bladder/bowel dysfunction as surgical candidates regardless of score), the scoring system demonstrated a sensitivity of 66.0% (33/50 surgical cases correctly identified).
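The concordance, sensitivity, and chi-square values above can be reproduced from a 3 × 2 table of recommendation category by treatment. The counts below are back-calculated from the reported percentages rather than taken from the study database. Because 33 of 50 surgical patients fell in the surgery-recommended category and 66.0% of that category received surgery, the category contained 50 patients. That leaves 17 of the 43 patients in the optional category, and 0 of the 2 in the conservative category, who underwent surgery. Counting either treatment as concordant in the optional band is an assumption, but it reproduces the reported 82.1%.

```python
# Counts per SITE category as (surgery, conservative), back-calculated from
# the reported percentages (not raw study data).
counts = {
    "surgery recommended": (33, 17),
    "surgery optional": (17, 26),
    "conservative recommended": (0, 2),
}


def concordance(table):
    """Share of patients treated in line with the recommendation; in the
    'surgery optional' band either treatment counts as concordant (assumption)."""
    agree = (
        table["surgery recommended"][0]
        + sum(table["surgery optional"])
        + table["conservative recommended"][1]
    )
    total = sum(s + c for s, c in table.values())
    return agree / total


def sensitivity(table):
    """Share of operated patients who fell in the surgery-recommended category."""
    operated = sum(s for s, _ in table.values())
    return table["surgery recommended"][0] / operated


def pearson_chi2(table):
    """Pearson chi-square statistic for an r x 2 table (df = r - 1)."""
    rows = list(table.values())
    n = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    chi2 = 0.0
    for r in rows:
        for j in range(2):
            expected = sum(r) * col_totals[j] / n
            chi2 += (r[j] - expected) ** 2 / expected
    return chi2


print(f"concordance {concordance(counts):.1%}")  # concordance 82.1%
print(f"sensitivity {sensitivity(counts):.1%}")  # sensitivity 66.0%
print(f"chi2 {pearson_chi2(counts):.2f}, df 2")  # chi2 8.76, df 2
```

Because the conservative-recommended row contains only two patients, two expected cell counts fall below 5. The chi-square approximation should therefore be read with caution.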

Figure 2.

Comparison of concordance with the SITE Score treatment recommendation by score value. Bar chart comparing the proportion of patients treated according to SITE Score recommendations at each score value, demonstrating variable adherence patterns.

Figure 3.

Comparison of surgical versus non-surgical treatment of the cohort based on the SITE Score recommendation. Bar chart showing the distribution of surgical (red bars) versus conservative (green bars) treatment within each SITE Score recommendation category (surgery recommended: 3-8 points or acute plegia or bladder/bowel dysfunction; optional: 9-12 points; conservative: 13-15 points).

Discriminatory Performance

ROC curve analysis of individual SITE score components yielded AUC values of 0.11-0.29 when each was assessed separately. After excluding cases with acute plegia or bladder/bowel dysfunction, the SITE score yielded an AUC of 0.67 (95% CI 0.56-0.78). Multivariate logistic regression incorporating all SITE score components yielded an AUC of 0.71 (95% CI 0.60-0.82). The hierarchical model demonstrated acceptable discriminatory performance, with an AUC of 0.70 (95% CI 0.60-0.81; Figure 4).

Figure 4.

ROC curve for the SITE Score predicting surgery. Left panel: when patients with acute plegia or bladder/bowel dysfunction are excluded, the receiver operating characteristic curve shows poor discriminatory ability of the SITE Score (AUC 0.67, 95% CI 0.56-0.78). Right panel: the hierarchical model incorporating acute plegia cases (assigned a 99% predicted probability of surgery) shows improved discriminatory performance (AUC 0.70, 95% CI 0.60-0.81).

Discussion

This external validation study shows that the SITE score had acceptable predictive accuracy for surgical decision-making in our cohort of 95 consecutive patients with DNSI. The overall concordance rate was 82.1%, and discriminatory ability was moderate (AUC 0.67 for the complete score excluding acute plegia cases and AUC 0.70 for the hierarchical model). These findings suggest that although the SITE score has some clinical utility, its performance may vary across clinical settings and patient populations.²⁷ The hierarchical approach assigned a 99% predicted probability to patients with acute plegia or bladder/bowel dysfunction and used the SITE score for the remaining patients. It improved discriminatory performance (AUC 0.70 vs 0.67), reaching the lower bound of the acceptable range. The discrepancy between our external validation results and the internally validated original study highlights the challenges of developing universally applicable clinical decision tools for complex conditions such as spinal infections. In these conditions, treatment decisions involve multiple interacting factors that may vary substantially between institutions and clinical contexts.^5,7,11,12

Performance of Individual SITE Score Components

The neurological component emerged as the most consistently significant predictor of surgical intervention (OR 0.48, 95% CI 0.28-0.81, P = 0.006). All patients with acute plegia or bladder/bowel dysfunction underwent surgery, reflecting the universal recognition of severe neurological compromise as a surgical emergency.^28-30 However, only 73.3% of patients with motor dysfunction received surgical treatment. This suggests that the severity and progression of neurological deficits, rather than their mere presence, may drive surgical decisions or the aggressiveness of treatment. This finding aligns with fundamental surgical principles and supports the inclusion of neurological status as a critical component of any treatment algorithm for spinal infections. The strong predictive value of this component suggests that it should carry substantial weight in clinical decision-making.^30,31

The host comorbidity component was significantly associated with surgical intervention (OR 0.34, 95% CI 0.13-0.93, P = 0.035), and patients with IVDA or diabetes mellitus underwent surgery more frequently (69.0% vs 45.5%). This finding aligns with clinical expectations. These high-risk patients may present with more aggressive infections requiring surgical drainage, or institutional protocols may favor early surgical debridement in immunocompromised patients to prevent treatment failure.^32-35 However, the comorbidity component is limited to IVDA and diabetes mellitus, and this narrow scope may not adequately capture a patient’s overall comorbidity burden. Other scoring systems such as the ASA classification²¹ or CCI,¹⁸ which we collected but which are not part of the original SITE score, might provide more comprehensive risk stratification.

Radiological findings were significant in univariate analysis (P = 0.026), but this association was not maintained in the multivariate model (P = 0.114), possibly reflecting confounding by other clinical factors. The subgroup analysis revealed important patterns. Patients with spinal canal stenosis had a relatively high surgical rate (76.9%), whereas those with segmental angulation (40.0%) or endplate erosion (40.5%) were less likely to undergo surgery. This pattern indicates that not all radiological abnormalities carry equal surgical weight. The current SITE score may not adequately differentiate between radiological findings that mandate immediate intervention and those that can be managed conservatively with close monitoring. Our results differ from those of the original internal validation cohort, which may also reflect differences in management strategies between Europe and the United States, particularly regarding instrumented fusion for suspected instability.³⁶

Neither spinal location nor pain severity was significantly associated with surgical decisions in our cohort, in contrast to the original SITE development study. This discrepancy may reflect differences in institutional treatment philosophies or patient selection criteria. The lack of significance for these components calls their universal applicability into question and suggests that local practice patterns may substantially influence their predictive value.

Comparison With External Validation Studies

Our study represents the first European external validation of the SITE score and adds geographic diversity to the validation literature. Three prior studies from the United States and Asia have shown varying performance characteristics, highlighting regional differences in treatment approaches.^15-17

Xiong et al (2025) evaluated 213 patients and reported higher discriminatory performance, with an AUC of 0.74 (95% CI 0.67-0.81), compared with our AUC of 0.70.¹⁷ Mean SITE scores in surgical patients were considerably lower in the Stanford cohort than in our study (5.6 vs 7.9). This indicates a more severely affected patient population that was more likely to undergo surgery. In both studies, conservatively managed patients had higher mean SITE scores (7.5 and 9.1, respectively), reflecting the expected pattern that less severe infections are less likely to require surgery. Using the established cutoff of 8 points for “severe” infection, Xiong et al achieved a sensitivity of 81.3%, whereas our study showed a sensitivity of 66.0%. The higher sensitivity in the Stanford cohort suggests a potentially lower threshold for surgical decision-making, whereas the philosophy at our institution remains “conservative” whenever possible. The study by Quiceno et al (2025) from another US tertiary center included 194 patients over 19 years and reported the lowest discriminatory performance (AUC 0.66), below both the Stanford study (0.74) and our European cohort (0.70).¹⁶ In that study, mean SITE scores differed less between surgical and conservative patients (7.2 vs 8.2) than in the other studies, which may explain the reduced discriminatory ability. Lastly, an Iranian validation study (2024) reported the highest discriminatory performance of all external validation studies, with an AUC of 0.86.¹⁵ This variability across centers and continents emphasizes the influence of institutional factors and regional treatment protocols on the practical utility of the SITE score.

Concordance rates also varied considerably across studies. Our European cohort achieved 82.1% concordance, whereas the other studies reported variable concordance depending on the scoring methodology employed. These differences suggest that the SITE score captures many important factors influencing treatment decisions, but that regional variations in clinical practice, healthcare systems, and treatment philosophies substantially affect its practical performance.

Comparison With Original SITE Score Development

Methodological differences between our retrospective validation and the original expert panel-based development may explain the variations in performance. Differences in data collection methodology could introduce scoring inconsistencies and affect the accuracy of component assessments. In addition, institutional differences in treatment thresholds, surgical expertise, and multidisciplinary care protocols may substantially affect the applicability of any scoring system. Our institution’s collaborative approach between spine surgery and infectious disease services may result in treatment decisions that differ systematically from those at other centers, affecting score performance.^37,38

The good concordance rate (82.1%) suggests that the SITE score captures many important treatment factors. However, clinically relevant elements such as comprehensive comorbidity assessment, infection severity markers, and patient preference remain unaccounted for.^18,21,22,39 In the surgery-recommended category, only 66.0% of patients received surgery, meaning clinicians overrode the score recommendation in about one third of these cases. This pattern suggests that clinicians consider factors beyond those included in the SITE score when making treatment decisions. The same pattern was seen in the North American external validation studies, suggesting that the current SITE score, while useful, does not capture all clinically relevant decision-making factors.^16,17 The small sample in the conservative-recommended category (n = 2) limits interpretation, but both patients were successfully managed conservatively, consistent with the score recommendation.

Strengths and Limitations

Our article represents an external validation of the SITE score, providing important evidence about its generalizability. Consecutive patient inclusion minimizes selection bias, and the analysis of individual components offers insight into their relative clinical relevance. Our cohort’s demographic and clinical characteristics align well with published spinal infection epidemiology, supporting the validity of our findings.^40,41 As the first European validation study, our work fills an important geographic gap in the external validation literature and suggests that the SITE score retains acceptable performance in a different healthcare system.

The retrospective design introduces potential information bias and limits the quality of certain assessments.^37,38 Missing data for key variables (particularly the CCI and ASA classification) reduced analytical power. The single-center design, while providing consistency in treatment protocols, may limit generalizability to other healthcare settings. The small sample size, particularly in certain subgroups, limits statistical power.

Clinical Implications and Recommendations

Based on our findings, we recommend the SITE score as a clinical decision-support tool. The neurological component appears particularly valuable and should be weighted heavily in clinical decision-making. Clinicians should recognize that, although most treatment decisions align with score recommendations, some cases may appropriately deviate on the basis of factors not captured by the scoring system. The score appears particularly valuable as a structured framework that helps non-spine specialists evaluate key parameters and decide when spine surgery consultation is warranted, although previous work from our group showed that physicians who are not spine surgeons have difficulty grading the SITE score.^10 The moderate discriminatory performance (AUC 0.70) suggests clinical utility, especially when the score is combined with clinical judgment. The SITE score provides a standardized approach to assessing neurological status, radiological findings, and clinical severity that can help identify high-risk patients requiring urgent evaluation by spine surgeons, while also providing a common terminology for interdisciplinary communication.

Large-scale prospective multicenter validation studies are needed to better understand the score’s performance across diverse clinical settings. Such studies should ideally include standardized training for score assessment and detailed documentation of factors influencing treatment decisions beyond those included in the current SITE score. In addition, refining the scoring system to include a more comprehensive comorbidity assessment and updated radiological criteria may further improve its predictive accuracy. Such refinement should also account for regional variations in treatment approaches and for healthcare system factors that influence surgical decision-making in spinal infections.

Conclusion

This external validation study shows that the SITE score has moderate predictive accuracy for surgical decision-making in patients with DNSI, with moderate discriminatory performance (AUC 0.70) and good overall concordance with actual clinical decisions (82.1%). The neurological component proved particularly valuable in guiding treatment selection. Although the score serves as a useful structured clinical decision-support tool, clinicians frequently overrode its recommendations based on factors not captured in the original scoring system. These findings support broader implementation of the SITE score while highlighting the need for future refinement to incorporate additional decision factors relevant across different healthcare settings.

Footnotes

ORCID iD

Felix C. Stengel

Ethical Considerations

The institutional review board of Eastern Switzerland approved the study (BASEC ID 2023-01343) with a waiver of informed consent owing to the retrospective design and use of anonymized data.

Author Contributions

Each author made a substantial contribution to this article. Conception and design: all authors. Acquisition of data: L.B. Statistical analysis: F.S. and M.S. Analysis and interpretation of data: L.B., F.S., M.S. Drafting the article: L.B., F.S., M.S. Reviewed submitted version of manuscript: all authors. Approved the final version of the manuscript: all authors. Administrative/technical/material support: all authors. Study supervision: F.S. and M.S.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

References

1. Lang, Walter, Schindler, et al. The epidemiology of spondylodiscitis in Germany: a descriptive report of incidence rates, pathogens, in-hospital mortality, and hospital stays between 2010 and 2020. J Clin Med. 2023;12(10):3373. doi:10.3390/jcm12103373

2. Thavarajasingam, Subbiah Ponniah, Philipps, et al. Increasing incidence of spondylodiscitis in England: an analysis of the National Health Service (NHS) hospital episode statistics from 2012 to 2021. Brain Spine. 2023;3:101733. doi:10.1016/j.bas.2023.101733

3. Conan, Laurent, Belin, et al. Large increase of vertebral osteomyelitis in France: a 2010-2019 cross-sectional study. Epidemiol Infect. 2021;149:e227.

4. Kramer, Thavarajasingam, Neuhoff, et al. Epidemiological trends of pyogenic spondylodiscitis in Germany: an EANS spine section study. Sci Rep. 2023;13(1):20225. doi:10.1038/s41598-023-47341-z

5. Blecher, Frieler, Qutteineh, et al. Who needs surgical stabilization for pyogenic spondylodiscitis? Retrospective analysis of non-surgically treated patients. Glob Spine J. 2021;13(6):1550-1557. doi:10.1177/21925682211039498

6. Lener, Wipplinger, Lang, Hartmann, Abramovic, Thomé. A scoring system for the preoperative evaluation of prognosis in spinal infection: the MSI-20 score. Spine J. 2022;22(5):827-834. doi:10.1016/j.spinee.2021.12.015

7. Lener, Hartmann, Barbagallo GMV, Certo, Thomé, Tschugg. Management of spinal infection: a review of the literature. Acta Neurochir. 2018;160(3):487-496. doi:10.1007/s00701-018-3467-2

8. Lacasse, Derolez, Bonnet, et al. 2022 SPILF - clinical practice guidelines for the diagnosis and treatment of disco-vertebral infection in adults. Infect Dis Now. 2023;53(3):104647. doi:10.1016/j.idnow.2023.01.007

9. Berbari, Kanj, Kowalski, et al. 2015 Infectious Diseases Society of America (IDSA) clinical guidelines for the diagnosis and treatment of native vertebral osteomyelitis in adults. Clin Infect Dis. 2015;61(6):e26-e46. doi:10.1093/cid/civ482

10. Kramer, Stienen, Martens, Stengel, Motov. Evaluation of the SITE score for de-novo spinal infection patients in clinical practice - a case-based approach. Brain Spine. 2025;5:104228. doi:10.1016/j.bas.2025.104228

11. Thavarajasingam, Vemulapalli, Vishnu, et al. Conservative versus early surgical treatment in the management of pyogenic spondylodiscitis: a systematic review and meta-analysis. Sci Rep. 2023;13(1):15647. doi:10.1038/s41598-023-41381-1

12. Duarte, Vaccaro. Spinal infection: state of the art and management algorithm. Eur Spine J. 2013;22(12):2787-2799. doi:10.1007/s00586-013-2850-1

13. Pluemer, Freyvert, Pratt, et al. A novel scoring system concept for de novo spinal infection treatment, the spinal infection treatment evaluation score (SITE score): a proof-of-concept study. J Neurosurg Spine. 2023;38(3):396-404. doi:10.3171/2022.11.SPINE22719

14. Pluemer, Freyvert, Pratt, et al. Ongoing decision-making dilemma for treatment of de novo spinal infections: a comparison of the spinal infection treatment evaluation score with the spinal instability spondylodiscitis score and spine instability neoplastic score. J Neurosurg Spine. 2024;41(2):273-282. doi:10.3171/2024.2.SPINE23664

15. Rezvani, Ahmadvand, Yazdanian, Azimi, Askariardehjani. Value of spinal infection treatment evaluation score, Pola classification, and Brighton spondylodiscitis score from decision to surgery in patients with spondylodiscitis: a receiver-operating characteristic curve analysis. Asian Spine J. 2024;18(2):218-226. doi:10.31616/asj.2023.0317

16. Quiceno, Soliman MAR, Khan AMA, et al. External validation of the spinal infection treatment evaluation score: a single-center 19-year review of de novo spinal infections. J Neurosurg Spine. 2025;42(3):374-384. doi:10.3171/2024.7.SPINE24394

17. Xiong, Huang, Narayanan, et al. External performance of the spinal infection treatment evaluation (SITE) score and spinal instability spondylodiscitis score (SISS) in predicting operative intervention for de novo spinal infections. Spine J. 2025;25(9):2061-2070. doi:10.1016/j.spinee.2025.03.006

18. Charlson, Pompei, Ales, MacKenzie. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987;40(5):373-383. doi:10.1016/0021-9681(87)90171-8

19. Rockwood, Song, MacKnight, et al. A global clinical measure of fitness and frailty in elderly people. CMAJ. 2005;173(5):489-495. doi:10.1503/cmaj.050051

20. Yildiz, Motov, Stengel, et al. Influence of frailty on clinical and radiological outcomes in patients undergoing transforaminal lumbar interbody fusion-analysis of a controlled cohort of 408 patients. J Clin Med. 2025;14(6):1814. doi:10.3390/jcm14061814

21. Statement on ASA physical status classification system. https://www.asahq.org/standards-and-practice-parameters/statement-on-asa-physical-status-classification-system. Published January 18, 2025. Accessed January 18, 2025.

22. Manzano, Green, Vanni, Levi. Contemporary management of adult intramedullary spinal tumors-pathology and neurological outcomes related to surgical resection. Spinal Cord. 2008;46(8):540-546. doi:10.1038/sc.2008.51

23. Staartjes, Joswig, Corniola, Schaller, Gautschi, Stienen. Association of medical comorbidities with objective functional impairment in lumbar degenerative disc disease. Glob Spine J. 2022;12(6):1184-1191. doi:10.1177/2192568220979120

24. Neidre, MacNab. Anomalies of the lumbosacral nerve roots. Review of 16 cases and classification. Spine. 1983;8(3):294-299. doi:10.1097/00007632-198304000-00010

25. Rupp, Biering-Sørensen, Burns, et al. International standards for neurological classification of spinal cord injury: revised 2019. Top Spinal Cord Inj Rehabil. 2021;27(2):1-22. doi:10.46292/sci2702-1

26. Kramer, Thavarajasingam, Neuhoff, et al. Management of severe pyogenic spinal infections: the 2SICK study by the EANS spine section. Spine J. 2025;25(5):876-885. doi:10.1016/j.spinee.2024.12.018

27. Metz. Basic principles of ROC analysis. Semin Nucl Med. 1978;8(4):283-298. doi:10.1016/S0001-2998(78)80014-2

28. Keil, Akbar, Abel. Querschnittlähmung bei septischen Erkrankungen der Wirbelsäule [Paraplegia in septic diseases of the spine]. Orthopade. 2005;34(2):113-114, 116-119. doi:10.1007/s00132-004-0753-x

29. Moreno-González, Ibarra. The critical management of spinal cord injury: a narrative review. Clin Pract. 2024;15(1):2. doi:10.3390/clinpract15010002

30. Turgut. Complete recovery of acute paraplegia due to pyogenic thoracic spondylodiscitis with an epidural abscess. Acta Neurochir. 2008;150(4):381-386. doi:10.1007/s00701-007-1485-6

31. Placide, Reznicek. Evaluation and management of pyogenic spondylodiscitis: a review. J Clin Med. 2025;14(10):3477. doi:10.3390/jcm14103477

32. Ziu, Dengler, Cordell, Bartanusz. Diagnosis and management of primary pyogenic spinal infections in intravenous recreational drug users. Neurosurg Focus. 2014;37(2):E3. doi:10.3171/2014.6.FOCUS14148

33. Wang, Lenehan, Itshayek, et al. Primary pyogenic infection of the spine in intravenous drug users: a prospective observational study. Spine. 2012;37(8):685-692.

34. Isobe, Utsugi, Ohyama, et al. Recurrent pyogenic vertebral osteomyelitis associated with type 2 diabetes mellitus. J Int Med Res. 2001;29(5):445-450. doi:10.1177/147323000102900511

35. Management and outcome of whole-spine epidural abscesses – institutional case series and systematic review. https://www.sciencedirect.com/science/article/pii/S277252942500116X. Published June 25, 2025. Accessed June 25, 2025.

36. An international comparison of back surgery rates. Spine. LWW. Published June 25, 2025. Accessed June 25, 2025.

37. Wang, Kattan. Cohort studies: design, analysis, and reporting. Chest. 2020;158(1S):S72-S78. doi:10.1016/j.chest.2020.03.014

38. Sauerland, Lefering, Neugebauer EAM. Retrospective clinical studies in surgery: potentials and pitfalls. J Hand Surg Br. 2002;27(2):117-121. doi:10.1054/jhsb.2001.0703

39. Medical Research Council. Aids to the examination of the peripheral nervous system. https://www.ukri.org/wp-content/uploads/2021/12/MRC-011221-AidsToTheExaminationOfThePeripheralNervousSystem.pdf. Accessed June 9, 2025.

40. Tsantes, Papadopoulos, Vrioni, et al. Spinal infections: an update. Microorganisms. 2020;8(4):476. doi:10.3390/microorganisms8040476

41. Kim T-H. The epidemiology of concurrent infection in patients with pyogenic spine infection and its association with early mortality: a nationwide cohort study based on 10,695 patients. J Infect Public Health. 2023;16(6):981-988. doi:10.1016/j.jiph.2023.04.010