Abstract
The debate over the effects of the timing of surgical spinal decompression after traumatic spinal cord injury (tSCI) has remained unresolved for over a century. The aim of the current study was to perform a systematic review and quality-adjusted meta-analysis of studies evaluating the effects of the timing of spinal surgery after tSCI. Studies were identified through the MEDLINE® database (1966 to August 2012) and a 15-item, tailored scoring system was used for assessing the included studies' susceptibility to bias. Random effects and quality effects meta-analyses were performed. Models were tested for robustness using one-way and criterion-based sensitivity analysis and funnel plots. Results are presented as weighted mean differences (WMDs) and odds ratios (ORs) with 95% confidence intervals (CIs). A total of 18 studies were analyzed. Heterogeneity was evident among the studies included. Quality effects models showed that – when compared with “late” surgery – “early” spinal surgery was significantly associated with a higher total motor score improvement (WMD: 5.94 points, 95% CI: 0.74, 11.15) in seven studies, a higher neurological improvement rate (OR: 2.23, 95% CI: 1.35, 3.67) in six studies, and a shorter length of hospital stay (WMD: −9.98 days, 95% CI: −13.10, −6.85) in six studies. However, one-way and criterion-based sensitivity analyses demonstrated a profound lack of robustness among pooled estimates. Funnel plots showed significant evidence of publication bias. In conclusion, despite the fact that “early” spinal surgery was significantly associated with improved neurological and length of stay outcomes, the evidence supporting “early” spinal surgery after tSCI lacks robustness as a result of different sources of heterogeneity within and between original studies. Where the conduct of a surgical, randomized controlled trial seems to be an unfeasible undertaking in acute tSCI, methodological safeguards require the utmost attention in future cohort studies.
(Prospero registration number: PROSPERO CRD42012003182. See also
Introduction
Advocates for early spinal decompressions refer to the concept of primary versus secondary mechanisms of injury. 2 The primary mechanism comprises the initial cord lesion that results from physical trauma to the tissue caused by a displacement of surrounding spinal structures. The primary mechanism in turn initiates a cascade of secondary injury mechanisms including ischemia, edema, increased excitatory amino acids, and lipid peroxidation. 2 Pre-clinical data support the theory that persistent compression of the spinal cord represents a cause of secondary injury and, therefore, may be potentially reversible. 3,4 Despite positive effects of acute spinal decompressions reported in these standardized pre-clinical studies, no neurological benefits have consistently been reported in the human, clinical setting.
A number of systematic reviews examining the timing of spinal surgery after tSCI have been published over the last decade. 5 –10 A profound limitation of all of these reviews is that non-comparative case series were also included. Furthermore, the meta-analysis published by La Rosa and colleagues 10 was severely limited by applying a single, nonspecific pooled outcome measure, namely the “neurological improvement rate.” Finally, the impact of original studies' variable susceptibility to bias on pooled outcome estimates has not been considered in previous reviews.
The aim of the current study was to perform a systematic review and meta-analysis of the effects of the timing of spinal surgery after tSCI that differs from previous reviews in two ways. First, we critically appraised the reporting of key methodological safeguards reducing the susceptibility to bias in studies comparing outcomes in patients who underwent either “early” or “late” spinal surgery. Second, based on the methodological quality score of each individual study, we subsequently performed a quality effects (QE) meta-analysis.
Methods
Data sources
In order to identify relevant articles on the effects of the timing of spinal surgery after tSCI, we conducted a computerized search using the MEDLINE® (1966 to August 2012) database. The search terms used in the PubMed interface are presented in Supplementary Appendix 1 (see online supplementary material at
Eligibility criteria
The initial screening for eligibility included the following two criteria. The study had to consist of clearly defined cohorts of subjects who had undergone spinal surgery after tSCI. In addition, comparative outcomes of “early” and “late,” or delayed, spinal surgery had to be reported. Both randomized and nonrandomized comparative studies were eligible for inclusion. All neurological levels and all severities of tSCI were included. All approaches for spinal surgery were eligible for inclusion, including anterior, posterior, and circumferential open approaches, as well as closed manual reductions. Studies combining patients with non-tSCIs with patients with tSCIs were excluded. Studies that included patients<14 years of age were also excluded. Articles published in languages other than English and those without an abstract were excluded. Titles and abstracts of retrieved articles were screened, and potentially relevant full-text articles and reports were retrieved and evaluated individually and independently by two reviewers (J.J.v.M. and A.J.F.H.), who were trained and experienced in performing systematic literature searches. In each phase, and in all cases, disagreement concerning inclusion of articles was resolved by discussion and consensus agreement. Details of the literature search and selection process are outlined in the flow chart in Figure 1.

Figure 1. Literature search and selection process. (*References that fit into more than one exclusion criterion were categorized in a single category at the abstractor's discretion; †Fourteen of the 18 studies [78%] included more than one outcome measure suitable for meta-analysis.)
Data abstraction and quality assessment
Critical appraisal of included articles was performed to examine studies' susceptibility to bias. A standard data extraction form summarizing the study design, study population, and relevant raw data was completed for each article. Because a standardized assessment for the methodological quality of both surgical RCTs and non-RCTs does not currently exist, a tailored bias assessment scoring instrument was devised based on reworded methodological items covered by the following three consensus-derived “checklists” for reporting randomized and nonrandomized clinical studies: the Consolidated Standards of Reporting Trials of Nonpharmacologic Treatment (CONSORT-NPT) checklist, 12 the criteria list for methodological quality assessment introduced by the Cochrane Collaboration Back Review Group for Spinal Disorders, 13 and the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement. 14
A 15-item, tailored scoring system was devised for the assessment of comparative surgical studies in spinal trauma (Table 1). The minimum and maximum quality scores are 0 and 25 points, respectively. Included items cover key methodological safeguards that reduce studies' susceptibility to bias. Because assessment of the quality of a study is closely intertwined with the quality of reporting, 15,16 the devised quality scoring system required each item to be both reported and indicative of a reduced susceptibility to bias. The scoring system is applicable to both randomized and nonrandomized controlled comparative surgical studies. Guidance for scoring individual items is presented in an “Explanation and Elaboration document,” which can be found in Supplementary Appendix 2 (see online supplementary material at
An explanation and elaboration of these items can be found in Supplementary Appendix 2 (see online supplementary material at
The total quality score ranges from 0 to 25 points.
SCI: spinal cord injury.
Outcome measures
Upon completion of the critical appraisal, outcome measures suitable for meta-analysis were identified. The primary outcome measure was neurological improvement, measured either via 1) the mean difference in Total Motor Score as defined by the International Standards for Neurological Classification of Spinal Cord Injury, 17,18 or 2) the “neurological improvement rate” odds ratio (OR), assessed using different scales, including the American Spinal Injury Association (ASIA) Impairment Scale 17,18 and the Frankel scale. 19 Secondary outcomes included the mean difference in length of hospital stay, the mortality OR, and the adverse events OR.
Statistical analysis
All evaluated and reported effects of the timing of spinal surgery after tSCI were abstracted and assessed for eligibility for meta-analysis ( Fig. 1). Outcomes' point estimates and measures of variability (where applicable) had to be reported in at least three studies in order to be considered for meta-analysis. For continuous outcomes, for example, the Total Motor Score, 17,18 we calculated the weighted mean difference (WMD), that is, the weighted average of the differences in the individual studies, the weight being the individual inverse variances (i.e., precision) for each study adjusted for heterogeneity (i.e., the variability in study effect sizes beyond that caused by random error alone). This adjustment was based on assessed study deficiencies with the QE model and simply on statistical variability of effect sizes across studies with the random effects (RE) model. Similarly, for binary outcomes, for example, occurrence of nonspecific neurological improvements and adverse events, we calculated the OR, again computing a weighted average across studies as described. The advantage of the QE meta-analysis 20,21 is that it allows for redistribution of individual studies' inverse variance weights based on assessed study deficiencies rather than the usual random redistribution of weights seen with the RE model. 22 The derived quality score of each individual study was normalized by dividing by the maximum possible quality score and indicated as the quality index (Qi, range: 0–1). Subsequently, Qi values were entered in the quality effects model as described by Doi et al. 20,21,23,24
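The weighting described above can be illustrated with a minimal Python sketch. It covers only the normalization of a raw quality score into Qi and generic inverse-variance pooling of mean differences, where a between-study variance of zero gives a fixed-effect estimate and a positive value gives random effects weights. The function names and the input values are hypothetical, and the quality-based weight redistribution of the QE model described by Doi et al. is deliberately not reproduced here.

```python
import math

MAX_QUALITY_SCORE = 25  # maximum of the 15-item scoring system


def quality_index(score, max_score=MAX_QUALITY_SCORE):
    """Normalize a raw quality score to the quality index Qi (range 0-1)."""
    if not 0 <= score <= max_score:
        raise ValueError("score outside the allowed range")
    return score / max_score


def pooled_wmd(mean_diffs, variances, tau2=0.0):
    """Inverse-variance weighted mean difference with a 95% CI.

    tau2 = 0 gives the fixed-effect estimate; tau2 > 0 adds the
    between-study variance to each study's variance, which is how a
    random effects model flattens the weights.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, mean_diffs)) / total
    se = math.sqrt(1.0 / total)
    return estimate, estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical study-level mean differences and variances (not study data).
diffs = [8.0, 2.0, 6.0]
variances = [4.0, 1.0, 9.0]
print(quality_index(5))                     # a raw score of 5 out of 25
print(pooled_wmd(diffs, variances))         # fixed-effect weights
print(pooled_wmd(diffs, variances, 4.0))    # assumed between-study variance
```

Raising the assumed between-study variance pulls the weights toward equality, which is the "random redistribution of weights" the RE model is criticized for in the paragraph above.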
Statistical heterogeneity was assessed using the I² and τ² statistics. Heterogeneity was regarded as substantial when the I² statistic exceeded 30% and τ² was >0. To assess the robustness of the meta-analysis, we performed one-way and “criterion-based” sensitivity analyses. One-way sensitivity analysis was conducted by excluding individual studies sequentially from the meta-analyses. A criterion-based sensitivity analysis looks at the pooled results after altering inclusion criteria for meta-analysis. Studies were selected and grouped using the following predefined inclusion criteria: 1) SCI severity, 2) level of SCI, and 3) maximum delay for “early” treatment. The outcomes of a meta-analysis can be considered robust when sensitivity analyses in subsets of studies show similar pooled effect sizes.
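As an illustration of these heterogeneity statistics and of a one-way sensitivity analysis, the following Python sketch computes Cochran's Q, a DerSimonian-Laird estimate of τ², and I², and then re-pools the data with each study left out in turn. The DerSimonian-Laird estimator and the function names are assumptions made for this sketch; the software used in the study may compute these quantities differently.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, DerSimonian-Laird tau^2, and I^2 (percent)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, tau2, i2


def random_effects(effects, variances):
    """Pooled estimate with weights 1 / (variance + tau^2)."""
    _, tau2, _ = heterogeneity(effects, variances)
    w = [1.0 / (v + tau2) for v in variances]
    return sum(wi * e for wi, e in zip(w, effects)) / sum(w)


def leave_one_out(effects, variances):
    """One-way sensitivity analysis: re-pool after dropping each study."""
    pooled = []
    for i in range(len(effects)):
        rest_e = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        pooled.append(random_effects(rest_e, rest_v))
    return pooled


# Hypothetical effect sizes and variances (not study data).
effects = [8.0, 2.0, 6.0, 5.0]
variances = [4.0, 1.0, 9.0, 2.0]
print(heterogeneity(effects, variances))
print(leave_one_out(effects, variances))
```

A wide spread among the leave-one-out estimates is the kind of result that would be read as a lack of robustness.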
To test for small-study effects, 25 or potential publication bias, the symmetry of funnel plots was assessed visually and statistically with Egger's linear regression test. Although other possible causes of funnel plot asymmetry are appreciated, 26 publication bias was suspected if the intercept of Egger's regression line deviated from 0 with a p value of <0·1. 27
All statistical analyses were performed using MetaXL version 1.3 available from
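Egger's regression test can be sketched as an ordinary least-squares regression of each study's standardized effect (effect divided by its standard error) on its precision (the reciprocal of the standard error); the intercept estimates funnel plot asymmetry. The Python sketch below is a generic illustration with a hypothetical function name and made-up inputs, not the MetaXL implementation.

```python
import math


def egger_intercept(effects, standard_errors):
    """Return (intercept, se_of_intercept, t_statistic) for Egger's test."""
    n = len(effects)
    if n < 3:
        raise ValueError("at least three studies are required")
    x = [1.0 / se for se in standard_errors]                 # precision
    y = [e / se for e, se in zip(effects, standard_errors)]  # standardized effect
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else math.inf
    return intercept, se_intercept, t_stat


# Hypothetical effects and standard errors (not study data).
effects = [9.0, 6.5, 5.0, 4.2, 3.9]
ses = [4.0, 3.0, 2.0, 1.5, 1.0]
print(egger_intercept(effects, ses))
# A two-sided p value follows from a t distribution with n - 2 degrees of
# freedom, for example 2 * scipy.stats.t.sf(abs(t_stat), n - 2) if SciPy
# is available.
```

In this parameterization, an intercept near zero corresponds to a symmetric funnel plot, while a positive intercept indicates that smaller, less precise studies report larger effects.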
Results
Search and screening results
The computerized search strategy resulted in 2589 citations (Fig. 1). After screening of titles and abstracts, the 42 remaining potentially eligible articles were obtained for full-text screening. Twenty-one articles were excluded after a more detailed evaluation. The majority of the excluded articles did not evaluate the effect of the timing of spinal surgery specifically, or compared operative and nonoperative treatments. Cross-referencing resulted in one additional study. Twenty-two studies 29 –50 reported on the effects of the timing of spinal surgery after tSCI, of which 18 studies reported usable information for meta-analysis.
Study characteristics and critical appraisal
Included studies consisted of 1 randomized controlled trial (RCT) (5%), 1 quasi-RCT (5%), 4 prospective cohort studies (18%), and 16 retrospective comparative cohort studies (73%) (Table 2). Early treatment was defined as spinal surgery performed within 24 and 72 h after injury in ten (46%) and six (27%) studies, respectively. The maximum delay for “late” spinal surgery was 1 week in one study 36 (5%), 2–8 weeks in four studies 31,39,43,47 (18%), and 6–12 months in two studies 33,40 (9%). A majority of 15 studies (68%) did not report the maximum delay for surgery.
A slash (‘/’) denotes that details for both early and late groups were presented. Values in parentheses are ranges of mean values, when reported. b The timing of surgical spinal decompression was processed as a continuous measure in this study. c Various subgroup analyses were presented in this article. However, as no details of the subgroups were presented, these analyses were not included in the current review.
RCS, retrospective cohort study; PCS, prospective cohort study; RCT, randomized controlled trial; h: hour(s); T, tetraplegia; P, paraplegia; I, incomplete; C, complete; NEU, neurological; LOS, length of stay; Qi, “quality” index value (see Supplementary Appendix 3), NR, not reported; NA, not applicable.
Applied as a rough indicator for the level of SCI, half of the studies (11) included only tetraplegics, whereas 5 studies (23%) included only paraplegics. Thirteen studies (59%) included both complete and incomplete SCI subjects, and seven studies (32%) included only incomplete SCI subjects. Follow-up was notably short for studies focusing on length of stay outcomes with a minimum reported mean duration of 10 days for early treated paraplegics in one study. 38 Studies that focused on neurological outcomes reported longer follow-up periods ranging from 6 months to 9 years. Only four studies (18%) reported sufficient details on point estimates and measures of variability of the latest follow-up visits. 30,33,39,40
As indicated by the low Qi values presented in Table 2, included studies demonstrated a high susceptibility to bias. A complete set of assigned quality scores is presented in detail in Supplementary Appendix 3 (see online supplementary material at
Quantitative synthesis
Neurological outcomes were reported in 19 studies 29,31 –33,35 –37,39 –50 (86%), length of stay related outcomes were reported in 12 studies 30,31,34,37 –39,41,42,45,47,48,50 (55%), financial outcomes were reported in 2 studies 34,41 (9%), bladder function outcomes were reported in 2 studies 33,37 (9%), and the Functional Independence Measure 51 and blood loss 44 were each reported in 1 study (5%) (Table 2).
Neurological outcomes
Nine studies 29,32,33,37,41,42,45,48,50 (41%) reported the Total Motor Score, six studies 31,36,39,44,49,50 (27%) reported the neurological improvement rate, four studies 31,36,41,50 (18%) reported the ASIA Impairment Scale, and four studies 42,46 –48 (18%) reported the Frankel scale. In addition, a variety of neurological aggregate scores, sensory scores, and level of injury measures were each reported in one or two studies. 35,37,39,41,45 The Total Motor Score and the neurological improvement rate outcomes were suitable measures for meta-analysis.
The mean Total Motor Score improvement was specified for both “early” and “late” groups in seven studies (32%), comprising 815 pooled subjects. Studies were considered heterogeneous (τ²>0; I²=52%; p=0·05), and the QE model was used to pool the data, with the RE model being used for standardized comparison. The studies' individual and pooled WMDs and 95% CIs obtained with the QE model are shown in Figure 2. The pooled WMD of the Total Motor Score between “early” and “late” surgical spinal decompression was 5·94 (95% CI: 0·74, 11·15), indicating that, on average, subjects who undergo early spinal surgery gain approximately six motor score points more than subjects who undergo late treatment (Table 3). The RE model showed a slightly smaller pooled WMD of 4·73 (95% CI: −0·13, 9·59) points, with 0 now being included in the CI. For details see Supplementary Appendix 4 (see online supplementary material at

Figure 2. Forest plot of the Total Motor Score weighted mean differences (WMDs) in individual studies and pooled estimate using the quality effects model.
Values in parentheses are 95% confidence intervals.
WMD, weighted mean differences; OR, odds ratio; QE, quality effects; QE-1WS, quality effects one-way sensitivity analysis; RE, random effects; Early +, exclusion of one study resulting in a pooled effect most favoring “early” spinal surgery, Late +, exclusion of one study resulting in a pooled effect most favoring “late” spinal surgery.
The neurological improvement rate was specified in six studies (27%), comprising 495 pooled subjects. As a measure of improvement, two studies applied one ASIA Impairment Scale grade, 36,50 one study applied one or more ASIA Impairment Scale grades, 31 one study applied one or more Frankel scale grades, 49 and two studies applied “any” 39 or “good” 44 improvement. Three studies either performed two subgroup analyses 39,44 or included two “late” comparison groups, 49 resulting in a total of nine comparisons for meta-analysis. The ORs were considered heterogeneous (τ²>0; I²=15%; p=0·31). The pooled OR for neurological improvement was 2·23 (95% CI: 1·35, 3·67), indicating approximately twofold higher odds of improvement for subjects undergoing “early” intervention (Table 3). The RE model again showed a slightly lower pooled OR of 1·74 (95% CI: 1·04, 2·91). Details of both models are presented in Supplementary Appendix 5 (see online supplementary material at
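For readers less familiar with odds ratios, the following Python sketch shows how a single study's OR and 95% CI can be derived from a 2×2 table with the standard log-odds (Woolf) method. The function name and cell counts are hypothetical and serve only to illustrate the calculation; they are not taken from any included study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and CI from a 2x2 table.

    a: improved, early surgery      b: not improved, early surgery
    c: improved, late surgery       d: not improved, late surgery
    """
    if min(a, b, c, d) <= 0:
        # Zero cells would need a continuity correction, not handled here.
        raise ValueError("all cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts (not study data).
print(odds_ratio_ci(30, 20, 20, 30))
```

Because the interval is built on the log scale, the lower and upper limits are equidistant from the point estimate in log units, so the ratio of the OR to its lower limit matches the ratio of the upper limit to the OR.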
Length of stay outcomes
Twelve studies 30,31,34,37 –39,41,42,45,47,48,50 (55%) reported the length of stay in hospital and/or rehabilitation center, six studies 31,34,37,38,42,47 (27%) reported the length of stay in the intensive care unit, and three studies 34,38,42 (14%) reported the duration of mechanical ventilation. The length of stay in hospital was the only suitable measure for meta-analysis.
The mean length of hospital stay was specified in six studies (27%), comprising 1103 pooled subjects. Studies were considered heterogeneous (τ²>0; I²=71%; p<0·01). The pooled WMD of the length of hospital stay was −9·98 (95% CI: −13·10, −6·85), indicating that on average, subjects who undergo early spinal surgery spend 10 days less in hospital than do subjects who undergo late treatment (Table 3 and Fig. 3). The RE model showed a smaller pooled WMD of −8·51 (95% CI: −12·78, −4·25). For details see Supplementary Appendix 6 (see online supplementary material at

Figure 3. Forest plot of the length of hospital stay weighted mean differences (WMDs, in days) in individual studies and pooled estimate using the quality effects model.
Mortality
Out of the 10 studies 29,31,34,36,38,39,46 –49 (46%) that reported on mortality outcomes, 9 were suitable for meta-analysis. Five studies reported two comparisons: two studies evaluated two subgroups, 38,39 two studies included two “late” comparison groups, 29,49 and one study evaluated short- and long-term follow-up. 36 This resulted in a total of 14 comparisons in the QE model, comprising 1148 pooled subjects. The ORs were considered homogeneous (τ²=0; I²=0%; p=0·64). The pooled OR for mortality was 0·97 (95% CI: 0·40, 2·31), indicating no differences in mortality risks between subjects undergoing “early” and “late” spinal surgery (Table 3 and Fig. 4). The RE model showed a similar pooled OR of 0·70 (95% CI: 0·35, 1·41). For details, see Supplementary Appendix 7 (see online supplementary material at

Figure 4. Forest plot of mortality odds ratios (ORs) in individual studies and pooled estimate using the quality effects model.
Other adverse events
Of the 19 studies (86%) reporting on adverse events, 5 studies specified all reported adverse events in both groups (see Supplementary Appendix 3). However, 12 studies (55%) reported at least one adverse event in both groups and, therefore, were suitable for meta-analysis. Twenty-eight comparisons were reported in the following (post-hoc devised) subgroups: neurological deterioration (five studies, 23%), 36,37,42,45,49 thromboembolic events (two studies, 9%), 36,41 (other) pulmonary events (four studies, 18%), 31,34,38,41 sepsis (three studies, 14%), 31,36,49 wound and skin events (three studies, 14%), 33,36,41 number of patients with adverse events (two studies, 9%), 39,47 and various other adverse events – six in total – in four studies. 33,36,37,41 Outcomes of analyses on all adverse events combined are presented in Table 3.
No statistically significant differences were found for the subgroups using the QE model. However, the RE model demonstrated a statistically significant pooled effect (OR: 0·67, 95% CI: 0·51, 0·89) for pulmonary events, indicating that subjects who underwent “late” surgical intervention had a higher risk of developing pulmonary complications, for example, pneumonia and atelectasis, than did “early” treated subjects. However, caution is required in interpretation of such random effects results, particularly when considering that no significant effects were observed in the quality effects model. Details of both models are presented in Supplementary Appendix 8 (see online supplementary material at
Sensitivity analysis and publication bias
One-way sensitivity analyses, with each study individually removed, were performed for all QE models (Table 3). The pooled estimate for Total Motor Score improvement varied considerably after exclusion of a single study, with the resulting pooled effects most (8·16, 95% CI: 1·61, 14·71) 41 and least (3·62, 95% CI: −1·15, 8·40) 42 favoring “early” spinal surgery. Similar effect modifications were seen for the neurological improvement rate OR analyses. Summary tables with data from the one-way sensitivity analyses are presented in Supplementary Appendix 9 (see online supplementary material at
Results of the “criterion-based” sensitivity analyses are presented in Table 4. Pooled estimates for the Total Motor Score suggested smaller improvements for studies that included incomplete SCI subjects only, included both tetraplegics and paraplegics, and – unexpectedly – considered “early” surgery as being within 24 h instead of later (Table 4). Summary estimates for the neurological improvement rate suggested smaller improvements for studies that included incomplete SCI subjects or tetraplegics only. Pooled estimates for length of hospital stay outcomes suggested smaller differences for studies that included incomplete SCI subjects only. Mortality sensitivity analyses demonstrated substantially dispersed pooled risks for all criteria. The summary risk estimate for adverse events was sensitive to change when tested for severity of injury (incomplete SCI: higher risk) and timing of “early” surgery (≤24 h: lower risk, Table 4).
Values in parentheses are 95% confidence intervals.
Median Quality index score=0·20; b Two studies did not report the severity of the injury, see Table 2.
SCI, spinal cord injury; WMD, weighted mean difference; OR, odds ratio.
Presence of small-study effects was assessed by evaluating funnel plots for symmetry both visually and statistically. A possibility of publication bias was found for the following outcomes: Total Motor Score (intercept: 1·85, 95% CI: 0·66, 3·03; p=0·01), “neurological improvement rate” (intercept: 0·9, 95% CI: −0·05, 1·85; p=0·06), and length of hospital stay (intercept: −2·53, 95% CI: −4·76, −0·29; p=0·04) (Fig. 5a–c). The funnel plots clearly illustrate that for all three outcomes it was the studies with smaller samples that favored “early” spinal surgery.

Figure 5. Funnel plots for Total Motor Score
Discussion
This meta-analysis demonstrated that patients who underwent “early” spinal surgery after tSCI had significantly greater neurological recovery and significantly shorter hospital admissions than did patients who underwent “late” spinal surgery. However, this study also demonstrated that these findings are far from robust, as estimates dispersed substantially in both one-way and criterion-based sensitivity analyses, and funnel plots demonstrated a high likelihood of publication bias. Two major sources of heterogeneity were identified: 1) patients with a variety of SCI severities and levels of injury were included in original studies and 2) included studies demonstrated a high susceptibility to various sources of bias.
To our knowledge, this is the first literature review to present and integrate both qualitative and quantitative data demonstrating a profound heterogeneity among original studies reporting on the effects of the timing of spinal surgery after tSCI. Although basic methodological limitations have been addressed in previous reviews and meta-analyses, 5 –10 no study has yet quantified the potential impact of studies' susceptibility to bias on pooled treatment effect estimates. Previous reviews have addressed the methodological limitations of original studies, generally using an almost instinctive, standard closing sentence referring to “the need for randomized controlled trials.” However, clinical equipoise – defined as a state of genuine uncertainty on the part of the clinical investigator regarding the comparative therapeutic merits of each arm in a trial 52 – has lost ground, as a recent survey indicated an outspoken preference for “early” spinal decompression among spine surgeons. 53 This essential aspect, combined with a number of other feasibility issues, 54 makes conducting a properly powered, multicenter, surgical RCT on tSCI patients a very challenging, if not unfeasible, undertaking. 54 The current study indicates that much more can – and needs to be – done in addressing other sources of heterogeneity than confounding by indication alone.
The first identified source of heterogeneity relates to the variety of SCI severities and levels of injury included in original studies. SCI is a complex condition that presents with markedly heterogeneous manifestations related to the severity and anatomic level of injury. 55 Moreover, SCI patients may also present with concomitant injuries and pre-existent comorbidities. 56,57 Together with age, 58 these patient-related factors have been demonstrated to be strongly related to patient outcomes after tSCI. 58 –60 In the current review, we found that 1) not a single study reported a complete overview of patients' age, SCI level, SCI severity, concomitant injuries, and comorbidities; 2) only 2 of the 14 studies that included more than one type of severity and/or level of SCI stratified subjects accordingly; 38,39 and 3) only 4 of the 21 nonrandomized studies (19%) applied statistical techniques to adjust evaluated outcome measures for confounding. 35,36,40,50
Although beyond the scope of this review, it is worth mentioning several countermeasures to address observed confounders in nonrandomized comparative studies, including matching, stratification, and multivariate regression techniques. 61 Statistical techniques to adjust for unobserved, or unconsidered, confounders have been documented, albeit such adjustments require a well-defined high-volume cohort of patients. 62 Finally, propensity scores can be used to balance the covariates in the two study groups, and thus reduce the magnitude of selection bias in nonrandomized studies. 63
The second identified source of heterogeneity relates to the individual studies' susceptibility to bias other than confounding by indication. This was assessed using a tailored scoring system devised based on items covered by three consensus-derived “checklists.” 12 –14 Where these checklists mainly focus on the detail of reporting, the devised scoring system was primarily intended as a tool to assess individual, surgical studies' susceptibility to bias, or methodological quality. Some authors state that a study's true susceptibility to bias can only be assessed through its surrogate measure: the quality of reporting. 64 Although this argument makes sense from a non-reporting perspective, it does not hold from a “do-reporting” perspective. To illustrate, author group “A” reports that the “treatment allocation was random, and double-blind” and author group “B” reports that “treatment allocation was neither random, nor blind.” Both reports score a high “quality of reporting” on this item; however, only group “A”'s report scores a high “methodological quality.” Nonetheless, most of the methodological safeguards assessed in the current review were not reported in original studies, and, therefore, may have resulted in an overestimation of included studies' true susceptibility to bias.
The specific sources of individual studies' susceptibility to bias are outlined in Supplementary Appendix 3. Although the six prospective studies tended to present higher Qi values, the single included RCT 48 did not reach the highest quality score. Whereas random treatment allocation is a key methodological safeguard, other sources of heterogeneity than confounding by indication alone may have an equally striking impact on studies' validity. No study performed and reported a sample size calculation, and only two studies (9%) did report a primary outcome measure required for determining a sample size. Whereas the manifestations of SCI are highly variable, even more so are its recovery patterns. Sample size calculations are of crucial importance in addressing the impact of this disorder-specific variability on the outcomes of therapeutic studies. 59 In a recent protocol publication by the Prospective, Observational European Multicenter study on the efficacy of acute surgical decompression after traumatic Spinal Cord Injury (SCI-POEM) investigators, a stratified sample size calculation (by ASIA Impairment Scale grades) was reported and based on reference data from a large European cohort of SCI patients. 54,58
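To make the role of a sample size calculation concrete, the following Python sketch applies the standard normal-approximation formula for comparing two means, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/Δ². This is a generic, unstratified illustration and not the stratified calculation reported by the SCI-POEM investigators; the function name, the assumed difference of 5 motor score points, and the assumed standard deviation of 15 points are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sd, alpha=0.05, power=0.80):
    """Subjects per group needed to detect a mean difference `delta`.

    Uses the normal approximation for a two-sided test comparing two
    independent group means with a common standard deviation `sd`.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n = 2.0 * (sd * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)


# Assumed values for illustration only: 5-point difference, SD of 15 points.
print(sample_size_per_group(delta=5.0, sd=15.0))
```

Because the required sample size grows with the square of σ/Δ, the highly variable recovery patterns noted above translate directly into larger cohorts.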
The heterogeneity found in the current review may raise the question of whether the eligibility criteria that were applied were too loose. Tables 2 and 4 perfectly demonstrate, however, that if we had applied more strict eligibility criteria for the sake of homogeneity, only a few studies would have been included, and the generalizability of our findings would have been limited. Discussed sources of heterogeneity and susceptibility to bias had a profound impact on the quantitative analyses presented in the current review. However, the slight differences between the pooled effects and variance reported in the RE and QE models may suggest otherwise. The explanation for this is straightforward: as the Qi median value of 0·20 indicates, a large majority of studies were susceptible to multiple sources of bias. As such, quality discounting was not limited to a few studies and resulted in an inevitable “quality homogeneity paradox.”
Although acknowledging the inherent limitations of applied sensitivity and small-study effects analyses, several findings merit further consideration. One-way sensitivity analyses indicated that the pooled effects of both the Total Motor Score improvement and the neurological improvement rate dispersed substantially after excluding one single study from the analysis. In addition, criterion-based sensitivity analyses demonstrated that none of the pooled neurological outcome estimates were robust enough after stratifying by each of the predefined criteria. These sensitivity analyses clearly demonstrate the lack of robustness in these data and, therefore, do not validate the statistically significant outcomes of the current meta-analysis. No significant differences in mortality were observed between the “early” and “late” treatment groups. In the light of potential selection bias, one might expect to see a higher mortality rate in the “late” group. However, as only a few studies reported the required information to explore this hypothesis, no conclusive findings can be drawn from the current meta-analysis. Finally, results from the funnel plots and related Egger regression coefficients cannot be mistaken; there is a clear preponderance of smaller studies reporting effects favoring “early” spinal surgery. Without getting into a “chicken and egg” conundrum, the likelihood of observed publication bias coincides well with recently reported spine surgeons' outspoken preference for “early” spinal decompression after tSCI. 53
No consensus exists as to what constitutes a clinically meaningful neurological improvement. Whereas an improvement of five Motor Score points may seem trivial to clinicians and researchers, minimal neurological improvements could result in profound functional benefits, particularly in tetraplegic patients. However, only three studies related the clinical meaningfulness of observed neurological improvements to functional outcomes, that is, bladder function 33,37 and the Functional Independence Measure. 51
As illustrated by the variety of definitions applied for “early” spinal surgery in the included studies, there is no consensus on the threshold value distinguishing “early” from “late” surgery. More than 100 years after Burrell's problem definition, 1 it is still unclear what the therapeutic window of opportunity after tSCI is, and both clinical and pre-clinical scientists face a similar dilemma. 65 The criterion-based sensitivity analyses demonstrated a counterintuitively greater neurological benefit in studies considering “early” surgery as within 72 h instead of within 24 h. This raises another discussion about the limited reliability of very early neurological examinations. 66 Others have stated, however, that when no concomitant injuries are present and full cooperation can be obtained, the timing of the examination itself does not affect the examination's precision. 67 Given that timing of surgery was considered as a “continuous” factor in the retrospective study reported by Newton et al. 43 (Qi: 0·20), it could not be included in the current analysis. However, using receiver operating characteristics (ROC) analysis, these authors determined that the optimal cutoff time for surgical reduction based on the outcome “complete neurological recovery after tSCI” was ≤4 h after injury for all patients as well as for patients completely paralyzed on admission. 43 Once confirmed in future studies, this finding could have far-reaching implications for pre-hospital management and logistics.
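How an ROC-based cutoff can be derived from timing treated as a continuous variable is sketched below. The sketch assumes the Youden index (sensitivity + specificity − 1) as the optimality criterion, which is one common choice and is not necessarily the criterion used by Newton et al.; the function name and the data are hypothetical.

```python
def youden_cutoff(times_h, recovered):
    """Find the time threshold c that maximises Youden's J for the rule
    'treated within c hours predicts recovery'.

    times_h:   time from injury to treatment, in hours
    recovered: 1 if the outcome occurred, 0 otherwise
    Returns (cutoff_hours, J).
    """
    pos = sum(1 for r in recovered if r)
    neg = len(recovered) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for c in sorted(set(times_h)):
        tp = sum(1 for t, r in zip(times_h, recovered) if r and t <= c)
        fp = sum(1 for t, r in zip(times_h, recovered) if not r and t <= c)
        sensitivity = tp / pos
        specificity = 1.0 - fp / neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[1]:
            best = (c, j)
    return best

if __name__ == "__main__":
    # Hypothetical patients, for illustration only.
    times_h = [1, 2, 3, 5, 6, 8]
    recovered = [1, 1, 1, 0, 0, 0]
    print(youden_cutoff(times_h, recovered))
```

A cutoff obtained this way depends on the sample from which it is derived, which is one reason confirmation in independent cohorts is needed before it informs practice.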
The variety of approaches for spinal surgery that has been applied within and between original studies has frequently been underappreciated in previous literature reviews. The current review demonstrated that the surgical approaches applied in both the “early” and “late” groups were reported and comparable in no more than two studies (9%). 31,39 In the remainder of the studies, it was not clear whether study groups had a comparable number of patients subjected to closed, anterior, posterior, or circumferential surgical approaches. In fact, 10 studies (46%) explicitly reported the term “decompression,” or “reduction,” as part of the intervention performed in all subjects. Fewer studies reported details on more specific surgical techniques believed to play a role in the decompression of the spinal canal, including laminectomies and discectomies. Only three studies (14%) indicated having performed postsurgical imaging (magnetic resonance imaging in one study, 36 computed tomography in two studies 31,33 ) for confirming a successful decompression of the spinal cord, although no criteria for successful decompression were documented. Whereas pre-clinical scientists pay utmost attention to the standardization of injury models, 68 the current review clearly uncovered clinical researchers' limited attention to technical, qualitative, and quantitative aspects of the actual surgical decompression of the spinal cord in their final study reports.
Strengths of our study include independent screening and data abstraction by two reviewers, the critical appraisal of studies' susceptibility to bias, the inclusion of a variety of clinically relevant outcomes, the statistical techniques applied for assessing bias in this review, and reporting according to the PRISMA statement's recommendations. 26 Nonetheless, this review has several potential limitations. Although the tailored bias assessment scoring instrument was based on items from established consensus-derived “checklists,” 12 –14 no validation study has demonstrated its psychometric properties. Pooling of outcome measures, including the neurological improvement rate, may have resulted in additional “analytical heterogeneity.” Including two “late” surgery comparison groups in analyses led to doubling of “early” intervention group data and may, therefore, have resulted in distorted outcomes. Sensitivity analyses may have limited specificity, as the predefined criteria were not adjusted. We did not perform a meta-regression analysis, as the number of included studies was too small and the overall quality of included studies was too low. 26 Finally, studies that evaluated the effect of timing of surgical spinal decompression as a continuous variable could not be included in the meta-analysis.
Conclusion
In conclusion, despite the fact that “early” spinal surgery was significantly associated with improved neurological and length of stay outcomes, none of these positive associations were robust, as demonstrated through sensitivity analyses. Moreover, funnel plots showed significant proof of publication bias. Therefore, the authors' verdict is that because of different sources of heterogeneity within and between original studies, there is no robust evidence supporting “early” spinal surgery after tSCI. Given that properly powered RCTs have proven to be an almost unfeasible study design for acute, surgical tSCI management, we discussed several approaches that can be implemented in future nonrandomized studies to diminish their susceptibility to bias. By endorsing such a high-quality standard, we may well find a definitive answer to the hypotheses postulated by Burrell more than a century ago.
Acknowledgments
We thank Michael Schuetz for his contributions during the early phases of this research project. Drs. van Middendorp and Doi contributed to the design of this research project. Drs. van Middendorp and Hosman conducted the search and abstracted the data. Dr. van Middendorp performed the statistical analysis. Dr. Doi assisted in data abstraction and statistical analysis. Dr. van Middendorp drafted the manuscript. All authors critically revised the manuscript and approved its final version.
Author Disclosure Statement
No competing financial interests exist.
