Comparative Diagnostic Performance of Hounsfield Units, Vertebral Bone Quality, and Endplate Bone Quality for Predicting Cage Subsidence After Lumbar Fusion Surgery: A Systematic Review and Meta-Analysis

Abstract

Study Design

Systematic Review and Meta-analysis.

Objective

Poor preoperative bone quality is a key risk factor for postoperative cage subsidence (CS) following degenerative lumbar fusion surgery. Computed tomography (CT)-based Hounsfield unit (HU) values, and magnetic resonance imaging (MRI)-based vertebral bone quality (VBQ) and endplate bone quality (EBQ) scores, are reliable tools for assessing bone quality. This study is the first meta-analysis comparing the predictive value of HU, VBQ, and EBQ for postoperative CS.

Methods

A comprehensive literature search was conducted in PubMed and other databases up to April 5, 2025. The QUADAS-2 tool was used to evaluate the quality of the included studies. Pooled sensitivity, specificity, and hierarchical summary receiver operating characteristic (HSROC) curves were calculated, and subgroup analyses and meta-regression were performed to identify sources of heterogeneity.

Results

Twenty studies involving a total of 2648 patients were included. The quality of these studies was relatively low. The areas under the HSROC curves for HU value, VBQ, and EBQ were 0.78 (95% CI, 0.74-0.81), 0.86 (95% CI, 0.83-0.89), and 0.83 (95% CI, 0.79-0.86), respectively. The pooled sensitivities were 0.84, 0.82, and 0.80, while the pooled specificities were all 0.76. The corresponding diagnostic odds ratios (DORs) were 16.34 (95% CI, 7.43-35.93), 14.67 (95% CI, 10.51-20.48), and 12.24 (95% CI, 5.63-26.61), respectively.

Conclusion

The HU value, VBQ, and EBQ all demonstrate relatively high efficacy in predicting CS, with the VBQ showing a modest advantage. Collectively, these indicators can provide valuable information for preoperative risk stratification and individualized surgical decision-making.

Keywords

vertebral bone quality, endplate bone quality, Hounsfield unit, cage subsidence, magnetic resonance imaging

Introduction

Lumbar fusion is an established treatment for degenerative lumbar spine diseases, demonstrating clear efficacy in relieving neural compression and restoring spinal stability.^1,2 However, postoperative complications related to the implant remain common, with cage subsidence (CS) being one of the most frequent.^3,4 CS not only compromises implant stability but may also result in fusion failure, recurrent intervertebral stenosis, and poorer clinical outcomes.³ Previous studies have identified poor preoperative bone quality as a significant risk factor for CS, with the prevalence of osteopenia/osteoporosis reaching up to 65.2% in patients undergoing lumbar spine surgery.⁵ Therefore, accurate preoperative assessment of bone quality is crucial for optimizing surgical planning and minimizing postoperative complications.⁶

In recent years, computed tomography (CT)-derived Hounsfield unit (HU) values have emerged as a potential tool for evaluating bone quality.⁷ HU values have shown a strong correlation with both dual-energy X-ray absorptiometry (DXA) T-scores and bone mineral density (BMD) measured by quantitative CT (QCT).^8-10 In addition, the magnetic resonance imaging (MRI)-based vertebral bone quality (VBQ) score indirectly reflects bone quality by quantifying fat infiltration within the vertebral body and has demonstrated an accuracy of up to 81% in predicting osteopenia/osteoporosis.¹¹ The endplate bone quality (EBQ) score, by contrast, focuses on the signal characteristics of the endplate at the fusion interface, offering a more localized assessment of bone quality at the site of implant insertion and potentially providing a more direct correlation with the risk of CS.¹² Several studies have preliminarily explored the predictive value of HU values, VBQ scores, and EBQ scores for CS following lumbar spine surgery.^12-14

However, the diagnostic performance and relative advantages of HU values, VBQ scores, and EBQ scores in predicting CS have not been systematically summarized. Therefore, this study aims to compare the accuracy of the aforementioned three indicators in predicting postoperative CS in lumbar spine surgery, thereby providing a basis for selecting appropriate imaging indicators for preoperative assessment.

Materials and Methods

This study was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines¹⁵ and has been registered in PROSPERO (CRD420251160603).

Search Strategy and Screening

We conducted a comprehensive literature search across PubMed, Embase, Web of Science, Cochrane Library, and the China National Knowledge Infrastructure (CNKI), without language restrictions, with a cut-off date of April 5, 2025. The search aimed to identify original studies evaluating the utility of HU values, VBQ, and EBQ in predicting CS following degenerative lumbar spine surgery. The search strategy was based on the following keywords: “vertebral bone quality,” “Hounsfield unit,” “endplate bone quality,” “cage subsidence,” “magnetic resonance imaging,” and “lumbar vertebrae.”

Inclusion and Exclusion Criteria

After removing duplicates, two authors independently screened the titles and abstracts of all retrieved studies. Full texts of potentially eligible articles were then assessed for inclusion. Discrepancies were resolved through discussion and consensus. Studies were included if they investigated the use of HU values, VBQ scores, or EBQ scores in relation to CS following lumbar spine surgery. The exclusion criteria were: (1) studies not related to postoperative CS prediction using HU values, VBQ scores, or EBQ scores; (2) studies using enhanced CT or non–T1-weighted MRI for HU, VBQ, or EBQ measurements; (3) studies focused on the cervical or thoracic spine; and (4) studies with incomplete outcome data or review articles.

HU Value, VBQ and EBQ Measurement Methods

The HU value was measured according to the method described by Schreiber et al.¹⁶ Three regions of interest (ROIs), as large as possible, were delineated on non-contrast lumbar CT axial images: just below the upper endplate, at the mid-vertebral level, and just above the lower endplate of the lumbar vertebra. Areas of cortical bone, venous plexus, and bony islands were carefully avoided. The average of these three measurements was defined as the HU value of the vertebral body (Figure 1A). VBQ scoring was first proposed by Ehresman et al.¹⁷ using non-contrast T1-weighted median sagittal MRI images of the lumbar spine. ROIs were placed on the cancellous bone of the L1–L4 vertebrae to obtain vertebral signal intensity (SI). If the median sagittal slice was unsuitable, a parasagittal slice was used. The VBQ score was calculated by dividing the mean SI of the L1–L4 vertebral bodies by the SI of the cerebrospinal fluid (CSF) at the L3 level (Figure 1B). The EBQ score, as described by Jones et al.,¹² was also obtained from non-contrast T1-weighted MRI images. ROIs were placed 3 mm below the cartilage of both the upper and lower endplates of the operated segments. The EBQ score was calculated by dividing the mean SI of the upper and lower endplates by the SI of the L3-level CSF (Figure 1C).

Figure 1.

(A) Example of HU value measurement on lumbar spine CT. (B, C) Examples of VBQ and EBQ measurement, respectively, on T1-weighted lumbar spine MRI
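As a minimal illustration of the three definitions above, the following Python sketch computes each score from ROI readings. The function names and all numeric inputs are hypothetical and are not taken from any included study; only the averaging and ratio steps follow the descriptions given in this section.

```python
from statistics import fmean


def hu_value(upper_roi: float, mid_roi: float, lower_roi: float) -> float:
    """Mean HU of three axial ROIs: below the upper endplate,
    mid-vertebral level, and above the lower endplate."""
    return fmean([upper_roi, mid_roi, lower_roi])


def vbq_score(l1_to_l4_si: list[float], csf_si_l3: float) -> float:
    """Mean T1 signal intensity (SI) of L1-L4 divided by CSF SI at L3."""
    if len(l1_to_l4_si) != 4:
        raise ValueError("expected one SI value for each of L1, L2, L3, L4")
    return fmean(l1_to_l4_si) / csf_si_l3


def ebq_score(upper_endplate_si: float, lower_endplate_si: float,
              csf_si_l3: float) -> float:
    """Mean SI of the upper and lower endplate ROIs of the operated
    segment divided by CSF SI at L3."""
    return fmean([upper_endplate_si, lower_endplate_si]) / csf_si_l3


# Hypothetical readings for a single patient (illustrative only).
print(hu_value(120.0, 110.0, 100.0))
print(vbq_score([300.0, 320.0, 310.0, 330.0], 100.0))
print(ebq_score(450.0, 470.0, 100.0))
```

Because VBQ and EBQ are ratios to the CSF signal, they are unitless, whereas the HU value keeps the CT attenuation scale.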

Definition of Cage Subsidence

The definition of CS was based on measurements obtained from mid-sagittal CT or radiographic images during postoperative follow-up. Most studies defined CS as a vertical displacement of more than 2 mm.^{1,4,13,14,18-30} In contrast, two studies^12,20 evaluated subsidence severity according to the percentage of intervertebral disc height loss, using the grading system proposed by Marchi et al.³¹ Additionally, three studies^28,32,33 defined CS as cephalad or caudal migration of the fusion device into the adjacent endplates.

Given the differences in the definition of CS across studies, a meta-regression analysis was conducted to assess the potential heterogeneity arising from different CS criteria.

Data Extraction

Data were independently extracted from eligible studies by two reviewers. Discrepancies were resolved through consultation with a third author to ensure consistency. Extracted data included HU values, VBQ, and EBQ scores for both CS and non-CS groups; counts of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN); as well as study characteristics such as first author, publication year, country, incidence of CS, patient age, gender, body mass index (BMI), follow-up duration, CS definitions, sample sizes, cut-off values, sensitivity, and specificity.

Assessment of Methodological Quality

The quality of included studies was evaluated using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.^34,35 QUADAS-2 is specifically designed to evaluate risk of bias in diagnostic accuracy studies across four domains: patient selection, index test, reference standard, and flow and timing. For each domain, risk of bias was assessed, and applicability concerns were evaluated for the first three domains. Each of the seven signaling questions was answered as “yes,” “no,” or “unclear.” A domain was considered to have low risk of bias only if all signaling questions were answered “yes.” If any signaling question was answered “no,” potential bias was indicated. The “unclear” response was used when available data were insufficient to make a definitive judgment.

Statistical Analysis

Data were analyzed using Review Manager (version 5.4.1, The Cochrane Collaboration) and STATA (version 18.0, StataCorp, College Station, TX, USA). Differences in HU values, VBQ scores, and EBQ scores between the CS and non-CS groups were expressed as mean differences (MD) with 95% confidence intervals (CI). Threshold effect-related heterogeneity was assessed using the Spearman correlation coefficient. Extracted data on TP, FP, FN, and TN were pooled to calculate sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and their 95% CI.
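To make the accuracy measures named above concrete, the sketch below derives them from a single 2×2 table. It is an illustration only: the `TwoByTwo` class, the function name, and the counts are hypothetical (the counts were chosen to give a sensitivity and specificity similar to the pooled HU estimates), and the pooled estimates in this review come from hierarchical models rather than from this per-study arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs. observed CS)."""
    tp: int
    fp: int
    fn: int
    tn: int


def diagnostic_metrics(t: TwoByTwo) -> dict[str, float]:
    """Per-study sensitivity, specificity, PLR, NLR, and DOR."""
    if min(t.tp, t.fp, t.fn, t.tn) <= 0:
        # Zero cells would need a continuity correction, not handled here.
        raise ValueError("all four cells must be positive")
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": sensitivity / (1 - specificity),
        "nlr": (1 - sensitivity) / specificity,
        "dor": (t.tp * t.tn) / (t.fp * t.fn),
    }


# Hypothetical counts, not data from any included study.
example = TwoByTwo(tp=42, fp=24, fn=8, tn=76)
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.2f}")
```

The DOR equals PLR divided by NLR, which is why a test with a high sensitivity and a moderate specificity can still reach a large DOR.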

To synthesize diagnostic accuracy across studies, we constructed hierarchical summary receiver operating characteristic (HSROC) curves. The HSROC model was fitted to all available data, accounting for variations in test accuracy among studies and the correlation between sensitivity and specificity within each study. Publication bias was assessed using Deeks’ funnel plots, and Fagan nomograms were applied to estimate the post-test probability of CS based on positive or negative test results. Subgroup and meta-regression analyses were performed to identify potential sources of heterogeneity. A P-value <0.05 was considered statistically significant.

Results

Included Studies

Figure 2 illustrates the flow of the stepwise literature review. According to the inclusion and exclusion criteria, 20 studies were ultimately included. The characteristics of the included studies are summarized in Table 1. Of these, 17 studies were conducted in China,^{4,13,14,18-21,23-30,32,33} and 3 were conducted in the USA,^1,12,22 encompassing a total of 2648 patients. Eight studies employed oblique lateral interbody fusion (OLIF),^{21,23-25,27-29,33} eight used transforaminal lumbar interbody fusion (TLIF),^{1,4,13,18-20,30,32} two used posterior lumbar interbody fusion (PLIF),^14,26 one used lateral lumbar interbody fusion (LLIF),¹² and one used both TLIF and PLIF.²² This variation in surgical approach may represent a potential source of heterogeneity.

Figure 2.

Flowchart of study selection

Table 1.

Characteristics of the Included Studies

No.	Study	Country	Surgery type	Incidence of subsidence (%)	Age (yr)	Female (%)	BMI (kg/m²)	Follow-up (mo)	Cut-off	Definition of CS
1	Wang et al. (2025)¹⁴	China	PLIF	29/84 (34.5)	60.06 ± 10.96	54/84 (64.3)	26.43 ± 3.82	8.57 ± 3.20	VBQ: 2.94	>2 mm
1	Wang et al. (2025)¹⁴	China	PLIF	29/84 (34.5)	60.06 ± 10.96	54/84 (64.3)	26.43 ± 3.82	8.57 ± 3.20	HU: 98.05	>2 mm
2	Khoylyan et al. (2025)²²	USA	TLIF/PLIF	24/51 (47.1)	53.34 ± 12.55	24/51 (47.1)	31.33 ± 6.14	22.1 ± 21.7	VBQ: 2.71	>2 mm
3	Huang et al. (2023)²¹	China	OLIF	39/102 (38.2)	59.62 ± 9.29	53/102 (52.0)	25.06 ± 3.02	20.1 ± 4.2	VBQ: 3.435	≥2 mm
4	Zhang et al. (2024)²⁶	China	PLIF	45/165 (27.3)	65.03 ± 8.56	75/165 (45.5)	24.56 ± 3.09	>12.0	VBQ: 2.875	>2 mm
4	Zhang et al. (2024)²⁶	China	PLIF	45/165 (27.3)	65.03 ± 8.56	75/165 (45.5)	24.56 ± 3.09	>12.0	EBQ: 4.19	>2 mm
5	Zheng et al. (2024)²⁷	China	OLIF	42/124 (33.9)	59.16 ± 8.89	94/124 (75.8)	25.76 ± 3.05	15.0 ± 4.0	N/A	2 mm
6	Hu et al. (2022)²⁰	China	TLIF	111/242 (45.9)	60.45 ± 13.26	160/242 (66.1)	25.98 ± 3.80	35.8 ± 16.3	VBQ: 3.28	Marchi grade ≥ I
7	Pu et al. (2023)²³	China	OLIF	15/52 (28.8)	68.02 ± 10.10	30/52 (57.7)	24.93 ± 2.98	16.81 ± 7.40	VBQ: 4.1	>2 mm
7	Pu et al. (2023)²³	China	OLIF	15/52 (28.8)	68.02 ± 10.10	30/52 (57.7)	24.93 ± 2.98	16.81 ± 7.40	HU: 116.2	>2 mm
8	Ai (1) et al. (2023)¹⁹	China	TLIF	40/283 (14.1)	58.30 ± 12.40	173/283 (61.1)	N/A	22.0 ± 10.3	VBQ: 3.5	>2 mm
9	Ai (2) et al. (2024)¹³	China	TLIF	52/104 (50.0)	60.80 ± 10.80	76/104 (73.1)	23.90 ± 3.10	22.4 ± 11.4	VBQ: 3.4	>2 mm
9	Ai (2) et al. (2024)¹³	China	TLIF	52/104 (50.0)	60.80 ± 10.80	76/104 (73.1)	23.90 ± 3.10	22.4 ± 11.4	EBQ: 4.7	>2 mm
10	Soliman et al. (2022)¹	USA	TLIF	42/74 (56.8)	59.74 ± 11.21	39/74 (64.3)	30.63 ± 5.44	N/A	N/A	≥2 mm
11	Luo et al. (2024)³⁰	China	TLIF	30/226 (13.3)	57.36 ± 12.95	139/226 (61.5)	23.90 ± 3.12	≥12.0	VBQ: 3.48	>2 mm
11	Luo et al. (2024)³⁰	China	TLIF	30/226 (13.3)	57.36 ± 12.95	139/226 (61.5)	23.90 ± 3.12	≥12.0	EBQ: 4.62	>2 mm
12	Chen et al. (2023)⁴	China	TLIF	42/280 (15.0)	58.8 ± 12.3	171/280 (61.1)	23.90 ± 3.10	N/A	EBQ: 4.73	>2 mm
13	Ran (1) et al. (2024)²⁵	China	OLIF	29/88 (33.0)	61.20 ± 9.50	53/88 (60.2)	25.03 ± 2.60	15.8 ± 5.2	EBQ: 2.318	>2 mm
14	Xie et al. (2022)¹⁸	China	TLIF	82/279 (29.4)	50.85 ± 8.87	137/279 (49.1)	24.48 ± 1.61	≥12.0	N/A	≥2 mm
15	Zhou (1) et al. (2021)²⁸	China	OLIF	16/76 (21.1)	56.10 ± 10.40	46/76 (60.5)	24.7 ± 2.2	≥6.0	HU: 115.7	Cage migration into endplate
16	Ran (2) et al. (2022)²⁴	China	OLIF	18/70 (25.7)	59.00 ± 10.40	41/70 (58.6)	N/A	15.4 ± 6	HU: 113	>2 mm
17	Chang et al. (2024)²⁹	China	OLIF	19/72 (26.4)	63.32 ± 10.49	47/72 (65.3)	25.16 ± 2.98	6.0	N/A	>2 mm
18	Jones et al. (2022)¹²	USA	SA-LLIF	50/205 (24.4)	65.94 ± 10.44	46/89 (51.7)	N/A	78.7%≥12.0	EBQ: 5.1	Marchi grade ≥ II
19	Mi et al. (2017)³²	China	TLIF	18/36 (50.0)	49.55 ± 38.08	16/36 (44.4)	23.94 ± 0.75	≥6.0	HU: 132	Cage migration into endplate
20	Zhou (2) et al. (2021)³³	China	OLIF	8/35 (22.9)	58.34 ± 14.93	20/35 (57.1)	25.30 ± 2.51	Average 38.7	HU: 116.1	Cage migration into endplate

Quality of the Included Studies

A graphical summary of the QUADAS-2 methodological assessment for these studies is presented in Figures 3A and 3B. In the patient selection domain, which assesses potential selection bias, 15 of the 20 studies did not specify whether patients were consecutively enrolled, resulting in an unclear risk of bias. The index test domain, which evaluates whether the index test interpretation was performed blinded to the reference standard and whether the threshold was predetermined, showed high risk of bias in 3 studies and unclear risk in 11. The reference standard domain, assessing accuracy and blinding of the reference standard, identified high risk of bias in 3 studies and unclear risk in 10. The flow and timing domain, which examines the interval between index test and reference standard, consistency of reference standard application, and inclusion of all patients in analyses, showed unclear risk in 1 study. Regarding applicability concerns, the patient selection domain was judged unclear in 2 studies for potentially not matching the review question. The index test domain had high applicability concerns in 3 studies due to mismatch with the evaluation of complications. Similarly, the reference standard domain showed high applicability concerns in 3 studies.

Figure 3.

QUADAS-2 quality assessment; (A) Overall results. (B) Results of individual studies

In summary, the QUADAS-2 assessment indicates that the included studies have methodological limitations and potential risks of bias, which should be considered when interpreting the review findings.

Validity of HU Values, VBQ and EBQ in Identifying Cage Subsidence

Eight studies^{14,18,23,24,28,29,32,33} reported HU values for predicting CS in a total of 704 patients. The overall incidence of CS was 29.1% (205/704), as shown in Figure 4. HU values were significantly lower in the CS group than in the control group (MD, −31.08; 95% CI, −37.23 to −24.92; P < 0.001; I² = 68%; random-effects model). When grouped by surgical type, three studies^14,18,32 involving TLIF/PLIF included 399 patients, with a postoperative CS incidence of 32.3% (129/399). HU values were significantly lower in the CS group (MD, −29.61; 95% CI, −33.22 to −25.99; P < 0.001). Five studies^{23,24,28,29,33} involving OLIF/LLIF included 305 patients, with a CS incidence of 24.9% (76/305). HU values were again significantly lower in the CS group than in controls (MD, −32.30; 95% CI, −44.70 to −19.89; P < 0.001). No significant difference was observed between subgroups (P = 0.68, I² = 0%), indicating that surgical approach did not significantly alter the between-group difference in HU values. To further explore potential sources of heterogeneity, we conducted a sensitivity analysis by sequentially removing individual studies. Excluding the study by Chang et al²⁹ reduced heterogeneity to an acceptable level (I² = 35%), while the effect size remained statistically significant (MD, −37.88; 95% CI, −44.34 to −31.42; P < 0.001). These findings suggest that the study by Chang et al was the primary contributor to heterogeneity.

Figure 4.

Forest plot of HU values grouped by type of surgery

Twelve studies^{1,12-14,19-23,26,27,30} reported VBQ scores for predicting CS in a total of 1712 patients. The overall incidence of CS was 30.3% (519/1712), as shown in Figure 5. VBQ scores were significantly higher in the CS group than in the control group (MD, 0.60; 95% CI, 0.46-0.74; P < 0.001; I² = 75%; random-effects model). When grouped by surgical type, eight studies^{1,13,14,19,20,22,26,30} involving TLIF/PLIF included 1229 patients, with a postoperative CS incidence of 30.3% (373/1229). VBQ scores were significantly higher in the CS group (MD, 0.61; 95% CI, 0.48-0.75; P < 0.001). Four studies^12,21,23,27 involving OLIF/LLIF included 483 patients, with a CS incidence of 30.2% (146/483). VBQ scores were again significantly higher in the CS group than in controls (MD, 0.65; 95% CI, 0.26-1.04; P < 0.001). No significant difference was observed between subgroups (P = 0.88, I² = 0%), indicating that surgical approach did not significantly alter the between-group difference in VBQ scores. In the leave-one-out sensitivity analysis, removing any single study did not substantially reduce heterogeneity.

Figure 5.

Forest plot of VBQ scores grouped by type of surgery

Seven studies^{4,12,13,25-27,30} reported EBQ scores for predicting CS in a total of 1192 patients. The overall incidence of CS was 24.3% (290/1192), as shown in Figure 6. EBQ scores in the CS group were significantly higher than those in the control group (MD, 0.77; 95% CI, 0.65-0.88; P < 0.001; I² = 3%; fixed-effects model). When grouped by surgical type, four studies^4,13,26,30 involving TLIF/PLIF included 775 patients, with a CS incidence of 21.8% (169/775). EBQ scores were significantly higher in the CS group (MD, 0.75; 95% CI, 0.62-0.87; P < 0.001). The remaining three studies^12,25,27 reported on OLIF/LLIF procedures involving 417 patients, with a postoperative CS incidence of 29.0% (121/417). Again, EBQ scores were significantly higher in the CS group (MD, 0.92; 95% CI, 0.57-1.26; P < 0.001). No significant difference was observed between the subgroups (P = 0.37, I² = 0%), suggesting that surgical approach did not significantly alter the between-group difference in EBQ scores. The pooled analysis showed low heterogeneity (I² = 3%), indicating strong consistency among the study results.

Figure 6.

Forest plot of EBQ scores grouped by type of surgery

Diagnostic Value of HU Value, VBQ and EBQ for CS after Lumbar Spine Surgery

Seven studies^{14,18,23,24,28,32,33} reported information on TP, FP, FN, and TN outcomes. Given the considerable variation in thresholds (HU 98-132; mean, 115.18 ± 10.80), we conducted a threshold effect test. The results indicated no significant heterogeneity (P = 0.787). The pooled sensitivity and specificity were 0.84 (95% CI, 0.70-0.92) and 0.76 (95% CI, 0.71-0.80), respectively (Figure 7). The PLR was 3.47 (95% CI, 2.83-4.25), the NLR was 0.21 (95% CI, 0.11-0.41), and the DOR was 16.34 (95% CI, 7.43-35.93), as summarized in Table 2. The HSROC curve, shown in Figure 8A, yielded an AUC of 0.78 (95% CI, 0.74-0.81).

Figure 7.

The forest plot of the combined sensitivity and specificity of HU for assessing CS
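The threshold effect test reported above relies on a Spearman correlation. A common convention, assumed here because the exact inputs are not specified in the Methods, is to correlate logit(sensitivity) with logit(1 − specificity) across studies; a strong positive correlation would suggest a threshold effect. The sketch below implements that calculation in plain Python. The per-study sensitivity and specificity pairs are hypothetical and the helper names are illustrative.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def logit(p: float) -> float:
    return math.log(p / (1 - p))


# Hypothetical (sensitivity, specificity) pairs, one per study.
studies = [(0.90, 0.60), (0.85, 0.75), (0.80, 0.70), (0.70, 0.85), (0.75, 0.80)]
logit_tpr = [logit(sens) for sens, _ in studies]
logit_fpr = [logit(1 - spec) for _, spec in studies]
rho = spearman(logit_tpr, logit_fpr)
print(f"Spearman rho = {rho:.2f}")
```

In these made-up data the correlation is strongly positive, which is the pattern a threshold effect would produce; the non-significant P values reported in this review indicate that no such pattern was detected.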

Table 2.

Sensitivity, Specificity, Positive Likelihood Ratio (PLR), Negative Likelihood Ratio (NLR), and Diagnostic Odds Ratio (DOR) of HU, VBQ, and EBQ for Predicting Postoperative Cage Subsidence (CS)

Methods	Sensitivity (95%CI)	Specificity (95%CI)	Diagnostic odds ratio (95%CI)	Positive likelihood ratio (95%CI)	Negative likelihood ratio (95%CI)
HU	0.84 (0.70-0.92)	0.76 (0.71-0.80)	16.34 (7.43-35.93)	3.47 (2.83-4.25)	0.21 (0.11-0.41)
VBQ	0.82 (0.76-0.87)	0.76 (0.68-0.83)	14.67 (10.51-20.48)	3.46 (2.66-4.51)	0.24 (0.18-0.31)
EBQ	0.80 (0.61-0.91)	0.76 (0.67-0.83)	12.24 (5.63-26.61)	3.27 (2.48-4.32)	0.27 (0.13-0.53)

Figure 8.

Hierarchical summary receiver operating characteristic (HSROC) curves for predicting postoperative CS. (A) HU value, (B) VBQ, and (C) EBQ

Nine studies^{13,14,19-23,26,30} reported information on TP, FP, FN, and TN outcomes. The VBQ threshold ranged from 2.71 to 4.10, with a mean of 3.30 ± 0.42. The threshold effect test revealed no significant heterogeneity (P = 0.205). The pooled sensitivity and specificity were 0.82 (95% CI, 0.76-0.87) and 0.76 (95% CI, 0.68-0.83), respectively (Figure 9). The pooled PLR was 3.46 (95% CI, 2.66-4.51), the NLR was 0.24 (95% CI, 0.18-0.31), and the DOR was 14.67 (95% CI, 10.51-20.48), as summarized in Table 2. The HSROC curve in Figure 8B yielded an AUC of 0.86 (95% CI, 0.83-0.89).

Figure 9.

Forest plot of the combined sensitivity and specificity of VBQ for assessing CS

Six studies^{4,12,13,25,26,30} reported information on TP, FP, FN, and TN outcomes. The EBQ threshold ranged from 2.318 to 5.100, with a mean of 4.28 ± 1.00. The results of the threshold effect analysis indicated no significant heterogeneity (P = 0.266). The pooled sensitivity and specificity were 0.80 (95% CI, 0.61-0.91) and 0.76 (95% CI, 0.67-0.83), respectively (Figure 10). The PLR was 3.27 (95% CI, 2.48-4.32), the NLR was 0.27 (95% CI, 0.13-0.53), and the DOR was 12.24 (95% CI, 5.63-26.61), as presented in Table 2. The HSROC curve, illustrated in Figure 8C, showed an AUC of 0.83 (95% CI, 0.79-0.86).

Figure 10.

Forest plot of the combined sensitivity and specificity of EBQ for assessing CS
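The threshold summaries quoted in this section can be checked against the cut-off values listed in Table 1. The short sketch below does so with Python's standard library, assuming the reported "±" values are sample standard deviations; under that assumption the Table 1 cut-offs reproduce the reported means and spreads. The dictionary and function names are illustrative.

```python
from statistics import fmean, stdev

# Cut-off values transcribed from Table 1 (studies listing "N/A" are omitted).
CUTOFFS = {
    "HU": [98.05, 116.2, 115.7, 113, 132, 116.1],
    "VBQ": [2.94, 2.71, 3.435, 2.875, 3.28, 4.1, 3.5, 3.4, 3.48],
    "EBQ": [4.19, 4.7, 4.62, 4.73, 2.318, 5.1],
}


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of the cut-offs."""
    return fmean(values), stdev(values)


for index, values in CUTOFFS.items():
    mean_cutoff, sd_cutoff = summarize(values)
    print(f"{index}: range {min(values)}-{max(values)}, "
          f"mean {mean_cutoff:.3f}, SD {sd_cutoff:.3f}")
```

The EBQ spread is driven largely by the single cut-off of 2.318, which lies well below the other five values.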

Furthermore, the meta-regression analysis using HU, VBQ, and EBQ as covariates showed that there was no statistically significant difference in diagnostic accuracy among these three indicators (P > 0.05), indicating that they exhibited approximately equivalent diagnostic performance in predicting CS.

Risk of Publication Bias

Figure 11A–C presents the Deeks’ funnel plots for HU value, VBQ, and EBQ. No significant asymmetry was detected for any of the three indicators (P = 0.43, 0.09, and 0.25, respectively), indicating a relatively low risk of publication bias.

Figure 11.

Deeks’ funnel plots for assessing publication bias in predicting postoperative CS. (A) HU value, (B) VBQ, and (C) EBQ

Fagan Plots

The clinical applicability of HU value, VBQ, and EBQ for identifying high-risk patients with CS was further evaluated using the Fagan nomogram (Figure 12A–C). Assuming a pre-test probability of 30% (consistent with the approximate prevalence in the included studies), a positive test result increased the post-test probabilities to 60%, 60%, and 58% for HU value, VBQ, and EBQ, respectively. Conversely, a negative test result reduced the post-test probabilities to 8%, 9%, and 10%. These findings suggest that the positive and negative predictive capacities of the three indicators are largely comparable.

Figure 12.

Fagan nomograms for predicting postoperative CS. (A) HU value, (B) VBQ, and (C) EBQ
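The Fagan nomogram is a graphical form of Bayes' theorem on the odds scale: pre-test odds multiplied by the likelihood ratio give post-test odds. The sketch below applies that arithmetic to the pooled likelihood ratios in Table 2 with the 30% pre-test probability assumed above. The function and variable names are illustrative.

```python
def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability
    via odds: post-test odds = pre-test odds * likelihood ratio."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRE_TEST = 0.30  # pre-test probability of CS assumed in the text

# Pooled (PLR, NLR) pairs from Table 2.
LIKELIHOOD_RATIOS = {
    "HU": (3.47, 0.21),
    "VBQ": (3.46, 0.24),
    "EBQ": (3.27, 0.27),
}

for index, (plr, nlr) in LIKELIHOOD_RATIOS.items():
    positive = post_test_probability(PRE_TEST, plr)
    negative = post_test_probability(PRE_TEST, nlr)
    print(f"{index}: positive test {positive:.0%}, negative test {negative:.0%}")
```

Rounded to whole percentages, this arithmetic agrees with the post-test probabilities read from the nomograms.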

Meta-Regression Analysis

To further investigate potential sources of heterogeneity for HU values, VBQ, and EBQ, we performed a meta-regression analysis (Figure 13A–C). The covariates included study country, CS definition, surgical type, cutoff value, age, gender ratio, BMI, sample size, and follow-up duration. The analysis indicated that the proportion of female patients might contribute to heterogeneity in VBQ, while CS definition, country, cut-off, and BMI were significant sources of heterogeneity for EBQ (P < 0.05). In contrast, no covariates were identified as significant sources of heterogeneity for HU values.

Figure 13.

Meta-regression analysis of included studies to identify sources of heterogeneity in predicting postoperative CS. (A) HU value, (B) VBQ, and (C) EBQ

Discussion

In this study, the diagnostic efficacy of HU values, VBQ scores, and EBQ scores in predicting CS after degenerative lumbar fusion was evaluated for the first time using a combination of systematic review and meta-analysis. Based on 20 included studies encompassing 2648 patients, our findings confirmed that all three imaging metrics differed significantly between the CS and non-CS groups. The mean cut-off values were 115.18 ± 10.80 for HU, 3.30 ± 0.42 for VBQ, and 4.28 ± 1.00 for EBQ. Among these, the VBQ score yielded the highest pooled AUC (0.86; 95% CI: 0.83-0.89), suggesting a modest advantage for preoperative identification of patients at high risk for CS, although meta-regression found no statistically significant difference among the three indicators.

Preoperative Bone Quality Assessment: HU, VBQ, and EBQ Scores

Poor preoperative bone quality has been identified as a significant risk factor for CS in patients undergoing spinal surgery.^4,19 To improve clinical outcomes, spinal surgeons should carefully assess patients’ bone quality before surgery and implement timely interventions to reduce the risk of postoperative complications. DXA is considered the gold standard for evaluating osteoporosis.³⁶ However, in patients with degenerative lumbar spine disease, factors such as osteophyte formation, aortic calcification, vertebral endplate degeneration, and prior compression fractures can interfere with DXA accuracy in assessing cancellous BMD of the vertebrae.^37-39 Therefore, there is a critical need for a simple, accurate method to assess preoperative bone quality.

CT-based HU values are considered a reliable alternative for assessing bone quality by placing a ROI in the cancellous bone of a vertebra while avoiding cortical bone and vascular structures, thus directly obtaining vertebral HU measurements.¹⁶ Studies have shown that HU values may offer improved accuracy for osteoporosis screening compared to DXA, and lumbar spine CT is routinely performed as part of preoperative assessment in spinal surgery.^37,40 This allows for opportunistic evaluation of bone quality and prediction of postoperative complications without additional cost or radiation exposure.^14,23

The VBQ score was first proposed by Ehresman et al¹¹ and has since been validated in subsequent studies. Salzmann et al¹⁰ demonstrated a significant correlation between the VBQ and BMD measured by QCT, confirming its ability to distinguish between normal BMD and osteopenia/osteoporosis. The VBQ score is based on the principle that adipose tissue appears hyperintense on T1-weighted MRI, and it is calculated as the SI ratio between the vertebral body and CSF. This metric indirectly reflects bone quality, with higher scores indicating poorer bone quality.⁴¹ Like HU values, the VBQ score can be derived from routine preoperative imaging—specifically MRI—and offers the advantages of being non-invasive, radiation-free, and widely accessible, making it a practical tool for opportunistic preoperative assessment of bone quality. Studies have shown that the VBQ score not only predicts fragility fractures⁴² but is also strongly associated with postoperative complications such as CS¹³ and screw loosening.⁴³

With increasing recognition of the importance of endplate bone quality at the fusion interface,⁴⁴ research has shown that compromised bone quality in this region significantly elevates the risk of CS and may lead to poor clinical outcomes or even necessitate revision surgery.^45,46 In response, Jones et al¹² introduced the EBQ score, which assesses local bone quality at the fusion surface by measuring the SI ratio of the superior and inferior endplates of the operative segment to the CSF at the L3 level on MRI. The EBQ score has been shown to be closely associated with postoperative CS and may address the limitations of the VBQ score in evaluating localized endplate changes.^13,26,30

Diagnostic Performance and Clinical Implications

The AUC is widely used to assess the diagnostic accuracy of tests, with values between 0.75 and 0.92 generally considered favorable.⁴⁷ In this study, the AUC for HU values was 0.78, which was lower than that of VBQ (0.86) and EBQ (0.83), suggesting that HU values may provide relatively limited predictive accuracy for postoperative CS following lumbar surgery. Further analysis showed that the pooled sensitivity and specificity of the VBQ score were 0.82 and 0.76, respectively, indicating a relatively strong diagnostic value. In contrast, the EBQ score demonstrated pooled sensitivity and specificity of 0.80 and 0.76, suggesting slightly inferior overall performance compared with VBQ.

This difference in diagnostic performance may be attributed to the distinct measurement focuses of the two methods. The VBQ score emphasizes the overall SI and structural integrity of vertebral cancellous bone, thereby reflecting global bone quality. In contrast, the EBQ score evaluates localized regions of the endplate and is more susceptible to confounding factors such as endplate sclerosis, degenerative changes, and MRI signal variability. For instance, some studies included patients with Modic changes in the endplate,¹³ which are characterized by low SI on T1-weighted images and may affect the accuracy of EBQ measurements. Additionally, the EBQ score requires precise measurement of the subchondral bone within 3 mm beneath the endplate, which can be challenging in cases of irregular endplate morphology or poor image clarity. In contrast, the VBQ score covers a broader area of cancellous bone and is less prone to measurement bias, making it more standardized and easier to apply across studies.

Notably, HU values showed higher pooled sensitivity (0.84) and a higher diagnostic odds ratio (16.34) than VBQ and EBQ, indicating certain advantages in identifying high-risk patients. However, their overall AUC was relatively low, likely reflecting susceptibility to factors such as tube voltage, contrast agent use, and slice thickness.^48,49

Source of Heterogeneity

The HU values in this study exhibited considerable heterogeneity. After excluding the study by Chang et al.,²⁹ the I² value decreased substantially from 81% to 35%, suggesting that this study might be the main source of heterogeneity. A possible explanation is that HU was measured on only one axial slice of the vertebra in that study, whereas other studies measured three slices and averaged the values.^24,29 However, when variables such as surgical approach, cut-off value, and age were entered into the meta-regression analysis, none significantly explained the heterogeneity of HU values.

The meta-analysis of VBQ scores also showed high heterogeneity. Meta-regression identified the gender ratio as a potential source of this heterogeneity. The proportion of females in the included studies varied greatly (45%–73%); women, especially postmenopausal women, have a significantly higher incidence of osteoporosis and a more pronounced decline in vertebral bone quality, so the diagnostic performance of VBQ is likely to vary with the proportion of women in a cohort. Additionally, heterogeneity in the EBQ analysis was primarily attributed to differences in CS definition, study country, cut-off values, and BMI (P < 0.05).

Therefore, in future studies, efforts should be made to maintain consistent measurement methods, balance the gender ratio of the study subjects, and standardize threshold settings to enhance the comparability among different studies and the reliability of the results.

Limitations and Future Research Directions

To our knowledge, this is the first systematic meta-analysis to compare the diagnostic value of HU values, VBQ, and EBQ scores for predicting the risk of postoperative CS in lumbar spine surgery. Despite the reference value of our findings, several limitations should be acknowledged. First, most included studies were conducted in China, with the majority of participants being middle-aged or elderly and of East Asian descent, which may limit the generalizability of the findings to younger or healthier populations and to other regions. Second, most studies employed retrospective designs, which are prone to selection bias, and the overall quality of the included studies was relatively low. Third, many studies only reported using ‘CT’ or ‘T1-weighted MRI’ for HU, VBQ, or EBQ measurement, without providing detailed imaging parameters such as CT tube voltage (kVp), slice thickness, or MRI field strength, limiting our ability to identify sources of heterogeneity in meta-regression analyses. Fourth, although HU values, VBQ scores, and EBQ scores demonstrated diagnostic utility, direct head-to-head comparative studies evaluating all three modalities within the same cohort are still lacking, limiting comprehensive assessment of their relative strengths. Fifth, thresholds for VBQ varied widely; although Spearman correlation analysis suggested minimal threshold effect, the optimal cut-off values require further investigation.

Future research should prioritize prospective, multicenter studies, include more diverse populations, and employ standardized imaging protocols—such as consistent CT tube voltage, uniform slice thickness, and fixed MRI field strength and sequence parameters. In addition, direct head-to-head comparisons of HU, VBQ, and EBQ within the same cohort are required to verify their relative diagnostic performance.

Conclusion

This study systematically assessed the diagnostic value of HU values, VBQ scores, and EBQ scores for predicting CS following lumbar fusion surgery, incorporating 20 studies with a total of 2648 patients. The results demonstrated that all three imaging metrics distinguished patients who developed CS from those who did not, with the VBQ score showing slightly superior diagnostic performance (AUC = 0.86). Therefore, the VBQ score represents a valuable preoperative tool for evaluating bone quality, aiding in the identification of high-risk patients, guiding surgical decision-making, and contributing to the prevention of postoperative complications.

Footnotes

ORCID iDs

Song Wang

Hua Jiang

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (82360438), the Guangxi Natural Science Foundation (2023GXNSFAA026339), and the Joint Project on Regional High-Incidence Diseases Research of the Guangxi Natural Science Foundation (2024GXNSFDA010043).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Soliman, Aguirre, Kuo, et al. Vertebral bone quality score independently predicts cage subsidence following transforaminal lumbar interbody fusion. Spine J. 2022;22(12):2017-2023. doi:10.1016/j.spinee.2022.08.002

2. de Kunder, van Kuijk, Rijkers, et al. Transforaminal lumbar interbody fusion (TLIF) versus posterior lumbar interbody fusion (PLIF) in lumbar spondylolisthesis: a systematic review and meta-analysis. Spine J. 2017;17(11):1712-1721. doi:10.1016/j.spinee.2017.06.018

3. Yao, Chou, Lin, Wang, Liu, Chang. Risk factors of cage subsidence in patients received minimally invasive transforaminal lumbar interbody fusion. Spine. 2020;45(19):E1279-E1285. doi:10.1097/BRS.0000000000003557

4. Chen, Huang, et al. MRI-based Endplate Bone Quality score independently predicts cage subsidence following transforaminal lumbar interbody fusion. Spine J. 2023;23(11):1652-1658. doi:10.1016/j.spinee.2023.07.002

5. Haffer, Muellner, Chiapparelli, et al. Bone quality in patients with osteoporosis undergoing lumbar fusion surgery: analysis of the MRI-based vertebral bone quality score and the bone microstructure derived from microcomputed tomography. Spine J. 2022;22(10):1642-1650. doi:10.1016/j.spinee.2022.05.008

6. Gao, et al. Assessing the utility of MRI-based vertebral bone quality (VBQ) for predicting lumbar pedicle screw loosening. Eur Spine J. 2024;33(1):289-297. doi:10.1007/s00586-023-08034-3

7. Chen, Zhu, Zhou, et al. Utility of MRI-based vertebral bone quality scores and CT-based Hounsfield unit values in vertebral bone mineral density assessment for patients with diffuse idiopathic skeletal hyperostosis. Osteoporos Int. 2023;35:705-715. doi:10.1007/s00198-023-06999-x

8. Pan, Shi, Wang, et al. Automatic opportunistic osteoporosis screening using low-dose chest computed tomography scans obtained for lung cancer screening. Eur Radiol. 2020;30(7):4107-4116. doi:10.1007/s00330-020-06679-y

9. Yang, Liao, Wang, et al. Opportunistic osteoporosis screening using chest CT with artificial intelligence. Osteoporos Int. 2022;33:2547-2561. doi:10.1007/s00198-022-06491-y

10. Salzmann, Okano, Jones, et al. Preoperative MRI-based vertebral bone quality (VBQ) score assessment in patients undergoing lumbar spinal fusion. Spine J. 2022;22(8):1301-1308. doi:10.1016/j.spinee.2022.03.006

11. Ehresman, Pennington, Schilling, et al. Novel MRI-based score for assessment of bone density in operative spine patients. Spine J. 2020;20(4):556-562. doi:10.1016/j.spinee.2019.10.018

12. Jones, Okano, Arzani, et al. The predictive value of a novel site-specific MRI-based bone quality assessment, endplate bone quality (EBQ), for severe cage subsidence among patients undergoing standalone lateral lumbar interbody fusion. Spine J. 2022;22(11):1875-1883. doi:10.1016/j.spinee.2022.07.085

13. Zhu, Chen, et al. Comparison of predictive value for cage subsidence between MRI-based endplate bone quality and vertebral bone quality scores following transforaminal lumbar interbody fusion: a retrospective propensity-matched study. Spine J. 2024;24(6):1046-1055. doi:10.1016/j.spinee.2024.01.014

14. Wang, Zhang, Tong, Miao, Wang. Comparison of Hounsfield unit, vertebral bone quality, and dual-energy X-Ray absorptiometry T-Score for predicting cage subsidence after posterior lumbar interbody fusion. Glob Spine J. 2025;15(4):2226-2235. doi:10.1177/21925682241293038

15. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71. doi:10.1136/bmj.n71

16. Schreiber, Anderson, Rosas, Buchholz. Hounsfield units for assessing bone mineral density and strength: a tool for osteoporosis management. J Bone Joint Surg Am. 2011;93(11):1057-1063. doi:10.2106/JBJS.J.00160

17. Ehresman, Schilling, Pennington, et al. A novel MRI-based score assessing trabecular bone quality to predict vertebral compression fractures in patients with spinal metastasis. J Neurosurg Spine. 2020;32(4):499-506. doi:10.3171/2019.9.SPINE19954

18. Xie, Yang, et al. The value of Hounsfield units in predicting cage subsidence after transforaminal lumbar interbody fusion. BMC Musculoskelet Disord. 2022;23(1):882. doi:10.1186/s12891-022-05836-2

19. Chen, Huang, et al. MRI-based vertebral bone quality score for predicting cage subsidence by assessing bone mineral density following transforaminal lumbar interbody fusion: a retrospective analysis. Eur Spine J. 2023;32(9):3167-3175. doi:10.1007/s00586-023-07854-7

20. Yeh, Niu, et al. Novel MRI-based vertebral bone quality score as a predictor of cage subsidence following transforaminal lumbar interbody fusion. J Neurosurg Spine. 2022;37(5):654-662. doi:10.3171/2022.3.SPINE211489

21. Huang, Chen, Liu, Feng. Vertebral bone quality score to predict cage subsidence following oblique lumbar interbody fusion. J Orthop Surg Res. 2023;18(1):258. doi:10.1186/s13018-023-03729-1

22. Khoylyan, Girgis, Tang, Vazquez, Chen. The utility of magnetic resonance imaging-based vertebral bone quality scores as a predictor of cage subsidence following transforaminal and posterior lumbar interbody fusion. Clin Spine Surg. 2025;38(3):E145-E151. doi:10.1097/BSD.0000000000001682

23. Wang, Ran, et al. Comparison of predictive performance for cage subsidence between CT-based Hounsfield units and MRI-based vertebral bone quality score following oblique lumbar interbody fusion. Eur Radiol. 2023;33(12):8637-8644. doi:10.1007/s00330-023-09929-x

24. Ran, Xie, Zhao, Huang, Zeng. Low Hounsfield units on computed tomography are associated with cage subsidence following oblique lumbar interbody fusion (OLIF). Spine J. 2022;22(6):957-964. doi:10.1016/j.spinee.2022.01.018

25. Ran, Xie, Zhao, et al. MRI-based endplate bone quality score predicts cage subsidence following oblique lumbar interbody fusion. Spine J. 2024;24(10):1922-1928. doi:10.1016/j.spinee.2024.05.002

26. Zhang, Liang, Shi, Tuo, Yang. Comparative analysis of MRI-based VBQ and EBQ score for predicting cage subsidence in PILF surgery. J Orthop Surg Res. 2024;19(1):839. doi:10.1186/s13018-024-05332-4

27. Zheng, Tong, et al. Predictive value of different site-specific MRI-based assessments of bone quality for cage subsidence among patients undergoing oblique lumbar interbody fusion. J Neurosurg Spine. 2024;41(2):246-253. doi:10.3171/2024.2.SPINE231107

28. Zhou, Yuan, Liu, Zhou, Wang. Hounsfield unit value on CT as a predictor of cage subsidence following stand-alone oblique lumbar interbody fusion for the treatment of degenerative lumbar diseases. BMC Musculoskelet Disord. 2021;22(1):960. doi:10.1186/s12891-021-04833-1

29. Chang, Xiang, Wei, et al. Analysis of factors impacting interbody cage subsidence following stand-alone oblique lumbar inter-body fusion. J Precis Med. 2024;39(03):252-256. doi:10.13362/j.jpmed.202403014

30. Luo, Chen, et al. Predictive value of MRI-based vertebral bone quality score and endplate bone quality score for cage subsidence after transforaminal lumbar interbody fusion. Practical Journal of Clinical Medicine. 2024;21(03):20-25.

31. Marchi, Abdala, Oliveira, Amaral, Coutinho, Pimenta. Radiographic and clinical evaluation of cage subsidence after stand-alone lateral interbody fusion. J Neurosurg Spine. 2013;19(1):110-118. doi:10.3171/2013.4.SPINE12319

32. Zhao. Vertebral body Hounsfield units are associated with cage subsidence after transforaminal lumbar interbody fusion with unilateral pedicle screw fixation. Clin Spine Surg. 2017;30(8):E1130-E1136. doi:10.1097/BSD.0000000000000490

33. Zhou, Liu, Yuan, Wang. CT value of vertebral body predicting Cage subsidence after stand-alone oblique lumbar interbody fusion. Chin J Reparative Reconstr Surg. 2021;35(11):1449-1456.

34. Whiting, Rutjes, Westwood, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529-536. doi:10.7326/0003-4819-155-8-201110180-00009

35. Leeflang, Deeks, Takwoingi, Macaskill. Cochrane diagnostic test accuracy reviews. Syst Rev. 2013;2:82. doi:10.1186/2046-4053-2-82

36. Link. Osteoporosis imaging: state of the art and advanced imaging. Radiology. 2012;263(1):3-17. doi:10.1148/radiol.2633201203

37. Wang, Liu, Liao, et al. Preoperative assessment of spinal bone quality using Hounsfield unit values and vertebral bone quality scores in patients with rheumatoid arthritis: a retrospective propensity-matched study. Eur Spine J. 2025;34:3186-3195. doi:10.1007/s00586-025-09003-8

38. Hoshino, Kushida, Takahashi, Ohishi, Sugiyama, Inoue. The influence of aortic calcification on spinal bone mineral density in vitro. Calcif Tissue Int. 1996;59(1):21-23. doi:10.1007/s002239900079

39. Kinoshita, Tamaki, Hashimoto, Kasagi. Factors influencing lumbar spine bone mineral density assessment by dual-energy X-ray absorptiometry: comparison with lumbar spinal radiogram. J Orthop Sci. 1998;3(1):3-9. doi:10.1007/s007760050015

40. Wang, Liu, Yang, et al. The significance of combined OSTA, HU value and VBQ score in osteoporosis screening before spinal surgery. World Neurosurg. 2024;182:e692-e701. doi:10.1016/j.wneu.2023.12.022

41. Kim, Lyons, Sarmiento, Lafage, Iyer. MRI-based score for assessment of bone mineral density in operative spine patients. Spine. 2023;48(2):107-112. doi:10.1097/BRS.0000000000004509

42. Wang, Liu, et al. Simplified S1 vertebral bone quality score in the assessment of patients with vertebral fragility fractures. World Neurosurg. 2024;185:e1004-e1012. doi:10.1016/j.wneu.2024.03.011

43. Zhu, Hua, et al. Vertebral bone quality score as a predictor of pedicle screw loosening following surgery for degenerative lumbar disease. Spine. 2023;48:1635-1641. doi:10.1097/BRS.0000000000004577

44. Gao, et al. The influence of endplate morphology on cage subsidence in patients with stand-alone oblique lateral lumbar interbody fusion (OLIF). Glob Spine J. 2023;13(1):97-103. doi:10.1177/2192568221992098

45. Okano, Jones, Salzmann, et al. Endplate volumetric bone mineral density measured by quantitative computed tomography as a novel predictive measure of severe cage subsidence after standalone lateral lumbar fusion. Eur Spine J. 2020;29(5):1131-1140. doi:10.1007/s00586-020-06348-0

46. Jones, Okano, Salzmann, et al. Endplate volumetric bone mineral density is a predictor for cage subsidence following lateral lumbar interbody fusion: a risk factor analysis. Spine J. 2021;21(10):1729-1737. doi:10.1016/j.spinee.2021.02.021

47. Jones, Athanasiou. Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. Ann Thorac Surg. 2005;79(1):16-20. doi:10.1016/j.athoracsur.2004.09.040

48. Gerety, Bearcroft. L1 vertebral density on CT is too variable with different scanning protocols to be a useful screening tool for osteoporosis in everyday practice. Br J Radiol. 2018;91(1084):20170395. doi:10.1259/bjr.20170395

49. Hou, Cheng, You, et al. Effect of intravenous iodinated contrast administration on diagnostic ability for osteoporosis using CT attenuation measurement in patients with liver cirrhosis. Br J Radiol. 2022;95(1139):20201251. doi:10.1259/bjr.20201251