Abstract
Background
Recently, a scoring system to grade sacroiliac joint (SIJ) degeneration using computed tomography (CT) scans was described. No independent evaluation has determined the inter- and intra-observer agreement using this scheme.
Purpose
To perform an independent inter- and intra-observer agreement assessment of the Eno classification and of the detection of gas in the SIJ.
Material and Methods
We studied 64 patients aged ≥60 years who were evaluated with abdominal and pelvic computed tomography scans. Six physicians (three orthopaedic spine surgeons and three musculoskeletal radiologists) assessed axial images to grade SIJ degeneration as grade 0 (normal), grade 1 (mild degeneration), grade 2 (significant degeneration), or grade 3 (ankylosis). We also evaluated agreement on the presence of gas in the SIJ. After a four-week interval, all cases were presented in a random sequence for repeat assessment. We determined the agreement using the kappa (κ) or weighted kappa (wκ) coefficient.
Results
The inter-observer agreement was moderate (wκ = 0.50 [0.44–0.56]), without differences among surgeons (wκ = 0.53 [0.45–0.61]) and radiologists (wκ = 0.49 [0.42–0.57]). The agreement evaluating the presence of gas was also moderate (κ = 0.45 [0.35–0.54]), but radiologists obtained better agreement (κ = 0.61 [0.48–0.72]) than surgeons (κ = 0.29 [0.18–0.39]). The intra-observer agreement using the classification was substantial (wκ = 0.79 [0.76–0.82]), without differences comparing surgeons (wκ = 0.75 [0.70–0.80]) and radiologists (wκ = 0.83 [0.79–0.87]). The intra-rater agreement evaluating gas was substantial (κ = 0.77 [0.72–0.82]), without differences between surgeons (κ = 0.71 [0.63–0.78]) and radiologists (κ = 0.84 [0.78–0.90]).
Conclusion
Given the only moderate agreement obtained using the Eno classification, it does not seem adequate to be used in clinical practice or in research.
Introduction
Sacroiliac joint (SIJ) degeneration has been considered a potential cause of low back pain, but the identification of symptomatic SIJ is difficult, and a clear association between imaging findings and symptoms has not been established. Some studies have revealed a high prevalence of SIJ degeneration in adults (1–4), with an increasing frequency in older individuals, which may represent a normal aging process of this joint. While articular degeneration has been extensively evaluated for limb joints, SIJ degeneration has received less attention.
Recently, Eno et al. (1) described a scoring system to grade SIJ degeneration using computed tomography (CT) scans. This grading scheme considers four categories: grade 0 = a normal SIJ; grade 1 = mild degenerative changes with mild subchondral sclerosis, minimal osteophyte formation, and subtle joint-space narrowing; grade 2 = significant degeneration with large bridging osteophyte formation but without ankylosis; and grade 3 = SIJ ankylosis. Like any scoring system, this scheme should allow communication among the physicians using it, but it should also support decision-making in individual patients and aid research. To achieve those objectives, the grading system should be easy to apply, comprehensive, and reproducible among different observers and by the same observer on different occasions (5). Although this classification is already in use in clinical practice and research (3), no independent evaluation has been performed to determine the inter- and intra-observer agreement using this scheme.
Given that both clinicians and radiologists can use Eno's grading system (1), an independent inter- and intra-observer agreement assessment of this grading scheme should be performed by a multicenter panel of clinicians and radiologists to determine the classification's real value. The aim of the present study was to perform an independent inter- and intra-observer agreement evaluation of the system proposed by Eno et al., including orthopaedic spine surgeons (OSS) and musculoskeletal radiologists (MSR) as assessors. Because the presence of gas in the SIJ (SIJ vacuum phenomenon [SJVP]) is frequently described in imaging reports evaluating the SIJ, the secondary aim was to evaluate whether the presence of SJVP was related to degenerative changes of the SIJ. In addition, we determined the agreement on the presence of SJVP.
Material and Methods
Institutional review board approval was obtained to conduct this study; informed consent was waived as this was a retrospective study with no risk to participants.
We studied 64 patients aged 60 years and over who were examined with abdominal and pelvic computed tomography (CT) scans at a tertiary care university hospital. Each CT scan was obtained with a multidetector CT scanner (GE, Milwaukee, WI, USA). The CT images were requested for a variety of reasons that were not related to the spine or the SIJ, including fever, suspicion of an abdominal or pelvic malignancy or infection, urolithiasis, and examination of malignancies under treatment. One OSS, who did not participate in the classification phase of this work, selected the cases from a large database so that all grades of SIJ degeneration as defined by Eno's scheme were represented. The exclusion criteria were the presence of a sacral or iliac fracture or tumor, a known diagnosis of an inflammatory disorder, or the presence of instrumentation in the lumbosacral spine.
The CT scans were evaluated using the Impax Web3000 program (Agfa-Gevaert, Mortsel, Belgium) by six physicians from three different centers in three countries: three fellowship-trained OSS and three fellowship-trained MSR, all with at least 10 years of experience as sub-specialists. Axial images were evaluated on both sides using bone contrast windows to grade SIJ degeneration according to Eno's grading system into Grade 0 (normal joint), Grade 1 (mild degenerative changes with mild subchondral sclerosis, minimal osteophyte formation, and subtle joint space narrowing), Grade 2 (significant degeneration with large bridging osteophyte formation but without ankylosis), and Grade 3 (sacroiliac joint ankylosis) (1) (Fig. 1). Each SIJ was scored separately; therefore, we evaluated 128 SIJ. Specifically, we looked at the antero-inferior portion of the joint to determine SIJ degeneration. We also assessed the agreement on the presence of SJVP. At the time of evaluation, the assessors were blinded to the patients' clinical and personal data.

Fig. 1. Classification of sacroiliac joint degeneration: (a) Grade 0: normal; (b) Grade 1: mild degenerative changes with mild subchondral sclerosis, minimal osteophyte formation, and subtle joint space narrowing; (c) Grade 2: significant degeneration with large bridging osteophyte formation but without ankylosis; and (d) Grade 3: sacroiliac joint ankylosis.
The evaluators were trained in this classification system through an online session held to discuss it and to clarify doubts before the assessments, which allowed us to standardize the evaluation process. Additionally, they were provided with the original article by Eno et al. (1) to resolve any doubts at the time of evaluation.
Sample size estimation was performed with R software (The R Project for Statistical Computing, Vienna, Austria). We used a confidence interval (CI) approach to sample size estimation for inter-observer agreement studies with multiple raters as reported by Rotondi et al. (6). Using a lower limit of 0.61 and an upper limit of 0.80 (an expected substantial agreement), for six evaluators and a 95% CI, we calculated that the required sample size was 64 cases.
Statistical analyses were conducted using Stata statistical software, version 13.0 (Stata Corp., College Station, TX, USA). Because the classification described by Eno et al. is an ordinal variable, we used the weighted kappa statistic (wκ) for two-way agreements. Weighted kappa measures agreement across multiple response levels when not all disagreements are equally important; linear weights were used. For the presence of SJVP, we used Cohen's kappa (κ). Inter-observer agreement was determined by comparing the initial reads of all the assessors. Intra-observer agreement was calculated by comparing each evaluator's reads between two assessments of the same patients. The two assessments were separated by a four-week interval, and the cases were presented in a random sequence to avoid recall bias.
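To make the role of linear weights concrete, the following is a minimal sketch of Cohen's kappa for two raters, written in Python rather than the Stata used in the study. The function name `cohen_kappa`, its parameters, and the example grades are hypothetical and only illustrative; how the six raters were pooled in the actual analysis is not described here, so the sketch is limited to a single pair of raters.

```python
def cohen_kappa(rater_a, rater_b, n_categories, linear_weights=True):
    """Cohen's kappa for two raters scoring the same joints.

    Grades are integers 0..n_categories-1. With linear_weights=True,
    a disagreement of d grades costs d / (n_categories - 1), so a
    grade 0 vs grade 3 split is penalised more than grade 1 vs grade 2.
    With linear_weights=False this is the unweighted kappa.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of joints")
    n = len(rater_a)
    k = n_categories

    # Observed joint proportions: observed[i][j] = share of joints graded
    # i by rater A and j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if linear_weights:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    d_obs = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    d_exp = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if d_exp == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - d_obs / d_exp


# Hypothetical grades for eight joints (not study data).
reader_1 = [0, 1, 1, 2, 2, 3, 3, 1]
reader_2 = [0, 1, 2, 2, 1, 3, 2, 1]
print(cohen_kappa(reader_1, reader_2, n_categories=4))
print(cohen_kappa(reader_1, reader_2, n_categories=4, linear_weights=False))
```

For a binary variable such as the presence of gas, calling the same function with `n_categories=2` reduces to the unweighted Cohen's kappa, because the only possible disagreement has weight 1.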
Levels of agreement for κ and wκ were determined as proposed by Landis et al. (7), as follows: κ values 0.00–0.20 = slight agreement; 0.21–0.40 = fair agreement; 0.41–0.60 = moderate agreement; 0.61–0.80 = substantial agreement; and 0.81–1.00 = almost perfect agreement. All agreements are expressed with 95% CIs.
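The interpretation bands above can be expressed as a small lookup, sketched here in Python. The function name `agreement_level` is hypothetical. Two details are assumptions not stated in the text: values that fall between the printed bands (for example 0.205) are assigned to the higher band, and values below 0.00 are labelled "poor".

```python
def agreement_level(kappa):
    """Map a kappa or weighted kappa value to the agreement bands listed above."""
    if kappa < 0.0:
        return "poor"  # assumption: negative values are not covered by the bands in the text
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Point estimates reported in this study.
for label, value in [("inter-observer, classification", 0.50),
                     ("intra-observer, classification", 0.79),
                     ("inter-observer, gas, OSS", 0.29),
                     ("inter-observer, gas, MSR", 0.61)]:
    print(label, value, agreement_level(value))
```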
Additionally, we used the chi-square test to compare gas prevalence in the SIJ among different SIJ degeneration grades.
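As an illustration of this comparison, the sketch below computes Pearson's chi-square statistic for a grade-by-gas contingency table in Python. The function name `chi_square_statistic` and the counts are hypothetical and are not the study data. The sketch assumes every row and column total is positive, and it compares the statistic with the tabulated critical value instead of computing a P value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table is a list of rows of observed counts. Expected counts come from
    the row and column totals under the hypothesis of no association.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT the study's Table 1): one row per grade 0-3,
# columns are (gas present, gas absent).
hypothetical = [[5, 45], [20, 30], [18, 32], [3, 47]]
statistic, dof = chi_square_statistic(hypothetical)

# 7.815 is the 0.05 critical value of the chi-square distribution with 3 degrees of freedom.
CRITICAL_DF3 = 7.815
print(statistic, dof, statistic > CRITICAL_DF3)
```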
Results
We evaluated 64 patients (27 women; mean age = 72.34 ± 9.36 years). We performed a total of 768 initial evaluations (128 SIJ by six evaluators); of these initial evaluations, 107 (14%) were grade 0, 321 (42%) grade 1, 190 (25%) grade 2, and 150 (19%) grade 3 of the Eno grading system. The evaluators rated the same grade of SIJ degeneration on both sides in 82% of the patients. Gas was observed in 210 of the 768 initial evaluations (27%). Side-by-side agreement for the presence of gas was 89%. Gas was identified significantly more frequently in SIJ rated as grades 1 and 2 (39% and 33%, respectively) than in SIJ rated as grades 0 and 3 (12% and 6%, respectively) (Table 1).
Table 1. Presence of gas and sacroiliac joint degeneration
Values are given as n (%) unless otherwise indicated.
*Significant difference (P < 0.05) in the pairwise comparison of gas frequency between grades.
The inter-observer agreement among the six evaluators was moderate (wκ = 0.50; 95% CI = 0.44–0.56), with no difference between OSS (wκ = 0.53; 95% CI = 0.45–0.61) and MSR (wκ = 0.49; 95% CI = 0.42–0.57). The agreement on the presence of SJVP was also moderate among observers (κ = 0.45; 95% CI = 0.35–0.54), but it was significantly better among MSR, who obtained a substantial agreement (κ = 0.61; 95% CI = 0.48–0.72), than among OSS, who had a fair agreement (κ = 0.29; 95% CI = 0.18–0.39).
The intra-rater agreement was substantial (wκ = 0.79; 95% CI = 0.76–0.82). We found no differences in the intra-observer agreement obtained by OSS (wκ = 0.75; 95% CI = 0.70–0.80) and the intra-observer agreement reached by MSR (wκ = 0.83; 95% CI = 0.79–0.87). The observation of SJVP also had a substantial intra-rater agreement (κ = 0.77; 95% CI = 0.72–0.82), with no significant difference between OSS (κ = 0.71; 95% CI = 0.63–0.78) and MSR (κ = 0.84; 95% CI = 0.78–0.90).
Discussion
SIJ degeneration is a common finding in the aging population, but most patients are asymptomatic (1,2). Nevertheless, SIJ degeneration can be a cause of low back pain in some patients, and adequate grading of this degenerative process is required for a correct diagnosis. While several imaging tools may provide visualization of the SIJ, CT scan is a widely accepted method to evaluate its degeneration (1,2,4). Surprisingly, there is little consensus on the best instrument to grade SIJ degeneration. Recently, Eno et al. (1) developed a hierarchical classification to evaluate SIJ degeneration, but no previous studies have evaluated the agreement using this grading scheme by different observers, and by the same observer on separate occasions.
Our study showed a moderate inter-observer agreement and a substantial intra-observer agreement using this classification by a multicenter, independent panel of assessors, including OSS and MSR. Like any grading system, this scheme should facilitate communication among physicians, standardize research terminology, and help to guide decision-making in individual patients. However, it should be acknowledged that any attempt to grade a continuous process (such as SIJ degeneration) into discrete grades (as the Eno classification does) is limited because, with no objective boundaries, no classification scheme can perfectly determine whether a specific joint belongs in one category or in the next one. Additionally, ankylosis was not well defined by Eno et al. (1), which is another limitation of this classification that can reduce agreement. In fact, Eno et al. did not clarify whether anterior ankylosis (such as in cases with diffuse idiopathic skeletal hyperostosis) or an entheseal ankylosis posterior to the SIJ is also part of Grade 3.
In the present study, we decided to evaluate a variable not included in the original classification: the presence of gas in the SIJ. The SJVP is a commonly observed phenomenon in degenerated SIJ (8–10); as with the Eno grading scheme, our panel of assessors had a moderate inter-observer agreement and a substantial intra-observer agreement evaluating the presence of SJVP. Of note, the MSR showed a significantly better inter-observer agreement than the OSS when assessing the presence of SIJ gas. We believe this difference may be explained by the fact that MSR frequently look for gas in multiple joints throughout the body in different clinical conditions (10). However, given our results, we consider that evaluating the presence of SJVP does not add value to the Eno classification.
Interestingly, the presence of gas was significantly more frequent in joints exhibiting grade 1 or 2 degeneration than in normal joints (grade 0) or in ankylosed joints (grade 3), as shown in Table 1. A possible explanation is that SJVP is a feature of degenerated SIJ that tends to disappear once a joint becomes completely ankylosed (11). However, as with other characteristics of SIJ degeneration, there is no clear association between SJVP and symptoms.
It has been widely described that independent studies generally exhibit a lower inter-observer agreement than the agreement obtained by the original panel that developed a classification (12,13). Even though the original description of the classification was performed by a group of orthopaedic surgeons (1), we decided to perform this study with a panel of OSS and MSR, since such a classification may be useful in the day-to-day practice of clinicians and radiologists. The moderate inter-observer agreement using this classification scheme by the entire panel, and by each subgroup of specialists, does not reach the κ value of 0.55 proposed as the minimal inter-observer agreement level for a classification scheme to be useful in clinical practice (14). However, the panel describing this classification did not assess their agreement using it; therefore, we cannot establish a comparison with the authors describing the scheme.
We obtained a substantial intra-observer agreement in this study; this result can be explained because the intra-observer agreement depends on the individual interpretation of the grading system by each assessor, which reflects either a consistently correct or a consistently erroneous evaluation by an observer, independent of agreement with other assessors. Therefore, a substantial intra-observer agreement alone is not enough to support using this classification unless it demonstrates a better agreement when used by different observers.
Our cohort included patients aged 60 years and older who were evaluated with abdominal and pelvic CT scans independently of the presence of spinal symptoms. We decided to include patients in this age range to have enough cases with advanced SIJ degeneration; although evaluating older patients may be considered a limitation of our study, 56% of our evaluations were joints classified as grade 0 or 1. Therefore, we obtained sufficient representation of all grades of degeneration.
Recently, there has been increased interest in invasive treatments for presumed sacroiliac joint pathology, from injection techniques to surgical options (15). The diagnosis is usually based on symptoms that may be non-specific, physical examination, and degenerative changes observed on CT scans. However, the association between pain and SIJ degeneration on CT scans has not been established. Comparable to lumbar disc degenerative findings (16,17), SIJ degeneration is present in many asymptomatic subjects and increases with advancing age (1,2,4). Consequently, caution is required when attributing symptoms to degeneration observed on CT scans; therefore, the clinical usefulness of this classification has yet to be proven. However, like the Pfirrmann classification of disc degeneration, this classification is sometimes used in imaging reports and in research; hence, an independent evaluation of this classification's capacity to allow communication among physicians was required.
It has been proposed that a new classification requires a three-step validation process (18): (i) a definition of categories by experts; (ii) a multicenter agreement study performed by a representative group of future users of the classification; and (iii) a prospective clinical study to evaluate its clinical usefulness. The present study contributes to this scheme’s validation process with a multicenter, multi-specialist agreement evaluation performed by potential users of this classification.
In conclusion, given the only moderate inter-observer agreement obtained by our panel using this classification, we believe it is not adequate for use in clinical practice or in research; only further prospective studies will reveal whether its use is valuable in clinical practice.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
