Abstract
Background
More advanced disc degeneration on magnetic resonance imaging (MRI) is found in individuals with low back pain. However, it is unclear whether the grading of disc degeneration is independent of the scanner’s field strength.
Purpose
To compare disc degeneration on high- versus low-field MRI.
Material and Methods
Low back pain patients were enrolled to undergo high-field (3 T) MRI, followed by low-field (0.25 T) MRI of the lumbar spine within 3 h. Three radiologists graded the disc degeneration on Pfirrmann’s grading scale, with a hiatus of 3 months between the high-field and low-field readings. A subsample was regraded 6 months later. Reproducibility was measured by weighted kappa statistics (using the PROC FREQ statement with AGREE in the TABLES statement for SAS), absolute agreement (i.e. 1:1 agreement/the total number) and the difference in the prevalence (McNemar test).
Results
Moderate to substantial agreement (κ = 0.52–0.62) and absolute agreement of 43.8–66.1% were found between field strengths. Low-field MRI tended to yield numerically more high and low grades than high-field MRI, resulting in a significant difference in the prevalence of grades (p < 0.001). Both field strengths resulted in a moderate to substantial inter-reader agreement (low-field: κ = 0.63, 0.63, 0.54 and high-field: κ = 0.55, 0.43, 0.53) and intra-reader agreement (high-field: κ = 0.57, 0.77, 0.67 and low-field: κ = 0.51, 0.50, 0.70). Only the reader with the shortest experience had better agreement with high-field than with low-field MRI.
Conclusions
There was a significant difference in the prevalence of disc degeneration grading between 0.25 T and 3 T MRI. Therefore, field strength should be taken into consideration when comparing studies using disc degeneration grading as an outcome.
Introduction
Degenerative findings in the lumbar spine are common in individuals with or without low back pain (LBP),1–6 and the intervertebral disc undergoes progressive morphologic and cellular changes with age.1,7 Nevertheless, more advanced disc degenerative changes are found in individuals with LBP compared to individuals of the same age without LBP.3,5,8 Non-specific LBP is believed to be initiated by degenerative processes in the disc, amplified by inflammation and infection. 9 Magnetic resonance imaging (MRI) of the lumbar spine is often requested to identify the possible source of pain in patients with non-specific LBP. 10 This makes a reliable and feasible MRI grading of the disc degeneration relevant in a clinical setting.11,12
Disc degeneration is believed to be generated by a catabolic process, 13 with loss of the hydrophilic glycosaminoglycan (GAG) primary structure within the nucleus pulposus. 14 These changes result in reduced water content in the disc,9,15,16 which can be seen on MRI as decreased T2 signal (hypointense).11,17,18 For lumbar spine imaging, high-field MRI scanners (>1 T) are usually preferred by radiologists and clinicians due to better image quality related to a higher signal-to-noise ratio. 19 Despite this, low-field scanners do not necessarily yield lower diagnostic potential in LBP evaluation.3,20,21 Low-field MRI scanners have the advantages of lower purchase and maintenance cost, 19 and a previous study has indicated that the grading of degenerative changes in the vertebral endplates (Modic classification) is affected by the field strength of the MRI scanner. 21 Thus, it would be useful to know the diagnostic capability of low-field scanners compared with high-field scanners for grading degeneration of the intervertebral disc in the lumbar spine. This study aimed to compare lumbar disc degeneration grading from a high-field (3 T) scanner and a low-field (0.25 T) MRI scanner.
Material and Methods
Design
This paper, which is reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology guidelines (STROBE), 22 includes patients with LBP referred to conventional high-field MRI at the Department of Radiology, Frederiksberg Hospital, Denmark in 2012. The study was approved by, and conducted in accordance with the requirements of, the local ethics committee (KF 01-045/03) and the Danish Data Protection Agency (01758 FRH-2012-003).
Study Group Selection
Patients 18–65 years of age with LBP with or without sciatica/radiculopathy who were referred to conventional high-field MRI of the lumbar spine were asked to participate in this study. The exclusion criteria were pregnancy, prior spine surgery, suspected fracture, cancer, or ‘red flag’ symptoms. The patients were consecutively included to undergo first high-field MRI followed by supine low-field MRI within 3 hours on the same day. Also, all patients had a clinical examination and were interviewed about their back pain by a physician (BBH).
Image Acquisition
The high-field scan (Siemens Verio 3 Tesla, Erlangen, Germany) included a sagittal T2-weighted (T2w) and a sagittal T1-weighted (T1w) turbo spin echo (TSE) sequence followed by an axial T2w TSE sequence covering the five lumbar disc levels. The subsequent low-field scan (G-Scan 0.25 Tesla, ESAOTE, Italy) included a sagittal T2w and sagittal T1w TSE sequence, as well as an axial 3D T2w isotropic gradient echo sequence (3D-Hyce) covering the lumbar spine from L2 to S1. The MRI sequences used in this study were part of the standard protocols from the manufacturers to keep them as realistic as possible (Table 1).
Details of the magnetic resonance imaging sequences of the lumbar spine.
TSE-T2 = T2-weighted turbo spin-echo; GRE-T2 = T2 weighted 3D hybrid contrast enhancement gradient echo; TSE-T1 = T1-weighted turbo spin-echo; TA = acquisition time; TR = repetition time; TE = echo time; ST = slice thickness; SBS = spacing between slices.
MRI Evaluation
One senior consultant neuroradiologist (A), one junior consultant radiologist (B) and one senior consultant musculoskeletal radiologist (C) graded the disc degeneration on the midsagittal T2-weighted images according to Pfirrmann’s five-point grading system. The disc degeneration was subjectively graded as Grade I, where the nucleus pulposus was homogeneous with high T2-signal intensity and a clear distinction between the nucleus and annulus; Grade II, where the nucleus pulposus was lightly inhomogeneous with a clear distinction between the nucleus and annulus with or without horizontal hypointense bands; Grade III, where the nucleus pulposus was inhomogeneous with unclear distinction between the nucleus and annulus, and a slightly decreased disc height; Grade IV, where the nucleus pulposus was inhomogeneous with a low signal intensity without distinction between the nucleus and annulus, and a manifest disc height reduction; Grade V, where the nucleus pulposus was inhomogeneous and hypointense, without distinction between the nucleus and annulus, and a fully collapsed disc space. 17 The evaluation was performed on a certified radiologic workstation monitor using IMPAX® software (AGFA®). Because the limited field of view of the low-field MRI scanner resulted in geometric distortion of the L1/L2 level in some tall individuals, the grading used for comparison only included the L2/L3 to L5/S1 lumbar levels. The radiologists were blinded to any clinical information, and no communication regarding the grading between the readers was allowed. The high-field and low-field MR images were graded with a hiatus of three months between the two readings. A subsample of 20 high- and low-field MRIs was regraded by all radiologists 6 months later.
In addition, the most experienced spine radiologist (C) evaluated all scans for disc herniations, high intensity zones (HIZ), subchondral endplate changes (Modic changes type 1 and type 2), spondylolisthesis, and spinal stenosis. 11
Statistical Analysis
Descriptive data are reported as point estimates (proportions and mean values ± standard deviations). Agreement of the scores between field strengths and readers was assessed by: (1) weighted kappa statistics for ordinal data using the PROC FREQ statement with AGREE in the TABLES statement for SAS (SAS Institute Inc., version 9.4, 64-bit edition for Windows), which by default uses Cicchetti–Allison weights, and (2) absolute agreement (i.e. levels with 1:1 agreement/the total number of levels). 23 Based on previous MRI agreement studies, 24 and the accepted guidelines for kappa values, the kappa values were interpreted as follows: 0–0.2 indicates slight agreement, 0.2–0.4 fair agreement, 0.4–0.6 moderate agreement, 0.6–0.8 substantial agreement and >0.8 almost perfect agreement. 25 Frequency distributions of the Pfirrmann grades were reported for each reader, and the McNemar test was used to compare the prevalence of each grade between field strengths.
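The two agreement measures described above can be illustrated with a minimal Python sketch. It is not the SAS procedure used in the study. It assumes that the Pfirrmann grades are coded as the integers 1–5 and that the Cicchetti–Allison weights reduce to linear weights for equally spaced scores. The function names, the handling of the band boundaries and the example ratings are hypothetical and are not study data.

```python
from collections import Counter

GRADES = (1, 2, 3, 4, 5)  # Pfirrmann grades I-V coded as integers (assumed coding)


def weighted_kappa(ratings_a, ratings_b, categories=GRADES):
    """Linearly weighted kappa for two paired lists of ordinal ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(ratings_a)
    span = categories[-1] - categories[0]

    def weight(i, j):
        # Linear (Cicchetti-Allison type) agreement weight: 1 on the diagonal,
        # decreasing with the distance between the two grades.
        return 1 - abs(i - j) / span

    pairs = Counter(zip(ratings_a, ratings_b))
    marg_a, marg_b = Counter(ratings_a), Counter(ratings_b)

    # Weighted observed agreement and weighted chance-expected agreement.
    p_obs = sum(weight(i, j) * c for (i, j), c in pairs.items()) / n
    p_exp = sum(
        weight(i, j) * marg_a[i] * marg_b[j]
        for i in categories
        for j in categories
    ) / n ** 2
    if p_exp == 1:
        raise ValueError("kappa is undefined when both readers use a single grade")
    return (p_obs - p_exp) / (1 - p_exp)


def absolute_agreement(ratings_a, ratings_b):
    """Share of levels on which both ratings are identical (1:1 agreement)."""
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return matches / len(ratings_a)


def interpret(kappa):
    """Map kappa to the bands quoted in the text (upper bounds taken as inclusive)."""
    if kappa < 0:
        return "less than chance"  # band not given in the text
    for upper, label in ((0.2, "slight"), (0.4, "fair"),
                         (0.6, "moderate"), (0.8, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings of eight disc levels by two readers (not study data).
reader_1 = [1, 2, 3, 4, 5, 3, 3, 2]
reader_2 = [1, 2, 3, 4, 5, 3, 4, 3]
kappa = weighted_kappa(reader_1, reader_2)
print(round(kappa, 3), interpret(kappa), absolute_agreement(reader_1, reader_2))
```

In this sketch a disagreement of one grade still counts as partial agreement, which is why the weighted kappa can be high even when the absolute agreement is modest.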
Results
Patient Characteristics
Seventy-five patients accepted participation and were first scanned in the high-field (3 T) scanner and subsequently scanned in the low-field (0.25 T) scanner. One patient could not enter the low-field scanner, five patients had a history of spine surgery, and seven patients could not complete both scans due to accentuated pain. A further six patients were excluded because the low-field examination had severe motion artefacts or because the scans did not include both the L2/L3 and L5/S1 disc levels. Therefore, the following analyses are based on 56 patients (29 females) aged 21–65 years. Back pain was reported on a 100 mm line using the visual analogue scale (VAS), where 0 indicates no pain and 100 indicates the worst pain imaginable. Back pain ranged from 18 to 100 mm during activities and 1 to 85 mm during rest. A majority of the patients (70%) had additional radicular symptoms (Table 2).
Patient characteristics.
a Based on the evaluation by one radiologist (ZR) and reported as number of levels with the specific MRI finding.
b Includes the superior or inferior endplate or both.
LBP, low back pain; VAS, 0–100 mm visual analogue scale; HIZ, high intensity zones.
Disc degeneration – high-field vs. low-field MRI
The three radiologists graded the same 244 lumbar discs on both high- and low-field MRI with a hiatus of 3 months. Low-field MRI resulted in numerically more high and low grades than high-field MRI for all three radiologists (Fig. 1). There was moderate to substantial agreement between field strengths, although there was a significant difference in the prevalence of the grades between low- and high-field MRI (Table 3).

The histogram illustrates the number of disc degeneration grades for 3 T high-field MRI (dark grey) and 0.25 T low-field MRI (light grey).
Number of grades, agreement and absolute agreement between high- and low-field MRI for disc degeneration evaluations.
Agreement (κ) between 3 T high-field and 0.25 T low-field MRI scanners was calculated by weighted kappa statistics for ordinal data. The McNemar test was used to compare the relative prevalence of the different grades and is given as p-values. Reader A = senior neuroradiologist; Reader B = junior radiologist; Reader C = senior musculoskeletal radiologist.
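As an illustration of how a paired comparison of grade prevalence can be set up, the following Python sketch applies an exact (binomial) McNemar test to a single grade. It assumes that each grade is dichotomised (grade present or absent at a disc level) and that the same levels are graded on both field strengths. The text does not specify whether the study used the exact or the asymptotic version of the test, so this is one possible reading. The function name and the example grades are hypothetical and are not study data.

```python
from math import comb


def mcnemar_exact(high_field, low_field, grade):
    """Two-sided exact McNemar p-value for one grade on paired readings.

    high_field and low_field are paired lists of grades for the same disc
    levels. Only discordant pairs (grade given on one field strength but not
    on the other) carry information about a difference in prevalence.
    """
    pairs = list(zip(high_field, low_field))
    only_high = sum(1 for h, l in pairs if h == grade and l != grade)
    only_low = sum(1 for h, l in pairs if h != grade and l == grade)
    n = only_high + only_low
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(only_high, only_low)
    # Under the null hypothesis the discordant pairs split 50:50, so the
    # smaller count follows a Binomial(n, 0.5) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired grades for 15 disc levels (not study data).
high = [2] * 10 + [3] * 5
low = [3] * 10 + [3] * 5
for g in (2, 3, 5):
    print(g, mcnemar_exact(high, low, g))
```

Because the test uses only the discordant pairs, a high level of overall agreement and a significant difference in prevalence can occur together, as in the results above.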
Inter- and intra-reader agreement
Table 4 presents the inter- and intra-reader agreement and absolute agreement for each reader, with corresponding 95% confidence intervals (CIs) for both the high- and low-field MRI scanners. There was a moderate to substantial inter- and intra-reader agreement for both field strengths. Between readers A and C there was a higher inter-reader agreement for low-field MRI than for high-field MRI. The reader with the shortest experience evaluating low-field MRI (B) showed a higher intra-reader agreement for high-field MRI compared to low-field MRI.
Agreement and absolute agreement of 3 T and 0.25 T MRI between radiologists.
The inter- and intra-reader agreement was calculated by weighted kappa statistics for ordinal data. Absolute agreement is given as number (no) and percent (%); Reader A = senior neuroradiologist; Reader B = junior radiologist; Reader C = senior musculoskeletal radiologist.
Discussion
Principal Findings
This study presents the reproducibility of Pfirrmann’s disc degeneration grading on a high-field (3 T) and a low-field (0.25 T) MRI scanner in a population similar to the original test of the scale (Fig. 2). 17 Low-field MRI grading resulted in numerically more high and low grades on the scale than high-field MRI. Agreement between field strengths was moderate to substantial, and the prevalence of the grades differed significantly between field strengths. The radiologist with the shortest experience evaluating low-field MRI had a better intra-reader agreement with high-field MRI. These are important findings, as Pfirrmann’s five-point grading system is considered a reliable and clinically feasible in vivo evaluation of disc degeneration, and is therefore widely used in research.11,12

Examples of Pfirrmann’s disc degeneration grades on 3 T (a) and 0.25 T (b) sagittal T2-weighted magnetic resonance images in the same patient and lumbar disc level.
Disc Degeneration – High-field vs Low-field MRI
The signal-to-noise ratio increases with higher field strengths and this can be used to either shorten the scan time or increase the image quality, which, in theory, should result in better reader performance. A previous study investigated the diagnostic reproducibility of high- versus low-field MRI for structural degenerative changes in the spine (i.e. disc herniation, central canal, lateral recess and foraminal stenosis as well as nerve root compression). 20 They found substantial to almost perfect agreement between experienced radiologists (κ-values: 0.71–0.92) and no significant difference between field strengths. In this study, we investigated a tissue property, which may be more dependent on the T2 signal. Even so, it is notable that we found only a moderate to substantial agreement, low absolute agreement, and a significant difference in the proportions of the grades between field strengths. Another study comparing high-field MRI to low-field MRI of another tissue property (Modic changes) also found a significant difference in proportions and reproducibility between field strengths. 21 The authors argued that more pronounced subchondral degeneration could be difficult to identify on high-field MRI because of increasing inhomogeneity of the signal in the endplate. Whatever the reason (field strength, tissue properties and/or sequence choice), these parameters seem to affect the disc degeneration grading. Therefore, studies investigating degenerative disc grading on MRI with different field strengths can be expected to generate different results. This adds another potential limitation to Pfirrmann’s grading system, which has been criticized for its subjectivity and poor definition of reduced disc height. 12 This calls for better standardisation of disc degeneration grading with better reproducibility to avoid bias in clinical studies.
Ideally, disc degeneration grading should be a continuous and possibly computerised measure, which would enable clinicians to follow the degenerative process during treatment and have the potential to distinguish early painful degenerative changes from age-related changes.9,26 Imaging modalities such as T2-mapping, T1rho, dGEMRIC, spectroscopy or sodium MRI have been considered for this purpose. However, the clinical implications of such new sequences on a patient level, as well as their reproducibility and associations with clinical LBP, are still largely unknown. 26
Inter-reader Agreement
The initial test of the grading system by Pfirrmann et al. was performed on a 1.0 T MRI scanner and they found an inter-reader agreement (κ-values) between 0.84 and 0.90 and absolute agreements between 88.0 and 92.3%. 17 In this study, we tried to make the readings as close to a clinical setting as possible, and therefore, no initial consensus training of the radiologists was conducted. In addition, only the senior musculoskeletal radiologist (C) was subspecialized in the interpretation of degenerative pathologies in the spine. These considerations may explain our lower inter-reader agreement and absolute agreements compared to the original reliability test of the grading system. In another clinical study, disc degeneration was graded on a four-item grading scale and a comparable moderate inter-reader agreement (κ-value: 0.59) was found on a low-field (0.2 T) MRI scanner. 27 Another similar study also graded the disc degeneration on Pfirrmann’s grading scale and found a similar inter-reader agreement with κ-values ranging from 0.63 to 0.70 on high-field MRI (1.5 T). 24 We observed a tendency towards better overall inter-reader agreement for low-field MRI compared with high-field MRI. This might be due to lower signal-to-noise ratio and lower spatial resolution of low-field MRI, resulting in fewer details on the mid-sagittal images.
Intra-reader Agreement
We observed a tendency towards better intra-reader agreement than inter-reader agreement, which has also been found in previous studies.24,27 In the original study by Pfirrmann et al., 17 the intra-reader agreement varied between 0.74 and 0.81 and the absolute agreement between 80.0% and 85.0%, which is better than our results. However, our intra-reader agreement is comparable with that reported for grading other degenerative pathologies in the spine.24,27 A likely reason for our lower intra-reader performance could again be the radiologists’ different experience evaluating degenerative spine diseases, especially on low-field MRI. This interpretation is supported by the results showing that the radiologist with the shortest experience evaluating low-field MRI had a better intra-reader performance with high-field MRI.
Limitation of the Study
For logistical reasons, all the patients were scanned first in the high-field MRI scanner and then in the low-field MRI scanner. This may represent a limitation, as the water content in the discs is known to decrease during the day because of the upright position. Ideally, the patients should have been randomised to either high- or low-field MRI as the first examination. However, this limitation may be mitigated by the fact that the low- and high-field MRI were performed on the same day with a maximum time span of three hours. Patients above age 65 were not included to ensure a higher prevalence of low disc degeneration grades in the study population. This may explain the fairly low mean age, which may limit the ability to generalise our MRI findings to other studies; however, this has not affected the overall aim of the study. Another limitation could be the use of a 3 T scanner as the representative for high-field MRI. In a clinical setting, 1.5 T MRI scanners are often used for spine imaging, which may also limit the ability to generalise our findings into a clinical context. Further, 3 T MRI scanners might have a different reproducibility in grading degenerative discs compared to 1.5 T or 1 T high-field MRI scanners. Finally, the kappa value for ordinal data depends on the prevalence in each category, which may lead to difficulties comparing the kappa values to those of other studies with a different prevalence in the categories. 28
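The dependence of kappa on prevalence can be illustrated with a small Python sketch. It compares two hypothetical 3 × 3 cross-tabulations that have identical absolute agreement (80%) and an identical pattern of one-step disagreements, but a different distribution of grades. The tables, the three-category scale and the function name are assumptions made for illustration only and are not study data. Linear weights are used, as in the sketch of the Cicchetti–Allison weighting for equally spaced scores.

```python
def linear_weighted_kappa(table):
    """Linearly weighted kappa from a square cross-tabulation of counts.

    table[i][j] is the number of discs given grade i by the first reading
    and grade j by the second reading.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    p_obs = sum(weight(i, j) * table[i][j]
                for i in range(k) for j in range(k)) / n
    p_exp = sum(weight(i, j) * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical tables with 100 discs each and 80 discs on the diagonal.
balanced = [[30, 5, 0],
            [5, 20, 5],
            [0, 5, 30]]
skewed = [[70, 5, 0],
          [5, 5, 5],
          [0, 5, 5]]

kappa_balanced = linear_weighted_kappa(balanced)
kappa_skewed = linear_weighted_kappa(skewed)
print(round(kappa_balanced, 2), round(kappa_skewed, 2))
```

In this example the skewed table yields the lower kappa although the readers agree on the same number of discs, because more of the observed agreement is attributed to chance when one grade dominates.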
In conclusion, a significant difference in the prevalence of disc degeneration grading was found between low- and high-field MRI of the lumbar spine when using Pfirrmann’s disc degeneration grading system. Moderate inter- and intra-reader agreement and absolute agreement were found in the current study, highlighting the need for dedicated training before the disc degeneration grading can be used with higher precision in both clinical practice and research settings.
Footnotes
Acknowledgements
The authors thank all participants and the MRI staff of the Department of Radiology.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Professor Boesen has received travel grants from ESAOTE, Genoa, Italy to hold invited lectures at ESSR 2015 and 2017 and ECR 2015. The other authors report no conflicts of interest.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by The Oak Foundation, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Savværksejer Jeppe Juhl og Hustru Ovita Juhls Mindelegat, Minister Erna Hamiltons Legat for Videnskab og Kunst and the Danish Rheumatism Association.
