Abstract
Background
Disc degeneration quantification is important for monitoring the effects of new therapeutic methods, such as cell and growth factor therapy. Magnetic resonance (MR) image texture reflects biochemical and structural tissue properties and has been used for differentiating between normal and pathological status in a variety of medical applications.
Purpose
To investigate the suitability of textural descriptors for the quantification of intervertebral disc degeneration using conventional T2-weighted magnetic resonance images of the lumbar spine.
Material and Methods
Conventional T2-weighted MR images were acquired on a 3 Tesla scanner, and a total of 255 lumbar discs were analyzed. An atlas-based method was used for segmenting the disc regions from the images. A set of first and second order statistics describing the texture of each region was calculated. The validity and reliability of these descriptors for quantifying disc degeneration severity were tested through their correlation with patient age and with qualitative clinical grading of degeneration severity. Texture quantification results were compared with a widely accepted method for disc degeneration quantification based on measurement of the disc's mean signal intensity.
Results
Out of the set of texture descriptors tested, two descriptors quantifying image intensity inhomogeneity, i.e. the grey level standard deviation and the co-occurrence derived sum of squares, displayed the strongest association with patient age and clinical grading of disc degeneration severity (P < 0.001). This is attributed to the capability of these inhomogeneity descriptors to capture the progressive loss of nucleus-annulus distinction in the degenerative process. Statistical analysis indicates that these descriptors can effectively distinguish between early stages of degeneration. Quantitative measurements are highly repeatable (intraclass correlation >0.98).
Conclusion
Inhomogeneity descriptors could be a valuable tool for tracking the evolution of disc degeneration and monitoring the response to treatment in a simple, precise and repeatable manner.
Disc degeneration is a complex process involving both structural disruption and cell-mediated changes in disc composition (1). Magnetic resonance imaging (MRI) is a good modality for evaluating intervertebral disc degeneration (2), providing both morphological and biochemical information (1). Grading systems for degeneration severity evaluation are based on qualitative descriptions of disc image descriptors such as the signal intensity, height and distinction between disc nucleus and annulus (3–5). However, these systems have a limited number of degeneration severity classes (typically 3–5 classes) which impairs the detection of small changes in intervertebral discs (6). Moreover, this qualitative evaluation is susceptible to inter- and intra-observer variabilities (7).
On the other hand, the quantification of disc characteristics can provide an objective and reproducible evaluation of degeneration severity (6). Moreover, the continuous nature of quantitative measures renders them more sensitive to small changes (6, 8). The need for objective diagnostic methods increases with the development of emerging treatment technologies, such as nucleus replacement, cell therapy, and growth factor therapy, which require precise monitoring (9, 10).
A widely accepted method for disc degeneration quantification is the measurement of the disc's mean signal intensity from mid-sagittal T2-weighted MR images (4–6). Signal intensity reflects tissue biochemical properties and is correlated with the disc's proteoglycan content (5, 11). The decrease in the disc's mean signal intensity may be the earliest degenerative change seen in MRI and is a sensitive and reliable measure of degeneration severity (6). An intrabody reference, such as a region within the cerebrospinal fluid (CSF), is commonly used for intensity normalization in this approach (5, 8).
Alternative approaches to disc degeneration quantification include the measurement of T1, T1ρ and T2 relaxation times and of the apparent diffusion coefficient (10, 12–14). Overall, these quantitative approaches provide additional information to the conventional T2-weighted images, regarding both the biochemical composition and structural integrity of the intervertebral discs. However, this does not come without a cost. These approaches require specific image acquisition protocols with relatively long acquisition times, which complicate their application in clinical routine (13).
The present study introduces an automated method for texture-based quantification of disc degeneration severity from conventional T2-weighted spine images acquired in a clinical 3.0 Tesla MR scanner. The aim is to capture additional information, encoded in image texture, in order to assist the evaluation of disc degeneration severity. The notion of image texture is associated with the spatial distribution of pixel intensity variations and is known to be particularly sensitive for the assessment of pathology (15). Textural descriptors quantify image microstructure (16) and can thus be used to assess structural changes of the underlying tissue. Image-based texture analysis has been employed for distinguishing between healthy and pathological tissue in a variety of clinical problems and across a wide range of imaging modalities, from plain radiographs to CT, MRI, and ultrasound (17).
In the present study, the validity and reliability of texture descriptors for disc degeneration quantification are evaluated by: (a) testing their association with patient age and with clinical grading of disc degeneration severity; (b) comparing these results with those of the standard quantification method based on adjusted disc mean signal intensity; (c) evaluating the descriptors' ability to distinguish between discs of different degeneration severities; and (d) testing measurement repeatability.
Material and Methods
MRI data-sets and qualitative grading
Two hundred and fifty-five lumbar intervertebral discs from midsagittal T2-weighted images of 51 patients' lumbar spines (mean age 52, SD 15 years) were analyzed. Data were acquired using a 3.0 Tesla MRI scanner (Signa; General Electric, Milwaukee, WI, USA), and a fast relaxation fast spin echo imaging sequence with the following acquisition parameters: repetition time = 3000–3400 ms, echo time = 102–109 ms, flip angle = 90°, field of view = 28 cm, slice thickness = 3 mm, inter-slice gap = 1 mm, matrix = 512 × 512, voxel size = 0.55 × 0.55 × 3 mm³, number of averages = 1.
Pfirrmann's grading system was employed for qualitative clinical grading of disc degeneration from MR images (3). Three experienced radiologists independently reviewed the images and classified each of the 255 discs into one of five classes of degeneration severity (with I corresponding to normal and V corresponding to severely degenerated).
Cohen's Kappa statistic was employed for evaluating inter-reader agreement. Readers' consensus (given by the majority of three gradings) was used for establishing a grading truth for each individual disc. This consensus grading served for validating the effectiveness of the quantitative descriptors in the task of disc degeneration severity evaluation.
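The agreement and consensus steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: Cohen's kappa is defined for a pair of readers, and how the pairwise values were combined into a single three-reader figure is not stated here, so the sketch computes the unweighted pairwise statistic only. The function names, the example gradings, and the choice to return `None` when all three readers disagree are hypothetical.

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two readers' gradings."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two readers' marginal grade frequencies.
    expected = sum(count_a[g] * count_b[g] for g in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

def consensus(grades):
    """Majority grade among the readers, or None when no grade has a majority."""
    grade, votes = Counter(grades).most_common(1)[0]
    return grade if votes > len(grades) / 2 else None

# Hypothetical gradings (Pfirrmann grades 2 to 5), not data from the study.
reader1 = [2, 3, 3, 4, 5, 4]
reader2 = [2, 3, 4, 4, 5, 3]
reader3 = [2, 2, 3, 4, 4, 4]
truth = [consensus(g) for g in zip(reader1, reader2, reader3)]
```

A disc for which `consensus` returns `None` would need a tie-breaking rule, which the text does not specify.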
Disc degeneration quantification
Disc segmentation
Prior to quantifying disc information, the disc regions need to be segmented from the MR images. For this purpose an atlas-based segmentation method was employed (Fig. 1, white line). This is a hybrid method combining a probabilistic disc atlas with robust fuzzy clustering techniques. The fuzzy clustering approach exploits grey-level image information in order to assign the image pixels to tissue classes (i.e. disc, bone or CSF), providing compensation for partial volume effects and robustness against image noise. Additionally, the disc atlas introduces anatomical information into the segmentation process and helps control border leakage towards surrounding structures. The combination of these two techniques results in a robust segmentation method which has been shown to work effectively on both normal and degenerated intervertebral discs (18).

Fig. 1. Disc delineation resulting from the atlas-based segmentation (white line) and the manually defined CSF-ROI used as reference for adjusting disc signal intensity (black rectangle)
Texture-based quantification
A set of 14 textural image descriptors was calculated from each segmented disc region. These are four first order descriptors (the mean, standard deviation, skewness and kurtosis) and 10 second order co-occurrence descriptors (the angular second moment, correlation, sum of squares, inverse difference moment, entropy, contrast, difference entropy, sum average, sum entropy and sum variance). The definitions of these descriptors according to the original formulation by Haralick (16) are given in the Appendix.
First order descriptors are derived from the image histogram and quantify the distribution of grey-level values. Co-occurrence descriptors are calculated from co-occurrence matrices, which encode the spatial dependence of grey levels within the image; these descriptors therefore quantify the spatial distribution of grey-level values. Co-occurrence descriptors are perhaps the most commonly used method of texture analysis, due to their ability to summarize the image microtexture (15, 16).
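As an illustration of the first order descriptors, the sketch below computes the four statistics directly from the grey levels of a region, which is equivalent to computing them from the region's histogram. It assumes the population form of the moments (divisor n) and non-excess kurtosis; the function name and the two example regions are hypothetical.

```python
import math

def first_order_descriptors(pixels):
    """Mean, standard deviation, skewness and kurtosis of ROI grey levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    # Central moments of order 2, 3 and 4 (population form, divisor n).
    m2, m3, m4 = (sum((p - mean) ** k for p in pixels) / n for k in (2, 3, 4))
    std = math.sqrt(m2)
    return {"mean": mean, "std": std,
            "skewness": m3 / std ** 3, "kurtosis": m4 / std ** 4}

# Hypothetical regions: a disc with distinct annulus and nucleus grey levels,
# and a disc whose grey levels have collapsed into a single narrow range.
bimodal = [40, 42, 41, 90, 92, 91]
uniform = [40, 42, 41, 43, 41, 42]
print(first_order_descriptors(bimodal)["std"] > first_order_descriptors(uniform)["std"])
```

The region with two distinct grey-level populations yields the larger standard deviation, which is the behaviour the inhomogeneity descriptors rely on.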
A co-occurrence matrix is denoted as Pθ(i, j) and describes the frequency with which two grey levels (i and j) appear at a certain pixel distance (d) and in a particular direction (θ) within the region of interest (ROI). Typically, a separate co-occurrence matrix is calculated for each of the basic directions (horizontal 0°, diagonal 45°, vertical 90°, and antidiagonal 135°). In this study, a pixel distance d = 2 was used, and four co-occurrence matrices were calculated, one for each basic direction. The descriptor values were then derived from these matrices, and the mean value of each descriptor over the four matrices was taken as the final co-occurrence descriptor, averaging disc microtexture over direction.
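The construction described above can be sketched for one descriptor, the sum of squares. This is an illustration under several assumptions that are not taken from the study: the ROI is a rectangular array rather than an irregular disc mask, grey levels are already quantized to integers in the range 0 to `levels` − 1, each pixel pair is counted in both orders (a symmetric matrix), diagonal neighbours are d pixels away along both axes, and μ is taken as the mean grey level under the matrix marginal. All function names are hypothetical.

```python
def cooccurrence(roi, dx, dy, levels):
    """Symmetric, normalized co-occurrence matrix for pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                i, j = roi[y][x], roi[y2][x2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both orders
    total = sum(map(sum, counts))  # assumes the ROI is larger than the offset
    return [[c / total for c in row] for row in counts]

def sum_of_squares(p):
    """Co-occurrence variance: sum over (i - mu)^2 * p(i, j)."""
    levels = range(len(p))
    mu = sum(i * p[i][j] for i in levels for j in levels)
    return sum((i - mu) ** 2 * p[i][j] for i in levels for j in levels)

def mean_sum_of_squares(roi, levels, d=2):
    """Average the descriptor over the 0, 45, 90 and 135 degree matrices."""
    # Offsets as (dx, dy) with the row index increasing downwards.
    offsets = [(d, 0), (d, -d), (0, -d), (-d, -d)]
    return sum(sum_of_squares(cooccurrence(roi, dx, dy, levels))
               for dx, dy in offsets) / len(offsets)

# Hypothetical 6 x 6 regions with 4 grey levels.
two_tone = [[0, 0, 0, 3, 3, 3] for _ in range(6)]
flat = [[1] * 6 for _ in range(6)]
print(mean_sum_of_squares(two_tone, 4), mean_sum_of_squares(flat, 4))
```

A region containing two distinct grey-level populations gives a positive value, while a perfectly uniform region gives zero, mirroring the decrease in inhomogeneity as nucleus-annulus distinction is lost.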
Adjusted mean signal intensity
The adjusted mean signal intensity of each segmented disc region was additionally calculated and used as a basis for testing the effectiveness of the texture-based quantification.
For each segmented disc region, the adjusted signal intensity was calculated as the ratio of the mean disc signal intensity to the mean CSF signal intensity. The mean CSF signal was calculated by selecting a rectangular region of 3 × 5 pixels within the CSF in the anterior part of the dural sac adjacent to the disc (Fig. 1, black rectangle). This CSF region was selected in the proximity of the disc in order to minimize the effect of intensity non-uniformities, which could adversely affect the normalization value. Moreover, the ROI size was chosen to be small enough to fit the CSF region in the spinal canal, while being large enough to reduce the effect of noise on the calculated mean CSF value. In cases where the CSF signal intensity measurement was considered unreliable, due to flow artifacts or narrowing of the dural sac, the observation was recorded as missing and was excluded from the analyzed sample (4, 6).
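The ratio described above can be sketched in a few lines. The sketch assumes that the image is a two-dimensional array of grey levels, that the CSF rectangle is 3 rows by 5 columns (the orientation of the 3 × 5 region is not stated), and that it is centred on a single manually selected landmark; the function names and example values are hypothetical.

```python
def csf_roi(image, centre_row, centre_col, height=3, width=5):
    """Pixels of a height x width rectangle centred on the CSF landmark."""
    top, left = centre_row - height // 2, centre_col - width // 2
    return [image[r][c] for r in range(top, top + height)
            for c in range(left, left + width)]

def adjusted_mean_intensity(disc_pixels, csf_pixels):
    """Ratio of the mean disc signal to the mean CSF signal."""
    disc_mean = sum(disc_pixels) / len(disc_pixels)
    csf_mean = sum(csf_pixels) / len(csf_pixels)
    return disc_mean / csf_mean

# Hypothetical values: a uniform CSF patch of 200 and three disc pixels.
image = [[200] * 7 for _ in range(7)]
csf = csf_roi(image, 3, 3)
print(adjusted_mean_intensity([50, 100, 150], csf))
```

Because the result is a ratio, a global scaling of image intensities affects the disc and CSF means equally and cancels out.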
Statistical analysis
Pearson's r correlation coefficient was used for measuring the association between patient age and the quantitative descriptor values. In addition, Spearman's ρ correlation coefficient was employed for measuring the association between the qualitative clinical grading of disc degeneration according to Pfirrmann's scale and the quantitative descriptor values. Finally, the two-tailed unpaired Student's t-test was used for assessing the ability of these descriptors to discriminate among the degeneration grades (class separability P < 0.001).
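The three statistics named above can be sketched with the standard library alone. The sketch assumes that Spearman's ρ is computed as Pearson's r on average ranks (which handles the many ties produced by a five-level grading scale) and that the unpaired Student's t-test uses the pooled-variance form; it returns the t statistic and degrees of freedom only, since converting these to a P value requires the t distribution, which is not part of the standard library. Function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r applied to average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ssa = sum((v - ma) ** 2 for v in a)
    ssb = sum((v - mb) ** 2 for v in b)
    df = na + nb - 2
    pooled = (ssa + ssb) / df
    return (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb)), df
```

In this setting, `pearson_r` would take patient ages and descriptor values, `spearman_rho` would take consensus grades and descriptor values, and `student_t` would take the descriptor values of two neighboring grades.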
The repeatability of quantitative measurements was tested on a subset of 50 randomly selected intervertebral discs using intraclass correlation coefficients. The quantitative descriptor measurements are subject to small variations in the disc segmentation and CSF–ROI selection processes. Specifically, the atlas-based disc segmentation method requires manual selection of two initialization landmarks. This landmark selection can affect the segmentation result and thus cause small changes in the descriptor values calculated from the segmented disc region. Additionally, the CSF–ROI selection requires manual input of a single landmark corresponding to the centre of the ROI. This could result in a small variation of the calculated mean CSF value and consequently affect the value of the adjusted mean intensity descriptor. To test the effect of manual landmark selection on the quantitative descriptor values, the manual positioning of disc and CSF–ROI landmarks was performed twice. Two measurements of each quantitative descriptor were thus acquired, and intraclass correlation coefficients were calculated between these measurements in order to test their repeatability.
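A repeatability check of this kind can be sketched as below. The form of the intraclass correlation coefficient is not specified in the text; the sketch assumes the one-way random-effects, single-measurement form, ICC(1,1), computed from the between-disc and within-disc mean squares. The function name and example values are hypothetical.

```python
def icc_one_way(measurements):
    """One-way random-effects ICC(1,1); each row holds one disc's repeated values."""
    n, k = len(measurements), len(measurements[0])
    row_means = [sum(row) / k for row in measurements]
    grand = sum(row_means) / n
    # Between-disc and within-disc mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((v - m) ** 2 for row, m in zip(measurements, row_means)
              for v in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical descriptor values from two landmark placements on three discs.
repeated = [[10.0, 10.2], [20.0, 19.9], [30.1, 30.0]]
print(icc_one_way(repeated))
```

Values close to 1 indicate that the variation between the two landmark placements is small relative to the variation between discs.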
Results
The qualitative grading of degeneration severity yielded a Kappa coefficient of 0.695, indicating substantial agreement between the three radiologists (19). The distribution of the three experts' consensus (grading truth) was as follows: 52 discs were assigned to Grade II, 90 discs to Grade III, 88 discs to Grade IV and 25 discs to Grade V. Fig. 2 illustrates representative examples of discs assigned to the different degeneration grades. Note that no Grade I discs were found. This grade is used for discs with a bright nucleus and without a visible intranuclear cleft, which normally belong to pediatric patients, who were not represented in the data sample analyzed.

Fig. 2. Examples of discs with different degrees of degeneration severity. From left to right, the discs become darker and the distinction between nucleus and annulus is reduced, resulting in lower inhomogeneity descriptor values (more homogeneous appearance)
CSF reference samples were not available for 49 out of 255 discs. Specifically, these comprised six Grade II discs, nine Grade III discs, 20 Grade IV discs and 14 Grade V discs. The lack of a reference was more frequent in moderately and severely degenerated discs. This can be attributed to the fact that these discs are more likely to bulge or herniate, causing narrowing of the dural sac and altering the CSF flow, thus hindering the acquisition of a reliable CSF reference sample. The discs without a CSF reference were removed from our sample and only the remaining 206 discs were analyzed.
Table 1 summarizes the Pearson correlation coefficients between quantitative descriptor values and patient age, and the Spearman correlation coefficients between the same descriptors and the clinical consensus grading of degeneration severity. As expected, a negative correlation between the disc's adjusted mean signal intensity and age was obtained, which is indicative of the progression of degeneration with aging. The adjusted mean disc intensity was also negatively correlated with the clinical grading of disc degeneration, indicating the loss of signal as degeneration severity increases. The correlation was statistically significant in both cases (P < 0.001).
Table 1. Correlation of descriptors with patient age and clinical grading
For 12 out of 14 textural descriptors the correlation with patient age is weaker than the corresponding one for the disc's adjusted mean signal intensity (Table 1). In addition, 11 descriptors display weaker correlation with clinical grading of degeneration severity. However, two textural descriptors, the first order standard deviation and the co-occurrence derived sum of squares (also known as co-occurrence derived variance), demonstrate the strongest overall association with both patient age and clinical grading of disc degeneration severity (again P < 0.001). Moreover, the sum entropy descriptor displays a stronger association with clinical grading than the adjusted mean signal intensity, although its association with patient age is weaker. The standard deviation and sum of squares descriptors are considered representative of disc inhomogeneity, i.e. they quantify the progressive loss of nucleus-annulus distinction in the degenerative process. The negative sign of the resulting correlations indicates that disc inhomogeneity decreases with increasing age and increasing degeneration severity (i.e. degenerated discs appear more homogeneous).
The properties of the standard deviation and sum of squares inhomogeneity descriptors were further investigated. Fig. 3a illustrates the means and standard errors of the adjusted mean signal intensity, standard deviation and co-occurrence derived sum of squares descriptors as a function of the degeneration grade assigned to the discs, while Fig. 3b displays the percentage differences of mean descriptor values between neighboring grades. Table 2 gives the corresponding P values for grade separability, as calculated using the two-tailed unpaired Student's t-test. Results indicate that both the adjusted mean signal intensity and the inhomogeneity descriptors can separate Grade II from Grade III and Grade III from Grade IV (P < 0.001), but not Grade IV from Grade V.

Fig. 3. Comparison of (a) adjusted mean signal intensity and two texture descriptors in different grades of degeneration severity; (b) percentage difference in descriptor values between neighboring grades
Table 2. P values for separating discs in neighboring degeneration grades
Finally, the intraclass correlation coefficients used for testing measurement repeatability were found to be over 0.98 for both the adjusted mean signal intensity and the texture descriptors.
Discussion
Results indicate that out of the 14 textural descriptors tested, two descriptors measuring image intensity inhomogeneity, namely the grey level standard deviation and co-occurrence derived sum of squares, could provide good measures of disc degeneration severity. These inhomogeneity descriptor values display statistically significant correlation with both patient age and clinical diagnosis of disc degeneration, while measurement repeatability is very high.
Degeneration and aging result in nucleus dehydration (due to proteoglycan fragmentation) and the disc progressively becomes more fibrous (1, 20). As a result, degenerated discs appear darker in T2-weighted MR images, while the nucleus-annulus distinction progressively decreases (Fig. 2). Fig. 4 illustrates grey-level histograms for representative examples of discs assigned to Grades II, III, IV and V. Moving from Grade II to Grade V, the histogram shifts towards lower signal intensity values, indicating disc dehydration. Grades II and III are characterized by bimodal distributions, where the two peaks correspond to the mean annulus (left) and nucleus (right) grey levels. The histograms of Grades IV and V are unimodal distributions, indicating the complete loss of nucleus-annulus distinction. When moving from Grade II to Grade III, the peak corresponding to the disc's nucleus gradually shifts towards lower grey-level values, while in Grade IV the two peaks merge, indicating the loss of distinction between disc nucleus and annulus. This progressive loss of distinction results in narrowing of the histogram when moving to higher grades of degeneration, indicating the decrease in image inhomogeneity (i.e. the disc image becomes more homogeneous).

Fig. 4. Representative histograms of discs for the different grades of degeneration severity. The more degenerated the disc, the lower the mean value and the narrower the histogram (fewer intensity values dominate)
The standard deviation descriptor directly measures the variation of image grey levels. The sum of squares, on the other hand, is a measure of variance which additionally takes into account the spatial distribution of grey levels within the image. For the inhomogeneity descriptors, the correlations with patient age and with clinical grading of degeneration severity are stronger than the corresponding correlations for the discs' adjusted mean signal intensity (Table 1). This can be attributed to the ability of these descriptors to represent additional image information regarding grey-level variation. The effectiveness of inhomogeneity quantification may be related to biochemical and structural tissue alterations. In the degenerative process, proteoglycan fragmentation results in reduced water binding capacity of the nucleus. In addition, the fine type II collagen fibrils of the inner nucleus are slowly replaced by type I fibers (20). These alterations lead to nucleus dehydration and to the annulus encroaching on the nucleus, which macroscopically appear as a progressive loss of nucleus-annulus distinction. In MRI this results in an increase of image homogeneity, which is captured by the decrease of inhomogeneity descriptor values. Thus, by measuring the grey-level variations within the image, inhomogeneity descriptors can indirectly evaluate the structural disc alterations that result in loss of distinction between nucleus and annulus. By this we do not imply that inhomogeneity descriptors should replace the adjusted mean intensity method; rather, the two should be used together to obtain the maximum amount of information regarding disc degeneration severity.
In Fig. 3b the percentage differences between Grades II and III and between Grades III and IV indicate that inhomogeneity descriptors provide better separability between these degeneration grades than the adjusted mean intensity. In addition, Table 2 shows that the co-occurrence derived sum of squares has the lowest overall P values for these grades. Thus, these descriptors could be good measures for early and intermediate degeneration. On the other hand, the separability of Grades IV and V shows that texture descriptors cannot distinguish between severely degenerated discs. This is because severely degenerated discs have completely lost their nucleus-annulus distinction and thus appear almost equally homogeneous in MRI. Although textural information cannot help distinguish between these grades, shape quantification, such as the calculation of disc space narrowing, could assist in this task, since Grade V represents a collapsed disc space and thus a narrower disc than Grade IV (3).
Quantitative measurements of both adjusted mean disc intensity and inhomogeneity descriptor values were found to be highly reproducible (intraclass correlation >0.98). This can be partly attributed to the atlas-based segmentation approach used for extracting the disc region. Moreover, use of this algorithm significantly reduces the user interaction time required for segmenting the disc in comparison with manual segmentation (18), and thus helps to simplify and speed up disc quantification. In addition, since the proposed inhomogeneity quantification approach uses conventional T2-weighted MR images, it could be widely applicable in clinical practice. An additional advantage of inhomogeneity descriptors is that, being relative measurements, they are independent of grey-level shifting and do not require an intra-body reference for grey-level adjustment.
A limitation of this study is the lack of Grade I discs (according to Pfirrmann's scale), which represent the normal young population. In addition, longitudinal studies should be used for further investigating the ability of inhomogeneity descriptors to evaluate degeneration severity, especially in early stages, which are of particular importance for new therapeutic methods (9). Moreover, the effect of MR image acquisition parameters on disc quantification needs to be investigated, since texture descriptors can be sensitive to parameter variation. A recent study showed that co-occurrence descriptors are more robust than other texture analysis methods with respect to variations in imaging parameters and spatial resolution (21).
Future work will focus on quantifying shape properties of the intervertebral discs. Disc space narrowing will be quantified in order to evaluate changes in advanced degenerative stages, while disc shape compactness and boundary irregularity will be exploited for evaluating alterations such as disc herniation and end plate defects. Moreover, a computer-aided diagnosis system exploiting inhomogeneity descriptors for the automated classification of discs into the different classes of degeneration severity is currently being designed.
In conclusion, this study demonstrates the strong potential of disc signal inhomogeneity descriptors in evaluating intervertebral disc degeneration from conventional T2-weighted MR images. These descriptor values decrease with increasing severity of degeneration, reflecting the progressive loss of distinction between disc nucleus and annulus, and are particularly sensitive to early disc degeneration. The proposed texture-based approach is intended to be used for evaluating the progress of disc degeneration and assessing the outcome of new treatment methods, such as growth factor therapy. Being a quantitative tool, it can offer precise and objective monitoring which is sensitive to small changes. This texture-based quantification approach could be used in conjunction with the adjusted mean signal intensity method to offer additional information about the disc structure. From a practical point of view, the quantitative measurements have been automated to reduce user interaction, and the system can be easily incorporated in diagnostic workstations. The proposed approach provides simple, fast and repeatable quantification of disc degeneration from conventional MR images. Overall, inhomogeneity quantification could be a valuable tool for tracking the evolution of disc degeneration and monitoring the response to treatment.
ACKNOWLEDGEMENT
The authors wish to thank Mrs Alexandra Kazantzi, Dr Vera Mann and Dr Aikaterini Vassiou. Sofia Michopoulou is supported by a grant by the Greek State Scholarship Foundation.
Appendix
This section provides definitions for the texture descriptors used in this study. First order descriptors are defined in equations 1–4, while equations 5–14 provide the definitions of co-occurrence descriptors.
Mean
Standard Deviation
Skewness
Kurtosis
Angular second moment
Correlation
Sum of squares
Inverse difference moment
Entropy
Contrast
Sum average
Sum entropy
Sum variance
Difference entropy
Notation:
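For reference, the standard forms of these descriptors following Haralick's formulation can be written as below, numbered in the order listed above. The symbols are assumptions introduced here: g(n) is the grey level of the n-th of N pixels in the disc region; p(i, j) is the normalized co-occurrence matrix with N_g grey levels; μ_x, μ_y, σ_x and σ_y are the means and standard deviations of its row and column marginals; p_{x+y}(k) and p_{x−y}(k) are the distributions of the sum and absolute difference of the two grey levels; and μ is the mean grey level under the matrix marginal. The first order moments are shown with divisor N, and the sum variance is shown centred on the sum average, which is the commonly used form; both choices are assumptions.

```latex
\begin{align}
\mu &= \frac{1}{N}\sum_{n=1}^{N} g(n) \tag{1}\\
\sigma &= \sqrt{\frac{1}{N}\sum_{n=1}^{N}\bigl(g(n)-\mu\bigr)^{2}} \tag{2}\\
\text{Skewness} &= \frac{1}{N\sigma^{3}}\sum_{n=1}^{N}\bigl(g(n)-\mu\bigr)^{3} \tag{3}\\
\text{Kurtosis} &= \frac{1}{N\sigma^{4}}\sum_{n=1}^{N}\bigl(g(n)-\mu\bigr)^{4} \tag{4}\\
\text{Angular second moment} &= \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)^{2} \tag{5}\\
\text{Correlation} &= \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} i\,j\,p(i,j)-\mu_x\mu_y}{\sigma_x\sigma_y} \tag{6}\\
\text{Sum of squares} &= \sum_{i=1}^{N_g}\sum_{j=1}^{N_g}(i-\mu)^{2}\,p(i,j) \tag{7}\\
\text{Inverse difference moment} &= \sum_{i=1}^{N_g}\sum_{j=1}^{N_g}\frac{p(i,j)}{1+(i-j)^{2}} \tag{8}\\
\text{Entropy} &= -\sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j)\log p(i,j) \tag{9}\\
\text{Contrast} &= \sum_{k=0}^{N_g-1} k^{2}\,p_{x-y}(k) \tag{10}\\
\text{Sum average} &= \sum_{k=2}^{2N_g} k\,p_{x+y}(k) \tag{11}\\
\text{Sum entropy} &= -\sum_{k=2}^{2N_g} p_{x+y}(k)\log p_{x+y}(k) \tag{12}\\
\text{Sum variance} &= \sum_{k=2}^{2N_g}\bigl(k-\text{Sum average}\bigr)^{2}\,p_{x+y}(k) \tag{13}\\
\text{Difference entropy} &= -\sum_{k=0}^{N_g-1} p_{x-y}(k)\log p_{x-y}(k) \tag{14}
\end{align}
% with
% p_{x+y}(k) = \sum_{i+j=k} p(i,j), \quad k = 2,\dots,2N_g
% p_{x-y}(k) = \sum_{|i-j|=k} p(i,j), \quad k = 0,\dots,N_g-1
```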
