Abstract
BACKGROUND:
Using region-specific outcome measures in combination to assess the disability status of the whole spine is a conventional approach. The Oswestry Disability Index (ODI) and the Neck Disability Index (NDI) are two outcome measures that are used together. The test re-test reliability of the ODI in healthy subjects has clinical importance, and the test re-test reliability of the NDI may likewise have clinical significance.
OBJECTIVE:
The purpose of this study is to investigate the test re-test reliability of the NDI in healthy office workers, using the long-term test re-test reliability method of the ODI.
METHODS:
Participants with no history of chronic neck pain were included in the study. Subjects were assessed with the Turkish NDI (e-forms) on days 1, 2, 4, 8, 15 and 30. Of 106 participants (57 female, 49 male), 49 (20 female, 29 male) completed the study. Kolmogorov-Smirnov and Friedman tests were used.
RESULTS:
The difference between the median scores of the days (χ² = 9.275, p > 0.05) was neither statistically nor clinically significant.
CONCLUSIONS:
The NDI has test re-test reliability in healthy subjects over a 1-month interval, and this reliability also holds when both questionnaires are used in combination over this interval.
Introduction
All segments of the spine are interrelated anatomically, biomechanically and functionally. For this reason, investigating the relationships between pain, disability and functional status of the spinal segments is important in both clinical practice and research [1]. Conventionally, the Oswestry Disability Index (ODI) and the Neck Disability Index (NDI) are used together to assess the disability status of the whole spine because of their common structural origin [5, 6].
To improve the clinical utility of this method, new outcome measures have been developed based on selected items of the ODI and NDI [2, 3].
The Functional Rating Index (FRI) was published in 2001. The FRI consists of 10 items, nine of which come from the ODI and NDI (3 items from the ODI, 1 item from the NDI and 5 common items) [3]. The Total Disability Index (TDI) was published in 2016. The TDI consists of 14 items (9 items from the ODI and 5 items from the NDI) [2]. The correlations between the TDI items and the original ODI and NDI are well studied [2].
One important result of a combined or merged assessment strategy is that it clarifies the interaction between spinal segments. Disability in the involved spinal segment may elevate the disability status of a currently uninvolved spinal segment [2, 4]. When only a single spinal segment is assessed, a secondary problem that arises from this intersegmental interaction can be overlooked. In the opposite case, the main pathology will be missed. Considering the effect of psychosocial factors such as stress on the elevated perception of pain, the probability of the second situation in the working environment is not negligible [7], even though the first situation seems more plausible.
Fairbank et al. published the first (original) version of the ODI in 1980 [8]. The ODI measures pain intensity and the effect of low back pain (disability) on nine different activities of daily living (lifting, ability to care for oneself, ability to walk, ability to sit, sexual function, ability to stand, social life, sleep quality and ability to travel). Each question (subscale) consists of six statements that range from the least disability (0) to severe disability (5). The patient performs a self-assessment by selecting, for each question, the statement that most closely resembles his or her situation. Disability is evaluated by calculating the total score, which ranges from 0 to 50, or from 0% to 100%. A total score of 0%–20% reflects minimal disability, 21%–40% moderate disability and 41%–60% severe disability; 61%–80% and 81%–100% are the high score bands.
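The conversion from item scores to a percentage can be sketched as follows. This is a minimal illustration that assumes all ten items (pain intensity plus the nine activities) are answered; the function name `odi_percent` is hypothetical, and the handling of unanswered items is not covered.

```python
def odi_percent(item_scores):
    """Convert ten ODI item scores (each 0-5) into a percentage of the maximum 50."""
    if len(item_scores) != 10:
        raise ValueError("expected exactly 10 item scores")
    if any(score not in range(6) for score in item_scores):
        raise ValueError("each item score must be an integer from 0 to 5")
    # Multiply before dividing so whole-number totals give exact results.
    return sum(item_scores) * 100 / 50


# Hypothetical respondent with a raw total of 3 out of 50.
example_items = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(odi_percent(example_items))
```

A raw total and its percentage carry the same information, so either can be reported as long as the scale is stated.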
Vernon and Mior developed the NDI based on the ODI and published it in 1991 [9]. In the draft versions of the NDI, the scoring system and six items of the ODI (pain intensity, personal care, lifting, sleep, driving and sex life) were used together with four additional items (headaches, concentration, reading and work). In pilot studies, the “sex life” item was rejected by the participants and replaced with recreation. The wording of two items (pain intensity and sleep) was also modified in the pilot studies.
Scoring system and interpretation intervals
The NDI and ODI have the same scoring system, but their disability interpretation intervals are not the same. This may be the only important point that can cause mistakes in the formulation and interpretation of combined assessment results. In the original NDI study, the suggested interpretation intervals differed from those of the ODI: a total score of 0–4 reflects no disability, 5–14 mild disability, 15–24 moderate disability, 25–34 severe disability, and 35 or higher complete disability [9] (see Table 1). Other interpretation intervals have also been suggested in the NDI literature [10, 11].
Comparison of NDI and ODI interpretation intervals
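The difference between the two sets of interpretation intervals can be illustrated with a short sketch. It assumes the NDI bands of the original NDI study and the ODI percentage bands described earlier; the function names and band labels are hypothetical, and the two highest ODI bands are labelled generically because the text only calls them high score bands.

```python
# Upper raw-score limits (0-50 scale) of the NDI bands from the original NDI study.
NDI_BANDS = [(4, "no disability"), (14, "mild"), (24, "moderate"),
             (34, "severe"), (50, "complete")]

# Upper percentage limits of the ODI bands.
ODI_BANDS = [(20, "minimal"), (40, "moderate"), (60, "severe"),
             (80, "high (61-80%)"), (100, "high (81-100%)")]


def band(value, bands):
    """Return the label of the first band whose upper limit covers the value."""
    if value < 0:
        raise ValueError("score cannot be negative")
    for upper, label in bands:
        if value <= upper:
            return label
    raise ValueError("score above the top of the scale")


def ndi_band(raw_score):
    return band(raw_score, NDI_BANDS)


def odi_band(raw_score):
    # ODI bands are defined on the percentage scale (raw score out of 50).
    return band(raw_score * 100 / 50, ODI_BANDS)


raw = 12  # the same raw score read on both instruments
print(ndi_band(raw), "|", odi_band(raw))
```

The same raw score of 12 falls in the mild band of the NDI but, as 24%, in the moderate band of the ODI, which is the kind of interpretation mismatch to watch for in combined assessments.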
The minimal clinically important change (MCIC) also differs. The MCIC for the ODI is 9%, and a change of 3 to 5 points is conventionally used for the NDI, although some studies suggest higher values such as 10.5 and 10.2 points [12–14]. A smaller value of 1.66 has also been suggested for the NDI in clinically stable patients [15].
In 2000, Fairbank clearly stated the need for new studies on healthy participants for ODI [16]. Fairbank may have had doubts about the control group response to ODI for both research and clinical applications [17–19].
In our previous study, we focused on the long-term test re-test reliability of the ODI in healthy male office workers, following Fairbank's recommendations [17, 20]. We assessed healthy male participants on days 1, 2, 4, 8, 15 and 30 to determine the stability of ODI scores over time. The difference between the median score of day 1 and those of the other days was neither statistically (χ² = 6.482, p > 0.05) nor clinically significant. We showed that the ODI has long-term test re-test reliability in healthy subjects (n = 36) over a one-month interval [17].
The NDI studies reported by Vernon in his review fall into the following categories: 21 studies on the psychometric properties of the NDI, 14 studies in which the NDI was used as a follow-up instrument, and 2–14 clinical trials in which the NDI was used as an outcome measure for different treatment modalities [21]. Among the studies on psychometric properties, only one has a control group [21]. The minimum baseline NDI score was 2.4 in the studies that followed up prognosis [16, 21]. The clinical trials have 1–3 NDI follow-up scores, and the baseline NDI score is not lower than 13 in all but one study [15, 21].
Compared with the ODI literature, the NDI literature contains very long-term studies (from 2–3 years up to 17 years) [20, 22]. However, the NDI is generally used as a follow-up instrument in these studies. Improvement in health status is calculated relative to baseline scores rather than relative to healthy controls, as is done in ODI studies [21]. For this reason, the test re-test reliability of the NDI in healthy control groups may be an open issue in the literature.
In our previous study, we clarified the normal values of the ODI in healthy subjects and its test re-test reliability over a one-month interval. This result can be used in clinics in cases where the main disability is in the cervical spine: when treating the neck, the clinician can use our previous findings to identify whether there is a significant elevation in the disability status of the lumbar spine. For the inverse cases, where the main problem is in the lumbar region, clinical decision support data for the cervical region are lacking.
We wanted to investigate the stability of the NDI over the same time intervals as the ODI for the following reasons: the interaction between low back disability and neck disability, the use of the NDI and ODI together for disability assessment of the whole spine, and the common structural origin of the ODI and NDI.
The purpose of this study is to investigate the relatively short-term test re-test reliability of the NDI in healthy office workers, using the long-term test re-test reliability method of the ODI.
Methods
Population
Participants with no history of chronic neck pain were included in this study. The presence of current neck pain was evaluated with a visual analogue scale (VAS) prior to the study to ensure that participants met the inclusion criteria.
Participants with any of the following conditions were excluded from the study: a history of chronic neck pain, NDI scores indicating more than a mild disability level, acute neck pain during the study, previous surgical operations related to neck pain, previous physical therapy interventions related to neck pain, inability to speak Turkish fluently, inability to complete the questionnaires by themselves, or a request not to participate in the study [17].
How were the subjects recruited?
Subjects were invited to join the study by e-mail. Invitations were sent to administrative staff members and engineering faculty students. The study was described in detail to the candidates who responded to the invitation. Candidates who voluntarily agreed to participate and met the study acceptance criteria were included in the study. All subjects gave their written consent.
The study was approved by the authorized local ethical committee for human research (35877407/2015).
Data collection
Data were collected using a web-based NDI database. Participants were invited by reminder e-mails to complete the web-based NDI forms. Reminder e-mails were sent on days 1, 2, 4, 8, 15 and 30 [17]. Participants were asked to submit the NDI forms within 24 hours. Late submissions were not processed, and the participants concerned were excluded from the study [17].
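The reminder schedule and the 24-hour submission rule can be expressed as a short sketch. It assumes that day 1 is the day of the first reminder and that the 24-hour window is measured from the time each reminder is sent; neither detail is specified above, and the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

ASSESSMENT_DAYS = (1, 2, 4, 8, 15, 30)
SUBMISSION_WINDOW = timedelta(hours=24)


def reminder_times(first_reminder):
    """Return one reminder time per assessment day, counting day 1 as the first reminder."""
    return [first_reminder + timedelta(days=day - 1) for day in ASSESSMENT_DAYS]


def is_on_time(reminder_sent, submitted):
    """A form counts as on time if it arrives within 24 hours of its reminder."""
    return reminder_sent <= submitted <= reminder_sent + SUBMISSION_WINDOW


# Hypothetical start date, used only for illustration.
start = datetime(2015, 3, 2, 9, 0)
for sent in reminder_times(start):
    print(sent.isoformat())
```

Under this reading, a participant with any late form would be dropped from the analysis, which matches the exclusion rule described above.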
Most of the subjects were administrative staff members who worked 8 hours per day. The remaining participants were engineering faculty students who used a computer for at least 5 hours per day [23–25].
Data analysis
A Kolmogorov-Smirnov test was used to check whether the data were normally distributed or not. A Friedman test was used to investigate the statistical significance of the differences between the NDI scores on the 1st day through the 30th day [17].
All tests were performed by using SPSS 11.
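For readers who want to see what the Friedman test computes, the following is a minimal pure-Python sketch of the tie-corrected Friedman chi-square statistic. It is an illustration only, not the SPSS procedure used in the study. The function names and the toy scores are hypothetical and are not study data. With six assessment days the statistic has 5 degrees of freedom, for which the 5% critical value of the chi-square distribution is 11.070.

```python
def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(scores):
    """scores: one row per subject, one column per assessment day."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in scores:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    statistic = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all scores are tied within every subject")
    return statistic / correction


CRITICAL_DF5_ALPHA05 = 11.070  # chi-square critical value, 5 degrees of freedom

# Hypothetical NDI raw scores: 4 subjects x 6 assessment days.
toy_scores = [
    [3, 4, 3, 2, 3, 3],
    [0, 1, 0, 0, 1, 0],
    [6, 5, 6, 7, 5, 6],
    [2, 2, 3, 2, 1, 2],
]
stat = friedman_chi_square(toy_scores)
print(round(stat, 3), stat < CRITICAL_DF5_ALPHA05)
```

A statistic below the critical value corresponds to p > 0.05, that is, no evidence of a systematic shift in scores across the assessment days.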
Results
Subjects
At the end of the study, 49 (20 female, 29 male) of the 106 enrolled participants (57 female, 49 male) remained, a completion rate of 46% (49/106). The excluded participants and the reasons were as follows: 21 subjects whose first assessment scores were higher than those of the healthy population, 22 subjects who did not answer the first NDI assessment properly, and 14 subjects who did not complete and submit the re-test questionnaires within 24 hours.
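The participant flow can be checked with simple arithmetic. The sketch below only recomputes the counts reported above; the variable names are hypothetical.

```python
enrolled = 106

# Exclusion counts as reported in the text.
excluded = {
    "first assessment score above the healthy range": 21,
    "first NDI assessment not answered properly": 22,
    "re-test questionnaires not submitted within 24 hours": 14,
}

completers = enrolled - sum(excluded.values())
completion_percent = completers * 100 / enrolled
questionnaires = completers * 6  # six assessment days per completer

print(completers, round(completion_percent, 1), questionnaires)
```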
The mean age of the participants was 26.46±6.19 years. The participants held a bachelor's degree (administrative staff members) or were bachelor's degree candidates (engineering faculty students).
NDI responses were not normally distributed. There were floor and ceiling effects, so the data were evaluated with a non-parametric approach. The mean score of the 294 questionnaires (6×49 = 294), out of the highest possible raw score of 50, was 4.64±2.83 (median 5, IQR 0–4) (see Table 2). The mean score of all questionnaires lay above the upper limit of the no disability range ([0–4]) and below the lower limit of the mild disability range ([5–14]). The scores of all subjects were between 0 and 12.
Mean and median NDI scores at different time intervals. No significant differences were found over time
NDI daily scores were not normally distributed (KS = 0.078, p < 0.05). There was no statistically significant difference among the median scores of the days (χ² = 9.275, p > 0.05). The median score differences from day 1 are presented in Table 3 and Fig. 1.
NDI score change from baseline. No significant differences were found over time

The number of subjects in each 3-point score interval is presented in Table 4.
Cumulative percentage of subjects per score interval. NDI scores in percentage and number of participants. Highest score can be 50
Discussion
The most important finding of this study is that the NDI, like the ODI, has relatively short-term test re-test reliability in healthy subjects over a 1-month interval. This reliability also holds when both questionnaires are used in combination over this interval.
In our previous study, we found that the change in ODI scores over cumulative time intervals in healthy subjects was neither clinically nor statistically significant. Despite the differences between the NDI and ODI, we found similar results for the NDI in healthy participants [17].
The second important finding of this study is that the NDI scores of healthy controls have a non-normal distribution, which may affect the choice of statistical tests for comparing patient groups with healthy controls. However, this result may be less important than in ODI studies, where healthy subjects are more commonly used as a control group [17]. In NDI studies, the first NDI score is generally used as a baseline assessment, and later assessments are compared with it.
The NDI scores of the healthy population were concentrated in the 0–12 interval. In our previous study, we also found that ODI scores were concentrated in the 0%–10% interval [17]. The NDI score changes from baseline were between –0.15±2.54 and –0.95±2.82 (Table 3 and Fig. 2). The reported minimal clinically important change of the NDI is 3 to 5 points for patient groups and 1.66 for clinically stable patients [12–15]. The changes were very small and not clinically significant. In our previous study, the ODI score changes from baseline in healthy participants were likewise small and not clinically significant (–0.1±3 and 0.9±3) [17].
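The comparison with the clinical thresholds amounts to a simple check, sketched below. It uses only the mean changes quoted above and does not account for their standard deviations, so it says nothing about individual participants. The names are hypothetical, and the lower end of the conventional 3 to 5 point range is taken as the patient-group threshold.

```python
MCIC_PATIENT_GROUPS = 3.0    # lower end of the conventional 3 to 5 point range
MCIC_STABLE_PATIENTS = 1.66  # value suggested for clinically stable patients

# Smallest and largest mean NDI changes from baseline quoted in the text.
mean_changes = [-0.15, -0.95]


def reaches_mcic(change, threshold):
    """True if the absolute change is at least as large as the threshold."""
    return abs(change) >= threshold


largest = max(mean_changes, key=abs)
print(largest,
      reaches_mcic(largest, MCIC_STABLE_PATIENTS),
      reaches_mcic(largest, MCIC_PATIENT_GROUPS))
```

Even the largest mean change stays below the smaller of the two thresholds, which is the sense in which the changes are described as not clinically significant.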

NDI score change from baseline. Neither statistically nor clinically significant differences were found over time.
Compared with the studies included in Vernon’s review, our study group has a much lower mean NDI score (4.64±2.83). In addition, our study had 6 time points for test re-test reliability, which is more than in the studies included in Vernon’s review.
The floor and ceiling effect is the reason for the non-normal distribution in healthy controls for both the ODI and NDI (Table 4). If researchers want to assess the health status of patient groups relative to healthy controls, non-parametric tests may be more suitable.
In the NDI literature, there are very long-term studies in which the NDI was used as a follow-up instrument without healthy control groups. In this study, the test re-test reliability of the NDI was tested in healthy subjects over a relatively short time interval. Future studies with longer time intervals (such as 6 months, 1 year or 2 years) may extend the findings of this study.
Although the results of this study and our previous study may allow a positive prediction for the test re-test reliability of the FRI and TDI (merged use) in healthy participants, the results are applicable only to the combined use of the ODI and NDI. Similar studies of the FRI and TDI are necessary to establish their test re-test reliability in healthy controls.
Conclusions
The NDI and ODI are used together for disability assessment of the whole spine. To improve the clinical effectiveness of this combined use, new hybrid outcome measures have been developed from selected items of the ODI and NDI.
In this study, the test re-test reliability of the NDI was tested using the cumulative time interval method we developed for the ODI. Despite the differences between the ODI and NDI, the test re-test reliability of the NDI is very similar to that of the ODI. As with the ODI, the median NDI score of day 1 did not differ statistically or clinically from those of days 2, 4, 8, 15 or 30.
In conclusion, the NDI has relatively short-term test re-test reliability in healthy subjects over a 1-month interval, and this reliability also holds when both questionnaires are used in combination over this interval.
Conflict of interest
None to report.
Acknowledgments
In the review process of the manuscript, there was an unavoidable delay. I would like to thank the reviewers and editor for their understanding.
