Abstract
BACKGROUND:
The Oxford shoulder score (OSS) questionnaire, which measures patients' perception of shoulder disability, has not been tested specifically in a non-surgical population, and no study has assessed the OSS with modern psychometric methods based on the Rasch model (RM).
OBJECTIVE:
To assess the psychometric properties of the OSS using the RM among health-care workers with shoulder disorders and to examine its relevance in a non-surgical population.
METHODS:
In the occupational health department of a French hospital center, medical records from June 2019 to October 2020 were reviewed retrospectively. Responses to 110 questionnaires from 55 subjects (97% women) were examined. A polytomous Rasch model based on the Partial Credit Model was used.
RESULTS:
Overall fit was satisfactory, the reliability coefficient was high, and the 5 response categories of the scale were in ascending order. Analysis of the residuals supported unidimensionality and local independence. Item performance remained stable across the subgroups examined (differential item functioning, DIF). Scale-to-sample targeting indicated a substantial floor effect, and the mildest impairments were not well discriminated.
CONCLUSIONS:
The OSS shows good psychometric properties. However, it does not clearly discriminate subjects with the lowest levels of impairment, so its use in a non-surgical population is questionable.
Keywords
Introduction
Physicians can use a validated questionnaire in clinical practice to assess the functional level of activity of patients with shoulder disorders [1, 2]. Numerous questionnaires are available, differing in length, psychometric properties and target population [2–4]. One of the most commonly used shoulder-related functional questionnaires is the Oxford shoulder score (OSS), first described and validated by Dawson et al. in 1996 [5]. It was designed specifically to assess the outcomes of shoulder operations, excluding stabilization procedures [5]. Its format makes it easy to administer, and patient compliance is high [3]. It has been adapted into different languages and is widely used internationally [6–8].
However, some authors note that the reliability and validity of the OSS have not been tested specifically in a non-surgical population [3, 9]. Moreover, to our knowledge, the OSS has never been evaluated using the Rasch approach [9–11]. The Rasch model (RM) has well-documented advantages over classical test theory (CTT) [10, 11]. In particular, the RM allows the key criteria of “objective measurement”, such as unidimensionality, monotonicity and local independence, to be assessed [11]. If these criteria are met, raw scores can be converted to a linear interval scale, so that parametric statistical techniques can be used with confidence [10, 11]. The RM can thus be used to improve the evaluation of patient-reported outcome measures (PROMs) [10, 11].
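The conversion of raw scores to a logit scale can be illustrated with the simplest (dichotomous) Rasch model. This is an illustrative sketch, not the polytomous model used in this study: the five item difficulties are hypothetical, and the person measure is taken as the maximum-likelihood value of θ at which the expected raw score equals the observed one, found by Newton-Raphson.

```python
import math


def rasch_probability(theta, difficulty):
    """P(endorsed) for a person at theta on an item of given difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def raw_score_to_logit(raw, difficulties, tol=1e-10, max_iter=100):
    """Maximum-likelihood person measure (logits) for a non-extreme raw score.

    Solves sum_i P_i(theta) = raw with Newton-Raphson; the derivative of the
    expected score is the test information sum_i P_i (1 - P_i).
    """
    n_items = len(difficulties)
    if not 0 < raw < n_items:
        raise ValueError("extreme raw scores have no finite maximum-likelihood estimate")
    theta = math.log(raw / (n_items - raw))  # start from the log-odds of the raw score
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw - sum(probs)) / information
        step = max(-1.0, min(1.0, step))  # damp large steps for stability
        theta += step
        if abs(step) < tol:
            break
    return theta


# Hypothetical item difficulties (logits), for illustration only.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
measures = {raw: raw_score_to_logit(raw, difficulties) for raw in range(1, 5)}
for raw, theta in measures.items():
    print(f"raw score {raw} -> {theta:+.2f} logits")
```

Equal one-point steps in raw score do not correspond to equal distances in logits: with these difficulties the steps near the extremes of the scale are wider than the step in the middle, which is why raw totals are ordinal rather than interval measures.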
Work-related shoulder disorders are among the leading causes of occupational disease among healthcare workers (HCW) [12, 13]. The shoulder is frequently affected by occupational tasks in the hospital environment, such as nursing care, manual handling and pushing/pulling [14–17]. Some authors report a relatively high prevalence of shoulder symptoms in HCW: 37.8% [18], 44.8% [19], 55.0% [20], 60% [21] and 85.8% [16]. Among HCW, shoulder disorders were found to be a major cause of absenteeism and of requests for a change of duty or job [13]. PROMs such as the OSS are therefore needed to assess shoulder disorders during occupational medical visits in hospitals. However, this raises the question of whether the OSS is suitable for measuring shoulder-related impairment in this non-surgical population.
We therefore conducted a study among HCW with shoulder disorders to assess the psychometric properties of the OSS using the RM and to examine its relevance in a non-surgical adult population.
Methods
Study population
This study was conducted in the Department of Occupational Health of a general hospital in France (approximately 2350 HCW). The target population was the HCW employed at this hospital, not its patients.
Under French labor law, each HCW must have a medical visit with an occupational physician at least every two years and when returning to work after a sick leave of more than one month. During these visits, the OSS was routinely used in the occupational health department to assess shoulder disability. All medical records of occupational visits between June 2019 and October 2020 were therefore reviewed retrospectively to identify HCW who had completed the OSS form (self-administered on paper) during a visit. A questionnaire was included if at least one of the twelve items was answered. A clinical examination was performed in all cases, and the diagnosis of the shoulder pathology was based on medical imaging reports. Socio-demographic data on the subjects were also analyzed.
The inclusion criteria were age ≥18 years, ability to converse and read in French, and a shoulder disorder of degenerative, inflammatory or traumatic origin. In line with the original OSS paper by Dawson et al. (1996), HCW with shoulder instability (symptoms of dislocation or subluxation) were excluded [5, 22], since a specific questionnaire exists for shoulder instability [23]. Shoulder pain of neurological or cardiovascular origin and language difficulties were also exclusion criteria.
Statistical unit (shoulder)
The level of shoulder disability was assessed at each medical visit with the OSS questionnaire and a clinical examination. During the study period (June 2019 to October 2020), an HCW could be seen at several occupational medical visits and complete several questionnaires. To measure the level of shoulder disability observed in occupational medical visits, all questionnaires were retained whatever the clinical situation (first or follow-up visit). The statistical unit of this study was therefore the shoulder assessed by one OSS questionnaire at a given medical visit. A subject with bilateral symptoms completed two questionnaires at the same visit. Finally, for subjects who completed the OSS several times, a questionnaire was retained only if the time interval since the previous one was longer than four weeks. This interval was chosen because the OSS assesses disability over the past four weeks (see Dawson et al., 1996) [5].
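The retention rule for repeated questionnaires can be written as a small filter. This is a hypothetical sketch: the record layout (subject, shoulder, completion date) is assumed, the interval is measured from the last retained questionnaire for the same shoulder, and an interval of exactly 28 days is treated as retained (the text says “longer than four weeks” here and “less than 28 days” for exclusions, so the boundary case is an assumption).

```python
from collections import defaultdict
from datetime import date


def select_questionnaires(records, min_days=28):
    """Retain questionnaires spaced at least `min_days` apart for each (subject, shoulder).

    records: iterable of (subject_id, shoulder, completion_date) tuples (hypothetical layout).
    Each shoulder is its own statistical unit, so bilateral cases are filtered separately.
    """
    dates_by_unit = defaultdict(list)
    for subject_id, shoulder, completed in records:
        dates_by_unit[(subject_id, shoulder)].append(completed)

    retained = []
    for (subject_id, shoulder), dates in dates_by_unit.items():
        last_retained = None
        for completed in sorted(dates):
            # Compare with the last retained questionnaire, not the last completed one.
            if last_retained is None or (completed - last_retained).days >= min_days:
                retained.append((subject_id, shoulder, completed))
                last_retained = completed
    return retained


records = [
    ("HCW01", "right", date(2019, 6, 3)),
    ("HCW01", "right", date(2019, 6, 20)),  # 17 days later: excluded
    ("HCW01", "right", date(2019, 9, 2)),
    ("HCW01", "left", date(2019, 6, 3)),    # bilateral case: separate unit
    ("HCW02", "right", date(2020, 1, 10)),
]
kept = select_questionnaires(records)
print(len(kept), "questionnaires retained")
```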
OSS questionnaire
The OSS is recommended for evaluating disability in patients with shoulder disorders without instability [5, 22]. It is a self-assessment instrument containing 12 items [5, 22]: 4 items on pain (2 on pain itself, 2 on interference due to pain) and 8 on daily functions. Respondents rate each symptom or problem experienced within the past month on a 5-point Likert scale (from 0 to 4). A total score ranging from 0 (worst outcome) to 48 (best outcome) is obtained by adding the item scores (there are no subscores) [22]. Use of the OSS does not require any endorsement [3]. We used the French translation reported by Tuton et al. [7].
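The scoring rule above can be sketched as follows. The reverse coding in `to_disability_coding` is a hypothetical helper added for illustration only: the Rasch analysis places higher disability on the positive side of the scale, and the item table treats 4 as the worst value, which suggests that items were recoded so that higher scores mean more disability, but the text does not describe this step.

```python
VALID_CODES = {0, 1, 2, 3, 4}
N_ITEMS = 12


def oss_total(responses):
    """Total OSS score: sum of 12 items coded 0-4, from 0 (worst) to 48 (best)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    if any(r not in VALID_CODES for r in responses):
        raise ValueError("each item must be coded 0, 1, 2, 3 or 4 (no missing values)")
    return sum(responses)


def to_disability_coding(responses):
    """Hypothetical recoding so that higher item scores mean more disability (4 - score)."""
    return [4 - r for r in responses]


example = [4, 3, 3, 2, 4, 4, 3, 2, 4, 3, 3, 4]  # made-up responses
print(oss_total(example), sum(to_disability_coding(example)))
```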
In accordance with French legislation, this study was declared to the National Commission for Data Protection (no. 2219869) under reference methodology MR-004 and registered in the public directory of studies of the National Institute of Health Data. Under the MR-004 methodology, approval from an ethics committee was not required. All volunteer participants gave their informed consent, and the data were collected anonymously.
Statistical analyses
The RM was used to test the psychometric properties of the OSS [10, 11]. The RM is a probabilistic method used to analyze rating scales and to evaluate a latent variable that cannot be measured directly. It uses a logistic function to transform raw ordinal scores into interval-level measurements (expressed in logits). It estimates item difficulty (item measure) relative to person disability (person measure) by placing both on the same linear continuum of the latent variable. The positive (upper) part of the scale represents more difficult items and persons with higher disability, while the negative (lower) part represents less difficult items and persons with lower disability. We used an RM for polytomous ordered responses based on the Partial Credit Model (PCM). All analyses were performed using R, version 3.1.0, with the R packages TAM, lordif, eRm and ltm [10].
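The PCM can be made concrete with a short sketch of the category probabilities it assigns to one item: the probability of scoring k is proportional to the exponential of the sum, over j ≤ k, of (θ − δ_j), where θ is the person measure and δ_j the item's j-th threshold. This is an illustrative re-implementation, not the code of the TAM or eRm packages, and the threshold values are hypothetical.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities P(X = 0..m | theta) for one item under the Partial Credit Model.

    theta: person measure (logits); thresholds: step difficulties delta_1..delta_m
    (m = 4 for a 5-category OSS item). Category 0 has an empty sum, i.e. 0.
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    largest = max(cumulative)  # subtract before exponentiating to avoid overflow
    weights = [math.exp(c - largest) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def pcm_expected_score(theta, thresholds):
    """Expected item score sum_k k * P(X = k | theta)."""
    return sum(k * p for k, p in enumerate(pcm_probabilities(theta, thresholds)))


thresholds = [-1.5, -0.5, 0.5, 1.5]  # hypothetical, ordered thresholds
for theta in (-2.0, 0.0, 2.0):
    probs = pcm_probabilities(theta, thresholds)
    print(theta, [round(p, 3) for p in probs], round(pcm_expected_score(theta, thresholds), 2))
```

Adjacent categories k − 1 and k are equally likely exactly when θ equals δ_k; when the thresholds are ordered, each category in turn becomes the most probable one as θ increases, which is what the monotonicity check on the person-item map looks for.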
The RM analysis plan was as follows.
Results
A total of 110 OSS questionnaires were included. The 55 subjects completed the questionnaire twice on average, and five subjects presented bilateral symptoms. There was one refusal, and two questionnaires were excluded because they were completed less than 28 days after the previous one.
The clinical characteristics of the shoulders examined are presented in Table 1. Women accounted for 97% of the respondents, whose mean age was 44.8 (±9.3) years. Nursing assistant was the most represented occupation, and the most frequent diagnosis was rotator cuff pathology. The diagnosis was confirmed by imaging in 90% of cases. At the time of the medical visit, 41% of subjects were still on sick leave. The shoulder disorder had lasted more than 6 months in 72% of cases.
Clinical characteristics of the shoulders examined (n = 110*) and healthcare workers (n = 55**) from June 2019 to October 2020
MRI: magnetic resonance imaging; sd: standard deviation; VAS: visual analog scale. *Number of OSS questionnaires, each linked to one shoulder disorder; **at the first occupational visit; +from June 2019 to October 2020, a healthcare worker could complete the questionnaire several times depending on the clinical course of the disease; $at least one imaging examination of the shoulder (X-ray, ultrasound, MRI or CT scan); §from medical imaging.
The item descriptions are reported in Table 2. Item difficulties range from –0.9 logits for item 8 (the easiest) to 2.5 logits for item 4 (the most difficult). There was a floor effect for 8 items and no ceiling effect for any item. Regarding fit to the RM, all items had infit and outfit mean-square values between 0.6 and 1.3, so no underfit or overfit was found. The internal consistency of the OSS, expressed as Cronbach’s alpha, was high (≥0.9 for all items). The correlations between each item and the total score were also high (>0.7) except for one item (>0.6, moderate correlation), meaning that each item is a “good” contributor to what the test measures [29].
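The classical indices in this paragraph can be sketched with NumPy. Cronbach's alpha is k/(k − 1) × (1 − sum of item variances / variance of the total score). The paragraph does not say whether alpha was computed with each item deleted or whether the item-total correlations were corrected (item excluded from the total), so both variants are shown as assumptions; the data are hypothetical.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_variances / total_variance)


def alpha_if_item_deleted(scores):
    """Alpha recomputed with each item removed in turn (needs at least 3 items)."""
    x = np.asarray(scores, dtype=float)
    return [cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])]


def item_total_correlations(scores, corrected=True):
    """Pearson correlation of each item with the total (minus the item itself if corrected)."""
    x = np.asarray(scores, dtype=float)
    total = x.sum(axis=1)
    return [
        np.corrcoef(x[:, j], total - x[:, j] if corrected else total)[0, 1]
        for j in range(x.shape[1])
    ]


# Hypothetical 0-4 responses from 6 respondents to 4 items.
data = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [4, 3, 4, 4],
    [4, 4, 3, 4],
]
print(round(cronbach_alpha(data), 3))
print([round(a, 3) for a in alpha_if_item_deleted(data)])
print([round(r, 3) for r in item_total_correlations(data)])
```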
Item analyses of OSS questionnaire used among hospital workers having shoulder disorders (n = 110)
n: number; sd: standard deviation; iqr: interquartile range; Coef. Alpha: Cronbach’s alpha; MSQ: mean-square statistics; OSS: Oxford shoulder score. a: Floor and ceiling effects represent the number and proportion of study subjects with the worst (4) or best (0) value for each item of the OSS. b: Strength of correlation: 0.5 to 0.7, moderate positive correlation; 0.7 to 0.9, high positive correlation; 0.9 to 1.00, very high positive correlation [29]. $: Items are ordered by increasing difficulty.
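The infit and outfit mean-squares summarized in Table 2 compare observed item scores with their model expectations. Below is a minimal sketch for one PCM item, assuming person measures and item thresholds have already been estimated (the values are hypothetical). Outfit is the unweighted mean of squared standardized residuals; infit weights squared residuals by the model variance, which makes it less sensitive to unexpected responses from persons located far from the item. Values near 1 indicate good fit; the 0.6 to 1.3 range is the one reported in the text.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Partial Credit Model category probabilities for one item."""
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    largest = max(cumulative)
    weights = [math.exp(c - largest) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def item_fit(observed, thetas, thresholds):
    """Return (infit, outfit) mean-squares for one item.

    observed: item scores (0..m) of each person; thetas: person measures (logits);
    thresholds: the item's step difficulties (logits).
    """
    squared_residuals, variances, z_squared = 0.0, 0.0, []
    for x, theta in zip(observed, thetas):
        probs = pcm_probabilities(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        squared_residuals += (x - expected) ** 2
        variances += variance
        z_squared.append((x - expected) ** 2 / variance)
    outfit = sum(z_squared) / len(z_squared)  # unweighted mean square
    infit = squared_residuals / variances     # information-weighted mean square
    return infit, outfit


thresholds = [-1.5, -0.5, 0.5, 1.5]        # hypothetical item thresholds
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical person measures
observed = [0, 1, 2, 2, 4, 4]              # hypothetical item scores
infit, outfit = item_fit(observed, thetas, thresholds)
print(f"infit = {infit:.2f}, outfit = {outfit:.2f}")
```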
Table 3 presents summary statistics for fit to the polytomous Rasch model. The person separation index (PSI) and person reliability (PR) values indicated good discriminant ability of the scale. Based on the G index, 4.9 statistically distinct levels of subject ability were distinguished in our sample. However, when interpreting the score, note that 49% of the subjects were classed in the “mild” category. The mean of all inter-item residual correlations was –0.08, close to zero. Using an adjusted Yen’s Q3 statistic, only three inter-item correlations (4.5%) indicated local dependence (see Supplementary Table 1). It is therefore reasonable to conclude that local independence holds in the data set.
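The link between person reliability, the separation (G) index and the number of statistically distinct levels can be made explicit. A minimal sketch, assuming person measures and their standard errors are available: reliability is the share of observed variance not due to measurement error, G is the ratio of the “true” standard deviation to the root-mean-square error, and the number of strata is (4G + 1)/3. Whether the study used exactly these formulas (e.g., sample versus population variance) is not stated, so the sketch is illustrative only; the cut-offs G > 2 and reliability > 0.8 are those given with Table 3.

```python
import math
import statistics


def person_separation(measures, standard_errors):
    """Person reliability, separation index G and number of strata.

    measures: person measures (logits); standard_errors: their standard errors.
    """
    observed_variance = statistics.variance(measures)  # sample variance
    error_variance = statistics.fmean(se ** 2 for se in standard_errors)
    true_variance = max(observed_variance - error_variance, 0.0)
    reliability = true_variance / observed_variance
    separation = math.sqrt(true_variance / error_variance)  # G = true SD / RMSE
    strata = (4 * separation + 1) / 3                        # statistically distinct levels
    return reliability, separation, strata


measures = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical person measures
standard_errors = [math.sqrt(0.5)] * 5  # hypothetical standard errors
reliability, g, strata = person_separation(measures, standard_errors)
print(f"reliability = {reliability:.2f}, G = {g:.2f}, strata = {strata:.2f}")
```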
Summary statistics for fit to a polytomous Rasch model
sd: standard deviation; iqr: interquartile range; r: Spearman correlation. Person separation index: cut-off >2; person reliability: cut-off >0.8. §n: number of correlations ≥Q3, max –
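Yen's Q3 is the correlation between the model residuals of two items. Residual correlations tend to be slightly negative on average (the text reports a mean of –0.08), so the adjusted version subtracts the mean Q3 before flagging pairs. With 12 items there are 12 × 11 / 2 = 66 item pairs, so three flagged pairs correspond to 3/66 ≈ 4.5%, as reported. The sketch below assumes a residual matrix is already available; the 0.2 cut-off above the mean is a hypothetical default, since the critical value used for Table 3 is not recoverable here.

```python
import numpy as np


def adjusted_q3(residuals, cutoff=0.2):
    """Yen's Q3 for every item pair, centred on the mean Q3 (adjusted Q3).

    residuals: (persons x items) matrix of Rasch residuals (observed minus expected).
    cutoff: flag pairs whose Q3 exceeds the mean Q3 by more than this (hypothetical default).
    Returns the mean Q3, the adjusted values and the flagged item pairs (0-based indices).
    """
    r = np.corrcoef(np.asarray(residuals, dtype=float), rowvar=False)
    rows, cols = np.triu_indices_from(r, k=1)  # each item pair once
    q3 = r[rows, cols]
    q3_mean = q3.mean()
    adjusted = q3 - q3_mean
    flagged = [(int(i), int(j)) for i, j, a in zip(rows, cols, adjusted) if a > cutoff]
    return q3_mean, adjusted, flagged


# Simulated residuals: items 0 and 1 share variance (local dependence), items 2-3 do not.
rng = np.random.default_rng(0)
residuals = rng.normal(size=(200, 4))
residuals[:, 1] = residuals[:, 0] + 0.1 * rng.normal(size=200)
q3_mean, adjusted, flagged = adjusted_q3(residuals)
print("mean Q3:", round(q3_mean, 3), "flagged pairs:", flagged)
```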
Figure 1 presents the person-item map. First, item difficulty locations (black points) cluster in the range of subjects with higher person parameters (right side of the latent dimension), and the items do not cover the whole range of the latent dimension. In particular, no item difficulty location (black points) falls among the many persons with mild impairment (left side of the latent dimension). This indicates a substantial floor effect and poor targeting: the OSS contains too many difficult items for the ability level of the population examined in occupational medical visits. Second, the thresholds were ordered for all items, so the important criterion of monotonicity was satisfied overall.

Person-Item Map. The top of the figure shows the distribution of person parameters, and the bottom displays the locations of the thresholds (white points with numbers) and item difficulty parameters (black points without numbers). Vertical dashed lines indicate the lower (left) and upper (right) extent of instrument coverage for the studied population, assuming good targeting.
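The two visual checks described for Figure 1 (coverage of the person distribution by the thresholds, and the ordering of thresholds within each item) can also be expressed numerically. A minimal sketch, assuming person measures and item thresholds are already estimated; the function name, the summary indices and the data are hypothetical, not outputs used in the study. A large share of persons below the easiest threshold corresponds to the mistargeting of mildly impaired subjects described above.

```python
def targeting_summary(person_measures, thresholds_by_item):
    """Simple targeting and threshold-order indices for a person-item map.

    person_measures: person locations (logits, higher = more disability);
    thresholds_by_item: dict mapping item name -> its thresholds in category order.
    """
    all_thresholds = [t for ts in thresholds_by_item.values() for t in ts]
    lowest, highest = min(all_thresholds), max(all_thresholds)
    n = len(person_measures)
    # Item location taken here as the mean of the item's thresholds.
    item_locations = [sum(ts) / len(ts) for ts in thresholds_by_item.values()]
    disordered = [
        item for item, ts in thresholds_by_item.items()
        if any(later <= earlier for earlier, later in zip(ts, ts[1:]))
    ]
    return {
        "share_below_lowest_threshold": sum(m < lowest for m in person_measures) / n,
        "share_above_highest_threshold": sum(m > highest for m in person_measures) / n,
        "mean_person_minus_mean_item": sum(person_measures) / n
        - sum(item_locations) / len(item_locations),
        "items_with_disordered_thresholds": disordered,
    }


persons = [-3.0, -2.5, -1.0, 0.0, 1.0]  # hypothetical person measures
items = {"item A": [-1.0, 0.0, 1.0, 2.0], "item B": [-0.5, 0.5, 0.2, 1.5]}
print(targeting_summary(persons, items))
```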
Figure 2 displays the principal component analysis (PCA) results. For the raw responses (Box A), the first factor explained 55% of the variance and the eigenvalue of the second largest dimension was <2. For the standardized residuals (Box B), the first contrast had an eigenvalue of 1.9. In addition, the Martin-Löf test was not significant. Thus, there was no evidence of multidimensionality.

Principal component analysis (eigenvalues). A: raw responses to the OSS scale. B: standardized residuals from the Rasch model.
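The eigenvalue criteria in this paragraph come from a principal component analysis of inter-item correlations. A minimal NumPy sketch, assuming a (persons × items) matrix of raw responses or of standardized Rasch residuals is available: it returns the eigenvalues in decreasing order and the share of variance carried by the first component. The data are simulated and purely illustrative; the threshold of 2 for the first residual contrast is the rule of thumb used in the text.

```python
import numpy as np


def pca_eigenvalues(data):
    """Eigenvalues (largest first) of the inter-item correlation matrix, and the first one's share.

    data: (persons x items) array of raw responses or standardized Rasch residuals.
    The eigenvalues of a correlation matrix sum to the number of items.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return eigenvalues, eigenvalues[0] / eigenvalues.sum()


rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
raw_like = latent + 0.7 * rng.normal(size=(500, 6))  # one common dimension plus noise
residual_like = rng.normal(size=(500, 6))            # residuals with no structure left

raw_eigen, first_share = pca_eigenvalues(raw_like)
residual_eigen, _ = pca_eigenvalues(residual_like)
print(f"raw data: first component explains {first_share:.0%} of the variance")
print(f"residuals: first contrast eigenvalue = {residual_eigen[0]:.2f}")
```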
The differential item functioning (DIF) results are presented in Table 4. The likelihood-ratio (LR) tests flagged only three items at the 0.4% significance level. Item 5 (“do the household shopping”) showed uniform DIF for age: at the same level of disability (latent trait), younger women (<45 years) had higher scores on this item than women aged ≥45 years. Item 11 (“pain interfered with work (including housework)”) and item 4 (“use a knife and fork”) showed non-uniform DIF: for subjects with an impairment duration <1 year and for subjects on sick leave, scores on these two items were higher at high levels of disability. However, the magnitude of DIF for these three items was moderate (pseudo-R² <0.07).
DIF results for the 12-item OSS across three subgroups (age, impairment duration, sick leave)
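The lordif approach behind Table 4 compares three nested ordinal logistic regressions for each item: model 1 with the latent trait only, model 2 adding the subgroup (uniform DIF), and model 3 adding the subgroup × trait interaction (non-uniform DIF). Below is a minimal sketch of the decision step only, assuming the four log-likelihoods (including an intercept-only model for McFadden's pseudo-R²) have already been obtained from fitted models. McFadden's pseudo-R² is an assumption, since the text does not say which pseudo-R² was used; the default alpha of 0.004 mirrors the 0.4% level in the text, and the log-likelihoods are made up. The sketch does not reproduce lordif's iterative purification of the trait estimate.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("only df = 1 or 2 are needed for these comparisons")


def dif_tests(ll_null, ll1, ll2, ll3, alpha=0.004):
    """Likelihood-ratio DIF tests with McFadden pseudo-R2 changes.

    ll_null: intercept-only model; ll1: item ~ trait; ll2: + group; ll3: + group x trait.
    """
    def mcfadden(ll):
        return 1.0 - ll / ll_null

    comparisons = {
        "uniform (1 vs 2)": (ll1, ll2, 1),
        "non-uniform (2 vs 3)": (ll2, ll3, 1),
        "overall (1 vs 3)": (ll1, ll3, 2),
    }
    results = {}
    for name, (ll_small, ll_large, df) in comparisons.items():
        statistic = 2.0 * (ll_large - ll_small)
        p_value = chi2_sf(statistic, df)
        results[name] = {
            "chi2": statistic,
            "p": p_value,
            "flagged": p_value < alpha,
            "delta_pseudo_r2": mcfadden(ll_large) - mcfadden(ll_small),
        }
    return results


# Made-up log-likelihoods for one item and one subgroup variable.
for name, r in dif_tests(ll_null=-200.0, ll1=-150.0, ll2=-140.0, ll3=-139.5).items():
    print(f"{name}: chi2 = {r['chi2']:.1f}, p = {r['p']:.2g}, "
          f"flagged = {r['flagged']}, delta R2 = {r['delta_pseudo_r2']:.3f}")
```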
Discussion
Based on the RM, the OSS appears to be a valid measure of shoulder disorders without instability among HCW. However, when used during occupational medical visits among HCW (a non-surgical population), it did not discriminate between subjects with the lowest levels of shoulder impairment.
The strengths of this study are that there were no missing data and that response rates were extremely high. Our study is also based on precise medical information: all eligible subjects during the study period underwent a clinical assessment, and the diagnosis was confirmed by at least one radiological examination in 90% of cases. In addition, all HCW (a homogeneous population) completed the OSS under similar conditions (location, explanations, paper format), supervised by the same occupational physician, which limited measurement error. However, this study also has several limitations. First, the HCW were recruited from a single center and may not be representative of the entire healthcare population in France (few men were recruited). Second, although some authors consider sample sizes as small as 100 adequate for estimating stable Rasch model parameters [30–32], the sample size in this study remains relatively small. Some results therefore need to be confirmed by a study with a larger sample.
To our knowledge, no other Rasch analysis of the OSS and no other study of the OSS conducted in the workplace has been published with which to compare our results, as also noted by several authors [3, 9]. Apart from the Rasch results, several of our findings were consistent with other studies reporting the OSS as reliable, valid and responsive [3, 22]. Our results confirm a unidimensional structure (see Fig. 2) and good internal consistency, with a Cronbach’s alpha higher than 0.9 (see Table 2). As expected, the overall mean OSS score in our non-surgical population (mean 16.1, SD ±9.41, range 1 to 35) was markedly lower than those observed in surgical populations, for example in a French study [7] (mean 32.7, SD ±10.29, range 9 to 48) and a British study [33] (mean 33.0, range 31.3 to 34.8).
Our study has several clinical implications. The RM is considered a relatively new method with some advantages over CTT [10, 11]. Its application allows several key assumptions to be verified, including unidimensionality of the latent trait, monotonicity, local independence and measurement stability [10, 11]. In this study, the PCA of the standardized residuals supported a single dimension, confirming one common characteristic (latent trait). With respect to monotonicity, visual inspection of the person-item map shows that the response probabilities are in ascending order, consistent with the item categories. Item performance remained stable across the subgroups examined (see DIF measures), and it is reasonable to conclude that local independence holds in the data set. All the key assumptions mentioned above were therefore verified. Our results strengthen the evidence on the reliability and validity of the OSS, notably at the individual patient level.
Another important consideration is the distribution of respondents relative to the items, that is, how well the scale is targeted to the sample [10, 11]. Although we did not observe a floor effect, our analysis of scale-to-sample targeting revealed that targeting was substantially better for subjects with the highest levels of impairment than for those with the lowest levels. This limited discrimination between HCW with mild shoulder disorders; in other words, in our study context the OSS items were too difficult. This limitation could be addressed by adding one or more items at the lower end of the scale.
Our results open several research perspectives for clinical practice. First, as pointed out by certain authors, further testing in non-surgical populations is needed [3]. Our study only partially answers this question, because our population was highly selected (almost exclusively women working in a hospital). Second, we did not find normative data for the OSS in the published literature with which to compare our results [3, 34]. Such data are needed to improve the interpretation of the score in a given population (e.g., levels of severity, minimal clinically significant change) [3, 34]. Third, the OSS reflects its developers' specific view of disability: they chose to capture joint-specific problems and to avoid the influence of co-morbidity [5, 35]. However, taking the International Classification of Functioning, Disability and Health as a reference, the OSS explores only a limited range of disability-related domains (e.g., psychological and social functioning) [35–37]. More studies are therefore needed to investigate the place of the OSS among other shoulder-related questionnaires in the context of return to work after shoulder disorders [35–39].
Conclusion
This Rasch analysis confirmed that the OSS has good psychometric properties but does not clearly discriminate non-surgical subjects with shoulder disorders presenting the lowest levels of impairment, such as those seen at occupational medical visits. Its use in a non-surgical population is questionable.
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
This study was declared to the French National Commission for Data Protection (no. 2219869) under reference methodology MR-004 and registered in the public directory of studies of the National Institute of Health Data. Under the MR-004 methodology, approval from an ethics committee was not required. All volunteer participants gave their informed consent to be enrolled, and the data were collected anonymously.
