Rasch analysis of the Oxford shoulder score in a non-surgical occupational population

Abstract

BACKGROUND:

The Oxford shoulder score (OSS), a questionnaire measuring patient perception of shoulder disability, has not been tested specifically in a non-surgical population, and no study has assessed the OSS with modern psychometrics based on the Rasch model (RM).

OBJECTIVE:

To assess the psychometric properties of the OSS using the RM among health-care workers with shoulder disorders and to verify its relevance in a non-surgical population.

METHODS:

A retrospective review of medical records from June 2019 to October 2020 was performed in the occupational health department of a French hospital center. Responses to 110 questionnaires from 55 subjects (97% women) were examined. A polytomous Rasch model based on the Partial Credit Model was used.

RESULTS:

Overall fit was satisfactory, the reliability coefficient was high and the 5 categories of the scale were in ascending order. Analysis of the residuals supported unidimensionality and the local independence assumption. Item performance remained stable across the subgroups examined (DIF measures). Scale-to-sample targeting indicated a substantial floor effect, and the mildest impairments were not well discriminated.

CONCLUSIONS:

The OSS has good psychometric qualities. However, it does not clearly discriminate between subjects presenting the lowest levels of impairment. Its use in a non-surgical population is questionable.

Keywords

Shoulder, patient health questionnaire, psychometrics, health personnel, hospitals, occupational health

1 Introduction

Physicians can use a validated questionnaire in their clinical practice to assess the functional level of activity of patients with shoulder disorders [1, 2]. Numerous questionnaires are available, differing in length, psychometric properties, and target population [2–4]. One of the most commonly used shoulder-related functional questionnaires is the Oxford shoulder score (OSS), first described and validated by Dawson et al. in 1996 [5]. It was devised specifically to assess the outcomes of operations on the shoulder, excluding stabilization procedures [5]. The OSS has a format that makes it easy to administer and has a high rate of patient compliance [3]. It has been adapted into different languages and is widely used internationally [6–8].

However, according to some authors, the reliability and validity of the OSS have not been tested specifically in a non-surgical population [3, 9]. Also, to our knowledge, the OSS has never been evaluated using the Rasch approach [9–11]. The Rasch model (RM) presents some well-documented advantages over classical test theory (CTT) [10, 11]. In addition, the RM allows the key criteria of “objective measurement”, such as unidimensionality, monotonicity and local independence, to be assessed [11]. If these criteria are met, raw scores may be converted to a linear interval scale, allowing parametric statistical techniques to be used with confidence [10, 11]. The RM can thus be used to improve the evaluation of patient-reported outcome measures (PROMs) [10, 11].

Work-related shoulder disorders are among the leading causes of occupational diseases among healthcare workers (HCW) [12, 13]. The shoulder is a body segment frequently affected by occupational tasks in the hospital environment such as nursing care, manual handling and pushing/pulling [14–17]. The prevalence of shoulder symptoms in HCW is reported as relatively high by some authors: 37.8% [18], 44.8% [19], 55.0% [20], 60% [21], 85.8% [16]. Among HCW, shoulder disorders were found to be a major cause of absenteeism and of requests for a change of duty or job [13]. In occupational medical visits made in hospitals, there is therefore a need to use PROMs, like the OSS, to assess shoulder disorders. However, this raises the question of whether the OSS is suited to measuring shoulder-related impairment levels in this non-surgical population.

Therefore, a study among HCW suffering from shoulder disorders was conducted to assess the psychometric properties of the OSS by applying an RM and to verify its relevance in a non-surgical adult population.

2 Methods

2.1 Study population

This study was conducted in the Department of Occupational Health of a general hospital in France (about 2350 HCW). The target population was the healthcare workers employed in this hospital, not its patients.

According to French labor law, each HCW must have a medical visit with an occupational physician at least every two years or when returning to work after a sick leave longer than one month. During these visits, the OSS was routinely used in the department of occupational health to assess shoulder disability. For this study, all medical records from occupational visits between June 2019 and October 2020 were examined retrospectively to select the HCW who had filled in the OSS form (self-administered on paper) during the medical visits. A questionnaire was included if at least one of the twelve items was answered. A clinical examination was performed in all cases and the diagnosis of the shoulder pathology was based on medical imaging reports. Socio-demographic information about the subjects was also analyzed.

The inclusion criteria for eligible HCW were ≥18 years of age, able to converse and read in French, and presenting a shoulder disorder caused by degenerative, inflammatory or traumatic pathologies. As defined in the original paper by Dawson et al. (1996), HCW with shoulder instability (symptoms of dislocation or subluxation) were excluded [5, 22], since a specific questionnaire exists for shoulder instability [23]. HCW with shoulder pain originating from neurological or cardiovascular problems and those with language difficulties were also excluded.

2.2 Statistical unit (shoulder)

The level of shoulder disability was assessed at each medical visit with the OSS questionnaire and a clinical examination. During the study period (from June 2019 to October 2020), a HCW could attend several occupational medical visits and complete several questionnaires. To correctly measure the level of shoulder disability observed in occupational medical visits, all the questionnaires were retained whatever the clinical situation (first visit or follow-up visit). The statistical unit of this study was therefore the shoulder assessed at a medical visit by one OSS questionnaire. A subject with bilateral symptoms at the time of the medical visit filled in two questionnaires, one per shoulder. Finally, for subjects who filled in the OSS several times, a questionnaire was retained only if the time interval since the previous one was longer than four weeks. This interval was chosen because the OSS assesses disability during the past four weeks (see Dawson et al., 1996) [5].
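The retention rule for repeated questionnaires can be illustrated with a short Python sketch. It is not the authors' procedure: the function name `retain_questionnaires`, the input format (a list of completion dates for one shoulder) and the comparison with the last retained questionnaire are assumptions, as is the handling of an interval of exactly 28 days, which the text does not specify.

```python
from datetime import date, timedelta

MIN_GAP = timedelta(days=28)  # four-week recall period of the OSS


def retain_questionnaires(dates, min_gap=MIN_GAP):
    """Return the completion dates kept for one shoulder.

    Assumption: a questionnaire is kept when it was completed at least
    `min_gap` after the previously kept one.
    """
    kept = []
    for completed in sorted(dates):
        if not kept or completed - kept[-1] >= min_gap:
            kept.append(completed)
    return kept


# Hypothetical dates for one shoulder (not study data)
visits = [date(2019, 6, 3), date(2019, 6, 20), date(2019, 9, 2)]
print(retain_questionnaires(visits))
```

In this hypothetical example the second questionnaire, completed 17 days after the first, would be dropped, while the third would be kept.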

2.3 OSS questionnaire

The OSS is recommended for the disability evaluation of patients with shoulder disorders without unstable lesion [5, 22]. The OSS is a self-assessment instrument containing 12 items [5, 22]. It includes 4 items about pain (2 for pain, 2 for interference with pain) and 8 about daily functions. Respondents indicate whether they had experienced each symptom or problem within the past month on a 5-point Likert scale (from 0 to 4). A total score ranging from 0 (worst outcome) to 48 (best outcome) is obtained by adding the scores from each question (no subscores) [22]. The use of OSS does not need an endorsement [3]. We used the French translation reported by Tuton et al. [7].

In accordance with French legislation, this study was declared to the National Commission for Data Protection (n° 2219869) relating to the reference methodology (MR-004) and deposited at the public directory of studies at National Institute of Health data. Approval from the Ethical Committee was not needed for MR-004 methodology. All the volunteer participants enrolled gave their informed consent and the data were collected anonymously.

2.4 Statistical analyses

The RM was the method used for testing the psychometric characteristics of the OSS [10, 11]. The RM is a probability-based method used to analyze rating scales and evaluate a latent variable not measurable directly. This method uses a logistic function to transform raw ordinal scores into interval-level measurements (expressed in logits). It calculates item difficulty (item measure) in relation to person disability (person measure) by placing both on the same linear continuum of the latent variable. The positive (upper) part of the scale represents items with greater difficulty and persons with higher disability, while the negative (lower) side represents persons with lower disability and less difficult items. We used an RM for polytomous ordered responses based on the Partial Credit Model (PCM). All analyses were performed using R, version 3.1.0, with the R-packages TAM, lordif, eRm and ltm for RM [10].
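As an illustration of the Partial Credit Model, the sketch below computes the probability of each response category for one item from a person measure and the item's step (threshold) parameters. It is a minimal sketch in Python rather than the R packages used in the study; the function name `pcm_probabilities` and the threshold values are hypothetical and are not estimates from this study.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities under the Partial Credit Model.

    theta: person measure in logits.
    thresholds: step parameters delta_1..delta_m for one item (logits).
    Returns m + 1 probabilities for categories 0..m.
    """
    # Cumulative sums of (theta - delta_k); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical thresholds for a 5-category item (not estimates from this study)
example_thresholds = [-1.5, -0.5, 0.5, 1.5]
probs = pcm_probabilities(0.0, example_thresholds)
print([round(p, 3) for p in probs])
```

With thresholds placed symmetrically around the person measure, the middle category is the most probable and the two extreme categories are equally likely, which is the ordered pattern expected when thresholds are not reversed.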

The RM analysis plan was as follows.

Scale-to-sample targeting: Targeting was examined in two ways. First, a person-item map was used to verify whether the items were of an appropriate level of difficulty for the ability of the population studied [10]. Ideally, the distribution of item difficulties should match the distribution of person abilities, and the person-item map provides a visual check of this calibration. Second, floor and ceiling effects were considered present if more than 15% of the respondents achieved the lowest or highest possible score, respectively [24].

Category functioning: The response probabilities were expected to be arranged in ascending order, concordant with the categories. This property is referred to as monotonicity [11]. Here, the ordering of thresholds was examined in the person-item map.

Unidimensionality: This assumption refers to the ability of a scale to measure a single common characteristic, called the latent trait [11]. To assess it, a principal component analysis (PCA) was conducted on the raw responses and on the standardized residuals from the RM [11]. The common criterion for unidimensionality was that at least 50% of the variance should be explained by the first dimension [25]. In addition, no further factor should be extractable from the standardized residuals in the PCA (first contrast, eigenvalue ≤2). The Martin-Loef test was also used (p-value <0.05 indicating a violation of unidimensionality) [26].

Local independence: The residual correlation represents the relationship between the portions of the items that are not associated with the latent trait [11]. If local independence holds, this correlation should be near zero. The adjusted Yen’s Q3 statistic ($Q_{3,\max} - {\bar{Q}}_{3}$) was used as the critical value to identify the dependent pairs of items and their number [27].

Good fit: Two indicators were examined. First, the outfit mean square (outfit MNSQ) was defined as the simple (unweighted) average of the squared standardized residuals [11]. Second, the infit mean square (infit MNSQ) was defined as the average of the squared standardized residuals weighted by their individual variance [11]. Ideal values for both are about 1.0, with the 0.5–1.7 range considered satisfactory in this study [11].

Reliability and separation statistics: Person separation indices provide an estimate of the ability of the OSS to discriminate between strata or groups [11]. Person reliability (PR) is the proportion of the variance of person estimates that is not due to error. It is an estimate of internal consistency reliability, comparable to Cronbach’s α (acceptable value: ≥0.80 [11]). The person separation index (PSI) determines the number of groups (>2, minimum requirement) [11]. A G index was also used to calculate the number of distinct levels of disability (strata) that the items can distinguish (number of strata = [4×PSI + 1]/3) [11].

Stability: Differential item functioning (DIF) measures the degree to which item performance remains stable across groups [10, 11]. A hybrid approach combining ordinal logistic regression and item response theory was used for DIF detection [10, 28]. The presence of uniform DIF (the effect is constant) or non-uniform DIF (the effect is not constant) was evaluated across age, impairment duration and sick leave. A likelihood ratio X² test with a Bonferroni adjustment (α = 0.05/number of items) was used to compare the different models [28]. The magnitude of DIF was also evaluated with an effect size measure, the change in McFadden’s pseudo-R²: negligible, <0.035; moderate, between 0.035 and 0.07; large, ≥0.07 [28].
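The two fit indicators described in the analysis plan can be sketched as follows. This is an illustrative Python sketch, not the output of the R packages used in the study; the function names and the example values are hypothetical. It assumes that the model-expected score and the model variance of each response have already been obtained from the fitted RM.

```python
def item_fit(observed, expected, variance):
    """Outfit and infit mean squares for one item.

    observed, expected, variance: per-person observed score, model-expected
    score, and model variance of the score for this item.
    """
    squared_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of the squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_resid, variance)) / len(observed)
    # Infit: squared standardized residuals weighted by their variance,
    # which simplifies to sum of squared residuals over sum of variances.
    infit = sum(squared_resid) / sum(variance)
    return outfit, infit


def within_range(mnsq, low=0.5, high=1.7):
    """True when a mean square lies in the range treated as satisfactory here."""
    return low <= mnsq <= high


# Hypothetical values for four persons (not study data)
observed_scores = [2, 1, 3, 0]
expected_scores = [1.5, 1.2, 2.4, 0.6]
model_variances = [0.8, 0.7, 0.9, 0.5]
outfit, infit = item_fit(observed_scores, expected_scores, model_variances)
print(round(outfit, 3), round(infit, 3), within_range(outfit), within_range(infit))
```

Because outfit is unweighted, it is more sensitive to unexpected responses from persons located far from the item, whereas infit gives more weight to responses from persons located near the item.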

3 Results

A total of 110 OSS questionnaires were included. Fifty-five subjects completed the questionnaire twice on average. Five subjects presented bilateral symptoms. We observed one refusal and excluded two questionnaires because they were completed within a time interval of less than 28 days.

The clinical characteristics of the shoulders examined are presented in Table 1. Women completed 97% of the OSS questionnaires, and the average age was 44.8 (±9.3) years. Aide-nurse was the main occupation represented. The most frequently observed diagnosis was cuff pathology. The diagnosis was confirmed by a radiological examination in 90% of cases. At the time of the medical visit, 41% of subjects were still on sick leave. The duration of the shoulder disorder was longer than 6 months in 72% of cases.

Table 1
Clinical characteristics of the shoulders examined (n = 110^*) and healthcare workers (n = 55^**) from June 2019 to October 2020

	Shoulders examined⁺		Healthcare workers⁺
	n (%)	Mean (sd)	n (%)	Mean (sd)
Gender
(women)	107 (97)		55 (95)
Age
years		44.8 (9.3)		45.9 (8.9)
≥45 years	56 (51)		33 (60)
Occupation
Aide-nurse	69 (63)		31 (56)
Nurse	15 (14)		10 (18)
Hotel assistant	9 (8)		3 (6)
Other	17 (15)		11 (20)
Sick leave at the time of the visit
yes	45 (41)		25 (45)
Diagnostic^§
Cuff tendinitis/bursitis	89 (81)		43 (86)
with tear (partial or total)	25 (23)		12 (24)
with calcification	21 (19)		9 (18)
Adhesive capsulitis/algoneurodystrophy	3 (3)		2 (4)
Traumatic sequelae	5 (4)		3 (6)
Other diagnostics	1 (1)		1 (2)
Without diagnostic^§	12 (11)		6 (11)
Affected shoulder side
(dominant)	74 (69)		41 (77)
Radiologic exploration
At least one medical imagery^$	99 (90)		50 (91)
with MRI or Scanner	70 (64)		34 (62)
Surgical operation
Yes	25 (23)		11 (20)
Loss of motion
(abduction < 90°)	32 (31)		17 (33)
Pain (VAS)		3.7 (1.9)		4.0 (2.1)
Pathology duration
≤6 month	28 (28)		20 (37)
7–24 month	51 (51)		26 (47)
≥25 month	20 (20)		9 (16)

MRI: magnetic resonance imaging; sd: standard deviation; VAS: visual analog scale. ^$At least one medical imagery of the shoulder: X-ray, ultrasound, MRI or scanner. ⁺From June 2019 to October 2020, a healthcare worker could complete the questionnaire several times depending on the clinical course of the disease. ^*Number of OSS questionnaires, each linked to one shoulder disorder. ^**At the first occupational visit. ^§From medical imagery.

The description of the items is reported in Table 2. The item difficulties ranged from –0.9 logits for item 8 (the easiest) to 2.5 logits for item 4 (the most difficult). There was a floor effect for 8 items and no ceiling effect for any of the items. With respect to fit to the RM, all the items had infit and outfit mean square values between 0.6 and 1.3. Therefore, no underfit or overfit was observed. The internal consistency of the OSS, expressed as Cronbach’s alpha, was high (about 0.9 for all the items). The correlations between each item and the total score were also high (≥0.7), except for one item (0.6, moderate correlation), meaning that each item is a “good” contributor to what the test measures [29].

Table 2

Item analyses of OSS questionnaire used among hospital workers having shoulder disorders (n = 110)

Item	Mean (sd)	Median (iqr)	Skewness	Floor effect^a, n (%)	Ceiling effect^a, n (%)	Item-total correlation^b	Coef. alpha	Difficulty^$ (logits)	Outfit (MSQ)	Infit (MSQ)
q8 describe the pain you usually	2.4(1.0)	3(2–3)	–0.5	5(5)	14(13)	0.7	0.9	–0.9	1.0	0.9
q1 the worst pain	2.2(0.9)	2(2–3)	–0.0	2 (2)	6 (5)	0.7	0.9	–0.6	1.0	1.0
q12 pain in bed at night	2.2(1.2)	2(1–3)	–0.3	12(11)	16(15)	0.7	0.9	–0.5	1.2	1.3
q11 pain interfered with the work	1.9(0.9)	2(1–2)	–0.1	5(5)	1(1)	0.8	0.9	0.4	0.8	0.8
q2 any trouble dressing yourself	1.1(1.0)	1(0–2)	0.5	38(35)	9(8)	0.8	0.9	0.8	0.7	0.8
q9 hang your clothes up	1.4(1.2)	1(0–2)	0.4	30(27)	4(4)	0.9	0.9	0.9	0.6	0.6
q5 do the household shopping	1.4(1.0)	1(1–2)	0.5	22(20)	3(3)	0.8	0.9	1.0	0.8	0.7
q7 brush/comb your hair	1.3(1.0)	1(0–2)	0.4	30(27)	1(1)	0.7	0.9	1.4	1.0	1.1
q6 carry a tray	1.0(1.0)	1(0–2)	0.9	41(37)	3(3)	0.8	0.9	1.5	1.0	0.9
q10 to wash under both arms	0.9(1.1)	0(0–1)	1.0	58(53)	1(1)	0.8	0.9	2.0	0.8	0.8
q3 any trouble using transport	0.4(0.7)	0(0–1)	1.8	77(70)	2(2)	0.7	0.9	2.4	0.7	1.0
q4 use a knife and fork	0.3(0.7)	0(0–0)	2.3	87(79)	3(3)	0.6	0.9	2.5	0.8	1.0

n: number; sd: standard deviation; iqr: interquartile range; Coef. alpha: Cronbach’s alpha; MSQ: mean-square statistics; OSS: Oxford shoulder score. ^aFloor and ceiling effects represent the number and proportion of study subjects with the best (0) or worst (4) value, respectively, for each item of the OSS. ^bStrength of correlation: 0.5 to 0.7 (moderate positive correlation); 0.7 to 0.9 (high positive correlation); 0.9 to 1.00 (very high positive correlation) [29]. ^$Items are ordered by increasing difficulty.
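The floor and ceiling counts in Table 2 can be checked against the 15% criterion with a few lines of Python. The counts below are transcribed from Table 2; the variable names are illustrative, and the strict inequality reflects the "more than 15%" wording of the analysis plan.

```python
N = 110  # questionnaires analysed
THRESHOLD = 0.15  # effect considered present above 15% of respondents

# (floor count, ceiling count) per item, transcribed from Table 2
counts = {
    "q8": (5, 14), "q1": (2, 6), "q12": (12, 16), "q11": (5, 1),
    "q2": (38, 9), "q9": (30, 4), "q5": (22, 3), "q7": (30, 1),
    "q6": (41, 3), "q10": (58, 1), "q3": (77, 2), "q4": (87, 3),
}

floor_items = [q for q, (floor, _) in counts.items() if floor / N > THRESHOLD]
ceiling_items = [q for q, (_, ceiling) in counts.items() if ceiling / N > THRESHOLD]

print(len(floor_items), floor_items)
print(len(ceiling_items), ceiling_items)
```

Applied to these counts, the rule flags the eight items from q2 to q4 for a floor effect and no item for a ceiling effect; item q12, with 16 of 110 responses (14.5%) at the ceiling, stays just under the threshold.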

Table 3 presents several summary statistics for the fit to the Rasch model. The PSI and PR values indicated good discriminant ability of the scale. From the G index, 4.9 statistically different levels of subject ability were distinguished in our sample. However, when interpreting the score, note that 47% of the subjects were classed in the “mild” category. The average of all the item residual correlations was –0.08, close to zero. Using the adjusted Yen’s Q3 statistic, only three of the 66 inter-item correlations (4.5%) showed local dependence (see Supplementary Table 1). It is therefore reasonable to conclude that local independence holds in the data set.
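The residual correlation check can be sketched as follows. This is a simplified illustration in Python, not the procedure implemented in the R packages used in the study: the function names, the simulated residuals and the rule of flagging every pair whose correlation reaches a user-supplied cut-off are assumptions. With 12 items there are 66 item pairs, which matches the N reported in Table 3.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def q3_summary(residuals, cutoff):
    """residuals: one list of standardized residuals per item."""
    pairs = list(combinations(range(len(residuals)), 2))
    q3 = {p: pearson(residuals[p[0]], residuals[p[1]]) for p in pairs}
    mean_q3 = sum(q3.values()) / len(q3)
    return {
        "n_pairs": len(pairs),
        "mean_q3": mean_q3,
        "max_minus_mean": max(q3.values()) - mean_q3,
        # Assumed rule: a pair is flagged when its Q3 reaches the cut-off.
        "flagged": [p for p, r in q3.items() if r >= cutoff],
    }


# Simulated residuals for 12 items and 110 responses (not study data);
# item 1 is built from item 0 to create one locally dependent pair.
rng = random.Random(0)
simulated = [[rng.gauss(0, 1) for _ in range(110)] for _ in range(12)]
simulated[1] = [a + 0.3 * rng.gauss(0, 1) for a in simulated[0]]
summary = q3_summary(simulated, cutoff=0.33)
print(summary["n_pairs"], summary["flagged"])
```

The cut-off of 0.33 is taken from Table 3; how it was derived is not detailed in the text, so it is passed here as a plain parameter.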

Table 3

Summary statistics for fit to a polytomous Rasch model

Statistic	Value
From raw data
Total scores
mean (sd)	16.4 (8.7)
median (iqr)	15 (10–23)
Cronbach’s alpha	0.9
Score interpretation^$(n, %) [3]
Satisfactory (0–8)	19 (17)
Mild (9–18)	52 (47)
Moderate (19–28)	23 (21)
Severe (29–48)	16 (15)
From Rasch model
Person reliability	0.9
Person-separation index	3.4
G Index	4.9
Adjusted Yen’s Q3 [27]
Mean (sd)	–0.08 (–0.2)
Cut-off (r)	0.33
Number^§(n/N,%)	3/66 (4.5)

sd: standard deviation; iqr: interquartile range; r: Spearman correlation. Person-separation index: cut-off >2; Person reliability: cut-off >0.8. ^§n: number of correlations ≥ the cut-off ($Q_{3,\max} - {\bar{Q}}_{3}$); N: total number of correlations. ^$Level of impairment from the OSS.
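The G index in Table 3 follows directly from the person-separation index through the strata formula given in the Methods. The short Python sketch below reproduces this arithmetic; the function names are illustrative. The second function uses the usual Rasch relation between separation and reliability, which is not stated in this article and is included here as an assumption for a consistency check.

```python
def number_of_strata(psi):
    """Number of statistically distinct levels: (4 * PSI + 1) / 3."""
    return (4 * psi + 1) / 3


def reliability_from_separation(psi):
    """Assumed standard relation: reliability = PSI^2 / (1 + PSI^2)."""
    return psi ** 2 / (1 + psi ** 2)


PSI = 3.4  # person-separation index from Table 3
print(round(number_of_strata(PSI), 1))
print(round(reliability_from_separation(PSI), 2))
```

With PSI = 3.4, the formula gives (4 × 3.4 + 1)/3 ≈ 4.87, which rounds to the 4.9 strata reported, and the assumed relation gives a reliability of about 0.92, in line with the person reliability of 0.9.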

Figure 1 presents the person-item map. First, the item difficulty locations (black points) cluster in the range of subjects with higher person parameters (right side of the latent dimension), so the items do not cover the whole range of the latent dimension. More precisely, no item difficulty location is observed where many persons with mild impairment are located (left side of the latent dimension). This indicates a substantial floor effect and poor calibration: the OSS questionnaire contains too many difficult items for the ability level of the population examined in occupational medical visits. Second, the thresholds were ordered for all items, so the important criterion of monotonicity was satisfied overall.

Fig. 1

Person-Item Map. The top of the figure shows the distribution of person parameters and the bottom displays the locations of the thresholds (white points with numbers) and item difficulty parameters (black points without numbers). Vertical dashed lines indicate the lower (left) and upper (right) extent of instrument coverage expected for the questioned population under good targeting.

Figure 2 displays the PCA results. For the raw responses (see Box A), the first factor explained 55% of the variance and the eigenvalue of the second largest dimension was <2. For the residuals (see Box B), the variance left unexplained in the first contrast was 1.9 eigenvalue units. Also, the Martin-Loef test was not significant. Thus, no evidence of multidimensionality was observed.

Fig. 2

Principal component analysis (eigenvalues). A: from raw responses to the OSS scale. B: from standardized residuals of the Rasch model.

The DIF results are presented in Table 4. Results of the LR test show that, at the Bonferroni-adjusted level of 0.4%, only three items showed DIF. Item 5 (“do the household shopping”) was affected by uniform DIF for age: among subjects with the same level of disability (latent trait), younger women (<45 years) gave higher item scores than women ≥45 years old. Item 4 (“use a knife and fork”) and item 11 (“pain interfered with work (including housework)”) were affected by non-uniform DIF. For subjects with an impairment duration <1 year (item 4) and for subjects on sick leave (item 11), the item scores were higher at high levels of disability. However, the magnitude of DIF for these three items was moderate, with a pseudo-R² change <0.07.

Table 4

DIF results from OSS-12 items scale based on three subgroups (Age, Impairment duration, Sick leave)

	Age				Impairment duration				Sick leave
	< 45 y versus ≥45 y				< 1 year versus ≥1 year				Yes versus no
	Uniform DIF		Non-uniform DIF		Uniform DIF		Non-uniform DIF		Uniform DIF		Non-uniform DIF
	p-value $X_{12}^{2}$	Δ ₁₂ R ²	p-value $X_{23}^{2}$	Δ ₂₃ R ²	$X_{12}^{2}$	Δ ₁₂ R ²	$X_{23}^{2}$	Δ ₂₃ R ²	$X_{12}^{2}$	Δ ₁₂ R ²	$X_{23}^{2}$	Δ ₂₃ R ²
	%		%
q1 -the worst pain	1^*	A	33	A	0.6^*	A	33	A	3^*	A	4	A
q2 -any trouble dressing yourself	3.1^*	A	15	A	55	A	98	A	2^*	A	57	A
q3 -any trouble using transport	86	A	21	A	99	A	55	A	19	A	57	A
q4 -use a knife and fork	77	A	20	A	72	A	0.3^*+	B	47	A	85	A
q5 -do the household shopping	0.4^*+	B	49	A	46	A	44	A	43	A	40	A
q6 -carry a tray	2^*	A	59	A	62	A	25	A	19	A	37	A
q7 -brush/comb your hair	3.2^*	A	77	A	40	A	58	A	69	A	58	A
q8 -describe the pain you usually	55	A	89	A	98	A	48	A	87	A	62	A
q9 -hang your clothes up	48	A	77	A	69	A	1^*	A	89	A	19	A
q10 -to wash under both arms	15	A	59	A	54	A	22	A	13	A	92	A
q11 -pain interfered with the work	8	A	43	A	29	A	1^*	A	14	A	<0.1^*+	B
q12 -pain in bed at night	24	A	68	A	95	A	31	A	94	A	45	A

Q: question; OSS: Oxford shoulder score; DIF: differential item functioning; ^*p-value <5%; ⁺p-value <0.4% (Bonferroni-adjusted type I error rate with α = 5%/12 = 0.4%). X²: likelihood ratio X² test comparing the nested models, with p-value in percent (%). $X_{12}^{2}$: comparison of Model 1 (explanatory variable) versus Model 2 (explanatory variable + vector of group identifiers). $X_{23}^{2}$: comparison of Model 2 (explanatory variable + vector of group identifiers) versus Model 3 (explanatory variable + vector of group identifiers + explanatory variable × vector of group identifiers). ΔR²: magnitude of DIF based on the effect size measure from McFadden’s pseudo-R². Interpretation of magnitude: A = ΔR² <0.035 (negligible); B = ΔR² between 0.035 and 0.07 (moderate); C = ΔR² ≥0.07 (large). Δ₁₂R²: change in McFadden’s pseudo-R² between Model 1 and Model 2; Δ₂₃R²: change in McFadden’s pseudo-R² between Model 2 and Model 3.
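The decision rule used to read Table 4 combines a Bonferroni-adjusted significance level with the effect-size classes A, B and C. A minimal Python sketch of this rule is given below; the function names are illustrative, and the p-values and ΔR² values in the example are hypothetical rather than taken from the study.

```python
def bonferroni_alpha(alpha=0.05, n_items=12):
    """Per-item significance level after Bonferroni adjustment."""
    return alpha / n_items


def dif_magnitude(delta_r2):
    """Classify a change in McFadden's pseudo-R2 with the cut-offs of Table 4."""
    if delta_r2 < 0.035:
        return "A (negligible)"
    if delta_r2 < 0.07:
        return "B (moderate)"
    return "C (large)"


def dif_flag(p_value, delta_r2, alpha=0.05, n_items=12):
    """Return (significant after adjustment, magnitude class) for one test."""
    significant = p_value < bonferroni_alpha(alpha, n_items)
    return significant, dif_magnitude(delta_r2)


print(round(bonferroni_alpha() * 100, 1))  # adjusted level in percent
print(dif_flag(0.001, 0.05))  # hypothetical test result
print(dif_flag(0.03, 0.01))   # hypothetical test result
```

The adjusted level is 0.05/12 ≈ 0.42%, reported as 0.4% in the table footnote, so a test with a p-value between 0.4% and 5% is marked with ^* only and is not counted as DIF.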

4 Discussion

According to the RM, the OSS scale appears to be a valid measurement for shoulder disorders without unstable lesion among HCW. However, when used during occupational medical visits among HCW (a non-surgical population), it was unable to discriminate between subjects with the lowest levels of shoulder impairment.

The strengths of this study are that there were no missing data and that the response rate was extremely high. Also, our study is based on precise medical information. All the eligible subjects during the study period underwent a clinical assessment and the diagnosis was confirmed by at least one radiological examination in 90% of cases. Note that the OSS was filled in by all the HCW (a homogeneous population) under similar conditions (location, explanations, paper support) and supervised by the same occupational physician. This limited measurement errors. However, this study also presents several limitations. First, the HCW were recruited from a single center and may not be representative of the entire health-care population in France (few men were recruited). Second, although according to some authors sample sizes as small as 100 are often adequate for estimating stable Rasch-model parameters [30–32], the sample size remains relatively small in this study. Some results of this study therefore need to be confirmed by another study with a larger sample.

To our knowledge, no other publication has applied the RM to the OSS and no other study has been conducted in the workplace, as noted by several authors [3, 9], so we have no comparable results against which to set ours. Apart from the results from the RM, several results of our study were consistent with other studies reporting the OSS as reliable, valid, and responsive [3, 22]. Our results confirm a unidimensional structure (see Fig. 2) and good internal consistency with a Cronbach’s alpha of 0.9 (see Table 3). As expected, the overall mean score for the OSS in our non-surgical population (mean at 16.1, SD±9.41, range from 1 to 35) was markedly lower than those observed in surgical populations such as in one French study [7] (mean at 32.7, SD±10.29, range from 9 to 48) or in one British study [33] (mean at 33.0, range from 31.3 to 34.8).

Our study has several clinical implications. The RM is considered a relatively new method presenting some advantages over CTT [10, 11]. Its application verifies several key assumptions, including the unidimensionality of the latent trait, monotonicity, local independence, and stability of measurement [10, 11]. In this study, the results from the PCA of the standardized residuals argue for one dimension, confirming a single common characteristic (latent trait). With respect to monotonicity, the visual check in the person-item map shows that the response probabilities are arranged in ascending order, concordant with the categories of the items. Item performance remains stable across the subgroups examined in this study (see DIF measures). It is reasonable to conclude that local independence exists in the data set. Therefore, all the key assumptions mentioned above were verified. Our results reinforce the information about the reliability and validity of the OSS, notably at the individual patient level.

The distribution of the respondents along the scale is another important consideration, since it shows how well the scale is calibrated to the sample [10, 11]. Although we did not observe a ceiling effect, our analysis of scale-to-sample targeting revealed that targeting of the scale was substantially better for subjects with the highest levels of impairment than for those with the lowest levels. This limited the discrimination between HCW with mild shoulder disorders. In other words, the items of the OSS were too difficult in our study context. This limitation could be addressed by adding one or several items to the lower end of the scale.

On the basis of our results, several research perspectives exist in clinical practice. First, as pointed out by certain authors, further testing in a non-surgical population is needed [3]. Our study only partially answers this question because our population was highly selected (predominantly women working in a hospital). Second, we did not find normative data for the OSS in the published literature with which to compare our results [3, 34]. Such data are needed to improve the interpretation of the score for a given population (e.g., levels of severity, minimal significant clinical change) [3, 34]. Third, it should be recalled that the OSS reflects its developers’ specific view of disability. For the OSS, the choice was to capture joint-specific problems and to avoid the influence of co-morbidity [5, 35]. However, using the International Classification of Functioning, Disability and Health as reference, the OSS explores a limited range of domains related to disability (e.g., psychological and social functioning) [35–37]. Therefore, more studies are needed to investigate the place of the OSS questionnaire among other shoulder-related questionnaires in the context of return to work after shoulder disorders [35–39].

5 Conclusion

This study used the RM and confirmed that the OSS has good psychometric qualities but does not clearly discriminate between non-surgical subjects with shoulder disorders presenting the lowest levels of impairment, such as those seen in occupational medical visits. Its use in a non-surgical population is questionable.

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This study was declared to the French National Commission for Data Protection (no. 2219869) related to reference methodology (MR-004) and deposited in a public directory of studies at the National Institute of Health data. Approval from the Ethical Committee regarding the MR-004 methodology was not needed. All the volunteer participants gave their informed consent to be enrolled and the data were collected anonymously.

Footnotes

The supplementary material is available in the electronic version of this article.

References

1. Higginson, Carr. Measuring quality of life: Using quality of life measures in the clinical setting. BMJ. 2001 322(7297), 1297–300.

2. Varghese, Lamb, Rambani, Venkateswaran. The use of shoulder scoring systems and outcome measures in the UK. Ann R Coll Surg Engl. 2014 96(8), 590–2.

3. Angst, Schwyzer H-K, Aeschlimann, Simmen, Goldhahn. Measures of adult shoulder function. Arthritis Care & Research. 2011 63(11), S174–88.

4. Huang, Grant, Miller, Mirza, Gagnier. A systematic review of the psychometric properties of patient-reported outcome instruments for use in patients with rotator cuff disease. Am J Sports Med. 2015 43(10), 2572–82.

5. Dawson, Fitzpatrick, Carr. Questionnaire on the perceptions of patients about shoulder surgery. J Bone Joint Surg Br. 1996 78(4), 593–600.

6. Huber, Hofstaetter JG, Hanslik-Schnabel B, Posch M, Wurnig C. The German version of the Oxford shoulder score: cross-cultural adaptation and validation. Arch Orthop Trauma Surg. 2004 124(8), 531–6.

7. Tuton, Barbe, Salmon, Dramé, Nérot, Ohl. Transcultural validation of the Oxford shoulder score for the French-speaking population. Orthopaedics & Traumatology: Surgery & Research. 2016 102(5), 409–13.

8. Mürren, Vulcano, D’Angelo, Monti, Cherubino. Italian cross-cultural adaptation and validation of the Oxford shoulder score. J Shoulder Elbow Surg. 2010 19(3), 335–41.

9. Schmidt, Ferrer, González, Valderas, Alonso, Escobar, Vrotsou, EMPRO Group. Evaluation of shoulder-specific patient-reported outcome measures: A systematic and standardized comparison of available evidence. J Shoulder Elbow Surg. 2014 23(3), 434–44.

10. Bartolucci, Bacci S, Gnaldi. Statistical Analysis of Questionnaires: A Unified Approach Based on R and Stata. 1st ed. Boca Raton: Taylor & Francis Group; 2016.

11. Penta, Arnould, Decruynaere. Develop and interpret a measurement scale (applications of the Rasch model). 1st ed. Liège: Ed Pierre Mardaga; 2005.

12. Anderson, Oakman. Allied health professionals and work-related musculoskeletal disorders: A systematic review. Saf Health Work. 2016 7(4), 259–67.

13. Soylar, Ozer. Evaluation of the prevalence of musculoskeletal disorders in nurses: A systematic review. Med Sci. 2018 7(3), 479–85.

14. Occhionero, Korpinen, Gobba. Upper limb musculoskeletal disorders in healthcare personnel. Ergonomics. 2014 57(8), 1166–91.

15. Long, Johnston, Bogossian. Work-related upper quadrant musculoskeletal disorders in midwives, nurses and physicians: A systematic review of risk factors and functional consequences. Appl Ergon. 2012 43(3), 455–67.

16. Lin, Lin, Liu, Fang, Lin. Exploring the factors affecting musculoskeletal disorders risk among hospital nurses. PLoS ONE. 2020 15(4), e0231319.

17. Anderson, Oakman. Allied health professionals and work-related musculoskeletal disorders: A systematic review. Saf Health Work. 2016 7(4), 259–67.

18. Ribeiro, Serranheira, Loureiro. Work related musculoskeletal disorders in primary health care nurses. Applied Nursing Research. 2017 33, 72–7.

19. Ryu, Ye, Yi, Kim. Risk factors of musculoskeletal symptoms in university hospital nurses. Ann Occup Environ Med. 2014 26(1), 47.

20. Warming, Precht, Suadicani, Ebbehøj. Musculoskeletal complaints among nurses related to patient handling tasks and psychosocial factors, based on logbook registrations. Appl Ergon. 2009 40(4), 569–76.

21. Bos, Krol, van der Star, Groothoff. Risk factors and musculoskeletal complaints in non-specialized nurses, IC nurses, operation room nurses, and X-ray technologists. Int Arch Occup Environ Health. 2007 80(3), 198–206.

22.

Dawson

, Rogers

, Fitzpatrick

, Carr

. The Oxford Shoulder Score revisited. Arch Orthop Trauma Surg. 2009 129, 119–23.

23.

Dawson

, Fitzpatrick

, Carr

. The assessment of soulder instability: The development and validation of a questionnaire. The Journal of bone and joint surgery. British Volume. 1999 81(3), 420–6.

24.

McHorney

, Tarlov

. Individual-patient monitoring in clinical practice: Are available health status surveys adequate? Qual Life Res. 1995 4(4), 293–307.

25.

Linacre

. A User’s Guide to Winsteps: Rasch-Model Computer Programs—Program Manual 3.68.0. Chicago: MESA Press; 2009.

26.

Verguts

, De Boeck

. A note on the Martin-Leif test for unidimensionality. Methods of Psychological Research Online. 2000 5(1), 1–7.

27.

Christensen

, Makransky

, Horton

. Critical values for Yen’s Q Identification of local dependence in the Rasch Model using residual correlations. Appl Psychol Meas. 2016 41(3), 178–94.

28.

Choi

, Gibbons

, Crane

. Lordif: An R package for detecting differential item functioning using iterative Hybrid Ordinal Logistic Regression/Item Response Theory and Monte Carlo Simulations. J Stat Softw. 2011 39(8), 1–30.

29.

Mukaka

. Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012 24, 69–71.

30.

Linacre

. Sample size and item calibration stability. Rasch Mes Trans. 1994 7(4), 328.

31.

Nguyen

, Han

, Kim

, Chan

. An introduction to item response theory for patient-reported outcome measurement. Patient. 2014 7(1), 23–35.

32.

Edelen

, Reeve

. Applying item response theory (IRT) modeling to questionnaire development, evaluation, and refinement. Qual Life Res. 2007 16(1), 5–18.

33.

Dawson

, Hill

, Fitzpatrick

, Carr

. The benefits of using patient-based methods of assessment: Medium-term results of an observational study of shoulder surgery. J Bone Joint Surg Br. 2001 83(6), 877–82.

34.

van Kampen

, Willems

, van Beers

, Castelein

, Scholtes

, Terwee

. Determination and comparison of the smallest detectable change (SDC) and the minimal important change (MIC) of four-shoulder patient-reported outcome measures (PROMs). J Orthop Surg Res. 2013 8(1), 1–9.

35.

Roe

, Lundegaard Soberg

, Erik Bautz-Holter

, Ostensjo

. A systematic review of measures of shoulder pain and functioning using the International classification of functioning, disability and health (ICF). BMC Musculoskelet Disord. 2013 14(1), 1–12.

36.

Page

, Huang

, Verhagen

, Buchbinder

, Gagnier

. Identifyinga core set of outcome domains to measure inclinical trials for shoulder disorders: A modified Delphi study. RMDOpen. 2016 2(2), e000380.

37.

Roquelaure

, Ha

, Rouillon

, Fouquet

, Leclerc

, Descatha

, Members of Occupational Health Ser-vices of the Pays de la Loire Region. Risk factors for upper-extremity musculoskeletal disorders in the working population. Arthritis Rheum. 2009 61(10), 1425–34.

38.

Chiarottoa

, Ostelo

, Turkc

, Buchbinderd

, Boersa

. Core outcome sets for research and clinical practice. Braz J Phys Ther. 2017 21(2), 77–84.

39.

Gagnier

. Patient reported outcomes in orthopaedics. J Orthop Res. 2017 35(10), 2098–108.