Evaluating the validity of the Work Role Functioning Questionnaire (Canadian French version) using classical test theory and item response theory

Abstract

BACKGROUND:

The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers’ perceived ability to perform job demands and is used to monitor presenteeism. However, few studies of its validity have been published.

OBJECTIVE:

The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF).

METHODS:

Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT).

RESULTS:

A total of 352 completed questionnaires were analyzed. Four-factor and three-factor models were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, 9 of which were also identified with CTT.

CONCLUSIONS:

This study tested different models; fewer problematic items were found in the three-factor model. Using non-parametric IRT and CTT for item purification gave complementary results. IRT is still rarely used and can be a useful alternative method for enhancing the quality of a measurement instrument. More studies of the WRFQ-CF are needed to refine its items and factorial composition.

Keywords

Factor analysis, psychometrics, presenteeism

1 Introduction

In the field of work disability prevention research, little attention has been given to presenteeism (i.e., “attending work while ill” [1]) compared with absenteeism. Yet presenteeism has been found to be a prevalent problem that accounts for greater productivity losses and costs than absenteeism [1]. For example, prevalence rates between 45% and 64% have been reported in the working population [2, 3]. The cost of lost productivity due to common pain problems was estimated at US$61.2 billion per year [4]. The largest share of indirect costs associated with employees’ claims was attributed to presenteeism (63%; $311.8M), compared with absenteeism (6%; $27M) [5].

Although presenteeism is an important problem for organizations, few validated measurement instruments have been developed as yet. A systematic review analyzed 16 articles assessing 7 of these instruments and concluded that none demonstrated satisfactory results and that there is a lack of evidence to recommend one over another [6]. To better understand presenteeism and eventually develop effective strategies, researchers and organizations need to rely on valid instruments.

Currently, the Work Role Functioning Questionnaire (WRFQ) and the Work Limitations Questionnaire (WLQ) are among the most studied presenteeism instruments [6]. These self-administered questionnaires assess the perceived impact of a health problem on workers’ ability to perform their job [7]. Beyond turning up at work despite ill health (presenteeism), there is evidence that workers’ limitations in performing work demands have a negative impact on their work productivity [8]. Consequently, because presenteeism is related to work ability and productivity, the WRFQ and WLQ have been used as proxies for assessing presenteeism. The items in these questionnaires describe work demands chosen for their frequent occurrence in a variety of jobs and their importance from the workers’ perspective [7, 9]. Both questionnaires are grounded in the same conceptual framework, and their items were all drawn from the same pool [9]. They differ in the number of items, recall period, and response set [9]. Their popularity in several countries is reflected in the number of published cross-cultural adaptations: Canadian French [10], Dutch [11], Brazilian Portuguese [12], Turkish [13], and Spanish [14].

To our knowledge, very few studies of the factorial composition of the WRFQ and WLQ have been published. Factor analysis is a statistical method for identifying clusters of related variables and is important for establishing the validity of an instrument [15]. It contributes to construct validity by assessing the internal structure and cross-structure of the items of an assessment instrument [15]. It also supports content validity by providing valuable information for revising an instrument (e.g., identifying items that need rewording) [15]. Moreover, it can help in selecting instruments to be used as predictors (predictive validity) [15]. Existing studies of the WRFQ and the WLQ have reported contrasting results regarding their dimensional structure [6]. The objective of this study was to test the validity of one version of the WRFQ, the Canadian French version (WRFQ-CF), by examining its factorial composition and items.

In addition, the WRFQ has so far been studied using a Classical Test Theory (CTT) approach. CTT is the most popular measurement approach and has existed for more than a century [16]. A more recent approach, Item Response Theory (IRT), has also been proposed. The two approaches differ in their basic assumptions, orientation, information provided, and sample size requirements [17, 18]. It has been recommended that the two approaches be used together to provide a quantitative assessment of items and scales and to maximize the content validity of patient-reported outcome measures [19]. In this study, we therefore examined how IRT could contribute to the content validity of the WRFQ-CF by comparing its results with those of the CTT approach.

In summary, the main objective of this study was to test the validity of the WRFQ-CF and more specifically the following: (a) to determine the internal structure of the WRFQ-CF, (b) to identify problematic items using two measurement theory approaches (CTT and IRT), and (c) to compare the results obtained from the two measurement theory approaches.

2 Method

2.1 Participants

This study used a sample of data collected in an online survey of workers from a government agency in Quebec (Canada) that aimed to identify the determinants of work disability [20]. The survey was conducted among workers in regular or casual/temporary positions at the agency. Workers who had been on the job for less than six months were excluded to avoid recruiting those in the process of returning to work, who might therefore exhibit different characteristics. The survey was approved by the ethics committee of Hôpital Charles LeMoyne. More information on this survey is provided in Coutu et al. [20, 21].

We conducted a secondary analysis of this survey using the presenteeism data collected with the WRFQ-CF. Because this study aimed to assess the factorial composition and item responses of the WRFQ-CF, we included only participants for whom all items of the questionnaire were applicable to their job.

2.2 Measurement instrument

The WRFQ-CF includes a total of 27 items distributed across five subscales: work scheduling demands (items W1 to W5) (e.g., W1: Work the required number of hours), output demands (items O1 to O7) (e.g., O1: Handle the workload), physical demands (items P1 to P6) (e.g., P1: Walk or move around different work locations), mental demands (items M1 to M6) (e.g., M1: Keep your mind on your work), and social demands (items S1 to S3) (e.g., S1: Speak with people in-person, in meetings, or on the phone) [10]. The WRFQ items are rated on a five-level difficulty scale reflecting the amount of time a physical or emotional problem interfered with the ability to perform work demands in the past four weeks. The response options are: 0 = difficult all of the time (100%), 1 = difficult most of the time (75%), 2 = difficult half of the time (50%), 3 = difficult some of the time (25%), and 4 = difficult none of the time (0%). A “does not apply to my job” category is also available so that the questionnaire can be used with different types of jobs. According to Amick et al. [22], when more than 20% of the items are marked as not applicable (invalid items), the score of the questionnaire or of the subscale concerned cannot be calculated. Each subscale is scored separately by summing the item responses, dividing by the number of valid items, and multiplying by 25, which yields a score from 0 (always limited) to 100 (never limited) [10]. The total score is calculated in the same way (i.e., summing the responses to all items, dividing by the number of valid items, and multiplying by 25). A high score indicates few functional limitations in performing the work [10], and therefore low presenteeism.
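As an illustration of the scoring rule above, here is a minimal Python sketch. It assumes responses coded 0–4 as listed, with `None` standing for “does not apply to my job”; `wrfq_score` is a hypothetical helper, not part of any official WRFQ scoring material, and it applies the 20% rule to whatever set of items (one subscale or the whole questionnaire) it is given.

```python
from typing import Optional, Sequence

# Responses coded 0 (difficult all of the time) to 4 (difficult none of the time);
# None stands for "does not apply to my job" (an invalid item).


def wrfq_score(responses: Sequence[Optional[int]],
               max_not_applicable: float = 0.20) -> Optional[float]:
    """Return a 0-100 subscale (or total) score, or None when it cannot be calculated."""
    if any(r is not None and r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be 0-4 or None")
    valid = [r for r in responses if r is not None]
    n_not_applicable = len(responses) - len(valid)
    # No score if more than 20% of the items do not apply (rule attributed to Amick et al.).
    if not valid or n_not_applicable / len(responses) > max_not_applicable:
        return None
    # Sum of valid responses, divided by the number of valid items, times 25.
    return 25 * sum(valid) / len(valid)


print(wrfq_score([4, 3, 4, 2, 4]))        # all five items apply -> 85.0
print(wrfq_score([4, None, 4, 2, 4]))     # 1 of 5 (20%) not applicable, still scored -> 87.5
print(wrfq_score([4, None, None, 2, 4]))  # 2 of 5 (40%) not applicable -> None
```

Multiplying before dividing keeps common cases exact in floating point; the threshold is a parameter only so the sketch can be reused if a different rule is adopted.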

The Canadian French version of the WRFQ (WRFQ-CF) was created in 2004. Its measurement properties were assessed in one study of workers with a musculoskeletal disorder (n = 40), which showed acceptable construct validity and variable internal consistency across subscales (α coefficients ranging from 0.66 to 0.92) [10]. To our knowledge, no other study of the WRFQ-CF has been published.

2.3 Data analysis

In this study, two common measurement theory approaches were used: CTT and IRT.

2.3.1 Classical Test Theory (CTT)

CTT is a test-oriented approach based on the decomposition of observed scores into the sum of true and error scores [16]. In this study, the steps suggested by Churchill [23] were followed. First, the dimensionality of the WRFQ-CF was verified by generating a total variance explained table using unrotated maximum likelihood (ML) extraction. The internal consistency of each factor was then measured using alpha coefficients and item-total correlations. In general, factors with reliability coefficients below 0.70 should be avoided [24], and indicators with an item-total correlation below 0.50 should be eliminated [25]. Second, exploratory factor analysis (EFA) was used to identify poor items. Items were considered problematic if they met any of the following criteria: (1) low communality (h² below 0.40), (2) small factor loading (λ below 0.40), (3) a higher loading on another factor than on the intended factor, or (4) cross-loading, i.e., significant loadings on several factors [24, 27]. Third, confirmatory factor analysis (CFA) was conducted to test model fit. The following fit indices were analyzed: the Bentler Comparative Fit Index (CFI), the chi-square value, the Root Mean Square Error of Approximation (RMSEA), and the Standardized Root Mean Square Residual (SRMR). There is no consensus on the best index for assessing model fit; the indices chosen are among the most commonly used in the literature and are recommended by Kline [28]. The chi-square and SRMR are absolute fit indices, which assess how well the a priori model reproduces the data [29]. The CFI is an incremental fit index, which compares the target model with a baseline model in which the observed variables are uncorrelated [29]; the RMSEA, by contrast, does not involve a baseline model but expresses the model’s misfit per degree of freedom. To support good fit, the indices were interpreted using the following cut-off values: >0.90 for CFI, a non-significant chi-square (p > 0.05), and <0.05 for RMSEA and SRMR [30].
Alpha coefficients, item-total correlations, and EFA were computed in SPSS v20. CFA was performed in Lisrel v8.80.
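The reliability statistics in this step can be reproduced outside SPSS. The sketch below is an independent Python implementation of the standard formulas, not the SPSS procedure itself; it assumes complete data and that “item-total correlation” means the corrected item-total correlation (each item against the sum of the other items in its factor), which is what SPSS labels “Corrected Item-Total Correlation”. Function names are hypothetical, and the 0.70 and 0.50 cut-offs are those cited above.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence, Tuple

# items[j][i] is respondent i's score on item j (complete cases only).
Items = Sequence[Sequence[float]]


def cronbach_alpha(items: Items) -> float:
    """k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def corrected_item_total(items: Items) -> List[float]:
    """Correlation of each item with the sum of the other items of its factor."""
    out = []
    for j, col in enumerate(items):
        rest = [sum(resp) - resp[j] for resp in zip(*items)]
        out.append(pearson(col, rest))
    return out


def alpha_if_deleted(items: Items) -> List[float]:
    """Alpha recomputed with each item left out in turn (needs at least three items)."""
    return [cronbach_alpha([c for i, c in enumerate(items) if i != j])
            for j in range(len(items))]


def flag(items: Items, min_alpha: float = 0.70, min_r: float = 0.50) -> Tuple[bool, List[int]]:
    """(factor alpha below cut-off?, indices of items with item-total r below cut-off)."""
    r = corrected_item_total(items)
    return cronbach_alpha(items) < min_alpha, [j for j, rj in enumerate(r) if rj < min_r]
```

Applied to the item responses of one factor at a time, `flag` returns the two decisions used in Section 3.2.1 (for instance, a low alpha for the social demands factor and item-total correlations below 0.50).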

2.3.2 Item Response Theory (IRT)

IRT is a more recent measurement approach that provides information on the relationship between the items and the latent trait. It is therefore an item-oriented approach in which each item can be examined independently to determine its contribution to a test [18]. For IRT, the performance of the WRFQ-CF was tested on three aspects: option effectiveness (i.e., the effectiveness of response options and items at different levels of the latent trait), item bias (i.e., the extent to which different groups endorse items differently), and scale discriminability (i.e., the extent to which a scale can detect differences between persons at different levels of the latent trait) [31]. In this study, a nonparametric IRT analysis was performed using TestGraf, software that models responses with a nonparametric kernel-smoothing approach [32]. It generates the test information function (TIF), option characteristic curves (OCC), and item characteristic curves (ICC). The TIF is a measure of precision that indicates the amount of information in the test at various levels of the trait score [32]; the latent trait is estimated more precisely where the TIF values are large [32]. The OCC graphically represents the probability of a particular response option being endorsed at different levels of work ability (the latent trait) [32]. Because the WRFQ-CF uses a 5-point scale, five option curves are expected in the OCC of each item. In an OCC, the X- and Y-axes represent the latent trait (θ) and the probability of endorsing an option, respectively. For an item to be considered “good” in the WRFQ-CF, the probability of choosing option 4 (difficult none of the time) was expected to increase, and the probability of choosing option 0 (difficult all of the time) to decrease, as the latent trait increases.
Also, the probability of choosing options 1 to 3 was expected to rise and then fall along the latent trait, with lower options located further to the left on the X-axis and higher options further to the right. The ICC graphically represents the relationship between the expected item score and the expected total score (i.e., work ability) [32]. In the WRFQ-CF, the expected item score ranges from 0 to 4 (Y-axis of the ICC). Respondents low on the latent trait were expected to score 0 and those high on the trait to score 4; hence, the curve was expected to increase with the trait. The slope of the curve indicates how effectively an item discriminates between respondents at different levels of the latent trait: a steep curve means that the expected item score increases rapidly as a function of work ability, whereas a flat curve means that the item is less effective at discriminating between respondents at different levels of work ability [33].
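To make the kernel-smoothing idea concrete, the Python sketch below estimates OCCs and an ICC for a single item. It is not TestGraf: it assumes TestGraf-like choices (ability values taken as standard-normal quantiles of respondents’ ranks on the total score, a Gaussian kernel, and a default bandwidth of 1.1·N^(−1/5)), and it expresses the curves against θ rather than against the expected total score. The data and function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, List, Optional, Sequence


def rank_based_theta(total_scores: Sequence[float]) -> List[float]:
    """Map each respondent's total score to the standard-normal quantile of its rank.

    Ties are broken by input order (an assumption made for simplicity).
    """
    n = len(total_scores)
    order = sorted(range(n), key=lambda i: total_scores[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theta = [0.0] * n
    for rank, i in enumerate(order, start=1):
        theta[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return theta


def option_curves(responses: Sequence[int], theta: Sequence[float],
                  grid: Sequence[float], options: Sequence[int] = (0, 1, 2, 3, 4),
                  bandwidth: Optional[float] = None) -> Dict[int, List[float]]:
    """Gaussian-kernel (Nadaraya-Watson) estimate of P(response == k | theta) on a grid."""
    h = bandwidth if bandwidth is not None else 1.1 * len(theta) ** -0.2
    curves: Dict[int, List[float]] = {k: [] for k in options}
    for q in grid:
        weights = [math.exp(-0.5 * ((q - t) / h) ** 2) for t in theta]
        total = sum(weights)
        if total == 0.0:
            raise ValueError(f"no respondents near theta = {q}")
        for k in options:
            hits = sum(w for w, x in zip(weights, responses) if x == k)
            curves[k].append(hits / total)
    return curves


def item_curve(curves: Dict[int, List[float]]) -> List[float]:
    """Expected item score at each grid point: sum over k of k * P_k(theta)."""
    n_points = len(next(iter(curves.values())))
    return [sum(k * p[g] for k, p in curves.items()) for g in range(n_points)]


# Hypothetical data: 100 respondents, an item coded 0-4 that rises with the total score.
totals = list(range(100))
item = [t // 20 for t in totals]
theta = rank_based_theta(totals)
occ = option_curves(item, theta, grid=[-1.5, 0.0, 1.5])
icc = item_curve(occ)
print([round(v, 2) for v in icc])
```

Because each OCC value is a locally weighted proportion, the five option curves sum to 1 at every grid point, and the ICC is their expected value; an item whose ICC stays nearly flat across the grid corresponds to the weak items described in the IRT results.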

3 Results

3.1 Participants

A total of 352 participants were retained for the purpose of this study. As shown in Table 1, the participants were mainly women (66%), aged between 40 and 59 years (79%), married or common-law (69%), and had more than 10 years seniority at the agency (65%). Most workers had 1 to 5 days of work absence over the past year due to illness (56%).

Table 1
Sociodemographic characteristics of the participants of this study (n = 352)

Characteristics	Number (%)
Gender
Women	231 (65.6)
Men	121 (34.4)
Age
20–29 years old	12 (3.4)
30–39 years old	51 (14.5)
40–49 years old	126 (35.8)
50–59 years old	153 (43.5)
60–69 years old	10 (2.8)
≥70 years old	0 (0)
Marital status
Common-law	111 (31.5)
Divorced	35 (9.9)
Married	131 (37.2)
Separated	12 (3.4)
Single	59 (16.8)
Widowed	4 (1.1)
Number of work absence days over the past year
None	51 (14.5)
1-2 days	82 (23.3)
3–5 days	114 (32.4)
6–10 days	53 (15.1)
>10 days	52 (14.8)
Job seniority
<2 years	26 (7.4)
2 to 5 years	8 (2.3)
6 to 10 years	88 (25.0)
11 to 20 years	103 (29.3)
>20 years	127 (36.1)

3.2 Classical Test Theory (CTT) approach

3.2.1 Dimensionality and internal consistency

The eigenvalues of the first five factors ranged from 0.96 to 11.88, and the cumulative variance they explained ranged from 44.01% to 65.41%. Based on Kaiser’s rule, which retains factors with eigenvalues of 1.0 or greater, four factors were identified in this questionnaire [34]. However, a five-factor model was retained for the subsequent analyses because the eigenvalue of the fifth factor was close to 1.0, the five factors together explained over 65% of the total variance, and the a priori number of common factors in the literature on the WRFQ and WLQ is five.

Table 2 presents the alpha coefficients and item-total correlations for each factor of the WRFQ-CF. The alpha coefficient of the social demands factor was low (0.672), as were the item-total correlations of all three of its items (S1, S2, and S3). In addition, item M6 of the mental demands factor had an item-total correlation below 0.50. Hence, at this stage, four items were found problematic (M6, S1, S2, and S3), and the social demands factor was removed.

Table 2
Cronbach’s alpha, item-total correlations and alpha if item deleted of the WRFQ-CF (5 factors)

Factors	Items	Alpha	Item-total correlation	Alpha if item deleted
Work scheduling demands		0.844
	W1		0.673	0.806
	W2		0.658	0.813
	W3		0.649	0.816
	W4		0.584	0.830
	W5		0.716	0.797
Output demands		0.891
	O1		0.760	0.866
	O2		0.763	0.866
	O3		0.758	0.866
	O4		0.671	0.878
	O5		0.681	0.876
	O6		0.592	0.892
	O7		0.645	0.881
Physical demands		0.875
	P1		0.663	0.859
	P2		0.758	0.841
	P3		0.753	0.841
	P4		0.706	0.850
	P5		0.709	0.854
	P6		0.575	0.872
Mental demands		0.875
	M1		0.761	0.839
	M2		0.727	0.846
	M3		0.672	0.855
	M4		0.828	0.827
	M5		0.713	0.848
	M6		0.424	0.901
Social demands		0.672
	S1		0.476	0.587
	S2		0.482	0.580
	S3		0.494	0.563

3.2.2 Exploratory Factor Analysis (EFA)

Using the 23 remaining items, EFA was conducted using ML extraction with a fixed number of factors (four) and Direct Oblimin rotation. The four factors accounted for 66.41% of the total variance, and the correlations between factors ranged from 0.202 to 0.684, supporting the use of an oblique rotation. Examination of the factor loadings identified seven problematic items: items O1, O2, and O6 loaded more on the work scheduling demands factor (λ = –0.710, 0.451, and –0.477, respectively) than on their intended output demands factor (λ = 0.307, 0.382, and 0.135, respectively); item M3 loaded more on the output demands factor (λ = 0.643) than on its intended mental demands factor (λ = –0.250); item M2 cross-loaded on the output demands factor (λ = 0.331) and its intended mental demands factor (λ = 0.492); item O7 had a small loading on its intended output demands factor (λ = –0.338); and item W4 had low communality (h² = 0.392). After purification with EFA, the final four-factor model comprised 16 items. In this model, the four factors accounted for 70.97% of the total variance, and the correlations between factors ranged from 0.392 to 0.628.
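The four EFA criteria can be written as a small rule-based check. The Python sketch below applies them to the loadings quoted above for items O1, M2, and M3; it compares absolute loadings (an assumption, since the text treats –0.710 as a higher loading than 0.307), and the 0.30 cut-off for a “significant” secondary loading is a hypothetical value because the text does not give one. `efa_flags` is a hypothetical helper.

```python
from typing import Dict, List, Optional


def efa_flags(intended: str, loadings: Dict[str, float],
              communality: Optional[float] = None,
              min_h2: float = 0.40, min_loading: float = 0.40,
              cross_cutoff: float = 0.30) -> List[str]:
    """Reasons an item is problematic under the EFA criteria of Section 2.3.1.

    `loadings` maps factor name -> pattern loading and must include `intended`.
    `cross_cutoff` is a hypothetical threshold for a "significant" secondary loading.
    """
    reasons = []
    own = abs(loadings[intended])
    others = [abs(v) for f, v in loadings.items() if f != intended]
    if communality is not None and communality < min_h2:
        reasons.append("low communality")
    if own < min_loading:
        reasons.append("small loading on intended factor")
    if others and max(others) > own:
        reasons.append("loads more on another factor")
    elif any(v >= cross_cutoff for v in others):
        reasons.append("cross-loading")
    return reasons


# Loadings reported in the text for the four-factor EFA (other loadings not reported).
print(efa_flags("output", {"output": 0.307, "work_scheduling": -0.710}))  # item O1
print(efa_flags("mental", {"mental": 0.492, "output": 0.331}))            # item M2
print(efa_flags("mental", {"mental": -0.250, "output": 0.643}))           # item M3
```

Because only a few loadings per item are reported, the example cannot reproduce the full pattern matrix; it only shows that the stated criteria single out the same items for the stated reasons.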

Since several items cross-loaded between the work scheduling and output demands factors, we conducted another EFA with three factors (ML extraction, Direct Oblimin rotation). Only two items were found problematic, because of cross-loading on the combined work scheduling/output demands factor (λ of M2 = 0.416; λ of M3 = 0.458) relative to the intended mental demands factor (λ of M2 = –0.446; λ of M3 = –0.338). Fewer items were removed than in the four-factor model because the five problematic items from the work scheduling and output demands factors showed no anomalies when three factors were used. Hence, EFA yielded a three-factor model with 21 items. The three factors accounted for 62.18% of the total variance, and the correlations between factors ranged from 0.370 to 0.726.

3.2.3 Confirmatory Factor Analysis (CFA)

CFA was performed in Lisrel to assess the four-factor and three-factor model fits. The factor loadings and fit indices of these models are presented in Table 3. Among the four indices presented in Table 3, two showed a good model fit.

Table 3
Standardized estimates of confirmatory factor model

Items	Four-factor model (16 items)		Three-factor model (21 items)
	λ estimate	S.E.	λ estimate	S.E.
Work scheduling demands
W1	0.75	0.05	0.68	0.05
W2	0.72	0.05	0.66	0.05
W3	0.72	0.05	0.70	0.05
W4	–	–	0.61	0.05
W5	0.80	0.05	0.79	0.05
Output demands
O1	–	–	0.82	0.04
O2	–	–	0.82	0.04
O3	0.80	0.05	0.79	0.05
O4	0.79	0.05	0.70	0.05
O5	0.76	0.05	0.66	0.05
O6	–	–	0.64	0.05
O7	–	–	0.66	0.05
Physical demands
P1	0.72	0.05	0.72	0.05
P2	0.79	0.05	0.79	0.05
P3	0.82	0.05	0.82	0.05
P4	0.78	0.05	0.78	0.05
P5	0.75	0.05	0.75	0.05
P6	0.62	0.05	0.62	0.05
Mental demands
M1	0.84	0.04	0.84	0.04
M4	0.93	0.04	0.92	0.04
M5	0.80	0.05	0.80	0.05
Goodness-of-fit estimates:	CFI = 0.97	CFI = 0.97
	RMSEA = 0.080	RMSEA = 0.084
	SRMR = 0.050	SRMR = 0.057
	χ² = 306.04 (df = 98; p < 0.01)	χ² = 609.91 (df = 186; p < 0.01)

To improve the goodness-of-fit estimates (particularly the chi-square and RMSEA), the two items with the largest negative and positive standardized residuals, P2 and P4, were removed from the model. After their removal, the four-factor model (14 items) reached satisfactory fit: RMSEA indicated acceptable fit (0.06), and SRMR (0.04) and CFI (0.98) indicated good fit. Although the chi-square value dropped from 306.04 to 167.36, its p-value remained significant (p < 0.01) and thus did not support model fit.

For the three-factor model, four items were removed (W1, O3, O4, and P4) to obtain acceptable fit on all indices except the chi-square (RMSEA = 0.059; SRMR = 0.048; CFI = 0.98; χ² = 264.93, df = 116, p < 0.01). Hence, good fit was obtained for the three-factor model with 17 items.
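RMSEA, and CFI when the baseline chi-square is known, can be recomputed from the statistics reported above with the usual point-estimate formulas. The Python sketch below is illustrative: the baseline (independence) model chi-square needed for CFI is not reported in this study, so `cfi` is shown without study data, and both function names are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    """Bentler CFI: 1 - max(chi2 - df, 0) / max(chi2_base - df_base, chi2 - df, 0)."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


# Chi-square values and degrees of freedom from Table 3 and the text; n = 352.
for label, chi2, df in [("four-factor, 16 items", 306.04, 98),
                        ("three-factor, 21 items", 609.91, 186),
                        ("three-factor, 17 items", 264.93, 116)]:
    print(f"{label}: RMSEA = {rmsea(chi2, df, 352):.3f}")
```

This gives about 0.078, 0.081, and 0.060, close to but not identical with the 0.080, 0.084, and 0.059 reported. One plausible reason (an assumption; the text does not say which statistic was used) is that LISREL may base its RMSEA on a different chi-square statistic than the one printed in Table 3.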

3.3 Item Response Theory (IRT) approach

Before the analysis in TestGraf, the unidimensionality of each factor of the WRFQ-CF was verified using unrotated principal component analysis (PCA) in SPSS. For every dimension, the first factor accounted for more than 50% of the total variance (60.39% to 62.28%), its eigenvalue exceeded 1.0 (1.81 to 4.32), and the ratio of the first to the second eigenvalue was high (2.98:1 to 7.22:1), which supported unidimensionality [35].
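The three unidimensionality checks can be sketched in Python with NumPy as a principal component analysis of the item correlation matrix. This assumes complete data for one subscale; the simulated data and the function name are hypothetical, and the first-to-second eigenvalue ratio is reported rather than tested because the text does not state a cut-off.

```python
import numpy as np


def unidimensionality_summary(data: np.ndarray) -> dict:
    """PCA on the item correlation matrix of one subscale (rows = respondents, cols = items)."""
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("need a respondents x items matrix with at least two items")
    corr = np.corrcoef(data, rowvar=False)
    eig = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order; reverse it
    first, second = eig[0], eig[1]
    share = first / eig.sum()             # eig.sum() equals the number of items
    return {
        "first_eigenvalue": float(first),
        "first_share": float(share),
        "first_to_second": float(first / second),
        "kaiser_count": int(np.sum(eig >= 1.0)),
        # The two explicit criteria in the text: > 50% of variance and eigenvalue > 1.0.
        "meets_criteria": bool(share > 0.50 and first > 1.0),
    }


# Hypothetical simulated subscale: one common factor plus noise, 500 respondents, 4 items.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
items = factor + 0.5 * rng.normal(size=(500, 4))
summary = unidimensionality_summary(items)
print(summary)
```

With one strong common factor, the first eigenvalue dominates and only one component passes Kaiser’s rule, which is the pattern the reported percentages and ratios describe.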

Appendix 1 presents the option characteristic curves (OCC) and item characteristic curves (ICC) for all items of the WRFQ-CF. Visual inspection of these curves showed that options 3 and 4 dominated for all items and that the probability of endorsing option 0, 1, or 2 was below 45%, except for items P5 and M1. A total of 13 items were identified as weak because the slope of their ICC was flat and options 3 and 4 dominated (nearly) the whole range of the latent trait in the OCC: factor 1 (items W3 and W4); factor 2 (items O4, O5, and O7); factor 3 (items P1, P4, and P6); factor 4 (items M3 and M6); and factor 5 (items S1, S2, and S3).

3.4 Comparison between the results of CTT and IRT

A total of 13 weak items were identified with the four-factor CTT model, 10 with the three-factor CTT model, and 13 with the IRT approach. As shown in Table 4, six items were identified by all three: one physical demands item (P4), two mental demands items (M3 and M6), and all three social demands items. In addition, three items were identified by IRT and by one of the two CTT models: one work scheduling demands item (W4) and two output demands items (O4 and O7). Seven items were identified only with CTT, and four only with IRT.
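The overlaps described above follow from simple set operations on the item codes marked in Table 4; the Python sketch below (variable names are ours) reproduces the counts.

```python
# Problematic items per approach, transcribed from Table 4.
ctt_4f = {"W4", "O1", "O2", "O6", "O7", "P2", "P4", "M2", "M3", "M6", "S1", "S2", "S3"}
ctt_3f = {"W1", "O3", "O4", "P4", "M2", "M3", "M6", "S1", "S2", "S3"}
irt = {"W3", "W4", "O4", "O5", "O7", "P1", "P4", "P6", "M3", "M6", "S1", "S2", "S3"}

ctt_any = ctt_4f | ctt_3f                 # flagged by at least one CTT model
in_all_three = ctt_4f & ctt_3f & irt      # flagged by every approach
common = ctt_any & irt                    # flagged by IRT and by some CTT model
irt_and_one_ctt = common - in_all_three   # IRT plus exactly one CTT model
ctt_only = ctt_any - irt
irt_only = irt - ctt_any

for name, s in [("all three", in_all_three), ("IRT + one CTT model", irt_and_one_ctt),
                ("CTT and IRT", common), ("CTT only", ctt_only), ("IRT only", irt_only)]:
    print(f"{name}: {len(s)} -> {sorted(s)}")
```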

Table 4
Comparison of problematic items identified with Classical test theory (CTT) and Item response theory (IRT)

Items	Wording (French; English in parentheses)	CTT 4-factor model	CTT 3-factor model	IRT
W1	1. Travailler le nombre d’heures demandé (Work the required number of hours)		✓
W2	2. Commencer votre journée de travail avec entrain (Get going easily at the beginning of the workday)
W3	3. Commencer à travailler dès votre arrivée au travail (Start on your job as soon as you arrived at work)			✓
W4	4. Faire votre travail sans prendre une pause supplémentaire (Do your work without stopping to take extra breaks or rests)	✓		✓
W5	5. Maintenir une routine ou un horaire régulier (Stick to a routine or schedule)
O1	6. Assumer votre charge de travail (Handle the workload)	✓
O2	7. Avoir un rythme de travail suffisant (Work fast enough)	✓
O3	8. Terminer le travail à temps (Finish work on time)		✓
O4	9. Faire votre travail sans faire d’erreur (Do your work without making mistakes)		✓	✓
O5	10. Satisfaire les personnes qui jugent votre travail (Satisfy the people who judge your work)			✓
O6	11. Sentir que vous vous accomplissez dans votre travail (Feel a sense of accomplishment in your work)	✓
O7	12. Avoir l’impression que vous avez fait ce dont vous étiez capable de faire (Feel you have done what you are capable of doing)	✓		✓
P1	13. Marcher ou se déplacer dans différents endroits de travail (Walk or move around different work locations)			✓
P2	14. Lever, transporter ou déplacer des objets de plus de 10 livres au travail (Lift, carry, or move objects at work weighing more than 10 pounds)	✓
P3	15. Rester assis, debout ou dans la même position plus de 15 minutes en travaillant (Sit, stand, or stay in one position for longer than 15 minutes while working)
P4	16. Répéter les mêmes mouvements à de nombreuses reprises en travaillant (Repeat the same motions over and over again while working)	✓	✓	✓
P5	17. Travailler penché, en torsion ou en s’étirant (Bend, twist, or reach while working)
P6	18. Utiliser des outils ou des équipements à l’aide de vos mains (Use hand-held tools or equipment)			✓
M1	19. Maintenir votre attention sur votre travail (Keep your mind on your work)
M2	20. Planifier et organiser efficacement votre travail (Think clearly when working)	✓	✓
M3	21. Travailler avec soin (Do work carefully)	✓	✓	✓
M4	22. Se concentrer sur votre travail (Concentrate on your work)
M5	23. Travailler sans perdre le fil de vos idées (Work without losing your train of thought)
M6	24. Lire ou utiliser vos yeux en travaillant (Easily read or use your eyes when working)	✓	✓	✓
S1	25. Parler avec les gens en personne, en réunion ou au téléphone (Speak with people in-person, in meetings or on the phone)	✓	✓	✓
S2	26. Maîtriser votre humeur en présence d’autres personnes pendant le travail (Control your temper around people when working)	✓	✓	✓
S3	27. Aider les autres pour que le travail soit fait (Help other people to get work done)	✓	✓	✓

4 Discussion

To our knowledge, the measurement properties of the WRFQ-CF had been tested in only one previous study [10]. To contribute to its improvement, this study aimed to test the validity of the WRFQ-CF by assessing its items and factorial composition. Two measurement theory approaches (CTT and IRT) were used, and they gave similar and complementary results, with nine weak items identified by both. Three-factor and four-factor models were tested, and they flagged different work scheduling and output demands items as problematic. The three-factor model showed good fit with fewer problematic items.

To date, few studies of the factorial structure of the WRFQ or its alternate forms have been published, and the number of factors identified has ranged from three to five. The original article on the WLQ-25 identified four distinct factors: time, physical, mental-interpersonal, and output demands [9]. Tang et al. [36] compared four- and five-factor models and found that both showed acceptable goodness-of-fit indices, with the five-factor model performing better. Walker, Michaud, and Wolfe [37] carried out a factor analysis of the WLQ-25 and extracted three factors (eigenvalues greater than 1.0), with one predominant factor explaining 77% of the variance. Several papers have combined the social and mental demands dimensions [9, 37]. In this study, however, all items of the social demands factor were found problematic with both measurement theory approaches, which prevented us from testing a five-factor model. Yet the social dimension plays a salient role in presenteeism. For example, studies have found that workers with a high level of support and integration at work are more likely to attend work while ill than those with poor social support and integration [1]. We recommend that the social demands items be revised and further tested to improve their discrimination.

In this study, the work scheduling and output demands subscales were correlated. In the four-factor model, three output demands items (O1, O2, and O6) loaded more strongly on the work scheduling demands factor. Moreover, all but one of the output demands items were found weak by at least one measurement theory approach. These results suggest that the two factors could be combined. A study of the Dutch version of the WRFQ-27 also combined the work scheduling and output demands factors and proposed a new version composed of four subscales: (1) work scheduling and output demands, (2) physical demands, (3) mental and social demands, and (4) flexibility demands [47]. The two factors share similar characteristics. Work scheduling demands were defined as the “worker’s needs to manage the workday from beginning to end” and output demands as “activities related to completing work on time, with high quality and to everyone’s (including the worker’s) satisfaction” [38]. Both subscales involve a time component, which might explain the correlation between the factors. These items might also relate to control over tasks, which has been found to be an important risk factor for presenteeism [39].
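Detecting items such as O1, O2, and O6, which load more on another factor than on their hypothesized one, amounts to comparing each item's hypothesized factor with the factor carrying its largest absolute pattern loading. A minimal sketch in plain Python; the loading values and the item O3 are hypothetical (the study's loadings are not reproduced here), and `misassigned_items` is an illustrative helper name.

```python
def misassigned_items(loadings, hypothesized):
    """Return (item, hypothesized factor, strongest factor) for each item whose
    largest absolute loading falls on a factor other than the hypothesized one."""
    flagged = []
    for item, row in loadings.items():
        strongest = max(row, key=lambda factor: abs(row[factor]))
        if strongest != hypothesized[item]:
            flagged.append((item, hypothesized[item], strongest))
    return flagged


# Hypothetical pattern loadings for four output demands items (illustration only).
loadings = {
    "O1": {"scheduling": 0.55, "output": 0.30},
    "O2": {"scheduling": 0.48, "output": 0.35},
    "O3": {"scheduling": 0.15, "output": 0.70},
    "O6": {"scheduling": 0.50, "output": 0.41},
}
hypothesized = {item: "output" for item in loadings}

for item, expected, observed in misassigned_items(loadings, hypothesized):
    print(f"{item}: hypothesized {expected}, loads most on {observed}")
```

With these made-up values, O1, O2, and O6 are reported and O3 is not.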

Item purification was based on exploratory and confirmatory factor analysis as well as on IRT. For the confirmatory factor analysis, we used four fit indices. Some may argue that one or two indices are enough, while others suggest combining different indices. Since there is no consensus in the literature and several conditions may influence the performance of each fit index [40], we preferred a combination of several types of indices to provide a better picture of model fit, following those proposed by Kline [28]. In this study, the chi-square test did not support model fit. The chi-square statistic usually provides a reasonable measure of fit when the sample size is between 75 and 200 [41]; because the statistic increases with sample size, it is often statistically significant in larger samples [41]. Hence, the significant chi-square might be explained by the sample size of this study (n = 352).
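The sample-size sensitivity of the chi-square test, compared with RMSEA, can be illustrated with a few lines of arithmetic. The sketch below is hypothetical throughout: it assumes a model with 74 degrees of freedom whose population misfit corresponds to an RMSEA of 0.05, and it uses the expected value of the (noncentral) chi-square statistic, df + (n - 1)·F0, rather than simulated or study data. The Wilson-Hilferty cube-root approximation stands in for an exact chi-square tail probability, and some programs use n instead of n - 1 in the RMSEA formula.

```python
import math
from statistics import NormalDist


def chi2_upper_tail(t, df):
    """Approximate P(X > t) for X ~ chi-square(df) using the
    Wilson-Hilferty cube-root normal approximation."""
    c = 2.0 / (9.0 * df)
    z = ((t / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 1.0 - NormalDist().cdf(z)


def rmsea(t, df, n):
    """Point estimate of RMSEA from the model chi-square, df, and sample size."""
    return math.sqrt(max(t - df, 0.0) / (df * (n - 1)))


DF = 74                     # hypothetical model degrees of freedom
POP_RMSEA = 0.05            # hypothetical population misfit ("close" fit)
F0 = POP_RMSEA ** 2 * DF    # population discrepancy implied by that RMSEA

results = {}
for n in (100, 200, 352, 1000):
    # Expected chi-square for a fixed population misfit grows linearly with n.
    t = DF + (n - 1) * F0
    results[n] = (t, chi2_upper_tail(t, DF), rmsea(t, DF, n))
    print(f"n={n:4d}  chi2={t:6.1f}  p={results[n][1]:.4f}  RMSEA={results[n][2]:.3f}")
```

For the same degree of misfit, the p-value falls below 0.05 once n reaches a few hundred, while the RMSEA estimate stays at 0.05, which is why the chi-square test is usually interpreted alongside other indices.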

In this study, the number of problematic items ranged from 10 (three-factor model, EFA/CFA) to 13 (four-factor model, EFA/CFA, and IRT). If all items identified as problematic by either approach were removed, most factors would be left with fewer than three items. Using fewer than three items per factor is not recommended for patient-reported outcomes, since such factors are weakly defined and may not properly assess the construct [24]. Revision of the weak items should be considered, especially the nine items found problematic by both approaches (Table 4). The results of this study could also inform the development of a short version of the WRFQ-CF. Short versions of the WRFQ and WLQ, such as the WRFQ-15, WLQ-16, and WLQ-8, have been reported in the literature [6, 42].
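The three-items-per-factor check can be expressed as a small function that drops flagged items and reports the factors left below the minimum. A minimal sketch in plain Python: the factor sizes (5, 7, 6, 6, and 3 items) follow Appendix 1, but the item codes other than the O- and S- prefixes and the flagged set are hypothetical, not the items listed in Table 4.

```python
MIN_ITEMS_PER_FACTOR = 3


def underidentified_factors(factors, flagged, minimum=MIN_ITEMS_PER_FACTOR):
    """For each factor left with fewer than `minimum` items after removing
    flagged items, return the items that would remain."""
    result = {}
    for name, items in factors.items():
        remaining = [item for item in items if item not in flagged]
        if len(remaining) < minimum:
            result[name] = remaining
    return result


# Factor sizes as in Appendix 1; W, P, and M item codes are hypothetical.
factors = {
    "work scheduling": [f"W{i}" for i in range(1, 6)],
    "output": [f"O{i}" for i in range(1, 8)],
    "physical": [f"P{i}" for i in range(1, 7)],
    "mental": [f"M{i}" for i in range(1, 7)],
    "social": [f"S{i}" for i in range(1, 4)],
}
# Hypothetical union of items flagged by either approach (not Table 4).
flagged = {"W2", "W4", "W5", "O1", "O2", "O3", "O6", "O7", "M2", "S1", "S2", "S3"}

print(underidentified_factors(factors, flagged))
# {'work scheduling': ['W1', 'W3'], 'output': ['O4', 'O5'], 'social': []}
```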

This study used IRT to test items. To our knowledge, no other study on the WRFQ has used this approach, and in the field of work disability, IRT is used less frequently than CTT (e.g., [43–45]). Yet, IRT offers some advantages over CTT. For example, IRT provides information on the relationship between each item and the latent trait, so each item can be examined independently to determine its contribution to a test [18]. This allows different tests measuring the same ability to be compared [18], an advantage over CTT, which requires test forms to be parallel before their scores can be compared [17]. In IRT, items from different tests can be placed on a common scale, which enables comparison of the difficulty of the tests as well as the development of item banks [18]. In this study, non-parametric IRT was useful to corroborate the results obtained with CTT: nine items were found problematic by both approaches. IRT also gave complementary results by identifying four additional items. This difference may be explained by the level of analysis of each approach: CTT focuses on test-level information (comparing the items to the entire scale), whereas IRT focuses on the item level [18]. The IRT results can guide the choice of items and identify those that need revision.
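In the spirit of Ramsay's kernel-smoothing approach to non-parametric IRT [32], an item characteristic curve can be estimated by ranking respondents on their total score, converting the ranks to standard-normal quantiles, and kernel-smoothing each item's scores over those quantiles. The sketch below is a minimal plain-Python illustration under those assumptions, not TestGraf's code: the simulated data, the bandwidth h = 1.1·N^(-1/5), random tie-breaking, and all function names are illustrative assumptions. A flat curve, one of the weaknesses used to flag items in this study, appears as an estimate that barely changes across trait levels.

```python
import math
import random
from statistics import NormalDist


def rank_based_theta(totals, rng):
    """Convert total scores to standard-normal quantiles of their ranks,
    breaking ties at random."""
    n = len(totals)
    order = sorted(range(n), key=lambda a: (totals[a], rng.random()))
    theta = [0.0] * n
    for rank, a in enumerate(order, start=1):
        theta[a] = NormalDist().inv_cdf(rank / (n + 1))
    return theta


def smoothed_icc(theta, item_scores, q, h):
    """Gaussian-kernel (Nadaraya-Watson) estimate of the mean item score at trait level q."""
    weights = [math.exp(-0.5 * ((q - t) / h) ** 2) for t in theta]
    return sum(w * y for w, y in zip(weights, item_scores)) / sum(weights)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical simulated data: 19 items that discriminate along the trait and
# one "flat" item answered at random, for 2000 respondents.
rng = random.Random(2024)
N_RESPONDENTS, N_GOOD = 2000, 19
true_theta = [rng.gauss(0.0, 1.0) for _ in range(N_RESPONDENTS)]
good = [[int(rng.random() < logistic(2.0 * t)) for t in true_theta] for _ in range(N_GOOD)]
flat = [int(rng.random() < 0.5) for _ in true_theta]

totals = [sum(responses) for responses in zip(*good, flat)]
theta = rank_based_theta(totals, rng)
h = 1.1 * N_RESPONDENTS ** -0.2  # assumed bandwidth rule of thumb

curves = {}
for label, scores in (("discriminating", good[0]), ("flat", flat)):
    curves[label] = [smoothed_icc(theta, scores, q, h) for q in (-1.5, 0.0, 1.5)]
    print(label, [round(p, 2) for p in curves[label]])
```

The discriminating item's curve rises steeply from low to high trait levels, whereas the flat item's curve stays close to 0.5; in practice, curves would be inspected graphically over the whole trait range rather than at three points.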

Differences between previous studies on the WRFQ (and its alternate forms) and this study might explain the discrepancies in the number of factors identified and the items retained. First, factorial validation studies of the WRFQ have mainly been conducted with workers with musculoskeletal disorders, whereas in this study workers were not recruited on the basis of a specific diagnosed health problem. However, in this organization, distress was found in 62% of the sample, including 41% with high levels of psychological distress [21]. Other studies have also used the WRFQ with the general working population, which supports the need for validation studies in this population [46–48]. Our study participants were from the service sector, which is a sector at risk of presenteeism [2]. Moreover, presenteeism is not limited to workers with musculoskeletal disorders and affects a large proportion of the workforce [49]. Other health conditions, such as allergies, diabetes, and headaches, were among the top conditions associated with productivity loss [50]. Documenting presenteeism in the general working population can help put in place appropriate approaches to reduce its impact. Second, the statistical methods differed. Among the few factorial validation studies identified, EFA (mainly PCA with orthogonal rotation [37, 47]) and CFA [36] were used. In this study, ML with oblique rotation was used. We did not use PCA because it analyzes the total variance of the observed variables and does not differentiate between common and unique variance [24]. Also, orthogonal rotation is usually not recommended when the constructs are expected to be correlated with one another [24]. In this study, oblique rotation was justified by the correlations between factors (ranging from 0.202 to 0.684).
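The difference between PCA and common factor analysis can be made concrete by comparing the matrices each one decomposes. A minimal sketch, assuming NumPy and a hypothetical two-cluster correlation matrix (not study data). Principal-axis factoring with squared multiple correlations as initial communalities is used here only because it shows the contrast in a few lines; it is not the ML extraction used in this study.

```python
import numpy as np


def squared_multiple_correlations(corr):
    """Initial communality estimates: 1 - 1 / diag(R^-1)."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(corr))


# Hypothetical 6-item correlation matrix (not study data).
corr = np.full((6, 6), 0.2)
corr[:3, :3] = 0.6
corr[3:, 3:] = 0.6
np.fill_diagonal(corr, 1.0)

# PCA: unities on the diagonal, so each item's total variance
# (common + unique) is analysed.
pca_eigenvalues = np.linalg.eigvalsh(corr)[::-1]

# Principal-axis style: communalities on the diagonal, so only the variance
# each item shares with the other items is analysed.
reduced = corr.copy()
np.fill_diagonal(reduced, squared_multiple_correlations(corr))
paf_eigenvalues = np.linalg.eigvalsh(reduced)[::-1]

print(round(pca_eigenvalues.sum(), 3))  # 6.0: total variance of 6 standardized items
print(round(paf_eigenvalues.sum(), 3))  # about 2.72: shared (common) variance only
```

The eigenvalues of the reduced matrix sum to the communalities rather than to the number of items, which is the sense in which common factor methods separate common from unique variance.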

This study has some limitations that need to be considered. First, the study population consisted mainly of office workers from one government agency; hence, the results cannot be generalized to the general working population. Second, we did not perform cross-validation, i.e., randomly splitting the sample into two groups to check whether the factor solutions can be replicated across groups [34]. Since the WRFQ-CF has 27 items and the communalities were moderate, a sample size of at least 200 was targeted [24]. The sample (n = 352) was large enough for factor analysis based on a subjects-to-variables ratio of 10:1 [34]. Third, we used TestGraf, a program for non-parametric IRT, for item purification. One main limitation of TestGraf is the lack of predefined cut-off values [51]; hence, the interpretation of the graphs it generates may rely on the subjective judgment of the researchers. However, in this study, there was no uncertainty or disagreement within our team during the visual inspection of the graphs, since all the identified items clearly showed weaknesses (e.g., flat curves, dominating options) (Appendix 1).
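The sample-size arithmetic and the random split-half procedure used for cross-validation [34] can be sketched as follows. The helper names and the seed are hypothetical; only n = 352, the 27 items, and the 10:1 ratio come from the text.

```python
import random


def subjects_per_item(n_respondents, n_items):
    """Subjects-to-variables ratio."""
    return n_respondents / n_items


def random_split_half(ids, seed=0):
    """Randomly partition respondent IDs into two halves (calibration and
    validation samples) to check whether a factor solution replicates."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


N, ITEMS = 352, 27
print(f"{subjects_per_item(N, ITEMS):.1f} respondents per item")  # 13.0
calibration, validation = random_split_half(range(N), seed=1)
print(len(calibration), len(validation))                           # 176 176
print(f"{subjects_per_item(len(calibration), ITEMS):.1f} per item in each half")
```

With these figures, the full sample exceeds the 10:1 ratio (about 13 respondents per item), whereas each half would have about 6.5 respondents per item.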

5 Conclusion

The WRFQ and WLQ questionnaires are widely used to assess presenteeism. They can be used as part of a broader work disability prevention process, for tailored intervention development, for establishing the prevalence of presenteeism, and for identifying risk factors. Testing the validity of a questionnaire is crucial to ensure that it serves its intended purpose. Factor analysis is invaluable for testing the validity of an instrument, as it provides information on its internal structure and on the items that need revision. To this end, this paper provides an example of the application and comparison of two measurement theory approaches (CTT and IRT). IRT is increasingly recommended, but it requires large samples to yield reliable results [19]. Non-parametric IRT is an interesting alternative method for item purification and could be further explored for the development of assessment instruments in work disability. In our sample, the two approaches provided similar and complementary results (9 items found by both approaches and 11 items found by only one). In the literature, the number of factors determined for the WRFQ (and its alternate forms) varies from three to five; in this study, we found fewer problematic items with a three-factor model. Further studies are needed on this questionnaire to refine its items and define its internal structure. Finally, studies on other measurement properties (e.g., reliability, convergent and discriminant validity) of the WRFQ-CF are required.

Conflict of interest

None to report.

Footnotes

Appendix 1: Option characteristic curves and item characteristic curves for the five factors of the WRFQ-CF

Factor 1 – Work scheduling demands (5 items)

Factor 2 – Output demands (7 items)

Factor 3 – Physical demands (6 items)

Factor 4 – Mental demands (6 items)

Factor 5 – Social demands (3 items)

Acknowledgments

Quan Nha Hong is supported by a doctoral scholarship from the Canadian Institutes of Health Research (CIHR). Marie-France Coutu was supported by a junior research fellowship from the Fonds de recherche du Québec - Santé (FRQS) at the time of this study.

References

1. Johns. Presenteeism in the workplace: A review and research agenda. J Organ Behav 2010;31(4):519–42.

2. Taloyan, Aronsson, Leineweber, Hanson, Alexanderson, Westerlund. Sickness presenteeism predicts suboptimal self-rated health and sickness absence: A nationally representative study of the Swedish working population. PLoS One 2012;7(9):e44721.

3. Taloyan, Kecklund, Thörn, Kjeldgård, Westerlund, Svedberg, et al. Sickness presence in the Swedish Police in 2007 and in 2010: Associations with demographic factors, job characteristics, and health. Work 2016;54(2):379–87.

4. Stewart, Ricci, Chee, Morganstein, Lipton. Lost productive time and cost due to common pain conditions in the US workforce. JAMA 2003;290(18):2443–54.

5. Hemp. Presenteeism: At work-but out of it. Harv Bus Rev 2004;82(10):49–58.

6. Roy J-S, Desmeules, MacDermid. Psychometric properties of presenteeism scales for musculoskeletal disorders: A systematic review. J Rehabil Med 2011;43(1):23–31.

7. Amick III, Lerner, Rogers, Rooney, Katz. A review of health-related work outcome measures and their uses, and recommended measures. Spine 2000;25(24):3152–60.

8. Lerner, Amick III, Lee, Rooney, Rogers, Chang, et al. Relationship of employee-reported work limitations to work productivity. Med Care 2003;41(5):649–59.

9. Lerner, Amick III, Rogers, Malspeis, Bungay, Cynn. The work limitations questionnaire. Med Care 2001;39(1):72–85.

10. Durand, Vachon, Hong, Imbeau, Amick III, Loisel. The cross-cultural adaptation of the work role functioning questionnaire in Canadian French. Int J Rehabil Res 2004;27(4):261–8.

11. Abma, Amick III, Brouwer, van der Klink, Bültmann. The cross-cultural adaptation of the Work Role Functioning Questionnaire to Dutch. Work 2012;43(2):203–10.

12. Gallasch, Alexandre NMC, Amick III. Cross-cultural adaptation, reliability, and validity of the work role functioning questionnaire to Brazilian Portuguese. J Occup Rehabil 2007;17(4):701–11.

13. Irmak, Bumin, Irmak. The cross-cultural adaptation of the Work Role Functioning Questionnaire to Turkish. HCI International 2011 – Posters’ Extended Abstracts: Springer; 2011. pp. 218–22.

14. Ramada, Serra, Amick III, Castaño, Delclos. Cross-cultural adaptation of the work role functioning questionnaire to Spanish spoken in Spain. J Occup Rehabil 2013;23(4):566–75.

15. Nunnally, Bernstein. Psychometric Theory. New York: McGraw, 1994.

16. Raykov, Marcoulides. Introduction to psychometric theory. New York: Routledge, Taylor & Francis, 2010.

17. Embretson, Reise. Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates, 2000.

18. Hambleton, Swaminathan, Rogers. Fundamentals of Item Response Theory. Newbury Park, CA: Sage, 1991.

19. Cappelleri, Lundy, Hays. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures. Clin Ther 2014;36(5):648–62.

20. Coutu, Nastasia, Durand, Corbière, Loisel, Lemieux, et al. Montréal, QC: Institut de recherche Robert-Sauvé en santé et sécurité au travail. 2011.

21. Coutu M-F, Corbière, Durand M-J, Nastasia, Labrecque M-E, Berbiche, et al. Factors Associated With Presenteeism and Psychological Distress Using a Theory-Driven Approach. J Occup Environ Med 2015;57(6):617–26.

22. Amick III BC, Habeck, Ossmann, Fossel, Keller, Katz. Predictors of successful work role functioning after carpal tunnel release surgery. J Occup Environ Med 2004;46(5):490–500.

23. Churchill Jr. A paradigm for developing better measures of marketing constructs. J Mark Res 1979;16:64–73.

24. Fabrigar, Wegener, MacCallum, Strahan. Evaluating the use of exploratory factor analysis in psychological research. Psychol Methods 1999;4(3):272–99.

25. Bearden, Netemeyer, Teel. Measurement of consumer susceptibility to interpersonal influence. J Consum Res 1989;15(4):473–81.

26. Byrne. Factor analytic models: Viewing the structure of an assessment instrument from three perspectives. J Pers Assess 2005;85(1):17–32.

27. Gerbing, Anderson. An updated paradigm for scale development incorporating unidimensionality and its assessment. J Mark Res 1988;25:186–92.

28. Kline. Principles and practice of structural equation modeling. New York: Guilford Press, 2005.

29. , Bentler. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct Equ Modeling 1999;6(1):1–55.

30. Diamantopoulos, Siguaw. Introducing LISREL: A guide for the uninitiated. London, UK: Sage, 2000.

31. Santor, Ramsay. Progress in the technology of measurement: Applications of item response models. Psychol Assess 1998;10(4):345.

32. Ramsay. TestGraf: A program for the graphical analysis of multiple choice test and questionnaire data. Montreal, QC: McGill University, 2000.

33. Santor, Ramsay, Zuroff. Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychol Assess 1994;6(3):255.

34. Floyd, Widaman. Factor analysis in the development and refinement of clinical assessment instruments. Psychol Assess 1995;7(3):286–99.

35. Khan, Lewis, Lindenmayer J-P. Use of non-parametric Item Response Theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS). BMC Psychiatry 2011;11:178.

36. Tang, Beaton, Amick III, Hogg-Johnson, Côté, Loisel. Confirmatory factor analysis of the work limitations questionnaire (WLQ-25) in workers’ compensation claimants with chronic upper-limb disorders. J Occup Rehabil 2013;23(2):228–38.

37. Walker, Michaud, Wolfe. Work limitations among working persons with rheumatoid arthritis: Results, reliability, and validity of the work limitations questionnaire in 836 patients. J Rheumatol 2005;32(6):1006–12.

38. Amick III, Lerner, Rogers, Rooney, Katz. A review of health-related work outcome measures and their uses, and recommended measures. Spine (Phila Pa 1976) 2000;25(24):3152–60.

39. Gosselin, Lemyre, Corneil. Presenteeism and absenteeism: Differentiated understanding of related phenomena. J Occup Health Psychol 2013;18(1):75–86.

40. L-T, Bentler. Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychol Methods 1998;3(4):424–.

41. Kenny. Measuring Model Fit. 2012 [updated July 5]. Available from: http://davidakenny.net/cm/fit.htm

42. Beaton, Kennedy. Beyond return to work: Testing a measure of at-work disability in workers with musculoskeletal pain. Qual Life Res 2005;14(8):1869–79.

43. Trierweiller, Peixe BCS, Tezza, do Valle Pereira VLD, Pacheco Jr, Bornia, et al. Measuring organizational effectiveness in information and communication technology companies using item response theory. Work 2012;41(Suppl 1):2795–802.

44. Schmidt, Amick III, Katz, Ellis. Evaluation of an upper extremity student-role functioning scale using item response theory. Work 2002;19(2):105–16.

45. Lochhead, MacMillan. Psychometric properties of the Oswestry disability index: Rasch analysis of responses in a work-disabled population. Work 2013;46(1):67–76.

46. Abma, Amick III, Van Der Klink JJL, Bültmann. Prognostic factors for successful work functioning in the general working population. J Occup Rehabil 2013;23(2):162–9.

47. Abma, van der Klink, Bültmann. The Work Role Functioning Questionnaire 2.0 (Dutch Version): Examination of its reliability, validity and responsiveness in the general working population. J Occup Rehabil 2013;23(1):135–47.

48. Ramada, Delclos, Amick BCI, Abma, Pidemunt, Castaño, et al. Responsiveness of the Work Role Functioning Questionnaire (Spanish Version) in a general working population. J Occup Environ Med 2014;56(2):189–94.

49. Eurofound. Fifth European working conditions survey. Luxembourg: Publications Office of the European Union. 2012.

50. Goetzel, Long, Ozminkowski, Hawkins, Wang, Lynch. Health, absence, disability, and presenteeism cost estimates of certain physical and mental health conditions affecting U.S. employers. J Occup Environ Med 2004;46(4):398–412.

51. Laroche, Kim, Tomiuk. IRT-based item level analysis: An additional diagnostic tool for scale purification. Adv Consum Res 1999;26:141–9.