Abstract
BACKGROUND:
The Work Role Functioning Questionnaire (WRFQ) was developed to assess workers’ perceived ability to perform job demands and is used to monitor presenteeism. Few studies of its validity have been published.
OBJECTIVE:
The purpose of this study was to assess the items and factorial composition of the Canadian French version of the WRFQ (WRFQ-CF).
METHODS:
Two measurement approaches were used to test the WRFQ-CF: Classical Test Theory (CTT) and non-parametric Item Response Theory (IRT).
RESULTS:
A total of 352 completed questionnaires were analyzed. Four-factor and three-factor models were tested and showed good fit with 14 items (Root Mean Square Error of Approximation (RMSEA) = 0.06, Standardized Root Mean Square Residual (SRMR) = 0.04, Bentler Comparative Fit Index (CFI) = 0.98) and with 17 items (RMSEA = 0.059, SRMR = 0.048, CFI = 0.98), respectively. Using IRT, 13 problematic items were identified, of which 9 were common with CTT.
CONCLUSIONS:
This study tested different models with fewer problematic items found in a three-factor model. Using a non-parametric IRT and CTT for item purification gave complementary results. IRT is still scarcely used and can be an interesting alternative method to enhance the quality of a measurement instrument. More studies are needed on the WRFQ-CF to refine its items and factorial composition.
Introduction
In the field of work disability prevention research, little attention has been given to presenteeism (i.e., “attending work while ill” [1]) compared to work absenteeism. Yet, presenteeism has been found to be a prevalent problem that accounts for higher productivity losses and costs than absenteeism [1]. For example, prevalence rates between 45% and 64% were reported among the working population [2, 3]. The cost of lost productivity due to common pain problems was estimated at US$61.2 billion per year [4]. The largest part of indirect costs made on employees’ claims was found to be associated with presenteeism (63%; $311.8M) compared to absenteeism (6%; $27M) [5].
Although presenteeism is an important problem for organizations, few validated measurement instruments have been developed as yet. A systematic review analyzed 16 articles assessing 7 of these instruments and concluded that none demonstrated satisfactory results and that there is a lack of evidence to recommend one over another [6]. To better understand presenteeism and eventually develop effective strategies, researchers and organizations need to rely on valid instruments.
Currently, the Work Role Functioning Questionnaire (WRFQ) and Work Limitation Questionnaire (WLQ) are among the most studied presenteeism instruments [6]. These self-administered questionnaires assess the perceived impact of a health problem on workers’ ability to perform their job [7]. Aside from turning up at work despite ill health (presenteeism), there is evidence that workers’ limitations to perform work demands have a negative impact on their work productivity [8]. Consequently as presenteeism is related to work ability/productivity, the WRFQ and WLQ have been used as a proxy for assessing presenteeism. The items in these questionnaires describe a number of work demands chosen because of their frequent occurrence in a variety of jobs and their importance as identified from the workers’ perspective [7, 9]. Both questionnaires are grounded in the same conceptual framework and their items were all drawn from the same pool [9]. They differ in the number of items, recall period, and response set [9]. These questionnaires are popular in several countries, which is demonstrated by the number of published cross-cultural adaptations: Canadian French [10], Dutch [11], Brazilian Portuguese [12], Turkish [13], and Spanish [14].
To our knowledge, very few studies on the factorial composition of the WRFQ and WLQ can be found in the literature. Factor analysis is a statistical method to identify clusters of related variables and is important for the validity of an instrument [15]. It is essential for construct validity by assessing the internal structure and the cross-structure of items of an assessment instrument [15]. Factor analysis is also important for content validity by providing valuable information to revise an instrument (e.g., identify items that need rewording) [15]. Moreover, it can be useful for choosing instruments that can be used as predictors (predictive validity) [15]. Among existing studies on the WRFQ and the WLQ, contrasting results have been found for their dimensional structure [6]. The objective of this study was to test the validity of one version of the WRFQ, the Canadian French version (WRFQ-CF), by examining its factorial composition and items.
In addition, up to now, the WRFQ has been studied using a Classical Test Theory (CTT) approach. CTT is the most popular measurement approach and has existed for more than a century [16]. A more recent approach has been proposed, specifically Item Response Theory (IRT). These two approaches differ in their basic assumptions, orientation, information provided, and sample size [17, 18]. It has been recommended that the two approaches be used together to provide a quantitative assessment of items/scales and maximize the content validity of patient-reported outcome measures [19]. In this study, we thus tested how IRT could contribute to the content validity of the WRFQ-CF by comparing its results with those from the CTT approach.
In summary, the main objective of this study was to test the validity of the WRFQ-CF and more specifically the following: (a) to determine the internal structure of the WRFQ-CF, (b) to identify problematic items using two measurement theory approaches (CTT and IRT), and (c) to compare the results obtained from the two measurement theory approaches.
Method
Participants
This study used a sample of data collected in an online survey of workers from a government agency in Quebec (Canada) aiming at identifying the determinants of work disability [20]. The survey was conducted among workers with regular or casual/temporary positions at the agency. Workers who had been on the job for less than six months were excluded to avoid recruiting those who were in the process of returning to work and who might therefore exhibit different characteristics. This survey was approved by the Hôpital Charles LeMoyne ethical committee. More information on this survey is provided in Coutu et al. [20, 21].
We conducted a secondary analysis of this survey using the data on presenteeism collected with the WRFQ-CF. As this study aimed to assess the factorial composition and item response of the WRFQ-CF, we included participants for whom all the items of the questionnaire were applicable to their job.
Measurement instrument
The WRFQ-CF includes a total of 27 items distributed into five subscales: work scheduling demands (items W1 to W5) (e.g., W1: Work the required number of hours), output demands (items O1 to O7) (e.g., O1: Handle the workload), physical demands (items P1 to P6) (e.g., P1: Walk or move around different work locations), mental demands (items M1 to M6) (e.g., M1: Keep your mind on your work), and social demands (items S1 to S3) (e.g., S1: Speak with people in-person, in meetings, or on the phone) [10]. The WRFQ items are scored on a five-level difficulty response scale measuring the amount of time a physical or emotional problem interfered with the ability to perform work demands in the past four weeks. The response options are: 0-difficult all of the time (100%), 1-difficult most of the time (75%), 2-difficult half of the time (50%), 3-difficult some of the time (25%), and 4-difficult none of the time (0%). The category “does not apply to my job” is also available to make the questionnaire applicable to different types of jobs. According to Amick et al. [22], when more than 20% of the items are marked as not applying (invalid items) to the respondent’s work, a score of the questionnaire or the subscale concerned cannot be calculated. Each subscale is scored separately by summing the item responses, dividing by the number of valid items, and multiplying by 25 to obtain a score varying from 0 (always limited) to 100 (never limited) [10]. The total score can also be calculated using the same process (i.e., summing the scores of all items, dividing by the number of valid items, and multiplying by 25). A high score corresponds to few functional limitations in performing the work [10], and therefore low presenteeism.
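The scoring rule described above can be illustrated with a minimal sketch. The function name `wrfq_score` and the use of `None` to represent “does not apply to my job” are assumptions made for illustration only; they are not part of the published scoring instructions.

```python
from typing import Optional, Sequence


def wrfq_score(responses: Sequence[Optional[int]],
               max_na_fraction: float = 0.20) -> Optional[float]:
    """Score a subscale (or the whole questionnaire) on a 0-100 scale.

    responses: item responses coded 0-4; None stands for
               "does not apply to my job" (hypothetical coding).
    Returns None when more than `max_na_fraction` of the items are
    not applicable, since no score can be calculated in that case.
    """
    n_items = len(responses)
    valid = [r for r in responses if r is not None]
    if n_items == 0 or (n_items - len(valid)) / n_items > max_na_fraction:
        return None
    if any(r not in (0, 1, 2, 3, 4) for r in valid):
        raise ValueError("responses must be integers from 0 to 4")
    # Sum of responses, divided by the number of valid items, times 25.
    return sum(valid) / len(valid) * 25


# Five-item subscale with one non-applicable item (exactly 20%, still scorable).
example = wrfq_score([4, 3, None, 2, 4])
```

With one of five items not applicable, the remaining four responses are averaged (3.25) and rescaled to 81.25; with two of three items not applicable, the function returns `None`.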
The Canadian French version of the WRFQ (WRFQ-CF) was created in 2004. The measurement properties of this version were assessed in one study with workers with a musculoskeletal disorder (n = 40). The results showed acceptable construct validity, and variable internal consistency among the subscales (α coefficients ranging from 0.66 to 0.92) [10]. To our knowledge, no other study was published on the WRFQ-CF.
Data analysis
In this study, two common measurement theory approaches were used: CTT and IRT.
Classical Test Theory (CTT)
CTT is a test-oriented approach that is based on the decomposition of observed scores into the sum of the true and error scores [16]. In this study, the steps suggested by Churchill [23] were followed. First, the dimensionality of the WRFQ-CF was verified by generating a total variance explained table using unrotated maximum likelihood (ML). The internal consistency of each factor of the WRFQ-CF was then measured using alpha coefficients and item-total correlations. In general, variables with low reliability coefficients (below 0.70) should be avoided [24]. Also, indicators with an item-total correlation below 0.50 should be eliminated [25]. Second, exploratory factor analysis (EFA) was used to identify poor items. Items were considered problematic if they met any of the following criteria: (1) low communality (h2) (below 0.40), (2) small factor loadings (λ estimate) (below 0.40), (3) loading more on another factor than the intended factor, and (4) cross-loading, i.e., significant loading on several factors [24, 27]. Third, confirmatory factor analysis (CFA) was done to test model fit. The following fit indices were analyzed: Bentler Comparative Fit Index (CFI), chi-square value, Root Mean Square Error of Approximation (RMSEA), and Standardized Root Mean Square Residual (SRMR). There is no rule of thumb on the best index for assessing model fit. The fit indices chosen are among the most commonly used in the literature and are recommended by Kline [28]. Three of these indices are absolute fit indices (chi-square, RMSEA, SRMR), which assess how well the a priori model reproduces the data [29]. One is an incremental fit index (CFI), which compares the target model with a baseline model in which the observed variables are uncorrelated [29]. To support a good fit, the indices were interpreted using the following cut-off values: >0.90 for CFI; non-significant p-value of the chi-square value (>0.05); and <0.05 for the RMSEA and SRMR [30].
Alpha coefficients, item-total correlations, and EFA were computed in SPSS v20. CFA was performed in Lisrel v8.80.
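For readers who want to reproduce the reliability step outside SPSS, the following is a minimal sketch of Cronbach’s alpha and item-total correlations. It assumes the corrected form of the item-total correlation (each item correlated with the sum of the other items in its subscale), which the text does not specify; the function names and the small data matrix are illustrative only.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items
    (assumed 'corrected' variant)."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])


# Made-up responses (rows = respondents, columns = items), for illustration.
scores = np.array([
    [4, 4, 3],
    [3, 4, 4],
    [2, 1, 2],
    [4, 3, 4],
    [1, 2, 1],
], dtype=float)

alpha = cronbach_alpha(scores)
r_item_total = corrected_item_total(scores)

# Cut-offs taken from the text: alpha below 0.70, item-total r below 0.50.
low_reliability = alpha < 0.70
weak_items = np.where(r_item_total < 0.50)[0]
```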
Item Response Theory (IRT)
IRT is a more recent measurement approach and provides information on the relationship between the items and the latent trait. Hence, it is an item-oriented approach in which each item can be examined independently to determine its contribution to a test [18]. For IRT, the performance of the WRFQ-CF was tested regarding three measures: option effectiveness (i.e., effectiveness of response options and items at different levels of the latent trait), item bias (i.e., extent to which different groups endorse items differently), and scale discriminability (i.e., extent to which a scale can detect differences among persons at different levels of the latent trait) [31]. In this study, a nonparametric IRT was performed using TestGraf, a software that uses the nonparametric kernel smoothing approach to model responses [32]. It generates the test information function (TIF), option characteristic curves (OCC), and item characteristic curves (ICC). The TIF is a measure of precision and indicates the amount of information in the test at various levels of the trait score [32]. The latent trait will be more precisely estimated when the values of the test information function are large [32]. The OCC provides a graphical representation of the probability of a particular response option being endorsed at different levels of work ability (latent trait) [32]. In the WRFQ-CF, a 5-point scale is used. Hence, we can expect to see five option curves in the OCC of each item. In an OCC, the X- and Y-axes represent respectively the latent trait (θ) and the probability of endorsing an option. For an item in the WRFQ-CF to be considered “good”, it was expected that the probability of choosing option 4 (difficult none of the time) would increase as the latent trait increases, and the probability of choosing option 0 (difficult all of the time) would decrease as the latent trait increases.
Also, the probability of choosing options 1 to 3 was expected to increase then decrease across the latent trait, with lower options located more to the left of the X-axis and higher options to the right of the X-axis. The ICC graphically represents the relationship between the expected item score and the expected total score (i.e., work ability) [32]. In the WRFQ-CF, the expected item score ranges from 0 to 4 (Y-axis in the ICC). It was expected that respondents low on the latent trait would score 0 and those high on the trait would score 4. Hence, a curve that increases as the trait increases is expected. Also, the slope of the curve indicates how effectively an item discriminates respondents at different levels of the latent trait: a steep curve means that the expected item scores increase rapidly as a function of work ability whereas a flat curve means that the item scores are less effective at discriminating respondents at different levels of work ability [33].
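To make the idea of kernel-smoothed option and item characteristic curves concrete, here is a rough sketch in Python. It is not TestGraf’s implementation: the trait proxy (normal quantiles of the rank of the total score), the Gaussian kernel, the bandwidth rule `1.1 * n ** -0.2`, and the function name are all assumptions chosen for illustration.

```python
import numpy as np
from statistics import NormalDist


def option_characteristic_curves(item, total, n_options=5,
                                 grid=None, bandwidth=None):
    """Kernel-smoothed OCC and ICC for one item (illustrative sketch).

    item:  responses to one item, coded 0..n_options-1
    total: total (or subscale) scores used to rank respondents
    Returns (grid, occ, icc); occ has shape (len(grid), n_options).
    """
    item = np.asarray(item)
    total = np.asarray(total, dtype=float)
    n = len(item)

    # Rank respondents by total score, then map ranks to normal quantiles
    # to obtain a proxy for the latent trait (assumed approach).
    ranks = np.argsort(np.argsort(total, kind="stable"), kind="stable") + 1
    normal = NormalDist()
    theta = np.array([normal.inv_cdf(r / (n + 1)) for r in ranks])

    if grid is None:
        grid = np.linspace(-2.5, 2.5, 51)
    if bandwidth is None:
        bandwidth = 1.1 * n ** (-0.2)  # assumed rule of thumb

    # Gaussian kernel weights: rows = grid points, columns = respondents.
    weights = np.exp(-0.5 * ((grid[:, None] - theta[None, :]) / bandwidth) ** 2)
    weights /= weights.sum(axis=1, keepdims=True)

    # Indicator matrix: respondent i chose option k.
    chose = (item[:, None] == np.arange(n_options)[None, :]).astype(float)

    occ = weights @ chose                # probability of each option
    icc = occ @ np.arange(n_options)     # expected item score
    return grid, occ, icc


# Artificial, perfectly ordered data: item score rises with the total score.
demo_total = np.arange(100)
demo_item = np.minimum(demo_total // 20, 4)
grid, occ, icc = option_characteristic_curves(demo_item, demo_total)
```

For a well-behaved item, the curve for option 0 falls, the curve for option 4 rises, and the ICC climbs steeply across the grid; a flat ICC with options 3 and 4 dominating everywhere would correspond to the weak items described in the Results.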
Results
Participants
A total of 352 participants were retained for the purpose of this study. As shown in Table 1, the participants were mainly women (66%), aged between 40 and 59 years (79%), married or common-law (69%), and had more than 10 years seniority at the agency (65%). Most workers had 1 to 5 days of work absence over the past year due to illness (56%).
Sociodemographic characteristics of the participants of this study (n = 352)
Dimensionality and internal consistency
The eigenvalues and total variance explained by five factors ranged from 0.96 to 11.88 and 44.01% to 65.41%, respectively. Based on Kaiser’s rule stating that eigenvalues of 1.0 or greater represent the number of factors, four factors were identified in this questionnaire [34]. However, a five-factor model was retained for the following analysis since the eigenvalue of the fifth factor is close to 1.0, the total variance explained by five factors is over 65%, and the a priori number of common factors is specified as five in the literature on the WRFQ and WLQ.
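As a consistency check on the figures above, the first eigenvalue (11.88) divided by the 27 items gives about 44%, which matches the variance attributed to the first factor. The sketch below shows how eigenvalues, cumulative variance, and Kaiser’s rule can be computed from a correlation matrix; the function name and the simulated data are assumptions for illustration and do not reproduce the study data or the ML extraction used in SPSS.

```python
import numpy as np


def kaiser_summary(data: np.ndarray):
    """Eigenvalues of the item correlation matrix, cumulative percentage
    of variance, and the number of factors with eigenvalue >= 1.0."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # descending order
    cumulative_pct = np.cumsum(eigenvalues) / eigenvalues.sum() * 100
    n_factors = int(np.sum(eigenvalues >= 1.0))
    return eigenvalues, cumulative_pct, n_factors


# Simulated data: two uncorrelated latent factors, three items each.
rng = np.random.default_rng(0)
factors = rng.standard_normal((500, 2))
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],
])
items = factors @ loadings.T + 0.6 * rng.standard_normal((500, 6))

eigenvalues, cumulative_pct, n_factors = kaiser_summary(items)
```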
Table 2 presents alpha coefficients and item-total correlations for each factor of the WRFQ-CF. As shown in Table 2, the alpha coefficient of the social demands factor was low (0.672) as well as the item-total correlations of all three items in this factor (S1, S2, and S3). Also, item M6 in the mental demands factor had a correlation below 0.50. Hence at this stage, four items were found problematic (M6, S1, S2, and S3). Consequently, the social demands factor was also removed.
Cronbach’s alpha, item-total correlations and alpha if item deleted of the WRFQ-CF (5 factors)
Using the 23 remaining items, EFA was conducted using ML with a fixed number of factors (n = 4) and Direct Oblimin rotation. The four factors accounted for 66.41% of the total variance. Also, the correlation between factors ranged from 0.202 to 0.684, confirming that oblique rotation was warranted. The examination of the factor loadings indicated seven problematic items: the O1, O2, and O6 items loaded more on the work scheduling demands factor (λ of –0.710, 0.451, and –0.477, respectively) than on the intended output demands factor (λ of 0.307, 0.382, and 0.135, respectively); the M3 item loaded more on the output demands factor (λ = 0.643) than on the intended mental demands factor (λ = –0.250); the M2 item cross-loaded on the output demands factor (λ = 0.331) and the intended mental demands factor (λ = 0.492); the O7 item had a small factor loading on the intended output demands factor (–0.338); and the W4 item had low communality (h2 = 0.392). The final four-factor model after purification with EFA was composed of 16 items. In this model, the four factors accounted for 70.97% of the total variance. Also, the correlations between the factors ranged from 0.392 to 0.628.
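The item-screening criteria can be expressed as a small rule-based check. The sketch below applies them to the loadings reported in the paragraph above. Two points are assumptions rather than the authors’ stated procedure: loadings are compared in absolute value, and a secondary loading of 0.30 or more is treated as a cross-loading (the text only says “significant loading”). Loadings and communalities that are not reported in the text are left out rather than guessed.

```python
def flag_item(loadings, intended, communality=None,
              min_loading=0.40, min_h2=0.40, cross_cutoff=0.30):
    """Return the list of reasons an item is considered problematic.

    loadings:     {factor name: loading} for the factors reported
    intended:     name of the factor the item is meant to measure
    cross_cutoff: assumed threshold for a 'significant' secondary loading
    """
    reasons = []
    if communality is not None and communality < min_h2:
        reasons.append("low communality")
    if loadings:
        magnitude = {f: abs(v) for f, v in loadings.items()}
        own = magnitude.get(intended, 0.0)
        others = [v for f, v in magnitude.items() if f != intended]
        if own < min_loading:
            reasons.append("small loading on intended factor")
        if others and max(others) > own:
            reasons.append("loads more on another factor")
        elif own >= min_loading and any(v >= cross_cutoff for v in others):
            reasons.append("cross-loading")
    return reasons


# Values as reported in the text (four-factor EFA).
reported = {
    "O1": ({"scheduling": -0.710, "output": 0.307}, "output", None),
    "O2": ({"scheduling": 0.451, "output": 0.382}, "output", None),
    "O6": ({"scheduling": -0.477, "output": 0.135}, "output", None),
    "M3": ({"output": 0.643, "mental": -0.250}, "mental", None),
    "M2": ({"output": 0.331, "mental": 0.492}, "mental", None),
    "O7": ({"output": -0.338}, "output", None),
    "W4": ({}, "scheduling", 0.392),  # only the communality is reported
}

flags = {item: flag_item(l, intended, h2)
         for item, (l, intended, h2) in reported.items()}
```

Under these assumptions all seven items are flagged, although an item can receive more than one reason (for example, O1 both loads more on another factor and has a small loading on its intended factor).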
Since several items had cross-loadings between the work scheduling and output demands factors, we conducted another EFA using three factors (ML and Direct Oblimin rotation). Only two items were found problematic due to cross-loading with the work scheduling/output demands factor (λ of M2 = 0.416; λ of M3 = 0.458) compared to the intended mental demands factor (λ of M2 = –0.446; λ of M3 = –0.338). Fewer items were removed compared to a four-factor model because the five problematic items from the work scheduling and output demands factors did not show anomalies when using three factors. Hence, a three-factor model with 21 items was found for EFA. The three factors accounted for 62.18% of the total variance. Also, the correlations between the factors ranged from 0.370 to 0.726.
Confirmatory Factor Analysis (CFA)
CFA was performed in Lisrel to assess the four-factor and three-factor model fits. The factor loadings and fit indices of these models are presented in Table 3. Among the four indices presented in Table 3, two showed a good model fit.
Standardized estimates of confirmatory factor model
To improve the goodness-of-fit estimates (particularly of the chi-square and RMSEA), two items were removed from the model based on the largest negative and positive standardized residuals: items P2 and P4. After removing these two items, a satisfactory model fit was attained for the four-factor model (14 items): RMSEA showed an acceptable fit (0.06) and the other indices showed a good fit (SRMR = 0.04; CFI = 0.98). Although the chi-square value dropped from 306.04 to 167.36, its p-value was still significant (p < 0.01) and thus did not support the fit.
For the three-factor model, four items were removed (W1, O3, O4, and P4) to obtain an acceptable model fit in all indices except the chi-square value (RMSEA = 0.059; SRMR = 0.048; CFI = 0.98; and χ2 = 264.93, p < 0.01, df = 116). Hence, in the three-factor model, a good fit was obtained with 17 items.
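The link between the chi-square value and the RMSEA can be illustrated with the commonly used point-estimate formula, RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))). The sketch below applies it to the three-factor values reported above. Whether Lisrel used exactly this formula and this chi-square statistic is an assumption, so a small difference from the reported value is to be expected.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square, its degrees of
    freedom, and the sample size (common textbook formula)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Three-factor model values from the text: chi-square, df, and n = 352.
three_factor = rmsea(264.93, 116, 352)
```

The result is roughly 0.060, close to the reported 0.059. Because the formula depends on the excess of the chi-square over its degrees of freedom relative to sample size, a model can show a significant chi-square and still have an RMSEA in the acceptable range.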
Item Response Theory (IRT)
Prior to conducting the analysis in TestGraf, the unidimensionality of each factor in the WRFQ-CF was ascertained using unrotated principal component analysis (PCA) in SPSS. The first factor of all dimensions accounted for more than 50% of the total variance (ranging from 60.39% to 62.28%); the eigenvalues of the first factor were over 1.0 (ranging from 1.81 to 4.32); and the ratio between the first and second factors was high (from 2.98 : 1 to 7.22 : 1), which confirmed their unidimensionality [35].
Appendix 1 presents the option characteristic curves (OCC) and item characteristic curves (ICC) for all the items of the WRFQ-CF. Based on a visual inspection of the OCC and ICC of the items presented in Appendix 1, options 3 and 4 dominated for all the items and the probability of endorsing options 0, 1, or 2 was less than 45%, except for items P5 and M1. A total of 13 items were identified as weak items since the slope of their ICC was flat and options 3 and 4 dominated (nearly) the whole range of the latent trait in the OCC: factor 1 (items W3 and W4); factor 2 (items O4, O5, and O7); factor 3 (items P1, P4, and P6); factor 4 (items M3 and M6); and factor 5 (items S1, S2, and S3).
Comparison between the results of CTT and IRT
A total of 13 weak items were identified with the 4-factor model CTT approach, 10 with the 3-factor model CTT approach, and 13 with the IRT approach. As shown in Table 4, six items were identified in all three approaches: one item on physical demands (P4); two items on mental demands (M3 and M6); and all three items on social demands. In addition, three items were identified by IRT and by only one of the two CTT models: one item on work scheduling demands (W4), and two items on output demands (O4 and O7). Seven items were found only with CTT, and four items only with IRT.
Comparison of problematic items identified with Classical test theory (CTT) and Item response theory (IRT)
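The overlap counts can be checked with simple set operations. The item lists below are reconstructed from the Results text (items dropped at the reliability, EFA, and CFA stages for CTT, and items judged weak on visual inspection for IRT); this reconstruction is an assumption, since Table 4 itself is not reproduced here.

```python
# Problematic items per approach, reconstructed from the Results text.
ctt_four_factor = {"M6", "S1", "S2", "S3",                    # reliability step
                   "O1", "O2", "O6", "O7", "M2", "M3", "W4",  # EFA
                   "P2", "P4"}                                # CFA
ctt_three_factor = {"M6", "S1", "S2", "S3",   # reliability step
                    "M2", "M3",               # EFA
                    "W1", "O3", "O4", "P4"}   # CFA
irt = {"W3", "W4", "O4", "O5", "O7", "P1", "P4", "P6",
       "M3", "M6", "S1", "S2", "S3"}

ctt_any = ctt_four_factor | ctt_three_factor

in_all_three = ctt_four_factor & ctt_three_factor & irt
irt_and_one_ctt = irt & (ctt_four_factor ^ ctt_three_factor)
common = irt & ctt_any
ctt_only = ctt_any - irt
irt_only = irt - ctt_any

summary = {
    "all three": sorted(in_all_three),
    "IRT and one CTT model": sorted(irt_and_one_ctt),
    "CTT only": sorted(ctt_only),
    "IRT only": sorted(irt_only),
}
```

Under this reconstruction the sets reproduce the counts stated in the text: six items in all three analyses, three more shared between IRT and one CTT model (nine in common overall), seven found only with CTT, and four found only with IRT.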
Discussion
To our knowledge, the measurement properties of the WRFQ-CF were only tested in one study [10]. To contribute to its improvement, this study aimed to test the validity of the WRFQ-CF by assessing its items and factorial composition. It used two different measurement theory approaches (CTT and IRT), which gave similar and complementary results with nine weak items identified in both approaches. Three-factor and four-factor models were tested, which generated different items on work scheduling and output demands. The three-factor model showed good model fit with fewer problematic items.
To date, few studies on the factorial structure of the WRFQ or its alternate forms can be found in the literature. In previous studies, the number of factors studied ranged from three to five. The original article on the WLQ-25 identified four distinct factors: time, physical, mental-interpersonal, and output demands [9]. Tang et al. [36] compared four- and five-factor models and found that both models showed acceptable goodness-of-fit indices, with the five-factor model performing better. Also, Walker, Michaud, and Wolfe [37] carried out a factorial analysis of the WLQ-25 and extracted three factors (eigenvalue more than 1.0) with one predominant factor explaining 77% of the variance. Several papers have combined the social and mental demands dimensions [9, 37]. Yet, in this study, all items in the social demands factor were found problematic using both measurement theory approaches, which prevented us from testing a five-factor model. The social dimension plays a salient role in presenteeism. For example, studies have found that workers who have a high level of support and integration at work are more likely to show up ill at work than those with poor social support and integration [1]. We recommend that the social demands items be revised and further tested to provide better discrimination.
In this study, the work scheduling and output demands subscales were correlated. When using a four-factor model, three items in the output demands factor (O1, O2, and O6) loaded more on the work scheduling demands factor. Also, all except one item of the output demands were found weak by either measurement theory approach. This result suggests that these two factors could be combined. One study on the Dutch version of the WRFQ-27 also combined the work scheduling and output demands factors and suggested a new version composed of four subscales: (1) work scheduling and output demands, (2) physical demands, (3) mental and social demands, and (4) flexibility demands [45]. These two factors share some similar characteristics. Work scheduling demands were defined as “worker’s needs to manage the workday from beginning to end” and output demands as “activities related to completing work on time, with high quality and to everyone’s (including the worker’s) satisfaction” [38]. Both subscales involve a time component, which might explain the correlations between the factors. Also, these items might refer to control over tasks, which has been found to be an important risk factor for presenteeism [39].
The purification of items was conducted on the basis of exploratory and confirmatory factor analysis as well as of IRT. For the confirmatory factor analysis, we used four fit indices. Some may argue that one or two indices are enough, while others suggest combining different indices. Since there is no consensus in the literature and several criteria may influence the performance of each fit index [40], we preferred to use a combination of several types of indices to provide a better picture of the model fit. We used those proposed by Kline [28]. Also, in this study, the chi-square estimate did not support model fit. The chi-square estimate usually provides a reasonable measure of fit when the sample size ranges between 75 and 200 [41]. This estimate will often be statistically significant with a larger sample size [41]. Hence, the poor chi-square value might be explained by the sample size used in this study (n = 352).
In this study, the number of problematic items ranged from 10 (3-factor model EFA/CFA analysis) to 13 (4-factor model EFA/CFA analysis and IRT). If we consider all items that were identified as problematic by either approach, most factors will have fewer than three items. Using fewer than three items per factor is not recommended for patient-reported outcomes since the interpretation of some factors will be weak and will not properly assess the construct [24]. Revision of the weak items should be considered, especially the nine items found problematic by both approaches (Table 4). The results of this study could also be used to develop a short version of the WRFQ-CF. In the literature, some studies have reported short versions of the WRFQ and WLQ, such as WRFQ-15, WLQ-16, and WLQ-8 [6, 42].
This study used IRT to test items. To our knowledge, no other study on the WRFQ has used this approach. Also, in the field of work disability, IRT is less frequently used than CTT (e.g., [43–45]). Yet, IRT provides some advantages over CTT. For example, IRT can provide information on the relationship between the item and the latent trait. Thus, each item can be examined independently to determine its contribution to a test [18]. This allows for the comparison of different tests measuring the same ability [18], which is an advantage over CTT, where test forms must be parallel before their scores can be compared [17]. In IRT, the items from different tests can be placed on a common scale, which enables the comparison of the level of difficulty of the tests as well as the development of item banks [18]. In this study, non-parametric IRT was useful to corroborate results obtained from CTT. Nine items were found to be problematic in both approaches. Also, using IRT gave complementary results with four additional items. This difference can be explained by the level of analysis of both approaches: CTT’s focus is on test-level information (comparing the items to the entire scale) whereas IRT focuses on the item level [18]. The results found in IRT can guide the choice of items and identify those that need revision.
Differences can be identified between studies on the WRFQ (and its alternate forms) and this study, which might explain the discrepancies between the number of factors identified and the items retained. First, studies on factorial validation of the WRFQ have mainly been conducted with workers having musculoskeletal disorders whereas in this study the workers were not recruited based on a specific diagnosed health problem. However, in this organization, distress was found in 62% of the sample, including 41% with high levels of psychological distress [21]. Other studies have also used the WRFQ with the general working population, which supports the need for validation studies on this population [46–48]. Our study participants were from the service sector, which is a sector at risk of presenteeism [2]. Also, presenteeism is not limited to workers having musculoskeletal disorders and affects a large proportion of the workforce [49]. Other health conditions such as allergies, diabetes, and headaches were found among the top conditions associated with productivity loss [50]. Documenting presenteeism in the general working population can help to put in place appropriate approaches to reduce its impact. Second, the statistical analysis methods were not the same. Among the few factorial validation studies identified, EFA (mainly PCA and orthogonal rotation [37, 47]) and CFA [36] were used. In this study, ML with oblique rotation was used. We did not use PCA because it only accounts for variance in the observed variables and does not differentiate between common and unique variance [24]. Also, orthogonal rotation is usually not recommended when the constructs are often correlated with one another [24]. In this study, oblique rotation was used and was justified by the presence of correlations between factors (ranging from 0.202 to 0.684).
This study has some limitations that need to be considered. First, the study population was mainly office workers from one government agency. Hence, it is not possible to generalize the results of this study to the general working population. Second, we did not perform cross-validation, i.e., randomly splitting the sample into two groups to check whether the factor solutions can be replicated across groups [34]. In this study, since the WRFQ-CF has 27 items and the communalities were moderate, a sample size of at least 200 was targeted [24]. The sample size was large enough (n = 352) for factor analysis based on a subjects-to-variable ratio of 10 : 1 [34]. Third, we used TestGraf, a software allowing non-parametric IRT for item purification. One main limitation of TestGraf is the lack of pre-assigned cut-off values [51]. Hence, the interpretation of graphs generated in TestGraf might be based on the subjective judgment of the researchers. However, in this study, there was no uncertainty or disagreement among our team during the visual inspection of the graphs since all the identified items clearly showed some weaknesses (e.g., flat curves, dominating options) (Appendix 1).
Conclusion
The WRFQ and WLQ questionnaires are widely used to assess presenteeism. These questionnaires can be used as part of a broader work disability prevention process, for tailored intervention development, for establishing the prevalence of presenteeism, and for identifying risk factors. Testing the validity of a questionnaire is crucial to ensure that it serves its intended purpose. Factor analysis is invaluable for testing the validity of an instrument and provides information on the internal structure and the items that need to be revised. To this end, this paper provides an example of the application and comparison of two measurement theory approaches (CTT and IRT). IRT is increasingly recommended, but it requires a large sample size to yield reliable results [19]. Non-parametric IRT is an interesting alternative method for item purification and could be further explored for the development of assessment instruments in work disability. Based on our sample, these two approaches provided similar and complementary results (9 common items and 11 items found in one approach). Also, in the literature, the number of factors determined for the WRFQ (and alternate forms) varies from three to five. In this study, we found fewer problematic items when using a three-factor model. Further studies are needed on this questionnaire to refine the items and define its internal structure. Finally, studies on other measurement properties (e.g., reliability, convergent and discriminant validity) of the WRFQ-CF are required.
Conflict of interest
None to report.
Footnotes
Appendix 1: Option characteristic curves and item characteristic curves for the five factors of the WRFQ-CF
Acknowledgments
Quan Nha Hong is supported by a doctoral scholarship from the Canadian Institutes of Health Research (CIHR). Marie-France Coutu was supported by a junior research fellowship from the Fonds de recherche du Québec - Santé (FRQS) at the time of this study.
