Abstract
BACKGROUND:
The Individual Work Performance Questionnaire (IWPQ), measuring task performance, contextual performance, and counterproductive work behavior, was developed in The Netherlands.
OBJECTIVES:
To cross-culturally adapt the IWPQ from the Dutch to the American-English language, and assess the questionnaire’s internal consistency and content validity in the American-English context.
METHODS:
A five-stage translation and adaptation process was used: forward translation, synthesis, back-translation, expert committee review, and pilot-testing. During the pilot-testing, cognitive interviews with 40 American workers were performed to examine the comprehensibility, applicability, and completeness of the American-English IWPQ.
RESULTS:
Questionnaire instructions were slightly modified to aid interpretation in the American-English language. Inconsistencies with verb tense were identified, and it was decided to consistently use simple past tense. The wording of five items was modified to better suit the American-English language. In general, participants were positive about the comprehensibility, applicability and completeness of the questionnaire during the pilot-testing phase. Furthermore, the study showed positive results concerning the internal consistency (Cronbach’s alphas for the scales between 0.79 and 0.89) and content validity of the American-English IWPQ.
CONCLUSION:
The results indicate that the cross-cultural adaptation of the American-English IWPQ was successful and that the measurement properties of the translated version are promising.
Introduction
In today’s world, it is increasingly important to maintain, improve, and optimize the individual work performance of employees. In Europe and the United States of America, for example, the ‘grey wave’ (i.e., accelerated growth of the older working population and a decline in numbers of the younger working population) and the economic recession are pushing companies and employees to perform more or better work with fewer people. Due to the grey wave, the retirement age of older workers has been raised [1]. Thus, performance at work has to be maintained until a later age. In order to accurately establish the effectiveness of interventions, procedures and strategies to maintain, improve, or optimize individual work performance, valid measurement of individual work performance is a prerequisite.
Individual work performance, defined as “behaviors or actions that are relevant to the goals of the organization”, has long been considered a multidimensional construct [2, 3]. Based on several reviews of the literature [4–6], it can be concluded that individual work performance consists of three broad dimensions: task performance, contextual performance, and counterproductive work behavior. The first dimension, task performance, has traditionally received the most attention, and can be defined as “the proficiency with which individuals perform the core substantive or technical tasks central to their job” [2]. The second dimension of individual work performance is contextual performance, defined as “behaviors that support the organizational, social and psychological environment in which the technical core must function” [7]. The third dimension of individual work performance is counterproductive work behavior, defined as “behavior that harms the well-being of the organization” [5].
While there is a general consensus on the three dimensions of individual work performance [4, 5], there is still little consensus on how to measure the construct. A multitude of instruments for measuring individual work performance (or related constructs such as presenteeism or productivity) exist [8]. For example, Williams and Anderson [9] developed a short and generic task performance scale, which measured behaviors such as adequately completing assigned duties, fulfilling prescribed responsibilities, and performing tasks that are expected of the employee. Scales used to assess contextual performance include, for example, those developed by Podsakoff and MacKenzie [10] or Van Scotter and Motowidlo [11]. The former focuses on measuring altruism, conscientiousness, sportsmanship, courtesy, and civic virtue. The latter focuses on measuring interpersonal facilitation and job dedication. Scales used to assess counterproductive work behavior were developed by, for example, Bennett and Robinson [12] or Spector et al. [13]. The former authors focus on measuring organizational and interpersonal deviance. The latter authors focus on measuring sabotage (e.g., damaging company equipment), withdrawal (e.g., taking longer breaks), production deviance (e.g., doing work incorrectly), theft (e.g., stealing company property), and abuse (e.g., making fun of someone at work). In the field of occupational health, the main focus for measuring individual work performance has been on sickness absenteeism or presenteeism (i.e., work absence or losses in individual work performance due to health impairments). Accordingly, numerous instruments have been developed to measure sickness absenteeism or presenteeism, such as the Work Productivity And Impairment Questionnaire [14], Work Limitations Questionnaire [15], and the WHO Health and Performance Questionnaire [16].
Several limitations can be observed in the scales developed to measure dimensions of individual work performance. Most strikingly, none of them measure all of the relevant dimensions of individual work performance together. Thus, they do not measure the full range of individual work performance. Also, scales measuring different dimensions can include antithetical items, creating unjust overlap between these scales [17]. As a result, the content validity of these scales can be questioned. Furthermore, none of the scales appear suitable for generic use. The scales were developed for specific populations, for example to ascertain the specific influence of physical and mental health on individual work performance [14–16], or they were developed and refined based on employees with a specific occupation [9, 10].
In order to overcome limitations of existing questionnaires, the Individual Work Performance Questionnaire (IWPQ) [18, 19] was developed in The Netherlands. The IWPQ measures individual work performance at the group-level based on individual worker self-report. Its conceptual framework was derived from a systematic review of the literature [6], and the item pool was developed from the literature, existing questionnaires, and expert interviews. The final items were determined using expert consensus [8] and Rasch analysis [18, 19]. The IWPQ is the first questionnaire to incorporate all relevant dimensions of individual work performance into one questionnaire. An advantage of this is that the content of each scale is complementary to the content of the other scales. As a result, the scales do not include redundant items, that is, items overlapping in content [17]. The psychometric properties of the IWPQ have been tested in The Netherlands, and indicate good to excellent internal consistency for task performance (α = 0.78), contextual performance (α = 0.85) and counterproductive work behavior (α = 0.79). Also, the IWPQ has shown good face and structural validity [8, 19], as well as sufficient convergent validity and good discriminative validity [20].
Another important characteristic of the IWPQ is that it is generically applicable [18, 19]. Thus, it is suitable for workers in all types of jobs, and provides a relatively comprehensive measure of work performance that can capture the potential influence of all sorts of variables (e.g., personal and environmental variables). Thus, conceivably, the IWPQ may be suitable for examining the effectiveness of a broad range of interventions, procedures and strategies to maintain, improve, or optimize individual work performance.
In order for the IWPQ to be used outside of The Netherlands, it has to be cross-culturally adapted and validated. The objectives of the current study were to perform a cross-cultural adaptation of the IWPQ from the Dutch to the American-English language, and assess the questionnaire’s internal consistency and content validity in the American-English context.
Methods
Individual Work Performance Questionnaire
The Individual Work Performance Questionnaire (IWPQ) [18, 19] measures “employee behaviors or actions that are relevant to the goals of the organization” [2]. The IWPQ consists of 18 items, divided into three scales: task performance, contextual performance, and counterproductive work behavior (see Table 1). All items have a recall period of 3 months and a 5-point rating scale (“seldom” to “always” for task and contextual performance, “never” to “often” for counterproductive work behavior). A mean score for each IWPQ scale can be calculated by adding the item scores, and dividing their sum by the number of items in the scale. Hence, the IWPQ yields three scale scores that range between 0 and 4, with higher scores reflecting higher task and contextual performance, and higher counterproductive work behavior.
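The scoring rule described above (sum the item scores of a scale, then divide by the number of items) can be sketched in a few lines of Python. This is only an illustration: the function name `scale_score`, the five-item example, and the input checks are assumptions made here, and the actual item-to-scale assignment is the one given in Table 1.

```python
def scale_score(item_scores):
    """Return the mean of one respondent's item scores for a single scale."""
    if not item_scores:
        raise ValueError("a scale needs at least one item score")
    # Items are assumed to be coded 0-4, matching the 5-point rating scale.
    if any(score not in (0, 1, 2, 3, 4) for score in item_scores):
        raise ValueError("item scores are expected to be coded 0-4")
    return sum(item_scores) / len(item_scores)


# Hypothetical answers of one respondent on a five-item scale (not study data).
example_answers = [4, 3, 2, 3, 3]
print(scale_score(example_answers))  # 15 / 5 = 3.0
```

Because each item is coded 0 to 4, the resulting scale score also lies between 0 and 4, in line with the range stated above.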
Cross-cultural adaptation
The IWPQ’s cross-cultural adaptation process (Fig. 1) followed the stages proposed by Beaton et al. [21], and is based on the first step of the three-step process adopted by the International Society for Quality of Life Assessment (IQOLA) project [22].
Forward translation
The forward translation of the IWPQ’s instruction, items, and answer categories was performed by two independent translators. Both translators were bilingual, with American-English as their mother tongue. One translator (the “informed” translator, a researcher) had expertise on individual work performance, and the other translator (the “uninformed” translator, a chiropractor) was naive about the topic. Both translators wrote a report of the translation, containing challenging phrases and uncertainties, and considerations for their decisions.
Synthesis
The results of both translations (T1 and T2) were compared by the two translators and one researcher (LK). A written report documented the consensus process, the discrepancies, and how the discrepancies were resolved. The translators and the researcher reached consensus on one common American-English questionnaire (T-12).
Back translation
The common American-English questionnaire was back-translated into Dutch by two other independent translators. Both translators were bilingual, with Dutch as their mother tongue. One translator was naive about the topic (a PhD student), whereas the other translator had expertise on the topic (a researcher). Both translators wrote a report of the translation, containing challenging phrases and uncertainties, and considerations for their decisions.
Expert committee review
All the translated versions were combined into one pre-final questionnaire by an expert committee. The expert committee consisted of the four translators, one researcher (LK), and one methodologist (HdV). Discrepancies between the original and translated versions were identified and discussed. Also, semantic, idiomatic, experiential and conceptual equivalences were evaluated. Again, a written report documented the consensus process, the discrepancies, and how the discrepancies were resolved. The expert committee reached consensus on a pre-final American-English version of the IWPQ.
Pilot-test
To examine the comprehensibility, applicability, and completeness of the translated questionnaire, a pilot-test was performed. A total of 40 participants were included in the pilot-test. Inclusion criteria were: currently working (8 hours a week or more), aged 18–65 years, and able to read and understand the American-English language. Participants were a convenience sample of employees of Tufts Medical Center in Boston, MA. In order to promote participation in the pilot-test, an outreach e-mail was sent to employees of participating departments, after which an appointment with the researcher (LK) could be made. The pilot-test was approved by the Tufts University/Tufts Health Sciences Institutional Review Board (IRB number 10929).
After signing an informed consent form, participants filled in the American-English IWPQ. “Think aloud” and “probing” techniques [23] were used in order to identify participants’ opinion on the comprehensibility, applicability, and completeness of the instructions, items, and answer categories of the translated questionnaire. The duration of the pilot-test was on average 15 minutes, including questionnaire completion. Participants’ comments were written down into a report by the researcher (LK). The comments were independently assessed by two researchers (LK and CB), after which a consensus meeting took place. Any discrepancies that remained were discussed with the translators and the other IWPQ developers (VH, HdV, and AvdB), after which consensus was reached on a final American-English questionnaire.
Measurement properties of the pre-final questionnaire
Descriptive statistics of the IWPQ items and scales, and of the socio-demographic characteristics of the participants (gender, age, number of work hours a week, and primary type of occupation) were used to examine the distribution of the IWPQ responses. Internal consistency of the IWPQ scales was determined using Cronbach’s alpha [24]. Item-to-scale correlations were calculated to evaluate the fit of each item within its scale. Furthermore, scale scores were examined for floor or ceiling effects (>15% at the extreme values [23]). Statistical analyses of the data were done in SPSS 20.
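For readers who want to see what these two statistics compute, the following Python sketch implements the standard formula for Cronbach’s alpha, α = k/(k − 1) × (1 − Σ item variances / variance of the sum score), together with an item-to-scale correlation. It is not the analysis used in the study, which was run in SPSS. The function names and the toy data are hypothetical, sample variances (n − 1) are assumed, and the correlation shown is the "corrected" variant (each item against the sum of the remaining items), which is an assumption because the text does not specify which variant was used.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (same k for all)."""
    k = len(rows[0])
    items = list(zip(*rows))  # regroup the data as one tuple per item
    sum_item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum_item_vars / total_var)


def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_rest_correlations(rows):
    """Correlate each item with the sum of the other items in the scale."""
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]
        result.append(pearson(item, rest))
    return result


# Toy data for three respondents on a two-item scale (not study data).
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))   # 0.667
print(item_rest_correlations(toy))     # both items correlate 0.5 with the rest
```

With real data, each row would hold one participant’s answers to the items of a single IWPQ scale, and the functions would be applied per scale.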
The content validity of the American-English questionnaire was evaluated by the members of the expert committee throughout the cross-cultural adaptation process, and by the developers of the IWPQ through qualitative analysis of the comments provided by the participants of the pilot-test.
Results
Cross-cultural adaptation
Translation
The forward translation of the IWPQ was conducted and some challenging issues were encountered. All issues were discussed among the two translators and the researcher, until consensus emerged. First, conceptual issues were identified with the instruction. “Behavior at work” was considered too evaluative and derogatory (implying a lack of maturity). To obtain conceptual equivalence to the original meaning, it was chosen to use “how you conducted yourself at work.” Second, for some questionnaire items, inconsistencies with the verb tense were identified. In Dutch, the simple past (e.g., “started”) and the present perfect (e.g., “have started”) are used interchangeably. It was chosen to consistently use the simple past in the American-English version, because the items refer to a completed action in the past 3 months. Furthermore, there were some idiomatic issues in the translation of items
The back-translation was conducted without major difficulties. Issues were discussed among the members of the expert committee until consensus emerged. First, a conceptual issue was identified with the instruction sentence “how you conducted yourself at work.” The comment was that one cannot “conduct yourself.” To obtain conceptual equivalence to the original meaning, it was chosen to use “how you carried out your work.” Second, there were some linguistic and conceptual issues in the wording of items TP2 (“results I needed to achieve in my work” was considered incorrect use of American-English),
Pilot-test
The resulting version of the questionnaire was administered to 40 employees of Tufts Medical Center (n = 18 men and n = 22 women). On average, participants were 34.5 (9.8) years of age, and worked 45.9 (13.7) hours a week. See Table 2 for an overview of the sample descriptives.
Five participants (12.5%) mentioned that the
Ten participants (25%) felt that the distinctions between the answer categories were unclear. This issue mainly concerned the distinction between “regularly” and “often,” with eight participants feeling that these categories are almost the same, and could also be placed the other way around. In addition, two participants felt that “seldom” and “sometimes” were almost the same. One participant felt that “seldom” should be worded as “rarely.” Finally, two participants wondered whether everyone would notice the change in answer categories for the CWB scale. Some participants suggested renaming the answer categories to “none of the time – some of the time – half of the time – most of the time – all of the time,” or naming only the extreme categories and numbering the middle categories. Another participant said that no matter how the answer categories are labeled, people will always have trouble distinguishing them, and they will be filled in like a visual analog scale. As no clear alternative arose during the pilot-test, and only a minority of participants reported an issue, it was chosen not to change the answer categories in order to retain equivalence to the Dutch version.
Although participants stated that they had no major difficulties in understanding or answering most of the items, six items elicited the greatest number of comments, most of which were in the task performance scale. Twelve participants (30%) were unsure what was meant by “work result” in question
Almost all participants (85%) felt that all questions were applicable to their job. Two participants said that question
All participants (100%) stated that the completeness of the questionnaire was good. When asked, 16 participants (40%) had suggestions to expand the questionnaire to include all relevant aspects of their work performance. These suggestions mainly included determinants of individual work performance (e.g., job satisfaction, job tenure, and sleep quality), or indicators of individual work performance that were previously included, but removed during the development of the questionnaire (e.g., relationship with co-workers and supervisor(s), collaboration with others, access to and use of supplies). Based on the suggestions, it was not considered necessary to add any new questions to the questionnaire. A short questionnaire with content identical to the Dutch version was considered most important.
Measurement properties of the pre-final questionnaire
Descriptive statistics of the IWPQ items can be seen in Table 1, and descriptive statistics of the IWPQ scales can be seen in Table 3. Almost all items showed floor or ceiling effects (>15% at the lowest or highest answer category). At the scale level, the mean score for task performance was 2.79 (SD = 0.69), 2.90 (SD = 0.65) for contextual performance, and 1.15 (SD = 0.73) for counterproductive work behavior. The mean scale scores are comparable to scores in The Netherlands, although the mean scale score for contextual performance was slightly higher than in The Netherlands (2.90 in the USA, versus 2.31 in The Netherlands) [18]. There were no ceiling or floor effects on the scale level. Five percent of the participants showed the highest score (4, “always”) for the task performance scale, and the contextual performance scale. Five percent also showed the lowest score (0, “never”) for the counterproductive work behavior scale.
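The floor and ceiling check reported above follows a simple rule: an effect is flagged when more than 15% of respondents score at the lowest or highest possible value. A minimal Python sketch of that rule is given below. The function name `floor_ceiling` and the synthetic scores are illustrative assumptions, not the study data; the 0 to 4 range and the 15% threshold are taken from the text.

```python
def floor_ceiling(scores, minimum=0, maximum=4, threshold=0.15):
    """Report the share of scores at each extreme and whether it exceeds the threshold."""
    n = len(scores)
    floor_share = sum(s == minimum for s in scores) / n
    ceiling_share = sum(s == maximum for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Synthetic example: 40 respondents, 2 of whom have the maximum scale score.
synthetic_scores = [4.0] * 2 + [2.5] * 38
result = floor_ceiling(synthetic_scores)
print(result["ceiling_share"], result["ceiling_effect"])  # 0.05 False
```

In this synthetic case, 2 of 40 respondents (5%) are at the maximum, which is below the 15% threshold, so no ceiling effect is flagged.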
Internal consistency of the IWPQ scales was determined using Cronbach’s alpha. For the task performance, contextual performance, and counterproductive work behavior scales, the Cronbach’s alphas were 0.79, 0.83 and 0.89, respectively (Table 3). The item-to-scale correlations were sufficiently high (r > 0.40), except for item
Based on the cultural adaptation process, and the comments provided by the participants of the pilot-test, the content validity of the American-English IWPQ was judged to be good. Almost all participants in the pilot-test considered the questions to be applicable and relevant to their job, and all participants felt that the completeness of the questionnaire was good.
Discussion
The goal of the current study was to cross-culturally adapt the IWPQ from the Dutch to the American-English language and assess the questionnaire’s internal consistency and content validity in the American-English context. The cross-cultural adaptation was systematically performed, resulting in an American-English version of the IWPQ that is equivalent to the original version. In general, participants were positive about the comprehensibility of the questionnaire. A few changes were made to optimize the comprehensibility of the questionnaire. Here, an important trade-off arises between leaving the wording of a question unchanged in order to keep it similar to the original question, and changing the wording in order to obtain conceptual equivalence to the original question. For example, the answer category labels of the IWPQ were not changed in order to retain equivalence to the Dutch version, and because no alternative arose that was believed to improve comprehensibility. In contrast, the wording of task performance items 3 (“I was able to distinguish main issues from side issues”) and 4 (“I was able to carry out my work well with minimal time and effort”) was changed in order to improve comprehensibility. In Dutch, it was chosen to give a description of “prioritizing” and “efficiently,” as these words are hardly ever used directly. However, based on American participants’ suggestions to improve comprehensibility, these items were shortened to more directly ask for “prioritizing” and “working efficiently.” All participants were positive about the completeness of the questionnaire, and almost all participants indicated that all the questions were relevant and applicable to them. This indicates good content validity of the questionnaire.
Thus, there appear to be no cultural differences between The Netherlands and America in measuring the concept of individual work performance, and the indicators used to measure the concept of individual work performance seem to be equivalent over these contexts. Although additional indicators of individual work performance suggested by participants in the pilot-test (e.g., relationship with co-workers and supervisor(s), collaboration with others, access to and use of supplies) might have been included when developing the IWPQ from scratch in America, a short questionnaire with identical content to the Dutch IWPQ was considered most important in the current study. The generalizability of the questions in the Dutch IWPQ was probably promoted by the fact that people from multiple countries (including the USA) were involved in the developmental stages of the IWPQ, for example, during item generation [8].
If the IWPQ items have kept the same meaning after the translation, the American-English questionnaire is expected to retain the same factor structure as in The Netherlands. The sample size in the current pilot-test (n = 40) was too small to conduct a confirmatory factor analysis. De Vet et al. [23] recommend a sample size of at least n = 100 to perform a reliable factor analysis. However, item-to-scale correlations were examined, and were similar to the item-to-scale correlations in The Netherlands. All items loaded sufficiently high on the expected scales, except for item
The measurement properties of the Dutch and American-English IWPQ appear to be similar. The mean item and scale scores appear to be similar in both versions, although the mean scale score for contextual performance was slightly higher for the American-English than for the Dutch IWPQ. As the IWPQ is analyzed at the scale level, it is interesting to examine whether more than 15% of the responses are at the lower or upper end of the scale, indicating floor or ceiling effects. There were no considerable floor or ceiling effects at the scale level. The internal consistencies of the American-English IWPQ task performance, contextual performance, and counterproductive work behavior scales were 0.79, 0.83, and 0.89, respectively. This is largely similar to the Dutch version, where the scale reliabilities are 0.78, 0.85, and 0.79, respectively, although the internal consistency of the American-English CWB scale is higher than in The Netherlands.
Limitations
A limitation of the current study was that participants were aware that the questionnaire measured individual work performance, due to the informed consent procedure before the study. Secondly, in the current study, a researcher was sitting next to the participants while they were filling in the questionnaire. Finally, some participants reported reservations about answering the CWB questions honestly, because they felt the questions were uncomfortable or intense to answer. All these factors may have elicited socially desirable answers, and resulted in different scores on the American-English version than the Dutch version of the questionnaire. In general, we recommend leaving out the questionnaire title and scale names when administering the questionnaire, so that participants are less aware they are filling in a questionnaire on individual work performance. We also recommend that participants’ answers are always anonymous and are treated confidentially. It should be guaranteed that only group-level outcomes, obtained in large enough groups, will be reported to managers or companies, so that results can never be traced back to individual participants.
The pilot-test in the current study was conducted in a relatively highly educated sample, with participants primarily working in a pink or white collar job. This may limit generalizability of the results to lower-educated workers and blue collar workers. Although, in general, the translators were positive about the questionnaire’s comprehensibility, applicability, and completeness for lower-educated workers and blue collar workers, one translator had concerns about the use of the word “priorities” in these groups. Ideally, the comprehensibility, applicability, and completeness of the American-English IWPQ, as well as its internal consistency and content validity, should still be examined in these groups.
Future research
Although the pilot-test results indicate good internal consistency and content validity of the American-English IWPQ, it is only after the cross-cultural translation and adaptation that the real cross-cultural validation takes place [23]. In a larger and more heterogeneous sample, special attention should be paid to the measurement invariance of the questionnaire in the original and the new target population. Measurement invariance means that a measurement instrument, a scale, or an item, functions in exactly the same way in different populations [23]. This can be examined, for example, using factor analysis or item response theory (IRT) techniques. Future research should perform confirmatory factor analysis in a larger and more heterogeneous sample, and examine if (and if so, why) item
Conclusion
The cross-cultural translation and adaptation of the IWPQ from the Dutch to the American-English language was conducted without major difficulties. The comprehensibility, applicability, and completeness of the translated version of the IWPQ were appraised positively. Also, the study showed positive results concerning its internal consistency and content validity. Future research should further examine the measurement invariance, reliability, validity, and responsiveness to change of the questionnaire in a larger and more heterogeneous sample. After further cross-cultural validation, the American-English IWPQ may be used to measure, for example, the effectiveness of workplace interventions on individual work performance in an American-English speaking context.
Footnotes
Acknowledgments
We wish to thank Kimi Uegaki, Tammy Rubinstein, Fenna Leijten and Nico Pronk for their help in translating the questionnaire, the EMGO+ Institute for Health and Care Research for providing the travel grant, Tufts Medical Center for their hospitality, and the Tufts Medical Center employees for their participation in the pilot-test.
