Abstract
BACKGROUND:
The Return-to-Work Assessment Scale (RAS) was developed in 2021 by Ibikunle et al. to assess return-to-work among post-stroke survivors.
OBJECTIVE:
The aim of this study was to describe how the conceptual frameworks (the Flags model and the ICF) and the theoretical framework (C-OAR-SE) were used in developing the RAS.
METHOD:
The development of the RAS consisted of three phases: (i) initial item generation, (ii) face and content validity, and (iii) psychometric testing. Each phase drew on the Flags model, the International Classification of Functioning, Disability and Health (ICF), and C-OAR-SE, an acronym for the six aspects of the theory: ‘C’ [construct definition], ‘OAR’ [object representation, attribute classification, and rater entity identification], and ‘SE’ [selection of item type and answer scale, as well as enumeration].
RESULTS:
A triangulated approach drew on three separate theories and models. Phase one was developed using the Flags model, which provided the semi-structured, open-ended questions that materialized into the draft instrument, while phases two and three were developed using the ICF and C-OAR-SE. The scale consists of two sections, A and B. Section A comprises general information about post-stroke survivors and is not scored, while Section B includes three parts that are important to consider when deciding on return-to-work.
CONCLUSION:
The resulting instrument, the RAS, is an excellent, internally consistent, and reliable tool that has demonstrated good group and structural validity.
Introduction
Scale development originated in France at the beginning of the twentieth century [1]. In 1905, Alfred Binet and his associate developed a test intended to assess young school children [1]. This test was used for the proper placement of school pupils in Paris. Within 10 years it had influenced schools beyond Paris, and an English version was ready to be utilized in the United States of America and Germany [1]. During the First and Second World Wars, this test was used to recruit military personnel, especially when aptitude needed to be assessed. After the wars, many tests became available to assess diverse variables [1].
Measurement entails creating guidelines for the allocation of scores, which should relate to the concept being measured [2]. To create these guidelines, the researcher should simplify the concept, by concentrating on its most significant areas, to achieve his/her aims [3, 4]. It is important to understand the reason for creating a measurement tool. Measuring tools, or instruments, are used to determine health outcomes, which facilitate decision-making about the provision of care for patients [3, 5]. One outcome measure may not be able to assess all the variables that interest a researcher; therefore, several outcome measures are developed in the form of clinical indices. These are applied to assess variables of concern by ascribing scores to them and, subsequently, summarizing them to generate item level scores. Most indices measure item scores at the nominal or ordinal level [4].
Until recently, no instrument existed that could be used to assess return-to-work from various perspectives. Consequently, there was a need to develop a contextually sensitive, multi-perspective instrument that could assess return-to-work from the employer’s and the employee’s perspectives, as well as from psychosocial, physical, and contextual viewpoints. In addition, it had to be specific to the Nigerian context and culture. In this study, the researchers developed an indigenous, multi-perspective instrument that could assess return-to-work. This would help to assess post-stroke survivors’ readiness to return-to-work and thereby monitor the stages of return-to-work. Without a means of assessing their readiness to return-to-work, post-stroke survivors might not be accepted back into formal employment, which could affect their self-esteem, confidence, and social identity. When assessment is carried out using an indigenous, multi-perspective instrument that is valid and reliable, employers may be reassured to reinstate affected employees, objectively and without bias, should they be deemed ready to return-to-work.
The testing and interpretation of scores from subjective instruments are related to measures such as validity and reliability [6, 7]. Cohen and Swerdlik describe a test simply as a “measuring device or procedure” [1]. Similarly, a “psychological test” is defined as a method intended to assess psychological variables [1]. Different tests are guided by various rules, intended to assess and understand the variables. Some tests are self-administered, while others are patient/participant administered, or even computer-based [1]. Most aptitude-based tests have test manuals that instruct researchers on how to administer the test and explain the results. Tests also differ in terms of their technical quality and psychometric soundness.
Materials and method
The overall research design for this study was a sequential multiphase design comprising three phases. Every phase had its own methodological elements, and the findings of each phase fed into subsequent phases. The development of the Return-to-Work Assessment Scale (RAS) consisted of three phases: (i) initial item generation, (ii) face and content validity, and (iii) psychometric testing. Phase one was developed using the Flags model, which provided the semi-structured, open-ended questions that materialized into the draft instrument, while phases two and three were developed using the ICF and C-OAR-SE. In the first phase, the overall sample size was 18. The sample included seven post-stroke survivors, three caregivers, five rehabilitation specialists, and three employers. The health professionals included physiotherapists, occupational therapists, and occupational nurses; employers were also interviewed. In the second phase, 20 participants (16 males and four females) were involved in the Delphi study, while in phase three, 101 patients, who were stroke survivors, participated in the psychometric testing of the RAS. The first phase adopted exploratory research as the primary design, the second phase addressed the construction and validation of the preliminary measure, and phase three was a cross-sectional survey research design.
The conceptual and theoretical frameworks employed for this study were triangulated and drew on three separate theories and models. First, the International Classification of Functioning, Disability and Health (ICF) model of disability was complemented by the Flags model. This assisted with the development of content. The theoretical framework also included the C-OAR-SE theory to guide the construction process. Thus, the framework for the present study straddled content and process in pursuit of the aims of the study.
Conceptual framework
International Classification of Functioning, Disability and Health (ICF)
Disability was viewed by the ICF as not merely a handicap, but also the restriction of normal social, societal, environmental, and economic functioning of an individual [8]. This implies that an individual is disabled when s/he is unable to perform routine functional activities, due to pathology and/or a health condition (for example, stroke); therefore, such functional restrictions should be noted in the course of health assessment, and could be used as a measure of outcome of intervention [8].
Disability could be defined as a deficiency in the capacity to execute a movement in a manner considered normal for an individual. Disability involves the restriction of abilities, in the form of complex performance and behaviours that are generally acknowledged as important mechanisms of day to day living [9]. French and Swain observed that disability is a form of social oppression, especially when used to segregate people, in order to deny them social participation in their communities [10].
The ICF is a multipurpose classification with various specific aims: (1) providing a scientific basis for the comprehension of health and health-related outcomes and instruments; (2) providing a systematic coding scheme for health care disciplines; (3) permitting comparison of data across countries, health care disciplines, services, and time; and (4) providing a common language to describe health and health-related outcome measures for users such as policy-makers, researchers, and health workers, as well as the general public. Therefore, in this research, the ICF was used as a conceptual framework for the development of the RAS (Fig. 1).

Fig. 1. Representation of the International Classification of Functioning, Disability and Health (ICF) [8].
Flags model
Flags could be compared to warning signals that surround an individual, acting as hindrances to full recuperation and returning to work [11]. Flags could be observed in three areas, namely: (1) Yellow flags that relate to the Person [thoughts, feelings and behavior]; (2) Blue flags that depict the Place of work [work and health concerns]; and (3) Black flags that describe the Contextual issues [relevant people, system and policies] [11]. Identifying these flags involves detecting unhelpful behaviours and circumstances that could influence return-to-work. The Flags model was developed and refined over decades. In this current study, it was adopted as complementary to the ICF, for the purpose of construct definition in the first phase of the study. It was used to identify the common barriers to returning to work, as well as to assist in discovering ways to overcome those barriers [11]. The semi-structured questions in Appendix 1 were developed from the principles and concepts of the Flags model, which guided the development of the initial draft instrument after analysis of the interviews conducted.
Culler et al. used the Flags model in a qualitative study involving employers, rehabilitation professionals, and post-stroke survivors to identify factors that contribute to, or act as barriers to, return-to-work in post-stroke survivors [12]. These factors could include thoughts, feelings, behavior, work and health concerns, relevant people, as well as the system and policies. In this current study, the researcher sought to develop an instrument to assess return-to-work among post-stroke survivors. The Flags model was adopted as complementary to the ICF, to determine the variables (factors) identified by stakeholders for inclusion in the domains proposed by the ICF and in the resultant instrument.
Theoretical framework
The theoretical framework of this study was the modified C-OAR-SE theory [13]. The ICF and the Flags model provide a conceptual map for the content of the instrument, while psychometric theory provides the theoretical underpinning for instrument validation. Psychometric theory assumes that constructs exist objectively and, therefore, can be measured to accurately reflect reality, while acknowledging the probability of measurement error [14]. The inevitable measurement error has to be managed and curtailed to an acceptable level, at which the probability of significant error is eliminated or reduced [15]. This ontological assumption (nature of reality) translates into two important psychometric properties that have to be fulfilled in order for measurements to be useful, namely, validity and reliability [16]. Psychometric theory provides clear operational definitions and statistical procedures for instrument development, through which reliability and validity are established [17]. Reliability and validity are expanded on below to show their importance in the development of the research instrument used in this current study.
Validity is the extent to which an instrument measures what it is designed to measure [18]. It is the extent to which the measurements accurately represent the construct under investigation [19]. There are four types of validity, namely, face, content, criterion-related, and construct validity [19]. Face validity is very popular because of its simplicity, although, technically, it is not a type of validity [17].
The reliability of an instrument refers to the degree of consistency with which the instrument measures whatever it measures [15]. In this current study, the researcher used psychometric theory to develop an instrument to assess physical and psychosocial determinants of return-to-work that satisfies the psychometric criteria, as evidenced by the validity and reliability estimates computed. Psychometric theory provides the theoretical framework and the operational steps to determine the reliability and validity of the proposed instrument to assess return-to-work.
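As a hedged illustration, one common way to formalise measurement error and reliability is classical test theory, which the source does not spell out. An observed score X is treated as a true score T plus random error E, with T and E assumed uncorrelated, so that reliability is the share of observed-score variance attributable to true scores. Cronbach’s alpha, the internal-consistency coefficient reported for the RAS domains, is one estimate of that quantity for a domain of k items. The symbols below are notation introduced here for illustration: \sigma_i^2 is the variance of item i and \sigma_X^2 is the variance of the summed domain score.

```latex
X = T + E, \qquad
\text{reliability} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}, \qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_X^2}\right)
```

Under this reading, curtailing measurement error to an acceptable level corresponds to keeping \sigma_E^2 small relative to \sigma_X^2.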
Psychometric theory adapted: C-OAR-SE theory
Traditional psychometric theory emphasizes measurement for the purposes of establishing reliability and validity. In particular, it assesses the validity of a construct by examining the relationship between the measure used to assess the construct and the scores that the measure produces (in essence, evaluating a measure’s validity from the scores it produces). The major criticism against traditional psychometric theory is that the generation of quantitative indices was prioritized at the expense of thorough construct clarity and design procedures. The resultant statistical indices were thought to be sufficient evidence of good construction and construct validity. Therefore, for the purpose of this current study, the researcher considered a more recent theory and procedure, namely, the C-OAR-SE method by Rossiter [20].
C-OAR-SE is an acronym for the six aspects of the theory: ‘C’ [construct definition], ‘OAR’ [object representation, attribute classification, and rater entity identification], and ‘SE’ [selection of item type and answer scale, as well as enumeration and scoring rule] [21, 22]. It is both a theory and a procedure that is testable through proof of rational argument [22]. In this way, the method is a radical alternative to the conventional approach of psychometric theory [23]. The C-OAR-SE method is based on expert content validation [21, 23]. Specifically, it is a rational, rather than an empirically based, theory and procedure [20]. The C-OAR-SE method assigns greater value to content validity, by ensuring that the measure accurately represents the construct as defined [21, 22]; specifically, it considers whether the measure examines the conceptual definition and the content universe of the construct. Ultimately, when content validity is ignored, the construct is ignored [21, 22]. Additionally, for items to be content valid, they should have both high item-content validity and high answer-scale validity.
The operational steps, or processes, included in the model are: construct definition; object representation; attribute classification; rater-entity identification; and scale construction (selection of item type and answer scale, enumeration and scoring). These processes concur with the phases of this current study and its objectives. For example, the Flags model guided the construct definition. The psychometric theory (C-OAR-SE) was the primary framework, and the Flags model was a second, complementary theory. An overview of the traditional C-OAR-SE theory by Rossiter [21, 24] reveals that the theory is made up of three main principles, which differentiate it from the conventional psychometrics approach. The principles include: (1) expert-assessed high content validity of the items and the answer scale; and (2) predictive validity of the measure, which is additionally desirable for a predictor construct, as the notion of construct validity is considered nonsensical and misleading. On this last point, the psychometrics approach is said to go badly astray [13]. These principles distinguish the C-OAR-SE theory drastically from psychometric theory.
C-OAR-SE theory
The C-OAR-SE theory was developed by Rossiter [13] as a procedure for scale development, which places its entire emphasis on the high content validity of the items and of the answer scale (or answer scales, should different ones be used for different items) [20]. C-OAR-SE theory regards reliability solely as the statistical precision of the observed scores obtained from a measure in a particular application. Content validity therefore becomes necessary for reliability, reversing the psychometric argument, which states that reliability is necessary for validity. In the new procedure, C-OAR-SE is an acronym for Construct definition, Object classification, Attribute classification, Rater identification, Scale formation, and Enumeration and reporting. These six steps are the most essential when developing a proper measure of any construct (Fig. 2).

Fig. 2. Steps in the C-OAR-SE procedure (adapted from Rossiter [20]).
The C-OAR-SE approach builds on previous work on the conceptualization of constructs by authors such as Blalock [25], McGuire [26], and Bollen and Lennox [27], and on research on attribute classification by authors such as Fornell and Bookstein [28], Cohen, Cohen, Teresi, Marchi, and Velez [29], Law and Wong [30], and Edwards and Bagozzi [31], to mention just a few [13, 20].
The ICF and the Flags model guided the researchers’ analysis in developing the RAS. Atlas.ti facilitates textual analysis and interpretation, particularly the selecting, coding, annotating, and comparing of important segments of text (www.atlasti.com). To support evidence-based research, the findings, theses, and interpretations were grounded in the evidence. The transcribed interviews were converted to PDF and uploaded into the qualitative analysis tool. Common concepts were identified, from which codes/quotations were derived; themes later emerged, which were extracted and transcribed (Appendix 2).
The six steps of thematic analysis were used to analyze the transcribed interviews with Atlas.ti version 7.5.0: familiarization, coding, generating themes, reviewing themes, defining/naming themes, and writing up. The process of thematic analysis, as proposed by Braun and Clarke, involves reading all the transcripts thoroughly, selecting sentences that relate to key questions, and coding quotations, while being alert for codes that may overlap, in order to merge them. Subsequently, the categories were grouped into themes, and theoretical and methodological comments were recorded in writing. The validity of the codes, categories, and themes was checked by the researcher, as well as by the study supervisors, who are experts in the process of thematic analysis. Generalisations were set and linked to the formalised body of knowledge, in the form of constructs or theories.
Emerging themes
The researcher closely examined the data to identify common themes, such as topics, ideas, and patterns of meaning that came up repeatedly. The six steps of thematic analysis applied in this article were: familiarization, coding, generating themes, reviewing themes, defining/naming themes, and writing up [32].
Four types of thematic analysis are commonly used: (i) inductive thematic analysis, in which there are no preconceived themes and the themes are derived from the data; (ii) deductive thematic analysis, which uses predetermined themes; (iii) semantic thematic analysis, which sets aside underlying meanings and uses what is explicitly or overtly stated, as when investigating opinions and viewpoints; and (iv) latent thematic analysis. According to Braun and Clarke, thematic analysis is a method of identifying, analysing, and reporting patterns (themes) within data. It is particularly useful when a research project aims to discover themes and concepts embedded throughout qualitative data [32]. Thematic analysis is a method of analyzing qualitative data and is usually applied to a set of texts, such as interview transcripts. However, it is a flexible method that can be adapted to many kinds of research. Using Atlas.ti version 7.5.0, the researchers uploaded the transcribed interviews into the software. Quotations were extracted from the transcribed interviews and assigned codes. After familiarization and coding, the resulting codes and quotations were distilled into five themes. The data analysis for this study revealed five themes, which are listed below.
All domains were developed from comprehensive sets of ICF items, and made to correspond directly with the ICF activity and participation dimension, which is applicable to any health condition. The three domains of RTW provide a profile and summary measure of functioning and disability that is reliable and applicable across cultures and adult populations.
C-OAR-SE, as adopted in the development of the RAS, combined construct development with face and content validation, comprising attribute classification, rater identification, enumeration, and scale formation. These were authenticated in the Delphi panel study and, eventually, in the pilot/psychometric study that determined the structural validity and reliability of the RAS.
The ICF was the primary conceptual framework for this current study, while the Flags model was complementary. The Flags model was included and used conceptually, as part of the construct definition, while the primary conceptual model was the ICF, with its codes and components. The psychometric theory (C-OAR-SE) was the primary theoretical framework, and the Flags model was a second, complementary theory. An overview of the traditional C-OAR-SE theory by Rossiter [22, 24] reveals that the theory is built around three major principles, which distinguish this method of measure design from the now standard psychometrics approach.
The results of the standard psychometric approach were as follows. The sample comprised 58 (57.4%) males and 43 (42.6%) females, with a mean age of 53.88 ± 10.68 years. Internal consistency was high, with a Cronbach’s alpha coefficient of 0.81 for Domain 1, 0.93 for Domain 2, and 0.76 for Domain 3. The test-retest reliability analysis provided an ICC of 0.85 (p = 0.001) for Domain 1, 0.91 (p = 0.001) for Domain 2, and 0.99 (p = 0.001) for Domain 3. The Bland-Altman plotting method revealed that the test-retest results were not strictly centered on zero: the bias was -0.93 for Domain 1, 0.07 for Domain 2, and -0.93 for Domain 3. The limits of agreement for the two scores of each domain were -3.16 to 1.31 for Domain 1, -6.99 to 7.14 for Domain 2, and -13.6 to 11.74 for Domain 3.
The Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) value for Domain 1 was 0.63, and Bartlett’s test of sphericity was significant (p < 0.001); therefore, factor analysis was appropriate. The KMO value for Domain 2 was 0.839, and Bartlett’s test of sphericity was significant (p < 0.001). Ideally, the KMO value should be 0.6 or above, and Bartlett’s test of sphericity should be significant (p ≤ 0.05), to verify that the data used were suitable for factor analysis.
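The internal-consistency and Bland-Altman figures reported above can be illustrated with a minimal sketch. This is not the authors’ analysis script. It assumes the usual definition of Cronbach’s alpha and the conventional limits of agreement of bias ± 1.96 standard deviations of the test-retest differences, a multiplier the source does not state. The function names and the small data sets are hypothetical and are not RAS data.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for one domain.

    items: 2-D array-like, rows = respondents, columns = items.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items
    item_variances = items.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


def bland_altman(test, retest):
    """Return (bias, lower limit, upper limit) for paired test-retest scores."""
    diff = np.asarray(test, dtype=float) - np.asarray(retest, dtype=float)
    bias = diff.mean()        # mean test-retest difference
    sd = diff.std(ddof=1)     # sample SD of the differences
    # Assumed convention: limits of agreement = bias +/- 1.96 SD
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical scores for illustration only (not RAS data).
domain_items = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 1], [3, 3, 4]]
test_scores = [10, 12, 14, 16]
retest_scores = [11, 12, 15, 17]

alpha = cronbach_alpha(domain_items)
bias, lower, upper = bland_altman(test_scores, retest_scores)
print(f"alpha = {alpha:.2f}")
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

Under this convention the limits are symmetric about the bias, which agrees with the reported values: for Domain 1, (-3.16 + 1.31) / 2 ≈ -0.93; for Domain 2, (-6.99 + 7.14) / 2 ≈ 0.07; and for Domain 3, (-13.6 + 11.74) / 2 = -0.93.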
The Kaiser-Meyer-Olkin measure of sampling adequacy for Domain 3 was 0.658, and Bartlett’s test of sphericity was significant (p = 0.001). The aim of this article was to describe the theoretical and conceptual frameworks, and the phases, on which the development of the RAS was grounded.
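The two suitability checks used for the three domains can be sketched as follows. This is a minimal illustration under stated assumptions, not the software or procedure the authors used. It computes the overall KMO index from the item correlations and partial correlations, and the Bartlett chi-square statistic with its degrees of freedom. The function names and the synthetic data are hypothetical. The sample size of 101 mirrors the study, but the four-item domain is an arbitrary choice.

```python
import numpy as np


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations from the inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()      # squared correlations
    p2 = (partial[off_diag] ** 2).sum()   # squared partial correlations
    return r2 / (r2 + p2)


def bartlett_sphericity(data):
    """Bartlett's test of sphericity: (chi-square statistic, degrees of freedom)."""
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)   # log of det(R), numerically stable
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return statistic, df


# Synthetic data for illustration only: 101 respondents, 4 items that
# share one common factor plus independent noise (not RAS data).
rng = np.random.default_rng(0)
factor = rng.normal(size=(101, 1))
data = factor + rng.normal(size=(101, 4))

kmo_value = kmo(data)
statistic, df = bartlett_sphericity(data)
print(f"KMO = {kmo_value:.3f}")
print(f"Bartlett chi-square = {statistic:.2f} on {df} degrees of freedom")
```

The p-value for Bartlett’s test is the upper-tail probability of the statistic under a chi-square distribution with the given degrees of freedom. It can be read from a table or computed with a statistics library. It is left out here so that the sketch depends only on NumPy.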
Discussion
The production of the RAS using these established models and theories is a novel contribution to theoretical application in research. This is probably the first time the ICF model has been combined with the Flags model and C-OAR-SE to develop an instrument for validation and reliability testing. Lucieer et al. and van Stipout et al. developed a health-specific instrument to measure vestibulopathy, which was previously measured with generic instruments [33–35]. Using generic questionnaires for vestibulopathy was observed to be less objective. It is therefore very important, in research and clinical settings, to regularly assess conditions with health-specific instruments to promote accuracy and objectivity. Developing the RAS was therefore vital for post-stroke survivors, in order to determine, specifically and objectively, their readiness to return to work and their daily experiences of the disability and limitations encountered in the course of their day-to-day challenges. With no specific means of evaluating return-to-work among post-stroke survivors, they may not be accepted back into their formal employment, which may affect their self-esteem, confidence, and social identity [36].
As stated earlier, other works have also adopted the ICF components. The development of the RAS is similar to that of the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0), developed by Usten and used to measure functioning and disability in accordance with the International Classification of Functioning, Disability and Health [37]. However, the RAS combined the ICF codes and categories with the Flags model as complementary, together with the C-OAR-SE theory, to produce a more exhaustive construct/conceptual definition of return-to-work. From all these studies, the researchers noticed the lack of an outcome measure/instrument designed specifically to assess return-to-work. This was the gap that the researchers filled with this current study.
In the work of Usten et al., WHODAS 2.0 is quite similar to the RAS, although WHODAS 2.0 was developed as a generic health scale for measuring functioning and disability in accordance with the ICF items [37], whereas the RAS is a health-specific instrument for post-stroke survivors. No Delphi survey was conducted in the development of WHODAS 2.0. The extensive and rigorous international research involved in developing WHODAS 2.0 included: (1) a critical review of the literature on the conceptualization and measurement of functioning and disability, and of related instruments; (2) a systematic cross-cultural applicability study; and (3) a series of empirical field studies to develop and refine the instrument. Thus, the contribution of the present study was in providing a triangulated theoretical framework for the construction of the measure. This is not typically the case for health outcome measures, which results in instruments that lack reliability and validity due to a lack of construct clarity.
The psychometric properties were determined using internal consistency and factor analysis. One-hundred-and-one patients, who were stroke survivors, participated in the psychometric testing of the RAS. The data supported the use of factor analysis [38]. The RAS is in line with the ideas of Heerkens et al. and Hoefomit et al., who believed that the ICF should be broadened, especially its contextual (personal and environmental) factors, for early return-to-work [39, 40].
By applying the exploratory design of in-depth interviews and the six steps of thematic analysis described by Braun and Clarke [32], the instrument was developed in a linear and clear manner. This was similar to the work of Kimman et al., in which a health-specific questionnaire was developed for patients’ experience and satisfaction with medications. One of the main advantages of generic questionnaires is their wide applicability; however, they are less sensitive to small changes and may include domains that are not applicable to every population. Health-specific instruments such as the RAS can be used in clinical practice to (i) measure personal recovery between post-stroke survivors and rehabilitation specialists, (ii) objectively rate coping at work and readiness to return to work, and (iii) assess the extent to which contextual factors can influence recovery and return-to-work.
There are eleven return-to-work scales which were developed as health-specific outcome measures: Return-to-Work Self-Efficacy Questionnaire (RTWSE) by Shaw and Huang [41], Readiness for Return-to-Work (RRTW) by Franche et al. [42] for sick leave, Supervisors to Support Return-to-Work (SSRW) by Munir et al. [43] for measuring behavior of supervisors, Return-to-Work Obstacle and Self Efficacy Scale (ROSES) by Corbiere et al. [44], Psychosocial Aspect of Work Questionnaire (PAWQ) by Gray, Adefolarin and Howe [45] for workers with low back pain, Quality of Work Life Scale by Nanjundeswaraswamy [46], Successful Return-to-Work by Greidanus et al. [47], Obstacles to Return-to-Work Questionnaire (ORTWQ) by Milani [48] for cancer survivors, Work Rehabilitation Questionnaire (WORQ) by Finger et al. [49], Work Role Functioning Questionnaire 2.0 (WRFQ) by Abma, van der Klink and Bultmann [50], the Quality of Working Life Questionnaire by Monnette et al. [51], and the Quality of Working Life Questionnaire for Cancer Survivors (QWLQ-CS). It can be stated that these more specific questionnaires, like the RAS, have greater sensitivity to small changes in health state or quality of life. The development of the RAS was also similar to the work of Maslach et al. [52]. The Maslach Burnout Inventory (MBI) is a health-specific outcome measure used to assess burnout syndrome; like the RAS, it is not a generic scale and is very sensitive. The third edition was constructed by Maslach and Jackson in 1996. The MBI is a 22-item self-assessment tool that assesses the degree or stages of burnout syndrome in terms of three subscales: emotional exhaustion (9 items), depersonalization (5 items), and decreased level of personal accomplishment (8 items). The RAS is designed, like the Maslach Burnout Inventory, to interpret return-to-work in three subdivisions, each measuring one aspect of return-to-work.
The scale is designed so that scores are not summed across the instrument but are measured in three separate domains, as stated above. This does not mean that the instrument is perfect or without need for improvement; it is an attempt to contribute to knowledge and practice.
This study was not without limitations. First, sex differences among participants were not considered. Second, a comparison of rural and urban Nigeria was neither considered nor the focus of this study.
Conclusion
The ICF concept, the Flags model, and C-OAR-SE were successfully applied to develop the RAS, a health-specific, multidimensional assessment tool. The RAS is an excellent, internally consistent, and reliable tool that demonstrated good group reliability, as well as divergent and structural validity. Consequently, it could be used to assess and monitor return-to-work among post-stroke survivors.
Ethical approval
Ethical approval was obtained from the Senate Research Committee of the University of the Western Cape, Republic of South Africa on 26/5/2015 (Registration number 15/3/20) and Faculty of Health Sciences Ethical Review Committee of Nnamdi Azikiwe University, Nigeria on 15/2/2018 (ERC/FHST/NAU/2018/028).
Informed consent
The authors informed all prospective participants about the scope and purpose of the study, as well as their rights regarding participation. Subsequently, after they agreed to participate, their informed consent was sought and obtained.
Conflict of interest
The authors declare that they have no conflict of interest.
