How and how well do workplace assessments work? Using contextual variations in a theory-based evaluation with a large N

Abstract

What can be said about the effect of an intervention without a control group? The lack of evaluative evidence is a long-standing problem for regulatory policies against work-related health and safety risks. The European Union Occupational Safety and Health Framework has been in operation for three decades and covers more than 200 million workers, but the most recent evaluation was inconclusive about the benefits generated by this framework. A theory-based evaluation focusing on mechanisms in combination with a design capturing within-intervention variations offers a way forward. The idea is to measure the prevalence of the most likely mechanisms and their correlation with outcomes. This approach is illustrated in a large-N evaluation of the use of workplace assessments in the public sector in Denmark. The strengths and weaknesses of the workplace assessment legislation are assessed. It is shown how findings based on the presented approach contribute to the public debate about workplace assessments.

Keywords

evaluation design and choosing methods evaluation learning explanation and causality governance and regulation influence and use performance management and accountability policy making public management theory building and theory use

Introduction

In the absence of a control group, how much can be said about how and how well an intervention works? Consider, for example, Directive 89/391/EEC (European Agency for Safety and Health at Work, 1989), which established a common framework to secure a minimum level of protection from work-related health and safety risks for workers in all of the member states of the European Union (European Commission, 2017). The framework describes

the prevention of occupational risks, the protection of safety and health, the elimination of risk and accident factors, the informing, consultation, balanced participation in accordance with national laws and/or practices and training of workers and their representatives, as well as general guidelines for the implementation of the said principles. (European Agency for Safety and Health at Work, 1989: 4)

It covers all workplaces, risks and sectors of activity, both public and private, and thus more than 200 million workers.

The Directive places responsibility for risk assessment and risk management on the employer. In addition to complying with minimum legal standards, employers must also actively carry out risk assessments and decide on the improvement measures that best meet the identified risk profile of their company (European Commission, 2017: 10). Given this wide discretion for employers concerning the design and use of these risk assessments, the form of this regulation is soft (Smismans, 2003). However, we know little about how and how well it works.

The most recent evaluation of the European Union (EU) occupational safety and health directives (European Commission, 2017) was unable to quantify the benefits generated by the EU Occupational Safety and Health (OSH) Framework (European Commission, 2017: 5). The evaluation laments the lack of a robust comparison group, the lack of prospective design (European Commission, 2017: 5) and the lack of systematic data monitoring (European Commission, 2017: 6). Despite the legislation having been in place for three decades, ‘The lack of data, and the resulting limitations, have not only posed a methodological problem for the evaluation, but also reflect a fundamental problem for policy and regulatory development in relation to OSH’ (European Commission, 2017: 47). This is an anomaly which makes learning and policy change difficult, especially in an era when evaluation has become a form of meta-regulation (Radaelli and Meuwese, 2010) that requires political actors to justify proposals for policy change with reference to evaluative evidence (Dunlop and Radaelli, 2016; Smismans, 2003).

The evaluation places its hopes in the development of ‘new methodologies, studies and tools’ (European Commission, 2017: 7). However, as long as the same legislation is in place in all of the EU, it remains futile to insist on forms of evaluation which require a control group without the intervention.

Instead, this article builds on a classical and productive idea in theory-based evaluation, which is to use intra-programme variations in outcomes to increase one’s understanding of how the programme works, using ‘mechanisms’ as explanatory devices. Furthermore, ‘how’ is a part of ‘how well’. So, by explicating and measuring contextual variations that are reasonable approximations of mechanisms empirically known to be in operation, it is possible to assess strengths and weaknesses regarding how well the legislation works.

Our empirical material comes from a survey-based evaluation of the use of workplace assessments (WPAs) in 2221 workplaces in Denmark. WPAs are legally required as a result of EU Framework Directive 89/391. The aim of the legislation is to make sure that employers regularly carry out an assessment of risk factors in a way that is designed for the individual workplace. We study mechanisms that explain whether action is taken as a result of the WPAs. For example, we assume that employee engagement, management support and the making of well-known action plans are mechanisms that will be conducive to taking action on the basis of WPAs.

Although large-N studies, variance-based approaches and quantitative methods are less established in realist evaluation, with Ravn (2019) and Ford et al. (2018) as noteworthy exceptions, this article argues that exactly these approaches provide findings that are useful in evaluative argumentation (Valovirta, 2002) in situations where a control group is absent.

The purpose of this article is threefold:

To justify an evaluation design that exploits within-intervention variations as a part of theory-based evaluation (with a focus on mechanisms)

To articulate a clear recipe for how such evaluation can be done

To demonstrate that the suggested approach is feasible and has been productive, using a large-N evaluation of the use of WPAs in the public sector in Denmark as an example.

An evaluation of WPAs speaks to the discussion of soft EU regulation and the evaluation methods used therein (Smismans, 2003). It also contributes to knowledge about Occupational Health and Safety Management Systems (OHSMSs) and more broadly to how the impact of soft regulation and risk management can be evaluated in the many situations where a control group is not possible or accessible (Da Silva and Amaral, 2019; Robson et al., 2007), and where a focus on variations in mechanisms is the most promising way towards statements about impacts. Although our empirical results cover only one instrument in one country, the hypotheses about the mechanisms we identify are ‘portable’ (Astbury and Leeuw, 2010: 374). More broadly and importantly, we wish to encourage evaluators not to let the lack of a control group be an impediment to meaningful statements about how and how well interventions work, but instead to use approaches like the one presented here.

The article proceeds as follows: The literature on OSH and OHSMSs is consulted to identify the roots of the problem regarding impact evaluation encountered in the aforementioned EU evaluation. As an alternative, the literature on theory-based evaluation is discussed and a strategy for evaluation is derived. The case material, WPAs in Denmark, is then briefly described, and hypotheses are provided about the mechanisms influencing whether WPAs are useful and lead to subsequent action. The methods and findings of the evaluation are then presented. The pros and cons of the presented approach to evaluation are discussed. The article concludes by explaining how well WPAs work given the chosen evaluation approach. A concrete policy recommendation aimed at enhancing the evaluability of such policies in the future is also offered.

Challenges to impact evaluation in the OSH area

The EU Framework Directive 89/391 is usually conceptualised as a prominent example of systematic organisational approaches to risk management; or more broadly, Occupational Health and Safety Management Systems (OHSMSs), defined as ‘interrelated elements used to develop and implement an organisation’s OHS policy and manage its OHS risks. Such elements include organisational and responsibility structures, setting of objectives, hazard identification, risk assessment, procedures, processes and resources’ (Helbo et al., 2016: 202).

Theoretically, a WPA is an OHSMS component. If the WPA works, it is because it is part of an organisational process including problem identification, data collection and follow-up, which, in turn, requires stakeholder involvement (e.g. managers and employees). An effect is created by the intervention‒context interaction (Øvretveit, 2011).

Interest in systematic, organisation-based interventions in the OSH area has increased over the years. Drawing inspiration from the safety movement and a managerial desire to monitor and control the managerial aspects of OSH together with demands for external auditability (Hohnen and Hasle, 2011), the need for systematic and overarching management approaches to OSH has intensified. Interventions in terms of OHSMSs suffer from three related problems regarding impact evaluation.

Problems related to the definition of interventions

The EU Framework Directive 89/391 is an intervention which places a responsibility on the employer to monitor risks at work. In the same spirit, the idea is to install an OHSMS designed for the individual workplace. Such systems are ‘shells which can be filled with different content, depending on the company and its ambitions, culture and history’ (Hasle and Zwetsloot, 2011: 962). They are little more than occasions to act. The interaction between ‘intervention’ and ‘context’ creates a potential outcome but is also why it is difficult to determine exact OHSMS boundaries.¹ There are no clear boundaries between OHS activities, OHS management and OHSMSs (Nielsen, 2000). Even if relevant components can be identified, not all must be present in practice for systematic OHS activity to take place (Robson et al., 2007). To be operational, some OHSMS components need to make alliances with mechanisms in each organisational context.

Problems related to the operationalisation of outcomes

The term outcome usually refers to results that are causally produced by an intervention, but causality refers to a variety of qualitatively different things (Cartwright, 2007). OHSMSs include a variety of factors dealing with an array of problems and risks. The solutions to these problems are substantially different and have different timeframes. In turn, potential outcomes are multidimensional and not easily commensurable, and it is difficult to quantify the impact of an OHSMS. No epidemiological data set covers all aspects of work-related health relevant in an OHSMS perspective (Da Silva and Amaral, 2019).

Instead, outcomes must be defined on the organisational level of analysis and the same level of abstraction as OHSMSs. Outcome measures must therefore be not too specific, but still relevant and ambitious. For example, a specific measure of the reduction of a specific problem (such as noise or stress) would not be relevant, since the specific problem may not exist in many workplaces and may not be mentioned in their WPAs. Another outcome candidate may be increased awareness of risk factors among employees and managers, but it lacks ambition if not followed by action. A good, but not perfect outcome measure has to do with the ability of the OHSMS to take action (preventive and corrective measures) aimed at improving the work environment as a result of the WPA (Da Silva and Amaral, 2019: 128).

Problems related to the causal link between interventions and outcomes

Impact evaluation in the OSH area has traditionally been inspired by a biomedical research paradigm (Hasle et al., 2014). The emphasis has been on interventions and outcomes at the expense of process (Griffiths, 1999). Randomized controlled trials (RCTs) have conventionally been perceived as the ‘gold standard’ in intervention research and the benchmark for all robust inference about causal links between interventions and effects.

At the organisational level, however, RCTs are difficult to carry out in real life (Da Silva and Amaral, 2019). Organisations often refuse to participate (Robson et al., 2007: 347) and are difficult to control (Hasle et al., 2014: 74). Perhaps more importantly, when elements of OHSMSs are required by law, it is by definition impossible within a jurisdiction to randomise one’s way to a control group that is not subject to the law.

In addition, even if RCTs were possible, a fundamental problem would remain with attribution of causal effects to the intervention itself. Since OHSMSs are by definition ‘thin’ interventions, most of the difference in effects would continue to rest with variations in commitment, engagement, energy and follow-up across organisations; in other words, with variations in mechanisms, not with variations between OHSMS and no OHSMS (Øvretveit, 2011). After consulting a large number of studies in their review, Robson et al. (2007) conclude that there is evidence neither for nor against OHSMSs. In their review, Da Silva and Amaral (2019) find no studies which quantify the effects of OHSMSs. This resonates with the conclusion of the EU evaluation.

Without reflection on the poor fit between methodological expectations and the practical-political reality, the continuing lack of evidence is a self-perpetuating, self-inflicted problem. As long as a control group without the legislation is seen as necessary for causal inference, but remains impossible in practice, little conclusive evidence is likely to be produced.

A search for alternative approaches out of this cul-de-sac has therefore been called for (Cox et al., 2007; Hasle et al., 2014; Nielsen and Randall, 2013; Pedersen et al., 2012), not least inspired by ‘realistic evaluation’ (Pawson and Tilley, 1997). We follow common principles for these alternative approaches, a theory-based focus on mechanisms and designs exploiting within-intervention variations, but go a step further by suggesting a concrete systematic approach followed by a large-N evaluation of WPAs.

A mechanism-oriented approach to evaluation using within-intervention variations

Not least in situations where there is no control group, it is important not to let RCTs and other designs with control groups have a monopoly on evidence about causal links, but also not to abandon the idea of evidence and of causality as such (De Souza, 2013).

A starting point for theory-based evaluation is that ‘programs are ideas’ (Pawson and Tilley, 1997: 71). Every policy instrument is a form of theorisation (Lascoumes and Le Galès, 2007). Theory-based evaluation articulates an underlying theory in an intervention and hypothesises about empirical signs in processes and mechanisms that can be found empirically if the intervention works as expected (Weiss, 1997). Interventions rarely work all by themselves; they work by releasing ‘triggers’ or ‘mechanisms’ in the form of motivations, resources, structures and so on among participants and organisations in the contexts in which interventions unfold in practice. Therefore, empirical variations in outcomes across contexts within a programme are important sources of information in order to test or develop the philosophy of the intervention (Coldwell, 2019; Greene, 2005).

One of the great achievements of realistic evaluation is to provide a vocabulary to articulate the interaction between interventions and contexts in the form of CMO-configurations (context–mechanism–outcome configurations) (Pawson and Tilley, 1997). Unfortunately, realist evaluation has not totally clarified the ontological, epistemological and conceptual standing of these mechanisms (Dalkin et al., 2015; Hawkins, 2016; Porter, 2015).

Pawson (2013) acknowledges that ‘contexts’ and ‘mechanisms’ are not always clear-cut. Mechanisms are linked to ‘accounts’ that take their meaning from their role in explanation (Pawson and Tilley, 1997: 68). It is sound to regard a mechanism as ‘a theory’ (Pawson and Tilley, 1997: 68) or an explanatory device (Pawson and Tilley, 1997: 64).

According to realist philosophy, mechanisms are ‘hidden’ and ‘underlying’ (Dalkin et al., 2015). They are ‘the causal power of things’ that reside on the level of the real (hence realist evaluation), but they are not necessarily actual or observable (De Souza, 2013). However, since the evaluator is embedded in practical social interaction (Schwandt, 2015) and argumentation (Valovirta, 2002), appeals to causal powers which remain hidden in things but are not actual are philosophically problematic (Porter, 2015) and perhaps of limited practical use. The evaluator is stuck with outcomes and mechanisms that can be observed (Hawkins, 2016). And there is no fixed list of ontologically given mechanisms that the evaluator must choose from (De Souza, 2013).

The position taken here is that in practice ‘mechanisms’ are developed pragmatically as explanatory devices, under the circumstances, taking into account existing theories, the programme at hand, and the ability to build an evaluative argument based on accessible data, which describe the degree of activation of relevant mechanisms. We are particularly interested in mechanisms which explain how variations in contexts interact with the philosophy of the intervention in a way that makes variations in outcomes intelligible.

This pragmatist position has consequences in relation to some of the distinctions found in some parts of realist evaluation. Great advances have been made in the development of methodological approaches to detecting causality in small-N studies (De Souza, 2013; Fontaine, 2020; Harris et al., 2019; Wauters and Beach, 2018), but the justification for these studies has sometimes been undergirded by dichotomies which privilege one side of these dichotomies (such as qualitative over quantitative studies, process-based studies over variance-based studies and deterministic causality over probabilistic causality). By doing so, realist evaluation has focused on how interventions work at the expense of how well they work.

Realist evaluators fear that an understanding of mechanisms may be concealed under variance-oriented analyses using quantitative co-variation (Pawson, 2013: 3). However, to tell a mechanism that is in operation from one that is not, contextual variation is often used as a clue (Astbury and Leeuw, 2010). The position taken here, where we may depart from some realistic evaluators but not from theory-based evaluation in general, is that, logically speaking, the difference between the presence and absence of a mechanism is itself a variation. In our view, mechanism-oriented evaluators do not deny variations; they merely prefer to focus on variations across active mechanisms rather than across interventions. While there can be many variations in contexts, the evaluator is particularly interested in those empirical variations in contexts that also represent differences in the activation of theoretical mechanisms.

If mechanisms work as devices to explain outcomes, and if we add that they can be operational to varying degrees, then, logically speaking, variations in activation of mechanisms across contexts where the same intervention occurs should lead to variations in outcome which can be quantified (Ravn, 2019).

Furthermore, although the term ‘firing mechanisms’ seems to suggest that mechanisms are either fired or not, theory-based evaluators are beginning to realise that it is useful to think that mechanisms can be activated along a continuum similar to the light created by a ‘dimmer switch’ (Dalkin et al., 2015; Ravn, 2019). It is therefore hard to accept a philosophical reservation against the idea that a quantitative, variance-oriented and probabilistic analysis of causal effects should have access to the fruitful vocabulary of mechanisms. The analyses of how interventions work and how well they work should not be separated. They complement each other. Just because some methods are sometimes used without sufficient theoretical support, it cannot be concluded that theory-based evaluators can never use these methods. Instead of confining realist evaluation only to some methods and not to others (an issue debated, for example, by Jamal et al., 2015; Marchal et al., 2013; Van Belle et al., 2016), our position is that if an evaluation is guided by an understanding of mechanisms, its methodology can be freely chosen, as long as this purpose is served. For example, surveys can be used to check the extent to which intended mechanisms are in operation (Astbury and Leeuw, 2010: 373), which is done in the present case.

When exploring this path, we remain clearly situated in the theory-based camp. We do not advocate merely a conventional quantitative description of variations in outcomes across background variables (e.g. organisation size, geographical location), if their meaning is not theorised. We focus only on variations in outcomes that are relevant in a theoretical perspective.

The analysis remains anchored in the philosophy of the intervention and proceeds with a search for mechanisms (represented by variations in contexts) that are logically consistent with this philosophy and help make it operational. This perspective is particularly relevant for OHS initiatives that merely offer ways for participants to structure their own assessments about their own OHS risks. In soft, self-regulatory schemes (e.g. WPAs), participants and their organisations infuse the intervention with energy and convert it into action. It is therefore only when people do something with interventions ‘where the real-world creeps in, mobilizing diversity and latent energies’ (Stame, 2010: 376) that they are effective. It is not the intervention that works but rather the interactions between interventions and people in their social contexts (Astbury and Leeuw, 2010: 370; Pawson and Tilley, 1997). Our theoretical position here resonates with realist evaluation, but we add that if these interactions (mechanisms) vary across workplaces (contexts), then quantitative methods should be able to capture these interactions in the form of measurable variations.

A theory about WPA usage should provide links between the organisational self-regulation and participation prescribed by the philosophy of OHSMS, on one hand, and the organisational capacity to take action against risks on the other. To accomplish this, a few mechanisms are likely to be more conducive than others.

The contextual variables that represent mechanisms of interest include classical ‘implementation variables’ (describing compliance with explicit policy regulations), but they also describe variations in organisations that set CMO configurations in motion, even if they are not all explicitly prescribed in the specific intervention. Because of the soft form of regulation in the case at hand, ‘implementation’ by definition involves ‘context’.

If it is impossible or difficult to articulate the most likely conducive mechanisms, it would jeopardise the trust in the philosophy of the intervention, which would in itself constitute a test of this philosophy (logically, not empirically). Conversely, if such mechanisms can be identified logically and theoretically, then the next step is to find empirical manifestations of these active mechanisms and map how members of the target population score on these variables. Doing so allows one to assess the degree to which there is fertile ground for setting effective CMO configurations in motion.

The next step is to check whether these contextual variables are sufficiently correlated with outcomes to sustain trust in the philosophy of the intervention. A methodological caveat is that mechanisms with a prevalence near zero or near 100 per cent vary too little across workplaces to reveal any empirical correlation with outcomes. In practice, however, such situations do not occur very often, and if they do, they are easy to detect.
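The caveat can be made explicit with a small worked expression. This is only a sketch under assumptions that are ours for illustration: a mechanism m is coded as a binary indicator per workplace, and the association with the outcome is summarised by a Pearson correlation. The symbols M, Y, p_m and r_m are introduced here and are not taken from the evaluation itself.

```latex
% Assumption (illustrative): M_i \in \{0,1\} indicates whether mechanism m
% is active in workplace i; Y_i is the outcome (e.g. action taken after the WPA).
\begin{align}
  p_m &= \frac{1}{N}\sum_{i=1}^{N} M_i
      && \text{(prevalence of mechanism } m\text{)} \\
  \operatorname{Var}(M) &= p_m\,(1 - p_m) \\
  r_m &= \frac{\operatorname{Cov}(M, Y)}
              {\sqrt{\operatorname{Var}(M)\,\operatorname{Var}(Y)}}
\end{align}
% If p_m = 0 or p_m = 1, then Var(M) = 0 and r_m is undefined.
% If p_m is merely close to 0 or 1, r_m rests on very few contrasting workplaces.
```

Under these assumptions, the problem is visible from the prevalence alone, which is why such situations are easy to detect before any correlation is computed.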

The prevalence of an active mechanism together with its correlation with outcomes contributes to its impact. Both are needed to a reasonable degree. By comparing different mechanisms in both respects, one gets a sense of the strengths and weaknesses of the ability of an intervention to produce outcomes in a given target population.

In this reasoning, we follow Stern et al. (2012: 14) who argue that there is a whole range of productive questions which can be asked on the road towards ‘plausible judgments of effectiveness’. One of these questions is ‘How does the intervention work?’ Answering that question on the basis of (a) the quantified prevalence of mechanisms assumed to be conducive to outcomes as well as (b) their actual correlations with outcomes, contributes to answering a question about how well the intervention works. We follow advice from realist evaluation, which is to look carefully at mechanisms (Pawson and Tilley, 1997), but we do not remain at that level of analysis. We are interested in using the analysis of mechanisms to draw inferences about how well the intervention works (in this case, the legislation of WPAs).

In doing so, we exploit the cooperative relation between the how-question and the how-well-question. We use the data describing how and how well the most likely conducive mechanisms work to gauge how well the WPAs work. It should be stressed that the benchmark against which the latter question is answered is not a counterfactual one represented by a control group. The benchmark is instead established theoretically through an articulation of the most likely conducive mechanisms based on a reasonable interpretation of the philosophy of the intervention. The goal is, in the terminology of Stern et al. (2012), a ‘plausible judgment’.

As a final step, one can use theory not included in the philosophy of the intervention to suggest additional mechanisms and contextual variables. Their prevalence and correlation with outcomes can also be estimated. The strengths and weaknesses of CMO configurations in and outside of the philosophy of the intervention can then be gauged. In a learning perspective, the findings can be used to adjust the philosophy of the intervention if necessary.

An overview of the logic is provided in Table 1.

Table 1.

Summary of analytical steps and questions.

Analytical step	Helps answer the following question(s)
(1) Identify mechanisms consistent with the philosophy of the intervention	Is it logically plausible that the intervention will work?
(2) Check prevalence of contextual variables representing mechanisms	To what extent are conditions in place to support context–mechanism–outcome configurations consistent with the philosophy of the intervention?
(3) Check correlation between these contextual variables and outcomes	To what extent do the alleged mechanisms produce outcomes as expected? What are the strengths/weaknesses of the intervention given the prevalence of different supporting mechanisms and their correlation with outcomes?
(4) Identify additional mechanisms and repeat (2) and (3)	What are the strengths/weaknesses of the intervention given the prevalence of different mechanisms (both incorporated in and not incorporated in the philosophy of the intervention) and their correlation with outcomes?
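Steps (2) and (3) in Table 1 can be sketched in a few lines of code. The sketch below is not the analysis used in this evaluation. It assumes binary survey indicators, uses a plain Pearson correlation, and runs on invented toy data. The function names and the variable names for the mechanisms are hypothetical.

```python
from math import sqrt
from statistics import mean


def prevalence(indicator):
    """Share of workplaces in which the mechanism is active (step 2)."""
    return mean(indicator)


def correlation(indicator, outcome):
    """Pearson correlation between mechanism and outcome (step 3).

    Returns None when either variable has no variation, which is the
    case for a mechanism with 0 or 100 per cent prevalence.
    """
    mx, my = mean(indicator), mean(outcome)
    sxx = sum((x - mx) ** 2 for x in indicator)
    syy = sum((y - my) ** 2 for y in outcome)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(indicator, outcome))
    return sxy / sqrt(sxx * syy)


def summarise(mechanisms, outcome):
    """One row per mechanism with its prevalence and correlation."""
    return [
        {
            "mechanism": name,
            "prevalence": prevalence(indicator),
            "correlation": correlation(indicator, outcome),
        }
        for name, indicator in mechanisms.items()
    ]


# Invented toy data for eight workplaces (1 = yes, 0 = no); not survey results.
outcome = [1, 1, 1, 0, 0, 1, 0, 0]  # hypothetical: action taken after the WPA
mechanisms = {
    "management_support": [1, 1, 1, 0, 0, 1, 0, 0],
    "action_plan_awareness": [1, 1, 1, 1, 1, 1, 1, 1],  # no variation
    "employee_engagement": [1, 0, 1, 0, 1, 1, 0, 0],
}

summary = summarise(mechanisms, outcome)
for row in summary:
    print(row)
```

Reading the two columns side by side is what allows mechanisms to be compared: a mechanism that is widespread but uncorrelated with outcomes, or strongly correlated but rare, points to a different weakness than one that is both rare and uncorrelated. Step (4) repeats the same computation with additional mechanisms added to the dictionary.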

WPAs in Denmark

The Danish WPA legislation is a direct consequence of Framework Directive 89/391. Because of the soft nature of this regulation, WPAs are described in the public imagery as everything from a useful tool to a ritualistic tick-box exercise.

Workplaces are legally required to carry out a WPA at least once every 3 years. They must consider workplace health and safety issues and include an action plan (in document form) that is made available to managers, employees and inspection authorities.

The form and shape of WPAs is discretionary. For organisations with thousands of employees, the form of the WPA may be determined by upper level management, the HR department and so on. Others delegate much discretion to local actors in the organisation (e.g. local employees, their local bosses and local work environment representatives (WERs) elected by their colleagues). Some seek the help of consultants or accreditation agencies (Hohnen and Hasle, 2011).

We focus on the immediate impact of WPAs in terms of a rough indicator of use: Whether the WPA has led to initiatives intended to spur improvements of the physical work environment or the psychosocial work environment, respectively, and whether the WPA is generally perceived to be a useful instrument. Our measure of use is raw and does not cover long-term impacts, but it is consistent with the OHSMS philosophy, since it aims at processes related to organisational action, which is a critical step towards the amelioration of risks. Again, consistent with OHSMS philosophy, we incorporate several forms of organisational use, all of which offer alternatives to the most frequently cited negative image of WPAs (i.e. the ritualistic tick-box exercise) (Hohnen and Hasle, 2011).

Hypotheses about most likely conducive mechanisms

The key philosophy in the legislation is that workplaces should take responsibility for assessing OSH problems and act on them in a self-regulatory manner consistent with the local situation and in collaboration between managers and employees.² To be effective, this ‘thin’ intervention must set CMO configurations in motion. The most likely conducive mechanisms here are those that are theoretically consistent with the overall philosophy of the intervention; in this case, those that coalesce around participatory, cooperative and systematic self-regulation at the workplace.

As the most likely active mechanisms conducive to use, we propose the following:

Action plan awareness. The use of WPAs will be enhanced if an action plan is made and the managers and employees responsible for the working environment know it is available. Technically speaking, an action plan is a required part of the legally mandated WPA and therefore part of proper implementation. However, it is an empirical question if an action plan is made, and it is only helpful if people in the workplace know of its existence. So, we measure action plan awareness.

Employee engagement enhances the use of WPAs. A key notion in participatory evaluation is that involvement can lead to ownership, relevance and use (Cousins and Whitmore, 2004). Stakeholder involvement is an important driver for data use in organisations (Kroll, 2015: 471). Employee involvement is also a key factor in organisation-level interventions in the work environment (Nielsen and Randall, 2013: 605).

Local influence. WPA usage will be enhanced if local stakeholders (including the WER, the WER’s immediate manager and the WER’s colleagues) have influence on the WPA process. Involvement may be perceived as superficial if it is not followed by influence. The local stakeholders can also influence the design and focus of the WPA in such a way that it has maximum relevance for their perceived problems in the local workplace. Local influence can therefore be conducive to use (Cousins and Whitmore, 2004).

Reliable picture. The use of a WPA will be enhanced if it is believed to deliver a reliable picture of the real working environment at the workplace (Ledermann, 2012). In general, trustworthy evaluative information and ‘measurement system maturity’ (Kroll, 2015: 471) are conducive to use and particularly important in situations where the evaluative information delivers surprising results (Ledermann, 2012).

Management support. The use of WPAs will be enhanced by management support. Such support is crucial for data use in decision-making (Kroll, 2015: 471) and for the effectiveness of organisation-wide interventions (Nielsen and Randall, 2013).

Strategy. The use of WPAs will be enhanced if they are integrated in wider organisational processes, for example, through strategy or HR policies. An integrative organisational approach is often recommended as conducive to the systematic use of evaluative information (Da Silva and Amaral, 2019; Läubli Loud and Mayne, 2014; Robson et al., 2007).

When activated, these mechanisms are theoretically consistent with a soft form of regulation that places a main responsibility for the conduct of WPAs on the workplace and its people, even if we concede they may be imprecise representations of multiple, ‘real’ underlying mechanisms (Ford et al., 2018; Pawson and Tilley, 1997). If the list above represents the most likely mechanisms that help convert a WPA into action, the lack of signs of such mechanisms will constitute a threat to belief in the effectiveness of WPAs. The same is true if, contrary to expectations, the variations in the presence of these mechanisms are not correlated with outcomes. How and how well these mechanisms work thus helps indicate how well a theory of the WPAs works.

Methods and measurements

As stated, we focus on the public sector in Denmark and use WERs as informants in the study. These representatives are elected by their colleagues, their special responsibility being to work together with managers to monitor and improve the work environment. Since there is no authoritative list of WERs, we sought help from 47 trade unions. Twenty trade unions in a variety of branches that kept lists of WERs helped us contact 6775 WERs, from whom we received 2221 useful survey responses (32.8% response rate). In the absence of a full list of WERs, we do not claim that our sample is representative of all workplaces in the country. However, the WERs in our sample are elected to represent more than 108,000 employees, who are reported to work in teaching and research, health care, cultural institutions, defence/police/military, administration, care for the elderly, child care and ‘other’, that is, in all the typical branches of the public sector.

Since WPAs deal with a broad variety of working environment problems, we measured outcomes as a composite index consisting of three dimensions: Whether the WPA has led to initiatives intended to spur improvements of the physical work environment, whether it did so in the psychosocial work environment and whether the WPA is generally perceived to be a useful instrument in relation to work environment initiatives. Qualitative data informed us of how the latter includes such broad functions as ‘keeping the work environment on the agenda’ and ‘directing attention towards work environment issues’.

In Supplemental Appendix A, we describe the construction of all variables in detail. We measured the prevalence of each of the six mechanisms above in terms of 5-point Likert-type-scale items or, when relevant, with yes/no questions. We transformed them into scales ranging from 0 to 100 so that their prevalence could be compared. With regard to action plans, we grouped ‘no’ and ‘don’t know’ together, since WERs cannot actively use an action plan which they do not know exists.
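The rescaling described above can be illustrated with a minimal sketch. It assumes a linear mapping of 5-point items onto 0 to 100 and codes the yes/no item as 100/0; the function names are hypothetical, and the authors' actual procedure is the one documented in their Supplemental Appendix A.

```python
# Illustrative helpers only; names and the linear mapping are assumptions,
# not the authors' documented procedure.

def likert_to_0_100(response: int, points: int = 5) -> float:
    """Map a response on a 1..points Likert-type item linearly onto 0..100."""
    if not 1 <= response <= points:
        raise ValueError(f"response must be between 1 and {points}")
    return (response - 1) / (points - 1) * 100


def action_plan_awareness(answer: str) -> int:
    """Code 'yes' as 100 and group 'no' and "don't know" together as 0."""
    normalised = answer.strip().lower().replace("\u2019", "'")
    if normalised == "yes":
        return 100
    if normalised in {"no", "don't know"}:
        return 0
    raise ValueError(f"unexpected answer: {answer!r}")


def prevalence(scores: list) -> float:
    """Mean of 0-100 scores; for a yes/no item this is the percentage of 'yes'."""
    return sum(scores) / len(scores)


# Invented answers, used only to show the recoding.
example_answers = ["yes", "no", "don't know", "yes"]
example_prevalence = prevalence([action_plan_awareness(a) for a in example_answers])
print([likert_to_0_100(r) for r in (1, 2, 3, 4, 5)], example_prevalence)
```

Putting every item on the same 0 to 100 range is what allows a mean score on a Likert-type item to be read alongside a percentage of ‘yes’ answers.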

We measured ‘management support’ as a composite index of upper management support and immediate management support, since these two were found to be strongly correlated in a factor analysis.

Local influence is an index describing the influence of the WER, the immediate manager and the WER’s colleagues. These were found to be correlated and distinct from ‘overall organisational influence’ (see below).
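As a rough illustration of how such composite indices can be formed, the sketch below takes the unweighted mean of the answered 0 to 100 items. Both the equal weighting and the handling of unanswered items are assumptions made for the example; the function name is hypothetical and the construction actually used is described in Supplemental Appendix A.

```python
from statistics import mean
from typing import Optional


def composite_index(items: list) -> Optional[float]:
    """Unweighted mean of the answered 0-100 items (assumed construction).

    Unanswered items are passed as None and skipped; the result is None
    when no item was answered at all.
    """
    answered = [x for x in items if x is not None]
    return mean(answered) if answered else None


# Invented scores: upper management support 80, immediate management support 60.
management_support = composite_index([80.0, 60.0])
print(management_support)
```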

‘Strategy’ is our term for the mechanism describing whether WPAs are integrated into the overall development of the organisation (e.g. in the form of strategy or HR policies).

To reiterate, data describing all of these measures originated from questionnaires filled out by WERs.

To ameliorate potential biases from excluded variables, we measured the following control factors:

Acuteness of problems: Whether the work environment problems are serious enough to require immediate action for the organisation to function. Logically, acuteness of problems should enhance the use of evaluative information (Ledermann, 2012).

Conflict: Whether there are conflicting views in the organisation about the work environment (inspired by Ledermann, 2012).

The influence of the overall organisation on the WPA process. ‘Overall organisation influence’ was measured as a composite of influence from upper management, the HR department and the overall ‘work environment organisation’ (a corporate structure that consists of managers and employees at different hierarchical levels). These were found to be correlated in a factor analysis and distinct from the influence of local stakeholders described above.

The influence of external consultants on the WPA process (inspired by Nielsen and Randall, 2013: 606).

The use of any method in WPAs that is an alternative to merely conventional questionnaires.

The use of anonymous data as part of the WPA process.

The ability of the WPA to create new knowledge about problems in the work environment.

The ability of the WPA to document what is already known about problems in the work environment.

Admittedly, no absolute distinction can be made between mechanisms represented by our ‘most likely conducive mechanisms’ and these control factors. However, our most likely mechanisms are particularly central to a philosophy of participation and organisational self-regulation, whereas some of the control factors are less consistent with that philosophy, less within the reach of local stakeholders, and/or to a higher extent structurally given in a particular context. Our point is not that the control factors have no impact on the use of WPAs, but that they may do so in ways that are not nearly as resonant with the philosophy of collaborative organisational self-regulation as our most likely mechanisms.

We tested organisation size (measured as the number of employees that the WER represents) and WER seniority. Neither variable had a significant effect. For the sake of parsimony and clarity, we therefore ran the resulting analysis without these two variables. We do not control for educational background or branch, since WERs can be elected from any occupational group in a workplace in an organisation where many kinds of professionals work together. (A workplace may consist mainly of, say, administrators with a secretary as the WER, even if the overall statistical categorization of their formal organisation is ‘a hospital’.)

We then performed a series of regression analyses to estimate the effects of each of ‘the most likely conducive mechanisms’ (step 1), all of them (step 2) and all of them after additional inclusion of relevant control factors (step 3) upon our outcome (the composite use index). We did this to avoid the risk of overstating the effect of any mechanism in isolation.
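The logic of the three steps can be sketched on synthetic data. Everything in the example is invented for illustration: the variable names, the effect sizes and the random data are assumptions, and the small least-squares routine stands in for whatever statistical software was actually used. The point it shows is that when two mechanisms overlap, their coefficients are larger when each is estimated alone (step 1) than when they are estimated together (step 2).

```python
import random


def ols(y, columns):
    """Least-squares fit with an intercept; returns [intercept, b1, b2, ...]."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic, invented data: two overlapping "mechanisms" and one unrelated "control".
random.seed(0)
n = 1000
support = [random.gauss(70, 20) for _ in range(n)]
engagement = [0.5 * s + random.gauss(25, 15) for s in support]
acuteness = [random.gauss(50, 20) for _ in range(n)]
use = [0.3 * s + 0.3 * e + random.gauss(10, 10) for s, e in zip(support, engagement)]

# Step 1: each mechanism on its own.
step1 = {
    "support": ols(use, [support])[1],
    "engagement": ols(use, [engagement])[1],
}
# Step 2: both mechanisms in one model.
step2 = dict(zip(["intercept", "support", "engagement"],
                 ols(use, [support, engagement])))
# Step 3: mechanisms plus the control factor.
step3 = dict(zip(["intercept", "support", "engagement", "acuteness"],
                 ols(use, [support, engagement, acuteness])))

for name in ("support", "engagement"):
    print(name, round(step1[name], 2), round(step2[name], 2), round(step3[name], 2))
```

In this toy setup the control is unrelated to the outcome by construction, so its inclusion in step 3 leaves the mechanism coefficients largely unchanged.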

Results

The prevalence of the six most likely mechanisms

If 100 per cent is taken as a benchmark signifying a perfect implementation context, the 78.2 per cent for action plans may seem relatively high at first sight, but it is far from perfect given that action plans are required by legislation. The remaining 21.8 per cent consists of 13.2 per cent respondents who claim there is no action plan and 8.6 per cent who do not know whether such a plan exists. While only the former are indicative of a lack of compliance with the legislation, the latter indicates a lack of effect, since WERs cannot use an action plan if they do not know whether it exists.

Management support scores are relatively high (77.1). Above a score of 60, we find the integration of WPAs into strategy and other overall forms of organisational development, the ability of the WPA to give a reliable picture and employee engagement. The lowest score (53.1) is achieved by local influence (of WERs, their immediate managers and their colleagues).

Table 2 suggests there is room for improvement, both in terms of fuller legal compliance and other mechanisms.

Table 2.

Prevalence of six most likely mechanisms.

	M	SD	%
Local influence (score of 0–100)	53.1	33.9
Employee engagement (score of 0–100)	60.6	21.9
Reliable picture (score of 0–100)	61.3	21.6
Management support (score of 0–100)	77.1	20.7
Action Plan (% yes)			78.2
Strategy (% yes)			67.0

SD: standard deviation.

Next, the correlation between most likely conducive mechanisms and outcomes is analysed.

The separate impact of each mechanism

Each of the mechanisms taken separately has a strong and significant correlation with our composite use index. See Table 3.

Table 3.

Summary of simple regression analyses for each of the six mechanisms predicting the use of WPAs.

	Unstandardised coefficients		Standardised coefficient
Mechanisms	B	SE B	β	R²	F	N^a
Local influence	0.15	0.01	.30***	.09	128.67***	1328
Employee engagement	0.25	0.02	.32***	.10	181.84***	1635
Reliable picture	0.29	0.02	.35***	.12	224.56***	1630
Management support	0.38	0.02	.45***	.20	409.79***	1599
Action plan	13.3	1.10	.28***	.08	144.73***	1666
Strategy	14.9	1.00	.39***	.15	217.94***	1222

WPA: workplace assessment; SE: standard error.

^a Number of cases included in analysis.

***p < .001.
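One way to read Table 3 is to recall that, in a regression with a single predictor, the standardised coefficient β equals the Pearson correlation, so R² should equal β² up to rounding. The following sketch, which uses only the figures reported in the table and a hypothetical helper name, checks that the two columns agree.

```python
# (standardised beta, reported R^2) pairs copied from Table 3.
table3 = {
    "Local influence": (0.30, 0.09),
    "Employee engagement": (0.32, 0.10),
    "Reliable picture": (0.35, 0.12),
    "Management support": (0.45, 0.20),
    "Action plan": (0.28, 0.08),
    "Strategy": (0.39, 0.15),
}


def consistent(beta: float, r2: float, tol: float = 0.01) -> bool:
    """True if beta squared matches the reported R^2 within a rounding tolerance."""
    return abs(beta ** 2 - r2) <= tol


for name, (beta, r2) in table3.items():
    print(f"{name}: beta^2 = {beta ** 2:.3f}, reported R^2 = {r2:.2f}, "
          f"consistent = {consistent(beta, r2)}")
```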

Before drawing strong conclusions about each of them, we now bring them together into one regression model. See Supplemental Appendix B for technical details.

The effect of six mechanisms on the use of WPAs

Again, each and every mechanism has a strong and significant correlation with the use index (see Table 4). The fact that regression coefficients are markedly lower in the combined model indicates an overlap between the various mechanisms in practice. This makes interpretive sense. For example, if there is no management support and no local influence, an action plan might not be made and employee engagement might subsequently wither. While overlapping mechanisms in an organisation are hardly surprising, the fact that each of the mechanisms remains significantly associated with our use index, also after inclusion in a common regression model, helps sustain our belief in each of the mechanisms as conducive to putting WPAs to use.

Finally, let us see whether these mechanisms also work after integration into a regression model with control factors. We do this, again, to reduce the risk that we have taken the empirical signs for more than they are, thereby misunderstanding or misrepresenting mechanisms.

The effect of all mechanisms on WPA use after inclusion of control factors

Going from Table 4 to Table 5, there are relatively small changes in the regression coefficients. The most likely mechanisms remain robust after introducing control factors.

Table 4.

Summary of regression analysis for model including all six mechanisms.^a

	Unstandardised coefficients		Standardised coefficient
Mechanisms	B	SE B	β	N
Local influence	0.08	0.01	.15***	1328
Employee engagement	0.09	0.02	.11***	1635
Reliable picture	0.11	0.02	.14***	1630
Management support	0.20	0.03	.23***	1599
Action plan	6.57	1.10	.16***	1666
Strategy	6.68	1.05	.18***	1222
R²		.35
F		90.96***

SE: standard error.

^a Results presented are based on pairwise deletion of cases. When the analysis is replicated using listwise deletion (n = 931), our model, F(6, 924) = 89.25, p < .001, R² = .37, and β values remain significant at p < .001, except for employee engagement where p < .01.

***p < .001.
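The N column in Table 4 differs from row to row, which is what pairwise deletion produces: each statistic is computed on the cases available for the variables involved, whereas listwise deletion keeps only respondents who answered everything. The sketch below contrasts the two on a handful of invented records. The variable names and values are hypothetical, and it illustrates only the case-selection step, not the full regression.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def listwise_cases(rows):
    """Keep only records with no missing value on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_correlations(rows, variables):
    """For each pair of variables, correlate all records complete on that pair."""
    result = {}
    for a, b in combinations(variables, 2):
        pairs = [(r[a], r[b]) for r in rows
                 if r[a] is not None and r[b] is not None]
        xs, ys = zip(*pairs)
        result[(a, b)] = (pearson(xs, ys), len(pairs))
    return result


# Invented records; None marks an unanswered item.
rows = [
    {"use": 60, "support": 80, "strategy": 100},
    {"use": 40, "support": 50, "strategy": None},
    {"use": 70, "support": 90, "strategy": 100},
    {"use": 30, "support": None, "strategy": 0},
    {"use": 50, "support": 60, "strategy": 0},
]

print("listwise n =", len(listwise_cases(rows)))
for pair, (r, n_pair) in pairwise_correlations(rows, ["use", "support", "strategy"]).items():
    print(pair, "r =", round(r, 2), "n =", n_pair)
```

Pairwise deletion retains more information per estimate, at the cost of basing different coefficients on different subsets of respondents, which is presumably why the listwise replication is reported alongside it.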

Table 5.

Summary of regression analysis for model including all six mechanisms and control factors.^a

	Unstandardised coefficients		Standardised coefficient
	B	SE B	β	n
Mechanisms
Local influence	0.08	0.02	.16***	1328
Employee engagement	0.08	0.02	.10**	1635
Reliable picture	0.08	0.03	.10**	1630
Management support	0.22	0.03	.26***	1599
Action plan	4.76	1.21	.11***	1666
Strategy	5.54	1.16	.15***	1222
Controls
Acuteness of problems	0.02	0.02	.03	1629
Conflict	−0.02	0.02	−.03	1652
Overall organisational influence	0.00	0.02	.00	1033
Influence from external consultants	0.03	0.01	.07*	1090
Non-routine vs Routine methods	1.80	1.00	.05	1668
Anonymous data	0.22	1.29	.01	1392
New Knowledge	0.14	0.02	.16***	1648
Documentation	0.11	0.02	.15***	1618
R²		.41
F		39.64***

SE: standard error.

^a Results presented are based on pairwise deletion of cases. When the analysis is replicated using listwise deletion (N = 468), our model, F(14, 453) = 23.75, p < .001, R² = .42, remains significant at p < .001. The β values for all our mechanisms, except reliable picture (p = .28), remain significant.

*p < .05. **p < .01. ***p < .001.
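The listwise figures quoted in the notes to Tables 4 and 5 can be cross-checked with the standard relation between R² and the overall F statistic for a model with k predictors and n complete cases, F = (R²/k) / ((1 − R²)/(n − k − 1)). Because R² is reported to two decimals, the sketch below computes the range of F values compatible with that rounding. The helper names are hypothetical, and the check covers only the listwise models, since the effective n under pairwise deletion is not stated.

```python
def f_from_r2(r2: float, n: int, k: int):
    """Return (F, df1, df2) for an OLS model with an intercept and k predictors."""
    df1, df2 = k, n - k - 1
    return (r2 / df1) / ((1 - r2) / df2), df1, df2


def f_bounds(r2_reported: float, n: int, k: int, half_width: float = 0.005):
    """Range of F values compatible with an R^2 rounded to two decimals."""
    low = f_from_r2(r2_reported - half_width, n, k)[0]
    high = f_from_r2(r2_reported + half_width, n, k)[0]
    return low, high


# Listwise-deletion figures quoted in the notes to Tables 4 and 5.
reported = {
    "Table 4": {"r2": 0.37, "n": 931, "k": 6, "F": 89.25},
    "Table 5": {"r2": 0.42, "n": 468, "k": 14, "F": 23.75},
}

for label, m in reported.items():
    _, df1, df2 = f_from_r2(m["r2"], m["n"], m["k"])
    low, high = f_bounds(m["r2"], m["n"], m["k"])
    print(f"{label}: F({df1}, {df2}) expected between {low:.2f} and {high:.2f}; "
          f"reported {m['F']}")
```

For both models the degrees of freedom follow from n and k, and the reported F falls inside the range implied by the rounded R².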

Discussion

In the following, we consider counterarguments and caveats. Then we discuss the most important new findings.

No statement is made about the effectiveness of the intervention as compared to a similar situation with no intervention. We cannot quantify the exact effect of the intervention as compared to no intervention (Hawkins, 2016). If that criterion is held strictly, however, then no evidence will be produced, because similar legislation applies throughout Denmark; in fact, throughout the EU. However, a quantification of the strengths and weaknesses of an intervention based on within-intervention variations is also a useful contribution.

Theorising about ‘most likely mechanisms’ has an imaginative element in it (Astbury and Leeuw, 2010: 374) and cannot be done on an entirely objective basis. While that is correct, the identification of these mechanisms is based on logical argumentation about the philosophy of the intervention (which also must be articulated) and with existing knowledge provided by the literature. Empirical signs of claimed mechanisms do not necessarily stand in a one-to-one relationship with distinct mechanisms. Contexts host mechanisms even if the mapping of data to concepts is less than perfect (Ford et al., 2018).

The measured outcomes in the study are broadly linked to interventions in the physical and psychosocial domain and to the perceived usefulness of the WPAs, but not to the long-term amelioration of specific problems in the work environment. Actions taken constitute an important dimension in our outcome measure; taking action is an important precondition for the amelioration of problems, even if these problems are determined by each workplace in its own way.

Our study design does not allow us to capture long-term systemic effects, for example, where the experiences gained from one iteration of the WPA process contribute to capacity-building that is useful for the next round (Jagosh et al., 2015).

In the absence of absolute benchmarks, the prevalence of each mechanism, expressed on a scale from 0 to 100, depends on how our survey scales are calibrated. After the inclusion of control factors, correlations between mechanisms and outcomes are robust, which increases our trust in our most likely mechanisms as reasonable approximations of real regularities (even if our exact estimated effect sizes hinge on the specification of our variables and the assumptions of linearity embodied in our regression model).

Our data may suffer from selection biases because only OSH-interested trade unions have participated and only WERs with a particular interest in OSH have decided to participate in our survey. However, this interest can be loaded with both positive and negative views about WPAs and/or the work environment. Furthermore, selection bias may be less of a threat regarding causal links as compared to the determination of the prevalence of phenomena (Cheung et al., 2017).

Our data suffer from limitations related to the views of the WERs and their self-reported data. They may be unaware of certain forms of use of WPAs. They may also exaggerate particularly negative views of problems or positive views about themselves and their role. There is a particular threat from common method bias regarding correlations (Podsakoff et al., 2003), if survey data inflate both mechanisms and outcomes. Subjective views represent mental models and organisational cultures which are sometimes themselves active players in the work with organisational interventions (Nielsen and Randall, 2013: 613). In that light, the threat from subjective data and common method bias is less pronounced.

All things considered and given that existing reviews and evaluations have provided limited evidence about the effectiveness of OHSMSs and elements herein (e.g. WPAs), an overdose of critique may once again result in ‘no evidence’. Instead, we have produced ‘plausible judgments of effectiveness’ (Stern et al., 2012: 14) by explicating six reasonable most likely conducive mechanisms and demonstrating that all of them are correlated with desired outcomes after control for potential confounders. These findings sustain the credibility of the philosophy undergirding the legislation on WPAs.

The use of WPAs, however, is inhibited by the fact that the prevalence of these mechanisms ranges from 53 to 78 on comparable 0 to 100 scales. These figures suggest that there is room for improvement.

Among our control variables, whether a WPA helps document existing problems as well as whether it helps detect hitherto unknown problems are both strongly correlated with the use of WPAs. New knowledge might be better produced under collaborative circumstances, whereas the WPA is used for documentation when the work environment is more politicised. It is up to future studies to show how these causal pathways for soft regulation to work are contingent on contextual circumstances. If the WPA works in different ways depending on the degree of collaboration/conflict, it may support new theorising about how consultants and internal change agents can adapt the WPA process to the individual workplace.

The lack of correlation between WPA use and the acuteness of problems is particularly thought-provoking. Since risk assessment is up to each individual employer, there is no mechanism to ensure that WPAs make the biggest difference in situations where the OSH problems are more acute. The existing legislation about WPAs provides no tool to gauge the scale of problems, neither in absolute nor in comparative terms. This soft, process-oriented form of self-regulation operates in the same way regardless of whether problems are big or small. This may be a built-in weakness in the present legislation on WPAs.

Conclusion

There is currently a lack of evidence about the impact of WPAs (as part of Framework Directive 89/391). As long as it is expected that acceptable evidence can only be produced based on comparing otherwise similar situations with and without the intervention, these expectations are unlikely to be met since the same legislation applies to all workplaces in the EU.

Alternatively, we have proposed a theory-based evaluation that articulates mechanisms in combination with a design that exploits within-intervention variations. This approach was used in a large-N study of WPAs in the public sector in Denmark, making it possible to answer relevant questions about impact in the following way, based on the list of questions presented earlier in Table 1:

Is it possible to identify reasonable mechanisms consistent with the philosophy of the intervention that makes it likely that the intervention works? Yes, local influence on the WPA process, employee engagement, management support, provision of an action plan, incorporation of WPAs into an organisational strategy and the ability of the WPA to provide a reliable picture of the working environment are logical ingredients in a philosophy emphasising organisational self-regulation in the workplace.

To what extent are these mechanisms in place in the target population? Measured on 0 to 100 scales, their prevalence ranges from 53.1 to 78.2. There is room for improvement, in particular regarding the least prevalent (local influence). However, the most prevalent one (awareness of an existing action plan) at 78.2 per cent is also far from perfect. Since action plans are legally required, it would be legitimate to expect a figure much closer to 100 per cent.

Are these mechanisms correlated with outcomes? Yes, these mechanisms are all correlated with WPA usage, independently as well as together in one analysis. Their effect on outcome is robust after the introduction of control variables.

What are the strengths and weaknesses of the intervention? The most likely conducive mechanisms consistent with the philosophy of the intervention were identified. They correlated strongly with the desired outcome, which counts as a strength. However, the use of WPAs is inhibited by the fact that the prevalence of these mechanisms ranges from 53 to 78 on comparable 0‒100 scales. These figures suggest a potential for improvement, for example, with regard to local influence, employee engagement and the ability of the WPA to produce a reliable picture. Awareness of action plans is at 78.2 per cent. The making of action plans and awareness of the existing plans can both be improved.

Another major weakness of the existing form of WPAs is that WPA usage is not significantly correlated with the acuteness of workplace problems. The lack of connection between the scale of problems across workplaces and the follow-up actions taken is a built-in weakness in the present legislation.

Finally, the introduction of control variables revealed that the WPAs can function as documentation of existing problems, but also as identification of hitherto unknown problems. These findings deserve to be theorised further in future research, presumably in the form of different mechanisms under different degrees of cooperation and conflict.

Although our findings can be debated, they remain more informative than many studies claiming that there is no evidence at all. Based on an understanding of the intervention, we have identified most likely conducive mechanisms and shown how well each of them works, thereby suggesting parameters for how well the intervention works in the context at hand. While our empirical results cannot be generalised, the list of mechanisms and the approach can be used to determine how and how well a similar intervention works in other contexts. Further evaluative research can improve the sharpness of the logic focusing on most likely mechanisms of OHS interventions and the methodological tools to capture these mechanisms.

Policy recommendations

If policymakers subscribe to a belief that without a control group not subject to legislation, the impact of legislation cannot be assessed, no convincing evidence will be produced as long as the legislation is universal for all within a given jurisdiction. The consideration of alternatives such as theory-based evaluation with a focus on most likely conducive mechanisms in combination with designs that capture within-intervention variations is therefore recommended.

Admittedly, theory-based evaluation is an interpretive exercise as much as an empirical one (Astbury and Leeuw, 2010). An approach based on most likely conducive mechanisms may therefore be subject to criticism, especially in environments where the intervention is politically contested. Mechanism-oriented approaches suggest an interaction between an intervention and its context, which makes it difficult to place clear responsibility for the effectiveness of the intervention on either of these sides. This may make it difficult to apply this approach in political contexts with strong emphasis on accountability (Pattyn, 2019). Policymakers should consider whether it might be better to have ‘plausible judgments of effectiveness’ (Stern et al., 2012) rather than to generally lament a lack of evidence. Even in the absence of a counterfactual benchmark provided by a control group, it is possible to make meaningful assessments of how and how well the WPAs work. A focus on mechanisms and within-intervention variations may not only help with gauging the impact of policies (given the prevalence of most likely conducive mechanisms and their correlation with outcomes), but also support policy learning and readjustment.

For example, we know from our analysis that making an action plan is correlated with outcomes, but action plan awareness is at 78.2 per cent. In the public debate in Denmark, we have recommended a simple way to ameliorate problems related to both the production of evaluative data about WPAs and the lack of awareness about them (Dahler-Larsen and Sundby, 2019). If WPAs including action plans are published, their existence can easily be checked by managers, employees, inspectors and WERs themselves. It is paradoxical that an intervention which makes the production of evaluative information (WPAs) legally required at the same time allows this evaluative information to be kept in-house in the workplace. Publication would increase the impact of the legislation, increase public insight in the risk assessment process (Schmidt, 2012) and make it much easier to carry out evaluations of WPAs in the future.

In making the argument in favour of publication of WPAs including action plans, it was of course useful to know that 78.2 per cent is not enough to indicate compliance with the law. Even more importantly, we also showed that the mechanism called ‘action plan awareness’ is significantly correlated with use of WPAs. Thus, knowing that action plans contribute to how WPAs work, and knowing how well this mechanism works backed up by quantitative data, is a clear advantage. Media picked up the issue and made an opinion poll showing that a majority of Danes favour publication of WPAs (53% yes vs 23% no) (Kyst, 2019). The debate continues at conferences and in publications. The issue is contested, since many employers do not like the idea, and member states may fear that publication would make a potential lack of compliance with EU legislation more visible (Smismans, 2003: 70). Our example shows that in the absence of a classical impact evaluation with a control group, a theory-based evaluation based on within-programme variations which assesses and discusses mechanisms conducive to the use of WPAs with back-up of quantitative data can in fact be used to create debate about key aspects of the existing policy, at least at the national level.

While an assessment of the effect of the legislation against a potential control group without this legislation continues to be unavailable, fortunately there are other aspects of how and how well the legislation works that are relevant for a variety of stakeholders. We therefore recommend considering theory-based evaluation in combination with analyses of contextual variations to bring together answers to the questions of how interventions work and how well they work.

Supplemental Material

Supplemental material, sj-docx-1-evi-10.1177_1356389020980469 for How and how well do workplace assessments work? Using contextual variations in a theory-based evaluation with a large N by Peter Dahler-Larsen, Anna Sundby and Adiilah Boodhoo in Evaluation

Footnotes

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This study received funding from the Danish Working Environment Research Fund.

Supplemental material

Supplemental material for this article is available online.

Notes

Peter Dahler-Larsen (PhD, DrScientPol) is Professor at the Department of Political Science, University of Copenhagen, where he is Leader of CREME (Center for Research on Evaluation, Measurement and Effect). He is the author of The Evaluation Society (Stanford University Press, 2013).

Anna Sundby holds a PhD in Clinical Medicine from Aarhus University. She is currently involved in the Responsible Research and Innovation Work Package of the European Union (EU)-flagship Human Brain Project.

Adiilah Boodhoo (PhD) is Senior Lecturer in the Section of Organisational Psychology, University of Cape Town (UCT). She is also a MEL advisor at the Institute for Monitoring and Evaluation, UCT.

References

Astbury

Leeuw

(2010) Unpacking Black Boxes: Mechanisms and theory building in evaluation. American Journal of Evaluation 31(3): 363–81.

Cartwright

(2007) Hunting Causes and Using them. Cambridge: Cambridge University Press.

Cheung

ten Klooster

Smit

, et al. (2017) The impact of non-response bias due to sampling in public health studies: A comparison of voluntary versus mandatory recruitment in a Dutch national survey on adolescent health. BMC Public Health 17: 276.

Coldwell

(2019) Reconsidering context: Six underlying features of context to improve learning from evaluation. Evaluation 25(1): 99–117.

Cousins

Whitmore

(2004) Framing participatory evaluation. New Directions for Evaluation 1998(80): 5–23.

Cox

Karanika

Griffiths

, et al. (2007) Evaluating organizational level work stress interventions: Beyond traditional methods. Work & Stress 21(4): 348–62.

Da Silva

SLC

Amaral

(2019) Critical factors of success and barriers to the implementation of occupational health and safety management systems: A systematic review of the literature. Safety Science 117: 123–32.

Dahler-Larsen

Sundby

(2019) Arbejdspladsvurderinger. Odense: Syddansk Universitetsforlag.

Dalkin

Greenhalgh

Jones

, et al. (2015) What’s in a mechanism? Development of a key concept in realist evaluation. Implementation Science 10: 49.

10.

De Souza

(2013) Elaborating the Context-Mechanism-Outcome configuration (CMOc) in realist evaluation: A critical realist perspective. Evaluation 19(2): 141–54.

11.

Dunlop

Radaelli

(2016) Handbook of regulatory impact assessment. Edward Elgar Publishing. Available at: https://OSHa.europa.eu/en/legislation/directives/the-OSH-framework-directive/1 (accessed 18 January 2018).

12.

European Agency for Safety and Health at Work (1989) Directive 89/391/EEC – OSH “Framework Directive”. Available at: https://OSHa.europa.eu/en/legislation/directives/the-OSH-framework-directive/1 (accessed 18 January 2018).

13.

European Commission (2017) Ex-Post Evaluation of the European Union Occupational Safety and Health Directives. Brussels: European Commission.

14.

Fontaine

(2020) The contribution of policy design to realist evaluation. Evaluation 26: 296–314.

15.

Ford

Jones

Wong

, et al. (2018) Access to primary care for socioeconomically disadvantaged older people in rural areas: Exploring realist theory using structural equation modelling in a linked dataset. BMC Medical Research Methodology 18: 57.

16.

Jagosh

Bush

Salsberg

, et al. (2015) A realist evaluation of community-based participatory research: Partnership synergy, trust building and related ripple effects. BMC Public Health 15(1): 725.

17.

Jamal

Fletcher

Shackleton

, et al. (2015) The three stages of building and testing mid-level theories in a realist RCT: A theoretical and methodological case-example. Trials 16(1): 466.

18.

Greene

(2005) Context. In: Mathison

(ed.) Encyclopedia of Evaluation. Thousand Oaks, CA: SAGE, 82–4.

19.

Griffiths

(1999) Organizational interventions: Facing the limits of the natural science paradigm. Scandinavian Journal of Work, Environment & Health 25(6): 589–96.

20.

Harris

Henderson

Wink

(2019) Mobilising Q methodology within a realist evaluation: Lessons from an empirical study. Evaluation 25(4): 430–48.

21. Hasle and Zwetsloot (2011) Editorial: Occupational health and safety management systems: Issues and challenges. Safety Science 49(7): 961–3.

22. Hasle, Limborg and Nielsen (2014) Working environment interventions – Bridging the gap between policy instruments and practice. Safety Science 68: 73–80.

23. Hawkins (2016) Realist evaluation and randomised controlled trials for testing program theory in complex social systems. Evaluation 22(3): 270–85.

24. Helbo, Hohnen and Hasle (2016) Internal audits of psychosocial risks at workplaces with certified OHS management systems. Safety Science 84: 201–9.

25. Hohnen and Hasle (2011) Making work environment auditable: A ‘critical case’ study of certified occupational health and safety management systems in Denmark. Safety Science 49(7): 1022–9.

26. Kroll (2015) Drivers of performance information use: Systematic literature review and directions for future research. Public Performance & Management Review 38(3): 459–86.

27. Kyst (2019) Ny måling: Flertal af danskere bakker op om offentlige APV’er [New poll: A majority of Danes support public WPAs]. A4 Arbejdsmiljø, 20 December. Available at: https://www.a4arbejdsmiljoe.dk/artikel/danskerne-enige-med-professor-apv-er-skal-vaere-offentlige

28. Lascoumes and Le Galès (2007) Introduction: Understanding public policy through its instruments: From the nature of instruments to the sociology of public policy instrumentation. Governance 20(1): 1–21.

29. Läubli Loud and Mayne (2014) Enhancing Evaluation Use: Insights from Internal Evaluation Units. Thousand Oaks, CA: SAGE.

30. Ledermann (2012) Exploring the necessary conditions for evaluation use in program change. American Journal of Evaluation 33(2): 159–78.

31. Marchal, Westhorp, Wong, et al. (2013) Realist RCTs of complex interventions – An oxymoron. Social Science & Medicine 94: 124–8.

32. Nielsen (2000) Organization theories implicit in various approaches to OHS management. In: Frick, Jensen, Quinlan, et al. (eds) Systematic Occupational Health and Safety Management: Perspectives on an International Development. Oxford: Pergamon Press, 99–124.

33. Nielsen and Randall (2013) Opening the black box: Presenting a model for evaluating organizational-level interventions. European Journal of Work and Organizational Psychology 22(5): 601–17.

34. Øvretveit (2011) Understanding the conditions for improvement: Research to discover which context influences affect improvement success. BMJ Quality & Safety 20(Suppl. 1): i18–i23.

35. Pattyn (2019) Towards appropriate impact evaluation methods. The European Journal of Development Research 31: 174–9.

36. Pawson (2013) The Science of Evaluation: A Realist Manifesto. London: SAGE.

37. Pawson and Tilley (1997) Realistic Evaluation. London: SAGE.

38. Pedersen, Nielsen and Kines (2012) Realistic evaluation as a new way to design and evaluate occupational safety interventions. Safety Science 50: 48–54.

39. Podsakoff, MacKenzie, Lee, et al. (2003) Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology 88(5): 879–903.

40. Porter (2015) The uncritical realism of realist evaluation. Evaluation 21(1): 65–82.

41. Radaelli and Meuwese (2010) Hard questions, hard solutions: Proceduralisation through impact assessment in the EU. West European Politics 33(1): 136–53.

42. Ravn (2019) Testing mechanisms in large-N realistic evaluations. Evaluation 25(2): 171–88.

43. Robson, Clarke, Cullen, et al. (2007) The effectiveness of occupational health and safety management system interventions: A systematic review. Safety Science 45(3): 329–53.

44. Schmidt (2012) Democracy and legitimacy in the European Union revisited: Output, input and throughput. Political Studies 61(1): 2–22.

45. Schwandt (2015) Evaluation Foundations Revisited: Cultivating a Life of the Mind for Practice. Stanford: Stanford University Press.

46. Smismans (2003) Towards a new community strategy on health and safety at work? Caught in the institutional web of soft procedures. The International Journal of Comparative Labour Law and Industrial Relations 19(1): 55–84.

47. Stame (2010) What doesn’t work? Three failures, many answers. Evaluation 16(4): 371–87.

48. Stern, Stame, Mayne, et al. (2012) Broadening the range of designs and methods for impact evaluations. Report of a Study Commissioned by the Department for International Development, Working Paper 38. Department for International Development. Available at: https://www.oecd.org/derec/50399683.pdf (accessed 18 January 2019).

49. Valovirta (2002) Evaluation utilization as argumentation. Evaluation 8(1): 60–80.

50. Van Belle, Wong, Westhorp, et al. (2016) Can “realist” randomised controlled trials be genuinely realist? Trials 17(1): 313.

51. Wauters and Beach (2018) Process tracing and congruence analysis to support theory-based impact evaluation. Evaluation 24(3): 284–305.

52. Weiss (1997) Theory-based evaluation: Past, present, and future. New Directions for Evaluation 76: 41–55.
