Editorial

Abstract

Performance measurement, a key component of performance management, can itself be seen as a variant of evaluation as it sets out to benchmark and calibrate public management and the outcomes of public-policy decision making. Those who measure the performance of the public-policy system certainly share the rational and instrumental assumptions of some schools of evaluation: that theirs is an objective, technical and apolitical enterprise. Bente Bjørnholt and Flemming Larsen, on the other hand, are interested in exploring ‘often hidden aspects of performance measurement’ by ‘conceptualising the political consequences’ in practice. According to these authors, performance measurement is not ‘politically neutral’. It ‘also affects the political decisions made, what kinds of political decisions are conceivable and how they are implemented’. This, then, is a framework-building article that gains extra traction by drawing on the evaluation utilization literature. It asks the kinds of utilization-related questions that evaluators are familiar with: ‘who are the main users?’, ‘which parts of the evaluation are being used?’ and ‘when’ and ‘how’ are these measurements being used? Through this lens Bjørnholt and Larsen consider both ‘goal setting’ and ‘implementation’ and develop a conceptualization of performance measurement that identifies critiques of the way performance measurement is used and maps out the shape of future research.

James Copestake takes us back to some of the thorny issues in impact evaluation that have often featured in this journal. He sets out to take the voice of stakeholders seriously whilst at the same time avoiding the risks of confirmatory bias. Although the article addresses international development issues of poverty reduction, wellbeing and food security, it speaks to generic methodological and ethical concerns common to many impact evaluations. Copestake seeks to reconcile demands for ‘upwards accountability’ with demands that, in socio-economic programmes, the voice of beneficiaries needs to be heard if evaluative judgements are to be legitimate. He suggests that by adopting an ‘exploratory’ rather than a ‘confirmatory’ approach, ‘explicitly limiting prior theorization on the part of the researcher’, risks of ‘pro-project bias’ can be avoided. The author aims to ‘attribute outcomes to a specific intervention’, an aim which, together with some aspects of the designs and techniques used in his exemplary case, would be commonly regarded as ‘positivist’. However, Copestake suggests that a positivist approach ‘can be nested in broader (including interpretive) approaches’. The author is obviously familiar with a wide range of evaluation designs and methods (hypothetico-deductive, realist, qualitative and quantitative etc.) and draws on many of them but, interestingly, does not speak to any single ‘brand’. It would be encouraging to see whether the occasional articles that nowadays take this stance foreshadow a new generation of impact evaluations that combine evaluation designs in a pragmatic yet theoretically sound way and combine the strengths of different paradigms to draw plausible causal inference.

Steven Højlund considers evaluation use in the European Commission ‘by focusing on the evaluation system understood as the institutionalization of evaluation practice’. (This follows on from an earlier article in Evaluation 20.1 in which Højlund argued for an institutional-theory perspective on evaluation use.) The author suggests that the evaluation system adopted in the European Commission shapes evaluation use. He develops his case by an analysis of the evaluation of the LIFE programme – which has been evaluated over an extended period – in terms of familiar evaluation use categories: instrumental, strategic, legitimizing and process use etc. Højlund points out that the well-elaborated system in place in the Commission was introduced by policy makers in the European Parliament and in EU Member State governments. Nonetheless he concludes ‘that most use takes place at the level of programme management’, not at policy level. Attentive readers will see an interesting consistency between this article and that of Bjørnholt and Larsen, who suggest that performance measurement is associated with ‘de-politicization’. Højlund also suggests that one consequence of the way evaluation functions in the European Commission is de-politicization, unsurprising perhaps given the strong performance management/measurement character of the Commission’s evaluation system. This de-politicization, according to this author, goes some way to explaining why evaluations are ‘only rarely used directly for policy making’. Whether this needs to be so is unclear: the author explains this de-politicization partly in terms of the misalignment of the timing of evaluation and policy-cycles, surely not an insurmountable barrier! Taken together with his earlier article, Højlund’s contribution makes a convincing case for reading across from research into ‘evaluation systems’, through the lens of institutional theory, to the ongoing puzzle of evaluation (non) use.

Jan Van Ongevalle, Huib Huyse and Peter Van Petegem report on a theoretically and methodologically rich action research project with development NGOs based in Belgium and Holland working with partners in developing countries. As with other articles in this issue, the authors are concerned with the world of ‘results’ and ‘performance’. However, the focus here is crucially on the demands of complexity. The taken-for-granted linear logic that informs most results and performance approaches does not match the complex realities that these NGOs face. The subject of this action research, then, was discovering ‘performance monitoring and evaluation’ (PME) approaches that can deal with ‘complex processes of social change’. An analytic and normative framework derived from complexity thinking was designed around four topics: ‘dealing with multi-stakeholder situations; learning from unexpected and intangible results; adaptive capacity; and accountability’. The action research process ‘allowed the organizations who participated in the action research to try out specific “actor-focussed” PME approaches, and adjust how they were implemented according to lessons learned along the way’. This testing involved both a repertoire of technical solutions – ‘outcome mapping (OM), most significant change (MSC), client satisfaction tools (CSI), Sensemaker and “personal goal” exercises’ – and an elaborate programme of reflection and sharing what was being learned. Although approaching matters from a somewhat different epistemological position, this article connects with others in this issue. For example, like Copestake, Van Ongevalle, Huyse and Van Petegem want to ensure that the methods they use do not ‘predetermine’ the responses of participants. It is also encouraging to see evaluators following an action research tradition systematically exploring a broad set of innovative methods such as outcome mapping, Sensemaker and Most Significant Change. The project was undoubtedly strengthened by its openness to this variety of different technical ‘brands’.

We like to encourage debate among our readers and we have the beginnings of one in this issue from Kim Forss and Claus Rebien! They are responding to Steffen Bohni Nielsen and Ditte Marie Winther, who in the last issue of Evaluation raised the question of whether there was such a thing as a Nordic evaluation tradition. Based on their analysis of evaluation journal articles by authors from Norway, Sweden, Denmark and Finland, and reinforced by comparison with evaluation publications from the Netherlands, Nielsen and Winther concluded that they could find no evidence for such a tradition. They did advocate more research on the subject, however, and Forss and Rebien rise to this challenge. Provocatively, Forss and Rebien ask whether evaluation is a ‘context-free pursuit, to be implemented similarly in Canada, the USA, the UK, France, Italy or Finland? Or are there reasons to believe that there are systematic differences that have to do with political-social-economic-cultural contexts?’ Nor do they confine their questions to Scandinavian countries. Forss and Rebien are interested in ‘implications for the general understanding of how evaluation is institutionalized in different countries and regions’. Whilst sketching out a more extended investigation into the subject, they also challenge evaluators from other regions and countries to undertake similar research. At this moment, when a consensus about the inevitability of globalization is also being questioned, it is timely for us in the evaluation community to consider ourselves in that context. More contributions to this debate would be welcomed.

The boundaries between randomized controlled trials and theory-based evaluation designs have until quite recently appeared relatively impermeable. We have of course had a few instances of boundary breaches (see, for example, Blamey, MacMillan, Fitzsimons, Shaw and Mutrie in Issue 19.1, 2013 of this journal) but integrated designs remain rare. That is why the article by Susan Nayiga, Deborah DiLiberto, Lilian Taaka, Christine Nabirye, Ane Haaland, Sarah Staedke and Clare Chandler is noteworthy. The authors report on one aspect of PRIME, a major initiative to improve healthcare in rural Uganda. The object of evaluation was the introduction of ‘patient centred services’ and their effects on health-worker communication. The theory-led element of this evaluation, focusing on the effects of patient-centred training for health workers, was tested as part of a trial of a broader multi-component programme; it was ‘nested’ in a ‘wider cluster randomized trial’. Results of the trial were positive: they showed ‘patient-centred communication was rated 10 percent higher (p < 0.008) by care seekers consulting with health workers who had recently participated in the PRIME intervention.’ The authors recognize the limitations of their approach. It is ‘a compromise between outcomes-based and process-based perspectives’ which led to an essentially positivist position on ‘theory’, making it ‘hard to connect with and allow emergence of perspectives that recognize values such as “patient centredness” as socially constructed and therefore differing in meaning for different actors’. Nayiga and colleagues also acknowledge that the evaluation needs to move beyond ‘relatively small differences observable through measurable phenomena . . .’. This is, after all, only testing ‘one hypothesized mechanism of effect in the intervention’s intended pathway of change’. Despite these reservations and limitations within the scope of their overall design, the evaluation team has worked hard to take account of the many criticisms of designs that rely only on experiments or only on interpretive case-studies. The acknowledgement that these elaborate and costly evaluations can often end up measuring small differences within one mechanism, in a far more complex policy setting, is also refreshing.

In any era some evaluation approaches are favoured, promoted and legitimized and others marginalized. How this happens and through what mechanisms is fascinating. What we can say is that such choices are rarely made for methodological reasons alone. David Rutkowski and Jason Sparks consider this process with regard to the emergence of impact evaluation as a favoured mode of evaluation in international development. Rutkowski and Sparks see impact evaluation as part of a results-based logic, as central to notions of governance in the international development sphere – a case often made at a national level in relation to education and healthcare governance. (This article does not address a similar process of impact evaluation prioritization in national/domestic evaluations – a holistic analysis and theorization of this process would be most welcome.) The authors analyse key policy documents of three important international networks: the High Level Forum on Development Effectiveness (a network of ministers and multilateral development actors that meets around key conferences and decisions); the OECD’s Development Assistance Committee and its working groups; and the ‘Network of Networks for Impact Evaluation’ (NONIE). These networks variously advocate results-based management; new forms of accountability for development; capacity building to support results management and measurement; and the superiority of impact evaluations conducted along experimental lines. In analysing these developments, the authors draw on theories of ‘complex multilateralism’ and global governance. They also take us back to the ideas of Bjørnholt and Larsen. ‘Governance by evaluation (in this context) institutionalizes “what matters” by developing/instituting a catholic evaluation system that defines how success should be measured and reported in local practice.’ This, combined with the emergence of multilateral networks, is not a million miles away, I would suggest, from the ‘de-politicization’ of policy making that Bjørnholt and Larsen highlight in their article, and which is a recurring theme in this issue.

Elliot Stern