Abstract
In the 1990s, the European Commission initiated the MEANS programme of evaluation guidance for socio-economic programmes, primarily for those co-financed with the Structural Funds. This initiative developed an intervention logic that has remained in place ever since. The Directorate General for Regional and Urban Policy has recently been taking a fresh look at the logical framework. We have examined it drawing on experiences from three programming periods: from the perspective of an intensive ex post evaluation of the 2000–06 programming period; from the perspective of reporting on the ongoing performance of current programmes; and from the perspective of designing a policy for 2014–20 with a stronger result orientation. The conclusion of this work is that our intervention logic was never entirely clear. We cannot in practice distinguish between a short-term direct effect (result) and a longer-term, indirect effect (impact). We have never actually measured impacts defined like this. With the increasing focus on outcomes in the international literature and the developments concerning the evaluation of impact (defined as the ‘change that can credibly be attributed to an intervention’), we realize that we need to clarify our intervention logic. This article outlines the experiences of the Directorate General for Regional and Urban Policy and its proposals for a re-articulation of the logic of our interventions and the terminology we use in this regard.
Background
When Structural Funds were reformed in 1988 and again in 1993, evaluation gained a prominence it had not had before. With multi-annual programmes, Structural Funds became more of a policy and less of a financial instrument. The role of evaluation was not only to evaluate results and impacts, but also to contribute to the design and implementation of better programmes that would deliver economic and social (and later territorial) cohesion.
In this context, in 1995 the European Commission established the MEANS programme: Means for Evaluating Actions of a Structural Nature. Supported by a group of independent experts, the programme culminated in 1999 in a six-volume set of handbooks on monitoring and evaluation approaches and techniques. MEANS was a valuable resource and contains much guidance that is still relevant. It was succeeded by EVALSED, the online resource on evaluation guidance that is updated regularly (but now needs another update to reflect the thinking presented in this article).
Through MEANS, the logical framework for Structural Funds was put in place and remained essentially unchanged until 2009/10 (see Figure 1).

Figure 1. Logic of intervention: 1995–2010.
The logical framework represented state-of-the-art thinking in 1995, but we have more recently come to question its concepts and their usefulness in designing, implementing, monitoring and evaluating public policies. Has the logical framework worked in practice? Have all elements of the model been clear? This process of reflection was prompted by the experience of evaluating the 2000–06 Structural Funds programmes, of trying to summarize what the current 2007–13 programmes are delivering and of designing proposals for 2014–20. In parallel, the Directorate General for Regional and Urban Policy has engaged in debate with some leading evaluation experts on concepts for evaluation and monitoring of Structural Funds programmes. In addition, we have experimented with counterfactual impact evaluation methods in relation to area-based and enterprise-support measures and explored more rigorous methods in different intervention areas.
At the start of our process of reflection, we were conscious that the traditional logic model was never clear on how the bottom-up chain (inputs – outputs – results) related to the top-down chain (impacts – results – outputs – inputs). The particular challenge was always with ‘impact’. Can we observe an impact? Can it be captured by an indicator? Should all inputs relate in a linear fashion to an impact? How do we take account of other contributing factors? What is the difference between a result and an impact? The various elements of guidance in MEANS use terms in different ways, but in essence the suggestion seems to be that results are short-term direct effects, while impacts are longer-term indirect effects. But what about long-term direct effects? And finally, what are the respective roles of monitoring and evaluation in relation to the different aspects of the logical framework?
Indicators for the 2000–06 programming period
In 1999 and 2000, the Commission put a significant effort into ensuring that indicators were built into programmes in a systematic way in line with the MEANS intervention logic. In 2003 all programmes were subject to a mid-term evaluation and in 2004 a performance reserve was allocated to the programmes that were deemed to be the best performing. In fact, the mid-term evaluation took place too early for many programmes and the process of allocating the performance reserve brought to our attention some of the weaknesses of the indicator systems. Some programmes at this early stage radically over-achieved their targets, highlighting the fact that the selection of indicators and the setting of targets were not reliable in many Member States. Many programmes used the process of the mid-term evaluation to improve their indicator systems.
In 2007, the European Commission launched the ex post evaluation of the 2000–06 Structural and Cohesion Fund programmes and projects. During the five years of this exercise and through the 21 evaluation contracts the Directorate General for Regional Policy designed and managed as part of the process, a number of findings emerged related to the quality and use of intervention logics in the design and implementation of the 2000–06 programming period.
At the time the ex post evaluation of the European Regional Development Fund (ERDF) was concluded – in early 2010 – we still did not have the final reports from the programmes. Monitoring data used in the ex post evaluation came from the Annual Implementation Reports from 2006. In 2011, in order to complete the information available, the Directorate General for Regional Policy launched a final exercise to extract the data from the Final Implementation Reports relating to the achievements of the programmes.
For the mainstream ERDF programmes (Objective 1 and 2), a total of 22,600 indicators were reported across 227 programmes, with an average of 106 indicators per programme, ranging from 25 in Denmark to 192 in Italy. While 25 indicators might be reasonable for relatively large programmes, those in Denmark were small. Over 100 indicators suggests a lot of counting, but also dispersed effort and a lack of concentration. Fifty-one percent of these indicators were classified as outputs; 30 percent were results; and 19 percent were impacts. For the purposes of this article, we take a closer look at the impact indicators, what they were, how they were defined and reported against.
When we look more closely at the 4985 impact indicators, we find that 854 (17%) had no values at all, whether for baseline, target or achievement. That leaves us with 4131 impact indicators with at least one quantified value. Of these:
94 percent had final achievements;
58 percent had targets;
6 percent had baselines;
55 percent had targets and achievements;
5 percent had baselines, targets and achievements; and finally
0.5 percent had baselines but no targets and no achievements.
Interestingly, the percentage of impact indicators with reported achievement values increased from 42 percent in the 2006 Annual Implementation Reports to 78 percent in the Final Reports, representing a serious amount of effort on the part of those responsible for programmes. But it is notable that so few of the impact indicators had baselines. How can we assess impacts if we have no idea of the starting point? While having a target and an achievement is useful, without the baseline we cannot assess the scale of the problem and the extent of the achievements reported.
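The figures above can be cross-checked with a few lines of arithmetic. The sketch below, in Python, uses only the numbers reported in this section; the variable and function names are ours, introduced purely for illustration, and because the published percentages are rounded, the derived counts are approximate.

```python
# Figures reported in the text for the 2000-06 impact indicators.
TOTAL_IMPACT_INDICATORS = 4985
WITHOUT_ANY_VALUES = 854

# Shares (in percent) of the quantified indicators, as listed above.
# The published shares are rounded, so the derived counts are approximate.
SHARES_OF_QUANTIFIED = {
    "final achievements": 94,
    "targets": 58,
    "baselines": 6,
    "targets and achievements": 55,
    "baselines, targets and achievements": 5,
    "baselines only": 0.5,
}


def quantified_indicators() -> int:
    """Impact indicators with at least one reported value."""
    return TOTAL_IMPACT_INDICATORS - WITHOUT_ANY_VALUES


def approximate_count(share_percent: float) -> int:
    """Approximate number of quantified indicators behind a rounded share."""
    return round(quantified_indicators() * share_percent / 100)


def share_of_all_impact_indicators(share_percent: float) -> float:
    """Re-express a share of quantified indicators as a percentage of all impact indicators."""
    return approximate_count(share_percent) / TOTAL_IMPACT_INDICATORS * 100


if __name__ == "__main__":
    print(f"Quantified impact indicators: {quantified_indicators()}")
    print(f"Share without any values: {WITHOUT_ANY_VALUES / TOTAL_IMPACT_INDICATORS:.0%}")
    for label, share in SHARES_OF_QUANTIFIED.items():
        print(f"{label}: about {approximate_count(share)} indicators")
    achieved = share_of_all_impact_indicators(SHARES_OF_QUANTIFIED["final achievements"])
    print(f"Achievements as a share of all impact indicators: {achieved:.0f}%")
```

On these figures, the 94 percent of quantified indicators with final achievements corresponds to roughly 78 percent of all 4985 impact indicators, which is consistent with the share reported for the Final Reports, while only around 250 impact indicators had a baseline at all.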
The concept of baselines was implicitly referred to in the Structural Funds regulation for 2000–06 in the article on ex ante evaluation:
[The ex ante evaluation] shall assess the consistency of the strategy and targets selected . . . and the expected impact of the planned priorities for action, quantifying their specific targets in relation to the starting situation, where they lend themselves thereto.
There was also a section on baselines in the guidance on indicators issued by the Commission in 2000. However, as we see above, baselines featured in very few cases where they were most relevant – in identifying the aspects of a region or a sector on which the programmes aimed to have an impact.
Ten Member States reported baselines, targets and achievements for some of their impact indicators. Interestingly, four of these were Member States that joined the EU in 2004. Poland, for example, had a very clear impact indicator on reducing the number of fatalities on the roads – with a baseline, a target and an achievement. Of course, evaluation would be needed to find out the extent to which the projects co-financed (e.g. the building of motorways) contributed to the reduction.
Where we have targets and achievements, we can assess achievement ratios, although the reliability of such an exercise is undermined as we know that targets were often changed to align with actual performance. When we examine the impact indicators with targets and achievements reported, we find most in Italy, followed by Spain, the UK, Austria and then Germany. Austria is a surprise as it had fewer programmes and fewer resources than the other countries. However, we find that Austria counted as impacts the numbers of projects having a positive or neutral impact on gender, on the environment or on rural areas, which seems of dubious value. Most Member States reported jobs created or maintained, but with no baselines. Some specify that they report net new jobs, but this would be more meaningful with a baseline. Jobs safeguarded is another indicator that is difficult to define. Several Member States (e.g. Greece, Spain) report jobs created during construction, which by definition is not an impact.
Reporting only achievements is still less satisfactory as the achievements are not related either to the need or the objective.
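A small sketch may help to show why the baseline matters when achievement ratios are computed. The Python below is illustrative only: the field names, the two formulas and all the numbers are our assumptions, not values taken from any programme. It compares the plain ratio of achievement to target with the share of the intended change (from baseline to target) that was actually achieved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Indicator:
    """One reported indicator; the field names are illustrative assumptions."""
    name: str
    target: float
    achievement: float
    baseline: Optional[float] = None  # missing for most impact indicators


def achievement_ratio(ind: Indicator) -> float:
    """Achievement divided by target: all that can be computed without a baseline."""
    return ind.achievement / ind.target


def progress_from_baseline(ind: Indicator) -> Optional[float]:
    """Share of the intended change (baseline to target) actually achieved.

    Returns None when no baseline was reported or when the target equals the
    baseline, since the intended change is then undefined.
    """
    if ind.baseline is None or ind.target == ind.baseline:
        return None
    return (ind.achievement - ind.baseline) / (ind.target - ind.baseline)


# Hypothetical values, invented for illustration; not taken from any programme.
fatalities = Indicator("road fatalities per year", target=4000, achievement=4200, baseline=5000)
jobs = Indicator("jobs created", target=1000, achievement=1200)

if __name__ == "__main__":
    for ind in (fatalities, jobs):
        print(ind.name, achievement_ratio(ind), progress_from_baseline(ind))
```

In the hypothetical fatalities case, the plain ratio of 1.05 looks like over-achievement even though the indicator is meant to fall; measured from the baseline, 80 percent of the intended reduction was delivered. For the jobs indicator, which has no baseline, only the plain ratio can be computed, and it says nothing about the scale of the need.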
None of this is to criticize the great effort that has gone into gathering and reporting the data. But we need to ask why we are gathering all this data. Much of it is not meaningful and certainly does not represent the impact of the Structural Funds. Its purpose seems primarily to provide a figure that will unlock the final payment from the Structural Funds, rather than to serve as an exercise in accountability and learning. There has been little or no public debate on these figures in any Member State. There must be scope to rationalize, streamline and focus, but also to clarify conceptually what we should count and what these figures mean.
A new interest in outcomes
While we were still deep within the experience of the ex post evaluation of the 2000–06 period, the process of planning for the new period 2014–20 began. Both the evaluators and those of us within the European Commission were frustrated that success in Cohesion Policy seemed to be defined by many, in Member States as well as within the European Commission, as the absorption of funds. Reflection within the Directorate General for Regional and Urban Policy led to a shared view that there was a need for a decisive shift from a focus on absorption alone to one driven by concerns for performance and results. But this would not be achieved through a continuation of current practice, in which a collection of indicators of varying degrees of relevance is included as the last element in the finalization of programmes. This was reflected in the conclusions of the 5th Cohesion Report (European Commission, 2010), adopted by the Commission in November 2010, which stated that the ‘impact of cohesion policy is difficult to measure’ and that:
The starting point for a results-oriented approach is ex ante setting of clear and measurable targets and outcome indicators. Indicators must be clearly interpretable, statistically validated, truly responsive and directly linked to policy intervention, and promptly collected and publicized.
The draft Regulations for the 2014–20 period were adopted by the Commission in October 2011 and discussions are underway in both the European Council and Parliament with a view to adoption in 2013. The results orientation is incorporated into the draft regulations and accompanied by a draft Guidance Paper developed by the Directorate General for Regional and Urban Policy. This Guidance Paper on Concepts and Recommendations fundamentally reviews the intervention logic of Cohesion Policy. The new intervention logic is defined in Figure 2.

Figure 2. Outputs, results and impact in relation to programming, monitoring and evaluation.
Results orientation for future cohesion policy
The Directorate General for Regional and Urban Policy proposes a results orientation for future Cohesion Policy, as follows:
The specific objectives of future programmes (which are the pre-determined investment priorities defined in a regional or national context) must have a corresponding result indicator and a baseline;
The result indicator is a proxy for the intended change – we recognize that an indicator can never capture everything that happens;
Policy monitoring reports on the evolution of the result indicator and feeds debate on the evolution of the need the programme aims to tackle;
Evolution of the result indicator is a consequence of the policy action and other factors;
Therefore, the result indicator should be close enough to policy so that policy can have a discernible effect on the indicator;
Evaluation is required to disentangle the effects of the policy from those of other factors as well as exploring any unintended effects of the policy.
The implication of this approach is that we have dropped the traditional distinction between result and impact indicators. The notion of ‘results’ is similar to that of ‘outcomes’ often used in the literature. We use the term results since the translation of ‘outcomes’ in most EU languages uses the same word as for ‘results’. While ‘impact indicators’ have been dropped, impact has not: it is now explicitly the contribution of the policy to change in the result indicator. It is not the longer-term effects on the wider population that may (or may not!) be in some way linked to the policy. We have found very few examples of such impacts being credibly linked to the policy. Instead, we are more modest: let us identify a change we seek where we believe our policy can have an effect. And let us evaluate this effect as we implement the policy.
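To make the distinction concrete: the change observed in the result indicator is what monitoring reports, while impact is the part of that change that can be attributed to the policy. The sketch below illustrates this with a simple difference-in-differences calculation, which is one counterfactual method among several and not a method prescribed in the guidance. All names and numbers are hypothetical, and the estimate is only credible if the comparison regions would have followed the same trend as the supported regions in the absence of the policy.

```python
from dataclasses import dataclass


@dataclass
class IndicatorSeries:
    """Value of a result indicator before and after the programme (hypothetical)."""
    before: float
    after: float

    @property
    def change(self) -> float:
        """Observed change in the indicator over the period."""
        return self.after - self.before


def estimated_impact(supported: IndicatorSeries, comparison: IndicatorSeries) -> float:
    """Difference-in-differences: observed change minus the change attributed to other factors.

    The comparison group's change stands in for what would have happened to the
    supported group without the policy (the parallel-trends assumption).
    """
    return supported.change - comparison.change


# Hypothetical figures: share of enterprises introducing an innovation, in percent.
supported_regions = IndicatorSeries(before=20.0, after=29.0)
comparison_regions = IndicatorSeries(before=21.0, after=26.0)

if __name__ == "__main__":
    print("Observed change (monitoring):", supported_regions.change)
    print("Change due to other factors (estimate):", comparison_regions.change)
    print("Estimated impact (evaluation):", estimated_impact(supported_regions, comparison_regions))
```

With these invented figures, monitoring would report a rise of 9 percentage points in the result indicator, of which 4 points would be attributed to the policy and 5 points to other factors, under the stated assumption.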
The fundamental change in approach is that we aim to start the programming process with an identification of the intended result, with a corresponding indicator. Then we consider what scale of policy intervention and what resources should be applied to contribute to change. This is a radically different approach from the one used in the past, which – in practice and in theory – started with the allocation of resources (see Figure 1, which starts at the bottom with the inputs and operations and leads upwards, finally, to impacts).
An essential element of the new approach is transparency on the result indicator and regular monitoring of and debate on its evolution. It is quite possible that the result indicator will not move in the desired direction. It would then be important to reflect on whether the policy action is the correct one but its effects will take time, or whether the ‘other factors’ are too dominant (and should perhaps become the focus of a different policy). It may also be that the indicator selected does not reflect the intended change and should be reviewed. If programmes are designed with this intervention logic, at least we are clear about what the policy makers want to change and what success should be measured against, with regular debate and review of the policy and its effects.
Current practice
We have an idea of how we want to move forward and how we might capture the impact of future Cohesion Policy programmes. Our belief is that the results orientation must be built into programmes from the beginning; it cannot be bolted on at the end. It should also express the objectives of the programme – which seems obvious, but clearly is not when we have indicator systems involving thousands of indicators.
The regulatory background for the 2007–13 period was one where many Member States in the European Council negotiations took a view that monitoring and evaluation issues created administrative burdens that should be simplified. We can interpret this as a frustration with the amount of effort that went into the gathering of data for 2000–06 and with the fact that the data gathered did not seem to be meaningful. Our contention is that part of that frustration arises from a lack of clarity on some of the basic concepts of the logical framework. The requirement was for:
information on the priority axes and their specific targets. Those targets shall be quantified using a limited number of indicators for outputs and results . . . The indicators shall make it possible to measure the progress in relation to the baseline situation and the achievement of the targets of the priority axis.
While the European Commission’s guidance for the selection of indicators maintained the use of the traditional logical framework and the notion of impact indicators referring to longer-term effects for beneficiaries or the wider population, there was a shift away from the use of ‘impact indicators’ with a call for a greater emphasis on result indicators. It was recognized that impact could only be dealt with through evaluation and that other factors would contribute to change in such indicators. Those of us involved in developing the guidance were struggling with the fact that we knew that impact indicators were not delivering much meaningful information – but we did not have sufficient knowledge at that stage to challenge them more radically.
In fact, we now know that the guidance we provided and the early reflections on how to use result indicators were often ignored, in a context in 2006–07 where most Member States (and some colleagues within the Directorate General for Regional Policy) regarded indicators as an unnecessary administrative burden. We have insights from three sources of information.
The first is reporting against some common – mostly output – indicators in Annual Implementation Reports. While not obligatory, Member States agreed in 2008 to report against these indicators so that some aggregate figures could be generated at EU level to communicate the achievements of the policy. What this experience shows us is that, while practice is improving, there are still simple errors in reporting that undermine the reliability of the figures. However, the fact that these data have to be reported annually, and can be simply aggregated and compared, feeds a process where the Commission regularly asks Member States and regions for explanations when data seem strange. Moreover, arriving at these large aggregate figures, impressive as they are in themselves, gives rise to the ‘so what?’ question. What do these outputs lead to? What changes as a result?
More insights come from a pilot exercise where we explored with volunteer regions from Member States what the results orientation we propose for the future would look like in current programmes. The basic questions we asked during these pilots were simple and we remain convinced that if we can answer these questions we will have better quality programmes that are more likely to achieve their intended results. We asked for each priority we examined:
What do you want to change?
What indicator can capture this change?
Do you know the baseline for 2007 or now (data sources)?
Will your output indicators contribute to change in the result indicator? How?
We found the following:
The new approach is feasible but not without a significant change in the practice of those designing programmes.
None of the pilot regions currently use result indicators in the manner proposed by the Commission. The objectives of the priorities examined were expressed in very general terms and in most cases current indicators do not capture the intended effects of the programmes.
The results focus must become part of the development of the programme, which needs a stronger and more explicit intervention logic; this cannot be added afterwards.
The main change required is concentration. But concentration has to be the outcome of a process of deliberation and policy choice. This emphasizes the importance of political debate on the choices that drive programme design.
If there is concentration, this means that there will be fewer indicators. Some pilot regions had very many indicators – but none captured the motivations for policy action.
Whatever result indicator is selected, baselines and targets are essential.
As a final point, it is important to recall that indicators do not tell us everything. The evolution of the result indicator should prompt a debate; it is not the last word on the performance of the policy.
While the pilots did not go so far as to explore the interactions between different priorities of complex integrated programmes, it seems clear that clarity on the focus of individual priorities is essential before one can assess the interactions (hopefully coherence and synergy and not conflict) between them.
A further finding from analysis of Annual Implementation Reports in the current programming period is that they are not a good source of evidence on the performance of programmes. This is hardly surprising since, if the indicators do not reflect the real objectives of programmes, reports against them are unlikely to shed much light on performance.
Impact evaluation, not impact indicators
This article confronts the theory of the logical framework of the Structural Funds with the practice of how it has been used, from 2000 until now. Examining what has been reported as ‘impact’ gives insights into what impact actually might be.
The traditional notion of ‘impact’ as long-term effects, including those that are direct and indirect, intended and unintended, seems not to be one that is very meaningful. It seems to represent aspirations rather than anything that can be traced back with any credibility to particular policies or interventions. Many impact indicators reported in the past by Member States in their Structural Fund programmes do not tell us about the impact of the Funds: some capture the activity of a programme but not the change that results; others are so distant from policy actions (e.g. GDP or productivity growth) that they tell us little about the effects or effectiveness of policy. ‘Impact indicators’ suggest that there are simple indicators that we can quantify that capture impact. We know this is not the case. Therefore, is it not time we dropped this use of the word impact and the concept of impact indicators?
What we have proposed as the results orientation is very similar to what is referred to in some of the literature as ‘outcome monitoring’ or ‘results monitoring’ (Stern et al., 2012; White, 2010). We have used the term ‘policy monitoring’. The important point is that monitoring is to observe; evaluation must go beyond observation to estimate, capture or judge impact.
We have clarified that impact is what can credibly be attributed to an intervention. This means that we abolish ‘impact indicators’ and instead focus on impact evaluation, using different methods – both quantitative and qualitative – counterfactual and theory-based.
In this way we clarify the different roles of monitoring and evaluation. We also emphasize that indicators alone cannot tell the whole story. To suggest that they can is to undermine the extremely important role that evaluation can and should play in policy design and implementation and the feedback loop that should exist between the two. It is time for evaluators to reclaim impact and insist on their role in evaluating the effects of policy.
Acknowledgements
This article builds on a presentation at the European Evaluation Society Conference in Helsinki, 4 October 2012.
