Abstract
In the UK, Payment by Results (PbR) is a central element in public service reform. This policy intention is being translated into practice through programmes of varying designs and scale, operating in multiple areas of policy. There are few completed evaluations of these programmes – although results are starting to emerge – and empirical evidence is lacking. This, combined with the strength of feeling, political imperative and conceptual confusion surrounding PbR, has meant that debate has often been heated and focused on whether or not ‘PbR works’. Within this context, this article examines the role of evaluation in approaching PbR. It argues that evaluation’s focus should be on explanation (looking within programmes; surfacing and testing their theories) and refinement (using findings to improve programmes; using theories to look across programmes) to advance the debate on PbR from ‘for or against?’ to ‘when, how, for whom and under what circumstances?’ The article advocates a Realist approach to this undertaking, setting out an illustrative framework of PbR’s mechanisms and hypothesized outcomes for exploration and development.
Introduction to Payment by Results
In the UK, Payment by Results (PbR) forms a central element of the Coalition Government’s approach to public service reform.
The 2011 Open Public Services White Paper mentions ‘payment by results’ 15 times and advances a series of arguments for its use. For example: . . . it is not enough to pay someone to provide a service with the only recourse being that if they fail they will not be re-awarded the contract. In these cases it makes sense to build in an element of payment by results to provide a constant and tough financial incentive for providers to deliver good services throughout the term of the contract. This approach will encourage providers to work more closely with citizens and communities to build services that are both more efficient and qualitatively different, orientated around individuals and communities. (Cabinet Office, 2011)
Within the White Paper, PbR is described as part of a series of measures to ‘open up the market’ for the provision of public services. Such measures include: separating commissioning from supplying; involving private and voluntary sector providers; creating competition between providers; aligning financial incentives to the achievement of outcomes; and offering providers greater freedom to innovate. For example: Open commissioning and payment by results are critical to open public services. This is not just about opening up services to competition; it is also about empowering all potential providers, from whichever sector, with the right to propose new ways to deliver services, and linking payment to results so that providers are free to innovate and eliminate waste. (Cabinet Office, 2011)
Within this context, PbR is promoted as a means of making better use of (increasingly scarce) public resources while also addressing perceived weakness in previous models of provision. These weaknesses – as articulated by proponents of PbR – include an insufficient focus on outcomes and a lack of incentive to achieve them. These flaws are seen as arising partly from contractual arrangements whereby services were funded and performance managed using input and output measures, rather than an assessment of outcomes. In this respect, PbR is being used in an attempt to focus attention – using financial incentives – further down the logic chain, as illustrated in Figure 1.

Figure 1. PbR is being promoted as a means of focussing attention on outcomes.
Accepting variety in its use (described below), the essence of PbR’s theory of change is simple: it aims to alter providers’ incentives; and thereby their behaviour; and thereby improve resulting outcomes. This is illustrated in Figure 2.

Figure 2. The essence of PbR’s theory of change: PbR aims to alter providers’ incentives.
PbR is not new. It has been used in labour market interventions going back to the New Deal programmes of the 2000s; moreover, and accepting that payments were largely based on outputs, the NHS has used PbR since the mid-2000s (University of Manchester, 2012). But PbR’s use in public service reform has been expanded greatly under the Coalition Government, marking a quantitative shift in PbR’s application. PbR is being used to deliver services across a broad spectrum of policy areas (Cabinet Office, 2014). Examples include:
Services targeting families facing multiple disadvantages;
Prisoner rehabilitation;
Labour market programmes;
Homelessness;
Public health services, including smoking cessation and physical activity;
Children’s Centre services;
Interventions to support young people not in education, employment or training;
Supported housing services;
Programmes to reduce the numbers of children being taken into care;
Locating illegal immigrants; and,
Drug and alcohol services.
PbR has also been used in foreign aid programmes and the Department for International Development (DFID) has developed a strategy to further its use (DFID, 2014). While this article focuses very largely upon PbR’s current use in the UK (and in particular in England), the issues discussed are inherent to PbR and so should not be particular to this time and place (Clist and Verschoor, 2014); moreover, the international literature cited below suggests that similar claims and criticisms are made.
In some of the above cases, the ‘standard’ PbR model (whereby a public sector commissioner pays a provider directly for the results achieved) has been altered and extended to include the use of social finance – notably through the use of Social Impact Bonds (SIBs). Under a SIB, social investors fund service provider(s) and are in turn paid by commissioners for the results achieved. In part, this model has been used as a response to the problem of smaller, predominantly voluntary sector, providers not being able to finance the upfront costs of involvement in PbR (ICF International, 2014).
Examination of the schemes listed above reveals that the term ‘PbR’ describes a wide variety of arrangements: there is no single approach. Moreover, models of PbR are evolving as lessons from practice – and evaluation – emerge. The main differences in current models relate to: the ‘level’ at which incentive payments are set; the proportion of the contract accounted for by PbR; how far payments are based solely on outcomes achieved, or whether some blend of output/outcome payments is used; and the level at which results are measured. These elements are illustrated in Figure 3.

Figure 3. Different models of PbR.
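The dimensions along which PbR models differ can be made concrete with a small sketch. The Python below is illustrative only: the class, its field names and the figures used are assumptions introduced here, not terms taken from any actual PbR contract. It shows how the proportion of a contract placed under PbR, and the blend of output and outcome payments, together determine what a provider is paid.

```python
from dataclasses import dataclass


@dataclass
class PbRContract:
    """Hypothetical contract terms; all field names are illustrative."""
    contract_value: float   # maximum total payable to the provider
    pbr_share: float        # proportion of the contract paid on results (0 to 1)
    outcome_weight: float   # share of the results payment tied to outcomes;
                            # the remainder is tied to outputs

    def payment(self, output_rate: float, outcome_rate: float) -> float:
        """Return the payment due, given achievement rates between 0 and 1."""
        for rate in (output_rate, outcome_rate):
            if not 0.0 <= rate <= 1.0:
                raise ValueError("achievement rates must lie between 0 and 1")
        fixed_fee = self.contract_value * (1.0 - self.pbr_share)
        at_risk = self.contract_value * self.pbr_share
        achieved = (self.outcome_weight * outcome_rate
                    + (1.0 - self.outcome_weight) * output_rate)
        return fixed_fee + at_risk * achieved


# Two contrasting (invented) models applied to the same performance.
blended = PbRContract(contract_value=1000.0, pbr_share=0.25, outcome_weight=0.5)
outcomes_only = PbRContract(contract_value=1000.0, pbr_share=1.0, outcome_weight=1.0)
blended_payment = blended.payment(output_rate=1.0, outcome_rate=0.5)
outcomes_only_payment = outcomes_only.payment(output_rate=1.0, outcome_rate=0.5)
```

Under these invented terms the same delivery record yields very different payments, which is one way of seeing why the incentives created by schemes that share the label ‘PbR’ can diverge so widely.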
The range of models outlined above also places limits on the utility of the label ‘PbR’, since the term could, for example, apply to:
a programme whereby central government withholds a grant to local government for failure to achieve a specified improvement in outcomes for the local population; or,
an arrangement whereby individual staff are paid a small bonus for outcomes achieved by ‘their clients’; or,
a contract whereby multiple providers are paid solely on the basis of outcomes achieved by a specified cohort of service users.
In each case, and leaving substantive matters of purpose and content aside, the incentives facing the systems, institutions and individuals involved would be very different. Yet each could be described as ‘PbR’. Moreover, and further compounding this definitional complexity, PbR also goes by different terms. Depending upon the policy/geographic area involved, PbR is also referred to as (inter alia): ‘Results Based Financing’, ‘Outcome Based Commissioning’, ‘Cash on Delivery Aid’, ‘Results Based Aid’, ‘Payment by Outcomes’ and ‘Pay for Success’.
Yet, notwithstanding variations in the particular models or terminology used, the fundamental theory remains the same: change incentives to change services; change services to change outcomes. Or, more simply: incentives matter.
The role of evidence and values in debates about PbR
Differences in values and politics have done much to animate debate on PbR, while evidence has hitherto done little to narrow or inform it. Indeed, recent reports have concluded that there is a paucity of evidence to inform decisions as to whether, when and in what form PbR may be appropriate. For example, in 2012 the Audit Commission reported that: Our review of UK and international research evidence found few rigorous evaluations of PbR and no complete, systematic analysis of its effectiveness. (Audit Commission, 2012)
While in a finely argued paper for DFID on PbR and evaluation, Burt Perrin stated that: Perhaps the most optimistic conclusion that can be drawn from available evidence is that contracting out may increase access and use of health services in the short term rather than broader health outcomes. Unintended effects are quite possible, and there is limited evidence to date that PBR approaches offer value-added compared to other modalities. (Perrin, 2013)
A recent Cochrane review on the use of PbR in attempting to improve the delivery of health interventions in low- and middle-income countries concluded that: The current evidence base is too weak to draw general conclusions; more robust and also comprehensive studies are needed. Performance based funding is not a uniform intervention, but rather a range of approaches. Its effects depend on the interaction of several variables, including the design of the intervention (e.g. who receives payments, the magnitude of the incentives, the targets and how they are measured), the amount of additional funding, other ancillary components such as technical support, and contextual factors, including the organisational context in which it is implemented. (Witter et al., 2012)
And finally, in a rapid review of the evidence on PbR for a scoping study, the evaluation team for the Drug and Alcohol Pilots note that: Largely due to the fact that PbR schemes are a relatively recent development, there is a paucity of evaluations of such programmes and related social investment vehicles [gives references]. The few evaluations which have been conducted generally demonstrate the difficulties inherent in attempting to attribute effects to PbR implementation and offer a mixed picture regarding the advantages and disadvantages of PbR. (University of Manchester, 2012)
This lack of evidence ought to be enough to caution against blanket judgements – either in favour of PbR, or against it. Yet debate on PbR has often lacked caution and engagement with nuance: PbR has both ardent advocates and, albeit to a somewhat lesser (or, at least, less visible) extent, steadfast opponents.
At the political level, Chris Grayling MP has been a strong advocate for PbR. He was Minister for Employment as the Work Programme (a large, PbR based scheme to address unemployment) was established and now, as Secretary of State for Justice, is leading the use of PbR in rehabilitation services. Grayling is clear about the political drivers behind the use of PbR: Sometimes those in government just have to believe in something and do it, but the last Government set out a pilot timetable under which it would have taken about eight years to get from the beginning of the process to the point of evaluation and then beyond. Sometimes we just have to believe something is right and do it, and I assure Members that if they went to Peterborough [prison, site of a SIB on reoffending] to see what is being done there, they would think it was the right thing to do. (cited in Garton Grimwood et al., 2013)
While KPMG argues that: Payment by results should be implemented across the public sector without exception . . . Where payment by results exists it should be made enhanced and where it does not exist it should be hurried into existence, even if it is crude to start with. It is possible to describe a maturity path for PBR and to use this as a framework for moving forwards. (KPMG, 2010)
On the opposing side of the debate, Toby Lowe, Visiting Fellow at Newcastle University Business School, citing the potentially distorting effects of incentive payments, argues that: [PbR] does not reward organisations for supporting people to achieve what they need; it rewards organisations for producing data about targets; it rewards organisations for the fictions their staff are able to invent about what they have achieved . . . There have been numerous studies that show that such systems [as PbR] distort organisational priorities and make organisations focus on doing the wrong things – and they make people lie. (Lowe, 2013)
And David Boyle, fellow at the New Economics Foundation, sees PbR not as a means of freeing providers to innovate, but as an accentuation of centralization and management by target: . . . yes to committing to broad outcomes. Yes to services which can go beyond a narrow set of defined achievements – rebuilding the surrounding community, for example. But let’s not pretend these things can be measured objectively. Does Whitehall understand this? I’m afraid not, and that means targets are on course for a return – but with extra bite and a great deal more bureaucracy and regulation. (Boyle, 2010)
The voluntary sector has also been a source of critique – albeit while not typically taking a wholesale ‘anti-PbR’ position (this care seemingly resulting from the sector’s substantial role in service delivery – including within PbR programmes). Criticisms here have tended to emphasize features of PbR agreements that favour larger, private sector, organizations. Features noted have therefore included: financing of PbR agreements and consequent implications for cash-flow (not getting paid until results are achieved); the scale of contracts let under PbR (often large and thereby meaning that smaller organizations are unable to lead bids); transactions costs associated with PbR (e.g. legal costs); and the extent to which PbR may dissuade service providers from working with vulnerable groups.
The conclusion of many voluntary sector critiques has therefore been that there is a need for caution in the use of PbR – and for more evidence on its effects. For example, the National Council for Voluntary Organisations’ (NCVO) recent paper on PbR states that: PbR is not always an appropriate mechanism; whether or not it will be an effective method for creating efficiencies, bringing about service transformation and improving outcomes is dependent on both the service it is being applied to and the market of providers. In some cases the costs to both the commissioner and provider of implementing PbR may far exceed the potential savings. Commissioners should ensure that they have a thorough and evidenced case for using PbR, and that their purposes and intentions are clearly explained to the market. (NCVO, 2014)
As can be seen, the debate on PbR has, to date, been value-rich and evidence-light. Moreover, and as a corollary of the definitional complexity outlined above, it is also doubtful whether evidence could support either a wholly pro- or anti- PbR position.
Implications for evaluators and evaluation
So more evidence is needed and evaluation is left with a vital task. The argument advanced here is that this is a task of clarification and refinement, rather than proving or refuting. This is to say that evaluation’s role should not be in pronouncing for or against PbR (if such a pronouncement is even possible). Instead, the focus should be on more contingent questions, such as: what is meant by PbR? How does it operate? What effects – intended and unintended, desirable and perverse – does it lead to? By what means/mechanisms are these effects produced? Under what conditions do these mechanisms lead to the desired outcomes? And, given the policy intention (and programme operation), under what circumstances might PbR be appropriate (relative to alternatives)?
A similar conclusion is reached by Burt Perrin in the paper referred to previously: The most important question for evaluation to address should not be ‘does it work?’, but to identify the mechanisms and sets of circumstances under which PBR approaches can most likely result in behavioural change leading to changes in outcomes, recognising that this is very much a question of impact. (Perrin, 2013)
Many readers will have noted that the agenda being developed here suggests a particular role for theory-led approaches. But before following this line of thought, the paper digresses slightly to consider the role that evaluation (or, at least, evaluators) may have to play in the design of PbR programmes. This follows from the fact that designing PbR agreements requires significant analytical effort. As the Audit Commission notes: The ability to measure and evaluate outcomes and the overall success of the project is the essential ingredient for a successful PbR scheme. PbR can only succeed with outcomes that can be accurately measured to inform payments, so success measurement and evaluation cannot be left until the scheme is up and running . . . Developing the data, and the payment model linked to it, can involve considerable analytical resource and should be a factor in considering whether and how to set up a scheme. (Audit Commission, 2012)
And the National Society for the Prevention of Cruelty to Children (NSPCC), in their analysis of PbR, extend this point – touching on long-standing methodological debates: The greatest challenge for PbR commissioners is finding an accurate and fair way to define and measure social outcomes. There is currently no consensus on which methods are best – a variety of methods are currently being piloted, including the use of cohorts, control groups and comparator areas. (NSPCC, 2011)
PbR’s requirement for quantifiable results has therefore led to evaluators playing an advisory role at the design stage of programmes. This seems to have been especially the case with the use of SIBs, where investment in analytical effort has been apparent at the proof of concept/design stage – with specialist suppliers emerging and evaluation firms entering this market (e.g. Social Finance’s ‘Directory of SIB Service Providers’). This support seems to have concentrated on two main elements of programme design:
First, knowledge of evaluation is important in designing systems for measuring and attributing outcomes. It is something of a high-point example, but the Peterborough SIB is assessing outcomes for those supported under the scheme, net of a counterfactual estimated using a comparison group arrived at using propensity score matching. The method used was determined by Ministry of Justice and Social Finance analysts, before being independently reviewed by QinetiQ and the University of Leicester – the investment in analysis being a clear function of payments depending upon results. The implication for evaluation is that PbR creates opportunities to support the design of specific schemes, since there is a requirement for analytical rigour at this point in the programme cycle; and
Second, the ability to appraise and synthesize evidence on the likely effectiveness of interventions is fundamental to the design process. As the Audit Commission notes: ‘The strength of the evidence for the relationship between certain actions and outcomes is crucial in selecting areas for PbR and designing schemes’ (Audit Commission, 2012). Commissioners therefore need sufficient confidence that funding ‘intervention x’ will lead to ‘result y’; this is especially the case when SIBs are used and such evidence is required to inform investors’ decisions. The Feasibility Study for the London Homelessness SIB provides an illustration of this type of analysis (ICF, 2014). Again, the implication for evaluation is that there is an opportunity to bring evidence to the design process: showing previous interventions and their effects; aiding the translation of this evidence to the case and context under consideration.
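The idea of measuring outcomes ‘net of a counterfactual’ using a matched comparison group can be sketched briefly. The Python below is a deliberately simplified illustration and should not be read as the method used in Peterborough, which is not detailed here. It assumes that propensity scores have already been estimated for every individual, uses one-to-one nearest-neighbour matching without replacement, and applies a hypothetical caliper of 0.05; the function name and all data are invented.

```python
from statistics import mean


def matched_effect(treated, comparison, caliper=0.05):
    """Estimate the difference in outcome rates after 1:1 nearest-neighbour matching.

    Each record is a (propensity_score, outcome) pair, with outcome 1 if the
    event of interest (e.g. reconviction) occurred and 0 otherwise. Matching
    is without replacement; treated records with no comparison record within
    the caliper are dropped from the estimate.
    """
    available = list(comparison)
    pairs = []
    for score, outcome in treated:
        if not available:
            break
        # Closest remaining comparison record by propensity score.
        best = min(available, key=lambda rec: abs(rec[0] - score))
        if abs(best[0] - score) <= caliper:
            pairs.append((outcome, best[1]))
            available.remove(best)
    if not pairs:
        raise ValueError("no matches found within the caliper")
    treated_rate = mean(t for t, _ in pairs)
    comparison_rate = mean(c for _, c in pairs)
    return treated_rate - comparison_rate, len(pairs)


# Invented records: (propensity score, 1 = reconvicted, 0 = not reconvicted).
supported = [(0.30, 0), (0.50, 1), (0.70, 0), (0.90, 1)]
not_supported = [(0.31, 1), (0.52, 1), (0.69, 0), (0.10, 0), (0.45, 1)]
effect, n_matched = matched_effect(supported, not_supported)
```

A negative `effect` indicates a lower event rate among those supported than among their matched comparators. In a real scheme the choice of matching variables, caliper and outcome definition would each need justifying, since payments depend on the result.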
Returning to the main subject, the remainder of the paper sets out a series of implications for evaluation in assessing the operation of programmes based on PbR. In summary, the argument advanced is that:
because under PbR the job of specifying, quantifying and attributing outcomes typically becomes part of the programme management function, the focus of evaluation ought to be on tracking progress (helping programme teams take corrective action) and explaining results;
while on the account of many policy makers, programmes based on PbR can operate on a ‘black box’ basis (payments go in, results come out; commissioners remain agnostic on means), this should not be the focus of evaluation. There is a need to enter the detail of delivery and generate learning for the area of service under consideration; and,
accepting that this is not entirely within the gift of evaluation teams, evaluation should look outside the particular programme under consideration in order to contribute to the more general task of refining the use of PbR, using evidence to suggest appropriate applications.
It is still comparatively early in the life of the PbR programmes referred to in the introduction to this article. Nonetheless, evaluations are beginning to report and early findings are available. It is therefore possible to present an initial assessment of the extent to which they address the points summarized above.
A brief review of publicly available reports from evaluations of some of the main PbR programmes (summarized in Table 1) suggests that there is indeed a strong focus on formative feedback and tracing the process from implementation to results. Evaluators also appear to have gained access to PbR’s black box: the concern that providers (and especially commercial providers) would not share details of ‘their’ approach (thereby reducing scope for learning) does not seem to have been realized. Each of the studies offers recommendations for the programme being evaluated; most then elaborate a series of recommendations ‘for PbR’. Yet, most likely because of the given focus of the studies, recommendations are largely tied to the model of PbR used within the programme.
Table 1. A partial analysis of PbR evaluations suggests a significant focus on learning.
These reports therefore contain emerging lessons that may be useful to the task of using evidence to suggest when PbR may/may not be an appropriate approach. For example:
the evaluation of the Children’s Centres pilots documents the failure of PbR in this context. The primary reasons given for this – insufficient time to set up pilots, high transactions costs, overly complex arrangements for assessing performance, outcome measures having a weak relationship with services provided, and insufficient incentive payments for Children’s Centres to change their practice – offer a clear set of findings as to the likely contextual conditions needed for PbR to work effectively;
the Peterborough SIB evaluation highlighted several factors associated with the successful running of the SIB. These included: upfront payments enabling specialist voluntary sector agencies to provide services (alongside associated use of volunteers); selecting outcome measures to explicitly design out problems inherent in PbR (e.g. cream skimming); taking time and designing the SIB with local agencies; and the freedom given to providers to tailor and integrate their services to address beneficiary needs; and
many of the same factors were also highlighted in the London Homelessness SIB evaluation. Points noted here included: the importance of a strong and inclusive design phase, including the use of modelling, evidence reviews and stakeholder engagement; the relatively high administrative costs for commissioners; uncertainties over the balance of risk and return for investors; and the need for high quality performance data.
The analysis presented here is very limited. Reports included cover just a small proportion of total PbR activity; the review of these reports is also partial and non-systematic. Nonetheless, and notwithstanding the caveats implied by these limitations, the value of looking across PbR evaluations to pick out lessons can be seen. A more thoroughgoing analysis, guided by a clear analytical framework, should therefore add significant value to the task of refining the use of PbR – providing an evidence base to suggest how, when, for whom and under what circumstances PbR is most likely to be appropriate.
How could these efforts be advanced? It seems that the main area for development is for evaluation to take a more comprehensive view across programmes to answer questions about the circumstances under which PbR is most likely to be effective. With this task in mind, and considering the theories underpinning PbR (elaborated below), this article now concludes with a recommendation in favour of Realist approaches. This is because:
The animating question of the Realist – ‘what works, for whom, in what respects, to what extent, in what contexts, and how?’ – is suited to the project of policy development: helping to define cases when PbR may be appropriate (or otherwise);
In tracing the extent to which PbR’s theories hold in practice, Realism’s focus on mechanisms ought to be useful. It may be extending this point too far, but the examination of mechanisms may help to refine the thinking behind PbR – e.g. is it incentive payments that change providers’ behaviour, or is it the concentration on outcomes within performance management frameworks?; and
In taking programme theory (rather than programmes) as the unit of analysis, Realism provides a clear framework for aggregating findings across individual evaluations. As Pawson argues: ‘Evaluation science needs to be more venturesome in widening the focus of inquiries from that of ‘the programme’ and should begin to consider ‘policy ideas and their history’ as its subject matter’ (Pawson, 2013). Furthermore, Realist review (Pawson, 2006) offers a theory-led framework for drawing on a wide body of evidence; this would provide a conceptual basis for incorporating other literature relevant to the theories of PbR (e.g. on the use of performance-related pay/distorting effects of incentive payments), allowing something to be said even where PbR-specific evidence is not available.
Table 2 sets out a starting point for the type of cross-PbR Realist analysis proposed here. It draws primarily upon the views of PbR’s proponents, attempting to summarize their arguments as a set of mechanisms and hypothesized outcomes. Diversity within the views of these proponents means that several mechanisms listed here are contradictory and many are overlapping; moreover, not all mechanisms would apply in all cases. The table also sets out a series of illustrative questions implied by the attempt to ‘follow the mechanism’ into practice and gather empirical evidence.
Table 2. Initial framework of PbR theory: mechanisms, hypothesized outcomes and implied questions for evaluation.
Note: This question applies for each mechanism, but is not replicated each time to avoid repetition.
The aim here is modest. Table 2 is no more than an initial framework, which would require refining (and refining) in the light of the evidence gathered. Nonetheless, it ought to serve as an illustration of the type of analysis that would aid evaluation in moving from the particular to the general – and in helping to advance the debate on PbR from ‘for or against?’ to ‘when, how, for whom and under what circumstances?’
The framework set out in Table 2 also illustrates how a Realist approach provides a means of drawing in evidence from sources not directly related to PbR (per se), but that offer evidence on its component theory. For example, under the mechanism ‘attention on outcomes’ it may be that individual staff members are provided with financial incentives to achieve outcomes for service users. Here the Realist would want to know about the effects of similar approaches in other areas. This investigation could then draw upon, for example, literature on the displacement of intrinsic motivation (the desire to do good) by extrinsic motivation (the desire to achieve financial reward) (e.g. Sandel, 2012); or literature on gaming in the face of performance management (e.g. Goodhart’s Law).
The final point arising from Table 2 is that PbR should be considered relative to alternative commissioning models. To the extent that better outcomes are achieved, these should be considered net of the next best alternative. The same should be true of costs. This is no simple undertaking; in the first instance, as the Cochrane review cited above notes: When paying for performance schemes are compared to no intervention, it may be impossible to disentangle the impact of paying for performance per se from the impact of increased resources and other ancillary components. (Witter et al., 2012)
Yet a better comparison is not ‘no intervention’, but ‘intervention commissioned by some other means’. This is an additional and important question for evaluation: under what circumstances is PbR likely to be cost-beneficial relative to alternative approaches to providing public services?
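The comparison being proposed can be expressed as simple arithmetic. The Python below is a minimal sketch under strong assumptions: that outcomes can be counted, that each outcome can be given a single monetary value, and that the alternative commissioning model’s outcomes and costs are known. The function name, parameters and all figures are hypothetical.

```python
def net_benefit_vs_alternative(pbr_outcomes, pbr_cost,
                               alt_outcomes, alt_cost,
                               value_per_outcome):
    """Net benefit of PbR relative to the next best commissioning alternative.

    Both outcomes and costs are taken net of the alternative, so a positive
    result means PbR adds more value than it adds cost.
    """
    additional_outcomes = pbr_outcomes - alt_outcomes
    additional_cost = pbr_cost - alt_cost
    return additional_outcomes * value_per_outcome - additional_cost


# Invented figures: PbR achieves 120 outcomes for 500,000; the alternative
# achieves 100 outcomes for 450,000; each outcome is valued at 4,000.
net_benefit = net_benefit_vs_alternative(
    pbr_outcomes=120, pbr_cost=500_000,
    alt_outcomes=100, alt_cost=450_000,
    value_per_outcome=4_000,
)
```

The additional cost term should include the transaction and administrative costs of PbR itself, and in practice the difficulty lies in estimating the alternative’s figures rather than in the calculation.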
Conclusion
PbR has a strong intuitive appeal. Few would argue against the logic of aligning providers’ rewards with beneficiaries’ outcomes. This logic is especially powerful in an era of highly constrained funding for public services and a consequently heightened need to get the most value for every pound invested. PbR has therefore found favour in the UK as part of a set of policies designed to reform the provision of public services; it is now being tested in multiple policy areas.
Debate on PbR has been animated largely by values and prior (ideological) beliefs: PbR has committed proponents and opponents. This, combined with the currently very limited empirical evidence base, creates demand for evaluation but also constrains its likely influence. Given this, the role for evaluation becomes one of refinement. Drawing on theory-based approaches, this implies: using formative feedback to develop programmes during implementation; using summative findings to provide lessons for analogous efforts; and building these lessons up across programmes to examine when PbR might/might not be appropriate. In a previous paper, Mike Daly (Department for Work and Pensions) and I argued that: . . . there is a need to learn from current pilots and trials. Results are important, but so too is process: analysts especially must emphasise the importance of learning in this emerging area. In short, there is a way of viewing PbR as a black box (payments go in, results come out); this is not a helpful perspective if the aim is better policy development. PbR will never be a silver bullet – such ammunition is unavailable in social policy – but careful development and a focus on learning will minimise its potential as a dangerous weapon. (Battye and Daly, 2012)
In developing that argument, this article has suggested that a Realist approach would be most suitable. The hope expressed here is therefore two-fold. First, that such an approach helps to re-frame the debate on PbR; changing the question from ‘PbR: for or against?’ to ‘PbR: when, how, for whom and under what circumstances?’ And second, that evaluation plays a more prominent role in informing this debate.
Acknowledgements
This article draws on: conversations with colleagues at ICF – notably Yvonne Fullwood and Paul Mason (who provided useful comments on an early draft); two cross-government seminars with analysts working on PbR; and two workshops at UK Evaluation Society Conferences (2012 and 2014). The second of these workshops was delivered jointly with Mike Daly from the Department for Work and Pensions, who has helped clarify many of the conceptual and analytical challenges presented here.
Declaration of conflicting interests
ICF International provides analytical and advisory services across public, private and voluntary sectors. This includes the evaluation of programmes based on PbR, and advice to commissioners and providers considering the use of PbR.
