Abstract
The National Health Service (NHS) in England, as with other health services worldwide, currently faces the need to reduce costs and to improve the quality of patient care. Evidence gathered through effective and appropriate measurement and evaluation is essential to achieving this. Through interviews with service improvement managers and analysis of comments in a seminar of NHS staff involved in health service improvement, we found a lack of understanding regarding the definition and methodology of both measurement and evaluation, which decreases the likelihood that NHS staff will be competent to commission or provide these skills. In addition, we highlight the importance of managers assessing their organizations' ‘readiness’ to undergo change before embarking on a quality improvement (QI) initiative, to ensure that the initiative's impact can be adequately judged. We provide definitions of measurement for improvement and of evaluation, and propose a comparative framework from which to gauge an appropriate approach. Examples of two large-scale QI initiatives are also given, along with descriptions of some of their problems and solutions, to illustrate the use of the framework. We recommend that health service managers use the framework to determine the most appropriate approach to evaluation and measurement for improvement for their context, to ensure that their decisions are evidence based.
Introduction
The drive for increased ‘productivity and efficiency’ prompted by health policy in England and Wales, 1 and the current financial climate, create ever more pressure on managers to provide evidence for the investment in quality improvement (QI) initiatives in services they commission and provide. For example, Primary Care Trust (PCT) commissioners require business cases that demonstrate improvements in productivity and provider Trust Boards are required to deliver greater productivity within limited resources. Evaluation can provide this evidence, but it too must be ‘fit for purpose’, i.e. effective and affordable. Commissioning or undertaking evaluations, and understanding the types of measurements required, is not always within the experience or skill set of National Health Service (NHS) managers, who are more used to understanding and comparing high-level performance indicators.
In his report for the previous Government (in the UK), Lord Darzi stated ‘we can only be sure to improve what we can actually measure’. 2 One approach to measurement is ‘measurement for improvement’, which involves using tools, like statistical process control (SPC) charts, to gauge whether a service improvement initiative has yielded an improvement. 3 Managers are likely to find these methods accessible and helpful in communicating information between decision makers. While this can be useful to see if the initiative is being put in place and is associated with some measurable effects on the indicators of ‘success’, this is not sufficient to fully answer questions of the impact, outcome and worth. These are best answered by evaluation.
Within an organization, it can be difficult for NHS managers and others involved in service improvement initiatives to know to what degree aspects of measurement and evaluation should be pursued. In order to make informed decisions, it is necessary to ensure that they are justified by accurate and sufficient evidence. The Department of Health has emphasized that ‘Evaluation is an essential aid to improving project performance’. 4 This is the case regardless of the size or potential impact of a service improvement initiative, 5 be it introducing a patient checklist on a small ward or a trust-wide implementation of an initiative to reduce waiting lists.
In this paper, we aim to provide an overview of the scope, purpose and relationship between evaluation and measurement in service level QI, and to review some of the challenges that the NHS in England currently faces. In addition, we discuss some of the implications of these challenges for QI initiatives as well as the role organizational readiness can play in their success. Finally, we offer some guidance as to the appropriateness of measurement and evaluation, and examples of what we can learn from some large-scale QI initiatives in the English NHS.
Measurement and evaluation paradigms
Measurement can be used to ‘diagnose’ areas that most require improvement and can be used to establish a baseline against which to measure subsequent improvement. Solberg et al. 3 identify three uses of measurement, namely improvement, accountability and research. Measurement for improvement involves taking small samples of data, over short periods of time, using a few measures, which are directly relevant to the implemented service improvement. This is repeated in cycles, as small alterations are made to the initiative. A convenient way of displaying data is on charts. Run charts are especially popular and SPC charts are increasing in their usage because they capture the apparent fluctuations over time, and allow for some estimation of whether the change is measurement error or may be attributable to other factors. (See Thor et al. 6 for a review of the benefits and limitations of using SPC.) Measurement for accountability utilizes aggregated routine management data (for example, Hospital Episode Statistics [HES]) to compare performance using occurrences such as the number of re-admissions to hospital following surgery. Finally, measurement for research differs in that it aims to provide new knowledge by strictly defining variables (such as clinical outcomes) and sometimes by manipulating variables in controlled designs such as randomized controlled trials.
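As an illustration of how an SPC chart separates routine fluctuation from a possible real change, the following is a minimal sketch of one common chart type, the individuals (XmR) chart. It is not drawn from the initiatives discussed in this paper: the function names and the example figures are hypothetical, and the choice of an XmR chart rather than another SPC chart is an assumption made for brevity.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (centre, lower, upper) control limits for an individuals (XmR) chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is the conventional XmR constant (3 / 1.128).
    return centre, centre - 2.66 * mr_bar, centre + 2.66 * mr_bar


def points_outside_limits(values, baseline):
    """Return the positions in `values` that fall outside the baseline limits."""
    _, lower, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical weekly counts, invented purely for illustration.
baseline_weeks = [10, 12, 11, 13, 12, 11, 10, 12]
later_weeks = [11, 12, 18, 7]
flagged = points_outside_limits(later_weeks, baseline_weeks)
```

Points inside the limits are treated as routine variation; points outside them suggest that something other than chance may be at work and deserve investigation. A sketch of this kind says nothing about why a change occurred, which is the province of evaluation.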
In contrast, evaluation embraces the wider context of service improvement initiatives. It can be ‘formative’, aimed at providing feedback on how the initiative is developing and how this influences the results it produces; ‘summative’, focussed on aiding decision-makers by providing them with judgements on the achievement of objectives; ‘process’ focussed, designed to help people understand how and why an initiative operates in the way it does; or ‘outcome’ oriented, designed to assess the impact or effect of an initiative and its contributing factors. 7,8
QI evaluations can include some or all of these purposes. A popular paradigm of evaluating QI initiatives is ‘realist evaluation’. 9 This involves adopting a context + mechanism = outcome (CMO) approach. That is, an initiative's outcomes are determined by its mechanisms (aspects which give rise to behavioural change) and the context (social or political conditions, geographic location) in which they are implemented. While evaluation will aim to capture each of these aspects, a measurement for improvement approach will not sufficiently consider the context of the service improvement initiative.
Although the focus of evaluating and measuring improvement is different and can utilize different measures, there are commonalities between them. For example, a common method of measuring the outcome of an initiative and the outcome of QI is to use patient reported outcome measures (PROMs). These are measures of the ‘effectiveness of care’ and were introduced as a standard in Lord Darzi's ‘High Quality Care for All’ document. 2 For these new standardized PROMs, now required by the Department of Health for specific procedures, a patient is asked to complete a questionnaire before a procedure and then another one sometime after the procedure. They are intended to capture the outcome of, or improvement due to, the surgery (or other intervention). The results of these are collated and made available by the NHS Information Centre. This is linked to the National HES data which contain information for all admissions to NHS hospitals in England and can provide information on many areas, for example, waiting times and the number of specific procedures in a given period. Sensitive data, however, such as those which could identify individual patients or clinicians, are restricted. 10 The data are used to compare the performance of NHS organizations. Aggregated and anonymized health services activity and PROMs data are used in health services research and QI to provide baselines and points of comparison for initiatives. The information can help identify improvements in throughput or reported outcomes which might occur when a change is made to the patient journey or the care they receive. As well as being used as measurement for accountability and improvement, PROMs can also be used when evaluating improvement.
Issues with measurement and evaluation
For evaluation and measurement to be used effectively, it is essential that both are interpreted correctly by those who seek to apply them. However, given the similarities and commonalities that evaluation and measurement possess, it is understandable that their purposes can sometimes become blurred.
Understanding evaluation
Recognizing the importance of evaluation for identifying the impact of QI initiatives, we wished to establish the attitudes, practices, processes and awareness of decision-makers in NHS organizations relating to evaluation and measurement, in order to determine whether evaluation and measurement are clearly understood. We chose to approach members of NHS staff who are employed with substantial job responsibilities as members of service improvement teams, as they are likely to be the key people in an organization who are involved in measurement and evaluation, on a day-to-day basis.
Service improvement managers (or individuals with equivalent job roles) were contacted from a list of those who had previously had interactions with the NHS Institute for Innovation and Improvement (NHS Institute). Thirteen managers were interviewed, four from PCTs, five from NHS provider trusts, three from foundation trusts and one from a cardiovascular network. Interviews were conducted by telephone. The same set of questions was used for each interview. Each lasted approximately half an hour and written notes were taken during the interview. Issues surrounding areas of the service improvement manager's work, their understanding of measurement and evaluation, and the associated activities of their organization were explored.
One of the key questions posed was whether they would differentiate between measurement and evaluation. Twelve of the 13 participants responded to this question, 11 of whom stated that there was a clear difference between the two. However, when asked to expand on this difference, 10 participants gave an explanation but with less consistency. Five of the 10 participants indicated that measurement was primarily related to quantitative data, with responses like ‘measurement is numbers’ and ‘measurement is data – quantifiable’, and that evaluation was primarily concerned more with qualitative issues, ‘things you can't measure’, ‘intangibles’ and things that are ‘not numbers’.
When asked to describe the current level of evaluation in their organization, six of the 13 participants used phrases like ‘not very well’ or ‘it is rarely done’. However, four of the 13 participants mentioned that these processes were in the early stages of development and were improving. They were each asked in what circumstances evaluations of QI initiatives do not occur. The main reasons were related to responses like ‘mostly financial’ and that ‘time is a major factor’. A lack of understanding of evaluation was also highlighted, with one manager stating that people are ‘struggling to understand how’ and they ‘do not understand the importance and value’, and another believing that it was the ‘biggest factor’ in evaluation not being adopted. Moreover, it was felt by some that qualitative data analysis can be labour intensive and requires specific specialist skills and, in isolation, may not appear as beneficial or economic for some service improvement initiatives as an alternative to a quantitative approach.
We would argue, however, that this is not necessarily the case. Implying that evaluation is just qualitative or ‘anecdotal’ risks losing sight of the fundamental benefits of evaluation. These are, for example, assessing the success of all areas of an initiative and supporting evidence-based decisions. Indeed, a separation of this type threatens to lead people to avoid evaluation altogether, in favour of what would be seen as cheaper and easier measurement, or quantitative, methods. Evaluation should draw on both qualitative and quantitative approaches, where appropriate, to assess the impact of a service improvement initiative.
One manager commented that ‘it is difficult to identify experience’ in evaluation, and another commented that ‘there is not the level of expertise’ in their services to conduct evaluations where necessary. We suggest that there is a danger of a lack of understanding of the paradigm of evaluation being coupled with a lack of awareness of which NHS staff in their organization are trained in evaluation design, measurement and analysis skills.
Understanding measurement
In addition to the possibility that a lack of understanding about evaluation has contributed towards a low level of evaluation activity, it may also be possible that confusion about what measurement is has led to it not being employed as widely as it might be. In a recent paper, the NHS Institute have identified a number of its initiatives (NHS Innovation Awards, the ideas channel for High Impact Actions for Nursing and Midwifery, and Establishing the Evidence) where ‘only a minority’ of service improvement submissions contained any level of measurement, despite this being expressly requested in applications. 11 In response to this situation, the NHS Institute held a WebEx (Web seminar) to identify the key problems and challenges the NHS faces with measurement, and to help formulate how they might address these issues. Invitations to join the event were sent to people involved in service improvement, who have previously had contact with the NHS Institute. Eighty-four people from Trusts, Strategic Health Authorities and other NHS or Department of Health bodies, from across NHS England, logged in to participate (one participant represented a Chartered Professional body). Of these, 22 participants were in some way involved with the NHS Institute, a further 27 were heads of Innovation/Improvement/Transformation or related areas and the remaining participants consisted of, among others, heads of Finance, Strategy, Nursing and different types of Analyst. Participants in the seminar were encouraged to post comments in an online chat area during the WebEx. A content analysis was conducted using these comments. Many concerns about measurement were raised, including the lack of appropriate skills in their organization, personal dislike of measurement, lack of understanding of its benefits and the confusion about what measurement actually means.
For example, one participant commented ‘We are not always clear about what we mean by measurement’, while another stated ‘I'm not sure we have accurately defined measurement yet’. This lack of conceptual clarity may explain some of the resistance to evaluation in comments such as ‘measurement is seen as optional’, and that measurement is ‘the first step towards a performance culture’. The analysis shows that basic concepts of measurement are poorly understood, even by those with an expressed interest in measurement in the NHS in England.
We argue that it is crucial that the benefits of evaluation and measurement for improvement are elaborated in ways that can be useful to health service managers, so that both are integrated into all service improvement activities. It is also important that those who require business cases for improvement know how to support in-house skills development and how to commission measurement and evaluation activities to produce results capable of addressing key managerial decisions based on their organization's investment in service improvement initiatives.
Readiness
A characteristic of the ‘context’ 9 of a programme is the organization's capacity and capability to undergo change. The implementation of a service improvement initiative can be hindered by a lack of ‘readiness’. 12 Simpson describes readiness as comprising the motivation (including perceived need, defined as internally driven motivation, and pressures for change, defined as externally driven motivation) of staff at various levels and the resources of the institution (staff, training, equipment, etc.). Crucially, how readily they are deployed to achieve the change is also an important factor. In a service improvement context, if these aspects are not considered during an evaluation, or even before a service improvement initiative passes through the planning stages, there is a risk of misattributing a lack of success to the service improvement initiative itself and not to the organization's lack of readiness for change. This underlines the need for NHS managers to undertake diagnostic work, which may also be a ‘baseline’ for measurement of improvement and for evaluation, and to target resources, including staff training in measurement and evaluation, and the understanding of key concepts for the relevant service improvement initiative. However, Weiner et al. 13 stated that there was ‘little consistency in terminology or conceptualization’ of organizational readiness and that many of the instruments aimed at measuring it demonstrated ‘limited evidence of reliability or validity’. Despite the limited evidence of reliability and validity of these instruments, they represent a tool to measure important aspects of the context of interventions. Other variables to be measured could be derived from conceptual approaches to organizational change. For example, Sirkin et al. 14 describe how four principles have guided more than 1000 initiatives worldwide.
These principles are project duration (including time between reviews), performance integrity (capability of project team), commitment of staff (at all levels) and the additional effort required by staff to bring about the change. Bate et al. 15 conducted a large-scale international study of QI in USA and European hospitals. The study concluded that, as noted in a briefing paper by the Nuffield Trust, 16 there are different paths to successful, sustained QI. However, the unifying features are an ability to address multiple challenges simultaneously, and to adapt solutions and strategies to the organization's context. These characteristics of individual and organizational readiness, and capacity for change, can be the focus of measurement within evaluation.
Informing decision-making from evaluation
The ideal link between evaluation and decision-making is that they are mutually informative and interdependent. Assuming that project managers have implemented processes of measurement to assess whether the initiative may give rise to favourable change, the initiative may be implemented or withdrawn, irrespective of the evaluation outcome. As well as service improvement initiatives being withdrawn when they are potentially beneficial, it is possible that they could be continued when they are not beneficial overall. This is more likely if measurement is incomplete and partial, or biased towards finding positive effects. The fact that an initiative has apparently improved the intended aspect of a service does not necessarily mean that it should be continued, as there are impacts on other areas that need to be taken into account. Consider a service improvement team which has helped to implement an initiative that was intended to reduce overnight stays after day-surgery in a hospital ward. After the initiative has been running for a few months, the team notice that the number of stays is considerably reduced and remains so. This might suggest that the initiative is a success, should be permanently implemented and possibly spread to other locations. An evaluation, however, could uncover unintended consequences, such as an increase in emergency re-admissions, adverse impacts on staff sickness, increased hospital acquired infections and so on. These negative impacts could be considered as factors linked closely to the context of the intervention and the impacts should be weighed up with the recorded benefits. Measurement methods in QI may establish such impacts through ‘balancing measures’. However, the number of such measures is typically small, so they may miss an important unintended impact, which may also be a negative one.
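The day-surgery example can be made concrete with a minimal sketch that compares a primary measure with a small set of balancing measures before and after an initiative. The function name, the measure names and all figures below are hypothetical and invented for illustration; they are not data from any initiative described in this paper.

```python
def review_measures(measures):
    """Classify each measure as 'improved', 'worsened' or 'unchanged'.

    Each entry is a tuple: (name, before, after, lower_is_better).
    """
    report = {}
    for name, before, after, lower_is_better in measures:
        if after == before:
            report[name] = "unchanged"
        elif (after < before) == lower_is_better:
            # The value moved in the desired direction.
            report[name] = "improved"
        else:
            report[name] = "worsened"
    return report


# Invented monthly figures: one primary measure and two balancing measures.
example = [
    ("overnight stays after day-surgery", 40, 22, True),
    ("emergency re-admissions", 5, 9, True),
    ("patient satisfaction score", 7.1, 7.1, False),
]
summary = review_measures(example)
```

In this invented case the primary measure improves while one balancing measure worsens, which is the pattern the text warns about. A comparison of this kind can only detect harm in the measures that were chosen in advance, so it does not replace an evaluation of unintended effects.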
Uses of measurement for improvement and evaluation
We now explore and contrast the uses of measurement and evaluation for QI. Measurement for improvement is considered first. It can be defined as using a few measures to quantify or to demonstrate the presence and size of an improvement. 3 It answers the question, is the service improving? In contrast, evaluation is an exploration of the intended and unintended effects of an initiative and involves ‘judging value’. 7 In the previous example, a measurement for QI approach could determine that overnight stays have been reduced (i.e. an improvement). Additionally, evaluation could inform the selection of other measures that the team decided to record. This could also be gained from using measurement as part of an evaluation. In addition, evaluation is appropriate for establishing how well the initiative was planned and implemented, how others perceive it, its unintended effects and how it compares to where it has been (or could be) implemented elsewhere. Table 1 describes and contrasts the uses of measurement and evaluation.
Table 1. Comparison of measurement for quality improvement and evaluating quality improvement*
†The Model for Improvement 18 is a framework which is designed to focus improvement efforts on achieving what it intends. It poses the questions: ‘What are we trying to accomplish?’, ‘How will we know that a change is an improvement?’ and ‘What change can we make that will result in an improvement?’. These questions are combined with the Plan-Do-Study-Act (PDSA) cycle to guide goal-setting and ensure it is known which changes are desired and intended.
This framework is intended to guide those involved in service improvement in knowing how, when and why evaluation and measurement are relevant. Although large and small service improvement initiatives should be evaluated, 5 not everyone involved in an initiative need evaluate it themselves. For example, staff of an individual hospital ward might measure their progress during the implementation of a larger initiative. While they do not need to evaluate the entire project, their information can contribute towards the evaluation. Therefore, both measurement and evaluation are important components of a service improvement initiative, but must be conducted by the people with the right skills and who can undertake them in agreement with the QI initiative team. The following are two large-scale service improvement initiatives and the issues faced relating to measurement for QI and evaluation, which demonstrate the relationship and characteristics of the two activities.
The Health Foundation's Co-creating Health initiative
The Health Foundation's programme theory 20 in the Co-creating Health initiative states that for people with long-term conditions to take a more active role in their health, patients need to develop the knowledge and skills to manage their condition while working in effective partnership with their clinicians. This includes problem-solving and action planning, which aim to help people increase their confidence and self-management skills. They also need skilled support from their clinicians and health-care systems that operate very differently from those we have today. The Health Foundation provided an integrated initiative for a period of three years, delivered in eight health-care economies (primary and secondary care services in one area). The programme consisted of an advanced development programme for clinicians to develop their skills to support and motivate their patients to take an active role in their own health, and self-management courses for people with long-term conditions. A service improvement programme, using Plan, Do, Study, Act (PDSA) 18 methods, focused on testing changes in service delivery to support more co-productive clinical consultations.
The evaluation design followed Pawson and Tilley's 9 realist evaluation paradigm, with measures of context (e.g. who attended a programme and who dropped out), mechanism (e.g. what behaviour change methods are taught) and outcome (e.g. what evidence there is that the skills are used). These included self-reported measures (e.g. standardized clinical measures of anxiety and depression), experiential rating scales and interviews to ascertain people's narrative of their experience and learning, as well as ‘hard’ data, such as health-care use measures, and ethnographic observation of the programme in action. Importantly, data were attributed to identified participants so that survey data measured at two time points, such as before and after a group training programme, could be linked. The outcome data were also linked to demographic data to identify the characteristics of who attended and which people had better outcomes. This affords the possibility of analyses in which causal links could be established between programme inputs and impacts.
The measurement for improvement consisted of four or five measures of the processes (e.g. the presence of specified consultation tools, such as cards to prompt joint consultation agenda setting). These were measured on small samples of patients (10 per cycle), with data collected anonymously (i.e. not linked to an individual patient). While this may have been adequate to show changes in relation to tests or suggested improvements, it was not adequate to ascertain for whom these effects were or were not having the expected impact and in what context. Further, measures of outcome were not conceptually linked to the process being measured. For example, asking a patient to rate how well they feel able to self-manage their condition, on a 10-point rating scale offered to a patient by the clinician themselves, on an exit form after a consultation is methodologically flawed. This is for many reasons, not least the strong social bias to rate a powerful clinician positively in the medical environment. 21 It is also contrary to extensive research on the motivation of people to self-manage, to suggest that one short consultation is likely to impact on self-confidence to self-manage, which is a complex multi-behaviour skill. 22 In this example, we suggest that a more effective strategy for measurement for QI could be integrated with the evaluation methods for the initiative itself. For example, simply attributing the survey ratings used in QI to named patients would enable their data to be tracked over subsequent consultations and linked to their health-care use, their attendance in consultations with other clinicians who had been trained in the enhanced support skills, and whether they had attended a patient self-management programme.
Such data linkage would create a much richer picture of the impacts of the programme, including modelling of the necessary combination of inputs to achieve the optimal impacts, and be a basis for some estimates of the financial value of the programme.
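To show why attributing ratings to individuals matters, the following minimal sketch links ratings taken at two time points by a participant identifier. The identifiers, ratings and function name are hypothetical and invented for illustration; in practice such linkage would be subject to the restrictions on identifiable data noted earlier.

```python
def link_ratings(before, after):
    """Pair each participant's ratings from two time points by identifier.

    `before` and `after` map a participant identifier to a rating.
    Only participants present at both time points can be linked.
    """
    linked = {}
    for participant_id, first in before.items():
        if participant_id in after:
            second = after[participant_id]
            linked[participant_id] = {
                "before": first,
                "after": second,
                "change": second - first,
            }
    return linked


# Invented 10-point self-management ratings for illustration only.
ratings_before = {"P01": 4, "P02": 6, "P03": 5}
ratings_after = {"P01": 7, "P03": 5, "P04": 8}
linked = link_ratings(ratings_before, ratings_after)
```

With anonymous data, only the two sets of ratings could be compared in aggregate. With an identifier, each person's change is visible, those who dropped out or joined late can be identified, and the linked record could in principle be joined to other sources such as health-care use.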
The NHS Institute's Productive Ward initiative
The Productive Ward: Releasing Time to Care ™ programme offers a practical approach for clinicians and other ward-based staff to improve the services they deliver to patients. The programme was developed by the NHS Institute, in collaboration with the NHS, and offers self-directed modules and supporting documentation. The programme is based on the principles of ‘lean thinking’, which aims to reduce activities that do not add value. In the case of Productive Ward, the aim is to release time to spend on direct patient care. As approximately a third of NHS expenditure is spent on delivering ward-based care in hospitals, the potential gain in improving fundamental aspects of how wards function is vast.
The NHS Institute commissioned the National Nursing Research Unit at King's College London to undertake a Learning and Impact review 23 of the Productive Ward programme. The resulting report recognized that there are not only many perceived benefits of the Productive Ward but also limitations to being able to demonstrate measurable impact. There were challenges engaging frontline staff with understanding, using and owning data. The suggested measures were perceived by some as high level and designed to track programme-wide or organization-wide changes and were not applied consistently.
The review undertook a detailed assessment of locally available data in each case study site, asking leads to complete a profile for ‘Productive Wards’ and for comparison wards that had not been part of the improvement programme. The general conclusion was that only routine or administrative measures were identified as potentially available across all organizations and these had not generally been compiled to demonstrate change over time.
Based on feedback from leaders implementing the Productive Ward, it was felt that the problem was not that the programme was failing to achieve benefits, but that there had been a lack of means to capture the full benefits on a systematic basis.
To address this gap, the ‘Productives Module Impact Framework’ was developed to help NHS organizations understand the impact that a Productive programme has on productivity, efficiency, staff experience and skills development. Data are collected on financial impact, efficiency (time released as a result of process improvements), knowledge and skills development, and staff experience and well-being. Both quantitative and qualitative data are used in the model, including capture of staff stories of improvement in areas of safety, quality and improved patient and staff satisfaction. Such an approach should help to capture and describe the wider benefits of the Productive Ward for managers, although evaluation requires an assessment of wider system change.
Discussion and conclusions
We have emphasized how measurement for improvement and evaluation are both important and, when planned and integrated throughout a QI initiative, are likely to lead to improved decision-making. Measurement for improvement is particularly beneficial for small localized teams implementing an initiative where the focus is on identifying a level of improvement and where responsibility for evaluating the initiative lies elsewhere, such as with a national funder. Evaluation, of which measurement can form a part, is intended to provide insight into the planning, implementation and impact of an initiative at all levels. By harnessing the various aspects of evaluation and measurement, managers will be better able to increase the likelihood of their initiative having a desirable impact as a whole, as well as within local teams. Moreover, should the initiative not proceed as intended, evaluation will be useful in diagnosing areas which need attention during the programme, enabling corrective action to be taken. We advise that when making important business decisions relating to service improvement initiatives, health service managers must use the relevant measurement data, as well as the evaluation data, to provide an evidence-base to inform and justify decisions. We recommend reference to the framework in Table 1, to help managers clearly establish the purpose and methods for measurement and evaluation.
We recommend that before a service improvement initiative is begun, an organization should assess whether it possesses the capability, or readiness, to implement such an initiative effectively. Failure to do so risks disillusionment with QI as a methodology. This type of assessment should form part of an evaluation while the initiative is being planned and defined. As the initiative progresses, senior managers should require such information as part of the monitoring process, rather than simply rely on high-level indicators, in active collaboration with improvement teams.
As well as encouraging cost-savings and helping to improve quality, a more successful application of both evaluation and measurement for improvement will lead to more reliable implementation of service improvement initiatives and, ultimately, improved patient care.
Ethical approval was not required for the work presented herein.
