Abstract
This paper deals with three concerns about the evaluative framework that is currently dominant within health economics. These concerns are: that the evaluative framework is concerned entirely with health; that the evaluative framework has an individualistic focus on patients alone; and that the methods used to estimate ‘health’ within the current evaluative framework could be improved both in terms of the generation of descriptive systems and in using valuation methods that rely less on people's ability to express their preferences on a cardinal scale. In exploring these issues the Investigating Choice Experiments for Preferences of Older People (ICEPOP) programme has explicitly focused on both the topic of older people and the methods of discrete choice experiments. A capability index has been developed and attributes for an economic measure of end-of-life care are currently being generated, providing the possibility of extending the evaluative framework beyond health alone. A measure of carer's experience and a framework for extending measurement in end-of-life care to loved ones are both also in development, thus extending the evaluative framework beyond the patient alone. Rigorous qualitative methods employing an iterative approach have been developed for use in constructing attributes, and best-worst scaling has been utilized to reduce task complexity and provide insights into heterogeneity. There are a number of avenues for further research in all these areas, but in particular there is need for greater attention to be paid to the theory underlying the evaluative framework within health economics.
Where were we 10 years ago? The inadequacy of the evaluative framework in health economics
Health economists are concerned with the efficient allocation of those scarce resources that are devoted to health care. In the last 10 to 15 years, economists have been largely focused on the use of Quality Adjusted Life Years (QALYs) as a means of assisting policymakers in maximizing the health that can be obtained from the available health care budget. The evaluative framework for this approach is non-welfarist, based either on a decision-maker approach as first comprehensively described by Sugden and Williams in 1978 1 or an extra-welfarist approach as described by Culyer in the late 1980s. 2 The particular advantage of the QALY has been its ability (at least in theory) to enable comparisons across all types of health intervention and thus to allow decisions to be made that ‘maximize health’. The use of this evaluative framework is epitomized in England and Wales by the decision-making of the National Institute for Health and Clinical Excellence (NICE) which makes decisions about whether particular interventions should be provided through the public purse. 3 NICE guidance requires cost-effectiveness models to be produced, ideally using QALYs as the unit of outcome, and preferably using the EuroQol EQ-5D index measure 4 as the basis for this QALY. 3 The EQ-5D measure contains five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) each with three levels broadly corresponding to ‘no problems’, ‘some problems’ and ‘extreme problems’. 4
The increased use of a cost-effectiveness framework in NHS decision-making can clearly be seen as a success for health economics and the need for further research in this area may not, therefore, be obvious. There are a number of inadequacies, however, with this current evaluative framework, particularly in relation to decision-making for older people who may require care from both health services and social services, who may require care from unpaid carers such as family members or friends, and who are approaching the end of their life.
First, the current evaluative framework is concerned entirely with health. There may be important and valued benefits from health care that cannot be measured in terms of health. 5 In evaluating care for older people there are two particular concerns. The first relates to the increasing desire to better integrate health and social care.6–8 Services which provide social care to older people may not contribute to health per se, but to quality of life more generally. Hence basing evaluations purely on measures of health benefit would be likely to underestimate the total benefits of interventions that contribute to quality of life more generally. Such interventions may thus be disadvantaged in funding decisions where they are competing against interventions that focus entirely on health. The second concern is around care at the end of life where, again, health production may not be the focus of the care provided. At that time, there may be particular attributes that become important to older people, including the quality of care itself, preparation for death, spirituality, dignity and not feeling oneself to be a burden on others. In estimating the benefits of interventions designed to ease the end of life and the process of dying, a health measure may be seen as inadequate.
A second inadequacy with the current evaluative framework is its individualistic focus on patients alone. Within the current non-welfarist framework, it is the patient alone who matters (although, strictly, it is the public's valuation of the patient's health state that matters). Evaluations are seldom conducted that take account of the external effects of the patient's illness on those around them. Yet for older people who may rely on unpaid care, this is clearly an inadequacy in evaluation. The provision of unpaid care may impact on the health, employment, social life and general life satisfaction of the individuals providing this care. By ignoring its impact (or, at best, limiting consideration to the health impacts of caring), economic evaluations may inaccurately quantify the costs and benefits of health care provision. Older people are also approaching the end of their lives. There are clearly questions about the extent to which end-of-life care should be evaluated entirely in terms of the (public's view of the) patient's state, and the extent to which it should be evaluated in terms of the impact on the patient's loved ones. These questions have not yet been tackled systematically by health economists.
A third inadequacy concerns the methods that have been used to estimate the value of health within the current evaluative framework. It is possible to think about these methods in terms of developing the descriptive system for measures and in terms of developing the set of values for use in measures. Previously, descriptive systems used in the formation of QALYs have been developed using a variety of methods, 9 including reviews of other health status documents (Quality of Well-Being scale 10 and EQ-5D 11), review of policy documents (15D 12), review of the general literature (Quality of Well-Being and Health Utilities Index 13) and mapping from larger health measures (SF-6D 14). It has not generally been clear how the final descriptive system has arisen from these reviews, although the EQ-5D appeared to combine researcher expertise with the literature in developing the attributes, 11 and the Health Utilities Index used lay raters to select the most important attributes from the broader list generated. 15 Measures have not, to date, been based on generating attributes by asking people directly what is important to them. In contrast, it has been highly recommended by those developing discrete choice experiments (DCEs) that qualitative work is conducted, 16 but little guidance has been given about how this qualitative work should be translated into the final attributes and their levels, 16 and in practice the information provided in empirical studies is exceedingly brief with little evidence of rigour in the qualitative work associated with attribute development. 17
Methods for developing values for economic index measures are well established, but in general have assumed that people can express their preferences on a cardinal scale using complex tasks such as standard gamble and time trade-off. The appropriateness of these methods for use in a population of older people is questionable (and has been highlighted by the fact that older people can find self-completion of the EQ-5D instrument itself difficult 18). The introduction to health economics of pairwise DCEs in the 1990s19,20 addressed this issue of expressing preferences on a cardinal scale, but such DCEs arguably still retain a complex task structure. There are also difficulties in obtaining adequate insights into heterogeneity of preferences with these traditional discrete choice methods because of the limited information available from each completed task. New methods for developing values are needed that address these issues.
Each of these issues was tackled through the UK Medical Research Council Health Services Research Collaboration ICEPOP programme. This paper provides an overview of the ICEPOP programme, outlining individual studies and their contribution to the issue of evaluative spaces within health economics. Readers are referred to more detailed papers that are published or in press. The paper continues by outlining the nature of the ICEPOP programme and the advances made in relation to evaluative spaces within health economics. The paper ends by considering areas for further research.
Advances made by the ICEPOP programme
The ICEPOP programme began in 2004, building on the earlier Effective and cost-effective care for older people programme run by Jackie Brown between 2001 and 2004. From its start, the ICEPOP programme has taken a dual focus, on topics important to older people on the one hand, and on methodology related to the use of DCEs on the other. In terms of topics, there have been three main foci: capability and the quality of life of older people; unpaid care for older people; and end-of-life care for older people. Methodological work has particularly focused on the use of best–worst scaling (BWS),21–23 a particular type of DCE, in health care. Within this methodological strand there has been work on optimizing response to DCEs, simulation work in relation to study design and analysis, work on individual level preferences and work on characterizing preference heterogeneity. A separate strand of methodological work has focused on the use of qualitative methods in developing attributes and levels for DCEs.
During the programme, the two areas have moved forward in parallel, with individual studies informing the methodology of DCEs, and the particular DCEs providing useful information for decision-making for older people. The methodological work has largely been conducted across the topic areas associated with the programme, but has also drawn upon studies of organizational and treatment preferences outside the area of care for older people, specifically in relation to the organization of dermatology services and treatments for depression.
A third, and over-arching, theme that has emerged within the programme, however, is the notion of extending and improving the evaluative spaces within which economic evaluation in health care takes place. It is this focus that pulls together the substantive and methodological strands of the programme, and the contribution of the programme is outlined in these terms below.
Beyond health: the ICECAP capability index for older people and attributes for end-of-life care
The ICEPOP programme advances the notion of going beyond health in evaluating outcomes in two areas. The first is in relation to the quality of life of older people and the second concerns what older people think is important to them in terms of end-of-life care. In both areas in-depth interviews have been used to understand what matters to older people themselves, and the BWS approach has been used to obtain values for the resulting quality-of-life index.
For understanding the important attributes of quality of life, in-depth interviews were conducted with 40 older people. Discussion during interviews focused initially upon a number of factors that influenced quality of life. These were activities, relationships, health, wealth, surroundings and religion/faith/spirituality. Probing during interviews and further analysis, however, indicated that five conceptual attributes were important. These were: attachment (feelings of love, friendship, affection and companionship); role (the idea of having a purpose or ‘doing something’ that is valued, either by the individual and/or by others); enjoyment (ideas of pleasure, joy, and a sense of satisfaction); security (ideas of feeling safe and secure, not having to worry and not feeling vulnerable); and control (being independent and able to make one's own decisions). 24
The interviews also, however, suggested that the quality of informants’ lives was limited by the loss in ability to pursue these attributes. 24 This led the authors to link this research with Amartya Sen's notion of capabilities, which distinguishes between functioning (achievement) and capability (freedom to achieve), and recommends that evaluation is conducted in the space of capabilities.25–28 In the light of this work and the empirical findings, the five conceptual attributes developed during the in-depth interviews were interpreted as a set of functionings, the freedom to achieve which (or capability) appeared to be important to older people. As a consequence of this important finding, it was decided that the measure of quality of life for older people would concentrate on developing an index of capability rather than a utility measure. 24
The ICECAP (ICEpop CAPability) index for older people is now in existence, following a further round of qualitative work to establish appropriate terminology for a self-completion measure, and a survey of 315 older people that used the BWS technique to produce a set of population tariffs for use in economic evaluation. 29 Anchoring of the measure on a 0–1 scale has been achieved by no capability on any attribute being given the value 0 and full capability on all attributes being given the value 1. The measure has been used in a small number of surveys to date, and is starting to be used in intervention studies. Progress has been made in establishing construct validity for the measure.
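The anchoring rule described above can be illustrated with a minimal sketch. The five attribute names come from the text; everything else is an assumption made for illustration: the number of levels per attribute, the additive form of the index and, in particular, the raw values, which are invented and are not the published tariffs.

```python
# Hypothetical raw (unanchored) values per attribute level, lowest level first.
# Illustrative numbers only: these are NOT the published ICECAP tariffs.
RAW_VALUES = {
    "attachment": [-1.0, 0.0, 1.5, 2.0],
    "security":   [-0.5, 0.0, 0.5, 1.0],
    "role":       [-1.0, 0.5, 1.0, 1.5],
    "enjoyment":  [-0.5, 0.5, 1.0, 1.5],
    "control":    [-1.0, 0.0, 1.0, 2.0],
}


def raw_score(state):
    """Sum the raw values for a state given as {attribute: level index}."""
    return sum(RAW_VALUES[attr][level] for attr, level in state.items())


def anchored_score(state):
    """Rescale so that no capability on any attribute scores 0
    and full capability on all attributes scores 1."""
    worst = sum(values[0] for values in RAW_VALUES.values())
    best = sum(values[-1] for values in RAW_VALUES.values())
    return (raw_score(state) - worst) / (best - worst)


no_capability = {attr: 0 for attr in RAW_VALUES}
full_capability = {attr: len(values) - 1 for attr, values in RAW_VALUES.items()}
mixed = {"attachment": 2, "security": 1, "role": 3, "enjoyment": 2, "control": 0}

print(anchored_score(no_capability))    # lower anchor
print(anchored_score(full_capability))  # upper anchor
print(anchored_score(mixed))            # an intermediate state
```

Under this linear rescaling every intermediate state falls between the two anchors, so states can be compared on a common 0–1 scale.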
Twenty-three in-depth interviews have been conducted with older people to find out about what they believe to be important about end-of-life care, dying and death. Individuals were sampled from the general population, those receiving residential care and those receiving palliative care, so capturing older people at different points along the trajectory towards death. Analysis of the resultant data is ongoing and conceptual attributes for end-of-life care are currently being developed. These clearly go beyond health in that themes such as dignity and preparation for death are emerging as important.
Beyond the individual patient: the Carer Experience Scale and a framework for evaluating end-of-life care
The ICEPOP programme is extending the evaluative framework to look beyond the patient in two areas. The first of these is in terms of the experience of unpaid carers, and the second is in terms of care at the end of life. The work to develop a measure for carers aimed to develop a self-report measure of the caring experience, concise enough to be scaled through a preference-based valuation exercise and comprehensive enough to cover the broad caring experience. The work on end-of-life care is at an early stage, but has involved developing a broader conceptual evaluative framework that includes the families of older people.
To develop the measure for unpaid carers, meta-ethnography (a technique to synthesize qualitative research 30 ) in combination with in-depth interviews was used to understand the important attributes of the caring experience for unpaid carers. 31 Meta-ethnography was used to generate conceptual attributes of the caring experience; comparison of the study findings produced six attributes covering the process of caring. Semi-structured interviews with 16 carers were then conducted to refine the content and language of the attributes. These interviews revealed the importance of including an attribute about the positive aspects of caring and avoiding value-laden language such as ‘relationship’ or ‘duty’ in the measure. The meta-ethnography and interview findings were used to develop the Carer Experience Scale (CES) – a 6-item (attribute), 3-level questionnaire. CES attributes cover: activities outside caring; support from friends and family; assistance from organizations and the Government; fulfilment from caring; control; and getting on with the recipient. 31 Subsequent work to assess the feasibility of completion for the CES and to value the 729 caring ‘states’ using BWS has recently been conducted. Initial indications of feasibility are encouraging with around 90% of CES scales fully completed. Ultimately, the intention is to use the CES to measure the caring experience in surveys and intervention trials.
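The figure of 729 caring states follows directly from the structure of the CES: six attributes with three levels each give 3^6 = 729 possible combinations. A short sketch of the enumeration is given below; the numeric level labels are placeholders rather than the wording used in the questionnaire.

```python
from itertools import product

# The six CES attributes as listed in the text.
ATTRIBUTES = [
    "activities outside caring",
    "support from friends and family",
    "assistance from organizations and the Government",
    "fulfilment from caring",
    "control",
    "getting on with the recipient",
]

# Placeholder labels for the three levels of each attribute.
LEVELS = (1, 2, 3)

# Every caring state is one level per attribute.
states = list(product(LEVELS, repeat=len(ATTRIBUTES)))

print(len(states))
print(dict(zip(ATTRIBUTES, states[0])))  # the first enumerated state
```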
Recent work has drawn upon both the economics and wider literatures to develop a framework for the evaluation of end-of-life care. This framework not only considers the patient (as in the work reported in the section above), but also how the benefits to families of end-of-life care should be considered in economic evaluation, both before and after the death of the patient. Further empirical work will be required to expand upon the details within this framework, but it offers a promising starting point for considering the wider implications of interventions at the end of life.
Beyond literature review: using qualitative methods to develop attributes
One of the major advances made by the ICEPOP programme has been in developing rigorous qualitative approaches to the development of attributes and levels both for the measures described above and for individual studies in other areas. It has also been important to draw attention to the poor state of reporting of the development of attributes associated with previous measures and DCEs.17,24
It has become increasingly clear through this work that there are in fact two stages to developing the attributes for DCEs, whether these two stages are explicitly separated (as in the work on the ICECAP measure 24) or not (as in work reporting the development of a DCE estimating the value of preferences for different types of dermatology consultation 17). The first stage concerns the development of conceptual attributes and the second stage concerns the development of terminology that evokes the meanings of these conceptual attributes. Although the same qualitative method may be used for both stages, it is equally feasible to separate these two stages in terms of both methods of data collection and analytical methods. The most extreme example of this use of different methods in the programme has been in the development of the Carer Experience Scale, where meta-ethnography 30 was used to develop conceptual attributes and in-depth interviews were then used to consolidate attributes and develop appropriate terminology. The development of the ICECAP capability measure used in-depth interviews at both stages, but different analytical techniques: ‘Framework’ 32 for the initial conceptual development and iterative constant comparative techniques for the development of terminology. The work developing attributes for end-of-life care is using in-depth interviews analysed using constant comparative techniques and the writing of accounts for both stages. Despite these differences in approach, the policy of the ICEPOP programme has been to provide clear descriptions of the methods associated with the development of measures and instruments for all work conducted through the programme.17,24,31 This will aid other researchers conducting work of this kind and also provides future researchers wanting to use the measures with clear information about the quality of the descriptive systems for measures.
The question of which particular methods to use in which circumstances depends on a host of factors including the sensitivity and complexity of the topic, the extent to which qualitative work in the area has previously been conducted and the ease of obtaining informants.
Beyond cardinal valuation techniques: using BWS to reduce task complexity and provide insights into heterogeneity
The principal preference elicitation methods used to estimate QALY weights have required respondents’ responses to have cardinal properties: responses must be on a defined numerical scale. Respondents must be able to say by how much they prefer one health state to another (albeit in the majority of cases by responding iteratively to a set of binary choices until reaching a point of indifference between alternatives) and this is a relatively strong assumption. As a result there has been increasing interest in methods requiring responses to have only ordinal properties:22,33,34 stating that A is preferred to B is cognitively easier than stating how much A is preferred to B.
Ordinal tasks can be used to make meaningful inference about underlying numerical values by appealing to random utility theory.35,36 Random utility theory is a simple theory of human decision-making that has been successfully applied in many areas of applied economics: Daniel McFadden won the Nobel Prize in 2000 for his seminal work in 1974 that allowed preferences to be estimated in a regression framework. The intuition behind random utility theory is as follows: when choosing between two objects, A and B, the probability (relative frequency) that A is preferred to B is an increasing function of the utility of A minus the utility of B. 35 Thus, as the utility of A increases relative to B, more respondents will choose A (and choose it more often if they perform multiple choices). In this way, relative frequencies of choice can be used to produce numerical estimates of the utilities associated with particular health states.
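The intuition above can be made concrete with a small sketch. The text states only that the choice probability is an increasing function of the utility difference; the logistic (binary logit) form used below is one common choice and is assumed here purely for illustration, as are the utility values.

```python
import math


def choice_probability(utility_a, utility_b):
    """Probability that A is chosen over B under an assumed binary logit model."""
    return 1.0 / (1.0 + math.exp(-(utility_a - utility_b)))


def utility_difference(share_choosing_a):
    """Invert the logit: recover U(A) - U(B) from the share of choices favouring A."""
    return math.log(share_choosing_a / (1.0 - share_choosing_a))


# Hypothetical utilities: as U(A) rises relative to U(B), A is chosen more often.
for u_a in (0.0, 0.5, 1.0, 2.0):
    print(u_a, round(choice_probability(u_a, 0.0), 3))

# Going the other way: an observed choice frequency implies a utility difference.
print(utility_difference(0.75))
```

The second function shows the step that matters for valuation: observed relative frequencies of choice are converted back into numerical differences in utility.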
DCEs have been the principal type of ordinal task used in applied economics and the ICEPOP team originally intended using a DCE to estimate tariffs for the ICECAP capability index. A traditional DCE would have required older people to choose which state they preferred from a set of two or more. This task would have had to be repeated a number of times, with the states changing for each question. Anecdotally, older people are one group in society who may have difficulty with DCEs; keeping two or more general quality-of-life states in mind at once was anticipated to be a difficult task, particularly for frail older people or for the oldest old. As a result, a random utility theory-based ordinal task called BWS was used instead, which presented the states one at a time. Instead of respondents making choices between states, they made choices within states. In particular, they were asked to choose the best and worst attribute (dimension) defining the state, based on the levels the attributes took. Box 1 shows an example of one of the states from which respondents were asked to choose the best and worst attribute. BWS has already been used to estimate EQ-5D tariffs in a cancer context. 37 The proof of its statistical properties was published in 2005, 21 and it has been used in other health care contexts as part of the ICEPOP programme.38,39
Box 1. Example scenario
You can have a lot of the love and friendship that you want
You can only think about the future with some concern
You are able to do all of the things that make you feel valued
You can have a lot of the enjoyment and pleasure that you want
You are unable to be at all independent
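The best–worst task shown in the example scenario can be summarized, at its simplest, by counting how often each attribute level is picked as best and as worst. The sketch below uses shorthand labels for the five statements and entirely hypothetical responses; it is a simple counting summary and not the regression-based estimation used to produce tariffs.

```python
from collections import Counter

# The example scenario: attribute -> level shown (shorthand labels, assumed here).
profile = {
    "attachment": "a lot",
    "security": "some concern",
    "role": "all",
    "enjoyment": "a lot",
    "control": "unable",
}

# Hypothetical (best, worst) attribute picks from five respondents.
responses = [
    ("role", "control"),
    ("attachment", "control"),
    ("role", "security"),
    ("enjoyment", "control"),
    ("role", "control"),
]


def best_minus_worst(profile, responses):
    """Return, for each attribute level in the profile, the number of times it
    was chosen as best minus the number of times it was chosen as worst."""
    best = Counter(b for b, _ in responses)
    worst = Counter(w for _, w in responses)
    return {(attr, level): best[attr] - worst[attr]
            for attr, level in profile.items()}


scores = best_minus_worst(profile, responses)
for key, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(key, value)
```

Because each respondent supplies both a best and a worst choice, every completed profile yields two pieces of ordinal information rather than one.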
An advantage of best–worst methods is that they elicit more information from respondents than traditional ‘pick one’ DCE tasks. 22 This enables researchers to go beyond estimating (simply) the population-level tariffs (representing the preferences of the hypothetical ‘average’ citizen). This may be important as there is an argument that the differential preferences of population subgroups should also be considered by NICE, 40 as well as evidence of heterogeneity in preferences among the UK population in relation to EQ-5D. 41 It has been shown that the use of efficient statistical designs coupled with best–worst questions can give considerable insights into preference heterogeneity, allowing the estimation of individual respondents’ utility functions in some instances. 42 Investigation of heterogeneity in preferences for capability states is ongoing using the ICECAP data. When this is complete, it is anticipated that policymakers will be able to choose between the population-level ‘average’ preferences of people aged 65 years and over, and alternative sets of preferences based on clinical and socioeconomic information. The choice of value set will undoubtedly have equity implications (in that use of different value sets would have distributional implications), but knowing these a priori will allow public debate about the extent to which values that differ from the average should be important for, and used in, decision-making.
What still needs to be done? Going further beyond the current evaluative framework
There are a number of specific aspects of work that follow closely from the progress already achieved within the ICEPOP programme. In relation to going beyond health as an outcome measure, further work on the ICECAP capability index will include empirical work using focus groups in an attempt to obtain values through discussion and debate, as advocated by Sen and colleagues.43,44 Additional assessments of the validity, reliability, feasibility of use and sensitivity to change of the index will likewise be conducted. This work will be particularly important in establishing the extent to which each of the attributes can be influenced by health and social care. These assessments will include comparisons of prediction errors from the population level model with those from models that explicitly account for preference heterogeneity. Prediction errors from a random utility-based regression model give an indication of how well the model estimates explain the actual choices made by respondents. As such, they represent a similar test to that conducted by Roberts and Dolan upon the main UK EQ-5D data 41 (which showed failure to predict actual choices in substantial minorities of respondents). The work will also be extended to produce a capability measure to use among the whole adult population which will be particularly valuable for the assessment of public health interventions. Once attributes are developed for end-of-life care, the terminology for using these attributes will be tested and the aim will be to develop a population tariff for use in such interventions. In terms of going beyond the individual, analytical work remains to establish a final set of population tariffs for the Carer Experience Scale and the measure needs to be assessed for use in practical contexts. Empirical work is needed to further develop the broader framework for evaluating end-of-life care. Issues of anchoring both for the Carer Experience Scale and end-of-life care remain to be tackled.
Methodologically, there are a number of avenues for further research. These include further exploration of the use of qualitative methods in developing attributes, in particular using methods not yet explored during this programme such as focus groups, and also a clearer delineation of the types of methods that might be most useful for particular situations. BWS studies can be valuable preference elicitation exercises in their own right. However, in some contexts they are unlikely to provide all the information on preferences that policymakers require and will be most useful as an adjunct to other studies. This raises the issue of synthesizing data from more than one source. 45 Performing this in a general DCE context is now relatively straightforward, but synthesizing data from different types of choice task offers new challenges. The ICEPOP team will therefore explore the potential for drawing on inferences from DCEs, BWS and conventional choice tasks in future research.
More generally, however, there is the need for greater attention to be paid to both the theory and the empirical data underlying the evaluative framework within health economics. The work of ICEPOP has mainly concerned the evaluative space and its extension beyond health outcomes. The starting point of the ICEPOP programme was that the perspective of maximizing health was insufficiently broad, and so the issue of how to deal with this without going back to the monetary measurement of welfare economics becomes pertinent. To date, ICEPOP has developed measures that go beyond the evaluative space of health, and has focused more broadly on both quality of life and individuals other than the patient. The question then becomes one of whether to, and if so how to, integrate these different measures to form a coherent whole. This difficulty is one that may be used by non-welfarists as a justification to return to the single objective of health maximization. Yet this seems to us inappropriate. Just because easy comparisons can be made in terms of cost/QALY gained does not mean that they should be made; other factors outside health are important in decision-making and should be included. The real world is a messy place in which the health of patients is not the only important factor in health care decision-making. One way of dealing with this would be to use the less summative method of the cost consequences approach to economic evaluation, 5 enabling the outcomes for quality of life, caring, quality of death and so on, to be set out alongside information on health outcomes, with the general aim still being to obtain the most benefit for the cost expended. Benefit is broadly defined here, and could include taking account of distributional considerations. Other possibilities for methods of combining information about different impacts on different groups should also be explored.
To conclude, the ICEPOP programme has taken a number of steps forward in expanding the evaluative space for health economics beyond that of health. Ultimately, the aim is to improve decision-making around those factors that impact on older people, as well as to improve the methodological rigour with which data for decision-making are obtained.
Acknowledgements
We would like to thank: Jackie Brown who began the work on the quality-of-life index for older people; Simone Angel and Tamara Al-Janabi for their contributions to the work of the ICEPOP programme; Tony Marley for conceptual input; Ini Grewal and Jane Lewis for their roles in the qualitative work on the quality-of-life index for older people; Lucy Natarajan, Kerry Sproston and all survey interviewers for their roles in the collection of survey data for the quality-of-life index for older people; Karen Forbes, Julian Abel, Hilary Jennings, Alison Rich, Sue Perry for their collaboration on the work on end-of-life care; Alex Fox and the Princess Royal Trust for Carers for help in distributing the survey to test and derive valuations for the Carer Experience Scale; David Kessler, Glyn Lewis, Mel Chalder, Katrina Turner, Anne Haase for assistance with mental health research collaborations; Chris Salisbury, David de Berker, Sue Horrocks, Alison Noble, for their assistance in collecting dermatology data for use in methodological work; Ray Oppong, Jesse Kigozi and Adam Tipper for work on MSc theses contributing to the general programme area; Paul Dieppe, Cherida Hopper, Carol Davies and Linda Morris for their support through the HSRC. We would also like to thank Richard Smith (chair), John Brazier, David Cohen, John Gladman, Jeremy Horwood, Richard Huxtable, Liz Lloyd, Colette Reid, Mandy Ryan and Phyllis Watkins for their active participation in the advisory group for this programme of work. Finally, we would like to thank all those informants and research participants who have freely given of their time to assist us in this research, and two anonymous reviewers for their comments on this paper. The ICEPOP programme is supported by the MRC Health Services Research Collaboration.
