Abstract
This article presents research undertaken as part of a wider programme of work concerned with measuring health and wellbeing for economic evaluation. The focus is on developing quality adjusted life years (QALYs) in mental health, but the issues are common across all areas of health care. The article begins by reviewing the issues of what should be valued (health or broader notions of wellbeing), how mental health and wellbeing should be described, how mental health states should be valued and who should do the valuing.
The article presents four pieces of work. The first is a re-analysis of the ONS Psychiatric Morbidity 2000 Survey to provide evidence on the relevance of generic measures across different mental health disorders. It found that common mental health problems, such as anxiety and depression, had a significant impact on the generic preference-based measure of health in the SF-6D, but psychosis and personality disorders did not. The article then presents two studies using the ratings of people experiencing the states of health. Both studies found that people experiencing different health states gave mental health greater weight than physical health compared to members of the general public trying to imagine the health states.
Finally, the article presents a study developing a condition-specific preference-based measure for calculating QALYs from an existing measure of mental health, the CORE-OM, using modern psychometric methods to construct health states amenable to valuation. It also considers a proposal to develop an entirely new QALY measure in mental health.
Introduction
The last two decades have seen the increasing use of economic evaluation to inform resource allocation in health care systems around the world. A core component of any economic evaluation is the way the benefits of the programme for health and wellbeing are measured and valued. The research presented in this article is part of a wider programme of work concerned with measuring health and wellbeing for economic evaluation. The focus here is on mental health, but the issues raised in mental health and the methods of addressing them are common across all areas of health.
The most widely used technique of economic evaluation in health care has been cost effectiveness analysis, and an increasingly applied version uses the quality adjusted life year (QALY) to assess effectiveness in units that are comparable across health care programmes. 1 The number of QALYs is calculated by multiplying a person's expected years of life by the value of their health status in each period, on a scale where full health is 1 and states equivalent to being dead are given a value of 0 (with states worse than being dead given negative values). Typically health states are valued by members of the general public using a range of techniques including: visual analogue scaling (VAS), which asks respondents to place a state on a scale running from best imaginable to worst imaginable health (or being dead); time trade-off (TTO), which requires a respondent to trade quality for quantity of life; and standard gamble (SG), which asks respondents to consider the level of risk to their life they are willing to take to return to full health.
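The QALY arithmetic described above can be illustrated with a minimal sketch. The function name `qalys` and the health profile are hypothetical and chosen only for illustration; discounting of future years, which many evaluations apply, is deliberately left out.

```python
from typing import Sequence, Tuple


def qalys(periods: Sequence[Tuple[float, float]]) -> float:
    """Sum duration (in years) times health state value over all periods.

    Values lie on a scale where 1 is full health and 0 is equivalent to
    being dead; states judged worse than being dead take negative values.
    """
    total = 0.0
    for years, value in periods:
        if years < 0:
            raise ValueError("duration must be non-negative")
        if value > 1:
            raise ValueError("a health state value cannot exceed 1 (full health)")
        total += years * value
    return total


# Hypothetical profile: 2 years in a state valued 0.6, then 3 years at 0.8.
profile = [(2.0, 0.6), (3.0, 0.8)]
total_qalys = qalys(profile)
print(round(total_qalys, 2))
```

Under these invented values, five years of life yield roughly 3.6 QALYs. An intervention that raised the value of the first two years would be credited with the difference between the two totals.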
The most commonly used measures for putting the ‘Q’ into the QALY are the generic preference-based measures of health, such as the EQ-5D 2 and SF-6D. 3 These generic measures have been adopted by agencies such as NICE as part of their reference case. 4 These instruments use descriptions of health that purport to be generic measures and so are suitable for all conditions, including mental disorders. The research reported in this article briefly considers the evidence in relation to this claim in mental health and then considers the following questions in more detail: what should be valued in mental health, how should mental health and wellbeing be described, how should it be valued and who should do the valuing?
Measuring and valuing mental health
Expenditure on mental health services represents 11% of the budget of PCT spending, which is twice as much as for cancer. Nearly one-third of people going to GP surgeries in this country have mental health problems. Mental health problems also have an enormous impact on the UK economy. It is important to have the right tools for putting the ‘Q’ into the QALY, sometimes known as health state utilities, in mental health conditions.
Reviews of the literature found that most research uses either values directly obtained from patients or valuations of vignettes that have been derived bespoke for conditions like schizophrenia, rather than one of the standardized generic preference-based measures. 5 Most of this evidence would not meet the NICE reference case. This finding is confirmed by a HTA review that found only a tiny fraction of all trials use generic measures of any kind. 6
There is increasing pressure on mental health researchers to use generic measures, but are they suitable for use in all mental health disorders? The EQ-5D, for example, has five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression, and each dimension has three levels (no problems, some problems and severe problems). There is evidence for the validity of the EQ-5D and other generic measures across many physical conditions (reviewed by Brazier et al. 7), but there are physical health conditions where this is not the case, including macular degeneration 8 and hearing loss. 9 Evidence in mental health is limited and what there is has been mixed. Generic measures (such as the SF-6D) have been shown to reflect the impact of common mental health conditions, such as mild to moderate depression and anxiety, 10 but a recent study of rehabilitation in depression and anxiety found the EQ-5D to be unresponsive to improvements that were reflected in other instruments, including the SF-6D. 11 The EQ-5D preference-based index also failed to reflect changes in a population with chronic schizophrenia. 12 The authors concluded that the physical components of this instrument are ‘over stressed in a psychiatric context’. There is currently considerable scepticism about generic preference-based measures among mental health economists.13,14 We have tried to address this concern with a re-analysis of the ONS 2000 Psychiatric Morbidity Survey.
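To make the structure of the EQ-5D descriptive system concrete, the following sketch enumerates every health state that five dimensions with three levels each can define. Writing a state as a five-digit code such as "11111" is a common labelling convention assumed here, not something specified in this article.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = severe problems

# One code per combination of levels, in dimension order.
states = [
    "".join(str(level) for level in combo)
    for combo in product(LEVELS, repeat=len(DIMENSIONS))
]

# Dimensions that describe mental health directly.
mental_dimensions = [d for d in DIMENSIONS if "anxiety" in d or "depression" in d]

print(len(states), states[0], states[-1])
print(mental_dimensions)
```

The classification defines 243 states, yet only one of its five dimensions refers directly to mental health. That is consistent with the concern that the physical components may dominate in a psychiatric context.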
The first question is what should be valued? Conventional economic theory suggests that we should be interested in the overall utility or wellbeing of all those likely to be affected by an intervention. This implies that we are interested in wellbeing as a whole, rather than any specific source of wellbeing. A narrower view has developed among many health economists that health has a special status among the different sources of utility as it represents a more fundamental attribute that is needed in order to participate in a range of activities and roles in society. 15 This limited focus on health has been questioned in health economics and in mental health it may be seen as too limiting, since by its very nature, mental health problems extend beyond conventional definitions of health.
The next question is the way mental health outcomes should be described. Health economists have often favoured generic measures in order to allow cross-programme comparison; however, this presupposes that they are valid and responsive in all groups of patients. Existing evidence suggests this is not the case. This has led to the development of more specific preference-based measures in a number of conditions. Some of this work has been part of the health and wellbeing programme (of the HSRC), including preference-based measures in urinary incontinence, 16 asthma 17 and overactive bladder. 18 However, this raises concerns about comparability between QALYs generated by different condition-specific instruments. Using the same valuation methods (and anchors) does not guarantee comparability, owing to focusing effects among respondents and preference interactions with co-morbidities. 8 We have obtained a research grant from the MRC/NIHR Methodology Board to examine the problems of developing QALYs from condition-specific measures, including a mental health measure (see later for discussion of the CORE-OM study).
The valuation techniques of TTO, SG and VAS have been used to ask members of the general public to value mental health states and some studies have successfully administered these methods in patients with mental health problems, including schizophrenia. 5 However, there are concerns that these techniques are too complex for vulnerable groups and that values may be distorted by extraneous factors, such as time preference and loss aversion, that make it difficult to ascertain the value for health per se for calculating QALYs. 19 For this reason we have been looking at the use of ordinal methods, such as ranking and discrete choice experiments, for valuing health states in this programme.20,21
The final question is about who should value the health states. There is evidence to show that general population and patient values differ in a systematic fashion, particularly where patients are being asked to value their own health state. In physical conditions, patients tend to give higher values for the same state.22,23 A common explanation for this is that patients in some way adapt to the state, whether physically or psychologically. There is little evidence on the direction and size of this divergence in mental health, if indeed it exists. Research undertaken as part of this programme provides interesting new evidence on this question (see below). However, the implications of differences between the values of the public and patients for agencies that use QALYs (such as NICE) are that they must decide whose values they want to use. 24
What the programme has done to advance the subject
The health and wellbeing programme began three years ago with my appointment at HSRC. The four studies reported below were all carried out as part of this programme. The third study was supported by the MRC and the rest through grants and studentships from other funding sources.
Modelling the impact of mental health disorders on health state values: are generic measures appropriate in all mental health disorders?
The aim was to examine the impact of mental health disorders on health state utility values (SF-6D). 25 The data-set was the Office for National Statistics (ONS) Psychiatric Morbidity Survey (2000), a random sample of 8580 adults aged 16–74 years drawn from the UK population. Diagnoses of specific neurotic disorders were assigned by ONS using answers to various sections of the CIS-R and applying algorithms based on the ICD-10 diagnostic criteria (WHO, 1992). Diagnosis of psychotic disorder was more problematic for lay interviewers and so a second interview of suspected cases was undertaken using the SCID-II. Data were also collected on background characteristics, physical conditions and the SF-12.
The analysis involved applying the SF-6D preference-based algorithm to the SF-12 data. 26 In the first instance, we estimated an OLS model to examine the impact of a broad range of mental health conditions on the SF-6D index after controlling for background variables. Statistically significant decrements of 0.075 to 0.144 on the SF-6D scale were associated with generalized anxiety disorder, mixed anxiety/depressive disorder (MAD), depressive episodes/affective disorders, obsessive compulsive disorder, panic disorder and any phobia. 25 These exceed the established clinical minimally important difference on the SF-6D scale of 0.04. 27 Furthermore, these were larger than those from self-reported physical disorders in the same population. By contrast, the decrement for dependence on any drug was significant but substantially smaller (at 0.025), and the coefficients for alcohol dependence, personality disorder and probable psychosis were not significant.
The model achieved a reasonable explanatory power, with an adjusted R-squared of 0.38. Further modelling strategies are being pursued, but these initial findings support the argument that mental health disorders account for a large decrement in health state values. They also support the results of the literature review that generic measures, such as the SF-6D, may have validity for common neurotic disorders such as depression and anxiety, but less so for more complex problems such as personality disorder and psychosis. However, these are highly aggregated data using one measure and the sample sizes are quite small for some conditions (e.g. n = 60 for psychosis).
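The logic of estimating a disorder decrement while controlling for a background variable can be sketched as follows. This is not the model fitted to the ONS data: the rows, the single covariate and the coefficients in `TRUE_BETA` are invented, and the data are noise-free so that the example is reproducible. Only the minimally important difference of 0.04 is taken from the text.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(X: List[List[float]], y: List[float]) -> List[float]:
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic rows: [intercept, age in decades, disorder indicator (0/1)].
X = [[1.0, 2.0, 0.0], [1.0, 3.0, 1.0], [1.0, 4.0, 0.0],
     [1.0, 5.0, 1.0], [1.0, 6.0, 0.0], [1.0, 7.0, 1.0]]
TRUE_BETA = [0.90, -0.01, -0.10]  # invented for illustration only
y = [sum(b * x for b, x in zip(TRUE_BETA, row)) for row in X]

beta_hat = ols(X, y)
decrement = -beta_hat[2]       # fall in the index associated with the disorder
MID = 0.04                     # minimally important difference cited in the text
exceeds_mid = decrement > MID
print(round(decrement, 3), exceeds_mid)
```

The coefficient on the indicator is the decrement in the index associated with the disorder once age is held constant. Comparing it with the minimally important difference separates statistical significance from a difference large enough to matter.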
A comparison of patient and general population weightings of EQ-5D dimensions: do patients give mental health a different relative weighting compared to the general population?
This research examined the differences between the health state valuations given by patients when they are asked to value their own current health state and those given by members of the general public when they are asked to value hypothetical health states. 28 The data-set consisted of 4137 observations on EQ-5D profiles and EQVAS obtained from 3376 patients, from a pooled data-set covering eight different conditions. Two analyses were carried out. In the first, the patient self-rated VAS was compared by health state to the general population VAS values. In the second, the patient self-rated VAS values were modelled against their self-reported EQ-5D state using OLS. The patient model regression coefficients were compared to the corresponding coefficients from the UK EQ-5D general population study using the VAS valuations of the EQ-5D states.
There was a statistically significant but small difference between the mean VAS health state values from patients and the general population values (0.64 for patient VAS, 0.654 from the population value set). The regression modelling found significant differences between the coefficients of the patient VAS model and the population VAS model for the EQ-5D health dimensions of Pain/Discomfort, Mobility and Anxiety/Depression. Anxiety/Depression had the largest impact in the patient model, whereas Pain/Discomfort had the largest impact in the general population model. The coefficient for the worst level of Anxiety/Depression was 0.191 (standard error = 0.004) in the general population VAS compared to 0.252 (0.013) for the patient VAS. For Mobility and Pain/Discomfort the order was reversed, at 0.200 (0.005) compared to 0.129 (0.028) and 0.226 (0.004) compared to 0.098 (0.011), respectively. These represent differences of 0.061, 0.071 and 0.128, respectively, which are all statistically significant and potentially important in a cost effectiveness analysis. Overall, patients give a higher weight to the mental health dimension compared to the physical health dimensions than do members of the general public, and this is in patients who have physical health problems.
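The reported differences can be reproduced from the coefficients and standard errors quoted above. The sketch below also computes a simple z statistic for each difference. That test, and the assumption that the two sets of estimates are independent, are illustrative choices made here; the text does not state which test was actually used.

```python
from math import sqrt

# (population coefficient, SE, patient coefficient, SE), as quoted in the text.
COEFS = {
    "anxiety/depression": (0.191, 0.004, 0.252, 0.013),
    "mobility": (0.200, 0.005, 0.129, 0.028),
    "pain/discomfort": (0.226, 0.004, 0.098, 0.011),
}


def compare(pop: float, pop_se: float, pat: float, pat_se: float):
    """Return (patient minus population difference, z statistic)."""
    diff = pat - pop
    # Standard error of a difference between two independent estimates
    # (independence is an assumption of this sketch).
    se_diff = sqrt(pop_se ** 2 + pat_se ** 2)
    return diff, diff / se_diff


results = {dim: compare(*vals) for dim, vals in COEFS.items()}

for dim, (diff, z) in results.items():
    print(f"{dim}: difference = {diff:+.3f}, z = {z:+.2f}")
```

A positive difference means patients weight the dimension more heavily than the general population. Under the stated assumptions, all three z statistics exceed 1.96 in absolute value, which is consistent with the reported significance.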
This study uses the VAS, which for many health economists is not a recognized technique for eliciting utility values. 1 More importantly, the EQVAS asks respondents to rate themselves on a health scale and so it is arguable as to whether it really assesses the impact of health on wellbeing at all. This criticism was addressed in the next study.
Exploring the relationship between health and happiness: do patients give mental health a different relative weighting compared to the general population?
The main objective of this study was to investigate the impact of different dimensions of health, as measured by the EQ-5D and SF-6D, on happiness, as measured by the happiness question in the SF-36. 29 The data used were the pooled results from 15 studies undertaken in ScHARR and elsewhere. There were 12,685 cases covering asthma, chest pain, healthy older women, COPD, menopausal women, IBS, ITU patients, leg reconstruction, leg ulcers, lower back pain, osteoarthritis, trauma, sleep clinic, varicose veins and non-patients. Multivariate ordered probit analysis was carried out with the happiness item in the SF-36 as the dependent variable. Several models were investigated with demographic factors, health dimensions of EQ-5D and SF-6D, their preference-based single indices and the medical conditions as independent variables.
The results indicate that the most important dimensions of health associated with happiness are concerned with mental health (the probability of reporting a high level of happiness changes by –4.6% to –35.7% in SF-6D) and vitality (–4.3% to –49%). Social functioning limitations also have negative associations (–5.7% to –8.1%). Problems in physical functioning and role limitation due to physical health have an unexpected result, as they indicate a positive association with happiness (3% to 8.5% and 1.9%, respectively) after controlling for other variables. The pain dimensions are not significant, indicating there is no association between happiness and pain when controlling for other dimensions. Actual medical condition had an independent effect that was negative for some conditions such as osteoarthritis (–6.1% and –46%).
These findings rely on a comparatively crude measure of happiness, but they suggest that the relative impact of mental health and vitality dimensions of EQ-5D and SF-6D on happiness differs from that suggested by general population valuations.2,3 This provides further evidence of a marked difference between conventional general public preferences and those of people experiencing the problems. Further work is planned to examine the relationship between health and happiness in a mixed methods study (as part of an HSRC studentship).
Developing a preference-based measure from the CORE-OM: investigating the scope of using existing measures
This is a piece of ongoing work being undertaken by a PhD student using methods developed within the programme for deriving a preference-based measure from an existing measure of health and health-related quality of life. This approach was first used to derive the SF-6D preference-based index from the SF-36, by selecting items from the latter to form a health state classification amenable to valuation. 3 The methods have advanced significantly since then with the use of Rasch analysis alongside conventional psychometric methods for selecting items. Recent examples of this work include selecting from the Asthma Quality of Life Questionnaire and Overactive Bladder Questionnaire for their respective health state classifications.17,18
The CORE-OM is a 34 item measure routinely used in many health service trusts in the UK to assess the outcome of psychological services. 30 It covers subjective wellbeing, problems and symptoms (anxiety, depression and consequences of trauma), functioning (general, close relations and social relations), and risk to self and harm to others. Preliminary analysis has suggested that most of these items are highly correlated, so that forming a health state classification with independent dimensions covering these items is not possible. This means that any orthogonal design of health states for a valuation study would include unlikely combinations of levels (i.e. combining severe problems on one item with good responses on another in a way that would not arise in practice). This has resulted in the development of an alternative to the health state classification approach used in previous studies, one that takes a Rasch-based vignette approach. The resulting subset of health states will be valued in a survey of the general public (and possibly a sample of users). The next step in this research will be to model the relationship between the values for these states and the latent variable generated by the Rasch model in order to value all states defined by the CORE-OM. This approach is exploratory, but ultimately should significantly advance the field.
Where does the subject need to go next?
There has been an increasing take-up of generic measures in mental health due to the requirements of policy-makers. However, existing measures for deriving QALYs were not designed for mental health problems, and consequently their descriptive content may not capture the impact of many mental health problems. Initial evidence suggests that generic measures may be adequate in depression and anxiety, but not in psychotic and complex conditions.
There are three areas of priority for further research that will help promote the appropriate use of QALYs in mental health: further testing of existing generic preference-based measures, developing a mental health preference-based measure for those areas that are not well-served by existing measures, and examining the policy implications of the gap between general population preferences and the experiences of people with mental health problems. Numerous additional areas require more work, such as further developments in modelling preference data, including ordinal data, but these are not limited to mental health and are beyond the scope of this article.
More research is needed to assess the extent to which existing generic measures are valid across the various mental health disorders and combinations of disorders found among mental health service users. This requires a mixed methods approach that uses qualitative methods with a small number of users across a broad range of mental health problems to assess content validity, and psychometric testing of validity and responsiveness in a large scale survey of mental health service users. The generic measures may prove valid for many conditions and this will help convince mental health service researchers to use them in their trials and so improve their value in cost effectiveness modelling. For some conditions, the generic measures may prove not to be valid, in which case more specific measures will be required.
One solution to the problem of the absence of generic measures in key trials is to map from specific measures, such as the Beck Depression Inventory, onto one of the generic preference-based measures.31,32 These mapping functions can predict generic preference-based measures with an adequate degree of accuracy for many cost effectiveness models, but a recent review of mapping studies showed that performance is variable and depends on the degree of overlap between the descriptive systems. For this reason, a proposal was developed within this programme (and subsequently funded by MRC) for developing a preference-based approach to mapping between measures that does not depend on such overlap, since values are elicited from respondents for states drawn from different measures. The aim is to value different health states on the same scale. This work is in progress and may provide a longer term solution to using different measures. However, this still leaves a concern about the impact of co-morbidities.
A further limitation is that many measures used in mental health research are not suitable for valuation. A review undertaken by the Department of Health's Expert Group on mental health outcomes identified the more widely used measures, including HONOS, CORE-OM, GHQ, BDI, Lancashire Quality of Life Scale, CAN, FACE, MHI-5 (from the SF-36) and MANSA. Many of these instruments are widely used in clinical trials and other important empirical work examining outcomes in mental health. They tend to be designed for specific mental health problems and, as argued below, this is a shortcoming in an area where service users often have multiple problems. Furthermore, they are not suitable for valuation using preference elicitation methods due to their size and complexity. This is why an important area for further work is to develop a generic preference-based mental health measure to be used across mental health rather than for any specific disorder.
A measure that focuses on one disorder or group of disorders will have limited use. Mental health service users often have more than one diagnosis, particularly those with severe and complex problems. To estimate the overall impact of mental health problems and their treatment it is important to have a generic mental health measure. This will also allow the measure to be used to make comparisons across groups of service users regardless of their specific disorder or disorders. Many individuals with mental health problems will also have physical co-morbidities and these need to be built into the measure for this reason and to allow comparison with physical health programmes. However, the emphasis will be on developing a measure focused on the consequences of mental health problems.
The first stage in developing a preference-based measure of mental health would be to develop a set of items to form a structured health state classification system amenable to valuation using the techniques described earlier. This descriptive system should reflect the views of users of mental health services and at the same time pass standard psychometric testing. There would be three components to this stage of the research: (1) in-depth qualitative interviews; (2) psychometric testing; and (3) expert judgment in the light of this evidence and the content of existing mental health instruments. The second stage would be to value health states defined by the new descriptive system using one or more of the techniques described above. Valuations would need to be obtained from the general public to meet the current NICE reference case, but should be supplemented by valuation work with users to establish the size of any divergence.
The final area for further work is to explore further the divergence between general population and patient valuation of health states. Current evidence suggests that the direction and size of divergence may differ between physical and mental health problems, and this needs to be explored further in order to understand its size and the reasons for the difference. Is it, for example, that members of the general public think they would be able to get over mental health problems? Work of this kind is being undertaken for physical health problems in patients and the general population in this programme, but it needs to be extended to mental health. This requires a mixed method approach to understand the cognitive processes at work as well as the quantities involved.
This programme of work will continue despite the closure of the HSRC, with project grants from MRC and other funding bodies and it will continue to have a focus on mental health conditions. Mental health is an area that creates major challenges for the QALY, but it is important to tackle these if mental health services are to avoid becoming a Cinderella to physical health services.
Acknowledgements
I would like to thank numerous collaborators, colleagues and students who have worked with me on this programme. These include Aki Tsuchiya, Jennifer Roberts, Katherine Stevens (MRC Fellow), Clara Mukuria (HSRC PhD Student), Yaling Yang (PhD student), Ifigeneia Mavranezouli, Rachel Mann, Julie Ratcliffe, Chris McCabe, Glenys Parry, Michael Barkham, Sarah Byford and Isabel Towers. The usual disclaimer applies.
