Abstract
Quality monitoring in medicine was a neglected field until the last two decades. Doctors traditionally did their best, but how good that was could not be evaluated. This situation remains in some areas of medicine, but specialties with clearly-defined interventions and outcomes have progressed in quality of care evaluation, and cardiac surgery leads the way. Measuring the risk of an intervention allows prediction of the outcome and is essential for quality monitoring: without knowing the predicted outcome, the actual outcome cannot be evaluated. Cardiac surgery risk models like EuroSCORE and others have been adopted worldwide, so that measuring risk-adjusted performance is now an integral part of the delivery of good cardiac surgical care. When mortality for a procedure is higher for one surgeon (or hospital) than another, this can be due to one of three reasons, or a combination of the three: the difference is due to chance, or variation in risk profile, or better and safer service. We now have the tools to distinguish between the above factors. We can also observe performance over time: cusum curves plot the performance of surgeons and hospitals by showing hypothetical ‘lives saved’. This provides early warning of deterioration in performance before a problem reaches statistical significance. With the appropriate tools, it is possible not only to identify a problem, but also to anticipate and thus prevent a problem from happening. Monitoring clinical performance is an exciting and rewarding field for future development, and one that will yield real improvements in patient outcomes.
Background
Usually, doctors do their best for their patients. For physicians, if medical treatment fails and the patient dies, we blame the disease, not the treatment or the doctor. It is different for surgeons: when a patient dies after an operation, the operation and the surgeon are held responsible. This is not surprising, given the strong temporal, if not causal, link between the intervention and the outcome.
As cardiac surgery began to stake its claim in the treatment of heart disease, surgeons justified their aggressive, invasive and high-profile intervention by showing that they could achieve cure or palliation for the majority with an ‘acceptable’ risk of death for the minority. Inevitably, a link was forged between operative mortality and surgical performance.
The Freedom of Information Act stipulates that data collected using public funds must be made freely available, to anyone who asks, within 20 days. As soon as the act became law in Britain in 2005, The Guardian newspaper requested the mortality figures for all cardiac surgeons, by name, for isolated coronary artery bypass grafting (CABG) and aortic valve replacement over the preceding three years. Units had to comply. A few did so willingly, many under protest and most worried about how the newspaper would present the data. Some units (Papworth included) submitted risk-stratified data with 95% confidence limits and statistical analyses. Some submitted basic risk stratification (low and high risk). Others submitted crude data. The Guardian treated the data responsibly: they published in alphabetical order (not in order of mortality), explained risk stratification and, where available, published risk data and confidence limits. 1 This was exceptional: whenever other newspapers dealt with these issues in the past, they tended to sensationalize the reports with headlines like ‘The worst hospital in Britain?’ appearing out of unreliable, unadjusted crude data. Transparency is a growing trend. In cardiac surgery, outcome data are no longer confidential. When that happens in other specialties, the press is more likely to be sensationalist than responsible.
Measuring professional performance should be done by the profession, before the media do it for us. Cardiac surgeons have crossed the threshold of a brave new world in which the measurement of clinical outcomes is no longer peripheral to our work, but an integral part of it: as important as deciding the indication for treatment, the choice of therapeutic modality and the skill with which it is administered. Moreover, the tools and mechanisms we devise and develop are likely to form the models on which the quality of care is assessed in other surgical and perhaps medical specialties.
Does operative mortality matter?
Governments and health authorities care a great deal about cost and possibly not enough about clinical outcomes. Surgeons and their patients care more about outcomes (and possibly not enough about cost). Sometime in the late 1980s, a health authority paid a large sum to a famous firm of accountants and management consultants to examine the performance of the two cardiac surgical centres in its area. After an exhaustive study the firm reported its findings, summarized in Table 1.
Table 1. Cost of cardiac surgery (£)
The accountants concluded that centre A was more efficient at routine procedures and should, therefore, be restricted to simple operations. Centre B, however, was found to be more efficient in complex and re-do surgery and should therefore be expanded as a specialist referral centre for such cases. Sadly, however, the accountants did not examine clinical outcomes. Had they done so, even to a minimal extent, they would have found that mortality rates tell a different story: the ‘efficiency’ of Centre B in complex surgery was due to the high death rate on the operating table (Table 2).
Table 2. Outcome of cardiac surgery (mortality) (%)
*Most of whom died during surgery
Operating room deaths cost little in comparison with a prolonged and difficult hospital stay, but that is no consolation for the patient, the family and the surgeon who aspire to survival. This example illustrates, in simplistic terms, the dangers of entrusting clinical performance assessment to accountants. Operative mortality is important. Of course, it is only one of many outcomes that determine the success of a procedure, others being morbidity, functional outcome, long-term survival and freedom from re-intervention. Surviving the operation, however, is the sine qua non: without it none of the other parameters can be measured. It is also the first step in establishing performance assessment, and until we have a robust method of measuring it correctly and meaningfully, attention to other areas as performance measures may be premature (Figure 1).

Figure 1. Survival is a sine qua non for other outcome measures
Crude mortality is not enough
When operative mortality is mentioned, surgeons are quick to claim that they operate on higher-risk patients than their colleagues. When mortality for a specific procedure is higher for one surgeon (or hospital) than another, this can be due to one or more of three reasons:
1. The difference is due to chance.
2. The difference is due to variation in the predicted risk (different case-mix).
3. The difference reflects better and safer service.
The problem with crude procedural mortality is that it takes no account of factors 1 and 2 above. The first can be eliminated by the appropriate use of statistical methods and the second can be taken into account by using a measure of case-mix, or risk stratification.
The impact of factor 2 should not be underestimated. Few realize that the predicted risk for first-time isolated CABG can vary 50-fold. A low-risk elective CABG carries a risk of less than 1%, whereas emergency CABG in a 90-year-old with unstable postinfarction angina supported by an intra-aortic balloon carries a risk of over 50%. Yet in the parlance of crude mortality data, they are both ‘CABG cases’.
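To make the three explanations concrete, here is a minimal sketch comparing two hypothetical series with identical crude mortality. The function names, the uniform per-patient risks and the death counts are illustrative assumptions, not data from any unit. The chance calculation assumes each operation is an independent event with the predicted probability of death.

```python
def death_count_distribution(risks):
    """Probability of 0, 1, 2, ... deaths given each patient's predicted risk.

    Assumes operations are independent (a simplifying assumption).
    """
    dist = [1.0]  # before any operation: certainly zero deaths
    for p in risks:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)  # this patient survives
            nxt[k + 1] += prob * p      # this patient dies
        dist = nxt
    return dist


def summarise(risks, observed_deaths):
    """Crude mortality, expected deaths, O/E ratio and the role of chance."""
    expected = sum(risks)
    dist = death_count_distribution(risks)
    return {
        "crude_mortality": observed_deaths / len(risks),
        "expected_deaths": expected,
        "observed_over_expected": observed_deaths / expected,
        # Probability of seeing at least this many deaths by chance alone
        "p_at_least_observed": sum(dist[observed_deaths:]),
    }


# Hypothetical series: 100 operations and 4 deaths each, different case-mix
low_risk_series = summarise([0.01] * 100, observed_deaths=4)
high_risk_series = summarise([0.05] * 100, observed_deaths=4)
print(low_risk_series)
print(high_risk_series)
```

Both series have a crude mortality of 4%. In the low-risk series this is four times the expected figure, and the probability of at least four deaths arising by chance is under 5%. In the high-risk series the same four deaths are fewer than expected.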
Despite the substantial knowledge base on risk assessment in cardiac surgery, one newspaper published league tables of CABG mortality in the UK without risk stratification. Having established that the range of CABG mortality was a highly commendable 1–4% across the country, the article began: ‘scores of patients are dying unnecessarily…’ The lesson from this is that if clinicians do not carry out outcome analysis well, others will do it for them, and do it badly.
How do we measure risk?
Risk models range from simple additive scoring systems, such as Parsonnet 2 and EuroSCORE, 3 to complex Bayesian and logistic models, such as the Society of Thoracic Surgeons (STS) database model, 4 the UK Bayesian model 5 and the EuroSCORE logistic model. 6
Additive models are easy to use, require no equipment and are simple to remember so that a quick mental calculation can be made at the bedside. They are effective for quality control in large series as well as for inter-institutional comparison. Their main weakness is in the specific prediction of risk in high-risk patients where there is a tendency to underestimate risk.
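The idea of an additive model can be sketched in a few lines. The risk factors and point weights below are hypothetical and chosen only for illustration. They are not the published Parsonnet or EuroSCORE weights, and the function name is likewise an assumption.

```python
# Hypothetical point weights for illustration only; these are NOT the
# published Parsonnet or EuroSCORE weights.
ILLUSTRATIVE_WEIGHTS = {
    "female": 1,
    "previous_cardiac_surgery": 3,
    "emergency": 2,
    "poor_left_ventricular_function": 3,
}


def additive_score(risk_factors):
    """Sum the points for every risk factor present in the patient."""
    present = set(risk_factors)  # count each factor once
    unknown = present - set(ILLUSTRATIVE_WEIGHTS)
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    return sum(ILLUSTRATIVE_WEIGHTS[factor] for factor in present)


elective_patient = additive_score(["female"])
emergency_redo = additive_score(
    ["female", "previous_cardiac_surgery", "emergency"]
)
print(elective_patient, emergency_redo)
```

Because the score is a plain sum of small integers, it can be worked out mentally at the bedside, which is the practical appeal described above.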
More complex models are better for individual risk assessment but require specialized tools. This is less of a problem with the exponential growth in the availability of information technology. EuroSCORE now offers a full logistic calculator, which can be used online or downloaded from the web for use on local computers or personal organizers (
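For orientation, a logistic risk model generally takes the standard logistic-regression form shown below. The symbols are generic notation, not the published EuroSCORE coefficients: $\beta_0$ is a constant, $\beta_i$ is the coefficient of risk factor $i$, and $x_i$ is its value for the patient (for a yes/no factor, 1 if present and 0 if absent).

```latex
\[
  \text{predicted mortality} = \frac{e^{z}}{1 + e^{z}},
  \qquad
  z = \beta_0 + \sum_{i} \beta_i x_i
\]
```

In this form the risk factors combine in the exponent rather than by simple addition of points. This is one reason often given for the better behaviour of logistic models in patients with several coexisting risk factors.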
The value of predicting mortality
A very important benefit of assessing the risk of death is to use this knowledge in determining the indication to operate. Where operation is contemplated for symptoms, this knowledge helps in weighing the symptomatic benefits against the risk. If the operation is purely for prognosis, possession of this knowledge becomes mandatory: we must never offer an operation which carries a greater risk than the risk it seeks to avert. The corollary of this is informed consent: if the surgeon needs this information to determine whether there is an indication for surgery, then the patient needs it before consenting to surgery.
The second benefit is in the assessment of the quality of care: risk prediction gives a standard, corrected for case-mix, against which the performance of hospitals, units and surgeons can be measured. Comparisons may be made for overall cardiac surgery, specific operation types and specific periods of activity. Careful use of variable life-adjusted displays (VLAD), or cusum curves, allows a large amount of information about the performance of a surgeon or unit over time to be displayed in a simple one-line graph (Figure 2), which may act as an early warning system of deteriorating performance.

Figure 2. Variable life-adjusted display (VLAD) or cusum graph showing the performance of seven cardiac surgeons over time. The lines plot outcomes (y axis) against activity (number of operations, x axis). Each rise in the graph represents an actual survivor corrected for risk (1 minus the likelihood of survival). Each drop in the graph represents an actual death corrected for risk (1 minus the likelihood of death).
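The construction of such a curve can be sketched as follows. The function name and the four-patient series are hypothetical. The increments follow the description in the caption: a survivor raises the line by the predicted risk of death, and a death lowers it by 1 minus that risk.

```python
def vlad_curve(predicted_risks, died):
    """Cumulative 'expected minus observed' deaths after each operation.

    predicted_risks: predicted probability of death for each patient, in
                     the order the operations were performed.
    died:            True if that patient died, False if they survived.
    """
    curve = []
    net_lives_saved = 0.0
    for risk, death in zip(predicted_risks, died):
        if death:
            net_lives_saved -= 1.0 - risk  # drop: 1 minus likelihood of death
        else:
            net_lives_saved += risk        # rise: 1 minus likelihood of survival
        curve.append(net_lives_saved)
    return curve


# Hypothetical series of four operations; the third patient died
risks = [0.02, 0.10, 0.50, 0.05]
outcomes = [False, False, True, False]
curve = vlad_curve(risks, outcomes)
print([round(value, 2) for value in curve])
```

The final value of the curve equals the total predicted deaths minus the actual deaths, so a line that trends upwards indicates more survivors than the risk model predicted.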
Two approaches to quality monitoring
There are two ways by which the quality of a surgical service can be observed. The first is by peer-review mechanisms, formalized into quality accreditation and the issue of good practice certificates by peers. The second is by ‘naming and shaming’ or in other words, public disclosure of outcome data, with hospitals lined up in a ‘league table’ or ‘hit parade’ according to their clinical outcomes.
League tables or ‘hit parade’
When journalists and politicians have access to information about hospital procedure numbers and mortality, they usually present the information as a league table or ‘hit parade’, with one hospital at the top (lowest mortality) and one at the bottom (highest mortality). Much of this information is already in the public domain and easily accessible. The acute interest that the media and politicians are developing in healthcare outcomes means that we shall continue to see league tables of hospitals and surgeons available to the public. Having begun in New Jersey and New York, this has already happened in the UK through the work of an organization called Doctor Foster (
Freedom of information is good and desirable, provided those who use that information interpret it intelligently and come to the correct conclusions. Simplistic league tables carry a substantial risk of misinterpretation for the following reasons:
1. Data may not be validated and may contain errors sufficiently large to affect the true position of hospitals in the tables.
2. Differences perceived by the layman may be due to chance and may vary with time.
3. Unless the tables take account of risk stratification, any conclusions from them may be invalid as a reflection of the true quality of surgical work.
Even if all the above factors are dealt with, there are two potentially damaging consequences from the publication of league tables. The first is the damage to the hospital at the ‘bottom’ of the table: if it is perceived to be ‘the worst’, it will close or stop working, with the inevitable result of the next hospital becoming ‘the worst’. Taken to its logical conclusion, we will end up with the absurd situation of only one unit (even one surgeon?) continuing to operate. The second consequence is more real and more alarming: there is no doubt that the easiest way to move up a mortality league table is to refuse to operate on high-risk patients. Since these are often the patients who stand to gain most from cardiac surgery, the human cost of such a trend will be exorbitant.
League tables or, as they are known in the United States, ‘report cards’ have already caused problems for surgeons, institutions and patients alike. Shahian and colleagues have identified gaming, refusal to operate on high-risk patients and referral to distant centres as some of these problems in their excellent overview on the experience with report cards. 7 Grunkemeier even casts doubt on the validity of existing measures of case-mix to deal with the statistical and medical complexity of cardiac surgical practice. 8 Nevertheless, the keen interest in medical outcomes displayed by governments, media and patients is unlikely to abate in the foreseeable future. As a profession, we must set the standards for the measurement of quality of care and implement the systems by which such measurement is carried out. Risk modelling is essential for this. Our risk models may not be perfect, but they are like a candle: a source of some light in the blind darkness of crude data collection. We must not reject the candle on the pretext of waiting for a future floodlight! In the meantime, all efforts continue to refine and improve risk modelling, now recognized as a scientific discipline with exciting potential.
Quality accreditation and good practice
There is a real need for quality monitoring in medicine in general. It is now totally unacceptable for clinicians to continue to operate in complete ignorance of their own performance.
Good quality surgical work requires robust knowledge of three crucial variables: what is the unit or surgeon doing (activity), what is the expected outcome (risk prediction) and what is the actual outcome (performance). In addition, there must be a preset level or band of acceptable performance, and a robust mechanism for dealing with situations where performance falls below target. In Europe, the major specialist societies have established the European Cardiovascular and Thoracic Surgery Institute of Accreditation (ECTSIA) with a mission to pioneer and implement pan-European quality monitoring with the award of good practice certificates to units with robust systems (
On another level, some hospitals are applying to the International Organization for Standardization (ISO) for recognition of quality systems in their services. This is an alternative approach which has been proven in industry and which, with some lateral thinking and innovative management, may serve hospitals well. Whatever system is used to monitor actual outcomes, there needs to be a yardstick for comparison in predicted outcomes, and that is provided by a good risk model.
What makes a good risk model?
The validation of a risk model depends on the assessment of two features: calibration and discrimination. Calibration is the accuracy of the model for predicting risk in a group of patients: if the model says that mortality in a thousand patients is likely to be 5%, and actual mortality is 5% or close to it, then the model is well calibrated. Discrimination refers to the model's ability to distinguish between low-risk and high-risk patients. If most of the deaths occur in patients that the model correctly identifies as high risk, the model has good discrimination, but if most deaths occur in patients that the model identifies as low risk, discrimination is poor. We measure discrimination using a statistic called the ‘area under the receiver operating characteristic (ROC) curve’. If this is 0.5, the model does not discriminate at all. Good discrimination begins at 0.7 and rarely exceeds 0.85. If the area under the ROC curve is 1.0, the model is no longer a risk model but a crystal ball which forecasts the future (an impossible task).
It is possible for a risk model to have good calibration but poor discrimination, and vice versa. Discrimination is more important than calibration. A model can be recalibrated or adjusted as practice improves, but if the model is built on the wrong risk factors, its discrimination cannot be improved.
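The distinction can be illustrated with a small sketch. The ten patients and both sets of predictions are invented for illustration, and the function names are assumptions. Calibration is summarized here only by the overall ratio of observed to expected deaths, which is a cruder check than a formal assessment. The ROC area is computed as the proportion of death-survivor pairs in which the patient who died had the higher predicted risk, with ties counted as one half.

```python
def observed_over_expected(risks, died):
    """Overall calibration: actual deaths divided by predicted deaths."""
    return sum(died) / sum(risks)


def roc_area(risks, died):
    """Area under the ROC curve from pairwise comparisons."""
    deaths = [r for r, d in zip(risks, died) if d]
    survivors = [r for r, d in zip(risks, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for risk_death in deaths:
        for risk_survivor in survivors:
            if risk_death > risk_survivor:
                concordant += 1.0   # model ranked this pair correctly
            elif risk_death == risk_survivor:
                concordant += 0.5   # tie: no information either way
    return concordant / (len(deaths) * len(survivors))


# Hypothetical outcomes for ten patients (1 = died, 0 = survived)
died = [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]
# Model A grades patients by risk; model B gives everyone the average risk
graded = [0.02, 0.05, 0.05, 0.20, 0.10, 0.70, 0.08, 0.40, 0.30, 0.10]
flat = [0.2] * 10

print(observed_over_expected(graded, died), roc_area(graded, died))
print(observed_over_expected(flat, died), roc_area(flat, died))
```

Both models predict two deaths where two occurred, so both are well calibrated overall. Only the graded model separates high-risk from low-risk patients (area 0.875); the flat model has an area of 0.5 and no recalibration can improve it.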
Making electricity in Chicago
EuroSCORE was derived from data on patients operated on in 1995 and was first published in 1999. It is now 10 years old, and is based on data that are even older. In the first 2–3 years after its introduction, there was a quantum improvement in cardiac surgical survival. Evidence from countries with national databases suggests that mortality in some has approximately halved, despite gradual worsening of the risk profile of patients. This phenomenal improvement in cardiac surgical outcomes appears to have happened in coronary surgery, valve surgery, combined surgery and other procedures. Yet there has been no new discovery, drug or technological wizardry to explain it. How did it come about?
In 1955, Henry Landsberger 9 analysed experiments from 1924–1932 at the Hawthorne Works (a Western Electric plant near Chicago). The company had commissioned a study to see if its workers were more productive in stronger or weaker ambient light. Productivity improved when lighting was changed in either direction and worsened when the study was finished. He concluded that the improvement was due to the workers being motivated by the interest shown in them. When other changes were made and their effect similarly monitored, such as moving work stations, a similar improvement in productivity also resulted. The term Hawthorne effect was coined to describe the improvement that occurs due to the simple introduction of the monitoring of outcomes: in other words, when you measure performance, it improves.
Until the widespread use of EuroSCORE, there was no established measure of cardiac surgeons' clinical performance. EuroSCORE provided the tool for such measurement, and performance improved.
EuroSCORE has, therefore, probably fallen victim to its own success, in that the heightened awareness of the importance of clinical outcomes has resulted in improvement which may have made the model obsolete. Can we conclude from the above that EuroSCORE is no longer useful for the assessment of today's cardiac surgical outcomes? The answer may indeed be yes, but confirming this requires new data.
In the meantime, users of EuroSCORE can be assured that it is still a valuable tool for assessing cardiac surgical risk. As most studies show, any risk model offers a set standard. Some units will perform at that standard, some will do better and some worse. The best estimate of the risk of mortality for a patient undergoing a particular procedure at a particular institution is obtained by calculating the logistic EuroSCORE and then correcting it for the performance of the unit in question. The patient should therefore be quoted a predicted mortality calculated by multiplying the patient's logistic EuroSCORE by the hospital's risk-adjusted mortality ratio (RAMR), as follows:
Predicted mortality = logistic EuroSCORE × RAMR
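The correction described above can be sketched numerically. This sketch assumes that the RAMR is the ratio of the unit's observed deaths to the deaths predicted by EuroSCORE over the same series; the text does not define it further. The function name, the cap at 100% and the figures are illustrative.

```python
def unit_adjusted_risk(logistic_euroscore, unit_observed_deaths,
                       unit_expected_deaths):
    """Patient's predicted mortality corrected for the unit's performance.

    logistic_euroscore:   the patient's predicted mortality as a fraction.
    unit_observed_deaths: actual deaths in the unit's recent series.
    unit_expected_deaths: deaths predicted by the model for that series.
    """
    if unit_expected_deaths <= 0:
        raise ValueError("expected deaths must be positive")
    # Assumption: RAMR taken as observed divided by expected deaths
    ramr = unit_observed_deaths / unit_expected_deaths
    # Cap at 1.0 so the result remains a valid probability
    return min(1.0, logistic_euroscore * ramr)


# Hypothetical unit with 30 deaths where the model predicted 50
quoted_risk = unit_adjusted_risk(0.08, unit_observed_deaths=30,
                                 unit_expected_deaths=50)
print(f"{quoted_risk:.1%}")
```

In this invented example, a patient with a logistic EuroSCORE of 8% would be quoted a risk of about 4.8%, because the unit's mortality is 0.6 times the predicted figure.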
Nevertheless, the time has come to improve the model so that it can be fit for purpose in cardiac surgery of the current era. A major study is underway to update the model and to refine the risk factors and their assessment so as to improve not just calibration, but also discrimination. Data collection has begun and it is anticipated that the new risk model will be constructed, validated and available for use within less than two years, with the aim of providing the finest and most practical risk model for cardiac surgeons and their patients everywhere.
Conclusion
In cardiac surgery, risk modelling is now an integral part of the surgical service. Risk-adjusted mortality prediction serves as an indicator of quality and prompts continual improvement initiatives. This is likely to spread to the use of additional outcome measures beyond mortality in this specialty and, with the appropriate selection of outcome measures, to other surgical and medical specialties. The field is an exciting one, with scope for creativity and research that directly benefits patients and clinicians.
