Analyzing and Grading Effectiveness Research: A New Approach

Abstract

The past ten years have been marked by a brilliant period of clarity followed by complete chaos in the world of clinical research in critical care. For one brief moment, we knew how to manage critically ill, infected patients: Administer hydrocortisone to patients with an abnormal adrenocorticotropin hormone stimulation test, give recombinant human activated protein C (APC), and maintain a glucose concentration of 80–110 mg/dl. Surely thousands of deaths per year would be prevented [1–4].

Unfortunately, we were completely wrong. Instead, thousands of patients who otherwise would have lived probably died hypoglycemic and immunosuppressed deaths and hundreds of millions of dollars were wasted on an incredibly expensive therapy that did not work (APC). How?

The answer to that question lies in a fundamental misunderstanding of data generated by clinical trials. To understand this disconnect fully, we must go all the way back to the last century and the rise of high-quality, basic science medical research. This form of inquiry is undoubtedly the glory of scientific research in the past 100 years. It is based on meticulous, high-quality, repetitive experimentation and an approach that eliminates all variability except the item of interest. Inherent in this approach is the idea that the simpler the system, the more reliable the results: Studying molecules is cleaner than studying cells, which is cleaner than studying animals, which is cleaner than studying human beings. Reductionism, that is, reducing a problem to its bare essentials with minimal variability, is valuable in breaking problems down into more readily “studiable” questions and works well for basic science. Unfortunately, in the world of complex clinical care, this approach is suboptimal at best and probably hampers, if not frankly hurts, progress in human medicine.

Why is a reductionist approach hazardous to clinical trials and, thus, clinical care? First, human beings are not animals [5,6] and cannot ethically be studied as such by reproducibly inducing artificial, carefully defined diseases. Second, human beings are remarkably different from one another from both genetic and social standpoints. Third, because of the complex interactions of all influences on the morbidity and mortality of human disease (e.g., genetic, social, economic), we really do not understand the unmeasured or unknown variables that affect the outcomes of our patients. Fourth, no matter what delusions we choose to function under, randomization, placebos, blinding, and post hoc multivariable analyses do not reproduce laboratory conditions. Finally, even with randomization and analysis demonstrating a relatively similar distribution of important variables, one cannot assume that cohorts are identical, as one would with genetically identical animals.

As it turns out, no matter how one looks at it, human beings are incredibly complex creatures and the medical systems that care for them are equally so. Unfortunately, the importance of this diversity has not been appreciated fully, and forcing clinical research to conform to ideals derived from reductionist basic science approaches has already led to adverse consequences, as noted above.

Clinical research, instead, must embrace this wonderful human diversity and learn to ask not only if and why a specific intervention “works,” but also whether an intervention can improve outcomes when put into general practice [7]. The key to arriving at this understanding is an appreciation of the difference between efficacy and effectiveness research. Efficacy research, performed under highly controlled experimental conditions, demonstrates that a given specific intervention has efficacy in a relatively defined patient population. Randomized, controlled trials are the gold standard for efficacy studies. One must realize, however, that these studies really imply only that a given therapy works in some subset of the population studied. For example, if a treatment decreases the mortality of ventilator-associated pneumonia from 30% to 25%, its net benefit accrues to only 5% of the population studied. In reality, if one uses an intervention that has substantial biologic activity, such as cancer chemotherapy, the agent probably causes iatrogenic harm in some subjects but yields benefit in an even larger percentage of patients, with the net result being favorable in the overall population. Thus, all efficacy studies, including randomized, controlled trials, are preliminary proof-of-concept studies that do not completely define who benefits most or least from a new therapy.
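The arithmetic behind the 30% to 25% example can be sketched in a few lines of Python. This is only an illustration: the function names are ours, and the split between deaths averted and deaths caused in the second scenario is hypothetical, chosen solely to show that different mixtures of benefit and harm can yield the same net result.

```python
def absolute_risk_reduction(control_risk, treated_risk):
    """Difference in mortality between the control and treated groups."""
    return control_risk - treated_risk


def number_needed_to_treat(control_risk, treated_risk):
    """Patients who must be treated, on average, to avert one death."""
    arr = absolute_risk_reduction(control_risk, treated_risk)
    if arr <= 0:
        raise ValueError("No net benefit, so the number needed to treat is undefined.")
    return 1.0 / arr


def treated_mortality(control_risk, deaths_averted, deaths_caused):
    """Mortality under treatment when a fraction of the population is saved
    and another fraction is harmed (both expressed as fractions of the whole
    population; the split is hypothetical)."""
    return control_risk - deaths_averted + deaths_caused


if __name__ == "__main__":
    arr = absolute_risk_reduction(0.30, 0.25)
    nnt = number_needed_to_treat(0.30, 0.25)
    print(f"ARR = {arr:.2f}, NNT = {nnt:.0f}")

    # Two hypothetical mixtures of benefit and harm with the same net effect.
    for averted, caused in [(0.05, 0.00), (0.08, 0.03)]:
        mortality = treated_mortality(0.30, averted, caused)
        print(f"averted={averted:.2f}, caused={caused:.2f} -> mortality={mortality:.2f}")
```

With these numbers, the absolute risk reduction is 0.05 and roughly 20 patients must be treated to avert one death. Both hypothetical scenarios end at 25% mortality, which is why the trial result alone cannot identify who benefited and who was harmed.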

Effectiveness studies, on the other hand, assess the impact of an intervention on outcomes when introduced to and used in routine clinical practice [8]. These systems are, by nature, much more complex than those seen in simplified efficacy studies and are frequently influenced by variables that are largely eliminated in more controlled studies, such as patient compliance, provider acceptance, cost, interactions with other therapies, comorbid diseases that lead to exclusion from controlled trials (e.g., chronic kidney disease), and so forth. Because effectiveness studies depend on observations of large numbers of practitioners and patients without randomization, their results are considered inferior to those of randomized studies in current grading hierarchies of study validity. Nonetheless, clinical outcomes under non-study conditions will always be the measure of quality of clinical care and, thus, effectiveness data are crucial to medical practice.

In a larger sense, then, clinical research generally can be divided into efficacy and effectiveness studies, and two schemata are required to assess the quality of the data generated from clinical trials, depending on the nature of the study. Traditional grading systems work very well for efficacy studies and are hierarchical, attempting to define better and better study designs. Based on paradigms to introduce new pharmaceuticals or other interventions into practice, the following phases are accepted:

I. Safety studies

II. Small-scale exploratory studies to demonstrate promise of benefit

III. Large-scale, randomized, controlled trials to prove efficacy

IV. Post-approval trials, generally in more defined or new populations

Clearly, the properly powered, phase III, randomized, controlled study is and must remain the gold standard for assessing the efficacy of an intervention. Where this scheme falls short is in predicting how these interventions will perform when released into general clinical practice, e.g., the morbid and mortal hypoglycemia seen with intensive insulin therapy [9].

Accepting the premise that current grading schemes for clinical research are based on efficacy rather than effectiveness studies and are sometimes flawed in terms of guiding patient management, we must develop a new method of evaluating clinical research vis-à-vis clinical care. Rather than using the “scientific” reductionist approach based on methods of laboratory research when considering the quality of human trial data, we should instead first consider what evidence practicing clinicians need to convince them to change their practice in a beneficial manner.

In reality, the decision to use a particular intervention depends on multiple considerations, not just the demonstration of efficacy. For example, the impression that a therapy is either marginally safe or too expensive will discourage its use; yet, neither of these parameters is well studied in the setting of a tightly controlled efficacy trial. Rather than having a hierarchy of evidence where certain data are accorded primacy (e.g., the randomized, controlled trial), it is much more useful to construct a matrix where there are multiple bins or cells, one for each of many types of clinical data that are useful in predicting or determining effectiveness. Before any intervention can be recommended as standard of care, data must accumulate in each of the individual cells. Of note, by necessity, this type of matrix will require at least some of the cells to include studies related specifically to effectiveness, such as cluster randomized trials and large-scale observational studies (see Table 1).

Table 1.

Possible Matrix for the Evaluation of Accumulated Data To Support a Specific Intervention

Meticulous safety studies	Single center observational studies	Single center, randomized, controlled pilot trial
Multicenter, randomized, controlled efficacy study	Phase IV confirmatory studies in patient subsets	Cost-benefit analysis
Cluster randomized trials at unit, hospital, or health system level	Before-after studies of protocols including the intervention	Very large observational outcomes studies on regional or national level

Although the matrix presented in the table might be interpreted as hierarchical or even chronologic, running from upper left to lower right, it should not be interpreted that way. There must be data assignable to each cell before a given intervention can truly be considered evidence-based standard of care, and no form of study is inherently more important than any other. In addition, the evidence will almost certainly accumulate in a relatively haphazard way. For example, the blood leukoreduction work published by Hébert et al. in 2003 [10], analyzing more than 14,000 patients across Canada, can be included in the two cells labeled “Before-after studies of protocols including the intervention” and “Very large observational outcomes studies on regional or national level,” yet was conducted prior to the publication of large, randomized, controlled trials. Finally, although the lower row of cells might imply the need for massive and expensive studies, it does not, because of the observational nature of the information required. The evidence that can be used to satisfy requirements related to these cells is frequently available from extant, well-organized clinical networks, such as the Kaiser Permanente healthcare group or countries that already utilize a single-payor system. The Veterans' Affairs National Surgical Quality Improvement Program, for example, is the source of data that have already been used to assess the impact of compliance with Surgical Care Improvement Project measures on rates of surgical site infection (unfortunately, minimal) [11].
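The rule that every cell must hold data, with no cell outranking another, can be expressed as a small data structure. The following Python sketch is illustrative only: the cell labels are taken from Table 1, whereas the function names and the example intervention with its study labels are hypothetical.

```python
# Cell labels copied from Table 1. The tuple order carries no ranking.
MATRIX_CELLS = (
    "Meticulous safety studies",
    "Single center observational studies",
    "Single center, randomized, controlled pilot trial",
    "Multicenter, randomized, controlled efficacy study",
    "Phase IV confirmatory studies in patient subsets",
    "Cost-benefit analysis",
    "Cluster randomized trials at unit, hospital, or health system level",
    "Before-after studies of protocols including the intervention",
    "Very large observational outcomes studies on regional or national level",
)


def empty_cells(evidence):
    """Return the matrix cells for which no study has been recorded.

    `evidence` maps a cell label to a list of studies assigned to that cell.
    """
    return [cell for cell in MATRIX_CELLS if not evidence.get(cell)]


def is_evidence_based_standard(evidence):
    """True only when every cell of the matrix contains at least one study."""
    return not empty_cells(evidence)


if __name__ == "__main__":
    # Hypothetical intervention supported so far by two kinds of study only.
    evidence = {
        "Meticulous safety studies": ["hypothetical safety study"],
        "Multicenter, randomized, controlled efficacy study": ["hypothetical trial"],
    }
    print(is_evidence_based_standard(evidence))
    for cell in empty_cells(evidence):
        print("Still needed:", cell)
```

Because the check is a simple "all cells filled" test, evidence can be added to the mapping in any order, which matches the haphazard way in which data are expected to accumulate.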

It is instructive to note how the negative experiences of intensive insulin therapy and APC occurred; both were included in widely circulated care guidelines before definitive evidence of effectiveness (in contradistinction to efficacy) was available [12,13]. The benefits of intensive insulin therapy were first demonstrated primarily in a single-center study of a fairly narrow group of critically ill patients and under highly controlled conditions [14]. Once this promising intervention was subjected to multicenter study in other populations, however, it was found to be either not beneficial or even harmful. Yet, intensive insulin therapy had already been mistakenly adopted in many if not most intensive care units worldwide. Activated protein C was accepted widely as standard of care based on a single study [15] and before confirmatory trials in key subsets (i.e., low severity of illness, high severity of illness, children) had been completed [16]. In addition, careful cost-benefit analyses for this expensive therapy were lacking at the time of its adoption. In neither case would all the cells in the effectiveness matrix have been filled; therefore, both would have been considered experimental interventions rather than the modern standard of care that misapplied grading hierarchies for efficacy data suggested and that care guidelines incorporated.

In summary, we are approaching a new era of clinical research where both efficacy and effectiveness of interventions will be considered prior to determining what constitutes the standard of care. In order to achieve this goal, a new understanding of how to evaluate effectiveness will be required, one that will be more complicated and less hierarchical than the one currently used to judge efficacy research. How the current enthusiasm for comparative effectiveness research will be integrated into this process remains unknown.

References

1. Ranieri, Thompson, Barie, et al.; PROWESS-SHOCK Study Group. Drotrecogin alfa (activated) in adults with septic shock. N Engl J Med, 2012; 366:2055–2064.

2. Barie, Hydo, Shou, Eachempati. Efficacy of therapy with recombinant human activated protein C of critically ill surgical patients with infection complicated by septic shock and multiple organ dysfunction syndrome. Surg Infect, 2011; 12:443–449.

3. Wang, Sun, Zheng, et al. Low-dose hydrocortisone therapy attenuates septic shock in adult patients but does not reduce 28-day mortality: A meta-analysis of randomized controlled trials. Anesth Analg, 2014; 118:346–357.

4. Griesdale, de Souza, van Dam, et al. Intensive insulin therapy and mortality among critically ill patients: a meta-analysis including NICE-SUGAR study data. CMAJ, 2009; 180:821–827.

5. Bassols, Costa, Eckersall, et al. The pig as an animal model for human pathologies: A proteomics perspective. Proteomics Clin Appl, 2014 Aug 4 [Epub ahead of print].

6. Seok, Warren, Cuenca, et al. Genomic responses in mouse models poorly mimic human inflammatory diseases. Inflammation and Host Response to Injury Large Scale Collaborative Research Program. Proc Natl Acad Sci U S A, 2013; 110:3507–3512.

7. Barie. Oh Lord! I've got those clinical research blues. Surg Infect, 2004; 5:327–342.

8. Barie, Ho. The value of critical care. Surg Clin North Am, 2012; 92:1445–1462.

9. Brutsaert, Carey, Zonszein. The clinical impact of inpatient hypoglycemia. J Diabetes Complications, 2014; 28:565–572.

10. Hébert, Fergusson, Blajchman, et al. Leukoreduction Study Investigators. Clinical outcomes following institution of the Canadian universal leukoreduction program for red blood cell transfusions. JAMA, 2003; 289:1941–1949.

11. Ingraham, Cohen, Bilimoria, et al. Association of surgical care improvement project infection-related process measure compliance with risk-adjusted outcomes: implications for quality measurement. J Am Coll Surg, 2010; 211:705–714.

12. Dellinger, Levy, Carlet, et al. International Surviving Sepsis Campaign Guidelines Committee; American Association of Critical-Care Nurses; American College of Chest Physicians; American College of Emergency Physicians; Canadian Critical Care Society; European Society of Clinical Microbiology and Infectious Diseases; European Society of Intensive Care Medicine; European Respiratory Society; International Sepsis Forum; Japanese Association for Acute Medicine; Japanese Society of Intensive Care Medicine; Society of Critical Care Medicine; Society of Hospital Medicine; Surgical Infection Society; World Federation of Societies of Intensive and Critical Care Medicine. Surviving Sepsis Campaign: international guidelines for management of severe sepsis and septic shock: 2008. Crit Care Med, 2008; 36:296–327.

13. Hicks, Cooper, Webb, et al. The Surviving Sepsis Campaign: International guidelines for management of severe sepsis and septic shock: 2008. An assessment by the Australian and New Zealand intensive care society. Anaesth Intensive Care, 2008; 36:149–151.

14. van den Berghe, Wouters, Weekers, et al. Intensive insulin therapy in critically ill patients. N Engl J Med, 2001; 345:1359–1367.

15. Bernard, Vincent, Laterre, et al. Recombinant human protein C Worldwide Evaluation in Severe Sepsis (PROWESS) study group. Efficacy and safety of recombinant human activated protein C for severe sepsis. N Engl J Med, 2001; 344:699–709.

16. Barie. Current role of activated protein C therapy for severe sepsis and septic shock. Curr Infect Dis Rep, 2008; 10:368–376.