Abstract

As detailed in Part 1, the complexities of anginal pain and the apparent role of ‘suggestion’ as a source of pain relief among those receiving interventions served to catalyse researchers in Great Britain and the United States in the 1930s and 1940s to utilise patient blinding as a means to offset such sources of biased outcomes. As Part 2 relates, Harry Gold and his colleagues at Cornell, concerned about the ‘subconscious’ or ‘unconscious’ bias of researchers, would extend such attempts at rigour in their antianginal studies to include the blinding of researchers as well, resulting in the development – and by the 1950s, formal articulation – of the ‘double-blind’ method of clinical investigation.
Harry Gold and researcher blinding
Harry Gold (as related in Part 1) favoured limits in controlling for patient context in clinical studies, focusing on pain in the daily life of the patient, rather than performance on an office exercise treadmill test, as the key clinical endpoint to be measured in anti-anginal studies. Nevertheless, he and his colleagues attempted to extend the rigour of clinical evaluation by focusing on a second component of the patient:clinical researcher dyad – researcher enthusiasm and bias in administering interventions and especially in recording results. As a gauge of prevailing sentiment in this respect, in George LeRoy’s 1941 evaluation of papaverine, while LeRoy was proud of being the consistent, dedicated ‘one observer’ making all the interventions and evaluations in his placebo-controlled study, he admitted that he knew which patients were receiving which remedy. Though he ‘attempted to be as non-committal as possible’, LeRoy acknowledged that ‘discerning patients may well have seen my enthusiasm for these particular xanthine drugs’ (LeRoy, 1 p. 924). Likewise, pointing to the possibility for both patient and clinician bias, Joseph Riseman would remind his own readers in 1943 that ‘the clinical evaluation of the benefit of therapy is, in fact, the physician’s impression of the patient’s opinion of the response to treatment’ (Riseman, 2 p. 672). For Riseman (as described in Part 1), the solution to such potential bias was the ‘objective’ nature of the exercise-tolerance test. For Harry Gold and colleagues, it would be the blinding of the researcher.
Gold had first employed researcher blinding in a study published in 1935 of varying formulations of ether. 3 As he would later recount, in the context of the Great Depression, the question of whether (less expensive) ether out of multi-usage large drums was as effective as ether from small cans held important economic consequences. Conventional opinion was mixed, so Gold proposed in 1933 to put the question to the ‘blind test’ (quotes in original), with the administering anaesthesiologists ‘unaware of the source of the specimens except in terms of code numbers or letters’, 4 and those (presumably Gold and his colleagues) assessing the outcomes kept similarly in the dark. 5 As Shapiro and Shapiro have written, it is unclear whether Gold, by invoking the ‘blind test’, was referring back to renowned pharmacologist Torald Sollmann’s own earlier usage of the term, to the contemporary Old Gold cigarette ‘Take the Blindfold Test’ marketing campaign (as Gold’s colleague, Nathaniel Kwit, suggested decades later), or some other source.6–9 Regardless, Gold found no difference in efficacy between the two sources of ether and would become the world’s leading proponent of researcher blinding as ‘a simple expedient which insures a record free of subconscious bias’ (Gold, 10 p. 8).
For the ‘blind test’ of the efficacy of xanthines in angina, the 1937 published paper reported that in eliciting symptoms from patients, ‘to eliminate the possibility of bias, the questioner usually refrained from informing himself as to the agent that had been issued until after the patient’s appraisal … had been obtained’ (Gold, 11 p. 2175). Yet ‘usually’ belies a more complicated evolution of the study. As Arthur Shapiro uncovered through interviews with Gold and Kwit, the study had begun in 1932 (a year before the ether study would be planned) with only patients blinded. However, within a few years of running the extensive study, Gold and his colleagues determined that the informed researchers were asking ‘leading questions’ in determining the efficacy of the remedies. Thus, by the end of the study, it appears that the investigators had changed course to ensure that the evaluating clinicians were in the dark regarding what a given patient received. 6 Moreover, while the xanthine study was still under way but after the ether study, Gold and colleagues likewise applied researcher blinding to an animal study of the impact of aminophylline on experimental coronary infarcts, showing no benefit when the intervention ‘was unassisted by the intangible something which is apt to be added by an observer’s unconscious bias’.12,13 Double-blinding had been brought to angina evaluation, if not yet named as such.
Over the next decade, Gold would continue to modify the name of such double-blinding, all the while promoting its utility. In a 1943 talk on the ‘Treatment of Cardiac Pain’, in somewhat revisionist fashion, he noted of the xanthine study that ‘our experiments were made with both eyes blindfolded; the patient didn’t know what he was getting, and the doctor didn’t know what he was giving’. 14 Critiquing studies of androgens as vasodilators during the same talk, he reported: ‘I am not in the slightest degree impressed with these results. The studies were not made with the double-eye blind test’. By January of 1947, in the notes for a talk at the New York Academy of Medicine on ‘Recent Advances in Therapeutics’, we see what appears to be Gold’s first invocation of ‘the double-blind test’, to critique the absence of such rigour in evaluations of antihistamines. 15 This was followed, in a withering March 1948 critique of papaverine for cerebral vasospasm, by a slight modification of the term – now ‘double blind-test’ – and continued scepticism. 16
By that year, while the name of the methodology was still in flux, Gold and his colleagues had further refined their technique. In a favourable study of intravenous aminophylline for exertional angina, they reverted to prior nomenclature in reporting that ‘the study was conducted by the “blind” method’. 17 (While Gold was not an author on the article, an acknowledgement noted the Beth Israel Hospital (New York) authors’ ‘appreciation to Dr. Harry Gold, Chief of the Cardiovascular Research Unit, for his advice and help during the course of this study’. The study was ‘aided’ by a grant from the Council on Pharmacy and Chemistry of the American Medical Association.) In particular,

the materials for injection, 10 c.c. of a 2.4 per cent aminophylline solution and an identical quantity of physiologic saline, were prepared by a nurse, for each day, in identical syringes marked only with code numbers so that the contents were unknown to the observer as well as to the subject. (Bakst et al., 17 p. 529)

A method was devised for varying the order of the trials [no injection versus saline versus aminophylline] so that all possible combinations of the three tests were used in different sequence. The code numbers on the syringes indicated the order in which the injections were to be given. The contents of the syringes and the corresponding code numbers were noted on cards, sealed in envelopes, and kept sealed until the entire study was completed. (Bakst et al., 17 p. 529)
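For readers who find it easier to follow such a protocol in procedural form, the coding scheme quoted above can be loosely sketched in Python. This is only an illustration under stated assumptions, not a reconstruction of the investigators' actual procedure: the function name `build_schedule`, the use of a seeded random shuffle, and the way code numbers are issued are all hypothetical. Only the three test conditions, the use of every possible ordering, and the sealed record of code numbers come from the quoted description.

```python
import itertools
import random

# The three tests named in the quoted protocol.
CONDITIONS = ("no injection", "saline", "aminophylline")


def build_schedule(subject_ids, seed=0):
    """Hypothetical sketch: give each subject one ordering of the three tests.

    Returns (observer_view, sealed_key). The observer sees only code numbers;
    the key linking codes to contents stays sealed until the study is over.
    """
    rng = random.Random(seed)
    # All six possible sequences of the three tests, in shuffled order.
    orderings = list(itertools.permutations(CONDITIONS))
    rng.shuffle(orderings)

    sealed_key = {}     # code number -> syringe contents (the "sealed envelope")
    observer_view = {}  # subject -> steps as the blinded observer sees them
    next_code = 1
    for i, subject in enumerate(subject_ids):
        order = orderings[i % len(orderings)]  # cycle through every ordering
        steps = []
        for condition in order:
            if condition == "no injection":
                steps.append("no injection")  # this step cannot be masked
            else:
                # Codes rise with the order of injection, as in the quotation.
                sealed_key[next_code] = condition
                steps.append(f"syringe #{next_code}")
                next_code += 1
        observer_view[subject] = steps
    return observer_view, sealed_key
```

In this sketch, blinding rests on keeping `sealed_key` away from whoever works from `observer_view`, which mirrors the division of labour between the nurse who prepared the syringes and the observer who recorded the results.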
Approximating the rigour of the laboratory
Two years later, Gold’s team published their crossover evaluation of the vasodilator khellin (dimethoxy-methyl-furano chromone). 18 Gold would later characterise this as the first true ‘clinical pharmacology’ study, approximating the rigour of laboratory investigation, with objective measurement of symptoms recorded in a seemingly novel ‘daily report card’ system, and rigorous blinding of both patients and researchers. 19 Such blinding required a large team, with different members playing their coordinated but independent roles. From one end, as they noted:

One person received the ‘daily report cards,’ decided on changes in dosage and dispensed the supply of tablets with directions for their use. He knew what the patient had been taking but this knowledge played no part in the record of the results, for his function was neither to question patients regarding the effect of the tablets nor to record judgment; he merely assembled and filed ‘daily report cards’. (Greiner et al., 18 pp. 144–146)
The khellin study became an extended opportunity to emphasise the necessity of observer blinding. Holding up LeRoy as a cautionary example, Gold and colleagues publicly noted that an ‘evaluation of the physician’s enthusiasm (positive suggestion) on the angina of effort would be a study of some interest in itself but it seems self-evident that the physician’s enthusiasm is inadmissible in a scientific experiment’ (Greiner et al., 18 p. 153). Privately, and perhaps still stinging from LeRoy’s intimation that earlier, negative studies of xanthines may have stemmed from hospital personnel with variable ‘diagnostic acuity’, Gold noted to himself: ‘He knew what the drugs were and he apparently communicated that knowledge to his patients. It was not a true blind test’. 23 The khellin paper, featuring such then- and future stars of clinical pharmacology as Gold, Nathaniel Kwit, McKeen Cattell, Janet Travell, Theodore Greiner and Walter Modell, likewise became an opportunity to discuss and show, by example, the broader requirements for methodological rigour. For instance, whether a given patient would first receive placebo versus khellin was determined by a ‘randomized’ process, in which half received khellin first, the other half placebos first (Greiner et al., 18 p. 146). Concerns over the pitfalls of premature analysis were revealed in self-reflective fashion, with an admission of their own eagerness for an early answer, and their initial consideration of khellin as seemingly effective from this partial analysis, a conclusion that would be overturned by their full study (Greiner et al., 18 p. 146, Gold, 24 ). Extending their gaze outward, the comments section was taken up with an extended critical discussion of prevailing forms of studies, including a pointed critique of what was seemingly ‘still the most prevalent, namely, the one in which the patient receives the drug and returns after a week or two with a verbal report on impressions’ (Greiner et al., 18 p. 152). 
As they concluded: ‘This is probably evaluation at its worst’.
Stabilising the ‘double-blind’ method
The khellin crossover angina study would be paralleled and followed by two additional double-blind, placebo-controlled studies, of alpha tocopherol and heparin, respectively.25,26 For the first time among Gold and his team’s angina studies, the crossover approach would be replaced by matched-pair randomisation (in which patients were first ‘matched’ to one another by pre-established criteria, and then randomised within pairs to either the active or placebo control group). With the insufficiency of the crossover approach predicated on concern for the long-term storage of alpha tocopherol in the body, and for the impact of changing ‘environmental’ and ‘seasonal’ factors in the heparin study, the two studies represented some of the earliest uses of such matched-pair randomisation in the medical literature. 27 While Janet Travell (who would later garner additional fame as the first female Presidential physician, serving as physician to John F. Kennedy), during the planning and early implementation phases of the heparin study, referred to the ‘double blind technic’ and ‘double blind method’ to be employed, she and her co-authors referred in the published paper in April 1953 to the ‘double blindfold method’.26,28,29
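The matched-pair design described above can be illustrated with a short Python sketch. It is a modern illustration under stated assumptions rather than the procedure Gold's team followed: the function name `matched_pair_randomise`, the matching variable `attacks_per_week`, and the rule of pairing neighbours after sorting are all hypothetical, since the source says only that patients were matched by pre-established criteria and then randomised within pairs.

```python
import random


def matched_pair_randomise(patients, key, seed=0):
    """Hypothetical sketch of matched-pair randomisation.

    Patients are ranked by a matching criterion, adjacent patients are
    paired, and a random draw within each pair decides who gets the active
    drug and who gets placebo. With an odd number of patients, the last
    one is left unallocated.
    """
    rng = random.Random(seed)
    ranked = sorted(patients, key=key)  # similar patients end up adjacent
    allocation = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # the random step happens inside the pair
        allocation[pair[0]["id"]] = "active"
        allocation[pair[1]["id"]] = "placebo"
    return allocation


# Usage: four illustrative patients matched on an assumed severity measure.
patients = [
    {"id": "a", "attacks_per_week": 2},
    {"id": "b", "attacks_per_week": 9},
    {"id": "c", "attacks_per_week": 3},
    {"id": "d", "attacks_per_week": 8},
]
allocation = matched_pair_randomise(
    patients, key=lambda p: p["attacks_per_week"], seed=3
)
```

Because each pair contributes exactly one patient to each arm, the two groups stay balanced on the matching criterion without any patient having to cross from one treatment to the other.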
Terminology was clearly still being stabilised, and Gold, Travell and their colleagues would display some ambiguity concerning what exactly constituted a ‘double-blind’ study. In the 1950 khellin paper, blinding applied not to the clinician administering the treatment (though such clinicians would not be involved in the evaluation of patients), but solely to the patients and evaluators. By the time of the 1953 heparin paper, however, Travell and colleagues would note that

the double blindfold method … meant the study was conducted by a team, and that not merely the patients but also the physicians who questioned and examined them, injected the solutions and later assessed the data were unaware of the nature of the coded solution given to any particular patient. (Rinzler et al., 26 p. 439)

The whole history of therapeutics, especially that having to do with the action of drugs on subjective symptoms, demonstrates that the verdict of one study is frequently reversed by another unless one takes measures to rule out the psychic effect of a medication on the patient and the unconscious bias of the doctor. The double-blind insures this. (Cornell Conferences on Therapy, 34 p. 724)
By this time, and after nearly a quarter of a century of effort to shift the evaluation of anti-anginal therapeutics, Gold and his colleagues could seemingly claim victory, if narrowly defined. As they noted, ‘nowadays it is a rare study of coronary vasodilators that does not specify control with placebo and “double blind” evaluation of cardiac pain’ (Greiner et al., 37 p. 244). Yet, as they realised, this was only a partial victory: ‘In other areas of therapeutics, the majority of studies are so poorly designed that their data contain no indication as to the correctness of the conclusion’ (Greiner et al., 37 p. 244). Gold had his sights set beyond anti-anginals, and Part 3 of this series will examine the relationship between the ‘double-blind’ method and the advent of ‘clinical pharmacology’ more generally from the 1950s onward.
