Abstract
There are several reasons to believe that Institutional Review Boards (IRBs) and Human Research Protection Programs (HRPPs) contribute to ethical research and the protection of research participants, but there are also important reasons to interrogate this belief. Determining whether IRBs and HRPPs “work” requires empirical evaluation of whether and how well they actually achieve what they were designed to do. In other words, it is critical to examine their outcomes and not only their procedures and structures. In this response to Tsan, we argue that the concept of IRB and HRPP quality entails three dimensions: (1) effectiveness, (2) procedures and structures likely to promote effectiveness, and (3) features unrelated to effectiveness but nonetheless essential, such as efficiency, fairness, and proportionality. Because not all types of quality necessarily guarantee or entail effectiveness, we suggest that broad quality assessments, including such features as regulatory compliance and other procedural measures suggested by Tsan, are unhelpful as the first step in evaluating IRBs and HRPPs. Instead, we must start with outcomes relevant to effectiveness. To do this, we launched the Consortium to Advance Effective Research Ethics Oversight (AEREO), with a mission to define and specify ways to measure relevant outcomes for research ethics oversight, empirically evaluate whether those outcomes are achieved, test new approaches to achieving them, and ultimately, develop and implement empirically based policy and practice to advance IRB and HRPP effectiveness. We describe several anticipated AEREO projects and call for collaboration between various stakeholders to more meaningfully evaluate IRBs and HRPPs.
Some interventions—like parachutes—seem to be so obviously beneficial that it is unnecessary to evaluate their effectiveness (Smith & Pell, 2003). However, there exist few true examples of interventions with such unquestionable benefit (Hayes, Kaestner, Mailankody, & Prasad, 2018). There are several reasons to believe that Institutional Review Boards (IRBs), and related research ethics committees referred to by different titles around the world, as well as the Human Research Protection Programs (HRPPs) of which they are often a part, contribute to ethical research and the protection of research participants. Yet, they are not parachutes. Research ethics oversight by IRBs and HRPPs is certainly sensible, but seemingly sensible interventions often fail to achieve their goals (Dixon-Woods et al., 2016; Prasad & Cifu, 2015). Moreover, there are several affirmative reasons to call for evaluation of IRBs and HRPPs, including evidence of their inefficiency and inconsistency (Abbott & Grady, 2011; Nicholls et al., 2015; Silberman & Kahn, 2011); underlying injustices in the conduct of IRB-approved research (Elliott, 2017; Lamkin & Elliott, 2018); persistent challenges around informed consent that may be exacerbated by IRB requirements (Jefford & Moore, 2008); evidence that IRB risk-benefit analysis is uninformed by evidence and may not reflect the risk-benefit preferences of most participants (Meyer, 2013); and limited efforts by IRBs and HRPPs to extend their reach past the review of documents to examine the day-to-day conduct of research with human participants (DeBruin, Liaschenko, & Fisher, 2011).
To determine whether IRBs and HRPPs “work,” many have called for efforts to move beyond unvalidated surrogate measures focused on their structure and process to direct measures of the outcomes achieved by IRB and HRPP oversight of research (Abbott & Grady, 2011; Beagan & McDonald, 2005; Coleman & Bouësseau, 2008; Emanuel et al., 2004; Grady, 2010; Institute of Medicine, 2002; McDonald & Cox, 2009; Nicholls et al., 2015; Resnik, 2015; Sugarman, Eckenwiler, & Emanuel, 2003; Taylor, 2007). Yet, little progress has been made over several decades toward empirically evaluating whether and how well IRBs and HRPPs actually achieve what they were designed to do. We are skeptical that the approach outlined by Tsan (2019) in his commentary, “Measuring the Quality and Performance of Institutional Review Boards,” can meaningfully advance this goal. In the following, we propose what we think is a more promising approach.
Quality and Effectiveness
Our primary concern about Tsan’s proposal is that it focuses on procedural aspects of IRB quality, when the threshold concern is, more specifically, IRB (and HRPP) effectiveness.
As we define the term here, effectiveness refers narrowly to whether and how well an intervention achieves relevant outcomes. Although certain processes and structures may be helpful or even necessary to achieving those outcomes, they may not guarantee them—and may sometimes be counterproductive. Therefore, evaluating processes and structures alone cannot be used to demonstrate effectiveness unless they have been validated as surrogate markers of the outcomes of interest (Donabedian, 1988). For example, an IRB that prevents an avoidable risk in a proposed study could be described as effective in that regard—it has at least in part achieved the outcome of participant protection by limiting participant exposure to the avoidable risk. The IRB will certainly miss avoidable risks if it lacks a process to examine study risks and a structure that ensures appropriate expertise to analyze them. However, examining the IRB’s process and structure alone will not tell us whether the IRB in fact has been successful in preventing avoidable risk, that is, the relevant outcome. Put simply, to address effectiveness, it is essential to evaluate outcomes.
The concept of quality is broader and incorporates multiple dimensions, the most important of which is effectiveness. A second dimension of quality refers to the presence and type of the procedural and structural features that may—but are not certain to—promote effectiveness, as described above. Thus, an IRB could have high-quality processes and structures but nevertheless be ineffective. However, low-quality processes and structures are unlikely to coexist with effectiveness.
A third dimension of quality refers to the manner in which effectiveness goals are achieved including, for example, features such as efficiency, fairness, and avoidance of undue burden. Thus, an IRB that prevents avoidable risk by subjecting all proposals to months of delay during review could be characterized as effective with regard to participant protection but low quality. Although ineffective interventions ideally either should be improved to become effective or be abandoned, this type of quality could also be relevant in the absence of effectiveness; indeed, if an intervention is ineffective, but remains in place, it seems especially important to ensure that it is not also burdensome and slow.
Because not all types of quality necessarily guarantee or entail effectiveness, we suggest that broad quality assessments are unhelpful as the first step in evaluating a given intervention. We must start, instead, with outcomes relevant to effectiveness.
These distinctions allow us to parse Tsan’s suggestion that IRBs should be judged based on how well they “have done in respect to what they are supposed to do.” We agree that this is how IRBs ought to be judged, but we disagree about which elements of what IRBs are “supposed to do” matter most. Tsan focuses on procedural elements like “how well and efficiently” IRBs review protocols and “how well they provide continued oversight of approved research,” as measured by whether IRBs base their review and oversight on the relevant Common Rule criteria. Although regulatory compliance is certainly something IRBs are required to achieve, dutiful application of the regulatory criteria on its own tells us nothing about whether IRBs are achieving the outcomes they were created to achieve, that is, whether they are effective. This is because the regulations themselves have not been demonstrated to be effective. In other words, regulatory compliance is a proxy for what we really care about.
IRBs do not exist simply to achieve regulatory compliance, and research ethics regulations do not exist simply to be complied with. Instead, there is substantial international agreement (Table 1) that both research ethics review committees and associated regulations exist to (a) protect the rights and welfare of research participants, (b) promote justice in research, (c) foster a culture of ethical concern among researchers and institutions, (d) maintain and promote public trust in the research enterprise, and (e) promote socially valuable, scientifically valid, ethical research. This is the foundation of what IRBs are “supposed to do” and it is their effectiveness in achieving each of these purpose-based outcomes on which both IRBs and relevant research ethics regulations should ultimately be judged. Thus, we find Tsan’s rejection of participant protection as a relevant parameter of IRB quality particularly troublesome. No matter what else an IRB successfully accomplishes, if participants are not adequately protected due to factors that the IRB could control, there should be serious concern about whether the system is working, either with regard to regulatory approaches or their application by IRBs.
Table 1. International and National Statements on the Purpose of Research Ethics Review Committees.
Note. For additional information and links to international regulations and guidelines, see ClinRegs, an online database of country-specific clinical research regulatory information (https://clinregs.niaid.nih.gov) and the International Compilation of Human Research Standards 2018 Edition compiled by the Office for Human Research Protections, U.S. Department of Health and Human Services (https://www.hhs.gov/ohrp/international/compilation-human-research-standards/index.html).
By limiting his evaluation of what IRBs are supposed to do to a set of process measures, Tsan’s approach threatens to obscure deeper and more important questions about whether what IRBs are supposed to do actually protects participants and promotes science. In contrast, examining IRB effectiveness directly may shed light on the limitations of the current regulatory framework and provide evidence to support change.
If and when IRBs and HRPPs are demonstrated to be effective, we can ask further questions related to other aspects of their quality, such as what structures and processes are associated with achieving effectiveness and how they can be incorporated into regulatory approaches, whether effectiveness is worth the costs it might entail, and how it might be achieved most efficiently and fairly. Until then, measuring process-oriented surrogates of what IRBs are supposed to do, as Tsan suggests, will not help advance the discussion around outcomes-oriented endpoints relevant to IRB, HRPP, and regulatory effectiveness.
Evaluating IRB and HRPP Effectiveness
To address these foundational questions of effectiveness related to the achievement of IRBs’ and HRPPs’ purpose-based outcomes, we launched the Consortium to Advance Effective Research Ethics Oversight (AEREO) in May 2018 (www.med.upenn.edu/aereo). AEREO brings together leaders in human subjects research oversight, research ethics, and empirical methods from academic research institutions, freestanding health systems, commercial IRBs, and government sponsors. Our mission is to define and specify ways to measure relevant outcomes for research ethics oversight, empirically evaluate whether those outcomes are achieved, test new approaches to achieving them, and ultimately, develop and implement empirically based policy and practice to advance IRB and HRPP effectiveness.
Easier said than done, of course. IRB and HRPP effectiveness is difficult to measure, as Tsan and many others have noted (Abbott & Grady, 2011; Coleman & Bouësseau, 2008; Nicholls et al., 2015; Resnik, 2015; Scherzinger & Bobbert, 2017; Sleem et al., 2010; Taylor, 2007). Isolating IRB and HRPP impact and translating their purpose into validated metrics of effectiveness is important, but metrics have proven elusive. In their absence, however, we cannot continue to treat IRBs and HRPPs like parachutes. What we can do right now is measure in an open-ended way whether and what type of value IRBs and HRPPs contribute and what shortcomings there might be in achieving the system’s goals. AEREO is in the process of developing several funding proposals aimed at these questions of value, seeking at first what has been referred to as “soft intelligence” rather than analysis of formal metrics (Martin, McKee, & Dixon-Woods, 2015).
For example, if we can identify the ways that various stakeholders view research oversight by IRBs and HRPPs to be important and helpful, and ways that it could be more so, that will provide insight into ways the system is working and areas for further intervention and improvement. This approach should be particularly useful to potentially balance evidence regarding problems in the system, which may be more visible and easier to measure than successes. Accordingly, one set of AEREO projects will engage relevant stakeholders, such as investigators, sponsors, and IRB members, about ways in which research ethics oversight could be or has been useful in their experience. Because the perspective of research participants has largely been neglected in the discussion of IRB effectiveness—despite their protection being a central goal (Nicholls et al., 2015)—we will also seek to learn from patients and participants what they want from IRBs and HRPPs and how their experiences relate to the goals of research ethics oversight (Kost et al., 2013).
Another way to make progress on evaluating effectiveness is to evaluate IRB decisions reported in “outcome letters.” Some have examined these letters to determine the extent to which IRB review focuses on various aspects of research, such as scientific validity or consent (Angell, Bryman, Ashcroft, & Dixon-Woods, 2008; Tsoka-Gwegweni & Wassenaar, 2014). Others have analyzed the extent to which IRBs justify their decisions to researchers in these letters (Clapp, Gleason, & Joffe, 2017) or use them to signal other things to researchers, such as accountability and authority (Dixon-Woods, Angell, Ashcroft, & Bryman, 2007; O’Reilly, Dixon-Woods, Angell, Ashcroft, & Bryman, 2009). However, letters reporting IRB decisions to approve, not approve, defer, or modify research can also be used to evaluate the extent to which IRBs are achieving their intended purposes. Thus, another line of contemplated AEREO research involves cataloging the ways in which IRB oversight changes the structure and design of research—compared to how it was initially conceived and proposed—and determining whether scientists, research ethicists, and participants endorse these changes as valuable in furthering participant welfare and autonomy, justice, high-quality science, and other foundational goals of the IRB/HRPP system.
These projects aim to promote progress on questions of IRB and HRPP effectiveness and may facilitate the development of metrics, such as those based on features most important to research participants or identified as valuable by other stakeholders. Related efforts could test different types of research ethics oversight to compare their respective influence on participant experience and identification of ethical concerns. Efforts could also be made to use “natural experiments” around key regulatory changes, such as the recent revisions to the U.S. Common Rule, for example, by examining participant experience before and after the new rules take effect.
The Importance of Collaboration
Moving forward, collaboration among various entities and stakeholders will be essential. Analyzing a single IRB or HRPP will be far less informative than analyzing performance across sites, and no single type of stakeholder on its own can provide adequate insight regarding the purposes and outcomes of the system. However, it can be challenging to share materials across institutions and to access stakeholders without institutional facilitation. Moreover, IRBs and HRPPs have not historically made themselves readily available as objects of study (Lynch, 2018). As Tsan notes, skeptical institutions may worry that collaboration with AEREO would simply open them up to criticism as ineffective or low quality, although that certainly is not our intention. One aim of the Consortium is therefore to systematically examine and develop approaches to help overcome some of these traditional concerns.
Our hope is that IRBs and HRPPs at a range of institutions will see the importance of joining AEREO’s efforts. In addition to advancing the ethical goals of promoting and improving effective research ethics oversight system-wide, participants in AEREO projects will have the benefit of learning from collaborating institutions, identifying areas in need of improvement, testing novel approaches, and improving evidence-based practice at their own sites. The Association for the Accreditation of Human Research Protection Programs (AAHRPP) has also agreed that participation in AEREO projects can be used toward satisfaction of its relevant accreditation standards (AAHRPP Advance Newsletter, 2018).
Conclusion
Tsan is absolutely correct that “the lack of systematic assessments of the quality and performance of IRBs nearly five decades after [the system’s] establishment is striking.” While this may be acceptable for parachutes, it is not for IRBs and HRPPs. In Tsan’s view, the current lack of adequate metrics for evaluating participant protections and the quality of IRB reviews should not prevent the assessment of the quality and performance of IRBs and HRPPs; in particular, he believes that the latter can and should be assessed separately from the former. In our view, however, the elements of quality and performance of IRBs and HRPPs that we should care about most are their effectiveness in achieving adequate participant protections and other goals of research ethics oversight. Effort spent optimizing IRB and HRPP processes that have not yet been shown to promote those ultimate ends could be wasted effort or—worse—provide false assurance that all is well with regard to participant protections. In other words, separating evaluation of participant protection and quality of IRB reviews from evaluation of the quality and performance of IRBs and HRPPs, as Tsan suggests, is unwise and likely unworkable.
That said, we share Tsan’s view that the “IRB community should continue its efforts to define parameters for human subjects protection and the quality of IRB ethics reviews.” To facilitate the development of metrics for human subjects protection, it will be necessary to somehow segregate the inherent risks involved with many types of research and with many conditions being studied from the risks that could potentially be avoided through IRB oversight. Thus, conceptual work and subsequent efforts to develop mechanisms to clearly define which risks ought to count as avoidable, and which are and are not research-related, could help substantially advance the science of evaluating IRB and HRPP effectiveness. In the meantime, we maintain that other types of evaluations of IRB and HRPP effectiveness—including those based on stakeholder perspectives and IRB impact, as described above—are both possible and important to pursue.
Ultimately, we must develop approaches that will facilitate the review and improvement of IRB and HRPP effectiveness based on the purposes and endpoints they (and related research ethics regulations) were established to achieve. Adopting an evidence-based approach to the policy and practice of research ethics oversight will help ensure that participants are appropriately protected, social trust in the research enterprise is secured, and limited resources are used judiciously. Collaborating on this endeavor will facilitate the development of best practices, standardized definitions and measures for relevant outcomes, and much-needed infrastructure for evaluating novel approaches to research ethics oversight (Nicholls, 2018). The challenges to evaluating IRB and HRPP effectiveness are great, but so is the need—and there are creative paths forward.
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Stuart Nicholls is an associate editor for the Journal of Empirical Research on Human Research Ethics. He was not involved in editorial decisions regarding this article and declares no other potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Holly Fernandez Lynch, Michelle N. Meyer, and Holly A. Taylor declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for authorship and/or publication of this article. The meeting to launch the AEREO Consortium in May 2018 was supported by the Department of Medical Ethics and Health Policy, Perelman School of Medicine, University of Pennsylvania, and the Leonard Davis Institute of Health Economics, University of Pennsylvania.
