Academy of Breastfeeding Medicine Recommendations on Changes to Classification of Levels of Evidence for Clinical Protocols

Abstract

Background

Since publishing its first protocol, Hypoglycemia, in 1999, the Academy of Breastfeeding Medicine (ABM) has relied on volunteers to develop, publish, and update clinical protocols for the international community of physicians and other health care professionals dedicated to breastfeeding. These protocols are translated by volunteers into multiple languages and made available through online links on the ABM website (www.bfmed.org) to facilitate optimal practices in breastfeeding medicine. During protocol development, the authors, under the direction of a “shepherd” from the ABM Protocol Committee, review the evidence, create an annotated bibliography, and assign levels of evidence (LOE) to each primary research or review article supporting the recommendations. In 2009, ABM protocols were accepted to the prestigious National Guideline Clearinghouse (NGC), a website that made widely available the clinical protocols meeting its strict criteria. Beginning in 2012, NGC required protocols to have LOE assigned using the U.S. Preventive Services Task Force (USPSTF) system.¹ When funding for NGC ended in 2018, the USPSTF system was essentially unavailable because it had previously been archived by the National Center for Biotechnology Information, and the Protocol Committee began to use the Oxford Centre for Evidence-Based Medicine Levels of Evidence without a formal review.² Recently, however, as the topics covered by the protocols have expanded, use of this schema has become problematic. Because much of the research conducted to determine the best strategies to support breastfeeding is observational in design, there was a general desire to review other available schemas to determine which would best support the protocols.

Methods

A subcommittee of the ABM Protocol Committee was formed to review various systems for classification of evidence, the strengths and weaknesses of their adoption, and the utility of each one given the nature of breastfeeding research. Special attention was given to the international scope of the ABM membership, volunteers, and audience for clinical protocols. The subcommittee met several times by video conference and communicated further through e-mail from July through September 2019. Articles evaluating various rating systems were reviewed, and each subcommittee member was asked to recommend one system and give their reasons. The following classification systems and articles assessing their strengths and weaknesses were reviewed: Appraisal of Guidelines for Research and Evaluation (AGREE), Grading of Recommendations, Assessment, Development and Evaluations (GRADE), National Institute for Health and Care Excellence (NICE, United Kingdom), the currently used Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence, Strength Of Recommendation Taxonomy (SORT), and USPSTF.^3–5 The committee members analyzed these grading systems based on applicability to breastfeeding research, patient-centeredness, logical approach, likelihood of greater inter-rater reliability and consistency, minimal subjectivity, bias toward clinical randomized controlled trials (RCTs), and inclusion of a wide array of observational study designs. This review enabled the committee to narrow the field to three systems: GRADE, OCEBM, and SORT (Table 1).

Table 1. Summary of Recommendations

Rating issues	SORT	GRADE	Oxford
BF evidence mainly observational (not clinical trials)	Permits higher LOE for observational (e.g., cohort designs)	Rates RCT highest	Inconsistent in assigning LOE; not all study types included (e.g., meta-analysis of case control studies)
Patient-centeredness	Permits grading strength of recommendation (A, B, or C) derived from LOE	Complex	Lower levels appear to undermine the recommendations
Geographical reach	Mainly United States; developed by AAFP; AAP used in policies	International	International
Complexity	Simple, easy to apply; lacks some categories	Very complex; may need funding to grade evidence	Complexity leads to inconsistency in rating evidence
Interpretation and reach	Easy to apply by administrators and nonresearch HCPs	Difficult to interpret	Difficult to interpret
Qualitative research	Can be considered case series	Not included	Not included
Quality Improvement	Included as level 3	Not included	Not included
Expert opinion; consensus opinion	Included as level 3 and recommendation C	Not included	Not included
Sustainability	May have limitations	May change over time	Limitations

AAFP, American Academy of Family Physicians; AAP, American Academy of Pediatrics; BF, breastfeeding; GRADE, Grading of Recommendations, Assessment, Development and Evaluations; HCPs, health care providers; LOE, level of evidence; RCT, randomized controlled trial; SORT, Strength Of Recommendation Taxonomy.

Summary of Analysis

Strength of Recommendation Taxonomy

Description

The SORT scale addresses the quality, quantity, and consistency of evidence and allows authors to rate individual studies or bodies of evidence.⁶ The taxonomy is built around the information mastery framework, which emphasizes the use of patient-oriented outcomes that measure changes in morbidity or mortality.

An A-level recommendation is based on consistent and good-quality patient-oriented evidence.

A B-level recommendation is based on inconsistent or limited-quality patient-oriented evidence.

A C-level recommendation is based on consensus, usual practice, opinion, disease-oriented evidence, or case series for studies of diagnosis, treatment, prevention, or screening.

“LOE from 1 to 3 for individual studies also are defined; to determine whether a study measuring patient-oriented outcomes is of good or limited quality, and whether the results are consistent or inconsistent between studies.”⁶
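The A, B, and C definitions above can be sketched in code. This is a minimal illustration, assuming a body of evidence is summarized by three hypothetical fields (`kind`, `good_quality`, and `consistent`); the names `EvidenceKind`, `EvidenceBody`, and `strength_of_recommendation` are invented for this example and are not part of SORT. The published algorithms⁶ remain the authoritative reference for how quality and consistency are judged.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceKind(Enum):
    """Hypothetical split used only for this sketch."""
    PATIENT_ORIENTED = "patient-oriented evidence"
    # Consensus, usual practice, opinion, disease-oriented evidence, or case series.
    OTHER = "other"


@dataclass(frozen=True)
class EvidenceBody:
    kind: EvidenceKind
    good_quality: bool  # True if the patient-oriented evidence is good quality
    consistent: bool    # True if results are consistent between studies


def strength_of_recommendation(body: EvidenceBody) -> str:
    """Map a body of evidence to A, B, or C using the definitions quoted above."""
    if body.kind is not EvidenceKind.PATIENT_ORIENTED:
        # Consensus, opinion, disease-oriented evidence, or case series.
        return "C"
    if body.good_quality and body.consistent:
        # Consistent and good-quality patient-oriented evidence.
        return "A"
    # Inconsistent or limited-quality patient-oriented evidence.
    return "B"
```

For example, a body of consistent, good-quality patient-oriented evidence maps to "A", whereas the same evidence with inconsistent results maps to "B".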

Strengths

Published algorithms⁶ make SORT easy to use to determine both the strength of a recommendation and the LOE. Inter-rater reliability is good. The reviewers agreed that these aspects are a priority for writing protocols and assigning strength of recommendations and LOE. The Protocol Committee comprised international volunteers who do not have the time or expertise to use complicated systems. The system used must function effectively and reliably for widespread adoption and ensure trust in the protocols disseminated by ABM. The SORT scale was deemed useful to clinicians and hospital administrators since it rates the evidence higher than GRADE or OCEBM for well-done prospective observational studies, or systematic reviews/meta-analyses of observational studies.

Weaknesses

The SORT schema is not as complex as the other modalities, so academicians may find it inadequate for appraising and classifying certain types of research studies. SORT is U.S.-centric: it was developed by the American Academy of Family Physicians (AAFP) and is used by the American Academy of Pediatrics (AAP). There is concern about its acceptability to physicians and organizations outside the United States of America.

Grading of Recommendations, Assessment, Development, and Evaluations

Description

“GRADE is a transparent framework for developing and presenting summaries of evidence and provides a systematic approach for making clinical practice recommendations.”⁷ The evidence for each outcome of a clinical question is given a GRADE certainty rating, ranging from very low to high. A high rating indicates the authors are confident that the estimated and true effects are similar. Evidence can be rated down or up by one or two levels. Rating down can occur based on risk of bias, imprecision, inconsistency, indirectness, and publication bias, whereas rating up can occur with a large magnitude of effect, a dose-response gradient, and confounding that would decrease the magnitude of effect. Recommendations based on the evidence are either strong or weak, in favor of or against the intervention.
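The rating arithmetic described above can be sketched as follows. This is a simplified illustration, not the GRADE procedure itself: the four level names and the starting points (randomized trials begin at high, observational studies at low) are assumptions taken from common descriptions of GRADE rather than from this passage, and the function name `grade_certainty` and its parameters are hypothetical. Real GRADE assessments rest on structured judgment for each factor, which a counter of levels cannot capture.

```python
# Certainty levels ordered from lowest to highest (assumed labels).
LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(randomized: bool, levels_down: int = 0, levels_up: int = 0) -> str:
    """Return a certainty label after rating evidence down or up.

    randomized  -- True for randomized trials (assumed to start at "high"),
                   False for observational studies (assumed to start at "low").
    levels_down -- total levels removed, e.g. for risk of bias, imprecision,
                   inconsistency, indirectness, or publication bias.
    levels_up   -- total levels added, e.g. for a large magnitude of effect
                   or a dose-response gradient.
    """
    if levels_down < 0 or levels_up < 0:
        raise ValueError("levels_down and levels_up must be non-negative")
    start = LEVELS.index("high") if randomized else LEVELS.index("low")
    # Clamp so the result never leaves the four-level scale.
    index = max(0, min(start - levels_down + levels_up, len(LEVELS) - 1))
    return LEVELS[index]
```

Under these assumptions, an observational study with a large effect and no other concerns moves from low to moderate, which illustrates why observational breastfeeding research rarely reaches the highest rating in this scheme.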

Strengths

The four LOE give more flexibility in assigning grades of evidence.⁸ There are multiple factors and criteria that are taken into account in determining LOE, including equity and resource implications, which are impressive. The system is already in use worldwide and endorsed by many organizations.

Weaknesses

The multiple factors and criteria that are taken into account in determining LOE make this system too unwieldy for ABM to use. Each decision involves a considerable amount of subjectivity, because rating evidence is inherently a subjective process, and the ratings may lack inter-rater reliability. Two persons evaluating the same body of evidence might reasonably grade the evidence differently.⁹ Furthermore, many types of research designs may not be easily categorized. ABM does not have the manpower or the expertise to make this a functional system for protocols. The system is also too heavily focused on RCTs, whereas much of the breastfeeding literature involves observational evidence, which this method automatically downgrades to low quality.^4,10,11

Oxford Centre for Evidence-Based Medicine Levels of Evidence

Description

A wide range of clinical questions about occurrence, diagnosis, prognosis, treatment benefit or harm, and screening are considered alongside five LOEs. The tabular format is meant to be a shortcut to allow clinicians, researchers, and patients to find the likely best evidence on their own. The levels do not provide a judgment about the quality of the evidence or make recommendations.

Strengths

There are many LOEs (1–5 for seven different study questions) allowing consideration of systematic reviews when available and case series when not. The system is used internationally yet allows for inclusion of local and current information.

Weaknesses

The rationale for engaging in this ascertainment process stemmed from frustration in being unable to rate much of the evidence for breastfeeding medicine. Concerns with this system included its complexity, the inability to translate the resultant number into a clinically meaningful value, and bias toward RCTs. Furthermore, the system does not provide a rating of the strength of recommendations, only the LOE used to make the recommendation. The many caveats in the system complicate it further.

OCEBM downgrades much of the observational evidence reported in breastfeeding research, and lowering LOE may make implementation of protocols less likely by those writing hospital policies and protocols but not familiar with the body of evidence available in breastfeeding medicine. There are categories of breastfeeding research not included, such as meta-analysis of case–control studies. The only systematic reviews considered are those of inception cohort designs. Case–control studies are rated 4 along with case series, and these are not equivalent LOE.^3,5,11 Finally, among ABM protocol authors and reviewers, we found that they could reasonably assign different LOE to the same study, giving rise to inconsistent inter-rater reliability.

Future considerations

The Appraisal of Guidelines for Research and Evaluation II (AGREE II)¹² is a system used for grading the actual protocol once it is written. The ABM Protocol Committee recommends that use of the AGREE II ranking system by independent evaluators be considered in the future. Upon completion of each protocol, raters can use the validated instrument to assess the quality of the development process, and the assessment can be published as an appendix. This can be done only when ABM has sufficient resources available to conduct this type of review.^3,4

Strengths

The tool is freely downloadable from the AGREE II website, with free tutorials available, and having these protocols validated might allow for more widespread adoption and enhanced trust in the clinical protocols.

Weaknesses/possible drawbacks

The system appears reasonably complicated. It is unclear how many internal or external experts would be necessary to accomplish this task. This is especially concerning considering that the clinical protocols are global in scope. This process also requires patient involvement, and it is unclear how to operationalize that globally. With no existing funding for the committee's work, the need to fund administration, translation, and recruitment for this project would limit its inception and continuance.

Protocols, including their annotated bibliographies, are already labor intensive. The annotated bibliographies are currently not being prepared in a consistent manner; however, restructuring the rating system may improve this part of the process. Having access to the annotated bibliography helps assess the literature supporting the ABM clinical protocols and has been a member benefit. Adding another layer of complexity with the AGREE system will require more time and personal contribution from onset to completion. The ABM needs to consider costs versus benefits of adding this step in the process.

Summary and Conclusions

The subcommittee recommended changing from OCEBM to SORT. The SORT system requires rating the LOE (1, 2, or 3), using three different research categories/questions (diagnosis, treatment/prevention/screening, and prognosis), as well as rating the Strength of Recommendation (A, B, or C). These ratings typically appear in a tabular format for key recommendations from the protocol. The subcommittee's recommendation was then considered by the remainder of the Protocol Committee, which voted to affirm the recommendation. This recommendation was brought to the ABM board, and was accepted and approved for adoption. ABM clinical protocols that begin the development process in or after July 2020 will use SORT for the annotated bibliography, and the level of evidence and strength of recommendation will be assigned and published as part of each clinical protocol as these protocols reach publication. Protocols currently in development or in phases of publication will continue to use OCEBM. The SORT system and the rationale for changing, as published in this special commentary, will also appear on the ABM website as a link on the protocols page.

Footnotes

Disclosure Statement

No competing financial interests exist.

Funding Information

No funding was secured for this review.

References

1. US Preventive Services Task Force. The Guide to Clinical Preventive Services: Recommendations of the US Preventive Services Task Force. Rockville, MD: Lippincott Williams & Wilkins, 2006.

2. Phillips, Ball, Sackett, et al. Oxford centre for evidence-based medicine-levels of evidence (March 2009). www.cebm.ox.ac.uk/resources/levels-of-evidence/oxford-centre-for-evidence-based-medicine-levels-of-evidence-march-2009 (accessed December 9, 2020).

3. Gopalakrishna, Langendam, Scholten, et al. Guidelines for guideline developers: A systematic review of grading systems for medical tests. Implement Sci, 2013; 8:78.

4. Maymone MBC, Gan, Bigby. Evaluating the strength of clinical recommendations in the medical literature: GRADE, SORT, and AGREE. J Invest Dermatol, 2014; 134:1–5.

5. García CAC, Alvarado KPP, Gaxiola. Grading recommendations in clinical practice guidelines: Randomised experimental evaluation of four different systems. Arch Dis Child, 2011; 96:723–728.

6. Ebell, Siwek, Weiss, et al. Strength of recommendation taxonomy (SORT): A patient-centered approach to grading evidence in the medical literature. J Am Board Fam Pract, 2004; 17:59–67.

7. BMJ Best Practice. BMJ best practice: What is GRADE? Available at https://bestpractice.bmj.com/info/us/toolkit/learn-ebm/what-is-grade/ (accessed June 28, 2020).

8. Guyatt, Oxman, Akl, et al. GRADE guidelines: 1. Introduction—GRADE evidence profiles and summary of findings tables. J Clin Epidemiol, 2011; 64:383–394.

9. Balshem, Helfand, Schünemann, et al. GRADE guidelines: 3. Rating the quality of evidence. J Clin Epidemiol, 2011; 64:401–406.

10. Irving, Eramudugolla, Cherbuin, et al. A critical review of grading systems: Implications for public health policy. Eval Health Prof, 2017; 40:244–262.

11. Gugiu, Gugiu. A critical appraisal of standard guidelines for grading levels of evidence. Eval Health Prof, 2010; 33:233–255.

12. AGREE Enterprise. Appraisal of Guidelines for Research and Evaluation (AGREE) Instrument. AGREE Enterprise. Published 2000. Available at https://www.agreetrust.org/ (accessed June 30, 2020).