Abstract
Background:
Systematic reviews help policy makers and practitioners make sense of research findings in a particular program, policy, or practice area by synthesizing evidence across multiple studies. However, the link between review findings and practical decision-making is rarely one-to-one. Policy makers and practitioners may use systematic review findings to help guide their decisions, but they may also rely on other information sources or personal judgment.
Objectives:
To describe a recent effort by the U.S. federal government to narrow the gap between review findings and practical decision-making. The Teen Pregnancy Prevention (TPP) Evidence Review was launched by the U.S. Department of Health and Human Services (HHS) in 2009 as a systematic review of the TPP literature. HHS has used the review findings to determine eligibility for federal funding for TPP programs, marking one of the first attempts to directly link systematic review findings with federal funding decisions.
Conclusions:
The high stakes attached to the review findings required special considerations in designing and conducting the review. To provide a sound basis for federal funding decisions, the review had to meet accepted methodological standards. However, the review team also had to account for practical constraints of the funding legislation and needs of the federal agencies responsible for administering the grant programs. The review team also had to develop a transparent process for both releasing the review findings and updating them over time. Prospective review authors and sponsors must recognize both the strengths and limitations of this approach before applying it in other areas.
Systematic reviews are practical tools designed to inform decision-making. They aim to help policy makers and practitioners make sense of the research evidence in a particular program, policy, or practice area by effectively identifying and synthesizing evidence across a number of related research studies. In the field of health care, for example, doctors and other health-care professionals have long relied on systematic reviews to provide practical summaries of available medical research (Institute of Medicine [IOM], 2011). In recent years, systematic reviews have gained similar practical use in diverse areas ranging from public health (Truman et al., 2000) to education (What Works Clearinghouse, 2014).
However, the link between systematic review findings and practical decision-making is rarely one-to-one. Policy makers and practitioners may use systematic review findings to help guide their decisions, but they may also rely on other information sources, research evidence, or personal judgment. Similar to other types of scientific evidence, systematic review findings are only one of many factors that ultimately shape public policy and other practical decisions (National Research Council, 2012). For this reason, the link between review findings and the ultimate policy or practical outcome of interest is typically uncertain and incomplete.
In this article, we discuss one recent effort launched by the U.S. federal government to narrow the gap between systematic review findings and practical decision-making. The Teen Pregnancy Prevention (TPP) Evidence Review was launched by the U.S. Department of Health and Human Services (HHS) in 2009 as a systematic review of the TPP literature. The study uses formal systematic review methods to identify programs with evidence of effectiveness in reducing teen pregnancy, sexually transmitted infections (STIs), and associated sexual risk behaviors. The review is a joint effort sponsored by three agencies within HHS: the Office of the Assistant Secretary for Planning and Evaluation (ASPE), the Family and Youth Services Bureau (FYSB) within the Administration for Children and Families (ACF), and the Office of Adolescent Health (OAH) within the Office of the Assistant Secretary for Health (ASH).
In many ways, the TPP Evidence Review is similar to other systematic reviews conducted across the social and health sciences. It uses standard methods for identifying, screening, and assessing individual research studies. It aims to summarize evidence in ways useful for policy makers, practitioners, and other decision makers. The review findings are made publicly available through reports, journal articles, and an HHS website (http://tppevidencereview.aspe.hhs.gov/).
As discussed in this article, what makes the TPP Evidence Review unique is that HHS has used findings from the review to determine eligibility for federal grant funding for TPP programs. The review seeks to identify programs with demonstrated evidence of effectiveness in reducing rates of teen pregnancy, STIs, and associated sexual risk behaviors. Only those programs meeting the review criteria are deemed eligible for certain streams of federal grant funding, which since 2010 have prioritized funding for evidence-based approaches to TPP. This direct connection between review findings and federal funding decisions makes the TPP Evidence Review different from several other well-known evidence-based approaches to public programs and policies, such as the Evidence-based Practice Centers Program of the Agency for Healthcare Research and Quality, the Guide to Community Preventive Services supported by the Centers for Disease Control and Prevention (CDC), and the National Institute for Health and Care Excellence in the United Kingdom. As one of the U.S. government’s first attempts to directly tie eligibility for federal grant funding to the findings of a systematic review, the TPP Evidence Review marks an important shift in the use of systematic reviews to inform funding decisions.
The purpose of this article is to describe how the TPP Evidence Review was initially developed and has unfolded over time. We begin by discussing the origins of the review and the policy context in which it was developed. We then describe the design and planning of the review, focusing specifically on the special considerations necessary to support federal funding decisions. Next, we describe the release of the review findings and the plans for future updates to the review findings. We conclude by assessing the prospects for conducting high-stakes systematic reviews in other policy and practice areas. Detailed information on the review findings, methods, and lessons learned is available elsewhere in the literature (Goesling, 2012, 2015; Goesling, Colman, Trenholm, Terzian, & Moore, 2014).
Background and Context
The TPP Evidence Review was launched in response to recent federal efforts to support the dissemination and scale-up of evidence-based approaches to TPP. The president’s fiscal year (FY) 2010 budget proposed a new tiered, evidence-based TPP Initiative comprising two programs: (1) a community-based competitive grant program and (2) a program in which the majority of funds go to a state formula grant program. The initiative was one of several strategies proposed by the Administration as part of a broader effort to support evidence-based policy making across the federal government (Haskins & Margolis, 2015).
Haskins and Baron (2011) identified two key components of the Administration’s proposed evidence-based initiatives, including the initiative on TPP. The first component involved focusing the initiatives on important social problems that have broad implications for individuals and society. The second component, which was equally important, was to identify problems that can be tied to a research literature and have been studied through rigorous empirical research. TPP was a good match for these components and was therefore selected as one of the initial evidence-based initiatives. Similar evidence-based initiatives were proposed in several other federal policy areas, namely, education, labor, and home visiting programs for low-income families.
The overarching goal of these initiatives was to maximize the impacts of limited federal dollars by funding programs that have been shown to be effective through rigorous evaluation, while simultaneously funding new and innovative program models in order to continue building the evidence base. Federal funding for social programs is not always based on evidence, yet such a focus on evidence is seen by some as having the potential to improve program performance and outcomes (Haskins & Margolis, 2015). The ongoing evidence-based initiatives also involve a commitment to investing in evaluation activities to assess whether previously observed impacts can be replicated in new settings. Such evidence can be used to inform program improvement as well as to identify programs that perhaps should no longer be funded because they are not achieving expected outcomes.
In late 2009 and early 2010, Congress authorized two TPP programs similar to what the president’s FY 2010 budget had proposed. The first was the community-based TPP program authorized through the Consolidated Appropriations Act of 2010 (P.L. 111-117, 2009). The Act provided US$100 million in FY 2010 for the TPP program to competitively fund grants to address high rates of teen pregnancy, STIs, and associated sexual risk behaviors. The program continues to receive funding and authorization through annual appropriations and is overseen by the OAH, which administers the cooperative agreement grants with state and local organizations (Kappeler & Farb, 2014). The TPP program features a “tiered evidence” structure, with two distinct tiers of grant funding: (1) 75% of the grant funding is for replicating evidence-based programs that have demonstrated evidence of effectiveness through rigorous evaluation and (2) 25% of the grant funding is for implementing research and demonstration grants to develop and evaluate innovative programs that do not have existing evidence of effectiveness. Of the latter 25% of the funding, part was used to fund a joint initiative between OAH and the CDC to implement and test community-wide, multicomponent approaches to preventing teen pregnancy and births in communities with high rates of teen births.
Additional funding for evidence-based TPP programming comes from the Personal Responsibility Education Program (PREP) authorized by Congress through the Patient Protection and Affordable Care Act (P.L. 111-148, 2010). Through PREP, FYSB within ACF of HHS provides a mix of competitive and formula block grants to U.S. states and local organizations to provide evidence-based approaches to TPP (Zief, Shapiro, & Strong, 2013). The program includes three distinct components: (1) US$55 million in annual formula grants to states to either replicate effective programs or substantially incorporate elements of effective prevention programs while including three of six adult preparation subjects mandated by Congress, (2) US$10 million in competitive PREP Innovative Strategies cooperative agreement grants, which were issued in conjunction with the OAH research and demonstration grants, and (3) US$3.25 million in grants provided for Indian tribes and tribal organizations.
Consistent with the Administration’s vision for these programs as supporting an evidence-based approach to federal policy, the legislation was specifically designed to create a close connection between funding eligibility and the supporting research evidence. The final conference report language for the FY 2010 Consolidated Appropriations Act stated that funded programs must have been “proven effective through rigorous evaluation to reduce teenage pregnancy, behavioral risk factors underlying teenage pregnancy, or other associated risk factors.” The legislative language was broader than replicating TPP approaches and allowed for the possibility of funding other types of program models, such as programs specifically targeting STIs, and youth development models that aim to impact a wide range of youth outcomes, as long as evaluation results indicated impacts on the outcomes of interest. The Senate committee report for the legislation also indicated that the conferees intended that a wide range of evidence-based programs should be eligible for these funds.
As the legislative language was finalized, HHS recognized that implementing these new programs would require a mechanism for identifying specific teen pregnancy programs that met the legislative requirement for rigorous evaluation and proven results. An interagency group of staff from HHS program and policy offices was formed to consider possible options to ensure that funding would be awarded to programs that met the legislative requirement. The group was made up of staff-level subject matter experts, senior career leadership, and political appointees, including counselors to the secretary. In seeking to design and implement the new programs in ways that would meet the legislative requirements, the HHS interagency work group considered two possible options: (1) describing the quality of evidence that would be needed to receive funding and allowing prospective grantees to make the case for the evidence of their proposed programs in their grant applications or (2) identifying specific program models that meet a certain threshold of evidence and then publicizing these programs as being eligible for federal funding.
One of the biggest concerns with the first approach was the practical constraint of having prospective grantees submit their own evidence and then relying on grant review panels to assess the quality and strength of this evidence. Grant review panels are often composed of a range of individuals with varied expertise related to the content area, program implementation, and evaluation. It would be difficult to ensure that the grant review panels assessed complex evaluation data in a systematic way. Therefore, the HHS interagency work group rejected this option.
For the second approach—identifying specific program models that meet a certain threshold of evidence—the HHS interagency work group again considered the possibility of two different options: (1) relying on the findings of existing reviews and research syntheses or (2) conducting a new HHS-sponsored review to specifically inform the new grant programs. In considering the first option, the work group was concerned about relying on existing registries from external organizations, which could be viewed as promoting a specific ideology. In addition, the group recognized that no existing review had the scope or specifications that aligned exactly with the conference report language.
As a result, the HHS interagency work group ultimately decided to launch a new systematic review of the research literature to inform the new grant programs. There was concern that this approach would result in a perceived list of HHS-endorsed programs, but this concern was counterbalanced by the need for a transparent, objective process sponsored by the federal government. By sponsoring its own review, the work group could tailor the review process to the specific needs of the new grant programs. The timeline could be customized to ensure the availability of the necessary findings at the time the new funding announcements were released. Similarly, the work group could adjust the scope of the review to focus on the specific age group (up to age 19) and specific outcomes (teen pregnancy and behavioral risk factors underlying teen pregnancy) referenced in the legislative and conference report language. Finally, the work group knew it had several successful models to follow, as a number of other federal departments and agencies had recently launched their own systematic reviews. In 2002, for example, the Institute of Education Sciences within the U.S. Department of Education had launched the What Works Clearinghouse (WWC), a large-scale systematic review of education research. Other examples available at the time included the HIV/AIDS Prevention Research Synthesis (PRS) sponsored by the CDC and the National Registry of Evidence-Based Programs and Practices (NREPP) funded by the Substance Abuse and Mental Health Services Administration.
The TPP Evidence Review was launched in summer 2009 to meet these objectives. HHS tasked Mathematica Policy Research and its partner, Child Trends, with conducting the systematic evidence review under an existing evaluation contract funded by ACF within HHS. The review process was led by ACF’s Office of Planning, Research, and Evaluation and was overseen by staff from OAH, FYSB in ACF, ASPE, the Division of Reproductive Health within the CDC, and the National Institute of Child Health and Human Development. The work to develop the review methods and criteria began in August 2009. HHS needed the review findings completed by March 2010 for release in conjunction with the OAH TPP program funding announcements.
Designing the Review
In the early planning stages, staff from the HHS interagency work group made several high-level decisions about the overall focus and goals of the TPP Evidence Review. The final program legislation described the purpose of the new grant funding as supporting the replication of effective TPP programs. Following this legislative language, HHS staff specified that the TPP Evidence Review would identify and assess evidence for individual program models, not broadly defined practices or approaches (Figure 1). This approach was necessary to give prospective grantees the specific practical guidance they needed to identify a particular program model to implement and to clearly define which models were, or were not, eligible for funding. This approach of assessing evidence only for individual program models differs from the practice of some other systematic reviews, such as the U.S. Department of Education’s WWC, which assesses evidence for both individual program models and more broadly defined practices. As discussed in greater detail later in this section, the decision to assess evidence for individual program models also had important implications for the specific methods used to conduct the review.

Figure 1. The review focused on the effectiveness of individual programs, not broad approaches.
The HHS interagency work group also defined the types of evaluation designs that would be eligible for the review. The group had debated internally whether to limit the review to experimental evaluation designs but ultimately decided that both experimental and well-implemented quasi-experimental designs should be considered. This decision was motivated by a practical need to balance the rigor of the review standards with the Congressional conference report language indicating that a wide range of programs should be eligible for funding. The group wanted prospective grantees to have choices about which programs they selected to implement. Finally, the group decided that, as with other systematic evidence reviews, the TPP Evidence Review would focus only on assessing research evidence, not evaluating programs for the age appropriateness or medical accuracy of the content of the program materials. The assessment of program content would instead be led by the individual federal agencies and offices responsible for administering the grant programs, in part because the program offices have a financial relationship with the grantees, which enables them to require any necessary changes to the program content.
Working within these broad guidelines, the review team from Mathematica then developed the specific methods and criteria used to conduct the review. In particular, the team developed a detailed strategy for (1) identifying potentially relevant studies and programs to consider for review, (2) screening identified studies against prespecified eligibility criteria, (3) assessing each eligible study for the quality and execution of its research design, and (4) synthesizing the evidence and identifying programs with demonstrated evidence of success. The review team documented these methods and criteria in a detailed review protocol, which was approved by ACF with input from the HHS interagency TPP work group at the beginning of the review process in fall 2009. The review protocol is publicly available on the TPP Evidence Review website (http://tppevidencereview.aspe.hhs.gov/).
In developing the protocol, the review team relied heavily on the methods and criteria used by existing systematic reviews and evidence assessment groups. The team drew primarily on the methods and criteria used by the WWC. The team also considered the methods and criteria used by the Cochrane Collaboration, Campbell Collaboration, Blueprints for Violence Prevention, the CDC HIV/AIDS PRS, and NREPP. Drawing on these existing resources helped establish the review as relying on credible methods.
In some cases, however, the review team had to adapt these existing methods to account for the unique features of the TPP literature. Systematic reviews such as the WWC were developed for studies of education interventions and for other fields of research. To have credibility with TPP practitioners and researchers, the TPP Evidence Review team had to consider whether the same methods made sense for assessing TPP programs and research. For example, the review standards recommended by the Cochrane Collaboration address several issues unique to medical trials, such as the blinding of research participants to intervention status. Such standards have no practical analog in studies of TPP. By adapting existing methods and criteria, the review team created its own customized review protocol designed to meet the specific needs of the TPP Evidence Review. This process of adaptation or tailoring is one example of the type of precursor or planning steps that can enhance the efficiency and ultimate usefulness of the subsequent review (Da Silva et al., In Press).
The team developed the review methods and criteria in parallel with another HHS-sponsored systematic review, one focused on home visiting programs. The Home Visiting Evidence of Effectiveness (HomVEE) review was launched by HHS in fall 2009 to conduct a systematic review of home visiting programs targeted to at-risk pregnant women and families with children from birth to age 5 (Avellar et al., 2014). HHS has used findings from HomVEE to inform funding for home visiting programs through the US$1.5 billion Maternal, Infant, and Early Childhood Home Visiting Program. The similar timelines and goals of HomVEE and the TPP Evidence Review allowed the two review teams to work in parallel by sharing resources and comparing proposed methods and criteria.
In designing the TPP Evidence Review, the team gave special consideration to the issue of transparency. As stated in a comprehensive IOM report on standards for systematic reviews, “the most important standard [in conducting a systematic review] is to be transparent in reporting what was done and why” (IOM, 2011, p. xii). For most systematic reviews, the need for transparency means providing a clear and detailed description of the overall review methods and criteria. For the TPP Evidence Review, however, the review team also expected close scrutiny of the team’s assessment of each individual program and study. Indeed, given the direct connection between the review findings and funding eligibility, it seemed likely that the team’s assessment of individual programs—why some programs met the criteria for funding whereas others did not—would garner more attention than the details of the overall review process. To address this need, the review team provided as part of its findings a detailed explanation of the team’s assessment of each individual study. These explanations indicated (1) whether the study met the review eligibility criteria, (2) the rating the study received for the quality and execution of its research design (categorized as high, moderate, or low), and (3) whether the study findings met the review criteria for evidence of program effectiveness. The specific details of these criteria are described in Goesling, Colman, Trenholm, Terzian, and Moore (2014). These individual study-level explanations became a major focus of the review and were made publicly available upon release of the review findings.
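The three elements of each study-level explanation can be pictured as a simple record. The following is a minimal illustrative sketch only, not the review team's actual data format: the class name, field names, and wording of the generated sentence are all hypothetical, and the sketch encodes no rule about how a rating or finding is determined.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class QualityRating(Enum):
    """The three rating categories named in the text."""
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"


@dataclass(frozen=True)
class StudyAssessment:
    """Hypothetical record of one study-level explanation."""
    study_id: str
    meets_eligibility: bool                        # element (1)
    quality_rating: Optional[QualityRating]        # element (2); None if not rated
    meets_effectiveness_criteria: Optional[bool]   # element (3); None if not assessed

    def explanation(self) -> str:
        """Render the record as a one-sentence, human-readable summary."""
        if not self.meets_eligibility:
            return f"{self.study_id}: did not meet the review eligibility criteria."
        rating = self.quality_rating.value if self.quality_rating else "not rated"
        if self.meets_effectiveness_criteria is None:
            evidence = "evidence of effectiveness not assessed"
        elif self.meets_effectiveness_criteria:
            evidence = "met the criteria for evidence of effectiveness"
        else:
            evidence = "did not meet the criteria for evidence of effectiveness"
        return f"{self.study_id}: eligible; design rated {rating}; {evidence}."


if __name__ == "__main__":
    # Invented example values, not drawn from the review findings.
    example = StudyAssessment("Study A (hypothetical)", True, QualityRating.HIGH, True)
    print(example.explanation())
```

Keeping one such record per study, rather than only a program-level verdict, is what would allow a reader to trace why a given program did or did not appear on the list of eligible programs.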
The review team also gave special consideration to the issue of accuracy in the review process. As with all systematic reviews, there was some risk of error at each stage of the process, from the initial literature search through the study assessments and evidence synthesis. The literature search could fail to uncover a potentially relevant study, the study screeners could wrongly exclude an eligible study, and so on. Given the high stakes of the review findings, the review team had to make special effort to ensure the accuracy of the review process and minimize the risk of error. The team was especially concerned about errors that could wrongly exclude a program from federal funding eligibility. To reduce this risk, all members of the review team received in-depth training on the review methods and criteria at the beginning of the study. The team also embedded several quality-assurance checks into the review process, for example, by using teams of two reviewers to conduct study quality assessments. These procedures increased the amount of time and resources needed to complete the review, but the added costs were necessary, given the higher stakes.
The review team also had to think practically when developing the review methods and criteria. Although HHS had specified the overall types of evaluation designs eligible for review (both experimental and quasi-experimental designs), the review team had to make many other consequential decisions, such as recommending standards for study design, quality, and relevant outcome measures. In making these decisions, the review team sought to develop review criteria that were as rigorous as possible. However, in part to meet the practical needs of the federal grant announcements, the team also had to make a realistic assessment of the current state of the research evidence. For example, the team could not require programs to show evidence from a large-scale randomized controlled trial conducted with a nationally representative sample of teens because a study of this scale is infeasible in the field of TPP research and sets an impossible standard. Similarly, the team could not limit the types of outcomes considered to pregnancies and STIs as validated by biological tests because most research in this field relies on self-reported survey data. To strike the right balance between the dual needs for rigor and practicality, the review team conceptualized programs as falling on a continuum of evidence ranging from strong to weak. The team designed the review criteria to identify those programs at the top end of the continuum. The review sought to identify programs with the “best available” evidence, not those meeting an abstract definition of the ideal or “best possible” evidence.
One particular challenge the review team faced was in developing the final synthesis stage of the review process. Review authors often use meta-analysis as the final step of a systematic review (IOM, 2011). A meta-analysis typically involves averaging or summarizing effect sizes for a common outcome measure across multiple studies. However, in the field of TPP research, most individual program models have been evaluated only once, and there is not always consistency in outcome measures or analysis methods across studies (Goesling, 2015). For these reasons, the review team had very few studies to synthesize for any one individual program model, which in turn limited the types of synthesis methods available (Valentine et al., In Press). For most of the individual program models included in the review, the team’s assessment of the evidence was necessarily limited to the findings of a single impact study. A more detailed description of the review methods is provided in Goesling et al. (2014).
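For readers unfamiliar with the averaging step that a meta-analysis involves, the following minimal sketch shows one common pooling method, a fixed-effect, inverse-variance weighted average. It is offered only as an illustration of the general technique and is not a method used in the TPP Evidence Review; the function name and all numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_effect(
    effects: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float]:
    """Fixed-effect, inverse-variance pooled estimate and its standard error.

    Each study's effect size is weighted by 1 / SE^2, so that more
    precisely estimated effects count for more in the average.
    """
    if len(effects) == 0 or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect size, and at least one study")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)


if __name__ == "__main__":
    # Invented effect sizes and standard errors for two studies of one outcome.
    estimate, se = pooled_effect([0.2, 0.4], [0.1, 0.1])
    print(f"pooled estimate = {estimate:.3f}, standard error = {se:.3f}")
```

With only one study, the pooled estimate and its standard error are simply that study's own values, which illustrates why this kind of synthesis adds little when most program models have been evaluated only once.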
Release of the Review Findings
The team conducted the review over a roughly 6-month period from fall 2009 through winter 2010. Both the initial review findings and the associated federal funding announcements were then released in spring 2010. The first federal grants from these new programs were awarded in early fall 2010, just over a year after HHS had started its initial planning.
The review findings were released in close coordination with the new federal funding announcements. The funding announcement for “Tier 1” of the OAH TPP program was released in April 2010. The announcement highlighted a group of 28 TPP programs that had met the review criteria for demonstrating evidence of effectiveness (Table 1). Prospective grantees had to choose from among these 28 programs when submitting their applications. The same programs were also highlighted in the August 2010 funding announcement for the State PREP formula grant program, and grantees were encouraged, though not required, to select one of them.
Table 1. Programs Eligible for Grant Funding.
Note. STIs = sexually transmitted infections; + = statistically significant program impact; o = no statistically significant program impact; na = not available (either not measured or did not meet review criteria).
a. Programs supported by a randomized controlled trial that met the review criteria for a high-quality rating.
More detailed information on the review findings and procedures was made publicly available on a federal website. The website provided a comprehensive list of all studies and programs considered for the review, brief descriptions of the 28 programs that met the review criteria for evidence of effectiveness, and a detailed description of the review methods and criteria. The funding announcements encouraged prospective grantees to review the website materials as they prepared their applications.
As expected, many of the initial public questions about the review findings focused on the review team’s assessment of individual programs or studies. In particular, people wanted to know why certain well-known or widely distributed TPP programs had failed to meet the review criteria for evidence of effectiveness. The website materials helped answer some of these questions, but in some cases, program developers and researchers also reached out to the review team or federal staff to request a more detailed explanation. The review team worked with federal staff to respond to these requests on a case-by-case basis.
Recognizing the possibility that researchers might bring forward new research evidence in an effort to qualify for funding, HHS also built into the funding announcement for the OAH TPP program a process for submitting new evidence. This process allowed prospective grantees to submit new, additional research evidence along with their grant applications. The review team then assessed this new evidence using the same methods and criteria used for the initial review of the evidence. In the end, only three applicants chose to submit additional evidence, and none of this evidence met the review criteria for design quality or evidence of effectiveness.
Over time, HHS received additional questions about the broader review process as well as some criticism of the review methods and criteria. For example, the Coalition for Evidence-Based Policy, a former nonprofit organization that was active at the time in encouraging the use of research in public policy making, released a short report describing the review as an “excellent first step” but arguing for higher standards in identifying programs with demonstrated evidence of effectiveness (Coalition for Evidence-Based Policy, 2010). In particular, the Coalition argued for requiring programs to have demonstrated evidence of sustained, long-term impacts on teen pregnancy rates from at least one well-implemented randomized controlled trial. The Coalition noted that only 2 of the 28 programs initially identified by the review met this higher standard. The other 26 programs were supported by evidence of shorter term impacts, evidence from a well-implemented quasi-experimental design, or evidence of effects on measures of STIs or sexual risk behaviors (not pregnancy). On the basis of this criticism, the Coalition recommended both (1) that prospective grant applicants think carefully about the supporting research evidence when selecting from among the group of 28 eligible programs and (2) that HHS use the new grant funding as an opportunity to further strengthen the evidence base and identify additional program models meeting a higher standard of evidence.
Other critics have taken the opposite position that the review places too much emphasis on randomized controlled trials and similar program impact evaluations. For example, Schalet et al. (2014) contend that federal support for adolescent sexual and reproductive health should encompass a broader range of scientific evidence, such as research on gender; economic inequalities; and lesbian, gay, bisexual, transgender, queer, and questioning youth. Similarly, some practitioners have criticized HHS’s decision to focus the review only on research evidence, not on the content of program information and services (SIECUS, 2014). They believe that issues such as the age appropriateness and medical accuracy of program content and materials should be addressed as a key part of the TPP Evidence Review, not through a separate process overseen by individual federal program offices as part of the grant funding process.
In part to address these questions and concerns, HHS has taken additional steps since the initial release of the review findings to further clarify the purpose and goals of the review as well as to provide further information on the review methods and criteria. For example, in December 2010, HHS and the review team from Mathematica hosted a public webinar to describe the details of the review process and findings. The webinar also provided an opportunity for public questions and comment. The review team has disseminated information on the review through professional conference presentations as well as research briefs and journal articles intended to supplement the main review findings. In 2012, ASPE released a research brief summarizing six key lessons from the review process, such as the importance of building on existing systematic reviews, engaging study authors and outside experts, and considering whether programs have sufficient implementation materials available to allow for broader dissemination (Goesling, 2012).
Most recently, in summer 2014, HHS launched a new stand-alone website for the TPP Evidence Review to make information on the review process and findings more easily accessible (http://tppevidencereview.aspe.hhs.gov/). The website is designed to address the needs of diverse stakeholders, including practitioners looking to select a program model to implement in their community (with or without federal funds), program model developers looking to identify gaps in the range of existing models, and researchers in the field of TPP. The website includes information about the review methods, criteria, and findings, along with answers to frequently asked questions. For the programs that have met the review criteria for evidence of effectiveness, the website also includes a searchable database of program models by program type, length, target population, and outcomes impacted. The website is motivated in part by a growing recognition that systematic review findings alone are not enough to meet the needs of practitioners and other key stakeholders, and that translating the review findings into practice requires the availability of implementation planning systems, “technical packages” (Haegerich, David-Ferdon, Noonon, Manns, & Billie, In Press), and other types of information necessary to help organizations identify, select, and implement evidence-based programs and practices (Paulsell, Thomas, Monahan, & Seftor, In Press).
Updating the Review Findings
Since the release of the initial review findings in spring 2010, the TPP Evidence Review team has conducted additional rounds of study assessments to incorporate more recent research. The first update to the TPP Evidence Review was released in spring 2012, covering research released from January 2010 through January 2011. This update identified three new programs meeting the review criteria for evidence of effectiveness, increasing the total number of identified program models from the 28 featured in the initial grant announcements to an expanded group of 31. In summer 2014, HHS released a second update to the review findings. This update covered research released from January 2011 through April 2013 and identified another four programs meeting the review criteria for evidence of effectiveness, bringing the total number of identified programs to 35. A third update of review findings was released in February 2015 and identified two additional programs meeting the review criteria, for a total of 37 program models identified as having evidence of effectiveness (Table 1). Results from a fourth update to the review were released in early 2016 (Lugo-Gil et al., 2016).
The purpose of these periodic updates is to keep the review findings as current as possible. By keeping the review findings up to date, HHS aims to establish the TPP Evidence Review as an ongoing available resource for any federal department or agency developing a new program or making funding decisions. HHS also views the review updates as a broader public resource, providing information on TPP research to state and local governments, youth-serving professionals, and other interested organizations. The review updates have also helped to identify and highlight continued gaps in the evidence base, such as a lack of replication studies and the need for evidence-based programs for Latino youth and high-risk populations, such as youth involved in the child welfare or juvenile justice systems (Goesling et al., 2014). The need for programs targeted to high-risk groups is especially important, since serving these groups is often a high priority for the state and local organizations that apply for federal grant funding.
However, these periodic updates to the review findings have also presented new challenges. For one, such updates to the review findings have the potential to create both positive and negative incentives for future research. On the positive side, the plan for regular updates to the review findings has the potential to encourage more and higher quality research, as program developers and researchers seek to design and conduct high-quality studies that can meet the review criteria. On the negative side, the review may also encourage certain types of reporting bias if program developers and researchers become fearful of reporting findings that may hurt the standing of their programs in the review. In addition, once a program has successfully met the review criteria, program developers and researchers may be reluctant to pursue additional research on the model, for fear of jeopardizing the program’s standing in the review and eligibility for federal grant funding. In part to counteract these potential negative incentives, HHS built evaluation requirements into the new federal grant programs. For example, the first round of funding for the OAH TPP program required a subset of grantees to conduct rigorous experimental or quasi-experimental impact evaluations of their program (Kappeler & Farb, 2014). The evaluation requirements also address the potential for reporting bias, for example, by requiring grantees to develop prespecified analysis plans and to fully report on all prespecified outcomes.
With each successive update to the review findings, the main goal of the updating process has also shifted. For the TPP Evidence Review, the initial review of the evidence and first three updates to the review findings focused primarily or exclusively on identifying new program models and expanding the range of programs that have met the review criteria for evidence of effectiveness. Although this focus on identifying new program models will continue, the review team must also spend an increasing amount of time and attention on updating assessments of program models previously reviewed. As discussed earlier in this article, most TPP programs have been evaluated only once, often in small-scale efficacy trials conducted in closely managed settings. However, due in part to the evaluation requirements HHS included with the new federal grant programs, there is now a growing body of replication research that seeks to test how these programs perform when taken beyond their original research studies and implemented on a broader scale, in new settings, and with different populations. The TPP Evidence Review will use findings from these studies to update its assessments of program models previously reviewed. In some cases, these updated assessments may show an increasing strength of evidence if the new studies are successful in replicating a program’s initial evidence of positive effects. In other cases, the updated assessments will show less evidence of success, and HHS will need to determine whether programs without a successful replication will remain eligible for federal funding. We provide a more detailed discussion of replication studies and their implications for the evidence base in Goesling (2015).
Finally, ongoing updates to the review findings also require periodically revisiting and considering possible changes to the review methods and criteria. For one, the research evidence covered in the review eventually becomes outdated. To keep the review findings up to date, review authors must update the eligibility criteria to highlight the most current research. For example, the eligibility criteria for the TPP Evidence Review were recently updated to require programs to have at least one study conducted within the past 20 years. As long as a program meets this criterion, the full body of research evidence for the program is considered for the review. For example, if a program had one study conducted in the early 1990s but another conducted in the early 2000s, the review team considers evidence from both studies when assessing evidence of program effectiveness. However, programs for which the only available evidence is more than 20 years old are now excluded from future updates to the review.
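The 20-year recency criterion described above can be illustrated with a minimal sketch. This is not the review team's actual procedure. The function name, the use of whole calendar years, the choice of a reference year, and the treatment of a study exactly 20 years old as still eligible are all assumptions made for illustration.

```python
# Illustrative sketch only; names and boundary handling are assumptions,
# not the TPP Evidence Review's documented implementation.

RECENCY_WINDOW_YEARS = 20  # "within the past 20 years", as stated in the text


def studies_considered(study_years, review_year):
    """Return the study years considered for one program.

    If at least one study falls within the recency window, every study
    for the program is considered. Otherwise the program is excluded
    and an empty list is returned.
    """
    has_recent_study = any(
        review_year - year <= RECENCY_WINDOW_YEARS for year in study_years
    )
    return sorted(study_years) if has_recent_study else []


# A program with studies in the early 1990s and early 2000s keeps both.
print(studies_considered([1992, 2002], review_year=2015))
# A program whose only study is more than 20 years old is excluded.
print(studies_considered([1992], review_year=2015))
```

The point of the sketch is that the recency test is applied to the program as a whole, not to each study: one sufficiently recent study keeps the older evidence in scope.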
As another example, the criteria for the TPP Evidence Review were also recently updated to include outcome-specific assessments of program effectiveness—for example, assessing a program’s impact on teen pregnancy outcomes separately from its impact on outcomes related to STIs or sexual risk behaviors. On the basis of these outcome-specific assessments, a program may be identified as having positive impacts on one type of outcome but null or no evidence for other outcomes. These changes to the review criteria are intended to help differentiate the evidence supporting different programs and to allow both program funders and local communities the opportunity to prioritize certain programs on the basis of the evidence for different outcomes.
Prospects for Future High-Stakes Systematic Reviews
What are the prospects for conducting high-stakes systematic reviews in other policy and practice areas? The experience of the TPP Evidence Review demonstrates both the benefits and challenges of directly connecting funding decisions to the findings of a systematic review. On the positive side, the use of systematic review findings can help ensure that funding decisions are grounded in the latest scientific research evidence. It can offer a more objective and rigorous alternative to expert opinion and more informal research syntheses, and it can help ensure a high level of transparency and accountability in the decision-making process. The use of systematic review findings in funding decisions can also help create positive incentives for future research, by highlighting current gaps in the evidence base and by encouraging program developers and researchers to conduct more and higher quality studies in an effort to meet the review criteria.
High-stakes systematic reviews may be particularly effective when dealing with sensitive cultural or political issues. In the field of TPP, much of the past 20 years of public policy discussion has centered on a highly polarized debate about the relative drawbacks or merits of abstinence education versus more comprehensive approaches to sexuality education (cf., Advocates for Youth, 2007; Kim & Rector, 2010). The TPP Evidence Review represented an effort to move beyond this debate, by focusing on programs with the strongest evidence of effectiveness within the field, regardless of program content or approach. As Haskins and Margolis (2015) note, shifting the focus to evidence does not necessarily mean the end of the political debate. For example, funding for the OAH TPP program is subject to annual appropriations bills within the U.S. Congress and therefore tied to broader Congressional debates about budget priorities and overall levels of federal spending. As Haskins and Margolis (2015, p. 99) put it, this “legislative process gives no special dispensation to legislation to establish and fund programs simply because they are evidence based.” However, in the context of culturally and politically sensitive issues like TPP, there is likely more common ground in discussions of evidence than in discussions of the content or theoretical basis of different programs or approaches.
Using systematic review findings to inform funding decisions requires a well-established body of supporting research evidence. The TPP Evidence Review benefited from the widespread use of program evaluations in TPP research, particularly a large and growing number of randomized controlled trials. This evidence supplied a strong foundation on which to base the review findings and the resulting federal funding decisions. The use of systematic review findings in funding decisions does not necessarily require randomized controlled trials per se. In the TPP Evidence Review, for example, the review team also considered evidence from higher quality quasi-experimental designs. However, the strength of the underlying evidence will ultimately determine the strength of the resulting review findings and level of confidence in the subsequent funding decisions. Systematic review findings are unlikely to provide a sound basis for funding decisions in the context of new or still emerging bodies of research.
Other notable challenges of this approach include the significant time and resources required to conduct the review, the need to balance methodological rigor against the practical needs of whatever funding decision the review is intended to inform, and the possibility of encouraging reporting bias or other adverse incentives for future research. These challenges add time and cost to nearly every aspect of the review process from the initial design and planning to the dissemination and future updating of review findings. Indeed, experience from the TPP Evidence Review suggests that much of the work begins only after initial release of the review findings: answering questions, further disseminating information and findings, and planning for future updates and refinements. The sponsors and authors of future high-stakes systematic reviews must carefully plan for these challenges before committing to link the review findings to practical funding decisions.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a U.S. Department of Health and Human Services contract with Mathematica Policy Research (HHSP23320095642WC); however, the views expressed here do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the U.S. government.
