Abstract
Purpose:
This case study discusses Mathematica’s experience providing large-scale evaluation technical assistance (ETA) to 65 grantees across two cohorts of Teen Pregnancy Prevention (TPP) Program grants. The grantees were required to conduct rigorous evaluations with specific evaluation benchmarks. This case study provides an overview of the TPP grant program, the evaluation requirements, the ETA provider and other key stakeholders, and the ETA provided to the grantees. Finally, it discusses the successes, challenges, and lessons learned from the effort.
Conclusion:
One important lesson learned is that there are two related evaluation features, strong counterfactuals and insufficient target sample sizes, that funders should attend to prior to selecting awardees because they are not easy to change through ETA. In addition, if focused on particular outcomes (for TPP, the goal was to improve sexual behavior outcomes), the funder should prioritize studies with an opportunity to observe differences in these outcomes across conditions; several TPP grantees served young populations, and sexual behavior outcomes were not observed or were rare, limiting the opportunity to observe impacts. Unless funders are attentive to weeding out evaluations with critical limitations during the funding process, requiring grantees to conduct impact evaluations supported by ETA might unintentionally foster internally valid, yet underpowered studies that show nonsignificant program impacts. The TPP funder was able to overcome some of the limitations of the grantee evaluations by funding additional evidence-building activities, including federally led evaluations and a large meta-analysis of the effort, as part of a broader learning agenda.
The Office of Adolescent Health (OAH), recently merged into the Office of Population Affairs, at the U.S. Department of Health and Human Services (HHS) contracted with Mathematica to provide evaluation technical assistance (ETA) to the first and second cohorts of Teen Pregnancy Prevention (TPP) Program grantees. 1 Grantees conducting rigorous evaluations received extensive ETA to ensure their research designs could meet the HHS TPP Evidence Review standards (Goesling et al., 2014), the benchmark for the field. The goals of this investment in evaluation, and in technical assistance to support evaluation, were to (1) build evaluation capacity among grantees, evaluators, and federal staff and (2) broaden and enhance the evidence base of TPP programs with dozens of new, internally valid studies. OAH intended that this effort would produce credible evidence of program effectiveness by showing that a subset of programs had favorable and statistically significant impacts on behavioral outcomes of youth (i.e., sexual initiation, sex with condoms or other contraceptives, pregnancy, and births).
This case study discusses our experience providing large-scale ETA, working with 65 grantees across the cohorts that were required to conduct rigorous evaluations with specific and clear evaluation benchmarks. The case study provides an overview of the motivation for the TPP grant program as well as a description of the grantees, their evaluation requirements and funding, the ETA provider and other key stakeholders, and the ETA provided to the grantees. Finally, it discusses the successes, challenges, and lessons learned from the effort.
Policy Background/Context for the Grant Program
Although the birth rate for teenagers continues to drop nationwide, large disparities persist in teen birth rates across some populations (Martin et al., 2019). Teen pregnancy rates among Hispanic and Black teens are approximately double those of White teens, and teen pregnancy rates in the South are much higher on average than in the Northeast. For instance, teen pregnancy rates are roughly three times as high in Alabama, Arkansas, Tennessee, and West Virginia as they are in Connecticut, Massachusetts, New Hampshire, and New Jersey. In addition, there are some “hot spots” where counties’ teen birth rates are markedly higher than the local or state average (Amin et al., 2017). Furthermore, rates of sexually transmitted diseases (STDs) are high among adolescents. The Centers for Disease Control and Prevention (2017) estimates that approximately one quarter of sexually active adolescent females have an STD.
In 2010, federal funds established the Office of Adolescent Health, including US$110 million to fund what became the first cohort of the TPP Program (Kappeler & Farb, 2014). The creation of this office and the establishment of the TPP Program was intended to support the implementation and evaluation of programming designed to continue the overall decline in teen pregnancy rates, reduce disparities in teen pregnancy by race/ethnicity and geography among other characteristics, and lower rates of STDs among adolescents. The TPP Program provides funding to organizations to implement and evaluate evidence-based sexual health programming, or generate new evidence of program effectiveness, as part of a larger federal effort to use and create evidence through tiered-evidence grant programs (Council of Economic Advisers, 2014).
OAH funded its first two cohorts of TPP grantees in 2010 and 2015, respectively. The 5-year grants provided for a planning year, about 3 years for implementation and data collection, and roughly a year for analysis and reporting. In July 2017, 2 years into the second cohort of the TPP grant program, grantees were notified by HHS that their projects would end in June 2018, after only the third year of the 5-year grant period (Haskins & Kane, 2018). In September 2018, the ETA contract was defunded as well.
The termination of grantee funding disrupted plans for program implementation and the feasibility of completing an in-process 5-year impact evaluation. Several lawsuits were filed on behalf of the TPP grantees in early 2018; by June, rulings from these lawsuits required HHS to process the year four applications from these grantees as if the agency had not originally shortened the programs (Hellman, 2018). The ETA contract was not refunded. The 1-year period of uncertainty caused some grantees to end sample enrollment earlier than expected or drop long-term follow-up data collection. These grants were scheduled to end in July 2020; however, most have received no-cost extensions, and it remains to be seen what evidence they will be able to produce from these interrupted evaluations.
In 2018 and 2019, HHS reissued funding opportunity announcements for 3-year TPP grants; however, these grants were not intended to support rigorous impact evaluations. In the remainder of the article, we focus on the ETA offered to the first two cohorts of TPP grantees who conducted rigorous impact evaluations and the results of the first cohort.
Description of Grantees and Their Programs
OAH’s funding approach differed for the two cohorts of grantees. In Cohort 1, grantees were required to deliver programs deemed effective by the TPP Evidence Review (Tier 1) or promising programs (Tier 2), with most of the funding designated for implementing evidence-based programs (Farb & Margolis, 2016). The Tier 1 replication and evaluation grants were intended to test the effectiveness of programs previously determined to be evidence based that were implemented with new populations or in new settings. The Tier 2 programs were intended to generate evidence of effectiveness for new/innovative programs that appeared to be promising but lacked evidence of program effectiveness from an impact evaluation. Cohort 1 comprised 102 TPP grantees—75 Tier 1 grantees and 27 Tier 2 grantees. Tier 1 grantees that received more than $1,000,000 per year and all Tier 2 grantees were required to conduct rigorous evaluations. Mathematica provided ETA to the 41 TPP Cohort 1 grantees conducting rigorous evaluations (15 Tier 1 and 26 Tier 2 grantees) who were not pulled in to be part of one of the two national evaluations being conducted under separate federal contracts.
In Cohort 2, the TPP Program had a broader set of funding categories within its Tier 1 (replication) and Tier 2 (innovation) approaches. In this second cohort, the Tier 1 grants were split into (1A) approaches focused on funding organizations to help build the capacity of youth-serving organizations to implement evidence-based TPP programs and (1B) approaches focused on direct service provision of evidence-based TPP programs among settings and populations with the greatest needs. A key difference in Cohort 2 was that rigorous impact evaluations were not required for any of these Tier 1 replication grants. For the Tier 2 innovation grants, Tier 2A grants were offered to promote the development of innovative programs, and Tier 2B and 2C grants were provided to conduct rigorous impact evaluations of promising programs. Tier 2B focused on broadly promising programs, and Tier 2C focused on programs designed for males. Only 24 grantees receiving Tier 2B or Tier 2C grants under Cohort 2 conducted a rigorous evaluation and received ETA. 2
The grantees and evaluations receiving ETA were diverse in many ways, and this diversity affected the ways in which technical assistance (TA) was provided (see the section on ETA framework for more specifics). In both cohorts, grantees included state and local agencies, community-based organizations, universities, hospitals, and other organizations. In Cohort 1, grantees implemented 33 different programs, and in Cohort 2, they implemented 24 different programs. Grantees provided sexual health programming in schools during the school day, in schools after school hours, in community-based settings, in clinics, and, in Cohort 2, online. Programs lasted from a single session to several years. Grantees were required to serve youth aged 12–19, with some limited exceptions.
Evaluation Requirements and Expectations for Strength of Applications
As mentioned earlier, a key goal of this investment in evaluation was to produce studies of TPP programs with evidence of effectiveness that meet the TPP Evidence Review standards. The TPP Evidence Review systematically assesses the quality of evidence for studies in the TPP field. 3 The review process assesses the rigor of the study design and evaluation implementation to determine whether the study is of sufficient quality to be considered causal evidence.
To achieve the goal of meeting TPP Evidence Review standards, grantees conducting an impact evaluation had to implement a randomized controlled trial or a quasi-experimental design with a comparison group and were required to work with the ETA provider throughout the course of the evaluation. Nearly all of the evaluations in both cohorts were randomized controlled trials, with roughly half of the evaluations randomizing youth and half randomizing clusters (i.e., teachers, classrooms, or groups of youth in community-based organizations). The expected sample sizes were anywhere from 100 to 12,000 youth.
OAH was interested in identifying programs that impact youths’ ultimate behavior and not simply their attitudes, intentions, or knowledge (Farb & Margolis, 2016). Therefore, grantees were required to assess program impacts on behavioral outcomes (for instance, sexual initiation or risky sexual behavior), which are outcomes of interest for the TPP Evidence Review.
Furthermore, OAH encouraged grantees to use independent evaluators with experience conducting rigorous evaluations of similar size and scope, in particular evaluators with substantive experience and capacity to conduct real-time data collection, achieve high response rates on surveys measuring behavioral outcomes, conduct and report credible analyses and findings, and publish papers and present them at professional conferences. It was likely challenging for grant panels to assess these qualifications in some cases, given strong grant writing, space constraints, or a panel member’s lack of knowledge about what is needed to execute a large-scale evaluation.
Funding Available for Evaluation
For Cohort 1, OAH recommended 20–25% of the grant be used for evaluation activities. For Cohort 2, grantees were expected to earmark an adequate amount of grant funds for evaluation activities, but OAH did not explicitly suggest an exact amount. However, for both cohorts, the evaluation received the same dollar amount of funding each year of the 5-year grant (in other words, the evaluation was expected to spend the same amount in the planning year as in the middle of the evaluation when incurring most of the data collection costs). To the extent permitted by its grants’ office, OAH allowed the grantees to carry over unused funds from 1 year to the next to try to address spending variation across years.
In both cohorts, OAH issued cooperative agreements rather than grants to funded organizations. Cooperative agreements allow for substantial involvement of the government in decision making about the work, in contrast to grants, which provide far greater autonomy to the funded organization. A key benefit of using cooperative agreements was the understanding that OAH could withhold funding if grantees did not comply with agreed-upon activities—including participation in all ETA activities and implementing changes to evaluation practice when the ETA team had strong recommendations. In most cases, grantees were responsive to suggestions from the ETA team and from their federal project officer. However, in some instances, OAH issued corrective action letters to grantees who were unwilling to make the changes to their evaluation suggested by the ETA team. For example, one grantee’s original evaluation design compared a single treatment cluster to a single comparison cluster—such a design would be rated as having a confounding factor and the study would be rated as having a “low” level of evidence by the HHS evidence review team. This corrective action letter issued by OAH to the grantee indicated that future years of funding were to be withheld if several additional treatment and comparison clusters were not recruited. As the TA provider, Mathematica was not privy to all instances in which corrective action letters may have been used. However, when we were informed, grantees were responsive and eventually made the required changes to their evaluation to the extent they were able. 4
Description of ETA Provider
As mentioned earlier, Mathematica was funded as the ETA provider, with two subcontractors (Concentric Research and Evaluation and Twin Peaks Partners). Mathematica’s ETA contract began in September 2010, several months after Cohort 1 grantees received their awards.
A fundamental feature of the ETA approach was the use of ETA liaisons as the focal point of contact for each grantee and evaluator with the broader ETA team. ETA liaisons were researchers, most of whom had doctoral degrees and experience conducting impact evaluations. The ETA liaisons familiarized themselves with each grantee and evaluation by reviewing the grant application and then joined in regular communication throughout the course of the 5-year grant period. ETA liaisons were matched with grantees so that there would be some commonality in features of the TA being covered by a liaison. For example, an ETA liaison might be working with several grantees who shared a common design feature, such as all using school-based cluster randomized controlled trial designs to test the effectiveness of their programs, or with the same evaluator who worked on multiple grants.
In Cohort 1, a single ETA team member served as an evaluation liaison for each grantee and its evaluator. In Cohort 2, each grantee received support from a team of two ETA liaisons—a more senior liaison and a junior liaison to provide training opportunities and facilitate more timely feedback to the grantees, evaluators, and OAH. Although grantees worked most closely with their ETA liaisons as the frontline ETA providers, each ETA liaison tapped into the expertise of the larger ETA team and other senior staff (e.g., survey data collection experts, methodologists) at Mathematica when needed.
ETA Contract and HHS Evidence Standards
One goal of the ETA contract was to help grantees and evaluators produce evidence of program effectiveness that would meet the TPP Evidence Review standards. The ETA framework and monitoring were built around achieving this goal, with careful attention to monitoring potential threats to internal validity (such as differential attrition or lack of baseline equivalence) and factors that would influence the study’s ability to detect a favorable and statistically significant impact (Cole et al., 2016).
The HHS TPP Evidence Review assesses the credibility of the evidence of programs aiming to reduce adolescent pregnancies, sexually transmitted infections, and sexual risk behaviors (Goesling et al., 2014). The evidence review first systematically assesses the credibility of the evidence from a study (i.e., the internal validity of the study) and places the evidence into one of three categories based on features of the design and the threats to internal validity observed in the study. The bullets below capture the broad features of these three categories, although they mask some of the nuance and detail:
• High quality: Randomized controlled trials with low levels of sample attrition and statistical controls for any baseline nonequivalence.
• Moderate quality: Randomized controlled trials with high attrition or quasi-experimental designs, and the study demonstrates baseline equivalence.
• Low quality: Randomized controlled trials with high attrition or quasi-experimental designs, and the study does not demonstrate baseline equivalence.
In addition, no study could have a confound that prevented the design from isolating the effect of the intervention, such as a comparison with only one unit in each condition.
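The three categories and the confound rule can be read as a simple decision procedure. The sketch below is only an illustration of the simplified summary given here, not the review's actual protocol: the `Study` fields and the `rate_study` function are hypothetical names, the attrition and equivalence flags are assumed to have been determined elsewhere, and the requirement for statistical controls in high-quality studies is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Study:
    design: str                # "rct" or "qed" (quasi-experimental design)
    high_attrition: bool       # assumed to be judged elsewhere; no threshold is implied here
    baseline_equivalent: bool  # True if the study demonstrates baseline equivalence
    has_confound: bool         # e.g., only one unit in each condition


def rate_study(study: Study) -> str:
    """Return 'high', 'moderate', or 'low' per the simplified summary in the text."""
    if study.design not in ("rct", "qed"):
        raise ValueError(f"unknown design: {study.design!r}")
    # A confounded design cannot isolate the program effect, so it rates low.
    if study.has_confound:
        return "low"
    # Only randomized trials with low attrition can reach the top category.
    if study.design == "rct" and not study.high_attrition:
        return "high"
    # High-attrition RCTs and QEDs depend on demonstrated baseline equivalence.
    return "moderate" if study.baseline_equivalent else "low"
```

For example, a randomized trial with a single cluster per condition rates "low" under this sketch regardless of its attrition, which matches the corrective action case described earlier.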
The threats to the internal validity of the study that are specifically assessed by the HHS Evidence Review created a framework of evaluation features that the ETA team monitored and used to provide TA. The team looked for anything about the design, implementation, or analysis of the results that would compromise the integrity of the design. For example, the ETA team monitored sample attrition in randomized controlled trials and, in all studies, monitored the degree to which treatment and comparison groups were well matched on demographics and baseline measures of outcomes of interest. When grantees appeared to have problematic sample attrition rates or baseline nonequivalence, the ETA team would offer guidance to address the issue so that more credible evidence could come out of the evaluation and it could meet HHS evidence standards.
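As an illustration of what monitoring baseline equivalence can involve, the sketch below computes a standardized difference in group means for one baseline measure. This is a common way to summarize how well two groups are matched; the text does not state which metric or threshold the HHS review applies, so both the metric and the function name `standardized_difference` are assumptions, and the ages in the usage note are made up.

```python
import math
from statistics import mean, variance


def standardized_difference(treatment, comparison):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(comparison)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(comparison)
    ) / (n_t + n_c - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; the difference is undefined")
    return (mean(treatment) - mean(comparison)) / math.sqrt(pooled_var)
```

Calling `standardized_difference([14, 15, 16], [13, 14, 15])` on hypothetical baseline ages expresses the one-year gap in means in pooled standard deviation units, which makes measures on different scales comparable.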
The ETA team also monitored features to increase the chances the study would detect favorable and statistically significant impacts. These features include program implementation, participant attendance and engagement, and the degree to which services were different for the comparison group (strength of contrast). In addition, the ETA team regularly monitored sample enrollment and response rates because the ultimate sample size plays a critical role in detecting statistically significant impacts. The team played an important role in monitoring the evaluations against HHS evidence standards and working with the evaluators to brainstorm potential solutions to the challenges they faced in meeting the standards. It was very useful for all involved—funder, ETA provider, grantee, and evaluators—to have the HHS evidence standards as concrete goals for the evaluations.
Stakeholders in the Evaluation
Several groups of stakeholders were invested in the success of each evaluation: (1) the grantees themselves, (2) the evaluator (if different from the grantee), (3) the project officer and broader OAH staff, (4) the program developers (if different from the grantee), and (5) the ETA team. All stakeholders desired that the program show favorable and statistically significant impacts on youth behavior if the program truly worked. However, these five distinct entities were effectively operating as just two or three units. The grantee and evaluator were one unit, responsible for delivering the program and conducting the evaluation. OAH, the grant’s project officer, and the ETA team were a second unit, responsible for overseeing and monitoring the evaluation, helping address issues that arose, and ensuring the credibility of the findings. Distinct from the ETA, the project officer provided programmatic technical assistance critical to the success of the effort. The third potential unit was the program developer who was interested in ensuring the evaluation was a credible test of the program they developed. There may also have been local partners, such as contracted program providers, who were heavily invested in the program’s success. They were typically less involved in the evaluation, however.
Although the ETA team supported grantees and evaluators, it did not have formal authority over the work. However, the effective clustering arrangement whereby the ETA team worked closely with OAH meant that the ETA team’s suggestions carried substantial weight because the office was the ultimate decision maker. The ETA team leadership met with OAH regularly and as necessary to ensure OAH could make decisions informed by the ETA perspective that promoted the best possible outcomes for the grantee evaluations. The cooperative agreement granted OAH both input into scientific decision making and the ability to withhold future funding if grantees were not compliant. Because OAH supported suggestions from the ETA team regarding the evaluations, the ETA team had “teeth” and could guide the direction of the evaluations. Therefore, although the ETA team did not have explicit or formal authority to mandate or require grantees or their evaluators to comply with all expectations, the collaborative and trusting relationship with OAH meant that ETA perspectives and suggestions were effectively communicated to the grantees and the grantees were responsive to this feedback.
Grantee-Selected Independent Evaluators
Grantees often selected evaluators with whom they had previously worked. Evaluators included university staff, small research organizations, and even large evaluation contractors. Although many evaluators were geographically close to the grantees, some evaluators had to travel or hire local staff to participate in recruitment, data collection, observations of program implementation, and so on. Grantees believed their evaluators were independent from the program being tested and could conduct the evaluation without conflict of interest. Potential conflicts were not always identified during the application process, but some became apparent during the evaluation. For instance, midway through the Cohort 1 evaluation period, the ETA team learned that in one case, a husband-and-wife team served as program lead and evaluator. We raised the concern with OAH, and the team continued unchanged, likely because the program lead was not the program developer. The potential conflict of interest was documented in the final report.
Although grantees were expected to select experienced evaluators, the evaluations included evaluators with vastly different levels of experience, from very seasoned evaluators who had led or participated in multiple impact evaluations to evaluators with no or limited impact evaluation experience. These differences in experience and knowledge led to different challenges in supporting the evaluations. Among evaluators with limited experience with clustered designs, a common misconception concerned the role clusters play in statistical power. Several evaluators proposed studies with only a few clusters, limiting the evaluation’s power to detect statistically significant impacts. Similarly, while some evaluators may have had experience collecting survey data, they did not have experience tracking youth over a long period of time and obtaining high response rates.
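To illustrate why a small number of clusters limits power, the sketch below applies the standard design effect for equal-sized clusters, 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. The function names, the ICC of 0.05, and the sample sizes are hypothetical values chosen for illustration; none of them come from the grantee evaluations.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# Two hypothetical designs that both enroll 1,000 youth (ICC assumed to be 0.05).
few_large_clusters = effective_sample_size(n_clusters=4, cluster_size=250, icc=0.05)
many_small_clusters = effective_sample_size(n_clusters=40, cluster_size=25, icc=0.05)

print(f"4 clusters of 250: about {few_large_clusters:.0f} effective youth")
print(f"40 clusters of 25: about {many_small_clusters:.0f} effective youth")
```

Under these assumptions, the design with 4 large clusters carries far less information than the design with 40 small clusters, even though both enroll the same number of youth. This is the sense in which the number of clusters, and not only the number of youth, drives power in a cluster randomized design.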
This heterogeneity in evaluator experience and knowledge made group ETA a challenge. As a result, contrary to the original plan for the contract, the majority of ETA was conducted one-on-one. One-on-one TA was also useful because grantees worked with heterogeneous populations, such as tribal youth, rural youth, or pregnant and parenting teens; used different delivery modes, such as online or in-person one-on-one programs versus school-based programs; and varied the timing of activities, such as short programs that regularly recruited new youth versus school year cohort-based programs that recruited only twice during the grant period. We supplemented one-on-one ETA with research briefs, conference presentations, and webinars relevant to the stage of grantee evaluations.
ETA Framework
The ETA team worked one-on-one with grantees to progress through a series of evaluation milestones that occurred during different stages of the 5-year grant period (see Zief et al., 2016, for details on how this framework worked with Cohort 1 grantees). Early in the grant period, ETA efforts focused on shaping and refining evaluation designs. During the middle period of the grant, ETA focused on monitoring and troubleshooting the execution of the approved evaluation design, program implementation fidelity, and high-quality data collection. In the final stages of the grant, ETA shifted to reviewing analysis plans and final reports. These milestone stages are described in greater detail below.
To reduce the burden on the ETA team, particularly when we expected to receive 41 analysis plans or evaluation reports for review in a short period, we used standardized templates and procedures to guide the review of the products. Many of these templates, along with resources developed during the course of the contracts, are available at the OAH Evaluation Training and Technical Assistance webpage (https://opa.hhs.gov/evaluation-research/evaluation-training-and-technical-assistance).
Design Review (Year 1)
In the first year of the grant, a planning year for the grantees, the ETA team worked with grantees to establish and obtain approval for a credible impact evaluation design. Given the variety of ways in which grantees described their evaluation designs in their grant applications, the ETA team developed a design review template to systematically assess and document key features of each design. This template covered features including, but not limited to, the random assignment process, the units of assignment (individuals or clusters), plans for recruitment, processes in place to retain the sample, and data collection procedures (or approaches). The ETA team reviewed the plans, focusing on potential threats to validity and the likelihood that the study would detect a statistically significant impact (assuming the program was effective). For the latter, in addition to looking at the proposed study power, we looked for potential red flags in the assumptions about the number of eligible youth and the percentage that would consent and participate.
The ETA liaison(s) for each grantee reviewed each grant application and extracted information about the plan into the design review template to ensure that each grantee’s design was reviewed in a consistent manner. Next, the ETA liaison drafted summaries of each evaluation design, highlighting potential strengths and limitations, and a member of the ETA leadership team reviewed these summaries for accuracy and completeness. Notably, these written summaries described critical concerns in the proposed study. For example, when we encountered scenarios that might cause the study to fail to meet HHS evidence standards (e.g., if only a single unit were to serve as the treatment condition), we would highlight that issue and plan to work with the grantee and evaluator to address the problem. Having the HHS evidence standards made it very easy to identify, justify, and resolve some issues. Other issues, such as insufficient statistical power, required more deliberation because there were no clear benchmarks; some also had potential budget implications.
The ETA team and OAH used written reviews, telephone calls, and in-person discussions to identify design problems and generate solutions with grantees and evaluators. Eventually, after several months of design refinement, the ETA team drafted a memo to OAH outlining the finalized design and recommending that the design be approved so that the grantees could begin their studies. Once OAH approved the design, the ETA team shifted its TA approach toward monitoring the quality of the ongoing evaluations.
Execution and Monitoring of the Approved Design (Years 2–4)
During the middle 3 years of the grant, the grantees and evaluators enrolled youth into their studies, implemented their interventions, and collected implementation and outcome data. As a result, the focus of ETA evolved to monitoring the progress of the evaluations and solving the real-world problems that commonly occur in all evaluation research (see section on Common Problems Encountered During ETA). There were three key activities conducted with grantees during this middle period to help support the evaluations:
Monthly monitoring calls
A very important feature of the ETA approach was monthly calls with grantees, evaluators, and project officers to monitor evaluation progress, identify risks to the study, and brainstorm solutions. Having frequent, regular calls enabled the project officers and the ETA team to identify problems quickly and address them to the extent possible before they compromised the integrity of the evaluation.
The ETA team developed a call protocol to guide these regular conversations with each grantee team. During a typical call with grantees, the ETA liaison(s) might ask about features of program implementation, for example, the extent to which the intervention was being implemented as intended (including whether youth were receiving a sufficient dose of the program) or whether the grantee had sufficient staff to implement in all settings. In addition, typical calls would monitor evaluation progress on key targets laid out in the approved design—for example, whether the grantee was achieving recruitment/enrollment targets or whether targeted data collection response rates were being achieved. By regularly monitoring key features of implementation and evaluation, we could identify issues early on and offer solutions and feedback to address those issues.
Regular reviews of evaluation progress
In addition to the monthly monitoring calls during years 2–4, the ETA team created Consolidated Standards of Reporting Trials (CONSORT) diagrams and other documents to assess sample flow, attrition, and baseline equivalence of the enrolled sample. The ETA team used the HHS evidence standards to guide these reviews and determine whether evaluations were on track to meet these standards (Goesling et al., 2014). When grantees had potentially high levels of sample attrition, the ETA team offered guidance and resources to attempt to address this concern—for example, suggestions for obtaining additional contact information of sample members at enrollment—to enable better long-term tracking of the sample.
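The sample-flow counts in a CONSORT diagram are enough to compute the attrition figures this kind of review relies on. The sketch below is a minimal illustration; the function name `attrition_rates` and the counts in the usage note are hypothetical, and no threshold for "high" attrition is implied because the text does not state one.

```python
def attrition_rates(assigned_t: int, analyzed_t: int,
                    assigned_c: int, analyzed_c: int) -> dict:
    """Overall and differential attrition from CONSORT-style sample counts."""
    rate_t = 1 - analyzed_t / assigned_t  # share of the treatment group lost
    rate_c = 1 - analyzed_c / assigned_c  # share of the comparison group lost
    overall = 1 - (analyzed_t + analyzed_c) / (assigned_t + assigned_c)
    return {
        "overall": overall,
        "treatment": rate_t,
        "comparison": rate_c,
        "differential": abs(rate_t - rate_c),  # gap between the two groups
    }
```

With hypothetical counts of 500 youth assigned to each condition and 400 treatment and 450 comparison youth in the analytic sample, `attrition_rates(500, 400, 500, 450)` gives roughly 15% overall attrition and a differential of about 10 percentage points between conditions.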
These reviews led to the development of supplemental ETA materials to address common problems. Evaluations struggled with sample tracking and retention, which were addressed in a brief on tracking and retention (Batten & Myrick, 2016). Review of early data suggested another common issue was baseline equivalence, which was addressed in a brief on analytic approaches to produce impacts that would potentially meet HHS evidence standards (Cole & Agodini, 2014).
The ETA team’s regular review of the evaluations’ key features enabled us to monitor threats to validity and statistical power. Additionally, the reviews offered an opportunity for evaluators to understand the HHS evidence standards better and to think about what they should present in final reports. Thus, the CONSORT diagram and baseline reporting helped to build capacity in evaluators while informing the ETA efforts.
Interim reporting products
As the evaluation moved through the midpoint of the grant funding, the ETA shifted focus to interim products that would establish the foundation of a credible final report. In Years 3 and 4, the ETA team requested that grantees complete both impact and implementation analysis plans—these plans were based on templates, available at the OAH ETA page (https://opa.hhs.gov/evaluation-research/evaluation-training-and-technical-assistance).
The impact analysis plan template contained sections for impact research questions, a description of the study design, the data collection procedures, and a detailed description of the analytic approaches the grantee would use to estimate the effects of the interventions to answer the impact research questions. In particular, the impact analysis plan template required documenting how key outcome variables would be coded, methods for handling missing data, and sensitivity analyses to assess the robustness of findings to different data preparation or analytic decisions. Grantees drafted impact analysis plans addressing the sections included in the template, which were then reviewed by their ETA liaison. The ETA liaison and a separate technical reviewer examined the description of the analytic approach relative to the design and assessed whether the approach would produce a credible test of program effectiveness. In particular, by carefully reviewing these plans relative to the TPP evidence standards, the ETA team understood before seeing the final reports whether the planned approaches described were able to produce estimates of program effectiveness that were likely to meet HHS evidence standards.
The implementation analysis plan template contained sections for implementation research questions as well as sections focused on data collection and analytic methods for answering the proposed research questions. Notably, the implementation analysis plans requested grantees and evaluators to carefully describe four features of implementation: (1) adherence of implementation to the program model or the planned intervention, (2) quality of delivery of the intervention (if available), (3) counterfactual experiences (to understand the effective contrast between intervention and control groups), and (4) contextual factors that may have affected program implementation and/or the evaluation. By requiring all grantees to provide comparable and comprehensive information on program implementation, each evaluation had the necessary documentation to contextualize the impact findings.
As with the design review process, the ETA team provided written feedback about both the impact and implementation analysis plans and participated in multiple phone calls with grantees and evaluators to discuss comments and brainstorm how to improve the plans. Grantees submitted revised plans in response to ETA feedback, sometimes repeatedly, before the ETA team recommended the plan for approval by OAH. After OAH approved the plans, the grantees and evaluators were expected to follow the approved plans when beginning their analyses and use the approved text as a starting point for their final evaluation reports. In addition, upon approval of the plans, evaluators were encouraged to register their impact evaluations at a clinical trial registry such as clinicaltrials.gov if they had not already registered them.
Evaluation Reporting (Year 5)
In the final year of the grants, evaluators conducted the implementation and impact analyses they had prespecified and drafted a final report, which was revised with the ETA team until it was recommended to OAH for approval. The final report was again based on a template (available at the OAH ETA webpage) and contained several sections identical to those in the impact and implementation analysis plan templates. This was intentional; we had grantees provide information in the analysis plans, CONSORT diagrams, and baseline equivalence reporting to reduce the burden of completing the final evaluation report during the fifth year of the grant.
The ETA team conducted a preliminary, unofficial assessment of the impact findings described in the final reports relative to TPP Evidence Review standards. In particular, the ETA team reviewed the description of the study assignment procedures, assessed both sample attrition and baseline equivalence relative to TPP evidence standards, and examined the impact analytic approach based on the final sample(s) used to show program effectiveness. Through this review process, which typically required three rounds of feedback from the ETA team, grantee final reports had an initial assessment of credibility before public release on the OAH website.
Common Problems Encountered During ETA
During monitoring calls with grantees and evaluators, several problems came up frequently that, if left unchecked, would threaten the internal validity of the study or compromise the study’s ability to show statistically significant impacts on behavioral outcomes. When such problems were identified, the ETA liaison(s) would offer immediate suggestions and could request additional solutions from ETA leadership. Because of this collaborative structure, the ETA team was able to identify the most common issues during Cohorts 1 and 2. The ETA team developed written briefs on these topics to standardize the feedback offered to grantees and to serve as written resources that grantees could use for guidance and cite as justification for their approaches.
The ETA briefs are available on the OAH ETA website. Below, we summarize an illustrative subset of the key issues that they addressed.
Difficulties with recruitment. Many grantees struggled with recruitment and retention both at the individual youth level and at the level of host schools or organizations. For instance, grantees ran into problems obtaining the necessary permissions from school districts or gauging the schools’ willingness to participate when attempting to recruit schools for their impact evaluations. Common problems with grantee recruitment efforts included approaching schools without first getting district buy-in or not accounting for school research review board requirements when developing study protocols or timelines. The ETA team developed two related TA briefs on this topic, one on district recruitment (Bruursema, 2015) and one on school recruitment (Thomas, 2015).
Selecting and justifying analytic decisions for estimating program impacts. At the analysis planning stages, grantee evaluators had many questions about how best to approach data cleaning, analysis, and reporting. The ETA team developed several briefs on analytic approaches. As an illustration, these briefs include guidance on appropriately adjusting standard errors in clustered designs (Deke, 2013), using the linear probability model to estimate interpretable impacts on dichotomous outcomes (Deke, 2014), and reporting benchmark and sensitivity analyses as a means to showcase the robustness of findings across different justifiable decisions (Kautz & Cole, 2017). One issue that was pervasive in this field was estimating impacts on endogenous subgroups. That is, evaluators examined the effect of the program on contraceptive use by looking only at youth who had had sex as of the follow-up period. Because the program itself may influence which youth become sexually active, the intervention and comparison members of such a subgroup are no longer comparable by design. We spent a lot of time on the endogenous subgroup issue during the design and analysis stages for Cohort 1 and developed webinars and a research brief to inform the broader field about the bias introduced when conducting an analysis on a subgroup defined postbaseline (Colman, 2012). We believe this work improved the capacity of the field because we rarely saw these analyses proposed in Cohort 2 or in the other TPP-related grant programs for which we provide ETA.
Designing well-powered evaluations. A common issue facing nearly all grantees was lower statistical power than what was needed to show statistically significant impacts on behavioral outcomes. Several evaluations were designed with less than optimal sample sizes, a problem that in some cases was exacerbated by lower than anticipated recruitment and/or poor response rates in follow-up surveys. The ETA team developed several resources to guide grantee efforts for future evaluations, including a brief and an online calculator on estimating and justifying minimum detectable impacts for TPP programs (Moreno & Cole, 2014). We also created a brief that presents the power tradeoffs among individual-assignment designs in which dating among youth across conditions weakens the effective contrast being tested compared with cluster-assignment designs in which cross-condition dating is likely to be less frequent (Deke, 2017).
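To make the power issue concrete, the following minimal sketch computes a minimum detectable impact (MDI) for a binary outcome using a standard normal-approximation formula with an optional design effect for clustered assignment. It is not the calculator described by Moreno and Cole (2014); the function name, the sample sizes, the prevalence, and the intraclass correlation are all hypothetical, and the formula ignores covariate adjustment and attrition.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_impact(n_t, n_c, prevalence, alpha=0.05, power=0.80,
                              cluster_size=1, icc=0.0):
    """Approximate MDI (in proportion units) for a binary outcome.

    Uses the normal approximation: MDI = (z_{1-alpha/2} + z_{power}) * SE,
    where SE is the standard error of a difference in proportions,
    inflated by the design effect 1 + (cluster_size - 1) * icc.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    design_effect = 1 + (cluster_size - 1) * icc
    variance = prevalence * (1 - prevalence) * (1 / n_t + 1 / n_c)
    return multiplier * sqrt(design_effect * variance)


# Hypothetical study: 500 youth per condition, and an outcome
# (for example, recent sexual activity) with 10% prevalence.
mdi_individual = minimum_detectable_impact(500, 500, prevalence=0.10)

# Same sample, but assigned in clusters of 25 youth with an assumed ICC of 0.02.
mdi_clustered = minimum_detectable_impact(500, 500, prevalence=0.10,
                                          cluster_size=25, icc=0.02)

print(f"MDI, individual assignment: {mdi_individual:.3f}")
print(f"MDI, clustered assignment:  {mdi_clustered:.3f}")
```

Under these assumed inputs, the detectable impact is roughly 5 percentage points on a 10% base rate, that is, about half of the comparison-group prevalence. This is one way to see why evaluations serving young populations with rare behavioral outcomes had limited opportunity to show significant impacts.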
Successes of ETA Activity
The ETA effort made substantial progress achieving the two goals guiding this work: (1) build evaluation capacity among grantees, evaluators, and federal staff and (2) broaden and enhance the evidence base of TPP programs with dozens of new, internally valid studies.
Substantial evaluation capacity building occurred in both Cohorts 1 and 2 of the TPP Program. At a briefing of study findings at the conclusion of Cohort 1, several grantees and evaluators indicated that they had learned a great deal about evaluations and could design and conduct more credible evaluations more confidently as a result of their work with the ETA team. In addition, several of the Cohort 1 evaluators were part of teams that received funding in 2015 for Cohort 2, some for multiple evaluations, and the evaluation designs and plans for Cohort 2 were markedly stronger and better powered to show program effectiveness than for Cohort 1. Furthermore, after the ETA contract was discontinued during Cohort 2, several grantees and evaluators contacted the Office of Population Affairs directly and requested that the ETA team return to provide feedback on their analysis plans and final reports to ensure high-quality products. Mathematica recently learned that we will be able to provide limited support to grantees as they write up their findings. Based on our observations and the feedback we have received, we believe the ETA team’s work with individual grantees and evaluators has built and enhanced the evaluation capacity for the TPP field.
A second success of the ETA effort is that a large body of internally valid evidence of TPP Program effectiveness was created through the Cohort 1 evaluations. The ETA team successfully assisted nearly all Cohort 1 grantees in conducting credible impact evaluations and summarizing their results as internally valid final reports. A special issue in the American Journal of Public Health highlights a subset of the Cohort 1 evaluations along with a synthesis of the results across all findings (Farb & Margolis, 2016). This effort built the capacity of some evaluators to write an article for a reputable journal. All but one of the final reports from OAH-funded grantees submitted to the HHS evidence review met standards when officially reviewed by the TPP Evidence Review team (Lugo-Gil et al., 2018), and all reports that were recommended for approval by the ETA team are publicly available at the U.S. National Library of Medicine. 5 A recent meta-analysis of the Cohort 1 TPP programs brought additional attention to this body of evidence (Juras et al., 2019a, 2019b).
Challenges and Lessons Learned From Providing ETA
One surprising finding was that most of the Cohort 1 evaluations showed small, nonsignificant impacts on behavioral outcomes (Cole, 2016; Goesling, 2016). Given the ETA support, and because grantees were attempting to replicate interventions that had previously been shown to be effective or were testing potentially promising interventions, one might have assumed that many evaluations would show favorable and statistically significant program impacts. Evidence from fields such as psychology, however, suggests that the majority of replications do not yield effects of similar size and significance (Open Science Collaboration, 2015). And as noted below, the many limitations to the funded Cohort 1 evaluations likely played a role in many of them showing small, nonsignificant impacts on behavioral outcomes.
The fact that the ETA contract started after the grantees were funded certainly limited the ETA team’s ability to effect changes in the evaluations of some Cohort 1 grantees (see Knab et al., 2016, for details). OAH was a new federal office, funded in 2010, and launched the 2010 TPP grant program several months before awarding the ETA contract. Importantly, despite having evaluation designs that set them up for small, nonsignificant impacts on behavioral outcomes, several of the grantees received Cohort 1 funding. Common limitations among the Cohort 1 evaluations were (1) strong counterfactual conditions (in other words, the comparison group received a robust program as their “business as usual,” which produced a relatively weak effective contrast being tested in the evaluations), (2) smaller than optimal target sample sizes for the contrasts being tested (in other words, they were underpowered evaluations), and (3) serving young populations in which behavioral outcomes are rare (Coyle & Glassman, 2016). Grantees often did not have the ability to change the counterfactual without finding all new partners, which was not feasible in many cases. Similarly, grantees had limited ability to increase sample sizes because of budget constraints or a limited eligible sample in the region. Finally, although we did not encourage grantees to move away from conducting prevention interventions with young populations, we did suggest conducting long-term follow-up assessments, to the extent that this was possible within the 5-year grant period. As the ETA provider, we did not take part in reviewing grant applications, nor were we involved immediately upon award of the grants, which constrained our role in improving some Cohort 1 evaluations. These factors may have contributed to the lack of favorable, statistically significant findings in Cohort 1.
We learned several lessons from our experience with Cohort 1 that are important to share with funders. First, funders should have more formal requirements and expectations for the evaluators with whom grantees partner. Ideally, lead evaluators participating in these high-stakes evaluations should have prior experience with an impact or effectiveness evaluation—this should not be their first experience conducting such a study. With more seasoned evaluators, the ETA team’s efforts could potentially shift away from being predominantly one-on-one to group TA, which would improve efficiency. In addition, the types of content provided by the group TA could be more technical and showcase cutting-edge guidance to help push the field forward. Alternatively, these grants can continue to build the capacity of less seasoned evaluators, provided there is enough funding for ETA and enough time in the grant period to shape evaluation efforts and make adjustments. We recommend grants that aim to build capacity last at least 5 years, including a planning year. Shorter grant periods, such as 2 or 3 years, are riskier in terms of final quality if evaluators or grantees lack experience and require substantial support. However, as shown in this case study, ETA can be beneficial with seasoned evaluators and grantees as well. An ETA team can serve as monitors by making the funder aware of the status of evaluations, addressing problems as they arise, facilitating cross-grantee learning, and serving as peer reviewers for key documents including design plans, analytic plans, and final reports.
Second, funders must forewarn grantees that they might have to change funded designs during the planning year after review by the ETA team or the funder. Even though the grants were funded as cooperative agreements, many Cohort 1 grantees and evaluators did not expect to have to make substantive changes to their plans during the first year of the grant. Several grantees had negotiated agreements with schools or other implementing agencies that needed to be renegotiated to strengthen the proposed evaluation based on feedback from the ETA team. In some cases, this introduced tension in the relationships between grantees and OAH or the ETA team. Clearer expectations and guidance from the funding opportunity announcement forward could have limited the tension resulting from unexpected changes to the evaluation design and implementation.
Finally, ETA is not the solution to all issues. ETA can help grantees improve some aspects of an evaluation, increasing the chances of reporting internally valid estimates. However, other features may not be pliable, particularly if funding is set before the ETA team reviews the evaluation design. This suggests there are features that should receive attention during awardee selection. We discussed three of these at length above: (1) the strength of the counterfactual, (2) the target sample size, and (3) the likely prevalence of the key outcomes and how well they align with the intervention.
The Cohort 2 ETA activity was well timed to address this third limitation from the Cohort 1 evaluations. OAH called on the ETA team during application review to consult on the feasibility of evaluation designs and identify potentially challenging design issues. The ETA team conducted a fast-turnaround review of the highest scoring grantees to identify any critical design limitations in the proposed evaluations. The fast-turnaround review focused on common issues found among the Cohort 1 evaluations including strength of contrast and statistical power. High-scoring applications were not necessarily funded if they had critical evaluation design flaws that could make it challenging to find credible and statistically significant findings.
Cohort 2 differed from Cohort 1 in ways that facilitated the provision of ETA. Some Cohort 2 grantees were Cohort 1 grantees, so they were aware of the ETA team and its role from the outset. Grantees had a better understanding of the evaluation design approval process, resulting in less tension during this phase. Finally, the ETA team was known to add value and to serve as a useful resource to evaluators and grantees, which facilitated the provision of ETA.
Grantee Evaluations as Part of a Broader Learning Agenda
Although the body of this article has focused on the ETA provided to individual TPP grantees as a means to contribute to the evidence base of TPP programs, the grantee evaluations were only an individual element of a broader evidence-building agenda. OAH was able to implement a broad learning agenda to maximize the information generated through the TPP funds. The learning agenda was based on information obtained from (1) individual grantees conducting impact evaluations, guided by ETA to enable credible evidence of program effectiveness to emerge; (2) two large federal evaluations, one with the purpose of replicating the evidence of a subset of evidence-based programs, and a second to produce evidence of potentially promising programs; and (3) a meta-analysis of all evaluation efforts funded under Cohort 1.
There were several benefits of this multifaceted evidence-building approach, given some of the limitations of the evidence produced by the individual grantees. The second aspect of the evidence-building agenda allowed OAH to directly invest in high-quality evaluations of the programs that had the greatest need of evidence. OAH funded replication evaluations of three widely implemented program models that had previously been shown to have evidence of effectiveness but were being implemented in new settings or with new populations (Abt Associates, 2015) and new effectiveness evaluations of seven promising programs that did not have evidence of effectiveness (Smith & Colman, 2012). OAH issued a request for proposals for these large federal contracts, and two large organizations with extensive experience conducting high-quality randomized experiments won these contracts and conducted rigorous effectiveness evaluations. The findings from these evaluations met HHS evidence standards, and some of the programs evaluated were found to have significant impacts on behavioral outcomes. Therefore, the investment in large federal evaluation contracts helped to provide the types of information that OAH needed for programs that had the greatest need for credible evidence of program effectiveness.
A second key benefit of this approach was that the meta-analysis helped to mitigate the problems of lower than expected study power that several grantees faced. By pooling the information across multiple grantees, OPA was able to secure a more powerful test of its investment in TPP programming. An initial analysis of the results that focused solely on the grantee evaluations indicated that programs that were offered to individuals rather than groups and programs that targeted female populations were particularly effective at improving participant outcomes (Juras et al., 2019a). A final report that incorporated additional studies into this meta-analysis, including longer term results from the larger federal replication efforts, showed that, on average, the investment in TPP as a whole had favorable and significant impacts on youth behavioral outcomes (Juras et al., 2019b). Therefore, the investment in a meta-analysis as a part of the evidence-building agenda served almost as an insurance policy against the problem of poorly powered individual grantee evaluations.
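The intuition for why pooling raises power can be sketched with a simple fixed-effect, inverse-variance meta-analysis. This is a generic textbook approach and is not a description of the models used by Juras et al. (2019a, 2019b), which the text above does not specify; the impact estimates and standard errors are hypothetical and chosen only so that no single study is statistically significant on its own.

```python
from math import sqrt
from statistics import NormalDist


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average of study estimates and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


def two_sided_p(estimate, std_error):
    """Two-sided p-value from a normal approximation."""
    z = abs(estimate / std_error)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical impacts on a risk behavior (negative = favorable), one per study.
estimates = [-0.030, -0.020, -0.040, -0.025, -0.035]
std_errors = [0.025, 0.025, 0.025, 0.025, 0.025]

study_p_values = [two_sided_p(e, se) for e, se in zip(estimates, std_errors)]
pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
pooled_p = two_sided_p(pooled, pooled_se)

print("individual p-values:", [round(p, 3) for p in study_p_values])
print(f"pooled impact: {pooled:.3f} (SE {pooled_se:.4f}), p = {pooled_p:.4f}")
```

With these assumed inputs, every individual study falls short of conventional significance, yet the pooled estimate is significant because its standard error shrinks as studies are combined. A random-effects model, which allows true impacts to vary across programs, would typically yield a somewhat wider interval.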
In sum, OAH’s approach of embedding ETA into grantee-led individual evaluations was a sound strategy as part of a broader learning agenda for the TPP field. It helped build impact evaluation capacity in the TPP field and contributed to the production of many credible studies of TPP programs. In addition, when coupled with other aspects of an evidence-building agenda (notably, investments in large federal evaluation contracts and a meta-analysis of the TPP investment as a whole), the benefits were compounded. Future federal investments in grantee-led evaluations, guided by ETA, can benefit from comparable aspects of a learning agenda.
Footnotes
Authors’ Note
Jean Knab and Russell Cole are no longer affiliated with Princeton University, NJ and Policy Research, Evaluation, and Measurement, University of Pennsylvania, Princeton, NJ, USA.
Acknowledgments
The authors would like to thank Cay Bradley for her helpful feedback on this article and for her contributions to the ETA effort. The authors would also like to thank Amy Farb at the Office of Adolescent Health for her strong oversight of the ETA effort. Finally, we would like to thank the ETA teams for their hard work and great ideas, especially Susan Zief at Mathematica, who led the first ETA contract.
Funding
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: This work was funded by a series of contracts with the Office of Adolescent Health, U.S. Department of Health and Human Services (HHSP23337017T and HHSP233201300416G).
