Abstract
Background:
Systematic reviews sponsored by federal departments or agencies play an increasingly important role in disseminating information about evidence-based programs and have become a trusted source of information for administrators and practitioners seeking evidence-based programs to implement. These users vary in their knowledge of evaluation methods and their ability to interpret systematic review findings. They must consider factors beyond program effectiveness when selecting an intervention, such as the relevance of the intervention to their target population, community context, and service delivery system; readiness for replication and scale-up; and the ability of their service delivery system or agency to implement the intervention.
Objective:
To support user decisions about adopting evidence-based practices, this article discusses current systematic review practices and alternative approaches to synthesizing and presenting findings and providing information.
Method:
We reviewed the publicly available information on review methodology and findings for eight federally funded systematic reviews in the labor, education, early childhood, mental health/substance abuse, family support, and criminal justice topic areas.
Conclusion:
The eight federally sponsored evidence reviews we examined all provide information that can help users to interpret findings on evidence of effectiveness and to make adoption decisions. However, they are uneven in the amount, accessibility, and consistency of information they report. For all eight reviews, there is room for improvement in supporting users’ adoption decisions through more detailed, accessible, and consistent information in these areas.
Over the past decade, policy makers, funders, and practitioners have been moving toward evidence-based programs to address many health and social problems (Clancy & Cronin, 2005; Haskins & Baron, 2011; Haskins & Margolis, 2014). The rapid expansion of systematic evidence reviews sponsored by federal departments and agencies is part of this burgeoning movement to use research evidence in the design of federal public programs and policy initiatives (Haskins & Margolis, 2014; McCall, 2009). To connect research findings more directly with funding decisions, federal departments and agencies have sponsored a growing number of systematic evidence reviews in areas such as education, early childhood home visiting, pregnancy prevention, labor, and mental health.
Systematic reviews locate, assess, and summarize findings from existing research in a focused topic area to identify programs and practices with evidence of effectiveness. Using consistent standards and methodology, the reviews examine the design and execution of research studies to assess their internal validity. Evidence of effectiveness is then reported for studies determined to be of sufficient quality. Systematic reviews have their roots in efforts to promote rigorous medical research and evidence-based practices (Tranfield, Denyer, & Smart, 2003). For example, for more than 20 years, the Cochrane Collaboration (2013) has reviewed the effectiveness of health-care interventions, and its sister project, the Campbell Collaboration, focuses on social policy (Campbell Collaboration, 2011). Related efforts provide guidance on best practices to creators of systematic reviews and study authors. The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Statement provides a minimum set of items for reporting in systematic reviews and meta-analyses (Liberati et al., 2009), and the CONsolidated Standards of Reporting Trials (CONSORT) Statement provides a minimum set of recommendations for reporting randomized trials (Moher et al., 2010).
Systematic reviews sponsored by federal departments or agencies play an increasingly important role in disseminating information about evidence-based programs. They have become a trusted source of information for administrators and practitioners who are seeking evidence-based programs but lack the resources and technical knowledge to assess programs’ evidence of effectiveness as reported in the research literature. The reviews typically provide information about the interventions reviewed and evidence of their effectiveness through reports and websites. To varying degrees, systematic reviews also describe the interventions and may provide other information about their implementation and the supports available to organizations that adopt them. 1
Indeed, several federally sponsored evidence reviews serve as the vehicle for identifying interventions that qualify as “evidence based” (i.e., showing convincing evidence of favorable effects) for specific funding streams. These assessments include the Home Visiting Evidence of Effectiveness (HomVEE) systematic review for programs funded by the Department of Health and Human Services’ (HHS) Health Resources and Services Administration under the Maternal, Infant, and Early Childhood Home Visiting (MIECHV) program and the Teen Pregnancy Prevention (TPP) Evidence Review for TPP programs funded by several agencies within HHS (for a detailed description of the TPP evidence review, see Goesling, Oberlander, & Trivits, In press, in this issue). Others, such as the What Works Clearinghouse (WWC) and the National Registry of Evidence-Based Programs and Practices (NREPP), are often referenced in federal funding opportunity announcements as sources of evidence-based interventions.
Systematic reviews have broad audiences of users with varied expertise and training, including researchers, policy makers, administrators, and program operators. Some types of users, such as researchers and program evaluators with strong technical skills, seek detailed information about evaluation designs, psychometric properties of outcome measures, and other technical issues. This article, however, focuses on the subset of users who rely on systematic reviews to select evidence-based interventions to adopt and implement. Such users may include school district officials, school principals, state agency and public health department staff, nonprofit agency administrators, and other program operators. These users vary in their knowledge of evaluation methods and their ability to interpret systematic review findings about program effectiveness. Systematic review websites and reports that translate research findings into nontechnical, accessible language can be invaluable to them. Moreover, these users must consider factors beyond program effectiveness when selecting interventions to adopt. Users need information to assess the relevance of the interventions to their target population, community context, and service delivery system; the intervention’s readiness for replication and scale-up; and the ability of their service delivery system or agency to implement the intervention, including cost considerations.
The current study examines how eight systematic reviews with public websites and sponsored by a federal department or agency report on information to support decision-making about adopting evidence-based programs. The study examines information reported in four areas: (1) synthesis and presentation of findings; (2) fit with a specific population, community context, and service delivery system; (3) readiness for scale-up; and (4) feasibility of implementation. In each area, we assess the relevance of the information for supporting users’ adoption decisions, the depth and breadth of information provided, and accessibility for nontechnical audiences.
Synthesis and Presentation of Findings
To support users’ adoption decisions, it is critical for reviews to disseminate findings on evidence of effectiveness in a way that is accessible to nontechnical users. The presentation of study-specific findings—that is, the reported impact of the intervention of interest on a particular outcome in one study—is relatively straightforward. However, with the increased emphasis on evidence-based decision-making, federal funders are more frequently looking to reviews to evaluate a body of evidence and provide a concise summary, ideally one that is accessible to nontechnical audiences. This synthesis of findings is much more complex than presenting study-specific findings, as there are multiple dimensions to consider within an environment of limited information.
Users of systematic reviews may be interested in both effectiveness (the direction and size of the effect of the intervention on the outcomes it targets) and external validity (how the results of studies of the intervention would apply to a wider population). Information on effectiveness may offer insight into the size of the effect a user might expect in his or her context. Information on external validity helps users understand the extent to which the intervention has been tested, such as the number of studies conducted and their sample sizes.
Fit With Target Population and Context
Users need information about other aspects of external validity to assess how well an intervention may fit with their target population, community context, and service delivery setting. Information on fit helps users understand how an intervention might work locally. For example, users may seek information on whether an intervention has been tested in a location similar to the user’s in terms of urbanicity, demographics, or policy context; whether the intervention has been tested in a similar service delivery context; and whether it has been tested on participants similar in age, gender, socioeconomic status, race/ethnicity, or other characteristics. These determinations require a focus on what works, for whom, and under what conditions, necessitating a better understanding of the context of individual findings rather than improved synthesis methods.
Readiness for Scale-Up
Information about the resources available to support replication and scale-up of evidence-based interventions is essential for users deciding which ones to adopt. Scaling up an intervention involves moving it from implementation in a highly controlled environment to implementation in a larger number of diverse sites under less controlled conditions (Howard, 2012). Research shows that substantial support is needed for effective implementation of evidence-based programs (e.g., George et al., 2008). Implementers need information on how to carry out the program in different settings and multiple locations (Fixsen, Blasé, Horner, & Sugai, 2009; Stid, Neuhoff, Burkhauser, & Seeman, 2013).
To bring an intervention to scale, core components of the intervention need to be clear, and users need to understand what it will take to implement those components with fidelity (Paulsell, Del Grosso, & Supplee, 2014). Assessing readiness for scale-up includes examining information about how well the developer or purveyor has specified the intervention in the form of implementation manuals and guides, training materials, and fidelity standards. Users considering replication of specific interventions will need to know the supports available for implementation, such as information on manuals, provision of pre- and in-service training, technical assistance, and criteria and tools to measure fidelity. In short, users need to know about the supports in place to assist them in implementing the intervention as effectively as it was carried out in the research (Stid et al., 2013).
Users will also benefit from information on adaptation and lessons learned from previous implementation. Adaptation involves making changes to an evidence-based program without compromising its core components (Tortolero et al., 2005). The fit between an effective model and the features of the user’s environment will vary; therefore, understanding whether adaptations have been tested or are allowable by developers or purveyors is useful. Users will also likely be interested in whether a model has been replicated or scaled up in the past, what lessons were learned from those experiences, and who they should contact for more information.
Feasibility of Implementation
Users of systematic reviews need to consider whether it is practical and feasible to implement an intervention in their service delivery setting. Fundamentally, they must assess whether they have sufficient funding, staff, and other resources needed to implement the intervention as intended. They need information about required staff qualifications and caseload sizes to determine, for example, whether the implementing agency has enough qualified staff to implement the intervention and, if not, whether personnel could be hired locally at a pay rate the agency could afford. Implementing agencies also need to know the requirements for pre- and in-service training, including how many days of training are required and whether training can be provided on-site. Similarly, the dosage of services required, technology requirements, and other cost issues must be considered.
Research Questions
The purpose of this article is to examine how eight systematic reviews sponsored by federal departments or agencies currently report on information needed by users to support their decisions about adopting evidence-based programs, identify gaps in information provided, and suggest strategies for improving the reviews’ capacity to meet users’ information needs. Specifically, the article addresses four main research questions. What methods do federally sponsored systematic reviews use to:
(1) Synthesize and present findings on evidence of effectiveness to support user adoption decisions?
(2) Assess and report on programs’ fit with a range of target populations and contexts (external validity)?
(3) Assess and report on programs’ readiness for scale-up?
(4) Assess and report on feasibility of implementation?
Based on our examination of the research questions and these eight systematic reviews, we make recommendations for how systematic reviews can be made more accessible and useful for nontechnical users.
Method
To assess capacity to support users’ adoption decisions, we reviewed publicly available information as of July 2015 for eight federally sponsored systematic reviews with public websites in the labor, education, early childhood, mental health/substance abuse, family support, and criminal justice topic areas: (1) the Clearinghouse for Labor Evaluation and Research (CLEAR), (2) Crime Solutions, (3) HomVEE, (4) NREPP, (5) the Strengthening Families Evidence Review (SFER), (6) the TPP Evidence Review, (7) the WWC, and (8) the What Works in Reentry Clearinghouse (Table 1). To the authors’ knowledge, these represent all of the federally sponsored reviews of social service interventions that provide their results on public websites. As the federal government increasingly prioritizes the use of evidence-based interventions, these websites have become a primary information source for people applying for a range of federal funding opportunities and seeking information about program models and practices with evidence of effectiveness.
Table 1. Federally Sponsored Systematic Reviews Considered.
The authors extracted and coded information from the eight review websites about approaches to synthesizing and presenting information about evidence of effectiveness, external validity, and program information. For evidence of effectiveness and external validity, one author extracted information from a selection of review publications (model-, practice-, or study-specific reports) about the following: study-specific effectiveness information, intervention- or model-level evidence of effectiveness ratings, range of effects, number of studies, sample sizes, number of settings, setting characteristics, and study sample characteristics (see Table 2). 2 The author organized the extracted information into a table, noting whether each item was present.
Table 2. Availability of Effectiveness and External Validity Information in Systematic Reviews.
Note. These data include information reported as of July 2015. √ = reported; (blank) = not reported; SES = socioeconomic status; CLEAR = Clearinghouse for Labor Evaluation and Research; HomVEE = Home Visiting Evidence of Effectiveness; NREPP = National Registry of Evidence-Based Programs and Practices; SFER = Strengthening Families Evidence Review; TPP = teen pregnancy prevention; WWC = What Works Clearinghouse.
a. Crime Solutions distinguishes between “one study or meta-analysis” and “more than one study or meta-analysis” for each of its evidence ratings (effective, promising, or no effects).
For program information, one author coded information from reviews of 7–10 models or practices selected at random on each website regarding dimensions of model specification (target population, core components, dosage, and allowable adaptations), availability of supports for implementation such as manuals and technical assistance, information about replication experiences and implementation history, staffing requirements, and resource requirements such as program and training costs (see Table 3). The author organized the information into a table indicating whether each element was consistently reported, sometimes reported, or not reported.
Table 3. Availability of Program Information in Systematic Reviews.
Note. These data include information reported as of July 2015. C = consistently reported if information is available; S = sometimes reported; N = not reported; MIS = management information systems; CLEAR = Clearinghouse for Labor Evaluation and Research; HomVEE = Home Visiting Evidence of Effectiveness; NREPP = National Registry of Evidence-Based Programs and Practices; SFER = Strengthening Families Evidence Review; TPP = teen pregnancy prevention; WWC = What Works Clearinghouse.
a. Includes information about sites where the model is known to have been implemented.
b. Includes studies in which an evaluation of the model was replicated.
Following initial coding, all authors reviewed the tables of coded data, discussed coding decisions, and resolved areas of disagreement. After the authors reached agreement, we confirmed the findings with a member of each review team. 3
Results and Conclusions
Synthesis and Presentation of Findings
Users selecting an evidence-based program need to understand the effectiveness of the interventions they are considering and the kinds of outcomes participants are likely to achieve. The eight reviews we examined consistently provide study-level information about program effects. All provide information on the direction of the effect on a given outcome within a study—that is, whether it was favorable or unfavorable—and whether it was statistically significant (Table 2). Some reviews, such as HomVEE, provide this information only for studies meeting a certain quality threshold. Other reviews, such as CLEAR, report this information for all studies reviewed. However, the extent of information reported on null findings (i.e., impact estimates that are not statistically significant) varies by review. HomVEE, for example, reports all impact estimates on relevant outcomes within studies meeting a quality threshold. Other reviews, such as CLEAR, summarize a study’s findings and may not mention null findings.
Impacts that are statistically significant may not be meaningful or important in practice, however. Practical importance is as important as, if not more important than, statistical significance to systematic review users. Several of the reviews provide additional outcome- or study-level information on the magnitude of the effect (for example, the difference in means for the intervention and comparison group in natural units) so that users can assess the effect’s practical importance. HomVEE and the WWC also present outcome- or study-level standardized mean differences (i.e., effect sizes) to facilitate comparisons across outcomes and studies. The WWC converts the standardized mean difference to an “improvement index” to provide an example of the potential effect of the intervention on an average comparison group member. Because education studies may suffer from low power due to small samples, the WWC also incorporates practical importance by highlighting findings that are not statistically significant but have a standardized mean difference of at least 0.25.
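To illustrate how these quantities relate, the following minimal sketch computes a standardized mean difference and converts it to an improvement index. It assumes that outcomes are normally distributed and that the index is the percentile rank of the average intervention group member within the comparison group distribution, minus 50. The function names and the example values are hypothetical and are not drawn from any of the reviews; actual review procedures may apply additional adjustments.

```python
from math import sqrt
from statistics import NormalDist


def standardized_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / sqrt(pooled_var)


def improvement_index(effect_size):
    """Expected percentile-point change for an average comparison group member.

    Assumption: outcomes are normally distributed, so the index is the
    standard normal CDF of the effect size (as a percentile) minus 50.
    """
    return NormalDist().cdf(effect_size) * 100 - 50


def practically_important(effect_size, threshold=0.25):
    """Flag effects that meet the 0.25 threshold noted in the text.

    Assumption: the threshold is applied to the magnitude of the effect,
    so large unfavorable effects are flagged as well.
    """
    return abs(effect_size) >= threshold


# Hypothetical study: outcome scores for intervention and comparison groups.
d = standardized_mean_difference(
    mean_t=52.5, mean_c=50.0, sd_t=10.0, sd_c=10.0, n_t=40, n_c=40
)
print(f"effect size: {d:.2f}")
print(f"improvement index: {improvement_index(d):.1f} percentile points")
print(f"meets 0.25 threshold: {practically_important(d)}")
```

In this hypothetical case, a 2.5-point difference on a scale with a standard deviation of 10 yields an effect size of 0.25, which a nontechnical user can read as moving an average comparison group member up by roughly 10 percentile points.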
Reviews less consistently provide summary information from all individual studies of an intervention aggregated to the intervention level. Four of the eight reviews provide an intervention-level evidence of effectiveness rating. HomVEE and the TPP Evidence Review use a threshold approach, indicating whether each intervention has met the bar for evidence of effectiveness, based only on the presence of one or two favorable findings. Crime Solutions and the WWC provide multicomponent ratings. Crime Solutions classifies interventions as “effective,” “promising,” or “no effects” based on study quality and favorability of outcomes and indicates the extent of evidence supporting an intervention’s rating. The WWC reports several pieces of information on an intervention’s effectiveness in each outcome domain targeted, including a numeric improvement index that averages the impact estimates across studies, an effectiveness rating that tells the user how strongly favorable or unfavorable an intervention’s impacts were, and an extent of evidence rating that indicates how much evidence supports the findings in an intervention report.
Synthesizing study-specific information in a way that enables users to understand practical importance and statistical significance (that is, the likely direction, significance, and magnitude of an intervention’s effect) is useful for helping nontechnical users understand what they can glean from the entire body of evidence on an intervention. However, computing intervention-level effect sizes that enable users to assess practical importance is difficult because studies use different estimation methods. Although formulas exist to convert impact estimates (e.g., from a multivariate linear regression) to effect sizes, these formulas do not cover the range of estimation methods used by study authors. Moreover, in the areas covered by the reviews we examined, the body of evidence is often quite small. For a description of the choices available for synthesis and their limitations when used with a small body of evidence, see Valentine et al. (In press) in this issue.
Users could also benefit from additional information about the range of impact estimates for an intervention, presented in a format accessible to nontechnical users. The PRISMA Statement (2009) checklist states, “It is preferable to also include, for each study, the numerical group-specific summary data, the effect size, and confidence interval” (p. 12). Similarly, the CONSORT Statement (2010) states that “for each outcome, results for each group and the estimated effect size and its precision (such as a 95 percent confidence interval)” (p. 18) should be presented. The Cochrane Handbook for Systematic Reviews of Interventions (2011) recommends using forest plots (which graphically display the relative strength of treatment effects estimated in different studies of the same intervention) and confidence intervals to present data and analysis results. Both CONSORT and Cochrane recommend including confidence intervals or p-values rather than merely reporting statistical significance. The Campbell Collaboration is less prescriptive regarding presentation of findings, though its template for reviews includes a forest plot in the data and analyses section. Systematic reviews could consider converting technical displays such as forest plots or confidence intervals into a “range of effects” summary to help nontechnical users understand the range of impacts they may expect from a given intervention.
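One way such a “range of effects” summary might be derived from the study-level estimates that underlie a forest plot is sketched below. This is an illustration under stated assumptions, not a method used by any of the reviews: the dictionary keys, the example studies, the use of an unweighted average, and the normal approximation for 95 percent confidence intervals are all hypothetical simplifications, and a formal synthesis would typically weight studies by their precision.

```python
from statistics import mean


def range_of_effects(studies):
    """Summarize study-level effect sizes as a plain-language range.

    `studies` is a list of dicts with hypothetical keys:
    'effect' (standardized mean difference) and 'se' (its standard error).
    """
    effects = [s["effect"] for s in studies]
    # Approximate 95 percent confidence interval for each study.
    intervals = [
        (s["effect"] - 1.96 * s["se"], s["effect"] + 1.96 * s["se"])
        for s in studies
    ]
    return {
        "n_studies": len(studies),
        "smallest_effect": min(effects),
        "largest_effect": max(effects),
        # Unweighted average: a simplification, not a meta-analytic estimate.
        "average_effect": mean(effects),
        # Count studies whose interval excludes zero.
        "n_significant": sum(1 for low, high in intervals if low > 0 or high < 0),
    }


# Hypothetical body of evidence for one intervention.
studies = [
    {"effect": 0.10, "se": 0.08},
    {"effect": 0.30, "se": 0.10},
    {"effect": 0.45, "se": 0.20},
]
summary = range_of_effects(studies)
print(
    f"Across {summary['n_studies']} studies, effects ranged from "
    f"{summary['smallest_effect']:.2f} to {summary['largest_effect']:.2f} "
    f"(average {summary['average_effect']:.2f}); "
    f"{summary['n_significant']} were statistically significant."
)
```

A single sentence of this kind conveys the spread of the evidence without requiring users to read confidence intervals directly.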
All reviews report at least some information on external validity, including the number of studies included in the review of an intervention, intervention- and study-level sample sizes, and the number of settings (Table 2). All of the reviews that provide an intervention-level evidence rating clearly report the number of studies contributing to that rating. Crime Solutions limits the number of studies reviewed to three, and the WWC requires that two or more studies show favorable effects (among other criteria) to receive the highest effectiveness rating. Seven of the eight reviews report sample sizes and the number of sites for individual studies, but only the WWC reports intervention-level sample sizes or takes into account the number of different settings when determining the extent of evidence on an intervention as a whole. Systematic review users could benefit from the development of best practices for reporting on external validity.
The evidence reviews examined for this study could better assist users in choosing among interventions by presenting more information about effectiveness and external validity in an accessible manner. As mentioned previously, HomVEE and the TPP Evidence Review currently use a threshold approach, reporting on whether an intervention has met the standard for evidence of effectiveness. For these reviews, additional summary measures of the evidence could help users choose between multiple effective interventions, based on the average and range of impact estimates, and the strength and extent of evidence for those estimates.
Summary indicators of intervention effectiveness are useful decision-making tools for users of systematic reviews. Some systematic reviews, however, such as the TPP Evidence Review and CLEAR, examine bodies of evidence that are quite small, with often only one study available per intervention. In these circumstances, summary measures are neither feasible nor even necessary. Over time, however, as initial studies are replicated, more evidence will become available. Establishing procedures for summarizing across studies and communicating with users about what the summary measures mean and how they can be used—even if only applicable to a few interventions early on—can set the stage for incorporating future research in a way that maximizes its helpfulness to users.
Fit With Target Population and Context
All reviews we considered provide at least some information on external validity that could help users determine whether an intervention would be a good fit for a particular context. All reviews report information on study settings, including location and, in most cases, urbanicity of the sites in which interventions were tested (Table 2). All reviews report additional information on setting characteristics such as area demographics or service delivery contexts, although these pieces of information are not reported consistently across reviews or across models within a review and are often reported on a study-by-study basis. Only NREPP consistently provides information on setting characteristics aggregated to the level of the intervention as a whole. HomVEE reports this information only for models that were deemed evidence based, and CLEAR and the TPP Evidence Review note for some models whether the intervention took place in an academic setting. The WWC reports information on area demographics on a study-by-study basis. In addition, Crime Solutions, SFER, the WWC, and What Works in Reentry report information on service delivery context on a study-by-study basis.
Reviews consistently report information on study sample characteristics (Table 2). All report information on the age, gender, and race/ethnicity of participants, at least on a study-by-study basis, and some also report other characteristics. 4 NREPP is the only review that aggregates this information from individual studies to the intervention level and reports, for example, all age ranges and races/ethnicities covered by individual studies. Six of the eight reviews report other information on participants. CLEAR notes some study participants’ place of employment, Crime Solutions and What Works in Reentry note such characteristics as percentage of participants with a substance use disorder, HomVEE notes the pregnancy status of study participants when applicable, SFER notes the percentage of study participants who were in the child support system, and the WWC mentions characteristics such as English language learner or special education status.
Across all eight reviews, information on the characteristics of settings and study samples is provided for individual studies rather than summarized for the body of evidence on an intervention. For example, HomVEE provides tables including study characteristics for studies of models that met its evidence of effectiveness standard; however, finding the information requires clicking on individual studies’ references and then reviewing tables that describe the study sample, setting, services provided, and staff characteristics and training. To better support users’ needs, evidence reviews could display this information more prominently on their websites and could aggregate it across studies for each intervention. Depending on availability of information and characteristics relevant to the area of review, the reported categories could include (1) study sample demographics, (2) community context such as urbanicity and community demographics, and (3) service delivery context, including implementing agency auspice and staff characteristics. Such a display would enable users to determine whether the intervention of interest had been tested with a similar target population and in a similar context to their own.
A more sophisticated approach to disseminating information on intervention fit would be to incorporate tags, filters, or search engines on systematic review websites that allow users to search for or sort interventions based on demographic characteristics of the study sample, community characteristics, implementing agency auspice, and staff characteristics. Given the paucity of research in several areas covered by federally sponsored systematic reviews, it is unlikely that any review would cover all types and combinations of target populations, communities, and service delivery settings of interest to users. However, the development of a search engine could lay the groundwork for helping users more efficiently sift through a body of evidence that will likely grow over time. Moreover, users could determine which dimensions of fit are priorities for them and tailor their search accordingly.
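The kind of tag-based filtering described above can be sketched in a few lines. The example is purely illustrative: the `Intervention` class, the tag dimensions, and the catalog entries are hypothetical and do not describe the data model of any existing review website.

```python
from dataclasses import dataclass, field


@dataclass
class Intervention:
    name: str
    # Maps a dimension of fit (e.g., "urbanicity") to the set of values
    # under which the intervention has been tested.
    tags: dict = field(default_factory=dict)


def filter_interventions(interventions, **criteria):
    """Return interventions tested under every requested condition."""
    return [
        item
        for item in interventions
        if all(value in item.tags.get(dim, set()) for dim, value in criteria.items())
    ]


# Hypothetical catalog of reviewed interventions.
catalog = [
    Intervention("Model A", {"urbanicity": {"urban", "rural"}, "age_group": {"adolescent"}}),
    Intervention("Model B", {"urbanicity": {"urban"}, "age_group": {"adult"}}),
    Intervention("Model C", {"urbanicity": {"rural"}, "age_group": {"adolescent", "adult"}}),
]

# A user serving adolescents in a rural community chooses two priority dimensions.
matches = filter_interventions(catalog, urbanicity="rural", age_group="adolescent")
print([m.name for m in matches])
```

Because each criterion is optional, users can search on only the dimensions of fit that matter most to them, and a search with no matches signals a gap in the evidence rather than a failure of the tool.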
Readiness for Scale-Up
Of the evidence reviews that we examined, only NREPP provides a rating system of readiness for scale-up, which it terms “Readiness for Dissemination.” NREPP’s Readiness for Dissemination ratings summarize the quantity and quality of implementation resources available for an intervention. A requirement for submission of an intervention to the NREPP systematic review is that the following materials are developed and can be made publicly available:
(1) Implementation materials (e.g., treatment manuals and information for administrators)
(2) Training and support resources (e.g., training curricula and ongoing consultation)
(3) Quality assurance procedures (e.g., protocols for gathering process and outcome data and ongoing monitoring of intervention fidelity)
Readiness for Dissemination reviewers 5 independently rate each of these dimensions on a scale of 0–4, with higher scores indicating both higher quantity and quality of these dissemination materials. In addition to the numerical ratings, NREPP reviewers provide a subjective evaluation of dissemination strengths (e.g., noting whether fidelity measurement tools are straightforward and easy to use) and weaknesses (e.g., noting whether an implementation manual has dense content and poor formatting that make it difficult to use).
NREPP is unique in rating interventions’ readiness for dissemination, but all of the systematic reviews provide some program information relevant to scale-up (Table 3). Most reviews consistently provide information related to model specification (e.g., noting the model’s target population), though supports for implementation (e.g., fidelity criteria) and information on replication and implementation experiences are less consistently noted across the reviews.
The systematic reviews vary in requirements regarding the presence of program information and implementation. For example, NREPP limits reviews to programs for which implementation materials are developed and could be made publicly available. The TPP Evidence Review, on the other hand, evaluates some programs that are still under development or had no formal materials available. The MIECHV legislation, associated with HomVEE, specifies a number of implementation requirements for models with evidence of effectiveness, including stipulations that the model must have existed for at least 3 years and that the model is associated with a national organization or higher education institution offering comprehensive program standards. HomVEE’s prioritization scheme also includes the availability of implementation information.
The systematic reviews also take different approaches to obtaining implementation information that can inform scale-up. For example, CLEAR draws this information from implementation studies evaluated during the review, and HomVEE collects this information systematically from developers, manuals and other program materials, implementation reports, and additional sources. Crime Solutions sometimes links users to other sites where the user can obtain implementation information. Others simply report this information when it is reported in the rigorous evaluations they reviewed. For example, the What Works in Reentry Clearinghouse summarizes the quality of implementation when it is available.
Within each systematic review, the information available for reporting may not be consistent across models because it is based on data available from researchers, developers, or purveyors. Of the program information related to scale-up, features related to model specification are the components of program information most consistently reported across program models. All of the systematic reviews provide information specifying the target population (or the samples with which the model was tested) and core intervention components. Some reviews mention pedagogical approaches alongside the intervention components. To the extent that the information is available, most reviews specify dosage, such as the expected frequency and the duration of services.
Reporting on adaptations also varies by review. HomVEE describes adaptations or enhancements to the program models (e.g., noting whether a program is adapted for online delivery or enhanced by being paired with another service delivery component). NREPP notes in its adaptations sections whether manuals are translated into other languages. The TPP Evidence Review notes whether adaptations are allowable for different organizations or populations. This information can be important for helping users assess fit, especially if none of the interventions with evidence of effectiveness is well suited to the target population. For example, program materials might need to be translated into other languages or otherwise culturally adapted.
All reviews have at least some information on supports for implementation, but consistency varies. Most reviews note the presence of manuals, training, and technical assistance when available. Not all systematic reviews report on fidelity standards that implementing agencies may be required to meet and systems for assessing or monitoring fidelity of implementation; however, these factors would be critical for supporting replication and scale-up. HomVEE and the TPP Evidence Review are the only systematic reviews that include considerable information on all of the supports for implementation specified in Table 3.
Information on replication and implementation experiences also varies by systematic review. For example, reviews such as NREPP, HomVEE, and SFER provide information describing the timing and location of known sites where the model was previously implemented (e.g., specifying when and where the model was first implemented and where it has been implemented since). Most of the systematic reviews provide information on whether the model was tested in more than one study, including information on whether effects were replicated with nonoverlapping samples. CLEAR assesses the quality of implementation studies, providing opportunities to summarize findings related to implementation challenges and reported solutions and evaluate the quality of the implementation research. Most of the systematic reviews include information for contacting a developer or purveyor.
In addition to the variation in information, the presentation of information varies across the reviews. For example, HomVEE creates implementation reports that are linked to each model included in the review, and NREPP, SFER, and Crime Solutions incorporate implementation and dissemination information into model summaries. CLEAR assesses the quality of implementation research for selected studies deemed relevant for decisions about programs and policies. Most of the program information components in Table 3 are reported for CLEAR only in the context of these select implementation studies rather than reported for every study or model. Although this information is useful, other reviews’ methods of linking implementation information to all model-level profiles are particularly user friendly for website visitors. Model-level profiles are easily accessible on the website, follow a consistent format, provide consistent content across models, and are easily linked to the summaries of effectiveness research.
Feasibility of Implementation
In addition to interpreting the evidence of an intervention’s effectiveness and understanding the resources available to support implementation, users need to consider several practical issues when deciding which intervention to adopt. Of the eight federally sponsored reviews considered, HomVEE, SFER, and the TPP Evidence Review provide the most systematic information to help users assess the feasibility of implementing interventions included in the review, specifically with regard to staffing and resource requirements (Table 3). An implementation profile of each intervention reviewed includes a section of prerequisites for implementation with information about the allowable types of implementing agencies, staffing requirements including staff education and experience, staff ratio requirements, and data systems/technology requirements. In HomVEE, another section of the profile documents program costs, including the average cost per family, labor costs, purchase of program model or operating license, materials and forms, and training and technical assistance.
The other reviews provide some information for assessing feasibility of implementation but not consistently for every intervention reviewed. NREPP’s intervention summaries include information about required staff qualifications and a cost section, primarily focused on the cost of materials and training. Some NREPP summaries also provide general information about the basis for operating costs. The WWC provides information about program costs, often limited to the cost of purchasing curricula and related materials. Crime Solutions is similar to the WWC, consistently reporting on program costs when that information is available. CLEAR reports on selected staffing and resource requirements when the relevant topics are examined as part of an implementation study review.
Information on feasibility is essential for users’ decision-making and should be prominently reported by systematic reviews. For example, an implementing agency may identify an evidence-based program that has been effective with its target population and has adequate implementation supports available. However, the cost per client may be more than the agency can afford, or the purveyor may require staff with qualifications that are not readily available in the community. Moreover, information on feasibility should be reported at the program model rather than study level, and it should be easy to access. One option may be to create a search engine or add fields to an existing search engine, so users can search for interventions that can be implemented by staff with different levels of credentials, within different ranges of annual cost per client, or with different levels of technology usage.
As with all types of information, systematic reviews are limited to reporting on cost and other feasibility information that is available from developers and purveyors or in the research literature. Thus, the degree of information available is likely to vary greatly both across and within substantive fields. Nevertheless, reporting consistent information about feasibility, even if most fields are reported as “information not available,” sets important expectations for program developers and purveyors (that they should provide the information) and for users (that feasibility is an important dimension of their adoption decision).
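A search of this kind can be illustrated with a minimal sketch. The profile fields, credential levels, and example models below are hypothetical and are not drawn from any of the eight reviews; the sketch only assumes that each model-level profile records a minimum staff credential and an annual cost per client, and that either field may be marked as not available.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ordering of staff credential levels (an assumption for illustration).
CREDENTIAL_RANK = {"none": 0, "bachelors": 1, "masters": 2}


@dataclass
class ModelProfile:
    """Model-level feasibility profile; None means 'information not available'."""
    name: str
    min_staff_credential: Optional[str] = None
    annual_cost_per_client: Optional[float] = None


def search(profiles: List[ModelProfile],
           max_cost: Optional[float] = None,
           staff_credential: Optional[str] = None,
           include_unknown: bool = False) -> List[str]:
    """Return names of models that meet the user's feasibility constraints.

    Profiles missing a field the user filters on are excluded unless
    include_unknown is True.
    """
    results = []
    for p in profiles:
        if max_cost is not None:
            if p.annual_cost_per_client is None:
                if not include_unknown:
                    continue
            elif p.annual_cost_per_client > max_cost:
                continue
        if staff_credential is not None:
            if p.min_staff_credential is None:
                if not include_unknown:
                    continue
            elif (CREDENTIAL_RANK[p.min_staff_credential]
                  > CREDENTIAL_RANK[staff_credential]):
                continue
        results.append(p.name)
    return results


# Made-up profiles used only to exercise the search.
PROFILES = [
    ModelProfile("Model A", "bachelors", 3000.0),
    ModelProfile("Model B", "masters", 1500.0),
    ModelProfile("Model C"),  # developer supplied no feasibility information
]

print(search(PROFILES, max_cost=2000))
print(search(PROFILES, max_cost=2000, include_unknown=True))
```

Recording missing values explicitly, rather than omitting the field, lets users decide whether models with unreported costs or staffing requirements should appear in their results.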
Implications for the Field of Systematic Reviews
Federally sponsored systematic reviews offer an unprecedented opportunity to disseminate critical information to support users’ decision-making during the crucial process of adopting an evidence-based intervention. Indeed, the websites these evidence reviews maintain provide an ideal platform for guiding practitioners through assessing needs, fit, and capacity to select an evidence-based intervention appropriate for their circumstances. For example, users may need to select from among multiple programs that have evidence of effectiveness in the same domain or across multiple domains. Review websites can organize and present this information clearly, along with other external validity and implementation information, to help users make the selection that best meets their needs. This can include model-level information about fit with a range of target populations and service delivery systems, readiness for scale-up, and feasibility issues such as costs and staffing requirements. Several potential strategies can facilitate this process, including model-level summary reports and ratings as well as search engines and tags that incorporate these factors (also see Haegerich, David-Ferdon, Noonan, Manns, & Billie, In press, in this issue, for a discussion of the use of technical packages in the public health sector).
The eight federally sponsored evidence reviews we examined all provide information that can help users interpret findings on evidence of effectiveness and assess an intervention’s fit with their target population and context, readiness for scale-up, and feasibility of implementation. Most of the reviews, however, are uneven in the amount of information they provide in each of these areas, the accessibility of the information, and the consistency with which they provide the information across interventions. For all eight reviews, there is ample room for improvement in supporting users’ adoption decisions through more detailed, accessible, and consistent information in these areas.
As evidence-based policy making and the use of evidence-based interventions to meet policy goals continue to gain traction, systematic reviews inevitably take on more prominence as authoritative sources of information about evidence-based interventions. Staff from state and local agencies who receive federal funds to implement a range of education and social programs increasingly look to systematic reviews for guidance about which interventions can best meet the needs of their target populations. This article describes some opportunities to better support decisions about adopting evidence-based programs and offers suggestions for the type and format of information that systematic reviews should provide.
An essential next step is to test and refine these ideas on the basis of user input. Some systematic reviews have engaged in limited usability testing of their websites, but more extensive user input and feedback are critical for ensuring that the information presented is accessible, useful, and understood as intended by the target audiences. Systematic reviews should consider conducting formative assessments to better understand resource utility and user satisfaction and to focus improvements. Systematic reviews can solicit input on their websites through user surveys and at grantee conferences and other venues where users are likely to be present. Web analytics can facilitate understanding of how users interact with systematic reviews. As search engines and other new components of systematic review websites are developed, focused usability testing should be conducted on the new components.
Federally sponsored systematic evidence reviews can also benefit from cross-review consultation and coordination about strategies for meeting users’ needs. Some coordination is already underway in the form of quarterly meetings of systematic review team leaders and topical meetings sponsored by federal agencies. Although they share many characteristics, the reviews examined for this article were initiated for different purposes and examine varied bodies of evidence. They need not adhere to identical processes and standards, but coordination on some issues that transcend discipline and purpose may be helpful. Examples of two such issues are the merits of reviewing named program models versus more general practices and the value of rating readiness for scale-up.
All of the eight reviews examined in this article review named program models, and two also review practices that are not tied to specific branded programs. Assessing the evidence of effectiveness for individual program components or practices not tied to a specific program model can provide useful information for practitioners. These effective components or practices could be adopted and incorporated into existing programs without the need to purchase materials and training from a specific model developer or purveyor. They may also be helpful for users targeting populations for which there are no proven program models. However, implementation of the practice may vary across studies, leaving issues of dosage and other specifications unclear or open to interpretation. Practices not connected to a specific program model would likely lack supports for implementation such as manuals and training materials, and information about costs and other resource requirements may not be available. In contrast, a well-specified program model with implementation guidance, training, and other supports can be implemented with fidelity to the model tested in research and found to be effective. Depending on users’ circumstances, such as availability of resources to purchase implementation materials or the target population of interest, both approaches may yield helpful information.
Thus far, only NREPP goes beyond providing information to producing a rating of readiness for scale-up. Synthesizing this complex information into a concise and accessible rating may be helpful for users of other reviews, especially if consensus can be reached about which elements of scalability are most important. Reporting consistent information about readiness for scale-up sends a strong signal to model developers and purveyors about the level of support they should be prepared to offer users and what it means to fully develop an intervention that is ready to be scaled up, such as through a federal grant program. Reporting consistent information about staffing requirements and costs conveys a similar message. However, reviews may face resistance from model developers and purveyors to such ratings and criticism about how the ratings are constructed. Especially when eligibility for federal funds is implicitly or explicitly tied to systematic review ratings, the stakes for the rating process are high (see Goesling et al., In press, in this issue, for a discussion of the unique considerations when developing one high-stakes systematic review). Most reviews have limited their ratings to areas of broad consensus, such as the internal validity of studies. Although this may be a prudent approach for dimensions that have not been previously rated, the lack of ratings requires users to synthesize the information and draw their own conclusions about which dimensions of scalability they should weigh most heavily.
Federally sponsored systematic reviews have developed strong criteria and methods for assessing study quality and evidence of effectiveness reported at the study level. Ultimately, developing additional tools to synthesize and weigh evidence at the model or practice level, as well as on issues of external validity, implementation, and scalability, can better support users’ adoption decisions and thus help realize the promise of evidence-based programs to address social problems and improve outcomes for populations of interest.
Footnotes
Acknowledgment
The authors would like to acknowledge our colleagues at Mathematica Policy Research who provided thorough reviews and thoughtful comments on this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
