Abstract
Recognizing the need for scientific fidelity and balanced representation in the evidence that informs public policy, this study investigates technical and issue bias in 43 policy briefs and state handbooks that provided information about the use of Student Learning Objectives to evaluate teachers’ performance. The author uses multiple qualitative methods to categorize the contributors to the focal documents, identify the evidence they drew upon, and determine how they represented the information to their targeted audiences. The study reinforces the findings of prior research by documenting the outsized impact of advocacy groups in a policy-related evidence base. The results make an important addition to the scholarly literature by cataloging an array of technical assistance providers that translated and disseminated evidence to decision makers and spotlighting the various ways biased information appeared in the publications. Throughout, the study reinforces how incentives and timing shape evidence production and use in policymaking.
Keywords
Understanding the origins and manifestations of bias is a first step toward preventing it from distorting public policy. In part, these understandings involve information suppliers and users at various stages of policymaking and how they are incentivized to present and use research (McDonnell & Weatherford, 2020). Education policy researchers, therefore, have paid attention to the interconnections between private foundations and advocacy organizations in U.S. policy networks and how they used their ample resources to advance favored policy ideas (Lubienski et al., 2016; Reckhow & Snyder, 2014). Such alliances promoted the adoption of teacher accountability reforms (Reckhow et al., 2021; Tobiason, 2019; Tompkins-Stange, 2016), charter schools (DeBray et al., 2014), and school vouchers (Lubienski et al., 2009, 2016). Fewer studies have focused on how policymakers use evidence to diagnose problems, understand the merits of reforms, or justify their policy positions (e.g., Asen et al., 2013; Coburn et al., 2009; Jabbar et al., 2014). Substantial gaps remain in our knowledge of the mechanisms that transmit information to decision makers at various levels of government and the array of parties involved in this work.
The current study extends scholarly discussions about evidence in the policy arena by investigating policy briefs and associated documents. Policy briefs constitute a worthy topic for this investigation because they are a popular strategy for communicating with policymakers and implementers (Cooper, 2014). According to Beynon et al. (2012), “A policy brief is a concise standalone document that prioritises a specific policy issue and presents the evidence in non-technical and jargon-free language” (p. 12). Authors design policy briefs to efficiently translate and transmit evidence to the individuals who develop or enact policies (Dodson et al., 2012). Although no studies appear to have focused on education policy briefs, researchers examined policy briefs in the fields of health care (Dodson et al., 2012) and agriculture (Beynon et al., 2012) to determine how their intended audiences received the documents.
The policy briefs in this study focused on Student Learning Objectives (SLOs). SLOs are a relatively new teacher evaluation method in the United States that requires teachers to set data-based learning goals, identify appropriate assessments, track their students’ progress over time, and determine the extent to which their own goals were met (Lachlan-Haché et al., 2012a). SLOs were popularized after advocacy groups interested in making teacher evaluation more data-based and reliable successfully promoted rapid changes to federal education policy at the beginning of the Obama administration (Tompkins-Stange, 2016; Weisberg et al., 2009). Federal Race to the Top (RTT) grant applications and No Child Left Behind (NCLB) waivers appeared to codify rigorous evaluation methods into policy by requiring states to evaluate all teachers by the extent to which they promoted growth in student achievement (U.S. Department of Education [USED], 2009, 2012). The policies, however, also resulted in half of the states adopting SLOs, an evaluation method whereby many teachers were “in essence grading themselves” (Gill et al., 2013, p. ii).
To better understand how an evaluation method at odds with the reformers’ original intent became widely adopted by state policymakers, the current study focuses on documents related to SLOs. SLOs are an ideal topic for this investigation because (a) SLOs were one of several competing policy options, (b) SLOs are the focus of many policy briefs, and (c) officials in states that adopted SLOs often explained their reasoning in handbooks targeted to district-level decision makers and implementers. As such, an examination of SLO policy briefs, state handbooks, and the research they cited has the potential to reveal sources of bias in policy arguments and evidence uptake during policy stages that shifted from federal to state and local levels. The following work depicts an extensive analysis of 43 policy briefs and state handbooks that centered on the following questions:
Investigating these questions involved the compilation of a comprehensive corpus of literature related to SLOs and bibliographic tracing to illustrate patterns among the funders, authors, and publishers of the works targeted for analysis. Qualitative content analysis then documented how authors of policy briefs and state handbooks used evidence to present and justify their claims about SLOs. Parkhurst’s (2017) concepts of technical and issue bias aided in discerning when biased depictions of SLOs stemmed from insufficient fidelity to the norms of scientific research and when they were due to incomplete representation of the evidence base. The findings add to the scholarly literature by documenting the outsized impact of policy advocates among the publications, the array of technical assistance providers who translated and disseminated evidence to policymakers, and ways biased information appeared in the documents. The following sections present the conceptual and empirical bases for the study, discuss SLOs in greater detail, and then describe the research findings and their implications. Throughout, the study illustrates how incentives and timing shape evidence development and its use in policymaking.
Conceptual Framework: Evidence Use in Education Policy
An investigation of policy briefs requires consideration of whom they were written for, with what intent, and whether the documents accurately conveyed information to their audiences. The current study’s analytic framework, therefore, relies heavily on McDonnell and Weatherford’s (2020) theories about evidence use in education policy and Parkhurst’s (2017) concepts of technical and issue bias. McDonnell and Weatherford offer frameworks for (a) considering policymakers’ needs and incentives at various stages of policymaking, and (b) categorizing evidence contributors. These frameworks overlap somewhat with Reckhow et al.’s (2021) theory of the Political Economy of Knowledge Production, but McDonnell and Weatherford’s work is broader in its incorporation of apolitical actors and the inclusion of policy stages after initial uptake. Parkhurst’s concepts provide a means for identifying the many ways evidence can be mischaracterized or misused in the policy arena.
Policymakers and Their Evidence Needs
As a policy advances from conceptualization to implementation, the decision makers and their evidence needs shift (Feuer, 2016). McDonnell and Weatherford (2020) identified four policy stages: (a) problem definition and framing solutions, (b) policy design, (c) policy enactment, and (d) implementation. In the United States, the federal government has mostly influenced education policy by defining problems and financially incentivizing favored solutions. Federalism dictates that U.S. education policy is primarily enacted by the states and implemented at local levels. Thus, at the first three of McDonnell and Weatherford’s policy stages, evidence is targeted toward federal, state, or local elected officials (and their staffs) who must craft and adopt policies during limited policy windows when changes in public opinion or administration signal that vital constituents and decision makers are receptive to action (Kingdon, 1984). At the final policy stage, information is targeted to the implementers who will interpret the policy, consider how it aligns with their own priorities, and determine how to realize its aims in their local contexts (Leithwood, 2018). The parties involved are, therefore, challenged to make sound decisions with the evidence available when it is feasible for them to act (Feuer, 2016).
Because policies must conform to political and practical realities (Coburn et al., 2009), decision makers require information, “not only about the effectiveness of a procedure and the relationship between the risks and the benefits, but also about its acceptability to key constituencies, its ease and cost of implementation” (Nutley et al., 2002, p. 7). Most policymakers, consequently, take a broad view of evidence and interpret information through their preexisting beliefs and preferred policy positions (Brown, 2015). They also favor convenient, straightforward information that points confidently to practical solutions (Caswill & Lyall, 2013; Henig, 2009). Policymakers’ evidentiary preferences contrast with traditional research outputs that often present nuanced findings in lengthy, inaccessible articles filled with academic jargon. Thus, few policymakers engage directly with scholarly research and many rely on descriptive data, think tank reports, anecdotes, blogs, and other sources academics categorize as “weak” evidence (Asen et al., 2013; Huguet et al., 2021; Jabbar, 2015; Jabbar et al., 2014). Such evidence has many sources.
Evidence Contributors and Their Incentives
A variety of funders, researchers, authors, and publishers contribute to the production and dissemination of education-related evidence. Although various categorization schemas exist, McDonnell and Weatherford (2020) sorted evidence contributors into research producers and three subcategories of intermediary organizations: translators and disseminators, policy advocates, and hybrid organizations. These categories include public and private entities that vary in terms of their activities, incentives, and the extent to which they promote their own policy agendas.
Research Producers
Research producers involve the traditional sources of universities and independent research institutions (e.g., RAND Corporation, American Institutes for Research [AIR]). Research producers are “guided by the canons of the scientific method that constrain both the production of research and the framing of results” (McDonnell & Weatherford, 2020, p. 56). Henig (2009) expounded on traditional research norms: The academy, writ large, socializes—albeit imperfectly—researchers into a normative outlook and incentive system that tends to reward care in design, attention to topics of long-term importance, transparency of method, openness to data sharing and replicability, framing studies within a larger body of research, testing ideas in conferences, scrutiny through peer review. (p. 148)
As such, research producers are incentivized to produce generalizable knowledge through reliable studies that can withstand external review. Although it may be tempting to characterize traditional research producers as “unbiased,” all research products reflect their authors’ worldviews in terms of the topics deemed worthy of investigation, the framing of problems, the formulation of research questions, and the interpretation of results. Some researchers may also be particularly attuned to the needs of funders or policymakers. Research producers maintain their credibility by conforming to expectations that “analysis of policy implications will be balanced—transparent in the assumptions and data underlying their results, clear about limitations on the findings, and explicit as to the strengths and weaknesses of different policy options” (McDonnell & Weatherford, 2020, p. 57). Due to these expectations for neutrality, researchers with an explicit agenda are included with other policy advocates in the Hybrid category.
Translators and Disseminators
Translators and disseminators belong to the group often referred to as mediators (e.g., Levin, 2011) because they make the products of research producers accessible to policymakers and practitioners. McDonnell and Weatherford (2020) envisioned translators and disseminators as nonpartisan technical assistance providers. This group includes private businesses available for hire that range in size from individual consultants to large global consulting firms (Gunter & Mills, 2016). Translators and disseminators also include government-sponsored entities that assist provinces, states, or school districts, such as the USED Institute of Education Sciences (IES) and regional research and development laboratories (Cooper et al., 2009). As an example, IES’s What Works Clearinghouse (2016) publishes systematic reviews and ratings of various educational intervention studies based on their alignment with the rigorous methodological standards of randomized controlled trials.
Translators and disseminators are challenged to identify high-quality research, interpret it responsibly, and tailor their products to nonexpert audiences. To accommodate audience preferences, authors from this category may combine research findings with other nonempirical forms of evidence because “often clients are more persuaded by the professional judgement and experience of peers and by compelling anecdotes” (McDonnell & Weatherford, 2020, p. 58). Nevertheless, the need to maintain credibility in a competitive information marketplace may incentivize translators and disseminators to present evidence responsibly (Feuer, 2016).
Member Organizations
Member organizations are also intermediaries that distribute evidence, but they have explicit policy agendas (McDonnell & Weatherford, 2020). In the field of education, member organizations include employee unions (e.g., American Federation of Teachers [AFT], National Education Association [NEA]), professional associations (e.g., American Association of School Administrators [AASA]), chambers of commerce, and charter school coalitions, among others. These organizations have expressed missions to further their members’ ideological or material interests. They do so by deploying media strategies, directly lobbying government officials, and publicly disseminating literature reviews, reference lists, research summaries, policy briefs, and fact sheets (Cooper, 2014). Member organizations are unconstrained by the norm of balanced reporting and often use research to bolster their arguments and win support for their positions (McDonnell & Weatherford, 2020).
Policy Advocate Hybrids
Hybrid organizations or “synthesizers” (Reckhow et al., 2021) also act as intermediaries and policy advocates by investing their resources to convince policymakers their favored policy solutions are desirable, feasible, and palatable to constituents. Hybrids differ from member organizations by lacking a defined membership base and more often engaging with research production (McDonnell & Weatherford, 2020). Hybrids include think tanks (e.g., Independence Institute), mission-oriented nonprofit organizations (e.g., The New Teacher Project [TNTP]), and policy-oriented foundations (e.g., Bill and Melinda Gates Foundation, Eli and Edythe Broad Foundation). Some hybrids resemble “venture philanthropists” who use grants to fund “advocacy research” or pilot studies of innovative reforms designed to yield measurable results that advocates can use to justify favored policies (Lubienski et al., 2016). As the number of hybrid organizations has grown substantially in recent decades (Feuer, 2016), these groups have increasingly formed networks to concentrate their resources on issues of common interest (DeBray et al., 2014; Lubienski et al., 2016; Reckhow & Snyder, 2014; Scott & Jabbar, 2014).
Biased Uses of Evidence
The wide array of actors involved in producing education-related evidence could beneficially increase the amount and variety of information available to policymakers and help them accurately assess problems and potential solutions. Yet, more research also provides more opportunities for bias to enter the policy arena from error, ignorance, or strategic manipulation. According to Parkhurst (2017), technical bias involves evidence that strays from the principles of the scientific method. Technically biased pieces of evidence lack soundness or precision in terms of their planning, data collection, analysis, or reporting. Technical bias also includes the selection or interpretation of research findings in a manner inconsistent with their methods or conclusions. Issue bias reflects the ways in which people violate the value of democratic representation by using evidence to “shift the political debate to particular questions or concerns in a non-transparent way” (Parkhurst, 2017, p. 8). Issue bias can arise through the framing of a problem to center a limited set of concerns, ignoring certain types of populations or outcomes, or privileging particular forms of evidence, such as randomized controlled trials. Common indicators of bias include “cherry picking” favorable studies, mischaracterizing findings, ignoring disconfirming evidence (Brown, 2015), and “appeals to ‘evidence’ as a purely rhetorical strategy to gain support” (Parkhurst, 2017, p. 108).
It remains unclear whether the expansion of private interests into education policy has increased or decreased bias in the evidence presented to policymakers. Because the federal government has historically been the greatest investor in education research in the United States, Feuer (2016) argued that privately funded research offers a counterweight to mitigate “group think” within the education bureaucracy. McDonnell and Weatherford (2020) thought hybrid organizations would be incentivized to maintain their credibility by producing high-quality evidence. Critics, however, question whether the expansion of advocacy research has limited researchers’ focal topics, eroded evidentiary standards, and incentivized the opportunistic disclosure of results (e.g., Henig, 2009; Lubienski et al., 2009).
Researchers investigating the fidelity of evidence presented to policymakers used discourse analysis (Asen et al., 2013), rhetorical analysis (Tobiason, 2019), bibliographic ethnography (Edwards et al., 2020), and critical analysis of research methods and findings (Lubienski et al., 2009). Studies that reviewed citations and reference lists showed policy advocates repeatedly cited favored studies despite their weak methodology and inconsistent or uncertain effects (Edwards et al., 2020; Lubienski et al., 2009). Participants in school board debates bolstered their claims with vague, general references to “research” or “studies” and erroneously presented researchers as a single actor who held a consensus position (Asen et al., 2013). Tobiason (2019) determined that proponents of using value-added models (VAMs) to evaluate teachers dismissed disconfirming evidence with unjustified reasoning that the concerns would be resolved. After interviewing policymakers and other entities involved with education reforms in New Orleans, Jabbar et al. (2014) concluded that “policymakers are receiving evidence that is filtered, and much of it is not based on peer-reviewed research, but instead is derived from descriptive trends data, think-tank reports, or selected academic works that support the perspective of the intermediary organization” (p. 1024).
Some evidence suggests wealthy policy advocates may have played an outsized role in setting some education policy agendas or disproportionally influenced decisions about public schools. A study focused on the advancement of market-based reforms in U.S. urban school districts determined that private foundations and their partners were more successful in promoting favorable evidence to policymakers than the less-resourced opposition from teachers, parents, and community members (Scott & Jabbar, 2014). Other researchers documented the disproportional participation of entities associated with private foundations in the policy discussions that led to the federal teacher evaluation student growth mandates (Reckhow & Tompkins-Stange, 2018; Tompkins-Stange, 2016).
Empirical Grounding: SLOs
The following sections provide context for the analysis that follows by applying the concepts raised thus far to the topic of SLOs. After outlining the merit pay and comprehensive teacher evaluation reform policy windows when SLOs were developed and advanced, the narrative turns to presenting the evidence for SLOs as effective teacher performance measures.
Merit Pay Initiatives
SLOs were birthed in Denver Public Schools as part of a teacher merit pay initiative (Bell et al., 2001). Denver’s investment in teacher merit pay was at the forefront of a growing teacher accountability movement. During this period, policy briefs advocating for reform (a) characterized teachers as the most important variable in student performance, (b) defined teacher effectiveness as primarily involving student achievement gains, and (c) promoted “work force policy” solutions such as rigorous evaluations of teacher performance coupled with incentives or sanctions (e.g., Center for Teaching Quality, 2007; Goldhaber, 2006). Policy advocates interested in making teacher compensation less dependent on uniform salary schedules dismissed ongoing concerns about the difficulty of measuring teacher effectiveness and the high failure rates of merit pay initiatives since the early 1900s (e.g., Murnane & Cohen, 1985). Groups such as the New Schools Venture Fund and the Gates, Broad, and Walton Foundations believed schools could repurpose student standardized testing data to determine teacher effectiveness (Brewer et al., 2015; Goldhaber, 2006; Kraft, 2018; Podgursky & Springer, 2007). These efforts contributed to the establishment of the federal Teacher Incentive Fund (TIF) grant program that from 2006 to 2016 awarded US$2 billion to “high-need” charter schools, districts, and states willing to use alternate forms of teacher compensation to close achievement gaps (USED, 2015).
Denver’s Pay for Performance initiative and SLOs began in 1999 as part of a voluntary pilot in 12 elementary schools (Bell et al., 2001). The Denver pilot involved advocacy research designed to “establish a linkage between teacher compensation and student achievement” and to be “a local pilot with far-reaching implications” (Bell et al., 2001, p. 14). In 2005, Denver branded the initiative “ProComp” and, after receiving a TIF grant, expanded merit pay to 90% of their schools (Congressional Research Service, 2011). Two other high-profile merit pay initiatives in the Austin and Charlotte-Mecklenburg school districts also received TIF grants, involved SLOs, and produced evaluation research. Although the merit pay initiatives in Charlotte-Mecklenburg and Austin were short-lived, Denver maintained merit pay and SLOs until 2019 when its teachers’ union won major concessions from the district after a 3-day strike protesting the unpredictability of teachers’ paychecks (A+ Colorado, 2019; Hendee, 2019). Denver dropped SLOs from ProComp, but the initiative’s 20-year history yielded several comprehensive evaluation reports that extensively documented SLOs (e.g., Bell et al., 2001; Briggs et al., 2014; Slotnik et al., 2004).
Comprehensive Teacher Evaluation Reform
Recognizing an open policy window at the beginning of the Obama administration, teacher accountability advocates turned their attention to codifying teacher evaluation reforms into federal policy (Reckhow et al., 2021; Reckhow & Tompkins-Stange, 2018). Fueled by evidence from policy briefs such as TNTP’s “The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness” (Weisberg et al., 2009), the U.S. Congress and USED developed a comprehensive framework for reforming state teacher evaluation systems (Reckhow et al., 2021). The RTT competitive grants initiative that awarded US$4.35 billion to 12 states from 2009 to 2011 centered on teacher evaluation reform. Teacher evaluation was also a central component of the waivers from the NCLB sanctions that would have been triggered in 2014 when schools failed to bring all students to proficiency in math and reading. Scoring rubrics for both the RTT grants and NCLB waivers required teacher evaluation systems to include multiple measures and for all teachers to be evaluated by the extent they promoted growth in student achievement (USED, 2009, 2012). These federal policies incentivized teacher evaluation reform through strategic scoring, annual reporting, conducting site visits, and sanctioning states that failed to meet their commitments (Kraft, 2018). The federalist design of the U.S. education system left the specific design, enactment, and implementation of teacher evaluation systems to the states.
At the initiation of RTT, no statewide teacher evaluation system employed all the features rewarded by the program’s scoring rubric (Kraft, 2018). A particular dilemma facing state-level policy designers was that only 20% to 30% of teachers taught students who took state-mandated standardized tests of math and reading; the remaining teachers, thus, lacked a clear source of data for the calculation of student growth scores (Prince et al., 2009). In a series of policy briefs targeted to state policymakers, experts identified three potential growth measures for teachers of nontested subjects and grades: (a) schoolwide mathematics and reading scores, (b) standardized tests for all subjects and grades, and (c) SLOs (Holdheide et al., 2010; Marion & Buckley, 2011; Marion et al., 2012). According to the briefs’ authors, each option had significant drawbacks.
SLOs as Effective Measures of Teacher Performance
During the period when states crafted their comprehensive teacher evaluation policies, a limited evidence base supported the use of SLOs as teacher performance measures. Most studies of SLOs came from evaluations of the merit pay systems in the school districts of Denver, Austin, and Charlotte-Mecklenburg. These origins raise questions about selection bias and external generalizability to evaluation systems without financial incentives. The correlational methods used in the Austin and Denver evaluations revealed weak or inconsistent relationships between the percentage of teachers at a school who met their SLOs and student growth in mathematics and reading (Schmitt et al., 2009; Slotnik et al., 2004). The Charlotte-Mecklenburg researchers evaluated SLOs using a longitudinal quasi-experimental design that matched 16 participating schools with comparison schools (Slotnik et al., 2013). Using cross-sectional hierarchical linear models, the researchers found small statistically significant, positive relationships between SLO attainment and student achievement in math and reading at only the elementary level.
Gill et al.’s (2013) and Lachlan-Haché’s (2015) reviews of the SLO literature recognized the inconsistent statistical associations between SLOs and small improvements in academic achievement. They also noted that SLOs incentivized recommended teaching practices in some contexts, but little data indicated SLOs sufficiently distinguished between teachers’ levels of instructional proficiency. Gill et al. (2013) questioned whether the evidence represented “true differences in teacher performance or random statistical noise” (p. 11). Lachlan-Haché’s (2015) review was somewhat more positive, noting SLOs may encourage teachers to be more collaborative, attentive to the quality of their work, and invested in the process of evaluation. Both reviews observed that implementation varied widely across contexts and SLOs were difficult to implement because they required ample information resources, data and assessment literacy, and sufficient time to implement properly. Gill et al. (2013) also reasoned it would be “nearly impossible” to make SLOs a valid and reliable means for evaluating teachers. The authors believed that attaching high stakes to a process where teachers establish learning goals for their own students would incentivize teachers to set low targets.
When the U.S. Congress negated teacher evaluation student growth mandates in 2015 with the Every Student Succeeds Act (ESSA), about half of the states had enacted teacher evaluation policies that required or allowed SLOs (Close et al., 2020). Since that time, many studies of SLOs have documented lackluster compliance among implementers and reinforced the nuanced picture of SLOs depicted above (e.g., Briggs et al., 2014; Donaldson & Woulfin, 2018; Marsh et al., 2017; Mayger, 2022; Robertson-Kraft & Zhang, 2018). Although SLOs offer potential benefits as teacher evaluation measures, researchers raised serious and unresolved concerns that undermined the case for attaching high stakes to teachers’ classroom-level assessments. Yet, the extant literature also suggested that decision makers at various levels of government were unlikely to engage directly with the research on SLOs as they designed, adopted, and implemented teacher evaluation policies. How did various translators and disseminators depict the SLO evidence in the publications they targeted to policymakers? To answer this question, the current study investigates the contributors to SLO policy briefs and state handbooks, the evidence these documents drew upon, and how they represented the evidence to their targeted audiences.
Methods
This qualitative investigation was designed to examine technical and issue bias in publications associated with SLOs. The study involved three phases of analysis that corresponded with the three research questions. Phase 1 identified the affiliations of the policy briefs’ and state handbooks’ funders, authors, and publishers to determine the proportionality of representation across the various categories of contributors at different policy stages (issue bias). Phase 2 used bibliographic tracing to identify the specific evidence that informed the authors’ claims. Phase 3 relied on directed content analysis to determine whether the authors of briefs and handbooks accurately represented the studies they cited (technical bias) and presented a balanced analysis of SLOs as teacher evaluation measures (issue bias).
Data Sources
The data for this study included policy briefs, state handbooks, and published research related to SLOs. SLOs were defined as performance evaluation policies that required public school teachers in Grades K to 12 to set student learning goals and evaluate the degree of success in meeting those goals using local measures over a defined period (Lachlan-Haché et al., 2012a). A complete corpus of SLO literature was compiled to locate policy briefs and state handbooks and facilitate the identification of all references to SLO-related documents during bibliographic tracing. Google and Google Scholar enabled extensive internet searches for publications. The reference pages of found documents also identified sources. When necessary, the internet archive Wayback Machine provided access to documents no longer available at the original locations. This extensive search revealed 179 documents. Three types of documents were targeted for analysis: policy briefs, state handbooks, and the original research studies they cited. Each publication was downloaded and systematically reviewed.
Policy Briefs
Documents were identified as policy briefs when the author labeled the publication as such or the author’s stated intent was to translate information to identified policymakers. Some policy briefs mentioned SLOs within broader discussions of teacher evaluation or merit pay systems and others focused solely on SLOs. The targeted documents included publications that represented themselves as a “research and policy brief” (e.g., Holdheide et al., 2010) or an “evaluation brief” (e.g., Bailey et al., 2016) when the publication’s primary focus was the provision of policy advice. For consistency, the current study uses the term “policy brief” collectively for the documents that met these inclusion criteria. Publications presenting only descriptive statistics of state policies or those that mentioned SLOs incidentally were excluded from the sample. The analyzed documents included 22 policy briefs (see Table 2) that spanned three federal policy windows, as shown in the appendix: (a) TIF grants, (b) RTT grants, and (c) NCLB waivers.
State Handbooks
The search revealed 75 SLO-related documents from departments of education (DOE) websites hosted by the states that required or allowed SLOs as an option for evaluating teachers from 2012 to the present. In keeping with the study’s focus on the transmission of information about SLOs, the documents selected for inclusion from each state provided narrative text explaining SLOs to school personnel. Excluded documents merely listed requirements or presented quality criteria in checklists or rubrics. The authors of included documents identified them as “handbooks,” “guidebooks,” “user guides,” or “administrative manuals.” For consistency, this study uses the term “handbook” to characterize the 21 documents that met the inclusion criteria (see Table 3). Like policy briefs, some documents were solely devoted to SLOs and others included SLOs with other information about teacher evaluation. Ten handbooks came from states where district policymakers could choose whether to adopt SLOs and 11 came from states that mandated the use of SLOs for teacher evaluation.
Research Studies
The initial search revealed 55 original research studies that investigated SLOs. Only the seven research studies cited or referenced by policy briefs or state handbooks were targeted for in-depth analysis.
Analytic Techniques
Phase 1 determined the contributors of each publication and categorized their affiliations. For each policy brief and cited SLO research study, the funders, lead author, and publisher were recorded in a spreadsheet. Anecdotal notes documented any information supplied within the publication about the author’s institutional affiliations and the publishing organization. When institutional affiliations were unclear, a systematic internet search located authors’ workplaces on the dates of publication and sufficient information about the contributing organizations to facilitate categorization. State handbooks were also reviewed to determine whether they involved contributors unaffiliated with the state’s DOE. For the small number of handbooks that involved outside contributors, similar anecdotal notes documented information about them from the handbooks or the contributors’ websites. Funders, authors, and publishers were categorized using the following codes developed from McDonnell and Weatherford’s (2020) work: Research Producers (researchers without an explicit policy agenda), Technical Assistance Providers (translators and disseminators of information without an explicit policy agenda), Member Organizations (policy advocates that represent member interests), and Policy Hybrids (policy advocates that sponsor, produce, translate, and/or disseminate research). An additional code was added for the USED. A matrix was then created to reveal patterns across policy stages.
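The Phase 1 matrix can be illustrated with a minimal sketch. It assumes each publication is recorded as a mapping from role (funder, author, publisher) to one of the five codes named above; the function name and the two example records are hypothetical and are not drawn from the study's data. The sketch shows why category percentages can sum to more than 100% when a publication's contributors come from different categories.

```python
# The five contributor codes named in Phase 1.
CATEGORIES = [
    "USED",
    "Research Producer",
    "Technical Assistance Provider",
    "Member Organization",
    "Policy Hybrid",
]

def category_shares(publications):
    """Percentage of publications with at least one contributor in each category.

    publications: list of dicts mapping a role (e.g., "funder") to a category.
    """
    if not publications:
        raise ValueError("At least one publication is required.")
    totals = dict.fromkeys(CATEGORIES, 0)
    for roles in publications:
        # Count a category once per publication, even if it fills several roles.
        for category in set(roles.values()):
            totals[category] += 1
    n = len(publications)
    return {category: round(100 * count / n) for category, count in totals.items()}

# Hypothetical records, invented for illustration only.
example_publications = [
    {"funder": "USED",
     "author": "Technical Assistance Provider",
     "publisher": "Technical Assistance Provider"},
    {"funder": "Policy Hybrid",
     "author": "Research Producer",
     "publisher": "Policy Hybrid"},
]
shares = category_shares(example_publications)
print(shares)
print(sum(shares.values()))  # exceeds 100 because categories overlap within publications
```

Counting a category once per publication, rather than once per role, keeps a single organization that both authored and published a document from being double-counted.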
Phase 2 investigated where authors obtained their evidence about SLOs. Each claim related to SLOs was extracted and pasted into a spreadsheet, and its stated source of evidence was recorded. Bibliographic tracing involved reviews of the citations, reference pages, and footnotes of each policy brief and state handbook to locate direct citations from the SLO corpus. Evidence sources were coded with the a priori codes of “SLO Research Citation” and “Other SLO Citation” (e.g., literature about SLOs other than original research). The code “Non-SLO Citation” documented the use of non-SLO literature as evidence for specific claims about SLOs. The code “Research” represented general invocations of research without named sources. The authors’ many claims supported by uncited anecdotal examples prompted the development of the code “Anecdote.” The few briefs that presented expert analysis or self-collected data necessitated the in vivo code “Author Expertise.”
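The Phase 2 spreadsheet can be pictured as a list of (document, evidence code) pairs. The following minimal sketch tallies such pairs into a document-by-code matrix and reports the share of documents that used a given code at least once. It is an illustration under stated assumptions, not the procedure used in the study: the function names and the example records are hypothetical.

```python
from collections import defaultdict

# The six evidence-source codes named in Phase 2.
EVIDENCE_CODES = [
    "SLO Research Citation",
    "Other SLO Citation",
    "Non-SLO Citation",
    "Research",
    "Anecdote",
    "Author Expertise",
]

def build_evidence_matrix(coded_claims):
    """Return {document: {code: number of claims}} from (document, code) pairs."""
    matrix = defaultdict(lambda: dict.fromkeys(EVIDENCE_CODES, 0))
    for document, code in coded_claims:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"Unknown evidence code: {code!r}")
        matrix[document][code] += 1
    return dict(matrix)

def share_of_documents_using(matrix, code):
    """Percentage of documents with at least one claim carrying the code."""
    if not matrix:
        return 0.0
    users = sum(1 for counts in matrix.values() if counts[code] > 0)
    return 100 * users / len(matrix)

# Hypothetical records, invented for illustration only.
example_claims = [
    ("Brief A", "Anecdote"),
    ("Brief A", "SLO Research Citation"),
    ("Brief B", "Anecdote"),
    ("Handbook C", "Research"),
    ("Handbook D", "Non-SLO Citation"),
]
matrix = build_evidence_matrix(example_claims)
print(share_of_documents_using(matrix, "Anecdote"))  # 50.0 (2 of 4 documents)
```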
Phase 3 relied on directed content analysis to examine and classify text based on its content or meaning (Hsieh & Shannon, 2005). In this step, the extracted SLO claims were coded using a priori codes resembling those Edwards et al. (2020) used in their similar analysis of the corpus related to Colombian charter schools (e.g., Positive Claims, Neutral Statements, Negative Claims, Critique Methods, Erroneous). Assigning the “Erroneous” code necessitated comparing the text with the cited document to determine whether the authors’ assertions could be reasonably supported by the referenced evidence. For uncertain cases, I engaged in peer debriefing with a colleague to reach consensus on the final code. In a second round of coding, each claim was assigned a subcode that represented the specific claim. The positive claim subcodes were the following: (a) incentivizes quality instructional practices, (b) increases student achievement, (c) invests teachers in evaluation, (d) promotes collaboration, and (e) accurately measures teacher performance. The negative claim subcodes were the following: (a) difficult to implement, (b) unreliable or inconsistent measures, (c) susceptible to cheating or gaming, (d) weak evidence base, and (e) high stakes introduce conflicts of interest. A matrix was then created to reveal patterns across documents.
Limitations
This study investigated bias in SLO policy briefs and state handbooks through document analysis. The findings are limited by the study’s focus on publications within the SLO corpus and the specific states included. The results, therefore, cannot claim to represent the entirety of the SLO policy discourse or generalize to all policy briefs or state handbooks. Although statistical representation is not the purpose of qualitative research, the work that follows makes a valuable contribution by illuminating key contributors and potential sources of bias within documents targeted to policymakers and implementers at various levels of government.
Results
The analysis of the 22 policy briefs related to SLOs revealed that only three were targeted to federal-level policymakers (see the appendix). Instead, most publishers aimed their briefs at state decision makers and released them after federal policies were already established. Merit pay briefs generally followed the adoption of the TIF, most teacher evaluation policy briefs appeared after the launch of RTT, and most briefs solely devoted to SLOs came after the initiation of the NCLB waivers. Thus, a few policy advocates framed the policy problem and solution, but most policy briefs involved technical assistance providers presenting information to state officials about how to conform with a federal mandate. Similarly, the 21 state handbooks published at the end of the NCLB waiver window involved state officials presenting information to local educators about how to comply with state mandates. The following sections outline the wealth of information these publications provided about their contributors, the sources they relied on, and how these documents framed the evidence that supported their claims about SLOs.
Contributors to SLO Policy Briefs and State Handbooks: Issue Bias
As illustrated in Table 1, the direct contributors to the focal policy briefs and state handbooks spanned the categories of federal and state DOEs, technical assistance providers, research producers, policy hybrids, and member organizations. The categories are discussed in order of prevalence.
Organizational Affiliations of Contributors to Student Learning Objective Publications
Note. Categories exceeded 100% when publications had funders, authors, and publishers from different categories.
Departments of Education
State DOEs were the primary authors and publishers of all handbooks. The USED was listed as the funder for 55% of the policy briefs. USED support was likely undercounted, however, because AIR’s involvement with two federal technical assistance centers (outlined in the next section) suggests federal sponsorship of the four AIR briefs that withheld information about their funders. Furthermore, only one of the handbooks from the six states that received federal RTT grants explicitly mentioned receiving USED support. Nine handbooks recognized the contributions of other external individuals or organizations.
Technical Assistance Providers
Technical assistance providers published or authored 82% of the policy briefs and contributed to 29% of the state handbooks. Nine technical assistance providers contributed to 18 briefs and 11 contributed to six state handbooks. Many technical assistance providers were from independent consulting firms (e.g., APA Consulting, Center for Assessment, Education Development Center, KSA Plus Communications). ETS—an entity commonly associated with standardized testing—authored two briefs. AIR authored and published four implementation and policy guides focused on SLOs and contributed to Missouri Department of Elementary and Secondary Education’s (2016) handbook. Although AIR (2022) is often considered a research producer, it also offers technical assistance services.
Four technical assistance providers had government affiliations. Virginia Department of Education’s (2015) handbook credited the Center for Innovative Technology created by the commonwealth to promote technology development. Three USED-sponsored technical assistance providers included the Center for Educator Compensation Reform, National Comprehensive Center for Teacher Quality (TQ Center), and Reform Support Network (RSN). A cross-categorical group of providers constituted the federal centers.
The Center for Educator Compensation Reform supported TIF grantees by publishing a policy brief on merit pay systems for teachers in nontested subjects and grades. The center’s institutional affiliations included technical assistance providers (i.e., AIR, Synergy Enterprises, Westat) and research producers (i.e., Vanderbilt University, University of Wisconsin).
TQ Center published two briefs. The USED formed TQ Center as one of “five content centers that provide expert assistance to benefit states and districts nationwide on key issues related to current provisions of ESEA [Elementary and Secondary Education Act]” (Holdheide et al., 2010, p. 33). TQ Center was a collaboration between ETS, AIR, and Vanderbilt.
RSN published five briefs on SLOs and other topics related to comprehensive teacher evaluation. USED formed RSN to support and provide a collaborative community of practice for RTT grantees, which may explain why it put forth the only publications without identified authors. Individuals associated with RSN spanned multiple categories, including technical assistance providers (e.g., Education First, ETS, and others), research producers (i.e., University of Wisconsin, RAND), and policy hybrids (e.g., TNTP, Community Training and Assistance Center [CTAC]).
To qualify for the technical assistance category, organizations had to refrain from explicit policy advocacy. A few organizations that met this criterion were neither ideologically neutral nor completely disengaged from policy work. Some consulting firms articulated mission statements that signaled policy preferences or offered to engage in policy work on behalf of their clients. For example, Education First’s (2022) website espoused a commitment to racial equity and stated, “We work closely with policymakers, practitioners, funders and advocates to design and accelerate policies and plans.” By contrast, APA Consulting’s (2022) mission suggested the organization supported a market-driven approach that emphasized resources, costs, efficiency, and helping clients “solve problems so they can meet performance goals.” Thus, the lines between categories could be somewhat blurry.
Research Producers
Traditional research producers were directly involved in 14% of policy briefs and handbooks. Individuals from Vanderbilt and Brown Universities authored three briefs. Virginia Department of Education’s (2015) handbook listed James H. Stronge, a professor at the College of William and Mary, as a project consultant and referenced his publications. Wisconsin Department of Public Instruction’s (2017) and South Dakota Commission on Teaching and Learning’s (2015) state handbooks also referenced the involvement of individuals from universities.
Policy Hybrids
Hybrid organizations that engaged in policy advocacy and research activities contributed to a third of the briefs and no state handbooks. Although their numbers were few, hybrids acted during key policy windows. In 2007, three local foundations funded the first merit pay brief that mentioned SLOs. In the same year, the Independence Institute published a Broad Foundation–funded brief outlining the merit pay initiative in Denver. The following year, CTAC published the first brief solely devoted to SLOs based on their advocacy research in Denver. Several years later, the Gates Foundation funded three technical analyses of methods for using student growth measures to evaluate teachers in nontested subjects and grades. The independent policy institute Center for American Progress published one of these briefs.
Two hybrid organizations self-identified as technical assistance providers, but their publications or websites indicated they were explicit policy advocates. The Center for Teaching Quality’s (2007) policy brief described the organization’s advocacy for the advancement of teacher merit pay. CTAC (2022b) claimed to have developed the research base that fostered Congressional support for the TIF program and “assisted the USED to adopt, and more than 40 states and several thousand school districts to embrace, SLO’s as a model for new teacher and principal evaluation systems” (CTAC, 2022a).
Member Organizations
Member organizations contributed to one policy brief and three state handbooks. The AFT and AASA authored a Gates Foundation–funded brief outlining a framework for teacher evaluation systems that also provided basic information about SLOs. The New Jersey Department of Education (2016) and South Dakota Commission on Teaching and Learning (2015) handbooks mentioned the involvement of representatives from educator unions. Conversely, although the Center for Teaching Quality’s (2007) policy brief derived its information from a panel of 18 teachers, the document explicitly stated that the participants were not representatives of teacher unions.
Organizational Interconnections
Many briefs involved funders, authors, and publishers from different categories. The most common linkage pattern was a USED-funded brief authored and published by a technical assistance provider or traditional research producer (n = 11). The USED-funded document published by policy hybrid CTAC represented the lone exception to this pattern. Policy hybrid funders were more eclectic in their associations. Foundation-funded briefs were authored or published by other policy advocates (n = 3), technical assistance providers (n = 3), member organizations (n = 1), and research producers (n = 1).
Evidence Authors Drew Upon to Support Claims About SLOs: Technical and Issue Bias
As indicated in Tables 2 and 3, SLO policy briefs and state handbooks varied widely in the sources they cited to support their claims about SLOs. Ten state handbooks offered no sources of information about SLOs. The other authors used various combinations of SLO research studies, their own expertise, policy briefs, anecdotes, and expert appeals as supporting evidence.
Sources for Policy Briefs’ Evidence About SLOs
Note. Numbers in the Policy Brief column signify the number of policy briefs cited by each document (e.g., Bailey et al., 2016 cited 3 different policy briefs). SLO = Student Learning Objective; AFT = American Federation of Teachers; AASA = American Association of School Administrators; RSN = Reform Support Network; TNTP = The New Teacher Project.
Sources for State Handbooks’ Evidence About SLOs
Note. SLO = Student Learning Objective; TNTP = The New Teacher Project; RSN = Reform Support Network.
SLO Research Studies
As shown, 10 briefs (45%) and four handbooks (19%) invoked original research or evaluation studies related to SLOs. At most, an individual publication cited two SLO studies. The documents collectively referenced seven SLO studies that varied in quality and scope.
Authors most often cited the CTAC advocacy research studies authored by Slotnik and colleagues. Seven briefs and two handbooks referred to CTAC’s evaluations of Denver’s merit pay pilot (Bell et al., 2001; Slotnik et al., 2004). Two handbooks cited Slotnik et al.’s (2013) later study of the merit pay pilot in Charlotte-Mecklenburg. The CTAC studies were technically rigorous and followed scientific norms for reporting samples, methods, results, and limitations, but Slotnik et al. (2004, 2013) headed sections of their reports “National Implications” and encouraged readers to generalize the results from their limited samples to all districts interested in alternative teacher compensation. Although the CTAC Denver studies were the only studies of SLOs available for the two policy briefs written in 2007, later authors could have referenced long-term research studies out of Denver (Gonring et al., 2007; Wiley et al., 2008) or the studies emerging from Austin (e.g., Schmitt et al., 2009).
TNTP authored and published the other study cited multiple times. This study outlined lessons learned from the first-year implementation of a comprehensive teacher evaluation system in Indiana. TNTP’s (2012) report was one of the first to document a reformed evaluation system, but it was light on methodological detail, withheld survey response rates, and buried the sample sizes in the document’s extensive endnotes. The authors targeted their recommendations to all Indiana schools without establishing the representativeness of the six-district sample. TNTP offered another example of a technical assistance provider acting as a policy advocate. TNTP’s (2022) website stated that after documenting “systemic indifference to teacher effectiveness in The Widget Effect,” they worked with 13 state DOEs and 50 districts to reform teacher evaluation and reached 15% of U.S. teachers with their “next-generation” evaluation systems.
Publications from research producers yielded only one citation each. These authors included researchers from Austin School District’s Department of Program Evaluation, the University of Connecticut, and Mathematica Policy Research. Authors from the research producer category provided ample detail about their methods and results, presented a neutral stance, refrained from policy advocacy, and made limited recommendations. Although the two cited documents from Austin presented evaluation summaries with minimal information about the research methods and their limitations (Lamb & Schmitt, 2012; Schmitt & Ibanez, 2011), the district also published similar data with more detail in its annual reports.
Author Expertise
Authors of three briefs and one state handbook relied on their own expertise or data as sources of information about SLOs. Marion and Buckley (2011) and Marion and colleagues (2012) developed two white papers targeted to state policymakers that presented technical analyses of various student growth measures for evaluating teachers. Holdheide et al.’s (2010) “Research & Policy Brief” presented descriptive statistics from surveys the authors conducted to identify the challenges of evaluating teachers of English learners and students with disabilities. Virginia Department of Education’s (2015) handbook listed Stronge as a project consultant and referenced his publications that outlined experiences with a process similar to SLOs (Tucker & Stronge, 2005).
SLO Policy Briefs
Twelve policy briefs (55%) cited from one to four other policy briefs as sources of information about SLOs. Authors cited Prince et al.’s (2009) “The Other 69 Percent” most often. In this brief, Prince et al. briefly mentioned SLOs when describing the Denver merit pay initiative. Instead of citing any of the Denver studies, however, the authors referenced a Center for Teaching Quality (2007) policy brief that obtained its information about SLOs from a panel of teachers. Several authors also cited the RSN (2012d) and Lachlan-Haché et al. (2012c) briefs solely devoted to SLOs and two briefs about growth measures (Goe & Holdheide, 2011; Holdheide et al., 2010).
Three state handbooks (14%) referenced from one to six policy briefs as SLO information sources. Each of them cited at least one of the AIR briefs written by Lachlan-Haché et al. Utah also referenced two RSN briefs and a brief by Goe and Holdheide (2011).
Anecdotal Data
The use of anecdote to support claims was evident in 72% of the policy briefs and 14% of the handbooks (Tables 2 and 3). Some anecdotes offered uncited descriptions of the well-documented merit pay systems in Denver or Austin (e.g., Holdheide et al., 2010; South Carolina Department of Education, 2015). Other anecdotes described initiatives with little or no documentation of effectiveness, such as newly reformed evaluation systems in Connecticut, New York, and Rhode Island (e.g., Potemski, 2013; RSN, 2012c).
Expert Appeals
Seven documents made appeals to expertise to support claims about SLOs. Four state handbooks (19%) referred to general “research” or “evidence” without providing any further information. The Rhode Island Department of Education (2015), for example, simply stated that SLOs were “based on research” (p. 17).
A few publications grounded their credibility in the involvement of teachers. The Center for Teaching Quality (2007) claimed its brief was “compelling” because it showcased “the authentic voices of educators” who include “national, state and district teachers of the year; Presidential Award winners; Milken honorees” (p. 2). New Jersey Department of Education (2016) offered that its handbook was developed with educator feedback. South Dakota Commission on Teaching and Learning’s (2015) handbook referenced teacher opinions from an unpublished study of its evaluation pilot.
Representations of Evidence in Briefs and Handbooks: Technical and Issue Bias
Determining whether authors used evidence to depict SLOs in a technically correct manner proved to be a more difficult task than identifying issue bias in the balance of presented claims. Minimal citation and sloppy sourcing made it challenging to track down evidence sources. Some authors provided in-text citations without accompanying full references and vice versa. Authors frequently cited documents by their publishers instead of the authors’ names (e.g., “CTAC, 2004” instead of “Slotnik et al., 2004”). Thus, when Lachlan-Haché et al. (2012b) provided only an in-text citation for “What Works Clearinghouse, 2011,” there was little hope of identifying the referenced publication among the many contenders. Errors repeated across publications suggested some authors engaged in secondary citation instead of accessing the referenced sources directly.
When feasible, textual analysis of the individual claims indicated that most authors reasonably represented the evidence for SLOs by using qualifiers such as “can” and “when implemented as intended.” Authors, therefore, refrained from making provably false claims—with a few notable exceptions.
Misrepresentations of Evidence
The authors of one policy brief and three state handbooks erroneously referenced non-SLO sources to support claims about SLOs incentivizing beneficial instructional practices. Errors often stemmed from mistaken assumptions that SLOs employ the same processes as those outlined in the source material.
In one example, Lachlan-Haché et al. (2012c) cited a What Works Clearinghouse article from 2009 to support the claim: “SLOs reinforce best teaching practices. Setting goals for students, using data to assess student progress, and adjusting instruction based on that progress demonstrate good teaching practices” (p. 1). The cited document, however, cannot reasonably support this statement because Hamilton et al. (2009) rated the quality of the evidence for the listed practices as “low.” The source material also used the phrasing, “teach students to examine their own data and set learning goals” (Hamilton et al., 2009, p. 19), but students do not set SLO goals—teachers do. South Dakota Commission on Teaching and Learning’s (2015) handbook listed the same reference to What Works Clearinghouse without an in-text citation and Utah State Office of Education’s (2014) handbook referenced the article to support claims about the efficacy of using data to drive instruction.
Authors of the Utah State Office of Education (2014) and Arizona Department of Education (2015) handbooks made similar errors in their references to Beesley and Apthorpe (2010). Utah’s handbook cited the document to support the statement, “Current research shows that creating SLOs strategically aligned to instruction has a positive impact on increased learning of students” (p. 1). Although the full reference was missing, the cited item appears to be a McREL research report that neither mentioned SLOs nor referenced any known studies of SLOs. The most relevant section in the McREL document summarized findings from research focused on student-selected goals and providing students with regular feedback. Arizona Department of Education’s (2015) handbook also cited Beesley and Apthorpe (2010) without a full reference to support a claim about the efficacy of strategically aligned instruction.
Balance of Claims
The claims analysis revealed substantial differences in the extent to which briefs and handbooks presented a balanced representation of the SLO evidence. As illustrated in Table 4, policy briefs collectively offered a somewhat even distribution of positive and negative claims about SLOs. By comparison, state handbooks were far more positive in their depictions, with only two state handbooks (10%) presenting any negative claims at all.
Authors’ Claims About SLOs in Policy Briefs and State Handbooks
Note. Data represent 22 policy briefs and 21 state handbooks. SLO = Student Learning Objective.
Looking at individual documents, the analysis determined that four briefs (18%) made only neutral statements about SLOs. These publications mainly provided basic information about SLOs or descriptions of how to implement them. Five policy briefs (23%) presented a similar number of positive and negative claims, seven (32%) tipped toward the positive, and six (27%) were more negative. The proportions of positive and negative claims differed somewhat by the policy windows when the documents were published. The three briefs by policy advocates published prior to RTT presented a mainly positive depiction of SLOs (i.e., Degrow, 2007; Prince et al., 2009; Slotnik & Smith, 2008). The critical analyses of growth measures distributed during the RTT window included one publication that made only negative claims (i.e., Goe & Holdheide, 2011) and three Gates Foundation–funded critical reviews (i.e., Marion & Buckley, 2011; Marion et al., 2012; Tyler, 2011). During the NCLB waiver window, technical assistance providers published briefs that were mostly positive or balanced in the claims they presented.
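The document-level groupings described above can be expressed as a simple rule over each document's counts of positive and negative claims. The following minimal sketch is illustrative only: the study does not report a numeric threshold for a “similar number” of claims, so the `tolerance` parameter, the function name, and the category labels are assumptions. The final lines reproduce the reported percentages from the counts of briefs in each group.

```python
def classify_balance(positive, negative, tolerance=1):
    """Label a document by the balance of its positive and negative claims.

    tolerance is an assumed cutoff for "a similar number" of claims;
    the study does not state one.
    """
    if positive == 0 and negative == 0:
        return "neutral only"
    if abs(positive - negative) <= tolerance:
        return "balanced"
    return "mostly positive" if positive > negative else "mostly negative"

# Hypothetical claim counts, invented for illustration only.
print(classify_balance(0, 0))  # neutral only
print(classify_balance(3, 3))  # balanced
print(classify_balance(5, 1))  # mostly positive
print(classify_balance(1, 4))  # mostly negative

# Counts of policy briefs per group as reported in the text (n = 22).
reported_counts = {
    "neutral only": 4,
    "balanced": 5,
    "mostly positive": 7,
    "mostly negative": 6,
}
total = sum(reported_counts.values())
percentages = {label: round(100 * count / total)
               for label, count in reported_counts.items()}
print(percentages)
```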
As indicated in Table 4, some claims appeared far more or less frequently in briefs when compared with handbooks. The two most popular positive claims were that SLOs can incentivize high-quality teaching practices (41% briefs vs. 81% handbooks) and contribute to increased student achievement (38% vs. 57%). The two most common negative claims were that SLOs may be unreliable or inconsistent measures (55% vs. 10%) and they can be difficult to implement (50% vs. 5%). About one fourth of both briefs and handbooks put forth that SLOs incentivize teachers to be more invested in the evaluation process. Briefs were far less likely than state handbooks to claim SLOs promote collaboration (14% vs. 57%) or suggest that SLOs accurately measure teacher performance (18% vs. 38%). Only five of the 43 publications mentioned the weak evidence base for SLOs. These authors most often discussed the limited scope of the SLO studies and called for more research. Few authors mentioned that SLOs were susceptible to cheating or gaming or that using SLOs for high-stakes purposes introduces conflicts of interest that incentivize teachers to set low goals. No handbooks reported these negative claims.
Rhetorical Devices
Although many policy briefs presented negative claims about SLOs, the manner of presentation was sometimes misleading. Several policy briefs outlined problems with SLOs and then seemed to resolve them by offering untested solutions. Lachlan-Haché et al.’s (2012c) frequently cited implementation guide offered an example:
Poorly set targets, badly timed meetings, a lack of consistent training, and a myriad of other problems can limit the quality of SLOs and make the process a cumbersome routine with little meaning. Guidance, training, and monitoring procedures can help ensure the quality of SLO rigor and comparability, while further innovation can help reduce the impact of additional challenges. (p. 6)
As seen, the authors were transparent that SLOs could be low quality and difficult to implement, but they dismissed these concerns with a list of mitigating solutions that included professional development and a vague reference to “further innovation.” The causal inference anchoring the passage was that SLOs would be successful if the correct technical solutions were employed. Yet, in the preceding sections of the guide, Lachlan-Haché et al. (2012c) offered only anecdotal evidence and survey research from early initiatives to support the listed technical solutions (i.e., Donaldson, 2012; TNTP, 2012). In a similar manner, Lachlan-Haché et al. (2012a), Potemski (2013), and RSN (2012d) used anecdotes as supporting evidence for technical solutions to remediate the unreliability of SLOs.
Discussion
As one of the few empirical analyses of education policy briefs and implementation handbooks, this study makes important contributions to the body of policy research. The relatively accessible pool of SLO literature provided an ideal data set for investigating bias in information presented to decision makers at various levels of government. Parkhurst’s (2017) concepts of technical and issue bias were instrumental in identifying insufficient faithfulness to the norms of scientific research and inadequate representation of the evidence base. Using bibliographic tracing and directed content analysis, the study documented examples of technical and issue bias among the reviewed documents, even though their main contributors were DOEs and technical assistance providers focused on sharing information about existing government mandates. Although explicit policy advocates produced only a handful of policy briefs, these entities were over-represented among the few research sources authors drew upon to support their claims about SLOs. Instead of referencing original research, over half of the authors supported their statements by citing policy briefs, sharing anecdotes, or appealing to expert opinion.
Regarding technical bias, most authors refrained from making provably false claims, but sloppy citation practices made some claims challenging to confirm, and four documents misrepresented their source material. Regarding issue bias, the overall pool of policy briefs exhibited sufficient variety to support Feuer’s (2016) assertion that broad contributions to the evidence base can mitigate groupthink. However, few individual briefs and almost no state handbooks portrayed SLOs in a balanced manner, and several authors promoted an overly optimistic understanding of SLOs by describing implementation challenges and then dismissing those concerns with unproven solutions. This pattern of outlining problems and “solving” them reflected the rhetorical device Tobiason (2019) documented in the discourse about VAMs. The following discussion expands on these core findings by providing further insight into the previously under-examined category of technical assistance providers and extending conversations about timing and representation as vital factors in policymaking (e.g., Reckhow et al., 2021).
One of the study’s major findings was the substantive involvement of technical assistance providers in 82% of policy briefs. These translators and disseminators of information included an array of independent educational consulting businesses and the federal technical assistance providers RSN, TQ Center, and the Center for Educator Compensation Reform. Our interpretation of their involvement as good, bad, or neutral hinges on whether the discussion centers on the technical quality of consultants’ products or the growing role of private interests in decisions about public schooling. Considering the former, McDonnell and Weatherford (2020) thought technical assistance providers would be a “reliable source of valid and generalizable evidence” (p. 70) because their need to attract clients would incentivize them to maintain positive reputations and achieve results. The current findings somewhat support this statement. Despite variations in the quality of their publications and over-reliance on nonempirical sources, technical assistance providers’ products were generally less biased than the SLO handbooks published by state DOEs. Moreover, unlike state DOEs, most private educational consultants identified their products’ authors and publicly shared their backgrounds, values, and beliefs about education. Such transparency enables motivated readers to investigate probable influences on how authors interpret information (e.g., technical expertise or ideology). Conversely, the association of federal technical assistance centers with multiple independent consultants, researchers, and policy advocates makes it more complicated to discern who contributed to their publications.
The USED’s extensive contracting of private consultants to facilitate teacher accountability reforms supports assertions that the federal government’s expanded role in education policy has fueled the growth of the advice industry (e.g., Gunter & Mills, 2016; Jabbar et al., 2014) and contributed to a blurring of public and private sectors (e.g., Ball, 2010). Private consultants’ long lists of state and district clients also indicated that the involvement of these organizations extended to all levels of government. This pervasiveness was evident when CTAC and TNTP produced advocacy research into merit pay and teacher evaluation reforms, promoted the adoption of favorable policies, and then solicited business from states and districts by offering to assist in implementing those policies.
The activities of CTAC and TNTP muddled the researcher, policy advocate, and technical assistance provider roles in questionable ways. The example with the clearest impact was CTAC’s highly cited Denver study that used evidence from a limited sample to promote widespread adoption of teacher merit pay and included a three-page section on the assistance districts would need to implement such an initiative (Slotnik et al., 2004). This behavior aligns with Ball’s (2010) description of companies and consultants “for whom policy is a business opportunity and from whom governments are increasingly purchasing ‘policy knowledge’” (p. 127). The “policy for profit” business model presents a conflict of interest, especially when combined with research activities that justify or introduce reforms. The current findings, therefore, suggest McDonnell and Weatherford’s (2020) conception of hybrid organizations should include technical assistance providers who engage in policy advocacy. The presence of mixed motives also justified taking a more critical stance toward the CTAC and TNTP publications than seems to have occurred. Greater skepticism toward advocacy research, in general, might mitigate the outsized impact of hybrid organizations documented by this and other research (e.g., DeBray et al., 2014; Lubienski et al., 2009; Reckhow & Tompkins-Stange, 2018; Scott & Jabbar, 2014). However, the Gates Foundation’s sponsorship of the policy briefs with some of the most robust and critical analyses of SLOs indicates that readers should not simply dismiss evidence due to its associations with policy advocates.
The disproportionately high citation of advocacy research in this study can be partly explained by its timing. Policy advocates published and distributed the first SLO research and policy briefs; thus, their products were readily available when federal officials were formulating the teacher evaluation policy framework. The vast majority of SLO publications came later as more contributors shared crucial information. Specifically, the cost estimates for implementing SLOs only appeared in one of the last published briefs (Fermanich et al., 2015), and the Gates Foundation’s critical analyses of alternative growth measures arrived after RTT grants had already mandated the use of student growth in teacher evaluation. This sequence offers one explanation for why so many documents about teacher evaluation made no claims that SLOs were a reliable means for assessing teachers’ performance. Most authors addressed the question, “How can SLOs be implemented?” without considering whether SLOs should be used to evaluate teachers at all. Repeatedly, policy brief authors admitted that insufficient evidence supported using SLOs as evaluation measures but encouraged state and local policymakers to adopt them anyway. This advice seems jarring when a main argument for mandating student growth as an evaluation tool was to make schools more data-informed (USED, 2009, 2012). Authors’ recommendations to adopt SLOs should, therefore, be interpreted within the context of a preexisting federal mandate and other flawed options for assessing student growth for teachers of subjects and grades without standardized tests (Marion & Buckley, 2011).
While some viewpoints were over-represented among the focal publications, others seemed nearly absent. This oversight matters because the values and positions missing from or magnified within a policy discourse represent important facets of Parkhurst’s (2017) concept of issue bias. The most underrepresented contributors to the SLO publications were the member organizations of the teachers and administrators tasked with policy implementation, the two groups most affected by teacher evaluation reforms. During the early teacher accountability policy windows, teacher evaluation policy briefs from the NEA (2010; NEA & Center for Teaching Quality, 2009) and the AFT and AASA (2011) devoted far more attention to opposing VAMs than to discussing SLOs. The educator associations’ briefs echoed the popular framing of the low school performance problem by recognizing the importance of teacher effectiveness, but they differed by offering comprehensive policy solutions that went beyond teacher evaluation reform to include advancing teachers’ professional growth and improving school cultures and working conditions. These priorities can be detected in the state handbooks’ amplified claims about SLOs incentivizing beneficial instructional practices and promoting collaboration.
Although some state handbooks overstated the positive aspects of SLOs in ways that could make them more appealing to educators, many handbooks made no attempt to persuade district implementers that SLOs were worthwhile uses of their time. The authors merely presented a series of instructions, sometimes prefaced by a relevant statute. Both approaches may undermine implementation fidelity by overlooking the vital role administrators and teachers play in interpreting policies and determining how to fit them into their contexts (Leithwood, 2018). Presenting an unrealistically optimistic depiction of SLOs risked failing to prepare local administrators for implementation challenges. Conversely, making little attempt to explain the potential benefits of SLOs offered implementers few reasons to expend more than minimal effort to meet requirements. Indeed, research indicates that SLO implementation has often resulted in teachers and principals adopting compliance orientations and engaging in behaviors that undermine the integrity of their evaluation systems (Ford et al., 2017; Longchamp, 2017; Marsh et al., 2017; Mayger, 2022). Such lackluster results may explain why states such as Georgia, Rhode Island, and Pennsylvania dropped SLOs from their teacher evaluation systems once Congress granted flexibility to the states through ESSA (2015).
Implications for Research, Policy, and Practice
Because this study is a qualitative document analysis, its implications are strongest in areas where the findings converge with extant research. The resulting recommendations fall into three general areas: (a) interpreting evidence, (b) timing evidence production, and (c) developing systems that support a healthy policy ecosystem.
In an environment with so much evidence of varying quality, information literacy skills are vital for research producers, translators, and disseminators as well as those who interpret their products. Information consumers should learn to recognize policy advocacy, differentiate between types of evidence, identify technical and issue bias, and detect the use of unproven solutions to negate genuine concerns. Given these needs, educational institutions must equip their students with the skills and dispositions to be critical information consumers, ethical research producers, and effective public champions for the rigorous use of evidence (Brown, 2015). Areas for further research include developing a better understanding of how audiences interpret educational policy briefs and handbooks. Topics of interest include the specific features of these documents that influence evidence uptake or implementation fidelity and whether readers perceive USED sponsorship as implying neutrality, quality, or endorsement of the contents. The field could also benefit from knowing more about the skill sets educational consultants bring to their work.
This study reinforced the importance of making evidence available at significant policy stages. Too much information about SLO weaknesses surfaced after important decisions were already made, and too little attention was paid to the needs of implementers. A wider array of constituencies should, therefore, equip themselves to provide accessible information during early policy stages when policy problems and solutions are being framed and narrowed (Reckhow et al., 2021; Serpell, 2020). More policy briefs should focus on the likely costs to implementers and explicitly consider the unintended consequences of solution strategies. To involve research producers in facilitating policy implementation, evaluation, and learning, McDonnell and Weatherford (2020) also advised incentivizing a greater use of research-practice partnerships.
Finally, a healthy policy ecosystem has systems that enable individuals to make smart choices. To aid evidence consumers in evaluating sources, Feuer (2016) suggested the creation of a reputable independent agency that rates policy advocate organizations by the technical rigor and credibility of their products. The author also advised establishing nonpartisan entities to offer timely research analysis and summaries for policymakers at federal, state, and local levels. Currently, the federal Congressional Research Service (2011) prepares private reports for Congressional staff on issues under consideration, and the USED’s What Works Clearinghouse summarizes and rates some education research for the wider public. Other reliable sources of information about educational innovations include Best Evidence Encyclopedia and Blueprints for Healthy Youth Development. Slavin (2019), therefore, proposed a “virtual encyclopedia” that compiles educational research reviews regardless of where they appear, provided they meet minimum criteria for inclusion. This “wiki”-style approach could offer multiple expert viewpoints together in one convenient place, which would address decision makers’ needs for timely advice within an information-rich environment.
Conclusion
Technical and issue bias are predictable factors in policy discourses about problems as multifaceted and complex as teacher effectiveness (Parkhurst, 2017). Although it may be tempting to view ESSA’s (2015) rollback of teacher evaluation mandates as evidence of policymakers’ enlightenment, the persistence of SLOs in many state teacher evaluation systems (Close et al., 2020) indicates some lessons are yet to be learned from the unwanted consequences of data-informed teacher evaluation policies. The adoption of a performance evaluation technique that was arguably more resource-intensive and equally or less reliable than the former maligned practices points to blind spots in the field of education regarding data-based decision-making that have yet to be fully examined. This myopia was evident in teacher accountability advocates’ credulous faith in standardized test scores without regard to their practical limitations. It was also present in authors’ tendencies to conflate SLOs with other forms of data-based instruction, as if all data uses were equally beneficial.
Parkhurst’s (2017) work suggests these blind spots may stem in part from the mistaken assumption that evaluating school and teacher effectiveness is solely a technical matter. But in a society where the very purposes of public schools remain contested, determining what makes an education, or a teacher, effective has ethical and political dimensions (Cambron-McCabe, 2012). Technocratic rationalism can obscure these dimensions and fuel issue bias when it focuses a policy discourse on “what works” without considering varying perspectives about what it means for something to “work.” The unfortunate results of policymakers’ incomplete causal assumptions are unsound policies with unintended consequences that schools may be ill-equipped to buffer.
Appendix
Basic Information About SLO Policy Briefs and Cited Research Organized by Policy Window
| Type | Publication | Funder | Lead author | Publisher | Focus | Audience |
|---|---|---|---|---|---|---|
| 2001–2005 | Pre-TIF | |||||
| Research | Bell et al. (2001) | 4 local foundations | CTAC | CTAC | Evaluation of Denver merit pay pilot | National/local |
| Research | Slotnik et al. (2004) | Broad and 7 local foundations | CTAC | CTAC | Evaluation of Denver merit pay pilot | National/local |
| 2006–2008 | TIF | |||||
| Brief | Center for Teaching Quality (2007) | 3 national foundations | KSA Plus Communications | Center for Teaching Quality | Merit pay | National |
| Brief | Degrow (2007) | Broad Foundation | Independence Institute | Independence Institute | Merit pay | Colorado |
| Brief | Slotnik & Smith (2008) | USED | CTAC | CTAC | SLOs and merit pay | District |
| Brief | Prince et al. (2009) | USED | Vanderbilt University | Center for Ed. Compensation Reform | Merit pay for NTSG | State/district |
| 2009–2011 | RTT and TIF | |||||
| Brief | Holdheide et al. (2010) | USED | Vanderbilt University | TQ Center | Evaluating NTSG | State/district |
| Brief | AFT & AASA (2011) | Gates Foundation, Carnegie Corp. | AFT & AASA | AFT and AASA | Evaluation system design | District |
| Brief | Goe & Holdheide (2011) | USED | ETS | TQ Center | Evaluating NTSG | State/district |
| Brief | Marion & Buckley (2011) | Gates Foundation | Center for Assessment | Center for Assessment | Evaluating NTSG | State/district |
| Brief | RSN (2011) | USED | RSN | RSN | Evaluation system design | State/district |
| Research | Schmitt & Ibanez (2011) | N.D. | AISD | AISD | Evaluation of Austin merit pay year 2 | Local |
| Brief | Tyler (2011) | Gates Foundation | Brown University | CAP | Evaluating NTSG | District |
| 2011–2014 | NCLB Waivers & TIF | |||||
| Research | Donaldson (2012) | CAP | University of Connecticut | CAP | Teacher interviews re: evaluation reform | State |
| Brief | Lachlan-Haché et al. (2012a) | N.D. | AIR | AIR | SLO implementation | State/district |
| Brief | Lachlan-Haché et al. (2012b) | N.D. | AIR | AIR | SLOs as measures | State/district |
| Brief | Lachlan-Haché et al. (2012c) | N.D. | AIR | AIR | SLO benefits & challenges | State/district |
| Research | Lamb & Schmitt (2012) | N.D. | AISD | AISD | Evaluation of Austin merit pay year 3 | Local |
| Brief | Marion et al. (2012) | Gates Foundation | Center for Assessment | Center for Assessment | SLOs for NTSG | State/district |
| Brief | RSN (2012a) | USED | RSN | RSN | Educator support for reform | State/district |
| Brief | RSN (2012b) | USED | RSN | RSN | Evaluating NTSG | State/district |
| Brief | RSN (2012d) | USED | RSN | RSN | SLOs as measures | State/district |
| Brief | RSN (2012c) | USED | RSN | RSN | SLO quality control | State/district |
| Research | TNTP (2012) | N.D. | TNTP | TNTP | Evaluation of Indiana teacher evaluation pilot year 1 | Indiana |
| Brief | Potemski (2013) | N.D. | AIR | AIR | SLO business rules | State/district |
| Research | Slotnik et al. (2013) | USED | CTAC | CTAC | Evaluation of Charlotte-Mecklenburg merit pay year 5 | National/local |
| Brief | Goe et al. (2014) | USED | ETS | AIR | Evaluation system design | State/district |
| 2015–2016 | TIF, ESSA | |||||
| Brief | Fermanich et al. (2015) | USED | APA Consulting | USED | SLO costs | TIF grantees |
| Research | McCullough et al. (2015) | USED | Mathematica Policy Research | IES and REL Mid-Atlantic | Implementer interviews re: alternate growth measures | State/district |
| Brief | Bailey et al. (2016) | USED | EDC | TIF | SLO quality | TIF grantees |
Note. SLO = Student Learning Objective; TIF = Teacher Incentive Fund grants program; CTAC = Community Training and Assistance Center; USED = U.S. Department of Education; NTSG = teachers of nontested subjects or grades; RTT = Race to the Top grants program; TQ Center = National Comprehensive Center for Teacher Quality; AFT = American Federation of Teachers; AASA = American Association of School Administrators; RSN = Reform Support Network; N.D. = not disclosed; AISD = Austin Independent School District; CAP = Center for American Progress; NCLB = No Child Left Behind Act; AIR = American Institutes for Research; TNTP = The New Teacher Project; ESSA = Every Student Succeeds Act; IES = Institute of Education Sciences; REL = Regional Educational Laboratory; EDC = Education Development Center.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Author
LINDA K. MAYGER, EdD, is an associate professor in the Department of Educational Administration and Secondary Education at The College of New Jersey. Dr. Mayger studies systems that enable students and educators to thrive, focusing primarily on state educator evaluation policies and leadership in Full-Service Community Schools.
