Abstract
Until recently, existing research on teacher professional development (PD) has largely relied on teacher perceptions and self-reports to evaluate effectiveness. Though more current research has used a diverse array of designs and methodologies to examine impact on teacher knowledge, practice, and student learning, uncertainty regarding the effectiveness of various PD models remains, particularly for these nonperceptive variables. There has been a call in the field to apply a consistent conceptual framework in order to identify critical mechanisms underlying effective models and to support improved theorizing about teaching and learning. Thus, we present an integrated literature synthesis of one collaborative model of PD, teacher study groups (TSGs), in an effort to make sense of the relatively rich body of research that has been performed on this model. We identified 32 studies that examined TSGs’ impact on teacher and student outcomes and synthesized this research using Desimone’s five-factor conceptual framework, which is being increasingly applied across the field. Findings suggest that TSGs are an effective PD model and that there are components of the model not accounted for in the five-factor framework that affect teacher outcomes and student learning. We conclude with a discussion of implications, including limitations of the five-factor framework and ideas for further refinement that situate PD in a vast empirical landscape.
Keywords
Throughout their careers, teachers must remain abreast of developments in learning science, pedagogy, and evidence-based strategies for supporting complex and diverse student needs. Given this reality, participation in professional development (PD) is a mandated component of nearly all state licensure processes, indicating that educators partake in PD continually across their potential decades of practice. In addition to the time investment, PD mandates equate to significant financial investment among stakeholders, including states, districts, and teachers who seek out offerings beyond the often limited selection provided in their districts (Tooley & White, 2018).
Despite this substantial investment, district and state policies do little to direct teachers and school site leaders to models that are grounded in empirical evidence. Indeed, PD impact evaluations have indicated that programs often fail to produce meaningful, sustained improvement in teachers’ practice or student outcomes, particularly in scaled-up implementation (Borko, 2004; Garet et al., 2008; Garet et al., 2011; Garet et al., 2016; Glazerman et al., 2010; Harris & Sass, 2011; Jacob & Lefgren, 2004; Randel et al., 2011). Given the long-term impact that teachers have on student outcomes (Chetty et al., 2014; Hanushek, 2011), ensuring that educators have access to meaningful, vetted opportunities to deepen professional knowledge and skills is essential. This review seeks to examine the current evidence of impact for one model of PD, teacher study groups (TSGs), and evaluate the underlying mechanisms that researchers have identified as important to its success.
The Scope of Available PD
Educator PD has taken a variety of forms, which can be classified in a multitude of ways, one of which is a model’s underlying philosophy of learning—whether it is designed for passive recipients of knowledge or active constructors of knowledge. What is commonly thought of as the traditional model of in-service training continues to function as the prevalent form of PD in many districts (Tooley & White, 2018; Wei et al., 2010). This model has been critiqued as a top-down, one-size-fits-all approach that treats teachers as passive recipients of knowledge (Lieberman & Miller, 2014; Little, 1993) rather than active agents engaged in the construction of professional learning. Analyses have found this model to be less effective in improving teachers’ practice than sustained, school-based models in which teachers actively connect learning to ongoing practice (Darling-Hammond et al., 2009; Kennedy, 2016).
These growth-in-practice models seek to engage practitioners in ongoing, collaborative learning that is directly connected to their daily work (Darling-Hammond et al., 2017; Lieberman & Miller, 2014). Teaching is understood as complex and dynamic and thus necessitates discussion and reflection rather than rote execution of established procedures. Teachers’ professional growth is understood as a multifaceted intellectual activity that requires both deep conceptual knowledge and contextualized decisionmaking, which are developed through ongoing participation in a learning community (Vygotsky, 1978; Wenger, 1998).
However—though researchers and practitioners have enacted a vast range of growth-in-practice models—establishing effectiveness, testing underlying theory, and identifying causal mechanisms have remained a challenge. Models have included professional learning communities (PLCs; e.g., Vescio et al., 2008), Japanese lesson study (e.g., Doig & Groves, 2011), and TSGs (e.g., Gersten et al., 2010). Though grouped under the same growth-in-practice umbrella, these three models have critical differences. The PLC model is generally teacher-driven in terms of focus (i.e., group members select content for discussion on a meeting-by-meeting basis), and the PLC functions as the source of knowledge from which members draw to answer questions about improving practice (Newmann, 1996; Vescio et al., 2008). For this reason, some researchers have critiqued PLCs for lack of connection to empirically grounded practices and for perpetuating misunderstandings when the collective group lacks accurate knowledge of focus topics (e.g., Little, 2003). Also grounded in ongoing, practice-focused collaboration, Japanese lesson study is characterized by teachers iteratively participating in lesson study cycles, composed of planning a lesson, observing one group member teach the lesson, collecting data during that observation, and then collaboratively analyzing those data (Lewis et al., 2006).
Akin to PLCs and Japanese lesson study groups, TSGs are composed of colleagues who meet regularly and focus inquiry on how their instruction affects student learning. However, TSGs depart from the two previously discussed models in that they are predicated upon a preplanned scope and sequence and content grounded in empirical research. Whereas a PLC or lesson study group may focus on a fluid array of topics that members select, a TSG concentrates on a single, preselected topic over a span of time, such as instructional practices for emergent literacy (e.g., Cunningham et al., 2015) or mathematics instruction (e.g., Koellner et al., 2011). Furthermore, by design, a TSG includes the provision of new content in order to increase collective knowledge by leveraging some form of expert input (e.g., a university faculty member or master teacher) to facilitate integration of new knowledge and skills into the inquiry process.
Growth-in-Practice: Establishing Effectiveness
Application of collaborative learning communities has varied considerably (Louis et al., 1996; Vescio et al., 2008), and this wide variation has challenged the field to produce empirical evidence supporting school-situated models for learning (see Yoon et al., 2007). Although a substantial body of research has shown that particular instructional practices positively influence student achievement (e.g., heterogeneous and flexible grouping; Boaler, 2006; Cohen & Lotan, 2014), questions remain regarding whether and how PD fosters teachers’ implementation of high-impact practices (Borko, 2004; Desimone & Garet, 2015; Hill et al., 2013). Recent work has moved toward the use of objective impact measures and designs that allow for causal inference (e.g., Jayanthi et al., 2018), broadening understanding of the underlying processes leading to change. Nevertheless, the body of evidence remains largely composed of shallow pools of research characterized by breadth rather than depth (Kennedy, 2016). Within each of these model-specific pools (e.g., research on TSGs), the range of methodologies and lack of common conceptual frameworks have compromised the generalizability of individual study findings to the larger body of PD research.
In addition to expanding in methodology, research has sought to improve the field’s understanding of growth-in-practice PD through a strengthening of the conceptual frameworks employed. Researchers have argued that grounding empirical work in common frameworks that focus on the underlying mechanisms affecting teacher and student outcomes—rather than narrowly on the direct efficacy of various models through experimental and quasi-experimental designs (QEDs)—would connect the separate substrands of research by illuminating common, active ingredients in effective PD (Desimone, 2009; Desimone & Garet, 2015; Lindvall et al., 2018; Wayne et al., 2008). Accordingly, Kennedy (2016) synthesized extant research by grouping studies based on the PD’s content focus and method for supporting teachers to integrate that content into their practice. She demonstrated that less prescriptive interventions had a greater impact on students’ learning than those delivering a set of prescribed teaching behaviors. Such a focus on underlying components may hold promise in answering questions regarding impact on students and model-general features that facilitate processes of change.
Conceptual Framework
Working within a PD research paradigm that is focused on mechanisms leading to change in teacher knowledge, teacher practice, and student outcomes will support improved understanding of underlying processes. A viable common, core conceptual framework, then, has the potential to ameliorate lingering questions by first identifying critical mechanisms and then examining how they interact in a larger theory. Seeking to do so, Desimone synthesized extant research and concluded that sufficient empirical evidence existed (e.g., Desimone et al., 2002; Desimone et al., 2013; Garet et al., 2001) across correlational, quasi-experimental, and experimental research, as well as case studies, in support of five critical features (for a complete review of the theory and research underlying the conceptual framework, see Desimone, 2009). As a result, Desimone (2009) constructed the five-factor conceptual framework, which stipulates that five components underlie effective PD in any form: content focus, active learning, coherence, sustained duration, and collective participation.
This conceptual framework posits that these five core features work in an operational theory, which describes how effective PD influences teacher and student outcomes. In this model, the five factors are theorized to affect intermediate and distal outcomes through the following pathway: (a) teachers experience effective PD; (b) this PD increases teachers’ knowledge and skills and/or their attitudes and beliefs; then, (c) teachers use new knowledge, skills, attitudes, and/or beliefs to improve their instruction, approach to pedagogy, or both; and finally, (d) these instructional improvements result in increased student learning.
Since publication of the five-factor conceptual framework, researchers have tested it, and some have suggested that the framework requires modification to sufficiently explain empirical results (e.g., Garet et al., 2011). Given this, Desimone and Garet (2015) made refinements to the framework, three of which are relevant to this study. First, post-2009 research suggested that content focus may be more complicated than originally described, in that changing teachers’ procedural classroom behaviors is easier than deepening content knowledge or inquiry-oriented instructional strategies. In other words, PD attempting to do the latter may have heightened requirements in terms of duration and designing teacher learning in a manner that promotes more complex practices. Second, research has indicated that individual teachers vary in response to the same PD, which, in turn, varies the strength of impact on students. Therefore, taking individual teachers’ needs and levels of knowledge into account is critical in designing effective PD. That said, PD must also remain balanced with a design that fosters collective participation over individualized learning plans for teachers. Finally, PD is more successful when it is closely linked to teachers’ classroom practice (e.g., Santagata et al., 2010), suggesting that alignment with current teaching context is an additional, critical component of coherence.
Thus, given Desimone and Garet’s (2015) refinements, this review is grounded in the five-factor conceptual framework and related definitions of the critical features of effective PD: (a) Content focus: Activities focus on subject-specific content and how students learn content in that academic domain; (b) Active learning: Rather than passively sitting and listening, teachers learn through participation in endeavors like planning for implementation, reviewing student work, presenting to one another, and receiving feedback; (c) Coherence: Activities, content, and goals are aligned with both teachers’ knowledge, beliefs, classroom practice, and student needs, and their mandated curriculum, goals, and assessments; (d) Sustained duration: Activities are ongoing and include at least 20 hours of contact time; (e) Collective participation: The group is composed of teachers of the same subject, grade level, or school to build an interactive, sustainable learning community.
Purpose and Research Questions
Across the research field, evidence establishing the extent to which various PD models are successful and, relatedly, whether effectiveness can be sufficiently explained by the presence of particular underlying features—such as those described in the five-factor framework (Desimone, 2009; Desimone & Garet, 2015)—remains weak (Hill et al., 2013; Kennedy, 2016). Grounding systematic reviews of PD research in such a framework, then, holds potential for identifying limitations, reconceptualizing subsets of research, and integrating them into a larger picture of effective PD in a vast empirical landscape.
We therefore leverage Desimone’s (2009) and Desimone and Garet’s (2015) work by systematically reviewing and synthesizing the evidence underlying TSGs, a PD model that has been implemented in varied iterations and studied through a range of research designs. Although additional research is needed to determine whether the five-factor framework provides a feasible model for understanding patterns in research findings, grounding such a synthesis in the framework holds potential for enhancing understanding of why particular studies have or have not demonstrated impacts on teachers and students. It is worth noting that Desimone and Garet (2015) specified that alignment with current teaching material is an additional, critical component of coherence in need of consideration. This is an underlying premise of TSGs and is one of the reasons that this synthesis focuses on this particular PD model. Furthermore, TSG studies have employed a broad array of methodologies (e.g., survey, randomized controlled trials [RCTs], qualitative, mixed methods), resulting in a relatively rich research base for analysis. We are not aware of any such comprehensive review of the TSG model, and such a review affords the opportunity to explore lingering questions about TSGs in a manner different from that afforded by a single experimental trial or case study, enabling a broader analysis of effectiveness situated in the five-factor conceptual framework.
Thus, the purpose of this integrative literature synthesis (Torraco, 2005) was to amalgamate and reconceptualize this body of research in order to support stakeholders in identifying models that affect teachers’ practice and student outcomes. Reviewing the literature on TSGs in this manner provides a model both for future systematic evaluations of PD programs and for exploration of the viability of the five-factor framework to illuminate patterns across studies. The integrative literature synthesis methodology allows for the weaving together of findings from multiple strands of research, acknowledging that both qualitative and quantitative methodologies provide valuable insight into understanding a phenomenon.
In doing so, we conducted separate analyses of the quantitative studies (i.e., examining direct impacts of a program) and qualitative studies (i.e., examining participant perceptions and/or underlying processes leading to change), and then synthesized these findings into a set of holistic conclusions (Creswell, 2014; Sandelowski et al., 2006). This approach was appropriate for two reasons. First, the qualitative and quantitative research comprised complementary work within a shared domain, in that they each addressed different parts of the phenomenon of interest. The two sets of studies neither confirmed nor refuted one another but rather complemented each other by linking causal explanations to detailed descriptions and observations. As Sandelowski et al. (2006) characterized, in this type of analysis, quantitative findings provide that-knowledge (e.g., that participation in a TSG led to higher levels of knowledge than participation in traditional PD), whereas qualitative findings provide why-knowledge (e.g., the contextual or identity factors that explain these observations). Second, given the nature of this set of studies, employing this type of analysis allowed for the configuration of findings in a line of argument that posits relationships among concepts or events. Given that these two subsets of research on TSGs addressed different but related aspects of participation, it was appropriate not to reduce them but rather to synthesize them into a coherent whole.
The quantitative component of this review focused on the effectiveness of TSGs, as measured by impact on teachers’ knowledge, teachers’ practice, and student outcomes. The qualitative analysis examined how teachers experienced TSG participation in relation to their knowledge, practice, and students, and how teachers’ experiences related to the factors identified in the five-factor conceptual framework. Synthesizing findings across these two methodologies allowed for the integration of complementary data sets, providing deeper insight into the phenomenon of TSG participation (Sandelowski et al., 2006). Therefore, the following research questions guided this review:
Method
As previously mentioned, given that the research base is composed of a mixture of qualitative, quantitative, and mixed-methods designs, we employed an integrative literature synthesis methodology. By definition, this methodology involves reviewing, critiquing, and synthesizing research on a topic in a way that generates new knowledge or perspectives. The goal of such a review is to go beyond summarizing the literature or finding an effect size; the method involves challenging and extending existing knowledge through an in-depth analysis of patterns across studies and critically examining the body of work to arrive at new conclusions (Torraco, 2005).
Although this synthesis addresses the relatively mature topic of teacher PD, the need for reconceptualization is warranted given ambiguity in findings regarding effectiveness of various models and calls to reframe research across the field through use of a common conceptual framework. Furthermore, reviewing and synthesizing existing research is a critical step toward identifying evidence-based models of teacher development and guiding future research, and no such review has examined patterns across studies related to the five-factor framework (Desimone, 2009; Desimone & Garet, 2015). Therefore, in performing our research, we used a comprehensive search strategy to locate TSG studies that were peer-reviewed, published, and relevant to teacher and student impact, particularly in relation to the five-factor framework. In consultation with a research librarian, we conducted this review in three phases: (a) title and abstract search, (b) full text review, and (c) literature coding and data extraction.
Phase 1: Title and Abstract Search
We began with systematic searches using the keywords teacher study group and teacher work group located in article abstracts, titles, and subjects. After reviewing these results, we added educator study group to our search terms. Search engines used included EBSCO Academic Search Complete, JSTOR, and Web of Science. This initial search yielded 361 articles. We entered each article’s title and abstract into an Excel spreadsheet, deleted duplicate entries, and coded the remaining articles against six inclusion criteria. The first criterion was that we included only studies that were published in English. Second, we included only studies published in a peer-reviewed journal; this excluded book chapters, technical reports, unpublished dissertations, and master’s theses. Third, we determined whether the study was empirical (i.e., collecting primary data) or conceptual and included only empirical studies. Fourth, we coded for participant population and included only studies with participants who were teachers in grades PK–12. Fifth, we included only studies that implemented a TSG as the independent variable. Because not all interventions were explicitly called TSGs in the studies, we used the following criteria to determine that the model studied was equivalent to a TSG: (a) Teachers met/worked in a structured, collaborative group that convened regularly around a pre-planned topic and included some type of expert input (e.g., school-based facilitator, university faculty, researcher-created materials and guidelines); (b) teachers learned and discussed content based on empirical research to improve teaching and student learning; and (c) teachers reflected on their practice. The sixth and final inclusion criterion was that the study collected data on at least one of the following outcome variables: teacher learning, teacher practice, and/or student outcomes. 
This included studies that employed quantitative, qualitative, and mixed-methods designs, given that our methodology was designed to synthesize findings across these strands of research. All articles that did not meet these requirements were eliminated, and articles with unclear coding results based on the abstract and title review were moved to the next phase, which resulted in 57 articles eligible for full-text review.
Phase 2: Full-Text Review
We reviewed the full texts of all studies included after Phase 1. In order to qualify for the third phase, and thus be included in this review, studies had to have met the six aforementioned inclusion criteria. Both primary researchers read and coded all studies in this phase. We calculated interrater reliability using Cohen’s kappa, which indicated strong agreement (κ = 0.89, 95% confidence interval [0.78, 1.01]), with disagreement on three studies. We discussed these three articles until reaching consensus. The most common reasons for exclusion were that the model being implemented did not meet criteria to be defined as a TSG (41% of the excluded articles) and that the study’s outcome variables did not include teacher knowledge, teacher practice, or student outcomes (28%). This resulted in 27 articles eligible for inclusion. We then hand-searched the reference lists of these 27 studies and found five additional studies that met our inclusion criteria; we found full texts of these studies using Google Scholar. This culminated in a set of 32 articles (k) that were eligible for inclusion. Figure 1 depicts our search and inclusion process.
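For readers less familiar with the interrater statistic reported above, the following is a minimal sketch of how Cohen’s kappa is computed from two coders’ include/exclude decisions. The function name and the cell counts are illustrative assumptions: the review reports only the number of articles at full-text review (57), the number of disagreements (three), and the resulting κ, not the underlying agreement table, so the counts below are hypothetical values chosen to match those totals and should not be read as the authors’ data.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of items that coder A placed in category i
    and coder B placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of items on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the two coders' marginal proportions,
    # summed over categories.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# HYPOTHETICAL counts (not reported in the review): 57 articles in total,
# with the two coders disagreeing on three of them.
#            B: include  B: exclude
table = [[27, 2],   # A: include
         [1, 27]]   # A: exclude

kappa = cohens_kappa(table)
print(f"kappa = {kappa:.2f}")
```

Under these assumed counts, observed agreement is 54/57 and chance agreement is close to 0.50, which yields a kappa of roughly 0.89. The sketch does not guard against the degenerate case in which chance agreement equals 1.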

Figure 1. Study selection PRISMA flowchart (on the basis of Moher et al., 2009). PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-Analyses.
Phase 3: Study Coding and Data Extraction
After reviewing full texts to determine inclusion, we coded each included study on relevant features, which were ultimately used to discuss and synthesize patterns of strengths, contributions, and limitations. First, we coded each for descriptive characteristics (see Table 1 for descriptive data), including (a) participant characteristics (i.e., grade level and domain in which participants taught), (b) study design characteristics (i.e., methodology, number of participants), (c) TSG intervention characteristics (i.e., content area of focus, how TSG was defined, scope and sequence, expert input, and dosage), and (d) outcome variables and instruments used. It should be noted that although not all studies referred to their intervention as a TSG, given that they met inclusion criteria to be defined as one in the scope of this research, they are referred to as TSGs for the remainder of this article. In addition to descriptive coding, we extracted data relevant to our research questions regarding impact on teacher knowledge, teacher practice, and student outcomes from both quantitative and qualitative lenses.
Table 1. Descriptive characteristics
Note. TSG = teacher study group; RCT = randomized controlled trial; QED = quasi-experimental, between-subjects design; WS = quasi-experimental, within-subjects design; QL = qualitative; QN = quantitative; MM = mixed methods; ES = effect size; TOPEL = Test of Preschool Early Literacy.
Denotes studies where effect sizes were not reported. We calculated Cohen’s d when possible using methods outlined by Borenstein et al. (2009).
Quantitative Analysis
Our first research question focused on the extent to which the data demonstrated TSGs’ impact on teacher knowledge and practice. To answer this question, we extracted relevant characteristics, measurement tools, analyses, and effect sizes of all studies using quantitative or mixed-methods designs to measure teacher knowledge or practice. Our second research question focused on the extent to which the studies demonstrated impact on student outcomes, directly. We extracted findings from each study that employed a student measure and analyzed student impact as it related to the descriptive features of each study. To answer our third research question, we coded each article for the presence or absence of the factors identified in the five-factor conceptual framework (Desimone, 2009; Desimone & Garet, 2015).
Qualitative Analysis
To answer our fourth question, we coded the qualitative studies and the qualitative components of the mixed-methods studies for key themes related to teacher perceptions of the PD process. We analyzed each study’s qualitative findings by drawing upon Miles et al.’s (2020) systematic approach to qualitative analysis. Rather than following a predetermined set of codes, we used an emergent strategy to illuminate dominant patterns across studies (Creswell, 2012). In doing so, we performed an initial open-coding round, in which we identified emergent features that described the essence of the phenomenon of interest (i.e., participation in a TSG). We then grouped these codes into fewer overarching categories and used these categories to perform a second round of focused, second-order coding. Each of the authors independently coded the data and created analytic memos to summarize emerging patterns (Maxwell, 2013), and we then met to compare code application, clarify or revise code definitions, and calibrate when necessary (Miles et al., 2020). We arrived at a consensus for all decisions, rereading and discussing aspects of articles until we reached agreement.
Results
Descriptive Characteristics
Of the 32 included studies, 19 (59%) were qualitative, five (16%) were quantitative, and eight (25%) employed mixed methods. Within the quantitative set, a majority used pre-post, within-subjects designs (k = 6), and the remaining studies were RCTs (k = 4) and between-subjects QEDs (k = 2). Most studies (k = 18; 56%) involved elementary school teachers, with the others relatively evenly distributed across other grade levels. The most common source of expert input in the TSGs was a member of the research team (k = 18), followed by a trained district employee (k = 7), a combination of both (k = 6), or—in the case of one study—a researcher-created resource kit providing guidance on how to teach fractions (Lewis & Perry, 2017). In terms of domain, most studies’ focus was literacy (k = 12)—either a specific aspect (e.g., vocabulary) or literacy instruction in general—with the next most common being mathematics (k = 5), followed by non-domain-specific instructional practices (k = 5). The remaining studies implemented TSGs with a focus on science instruction (k = 4), inclusive practices (k = 3), music instruction (k = 1), instructional technology (k = 1), and student-led individualized education plans (k = 1). See Table 1 for our descriptive coding results.
Impact on Teacher Knowledge and Practice
Our first research question focused on the extent to which the studies demonstrated that TSGs influenced teacher knowledge and practice. To answer this question, we synthesized data from the 10 studies that analyzed impact on teachers’ knowledge and the eight that analyzed impact on practice using quantitative methods. Some studies measured just one of these constructs and others measured both, so these subcategories were not mutually exclusive.
Teacher Knowledge
Of the 10 studies that measured impact on teachers’ knowledge, a majority (k = 7) found that participation in a TSG resulted in a meaningful positive impact on the outcome variable. These TSGs had a variety of foci, both domain-general and subject-specific (see Table 1); for studies that did not report effect sizes but did include enough information for us to calculate Cohen’s d, we did so using methods outlined by Borenstein et al. (2009). Most studies that reported a positive impact on knowledge used assessment tools other than teachers’ self-reports (Brahier & Schäffner, 2004; Cunningham et al., 2015; Jayanthi et al., 2018; Koellner et al., 2011; Lewis & Perry, 2017). Considering these results in light of each study’s research design adds nuance to these data. Most of the studies that found a positive impact on knowledge employed a quasi-experimental, pre-post, within-subjects design (Brahier & Schäffner, 2004; Cunningham et al., 2015; Elster, 2009; Khourey-Bowers et al., 2005; Koellner et al., 2011). In other words, positive findings referred to significant increases in knowledge within subjects on pre- to postintervention measures. The remaining two studies that reported positive impacts were RCTs (Jayanthi et al., 2018; Lewis & Perry, 2017). Jayanthi et al. found that teachers in a TSG condition outperformed those in a control condition. Lewis and Perry compared a TSG condition to both a control condition and a PD condition that was identical to the TSG but lacked the expert input, and they found that teachers in the TSG performed significantly better than both groups on knowledge measures.
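Because effect sizes were computed for some studies from reported summary statistics, a brief sketch of the standard standardized-mean-difference formulas may be useful. The code below assumes the common formulations for independent groups (pooled within-group standard deviation) and for pre-post designs (standard deviation of difference scores adjusted by the pre-post correlation r). Which formula was applied to each study is not stated in the review, and the function names and input values are hypothetical.

```python
import math


def cohens_d_independent(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d for two independent groups, using the pooled within-group SD."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def cohens_d_prepost(mean_pre, mean_post, sd_diff, r):
    """Cohen's d for a pre-post (matched) design.

    sd_diff is the SD of the difference scores and r is the pre-post
    correlation; the within-group SD is recovered as
    sd_diff / sqrt(2 * (1 - r)).
    """
    s_within = sd_diff / math.sqrt(2 * (1 - r))
    return (mean_post - mean_pre) / s_within


# HYPOTHETICAL summary statistics, for illustration only.
d_between = cohens_d_independent(mean_t=75, sd_t=10, n_t=30,
                                 mean_c=70, sd_c=10, n_c=30)
d_within = cohens_d_prepost(mean_pre=70, mean_post=75, sd_diff=10, r=0.5)
print(f"between-groups d = {d_between:.2f}, pre-post d = {d_within:.2f}")
```

The pre-post version requires the correlation between pre- and posttest scores, which primary studies often omit. This is one reason an effect size cannot always be recovered from a published report.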
Two studies (Gersten et al., 2010; Heller et al., 2012) demonstrated mixed results regarding impact on knowledge. In their RCT, Gersten et al. compared teachers who had participated in a TSG to teachers in a control group. They employed two assessments of teacher knowledge and practice, each corresponding to one of the two domains of literacy that the TSG addressed: comprehension and vocabulary. The researchers used multiple regression and reported Cohen’s d effect sizes using standardized mean differences in outcome variables, and they found that teachers in the TSG group significantly outperformed control group participants on the vocabulary measure (effect size = 0.73) but not on the comprehension measure (effect size = 0.32). Heller et al.’s study compared teachers’ knowledge across four conditions: three systematically varied PDs (one of which was a TSG) and a control group. Their analysis demonstrated that although all three PD conditions outperformed the control condition, there was no significant difference among the three PD conditions in teachers’ knowledge.
One study (Hofman & Dijkstra, 2010) used a QED and did not find significant between-group differences in knowledge of general teaching practices between TSG teachers and those in a comparison PD group. The authors also ran a within-subjects analysis in the two groups; although the between-groups analysis did not show significant differences in teachers’ knowledge, teachers in the TSG group showed a significant increase in knowledge after participating in the intervention, whereas the comparison group did not demonstrate significant growth within subjects. It should be noted that attrition in this study was high, with 33% attrition in the TSG group and 64% in the non-TSG group, so results must be interpreted with caution.
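As an illustration of how a standardized mean difference can be recovered from reported summary statistics, the following minimal Python sketch implements standard formulas of the kind described by Borenstein et al. (2009). It is not the analysis code used in this review: the function names and all numeric inputs are hypothetical, and the pre-post correlation r in the paired version is an assumption that a study would need to report or that a reviewer would need to estimate.

```python
import math


def cohens_d_independent(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Cohen's d for two independent groups, using the pooled within-group SD."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(d, n_t, n_c):
    """Apply the small-sample correction J = 1 - 3 / (4 * df - 1) to d."""
    df = n_t + n_c - 2
    return d * (1 - 3 / (4 * df - 1))


def cohens_d_paired(mean_diff, sd_diff, r):
    """Cohen's d for a pre-post (within-subjects) design.

    The SD of the difference scores is converted to a within-group SD
    using the pre-post correlation r (an assumed or reported value).
    """
    sd_within = sd_diff / math.sqrt(2 * (1 - r))
    return mean_diff / sd_within


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the calls.
    d = cohens_d_independent(105, 10, 20, 100, 10, 20)
    print(round(d, 3), round(hedges_g(d, 20, 20), 3))
    print(round(cohens_d_paired(5, 10, 0.5), 3))
```

The paired version matters for the within-subjects studies summarized above, because the resulting d depends on the assumed pre-post correlation as well as on the reported means and standard deviations.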
Teacher Practice
Nine studies measured whether participation in a TSG affected teachers’ practice using quantitative methods. Most of these studies (k = 6) found that participation in a TSG resulted in a meaningful positive impact in this area. One study had mixed results (Meyers et al., 1997), and one (Hofman & Dijkstra, 2010) did not find significant differences in practice between teachers in the TSG and comparison groups. Due to study limitations resulting from significant attrition, the Hofman and Dijkstra study will not be discussed further in this section.
Similar to the studies in the teacher knowledge subset, the six studies that demonstrated a positive impact on practice did so through TSGs with a variety of foci, both domain-specific (Brahier & Schäffner, 2004; Cunningham et al., 2015; Gersten et al., 2010; Jayanthi et al., 2018) and domain-general (Cifuentes et al., 2011; Elster, 2009). Instruments used to quantify practice also varied, with half relying on teachers’ self-reports (Brahier & Schäffner, 2004; Cifuentes et al., 2011; Elster, 2009) and two (Brahier & Schäffner, 2004; Elster, 2009) using proximal instruments specifically aligned with the TSG content.
Again, considering the research designs of the studies that demonstrated a positive impact on practice adds further context. Both Gersten et al.’s (2010) and Jayanthi et al.’s (2018) RCTs demonstrated that participation in a TSG positively affected teachers’ practice compared to a control group. With a sample of 182 teachers, Jayanthi et al. found that the TSG had significant impacts on teacher-directed vocabulary instruction (p < .001; g = 0.93) and interactive vocabulary instruction (p = .02; g = 0.47). Similarly, Gersten et al. compared teachers in a TSG (n = 217) to control group teachers (n = 251), and they found that TSG participation positively impacted both teachers’ comprehension (p < .01; g = 0.86) and vocabulary instruction (p < .01; g = 0.58). Cifuentes et al. (2011) found a positive impact on practice using a QED, and the remaining three studies (Brahier & Schäffner, 2004; Cunningham et al., 2015; Elster, 2009) used quasi-experimental, pre-post designs with no control group to demonstrate a positive impact on teachers’ practice.
Meyers et al.’s (1997) QED had mixed findings regarding impact on practice. They conducted an analysis of variance to compare two groups of teachers, each of which received a different form of PD. In the TSG condition (n = 97), teachers discussed research on teaching practice (e.g., questioning techniques, feedback, and discussion facilitation), classroom management (e.g., praise, correction, and monitoring), behavior management (e.g., positive and negative interactions), and instructional strategies (e.g., read-alouds) with expert input from a certified trainer. In the comparison condition (n = 65), teachers met one-on-one with a supervisor to discuss their instruction. Those in the TSG group outperformed comparison group counterparts on two of four aggregated practice variables: academic statements (e.g., teacher questioning techniques; F = 14.35, p < .001) and students off-task (F = 10.15, p < .001).
Student Outcomes
Seven studies eligible for this review used quantitative methods to explore whether teachers’ participation in a TSG affected students’ achievement. Two studies (Cunningham et al., 2015; Lewis & Perry, 2017) demonstrated a positive impact on students, two demonstrated mixed results (Heller et al., 2012; Saxe et al., 2001), and three had null findings (Gersten et al., 2010; Hofman & Dijkstra, 2010; Jayanthi et al., 2018). These studies used a range of methods for quantifying impact on students (e.g., proximal vs. distal measures of student learning).
Cunningham et al. (2015) and Lewis and Perry (2017) both found a positive impact on students, though these two studies were quite different in terms of design and operationalization of the dependent variable. Lewis and Perry’s three conditions included a control group, a TSG group, and a comparison PD group. The TSG group received expert input in the form of a researcher-designed fractions resource kit, which provided teachers with step-by-step guidance on implementing a lesson study cycle on fractions. The resource kit included information on evidence-based practices for teaching fractions, instructional examples, and discussion guidelines to enhance teachers’ content and pedagogical knowledge. The authors measured impact on students (n = 1,059) by administering a pre- and post-assessment of students’ fraction knowledge and then conducting a multiple regression using standardized mean differences in outcome measures. Assessment items were drawn from national and state tests, published curricula, and research articles. Because TSG teachers had been presented with the formats of some of the assessment items in the resource kit, items were disaggregated into two subscales (familiar and unfamiliar) to address the possibility that some item formats may have been easier for students in the TSG condition due to familiarity rather than actual knowledge. Lewis and Perry’s analysis controlled for student characteristics (e.g., baseline fractions knowledge), educator characteristics (e.g., years of teaching experience), and hours of instructional time. It found that students in the classrooms of TSG teachers outperformed those in the other two conditions, with an overall coefficient of β = 0.49 (SE = 0.14, p < .001). When the items were disaggregated into the two subscales, TSG students also outperformed both groups on the familiar-item subscale (β = 0.44, SE = 0.11, p < .01) and on the unfamiliar-item subscale (β = 0.52, SE = 0.15, p < .001).
Cunningham et al.’s (2015) design experiment included three sequential yearlong TSG teacher cohorts. The researchers measured impact on a randomly selected set of students (n = 101) by administering the phonological awareness subtest from the Test of Preschool Early Literacy (Lonigan et al., 2007) as a pre- and post-assessment of students’ phonological knowledge. Because the experimental design did not include a control group of students, the authors compared change in the study’s student sample to standardized scores and percentile ranks provided by assessment developers. At preintervention, 50.5% of the students sampled from participating teachers’ classrooms scored at or below 1 standard deviation below the mean, and at postintervention 31.7% of children had scores at or below 1 standard deviation below the mean. This difference was statistically significant, χ2(1) = 5.38, p < .05, with a Cohen’s d effect size of 0.27. Paired-sample t tests comparing pre- and posttest scores demonstrated that scores improved significantly from pre- (M = 86.42, SD = 11.58) to post-TSG implementation (M = 91.99, SD = 11.58), t(100) = 5.12, p < .001. The researchers compared the mean change to national norms and found that it represented a movement from the 23rd percentile to the 34th percentile in participating students.
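Readers who wish to check reported statistics of this kind can do so with a few lines of code. The following is a minimal sketch, not the original authors’ analysis: it converts the reported percentages into student counts and computes the upper-tail probability of a chi-square statistic with 1 degree of freedom. The function names are hypothetical, and the sketch assumes that the reported percentages refer to the full sample of 101 students.

```python
import math


def count_from_percent(percent, n):
    """Number of cases implied by a reported percentage of a sample of size n."""
    return round(percent * n / 100)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    With df = 1 the statistic is the square of a standard normal deviate,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


if __name__ == "__main__":
    n_students = 101
    # Students at or below 1 SD below the mean, before and after the TSG.
    print(count_from_percent(50.5, n_students), count_from_percent(31.7, n_students))
    # Tail probability for the reported chi-square value.
    print(chi2_sf_df1(5.38))
```

The closed form holds only for 1 degree of freedom; a statistics library would be needed for the general case.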
Heller et al. (2012) and Saxe et al. (2001) demonstrated mixed results regarding student impact. Heller et al.’s design included four groups: Three were identical in format and science content but varied in connection to teachers’ daily classroom practice, with the TSG being the condition most closely connected, and the fourth was a control condition. Student data included pre- and post-assessment scores on a science knowledge assessment comprising selected-response items and students’ written justifications for those answers. Analyses revealed that although there was no meaningful difference between the three experimental groups’ selected-response scale scores, students in the TSG outscored peers in all other conditions on written justifications with an effect size of 0.31. Students in all three experimental conditions outperformed the control group on both scales.
In their QED, Saxe et al. (2001) compared students’ post-TSG implementation scores on a researcher-created assessment of fractions items across three groups: a TSG condition, a systematically varied comparison PD that lacked expert input, and a control condition. Their analysis of impact on students’ fractions knowledge relied on two different subscales: one that measured conceptual understanding of fractions and one that measured computational ability. Though there was no significant difference among the groups on students’ computational knowledge scores, the students in the TSG condition significantly outperformed the other two groups on the conceptual knowledge subscale.
On the other hand, in their RCTs comparing TSGs to a control condition, neither Jayanthi et al. (2018) nor Gersten et al. (2010) found a significant impact on students’ knowledge. Using three subtests from standardized, nationally normed assessments of students’ vocabulary knowledge, Jayanthi et al. reported no significant differences between students in the treatment and control groups in postintervention scores. Similarly, Gersten et al. employed subtests from nationally normed assessments of vocabulary and comprehension knowledge, and—though their results demonstrated a moderately large effect on students’ oral vocabulary (0.44) and small effect sizes for reading vocabulary (0.21) and passage comprehension (0.13)—none of these tests reached statistical significance.
Five-Factor Conceptual Framework
Our third research question asked whether reconceptualizing quantitative findings given the five-factor framework (Desimone, 2009; Desimone & Garet, 2015) would reveal patterns that further explained impact on teacher and student outcomes. Our quantitative sample for this analysis comprised all 13 previously discussed studies that examined the impact of participation in a TSG on teacher and/or student outcomes. We divided studies into two groups (those measuring teacher impact and those measuring student impact), and we examined patterns that emerged across study results in light of the presence or absence of the five factors.
First, a notable pattern emerged related to student outcomes in the subgroup of three studies (Heller et al., 2012; Lewis & Perry, 2017; Saxe et al., 2001) that quantified students’ knowledge with instruments containing two subscales, one of which tapped students’ deeper understanding (e.g., conceptual vs. computational knowledge). In these studies, each of which employed randomized, multigroup designs, students of teachers participating in the TSG scored significantly higher on the scale measuring deeper understanding. Heller et al.’s study compared the impact of teachers’ participation in three systematically varied PD conditions, each of which met criteria for all five factors (i.e., content focus, active learning, coherence, sustained duration, and collective participation), and a no-treatment control group. Although each of the three PD conditions improved students’ scores on a multiple-choice assessment well beyond those of the control condition, there was no meaningful difference in scores among the three PD conditions. However, although all three PD conditions met criteria for all five factors, students in the TSG condition scored significantly higher than those in the other two PD conditions on the written justification subscale, a measure of deeper understanding.
Relatedly, Saxe et al. (2001) employed a measure of students’ fractions knowledge that disaggregated scores into a computational- and a conceptual-knowledge subscale. They assigned teachers to one of three conditions: a TSG, a comparison PD that lacked expert input, and a control group. Results indicated no differences among the three conditions in students’ computational knowledge; however, students in the TSG condition significantly outscored those in the other two conditions on the conceptual knowledge subscale, despite both treatment conditions meeting criteria for all five factors.
In a similar vein, Lewis and Perry’s (2017) study compared student performance across three conditions: a control group, a TSG, and a comparison condition systematically varied to lack expert input. The two treatment conditions each had four factors present, as neither achieved the 20-hour requirement to be classified as having sustained duration (Desimone, 2009). Researchers assessed students’ fractions knowledge with an instrument also containing two subscales: one for familiar and another for unfamiliar items. Again, despite both the TSG and the comparison condition meeting criteria for the same number of factors, students in the TSG condition outperformed comparison group peers on both subscales, including the more difficult, unfamiliar-items measure.
Second, the explanatory power of the five-factor framework was most clear in the studies that fell on the high or low ends of the total number of factors present. Those that operationalized the TSG model in a way that met criteria for all five factors (Cunningham et al., 2015; Heller et al., 2012; Saxe et al., 2001) registered impact on all outcome variables, and the study that met criteria for the fewest factors (Hofman & Dijkstra, 2010) demonstrated the lowest impact on teacher and student outcomes. Yet there were additional studies that did not align with this pattern, particularly those that documented a meaningful impact on outcome variables despite not meeting criteria for all five factors. Cifuentes et al. (2011), for example, did not meet criteria for content focus but demonstrated a positive impact on teachers’ practice. Neither Jayanthi et al. (2018) nor Khourey-Bowers et al. (2005) met criteria for coherence or sustained duration; however, both demonstrated positive impacts on teacher knowledge. It should be noted that Jayanthi et al. found a positive impact on teacher practice but not on student learning. Similar to Lewis and Perry (2017), Koellner et al.’s (2011) study did not meet criteria for sustained duration but demonstrated a positive impact on teachers’ knowledge. Elster (2009) did not provide enough information to determine whether their TSG met the 20-hour criterion but nonetheless demonstrated a positive impact on both teachers’ knowledge and practice. Finally, Brahier and Schäffner (2004) found a positive impact on teachers’ knowledge and practice despite not meeting criteria for collective participation.
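The factor-by-factor comparison described above can be represented as a simple coding table. The sketch below is illustrative only and is not the coding instrument used in this review: it records only the factors that the preceding paragraph explicitly names as unmet, so the resulting counts are upper bounds on the number of factors met. Studies whose unmet factors are not enumerated in that paragraph (e.g., Hofman & Dijkstra, 2010) or could not be determined (Elster, 2009) are omitted, and all variable and function names are hypothetical.

```python
FACTORS = (
    "content focus",
    "active learning",
    "coherence",
    "sustained duration",
    "collective participation",
)

# Factors explicitly described as NOT met; nothing beyond the text is inferred.
UNMET_AS_DESCRIBED = {
    "Cunningham et al. (2015)": set(),
    "Heller et al. (2012)": set(),
    "Saxe et al. (2001)": set(),
    "Cifuentes et al. (2011)": {"content focus"},
    "Jayanthi et al. (2018)": {"coherence", "sustained duration"},
    "Khourey-Bowers et al. (2005)": {"coherence", "sustained duration"},
    "Lewis & Perry (2017)": {"sustained duration"},
    "Koellner et al. (2011)": {"sustained duration"},
    "Brahier & Schäffner (2004)": {"collective participation"},
}


def factors_met(unmet):
    """Return the factors not listed as unmet, in framework order."""
    return [factor for factor in FACTORS if factor not in unmet]


def group_by_count(coding):
    """Group study labels by the number of factors met (an upper bound)."""
    groups = {}
    for study, unmet in coding.items():
        groups.setdefault(len(factors_met(unmet)), []).append(study)
    return groups


if __name__ == "__main__":
    for count, studies in sorted(group_by_count(UNMET_AS_DESCRIBED).items(), reverse=True):
        print(count, studies)
```

Grouping studies this way makes it easier to see which positive findings occurred despite one or more unmet factors.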
Qualitative Data
Our final research question asked how teachers experienced participation in a TSG in relation to knowledge, practice, student learning, and factors identified in the five-factor framework. Through our analysis of the qualitative studies (k = 19) and the qualitative component of the mixed-methods studies (k = 7), three broad themes emerged: (a) valuing of five-factor framework components, (b) perceptions of student impact, and (c) TSG-specific features with added value.
Valuing of Five-Factor Framework Components
One dominant theme that emerged was the valuing of particular components of the five-factor framework (Desimone, 2009; Desimone & Garet, 2015), namely, coherence and sustained duration. Sixty-two percent of the studies in the qualitative sample produced codes related to how coherence affected whether teachers changed their practice as a result of participating in the TSG, as measured by teachers’ goals, current levels of knowledge, and mandated frameworks and assessments. In particular, when TSG work was congruent with the teacher-level aspects of coherence (i.e., teachers’ current levels of knowledge and beliefs), teachers were more likely to alter their practice. For example, Brownell et al. (2006) found that high adopters of TSG focus practices possessed an aligned belief that improved instruction could eliminate problem behaviors and that supporting students’ behavior was as important as teaching academics. Conversely, if TSG materials lacked congruence with teachers’ current levels of knowledge and beliefs, it was less likely that the TSG would motivate a change in practice. Brownell et al. (2014) stated, “Unsupportive contexts coupled with [teachers’] lack of knowledge, confidence to advocate for students’ needs, and weak propensity to analyze their practice hindered the development of integrated knowledge and practice” (p. 41).
Duration also emerged as a factor that teachers valued across a majority of the studies. Teachers reported that sustained duration mattered but in a way only partially related to the definition of sustained duration articulated in the five-factor framework (i.e., 20+ hours spread over the course of an academic year). For example, Englert and Tarrant (1995) noted that “it took teachers and researchers at least a year to begin to incorporate the practical and theoretical knowledge” (p. 334) presented and discussed in the TSG. Teachers and researcher-participants reported that duration of an entire academic school year or beyond was effective due to the provision of time to foster meaningful discussion, deepen understanding, and increase willingness to implement new skills and practices.
Although the five-factor framework stipulates that effective PD is focused on a specific academic subject, qualitative data demonstrated that TSGs focused on non-subject-specific topics (e.g., student-directed IEPs, classroom management) also resulted in meaningful impacts. Although these TSGs were not focused on a single academic subject, teachers in these studies reported that the focus on a given topic facilitated professional growth in valued areas of practice. For example, Eisenman et al. (2005) noted, “Teachers began early in the school year to build supportive structures within their classrooms and schools for teaching students the knowledge and skills to lead their own IEPs” (p. 200). Although this focus was not academic-content specific, teachers described it as driving positive change in their practice.
Furthermore, the qualitative data complicated the posited core theory of action in the five-factor framework, indicating that the five factors do not always trigger a change in teachers’ knowledge and/or attitudes that leads to a change in practice. Instead, our analysis suggested that some teachers’ knowledge and beliefs prior to TSG participation moderated impact. Teachers with lower levels of baseline knowledge, a more teacher-centered view of instruction, and/or lower classroom management skills were less likely to experience the increase in knowledge needed to trigger progression through the five-factor framework’s theory of action (e.g., Brownell et al., 2014).
Perceptions of Student Impact
The second broad theme that arose across the qualitative research was how teachers understood their participation in a TSG to have influenced their students. Synthesized data suggested that teachers attributed two particular components of the study groups to enhancing student learning: (a) the collaboration among group members on shared topics of practice and (b) the iterative nature through which the study group examined teaching and learning. Over half of the studies produced codes showing that participants regarded the repeated and ongoing experience of critically reflecting on day-to-day instruction as key in improving student learning. For example, Hung and Yeh (2013) noted, “As the group generated and implemented the collective ideas in their classroom practices, the teachers also continuously observed how their students performed in class and reacted to their . . . implementation, so they could make adjustments for follow-up instruction” (p. 162).
A smaller sample of studies also reported that teachers attributed improvements in student achievement to the collaborative nature of the TSG. The opportunity to cooperate with colleagues toward a shared goal of improving instruction and making meaning of student data led participants to improve their practice in ways that they described as enhancing student performance in their classrooms.
TSG-Specific Features With Added Value
The final theme that emerged regarded TSG-specific features that teachers emphasized as positively affecting their knowledge, practice, and/or students: (a) expert input and (b) connection to daily practice. Our coding indicated that teachers experienced these features as critical mechanisms that drove their professional growth. Nearly half of the studies reported that teachers believed expert input played a particularly supportive role in improving their instruction in areas such as classroom routines (Brownell et al., 2004), experimentation with newly learned concepts (Hung & Yeh, 2013), and emergence of a common knowledge base among participants (Englert & Tarrant, 1995). Teachers also described expert input as providing critical scaffolding in identifying problems of practice and initiating a problem-solving process (Englert & Tarrant, 1995), and researcher-participants reported that expert input catalyzed the acceptance of published research among participants (Gearhart & Wolf, 1994). In over half of the qualitative studies, thematic analyses also indicated that teachers attributed improvements in their instruction to the TSGs’ tight connection to their daily practice. “Members of the . . . research group devoted a good deal of energy to considering data sources to draw on for the most complete and complex picture of learning in their classroom” (Chandler-Olcott, 2002, p. 30). Participants commonly reported that meaningful and successful enactment of the TSG focus practices hinged upon the opportunity to implement ideas gathered through collaborative discussions and then return with data from that implementation for analysis (Eisenman et al., 2005; Englert & Tarrant, 1995). When TSG content was aligned with daily instruction (e.g., learning techniques for teaching mathematics that teachers would immediately implement and evaluate), teachers were supported in their enactment and refinement of new practices.
Figure 2 illustrates the theory of action given the five-factor framework in relation to our analysis.

Figure 2. Theory of change given our results and the implicated components of the five-factor conceptual framework (Desimone, 2009; Desimone & Garet, 2015). Circles indicate areas of the framework in which our analysis suggests refinement is needed. Dashed line indicates moderating factors implicated in our analysis.
Discussion
This integrative review synthesized findings across the quantitative and qualitative literature on TSGs. Grounded in the five-factor framework (Desimone, 2009; Desimone & Garet, 2015), our analysis illustrated two major findings: (a) TSGs positively influence student and teacher outcomes, and (b) though the five-factor framework allowed us to reconceptualize this body of research in a meaningful way, further refinement of the framework is warranted.
TSGs’ Impact on Teachers and Students
Research Questions 1 and 2 asked whether a synthesis of existing literature would reveal evidence of TSGs’ impact on teacher knowledge, teacher practice, and student outcomes. The ubiquity of PD across PK–12 education and the range of research available on TSGs merited a more in-depth understanding of the ways in which this model has been shown to influence teachers and students. Results indicated that TSGs have demonstrated a positive impact on teacher and student outcomes, and that they have been shown to do so with a variety of foci (e.g., literacy, mathematics) and research designs. That said, there were mixed results in that some RCT studies did not yield the expected impact on students (e.g., Jayanthi et al., 2018) and multiple randomized experimental designs indicated that the TSG model performed similarly to other models in terms of impact (e.g., Meyers et al., 1997). Given this initial analysis, it was not clear why some studies demonstrated mixed or null results across outcome variables (e.g., Gersten et al., 2010; Jayanthi et al., 2018) whereas others showed positive results across all measures (e.g., Heller et al., 2012). Student results, in particular, were perplexing in this analysis. Although Cunningham et al. (2015) and Lewis and Perry (2017) demonstrated positive impacts on both teacher and student outcomes, Gersten et al.’s (2010) and Jayanthi et al.’s (2018) significant impacts on teacher outcomes did not equate with meaningful impacts on students—an indication that the underlying processes between teacher change and student learning are indeed complex and dynamic.
The addition of our third research question allowed for a reconceptualization of these data using a common conceptual framework that has been applied across the PD research field (Desimone, 2009; Desimone & Garet, 2015). Through this lens, additional patterns emerged. First, studies with mixed results did not enact TSGs using all five factors. For example, neither Gersten et al. (2010) nor Jayanthi et al. (2018) met criteria for coherence, so one possibility for interpreting these data is that a lack of coherence may have lowered impact on teachers and thus decreased impact on student achievement. In order to be coherent, PD must align with both district expectations and teachers’ current goals, knowledge, and practice. Given the large-scale nature of these two RCTs, it is unlikely that the content delivered through the TSG, to which participants were randomly assigned, aligned with teachers’ individual goals, knowledge, and practice. Our qualitative analysis provided complementary information regarding this topic: Teachers who reported less agreeable attitudes toward mandated participation in TSGs or less content knowledge of the study group’s focus (Brownell et al., 2006) were less affected by participation. This suggests that teachers experience differential impact based on coherence with individual needs and interests (Englert & Tarrant, 1995), underscoring Kennedy’s (2019) argument for considering teachers’ motivation to learn and alter their practice when making sense of PD research results.
Penuel et al. (2007) suggested that coherence may be better understood as alignment with teachers’ personal goals for learning, and that the policy- and district-expectation aspects of coherence comprise part of the interpretive frames through which teachers perceive a learning opportunity to be coherent or not. In alignment with Kennedy’s (2019) emphasis on motivation, Penuel et al. found that the level of coherence teachers perceived a PD to have significantly impacted their implementation of focus practices, suggesting that this factor may affect impact differently than factors such as active learning. Furthermore, Santagata et al.’s (2010) RCT found that teachers for whom a PD was aligned with their lesson pacing had significantly higher implementation effects than those in a nonaligned group. These findings correspond with Lewis and Perry’s (2017) theory that the challenge of scaling up an instructional improvement program hinges on such coherence, and that the tension between PD content and enactment is a critical factor in linking a PD to improved student outcomes. In agreement with Penuel et al., Lewis and Perry suggested that coherence should be understood at the teacher level (i.e., teachers’ knowledge and goals), and that PD opportunities involving active learning, collective participation, and sustained duration forge the cohesion of research-based instruction with teachers’ existing knowledge and beliefs by design. This occurs through challenges from colleagues, students, and the content itself; the negotiation of lesson plans, for example, creates an authentic (and thus coherent) need to build pedagogical ideas and knowledge. In this conceptual model, coherence is constructed through a PD process that is grounded in teachers’ daily practice. Coherence, then, likely supports enactment of newly acquired knowledge and strategies.
A second finding emerged regarding instances in which a TSG condition resulted in increased student learning in comparison to other PD models, despite the conditions meeting criteria for the same number of the five factors. For decades, scholars have acknowledged that although evidence exists that providing teachers with opportunities for collaboration on curriculum and instruction positively affects student achievement, few studies have illuminated how and why these outcomes play out across populations and settings (e.g., Goddard et al., 2007). Our synthesis sheds light on this issue. First, our analysis further confirms that teacher collaboration can positively affect student outcomes. Peercy et al. (2015), for example, noted, “The teachers’ process of co-construction affected not only lesson content, it also had an impact on student learning” (p. 878). Second, our results indicate that particular components of the TSG model may act as key mechanisms in this process. As noted above, three studies found that students whose teachers participated in TSGs outperformed peers with teachers in comparison conditions on measures that assessed deeper, conceptual knowledge (as opposed to less cognitively demanding skills, like computational ability), despite the TSG and the comparison conditions meeting criteria for the same number of factors. Our qualitative analysis indicated that elements of the TSG model beyond those defined in the five-factor framework affected student outcomes. Specifically, we found that expert input, a close connection to daily practice, and the collaborative, iterative nature of the study group acted as influential inputs and thus warrant consideration in theory linking PD to student outcomes.
We want to emphasize that these findings do not refute the five-factor framework as the five factors are meant to be seen as baseline components of effective PD (Desimone, 2009). However, our findings do bring into question whether additional components should also be included. Desimone and Garet (2015) emphasized a connection to teachers’ daily practice as an expanded aspect of coherence in their refinements to the framework, and collaborative structures could be similarly integrated into the active learning or collective participation factors, neither of which currently specifies that PD be grounded in a collaborative community of practice (Wenger, 1998). In addition, findings suggest that expert input played a meaningful role in a process that ultimately affected students’ conceptual understandings of math and science topics. Perhaps the collective participation factor could be expanded to acknowledge the importance of amplifying a community’s collective knowledge through some outside, expert input. It is possible that these three factors play a powerful role in supporting teachers’ enactment of improved instructional practices by supporting insights into their own instruction, a process that Kennedy (2019) identified as a powerful predictor of whether a PD influences instruction in a sustainable and meaningful way. At minimum, these findings indicate that systematic research is needed to clarify the role of (a) expert input, (b) connection to daily practice, and (c) the collaborative, iterative nature of study groups in the process connecting teacher PD to student learning.
Revisiting the Framework
Content Focus
Though the five-factor framework specifies that PD should focus on subject matter content and how students learn that academic content (Desimone, 2009), our results indicate that PD can positively affect outcomes when a study group’s focus is specific in nature but domain-general (i.e., not solely focused on a single academic subject), such as inclusive practices (Brownell et al., 2006) or instructional technology (Cifuentes et al., 2011). For example, Cifuentes et al.’s study met criteria for four factors (i.e., all other than content focus), yet their TSG demonstrated a meaningful impact on teachers’ practice (d = 1.31). Given this, PD focused on a singular topic—though not necessarily an isolated academic domain—should be considered focused, in which case the framework would have better explained these studies’ positive results.
These findings align with those of Kennedy’s reviews (2016, 2019), in which she demonstrated that a focus on content knowledge—compared to student behavior, engagement, or feedback—was not predictive of program effectiveness. Rather, her analysis suggested that it was the manner in which a PD facilitated integration of new content into teachers’ existing practice that predicted impact. Models that encourage a change in teachers’ practice by facilitating new insights into student thinking and problems of practice may be more powerful drivers of change across instructional domains, regardless of content focus. Our findings echo this, suggesting that further research is needed to determine whether and how focus on a specific but domain-general topic affects teacher and student outcomes.
Sustained Duration
Although the five-factor framework stipulates that in order to be effective, PD must be ongoing and composed of at least 20 hours of contact time, our results indicate that defining sufficient duration is more complex than a minimum-hour requirement. Jayanthi et al. (2018) did not meet the 20-hour criterion but did spread the PD across a school year, and they demonstrated significant impacts on teachers’ knowledge and practice. Relatedly, Lewis and Perry (2017), whose study also demonstrated meaningful impacts on outcome variables, did not meet the minimum-hour requirement over the course of their study, and Koellner et al. (2011) found that their TSG proved effective for teachers despite not meeting the 20-hour mark. The only instance in which our data suggested that having fewer than 20 hours of contact time mattered was the extreme case of Meyers et al. (1997), in which contact time was only around 5 hours. Our qualitative synthesis shed further light on this phenomenon, suggesting that it was the ongoing nature of the TSG (e.g., being distributed across an entire school year), rather than the total number of hours, that affected participants. Participants reported that trust in facilitators increased over time, conversations deepened, and their willingness to implement new practices increased over the course of multiple months.
Furthermore, post-2009 research has suggested that identifying the time required for PD to affect teacher and student outcomes is complex and dependent upon learning objectives. For example, changing teachers’ use of procedural classroom behaviors is a vastly different task from developing inquiry-oriented instructional strategies (Desimone & Garet, 2015; Piasta et al., 2010; Sailors & Price, 2010). The minimum number of contact hours needed to establish impact is therefore highly dependent upon a PD’s focus. Translating new knowledge into effective instructional practices is a complicated process (Desimone & Hill, 2017), and our data further suggest that this aspect of the framework be revisited.
The Framework’s Core Theory of Action
Finally, our data showed variation in teacher learning and the extent to which teachers were able to translate participation in PD to classroom practice. Specifically, teacher-level variables, such as years of experience (Brahier & Schäffner, 2004), preexisting views of instruction (Brownell et al., 2006), and willingness to engage in self-reflection (Brownell et al., 2014), appeared to moderate whether participation in a TSG led to changes in teacher and student outcomes. For example, Brahier and Schäffner (2004) found that teachers with 11 to 25 years of experience demonstrated the greatest changes in practice through study group participation. These findings reflect those of other studies that have complicated the five-factor conceptual framework’s theory of action. Just as student learning is complex, teacher learning is incremental, recursive, and varied, and long-term maintenance and generalization of new knowledge require sustained time and support (Borko, 2004; Carpenter & Fennema, 1992; Desimone & Hill, 2017; Fennema et al., 1993; Fennema et al., 1996; Franke et al., 2001). Accordingly, we concur with Desimone and Garet’s (2015) suggestion that future research explore the impact of providing teachers with a menu of PD options. Offering such a menu may increase coherence at the teacher level, which, given our analysis, appears to catalyze a process of instructional improvement that ultimately results in improved student learning.
Limitations
The findings of this integrative synthesis must be considered against its limitations, the first of which is related to our inclusion criteria. We did not include TSG research and models that have been presented in books, dissertations, conferences, symposiums, or grey literature, such as government and institutional documents. We opted only to include articles that had undergone peer review. Undoubtedly, additional information on TSGs is available from a range of sources, and it is possible that relevant findings exist beyond the scope of this review.
Second, though the majority of studies were implemented in the United States, two originated in Europe (Elster, 2009; Hofman & Dijkstra, 2010). Though we valued the inclusion of as wide a range of research as possible, we did not account for the differential impact of cultural elements, teacher education, and educational policies that likely affected teachers’ backgrounds prior to participation in research. However, the teachers in the United States also hailed from a variety of educational backgrounds, and—given that one of our findings was that background factors impacted teachers’ propensity to change—we believe these European studies provided valuable data that contributed to our findings.
Finally, though there is general agreement that PD research should focus on underlying processes rather than specific models, there has been critique of the particular framework in which we elected to ground our analysis. Kennedy (2016, 2019), for example, argued that classifying PD programs by design features fails to account for the underlying purpose or premise about teaching and teacher learning, and that a conceptual framework should account for domain-general challenges that transcend subject matter knowledge. Our findings reflect this critique, emphasizing the need for broadened future research and specific refinements to the five-factor framework.
Conclusion
The field of PD research continues to bear critiques of fragmentation and a lack of common conceptual frameworks to support understanding of the underlying processes in effective PD (e.g., Hill et al., 2013). This synthesis allowed us to reconceptualize one relatively rich strand of this research using a commonly cited conceptual framework, which we understood as a step toward addressing this fragmentation. Our analysis underscores the value of employing such a framework in this type of review, and it highlights areas in which this particular framework needs further development. Our findings indicate that TSGs, when operationalized within the five-factor framework, do positively influence teacher knowledge, practice, and student outcomes. Additionally, our analysis suggests that there are particular factors of the TSG model, not specified in the five-factor framework, that increase PD’s impact on teachers and students. Findings further indicate that two aspects of the five-factor framework, content focus and duration, merit refinement.
The purpose of performing an integrative synthesis is to stimulate future research through critical analysis of the existing body of work (Torraco, 2005). Most states require teachers to complete PD, but requirements rarely ensure that PD leads to meaningful growth for teachers and students. Future research around the questions that this analysis raised will therefore elevate the field’s understanding of effective models for teacher learning and of the underlying processes that result in improved teacher practice and student outcomes. Future studies should support the development of a menu of evidence-based options from which teachers can choose those that cohere with their professional growth needs and interests. As this occurs, we look forward to studies that systematically explore variations of PD models and larger scale implementation studies that guide understanding of scalability in a variety of school contexts with a variety of school-based implementers.
Authors
ALLISON R. FIRESTONE is a PhD candidate in the University of California, Berkeley, Graduate School of Education, 2121 Berkeley Way, 4th Floor, Berkeley, CA 94704.
REBECCA A. CRUZ is an assistant professor at San Jose State University’s Lurie College of Education, 1 Washington Square, San Jose, CA 95102.
JANELLE E. RODL earned her PhD in special education from a joint doctoral program between University of California, Los Angeles, and California State University, Los Angeles, and is an assistant professor of special education (mild/moderate disabilities) at San Francisco State University, 1600 Holloway Avenue, San Francisco, CA 94132-1722.
