Abstract
This article reports a synthesis and meta-analysis of intervention studies investigating the effects of team-based learning on content knowledge outcomes. Team-based learning is a particular set of instructional components most often used in higher education classrooms. Authors of team-based learning reviews report that team-based learning improves students’ end of course grades, test performance, and classroom engagement. Students report that team-based learning is interesting, allows for deeper understanding of content, and prepares them more effectively for assessment and course performance. A total of 30 studies were located and synthesized. In total, 17 studies met criteria for a meta-analysis, yielding a mean effect size estimate of 0.55, p < 0.001 across all measures. Moderator analysis indicated that group size moderated the magnitude of effect to a statistically significant degree, with smaller group sizes contributing to additional effects. The meta-analysis is followed by a confirmatory synthesis of the remaining 13 studies. Implications for instruction incorporating the use of team-based learning are described.
A meta-analysis of the effect of team-based learning on content knowledge and comprehension
In higher education, classes require discipline-specific academic content knowledge and ask students to use discipline-specific ways to interact with academic text (Lea and Street, 1998; Shulman, 2005). These classes also require students to engage with academic communities that have distinct assumptions about meaning making and knowledge (Lea and Street, 2006; Lillis and Scott, 2007). Acquiring understandings of these discipline-specific ways of thinking and communicating is crucial for academic success. This can be difficult for some students, particularly when they are unfamiliar with the academic skills or background knowledge required (Thompson and Zamboanga, 2004). University instructors who use instructional practices in which students interact with and develop discipline-specific academic literacy do great service to their students (Shanahan and Shanahan, 2012). Despite this need, instruction continues to primarily include teacher-centered methods (Nystrand and Gamoran, 1991) such as lectures with little text reading and student discourse. The use of teacher-centered practices is partially rooted in the teacher belief that the best way to ensure content learning is for the instructor to present all necessary information to students (McKeachie and Svinicki, 2013). However, evidence suggests that more traditional teaching methods do not enable all students to appropriately engage with the types of academic literacy constitutive to higher education (Hake, 1998; Lea and Street, 2006).
Academic discussion surrounding text can enable students to construct rich understandings of the different types of knowledge and skills necessary to meaningfully engage with broader academic communities (Lea and Street, 2006). Therefore, consideration should be given to collaborative learning methods that focus on academic discussion and deep processing of text. Team-based learning (TBL) is an interactive instructional practice that appeals to educators who wish, and are practically able, to move beyond traditional models of teaching and learning (Michaelsen and Sweet, 2008; Sweet and Michaelsen, 2012) and to provide students with opportunities to build academic content knowledge and acquire discipline-specific academic literacy (Hrynchak and Batty, 2012).
Situating TBL within group-based learning
Group-based learning has been termed differently through the years: small group learning (Springer et al., 1999), collaborative learning (Bruffee, 1999), cooperative learning (Herrmann, 2013), and TBL (Michaelsen et al., 2004). Most group-based learning can be placed under one of three headings: (1) casual small groups, (2) cooperative learning, and (3) TBL (Michaelsen et al., 2004). Casual small groups are easy to implement and typically include activities where the teacher lectures for 15–20 minutes and then asks students to pair with the student beside them to discuss a question. After a few minutes, a few students share their answers with the whole class. Casual small groups can be effective in increasing student motivation (Machemer and Crawford, 2007), self-direction (Justice et al., 2007), and personal involvement (Rogers, 1983).
Cooperative learning is broader and can take on one of many different forms (Kagan, 1994). It usually involves carefully planned and structured group activities that are infused into a course of learning. Some common types include Jigsaw (Aronson et al., 1978) and Group Investigation (Sharan, 1990). The results of studies investigating university students’ attitudes toward cooperative learning have been mixed, with several reporting that cooperative learning (a) can be difficult for students to navigate (Hillyard et al., 2010), (b) produces interpersonal dynamics that students find difficult to resolve on their own, and (c) creates anxiety-provoking conditions in which students must perform tasks that are not always clearly related to learning (Gillespie et al., 2006a, 2006b). Other students report large group sizes, a lack of relevance to broader coursework, difficulties with workload distribution, and management problems (Liden et al., 1985; Lizzio and Wilson, 2005), as well as losing or having to save face, being bullied or intimidated, feeling the need to conform, distrust, resentment, conflict, and free riding, among others (Micari and Pazos, 2014; Robinson et al., 2015). That said, it can be effective in terms of higher achievement, greater persistence, and positive attitudes (Kyndt et al., 2013; Springer et al., 1999). Students appear to learn more than, or at least as much as, peers participating in more traditional learning (Burgess et al., 2014; Fatmi et al., 2013). There are reports of higher test scores, higher course grades, and greater satisfaction with the course (Sisk, 2011); that students are more highly engaged, prepared for class, and perform better on course outcomes (Allen et al., 2013); and that students report learning more content, gaining a deeper understanding of information, heightened interest, and increased group-member participation (Altintas et al., 2014).
TBL “represents an even more intense use of small groups in that it changes the structure of the course in order to develop and then take advantage of the special capabilities of high-performance learning teams” (Michaelsen et al., 2004: 7). TBL is most appropriate in courses that meet two conditions: (1) students are required during the course to understand a significant body of information and (2) a primary goal of the course is to apply or use this content by solving problems, answering complex questions, resolving issues, and so on. Prior reviews of TBL literature report higher test scores, higher course grades, and greater satisfaction with the course (e.g. Sisk, 2011). Instructors reported that students were more highly engaged, prepared for class, and performed better on course outcomes (Allen et al., 2013). Students reported learning more content, gaining a deeper understanding of information, heightened interest, and increased group-member participation (Altintas et al., 2014).
TBL instructional activities
While collaborative learning involving teams has been a part of university learning for hundreds of years, more recently, university instructors have begun implementing TBL as a defined instructional method of teaching (Michaelsen and Sweet, 2008). TBL, in this particular model, includes several instructional activities, summarized in Figure 1 (Michaelsen and Sweet, 2011).

Figure 1. Typical instructional sequence in team-based learning.
Readiness assurance
Prior to meeting in class, students individually prepare by reading assigned material and studying. During class, students complete a brief quiz referred to as an Individual Readiness Assurance Test (iRAT; Michaelsen and Sweet, 2008) to increase individual student accountability and motivation, ensuring that individuals are prepared to participate in group discourse. Following the iRAT, students are organized into permanent groups, or teams, of five to seven students. Students take the same challenging quiz again with their team (Team Readiness Assurance Test (tRAT)), where they must negotiate knowledge and come to consensus through discourse before answering. Students then receive immediate feedback on their group answers to the tRAT. Groups can then submit an evidence-based appeal to the instructor when they believe they can make a case for answers the instructor considered inadequate. The two-step assessment process serves to provide individual accountability and motivates students to prepare for class, while student discourse and immediate feedback help students clarify and extend their knowledge. To support students’ understandings, the instructor can conduct targeted class discussion centered on especially challenging topics or topics on which considerable disagreement existed (Michaelsen and Sweet, 2011).
Concept application
Following the readiness assurance process, students then participate in knowledge application-based activities that allow the teams to use knowledge to address significant, real-world problems. Finally, teams participate in a peer evaluation process in which students provide feedback on their team’s overall success and their teammates’ contributions.
Despite its broadening use and popularity, few evaluations of TBL have been conducted. Reviews are narrative in nature and do not provide effect sizes for important academic outcomes. The purpose of this study is to learn more about the conditions under which the above model of TBL has been most successful. We also provide effect sizes for important academic outcomes and meta-analytic findings, a unique contribution to the TBL literature. The following research questions were addressed: (1) What are the effects of TBL on content knowledge? (2) What variables moderate the effects of TBL?
Method
Search procedure and criteria
A multi-step process was used to conduct a comprehensive search of TBL intervention studies. Electronic searches of the MEDLINE, Business Source Complete, ERIC, Academic Search Complete, and PsycINFO databases were completed to locate studies published in peer-reviewed journals between 2000 and September 2014. A combination of the descriptors TBL or team learning and outcomes was used to locate articles. We reviewed the initial yield of 941 abstracts for inclusion and conducted a hand search of previous systematic reviews of TBL (Burgess et al., 2014; Fatmi et al., 2013; Haidet et al., 2014; Sisk, 2011).
A total of 30 studies met the selection criteria for the synthesis. Studies were selected based on the following criteria:
The study self-identified the treatment as TBL and contained the components of TBL outlined by Michaelsen and Sweet (2008).
The study was published in a peer-reviewed journal.
The study utilized an experimental or quasi-experimental research design to evaluate the effects of TBL.
Participating students were in undergraduate or graduate level classes.
Study design included a dependent measure of learning outcomes. We excluded studies that only included descriptive or qualitative measures (e.g. student perceptions, engagement, motivation).
Data analysis
Coding procedures
An extensive coding sheet was developed to capture information about participants, study design, features of treatment and comparison conditions, and outcome effects. The team of raters included one doctoral level researcher, two doctoral candidates, and two doctoral students. Because of the diversity of research designs and outcome measures, each article was assigned to two coders and was double coded. Differences in coding were discussed until consensus was reached. The first author (E.S.) triple checked all codesheets for accuracy prior to developing a database, which was also double-checked.
Effect size calculation
For all studies, the Hedges (1981) procedure for calculating unbiased effect sizes for Hedges’ g was used. Each estimate of Hedges’ g was weighted by the inverse of its variance, so that less precise estimates from smaller samples carried less weight. All effects were computed using the Comprehensive Meta Analysis (Version 2.2.064) software (Borenstein et al., 2011). By the conventional benchmarks for standardized mean differences, Hedges’ (1981) g can be interpreted as follows: 0.2 = small effect, 0.5 = medium effect, and 0.8 = large effect.
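As an illustration of the calculation described above, the following is a minimal sketch of Hedges’ g and its inverse-variance weight for a two-group comparison. It is not the Comprehensive Meta Analysis implementation; the function name `hedges_g`, the commonly used approximation for the small-sample correction factor, and the exam scores are all assumptions introduced here for illustration.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, variance of g) for two independent groups."""
    df = n_t + n_c - 2
    # Pooled standard deviation across treatment and comparison groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled
    # Small-sample correction factor J (widely used approximation).
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Large-sample variance of d, then scaled by J squared.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    var_g = j**2 * var_d
    return g, var_g

# Hypothetical exam scores (not from any reviewed study):
# a TBL section versus a lecture section, 40 students each.
g, var_g = hedges_g(78.0, 72.0, 10.0, 10.0, 40, 40)
weight = 1 / var_g  # inverse-variance weight used when pooling
print(f"g = {g:.3f}, variance = {var_g:.4f}, weight = {weight:.1f}")
```

Because the weight is the reciprocal of the variance, a study with a larger sample (and so a smaller variance) contributes more to the pooled mean.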
Meta-analysis procedures
A total of 17 studies were included that reported sufficient information to allow effect sizes to be computed. Dependence of effect sizes for studies that included more than one outcome measure was resolved by averaging the effect sizes from all measures and including the average and its standard error (SE) in the meta-analysis, as recommended by Borenstein et al. (2009). A random-effects model was used to analyze the effect sizes and compute estimates of mean effects and SEs. Advanced methodological approaches to meta-analysis, such as multilevel modeling (Hox, 2002) and structural equation modeling (Cheung, 2008), were not implemented in the random-effects analyses of the effect sizes due to the small number of studies available. Mean effect size statistics and their SEs were computed.
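The two steps described above can be sketched as follows, under stated assumptions. The first function averages several outcomes from one study into a single effect and variance; the variance formula requires a correlation between outcomes, and the value r = 0.5 used here is purely an assumption, since the article does not report one. The second function pools study-level effects with the DerSimonian-Laird estimator, which is one common random-effects estimator; the article does not state which estimator the software applied. All function names and input values are hypothetical.

```python
import math

def combine_outcomes(effects, variances, r=0.5):
    """Average correlated outcomes from one study (r is an assumed correlation)."""
    m = len(effects)
    mean_effect = sum(effects) / m
    total = sum(variances)
    # Add the covariance terms between every pair of distinct outcomes.
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean_effect, total / m**2

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling; returns (mean, se, tau2, q)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean, se, tau2, q

# Hypothetical inputs, not data from the reviewed studies.
study_g, study_var = combine_outcomes([0.4, 0.6], [0.05, 0.05], r=0.5)
pooled_mean, pooled_se, tau2, q = random_effects([0.2, 0.5, 0.9],
                                                 [0.04, 0.04, 0.04])
print(study_g, study_var)
print(pooled_mean, pooled_se, tau2, q)
```

Averaging within a study before pooling keeps each study to one entry in the model, so a study with many outcome measures is not counted several times.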
Heterogeneity of variance was evaluated using the Q statistic. Moderator variables were introduced into the random-effects models for the post-secondary-level meta-analysis, resulting in a mixed-effects model. Given the small number of published studies on TBL, only two moderators were tested: group size (five or fewer students vs more than five students) and session length (1 hour or less vs more than 1 hour). These moderators could not be treated as continuous variables because both generally were reported as ranges in the studies included in the analysis. In some cases, studies did not provide sufficient data to code these moderator variables. These studies were included in the overall estimate of the mean effect size, but were dropped from the moderator analysis where data were missing.
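A subgroup comparison of this kind can be sketched as a between-groups Q test. The sketch below assumes that a random-effects mean and standard error have already been estimated within each level of a dichotomous moderator, and it tests whether those subgroup means differ. The subgroup values are hypothetical and are not the figures from Table 2; the helper names are likewise assumptions. For a single degree of freedom, the chi-square tail probability can be written with the complementary error function, which avoids any dependency outside the standard library.

```python
import math

def q_between(means, ses):
    """Between-groups Q statistic from subgroup means and standard errors."""
    weights = [1 / se**2 for se in ses]
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    # Weighted squared deviations of subgroup means from the grand mean.
    return sum(w * (m - grand) ** 2 for w, m in zip(weights, means))

def chi2_sf_df1(q):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))

# Hypothetical subgroup estimates (NOT the values reported in Table 2):
# smaller groups versus larger groups.
means = [0.80, 0.40]
ses = [0.15, 0.13]

qb = q_between(means, ses)
p = chi2_sf_df1(qb)  # two subgroups, so df = 1
print(f"Q_between = {qb:.2f}, p = {p:.3f}")
```

With only two subgroups, this statistic equals the squared difference between the subgroup means divided by the sum of their squared standard errors.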
Results
A total of 30 studies are included in this synthesis and reflect a variety of study designs. We report study features (e.g. study design and sample characteristics) followed by a meta-analysis of findings from 17 qualifying studies at the post-secondary level. This is followed by a narrative synthesis of the remaining 13 studies that did not qualify for meta-analysis.
Study features
In the following section, we summarize information about study characteristics and design elements. Detailed information about each study is included in Supplementary Tables 1 and 2 that can be downloaded at http://www.meadowscenter.org/files/general/TBL_Manuscript_Tables.pdf
Sample characteristics
Sample sizes ranged from 31 to 925 students, with a mean of 202 and a median of 121 participants. The majority of studies targeted graduate school students (n = 26), mostly in the field of medicine (n = 16). This was followed by studies that included pharmacy students (n = 8).
Study design
The corpus of 30 studies included 20 treatment-comparison, 4 crossover, 1 multiple treatment, and 5 single group studies. Several design elements strengthen the reliability of findings and lend them credibility. These design elements are included in the What Works Clearinghouse (WWC) standards for reviewing studies (US Department of Education, 2014), the following of which are required for a study to be rated as meeting WWC standards without reservation: (1) randomized control design, (2) evidence that attrition did not introduce bias, (3) established baseline equivalence, and (4) reliable and valid independent variables. Another design element critical to establishing reliability of findings is fidelity of implementation (Swanson et al., 2013). The number of studies that possess these design elements is reported in Table 1. No studies reported all five elements.
Table 1. Study design elements.
Intervention dosage
A total of 25 studies reported dosage information. The number of TBL sessions ranged from 1 to 24 and was not reported in eight studies. Fourteen studies reported the length of sessions, with a range of 50–270 minutes and a mean of 119 minutes. Information about frequency and length of sessions is reported in Table 1.
Meta-analysis
A total of 17 studies qualified for meta-analysis. All outcomes were related to content knowledge, and both standardized and researcher-developed measures were included in the effect size calculations.
Mean effect size estimate and moderator analysis
The mean effect size estimate for the 17 studies that included post-secondary-level participants was 0.55 (SE = 0.10; p < 0.001; 95% confidence interval (CI) = 0.37, 0.74), indicating a moderate positive effect of TBL on content knowledge that is significantly different from zero. The variance associated with the effect sizes was statistically significant (Q = 103.47; degrees of freedom (df) = 16; p < 0.001). This indicates that effect sizes varied across studies and there may be moderators that explain that variation. Moderator analyses (Table 2) revealed that group size moderated the magnitude of the effect sizes to a statistically significant degree (Q = 4.01; df = 1; p = 0.045), with smaller groups associated with larger effects. Session length was not a statistically significant moderator of the magnitude of effect. However, given the small number of studies included in the moderator analysis, inadequate power may have led to this result. See Table 2 for effect sizes by moderator, SEs, and Qbetween statistics. We could not examine the differences between effect sizes from standardized and researcher-developed content knowledge assessments because too few studies included standardized measures.
Table 2. Results from moderator analysis of post-secondary-level studies.
SE: standard error.
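The reported summary statistics can be checked with a few lines of arithmetic. The sketch below recomputes a 95% confidence interval from the reported mean and SE, derives the I² heterogeneity index from the reported Q and degrees of freedom, and recomputes the p value for the group size moderator. I² is not reported in the article; it is derived here from the published Q statistic as an added illustration. The normal quantile is hard-coded, and all variable names are assumptions.

```python
import math

# Values reported in the text.
mean_g, se = 0.55, 0.10
q_total, df_total = 103.47, 16
q_group_size = 4.01  # moderator test with df = 1

# 95% confidence interval under a normal approximation.
z = 1.959964  # 97.5th percentile of the standard normal (hard-coded)
ci_low, ci_high = mean_g - z * se, mean_g + z * se

# I^2: share of total variation attributable to between-study
# heterogeneity (derived here; not reported by the authors).
i_squared = max(0.0, (q_total - df_total) / q_total) * 100

# Chi-square upper-tail probability for df = 1.
p_group_size = math.erfc(math.sqrt(q_group_size / 2))

print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
print(f"I^2: {i_squared:.1f}%")
print(f"p for group size: {p_group_size:.3f}")
```

The recomputed interval is slightly wider than the published one (0.37, 0.74), which may simply reflect rounding of the reported mean and SE. The recomputed p value agrees with the reported p = 0.045, and the derived I² suggests that most of the observed variation in effect sizes reflects differences between studies rather than sampling error.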
Publication bias
Publication bias was evaluated using the trim-and-fill approach (Card, 2012). This approach builds on visual inspection of a funnel plot of effect sizes for asymmetry, using an iterative process that seeks to correct any asymmetry found. This asymmetry can be evidence of the omission of null or very small effect sizes in studies that were conducted but not published. Trim-and-fill analysis deletes the effect sizes causing the asymmetry, calculates a mean effect size, and then returns the deleted effect sizes. Effect sizes for unpublished studies that may have been omitted are imputed, and the analysis repeats until the plot is symmetrical. The results of the analysis indicate whether estimates of mean effect size may be biased by the exclusion of effect sizes from unpublished research. The meta-analysis of the post-secondary-level studies indicated that publication bias did not affect the mean effect size estimate.
Synthesis of remaining studies
A total of 13 studies met all of the inclusion criteria but did not provide enough information for effect size calculation and therefore could not be included in the meta-analysis. Meta-analytic findings indicate a moderate sized positive effect in favor of students who participated in TBL. It is interesting to consider whether or not these remaining 13 studies align with the meta-analytic results. The question here becomes, “Is TBL associated with student improvement in content knowledge?”
Researchers used a variety of outcomes in the 13 additional studies, including course grades, final examination grades, and pre- to post-test gains. While authors of one study reported that students assigned to the TBL treatment condition performed as well as students in the lecture-based comparison group on their final examination, authors of the remaining 12 studies reported that students who received TBL outperformed students who did not. Three reported improvements in class grades (Carmichael et al., 2009; Espey et al., 2008; Nieder et al., 2005), five reported higher final examination grades for students who engaged in TBL (Inuwa et al., 2012; Koles et al., 2010; Letassy et al., 2008; Nyindo et al., 2014; Vasan et al., 2011), and three reported gains from one year to the next (Year 1 being no TBL and Year 2 being TBL; Pogge et al., 2013; Rao and Shenoy, 2013; Zgheib et al., 2010). Finally, one study investigated the role of the iRAT and tRAT on final examination scores. Students who completed both the iRAT and tRAT outperformed students who completed only the tRAT. This result was reported for both above and below average students, providing some evidence that the combination of the iRAT and tRAT is important in improving content knowledge outcomes.
Discussion
In this synthesis, we examined the effects of this particular model of TBL on content knowledge, that is, knowledge of the subject matter. The meta-analysis described in this article is unique in that we report effect sizes and results from moderator analysis. The overall mean effect size of 0.55 indicates a moderate positive effect of TBL on content knowledge when compared to non-TBL comparison groups. In addition, group size moderates the magnitude of this effect, with smaller groups associated with better outcomes. Conclusions from the synthesis of 13 additional studies that did not qualify for the meta-analysis align with this finding, with authors reporting improvements in class grades, final examination grades, and gains from one year to the next. These results also align with prior reviews of the broader corpus of small group learning studies that reported an effect size of 0.54 in achievement (Kyndt et al., 2013). This meta-analysis provides more conclusive evidence that this model of TBL as an instructional practice enhances the knowledge that students have of their subject matter.
TBL is a multi-component intervention. As with all multi-component interventions, measured effects reflect the intervention as a whole and do not indicate which individual components may be causing the effect. However, based on research reporting the effects of individual components, we can suggest hypotheses. The effect of TBL may be related to the assessment component. Phelps (2012) investigated the effect of assessment on student achievement across 177 studies and reported moderate to large mean effect sizes ranging from d = 0.55 to 0.88, indicating that students who take a test outperform those who spend the same amount of time engaged in other activities. The related issue of “test motivation” is realized in two forms: (1) intrinsic motivation and (2) extrinsic motivation. Within TBL, both types of motivation are tapped. There is intrinsic motivation inherent in doing well on the individual quiz (that is, iRAT) in order to be prepared for the later knowledge application activity. Extrinsic motivation is addressed through the group quiz (tRAT), where students may work harder or better in order to perform well in front of their peers. Even further, this tRAT provides an opportunity for immediate feedback. It is well established that testing with feedback produces the strongest positive effect on achievement (e.g. Fuchs et al., 1984; Phelps, 2012). Given the strong evidence supporting the “test effect” coupled with the fact that several TBL components include assessment and feedback (e.g. iRAT, tRAT, targeted instruction), future studies are needed to compare TBL to assessment only to determine the role of assessment in driving the TBL effects.
When considering the hypothesis that the social component of TBL (e.g. team work and discussion in small groups) is responsible at least in part for the outcomes, we can look to meta-analytic results to learn more about the social conditions under which TBL is more effective. The number of students per heterogeneous TBL discussion group appears to affect the magnitude of TBL’s effect, suggesting that TBL is more effective when implemented in smaller groups (i.e. five or fewer students) than in larger groups (i.e. more than five students). The structure and activities that characterize TBL help us interpret this finding. A key feature of any collaboration, including TBL, is the co-construction of meaning through group discourse. By engaging in discussion with one another, students provide and receive immediate feedback about their understanding of ideas. It is possible that smaller groups can more efficiently share ideas and better communicate to co-construct meaning. Smaller groups may also compel students to participate more fully, speak more often, and exhibit more effort on difficult tasks, making “social loafing” less likely (Jackson and Williams, 1985).
As with any synthesis, the findings are limited by the quality of the research included. First, none of the 30 studies that met inclusion criteria also met all of the What Works Clearinghouse evidence standards. Several studies were missing important descriptive information such as sample characteristics or the number of students in the comparison group. Other studies reported improvement with no statistical information to substantiate their claim (e.g. Inuwa et al., 2012; Letassy et al., 2008). A majority of the studies examining TBL introduced inherent bias by convenience sampling graduate medical or pharmaceutical students. Random assignment to groups or matching a comparison group could potentially mitigate this bias, but no studies employed random assignment. Also, pretesting was seldom carried out, and many studies included a limited use of standardized measures, relying heavily on quiz scores or course grades. Typically, more rigorous studies yield smaller effects than those that are less rigorous (Swanson, 1999). Studies appeared to demonstrate moderate overall effects, but confidence in these findings is reduced because many of the studies were poorly designed. Related to meta-analytic methods, the number of studies included in moderator analysis was adequate but rather small. Indeed, larger samples of studies yield more precise effect size estimates than smaller samples (Borenstein et al., 2009). Additional high-quality studies of TBL implementation would help further verify the validity of the 0.55 effect size reported here.
Footnotes
Acknowledgements
Lisa V McCulley, David J Osman, and Michael Solis completed this work while at the University of Texas at Austin.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research reported here was supported by the Institute of Education Sciences, US Department of Education, through Grant R305F100013 to The University of Texas at Austin as part of the Reading for Understanding Research Initiative. The opinions expressed are those of the authors and do not represent views of the Institute or the US Department of Education.
