Abstract
This investigation studied the effects of the Multiple Literacies in Project-Based Learning science intervention on third graders’ academic, social, and emotional learning. This intervention includes four science units and materials, professional learning, and post-unit assessments; features of project-based learning; three-dimensional learning (National Research Council, 2012); and the performance expectations from the Next Generation Science Standards (NGSS Lead States, 2013). The intervention was evaluated with a cluster randomized control trial in 46 Michigan schools with 2,371 students. Results show that students who received the intervention had higher scores on a standardized science test (0.277 standard deviation) and reported higher levels of self-reflection and collaboration when involved in science activities.
Today’s young students face the increasing consequences of climate change, the evolution of viruses and bacteria that pose hazards to their health, and the accelerating pace of new technologies. The rate of these changes is likely to increase in the future, which highlights the need for today’s students to develop a foundation of scientific ideas and practices to explain phenomena and find solutions to problems. However, if young students are to have a continuing interest in science and address its impact on their lives, they need high-quality elementary science instruction that helps them maintain a positive attitude toward science. Yet science education at this level has been de-emphasized, in part by the prior legislation of No Child Left Behind (2001), which focused on improving mathematics and reading achievement; this policy pressed many states to reduce the already small amount of time allocated for science instruction (Dorph et al., 2011; Griffith & Scharmann, 2008). This shortened time for science continues today and is often inequitably distributed among K–3 classrooms and racial and ethnic groups (Curran & Kitchin, 2019).
The decade of research following the enactment of No Child Left Behind revealed not only persisting inequities in science learning opportunities (Kohlhass et al., 2010) but also advances in how people learn (National Research Council [NRC], 1999, 2007) and the need for increasing science content knowledge and scientific skills (Schmidt et al., 2005). These findings raised serious concerns among the scientific and policy community that resulted in two major reports: A Framework for K–12 Science Education (hereafter, Framework; NRC, 2012), which identifies the three dimensions of science knowledge (i.e., disciplinary core ideas [DCIs], crosscutting concepts [CCCs], and scientific and engineering practices [SEPs]), and the Next Generation Science Standards (NGSS; NGSS Lead States, 2013), which articulate a set of three-dimensional performance expectations (PEs) for Grades K–12. Although these reforms are endorsed by the scientific research community, education and policy stakeholders, and most U.S. states (Bendici, 2019), their integration into science classrooms has encountered multiple challenges, including accessing evidence-based high-quality science curricular materials and teacher learning opportunities. This point is underscored in the National Academies of Science’s (2021) Science and Engineering in Preschool Through Elementary Grades: The Brilliance of Children and Strengths of Educators, which highlights the need for “research on effective approaches . . . promising instructional approaches that integrate content; design use of curriculum and instructional materials; and how to support teachers through professional learning opportunities” (p. 2).
The development and testing of Multiple Literacies in Project-Based Learning (ML-PBL) responded to the need for a research-based, empirically tested innovative intervention that would deepen students’ use of scientific knowledge and practices and increase their academic, social, and emotional learning. Given these objectives, the ML-PBL intervention developed a systems approach to support elementary students’ science learning with high-quality teacher and student instructional and learning materials, teacher professional learning (PL) experiences, and classroom-based assessment tasks (Krajcik et al., 2021). Anchored in the principles of project-based learning (PBL), ML-PBL focuses on having students investigate questions and make sense of phenomena that they find personally important, with the intention of transforming classrooms into places where students work together, generating knowledge and solving meaningful problems.
Beginning as a design-based study for third graders, ML-PBL underwent several rounds of revisions and testing over the course of 4 years, including teacher experiments, classroom pilots, a field test, and, most recently, an efficacy study to determine whether the intervention enhanced students’ science academic, social, and emotional learning. 1 In 2018–2019, a cluster randomized control trial (RCT) was conducted with 2,371 third graders (1,165 treatment and 1,206 control) in 46 schools (23 treatment and 23 control) with 91 teachers (41 treatment and 50 control) in 111 classrooms (54 treatment and 57 control). Results of this study and its impact on the science learning of a diverse student population of third graders—whose individual demographic characteristics varied by race and ethnicity, gender, socioeconomic status (SES), English language learner (ELL) status, and the use of individualized education plans (IEPs)—are reported below.
Theoretical Framing of the ML-PBL Intervention
The design of ML-PBL recognizes and responds to the importance of developing scientific knowledge and practices that could increase students’ science learning. Based on the ideas of PBL and incorporating the principles of the Framework and the PEs described in the NGSS, the ML-PBL team designed an intervention for a wide range of students, especially those who are typically excluded from science learning opportunities. Recent studies show that science is often not taught in the early grades, an omission that is more likely to occur in low-income and minority schools and that limits students’ access to, and opportunities to learn, emerging scientific knowledge and practices (NRC, 2021). Moving away from the conventional science curricula typically found in elementary schools, the ML-PBL intervention was constructed to actively involve students in asking questions about scientific phenomena and exploring these through experiential activities that seek to approximate the practices of scientists and engineers.
Foundation of the ML-PBL Design
PBL has a rich history grounded in the philosophy of John Dewey (1938), who advocated for learning through discovery by investigating meaningful problems of personal relevance. The concept of discovery has been subsequently elaborated by science education researchers, who encourage problem- and inquiry-based learning that often has learners creating “projects” (see Thomas, 2000). In the 1990s, PBL received growing recognition for its emphasis on building the capacity of learners to apply knowledge by making decisions, solving problems, and evaluating their solutions (Blumenfeld et al., 1991). By 2000, PBL had gained an important distinction by clarifying its differences from other problem- and inquiry-based learning approaches. Fundamental to PBL was the formulation of a set of design principles that guided an instructional model that could be reviewed, revised, and tested (Krajcik & Shin, 2014; E. C. Miller & Krajcik, 2019). Within the research community, however, scholars have had different approaches to what the PBL design principles should prioritize and entail. 2 Currently, there appears to be a closer consensual understanding of what is and what is not PBL. Condliffe (2017) argues that this agreement is a consequence of “curricularizing” PBL into situated classroom practices, which guide teachers in how to best implement PBL into their instructional activities. The “curricularization” of PBL has resulted in many new studies that have used several ideas from the Framework and the NGSS for building science learning activities.
One of the first of these ideas incorporated into several new investigations focuses on the need for meaningful science learning for all students, especially for those marginalized, underrepresented groups in the United States, where historical and cultural identity are often ignored in science textbooks, instructional materials, and pedagogical practices (Lee et al., 2019). Many recent studies focus on these ideas by strengthening learners’ identity formation (Jagers et al., 2019), engaging students in sensemaking from collective experiences (Easley, 2020), building collaborative experiential science activities (Baser et al., 2017), intersecting science learning with other disciplines (Arias et al., 2016; Fitzgerald, 2020), and providing equitable learning opportunities (Haas et al., 2021; Pinkard et al., 2017). Although they often have similar and overlapping goals, the research approach and methods for evaluating the impact of the various interventions among these studies differ. For example, some rely on narratives (Pinkard et al., 2017), mixed methods (Fitzgerald, 2020), or quasi-experimental designs (Baumfalk et al., 2019). The ML-PBL intervention, adhering to many of the same objectives of these studies, has taken a different path by estimating the impact on science learning by using a causal design.
Model Identification
The theoretical model of ML-PBL is grounded in three foundational components. The first focuses on the Framework (NRC, 2012), which promotes shifting classroom learning practices from acquiring disconnected science facts and tasks based on memorization to environments where students make sense of phenomena, design solutions to complex real-world problems, and use the three dimensions of scientific knowledge (DCIs, SEPs, and CCCs). DCIs are the fundamental ideas of the scientific disciplines of Earth and space sciences, physical sciences, life sciences, and engineering, technology, and applications of science. They focus on the most powerful and generative ideas of science that build across the K–12 spectrum (Duncan et al., 2016). SEPs describe how scientists and engineers explore the natural and designed world, increasing in complexity across grades (Schwarz et al., 2016). CCCs are ideas that scientists apply across disciplines and are used to explore phenomena or solve problems (Nordine & Lee, 2021). These dimensions, when used together, are often referred to as “three-dimensional learning,” which involves learners in explaining a range of natural phenomena and solving challenging problems, including those from engineering. Although each of the dimensions is important on its own, together they support students in a figuring-out process that is central for exploring and explaining phenomena, which is prominent in the ML-PBL intervention model.
The next element of the ML-PBL model incorporates the NGSS (NGSS Lead States, 2013), which provides a set of PEs that integrate the Framework’s three dimensions of scientific knowledge. Using these PEs, the ML-PBL team targeted the overall learning goals for the units that have been created. Materials were designed so that students would have opportunities to build toward the PEs for a given grade level. Within each unit are smaller lesson-level learning goals for instruction that are sequenced into smaller-grained performance-based learning goals. As students master these learning goals, they build on one another, ensuring learning coherence within and across the units (Fortus & Krajcik, 2012). 3
The third foundational feature of ML-PBL builds on the principles of PBL that emphasize the importance of the driving question (DQ), which explores a compelling natural phenomenon to be explained or a complex problem to be solved. To answer the DQ (and smaller lesson-level questions), students engage in scientific practices, such as planning and carrying out investigations, building models, and constructing explanations. In PBL classrooms, students are expected to collaborate with one another. It is anticipated that this student collaboration will allow for exchanging ideas and building knowledge, which are likely to occur when creating artifacts that are purposely designed to support science literacy and the use of scientific practices. When engaged in these PBL practices, teachers are encouraged to have students draw from their life experiences, recognizing and elevating differences in personal identities and cultural histories. Finally, the constructed ML-PBL performance-based assessments are used to capture students’ emerging scientific understandings and practices.
These three foundational components (the Framework, the NGSS, and PBL) inform the design of the ML-PBL learning activities and materials, PL for teachers, and assessments, all of which are anticipated to directly affect the learning context at the classroom level. Here, the “learning context” represents the environment, where it is expected that students will figure out phenomena that they find personally relevant and meaningful, solve problems, use scientific practices, create artifacts, and work in collaboration with classmates. When these ML-PBL experiences occur, students are more likely to become academically, socially, and emotionally engaged.
We envision engagement in ML-PBL as embodied in the students and as being measured by obtaining information from them when they experience different learning tasks. We assume that students will become more engaged at the intersection of interest, challenge, and skill building (Schneider et al., 2016), when posed through experiences and materials that spark students’ interest in “figuring out” challenging questions to make sense of phenomena. These challenging questions are designed to support students in reflecting with an increased wonder about how the world works, assuming responsibility for their own and others’ contributions to solutions, and appreciating the value of working in teams. These student constructs are defined in this study as self-reflection, capacity for collaboration, and taking ownership and responsibility for their own and others’ work, which are viewed as important social and emotional learning (SEL) experiences (for further discussion of these concepts, see Durlak et al. [2015], Jagers et al. [2018], and Jagers et al. [2019]).
Figure 1 shows the relationships among the features of the ML-PBL intervention.

Figure 1. Logic model of ML-PBL
Beginning on the left side of the diagram are the PBL principles (DQ, exploring and explaining phenomena, developing artifacts, fostering collaboration, and promoting equity, described below) and the three dimensions of scientific knowledge described in the Framework (NRC, 2012) and NGSS PEs. All of these elements are incorporated into ML-PBL instructional projects and woven throughout the materials, PL, and assessments.
Features of the ML-PBL Intervention
Instructional Materials and Activities
The Grade 3 ML-PBL curriculum consists of four units, each framed by a DQ and an anchoring phenomenon and culminating in students developing an artifact. 4 The four units include Squirrels (Adaptation), Toys (Forces and Motion), Birds (Biodiversity), and Plants (Weather/Climate) (see Table 1). In each unit, the DQ gradually and purposefully involves students in using the three dimensions of scientific knowledge to explain and predict a phenomenon or develop a solution to a problem by helping them wonder, persist, and make sense of their world.
Table 1. Units, Driving Questions, and Phenomena
Note. For a more complete description of the PEs, see online Appendix A. Additional information on the units can be found on this website: https://mlpbl.open3d.science/
For example, the Squirrels Unit’s DQ is “Why do I see so many squirrels, but I can’t find any stegosauruses?” This unit focuses on learning how species survive and adapt to changes in the environment over hundreds of millions of years. The big ideas (DCIs) in this unit focus on how survival depends on adaptation and how changes in the environment (whether natural or not) can cause modifications in the populations of organisms. In the Squirrels Unit, students explore “What do squirrels need to survive?” followed by questions about the squirrel’s physiology and environment. Exploring the past through fossils, students learn how scientists use that evidence to trace the changes in organisms over time and analyze why some species die out and some survive. Throughout the unit, learners ask questions based on phenomena they experience and build models to explain the various phenomena and to answer both their own questions and those posed to them.
Within and across ML-PBL units, students have multiple opportunities to experience firsthand scientific phenomena, ask questions based on their observations, and become involved in activities that help build their science knowledge through planning and undertaking investigations, constructing scientific explanations, and creating models of scientific phenomena. These experiences rely on direct observations of phenomena and a range of traditional print, multimodal, and digital texts, videos, and computerized simulations. In the units, students participate in numerous and varied learning opportunities to record data from firsthand investigations, write scientific explanations, develop and revise models, draw and describe plans for engineering design solutions, and seek feedback. Throughout these experiences, students collaborate with other classmates, either with the entire class or in small groups.
PL in ML-PBL
In this study, the PL experience introduced and helped treatment teachers understand the ML-PBL theoretical model and its enactment in the classroom. Key aspects of PL included developing usable knowledge of the features of PBL, reviewing the scope of the units, enacting some of the experiences in which the students would engage during the lessons, exploring how to use the materials and their relevant experiential tasks, and learning about the construction of various student artifacts and assessments. The PL activities supported teachers in enacting the intervention with its highly developed and specified materials as a roadmap to which adjustments could be made to ensure equitable learning opportunities that were culturally and historically responsive to the students, their families, and their communities.
The PL experiences in ML-PBL underscore the importance of teachers creating classroom environments that affirm cultural identity, responsible ownership, and collaborative productive relationships. A key aspect of PL experiences is supporting teachers in providing learning opportunities for all their students by forming a bridge from the students’ home lives to those at school. PL also assists the teachers in crafting meaningful discourse moves that identify supportive structures that solicit ideas from students and validate their contributions and participation. Critical to ML-PBL’s design is its interdisciplinary nature, by which teachers learn how science should be used in classrooms to develop speaking, listening, reading, viewing, and writing skills, thereby intersecting literacy and science (Krajcik et al., 2021).
At the beginning of the school year, treatment teachers experienced a 3-day PL session about the NGSS and an in-depth explanation of PBL. Teachers shared their insights regarding how they would be using the ML-PBL materials. At this time, teachers also learned about the research activities of which they would be a part, including the collection of student artifacts and assessments and classroom observations. Three additional in-person PL sessions occurred during the academic year prior to each unit. If teachers could not attend a session, the PL group offered make-up sessions.
In addition to the face-to-face meetings, PL facilitators on the team met with groups of teachers via video conference. These sessions occurred approximately every 2 weeks. The goal was to learn from the teachers about what worked, what was challenging, and what needed support. On average, treatment teachers received approximately 7 days of PL (counting in-person and formal virtual hours) throughout the school year. 5 Teachers received continuing education credits from their districts for attending these in-person sessions.
The control teachers were promised the treatment, including the PL and all material and equipment, the following year. They received up to 6 hours of PL on the NGSS and three-dimensional learning (NRC, 2012) in the 2018–2019 study and were asked to collect data from their students, complete pre- and post-teacher surveys, and permit observations of their classrooms with an incentive. Originally, the offer of treatment was to take place in 2019–2020; however, the COVID-19 pandemic interrupted its enactment.
Assessments
Two types of assessments were implemented in the ML-PBL intervention. The first was the post-unit tests, given only to treatment students; these included the formative embedded assessments and artifact construction that are part of the ML-PBL materials. The second was a summative assessment designed by the Michigan Department of Education (MDE) and given to treatment and control students upon completion of the intervention (details are discussed in the Methods section). The post-unit assessments were designed with items that used the three dimensions of learning to make sense of phenomena similar but not identical to the phenomenon or problem featured in the unit (Bartz & Chen, 2021; Harris et al., 2015; Li, 2021; Mislevy & Riconscente, 2006). Rubrics and scoring protocols were developed for these items and scored by raters who were recruited and instructed on the use of the rubrics. Interrater reliability was calculated multiple times over the course of each unit. 6
Testing the Intervention
Based on the results of the teaching experiments, pilot test, and field test, designers modified the intervention each time in preparation for the larger efficacy study, which began in the fall of 2018 and continued through the spring of 2019. 7 Following are the main research question and exploratory research questions examined in this efficacy study.
Main Research Question
1. What is the main effect of this intervention on third-grade students’ science learning? Do ML-PBL treatment students outperform students in the control group on an independent summative science assessment?
Exploratory Research Questions
2. Does the treatment effect differ by students’ gender, race and ethnicity, SES, ELL status, or having received an IEP?
3. Does the treatment lead to changes in students’ self-reported skills in self-reflection, collaboration, and responsibility for their own and others’ work?
4. Are there significant differences in key teacher practices between the treatment teachers and the control teachers? Does the treatment yield a change in teacher practices?
Methods
Study Design and Sampling
One of ML-PBL’s major considerations was to improve the science academic, social, and emotional learning of low-income and underrepresented minority students. To achieve this goal, a sample of schools from the Detroit Public Schools Community District (DPSCD) was recruited. The Michigan sample site selection outside DPSCD was organized into three regions: one on the west side, another in the eastern-central section, and one in the northern part of the state. To secure cooperation, several intermediate school districts and district superintendents were contacted about the study and their willingness to become involved. Upon their approval, a memorandum of understanding was drafted and signed, indicating their agreement to participate.
School eligibility for the ML-PBL program was determined by several factors: the school’s status as a public and nonspecialized school; a Grade 3 enrollment of more than 25 students; the inclusion of racial and ethnic minorities; and/or the inclusion of students receiving free and reduced lunch. Using the Michigan State Longitudinal Educational Data System, a full list of public elementary schools was obtained within each of the major districts (Genesee, Kent, and DPSCD) and in several districts in northern Michigan. Once the list of schools in each region was obtained, each school’s potential eligibility to participate in the intervention was determined. 8
Randomization
For randomization, schools were blocked by region. DPSCD was its own block, and the other three Michigan regions were combined into another block. To ensure a balance between the treatment and control schools, the sample was randomized and re-randomized until the p values of the differences between the treatment and control schools on each school-level covariate were greater than 0.2 (p > 0.2), a threshold taken to indicate no evidence of difference between the treatment and control schools. The school-level covariates used were student racial composition, the proportion of students receiving free and reduced school lunch, Grade 3 enrollment, and average grade-level mathematics and reading scores from the Michigan Student Test of Educational Progress in 2017.
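The re-randomization rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's procedure: the text does not specify which test produced the p values, so a normal (z) approximation to a two-sample difference in means is used here, and the covariate names, the school data, the seed, and the helper names `two_sided_p` and `rerandomize` are all hypothetical. The sketch handles a single block; under the blocked design it would be run once per block.

```python
import random
from statistics import NormalDist, mean, variance


def two_sided_p(a, b):
    """Approximate two-sided p value for a difference in means.

    Assumption: a normal (z) approximation stands in for the test the
    study actually used, which the text does not specify.
    """
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        # No within-group variation: groups either match exactly or not.
        return 1.0 if mean(a) == mean(b) else 0.0
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rerandomize(schools, covariates, threshold=0.2, seed=0, max_tries=10_000):
    """Reshuffle one block of schools until every covariate has p > threshold."""
    rng = random.Random(seed)
    ids = list(range(len(schools)))
    half = len(ids) // 2
    for _ in range(max_tries):
        rng.shuffle(ids)
        treat, ctrl = ids[:half], ids[half:]
        balanced = all(
            two_sided_p([schools[i][c] for i in treat],
                        [schools[i][c] for i in ctrl]) > threshold
            for c in covariates
        )
        if balanced:
            return sorted(treat), sorted(ctrl)
    raise RuntimeError("no balanced assignment found within max_tries")


# Hypothetical block of 20 schools with two made-up covariates.
data_rng = random.Random(1)
schools = [
    {"frl_share": data_rng.uniform(0.2, 0.9),
     "grade3_enrollment": float(data_rng.randint(26, 120))}
    for _ in range(20)
]
covariates = ["frl_share", "grade3_enrollment"]
treatment, control = rerandomize(schools, covariates)
print(len(treatment), len(control))
```

Accepting only assignments that pass the balance check restricts the set of possible randomizations, which is one reason to report the rule and threshold explicitly, as the text does.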
After school randomization, teacher participation was initiated. A few teachers taught multiple classrooms (n = 9 for the treatment teachers, and n = 5 for the controls), with treatment teachers requesting that they be able to give the treatment to all sections. To accommodate them, all sections were given the treatment. We randomly chose a focal classroom from each teacher so that the analysis would not be overrepresented by students from a few teachers. In our analysis, we estimated the treatment effect for the focal group and all the other sections (see Table 4).
Sample Attrition
After school randomization, 25 schools were assigned to each of the treatment and control conditions. From this initial randomization, four schools attrited, two from each condition. Three of these schools failed to provide class lists and were considered as attriting schools. One additional school provided class lists but did not provide the summative assessment and was also considered an attriting school. To examine bias from differential school attrition, the percentage of attrition by treatment status was calculated and compared across conditions. These results are reported in Table 2 (see Panel A). The four attriting schools were dropped in the final analysis, resulting in the analytic sample of 46 schools.
Table 2. Attrition Calculations
Student attrition was calculated from classroom rosters (excluding those from the attriting schools), which were received before the start of the intervention. Originally, there were 1,518 treatment students and 1,405 control students, while the analytic sample had 1,165 treatment students and 1,206 control students. This yielded an overall attrition of 19% and a differential attrition between the treatment and control student groups of 9%. 9 However, because this intervention was implemented at the classroom level, we assume that student-level attrition (i.e., students without the summative assessments) was not related to treatment, except in cases where a teacher left the study (n = 6; two treatment and four controls). We suspect that other student attrition was related to absences and student mobility. These results are reported in Table 2, Panel B.
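The student attrition figures above can be checked with simple arithmetic. The short Python sketch below uses only the roster and analytic-sample counts reported in the text; the function name `attrition_rate` is illustrative, and differential attrition is taken here as the absolute difference between the two group rates, which is an assumption about how the reported 9% was computed.

```python
def attrition_rate(rostered: int, analyzed: int) -> float:
    """Share of the rostered sample missing from the analytic sample."""
    return (rostered - analyzed) / rostered


# Student counts reported in the text.
treat_rostered, treat_analyzed = 1518, 1165
ctrl_rostered, ctrl_analyzed = 1405, 1206

overall = attrition_rate(treat_rostered + ctrl_rostered,
                         treat_analyzed + ctrl_analyzed)
treat = attrition_rate(treat_rostered, treat_analyzed)
ctrl = attrition_rate(ctrl_rostered, ctrl_analyzed)
# Assumption: differential attrition = absolute gap between group rates.
differential = abs(treat - ctrl)

print(f"overall {overall:.1%}, treatment {treat:.1%}, "
      f"control {ctrl:.1%}, differential {differential:.1%}")
```

Rounded to whole percentages, these values agree with the 19% overall and 9% differential attrition stated above.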
Treatment and Control Conditions
The treatment teachers received learning materials, assessments, and PL that focused on the NGSS, PBL, and the curriculum. The control teachers, as noted above, received PL on the NGSS and were expected to teach their science classes as “business as usual.” To confirm that the control teachers were not using PBL in their classrooms, we first checked with school administrators about the science practices in the schools, which depended on district and/or school curricular choices; we learned that teachers primarily used science textbooks and worksheets. Second, several random observations were conducted in the control teacher classrooms that recorded teacher and student science learning activities. Third, a majority of the control teachers completed an exit survey in which they and the treatment teachers described their science materials and practices.
Data, Instruments, and Measures
Student Instruments and Measures
To estimate the treatment effect, three sets of student data were collected: a science academic achievement outcome measure, demographic characteristics, and three latent constructs of SEL. The science academic achievement outcome was constructed from the MDE state test, from items corresponding to the Grade 3 science NGSS PEs, which were designed to measure DCIs, CCCs, and SEPs. All the MDE items that corresponded to the Grade 3 NGSS PEs were given to the team. However, the number of items exceeded the number that a third grader could complete in 40 to 50 minutes. Consequently, the items were split into three different forms: A, B, and C. 10 A series of item response theory analyses were conducted, and the results can be found in online Appendix B. To ensure that the different forms were not biasing the treatment effect, the forms were also interacted with the treatment on the outcome of interest. These interactions were not significant for forms A, B, and C (p values for A, B, and C were 0.747, 0.554, and 0.678, respectively), indicating no evidence that the different forms were biasing the results (these results can be found in online Appendix C).
To obtain student demographic characteristics, state-level data on the students were requested from the Michigan Education Research Institute (MERI). A list of the students from the study was given to MERI, which matched them to the students’ research identifier numbers. These numbers were used to match with the students’ background demographics, which included race and ethnicity, SES, ELL status, gender, and whether they had an IEP. Not every student was matched with the MERI data; therefore, a missing demographic flag was created to identify these students (n = 130, approximately 5% of the analytic sample).
To create an instrument that would measure third graders’ SEL for their science classes, the team consulted multiple sources, including psychological research studies on SEL, developmentally appropriate questions for third graders, and items from other national assessments (e.g., the Early Childhood Longitudinal Study; see Baines et al., 2017; Jagers et al., 2018). Recognizing differences in literacy skills among students, a drawn thumbs-up (agree), thumbs-down (disagree), and closed fist (neutral) were used to measure agreement. Students circled their feelings on a paper/pencil form administered to treatment and control groups in the spring of 2019. The SEL instrument was designed, field-tested, and revised in the year prior to the efficacy study. The results found three key latent constructs: self-reflection, ownership, and collaboration. These key latent constructs are important social and emotional factors related to science learning, and to PBL specifically. The relationship of these constructs to the ML-PBL intervention is explored in C. Miller (2021, ch. 3). Results of the confirmatory factor analyses, including the items, can be found in online Appendixes D1 and D2.
Using the available data for the students’ individual demographic information, descriptive statistics and mean differences between the treatment and control students were calculated. Table 3 provides these descriptive statistics and the balance for the analytic sample on individual characteristics. The balance for the student-level variables by treatment status was estimated by using a three-level hierarchical linear model (HLM) that controlled for region, with the three levels being student, teacher, and school. Checking for balance is important because imbalance between the two populations could indicate problems with the randomization and potential bias in the statistical tests.
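One way to write the three-level balance model described above is sketched below. The notation is an assumption introduced for illustration and is not taken from the study: $X_{ijk}$ is a demographic characteristic of student $i$ with teacher $j$ in school $k$, $T_k$ is the school's treatment indicator, and $R_k$ is an indicator for the randomization block (region). The exact specification, including whether a different link function was used for binary characteristics, is not given in the text.

```latex
\begin{align}
X_{ijk} &= \gamma_{0} + \gamma_{1} T_{k} + \gamma_{2} R_{k}
           + v_{k} + u_{jk} + e_{ijk}, \\
v_{k} &\sim N(0, \tau^{2}_{\text{school}}), \quad
u_{jk} \sim N(0, \tau^{2}_{\text{teacher}}), \quad
e_{ijk} \sim N(0, \sigma^{2}).
\end{align}
```

Under this sketch, $\gamma_{1}$ is the adjusted treatment-control difference on the characteristic, and a value near zero is what balance would look like.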
Table 3. Balance of the Analytic Sample
Note. Because the randomization was blocked, the balance was calculated controlling for region. ELL = English language learner; IEP = individualized education plan; SES = socioeconomic status.
The effect sizes of all variables in Table 3 indicate balance between the treatment and control conditions. As an additional sensitivity check, the treatment effect was also estimated with all the demographic characteristics included (Models 3 and 4 in Table 4).
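As a rough illustration of what a balance effect size measures, the sketch below computes a simple standardized mean difference between two groups on one baseline characteristic. This is a simplification and an assumption on our part: the study estimated balance with a three-level HLM controlling for region, which this unadjusted calculation does not reproduce. The 0/1 indicators are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical 0/1 indicators (e.g., ELL status) for eight students per group.
treat_ell = [1, 0, 0, 1, 0, 0, 0, 1]
ctrl_ell = [0, 0, 1, 1, 0, 0, 0, 0]

smd = standardized_mean_difference(treat_ell, ctrl_ell)
```

Values near zero suggest that the groups look similar on that characteristic; larger absolute values suggest imbalance that may warrant covariate adjustment.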
Table 4. Treatment Effect of ML-PBL
Note. Treatment effect is the difference between the treatment and control group, measured in standard deviations. Standard errors are in parentheses, and 95% confidence intervals are in brackets. The additional covariates include SES, race and ethnicity, gender, ELL status, and IEP status. ELL = English language learner; IEP = individualized education plan; SES = socioeconomic status.
*p < 0.05; **p < 0.01; ***p < 0.001.
Originally, we planned to collect a measure of prior academic achievement, so student reading benchmark scores were collected from the schools. However, not every school submitted its students’ benchmark scores, and the schools did not all use the same benchmark tests. The scores were normed using national norming standards, but it could not be confirmed that the benchmarks were of similar difficulty. Thus, the norming procedure could not guarantee that students at the 80th percentile on different tests were comparable. Additionally, there was an imbalance between the treatment and control groups in the type of test administered. Because using the normed benchmark scores could therefore bias the estimated treatment effect, and because an RCT with low differential attrition gives strong evidence for causal inference in and of itself (see U.S. Department of Education et al., 2020), the normed pretest was removed as a measure. For those interested, results of the treatment effect after controlling for the reading benchmark scores can be found in the technical report (Krajcik et al., 2021).
Teacher Instruments and Measures
In addition to student-level data, teacher-level data were collected, including entry and exit surveys, observations, and enactment surveys. The entry survey, given to control and treatment teachers, included questions regarding each teacher’s familiarity with the NGSS and PBL and their experience with the recent PL they had received. The intent of this initial background survey was to confirm that both groups of teachers were only somewhat knowledgeable about the theoretical principles of the intervention and had received limited professional development on PBL. The exit survey was also given to the treatment and control teachers and included items regarding their experiences teaching science that year. The treatment exit survey included questions on teachers’ use of three-dimensional learning, challenges with teaching ML-PBL, integration of literacy and mathematics in science, efforts to foster student collaboration and engagement, cultural awareness and equity, and experience with how PL supported their instruction. It also asked about their perceptions of having met NGSS PEs and the quality of resources in their classrooms. The control exit survey focused on the teachers’ science content coverage, time spent on science instruction, science practices, exposure to NGSS and PBL professional development during the year, and the quality of their science resources.
Two observation schedules were designed, one for the treatment teachers and the other for the control teachers. The treatment observation protocol was not a checklist; instead, it highlighted the principles of PBL, and observers received specific directions to look for ML-PBL strategies used not only by the teacher but also by the students. Raters scored how well the teacher and the students were involved in using the DQ, figuring out phenomena, using discourse moves, and collaboratively building artifacts. Observers also scored teachers on providing opportunities for all students to participate in science and on maintaining good classroom management, and gave an overall score for using PBL principles. An abbreviated observation protocol was created for the control teachers to learn which science materials and instructional practices, including PBL, they were using in their classrooms.
Observers were recruited via recommendations from district science directors, teacher education professors, and the Michigan Science Teachers Association. Most of the observers were retired teachers who were familiar with the NGSS and PBL. They were trained with observation videos, and interrater reliability was calculated several times throughout the year.
Analytical Approach
To assess the difference between the treatment and control conditions in science achievement and to account for the clustering that occurred because schools were assigned to treatment, a three-level HLM was used (Bloom, 2005; Raudenbush, 1997; Raudenbush & Bryk, 2002). Because students were nested not only within schools but also within classrooms, we suspected that there would be classroom-level effects: students learn alongside one another, so student achievement would likely be correlated within a classroom and within a school. The three-level HLM allowed us to account for this dependence. Therefore, four different three-level models with students nested within classrooms within schools were estimated (for the random-effect variances from the three levels, see online Appendix F).
Models for Estimating the Main Effect of the Treatment
Models 1 and 2 (equations not reproduced in this text):
For the first two models, the treatment effect was estimated in a single-predictor multilevel model with randomization block fixed effects. Model 1 used the full sample of students; Model 2 used only the students in the focal classrooms, which were randomized before the beginning of the intervention. For the following two models, controls were included for student race and ethnicity, gender, and SES.
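For readers who want a concrete form, a three-level model of the kind described above can be sketched as follows. This is a generic specification in the conventional notation of Raudenbush and Bryk (2002), offered as an assumption and not as the authors' exact equations; the symbols (student i, classroom j, school k, treatment indicator Treat, block indicators Block, student covariates X) are illustrative.

```latex
\[
\begin{aligned}
\text{Level 1 (student):}\quad & Y_{ijk} = \pi_{0jk} + e_{ijk}, && e_{ijk} \sim N(0, \sigma^{2}) \\
\text{Level 2 (classroom):}\quad & \pi_{0jk} = \beta_{00k} + r_{0jk}, && r_{0jk} \sim N(0, \tau_{\pi}) \\
\text{Level 3 (school):}\quad & \beta_{00k} = \gamma_{000} + \gamma_{001}\,\mathrm{Treat}_{k}
  + \sum_{b} \delta_{b}\,\mathrm{Block}_{bk} + u_{00k}, && u_{00k} \sim N(0, \tau_{\beta})
\end{aligned}
\]
% With student-level controls, Level 1 would instead read
\[
Y_{ijk} = \pi_{0jk} + \sum_{p} \pi_{p}\,X_{pijk} + e_{ijk}
\]
```

In this sketch, the coefficient on the school-level treatment indicator is the treatment effect, and the block indicators play the role of the randomization block fixed effects.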
Models 3 and 4 (equations not reproduced in this text):
The ML-PBL design focused on contextualizing science to students’ lives so that it would work for all students, regardless of race and ethnicity, gender, or SES. Therefore, to determine whether student improvement differed across individual characteristics, cross-level interactions of individual race and ethnicity, gender, and SES with treatment at the school level were also estimated. (Additional covariates, such as ELL and IEP status, were also included.) For these interactions, the following models were used.
Gender heterogeneity (equations not reproduced in this text):
Similar models were used with the dummy variables for race and ethnicity, SES, and ELL and IEP status.
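A cross-level interaction of this kind can be sketched in the same conventional three-level notation. Again, this is an assumed generic form and not the authors' exact specification; Female is a hypothetical student-level dummy variable, and its slope is treated as fixed across classrooms and schools for simplicity.

```latex
\[
\begin{aligned}
\text{Level 1 (student):}\quad & Y_{ijk} = \pi_{0jk} + \pi_{1jk}\,\mathrm{Female}_{ijk} + e_{ijk} \\
\text{Level 2 (classroom):}\quad & \pi_{0jk} = \beta_{00k} + r_{0jk}, \qquad \pi_{1jk} = \beta_{10k} \\
\text{Level 3 (school):}\quad & \beta_{00k} = \gamma_{000} + \gamma_{001}\,\mathrm{Treat}_{k}
  + \sum_{b} \delta_{b}\,\mathrm{Block}_{bk} + u_{00k} \\
 & \beta_{10k} = \gamma_{100} + \gamma_{101}\,\mathrm{Treat}_{k}
\end{aligned}
\]
```

Under these assumptions, the coefficient on Treat in the last equation is the cross-level interaction: it estimates how much the treatment effect for the flagged group differs from the effect for the reference group.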
Given the nature of ML-PBL, which encourages students to work together, develop their own questions to answer, and create and revise their own models, we hypothesized that treatment students, when engaged in their science classes, would report higher levels of self-reflection, ownership, and collaboration than the controls. To examine whether treatment and control students differed on these constructs during science classes, two three-level HLMs were estimated for each of the three constructs. These were the same as Models 1 and 3 above for the treatment effect, but with the social and emotional latent constructs instead of the standardized science assessment as the outcome.
To understand the implementation of ML-PBL, the exit surveys from the treatment and control teachers were analyzed to examine differences in PBL activities, knowledge of PBL, and key aspects of three-dimensional learning. Differences between the treatment and control teachers on these items were tested with t tests.
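An item-level comparison of this kind can be sketched as below. The text does not say which variant of the t test was used, so Welch's unequal-variance form is an assumption here, and the survey responses are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


# Hypothetical responses to one exit survey item on a frequency scale.
treatment_item = [4, 5, 4, 3, 5, 4]
control_item = [3, 3, 4, 2, 3, 3]

t_stat, df = welch_t(treatment_item, control_item)
```

The statistic would then be compared with a t distribution with `df` degrees of freedom to obtain a p value.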
Results
Student Outcomes
Table 4 provides the results of our treatment effect for the student outcomes, showing the impact of the intervention on science achievement. Column 1 shows the treatment effect for student science achievement with only regional fixed effects. Column 2 is the estimation of the treatment effect with regional fixed effects and only students in the focal classrooms. Column 3 includes the regional fixed effects with additional individual-level covariates. Finally, the estimation of the treatment effect in Column 4 includes the regional fixed effects, only students in focal classrooms, and additional demographic covariates. Across all the estimations, the treatment effect remains statistically significant. Column 3, which includes the student individual-level covariates, shows that the treatment students outperformed the control students by 0.277 standard deviations on the summative science assessments. The corresponding standard error is 0.09, which is consistent with other experimental work on PBL curricula (see Harris et al., 2015; Schneider et al., 2022) but may also be high because of a lack of school-level covariates. However, adding school-level covariates only minimally improved the precision of the treatment effect (see preliminary results from the technical report; Krajcik et al., 2021). The 0.277 effect size corresponds to a Hedges’ g standardized mean difference of 0.289. According to Kraft (2020), this is considered a large effect size in education research. See online Appendix E for the full model estimation.
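To show how an estimate and its standard error translate into an interval, the sketch below computes a normal-approximation 95% confidence interval from the reported values of 0.277 and 0.09. The normal approximation is an assumption; the intervals reported in Table 4 may rest on a different reference distribution or on unrounded inputs, so the result need not match them exactly.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Two-sided confidence interval under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Reported Column 3 estimate and standard error, in standard deviation units.
lower, upper = normal_ci(0.277, 0.09)
```

Under this approximation the interval runs from about 0.10 to about 0.45 standard deviations, which excludes zero.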
The treatment on the treated effect (i.e., the effect of the treatment on those who took up the treatment) was also estimated by using an enactment survey (administered to teachers at the end of each unit) as a proxy indicator of treatment compliance (where teachers who enacted at least one unit were considered compliers). The estimated treatment on the treated effect was 0.379 standard deviations and statistically significant at the 0.01 level. The method and results are reported in online Appendix F.
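The general logic of a treatment on the treated estimate can be illustrated with the simple Bloom adjustment, which rescales an intent-to-treat effect by the difference in take-up between groups. This is an assumption for illustration only: the authors' actual method is described in online Appendix F and may differ, and the numbers below are hypothetical.

```python
def bloom_tot(itt_effect, treatment_takeup, control_takeup=0.0):
    """Rescale an intent-to-treat effect by the difference in take-up rates."""
    takeup_gap = treatment_takeup - control_takeup
    if takeup_gap <= 0:
        raise ValueError("take-up must be higher in the treatment group")
    return itt_effect / takeup_gap


# Hypothetical values: an ITT effect of 0.30 SD with 80% of assigned
# teachers enacting the treatment and no enactment among controls.
tot = bloom_tot(0.30, 0.80)
```

Because the take-up gap is at most 1, the rescaled estimate is at least as large in magnitude as the intent-to-treat effect.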
Tests of Heterogeneity
To test whether the treatment effect varied for subpopulations, several tests of heterogeneity, which examined interactions that account for differences by SES, gender, race and ethnicity, and ELL and IEP status, were conducted. The summary of the student-level heterogeneity is reported in Table 5.
Table 5. Summary of Student-Level Heterogeneity
Note. Coefficients are measured in standard deviations. Standard errors are in parentheses. ELL = English language learner; IEP = individualized education plan; SES = socioeconomic status.
*p < 0.05; **p < 0.01.
The only significant interactions in Table 5 are those of treatment with Asian students and treatment with ELLs. Aside from these variables, there is no evidence that the treatment effect differed by gender, race and ethnicity (other than Asian), SES, or IEP status. Asian students and ELLs show significant negative treatment effects. However, there is considerable overlap between the Asian and ELL students (70.5% of the Asian students [55 of the 78] were also ELLs). Therefore, as a sensitivity check, a three-way interaction of treatment × Asian × ELL was analyzed. With such a small number of Asian students, the sample was not sufficiently powered to detect a three-way interaction, although we suspect that language-related effects among the Asian students in the sample, many of whom were from immigrant families, contributed to this result. Because of the small sample sizes, the confidence intervals are wide, indicating that although there is no evidence of treatment heterogeneity in this study, there is also no evidence of treatment homogeneity. A more in-depth examination of this interaction is presented in online Appendix G.
SEL
For the SEL outcomes, a difference test on all the social and emotional items was conducted. First, a factor analysis was conducted to assess the validity of the three latent constructs: reflection, ownership, and collaboration (see online Appendix C). A three-level HLM was then estimated for each of the three constructs, first without covariates and then with region fixed effects and the additional individual covariates of race and ethnicity, SES, and gender. These estimates are reported in Table 6.
Table 6. Estimated Treatment Effect on Reflection, Ownership, and Collaboration
Note. Coefficients are in standard deviations. Standard errors are in parentheses.
*p < 0.05; **p < 0.01.
Across all three constructs, the treatment students were estimated to have higher factor scores than the control students: 0.544 higher in reflection, 0.434 higher in ownership, and 0.416 higher in collaboration. These correspond to Hedges’ g standardized mean differences of 0.569 for reflection, 0.441 for ownership, and 0.434 for collaboration.
Implementation of ML-PBL
Initially, we planned to obtain five observations for each treatment teacher and visit each control teacher twice. However, there were major issues in scheduling observers into the classrooms because of weather conditions, state spring testing, high costs of the recruitment and training of observers, and long distances between school sites. In the end, each treatment teacher was observed at least twice, although some were observed several more times. Only a few of the control teachers were observed once.
Seventy percent of the treatment and control teachers completed the exit survey. Results of the exit survey (see Table 7) indicate that ML-PBL was conducted in treatment classes but likely did not occur in the control condition. Although the control teachers were using some of these key practices, there were substantial and statistically significant differences between the two groups in knowledge of PBL, PBL activities, and key components of three-dimensional learning and scientific practices.
Table 7. Differences in Science Teacher Practices and Student Activities, by Treatment and Control Teachers
Note. SD = standard deviation.
For the teacher practice of helping students make sense of phenomena, the difference between the treatment and control groups was 0.39 points, significant at the 0.05 level, indicating that treatment teachers reported helping students more in making sense of phenomena. For the student practices of exploring questions, designing investigations, analyzing data, collaborating to build models, using evidence, constructing artifacts, debating ideas, choosing their own problems to solve, challenging each other’s ideas, using different ways to solve problems, and working in groups to solve phenomena, the difference between the treatment and control teachers varied from 0.29 to 0.82 points, significant at the 0.05 level. This indicates that, on average, the treatment teachers used these practices more frequently than did the control teachers. Finally, for the specific classroom activities of hands-on experiments and creating models, the treatment teachers reported scores 0.43 and 1.01 points higher, respectively, than the control teachers, indicating that the treatment teachers used these activities more frequently. Overall, this yields evidence that the treatment teachers were more often using PBL practices in their classrooms. (See online Appendix H for the difference tests for all teacher exit survey items.)
There is always a question of generalizability with respect to RCTs—specifically, in this case, as to whether it is possible to generalize the effect of the treatment to other classrooms in the United States. Recent work on estimating generalizability in RCTs has been developed (see Tipton, 2014; Tipton & Miller, 2021; Tipton & Olson, 2018). We conducted a generalizability estimate for this sample with positive results, suggesting that at the lower end (the most conservative estimate), we would expect to find these results for this population in more than two-thirds of the United States (see online Appendix I). However, it is indeed the case that such generalizable estimates cannot completely control for unobservables, and in a country as diverse as the United States, this estimate could be lower.
Discussion
ML-PBL provides a unique intervention that includes not only curriculum and lesson materials but also PL and embedded assessments designed to be used throughout the units. The curricular activity systems approach (Roschelle et al., 2010) of ML-PBL gives insights into why other elementary school science reform interventions have shown somewhat limited effects (Klager, 2017). Typically, science reforms do not have the scope and depth of ML-PBL, which is based on learning theory, PBL principles, and the three-dimensional learning described in the Framework (NRC, 2012); moreover, they are not usually conducted with an efficacy trial. ML-PBL results show a significant intervention treatment effect, and one that is higher than what has been reported in other science studies (see Harris et al., 2015; Lynch et al., 2012; Wilson et al., 2010). The main effect, which shows a significant difference on an objective science assessment between treatment and control students, is large (Kraft, 2020). What is particularly noteworthy is that this main effect holds for students of differing gender, race and ethnicity, and SES and across the major geographic regions of Michigan.
One way to interpret the ML-PBL main effect is to imagine a school district that standardizes science achievement measures on a 100-point scale. A student in the treatment group whose science test performance is at 50 points would be expected to improve by about 10 points. Because a percentile ranking is sensitive to the starting percentile one chooses, the treatment effect might also be thought of as raising an average student’s letter grade from a C to a B.
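The arithmetic behind this interpretation can be sketched by converting the 0.277 standard deviation effect into a percentile shift. The sketch assumes that test scores are normally distributed, which the text does not state; the function name `percentile_after` is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def percentile_after(start_percentile, effect_sd):
    """Percentile reached after shifting a score upward by effect_sd."""
    z_start = STD_NORMAL.inv_cdf(start_percentile / 100)
    return 100 * STD_NORMAL.cdf(z_start + effect_sd)


# Gain for a student starting at the median versus one starting higher up.
gain_from_50 = percentile_after(50, 0.277) - 50
gain_from_84 = percentile_after(84, 0.277) - 84
```

Under the normality assumption, a student starting at the 50th percentile gains roughly 11 percentile points, in line with the approximate 10-point improvement described above. A student starting at the 84th percentile gains fewer, which illustrates why the percentile gain depends on the starting point.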
Another positive treatment effect was found for SEL during science classes. The latent constructs used for this analysis reflect the components of the intervention practices, such as supporting students when they initiate asking questions and figuring out phenomena. Finally, collaboration, a key feature of ML-PBL, helps ensure that all students work with others as they pursue questions. It underscores the importance of allowing all students to participate in experiences that encourage equitable practices that support science learning.
Limitations
This work has several limitations. Unfortunately, there are no strong standardized science assessments for students in lower elementary grade levels. Designing a science pretest, training staff to administer it, and administering it at the beginning of the year would have been costly. Because many students are still early readers at the beginning of Grade 3, the test would have had to be given orally, which could be subject to tester bias. We relied on the What Works Clearinghouse Standards, which suggest that if there is low attrition between the treatment and control groups, a pretest is not critical (U.S. Department of Education et al., 2020). However, regardless of this recommendation, we maintain the need for a high-quality science pretest in the early grades to estimate intervention effects, especially given the likelihood of unobservables.
We would also have preferred to have a larger number of teacher observations in the treatment and control conditions throughout the school year, but this was not possible because of cost constraints and observer and teacher availability. There is always the question as to whether the control teachers were teaching various aspects of the treatment and whether the treatment teachers were implementing all components of the intervention. This becomes more important in isolating aspects of the intervention that may be more related to science learning than others. More direct observations addressing these concerns are important for measuring the fidelity of implementation among the teachers.
Several questions remain regarding the sustainability of ML-PBL on students’ and teachers’ practices. Originally, we planned to deliver the intervention to the control teachers to compare the results of those students with the current treatment sample. However, due to the COVID-19 pandemic, we were unable to determine differences in treatment conditions in a cohort of teachers and students from the subsequent year; this prevented us from measuring sustainable treatment effects on the treated teachers and students. Another unanswered question—one that has become more prevalent in the literature—is whether the positive impact of treatment would also affect students’ science learning in the next grade or “fade out” because of variations in support of ML-PBL at the instructional or school level. Having more information on the longer-term impact of ML-PBL on teacher practices would also be a window into what we could expect from this intervention and its PL experiences in the future.
With respect to SEL, we would have preferred more in situ measures of student engagement. This remains of deep interest to our team, and we are currently experimenting with new techniques for obtaining in-the-moment social and emotional repeated responses of young children. Moreover, we are currently obtaining social and emotional responses of students during key ML-PBL experiences, such as using the DQ, writing explanations, and investigating phenomena, including developing various models that represent students’ understanding of how phenomena occur. With respect to the results of our impact on different subpopulations, we were hindered from a more comprehensive examination because of the small proportions of students in various demographic groups. For example, in our sample, the majority of Asian students were ELLs, which may have made it challenging for them to ask their own questions orally, write explanations, and participate in group discussions. This is a very important problem, as it is difficult to detect treatment effects for students of various racial and ethnic backgrounds and social classes, and, unfortunately, these are the populations that typically do not receive science instruction in the early grades. Although this was a limitation of our study, we strongly suggest that future studies oversample students from diverse backgrounds, using actual (not simulated) groups.
Implications for Practice
The design of the ML-PBL intervention addresses the need for high-quality instructional and learning materials, teacher PL, and assessments for young children based on the Framework (NRC, 2012), the NGSS (NGSS Lead States, 2013), and principles of PBL (Krajcik & Shin, 2014). The expectation was that such an intervention would improve science academic, social, and emotional learning and transform teacher practices. The design of this intervention culminated in the study’s system design and includes several important innovations (see Blumenfeld et al., 1991; Fortus & Krajcik, 2012; Krajcik & Schneider, 2021; Krajcik & Shin, 2014).
First and foremost, all the units have carefully designed lessons in which teachers learn what evidence to “look for” to determine whether the students meet the lesson-level performance learning goals. For example, teachers in one of the Toys Unit lessons are encouraged to use observations of small-group discussions to obtain evidence for how well students can predict the amount of friction that can act on a moving car. In accessible language, the materials direct teachers to actions that demonstrate students’ understanding of scientific phenomena through discourse moves (e.g., talk and turn, revoicing), activities, and collaborative interactions. Second, the units lead teachers to take advantage of the environment outside the classroom, which opens a new laboratory of learning activities that support experiencing phenomena, enhancing observation skills, constructing explanations, and working in teams. Third, experiences were designed that actively involve families in science learning at home and in the communities where they live. And finally, there are specific directions for teachers to help them recognize and support cultural differences and encourage flexibility to incorporate them into lesson activities.
At a more general level, this intervention differs from current and previous work by focusing on and providing a clear articulation of three-dimensional learning, with detailed lesson plans, assessments, and teacher learning opportunities (see Krajcik & Schneider, 2021). The principles of PBL were further refined and modified so that they specifically incorporated the tenets of the science reform movement, which was designed to have students take on new scientific skills and practices. The intervention gave teachers the tools necessary to transform their teaching and bring experiential learning into elementary classrooms, support student curiosity about their world, and promote equitable science academic, social, and emotional learning. Importantly, our PBL units are the curriculum that students experience and not a “project” tacked on to the end of units.
Unquestionably, science education needs more attention starting in earlier grades so that students can develop early interest, build a foundation for more in-depth science learning, and acquire skills to help them begin to think critically (see NRC, 2021). Many of the major challenges facing the world today require deep, usable scientific knowledge, not just for those going into science but for all individuals across the globe. The results reported here illustrate that ML-PBL, by making use of the principles of PBL and three-dimensional learning, has the potential to bring about a transformation in the way science is taught and learned in elementary school. The intervention’s effects hold promise for instructional models and experiences that can influence the science academic, social, and emotional learning of young children at a time when science has never been more important for their futures and that of our society.
Supplemental Material
sj-pdf-1-aer-10.3102_00028312221129247 – Supplemental material for Assessing the Effect of Project-Based Learning on Science Learning in Elementary Schools
Supplemental material, sj-pdf-1-aer-10.3102_00028312221129247 for Assessing the Effect of Project-Based Learning on Science Learning in Elementary Schools by Joseph Krajcik, Barbara Schneider, Emily Adah Miller, I-Chien Chen, Lydia Bradford, Quinton Baker, Kayla Bartz, Cory Miller, Tingting Li, Susan Codere and Deborah Peek-Brown in American Educational Research Journal
