Abstract
In this article, we examine classroom observations from a 3-year large-scale randomized trial in the Los Angeles Unified School District (LAUSD) to investigate the extent to which a professional development initiative in inquiry science influenced teaching practices in fourth- and fifth-grade classrooms in 73 schools. During the course of the study, LAUSD introduced an additional districtwide scientific inquiry professional development initiative, which complicates the experimental analysis but allows us to conduct a quasiexperimental analysis of the second initiative. Multilevel models predicting the presence of science inquiry in observed classroom lessons show that both interventions increased the incidence of inquiry-based science teaching, but the impact was limited to selected features of the inquiry process. We also found that the experimental impacts on teaching practice correspond with the features of scientific inquiry to which the teachers were most frequently exposed during the professional development.
Major changes in teaching practice will require substantial learning and guidance (Ball & Cohen, 1999; Borko, 2004; Wilson & Berne, 1999), and although teachers learn in many ways, professional development promotes systemic and coordinated teacher learning, which in turn may influence student achievement (Gamoran, Secada, & Marrett, 2000; Borko, 2004; Desimone, 2009; Garet, Porter, Desimone, Birman, & Yoon, 2001). Support for teacher learning is especially necessary when promoting challenging and complex pedagogies, such as scientific inquiry (Crawford, 2007; Davis, Petish, & Smithey, 2006). Professional development can take many forms, including mentoring, coaching, and lesson study, but the standard approach in the United States remains training in the form of in-service workshops, on which billions of dollars are spent annually (Birman et al., 2007).
Rigorous evidence of the impact of professional development in typical, real-world settings is limited (Rice, 2009; Wayne et al., 2008), and several recent randomized field trials of professional development workshops have yielded no evidence of an impact on student achievement (e.g., Heller, 2012; Newman et al., 2012) or, in some cases, a negative impact (e.g., Borman, Gamoran, & Bowdon, 2008; Pane, McCaffrey, Slaughter, Steele, & Ikemoto, 2010). Given these findings, it is imperative to understand why these interventions are not producing the expected results. To do so, we need studies that investigate how professional development influences teaching practice, one of the ways in which professional development influences student achievement (Desimone, 2009). When possible, these studies should include multiple professional development programs in multiple sites (Borko, 2004).
In this article, we investigate how professional development changes teaching practice by examining the effect of two professional development initiatives designed to increase the prevalence of scientific inquiry instruction in 80 elementary schools in the Los Angeles Unified School District (LAUSD). The first initiative in “Science Immersion” was evaluated experimentally: Forty of the schools were randomly assigned the opportunity to send two teachers a year to an intensive summer training workshop. This professional development initiative negatively affected student achievement during the 1st year of the study (Borman et al., 2008), and during the 2nd year of the study, there was no difference in student achievement between the treatment and control schools (Bowdon, Borman, & Gamoran, 2009). Between the 1st and 2nd years of the experiment, LAUSD increased its commitment to scientific inquiry by adopting the Full Option Science System (FOSS) curriculum and providing at least 1 day of professional development for all teachers in the district. We derive quasiexperimental estimates of the impact of the districtwide FOSS initiative by using the 1st-year classroom observations in the experimental control schools as a preintervention measure for the FOSS professional development initiative. We use data from observations of classroom instruction and of the Science Immersion professional development workshops to answer the following research questions:
1. Did either professional development initiative change science instruction in elementary schools?
2. How did science instruction change?
3. How does the content of the professional development relate to the changes in teachers’ observed science instruction?
We demonstrate that although Science Immersion and FOSS both resulted in changes in teaching practice, neither produced the full-scale implementation of the inquiry cycle hypothesized to yield major improvements in scientific understanding (NRC, 2000). Instead, both interventions produced changes in the instructional use of asking and responding to scientific questions through investigation. The full vision of scientific inquiry that includes synthesizing and communicating scientific concepts was not achieved consistently under either professional development regime. However, we did observe that the experimental impacts on teaching practice correspond with the features of scientific inquiry to which the teachers were most frequently exposed during their professional development workshops.
Scientific Inquiry: A New Vision of Science Instruction
Science teaching at the elementary level offers an opportunity to investigate how teachers respond to ambitious instructional reform initiatives. Most elementary science instruction is traditional (Kennedy, 1998; Loucks-Horsley et al., 2009), but an alternative vision, scientific inquiry, has been consistently promoted for the past 20 years (AAAS, 1993; NRC, 1996, 2000, 2012). In traditional science instruction, science is conceived primarily as a body of content and a set of procedures, and the goal of learning is to acquire the content and master the procedures. In these lessons, teachers present a scientific fact or principle to students, which may be accompanied by a confirmatory laboratory exercise demonstrating the fact or principle. Traditional science instruction has failed to produce uniformly high levels of scientific literacy (Romberg, Carpenter, & Dremock, 2005). In contrast, scientific inquiry allows students to conduct investigations to test questions about the natural world and then use the evidence they collect during their investigations to articulate an explanation in terms of scientific concepts and principles. These investigations are not purely exploratory; inquiry with minimal guidance from teachers does not lead to learning (Kirschner, Sweller, & Clark, 2006). Rather, scientific inquiry resembles the work of practicing scientists, with teachers serving as guides, ready to respond to student questions as they emerge from the investigations. According to the National Research Council (1996),
scientific inquiry refers to the diverse ways in which scientists study the natural world and propose explanations based on the evidence derived from their work. Inquiry also refers to the activities of students in which they develop knowledge and understanding of scientific ideas, as well as an understanding of how scientists study the natural world. (p. 23)
The NRC elaborated on this definition by presenting five “essential features of classroom inquiry” (NRC, 2000, Table 2-5). At the time of this study, the latest vision of K–12 science instruction—the National Science Education Standards (NSES)—consisted of the following five features:
Feature 1: Learner engages in scientifically oriented questions
Feature 2: Learner gives priority to evidence in responding to questions
Feature 3: Learner formulates explanations from evidence
Feature 4: Learner connects explanations to scientific knowledge
Feature 5: Learner communicates and justifies explanations
The features are intended to embody scientific inquiry so that students can actively experience it. Although the features are numbered, they are not intended to be executed in order. Rather, scientific inquiry is viewed as a cycle, with different features following one another. The NRC recently reframed the K–12 science standards to emphasize, among other things, eight “practices” rather than five “features” of scientific inquiry (NRC, 2012), but regardless of which version of the NSES one follows, scientific inquiry—asking questions, gathering and interpreting evidence, and communicating explanations—remains central. 1
A number of developers created curricula aligned with the NSES to engage students with scientific inquiry. In California, these curricula cover three elementary science content areas: life science, earth science, and physical science. The System-Wide Change for All Learners and Educators (SCALE) partnership developed a Science Immersion (hereafter Immersion) curriculum at the University of Wisconsin-Madison in partnership with the University of Pittsburgh and LAUSD (Schunn, Millar, Lauffer, & SCALE Immersion Team, 2005). The Immersion curriculum for fourth grade is a unit called “Rot it Right” that covers the life science standards on the transfer of matter and energy through food chains, the living and nonliving components of ecosystems, and the role of microorganisms in ecosystems. It consists of thirteen 45-minute lessons, or about 10 hours of instructional time. The fifth-grade Immersion curriculum is a unit called “Weather Forces and Prediction” that covers the earth science standards on the role of convection currents, the ocean, and the water cycle in weather patterns and severe weather events. The unit consists of twenty-one 50-minute lessons, or about 17 hours of instructional time.
The FOSS project at the Lawrence Hall of Science, University of California-Berkeley, developed kit-based science units for use in elementary schools (FOSS, 2009). FOSS is a complete curriculum covering all three elementary science content areas; the comparable kits to the Immersion units are “Environments” (fourth grade) and “Water Planet” (fifth grade). Because both curricula are aligned with the NSES, classroom observation protocols that identify the essential features of inquiry can be used to document the implementation of both Immersion and FOSS. Moreover, to the extent that average teaching practices in LAUSD are traditional, we can attribute differences in the observed features of inquiry to professional development in these curricula.
Professional Development as a Reform Strategy
We adopt the “nested layers” model of school organization in which inputs at one level (e.g., classroom instruction) lead to outputs at another (e.g., student achievement; Gamoran et al., 2000; Barr & Dreeben, 1983). Put another way, teaching practices mediate the relationship between professional development initiatives and student learning (Desimone, 2009).
There is little convincing evidence, however, that professional development initiatives influence student achievement. The “loose coupling” view of school organization suggests, in fact, that classrooms are insulated from external influences on the core technology of teaching (Meyer & Rowan, 1978; Weick, 1976), and there is mixed evidence that professional development changes teaching practice. Prior research has documented a positive association between professional development and teaching practice, but most of the existing evidence relies on teacher self-reports, such as using large surveys (Desimone, Porter, Garet, Yoon, & Birman, 2002; Garet et al., 2001; Supovitz & Turner, 2000) or teachers’ daily instructional logs (Correnti, 2007). Many of the studies of science teaching used pre- and postintervention data without the benefit of a comparison group (e.g., Akerson & Hanusci, 2006; Jeanpierre, Oberhauser, & Freeman, 2005; Oliveira, 2010) or have used quasiexperimental designs (Kelly, 2011; Kelly, Rickles, Sass, Ullah, & Foster, 2008; Lee, Deaktor, Hart, Cuevas, & Enders, 2005). We draw from one of very few large-scale, experimental studies of the effects of professional development on directly observed teaching practice.
Recent large-scale experimental evaluations of the potential impact of professional development workshops present a complicated picture. On the whole, professional development in real-world settings seems to have limited impact on student achievement. Researchers have recently begun to investigate teaching practice as well, although often using teacher self-report or simple measures of time use. An evaluation of professional development for a scientific inquiry curriculum in Alabama elementary and middle schools, for example, found that teachers in the intervention schools reported spending more time using “active learning strategies” but that students in these schools performed equivalently on achievement tests (Newman et al., 2012, pp. xxiv–xxvi). Rather than relying on teacher reports of how much time students are actively learning, we investigate teaching practice with data collected by classroom observers using a multidimensional rubric.
A recent evaluation of a science professional development initiative in California and Arizona found that teachers in the intervention schools performed better on a content knowledge assessment and reported being more confident teaching science, but student performance was not different in the two groups of schools (Heller, 2012). An evaluation of a mathematics initiative in sixth-grade classrooms in the mid-Atlantic region of the United States found no impact on student achievement but used both teacher self-report and classroom observations to measure implementation (Martin, Brasiel, Turner, & Wise, 2012). The classroom observations showed that teachers in the intervention schools placed more responsibility on students for learning and complex thinking and that the students showed evidence of responsibility for learning when working in groups, but there was no difference in student responsibility for learning when in class discussion, and there was no significant difference in the extent to which teachers made connections for students (Martin et al., 2012, Table 3.4).
An evaluation of a learner-centered geometry curriculum deployed in eight Baltimore high schools yielded a statistically significant impact on student achievement: Like the 1st-year results of the study from which our data are drawn, the impact was negative (Pane et al., 2010). Observations and interviews conducted by the researchers revealed that the teachers in the study found the learner-centered pedagogy difficult to implement. Difficulty of implementation is a critical problem, since incomplete implementation may harm students if it leaves them without sufficient guidance to navigate the learning process (Kirschner et al., 2006). Teachers need guidance as well, both from curriculum materials and from professional development facilitators. The Immersion and FOSS curricula and accompanying professional development differ in their design, and these differences may yield different impacts on the individual features of scientific inquiry. Moreover, observing the Immersion professional development workshops may inform how the workshops influenced teaching practice.
Method
Research Design
We use data from the System-Wide Change (SWC) study, a school-based randomized field trial funded by the National Science Foundation of professional development in the Immersion curriculum developed by SCALE. The study took place in the LAUSD, one of the country’s poorest-performing school districts in elementary science (Lutkus, Lauko, & Brockway, 2006). A study timeline is presented in Table 1.
Timeline of LAUSD Initiatives, Scientific Inquiry Professional Development, and Data Collection
Note. LAUSD = Los Angeles Unified School District; FOSS = Full Option Science System.
LAUSD had a long-standing commitment to scientific inquiry at the elementary level. Since providing curricular resources alone is not sufficient to change teaching practice (Gamoran, Anderson, Quiroz, Secada, Williams, & Ashmann, 2003), LAUSD provided professional development to accompany these science curricula, including Immersion and FOSS. This professional development consisted of single-day workshops in Immersion or FOSS for interested teachers and more extensive training (about 3 days per year) for the science lead teachers (SLTs). A 2-day workshop on “Rot it Right” was also made available to interested teachers. Because of limited resources and a strategic decision to train as many teachers as possible, these professional development workshops were brief and did not conform with the sustained, active learning called for by the “best practices” in professional development (Garet et al., 2001; Supovitz & Turner, 2000).
In early 2006, LAUSD approved a 3-year school-based randomized field trial of a 5-day professional development workshop known as an “Immersion institute.” The Immersion institute went beyond the typical LAUSD professional development initiative by offering a reflective experience for an entire week. The institutes were facilitated by teams of education and natural science faculty from partner universities, LAUSD central and local district science instructional leaders, and LAUSD science teachers. The facilitation teams explicitly engaged the teachers in the lessons of the Immersion unit from a student perspective, and then reflectively as a practitioner, in a process deemed the Science Immersion Model of Professional Learning (SIMPL; Lauffer & Lauffer, 2009). A key component of this model is that teachers were to experience the Immersion unit authentically—that is, as students—and then reflect on the experience as teachers who will deliver the lessons (Lauffer, 2010). Participating teachers were invited to attend follow-up sessions during the school year in which they could reflect on their pedagogy and discuss student work, but in practice few teachers attended these sessions.
The eight LAUSD local district superintendents nominated 190 schools they considered to be at least “minimally prepared” to undertake the initiative, and we selected a stratified random sample of 10 schools from each local district, yielding a study sample of 80 schools. Five schools in each local district (40 total) were randomly assigned to the experimental condition: They were encouraged to send two teachers per year—an SLT and a grade-level colleague—to Immersion institutes, starting in summer 2006. Fourth-grade teachers were invited in summers 2006 and 2007; fifth-grade teachers were invited in summers 2007 and 2008.
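The sampling and assignment steps described above can be sketched in a few lines. This is a minimal illustration, not the study's actual procedure: the school identifiers, the nomination counts per local district (invented so that they total 190), the seed, and the function name `sample_and_assign` are all hypothetical.

```python
import random

def sample_and_assign(nominated, per_district=10, treated_per_district=5, seed=0):
    """Stratified sample of schools, then blocked random assignment.

    nominated: dict mapping a local district to its list of nominated school ids.
    Returns a dict mapping each sampled school id to "immersion" or "comparison".
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    assignment = {}
    for district in sorted(nominated):
        # Stratified sample: draw the same number of schools from every district.
        chosen = rng.sample(nominated[district], per_district)
        # Blocked assignment: shuffle within the district, treat the first half.
        rng.shuffle(chosen)
        for k, school in enumerate(chosen):
            condition = "immersion" if k < treated_per_district else "comparison"
            assignment[school] = condition
    return assignment

# Hypothetical nomination lists: 8 districts, sizes invented to total 190 schools.
sizes = [24, 24, 24, 24, 24, 24, 23, 23]
nominated = {
    f"LD{d + 1}": [f"LD{d + 1}-S{s:02d}" for s in range(n)]
    for d, n in enumerate(sizes)
}
assignment = sample_and_assign(nominated)
```

Randomizing within each local district is what makes the districts usable as blocking indicators in a later analysis, because every district contributes the same number of schools to each condition.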
Forty-five teachers from 27 schools attended the five Immersion institutes offered to fourth-grade teachers in 2006, 32 teachers from 20 schools attended the four Immersion institutes offered to fourth- and fifth-grade teachers in 2007, and 19 teachers from 14 schools attended the two institutes offered to fifth-grade teachers in 2008. Project staff observed 9 of the 11 institutes and coded the observations with the same instrument that we employed in classrooms. We present data from the six Immersion institutes for which at least 4 of the 5 days were observed (three in fourth grade and three in fifth grade) to answer our third research question: to illustrate the relation between what teachers experienced in professional development and the changes in teaching practice.
The SWC study collected multiple forms of data: student achievement; teacher surveys; teacher, principal, and district staff interviews; and classroom and professional development observations. We use school average student achievement as a covariate to analyze the classroom observation data. District data on prior achievement and student demographics revealed no statistically significant differences between the two experimental groups of schools at the outset of the study (Borman et al., 2008). In the 1st year, fourth-grade students in the 40 Immersion schools performed approximately one quarter of a standard deviation lower in fourth-grade life science than students in the 40 comparison schools on districtwide standardized assessments (Borman et al., 2008). Moreover, students of SLTs—who were targeted for the Immersion institutes—performed approximately one half of a standard deviation lower in the 1st year of the study than students of SLTs in comparison schools. No differences in fourth- or fifth-grade student achievement on district and state standardized assessments were apparent in the 2nd and 3rd years of implementation (Bowdon et al., 2009; Borman, Gamoran & Bowdon, 2010). We believe that the professional development and classroom observation data can inform these initially negative, then null, student achievement findings by documenting the extent to which teaching practice reflects the intended vision of scientific inquiry embodied in the NSES.
The Interventions: Two Professional Development Initiatives in Scientific Inquiry
In July 2007, after the classroom observers had been in the field for a year, LAUSD adopted FOSS districtwide in kindergarten through fifth grade (LAUSD, 2007a, Section I) following the California State Board of Education’s inclusion of FOSS as an adopted elementary science curriculum (California Department of Education, 2007). As with the Immersion initiative, LAUSD did not simply make FOSS available; it undertook a series of three 1-day professional development workshops, 1 day per science content area. SLTs were required to attend all three workshops, and every fourth- and fifth-grade teacher was told to attend a single workshop in one of the three content areas (LAUSD, 2007b, Section 36). Interviews with teachers revealed that these brief workshops emphasized use, maintenance, and coordination of the FOSS kits rather than providing teachers with an authentic learning experience as the Immersion institutes attempted to do. Fourth- and fifth-grade teachers who wished to continue using Immersion were welcome to do so as long as they supplemented their instruction with the FOSS units covering the science content areas that Immersion did not cover. Fifth-grade teachers were presented with a “blended” FOSS-Immersion curriculum as another option (LAUSD, 2007b, Sections 15, 16). This blended curriculum was introduced during the institutes in a sequence of twenty-five 45-minute lessons (nearly 19 hours of instruction), adding FOSS lessons on air pressure and processes of the water cycle (evaporation, condensation, deposition, distribution) to the Immersion weather unit’s lesson sequence.
The attributes of the two professional development initiatives—Immersion and FOSS—are compared in Table 2. They share a number of similarities but differ substantially in their design and deployment. Both curricula are aligned with the NSES and the California science content area standards and aim to engage students in scientific inquiry (FOSS, 2002, 2007; Schunn et al., 2005). This alignment is the primary justification for using the same observation protocol for classrooms taught by teachers trained in either or both interventions. Both professional development initiatives took the form of institutes or workshops, but the Immersion institutes were 2 to 5 days long (with most participating teachers attending the 5-day version), and the FOSS trainings were 1 to 3 days long (with most participating teachers attending the 1-day version). All teachers attended the brief FOSS trainings, however, and only two teachers per year from each treatment school could attend the sustained Immersion institutes because of financial constraints.
Characteristics of Science Immersion and Full Option Science System (FOSS) Initiatives
Note. LAUSD = Los Angeles Unified School District.
Classroom Observation Summary Statistics
Note. Pr(I– C ≠ 0) = probability of a two-sample t test that the Immersion and comparison schools (and lessons in those schools) have equal means.
Immersion addressed one science content area in each grade (life science in fourth grade and earth science in fifth grade), whereas the FOSS kits were “wall to wall,” covering all three content areas in each grade (one kit per content area). Immersion required teachers to facilitate students’ open questioning (open inquiry), whereas the FOSS curriculum was more structured (guided inquiry). Immersion required the classroom teachers to collect and prepare instructional materials on their own, whereas FOSS consisted of prefabricated “FOSS kits.” The kits required ongoing maintenance to remain useful, however, which may diminish this distinction over time. Moreover, since each grade had only three kits, the grade-level classrooms had to coordinate their instruction so they were not teaching the same content areas at the same time. Immersion imposed no such constraint: teachers could use it simultaneously. Finally, the two initiatives were deployed in different ways: The Immersion institutes were assigned experimentally to schools and attendance was optional for two teachers per target grade per year, whereas the shorter FOSS training sessions were mandatory for all elementary school teachers. Consequently, the contrast in the Immersion experiment is between randomized conditions, whereas the contrast in the FOSS quasiexperiment is a pre- and postintervention analysis using the 1st year of the classroom observation data of fourth-grade classrooms as a baseline measure.
Data Collection and Measures
Classroom observations: Sampling
Immersion institute and classroom observation data were collected for 3 consecutive years. Fourth-grade classrooms were observed during the 1st and 2nd years of the study, and fifth-grade classrooms were observed during the 3rd year of the study. Each year, observers asked to sit in on three science lessons taught by the SLT and three taught by a randomly selected colleague in each of the 80 schools. Not all schools complied with the request every year, but in the 1st year, nearly two thirds of the schools were sampled (52 of 80), and by the 3rd year, more than three quarters of the schools were sampled (63 of 80). 2 Across all 3 years, classrooms from 75 of the 80 study schools were observed. Most teachers who agreed to participate did so fully, with 196 of 274 teachers (72%) represented by at least three classroom observations. Scores on LAUSD’s life science “periodic assessment” that we use as a pretest covariate were not available for 2 schools, so our analytic sample includes at least 1 year of observations from 73 of the 80 schools in the SWC study. Thirty-six of the 73 schools (49%) were in the treatment group, which suggests to us that there was no differential attrition. The two groups are equivalent on the life science pretest, are balanced across years, and are equally likely to have been taught by an SLT. Overall, the analytic sample includes 711 classroom observations, 357 from Immersion schools and 354 from comparison schools.
The science lessons in our data are not randomly selected from the school year. Rather, they were purposefully selected to cover the science standards that the Immersion units address. That is, the lessons were sampled so that the observers were more likely to witness scientific inquiry than randomly selected lessons would reveal. Furthermore, the observers spread their observations out over time to represent the range of activities related to the California science standards covered by the Immersion unit.
Observers also attended six of the seven fourth-grade Immersion institutes and three of the four fifth-grade Immersion institutes offered during the course of the study. Some of the observations were used for training purposes. We coded the six observed Immersion institutes that a trained observer attended for at least 4 of the 5 days. The unit of analysis for both classrooms and professional development institutes was a “day” of observation, which averaged 51.6 minutes (10.3 five-minute segments) for the classroom lessons and more than 5 hours (21.0 fifteen-minute segments) for the professional development sessions. The observations from the Immersion schools were 3 minutes longer than those from the comparison schools. Because the observation length is different in the two conditions and because longer observations are more likely to yield evidence of scientific inquiry, we include observation length in our statistical models. The trained observers recorded narrative notes in 5-minute segments for classroom observations and 15-minute segments for professional development sessions.
Classroom observations: Coding the outcome variable
The narrative notes for both the classroom and professional development observations were subsequently coded to identify the use of the essential features of scientific inquiry. For their training, a team of five raters spent 50 hours iteratively coding sample observations until they came to a consensus on which essential features of inquiry were present. The consensus among the raters was not perfect, but an analysis of observations conducted by 15 raters on a videotaped sixth-grade lesson revealed that, on average, 85% of raters agreed with the modal inquiry feature in the 5-minute segment (Osthoff & Ferrare, 2007). 3 The activities described in a 5-minute segment could be coded as more than one feature of inquiry; for example, if students gave priority to evidence (Feature 2) and formulated an explanation from that evidence (Feature 3) in a 5-minute period, the segment would count for both features. 4
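The agreement statistic reported above, the average share of raters who match the modal code in each segment, can be illustrated with a short sketch. It simplifies the actual coding scheme by assuming each rater assigns exactly one code per segment (the protocol allowed more than one feature per segment); the function name `mean_modal_agreement` and the example codes are invented for illustration.

```python
from collections import Counter

def mean_modal_agreement(codes_by_segment):
    """Average, across segments, of the share of raters matching the modal code.

    codes_by_segment: list of lists; each inner list holds one code per rater.
    """
    shares = []
    for codes in codes_by_segment:
        # most_common(1) returns [(modal_code, number_of_raters_using_it)].
        _, modal_count = Counter(codes).most_common(1)[0]
        shares.append(modal_count / len(codes))
    return sum(shares) / len(shares)

# Invented codes from 5 hypothetical raters for 3 segments (0 = no inquiry feature).
example = [
    [1, 1, 1, 1, 2],  # 4 of 5 raters match the mode
    [2, 2, 2, 2, 2],  # 5 of 5
    [0, 3, 3, 3, 0],  # 3 of 5
]
agreement = mean_modal_agreement(example)
```

In this invented example the three segment shares are 0.8, 1.0, and 0.6, so the mean agreement is 0.8.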
This approach produced a complicated outcome variable that we simplify for analysis. Although the data produce counts of 5-minute segments with evidence of inquiry in each lesson (e.g., four segments of Feature 1, two segments of Feature 2, etc.), we reduce the count data to a binary indicator of whether the feature of inquiry was documented in the entire lesson (e.g., Feature 1 was observed, Feature 2 was observed, Feature 3 was not observed, etc.). This simplifies the analysis from segments of lessons to entire lessons. 5 We examine the features separately to investigate how science teaching changed (our second research question), and we examine them together (in a single collapsed “any feature” variable) to examine whether professional development changed teaching practice (our first research question).
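The reduction from segment counts to lesson-level indicators can be sketched as follows. The dictionary layout and the function name `lesson_indicators` are assumptions made for illustration; the counts are the ones used as an example in the text.

```python
def lesson_indicators(segment_counts):
    """Collapse per-feature segment counts into binary lesson-level indicators.

    segment_counts: dict mapping feature number (1-5) to the number of
    5-minute segments in the lesson that were coded with that feature.
    """
    # A feature counts as observed if it appears in at least one segment.
    observed = {f: int(segment_counts.get(f, 0) > 0) for f in range(1, 6)}
    # The collapsed "any feature" variable is 1 if any single feature is 1.
    observed["any"] = int(any(observed[f] for f in range(1, 6)))
    return observed

# Four segments of Feature 1, two of Feature 2, none of Features 3 to 5.
lesson = {1: 4, 2: 2, 3: 0, 4: 0, 5: 0}
indicators = lesson_indicators(lesson)
```

The per-feature indicators correspond to the outcomes used for the second research question, and the `"any"` entry corresponds to the collapsed outcome used for the first.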
As can be seen in the summary statistics reported in Table 4, some form of scientific inquiry was evident in most of the observed lessons. Given that the lessons were purposefully sampled to reveal scientific inquiry and that any instance of inquiry at any time in the lesson is a relatively low standard, this is not surprising. Not all of the inquiry features were equally represented, however. Eighty-six percent of all observed lessons had at least one 5-minute segment with at least one feature of inquiry. Features 1 and 2 were observed most frequently (62% and 61% of lessons, respectively), followed by Feature 3 (41% of lessons), whereas Features 4 and 5 were rarely observed (15% and 9% of lessons, respectively). Even though the features were intended to be iterative and not follow a sequence from Feature 1 to Feature 5, we might expect the initial stages of the inquiry cycle to be more prevalent. That being said, we interpret the findings for Features 4 and 5 as prima facie evidence that the “full cycle” of inquiry did not take place.
Proportion of Lessons With Evidence of Scientific Inquiry by Year and Experimental Condition
Table 4 also breaks down the evidence of scientific inquiry by experimental condition and year of the study. Three trends are evident. First, the features of inquiry were not unique to the schools that were randomly assigned the opportunity to attend the Immersion institutes: In the 1st year of the study, some form of inquiry was apparent in 87% of the lessons from the Immersion schools, but 67% of lessons from the comparison schools also showed some evidence of scientific inquiry. Second, there is an apparent difference between the 1st year of the study and the subsequent years of the study after LAUSD undertook the FOSS initiative: In the 2nd year, some form of inquiry was observed in 87% of the lessons from comparison schools. Third, the experimental and longitudinal contrasts differ by feature. Feature 2, for example, is more prevalent in the Immersion schools and remains stable over time. Feature 1, on the other hand, is higher in the Immersion schools in Year 1 and increases in all schools over time. Our statistical model will focus on these observed differences between the Immersion and comparison schools over time.
Statistical methods
To answer our first two research questions (Did either professional development initiative change science instruction in elementary schools? How did science instruction change?), we use multilevel logistic regression models. These models account for the clustered nature of our data; we wish to estimate the impact of an intervention assigned at the school level, but our data are lessons within schools. The model is shown in Equation 1:
The outcome π_ij is a binary indicator of whether any feature of inquiry (or an individual feature of inquiry to answer our second research question) was observed in lesson i in school j. β_1 estimates the impact of the assignment to the Immersion institute, X_j is a vector of school-level variables with coefficients β_S, X_ij is a vector of lesson-level variables with coefficients β_L, and u_j is the random effect that makes this a two-level random intercept model. The lesson-level variables include the length of the lesson in segments (centered on the sample mean) and whether the lesson was taught by an SLT. We estimate models with two sets of school-level covariates. In Model 1, the school-level covariates include indicators for the LAUSD local districts that served as randomization blocks and a baseline student achievement measure (the school mean percentage correct on the fourth-grade life science periodic assessment in 2005–2006). Model 2 includes the variables in Model 1 as well as indicators for the 2nd and 3rd years of the study and interaction terms between the year and treatment assignment to identify how the lessons changed over time.
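Written out from the definitions above, a two-level random intercept logistic model of this kind would take roughly the following form. The intercept β_0, the indicator name Immersion_j, and the normal distribution for u_j are assumptions added for completeness, and π_ij is read here as the probability that inquiry is observed in lesson i in school j.

```latex
\begin{equation}
\log\!\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)
  = \beta_0 + \beta_1\,\mathrm{Immersion}_j + \beta_S X_j + \beta_L X_{ij} + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
\end{equation}
```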
We interpret the Immersion coefficients as reflecting the experimental effect of the opportunity to attend the Immersion institutes. In the 2nd and 3rd years of the study, these effects interact with the deployment of FOSS, but they remain causal estimates of the Immersion training effect in the presence of the FOSS substitute. That is, we assume that the scientific inquiry teaching practices in the Immersion schools would have resembled those of the comparison schools in the absence of institute training, and therefore any observed differences are effects of the opportunity to send teachers to the Immersion institute.
For the quasiexperimental analysis, we interpret the Year 2 and Year 3 coefficients as longitudinal estimates of the effect of the FOSS curriculum and training on the comparison schools. The observers were present in the fourth-grade classrooms prior to the adoption of FOSS, which establishes a baseline rate of scientific inquiry teaching in these schools. Indeed, in Table 4, we see that the comparison schools had fewer observed instances of scientific inquiry in the 1st year of the study. We assume that the teaching practices in the comparison schools would have remained the same in the 2nd year, which allows us to attribute any observed changes in teaching practices over time to the introduction of FOSS. The observers were not in the fifth-grade classrooms in the 1st year, so the Year 3 coefficients combine the effect of time with a change in the grade observed. We assume that the prevalence of inquiry teaching and the relative distribution of the inquiry features do not fundamentally differ in fifth grade, but we acknowledge this is a limitation in the study design. In sum, the Immersion coefficients are the experimental estimates of the Immersion professional development initiative, and the year coefficients are the quasiexperimental estimates of the systemwide rollout of the FOSS professional development initiative in fourth grade between Year 1 and Year 2, with an additional change to fifth-grade classrooms in Year 3. Because the year coefficients are fully interacted with the treatment coefficients in the model, and because FOSS was randomly preceded by the Immersion initiative in half the schools, the model yields separate estimates of the effects of Immersion and FOSS as well as the combination.
Results
Any Scientific Inquiry
We start with the experimental results and our first research question: whether the professional development initiative changed the prevalence of scientific inquiry in the observed lessons, both overall (Model 1) and in the different years of the study (Model 2). For the “any feature” outcome in Table 5, Models 1 and 2 show that the offer to attend the Immersion institute increased the log odds of observing some form of scientific inquiry in a lesson. In Model 1, the Immersion coefficient is 0.611, which corresponds to an odds ratio of 1.84; that is, the odds of observing any scientific inquiry were 84% higher in a lesson from an Immersion school than in a lesson from a comparison school. As expected, longer observations and lessons taught by SLTs are more likely to demonstrate some features of scientific inquiry.
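The conversion from a log-odds coefficient to an odds ratio used throughout these results is a simple exponentiation. A minimal sketch follows; the function names are illustrative and are not taken from the study's analysis code.

```python
import math


def odds_ratio(log_odds_coefficient: float) -> float:
    """Exponentiate a logistic regression coefficient to get an odds ratio."""
    return math.exp(log_odds_coefficient)


def percent_change_in_odds(log_odds_coefficient: float) -> float:
    """Express the odds ratio as a percentage change in the odds."""
    return (odds_ratio(log_odds_coefficient) - 1.0) * 100.0


# Model 1 Immersion coefficient for the "any feature" outcome, as reported in the text.
immersion_model1 = 0.611
print(round(odds_ratio(immersion_model1), 2))           # 1.84
print(round(percent_change_in_odds(immersion_model1)))  # 84
```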
Table 5. Results From Logistic Regression Models Predicting the Evidence of Science Inquiry in Classroom Observations
Note. 711 observed lessons from 73 schools. Lesson length in 5-minute segments and centered on the sample mean. Cluster-adjusted standard errors in parentheses. Controls for local district randomization block and intercept coefficients not shown. SLT = science lead teacher.
p < .1. *p < .05. **p < .01.
Accounting for the different years of the study (Model 2) shows that the magnitude of the treatment difference changed over time. In these models, the Immersion coefficient describes the difference between the Immersion schools and the comparison schools in Year 1; adding the year estimates and the interaction terms produces contrasts for different years. This model reveals that the contrast between the experimental conditions was very large in the 1st year of the study (0.922, odds ratio = 2.51) and smaller in the subsequent two years (0.922 – 0.527 = 0.395 in Year 2, odds ratio = 1.48; 0.922 – 0.349 = 0.573 in Year 3, odds ratio = 1.75). The quasiexperimental estimates for the differences from Year 1 to Year 2 (1.037, odds ratio = 2.82) and Year 1 to Year 3 (1.031, odds ratio = 2.80) suggest that evidence of scientific inquiry increased substantially in the comparison schools after the 1st year, thereby attenuating the difference between lessons from the Immersion and comparison schools in any form of scientific inquiry.
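The year-specific contrasts come from adding the Year 1 Immersion coefficient to the relevant treatment-by-year interaction term. The sketch below reproduces that arithmetic for Years 1 and 2 only. The negative sign on the interaction term is an assumption inferred from the subtraction shown in the text, and the function and constant names are hypothetical.

```python
import math


def treatment_contrast(main_effect: float, interaction: float = 0.0) -> float:
    """Log-odds gap between Immersion and comparison schools in one year.

    main_effect is the Year 1 Immersion coefficient; interaction is the
    treatment-by-year term for the year of interest (0.0 for Year 1).
    """
    return main_effect + interaction


# Coefficients for the "any feature" outcome as reported in the text.
# The negative sign on the interaction is inferred from "0.922 - 0.527".
IMMERSION_YEAR1 = 0.922
IMMERSION_X_YEAR2 = -0.527
YEAR2_COMPARISON = 1.037  # comparison schools, Year 2 vs. Year 1

year1 = treatment_contrast(IMMERSION_YEAR1)
year2 = treatment_contrast(IMMERSION_YEAR1, IMMERSION_X_YEAR2)

print(round(year1, 3), round(math.exp(year1), 2))  # 0.922 2.51
print(round(year2, 3), round(math.exp(year2), 2))  # 0.395 1.48
print(round(math.exp(YEAR2_COMPARISON), 2))        # 2.82
```

In a fully interacted specification like this one, the year coefficient alone describes the change in the comparison schools, while the sum of the main effect and the interaction describes the gap between conditions in that year.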
Features of Scientific Inquiry
To address our second research question, how teaching practice changed, the subsequent models investigate each feature of scientific inquiry to describe how individual features responded to the professional development initiatives. The results suggest that the increase in the overall level of scientific inquiry was limited to a subset of the features of inquiry. The pooled data in Model 1 for Feature 1 suggest that the two experimental conditions were comparable overall, but Model 2 suggests a dynamic situation. The Immersion coefficient in Model 2, which represents the experimental contrast in the 1st year, is large (0.640, odds ratio = 1.90, p = .059). The quasiexperimental year coefficients are larger still (1.821, odds ratio = 6.19; and 1.960, odds ratio = 7.10), suggesting that the odds of a lesson from a comparison school revealing evidence of scientific questioning were 6 to 7 times higher in the 2nd and 3rd years of the study than in the 1st year. The interaction of the treatment with year suggests that there was no difference between Immersion and comparison schools in the subsequent years of the study; the experimental difference is apparent only in the 1st year, after which the increase in Feature 1 in the comparison schools eroded the initial difference. We take this to mean that both professional development interventions increased the amount of questioning that occurred in LAUSD classrooms.
In contrast to Feature 1, the experimental estimate for Feature 2 (“Learner gives priority to evidence in responding to questions”) moderates less across years. Overall, the Model 1 estimate (0.391, odds ratio = 1.48) suggests that lessons from the Immersion schools were more likely to show evidence of Feature 2, but the Model 2 experimental estimate is not statistically significant because of a larger standard error. The quasiexperimental estimates and interaction terms show no evidence that Feature 2 responded to the introduction of FOSS.
Both Immersion and FOSS appear to have influenced Feature 3 (“Learner formulates explanations from evidence”). The overall experimental estimate in Model 1 (0.466, odds ratio = 1.59) is similar in magnitude to that for Feature 2; the experimental estimate for Year 1 is larger still (0.719, odds ratio = 2.05) and is statistically significant. The quasiexperimental contrast in the comparison group between Year 1 and Year 3 is large enough to be statistically significant (0.889, odds ratio = 2.43), and the contrast between Year 1 and Year 2 is marginally significant.
There is no evidence of an experimental impact on Feature 4 (“Learner connects explanations to scientific knowledge”). There is quasiexperimental evidence, however, of an increase in Feature 4 between the 1st and subsequent years of the study. The year coefficients for Feature 4 are quite large (1.510, odds ratio = 4.53; 1.701, odds ratio = 5.48), but the baseline for this substantial increase is quite low: Only 4% of comparison classrooms showed evidence of Feature 4 in Year 1 (see Table 4). Neither professional development initiative appeared to have an impact on Feature 5 (“Learner communicates and justifies explanations”).
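To see why a large odds ratio on a low base still implies a modest absolute level, the first reported year odds ratio for Feature 4 can be applied to the 4% Year 1 rate. This is a back-of-the-envelope sketch only: it ignores the model's covariates and school random effects, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# 4% of comparison classrooms showed Feature 4 in Year 1; reported odds ratio = 4.53.
implied = apply_odds_ratio(0.04, 4.53)
print(round(implied, 2))  # 0.16
```

Under these simplifying assumptions, a more than fourfold increase in the odds moves the implied rate from 4% to roughly 16% of lessons.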
Professional Development Observations
To investigate our third research question, how the classroom observation findings relate to teachers’ professional development experiences, we turn to the coded observations of six of the Immersion institutes. The results are reported in Table 6. We wish to know whether the relative emphasis during the workshops on the different features of inquiry resembles the experimental impact estimates. The workshops consisted of two types of activities: periods when the facilitators modeled the Immersion unit lessons to teachers assuming the role of students and periods when facilitators engaged teachers in reflective discussions about implementing the lessons. We focus on the periods when the teachers behaved as students, which on average consisted of 8.29 segments, or slightly more than 2 hours each day. In light of the analysis of the individual features of inquiry, we are most interested in how the five inquiry features were presented to the teachers when they assumed the role of students.
Table 6. Features of Inquiry Present in Science Immersion Professional Development Institutes
Note. Twenty-nine professional development sessions from six Science Immersion institutes observed (three fourth-grade institutes in summer/fall 2006, one fifth-grade institute in 2007, and two fifth-grade institutes in 2008). Each coded segment is 15 minutes long.
Each 15-minute segment covered one or more of the features of inquiry. Feature 1 was prominent during the reflective segments of the day (not shown), but the segments during which the teachers were treated as students emphasized Feature 2 and Feature 3 (3.39 and 3.04 segments per day, respectively), and Feature 4 and Feature 5 were not often present (1.21 and 0.93 segments per day, respectively). Although we can make only limited inferences from these data, we do note that the two features that were most frequently present in the Immersion institutes—Feature 2 and Feature 3—are the same features for which we observe statistically significant experimental treatment estimates. We take this as suggestive evidence that the impact in the classroom corresponds to the emphasis placed in the professional development. Moreover, the fact that the professional development facilitators modeled Feature 4 and Feature 5 approximately once each day suggests that the features emphasizing conceptual connections and communicating scientific knowledge were difficult not only for the teachers but also for the facilitators to implement in practice.
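The segment counts above can be converted into minutes and into a share of the student-role time. The sketch assumes the 15-minute segment length given in the table note and the 8.29-segment daily average reported earlier; the variable names are illustrative. Because a segment can cover more than one feature, the shares need not sum to one.

```python
SEGMENT_MINUTES = 15          # length of each coded segment, per the table note
STUDENT_ROLE_SEGMENTS = 8.29  # average segments per day with teachers as students

# Average segments per day in which each feature was present (from the text).
segments_per_day = {
    "Feature 2": 3.39,
    "Feature 3": 3.04,
    "Feature 4": 1.21,
    "Feature 5": 0.93,
}

student_role_minutes = STUDENT_ROLE_SEGMENTS * SEGMENT_MINUTES
shares = {
    name: count / STUDENT_ROLE_SEGMENTS
    for name, count in segments_per_day.items()
}

print(round(student_role_minutes))  # 124
for name, share in shares.items():
    print(name, f"{share:.0%}")
```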
Discussion
This article investigated science instruction in LAUSD classrooms to document the impact of two different professional development initiatives on scientific inquiry instruction: Immersion and FOSS. The experimental and quasiexperimental analyses using any feature of scientific inquiry as an outcome provide rigorous evidence that teaching practices can be influenced by professional development at a large scale in a setting such as LAUSD. This article also illustrates the value of directly observing the relationship between intervention elements and teaching practices to inform the achievement findings in a randomized cluster trial. Particularly in the presence of negative or null effects, it is vital to examine the design and delivery of training to teachers as well as classroom instruction in order to interpret student achievement findings and design more effective interventions.
Our first research question was whether the two professional development workshops changed teaching at all. We found evidence that both initiatives changed teaching practice. The experimental analysis showed that there was more evidence of scientific inquiry observed in lessons from the schools that were randomly selected to send teachers to the Immersion institutes. Not all of the observed teachers attended an Immersion institute, but the lessons from the Immersion schools were nearly twice as likely to show evidence of any kind of scientific inquiry relative to lessons from the comparison schools across all 3 years of the study (odds ratio = 1.84). This contrast was strongest in the 1st year (odds ratio = 2.51). Sending a small number of teachers to an intensive professional development workshop changed teaching practice. The experimental contrast weakened in the subsequent years of the study not because scientific inquiry was no longer apparent in lessons from the Immersion schools but rather because scientific inquiry was more apparent in the lessons from the comparison schools after the universal FOSS training was introduced (Year 2 odds ratio = 2.82; Year 3 odds ratio = 2.80). One-day workshops, such as the FOSS training, have been characterized as having little value (Garet et al., 2001; Little, 1993), but the odds of observing some form of scientific inquiry in the observed classrooms nearly tripled after every elementary teacher in the district was provided with at least 1 day of training. In sum, both professional development initiatives changed teaching practice.
This finding is tempered by the individual results for the five essential features of inquiry we used to answer our second research question (NRC, 1996, 2000). Using an observational tool that records the essential features of inquiry, we were able to see that the full cycle of inquiry was not evident in the professional development sessions or in the subsequently observed lessons. There is some evidence that the Immersion initiative doubled the extent to which students asked scientific questions (Feature 1) in the 1st year (odds ratio = 1.90, p = .059). The Immersion professional development increased the extent to which students gave priority to evidence to answer questions (Feature 2). In the 1st year, the observers were twice as likely to observe students forming explanations from evidence (Feature 3; odds ratio = 2.05); this contrast was smaller but still substantial in the pooled data (odds ratio = 1.59). There is little evidence, however, that the Immersion intervention influenced the extent to which students connected these explanations to scientific knowledge (Feature 4) and no evidence that they communicated or justified these explanations (Feature 5). Notably, teachers were exposed most frequently in the Immersion institutes to Features 1, 2, and 3, with Features 2 and 3 being most prominent (observed in three segments per day).
In the years after the districtwide FOSS training, Features 1, 3, and 4 were more prevalent (odds ratios = 6.19 for Feature 1 in Year 2, 2.43 for Feature 3 in Year 3, and 4.53 for Feature 4 in Year 2). The odds of observing scientific questioning (Feature 1) were 6 times higher after the FOSS training, which is a dramatic increase. The odds of observing students connect explanations to scientific knowledge (Feature 4) also increased substantially, but the absolute level remained low. There is no evidence of an impact of FOSS on Feature 2 and Feature 5. Schools in which Immersion was followed by FOSS had higher levels of Feature 2 than schools in which FOSS was introduced alone, but Feature 5 increased under no circumstances. Future research should look more closely at the differences between open inquiry and guided inquiry curricula and professional development. On the basis of these two examples, open inquiry (Immersion) seems to promote the use of evidence, and guided inquiry (FOSS) promotes questioning and, importantly, connecting evidence to scientific knowledge.
On the whole and despite substantial differences in curriculum design and deployment, professional development in either Immersion or FOSS—and for some Immersion school teachers in later years, both—stimulated teachers to initiate scientific inquiry but fell short of exposing students to all of its elements in a way consistent with the original vision of scientific inquiry described in the NSES. Both interventions emphasized questioning and gathering evidence, but no regime stimulated all of the features of inquiry at once, and under neither did students communicate and justify their explanations. In light of these findings, the NRC’s (2012) current effort to redefine scientific inquiry more concretely seems appropriate, but reframing standards will bear fruit only if teachers can successfully practice them with their students.
Although we do not have evidence that engaging in all of the features of scientific inquiry will improve student achievement, we are concerned that partial exposure to scientific inquiry may be detrimental. After all, students in the Immersion schools performed worse on standardized tests than students in the comparison schools in the 1st year of the study (Borman et al., 2008). Conducting empirical forays without connecting the results back to scientific concepts or requiring students to articulate their conceptual understanding is likely to leave teachers and students unclear as to the implications of their investigations and may not lead to learning (Kirschner et al., 2006). Future research should investigate how the individual features of scientific inquiry influence student achievement as well as other outcomes, such as motivation or interest in science.
We suspect that the similar response of LAUSD fourth- and fifth-grade teachers to these two workshops on scientific inquiry reflects the challenges of activity-based learner-centered instruction as well as the LAUSD setting. As others have found (e.g., Pane et al., 2010), learner-centered instruction is difficult. One reason may be that learner-centered instruction often is designed around activities. These activities are exciting but may not lend themselves to all of the features or practices of scientific inquiry. In a detailed video study of teaching practices in high school engineering classrooms, Walkington and colleagues (Walkington, Nathan, Wolfgram, Alibali, & Srisurichan, in press) find that students and teachers tend to miss opportunities for “reflection and integration of ideas” when engaging in learner-centered activities. Moreover, they caution against “activity for activity’s own sake.” Activity of this sort is what Kirschner and his colleagues (2006) warn does not lead to learning. We suspect that a similar phenomenon may account for the relative deemphasis of Feature 4 and Feature 5 in our study and propose that curriculum developers not only heed the specificity of the new K–12 framework (NRC, 2012) but also follow the example of Barab and colleagues (2007) and redesign curricula to expose students to more challenging and integrating experiences. Furthermore, our observations of the professional development sessions suggest that the facilitators also found it challenging to present Feature 4 and Feature 5 to the teachers. It may be more reasonable to expect teachers to reproduce the experience they had in training than to rebalance the distribution of inquiry features they introduce to students.
In follow-up interviews, teachers reported a related constraint: Science was not an instructional priority relative to literacy and mathematics, and consequently, they had little classroom time for science. Reflection and integration require both guidance and time. Inquiry approaches may not provide sufficient guidance to students or teachers, and time constraints limited classroom opportunities to complete the cycle of scientific inquiry. Both the Immersion and the FOSS initiatives were subject to these constraints, which we think contribute—along with the difficulty of the task and the training teachers were given—to the limited implementation of the cycle of scientific inquiry. There remains much to be understood about how professional development initiatives translate into classrooms, such as those in LAUSD, but the responsibility for achieving a new vision of scientific inquiry may lie not with teachers alone but with curriculum developers, facilitators, and administrators as well.
Footnotes
Acknowledgements
The authors are grateful for comments on prior drafts from Jill Bowdon, Sarah Bruch, and Paul Hanselman and for research assistance from Alexan Chalaganyan, Erika Hernandez, Phil Selvey, Ashley Turner, and Sun Young Yoon.
Authors’ Note
A previous version of this article was presented at the 2011 Annual Meeting of the American Educational Research Association, New Orleans, Louisiana, and at the Interdisciplinary Training Program in the Education Sciences at the University of Wisconsin-Madison.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research for this article was supported by grants from the National Science Foundation (Award No. ESI-0554566) and the Institute of Education Sciences at the U.S. Department of Education (Grant No. R305C050055) to the Wisconsin Center for Education Research, School of Education, University of Wisconsin-Madison. Findings and conclusions in the article do not necessarily represent the views of the supporting agencies.
Authors
JEFFREY GRIGG is a doctoral student in the Department of Sociology at the University of Wisconsin-Madison. His research interests include educational inequality and causal inference in real-world settings.
KIMBERLE A. KELLY is a Lecturer in the Department of Psychology at the University of Southern California. She has spent the last ten years directing research and program evaluation in STEM initiatives focused on improving teaching and learning in science.
ADAM GAMORAN is the John D. MacArthur Professor of Sociology and Educational Policy Studies and director of the Wisconsin Center for Education Research at the University of Wisconsin-Madison. His research focuses on educational inequality and school reform.
GEOFFREY D. BORMAN is Professor of Education and Sociology at the University of Wisconsin-Madison. His areas of research include experimental and quasi-experimental design, educational policy, and educational inequality.
