Abstract
Research has consistently shown that inquiry-based learning can be more effective than other, more expository instructional approaches as long as students are supported adequately. But what type of guidance is adequate, and for whom? These questions are difficult to answer as most previous research has only focused on one type of guidance and one type of learner. This meta-analysis therefore synthesized the results of 72 studies to compare the effectiveness of different types of guidance for different age categories. Results showed facilitative overall effects of guidance on learning activities (d = 0.66, 95% CI [0.44, 0.88]), performance success (d = 0.71, 95% CI [0.52, 0.90]), and learning outcomes (d = 0.50, 95% CI [0.37, 0.62]). Type of guidance moderated the effects on performance success but not on the other two outcome measures. Considerable variation was found in the effects of guidance on learning activities, but the relatively low number of studies does not allow for any definitive conclusion on possible age-related differences.
Psychologists and educational scientists seem to converge on the notion that student involvement is key to successful learning (e.g., Freeman et al., 2014). The question of how students should be actively involved in the learning process is nevertheless still debated. In formal disciplines such as science and mathematics, a pedagogical approach known as inquiry learning qualifies as an organic way to make students active agents in their own learning process. Inquiry-based methods, in short, enable students to learn about a topic through self-directed investigations. By “acting like a scientist,” students learn not only science content but also science processes, both of which are included in the curriculum standards of many countries worldwide (Abd-El-Khalick et al., 2004; Achieve, 2010).
Despite their appealing nature, controversy remains as to whether and when inquiry-based methods promote student learning. Ausubel (1962), for example, contended that inquiry-based learning is only appropriate for learners in the concrete operational stage of development; once a person is capable of formal operational reasoning it is more effective, and certainly more efficient, to directly teach the abstract principles underlying a topic or domain through expository methods. Taba (1963), on the other hand, emphasized the potential of inquiry learning to strengthen students’ ability to use cognitive processes. Even though she admitted that it is unfeasible and undesirable to learn every topic through inquiry, she also believed that it would be possible to find the conditions of an effective teaching–learning strategy that would make inquiry learning possible.
These arguments have been repeated throughout the years, albeit in slightly different form, by scholars criticizing and advocating inquiry learning. In the most recent debate, Kirschner, Sweller, and Clark (2006) argued that inquiry-based methods are unlikely to be effective because they ignore the limitations of working memory (i.e., the cognitive structure in which conscious processing occurs). The mere conduct of an inquiry is already so demanding that it requires most cognitive capacity in working memory, leaving (too) little capacity to store novel information in long-term memory. A reply by Hmelo-Silver, Golan Duncan, and Chinn (2007) challenged this notion by arguing that contemporary inquiry-based methods are powerful and effective models of learning because they employ extensive scaffolding. Of interest is the shift in focus from inquiry learning per se to the conditions under which inquiry learning can be effective. As student guidance seems to be the key to answering this question, the present meta-analysis aims to contribute to this ongoing debate by investigating the conditions of effective inquiry learning support in science and math education.
The above discourse also points to the importance of defining what exactly should be understood by inquiry learning and when it can be considered “unguided,” “minimally guided,” or “guided.” As there appears to be no consistent definition of inquiry learning (Alfieri, Brooks, Aldrich, & Tenenbaum, 2011; Klahr & Nigam, 2004; Strike, 1975), the present meta-analysis had to conceive one of its own, and defined the method as one in which students conduct experiments, make observations or collect information in order to infer the principles underlying a topic or domain. These investigations are governed by one or more research questions, either provided by the teacher or proposed by the student; adhere (loosely) to the stages outlined in the scientific method; and can be performed with computer simulations, virtual labs, tangible materials, or existing databases. This working definition is consistent with the description of inquiry learning used by the National Science Foundation (2000) and the scientific processes and practices outlined by the National Research Council (2012). The definition of guidance is more complicated and will be worked out in the sections that follow.
Previous Reviews of Inquiry Learning
Inquiry-based learning is rooted in the work of John Dewey (1859–1952), a philosopher of education who played a prominent role in educational reform in the first half of the 20th century. Dewey believed that instead of emphasizing the memorization of facts, science education should teach students how to think and act scientifically (National Research Council, 2000). Inquiry learning was adopted on a large scale in educational practice during the discovery learning movement in the 1960s, albeit for a different reason. Inspired by the work of Bruner (1961; Bruner, Goodnow, & Austin, 1956), inquiry-based methods were initially embraced as a fruitful way to learn science content, but the influential work of scholars such as Kuhn (2005) and Klahr (2000) gradually renewed the emphasis on the cultivation of science process skills.
These echoes of time are reflected in the review studies on inquiry learning that have been published over the past decades. Early research syntheses by Bittinger (1968) and Hermann (1969) found inquiry-based learning to be more effective compared with expository forms of instruction. The advantage of inquiry learning was most apparent for the transfer of the learned material; superior retention was often obtained from expository methods. Bittinger’s review further showed that these effects occur both in laboratory studies and in regular classrooms, although the effects in the latter setting tended to be less pronounced. In addition, Hermann found initial evidence that the effectiveness of inquiry learning depends on the guidance students receive during an inquiry. Despite considerable variation in the amount and type of guidance across studies, some within-study comparisons revealed that a reasonable degree of guidance is more effective than low guidance.
Thirty years later, T. De Jong and Van Joolingen (1998) reviewed the literature on simulation-based inquiry learning in science domains. Following an analysis of the typical problems students encounter in the various phases of the inquiry cycle, the authors synthesized empirical studies comparing the effectiveness of simulations with and without additional guidance (called “instructional measures”). Simulations enhanced with instructional measures were found to yield higher learning outcomes overall than simulations without this guidance. Three measures in particular appeared to be effective: giving learners access to domain information during the inquiry, offering assignments to structure the inquiry process, and constraining the complexity of the inquiry process through model progression. A review by T. De Jong, Linn, and Zacharia (2013) suggested that the advantages for guidance in simulation-based inquiry learning generalize to different instructional settings. Based on a synthesis of well-controlled comparison studies, the authors concluded that, for the acquisition of conceptual knowledge, investigations conducted with physical materials are as effective as technology-enhanced investigations that involve computer simulations or online laboratories.
A recent meta-analysis partially confirmed these findings (D’Angelo et al., 2014). Simulations were found to have an advantage in achievement over nonsimulation instruction (d = 0.62), but did not affect students’ learning activities (d = 0.26). These results should nevertheless be treated with some caution because the nonsimulation condition involved “some other kind of instructional treatment” (D’Angelo et al., 2014, p. 14), which does not rule out the possibility that the effect of using simulations was confounded with the instructional method. A more solid conclusion from this meta-analysis was that supplementing a simulation with some form of learner guidance had a modest effect on learning outcomes (d = 0.49) and inquiry skills (d = 0.41) but these effects proved independent of the type of guidance. However, two other variables did moderate the outcomes: fifth-graders failed to take advantage of the guidance offered via the simulation whereas older learners did, and guidance proved ineffective for simulation-based inquiry learning in math whereas significant positive effects were found in the domain of science.
Other recent meta-analyses have contrasted inquiry learning with more expository forms of instruction. The outcomes once again confirm the importance of learner guidance. Alfieri et al. (2011) found that across domains and settings, inquiry-based methods with minimal or no guidance are less effective than explicit instruction (d = −0.38). But when students receive adequate guidance during the inquiry, they learn more (d = 0.30) than students who are taught the same content by expository methods. Furtak, Seidel, Iverson, and Briggs (2012) conducted a similar meta-analysis in the domain of science. Students engaged in an inquiry with minimal guidance from their teacher tended to learn more than students exposed to “traditional instruction” (d = 0.25). However, studies comparing teacher-directed inquiry learning with traditional instruction had a considerably higher overall mean effect size (d = 0.65). The magnitude of this effect could not be replicated in the meta-analysis by Carolan, Hutchins, Wickens, and Cumming (2014), who found a small but significant overall benefit (g = 0.15) of more guided inquiry learning over less guided inquiry learning.
In conclusion, the research integrations presented in this section provide convincing evidence that inquiry-based methods can be more effective than other, more expository methods of instruction. Their effectiveness has mainly been demonstrated on learning outcomes assessed after the task by means of domain knowledge posttests; embedded assessment of the actions learners perform during an inquiry (i.e., learning activities) and the quality of the products they create during that inquiry (i.e., performance success) has received significantly less attention. The second conclusion that can be drawn is that the effectiveness of inquiry learning depends almost entirely on the availability of appropriate guidance. Which types of guidance are appropriate cannot be determined on the basis of the existing reviews and meta-analyses. This seems at least in part due to the fact that guidance is often classified ad hoc on the basis of the included studies. Using an a priori classification based on a theoretical framework might be more fruitful and ease interpretation of the findings. Finally, even though previous meta-analyses have paid attention to possible moderating effects due to the learners’ age, the relative effectiveness of different types of guidance for different age groups has not yet been assessed.
These three conclusions shaped the design and focus of the present meta-analysis in which possible age-related differences in the effectiveness of inquiry learning guidance were examined. In order to contextualize the meta-analysis’ research questions, the next section will give a more detailed account of what learners are supposed to do during an inquiry and propose various ways in which their activities can be supported.
Empirical Research on Inquiry Learning
Lazonder (2014) identified two major strands in inquiry learning research. On the one hand are studies investigating how particular groups of learners go about performing an inquiry. These studies often address developmental differences in scientific reasoning (i.e., the core thinking skills underlying successful inquiry learning), and offer the same, often minimal guidance across age groups to find out what learners of a certain age can and cannot do on their own account. The second line of research aims to identify the effects of different types of guidance on learners’ inquiry activities, performance success, and learning outcomes. These studies typically test the effects of one type of guidance with one homogeneous group of learners. The main findings of both lines of research are summarized below.
Developmental Differences in Scientific Reasoning
Scientific reasoning is defined as the application of scientific inquiry methods to reasoning situations (Kuhn & Franklin, 2006). A well-known and respected model describing the cognitive processes underlying scientific reasoning is the Scientific Discovery as Dual Search framework of Klahr and Dunbar (1988). According to this model, scientific discovery proceeds in iterative cycles of hypothesis generation, experimentation, and evidence evaluation. These three core processes have been extensively studied in the cognitive and developmental research conducted over the past 25 years.
The ability to formulate hypotheses is not present until elementary school age at the earliest (Piekny & Maehler, 2013). From that age onward, children, teenagers, and adolescents all tend to begin their inquiry by focusing on hypotheses consistent with prior beliefs, but differ in their attempts to generate subsequent hypotheses. Cross-sectional research has shown that children tend to keep generating plausible hypotheses and often get stuck focusing on a single hypothesis, whereas older learners more often generate implausible hypotheses and are more likely to consider multiple hypotheses (Klahr, Fay, & Dunbar, 1993). Still, teenagers and even adolescents spontaneously generate very few hypotheses, and confuse hypotheses with predictions (Gijlers & De Jong, 2005; Njoo & De Jong, 1993). It thus seems that, although the ability to generate hypotheses has matured around the age of 12, inducing new or alternative hypotheses from data remains difficult across age groups (Klahr et al., 1993). This finding is particularly relevant to math education where learners mainly have to rely on available data or examples to create hypotheses (which are commonly referred to as conjectures). As mathematical objects and relations are abstract, they generally do not allow learners to make conjectures based on prior knowledge or everyday experience (Cañadas, Deulofeu, Figueiras, Reid, & Yevdokimov, 2007). Inducing conjectures from analogical cases or problems is perhaps an exception, but recognizing appropriate analogies can be challenging for learners (cf. Gick & Holyoak, 1980).
The ability to design experiments suitable for generating or testing hypotheses has proven to be difficult across ages. Five-year-olds are not yet able to distinguish between testing a hypothesis and generating an effect, but a notable increase in this ability occurs around the age of 6 (Piekny, Gruber, & Maehler, 2014; Sodian, Zaitchik, & Carey, 1991). Basic competence in carrying out unconfounded investigations of the relationship between variables that clearly covary can be acquired by many pupils by the age of 10 (Kanari & Millar, 2004; Schauble, Glaser, Duschl, Schulze, & John, 1995), but with the right support 6- and 7-year-olds are already able to learn how to conduct sound investigations (Chen & Klahr, 1999; Varma, 2014). With age, children increasingly improve their ability to set up valid experimental comparisons and transfer this skill to new domains (Chen & Klahr, 1999; Koerber, Sodian, Kropf, Mayer, & Schwippert, 2011; Veenman, Wilhelm, & Beishuizen, 2004).
Evidence evaluation involves a critical assessment of the results obtained through experimentation, which should lead to informed decisions on whether and how hypotheses should be accepted, rejected, or revised and further examined. This coordination of theory and evidence often requires learners to make inferences about causal relationships between variables showing different patterns of covariation. The ability to evaluate perfect covariation and non-covariation evidence develops during the preschool and early elementary school years (Koerber et al., 2011; Piekny & Maehler, 2013). Preschool children at 4 years of age can already correctly interpret perfect covariation data as evidence supporting a causal hypothesis, and this ability improves significantly between the ages of 4 and 5 (Koerber, Sodian, Thoermer, & Nett, 2005; Piekny et al., 2014). The ability to interpret imperfect covariation seems to be comparatively more demanding (Inhelder & Piaget, 1958; Koslowski, 2008; Kuhn & Phelps, 1982). It is therefore no surprise that this skill does not seem to be well developed yet in early childhood (Koerber et al., 2005; Piekny et al., 2014), develops slowly, and hardly ever reaches maturity (Kuhn, Amsel, & O’Loughlin, 1988).
This summary of the research on scientific reasoning concurs with Zimmerman (2007), who concluded on the basis of a comprehensive review that “children are far more competent than first suspected, and likewise, adults are less so” (p. 213). Children from the age of 5 onward possess a basic ability to generate hypotheses, design and conduct experiments, and evaluate evidence, and therefore seem ready for inquiry learning. These young learners obviously need specific guidance in order for their inquiry to be successful, but the same could be said regarding older learners who, due to their higher levels of proficiency, can be assumed to require less specific types of support.
Learner Guidance to Support Scientific Reasoning
Studies investigating the effectiveness of inquiry learning in math and science education typically compare groups of learners who either do or do not receive a particular form of guidance. The results are generally in favor of guided inquiry learning, as was indicated by the meta-analyses presented above, but most studies provide insufficient evidence to explain these beneficial effects in light of the support needs of different age groups. Three important limitations are discussed below, together with their implications for the design of the present meta-analysis.
As knowledge acquisition is the ultimate goal of any instructional approach, inquiry learning research has predominantly and often exclusively focused on this particular outcome measure. This means that the effect of guidance is usually inferred from the knowledge learners have developed rather than assessed from the activities they perform during the inquiry. Although this type of research is defensible in its own right, a direct assessment of whether guidance promotes the inquiry processes as expected is more informative, and the studies that do combine process and outcome measures have indeed yielded valuable insight into the effects of the guidance under study (e.g., Burns & Vollmeyer, 2002; Chinn & Malhotra, 2002a; Lazonder, Hagemans, & De Jong, 2010; Toth, Klahr, & Chen, 2000).
Inquiry learning research also suffers from terminological inconsistency. Existing studies diverge considerably on which activities an inquiry comprises, and are equally inconsistent in designating the type of guidance being investigated. Some researchers used colorful names such as systematic science instruction (Samarapungavan, Mantzicopoulos, & Patrick, 2008), information tips (Hulshof & De Jong, 2006), and mechanistic cues (Kaplan & Black, 2003) that do not convey the type of guidance learners receive. Other studies used different names for the same type of guidance. For example, the mechanistic cues used by Kaplan and Black (2003) served to remind learners to use the control of variables strategy (CVS) in designing experiments. In other writings, such reminders have been referred to as prompts (Marschner, Thillmann, Wirth, & Leutner, 2012), heuristics (Veermans, Van Joolingen, & De Jong, 2006), and hints (Rey, 2011). The reverse problem also occurs. Puntambekar and Hubscher (2005) noticed a change in the notion of scaffolding such that the term is increasingly being used to indicate support mechanisms that lack the core attributes of the original concept such as ongoing diagnosis, calibrated guidance, and fading. This lack of operational definitions complicates the comparison of the efficacy of different types of guidance and is likely to have caused unwarranted disagreement among scholars in the field (Klahr, 2013).
Terminological differences aside, previous research has almost exclusively focused on one type of guidance and one type of learner. Remarkably, however, the researchers’ choice of specific or less specific guidance appears to be independent of the learners’ age. Explicit support such as direct instruction of the CVS has been successfully applied with children (e.g., Klahr & Nigam, 2004; Varma, 2014), but older learners benefit from explicit directions too (e.g., Sao Pedro, Gobert, Heffernan, & Beck, 2009). Similarly, less specific types of support such as model progression have been found to promote inquiry learning in children (White, 1993), teenagers (Wichmann & Timpe, 2013), and adolescents (Mulder, Lazonder, & De Jong, 2011). As few studies have investigated the effects of one type of guidance across age groups, the present meta-analysis aimed to synthesize the results of individual studies to compare the effectiveness of different types of support for different age categories.
The limitations outlined above point to two implications for the design of this meta-analysis. One is that the scope of the analysis should go beyond the mere assessment of learning outcomes. Although it may be difficult to include enough studies investigating how guidance shaped the inquiry learning process, this information is a valuable addition to the empirical research and makes the meta-analysis stand apart from previous research integrations. The second implication is that the meta-analysis should start from a typology of inquiry learning guidance that distinguishes types of guidance by their specificity.
Typology of Inquiry Learning Guidance
As a first step in investigating whether a differential effect of inquiry learning guidance indeed exists, this meta-analysis defined guidance as any form of assistance offered before and/or during the inquiry learning process that aims to simplify, provide a view on, elicit, supplant, or prescribe the scientific reasoning skills involved. As both the term guidance and its definition are rather broad, some further specification seems appropriate. In previous writings, guidance has often been classified in terms of the learning activities it aims to support. T. De Jong (2006), for example, organized various types of guidance according to the phases of the inquiry cycle, and Lazonder (2014) did the same for the three core scientific reasoning processes. A more coarse-grained classification was proposed by Reid, Zhang, and Chen (2003), who made a distinction between interpretative support that helps learners understand important domain concepts, experimentation support for guiding learners in designing and conducting experiments, and reflective support that assists learners in looking back on their inquiry and the knowledge acquired. A similar classification was proposed by Quintana et al. (2004) who presented a series of scaffolding principles and guidelines to support sense making, process management, and articulation and reflection.
Even though the above classifications are generally acknowledged and frequently cited in the research literature, they are less appropriate for the purpose of this meta-analysis. Developmental research suggests that younger, less experienced learners need more explicit guidance than older learners; investigating this conjecture thus requires a framework that classifies guidance in terms of extensiveness or explicitness rather than the skills or activities involved. A potentially more appropriate typology was proposed by T. De Jong and Lazonder (2014) who organized their framework according to the specificity of the guidance learners need to successfully perform an inquiry. This typology was used in the present meta-analysis; a short description of its defining characteristics is given in Table 1.
Table 1. Typology of inquiry learning guidance
Note. Based on T. De Jong and Lazonder (2014), with minor textual modifications.
Process constraints offer the least specific type of guidance and are therefore intended for learners with matured inquiry skills. Process constraints give no overt directions at all but instead organize the inquiry into a series of manageable subtasks, for instance, by increasing the number of elements the learner should investigate (White, 1993), increasing the fidelity of the learning environment (Alessi, 1995), or increasing the number of features the learner can control (Rieber & Parmley, 1995). Status overviews are more specific in that they summarize what or how well the learner has performed; the decision to use this data to maintain or adapt one’s behavior is entirely up to the learner. An example of a status overview in collaborative inquiry learning is the Participation Tool that visualizes how much group members have contributed to the online learning dialogue. The use of this tool was found to encourage the more silent or less active students to increase their participation (Janssen, Erkens, Kanselaar, & Jaspers, 2007). Prompts are timed cues, either given by a human being or embedded within the learning environment, that remind the learner to perform a particular action. Prompts are more specific than status overviews because they tell the learner what to do (but not how to do it) at appropriate moments during the inquiry.
The remaining types of guidance all provide the learner with guidelines on how to perform a certain activity, but differ with regard to the specificity of the directions given. Heuristics remind learners to perform an action and point out possible ways to perform that action. Similar to prompts, heuristics can be implemented as cues in the learning environment that appear either at preset points in the inquiry (e.g., Lee & Chen, 2009) or in adaptive response to the learner’s actions (e.g., Veermans et al., 2006). Alternatively, the full set of heuristics can be given at the outset of the inquiry, for example as a series of assignments or as process guidelines on student worksheets (e.g., De Vries, Van der Meij, & Lazonder, 2008; Njoo & De Jong, 1993).
Scaffolds offer more specific guidance than heuristics: they assist learners in performing demanding activities by explaining what to do and how to do it, and provide designated means to carry out, structure, or simplify the learner’s actions. Consistent with Vygotsky’s (1978) notion of the zone of proximal development, scaffolds explain or take over the difficult parts of the activity; when the learner’s skill level increases, the scaffolding is gradually removed so that the learner eventually performs the activity without assistance. However, this fading is often and rightfully absent in short-term inquiries because inquiry skills develop slowly. Scaffolds come in many forms and have been designed for different age groups. For example, the Go-Lab environment (T. De Jong, Soteriou, & Gillet, 2014) incorporates scaffolds that assist learners in elementary education, middle school and high school during the key phases of an inquiry such as formulating research questions, generating hypotheses, and designing experiments. For each of these activities a designated scaffold is available (see http://www.golabz.eu/apps).
Explanations offer the most specific type of guidance and are therefore intended for learners who lack the basic ability to perform an inquiry skill. Unlike all other types of guidance, explanations can be given either before the inquiry, for instance as a short preparatory training, or during the inquiry on a just-in-time basis. One of the most well-known examples concerns the direct instruction in experimental design developed by Klahr and co-workers (see, for an overview, Klahr & Li, 2005). Their CVS instruction proved highly successful in promoting children’s ability to design unconfounded experiments, and yielded sustained effects up to 3 years after the instruction (Strand-Cary & Klahr, 2008). In all these studies, children learned the basics of experimental design through short teacher-led instructions, examples, and guided practice. More recently, the principles underlying this CVS instruction have been used to design a software tutor where explanations on the use of the CVS are embedded within the learner’s inquiry (e.g., Siler, Klahr, Strand-Cary, & Magaro, 2009).
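The ordering of the six guidance types by specificity can be made explicit in a small data structure. The following is only an illustrative sketch: the class name, member names, and integer ranks are assumptions introduced here, and the ranks are ordinal positions, not measured quantities.

```python
from enum import IntEnum


class Guidance(IntEnum):
    """Guidance types ranked from least (1) to most (6) specific.

    The integer values are hypothetical ordinal ranks that only encode
    the order described in the text; distances between ranks carry no meaning.
    """
    PROCESS_CONSTRAINT = 1  # organizes the inquiry into manageable subtasks
    STATUS_OVERVIEW = 2     # summarizes what or how well the learner has performed
    PROMPT = 3              # tells the learner what to do, not how
    HEURISTIC = 4           # reminds of an action and points out possible ways to do it
    SCAFFOLD = 5            # explains what to do and how, and provides means to do it
    EXPLANATION = 6         # specifies exactly how to perform the action


def is_more_specific(first: Guidance, second: Guidance) -> bool:
    """Return True if `first` is a more specific type of guidance than `second`."""
    return first > second


ordered = sorted(Guidance)
print([g.name for g in ordered])
print(is_more_specific(Guidance.SCAFFOLD, Guidance.HEURISTIC))
```

Coding each study with one such rank would allow the type of guidance to be treated as an ordered moderator, although whether the analysis treats it as ordered or categorical is not stated in this section.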
Research Questions and Hypotheses
This meta-analysis aimed to answer three questions: (a) what is the effectiveness of inquiry learning guidance on learning activities, performance success, and learning outcomes, respectively; (b) does this effectiveness depend on the type of guidance; and (c) does the effectiveness of different types of guidance depend on the learner’s age? As previous meta-analyses by Carolan et al. (2014) and D’Angelo et al. (2014) reported positive effects of guided inquiry learning over unguided inquiry learning, the present meta-analysis was expected to yield similar outcomes. Consistent with the view of Kirschner et al. (2006), this overall positive effect was expected to differ as a function of the type of guidance, with larger effect sizes being associated with more specific types of guidance. However, the typology of guidance described in the preceding section suggests that specific forms of guidance such as scaffolds or explanations might be comparatively more effective for younger learners who experience more difficulties during an inquiry than older learners, who in turn might benefit more from less specific guidance.
Method
Literature Search
Studies investigating guidance in inquiry learning were identified through a search of reference databases and a perusal of relevant conference proceedings. First, the databases of ERIC, PsycINFO, and Web of Science were consulted for the period 1993 to 2013. This time span was selected in order to include exactly two decades of recent research on inquiry-based learning. After several trial runs, the final searches were conducted in January 2014 with the following queries: [(“inquiry learning” OR “discovery learning” OR “control of variables”) AND (“students” OR “support” OR “instruction”)], [“scaffold*” AND “inquiry”], [(“scientific thinking” OR “scientific reasoning”) AND (“learning” OR “instruction”)]. The Web of Science search, limited by SSCI and CPI-SSH citation indexes, resulted in 738 hits. The searches in ERIC and PsycINFO, additionally limited by [AB abstract], returned 774 and 589 hits, respectively. By comparing the total of 2,101 hits, 524 duplicates were found; exclusion of those duplicates resulted in 1,577 unique hits.
Second, online conference proceedings of EARLI (European Association for Research on Learning and Instruction), NARST (National Association of Research in Science Teaching), and AERA (American Educational Research Association) were searched for papers concerning inquiry learning guidance. This yielded an additional 92 hits, which were collected online (29) or requested from the authors (63 requested, 31 responses, 27 papers received). The 56 retrieved papers were scanned for duplicates with the search results from the databases, which resulted in the removal of 4 papers and brought the total number of available and unique conference papers to 52.
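The counts reported for both search steps can be retraced with simple arithmetic. The sketch below uses only figures stated in the text; the variable names are introduced here for illustration.

```python
# Step 1: database search (hits per database as reported in the text).
database_hits = {"Web of Science": 738, "ERIC": 774, "PsycINFO": 589}
duplicate_database_hits = 524

total_database_hits = sum(database_hits.values())
unique_database_hits = total_database_hits - duplicate_database_hits

# Step 2: conference proceedings (papers actually retrieved).
papers_collected_online = 29
papers_received_from_authors = 27
duplicates_with_database_results = 4

retrieved_conference_papers = papers_collected_online + papers_received_from_authors
unique_conference_papers = retrieved_conference_papers - duplicates_with_database_results

print(total_database_hits, unique_database_hits)              # 2101 1577
print(retrieved_conference_papers, unique_conference_papers)  # 56 52
```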
Inclusion of Studies
To be eligible for inclusion in this meta-analysis, a study had to (a) investigate inquiry-based learning in math, physics, chemistry, biology, or general science with learners aged 5 to 22; (b) compare a group of learners that received a particular type of guidance against a reference group that did not receive this guidance—regardless of whether additional basic learning support was available to learners in both groups; (c) either randomly assign learners to one of these two groups, confirm the comparability of both groups in preliminary analyses, or control for possible preexisting differences in the main analyses; and (d) assess the effects of the guidance under study on participants’ learning activities, performance success, or learning outcomes, and report these effects quantitatively by means of descriptive or inferential statistics.
The first three criteria were used in an initial screening of the studies’ abstracts. If no abstract was available, the full publication was collected and examined. This first round of selection resulted in the provisional inclusion of 216 studies. In order to reach a final decision, these studies were retrieved from an online library or requested from the authors. The 213 studies that were eventually obtained were read by both authors for inclusion. For studies with insufficient statistical information to compute an effect size, the main author was contacted with a request to provide the missing data. If the author did not respond or could not provide the requested information, the study was discarded. After differences in judgment were discussed, the authors agreed that 68 studies met all four inclusion criteria. As 4 of these investigations reported on two experiments with separate samples, the final number of studies included in this meta-analysis was 72 (see Table 2).
Table 2. Studies included in this meta-analysis
Note. Some studies incorporate multiple types of guidance or multiple outcome measures; the ones listed here were included in the meta-analysis for reasons outlined in the Method section. IF = journal impact factor.
ᵃ LA = Learning Activities; PS = Performance Success; LO = Learning Outcomes; is = inquiry skills; rs = regulative skills; dk = domain knowledge. ᵇ RE = randomized experiment; QE = quasi-experiment (with nonequivalent control group design). ᶜ Retrieved as advance online publication; published in 2014.
Outcome Measures and Possible Moderators
Outcome measures included in this meta-analysis were learning activities, performance success, and learning outcomes. Learning activities pertained to what participants did during the inquiry. These actions included a variety of skills ranging from planning the inquiry process through generating hypotheses and designing experiments to reflecting on outcomes, which were assessed from the experimenter’s notes, the learners’ oral or written statements, or computer log files. Performance success indicated what learners managed to achieve during the inquiry, as evidenced by the products they created throughout the learning process. Examples include the number of valid inferences, the quality of a concept map, and the proportion of correctly completed assignments. Learning outcomes were assessments administered immediately or shortly after the learning process. These measurements sought to determine what participants had learned from the inquiry through posttests, criterion tasks, interviews, or questionnaires.
To explore possible differential effects on these outcome measures, seven moderators were extracted from the studies. The first moderator, outcome focus, specified in more detail what each outcome measure addressed. This could be inquiry skills, regulative skills, or domain knowledge, depending on the outcome measure in question. Learning activities, for example, was defined as a measurement of skill and could therefore only pertain to inquiry skills and regulative skills. Likewise, measures of performance success could only target domain knowledge, whereas learning outcomes could refer to all three types of outcome focus.
The second moderator, publication type, referred to whether a study appeared as a journal article, conference paper, or dissertation. Journal articles were further classified according to the 2013 release of Thomson Reuters’ Journal Citation Reports®. Studies from journals with an impact factor of 1.5 or higher were coded as first-tier journal articles; all other journal publications were considered second-tier articles. If studies from conference papers and dissertations had been reworked into a journal article, they were coded as such if the publication date fell within the time span covered by this meta-analysis.
The third moderator was the domain of the study, which concerned the disciplinary nature of the topic of inquiry. A coarse-grained distinction was made between math and science, where math was used as a general designation for disciplines such as arithmetic, geometry, algebra, and statistics, and science served as an overarching term for physics, chemistry, and biology. Fourth, the study’s design was classified as either randomized experiment or quasi-experiment. The former type of design is characterized by a random allocation of students across experimental conditions, whereas the latter lacks such randomization, often because practical constraints require researchers to work with intact classes. Duration was the fifth moderator. It provided a rough indication of the time learners spent on their inquiry, and hence did not include the time for pre- or post-session assessments. A dichotomous classification was used to differentiate studies that included a single session from studies with multiple sessions. Single-session studies typically lasted 20 to 100 min; multiple-session studies comprised two or more of these lessons that took place on consecutive days or with a delay of 1 week maximum.
The final two moderators represent the key factors in this meta-analysis. Type of guidance indicated the specificity of the support given to learners in the experimental condition. Consistent with the typology presented in Table 1, a distinction was made between process constraints, status overviews, prompts, heuristics, scaffolds, and explanations. These category labels reflect the terminology used in this meta-analysis and do not necessarily match the designation used in the original study. With age group, the study’s sample was classified as children, teenagers, or adolescents. Children were 5 to 12 years of age, teenagers were 12 to 15 years of age, and adolescents were 15 to 22 years of age. This coding occurred on the basis of the sample’s mean age. In case the sample’s mean age or age range was not provided, it was inferred from participants’ grade level, taking into account the differences in educational systems across countries.
Internal consistency of the outcome measures extraction and coding was determined by having both authors score all studies independently. Interrater agreement on learning activities, performance success, and learning outcomes was high with an overall Cohen’s κ of .91. All disagreements were resolved by discussion. Concerning moderator coding, straightforward study features such as publication type, domain, study design, and duration were scored by the second author, who consulted the first author when in doubt. Coding of study features that required a subjective interpretation by the rater was preceded by interrater reliability assessments. Toward this end, both authors independently scored a set of 33 randomly selected studies. The Cohen’s κ agreement estimates were .83 (type of guidance), .91 (age group), and .91 (outcome focus). After disagreements were resolved by discussion, the second author coded the remaining studies.
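As an illustration of the agreement statistic reported above, the following minimal sketch computes Cohen’s κ for two raters who each assign one category per study. The function name `cohens_kappa` and the example codes are hypothetical and serve only to show the calculation; they are not the authors’ coding data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per study."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of studies")
    n = len(rater_a)
    # Proportion of studies on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1 - expected)


# Hypothetical age-group codes for four studies (illustration only)
codes_first = ["children", "children", "teenagers", "adolescents"]
codes_second = ["children", "teenagers", "teenagers", "adolescents"]
print(round(cohens_kappa(codes_first, codes_second), 2))  # 0.64
```

Because κ discounts chance agreement, it is lower than the raw agreement rate (here 3 of 4 studies).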
Computation of Effect Sizes
Standardized mean differences were computed for each outcome measure. For learning activities and performance success, this effect size metric was defined as the difference between the mean scores of the treatment group and the control group divided by the pooled standard deviation. When means or standard deviations were not reported, effect sizes were obtained from inferential statistics (t, χ², and F) or raw data. A similar procedure was used for learning outcomes except that, where possible, effect sizes were based on gain scores so as to compensate for potential a priori differences between conditions. In studies that provided both pretest and posttest data, the mean gains and standard deviations were calculated via the conversion formula provided by Lipsey and Wilson (2001). If neither gains nor pre–post scores were available, the means and standard deviations of the posttest scores were used to calculate the effect size. Effect size estimates of all outcome measures were corrected for small sample bias using the procedure by Hedges and Olkin (1985). This unbiased effect size index will be indicated by the parameter d throughout this meta-analysis.
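The computation described above can be sketched as follows. This is a minimal illustration that assumes two independent groups and uses the common approximation J = 1 − 3/(4df − 1) for the small-sample correction; the function names and the example numbers are hypothetical, and the gain-score conversion of Lipsey and Wilson (2001) is not reproduced here.

```python
import math


def pooled_sd(sd_t, n_t, sd_c, n_c):
    """Pooled within-group standard deviation of two independent groups."""
    return math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))


def corrected_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with small-sample correction.

    Returns the corrected effect size and its approximate sampling variance.
    """
    d_raw = (mean_t - mean_c) / pooled_sd(sd_t, n_t, sd_c, n_c)
    df = n_t + n_c - 2
    j = 1 - 3 / (4 * df - 1)  # approximate correction factor (assumption)
    # Large-sample variance of the uncorrected d
    var_raw = (n_t + n_c) / (n_t * n_c) + d_raw**2 / (2 * (n_t + n_c))
    return j * d_raw, j**2 * var_raw


def d_from_t(t, n_t, n_c):
    """Uncorrected d recovered from an independent-samples t statistic."""
    return t * math.sqrt((n_t + n_c) / (n_t * n_c))


# Hypothetical study: guided group M = 12, SD = 4; unguided group M = 10, SD = 4
d, var_d = corrected_d(12.0, 4.0, 20, 10.0, 4.0, 20)
print(round(d, 3), round(var_d, 4))
```

With 20 learners per group the correction shrinks the raw difference of 0.50 by about 2%; the adjustment matters more as samples get smaller.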
The methods proposed by Borenstein, Hedges, Higgins, and Rothstein (2009) were used to deal with multiple measures and multiple comparisons within a single study. In studies including multiple independent subgroups (e.g., separate data are presented for boys and girls in the experimental and control condition), first the summary statistics for the two conditions were recreated, and then these data were used to compute the effect size. In studies with multiple treatment groups, the approach taken depended on the nature of the guidance. If guidance was of the same type (e.g., two prompted conditions were compared against one unprompted control condition), a composite effect size was calculated by averaging the effect sizes obtained in the treatment groups. The effect size variance was computed based on the variance of each effect size as well as their intercorrelation. If different types of guidance were assessed within the same study, combining both effects would go against the purpose of this meta-analysis. Instead, the type of guidance that best matched the study’s goals and research questions was selected or, if this proved impossible, one type of guidance was selected at random.
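The two procedures described above can be sketched as follows, under stated assumptions. `combine_subgroups` recreates the summary statistics of one condition from two independent subgroups, and `composite_effect` averages correlated effect sizes and derives the variance of that average from the individual variances and a single assumed correlation `r`. Both function names, the common correlation, and the example values are hypothetical choices for illustration.

```python
import math


def combine_subgroups(n1, mean1, sd1, n2, mean2, sd2):
    """Recreate n, mean, and SD for one condition from two independent subgroups."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-subgroup variation plus variation between the subgroup means
    ss = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
          + (n1 * n2 / n) * (mean1 - mean2) ** 2)
    return n, mean, math.sqrt(ss / (n - 1))


def composite_effect(effects, variances, r):
    """Mean of correlated effect sizes and the variance of that mean.

    r is one correlation assumed to hold between every pair of effects.
    """
    m = len(effects)
    mean = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:
                total += r * math.sqrt(variances[i] * variances[j])
    return mean, total / m**2


# Hypothetical example: boys and girls within one experimental condition
print(combine_subgroups(10, 5.0, 2.0, 10, 7.0, 2.0))
# Hypothetical example: two prompted conditions against one shared control
print(composite_effect([0.4, 0.6], [0.04, 0.04], r=0.5))
```

Setting `r` to 0 would treat the two effects as independent and understate the variance of the composite; setting it to 1 gives the most conservative variance.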
A similar approach was used to resolve dependence that occurred when a study reported multiple scores for a single outcome measure. Examples include studies that assess various inquiry skills to indicate learning activities, or separate posttest scores for conceptual and procedural knowledge. As with multiple treatment groups, the preferred solution was to compute a combined effect size and calculate its variance by taking into account the correlation among the separate scores. When this correlation was not reported, the most relevant score was selected or, as the second-best alternative, one score was chosen at random.
Data Analysis
As not every study included all three outcome measures (see Table 2), the shifting-unit-of-analysis approach (Cooper, 1998) was used to preserve as much of the data as possible. This resulted in three separate analyses, one for each outcome measure, which were conducted using an identical approach. All main analyses were performed using the SPSS macros provided by Wilson (2005). All analyses used the random effects model because studies examining different types of guidance and divergent age groups are unlikely to share the same true effect size. Under the random effects model, the observed effect size $Y_i$ is defined as $Y_i = \mu + \xi_i + \epsilon_i$, meaning that the observed effect size for any study is determined by the grand mean ($\mu$), the deviation of the study’s true effect size from the grand mean ($\xi_i$), and the sampling error ($\epsilon_i$). The grand mean is estimated by

$$\hat{\mu} = \frac{\sum_{i=1}^{k} W_i Y_i}{\sum_{i=1}^{k} W_i},$$

where the weight assigned to a particular study ($W_i$) is given by $1/(\mathrm{Var}(Y_i) + \tau^2)$, with $\mathrm{Var}(Y_i)$ being the within-study variation and $\tau^2$ the between-study variation. The parameter $\tau^2$ is estimated by $(Q - (k - 1))/c$, where $Q$ is the between-study heterogeneity statistic, $k$ is the number of studies, and $c$ is calculated from the studies’ weight as

$$c = \sum_{i=1}^{k} w_i - \frac{\sum_{i=1}^{k} w_i^2}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{\mathrm{Var}(Y_i)}.$$
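The estimation steps described above can be sketched in a few lines. This is a minimal illustration, not the SPSS macro used in the analyses; it assumes the method-of-moments estimate of τ² computed from inverse-variance weights and truncated at zero, and the function name and input data are hypothetical.

```python
import math


def random_effects_summary(effects, variances):
    """Random-effects grand mean with a method-of-moments estimate of tau^2."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are needed")
    # Weights based on within-study variance only, used for Q and c
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # negative estimates are set to zero
    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "mean": mean,
        "se": se,
        "z": mean / se,
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),
        "Q": q,
        "tau2": tau2,
    }


# Three hypothetical studies with equal sampling variance (illustration only)
summary = random_effects_summary([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(summary["tau2"], 3), round(summary["mean"], 3))
```

When τ² is zero the random-effects weights reduce to the inverse-variance weights, so the same routine also covers the homogeneous case.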
To test this meta-analysis’ first hypothesis, the summary effect of inquiry learning guidance was analyzed for each outcome measure. Inverse variance weights were used to assign more weight to effect sizes from studies with larger samples. The significance of the summary effect was determined by z tests; Rosenthal’s (1979) fail-safe N was calculated to indicate how many additional studies with a zero mean effect size would be needed to undo a significant overall effect. Next, Q tests based on analysis of variance were performed to examine whether the basic study features moderated the findings. Analogous to a one-way ANOVA, these tests indicated whether the observed between-study variance (Qb) was statistically significant; the I² statistic was computed to indicate how much of this variance reflects true score variation. To adjust for multiple comparisons, Hochberg’s (1988) step-up procedure was used to control the familywise Type I error rate at level α = .05.
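Three of the statistics named above are simple enough to sketch directly. The functions below are a hedged illustration with hypothetical names and inputs: `hochberg_reject` applies the step-up rule to a family of p values, `i_squared` expresses excess heterogeneity as a percentage of Q, and `fail_safe_n` follows Rosenthal’s formula with a one-tailed critical z of 1.645.

```python
def hochberg_reject(p_values, alpha=0.05):
    """Hochberg step-up procedure; returns one reject decision per p value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    reject = [False] * m
    # Step up from the largest p value to the smallest
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha / (m - rank + 1):
            # This hypothesis and all with smaller p values are rejected
            for r in range(rank):
                reject[order[r]] = True
            break
    return reject


def i_squared(q, df):
    """Percentage of observed variance attributed to true heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def fail_safe_n(z_values, z_crit=1.645):
    """Rosenthal's fail-safe N from the z values of the individual studies."""
    k = len(z_values)
    return sum(z_values) ** 2 / z_crit**2 - k


# Hypothetical p values for four moderator tests (illustration only)
print(hochberg_reject([0.010, 0.049, 0.030, 0.200]))
print(round(i_squared(24.5, 14), 1))
```

In the hypothetical family of four tests only the smallest p value survives the correction, which shows how a nominally significant moderator (p = .049) can become nonsignificant once the familywise error rate is controlled.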
The Hochberg-corrected Qb was also used to test the second hypothesis, which predicted that the overall mean effect size was related to the type of guidance. When homogeneity of effect sizes across the six types of guidance was rejected, the planned comparison procedures described in Hedges and Pigott (2004) were used to reveal which types of guidance differed significantly from one another. The Q tests’ within-group variance (Qw) was used to test the third hypothesis that more specific types of guidance are more effective for younger learners and vice versa. The Qw homogeneity statistic indicated whether it is reasonable to assume that studies investigating the same type of guidance share a common effect size. For any type of guidance for which this assumption was rejected, planned contrasts were performed to examine whether the heterogeneity was attributable to the learners’ age group.
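The partition of heterogeneity into a between-group and a within-group component can be sketched as follows. The function `partition_q` and the grouped data are hypothetical. The sketch weights each study by the inverse of the variance it is given, so under a random effects model the caller would pass variances that already include τ² (an assumption of this illustration). Significance would be judged by referring Qb and Qw to a chi-square distribution with the returned degrees of freedom.

```python
def q_statistic(effects, weights):
    """Weighted sum of squared deviations from the weighted mean."""
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))


def partition_q(groups):
    """Split total heterogeneity into between- and within-group parts.

    groups maps a moderator level (e.g., a type of guidance) to a list of
    (effect size, variance) pairs.
    """
    all_effects, all_weights = [], []
    q_within = 0.0
    for studies in groups.values():
        effects = [y for y, _ in studies]
        weights = [1 / v for _, v in studies]
        q_within += q_statistic(effects, weights)  # pooled within-group Q
        all_effects += effects
        all_weights += weights
    q_total = q_statistic(all_effects, all_weights)
    k, g = len(all_effects), len(groups)
    return {"Qb": q_total - q_within, "df_b": g - 1,
            "Qw": q_within, "df_w": k - g}


# Hypothetical effect sizes for two types of guidance (illustration only)
example = {
    "prompts": [(0.2, 0.04), (0.4, 0.04)],
    "scaffolds": [(0.8, 0.04), (1.0, 0.04)],
}
print(partition_q(example))
```

With k studies spread over six types of guidance, the within-group test has k − 6 degrees of freedom, which matches the Qw(14), Qw(11), and Qw(54) values reported for the 20, 17, and 60 studies in the three analyses.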
Results
Learning Activities
Measures of learning activities were included in 20 of the 72 studies. The summary statistics presented in Table 3 indicate that there was a significant overall effect of guidance in that guided inquiry learning led to a more proficient use of inquiry skills than did unguided inquiry learning, z = 5.85, p < .001. The d statistic and its confidence interval show that the true effect size is likely to fall in the range of 0.44 to 0.88, which according to Cohen (1988) can be considered a medium to large effect. Rosenthal’s fail-safe procedure further showed that an additional 1,419 null-effect studies would be needed to bring the p level beyond the .05 threshold of significance.
Table 3. Summary of effect sizes for learning activities
Note. N = 2,374.
Moderator analysis was performed to examine whether the between-study variation in effect sizes was attributable to the basic study features listed in Table 3. Results showed that this was not the case, as neither outcome focus, publication type, study design, nor duration moderated the findings. Moderator analysis also sought to determine whether the heterogeneity of effect sizes was due to the type of guidance. Again no significant moderation effect was found, which means that the overall effect of guidance on learning activities did not depend on the specificity of the assistance students received.
However, there was significant within-group variation in the overall effect of guidance, Qw(14) = 24.50, p = .040, which could point to possible differential effects due to the learners’ age. Indeed, some age-related differences were found for two types of guidance, and the direction of effect was consistent with the hypotheses. Process constraints, the least directive type of guidance, appeared to be more beneficial for adolescents (d = 0.94) than for children (d = 0.78). Scaffolds, in contrast, provide rather specific guidance and were found to be more effective for teenagers (d = 3.62) compared with adolescents (d = 0.70). Statistical analysis nevertheless revealed that none of these differences was statistically significant, Qw < 9.77, p > .062.
Performance Success
A total of 17 studies analyzed the products learners created during the inquiry and used this data to assess the impact of guidance on performance success. The overall mean effect sizes reported in Table 4 indicate that learners who received guidance outperformed their unguided counterparts by more than half a standard deviation, z = 7.30, p < .001, which constitutes a medium to large effect (Cohen, 1988). Rosenthal’s fail-safe N estimated that 753 unpublished studies with zero mean effect would be required to make the obtained overall effect statistically nonsignificant.
Table 4. Summary of effect sizes for performance success
Note. N = 1,019.
Moderator analysis (with Hochberg’s multiple testing correction) showed that the variation between the results of the 17 studies was independent of publication type and duration. However, a significant moderating effect was found for study design. As shown in Table 4, the mean effect size of randomized experiments was almost twice as high as that of quasi-experiments. The I² statistic further indicated that more than 80% of the between-study variance in effect size reflects true score variation, which can be considered a high amount of variance according to the benchmarks proposed by Higgins, Thompson, Deeks, and Altman (2003).
The Hochberg-corrected Qb for type of guidance was also significant, and the magnitude of I² indicated that a moderate 65% of the observed variance represents true between-studies variation (Higgins et al., 2003). Consistent with expectations, larger effect sizes were associated with more specific guidance, r = .53, p = .030. Planned contrasts further showed that explanations were more effective than all less specific types of guidance combined, z = 1.84, p = .033. No significant difference was found for scaffolds, z = 1.45, p = .073, but heuristics were significantly more effective than their less specific alternatives, z = 1.72, p = .043. Prompts were as effective as status overviews and process constraints combined, z = 0.25, p = .401, and status overviews were less effective than process constraints, z = 2.01, p = .023.
The within-group test of homogeneity was not significant, Qw(11) = 15.42, p = .164. This finding means that a particular type of guidance has a similar positive effect on performance in children, teenagers, and adolescents.
Learning Outcomes
The majority of the studies included in this meta-analysis addressed the effects of guidance on learning outcomes. The overall mean effect size of these 60 studies indicates that the presence of a particular type of guidance had a significant positive effect on learning outcomes, z = 7.83, p < .001 (see Table 5). The magnitude of the summary effect was approximately half a standard deviation, which is a medium effect according to Cohen’s (1988) benchmarks. Rosenthal’s fail-safe procedure indicated that 6,379 zero-effect studies should be added to the sample to bring the mean effect size down to a statistically nonsignificant level.
Table 5. Summary of effect sizes for learning outcomes
Note. N = 5,629.
The Qb tests reported in Table 5 indicate that two basic study features might account for the observed variation in effect sizes. Publication type tended to moderate the findings such that journal articles contained higher effect sizes than conference papers and dissertations combined, and journal impact factor was positively correlated with effect size, r = .31, p = .049. However, these effects became statistically nonsignificant after Hochberg correction for multiple comparisons was applied. The significant moderating effect of outcome focus remained, which means that the effect sizes of studies assessing inquiry skills were more than twice as high as the effect sizes found in studies assessing domain knowledge. The benchmarks for I² proposed by Higgins et al. (2003) imply that a high amount of the between-studies variance was real rather than spurious.
The type of guidance had no significant moderation effect, meaning that all six types of guidance, regardless of their specificity, were equally effective in promoting learning outcomes. Analysis of the within-group variance further showed that this effectiveness was independent of the learners’ age, Qw(54) = 62.73, p = .194: children, teenagers, and adolescents all benefited to the same extent from a particular type of guidance.
Discussion
This article quantitatively synthesized the results of 72 studies examining the effectiveness of guidance in inquiry-based teaching and learning. The obtained findings confirm the first hypothesis that guidance has a significant positive effect on inquiry learning activities, performance success, and learning outcomes. These results are consistent with recent research integrations comparing guided and unguided inquiry learning with other instructional methods (Alfieri et al., 2011; Furtak et al., 2012) and corroborate the conclusion of related meta-analyses that addressed the issue of guidance from a slightly different perspective. The one by Carolan et al. (2014) focused on simulation-based inquiry learning in adult education and imposed no date restrictions on the included studies; the resulting set of 31 studies contained none of the empirical works included in the present meta-analysis. D’Angelo et al. (2014) limited their search of the literature on simulation-based inquiry learning to the same time period as the present meta-analysis, but included only five of the studies listed in Table 2 in their meta-analysis. The present meta-analysis thus adds to the research by showing that the benefits of guidance generalize beyond simulation-based inquiry learning, are applicable to different age categories, and extend to learning activities and performance success.
Moderator analysis did not lead to a uniform conclusion as to whether the variation in effect sizes was attributable to basic study features. Two characteristics were found to moderate the findings, but their effects were inconsistent across the three meta-analyses. The overall effect of guidance on performance success was moderated by the studies’ research design. Randomized experiments yielded significantly higher effect sizes than quasi-experiments, which suggests that methodological quality could be one explanation for the divergence of effect sizes. Regarding learning outcomes, the moderating effect of outcome focus indicates that guidance has a larger impact on the development of inquiry skills than on the acquisition of domain knowledge. One plausible explanation is that the guidance included in these meta-analyses was exclusively geared toward inquiry skills; inducing knowledge by performing these skills is possible but at the same time susceptible to errors not necessarily addressed by the guidance.
Of further interest is that certain study features did not moderate the effect sizes. One of these concerned the domain of the studies. A mere 7% of the included studies were in math education but unlike D’Angelo et al. (2014), where math education was equally underrepresented, the domain did not moderate the effectiveness of guidance. The studies’ duration did not affect the effectiveness of guidance either. This finding challenges the postulation that long-term intervention studies are needed to establish the true effect of guided inquiry learning. Similar to D’Angelo et al., the present findings indicate that short-term studies reveal its actual impact just as well. Publication type did not moderate the findings either. This result contradicts the tacit belief among educational scholars that studies with large effects appear more often in esteemed outlets than studies with medium or small effects.
The second research question concerned the relative effectiveness of different types of guidance. More specific guidance such as scaffolds and explanations was expected to yield larger effect sizes than less specific support such as process constraints or prompts. This hypothesis was partially supported by the results. The overall mean effect size for learning activities and learning outcomes was independent of the type of guidance, but a moderator effect was found in performance success. The rank correlation confirmed that larger effect sizes were associated with more specific types of guidance. Results of the planned contrasts, although generally supportive, do not allow for any definitive conclusion because several comparisons included too few effect sizes from which to make valid generalizations. Still, the overall conclusion seems to be that learners perform better during an inquiry (i.e., create better products to exhibit their domain knowledge) when supported by more specific forms of guidance.
This conclusion raises the question why the type of guidance did not affect learning outcomes, which in 42 of the 60 studies concerned domain knowledge. Part of the answer lies in the lower effect sizes for domain knowledge compared with inquiry skills. In keeping with the aforementioned explanation, offering more specific guidance has an immediate effect on the domain knowledge articulated during the inquiry, but these initial differences fade away quickly after the instructional context is withdrawn. This reflects the incommensurability between learning and performance: instructional interventions designed to increase performance during the learning task are often less effective in increasing learning in the long term (Kapur & Rummel, 2012). The guidance included in this meta-analysis was at the skill level and might therefore be less appropriate to promote learning outcomes.
In light of this explanation one would expect the type of guidance to moderate learning activities. This was not the case, but the considerable within-group variance points to a differential effect that could be due to the learners’ age. This effect was most apparent for process constraints and scaffolds, and consistent with the assumption that more specific types of guidance are more beneficial for younger learners. Still, the number of studies in this analysis was rather low, which does not allow for any definitive conclusions; other factors not included in the moderator analysis could have caused the within-group variation as well. An alternative explanation for the absence of any consistent age-related differences could be that empirical researchers adapt the inquiry tasks used in their studies to the capabilities of the age group. If so, young learners engage in less demanding tasks than older learners, which cancels out possible age-related differences in proficiency and hence causes various age groups to benefit as much from more or less specific types of guidance. This possibility would obviously complicate the assessment of the effectiveness of guidance and points to a need for more cross-sectional research in which the same task is used with participants from different age groups (cf. Penner & Klahr, 1996; Schauble, 1996; Tschirgi, 1980; Veenman et al., 2004) randomly assigned to either a guided or unguided inquiry condition.
Implications for Theory and Research
The insights gained through these meta-analyses point to the need to reconsider the assumptions accompanying the typology of guidance. This framework grew out of a recent review of the inquiry learning literature (T. De Jong & Lazonder, 2014) and was applied for the first time in this meta-analysis. The framework proved useful to classify the guidance used in previous studies on a single dimension, but the assumed match between the types of guidance and learners’ prior knowledge and skills was contradicted by the current findings. Despite the moderating effect in performance success, learning activities and learning outcomes were equally enhanced by each type of guidance. These results suggest that less specific forms of guidance are already quite useful to young learners with low inquiry skills, and that older, more experienced learners do benefit from specific types of guidance such as scaffolds and explanations. These new insights should be addressed in subsequent writings to further improve the typology.
On the other hand, the present findings shed new light on the conditions of effective minimally guided instructional approaches such as inquiry-based and problem-based learning. Underlying Kirschner et al.’s (2006) plea for high instructional guidance is the tacit assumption that more specific guidance leads to greater learning. Kirschner et al. advocated for worked examples and process worksheets as effective methods of guided learning, both of which have been successfully applied to facilitate inquiry-based learning (e.g., De Vries et al., 2008; Mulder, Lazonder, & De Jong, 2014). The results of the meta-analyses confirm that even though worked examples and process worksheets differ in specificity (they would be classified as explanations and heuristics, respectively), both would yield similar positive effects on learning and performance. This suggests that “strong guidance” does not necessarily mean “specific guidance”; other dimensions such as the frequency and duration of the guidance might be more important in helping to reduce the demands of inquiry-based learning on working memory. Future research should examine whether and how these properties contribute to the effectiveness of guidance to support inquiry learning activities and outcomes.
Future research should also address the issue of how multiple types of guidance are best combined. The present meta-analysis focused on the effects of individual types of guidance. Although such controlled investigations warrant experimental rigor, inquiry learning practices often incorporate various forms of help and assistance. For example, teachers may start the inquiry with a short benchmark lesson; give hints, feedback, and explanations during the inquiry; and discuss the learners’ performance after the inquiry. The questions as to how different types of guidance are best combined, and whether such a combination is more effective than offering a single type of guidance have received minimal attention in empirical investigations. The orchestration of guidance therefore merits attention in future studies.
The present findings do not point to restrictions regarding the duration of future studies on guided inquiry learning. This implication contradicts the common concern that inquiry learning interventions are often studied for too short a period of time (e.g., T. De Jong & Lazonder, 2014; Kuhn, Garcia-Mila, Zohar, & Andersen, 1995). The current meta-analysis suggests otherwise and implies that the true effects of guidance can be revealed in short-term studies. However, this is not to say that adequate guidance causes all inquiry skills to develop at the same pace. More demanding skills such as evidence evaluation inherently require more time to develop than more straightforward skills and do not always reach maturity (e.g., Zimmerman, 2007). Guidance facilitates the development process, and effects observed in the short term remain when the duration of the inquiry is extended. Yet guidance in and of itself does not serve as a catalyst to boost the pace of development.
Implications for Teaching and Learning
The results of these meta-analyses point to several implications for practice. First and foremost, inquiry-based teaching practices should employ guidance to assist learners in accomplishing the task and learning from the activity. The need for guidance applies to short-term inquiries that are performed as part of a single lesson as well as to more comprehensive inquiry units and projects that span multiple lessons. So whenever learners act like scientists, their teacher should provide them with adequate guidance.
Adequate guidance is not the same as highly specific guidance. Too much guidance inevitably challenges the inherent nature of the inquiry process, and the present findings indicate that less specific forms of guidance lead to comparable learning activities and outcomes as more specific guidance. This enables teachers to create guided learning environments that give learners enough freedom to examine a topic or perform a task on their own. For example, math teachers do not need to direct students through every step of the mathematical modeling process: giving them a heuristic that conveys the gist of each phase will generally suffice. Likewise, elementary science teachers can teach children the basics of experimental design prior to an inquiry, but the use of these skills is equally enhanced by introducing the variables under investigation one at a time. The deployment of such learning environments would help meet the need for more authentic school inquiry tasks that reflect the core attributes of scientific practice (Chinn & Malhotra, 2002b). Engaging in these tasks enables even the youngest learners to learn to understand and appreciate the methods and epistemology of scientific investigation which might eventually cause them to pursue a career in science or mathematics.
However, highly specific guidance is necessary when teachers want students to maximize their performance, for instance when a science project serves as a showcase for parents or when student products are submitted to a national contest. The moderating effect of guidance found in the second meta-analysis indicates that performance success increases more when learners receive more specific guidance. Although the tentative nature of the conclusions implies that this recommendation may not apply to all six types of guidance, the mean effect sizes seem to suggest that explaining a yet unmastered inquiry skill in full enhances performance more than does any of the other, less specific forms of guidance.
Finally, teachers do not have to take the age of their students into account in selecting a particular type of guidance. The conclusions and recommendations presented thus far all apply to children, teenagers, and adolescents. In the absence of any consistent age-related differences, teachers can base their choice of guidance on other factors such as the learners’ topical knowledge or familiarity with the inquiry skills, and the teacher–student ratio. As a result, elementary school teachers may decide to give children a simple prompt instead of extensive explanations, and teachers in middle and high school classes can support their students through specific directions or a mini lesson at the start of an inquiry. The decision to increase or decrease the specificity of the guidance learners receive rests on factors other than the ones included in these meta-analyses.
Limitations
Any statistical research integration inherently has limitations, and the present meta-analysis is no exception. A common problem is that some well-designed empirical studies report incomplete statistical information, and for that reason six studies had to be excluded from the present meta-analysis. In addition to reducing statistical power, this exclusion also decreased the number of second-tier journal articles and conference papers, which were already underrepresented in the sample. This in turn muddies the waters concerning issues of publication bias in that informative studies had to be discarded for “technical” reasons rather than nonsignificant findings.
A related limitation concerns the relatively low number of primary studies in the meta-analyses of learning activities and performance success. This complicated the analysis of possible specificity effects of guidance on performance success and might explain why age-related differences in learning activities failed to show. The limited set of studies also prompted a more lenient stance toward handling multiple comparisons. Many dependencies were controlled for, for example by extracting only one dependent variable and one treatment group from each study and examining their effects in a separate meta-analysis for each outcome measure, but some studies were included in two or three meta-analyses. Although the amount of overlap was acceptable (72% of the studies were included in just one meta-analysis), independence was not complete.
Another issue concerns the conduct of multiple moderator analyses on the same, small set of studies. To control the familywise Type I error rate, omnibus tests of heterogeneity preceded planned comparisons, and Hochberg’s step-up procedure was used where appropriate to ensure that the overall likelihood of a Type I error would not exceed 5%. Although Hochberg’s procedure is less well-known than, for example, a Bonferroni correction, it is also less conservative and thus retains more power to detect true effects. Still, any adjustment for multiple comparisons increases the probability of producing false negatives, and hence reduces statistical power. This was perhaps most apparent in the moderator analysis of learning outcomes, where the effect of the type of publication turned statistically nonsignificant after adjustment. A practical yet improper solution would be to reduce the number of moderators (even after the fact) so that maximum statistical significance is maintained. A more sincere solution, which was used here, is to present enough information for anyone to evaluate the data in context.
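The difference between the two corrections can be illustrated with a minimal sketch. The code below is not the analysis script used for this meta-analysis; the function names and the p-values in the demonstration are hypothetical and serve only to show how a step-up rule operates. Hochberg’s procedure works from the largest p-value downward, comparing the k-th largest p-value with alpha / k, and rejects that hypothesis together with all hypotheses with smaller p-values as soon as one comparison succeeds. A Bonferroni correction compares every p-value with alpha / m, where m is the number of comparisons.

```python
from typing import List


def hochberg_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag which hypotheses Hochberg's step-up procedure rejects.

    The returned list is in the same order as the input p-values.
    """
    m = len(p_values)
    # Indices ordered from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    reject = [False] * m
    for step, idx in enumerate(order):
        # The largest p-value is compared with alpha / 1, the second
        # largest with alpha / 2, and so on down to alpha / m.
        if p_values[idx] <= alpha / (step + 1):
            # Reject this hypothesis and all with smaller p-values.
            for j in order[step:]:
                reject[j] = True
            break
    return reject


def bonferroni_reject(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag which hypotheses a Bonferroni correction rejects."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for four planned comparisons (illustration only).
    demo = [0.010, 0.020, 0.030, 0.040]
    print("Hochberg:  ", hochberg_reject(demo))
    print("Bonferroni:", bonferroni_reject(demo))
```

With these illustrative values, the largest p-value (0.040) already falls below alpha = .05, so the step-up rule rejects all four hypotheses, whereas the Bonferroni threshold of .0125 retains only the smallest p-value. Both procedures keep the familywise error rate at or below alpha under their respective assumptions, which is why the step-up rule is described as the less conservative of the two.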
Finally, the scope of this meta-analysis was limited to disciplines subsumed under math and science education. In view of the recent emphasis on STEM education, it might seem like a missed opportunity to exclude studies in technology and engineering education. Yet this was a deliberate choice made for practical reasons because the omitted disciplines are not generally taught in K-12 classes. This decision nevertheless restricts the generalization of the findings to STEM education as a whole.
Conclusion
This synthesis of 72 empirical studies demonstrates that guidance is pivotal to successful inquiry-based learning. Learners who are given some kind of guidance act more skillfully during the task, are more successful in obtaining topical information from their investigational practices, and score higher on tests of learning outcomes administered after the inquiry. These benefits are largely independent of the specificity of the guidance: even though performance success tends to increase more when more specific guidance is available, learning activities and learning outcomes improve as much with nonspecific as with specific types of guidance. The effectiveness of guidance applies equally to children, teenagers, and adolescents, which offers educational designers and K-12 math and science teachers a wide choice of opportunities to effectively involve learners in forms of inquiry-based learning.
Authors
ARD W. LAZONDER is an adjunct professor of education, Department of Instructional Technology, University of Twente, PO Box 217, 7500 AE Enschede, the Netherlands; e-mail:
RUTH HARMSEN is a PhD student at the Department of Teacher Education, Faculty of Behavioral and Social Sciences, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, the Netherlands; e-mail:
