Abstract
Scientific reasoning abilities are already developing in elementary-school-aged children and enable them to understand the world around them. The goal of the current study was to develop a new instrument for 8- to 10-year-old children in Grades 3 and 4 to measure their understanding of the steps of the scientific inquiry cycle (SIC). Such an understanding is essential for scientific reasoning as well as for inquiry-based learning approaches and, above all, for scientific practice. We developed and applied 15 items in a sample of 878 third- and fourth-grade students (n = 434, Grade 3; n = 435, Grade 4). As confirmed by item response theory (IRT) modeling, the items produced reliable scale scores. Furthermore, we explored the relations between children’s SIC performances, cognitive abilities, and epistemic beliefs. As expected, intelligence, text comprehension, experimentation strategies, and sophisticated epistemic beliefs were positively associated with children’s SIC performance, a finding that helps to establish initial evidence for the construct validity of the SIC test scores.
A major goal of science is knowledge seeking, and scientific reasoning encompasses the abilities to generate, test, and evaluate hypotheses, theories, and data and to reflect on this process (Kuhn, 2011; Morris, Croker, Masnick, & Zimmerman, 2012). Scientists and experts need scientific reasoning to be able to draw adequate conclusions in their research fields, and laypersons need it to expand their knowledge of the world. Even in elementary school, children are already beginning to think scientifically (e.g., Kuhn, 2011; Ryu & Sandoval, 2012; Zimmerman, 2007). This means, for instance, that they ask questions about the nature around them or can construct a simple conclusive experiment (e.g., Bullock, Sodian, & Koerber, 2009; Kuhn, 2011; Sodian, Zaitchik, & Carey, 1991). Scientific reasoning is believed to guide children’s information seeking processes in scientific disciplines and to facilitate their general understanding of the world (Kuhn, 2011). It supports conceptual change as well as the development of children’s personal epistemology (see Kuhn, 2011; Morris et al., 2012; Osborne, 2013). Due to the great importance of scientific reasoning for acquiring knowledge about the surrounding world, national and international education standards have identified scientific reasoning as a normative goal of students’ science education (National Research Council, 1996, 2011; Organisation for Economic Co-Operation and Development [OECD], 2016).
The scientific inquiry cycle (SIC) is a core element of scientific reasoning (e.g., Klahr & Dunbar, 1988; Kuhn, 2011; White, Frederiksen, & Collins, 2009). Although scientific procedures can differ in many ways, a simplified version of the SIC can be subsumed under the interrelated and iterative steps of theorizing, questioning, hypothesizing, investigating, analyzing, and synthesizing (White et al., 2009; Zimmerman, 2007). The understanding of the sequence of these steps is essential for inquiry-based science learning approaches as well as for scientific reasoning and argumentation and, above all, for scientific practice (Colburn, 2000; Kuhn, 2010; Kuhn & Dean, 2005; Lehrer & Schauble, 2015). Furthermore, from the perspective of the philosophy of science, it can be viewed as the initial core of empiricism and empirical research (e.g., Friedman, 2006; Popper, 1935).
At the intersection of cognitive development and science education, instruments for investigating scientific reasoning skills are required to describe children’s competencies or to measure their progress in science learning. So far, only a few paper-and-pencil tests have been designed to measure the components of scientific reasoning in elementary school children 8 to 10 years of age (e.g., Koerber, Mayer, Osterhaus, Schwippert, & Sodian, 2015; Mayer, Sodian, Koerber, & Schwippert, 2014). Because this is a sensitive phase in children’s cognitive development moving from a rather naïve to more advanced conceptions in science (e.g., Bullock et al., 2009; Kuhn, 2011; Kuhn & Weinstock, 2002), appropriate instruments are needed to assess children’s competencies and learning progress.
The goal of the present study was to develop a new paper-and-pencil test for elementary school children to assess a core component of scientific reasoning, namely, the understanding of the sequence of the single components of the SIC, which forms the basis of the hypothetical-deductive approach to science (experimentalism). The instrument consists of 15 items (the exact structure of the SIC test will be described in the “Method” section) that were applied in a sample of 878 third- and fourth-grade students and scaled using item response theory (IRT) modeling in Mplus 7.31 (Muthén & Muthén, 1998-2012). To investigate the construct and criterion validity of the scores from the instrument, we explored the latent factor structure of the items, and the relations between children’s SIC performance and their cognitive abilities as well as epistemic beliefs in the domain of science. The SIC test can be used, for instance, to assess students’ science competencies outside the regular school curriculum or to evaluate science courses and enrichment programs for gifted or talented children (see Schiefer, Golle, Tibus, Trautwein, & Oschatz, 2017).
Conceptual Framework for Scientific Reasoning
Scientific reasoning includes “the skills involved in inquiry, experimentation, evidence evaluation, and inference that are done in the service of conceptual change or scientific understanding” (Zimmerman, 2007, p. 172). It involves a range of cognitive and metacognitive skills and is considered a cumulative and cyclical process that requires the coordination of theory and evidence (Kuhn, 2011; White et al., 2009). The goal of this cyclical process is to acquire knowledge or to produce change in already existing knowledge (see Kuhn, 2011). Scientific reasoning encompasses the ability to generate, test, and revise theories and hypotheses and to reflect on this process (Kuhn & Franklin, 2006; Zimmerman, 2007). Thus, scientific reasoning is closely related to the broader construct of epistemic cognition, which can be defined as cognitive processes focused on epistemic issues (Greene, Sandoval, & Bråten, 2016; Sinatra, Kienhues, & Hofer, 2014) or “the thinking that people do about what and how they know” (Sandoval, Greene, & Bråten, 2016, p. 457). Epistemic cognition is believed to drive and guide scientific reasoning processes in the context of knowledge seeking and evidence evaluation.
The SIC as a Core Element of Scientific Reasoning and Experimentalism
A simplified version of the SIC includes the following steps, which correspond to the core features of inquiry-based learning and hypothetical-deductive approaches (Pedaste et al., 2015): (a) the generation of hypotheses on the basis of a specific research question (derived from theory or the results of previous research), (b) the planning and conducting of experiments, (c) data collection, (d) analysis, (e) the evaluation of evidence, and (f) the drawing of inferences. The SIC subsumes all individual components of scientific reasoning from a metaperspective and emphasizes a holistic view of scientific inquiry as the components build the basis for the cumulative and cyclical process of knowledge acquisition and change (Kuhn & Franklin, 2006; Zimmerman, 2007).
The SIC has served as a theoretical framework for scientific reasoning models, for example, the scientific discovery as dual search (SDDS) model by Klahr (2000) or the model of the scientific inquiry process by White and Frederiksen (1998). It can even be considered the general core of empiricism (e.g., Friedman, 2006; Popper, 1935), that is, all kinds of empirical research involving a hypothesis-driven approach (see Lehrer & Schauble, 2015; McComas, 1998). Therefore, the SIC is an effective initial model that can enable students to develop the abilities to engage in inquiry and to understand its constituent processes (White & Frederiksen, 1998, 2005).
The SIC comprises an epistemic perspective because the goal of the inquiry phases is not the separate collection of knowledge but the ongoing and continuing generation, testing, and revision of theories and hypotheses in science (Furtak, Seidel, Iverson, & Briggs, 2012; Osborne, 2013). Knowledge of the components of the SIC is an important basis for inquiry and scientific practice but only when the components are considered in a logical order and in relation to each other (Lehrer & Schauble, 2015). They are always interrelated when investigations are conducted, for instance, “it makes little sense to plan and conduct an investigation unless there is a driving question that can, in turn, be addressed by the data that the investigation yields” (Lehrer & Schauble, 2015, p. 6).
All of the steps of the SIC are arranged to represent a cycle, but because inferences from an experiment lead mostly to new research questions or hypotheses and the beginning of a modified inquiry process, they correspond more closely to a spiral (see Figure 1). It should be noted, however, that the exact sequence of the steps differs in the literature, and mature scientific inquiry does not necessarily proceed in the stepwise manner that is postulated (e.g., it is possible to start anywhere in the cycle, and there might be various paths; see Pedaste et al., 2015). Scientists do not necessarily proceed through these steps of inquiry in a fixed order; for instance, “analyzing data can lead to the need to do further investigation” (White et al., 2009, p. 9). This means, for example, that scientists might have to reanalyze their data or change a component of their experiment before they can go on to interpret the results or draw conclusions. Depending on the research traditions and methodology of different scientific disciplines, there might also be deviations from these steps in practice (see Pedaste et al., 2015). Nevertheless, a consistent approach can clearly be defined, and the SIC as displayed in Figure 1 is an effective analytical model that students can use to develop the skills needed for inquiry and an understanding of its constituent processes (White & Frederiksen, 1998, 2005). By following this approach, students can learn that their own process of collecting, evaluating, and interpreting evidence is similar to the practice of real scientists (e.g., Bell, Lederman, & Abd-El-Khalick, 1998) and that scientific knowledge is subject to change (Furtak et al., 2012). Furthermore, this model represents the theory-driven deductive approach that is accepted and usually applied by scientists in empirical investigations (Lehrer & Schauble, 2015; McComas, 1998; White et al., 2009), for instance, when they test a new hypothesis by means of scientific inquiry.

Figure 1. Steps of the scientific inquiry cycle (SIC).
So far, the individual steps in the inquiry cycle have usually been investigated and assessed independently of one another (e.g., Kuhn, 2007; Piekny & Maehler, 2013) even though they are interdependent and highly interrelated (Lehrer & Schauble, 2015; Wilhelm & Beishuizen, 2003). It can be assumed that the individual components (e.g., designing experiments, controlling variables, interpreting data) can be demonstrated or trained independently of one another. However, being competent in the individual components without sound knowledge of the entire process is not sufficient to conduct targeted empirical research or to reflect on this research (see Kuhn & Franklin, 2006). This strengthens the central role of the understanding of the sequence of the SIC as a prerequisite for a metaperspective on scientific reasoning.
Scientific Reasoning in Elementary School Children
Traditionally, developmental psychologists have presumed that scientific reasoning abilities develop with age and become more complex as children get older. At the end of elementary school, most children have reached the late concrete-operational level or even the early formal-operational level in their cognitive development (Inhelder & Piaget, 1958). Developmental research across the last 20 years has provided evidence that scientific reasoning abilities already exist in elementary-school-aged children (e.g., Bullock et al., 2009; Kuhn, 2011; Morris et al., 2012; Zimmerman, 2007). Although elementary school children have trouble systematically designing controlled experiments, drawing appropriate conclusions on the basis of evidence, and interpreting evidence in general (Morris et al., 2012; Zimmerman, 2007), they do possess basic scientific reasoning skills. They are able to differentiate hypotheses from evidence, they can distinguish between a conclusive and an inconclusive experimental test, and they do not confound the testing of hypotheses with the production of positive effects (e.g., Bullock et al., 2009; Sodian et al., 1991). Furthermore, elementary school children can recognize that exposure to different information may lead to different knowledge claims (e.g., Carpendale & Chandler, 1996), and they view investigation and testing as central to science (Kittleson, 2011).
Elder (2002) concluded that students at the end of elementary school have a mixture of naïve and sophisticated understandings of science and scientific inquiry: On the one hand, children tend to regard scientific knowledge as a developing, changing construct that is created by reasoning and testing. On the other hand, they display naïve notions of science as a mere activity rather than as directed by aims to explain phenomena in the world. In sum, elementary school can be considered a sensitive phase in children’s development because the children move from a rather naïve to a more sophisticated understanding of science and inquiry during this time (e.g., Koerber et al., 2015; Kuhn & Weinstock, 2002). This strengthens the possibility that children between the ages of 8 and 10 might understand a simplified version of the (complex) scientific inquiry process.
Measuring Scientific Reasoning and the Need for a New Instrument
A variety of task formats have been used to assess children’s scientific reasoning and inquiry skills, including interviews, scenario-based instruments, think-aloud protocols, self-directed experimentation tasks, simulations, or story problems (e.g., Bullock & Ziegler, 1999; Carey, Evans, Honda, Jay, & Unger, 1989; Dunbar & Klahr, 1989; Kuhn et al., 1995; Mason, 2016; Schauble, 1996). However, only a few questionnaires and paper-and-pencil tests have been developed (e.g., Koerber et al., 2015; Mayer et al., 2014). Even though paper-and-pencil tests have been criticized (e.g., due to their limited validity or their limits regarding the assessment of qualitative processes; see Mason, 2016), they offer the simplest and most efficient way to measure abilities in group settings (e.g., school classes, science interventions). Educational research and practice have progressively focused on the development of questionnaires ever since national and international large-scale studies such as PISA (Programme for International Student Assessment) or TIMSS (Trends in International Mathematics and Science Study) have increased in importance (Mullis & Martin, 2013; OECD, 2016). However, developing paper-and-pencil measures to assess elementary school children’s scientific reasoning skills in a group-testing situation is challenging due to, for example, children’s limited reading capacities or the great effort required to develop instruments that produce test scores with good psychometric qualities. This might explain the apparent lack of instruments.
Nevertheless, a group of researchers recently developed a paper-and-pencil instrument for fourth graders for assessing different components of scientific reasoning (the goals of science, theories and alternative frameworks, using experimentation strategies, experimental design, and data interpretation; see Koerber et al., 2015, based on previous work by Mayer et al., 2014). The authors combined these components into a unitary scientific reasoning scale that could be separated from intelligence or general processing skills (Koerber et al., 2015). The items referred to specific steps within the SIC, and therefore, this instrument could not be used to assess children’s understanding of the sequence of the individual steps of the SIC. Such an understanding of the sequence is considered an important prerequisite for understanding the entire inquiry process (see Lehrer & Schauble, 2015; White et al., 2009; Zimmerman, 2007). We assume that an understanding of the steps is important because the ability to sequence all of the steps of the cycle in the typical order provides some indication of an individual’s overall ability to plan and monitor the entire process of scientific inquiry. It would be rather unproductive to carry out single steps of the cycle without the ability to embed those steps in the whole inquiry process (e.g., designing an experiment without deriving a hypothesis first or without knowing how to interpret the results; see White et al., 2009). Therefore, we assume that learners need to build an initial model of the scientific inquiry process, which might allow the coordination of a deeper understanding of the processes involved in each of the individual steps of the cycle. In contrast to existing measures, we took a holistic approach to scientific reasoning and assessed the understanding of the steps of the entire inquiry process rather than specific components within the SIC.
Validity of the Scores From a New Instrument
To determine the construct validity of the scores from a new instrument for measuring the SIC, the use of already existing scientific reasoning questionnaires is recommended (Kline, 2015). As described above, there are hardly any scientific reasoning instruments for elementary school children that can be applied in group testing situations. The recently developed scientific reasoning scale (Koerber et al., 2015; Mayer et al., 2014) had not been published when we designed our study and was not available for the validation of the SIC test scores. Although no comprehensive scientific reasoning test was available for validation, commonly used scientific reasoning tasks that measure single aspects of scientific reasoning could be applied to validate the SIC item scores. As research has often focused on experimentation strategies (see Chen & Klahr, 1999; Koerber et al., 2015; Zimmerman, 2007), we assessed such strategies alongside the new instrument to ensure convergent validity. Because an understanding of the SIC and an understanding of experimentation strategies are associated with the scientific inquiry process, we expected a positive relation between the constructs.
Relations to Cognitive Abilities
To develop a measure of the understanding of the SIC that would produce valid scores, it is necessary to distinguish this competence from general cognitive abilities such as intelligence or reading skills (see Mayer et al., 2014), which are assumed to be necessary for the processing of (text-based) assessment tasks. There is evidence that scientific reasoning is positively related to cognitive abilities but can be measured separately (Bullock et al., 2009; Mayer et al., 2014). To determine the relations between SIC test performance and cognitive ability scores, we assessed intelligence and reading comprehension as covariates and expected positive relations between the scores on these constructs and the SIC scores (see Koerber et al., 2015; Mayer et al., 2014).
Relations to Epistemic Beliefs
Besides cognitive abilities, an understanding of the epistemic features of science is essential for scientific reasoning and inquiry (Greene et al., 2016; Kuhn, 2011; Osborne, 2013; White et al., 2009). This underlines the close interrelation of scientific reasoning and epistemic beliefs (Greene et al., 2016). Epistemic beliefs are subjective beliefs about the nature of knowledge (what one believes knowledge is) and the nature of knowing (beliefs about the process through which one comes to know) in science (see Hofer & Pintrich, 1997; Lederman, 2007). Such beliefs drive our metacognitive and cognitive processing in the context of scientific reasoning (Greene et al., 2016; Hofer & Pintrich, 1997). In line with Conley, Pintrich, Vekiri, and Harrison’s (2004) conceptualization, we distinguish here between the four dimensions of source, certainty, development, and justification of knowledge. The source dimension addresses beliefs about the knowledge that resides in external authorities. Sophisticated stances include critically evaluating and not having “blind faith” in external authorities such as teachers. The certainty dimension reflects beliefs about the (un)changeability of knowledge in the natural sciences. Sophisticated stances include statements about the possibility of change, the further development of scientific knowledge, and a variety of answers to complex problems. The development dimension is associated with beliefs that recognize science as an evolving subject. Sophisticated stances include statements about how scientific ideas are continuously changing (e.g., due to new discoveries or data). Finally, the justification dimension refers to the role of experiments and how students evaluate claims. Sophisticated stances include justified judgments and the acceptance of a variety of explanations for scientific phenomena (Schraw & Sinatra, 2004).
In line with the assumed interdependence of scientific reasoning and epistemic beliefs (see Osborne, 2013), we investigated relations between the SIC and all the dimensions of epistemic beliefs in the present study. Regarding the inquiry cycle, we expected that, in particular, sophisticated epistemic beliefs about the certainty, development, and justification of knowledge would be positively associated with the understanding of the SIC because these dimensions refer to science as a changing and reversible discipline (Conley et al., 2004). Sophisticated stances in these dimensions might be a prerequisite for the understanding that scientific knowledge is subject to change due to the cyclical and cumulative phases of the SIC (inquiry, analysis, inference, and argument). The source dimension, however, refers instead to critical evaluations of external authorities or experts and might therefore be less associated with the inquiry process.
The Present Study
Our goal with the present study was to develop a new paper-and-pencil instrument that meets the criteria for objectivity, reliability, and validity and is appropriate for children in Grades 3 and 4. The instrument was aimed at measuring children’s understanding of the sequence of the steps of the SIC. Existing instruments have often focused on single aspects within the SIC such as experimentation strategies or on the evaluation of evidence (e.g., Chen & Klahr, 1999; Koerber et al., 2015; Koslowski, 1996). Therefore, the SIC test provides a fruitful addition to the field as it takes into account the relations between the steps and their order—which mark the basis of the hypothetical-deductive scientific approach.
To ensure the objectivity of the SIC test (which is a major requirement for analyzing further quality criteria; see Kline, 2005), a standardized manual was developed. To ensure its content and face validity, the newly developed instrument was discussed with distinguished experts in the field of scientific reasoning prior to scaling.
Two hypotheses guided our study. First, we hypothesized that the items we developed would produce reliable scale scores and would measure elementary school children’s understanding of the SIC (Hypothesis 1). To investigate the divergent validity of the test scores, we investigated how children’s performance on the SIC test was related to their intelligence and reading comprehension. To contribute to the convergent validity of the SIC test scores, we analyzed how SIC test performance was related to experimentation strategies and epistemic beliefs in the domain of science. Second, we hypothesized that individual differences in SIC performance would be positively related to intelligence, reading skills, experimentation strategies, and sophisticated epistemic beliefs in the domain of science (Hypothesis 2).
Method
Participants and Experimental Design
The current study was based on data from 878 elementary school children in the third (n = 434) and fourth grades (n = 435; age: M = 8.89, SD = 0.76; 57.4% boys; see Table 1 for sample demographics by grade level). They were from equivalent public elementary schools in urban areas in southwest Germany. Data about socioeconomic status and ethnicity were not collected in the present study. In a cross-sectional design, the SIC test and the validation instruments were first administered in 42 classes from 10 elementary schools (n = 681) by applying a rotational design with three versions of the questionnaires (all contained the SIC test and a combination of the following instruments: fluid and crystallized intelligence, text comprehension, epistemic beliefs, and experimentation strategies; see Table 2 for details). Such multiple-booklet designs are a common procedure in the context of large-scale assessments (see Frey, Hartig, & Rupp, 2009). This method allows a variety of constructs to be assessed in a representative manner without requiring any single child to answer the entire item set (see Koerber et al., 2015). Within the participating schools, the booklets were randomly assigned to classrooms. To increase the sample size and to investigate how well the SIC test discriminated between samples with different abilities, the SIC test was additionally administered in 36 extracurricular science, technology, engineering and mathematics (STEM) courses (n = 197) for third and fourth graders. Prior to testing, we obtained parents’ written consent for their child’s participation. The study was approved by the ethics committee of the university’s Faculty of Economics and Social Sciences (Approval Number AZ.: A2.5.4.-038_sn) and the Ministry of Education and the Arts (Approval Number 31-6,499.20/875). Each measure was administered in a group testing situation by a trained instructor.
Data collection in schools took 90 min and included the assessment of the SIC test and the validation instruments (see Table 2). For organizational reasons, data collection in STEM courses was limited to 45 min and therefore included only the assessment of the SIC test.
Table 1. Description of the Sample (by Grade Level).
Note. For n = 9 children, no information about the grade level was provided.
Table 2. Overview of the Sample (N = 878) and the Multiple-Booklet Design.
Note. n = number of students who got Booklets A, B, or C. N = total number of students who were tested in school classes or STEM courses. The instruments were arranged in the booklets in the presented order (from left to right). The Booklets A, B, and C were randomly assigned to the school classes. SIC = scientific inquiry cycle; STEM = science, technology, engineering and mathematics.
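The random assignment of booklets to classrooms within schools can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `assign_booklets`, the school and class identifiers, the seed, and the balanced rotation through Booklets A, B, and C are hypothetical and are not taken from the study, which reports only that booklets were randomly assigned to classrooms within schools.

```python
import random
from typing import Dict, List

# The three booklet versions named in the multiple-booklet design.
BOOKLETS = ("A", "B", "C")


def assign_booklets(classes_by_school: Dict[str, List[str]], seed: int = 0) -> Dict[str, str]:
    """Assign one booklet to every class, randomizing within each school.

    Assumption (not stated in the source): classes are shuffled and then
    rotated through the booklets so that versions are spread evenly.
    """
    rng = random.Random(seed)  # fixed seed makes the assignment reproducible
    assignment: Dict[str, str] = {}
    for school, classes in classes_by_school.items():
        shuffled = list(classes)
        rng.shuffle(shuffled)  # random order of classes within the school
        start = rng.randrange(len(BOOKLETS))  # random starting booklet
        for i, class_id in enumerate(shuffled):
            assignment[class_id] = BOOKLETS[(start + i) % len(BOOKLETS)]
    return assignment


# Hypothetical identifiers, for illustration only.
example = {"school_1": ["1_3a", "1_3b", "1_4a"], "school_2": ["2_3a", "2_4a"]}
print(assign_booklets(example, seed=1))
```

A rotation of this kind keeps the booklet versions roughly balanced within each school, which is one common way to implement such a design; simple independent random draws per class would be an equally valid reading of the description.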
Instruments
SIC test
The instrument focused on the assessment of the understanding of the typical order of the steps of the complete SIC. The developed tasks required (a) the active reconstruction of the sequences of all steps of the SIC and (b) an understanding of the consecutive next steps of the cycle within a given inquiry process (see White & Frederiksen, 1998; White et al., 2009). We believe that an understanding of all steps is required for an understanding of the complete inquiry cycle and that partial solutions do not indicate an understanding of the SIC. Furthermore, all of the steps are related to each other (Klahr & Dunbar, 1988; Wilhelm & Beishuizen, 2003), corresponding to a holistic approach to the inquiry process. Therefore, we postulated that the understanding of the SIC would be represented as a unidimensional construct.
The SIC items were developed according to generally recommended procedures (see Downing & Haladyna, 2006). Prior to application, items were repeatedly discussed with four distinguished experts in the field of scientific reasoning with respect to elementary-school-aged children. The experts gave feedback on the content validity of the test, the item format (i.e., arrangement and number of steps in the SIC), and the examples that were used (i.e., the choice of the research topics that were used). The practicability and comprehensibility of the items were tested in a pilot phase with N = 10 third and fourth graders. Subsequently, 15 of the 22 items were selected and revised. We administered the items again to three children who were asked to use think-aloud techniques during processing (to detect possible problems in the children’s understanding of the instructions or in the handling of the presented materials; see Fonteyn, Kuipers, & Grobe, 1993).
The final SIC test consisted of 15 items that were dichotomously scored (0 = wrong answer, 1 = correct answer). Although there might be variations from the concrete order of the steps of the SIC in practice, correct and consistent steps can be clearly defined (see Pedaste et al., 2015). Two different response formats were used to assess the understanding of the inquiry process: (a) sorting the given steps of the SIC into the right order (three items, presented first, see Figures 2 and 3 for examples) and (b) selecting the correct subsequent step after a situation in the SIC is described (12 items, presented second, see Figure 4 for an example). Items were presented in a domain-general everyday-life context because it was essential to ensure that the children’s ability to answer the items did not require specific science content knowledge or knowledge about specific methods in different science disciplines. This was important because the goal was to design an instrument that could be used to assess the understanding of the SIC in third- and fourth-grade children who have not yet taken specific science lessons in school.
a. Three items required the sorting of the single steps of the inquiry cycle via printed labels (Step 1: finding the research question; Step 2: generating hypotheses; Step 3: planning an experiment; Step 4: conducting an experiment/collecting data; Step 5: analyzing results; Step 6: making inferences). Each of these three tasks required the active reconstruction of a different inquiry process. The respective research topic was introduced to the children in a short paragraph (e.g., “Tom wants to find out whether his new pet has a sensitive sense of smell. How can he investigate this like a scientist?”). Two out of three items were presented in a concrete everyday context (i.e., pet’s sense of smell, freezing of juice); one item was presented in a general context (“How do scientists proceed when they want to investigate something?”). This was intended to include tasks that covered the general understanding of the inquiry process as well as its application. The order of the presentation of the respective inquiry steps was selected at random. To compensate for possible differences in reading abilities, the test instructors read the six single inquiry steps aloud to the children. Afterward, the children were given printed labels that contained the six inquiry steps that were previously read to them. They had to put the steps in the right order by sticking the steps in their questionnaire (see Figure 3). The starting point—finding a research question—was given to the children. Only completely accurate solutions were counted as correct because partial solutions did not indicate an understanding of the entire SIC.
b. Twelve items were single-choice items, and the children were asked to select the respective next step in the inquiry cycle within a given inquiry process. The first six items referred to a concrete research topic (e.g., “Mr. Abendstern is a famous scientist and knows exactly how a scientist has to work. He is interested in what causes tooth decay and wants to find out more about it”). The second six items referred to a general context (e.g., “Mrs. Morgenstern is a famous scientist and knows exactly how a scientist has to work”—without specifying a research topic; see Figure 4). The children were told that they would be asked about the different working steps of the scientists. The working steps were presented in a random order rather than in the order in which a scientist would carry them out. Every page of the questionnaire contained only one item, and the children were not allowed to move backward through the questionnaire to correct answers they had already given. This procedure was chosen to avoid dependencies and cross-links between the children’s answers. Each of the questions referred to one of the six steps of the inquiry cycle, and the children were asked what the scientist would do next (e.g., Mrs. Morgenstern has a hypothesis she wants to check. What is her next working step? [a] She performs an experiment, [b] She plans an experiment to verify her hypothesis, [c] She evaluates the results of her experiment). The response options referred either to the next step in the inquiry cycle (correct answer: [b] in this example) or to two randomly selected other steps (wrong answers: distractors [a] and [c] in this example; see Figure 4). The complete list of test items will be provided by the authors upon request.
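The dichotomous scoring rules described above for the two item formats can be sketched as follows. This is a minimal illustration, not the authors' scoring syntax: the function names and the string labels are hypothetical, and only the rules themselves (all-or-nothing credit for the sorting items, one keyed option for the single-choice items) come from the text.

```python
from typing import Sequence

# Canonical order of the six inquiry steps (Steps 1 to 6 as listed in the text).
SIC_STEPS = (
    "finding the research question",
    "generating hypotheses",
    "planning an experiment",
    "conducting an experiment/collecting data",
    "analyzing results",
    "making inferences",
)


def score_sorting_item(response: Sequence[str]) -> int:
    """Return 1 only if all six steps are in the canonical order, else 0.

    Partial solutions earn no credit, mirroring the all-or-nothing rule.
    """
    return int(tuple(response) == SIC_STEPS)


def score_choice_item(chosen: str, keyed: str) -> int:
    """Return 1 if the selected option is the keyed next step, else 0."""
    return int(chosen == keyed)


# A response with Steps 3 and 4 swapped is scored as wrong.
swapped = list(SIC_STEPS)
swapped[2], swapped[3] = swapped[3], swapped[2]
print(score_sorting_item(SIC_STEPS), score_sorting_item(swapped))

# The Mrs. Morgenstern example: option "b" (planning the experiment) is keyed.
print(score_choice_item("b", "b"), score_choice_item("a", "b"))
```

Scoring the sorting items all-or-nothing means that a single misplaced label yields a score of 0, which is consistent with the stated view that partial solutions do not indicate an understanding of the entire cycle.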

Example Item 1 from the SIC test (Sorting Task Part 1, everyday problem).

Example Item 1 from the SIC test (Sorting Task Part 2).

Example Item 4 (single-choice item, general problem).
To ensure the objectivity of the implementation and the analysis of the SIC test, a course manual was developed (see Kline, 2015). It included well-formulated instructions for the test administrators and the children. This means that the administrators were given word-for-word instructions to ensure that the test was implemented similarly in all schools and that there were no differences due to the test administrators. Prior to testing, all the research assistants participated in a half-day training where they were instructed in how to implement the test, for instance, how to deal with the test items and materials, the instructions, and the manual. To ensure the objectivity of the analysis of the test as well, there were clear instructions, guidelines, data masks, and syntaxes (e.g., in SPSS and Mplus) for the coding, evaluation, and interpretation of the answers.
Reading comprehension
Reading comprehension was assessed with the “text comprehension” subtest of the standardized German reading proficiency test ELFE 1-6 (Ein Leseverständnistest für Erst- bis Sechstklässler [a reading comprehension test for first to sixth graders]; Lenhard & Schneider, 2006). This subtest measures reading comprehension at the text level. Each of the 20 single-choice items consisted of a small section of text, a question, and four answer alternatives. Children had to choose between the right answer and three distractors. All answers were explicitly mentioned in the text. Children had 7 min to read the texts and answer the items. Sum scores were used in further analyses (Cronbach’s α = .87; all reliabilities were calculated from the current sample).
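For readers who want to see how an internal consistency coefficient such as the one reported above is obtained, the following is a minimal sketch of the standard Cronbach's α formula, using only the Python standard library. The function name and the input layout (one row per child, one column per item) are assumptions for illustration; the original analyses were run in statistical software, not with this code.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a children x items matrix of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of sum scores)
    Note: the formula is undefined if all sum scores are identical.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Two items that always agree yield α = 1, whereas two uncorrelated items yield α = 0, which is why short scales with heterogeneous items tend to produce low values.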
Intelligence
Intelligence was measured with an age-adapted version of the BEFKI-short (Berliner Test zur Erfassung fluider und kristalliner Intelligenz [Berlin test of fluid and crystallized intelligence]; Schroeders, Schipolowski, Zettler, Golle, & Wilhelm, 2016). The first subscale, Fluid Intelligence, consisted of 16 items. Within a time limit of 15 min, children had to select two figures that completed a series of figural patterns (α = .67). An example item from the fluid scale can be found in Figure 5. The second subscale, Crystallized Intelligence, consisted of 16 single-choice items and included questions about general knowledge (e.g., “What is climate change?”). Within a time limit of 8 min, children had to choose one out of five answer alternatives (α = .64). Sum scores were calculated separately for the two subscales and used in further analyses.

Example item from the BEFKI fluid intelligence test (Schroeders et al., 2016).
Experimentation strategies
Experimentation strategies were assessed with six single-choice items with three answer alternatives (one correct, two misconceptions). The items focused on the control of variables strategy (Chen & Klahr, 1999; Zimmerman, 2007). As no published (German) test for assessing experimentation strategies exists, we used three items from research projects by Mayer et al. (2014) and developed three other items with the same format (following Mayer et al., 2014, and Ehmer, 2008; see Figure 6). The items were presented in everyday-life contexts designed to assess domain-general experimentation skills (Mayer et al., 2014). The items were scored dichotomously (1 = correct answer, 0 = wrong answer). The assumed unidimensionality of the scale was supported by a confirmatory factor analysis (CFA). Although the internal consistency was very low for this scale (α = .44), we used sum scores in further analyses because experimentation strategies are theoretically important for the SIC. We critically discuss the reliability of this scale score in our “Discussion” section.

Example item for assessing experimentation strategies.
Epistemic beliefs
We assessed science-related epistemic beliefs with a 26-item instrument (Conley et al., 2004, adapted from previous work by Elder, 2002, translated by Urhahne & Hopf, 2004). The four subscales included the dimensions: Source (five items, α = .59), Certainty (six items, α = .58), Development (six items, α = .56), and Justification of Knowledge (nine items, α = .65). Items were rated on a 4-point Likert-type scale and can be found in Table 3. The Source and Certainty scales were recoded so that for each of the scales, higher scores reflected more sophisticated beliefs (a higher negation of the source and the certainty items).
Items From the Questionnaire by Conley et al. (2004, p. 202f) for Assessing Epistemic Beliefs.
Note. The items from the dimensions source and certainty (–) had to be recoded because agreement points to less sophisticated epistemic beliefs. On the contrary, agreement with the items from the development and justification (+) dimensions indicates sophisticated beliefs.
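The recoding described in the note can be illustrated with a short sketch. It assumes that the 4-point ratings are coded 1 to 4, which is not stated explicitly in the text; the function name is hypothetical.

```python
def reverse_code(rating: int, scale_min: int = 1, scale_max: int = 4) -> int:
    """Reverse a Likert rating so that higher values reflect more sophisticated beliefs.

    Assumption: ratings are integers from scale_min to scale_max (here 1 to 4).
    """
    if not scale_min <= rating <= scale_max:
        raise ValueError(f"rating {rating} is outside {scale_min}..{scale_max}")
    return scale_min + scale_max - rating
```

With this coding, strong agreement (4) with a source or certainty item becomes 1, and strong disagreement (1) becomes 4.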
Statistical Analysis
Initial item analyses were computed to explore item characteristics (means, standard deviations, item selectivity). We applied IRT modeling in Mplus to scale children’s test scores with a two-parameter logistic model (2PL model) for dichotomous items (Birnbaum, 1968; Muthén & Muthén, 1998-2012). We used a maximum likelihood estimator with robust standard errors (MLR), which uses a numerical integration algorithm (Muthén & Muthén, 1998-2012). To correct for the clustering of the data (children nested in classes), we used type = complex for all analyses (Muthén & Muthén, 1998-2012). We applied confirmatory item factor analyses (IFAs) using structural equation modeling (WLSMV estimator) to test the model fit and the postulated unidimensional latent factor structure of the 2PL model (Birnbaum, 1968). We computed the comparative fit index (CFI), the Tucker–Lewis index (TLI), the root mean square error of approximation (RMSEA), the χ² statistic with its p value, and the χ²/df ratio (for recommendations for model evaluation, see Chen, Curran, Bollen, Kirby, & Paxton, 2008; Hu & Bentler, 1999; Schermelleh-Engel, Moosbrugger, & Müller, 2003). To investigate the relations between children’s performance on the SIC test (expected a posteriori [EAP] parameters) and intelligence, text comprehension, experimentation strategies, and epistemic beliefs (Hypothesis 2), we applied correlation and multiple regression analyses. All variables were z standardized prior to the analyses.
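As a reminder of what the 2PL model assumes, the sketch below implements the item response function in its common textbook form, with a discrimination parameter a and a difficulty parameter b. The function name is hypothetical, and estimation software may use an equivalent but differently parameterized form (e.g., slopes and thresholds), so the sketch illustrates the model rather than the Mplus setup.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model.

    theta: latent ability of the child
    a:     item discrimination (slope)
    b:     item difficulty (ability level at which P = .50)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

A child whose ability equals the item difficulty has a 50% chance of solving the item, regardless of discrimination. Constraining a to be equal across items yields the 1PL (Rasch) model as a special case.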
Missing data
In our study, there were hardly any missing values on the SIC items. Missing data ranged from 0% to 1.03%. Due to the multiple-booklet design, the amount of planned missing data on the other scales ranged from 33.2% to 34.5% (see Table 7 for the exact number of participants for each variable). We used the full information maximum likelihood approach implemented in Mplus to deal with missing data. To estimate the model parameters, this approach takes into account all variables from the respective models (see Schafer & Graham, 2002).
Results
Initial Item Analyses
The initial item analyses (means, standard deviations, selectivity) are presented in Table 4. Because our items were scored dichotomously (1 = correct, 0 = incorrect), the means of the initial item analyses represent the frequencies with which the correct solutions were identified (0.28 ≤ M ≤ 0.82). No item was solved by none or by all of the children; that is, the items were neither too difficult nor too easy (T. J. Kline, 2005). The items had sufficient variances (0.38 ≤ SD ≤ 0.50). Initial analyses of the item selectivity (correlations between the items and the test score) revealed that three items (Items 4, 8, and 12) had values close to zero. These items were excluded from further analyses and the subsequent scaling of the items. The excluded items were all single-choice items and referred to the steps “finding the research question” (Item 4) and “planning the experiment” (Items 8 and 12).
Results of the Initial Item Analyses (Sorted From Most Difficult to Least Difficult).
Note. r.cor = correlation between item and test score.
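The item statistics reported above follow directly from the 0/1 scoring. For a dichotomous item with proportion correct p, the standard deviation is approximately √(p(1 − p)), so means of 0.28 and 0.82 correspond to standard deviations of about 0.45 and 0.38, and no such item can exceed 0.50. The sketch below computes item means, standard deviations, and an item-rest correlation (the item correlated with the sum of the remaining items). The item-rest variant is an assumption chosen for illustration; it is not necessarily the exact selectivity index reported in Table 4, and the response matrix is made up.

```python
from math import sqrt
from statistics import mean, stdev


def dichotomous_sd(p: float) -> float:
    """Population SD of a 0/1 item with proportion correct p."""
    return sqrt(p * (1.0 - p))


def pearson(x, y):
    """Pearson correlation; undefined if either variable is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_statistics(scores):
    """scores: one row per child, one column per dichotomous item (0/1)."""
    totals = [sum(row) for row in scores]
    stats = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [t - x for t, x in zip(totals, item)]  # sum score without item j
        stats.append(
            {"mean": mean(item), "sd": stdev(item), "item_rest_r": pearson(item, rest)}
        )
    return stats


# Made-up responses of six children to three items, for illustration only.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [0, 1, 1], [0, 0, 1]]
toy_stats = item_statistics(toy)
```

Items whose item-rest correlation is close to zero contribute little to the common scale, which is the rationale for excluding such items before scaling.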
Scaling of the SIC Items
To address our first research question, we scaled the items with the Birnbaum (1968) measurement model as a 2PL model for dichotomous items. The data were also scaled with a one-parameter logistic (1PL; Rasch) model and a 3PL model. The model fit criteria are summarized in Table 5. We decided to use the 2PL model for theoretical reasons, clear parameter interpretation, and parsimony (Maris & Bechger, 2009; San Martín, González, & Tuerlinckx, 2015; see von Davier, 2009).
Model Fit Information for the 1PL (Rasch), 2PL, and 3PL Models.
Note. 1PL = one-parameter logistic; 2PL = two-parameter logistic; 3PL = three-parameter logistic; LL = log-likelihood; MLR = robust maximum likelihood; AIC = Akaike information criterion; BIC = Bayesian information criterion.
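The information criteria listed in the note are simple functions of the log-likelihood, the number of estimated parameters, and the sample size. The following sketch shows the standard definitions; the function names are hypothetical, and no values from Table 5 are reproduced here.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 LL + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: -2 LL + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

Lower values indicate a better trade-off between fit and complexity. Because ln(n) exceeds 2 for any sample larger than about seven cases, the BIC penalizes additional item parameters (as in the 3PL model) more heavily than the AIC does.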
We applied confirmatory IFAs using structural equation modeling (WLSMV estimator) in Mplus to test the model fit and the postulated unidimensional latent factor structure. This procedure tests whether observed item responses can be explained by a single continuous latent trait (see Koerber et al., 2015). The model fit of the 2PL model revealed acceptable results: RMSEA = .035; χ²/df = 2.10, χ²(54) = 113.662, p < .001, CFI = .89, TLI = .86. The SIC items showed an acceptable overall marginal EAP reliability of .64. Values above .70 can be described as good, and values above .60 can be described as acceptable (comparable with Cronbach’s alpha; see Field, 2009; Koerber et al., 2015). The EAP reliability is an estimate of test reliability that is obtained by dividing the variance of the individual EAP ability estimates by the estimated total variance of the latent ability (Kim, 2012). The percentage of correct answers (for the complete sample as well as for Grades 3 and 4 separately) and the item properties (difficulty, discrimination) for the 2PL model are summarized in Table 6. Children in Grade 4 performed better on the SIC test than children in Grade 3, t(867) = 9.05, p < .001. Children participating in an extracurricular STEM course performed better on the SIC test than children in regular school classes, t(876) = 6.94, p < .001. Results of the differential item analyses (see Holland & Wainer, 2012) revealed no differential item functioning (DIF; item difficulty, item discrimination) for gender (boys vs. girls), intelligence, grade level (Grade 3 vs. 4), age, or school.
Item Properties for the 2PL Model (Items Ordered From Most Difficult to Least Difficult).
Note. SC = single-choice item; ST = sorting task; N = 878; expected a posteriori (EAP) reliability = .64.
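Some of the fit statistics reported above can be checked by hand. The sketch below computes the χ²/df ratio, the RMSEA in its common single-group textbook form, and the EAP reliability as defined in the text (variance of the EAP estimates divided by the total latent variance). The function names are hypothetical, and the RMSEA formula is an assumption: software using the WLSMV estimator with clustered data may apply a different computation, although the textbook formula with N = 878 agrees with the reported value after rounding.

```python
import math
from statistics import pvariance
from typing import Sequence


def chi2_df_ratio(chi2: float, df: int) -> float:
    """Ratio of the chi-square statistic to its degrees of freedom."""
    return chi2 / df


def rmsea(chi2: float, df: int, n: int) -> float:
    """Textbook RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def eap_reliability(eap_estimates: Sequence[float], latent_variance: float) -> float:
    """Variance of the EAP ability estimates divided by the total latent variance."""
    return pvariance(eap_estimates) / latent_variance


ratio = chi2_df_ratio(113.662, 54)   # reported chi-square and df
approx_rmsea = rmsea(113.662, 54, 878)
```

Because EAP estimates are shrunken toward the mean, their variance is smaller than the latent variance, and the ratio therefore falls below 1.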
Relations Between SIC Performance, Cognitive Abilities, Experimentation Strategies, and Epistemic Beliefs
To address the second research question, we calculated correlations between SIC performance (latent EAP estimates), cognitive abilities, experimentation strategies, and epistemic beliefs (see Table 7). Apart from source of knowledge, all variables were positively correlated with SIC performance. Correlation coefficients ranged from .17 for development of knowledge to .49 for text comprehension (all ps < .01). Text comprehension, experimentation strategies, and fluid and crystallized intelligence had the highest positive relations with SIC test performance. This means that children with a higher level of text comprehension, experimentation strategies, and intelligence scored higher on the SIC test than children with lower scores on these scales. Correlations between SIC test performance and epistemic beliefs were low to moderate (.06 < r < .22). There were significant positive relations between the dimensions certainty, development, and justification of knowledge and SIC performance. In accordance with our expectations, the source dimension was not associated with SIC test performance because this dimension is less related to science as a changing and reversible discipline (Conley et al., 2004). Because some of the reliabilities of the scores from the validation instruments were low to moderate, we additionally used the attenuation formula to correct the correlations for the unreliability of the scores (Schmidt & Hunter, 2015). The results are presented in Table 8.
Correlation Coefficients, Means, Standard Deviations, Intraclass Correlations, Skewness, and Kurtosis for all Measures.
Note. ICCs = intraclass correlation coefficients; SIC = scientific inquiry cycle.
Items were scaled by a 2PL model for dichotomous items. The mean of the latent factor was fixed to 0.
Items were reversed so that for these scales, higher scores reflected more sophisticated beliefs (a strong negation of the source and certainty items). The SIC test was used in the complete sample (N = 878, school classes and STEM courses), the other instruments only in school classes (n = 681). Sample sizes are determined by the rotational booklet design (see Table 2).
*p < .05. **p < .01. ***p < .001 (two-tailed).
Correlation Coefficients Between the SIC Test and the Covariates, After Correction for Attenuation.
Note. SIC = scientific inquiry cycle.
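The correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The sketch below shows this classical formula; the function name is hypothetical. The example combines the reported correlation between SIC performance and text comprehension (.49) with the reported reliabilities (.64 and .87) purely to illustrate the arithmetic. Whether these exact reliability estimates entered the authors' correction is an assumption, and the result is not taken from Table 8.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical disattenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Illustration only: observed r = .49, assumed reliabilities .64 (SIC) and .87 (ELFE).
example = correct_for_attenuation(0.49, 0.64, 0.87)  # roughly .66
```

The lower the reliabilities, the larger the upward correction, which is why corrected coefficients for scales with low internal consistency should be interpreted with particular caution.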
In a second step, we computed a multiple regression model to predict SIC performance from cognitive abilities, experimentation strategies, and epistemic beliefs (see Table 9). The predictors were the z standardized measures of text comprehension, crystallized intelligence, fluid intelligence, experimentation strategies, and the four dimensions of epistemic beliefs. The dependent variable was the z standardized SIC EAP score. We also report structure coefficients (Courville & Thompson, 2001; Ziglari, 2017). The results revealed that text comprehension (β = 0.20, p < .001), fluid intelligence (β = 0.18, p = .002), experimentation strategies (β = 0.28, p < .001), and sophisticated beliefs about the certainty of knowledge (β = 0.18, p = .001; beliefs about less certainty, see reversal of items) were significant predictors of SIC performance and explained 38% of the variance in SIC performance. The structure coefficients also pointed to the association of crystallized intelligence and SIC performance. Overall, the results confirmed the assumed relations between the SIC test scores and other constructs and helped to establish initial evidence for the construct validity of the test scores.
Regression Model for Predicting the SIC Performance (EAP Scores).
Note. N = 878. Standardized coefficients are reported. rs = rYX / R (structure coefficient). R = .616. rYX = bivariate correlation between predictor scores and the SIC. SIC = scientific inquiry cycle.
Variables were z standardized prior to analyses.
*p < .05. **p < .01. ***p < .001.
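The structure coefficients defined in the table note are obtained by dividing each predictor's bivariate correlation with the criterion by the multiple correlation R. The sketch below illustrates this with the reported R = .616 and, as an example, the reported bivariate correlation of .49 for text comprehension. The function name is hypothetical, and the resulting value illustrates the arithmetic rather than reproducing an entry from Table 9.

```python
def structure_coefficient(r_yx: float, multiple_r: float) -> float:
    """Structure coefficient r_s = r_YX / R."""
    if multiple_r <= 0.0:
        raise ValueError("multiple R must be positive")
    return r_yx / multiple_r


MULTIPLE_R = 0.616                       # reported multiple correlation
r_squared = MULTIPLE_R ** 2              # about .38, the explained variance
example_rs = structure_coefficient(0.49, MULTIPLE_R)  # roughly .80
```

A predictor can have a small regression weight but a large structure coefficient when it shares variance with other predictors, which is how crystallized intelligence can be related to SIC performance without being a significant unique predictor.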
Discussion
This study focused on the development of a new paper-and-pencil test for elementary school children to assess their understanding of the sequence of the single components of the SIC, which forms the basis of the hypothetical-deductive approach to science (experimentalism). Knowledge of the components of the SIC is an important basis for inquiry and scientific practice but only when the components are considered in a logical order and in relation to each other (Lehrer & Schauble, 2015). The ability to sequence the steps of the SIC had not previously been assessed in this age group. The items required students to reconstruct the associated sequences and to answer single-choice questions that referred to consecutive steps in the SIC. The understanding of the SIC was assessed in a domain-general manner because children in Grades 3 and 4 have not yet taken specific science lessons in school. The test can be used, for instance, to measure elementary school children’s science competencies or to evaluate the effectiveness of science interventions for gifted or talented students (see Schiefer et al., 2017).
Scaling of the SIC Items
The results of the 2PL scaling indicated that our instrument could reliably measure the understanding of the inquiry cycle in elementary-school-aged children (Hypothesis 1). The results of the IRT modeling indicated that the items produced reliable and feasible scale scores and fulfilled, despite the selective misfit of the model, the overall psychometric requirements of a dichotomous 2PL model (Birnbaum, 1968). This helped to establish initial evidence for the construct validity of the test scores (i.e., that the competence to solve the SIC items can be explained by one latent trait). The SIC test scores had an EAP reliability of .64, which is comparable to the reliability of existing scientific reasoning test scores at an elementary school level (Koerber et al., 2015; Mayer et al., 2014). There is evidence that the SIC test scores could be used to differentiate between different groups (children in Grade 3 vs. children in Grade 4; children in school classes vs. children in an extracurricular STEM training), which is an important characteristic of a good test (Kline, 2015).
Construct Validity of the SIC Test Scores
To further investigate the construct validity of the SIC test scores, we analyzed relations between the children’s SIC performance and their cognitive abilities and epistemic beliefs in the domain of science (Hypothesis 2). The results of the correlation analyses indicated that besides the components of fluid and crystallized intelligence, experimentation strategies and sophisticated epistemic beliefs were associated with SIC performance. The results were consistent with our expectation that cognitive ability constructs would be positively related to children’s SIC performance. However, due to the moderate relation, it can be concluded that SIC performance can be differentiated from cognitive ability variables as a separate construct, thus indicating discriminant validity (Kline, 2015). This differentiation is important because reading comprehension and intelligence generally play important roles in the processing of written test instruments and influence test performance, especially at the elementary school level (Koeppen, Hartig, Klieme, & Leutner, 2008).
The correlations of the SIC test with epistemic beliefs corresponded with our prediction that sophisticated beliefs about the nature of knowledge and knowing (see Hofer & Pintrich, 1997) would be positively associated with students’ understanding of the SIC. The results provide empirical evidence for the relation between sophisticated epistemic beliefs and the completion of scientific reasoning tasks (e.g., Kuhn, 2011; Morris et al., 2012; Osborne, 2013). In line with our expectations, sophisticated epistemic beliefs about the certainty, development, and justification of knowledge were positively correlated with children’s SIC performance. These epistemic beliefs refer to science as a changing and reversible discipline (Conley et al., 2004) and might be prerequisites for the understanding of the cyclical and cumulative knowledge-seeking phases of the SIC (Koslowski, 1996; Kuhn, 2011; Kuhn & Franklin, 2006). The goal of these inquiry phases is not the separate collection of knowledge but the ongoing and continuing generation, testing, and revision of theories and hypotheses. An understanding of this process of knowledge acquisition and change requires an understanding of the need for change and the constant further development of scientific knowledge. The source dimension, on the contrary, was not positively correlated with SIC performance. This might be because this dimension refers to how people deal with external authorities, which, in line with our expectations, might be less associated with an active inquiry process.
The results of the multiple regression analyses revealed that text comprehension, fluid intelligence, experimentation strategies, and sophisticated epistemic beliefs about the certainty of knowledge were the best predictors of children’s performance on the SIC test. Besides cognitive abilities and experimentation strategies, epistemic beliefs that the children had developed about the certainty of knowledge positively predicted SIC performance. Due to the recoding of the items (see Conley et al., 2004), this indicates that beliefs in a high level of uncertainty in scientific knowledge are positively associated with SIC performance (when cognitive abilities and experimentation strategies are controlled for). Overall, the results confirmed the assumed relations of the SIC test scores with other constructs and helped to establish initial evidence for the construct validity of the test scores (see Cronbach & Meehl, 1955).
Implications
Our results have important implications for educational research and practice. First, the present study showed that the items on the new instrument produce reliable scale scores and can be used to assess third- and fourth-grade students’ understanding of the sequence of the steps of the SIC. The test exhibits satisfactory content validity because the items were rated by experts and were clearly derived from theory. The understanding of the SIC is an important prerequisite for inquiry learning and scientific practice. The relevance of such inquiry-based competencies has recently been acknowledged in national and international education plans (e.g., Duschl, Schweingruber, & Shouse, 2007; National Research Council, 2011) as well as large-scale assessment studies such as PISA or TIMSS. Knowledge of the process of scientific inquiry refers to an advanced international benchmark (highest competence level) in Grade 4 (Martin, Mullis, Foy, & Hooper, 2016). The SIC test complements and broadens existing tasks and tests for elementary school children (e.g., Bullock & Ziegler, 1999; Koerber et al., 2015; Mayer et al., 2014) because, in contrast to previous scales, it assesses a meta-perspective on scientific reasoning. Thus, the SIC test can assess the core of a comprehensive and complex research process in a simple and economic manner, and can easily be used to measure young children’s understanding of this process.
Second, our study demonstrated that the 8- to 10-year-old children in the study were competent in solving the SIC tasks and understood the process of scientific inquiry. Although the children were not asked to generate or explain information by themselves, it can be assumed that they might possess an early understanding of the deductive hypothesis-driven process of knowledge seeking and change (Kuhn & Franklin, 2006; Zimmerman, 2007). Our findings support the recommendation that educators should incorporate scientific inquiry methods into science curricula (see education plans; National Research Council, 1996, 2011) even as early as elementary school. Because there is evidence that elementary school children can already understand the processes and goals of inquiry, under the guidance of a teacher, these children might be able to plan, conduct, and interpret experiments independently (see Colburn, 2000).
Limitations and Future Research
Although our study demonstrated that the understanding of the SIC could be reliably and validly measured in elementary school children, some limitations should be considered when interpreting the results. First, our study was narrowly focused on a specific sample of third and fourth graders and investigated their competences using the SIC test in a cross-sectional design. As there is evidence that SIC performances increase with age, it might be promising to also administer the test to older age groups or to investigate students’ achievement on the SIC test longitudinally. According to Mayer et al. (2014), this might provide further insights into “age differential developmental changes . . . and the early impact of prerequisites contributing to development” (p. 50). Given that so far there are also no instruments for adults, it might be promising to assess the SIC competences of adults (e.g., university students). This might provide insights into whether, for example, trainee teachers or laypeople have an understanding of the inquiry process.
Second, the SIC test comprises a simplification of the very complex process of scientific inquiry and cannot take scientific practices of different branches of science (e.g., evolutionary biology or genetics) into account. However, the test can be implemented to measure the core of the process of hypothetical-deductive scientific research and inquiry that is accepted and applied by scientists in many different domains. This represents a domain-general approach to assessing a general scientific principle, but it simultaneously limits the consideration of discipline-specific methods.
Third, the limited reliability and the selective misfit of the SIC items should be taken into account. The reliability is, however, in line with the reliability of scores from other scientific reasoning scales for 8- to 10-year-old children (e.g., Koerber et al., 2015; Mayer et al., 2014). The development of additional items for each of the inquiry steps might increase reliability and enable a detailed analysis of the difficulty of the respective steps. It can be assumed that certain sequences (e.g., analyzing data after data collection) might be easier for the children to understand than other sequences (e.g., closing the inquiry cycle by starting a new phase of inquiry after drawing inferences). Also, the division of the SIC into the described phases (see Pedaste et al., 2015) needs further investigation. Both single-choice items representing the phase “planning an experiment” had to be excluded from the scale due to low item selectivity (correlation of item with test score). This indicates that those items did not represent the competence we intended to measure with the SIC test. An alternative explanation could be that the students did not discriminate between the phases “planning” and “conducting” an experiment. From a theoretical point of view, both phases represent scientific thinking and are important elements of the SIC. However, it might have been difficult for the children to identify those phases as distinguishable steps.
Fourth, the paper-and-pencil format of the test should be critically examined. Such formats do not allow insight into the underlying thought processes (see Mason, 2016). Other formats, such as think-aloud protocols or computer-based assessment of the understanding of the inquiry cycle (e.g., log file analyses), might be complementary approaches for future research (e.g., Young, 2014). Nonetheless, paper-and-pencil tests are essential for assessing students’ competencies in large-scale studies or group testing situations.
Finally, further validation of the SIC items with other, more reliable instruments or tasks is needed. Here, the limited reliability of some of the validation instruments (i.e., the scale for measuring experimentation strategies) should be critically taken into account and might have distorted the results. Although the assumed unidimensionality of the experimentation scale was shown in a CFA, this scale needs further item analyses and revision because the children had trouble providing consistent answers. Given that the psychometric quality of this scale is very limited, its interrelations with the other constructs should be interpreted carefully and are likely to be underestimated (see Fan, 2003). This limitation strengthens the need for further validation of the SIC test scores with more reliable instruments. Specifically, it might be promising to investigate the relations between SIC test performance and performance on existing scientific reasoning tasks that were not available when we conducted our study (e.g., by Koerber et al., 2015; Mayer et al., 2014). Furthermore, validation with other cognitive variables that have been found to be related to scientific reasoning (e.g., problem solving, inhibition, spatial abilities; see Mayer et al., 2014), metacognitive processes (e.g., planning skills, strategy use), motivational factors, or sociocultural background might be promising for explaining individual differences. Finally, validation with science grades as well as hands-on scientific practices might be important for investigating the criterion validity of the SIC test scores. This might answer the question of whether solving the SIC items is related to science achievement at school as well as to the ability to conduct “real experiments” in a lab (see Flick, 1993).
Conclusion
Taken together, the present study showed that it is possible to assess elementary school children’s understanding of the SIC with a paper-and-pencil test. The test could be implemented objectively, and the results helped to establish initial evidence for the construct validity of the test scores. It broadens the pool of existing instruments for measuring scientific reasoning in this age group. In addition to recent studies (e.g., Koerber et al., 2015; Mayer et al., 2014), the present investigation provides further empirical evidence that elementary school children are competent in scientific reasoning. The presented SIC test can be administered to assess the core of scientific reasoning and might be implemented in future research for assessing elementary school children’s scientific reasoning competencies or the measurement of educational progress in the area of science learning (e.g., within the evaluation of science interventions for high ability students or school curricula).
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research project was funded in part by a grant from the Hector Foundation II. When the study was conducted, Julia Schiefer was a doctoral student at the LEAD Graduate School & Research Network (GSC 1028), which is funded by the Excellence Initiative of the German federal and state governments. Special thanks go to Norman Rose from the Hector Research Institute of Education Sciences and Psychology and Johann Jacoby from the LEADing Research Center for their methodological advice.
