Abstract
The study of complex systems has been highlighted in recent science education policy in the United States and has been the subject of important real-world scientific investigation. Because of this, research on complex systems in K–12 science education has shown a marked increase over the past two decades. In this systematic review, we analyzed 75 empirical studies to determine whether the research (a) collectively represents the goals of educational policy and real-world science, (b) has considered a variety of settings and populations, and (c) has demonstrated systematic investigation of interventions with a view to scale. Results revealed needs in five areas of research: a need to diversify the knowledge domains within which research is conducted, more research on learning about system states, agreement on the essential features of complex systems content, greater focus on contextual factors that support learning including teacher learning, and a need for more comparative research.
In Science for All Americans (American Association for the Advancement of Science, 1989), and the subsequent Benchmarks for Scientific Literacy (American Association for the Advancement of Science, 1993), the topic of systems occupied a prominent role as one of the four common themes driving science learning in U.S. K–12 education. Since then, science education researchers have focused on constructing and testing methods for teaching and learning about systems, with an emphasis on complex systems (Hmelo-Silver & Azevedo, 2006; Sabelli, 2006; Wilensky & Jacobson, 2015; Yoon, in press). Although this research has resulted in a number of new digital tools, curricula, and theoretical frameworks, it has not generated consensus on how best to support student learning. With the recent launch of the Next Generation Science Standards (NGSS Lead States, 2013), there is a similar emphasis on systems learning, except that the processes fueling systems, such as energy and matter, are also included as important crosscutting concepts. Such concepts are understood to apply to all the content domains encompassed in the NGSS as core ideas in the Physical, Life, and Earth and Space Sciences as well as Engineering, Technology, and Applications of Science.
As we work to translate the NGSS into curricular experiences for all students, understanding the kind of empirical research that has been done on complex systems in the past would be helpful to determine how best to support classroom learning. Given that systems and their processes are crosscutting concepts, it would be prudent to understand how robust the literature base is in terms of the content domains represented as well as the utility of that content. Similarly, understanding how diverse the research base is with respect to populations it has studied can help us understand whether learning outcomes can be attributed to a majority of students. Finally, for educational resources to be adopted at scale, the reliability of methods should be examined in the research that purports to support improved learning.
We begin this article by defining concepts in the study of complex systems that are central to understanding how they exist and function. The section that follows outlines NGSS teaching and learning goals and recent educational policies that identify systematic frameworks that should be considered when evaluating curricula, resources, and practices that have the potential to be scaled up and that represent real-world scientific inquiry. Next, we describe our methods for narrowing down and reviewing empirical studies of complex systems in science education (CSSE). A review of the empirical studies follows. We conclude with recommendations for classroom learning and directions for future research.
What Are Complex Systems?
The study of complex systems can be summarized as understanding how the behavior of phenomena at different scales is related and how larger scale patterns emerge from the interdependent components at lower scales (Bar-Yam, 2016). By studying the patterns that emerge and the interactional processes that lead to these patterns, researchers can better understand how systems adapt, self-organize, fluctuate, and reach and maintain equilibrium. Scientists and engineers have long been interested in investigating structures, dynamics, and states of complex systems that can be applied to real-world problems. For example, at a recent conference, the National Academies convened a multidisciplinary group of leading American complexity researchers to examine nine pressing global issues. These included ecological robustness (whether the biosphere is sustainable), enhancing robustness through interconnectivity (e.g., power grid structures, disaster relief networks), and controlling the spread of disease (The National Academies, 2009). The goal of this work was to identify the limits, optimal states, and weaknesses within systems such that interventions could be applied to enhance stability in the face of perturbations. Solving these issues, as many of the researchers admitted, is challenging given the nonlinear, stochastic or random, and decentralized nature of complex systems. Here, and in complexity organizations such as the Santa Fe Institute and the New England Complex Systems Institute, researchers are interested in developing visual and mathematical models (e.g., Servedio et al., 2014) that can help describe and predict various system states as well as the dynamics that fuel them. Optimization, resilience, and robustness are conditions that complexity researchers study and model for utilitarian purposes (e.g., Allison & Martiny, 2008; Caschera et al., 2011; Levin & Lubchenco, 2008).
In terms of content, complex systems researchers represent a multitude of scientific knowledge domains, including physics (e.g., Fuentes, 2014; Sherrington, 2010), chemistry (e.g., Hordijk, Kauffman, & Steel, 2011), biology (e.g., Sah, Singh, Clauset, & Bansal, 2014), and ecology (e.g., Bettencourt & Brelsford, 2014; Sole, 2015). Complex systems researchers also recognize the universal nature and intersectionality of systems within different domains and seek to highlight this universality and reveal underlying principles that unite them (West, 2014). This review does not include studies from systems dynamics, a closely related field whose studies focus on macro-level systems functions (typically of engineered systems) and generally aim to quantitatively track the rate of flow of resources to optimize whole system efficiencies (e.g., Forrester, 1994). Similarly, it does not include studies of dynamic systems, which hail from the field of developmental psychology and focus mainly on identifying the dynamics occurring within and between individual actors that support or constrain mental and behavioral growth (e.g., Lewis, 2000; Thelen & Smith, 1994).
NGSS Teaching and Learning Goals
Emphasizing the utility of science learning and how scientists come to understand and apply their research is a central focus of the Framework for K–12 Science Education (National Research Council [NRC], 2012) that underpins the NGSS. The vision states, “The learning experiences provided for students should engage them with fundamental questions about the world and with how scientists have investigated and found answers to those questions. We anticipate that the insights gained and interests provoked from studying and engaging in the practices of science and engineering during their K–12 schooling should help students see how science and engineering are instrumental in addressing major challenges that confront society today, such as generating sufficient energy, preventing and treating diseases, maintaining supplies of clean water and food, and solving the problems of global environmental change” (p. 9).
We can see here that the NGSS vision for science education maps well onto the research that complexity scientists are undertaking in terms of solving problems confronting society and the environment.
With respect to specific content involving systems in the NGSS, arguably all crosscutting concepts bear some relationship to the study of complex systems. To review just three—concepts of scale, proportion, and quantity relate to the importance of recognizing what is relevant at different measures of size, time, and energy and how changes in scale, proportion, or quantity affect a system’s structure or performance. For example, to understand the complexity of a particulate system, it is important to recognize that at the component level, particles in a gas system move in straight lines, collide with one another, and alter their speed and direction of motion. Changes in these behaviors such as the number of particles, the energy of particles, and the frequency of collisions can affect the system’s outcomes such as energy transfer, random motion of particles, and diffusion.
Concepts of systems and system models include examining the system in detail to determine system interdependencies and recognizing the fact that properties and behaviors that exist at a system level may be different from the lower level interactions from which they emerged. In addition, these system level properties and behaviors are often difficult to predict from knowledge about the components and their interactions (NRC, 2012, p. 92). Concepts in this category illustrate core complex systems concepts of interdependence, emergence, and self-organization, among others (cf. Jacobson, 2001).
Concepts of stability and change emphasize static and dynamic equilibrium over various time scales, and feedback loops. To understand complex systems, one must be able to recognize conditions in which aspects of a system are changing or not, and these can be dependent on the time span during which the observations are made (Booth-Sweeney & Sterman, 2007). The mechanisms of feedback loops in a system also serve to regulate processes and maintain or destabilize the system (Gotwals & Songer, 2010).
Another important challenge articulated in the framework is the need for an inclusive science education that offers equal opportunities for all students to experience high-quality learning experiences. Other policy documents have similarly highlighted goals for a more authentic and current science curriculum in K–12 and instructional practices that reach a diversity of learners in diverse settings (National Science Board, 2010; NRC, 2011; President’s Council of Advisors on Science and Technology, 2010). Addressing such details in classroom teaching and learning is not an easy task, however, particularly because the NGSS represents a new vision for science education—one that differs from typical learning activities in U.S. classrooms (Wilson, 2013).
What we aim to determine in this review is whether complex systems research in education similarly reflects the research aims of real-world scientific investigation and the NGSS teaching and learning goals.
Scaling Research and Innovations
The development of classroom resources that can be adopted at the scale intended for NGSS should go through a number of steps and evaluations. Recently, the Institute of Education Sciences (IES) and the National Science Foundation (NSF) in the United States outlined a continuum of methodological levels that influence increased reliability of intervention outcomes at scale (IES/NSF, 2013). The continuum delineates six types of research beginning with foundational and early-stage or exploratory research working with single and ideal populations and ending with scale-up research in which findings can generalize to multiple populations with attention to situated mediators (a description of the stages of research appears in a later table). Determining how the corpus of educational research in complex systems fares against these categories of research and development can offer information about the reliability of research findings that are intended to support student learning.
Other recent frameworks aimed at producing reliable outcomes that can scale to many populations and educational contexts emphasize a need to consider situated variables such as population and context. Design-based implementation research focuses on designing and testing interventions across different levels and settings of learning (Penuel & Fishman, 2012). This framework suggests that there should be a commitment to an iterative design cycle, attention to particular learner responses, and the development of infrastructures that sustain change. The Carnegie framework for improvement science similarly focuses on building interventions that scale via evidence collected from a variety of educational situations that address variation in performance and system variables that influence it (Bryk, Gomez, & Grunow, 2010; Bryk, Gomez, Grunow, & LeMahieu, 2015). Of central importance to our review here is whether complex systems research in education has tested interventions with consideration to context diversity.
Purpose of the Review
The purpose of this review is to examine the research conducted on CSSE from 1995 to 2015. The year 1995 was selected as the starting point because the bulk of interventions and scholarly work emerged after the publication of the Benchmarks for Scientific Literacy (American Association for the Advancement of Science, 1993) where, as previously noted, the concept of systems was prominently featured. We also assumed a need to allow a 2-year period for interventions to be constructed, implemented, and reported.
We are interested in investigating how the corpus of CSSE research addresses real-world complex systems research and applications, fulfils NGSS intended standards and goals, and can be scaled to support all students. The research questions underpinning the review are as follows:
To what extent does CSSE research represent the state of complex systems research and the NGSS standards and goals?
To what extent has CSSE research been conducted in a variety of educational settings with a variety of populations?
To what extent has CSSE research demonstrated systematic investigation in terms of methods toward scaling interventions?
We highlight themes that emerge from this group of studies that illustrate areas of robust research emphasis such as particular content domains, conceptual foci, or methodologies. We also highlight what the research says about concepts that are more or less challenging for students to learn and concepts that the field should investigate in terms of what students know and how to support learning. It is important to note, however, that a meta-analysis of student learning outcomes is outside of the scope of this review.
Method
Literature Search Procedure
This systematic review focused on peer-reviewed empirical articles in the field of science education. Articles for review were selected through the following method. First, we defined the keywords needed to search for complex systems articles. These keywords are (a) complex systems; (b) system, science education; and (c) complexity. We ran a separate search for each of the three keywords in each of three databases: Education Resources Information Center (ERIC), Education Full Text, and PsycINFO. We selected these databases because they are the most commonly used ones for educational studies and because previously published review articles with similar content have used them (e.g., Gerard, Varma, Corliss, & Linn, 2011). Table 1 shows that this first-level search produced a large number of hits from each of the databases. Using the inclusion criteria discussed below, one of the authors studied the titles of the articles and skimmed the abstracts, over a 3-month period, to yield the numbers of extracted articles shown in the last column of the table. These totaled 234 articles. When the research team examined the results of the first level of extraction, we were concerned that, with such large numbers of hits, human error may have caused some articles germane to the study to be missed.
Articles extracted: database, search strings, number of hits, and number extracted after inclusion criteria applied
To ensure that all empirical educational articles were selected, we performed another level of systematic review with a finer grained search on ERIC. We chose ERIC as the single database to perform this search because it yielded the largest number of hits in the first level. In this second level, we first searched for the keywords of system AND scien*. We then included another level with the following keywords: (a) complex*, (b) decentral*, (c) emergen*, (d) causal*, (e) nonlinear*, and (f) self-organi*. An asterisk (*) was used in order to include all plural versions of each of the keywords in the searches. The first-level keywords were used in every search, and one keyword from the second level was used for each individual search. For example, the first search was “system AND scien* AND complex*,” and the next search was “system AND scien* AND decentral*,” and so on. For the second-level search, both the title and abstracts were examined for inclusion criteria, which yielded 130 articles for a total combined first- and second-level search of 364 articles. See Table 1 for the detailed search string results.
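As an illustration only, the second-level search strings described above can be generated programmatically. The following minimal sketch is not part of the original search procedure; the names BASE_TERMS, SECOND_LEVEL, and build_queries are hypothetical, and the sketch produces only the query text, not the database search itself.

```python
# Illustrative sketch: reconstructs the six second-level search strings
# described in the text. All identifiers here are hypothetical.

# First-level keywords, used in every search.
BASE_TERMS = ["system", "scien*"]

# Second-level keywords, one of which is added to each individual search.
SECOND_LEVEL = [
    "complex*",
    "decentral*",
    "emergen*",
    "causal*",
    "nonlinear*",
    "self-organi*",
]


def build_queries(base, extras):
    """Return one Boolean AND query per second-level keyword."""
    return [" AND ".join(base + [extra]) for extra in extras]


queries = build_queries(BASE_TERMS, SECOND_LEVEL)
for query in queries:
    print(query)
```

Each printed line corresponds to one individual ERIC search, beginning with the string for complex* and ending with the string for self-organi*.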
Inclusion Criteria
For the literature that resulted from each search, inclusion criteria were also used to narrow the pool of relevant research. The inclusion criteria included studies that (a) encompassed K–12 science education and (b) were focused on empirical studies on teaching and learning about complex systems. For those hits for which the inclusion criteria fit was dubious, the article was examined by one or both of the other authors. The resulting extracted articles were also examined by the other two authors to verify inclusion in the final data set. After checking for duplicates across the two levels of extraction, we identified 203 different studies.
Exclusion Criteria
On more detailed examination of the 203 empirical studies, we recognized that there were some studies that did not focus on the teaching and learning of complexity in systems. For example, a number of studies investigated the use of computational tools in learning science (e.g., Azevedo, Cromley, & Winters, 2005; Azevedo, Moos, Greene, Winters, & Cromley, 2008; Azevedo, Winters, & Moos, 2004; Ioannidou et al., 2010; Parnafes, 2010). Other studies focused on instructional strategies such as collaborative learning and self-regulated learning to learn science (e.g., Brady, Holbert, Soylu, Novak, & Wilensky, 2015; Greene, Moos, Azevedo, & Winters, 2008; Randler & Bogner, 2009). Although the topics of these studies were related to systems, for example, exploring differences in student use of self-regulatory strategies when learning about circulatory systems (Greene et al., 2008), since they did not examine the learning and teaching of the complexity associated with the system, they were taken out of our systematic review. Subsequently, we were left with a total of 75 studies (see Table 2 for a list of the studies).
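The article counts reported across the search, inclusion, and exclusion steps can be tallied as a simple screening funnel. The sketch below is illustrative only: the counts are those stated in the text, the variable names are hypothetical, and the two differences are derived by subtraction rather than reported directly.

```python
# Illustrative tally of the screening funnel, using counts stated in the text.
# Variable names are hypothetical; the two "removed" figures are derived here.

first_level_extracted = 234    # titles/abstracts screened across three databases
second_level_extracted = 130   # finer grained ERIC search
combined = first_level_extracted + second_level_extracted

unique_studies = 203           # after checking for duplicates and verifying inclusion
final_included = 75            # after applying the exclusion criteria

# Derived quantities (not reported directly in the text).
removed_before_unique_set = combined - unique_studies
removed_by_exclusion = unique_studies - final_included

print(f"Combined first- and second-level extraction: {combined}")
print(f"Removed before the set of unique studies: {removed_before_unique_set}")
print(f"Removed by exclusion criteria: {removed_by_exclusion}")
print(f"Final data set: {final_included}")
```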
Seventy-five CSSE research studies from 1995 to 2015 included in the review
Note. DD = design and development research; FEE = foundational, early-stage, and exploratory research; DBR = design-based research.
A distribution of the 75 CSSE research studies aggregated in 5-year intervals between 1995 and 2015 is shown in Figure 1. The graph shows an increasing trend in which the publication rate roughly doubled every 5 years. These studies were most commonly featured in Journal of Research in Science Teaching (10), International Journal of Science Education (9), Journal of Science Education and Technology (9), and Journal of the Learning Sciences (6).

Number of CSSE publications over the past 20 years. CSSE = complex systems in science education.
Data Analysis
This section describes the analyses performed to address each research question and is therefore divided into three subsections. When interrater reliability is reported without extenuating explanatory details, it should be understood that agreement was obtained between the first two authors.
To What Extent Does CSSE Research Represent the State of Complex Systems Research and the NGSS Standards and Goals?
An analysis of the science subjects and the complex systems concepts contained in each study can provide insight into the extent to which CSSE research represents the state of complex systems research in science as well as NGSS standards and goals. Table 3 shows the categorization scheme for science subjects and complex systems concepts.
Categorization scheme for science content and complex systems concepts
The CSSE studies focused on the teaching and learning, or understanding, of scientific systems that are complex in nature. These scientific systems are found in myriad knowledge domains. We identified seven common domains or content areas, that is, biology, chemistry, computer science, earth science, ecology, physics/physical science/engineering, and complex systems in general. Some studies were coded for more than one content area due to the interdisciplinary topics they covered. Interrater reliability was completed on 20% (16) of the 75 articles with adequate agreement between the authors (kappa = 0.7; Landis & Koch, 1977). Disagreements about the science content of the studies were resolved after discussion.
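For readers unfamiliar with the kappa statistic, the sketch below shows one common way to compute Cohen's kappa for two raters and to label the result with the Landis and Koch (1977) descriptive bands. It is a simplified illustration, not the procedure used in this review: it assumes each article receives exactly one code per rater (whereas studies here could receive multiple codes), and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Descriptive label for kappa following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical codes for four articles, for demonstration only.
codes_rater_1 = ["ecology", "ecology", "biology", "biology"]
codes_rater_2 = ["ecology", "ecology", "biology", "ecology"]
example_kappa = cohen_kappa(codes_rater_1, codes_rater_2)
print(example_kappa, landis_koch_label(example_kappa))
```

Because kappa corrects for chance agreement, it is lower than the raw proportion of matching codes whenever the raters' category frequencies make some agreement likely by chance alone.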
Constructing a categorization manual to code complex systems characteristics was more challenging due to the different phrasing of processes and structures among the studies. We decided to use a qualitative content analysis approach in which clearly defined categories were developed inductively (Mayring, 2000). The first two authors read and coded 10 studies to identify an initial set of concepts that the authors of the articles themselves used to describe the complex systems characteristics investigated in their respective interventions. The characteristics were then discussed and revised, after which another 10 articles were coded. A preliminary framework of complex systems characteristics was constructed and further elaborated on throughout the process. Studies could be coded as including multiple relevant complex systems characteristics. The two authors then coded the rest of the articles separately and discussed the codes. Any discrepancies in the codes were resolved after several rounds of negotiation.
Finally, a framework was completed comprising three macro categories (i.e., structure, processes, and states) within which 13 complex systems concepts were embedded (see Table 3). These superordinate categories are often referred to in descriptions of differences between complex systems approaches and linear approaches when describing the world. For example, Mitchell (2009) writes that traditionally, science has often adopted a linear and reductionist approach, focusing on “breaking down” these systems to their components for analysis of their behaviors. While this approach simplifies the learning, it does not do justice to the multiple interactions among the components and interrelationships within the system, which give rise to systemic properties and patterns (Capra, 1996). We interpret the notion of “system components” as referring to system structures. The notions of “multiple interactions” and “interrelationships” refer to processes that fuel systems; giving “rise to systemic properties and patterns” pertains to emergent states.
In our framework, Structure characteristics refer to the physical features of the system, including micro and macro levels or different scales, the number and names of variables, how the variables are connected and the number of connections, the system organization (whether it is linear or nonlinear), and inputs and outputs or initial conditions. There are five complex systems characteristics related to the structure category. Process characteristics refer to the dynamics and mechanisms that fuel complex systems evolution. These include how the variables in the system are interdependent or form relationships, the processes of self-organization and emergence, the causal nature of processes such as feedback and cycles, and perturbations that trigger shifts in system structures and dynamics. There are also five complex systems characteristics related to processes. States characteristics refer to how complex systems exist in the world as a result of shifts or due to existing structures and processes. Three complex systems characteristics of states were identified in the studies.
To What Extent Has CSSE Research Been Conducted in a Variety of Educational Settings With a Variety of Populations?
Analysis of the demographics of research participants in the various studies provided insight into the extent to which CSSE research has been conducted in a variety of educational settings. Three aspects of demographic information commonly provided in peer-reviewed articles were examined. The first was what kind of population participated in the study with respect to students and/or teachers and also the grade range of the students (elementary, middle, or high school) and the experience level of the teachers (preservice or in-service). In studies where student ages were given instead, the grade levels were deduced from the given ages. The final two categories included were the gender and the race and ethnicity of the study population. The U.S. basic categorization for races and ethnicities (Office of Management and Budget, 1995) was used to code the race and ethnicity composition in the sample.
While the information was relatively easy to extract with little ambiguity, interrater reliability was still performed on 20% of the articles with very high agreement (kappa = 0.85–0.90; Landis & Koch, 1977) for all four demographic dimensions. If no information was provided in the article on a particular demographic aspect, it was duly noted. Table 4 describes the details of the coding of the demographic information.
Categorization scheme for demographics of research participants
To What Extent Has CSSE Research Demonstrated Systematic Investigation in Terms of Methods Toward Scaling Interventions?
Analysis of the research contexts, methodologies, and purposes can provide insight into the extent to which CSSE research has demonstrated systematic investigation in the past 20 years. Specifically, we examined the research setting, school type (if it was conducted in a school), sample size, research methods, and the research and educational purposes of the study.
Research setting refers to where the study was conducted, for example, in a classroom, an informal learning environment (e.g., museum and summer camp), or a laboratory setting.
School type refers to the kind of schools the sample was taken from for studies conducted in a classroom setting. They were coded as public (mainstream), public (charter or special target), or private schools.
Sample size refers to the number of participants involved in the study. If a study included two samples, the larger of the two sample sizes was documented. Such cases are also noted in Table 2.
Research method refers to the sources of information that were used, how the information was sampled, and the types of instruments that were used in data collection. This review largely coded the studies based on the type of data collected: qualitative, quantitative, or mixed methods. Studies coded as qualitative research included case study, ethnography, grounded theory, narrative, and phenomenological research, while studies coded as quantitative included experimental, quasi-experimental, and nonexperimental research. Mixed-methods studies were those that included both qualitative and quantitative research methods.
Research aim refers to whether the study was early-stage exploratory research, design and developmental research, or efficacy and scale-up research. This coding was adapted from the IES/NSF (2013) classification of educational research.
Finally, the educational purpose refers to the dimension of CSSE investigated in the study. For example, did the study examine student or teacher learning as part of an intervention, or was the research conducted to determine existing levels of complex systems understanding? The six CSSE dimensions were first constructed by one of the authors after reading through half of the articles. The dimensions were subsequently discussed and refined among the three authors. Interrater reliability was also completed on 20% of the articles with respect to this category (that were not reviewed for the construction of the coding scheme) with substantial agreement obtained (kappa = 0.65–0.8; Landis & Koch, 1977).
Further details about the categorization schemes for research contexts, methodologies, and purposes appear in Table 5.
Categorization schemes for research context and methodology.
Note. CSSE = complex systems in science education. 2010–2015 is a 6-year period in which eight CSSE articles were published in 2015.
Results
Representations of Complex Systems Research
Science Content Areas
Figure 2 shows the distribution of content areas represented in the group of CSSE studies. Ecology and biology were the most commonly coded content areas, with 38 studies (51%) and 24 studies (32%), respectively. Within these content areas, topics of investigation ranged from environmental issues and ecosystems to human body systems and genetics. The content area of earth science was represented in eight studies (11%) with smaller numbers coded in the remaining content areas. Nineteen studies (25%) were coded in more than one content area.

Distribution of science content.
Complex Systems Characteristics
A total of 185 complex systems characteristics were coded as target learning content in the CSSE studies we reviewed. The distribution of complex systems characteristics coded in the categories and subcategories can be found in Figure 3. Of the total number of complex systems characteristics, 86 (46%) were coded in the Structures category. Another 79 (43%) were coded in the Processes category, and just 20 (11%) were coded in the States category. Among the subcategories, the highest frequency of complex systems concepts appeared in Interdependence/Relationships, which was coded 33 times (18% of all coded characteristics). The subcategories with the next highest frequencies were Connections (coded 27 times, or 15%) and Levels/Aggregates/Scale (coded 25 times, or 14%). Relatively low numbers appeared in the subcategories of Inputs/Outputs or Initial Conditions, Self-Organization, Perturbations, Equilibrium/Stability, and Decentralized. It is also important to note that some essential complex systems concepts related to real-world scientific investigations were missing in the CSSE studies analyzed. These include the characteristics of robustness and resilience, which would belong in the States category.
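The reported percentages follow from dividing each count by the 185 coded characteristics rather than by the 75 studies. The short check below is illustrative only; it uses the counts stated in the text, and the variable names are hypothetical.

```python
# Illustrative check of the reported shares, using counts stated in the text.
# Percentages are taken over all coded characteristics, not over the 75 studies.

category_counts = {"Structures": 86, "Processes": 79, "States": 20}
total_characteristics = sum(category_counts.values())

category_shares = {
    name: round(100 * count / total_characteristics)
    for name, count in category_counts.items()
}

# Three most frequently coded subcategories reported in the text.
subcategory_counts = {
    "Interdependence/Relationships": 33,
    "Connections": 27,
    "Levels/Aggregates/Scale": 25,
}
subcategory_shares = {
    name: round(100 * count / total_characteristics)
    for name, count in subcategory_counts.items()
}

print(total_characteristics)
print(category_shares)
print(subcategory_shares)
```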

Distribution of complex systems concepts.
Educational Settings and Populations
Fifty-nine studies (79%) focused on students as the target groups, with relatively equal distribution across elementary, middle, and high school grades. Twelve studies (16%) included students across elementary, middle, and/or high school grades. In comparison, only 11 studies (15%) focused on teachers as target participants, and 7 studies (9%) included both teachers and students in their samples.
Regarding gender, we were unable to derive or infer the gender composition in 38 (51%) studies as they did not provide sufficient information about their sample. Of the group that did report gender composition, there appeared to be sufficient balance between number of males and females in the study samples.
Similarly, with respect to race and ethnicity, we were unable to distil much information in this category because 59 studies (79%) did not report details about the racial and ethnic composition. While the rest did suggest diverse races and ethnicities in their samples, it was difficult to ascertain the proportion because of the lack of description. Of the 12 studies that provided sufficient information, there were no racial or ethnic groups specifically targeted.
Research Methods
Research Setting and School Type
There were 47 studies (63%) conducted in school settings, 24 studies (32%) conducted in laboratory settings, and 6 studies (8%) conducted in informal settings. One study was conducted in both school and laboratory settings, and one in both school and informal settings. Of the 47 studies conducted in school settings, 27 (57%) did not report sufficient information to determine the types of school the student or teacher samples were from. Of those that did, 19 studies (40%) were conducted in mainstream schools, while only one study was conducted in a private school.
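Because two studies were each counted in two settings, the setting counts sum to more than the number of studies reviewed. A minimal sketch of that reconciliation follows; it assumes the two dual-setting studies are the only overlaps and that no study spanned all three settings, and the names are hypothetical.

```python
# Setting counts as reported in the text (a study may appear in two settings).
setting_counts = {"school": 47, "laboratory": 24, "informal": 6}

# One study was in both school and laboratory settings,
# and one in both school and informal settings.
dual_setting_studies = 2

# Assumption: these are the only overlaps, so each dual-setting study
# was counted exactly twice in the totals above.
unique_studies = sum(setting_counts.values()) - dual_setting_studies

print(unique_studies)
```

Under these assumptions the result equals the 75 studies included in the review.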
Sample Size and Methodology
In the category of sample size, 40 studies (53%) had sample sizes of 50 or fewer participants. Of this segment, 16 (21% of all studies) had sample sizes of 20 or fewer participants. Nine studies (12%) had 51 to 100 participants, and seven studies (9%) had 101 to 150 participants. Two studies were not coded because they did not report their sample sizes (Stroup & Wilensky, 2014; Wilensky & Resnick, 1999).
In the category of methodology, mixed methods were the most common methods used in the CSSE studies. A total of 40 studies (53%) used mixed methods in their investigation, 24 studies (32%) used qualitative methods, and 11 studies (15%) used quantitative methods.
Research Aim and Educational Purpose
In terms of research aim, 41 studies (55%) were coded as Foundational, Early-Stage, and Exploratory (FEE) research, and 28 studies (37%) were coded as Design and Development (DD) research. Six studies (8%) were coded as both FEE and DD. However, no study was coded as Efficacy, Effectiveness, and Scale-up (EES) research.
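The methodology and research aim breakdowns each partition the full set of reviewed studies, which can be checked directly. The following is a small illustrative sketch, assuming the reported categories are mutually exclusive (with studies coded as both FEE and DD treated as their own group); the names are hypothetical.

```python
TOTAL_STUDIES = 75  # number of studies included in the review

# Counts as reported in the text.
methodology = {"mixed": 40, "qualitative": 24, "quantitative": 11}
research_aim = {"FEE": 41, "DD": 28, "FEE and DD": 6, "EES": 0}


def percentages(counts: dict, total: int) -> dict:
    """Convert raw counts to whole-number percentages of total."""
    return {name: round(100 * n / total) for name, n in counts.items()}


# Assumption: categories within each breakdown do not overlap,
# so each breakdown should account for every study exactly once.
methodology_covers_all = sum(methodology.values()) == TOTAL_STUDIES
aim_covers_all = sum(research_aim.values()) == TOTAL_STUDIES

methodology_pct = percentages(methodology, TOTAL_STUDIES)
research_aim_pct = percentages(research_aim, TOTAL_STUDIES)

print(methodology_covers_all, methodology_pct)
print(aim_covers_all, research_aim_pct)
```

Both breakdowns sum to 75, and the rounded values agree with the percentages reported for methodology and research aim.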
For educational purpose, most of the CSSE studies investigated student learning and/or understanding of complex systems. There were 50 studies (67%) that examined how student learning of complex systems can be facilitated or improved, while 28 studies (37%) assessed students’ state of understanding. Only seven studies (9%) investigated teacher understanding and/or learning of complex systems. There were 19 studies (25%) in which conceptual frameworks or research instruments were developed to measure or interpret complex systems understanding. Two studies explored the relationship between instructional practice and student learning of complex systems, and another study examined the relationship between teachers’ understanding of complex systems and their classroom instruction. Figure 4 shows the distribution of the dimensions of CSSE in the group of studies.

Figure 4. Distribution of CSSE dimensions. CSSE = complex systems in science education.
Discussion
In this section, we discuss each of the analyses and highlight areas of strength as well as gaps in the corpus of CSSE research with respect to the research questions. Again, more details about each study can be found in Table 1.
Strong Representation in Biology and Ecology Studies
Complex systems researchers address issues in multiple scientific knowledge domains and search for underlying principles that are universal in nature (West, 2014). We found an abundance of CSSE studies in the domains of biology and ecology. The biological systems in these studies included human body systems such as the circulatory and respiratory systems (e.g., Cheng & Gilbert, 2015; Hmelo-Silver, Marathe, & Liu, 2007; Ioannidou et al., 2010), cellular systems (Penner, 2000; Verhoeff, Waarlo, & Boersma, 2008), and genetics or genetic engineering (e.g., Klopfer, Yoon, & Perry, 2005; Yoon, 2008). These studies have revealed a number of issues in student reasoning about complex systems. For example, Hmelo-Silver and colleagues have conducted studies with students and teachers to explore differences in how experts and novices understand systems. They found that experts recognize the integrated nature of system structures (components), behaviors (component interactions), and functions (outcomes of interactions) and use the latter two (i.e., behaviors and functions) as deep principles to organize their knowledge of the system (Hmelo-Silver et al., 2007; Hmelo-Silver & Pfeffer, 2004). Novices, on the other hand, only reason about structures and largely ignore behaviors and functions of systems. Verhoeff et al. (2008) found that while students were able to accurately reason about the structures and relationships between levels of cellular organelles, cells, and organs, they had difficulty understanding how the different levels of activity functioned as coherent organ processes. On the topic of genetic engineering, Yoon (2008) found that the process of decentralization was a more challenging concept to grasp than other complex systems concepts. Decentralization is the notion that control of systems (e.g., how genetically modified organisms spread) is often distributed across different components rather than localized in one component.
Studies focused on the domain of ecology, which represent more than half of the total CSSE research in our sample, have likewise highlighted student learning challenges. On the topic of population growth, Wilkerson-Jerde and Wilensky (2015) noted that students lack inferential reasoning about how individual behaviors generate group-level patterns. They suggested that a greater focus on recognizing mathematical relationships in graphical representations can support learning about exponential growth, which is a typical outcome of complex systems. Identifying essential relationships that fuel complex systems functions has proven to be challenging in various ecological systems. Varma and Linn (2012) found that only a few students in their study understood the relationship between the albedo process and Earth’s temperature on the topic of global warming. Other research has investigated what students understand about the impact of human-engineered systems on environmental natural systems. For example, Hogan (2000) found that middle school students tended to reason in a direct fashion about how pollutants affect ecosystems rather than recognizing the importance of indirect effects. Tsurusaki and Anderson (2010) noted that a sound understanding of the relationship between human activity and ecological systems varies across grade levels.
Need to Diversify Other Knowledge Domains
Collectively, CSSE research in biological and ecological systems has provided a great deal of knowledge about how students reason about complex systems. This plethora of studies has already spawned research in developing learning progressions across grades in topics such as water systems (Gunckel, Covitt, Salinas, & Anderson, 2012), carbon cycling (Mohan, Chen, & Anderson, 2009), and biodiversity (Songer, Kelcey, & Gotwals, 2009). But our review of the field revealed far fewer studies in the domains of chemistry (Chi, Roscoe, Slotta, Roy, & Chase, 2012; Levy & Wilensky, 2009, 2011; Vachliotis, Salta, & Tzougraki, 2014; Wilensky & Resnick, 1999), physics or physical sciences (DeLeo, Weidenhammer, & Wecht, 2012; Klopfer, Scheintaub, Huang, Wendel, & Roque, 2009; Perkins & Grotzer, 2005; Stavrou & Duit, 2014; Stavrou, Duit, & Komorek, 2008), and astronomy (Calderon-Canales, Flores-Camacho, & Gallegos-Cazares, 2013; Gazit, Yair, & Chen, 2005). Given that an essential goal of K–12 science education is to understand the utility of the scientific research enterprise and recognize how scientists apply their research (NRC, 2012), diversifying the knowledge domains within which students learn about complex systems represents an important area of future CSSE research.
Strong Representation in the Study of Complex Systems Structures and Processes
Our analysis of the complex systems concepts found within the CSSE studies demonstrated relative depth in the areas of systems structures and processes. Within these metalevel categories, a number of the studies attended to learning about connections between systems components, and how they form aggregate levels at different scales. This research is particularly well represented in a series of studies conducted by Levy and Wilensky (2008, 2009, 2011). Using agent-based computational simulations of chemical systems, the authors discuss the importance of developing conceptual knowledge that connects submicrolevel particle behaviors and interactions with emergent structures formed at the macro system-wide level. They further suggest that when students apply reasoning about midlevel structures, such as groups and clusters that can form as intermediate structures between microlevel and macrolevel states, they demonstrate deeper levels of understanding of processes that fuel emergent behaviors and states as well as a deeper level of understanding of the scientific domain.
A preponderance of studies focused on helping students recognize the relationships and interdependencies of system components. Using aquaria, Jordan, Brooks, Hmelo-Silver, Eberbach, and Sinha (2014) showed that middle school students in their study were more easily able to recognize system interdependencies found in the processes of photosynthesis (plants making food) and limiting factors (e.g., oxygen) but not cellular respiration (converting food to energy) or eutrophication (lack of oxygen). They suggested that a lack of understanding of the reciprocal relationship between photosynthesis and cellular respiration can lead to difficulties in understanding the critical importance of having enough oxygen to maintain healthy aquatic ecosystems. Efforts to enable students to understand these critical relationships and interdependencies in systems have produced learning models and assessment tools that have been examined in multiple lines of research such as the system thinking hierarchical model (e.g., Assaraf & Orion, 2010).
Need for More Research and Interventions on System States
Structures and processes are certainly central features of complex systems and arguably represent a place to start in terms of developing a basic level of understanding. However, there were some gaps in the CSSE literature that warrant discussion and point to a need to develop a deeper understanding of how systems operate and exist. The relatively low numbers that appeared in the structures and processes categories in the concepts of initial conditions, perturbations, and self-organization (e.g., Assaraf & Orpaz, 2010; Ginns, Norton, & Mcrobbie, 2005; Gotwals & Songer, 2010; Hogan, 2000; Puk & Stibbards, 2011) indicate that few studies examine the critical nature of how complex systems can differ based on initial variables, and how triggers can influence how systems self-organize. Such investigations are important to be able to model phenomena and make predictions, which are two scientific practices at the core of real-world complex systems research and are highlighted in the NGSS (NGSS Lead States, 2013). Similarly, the analysis revealed only a few studies that focused on complex systems states such as equilibrium and decentralization (e.g., Basu, Sengupta, & Biswas, 2015; Eilam, 2012; Hmelo-Silver, Liu, Gray, & Jordan, 2015; Peppler, Danish, & Phelps, 2013; Repenning, Ioannidou, Luhn, Daetwyler, & Repenning, 2010; Yoon, 2008) and no CSSE studies that investigated system robustness and resilience. It is clear that more educational interventions are needed that examine system states, which are the focal investigations of scientists concerned with addressing pressing global issues such as biosphere sustainability (The National Academies, 2009).
Strong Representation of Studies About Student Learning
Given the goals of NGSS to provide high-quality K–12 science learning experiences for all students, this review also sought to determine the extent to which the corpus of CSSE studies has conducted research in a variety of educational settings and a variety of populations. The target group analysis showed that research has been conducted on a range of age-groups and grade levels across the K–12 spectrum, although, as we discuss further on, much of this research is noncomparative. Other outcomes of this student learning focus include the importance of using computational tools in instruction, emerging theories about why learning about complex systems is challenging, and theoretical frameworks for assessing complex systems.
A number of studies (e.g., Klopfer, Yoon, & Perry, 2005; Repenning et al., 2015; Vattam et al., 2011; Wilensky & Reisman, 2006; Yoon, Koehler-Yom, Anderson, Lin, & Klopfer, 2015) describe computational tools that have been developed to visualize structures and mechanisms that enable users to view the evolution of systems over time. A particularly robust line of research has been aimed at developing agent-based simulations represented in modeling tools such as NetLogo (Wilensky & Reisman, 2006) and StarLogo (Yoon et al., 2016). These simulations allow students to manipulate and construct facsimiles of scientific systems in which changes in initial conditions, random variation, decentralized interactions, and self-organized emergent behaviors (among other system characteristics) are investigated. Repenning et al. (2015) have developed what they call “collective simulations” that tie together social learning techniques in the classroom of learners with networked computers to engage students themselves in the simulations. Similar earlier efforts to incorporate students as system agents can be found in the concept of participatory simulations (Klopfer, Yoon, & Perry, 2005; Stroup & Wilensky, 2014).
Research on student learning about complex systems has also advanced emerging theories about why learning about complex systems presents challenges. Grotzer and colleagues found that students tend to reason about immediate effects rather than cascading or indirect effects. For example, students fail to realize that a change in one population can have impacts on populations that are not directly linked through domino-like or cyclic complex causal relationships (Grotzer & Bell Basca, 2003; Grotzer et al., 2015). Chi et al. (2012) have hypothesized that learning difficulties about complex causality and nonlinear dynamics may be caused by an inability to recognize the difference between direct schema and emergent schema. Direct schema develop within learners from commonly observed everyday life events. They proceed through linear interactions as simple as needing to shop for food and driving one’s car from home to the market. However, emergent schema can be developed only if or when learners realize that complex phenomena exist in a series of nonlinear interactions. For example, ecosystems are weblike structures in which a trigger, like a forest fire in one part of the system, will quickly affect many parts of the forest because they are all interconnected.
CSSE studies of student learning have also generated a few theoretical frameworks that locate learning in different aspects of systems. The Structure–Behavior–Function framework hails from the field of engineering systems design (e.g., Bhatta & Goel, 1997) and is represented in a number of studies (e.g., Danish, 2014; Hmelo-Silver et al., 2007). A systems understanding through the Structure–Behavior–Function lens follows from a hierarchical knowledge of system characteristics. The components or structures (e.g., hybrid or electric motor) and behaviors (e.g., energy consumption) must first be understood in order to work with the system to achieve the desired output or function (e.g., how far a car can travel). Yoon (2008, 2011), based on earlier work by Jacobson (2001), offered a framework encompassing core concepts within complex systems learning that identify “clockwork” versus “complex systems” mental models. A clockwork orientation views the world from a Cartesian perspective (Capra, 1982) that generally views the world and its constituents as machines. It is based on a method of analytic thinking that involves breaking up complex phenomena into pieces to understand the behavior of the whole from the properties of its parts. This is in contrast to a complex systems view, according to which the essential properties of an organism (or complex system with a constant influx of energy) are properties of the whole—properties that none of the parts have on their own. A system’s central properties arise from the interactions and relationships among the parts, which is the dynamic process of emergence. This framework investigates the processes that fuel emergence and change in systems from micro to macro levels. The above-mentioned system thinking hierarchical model (e.g., Assaraf & Orion, 2010) represents yet another framework for understanding student learning.
CSSE studies have made significant inroads with respect to what students know about complex systems and how learning can be supported. A prudent next step might be for the field to reach consensus on essential content features of complex systems learning.
Need for More Research on Teacher Learning
Equally important for the field is a greater focus on what teachers know. Our results show that comparatively few studies have worked with teachers as their target population (e.g., Carlsson, 2002; Hmelo-Silver et al., 2007; Klopfer, Yoon, & Perry, 2005; Liu & Hmelo-Silver, 2009; Yoon et al., 2015). We do know from these studies that teachers typically have a weak understanding of complex systems concepts such as behaviors and processes that fuel systems (Hmelo-Silver & Pfeffer, 2004; Liu & Hmelo-Silver, 2009) and that developing teacher expertise in the pedagogies and tools that support complex systems, such as computational modeling tools, is essential to good instruction (Yoon et al., 2015).
Furthermore, the analysis of educational purpose with respect to dimensions of CSSE research showed that a majority of studies focused on improving student learning through specific interventions, which supports the goals articulated in the NGSS. However, few studies have investigated teacher learning needs as well as teachers’ states of complex systems understanding. Of particular interest in this category is also a lack of focus on the relationship between teaching or instructional supports and learning, which was the subject of only three studies (Hmelo-Silver et al., 2015; Ioannidou et al., 2010; Yoon et al., 2015). There are well-documented challenges that teachers often experience in adopting new teaching approaches, especially those that are computer supported (Aldunate & Nussbaum, 2013; Ertmer, Ottenbreit-Leftwich, Sendurur, & Sendurur, 2012), as many complex systems interventions are. Furthermore, Gerard et al. (2011) discussed the importance of opportunities for high-quality teacher professional development in order to conduct technology-enhanced inquiry in science classrooms. Thus, it is important for the CSSE field to consider what characteristics of professional development are needed to support teachers in improving students’ understanding of complex systems. This is especially essential in light of the fact that the NGSS represents a new vision for science learning that comes with a steep instructional learning curve (Wilson, 2013).
Need for Research on Common Context Features
Overall, the analysis also revealed that a majority of studies did not report on gender composition, race or ethnicity, or the type of school setting where the research was conducted. Almost a third of the studies were also conducted in laboratory settings. Here, it is important to point out that for several decades now, learning research has suggested that populations of learners can be affected by demographic and contextual variables that significantly support or limit learning and participation (Brown, Collins, & Duguid, 1989; Greeno, 1998; Sawyer, 2015). This fact, in part, spawned the development of whole fields of educational research such as the Learning Sciences (Kolodner, 2004; Pea, 2016; Sawyer, 2015; Yoon & Hmelo-Silver, 2017) where the messiness of real-world learning environments and sociocultural activity occupies a central role in understanding how people learn (Bransford, Brown, & Cocking, 1999). To ensure that CSSE research and interventions can impart benefits to all learners, more studies accounting for situated and contextualized factors that mediate learning are needed. Furthermore, it would be helpful to know more about how these factors support or constrain the development of complex systems understanding, which was not a focus of this review.
Strong Representation of Early-Stage Exploratory Research
With respect to whether CSSE studies have used methodologies and conducted research toward scaling interventions, the analysis showed that the majority of studies were aimed at FEE research. This finding might be expected considering that complex systems research is relatively recent compared to other research in the field of education such as literacy, mathematics, or science. In mathematics education research, for example, a plethora of developmental research has exposed numerous cognitive challenges in the learning of percentage and proportions (Parker & Leinhardt, 1995), a topic that researchers agree must be studied. However, the CSSE field still has not come to consensus on frameworks that describe the content of what needs to be learned about complex systems. Furthermore, the aforementioned focus on building computational resources has necessarily required greater emphasis on exploratory research as newer tools continually emerge with affordances that enable different forms of participation and learning that need development and testing. The nature of technological development likewise dictates that some technologies become obsolete before they can be tested at scale. For example, Yoon (2008) and Klopfer, Yoon, and Perry (2005) described participatory simulations that were developed on mobile technologies (Thinking Tags and Palm Pilots) that are not in use today. The relative infancy of the field, in addition to this variability of learning resources, is arguably why we see a preponderance of mixed methodologies to examine implementation outcomes. Such methodologies can provide a more holistic picture of the extent to which populations have learned, and why or how they have learned, than either quantitative or qualitative methods alone (Creswell & Plano Clark, 2011).
Need for More Comparative Research
Our results reveal that some subfields of complex systems research, like student learning, have reached a level of maturity that warrants more systematic investigation of how particular curricular and instructional design choices compare to others. Only 15% of the studies in our data set used quantitative methods; these studies tended to be experimental or quasi-experimental in design (e.g., DeBoer et al., 2014; Liu & Hmelo-Silver, 2009; Peppler et al., 2013; Plate, 2010; Thompson & Reimann, 2010). Thus, CSSE researchers have generally used noncomparative research designs. Noncomparative approaches make it difficult to interpret any affordance as having a significantly positive (or negative) influence on learning compared to other learning activities. Importantly, the noncomparative nature of these studies does not provide adequate information about the added value of learning experiences that can respond to new curricular mandates such as those that are found in the NGSS. Further evidence to support this assertion is suggested by the fact that no studies in our sample were categorized as efficacy, effectiveness, or scale-up research, and the fact that more than half of the studies worked with sample sizes of 50 or fewer participants. Coupled with the aforementioned lack of focus on contextualized variables, it is difficult to determine for whom and under what conditions CSSE interventions can achieve curriculum and instructional learning goals. More testing across different levels and settings (Penuel & Fishman, 2012) and studies that address variation in performance and system variables (Bryk et al., 2010; Bryk et al., 2015) with larger sample sizes need to be completed in future CSSE research.
Conclusion
Over the past two decades, policy mandates in science education and real-world complex systems research have spurred an interest in helping students learn about complex systems at the K–12 level. This systematic review was aimed at understanding how well the CSSE research reflects three central features of educational interventions: the extent to which it represents the goals of real-world complex systems science and the goals of the NGSS; whether CSSE research has been conducted in a variety of settings and populations; and whether CSSE studies have collectively aimed to conduct research to scale interventions. The research questions can also be considered as overarching goals to move the research field forward in achieving high-quality educational experiences in science education for all students. The analysis found critical needs in five areas: (a) a need for more research in different knowledge domains outside of the content areas of biology and ecology, (b) a need for more research on system states as opposed to structures and processes, (c) a need to develop a common understanding of the complex systems content that is essential to be learned, (d) a need to consider contextual factors that will affect the learning environment and population including teacher learning, and (e) a need for more comparative research to determine the value of CSSE interventions over traditional forms of instruction, including an emphasis on what teachers need in professional development activities. Our intention is for this review to mobilize CSSE researchers to work together to address these important needs.
Authors
SUSAN A. YOON is professor of education in the Teaching, Learning, and Leadership Division at the Graduate School of Education of the University of Pennsylvania.
SAO-EE GOH is a researcher with the Academy of Singapore Teachers with the Ministry of Education in Singapore.
MIYOUNG PARK is a PhD candidate in the Teaching, Learning, and Teacher Education program at the Graduate School of Education of the University of Pennsylvania.
