A Flipped Systematic Debugging Approach to Enhance Elementary Students’ Program Debugging Performance and Optimize Cognitive Load

Abstract

Reintroducing computer science (CS) education in K–12 schools to promote computational thinking (CT) has attracted significant attention among scholars and educators. Among the several essential components included in CS and CT education, program debugging is an indispensable skill. However, debugging teaching has often been overlooked in K–12 contexts, and relevant empirical studies are lacking in the literature. Moreover, novices generally have poor performance in domain knowledge and strategic knowledge concerning debugging. They also consistently experience a high cognitive burden in debugging learning. To address these gaps, we developed a flipped systematic debugging approach combined with a systematic debugging process (SDP) and the modeling method. A quasi-experimental study was conducted to explore the effectiveness of this flipped systematic debugging approach, in which 83 fifth-grade students attended the flipped debugging training lessons with the SDP–modeling method, and 75 fifth-grade students attended the unassisted flipped debugging training lessons without the SDP–modeling method. The results indicated that flipped debugging training using the SDP–modeling method improved students’ debugging skills. The results from the questionnaire showed that the proposed teaching approach increased the students’ investment in germane cognitive load by promoting schema construction. It also helped reduce students’ intrinsic and extraneous cognitive load in learning.

Keywords

program debugging teaching; cognitive load; flipped classroom; elementary education; Scratch

Introduction

There is a growing demand to reintroduce computer science (CS) education in K–12 contexts to promote computational thinking (CT) (Franklin et al., 2020). Papert’s (1980) early work indicated that programming could be a powerful approach to teaching CT to young students. However, novices can rarely build effectively running computer programs at the first attempt and must perform multiple debugging iterations to correct the faulty outcomes (Akcaoglu & Koehler, 2014).

Debugging is an iterative cognitive process of systematically diagnosing and repairing errors to correct faulty programs (Michaeli & Romeike, 2019b; Wu et al., 2019), which can be regarded as a problem-solving activity (Lye & Koh, 2014). In this study we focused on program debugging education because debugging can allow students to learn from problems or failures, eliminate misconceptions and enhance their understanding of CS concepts (Michaeli & Romeike, 2019a; Zhong & Si, 2021). Moreover, debugging skills can be leveraged to solve relevant problems in non-programming domains (Emara et al., 2020) and real-life scenarios (Michaeli & Romeike, 2019b).

However, program debugging is a challenging task for novices (Falloon, 2016; Kim et al., 2018) for three reasons: students’ weak domain knowledge and strategic knowledge, the heavy cognitive burden induced by learning to debug, and the lack of emphasis on program debugging as a separate teaching practice in K–12 CT education.

Specifically, first, novices lack the required domain knowledge and strategic knowledge (Liu et al., 2017; Murphy et al., 2008). Domain knowledge refers to knowledge of the relevant concepts, constructs, relations and rules of the underlying programming language (Li et al., 2019). Novices typically develop a shallow understanding of basic concepts, which increases the difficulty of interpreting the executed results of specific sets of commands (Pea, 1986), inferring error causes (Fitzgerald et al., 2010; Kim et al., 2018) and correcting errors. Strategic knowledge refers to knowledge of debugging methods, which can guide novices to plan effective problem-solving solutions (Li et al., 2019). Novices often lack effective debugging strategies or cannot apply strategies efficiently (Michaeli & Romeike, 2019a; Murphy et al., 2008). For example, novice programmers generally revise problematic programs without any plan (Chen et al., 2013). They tend to make minor changes randomly to decrease the difference between the faulty and goal programs (Bers et al., 2014; Liu et al., 2017), during which they sometimes forget to revert incorrect modifications (Michaeli & Romeike, 2019b) and introduce new errors easily. It is difficult for novices to obtain excellent debugging skills by merely learning how to write programs (Fitzgerald et al., 2010). Thus, novices should be provided with explicit teaching instructions to learn effective debugging strategies and enhance concept comprehension (Chiu & Huang, 2015; O’Dell, 2017).

Second, in addition to the lack of domain knowledge and strategic knowledge, performing various subtasks (e.g., recognizing discrepancies, inferring errors, fixing errors) to debug programs may induce high cognitive load (Zhong & Si, 2021). Novices typically lack effective schemata and relevant experience in debugging (Van Gog et al., 2006); therefore, they must simultaneously process different types of information in their working memory when debugging. Novices’ learning is negatively affected once the debugging task–imposed cognitive load exceeds their working memory capacity (Shadiev et al., 2015). Thus, effective instructional approaches for debugging should be designed to optimize different types of cognitive load as perceived by novices in learning.

Third, despite its importance, debugging teaching has often been overlooked, especially in K–12 education (Michaeli & Romeike, 2019b). Empirical studies exploring effective debugging teaching approaches are lacking (Michaeli & Romeike, 2019a). Owing to the lack of time for student active learning, classroom teachers prefer to directly provide students with error-correction solutions or leave them alone with error-processing tasks, rather than explicitly teach them program debugging (Michaeli & Romeike, 2019b).

Given the above concerns, it is important for us to conduct research to develop an effective instructional approach for debugging and evaluate its effects on debugging teaching in actual elementary classrooms. Enhancing novices’ domain knowledge and strategic knowledge should be the priority of debugging teaching. Novices’ cognitive burden in learning should also be optimized in the design of effective teaching instructions (Klepsch & Seufert, 2020).

This study makes the following contribution to the literature: It proposes a flipped systematic debugging approach to enhance elementary students’ debugging skills and optimize three types of cognitive load imposed on them. The flipped debugging approach uses a systematic debugging process (SDP) containing four debugging steps to familiarize students with effective debugging strategies, and the modeling method to facilitate students’ acquisition of domain and strategic knowledge (Kale & Yuan, 2021) required for debugging. This flipped systematic debugging approach can serve as a reference for other researchers and educators to replicate or improve the instructional design on debugging. Moreover, this study provides experimental results on student performance in debugging and cognitive load to show the effectiveness of the proposed flipped systematic debugging approach. The results are based on the application of the proposed debugging teaching approach in training 158 fifth-grade elementary students. Overall, this study enriches empirical research concerning the development of effective debugging teaching approaches.

Conceptual Background

Program Debugging Models

Scholars have developed different models to illustrate the debugging process. For example, Vessey (1985) proposed the following goal hierarchy for debugging: 1) problem identification by comparing the outputs of the desired and faulty programs; 2) understanding of the program structure; 3) program examination according to the execution order; 4) formation of hypotheses to infer bugs; and 5) bug correction. Similarly, Carver and Risinger (1987) designed a debugging model that included the following five phases: program assessment, bug identification, program representation (specifying the command set in which a bug may exist), bug location and bug correction. Yoon and Garcia (1998) developed a two-step approach for debugging; step one involved comprehending the program and finding discrepancies, while step two involved isolating bugs, repairing them and re-evaluating solutions.

Although the different debugging models may contain different labels or descriptions of debugging subtasks, we can parsimoniously group all the essential subtasks into four major components (Figure 1): problem identification and representation (e.g., comparing the program goals and outputs), bug location (e.g., understanding the program structure), bug correction, and solution evaluation. These four essential subtasks constitute the fundamental steps that students generally experience during debugging. Using a systematic stepwise approach to guide students through debugging is likely to improve their debugging skills by increasing their debugging speed and correctness and decreasing the number of introduced errors (Carver & Risinger, 1987; Michaeli & Romeike, 2019b). The following sections explain each step in more detail.

Figure 1.

Framework showing the systematic program debugging process.

The debugging process begins with the problem identification and representation step. This step requires students to comprehend the objectives of the desired program and compare them with the outcomes of the faulty program. Students then hypothesize likely errors based on the identified discrepancies. Young novices tend to skip the discrepancy detection step and revise the faulty program based on their intuition (Kale & Yuan, 2021; Liu et al., 2017). Since the ability to recognize whether the actual outputs match the desired outputs is foundational to successful debugging (Rich et al., 2019), discrepancy recognition should be emphasized. Moreover, discrepancies may remind students of familiar error symptoms and the corresponding stereotypical error patterns, which can help them speculate about likely errors (Li et al., 2019).

The next step is bug location, in which students need to understand the basic structure of the program and hypothesize the possible locations of each error. Decomposing the whole program into several subsections with distinct functions enhances students’ comprehension of its structure and logic, which improves their error localization speed. Considering that novices generally have a low ability to decompose programs (Lin et al., 2016), some measures can be taken to help them comprehend programs. Specifically, teachers can split certain parts of the program into sub-procedures and use meaningful names for these sub-procedures and the associated variables. Teachers can also provide novices with a simple textual introduction to sub-procedure functions (Li et al., 2019). When hypothesizing possible error locations, students should be encouraged to narrow their error search to specific program parts (e.g., specific sub-procedures) based on clues from the previous step (Carver & Risinger, 1987). Otherwise, they need to read each line of program code, which increases their time spent on error localization (Li et al., 2019). Moreover, students can narrow the set of error searches with the help of the program execution visualization in programming tools (e.g., Scratch) (Li et al., 2019).

The third step, bug correction, requires proposing solutions to repair each error found. Then, the fourth step, evaluation, requires students to check whether the repair attempts successfully revert the faulty program to the correct state. Improving novices’ ability to evaluate and adjust hypotheses is crucial to effectively debug faulty programs (Li et al., 2019). Solution evaluation is an iterative process. Testing the solution’s correctness immediately after one repair attempt helps students trace its effectiveness. When a solution produces the desired result, students should determine whether other errors exist. When a solution cannot resolve the original error, students should first undo the incorrect modifications to avoid introducing new errors (Rich et al., 2019) and then restart the debugging process cycle. According to evaluation results, they need to conduct some or all of the debugging steps to generate and assess an alternative solution. Debugging iterations continue until the faulty program arrives at the correct state.
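The four steps and the undo rule described above can be illustrated with a minimal Python sketch. It is not the instrument used in this study: the toy "program" (a list of numbers whose output is its running totals), the function names, and the rule that a fix is kept only when it reduces the number of discrepancies are all assumptions made for illustration.

```python
def count_discrepancies(actual, goal):
    """Step 1 helper: count positions where the output differs from the goal."""
    mismatches = sum(1 for a, g in zip(actual, goal) if a != g)
    return mismatches + abs(len(actual) - len(goal))


def debug_systematically(program, run, goal, hypotheses):
    """Try one hypothesized fix at a time; keep it only if testing shows it helps."""
    program = list(program)  # work on a copy of the faulty program
    log = []
    for location, replacement in hypotheses:
        # Step 1: problem identification (compare actual and desired output).
        before = count_discrepancies(run(program), goal)
        if before == 0:
            break  # the program already reaches the goal state
        # Step 2: the hypothesis names the bug location. Step 3: bug correction.
        previous = program[location]
        program[location] = replacement
        # Step 4: evaluation immediately after the single repair attempt.
        after = count_discrepancies(run(program), goal)
        if after < before:
            log.append((location, replacement, "kept"))
        else:
            program[location] = previous  # undo the incorrect modification
            log.append((location, replacement, "undone"))
    return program, log


def running_totals(values):
    """Hypothetical 'program execution': output is the list of running totals."""
    totals, total = [], 0
    for value in values:
        total += value
        totals.append(total)
    return totals


fixed, log = debug_systematically(
    program=[1, 5, 3],          # faulty program: the second value is wrong
    run=running_totals,
    goal=[1, 3, 6],             # desired output
    hypotheses=[(2, 4), (1, 2)],  # (location, replacement) guesses, in order
)
print(fixed)  # [1, 2, 3]
print(log)    # [(2, 4, 'undone'), (1, 2, 'kept')]
```

In this sketch the first hypothesis does not reduce the discrepancies, so it is reverted before the second hypothesis is tried, which mirrors the recommendation to undo incorrect modifications before restarting the debugging cycle.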

Although the use of the systematic debugging steps can help develop students’ debugging skill (Michaeli & Romeike, 2019b; O’Dell, 2017), previous research (e.g., Böttcher et al., 2016) found that approximately half of the students could not effectively apply the systematic debugging approach in debugging tasks even after training. This may be partly due to the associated high cognitive burden, especially for novices (Zhong & Si, 2021). Novices’ cognitive burden in learning should be optimized in the design of effective teaching instructions (Klepsch & Seufert, 2020).

Cognitive Load Theory

Cognitive load theory (CLT) explains that students experience intrinsic cognitive load (ICL), extraneous cognitive load (ECL) and germane cognitive load (GCL) in learning (Van Gog et al., 2006). ICL is determined by the complexity of the learning content, students’ relevant expertise and the interaction between them (Sweller et al., 1998). The complexity of a task largely depends on the degree of element interactivity (Chen et al., 2015). Element interactivity is “a measure of informational complexity” (Sweller, 2020, p. 8). High element interactivity implies that students must simultaneously process a large number of interacting elements and their relationships, which is challenging (Van Merriënboer & Sweller, 2005). In contrast, low element interactivity means that students need to process merely a few interacting information elements at a time. For example, understanding the following script to draw a rectangle has high element interactivity: repeat 2 {move 30 steps; turn left 90°; move 60 steps; turn left 90°}. This script contains seven single elements that students must know first, namely, the repeat, move and turn left commands and the corresponding numbers: 2, 30, 90 and 60. Students must also simultaneously process the relationships between these elements to identify the rectangle obtainable by this script. A task’s element interactivity changes with the students’ expertise level (Sweller, 2020). Students with sufficient expertise can construct various cognitive schemata to store knowledge in their long-term memory. These students can more easily process numerous interacting elements as a few elements or a single one, thus relieving their working memory burden (Sweller, 2020). For tasks with high element interactivity, encouraging novices to build schemata to incorporate interacting elements can help reduce the associated ICL (Chen et al., 2015).
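To make the element interactivity of the rectangle script concrete, the following Python sketch traces the script step by step. It is only an illustration: the tuple-based command format, the function name run_script, and the starting position and heading are assumptions, and Scratch itself is a block-based language rather than Python.

```python
import math


def run_script(repeats, steps):
    """Trace a turtle-style script and return the points visited after each move."""
    x, y, heading = 0.0, 0.0, 0.0  # assumed start: origin, facing right
    points = [(0, 0)]
    for _ in range(repeats):
        for command, value in steps:
            if command == "move":
                x += value * math.cos(math.radians(heading))
                y += value * math.sin(math.radians(heading))
                points.append((round(x), round(y)))
            elif command == "turn_left":
                heading = (heading + value) % 360
    return points


# repeat 2 {move 30 steps; turn left 90°; move 60 steps; turn left 90°}
script = [("move", 30), ("turn_left", 90), ("move", 60), ("turn_left", 90)]
corners = run_script(2, script)
print(corners)  # [(0, 0), (30, 0), (30, 60), (0, 60), (0, 0)]
```

Predicting that the path closes into a 30 by 60 rectangle requires holding all seven elements and their interactions in mind at once, which is what the tracing loop does explicitly.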

Germane cognitive load is mental work directly contributing to learning. It can generally be induced by appropriate instructional activity design and used to comprehend required learning materials (Shadiev et al., 2015). Inducing GCL through appropriate teaching approaches (e.g., providing guidance, arranging different versions of a task in random order during practice rather than arranging one version of a task repeatedly) can motivate students to invest their cognitive resources in schema construction (Van Merriënboer & Sweller, 2005). Moreover, encouraging students to study expert solutions (e.g., program debugging strategies) can promote GCL to stimulate students to construct the relevant schemata (Van Gog et al., 2004). For example, demonstrating different program debugging tasks containing repeat instruction, repeat-until instruction, or the combination of loops instruction with the conditionals instruction in loops concept teaching may evoke GCL among students, helping them construct more accurate cognitive schemata of related concepts (Wang et al., 2018). Allowing students to observe experts’ debugging processes may also stimulate students to invest cognitive resources in developing a deeper understanding of problem-solving strategies. GCL is related to students’ efforts to acquire concepts and skills (Van Gog et al., 2006). Higher GCL means students put more effort into concept comprehension and skill acquisition.

Extraneous cognitive load is associated with the instructional format used in the teaching and learning process (Chen et al., 2015; Shadiev et al., 2015). Improper instructional design can impose excessive ECL on students, which may overburden students’ limited working memory and prevent them from engaging in meaningful learning processes such as information processing and schema construction (Van Merriënboer & Sweller, 2005). For example, explaining the loops concept only verbally, without demonstrating its construct and application on the computer, may increase the ECL experienced by students. Moreover, the program debugging process requires students to search through various possible strategies and solutions, identify an effective solution among these possibilities and implement the solution to move closer to the goal state. Students who lack support and use resource-intensive processes such as trial and error may experience high ECL (Van Merriënboer & Sweller, 2005). Excessive ECL negatively affects students’ learning, leading to longer learning time, lower learning performance or both (Wang et al., 2018). In contrast, providing explicit instructional guidance such as detailed demonstrations or appropriate debugging strategies can reduce the number of interacting elements relevant to problem solving that students need to process, and thus reduce the associated ECL (Sweller, 2020). Figure 2 shows the possible relationship between the various types of cognitive load in the domain of program debugging.

Figure 2.

Cognitive load relationships in program debugging (adapted from Garner, 2002).

Instructional Design on Program Debugging

To enhance elementary students’ debugging skills and optimize their different types of cognitive load, this study developed a flipped debugging teaching approach combined with the SDP method (Figure 1) and the modeling method and applied this flipped teaching in actual debugging teaching practice.

How may the systematic debugging process help?

The four steps included in the SDP can be presented as question prompts to scaffold novices’ debugging processes. The use of scaffolding to explain the essential problem-solving steps can enhance novices’ learning and balance their cognitive load (Shadiev et al., 2015). Scaffolding allows novices to undertake essential but easily ignored subtasks, helping them stay on track to solve problems. This enhances their understanding of the debugging process and promotes their strategic knowledge development (Kale & Yuan, 2021). Furthermore, the heuristic nature of stepwise approaches means that scaffolding does not guarantee a correct solution (Van Gog et al., 2004), which offers novices opportunities to process information and make inferences independently.

Furthermore, scaffolding that presents problem-solving steps may help balance the different types of cognitive load that novices experience when learning. First, it may help optimize ICL. Debugging requires novices to simultaneously handle various elements and their relationships (e.g., recalling programming concepts and structures, describing discrepancies, inferring and correcting errors), which gives debugging high element interactivity. The stepwise problem-solving approach can distribute the complexity of the task across different subtasks. By limiting the number of information elements, scaffolding allows novices to process fewer elements simultaneously in each subtask and thus minimize ICL (Van Merriënboer & Sweller, 2005). Second, scaffolding may reduce ECL. Weak problem-solving strategies (e.g., haphazard tinkering, trial and error) can be avoided by providing novices with problem-solving steps. Moreover, novices can consult the scaffolding tool to check subsequent steps and thus do not need to remember all the required steps, which relieves their cognitive burden and reduces ECL. Third, using the scaffolding tool may stimulate novices to establish useful cognitive schemata and thereby increase their perceived GCL. The variability of problem situations contributes to novices’ schema construction (Van Merriënboer & Sweller, 2005). Debugging diverse faulty programs using the scaffolding tool in debugging exercises may help novices become familiar with the systematic debugging process and grasp meaningful debugging strategies (e.g., debugging steps, the strategy of undoing incorrect revisions), thus building relevant cognitive schemata. Schema construction can improve novices’ expertise level and consequently reduce their perceived ICL (Sweller, 2020).

How may the modeling method help?

The modeling method helps novice students acquire domain knowledge and strategic knowledge by demonstrating the problem-solving process (Kale & Yuan, 2021). Similar to example-based learning (Van Gog et al., 2019), this method emphasizes examples (Kale et al., 2018), encouraging students to observe the problem-solving process that an experienced problem solver (e.g., a teacher) is performing (i.e., modeling examples) or has completed (i.e., worked-out examples). Modeling examples can be presented through live or video demonstrations (Van Gog et al., 2019; Kale et al., 2018). During demonstrations, the experienced problem solver can inform novices of effective problem-solving strategies and explain the thoughts involved in the reasoning activities and the rationale for selecting the solution steps (Kant et al., 2017). Such demonstrations can help novices fill their knowledge gaps and build useful cognitive representations of problem-solving behaviors (Van Gog & Rummel, 2010; Wittwer & Renkl, 2010). This will allow novices to apply the knowledge acquired to effectively address similar issues in later situations (Kant et al., 2017).

Based on the CLT perspective, listening to experts’ instructions and observing their actions can help novice students focus on crucial knowledge (e.g., domain concepts, strategy selection, rationales for decision-making) involved in solving problems (Van Gog et al., 2004). This may induce GCL in students by encouraging them to construct meaningful cognitive schemata concerning concepts and debugging strategies (Kant et al., 2017), which will decrease their ICL (Sweller, 2020). However, the distracting information included in face-to-face demonstrations may increase ECL because novices are likely to spend part of their cognitive resources on irrelevant details (e.g., the teacher’s tone of voice) and unclear explanations (Van Gog & Rummel, 2010). Moreover, the model’s behavior (natural or didactic) may affect ECL as perceived by students (Van Gog & Rummel, 2010). Compared with showing erroneous or ineffective solutions, demonstrating effective strategies and reasoning processes clearly can minimize novices’ ECL.

Why use the flipped classroom approach?

The flipped classroom approach comprises two stages, the self-learning stage and the in-class learning stage, and aims to increase active learning opportunities and students’ active participation (Gao & Hew, 2022). The self-learning stage allows students to learn at home at their own pace through pre-class video lectures and online quizzes. Video lectures enable students to pause or replay the video when they encounter difficult-to-understand content, which helps manage their working memory burden (Clark et al., 2005). The in-class learning stage focuses on active learning, engaging students in higher-order cognitive activities (Gao & Hew, 2022). These active learning activities (e.g., problem-solving debugging tasks) allow students to apply their knowledge to tackle problems (e.g., correct errors included in faulty programs), which improves their understanding of the learned concepts and promotes skill development (Deslauriers et al., 2019; Gao & Hew, 2022).

Research Questions

In this study, the experimental group employed the flipped debugging teaching approach combined with the SDP–modeling method, while the control group used the unassisted flipped debugging teaching approach (i.e., without the SDP–modeling method). The flipped classroom procedure was used for both groups to increase students’ active learning opportunities and active engagement. The effects of the SDP–modeling method on debugging learning of 158 fifth graders were measured. This study was guided by the following research questions:

1) What is the effect of flipped debugging training with the SDP–modeling method on elementary student performance in program debugging?

2) What is the effect of flipped debugging training with the SDP–modeling method on elementary student performance in three types of cognitive load?

Method

This study aimed to examine the effects of the flipped debugging teaching approach using the SDP–modeling method on student performance in program debugging and cognitive load. A quasi-experimental design with experimental and control groups was adopted. Student performance on four debugging tests (a pretest, midterm test, posttest and delayed test) was collected and analyzed. Data from a cognitive load questionnaire (CLQ) were also collected to explore the students’ three types of cognitive load in each lesson.

Participants

One hundred and fifty-eight fifth graders from a public elementary school in an Asian country participated in this study. We focused on elementary school students because starting programming education (including program debugging learning) in the early years of schooling is beneficial for students’ future learning (Bers et al., 2019). All participants had taken a 3-month programming course taught by the same teacher in the previous semester, in which they learned how to build programs but received no explicit instruction on debugging. A quasi-experimental design was used since this empirical study lacked random participant assignment (White & Sabarwal, 2014). Participant assignment to the experimental group and control group was determined solely through researcher selection. Two classes with 75 students were selected as the control group, while the remaining two classes with 83 students were chosen as the experimental group. The demographic information for both groups is shown in Table 1. Before the participants were exposed to the debugging teaching intervention, baseline data (i.e., student performance in program debugging) were collected to check whether any selection bias existed between the two groups (White & Sabarwal, 2014).

Table 1.

Demographic information for participants from two groups.

Group	N	Age, mean (SD)	Age range	Boys, N (%)	Girls, N (%)
Unassisted flipped debugging training group (CG)	75	10.33 (0.475)	10–11	33 (44.00)	42 (56.00)
Flipped debugging training group using the SDP–modeling method (EG)	83	10.19 (0.397)	10–11	41 (49.40)	42 (50.60)

Experimental Procedure

Figure 3 depicts the experimental procedure design. The students took a debugging pretest in the first week to gauge their initial debugging skill level. Afterward, the experimental and control groups attended four debugging training lessons (one lesson per week) under different teaching conditions. Between the second and third lessons, both groups took a debugging midterm test. After the lessons were completed, the students took a debugging posttest. One week later, they took a delayed debugging test.

Figure 3.

Research design diagram.

The lesson content and debugging tasks of debugging training (Figure 4) focused on fundamental concepts (e.g., sequences, loops, conditionals, variables) included in the K–12 CS framework (K–12 Computer Science Framework Steering Committee, 2016) and the CT literature (e.g., Tsai, 2019). The pre-class learning sessions helped familiarize students with domain knowledge (e.g., basic concepts), while the in-class sessions provided students with active learning opportunities (e.g., debugging exercises). Each participant was assigned a computer to watch the teacher’s demonstrations and debug faulty programs in Scratch (https://scratch.mit.edu/).

Figure 4.

Key learning content of program debugging lessons.

Design of flipped debugging training with the SDP–modeling method for the experimental group

Figure 5 shows the flipped debugging lesson schedule for the experimental group. In each lesson, the experimental group was required to watch the video lectures and complete the online quizzes before class through the learning management system (https://cas.xueleyun.com/). At the start of the face-to-face learning sessions, the teacher briefly reviewed essential preview learning content to activate students’ prior knowledge. The teacher then guided students by discussing high-error-rate quiz questions to eliminate misunderstandings.

Figure 5.

The flipped debugging lesson design for the experimental group.

Afterward, following the modeling method, the teacher presented a detailed demonstration of the program debugging process. Based on the scaffolding tool example illustrated (Figure 6), the teacher demonstrated the process of debugging a faulty program using the SDP approach (Figure 1) and explained why errors were solved with this approach. After the teacher demonstration, the students were given opportunities to apply the concepts and develop their debugging skills. They were asked to think independently about how to correct the given faulty program to achieve the required goals. During this period, the students were offered the scaffolding tool showing the SDP (Figure 7) to guide their problem-solving process. The students were not allowed to share ideas with peers, given that they needed to rate their perceived cognitive load in learning. Moreover, the teacher provided students with only technical support (e.g., the use of the programming tool) when necessary. After the debugging activity ended, the participants were required to complete the CLQ to rate the different types of cognitive load they perceived during learning.

Figure 6.

The scaffolding tool example used in the teacher demonstration in the second lesson.

Figure 7.

The scaffolding tool used for the experimental group in the third lesson.

Design of unassisted flipped debugging training for the control group

Figure 8 indicates the flipped debugging lesson schedule for the control group. As in the experimental group, the control group also needed to watch the same video lectures and complete the same online quizzes before class. The control group then reviewed the same key content and discussed the same quiz questions at the beginning of each lesson under the teacher’s guidance. The main difference in the activity arrangement between the control and experimental groups lies in the design of the teacher demonstration and problem-solving activity. Specifically, during the teacher’s brief demonstration, the control group observed only error repair demonstrations and the final error-correction results instead of the detailed debugging process (e.g., the use of the SDP approach, error reasoning processes). Moreover, in the problem-solving activity, the students were asked to complete debugging tasks without the scaffolding tool. Faulty programs used in teacher demonstrations and debugging exercises were the same as those for the experimental group. Similarly, the control group students could receive only technical support and were not allowed to share ideas with peers during debugging exercises. Finally, the students completed the CLQ to rate their perceived three types of cognitive load in learning.

Figure 8.

The flipped debugging lesson design for the control group.

Data Collection

Program debugging test

To assess the students’ overall performance in debugging, the teacher required them to debug faulty programs on computers. The students received 11 test tasks of different difficulty levels and were required to correct as many malfunctioning programs as possible within 20 minutes. Each task contained one faulty program with one to four injected logical errors. The faulty programs were designed based on examples from programming books and CT tests. The injected logical errors were based on bugs that the students had encountered during their previous programming lessons and on misconceptions commonly reported in the literature (e.g., Swidan et al., 2018). These errors (n = 27) were related to sequences (n = 6), loops (n = 9), parameter passing (n = 3), variables (n = 6) and conditionals (n = 3). Figure 9 and Figure 10 show the second posttest task, which contains sequence and loop errors.

Figure 9.

Example of the handbook page containing basic task information for the second task in the program debugging posttest.

Figure 10.

Example of the second task included in the program debugging posttest, including the faulty program, likely error options, and corresponding correction options for each error.

In each test, we provided students with a handbook covering essential task information (e.g., pictures showing the current and target running results of the faulty program, and the number of errors per faulty program) (see Figure 9). A second handbook containing the faulty programs (see Figure 10), likely error options (three to seven) and corresponding correction options (two to three) for each error was also provided for reference when the students got stuck.

The students were asked to take the program debugging test four times. The debugging midterm test, posttest and delay test were slightly modified based on the debugging pretest. Similar to the debugging pretest, each of the other three debugging tests contained 11 debugging tasks with 27 logical errors that needed to be fixed. Although the four tests were not identical, their content scope, item structures and difficulty level were similar.

Several measures were taken to enhance the validity and reliability of the program debugging tests. First, the tests were developed based on previous relevant work. Second, to prevent students from recalling pretest items and answers, the first author revised the debugging midterm test, posttest and delay test based on the pretest script. The test scripts and results of each test were also not discussed with or returned to the students, to avoid cheating. Third, the tests were refined based on experts’ feedback. All four debugging tests were developed by the first author, checked by two other experts with extensive programming experience, and revised according to their suggestions. The first author also invited an elementary Chinese language teacher with 30 years of teaching experience to comment on the task descriptions to ensure that elementary school students could understand them. Finally, the test procedure remained identical for the two groups at each measurement time point, since unstandardized test procedures may produce unreliable data.

Cognitive load questionnaire

To evaluate the cognitive load imposed on students by the two debugging training types, we used the CLQ to measure the three types of cognitive load that students perceived in each lesson. The CLQ was adapted from subjective rating scales developed by Leppink et al. (2013, 2014). Their rating scales can distinguish three types of cognitive load and have been widely used in the relevant literature (Mutlu-Bayraktar et al., 2019). The adapted CLQ fitted the debugging teaching context and included 13 items on three subscales. Specifically, one subscale with five items measured perceived ICL (e.g., The content of this debugging activity was very complex), while the other two, with four items each, assessed perceived ECL (e.g., The instructions and explanations during the activity were very unclear) and GCL (e.g., The activity really enhanced my understanding of the concepts and definitions). All questionnaire items were rated on a 7-point Likert scale, with 1 indicating ‘completely disagree’ and 7 indicating ‘completely agree’.
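The subscale structure described above can be illustrated with a minimal sketch. It assumes that each subscale score is the mean of its item ratings, which is consistent with the 1 to 7 range of the reported means but is not stated explicitly, and that the items appear in the order ICL, ECL, GCL. The item order, the `SUBSCALES` mapping and the `score_clq` helper are hypothetical.

```python
from statistics import mean

# Hypothetical item layout: the text gives only the subscale sizes
# (5 ICL, 4 ECL, 4 GCL items), not the order of the items on the form.
SUBSCALES = {
    "ICL": range(0, 5),
    "ECL": range(5, 9),
    "GCL": range(9, 13),
}


def score_clq(ratings):
    """Return the mean rating per subscale for one completed questionnaire."""
    if len(ratings) != 13:
        raise ValueError("expected 13 item ratings")
    if any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("ratings must lie on the 7-point Likert scale (1-7)")
    # Assumption: a subscale score is the average of its item ratings.
    return {name: mean(ratings[i] for i in items)
            for name, items in SUBSCALES.items()}


# Made-up ratings from one student, for illustration only.
example = [3, 4, 3, 2, 3, 2, 2, 1, 3, 5, 6, 5, 4]
print(score_clq(example))
```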

The first author developed the CLQ by referring to the widely applied scales designed by Leppink et al. (2013, 2014) and adapted all question items to better fit the program debugging context. This helps improve the validity and reliability of this measurement instrument. Before the actual implementation, the questionnaire was translated from English to Chinese by the first author and another bilingual native Chinese expert. The validity of the questionnaire translation was checked, and differences were resolved through discussions. All participants were asked to complete the CLQ on paper in each lesson, during which the teacher explained the rating items in detail to ensure that students fully understood the items. The participants handed in the CLQ immediately upon completion.

Data Analysis

Program debugging test

In each test, 27 logical errors were inserted into 11 faulty programs. To enhance the validity of the test and accurately assess students’ overall debugging performance, we developed our evaluation criteria based on similar assessment criteria used by other scholars (e.g., Fitzgerald et al., 2008). In addition to the number of corrected errors, the number of newly introduced errors was used as an indicator of students’ debugging skills. One point was counted for each error a student fixed and, separately, one point for each error a student introduced. The final test score was the number of corrected errors minus the number of newly introduced errors, with a maximum of 27 points. Approximately 32% of the debugging test scripts (N = 632 in total) were scored carefully by two raters (the first author and one expert with rich programming experience) to determine the reliability of the analysis. Cohen’s kappa coefficient was 0.984 (p < .001, 95% CI [0.980 to 0.988]), indicating excellent inter-rater reliability (Landis & Koch, 1977). The first author completed the remaining scoring task after resolving all differences.
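The scoring rule can be written as a short sketch. The function name `debugging_score` is hypothetical, and because the text does not say whether negative scores were floored at zero, the sketch leaves them unchanged.

```python
MAX_ERRORS = 27  # injected logical errors per test, as stated in the text


def debugging_score(corrected, introduced):
    """Score one test script: corrected errors minus newly introduced errors."""
    if not 0 <= corrected <= MAX_ERRORS:
        raise ValueError("corrected errors must be between 0 and 27")
    if introduced < 0:
        raise ValueError("introduced errors cannot be negative")
    # Assumption: no floor is applied, so a student who introduces more
    # errors than they fix would receive a negative score.
    return corrected - introduced


# A student who fixes 12 injected errors but introduces 2 new ones.
print(debugging_score(12, 2))
```

Under this rule the maximum of 27 is reached only when every injected error is fixed and no new error is introduced.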

A 2 × 4 mixed-design analysis of variance (ANOVA) was conducted to determine the effects of flipped debugging training using the SDP–modeling method on student performance over time. The within-subjects factor was the measurement time, with four levels (i.e., pretest, midterm test, posttest and delay test). The between-subjects factor was the debugging training type (i.e., flipped debugging training with the SDP–modeling method and unassisted flipped debugging training). The students’ test scores were used as the dependent variable. The debugging test data were normally distributed. Since Mauchly’s test showed that the test data violated the assumption of sphericity, we used the Greenhouse-Geisser correction when reporting relevant results. When the ANOVA yielded significant results (p < .05), mean score differences were examined further through Bonferroni-corrected pairwise comparisons to minimize the probability of a Type I error and improve the reliability of the analysis results. The partial eta squared statistic ( $η_{P}^{2}$ ) was computed to evaluate the practical significance of the results. Values of 0.01, 0.06 and 0.14 represent small, moderate and large effect sizes, respectively (Field, 2013).
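As an illustration of how $η_{P}^{2}$ relates to a reported F statistic, the sketch below uses the standard identity $η_{P}^{2}$ = (F × df_effect) / (F × df_effect + df_error), which follows from the definition SS_effect / (SS_effect + SS_error). The function names and the "negligible" label for values below 0.01 are assumptions made for this example. The input is the main effect of training type reported in the Results, F (1,156) = 15.615.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def effect_size_label(eta_p2):
    """Label an effect size using the 0.01 / 0.06 / 0.14 benchmarks (Field, 2013)."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "moderate"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # label assumed here; the text names only three levels


# Main effect of debugging training type, as reported in the Results.
eta = partial_eta_squared(15.615, 1, 156)
print(round(eta, 3), effect_size_label(eta))
```

The recovered value agrees with the reported $η_{P}^{2}$ of 0.091 for this effect.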

Cognitive load questionnaire

The internal consistency of the CLQ and its three subscales was estimated using Cronbach’s alpha. Across the four lessons, Cronbach’s alpha ranged from 0.854 to 0.890 for the full CLQ and from 0.766 to 0.867 for its three subscales (i.e., $α_{I C L}$ : 0.831 to 0.867; $α_{E C L}$ : 0.766 to 0.832; $α_{G C L}$ : 0.820 to 0.864). These values demonstrate the satisfactory reliability of the adapted CLQ and its three subscales. To answer the second research question, for each type of cognitive load (i.e., ICL, ECL and GCL), we conducted a two-factor mixed-design ANOVA to assess the effects of the two debugging training types on students’ ratings collected in each lesson. For each two-factor mixed-design ANOVA, the within-subjects factor was the measurement time, with four levels (i.e., the first, second, third and fourth lesson), while the between-subjects factor was the debugging training type. The students’ ratings on each subscale (i.e., ICL, ECL or GCL subscale) were the dependent variable. The cognitive load data satisfied the normality assumption. As with the debugging test data, Bonferroni-corrected pairwise comparisons were conducted if the ANOVA yielded significant results, and $η_{P}^{2}$ was used to represent the effect size.
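For readers unfamiliar with the statistic, Cronbach’s alpha can be computed from a respondents-by-items matrix with the standard formula α = k / (k − 1) × (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration; the `cronbach_alpha` helper and the ratings are made up and do not reproduce the study’s data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up ratings: four respondents answering a three-item subscale.
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [6, 6, 7],
    [3, 3, 2],
]
print(round(cronbach_alpha(ratings), 3))
```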

Results

Students’ Performance on Program Debugging Tests

A 2 (debugging training type) × 4 (measurement time) ANOVA revealed a significant repeated measurement main effect on the students’ test scores (F (2.741, 427.576) = 325.299, p < .001, $η_{P}^{2}$ = 0.676), showing that the students’ scores changed remarkably across the four measurement time points (all p < .001). The ANOVA also yielded a marked main effect of the debugging training type on student performance (F (1,156) = 15.615, p < .001, $η_{P}^{2}$ = 0.091), but this effect was qualified by an interaction between the debugging training type and measurement time (F (2.741, 427.576) = 12.028, p < .001, $η_{P}^{2}$ = 0.072).
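The fractional degrees of freedom above result from the Greenhouse-Geisser correction, which multiplies the uncorrected degrees of freedom by an estimate ε of the departure from sphericity. The sketch below shows the relationship for a design with four measurement times and 158 participants in two groups. The value of ε is not reported in the text; the value used here is back-calculated from the reported error degrees of freedom and should be read as an assumption. The function name is hypothetical.

```python
def gg_corrected_df(epsilon, n_levels, n_participants, n_groups):
    """Greenhouse-Geisser corrected df for the within-subjects effects of a mixed ANOVA."""
    df_effect = epsilon * (n_levels - 1)
    df_error = epsilon * (n_levels - 1) * (n_participants - n_groups)
    return df_effect, df_error


# Uncorrected df: 3 for the effect and 3 * 156 = 468 for the error term.
# Assumption: epsilon inferred from the reported error df of 427.576.
epsilon = 427.576 / 468
df_effect, df_error = gg_corrected_df(epsilon, n_levels=4,
                                      n_participants=158, n_groups=2)
print(round(epsilon, 3), round(df_effect, 3), round(df_error, 3))
```

With this ε, the corrected effect degrees of freedom round to the reported 2.741.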

The analysis of the simple main effect of the debugging training type indicated that the difference between the experimental and control groups in the pretest was not significant (F (1,156) = 0.007, p = .934, $η_{P}^{2}$ < 0.001), implying that the student groups had equivalent levels of debugging skills at the start of the intervention. However, significant differences in the results were found for each of the subsequent tests (Table 2), showing that at each measurement time point, the experimental group always obtained significantly higher scores than the control group. Moreover, the differences between the two groups in the posttest and delay test were larger than that in the midterm test.

Table 2.

Mean Scores (SD) on Each Debugging Test for the Two Groups, and Analysis Results of the Simple Main Effect of the Debugging Training Type.

Measurement Time	Unassisted Flipped Debugging Training, M (SD)	Flipped Debugging Training with the SDP–Modeling Method, M (SD)	F	p-value	$η_{P}^{2}$
Pretest	5.293 (2.091)	5.325 (2.705)	0.007	0.934	<0.001
Midterm test	7.600 (2.531)	8.639 (2.417)	6.955	0.009^**	0.043
Posttest	9.653 (2.560)	11.494 (2.587)	20.144	<0.001^***	0.114
Delay test	10.320 (3.724)	12.855 (3.045)	22.115	<0.001^***	0.124

Meanwhile, the simple main effect of measurement time revealed that the mean scores changed significantly as the debugging lessons proceeded for both the experimental group (F (3,154) = 194.319, p < .001, $η_{P}^{2}$ = 0.791) and the control group (F (3,154) = 83.495, p < .001, $η_{P}^{2}$ = 0.619). The score differences between adjacent measurement time points were compared using Bonferroni-corrected pairwise comparisons. For the experimental group, all three pairwise comparisons (pretest vs. midterm test, midterm test vs. posttest and posttest vs. delay test) revealed a significant increase in test scores (all p < .001). In contrast, the scores of the control group increased significantly from the pretest to the midterm test and from the midterm test to the posttest (both p < .001); however, the difference between the posttest and the delay test was not significant (p = .165).
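The Bonferroni correction used in these comparisons can be sketched as follows: each raw p-value is multiplied by the number of comparisons and capped at 1. The raw p-values below are made up for illustration and are not taken from the study, and the function name is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p-values for three adjacent-time-point comparisons.
raw = [0.0001, 0.0002, 0.04]
adjusted = bonferroni_adjust(raw)
for p_raw, p_adj in zip(raw, adjusted):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}: {verdict}")
```

In this made-up case the third comparison would pass an uncorrected .05 threshold but not the corrected one, which is how the correction limits Type I errors across multiple comparisons.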

Students’ Perceived Cognitive Load

To compare the two groups’ ratings of each type of cognitive load in each lesson, we focus on the analysis results concerning the debugging training type. The two-factor mixed-design ANOVAs uncovered significant main effects of the debugging training type on students’ ratings for ICL (F (3,154) = 13.319, p < .001, $η_{P}^{2}$ = 0.079), ECL (F (3,154) = 13.416, p < .001, $η_{P}^{2}$ = 0.079) and GCL (F (3,154) = 65.779, p < .001, $η_{P}^{2}$ = 0.297). These results indicated that, across the debugging training lessons, the experimental group experienced lower ICL and ECL but higher GCL than the control group.

The simple main effect of the debugging training type further showed that in the first lesson, there were no significant rating differences between the two groups for ICL (p = .083) nor for ECL (p = .566) (see Table 3). In contrast, the difference in GCL between the two groups was significant (p < .001, $η_{P}^{2}$ = 0.218), favoring the experimental group. The results mean that in the first lesson, both groups exhibited similar interactions between debugging task complexity and students’ debugging skills (Van Merriënboer & Sweller, 2005). Moreover, both groups experienced similar levels of cognitive load in processing instructional materials and tasks. However, the experimental group invested more effort in the schema construction concerning concepts and skills (Van Gog et al., 2006).

Table 3.

Descriptive Statistics of Cognitive Load Data, and Analysis Results of the Simple Main Effect of the Debugging Training Type for Each Type of Cognitive Load.

CL Type	Lesson	Unassisted Flipped Debugging Training, M (SD)	Flipped Debugging Training with the SDP–Modeling Method, M (SD)	F	p-value	$η_{P}^{2}$
ICL	First	3.387 (0.920)	3.113 (1.035)	3.052	0.083
	Second	4.789 (0.765)	4.446 (0.755)	8.059	0.005^**	0.049
	Third	4.317 (0.777)	3.918 (0.909)	8.716	0.004^**	0.053
	Fourth	4.875 (0.954)	4.448 (0.757)	9.776	0.002^**	0.059
ECL	First	2.177 (0.745)	2.108 (0.745)	0.330	0.566
	Second	2.293 (0.673)	2.054 (0.494)	6.562	0.011^*	0.040
	Third	2.270 (0.665)	1.952 (0.646)	9.301	0.003^**	0.056
	Fourth	2.233 (0.686)	1.846 (0.584)	14.661	<0.001^***	0.086
GCL	First	3.213 (0.867)	4.184 (0.971)	43.567	<0.001^***	0.218
	Second	3.603 (0.928)	4.377 (0.943)	26.881	<0.001^***	0.147
	Third	4.213 (0.913)	5.069 (0.688)	44.805	<0.001^***	0.223
	Fourth	4.097 (0.934)	5.124 (0.723)	60.334	<0.001^***	0.279

Note. CL = Cognitive load.

In subsequent lessons, the experimental group perceived significantly lower ICL (all p < .01) and ECL (all p < .05) than the control group (Table 3). At the same time, the experimental group experienced significantly higher GCL in learning program debugging (all p < .001). These results suggest that teaching students program debugging with the SDP–modeling method could help optimize the cognitive load induced by debugging learning by reducing their ECL and ICL while increasing their GCL.

Discussion

This study compared the effects of two debugging training types (i.e., flipped debugging training combined with the SDP–modeling method vs. unassisted flipped debugging training) on elementary student performance in debugging and cognitive load. The students in both training conditions learned the same concepts and completed the same debugging tasks.

Student Performance in Program Debugging

No significant difference in student performance between the two debugging training types was detected in the debugging pretest. This result is expected because none of the participants had received specific training on debugging skill development in either school or extracurricular courses before the study. However, both groups considerably improved their scores from the pretest to the midterm test and from the midterm test to the posttest. Thus, novices can obtain beneficial debugging experience by practicing how to debug programs containing various errors (Li et al., 2019).

Additionally, the students who attended the debugging training lessons based on the SDP–modeling method consistently performed better in the midterm test and posttest than those who attended the unassisted debugging lessons. Since the students in the control group neither received explicit instructions on the SDP nor learned from detailed teacher demonstrations, their typical approach was to construct basic schemata via random generation and test processes (Chen et al., 2015). That is, the students tended to fix errors by randomly generating possible error-correction solutions and checking their effectiveness. Generally, the unsuccessful modifications were jettisoned, and the successful ones were retained. Students may accumulate meaningful debugging experience by retaining some of the successful modifications in their long-term memory. However, owing to the lack of instructions, students may spend significant cognitive resources constructing accurate schemata (McLaren et al., 2016). This may explain why these students underperformed those in the experimental group in the last three tests.

In contrast, the application of the SDP and modeling method helped novices in the experimental group acquire domain and strategic knowledge (e.g., error-related knowledge, SDP, undo strategy) (Kale & Yuan, 2021). Specifically, through teacher demonstrations, these students learned the systematic debugging method and error reasoning process. In subsequent exercises, following the problem-solving steps provided by the scaffolding tool to debug faulty programs might enhance their familiarity with the SDP. Practice opportunities help students recognize deficiencies in their cognitive schemata, which may challenge them to invest more effort in knowledge acquisition (Kant et al., 2017). The emphasis on reverting wrong revisions can prevent novices from introducing new errors (Michaeli & Romeike, 2019b). This debugging training type, therefore, improves students’ learning outcomes. Based on their empirical study, Michaeli and Romeike (2019b) similarly concluded that compared with the use of the teaching approach of asking students to complete debugging exercises independently, the implementation of explicit teaching instructions on debugging has a more positive effect on student performance.

Unlike the control group, the experimental group performed significantly better in the delay test than in the posttest. Moreover, the score difference between the two groups in the delay test was larger than that in the posttest. Two reasons may explain why the experimental group outperformed the control group in the delay test. First, the students taught with the SDP–modeling method were likely to spend less time identifying discrepancies and locating and correcting errors when dealing with familiar debugging tasks. Second, by using meaningful cognitive schemata (e.g., SDP), these students might be able to narrow their error search and thus improve their debugging speed and accuracy when dealing with unfamiliar tasks (Carver & Risinger, 1987).

Student Performance in Three Types of Cognitive Load

In addition to helping students achieve higher learning performance, optimizing their use of cognitive resources in learning is vital (Klepsch & Seufert, 2020). The results show that the experimental group rated markedly higher scores in GCL than the control group in each lesson. This finding indicates that the SDP–modeling method evokes students’ GCL by motivating them to allocate more cognitive resources to build and process mental models (Klepsch & Seufert, 2020).

Specifically, in the modeling method, demonstrating each step of the SDP, accompanied by the teacher’s verbal explanation of solutions, seems to be effective for enhancing students’ understanding of the systematic problem-solving process. In addition to showing each problem-solving step and corresponding answers, the teacher explained the logic between these steps and the justifications for error inferences. Allowing students to learn expert information (i.e., ‘why’ and ‘how’ information) can help them acquire useful domain and strategic knowledge and facilitate their schema construction, which helps induce higher GCL (Klepsch & Seufert, 2020; McLaren et al., 2016). These schemata may be activated and serve as a guide for subsequent debugging tasks (Leppink et al., 2014). For example, schemata can direct students on how to devote most of their cognitive resources to processing meaningful debugging steps. However, unlike the experimental group, since the teacher merely showed the error correction process and the final error-repair results, the control group did not have the opportunity to learn expert information, which contributed minimally to schema building.

The scaffolding tool showing the SDP can direct students to concentrate on effective debugging steps in exercises. Scaffolding can guide students to understand problems, hypothesize possible errors based on discrepancies, narrow error searches and promptly undo problematic revisions. By encouraging students to use the recommended steps and strategies (e.g., undo strategy), the scaffolding tool helps them construct schemata to store knowledge in their long-term memory (Van Merriënboer & Sweller, 2005). Such knowledge may be activated in future debugging tasks. Moreover, the scaffolding tool allows students to write down intermediate results (discrepancies, likely errors, correction solutions) on paper. Requiring students to generate and record their own answers may increase their engagement in problem-solving processes (Chen et al., 2015) and help them recognize the relationships between identified discrepancies and corresponding errors and solutions. Additionally, displaying immediate execution results on screen after revisions can help students recognize potential relationships between correction solutions and program outcomes (Kale & Yuan, 2021). The formation of these patterns (e.g., relationships between the discrepancy and likely error, between the error correction and program result) may enhance students’ conceptual understanding and increase their error-reasoning experience. In contrast, owing to the lack of effective debugging strategies, students who solve problems without the help of the scaffolding tool may start debugging based on their intuition and generally rely on weak strategies (e.g., haphazard tinkering, trial and error). This may not be efficient for novices’ learning given that they typically have more difficulties remembering steps and revisions that work and constructing relevant mental models (Van Gog et al., 2019).

The application of the SDP–modeling method promotes schema construction and thus enables students in the experimental group to perceive higher GCL. Students’ schema construction is also reflected in their higher debugging test scores, consistent with the viewpoint that the facilitation of schema building can improve student performance (Klepsch & Seufert, 2020). In addition to improving students’ learning outcomes, schema construction and automation can decrease the ICL induced by debugging exercises. This study indicates that the rating difference in ICL between the two flipped debugging training types was significant in the second to fourth lessons. Moreover, although the difference in ICL between the two groups was not significant in the first lesson, the experimental group taught with the SDP–modeling method still reported a lower mean ICL in debugging exercises.

Students’ perceived ICL is determined by combining both the complexity of the debugging tasks (i.e., element interactivity) and the students’ relevant prior knowledge (i.e., expertise used for debugging) (Seufert, 2018). The SDP–modeling method encouraged students to organize and store useful information (e.g., SDP, error reasoning processes) in their long-term memory by constructing schemata. This increased their expertise in program debugging (Chen et al., 2015). In subsequent debugging tasks, directly retrieving already constructed task-relevant schemata from their long-term memory helped the students process several associated elements as one (Klepsch & Seufert, 2020). In other words, students with higher expertise process fewer interacting elements (corresponding to task complexity) and thereby experience lower ICL (Seufert, 2018; Thees et al., 2020). In contrast, owing to the lack of guidance on the systematic debugging strategy and the lack of teacher demonstrations, the schema construction process in the unassisted flipped debugging training group is less efficient. This leads to a lack of effective cognitive schemata in students of this group. Thus, the unassisted flipped debugging training group experienced higher ICL in each lesson, and the rating score difference between the two groups increased as the lessons progressed.

Similar to the ICL results, the students who attended debugging training lessons based on the SDP–modeling method perceived lower ECL in each lesson. Moreover, significant differences in the rating results between the two flipped debugging training groups occurred in the last three lessons. For the experimental group, teacher demonstrations (the use of the modeling method) provided the students with clear explanations and expertise in debugging (e.g., systematic steps and correct solutions, error reasoning). Moreover, the ‘how’ and ‘why’ information may obviate novices’ need to use weak debugging strategies in subsequent debugging practice and thus reduce their perceived ECL in learning (Klepsch & Seufert, 2020; Sweller, 2020). Furthermore, to avoid natural teaching behaviors (e.g., demonstrating erroneous solutions inadvertently), the teacher adopted a didactical demonstration approach, such as clearly explaining effective strategies and sequentially displaying each step of the SDP and corresponding correct answers on a slide. This helped decrease the students’ cognitive resources wasted in face-to-face demonstrations.

Additionally, the scaffolding tool used in exercises allowed the students to debug programs based on the systematic steps, which enabled them to avoid unnecessary strategy searches and the use of weak debugging strategies. Moreover, the scaffolding tool relieved the students’ limited working memory by requiring them to write down answers to each debugging step (e.g., discrepancies, likely errors) on paper (Perkins, 1993). Both benefits contributed to the decrease in ECL.

In contrast, in debugging training lessons without the SDP–modeling method, the teacher merely demonstrated error repairs and the final program outcome after repairing all errors. The teacher did not offer the students details on how to infer and locate errors and how to propose correction solutions. Unclear explanations might distract novices and waste their cognitive resources (Van Gog & Rummel, 2010). Additionally, the students did not learn the systematic debugging strategy and did not have the support of the scaffolding tool in exercises. These novices therefore needed to identify appropriate strategies and solutions among various possibilities, which may have imposed excessive ECL on them (Van Merriënboer & Sweller, 2005). Moreover, owing to the lack of opportunities for learning useful strategic knowledge, these novices tended to rely more on weak problem-solving strategies (e.g., haphazard tinkering, trial and error) in debugging exercises. Such strategies are neither effective nor efficient and can impose high ECL on novices (Klepsch & Seufert, 2020; Van Gog et al., 2006).

Implications for Related Research and Educators

This study has important implications for research and teaching practice on program debugging. First, this research enriches related empirical studies on program debugging teaching. Empirical research exploring effective instructional approaches concerning program debugging is lacking in CT literature (Michaeli & Romeike, 2019a; Rich et al., 2019). This study helps to fill this research gap. We developed the flipped debugging teaching approach combined with the SDP–modeling method. Moreover, we used a quasi-experimental design with experimental and control groups to examine to what extent this approach can improve elementary school students’ debugging performance and optimize their cognitive load. The quantitative data collected (i.e., program debugging tests, cognitive load questionnaire) supported the effectiveness of the flipped debugging teaching approach combined with the SDP–modeling method. This study helps researchers develop an initial understanding of an effective instructional approach to program debugging. It may also encourage other researchers to refine this flipped debugging teaching approach (using the SDP–modeling method) or to conduct empirical studies exploring other innovative instructional approaches to program debugging.

Second, this study has essential implications for actual teaching practices. K–12 CT education seldom attaches importance to explicit teaching instruction concerning program debugging (Michaeli & Romeike, 2019b). Classroom teachers tend to leave students alone with error-correction tasks or directly give them error-correction solutions due to the lack of appropriate teaching approaches and active learning opportunities for students (Michaeli & Romeike, 2019b). This study demonstrated in detail how to implement teaching instruction in an actual program debugging course in an elementary school to address the issues identified in the related literature (i.e., poor student performance in domain knowledge and strategic knowledge, and the heavy cognitive burden imposed by debugging learning). Specifically, the flipped classroom method was used to increase active learning opportunities and foster students’ active participation. The SDP was embedded in the scaffolding tool to help novice students become familiar with the systematic program debugging process and obtain strategic knowledge. The modeling method allowed novice students to observe the whole program debugging process and learn the expertise (e.g., ‘why’ and ‘how’ information) that helps them acquire domain knowledge and strategic knowledge (Kale & Yuan, 2021). Moreover, the SDP and modeling methods helped optimize the three types of cognitive load that students perceived. The application of the flipped debugging teaching approach combined with the SDP–modeling method in this study provides a valuable reference for other educators and practitioners. Additionally, our results support the efficacy of this approach for elementary school students’ program debugging learning, which enriches the instructional resources available for program debugging training.

Conclusion and Limitations

Novices who learn to program are often frustrated by errors (Michaeli & Romeike, 2019b). They generally lack domain knowledge and strategic knowledge used for debugging. Debugging programs can bring a great cognitive burden to novices (Zhong & Si, 2021), given that it is a cognitively demanding task. Novices can rarely improve their debugging skills by merely learning how to program (Fitzgerald et al., 2010). Therefore, effective instructional approaches should be developed and adopted to enhance novices’ debugging skills (Michaeli & Romeike, 2019b) and meanwhile optimize their cognitive load.

This study designed a systematic debugging framework containing four essential steps and examined the effects of applying the flipped systematic debugging teaching approach with the SDP–modeling method on elementary students’ debugging learning and cognitive load. The results of our empirical study indicated that the flipped debugging training approach with the SDP–modeling method promotes effective (higher debugging performance) and active (higher GCL) learning among novices compared with the unassisted flipped debugging training approach. Specifically, the application of the SDP–modeling method enhanced novices’ program debugging abilities. Moreover, this debugging training approach encouraged novices to engage in schema construction processes and thus to increase their investment in GCL, which in turn helped reduce their perceived ICL. This approach also reduced the ECL that students perceived in learning.

Two limitations may affect the generalizability of this study’s results. First, only fifth graders were recruited, so the results may not transfer to other grade levels. Future empirical studies should involve other sample groups, such as junior or senior high school students or even university students, to explore how flipped debugging training with the SDP–modeling method affects student learning. Second, the self-report method used for measuring the different types of cognitive load needs to be tested in more empirical studies. Future studies should adopt objective measurement methods to yield more reliable and valid evaluations of students’ cognitive load. Despite these limitations, considering the importance of debugging skills in CS and CT education and the difficulties teachers encounter in instructional design, this study provides a valuable reference on how to design and implement an effective debugging teaching approach that improves students’ debugging skills and optimizes the three types of cognitive load.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Xuemin Gao

Khe Foon Hew

Author Biographies

Xuemin Gao holds a PhD degree in Education from The University of Hong Kong. She also holds two MEd degrees, one in Science and Technology Education from Beijing Normal University and one in Science and Environment Education from The Education University of Hong Kong. Her research interests include computational thinking, programming education, blended-learning environments, and instructional design.

Khe Foon Hew is an associate professor in the Faculty of Education at The University of Hong Kong. His primary research interests are online pedagogy, instructional design, and e-learning/blended-learning environments.

References

Akcaoglu & Koehler, M. J. (2014). Cognitive outcomes from the game-design and learning (GDL) after-school program. Computers & Education, 75, 72–81. https://doi.org/10.1016/j.compedu.2014.02.003

Bers, M. U., Flannery, Kazakoff, E. R., & Sullivan (2014). Computational thinking and tinkering: Exploration of an early childhood robotics curriculum. Computers & Education, 72, 145–157. https://doi.org/10.1016/j.compedu.2013.10.020

Bers, M. U., González-González, & Armas-Torres, M. B. (2019). Coding as a playground: Promoting positive learning experiences in childhood classrooms. Computers & Education, 138, 130–145. https://doi.org/10.1016/j.compedu.2019.04.013

Böttcher, Thurner, Schlierkamp, & Zehetmeier (2016, October). Debugging students’ debugging process. Proceedings of the 2016 IEEE Frontiers in education conference (FIE), Erie, PA, 12–16 October, 2016 (pp. 1–7). IEEE. https://doi.org/10.1109/FIE.2016.7757447

Carver, M. S., & Risinger, S. C. (1987). Improving children’s debugging skills. In Olson, G. M., Sheppard, & Soloway (Eds.), Empirical studies of programmers: Second workshop (pp. 147–171). Ablex Publishing Corp. https://doi.org/10.5555/54968.54978

Chen, M. W., C. C., & Lin, Y. T. (2013). Novices’ debugging behaviors in VB programming. Proceedings of the 2013 learning and teaching in computing and engineering, Washington, DC, 21–24 March, 2013 (pp. 25–30). IEEE. https://doi.org/10.1109/LaTiCE.2013.38

Chen, Kalyuga, & Sweller (2015). The worked example effect, the generation effect, and element interactivity. Journal of Educational Psychology, 107(3), 689–704. https://doi.org/10.1037/edu0000018

Chiu, C. F., & Huang, H. Y. (2015). Guided debugging practices of game based programming for novice programmers. International Journal of Information and Education Technology, 5(5), 343–347. https://doi.org/10.7763/IJIET.2015.V5.527

Clark, R. C., Nguyen, & Sweller (2005). Efficiency in learning: Evidence-based guidelines to manage cognitive load. Pfeiffer.

Deslauriers, McCarthy, L. S., Miller, Callaghan, & Kestin (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences of the United States of America, 116(39), 19251–19257. https://doi.org/10.1073/pnas.1821936116

Emara, Grover, Hutchins, Biswas, & Snyder (2020). Examining students’ debugging and regulation processes during collaborative computational modeling in science. In Gresalfi & Horn, I. S. (Eds.), The interdisciplinarity of the learning sciences, 14th International conference of the learning sciences (pp. 1325–1332). International Society of the Learning Sciences. https://repository.isls.org//handle/1/6332

Falloon (2016). An analysis of young students’ thinking when completing basic coding tasks using Scratch Jnr. on the iPad. Journal of Computer Assisted Learning, 32(6), 576–593. https://doi.org/10.1111/jcal.12155

Field (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Sage.

Fitzgerald, Lewandowski, McCauley, Murphy, Simon, Thomas, & Zander (2008). Debugging: Finding, fixing and flailing, a multi-institutional study of novice debuggers. Computer Science Education, 18(2), 93–116. https://doi.org/10.1080/08993400802114508

Fitzgerald, McCauley, Hanks, Murphy, Simon, & Zander (2010). Debugging from the student perspective. IEEE Transactions on Education, 53(3), 390–396. https://doi.org/10.1109/TE.2009.2025266

Franklin, Weintrop, Palmer, Coenraad, Cobian, Beck, & Crenshaw (2020, February). Scratch Encore: The design and pilot of a culturally-relevant intermediate Scratch curriculum. In Proceedings of the 51st ACM technical symposium on computer science education, Portland OR, 11–14 March 2020 (pp. 794–800). ACM. https://doi.org/10.1145/3328778.3366912

Gao & Hew, K. F. (2022). Toward a 5E-based flipped classroom model for teaching computational thinking in elementary school: Effects on student computational thinking and problem-solving performance. Journal of Educational Computing Research, 60(2), 512–543. https://doi.org/10.1177/07356331211037757

Garner (2002). Reducing the cognitive load on novice programmers. In Barker & Rebelsky (Eds.), World conference on educational multimedia, hypermedia & telecommunications (pp. 578–583). Association for the Advancement of Computing in Education (AACE). https://www.learntechlib.org/primary/p/10329/

K–12 Computer Science Framework Steering Committee (2016). K–12 computer science framework. http://www.k12cs.org

Kale, Akcaoglu, Cullen, Goh, Devine, Calvert, & Grise (2018). Computational what? Relating computational thinking to teaching. TechTrends, 62(6), 574–584. https://doi.org/10.1007/s11528-018-0290-9

Kale & Yuan (2021). Still a new kid on the block? Computational thinking as problem solving in Code.org. Journal of Educational Computing Research, 59(4), 620–644. https://doi.org/10.1177/0735633120972050

Kant, J. M., Scheiter, & Oschatz (2017). How to sequence video modeling examples and inquiry tasks to foster scientific reasoning. Learning and Instruction, 52, 46–58. https://doi.org/10.1016/j.learninstruc.2017.04.005

Kim, Yuan, Vasconcelos, Shin, & Hill, R. B. (2018). Debugging during block-based programming. Instructional Science, 46(5), 767–787. https://doi.org/10.1007/s11251-018-9453-5

Klepsch & Seufert (2020). Understanding instructional design effects by differentiated measurement of intrinsic, extraneous, and germane cognitive load. Instructional Science, 48(1), 45–77. https://doi.org/10.1007/s11251-020-09502-9

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174. https://doi.org/10.2307/2529310

Leppink, Paas, Van der Vleuten, C. P., Van Gog, & Van Merriënboer, J. J. (2013). Development of an instrument for measuring different types of cognitive load. Behavior Research Methods, 45(4), 1058–1072. https://doi.org/10.3758/s13428-013-0334-1

Leppink, Paas, Van Gog, Van der Vleuten, C. P., & Van Merriënboer, J. J. (2014). Effects of pairs of problems and examples on task performance and different types of cognitive load. Learning and Instruction, 30, 32–42. https://doi.org/10.1016/j.learninstruc.2013.12.001

Chan, Denny, Luxton-Reilly, & Tempero (2019, January). Towards a framework for teaching debugging. Proceedings of the twenty-first Australasian Computing Education Conference, Sydney NSW, 29–31 January, 2019 (pp. 79–86). ACM. https://doi.org/10.1145/3286960.3286970

Lin, Y. T., C. C., Hou, T. Y., Lin, Y. C., Yang, F. Y., & Chang, C. H. (2016). Tracking students’ cognitive processes during program debugging: An eye-movement approach. IEEE Transactions on Education, 59(3), 175–186. https://doi.org/10.1109/TE.2015.2487341

Liu, Zhi, Hicks, & Barnes (2017). Understanding problem solving behavior of 6–8 graders in a debugging game. Computer Science Education, 27(1), 1–29. https://doi.org/10.1080/08993408.2017.1308651

Lye, S. Y., & Koh, J. H. L. (2014). Review on teaching and learning of computational thinking through programming: What is next for K-12? Computers in Human Behavior, 41, 51–61. https://doi.org/10.1016/j.chb.2014.09.012

McLaren, B. M., van Gog, Ganoe, Karabinos, & Yaron (2016). The efficiency of worked examples compared to erroneous examples, tutored problem solving, and problem solving in computer-based learning environments. Computers in Human Behavior, 55, 87–99. https://doi.org/10.1016/j.chb.2015.08.038

Michaeli & Romeike (2019a, April). Current status and perspectives of debugging in the K12 classroom: A qualitative study. In Proceedings of 2019 IEEE global engineering education conference (pp. 1030–1038). IEEE. https://doi.org/10.1109/EDUCON.2019.8725282

Michaeli & Romeike (2019b, October). Improving debugging skills in the classroom: The effects of teaching a systematic debugging process. Proceedings of the 14th workshop in primary and secondary computing education, Glasgow Scotland UK, 23–25 October, 2019 (pp. 1–7). ACM. https://doi.org/10.1145/3361721.3361724

Murphy, Lewandowski, McCauley, Simon, Thomas, & Zander (2008). Debugging: The good, the bad, and the quirky – a qualitative analysis of novices’ strategies. ACM SIGCSE Bulletin, 40(1), 163–167. https://doi.org/10.1145/1352322.1352191

Mutlu-Bayraktar, Cosgun, & Altan (2019). Cognitive load in multimedia learning environments: A systematic review. Computers & Education, 141, 103618. https://doi.org/10.1016/j.compedu.2019.103618

O’Dell, D. H. (2017). The debugging mind-set. Communications of the ACM, 60(6), 40–45. https://dl.acm.org/doi/10.1145/3052939

Papert (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books, Inc.

Pea, R. D. (1986). Language-independent conceptual ‘bugs’ in novice programming. Journal of Educational Computing Research, 2(1), 25–36. https://doi.org/10.2190/689T-1R2A-X4W4-29J2

Perkins, D. N. (1993). Person-plus: A distributed view of thinking and learning. In Salomon (Ed.), Distributed cognitions: Psychological and educational considerations (pp. 88–110). Cambridge University Press.

Rich, K. M., Strickland, Binkowski, T. A., & Franklin (2019, February). A K–8 debugging learning trajectory derived from research literature. Proceedings of the 50th ACM Technical Symposium on Computer Science Education, Minneapolis, MN, 27 February–2 March, 2019 (pp. 745–751). ACM. https://doi.org/10.1145/3287324.3287396

Seufert (2018). The interplay between self-regulation in learning and cognitive load. Educational Research Review, 24, 116–129. https://doi.org/10.1016/j.edurev.2018.03.004

Shadiev, Hwang, W. Y., Huang, Y. M., & Liu, T. Y. (2015). The impact of supported and annotated mobile learning on achievement and cognitive load. Educational Technology & Society, 18(4), 53–69. http://www.jstor.org/stable/jeductechsoci.18.4.53

Sweller (2020). Cognitive load theory and educational technology. Educational Technology Research and Development, 68(1), 1–16. https://doi.org/10.1007/s11423-019-09701-3

Sweller, Van Merriënboer, J. J. G., & Paas (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10, 251–296. https://doi.org/10.1023/A:1022193728205

Swidan, Hermans, & Smit (2018). Programming misconceptions for school students. Proceedings of the 2018 ACM Conference on International Computing Education Research, Espoo, Finland, 13–15 August, 2018 (pp. 151–159). ACM. https://doi.org/10.1145/3230977.3230995

Thees, Kapp, Strzys, M. P., Beil, Lukowicz, & Kuhn (2020). Effects of augmented reality on learning and cognitive load in university physics laboratory courses. Computers in Human Behavior, 108, 106316. https://doi.org/10.1016/j.chb.2020.106316

Tsai, C. Y. (2019). Improving students’ understanding of basic programming concepts through visual programming language: The role of self-efficacy. Computers in Human Behavior, 95, 224–232. https://doi.org/10.1016/j.chb.2018.11.038

Van Merriënboer, J. J., & Sweller (2005). Cognitive load theory and complex learning: Recent developments and future directions. Educational Psychology Review, 17(2), 147–177. https://doi.org/10.1007/s10648-005-3951-0

Van Gog, Paas, & Van Merriënboer, J. J. (2004). Process-oriented worked examples: Improved transfer performance through enhanced understanding. Instructional Science, 32, 83–98. https://doi.org/10.1023/B:TRUC.0000021810.70784.b0

Van Gog, Paas, & Van Merriënboer, J. J. (2006). Effects of process-oriented worked examples on troubleshooting transfer performance. Learning and Instruction, 16(2), 154–164. https://doi.org/10.1016/j.learninstruc.2006.02.003

Van Gog & Rummel (2010). Example-based learning: Integrating cognitive and social-cognitive research perspectives. Educational Psychology Review, 22(2), 155–174. https://doi.org/10.1007/s10648-010-9134-7

Van Gog, Rummel, & Renkl (2019). Learning how to solve problems by studying examples. In Dunlosky & Rawson, K. A. (Eds.), The Cambridge handbook of cognition and education (pp. 183–208). Cambridge University Press.

Vessey (1985). Expertise in debugging computer programs: A process analysis. International Journal of Man–Machine Studies, 23(5), 459–494. https://doi.org/10.1016/S0020-7373(85)80054-7

Wang, Fang, & Miao (2018). Learning performance and cognitive load in mobile learning: Impact of interaction complexity. Journal of Computer Assisted Learning, 34(6), 917–927. https://doi.org/10.1111/jcal.12300

White & Sabarwal (2014). Quasi-experimental design and methods. Methodological Briefs: Impact Evaluation, 8, 1–16. https://www.unicef-irc.org/KM/IE/img/downloads/Quasi-Experimental_Design_and_Methods_ENG.pdf

Wittwer & Renkl (2010). How effective are instructional explanations in example-based learning? A meta-analytic review. Educational Psychology Review, 22, 393–409. https://doi.org/10.1007/s10648-010-9136-5

Y. L., Ruis, A. R., & Wang, M. H. (2019). Analysing computational thinking in collaborative programming: A quantitative ethnography approach. Journal of Computer Assisted Learning, 35(3), 421–434. https://doi.org/10.1111/jcal.12348

Yoon, B. D., & Garcia, O. N. (1998). Cognitive activities and support in debugging. Proceedings of the fourth annual symposium on human interaction with complex systems, 22–25 March, 1998, Dayton, OH (pp. 160–169). IEEE. https://doi.org/10.1109/HUICS.1998.659974

Zhong & Si (2021). Troubleshooting to learn via scaffolds: Effect on students’ ability and cognitive load in a robotics course. Journal of Educational Computing Research, 59(1), 95–118. https://doi.org/10.1177/0735633120951871