Effects of Game-Based Learning on Students’ Computational Thinking: A Meta-Analysis

Abstract

This meta-analysis determined game-based learning’s (GBL) overall effect on students’ computational thinking (CT) and tested for moderators, using 28 effect sizes from 24 studies of 2,134 participants. The random effects model results showed that GBL had a significant positive overall effect on students’ CT (g = 0.677, 95% confidence interval 0.532–0.821) with significant heterogeneity among effect sizes. Among game types, role-playing yielded the largest GBL effect size, followed by action, puzzles, and adventures. Moreover, the effect of GBL on CT was weaker among students in countries that were more individualistic than others. Lastly, interventions between four hours and one week showed the largest GBL effect size, followed by those over four weeks, up to four hours, and between one week and four weeks.

Keywords

game-based learning, computational thinking, meta-analysis, moderating analysis, effect size

Introduction

Students’ game-based learning (GBL) is linked to their greater mathematics achievement (meta-analyses show effect sizes of 0.13 in Tokac et al., 2019, and 0.37 in Byun & Joung, 2018), and GBL’s close fit with computational thinking (CT) might yield an even larger effect. To explain why GBL might especially suit learning CT, we map GBL processes (complex story, rules, goals/subgoals, autonomy, feedback, tries; Burke, 2014) on to CT processes (complex simulation, problem solving, sequence/algorithm, conditional logic, loop, debug; Brennan & Resnick, 2012). As past studies of GBL and CT show mixed results (e.g., positive effect: Du, 2020; negative effect: Lee et al., 2014), we meta-analyze 28 effect sizes from 24 studies of 2,134 participants.

Theoretical Framework

After defining GBL and CT, we explicate how GBL maps on to CT, and discuss how various factors might affect GBL’s effects on CT. GBL is learning by playing a game within a complex story with rules. Students achieve an overall goal by attaining subgoals, using their autonomy to make decisions, receiving feedback on the outcomes of their decisions, and making further tries if needed (Burke, 2014). For example, in a treasure hunt game, a student helps a turtle look for two treasure chests with three maps (one real, two fake), starting at an origin. This complex story (treasure hunt) has rules (turtle turns and moves forward), an overall goal (get two treasures) and several subgoals (determine real map, go to location, dig, get treasure). The student has autonomy to make decisions (use which map?), receives feedback (digging at this location uncovers no treasure chest), and can try again (use another map). Lastly, the results appear on a leaderboard, which provides a status award to high scorers (extrinsic/instrumental motivation in self-determination theory; Ryan & Deci, 2017).

Papert (1980) introduced CT, and Wing (2006) defined it: “computational thinking involves solving problems, designing systems, and understanding human behavior, by drawing on the concepts fundamental to computer science.” Key CT concepts and skills include simulation, problem solving, sequences/algorithms, loops, conditional logic, and debugging (Brennan & Resnick, 2012).

GBL Processes Map on to CT Processes

Applying GBL to learning CT, a student writes a Logo computer program to ensure that the turtle finds the two treasure chests, developing CT knowledge and skills en route. To succeed within the complex treasure hunt story, a student’s Logo program simulates turtle decisions and actions, solving problems along the way (such as how to find the real map). As a student has autonomy to decide which map to use first, (s)he might start with the treasure chest location closest to the origin (of all three maps), and write the Logo program accordingly. To achieve subgoals, the computer program includes sequences/algorithms such as moving from the current location latitude to the treasure chest location latitude on the current map: (a). compute steps from current location latitude to treasure location latitude, (b). turn turtle in that direction, (c). move this number of steps (latitude motion). After the turtle digs at a location, the game gives feedback: (no) treasure chest here. Accounting for different possible feedback, a student uses if-then statements (conditional logic) to determine the next sequence of actions: if found treasure chest, go to next treasure chest location on map; else get another map. As some actions are repeated, a student can write loop statements: repeat algorithm until correct map is found. Typically, the first program fails. However, the game allows multiple tries, so the student can scrutinize the program instructions and consequent game actions, identify its flaw(s) and revise accordingly (debug). Although well-designed GBL for CT can yield many benefits, a good GBL design is non-trivial, so poorly designed GBLs can be too (a). difficult and frustrate students or (b). easy and bore students—both of which can demotivate students (Burke, 2014; Ryan & Deci, 2017).
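The treasure hunt program described above can be sketched in code. The following is a minimal illustration in Python rather than Logo, and every name in it (`TreasureMap`, `steps_between`, `hunt`, the coordinates) is a hypothetical stand-in rather than part of any game in the reviewed studies. It shows how the sequence (compute steps, then move), the conditional (react to the dig feedback), and the loop (repeat until the real map is found) fit together.

```python
from dataclasses import dataclass


@dataclass
class TreasureMap:
    name: str
    chests: list  # (x, y) locations marked on this map


def steps_between(current, target):
    """Sequence/algorithm: steps needed along each axis to reach the target."""
    return target[0] - current[0], target[1] - current[1]


def hunt(maps, buried, origin=(0, 0)):
    """Try each map in turn until one leads to every buried chest."""
    tries = 0
    for candidate in maps:                  # loop: repeat until the real map is found
        tries += 1
        position = origin
        found = []
        for spot in candidate.chests:
            dx, dy = steps_between(position, spot)
            position = (position[0] + dx, position[1] + dy)  # turn and move
            if position in buried:          # conditional logic on the dig feedback
                found.append(position)
            else:
                break                       # no chest here: get another map
        if len(found) == len(buried):
            return candidate.name, found, tries
    return None, [], tries
```

For example, with two fake maps and one real map, `hunt` returns the name of the real map, the chest locations found, and the number of maps tried. A student debugging a failed first attempt would inspect exactly these intermediate values.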

As many GBL processes map on to CT processes, we propose two hypotheses. First, GBL is more effective than traditional instruction at helping students learn CT.

H-1a. GBL outperforms traditional instruction at increasing students’ CT.

Second, the close mapping of GBL on to CT yields larger effect sizes than GBL on other subjects such as mathematics.

H-1b. GBL’s effect size on CT is larger than its effect size on mathematics.

Differences in GBL’s effects on CT

Game intervention design (game type, intervention duration), demographic factors (individualism, grade level/age) or measures (control group, CT assessment, instrumental validity) might account for differences in GBL effects on CT across studies.

Game Intervention

Game types might yield different GBL effects on CT. Specifically, simulations or role-playing GBL might outperform other types of GBL at improving student learning of CT or other subjects. As CT often involves simulations, simulation games include instant feedback, explicit encouragement for multiple tries, and supportive structures for sequences of steps (algorithms), which approximate and support CT more than other games (Liu, 2019). Meanwhile, when students role play, especially adult roles, they try to think and act like older people with superior knowledge and skills, which can yield higher performance than otherwise (Vygotsky, 2016). Such role play might improve students’ CT. Indeed, Yildiz et al. (2017) showed that algorithmic thinking was highest among students who played simulation games, followed by those who played role-playing games, adventure games, and action games.

Hence, we propose the following hypotheses.

H-2a. Students in simulation GBL outperform all other students at CT.

H-2b. Except for students in simulation GBL, students in role-playing GBL outperform all other students at CT.

Neither the shortest nor the longest durations of game-based intervention typically yield the best results. Although longer interventions (e.g., days rather than minutes) can increase their cumulative intensity to yield larger results (Ross & Begeny, 2015), interventions across longer time periods (e.g., weeks) can be diluted by other events to yield weaker results (Nahmias et al., 2019). Together, these results suggest that an intermediate duration yields a peak effect (e.g., days; Racey et al., 2016).

H-3. GBL interventions lasting days outperform those lasting minutes and those lasting weeks.

Demographics

Demographics (culture, grade/age) might also affect GBL effects on CT. Cultural values differ across countries and might moderate GBL’s motivation effects on CT. Unlike people in individualistic cultures (e.g., Canada, Germany), those in collectivist ones (e.g., China, Korea) value group goals more than individual goals (Hofstede, 2019), and rely more on nearby, extended family members (Shah, 2015). As parents in collectivist cultures tell their children that their learning outcomes at school affect their family status and reputation (extrinsic motivation, Ryan & Deci, 2017), these children value extrinsic motivation more, compared to those in individualistic cultures (Ni et al., 2010). Hence, students with more extrinsic motivation have lower achievement in individualistic countries but not collectivist ones (Chiu & Chow, 2010; D’Ailly, 2003). As GBL uses both extrinsic motivation (leaderboard, immediate feedback; Burke, 2014) and intrinsic motivation (autonomy, subgoals; Ryan & Deci, 2017), GBL might raise motivations and CT more for students in collectivist countries than in individualistic ones.

H-4. GBL effects on CT are greater in countries that are collectivist.

Older students have more knowledge and skills than younger students do (Daniel & Gagnon, 2011), so they might capitalize on them to adapt to GBL more quickly and learn more CT (Mao et al., 2022). Thus, we expect GBL to improve the CT of older students more than the CT of younger students.

H-5. GBL effects on CT are greater for older students than for younger students.

Measures

Measures (control group condition, CT concepts vs. skills, instrument reliability) might also affect GBL effects on CT. Control groups in studies: (a). do not exist, (b). receive active instruction, (c). receive passive instruction, or (d). receive unreported instruction. As pre- and post-test studies without control groups ignore typical improvement over time via traditional instruction, they often overestimate treatment effects, compared to studies with both experimental and control groups (Lipsey & Wilson, 1993). For example, Lei et al.’s (2022) meta-analysis of GBL showed that studies with control groups showed smaller effects than other studies.

H-6a. GBL effects on CT are smaller in studies with control groups.

While active instruction control groups (e.g., group discussion) might receive better instruction and outperform passive instruction control groups (e.g., watch educational videos, Sitzmann, 2011), GBL meta-analyses showed mixed results. For example, GBL effects against active control groups rather than passive control groups were weaker in Sitzmann (2011) but stronger in Wouters et al. (2013). Together, they suggest that the quality of instruction matters more than its passive or active nature (no hypothesis on active vs. passive instruction).

As knowing a CT concept (e.g., define loops) is often easier than (and a prerequisite to) applying its corresponding CT skills (e.g., correctly use loops in a program to achieve a subgoal), assessments of learning CT concepts rather than learning CT skills will often yield higher scores (Ma & Liu, 2019; Zhang & Nouri, 2019). Indeed, Wouters et al. (2013) found higher GBL effects on knowledge than on skills.

H-6b. GBL effects on CT are larger in studies assessing CT concepts rather than CT skills.

Reliability of CT assessment can also affect the GBL effect size on CT. Reliability is often lowest for researcher’s original, unvalidated instruments, higher for validated instruments, and highest for standardized tests (Tokac et al., 2019). Such instruments with more reliability often have less measurement error and hence are more likely to yield significant effect sizes (Cohen et al., 2003).

H-6c. GBL effect on CT is most likely to be significant with standardized CT tests, less likely with validated CT tests, and least likely with other CT tests.

Study Purpose

This study tests whether the strong mapping of GBL processes on to CT processes yields positive GBL effects on CT (larger than GBL effects on mathematics). Furthermore, this meta-analysis tests whether GBL effects on CT differ across game intervention design (game type, intervention duration), demographics (individualism, grade level/age), or measures (control group condition/instruction, CT concepts vs. skills, CT assessment validation).

Methods

Study Searching

In mid-March 2022, we searched the following electronic databases for relevant studies: EBSCO, Web of Science, ProQuest Dissertations and Theses Global, ScienceDirect, China National Knowledge Infrastructure, and WanFang DATA. We used this keyword search combination: (game-based learning, game, educational game, digital game, simulation game, role-playing game, serious game, computer game, or video game) AND (computational thinking, computational thinking skill, or computational thinking concept). Moreover, we reviewed the references of all reviews and included articles to find additional studies. The initial search yielded 2,305 articles (sources: 1,022 Web of Science, 87 EBSCO, 35 ScienceDirect, 32 ProQuest Dissertations and Theses Global, 422 China National Knowledge Infrastructure, 685 WanFang DATA, and 22 from other sources). We removed 347 duplicate articles.

Study Selection and Coding

We included articles that met all seven inclusion criteria: (a). used terms related to game-based learning in titles, keywords or abstracts; (b). used terms related to CT in any part, including the main texts; (c). were written in English or Chinese; (d). were available in full text; (e). empirically examined the effect of GBL on CT (excluding theoretical papers, reviews, and unrelated studies); (f). had an experimental or pre-post design (excluding surveys and qualitative studies); (g). reported sample sizes and at least one key measure (standard deviation, t, F or p values).

The first author and corresponding author independently evaluated and coded each study. Specifically, they applied these criteria with 95% agreement to remove 1,934 articles, and discussed disagreements to reach consensus. Hence, this meta-analysis included 24 studies from 2011–2022 (with 28 independent effect sizes, see Figure 1).

Figure 1.

PRISMA 2009 flow diagram.
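As a quick arithmetic check on the screening counts reported in the text, the short Python sketch below recomputes the flow from identification to inclusion. The variable names are illustrative only; the numbers are those stated above.

```python
# Records identified per source, as reported in the search description.
sources = {
    "Web of Science": 1022,
    "EBSCO": 87,
    "ScienceDirect": 35,
    "ProQuest Dissertations and Theses Global": 32,
    "China National Knowledge Infrastructure": 422,
    "WanFang DATA": 685,
    "Other sources": 22,
}

identified = sum(sources.values())          # total records identified
after_duplicates = identified - 347         # duplicates removed
included = after_duplicates - 1934          # articles removed by the inclusion criteria

print(identified, after_duplicates, included)  # prints 2305 1958 24
```

The totals agree with the 2,305 articles identified and the 24 studies included.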

The two authors coded each manuscript for game intervention attributes (game type, duration), demographics, and measures (see Table 1). They applied Prensky’s (2001) game taxonomy: action, adventure, fighting, puzzle, simulation, role-playing, sports, and strategy. We coded five intervention durations: not reported, up to 4 hours, between 4 hours and 1 week, within 1–4 weeks, and over 4 weeks (Lei et al., 2022).

Table 1.

Study Characteristics Included in Meta-Analysis.

Study	N	Control group	Control condition	Game type	Individualism (score)	Grade level	Instrument reliability	Duration	CT construct	Hedges’s g
Ayman et al. (2018)	15	2 (8C + 7E)	3	2	25	1	1	1	1	1.570
Chang (2017)	37	2 (19C + 18E)	2	1	17	2	1	1	1	0.369
Del Olmo-Munoz et al. (2020)	84	2 (42C + 42E)	4	1	51	2	1	1	2	0.884
Du (2020)	44	2 (23C + 21E)	3	4	20	3	2	2	2	0.575
Gao (2014)	220	2 (110C + 110E)	3	2	20	3	1	1	2	1.135
Gao (2014)	240	2 (120C + 120E)	3	2	20	4	1	1	2	0.514
Gao (2020)	65	1	1	2	20	5	2	2	2	1.316
Gong and Qiao (2021)	32	1	1	1	20	5	2	3	2	0.659
Hooshyar et al. (2021a)	79	2 (43C + 36E)	3	1	60	2	2	1	1	0.675
Hooshyar et al. (2021b)	78	2 (42C + 36E)	3	1	60	2	2	1	1	0.579
Hooshyar et al. (2021b)	78	2 (42C + 36E)	3	1	60	2	2	1	2	0.552
Kazimoglu (2020)	151	1	1	1	89	5	1	3	1	0.260
Kazimoglu (2020)	151	1	1	1	89	5	1	3	2	0.350
Lee et al. (2014)	18	1	1	5	91	2	1	1	2	−0.298
Liu (2019)	60	2 (30C + 30E)	3	1	20	2	3	2	2	0.652
Mou (2011)	104	2 (52C + 52E)	3	5	20	5	1	4	2	1.113
Zhang (2019a)	89	2 (42C + 47E)	2	1	20	4	1	5	2	0.731
Psycharis and Kotzampasaki (2019)	115	1	1	3	35	2	3	4	2	0.374
Qiu (2019)	108	2 (54C + 54E)	3	2	20	4	1	4	2	0.776
Rose et al. (2020)	57	2 (30C + 27E)	4	1	89	2	2	1	2	0.513
Rose et al. (2020)	57	2 (30C + 27E)	4	1	89	2	2	1	2	0.179
Sun (2021)	30	1	1	1	20	2	1	2	2	0.746
Wang (2021a)	40	1	1	2	20	2	2	1	2	0.490
Zhang (2019b)	43	1	1	1	20	2	3	1	1	2.000
Wang (2021b)	30	1	1	2	20	3	2	5	2	0.801
Zhang (2017)	38	1	1	3	20	3	1	2	2	0.460
Zhao and Shute (2019)	43	1	1	1	91	3	1	1	2	0.514
Zhou (2021)	28	1	1	1	20	2	1	2	2	1.408

Note. Control group (1 = No; 2 = Yes), with #C + #E indicating the numbers of participants in the control group and in the experimental group; Control condition (1 = None; 2 = Instruction not reported; 3 = Passive instruction; 4 = Active instruction); Game type (1 = Action; 2 = Role playing; 3 = Adventure; 4 = Simulation; 5 = Puzzle); Grade level (1 = Kindergarten; 2 = Elementary school; 3 = Middle school; 4 = High school; 5 = College); Instrument reliability (1 = Other test; 2 = Validated test; 3 = Standardized test); Duration (1 = Up to 4 hours; 2 = Between 4 hours and 1 week; 3 = Within 1–4 weeks; 4 = Over 4 weeks; 5 = Not reported); Computational thinking construct (1 = Concept; 2 = Skill).

Demographics include grade level and individualism. The coders divided studies into 5 grade levels: kindergarten, elementary school, middle school, high school, and college. Based on the country, we retrieved its individualism cultural value from Hofstede (2019).

Measures include control group attributes, CT construct, and instrument reliability. The coders divided studies into four types of control conditions: none (e.g., Zhao & Shute, 2019), instruction not described (e.g., Chang, 2017), passive instruction (e.g., Hooshyar et al., 2021b), or active instruction (e.g., Rose et al., 2020). Also, they categorized CT outcomes as concepts or skills (Hooshyar et al., 2021a). Instrument reliability was coded as: standardized test, validated test, or other test (Tokac et al., 2019).

Inter-rater reliability was acceptable to high (Cohen’s kappa, Warrens, 2015): game type (0.824), intervention duration (0.825), individualism (1.000), grade level (1.000), control condition (0.873), CT construct (0.833), and instrument reliability (0.833).

Next, we calculated the effect sizes for each independent sample within a study. If multiple independent samples of students participated in a game-based intervention study, we encoded them separately. If a study reported multiple components of CT within the same construct (concept or skill) or measured it multiple times after intervention within a sample, we used their mean and corresponding total effect size. We computed each study’s effect size, Hedges’s g (corrected Cohen’s d, Borenstein et al., 2005) via t, F, p values, or its sample sizes, means, and standard deviations in each group with Comprehensive Meta-Analysis 3.3.
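To make the effect size concrete, the sketch below computes Hedges’s g from group means, standard deviations, and sample sizes using the standard textbook formulas (pooled standard deviation, Cohen’s d, and the small-sample correction J = 1 − 3/(4df − 1)). It is an illustration under those assumptions, not a reproduction of the Comprehensive Meta-Analysis internals, and the function name `hedges_g` and the example numbers are hypothetical.

```python
import math


def hedges_g(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Return (g, variance of g) for two independent groups."""
    df = n_e + n_c - 2
    # Pooled standard deviation across the experimental and control groups.
    sd_pooled = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_e - mean_c) / sd_pooled       # Cohen's d
    j = 1 - 3 / (4 * df - 1)                # approximate small-sample correction
    var_d = (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))
    return j * d, j**2 * var_d


# Hypothetical study: experimental mean 10, control mean 8, both SDs 2, 20 per group.
g, var_g = hedges_g(10, 8, 2, 2, 20, 20)
print(round(g, 3), round(var_g, 3))
```

With equal standard deviations of 2 and a 2-point mean difference, d equals 1 and the correction shrinks it slightly, which is why g is preferred for the small samples common in this literature.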

We assessed study quality on the revised Jadad et al. (1996) Scale (0–5). Its criteria were as follows. Regarding double-blinding, a detailed description yielded two points, and its use without description yielded one point. Regarding random assignment of participants to conditions, a detailed description yielded two points, and its use without description yielded one point. Specified number of lost or withdrawn participants yielded one point. All 24 articles scored more than two points, indicating high-quality studies.

As these studies used different measures for different game interventions on different student populations, they were likely distinct and heterogeneous, so we used a random-effects model, which likely better fit the sampling distribution, allowed effect sizes to vary, and allowed our conclusions to generalize more broadly (Borenstein et al., 2010). We also computed the Q statistic (Hedges, 1982) to determine the heterogeneity among effect sizes and the I² statistic (Higgins & Thompson, 2002) to determine the variance between studies (accounting for sampling error, Huedo-Medina et al., 2006).
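The sketch below shows one common way to compute a random-effects pooled estimate together with Q and I². It assumes the DerSimonian-Laird (method-of-moments) estimator of the between-study variance, which the text does not name, so it should be read as an illustration of the general approach rather than the exact procedure used here. The function name `random_effects` and the input values are hypothetical.

```python
import math


def random_effects(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird random-effects model."""
    w = [1 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sw
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0     # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_star = [1 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {"g": pooled, "se": se, "Q": q, "I2": i2, "tau2": tau2}


# Two hypothetical studies with equal sampling variance.
result = random_effects([0.2, 0.8], [0.04, 0.04])
print(result)
```

Because the random-effects weights add the between-study variance to each study’s sampling variance, large studies dominate less than they would under a fixed-effect model.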

We assessed publication bias with a funnel plot, fail-safe number, Egger’s regression, trim-and-fill, and single-study exclusions. A funnel plot with a severe asymmetric distribution of effect sizes suggests publication bias. When the minimum number of studies that render the computed effect size non-significant (fail-safe number, Nfs) falls below 5k + 10 (k = number of studies), the danger of publication bias is substantial (Khoury et al., 2013). In Egger et al.’s (1997) regression, a significant intercept far from zero indicates risk of publication bias. We used trim-and-fill to calculate the number of missing studies and add their effects to yield an adjusted mean effect size (Duval & Tweedie, 2000). We also tested whether removing single studies with extreme effect sizes (outliers) substantially changed the overall effect size (Borenstein et al., 2009).
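As an illustration of Egger’s regression, the sketch below regresses each study’s standardized effect (g / SE) on its precision (1 / SE) by ordinary least squares and returns the intercept, whose distance from zero is what the test examines. The function name `egger_regression` and the data are hypothetical, and the significance test of the intercept is omitted for brevity.

```python
import math


def egger_regression(effects, standard_errors):
    """OLS of (g / SE) on (1 / SE); returns intercept, slope, intercept SE."""
    x = [1 / se for se in standard_errors]                  # precision
    y = [g / se for g, se in zip(effects, standard_errors)]  # standardized effect
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = residual_ss / (n - 2)                              # residual variance
    se_intercept = math.sqrt(s2 * (1 / n + x_bar**2 / sxx))
    return intercept, slope, se_intercept


# Hypothetical data in which smaller (less precise) studies report larger effects.
intercept, slope, se_int = egger_regression(
    [0.7, 0.9, 1.0, 1.5], [0.1, 0.2, 0.25, 0.5]
)
print(round(intercept, 3), round(slope, 3))
```

In this toy data set the small studies show inflated effects, so the intercept sits well above zero, the pattern that signals possible publication bias.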

Results

The random-effects model of the 24 studies’ 28 effect sizes showed a medium, positive GBL effect on students’ CT (g = 0.677, k = 28, 95% confidence interval 0.532–0.821; see Table 2 and Figure 2). This result supports hypothesis H-1a (positive GBL effect on CT). As the GBL effect size on CT (0.677) far exceeds its effect sizes on mathematics in meta-analyses (0.13 for Tokac et al., 2019 and 0.37 for Byun & Joung, 2018), this result also supports H-1b.

Table 2.

Random-Effects Model of the Effect of GBL on CT.

k	N	g	95% CI LL	95% CI UL	Q	p (Q)	I² (%)	Tau²	SE (Tau²)	Tau	Z	p (Z)
28	2134	0.677	0.532	0.821	117.264	< .001	76.975	0.105	0.047	0.324	9.173	< .001

Note. k = Number of effect sizes; CI = Confidence interval; LL = Lower limit; UL = Upper limit.
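The entries in Table 2 can be cross-checked against one another with a few lines of arithmetic. The sketch below assumes the usual definitions, I² = (Q − df) / Q, Tau as the square root of Tau², and a Wald-type confidence interval (g ± 1.96 SE); the variable names are illustrative.

```python
import math

# Values reported in Table 2.
k, g, ll, ul, q, tau2 = 28, 0.677, 0.532, 0.821, 117.264, 0.105

df = k - 1
i2 = (q - df) / q * 100            # share of variation due to heterogeneity
tau = math.sqrt(tau2)              # between-study standard deviation
se_g = (ul - ll) / (2 * 1.96)      # standard error implied by the 95% CI
z = g / se_g                       # test of the null hypothesis g = 0

print(round(i2, 3), round(tau, 3), round(z, 2))  # prints 76.975 0.324 9.18
```

The recomputed I² and Tau match the table, and the implied Z of about 9.18 agrees with the reported 9.173 up to rounding of the confidence limits.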

Figure 2.

Forest plot for the random-effects model.

Publication Bias and Sensitivity Analysis

We tested the likelihood of publication bias with a funnel plot, Nfs, Egger’s regression, trim-and-fill, and single-study exclusions. In the funnel plot, the 28 effect sizes were mostly symmetrically distributed across the axis (see Figure 3). Also, the Nfs of 2,273 far exceeded the threshold of 150 (z = 17.765, p < .001; threshold 5k + 10 = 5 × 28 + 10 = 150; Card, 2011). The intercept of Egger’s regression significantly exceeded zero (2.599, p = 0.003), indicating risk of publication bias. To address this possible publication bias, a random-effects trim-and-fill analysis imputed 4 missing studies on the right and raised the overall effect from 0.677 to 0.779. Also, removing possible single-study outliers still yielded effect sizes within the 95% confidence interval (0.532–0.821; see Figure 4).
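The fail-safe number and its threshold can be sketched as follows. The code assumes Rosenthal’s formulation, in which the reported z (17.765) is the Stouffer combined Z across the k effect sizes and the criterion is a two-tailed alpha of .05; neither assumption is stated in the text, so this is a plausible reconstruction rather than the authors’ confirmed procedure. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def failsafe_n(combined_z, k, alpha=0.05):
    """Null studies needed to pull the combined Z down to the critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical z
    return math.ceil(k * (combined_z / z_crit) ** 2 - k)


def tolerance_threshold(k):
    """The 5k + 10 rule of thumb for a robust fail-safe number."""
    return 5 * k + 10


nfs = failsafe_n(17.765, 28)
print(nfs, tolerance_threshold(28))  # prints 2273 150
```

Under these assumptions the computation reproduces the reported Nfs of 2,273, which is far above the threshold of 150.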

Figure 3.

Funnel plot of the effect sizes with 95% confidence interval.

Figure 4.

Forest plot of the one-study-removed sensitivity analysis.

Moderator Analysis

The homogeneity test showed significant heterogeneity among effect sizes (Q = 117.264, p < .001, I² = 76.975, see Table 2 and forest plot in Figure 2). Hence, we tested for the following moderators: game intervention (game type, duration), demographics (individualism, grade/age), and measures (control group, CT assessment, instrument reliability).

Game Type and GBL Intervention Duration

Game type moderated the GBL effect on CT (Q_BET = 9.944, df = 4, p < .05, see Table 3). Role-playing games yielded the largest GBL effect size (g = 0.871, k = 7, 95% confidence interval 0.594–1.148), followed by action games (g = 0.662, k = 16, 95% confidence interval 0.476–0.849), simulation games (g = 0.575, k = 1, 95% confidence interval −0.018–1.168), puzzle games (g = 0.411, k = 2, 95% confidence interval −0.972–1.793), and adventure games (g = 0.395, k = 2, 95% confidence interval 0.232–0.558). As only one or two studies examined simulation, puzzle or adventure games, we interpret these results cautiously. These results support H-2b (role play) but not H-2a (simulation).

Table 3.

GBL and Students’ CT: Univariate Analysis of Variance for Moderating Variables.

Moderator	Between-group effect (Q_BET)	k	g	SE	95% CI LL	95% CI UL	Homogeneity test within each group (Q_W)
Game type	9.944* (p = .041)
Action game		16	0.662	0.095	0.476	0.849	60.671***
Role playing game		7	0.871	0.141	0.594	1.148	21.138**
Adventure game		2	0.395	0.083	0.232	0.558	0.201
Simulation game		1	0.575	0.303	−0.018	1.168	0.000
Puzzle game		2	0.411	0.705	−0.972	1.793	14.484***
Intervention duration	11.889* (p = .018)
Up to 4 hours		14	0.655	0.123	0.414	0.896	52.550***
Between 4 hours and 1 week		6	0.863	0.175	0.521	1.205	19.227**
Within 1–4 weeks		3	0.358	0.083	0.194	0.521	3.694
Over 4 weeks		3	0.708	0.223	0.271	1.145	9.442**
Not reported		2	0.768	0.149	0.475	1.061	0.054
Grade level	2.718 (p = .606)
Kindergarten		1	1.570	0.565	0.462	2.678	0.000
Elementary school		14	0.643	0.121	0.406	0.880	56.194***
Middle school		5	0.697	0.133	0.435	0.958	9.279
High school		3	0.643	0.107	0.433	0.853	1.290
College		5	0.705	0.188	0.336	1.074	41.114***
Control group?	0.05 (p = .823)
No		13	0.666	0.115	0.441	0.891	90.071***
Yes		15	0.696	0.072	0.555	0.838	19.127
Control group instruction	1.189 (p = .756)
None		13	0.666	0.115	0.441	0.891	90.071***
Instruction not reported		2	0.619	0.181	0.265	0.973	0.859
Passive instruction		10	0.751	0.087	0.581	0.922	12.595
Active instruction		3	0.542	0.210	0.130	0.953	4.190
CT construct	0.477 (p = .490)
Concept		6	0.845	0.269	0.318	1.372	40.679***
Skill		22	0.652	0.074	0.507	0.798	73.888***
Instrument type	0.496 (p = .780)
Other test		15	0.647	0.097	0.457	0.837	61.054***
Validated test		10	0.659	0.104	0.464	0.863	20.614*
Standardized test		3	0.983	0.468	0.067	1.900	29.875***

Note. k = Number of effect sizes; CI = Confidence interval; LL = Lower limit; UL = Upper limit.

***p < .001; **p < .01, *p < .05.
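The p values attached to the between-group statistics in Table 3 can be recovered from the chi-square distribution, with degrees of freedom equal to the number of groups minus one. The sketch below uses the closed-form survival function for integer degrees of freedom so that it needs only the standard library; the function name `chi2_sf` is an illustrative choice.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite power series in x/2.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite correction sum.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total


# (moderator, Q_BET, df) as reported in Table 3 and the text.
tests = [
    ("Game type", 9.944, 4),
    ("Intervention duration", 11.889, 4),
    ("Grade level", 2.718, 4),
    ("Control group?", 0.05, 1),
    ("Control group instruction", 1.189, 3),
    ("CT construct", 0.477, 1),
    ("Instrument type", 0.496, 2),
]
for name, q_bet, df in tests:
    print(f"{name}: p = {chi2_sf(q_bet, df):.3f}")
```

The recomputed values agree with the p values listed in Table 3 to three decimal places, so only game type and intervention duration fall below .05.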

GBL intervention duration significantly moderated GBL’s effect on CT (Q_BET = 11.889, df = 4, p < .05). Interventions between four hours and one week showed the largest effect size (g = 0.863, k = 6, 95% confidence interval 0.521–1.205), followed by those over four weeks (g = 0.708, k = 3, 95% confidence interval 0.271–1.145), those up to four hours (g = 0.655, k = 14, 95% confidence interval 0.414–0.896), and those within one to four weeks (g = 0.358, k = 3, 95% confidence interval 0.194–0.521). Hence, these results support H-3 (days of GBL).

Individualism and Grade Level/Age

Individualism significantly moderated the GBL effect on CT (Q_Model (1, k = 28) = 13.42, p < .001; see Table 4, supporting H-4) but grade level/age did not (Q_BET = 2.718, df = 4, p > .05; no support for H-5). The meta-regression showed that the effect of GBL on CT was weaker among students in individualistic countries (β = −0.008, z = −3.66, p < .001).

Table 4.

Meta-Regression Analysis of Continuous Variables (Random-effects Model).

	Parameter	Estimate	SE	Z-value	p	95% Lower	95% Upper
Individualism	β0	0.998	0.101	9.10***	0.000	0.783	1.212
	β1	−0.008	0.002	−3.66***	0.000	−0.012	−0.004
	Q model (1, k = 28) = 13.42, p < 0.001

Note. k = Number of effect sizes; CI = Confidence interval.

***p < .001; **p < .01, *p < .05.
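To show what the Table 4 coefficients imply, the sketch below evaluates the fitted line, predicted g = β0 + β1 × individualism, at the lowest and highest individualism scores listed in Table 1 (17 and 91) and at the most common score (20). The function name `predicted_g` is hypothetical, and because the coefficients are rounded to three decimals the predictions are approximate.

```python
# Coefficients reported in Table 4.
BETA_0 = 0.998   # intercept
BETA_1 = -0.008  # change in g per one-point increase in individualism


def predicted_g(individualism):
    """Predicted effect size at a given Hofstede individualism score."""
    return BETA_0 + BETA_1 * individualism


for score in (17, 20, 91):
    print(score, round(predicted_g(score), 3))
```

The predicted effect falls from roughly 0.84 at an individualism score of 20 to roughly 0.27 at a score of 91, which illustrates the size of the moderation.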

Measures

Measures did not moderate GBL effects on CT. Control condition did not significantly moderate the GBL effect on CT, regardless of whether testing its presence (Q_BET = 0.05, df = 1, p > .05) or its type of instruction (Q_BET = 1.189, df = 3, p > .05; see Table 3). Also, CT construct did not significantly moderate the GBL effect on CT (Q_BET = 0.477, df = 1, p > .05). Lastly, instrument reliability did not moderate the GBL effect on CT (Q_BET = 0.496, df = 2, p > .05). Hence, these results show no support for H-6a, H-6b, or H-6c.

Discussion

This meta-analysis of 28 effect sizes from 24 studies showed a medium positive GBL effect on students’ CT, moderated by game type, intervention duration, and individualism.

GBL Effect on CT

GBL had a positive overall effect on students’ CT (g = 0.677), much larger than GBL effects on mathematics (0.13 for Tokac et al., 2019 and 0.37 for Byun & Joung, 2018). This result supports our claim that the close mapping of GBL processes (complex story, rules, goals/subgoals, autonomy, feedback, tries, Burke, 2014) on to CT processes (complex simulation, sequence/algorithm, problem solving, conditional logic, loop, debug; Brennan & Resnick, 2012) helps students learn CT.

This result suggests both theoretical and practical implications. Regarding theory, scholars can create theoretical models of mappings of GBL processes on to cognitive processes in other academic subjects (English, history, science, etc.). Then, they can test whether mappings with greater correspondences between GBL processes and cognitive processes in a subject domain yield greater GBL effects on student learning in that subject. Practically, the positive GBL effect on students’ CT suggests that educators consider how to help teachers incorporate GBL into their lessons to help their students learn more CT.

Moderators

The present study found that game type, intervention duration, and individualism moderated the GBL effect on CT. By contrast, grade level/age, control group condition, CT assessment, and instrument reliability did not moderate this effect.

Game Type

Game type moderated the effect of GBL on CT. GBL with role-playing games yielded a larger effect on CT than GBL with action games, simulation games, puzzle games or adventure games. This result aligns with the claim that students’ role playing (especially adult roles) encourages them to think and act like older people with superior knowledge and skills (Vygotsky, 2016). This result also partially coheres with Yildiz et al.’s (2017) result showing that algorithmic thinking was highest among students who played simulation games, followed by those who played role-playing games, compared to other games. As our meta-analysis included few studies of simulation, puzzle or adventure games, future studies can include these games to help future meta-analyses determine whether GBL with role playing rather than these game types helps students learn more CT.

Practically, the many GBL studies with action games or role-playing games in this meta-analysis enable confidence in the greater effectiveness of role-playing GBL over action game GBL for learning CT. Hence, when role-playing GBL and action game GBL are both viable for learning specific CT concepts or skills, these results suggest that educators use role-playing GBL rather than action game GBL in the absence of other compelling reasons.

Intervention Duration

GBL interventions between four hours and one week showed the strongest GBL effect on CT, aligning with our proposal that the peak effect occurs for GBL interventions lasting days rather than much shorter GBL interventions (minutes) or much longer GBL interventions (weeks). This intermediate peak result is also consistent with other intervention duration results showing that extremely short interventions (e.g., minutes) can lack sufficient cumulative intensity and effectiveness (Ross & Begeny, 2015), while other events can dilute the impacts of longer interventions (e.g., weeks, Nahmias et al., 2019). Practically, these results suggest that educators design GBL with intermediate durations (four hours to one week) to maximize students’ CT learning. Future studies with different GBL intervention durations can further narrow the window of the optimal intervention duration for GBL of CT.

Individualism

The effect size of GBL on CT was smaller among students in individualistic countries. This result aligns with the view that compared to students in individualistic cultures, those in collectivist cultures value group interests (Hofstede, 2019) and extrinsic motivation more (Ni et al., 2010; e.g., from their family, Shah, 2015), so extrinsic motivation aspects of GBL, such as leaderboards (Burke, 2014), benefit these students more. Future fine-grained studies can test the validity of this hypothesized mechanism; if supported, this mechanism suggests the greater use of leaderboards and other extrinsic motivation aspects of GBL in collectivist cultures but less use of them in individualistic cultures.

Limitations and Future Research

This meta-analysis has three major limitations: number of primary studies, inadequate information about moderators, and limited languages of published studies. First, this literature search only yielded 24 relevant studies of GBL effects on CT. As this small number of studies has low statistical power for study-level variables (e.g., moderators such as simulation game type), their results might not be generalizable. Specifically, we must cautiously interpret non-significant results at the study-level (possible false negatives), though we retain confidence in our significant results (Cohen et al., 2003). After researchers conduct more primary studies of GBL and CT, future meta-analyses can include such studies for more statistical power.

Second, many primary studies lacked information regarding key moderator variables, such as student gender and game difficulty (Wang et al., 2010). As a result, our meta-analysis could not test whether they moderate GBL effects on CT. Future studies can include such information to enable testing of their moderation effects in future meta-analyses.

Lastly, the authors only read Chinese and English, so this meta-analysis only included studies published in Chinese or English. Future teams of researchers with literacy in more languages can include more studies in their meta-analyses.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Author Biographies

Zhuotao Lu is a PhD candidate at the Institute of Curriculum and Instruction of East China Normal University. Her research interests include design of curriculum combined with digital games to support computational thinking development and assessment of computational thinking.

Ming M. Chiu is Chair Professor of Analytics and Diversity in the Special Education and Counseling Department and Director of the Assessment Research Centre at The Education University of Hong Kong. His research interests include learning analytics, group processes, inequality, corruption, and online sexual predators.

Yunhuo Cui is a professor and Director of the Institute of Curriculum and Instruction at East China Normal University. His research interests include effective teaching, curriculum evaluation, school-based curriculum development in China, and school-oriented teacher professional development model curriculum studies.

Weijie Mao is a PhD candidate at the Institute of Curriculum and Instruction of East China Normal University. Her research interests include design of curriculum combined with digital games to support critical thinking development and assessment of critical thinking.

Hao Lei is a professor at the Institute of Curriculum and Instruction of East China Normal University. His research interests involve curriculum and instruction evaluation, teacher professional development, and student learning and development.

References

Ayman, Sharaf, Ahmed, & Abdennadher (2018). Minicolon; teaching kids computational thinking using an interactive serious game. In Göbel, Garcia-Agunde, Tregel, Hauge, J.B., Oliveira, Marsh, & Caserman (Eds.), Serious games (pp. 79–90). Springer. https://doi.org/10.1007/978-3-030-02762-9_9

Borenstein, Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2005). Comprehensive meta-analysis (version 3.3). Biostat.

Borenstein, Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. John Wiley & Sons, Ltd.

Borenstein, Hedges, L. V., Higgins, J. P., & Rothstein, H. R. (2010). A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1(2), 97–111. https://doi.org/10.1002/jrsm.12

Brennan & Resnick (2012). New frameworks for studying and assessing the development of computational thinking [Paper presented]. In Proceedings of the 2012 Annual Meeting of the American Educational Research Association, Vancouver, BC, Canada. https://dam-prod.media.mit.edu/x/files/~kbrennan/files/Brennan_Resnick_AERA2012_CT.pdf

Burke (2014). Gamify: How gamification motivates people to do extraordinary things. Bibliomotion.

Byun & Joung (2018). Digital game-based learning for K-12 mathematics education: A meta-analysis. School Science and Mathematics, 118(3–4), 113–126. https://doi.org/10.1111/ssm.12271

Card, N. A. (2011). Applied meta-analysis for social science research. The Guilford Press.

Chang, C. H. (2017). Transforming video gameplay experiences into a roadmap to facilitate children’s learning of computational thinking concepts (Publication No. 10287095) [Doctoral Dissertation, University of Columbia]. ProQuest Dissertation & Theses Global.

Chiu, M. M., & Chow, B. W. Y. (2010). Culture, motivation, and reading achievement: High school students in 41 countries. Learning and Individual Differences, 20(6), 579–592. https://doi.org/10.1016/j.lindif.2010.03.007

Cohen, West, S. G., Aiken, & Cohen (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Lawrence Erlbaum.

D’Ailly (2003). Children’s autonomy and perceived control in learning: A model of motivation and achievement in Taiwan. Journal of Educational Psychology, 95(1), 84–96. https://doi.org/10.1037/0022-0663.95.1.84

Daniel, M. F., & Gagnon (2011). Developmental process of dialogical critical thinking in groups of pupils aged 4 to 12 years. Creative Education, 2(5), 418–428. https://doi.org/10.4236/ce.2011.25061

del Olmo-Munoz, Cozar-Gutierrez, & Gonzalez-Calero, J. A. (2020). Computational thinking through unplugged activities in early years of primary education. Computers & Education, 150, Article 103832. https://doi.org/10.1016/j.compedu.2020.103832

Du, F. F. (2020). Research on the teaching of gamified programming to promote the development of computational thinking: Take Minecraft as an example. [Master’s Thesis, East China Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/aspx?dbname=CMFD202101&filename=1020635987.nh

Duval & Tweedie (2000). Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56(2), 455–463. https://doi.org/10.1111/j.0006-341x.2000.00455.x

Egger, Davey Smith, Schneider, & Minder (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315(7109), 629–634. https://doi.org/10.1136/bmj.315.7109.629

Gao (2014). Research on the cultivation of computational thinking based on game-based teaching. [Master’s Thesis, Shaanxi Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFD201501&filename=1014402628.nh

Gao (2020). Research on cultivation of computational thinking based on game-based teaching: Taking the basic course of computer application in higher vocational education as an example. [Master’s Thesis, Hunan Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFD202101&filename=1020320653.nh

Gong, & Qiao, A. L. (2021). The impact of game-based experiential learning on computational thinking. Modern Educational Technology, 31(11), 119–126. https://kns-cnki-net-443.web.bisu.edu.cn/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2021&filename=XJJS202111017&uniplatform=NZKPT&v=XpjWDuN4kfpuXqN8UPd6D6v3wYRypDAsqjsQDuOOi9q4NPoj90OLcVDCnT5AIrIP

Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92(2), 490–499. https://doi.org/10.1037/0033-2909.92.2.490

Higgins, J. P. T., & Thompson, S. G. (2002). Quantifying heterogeneity in a meta-analysis. Statistics in Medicine, 21(11), 1539–1558. https://doi.org/10.1002/sim.1186

Hofstede (2019). Compare countries. Hofstede Insights. https://www.hofstede-insights.com/product/compare-countries/

Hooshyar, Malva, Yang, Pedaste, Wang, & Lim (2021a). An adaptive educational computer game: Effects on students’ knowledge and learning attitude in computational thinking. Computers in Human Behavior, 114(6), 106575. https://doi.org/10.1016/j.chb.2020.106575

Hooshyar, Pedaste, Yang, Malva, Hwang, G. J., Wang, Lim, & Delev (2021b). From gaming to computational thinking: An adaptive educational computer game-based learning approach. Journal of Educational Computing Research, 59(3), 383–409. https://doi.org/10.1177/0735633120965919

Huedo-Medina, T. B., Sanchez-Meca, Marín-Martínez, & Botella (2006). Assessing heterogeneity in meta-analysis: Q statistic or I² index? Psychological Methods, 11(2), 193–206. https://doi.org/10.1037/1082-989x.11.2.193

Jadad, A. R., Moore, R. A., Carroll, Jenkinson, Reynolds, D. J. M., Gavaghan, D. J., & McQuay, H. J. (1996). Assessing the quality of reports of randomized clinical trials: Is blinding necessary? Controlled Clinical Trials, 17(1), 1–12. https://doi.org/10.1016/0197-2456(95)00134-4

Kazimoglu (2020). Enhancing confidence in using computational thinking skills via playing a serious game: A case study to increase motivation in learning computer programming. IEEE Access, 8, 221831–221851. https://doi.org/10.1109/ACCESS.

Khoury, Lecomte, Fortin, Masse, Therien, Bouchard, Chapleau, M. A., Paquin, & Hofmann, S. G. (2013). Mindfulness-based therapy: A comprehensive meta-analysis. Clinical Psychology Review, 33(6), 763–771. https://doi.org/10.1016/j.cpr.2013.05.005

Lee, T. Y., Mauriello, M. L., Ahn, & Bederson, B. B. (2014). CTArcade: Computational thinking with games in school age children. International Journal of Child-Computer Interaction, 2(1), 26–33. https://doi.org/10.1016/j.ijcci.2014.06.003

Lei, Chiu, M. M., Wang, & Xie (2022). Effects of game-based learning on students’ achievement in science: A meta-analysis. Journal of Educational Computing Research. Advance online publication, February 8, 2022. https://doi.org/10.1177/07356331211064543

Lipsey, M. W., & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48(12), 1181–1209. https://doi.org/10.1037//0003-066x.48.12.1181

Liu, X. Z. (2019). Research on the cultivation of pupils’ computational thinking ability based on gaming teaching. [Master’s Thesis, Shenyang Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFD201902&filename=1019057152.nh

Z. Q., & Liu, Y. Q. (2019). From project-based learning and pair programming to interdisciplinary integrated design: A meta-analysis based on international studies of K-12 computational thinking from 2006 to 2019. Journal of Distance Education, 37(05), 75–84. https://kns-cnki-net-443.web.bisu.edu.cn/kcms/detail/detail.aspx?dbcode=cjfd&dbname=cjfdlast2019&filename=ycjy201905009&v=%25mmd2bqz4sq7%25mmd2frqwage3worwq5%25mmd2bzzyme0n%25mmd2bwd2n6r3af9y4jw3uwtag5mwdwmkzten6cq

Mao, Cui, Chiu, M. M., & Lei (2022). Effects of game-based learning on students’ critical thinking: A meta-analysis. Journal of Educational Computing Research, 59(8), 1682–1708. https://doi.org/10.1177/07356331211007098

Mou (2011). Train computational thinking ability with “light game”: The effect of education game on the program design basic course teaching. Journal of Distance Education, 29(06), 94–101. https://doi.org/10.15881/j.cnki.cn33-1304/g4.2011.06.010

Nahmias, A. S., Pellecchia, Stahmer, A. C., & Mandell, D. S. (2019). Effectiveness of community-based early intervention for children with autism spectrum disorder: A meta-analysis. Journal of Child Psychology and Psychiatry, 60(11), 1200–1209. https://doi.org/10.1111/jcpp.13073

Ni, Chiu, M. M., & Cheng, Z. J. (2010). Chinese children learning mathematics: From home to school. In Bond (Ed.), Oxford handbook of Chinese psychology (pp. 143–154). Oxford University Press.

Papert (1980). Mindstorms: Children, computers, and powerful ideas. Basic Books.

Prensky (2001). Digital game-based learning. McGraw Hill.

Psycharis & Kotzampasaki (2019). The impact of a STEM inquiry game learning scenario on computational thinking and computer self-confidence. Eurasia Journal of Mathematics, Science and Technology Education, 15(4), Article em1689. https://doi.org/10.29333/ejmste/103071

Qiu, S. J. (2019). Research on game-based teaching design of information technology in senior high school based on the cultivation of computational thinking. [Master’s Thesis, Qufu Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/kcms/detail/detail.aspx?dbcode=CMFD&dbname=CMFD201902&filename=1019042665.nh&uniplatform=NZKPT&v=MXLc-4gntV99Vnl8Z-y_4d4Z1ihDuikLcgeRn9ucu2owKXkHrncn5x8xJGY1DiHc

Racey, O’Brien, Douglas, Marquez, Hendrie, & Newton (2016). Systematic review of school-based interventions to modify dietary behavior: Does intervention intensity impact effectiveness? Journal of School Health, 86(6), 452–463. https://doi.org/10.1111/josh.12396

Rose, S. P., Habgood, M. P. J., & Jay (2020). Designing a programming game to improve children’s procedural abstraction skills in Scratch. Journal of Educational Computing Research, 58(7), 1372–1411. https://doi.org/10.1177/0735633120932871

Ross, S. G., & Begeny, J. C. (2015). An examination of treatment intensity with an oral reading fluency intervention: Do intervention duration and student–teacher instructional ratios impact intervention effectiveness? Journal of Behavioral Education, 24(1), 11–32. https://doi.org/10.1007/s10864-014-9202-z

Ryan, R. M., & Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford Publications.

Shah (2015). Zippies and the shift in cultural values in India. In Pereira & Malik (Eds.), Investigating cultural aspects in Indian organizations (pp. 31–56). Springer. https://doi.org/10.1007/978-3-319-16098-6_3

Sitzmann (2011). A meta-analytic examination of the instructional effectiveness of computer-based simulation games. Personnel Psychology, 64(2), 489–528. https://doi.org/10.1111/j.1744-6570.2011.01190.x

Sun, M. Z. (2021). Research on the influence of game design based learning on computational thinking skills and attitudes. [Master’s Thesis, Nanjing Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFDTEMP&filename=1022405763.nh

Tokac, Novak, & Thompson, C. G. (2019). Effects of game-based learning on students’ mathematics achievement: A meta-analysis. Journal of Computer Assisted Learning, 35(3), 407–420. https://doi.org/10.1111/jcal.12347

Vygotsky, L. S. (2016). Play and its role in the mental development of the child. International Research in Early Childhood Education, 7(2), 3–25. https://doi.org/10.4225/03/584fa3ec1610d

Wang (2021a). Research on the game-based instructional design of information technology in primary school based on computational thinking. [Master’s Thesis, Shenyang Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFD202102&filename=1021600710.nh

Wang, Y. H. (2021b). Research on the design of gamification learning process of junior high school programming oriented to computational thinking training. [Master’s Thesis, Northeast Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFDTEMP&filename=1021631052.nh

Wang & Jin (2010). Influence of task difficulty and feedback learning on ability of analogical reasoning in children. Psychological Development and Education, 26(1), 24–30. https://doi.org/10.16187/j.cnki.issn1001-4918.2010.01.003

Warrens, M. J. (2015). Five ways to look at Cohen’s kappa. Journal of Psychology & Psychotherapy, 5(4), 1–4. https://doi.org/10.4172/2161-0487.1000197

Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33–35. https://doi.org/10.1145/1118178.1118215

Wouters, Van Nimwegen, van Oostendorp, & van der Spek, E. D. (2013). A meta-analysis of the cognitive and motivational effects of serious games. Journal of Educational Psychology, 105(2), 249–265. https://doi.org/10.1037/a0031311

Yildiz, H. D., Durak, H. Y., Yilmaz, F. G. K., & Yilmaz (2017). Examining the relationship between digital game preferences and computational thinking skills. Contemporary Educational Technology, 8(4), 359–369. https://doi.org/10.30935/cedtech/6205

Zhang (2017). Research on the influence of educational game on junior high school students’ computational thinking ability. [Master’s Thesis, Yan Bian University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/kcms/detail/detail.aspx?dbcode=CMFD&dbname=CMFD201801&filename=1017109910.nh&uniplatform=NZKPT&v=hXZ9smIA2U1emUD-VvG0iWsf1QgkDF3sqirPV8kxj1Gelgc65YTHTt8Dq0X_Bpnn

Zhang (2019a). The design and development of the computational thinking educating game which supports independent inquiry. [Master’s Thesis, Inner Mongolia Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFD201902&filename=1019842430.nh

Zhang (2019b). Research and construction of game-based teaching mode based on computational thinking in Scratch courses in primary schools. [Master’s Thesis, Min Zu University of China]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/kcms/detail/detail.aspx?dbcode=CMFD&dbname=CMFD201902&filename=1019192810.nh&uniplatform=NZKPT&v=hUeIGtxlP-8dKWDWvJtNULwZ2rpxV3-1XRElSXv354l5eceWA4TOF6gJ_WVTSpvt

Zhang & Nouri (2019). A systematic review of learning computational thinking through Scratch in K-9. Computers & Education, 141, Article 103607. https://doi.org/10.1016/j.compedu.2019.103607

Zhao, & Shute, V. J. (2019). Can playing a video game foster computational thinking skills? Computers & Education, 141, Article 103633. https://doi.org/10.1016/j.compedu.2019.103633

Zhou, H. M. (2021). Applying gamification principles to develop computational thinking for elementary students: Taking a Scratch programming class as an example. [Master’s Thesis, Central China Normal University]. CNKI. https://kns-cnki-net-443.web.bisu.edu.cn/KCMS/detail/detail.aspx?dbname=CMFDTEMP&filename=1021160416.nh