Abstract
This study tested the effects of implementing a narrative computer-based educational game within a middle-school math class. Gameplay consisted of navigating through a virtual spaceship and completing missions by periodically engaging in learning-by-teaching activities that involved helping an avatar solve math problems. In a pretest/posttest matched-groups design, 58 middle-school students either played the game for 10 hours over 4 days in place of their typical math instruction (game group), or they received conventional math instruction that consisted of a matched set of practice problems (control group). Contrary to our hypotheses, results from posttest measures indicated no significant differences in learning outcomes or motivation between the two groups. Importantly, supplementary observational data indicated that students in the game group spent much of their time during gameplay engaging in activities unrelated to the educational content of the game (e.g., navigating the virtual world) and only 20% of their time engaging in learning-by-teaching activities. These results highlight the importance of designing educational games that effectively balance features intended to entertain learners and features intended to promote learning. Implications for implementing educational games into classroom instruction are discussed.
Introduction
There is growing interest among educators and instructional designers in the use of computer games for learning (Honey & Hilton, 2011; Kapp, 2012; Mayer, 2014). Although many strong claims are made about educational games (Gee, 2007; McGonigal, 2011; Prensky, 2006), several reviews of the existing evidence report mixed findings and show the need to improve game design (Clark, Yates, Early, & Moulton, 2011; Mayer, 2014; O’Neil & Perez, 2008; Tobias & Fletcher, 2011; Vogel et al., 2006; Young et al., 2012). Specifically, a major challenge of effective game design is achieving an appropriate balance between features intended to motivate students and features intended to foster learning (Mayer, 2014). Features intended to entertain and engage learners risk distracting learners from the instructional goal, whereas features intended to promote learning risk disrupting the flow of gameplay and demotivating students (Fiorella & Mayer, 2012). For example, games that involve a strong narrative theme might motivate students (Dickey, 2006, 2015), yet such games risk taking attention away from the learning material (e.g., Adams, Mayer, MacNamara, Koenig, & Wainess, 2012; Pilegard & Mayer, 2016). Thus, effective game design requires the integration of multiple perspectives to ensure that games both foster engagement and support the cognitive processes necessary for learning (Plass, Homer, & Kinzer, 2015).
Approaches to Educational Games Research
Research on educational games typically follows one of three approaches (Mayer, 2014): value-added, media comparison, or cognitive consequences. The value-added approach involves precise experimental comparisons between a base version of a game and a game with one added instructional feature. For example, research following the value-added approach has identified several ways to improve the design of educational games, such as adding self-explanation prompting (Johnson & Mayer, 2010), providing explanatory feedback (Erhel & Jamet, 2013), and following multimedia principles (e.g., Moreno & Mayer, 2002; Wang et al., 2008). The major strength of the value-added approach is that it allows the researcher to isolate the effects of specific game features and make causal conclusions. A possible weakness of this approach is that the value of educational games might depend on the presence of many different types of features simultaneously.
The media comparison approach involves comparing the effectiveness of an educational game to another medium that presents the same instructional content. A limitation of this approach is that it assumes certain media may be inherently more effective than others, such as educational games being more effective than lectures. As Clark (1983, 1994) has argued, learning ultimately depends on the instructional methods rather than the media. In other words, differences that appear to be due to the media are often actually explained by the way the information was presented, which could be modified within a given medium. Therefore, it should not be surprising that the research evidence involving media comparisons is mixed (Clark et al., 2011). For example, studies involving games have found them to be both more effective than an equivalent textbook lesson (e.g., Barab et al., 2009) and less effective than an equivalent PowerPoint lesson (e.g., Adams et al., 2012). In short, the effectiveness of games (or other media) ultimately depends on how well the specific instructional features are designed to support learning.
Finally, the cognitive consequences approach involves testing the effects of playing an educational game compared with not playing a game or playing an unrelated control game. For example, research taking this approach has found that playing commercially available first-person shooter games can provide benefits for aspects of perceptual attention (e.g., Green & Bavelier, 2003, 2007). Unlike the value-added approach, this approach is limited in its ability to identify specific game features that are effective; unlike the media comparison approach, however, it does not assume that educational games are inherently more or less effective than other media. It is also a more practical approach for the implementation of educational games in the classroom, although much of the relevant past research has focused on the acquisition of cognitive skills rather than academic content knowledge (Mayer, 2014).
In the present study, we apply a version of the cognitive consequences approach toward evaluating the effectiveness of implementing an educational game into a middle-school math classroom. Specifically, we tested the cognitive consequences of replacing part of students’ conventional math instruction with gameplay designed to teach the same math concepts (e.g., solving equations, rational numbers, and percentages). The educational game was designed to incorporate one primary entertainment feature and one primary instructional feature. The entertainment feature involved an elaborate narrative theme in which students navigated a spacecraft, interacted with a computer-based avatar, and completed various missions. Past research suggests that a narrative theme might be motivating to students but does not necessarily result in better learning (Adams et al., 2012; Pilegard & Mayer, 2016). Thus, the primary instructional feature involved opportunities to engage in learning by teaching during problem-solving tasks, in which students help the avatar choose correct steps to solve math problems. The next section summarizes the rationale for incorporating learning-by-teaching activities into educational games.
Learning by Teaching
Learning by teaching involves enhancing one’s own learning through the act of explaining the material to others (Bargh & Schul, 1980; Fiorella & Mayer, 2013; Roscoe & Chi, 2007). According to generative learning theory (Fiorella & Mayer, 2015, 2016; Wittrock, 1990), teaching others promotes learning because it encourages students to actively select the most relevant information, organize it into a coherent structure, and integrate it with their existing knowledge. Teaching can also encourage students to actively reflect on their own understanding of the material, especially when it involves meaningful interactions with a peer or a computer-based pedagogical agent (Roscoe, 2014; Roscoe & Chi, 2008).
In an early study by Annis (1983), college students learned better from a history lesson when they taught the material to a peer than when they only prepared to teach or studied normally. Recent research by Fiorella and Mayer (2013, 2014); Hoogerheide, Deijkers, Loyens, Heijltjes, and van Gog (2016); and Hoogerheide, Loyens, and van Gog (2014) extended these early results, finding that the act of explaining to others promotes persistent and meaningful learning outcomes. Specifically, this research has found benefits of explaining previously studied material to a fictitious peer by recording a short video lecture. Moreover, research by Roscoe (2014) and Roscoe and Chi (2008) demonstrates that answering deep conceptual questions from a peer can help students generate explanations that involve reflective knowledge building (i.e., actively elaborating on the material) rather than merely knowledge telling (i.e., passively restating the material).
Learning by teaching has also been incorporated within computer-based environments, particularly by having students interact with teachable agents (e.g., Biswas, Leelawong, Schwartz, Vye, & The Teachable Agents Group at Vanderbilt, 2005; Leelawong & Biswas, 2008). Teachable agents are virtual avatars that guide the learning-by-teaching process by asking questions and providing prompts to students. Students can interact with the agent by asking questions, explaining, selecting the next step in a solution, or providing feedback. Research suggests that interacting with a teachable agent, especially one that prompts students to actively process and reflect on the material, can be more effective than being tutored by an agent or engaging in other activities that do not incorporate learning by teaching (Biswas et al., 2005; Chin et al., 2010; Matsuda et al., 2013; Segedy, Kinnebrew, & Biswas, 2013).
In summary, learning by teaching is especially effective when students are provided with opportunities to engage in quality interactions with a peer or virtual agent. The educational game in the present study aimed to encourage students to engage in learning-by-teaching activities by helping a virtual agent solve math problems. Students received prompts to teach the avatar the next correct step in a problem by selecting it from a list of options or to determine whether a step chosen by the avatar is correct or incorrect. Thus, this form of prompting did not require students to overtly generate their own explanations of the material but instead consisted of more focused prompts targeting mathematical principles related to the problem-solving tasks. Past research on self-explanation prompting suggests that focused prompts such as selecting options from a list can result in better learning outcomes within computer-based learning environments (Wylie & Chi, 2014), including games (Johnson & Mayer, 2010).
Theory and Predictions
This research uses cognitive load theory (Paas & Sweller, 2014; Sweller, Ayres, & Kalyuga, 2011) as a general framework for understanding how different design features of games might influence cognitive processing during learning. According to cognitive load theory, students have a limited capacity to process new instructional content in working memory. Thus, instruction should be designed to minimize cognitive processing irrelevant to the instructional goal (i.e., extraneous load) and encourage students to engage in cognitive processing necessary for constructing knowledge (i.e., germane load). A major challenge of game design is to balance features that primarily serve to entertain and motivate students with features intended to guide students’ cognitive processing and promote meaningful learning (Habgood & Ainsworth, 2011), which is referred to as intrinsic integration (Kafai, 1996). For example, a strong narrative theme may be entertaining to students but could create extraneous load and hamper learning if it is not closely tied to the instructional content of the game. Integrating effective instructional features such as learning-by-teaching activities may help focus students’ attention on the academic content and encourage students to engage in germane load by organizing and integrating the material with their existing knowledge.
The educational math game used in the present study aimed to achieve a balance between entertaining features (i.e., a strong narrative theme) and instructional features (i.e., prompts to engage in learning-by-teaching activities). Thus, we predicted that supplementing conventional instruction (consisting of direct instruction followed by problem-solving activities) with productive gameplay (learning by teaching) involving the same concepts would enhance student learning outcomes and motivation. Alternatively, if the entertaining game design features cause students to engage in excessive irrelevant behaviors during learning, such as spending a substantial portion of their time navigating the virtual world without engaging in learning-by-teaching activities, gameplay may not benefit or could even hamper learning compared with conventional classroom activities. In other words, cognitive capacity might be exceeded by extraneous load, and thus cognitive resources cannot be effectively allocated toward activities aimed at constructing knowledge (i.e., germane load), such as responding to learning-by-teaching prompts. In short, the primary goal of the study was to test the efficacy of an educational game designed to embed an effective learning strategy within an immersive technology-based learning environment.
Method
Participants and Design
The participants were 58 American students in the same seventh-grade math class at a middle school located on a U.S. Air Force base in Germany. There were 29 boys and 29 girls, with an average age of 12.5 years (SD = 0.5). Twenty-eight students were assigned to the game group, and 30 students were assigned to the control group; the groups were matched for grade point average and gender. First, students were rank-ordered by their grade point average (acquired from their teacher), and the members of each successive pair of students on the list were randomly assigned to either the game group or the control group. Then, minor adjustments to the group assignments were made to ensure an equivalent number of boys and girls in each group. Students in the game group participated in 10 hours of gameplay over 4 days as a replacement for their conventional classroom instruction. Students in the control group received conventional classroom instruction involving problem-solving practice with their normal math teacher and did not participate in gameplay. As described later, both groups learned about the same math concepts by solving a similar set of practice problems.
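The pair-wise assignment step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' actual procedure: the function name `matched_pair_assignment`, the `(student_id, gpa)` input format, and the `seed` parameter are all hypothetical, and the sketch omits the later manual adjustments that balanced gender across groups (which is presumably why the reported groups were 28 and 30 rather than 29 and 29).

```python
import random


def matched_pair_assignment(students, seed=None):
    """Assign students to 'game' or 'control' within GPA-ranked pairs.

    `students` is a list of (student_id, gpa) tuples (hypothetical format).
    Returns a dict mapping student_id to a group label.
    """
    rng = random.Random(seed)
    # Rank-order students from highest to lowest grade point average.
    ranked = sorted(students, key=lambda s: s[1], reverse=True)
    assignment = {}
    # Walk down the ranked list two students at a time.
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]
        rng.shuffle(pair)  # randomize which member gets which group
        labels = ["game", "control"]
        if len(pair) == 1:
            # Odd number of students: assign the last one at random.
            labels = [rng.choice(labels)]
        for (student_id, _gpa), label in zip(pair, labels):
            assignment[student_id] = label
    return assignment
```

Because randomization happens within adjacent ranks, the two groups end up with closely matched grade point average distributions by construction.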
Materials and Measures
Computer-based materials consisted of a narrative educational game designed to teach middle-school math concepts. Paper-based materials consisted of learning outcome measures, a motivation questionnaire, and an observational measure.
Math game
Example Acquisition and Application Problems During Gameplay or Conventional Instruction.
Some of the problems in the game additionally prompted students to engage in learning-by-teaching activities. For example, students might be prompted to select the next step in the problem-solving process from a menu presented on the screen or asked whether they agree with a decision made by the avatar or recommend an alternative step. When incorrect options are chosen, the student is prompted to consider another option. These forms of prompting continue until the problem has been solved correctly. Thus, the learning-by-teaching prompts did not require students to actively generate their own explanations but instead allowed students to pick solutions from a list of options and make corrections when they selected an inaccurate solution step. Figures 1 and 2 provide representative screenshots of the game’s interface and the layout of a learning-by-teaching prompt. The game periodically prompts students to solve learning-by-teaching problems, but students can also choose to solve learning-by-teaching problems at any time throughout the game. If a student has not completed a learning-by-teaching problem after some time, they will be prompted to solve one to gain energy for the player’s character to continue navigating throughout the game. This feature aimed to integrate the teaching activities within the broader narrative and goals of the game and to ensure that all students had a similar number of opportunities to engage in learning by teaching during gameplay.
Figure 1. Screenshot of the virtual world and avatar in the math game.
Figure 2. Screenshot of the learning-by-teaching prompt within the math game.

Learning outcome measures
Learning outcome measures consisted of items assessing knowledge acquisition, knowledge application, and knowledge organization. The knowledge acquisition and application tests were administered both before and after the intervention (with slightly different questions on the pre- and posttests), whereas the knowledge organization test was exploratory and administered only after the intervention. The tests were developed to target the concepts taught during the learning phase of the experiment, either during gameplay or during students’ regular math instruction. Each of the knowledge tests was informally pilot tested with a separate group of students to ensure the questions were at the appropriate difficulty level. The students’ teacher also reviewed the knowledge tests, and minor modifications to the questions were made prior to data collection.
The knowledge acquisition test (α = .69) consisted of 10 questions that required students to solve basic equation problems. The items were designed to align with the equation questions that students were prompted to solve during the game or during typical instruction. Each correct response was worth 1 point, for a total possible score of 10. Example items from the knowledge acquisition test included the following: “Solve for x: 12x − 480 = 9,648” and “What is x − 127, where x = 47?”
The knowledge application test (α = .51) consisted of six questions that required students to apply what they have learned to new situations. The items were designed to align with the application problems that students solved to continue navigating in the game (e.g., borrowing money from an ATM with an interest rate) or that students solved during their conventional math instruction. Each correct response was worth one point, for a total possible score of six. One item from the application test asked students: “Max is interested in buying a house that is 3,087 square feet and costs $301,999. How much does the house cost per square foot?”
The knowledge organization test (α = .92) was included as an exploratory measure and required students to sort a set of nine math word problems into categories based on their underlying mathematical principles. The test was developed based on previous work assessing the quality of students’ knowledge structures (or schemas) for representing different problem types (e.g., Chi, Feltovich, & Glaser, 1981; Quilici & Mayer, 1996). The nine problems consisted of three problem types (i.e., mean, median, and mode), each presented with three different cover stories (i.e., a reading teacher, a person deciding to see a movie, and a scientist measuring temperature). An example of a problem involving the mean was the following: “A reading teacher is interested in how her class performed on this week’s quiz. Her ten students received scores of 86, 92, 74, 77, 95, 64, 82, 83, 82, and 90. She adds all of the students’ scores on the quiz and then divides by the number of students who took the quiz.” An example of a problem involving the median was the following: “Steve is trying to figure out whether he wants to see a new movie. He reads online how other people have rated the movie, from 1 star to 5 stars. When he orders the reviews from worst to best, he finds that the rater in the middle gave the movie 4 stars, so he decides to go see the movie.”
Motivation questionnaire
The motivation questionnaire (α = .74) consisted of 14 items adapted from previous work (Blackwell, Trzesniewski, & Dweck, 2007; Midgley et al., 1998) targeting students’ motivation to learn and their beliefs and attitudes toward learning math on a 7-point scale. The following example items were included: “During class, I like harder information so that I have a chance to learn new things,” “I try to use the skills and strategies I learn in math class in my other classes,” “I like learning information that increases my curiosity, even if it is hard to learn,” and “I think the information in my math lessons is useful for me to learn.” Responses to the motivation items were totaled to form a composite motivation score, with possible scores ranging from 14 to 98.
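The scoring described above can be illustrated with a short Python sketch. It assumes that every item is scored from 1 to 7 with no reverse-coded items, and that the reported α denotes Cronbach's alpha computed with the standard formula; neither detail is stated in the text, and the function names are hypothetical.

```python
from statistics import variance

N_ITEMS = 14                 # number of questionnaire items (from the text)
SCALE_MIN, SCALE_MAX = 1, 7  # 7-point scale (assumed to run from 1 to 7)


def composite_score(responses):
    """Sum one student's item responses into a composite score (14 to 98)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses")
    if any(not (SCALE_MIN <= r <= SCALE_MAX) for r in responses):
        raise ValueError("response outside the 1-7 scale")
    return sum(responses)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-student response lists.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

With 14 items on a 1 to 7 scale, the minimum composite is 14 × 1 = 14 and the maximum is 14 × 7 = 98, which matches the range reported above.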
Observational measures
As supplementary process data, two experimenters coded students’ behaviors while the students played the math game. Because of the logistical and technical challenges associated with the design and implementation of a commercial game in the classroom, specific log data from the game were unavailable. Fortunately, the observational data helped provide insight into the extent to which students were engaged in on-task behavior during the game. The experimenters coded the behavior of each student every 5 minutes for a 30-minute period on each of the 4 days of gameplay to provide an estimate of the percentage of time spent engaging in on-task and off-task behaviors. The two raters then discussed any discrepancies in their ratings to reach a consensus. Observations were coded based on the following four categories: (a) engaging in learning-by-teaching activities, (b) solving problems that did not involve engaging in learning-by-teaching activities, (c) engaging in game behaviors irrelevant to the academic content of the game (e.g., navigating through the game), or (d) engaging in irrelevant behaviors outside of the game (e.g., looking away from the computer, talking with another student). Therefore, the observational data were used to indicate the extent to which students engaged in germane instructional activities (e.g., solving learning-by-teaching problems) or extraneous activities (e.g., navigating through the virtual environment).
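One way such time-sampled codes could be turned into percentage estimates is sketched below in Python. The letter codes mirror categories (a) through (d) above; the data format and the function name `percent_by_category` are assumptions made for illustration, since the authors' actual tabulation method is not described.

```python
from collections import Counter

# Category labels follow the four coding categories listed in the text.
CATEGORIES = {
    "a": "learning-by-teaching activity",
    "b": "problem solving without learning by teaching",
    "c": "in-game behavior irrelevant to the academic content",
    "d": "irrelevant behavior outside of the game",
}


def percent_by_category(codes):
    """Return the percentage of observations falling in each category.

    `codes` is a list of consensus codes ('a'-'d'), one per 5-minute
    observation (hypothetical format).
    """
    if not codes:
        raise ValueError("no observations to summarize")
    counts = Counter(codes)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown codes: {sorted(unknown)}")
    total = len(codes)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}
```

Each observation is treated as representing an equal share of the sampled time, so the percentages are estimates of time allocation rather than exact durations.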
Procedure
The study consisted of three phases: pretesting, gameplay, and posttesting. In the pretesting phase, all students completed the demographics questionnaire, the knowledge acquisition and application pretests, and the motivation questionnaire. In the gameplay phase, the game group completed five sessions, ranging from 1.5 to 2.5 hours each, for a total of approximately 10 hours of gameplay over 4 days. Students completed the game individually in a computer lab, during which time interaction with other students was minimal. During the gameplay phase, the control group received their conventional math instruction, consisting of direct instruction from their teacher, followed by guided and independent practice exercises, without playing the game. The exercises covered the same underlying math concepts included within the game, though the specific practice problems were slightly different. Students in the control group worked on the practice exercises individually but could also ask for assistance from their usual math teacher. Finally, in the posttesting phase, all students completed the motivation questionnaire and the knowledge acquisition, application, and organization tests.
Results
Did the Groups Differ on Pretesting Measures?
Independent-samples t tests indicated that the two groups did not significantly differ on the pretest measures: the knowledge acquisition test, t(56) = −0.84, p = .403; the knowledge application test, t(56) = −0.09, p = .931; and the motivation questionnaire, t(56) = 1.28, p = .207. Thus, our matched-groups design appeared to create two homogeneous groups in terms of basic characteristics related to learning math.
Did Playing the Game Promote Learning Outcomes?
Means and Standard Deviations for Posttest Learning Measures Across Two Groups.
Note. Scores for acquisition and application are adjusted for pretest performance.
Did Playing the Game Promote Motivation to Learn About Math?
We then tested whether the two groups differed on their self-reported motivation and beliefs toward math. An independent-samples t test indicated no significant difference between the game group (M = 65.8, SD = 9.6) and the control group (M = 63.8, SD = 11.6) on the motivation questionnaire completed after the gameplay phase, t(56) = 0.71, p = .481. Apparently, experience playing the game did not greatly influence students’ beliefs and attitudes compared with experience with their conventional math instruction.
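The reported test statistic can be checked from the summary statistics alone. The Python sketch below assumes a pooled-variance (Student's) independent-samples t test, which is consistent with the reported 56 degrees of freedom, and takes the group sizes of 28 and 30 from the Participants and Design section; the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with pooled variance.

    Returns (t, df), where df = n1 + n2 - 2.
    """
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Game group: M = 65.8, SD = 9.6, n = 28; control: M = 63.8, SD = 11.6, n = 30.
t, df = pooled_t(65.8, 9.6, 28, 63.8, 11.6, 30)
print(f"t({df}) = {t:.2f}")  # prints "t(56) = 0.71"
```

The recomputed value agrees with the reported t(56) = 0.71, which suggests the reported means, standard deviations, and group sizes are internally consistent.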
To What Extent Did Students in the Game Group Engage in On-Task Behavior?
Given that gameplay did not appear to benefit learning outcomes or motivation, we used the supplementary observational data to help determine the extent to which students were engaged in learning-by-teaching activities during the game. Across the five gameplay sessions, students were engaged in some form of gameplay most of the time (89%). However, only 20% of that time was spent engaging in learning-by-teaching activities. Approximately 50% of students’ time was spent engaging in problem-solving activities, with or without learning-by-teaching prompting. This suggests that much of students’ gameplay was spent engaging in extraneous behaviors irrelevant for learning, such as navigating the virtual world, interacting with the avatar, or otherwise attending to the narrative theme of the game.
Discussion
Findings from the present study suggest that the educational game did not significantly benefit student learning outcomes or motivation. Despite this result, the data provide some important insight into why gameplay did not appear to greatly influence learning. First, the findings suggest that the educational game did not effectively implement learning-by-teaching features in a way that supported deep cognitive processing necessary for meaningful learning. Specifically, the observational data indicate that only a small amount of gameplay was spent engaging in learning-by-teaching activities, such as choosing the next step in a problem or deciding whether the avatar has chosen the correct step. Instead, students apparently spent much of their time engaged in irrelevant activities, such as navigating the virtual world and attending to the storyline of the game. According to cognitive load theory (Sweller et al., 2011), these entertaining and potentially motivating features of the game may have created extraneous cognitive load, thereby hampering learning outcomes. In other words, the game may have caused too much extraneous load and not enough opportunities to engage in cognitive processing relevant for developing an understanding of the material (i.e., germane load). This suggests modifications are necessary to place a greater emphasis on the instructional features of the game and less of an emphasis on features of the game that are irrelevant for learning, which is a common challenge in educational game design. Of course, one limitation of our study is that we do not have equivalent time-on-task data for the control group. Thus, we do not know the extent to which the game group and the control group differed in how much time they spent engaging in task-irrelevant behaviors during learning. Because we cannot assume that students in the control group were consistently engaged during learning, further research is needed to explore how games and conventional instruction might differentially impact student engagement in relevant learning activities.
Another important consideration derived from the present study is the specific implementation of learning-by-teaching activities. Students were required to select the best explanation or correction to the avatar’s actions but were not required to independently prepare a lesson or explanation. Previous work on providing self-explanation prompts during computer-based learning (Wylie & Chi, 2014) suggests that selecting options from a list can be effective, yet the prompts used in the present study did not appear to encourage learners to engage in deep cognitive processing. It may be more effective to require students to actively generate their own explanations by responding to open-ended prompts (e.g., Fiorella & Mayer, 2013). Selecting items from a list also potentially creates the problem of students gaming the system by quickly clicking through prompts to see the answer without actively using the prompts as a learning tool.
One limiting factor potentially contributing to the results was the low performance on the knowledge application and knowledge organization tests, creating a possible floor effect. The measures were reviewed and edited by the students’ teacher prior to the study to achieve the appropriate difficulty level, yet students still appeared to struggle with these two tests. The knowledge application test presented word problems that required students to apply their knowledge of basic equations to new situations. This may have been too challenging given that students had only recently learned to solve the equations. The knowledge organization test may have been too difficult for a similar reason. Furthermore, students may not have fully understood the instructions of the sorting task used to assess knowledge organization. The present study applied methods of previous studies that have used sorting tasks for different problem types in statistics (Quilici & Mayer, 1996) or physics (Chi et al., 1981); however, these studies involved college students. One possible implication from the present study is that sorting tasks may need to be further modified to be more understandable to younger learners.
The present study followed a more practical cognitive consequences approach to educational games research. Further work should consider more systematic comparisons of specific game features that may contribute to motivation and learning. The current study was constrained by the logistical challenges of classroom research and thus was limited to comparing the math game to conventional math instruction, without access to fine-grained process data. This type of comparison is important from a practical standpoint to evaluate whether it is helpful or detrimental to replace conventional classroom instruction with gameplay designed to incorporate a particular learning strategy. At the same time, this design makes it difficult to determine which specific features of the game may have potentially helped or hindered learning. Future classroom studies should consider isolating the effects of specific features of games, such as by testing a version of the game with and without the learning-by-teaching feature (Mayer, 2014). Subsequent research should also consider extending the evaluation of games over a longer period of instruction. Students may need more time to get used to the mechanics of gameplay before they are able to focus their attention on understanding the math concepts.
In conclusion, this study contributes toward our understanding of the practical challenges of designing and implementing educational games into the classroom. Despite a considerable difference in what learners experienced (conventional instruction or gameplay), there was not a significant difference between the groups in learning outcomes or motivation. This suggests that implementing games or learning-by-teaching activities is not guaranteed to have a positive impact over standard forms of instruction. As past research indicates, games and learning by teaching can be useful tools, but learning ultimately depends on whether the learning environment is appropriately designed to foster productive cognitive processing. Although the game did not significantly benefit learning, another way to interpret these findings is that replacing conventional instruction with an educational game also did not appear to be detrimental. It is also possible that there are longer term motivational benefits associated with supplementing math instruction with gameplay. Further research is needed to better understand the cognitive and motivational consequences of implementing games as supplements to more conventional forms of instruction. The present study further highlights the need to consider the way in which design features of educational games influence students’ cognitive processing during learning.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the U.S. Advanced Distributed Learning (ADL) Initiative (Contract W911QY-16-C-0001) awarded to Jennifer J. Vogel-Walcutt. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the ADL Initiative or the U.S. government. The U.S. government is authorized to reproduce and distribute reprints for government purposes.
