Abstract
The development of a serious game combines the skills of numerous disciplines, from subject matter experts on the topic being taught; to story developers, game designers, and software developers; to instructional designers, educational assessment scientists, and others. This section provides commentary on the Intelligence Advanced Research Projects Activity’s unique game development program, Sirius, where multiple games with the same training goal were independently developed and tested by different teams. We compare the experience of two of these teams not only in game design but also in how skills of various disciplines were woven together to produce and validate their games. Lessons learned are reviewed to provide guidelines and takeaway points to assist game development practitioners in their future efforts to create effective serious games.
The objective of the Sirius program was to teach people, by means of serious gameplay, how to alter their cognitive behavior in order to become less susceptible to bias and consequently more effective in solving various cognitive tasks. Our experience on the Sirius program indicates that it is necessary to employ a multidisciplinary approach and a tightly integrated team to successfully develop this type of training game.
The two Sirius teams reviewed in these lessons learned had expertise in a variety of fields that were central to the objectives of the Sirius program: education theory, experimental psychology, game design, multimedia communications, software engineering, and many others. The strongest tension was between the game development practice and demands of the social science theory being taught. It was the source of the greatest risk to the project, but when properly managed, led to remarkable creativity. Both the CYCLES team, led by University at Albany, and the Missing team, led by Leidos, achieved this equilibrium by having all team members productively involved at all stages of the project. In this article, we focus primarily on Phase 2 of the Sirius program, although our observations apply equally to the results of Phase 1.
Theory-Driven Design
Perhaps the most consequential decision our teams had to make was how to approach the game design. Given the objectives of the program, the key considerations were (a) teaching methodology and (b) bias elicitation and mitigation techniques. Without an effective teaching methodology, the games simply would not act as effective training interventions. The CYCLES teaching philosophy is based on a modified Observe-Orient-Decide-Act (OODA) loop that was effective in improving analytic thinking in military training (Boyd, 1995). During the course of the project, however, the CYCLES team realized that it was not enough to let players observe and orient to their environment and that explicit and active instruction on bias definitions and mitigation strategies was needed. Thus, the modified and refined OODA loop, which included in-game quizzes, became the highly effective teach–play–test process. The Missing team came to a similar realization and expanded the teaching module content after each play segment, giving players an opportunity to test their knowledge and receive feedback before applying it in subsequent gameplay.
Incorporating bias elicitation and mitigation techniques into the design of the games was likewise critical. Instructional content for the games was based on a cognitive bias framework of the specific cognitive biases being targeted, including bias elicitation and mitigation approaches derived from the literature (Barton et al., 2015; Symborski et al., 2014). The Missing game design approach identified points at which the biases overlap with regard to common causes and potential sources for mitigation. This yielded an efficient game that treats the origins of multiple biases at their common source and allows players to generalize their learning across different problems and portions of the game to other biases.
The Game: Find a Teaching Strategy That Is Also a Game
Once the theoretical framework and educational imperatives for the Sirius games had been laid out, it was necessary for each team to design and build a game that incorporated these principles. Should it be a skill game, an adventure game, or an exploration game? Should it have a narrative? Should it be a virtual world that draws the player in? This is the fun, creative part of any game development project, and the two teams addressed this differently.
In CYCLES Carnivale, a casual, puzzle-style game, ludic interactions are based in a carnival setting replete with decision-making challenges in the form of familiar carnival games. The narrative, in which the player must successfully navigate decision-making challenges to escape the carnival, is contextualized through this series of self-contained puzzles. The Missing games took a different approach, incorporating an overarching narrative from which players’ tasks arise naturally in the context of an evolving story. The game narrative leads the player to resolve a mystery surrounding the story’s protagonist, with ludic interactions included to establish context for teaching and feedback.
A key takeaway in this regard was the realization that, regardless of game genre, we did not need a game that also teaches bias mitigation; instead, we needed a teaching strategy that was a game. With respect to addressing these pedagogical goals, both games were either simplified or adapted more closely to our teaching strategies (Barton et al., 2015; Martey et al., 2014; McKernan et al., 2015; Shaw et al., 2016; Symborski et al., 2014) across multiple game iterations to achieve maximum game efficacy. The result is a teaching game where every element is consequential: There is no room for extraneous content that could distract players’ focus on learning.
Measure, Measure, Measure, and Fix
After picking the right education theory, the right bias elicitation and mitigation strategies, and then settling on the right game design, we still needed to make sure that all elements worked together optimally. We ran a series of experiments required by the Intelligence Advanced Research Projects Activity (IARPA), which were used to evaluate game efficacy as well as to support another important goal: optimization of the game. After each experimental cycle, we carefully examined where the game had fallen short and made necessary fixes. For example, did we need an additional mitigation strategy to teach anchoring bias effectively? Was there too much text to read, too many new terms to absorb? And why did some players get stuck in parts of the game? Herein lies the lesson: In the process of optimization, a serious game will need to be revised many times, which means that the game architecture should be designed to allow for easy revisions and repairs.
Three Experimental Cycles
The Sirius program was designed in two phases, each phase consisting of three experimental cycles. Each experimental cycle entailed measurements of game progress, ranking teams on their progress toward the IARPA performance metrics. There was enormous pressure to produce the best possible initial game, so that the first experiment in each phase would show strong progress toward the program objectives. The second experiment was a gamble: If something did not work in the first game, several alternative game revisions were typically available, and there was no guarantee that the one chosen would fix the problem. It was therefore critical to have a third try in which to demonstrate improved performance.
During each experimental cycle, we assessed immediate and long-term player learning outcomes using a pre-/post-/follow-up test design. The scale of this testing, with thousands of participants, multiple games, and research sites with the same research goals, provided a rare opportunity to compare results between teams. We used the immediate and long-term bias mitigation data from this testing to determine what game content needed to be improved, which was important to the Sirius program’s success.
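As a rough illustration of how a pre-/post-/follow-up design separates immediate from long-term learning, the following Python sketch computes the mean gain at each measurement point. The score scale, the `ParticipantScores` class, and the `mean_gains` function are hypothetical and chosen only for illustration; they are not the Sirius program's actual metrics or analysis.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ParticipantScores:
    """Hypothetical bias-mitigation test scores for one participant."""
    pre: float        # score before playing the game
    post: float       # score immediately after playing
    follow_up: float  # score at the delayed follow-up test


def mean_gains(scores):
    """Return (immediate_gain, retained_gain), each averaged over participants.

    immediate_gain: mean of (post - pre), i.e., learning right after play.
    retained_gain:  mean of (follow_up - pre), i.e., learning that persisted.
    """
    immediate = mean(s.post - s.pre for s in scores)
    retained = mean(s.follow_up - s.pre for s in scores)
    return immediate, retained


# Illustrative, made-up scores on an assumed 0-100 scale.
sample = [
    ParticipantScores(pre=40, post=70, follow_up=60),
    ParticipantScores(pre=50, post=80, follow_up=70),
]
immediate_gain, retained_gain = mean_gains(sample)
```

Comparing the two gains across game versions is one simple way to see whether a revision improved immediate learning, retention, or both.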
What Matters and What Does Not
The core experimental aspect of the Sirius program included a series of manipulations known from social science literature to have an impact on training effectiveness. Testing of the Missing game manipulated spaced repeat play, game duration, reward structure, narrative depth, visual/audio fidelity, first- versus third-person view, and priming of participants. CYCLES game testing examined spaced repeat play, game duration, reward structure, narrative depth, visual fidelity, player interactivity, and structured analytic techniques. The open questions were whether these manipulations would work in the games and whether they would lead to the most effective game design. We found that, with very few exceptions, these manipulations did not have an effect on knowledge transfer. Spaced repetition, where the game was played twice with several days between sessions, was the only independent variable that had consistent, strong effects across teams, and it universally improved long-term learning outcomes (Clegg et al., 2015). Nonetheless, the data indicate that the games became increasingly effective teaching mechanisms from one experimental cycle to the next, suggesting that our efforts at tightening and optimizing the games’ educational content were more important than the experimental game manipulations.
Cognitive Load and Overload
Refining the amount of cognitive effort demanded from players was a continuous theme throughout the project. This led us to remove many elements that did not contribute directly to learning, so as not to add to players’ cognitive load. The final CYCLES game deliberately cut the amount of teaching for the toughest bias (anchoring) in order to maximize the game’s overall performance, after we noted a decline in effectiveness due to players reaching their cognitive load limits. We believe that a better option would be to split off complex teaching into multiple games. The Missing games were partitioned into three episodes, each episode containing a play segment and a teaching segment of roughly equal duration. This promoted balance between play and learning and enabled complex teaching to be spread out.
Playtest the Pedagogy
Playtesting is an essential part of any game development process for discovering software defects and optimizing player experience. Playtesting was also useful, prior to efficacy testing, to verify implementation of the intended teaching content and to increase the likelihood of a successful efficacy test. We learned the value of prototyping theory-driven design in a way that could be easily iterated by subject matter experts in an interactive preview before moving on to formal playtesting. This provided value by verifying the design from a pedagogical perspective prior to game production, minimizing development iterations.
No One Likes to Be Wrong
It is well accepted that feedback to players is crucial to learning, but in order for feedback to be beneficial, the player needs to be receptive to it. Based on playtester feedback on the Missing games, we noted that players tended to get defensive if feedback was written in a manner that came across as accusatory (e.g., “You were wrong,” “You were biased”), which presumably can make them less receptive to the training. By rephrasing feedback to soften the language and direct criticism at players’ answers rather than the players themselves (e.g., “Your answer might have been biased…”), while attributing correct answers to the players’ own cleverness (e.g., “You were right!” or “You were unbiased!”), we improved player receptivity to the feedback. Similarly, the CYCLES team made a considerable effort to carefully craft the language in the game in order to achieve an optimal balance of constructive feedback and encouragement that would keep players engaged and motivated.
Conclusion
The Sirius program sponsored development of multiple games to mitigate biases. Two of the best performing games from this program, CYCLES and Missing, had several factors in common with respect to their pedagogy design. Both games taught core definitions and mitigation strategies of the biases and included frequent practice and feedback directly related to bias mitigation and recognition. They also included real-world examples built into the game and tasks in a variety of settings and activities for understanding and knowledge transfer outside of the game environment. A multiyear large-scale test campaign using thousands of participants provided a rare opportunity to compare results across teams and to optimize game educational content to improve overall learning outcomes.
In retrospect, we found it useful to adopt a serious game design strategy that emphasized teaching over gameplay, where game components are simplified and ludic interactions are focused on setting a real-world context for teaching and feedback. An iterative design approach, in which the teaching strategy was refined using quantitative performance and empirical data, yielded game designs with no extraneous content to distract players’ focus on learning. These lessons learned may benefit others when developing their own serious games.
At the conclusion of the program, many open questions remain and our research into effective education games does not end here; however, it is now clear that teaching cognitive bias mitigation is possible and that games can be significantly more effective for this purpose than more traditional, passive learning methods. We believe that the success of the Sirius program will lead to new applications of game-based educational technology outside of the Intelligence Community: in business, medicine, law enforcement, and wherever critical decisions need to be made in the face of incomplete, uncertain, or contradictory data.
Footnotes
Authors’ Note
The views and conclusions contained in the article are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, AFRL, or the U.S. Government.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8650-11-C-7175.
