Abstract
Automated content generation for educational games is an emerging research problem, as manual authoring is often time-consuming and costly. In this article, we present a procedural content generation framework that produces educational game content from the viewpoints of both designers and users. The framework generates content by means of a genetic algorithm and thereby gives designers the ability to control the content generation process for various learning goals according to their preferences. It also adapts the content to the skill of individual users. We demonstrate the effectiveness of the framework through an empirical study of human players in an educational language learning game that aims to develop the early English reading skills of young children. The results of our study confirm that users’ performance improves measurably more when game contents are customized to their individual ability than when they are not. Moreover, the lowest proficiency participants showed greater improvements in performance while playing the customized game than did the more highly proficient participants.
Adaptivity is a promising route by which educational games can shape education. According to Gee (2010), players learn best in games that offer well-organized problems that push them toward the outer limits of what Gee calls their “region of competence.” As Csikszentmihalyi’s (1990) theory of flow suggests, the feeling that a game’s content and difficulty are matched to one’s competence provides optimal playing satisfaction. However, educational games are often designed with a fixed progression of task difficulty, and there have been calls for dynamic tailoring of difficulty on a per-player basis (Ben, 2011; da Rocha Seixas, Gomes, & de Melo Filho, 2016; Noroozi, 2016; Zook et al., 2012). Dynamic difficulty adaptation is a challenging problem: given the broad diversity of players’ background skills, preferences, and motivations, this matching is typically difficult or impossible to achieve with any single, fixed progression (Metzger & Paxton, 2016; Noroozi, McAlister, & Mulder, 2016; Qian & Clark, 2016). Procedural content generation (PCG) offers an answer to this dilemma, in that it generates content automatically through algorithms, independent of a human designer (Shaker, Togelius, & Nelson, 2014). PCG thus offers several distinct strengths in contrast to human-made content. First, it builds on existing methods of content development to generate additional learning content, which can serve to accelerate the rate of learning. Second, it can alter game content to better suit the needs or skill level of individual users. Third, because it can generate new content on demand, it eliminates the lag between users needing new materials and human designers being able to prepare them.
When PCG is used in educational games, one must take into account the roles of both players and designers. While the former play the game itself, the latter initiate the learning. As such, designers do not merely configure basic settings or parameters; they play an active part in educational games. For instance, they track player performance and can produce new content in accordance with the learning objectives (LOs) or even regenerate a game’s contents according to a user’s performance. For this reason, we regard content development as an iterative and interactive procedure that requires the involvement of player and designer alike. We argue that two issues must be tackled when generating content for adaptive educational games. The first concerns the extent to which a designer controls content generation: rather than being limited to setting the initial parameters of a learning exercise, a designer ought to be given a greater degree of control. In other words, a content generation system should allow the designer to steer content generation according to his or her judgment regarding various LOs. The second issue concerns player adaptation: because players have different levels of skill (owing to differences in experience, learning habits, individual characteristics, and so forth), the content generation system must provide each player with suitable game contents. This article addresses these two issues by advancing the design and outcomes of an interactive content generation framework that supports designer oversight of specific LOs as well as the generation of individually tailored contents.
In this article, we approach content generation as an optimization process that searches for the sequence of educational contents that maximizes given criteria. These criteria include a desired mix of educational materials, difficulty suited to individual users, and a plausible educational intention. Using a genetic algorithm (GA), we produce ordered sequences of educational content from authored domain knowledge of the given content, internally ordered groups of educational materials, and the effect of these materials on content quality. In contrast to systems that plan an action sequence (Hodhod, Cairns, & Kudenko, 2011; Hooshyar et al., 2016; Niehaus, Li, & Riedl, 2011; Porteous, Teutenberg, Pizzi, & Cavazza, 2011; Riedl, Stern, Dini, & Alderman, 2008), we use evaluation criteria to select the content set that best addresses a user’s particular needs, and our search can produce a variety of contents. We demonstrate our system in a children’s language learning game designed to advance early English reading skills. We argue that our approach can be easily extended to other educational games because of its flexible domain knowledge structures and generalizable model of player abilities. The key contribution of this article is a framework that generates customized, individualized contents by considering both the designer’s preferences and the player’s skill level, and that is applicable to almost all types of educational games.
Related Works
PCG is an important technique in computer game development and is likely to become even more important in the future. A primary purpose of PCG is to enhance replayability by offering users a different experience in every new game. An early instance of such a use of PCG is the game Rogue (Toy, Wichman, Arnold, & Lane, 1980), which produced a new dungeon every time a player started a game. PCG also enhances game adaptability (e.g., Harrison & Roberts, 2014; Luo, Yin, Cai, Lees, & Zhou, 2013; Rowe, Mott, & Lester, 2014; Smith, Andersen, Mateas, & Popović, 2012), that is, the dynamic alteration of a game through procedural methods in response to in-game events.
Recent efforts have focused extensively on adaptive games, particularly on how to make game content adapt to players’ preferences. For instance, in the Mario AI competition, entrants develop systems capable of producing game content tailored to an individual user’s enjoyment (Shaker et al., 2011). Similarly, Galactic Arms Race (Hastings, Guha, & Stanley, 2009) generates individualized weapons according to the user’s playing style. To date, however, there have been few attempts to employ PCG for educational ends; those with an instructional dimension have primarily used PCG for adaptability or replayability (Hartsook, Zook, Das, & Riedl, 2011; Hullett & Mateas, 2009). The game Refraction, for instance, teaches fractional arithmetic and relies on PCG to create levels and introduce mathematical concepts in accord with a given player’s skills (Smith et al., 2012). Hullett and Mateas (2009), meanwhile, describe a system for creating training scenarios in the domain of firefighting, for example, partly collapsed or burning buildings from which trainees must rescue victims. In this case, the generated scenario is the environment itself rather than the tasks or events taking place within it. PCG has also been used for the adaptive generation of conflict resolution scenarios in the SIREN project (Grappiolo, Cheong, Togelius, Khaled, & Yannakakis, 2011). Finally, Bellotti, Berta, De Gloria, and Primavera (2009) seek to optimize LOs along a given learning curve by means of experience management.
Our content generation framework is particularly indebted to Zook et al. (2012) for the heuristic-based method we use to generate content. Our work, however, prioritizes not only adaptiveness to a player’s demonstrated abilities but also the preferences and control of the designer. Thus, our framework lets the designer control the generated content and the corresponding training objectives, including the degree to which the content focuses on each LO, while adapting that content to each player’s abilities. Our system generates content only after taking both designer preferences and player abilities into account.
Testbed Language Learning Game
The data-driven language learning game (DLLgame) is a web-based adaptive learning game developed at Korea University in the Republic of Korea. It aims to foster children’s early English reading skills through two activities, one targeting alphabet knowledge and the other phonological awareness. To promote alphabet knowledge, DLLgame teaches players the shapes of lowercase and uppercase letters and their corresponding sounds by having them hear a letter’s name, trace its shape on the screen, and view pictures of things whose names start with the spoken phoneme (Figure 1). An effective way of helping children learn the alphabet is to present them with familiar words that correspond to a letter; for instance, when introducing “A,” one might give examples of words beginning with A (“apple,” “ant,” etc.; Goffredo et al., 2016; Kartal & Terziyan, 2016; Rambli, Matcha, & Sulaiman, 2013). When children become familiar with these words, they can connect the shape of a letter with its sound and will subsequently be able to pronounce the letter when they see it. Accordingly, in DLLgame a letter is shown along with related objects.
Figure 1. Learn A to Z.
After all letters of the alphabet have been reviewed, the player listens to a single phoneme while an assortment of graphemes (the target and distractors) is shown on the screen. The player must identify the grapheme that corresponds to the sounded phoneme and click on it. Immediate feedback tells the player whether the selection was correct. If it was not, the player is told the name of the letter and asked to select again. Players thus practice letter recognition in a game-like setting, recognizing and choosing the right letter among others floating on the screen, until recognition becomes automatic. Players score points when they click on the correct letter. These letter recognition games use embedded assessment so that the system can quickly estimate a player’s familiarity with uppercase and lowercase letters.
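The recognition task can be sketched as follows. This is a hypothetical illustration rather than DLLgame’s code: the function and class names, the number of distractors, and the rule that only a player’s first selection counts toward the embedded assessment are all assumptions, and only lowercase letters are modeled.

```python
import random
import string
from collections import defaultdict


def make_trial(target, n_distractors=3, rng=random):
    """Graphemes shown for one trial: the target plus distinct random distractors."""
    pool = [c for c in string.ascii_lowercase if c != target]
    options = rng.sample(pool, n_distractors) + [target]
    rng.shuffle(options)  # the target appears at a random position
    return options


class LetterAssessment:
    """Hypothetical embedded assessment: first-attempt accuracy per letter."""

    def __init__(self):
        self.attempts = defaultdict(int)
        self.correct = defaultdict(int)

    def record(self, target, first_choice):
        # Assumption: only the first selection is scored; the retry after
        # corrective feedback is treated as practice, not assessment.
        self.attempts[target] += 1
        hit = first_choice == target
        if hit:
            self.correct[target] += 1
        return hit

    def familiarity(self, target):
        """Share of first attempts on this letter that were correct (0.0 if unseen)."""
        n = self.attempts.get(target, 0)
        return self.correct.get(target, 0) / n if n else 0.0
```

Passing a seeded `random.Random` as `rng` makes a trial reproducible, which is convenient when replaying a logged session.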
Once the player has mastered the relation between phonemes and graphemes, DLLgame automatically advances to phonological training (early decoding skills), which promotes phoneme recognition by segmenting the initial sound of words presented with pictures of things starting with the sounded phoneme. Beyond promoting early decoding skills, this activity aims to create vivid associations between letters and things whose names start with those letters, so that these things later serve as cues for the corresponding letter sounds (see Figure 2).
Figure 2. Easier content with more guidance generated for user A (left) and more difficult content with less guidance generated for user B (right), based on their skill levels.
The proposed content generation framework has been applied in DLLgame to produce new customized contents. The contents in DLLgame are designed around two LOs. LO1 is to gain alphabetic proficiency by identifying mnemonic images that recode graphemes into familiar objects and thereby connect a sound with the letter being learned. LO2 is to build phonological proficiency by segmenting the first sound of words presented with pictures of things starting with the sounded phoneme. A player must identify and select either such pictures or the pertinent mnemonic image. While the graphemes are displayed alphabetically for all players alike, the object pictures shown for each grapheme are chosen according to the player’s knowledge level by the GA-based scenario generation framework. Hence, DLLgame produces game contents that adapt to a player’s strengths and gaps in knowledge.
DLLgame is intended not as a substitute for in-school reading instruction but as a supplement. When the registered DLLgame user (the child’s teacher or guardian) logs into the game, the DLLgame web server records the child’s usage data. These data comprise logs of session times, playing durations, and the player’s responses to learning tasks, surveys, and tests.
The Heuristic-Based PCG Framework
Our work frames content generation as an interactive and iterative process, a generation-learning-evaluation cycle in which both designer and player take part. It begins with the designer generating game contents according to his or her preferences concerning LOs, based on a rough estimate of a hypothetical player’s abilities. A player then plays these scenarios, and his or her performance is observed. Once this performance is evaluated, the system can generate new contents, and the generation-learning-evaluation cycle begins anew. This iterative process thus enables continual fine-tuning of the initial estimate of a player’s abilities. In turn, contents become increasingly adaptive and can offer new challenges to players whose performance improves.
Figure 3 summarizes our content generation framework. It begins with the designer establishing the content: configuring its settings and domain knowledge and also determining the degree to which each LO is exercised within it. To this end, the designer sets the LO intensities (i.e., his or her preferences concerning how strongly specific LOs are exercised in a given content). Assuming a content has n LOs, the LO intensities set by the designer can be represented as a vector of n values, one intensity per LO.
Figure 3. Framework of the proposed content generation.
In addition to LO intensities, our framework gives the designer control over another important input: the estimated skill level of a player, represented as a skill level vector with one estimated value per LO.
Once the designer has set the LO intensities and the player’s estimated knowledge level and has supplied any other required learning materials, the content generator produces instances that reflect the designer’s LO preferences and adapt to the player’s abilities. A generated content instance consists of a sequence of learning boxes, each containing a list of learning materials to be staged according to the given constraints. A designer or domain expert can pre-author a set of these boxes, from which the content generator assembles a content instance by selecting suitable boxes.
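A learning box and a content instance might be represented as follows. This sketch assumes that each pre-authored box carries one intensity value per LO and a difficulty rating, and that a sequence’s LO intensities are aggregated by summation; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LearningBox:
    """A pre-authored group of learning materials (hypothetical fields)."""
    box_id: str
    materials: Tuple[str, ...]    # e.g. pictures of objects for one grapheme
    intensity: Tuple[float, ...]  # contribution to each of the n LOs
    difficulty: int               # assumed scale: 1 (more guidance) .. 5 (less)


def aggregate_intensity(sequence: Sequence[LearningBox]) -> List[float]:
    """Sum per-LO intensities over a non-empty sequence of boxes."""
    n = len(sequence[0].intensity)
    totals = [0.0] * n
    for box in sequence:
        for k, value in enumerate(box.intensity):
            totals[k] += value
    return totals
```

A content instance is then simply a list of `LearningBox` objects, and its aggregated intensity vector is what the generator compares against the designer’s LO intensities.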
The content generator produces a different content instance each time: if repeatedly given identical inputs, it produces content instances that are closely related but nevertheless distinct. This is advantageous because it allows data to be collected on how much a player improves across sessions, since those sessions are of similar difficulty. The designer can then use this performance information to evaluate the effectiveness of the learning and to begin a new iteration of the content generation process.
GA in the Proposed Framework
A GA-based heuristic search is used to find the optimal sequences of learning boxes given the designer’s inputs. GA is a popular method for combinatorial optimization (Hendrikx, Meijer, Van Der Velden, & Iosup, 2013); it begins with a randomly generated population (in our case, sequences of learning boxes) and iteratively evaluates and alters the individuals of this population according to a given fitness function. In each iteration, a subset of poor individuals is replaced while good ones are retained. The algorithm stops when it finds an individual of the desired quality or when a time limit is reached. Figure 4 shows a flowchart of an interactive GA.
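An elitist GA of this kind might look like the sketch below. It is not the authors’ implementation: the one-point crossover, point mutation, retention fraction, and mutation rate are assumptions, while the defaults of 150 individuals and 100 generations follow the parameter settings reported in this section. The fitness function is passed in, so the same loop can score sequences by LO-intensity alignment or by any other criterion.

```python
import random
from typing import Callable, List, Optional, Tuple

Individual = List[int]  # indices into a learning-box collection


def run_ga(fitness: Callable[[Individual], float], n_boxes: int, length: int,
           pop_size: int = 150, generations: int = 100, keep_frac: float = 0.5,
           mutation_rate: float = 0.1, target: Optional[float] = None,
           seed: int = 0) -> Tuple[Individual, List[Tuple[float, float]]]:
    """Elitist GA over box-index sequences; higher fitness is better."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_boxes) for _ in range(length)] for _ in range(pop_size)]
    n_keep = max(2, int(keep_frac * pop_size))
    history = []  # (best, average) fitness of each evaluated generation
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in pop),
                        key=lambda pair: pair[0], reverse=True)
        scores = [s for s, _ in scored]
        history.append((scores[0], sum(scores) / len(scores)))
        if target is not None and scores[0] >= target:
            return scored[0][1], history  # an individual of desired quality
        survivors = [ind for _, ind in scored[:n_keep]]  # good ones retained
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, length) if length > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for i in range(length):
                if rng.random() < mutation_rate:  # point mutation
                    child[i] = rng.randrange(n_boxes)
            children.append(child)
        pop = survivors + children  # poor individuals replaced by offspring
    best = max(pop, key=fitness)
    return best, history
```

Because the best part of each generation is carried over unchanged, the best fitness in `history` never decreases from one generation to the next.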
Figure 4. Flowchart of the genetic algorithm.
The initial population for the GA was constructed in two steps. The first step randomly generates sequences from the learning box collection for every stage of the content, Bs = (b1, b2, …, bq). The second step uses these stage-level box sequences to randomly generate box sequences for the whole content: each content-wide box sequence B = (b1, b2, …, bm) is formed by randomly selecting a box sequence Bs for each stage s and combining them.
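The two-step construction can be sketched as follows. It assumes that boxes within a stage sequence are drawn without replacement, that each stage contributes q boxes, and that k candidate sequences are pre-generated per stage; the function names and the stage labels used for illustration are hypothetical.

```python
import random
from typing import Dict, List


def stage_sequences(boxes: List[str], q: int, k: int,
                    rng: random.Random) -> List[List[str]]:
    """Step 1: k random box sequences Bs = (b1, ..., bq) for one stage."""
    return [rng.sample(boxes, q) for _ in range(k)]


def initial_population(stage_boxes: Dict[str, List[str]], q: int, k: int,
                       pop_size: int, seed: int = 0) -> List[List[str]]:
    """Step 2: content-wide sequences B built from one random Bs per stage."""
    rng = random.Random(seed)
    pools = {stage: stage_sequences(boxes, q, k, rng)
             for stage, boxes in stage_boxes.items()}
    population = []
    for _ in range(pop_size):
        content: List[str] = []
        for stage in stage_boxes:  # dicts keep the authored stage order
            content.extend(rng.choice(pools[stage]))
        population.append(content)
    return population
```

For example, with a hypothetical "alphabet" stage and "phonology" stage and q = 2, every individual has m = 4 boxes, the first two from the first stage and the last two from the second.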
Once constructed, the initial population is iteratively evolved and evaluated. We evaluate each box sequence by how closely its aggregated LO intensity values match the values specified in the designer’s inputs. Assessing the suitability of a box sequence therefore requires comparing its aggregated intensity vector with the LO intensity vector set by the designer.
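One way to make this comparison concrete is to score a sequence by the negative Euclidean distance between its aggregated intensity vector and the designer’s target, so that a perfect match scores 0 and larger mismatches score lower. The distance measure, the summation used for aggregation, and the box names are assumptions; a target of (8, 9) would correspond to the LO intensities used in the experiment.

```python
import math
from typing import Dict, List, Sequence


def aggregated_intensity(sequence: Sequence[str],
                         box_intensity: Dict[str, Sequence[float]],
                         n_los: int) -> List[float]:
    """Sum each LO's intensity over all boxes in the sequence."""
    return [float(sum(box_intensity[b][k] for b in sequence)) for k in range(n_los)]


def intensity_fitness(sequence: Sequence[str],
                      box_intensity: Dict[str, Sequence[float]],
                      target: Sequence[float]) -> float:
    """Negative Euclidean distance to the designer's LO intensity vector."""
    agg = aggregated_intensity(sequence, box_intensity, len(target))
    return -math.dist(agg, target)
```

Since higher is better, this function can be handed directly to a maximizing GA loop.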
Computational Results for the Parameter Settings.
As the results show, the target value is attained with a population of 150 combined with either 100 or 200 generations. To reduce computation time, the smaller number of generations was chosen. Figure 5 presents the best and average objective function values over the course of the evolution process.
Figure 5. Best and average objective function values over the evolution process.
Using a GA for content generation has two primary advantages. First, the GA evaluates the fitness of a complete candidate solution. Consequently, various global design criteria can readily be used to assess a solution. Beyond comparing aggregated LO intensities with the desired values, our work can add further design criteria to the fitness function, for example, the diversity of the individual boxes in a sequence or soft ordering constraints among boxes. Second, a GA-based search can produce not just one solution but a set of optimal solutions (here, content instances). This increases the system’s replayability because, as stated before, it prevents a player from encountering the same content twice.
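The sketch below adds one such criterion, a diversity term that rewards sequences without repeated boxes, to the intensity alignment, and then keeps the k best distinct individuals so that a player can be served different content instances. The weight `w_div`, the form of the diversity term, and the helper names are assumptions; soft ordering constraints are not modeled.

```python
import math
from typing import Callable, Dict, List, Sequence


def combined_fitness(sequence: Sequence[str],
                     box_intensity: Dict[str, Sequence[float]],
                     target: Sequence[float], w_div: float = 1.0) -> float:
    """Intensity alignment plus a weighted diversity bonus."""
    agg = [sum(box_intensity[b][k] for b in sequence) for k in range(len(target))]
    alignment = -math.dist(agg, target)
    diversity = len(set(sequence)) / len(sequence)  # 1.0 when no box repeats
    return alignment + w_div * diversity


def top_distinct(population: List[List[str]],
                 fitness: Callable[[List[str]], float], k: int) -> List[List[str]]:
    """Up to k best individuals, skipping exact duplicates."""
    seen, result = set(), []
    for ind in sorted(population, key=fitness, reverse=True):
        if tuple(ind) not in seen:
            seen.add(tuple(ind))
            result.append(ind)
            if len(result) == k:
                break
    return result
```

Because the whole sequence is available when the fitness is computed, global properties such as repetition are as easy to score as per-box properties.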
Experiment
To evaluate the effectiveness of the framework, we performed an empirical study involving a children’s language learning game designed to promote early English reading skills. The study assesses the framework’s capacity to generate contents tailored to individual players’ abilities. We next outline the study’s methodology and its outcomes.
Participants
The study was carried out in three elementary schools in an urban area of Seoul, the Republic of Korea. A total of 150 children from 14 preschool classes were invited to participate. Of these, 120 children met the age requirement of the study (turning 5 or 6 years old). Eighteen of these children were excluded from the analyses because they were absent for the pre- or posttest measures, so the final sample comprised 102 children (33 female, 69 male). The parents or caretakers of these children received letters that outlined the goal of the study and requested permission for the children to take part, and all granted permission. The participants had similar prior knowledge, and their early English reading skills were extremely limited, as is usual for Korean children of this age. The participants were randomly assigned to two groups of 51 children each. We then computed the effect size, a measure commonly used to determine the effects of experiments. An effect size of 0.88 was obtained, which is comparable with those reported by Albacete and VanLehn (2000).
Study Methodology and Procedure
Teachers received a user manual describing the game contents and how to play them. Children were first introduced to DLLgame, and brief explanations of how to use the application were provided before the study began. Children were then allowed to explore DLLgame freely, and their actions and behavior were recorded. The experiment was then carried out in two phases for each participant. In Phase 1, participants in both groups played a set of pre-generated game contents. Each player’s performance against the LOs was recorded, and his or her current proficiency was estimated with a simple player model that derives a first-order, linear approximation of proficiency from performance. Each participant’s average performance was recorded and mapped to an individual performance category, from which each player was assigned an integer value between 1 and 5 denoting proficiency in the LO.
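The mapping from average performance to a proficiency level might be sketched as below. The text specifies only a first-order, linear approximation and integer levels from 1 to 5; the assumption here is that performance is a proportion in [0, 1] split into five equal-width bins, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def estimate_level(scores: Sequence[float], n_levels: int = 5) -> int:
    """Map mean performance in [0, 1] linearly onto integer levels 1..n_levels."""
    avg = min(max(mean(scores), 0.0), 1.0)         # clamp to the valid range
    return min(n_levels, 1 + int(avg * n_levels))  # equal-width bins; 1.0 -> top level
```

For instance, a mean performance of 0.5 falls in the third of five bins and yields level 3.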
Phase 2 tested the capacity of the proposed framework to generate contents customized to the skills of individual players. The proficiency estimates obtained in Phase 1 served as input to the content generation system, which then generated a set of customized content instances. For comparison, we generated a set of uncustomized contents for the second group by assigning players randomly selected skill levels (either higher or lower than their estimated proficiency). Each participant in the first group played a “customized” scenario, whereas each participant in the second group played an “uncustomized” scenario.
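Assigning a mismatched skill level for the uncustomized group can be sketched as a uniform draw from the levels other than the player’s estimate. The uniform draw and the function name are assumptions; the text states only that the assigned level was randomly selected and either higher or lower than the estimated proficiency.

```python
import random
from typing import Optional


def uncustomized_level(estimated: int, n_levels: int = 5,
                       rng: Optional[random.Random] = None) -> int:
    """Random skill level on 1..n_levels that differs from the estimate."""
    rng = rng or random.Random()
    choices = [lv for lv in range(1, n_levels + 1) if lv != estimated]
    return rng.choice(choices)
```

The returned level is then fed to the content generator in place of the player’s estimated proficiency.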
Note that the LO intensities were kept unchanged across all content generations (LO1 = 8, LO2 = 9). To assess the relative efficacy of the generated contents, we consider how much a player’s performance rose within the game. The customized and uncustomized scenarios were both organized into pairs of instances generated with identical inputs (i.e., the player’s skill level is fixed within a pair). Thus, the two content instances in each pair have the same difficulty level. Performance gain (PG) is computed by subtracting the previous performance from the subsequent performance, dividing the difference by the subsequent performance, and multiplying by 100. PG was calculated for both groups, and the average value for each set was determined. We hypothesized that the customized scenarios would outperform the uncustomized ones in terms of average PG.
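The PG computation follows directly from this definition; note that it divides by the subsequent performance, as stated, rather than by the previous one. The function names are hypothetical, and PG is treated as undefined when the subsequent performance is zero.

```python
from statistics import mean
from typing import Iterable, Tuple


def performance_gain(previous: float, subsequent: float) -> float:
    """PG = (subsequent - previous) / subsequent * 100, as defined in the text."""
    if subsequent == 0:
        raise ValueError("PG is undefined when the subsequent performance is 0")
    return (subsequent - previous) / subsequent * 100


def average_pg(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean PG over (previous, subsequent) performance pairs."""
    return mean(performance_gain(prev, subs) for prev, subs in pairs)
```

For example, `performance_gain(2, 4)` returns 50.0.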
Results and Discussion
Since the language learning game comprises two LOs, each LO is evaluated with its own measure. LO1 (alphabetic principle acquisition) is measured as the number of correctly identified mnemonic images divided by the total number of mnemonic images (P1). LO2 (phonological awareness development) is measured as the number of correctly selected object pictures that begin with the spoken phoneme divided by the total number of object pictures that begin with the spoken phoneme (P2). At the end of each round of the game, P1 and P2 were recorded, and the PG of P1 and P2 was then derived for each scenario pair using the method described previously.
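P1 and P2 are proportions of correct responses per round. The sketch below computes them from per-round counts and derives the PG of each for one scenario pair, using the PG definition given earlier. The `RoundLog` record and its field names are hypothetical; DLLgame’s actual log format is not described here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RoundLog:
    """Hypothetical per-round counts for one player."""
    mnemonic_correct: int  # correctly identified mnemonic images
    mnemonic_total: int    # all mnemonic images shown
    picture_correct: int   # correctly selected pictures starting with the phoneme
    picture_total: int     # all pictures starting with the phoneme

    @property
    def p1(self) -> float:  # LO1: alphabetic principle acquisition
        return self.mnemonic_correct / self.mnemonic_total

    @property
    def p2(self) -> float:  # LO2: phonological awareness development
        return self.picture_correct / self.picture_total


def pair_gains(first: RoundLog, second: RoundLog) -> Tuple[float, float]:
    """PG of P1 and P2 for one scenario pair (divides by the later value)."""
    def pg(before: float, after: float) -> float:
        return (after - before) / after * 100
    return pg(first.p1, second.p1), pg(first.p2, second.p2)
```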
Using the proficiency levels estimated in Phase 1, we divided the participants into five subsets for the LO, each comprising participants judged to have similar proficiency. For each subset, we then examined the average improvement in performance across all scenario pairs, comparing the customized scenarios with the uncustomized ones. Figures 6 and 7 show the outcomes of this comparison: across all five subsets, participants achieved a greater increase in performance when playing the customized scenarios than when playing the uncustomized ones. Moreover, in the customized scenarios, the lowest proficiency participants (subsets 1 and 2) showed greater improvements in performance than did the more highly proficient participants. This accords with the intuition that beginners can improve more rapidly than experienced players.
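The per-subset comparison amounts to grouping PG values by proficiency level and condition and averaging each group. The record layout (level, condition, PG) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_pg_by_level(
        records: Iterable[Tuple[int, str, float]]) -> Dict[Tuple[int, str], float]:
    """Mean PG for each (proficiency level, condition) group."""
    buckets = defaultdict(list)
    for level, condition, pg in records:
        buckets[(level, condition)].append(pg)
    return {key: mean(values) for key, values in sorted(buckets.items())}
```

Comparing the "customized" and "uncustomized" entries at each level then gives the per-subset contrast reported in this section.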
Figure 6. Average PG for learning the alphabetic principle (LO1).
Figure 7. Average PG for the development of phonological awareness (LO2).

The Average PG for the Customized and Uncustomized Scenarios.
Note. PG = performance gain.
These results are consistent with the findings of other researchers (e.g., Luo et al., 2013; Zook et al., 2012), which show that heuristic-based methods can produce a population of potential solutions. Although our framework resembles these works in its heuristic-based generation method, it prioritizes not only adaptiveness to a player’s demonstrated abilities but also the preferences and control of the designer. By taking both designer preferences and player abilities into account, our system generates customized content that yields greater improvements in player performance than uncustomized content.
Conclusions
In this article, we have put forward a PCG framework for educational games that generates game content with the roles of both designers and users in mind. Unlike interactive storytelling research, this framework seeks to give the designer significant control over the LOs, which are in turn adapted to players’ individual skills. We argue that our approach can be easily extended to other educational games because of its flexible domain knowledge structures and generalizable model of player abilities. We demonstrated the effectiveness of the framework through an empirical study of human players in an educational language learning game that aims to develop the early English reading skills of young children. The results confirm that users’ performance improves measurably more when game contents are customized to their individual ability than when they are not. Moreover, the lowest proficiency participants showed greater improvements in performance while playing the customized game than did the more highly proficient participants. We plan to develop our content generation framework further along several paths. First, we will refine the framework with finer mechanisms through which the designer can control content generation in terms of preferences and constraints; GA-based optimization might be combined with constraint satisfaction methods to this end. Finally, scenarios of greater complexity and difficulty are needed to better evaluate the efficacy and scalability of this framework.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the ICT R&D program of MSIP/IITP in the Republic of Korea (grant number 2016(B0101-16-0340b and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (grant number R1610941).
