Abstract
In this article, we present the application of a method for visualizing gameplay patterns observed in log-file data from a geometry game. Using VisCareTrails, a data visualization software system based on the principle of timed word trees, we were able to identify five novel behaviors that informed our understanding of how players were approaching the game. We further utilized these newly identified player behaviors by triangulating them with geometry test scores collected from players outside the game setting. We compared the predictive capacity of these behaviors against five demographic characteristics commonly observed to be associated with educational outcomes: age, gender, ethnicity, mother’s education, and attitude toward video games. Two of the novel behaviors we identified, both reflecting inflexible problem-solving strategies, outperformed all demographic variables except age in terms of predicting change in geometry test scores post-gameplay. We believe that this is sound evidence for the utility of VisCareTrails and the timed-word-tree method for identifying pedagogically relevant player behaviors from semi-structured data associated with educational games.
Keywords
Introduction
Game log-files are notoriously difficult to analyze. 1 Even within a simple game, with a limited number of moves, the number of possible combinations that a player can perform between initiation and completion can be substantial. 2 The question of how to make sense of this data is a considerable problem if the Learning Sciences are to make use of the recent explosion of online data sources. The following is an investigation into the use of a time-sequence log-file visualizer, VisCareTrails, to find pedagogically relevant patterns in player actions within a game.
Previous work
A number of visualization tools have been proposed to aid sense-making of log files. From the 1990s to the present day, time-sequence log-file visualization has been used to investigate computational performance, 3 usability, 4 and network security.5–8 Outside of computer science, log-file visualization has also had success within biomedical science, 9 chemistry,10,11 analysis of email log files to identify relationships within organizations, 12 improving the scientific processes, 13 and gaming. 14
More recently, log-file visualization has been applied to educational games15–17 though historically the field of Educational Data Mining (EDM) has tended to rely on non-visual, unsupervised learning techniques such as clustering or semi-supervised techniques such as semantic tagging for pattern finding. 18 EDM’s sister field, Learning Analytics (LA), has tended to focus on highly processed data, although some exploratory visualization has been utilized with linked data 19 and unstructured data from online forums, particularly MOOCs. 20
The visualization tool
VisCareTrails
VisCareTrails is a visual analytic system originally developed to mine medical records and identify when decision patterns are associated with positive (recovery) or negative (death) outcomes. By summarizing multiple sequences of timed events, it “provides the user a means to explore electronic health data in order to understand patterns, problems and opportunities in clinical practice.” 21 The central idea behind the tool is the concept of timed word trees. Timed word trees are a way to bring structure to semi-structured data, namely, patient descriptions and the time of each description. The timed word tree is made up of two components: an event sequence and a word tree. The event sequence consists of a series of time-stamped events 22 that can be represented as an event tree to display the structure of the sequence dependencies across time. A word tree places this structure onto a sequence of words and uses that structure to arrange those words spatially. 23 A timed word tree incorporates not just the sequence order but also the timing of the sequence, which it uses to encode the distance between words. This is essential because timing conveys important information within the medical context. For example, if all sequences for an aggressive form of cancer end in death, the sequence of events will always be the same, but the time between events may not be. It is highly desirable to be able to identify whether some patient sequences occur over months and others over years. Timed word trees allow for this level of granularity in analysis, and VisCareTrails provides a simple, user-friendly drag-and-drop interface 23 to achieve this. In previous research using this software, the developers were able to identify more effective patterns of medical testing that could be used to increase survival rates. 24
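To make the idea of a timed word tree more concrete, the following is a minimal sketch in Python of how timed event sequences might be aggregated into a prefix tree that records both how many sequences share a branch and how long, on average, they took to reach it. It is not the VisCareTrails implementation: the function name `build_timed_tree`, the event labels, and the timestamps are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def build_timed_tree(sequences):
    """Aggregate timed event sequences into a prefix tree.

    Each sequence is a list of (event, timestamp) pairs sorted by time.
    A node is keyed by the tuple of events leading to it and stores how
    many sequences pass through it and their mean elapsed time since the
    first event of the sequence.
    """
    nodes = defaultdict(lambda: {"count": 0, "times": []})
    for seq in sequences:
        if not seq:
            continue
        start = seq[0][1]
        prefix = ()
        for event, timestamp in seq:
            prefix = prefix + (event,)
            nodes[prefix]["count"] += 1
            nodes[prefix]["times"].append(timestamp - start)
    return {
        prefix: {"count": node["count"], "mean_elapsed": mean(node["times"])}
        for prefix, node in nodes.items()
    }


# Hypothetical sequences: two share the same order of events but differ in timing.
sequences = [
    [("consult", 0), ("test", 10), ("treatment", 40)],
    [("consult", 0), ("test", 30), ("treatment", 400)],
    [("consult", 0), ("test", 20), ("discharge", 50)],
]
tree = build_timed_tree(sequences)
for prefix, info in tree.items():
    print(" > ".join(prefix), info)
```

In a visualization, the `count` of each node would correspond to branch thickness and `mean_elapsed` to horizontal position, which is how two sequences with identical order but different timing can be told apart.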
VisCareTrails and game logs
VisCareTrails is ideally suited for data that has definitive start and end points within the system, such as the initial consultation and death in the medical scenario. This is not dissimilar to the structure of a game, where a player initiates a level and either completes or fails to complete that level. As such, VisCareTrails could be a useful tool to identify patterns within the game that lead to success or failure. Games are also highly time dependent: a similar sequence with different timing may denote a meaningful difference between player strategies, for example, a thoughtful player versus a player who is just randomly trying different moves. Within a learning game, where academic outcomes are a primary goal, these patterns could be psychologically and pedagogically relevant to a student’s success more generally.25,26 Therefore, the following study was intended to determine whether the VisCareTrails tool and the timed-word-tree method that it utilizes could be a useful exploratory visualization technique for educational games: a way to identify relevant learner attributes that could be leveraged to aid educational goals.
The game
Noobs vs. Leets (http://create.nyu.edu/dream/) is a single player puzzle game that was designed to teach middle school students the basic geometry concepts of angles. 27 The game runs in a web browser via the DREAM (Digital Reference of Experiments and Assessments Manager) application on desktop or laptop computers. The game is divided into six chapters, each consisting of 8–10 levels and focusing on a specific angle concept. To complete each level, the player, in control of the “Noob” character, unlocks angles by clicking on the angle or angles to be solved and then selecting the correct rule (acute, obtuse, right, straight, complementary, supplementary, and vertical). By doing so, players open paths that can be traversed by the character in order to free a fellow “Noob.” For example, a player could click on an unlocked (known) angle and a locked (unknown) angle that together sum to 90° and then select the complementary rule button to unlock that angle. Each level within a chapter is designed to increase difficulty as the player progresses. A key difficulty component is the introduction of the “Leet” character. The “Leet” is introduced to force players to strategize about which paths are usable and which angles need to be unlocked or avoided. Immobile (purple) “Leets” create obstacles that block pathways, and mobile (orange) “Leets” chase and kill the player if their path is unlocked. Figure 1 presents a screenshot of the game interface.

Noobs vs. Leets. The goal is to open angles to create a path that leads to the cage.
In Noobs vs. Leets, 28 the first six levels of chapter 1 create a scaffolded introduction to the major game mechanics. In chapter 1, level 7, the player encounters their first challenge: to correctly identify the angles in question without guidance from the game. There, players have to choose among four different angle rules (right, acute, obtuse, and straight) in order to unlock the angles and create an open path that leads to the end goal, a cage holding a fellow Noob. An immobile (purple) “Leet” is also present at this point of the game. Chapter 1, level 9 is the last level of the first chapter of the game and allows the player to open the cage through their choice among a set of possible solutions. Here, two mobile (orange) “Leets” are present. These levels are depicted in Figure 2.

Chapter 1: (a) level 7 and (b) level 9.
Level 7 presents five angles that can be unlocked, while level 9 presents eight unlockable angles. Chapter 1, level 7 is the first time that the player interacts with all the game mechanics without scaffolding, and it is the first application stage where we can expect players to make mistakes regarding the learned angle rules.
Methods
Analytic strategy
We had two aims within this study. The first was to use VisCareTrails and the timed-word-tree method to detect and visualize relevant behavioral patterns among players in the Noobs vs. Leets game. The second was to determine whether these patterns were relevant to players’ overall understanding of geometry. To do this, we assigned attributes to players based on the visualizations and then looked for relationships between these attributes and pre- and post-test scores on a geometry test that was independent of gameplay. As a point of reference, we also compared the predictive ability of our derived player characteristics with that of demographic characteristics commonly associated with educational attainment. In this way, we were able to uncover game-related behaviors that are associated with geometry understanding in a more generalizable sense, allowing for a richer understanding of the educational impact of the game and possibly leading to novel pedagogical interventions.
Visualization: VisCareTrails interface
Figure 3 illustrates the visualization tool interface. The left panel presents the events registered in the data file. The bottom panel shows the sequence of each event that was registered by the tool. On the main canvas, the x-axis encodes sequences of timed events, while the y-axis shows all branches of events. Branch thickness represents the number of samples in that event at a single time. The tree can be rooted by any given event. Here, all events are rooted by an abstract event, @BEGIN. In practice, this event was used to represent the starting point of a game session. The visualization allows both coloring and labeling of branches to facilitate the identification of patterns among players and the grouping of branches according to specific characteristics.

Visualization tool interface. The main canvas displays a possible timed word tree. Vertical lines represent time intervals. Branch thickness shows how many players performed a specific action in a given time.
Data
Log-file data were utilized from previous research by Biles and Plass, 27 containing data recorded from 117 different middle school students in sixth through eighth grade from urban schools in a major Northeastern metropolitan city. Standard CREATE (Consortium of Research and Evaluation of Advanced Technologies in Education) log-file format includes basic information about the game state and time-stamped player actions, such as the angle and rule selected, tied to the player’s unique ID number. The general structure of the log is depicted in Figure 4, following the guidelines presented by Chung and Kerr 29 (Table 1). Players were also given a demographic survey to collect data on their age, grade, gender, ethnicity, mothers’ educational level, and questions regarding their attitude toward computer games. Two geometry tests were administered, one immediately before and another immediately after 30 min of gameplay. Data were collected in compliance with the requirements of the Institutional Review Board of New York University (Table 1).

Log tables.
Log entries.
Comparing variable importance
The calculation of variable importance is a controversial topic: the number of approaches is considerable, and none is without issue. 30 We therefore utilized three different methods to ensure some coverage of the differing problems that can be encountered. The first was to manually build a series of multivariate regression models, informed by domain knowledge of geometry education and gameplay. The second was to automatically detect variable importance using a Random Forest classifier, and the third, also automatic, was the calculation of Mutual Information.
Multivariate regression
General linear models were implemented in R. The relative importance of variables was compared based on the significance of their relationship with post-test scores at an alpha level of 0.10, R² values reflecting the proportion of variance that each variable explained, and the absolute value of the t-statistic reflecting the importance of the variable to the overall prediction.
Random Forest
Random Forests™ are supervised learning techniques that are well suited to nonlinear data and to the comparison of variables. A Random Forest generates a measure of variable importance by using Out-of-Bag (OOB) sampling to measure prediction error. Though this method is susceptible to biasing importance toward correlated predictors, an adjustment proposed by Strobl et al. 31 is generally accepted to minimize the problem. The adjusted Random Forest was implemented in R using the cforest command within the party package. A total of 100 trees were grown and two input variables were randomly sampled as candidates at each node.
Mutual information
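Mutual information measures the reduction in uncertainty about one random variable given knowledge of another. 32 In its standard form for two discrete random variables X and Y, it is given by the following equation

$$I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$

where p(x, y) is the joint probability of X = x and Y = y, and p(x) and p(y) are the corresponding marginal probabilities.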
Mutual information is preferable to Pearson correlation as a measure of variable importance because Pearson correlation is biased when applied to categorical data. Mutual information was implemented in R using the entropy package. A pairwise calculation was done on the relationship between change in geometry test score and each of the 10 student characteristics. Mutual Information is sensitive to the number of levels within a variable and assigns greater importance to variables with more levels. Therefore, all predictors were converted to binary form: continuous variables were split into above and below the median, and categorical variables with multiple levels were converted to majority/non-majority membership (e.g., ethnicity was converted from five categories to White/non-White). This resulted in a considerable loss of variance.
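The binarization and pairwise calculation described above can be sketched as follows. This is an illustrative Python example only; the analysis itself was carried out in R with the entropy package, and the estimator used there is not specified here, so the simple plug-in (empirical frequency) estimate below is an assumption. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median


def binarize_at_median(values):
    """Map each value to 1 if it lies above the median, else 0."""
    m = median(values)
    return [1 if v > m else 0 for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y), in bits, for two equal-length discrete lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


# Hypothetical data: change scores and a continuous gameplay feature.
change_score = [-2, 0, 1, 3, 4, 5, 6, 8]
simple_rule_use = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.2, 0.1]

score_bin = binarize_at_median(change_score)
feature_bin = binarize_at_median(simple_rule_use)
print(mutual_information(score_bin, feature_bin))
```

Because both variables are reduced to two levels before the calculation, no predictor gains importance merely by having more categories, which is the motivation for the binarization step.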
Cross validation
Cross validation of both the multivariate regression and Random Forest models was performed to provide information on the generalizability of the models. A 10-fold cross validation of the regression model was performed in R using the DAAG package, and a 10-fold cross validation of the Random Forest model was performed using the rfcv function from the randomForest package, with variable importance reassessed at each step of the variable reduction.
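For readers unfamiliar with the procedure, the partitioning step of 10-fold cross validation can be sketched as below. This is a generic Python illustration, not the DAAG or randomForest code; the function name `k_fold_indices` and the random seed are hypothetical, and the sample size of 117 is taken from the data description.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    The n observations are shuffled once and dealt into k folds of nearly
    equal size; each fold serves as the held-out test set exactly once.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(117, k=10))
for train, test in splits:
    # A model would be fitted on `train` and its error measured on `test`.
    print(len(train), len(test))
```

Averaging the held-out error across the 10 folds gives an estimate of how well a model is likely to generalize beyond the players it was fitted on.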
Results
In this section, we will present some of the visualizations obtained using the timed-word-tree tool, as well as some statistical tests performed to validate the identified patterns.
Visualization patterns
Expected patterns of behavior
Using VisCareTrails, we were able to identify gameplay trends that we would expect to find. Figure 5 shows the log file filtered by the NOOB_DIED rule, displaying for each player a path composed of the rules that he/she got wrong. By looking at the thickness of the lines and the rules associated with those lines (appended to the end of the line), we can identify error trends with respect to different angles. As can be noted in Figure 5(a), in chapter 1, level 7, 59 of 119 players (50%) made a mistake. Most of them (41 players) made more than one mistake, which means that they had to play that level more than once to finish it. By observing the thickness of the blue paths (students who made three or more mistakes), we can also observe that the majority of these mistakes involved acute angles, while students who made a single mistake (green paths) tended to utilize obtuse angles. At the end of chapter 1, in level 9, we would expect to see fewer players make mistakes, and this is confirmed by Figure 5(b), which shows that 32 of the 104 players who played this level (30%) got at least one answer wrong. Moreover, fewer players made more than one mistake at this point of the game, as can be seen in the thinner blue line in Figure 5(b) relative to Figure 5(a). It is also possible to observe that the majority of mistakes in level 9 are now related to the right angle rule, as students who make more than one mistake tend to utilize this rule. This trend is observable by looking at the label sizes and number of labels referring to “RIGHT.”
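The filtering that underlies this view can be sketched in a few lines of Python. The example assumes a simplified, hypothetical log format in which each entry is a (player ID, event, rule) triple and the NOOB_DIED event carries the rule that was wrongly applied; the actual CREATE log format contains more fields. The color assignment follows the encoding described for Figure 5.

```python
from collections import defaultdict


def wrong_answer_paths(log_entries):
    """Return, for each player with at least one mistake, the ordered list
    of rules recorded on that player's NOOB_DIED events."""
    paths = defaultdict(list)
    for player_id, event, rule in log_entries:
        if event == "NOOB_DIED":
            paths[player_id].append(rule)
    return dict(paths)


def path_color(path):
    """Color a path by its number of mistakes: one, two, or three or more."""
    if len(path) == 1:
        return "green"
    if len(path) == 2:
        return "yellow"
    return "blue"


# Hypothetical log entries for three players on one level.
log_entries = [
    ("p1", "NOOB_DIED", "ACUTE"),
    ("p1", "LEVEL_COMPLETE", None),
    ("p2", "NOOB_DIED", "ACUTE"),
    ("p2", "NOOB_DIED", "RIGHT"),
    ("p2", "NOOB_DIED", "ACUTE"),
    ("p3", "LEVEL_COMPLETE", None),
]

paths = wrong_answer_paths(log_entries)
players = {player_id for player_id, _, _ in log_entries}
error_rate = len(paths) / len(players)
colors = {player_id: path_color(path) for player_id, path in paths.items()}
print(paths, colors, error_rate)
```

The proportion of players with a non-empty path corresponds to the error rates reported for each level, and the path length determines which color group a player falls into.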

Wrong answers in chapter 1: (a) level 7 and (b) level 9. Each rule is depicted by a single color. Green paths show players who committed a single mistake, yellow paths mark players who committed two mistakes, and blue paths mark players who got three or more wrong answers.
Selection of inactive angles
Figure 6 shows the various rules that each player selected. A gray angle (Figure 2) is considered inactive, that is, it cannot be solved, while yellow or locked angles are the active ones, for which the player can select and apply an angle rule. This is a key feature of the gameplay in Chapter 1 and we would not expect players to select inactive angles in these levels. Despite this we can observe that in both levels 7 and 9, some players keep selecting rules for inactive angles. However, the number of players who present such a behavior decreases by the end of the first chapter of the game.

Rule selection in chapter 1: (a) level 7 and (b) level 9.
Simple versus complex rules
In chapter 2, level 1, VisCareTrails was able to identify a new pattern that distinguished players. In this chapter, players are introduced to complementary rules in a guided mode. After this introduction, they can use a single complementary rule, and in subsequent levels, for example, level 3 (Figure 7), more than one complementary rule can be applied. However, the player can also complete these levels without using complementary rules at all. By visualizing the player data, filtered by wrong answers and by the first successful path for each player, we were able to identify whether or not each player chose to use a complementary rule when it was available.

Chapter 2, level 3.
As depicted in Figure 8(a), among the players who made a mistake, the majority of mistakes were related to the newly introduced rule. It is also possible to observe that some players who got a wrong answer when applying the new rule persisted with the same mistake. Figure 8(b), however, shows that the proportion of players who used the complementary rule whenever possible was similar to the proportion who did not.

(a) Wrong answers and (b) successful paths in chapter 2, level 3. Paths in (a) are encoded in the same way as in Figure 5. Red labels present complementary rules and orange labels show simple rules. In (b), green paths display correct answers using complementary rules whenever possible. Red paths, on the other hand, are successful paths composed of simple rules only. Finally, blue paths display answers formed by a mix of simple and complex rules.
The same pattern of behavior was observed for the last level of chapter 3 (Figures 9 and 10). In this chapter, players are introduced to the supplementary rule, and again we see that students either attempt to use the new rule or avoid it. A total of 18 players got to this level, and among them, as depicted in Figure 10(a), 5 were killed when required to use the new rule. What was different in this level was that, as can be seen in Figure 10, all players who were successful in this stage made use of either the supplementary or complementary rules.

Chapter 3, level 8.

(a) Wrong answers and (b) successful paths in chapter 3, level 8. Paths in (a) are encoded in the same way as in Figure 5. Red labels present supplementary rules, purple labels complementary rules, and orange labels show simple rules. In (b), green paths display correct answers using complementary rules whenever possible. Finally, blue paths display answers formed by a mix of simple and complex rules.
Again, in chapter 4, we were able to distinguish the same pattern even among the four players who reached that level. In chapter 4, level 5 (Figures 11 and 12), players were able to start using the vertical rule. All four players who made it to this stage of the game succeeded in finishing it, though only half of them opted to use the new rule to do so.

Chapter 4, level 5.

Chapter 4, level 5, successful paths. Green paths display correct answers using complementary rules whenever possible. Red paths, however, are successful paths composed of simple rules only. Finally, blue paths display answers formed by a mix of simple and complex rules.
From these visualizations, we developed two new features to describe students’ gameplay: simple rule use and complex rule use. Complex rule use denotes when a student used only complementary, supplementary or vertical rules to solve a level, while simple rule use denotes players who eschew the use of complex rules, relying only on the most basic rules to solve the level. The number of times each player used either of these strategies was then calculated and averaged over the number of levels they attempted to reach a numerical score for further analysis.
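One way these two features might be computed is sketched below in Python. The rule groupings follow the definitions in the text, but the data layout (one list of applied rules per attempted level) and the function names are hypothetical, and the sketch does not attempt to reproduce any additional filtering that may have been applied in the original analysis.

```python
SIMPLE_RULES = frozenset({"ACUTE", "OBTUSE", "RIGHT", "STRAIGHT"})
COMPLEX_RULES = frozenset({"COMPLEMENTARY", "SUPPLEMENTARY", "VERTICAL"})


def classify_level(rules):
    """Label one attempted level by the kinds of rules the player applied."""
    used = set(rules)
    if not used:
        return "none"
    if used <= SIMPLE_RULES:
        return "simple_only"
    if used <= COMPLEX_RULES:
        return "complex_only"
    return "mixed"


def strategy_scores(levels):
    """Average the simple-only and complex-only counts over attempted levels."""
    n = len(levels)
    if n == 0:
        return {"simple_only": 0.0, "complex_only": 0.0}
    labels = [classify_level(rules) for rules in levels]
    return {
        "simple_only": labels.count("simple_only") / n,
        "complex_only": labels.count("complex_only") / n,
    }


# Hypothetical player who attempted four levels.
levels = [
    ["ACUTE", "RIGHT"],
    ["COMPLEMENTARY"],
    ["ACUTE", "COMPLEMENTARY"],
    ["OBTUSE"],
]
scores = strategy_scores(levels)
print(scores)
```

Each player thus receives two numerical scores between 0 and 1 that can be entered into the models alongside the demographic variables.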
Validating new features
Outcome
Though the distributions of pre- and post-geometry test scores are highly irregular, there appears to be improvement on average between the two test sessions (Wilcoxon Signed Rank: W = 30,000, p < 0.0001). Although 14 players scored worse on the post-test, the difference between each player’s post-test and pre-test scores yields a change score that is approximately normally distributed (Kolmogorov–Smirnov: D = 0.1, p = 0.1) and therefore suitable for regression analysis. This change score was used as the outcome for determining variable importance for this reason and also because it represents the total effect of learning in the sample, a trait of substantive interest.
Purposeful regression
First, a purposeful regression approach 33 was taken to construct a series of multivariate regression models that predicted change in geometry score. These were based on an understanding of how students play the game, the suite of demographic variables, and the variables developed from the visualizations. The most parsimonious model included the demographic variables age and gender and two of the new variables: use of only simple rules and use of only complex rules. No other variables demonstrated a significant relationship with the change in geometry test score. Table 2 shows the R2 values and parameter estimates for these variables independently and together. A model that includes all variables was also generated to compare variables based on absolute value of variable t-statistics.
Multivariate regression models predicting change in geometry test score.
***p < 0.001, **p < 0.01, *p < 0.05, +p < 0.10.
Mutual Information
Mutual Information scores for all pairwise combinations of change in test score and the other variables showed a somewhat different pattern. The use of only simple rules was the most important feature, followed by age and gender. Use of complex-only rules was only the sixth most important feature.
Random Forest
Mean Decrease in Accuracy for a Random Forest model predicting change in geometry score generates a third ordering of our variables. Again, use of simple-only rules is strongly associated with change, and it ranks third after student’s age and gender. Use of complex rules is ranked fourth.
Cross validation
Given that there is no definitive order of importance between the three methods, cross validation can give us an idea of how many of these variables we should consider. A 10-fold cross validation of the regression model shows use of simple rules only and gender have generalizable predictive strength at an alpha level of
A 10-fold cross validation of the Random Forest model does not show a great deal of improvement with the addition of more variables. Greater prediction accuracy is not achieved by the addition of more than two or three variables with a change in error rate between 1 variable and 10 variables of only 0.23. For this model, that would mean age, gender and the simple rule strategy would be included.
Discussion
This study investigated the utility of the VisCareTrails visualization software for uncovering patterns in the gameplay of an educational geometry game. We identified a number of patterns and used five of these to generate variables to classify students. To determine whether these classifications had utility not only within the game but also for geometry education more generally, we then used them to predict changes in geometry test score over time. We achieved this by comparing the predictive capacity of these variables against five demographic variables using three different prediction models (Figure 13).

Variable importance with respect to predicting change in geometry test score as measured by absolute t-statistic (red), mutual information (green) and Random Forest importance (blue). Measures have been scaled to be represented on the same graph. Faded bars represent those variables that are very unlikely to have a relationship with change in score.
Visualization
Variable importance
Across the three importance measures, there was general consensus that four variables were of more relative consequence: age, gender, the simple-only strategy, and the complex-only strategy. By far the most reliably important variable appears to be age, and this is not surprising. Developmental factors, as well as greater exposure to geometry, games, and tests, could all explain why age shows a strong positive relationship to score change. The association between score change and gender is somewhat more ambiguous, both in terms of its relative importance and the reasons behind the association. There is a straightforward numerical reason for male players’ greater improvement from pre- to post-test
Of the five variables that we generated from the visualizations (acute angle use, complementary rule use, inactive angles, simple rule use, and complex rule use), two demonstrated predictive capacity beyond that of the majority of demographic variables. These reflected two wheel-spinning strategies: 34 the persistent use of a single strategy irrespective of the problem context. The first of these strategies was the application of only simple rules (acute, obtuse, straight, and right angles) in an attempt to solve a level. On average, 10 attempts to solve levels with a simple-only strategy are associated with a 10% lower change score between pre- and post-tests. This would seem to have a straightforward explanation: complex rules are more cognitively demanding and are introduced after simple rules. As the game is designed to teach the complex rules, players may not have as much knowledge of or experience with the complex rules as they have with the simple rules. Players may therefore default to the easier simple rules when they are stuck.
A potential intervention developed from this analysis would be to introduce adaptivity within the game’s level progression in order to provide struggling players more exposure to complex angles in a manner designed to reduce extraneous cognitive load. Players demonstrating this simple rule strategy would receive an extra series of levels that have a more moderate increase in difficulty. Decreasing the number of alternative paths and guiding the player through several more applications of the complex angle would provide extra practice that may benefit struggling players by increasing their confidence in applying the more complex rules. Alternatively, the game designer could explore different methods of introducing and reinforcing the angle concepts within the game by embedding more contextual information about the angle properties. One possibility would be to provide more detailed and targeted feedback when the player incorrectly applies a complex rule. For example, if the player applies the complementary rule to two angles that are supplementary, the game could display an animated overlay of the complementary angle being drawn. The player could then clearly visualize the difference between the two angle concepts, providing added scaffolding of the fail state and potentially providing a valuable moment of self-reflection on gameplay decisions.
Perhaps the more interesting wheel-spinning pattern is the related strategy of using only complex rules. This seems counter-intuitive if the complex rules are more conceptually difficult, but it may represent players who are struggling even more than those who are using the simple-only strategy. It is possible that a player has memorized the full catalog of angles but does not understand what they mean. In this case, they may be either randomly choosing angles or taking cues from the game that may be unrelated to the angle problem. For example, the player could be conflating a difficult configuration of Leets with the need to use more conceptually difficult angles. However, this is merely speculation, and further investigation would be needed to determine an appropriate intervention. For the players who are randomly choosing angles, a possible solution would be to provide the added scaffolding during the fail state as previously discussed. This would help students who are blindly guessing without a deeper understanding of the angle rule properties.
Generalizability
The majority of the variables are likely unrelated to the change in geometry test score, as demonstrated by cross validation. Cross validation of both the regression model and the Random Forest points to the importance of age, gender, and the simple-only strategy, with the association between score change and the complex-only strategy being borderline. However, since the consequences of a false positive in this case are fairly benign, at worst resulting in a suggestion of more exposure to geometry, a relaxed tolerance for Type I error is acceptable. With this in mind, if we consider the commonly accepted 5% chance of seeing an association between variables at random, we would expect 1 in 20 of the new variables we found with VisCareTrails to have an association with change score purely by chance. Of the five variables that we found and tested, we have identified two that have a relationship with change score. Based on this result, we conclude that VisCareTrails can identify new, consequential variables from log-file visualization.
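The chance argument above can be made explicit with a short calculation. The Python sketch below assumes, purely for illustration, that the five tests are independent and that each has a 5% false positive rate; neither assumption is established by the analysis, and the function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """Probability of k or more false positives among n independent tests,
    each with false positive rate alpha (binomial upper tail)."""
    return sum(
        comb(n, i) * alpha**i * (1 - alpha) ** (n - i) for i in range(k, n + 1)
    )


n_variables = 5
alpha = 0.05

expected_false_positives = n_variables * alpha
p_two_or_more = prob_at_least(2, n_variables, alpha)

print(expected_false_positives)
print(round(p_two_or_more, 4))
```

Under these assumptions, about a quarter of one variable would be expected to show an association by chance, and the probability that two or more of the five would do so is roughly 2%.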
Alternatives
It is fair to ask whether there are alternatives to VisCareTrails that might have produced the same result. Traditionally, these patterns might be identified using pivot tables and matrix algebra, possibly visualized using scatter plots to identify relationships. It is unlikely that these strategies would have yielded the same two variables, as VisCareTrails was used simultaneously to define the complex and simple categories and to visually determine their magnitude. It is likely, though, that the two characteristics would have been identified once models of relationships were built. However, this is a far more labor-intensive process than using VisCareTrails. Given this experience, we are confident that VisCareTrails allowed for a streamlined data analysis process.
Conclusion
Overall, we are confident that VisCareTrails was useful for uncovering both rare and interesting gameplay patterns, as well as pedagogically relevant traits for geometry learning—the reliance on simple or complex rules to solve geometric problems.
Moreover, this trait is more predictive of learning than many demographic identifiers. Encouragingly, these characteristics are actionable, with the possibility to both identify struggling students and suggest possible remediation. Furthermore, within an online environment, data about players’ demographic characteristics may be unknown and may be unethical or difficult to collect. Tools such as VisCareTrails allow us to efficiently identify patterns within gameplay data that can be used to devise small-scale educational interventions and to improve and iterate learning game design principles. By understanding which types of procedural behaviors are associated with potential learning impediments or non-adaptive learning strategies, learning game designers can improve the game’s ability either to prevent the problem through game design decisions or to introduce adaptive features like personalized level progression and supplemental scaffolding. Although VisCareTrails is chiefly aimed at the identification of interesting patterns in gameplay and analysts currently need to export data into contingency tables to assign numeric values to the patterns, future versions of the software could easily include this functionality. Tools like these are necessary if the promise of technologically aided education is to be realized; such tools will be indispensable both to developing the Learning Sciences and to aiding the adaptivity and personalization enterprise.
Footnotes
Acknowledgements
We thank Dr. Lauro Lins for giving us the VisCareTrails code used in this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We would also like to thank McGraw-Hill Education for providing their generous support in funding this work. Claudio Silva has also been funded by NSF, NASA, DOE, and the Moore-Sloan Data Science Environment at NYU.
