PCG/PCGML evaluations: Introducing panda evaluation using the soft launch

Abstract

This study takes a new perspective on the procedural content generation (PCG) evaluation problem, extracts current PCG evaluation methods from previous works, and presents a novel classification of these methods while showing each method’s capabilities. Also, the present study introduces a novel concept called Panda Evaluation. Additionally, the soft and hard launches were presented as two evaluation methods and possible building blocks of PE. A group of papers was analyzed to understand previous works and find new opportunities. In doing so, some missing PCG evaluation areas were found, and some new methods were proposed for future PCG evaluations. To the best of our knowledge, this is the first time these concepts have been presented in PCG evaluation.

Keywords

Procedural content generation (PCG); platformer; soft launch; panda evaluation (PE); machine learning (ML); evaluation graph

1 Introduction

Digital games are becoming more and more popular worldwide [12]. One of the proposed solutions for more competitive game content creation is “procedural content generation” which can have several benefits in the game development industry, such as decreasing content creation cost, the ability to create infinite content, etc. [1].

Content in PCG research is most of what is contained in a game: levels, textures, quests, stories, game rules, items, vehicles, maps, weapons, characters, etc. [82].

PCG does not refer to player-generated content, whether created online or offline. It is the algorithmic creation of game content with limited or indirect user input [102]. It is a young research topic that has been used not only in much game-related research but also in several practical applications outside games, such as architectural models [24, 78]. Figure 1 shows a brief overview of the most popular keywords in PCG-related studies. The map in Fig. 1 was created from the data retrieved with the first-row search query in Table 2 (all PCG research in the computer science field). This query was run on the Scopus database [79] on 2020/12/25, and the most popular keywords were selected. In this map, keywords that share the same papers are connected.

Fig. 1

Popular keywords in the PCG research field.

Table 2

Search queries used to mine PCG research data from the Scopus database

Data Name	Search Query	Number of Papers
All PCG research in the computer science field	(“Procedural Content Generation”) AND (LIMIT-TO (SUBJAREA, “COMP”))	1396
PCG In Platformer Game	(“Procedural Content Generation”) AND (((“3d platform”) OR (“platform game”) OR (“2d platform”) OR (mario) OR (platformer) OR (spelunky))) AND (LIMIT-TO (SUBJAREA, “COMP”))	476

1.1 Procedural content generation

Procedural content generation is a growing research field that studies the automatic creation of game content using algorithms. Research on content generation techniques should complement research in semi-automatic evaluation of the generated content [39].

1.2 Procedural content generation via machine learning (PCGML)

Machine learning addresses the question of how to build computers that improve automatically through experience [44]. It is an evolving branch of computational algorithms designed to emulate human intelligence by learning from the environment [66]. In the field of artificial intelligence, machine learning has emerged as the method of choice for developing practical software for content generation applications [5, 15] and for several general applications, including natural language processing, speech recognition, robot control, computer vision [44], gesture recognition [3], algorithm optimization [71], etc.

In recent years, machine learning has been used increasingly in PCG, an approach called procedural content generation via machine learning (PCGML). PCGML is suited for repair, critique, and content analysis because it focuses on modeling existing content [96].

1.3 Brief comparison of PCG and PCGML

Although several approaches have been proposed in PCG research, much work remains to be done to characterize the quality, learnability, interestingness, utility, even playability, and other elements that may be important for users’ experience.

On the other hand, the challenges in machine learning approaches sometimes differ from those in other PCG methods. For example, playability cannot be guaranteed easily in machine learning approaches for procedural generation of 2D platformer games [53], and some studies could not guarantee playability [31–35, 76].

However, some other studies do guarantee playability, such as [103], in which an agent was used to check the playability of the content generated by the introduced PCGML method, or [90], in which playability constraints were added to the core sampling algorithm. Furthermore, in some PCGML research, playability is a value to be optimized or a constraint to be satisfied [92].

It is rare for current PCG approaches based on machine learning to test the system results with actual players to see whether their methods’ outputs are usable in game development (more details in Section 3, Fig. 5).

Fig. 5

Case study methods and their evaluation methods. Notably, five methods did not use any kind of evaluation method.

On the other hand, unlike search or grammar-based PCG, PCGML does not require hand authoring of original content or rules [32]. Instead, PCGML is typically framed as the task of fitting a generative model to full-scale examples [46] and relies on existing content and black-box models, which can be challenging to tune or tweak without expert knowledge. This is especially problematic when a human designer needs to understand how to manipulate their data or models to achieve the desired results [33].

Table 1 presents a brief comparison between PCGML and other PCG methods. It should be noted that PCGML algorithms also need labeled data in the case of supervised learning methods [96].

Table 1

A brief comparison between PCG and PCGML

	PCG	PCGML
Need a machine learning expert		√ [48, 33]
Need learning		√ [72, 46]
Need data set		√ [72, 46]
Require hand authoring of initial content or rules	√ [33, 46]
Usually guarantees playability	√ [89, 9]
Mixed authorship	√ [47, 55]	√ [33, 32]
Stay on a specific style	√ [19, 20]	√ [32, 34]
Generate adaptive content	√ [40]	√ [57]

1.4 What is a platformer game?

Platformer games are a video game genre and a subgenre of action games. In a platformer, the player-controlled character must jump and climb between suspended platforms while avoiding obstacles. Environments often feature uneven terrain of varying heights that must be traversed. The player often controls the height and distance of jumps to prevent the character from falling to its death or missing significant jumps [86]. Famous examples of platformer games are Super Mario Bros [26], Mega Man [68], Donkey Kong Country [56], Kirby’s Adventure [37], Kid Icarus [67], and Sonic [80].

Selecting the platformer genre as the case study enables us to consider different evaluation methods in these papers, because the platformer genre is one of the most deeply studied genres in terms of evaluation.

1.5 What has been done in this paper?

In this paper, a brief history of PCG evaluations is presented, and then popular evaluation methods are presented and analyzed, focusing on each method’s pros and cons. What each evaluation method can and cannot do is also reviewed briefly. Our main purpose here is to find the missing parts of PCG evaluation in current studies, introduce a novel discipline of evaluation called “Panda Evaluation,” and redefine soft launch and hard launch as critical tools in the PCG evaluation area.

Throughout this research, several research papers and their evaluation methods are analyzed and compared. In the end, two algorithms, PE and soft launch, are presented using flowcharts.

Also, it is suggested to use analytics tools or questionnaires in the hard launch, which are costly but very powerful in demonstrating the actual performance of almost any PCG. It is concluded that each of the presented evaluation methods can complete a piece of the evaluation puzzle. The Recommended Panda Evaluation (RPE) is presented to minimize evaluation costs while maximizing evaluation quality and performance.

2 Case study papers

To study current PCG methods and choose case study papers, the following steps were taken. First, a brief analysis of previous PCG methods and their evaluation was provided, focusing on platformer games. Second, the platformer genre was chosen as a case study, enabling us to consider different evaluation methods in these papers.

2.1 Criteria for choosing case study papers

The chosen case study papers are those that introduce a complete PCG method in the platformer game genre or use their proposed method within a produced platformer game. This makes it possible to analyze their evaluation methods and compare them with others.

Papers that did not use their new PCG method to produce different levels of a platformer game were therefore not chosen as case study papers.

Choosing PCG for the platformer genre as a case study

In the present study, PCG for the platformer genre was chosen as the case study for studying PCG evaluation methods. Platformers are one of the oldest yet most popular digital game genres [18, 65]. Platformers are very important in PCG research, and the gameplay level is the most popular form of the genre’s content to be generated procedurally [7]. Furthermore, in the present study, a data search was carried out on the Scopus database to check the popularity of the platformer game genre in recent PCG studies. The search queries are presented in Table 2, and the analysis of the results is shown in Fig. 2. As shown in Fig. 2, more than one-third of all PCG research (including non-game research) is in the field of PCG for platformer games.

Fig. 2

This pie chart shows research in PCG for platformer games, versus all the research in the PCG area.
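As a quick check of the one-third figure, the share can be recomputed from the paper counts reported in Table 2. The variable names below are illustrative only; the two counts are the ones listed in the table.

```python
# Paper counts reported in Table 2 (Scopus, retrieved on 2020/12/25).
all_pcg_papers = 1396          # "All PCG research in the computer science field"
platformer_pcg_papers = 476    # "PCG In Platformer Game"

# Share of PCG research that concerns platformer games.
platformer_share = platformer_pcg_papers / all_pcg_papers

print(f"Platformer share of PCG research: {platformer_share:.1%}")
print("More than one-third:", platformer_share > 1 / 3)
```

The ratio is slightly above one-third, which is consistent with the statement in the text.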

Therefore, the platformer genre is one of the most deeply studied contexts compared to other game genres. Some tools are currently available only in PCG for platformers, for example, level metrics for measuring the production capabilities of a PCG method, which are often used only for platformer games.

2.2 Case study papers overview

Table 3 shows the list of research papers collected here as a case study of PCG evaluation methods. Each row of the table contains a unique research paper, its year of publication, its authors, its number of citations based on Google Scholar [30], and the most prominent (bold) method used in the paper.

Table 3
List of case study research papers for PCG evaluation methods, containing publication date, the number of citations based on Google Scholar reports, and the author’s name

	Title	Ref.	Authors	Year	Citation	Type	Bold Method
1	Launchpad - A Rhythm-Based Level Generator for 2-D Platformers	[89]	Smith et al.	2011	86	PCG	Design heuristic
2	Procedural level generation using occupancy-regulated extension	[59]	Mawhorter and Mateas	2010	64	PCG	Geometry Assembly
3	Evolving levels for Super Mario Bros. using grammatical evolution.	[81]	Shaker, et al.	2012	116	PCG	Design heuristic / Evolution
4	A procedural procedural level generator	[47]	Kerssemakers et al.	2012	43	PCG	Evolution
5	Procedural content generation for platformers - designing and testing FUN PLEdGE	[73]	Mazza et al.	2017	10	PCG	Mixed-Authorship
6	Tanagra: Reactive planning and constraint solving for mixed-initiative level design	[88]	Smith et al.	2011	120	PCG	Reactive Planning / Constraint Solving
7	Patterns and procedural content generation: revisiting Mario in world 1 level	[21]	Dahlskog and Togelius	2012	43	PCG	Design patterns
8	A Procedural Method for Automatic Generation of Spelunky Levels	[9]	Baghdadi et al.	2015	9	PCG	Evolution / Search Based
9	Procedural generation of collaborative puzzle-platform game levels	[6]	Arkel et al.	2015	1	PCG	Grammar Based / Design Pattern
10	Leveraging Multi-Layer Level Representations for Puzzle-Platformer Level Generation	[91]	Snodgrass and Ontanon	2017	0	PCGML	Markov Model
11	Adaptable game experience based on player’s performance and EEG	[40]	Mikami et al.	2017	3	PCG	Rhythm-Group
12	The Evolution of Fun - Automatic Level Design Through Challenge Modeling	[93]	Sorenson and Pasquier	2010	44	PCG	Evolution
13	Towards automatic personalized content generation for platform games	[85]	Shaker et al.	2010	204	PCGML	Preference Learning
14	Computational Intelligence-based Entertaining Level Generation for Platform Games	[36]	Zahid Halim et al.	2015	0	PCG	Design heuristic / Evolution
15	Patterns as objectives for level generation	[19]	Dahlskog and Togelius	2013	24	PCG	Evolution
16	Procedural content generation using patterns as objectives	[20]	Dahlskog and Togelius	2014	23	PCG	Evolution
17	A Multi-level Level Generator	[22]	Dahlskog and Togelius	2014	30	PCG	Evolution
18	Linear levels through n-grams	[23]	Dahlskog et al.	2014	38	PCG	Grammar Based
19	Multi-population genetic algorithm for procedural generation of levels for platform games	[27]	Ferreira et al.	2014	12	PCG	Evolution
20	Evolving Mario levels in the latent space of a deep convolutional generative adversarial network	[103]	Volz et al.	2018	19	PCGML	Generative Adversarial Network
21	Combinatorial creativity for procedural content generation via machine learning	[31]	Guzdial and Riedl	2018	4	PCGML	Combinatorial Creativity
22	Procedural level generation using multi-layer level representations with MdMCs	[92]	Snodgrass and Ontanon	2017	2	PCGML	Markov Model
24	A genetic approach in procedural content generation for platformer games level creation	[64]	B. Moghadam and K. Rafsanjani	2017	4	PCG	Evolution
25	Modeling perceived difficulty in game levels	[105]	Wheat et al.	2016	3	PCGML	Decision Tree / Random Forest
26	Player movement models for platformer game level generation	[90]	Snodgrass and Ontanon	2017	2	PCGML	Markov Model
27	Explainable PCGML via Game Design Patterns	[33]	Guzdial et al.	2018	4	PCGML	Explainable AI
28	Blending Levels from Different Games using LSTMs	[76]	Sarkar and Cooper	2018	1	PCGML	LSTM Neural Network
29	Automated Game Design via Conceptual Expansion	[35]	Guzdial and Riedl	2018	1	PCGML	Conceptual expansion
30	Generating non-monotone 2D platform levels and predicting difficulty	[51]	Koens	2016	2	PCG	Search Based
31	Friend, Collaborator, Student, Manager: How Design of an AI-Driven Game Level Editor Affects Creators	[32]	Guzdial et al.	2019	4	PCG	Markov Model / Explainable AI / LSTM Neural Network
32	Game level generation from gameplay videos	[34]	Guzdial and Riedl	2016	29	PCG	Bayes Network
33	Intentional Computational Level Design	[49]	Khalifa et al.	2019	3	PCG	Evolution

2.3 Super Mario Bros versus other platformers

Platformer games are one of the oldest game genres, and many different platformer games have been released. However, Super Mario Bros [26] has received special attention, and most PCG studies on platformers focus on content generation for Super Mario. The reason for this attention may be the Mario AI Championship [83] or the fun nature of Super Mario. Figure 3 shows the popularity of Super Mario, as a percentage, versus other platformer games. It should be mentioned that most of these studies used Super Mario as an example or a base to model their methods; this does not necessarily mean that their algorithms will not work for other platformers. Besides Mario, the other platformer games used in the case study papers are Kid Icarus [67], Mega Man [68], Kirby’s Adventure [37], and Spelunky [25, 101] (initially released in 2008; its sequel, Spelunky 2, was released in 2020 [38]).

Fig. 3

The percent popularity of Super Mario Bros versus other platformer games in the present case study papers.

3 Content generator evaluation

Evaluation is necessary to measure PCG system functionality for several main reasons [82]:

To understand each PCG method’s capabilities better. It is difficult to understand the capabilities of a content generator by inspecting every single instance of its output.

To confirm that the quality of the generated content can be guaranteed. If the content that we want to produce must have certain characteristics, it is crucial to be able to evaluate whether the generated content has the desired quality.

To iterate on the generator more easily by checking whether its output matches the programmers’ intent. As with any creative endeavor, the process of creating a procedural content generator involves reflection, iteration, and evaluation.

To compare PCG methods with each other, despite their different approaches. As the community of researchers working on procedural content generators continues to grow, it is essential to understand how much progress is being made relative to the current state of the art.

3.1 Current evaluation methods

The PCG research community has introduced different evaluation methods. Below is a list of the most well-known evaluation methods in the PCG research area, extracted from the case study papers and previous works in the PCG area. Note that two new methods are also presented in the present study, which will be discussed in Section 5.

Player Questionnaire: one of the most obvious approaches to content evaluation is to ask the players about it explicitly. A game user study may involve a small number of dedicated players who play through varying amounts of content (e.g., [73]) or a crowd-sourced approach that can provide sufficient data to machine-learn content (e.g., [16, 84]).

The case study papers that used a player questionnaire for their evaluation are [19, 105].

Expert Questionnaire: in many research fields, expert reviews are frequently used as a questionnaire evaluation method [70]. PCG systems are no exception. Expert players or expert game designers can give valuable information about a game produced by a procedural content generator. Experts can see the game from a professional viewpoint and identify shortcomings of the game that an intermediate player may never notice [73].

The case study papers that used an expert questionnaire for their PCG evaluation method are [73].

Expressive Range: expressive range refers to the space of possible levels that the generator can create, including how biased it is towards generating particular kinds of content in that space [87]. This evaluation is performed by choosing metrics based on which the content can be evaluated and using those metrics as axes to define the space of possible content. Many content pieces are then generated and evaluated according to the defined metrics and usually plotted in a heatmap [82].

The case study papers that used an expressive range for their evaluation are [9, 89].

Learning Ability: the entertainment value of the evolved games can be verified using Schmidhuber’s artificial curiosity theory to see how quickly a player learns to play an evolved game [77]. Games that are learned very quickly will be trivial for the player and thus not entertaining. Those that take a long time to learn will be too complex for the player and will offer little or no entertainment. Games between these two extremes will fall in the range of entertaining ones [36].

The case study papers that used a learning ability for their PCG evaluation method are [36].
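The learning-ability criterion described above can be illustrated with a minimal sketch. It assumes that learning speed is summarized as a number of training episodes and that two cut-off values separate the three bands; the function name, the parameter names, and the threshold values are hypothetical and are not taken from [36] or [77].

```python
def entertainment_band(episodes_to_learn, too_fast=10, too_slow=200):
    """Classify a game by how long a player (or learning agent) needs to learn it.

    The thresholds are hypothetical placeholders, not values from the literature.
    """
    if episodes_to_learn < too_fast:
        return "trivial"        # learned very quickly, so not entertaining
    if episodes_to_learn > too_slow:
        return "too complex"    # takes too long to learn, little entertainment
    return "entertaining"       # between the two extremes


for episodes in (3, 50, 500):
    print(episodes, "->", entertainment_band(episodes))
```

In practice, the two thresholds would have to be calibrated for the game and the player population under study.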

Algorithm Validation: this criterion applies to papers that show that the algorithms they use work well enough. An example is [64], a dynamic difficulty evolution-based method, in which the authors show that their implemented genetic algorithms can find rhythms with a target difficulty. This process is certainly an evaluation, but not an evaluation of the whole method. This kind of evaluation is counted as “Algorithm Validation.”

The case study papers that used algorithm validation for their evaluation are [32, 103].

Algorithm Comparison: if the presented method compares its algorithm output with that of others, it is counted under the algorithm comparison criterion. Papers that compare their algorithm output under different input parameters are also counted in this category.

The case study papers that used algorithm comparison for their evaluation are [23, 105].

Agent (Robot): an agent is not an evaluation method in itself but a tool that helps implement some evaluation methods. For example, one can use an agent to test the playability of a generated level.

The case study papers that used an agent for their evaluation are [32, 93].
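As an illustration of agent-based playability testing, the following is a minimal sketch under strong simplifying assumptions; it is not the agent used in any of the cited papers. The level is a hypothetical tile grid ('#' solid, 'S' start, 'G' goal), and the movement model only bounds how far the agent can jump horizontally and upward, ignoring collisions along the jump arc.

```python
from collections import deque


def standable(level, row, col):
    """A cell is standable if it is not solid and has solid ground directly below."""
    if not (0 <= row < len(level) - 1 and 0 <= col < len(level[0])):
        return False
    return level[row][col] != "#" and level[row + 1][col] == "#"


def find(level, symbol):
    """Return the (row, col) position of the first occurrence of a symbol."""
    for row, line in enumerate(level):
        col = line.find(symbol)
        if col != -1:
            return row, col
    raise ValueError(f"symbol {symbol!r} not found in level")


def is_playable(level, max_jump_up=2, max_jump_across=3):
    """Breadth-first search from 'S' to 'G' over standable cells.

    Hypothetical movement model: one move reaches any standable cell at most
    max_jump_across columns away and at most max_jump_up rows higher
    (falling any distance is allowed).
    """
    start, goal = find(level, "S"), find(level, "G")
    queue, seen = deque([start]), {start}
    while queue:
        row, col = queue.popleft()
        if (row, col) == goal:
            return True
        for d_col in range(-max_jump_across, max_jump_across + 1):
            for new_row in range(max(0, row - max_jump_up), len(level) - 1):
                nxt = (new_row, col + d_col)
                if nxt not in seen and standable(level, *nxt):
                    seen.add(nxt)
                    queue.append(nxt)
    return False


narrow_gap = ["..........",
              "S........G",
              "####..####"]
wide_gap = ["............",
            "S..........G",
            "###......###"]
print(is_playable(narrow_gap), is_playable(wide_gap))
```

A real agent would use the game's actual physics; the point of the sketch is only that reachability of the goal can be checked automatically for every generated level.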

Player Sensor: sensors can measure physical parameters, such as pressure or light, and convert the measurements into readable signals [104]. They can also measure human emotional signals, such as stress [75]. Proper definitions of emotional states [58] and methods for detecting the human emotional state using sensors [43] have been developed.

For example, facial analysis can be used to detect players’ emotions [17], and some methods use only facial analysis and computer vision techniques to detect emotions [98, 100]. Facial analysis is also used in games, for example, for correlating dimensions of experience [99] and enhancing online games [106–108]. On the other hand, signals such as HR (heart rate) and HRV (heart rate variability) are used to detect stress remotely [13, 61–63]. In most cases, however, subjects are instructed to stay still [74], which improves the accuracy of the estimations. Such behavioral constraints affect the interaction between players and games, making the experience unnatural.

A sensor is a device that converts one type of energy to another [4]; more generally, it is a device that converts any objective quantity to be measured into a signal that can be interpreted [10]. In computer software, a sensor can also be non-physical. There have been studies on non-physical sensors and their use in detecting players’ emotions, such as stress or boredom [11].

The case study papers that used a player sensor for their PCG evaluation method are [40].

In-Game Rating: this is another method of evaluating user satisfaction. It may be defined as a questionnaire-based evaluation method embedded in the game itself.

The case study papers that used in-game ratings for their evaluation are [73, 105].

3.2 Level metrics and expressive range

It is essential to find a way to characterize the content generator’s performance in the context of game design concerns to make informed decisions about which content generation method would be best suited for a particular type of content generation problem. A promising approach takes the form of metrics applied to the generator’s output and is used to characterize the generator’s expressive range.

3.2.1 Current level metrics

One weakness of current PCG evaluation is the lack of attention to the capabilities of the developed PCG method. Expressive range [89] is an excellent example of showing the production capabilities of a PCG method in terms of game design level metrics, such as linearity and leniency. Unfortunately, no standard definition is currently available for these concepts. On the other hand, a level has many properties, and some works consider more factors besides leniency and linearity when showing the production capabilities of a PCG method. The currently popular metrics used to evaluate levels apply only to 2D platforming levels, specifically those created in the Mario AI framework [45]. These metrics are grounded in design theory rather than being extracted from interviews with level designers or a critical design process [14]. In this section, an overview of the level metrics available in the platformer game genre is provided. Figure 4 shows an overview of the introduced level metrics and their progression through time and different studies.

Fig. 4

Level metrics progression from 2011.

Also, there are several different metrics, but only a few are popular in the research community. Here, five different metrics are chosen as the most popular ones. This choice is based on previous research and its use of these metrics. A detailed description of these metrics for platformer games is presented in Table 4, where each column corresponds to a different study and its perspective. Note that [95] is not included in this table due to its lack of popularity compared to the other four papers.

Expressive range: Evaluating each metric independently of the others does not indicate any relationship between them and will not show the shape of a generator’s expressive range. Viewing all metrics at a high level can show that a particular generator might be predisposed to create levels in an intermediate range of linearity and density. However, it would not show any correlation between those metrics across levels [41]. Visualizing the expressive range of generators makes it easy to see such biases [89]. This visualization involves plotting a heatmap histogram where each axis represents a metric, and a color is assigned to each bucket in the histogram based on how many levels fall into that bucket. The original definition of the expressive range comes from the work of Smith and Whitehead (2010) [87]: “With metrics allowing us to compare produced levels, we can describe the level generator’s expressive range by generating several levels and ranking them by their linearity and leniency scores.” Newer research extended this definition using other metrics (Table 4). In the present paper, the expressive range is not limited to linearity and leniency; rather, any metrics can be used in calculating the expressive range. A list of currently popular level metrics is presented in Table 4.
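The heatmap histogram described above can be sketched as a simple two-dimensional binning step. The sketch assumes that both metrics have already been normalized to the range [0, 1]; the function name, the number of bins, and the sample scores are hypothetical and only serve to show the bucketing.

```python
def expressive_range(scores, bins=4):
    """Count levels per bucket of a bins x bins grid.

    scores: iterable of (linearity, leniency) pairs, each assumed to lie in [0, 1].
    Returns grid[row][col], where col indexes linearity and row indexes leniency.
    """
    grid = [[0] * bins for _ in range(bins)]
    for linearity, leniency in scores:
        col = min(int(linearity * bins), bins - 1)  # clamp a score of 1.0 into the last bucket
        row = min(int(leniency * bins), bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical metric scores for six generated levels.
sample_scores = [(0.10, 0.20), (0.15, 0.10), (0.55, 0.60),
                 (0.60, 0.70), (0.65, 0.55), (1.00, 0.90)]
grid = expressive_range(sample_scores, bins=4)

# Text rendering of the heatmap: highest leniency bucket on the top row.
for row in reversed(grid):
    print(" ".join(str(count) for count in row))
```

In an actual study, many more levels would be generated, and the counts would be mapped to colors; a generator biased toward one region of the space shows up as a cluster of high counts.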

Table 4

Popular level metrics for platformer games

	Canossa and Smith [14]	Horn et al. [41]	Shaker et al. [81]	Smith et al. [89]
Linearity	Linearity is measured by taking the walkable space of a 2D level and trying to fit it to a line, so the vertical flow of the player's progress through the level is its linearity. Low linearity therefore involves many ups and downs in the level, such as hills and pits. Linearity shows how the player experiences a level in terms of his/her movement through it.	Linearity is R², the goodness-of-fit measure for a line fitted to the endpoints of each level platform. So, levels with more height variations have low linearity.	Linearity is affected by the height of the hills along the level and by the variations in platform height.	Linearity measures the “profile” of a level, which is a more aesthetic quality that the player will experience while playing the level.
Leniency	Leniency is designed to measure the difficulty of game levels, and it is based on the components used in the level. Each element in the level is assigned a leniency score; for instance, a wall may have a leniency of 1 if it cannot harm the player at all. Alternatively, a valley may have a leniency of zero.	Leniency aims to capture how difficult a level is for a player. It can be calculated by finding the points in the level at which an action by the player is needed and then determining how lenient each challenge would be.	Leniency is how easy it is for the player to complete the level.	Leniency describes how forgiving the level is likely to be to a player.
Density	Density measures the average amount of content in a vertical slice of a level. For instance, a vertical slice that contains three walls and a powerup is denser than a vertical slice of the same size with only one enemy. Density approximates the number of available paths through a level, as well as the amount of its visual clutter.	Density is a measure of how many game objects are on top of each other. The density calculator assigns a density value to each position.	Hills of different heights can be on top of each other, allowing the player to reach higher places and introducing new patterns in the level design. Density measures the number of occurrences of density chunks.	Not Mentioned.
Pattern Variation	Pattern variation measures the number of different kinds of typical Mario-like level constructs that exist in a game level. This metric also shows how similar a generated level is to an original, human-designed Mario level.	Unique occurrences of patterns are counted; different uses of meso-patterns score higher than repetitive ones.	Not Mentioned.	Not Mentioned.
Pattern Density	Pattern density measures how much of a level can be explained by Mario’s design patterns. A level that has many patterns in it has a higher pattern density value.	The number of meso-patterns of original Super Mario Bros in the current level is counted.	Not Mentioned.	Not Mentioned.
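
The first three metrics in the table above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used by any of the cited papers: a level is assumed to be given as a list of platforms `(x_start, x_end, height)`, leniency is assumed to be the mean of per-element scores, and the `enemy` score is a hypothetical placeholder (only the wall and valley values come from the table).

```python
from statistics import mean

# Per-element leniency scores. The wall and valley values follow the table
# above; the enemy value is a hypothetical placeholder for illustration.
LENIENCY_SCORES = {"wall": 1.0, "valley": 0.0, "enemy": 0.5}


def linearity(platforms):
    """R^2 of a least-squares line fitted to the platform endpoints.

    platforms: list of (x_start, x_end, height) tuples (assumed format).
    """
    points = []
    for x_start, x_end, height in platforms:
        points.append((x_start, height))
        points.append((x_end, height))
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        return 1.0  # perfectly flat level: the line fits exactly
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    if ss_xx == 0:
        return 0.0  # all endpoints share one x value: no line can be fitted
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / ss_xx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    return 1.0 - ss_res / ss_tot


def leniency(elements):
    """Mean leniency score of the level elements (averaging is an assumption)."""
    return mean(LENIENCY_SCORES[element] for element in elements)


def density(slices):
    """Average number of game objects per vertical slice of the level."""
    return mean(len(objects) for objects in slices)
```

With this sketch, a staircase such as `[(0, 2, 0), (2, 4, 1), (4, 6, 2)]` scores a higher linearity than a level that goes up and down again, such as `[(0, 2, 0), (2, 4, 3), (4, 6, 0)]`, which matches the description that height variations lower linearity.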

3.2.2 The dark side of using level metrics

Level metrics hide many subtle pitfalls that can make the evaluation difficult. For example, when studying the effect of enemy spread on level difficulty, using Super Mario enemies as a sample can lead to false results. One of the enemies in the Super Mario game is the “Koopa Troopa” [52], which may flee or retreat inside its shell, and the shell can usually be used as a weapon [69]. So, a Koopa is an enemy that can also be used to defeat other enemies and can be seen as a recurring weapon that moves horizontally. The research in [95], an influential study on level metrics in Super Mario and platformer games, studied the effect of such level metrics on level difficulty.

3.3 Discussion on the evaluation methods of the case study papers

Figures 5 and 6 present the case study papers and their evaluation methods. Figure 5 shows the exact method and name of each evaluation method, while Fig. 6 shows how many times each evaluation method is used in the case study papers.

Fig. 6

Evaluation methods and their usage in case study papers. The darker color corresponds to PCGML, and the lighter color corresponds to PCG without ML.

According to the figures, some notable points can be explained. These points will be used to find where the PCG research community has worked and which areas still need the community’s attention and progress.

3.3.1 Lack of different evaluations in PCGML methods

All the case study papers that used PCGML included some evaluation, but most of them used only algorithm comparison/validation. This means that many of these methods never tested the quality of their output from the player’s perspective. Comparing and validating algorithms may help prove the algorithm’s performance: it tells us that the algorithm does what it is meant to do, but it never shows whether the output is suitable for end-users. For example, one may implement an algorithm that places game objects in the level space at random. The error of this algorithm is most likely zero because it is entirely random, as it is meant to be. However, this does not mean that players would like to play the resulting level.

3.3.2 Some methods have not evaluated their implemented method at all

It is worth mentioning that five papers [6, 59] on PCG methods never tested their output. This means that readers have no way to measure these papers’ methods.

3.3.3 Some methods lack expressive range evaluation

Most of the case study methods used at least one type of evaluation. However, as shown in Fig. 5, only some methods used expressive range evaluation [9, 89], which currently seems necessary because the expressive range is the only way of showing how the outputs of the implemented algorithm differ from one another. So, we strongly suggest that every PCG study use the expressive range as part of its evaluation. Otherwise, the usability of its method may not be trusted. Unfortunately, expressive range calculation is currently prevalent only in PCG research for platformers.
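
As a rough illustration of what an expressive range calculation involves, the sketch below bins the metric values of many generated levels into a two-dimensional histogram and reports how much of the metric space the generator reaches. It assumes that each level has already been reduced to a `(linearity, leniency)` pair normalized to [0, 1]; the bin count and the `coverage` summary are illustrative choices, not part of any cited method.

```python
from collections import Counter


def expressive_range(samples, bins=10):
    """Bin (linearity, leniency) pairs into a 2D histogram.

    samples: iterable of (linearity, leniency) pairs, both assumed in [0, 1].
    Returns a Counter mapping (column, row) cells to the number of levels.
    """
    histogram = Counter()
    for lin, leni in samples:
        # Clamp the upper edge so that a value of exactly 1.0 falls in the last bin.
        column = min(int(lin * bins), bins - 1)
        row = min(int(leni * bins), bins - 1)
        histogram[(column, row)] += 1
    return histogram


def coverage(histogram, bins=10):
    """Fraction of histogram cells that contain at least one generated level."""
    return len(histogram) / (bins * bins)
```

A generator whose outputs all fall into a handful of cells has a narrow expressive range, whatever the quality of the individual levels.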

3.3.4 Some methods used only expressive range as evaluation

Some methods use only the expressive range as their evaluation [9, 89]. The expressive range is significant, but it does not show players’ opinions about the created levels. For example, it is possible to develop a random algorithm with an acceptable expressive range, but this does not mean that a randomly generated level is acceptable. Indeed, the expressive range can measure only one aspect of Panda Evaluation, which will be discussed in Section 5.

4 Soft launch and hard launch

Soft launch and hard launch are two methods that are new to the PCG research community. Even though they have a long history in digital software production, to the best of our knowledge, this is the first time they have been suggested for PCG method evaluation.

In product development, one can adopt two different approaches when it finally comes to taking a product to market: soft launch or hard launch.

Soft launch: the researcher takes a limited approach to the launch (D1), which can be limited by geography, by technical specifications or, most likely, by selecting customers (generally limited to very good, highly trusted beta testers). Figure 7 presents a flowchart of our recommended soft launch algorithm. It should be noted that this is not the only way to do a soft launch. Several online services help researchers find people who are ready to test systems and report their experiences and ideas. Some examples are Amazon Turk [2], Ipsos I-Say [42], and Swagbucks [97].

Fig. 7

Flowchart of our recommended soft launch algorithm.

Definition 1 (D1): Soft Launch is a limited release of a new product, service, or app to a limited audience before the public.

Definition 2 (D2): Hard launch is the general release of a new product, service, or app to the target audience worldwide with full functionality on the first day [94]. However, in this research, the hard launch is defined by D3.

The idea of a hard launch as an evaluation method is to remove users’ biases, thresholds, and background knowledge about the research and the game, and to find the real value of the developed PCG method by releasing it in the real world with real users.

Definition 3 (D3): Hard Launch is releasing the game (or method output) to as many real users as possible, who may or may not know anything about the research.

Soft and hard launches are not recommended to be used alone. Launching without tracking the users’ actions may not be a wise decision; it is necessary to collect users’ opinions, monitor their activities, and analyze these data. Users’ actions can be tracked with third-party frameworks integrated while programming the target game, such as Google Analytics [29] or Game Analytics [28]. In addition, an “in-game questionnaire” can be used to gather users’ ideas and feelings about the methods being evaluated.
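
To make the idea of launch tracking concrete, the following sketch records gameplay events and in-game questionnaire answers for one play session and aggregates them across sessions. It is a hypothetical, self-contained logger written for illustration only; `SessionTracker`, its methods, and the event names are assumptions and do not reproduce the API of Google Analytics, Game Analytics, or any other framework.

```python
import json
import time
from collections import Counter


class SessionTracker:
    """Hypothetical in-memory tracker for one player's session."""

    def __init__(self, player_id, clock=time.time):
        self.player_id = player_id
        self._clock = clock  # injectable so that sessions can be replayed
        self.events = []

    def track(self, name, **details):
        """Record a gameplay event, e.g. a death or a coin purchase."""
        self.events.append({
            "player": self.player_id,
            "time": self._clock(),
            "event": name,
            "details": details,
        })

    def answer(self, question, score):
        """Record one in-game questionnaire answer as an ordinary event."""
        self.track("questionnaire", question=question, score=score)

    def to_json_lines(self):
        """Serialize the session, one JSON object per line, for later analysis."""
        return "\n".join(json.dumps(event, sort_keys=True) for event in self.events)


def count_events(trackers):
    """Count how often each event type occurred across all sessions."""
    return Counter(event["event"] for t in trackers for event in t.events)
```

Storing questionnaire answers in the same stream as gameplay events keeps a player's stated opinion next to what that player actually did in the level.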

5 Panda evaluation of PCG methods

Throughout this paper, some missing points of current evaluations were presented. Also, a group of papers was analyzed with respect to their evaluation methods.

In this section, each of the current evaluation methods is first studied with respect to its ability to measure specific dimensions of the target PCG method. This ability is referred to as “Method Usage,” as defined in D5, since each evaluation method can measure one or more specific dimensions of a PCG method.

In the current paper, these dimensions are called “Evaluation Metrics,” as defined in D4. Of course, this is different from the definition provided for level metrics in Section 3.2.

Definition 4 (D4): Every PCG evaluation method can evaluate some aspects (dimensions), each of which is referred to as an “Evaluation Metric.”

Definition 5 (D5): “Method Usage” is the number of “evaluation metrics” in an evaluation method. It is claimed that an ideal evaluation would contain several different evaluation methods to cover the maximum possible number of evaluation metrics; here, such an evaluation is called Panda Evaluation.

Definition 6 (D6): Panda Evaluation, as defined in D5, is an ordered series of evaluation methods covering the maximum possible different evaluation metrics.

PE is a concept that tries to standardize PCG evaluation by defining several steps for each PCG evaluation. Of course, the focus here is on PCG for platformers, as it is one of the most studied areas of PCG for games in terms of evaluation. However, the same PE concept can be applied to other PCG areas in games and even to PCG in general.

5.1 Discussion on panda evaluation

To define PE, it is first necessary to find out which metrics an ideal evaluation should measure in every PCG algorithm for games. Table 5 presents a suggested list of these metrics. The following steps were taken to generate the data in Table 5:

Table 5
List of introduced evaluation metrics. Each evaluation method measures at least one of the mentioned metrics

PCG Metric	Description
Levels Variation	The method can generate enough (usually many) different outputs and does not stick to similar ones.
Ideality	The method does the task designed by a designer.
Stylish	The technique that a single designer uses in his/her designs is referred to as a style. A stylish PCG method will produce outputs in the designer’s specified design style.
Beauty	Outputs are visually attractive.
Attractive	The designed level challenges are attractive so that the player wants to continue playing. Being attractive in terms of challenge is different from being attractive visually: a game can be ugly yet challenging for the players.
Playable	All produced outputs are playable. There is no “deadlock” in any situation.
Production speed	Indicates how fast an output is generated by the PCG method. If the production speed is low, the implemented method cannot be used online (while the game is running).
Resource Usage	The amount of resources needed by the implemented method to generate an output.
Adaptive	Indicates how adaptive the method is to each specific player’s needs.
Maximum PCG	Indicates whether the method is procedural enough.

Examining: The case study methods were examined to show that each evaluation method has a (possibly hidden) metric, such that the researchers of the corresponding method believe measuring that metric is enough to show how useful their implemented method is. Putting these metrics together helps find empty areas in the evaluation research, as well as the important ones.

Determining: A number of experts in both PCG and game design were engaged to polish the results of the previous step and to determine the final list of metrics.

5.2 Executing PE

We advise that a full-aspect evaluation examine the implemented method in all the aspects given in Table 5. In this paper, such an evaluation is named Panda Evaluation. To do so, the information in Figs. 8 and 9 is useful. In these two figures, each evaluation method is represented by a diamond node, while each evaluation metric is represented by a trapezoid. Each evaluation method node is connected to the metrics that it can measure. Note that the launch evaluation methods are presented separately in Fig. 9 to simplify the graph. Figure 9 shows the launch methods combined with data analysis or a user questionnaire. Each of these two has different and shared applications.

Fig. 8

This EV Graph shows the application of each of the previous evaluation methods. Note that these are common applications of each method. One may use an agent or other methods in a novel way to evaluate a PCG method.

Fig. 9

The use of launch methods combined with analyzing player data or a questionnaire, as shown in this EV Graph, covers almost all applications presented in this paper, based on the case study methods.

5.2.1 Defining PE based on the FE graph

Here, any graph containing connected evaluation method and metric nodes, such as those in Fig. 8 and Fig. 9, is referred to as an evaluation graph (D7), and a full evaluation graph is referred to as an FEG (D8). According to the definition of the FEG and the previous definition of PE, it is now possible to present another definition for PE using the FEG (D9).

Fig. 10

This flowchart shows the Panda Evaluation steps for each PCG method, focusing on PCG for platformer games.

Definition 7 (D7): An Evaluation (EV) Graph is a graph that has two kinds of nodes: Evaluation Methods and Evaluation Metrics. Each evaluation method node is connected to the metric nodes that it can evaluate.

Definition 8 (D8): A Full Evaluation Graph (FEG) is an EV graph containing all evaluation methods and metrics (in the assumed world).

Definition 9 (D9): Every path in the FEG graph that contains all diamond nodes defines a PE.
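
Definitions D5 and D7 to D9 can be illustrated with a small data structure. The sketch below stores an EV graph as a mapping from each evaluation method (diamond node) to the set of metrics it can evaluate. The edges shown are illustrative placeholders chosen for the example, not the edges of Figs. 8 and 9, and the function names are assumptions introduced here. D9 is read literally: an ordered series of methods is a PE if it contains every method node of the graph.

```python
# Evaluation method -> metrics it can measure. The edges are illustrative
# placeholders only and do not reproduce the graphs in Figs. 8 and 9.
EV_GRAPH = {
    "algorithm validation": {"Ideality"},
    "expressive range": {"Levels Variation"},
    "expert questionnaire": {"Stylish", "Maximum PCG"},
    "soft launch": {"Beauty", "Attractive", "Playable"},
}


def method_usage(graph, method):
    """D5: the number of evaluation metrics attached to one evaluation method."""
    return len(graph[method])


def covered_metrics(graph, methods):
    """Union of the metrics reachable from the given method nodes."""
    covered = set()
    for method in methods:
        covered |= graph[method]
    return covered


def is_panda_evaluation(graph, ordered_methods):
    """D9, read literally: the series visits every method node of the graph."""
    return set(ordered_methods) == set(graph)


def missing_metrics(graph, ordered_methods):
    """Metrics of the graph that the given series of methods leaves uncovered."""
    return covered_metrics(graph, graph) - covered_metrics(graph, ordered_methods)
```

Because the check only asks whether all method nodes are present, any ordering of the same methods is also a PE, which is consistent with PE not being unique.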

Based on D9, there are many different PEs, each of which is guaranteed to cover all available metrics. PEs can differ in evaluation quality, cost, performance, or other features. In the next section, we introduce a specific evaluation algorithm that tries to maximize total evaluation performance while minimizing cost.

6 Recommended Panda Evaluation (RPE)

The previous section showed that PE is not unique. According to D9, there are many ways to perform a Panda Evaluation, each with its pros and cons. Figure 10 presents a recommended PE called “Recommended Panda Evaluation,” which considers almost all the presented evaluation methods except agents. Agents are customized for each method and may or may not be necessary for the evaluation. Therefore, they are not included in RPE.

Definition 10 (D10): RPE is a unique series of evaluation methods trying to maximize performance while minimizing evaluation costs.

The RPE flowchart is presented in Fig. 10. A discussion of its steps is given below:

Step 1: Algorithm validation: Validation is the first step of this flowchart because if an algorithm fails to perform the task it was designed for, it is likely not necessary to continue RPE. It would be wise to fail a PCG method that cannot pass the algorithm validation step and to stop its evaluation. If the result of evaluating the PCG algorithm in the validation step is “No,” then the process is finished.

Fail scenario: If an algorithm is designed to produce random numbers, but it turns out that it can only produce a single number, using this algorithm as a random generator is not acceptable, and it will fail algorithm validation.

Step 2: Expressive range: The RPE will calculate the expressive range in this step because if an algorithm does not have a valid expressive range, it may be useless to calculate other evaluations.

Fail scenario: If an algorithm can only produce a few different levels, it may not be useful as a PCG algorithm.

Step 3: Expert questionnaire: The next step of RPE is to take expert opinions about the target PCG algorithm. This evaluation can help find hidden values of the developed PCG method. According to the analysis of case study papers, this is the only way of evaluating the “maximum PCG” metric ever used.

One may also cover this metric by proving that the developed PCG algorithm is procedural enough. Nevertheless, this metric is within the experts’ scope, so a researcher who does this has placed himself/herself in an expert’s position, and the result is still an “expert opinion.” For this reason, gathering expert opinions is critical.

Fail scenario: Imagine a PCG method that can generate many different outputs, but many of them are useless from an expert’s viewpoint.

Step 4: Comparison: The next step of RPE is the comparison. It is recommended to check whether the presented method uses machine learning. If so, it is suggested to compare it with previous related machine learning algorithms. In machine learning approaches, comparing the algorithm output is possible and essential, while in other PCG methods, comparing algorithms is not always an easy task, so comparison is not included in RPE for them. Of course, if a researcher compares methods, his/her research evaluation will be better and more accurate.

Step 5: Soft launch: The next step of RPE is the soft launch. Usually, the soft launch needs more effort and resources than the previous evaluations of RPE, which is why it is among the final steps. It should be done only if the method is worthy enough and has passed the previous evaluations successfully. Especially in commercial projects, managing the budget is an essential task, and failing fast is an important principle: if a method is going to make the whole project fail, it is better that it fails early, consuming a minimum of the project budget and resources [8, 50]. That is why, in RPE, less expensive evaluations are performed first.

In the soft launch phase, the player’s data will be gathered using an automatic tracker or a questionnaire, or both. The benefits of each of these two are presented in Fig. 9.

Step 6: Analyzing: After the soft launch, RPE suggests analyzing the soft launch data to see the implemented method’s pros and cons. This analysis is also useful to make sure the implemented algorithm works properly and to find bugs, if there are any. An acceptable way of finding possible bugs is to let a group of players play the game that implements the method. The soft launch is also useful for player behavior recognition. For example, it can be used to find the places where a player performs a particular action, such as buying or spending in-game coins, the number of players in each section of the level, the player’s path through the level, and the use of guns or other game objects. These behaviors can be measured precisely using the soft launch and a tracking system inside the game. Some examples of analytic tracking tools that can be used inside a game to track player behaviors are [28, 29]; of course, one may develop his/her own tracking framework.

Step 7: Finalize: The next step of RPE is to decide whether the current method needs some improvement before it is released. If so, developers may change the algorithm and restart the RPE process. If no changes are needed, there is an optional step in RPE, in which it is recommended to release the game and do the hard launch.

At this level, RPE is completed, and its results can be published to show the features of the implemented method.
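
The ordering logic of the steps above, cheap evaluations first and stopping as soon as one fails, can be sketched as a small pipeline. This is only an illustration of the fail-fast ordering under stated assumptions: the step names are taken from the text, each check is a placeholder callable supplied by the researcher, every step is treated as a pass/fail gate (the text states this explicitly only for Step 1), and the comparison step is skipped for methods that do not use machine learning.

```python
from typing import Callable, List, Tuple

# A step is a name plus a check that returns True when the method passes.
Step = Tuple[str, Callable[[], bool]]


def run_rpe(steps: List[Step], uses_ml: bool):
    """Run the evaluation steps in order and stop at the first failure.

    Returns (passed, log), where log lists (step name, outcome) pairs.
    """
    log = []
    for name, check in steps:
        if name == "comparison" and not uses_ml:
            # Comparison is only recommended for machine-learning methods.
            log.append((name, "skipped"))
            continue
        passed = check()
        log.append((name, "passed" if passed else "failed"))
        if not passed:
            return False, log  # fail fast: later, costlier steps never run
    return True, log
```

A caller would pass the steps in the order of Fig. 10, for example `[("algorithm validation", validate), ("expressive range", check_range), ...]`, where `validate` and `check_range` are hypothetical functions wrapping the actual evaluations.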

References

1. Amato, Procedural content generation in the game industry, in: Game Dynamics, O. Korn and N. Lee, eds., Springer, 2017, pp. 15–25.

2. Amazon Turk, Amazon. (2020).

3. Amuer and Ben, LeapGestureDB: A public leap motion database applied for dynamic hand gesture recognition in surgical procedures, In: V. Balas, L. Jain, M. Balas (eds.) Soft Computing Applications. SOFA 2018. Advances in Intelligent Systems and Computing 1222 (2021), 125–138, Springer, doi.org/10.1007/978-3-030-52190-5_9.

4. Anju Latha, Rama Murthy and Kumar K.B., Distance sensing with ultrasonic sensor and arduino, International Journal of Advance Research, Ideas and Innovations in Technology 2 (2016), 1–5.

5. and Liu, Multi-feature recognition of english text based on machine learning, Fuzzy Systems Pre-press (2021), DOI: 10.3233/JIFS-189214.

6. Arkel A.V., Karavolos and Bouwer, Procedural generation of collaborative puzzle-platform game levels, (2015).

7. Atmaja P.W., Parlika and Muttaqin, Generating two-dimensional platformer game levels from storylines, Proceeding of the International Conference Science Technology (ICST 2018), 2018.

8. Babineaux and Krumboltz, Fail fast, fail often: How losing can help you win, 2013.

9. Baghdadi, Eddin F.S., Al-Omari, Alhalawani, Shaker and Shaker, A procedural method for automatic generation of spelunky levels, Proceeding of the European Conference on the Applications of Evolutionary Computation, (2015), pp. 305–317.

10. Bapat P.M., Insights to sensor technology and it’s applications, International Journal of Engineering and Computer Science 5 (2016), 15962–15965.

11. Bevilacqua, Engström and Backlund, Game-calibrated and user-tailored remote detection of stress and boredom in games, Sensors 19 (2019), 28–77.

12. Bottino, Chioccariello, Freina and Tavella, Digital games in primary schools for the development of key transversal skills, in: Sustainable ICT, Education and Learning, A. Tatnall and N. Mavengere, eds., SUZA 2019, pp. 55–65.

13. Bousefsaf, Maaoui and Pruski, Remote assessment of the heart rate variability to detect mental stress, Proceedings of the 7th International Conference on Pervasive Computing Technologies for Healthcare, May 2013, pp. 348–351.

14. Canossa and Smith, Towards a procedural evaluation technique: metrics for level design, Proceeding of the 10th International Conference on the Foundations of Digital Games, (2015), pp. 8.

15. Chen, Wang and Du, Diagnostic Evaluation model of english learning based on machine learning, Journal of Intelligent & Fuzzy Systems Pre-press (2021), DOI: 10.3233/JIFS-189216.

16. Chernova, Orkin and Breazeal, Crowdsourcing HRI through online multiplayer games, Proceeding of the AAAI Fall Symposium Series, 2010.

17. Cohn J.F. and De La Torre, Automated face analysis for affective computing, in: The Oxford Handbook of Affective Computing, R. Calvo, S. D’Mello, J. Gratch, and A. Kappas, eds., Oxford University Press, 2015.

18. Cuevas, Best Nintendo Switch 2D Platformers 2020, Imore Company, (2020), https://www.imore.com/best-nintendo-switch-2d-platformers.

19. Dahlskog and Togelius, Patterns as objectives for level generation, Proceeding of the Second Workshop Design Patterns Games, 2013.

20. Dahlskog and Togelius, Procedural content generation using patterns as objectives, Proceeding of the European Conference Applications of Evolutionary Computation, (2014), pp. 325–336.

21. Dahlskog and Togelius, Patterns and procedural content generation: revisiting Mario in world 1 level 1, Proceeding of the First Workshop Design Patterns Games, (2012), pp. 1.

22. Dahlskog and Togelius, A multi-level generator, Proceeding of the Conference Computation Intelligence and Games, (2014), pp. 1–8.

23. Dahlskog, Togelius and Nelson M.J., Linear levels through n-grams, Proceeding of the 18th International Academic MindTrek Conference: Media Business, Management, Content & Services, (2014), pp. 200–206.

24. Demir and Aliaga D.G., Guided proceduralization: Optimizing geometry processing and grammar extraction for architectural models, Computation Graphics, (2018).

25. Derek Yu, Mossmouth, Spelunky, (2008).

26. Ead, Super Mario Bros, (1985).

27. Ferreira L.N., Pereira and Toledo, A multi-population genetic algorithm for procedural generation of levels for platform games, Proceeding of the Conference on Genetic and Evolutionary Computation, (2014), pp. 45–46.

28. Game Analytics, Game Analytics Ltd., (2011). www.gameanalytics.com

29. Google, Google Analytics, (2005). https://analytics.google.com/analytics/web/

30. Google Scholar, https://scholar.google.com/intl/en/scholar/about.html

31. Guzdial M.J. and Riedl M.O., Combinatorial creativity for procedural content generation via machine learning, Proceeding of the Workshop Thirty-Second AAAI Conference Artificial Intelligence, 2018.

32. Guzdial, Liao, Chen S.-Y., Shah, Reno, Smith and Riedl M.O., Friend, collaborator, student, manager: How design of an ai-driven game level editor affects creators, Proceeding of the CHI Conference Human Factors Computation System, (2019), pp. 624.

33. Guzdial, Reno, Chen, Smith and Riedl, Explainable PCGML via game design patterns, ArXiv Preprint, ArXiv1809.09419. (2018).

34. Guzdial and Riedl, Game level generation from gameplay videos, Proceeding of the Twelfth Artificial Intelligence and Interactive Digital Entertainment Conference, 2016.

35. Guzdial and Riedl, Automated game design via conceptual expansion, Proceeding of the Fourteenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2018.

36. Halim, Baig A.R. and Abbas, A computational intelligence-based entertaining level generation for platform games, International Journal of Computation Intelligence and System 8 (2015), 1128–1143.

37. HAL Laboratory, Kirby’s Adventure, (1993).

38. Harris, Exploring Roguelike games, CRC Press, 2020.

39. Hendrikx, Meijer, Van Der Velden and Iosup, Procedural content generation for games: A survey, ACM Transaction Multimedia, Communication and Computing Application 9 (2013).

40. Heny, Mikami and Kondo, Adaptable Game Experience Based on Player’s Performance and EEG, Proceeding of the Nicograph International (NicoInt), (2017), pp. 1–8.

41. Horn, Dahlskog, Shaker, Smith and Togelius, A Comparative Evaluation of Procedural Level Generators in the Mario AI Framework, Proceeding of the Foundations of Digital Games, (2014), pp. 1–8.

42. Ipsos I-Say, (n.d.). rec.i-say.com.

43. Jerritta, Murugappan, Nagarajan and Wan, Physiological signals based human emotion recognition: A review, Proceeding of the IEEE 7th International Colloquium on Signal Processing and Its Applications, 2011.

44. Jordan M.I. and Mitchell T.M., Machine learning: Trends, perspectives, and prospects, Science 349 (2015), 255–260.

45. Karakovskiy and Togelius, The Mario AI benchmark and competitions, IEEE Transaction on Computational Intelligence and AI in Games 4 (2012), 55–67.

46. Karth and Smith A.M., Addressing the fundamental tension of PCGML with discriminative learning, Proceeding of the 14th International Conference Foundations Digital Games (2019), pp. 1–9.

47. Kerssemakers, Tuxen, Togelius and Yannakakis G.N., A procedural procedural level generator generator, Proceeding of the Conference Computation Intelligence and Games (2012), pp. 335–341.

48. Khadivpour and Guzdial, Explainability via Responsibility, ArXiv preprint, ArXiv 2010.01676 (2020).

49. Khalifa, Green M.C., Barros and Togelius, Intentional computational level design, ArXiv Preprint, ArXiv1904.08972, (2019).

50. Khanna, Guler and Nerkar, Fail often, fail big, and fail fast? Learning from small failures and R&D performance in the pharmaceutical industry, Academy of Management Journal 52 (2016), 436–459.

51. Koens F.E., Generating non-monotone 2D platform levels and predicting difficulty, Master thesis, Utrecht University Repository 2016.

52. Koopa Troopa, Gamicus. (2018). https://gamicus.gamepedia.com/Koopa_Troopa (accessed 9 August 2018).

53. Lee, Partlan and Cooper, Precomputing player movement in platformers for level generation with reachability constraints, Proceeding of the Experimental AI in Games, 2020.

54. , Lee-Urban, Appling D.S. and Riedl M.O., Crowdsourcing narrative intelligence, Advances in Cognitive System 2 (2012).

55. Lopes, Eisemann and Bidarra, Authoring adaptive game world generation, IEEE Transaction on Games 10 (2018), 42–55.

56. Ltd, Donkey Kong Country, (1994).

57. Magerko, Heeter, Fitzgerald and Medler, Intelligent adaptation of digital game-based learning, Proceeding of the Conference on Future Play: Research, Play, Share, (2008), pp. 200–203.

58. Mandryk R.L., Atkins M.S. and Inkpen K.M., A continuous and objective evaluation of emotional experience with interactive play environments, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2006.

59. Mawhorter and Mateas, Procedural level generation using occupancy-regulated extension, Proceeding of the Conference Computation Intelligence and Games, (2010), pp. 351–358.

60. Mazza, Ripamonti L.A., Maggiorini and Gadia, Fun pledge 2.0: a funny platformers levels generator, Proceeding of the 12th Biannual Conference on Italian SIGCHI Chapter, (2017), pp. 22.

61. McDuff, Gontarek and Picard, Remote measurement of cognitive stress via heart rate variability, Proceeding of the 36th Annual International Conference IEEE Engineering in Medicine and Biology Society, (2014), pp. 2957–2960.

62. McDuff, Gontarek and Picard R.W., Improvements in remote cardiopulmonary measurement using a five band digital camera, IEEE Transaction on Biomedical Engineering 61 (2014), 2593–2601.

63. McDuff D.J., Hernandez, Gontarek and Picard R.W., COGCAM: Contact-free measurement of cognitive stress during computer tasks with a digital camera, Proceeding of the Conference Human Factors Computation System, 2016.

64. Moghadam A.B. and Kuchaki Rafsanjani, A genetic approach in procedural content generation for platformer games level creation, Proceeding of the 2nd Conference Swarm Intelligence of Evolutionary Computation, (2017), pp. 141–146.

65. Morton and Park, 2021 games: all the new launches you’ll want to watch for next year, PC Gamer Company, 2020. https://www.pcgamer.com/new-games-2021

66. Naqa I.E. and Murphy M.J., What is machine learning?, Machine Learning in Radiation Oncology, Springer, (2015), pp. 3–11.

67. Nintendo P.S., Tose, Sora Ltd., Nintendo Research & Development 1, Kid Icarus, 1986.

68. Nintendo, Mega Man, (1987).

69. Nintendo, Koopa Troopa, (n.d.), https://play.nintendo.com/themes/friends/koopa-troopa/

70. Olson, An examination of questionnaire evaluation by expert reviewers, Field Methods 22 (2010), 295–318.

71. Pintea C.M., Matei, Ramadan R.A., Pavone, Niazi and Azar A.T., A fuzzy approach of sensitivity for multiple colonies on ant colony optimization, In: V. Balas, L. Jain, M. Balas (eds.) Soft Computing Applications. SOFA 2016. Advances in Intelligent Systems and Computing, 634 (2019), 87–95, Springer, doi.org/10.1007/978-3-319-62524-9_8.

72. Rieder, Using procedural content generation via machine learning as a game mechanic, Austrian Marshall Plan Foundation, (2018).

73. Ripamonti L.A., Mannalà, Gadia and Maggiorini, Procedural content generation for platformers: designing and testing FUN PLEdGE, Multimedia Tools and Applications 76 (2017), 5001–5050.

74. Rouast P.V., Adam M.T.P., Chiong, Cornforth and Lux, Remote heart rate measurement using low-cost RGB face video: a technical literature review, Frontiers of Computer Science, (2018).

75.

Salai

, Vassányi

and Kósa

, Stress detection using low cost heart rate sensors, Journal of Healthcare Engineering (2016).

76.

Sarkar

and Cooper

, Blending Levels from Different Games using LSTMs, Proceedings of the Conference on Artificial Intelligence and Interactive Digital Entertainment Workshops, 2018.

77.

Schmidhuber

, Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts, Connection Science 18 (2006), 173–187.

78.

Schwarz

and Müller

, Advanced procedural modeling of architecture, ACM Transaction on Graphics 34 (2015).

79.

Scopus, (2020). https://scopus.com/

80.

SEGA, Sonic the Hedgehog, 2020. https://www.sonicthehedgehog.com/

81.

Shaker

, Nicolau

, Yannakakis

G.N.

, Togelius

and O’Neill

, Evolving levels for Super Mario Bros using grammatical evolution, Proceeding of the Conference Computation Intelligence and Games, (2012), pp. 304–311.

82.

Shaker, Togelius and Nelson M.J., Procedural Content Generation in Games, Springer, 2016.

83.

Shaker, Togelius, Yannakakis G.N., Weber, Shimizu, Hashiyama, Sorenson, Pasquier, Mawhorter, Takahashi, Smith and Baumgarten, The Mario AI championship: level generation track, IEEE Transactions on Computational Intelligence and AI in Games 3 (2011), 332–347.

84.

Shaker, Yannakakis G.N. and Togelius, Crowdsourcing the aesthetics of platform games, IEEE Transactions on Computational Intelligence and AI in Games 5 (2012), 276–290.

85.

Shaker, Yannakakis and Togelius, Towards automatic personalized content generation for platform games, Proceedings of the Sixth Artificial Intelligence and Interactive Digital Entertainment Conference, 2010.

86.

Smith, Cha and Whitehead, A framework for analysis of 2D platformer levels, Proceedings of the Sandbox 2008: An ACM SIGGRAPH Videogame Symposium, 2008.

87.

Smith and Whitehead, Analyzing the expressive range of a level generator, Proceedings of the Workshop on Procedural Content Generation in Games, (2010), pp. 4.

88.

Smith, Whitehead and Mateas, Tanagra: Reactive planning and constraint solving for mixed-initiative level design, IEEE Transactions on Computational Intelligence and AI in Games 3 (2011), 201–215.

89.

Smith, Whitehead, Mateas, Treanor, March and Cha, Launchpad: A rhythm-based level generator for 2-D platformers, IEEE Transactions on Computational Intelligence and AI in Games 3 (2011), 1–16.

90.

Snodgrass and Ontañón, Player movement models for platformer game level generation, Proceedings of the 26th International Joint Conference on Artificial Intelligence, (2017), pp. 757–763.

91.

Snodgrass and Ontañón, Leveraging multi-layer level representations for puzzle-platformer level generation, Proceedings of the Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference, 2017.

92.

Snodgrass and Ontañón, Procedural level generation using multi-layer level representations with MdMCs, Proceedings of the Conference on Computational Intelligence and Games, (2017), pp. 280–287.

93.

Sorenson and Pasquier, The Evolution of Fun: Automatic Level Design Through Challenge Modeling, Proceedings of the International Conference on Computational Creativity (2010), pp. 258–267.

94.

Stoddart, Soft launch vs. hard launch: taking a new product to market, 2018. http://marketing.channelcreator.com/blog/all/soft-launch-vs-hard-launch-taking-a-new-product-to-market

95.

Summerville, Mariño J.R.H., Snodgrass, Ontañón and Lelis L.H.S., Understanding Mario: an evaluation of design metrics for platformers, Proceedings of the 12th International Conference on the Foundations of Digital Games (2017), pp. 8.

96.

Summerville, Snodgrass, Guzdial, Holmgård, Hoover A.K., Isaksen, Nealen and Togelius, Procedural content generation via machine learning (PCGML), IEEE Transactions on Games 10 (2018), 257–270.

97.

Swagbucks, (n.d.). www.swagbucks.com

98.

Tan C.T., Bakkes and Pisan, Inferring player experiences using facial expressions analysis, Proceedings of the ACM International Conference on Interactive Entertainment, 2014.

99.

Tan C.T., Bakkes and Pisan, Correlation between facial expressions and the game experience questionnaire, Proceedings of the 13th International Conference on Entertainment Computing, (2014), pp. 229.

100.

Tan C.T., Rosser, Bakkes and Pisan, A feasibility study in using facial expressions analysis to evaluate player experiences, Proceedings of The 8th Australasian Conference on Interactive Entertainment: Playing the System, 2012.

101.

Thompson, ‘With Fate Guiding My Every Move’: The Challenge of Spelunky, Proceedings of the Foundations of Digital Games, 2015.

102.

Togelius, Kastbjerg, Schedl and Yannakakis G.N., What is procedural content generation?: Mario on the borderline, Proceedings of the 2nd International Workshop on Procedural Content Generation in Games, (2011), pp. 3.

103.

Volz, Schrum, Liu, Lucas S.M., Smith and Risi, Evolving Mario levels in the latent space of a deep convolutional generative adversarial network, Proceedings of the Genetic and Evolutionary Computation Conference, (2018), pp. 221–228.

104.

What are Sensors?, http://manaelectron.com/1108/what-is-sensors/ (accessed 27 December 2020).

105.

Wheat, Masek, Lam C.P. and Hingston, Modeling perceived difficulty in game levels, Proceedings of the Australasian Computer Science Week Multiconference, (2016), pp. 1–8.

106.

Zhan, Li, Ogunbona and Safaei, Facial Expression Recognition System for Online Games, International Journal of Computer Games Technology, (2008), 1–7.

107.

Zhou, Huang and Wang, Real-time facial expression recognition in the interactive game based on embedded Hidden Markov Model, Proceedings of the International Conference on Computer Graphics, Imaging and Visualization, 2004.

108.

Zhou, Huang, Xu and Wang, Real-time facial expression recognition based on boosted embedded hidden Markov model, Proceedings of the Third International Conference on Image and Graphics, 2004.