LuckyMera: a modular AI framework for building hybrid NetHack agents

Abstract

In the last few decades we have witnessed a significant development in Artificial Intelligence (AI) thanks to the availability of a variety of testbeds, mostly based on simulated environments and video games. Among those, roguelike games offer a very good trade-off between environment complexity and computational cost, which makes them well suited to testing the generalization capabilities of AI agents. In this work, we present LuckyMera, a flexible, modular, extensible and configurable AI framework built around NetHack, a popular terminal-based, single-player roguelike video game. This library aims to simplify and speed up the development of AI agents capable of successfully playing the game, and offers a high-level interface for designing game strategies. LuckyMera comes with a set of off-the-shelf symbolic and neural modules (called "skills"): these modules can be either hard-coded behaviors or neural Reinforcement Learning approaches, and they can be composed into hybrid solutions. Additionally, LuckyMera comes with a set of utility features to save its experiences in the form of trajectories, both for further analysis and for use as datasets to train neural modules, with a direct interface to the NetHack Learning Environment and MiniHack. Through an empirical evaluation we validate our skills implementation and propose a strong baseline agent that can reach state-of-the-art performance in the complete NetHack game. LuckyMera is open-source and available at https://github.com/Pervasive-AI-Lab/LuckyMera.

Keywords

Reinforcement learning, imitation learning, hybrid models, NetHack, bot

1 Introduction

In the last few years, Artificial Intelligence algorithms have achieved astonishing results in a wide range of tasks, exploiting both classical, symbolic approaches and data-driven methodologies from the field of Machine Learning [23]. Research in this area has been sustained and encouraged by the availability of several benchmarks, needed to experiment with new architectures and technologies. Video games, in particular, have gained great popularity, since they offer challenging experiences, similar to real-world problems, at a much smaller cost; therefore, they are excellent playgrounds to study and test different approaches. AI has been applied to different environments with surprising success, e.g. winning against the world chess champion [9] and, especially after the introduction of deep architectures in Reinforcement Learning (RL), beating professional players in Go [29] and Dota 2 [3].

Among the wide variety of games available, roguelikes are of particular interest due to their unique features. Roguelikes¹ are turn-based role-playing games with a strong focus on cautious exploration, fighting enemies and careful resource management. These games also feature random generation of the level structure, as well as of the enemies and objects the player will find, and permadeath: there are no checkpoints, and players have to start again from the first level each time they die. Because of these features, roguelikes are extremely challenging, since players must deal with a variety of situations and overcome many levels in each run, where a single mistake can ruin the entire run. These characteristics make roguelikes well suited to testing the ability of AI agents to generalize in increasingly complex environments.

NetHack, published in 1987, is one of the earliest and most popular roguelikes. The player controls the hero, chosen among different races, roles and alignments, with the objective of retrieving the Amulet of Yendor by exploring over 50 randomly generated floors, and offering it to a deity to become a demigod. Each level is made of rooms connected by corridors, filled with monsters, objects and other peculiar features, such as shops or altars. In its original version, NetHack provides a simple terminal interface (Fig. 1) depicting the map of the current level. In addition, it shows a message at the top of the screen offering additional information, and a bottom line with the character’s statistics.

Fig. 1

An example of the NetHack ASCII interface.

The game offers a complex, procedurally generated open world with sparse rewards, forcing the agent to explore, reason and acquire knowledge about hundreds of entities. NetHack is considered one of the most difficult games for humans, and a hard challenge for modern RL models as well: the current best models are only comparable to human beginners², i.e. players scoring fewer than 2,000 points.

In this work, our objective is to present a complete and integrated framework that facilitates AI research in the environment offered by NetHack. We argue that our framework is an effective tool for designing a variety of agents that play the game, using both classical symbolic AI solutions and Machine Learning ones. To the best of our knowledge, this is the first open-source framework aimed at defining AI agents built around the game of NetHack and the NetHack Learning Environment.

Our main contributions can be summarized as follows:

We introduce LuckyMera³, a modular and extensible framework for building intelligent agents for NetHack. It integrates different AI paradigms, i.e. symbolic and neural approaches, and offers the possibility to easily define custom modules to solve specific tasks;

We discuss different approaches to the game, in particular Imitation Learning, Reinforcement Learning and Neuro-Symbolic methods. We perform ablation studies concerning these components, to analyze their performance;

We show how a bot built with LuckyMera is able to reach state-of-the-art performance, placing within the top 6 bots of the NeurIPS NetHack Challenge 2021 among over 600 submissions [15].

2 Related work

In this section, we review some studies related to our work, starting from the virtual environments defined around the game of NetHack for AI agents, and relevant AI approaches, namely Rule-Based, Imitation Learning and Reinforcement Learning methodologies.

2.1 NetHack as AI testbed

As we will see in more depth in Section 2.2, the environment offered by NetHack has been widely used to develop and test intelligent agents capable of playing the game. An interesting approach is the one proposed in [7], in which the authors present a solution to explore the levels of the game exploiting occupancy maps, a concept especially popular in robotics. In RL research, the first use of NetHack is gym_nethack [8, 6], which offers an interface to the game through a Gym [5] environment. However, in this case, the dynamics were heavily modified by removing several obstacles, resulting in a much simpler version of the game.

In this work, instead, we make deep use of the NetHack Learning Environment (NLE) [21]: a Gym environment that leaves the game mechanics unchanged. It is designed to wrap the entire game, returning all the observations available from the game, i.e. the map of the level, the agent’s current statistics, the textual message shown to the user and information about the inventory. The environment has 93 available actions, divided into 16 movement actions and 77 command actions. NLE is particularly interesting because it combines a complex environment with a fast simulator, being extremely efficient and computationally lightweight. Since current architectures cannot win the game, MiniHack [27] was released: a framework defined on top of NLE, which proposes a set of simpler environments, together with the possibility to easily design new tasks. The proposed tasks can be mainly divided into navigation tasks, in which the agent has to reach a goal position, and skill tasks, which involve more complex abilities, such as using potions, selecting the appropriate armor and fighting more powerful monsters.

We were deeply inspired by the results obtained in the NeurIPS 2021 NetHack Challenge [15]. Our framework was developed mainly using the challenge task, but it is completely environment-independent, and it works well with all the tasks proposed in NLE and MiniHack.

2.2 First AI Bots for NetHack

Since its initial release, there have been several bots addressing the problem of NetHack. One of the first to achieve significant results is TAEB⁴, a modular framework for designing automatic and semi-automatic players. It uses the publish/subscribe paradigm for communication among its components, and it employs Dijkstra’s algorithm [11] for pathfinding. The first symbolic bot able to "ascend", i.e. win the game, was BotHack⁵. Its architecture is particularly notable for its accurate recognition of the kind of floor the agent is exploring, and for its use of the A* algorithm [17] for navigation. Nonetheless, it achieved these results mostly by using an exploit present in older NetHack versions, which is no longer applicable in the current game. The current best open-source NetHack bot is AutoAscend⁶, winner of the NeurIPS 2021 NetHack Challenge. It implements a set of high-level strategies, each handling a specific behavior and wrapping multiple actions, and selects among them based on their priority.

Although these approaches achieve good results in the game, none of them offers a valid research platform, as LuckyMera does. Their goal was to create performance-oriented agents to win the game, while our main objective is to provide a development-oriented framework to train, integrate and test neuro-symbolic approaches.

2.3 Imitation learning approaches

Imitation Learning is a Machine Learning technique in which the agent learns an intelligent behavior from a set of demonstrations provided by an expert, instead of relying on interaction with the environment [31]. The agent’s objective is to mimic the expert’s actions, hopefully achieving an optimal policy, in a form of Supervised Learning. The dataset contains trajectories of experiences made of state-action pairs; in particular, each trajectory has the form $\tau = \{(s_{0}, a_{0}^{*}), (s_{1}, a_{1}^{*}), \dots, (s_{n}, a_{n}^{*})\}$, where the action $a_{i}^{*}$ is assumed to be optimal in state $s_{i}$. It is critical to notice that, when performing Imitation Learning, the agent should not copy the expert’s behavior unconditionally; instead, it should extract key information from the trajectories, so that it generalizes and achieves good performance also in states never seen before.

One of the simplest Imitation Learning algorithms is Behavioral Cloning (BC): given state-action pairs $(s_{t}, a_{t}^{*})$, the objective is to learn a policy $\pi$ by minimizing a loss function $L(a_{t}^{*}, \pi(s_{t}))$, assuming that the pairs are i.i.d., i.e. mutually independent and drawn from the same distribution. BC has been shown to achieve good results especially in environments with a relatively small state space, so that most of it can be covered by the expert’s demonstrations, e.g. autonomous driving [13]. Nonetheless, in most cases the i.i.d. assumption is problematic: the states the agent visits depend on its own past actions, so a small mistake can lead it into states the expert never demonstrated, where errors compound. An improvement over BC is DAgger [26], which employs an iterative process: it first performs Supervised Learning to learn a policy, as in BC; it then uses this policy to produce observations, queries the expert on those observations, and augments the dataset with these new demonstrations. A different approach to Imitation Learning is Inverse Reinforcement Learning [1], in which the idea is to learn the reward function by observing the expert’s demonstrations, and then use it to find the optimal policy with Reinforcement Learning algorithms. NetHack is particularly convenient for Imitation Learning, thanks to the NetHack Learning Dataset [16]. It collects both state transitions from human games and state-action trajectories generated by AutoAscend, the winner of the NetHack Challenge 2021.
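To make the difference between BC and DAgger concrete, the following is a minimal sketch on a hypothetical toy problem, not on NetHack: a 1-D corridor of cells 0 to 9 where the expert always steps toward the goal at cell 9. The corridor, the `expert` function and the majority-vote `TabularPolicy` learner are illustrative assumptions. Iteration 0 is plain BC on demonstrations that start at cell 5; each DAgger iteration then lets the learner act, asks the expert to label every state it reached, and retrains on the aggregated dataset.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

GOAL = 9              # hypothetical corridor of cells 0..9 with the goal at cell 9
ACTIONS = (-1, +1)    # step left / step right


def expert(state: int) -> int:
    """Hypothetical expert: always steps toward the goal."""
    return +1 if state < GOAL else -1


class TabularPolicy:
    """Learner that predicts the most frequent expert label recorded for each state."""

    def __init__(self) -> None:
        self.votes = defaultdict(Counter)

    def fit(self, dataset: List[Tuple[int, int]]) -> None:
        self.votes = defaultdict(Counter)
        for state, action in dataset:
            self.votes[state][action] += 1

    def __call__(self, state: int, rng: random.Random) -> int:
        if state in self.votes:
            return self.votes[state].most_common(1)[0][0]
        return rng.choice(ACTIONS)  # state never labeled: act at random


def rollout(policy: Callable, start: int, horizon: int, rng: random.Random) -> List[int]:
    """Follow `policy` from `start` and return the visited (non-goal) states."""
    states, s = [], start
    for _ in range(horizon):
        states.append(s)
        s = min(max(s + policy(s, rng), 0), GOAL)
        if s == GOAL:
            break
    return states


def dagger(start: int = 5, iterations: int = 10, horizon: int = 20, seed: int = 0):
    rng = random.Random(seed)
    # Iteration 0 is plain Behavioral Cloning on expert demonstrations from `start`.
    demo_states = rollout(lambda s, _rng: expert(s), start, horizon, rng)
    dataset = [(s, expert(s)) for s in demo_states]
    policy = TabularPolicy()
    policy.fit(dataset)
    for _ in range(iterations):
        # Let the learner act, then ask the expert to label every state it reached.
        visited = rollout(policy, rng.randrange(GOAL), horizon, rng)
        dataset += [(s, expert(s)) for s in visited]  # aggregate the data, never discard it
        policy.fit(dataset)                           # retrain on the aggregated dataset
    return policy, dataset
```

Because the expert relabels the states the learner actually reaches, the aggregated dataset can grow to cover cells 0 to 4, which the BC demonstrations starting at cell 5 never visit; this is exactly the distribution-shift issue that plain BC leaves untouched.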

We will review in detail the implementation of the Behavioral Cloning algorithm we offer in LuckyMera in Section 3.2.4.

2.4 Reinforcement learning approaches

Reinforcement Learning algorithms are usually tested in simulated environments, like games. RL approaches have shown superhuman capabilities in classical games, such as Go [29] and Chess [28]; furthermore, there have also been works on more complex, multiplayer games, like StarCraft II [30] and Dota 2 [3]. One of the most influential benchmarks for Reinforcement Learning agents is the Arcade Learning Environment [2]. It offers an interface to hundreds of video games from the Atari 2600 console, and it has been widely used in RL research as a testbed to experiment with different approaches [24]. However, its environments have become too simple to challenge current RL algorithms. Nowadays, AI research needs more challenging benchmarks to test more elaborate capabilities in complex environments; in this sense, NetHack is well suited to drive research in the coming years. Furthermore, the Arcade Learning Environment simply provides an interface to the games, while LuckyMera builds on top of the game interface to offer a modular framework for designing AI agents.

A framework comparable to ours is OpenSpiel [22], a collection of game environments and algorithms, including both search/planning strategies and reinforcement learning, with the possibility to add new games and methods. Its objective is to provide a set of simple and traditional games, enabling researchers to assess the same algorithm across multiple environments. In contrast, we focus on a single yet complex environment that requires different skills, releasing a framework mainly oriented towards the definition of modular architectures.

Several studies on Reinforcement Learning were conducted on the NetHack environment. The MiniHack suite was used to test the E3B algorithm [18], a method for defining intrinsic exploration bonuses based on learned embeddings of previous states. Chester et al. [10] proposed a hybrid approach, using symbolic planning for low-level actions, and Reinforcement Learning to train a meta-controller; they show that this method surpasses the baseline algorithms in a custom MiniHack environment. Powers et al. [25] presented CORA, a platform for Continual Reinforcement Learning, providing MiniHack as one of the benchmark environments. Using the MiniHack tasks from CORA, Kessler et al. [19] studied a task-agnostic, model-based method for Continual Reinforcement Learning, showing it to be a strong baseline compared to state-of-the-art approaches.

LuckyMera is instead an approach-agnostic research platform; it can be expanded to test any Artificial Intelligence method or paradigm, including Reinforcement Learning ones. It is particularly convenient because of the possibility to train targeted skills, tackling a more feasible problem, and then to integrate them with the other modules offered by the framework.

3 The LuckyMera Framework

LuckyMera is a framework for simplifying the development of Artificial Intelligence agents able to play the game of NetHack, designed following the principles of modularity, extensibility and configurability. The main objective of the architecture is to provide a high-level interface for defining game strategies, represented in the code through the Skill abstraction. A skill is any complex activity (a composition of several elementary actions to achieve a given goal) that can be planned and executed in a given state of the NetHack game. Each skill is defined as a separate module, so that it can communicate with the main components of the framework without constraints on its implementation details. In the following sections, we present some examples of modules released with the framework. A complete overview of how to use the framework can be found in Appendix 5.

3.1 Design of the Agent

The LuckyMera framework is released together with the implementation of an AI agent, designed following the modular structure, which represents a useful baseline for further studies. At the highest level of abstraction, the agent’s strategy simply consists of repeatedly executing the highest-priority skill that can be planned in the current state; the user can easily set the priority of each skill via a configuration file. Figure 2 shows a high-level representation of the system’s execution flow.

Fig. 2

Flowchart of the behavior of the LuckyMera agent. Essentially, it iterates through planning and execution. During skill planning, it gets the highest priority skill that can be planned; then, this skill is actually executed.

The agent’s main execution flow consists of iterating, throughout the game, two principal phases: skill planning and skill execution. The planning of a skill begins with the analysis of the game state, derived from the NLE observations. During this phase, the agent verifies whether the skill can actually be executed, i.e. whether its preconditions are satisfied, and performs some preliminary steps, depending on the nature of the skill itself. The module implementing a skill must provide the planning method, in compliance with the interface of the framework’s main component. Thanks to this abstraction, the system is compatible out-of-the-box with any symbolic or neural skill implementation, and in general with any AI module; we discuss this feature in more detail in Section 3.2.2. Once the planning of a skill succeeds, the agent performs the skill-execution phase, in which the skill carries out the series of actions needed to perform its plan.
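The plan-then-execute cycle of Figure 2 can be sketched as follows. This is an illustrative sketch only: the `Skill` interface, the toy `Pray` and `Explore` skills and the dictionary standing in for the game state are hypothetical, and LuckyMera’s actual classes, method names and NLE-based state representation may differ. Skills are listed in priority order, and at each step the first skill whose planning succeeds is executed.

```python
from abc import ABC, abstractmethod
from typing import Optional, Sequence


class Skill(ABC):
    """Hypothetical minimal skill interface: a precondition check plus an executor."""

    @abstractmethod
    def plan(self, state: dict) -> bool:
        """Return True if the skill's preconditions hold in `state`."""

    @abstractmethod
    def execute(self, state: dict) -> None:
        """Carry out the skill's actions (here: mutate the toy `state`)."""


class Pray(Skill):
    def plan(self, state):
        return state["hp"] < 5 and not state["prayed"]

    def execute(self, state):
        state["hp"], state["prayed"] = 20, True
        state["log"].append("pray")


class Explore(Skill):
    def plan(self, state):
        return state["unexplored"] > 0

    def execute(self, state):
        state["unexplored"] -= 1
        state["hp"] -= 4  # exploring is dangerous in this toy model
        state["log"].append("explore")


def select_skill(skills: Sequence[Skill], state: dict) -> Optional[Skill]:
    """Skill planning: return the first (highest-priority) skill that can be planned."""
    for skill in skills:
        if skill.plan(state):
            return skill
    return None


def run(skills: Sequence[Skill], state: dict, max_steps: int = 100) -> dict:
    """Alternate skill planning and skill execution until nothing can be planned."""
    for _ in range(max_steps):
        skill = select_skill(skills, state)
        if skill is None:
            break
        skill.execute(state)
    return state
```

With the priority list `[Pray(), Explore()]` and 10 starting hit points, the toy agent explores twice, prays as soon as its hit points fall below 5, and then finishes exploring. Reversing the list would make it keep exploring down to negative hit points before praying, which illustrates why the priority order alone defines the strategy.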

Our framework also offers the possibility to easily define custom modules. To leverage this extensibility, we provide simple class templates that enable the definition of any specific action. An in-depth discussion of the extensibility of LuckyMera is presented in Section 3.2.1.

In addition to the high-level modules, the architecture also provides low-level solutions for interacting with the game. In particular, it is based on the GameWhisperer class, which handles the interaction with the NLE framework, offering several refinements to the low-level observations; it also defines methods to encapsulate multiple NetHack commands into single, more abstract atomic commands. To handle navigation in the game world, on the other hand, the system relies on the DungeonWalker class, which offers useful functionalities for pathfinding and exploration. To do so, it employs the A* algorithm [17] with the octile distance heuristic [4], which also takes diagonal movements into account.
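As a standalone illustration of this navigation technique (not the DungeonWalker code itself, whose interface is not described here), the sketch below runs A* on a character grid with 8-directional moves, straight steps costing 1 and diagonal steps costing $\sqrt{2}$. The octile distance $h = \max(d_x, d_y) + (\sqrt{2} - 1)\min(d_x, d_y)$ is the exact cost on an obstacle-free grid, so it never overestimates and A* returns a shortest path. The grid encoding ('#' as impassable) is an assumption, and NetHack-specific constraints, such as the ban on moving diagonally through doorways, are deliberately not modeled.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]
SQRT2 = math.sqrt(2)


def octile(a: Pos, b: Pos) -> float:
    """Octile distance: exact cost on an empty 8-connected grid with diagonal cost sqrt(2)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (SQRT2 - 1) * min(dx, dy)


def a_star(grid: List[str], start: Pos, goal: Pos) -> Optional[List[Pos]]:
    """A* over a character grid where '#' is impassable; returns the path or None."""
    rows, cols = len(grid), len(grid[0])
    open_heap = [(octile(start, goal), 0.0, start)]  # entries are (f = g + h, g, position)
    g: Dict[Pos, float] = {start: 0.0}
    parent: Dict[Pos, Pos] = {}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cost > g[cur]:
            continue  # outdated heap entry: a cheaper route was found later
        if cur == goal:
            path = [cur]
            while cur in parent:
                cur = parent[cur]
                path.append(cur)
            return path[::-1]
        r, c = cur
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                    continue
                new_cost = cost + (SQRT2 if dr and dc else 1.0)
                if new_cost < g.get((nr, nc), math.inf):
                    g[(nr, nc)] = new_cost
                    parent[(nr, nc)] = cur
                    f = new_cost + octile((nr, nc), goal)
                    heapq.heappush(open_heap, (f, new_cost, (nr, nc)))
    return None
```

For example, on the grid `["....", ".##.", "...."]` the path from `(0, 0)` to `(2, 3)` has to go around the wall and costs $3 + \sqrt{2}$, whereas the octile estimate from the start, which ignores the wall, is $3 + 2(\sqrt{2} - 1)$.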

3.2 Features of the Framework

Here we will discuss some of the main features that LuckyMera offers, which make it an integrated framework for quick testing of new approaches, automatic creation of labeled trajectories and training of Machine Learning models. During the discussion, we will also present some possible applications of the framework, namely a neuro-symbolic approach and our baseline agent LuckyMera-v1.0.

3.2.1 Definition of New Skills

The main features of the framework are its modularity and extensibility. In fact, each in-game action is defined as an independent module, and the LuckyMera main component is in charge of composing the various skills, without imposing constraints on their internal structure.

The interface it exposes has been designed to allow the definition of custom modules by extending one of the classes that represent the concept of skill. The base class for defining new modules is Skill; it is an abstract class, therefore it provides only some general-purpose methods, without an actual implementation of planning and execution. In most cases, custom modules should inherit from this class. Other classes provide more specific templates: ReachSkill concerns simple navigation skills, offering an execution method to reach specific locations in the game world, while the HiddenSkill class implements useful methods for finding secret passages or areas in the game.
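The template idea behind classes such as ReachSkill can be sketched as follows. The names `find_target`, `GoToStairs` and the `world` dictionary are hypothetical, and the sketch does not reproduce LuckyMera’s actual signatures. The navigation template implements both planning and execution once and leaves only the choice of target to subclasses, so a new navigation skill amounts to a few lines of code.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

Pos = Tuple[int, int]


class Skill(ABC):
    """Abstract base: subclasses must provide both planning and execution."""

    @abstractmethod
    def plan(self, agent_pos: Pos, world: dict) -> bool: ...

    @abstractmethod
    def execute(self, agent_pos: Pos, world: dict) -> List[Pos]: ...


class ReachSkill(Skill):
    """Navigation template: subclasses only choose a target; execution walks to it."""

    def __init__(self) -> None:
        self.target: Optional[Pos] = None

    @abstractmethod
    def find_target(self, agent_pos: Pos, world: dict) -> Optional[Pos]: ...

    def plan(self, agent_pos: Pos, world: dict) -> bool:
        # Planning succeeds only if a target exists in the current state.
        self.target = self.find_target(agent_pos, world)
        return self.target is not None

    def execute(self, agent_pos: Pos, world: dict) -> List[Pos]:
        # Greedy king-move walk on an open map; a real skill would call a path planner.
        path, (r, c) = [], agent_pos
        tr, tc = self.target
        while (r, c) != (tr, tc):
            r += (tr > r) - (tr < r)
            c += (tc > c) - (tc < c)
            path.append((r, c))
        return path


class GoToStairs(ReachSkill):
    """Hypothetical custom skill: only the target selection is new code."""

    def find_target(self, agent_pos: Pos, world: dict) -> Optional[Pos]:
        return world.get("stairs")
```

With stairs at `(2, 3)`, `GoToStairs().plan((0, 0), {"stairs": (2, 3)})` succeeds and `execute` yields the steps `(1, 1)`, `(2, 2)`, `(2, 3)`. Since `ReachSkill` leaves `find_target` abstract, instantiating it directly raises a `TypeError`, which prevents incomplete skills from being plugged into the agent.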

This interface simplifies the definition and integration of new skills, providing researchers with a convenient way to test any AI methodology on the challenging NetHack benchmark. Our framework reduces the need to write extensive boilerplate code, as it manages all the low-level interactions with the game. This allows researchers to focus their efforts solely on the implementation of their approaches.

3.2.2 Skills Integration

The main component of the framework offers a simple, straightforward interface, making it easy to integrate skill modules. Each module should inherit from one of the classes representing the Skill concept and, by doing so, define its own planning and execution methods. All the actions that a LuckyMera agent can perform are defined as skills and are executed following a priority list: the strategy the agent follows is determined by the priority assigned to each module. In the framework, the order of the modules can be easily changed through a configuration file. In our tests, we let the agent adopt a cautious strategy, in order to maximize the score⁷. In particular, the top-priority actions are the ones that can help the bot overcome dangerous situations, like praying to receive resources, engraving the name “Elbereth” to scare enemies, and running away. After those actions, the agent checks whether it can fight nearby monsters. Otherwise, if it is in a safe situation, it can explore the unseen parts of the dungeon, or search for hidden rooms and corridors. The complete list of the currently implemented skills is available in Appendix 5.1; it includes both the symbolic skills and the ones coming from the integration of the neural modules.
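A priority list read from a configuration file can be sketched as below. The JSON layout, the `skill_priority` key and the skill names are assumptions made for illustration, since the format of LuckyMera’s configuration file is not described here; `SkillStub` stands in for real skill objects. The loader rejects unknown or duplicated names, so a typo in the configuration fails loudly instead of silently dropping a skill.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SkillStub:
    """Placeholder for a real skill object; only the name matters in this sketch."""
    name: str


# Hypothetical set of skill names the framework knows how to build.
KNOWN_SKILLS = {"pray", "elbereth", "fight", "explore", "search_hidden"}

# Hypothetical configuration: earlier entries have higher priority.
CONFIG = """
{"skill_priority": ["pray", "elbereth", "fight", "explore", "search_hidden"]}
"""


def load_priority(config_text: str) -> List[SkillStub]:
    """Parse the priority list and reject unknown or duplicated skill names."""
    order = json.loads(config_text)["skill_priority"]
    unknown = [name for name in order if name not in KNOWN_SKILLS]
    if unknown:
        raise ValueError(f"unknown skills in config: {unknown}")
    if len(set(order)) != len(order):
        raise ValueError("each skill may appear only once")
    return [SkillStub(name) for name in order]
```

Under this scheme, switching to a less cautious strategy, for instance exploring before fighting, only requires reordering the list in the configuration, without touching any skill code.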

Besides these skills, it is possible to integrate any external module compliant with the interface, as presented in Section 3.2.1. In fact, the framework allows importing any given model, so that new actions (or new strategies for already defined operations) can be implemented, or trained neural models can be used to perform specific tasks. As an example, the framework was tested with the integration of a neural RL agent, trained using the IMPALA algorithm [12] with the implementation from TorchBeast [20]. Since playing the entire game of NetHack is too difficult for current RL approaches, the training was executed on some MiniHack environments, which offer more controlled and feasible challenges. The selected environments are Room-Ultimate-15x15 and KeyRoom-S5, shown in Figure 3. In this case, the game map is depicted using the pixel observation from MiniHack, which is offered in addition to the standard ASCII interface. To improve the performance of the pure neural agents, the trained models were then integrated with some prior knowledge about the game, in the form of a set of simple, generic rules. More details about the symbolic rules are available in Section 4.1.

Fig. 3

MiniHack environments used for the training of the RL agent.

3.2.3 Trajectory Saving

The architecture we propose can save the bot’s experiences with the environment in the form of trajectories of state-action pairs. Given the performance of a LuckyMera bot, its behavior is meaningful and valid in the context of the game, so it could be used as an expert, e.g. in Imitation Learning applications. Within the framework, it is possible to exploit the capabilities of the bot to build a dataset of experiences, automatically labeled with the action actually performed by the agent. The trajectory saving mechanism is independent of the type of observation: any element of the observation space defined by NLE can be stored, by selecting it at runtime. The framework also integrates the NLE Language Wrapper [14], which translates the non-language observations from NetHack, e.g. glyphs and chars, into corresponding language representations (Figure 4). These language observations can be selected in the saving process as well, and could be useful for fine-tuning language models.
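A minimal sketch of this mechanism, assuming a hypothetical `TrajectoryRecorder` class that writes JSON Lines (neither is LuckyMera’s actual API), is shown below. The observation fields to keep are chosen when the recorder is created, and each step is labeled with the action the bot actually performed. Real NLE observations, such as `glyphs`, `blstats` or `message`, are NumPy arrays and would need converting (e.g. with `.tolist()`) before JSON serialization; plain lists and strings stand in for them here.

```python
import json
from typing import Any, Dict, Iterable, List


class TrajectoryRecorder:
    """Hypothetical recorder storing (selected observation fields, action) pairs."""

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys = list(keys)  # observation fields selected at runtime
        self.steps: List[Dict[str, Any]] = []

    def record(self, observation: Dict[str, Any], action: int) -> None:
        missing = [k for k in self.keys if k not in observation]
        if missing:
            raise KeyError(f"observation lacks fields: {missing}")
        # The label is the action the bot actually performed in this state.
        self.steps.append({"obs": {k: observation[k] for k in self.keys}, "action": action})

    def dumps(self) -> str:
        """Serialize the trajectory as JSON Lines, one state-action pair per line."""
        return "\n".join(json.dumps(step) for step in self.steps)
```

Selecting, for example, only a statistics vector and a textual description drops every other field (such as the glyph map) from the saved dataset, which keeps datasets small when a model needs just a subset of the observation.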

Fig. 4

One level of NetHack viewed with different representations. The top image is the standard ASCII interface, while the bottom image shows the language representation provided by the NLE Language Wrapper.

3.2.4 Training of Neural Agents

The LuckyMera framework also offers the possibility to run training processes on the NetHack environment, providing an interface that can handle any training algorithm. In fact, the architecture comes with an abstract class representing a generic training procedure, which should be extended to define a specific algorithm. In this way, the training process is tightly integrated into the system, so that it is easy to evaluate the model’s performance and integrate it with the other modules, with the possibility of also defining hybrid architectures. As an example, we provide an implementation of the Behavioral Cloning algorithm, one of the most intuitive approaches to Imitation Learning; it is briefly described in Algorithm 1.

Algorithm 1 Behavioral Cloning

Ensure: a policy $\pi_{\theta}$ trained on the problem

while $L(a^{*}, \pi_{\theta}(s))$ is not small enough do

Collect trajectories $\tau_{1}, \dots, \tau_{n}$ from the expert.

Get all the $(s_{i}, a_{i}^{*})$ pairs from each $\tau_{j}$, assuming they are i.i.d.

Learn policy $\pi_{\theta}$ by minimizing $L(a^{*}, \pi_{\theta}(s))$.

end while
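Algorithm 1 can be turned into a small runnable sketch. Here a hypothetical abstract `Trainer` class stands in for the framework’s generic training procedure, and `BehavioralCloning` fits a tabular softmax policy $\pi_{\theta}(a \mid s)$ by gradient descent on the cross-entropy loss, stopping once the average loss is small enough. The string state keys, learning rate and tolerance are assumptions chosen for illustration; a NetHack model would instead be a neural network over NLE observations, trained on pairs collected from an expert such as a LuckyMera bot.

```python
import math
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, int]  # (state key, expert action index)


class Trainer(ABC):
    """Hypothetical generic training procedure, specialized by subclasses."""

    @abstractmethod
    def train(self, dataset: Sequence[Pair]) -> float:
        """Fit the model and return the final training loss."""


class BehavioralCloning(Trainer):
    """Tabular softmax policy pi_theta(a|s) fitted by minimizing cross-entropy."""

    def __init__(self, n_actions: int, lr: float = 0.5, tol: float = 0.05,
                 max_epochs: int = 500) -> None:
        self.n_actions, self.lr, self.tol, self.max_epochs = n_actions, lr, tol, max_epochs
        self.theta: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_actions)

    def probs(self, state: str) -> List[float]:
        logits = self.theta[state]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def train(self, dataset: Sequence[Pair]) -> float:
        loss = math.inf
        for _ in range(self.max_epochs):
            if loss < self.tol:  # "while L is not small enough"
                break
            loss = 0.0
            for state, action in dataset:  # pairs treated as i.i.d. samples
                p = self.probs(state)
                loss += -math.log(p[action]) / len(dataset)
                # Gradient of cross-entropy w.r.t. the logits is p - one_hot(action).
                for a in range(self.n_actions):
                    grad = p[a] - (1.0 if a == action else 0.0)
                    self.theta[state][a] -= self.lr * grad
        return loss

    def act(self, state: str) -> int:
        """Greedy action of the learned policy."""
        p = self.probs(state)
        return max(range(self.n_actions), key=p.__getitem__)
```

Keeping the training procedure behind an abstract interface means another algorithm (DAgger, for instance) could be added as a further subclass and evaluated or combined with the other skills in the same way.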

3.2.5 Example of Usage: LuckyMera-v1.0

LuckyMera comes with a set of pre-defined modules that showcase one possible concrete use case of the framework. These modules implement some basic actions and the corresponding reasoning processes, which consider the facts known by the agent and its surroundings to select the best skill to perform. Such actions allow the agent to interact with the NetHack environment and to progress in the game. Indeed, there are modules designed to explore the dungeon, others to deal with the enemies present in the game, and other skills for actions such as eating or praying. The complete list of the skills released together with the LuckyMera framework is reported in Table 3.

These skills specify the behavior of a possible agent, referred to as LuckyMera-v1.0. This implementation mainly serves as a practical demonstration of the usage of the framework, emphasizing the intrinsic modularity of our architecture, rather than positioning itself as a fully-fledged AI agent. With the LuckyMera-v1.0 agent, our goal is to establish and release a solid baseline — a starting point for further developments — without the aspiration to surpass the state-of-the-art. Nonetheless, we assessed our approach against the current best-performing solutions, achieving comparable results. Further details regarding this evaluation can be found in Section 4.

Fig. 5

Results of the ablation studies performed on the framework.

4 Empirical evaluation

In this section, we evaluate the results of the experiments we conducted with the LuckyMera framework. Firstly, we will analyse the ablation studies we performed on some modules of the architecture, namely the hybrid RL module and the Imitation Learning approach, to see their individual performance. We will then present the results obtained by the LuckyMera agent we release, comparing it with the participants in the NeurIPS 2021 NetHack Challenge.

4.1 Reinforcement learning approach results

We integrated a Reinforcement Learning approach based on the IMPALA algorithm. As testing environments, we used the Room-Ultimate-15x15 and KeyRoom-S5 tasks from MiniHack. Initially, we tested the pure neural approach; then we integrated it with some prior knowledge about the problem, in the form of basic symbolic rules expressed in first-order logic. These rules were used to increase the probability of crucial actions, like attacking nearby monsters and moving towards the key. The rules we employed are shown in Table 1.

Table 1
Logic rules integrated with the Reinforcement Learning agent

do_not_hit_stone: ∀x ∀y (Agent(x) ∧ Stone(y) ∧ AreClose(x, y) ⇒ ¬Move(x, y))

attack_enemies: ∀x ∀y (Agent(x) ∧ Enemy(y) ∧ AreClose(x, y) ⇒ Attack(x, y))

move_to_key: ∀x ∀y (Agent(x) ∧ Key(y) ∧ AreClose(x, y) ⇒ Move(x, y))

do_not_repeat_action: ∀x ∀y (Agent(x) ∧ Action(y) ∧ LastAction(x, y) ⇒ ¬Perform(x, y))
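The text states only that the rules in Table 1 increase the probability of crucial actions, not how they are combined with the network’s output. One plausible reading, sketched here purely as an assumption, is additive logit shaping: grounded rules add a bonus to the logits of actions they imply and a large penalty to actions they forbid, before the softmax turns logits into action probabilities. The action names, the `facts` dictionary and the `bonus`/`penalty` values are all hypothetical.

```python
import math
from typing import Dict, List

ACTIONS = ["north", "south", "east", "west", "attack_east"]  # hypothetical action set


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_rules(logits: List[float], facts: Dict[str, object],
                bonus: float = 2.0, penalty: float = 10.0) -> List[float]:
    """Shift policy logits using grounded versions of the Table 1 rules."""
    shaped = list(logits)
    for i, action in enumerate(ACTIONS):
        # do_not_hit_stone: discourage moving into an adjacent stone tile
        if action in facts.get("stone_dirs", ()):
            shaped[i] -= penalty
        # attack_enemies: encourage attacking an adjacent enemy
        if action == facts.get("enemy_attack"):
            shaped[i] += bonus
        # move_to_key: encourage stepping towards an adjacent key
        if action == facts.get("key_dir"):
            shaped[i] += bonus
        # do_not_repeat_action: discourage repeating the previous action
        if action == facts.get("last_action"):
            shaped[i] -= penalty
    return shaped
```

For instance, with uniform network logits, a stone to the north, an enemy to the east and `west` as the last action, the shaped distribution puts most of its mass on `attack_east` and nearly zero on `north` and `west`. Because the rules only shift logits, the network still decides whenever no rule fires.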

Figure 5a reports the results for the two environments considered. In both cases, the integration of rules led to an increase in the agent’s performance.

4.2 Imitation Learning Approach Results

The Imitation Learning approach implemented in LuckyMera is based on the Behavioral Cloning algorithm. This method was tested using the Room-5x5 environment from MiniHack, selected for its low complexity, which guarantees relatively fast training and a small storage footprint for the dataset. The model was trained for five epochs, and the learned policy was then evaluated in a different instance of the environment. Figure 5b compares the performance of a random agent with that of the trained model: the trained agent always solves the task, with a behavior close to optimal.

4.3 LuckyMera-v1.0 Agent Baseline

As a final validation of our implementation, we designed a LuckyMera agent composed of multiple skills (more details in Appendix 5.1) and tested it on the most challenging environment, NetHackChallenge-v0, which represents the complete NetHack game. This allows a comparison of our approach with the participants of the NeurIPS 2021 NetHack Challenge [15], which represent the current state-of-the-art models tackling the game of NetHack.

Our agent achieved an average score of 1046.96 and a median of 817. In Figure 6, the results of LuckyMera are compared with those of the highest-scoring teams from the challenge. The agent would virtually reach the 6th position, outperforming more than 80% of the competitors. The challenge ran for 144 days, with 46 participating teams and 631 overall submissions⁸.

Fig. 6

Results obtained by a LuckyMera agent, compared with the state-of-the-art bots from the NeurIPS NetHack Challenge 2021.

5 Conclusion

LuckyMera is a flexible, modular, extensible and configurable framework to speed up the development of smart AI agents tackling the NetHack game. The architecture is designed to assist AI researchers by offering them an integrated platform to seamlessly implement and test their solutions in a challenging environment, which represents an excellent testbed for evaluating AI approaches. This is achieved through an intuitive code interface that hides all the boilerplate code and implementation details needed to interact correctly with the game.

LuckyMera is also approach-agnostic, since it is capable of integrating any AI module. We tested it with symbolic skills, Reinforcement Learning policies and Imitation Learning approaches, and it natively supports any AI algorithm and implementation.

The framework also includes a strong baseline agent, called LuckyMera-v1.0, capable of achieving good results in the game, and offers the possibility to easily extend its behavior using external modules. Such modules can be symbolic rules performing a specific action, or neural models trained on a given task. The architecture also provides the possibility of creating automatically labeled datasets, by storing the experiences of the agent in the form of trajectories made of state-action pairs. It is possible to specify the elements of the observations to save, including the language representations from the NLE Language Wrapper. The trajectories can be used to train neural models via Imitation Learning, or to fine-tune language models to interact with the environment.

However, our approach also has some limitations. First, since the LuckyMera framework is developed entirely in Python, it currently supports only Skills written in this language.

Second, we believe that our code base could be further optimized. We aimed to keep it clear and efficient, but further improvements are possible and would make it more accessible and open to external collaborations. Moreover, we plan to implement a feature that exploits the parallelism of CPU and GPU architectures, which would allow researchers to execute multiple runs of their LuckyMera agent simultaneously.
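Since this feature is still planned, the following is only a hedged sketch of one common way to run several independent episodes at once with Python's standard `concurrent.futures` module. `run_episode` is a hypothetical placeholder that returns a pseudo-random score; a real implementation would create its own environment and agent inside each worker. GPU-side batching is not covered here.

```python
import random
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from statistics import mean, median
from typing import Iterable, List, Optional, Type


def run_episode(seed: int) -> int:
    """Hypothetical placeholder for one complete agent run, returning its final score.

    A real worker would build its own environment and agent here, because
    environment objects are not meant to be shared across processes."""
    rng = random.Random(seed)  # seeded, so each run is reproducible
    return rng.randint(0, 2000)


def run_many(
    seeds: Iterable[int],
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[int]:
    """Run one episode per seed concurrently; results keep the order of `seeds`."""
    # Separate processes sidestep the GIL for CPU-bound game simulation;
    # ThreadPoolExecutor can be passed instead, e.g. for debugging.
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(run_episode, seeds))


if __name__ == "__main__":
    scores = run_many(range(8))
    print(f"mean={mean(scores):.2f} median={median(scores)}")
```

The `__main__` guard matters: with the "spawn" start method, worker processes re-import the main module, and without the guard they would try to start their own pools.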

Conflict of interest statement

All authors declare that they have no conflicts of interest.

Footnotes

Acknowledgments

Research partly funded by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 1 "Human-centered AI", funded by the European Commission under the NextGeneration EU programme.

Appendix

B. LuckyMera Skills

The LuckyMera framework is released with a set of implemented skills; these are the skills used to obtain the results presented in Section 4. The complete list of LuckyMera skills is presented in . The list can be modified by adding or removing skills, and each skill can easily be changed to implement new approaches. Further details about the game mechanics can be found on the NetHack Wiki⁸.
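One simple way to organize a modifiable list of skills is a priority order in which the first applicable skill acts. The sketch below assumes that arbitration scheme and uses hypothetical names (`Skill`, `SkillArbiter`, the `fight`, `explore` and `pray` skills and their action indices); it shows how adding or removing an entry changes the agent's choice without touching the other skills.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Observation = Dict[str, object]


@dataclass
class Skill:
    """Hypothetical minimal skill: a name, an applicability test and an action chooser."""
    name: str
    applicable: Callable[[Observation], bool]
    act: Callable[[Observation], int]


class SkillArbiter:
    """Tries skills in priority order (index 0 first); the first applicable one acts."""

    def __init__(self, skills: List[Skill]) -> None:
        self.skills = list(skills)

    def add(self, skill: Skill, position: Optional[int] = None) -> None:
        """Insert a skill at a given priority, or at the lowest priority by default."""
        if position is None:
            self.skills.append(skill)
        else:
            self.skills.insert(position, skill)

    def remove(self, name: str) -> None:
        """Drop every skill with the given name."""
        self.skills = [s for s in self.skills if s.name != name]

    def choose(self, obs: Observation) -> Optional[Tuple[str, int]]:
        """Return (skill name, action) from the first applicable skill, or None."""
        for skill in self.skills:
            if skill.applicable(obs):
                return skill.name, skill.act(obs)
        return None  # no skill fired; the caller must pick a fallback


arbiter = SkillArbiter([
    Skill("fight", lambda o: bool(o["enemy_adjacent"]), lambda o: 1),
    Skill("explore", lambda o: True, lambda o: 2),
])
obs = {"enemy_adjacent": False, "hp_ratio": 0.2}
print(arbiter.choose(obs))  # ('explore', 2)
arbiter.add(Skill("pray", lambda o: o["hp_ratio"] < 0.25, lambda o: 3), position=0)
print(arbiter.choose(obs))  # ('pray', 3)
```

A strict priority order keeps the agent's behavior easy to predict and debug; the trade-off is that a high-priority skill with a loose applicability test can starve the skills below it.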

In , a beginner score is below 2,000 points.

Being a modular framework, LuckyMera is similar to a chimera, a mythological hybrid creature composed of parts of different animals. But NetHack is difficult, so it needs to be lucky "mera" (which means "a lot" in Sardinian)!

The leaderboard is available at .

References

1. Arora, Doshi, A survey of inverse reinforcement learning: Challenges, methods and progress, Artificial Intelligence 297 (2021), 103500.

2. Bellemare M.G., Naddaf, Veness, Bowling, The arcade learning environment: An evaluation platform for general agents, Journal of Artificial Intelligence Research 47 (2013), 253–279.

3. Berner, Brockman, Chan, Cheung, Dębiak, Dennison, Farhi, Fischer, Hashme, Hesse, et al., Dota 2 with large scale deep reinforcement learning, arXiv preprint arXiv:1912.06680, 2019.

4. Bjornsson, Halldorsson, Improved heuristics for optimal pathfinding on game maps. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 2 (2006), pp. 9–14.

5. Brockman, Cheung, Pettersson, Schneider, Schulman, Tang, Zaremba, OpenAI Gym, 2016.

6. Campbell, Verbrugge, Learning combat in NetHack. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 13 (2017), pp. 16–22.

7. Campbell, Verbrugge, Exploration in NetHack with secret discovery, IEEE Transactions on Games 11(4) (2019), 363–373.

8. Campbell, Verbrugge, Exploration in NetHack with secret discovery, IEEE Transactions on Games 11(4) (2019), 363–373.

9. Campbell, Hoane A.J. Jr, Hsu F.-h., Deep Blue, Artificial Intelligence 134(1-2) (2002), 57–83.

10. Chester, Dann, Zambetta, Thangarajah, OracleSAGE: Planning ahead in graph-based deep reinforcement learning. In Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2022, Grenoble, France, September 19-23, 2022, Proceedings, Part IV, (2023), pp. 52–67. Springer.

11. Dijkstra E.W., A note on two problems in connexion with graphs. In Edsger Wybe Dijkstra: His Life, Work, and Legacy, (2022), pp. 287–290. Numerische Mathematik.

12. Espeholt, Soyer, Munos, Simonyan, Mnih, Ward, Doron, Firoiu, Harley, Dunning, et al., IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. In International Conference on Machine Learning (2018), pp. 1407–1416. PMLR.

13. Farag, Saleh, Behavior cloning for autonomous driving using convolutional neural networks. In 2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), (2018), pp. 1–7. IEEE.

14. Goodger, Vamplew, Foale, Dazeley, A NetHack learning environment language wrapper for autonomous agents, Journal of Open Research Software 11 (2023), 06.

15. Hambro, Mohanty, Babaev, Byeon, Chakraborty, Grefenstette, Jiang, Daejin, Kanervisto, Kim, et al., Insights from the NeurIPS NetHack challenge. In NeurIPS 2021 Competitions and Demonstrations Track, (2022), pp. 41–52. PMLR.

16. Hambro, Raileanu, Rothermel, Mella, Rocktaschel, Kuttler, Murray, Dungeons and data: A large-scale NetHack dataset, arXiv preprint arXiv:2211.00539, 2022.

17. Hart P.E., Nilsson N.J., Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Transactions on Systems Science and Cybernetics 4(2) (1968), 100–107.

18. Henaff, Raileanu, Jiang, Rocktaschel, Exploration via elliptical episodic bonuses, arXiv preprint arXiv:2210.05805, 2022.

19. Kessler, Miłoś, Parker-Holder, Roberts S.J., The surprising effectiveness of latent world models for continual reinforcement learning, arXiv preprint arXiv:2211.15944, 2022.

20. Kuttler, Nardelli, Lavril, Selvatici, Sivakumar, Rocktaschel, Grefenstette, TorchBeast: A PyTorch platform for distributed RL, arXiv preprint arXiv:1910.03552, 2019.

21. Kuttler, Nardelli, Miller, Raileanu, Selvatici, Grefenstette, Rocktaschel, The NetHack learning environment, Advances in Neural Information Processing Systems 33 (2020), 7671–7684.

22. Lanctot, Lockhart, Lespiau J.-B., Zambaldi, Upadhyay, Perolat, Srinivasan, Timbers, Tuyls, Omidshafiei, et al., OpenSpiel: A framework for reinforcement learning in games, arXiv preprint arXiv:1908.09453, 2019.

23. LeCun, Bengio, Hinton, Deep learning, Nature 521(7553) (2015), 436–444.

24. Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, Riedmiller, Playing Atari with deep reinforcement learning, arXiv preprint arXiv:1312.5602, 2013.

25. Powers, Xing, Kolve, Mottaghi, Gupta, CORA: Benchmarks, baselines, and metrics as a platform for continual reinforcement learning agents. In Conference on Lifelong Learning Agents, (2022), pp. 705–743. PMLR.

26. Ross, Gordon, Bagnell, A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, (2011), pp. 627–635. JMLR Workshop and Conference Proceedings.

27. Samvelyan, Kirk, Kurin, Parker-Holder, Jiang, Hambro, Petroni, Kuttler, Grefenstette, Rocktaschel, MiniHack the planet: A sandbox for open-ended reinforcement learning research, arXiv preprint arXiv:2109.13202, 2021.

28. Schrittwieser, Antonoglou, Hubert, Simonyan, Sifre, Schmitt, Guez, Lockhart, Hassabis, Graepel, et al., Mastering Atari, Go, chess and shogi by planning with a learned model, Nature 588(7839) (2020), 604–609.

29. Silver, Huang, Maddison C.J., Guez, Sifre, Van Den Driessche, Schrittwieser, Antonoglou, Panneershelvam, Lanctot, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529(7587) (2016), 484–489.

30. Vinyals, Babuschkin, Czarnecki W.M., Mathieu, Dudzik, Chung, Choi D.H., Powell, Ewalds, Georgiev, et al., Grandmaster level in StarCraft II using multi-agent reinforcement learning, Nature 575(7782) (2019), 350–354.

31. Zheng, Verma, Zhou, Tsang I.W., Chen, Imitation learning: Progress, taxonomies and challenges, IEEE Transactions on Neural Networks and Learning Systems (2022), pp. 1–16.

do_not_hit_stone:	∀x ∀y Agent(x) ∧ Stone(y) ∧ AreClose(x, y) ⇒ ¬Move(x, y)
attack_enemies:	∀x ∀y Agent(x) ∧ Enemy(y) ∧ AreClose(x, y) ⇒ Attack(x, y)
move_to_key:	∀x ∀y Agent(x) ∧ Key(y) ∧ AreClose(x, y) ⇒ Move(x, y)
do_not_repeat_action:	∀x ∀y Agent(x) ∧ Action(y) ∧ LastAction(x, y) ⇒ ¬Perform(x, y)
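The rules listed above can be read as constraints on the agent's next action. The following sketch, which is not taken from the LuckyMera code base, encodes them as an action filter on a toy grid: `AreClose` is interpreted as 8-neighbour adjacency, the prohibitions (¬Move, ¬Perform) remove candidate actions, and the two implications with a positive consequent (Attack, Move) restrict the choice to the obligated actions whenever any of them survive the prohibitions. The grid, the action encoding, the function names and the precedence of prohibitions over obligations are all assumptions.

```python
from typing import Dict, Optional, Set, Tuple

Pos = Tuple[int, int]
Action = Tuple[str, Pos]  # e.g. ("move", (1, 0)) or ("attack", (0, 1))


def are_close(a: Pos, b: Pos) -> bool:
    """AreClose(x, y): the two cells are 8-neighbours (Chebyshev distance 1)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1


def filter_actions(agent: Pos, entities: Dict[Pos, str], last_action: Optional[Action]) -> Set[Action]:
    """Return the actions the agent may take after applying the four rules."""
    neighbours = [
        (agent[0] + dx, agent[1] + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]
    candidates: Set[Action] = {(kind, p) for p in neighbours for kind in ("move", "attack")}

    forbidden: Set[Action] = set()
    required: Set[Action] = set()
    for pos, kind in entities.items():
        if not are_close(agent, pos):
            continue
        if kind == "stone":
            forbidden.add(("move", pos))   # do_not_hit_stone
        elif kind == "enemy":
            required.add(("attack", pos))  # attack_enemies
        elif kind == "key":
            required.add(("move", pos))    # move_to_key
    if last_action is not None:
        forbidden.add(last_action)         # do_not_repeat_action

    allowed = candidates - forbidden
    # Obligations only narrow the choice if at least one of them is still allowed.
    obligated = required & allowed
    return obligated if obligated else allowed


world = {(1, 0): "stone", (0, 1): "enemy", (5, 5): "key"}
print(filter_actions((0, 0), world, last_action=None))  # {('attack', (0, 1))}
```

Giving prohibitions precedence means that, for example, an enemy that was attacked on the previous step is not attacked again on the next one; letting obligations win instead would be an equally valid design choice with different behavior.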