Abstract
Thinking to oneself is a prerogative of man when he needs to think about or repeat what he is doing or experiencing. It is a way of processing information and setting in motion a decision-making process. When this is done aloud, there is also a chance that someone else will understand the meaning or reasons for the action. Equipping an agent with the ability to reveal the reasons for its decisions is both a way to improve human interaction and a way to improve the triggering of a decision process. In this work, we propose to use the speech act to enable a coalition of agents to exhibit inner speech capabilities to explain their behavior, but also to guide and reinforce the creation of an inner model. The BDI agent paradigm, Jason, and CArtAgO are used to give agents the ability to act in a human-like manner. The BDI reasoning cycle has been extended to include inner speech. The proposed solution continues the research path that started with the definition of a cognitive model and architecture for human-robot teaming interaction and aims to integrate the believable interaction paradigm in it.
Introduction
Agents think, agents act, agents explain. That’s the dream, to have systems that work with us in a self-adaptive, self-autonomous, self-explanatory way. When working in a team, understanding why the other person is performing a particular action is fundamental. Getting an agent to make a decision that takes into account capabilities such as autonomously choosing the best action, updating its own knowledge in pursuit of specific goals, or observing the behavior of others therefore presents a number of challenges.
In a collaborative task, people and agents cooperate to achieve a common goal. They share goals, knowledge about the world and about themselves (skills, abilities, etc.), assignments and tasks, and they communicate to share information. They also communicate to delegate or commit to actions when needed, for example when a team member understands what they are supposed to do but realizes they are not able to do it. Factors such as trust, mental states (the inclination or willingness to perform an action), and knowledge derived from the other person’s ability to explain the reasons for their actions or outcomes must also be considered.
The interaction between humans and agents also raises issues that are still being explored from a modeling and implementation perspective. Indeed, it is not possible to define in advance the characteristics of a system that evolves at execution time, nor to fully explore its requirements. An intelligent system therefore needs to analyze the input produced by its own actions and use it either to explain those actions or to repeat the reasoning process. Our idea is based on the creation of a cognitive model and the corresponding agent architecture, whose modules allow the structuring of the agent’s decision-making process, taking into account its internal states. In the past, we have combined aspects of self-modeling and Theory of Mind to integrate the elements of interaction mentioned earlier. At the implementation level, we have looked at the performance of the BDI agent paradigm [17, 28, 29] and at Jason [6, 7], one of the best known and most widely used agent languages.
In the present work, we go one step further and propose a model that gives the agent the ability to reason aloud, i.e., to externalize its reasoning and thus make its actions understandable to humans. People are able to talk to themselves, either aloud or in a quiet voice, to improve their understanding of what they are doing, self-regulate their behavior, and increase their knowledge of what surrounds them by using the feedback that so-called inner speech gives them. The main contribution of this paper is an architectural approach that identifies the modules in which inner speech can be implemented and then bridges to a specific implementation using Jason and CArtAgO. Thus, we propose to combine the concept of inner speech with that of the speech act, together with a further extension of the BDI reasoning cycle, to bridge the gap between the cognitive model and its implementation. We then implement the newly introduced inner speech capabilities using Jason for agent programming and CArtAgO [26, 30] for environment programming. In this work, we have relied heavily on the properties of Jason and CArtAgO to elaborate our theory. Therefore, we provide an initial validation of the work using these tools at the end of the paper. The full development of a scenario in which an agent system is able to exhibit inner speech in human interaction is deferred to future work, where we will also address the methodological part of system development. Here we illustrate the architectural model and show how the BDI reasoning cycle can also be extended from an implementation perspective.
The rest of the paper is organized as follows: in section 2, we describe the work done so far in this context and then, in sections 3 and 4, we give an overview of the concepts of inner speech and speech acts with the aim of highlighting their characteristics and the motivation for using them; in section 5, the model we propose; in section 6 the validation scenario of the model with its code; and finally, in section 7 some conclusions are drawn.
Decision Process in Human-Agent Interaction
Adaptivity, autonomy, and proactivity are the qualities or skills that each person must possess in order to decide what actions to take to achieve the desired results in reaching a goal. Whether it is humans or agents (or robots), someone must have the ability to make appropriate decisions, and if it is a human-agent interaction, the agent must have the ability to select an action and decide whether to perform it by itself or delegate it to someone else.
In [11] we proposed an architecture for modelling a human-robot team system. It is based on the standard model of mind [20], which takes into account the classical MAPE process (Monitor, Analyze, Plan, Execute) in the thought cycle of any intelligent system [1]. To this were added the modules for representing autonomous and adaptive interactions. We assumed that the inputs to the thought process are mainly motivations. Motivations are at the heart of the decision process and are the elements that set it in motion. They include all the information and processes that represent the internal world of an agent: its beliefs, desires, intentions, knowledge and skills, but also norms, rules, emotions, the degree of trust in the other, and anything else that can serve as an input for an action. We have considered external (objects, resources, …) and internal (state of mind, emotions, …) environments. An agent acts in the external environment to achieve its goal, and also uses external resources and beliefs about its internal world to make decisions. Decision making often requires replanning, so anticipation is the process of generating a “current situation,” i.e., the state of the world, for a selected possible action. It makes it possible to imagine the outcome of an action and to actually perform it only if the outcome is consistent with the postconditions of the chosen goal. Dealing with all these elements gives the agent the possibility to justify and explain the result of an action.
In [12, 32] we proposed a prototype that implements some modules of the described architecture using the BDI paradigm and the Jason language, together with a simple modification of the reasoning loop underlying the Jason interpreter that incorporates general motivations into the language, much as Jason does when managing beliefs.
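The anticipation step described above can be illustrated with a minimal sketch. It assumes that a world state is encoded as a set of true facts and that each action comes with a forward model of its effect; the names `Action`, `anticipate`, and `should_execute` are hypothetical and do not come from the architecture in [11].

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

State = FrozenSet[str]  # assumed encoding: a world state is a set of true facts


@dataclass(frozen=True)
class Action:
    name: str
    effect: Callable[[State], State]  # hypothetical forward model of the action


def anticipate(state: State, action: Action) -> State:
    """Imagine the situation the action would produce, without acting."""
    return action.effect(state)


def should_execute(state: State, action: Action, postconditions: State) -> bool:
    """Perform the action only if the imagined outcome meets the goal's postconditions."""
    return postconditions <= anticipate(state, action)


place_fork = Action("place_fork", lambda s: (s - {"holding(fork)"}) | {"on(fork, table)"})
drop_fork = Action("drop_fork", lambda s: (s - {"holding(fork)"}) | {"on(fork, floor)"})

state = frozenset({"holding(fork)"})
goal_postconditions = frozenset({"on(fork, table)"})
```

With these definitions, `should_execute(state, place_fork, goal_postconditions)` holds while the same check for `drop_fork` does not, and `state` itself is never modified by the simulation.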
In [9] we used the trust model developed by Falcone and Castelfranchi [10, 14] in conjunction with practical reasoning theory to build a model for robot decision making. The idea was to extend the deliberative process and belief base representation so that the robot can decompose a plan into a series of actions and then associate each of these actions with knowledge useful for their execution. The function for computing justification is:
where $\alpha_i$ is the selected action and $B_{\alpha_i}$ is the set of beliefs the agent holds about that action.
In this way, the robot creates and maintains a model of itself and can justify the results of its actions. Justification is a key outcome of the application of self-modelling capabilities and a useful means of improving trusting interactions. At the beginning of a BDI reasoning cycle, the agent (a Jason agent) updates all its beliefs and intentions, determines its desires, and selects some of them that become intentions to which it assigns a plan (line 1 to line 8 in Figure 1). After that, the agent usually processes the stack of actions to decide which ones to execute. We propose to insert a new function at this point:

which allows us to link part of the belief base to the robot’s capabilities. The set of actions $A_c$ is identified among all those for which at least one belief in the set of beliefs matches an ability of the agent. In this way, each time an agent does not know how to perform an action, or fails at an action, it can reason about why.
In [33] we added the possibility to change the state of mind of the agent. After evaluating actions, the agent activates its inner speech to change its mind and its views about the environment. On the implementation side, we used speech acts and Jason. Here we provide another extension to the implementation to better handle the external and internal world. We propose to use JaCaMo [5, 6], where the separation between the agent, interaction, and environment levels makes it easier to create MASs in which agents can explain their actions.
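As an illustration of the selection of $A_c$ described above, the following sketch gives one plausible reading of the rule: an action belongs to $A_c$ when at least one belief the agent holds about it matches one of its abilities. The dictionary encoding and the function names are assumptions made for the example.

```python
from typing import Dict, Set


def select_capable_actions(beliefs_about: Dict[str, Set[str]], abilities: Set[str]) -> Set[str]:
    """Return A_c: the actions with at least one belief matching an ability."""
    return {action for action, required in beliefs_about.items() if required & abilities}


def reason_about(action: str, beliefs_about: Dict[str, Set[str]], abilities: Set[str]) -> str:
    """Produce a short explanation of why the action is or is not in A_c."""
    required = beliefs_about.get(action, set())
    matched = sorted(required & abilities)
    if matched:
        return f"I can {action} because I am able to {', '.join(matched)}."
    return f"I cannot {action} because I lack: {', '.join(sorted(required))}."


# Hypothetical belief base: what the agent believes each action requires.
beliefs_about = {
    "pick(glass)": {"grasp", "see"},
    "pour(water)": {"pour"},
}
abilities = {"grasp", "move"}
a_c = select_capable_actions(beliefs_about, abilities)
```

Here `a_c` contains only `pick(glass)`, and `reason_about("pour(water)", beliefs_about, abilities)` yields a sentence the agent could utter to explain why it does not pour.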
To our knowledge, there is no comparable approach in the literature, especially in the field of agents and robotics, that combines inner speech with speech acts, both conceptually and in terms of implementation. In what follows, we first provide an overview of inner speech and speech acts, then introduce JaCaMo and explain how its elements are used to implement inner speech.
A definition of inner speech is very complex. Some authors define inner speech as the subjective experience of speech in relation to one’s actions, feelings, and experiences. It is often used as a synonym for thinking. Most authors in the literature state that it is better to use the term mental process instead.
The term inner speech has several connotations in the literature. First, it is a concept developed and used in psychology. It is often associated with the concept of self-awareness and self-consciousness [23, 34]. Self-awareness is the ability to become the object of one’s own attention [13] and considers three possible elements: the social environment, the physical world, and the self. Self-awareness also includes the ability to direct attention to one’s mental state, i.e., perceptions, feelings, attitudes, intentions, emotions, etc.
Morin was the first to relate the concept of inner speech to the ability to think about oneself [24].
Inner speech is a concept used primarily in cognitive development and executive function theory. In a number of foundational studies, inner speech has been linked to cognition and behavior, or even to rehearsal and working memory. It is a form of reflection on one’s own experiences. People generally reflect on their own experiences in different ways [4]. Inner speech plays an important role in self-regulation of cognition and behavior, so it is used in people for both data collection and behavior regulation, and is also considered a motivational tool. In each case, inner speech is recognized as a feature of the developmental process. Thus, it provides the output to activate a developmental process.
According to Morin, inner speech is the activity of silent self-talk. There are many synonyms or near equivalents, with some differences between adults and children: self-talk, inner dialog, private speech, egocentric speech, and self-verbalization. The latter two generally refer to children’s behavior of commenting on their own actions without caring whether they are understood. We are interested in the inner speech that serves self-control, self-regulation, problem solving, planning, and remembering. Another important point is that inner speech serves information gathering or, rather, self-information, as Morin and Everett call it. As mentioned earlier, a person who needs to perform an action may talk to himself. This conversation helps him identify data and processes, that is, how to act and on which elements. We claim that at this moment of the thinking process one can externalize one’s thinking, making it explainable or transparent. Our work aims to reproduce this ability in agents.
The sources of internal speech are the social milieu and the physical environment. For our purposes, however, we need only consider the latter. Indeed, the physical environment contains a set of stimuli that enable a person, in our case an agent, to think about the stimuli emanating from the external environment and to make appropriate decisions. Using inner speech or noisy inner speech during collaborative and interactive tasks can be a way to make cognitive and decision-making processes transparent. And it is also a way to perform verbal mediation with oneself to support specific activities. Especially with children, inner speech is often seen or used as an accompaniment or constant commentary on what they are doing. This is exactly what we want to do.
In this paper, we take a cue from [37] and propose to allow an agent to comment aloud on what it is doing, that is, on the actions it decides to take as it acts. Vygotsky also says in [36] that verbal accompaniment of actions is much more pronounced in tasks that present extreme difficulties. Since an agent must make decisions in a complex and dynamic environment for which it does not have a precise plan, modeling and implementing the agent’s inner speech can lead to a more efficient decision-making process.
Human inner speech is an almost instinctive tendency that emerges during the developmental phase. We use it from a theoretical point of view to create a computational model on which to build agent behavior that is transparent during human interaction. Our idea is to draw a parallel with the inner speech of the developmental process in children. We are mainly concerned with the development and design of robots that interact autonomously with humans. The robot, working in an environment that is not fully known, is in some ways like a developing child. In a previous work [27], we conducted experiments to measure whether and to what extent human trust can increase when interacting with robots. In this work, we take this as a given and attempt to develop an intelligent agent capable of explaining its actions through inner speech, focusing on a theoretical model to extend the BDI reasoning cycle, which is then implemented through speech acts.
Speech acts
The multi-agent paradigm provides a set of features that allow agents to talk to other agents. Communication between agents changes the state of the world of each agent that receives a new message. In general, this behavior can be traced back to the speech act theory proposed by Searle [31] and Austin [2].
Bordini et al. [8] described the speech act and how it works in an agent communication module. The basic principle of speech act theory lies in the meaning of speech, and can be summarized by regarding speech as an act. An artificial agent uses utterances to inform other agents of changes or to exchange new information that affects the surrounding world. The utterance produced by an agent effectively changes the state of the world in terms of the beliefs, desires, and intentions possessed by a listening agent. In Austin’s framework [2], speech acts are locutionary, illocutionary, and perlocutionary. Locutionary acts concern what was said and meant, illocutionary acts represent what was done, and perlocutionary acts describe what happened as a result. Following this theory, Searle [31] identified different types of speech acts: (i) representatives, (ii) directives, (iii) commissives, (iv) expressives, and (v) declarations. Theoretically, speech acts consist of two parts: (i) a performative verb and (ii) a propositional content.
A multi-agent system supports basic speech acts in its communication module by adopting agent communication languages [35], such as the Knowledge Query and Manipulation Language (KQML) [15, 16] or FIPA [3, 18, 25]. For example, as described in [8], speech acts can be used in Jason through a set of internal actions that allow agents to communicate with each other.
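According to Bordini [8], the BDI framework Jason is able to perform illocutionary speech acts, also called performative speech acts. This is possible thanks to a set of internal actions that each agent possesses. Indeed, the reasoning cycle of a Jason agent starts with checking for incoming messages. Each message an agent receives has the structure ⟨sender, illocutionary force, content⟩, where sender is the name of the agent that sent the message, illocutionary force is the performative expressing the sender’s intention (for example tell, untell, or achieve), and content is the AgentSpeak term to which the performative applies.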
Given the list of all possible illocutionary speech acts, the point is to use them to let the agent speak to itself and thus show the capacity for inner speech. In this way, inner speech can change the agent’s mental state, as well as its intentions and behavior. Moreover, inner speech, by definition, makes the agent’s decision-making process explainable.
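The idea of an agent addressing a speech act to itself can be sketched as follows. This is a simplified model written for illustration, not Jason code: it assumes that a message carries a sender, a performative, and a content, and it handles only the `tell` and `untell` performatives; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Message:
    sender: str
    performative: str  # illocutionary force; only "tell" and "untell" are modelled here
    content: str


@dataclass
class Agent:
    name: str
    beliefs: Set[str] = field(default_factory=set)
    mailbox: List[Message] = field(default_factory=list)
    spoken: List[str] = field(default_factory=list)  # what the agent has said aloud

    def send(self, receiver: "Agent", performative: str, content: str) -> None:
        receiver.mailbox.append(Message(self.name, performative, content))

    def say_to_self(self, performative: str, content: str) -> None:
        """Inner speech: voice the message and address it to the agent itself."""
        self.spoken.append(f"{self.name}: {performative} {content}")
        self.send(self, performative, content)

    def check_mail(self) -> None:
        """Start of the reasoning cycle: process pending messages into beliefs."""
        while self.mailbox:
            msg = self.mailbox.pop(0)
            if msg.performative == "tell":
                self.beliefs.add(msg.content)
            elif msg.performative == "untell":
                self.beliefs.discard(msg.content)


pepper = Agent("pepper")
pepper.say_to_self("tell", "holding(fork)")
pepper.check_mail()
```

After this cycle, `pepper.beliefs` contains `holding(fork)` and `pepper.spoken` records the utterance, so the same message both updates the agent's mental state and is available to an outside observer.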
Agents are able to explain themselves: the proposed approach
Suppose we have a situation where, in pursuing a collaborative task, we need to consider the following representation:
Suppose we know that there is a set of objects in the environment and all the actions associated with them, which are also accompanied by norms and rules for their use. For example, suppose the object is an apple, the action is eating, and they are connected by the belief that realizes the norm "not". Consequently, the action "eating" cannot be applied to the object "apple". With reference to equations (1) and (2), we have the means to select the set of actions related to an object and to what the agent can do (its abilities, e.g., to take or move objects, which we do not include here for simplicity) and then to justify the choice. Ideally, we want to achieve a situation where, when we tell the agent to "eat the apple", the agent weighs, decides, and responds "I cannot eat the apple because I am an agent", or "because I am not a human", or gives other more complex reasons. It would just depend on how complete the agent’s knowledge and representation are.
Figure 2 shows the process that connects the modules in the architecture mentioned in [21]. Start and end points are not shown because an agent’s reasoning process is continuous. An agent thinks by processing inputs from the environment, both external and internal. It selects actions from the set provided by the designer. Once the action is selected, the two functions mentioned in section 2 are activated to create a rationale for the actions. Then the agent observes the effects of its actions and the resulting changes in the external world, updates its beliefs through perception, and activates the justification function. Justification is simply an explanation for the action and focuses only on the results. The step forward is to add new beliefs or modify the existing ones to account for the effects of observing the inner world through inner speech, that is, roughly speaking, the effects of what the agent thinks.
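The apple example can be made concrete with a small sketch. It assumes that norms are stored as prohibitions indexed by an (action, object) pair, each carrying the reason to be voiced; this encoding and the `respond` function are illustrative assumptions, not the representation used in the implemented system.

```python
from typing import Dict, Set, Tuple


def respond(
    action: str,
    obj: str,
    prohibitions: Dict[Tuple[str, str], str],
    abilities: Set[str],
) -> str:
    """Weigh a request against norms first, then against the agent's abilities."""
    reason = prohibitions.get((action, obj))
    if reason is not None:
        return f"I cannot {action} the {obj} because {reason}."
    if action not in abilities:
        return f"I cannot {action} the {obj} because I do not know how to {action}."
    return f"I can {action} the {obj}."


# Hypothetical knowledge: the norm "not" connecting eating and the apple.
prohibitions = {("eat", "apple"): "I am an agent"}
abilities = {"take", "move"}

print(respond("eat", "apple", prohibitions, abilities))
print(respond("take", "apple", prohibitions, abilities))
```

How rich the answer can be depends only on how much the prohibition table and the ability set capture, which mirrors the remark about the completeness of the agent's knowledge.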

Figure 2: Key ideas for implementing inner speech in agents.
Inner speech is included by further modifying the BDI reasoning cycle according to Algorithm 1.
For each selected action, the agent considers whether the action can be performed by evaluating the pre- and post-conditions of the action with reference to the knowledge base. It then revises its beliefs and desires using the rehearsal function. Observation of events through inner dialog feeds back into the agent’s mind, resulting in new desires and an updated set of beliefs. A BDI agent does the same when it uses communication with other agents to effect changes in their beliefs. In this case, the agent sends a message to itself as a result of rehearsing.
The BDI agent paradigm and the Jason cycle lend themselves well to the development of the process in Fig. 2. At runtime, a BDI agent consists of the belief set, plan set, intention set, event set, and action set, as well as the choice functions managed by the interpreter: $S_E$, $S_O$, and $S_I$ [7].
Events can be external or internal. External events are generated by the perception of the environment and correspond to the deletion or addition of beliefs. Internal events are generated by the agent when executing a plan and do not affect the environment. Therefore, these events are most closely associated with the execution phase. Some examples are the addition of achievement or test goals.
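A minimal sketch of the rehearsal step described above follows. It is not a transcription of Algorithm 1; it assumes that each action declares its preconditions together with the beliefs it is expected to add and delete, and that rehearsing returns both the revised beliefs and the sentence the agent says to itself. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]     # beliefs that must hold before acting
    add: FrozenSet[str]     # beliefs expected to hold afterwards
    delete: FrozenSet[str]  # beliefs expected to stop holding


def rehearse(beliefs: Set[str], action: Action) -> Tuple[Set[str], str]:
    """Check preconditions, then return the revised beliefs and the inner utterance."""
    missing = action.pre - beliefs
    if missing:
        utterance = f"I cannot {action.name}: I do not believe {', '.join(sorted(missing))}"
        return set(beliefs) | {f"failed({action.name})"}, utterance
    utterance = f"I can {action.name}, so afterwards {', '.join(sorted(action.add))}"
    return (set(beliefs) - action.delete) | action.add, utterance


put_fork = Action(
    name="put(fork)",
    pre=frozenset({"holding(fork)"}),
    add=frozenset({"on(fork,table)"}),
    delete=frozenset({"holding(fork)"}),
)
new_beliefs, said = rehearse({"holding(fork)"}, put_fork)
```

In this sketch the utterance plays the role of the self-addressed message: it is produced in the same step that revises the beliefs, so the explanation and the belief update cannot drift apart.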
Our idea is to process internal events and link them to speech acts to generate an explanation for the agent’s actions. The plan represents how the agent should act to achieve a given goal, taking into account the belief, the action to be performed, and the goal to be achieved. Thus, plans are the elements that enable inner discourse.
Our idea finds its natural implementation in speech acts. In an agent system, communication is based on the theory of speech acts [31], which is briefly described in section 4. Since inner speech is a way of thinking about oneself, we claim that a speech act can address the agent itself under certain conditions. We assume that each agent, in each plan, can send a message to itself in addition to taking actions to achieve the team’s goals. This message aims to change the agent’s beliefs or goals and, at the same time, is made available to the outside world.
In our approach, agents think about an internal and an external world. The separation of these two aspects is fundamental because, in order to achieve a goal, the environment must be changed, and this is done by acting on objects and resources. At a higher level of abstraction, the decision to perform an action on an object implies thinking about the norms of that object, the ability to perform the action, etc. In equation (2), the set of actions is determined based on all available beliefs about an action and about the agent’s capabilities. For these reasons, we decided to use the JaCaMo platform. JaCaMo integrates, at the programming level, the three dimensions of a MAS (agent, environment, and organisation) and therefore combines the functions of Jason, CArtAgO, and Moise. Jason draws on BDI-based logic and its interpreter realises a BDI reasoning cycle; CArtAgO provides the infrastructure for environment programming; and Moise provides an organisation modelling language. In this work we focus only on the agent and the environment; the organisation level will be considered in future work. The main element in CArtAgO is the artifact, an entity for structuring the environment. Agents and artifacts are contained in workspaces, which are the logical place where agents interact with the environment. An artifact has a set of observable properties, which represent the state that an agent can perceive, as well as a set of operations, i.e., all the actions that an agent can perform on the artifact.
Fig. 3 is redrawn from [5] and shows the elements of the Jason and CArtAgO metamodels involved in the realization of inner speech. In particular, the agent uses an operation to act on an artifact, and the result of the operation changes the state of an observable property, leading to a change in the belief base through another action, namely a speech act. Through beliefs, we can link actions to the environment and thus have or gain knowledge about what actions can or cannot be performed and on what objects they can occur.
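The chain from operation to observable property to belief can be sketched in plain Python. This is only a model of the relations in the metamodel, not CArtAgO code: the `Artifact` class, its `focus` method, and the callback signature are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List


class Artifact:
    """An environment entity exposing observable properties and operations."""

    def __init__(self, name: str, **properties: Any) -> None:
        self.name = name
        self.properties: Dict[str, Any] = dict(properties)
        self._observers: List[Callable[[str, str, Any], None]] = []

    def focus(self, callback: Callable[[str, str, Any], None]) -> None:
        """Register an agent callback that is notified of property changes."""
        self._observers.append(callback)

    def _update(self, prop: str, value: Any) -> None:
        self.properties[prop] = value
        for notify in self._observers:
            notify(self.name, prop, value)


class Table(Artifact):
    def put(self, item: str) -> None:
        """Operation: place an item, which changes the observable property 'items'."""
        self._update("items", self.properties["items"] + [item])


beliefs: Dict[str, Any] = {}
inner_speech: List[str] = []


def on_change(artifact: str, prop: str, value: Any) -> None:
    # The agent tells itself what it observed and revises its belief base.
    inner_speech.append(f"tell(self, {artifact}.{prop} = {value})")
    beliefs[f"{artifact}.{prop}"] = value


table = Table("table", items=[])
table.focus(on_change)
table.put("fork")
```

After `table.put("fork")`, the belief `table.items` equals `["fork"]` and `inner_speech` holds the corresponding self-addressed utterance, which is the link between environment and explanation that the figure describes.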

Figure 3: The Jason and CArtAgO metamodel and its relations with the elements realising inner speech.
To validate and test the approach, we chose a collaborative task: a multi-agent system for setting a table. The overall experimental setup we are working on in our lab aims to investigate the role of trust in humans when robots exhibit inner speech capabilities (see [27] for details). In this section, we use a small set of specifications. The system we want to develop should help humans set a table by performing actions or delegating them when necessary. Setting a table is a very simple task that consists of selecting the right elements for the table and positioning them in the right place. The set of elements can vary. Each element must be placed in a specific location on the table and in a specific position relative to the other elements. There are some specific etiquette norms that can be followed by both the human side and the robot side. In the implementation, we assume that the robots run a multi-agent system. The mission, i.e., the scope of action, is fixed at the time of development; there is no dynamism or uncertainty in the pursuit of the goal. Humans and robots must pick up glasses, plates, and cutlery and position them correctly. In doing so, they must observe each other and decide whether to perform or delegate an action depending on what is happening. For example, if an agent wants to assign a location but it is unavailable, it can ask the human to take over the task. Also, an agent may receive orders from humans that it must evaluate before executing. There can be many reasons why an action cannot be executed. They depend on the norms of the environment and on the capabilities of each agent, which we can make arbitrarily complicated. During the execution of the mission, before performing an action, the agent must explain why it will perform that action. At the end of each action, the agent must update its knowledge base for further decisions.
In this section we describe a small part of this system, namely the one devoted to the representation of inner speech. We use only two agents and do not consider the presence of the human. This simplification does not affect the result of the validation phase. For the same reasons, it is not necessary to use the organisation level of a MAS, so we do not use the Moise part of JaCaMo, but only Jason and CArtAgO.
After identifying the agents, the second step was to outline their knowledge. We considered only a simple belief base at the agent level and focused mainly on the environment level. For this level, we identified a workspace and three artefacts. The workspace is the space in which agents can interact with artefacts. The identified artefacts are a blackboard, a table, and a cutlery container. They are summarised in the following table, along with some of the observable properties we identified, which are shown in Fig. 4. For a complete overview of the code produced, see https://github.com/roboticslabunipa/WOA2022.

Figure 4: Portion of code related to the definition of the observable properties of artifacts.
Details and description of agents, workspace and artefacts.
The blackboard contains the list of tasks that the agents must perform, as well as all the information needed for the correct execution of the tasks. In this case, we have included the normative part of the environment in this artefact. A more complex and efficient model can be created using a specific methodological approach, but this is beyond the scope of this paper. For our purposes, then, it is not important exactly which artefacts are listed, but only that each artefact contains the correct observable properties and operations for the agent to do its job. For example, we could identify an artefact for each piece of cutlery and describe in detail all the properties and operations for each. We defer the full version of the example to future work, where we will also discuss the methodological approach to designing and implementing self-explanatory agents.
Figure 5 shows parts of the code related to the justification of actions and the activation of speech acts. In the first part, we see the agent reasoning about the preconditions of the actions and outputting a string containing the elements of its justification. In the second, the speech acts (.send(...)) are executed with the associated performatives to change beliefs. The figure shows the change in knowledge about an object that the agent first holds and then releases.

Figure 5: (left) Portion of code related to the implementation of the justify function to display the agent’s explanation. (right) Plans for atomic actions and subsequent inner speech through speech acts. Agents present ad-hoc plans to perform actions on artefacts and update their beliefs with the results of those actions.
Fig. 6 shows a descriptive diagram of an agent’s reasoning cycle and highlights the aspects introduced to enrich it. To trace how a cycle proceeds, we take the hypothetical case where the pepper agent must execute the plan to place an object on the table. Assuming that the agent has already updated its beliefs and selected the event associated with the plan, execution proceeds as follows. The agent cycle begins with the deliberation phase, which first selects the relevant plans and then identifies the applicable ones depending on the context. After determining the applicable plan for the action required to complete the task, the agent can provide a rationale for why it can perform that action before moving on to the actual execution. Next comes the action phase, whose purpose is to evaluate and execute the action. Execution moves to the artefact that provides this operation, in this case the table, which can return feedback that is then evaluated in the intention update phase. Before the agent completes this single cycle, it must evaluate the next formula in the plan body, which involves sending a message. This is an "internal" message, since the agent sends it to itself and uses it to update its own beliefs as a result of the action performed. Here we can also see how the thinking process improves: rehearsal adds data to the agent’s thinking in the form of beliefs about actions. By improvement, then, we mean an increased ability to consider facts and events. In the figure, the dashed cayenne-red line emanating from the Execute Intention block and splitting toward the Update Intention and (above) Check Post blocks outlines the activation of the inner speech process associated with the agent’s reasoning cycle in the Execute phase. Here we can see how the elements of the metamodel in Fig. 3 are implemented in the agent.

Figure 6: Schematic diagram of the execution of a reasoning cycle of the Jason interpreter with its extensions.
Interest in systems capable of self-adaptation and self-awareness has been increasing rapidly in recent years. Equipping robots or agents with cognitive capabilities is certainly the next breakthrough in the field of artificial intelligence, and more and more scientists are talking about machines that can behave like humans. In this paper, we present a possible solution to equip agents, robots, and intelligent systems with inner speech capabilities. The deliberations that a human makes before taking an action or making a decision are a key moment for adaptive and autonomous behavior, especially when working in teams. Humans have developed the ability to put themselves at the center of their thinking and activate so-called inner speech to regulate and control their behavior. The idea we present in this article is a preliminary approach that uses the concept of the speech act to implement inner speech in agents. From a technological point of view, we have experimented with BDI agent technology on the way from theory to implementation and have obtained good results in terms of the simplicity of handling some specific design abstractions of Jason and CArtAgO. It is important to note that the .send primitive is used explicitly in the example in this paper. One could instead hide the soliloquy function inside the interpreter, so that the agent code does not have to change every time something new is implemented. There is also a need for an ontology that represents all the consequences of each action performed, so that the knowledge base can be automatically (and correctly) updated. Without such an ontology, some sort of list of possible alternatives resulting from the various actions would have to be inserted into the interpreter’s code. However, this remains a viable option that can be implemented at a later time, when a more complex and structured scenario is available.
Currently, the implementation of the inner speech module is designed for the simple validation scenario and does not consider the use of a full ontology, which is one of the crucial points of this approach.
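To illustrate what hiding the soliloquy inside the interpreter could look like, the sketch below moves the self-addressed messages out of the plans and into a single `execute` step. The consequence table is a hand-written stand-in for the ontology discussed above, and the `Interpreter` class is a hypothetical simplification, not the Jason interpreter.

```python
from typing import Dict, List, Set, Tuple

# Stand-in for an ontology: for each action, belief templates to remove and to add.
CONSEQUENCES: Dict[str, Dict[str, List[str]]] = {
    "put_on_table": {"remove": ["holding({0})"], "add": ["on_table({0})"]},
}


class Interpreter:
    def __init__(self, beliefs: Set[str], consequences: Dict[str, Dict[str, List[str]]]) -> None:
        self.beliefs = set(beliefs)
        self.consequences = consequences
        self.transcript: List[Tuple[str, str]] = []  # inner speech made observable

    def _tell_self(self, performative: str, content: str) -> None:
        self.transcript.append((performative, content))
        if performative == "tell":
            self.beliefs.add(content)
        else:  # "untell"
            self.beliefs.discard(content)

    def execute(self, action: str, argument: str) -> None:
        """Run an action; the soliloquy is triggered here, not in the agent's plans."""
        effects = self.consequences.get(action, {})
        for template in effects.get("remove", []):
            self._tell_self("untell", template.format(argument))
        for template in effects.get("add", []):
            self._tell_self("tell", template.format(argument))


interpreter = Interpreter({"holding(fork)"}, CONSEQUENCES)
interpreter.execute("put_on_table", "fork")
```

With this design, adding a new action only requires a new entry in the consequence table, while the plans stay free of explicit self-messages; the trade-off is that the quality of the belief update depends entirely on how complete that table is.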
The proposed approach builds on and extends our earlier work by giving robots the ability to justify the results of their actions. The ability to explain what is done and why is the focus of our work, which aims to create agents that are reliable, explainable, and believable. We believe that the method we use and the cognitive model that underlies our work can also incorporate other elements such as emotions, mental states, and even moral and ethical values into the reasoning process. This will be the topic of our future work.
The advantage we took from our choices was that we could easily combine the theoretical model with its practical implementation, using the reasoning cycle of the Jason interpreter and the programming of the CArtAgO environment. However, a challenge that we highlighted during our work and that will be the subject of our future work is that the transition from the cognitive model to the implemented model requires a careful and precise methodological approach to the design.
Another element to consider is the completeness and accuracy of the agent’s knowledge representation. Indeed, it is necessary that the elements of the environment are correctly associated with all the actions that can be performed and the rules for their activation. Ontologies, which can be found in the literature and for which a process of conversion to belief is necessary, can help in this regard, and their integration will be part of our future work.
Finally, we plan to further validate the proposed approach using a more complex scenario and complement it with a rigorous methodological approach.
Acknowledgments
The research leading to these results was carried out within the project funded under award number FA9550-19-1-7025 by the Air Force Office of Scientific Research.
Conflicts of interest
The authors declare no conflict of interest.
