Abstract
In recent years, the growing use of Intelligent Personal Agents in different human activities and domains has led research to focus on the design and development of agents that are not limited to interaction with humans and the execution of simple tasks. The latest research efforts have introduced Intelligent Personal Agents that use Natural Language Understanding (NLU) modules and Machine Learning (ML) techniques to hold complex dialogues with humans, execute complex plans of actions and effectively control smart devices. To this end, this article introduces the second generation of the CERTH Intelligent Personal Agent (CIPA), which is based on the RASA framework and uses two machine learning models for NLU and dialogue flow classification. CIPA-Generation B provides a dialogue-story generator based on the idea of adjacency pairs, together with multiple-intent models that classify complex sentences containing two user intents into two automatic operations. More importantly, the agent can form and execute a plan of actions for implicit Demand-Response, based on the user’s request, by using AI Planning methods. CIPA-Generation B has been deployed and tested in a real-world scenario at the nZEB SmartHome of the Centre for Research & Technology Hellas (CERTH) in two different domains, energy and health, for multiple intent recognition and dialogue handling. Furthermore, in the energy domain, a scenario that demonstrates how the agent solves an implicit Demand-Response problem has been applied and evaluated. An experimental study with 36 participants further illustrates the usefulness and acceptance of the developed conversational agent-based system.
Introduction
Recent advances in Artificial Intelligence (AI), Natural Language Understanding (NLU), Speech Processing and Recognition, and the Internet of Things (IoT) enable advanced human-computer interaction and the translation of human commands and intentions into actions performed by Intelligent Personal Agents (IPAs). Leveraging these advances, IPAs are able to communicate with humans, send them reminders, entertain them, execute custom tasks and plans, and control a large number of SmartHome devices over different communication protocols such as Wi-Fi, ZigBee and Bluetooth. The fast-growing demand for IPAs in everyday life has led market leaders to create a wide variety of commercial IPAs, such as Google Home Assistant, Amazon Alexa and Apple Siri, building on earlier significant research results such as ELIZA [1] and IBM Shoebox [2], which are considered two of the most representative first attempts to create IPAs. In order to interact effectively with humans and assist them, all such systems face three major challenges: (a) natural language understanding, (b) management of the dialogue/conversation flow and (c) effective execution of a series of actions and tasks. To address these challenges, a wide variety of IPA frameworks have been developed. These frameworks equip developers and researchers with tools for designing and developing Virtual Agents/Assistants and chatbots for various domains. RASA, Google Dialogflow, Facebook WIT.AI and Amazon Lex are among the most popular IPA frameworks. Many of these frameworks provide high-level services through web Application Programming Interfaces (APIs), so developers and researchers have access neither to their internal ML models nor to their source code. These features enable the faster development of IPAs.
By exploiting the aforementioned advances, frameworks and research results, CIPA Gen-A was developed and introduced in our previous work [3]. CIPA Gen-A extends the architecture of RASA’s StarSpace [4] adaptation and introduces a dialogue generator for adjacency-pair dialogue scenarios. The work presented in this article is the extended version of CIPA Gen-A, namely CIPA Gen-B. The new agent inherits all the core functionalities of its predecessor, such as multi-intent recognition, the novel dialogue generator based on adjacency pairs and its application in different domains of the CERTH nZEB SmartHome. CIPA Gen-B introduces new functionality for the execution of complex plans by an AI Planner [5]. This feature enables the agent to plan implicit Demand-Response (DR) actions, based on its conversation with a resident of the SmartHome, towards increasing a system’s efficiency (e.g., in terms of energy consumption management). To the authors’ knowledge, the proposed agent-based system is the first of its kind to combine methods for NLU multi-intent recognition, dialogue generation for adjacency pairs, and AI planning to form plans of actions for implicit Demand-Response. Furthermore, an experimental study with 36 participants has been conducted in order to assess the application, usefulness, and acceptance of the developed agent. In summary, this paper aims to answer the following questions:
Q1: How easily can humans with minimal prior knowledge of a domain interact with an Intelligent Personal Agent by using only a dialogue generator to define the conversational tree?
Q2: Can an Intelligent Personal Agent support a multi-domain AI Planner to form complex plans of actions towards solving effectively an implicit Demand-Response problem?
The remainder of this article is structured as follows. A brief review of the related work is presented in the next section. The architecture of CIPA Gen-B, which is based on Gen-A, is documented in Section 3, together with its core functionalities: the dialogue generator, the multi-intent models and the planning generator for implicit DR. Section 4 provides the evaluation results for the NLU models, the dialogue generator and implicit DR planning, as well as the results of the acceptance study. Section 5 discusses the findings of this work. Finally, Section 6 presents the conclusions.
Related works
This work presents the release of CIPA Gen-B and its application in a SmartHome for the optimal planning of an implicit Demand-Response (DR) scheme. Since, to the best of the authors’ knowledge, there is no published research on Intelligent Personal Assistants/Virtual Assistants that enable residential implicit DR schemes, a brief literature review of conversational task-based agents is presented, along with a short overview of methods used to apply DR plans in residential buildings.
Conversational task-based agents execute tasks based on verbal interaction with humans, exploiting natural language processing and AI/ML capabilities. The introduced agent is based on dialogue generation, as modeling how the dialogue will unfold is crucial for task execution. To this end, in [6] recurrent neural networks (RNNs) are used for data-driven conversation modeling, in which agents learn to map messages to responses. Industry leaders such as Amazon Alexa, Google Home Assistant and Apple Siri are based on ML models for dialogue generation. Recently, Google introduced Meena, an end-to-end neural conversational model that learns to respond sensibly to a given conversational context. A single Evolved Transformer encoder block processes the conversation context, and the response is formulated by Evolved Transformer decoder blocks. Dialogue generation based on the Encoder-Decoder model is widely used. A neural network-based architecture with hierarchical latent variables that capture dependencies over an extended conversation history was introduced by Serban et al. [7].
Along the same lines, Park et al. [8] introduce a variational hierarchical conversation RNN model. Reinforcement Learning (RL) approaches [9] are also popular for human dialogue generation, training a model to produce response sequences. Recently, a Deep RL approach [10] was introduced that optimizes long-term rewards through conversation between two agents, which discover possible actions and maximize the expected rewards. Another Deep RL approach [11] proposes a multi-agent system of independent single agents that have only a partial understanding of the environment and overcome their limited knowledge by using a centralized referee during the learning phase. Besides these general approaches to dialogue generation and task execution, applications based on agents and dialogue systems have been introduced for SmartHome automation, which is the application domain of the current work. Besides the applications of the market leaders (Amazon Alexa and Google Home), Dumitrescu [12] introduced Cassandra, a voice assistance system that enables users to control SmartHome appliances. NLU and automatic speech recognition are used alongside a knowledge base containing predefined scenarios to handle the dialogue with the SmartHome occupant. A predefined scenario is selected after analysis of the occupant’s voice command. This contrasts with CIPA Gen-B, whose novel dialogue generator does not require any historical knowledge of predefined scenarios.
In another work in this domain, Park et al. [13] proposed a framework for the development of task-oriented dialogue systems in a SmartHome environment. The framework builds a dialogue system by editing dialogue knowledge that is expressed ontologically. It is also equipped with a rule-based dialogue management system that defines the possible states of a dialogue and the behavior of the system in a given state. Both information-state [14] and finite-state [15] based systems are used. The finite-state methodology can be considered related to the adjacency pairs methodology [16] that CIPA Gen-B uses for dialogue generation, which enables the generation of all possible dialogue trees. Personal assistants that contribute to users’ planning, in line with the presented application of CIPA Gen-B, form a field with limited applications but some interesting results.
In another approach related to home applications, Lera et al. [17] introduce a context-awareness component, based on ANNs, for labeling users’ activities in a human-robot shared environment. The work provides an enhanced inference engine that supports the robot’s decision making: labeling is based on dialogue flow, time and date, and localization information, complemented by an environment recognition component supported by acoustic signals.
Yu et al. [18] introduced Uhura, a personal assistant that supports planning for complex human requests. Uhura integrates a knowledge base and a dialogue manager with a planner that provides reasoning and supports negotiation with the user until a resolution is reached for competing requirements. In order to simplify planning for users, Uhura provides features such as multiple tasks and constraints, natural language communication mechanisms and goal-directed interactions. Based on these features, the assistant receives semantic, spatial and temporal constraints as input, extracts the corresponding tasks from the knowledge base and, after evaluating the alternatives for each task, provides the plans that best meet the users’ requirements. At the time of writing this article, Uhura had not been implemented.
A multi-intent planner for story generation in a multi-agent system is proposed by Riedl et al. [19]. The planning algorithm uses causal reasoning and a simulated intention recognition process in order to generate narratives with plot coherence and strong character believability. The authors propose a solution that merges methodologies derived from the Belief-Desire-Intention agent framework (which enables the formulation of intent and intention recognition for the problem of narrative generation) with partial-order planning techniques. However, the implemented intent-driven planner has not been fully tested, and the testing process is an empirical evaluation. In another approach, Geib et al. [20] introduced a model that uses plan recognition and automated planning for the creation of collaborative virtual agents. The model is based on two agents with different roles, initiator and supporter, that use a shared action representation. The results have been tested in 3 experimental domains. The plans and actions of a domain are defined using Combinatory Categorial Grammars (CCGs) [21]. Both plan recognition and automated planning are enabled by the CCG lexicon created per domain. The approach finally delivers a virtual robot platform in which the supporter agent proposes a set of goals and plans to achieve in order to support the initiator agent.
In addition to the aforementioned approaches related to planning, others related to decision support by agents can be considered. In [22], an agent-based model framework is proposed for deciding the execution of an evacuation plan in two domains, a museum and a train platform. In particular, the authors introduce a human behavior model built from blocks such as desires, emotion, memory and belief. The blocks are matrices or weighted vectors. Above them is a final decision-making block on which all the previous blocks converge. In another work [23], related to agent systems for decision control, the authors propose a decentralized system. More precisely, the proposed solution is a multi-agent controller for vibration control of smart structures, whose reasoning is enabled by replicator dynamics from game theory.
In the context of the Smart Grid [24], Demand-Response is defined as “the changes in electric usage by end-use customers from their normal consumption patterns in response to changes in the price of electricity over time” [25]. DR programs are a valuable asset in the Distribution System Operator’s (DSO) toolbox for maintaining the stable operation of the power grid during peak load periods [26]. DR programs are divided into two large categories, incentive-based and price-based, each of which comprises many subcategories [27]. For consumers, DR can take two main forms: (a) peak shaving, i.e. load reduction during critical time periods, a practice that implies temporary loss of comfort [28], and (b) load shifting, i.e. in times of high energy prices, customers move part of their demand to off-peak periods [29, 30].
How these plans are decided and implemented is outside the scope of this work. The question here is how these plans are applied on the customer/consumer side. From the customer’s point of view, when a DR request arrives, the user is expected to comply with a message that either requests a specific operation of some appliances (e.g. it is recommended to use the washing machine from 19:00 to 21:00 [31]) or asks for the overall household consumption to be reduced by a specific amount of power. Compliance with the DR schedule happens either in an automated manner (explicit user), via the control of the loads by a Building Management System (BMS), or in a more abstract way, where the user shuts down appliances without knowledge of their impact on the overall consumption. An important aspect is the nature of the household loads: these are divided into shiftable and non-shiftable loads [32] (also called controllable and non-controllable loads). This categorisation is arbitrary because it is directly connected to the residents’ needs or preferences and it also depends on the functional possibilities of the loads [33].
It is evident that any kind of DR scheme applies to shiftable/controllable loads. When an automation system (BMS) is present, the control of shiftable loads can be implemented in a rule-based manner or by employing some sort of optimisation with specific objectives (e.g. minimisation of user discomfort). Khorram et al. [34] proposed an optimised planner that reduces the total lighting consumption while satisfying comfort-related constraints. The optimiser was validated over a set of multiple scenarios at simulation level and all control actions were designed to be applied without user interference. In [35], by contrast, direct load control of the HVAC, rather than lighting, was used to provide intra-hour load balancing services to the DSO. In aggregate, the proposed control signals could bring aggregators additional revenue from participating in the DR market. A similar approach was followed in [36], where the setpoints of the installed HVAC of a commercial building were adjusted to analyse peak demand reduction capabilities.
CIPA Gen-B architecture
Fig. 1. Schema of CIPA.
In this section, the design approaches of the CERTH Intelligent Personal Agent Gen-B architecture are presented. The conceptual architecture of the system is illustrated in Fig. 1. The user can interact with the agent either through speech and voice commands or through text messages. Three components of the agent are then used to execute the requested actions: (a) the RASA Core with the conversational tree produced by the Dialogue Generator, which aims to capture all adjacency-pair based dialogue branches specific to the agent’s domain; (b) the RASA NLU with the modified StarSpace Module, which enables higher recognition accuracy for phrases denoting multiple user intents; and (c) the Demand Response Planning Module, which produces and executes plans of actions that lower (or increase) the resident’s energy consumption by a specific amount by controlling the SmartHome’s appliances, while attempting to minimize the disturbance these actions cause to the resident. The RASA framework was chosen as the basis of CIPA due to its popularity, extensible architecture and open-source format.
A design decision behind CIPA-gen B is support for multiple task domains. This capability is enabled by metaprogramming: Python functions take as input a series of mappings that define the agent’s domain and generate the required Python code (in the RASA format). Moreover, CIPA-gen B offers an API that supports the exchange of messages in JSON format. It also includes an SQL database for storing the user’s messages and an action history. CIPA-gen B is provided with a multiple intent model for each of the two supported domains. The novel dialogue generator from CIPA-gen A, the supported multi-intent model, and the planning system for implicit DR are explained in detail in the following subsections. The technical descriptions are accompanied by examples from the supported domains and their datasets, to better explain the application of the proposed agent and make the concepts easier to follow.
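The metaprogramming idea can be illustrated with a minimal sketch. It is not the CIPA implementation: the template, the function `render_actions_module` and the example mappings are hypothetical, and only the `Action` base class with its `name` and `run` methods, `tracker.get_slot` and `dispatcher.utter_message` are taken from the RASA SDK. The sketch only renders and syntax-checks the source text; it does not import RASA.

```python
import ast

# Hypothetical template for one custom action; CIPA's real templates are not given in the text.
ACTION_TEMPLATE = '''
class {class_name}(Action):
    """Generated action for intent '{intent}'."""

    def name(self):
        return "{action_name}"

    def run(self, dispatcher, tracker, domain):
        slots = {{s: tracker.get_slot(s) for s in {slots!r}}}
        dispatcher.utter_message(text="Executing {action_name} with " + str(slots))
        return []
'''


def render_actions_module(intent_to_action, intent_to_slots):
    """Render Python source for an actions module from the domain mappings."""
    parts = ["from rasa_sdk import Action\n"]
    for intent, action_name in sorted(intent_to_action.items()):
        # action_turn_hvac -> ActionTurnHvac
        class_name = "".join(w.capitalize() for w in action_name.split("_"))
        parts.append(ACTION_TEMPLATE.format(
            class_name=class_name,
            intent=intent,
            action_name=action_name,
            slots=list(intent_to_slots[intent]),
        ))
    source = "\n".join(parts)
    ast.parse(source)  # fail early if the generated text is not valid Python
    return source


# Illustrative domain mappings (intent and slot names are assumptions).
source = render_actions_module(
    {"turn_hvac": "action_turn_hvac"},
    {"turn_hvac": ["room", "switch"]},
)
print(source)
```

Adding a domain then amounts to supplying new mappings rather than writing new action classes by hand.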
The remainder of this section, which describes the modules that comprise the architecture of CIPA, is structured as follows. Section 3.1 describes the “Dialogue Generator” module, which is added on top of RASA Core to handle the agent’s dialogues with the user, i.e. to provide the rules that govern the conversation tree. Section 3.2 describes the “RASA NLU with modified StarSpace Module”, which handles natural language understanding in the supported domains and is responsible for understanding the user’s words at sentence level. Section 3.3 describes the supported domains of the agent and their datasets. Sections 3.4–3.6 describe the “Demand Response Planning Module”, which is responsible for producing and executing plans of actions to lower or increase the user’s energy consumption.
A novel dialogue-story generator that is based on the idea of adjacency pairs has been designed and developed for CIPA-gen B. The proposed story generator models dialogue trees that consist of a subset of adjacency pairs based on the following two assumptions: (a) the user may omit information required by an action and thus the Agent will have to ask for that information by interacting with the user in a dialogue and (b) the user may not cooperate by following up the conversation. For example, instead of replying to a question posed by the Agent, the user may request another action. These assumptions are based on studies of human-chatbot conversation patterns [37, 38], which reveal that conversations are multi-turn and that missing information (e.g., “location”) may exist in users’ phrases, which needs to be addressed by the agent before executing the required action. In addition, according to further studies on human-chatbot dialogue patterns, switching between different intents seems to be natural behaviour for users [39].
For every action requested, the story generator produces conversation flows with various pieces of the missing but required information and defines the conversation flows that interact with the user to obtain this missing information. This produces all valid conversation flows for each action. For example, consider the “Turn on HVAC” action from the nZEB SmartHome energy domain. The introduced generator will produce the following valid stories for this action:
(a) The user gives the room and the on/off switch.
(b) The user gives only the room and the agent asks for the on/off operation.
(c) The user gives only the on/off operation and the agent asks for the room.
(d) The user doesn’t give the required information and the agent first asks for the room and after getting a correct reply asks for the on/off option.
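The four stories correspond to the four ways of splitting the required slots into those given by the user and those the agent must ask for. A minimal sketch of this enumeration follows; the function name `slot_combinations` and the slot names `room` and `switch` are illustrative assumptions.

```python
from itertools import combinations


def slot_combinations(slots):
    """Return every (included, excluded) split of the slots required by an intent."""
    pairs = []
    for k in range(len(slots), -1, -1):  # from all slots given down to none
        for included in combinations(slots, k):
            excluded = [s for s in slots if s not in included]
            pairs.append((list(included), excluded))
    return pairs


for included, excluded in slot_combinations(["room", "switch"]):
    asked = ", then ".join(excluded) if excluded else "nothing"
    print(f"user gives {included or 'no slots'}; agent asks for: {asked}")
```

An intent with n required slots yields 2^n such splits, which is why the number of generated stories grows quickly with the number of slots.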
In addition, the generator produces dialogues for handling valid multi-intents by the user. For example, for the turn-HVAC+change-HVAC-mode multi-intent, the generator will produce all valid adjacency-pair flows for the union of their slots. At the end of each story, it will call each action sequentially. If a slot is common to both intents and the user has not provided it, CIPA-gen B will ask for it once.
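The handling of shared slots can be sketched as an ordered union, so that a slot common to both intents is requested only once. The intent names, slot names and helper functions below are assumptions made for illustration.

```python
# Illustrative slot requirements per intent (names are assumptions).
INTENT_SLOTS = {
    "turn_hvac": ["room", "switch"],
    "change_hvac_mode": ["room", "mode"],
}


def union_of_slots(intents, intent_to_slots):
    """Ordered union of the slots of several intents; shared slots appear once."""
    seen, merged = set(), []
    for intent in intents:
        for slot in intent_to_slots[intent]:
            if slot not in seen:
                seen.add(slot)
                merged.append(slot)
    return merged


def questions_to_ask(intents, intent_to_slots, provided):
    """Slots the agent still has to ask for, given the slots already provided."""
    return [s for s in union_of_slots(intents, intent_to_slots) if s not in provided]


multi_intent = ["turn_hvac", "change_hvac_mode"]
print(union_of_slots(multi_intent, INTENT_SLOTS))
print(questions_to_ask(multi_intent, INTENT_SLOTS, provided={"switch"}))
```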
Moreover, it produces invalid conversation flows, in which the user does not continue the conversation with the expected intent but follows up with a different one. The invalid conversation flows restart the conversation after the agent tells the user that it did not understand his/her intentions.
Algorithm: Write story
repeat > …
write repeat signature number
write multi intent signature number
WriteStoryII(intent, included, mappings)
write '-' slots_to_act_map[excluded[…
WriteStoryII(intentToProcess, [], mappings)
write ' - utter_repeat'
WriteStoryII(intentToProcess, [excluded[…
write '* inform{' (excluded[…
write '-' intent_to_act_map[intent]
for each intent inte_ of multi intent intent:
    write '-' intent_to_act_map[inte_]
write '- action_restarted'
The proposed story-generation algorithm (the “Write story”, “Generate all stories of an intent” and “Story Gen” listings) takes as inputs an intent-to-action mapping (intent_to_action_map), a function that maps intents to the set of slots needed for the intent (slot_fun), a slot example relation that maps slots to random slot instances (slot_example_fun), a slot-to-action mapping that specifies which action requests each slot (slots_to_act_map) and a mapping of valid multi-intents (multi_intent_map). The algorithm automatically generates the story file (in the RASA format) for training the dialogue models. It could be ported to generate the story file in other formats. It includes a tree-expansion phase that produces all possible story combinations in which slot information is missing from the original user message, as well as all invalid variations of them, in which the user’s reply is classified to a different intent from the expected one. The “Write story” listing generates a single RASA Core story, that is, a branch of the dialogue tree providing one possible path of user-agent interaction. The “Generate all stories of an intent” listing generates all possible stories, that is, all branches of the dialogue tree that flow from a specific user intent. Finally, the “Story Gen” listing is the main function of the generator, which produces the full dialogue tree. The auxiliary functions used are (a) SlotsCombinations, which generates all possible (included, excluded) pairs from a list of slots, and (b) WriteStoryII, which writes the signature of a story along with the slots given as input.
Algorithm: Generate all stories of an intent
if multIntent(intentToProcess):
    split intentToProcess to individual intents slots
slotsComb
for included, excluded in slotsComb:
    WriteStory(intentToProcess, intentToProcess, included, excluded, runRepeated…
    for intent in intent_to_act_map:
        WriteStory(intentToProcess, intent, included, excluded, runRepeatedAct…
Algorithm: Story Gen – Main function of the algorithm
write a story of no-understand classified to utter default
for intent in intent_to_action_map:
for slot in slots_to_act_map:
    write a story consisting of a single inform and a slot classified to no-understand
write a story consisting of a single inform classified to no-understand
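To make the tree expansion concrete, the following minimal sketch generates only the valid stories of one single intent, in a text layout resembling the Markdown story format of older RASA releases. It is not a reconstruction of the listings above: invalid flows, repeats and multi-intents are left out, and all function names, intent names, slot values and `utter_ask_*` action names are illustrative assumptions.

```python
import json
from itertools import combinations


def write_story(name, intent, included, excluded, examples, slot_to_ask, intent_to_action):
    """Return one story: opening message, one question per missing slot, then the action."""
    given = {s: examples[s] for s in included}
    opening = intent + (json.dumps(given) if given else "")
    lines = [f"## {name}", f"* {opening}"]
    for slot in excluded:
        reply = json.dumps({slot: examples[slot]})
        lines.append(f"  - {slot_to_ask[slot]}")  # agent asks for the missing slot
        lines.append(f"* inform{reply}")          # user supplies it
    lines.append(f"  - {intent_to_action[intent]}")
    return "\n".join(lines)


def generate_stories(intent, slots, examples, slot_to_ask, intent_to_action):
    """Generate one valid story per (included, excluded) split of the intent's slots."""
    stories = []
    for k in range(len(slots), -1, -1):
        for included in combinations(slots, k):
            excluded = [s for s in slots if s not in included]
            name = f"{intent}_{len(stories)}"
            stories.append(write_story(name, intent, included, excluded,
                                       examples, slot_to_ask, intent_to_action))
    return stories


stories = generate_stories(
    "turn_hvac",
    ["room", "switch"],
    examples={"room": "kitchen", "switch": "on"},
    slot_to_ask={"room": "utter_ask_room", "switch": "utter_ask_switch"},
    intent_to_action={"turn_hvac": "action_turn_hvac"},
)
print("\n\n".join(stories))
```

The last of the four generated stories is the one in which the user gives no slot, so the agent asks for the room and then for the on/off option before calling the action.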
Both single- and multi-intent models are supported by CIPA-gen B. For the single-intent model, Chatito, a third-party open-source natural language generator, is used to create a training set for the NLU module. Chatito models patterns of phrases, which are then expanded into a training dataset by generating various combinations of words and phrases. Chatito is widely used for this task and has been used in other works on Conversational Agents to generate training data [40]. For the NLU model, we did not select RASA NLU’s default model, which uses the SpaCy Natural Language Processing system, but opted for a RASA NLU model based on StarSpace [4] for intent classification. We then defined examples of intents for the nZEB SmartHome domain in Chatito’s DSL (Domain Specific Language). These do not correspond to an exact one-to-one mapping, as we defined some extra intents. The two most important of them are the inform and nounderstand intents. The first should match all user messages that inform the agent about a slot (a parameter) of an action, such as the location of an action, when it was not included in the original message. The second, the nounderstand intent, should match all user messages that resemble messages of other intents but do not make sense. For example, consider the user messages “turn on the lights” and “turn on the door”: the first should be classified to the turn the lights intent, whereas the second should be classified to the nounderstand intent. By specifying these intents and text patterns in the Chatito DSL, we used the generator to produce our training set for the NLU model.
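The pattern expansion performed by such a generator can be sketched in plain Python. The sketch does not use Chatito or its DSL; the `expand` function, the pattern syntax, the vocabularies and the intent name `turn_lights` are assumptions chosen to mirror the lights/door example.

```python
from itertools import product


def expand(pattern, vocabulary):
    """Expand a pattern such as 'turn {switch} the {device}' into all word combinations."""
    keys = [k for k in vocabulary if "{" + k + "}" in pattern]
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(vocabulary[k] for k in keys))
    ]


PATTERN = "turn {switch} the {device}"
valid = {"switch": ["on", "off"], "device": ["lights"]}
# Devices that cannot be switched give look-alike sentences that make no sense.
invalid = {"switch": ["on", "off"], "device": ["door", "sofa"]}

training = [(text, "turn_lights") for text in expand(PATTERN, valid)]
training += [(text, "nounderstand") for text in expand(PATTERN, invalid)]

for text, intent in training:
    print(f"{intent:13s} {text}")
```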
For the dialogue model, we used RASA Core’s memorization model. Furthermore, we added a fallback policy for both the intent classification and dialogue classification models. Therefore, the agent is able to reply that it does not understand in cases of low classification confidence.
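The effect of such a fallback can be sketched as a simple thresholding rule. The function below is an illustration rather than RASA’s policy implementation; the threshold values and the fallback action name are assumptions, since the text does not report them.

```python
def select_action(intent_confidence, predicted_action, action_confidence,
                  nlu_threshold=0.5, core_threshold=0.5,
                  fallback_action="utter_default"):
    """Return the predicted action, or the fallback when either classifier is unsure."""
    if intent_confidence < nlu_threshold or action_confidence < core_threshold:
        return fallback_action
    return predicted_action


print(select_action(0.92, "action_turn_hvac", 0.88))  # confident on both levels
print(select_action(0.31, "action_turn_hvac", 0.88))  # unsure about the intent
```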
For training the multi-intent model, we developed an algorithm that takes as input a Chatito DSL file together with a mapping of valid multi-intents and produces an extended Chatito DSL file that includes patterns of these multi-intents, built by combining the patterns of their single-intent counterparts. In addition, it uses a random variable to decide whether duplicated words are omitted when combining messages from two intents. For example, when combining “please turn on the HVAC in the kitchen” with “turn the HVAC mode on”, the second “HVAC” word will probably be omitted in the training example for the multi-intent.
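The random omission of duplicated words can be sketched as follows. The sketch works on plain sentences rather than on Chatito patterns, and several details are assumptions: the omission probability, the connector word, and the restriction of omission candidates to a small set of device words (the text does not state which repeated words are eligible).

```python
import random

# Assumption: only these words are candidates for omission.
DROPPABLE = {"hvac", "lights"}


def combine(first, second, rng, omit_probability=0.8, connector="and"):
    """Join two single-intent phrases, randomly dropping device words repeated in the second."""
    seen = {w.lower() for w in first.split()}
    kept = []
    for word in second.split():
        repeated = word.lower() in seen and word.lower() in DROPPABLE
        if repeated and rng.random() < omit_probability:
            continue  # omit the repeated word
        kept.append(word)
    return " ".join([first, connector] + kept)


rng = random.Random(0)
a = "please turn on the HVAC in the kitchen"
b = "turn the HVAC mode on"
print(combine(a, b, rng, omit_probability=1.0))  # the second "HVAC" is always omitted
print(combine(a, b, rng, omit_probability=0.0))  # the second "HVAC" is always kept
```

With an intermediate probability, both variants appear in the generated data, so the classifier sees multi-intent sentences with and without the repeated word.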
Thereafter, we used the output of this algorithm to generate our NLU training examples using Chatito. The algorithm produced far more multi-intent examples than single-intent ones. To balance the classes, we post-processed this output by replicating the single-intent observations three to four times. Furthermore, we extended RASA NLU’s adaptation of the StarSpace method by adding more hidden layers to the NLU model and training an NLU model for multi-intent classification. We used a 512
Supported domains and datasets
CIPA-gen B has been deployed in the CERTH nZEB SmartHome in two different domains, energy and health. The CERTH nZEB SmartHome is a rapid prototyping and novel technologies demonstration infrastructure resembling a real domestic building, where occupants can experience actual living scenarios while exploring various innovative smart IoT-based technologies with provided Energy, Health, Big Data, Robotics and Artificial Intelligence (AI) services. Table 1 presents the intents recognized for both domains of the SmartHome.
Table 1. Summary of available operations of CIPA-gen B
CERTH nZEB SmartHome – Health Domain: Health-related IoT devices monitor a variety of physiological attributes and enable the extraction of valuable data through intelligent processing, towards preventing situations that could lead to harmful outcomes. A dataset based on the testers’ inputs has been created, hereafter referred to as the gold set of the health domain. This set is continuously extended and updated for the nZEB SmartHome’s health domain. Due to its size (1972 samples), it is solely used as a test set.
CERTH nZEB SmartHome – Energy Domain: The SmartHome is equipped with energy-related IoT devices that monitor energy consumption and production and the conditions of the entire building, while various algorithms support automation and energy efficiency scenarios. A dataset has been created from the inputs of the testers (SmartHome occupants), hereafter referred to as the gold set of the energy domain. This set is continuously extended for the nZEB SmartHome’s energy domain. Due to its size (1608 samples), it is solely used as a test set. A common pattern found in the data was the omission of information crucial for an operation to be performed. For example, a user could utter “Turn on the lights” without providing the room where he/she wanted the action to be performed. Moreover, there were observations such as “Turn the lights” which did not contain the fully specified action information, such as whether the user wanted the lights on or off. As the user could request a change to the state of the lights of another room, this is crucial information.
Table 2. Action costs for light level settings
Table 3. Action costs for HVAC speed settings
In order to demonstrate the applicability of the designed agent to the energy planning problem, the following use case is considered: the residents of the SmartHome participate in an implicit DR scheme. The information arrives as a notification on their smartphone, which is essentially a prompt to increase or decrease the total energy consumption of their household loads by a specific number of watts for a specific period of time. The examined household consists of 9 different rooms (i.e. Living Room, Kitchen, Double Bedroom, Single Bedroom, Playroom, Guest Room, Hall, WC, Corridor) where a variety of smart loads are installed, and the power consumption of each load is measured by a smart energy meter. In general, DR schemes are implemented through explicit directions to the residents about turning specific appliances on or off at specific time intervals, as suggested by Jovanic et al. [31]. However, in the absence of a BMS, such an approach may cause discomfort to users because it demands actual actions on their part (e.g. gradually shutting down loads, dimming lights, changing the temperature and fan speed of the HVAC etc.). On the other hand, most BMSs can monitor and control the installed loads individually, but they lack planning functionality. Therefore, to address this problem, the proposed agent is used as a virtual assistant for DR handling. The agent collaborates with the BMS to retrieve information on the current status of the loads and to send set-points to these loads. In order to demonstrate this proof of concept, the two loads most used in implicit DR schemes were selected: lighting and HVAC control.
It should be noted at this point that since in the SmartHome a centralised HVAC system is installed, in order to demonstrate the intelligence of the agent planner regarding the different selected actions for each room, we have selected the control of the HVAC fan coils which are independent in each space.
The lights in each room of the SmartHome are dimmable, and their consumption has been measured at different levels with a 20% step (“Off”, 20%, 40%, 60%, 80%, 100%). Similarly, the consumption of each room’s fan coils has been measured at the speed levels predefined by the manufacturer (“Off”, “Slow”, “Low”, “Medium”, “High”, “Power”), which were assigned to the same 20% step levels as for the lights. Finally, smart sensor measurements reporting the human presence in each SmartHome room and the outdoor luminance and temperature are also employed in order to fully define the scenarios presented in the next paragraph. It should be noted that the sensor measurements were validated in order to ensure proper operation.
Agent operations and cost definition
In order for the agent to respond to the user’s request, a set of specific operations from which the agent can choose has been defined. These operations are changes in the percentage of operation of the SmartHome’s electrical devices. Specifically, these changes can vary from 0% to 100% and from 100% to 0%, in steps of 20% (e.g. from 20% to 80%, from 60% to 0% etc.). The two groups of electrical devices that can be adjusted are lights and HVAC systems: the lights can be dimmed and the fan speed can be altered. In each group different conditions are considered, namely human presence, TV operation and outdoor light for the light group, and human presence and outdoor temperature for the HVAC group. Based on the above, the possible scenarios of actions to be performed by the agent are generated.
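The set of level changes described above can be enumerated directly. The following is a minimal sketch, assuming that every ordered pair of distinct levels counts as one operation; the names `LEVELS` and `level_changes` are illustrative and do not come from the CIPA code base.

```python
from itertools import permutations

# Operating levels in percent, in steps of 20 as described in the text.
LEVELS = range(0, 101, 20)


def level_changes(levels=LEVELS):
    """Return every (before, after) pair of distinct operating levels."""
    return list(permutations(levels, 2))


changes = level_changes()
# Split the operations by direction, e.g. (60, 0) lowers and (20, 80) raises the level.
decreases = [(x, y) for x, y in changes if y < x]
increases = [(x, y) for x, y in changes if y > x]
```

With six levels this gives 30 candidate changes per device, half of them increases and half decreases; each of them is then specialised by the conditions (human presence, TV operation, outdoor light or temperature) of its group.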
For the agent to be able to choose the ideal combination of actions for fulfilling the user’s request, specific costs are defined. The costs are defined arbitrarily according to the following factors: hot/cold/medium for the outdoor temperature, true/false for human presence, TV operation and outdoor light, and finally the change in the light or HVAC operation level. The total cost of every action is derived from the sum of the condition costs. Actions that involve human presence have a higher cost, since the agent must avoid causing disturbances in rooms where humans are present (e.g. turning off the lights). Also, because actions that change the lighting lead to higher power consumption than actions that change the fan speed level, they have been associated with a higher cost, especially when residents are present in the room. The costs of the possible actions are presented in Table 2 for lights and in Table 3 for HVAC systems.
where,
LHPM: Lights Human Presence Modifier,
HHPM: HVAC Human Presence Modifier,
Equations (1) and (2) compute the action costs of an action, according to the costs defined in Tables 2 and 3. There exists an action
Tables 4 and 5 present the sensitivity analyses of the two formulas using the Sobol method [41]. The input data for performing the sensitivity analysis were generated using Saltelli’s sampling scheme [42, 43]. The samples generated in both cases were in the order of 1e
Sobol sensitivity analysis of Eq. (1)
Sobol sensitivity analysis of Eq. (2)
For a user to ask the agent to perform implicit Demand-Response actions, she has to ask the agent to “Decrease consumption by
where
To define the PDDL domain of the energy domain for performing implicit DR response, we categorized the PDDL actions according to Table 6. We consider normal outside temperatures values ranging from 15
Categories of PDDL domain actions
For each of the nine categories we have a multitude of PDDL actions that correspond to the level of change of the lights or HVAC fan percentage (i.e., 100 to 80, 100 to 60, 80 to 20, 40 to 0, 0 to 20, 0 to 100 etc.), all the various combinations both ascending and descending in steps of 20, each with its corresponding action cost and total watts change amount. Formally, for the set
The PDDL Domain contains 2 types (room and presence) and 3 predicates (sroom ?y - room, human-presence ?x - presence, at-place ?x - presence ?y - room). It also contains 7 PDDL function definitions (total-cost, fan-level, light-level, total-watts, outdoor-temperature, tv-operation, outdoor-light), of which the last two take binary values (0 or 1). These predicates and PDDL functions handle integer values. To handle real numbers for the total watts change, we multiplied all numbers that correspond to the total watts change (both in the PDDL Domain and in the PDDL problem instances) by a factor of 10, as the total watts change values had one decimal place. We defined nine PDDL actions which include some meta-variables (not part of the PDDL specification) that correspond to the values that differentiate the multitude of actions belonging to each category. For example, Action 8 is defined as:
Lights-fromx-toy-withHP-Outdoorlight-without-TV (?room,?presence): An action which changes the light-levels at room ?room by {diff}, increases/decreases ({inc}) the total-watts by {w}, and increases the total cost by {c}, when there is human presence in the room, the outdoor lights are on but there is no TV operation.
The meta-variables {x} and {y} correspond to the before and after the action application lights percentage, while {inc} corresponds to increase, when
To define a problem instance for implicit Demand-Response we define a PDDL Problem with the 9 rooms of the SmartHome and a PDDL object for human presence. For each room we define if there is human presence currently in that room or not. In addition, the current energy consumption in watts (total-watts variable) is set. For each room we define the fan-levels of the HVAC (a value of 0 denotes that the HVAC is not currently operational in the room), as well as the light levels for each room (a value of 0 denotes that the lights are off in the room). Moreover, we define the outdoor temperature, the status of the outdoor lights (boolean variable), and if there is a TV on in a room where there is human presence we set the TV-operation flag to 1.
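The initial state described above can be rendered as PDDL facts with a few lines of code. This is a hedged sketch rather than the generator used by CIPA: the predicate and function names follow the domain description, but the assumption that fan-level and light-level take a room argument while the remaining functions take none, the object name `person`, and all example values are ours.

```python
def render_init(rooms, occupied_rooms, total_watts, outdoor_temperature,
                outdoor_light_on, tv_on, presence="person"):
    """Render the facts of a PDDL :init section as a list of strings.

    rooms maps a room name to a (fan_level, light_level) pair in percent.
    """
    facts = [f"(human-presence {presence})"]
    for room, (fan, light) in rooms.items():
        facts.append(f"(sroom {room})")
        if room in occupied_rooms:
            facts.append(f"(at-place {presence} {room})")
        # A level of 0 means the device is switched off in that room.
        facts.append(f"(= (fan-level {room}) {fan})")
        facts.append(f"(= (light-level {room}) {light})")
    # Watts are multiplied by 10 so that one-decimal values become integers.
    facts.append(f"(= (total-watts) {round(total_watts * 10)})")
    facts.append(f"(= (outdoor-temperature) {outdoor_temperature})")
    facts.append(f"(= (outdoor-light) {int(outdoor_light_on)})")
    facts.append(f"(= (tv-operation) {int(tv_on)})")
    facts.append("(= (total-cost) 0)")
    return facts


# Hypothetical two-room snapshot; the real SmartHome has nine rooms.
init = render_init(
    rooms={"living-room": (40, 80), "kitchen": (0, 0)},
    occupied_rooms={"living-room"},
    total_watts=1234.5,
    outdoor_temperature=31,
    outdoor_light_on=False,
    tv_on=True,
)
```

Joining the returned strings inside `(:init ...)` yields the state part of a problem instance; the objects, goal and metric sections still have to be added around it.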
To define the goal we subtract the user’s requested change in total consumption from the total-watts variable when the user tells the agent to decrease the consumption (or add the user’s requested change when the user requests an increase) and compute the goal interval using Eq. (3). We use an interval instead of a specific value for the goal because the actions’ change in total consumption may not add up exactly to the goal value. Finally, we minimize the problem according to the total-cost variable, that is the sum of the selected actions’ costs.
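The goal construction can be illustrated as follows. Since Eq. (3) is not reproduced here, this sketch assumes a symmetric interval around the target consumption; the `tolerance` parameter, its default value and both function names are hypothetical and may differ from the paper's actual formula.

```python
def goal_interval(total_watts, requested_change, decrease=True, tolerance=50.0):
    """Return (low, high) bounds for total-watts, scaled by 10 to integers.

    The symmetric tolerance is an assumption standing in for Eq. (3).
    """
    delta = -requested_change if decrease else requested_change
    target = total_watts + delta
    low = round((target - tolerance) * 10)
    high = round((target + tolerance) * 10)
    return low, high


def render_goal(low, high):
    """Render a PDDL goal and metric that keep total-watts within [low, high]."""
    return (
        f"(:goal (and (>= (total-watts) {low}) (<= (total-watts) {high})))\n"
        "(:metric minimize (total-cost))"
    )


# Hypothetical request: decrease a 1200 W consumption by 300 W.
low, high = goal_interval(1200.0, 300.0)
goal_text = render_goal(low, high)
```

Expressing the goal as two numeric comparisons lets the planner accept any combination of actions whose summed watt change lands inside the interval, while the metric selects the cheapest such combination.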
Generating and executing a plan
When CIPA-genB classifies an intent for either decreasing or increasing the energy consumption, it queries the SmartHome for (a) total energy consumption, (b) the light status of each room, (c) the HVAC status of each room, (d) the outside temperature, (e) the status of the outdoor lights, (f) human presence in each room, (g) if there is a TV on in a room where there is human presence.
Subsequently the agent generates the problem instance with the current status and the goal according to the previous sub-section. For computing Formula (3) we set
Evaluation metrics
CIPA-gen B has been deployed on the CERTH nZEB SmartHome. In the following sub-sections we evaluate its various components, that is (a) the NLU multi-intent models for the Energy and Health Domains, (b) the dialogue generator based on adjacency pairs and (c) CIPA’s Planning module for the energy domain for implicit Demand-Response.
NLU models & dialogue generator
In this sub-section the evaluation results of the multi-intent models for both domains, energy and health, are presented. Prior to introducing the results of the evaluation, an assumption about misclassification should be taken into consideration. Most of the misclassifications were of the type no-understand to inform. This misclassification type is not a problem, as it is handled by the dialogue model. We can classify orphan informs (that is, informs that do not occur in the middle of a dialogue) as no-understands, whereas incorrect informs in the middle of a dialogue that do not match any of the valid dialogue flows generated by the generator presented in this paper are handled by the produced stories for invalid scenarios. So, if we consider this type of misclassification as correct, our accuracy is increased (column Accuracy* in Table 7).
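The difference between the strict accuracy and the relaxed Accuracy* can be made concrete with a small sketch. It assumes that the forgiven case is a gold label of no-understand predicted as inform; the label strings and the toy data are illustrative and are not taken from the gold sets.

```python
def accuracy(gold, predicted, forgiven=frozenset()):
    """Fraction of correct predictions.

    (gold, predicted) pairs listed in `forgiven` are counted as correct.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(
        1 for g, p in zip(gold, predicted) if g == p or (g, p) in forgiven
    )
    return hits / len(gold)


# Toy data: one forgiven confusion and one genuine error.
gold = ["lights_on", "no_understand", "inform", "set_temperature"]
pred = ["lights_on", "inform", "inform", "lights_on"]

strict = accuracy(gold, pred)
relaxed = accuracy(gold, pred, forgiven={("no_understand", "inform")})
```

On this toy data the strict score is 0.5 and the relaxed score is 0.75, because only the no-understand to inform confusion is forgiven while the genuine intent error still counts against the model.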
Table 7 presents the results of the multi-intent NLU model on the gold sets of the two domains. The Precision/Recall/
To evaluate the dialogue generator we present examples of valid use cases. Table 8 presents indicative examples of possible user messages for the actions supported by CIPA-gen B for the nZEB SmartHome energy domain. A user can phrase a command for the agent in various ways. Moreover, the agent can converse with the user if it recognizes an intent for which information is missing. For example, if the user commands the agent to “turn the lights”, the agent will reply with either “where?” or “in which room?”. If the user replies with a valid room (e.g., “In the kitchen”), the agent will reply “Do you want them on or off?”. If the user replies with either “on” or “off”, the agent will execute the turn-lights action. A representative example of the generative process of the algorithm, through the conversational flow between the agent and a user for the single-intent set temperature command, is the following:
A subset of nZEB SmartHome energy multi-intent domain actions and examples
Scenarios’ initial conditions
Finite automata of story generator example.
A case example was used as part of the evaluation process. The example aims to demonstrate that the novel dialogue generator of CIPA-gen B is able to produce all the possible use case scenarios in its conversation with a user by adopting the concept of adjacency pairs. In particular, in the presented example the agent requires information about the value of the temperature and the place (room). There are 4 initial cases: the user states the intent accompanied by (a) the value, (b) the place, (c) both of them, or (d) neither, i.e. the user inputs only the intent. In the next step of the discourse, the user can either provide valid information that fills a slot or provide an invalid input that is categorized as one of the 14 invalid operations in the energy sector (lights, HVAC, etc.), including the no-understand operation in the case of no relevant input. Every single invalid input in the Fig. 2 diagram is translated into multiple alternative rejected scenarios. The node that contains both value and place is a final node. In contrast, the rest of the nodes need at least one more slot to be filled in order to end up in a final state. By the same logic, every active node generates more complex scenarios until a final node is reached. In this example, the set_temperature intent generates 64 unique scenarios that cover all possible contingencies; they are illustrated by the coloured nodes in Fig. 2. Each node represents the input given by the user. The calculation of all possible generated scenarios for a multi-intent command is identical to that for a single intent, differing only in the number of slots. Multi-intent commands have as slots the union of the slots of the two intents. So, for example, in the set_temperature_lights intent, which is responsible for setting the temperature in a room and turning the lights on or off, the available slots comprise the union of (value, place) from set_temperature and lights_on_off from light.
The total number of generated scenarios is 281.
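The valid part of this enumeration can be sketched as a walk over sets of filled slots. The code below is an illustration under our own assumptions, not the CIPA story generator: it covers only valid dialogue paths, lets the user fill any non-empty subset of the missing slots on each turn, and ignores the invalid-input branches, so it does not reproduce the 64 and 281 scenario counts reported above.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def valid_paths(slots, filled=frozenset()):
    """Enumerate dialogue paths from the node `filled` to the final node.

    A path is the list of slot sets the user supplies on successive turns.
    """
    missing = frozenset(slots) - filled
    if not missing:
        return [[]]  # final node: nothing left to ask
    paths = []
    for answer in nonempty_subsets(missing):
        for rest in valid_paths(slots, filled | answer):
            paths.append([answer] + rest)
    return paths


def all_valid_scenarios(slots):
    """Pair every possible opening message with each way of completing it."""
    slots = frozenset(slots)
    openings = [frozenset()] + list(nonempty_subsets(slots))
    return [
        (opening, path)
        for opening in openings
        for path in valid_paths(slots, opening)
    ]


single = all_valid_scenarios({"value", "place"})
multi = all_valid_scenarios({"value", "place", "lights_on_off"})
```

For the two slots of set_temperature the four openings match the four initial cases in the text, and moving to the three-slot union of set_temperature_lights only changes the input set, which mirrors the statement that multi-intent enumeration differs from the single-intent case only in the number of slots.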
The CIPA story generator was evaluated experimentally by comparing the enumeration of all possible scenarios that can arise in a dialogue between an agent and a user, as illustrated in Fig. 2, with the actual scenarios generated by CIPA (the parsed data from the generated file and the sum of the generated scenarios for every intent). CIPA was found to be a competent agent that behaves properly and covers all contingencies, and as such we consider our first research question positively answered.
To evaluate CIPA’s Planning module for implicit Demand-Response we showcase 4 scenarios executed by CIPA-gen B on CERTH’s nZEB SmartHome. In Tables 9 and 10 we present the initial conditions of the scenarios and the goal consumption in watts. Table 11 presents the plans generated and executed by CIPA to reach the target goal consumption. The CPU time elapsed for generating these plans by the AI Planner ranged from 0.01 (Scenario 2) to 0.63 seconds (Scenario 4).
Scenarios’ device initial conditions
Plans for scenarios 1–4
In the first three scenarios the agent avoids disturbing the humans, whereas in the fourth scenario the goal is more ambitious and the agent has to disturb the humans in the living room by dimming the lights. However, even in that scenario the agent does not turn the lights off completely, but only dims them. In Scenario 1 the agent avoids lowering the HVAC fans as the outside temperature is high, whereas in Scenario 3 it executes only one action that lowers the HVAC fan speed, in a room without human presence. CIPA reached the goal in all scenarios and as such we consider our second research question positively answered.
Herein, the results of an experimental evaluation study assessing the usefulness and acceptance of the agent are presented. We consider this as an important first step towards conducting further longitudinal studies in real-world settings in the future.
Methodology
Participants were recruited through an electronic invitation sent to the staff of a research centre in Greece (CERTH), in which the rationale of the study was explained. An online questionnaire was administered to the study participants. The questionnaire comprised three main sections. In the first section, demographic and personal information about the participant was requested. Questions were asked about education, familiarity with speech interactions with a virtual agent, and familiarity with smart home automation and voice control automation.
In the second section of the questionnaire, the participants were requested to perform a series of actions in interaction with the intelligent agent, and indicate the success or failure of the actions. More specifically, the participants were asked to connect to the online SmartHome platform, through which the agent communicates with the users, and the energy management of the smart home takes place. The requested series of actions was the following:
Change the status (on, off or dim) of the lights in a room.
Change the status (on, off, fan speed, temperature) of the HVAC in a room.
Change the status (on, off) of both the lights and HVAC in a room.
Subsequently, the participants were asked whether the agent identified the missing parts of the user’s voice commands, whether the right action was executed in the right room, how easy or difficult it was to communicate with the agent, and what frequency of interactions with the agent they could tolerate.
In the third section of the questionnaire, questions about the usefulness and acceptance of the agent were asked. More specifically, participants were asked about their concern for the environment, whether the agent could help them save energy, their awareness of Demand-Response programs, their willingness to change their electricity consumption to gain monetary benefits, their intention to use the agent to adjust their electricity consumption, and their tolerance for permitting the agent to handle loads in empty rooms or in rooms in which they are present. The questionnaire used can be found in the Appendix.
In total, 36 people (21 male, 15 female) participated in the study. The mean age of the participants was 28.5
In the questions following the execution of the requested interactions with the agent, the vast majority of the participants found that the agent executed the right action (94%), and the agent correctly identified missing parts of their voice commands (75%). Participants were regarded to be concerned about the environment (mean: 3.9
Participants overall found it easy to communicate with the agent (mean: 3.8
A correlation analysis on the participants’ answers, using Spearman’s rank correlation coefficient and setting the significance level at
Discussion
We presented a task-based agent that (a) enables conversation with humans without prior knowledge of the environment and (b) provides an AI planner to form complex plans for solving a DR problem. A general-purpose task-based agent was introduced, equipped with a novel dialogue generator, which was applied to two domains of the CERTH nZEB SmartHome, namely energy and health. The developed CIPA-gen B is based on the RASA framework and supports multi-intent classification for the aforementioned domains, recognizing up to two intents per user command. Towards a domain-agnostic agent, the code base was engineered so that it can easily be applied to different domains. CIPA-gen B uses an embedding method for its NLU model that is not based on pre-trained word vectors of a specific language, so that it can be generalized to any natural language. The training data would have to be generated for that language, using the selected tools and the developed algorithms, while removing the dependency on SpaCy for slot extraction. This constitutes the only limitation on training a model for a different language. In addition, messages from real-world users should be collected in order to build a real-world usage data set that can be used both for training the model to achieve greater accuracy and for evaluation purposes.
Furthermore, a novel dialogue generator, based on the idea of adjacency pairs, has been implemented in order to enable the proposed agent to generate all the possible scenarios in a conversation between the agent and a user. The generated dialogue tree can be utilized by the agent during operation using a variety of methods. In CIPA we apply the RASA framework’s Memoization policy, which follows the dialogue tree generated. More advanced methods, such as calculating the distances to the various dialogue tree branches by using a distance metric could be also applied. These methods could be investigated in further work.
In addition, CIPA-gen B is connected to an AI Planner and utilizes AI Planning research [5] to form complex plans of actions. A PDDL Domain has been defined for an implicit Demand-Response scheme on the CERTH nZEB SmartHome, and CIPA-gen B can form and execute plans of actions to reach the required energy consumption. In this light, our research question Q2 has been answered positively. CIPA-gen B is, to the best of our knowledge, the first implemented Intelligent Personal Agent that utilizes an AI Planner. Furthermore, CIPA-gen B is, to the best of our knowledge, the only Intelligent Personal Agent that solves an implicit Demand-Response problem.
An experimental user evaluation study was conducted to assess the usefulness and the acceptance of the intelligent agent. The outcomes of the study showed that the agent was regarded as helpful for energy savings and that the vast majority of the participants would like the agent to control loads at their home. Furthermore, the users did not find any major difficulty in communicating with the agent. It is also important to note that, despite the participants’ medium familiarity with speech interactions with a virtual agent and the high number of participants who had no smart home automation at their home, the ease of communication with the agent was deemed to be high. In this context, our research question Q1 has been answered positively. Overall, the results suggest that the proposed system is highly acceptable. However, further longitudinal real-world studies are necessary to show the value of the proposed system in daily living.
Future research and development related to the CIPA agent will focus on extending the agent’s capabilities. In the next releases of the agent we plan to integrate the PDDL Problem Generation with more advanced dialogue trees on the part of the agent, that is, the agent will form the Planning problem instance after a dialogue with the user. Moreover, we plan to extend the Planning problem into a Temporal Planning one. This will allow the agent to schedule these actions and activate them for specific time intervals, thus allowing us to implement a more complex implicit Demand-Response scheme. In addition, we consider using Machine Learning to learn the user’s preferences when generating the planning problem instance in more complex domains. Finally, we aim for a more advanced conversion model from intents/entities to planning actions/variables, as well as support for Probabilistic Planning and Probabilistic Reasoning over Time. Moreover, new dynamic ensemble methods could be tested for the NLU models [46]. Our future work also involves testing the health domain agent with users in real-life settings.
Conclusion
In conclusion, we have presented a conversational agent-based system for application in a SmartHome. By utilizing and integrating components for multi-intent NLU, dialogue generation based on adjacency pairs, and AI planning to form plans of actions for implicit Demand-Response, the system is able to effectively interact with humans, execute complex actions, and control smart devices. Experimental results demonstrated the usefulness and acceptance of such an intelligent system.
Footnotes
rasa-core (0.10.4), rasa-nlu (0.13.2).
Acknowledgments
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreements No 643607 (myAirCoach), No 732679 (ACTIVAGE) & 773960 (DELTA).
Appendix
Page 1 of Questionnaire.
Page 2 of Questionnaire.
Page 3 of Questionnaire.
