Abstract
In this paper, we report on the planning and visualization capabilities of
Introduction
Meetings are an essential part of almost every productive endeavor today, and research has shown that people spend up to three-quarters of their working lives in meetings; however, meeting productivity numbers rarely breach the 50% mark [19]. Moreover, meetings are not going away anytime soon – indeed, this observation made by Hackman & Kaplan [9] holds true even today: “Almost every time there is a genuinely important decision to be made in an organization, a group is assigned to make it – or at least to counsel and advise the individual who must make it.” In the present era, this motivates the need for AI assistants that will help make meetings more efficient and productive for human agents engaged in a decision-making process.
Previous work on such assistive agents has included both purely software agents [4,25] as well as ones that co-inhabit physical spaces with humans [6] and use their understanding of what is happening in those spaces to act as collaborators on cognitive tasks such as decision making.
Some of the primary responsibilities of the agent (Fig. 1) include solving problems interactively with other humans in the environment, generating summaries of events in the workspace, visualizing the decision processes of all the embedded agents and of the group (humans and agents) as a whole, and proactively managing resources such as background services and data. We divide the responsibilities of
An interesting caveat of having a truly proactive agent is the loss of common ground with the human decision-makers – i.e. the humans are no longer in control of the planning and execution process of the agent. This makes it imperative to establish effective channels of communication between the agent and the human(s) interacting with or around it, so that the benefits of collaborative decision-making can be realized. We outline the role of visualization in the operation of
In this paper, we report on
how proactive elements of the decision-support agent can be designed in the context of long-term interactions for collaborative decision-making with humans in the loop;
how the visualization capabilities of the agent become integral to supporting and realizing those capabilities, and to establishing common ground for trust in, and transparency of, its decision-making processes.
We present a suite of demonstrations to illustrate these functionalities from an initial deployment of the system in the

The different roles of

The building blocks of
In the following section, we present the design and implementation of
A brief introduction of planning
Automated planning is the process of generating courses of action that achieve goals, given a description of the world. Informally, planning is usually done with a declarative set of operators (actions) that transform states of the world into other states. Goals are a subset of those states. The input to a planner is thus a domain file (describing the environment and the operators available to the agent) and a problem file (describing the current state of the world and the goal of the agent). The output, or solution, is a plan: a sequence of operators that transforms the current state into a goal state. Plan recognition, on the other hand, involves mapping observations (which can take the form of partial states or plans) to a complete sequence of actions (or to a final goal, in the case of goal recognition). A detailed treatment of planning can be found in [8].
We posit that automated planning is uniquely positioned to orchestrate the many processes in a smart room environment that require long-term sequential decision-making, while goal and plan recognition allow the agent to be cognizant of the intentions of the other agents in the room and to inform its own decision-making so as to best suit their needs. For example, the agent can anticipate that a certain task is going to be performed and proactively bring up the resources required for it; it can reason about its role in the proceedings based on those anticipated tasks (e.g. it can automatically start recording a meeting if a summary will be requested); or it can remind the humans of parts of a process that they might have overlooked (such as in the orchestration of a business process).
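The domain, problem, and plan terminology used here can be made concrete with a small sketch. The following is a minimal illustration, assuming a STRIPS-style encoding in which states are sets of facts and a plain breadth-first search stands in for a real planner. The `Action` type, the `plan` function, and the smart-room facts and operators are all hypothetical and are not taken from the system described in this paper.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    """A STRIPS-style operator (hypothetical encoding for illustration)."""
    name: str
    pre: frozenset      # facts that must hold before the action applies
    add: frozenset      # facts made true by the action
    delete: frozenset   # facts made false by the action


def plan(init, goal, actions):
    """Breadth-first search over states; returns a shortest list of
    action names reaching a state that contains every goal fact,
    or None if no such state is reachable."""
    start = frozenset(init)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None


# "Domain": the operators available to the agent (assumed for illustration).
ACTIONS = [
    Action("start_recorder", frozenset({"meeting_started"}),
           frozenset({"recording"}), frozenset()),
    Action("load_slides", frozenset({"meeting_started"}),
           frozenset({"slides_shown"}), frozenset()),
    Action("summarize", frozenset({"recording"}),
           frozenset({"summary_ready"}), frozenset()),
]

# "Problem": the current state of the world and the goal of the agent.
INIT = {"meeting_started"}
GOAL = {"summary_ready"}

print(plan(INIT, GOAL, ACTIONS))
```

In this toy problem the recorder has to be started before a summary can be produced, which mirrors the example of recording a meeting in anticipation of a summary request. A deployed system would typically hand the domain and problem files to an off-the-shelf planner rather than search exhaustively.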
For an “end-to-end” planning system, such as the one we describe, the planning problem is built from sensory data and the solution or plan is dispatched to actuators that operate on the real world. Furthermore, for systems with humans in the loop, such as ours, the actuated plan of the agent should consider intentions of agents around it and it follows that the sensory data should also include information that facilitates prediction of the same. In order to achieve this, we separate out the internal decision making processes of
ENGAGE
The Engage process consists of the decision support assistant monitoring various inputs from the world in order to situate itself in the context of the group interaction – that is, the assistant literally engages with the world around it. The process is further divided into two stages which run in a tightly-coupled loop. First, the assistant gathers various inputs like speech transcripts, live images, and the positions of people within the room; these inputs are fed into a higher level symbolic reasoning component constructed from plan recognition algorithms. The second stage involves the assistant acting on the recommendations of the plan recognition step; it (1) requisitions resources and services that may be required to support the most likely tasks based on its recognition; (2) visualizes the decision process – this can depict both the internal process, which is the agent’s own recognition algorithm, and the external process, which is task-dependent; and (3) summarizes the group decision-making process (e.g. a meeting) in various forms. In this way, the Engage stage provides the “proactive” part of the decision support assistant.
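The two-stage loop described above can be sketched as follows. This is only an illustrative skeleton under strong assumptions: the keyword-overlap scorer in `recognize` is a stand-in for the plan recognition component and is not the algorithm used by the system, and the task names, keyword sets, and resource names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Inputs gathered in the first stage (simplified)."""
    transcript: str
    people: list


@dataclass
class EngageState:
    requested: set = field(default_factory=set)   # resources brought up so far
    log: list = field(default_factory=list)       # raw material for summaries


# Hypothetical task models: keywords that hint at a task, and the
# resources each task is assumed to need.
TASK_KEYWORDS = {
    "review": {"slides", "budget"},
    "brainstorm": {"idea", "whiteboard"},
}
TASK_RESOURCES = {
    "review": {"slide_viewer"},
    "brainstorm": {"whiteboard_service"},
}


def recognize(obs):
    """Stand-in for plan recognition: a belief over tasks obtained by
    normalizing keyword overlap (uniform if nothing matches)."""
    words = set(obs.transcript.lower().split())
    scores = {t: len(words & kw) for t, kw in TASK_KEYWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        return {t: 1.0 / len(scores) for t in scores}
    return {t: s / total for t, s in scores.items()}


def engage_step(obs, state):
    """One pass of the loop: recognize, then act on the recommendation."""
    beliefs = recognize(obs)
    likely = max(beliefs, key=beliefs.get)
    state.requested |= TASK_RESOURCES[likely]      # (1) requisition resources
    state.log.append((obs.transcript, likely))     # (3) record for summaries
    return beliefs                                 # (2) belief to visualize


state = EngageState()
beliefs = engage_step(Observation("show the budget slides", ["alice"]), state)
print(beliefs, state.requested)
```

Returning the belief distribution from each step keeps the visualization decoupled from the recognition code, so a different recognizer can be swapped in without touching the display logic.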
ORCHESTRATE
The Orchestrate process is the decision support assistant’s contribution to the group’s collaborative process. Once the assistant is able to identify and situate itself in the task that is being collaborated upon (during the Engage phase described above), it must contribute to the decision-making. This can be done using standard planning techniques, and can fall under the aegis of one of four actions as shown in Fig. 2. These actions, discussed in more detail in [21], are: (1) execute, where the assistant performs an action or a series of actions related to the task at hand; (2) critique, where the assistant offers recommendations on the actions currently in the collaborative decision sequence; (3) suggest, where the assistant suggests new decisions and actions that can be discussed collaboratively; and (4) explain, where the assistant explains its rationale for adding or suggesting a particular decision. The Orchestrate process thus provides the “support” part of the decision support assistant.
The Engage and Orchestrate processes can be seen as somewhat analogous to the interpretation and steering processes defined in crowdsourcing scenarios in [16,24]. A key difference is that in those works the humans are the final decision makers with the assistant merely supporting the decision making.

Flow of control in
The design of the entire system is shown in Fig. 2, situating
Built on top of Watson Conversation and Visual Recognition services on IBM Cloud and other IBM internal services.
The design of
Knowledge acquisition/learning
The knowledge contained in the system comes from two sources, as illustrated in Fig. 2: (1) the developers and/or users of the service; and (2) the system’s own memory. One significant barrier to the adoption of higher level reasoning capabilities in such systems has been the lack of familiarity of developers and end users with the inner workings of these technologies. With this in mind, we provide XML-based modeling interfaces – i.e. a “system config” – where users can easily configure new environments. This configuration contains a description of the environment; for a smart room, for example, this includes the people who are generally expected to use the room, the layout of the room, which devices are present and where, and so on. That information in turn enables automatic compilation of the files that are internally required by the reasoning engines. Thus system-specific information is bootstrapped into the service specifications written by expert developers, and this composite knowledge can be seamlessly transferred across task domains and physical configurations. The memory of the system is contained in the “system logs”, and is intended to provide the rest of the domain knowledge to the planning component. This is meant to enable easily customizable and generalizable configuration of the system with a very small setup phase, as well as to let services continually learn and adapt to the dynamics of a particular environment over time. In this paper, we do not discuss this further, but instead focus on the decision-making processes of the agent.
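To illustrate how a declarative room description might be compiled into an input for a reasoning engine, the sketch below parses a small XML configuration and emits a PDDL-style problem definition. The XML schema (`room`, `person`, `device` and their attributes), the `smart-room` domain name, the `at` predicate, and the default goal are all assumptions made for this example; the actual format of the system config is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical "system config" for a smart room.
CONFIG = """
<room name="lab">
  <person id="alice"/>
  <person id="bob"/>
  <device id="screen1" type="display" location="north_wall"/>
  <device id="mic1" type="microphone" location="table"/>
</room>
"""


def compile_problem(xml_text, goal="(summary_ready)"):
    """Compile a room description into a PDDL-style problem string."""
    root = ET.fromstring(xml_text)
    people = [p.get("id") for p in root.findall("person")]
    devices = root.findall("device")
    locations = sorted({d.get("location") for d in devices})

    # Typed objects: people, devices (typed by device kind), and locations.
    objects = [f"{p} - person" for p in people]
    objects += [f"{d.get('id')} - {d.get('type')}" for d in devices]
    objects += [f"{loc} - location" for loc in locations]

    # Initial state: where each device is placed.
    init = [f"(at {d.get('id')} {d.get('location')})" for d in devices]

    lines = [
        f"(define (problem {root.get('name')}-problem)",
        "  (:domain smart-room)",
        "  (:objects " + " ".join(objects) + ")",
        "  (:init " + " ".join(init) + ")",
        f"  (:goal {goal}))",
    ]
    return "\n".join(lines)


print(compile_problem(CONFIG))
```

Keeping the environment description separate from the operator definitions means the same domain file could be reused across rooms, with only the generated problem changing.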
Goal and plan recognition
Currently, the system employs the probabilistic goal/plan recognition algorithm from [18] to compute its beliefs over the possible tasks that are currently underway in its environment. The algorithm internally casts the plan recognition problem as a planning problem by compiling the observations into actions in a new planning problem. The solution to this new problem enforces the execution of these observation-actions in the observed order. This technique depends heavily on the availability of detailed and accurate models. However, it has the advantage of being able to explain the reasoning behind the belief distribution in terms of the possible plans that the agent envisioned using the compilation in [18]. The explanation service (visualized in the next section) is dependent on the underlying algorithm(s) used.
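As a rough illustration of how a belief distribution over goals can be derived from planner output, the sketch below uses a formulation that is common in the plan-recognition-as-planning literature: for each candidate goal, compare the cost of the best plan that complies with the observations against the cost of the best plan that does not, and turn the difference into a likelihood. Whether this matches the exact formulation of [18] is an assumption. The function name, the `beta` parameter, and the example costs are hypothetical, and the costs are assumed to have been computed by a planner beforehand.

```python
import math


def goal_posterior(cost_with_obs, cost_without_obs, priors=None, beta=1.0):
    """Posterior over goals from two plan costs per goal.

    cost_with_obs[g]:    cost of the best plan for g that embeds the observations
    cost_without_obs[g]: cost of the best plan for g that does not
    The likelihood is a sigmoid of the cost difference, so a goal becomes
    more likely when complying with the observations is cheap for it.
    """
    goals = list(cost_with_obs)
    if priors is None:
        priors = {g: 1.0 / len(goals) for g in goals}
    weights = {}
    for g in goals:
        delta = cost_with_obs[g] - cost_without_obs[g]
        likelihood = 1.0 / (1.0 + math.exp(beta * delta))
        weights[g] = likelihood * priors[g]
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Hypothetical costs for two candidate tasks in the room.
with_obs = {"review": 5, "brainstorm": 9}
without_obs = {"review": 7, "brainstorm": 6}
print(goal_posterior(with_obs, without_obs))
```

Because the belief is built from concrete plans, each probability can be traced back to the plans that produced the two costs, which is what makes this family of approaches amenable to explanation.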
Plan generation
The

Snapshot of the mind of
In the following section, we will concentrate on the visualizations of the planning technologies afforded by
To this end, we externalize the “mind” of
Raw Inputs – These show the camera feeds and voice capture (speech to text outputs) as received by the system. These help in externalizing what information the system is working with at any point of time and can be used, for example, in debugging at the input level if the system makes a mistake or in determining whether it is receiving enough information to make the right decisions. It is especially useful for an agent like
Lower level reasoning – The next layer deals with the first stage of reasoning over these raw inputs: What topics are being talked about? Who are the agents in the room? Where are they situated? This helps a user identify what knowledge is being extracted from the input layer and fed into the reasoning engines. It is also meant to increase the situational awareness of agents by visually summarizing the contents of the scene at any point in time.
Higher level reasoning – Finally, the topmost layer uses information extracted at the lower level to reason about higher level tasks in the scene. The upper widget thus provides a visualization of the outcome of the plan recognition process, and an explanation of this process in terms of the information extracted in the lower levels (agents in the scene, their positions, speech intents, etc.). This puts in context the agent’s current understanding of the processes in the scene and reveals the rationale behind its actions.
The interface consists of five widgets. The largest widget on the top shows the various use cases that the

An example of an orchestration plan, taken with permission from [2]. The system employs a method for visualization as a process of explanation, explained in detail in [2]. It instantiates the “model reconciliation” process [3] with an empty model to determine the minimal subset of domain features that may be required to prove optimality of a plan. This figure illustrates the minimized view of the conditions relevant to a plan describing the decision-making process in the smart room. Blue, green, and red nodes indicate preconditions, add effects, and delete effects, respectively. The conditions from the domain that are not necessary causes for this plan (i.e. the plan is still optimal in a domain without these conditions) are grayed out in the visualization (11 out of a total of 30).
Below the largest widget is a set of four widgets, each of which gives users a peek into an internal component of
Since many of the tasks either involve confidential client data or are not in the public domain yet, we do not provide empirical analysis of the system in any individual domain. In this section, we instead report on four emerging deployments of the
Smart room assistant
The primary evaluation and demonstration of the
Engage
We demonstrate via a recorded video (link: ibm.biz/jones-demo) how the Engage process evolves as agents interact in the
Orchestrate
The orchestrate role of the system is demonstrated by its ability to moderate decision making in multi-entity settings. In such scenarios, the assistant must act as a facilitator of various opinions and preferences among the multiple decision-making entities.

A plan from the exoplanet exploration domain, taken with permission from [12]. The nodes are as follows: dark green – calculated variable; gray – function; and orange – retrieved from database.
The next deployment that we describe is the use of a smart assistant to help astrophysicists visualize and analyze data about extra-solar (exo) planets, which are planets outside the traditional Solar System. This assistant – described more fully in a report to appear early next year [12] – enables subject matter experts (in this case, astrophysicists) to interact with data, visualizations, and analyses about exoplanets. The exoplanet assistant provides various capabilities including deixis using speech and gestures; proactive recognition and guidance; interactive query refinement; symbolic model discovery; and explainable self-programming using automated planning techniques.
For example, given a user query about computing a specific quantity – e.g. the stellar luminosity – the assistant can return not just the computed quantity, but the provenance of the calculation as an explanation. This explanation, visualized in Fig. 6, consists of the steps taken by the planner to compute the desired quantity. A full demonstration of the smart assistant in action can be viewed via the following URL: ibm.biz/tyson-demo. A preliminary, deployed version of this instantiation of the smart assistant was awarded Best Systems Demonstration [11] at AAAI 2018.
Automated meeting summarization
The
This kind of visual summary provides a powerful alternative to established meeting summarization tools like text-based minutes. The visual summary can also be used to extract abstract insights about this one meeting, or about a set of similar meetings together, and it allows agents who may have missed the meeting to catch up on the proceedings. While merely sampling the visualization at discrete time intervals already serves as a powerful tool for automated summary generation, we anticipate the use of more sophisticated visualization [5] and summarization [14,15,22] techniques in the future.
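The idea of sampling the visualization at discrete time intervals can be sketched in a few lines. The snapshot representation below (a timestamp in seconds paired with an arbitrary payload), the function name, and the example data are assumptions made for illustration and do not reflect how the system stores its visualization states.

```python
def sample_snapshots(snapshots, interval):
    """Keep at most one snapshot per `interval` seconds.

    snapshots: list of (timestamp_seconds, payload), sorted by timestamp.
    A snapshot is kept if at least `interval` seconds have passed since
    the previously kept one.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    sampled = []
    next_due = None
    for ts, payload in snapshots:
        if next_due is None or ts >= next_due:
            sampled.append((ts, payload))
            next_due = ts + interval
    return sampled


# Hypothetical visualization states captured during a meeting.
frames = [(0, "agenda"), (10, "agenda"), (65, "budget"),
          (70, "budget"), (130, "wrap-up")]
print(sample_snapshots(frames, 60))
```

Sampling relative to the last kept snapshot, rather than on a fixed clock grid, avoids emitting empty slots when the visualization does not change for long stretches.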
Conclusion
In this paper, we presented
Footnotes
Acknowledgements
We would like to gratefully acknowledge Jason Ellis, Victor Dibia, and Andy Aaron for their help with this work and the various deployments in the
