Abstract
Automated Planning deals with reasoning processes where a set of goals must be achieved from an initial state using some actions. Most work on planning has a static view of goals: they are given at the start of the planning process and do not change during planning or plan execution. However, in many real-world domains, agents need to consider dynamic goal management. In this paper, we propose to increase the performance of planning agents by learning when goals will appear in the near future. The learned predictive models allow agents to perform anticipatory planning, where the planning process considers not only current goals but also predicted future goals. We also study under which conditions this anticipatory approach outperforms a standard planning approach. Finally, we present experiments that support our hypothesis.
Introduction
Automated Planning is the AI discipline that generates plans to achieve goals from given initial states. A planner receives as input a set of actions (that describe how states are modified by their execution), a set of goals to achieve, and an initial state. Many and varied techniques have been explored for this purpose [16]. To make the problem tractable, classical planning makes a set of assumptions. In relation to goals, a common assumption is that goals are given as input to the planning system and remain static during the execution of plans.
In recent years the interest in creating autonomous agents for numerous real world applications has greatly increased. Applications range from surveillance purposes to control tasks [1,7,12,15,28]. In these cases the closed-world and static-goals assumptions do not hold any more and reasoning about goals and the changes in the environment becomes essential [29,31]. Therefore, new approaches explicitly deal with dynamic goals. A notable example is the concept of Goal-Driven Autonomy (GDA), inspired by Cox’s work [9] and detailed in [20]. GDA is a conceptual model that explicitly considers goal reasoning as a key component of the deliberative reasoning process of autonomous agents. It allows us to design and deploy autonomous agents that can explicitly reason about their goals, identifying when they need to be updated or changed through environment monitoring.
The first works on GDA only generated goals by following a set of pre-programmed rules that are triggered under certain state conditions [8]. A human must code all the goal-triggering rules before the system’s execution. Some recent works on goal reasoning learn goal formulation without human interaction, extending the agent’s autonomy [19]. All these previous works rely on goal reasoning based on the current state of the world. We want to improve the agent’s performance by creating a system that is able to generate and handle not only the currently existing goals, but also the possible upcoming ones, given the current and recent states of the environment.
From an Automated Planning point of view, previous approaches trigger planning episodes when the current state and/or goals change (most often state changes). We refer to this paradigm as Reactive Planning, since its behaviour is triggered to react to changes in the environment. Our system is based on Anticipatory Planning [6] that takes into account the possible upcoming – but not currently existing – goals along with the current goals when triggering the planning process.
The main contributions of this paper include:
The design of a learning system that can predict the appearance of new goals in the near future. The learning system can learn goal appearance off-line and on-line by collecting learning examples from the execution of plans. In an on-line setting it is able to handle concept drift, that is, situations where the conditions of goal appearance change dynamically.
The design of a goal management system that takes into account the learned goal predictive models to generate new goals.
The integration of the new goal management system within a cognitive architecture that already integrates planning, execution, monitoring and replanning capabilities. We call this new approach Learning-driven Goal Generation Anticipatory Planning (
The evaluation of its performance in a typical domain for goal management, comparing
The analysis of the impact of different relevant parameters that influence the anticipation of future goals.
The paper is organized as follows: the next section formally defines Automated Planning tasks; the third section enumerates the domain characteristics where
Planning tasks
A classical
In order to represent planning tasks compactly, the Automated Planning community uses the standard language PDDL (Planning Domain Description Language) [14]. A planning task Π is automatically generated from the PDDL description of a domain D and a problem P. The domain defines the actions that agents can perform. The problem describes the specific task to be solved at each reasoning step; i.e., the state objects involved, the initial state and the set of goals to achieve. The underlying representation formalism used in PDDL is predicate logic. In order to increase efficiency, planners usually transform that high level representation into a more efficient one, such as propositional logic or other equivalent representations.
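To make the idea of a compiled, propositional planning task more concrete, the following minimal Python sketch models states as sets of ground facts and actions as precondition, add and delete sets. The STRIPS-style encoding, the class names, and the toy `move` and `take-image` facts are assumptions made for this illustration; they are not the representation used by any particular planner or by the architecture described in this paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Fact = str  # a ground proposition, e.g. "at uav c0" (hypothetical naming)


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: FrozenSet[Fact]
    add_effects: FrozenSet[Fact]
    del_effects: FrozenSet[Fact]

    def applicable(self, state: FrozenSet[Fact]) -> bool:
        # An action is applicable when all its preconditions hold in the state.
        return self.preconditions <= state

    def apply(self, state: FrozenSet[Fact]) -> FrozenSet[Fact]:
        # Successor state: remove deleted facts, then add the new ones.
        return (state - self.del_effects) | self.add_effects


@dataclass(frozen=True)
class PlanningTask:
    actions: Tuple[Action, ...]
    initial_state: FrozenSet[Fact]
    goals: FrozenSet[Fact]

    def is_solution(self, plan) -> bool:
        # A plan solves the task if every action is applicable in turn
        # and all goals hold in the final state.
        state = self.initial_state
        for action in plan:
            if not action.applicable(state):
                return False
            state = action.apply(state)
        return self.goals <= state


# Toy instance: a UAV moves from cell c0 to c1 and takes an image there.
move = Action("move c0 c1", frozenset({"at uav c0"}),
              frozenset({"at uav c1"}), frozenset({"at uav c0"}))
take = Action("take-image c1", frozenset({"at uav c1"}),
              frozenset({"image-taken c1"}), frozenset())
task = PlanningTask((move, take), frozenset({"at uav c0"}),
                    frozenset({"image-taken c1"}))
```

Under these assumptions, `task.is_solution([move, take])` holds, while the plan `[take]` alone is rejected because its precondition is not met in the initial state.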
This planning model assumes the world is deterministic and the agent has full observability, among other assumptions. In most real-world environments, this is not the case. Most previous works focus on uncertainty about action outcomes, but in this paper we are interested in a recently presented paradigm that focuses on reasoning about goal uncertainty [5,6]. There have been two main ways to handle uncertainty: either uncertainty is represented explicitly in the planning model and planners reason with those models [4], or planners reason with deterministic world models and the agent replans when the execution of some action fails [34]. In this paper, we use a deterministic approach and replan either when the set of goals changes or when the system predicts it may change.
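The replanning condition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `should_replan`, the representation of goals as strings, and the choice to merge observed and predicted goals into a single set are all hypothetical and not taken from the system itself.

```python
def should_replan(previous_goals, observed_goals, predicted_goals):
    """Decide whether a new planning episode is needed.

    previous_goals: goals used in the last planning episode.
    observed_goals: goals that currently exist in the environment.
    predicted_goals: goals the predictive model expects to appear soon.
    Returns (replan, goals_for_next_episode).
    """
    # Assumption: predicted goals are planned for together with observed ones.
    next_goals = set(observed_goals) | set(predicted_goals)
    # Replan only when the goal set differs from the one last planned for.
    return next_goals != set(previous_goals), next_goals
```

For example, with `previous_goals = {"g1"}` and `observed_goals = {"g1"}`, no replanning is triggered unless the model adds a prediction such as `"g2"`.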
Anticipatory domains
In many real world domains agents can improve their performance if they can anticipate and reason about the arrival of future goals. Agents can generate [10] some (future) goals and start trying to achieve them sooner. In order to define the anticipatory behavior, we will first provide the following definitions related to goals.
(Goal predicate).
A predicate p is a goal predicate in a planning domain D if any of its instantiations (groundings) appears in the goal list of any of its problems P.
Given any domain D, in theory any predicate can appear instantiated as a goal of any problem. However, in most domains there is a subset of predicates that are the ones that appear instantiated as problems’ goals. For instance, consider a Taxi domain in which a fleet of taxis have to serve a set of customers’ pick-up requests. The domain model defines predicates like
(All-goals set of a set of predicates
,
).
Set of all instantiated goals that can be generated by instantiating predicates in
Goals in
(Known goal).
A goal
Following the previous Taxi domain example, if at a given time step only the goal
(Active goal).
A goal g is active if the agent is trying to achieve it in the next planning episode.
(Predicted goal).
A goal g is predicted if the agent’s inner model foresees its appearance in the near future. At the moment a goal is predicted, it becomes active and known.
The agent should generate plans to achieve those goals in G. Some of them are known goals that the agent has not achieved yet. But, new goals
In this paper we make some assumptions about the properties that domains should meet in order for Anticipatory Planning to be useful.
For each goal
There is at least one goal
There is full observability on the appearance of goals. At each time step, the agent can observe the set of new goals that have appeared or have been generated by itself, updating the set of active goals
When a goal is active, it will not disappear from G until it is achieved by the execution of the plan.
An increasing penalty is paid at each time step for all goals
As we have mentioned, there are many domains where those assumptions hold. For instance, take a company that provides products to some warehouses. Suppose that each warehouse generates a new goal of having a given product when it has no stock of that product. If the company is able to predict when warehouses will run out of a product, it can plan to supply the product before the warehouses ask for it, offering a better service and saving time for the warehouse.
Yet another domain is the case of a smart city that wants to control its urban traffic, as we have previously shown in [26]. In that work the goals consisted of decreasing the density level of busy streets by changing the green and red phases of the traffic lights. We showed that if we are able to learn a predictive model that anticipates the appearance of congestion in the near future, we can incorporate these predicted goals into the set of active goals. Then, we can start the planning process sooner, improving the behavior of the system and leading to shorter waiting times for cars and lower pollution levels for the city.
These domains can be seen as goal maintenance problems [30], where some predicates must hold during execution. However, Anticipatory Planning offers some advantages. Techniques that perform goal maintenance trigger actions to achieve maintenance goals as soon as the goals do not hold [24]. For example, when the warehouse runs out of a product, the agent will immediately try to achieve the goal of having a specific quantity of the product. Other recent works define proactive agents. Such an agent takes actions not only in response to a maintenance goal not holding, but also in anticipation of the maintenance goal being violated [13]. The agent reasons about the effects of several actions in the near future and will not take any action that violates a maintenance goal. That is, the agent ensures that all the maintenance goals hold during the planning process. This can improve the rational behavior of agent systems, but these maintenance goals cannot deal with exogenous events, as our Anticipatory Planning approach does. We store information about the environment and build a model that predicts the appearance of goals (or maintenance goals that will be violated) in the future based on these exogenous events.
Other example domains, which cannot be seen from a maintenance goal point of view, are surveillance tasks, such as those of police, guards, or drones. If we can anticipate where a security breach will appear (where each security breach generates a new goal to address it), these agents can arrive at the place earlier and patrol (or execute a set of actions) where the predictive model suggests. In this paper, we focus on Unmanned Aerial Vehicle (UAV) domains. The use of UAVs has grown in recent years due to their low price, increasing set of programming tools, and versatility. The possible uses of UAVs range from military applications to different surveillance purposes. In this case we propose a domain in which a UAV performs surveillance tasks on an area that has been discretized into a grid.
The UAV has to serve a set of requests coming from a surveillance center. Each goal is a request to perform a given set of tasks (such as taking images) in a specific cell of the grid. To service a request, the UAV must move from its current position to the cell of the request. For example, a request can be taking an image or making some kind of measurement. So, the goal would be
All these domains share the same assumptions, so we expect similar improvements in performance over a system that does not anticipate the appearance of goals. Since we had good results in the traffic domain, we wanted to analyze here whether they generalize to domains with similar assumptions on goals.
In the following section we describe the architecture that allows the agent to generate and formulate its own goals based on learning, as well as performing anticipatory planning and executing the generated plans.
Architecture
Given that the agent needs to integrate learning, goal management, planning, and execution, we have based our work on a domain-independent architecture that we had developed,

Planning architecture that includes goal formulation and goal learning capabilities.
Initially, the
The

Generation of two training examples
At each time step, and before calling the
It can also change the metrics to optimize for, though we are not changing metrics here.
In this section we describe the Learning-Driven Goal Generation, corresponding to the modules within the dotted line shown in Fig. 1. This component allows the architecture to formulate new goals when the system predicts they will appear in the near future. There are three steps in this process: the generation of learning examples; the construction of the predictive model; and the formulation of new goals.
Generation of examples
Our base idea is to predict future goals based on current and past observations. Observations include measurable features about the state and checking whether new goals have appeared. Examples of state features in the case of the UAV can be temperature readings, seismic activity, or any other measurement that our UAV can observe at each cell. Therefore, we will use the state features to generate the attributes of each training example. The check about the appearance of a new goal at each cell generates the class of each example. In the case of the UAV, the check at each cell and time step returns true if the surveillance center requested an observation in that cell and time step.
The
The learning task consists of building a predictive model for each cell i of whether the system expects a goal to appear at i in the near future. Therefore, each example should contain information about the observations made in the past S time steps, as well as whether a goal appears exactly H time steps into the future. S defines the number of previous time steps that are taken into account when making the prediction, and H is the prediction horizon. Our learning system thus uses two parameters, whose values we study in the experimental section.
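The role of S and H can be illustrated with a small sliding-window sketch in Python. It is a hedged illustration rather than the exact example format used by the system: the function name `build_examples`, the flattening of the window into a single attribute tuple, and the exact time indexing (window ending at t, class read at t + H) are assumptions.

```python
def build_examples(observations, goal_appeared, S, H):
    """Build attribute-value training examples for a single cell.

    observations[t]: tuple of state features observed in the cell at time t.
    goal_appeared[t]: True if a new goal appeared in the cell at time t.
    S: number of past time steps used as attributes.
    H: prediction horizon (class is read exactly H steps ahead).
    Returns a list of (attributes, label) pairs.
    """
    examples = []
    last_t = len(observations) - H  # the class at t + H must be known
    for t in range(S - 1, last_t):
        window = observations[t - S + 1:t + 1]  # the S most recent observations
        attributes = tuple(value for obs in window for value in obs)
        label = goal_appeared[t + H]
        examples.append((attributes, label))
    return examples


# Toy data: (seismic, temperature) readings for one cell over five time steps.
obs = [(0, 1), (1, 1), (3, 3), (2, 0), (0, 0)]
goals = [False, False, False, True, False]
examples = build_examples(obs, goals, S=2, H=1)
```

With S = 2 and H = 1, the example built at t = 2 has attributes `(1, 1, 3, 3)` and a positive class, because a goal appears at time step 3. Larger values of H leave fewer labelled examples for the same trace, since the class of the most recent H time steps is not yet known.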
Second, a training example can be generated for each cell i and time step t from a set of environment observations:
The set of training examples will be:
Building a goal predictive model
Once the system has the examples, it can use a learning algorithm to generate a predictive model of future goal appearances. We learn a single model for all cells. In case different cells have different behaviors with respect to goal appearances, the learning system can use the
Our approach does not need a particular learning algorithm. The only restrictions are that it has to handle the representation formalism of the examples, and it has to deal with classification tasks (the class is discrete). As we mentioned, in our previous work we used a relational representation. We therefore had to use a relational learning algorithm such as TILDE [3], which learns relational decision trees from structured examples based on predicates. In this paper, we use an attribute-value representation, so we can use any standard learning technique that solves classification tasks. The input of the learning algorithm is the set of training examples generated in the previous step. The output is a predictive model, L. The resulting predictive model is provided as input to the
In an off-line learning setting, the system would collect all examples from one or several runs and learn from them. In an on-line setting, such as the one in this paper, examples arrive over time and the system learns every time new examples arrive. Given that the reasons (pattern) for the appearance of goals can change over time, there might be concept drift. Therefore, we add new examples and re-train at each time step.
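The on-line setting, where new examples are added and the model is re-trained at each time step, can be sketched as follows. The learner here is a deliberately simple stand-in (a majority-vote lookup table over attribute vectors), chosen only to keep the example self-contained; it is not the learning algorithm used in the experiments, and the names `MajorityTableClassifier` and `online_step` are hypothetical.

```python
from collections import Counter, defaultdict


class MajorityTableClassifier:
    """Stand-in learner: predicts the most frequent class seen for an
    attribute vector, falling back to the overall majority class."""

    def __init__(self):
        self.table = {}
        self.default = False

    def fit(self, examples):
        counts = defaultdict(Counter)
        overall = Counter()
        for attributes, label in examples:
            counts[attributes][label] += 1
            overall[label] += 1
        self.table = {a: c.most_common(1)[0][0] for a, c in counts.items()}
        self.default = overall.most_common(1)[0][0] if overall else False

    def predict(self, attributes):
        return self.table.get(attributes, self.default)


def online_step(examples, new_examples, learner):
    """One time step of on-line learning: store the newly labelled
    examples and re-train the model on everything collected so far."""
    examples.extend(new_examples)
    learner.fit(examples)
    return learner


learner = MajorityTableClassifier()
collected = []
# Early time steps: high readings are followed by a goal.
online_step(collected, [((3, 3), True), ((0, 0), False)], learner)
before_drift = learner.predict((3, 3))
# Later time steps: the pattern changes, high readings no longer lead to goals.
online_step(collected, [((3, 3), False)] * 3, learner)
after_drift = learner.predict((3, 3))
```

In this toy run the prediction for `(3, 3)` flips once the newer examples outnumber the older ones. With this particular stand-in, adaptation to drift is only as fast as new evidence outweighs old evidence, so a real learner may behave differently.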
Generation of predicted goals
In the last step of learning-driven goal generation, the
Experimental setting
In this section we cover the simulator we have implemented for testing our approach, as well as the input variables and metrics used in the comparison.
UAV simulator
In our previous work, we used an existing urban traffic simulator, SUMO [2]. Given that we did not have access to a corresponding UAV simulator we built a simulator for our tests. This simulator allows us to define different stochastic scenarios under diverse settings. The simulator receives two files:
Static configuration file: it contains the static characteristics of the simulation, such as the map size. For the experiments, we have generated a
Screenshot from the surveillance UAV simulator. In this example scenario there is an active volcano on the top right side of the grid and a lake on the bottom left.
Dynamic configuration file: it contains the dynamic characteristics of the simulation. In particular, for each time step t, it contains
At each time step t, the simulator will receive one action from the
In the experiments, we have defined two kinds of goal appearance patterns: random and pattern-based. Some goals appear randomly. Other goals follow an appearance pattern that depends on the observations. We will define in each experiment what pattern we are using.
First we introduce the parameters we will vary through the simulation and then we present the employed metrics. We will study the impact of varying the main parameters that can affect the
Anticipation horizon H: we use the values one, three and five.
Noise in the appearance of the predicted goals: we use the values 0%, 20% and 40%. The noise level represents the probability that a goal does not appear even if the pattern indicates it. We stop at 40%, because with higher values no appearance pattern would be generated.
Goal ratio: ratio between the number of goals that follow an appearance pattern and the total number of goals. The latter is the sum of the randomly generated ones and the ones that follow a pattern. Both numbers are computed over the whole simulation. We test five different values:
Number of goal appearance patterns that occur at the same time step (one per cell maximum): from one to five.
Number of agents: only one UAV pursuing the goals.
Exogenous events produced by the simulator: seismic activity and earth temperature. These parameters take values from zero to three. A volcano eruption is correlated with high values of both parameters; that is, an eruption is likely to occur when both are high. The frequency of eruptions depends on these values and is varied over the experiments to test the system’s capabilities.
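As an illustration of how the noise parameter interacts with a goal appearance pattern, the following Python sketch generates pattern-based goals from the two exogenous variables. The concrete pattern (both readings at their maximum value of three) and the function names are assumptions made for this example; the simulator's actual patterns are defined per experiment.

```python
import random


def pattern_fires(seismic, temperature, threshold=3):
    # Hypothetical pattern: both readings (range 0..3) reach the threshold.
    return seismic >= threshold and temperature >= threshold


def goal_appears(seismic, temperature, noise, rng):
    """Return True if a pattern-based goal appears at this time step.

    noise: probability that the goal does NOT appear even though the
    pattern indicates it (0.0, 0.2 or 0.4 in the experiments).
    """
    if not pattern_fires(seismic, temperature):
        return False
    # The goal is suppressed with probability `noise`.
    return rng.random() >= noise


rng = random.Random(0)  # seeded for reproducibility
trials = 10000
appeared = sum(goal_appears(3, 3, 0.4, rng) for _ in range(trials))
appearance_rate = appeared / trials
```

With 40% noise, roughly six out of ten pattern occurrences still produce a goal, which gives an intuition for why the learned model becomes less reliable as noise grows.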
We run each simulation for
In the experiments the agent pays an increasing penalty of one for each time step when a goal that has already appeared (is active) has not been achieved. The penalty values P obtained by each approach at the end of the simulation depend on the number of goals. Thus, analysis of results can be more difficult to perform. We propose a metric that normalizes the penalty paid by each approach. We denote this metric with
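The raw penalty P described above can be computed with a short Python sketch. It assumes that each goal is summarized by the time step at which it appears and the time step at which it is achieved, and that the time step of achievement itself is not charged; both the function name `accumulated_penalty` and this charging convention are assumptions of the example. The normalized metric is not reproduced here.

```python
def accumulated_penalty(goals, horizon):
    """Total penalty over a simulation of `horizon` time steps.

    goals: list of (appear_t, achieved_t) pairs; achieved_t is None if
    the goal is still pending when the simulation ends.
    One penalty unit is paid per goal for every time step in which the
    goal has already appeared but has not been achieved.
    """
    total = 0
    for appear_t, achieved_t in goals:
        end = horizon if achieved_t is None else min(achieved_t, horizon)
        # A goal achieved at or before its appearance (thanks to
        # anticipation) contributes no penalty.
        total += max(0, end - appear_t)
    return total


# Reactive agent: starts moving when the goal appears at t=10, arrives at t=14.
reactive = accumulated_penalty([(10, 14)], horizon=25)
# Anticipatory agent: predicted the goal and is already in place at t=10.
anticipatory = accumulated_penalty([(10, 10)], horizon=25)
```

In this toy comparison the reactive agent pays four units and the anticipatory agent pays none, which is the effect that the experiments measure at scale.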
Experiments and results
In the first experiment we modify three of the parameters that can affect
Influence of parameter settings in lgg-ap
We begin with a one step anticipation pattern (

Sample of observations generated in the volcano cell.
Firstly, we want to test the performance of the two approaches over time, and the results are shown in Fig. 5. When

Accumulated penalty of
We then conduct several experiments where we modify the parameters described in the experimental setting. The results are shown in Fig. 6. The difference between
As the noise levels are bigger,

From left to right, top to bottom: one, three and five time steps anticipation. The x-axis represents the ratio between the number of goals that follow an appearance pattern and the total number of goals. The y-axis represents the value of
We obtain better results anticipating the goals one or three time steps rather than five. In this case
Even in the worst case, in which the agent learns an inaccurate predictive model,
In this experiment we study how the number of goal appearance patterns influence

Normalized penalty obtained by using different number of goal appearance patterns. The x-axis represents the number of appearance patterns that could occur at the same time. The y-axis represents the value of
In the previous experiments we have shown that
To test the capability of adapting to new goal patterns, we generate a one time step anticipation pattern in the volcano cell and we change it to a random pattern at time step 850. At that time step, pattern-based goals will not appear any more in the volcano cell and pattern-based goals will appear in the lake’s cell. The appearance of goals in the lake will be correlated only with the temperature variable unlike the previous pattern that also depends on the seismic value.

Accumulated penalty paid by Reactive Planning and
As we can see in Fig. 8,
Related work
Most works in the context of goal reasoning have focused on the Goal-Driven Autonomy (GDA) conceptual model [9,20]. A GDA agent generates a plan to achieve a given goal together with its expectations; i.e., the set of constraints that are predicted to hold in the partial states generated when executing the plan. The agent monitors the environment for discrepancies between its expectations and its observations during execution. If the expectations do not match the observed states or if the current plan fails, the GDA agent can formulate a new goal [11,23]. The first works on GDA formulated new goals using rule-based principles, which describe situations where specific goals should be generated [8]. These rules were hand-crafted by a domain expert.
Few works have studied the addition of learning capabilities to the agents in the goal reasoning process. Powell et al. extended the ARTUE agent [20] with the ability to learn goal selection knowledge through interaction with an expert [25]. They framed this as a case-based supervised learning task that employs active learning. Unlike their approach, we do not need interaction with a human to formulate goals, since we learn this ability by collecting examples from the agent’s execution in an environment, which can be real or simulated.
Without the need for human interaction, Jaidee summarized some work on creating GDA agents capable of automatically acquiring knowledge using Case-Based Reasoning (CBR) and Reinforcement Learning (RL) methods [19]. In this case, the problem domains are Real-Time Strategy (RTS) games, more specifically DOM and Wargus. Weber et al. implemented a method that also uses CBR and intent recognition in order to build GDA agents that learn from demonstration [32]. They applied the approach to build an agent for the RTS game StarCraft. Molineaux and Aha employed a variant of FOIL [27] to learn models of unknown exogenous events in partially observable, deterministic environments and showed how they can be used by a GDA agent [22]. They implemented this learning method in FOOLMETWICE, an extension of ARTUE. Maynord et al. employ TILDE [3], a relational learning algorithm, to learn a decision tree for goal prediction in the blocksworld planning domain [21]. Finally, Gopalakrishnan et al. learn goals from planning traces in planning domains [17]. Our work differs from these in that they do not generate or reason with possible upcoming goals, as we do. While they learn from world states in isolation, we take into account the time context, as we plan with goals predicted by an on-line learning model.
Regarding the concept of Anticipatory Planning we are addressing in this paper, while the idea of using Automated Planning taking into account possible upcoming goals comes from previously presented works [5,6], our work differs from theirs in some aspects. While they assume they know a priori the goal arrival distribution, we are learning it through the collection of examples from the system’s execution. Another difference is that they use a special planner that reasons internally with the upcoming goals distribution and its penalties. We propose to use a classical planner, incorporating either the current or the possible upcoming goals and replanning when a new goal appears.
Discussion
In this paper we have presented an architecture that allows the design and implementation of autonomous agents with learning capabilities. Using this architecture for a small UAV domain, we have shown that an agent can discover opportunities and adapt its behavior as the surrounding environment changes following a concept drift approach.
We have gone further in the goal reasoning concept, letting the agent not only reason with the current state of the world but also with the possible near future. If the agent discovers a goal before it really appears, it can start the planning process sooner, improving its performance.
Finally, we have enumerated the requirements that a domain must fulfill in order to successfully apply Anticipatory Planning. We have presented a list of such domains and selected one of them to perform the experiments. Through a surveillance UAV domain, we have carried out some experiments in order to discuss the main characteristics and parameters that affect
Future work
In future work, we would like to handle goals that must be achieved by a given time because they disappear, or goals with a non-homogeneous penalty distribution, as found in some real-world domains. We would also like to explore
Acknowledgements
This work has been partially supported by Spanish MINECO funded project TIN2014-55637-C2-1-R. We would also like to thank the anonymous reviewers for their useful comments that helped us greatly improve the paper.
