Abstract
HTN (Hierarchical Task Network) planning has shown its effectiveness and potential in emergency decision-making. In realistic emergency conditions, the temporal preferences should be considered. This paper proposes a temporal HTN planner, TPHTN (Temporal Preferences HTN), to handle the temporal constraints with preferences. The planner uses STNP (Simple Temporal Networks with Preferences) to express the temporal preferences information, and extends the operator and method to express the temporal preferences information in planning domain knowledge. In the planning process, TPHTN propagates the STNP top-down, and three definitions of level consistency are proposed to estimate the quality of STNP, and a new heuristic search is designed to choose the suitable operator/method to apply based on the quality of the corresponding STNP. When the planning process is terminated, TPHTN will generate a plan satisfying the decision-makers preferences. Finally, the effectiveness and practicability of TPHTN are demonstrated with emergency logistics problems.
Introduction
Emergency decision-making [1, 2] is used to respond the emergency events, such as floods, forest fires, terrorist attacks. Emergency decision-making involves not only how to perform actions, but also involves what actions to perform. The conventional mathematical model is not good at solving such emergency decision-making problem, which needs to consider numerous action reasoning [3]. Automated planning, one of the most important research fields of AI (Artificial Intelligence), is the reasoning side of acting [4]. Especially, HTN (Hierarchical Task Network) planning [5, 6] can generate the plan quickly by using domain knowledge. In particular, HTN planning based emergency decision-making approach has attracted growing interest and been studied extensively recently [7–10]. In many realistic conditions, the efficiency of the emergency plan execution is influenced greatly by the quality of the plan, and the decision-makers need to choose the suitable plan to execute from multiple plans. The previous works evaluate the emergency plans and choose the suitable one after the plans are generated [11–13], and they do not consider how to generate the high quality plan. In additional, the previous HTN planners rarely are designed for generating the high quality plan [6]. This paper will consider the temporal preferences of the emergency plan and try to generate the high quality plan satisfying the decision-makes preferences.
Temporal constraint [14] is one of the core characteristics in the realistic emergency decision-making problem. The decision-makers have different preferences when the plan is executed at different times. In detail, they have different preferences for the temporal variables in the plan are assigned with different values. Take the example of a resource dispatching problem in emergency decision-making. When the location G is suffered by a disaster, the Emergency Response Center orders an emergency task: transporting the emergency materials and emergency equipments to location G before 12AM. The transport teams are located at different locations A, B and C. The emergency materials and equipment are located at locations C, D and E. This is a temporal planning problem, and it has multiple valid plans. It’s assumed that there are two valid plans. One plan plan1: the emergency materials are transported to G at 10AM, the emergency equipments are transported to G at 12AM. The other plan plan2: both the emergency materials and emergency equipments are transported to G at 11AM. Most decision-makers would hope the emergency materials and emergency equipments are transported to the destination at the same time. So they will prefer plan2 to plan1, although both of the two plans are valid plans. Therefore, it is necessary to consider temporal preferences in HTN planning for emergency decision-making. There are two challenges to solve this problem: first, how to express the temporal preferences information in the planning; second, how to generate the high quality plan which satisfies the decision-makers preferences from the huge planning search space.
There are several HTN planners handling temporal constraints, such as Nonlin [15], O-Plan [16], and SIADEX [9]. STNs (Simple Temporal Networks) have been used to express the temporal constraints [17, 18] in planning. These planners check consistency of the STN associated with the partial plan and to ensure the temporal constraints of plan are satisfied. Based on this strategy, our research team developed an HTN planner named XEPlanner [7]. XEPlanner extends the operator, method, axioms of the conventional HTN planning to express temporal constraints among the primitive tasks and non-primitive tasks. XEPlanner meets the requirement of emergency decision-making with complex temporal constraints. All the above planners focused on generating a valid plan, but do not consider which plan is more preferred than the others.
Preference-based planning (PBP) [19] has attracted much interests in recent years. Most of the PBP planners are based on the IPC(International Planning Competition)’s PDDL3 (Planning Domain Definition Language 3) language [20, 21]. Furthermore, PDDL3 has recently been extended for preference-based HTN planning [22–24]. There are two ways to manage and handle preferences. One is advocating the qualitative approach [25, 26] and the other is advocating the reward-based approach [27]. The qualitative approach uses a partial order to describe the preferences, the main disadvantage of this approach is that the use of a partial order introduces the complication of there being many incomparable plans [21]. The reward-based approach uses award or punishment value to describe the preferences when the constraint is satisfied or violated. Unfortunately, it can only express the discrete preferences, while there are many continuous preferences in temporal constants in the realistic emergency decision-making problem.
Aiming to generate the plan satisfying the decision-makers temporal preferences, we present a novel temporal HTN planner (Temporal Preferences HTN, TPHTN) to handle temporal preferences. Firstly, TPHTN uses a new expression formalism to express the temporal constraints with preferences in the planning domain knowledge; secondly, TPHTN uses a new heuristic search to guide the planning direction, the temporal preferences information, which is encoded by STNP, is propagated top-down in the planning process, and three definitions of level consistency are proposed to estimate the quality of STNP. TPHTN chooses the suitable method/operator to apply based on the quality of the corresponding STNP. Unlike the other temporal planners, TPHTN can not only generate the valid plan, but also can generate the plan which will satisfy the decision-makers preferences better.
The paper is organized as follows. Section 2 presents the description of the HTN planning problem with temporal preferences. Section 3 presents the domain representation in TPHTN. Section 4 proposes the planning algorithm of TPHTN. In Section 5, some emergency logistics problems are used to validate the effectiveness of TPHTN. Finally, the conclusions and future works are given in the Section 6.
Problem description
The emergency decision-making problem can be represented by an HTN planning problem. The current emergency situation can be described as the initial state of the planning problem, the emergency task set is the initial task set of the planning problem, the operator set in the planning problem is translated by the actions which can be used to respond the emergency event, the method set is decoded by the domain knowledge from the emergency response expert. The solution (i.e., a plan) of the HTN planning problem is an emergency response plan for the emergency decision-making problem.
In this paper, we translate the emergency decision-making problem into an HTN planning problem. The planning problem has the following assumptions: (1) the planning state is finite; (2) the planning system is fully observable; (3) the action is deterministic, i.e., if a action can be applied, after the action is applied, the current set of states is deterministic; (4) the decision-makers temporal preferences information is known.
The conventional definition of HTN planning problem [4] does not consider the decision-makers preferences. So, we define a new concept for the problem in this paper as following.
The metric preference function can not only express discrete preference information, but also can describe the continuous preferences. We will detail the preference function in Section 3.1.
The preference degree is used to measure the quality of the plan. We compute the preference degree of a plan by the lowest preference value of the temporal variables in the plan. The plan with higher preference degree will satisfy the decision-makers preference better. In this paper, the aim of TPHTN is to generate the plan with higher preference degree.
Domain representation for TPHTN
TPHTN extends STN to express the temporal preferences, and uses temporal operator and temporally-enhanced method to express the temporal preferences in HTN planning domain.
Temporal constraints with preferences
The temporal constraints in this paper are encoded by STNs with preferences.
Four common illustrations of temporal preference functions in TPHTN are shown in Fig. 1. The general preference function is shown in Fig. 1(1), the decision-makers hope the temporal variable is assigned by the value in the interval [T3, T4] in its domain [T1, T2]. For the interval [T1, T3], the decision-makers preference is increasing linearly from 0 to 1. While, the decision-makers preference is decreasing linearly from 0 to 1 for the interval [T4, T2]. There are three varieties of temporal preference functions as shown in Fig. 1 (2), (3) and (4). Aiming to express temporal preferences in TPHTN planning process, a new formalism of temporal function is given. For a temporal consistent, the tuple ((T1, μ1) , (T3, μ3) , (T4, μ4) , (T2, μ2)) is used to express the decision-makers preference in its domain [T1, T2]. Specifically, the decision-makers preference changes linearly from μ1 to μ3 for the interval [T1, T2]. The rest intervals, and so on. The preference function in Fig. 1 (1) can be denoted by ((T1, 0) , (T3, 1) , (T4, 1) , (T2, 0)), the preference function in Fig. 1 (2) can be denoted by ((T1, 0.5) , (T3, 1) , (T4, 1) , (T2, 0)), the preference function in Fig. 1 (3) can be denoted by ((T1, 0) , (T3, 1) , (T2, 1) , (T2, 1)), the preference function in Fig. 1 (4) can be denoted by ((T1, 1) , (T1, 1) , (T2, 1) , (T2, 1)).
Temporal operator
The operator in TPHTN is a durative action operator [28], and stamped by the temporal constraints with preferences information.
An example of an operator in TPHTN is shown in Fig. 2. It represents driving a truck in emergency transport domain. The instantaneous condition of this operator must be satisfied before it is applied (i.e. the truck must be in the departing location; there must be enough fuel on the truck; and there must be a road connecting the departing location and the destination). The continue condition of this operator must be satisfied when it is applying (i.e. there must be a driver in the truck to drive it). The end condition of this operator must be satisfied when it will be ended (i.e. the preparation for receiving the truck must be done at the destination). When the operator start to be applied, the instantaneous delete effect is that the truck is at the departing location, the instantaneous add effect is that the truck is on the road. When the operator is achieved, the delete effect is that the truck is on the road and the add effect is that the truck is at the destination. The temporal consistent of this operator is that the duration time between 4 and 10, considering that if the truck is drive too fast, the risk will rise, meanwhile, if the duration is too long, the other tasks may be affected, so the preference of this temporal consistent is that: the duration between 6 and 7 is best (i.e. the preference degree is 1), the preference degree is increasing linearly from 0 to 1 if the duration is assigned between 4 and 6, the preference degree is decreasing linearly from 1 to 0 when the duration is assigned between 7 and 10.
Temporally-enhanced method
An example of a method in TPHTN is shown in Fig. 3. It represents emergency transport strategy. The compound task transporting a cargo can be decomposed by this method. The condition must be satisfied when the task is decomposed by this method, the task is decomposed to four subtasks by this method: truck refueling, driver readying, boarding cargo and driving truck. The decision-makers wish the truck is drive to the destination as soon as possible after the cargo is boarded. The detailed temporal constraints with preferences in this method are shown in Fig. 3.
Planning algorithm of TPHTN
TPHTN planning process
In the planning process of TPHTN, the STNP of planning problem, which is initialized by the initial tasks and the temporal constraints with preferences, is propagated and updated top-down. Specifically, when a primitive task is applied by an operator, the STNP is updated by adding the temporal constraints with preferences of the operator; when a compound task is decomposed into subtasks by a method, the temporal constraints with preferences in STNP of the compound task are propagated to the temporal constraints of the subtasks based on the propagation rules, which is similar of the hierarchical resource propagation rules [8], and the STNP is updated by adding the temporal constraints with preferences related on the subtasks.
In this paper, we measure the quality of a plan by using its preference degree. The preference degree of the plan depends on the STNP associated with the plan. So, we propose an STNP-based heuristic search for TPHTN. The overall framework of this planning algorithm is shown in Fig. 4. In this planning algorithm, three definitions of level consistency are proposed to estimate the quality of STNP (We will define the three level consistencies in Section 4.2). TPHTN will choose the suitable method/operator to apply based on the quality of the corresponding STNP. When a complete plan is generated, the STNP associated with the plan is got. TPHTN needs to solve the STNP problem, which is a constraint optimization problem [29]. Finally, TPHTN will get a plan satisfying the decision-makers preferences.
Level consistencies of STNP
The conventional consistency checking is used to analyze STN. But it cannot estimate the quality of STNP. In this section, three definitions of level consistency for STNP are proposed to estimate the quality of STNP.
The α-level consistency of an STNP (X, C, F) denotes that the preference degree of all assignment for (X, C, F) are not less than α.
The β-level consistency of an STNP (X, C, F) denotes that the preference degree of all assignment for (X, C, F) are not greater than β.
If an STNP is γ-level consistent with γ, there must be an assignment, the preference degree of the assignment is not less than γ.
For two STNP ω1 and ω2, if ω1 is γ-level consistent with γ while ω2 is γ-level inconsistent with γ, it can infer that the preference degree of result for ω1 is greater than ω2. In practice, the maximum γ for the γ-level consistency of an STNP is difficult to get. TPHTN estimates the γ-level consistency of an STNP based on its α-level consistency and β-level consistency.
Heuristic rule
Aiming to generate the better plan, the heuristic search is designed to choose the suitable method/operator to apply in TPHTN. When there are alternative methods to decompose the current task, TPHTN tries all the valid methods and generates the STNPs, then TPHTN estimates the qualities of method based on the qualities of STNPs and choose a suitable method to decompose the compound task. It is assumed that there are two valid methods m1 and m2, the STNP ω1 is generated when TPHTN uses m1 to decompose the complex task, while the STNP ω2 is generated when TPHTN uses m2 to decompose the complex task. The degree of α-level consistency and β-level consistency of ω1 are α (ω1) and β (ω1), the degree of α-level consistency and β-level consistency of ω2 are α (ω2) and β (ω2). TPHTN choose one method to decompose the current task based on the following heuristic rules: If β (ω1) ≤ α (ω2), TPHTN chooses m2 firstly; If β (ω2) ≤ α (ω1), TPHTN chooses m1 firstly; If both (1) and (2) are not true, then TPHTN chooses several γ randomly from [γ
min
= max {α (ω1) , α (ω2)} , γ
max
= min {β (ω1) , β (ω2)}]. In practice, TPHTN chooses the extreme points and trisection points. If there is a γ, ω1 is γ-level consistent with γ while ω2 is γ-level inconsistent with γ, TPHTN chooses m1 firstly; if there is a γ, ω2 is γ-level consistent with γ while ω1 is γ-level inconsistent with γ, TPHTN chooses m2 firstly; else, TPHTN believes the methods m1 and m2 are equal and chooses one to apply randomly.
If there are more than two valid methods to decompose the current compound task, TPHTN computes multiple times and compares two methods each time, and a suitable method is chosen to apply finally. In the planning process, if there are multiple valid operators for the primitive task, TPHTN chooses one operator to apply based on the same procedure.
TPHTN planning algorithm
The TPHTN planning algorithm is shown in Algorithm 1. The input of TPHTN is an HTN planning problem with temporal preferences (s0, T, O, M, C, F). Using a heuristic search, TPHTN will generate a plan that will satisfy decision-makers preferences better. The main process of the planning algorithm consists of the following six steps: Initializing the variables: the variable plan, storing the current partial plan, is initialized with a null plan, the current planning state s is initialized with s0, the current STNP is initialized with (T, C, F). Checking the complete plan: if T = 0, TPHTH has generated a complete plan plan and an STNP associated with plan, go to STEP 6; else, go to STEP 3. Choosing a task: non-deterministically chooses a task t which is no other task in T to be preceded, if t is a primitive task, go to STEP 4; if t is a compound task, go to STEP 5. Applying the operator: choose one operator by the heuristic rules in Section 4.2 to apply (Lines 10, 11, Algorithm 1), STNP ({T, plan} , C, F) is updated by adding the temporal constraints with preferences of a (Line 12, Algorithm 1). Then, go to STEP 2. Applying the method: choose one method by the heuristic rules in Section 4.2 to apply (Lines 19, 20, Algorithm 1), and update the STNP ({T, plan} , C, F) by propagating the temporal constraints with preferences of the compound tasks and adding the temporal constraints with preferences of m (Line 21, Algorithm 1). Then, go to STEP 2. Solving the STNP: solve the STNP problem (plan, C, F) (Line 25, Algorithm 1), TPHTN generates a fixed plan.
Local search algorithms are naturally suited for dealing with the constraint optimisation problem [29], the STNP problem is a kind of constraint optimisation problem. In this paper, we use local search algorithm to solve the STNP problem and assign a fixed value for the temporal variable in plan. Finally, TPHTN outputs a fixed plan.
Experimental study
Emergency logistics distribution is one of the most important typical emergency decision-making problem. Large amount of emergency materials are required to be allocated and transported in limited time for disaster relief [31]. It involves numerous emergency departments and materials, both the logistical reasoning and temporal reasoning must be handled at the same time in the problem. Furthermore, the temporal preferences should be considered in this experiment. In this section, a series of emergency logistics planning problems are designed to demonstrate the capacity of TPHTN for emergency decision-making.
Emergency logistics domain and problems
The general emergency logistics domain involves transporting emergency materials to satisfy series of material demands. A typical emergency logistics can be described as follows: the regions A1, A2 and A3 are demanders (e.g., affected areas). The regions B1, B2, B3, B4 and B5 are supplies that store p1, p2, p3 and p4 four categories of emergency materials. The regions C1, C2, C3, C4 and C5 are transport interchange stations. There are seven transport teams for the emergency response. The initial locations of transport teams are supplies or transport interchange stations. One transport team can transport specified categories of emergency materials, the amount of material transport every time must not exceed the capacity of the transport team. Every transport team has a minimal speed Speed _ min and a maximum speed Speed_max. The detailed information of those transport teams is given in Table 1. In this experiment, the temporal constraints with preferences of one transport team moving is (T1 ≤ End - Start ≤ T2, ((T1, 0.5), ((T1 + T2)/2, 1), (0.2 * T1 + 0.8 * T2, 1) , (T2, 0.5))), where T1 is the time that the transport team moves with minimal speed, and T2 is the time that the transport team moves with maximum speed. For the supply region, the emergency materials are needed to be dispatched as soon as possible. For one demander region, it is preferred that the relevant emergency materials are arrived at the same time. The congestion of road is not be considered in this experiment.
Aiming to demonstrate the effectiveness of TPHTN, ten emergency logistics planning problems are designed. For each problems, the top tasks and temporal constraints with preferences are different. The detailed information is given in Table 2.
Experimental results
This experiment compares TPHTN with the state-of-the-art HTN planner SHOP2 [32]. SHOP2 is a well-known HTN planner and has been used in many practical domains [33]. The emergency decision-making planners XEPlanner [7] and REHTN [8], which have been proposed by our research team, have been implemented based on SHOP2 and work well. In addition, to our knowledge, there is not a planner that can handle the kind of temporal preferences proposed in this paper. Based on the above reasons, we choose SHOP2 as the comparison planner. In this experiment, when SHOP2 is used to solve these problems, the speed of transport team are set by the average speed of the transport team. The preference degree of these plans are computed based on the temporal preference functions after the plans are generated by SHOP2.
Plan preference degree
In this section, we compare the quality of the plan generated by TPHTN and SHOP2. The above ten emergency logistics planning problems are solved by TPHTN and SHOP2. The comparison results are shown in Fig. 5.
According to the results in Fig. 5, we can see that the plans generated by TPHTN have higher preference degrees than the plans generated by SHOP2, although both of them can generate valid plans for the ten emergency logistics planning problems. For some complex problems (such as Problem 5, Problem 8 and Problem 10), the quality of plans generated by SHOP2 are very poor, while TPHTN can generate much better plans. The reason is that the temporal preferences information is used to guide the planning direction in TPHTN, while the temporal preferences information is not considered in the planning process of SHOP2. TPHTN can choose the suitable branch to search and generate the better plan based on the heuristic search.
Planning time
In the emergency decision-making process, it requires making emergency plan quickly to response the emergency events. In Fig. 6, it shows the planning time to get the plan by TPHTN and SHOP2.
From the results in Fig. 6, we can find that TPHTN needs more planning time than SHOP2 to get the plan. In the planning process, TPHTN estimates the quality of method/operator and needs to solve the constraint optimization problem, both of them cost extra planning time. But, the branch guided by heuristic search is just the branch which exists a possible valid plan. This may improve the efficiency of TPHTN to generate the plan. After all, TPHTN costs more planning time than SHOP2, but it is in an acceptable range.
Conclusions
The preferences in temporal constraints, which is the important issue in emergency decision-making, affects the quality of the emergency response plan. Aimed to handle the temporal preferences effectively and generate the plan satisfying the decision-makers preferences, this paper proposes a novel HTN planner, TPHTN. When an emergency event occurs, the emergency decision-makers need to make an emergency response plan quickly. The decision-makers identify the emergency tasks and the preferences information based on the emergency situation. Then, TPHTN will be used to generate the response plan. The main contributions of TPHTN are: (1) A new expression formalism is proposed in this paper. STN is modified to STNP for expressing the temporal preferences in planning problem. TPHTN extends the conventional HTN operator/method to express temporal preferences in HTN domain knowledge. The preferences information is expressed by preference function, which can express the continuous preferences. (2) A heuristic search algorithm is used in TPHTN to guide the planning direction. Three definitions of level consistency are proposed to estimate the quality of STNP, which is used as heuristic information to evaluate the quality of operator/method. The heuristic search chooses the suitable operator/method to apply based on theirqualities.
TPHTN can generate the plan satisfying the emergency decision-makers preferences. The experiments validate the effectiveness and practicality of TPHTN. However, the STNP in this paper is a simplified version of TCSPP (Temporal Constraint Satisfaction Problems with Preferences) [34], it cannot express the more complex decision-makers preferences [35]. Our ongoing work is to design some new algorithms to handle the more complex preferences information. In some realistic emergency decision-making problems, the preferences from different departments may be inconsistency or incomplete [36]. It is a challenge to aggregate their preferences information. Additionally, the quality of emergency response plan is referred not only to temporal preferences, but also to resource preferences. It is a meaningful and interesting issue to generate the plans satisfying the decision-makers various preferences.
Acknowledgments
This work is supported by National Science Foundation of China Grant No. 71371079, National Science Fund for Distinguished Young Scholars Grant No. 71125001 and Fund of the Hubei Collaborative Innovation Center for Early Warning and Emergency Response Technology No. JD20150103.
