Abstract
We describe an extension of the Tactical Battle Manager, which uses goal reasoning techniques to control unmanned air vehicles in simulated scenarios of beyond-visual-range air combat. Our prior work with the Tactical Battle Manager focused primarily on behavior recognition, the task of identifying the behaviors being performed by hostile aircraft. In this article, we instead focus on distributed discrepancy detection and response. We also describe an ablation study for which we report evidence that these discrepancy management components improve mission success.
Introduction
Goal reasoning agents can reason over their goals and dynamically modify them in response to notable events or changes in their environment models [12]. This capability provides an advantage over hard-coded or scripted AI, since it allows the agents to react to changing conditions without explicitly encoding all ways in which the environment can change. However, this requires the ability to detect discrepancies between expected environment states and what is actually observed. In this article, we describe how distributed discrepancy detection was incorporated into the Tactical Battle Manager (TBM), a goal reasoning agent for controlling an Unmanned Aerial Vehicle (UAV) in simulated scenarios of beyond-visual-range (BVR) air combat [1,3,4].
BVR air combat (or, more simply, BVR) is a modern style of air-to-air combat where aircraft engage each other over large distances (i.e., over 100 km) using long-range missiles [14]. This contrasts with the classical dogfighting style, where combat occurs at a significantly faster pace and over much smaller distances. Whereas dogfighting lends itself well to fast-paced reactive strategies, BVR combat allows for longer-term planning and reasoning. Additionally, several characteristics of BVR combat make unexpected events and environment changes likely (e.g., imperfect information, multi-agent environments, adversarial agents, continuous sensor values).
In prior work [1,3,4], we described an earlier version of the TBM that used behavior recognition, goal selection, and automated planning techniques to control a simulated UAV. However, goal selection was limited; it occurred only upon plan completion, when threatened by a missile, or when a human pilot intervened by issuing a new command to the UAV. Thus, most real-time discrepancies, such as those caused by unexpected actions of hostile aircraft, were ignored. Air combat pilots who assessed the behavior of the TBM noted that, due to this flaw, it behaved differently than human pilots and had worse mission performance.
In this article, we describe an extension of the TBM that uses distributed discrepancy detection. The four discrepancies that can be detected are: Model Changed (i.e., a hostile aircraft changed its behavior), Flanking Hostile (i.e., the UAV is being approached by an unexpected hostile), Expectations Violated (i.e., the state of the environment differs from what the UAV expects to observe), and Incoming Missile (i.e., an enemy is attacking the UAV). Each type of discrepancy has a unique detector that is responsible for monitoring subsystems of the TBM, identifying when discrepancies have occurred, and encoding the discrepancies so that other subsystems can respond to them. This allows the TBM to dynamically respond to changing conditions in BVR combat scenarios.
We begin by describing the domain (Section 2) followed by a description of the TBM (Section 3). We then describe four discrepancies that can be triggered and how they are processed by the TBM (Section 4). We next describe our empirical study on the usefulness of these discrepancies (Section 5), where we found evidence that incorporating them improves the TBM’s performance on our BVR air combat scenarios. Finally, we discuss related work (Section 6), the benefits of our approach (Section 7), plans for future work (Section 8), and conclude (Section 9).

Fig. 1. A beyond-visual-range air combat engagement between two teams of aircraft (aircraft size not to scale).
Beyond-visual-range air combat (Fig. 1) involves two opposing teams of aircraft, with each team attempting to destroy their enemy or force them to retreat. Initially, the aircraft are located at very large distances from each other (i.e., hundreds of kilometers) and the engagements occur in large airspaces (i.e., thousands of square kilometers). The long-range missiles used in BVR, with ranges of approximately 50 kilometers, force the aircraft to maintain separation from their enemies. This results in BVR being a deliberative form of combat, with positioning and timing being more important than low-level motion planning.
We use the Advanced Framework for Simulation, Integration and Modeling (AFSIM) system [17] as our simulated BVR domain. AFSIM is a high-fidelity air combat simulator that allows aircraft to be controlled either programmatically (e.g., by an AI agent) or directly by human pilots through replicated hardware (e.g., flight sticks, cockpit consoles). Additionally, it contains numerous features that facilitate scientific studies, including configurable large-scale evaluations, and scenarios involving both AI and human pilots.
AFSIM allows for a variety of aircraft to be modeled and piloted, including both real-world and custom aircraft. For this work, the TBMs control a modified version of an F-16 fighter jet (Fig. 2). The F-16s are modified to remove restrictions necessary to protect human pilots from high g-forces, thereby increasing their speed and turning tolerance. AFSIM provides support for flying and maintaining both absolute (i.e., global coordinates) and relative (i.e., with respect to a target) bearings as well as waypoint-based navigation. Each aircraft is equipped with a payload of eight active radar homing missiles, and built-in actions allow for acquiring a weapon lock and firing.

Fig. 2. Two simulated F-16 fighter jets.
Each aircraft is controlled by either a human pilot or a TBM. At the start of each mission, all aircraft are provided with a mission briefing, which contains information about their team and their enemies. More specifically, the mission briefing includes:
Identity of squadron leader
Each teammate’s aircraft type and missile capabilities
Initial mission goals
Tactical information (e.g., engagement altitude, evading altitude, angle of approach, firing angle, passive speed, approach speed, engagement speed, cornering speed, escape speed)
Weapons information (i.e., expected missile ranges, shot prediction models, defensive shot guidelines)
Number of enemy aircraft
Type of aircraft
Weapons capabilities
During a simulation, each aircraft receives sensory input messages at discrete time intervals. Each sensory input contains positional information (i.e., latitude, longitude, altitude, heading, velocity) for all visible aircraft and missiles. The aircraft have partial observability (i.e., they can only observe objects in their radar range) but have team-wide communication capabilities, allowing them to share sensory input information. Therefore, each aircraft has access to information about the objects it can observe and any objects its teammates can observe.
The TBM controls its aircraft using the following high-level parameterized actions:
Flying in a specified formation: Maintaining a speed, distance, and direction relative to a teammate
Free flying: Flying in a specified direction at a fixed altitude
Flying relative to a target: Towards, away from, to its left, to its right
Firing: Firing a missile at a specified target
The role of the TBM is to use the mission briefing and sensory information to perform actions that intelligently control the aircraft. The following sections detail how the TBM uses this information for reasoning and identifies when unexpected events occur.
The TBM was designed to autonomously control a UAV while allowing for high-level interactions with human commanders or teammates (i.e., delegating tasks to the TBM). The TBM’s architecture (Fig. 3) is divided into shared resources, reasoning components, and discrepancy detectors. (The discrepancy detectors also perform reasoning, but we separate them into their own category for clarity.)

Fig. 3. The Tactical Battle Manager architecture including reasoning components (orange), discrepancy detectors (yellow), and shared resources (gray).
The TBM contains five shared data sources and systems that can be accessed by its components. The data sources serve as a central communication mechanism through which components share information and influence other components. The data sources are: Goals, Plans, Expectations, Discrepancies, and Object Models. Additionally, the TBM has an Internal Simulator that can be used by the components to perform fast, low-fidelity simulations of BVR scenarios. Unlike the reasoning components and discrepancy detectors, the Internal Simulator is an on-demand service rather than a stand-alone process.
Goals
The Goals data source contains the goals that the TBM is attempting to achieve. The representation used for goals differs from traditional representations (i.e., a set of grounded literals to achieve); it is a set of m high-level desires. Each desire has an associated function
Parameterized desires used by the TBM
The environment state is encoded as a tuple
The Goal Manager is responsible for selecting goals (Section 3.3), which are used by the Planner to generate plans (Section 3.4). The TBM can process multiple simultaneous goals, but our current implementation operates on only a single goal (i.e., any previous goals are overwritten).
Any plans that the TBM is currently executing are stored in the Plans data source. Each grounded plan π contains a sequence of actions
Object models
The Object Models data source contains detailed information about any aircraft or missiles that have been observed in the environment. This includes objects that are currently observable as well as any that are no longer visible or have been destroyed. If n unique objects have been observed, the set
As new observations are received from the environment, object models are either created (i.e., if the object has never been observed before) or updated. The Object Models data source is central to both the reasoning components and the discrepancy detectors, since it stores both observable and reasoned data (e.g., other aircraft’s plans and targets).
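The create-or-update behavior described above can be sketched as follows. This is a minimal illustration rather than the TBM’s implementation: the `ObjectModel` class, its field names, and the `update_object_models` helper are hypothetical, and the observation tuple simply mirrors the positional fields listed in the sensory input description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical observation: (latitude, longitude, altitude, heading, velocity).
Observation = Tuple[float, float, float, float, float]


@dataclass
class ObjectModel:
    """Illustrative object model: observed history plus reasoned data."""
    object_id: str
    history: List[Observation] = field(default_factory=list)
    plan: Optional[str] = None    # reasoned data, set by behavior recognition
    target: Optional[str] = None  # reasoned data, set by behavior recognition


def update_object_models(models: Dict[str, ObjectModel],
                         observations: Dict[str, Observation]) -> List[str]:
    """Create models for unseen objects, update the rest; return new IDs."""
    created = []
    for object_id, obs in observations.items():
        if object_id not in models:
            models[object_id] = ObjectModel(object_id)
            created.append(object_id)
        models[object_id].history.append(obs)
    return created


models: Dict[str, ObjectModel] = {}
first = update_object_models(models, {"hostile-1": (36.0, -115.0, 9000.0, 90.0, 250.0)})
second = update_object_models(models, {"hostile-1": (36.0, -114.9, 9000.0, 90.0, 250.0)})
```

Models are never deleted in this sketch, which matches the statement that objects remain in the data source after they leave radar range or are destroyed.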
Expectations
The Expectations data source stores the TBM’s state expectations at discrete points in time. Expectations are stored at two levels of granularity: object-level and desire-level. Object-level expectations
Discrepancies
The Discrepancies data source stores any discrepancies that were identified by the detectors. Each discrepancy d contains a type
For simplicity, our examples will treat the priorities as discrete values (i.e., LOW, MEDIUM, and HIGH), but in practice the values can be continuous.
The components of the TBM have access to a lower fidelity internal simulator that can be used to generate predictions of future environment states or evaluate alternative courses of action. Our internal simulator is a light-weight version of AFSIM; it allows complete simulations to be run in under a second by using less sophisticated flight models, giving aircraft full observability, and providing less frequent sensor updates. This increased computational efficiency yields less precise simulations, but they still provide acceptable near-term forecasting of aircraft behavior.
Behavior Recognizer
The Behavior Recognizer examines each of the objects in the environment and reasons about their behavior. More specifically, each object’s observation history
An aircraft’s target is detected using rules that consider its speed, orientation, and position relative to other aircraft. For example, if an enemy aircraft has been flying towards a teammate aircraft and adjusting its orientation to track the teammate’s movements, the rule-based system labels the teammate as its target. A similar rule-based approach is used for plan recognition. Each aircraft is initially labeled as either Attacking or Evading. Attacking aircraft are assumed to be following a plan that involves flying towards and firing a missile at their target. The assumed plan for an Evading aircraft involves flying away from any enemies and enemy missiles. This approach uses a limited number of high-level behaviors and only a single simple plan for each behavior type, so plan recognition is performed with the understanding that actual plans may deviate noticeably. Our previous work has shown that, while more precise plan recognition is possible, the real-time constraints of the domain and high degree of uncertainty make a simple, high-level approach preferable. For both target and plan recognition, all rules are provided by domain experts.
During mission execution, the Behavior Recognizer will continuously monitor the Object Models for any new observations that are received from the environment. If new observations occur, target and plan recognition are performed on each object that has an updated observation, and their corresponding object models are revised if any changes occurred. The Behavior Recognizer relies on only environmental observations, so no other components of the TBM directly influence its behavior.
Goal Manager
The Goal Manager is responsible for reasoning about the TBM’s current goals and, if necessary, selecting new goals to pursue. Goal changes can be externally or internally motivated. External goal changes occur when explicit commands are received from a superior (i.e., a human teammate or lead aircraft). For example, the squad commander might tell the TBM to aggressively pursue enemies. Any external commands are given priority, so the TBM will always change its goals in response. Internal goal changes result from the TBM’s own decision-making process: successfully achieving its current goal, determining that it has failed or cannot achieve its current goal, or identifying discrepancies. After either success or failure, the TBM no longer has a valid goal and therefore immediately selects a new one. Discrepancies, however, can be either acted on or ignored.
The Goal Manager continuously monitors the Discrepancies data source to see if any discrepancies have been identified. A set of rules is used to evaluate each discrepancy to determine if a goal change is necessary. These rules consider the discrepancy type
For example, consider a case where a ModelChanged discrepancy was created because hostile aircraft
(
Change Goal
(
Ignore Discrepancy
Although the two rules are nearly identical, the distance between the hostile aircraft and the TBM dictates whether the Goal Manager will select a new goal or ignore the discrepancy. The implemented version of the Goal Manager uses a collection of such rules that are provided by a domain expert.
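A distance-dependent pair of rules of this kind might look like the sketch below. It is an illustration under stated assumptions, not the expert-provided rule set: the `Discrepancy` fields, the `NEAR_DISTANCE_KM` value, and the assumption that a nearby hostile triggers the goal change (and a distant one is ignored) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    kind: str        # e.g. "ModelChanged"
    priority: str    # "LOW", "MEDIUM", or "HIGH"
    subject_id: str  # object the discrepancy is about


# Hypothetical value: the article does not state the distance used by the rules.
NEAR_DISTANCE_KM = 80.0


def evaluate(discrepancy: Discrepancy, distance_km: float) -> str:
    """Return the Goal Manager's response to one discrepancy."""
    if discrepancy.kind == "ModelChanged":
        # Assumed direction: a nearby hostile warrants a goal change.
        if distance_km < NEAR_DISTANCE_KM:
            return "Change Goal"
        return "Ignore Discrepancy"
    # Other discrepancy types would have their own expert-provided rules.
    return "Ignore Discrepancy"


d = Discrepancy("ModelChanged", "MEDIUM", "hostile-1")
near_response = evaluate(d, distance_km=40.0)
far_response = evaluate(d, distance_km=150.0)
```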
A new goal is generated based on the current environment state and a set of goal priorities that are supplied during the initial mission briefing. For example, the mission briefing may specify that priority should be given to goals that target a high-value target (e.g., the enemy squad commander). If possible, the Goal Manager will select a high-priority goal. However, a lower priority goal may be selected if the current environment state makes achieving high-priority goals impractical. For example, if the high-priority target is not currently visible to the TBM then it is not currently possible to target that aircraft, so the TBM will select a lower priority goal.
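The fallback from a high-priority goal to a lower-priority one can be sketched as a filtered, priority-ordered search. The goal strings, numeric priorities, and the `is_achievable` predicate are hypothetical stand-ins for the briefing priorities and the Goal Manager’s feasibility checks.

```python
from typing import Callable, List, Optional, Tuple


def select_goal(candidates: List[Tuple[str, int]],
                is_achievable: Callable[[str], bool]) -> Optional[str]:
    """Return the highest-priority goal that is currently achievable."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    for goal, _priority in ordered:
        if is_achievable(goal):
            return goal
    return None  # no candidate goal can be pursued in the current state


# The enemy commander is the high-value target but is not currently visible.
visible = {"hostile-2"}
candidates = [("attack:commander", 3), ("attack:hostile-2", 1)]
chosen = select_goal(candidates, lambda goal: goal.split(":")[1] in visible)
```

Here `chosen` is the lower-priority goal, because the high-value target cannot be engaged while it is outside the observable set.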
The Goal Manager is heavily dependent on the Discrepancy Detectors to correctly identify discrepancies. Our previous work did not include sophisticated discrepancy detection components, so goal changes were only caused by external commands, success, or failure. The addition of discrepancies allows the Goal Manager to react to unexpected external events and opportunities, thereby increasing the number of situations for which it can act intelligently. The output of the Goal Manager, the selected goal, is used by the Planner.
Planner
The Planner generates a plan for the TBM that will achieve its current goal. Our implementation uses a plan-library planner [5], where the library contains a set of ungrounded template plans. Each template plan contains a sequence of actions but the actions do not have their parameters specified. For example, consider the following template plan:
Free fly at a specified heading
Fly directly at a target
Fire at the target
The actions and their ordering are all specified but the heading and target are both ungrounded. During planning, the Planner uses the Object Models to generate multiple grounded instantiations of each applicable plan. In the example plan, this would involve instantiating plans with a variety of headings and target aircraft. Each candidate plan is evaluated using the Internal Simulator to predict the outcome. During simulation, the TBM uses the candidate plan and other aircraft use their currently predicted behavior. The outcome of each simulation is used to measure how well the plan satisfies the TBM’s current goal, and the plan that best achieves the goal is selected. In the event of a tie, one of the plans that best satisfies the current goal is selected at random.
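The grounding and selection steps can be sketched as follows. This is a simplified illustration: `ground_template`, `select_plan`, and the parameter names are hypothetical, and `toy_score` stands in for the evaluation that the TBM performs with its Internal Simulator.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Template = Sequence[Tuple[str, str]]          # (action, parameter name)
GroundedPlan = Tuple[Tuple[str, object], ...]  # (action, parameter value)


def ground_template(template: Template,
                    candidates: Dict[str, Sequence[object]]) -> List[GroundedPlan]:
    """Instantiate the template once per combination of parameter values."""
    names = sorted({param for _action, param in template})
    plans = []
    for values in itertools.product(*(candidates[n] for n in names)):
        binding = dict(zip(names, values))
        plans.append(tuple((action, binding[param]) for action, param in template))
    return plans


def select_plan(plans: List[GroundedPlan],
                score: Callable[[GroundedPlan], float],
                rng: random.Random) -> GroundedPlan:
    """Pick the best-scoring plan, breaking ties at random."""
    scores = [score(p) for p in plans]
    best = max(scores)
    tied = [p for p, s in zip(plans, scores) if s == best]
    return rng.choice(tied)


def toy_score(plan: GroundedPlan) -> float:
    """Stand-in for simulated goal satisfaction (not the TBM's measure)."""
    heading, target = plan[0][1], plan[1][1]
    return (1.0 if heading == 90 else 0.0) + (1.0 if target == "hostile-1" else 0.0)


template = [("free_fly", "heading"), ("fly_at", "target"), ("fire", "target")]
plans = ground_template(template, {"heading": [0, 90],
                                   "target": ["hostile-1", "hostile-2"]})
best_plan = select_plan(plans, toy_score, random.Random(0))
```

Both uses of the `target` parameter receive the same value, so the aircraft fires at the aircraft it flew towards.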
Planning is performed immediately after a goal change, and also periodically to account for the dynamic environment. In addition to executing the actions contained in the plan, the plan is also used by the Predictor to anticipate the long-term outcome of the TBM’s behavior.
Predictor
Prediction in the TBM is performed using the Plan Execution Predictor (PEPR) [11]. PEPR uses the TBM’s plan and the recognized plans of all other objects to predict future environment states. Similar to the Planner, a high-level simulation is performed using the Internal Simulator. At fixed time intervals during the simulation, the current environment state is sampled and stored as (object and desire) expectations.
Each time PEPR is run, any previous expectations are removed. This removal occurs because PEPR uses a low-fidelity simulator, so longer-term expectations are expected to be more error prone than short-term expectations. However, even with the possibility of error, the expectations provide a general trajectory for future environment states. In practice, PEPR regularly recomputes expectations, so expectations are regularly updated as more information becomes available.
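The sample-and-replace behavior can be illustrated with a short sketch. It is not PEPR itself: `generate_expectations` and the `advance` callback are hypothetical, the state is reduced to a single number, and time is counted in integer simulation steps.

```python
from typing import Callable, Dict, TypeVar

State = TypeVar("State")


def generate_expectations(initial_state: State,
                          advance: Callable[[State], State],
                          horizon_steps: int,
                          sample_every: int) -> Dict[int, State]:
    """Roll a simulation forward and sample the state at fixed intervals."""
    expectations: Dict[int, State] = {}
    state = initial_state
    for step in range(1, horizon_steps + 1):
        state = advance(state)
        if step % sample_every == 0:
            expectations[step] = state
    return expectations


def move(position: int) -> int:
    """Toy low-fidelity model: the object advances 2 units per step."""
    return position + 2


# First prediction run.
expectations = generate_expectations(0, move, horizon_steps=10, sample_every=5)
first_run = dict(expectations)

# A later run starts from a newer state and replaces all earlier expectations.
expectations = generate_expectations(100, move, horizon_steps=10, sample_every=5)
```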
PEPR is heavily reliant on both the Planner and Behavior Recognizer. To perform accurate simulations, and therefore collect realistic expectations, both the TBM’s plan and the plans of other objects are necessary and should be accurate. The output of PEPR is not directly used by any of the other reasoning components but is used to perform discrepancy detection.
Discrepancy detection
Discrepancy detection allows the Goal Manager, and the TBM as a whole, to efficiently respond to unexpected situations. Our previous version of the TBM contained only a single discrepancy detector, Incoming Missile. Here we add three additional detectors: Model Changed, Flanking Hostile, and Expectations Violated.
Discrepancies are processed in a distributed manner, as shown in Fig. 3, with each detector responsible for identifying a single type of discrepancy. Each detector aligns closely with one or more of the reasoning components, and identifies notable changes in their output or reasoning failures. The distributed nature of the discrepancy detectors is beneficial because it allows new discrepancy detectors to be easily added and the detectors to remain agnostic to how each reasoning component is implemented. This section provides details on each of the four discrepancy detectors used by the TBM.
We use the following notation throughout this section:
Incoming Missile
The Incoming Missile discrepancy detector identifies hostile missiles that are targeting the TBM’s aircraft. The ability to identify and respond to incoming missiles is central to air combat, so this discrepancy detector existed in prior versions of the TBM. A discrepancy is created if one of three conditions is met:
Cause: A discrepancy is created if a missile object is added to the Object Models (e.g., a missile that was just fired) and it is targeting the TBM (
Format: These discrepancies are given a high priority and denote that a missile which previously had no target is now targeting the TBM (
Cause: A discrepancy is created if a missile object has its target change to the TBM (
Format: These discrepancies have high priority and denote that the TBM is now the target (
Cause: A discrepancy is created if a missile object is no longer targeting the TBM (
Format: These discrepancies have low priority and show that the TBM is no longer the target (
The difference in priorities is because the TBM needs to respond quickly to imminent threats, whereas a missile that is no longer a threat does not require an immediate response. Similarly, the TBM only raises a discrepancy if a new missile is targeting it. This is done because the TBM assumes that a missile targeting another aircraft will not pose a significant threat, unless it changes its target, so it is safe to ignore. The discrepancy detector continuously monitors for discrepancies and is closely linked to the target information provided by the Behavior Recognizer, and also relies on the environment observations.
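The three conditions can be sketched by comparing the recognized missile targets before and after an update. This is a minimal sketch, not the TBM’s detector: the function name, the discrepancy labels, and the dictionary representation of missile targets are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Found = Tuple[str, str, str]  # (missile ID, discrepancy label, priority)


def detect_incoming_missile(previous: Dict[str, Optional[str]],
                            current: Dict[str, Optional[str]],
                            own_id: str) -> List[Found]:
    """Compare missile ID -> target ID maps from two consecutive updates."""
    found: List[Found] = []
    for missile_id, target in current.items():
        was_known = missile_id in previous
        old_target = previous.get(missile_id)
        if target == own_id and not was_known:
            # Condition 1: newly observed missile that targets this aircraft.
            found.append((missile_id, "NEW_MISSILE_TARGETING", "HIGH"))
        elif target == own_id and old_target != own_id:
            # Condition 2: known missile switched its target to this aircraft.
            found.append((missile_id, "TARGET_CHANGED_TO_SELF", "HIGH"))
        elif was_known and old_target == own_id and target != own_id:
            # Condition 3: missile is no longer targeting this aircraft.
            found.append((missile_id, "NO_LONGER_TARGETING", "LOW"))
    return found


previous = {"m1": "uav", "m2": "wingman"}
current = {"m1": "wingman", "m2": "uav", "m3": "uav", "m4": "wingman"}
found = detect_incoming_missile(previous, current, own_id="uav")
```

Missile `m4` produces no discrepancy, reflecting the assumption that a new missile aimed at another aircraft is safe to ignore until it changes target.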
Model Changed
The Model Changed discrepancy detector identifies when there are changes in the behavior of hostile aircraft. Much like the TBM, hostile aircraft may change their plans. Similarly, noise in the environment and deceptive actions make it difficult for the Behavior Recognizer to accurately determine the correct plan or target of a hostile. However, since the Behavior Recognizer outputs the plan and target of each hostile aircraft at fixed intervals, the Model Changed detector can identify when any changes occur. Model Changed discrepancies allow the TBM to identify and respond to such changes and are triggered under four conditions:
Cause: A discrepancy is created if a hostile aircraft object is added to the Object Models (
Format: These discrepancies are given a low priority and show that a new hostile aircraft has been observed (
Cause: A discrepancy is created if a hostile aircraft’s plan changes (
Format: These discrepancies are given a medium priority and denote that the hostile’s plan has changed (
Cause: A discrepancy is created if a hostile aircraft object is now targeting the TBM (
Format: These discrepancies are given a high priority and show that the hostile is now targeting the TBM (
Cause: A discrepancy is created if a hostile aircraft object is no longer targeting the TBM (
Format: These discrepancies are given a low priority and show that the hostile is no longer targeting the TBM (
The differences in priorities represent the relative threat each discrepancy poses to the TBM. For example, a newly observed hostile likely occurs at the edge of radar range and is therefore lower priority than an existing aircraft that is now targeting the TBM. Similar to the Incoming Missile discrepancy detector, the Model Changed detector is closely linked to the Behavior Recognizer and relies on environmental observations.
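The four conditions and their priorities can be sketched as a comparison of successive Behavior Recognizer outputs. The `HostileModel` type, the labels, and the choice to report only the new-hostile discrepancy for a first sighting are assumptions for illustration, not details taken from the TBM.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class HostileModel(NamedTuple):
    plan: str               # e.g. "Attacking" or "Evading"
    target: Optional[str]   # recognized target ID, if any


Found = Tuple[str, str, str]  # (hostile ID, discrepancy label, priority)


def detect_model_changed(previous: Dict[str, HostileModel],
                         current: Dict[str, HostileModel],
                         own_id: str) -> List[Found]:
    """Report changes in recognized hostile behavior between two updates."""
    found: List[Found] = []
    for hostile_id, now in current.items():
        before = previous.get(hostile_id)
        if before is None:
            # Assumption: a first sighting yields only this discrepancy.
            found.append((hostile_id, "NEW_HOSTILE", "LOW"))
            continue
        if now.plan != before.plan:
            found.append((hostile_id, "PLAN_CHANGED", "MEDIUM"))
        if now.target == own_id and before.target != own_id:
            found.append((hostile_id, "NOW_TARGETING", "HIGH"))
        if before.target == own_id and now.target != own_id:
            found.append((hostile_id, "NO_LONGER_TARGETING", "LOW"))
    return found


previous = {"h1": HostileModel("Evading", None),
            "h2": HostileModel("Attacking", "uav")}
current = {"h1": HostileModel("Attacking", "uav"),
           "h2": HostileModel("Attacking", "wingman"),
           "h3": HostileModel("Attacking", "wingman")}
found = detect_model_changed(previous, current, own_id="uav")
```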
Flanking Hostile
The Flanking Hostile discrepancy detector identifies when a hostile aircraft deviates from its expected position such that it becomes a threat to the UAV. In practice, this involves the distance between the hostile and UAV approaching missile range. The discrepancy occurs under the following condition:
Cause: A discrepancy is created if a hostile aircraft is not targeting the TBM but the distance between the objects falls below a threshold α (
Format: These discrepancies are given a high priority and show that the hostile’s observed position has changed (
The priority of this discrepancy is high for two reasons. First, the hostile aircraft is nearing firing range and may attack the UAV, posing a direct threat. Second, the UAV allowing itself to be flanked indicates it has incorrectly identified the hostile’s plan or target, incorrectly predicted the hostile’s position, or has no other options. Although the current plan may have been the best (or only) alternative when planning was performed, changes in the environment may allow for different plans or goals. This discrepancy detector is closely linked to both the Planner and Goal Manager.
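The trigger condition reduces to a distance test combined with a target test. The sketch below assumes planar coordinates in kilometers and a strict comparison against the threshold; the function name and the example value of α are hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # assumed planar (x, y) position in kilometers


def is_flanking(hostile_pos: Point, own_pos: Point,
                hostile_target: Optional[str], own_id: str,
                alpha_km: float) -> bool:
    """True if a hostile that is not targeting us is closer than alpha."""
    distance = math.hypot(hostile_pos[0] - own_pos[0],
                          hostile_pos[1] - own_pos[1])
    return hostile_target != own_id and distance < alpha_km


# The hostile is 50 km away and recognized as targeting a teammate.
flanked = is_flanking((30.0, 40.0), (0.0, 0.0), "wingman", "uav", alpha_km=60.0)
# The same geometry with a hostile already known to target this aircraft
# is covered by Model Changed rather than Flanking Hostile.
already_targeted = is_flanking((30.0, 40.0), (0.0, 0.0), "uav", "uav", alpha_km=60.0)
```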
Expectations Violated
The Expectations Violated discrepancy detector identifies any significant differences between the expectations about the environment and actual environment states. Given that expectations are generated based on long-term plans in a dynamic and adversarial environment, such deviations are expected. However, rapidly identifying and responding to the discrepancies allows the TBM to modify its plans and goals in response. Expectations Violated discrepancies are triggered under the following conditions:
Cause: A discrepancy is created if the distance between an object’s expected position and actual position is greater than a threshold β (
Format: These discrepancies are given a low priority and denote that its observed position differs from its expected position (
Cause: A discrepancy is created if the similarity between the current and expected desires is below a threshold γ (
Format: These discrepancies are given a low priority and denote that the expected desires differ from the current desires (
These discrepancies are given a low priority because they do not pose an immediate threat to the UAV. For example, if a hostile aircraft deviated significantly from its expected position and was a threat to the UAV, a higher priority Flanking Hostile discrepancy would be triggered. Expectations Violated discrepancies are primarily aimed at determining when the quality of future predictions is decreasing (i.e., new predictions have not been generated recently). Thus, this detector is most closely linked to the Predictor. These discrepancies are also useful for detecting when hostile aircraft are not following their recognized plans and targets.
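Both checks are threshold tests, one on a distance (β) and one on a similarity (γ). In the sketch below the positions are assumed to be planar, and `mean_abs_similarity` is only a stand-in: the similarity measure used by the TBM is not specified here, so the caller supplies it as a function.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]  # assumed planar (x, y) position


def position_violated(expected: Point, observed: Point, beta: float) -> bool:
    """Object-level check: expected and observed positions differ by more than beta."""
    gap = math.hypot(expected[0] - observed[0], expected[1] - observed[1])
    return gap > beta


def desires_violated(expected: Sequence[float], current: Sequence[float],
                     gamma: float,
                     similarity: Callable[[Sequence[float], Sequence[float]], float]) -> bool:
    """Desire-level check: similarity of desire values falls below gamma."""
    return similarity(expected, current) < gamma


def mean_abs_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity in [0, 1] for desire values that lie in [0, 1]."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


position_flag = position_violated((0.0, 0.0), (3.0, 4.0), beta=4.0)
desire_flag = desires_violated([1.0, 0.5], [0.5, 0.5], gamma=0.8,
                               similarity=mean_abs_similarity)
```

Raising β or lowering γ makes the detector less sensitive, which is one way the two levels of granularity give control over how often discrepancies are raised.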
Empirical study
Our empirical study examines the influence of discrepancy detection on the TBM’s mission performance (i.e., the number of aircraft destroyed on both teams) and efficiency (i.e., how quickly enemy aircraft are destroyed). We test the following hypotheses:
Using discrepancy detection will increase the TBM’s mission performance
Using discrepancy detection will increase the TBM’s mission efficiency
Performance and efficiency improvements will increase as the number of aircraft in the scenario increases
Scenarios and evaluation method
Our evaluation scenarios involve two teams of TBM-controlled aircraft, where each TBM is given a starting goal of destroying all opposing hostiles while minimizing teammate casualties. One team is composed of TBMs that use all four discrepancy detectors (referred to as DON), whereas the other team uses only the Incoming Missile discrepancy detector (referred to as DOFF). Otherwise, both teams use identical TBMs.
We use four base scenarios that differ in the number of aircraft involved: 2 vs 2, 3 vs 3, 4 vs 4, and 5 vs 5. In each base scenario, teams are arranged in columns with teammate aircraft spaced 10 nautical miles from each other, and the opposing teams spaced 45 nautical miles from each other. Each base scenario is used to create 100 random scenarios, where each aircraft is modified according to a uniform random distribution. An aircraft’s position is independently perturbed between

Example of the starting conditions in a 3 vs 3 scenario (aircraft size not to scale).
Each random scenario was run twice, once with DON controlling the white team and once with DON controlling the black team. This removes any inherent bias in the starting positions and gives DON and DOFF runs with matching starting configurations that can be compared during significance testing: within each pair, all aircraft, both teammates and opponents, have identical initial positions, and the only difference is which agent controls each team. This results in 800 total experimental runs (4 base scenarios × 100 random variations × 2 sides). During each run, we logged information about the aircraft that were destroyed, missiles shot, and the simulation length. A run ended when a team was completely destroyed or 20 minutes of simulated time had elapsed.
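The run count follows directly from the design, as the following enumeration shows. The labels are illustrative; only the counts come from the study description.

```python
import itertools
from collections import Counter

BASE_SCENARIOS = ["2 vs 2", "3 vs 3", "4 vs 4", "5 vs 5"]
VARIATIONS = range(100)          # random perturbations of each base scenario
DON_SIDES = ["white", "black"]   # which team the DON agents control

# One run per (base scenario, random variation, side controlled by DON).
runs = list(itertools.product(BASE_SCENARIOS, VARIATIONS, DON_SIDES))
runs_per_base = Counter(base for base, _variation, _side in runs)

print(len(runs))                 # 4 * 100 * 2 runs in total
print(runs_per_base["2 vs 2"])   # runs reported per results table
```

Each base scenario therefore contributes 200 runs, made up of 100 pairs that share starting positions and differ only in which team uses the full set of detectors.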
We use metrics for mission performance and mission efficiency to assess the impact of discrepancy detection in BVR air combat.
Mission performance
Each TBM (on both DON and DOFF teams) is given an initial goal to destroy all opposing forces and minimize teammate casualties. Thus, our mission performance metrics measure how well that goal is achieved. We use the following metrics:
If a team reliably records a higher number of kills and has more wins, we argue that it has a higher mission performance.
Mission efficiency
The mission efficiency measures how long it takes a team to achieve an absolute win. Other than discrepancy detection, both teams of TBMs are identical. This means that when one TBM approaches a kill shot, so too does the enemy. Any differences in the win times are a result of the DON team detecting and reacting to discrepancies. For example, using discrepancy detection to identify a vulnerable enemy would decrease the time it takes to win, whereas using discrepancy detection to react to a threat would either prevent or delay a loss. Thus, our experiments compare the duration of absolute wins and absolute losses to demonstrate the impact of discrepancy detection.
Results
The results of the simulations are shown in Table 2 (2 vs 2), Table 3 (3 vs 3), Table 4 (4 vs 4), and Table 5 (5 vs 5). For each metric, the results show the totals for the team with discrepancy detection on (i.e., DON) and the team with discrepancy detection off (i.e., DOFF), and the net and percent differences between those totals. In all instances, enabling discrepancy detection increased mission performance.
Results of 200 2 vs 2 scenarios, where metrics marked with ∗ denote a statistically significant difference according to a single-tailed t-test (
)
Results of 200 3 vs 3 scenarios, where metrics marked with ∗ denote a statistically significant difference according to a single-tailed t-test (
Results of 200 4 vs 4 scenarios, where metrics marked with ∗ denote a statistically significant difference according to a single-tailed t-test (
Results of 200 5 vs 5 scenarios, where metrics marked with ∗ denote a statistically significant difference according to a single-tailed t-test (
The TBM had significantly more kills, missiles fired, absolute wins, and partial wins when using discrepancy detection (using a t-test with
The mission efficiency results do not show a significant difference for all scenario types (using a t-test with
Our results show that discrepancy detection significantly improves mission performance in BVR air combat scenarios. Even when using identical aircraft and reasoning components, a team of TBMs using discrepancy detection outperforms a team that does not use discrepancy detection. As shown in our results, both variants have similar shot-to-kill ratios. Although discrepancy detection allows the TBM to identify more opportunities to attack and results in more shots fired, it does not improve the percentage of missiles that hit an enemy. For both variants, approximately 30% of missiles hit their target. This relatively low hit rate occurs because the TBM’s firing decisions consider both its chance of hitting an enemy and the chance of evading if the enemy returns fire. Both variants have identical firing and evasion systems, so they also have equivalent attack ranges (i.e., the distance at which fired missiles are guaranteed to hit the enemy). This results in an aircraft firing from outside its attack range to keep itself safe. However, since the variant with discrepancy detection fires more missiles while maintaining a similar hit rate, it is still performing reasonable attacks rather than firing shots with little chance of hitting.
Related work
Discrepancy detection in goal reasoning agents has been an active research topic. For example, Muñoz-Avila et al. [13] describe GDA-HTNBots, a goal reasoning agent that operates in a team game. GDA-HTNBots’ planner generates state expectations and continually monitors to determine if those expectations are violated (i.e., a discrepancy). When a discrepancy occurs, GDA-HTNBots selects a new goal and replans. The ARTUE agent [12] uses a similar process to monitor for discrepancies. When one is found, ARTUE abduces an explanation for why it occurred, adds new assumptions to its beliefs about what caused the discrepancy, and uses the updated beliefs to select a new goal. Cox et al. [7] describe the MIDCA agent and present an approach for detecting when its distribution of beliefs changes over time. A noticeable belief change indicates that the world state has changed significantly, possibly due to a noteworthy event that the agent should detect and to which it should respond. Wilson et al.’s [16] agent uses bounded expectations to avoid false discrepancies that are not semantically meaningful; this is important for its application in a continuous domain. Dannenhauer and Muñoz-Avila [8] use informed expectations to guide planning in a dynamic environment by monitoring the effects of the planning actions that have been performed so far. This work was later extended to include expectations for agents that are not fully aware of the actions they can perform until they have explored the environment [9]. As a final example, GDA-C [10] continuously measures the utility of the environment state and considers any decreases to be potential discrepancies. Domain-specific methods are used to identify the specific type of discrepancy, and a case-based reinforcement learning algorithm is used to select a new goal and associated policy.
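To illustrate why bounded expectations matter in a continuous domain, the following minimal sketch contrasts an exact-match check with a tolerance-based check. It is a generic illustration of the idea, not the algorithm of Wilson et al. [16]; the function names, the altitude values, and the tolerance are all hypothetical.

```python
def exact_discrepancy(expected: float, observed: float) -> bool:
    """Flags any difference at all, so sensor noise alone triggers it."""
    return observed != expected


def bounded_discrepancy(expected: float, observed: float, tolerance: float) -> bool:
    """Flags a discrepancy only when the observation leaves the tolerance band."""
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    return abs(observed - expected) > tolerance


# Hypothetical values: expected altitude in meters and two observations.
expected_altitude = 9000.0
noisy_observation = 9000.4      # small sensor noise
real_deviation = 8200.0         # the aircraft actually descended

noise_exact = exact_discrepancy(expected_altitude, noisy_observation)
noise_bounded = bounded_discrepancy(expected_altitude, noisy_observation, 50.0)
deviation_bounded = bounded_discrepancy(expected_altitude, real_deviation, 50.0)
```

With an exact comparison, nearly every continuous sensor reading would count as a discrepancy; the tolerance band suppresses those that carry no semantic meaning while still catching the large deviation.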
A common theme among these agents is that they use a single method of discrepancy detection. In contrast, we use a distributed discrepancy detection architecture where each detector is responsible for monitoring a single type of discrepancy, and those discrepancies span a variety of causes. This allows new discrepancy detectors to be added as needed and does not couple discrepancy detection to the underlying implementation of the reasoning components; this design supports a more comprehensive goal reasoning process. Additionally, the TBM represents the environment at two levels of abstraction, object-level and desire-level, thereby allowing flexibility regarding how sensitive it is to discrepancies.
Accurately modeling an opponent allows an agent to properly respond to it [6]. For the TBM, opponent modeling focuses on recognizing the behavior of enemies (i.e., their plans and targets). Many of the discrepancies that are detected by the TBM relate to enemy behaviors changing or being incorrectly identified. Behavior-related discrepancies can either be detected directly, such as the Model Changed discrepancy, or indirectly as a result of another type of discrepancy, such as inaccurate expectations or selecting an inappropriate plan. This makes the Behavior Recognizer an important component in the TBM architecture. Policy and Goal Recognizer (PaGR) [4] is another method for behavior recognition we have implemented in the TBM. Whereas our current implementation recognizes targets and a limited set of possible plans, PaGR recognizes targets, goals, and a larger set of possible plans. In our BVR scenarios, we found that the plans associated with each goal did not vary significantly. As such, there was no significant performance benefit when recognizing both the plans and goals, although there was a higher computational cost. However, PaGR may prove beneficial if the tactics used in BVR become sufficiently complex (e.g., coordinated multi-agent tactics).
In an adversarial environment, generating plans that successfully achieve an agent’s goals also requires predicting the plans of opponents. This has been demonstrated in domains like poker [2] and rock-paper-scissors [15], and is also true for BVR air combat. The TBM uses predictive planning (i.e., potential plans are evaluated using the Internal Simulator), so it requires accurate assumptions about the plans of each hostile aircraft. In prior work, we reported that predictive planning can aid in behavior recognition if the plan includes actions that help distinguish an opponent’s plan [1]. However, in this article we use a simpler predictive planner with the assumption that improperly recognized behaviors can be dynamically identified and counteracted. Thus, behavior recognition, predictive planning, and discrepancy detection are separate, loosely coupled processes.
Summary
The primary benefit of our distributed discrepancy detection architecture is that it decouples discrepancy detection from the underlying reasoning components. This makes discrepancy detection independent of the specific implementation used for each reasoning component since the discrepancy detectors use only the output of the reasoning components rather than any implementation-specific knowledge of how they operate. For example, while the Model Changed detector uses the recognized plans of hostile aircraft, it does not need to know what specific algorithms are used to recognize a hostile aircraft’s plan. This is beneficial for two reasons. First, it allows for the implementation of each reasoning component to be changed as necessary. This is especially valuable when the reasoning components are still being actively researched and developed as it does not couple the discrepancy detectors to earlier prototype algorithms. Second, it allows discrepancy detection capabilities to be developed even if the implementation of a reasoning component is unknown. For example, if a reasoning component is proprietary software provided by a third-party vendor, it might not be possible to embed discrepancy detection capabilities directly into the software.
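The decoupling described above can be sketched as follows. All names here (`BehaviorRecognizer`, `recognize`, `ModelChangedDetector`, and the plan labels) are hypothetical and chosen for illustration; they are not the TBM's actual interfaces. The detector is also a simplified stand-in: it flags a change in an aircraft's recognized plan between cycles, which may differ from the TBM's actual detection criteria. The point is only that the detector reads the recognizer's output, so either toy recognizer can be substituted without changing the detector.

```python
from typing import Dict, List, Protocol

Observations = Dict[str, List[str]]  # hypothetical: aircraft id -> observed actions


class BehaviorRecognizer(Protocol):
    """Hypothetical interface: returns aircraft id -> recognized plan label."""

    def recognize(self, observations: Observations) -> Dict[str, str]:
        ...


class LastActionRecognizer:
    """Toy recognizer: the most recent observed action is taken as the plan."""

    def recognize(self, observations: Observations) -> Dict[str, str]:
        return {a: acts[-1] for a, acts in observations.items() if acts}


class MajorityRecognizer:
    """Toy alternative: the most frequent observed action is taken as the plan."""

    def recognize(self, observations: Observations) -> Dict[str, str]:
        return {
            a: max(sorted(set(acts)), key=acts.count)
            for a, acts in observations.items()
            if acts
        }


class ModelChangedDetector:
    """Reports aircraft whose recognized plan differs from the previous cycle.

    It sees only recognizer output, never the recognizer's internals.
    """

    def __init__(self) -> None:
        self._previous: Dict[str, str] = {}

    def check(self, recognized: Dict[str, str]) -> List[str]:
        changed = [
            aircraft
            for aircraft, plan in sorted(recognized.items())
            if aircraft in self._previous and self._previous[aircraft] != plan
        ]
        self._previous = dict(recognized)
        return changed


def run(recognizer: BehaviorRecognizer, stream: List[Observations]) -> List[List[str]]:
    """Feed each cycle's recognizer output to a fresh detector."""
    detector = ModelChangedDetector()
    return [detector.check(recognizer.recognize(obs)) for obs in stream]


# Hypothetical observation stream for one hostile aircraft over three cycles.
stream = [
    {"hostile-1": ["approach"]},
    {"hostile-1": ["approach", "approach"]},
    {"hostile-1": ["approach", "approach", "evade"]},
]
last_action_result = run(LastActionRecognizer(), stream)
majority_result = run(MajorityRecognizer(), stream)
```

The two recognizers disagree about the third cycle, so the detector fires in one run and not in the other, yet `ModelChangedDetector` itself is identical in both.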
The distributed nature of the architecture also simplifies the development and maintenance of discrepancy detectors. If a single monolithic discrepancy detector covering all possible discrepancy types were used, it would be more difficult to add or remove detection capabilities. For example, if the TBM were deployed in a scenario without any hostile aircraft (e.g., a search-and-rescue mission), it may save resources to disable the Behavior Recognizer and the discrepancy detectors that deal with enemy behaviors. With a monolithic discrepancy detection system, this would require modifying the entire system rather than disabling individual detectors. Similarly, if the TBM had a long-term deployment (e.g., weeks or months), the distributed architecture allows new components to be added dynamically without interrupting or modifying any existing components. Additionally, while not the focus of this work, a distributed approach also allows for providing each component with dedicated computational resources (e.g., each component runs on its own processor).
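A minimal sketch of this kind of modularity is given below, assuming a simple registry in which each detector is an independent function over the world state. The `DetectorRegistry` class, the state keys, and the thresholds are hypothetical placeholders rather than the TBM's implementation; the detector names are borrowed from this article only as labels.

```python
from typing import Callable, Dict, List, Set

State = Dict[str, float]            # hypothetical flat world-state representation
Detector = Callable[[State], bool]  # returns True when its discrepancy is present


class DetectorRegistry:
    """Holds independent detectors; each can be added or disabled on its own."""

    def __init__(self) -> None:
        self._detectors: Dict[str, Detector] = {}
        self._disabled: Set[str] = set()

    def register(self, name: str, detector: Detector) -> None:
        self._detectors[name] = detector

    def disable(self, name: str) -> None:
        self._disabled.add(name)

    def enable(self, name: str) -> None:
        self._disabled.discard(name)

    def detect(self, state: State) -> List[str]:
        """Return the names of enabled detectors that fire on this state."""
        return [
            name
            for name, detector in self._detectors.items()
            if name not in self._disabled and detector(state)
        ]


# Placeholder detection rules; the thresholds are illustrative assumptions.
def incoming_missile(state: State) -> bool:
    return state.get("missiles_inbound", 0) > 0


def flanking_hostile(state: State) -> bool:
    return abs(state.get("hostile_bearing_deg", 0.0)) > 90.0


def expectations_violated(state: State) -> bool:
    predicted = state.get("predicted_range_km", 0.0)
    observed = state.get("observed_range_km", 0.0)
    return abs(predicted - observed) > 5.0


registry = DetectorRegistry()
registry.register("incoming_missile", incoming_missile)
registry.register("flanking_hostile", flanking_hostile)

state = {
    "missiles_inbound": 1,
    "hostile_bearing_deg": 120.0,
    "predicted_range_km": 40.0,
    "observed_range_km": 50.0,
}
all_enabled = registry.detect(state)

# Disable one detector without touching the others.
registry.disable("flanking_hostile")
after_disable = registry.detect(state)

# Add a new detector at runtime; existing detectors are unchanged.
registry.register("expectations_violated", expectations_violated)
after_addition = registry.detect(state)
```

In a monolithic design, the equivalent of `disable` or `register` would mean editing one large detection routine; here each change is local to a single entry.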
While the current implementation of the TBM uses discrepancy detectors that are independent of the underlying reasoning components, they are tightly coupled to the BVR combat domain. However, both the TBM architecture and the discrepancy detectors have properties that are applicable in many adversarial combat domains, and variations of each of the existing detectors could be used in related domains. For example, the Incoming Missile detector could be used in any domain with enemies firing projectiles at the agent, the Model Changed detector could be used in any domain with opponents that plan, the Flanking Hostile detector could be used in any domain where opponents attempt to achieve positional superiority, and the Expectations Violated detector could be used in any domain where the agent performs long-term prediction. Although the specific implementations would need to be modified, the various detection tasks are general enough that they are useful outside of BVR combat.
One of the primary lessons learned from this work was that although there is benefit from more general-purpose discrepancy detection (e.g., the Expectations Violated detector), adding a broad range of discrepancy detectors that each focus on a specific type of discrepancy results in more significant improvements. As such, for sufficiently complex domains like BVR combat, it is often beneficial to have discrepancy detectors that use a large amount of domain-specific knowledge in order to detect discrepancies that will be of more value to the agent’s overall reasoning process.
Future work
We are planning improvements to the TBM that should increase its goal reasoning capabilities and BVR air combat performance. For example, we plan to extend the TBM’s discrepancy detection capabilities. Some of the additional discrepancies we have begun to explore include identifying when an aircraft has different capabilities (e.g., speed and maneuverability) than expected, or when the expected range of weapons does not match observed missile performance. Also, we plan to explore methods for learning new types of discrepancies and dynamically responding to them. For example, the TBM could examine common plan failures, determine a root cause for the failures, and learn to detect the root cause. As the complexity of the reasoning components increases, so too will the complexity of discrepancy detection necessary to respond to the possible failures. Thus, a discrepancy learning system would be beneficial if it can reduce the knowledge engineering required by subject matter experts.
Our discrepancies are a result of unexpected events that occur because of reasoning errors (e.g., incorrect assumptions, incomplete information, unexpected behavior), but we also plan to explore discrepancy detectors that identify opportunities for the TBM. One example of this is multi-agent planning for cooperative tactics. If the TBM can determine when an opportunity for a team tactic occurs and initiate the team’s behavior, that team can potentially overwhelm the enemies. For example, the TBM could identify a vulnerable enemy and instruct teammates to surround it.
We also plan to improve the TBM’s reasoning components. We selected the current method for each reasoning component based on the trade-off between capabilities and real-time execution. We will examine additional reasoning capabilities and algorithms to identify those that provide the most benefit to the TBM. For example, it would be beneficial for the TBM to execute team tactics, but it may also be necessary to add a reasoning component that identifies when the enemies are using team tactics.
Conclusions
We presented a method for distributed discrepancy detection in a goal reasoning agent. The agent, the Tactical Battle Manager, controls an unmanned aerial vehicle in a simulated beyond-visual-range air combat domain using behavior recognition, goal management, planning, and prediction. Discrepancy detection allows the TBM to respond to unexpected events, opportunities, or failures in the reasoning components. We examined four types of discrepancies: Incoming Missile, Model Changed, Flanking Hostile, and Expectations Violated. The distributed discrepancy detection framework is beneficial because it allows discrepancy detectors to be easily added, and provides a clear separation between the implementations of the discrepancy detectors and reasoning components.
Our empirical study demonstrated that discrepancy detection improves the TBM’s mission performance, with improvements increasing as the number of agents in the environment increases. Increasing the number of agents in the environment makes the reasoning components’ tasks more difficult due to adversarial behavior, uncertainty, and complex dynamics. However, the TBM’s ability to detect and respond to discrepancies allows the reasoning components to react to the changing environment and correct any erroneous or outdated assumptions.
Acknowledgements
Thanks to OSD ASD (R&E) for supporting this research. Thanks also to our subject matter experts for their many contributions.
