Recursive Bayesian Estimation Search with Environmental Constraints and Psychological Beliefs and Biases

Abstract

In this paper, we modify the Recursive Bayesian Estimation technique and incorporate the Fast Sweeping Method to extend recent work in search applications with an algorithm capable of calculating optimal trajectories in the context of multiple targets and searchers. In addition to providing a computational overview of the algorithm, we demonstrate how incorporating knowledge, deception, and belief biases into the algorithm alters the optimal trajectories of the searchers. Finally, we present Monte-Carlo simulations of how these psychological factors influence the mean probability that the searchers detect the target. We discuss the implications of the findings, current limitations and future extensions of the model, and potential applications to decision support.

Keywords

Decision Aids, Decision Making, Decision Support, Discrete Simulation, Maritime Issues, Mental Reasoning, Military, Navigation

Introduction

The U.S. Coast Guard, U.S. Navy, and U.S. federal law enforcement agencies contribute to the multi-national detection, monitoring, and interdiction of transnational criminal organizations exploiting transshipment routes for moving black market goods (e.g., narcotics or weapons). Finding and stopping these vessels involves a great deal of aerial and maritime monitoring and international cooperation.

As depicted in Figure 1, the interdiction task is a complex search and detection task where a target platform (e.g., drug smuggler boat or semi-submersible) is transitioning between locations while attempting to avoid detection by searcher platforms (e.g., Coast Guard Cutter or helicopter). Each platform might have varying levels of mobility, detectability, and endurance.

Figure 1.

Overview of the USCG search and interdiction task.

The search and detection task can be highly complex given the scenario’s geometry. Multiple searchers and targets can operate simultaneously anywhere within the game plane. Stationary and dynamic obstacles (e.g., weather hazards or maritime traffic) can hinder straight-path transits between originating and destination locations. For any given scenario, geometry, physics, and mathematics can be applied to ascertain realistic, optimal waypoints and paths for both platforms.

In addition to the complexity imposed by the environment on the search and interdiction mission, we must account for the human aspects of the operators, including their knowledge, decision-making, and coordination. For example, the target platform may have preferred originating and/or destination locations. A searcher platform may prioritize the protection/monitoring of one location over another location. One platform may bias its activities based on historical knowledge of another platform’s activities or demonstrated Pattern of Life (PoL). Intelligence and deception can further bias and influence the operators’ beliefs. These factors could also be layered; a search platform might favor mobility choices that improve/maintain its endurance over ensuring frequent coverage of areas in which the target platforms may transit. Further, either platform might have entrenched strategies driving repeated processes or patterns or could be feigning the establishment of those entrenched strategies (engaging in deception). These competing criteria drive the paths that can be taken by each platform and therefore increase the complexity of the models that can be developed to demonstrate the probabilities of likely outcomes.

Given the complexity of maritime operations, artificial intelligence (AI) applications have begun to emerge in which an AI agent acts as an advisor to support operator decision-making. For example, the Tool for Multi-Objective Planning and Asset Routing (TMPLAR) aids navigation for Naval and commercial shipping, providing recommended paths optimized based on several objectives or decision attributes, such as travel time, fuel efficiency, and navigator-specified deadlines (Avvari et al., 2018). Other Naval tasks that implement decision and navigational aids include counter-smuggling operations (Courses of Action Simulated Tool - COAST), dynamic autonomous aerial systems’ operations under uncertainty (Supervisory Control Operations User Testbed - SCOUT), and pirate interdiction (Pirate Attack Risk Surface - PARS; Esher et al., 2010).

Previous research has developed methods to address the search and detection problem, but many applications have been limited to scenarios with only a single searcher and a single target (Eagle, 1984; Kress et al., 2010). In the current project, we consider a single target with multiple searchers. Further, in our model we implement the Recursive Bayesian Estimation (RBE) technique of Bourgault et al. (2004) together with the Fast Sweeping Method (FSM) of Zhao (2005) to compute the optimal trajectory of the searchers for interdiction.

In the following section, we provide a computational overview of the model and demonstrate how knowledge, deception, and belief biases alter the optimal trajectories of the searchers. Next, we present Monte-Carlo simulations of how psychological factors influence the mean probability and latency of target detection by the searchers. Finally, we discuss the implications of the findings, current limitations and future extensions of the model, and potential applications to decision support.

Model Background and Overview

At time $k$, let the position of the target be $x_{k}^{t}$. We consider the case where the target's objective is to travel from a source $Γ$ to one or more destinations $Ω$ optimally (with some noise) under the environmental constraints and the belief-utility function $f^{t}$. For simplicity, assume the source consists of a single point. Denote by $u^{t}(x)$ the solution of the following eikonal equation (Eq 1),

$$ |\nabla u^{t}(x)| = f^{t}(x) \tag{1} $$

with $u^{t}(x) = 0$ for all $x \in Γ$. Here $u^{t}(x)$ is the optimal utility/cost for the target going from the source $Γ$ to a location $x$ with respect to the cost $f^{t}$. Using FSM, the function $u^{t}$ can be computed in $O(N)$ operations, where $N$ is the number of grid points. Given $u^{t}(x)$, we can compute the optimal path (or trajectory) from $Γ$ to $x$ by descending along the gradient of $u^{t}$ from $x$. To introduce variability, we assume that the target may deviate from optimality by not always following the steepest gradient direction. The details of this deviation are specified by the activation function below. Following the RBE approach, we also assume that the searchers have some knowledge about the target's probability distribution function (PDF), i.e., $p(x_{k}^{t})$ for any $k$, but do not know the exact position of the target at any given time. The searchers also know the target’s conditional probability $p(x_{k}^{t} | x_{k-1}^{t})$, i.e., the probability of the position of the target at time $k$ given its prior position at time $k - 1$. Applying Bayes’ law to this conditional probability gives Eq 2,

$$ p(x_{k}^{t} | x_{k-1}^{t}) = \frac{p(x_{k-1}^{t} | x_{k}^{t}) \, p(x_{k}^{t})}{p(x_{k-1}^{t})} \tag{2} $$
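To make the computation of $u^{t}$ in Eq 1 concrete, the following is a minimal Python sketch of a first-order fast sweeping solver on a uniform grid, followed by a crude path extraction that repeatedly steps to the lowest-valued neighboring cell. It is an illustration under stated assumptions, not the implementation used in this work: the function names (`fast_sweep`, `descend`), the unit grid spacing, the uniform cost field, and the 8-neighbor descent rule are all hypothetical choices.

```python
import math

INF = float("inf")

def fast_sweep(f, source, h=1.0, max_rounds=50, tol=1e-9):
    """Approximate |grad u| = f on a grid with u(source) = 0 (first-order scheme)."""
    rows, cols = len(f), len(f[0])
    u = [[INF] * cols for _ in range(rows)]
    u[source[0]][source[1]] = 0.0
    # Four alternating sweep orderings, as in the fast sweeping method.
    orders = [
        (range(rows), range(cols)),
        (range(rows - 1, -1, -1), range(cols)),
        (range(rows), range(cols - 1, -1, -1)),
        (range(rows - 1, -1, -1), range(cols - 1, -1, -1)),
    ]
    for _ in range(max_rounds):
        change = 0.0
        for row_order, col_order in orders:
            for i in row_order:
                for j in col_order:
                    if (i, j) == source:
                        continue
                    # Smallest neighbor value along each grid axis.
                    a = min(u[i - 1][j] if i > 0 else INF,
                            u[i + 1][j] if i < rows - 1 else INF)
                    b = min(u[i][j - 1] if j > 0 else INF,
                            u[i][j + 1] if j < cols - 1 else INF)
                    if a == INF and b == INF:
                        continue  # no information has reached this cell yet
                    fh = f[i][j] * h
                    if abs(a - b) >= fh:
                        new = min(a, b) + fh  # one-sided update
                    else:
                        new = 0.5 * (a + b + math.sqrt(2.0 * fh * fh - (a - b) ** 2))
                    if new < u[i][j]:
                        change = max(change, u[i][j] - new)
                        u[i][j] = new
        if change < tol:
            break
    return u

def descend(u, start):
    """Step to the lowest-valued 8-neighbor until no neighbor is lower (assumed rule)."""
    rows, cols = len(u), len(u[0])
    path = [start]
    while True:
        i, j = path[-1]
        nbrs = [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols]
        best = min(nbrs, key=lambda n: u[n[0]][n[1]])
        if u[best[0]][best[1]] >= u[i][j]:
            return path
        path.append(best)

# Hypothetical example: uniform cost on a 5 x 5 grid, source in one corner.
cost = [[1.0] * 5 for _ in range(5)]
u = fast_sweep(cost, (0, 0))
path = descend(u, (4, 4))
```

An impassable barrier could be approximated in this sketch by assigning a very large cost to the corresponding cells of `cost`; how obstacles are represented in the actual model is not specified here.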

Each of the terms used in Eq 2 can be computed from the solution $u^{t}$ of Eq 1 as follows. Assume there is a single destination reached at some time $K$, so that $x_{K}^{t} = Ω$, and suppose the target is at $x_{k}^{t}$ for $k \leq K$. Let $N_{8}(x_{k}^{t})$ be the eight neighboring positions of $x_{k}^{t}$. The conditional probability for each neighboring position $x_{k-1}^{t} \in N_{8}(x_{k}^{t})$ is determined by the relative change in utility, which we call the activation function, defined as

$$ p(x_{k-1}^{t} | x_{k}^{t}) = \frac{[u^{t}(x_{k}^{t}) - u^{t}(x_{k-1}^{t})]^{+}}{\sum_{x \in N_{8}(x_{k}^{t})} [u^{t}(x_{k}^{t}) - u^{t}(x)]^{+}} \tag{3} $$

where $a^{+} = \max(0, a)$. This conditional probability can be combined with the target’s PDF at time $k$ to calculate the PDF for time $k - 1$,

$$ p(x_{k-1}^{t}) = \int p(x_{k-1}^{t} | x_{k}^{t}) \, p(x_{k}^{t}) \, dx_{k}^{t}. \tag{4} $$

Noting that $p(x_{K}^{t}) = 1$ at the last time step $K$ and starting at $K$, we can substitute Eqs 3 and 4 into Eq 2 to recursively compute $p(x_{k}^{t} | x_{k-1}^{t})$ for all $k \leq K$ until $x_{k}^{t}$ is the source. By re-indexing $k$ if necessary, we may assume that $x_{0}^{t} = Γ$ and $x_{K}^{t} = Ω$. To generalize to multiple destinations, the above procedure is performed separately for each destination, and the resulting PDFs and conditional probabilities are then summed.
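The recursion in Eqs 2 to 4 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: the helper names are hypothetical, the PDFs are stored as dictionaries over grid cells (so the integral in Eq 4 becomes a sum), the utility `u_toy` is a hand-built stand-in for an FSM solution, and keeping probability mass at a cell with no lower neighbor (the source) is an added assumption.

```python
def neighbors8(pos, rows, cols):
    """The (up to) eight grid neighbors of pos that lie inside the grid."""
    i, j = pos
    return [(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols]

def activation(u, pos):
    """Eq 3: p(x_{k-1} = n | x_k = pos) from the positive drops in utility."""
    i, j = pos
    drops = {n: max(0.0, u[i][j] - u[n[0]][n[1]])
             for n in neighbors8(pos, len(u), len(u[0]))}
    total = sum(drops.values())
    if total == 0.0:
        return {pos: 1.0}  # assumption: mass that reaches the source stays there
    return {n: d / total for n, d in drops.items() if d > 0.0}

def backward_pdfs(u, destination, steps):
    """Eq 4 applied recursively, starting from p(x_K = destination) = 1."""
    pdfs = [{destination: 1.0}]
    for _ in range(steps):
        prev = {}
        for pos, p in pdfs[-1].items():
            for n, q in activation(u, pos).items():
                prev[n] = prev.get(n, 0.0) + q * p
        pdfs.append(prev)
    pdfs.reverse()  # pdfs[-1] is the PDF at time K, pdfs[0] the earliest one
    return pdfs

def forward_transition(u, pdf_prev, pdf_next, x_prev, x_next):
    """Eq 2: p(x_k = x_next | x_{k-1} = x_prev) via Bayes' law."""
    denom = pdf_prev.get(x_prev, 0.0)
    if denom == 0.0:
        return 0.0
    back = activation(u, x_next).get(x_prev, 0.0)
    return back * pdf_next.get(x_next, 0.0) / denom

# Hypothetical toy utility on a 3 x 3 grid: source at (0, 0), destination at (2, 2).
u_toy = [[float(i + j) for j in range(3)] for i in range(3)]
pdfs = backward_pdfs(u_toy, (2, 2), 2)
p_forward = forward_transition(u_toy, pdfs[1], pdfs[2], (1, 1), (2, 2))
```

In this toy case the cell `(1, 1)` receives half of the probability mass one step before arrival, because its drop in utility is twice that of the other two lower neighbors of the destination.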

Given $M$ searchers, the target detection probability is defined as follows. At time $k$, we use $x_{k}^{t}$ to denote the position of the target, $x_{k}^{s_{i}}$ the position of searcher $i$ for $i = 1, \dots, M$, $z_{k}^{s_{i}}$ the observation of searcher $i$, and $D_{k}^{s_{i}}$ the event that searcher $i$ detects the target. The probability of target detection, conditional on the target location, is defined as

$$ p(D_{k}^{s_{i}} | x_{k}^{t}) := p(z_{k}^{s_{i}} = D_{k}^{s_{i}} | x_{k}^{t}) \approx N(x_{k}^{s_{i}} - x_{k}^{t}, δ_{i}) $$

where $N(x_{k}^{s_{i}} - x_{k}^{t}, δ_{i})$ is a Gaussian (normal) distribution. The value $δ_{i}$ represents the sensor sensitivity for searcher $i$, which is proportional to the distance between the searcher and the target.

The probability that searcher $i$ does not detect the target at time $k$ is given by

$$ p(\bar{D}_{k}^{s_{i}} | x_{k}^{t}) = 1 - p(D_{k}^{s_{i}} | x_{k}^{t}) $$

and the probability that none of the searchers detects the target at time $k$ is

$$ p(\bar{D}_{k} | x_{k}^{t}) := \prod_{i=1}^{M} p(\bar{D}_{k}^{s_{i}} | x_{k}^{t}) = \prod_{i=1}^{M} \left(1 - p(D_{k}^{s_{i}} | x_{k}^{t})\right) $$
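As a rough illustration of the detection model, the sketch below evaluates a per-searcher detection probability and the joint probability that no searcher detects the target. The exact form of $N(\cdot, δ_{i})$ is not spelled out above, so the code assumes an unnormalized isotropic Gaussian kernel in the searcher-target distance, which keeps each value in $[0, 1]$. The function names and example coordinates are hypothetical.

```python
import math

def p_detect(searcher, target, delta):
    """Assumed sensor model: Gaussian kernel of the searcher-target distance."""
    d2 = (searcher[0] - target[0]) ** 2 + (searcher[1] - target[1]) ** 2
    return math.exp(-d2 / (2.0 * delta ** 2))

def p_no_detect(searchers, deltas, target):
    """Product over searchers of (1 - p_detect), i.e. no searcher detects."""
    prob = 1.0
    for s, d in zip(searchers, deltas):
        prob *= 1.0 - p_detect(s, target, d)
    return prob

# Hypothetical example with two searchers and one candidate target location.
searchers = [(0.0, 0.0), (5.0, 5.0)]
deltas = [5.0, 1.0]
single = p_detect(searchers[0], (3.0, 4.0), deltas[0])   # distance 5, delta 5
joint_miss = p_no_detect(searchers, deltas, (3.0, 4.0))
```

A searcher located exactly at the target position detects it with probability one under this assumed kernel, so the joint non-detection probability is then zero regardless of the other searchers.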

In this paper, we consider a modified version of the RBE approach that updates the searchers’ state a single step at a time, using the anticipated target PDF over the next $N_{k}$ steps. For $j = 0, 1, \dots, N_{k}$, define

$$ p_{k+j} = \int p(x_{k+j}^{t} | \bar{D}_{1:k-1})' \left(1 - p(\bar{D}_{k} | x_{k+j}^{t})\right) dx_{k+j}^{t}, $$

where $p (x_{k + j}^{t} | {\bar{D}}_{1 : k - 1})'$ is recursively computed by the following algorithm:

Algorithm (alternate RBE): For $j = 1, \dots, N_{k}$:

Suppose $p (x_{k + j - 1}^{t} | {\bar{D}}_{1 : k - 1})'$ is known.

Prediction step: Compute $p (x_{k + j}^{t} | {\bar{D}}_{1 : k - 1})'$ by

$$ p(x_{k+j}^{t} | \bar{D}_{1:k-1})' = \int p(x_{k+j}^{t} | x_{k+j-1}^{t}) \, p(x_{k+j-1}^{t} | \bar{D}_{1:k-1})' \, dx_{k+j-1}^{t}. $$

Update step:

$$ p(x_{k+j}^{t} | \bar{D}_{1:k})' = p(x_{k+j}^{t} | \bar{D}_{1:k-1})' \, p(\bar{D}_{k} | x_{k+j}^{t}). $$
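The prediction and update steps can be sketched on a discrete grid as follows. This is a minimal illustration under stated assumptions: PDFs are dictionaries over cells, `transition` and `p_no_detect` are hypothetical callables standing in for $p(x_{k+j}^{t} | x_{k+j-1}^{t})$ and $p(\bar{D}_{k} | x_{k+j}^{t})$, and the update is left unnormalized because no normalization appears in the update equation above. The one-dimensional corridor is a toy example only.

```python
def predict(pdf, transition):
    """Prediction step: push the PDF one step forward through the transition model."""
    nxt = {}
    for x_prev, p in pdf.items():
        for x_next, q in transition(x_prev).items():
            nxt[x_next] = nxt.get(x_next, 0.0) + q * p
    return nxt

def update(pdf, p_no_detect):
    """Update step: weight each cell by the probability of no detection there."""
    return {x: p * p_no_detect(x) for x, p in pdf.items()}

def lookahead(pdf_k, transition, n_steps):
    """Predicted PDFs for j = 0, 1, ..., n_steps look-ahead steps."""
    pdfs = [pdf_k]
    for _ in range(n_steps):
        pdfs.append(predict(pdfs[-1], transition))
    return pdfs

# Hypothetical toy model: a corridor of cells 0..4 where the target advances
# one cell with probability 0.75 and otherwise stays in place.
def corridor(x):
    if x == 4:
        return {4: 1.0}
    return {x + 1: 0.75, x: 0.25}

pdfs = lookahead({0: 1.0}, corridor, 2)
# A searcher that is certain to detect the target in cell 2 and nowhere else.
posterior = update(pdfs[2], lambda x: 0.0 if x == 2 else 1.0)
```

The mass removed by the update step (here, everything in cell 2) is exactly the quantity that the detection term $1 - p(\bar{D}_{k} | x_{k+j}^{t})$ accumulates in $p_{k+j}$.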

Define the detection rate of the searchers’ states $x_{k}^{s_{1}:s_{M}}$ at time $k$ with $N_{k}$ look-ahead states of the target by

$$ J_{k}(x_{k}^{s_{1}:s_{M}}) = \sum_{j=0}^{N_{k}} p_{k+j}, $$

and the associated cost by

$$ F_{k}(x_{k}^{s_{1}:s_{M}}) = \sum_{i=1}^{M} f^{s_{i}}(x_{k}^{s_{i}}), $$

where $f^{s_{i}}$ is the cost for searcher $i$, which can arise from environmental constraints and beliefs. Combining the detection rate $J_{k}$ and the cost $F_{k}$, we solve the following optimization problem, which amounts to minimizing the cost while maximizing detection,

$$ \min_{a_{k:k+N_{k}}^{s_{1}:s_{M}}} \left[ α F_{k} - (1 - α) J_{k} \right] $$

where $α$ is a weight parameter that controls the importance of the costs relative to the detection probability in the minimization.
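To illustrate how the weighted objective trades cost against detection, the sketch below evaluates $α F_{k} - (1 - α) J_{k}$ for candidate searcher positions and picks the minimizer by exhaustive enumeration. Several simplifications are assumed: the look-ahead PDFs are taken as given dictionaries, the sensor model is an unnormalized Gaussian kernel, the minimization is over candidate positions rather than over action sequences, and all names and numbers are hypothetical.

```python
import itertools
import math

def p_detect(searcher, x, delta):
    """Assumed sensor model: Gaussian kernel of the searcher-target distance."""
    d2 = (searcher[0] - x[0]) ** 2 + (searcher[1] - x[1]) ** 2
    return math.exp(-d2 / (2.0 * delta ** 2))

def detection_rate(searchers, lookahead_pdfs, delta):
    """J_k: sum over look-ahead steps of the expected detection probability."""
    total = 0.0
    for pdf in lookahead_pdfs:
        for x, p in pdf.items():
            miss = 1.0
            for s in searchers:
                miss *= 1.0 - p_detect(s, x, delta)
            total += p * (1.0 - miss)
    return total

def objective(searchers, lookahead_pdfs, cost, alpha, delta):
    """alpha * F_k - (1 - alpha) * J_k for one joint searcher configuration."""
    f_k = sum(cost(s) for s in searchers)
    j_k = detection_rate(searchers, lookahead_pdfs, delta)
    return alpha * f_k - (1.0 - alpha) * j_k

def best_positions(candidates, lookahead_pdfs, cost, alpha, delta=1.0):
    """Enumerate joint candidate positions and return the minimizer."""
    return min(itertools.product(*candidates),
               key=lambda joint: objective(joint, lookahead_pdfs, cost, alpha, delta))

def free(s):
    """Hypothetical cost: every position is free."""
    return 0.0

def costly_near_target(s):
    """Hypothetical cost: only the remote position (9, 9) is free."""
    return 0.0 if s == (9, 9) else 1.0

# The target is believed to be at (0, 0) now and at (1, 0) one step ahead.
lookahead_pdfs = [{(0, 0): 1.0}, {(1, 0): 1.0}]
candidates = [[(0, 0), (9, 9)]]  # one searcher with two candidate positions

detect_first = best_positions(candidates, lookahead_pdfs, free, alpha=0.5)
cost_first = best_positions(candidates, lookahead_pdfs, costly_near_target, alpha=1.0)
```

With $α = 0.5$ and no cost, the searcher moves to the believed target location; with $α = 1$ the detection term vanishes and the cheaper remote position is chosen. Exhaustive enumeration grows exponentially in the number of searchers, so it serves only as a reference for small examples.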

Of course, the searchers' knowledge about the target's PDF and conditional probability can be imperfect, which affects the detection rate. In the following sections, we examine scenarios in which the searchers' knowledge and beliefs range from perfectly to imperfectly calibrated, and we show how this calibration affects the searchers' success in detecting the target.

Model Demonstrations

The demonstration scenario entails a target traveling from a single source to one of three possible destinations under constraints and beliefs. The target has no knowledge of the searchers. The target's objective is to travel from the origination location to a destination optimally while minimizing costs, subject to its constraints and biases (preferences for destinations and for minimizing navigational costs). The searchers know the target's origination point and the target's possible movements assuming optimality under constraints and beliefs. The objective of each searcher is to follow the trajectory that it believes to be optimal (maximizing the probability of target detection while minimizing its costs). Thus, the searchers' strategies to find the target can assume constraints and beliefs that differ from the target's true beliefs and constraints. For instance, the searchers may base their beliefs about the target's trajectory and destination on the target's PoL or on historical knowledge about the target. In contrast, the target's current beliefs (i.e., goals) can strategically deviate from its PoL.

Demonstration 1: Searcher Beliefs Based on Pattern of Life

Figure 2 demonstrates how the target's PoL is incorporated into the searchers' beliefs and behavior using a simplified example with only two destination ports, located at the top and bottom right corners. The rectangle in the middle of the frames represents an impassable barrier, e.g., an island, that agents must travel around. The white clouds in the left panels illustrate the searchers' beliefs about the target position, based on the historical PoL. The right panels show the behavior (routes) of the target and the searchers, which start traveling from the left and right edges of the frame, respectively. The top-right panel illustrates a situation where the searchers head in opposite directions around the center barrier since the target is equally likely to take either route based on the PoL. The lower-right panel illustrates a situation where both searchers take the same route around the center barrier, as that is the route with the highest probability of detection based on their proximity. This demonstrates the impact that the target’s PoL has on the behavior of the searchers and the target.

Figure 2.

Left: Two different target PoL PDFs. Right: Searchers travel in the direction with the highest probability of detection based on their proximity and beliefs about the target’s intentions (PoLs).

Demonstration 2: Target Deviates from Searcher Knowledge of Target PoL

We provide a demonstration of how beliefs and biases change the behavior of the searchers and the probability that they successfully detect the target.

Figure 3 illustrates two examples where the searchers base their movement on a known PoL, but the target’s novel behavior deviates from that PoL. In each example, the dots represent the current location of the agents, red for the searchers and white for the target, while the gray cloud represents the searchers' beliefs about the target's PoL based on historical data. In both examples, the target is moving more slowly than expected from the PoL. Note in the right panel of Figure 3 that the searchers are actively searching the areas where they believe the target is most likely to be, but these beliefs are inaccurate. These examples illustrate how deviations from PoL compromise detection.

Figure 3.

The searchers’ movement is based on the historical PoL for the target. By deviating from its historical PoL, the target avoids detection.

Demonstration 3: Effects of Look Ahead Steps

Figure 4 illustrates the influence of the number of look-ahead steps, $N_{k}$, which is the number of time steps into the future over which the searchers can ‘mentally simulate’ the movement of the target. As shown in Figure 4, the searchers’ paths become shorter, and thus closer to optimal, as $N_{k}$ increases.

Figure 4.

Searchers' Paths as a Function of Increasing the Number of Look-Ahead Steps ($N_{k}$).

In theory, $N_{k}$ could be estimated from observed data of searcher behavior.

Monte-Carlo Simulations

In the following simulations, each of two searchers originates from one of the three target destination ports. One searcher is always assigned to the first port, and a second searcher is assigned to the second port with a 2/3 probability or to the third port with a 1/3 probability. Additionally, the first searcher only considers the PoL PDF that terminates at the first port, while the second searcher considers the other two PoL PDFs, terminating at ports two and three. Estimates of mean detection probability are based on at least 500 Monte-Carlo runs for each simulated scenario.
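The sampling design described above can be sketched as a small Monte-Carlo harness. This is an assumption-laden outline, not the simulation code behind the reported results: `simulate` is a hypothetical callback that would run one search episode and return a detection probability (or a 0/1 detection indicator), and drawing the target's destination port from a vector of port preferences is an assumption about how the target's bias enters each run.

```python
import random

def sample_run(target_bias, rng):
    """Draw one configuration: the target's port and the searchers' start ports."""
    target_port = rng.choices([1, 2, 3], weights=target_bias)[0]
    searcher_1 = 1                                   # always starts at port one
    searcher_2 = 2 if rng.random() < 2.0 / 3.0 else 3  # port two w.p. 2/3, else three
    return target_port, (searcher_1, searcher_2)

def mean_detection(simulate, target_bias, runs=500, seed=0):
    """Average the per-run output of the hypothetical `simulate` callback."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        target_port, starts = sample_run(target_bias, rng)
        total += simulate(target_port, starts, rng)
    return total / runs
```

Passing a seeded `random.Random` instance through the harness keeps each scenario reproducible, which makes it easier to compare belief conditions on identical draws.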

Simulation 1: Biased Beliefs Based on Intelligence can Affect Detect Probability & Latency

The belief matrix that drives Simulation 1 is provided in Table 1. The columns show the port preference for the target and the port preference beliefs the searchers have for the target. In the No Bias condition, the target's behavior and the searchers' beliefs exhibit maximum uncertainty about which port the target will prefer. In the Good Intelligence condition, the target is biased toward port one (94%), and the searchers' beliefs are perfectly calibrated to the target's bias. In the Bad Intelligence condition, the target is biased toward port two (94%), but the searchers' beliefs about the target are highly biased toward port three.

Table 1.

Sim 1: Target Bias and Searcher Beliefs.

	Target Bias			Searchers Belief
	Port 1	Port 2	Port 3	Port 1	Port 2	Port 3
No Bias	33.3%	33.3%	33.3%	33.3%	33.3%	33.3%
Good Intel	94%	3%	3%	94%	3%	3%
Bad Intel	3%	94%	3%	3%	3%	94%

Figure 5 plots the mean probability of detection by the level of bias imposed on the searchers’ beliefs. As illustrated, the mean detection probability increases for Good Intelligence relative to No Bias and decreases for Bad Intelligence relative to No Bias.

Figure 5.

Mean Detect Probability by Searcher Belief Bias.

Thus, Figure 5 illustrates that the mismatch between the searchers’ beliefs and the target’s intent can significantly compromise mean detect probability and latency.

Simulation 2: Calibration of Searcher Beliefs on Detection

Simulation 2 further explores how the level of calibration of searcher beliefs to the target’s preferences affects mean detect probability. Table 2 is the belief matrix, where the columns show the port preference for the target and the port preference beliefs the searchers hold for the target across the different scenarios (rows). The first three rows illustrate a situation where the target’s behavior exhibits equal preference for the three ports or destinations. The searchers’ beliefs match the target’s preference in the No Bias scenario, show a relatively small deviation (bias for port two) in the Small Searcher Bias scenario, and show a relatively large deviation (significant bias for port two) in the Large Searcher Bias scenario. The bottom two rows (scenarios) of Table 2 are situations where the searchers’ beliefs are calibrated to the target’s PoL, but the target purposefully deviates from its PoL, exhibiting a slight or strong preference for port two.

Table 2.

Sim 2: Target Bias and Searcher Beliefs.

	Target Bias			Searchers Belief
	Port 1	Port 2	Port 3	Port 1	Port 2	Port 3
No Bias	33.3%	33.3%	33.3%	33.3%	33.3%	33.3%
Small Searcher Bias	33.3%	33.3%	33.3%	10%	60%	30%
Large Searcher Bias	33.3%	33.3%	33.3%	10%	80%	10%
Small Deviation PoL	10%	60%	30%	33.3%	33.3%	33.3%
Large Deviation PoL	10%	80%	10%	33.3%	33.3%	33.3%

As illustrated in Figure 6, mean detection probability is substantially compromised when the target makes small or large deviations from its PoL. Surprisingly, small and large searcher biases do not compromise mean detect probability compared to the calibrated condition. Searcher bias is mitigated by scenario-specific characteristics, primarily because the PoL exhibits maximum uncertainty. Also, the target bias was always toward port two, where a searcher was assigned to start only 67% of the time, and that searcher's behavior was based on the combination of PoL PDFs terminating at ports two and three.

Figure 6.

Mean Detect Probability by Scenario.

Discussion & Implications

The goal of this paper was to extend recent work in search with an optimal algorithm capable of incorporating multiple targets and searchers. In addition, we provided several demonstrations and simulations where the optimal search algorithm incorporated psychology and human behavior via belief biases. Our simulations show that belief biases mirroring the effects of good intelligence and bad intelligence, or deception, significantly affected the ability of searchers to detect targets. Although our demonstrations and simulations generally show that good calibration between searcher and target beliefs supports detection, scenario-specific details did moderate this effect. The fact that scenario-specific characteristics can interact with belief biases in often counter-intuitive ways further supports the need to incorporate psychological beliefs into models and optimizations to support search in mission-realistic scenarios. In other words, knowing what is optimal depends on understanding how less-than-rational searchers and targets behave in different contexts. Thus, our model could support a tool that recommends paths to the USCG to support interdiction.

A critical aspect of the model is the incorporation of the searchers' beliefs about the target's PoL, encoded via the conditional probability $p(x_{k+j}^{t} | x_{k+j-1}^{t})$ in the prediction step of the alternate RBE algorithm. We assume the PoL is based on historical knowledge and the operator's experience. We plan to leverage some of our previous work using cognitive models to simulate beliefs (e.g., conditional probability judgments via experience) to generate the PoLs and belief updating based on external information (e.g., intel reports), respecting human information processing limitations and task constraints (Thomas et al., 2008). Moreover, such beliefs could be elicited in operational contexts using formal methods for estimating subjective probability distributions (Garthwaite et al., 2005) or social sensing methodologies (Prelec, 2004).

Limitations & Future Directions

There are several limitations and exciting future directions for the current work. First, we intend to extend the simulation environment. Our targets did not know about the searchers’ behavior, and we plan to incorporate such knowledge of the searchers in future extensions.

Modeling tradeoffs between detect probability and utility—the searchers may unequally value detections in different locations—could be fruitful. The searchers may value specific locations over others and want to protect them. Moreover, once we incorporate utility functions, we can model the influence of psychological parameters like loss and risk aversion on the searcher, target behavior, and detection probability.

We also plan to incorporate Command & Control (C2) nodes into the model. C2 will allow the searchers and targets to share information vicariously via a higher level (shared operating picture). C2 decision and deployment strategies, including those that are less than optimal, could be interesting to model.

Finally, we would like to incorporate dynamic maximization criteria. For instance, looking back at Figure 3 (right panel), we suspect that searchers will shift their criteria when, after a certain amount of time, they cannot find a target in a location where they believe it should have been detected with near certainty.

Ultimately, we hope our model can generate, represent, display, and manipulate mixed-initiative, agile courses of action (COAs) under uncertainty, competing goals, and dynamic threats while accounting for human beliefs and constraints in the solutions. As the system continuously updates the mission-transition state beliefs based on information concerning the adversary (sensor updates, intelligence reports, etc.), the system will also update the strategic level of analysis at the command-node level to generate alerts and update COAs. Moreover, incorporating What-If capability and intuitive visualizations could be helpful to operators, particularly presentations of the rationale for the anticipated location of a target in terms of current and prior beliefs.

Footnotes

ORCID iD

Steven C. Howell

References

Avvari, G. V., Sidoti, Zhang, Mishra, Pattipati, Sampson, C. R., & Hansen (2018, March). Robust multi-objective asset routing in a dynamic and uncertain environment. In 2018 IEEE Aerospace Conference (pp. 1-9). IEEE.

Bourgault, Furukawa, & Durrant-Whyte, H. F. (2004, September). Decentralized Bayesian negotiation for cooperative search. In 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566) (Vol. 3, pp. 2681-2686). IEEE.

Eagle, J. N. (1984). The optimal search for a moving target when the search path is constrained. Operations Research, 32(5), 1107-1115.

Esher, Hall, Regnier, Sánchez, P. J., Hansen, J. A., & Singham (2010, December). Simulating pirate behavior to exploit environmental information. In Proceedings of the 2010 Winter Simulation Conference (pp. 1330-1335). IEEE.

Garthwaite, P. H., Kadane, J. B., & O'Hagan, A. (2005). Statistical methods for eliciting probability distributions. Journal of the American Statistical Association, 100(470), 680-701.

Kahneman & Tversky (2013). Prospect theory: An analysis of decision under risk. In Handbook of the fundamentals of financial decision making: Part I (pp. 99-127).

Kress, Royset, J. O., & Rozen (2010). The eye and the fist: Optimizing search and interdiction. Naval Postgraduate School, Monterey, CA, Department of Operations Research.

Lopes, L. L., & Oden, G. C. (1999). The role of aspiration level in risky choice: A comparison of cumulative prospect theory and SP/A theory. Journal of Mathematical Psychology, 43(2), 286-313.

Prelec (2004). A Bayesian truth serum for subjective data. Science, 306(5695), 462-466.

Slootmaker, L. A. (2011). Countering piracy with the next-generation piracy performance surface model. Naval Postgraduate School, Monterey, CA.

Thomas, R. P., Dougherty, M. R., Sprenger, A. M., & Harbison (2008). Diagnostic hypothesis generation and human judgment. Psychological Review, 115(1), 155.

Zhao (2005). A fast sweeping method for eikonal equations. Mathematics of Computation, 74(250), 603-627.