Abstract
Objective
The aim of this study was to understand factors that influence the prediction of uncertain spatial trajectories (e.g., the future path of a hurricane or ship) and the role of human overconfidence in such prediction.
Background
Research has indicated that human prediction of uncertain trajectories is difficult and may be subject to the same overconfidence in forecast accuracy that is found in event prediction, suggesting that humans insufficiently appreciate the contribution of natural variability to the outcomes they predict.
Method
In two experiments, our paradigm required participants to observe a starting point, a position at time T, and then make a prediction of the location of the trajectory at time NT. They experienced several trajectories from the same underlying model but perturbed by random variance in heading and speed.
Results
In Experiment 1A, people predicted linear paths well and were better in heading predictions than in speed predictions. However, participants greatly underestimated the variance in predicted location, indicating overconfidence. In Experiment 1B, the effect was replicated with frequencies rather than probabilities used in variance estimates. In Experiment 2, people predicted nonlinear trajectories poorly, and overconfidence was again observed. Overconfidence was reduced on the more difficult predictions. In both main experiments, those better at predicting the mean were not better at predicting the variance.
Conclusions
People predict the level of uncertainty in spatial trajectories poorly, and doing so may involve qualitatively different abilities than predicting the mean.
Application
Improving real-world performance at prediction demands developing better understanding of variability, not just the average case. Biases in prediction of uncertainty may be addressed through debiasing training and/or visualization tools that could assist in more calibrated action planning.
Introduction
The task of prediction has been well documented to be cognitively difficult (Kahneman, 2011). A consistent finding across many research disciplines is that most people, including domain experts, exhibit poor performance at prediction (Einhorn & Hogarth, 1982; Fischhoff & MacGregor, 1982; Tetlock, 2005; Wickens, Hollands, Banbury, & Parasuraman, 2013). Our focus in the current research is on continuous trend prediction (e.g., the course of a hurricane, the trend in global temperature or the stock market, or the flight path trajectory of an airplane).
Against this generalization that humans are not good at prediction, however, stands an important qualification: the normative standard of “correct” ought to be not perfect performance but, rather, the best that an optimal predictive process could achieve (e.g., the ideal weather forecast model). Given the randomness of many events in the world and the fact that such randomness accumulates over time, optimal performance will sometimes be far less than perfect. Even against such a less-than-perfect standard, however, humans frequently underperform their potential and are overconfident in their predictive success (Buehler & Griffin, 2003; Fischhoff & MacGregor, 1982; Fischhoff, Slovic, & Lichtenstein, 1977; Spiegelhalter, Pearson, & Short, 2011). In other words, people appear to underestimate the contributions of random forces in the environment (Kahneman, 2011). In terms of overconfidence in trend prediction, one focus of our current research, other researchers have observed overconfidence among corporate chief financial officers predicting stock gains (Ben-David, Graham, & Harvey, 2010), fighter pilots predicting the spatial behavior of an adversary (Sulistyawati, Wickens, & Chui, 2011), and weather forecasters underestimating the uncertainty of predicted hurricane track locations (Broad, Leiserowitz, Weinkle, & Steketee, 2007; Regnier & Harr, 2006).
What characteristics influence the quality of prediction? First, good predictors have become highly skilled through feedback-supported practice (Murphy & Winkler, 1984). Second, the levels of uncertainty involved must be manageable. In the case of predicting the flight of a ball, opportunities for environmental disturbances to alter the predicted trajectory are minimal, because there are few such disturbances and they have little time to operate. Closely coupled with this latter characteristic is a concept we call the span of prediction, or look-ahead time (LAT). The longer the LAT, the greater the cumulative opportunity for environmental forces to perturb the trajectory, and hence the more likely even optimal models are to be in error (Regnier & Kirlik, 2012). In direct applications, we see this growth of uncertainty represented in the expanding “cone of uncertainty” that hurricane forecasters employ to graphically represent the uncertainty of storm location on successive days (LAT) after a forecast is made (Broad et al., 2007). It is reflected also in the growing uncertainty of the state of other dynamic process variables as time proceeds (Sheridan, 1970), such as the forecast location of an aircraft (Wickens, Gempler, & Morphew, 2000). Conceptually, these effects on measurements of prediction accuracy are represented in Figure 1. Research also indicates that confidence does not decline with LAT as rapidly as does knowledge, indicating a growing overconfidence in prediction, as also shown in Figure 1 (Armor & Taylor, 2002).

An illustration of the relationship between uncertainty and knowledge. This figure shows a situation in which possible certainty about the future state of the world (solid line) decreases with the distance into the future that must be predicted. A knowledge gap reflects the shortfall in human understanding (dashed line). Operator confidence (dotted line) about future states is not necessarily tied to either the future state of the world or the actual degree of understanding of that future state; thus, overconfidence will increase with look-ahead time.
Although some researchers have explored limitations of trend prediction, and more still have examined the processing of uncertainty (e.g., Kahneman, 2011), surprisingly little work has integrated these two concepts by understanding and modeling the specific effects of uncertainty on human prediction in dynamic environments. Unaided predictions in safety-critical dynamic systems, such as ship operations, industrial process control, or air traffic control, are problematic, even when compared with the uncertain estimates of an optimal model. The cognitive challenges are not limited to novices but also extend to highly trained professionals. The extent of this challenge is illustrated by the degree to which predictive displays benefit user prediction and control in predicting aircraft conflicts (Wickens et al., 2000; Wickens, Mavor, Parasuraman, & McGee, 1998), ship driving (Van Breda, 1999), and industrial or energy process control (Roth & Woods, 1988; Yin, Wickens, Helander, & LaBerge, 2015). Thus human trend prediction is difficult and falls short of optimal; moreover, it is often accompanied by overconfidence, and it can be aided by predictive displays.
In the experiments described here, our interest is in predicting continuous trajectories of objects simulating motion across a time scale of several minutes, to examine the extent to which similar challenges and biases are manifest and to address how prediction complexity influences these challenges. As outlined earlier, understanding, and ultimately improving, human decision making about future states of uncertain systems has clear real-world implications in a range of complex tasks. To develop scientific understanding of the factors at play, we developed a lab-based task to capture crucial, prototypical elements of scenarios in which predictions of uncertain future states are required. The task is designed to simulate the movements of objects on a map, such as the path of a boat at sea or the track and speed of a hurricane (Broad et al., 2007). There are two very different aspects of such prediction over time, and the contrast between them is fundamental to our work: (a) predicting the “typical” path (e.g., the most likely point of landfall of a cyclone) and (b) predicting the uncertainty itself (e.g., the growth of the “cone of uncertainty”).
This distinction between the two elements of prediction is closely analogous to the distinction between estimating the mean of several samples, a task at which people are quite good (Peterson & Beach, 1967; Pitz, 1980; Wickens et al., 2013), and assessing the variance of those samples, a task at which people are less proficient and exhibit systematic biases (Mannes & Moore, 2013). In particular, in other contexts it has been observed that people tend to underestimate the variance of multiple samples, as if consistently underestimating the contribution of additional factors to variables in the world (Henrion & Fischhoff, 2002; Kahneman, 2011; Tversky & Kahneman, 1971). Such an underestimation may be described as implicit overconfidence in the precision of an estimate, although there is a wide variety of other causes of overconfidence in judgment and prediction (Einhorn & Hogarth, 1982; Kahneman, 2011), just as underprediction of variance may result from other causes as well.
In addition to assessing differences in the prediction of central tendency and variance of spatial trajectories, we are also interested in assessing differences in predictions of speed and direction within this spatial navigational context, as well as differences between linear and accelerating trajectories. One reason for examining these issues is evidence from air traffic control that controllers prefer to reroute aircraft using vectors (direction) rather than speed changes (Wickens et al., 1998, 2009), a difference accounted for, in part, by the greater difficulty of visualizing speed changes and their implications for predicted distance (which requires mentally integrating speed over time). A second reason is that in many applied geographical-spatial domains, such as hurricane or tornado track forecasting, people must understand and respond to both the temporal (along-track) and spatial (track direction) aspects of the moving entity (Broad et al., 2007; Regnier & Kirlik, 2012).
To explore the difference between the understanding of mean path and path variance, we created a task in which individuals were given the opportunity to learn the speed and direction of an object moving on a map. Participants were shown the starting point of the object (at T0) and then an estimated location of the object at a later time (T1). The estimates of the location at T1 were sampled from a “model” having random variance in both speed and direction. Over a number of encounters with the model, an individual can learn the “average” trajectory and thereby predict the location of the object at T3 (i.e., an LAT of two time units). Deviation from the most probable location allowed us to measure participants’ understanding of the mean. Participants also separately estimated the probability that they would encounter the object in a sampling of various regions at T3, allowing us to assess their predictive understanding of the growth of uncertainty, or model variability. Because these estimates were meant to measure what participants had learned in the predictive phase, we did not provide feedback on them.
Experiment 1
Experiment 1A
Three hypotheses are offered for Experiment 1A:
Hypothesis 1: Participants will be good at identifying and predicting the mean trajectory of the target.
Hypothesis 2: They will be relatively less proficient at predicting variability (Obrecht, Chapman, & Gelman, 2007; Peterson & Beach, 1967). Precisely how such a decrement will manifest is uncertain. However, based on previous analyses of the perception of randomness (e.g., Kahneman, 2011), we anticipate in particular that people might underestimate its contribution and hence overestimate the tendency of samples to cluster closely around the mean. In addition, we anticipate similar overestimation biases at locations farther away from the mean.
Hypothesis 3: They will be better at extrapolating heading than distance (Wickens et al., 1998).
Method
Eighty-three subjects participated for optional, partial course credit. The experiment was administered via E-Prime and consisted of five two-phase blocks. In the first phase of each block, the participant encountered instances of a target trajectory distribution to learn the underlying pattern. In the second phase, the participant was presented a series of probes and was asked to estimate the probability that the target was within each of a set of probe circles. The first phase therefore indexed knowledge of the central tendency, whereas the second phase assessed understanding of the distribution and hence the variance.
A block consisted of a distribution of target trajectories drawn from the same mean direction and speed (i.e., the same model in both phases; see Figure 2A), with a different distribution employed for each block. Using the model, we then calculated a trajectory from time T0 to T3 with random variance added to both speed and direction by sampling from normal (Gaussian) distributions. These variances represented the growth of uncertainty over the predicted time, such as that depicted in hurricane forecast maps. The standard deviation of the distributions was held constant within each block, but each trial sampled from the distributions independently, creating a distribution of trajectories centered on the model (see Figure 2B). The arrowheads at T3 depict the endpoints of the samples collectively, although participants viewed only one sample at a time and never viewed the data collectively. At T1, a possible location of the target was drawn from the distribution of samples to be shown to the participant, but it was unrelated to the sample selected to be predicted at T3. Target start point and starting azimuth were varied around the screen between blocks (so that not all trajectories ran horizontally from left to right, as depicted in Figure 2). The design was entirely repeated measures. The order of presentation of the different models was randomized.

Schematic of the underlying target behavior. The model is shown in Panel A. Each model contains a starting location, an initial heading, and speed. The participants are shown an estimate of the location at Time 1 (T1) drawn from one vector and must then estimate the final location at Time 3 (T3) drawn from another instance. Over a repeated number of trials, the location of the target at T3 will form a distribution like that shown in Panel B.
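The trajectory model described above and shown in Figure 2 can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' software: the parameter values (`SPEED`, `SD_SPEED`, `SD_HEADING_DEG`) are hypothetical because they are not reported here, and we assume that heading and speed are each drawn once per trajectory, so the spread of possible locations grows with look-ahead time, like a cone of uncertainty.

```python
import math
import random

# Hypothetical parameters: the actual values are not reported.
START = (0.0, 0.0)          # T0 location in pixels
HEADING_DEG = 0.0           # model heading (0 = rightward)
SPEED = 100.0               # pixels travelled per time unit
SD_HEADING_DEG = 10.0       # SD of heading noise (degrees)
SD_SPEED = 15.0             # SD of speed noise (pixels per time unit)


def sample_location(t, rng):
    """Sample one possible target location at time t (in time units).

    Assumption: heading and speed are drawn once per trajectory from
    normal distributions centred on the model, and the target then moves
    in a straight line, so the spread of locations grows with t.
    """
    heading = math.radians(rng.gauss(HEADING_DEG, SD_HEADING_DEG))
    speed = rng.gauss(SPEED, SD_SPEED)
    return (START[0] + t * speed * math.cos(heading),
            START[1] + t * speed * math.sin(heading))


def rms_spread(points):
    """Root-mean-square distance of points from their centroid."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))


rng = random.Random(1)
# Independent draws for T1 and T3, mirroring the unrelated T1 sample shown to participants.
t1_points = [sample_location(1, rng) for _ in range(5000)]
t3_points = [sample_location(3, rng) for _ in range(5000)]
ratio = rms_spread(t3_points) / rms_spread(t1_points)
print(f"spread ratio T3/T1 = {ratio:.2f}")  # close to 3 under this model
```

Under this assumed model the spread at T3 is about three times the spread at T1; a model that added independent noise at every time step would instead produce slower, square-root growth, which is one reason the per-trajectory assumption is flagged.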
The first phase for each block (central tendency estimation) consisted of 20 learning trials for a given model. Each learning trial consisted of four consecutive screens, as shown in Figure 3, with Screens 1 and 2 representing sequential times of data observation, Screen 3 requiring the participant to position a prediction response, and the final screen providing feedback. Screen 1 (the start screen) displayed the location of the target at time T0 at a set point along the horizontal or vertical edge of the screen for 5 s. Within each block, the start location remained the same, because it did not include the added random variance. On Screen 2 (the predictive information screen), participants were shown a possible location of the target at time T1 (see Figure 2B) that remained visible for 5 s. To prevent simple extrapolation, the location shown on Screen 2 was not the actual location of the target at T1 but a different sample drawn from the same model. Participants were informed that Screen 2 showed a possible location and not the actual location.

Schematic of Phase 1 event sequence. Participants first see the true location of the target at T0, then a possible location of the target at Time 1 (T1; predictive information screen), and are required to place a marker where they predict the target will be at Time 3 (T3). They are given feedback displaying the true location of the target and the distance of their error.
Screen 3 was the estimation screen and was identical to Screen 2 except that it contained a prompt instructing the participant to estimate the target location at T3 (i.e., two further time intervals beyond T1, and hence twice the distance traveled between T0 and T1). A crosshair was placed when the participant indicated his or her estimated location with a mouse click. Screen 3 remained visible for 10 s or until the participant chose to continue. Following Screen 3, a feedback screen was presented containing the prediction crosshair and the target stimulus at the location calculated by the sampled trajectory for T3. The feedback screen also displayed the numerical distance between the prediction crosshair and the location of the target. After the feedback screen was shown for 10 s, a new trial began, with a new sample trajectory calculated from the same model and variance. After completing the 20 trials in Phase 1, the participant moved on to Phase 2, with the same underlying model used throughout both phases of that block. For each block, a new model was generated that changed the direction of the vector and the distance moved and modified other parameters slightly (e.g., the “splay” of the vectors in Figure 2B).
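One naive strategy for the estimation screen is constant-velocity extrapolation from the T0 location and the T1 location shown on Screen 2. The sketch below uses hypothetical helper names (not part of the experiment software) to place the T3 prediction two further displacements beyond T1 and to compute an error distance like the one reported on the feedback screen; we assume that distance was Euclidean, which the text does not state explicitly.

```python
import math


def extrapolate_linear(p0, p1, t_target=3):
    """Predict the location at t_target under constant velocity.

    p0 is the T0 location and p1 the (possible) T1 location; the per-unit
    displacement p1 - p0 is applied t_target times from p0, which puts the
    T3 prediction two displacements beyond p1.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    return (p0[0] + t_target * dx, p0[1] + t_target * dy)


def error_distance(predicted, actual):
    """Euclidean distance in pixels between a prediction and the target."""
    return math.hypot(predicted[0] - actual[0], predicted[1] - actual[1])


prediction = extrapolate_linear((0, 0), (100, 20))
print(prediction)                              # (300, 60)
print(error_distance(prediction, (303, 56)))   # 5.0
```

Because the T1 location on Screen 2 is itself a random sample, single-trial extrapolation of this kind triples that sample's deviation from the model at T1; averaging over the learning trials is what would move predictions toward the model mean.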
Phase 2 (variability estimate) consisted of 20 trials, each containing three screens presented in sequential order. Screens 1 and 2 were identical to those from Phase 1. Screen 3 in Phase 2 presented a 100-pixel-diameter circle at a point selected from a grid pattern that was centered on the model trajectory at T3. Participants were asked to provide a numerical estimate of the probability that the target would be within the circle at T3; in effect, the circle served as a subjective “confidence interval” whose coverage corresponded to the percentage given by the participant. (In Figure 2B, we illustrate schematically such a confidence interval of about 75% centered on the model mean.) Screen 3 remained visible until the participant made a response, and then a new trial began. The center of each circle was selected randomly from a grid pattern of locations centered on the 2-D model mean endpoint (the center of the distribution for the given model), with locations closer to the center more likely to be selected. No feedback was given within this phase, to prevent continued learning of the target behavior after Phase 1. This procedure was repeated for a total of 20 trials before a new block began, with Phase 1 containing a new model (i.e., a different start location, direction, speed, and uncertainty growth function).
Results
Phase 1: Prediction of the mean trajectory
The overall accuracy for the Phase 1 target placement task was measured by calculating the absolute distance between the generated target location at T3 and the participant’s predicted location. The overall mean accuracy for participants was 96.02 pixels (SE = 1.71). As a measure of learning across trials, the distance to the target was regressed over trial for each individual. With a log-log transformation to express the expected power-law form of the learning curve, the slope coefficient can be considered a measure of how much the individual improved (or worsened), and the R-squared value indicates the extent to which this change in accuracy is explained by learning over trials. Overall, participants had an average slope coefficient of −0.149 and an R-squared value of .21, with the slope significantly different from zero according to a one-sample t test, t(82) = 13.16, p < .001, indicating learning.
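The per-participant learning measure can be sketched as an ordinary least-squares fit of log error on log trial number, followed by a one-sample t test of the slopes against zero. This is an illustrative reconstruction, assuming trials are numbered from 1 and natural logarithms are used (the log base does not change the slope or R-squared of a log-log fit); `loglog_learning` and `one_sample_t` are hypothetical helper names.

```python
import math
import statistics


def loglog_learning(errors):
    """Fit log(error) = a + b * log(trial) by ordinary least squares.

    errors[i] is the (positive) error on trial i + 1. Returns the slope b,
    where a negative value means error shrinks with practice, and R-squared
    (undefined if every error is identical).
    """
    xs = [math.log(trial) for trial in range(1, len(errors) + 1)]
    ys = [math.log(e) for e in errors]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


def one_sample_t(values, mu=0.0):
    """t statistic for H0: mean(values) == mu, with df = len(values) - 1."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return (statistics.mean(values) - mu) / se


# A perfect power-law learner: error = 150 * trial ** -0.15 over 20 trials.
slope, r2 = loglog_learning([150 * t ** -0.15 for t in range(1, 21)])
print(round(slope, 3), round(r2, 3))   # -0.15 1.0
```

In practice `one_sample_t` would be applied to the list of per-participant slopes; a real learner's errors scatter around the power curve, which is why the reported R-squared values are well below 1.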
Although improvement was seen, learning did not quite reach optimal performance. In the last five trials, participants remained significantly worse than the optimal strategy of choosing the average position, t(82) = 6.91, p < .001, a strategy that would have yielded an average absolute error of 73 pixels (see Figure 4).

Average distance from target across trials. The average distance (in pixels) between the target location at Time 3 (T3) and the participant location estimates at T3 is initially large but decreases rapidly toward the optimal strategy. Brackets indicate one standard error above and below the mean.
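The optimal benchmark mentioned above is the error a participant would incur by always placing the marker at the average target position. Given a set of simulated T3 endpoints (the model parameters behind the reported 73-pixel figure are not given, so none are assumed here), a hypothetical helper can estimate it as the mean Euclidean distance from the endpoints' centroid:

```python
import math


def benchmark_error(endpoints):
    """Mean Euclidean error of always predicting the centroid of endpoints.

    endpoints is a list of (x, y) target locations at T3, e.g., from a
    simulation of the block's model. The centroid stands in for the model's
    average position.
    """
    n = len(endpoints)
    cx = sum(x for x, _ in endpoints) / n
    cy = sum(y for _, y in endpoints) / n
    return sum(math.hypot(x - cx, y - cy) for x, y in endpoints) / n


# Four endpoints spread 10 px around (300, 0): the benchmark error is 10 px.
print(benchmark_error([(310, 0), (290, 0), (300, 10), (300, -10)]))  # 10.0
```

Strictly, the single point that minimizes mean Euclidean error is the geometric median rather than the centroid; for roughly symmetric endpoint clouds the two should be close.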
Consistent with our predictions in Hypothesis 3, error was greater for distance than for heading, t(82) = 47.99, p < .001, d = 10.53 (along-track [distance], M = 155 pixels, SE = 1.97; across-track [heading], M = 54 pixels, SE = 1.82).
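The along-track/across-track split can be sketched by projecting each error vector onto the direction of travel and onto its perpendicular. We assume here that the reference direction is the model heading; the text does not specify whether the model heading or the individual sampled trajectory was used, and `track_errors` is a hypothetical helper.

```python
import math


def track_errors(heading_deg, predicted, actual):
    """Return (|along-track error|, |across-track error|) in pixels.

    The error vector predicted - actual is projected onto the unit vector
    of the heading (along-track: distance, i.e., speed, error) and onto its
    perpendicular (across-track: heading error).
    """
    h = math.radians(heading_deg)
    ux, uy = math.cos(h), math.sin(h)
    ex, ey = predicted[0] - actual[0], predicted[1] - actual[1]
    along = ex * ux + ey * uy
    across = -ex * uy + ey * ux
    return abs(along), abs(across)


# Heading 0 deg (rightward): overshooting by 20 px and drifting 5 px sideways.
print(track_errors(0, (320, 5), (300, 0)))   # (20.0, 5.0)
```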
Phase 2: Prediction of variance
For Phase 2, participant estimates of probability were compared with the distribution of end points generated by 10,000 simulations of the model. Overall, participants performed inaccurately on this task. Even where they were most accurate (circles near the model center), they estimated the probability of the target falling within the circle to be 40% when in fact it was only 32%, and the group estimate differed significantly from the actual value, t(82) = 3.01, p < .01, d = .66. Figure 5 depicts participants’ estimates of the likelihood that samples would fall in probe circles placed at four different eccentricities from the center of the distribution, contrasting the reported probabilities for those regions with the actual values obtained from the model simulation, within rings or “probability bands” moving outward.

Average true probability of probe versus the average estimated probability of probe at varying probability bands centered on the model mean and extending outward toward the edges of the distribution.
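The comparison with 10,000 simulations can be sketched as a Monte Carlo estimate of the chance that a T3 endpoint falls inside a 100-pixel-diameter probe circle (radius 50 pixels), followed by an overconfidence score defined as estimated minus true probability. The model parameters and the example response of 0.40 are hypothetical, and the sketch assumes one heading and one speed draw per trajectory.

```python
import math
import random

# Hypothetical model parameters (not reported in the text).
SPEED, SD_SPEED = 100.0, 15.0   # pixels per time unit
SD_HEADING_DEG = 10.0           # heading noise around a rightward model heading
PROBE_RADIUS = 50.0             # 100-pixel-diameter probe circle


def simulate_t3_endpoints(n, rng):
    """Draw n T3 endpoints (start at the origin, three time units of travel)."""
    endpoints = []
    for _ in range(n):
        heading = math.radians(rng.gauss(0.0, SD_HEADING_DEG))
        speed = rng.gauss(SPEED, SD_SPEED)
        endpoints.append((3 * speed * math.cos(heading), 3 * speed * math.sin(heading)))
    return endpoints


def probe_probability(endpoints, centre, radius=PROBE_RADIUS):
    """Proportion of simulated endpoints lying inside the probe circle."""
    cx, cy = centre
    inside = sum(1 for x, y in endpoints if math.hypot(x - cx, y - cy) <= radius)
    return inside / len(endpoints)


rng = random.Random(7)
endpoints = simulate_t3_endpoints(10_000, rng)
mean_endpoint = (sum(x for x, _ in endpoints) / len(endpoints),
                 sum(y for _, y in endpoints) / len(endpoints))

p_true = probe_probability(endpoints, mean_endpoint)
estimate = 0.40                      # hypothetical participant response
overconfidence = estimate - p_true   # positive = probability overestimated
print(f"true {p_true:.2f}, estimated {estimate:.2f}, bias {overconfidence:+.2f}")
```

Moving the probe centre outward from `mean_endpoint` traces how quickly the true probability falls off, which is the curve that the estimated probabilities in Figure 5 fail to track.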
The most important aspect of the data is that the estimated proportion of observations in the center area, near the model mean, substantially exceeds the true proportion, as reported earlier. It is noteworthy, however, that this overestimation persists, and to an even larger degree, at the more peripheral locations. An ANOVA performed on the overconfidence estimate revealed a significant effect across locations, F(3, 319) = 21.24, p < .01, η² = .17; estimated probability does not decline across locations as rapidly as the actual distribution does. It is possible that these biases represent the failure of participants to fully understand the meaning of the response measure “probability estimation,” as people have been shown to have difficulty dealing with probabilities as opposed to frequency counts. We examine this possibility in Experiment 1B.
Individual differences
To determine whether the ability to predict the mean effectively was related to estimation of variance, we regressed participant variance estimation performance on mean estimation performance for the final five trials. There was no significant relationship between performance on mean estimation and performance on variance estimation, r = .07, p = .57. We further separated participants into high and low performers by taking the best- and worst-performing quartiles on each task. Participants who performed well on mean estimation (M = 64.5 pixels, SE = 1.2) were no better on the center variance estimation task (where participants were best calibrated) than those who performed poorly on mean estimation (M = 111.5 pixels, SE = 3.1), t(35.08) = 1.62, p = .11, d = .56 (Welch’s t test was used here because differences in variance were expected between high and low performers). Similarly, those who had the least error in variance estimation (M = −0.6%, SE = 1.1%) performed no better at mean estimation than those who had the most error (M = 25.5%, SE = 7.7%), t(30.38) = −0.14, p = .89, d = .05.
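Welch's t test, used above because the high- and low-performing groups were expected to differ in variance, does not pool the two variances and instead estimates fractional degrees of freedom (hence values such as t(35.08)) with the Welch-Satterthwaite formula. A minimal standard-library sketch follows, with `welch_t` as a hypothetical helper name; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t, df = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(t, 3), round(df, 2))   # -1.732 4.41
```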
Discussion
Experiment 1A was designed to explore how well people make spatial predictions under conditions of uncertainty. In this experiment, we looked at how individuals learned and understood the mean trajectory and variability of the behavior of a moving target that was probabilistic in nature. In line with our hypotheses, our results from Phase 1 demonstrate that participants are able to learn to predict a mean trajectory, performing at high levels after a small number of trials. Their final performance was within 82% of optimal.
Although participants learned the mean trajectory, their understanding of the underlying variance was far less precise (overestimating the total probability more than fivefold, averaged across the four sample locations). Overestimation of probability was seen at the center of the distribution, and participants continued to be poorly attuned to the reduction in probability as circles moved toward the periphery. Indeed, we find approximately 100% overconfidence even at the two most central locations (center and near center), likely reflecting an implicit overconfidence in prediction accuracy. This is not simply an extension of the findings from the literature on the underestimation of the contribution of random variation to trajectory prediction (e.g., Kahneman, 2011), as participants grossly overestimated across the entire distribution rather than simply overrepresenting the region around the mean. This miscalibration was found both for participants who were good estimators of the mean and for those who were poor ones. Such a finding is consistent with a dissociation between mean estimation and understanding of the variance in our population, and these tasks may tap quite different cognitive skills. Individual differences in appreciation of uncertainty have been reported elsewhere (e.g., Washburn, Smith, Baker, & Raby, 2001) but not in a prediction paradigm such as the one employed here.
In accounting for the difference between good learning of the model mean and poor learning of its variance, feedback may be a key element. On the one hand, feedback on every trial of Phase 1 clearly underlies the learning curve of Figure 4, whereas there was no feedback in Phase 2, when variance estimation was poor. On the other hand, the same error-magnitude feedback in Phase 1 that informed the mean estimate could also have provided fully reliable information for learning the variance, had participants mentally integrated the dispersion of target locations shown on the fourth (feedback) screen across trials. Evidently, it did not serve that purpose.
As expected, participants also were shown to be better at extrapolation of heading compared to distance, in line with the understanding that predicting distance is a more difficult mental calculation. To reduce mental calculation, it may be possible to develop visualization tools and predictive displays that can convert the distance extrapolation into a direct visual extrapolation similar to heading.
Our findings paint an interesting picture of human understanding of spatial uncertainty. It appears that people are able to learn the average behavior of a spatial phenomenon while remaining relatively insensitive to the variance of that behavior. The overestimation mirrors past research showing that individuals are overconfident in their predictions (Buehler & Griffin, 2003).
Experiment 1B
In order to examine the possibility that the overconfidence phenomenon, reflected in the subjective probability distributions of Figure 5, might have been an artifact of the difficulty our participants had in understanding the meaning of probability (see, e.g., Gigerenzer, 1994), we ran a small experiment (N = 20) with procedures identical to those of Experiment 1A, except that participants were asked, “Over the course of 100 trials, how many times would the target fall within the circle shown?” The results, shown in Figure 6 in the same format as Figure 5, replicate the effect found with the probability estimates of Experiment 1A. In fact, at the center location, the overconfidence measure (the difference between the reported and actual number) is larger (12%) than in Experiment 1A (8%) and is again statistically significant, t(20) = 2.33, p = .03, d = 1.04. Again, this overestimation of the proportion within the target region persists and appears to be amplified at the more peripheral target locations.

Average true frequency of probe versus the average estimated frequency of the target falling within the probe for 100 trials at varying probability bands centered on the model mean and extending outward toward the edges of the distribution.
Experiment 2
The previous experiment involved a relatively simple extrapolation task, in which trajectories followed linear paths perturbed by Gaussian noise. However, humans have been shown to be considerably worse at extrapolating nonlinear or second-order trends (Wickens et al., 2013), whose cognitive complexity is greater (Halford, Wilson, & Phillips, 1998). To see whether the pattern of results from Experiment 1A holds true for more complex dynamic behaviors, we manipulated the trajectory in Experiment 2. In order to separate the effects of heading from the effects of speed, we manipulated the complexity of heading and speed separately as well as together. Heading was manipulated so that the target followed a curved path. Speed was manipulated by adding acceleration, so that the target either slowed or sped up as time progressed. It is possible that the linear case is the general case for predictions of this nature, in which case we will encounter a similar pattern of prediction and variance estimation. However, following past research, we predict that these manipulations will make it more difficult for participants to predict future locations of the target than was the case with the linear trajectories in Experiment 1, simply through the increased complexity imposed by the trends.
We also anticipate that prediction of straight but accelerating paths should be harder than prediction of curved paths (just as velocity extrapolation was more difficult than heading extrapolation in Experiment 1A), but following the assumption that workload is somewhat additive, we predict both components together should be more difficult than either alone.
How this difficulty in predicting future locations will affect the understanding of variance, manifest as overconfidence in Experiment 1A, remains to be seen. It is possible that the understanding of variance is independent of the behavior of the object and that the linear case in Experiment 1A represents the general case for trajectories, accelerating or otherwise, and participants will perform similarly poorly. Performance could also possibly decrease, as the task is more complex and therefore more demanding. Participants may be unable to grasp the nuances of the underlying behavior and rely on a simpler model with greater inherent overconfidence (Kahneman, 2011). In contrast, the increased complexity in the prediction task may clue participants in that the task is more difficult and, therefore, they should put less trust in their mental models and intuition and display less overconfidence. It is also very possible that we will see all three levels of performance among our participants, as statistical understanding varies widely between individuals.
Hence in Experiment 2, we hypothesize the following:
Hypothesis 4: Prediction performance will be poorer overall than in Experiment 1A.
Hypothesis 5: Within Experiment 2, the mean prediction error will be progressively higher from curve prediction to linear acceleration prediction to prediction of both.
Hypothesis 6: Overconfidence will again be manifest, but the relative ordering of this bias across the three prediction cases cannot itself be easily predicted or hypothesized.
Method
Participants
Fifty-nine subjects participated for optional, partial course credit. These participants were randomly assigned into one of the three groups with different variations in the complexity of the trajectory paths.
Task and procedure
The experimental design was similar to Experiment 1A but had several important differences. The number of blocks was increased to six experimental blocks and one practice block to allow for two blocks for each level of the acceleration condition. To accommodate this increase in blocks, the number of trials per block was reduced to 16 in both Phase 1 and Phase 2. The model for the target trajectories contained additional terms corresponding to change in speed (i.e., acceleration) and change in direction (curve). In version CUR (curve), the change in direction was set so that the heading changed by 10° or 30° between Tn and Tn+1, but change in speed was held at zero. In version ACC (acceleration), change in speed was set so that the target accelerated by 10 pixels or 30 pixels or decelerated by 10 pixels between Tn and Tn+1, and change in direction was held at zero. Version C&A (curve and acceleration) combined both change in direction and change in speed. Changes in speed and/or direction were randomized between blocks. Trajectory type (ACC, CUR, C&A) was varied between participants.
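The three trajectory versions can be sketched as a mean-path generator with per-interval changes in heading (CUR), in distance traveled (ACC), or both (C&A). This is a hypothetical illustration: the initial heading and step length are placeholders, random variance is omitted, and we assume each change takes effect from the second interval onward, which the text does not specify.

```python
import math


def model_path(start, heading_deg, step, d_heading_deg=0.0, d_step=0.0, n_steps=3):
    """Mean-model locations at T0, T1, ..., T{n_steps}.

    step is the distance covered between T0 and T1; after each interval the
    heading changes by d_heading_deg (CUR) and the step length by d_step (ACC).
    Setting both gives the C&A version.
    """
    x, y = start
    heading, length = heading_deg, step
    path = [(x, y)]
    for _ in range(n_steps):
        x += length * math.cos(math.radians(heading))
        y += length * math.sin(math.radians(heading))
        path.append((x, y))
        heading += d_heading_deg
        length += d_step
    return path


# ACC: accelerating by 10 px per interval along a rightward heading.
print(model_path((0.0, 0.0), 0.0, 100.0, d_step=10.0))
# [(0.0, 0.0), (100.0, 0.0), (210.0, 0.0), (330.0, 0.0)]
```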
To convey the changing nature of the models, an additional predictive information screen was shown in Phase 1 (as Screen 3), indicating a possible location at T2. The estimation screen was moved to Screen 4, so that the event sequence proceeded: start screen (T0), Predictive Information Screen 1 (T1), Predictive Information Screen 2 (T2), estimation screen (T3), and feedback screen (T0–T3). Three points (the start screen plus two predictive information screens) are the minimum needed to identify curvature or acceleration of the paths, and they provide sufficient information to do so.
In Phase 2, participants were not shown predictive information screens and were instructed to base their estimate on the previous encounters with the model from the start location shown. The locations of the probe circles were dispersed as in Experiment 1.
Results
Phase 1: Prediction of the mean
Following the procedures used in Experiment 1A, each participant's distance to the target was regressed on trial number as a measure of learning across trials. After a log-log transformation, participants had an average slope coefficient of −.042 and an average R-squared value of .10. The slope was significantly different from zero according to a one-sample t test, t(57) = −3.70, p < .01, d = .96. This finding suggests that although learning did take place (a significantly negative slope), this improvement accounted for only 10% of the total variance in performance across trials. There were no significant differences between conditions in either the slope coefficient, F(2, 55) < 1, or the R-squared values, F(2, 55) < 1. Finally, compared with Experiment 1A, learning in Experiment 2, as assessed by the log-log slope, was of smaller magnitude, t(139) = 6.38, p < .01, d = 1.07.
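The learning analysis above can be sketched in a few lines. We assume natural logs of both distance-to-target and trial number (the slope of a log-log fit does not depend on the log base, provided both axes use the same one) and an ordinary least-squares fit per participant; the function names `log_log_learning` and `one_sample_t` are hypothetical. The per-participant slopes are then tested against zero with a one-sample t test.

```python
import math

def log_log_learning(errors):
    """Regress log(error) on log(trial) by ordinary least squares.

    errors -- one participant's distance-to-target (pixels) on trials 1..n, all > 0.
    Returns (slope, r_squared); a negative slope indicates learning.
    """
    xs = [math.log(t) for t in range(1, len(errors) + 1)]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    # For simple linear regression, R-squared equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared

def one_sample_t(values, mu=0.0):
    """One-sample t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return (m - mu) / (sd / math.sqrt(n)), n - 1

# Example: a hypothetical participant whose error shrinks as a power of trial number
slope, r2 = log_log_learning([200 * t ** -0.5 for t in range(1, 17)])
print(round(slope, 3), round(r2, 3))
```

For that idealized participant, whose error follows 200 × trial^−0.5 exactly over 16 trials, the fit recovers a slope of −0.5 and an R-squared of 1. The slopes and R-squared values reported above are far smaller, which is what a weak learning trend buried in trial-to-trial noise looks like.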
Performance on the final five trials was also calculated to provide a better overall assessment of task difficulty (see Figure 7). Participants in the ACC condition (M = 182 pixels, SE = 7.13) had greater error than those in the CUR condition (M = 155, SE = 3.92) and the C&A condition (M = 166, SE = 3.70), F(2, 55) = 7.40, p < .001, η2 = .21. Post hoc comparisons revealed that ACC had significantly higher error than CUR (p = .01) and marginally higher error than C&A (p = .10). The latter two conditions did not differ from each other (p > .20).
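The between-condition comparisons rest on a one-way ANOVA. With three groups, the reported F(2, 55) implies 58 analyzed participants (df between = 3 − 1 = 2, df within = 58 − 3 = 55). The sketch below computes F and η² (between-groups sum of squares over total sum of squares) from raw scores; it illustrates the standard formulas and is not the authors' analysis code, and the post hoc comparisons are not included.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for independent groups."""
    scores = [v for g in groups for v in g]
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f, df_between, df_within, eta_squared

print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

In the toy call, the group means (2, 5, 8) are far apart relative to the spread within each group, so F is large (27 on 2 and 6 df) and η² is .9; the η² values of .21 and .13 reported in this section indicate that condition explained a much smaller share of the variability.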

Figure 7. Mean estimation error (top) and actual versus estimated variance at the center (model mean) of the distribution as a function of condition (bottom). *p = .05, ***p = .001.
For Phase 2, as in Experiment 1A, participants' probability estimates were compared with the distribution of end points generated by 1,000 simulations of the model. To measure the understanding of variance, we calculated the difference between each participant's estimated likelihood at each location and the likelihood generated by the simulations of the model (referred to as the true likelihood). Overall, participants performed poorly on this task, overestimating the probability of the target's falling within the circle near the model's center by an average of 32 percentage points (actual probability = 12%, estimated probability = 44%), t(57) = 10.66, p < .001, d = 2.78. Participants in the CUR condition overestimated to the greatest extent, by an average of 41 percentage points, whereas those in the straight-line ACC condition overestimated by an average of 32 points, and those who experienced both curvilinear and linear acceleration (C&A) overestimated by 21 points (see Figure 7). A single-factor ANOVA on the variance estimates at the center revealed a significant difference between groups, F(2, 55) = 4.32, p < .05, η2 = .13; planned pairwise comparisons showed that this effect was driven by the difference between the curvilinear group (poorest) and the group that experienced both (best) (Bonferroni-adjusted p = .014). The remaining two contrasts did not approach significance (p > .20).
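The comparison between estimated and true likelihood can be sketched as a Monte Carlo count: simulate many end points, count the fraction falling inside a probe circle, and subtract that simulated probability from the participant's estimate. The end-point distribution (a circular normal with SD 60 pixels), the 30-pixel probe radius, and the function names below are hypothetical stand-ins; the study simulated its own trajectory model 1,000 times.

```python
import math
import random

def prob_in_circle(endpoints, center, radius):
    """Fraction of simulated end points lying within `radius` of `center`."""
    cx, cy = center
    hits = sum(1 for x, y in endpoints if math.hypot(x - cx, y - cy) <= radius)
    return hits / len(endpoints)

def overconfidence(estimated_prob, endpoints, center, radius):
    """Estimated minus simulated ('true') probability; positive means overconfident."""
    return estimated_prob - prob_in_circle(endpoints, center, radius)

# Hypothetical end-point cloud: 1,000 draws from a circular normal around (300, 0)
rng = random.Random(1)
sd = 60.0
endpoints = [(rng.gauss(300.0, sd), rng.gauss(0.0, sd)) for _ in range(1000)]

# Hypothetical probe circle of radius 30 pixels centered on the model mean
center, radius = (300.0, 0.0), 30.0
true_p = prob_in_circle(endpoints, center, radius)
bias = overconfidence(0.44, endpoints, center, radius)
print(f"simulated p = {true_p:.3f}, overconfidence for an estimate of .44 = {bias:+.3f}")
```

For a circular normal centered on the probe, the exact probability is 1 − exp(−r²/(2σ²)), about .118 for these values, so the simulated proportion should land near that figure. The parameters were chosen only so the example sits close to the 12% center-probe probability reported above; they say nothing about the shape of the study's actual end-point distribution.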
Individual differences
As in Experiment 1A, we regressed participants' variance estimation performance on their mean estimation performance over the final five trials to measure how closely the two were related. Across all three trajectory types, there was no significant relationship between mean estimation and variance estimation, r = −.10, p = .21. Adding trajectory type to the regression model did not yield a significant effect, F(2, 55) = 2.21, p = .12.
Although there was no relationship between mean estimation and overall variance estimation, the top quartile of performers at estimating the mean location may actually have performed worse than other participants in estimating the variance at the center (43% vs. 27% overestimation). This difference approached significance, t(27.4) = 2.00, p = .06, d = .62 (again using Welch's t test). Examining the contrast from the other direction reveals a similar pattern: the top quartile of variance estimators at the center had an average Phase 1 error of 178 pixels, compared with 161 pixels for the lowest 25% of the sample, a difference that also approached significance, t(24.65) = 2.06, p = .05, d = .64. Neither contrast, then, showed a positive association between the two forms of estimation.
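The fractional degrees of freedom reported above (27.4 and 24.65) come from Welch's t test, which does not assume equal group variances and approximates the degrees of freedom with the Welch–Satterthwaite formula. A minimal standard-library sketch of the statistic and its df follows; the function name is hypothetical, and the p value additionally requires the t distribution's CDF, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides if SciPy is available.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    # Squared standard errors of each group mean (sample variances use n - 1)
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

print(welch_t([1, 2, 3, 4], [2, 4, 6, 8]))
```

For the toy groups shown, t = −√3 ≈ −1.732 on about 4.41 degrees of freedom: the df falls below the pooled value of 6 because the second group is four times as variable as the first, which is the situation Welch's correction is designed for.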
Discussion
Compared with Experiment 1A, participants had greater difficulty learning to predict the location of the target, although they showed some learning. The progression of trials explained only 10% of the variance, and the per-trial improvement was small (less than 1%) in comparison with the improvement in average error of roughly 50%. Overall performance was within 72% of optimal.
Subjects continued to overestimate the likelihood that the target would be at any given point, as shown by the estimation of likelihood at the center. This estimate of participant overconfidence may in fact be conservative, since participants continued to estimate the target likelihood at values similar to those at the center even as the probes moved farther out and the true likelihood diminished (summed across the entire distribution, their estimates totaled 7 times the true total probability).
Perhaps the most important message of Experiment 2 is, once again, the dissociation between the ability to estimate central tendency and the ability to understand variance. Individuals who were good at estimating the mean location tended to be worse at understanding the variance, as defined by their degree of overconfidence in the precision of their estimate, although this difference only approached significance. The same pattern appears across conditions: the conditions in which people were good at estimating the mean were not those in which they were good at estimating the variance. Despite being better at estimating the mean, those in the curvilinear group showed the largest bias in underestimating variance.
Surprisingly, the C&A group was the best at estimating the variance at the center. Several investigators have postulated two distinct cognitive systems of judgment that may underlie this effect (Evans, 2008; Evans & Stanovich, 2013; Kahneman, 2011). System 1 is said to be fast, intuitive, invoked in easy problems, and subject to overconfidence. System 2 is said to be slower, deliberative, and effortful, and its lack of "cognitive ease" dissipates the sense of overconfidence. In this context, the dissociation might relate to the invocation of a different form of judgment (System 2) on the multidimensional problems (C&A), which are seen as very difficult, in contrast with the single-dimensional CUR and ACC judgments. In particular, the curved predictions, found to be the most accurate in Phase 1, might have induced intuitive System 1 judgment (Kahneman, 2011), a kind of processing often associated with overconfidence.
General Discussion
In the current research, we examined the prediction of slow changes in continuous geographic location, rather than the occurrence of discrete events, to assess whether a corresponding loss of accuracy and overconfidence would be observed; confidence was assessed by asking for subjective estimates of the likelihood that a target prediction would fall within a given area. Across two experiments, we examined predictions of linear paths in their angle and distance, and predictions of nonlinear paths that accelerated (speed changes), curved (heading angle changes), or both.
The task in general was learnable, but challenging, particularly in Experiment 2 with the nonlinear changes, as inferred from both the relatively high error rates at the end of each block with a given model and the small variance accounted for by learning.
The most pervasive finding, across all conditions of both experiments, was the pronounced overconfidence in the 2-D accuracy of subjects' predictions, as indexed by their confidence that the endpoint of the predicted trajectory would fall within a circle of a given radius. As assessed by the difference between actual and perceived frequency of such prediction outcomes, this overconfidence ranged between 20% (linear) and 41% (nonlinear curved). This overconfidence, while reflecting the general findings regarding event prediction, also replicates and extends other findings on quantitative prediction, such as findings that experts underestimate the variability of scientific constants (Henrion & Fischhoff, 2002), underestimate the variability of the stock market (Tetlock, 2005), or implicitly underestimate the variance in experimental data that will be collected by choosing insufficient sample sizes (Tversky & Kahneman, 1971). It is also consistent with the observations that people tend to underestimate the contribution of randomness to trends and processes in general (Kahneman, 2011) and are more overconfident in future-oriented (prediction) than in present-oriented situation awareness questions (Sulistyawati et al., 2011).
A second finding emerging from our results is that prediction problems differed in difficulty (measured by mean error), apparently as a function of the cognitive complexity of visualizing the predicted variable. This complexity difference had three manifestations. First, linear (constant-rate) predictions of both angle and distance in Experiment 1A were easier than the nonlinear, changing-rate predictions examined in Experiment 2, whether the changing variable was angle or along-track position. In Experiment 1A, angle can be visualized and, with a constant rate, time can be directly visualized as position. In Experiment 2, these constancies no longer hold, and considerably greater cognitive complexity is imposed. Angle can be directly visualized as a visual "primitive" (Hubel & Wiesel, 1962), but speed cannot, because motion is not directly perceived and must instead be inferred by differencing two consecutive points.
The second manifestation of cognitive complexity (angle vs. along-track position) was directly reflected in the higher magnitude of estimation error for the latter. The third manifestation, seen in Experiment 2, is that curves (changes in angle) were more accurately predicted than accelerations (changes in speed or position differences), essentially “scaling up” the angle–speed differences of Experiment 1A to their first derivative counterparts in Experiment 2.
The third major finding relates to the dissociation between predicted mean and variance, a dissociation manifest in two general phenomena: between people and between conditions. In both experiments, people who were particularly good at one type of estimation were not particularly proficient at the other, seemingly reflecting a difference in cognitive ability. Furthermore, in Experiment 2, the condition that yielded better performance in estimating the mean or central tendency (curvature) was the one that yielded poorer performance (i.e., greater overconfidence) in estimating the variance. These two parallel dissociations (between subjects and between conditions) could be explained in part from the perspective of attentional focus (Kahneman & Frederick, 2002). The broad, integrative attention required to "filter out" variance and differences and focus on the generalized mean or prototype model prediction will, by definition, obliterate or dampen the memory for that variance. People or conditions that perform better at the former could thus be expected to perform worse at the latter.
The data reveal an intriguing pattern in the effects of problem difficulty on overconfidence. Between experiments, the greater overconfidence for the more difficult, nonlinear trajectories than for the easier, linear ones confirmed a general trend in overconfidence research described by Wickens et al. (2013; chap. 8, section 7.2), in which more difficult predictions induce greater overconfidence. However, this pattern is the opposite of the relation found between conditions within Experiment 2, in which the most complicated model behavior (C&A) produced the least overconfidence, and greater overconfidence was found for the easier curved predictions than for the more difficult acceleration predictions. This difference may reflect a way in which geographic path or trend prediction differs from event prediction. However, given the repetitive nature of the trials and the trial-by-trial feedback (in Phase 1), it is possible that all conditions fall within the domain of "good" decision making and that the differences between conditions reflect more confident System 1 thinking invoked in direction extrapolations and more cautious System 2 thinking when acceleration is present (and it is no longer possible to rely on System 1 perceptual extrapolation). Such a difference between the cognitive demands of estimating means versus variance has been argued by Pitz (1980) and demonstrated in performance by Obrecht et al. (2007).
Applications, Practical Implications, Limitations, and Future Directions
The current results have two major applications or implications for spatial prediction in complex environments. First, these findings have implications for visualization tools. In other areas of prediction, particularly in aviation, ship operations, and process control, predictive aids have proved advantageous for control. But such aids are often based on "point predictions" that provide little visualization of the growth of uncertainty with longer look-ahead time (LAT). How might such growth be effectively conveyed (Broad et al., 2007; Wickens et al., 2000) in a manner that not only improves prediction of the task at hand but might generalize to other predictive environments? Our findings suggest that understanding of mean behavior and understanding of variance are dissociable, which makes it important to identify how training or visualization (e.g., of uncertainty rings or fast-time trajectories) may distinctly influence performance and overconfidence. The current task allows these measures to be examined separately, laying the groundwork for future investigation into managing uncertainty.
Second, the current results have a major implication for strategic decision making that follows from an estimate of future uncertainty. By understanding the general conditions in which overconfidence in predictions occurs, we can better consider the consequences of such overconfidence. The most important of these would seemingly be a failure to prepare for an alternative course of events should the prediction be wrong. For example, unwarranted certainty that a hurricane path will miss a town may lead to a failure to prepare for evacuation. Likewise, unwarranted certainty that a weather front will stay clear of an airport approach may lead the controller to fail to prepare an alternative approach path for incoming traffic. Such scenarios might inform training about the consequences of overconfidence in prediction. These decisions should be explored in future research.
The primary limitation of the current study would appear to be its use of unpaid student volunteers performing a relatively unfamiliar task after only a minimal amount of training. This is a legitimate concern, and the study warrants replication with more highly trained experts. However, this concern is somewhat mitigated by findings that such overconfidence is well documented among experts in many domains (Kahneman, 2011; Tetlock, 2005).
A second limitation of the study is that we did not directly elicit explicit judgments of confidence from our participants. Instead, our measure was an implicit one based on their probabilistic estimates of spatial termination locations. However, this procedure may actually have the advantage of not requiring participants to understand the quantitative meaning of confidence explicitly.
A third limitation is the potential confounding of response mode between the mean and variability estimation of spatial material. The mean response, a spatial one, may be more compatibly mapped to the cognitive demands of spatial estimation, whereas the variability response, a verbal numerical one, may be less compatibly mapped (Loomis, da Silva, Fujita, & Fukusima, 1992; Wickens, Sandry, & Vidulich, 1983). This difference, which favors mean estimation, might also contribute to the performance differences observed here, and future research should explore spatial means of indicating perceived variability.
A fourth limitation, in interpreting the dissociation of effects between mean prediction and variance prediction, is that we explicitly directed participants' attention to the former but not the latter. Variance learning might have improved had participants' attention been directly focused on this dispersion via instructions. This issue is worthy of further investigation.
Key Points
We examined overconfidence biases in people’s prediction of the terminus of uncertain spatial trajectories, like the track of a hurricane or the future movement of a ship in a storm, given that such biases have been observed in other forms of prediction.
In our first experiment, people were reasonably good at predicting the mean termination of a linear trajectory, given two early samples, but they were poorer at predicting distance compared with heading and were overconfident in assessing their prediction accuracy.
In a second experiment, prediction was less accurate for nonlinear (accelerating and curving) trajectories than for the linear trajectories of Experiment 1 and less accurate in assessing acceleration than curvature. Participants were again substantially overconfident in the accuracy of their prediction.
In both experiments, people who were good at estimating the mean termination were not necessarily good at estimating the uncertainty (they were overconfident), and those proficient at estimating uncertainty were not necessarily good at estimating the mean, suggesting that qualitatively different skills may underlie the two forms of judgment.
Acknowledgements
This research was supported by Code 34 of the Office of Naval Research under grant N00014-15-1-2559, program manager Jeff Morrison (PI: Cap Smith).
Nathan Herdener obtained his master’s degree in psychology and statistics from Oregon State University. He is currently a doctoral student under the supervision of Ben Clegg at Colorado State University with a focus on cognitive psychology and human factors.
Christopher D. Wickens is a professor emeritus of aviation and psychology at the University of Illinois and is currently a senior scientist at Alion Science and Technology, Boulder, Colorado, and a professor of psychology at Colorado State University.
Benjamin A. Clegg is a professor of cognitive psychology at Colorado State University. He received his PhD in psychology in 1998 from the University of Oregon.
C. A. P. Smith holds an engineering degree from Massachusetts Institute of Technology and a PhD in information systems from the University of Arizona. He is currently an associate professor of information systems at Colorado State University (CSU). Prior to working for CSU, he was a senior scientist at the U.S. Navy's Space and Naval Warfare Systems Center in San Diego, also known as SPAWAR. While at SPAWAR, he conducted a program of research into decision making and managed development of state-of-the-art decision support systems. He has published a number of scholarly articles in journals such as Human Factors, International Journal of Human–Computer Systems, and International Journal of Human–Computer Interaction.
