Abstract
Humans increasingly use automated decision aids. However, environmental uncertainty means that automated advice can be incorrect, creating the potential for humans to act on incorrect advice or to disregard correct advice. We present a quantitative model of the cognitive process by which humans use automation when deciding whether aircraft would violate requirements for minimum separation. The model closely fitted the performance of 24 participants, who each made 2,400 conflict-detection decisions (conflict vs. nonconflict), either manually (with no assistance) or with the assistance of 90% reliable automation. When the decision aid was correct, conflict-detection accuracy improved, but when the decision aid was incorrect, accuracy and response time were impaired. The model indicated that participants integrated advice into their decision process by inhibiting evidence accumulation toward the task response that was incongruent with that advice, thereby ensuring that decisions could not be made solely on automated advice without first sampling information from the task environment.
Human interaction with automation has been a major area of inquiry for more than 40 years (Bainbridge, 1983; Wiener & Curry, 1980). Decision aids that recommend actions to human operators are increasingly prevalent: In health care, decision aids support diagnoses and provide treatment advice; in air traffic control (ATC), decision aids advise controllers how to maintain aircraft separation; in defense, decision aids recommend how to coordinate unmanned vehicles; and in airports, decision aids support luggage inspection.
Decision aids benefit performance and reduce workload (Onnasch et al., 2014). However, the uncertainty inherent in complex work systems means that automated advice can be incorrect. This creates the potential for operators to erroneously act on incorrect advice (misuse; Lee & See, 2004) or disregard correct advice (disuse), although this is less likely with reliable automation (Wickens & Dixon, 2007). Automation misuse by experts can occur in work domains in which incorrect decisions risk serious consequences, including aviation, health care, and process control.
Research has identified several design (e.g., reliability, transparency) and environmental (e.g., task demands, uncertainty) factors that impact automation use (Endsley, 2017; Parasuraman & Manzey, 2010). However, extant research has relied largely on verbal psychological theories and tests of partial performance measures, such as accuracy and mean response time (RT), but has made little progress toward quantitative theories of cognitive mechanisms. Quantitative human-performance modeling is advantageous because it can provide theoretical insights, unify interpretation of seemingly disparate data, refine predictions, and predict performance when human in-the-loop testing is not feasible (Byrne & Pew, 2009; Farrell & Lewandowsky, 2010).
In the current research, we applied a quantitative model of decision making (Boag et al., 2019) to automation use in a simulated ATC conflict-detection task. Conflict detection requires humans to decide whether aircraft will violate minimum separation standards in the future on the basis of their altitude, speed, and relative distance from intersection (Loft et al., 2009). Decision aids are increasingly used to maximize airspace capacity. Conflict detection is particularly suitable to examine the cognitive dynamics of human–automation interaction because decision uncertainty can trade off with temporal pressure (Loft, Sanderson, et al., 2007), and it is representative of other work contexts in which operators make judgments about moving objects on displays (e.g., unmanned vehicle control, maritime surveillance).
A Model of Human Adaptation to Automation
Evidence-accumulation models are the most successful class of model for understanding speeded binary choice decisions (Ratcliff & Smith, 2004). We model conflict detection with the linear ballistic accumulator (LBA; Brown & Heathcote, 2008), an evidence-accumulation model in which evidence for each possible decision accrues linearly and independently, and the first accumulator to reach threshold determines the decision made (Fig. 1).
Statement of Relevance
In modern workplaces and industries such as aviation and health care, automated decision aids that recommend actions to humans are increasingly prevalent. However, automated advice can be wrong, creating the potential for humans to act on incorrect advice or to disregard correct advice, with potentially catastrophic consequences. Although prior research has identified several factors that impact how people use automation, there are currently no quantitative theories of the underlying cognitive mechanisms. We present an evidence-accumulation model of the cognitive process by which humans use automated advice to decide whether aircraft would violate requirements for minimum separation. The model provided a good fit to the observed effects (i.e., increased participant accuracy with correct advice, decreased accuracy and increased response time with incorrect advice) and indicated that advice from the decision aid was used to inhibit the task response that was incongruent with that advice. The model provides a tractable, quantitative theoretical framework for understanding how humans use automated advice.

Fig. 1. Linear ballistic accumulator (LBA) model of conflict detection. Evidence for each accumulator begins at a start point, drawn from a uniform distribution, and increases at an accumulation rate, which is drawn from a normal distribution. The first accumulator to reach threshold determines the observed response.
The LBA provides estimates of the cognitive processes underlying observed performance. Accumulation rates index how fast evidence accrues toward each response. They are classified as either matching (i.e., accumulation toward the correct response, e.g., responding “conflict” when aircraft are in conflict) or mismatching (i.e., accumulation toward the incorrect response). Thresholds index the amount of evidence required for each decision and reflect cognitive control. For example, raising thresholds increases accuracy but at the cost of slower responding.
In the model, decision aids provide inputs to the decision process (automation inputs), integrated with task information (stimulus inputs) in a feedforward manner (Fig. 2; see Boag et al., 2019; Strickland et al., 2018). Accumulation rates are simultaneously increased by excitation from stimulus inputs that match the response and decreased by inhibition from mismatching inputs, an assumption consistent with the finding that biased information affects decision making via a constant effect on rates (Hanks et al., 2011). For example, processing stimulus inputs that match a conflict response (e.g., close predicted relative arrival times of aircraft; Loft, Neal, & Humphreys, 2007) excites the conflict accumulator and inhibits the nonconflict accumulator. Similarly, if a decision aid recommends a conflict decision, it excites the conflict accumulator and inhibits the nonconflict accumulator. The final accumulation rate is determined by summing the inhibition and excitation provided from both automation and stimulus inputs.

Fig. 2. Impacts of automation inputs and stimulus inputs on evidence accumulation. Both stimulus inputs and automation inputs can potentially excite corresponding accumulators (solid lines), increasing accumulation rates, but also inhibit the opposing accumulator (dashed lines), decreasing accumulation rates.
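The additive combination of stimulus and automation inputs described above can be illustrated with a minimal Python sketch. The class name `InputWeights`, its field names, and all numeric values are illustrative assumptions rather than quantities from this study; the fitted model estimates mean accumulation rates for each design cell and infers excitation and inhibition by comparing them, rather than estimating these weights directly.

```python
from dataclasses import dataclass

RESPONSES = ("conflict", "nonconflict")
OTHER = {"conflict": "nonconflict", "nonconflict": "conflict"}


@dataclass
class InputWeights:
    """Hypothetical input strengths (all values are assumptions)."""
    stim_excite: float   # stimulus excitation of the accumulator it supports
    stim_inhibit: float  # stimulus inhibition of the opposing accumulator
    auto_excite: float   # advice excitation of the recommended accumulator
    auto_inhibit: float  # advice inhibition of the non-recommended accumulator


def mean_rates(stimulus, advice, w, baseline=0.0):
    """Sum excitation and inhibition into one mean rate per accumulator.

    `advice` is "conflict", "nonconflict", or None (manual condition).
    """
    rates = {r: baseline for r in RESPONSES}
    rates[stimulus] += w.stim_excite
    rates[OTHER[stimulus]] -= w.stim_inhibit
    if advice is not None:
        rates[advice] += w.auto_excite
        rates[OTHER[advice]] -= w.auto_inhibit
    return rates


weights = InputWeights(stim_excite=2.0, stim_inhibit=0.5,
                       auto_excite=0.1, auto_inhibit=0.8)
manual = mean_rates("conflict", None, weights)
aided_correct = mean_rates("conflict", "conflict", weights)
aided_wrong = mean_rates("conflict", "nonconflict", weights)
```

In this sketch, correct advice mostly lowers the rate of the incorrect (nonconflict) accumulator, whereas incorrect advice lowers the rate of the correct accumulator, which is the pattern an inhibition-dominated account would produce.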
The model can answer fundamental questions about automation use. Decision aids are typically provided with the expectation that humans will not solely rely on them. In model terms, overreliance on automation occurs if excitation from the decision aid causes enough accumulation to trigger a response without requiring excitation from stimulus inputs. Given the risks of overreliance, humans may be reluctant to purely use automation excitation when accuracy is important. By contrast, inhibition of decisions that mismatch the decision aid cannot trigger a response without excitation from stimulus inputs but can increase the accuracy of decisions when automation inputs are correct. Thus, when automation is not perfectly reliable, inhibition may be preferable to encourage appropriate automation use while minimizing misuse. In addition, humans might adjust response thresholds. For instance, when automation is less than 100% reliable, they could increase response thresholds to minimize noise in their own decision-making.
The Current Study
We used the model to quantify the excitation, inhibition, and threshold control underlying performance in an ATC conflict-detection task with an automated decision aid. The automation had 90% accuracy, and on the basis of the work by Boag et al. (2019), we expected the task parameters to produce around 90% human manual accuracy, allowing us to study a human–automation team of approximately equal ability.
There were two within-subjects conditions: manual, in which participants detected conflicts unassisted, and automation, in which a decision aid recommended a response. Participants were informed that the automation was not perfectly reliable and were encouraged, with scoring and a financial incentive, to avoid complete reliance on automation. We expected that when the automation was correct, it would benefit accuracy. Thus, on automation-correct trials, accuracy should be higher than on manual trials. By contrast, automation should impair performance when it recommended an incorrect decision, decreasing accuracy relative to performance on manual trials. RT could also be affected by automation. If automation causes excitation, responses that automation recommends should be faster than manual responses. If automation causes inhibition, nonrecommended decisions should be slower than manual decisions.
Method
Participants
To ensure adequate measurement for cognitive modeling—where reliable inference depends on the number of trials per participant rather than the number of participants (Smith & Little, 2018)—we ran the experiment over 2 days, yielding 2,400 total trials for each participant. We tested 27 participants, and excluded the data from three (one participant received the conditions in the wrong order, one had chance-level performance during manual trials, and computer error occurred on one participant’s trials). For the remaining 24 participants (14 female, 10 male), the mean age was 22.29 years (range = 18–37). Participants were from a convenience sample of psychology students and The University of Western Australia’s community research pool. Participants received either course credit or $40 (Australian). For all participants, an additional reward between $0 and $20 was provided on the basis of performance. The study was approved by The University of Western Australia’s Human Research Ethics Committee.
Design
Participants completed two sessions, each lasting approximately 90 min. In each session, participants had one block with automated advice and one without (manual), and condition order was counterbalanced across days. Each block contained 600 trials. The conflict-detection task included two possible response-key assignments, counterbalanced across participants—either “f” for conflicts and “j” for nonconflicts or “j” for conflicts and “f” for nonconflicts.
Materials
ATC conflict-detection task
The conflict-detection task (Fothergill et al., 2009) has previously been used to test expert controllers (Loft et al., 2009), balancing representativeness with experimental control. Figure 3 depicts the display. The sector was 180 nautical miles (nmi) by 112.5 nmi. At the start of each trial, two aircraft appeared within the circular light-gray sector and flew on straight paths toward the intersection. Adjacent to each aircraft was a data block containing the aircraft call sign (e.g., YFH255) and type (e.g., B737 indicates a Boeing 737), flight level (e.g., 370 indicates 37,000 feet), and speed in knots (nmi per hour) divided by 10. A 10-nmi × 20-nmi scale was included on the left of the display, and a probe vector was attached to aircraft indicating their heading and predicted position in 1 min.

Fig. 3. An example of the air-traffic-control simulator display. At the start of each trial, two aircraft appeared within the circular light-gray sector and flew on straight paths toward the intersection. Next to each aircraft, its call sign (e.g., YFH255), aircraft type (e.g., B737, which indicates a Boeing 737), current and cleared altitude (e.g., 370 > 370; “370” indicates 37,000 feet), and airspeed in knots (e.g., 64) were displayed. Note that airspeed shows only two digits (e.g., 640 knots would be shown as “64”). A trial countdown timer was displayed on the top left, and a 10-nmi × 20-nmi scale was below the timer. The automated advice about whether to classify the aircraft’s flight paths as in conflict or not in conflict appeared under the data block of each aircraft. For example, in this screen capture, the decision aid is recommending a “conflict” decision. In manual conditions, this was replaced with a string that had no special meaning (“########”).
Conflict-detection stimuli
Participants judged whether aircraft pairs would conflict in the future (they had no control over aircraft). Aircraft pairs were in conflict if they would simultaneously travel within 5 nmi (laterally) and 1,000 feet (vertically). Altitude for all aircraft was fixed at 37,000 feet, and thus decisions were based on lateral separation. The spatial variables defining the aircraft pairs are listed in Table 1. Conflict status and nonconflict status were created using the lateral distances of minimum separation (dmin). For conflict stimuli, dmin was drawn from uniform distribution U [0, 1.5] nmi. For nonconflict stimuli, dmin was drawn from uniform distribution U [8.5, 10] nmi. The angle of approach of one of the aircraft was randomly sampled between 0° and 360°. The relative angle of approach between aircraft was fixed at 90°. Other features were varied randomly to avoid instance-based learning (Bowden & Loft, 2016). Aircraft speeds were randomly fixed between 400 and 700 knots, and time to minimum separation was randomly fixed between 120 and 210 s. Which aircraft crossed the intersection first (the faster or the slower one) varied randomly across trials.
Table 1. Ranges of the Spatial Variables Defining Aircraft Stimuli
Note: dmin = distance of minimum separation; tmin = time to minimum separation.
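The two quantities that define each stimulus, dmin and tmin, follow from straight-line kinematics. The sketch below shows one way to compute them for a pair of aircraft. The function names, the coordinate convention (positions in nmi, velocities in knots), and the strict `<` comparison at exactly 5 nmi are assumptions made for illustration; this is not the stimulus-generation code used in the study.

```python
import math


def closest_approach(p1, v1, p2, v2):
    """Return (dmin in nmi, tmin in seconds) for two straight-line tracks.

    p1, p2: (x, y) positions in nmi; v1, v2: (vx, vy) velocities in knots.
    """
    rx, ry = p1[0] - p2[0], p1[1] - p2[1]   # relative position (nmi)
    wx, wy = v1[0] - v2[0], v1[1] - v2[1]   # relative velocity (knots)
    ww = wx * wx + wy * wy
    if ww == 0.0:
        # Identical velocities: separation never changes.
        return math.hypot(rx, ry), 0.0
    # Time (hours) that minimizes separation, not allowed to lie in the past.
    t_hours = max(0.0, -(rx * wx + ry * wy) / ww)
    dmin = math.hypot(rx + wx * t_hours, ry + wy * t_hours)
    return dmin, t_hours * 3600.0


def is_conflict(dmin, lateral_standard_nmi=5.0):
    """Altitude was fixed, so only lateral separation matters here."""
    return dmin < lateral_standard_nmi


# Two aircraft on perpendicular tracks (90 degrees apart), both at 500 knots.
d_a, t_a = closest_approach((-50.0, 0.0), (500.0, 0.0), (0.0, -50.0), (0.0, 500.0))
d_b, t_b = closest_approach((-50.0, 0.0), (500.0, 0.0), (0.0, -60.0), (0.0, 500.0))
```

In the first example both aircraft reach the intersection together, so the pair is in conflict; in the second, one aircraft starts 10 nmi farther out and the pair stays laterally separated.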
Automated decision aid
During training, participants performed the task with no decision aid. In automated conditions, advice was placed under the data block of each aircraft (Fig. 3). The advice read “CONFLICT,” to recommend classifying the pair of aircraft as in conflict, or “NON-CONF” (nonconflict). In manual conditions, a string with no special meaning (“########”) was placed under the data block of each aircraft.
Participants completed 2,400 trials. In the automated condition, the decision aid failed on one randomly selected trial out of every 10, and equally often for conflicts and nonconflicts. This resulted in 60 automation failures on conflict trials (30 each session) and 60 on nonconflict trials. The nonfailure stimuli in the automated condition for each day were matched to those in the manual condition in terms of aircraft speeds and distances from the intersection. To minimize learning across matched aircraft pairs, we randomized presentation order and angle of approach for each condition (while maintaining a relative angle of 90° between aircraft) and assigned different call signs. The automation-failure stimuli in the automated condition were presented in the same trial positions as in the manual conditions. This provided 60 conflict and nonconflict “matched manual” trials that had matched stimulus properties and trial positions across automated and manual blocks.
Automation-trust questionnaire
After completing the experiment, participants rated their trust in the decision aid (see the Supplemental Material available online). Participants rated six trust questions on a 5-point scale ranging from strongly disagree to strongly agree.
Procedure
Experimental procedure
Participants provided informed consent and then viewed training instructions followed by a demonstration that showed aircraft pairs with different dmin to help participants gauge whether aircraft pairs were in conflict. Subsequently, participants completed 40 training trials. After training, participants completed their first experimental block (either manual or automation). A financial reward, ranging from $0 to $20, was associated with accuracy. The maximum reward for each block was $5. In the manual condition, rewards were calculated with the formula 5/600 × (Ncorrect – Nincorrect – Nnonresponses). In the automated condition, rewards were calculated with the formula 5/600 × (Ncorrect – 9 × Nincorrectaccept – Nincorrectreject – Nnonresponses). In automated conditions, participants were given the following instructions: “Although the automation is highly reliable, it is not perfect. In the event that the automation makes an incorrect recommendation, it is essential that you perform the correct action. Rejecting the automated recommendation when it is actually correct will reduce your performance score and subsequent bonus. Accepting the automated recommendation when it is wrong will result in a substantially greater reduction in your performance score and subsequent bonus.”
In manual conditions, participants were informed that there was no automated advice and instead just the string “########” would appear, which they should ignore. They were also told that “incorrect responses will reduce your performance score and subsequent bonus.”
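The two reward formulas can be written out as a short Python sketch to make the asymmetric penalty explicit. The function and argument names are illustrative assumptions. How negative totals were handled is not stated in the text, so the sketch returns the raw formula value without clamping.

```python
def manual_reward(n_correct, n_incorrect, n_nonresponse,
                  max_reward=5.0, n_trials=600):
    """Block reward in dollars for a manual block."""
    return max_reward / n_trials * (n_correct - n_incorrect - n_nonresponse)


def automated_reward(n_correct, n_incorrect_accept, n_incorrect_reject,
                     n_nonresponse, max_reward=5.0, n_trials=600):
    """Block reward in dollars for an automated block.

    Wrongly accepting incorrect advice is weighted nine times as heavily as
    wrongly rejecting correct advice.
    """
    score = (n_correct
             - 9 * n_incorrect_accept
             - n_incorrect_reject
             - n_nonresponse)
    return max_reward / n_trials * score


# Same overall accuracy (540 of 600 correct), different error types.
reward_manual = manual_reward(540, 60, 0)
reward_mostly_rejects = automated_reward(540, 20, 40, 0)
reward_mostly_accepts = automated_reward(540, 40, 20, 0)
```

With equal overall accuracy, the block in which more of the errors come from accepting incorrect advice earns less, which is the incentive against complete reliance on the decision aid.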
Participants took self-paced breaks between each block and also mid block. After each block, they were presented with accuracy feedback for that block. In the automation blocks, their feedback was broken down into the percentage of trials on which they incorrectly disagreed with the decision aid and the percentage of trials on which they incorrectly agreed with the decision aid. The end of the first block of each session was followed by instructions for the subsequent block. Participants returned for Session 2 within 10 days of Session 1. The procedure for Session 2 was the same as for Session 1, except that Session 2 did not include a demonstration of dmin, and after Session 2, participants completed the trust questionnaire.
Trial procedure
Trials began with an aircraft pair heading toward a common intersection. Participants had 8 s to respond. A trial completed when the participant responded or after 8 s had elapsed. If participants submitted a correct response, the next trial began. If they submitted an incorrect response or did not respond, then they received feedback. The feedback informed participants that they were incorrect and which decision would have been correct (e.g., “Incorrect! This pair was in conflict”). Participants clicked an “ok” button to proceed to the next trial.
Results
Trials were excluded from analysis if participants failed to respond (0.17% of trials) or responded very quickly (< 0.2 s; 0.03% of trials). We report analysis of accuracy and response time (RT) on trials with correct responses (“correct RTs”) using linear mixed-effects models. We examined four factors: stimulus type (conflict, nonconflict), condition (automated, manual), automation accuracy (correct, incorrect), and session (1, 2). For manual conditions, “automation-incorrect trials” refer to trials that were matched to automation-incorrect trials in the automated condition, and “automation-correct trials” refer to the other manual trials. To examine accuracies, we fitted a generalized linear model with a probit link to response accuracy on every trial (either 0 or 1). To examine RTs, we fitted a linear mixed-effects model to the mean correct RTs of each participant. Significance tests of the factors in each model are tabulated in the Supplemental Material, as are follow-up contrasts. Our aim was to identify strong effects. Thus, our significance criterion was set at p < .005 (Benjamin et al., 2018). The standard errors reported use the Morey (2008) bias-corrected method for within-subjects designs.
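For readers unfamiliar with within-subjects standard errors, the sketch below shows the commonly used procedure associated with Morey (2008): each participant's scores are centered on the grand mean (removing between-participant variability), and the resulting standard errors are scaled by sqrt(M / (M - 1)), where M is the number of within-subjects conditions. The function name and data layout are assumptions, and the analysis code used for this study may differ in its details.

```python
import math
import statistics


def within_subject_se(data):
    """Corrected within-subjects SE for each condition.

    data[i][j] is participant i's score in within-subjects condition j.
    """
    n = len(data)          # participants
    m = len(data[0])       # within-subjects conditions
    grand_mean = statistics.mean([x for row in data for x in row])
    # Remove each participant's overall level, then restore the grand mean.
    normalized = [
        [x - statistics.mean(row) + grand_mean for x in row] for row in data
    ]
    correction = math.sqrt(m / (m - 1))  # bias correction factor
    ses = []
    for j in range(m):
        column = [row[j] for row in normalized]
        ses.append(correction * statistics.stdev(column) / math.sqrt(n))
    return ses


# Three hypothetical participants, two conditions.
example_se = within_subject_se([[1.0, 3.0], [2.0, 2.0], [3.0, 4.0]])
```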
Accuracy and RT
Summaries of participant accuracy (indexed by the proportion of correct responses) and mean RT (in seconds) are displayed in Figure 4. There were main effects of stimulus type, session, condition, and automation accuracy on participant accuracy. Responses were slightly more accurate on conflict trials (M = .87, SE = .04) than nonconflict trials (M = .86, SE = .04). Accuracy was lower on Session 1 (M = .84, SE = .04) than Session 2 (M = .88, SE = .03). Condition and automation accuracy interacted. Accuracy was higher for trials in which participants were provided with correct automation advice (M = .94, SE = .02) than for matched manual trials (M = .89, SE = .01) and lower when participants were provided with incorrect automation advice (M = .73, SE = .01) than for matched manual trials. This suggests that participants used automation to their advantage, but when automation failed, it imposed a cost. However, importantly, accuracy on automation-incorrect trials was far from floor, suggesting that participants did not rely on the automation entirely.

Fig. 4. Mean accuracy (top) and response time (bottom) as a function of automation accuracy (correct, incorrect) and condition (automated, manual). Results are shown separately for each stimulus type (in columns) and session (in rows). Error bars show standard errors, calculated using the Morey (2008) bias-corrected method for within-subjects error bars.
There were main effects of stimulus type, session, condition, and automation accuracy on mean correct RTs. Correct responses were slower on conflict trials (M = 2.24, SE = 0.15) than on nonconflict trials (M = 2.06, SE = 0.14). Condition and automation accuracy interacted. Correct RTs on trials in which participants were provided with correct automation advice (M = 2.08, SE = 0.1) were similar to those on matched manual trials (M = 2.03, SE = 0.09). By contrast, they were much slower on automation-incorrect trials (M = 2.48, SE = 0.15) than on matched manual trials (M = 2.02, SE = 0.09). Condition and session interacted. In Session 1, RTs were slower in the automated condition (M = 2.61, SE = 0.13) than in the manual condition (M = 2.22, SE = 0.09). In Session 2, RTs were also slower in the automated condition (M = 1.95, SE = 0.09) than in the manual condition (M = 1.82, SE = 0.06), although the difference was smaller than in Session 1 and failed to reach the p < .005 threshold.
We conducted exploratory analyses of the correlations between the costs and benefits of automation. We examined whether the accuracy advantage provided by correct automation advice (correct automation-trial accuracy – matched manual-trial accuracy) was associated with the cost of automation on failure trials to accuracy (matched failure-trial accuracy – automation failure-trial accuracy) or RT (automation failure-trial correct RT – matched manual-trial correct RT). The increased accuracy provided by correct automation was positively correlated with the accuracy cost of automation on automation-incorrect trials, r(22) = .79, p < .001. The accuracy increase on automation-correct trials was positively correlated with the correct-RT increase on automation-failure trials, r(22) = .42, p = .04, although this did not reach significance at p < .005. We report correlations between the three above measures and automation trust in the Supplemental Material, although no significant associations were found.
In summary, when the decision aid was correct, participant accuracy was higher than on manual trials. By contrast, when it was incorrect, accuracy was lower than on manual trials. When the decision aid was correct, RTs were not impacted, but when it was incorrect, RTs were slower than in manual trials. These effects generally support an inhibition account of decision-aid use, in which the response incongruent with the decision-aid advice is inhibited. In the next section, we present LBA modeling to formally measure latent processes such as inhibition.
Model results
Model specification
We applied a two-accumulator LBA model (Fig. 1), and each conflict and nonconflict decision was assigned an accumulator. Evidence for each accumulator began at a start point independently drawn from U [0, A]. It then accumulated linearly at a rate drawn from a normal distribution N (v, sv) truncated at 0, until one accumulator reached its threshold b, determining the response. We estimated thresholds in terms of the positive quantity B = b – A. Total RT was determined by decision time plus nondecision time (i.e., the time to encode the stimulus and produce a motor response). To facilitate estimation, we allowed only one A parameter and one nondecision-time parameter for each participant. The variability in mismatching accumulation rates was fixed at 1 as a scaling parameter. One sv parameter indexing variability in matching accumulation rates was estimated for each participant. Mean accumulation rates could vary by stimulus type, condition, and automation accuracy. We estimated separate thresholds for each accumulator, experimental condition, and session. Thresholds did not vary across stimulus type to avoid circularity (if the stimulus type were known, there would be no point in the decision process). However, a reviewer suggested that participants might initially process the aid’s advice without processing stimuli inputs, leading to an initial one-step change in evidence, which would be mathematically equivalent to a threshold change. In the Supplemental Material, we explore a model that allows thresholds to adapt in response to the automation’s recommendation. However, we did not find support for this mechanism, and thus we report the simpler model below.
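The generative process specified above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the estimation code used in the study: the function name, the dictionary layout, and every parameter value are hypothetical, and truncation of rates at 0 is implemented here by simple rejection sampling.

```python
import random


def simulate_lba_trial(v, sv, B, A, t0, rng):
    """Simulate one trial of a two-accumulator LBA.

    v, sv, B: dicts keyed by response giving mean rate, rate SD, and
    threshold expressed as B = b - A. A is the start-point range and t0
    the nondecision time. Returns (response, RT in seconds).
    """
    winner, best_time = None, float("inf")
    for response in v:
        start = rng.uniform(0.0, A)          # start point ~ U[0, A]
        rate = rng.gauss(v[response], sv[response])
        while rate <= 0.0:                   # truncate rates at 0 by resampling
            rate = rng.gauss(v[response], sv[response])
        b = B[response] + A                  # absolute threshold
        finish = (b - start) / rate          # linear, noise-free accumulation
        if finish < best_time:
            winner, best_time = response, finish
    return winner, best_time + t0


# Hypothetical parameters for a conflict stimulus: "conflict" is the matching
# accumulator; the mismatching rate SD is fixed at 1 as a scaling constraint.
rng = random.Random(2024)
params = dict(
    v={"conflict": 2.0, "nonconflict": 0.5},
    sv={"conflict": 0.5, "nonconflict": 1.0},
    B={"conflict": 1.0, "nonconflict": 1.0},
    A=0.5,
    t0=0.3,
)
response, rt = simulate_lba_trial(rng=rng, **params)
```

Lowering the mean rate of the mismatching accumulator in this sketch (as inhibition from correct advice would) makes it win the race less often, raising accuracy without giving either accumulator evidence that the stimulus did not provide.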
Model fit
We performed Bayesian parameter estimation using the Dynamic Models of Choice R suite (Heathcote et al., 2019; see the Supplemental Material). This software provides posterior samples estimating probability distributions of the model parameters. Figure 5 displays fits of the model to the data. The model closely fitted the data, including the effects of automation on accuracy and RT.

Fig. 5. Posterior predictions of group performance as a function of condition (automated, manual) and automation accuracy (correct, incorrect). Results are shown separately for the proportion of correct responses (accuracy), response time (RT) on trials with correct responses (correct RT), and RT on trials with incorrect responses (error RT). The white circles show model predictions; the black circles show posterior means. Error bars display the 95% posterior credible intervals of the predictions. In the RT graphs, three quantiles are depicted: the 0.1 quantile of RT grouped on the bottom, the median RT at the middle, and the 0.9 quantile of RT at the top.
Parameter inference
For inference, we created a group posterior distribution by averaging the values of each posterior sample across participants. The values of the average model parameters are tabulated in the Supplemental Material. In the following sections, we examine the effect of automation on accumulation-rate and threshold parameters. To test parameter differences, we calculated a one-tailed posterior p, corresponding to the proportion of posterior samples on which one parameter value was higher than another. We report the p value against whichever direction was closest to an observed effect (i.e., a p value near 0 is evidence in favor of an effect). Many effects were significant in the sense that p was less than .005. To estimate effect size, we calculated z, the mean of the parameter differences divided by the standard deviation.
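The inference steps described above (averaging posterior samples across participants, then computing z and a one-tailed posterior p for a parameter difference) can be sketched as follows. The function names and toy values are assumptions. The text does not specify whether the sample or population standard deviation was used, so the sketch assumes the sample standard deviation, and it takes the smaller tail as the p value against the observed direction.

```python
import statistics


def group_posterior(samples_by_participant):
    """Average each posterior sample (by index) across participants."""
    return [statistics.mean(draws) for draws in zip(*samples_by_participant)]


def posterior_contrast(samples_a, samples_b):
    """Return (z, p) for the difference between two parameters.

    z is the mean difference divided by its standard deviation; p is the
    proportion of samples contradicting the observed direction of the effect.
    """
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    z = statistics.mean(diffs) / statistics.stdev(diffs)
    p_greater = sum(d > 0 for d in diffs) / len(diffs)
    return z, min(p_greater, 1.0 - p_greater)


# Toy group-level samples for two hypothetical rate parameters.
rate_manual = [1.0, 1.2, 0.9, 1.1]
rate_automated = [0.5, 0.6, 0.7, 0.4]
z, p = posterior_contrast(rate_manual, rate_automated)
```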
Excitation and inhibition
Accumulation rates are plotted in Figure 6. We compared accumulation rates on automation trials with accumulation rates on matched manual trials. Excitation is indicated by increased accumulation toward the accumulator that agrees with the decision aid (i.e., match). For example, on a conflict trial on which the automation recommends “conflict,” excitation would increase the conflict-accumulation rate. Inhibition is indicated by reduced accumulation toward the accumulator that disagrees with the decision aid (i.e., mismatch). For example, for conflict trials on which the decision aid recommends “conflict,” inhibition would reduce accumulation in the nonconflict accumulator.

Fig. 6. Estimates of accumulation rates as a function of automation accuracy (correct, incorrect) and condition (automated, manual), separately for matched and mismatched manual trials. Results for conflict stimuli are shown in the top row, and results for nonconflict stimuli are shown in the bottom row. The shapes indicate the posterior means, and the error bars indicate posterior standard deviations.
Table 2 presents statistical tests of excitation and inhibition. We found evidence of both. However, inhibition was much larger in magnitude. Further, several additional analyses reinforced the finding that inhibition was more relevant to automation use (see the Supplemental Material). First, simulations from the fitted model indicated that inhibition was responsible for the majority of automation’s benefits to accuracy on automation-correct trials and for cost to accuracy and RT on failure trials. Second, exploration of individual differences indicated that inhibition was observed more consistently across participants than excitation and was more responsible for automation benefits to accuracy. Third, inhibition was correlated across participants with accuracy improvements brought about by automation, whereas excitation was correlated with the accuracy costs of incorrect automation.
Table 2. Tests of Automation-Induced Excitation and Inhibition Effects
Note: The table depicts z, which is the posterior mean of the parameter difference divided by its standard deviation, and p, which is the one-tailed posterior probability against there being an effect.
A final supplemental analysis included an additional model that allowed excitation and inhibition to vary over the session factor, to explore learning effects. This analysis indicated qualitatively similar patterns over experimental sessions, showing a smaller (but still substantial) inhibition effect in Session 2, corresponding to the smaller effects in our conventional results.
Threshold effects
Threshold estimates are plotted in Figure 7. Statistical tests of differences in thresholds across automated and manual conditions are shown in Table 3. Overall, automation had little effect on thresholds. In Session 2, both conflict and nonconflict thresholds were slightly higher in manual than automated conditions. Simulations suggest that these effects did not substantially affect performance (see the Supplemental Material). Thus, automation primarily affected participants’ evidence accumulation, showing little evidence for shifts in speed/accuracy trade-off or bias.

Fig. 7. Estimates of thresholds as a function of accumulator type (conflict, nonconflict) and condition (automated, manual), separately for each session. The shapes indicate the posterior means, and the error bars indicate posterior standard deviations.
Table 3. Tests of Differences in Thresholds Across Automated and Manual Conditions
Note: The table depicts z, which is the posterior mean of the parameter difference divided by its standard deviation, and p, which is the one-tailed posterior probability against there being an effect.
Discussion
We used a quantitative model to illuminate the cognitive processes by which humans use automation in an ATC conflict-detection task. The automated decision aid increased accuracy when correct but decreased accuracy when incorrect. Correct RTs were longer when the decision aid was incorrect, demonstrating an RT cost from failed automation. Our evidence-accumulation model provided a good fit to the effects of automation use, indicating that advice from the decision aid was primarily integrated into the decision by inhibiting evidence toward the response incongruent with that advice.
Participants may have used inhibition to integrate the decision-aid advice because this would increase accuracy without directly increasing the evidence (excitation) in either response accumulator, thereby avoiding the risk that decisions could be made solely on the basis of the decision aid (i.e., complete reliance on automation inputs). Although we also found small excitation effects, these were unlikely to have been strong enough to trigger a response without sampling task information (stimulus inputs). Supplementary analyses, including simulations and exploration of individual differences, provided further support for the idea that inhibition can improve accuracy while minimizing misuse, whereas excitation can lead to risk of misuse.
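The logic of this argument can be illustrated with a deliberately simplified, noise-free race between two accumulators. This is a toy sketch rather than the fitted model: the function `race`, its parameters, and all numeric values are assumptions chosen for illustration. Advice either adds to the rate of the congruent accumulator (excitation) or subtracts from the rate of the incongruent accumulator (inhibition).

```python
import math


def race(stimulus_rates, advice=None, inhibition=0.0, excitation=0.0,
         threshold=1.0):
    """Noise-free two-accumulator race; returns (response, decision_time).

    stimulus_rates: dict mapping each response to its stimulus-driven rate.
    advice: response recommended by the aid, or None on manual trials.
    """
    rates = dict(stimulus_rates)
    if advice is not None:
        for response in rates:
            if response == advice:
                rates[response] += excitation  # boost congruent accumulator
            else:
                rates[response] -= inhibition  # suppress incongruent one
    # An accumulator with a non-positive rate never reaches threshold.
    times = {r: (threshold / v if v > 0 else math.inf)
             for r, v in rates.items()}
    winner = min(times, key=times.get)
    if math.isinf(times[winner]):
        return None, math.inf
    return winner, times[winner]


# With no stimulus input, excitation alone triggers the advised response,
# whereas inhibition alone never produces a response.
no_stimulus = {"conflict": 0.0, "nonconflict": 0.0}
by_excitation = race(no_stimulus, advice="conflict", excitation=0.5)
by_inhibition = race(no_stimulus, advice="conflict", inhibition=0.5)

# With stimulus input favoring "conflict", incorrect advice that inhibits
# the conflict accumulator leaves the response correct but slower.
stimulus = {"conflict": 1.0, "nonconflict": 0.5}
manual_trial = race(stimulus)
misadvised_trial = race(stimulus, advice="nonconflict", inhibition=0.25)
```

Under these assumptions, excitation can produce a response from advice alone, whereas inhibition can only reshape a decision that stimulus evidence is already driving. That is the sense in which inhibition limits complete reliance on the aid.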
The asymmetry observed between automation-induced inhibition and excitation may reflect a broader property of human information processing. In conflict tasks such as the Stroop task, participants are slower to respond to stimuli with incongruent dimensions (e.g., identifying that the word “red” is printed in green) than without (e.g., identifying that the word “stage” is printed in green), referred to as interference (MacLeod, 1991; also see the picture–word interference task; Starreveld & La Heij, 2017). Participants can also be faster to identify stimuli with two congruent dimensions (e.g., identifying that the word “green” is printed in green) than without, referred to as facilitation; however, this effect is much smaller (MacLeod, 1991). This asymmetry mirrors our results, in which decision aids caused strong inhibition but only weak excitation. However, in our task, participants were aware that the decision aid was informative and were encouraged to integrate it with stimulus inputs, whereas in the Stroop task, they are requested to base decisions solely on one information source. Stroop conflict is attributed to interference from an automatically retrieved competing response, whereas in our model, inhibition arises because of mismatching inputs. Nonetheless, both our task and conflict tasks require participants to execute a response potentially cued by two conflicting sources of information, and thus it is reasonable to expect similarities in the underlying processes.
The success of our model in accounting for the effects of automation is a critical first step to moving beyond identifying disparate factors affecting automation use and toward the identification and quantification of the cognitive mechanisms underlying automation use. Our tractable quantitative modeling framework has the potential to be used to generalize and unify findings across the automation literature.
Practical implications
To the extent that humans integrate decision aids into their decisions via inhibition rather than excitation, incorrect decision aids are likely to slow RTs. Thus, in situations with high time pressure, providing imperfect decision aids may produce undesirable RT costs. However, in situations without time pressure, inhibition may boost accuracy, making decision aids desirable. Had we found that decision aids caused excitation rather than inhibition, this would have suggested that decision aids can speed up performance and alleviate time pressure, but this was not the case. Given the apparent importance of inhibition for interacting with automation, inhibitory abilities may be a desirable quality to either train or select for in work contexts that require humans to interact with less-than-perfect decision aids. However, more work is needed to identify the boundary conditions under which inhibition is the mechanism underlying automation use. For example, our task did not require visual search, whereas many dynamic display tasks do. In situations in which automation can reliably direct visual attention, it seems likely that it would speed rather than slow RTs.
In many human-factors studies, including in field settings, there may not be enough observations to directly apply our model. Fortunately, our findings highlighted a characteristic pattern in the manifest data that was associated with inhibition. Accurate decision aids benefited accuracy, inaccurate decision aids reduced accuracy, and inaccurate decision aids slowed mean RT. Identifying this pattern provides a means to identify inhibition mechanisms in situations that are more difficult to cognitively model, such as high-fidelity task simulations and field studies.
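As a rough illustration of how this three-part pattern might be screened for in summary data, the sketch below compares condition means. The function name, the condition labels, and the numbers are hypothetical, and the choice of the manual condition as the baseline for the RT comparison is an assumption. The check is purely descriptive and is no substitute for statistical testing.

```python
def shows_inhibition_signature(accuracy, mean_rt):
    """Check the three-part descriptive pattern against condition means.

    accuracy, mean_rt: dicts keyed by "manual", "aid_correct", and
    "aid_incorrect" (hypothetical condition labels).
    """
    aid_helps_accuracy = accuracy["aid_correct"] > accuracy["manual"]
    aid_hurts_accuracy = accuracy["aid_incorrect"] < accuracy["manual"]
    failed_aid_slows_rt = mean_rt["aid_incorrect"] > mean_rt["manual"]
    return aid_helps_accuracy and aid_hurts_accuracy and failed_aid_slows_rt


# Made-up condition means, for illustration only (RT in seconds).
example_accuracy = {"manual": 0.80, "aid_correct": 0.88, "aid_incorrect": 0.65}
example_mean_rt = {"manual": 3.1, "aid_correct": 3.0, "aid_incorrect": 3.6}
pattern_present = shows_inhibition_signature(example_accuracy, example_mean_rt)
```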
A longer-term practical implication is the advancement of a human performance model of human–automation interaction, which could be used to predict performance and inform work design. With its dynamic decision front end that predicts choice and RT, our model could provide a crucial link from performance data to cognitive architectures such as ACT-R (Anderson & Lebiere, 2014), which in turn could feed into broader task network architectures such as the Improved Performance Research Integration Tool (IMPRINT; Samms, 2010), to account for system-level work performance (Lebiere et al., 2005).
Future directions
One key direction is to examine how the cognitive mechanisms underlying automation use change under different conditions. For example, the current automation was approximately equal to human ability. When using automation known to be more accurate than they are, humans may implement excitation-based strategies, possibly even to the extent that they routinely base their decisions solely on automation input. Similarly, the cognitive mechanisms underlying automation use may vary depending on the relative cost of errors when automation fails and succeeds. We instructed participants in a way that made the possibility of automation failure salient (incorrectly agreeing with automation was more costly than incorrectly disagreeing), to encourage them not to rely entirely on automation. If instead the cost of accepting faulty automated advice is low, the mechanisms underlying automation use may differ. Another direction is to extend our model to account for the effects of time on task and practice. Our supplemental analysis suggested that in participants’ second experimental session, inhibition and subsequent behavioral effects, although remaining substantial, were reduced. This suggests that it would be fruitful to pursue future studies to examine models of automation incorporating learning and adaptive processes.
Supplemental Material
Supplemental material, sj-pdf-1-pss-10.1177_09567976211012676 for Inhibitory Cognitive Control Allows Automated Advice to Improve Accuracy While Minimizing Misuse by Luke Strickland, Andrew Heathcote, Vanessa K. Bowden, Russell J. Boag, Micah K. Wilson, Samha Khan and Shayne Loft in Psychological Science
Footnotes
Transparency
Action Editor: Sachiko Kinoshita
Editor: Patricia J. Bauer
Author Contributions
L. Strickland, S. Loft, V. K. Bowden, R. J. Boag, M. K. Wilson, and A. Heathcote designed the study. L. Strickland analyzed the data. S. Khan conducted testing and data collection. L. Strickland wrote the manuscript, and S. Loft, V. K. Bowden, R. J. Boag, M. K. Wilson, and A. Heathcote provided critical feedback and revisions. All the authors approved the final manuscript for submission.