Abstract
Objective:
The aim of this study was to validate the strategic task overload management (STOM) model that predicts task switching when concurrence is impossible.
Background:
The STOM model predicts that in overload, tasks will be switched to, to the extent that they are attractive on task attributes of high priority, interest, and salience and low difficulty. But more-difficult tasks are less likely to be switched away from once they are being performed.
Method:
In Experiment 1, participants performed the four tasks of the Multi-Attribute Task Battery, providing task-switching data that informed the roles of difficulty and priority. In Experiment 2, participants concurrently performed an environmental control task and a robotic arm simulation. Workload was varied by the automation of arm movement, the phase of environmental control (normal vs. failure), and the presence or absence of decision support for fault management. Attention to the two tasks was measured using a head tracker.
Results:
Experiment 1 revealed that task priority had no influence on switching and confirmed the differing roles of task difficulty in switching to a task versus switching away from it. In Experiment 2, the STOM model, based on participants' ratings of the four attributes, predicted the percentage of attention allocated across the eight conditions; compared against the empirical data, model predictions accounted for over 95% of the variance in task allocation. More-difficult tasks were performed longer than easier tasks, and task priority did not influence allocation.
Conclusions:
The multiattribute decision model provided a good fit to the data.
Applications:
The STOM model is useful for predicting cognitive tunneling given that human-in-the-loop simulation is time-consuming and expensive.
Introduction
In this article, we present and validate a model of the time sharing of two tasks relevant to an astronaut’s mission. The tasks are configured such that they cannot easily be done concurrently. Time sharing or multitasking can often be considered to take place in two different “modes” (Salvucci, 2013; Wickens & McCarley, 2008), defined by concurrent and sequential processing. In concurrent processing, two tasks are performed simultaneously, although perhaps at a degraded level, such as driving while conversing or mind wandering. Concurrent performance models of multitasking have focused on the use of multiple resources (Navon & Gopher, 1978; Wickens, 2008). Recently, a model of threaded cognition (Salvucci, 2013; Salvucci & Taatgen, 2011) has been proposed, which addresses both concurrent and sequential processing. These models, and the paradigms that generate data for their validation, are primarily focused on dual tasking (i.e., time sharing of two tasks).
Sequential task performance, in contrast, is often observed when there are more than two tasks confronting the operator, particularly in high-workload, high-tempo environments where it may be impossible to perform them concurrently. One task must be chosen at a time and the other tasks “shed” or deferred until later. A primary focus of modeling in this case is the question “What drives the switch?”
Sequential task switching or task management can be characterized by different paradigms, measures of performance, dominant theories, and models that are often derivatives of the theory. We distinguish five of these paradigms/theory-defined “clusters” as follows.
First, perhaps the most dominant paradigm in this area has been interruption management (see Wickens, Santamaria, & Sebok, 2013, for a summary of 35 studies). Here the operator performs two tasks sequentially. These tasks are typically referred to as an ongoing task (OT) and an interrupting task (IT). When the IT arrives, research interest has focused on the time to switch to it, the time to resume the OT after the interruption is finished, and the overall quality of performance of both tasks. This paradigm is a close cousin of the “dual-stimulation” task, in which, while processing one stimulus in a response time (RT) task, a second one arrives, and a key measure is the delay to process the second. In both paradigms, the interest is focused very much on processing time (Pashler, 1998). The foremost theory of interruption management has been memory for goals (Altmann & Trafton, 2002; Trafton & Monk, 2007), and this theory and its derivative model have accounted for much of the data.
A second paradigm, less populated by research but closer to the target of the current experiments, is voluntary task switching (VTS), in which the operator is confronted with (typically) two tasks but is allowed to alternate between them at whatever rate and, sometimes, with whatever bias toward one or the other is preferred (e.g., Arrington & Logan, 2004; Katidioti & Taatgen, 2014; Payne, Duggan, & Neth, 2007; Salvucci & Bogunovich, 2010). A dominant theory underlying the models in this paradigm has been threaded cognition (Salvucci & Taatgen, 2011), in which the two tasks make demands on multiple resources, and task selection is heavily determined by the unavailability of nonsharable resources to one task or the other. In this research, the focus is less on the speed with which a switch is made and more on the relative time spent on each task, or the global pattern of “interleaving” between them. Here again, the threaded cognition model has been successfully validated (e.g., Janssen & Brumby, 2010).
A third paradigm is that of supervisory sampling and control (Freed, 2000; Klatzky, 2000; Sheridan, 1970, 2007). In this research area, operators interact with displays, and scanning behavior or interaction often serves as a proxy for an actual task; operators are assumed to move attention from task to task, much as the eyeball scans from display to display (Tulga & Sheridan, 1980). The major theories and models here are based on optimal queuing theory (Moray, 1986; Moray, Dessouky, Kijowski, & Adapathya, 1991), and the central issue is the extent to which people’s selection strategy, across a host of “tasks” that may vary in importance or other attributes, departs from an optimal schedule. In contrast to the first two paradigms, this approach has incorporated true multitasking (more than two tasks).
A fourth paradigm, sequential multitasking, directly examines how visual scanning reflects the properties of the tasks served by the visual information. For example, it addresses how pilots scan their aircraft instrument panel when each instrument may serve a different task, such as the control of heading or altitude or the display of navigational hazards (Wickens, Goh, Helleberg, Horrey, & Talleur, 2003). A dominant model here, based heavily on the queuing work of Sheridan (1970) and Moray (1986) and on optimal expected-value theory from decision making, is the SEEV (salience, effort, expectancy, value) scanning model (Wickens, 2015).
A final theory and model, examined in the current research, which borrows attributes from all of the above, is the strategic task overload management (STOM) model (Wickens, Gutzwiller, & Santamaria, 2015). In contrast to interruption management, STOM allows focus on more than two tasks, and no particular task is assigned a primary “ongoing” or “interrupting” role. Like the VTS literature, STOM focuses less on the time required to switch or the quality of performance than on the decision to switch and the time spent performing one task or the other. As in typical supervisory/sampling theories and models, tasks are characterized by “attractiveness” attributes that drive the focus of attention to them. Indeed, three of the same attributes that drive visual scanning in the SEEV model (salience, effort, and value; expectancy is the one SEEV attribute not included in STOM) drive the decision to select one task or another.
STOM is a multiattribute decision model that asserts that under high multitask workload, the tasks that may be switched to, avoided, or subject to unwarranted “cognitive tunneling” (a reluctance to switch away; Dehais, Causse, & Tremblay, 2011; Wickens & Alexander, 2009) can be predicted on the basis of each task’s ranking on each of four critical task attributes. In combination, these four attributes can determine the net attractiveness (to be switched to or continued) or its inverse, repulsion (to be avoided or abandoned rapidly after only a short period of performance). These four attributes are as follows.
Priority: Priority refers to the relative importance of a task. It is analogous to the value property in the SEEV model and can often be established through mission analysis. For example, a safety-critical task, such as maintaining stability in an aircraft (keeping it from stalling), should be of higher priority than one of communicating with air traffic control (Helleberg & Wickens, 2003; Schutte & Trujillo, 1996).
Difficulty: This attribute (like effort in SEEV) is associated with the mental workload imposed by a task. On the one hand, some empirical data (e.g., Arrington & Logan, 2004; Kool, McGuire, Rosen, & Botvinick, 2010) and intuition support the conclusion that easier tasks tend to be more attractive than more difficult ones: “I’ll get this simple little task out of the way first, before I tackle the hard job.” Such a view is compatible with an inherent “effort-conserving” approach that people may apply in busy circumstances (Kahneman, 2011; Wickens, 2014). On the other hand, we can also identify a counteracting tendency for a more difficult task to be “more attractive” once it has been switched to and is now an ongoing, rather than an alternative, task.
Although the first tendency, to avoid switching to a more difficult task, was statistically supported in the meta-analysis of Wickens, Gutzwiller, et al. (2015), most of the VTS experiments that supported effort aversion involved a choice between two different versions of a single game (Kool et al., 2010; Payne et al., 2007). Only one of the supporting studies appeared to involve truly high workload (Kushleyeva, Salvucci, & Lee, 2005), and it showed only a very slight, 3% preference to choose the easier task.
Regarding the second tendency, the “task stickiness” of the OT, the meta-analytic data were entirely equivocal. In both the task-switching data of Wickens, Gutzwiller, et al. (2015) and the interruption management meta-analysis of Wickens et al. (2013), more difficult OTs were switched away from as often as easier ones. In addition, the memory-for-goals theory of interruption management predicts that tasks that are more demanding of working memory (and hence more difficult) may be preserved longer once initiated (switch resistance) because of the consequences of leaving a task while its goal status remains incomplete.
Interest or engagement: This attribute (which has no SEEV correlate) has rarely been examined in multitask overload research but would seem to operate, for example, in the behavior of a driver who becomes so engaged in an interesting cell phone conversation that he or she fails to switch attention to monitoring the roadway for unexpected hazards (and collides with one of them; Horrey & Wickens, 2006). In this example, interest would seem to trump priority, since avoiding a collision is clearly of higher priority than conversing. Spink, Park, and Koshman (2006) found that interest played a prominent role in task selection, although their study was not carried out in a high-workload multitasking environment, where STOM is designed to be most applicable.
Salience: This attribute (the same as in SEEV) is explicitly defined as the ability of a task’s arrival to “call attention to itself.” For example, an auditory task (a phone rings) would be better (more salient) than a visual task (a message pops up on a computer screen) at drawing attention away from an OT. However, tasks signaled through either of these sensory channels are more salient than tasks that depend solely on prospective memory (Dismukes, 2010; Loukopoulos, Dismukes, & Barshi, 2009), such as a pilot’s needing to remember to lower the landing gear at a specific time. The role of salience in task switching was established in a meta-analysis of modality differences in ITs (Lu et al., 2013).
Each of these four attributes has a “polarity” governing its attractiveness (a high-priority, easy, interesting, and salient task will be switched to frequently, and operators may be slow to leave such a task once it is the OT). But these attributes may differ in their “weights” and hence how they trade off against one another. For example, will a high-priority difficult task trump a lower-priority easier one?
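The polarity-and-weight logic above can be illustrated with a minimal sketch. It assumes a linear, weighted additive combination of attribute ratings, a common form for multiattribute decision models; the exact STOM combination rule and weights are not specified in this section, and the rating scale, weights, and task names below are hypothetical. Difficulty is given negative polarity, which captures only the tendency to avoid switching to a difficult task, not its possible stickiness once it is ongoing.

```python
from dataclasses import dataclass, field

# Polarity of each STOM attribute for switching TO a task (assumed signs):
# +1 = a higher rating makes the task more attractive,
# -1 = a higher rating makes it less attractive (difficulty).
POLARITY = {"priority": +1, "difficulty": -1, "interest": +1, "salience": +1}


@dataclass
class Task:
    name: str
    ratings: dict = field(default_factory=dict)  # attribute -> rating (hypothetical 1-5 scale)


def net_attractiveness(task: Task, weights: dict) -> float:
    """Weighted sum of polarity-signed ratings (assumed linear additive rule)."""
    return sum(weights[a] * POLARITY[a] * task.ratings[a] for a in POLARITY)


def choose(tasks: list, weights: dict) -> Task:
    """Return the candidate task with the highest net attractiveness."""
    return max(tasks, key=lambda t: net_attractiveness(t, weights))


# Hypothetical example: a high-priority difficult task vs. a lower-priority easy one.
hard = Task("hard_high_priority", {"priority": 5, "difficulty": 5, "interest": 3, "salience": 3})
easy = Task("easy_low_priority", {"priority": 2, "difficulty": 1, "interest": 3, "salience": 3})

equal_weights = {a: 1.0 for a in POLARITY}
priority_heavy = {**equal_weights, "priority": 2.0}

print(choose([hard, easy], equal_weights).name)   # easy_low_priority (6 vs. 7)
print(choose([hard, easy], priority_heavy).name)  # hard_high_priority (11 vs. 9)
```

With equal weights the easier task wins; doubling the hypothetical priority weight reverses the choice, which is the kind of trade-off the attribute weights govern.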
In the empirical research reported here, we asked participants to rate different tasks on the four STOM attributes, to assess—by direct comparison with actual task selection performance—how well these attribute ratings predict task-switching behavior. Such behavior is manifest in both how often a task is switched to and how long the task is performed before being “left” for a different task.
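One common way to quantify how well attribute-based predictions match observed behavior is the proportion of variance accounted for, computed as the squared Pearson correlation between predicted and observed values across conditions. The sketch below assumes that statistic; this section does not state the exact fit measure, and the numbers in the usage example are hypothetical, not data from either experiment.

```python
def variance_accounted_for(predicted, observed):
    """Squared Pearson correlation (r^2) between predicted and observed values."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("r^2 is undefined when either sequence is constant")
    return cov * cov / (var_p * var_o)


# Hypothetical predicted vs. observed percentage of attention to one task
# across four conditions (illustrative numbers only).
predicted = [40.0, 55.0, 62.0, 71.0]
observed = [42.0, 53.0, 65.0, 70.0]
print(round(variance_accounted_for(predicted, observed), 3))
```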
Our research addresses the focus of attention on tasks. In this context, attention is a cognitive/performance concept, and in both experiments we link the allocation of attention to the observable actions required by each task. In Experiment 2, we also couple the measurement of actions with the measurement of head/eye movement. This latter measure is, of course, imperfect, since where one looks does not always correspond to what one is doing or thinking about (Wickens & McCarley, 2008), and visual scanning is “blind” to the allocation of attention to auditory tasks. Nevertheless, across both experiments, and particularly in Experiment 2, which exploits converging evidence from actions (the “mindball”) and from looking (the eyeball) at widely separated areas and uses only visual tasks, we infer that we have captured a large proportion of the variance in the allocation of attention to tasks.
In the following, we present two experiments that inform the validity of the STOM model. Experiment 1 uses four tasks, whereas Experiment 2 uses a dual-task environment. In Experiment 1, whose details are reported elsewhere (Gutzwiller, Wickens, & Clegg, 2014), we do not attempt a full STOM validation but, rather, examine how the particular task attributes of difficulty, interest, and priority influence the allocation of attention among the four synthetic laboratory tasks of the Multi-Attribute Task Battery (MATB II; hereafter MATB). From this research, we draw conclusions about the relative roles of the three attributes. In Experiment 2, we examine task selection in a potential real-world astronaut multitasking scenario. There we also assess task salience, creating the conditions needed for a true quantitative validation of STOM and for resolving the currently ambiguous role of task difficulty as an attractor or a repeller.
Experiment 1
Method
Participants
Eighty-one undergraduate psychology students at Colorado State University participated in return for optional, partial course credit.
Materials
The MATB system was presented on a computer with a standard mouse and stereo headphones, and a Logitech joystick was used for control actions. The task screens, shown in Figure 1, were arranged in a square with approximately 1.16° of visual angle separating the two rows and 0.19° separating the two columns.

Figure 1. The Multi-Attribute Task Battery interface, including (clockwise from top left) the monitoring, tracking, resource management, and communications tasks.
The MATB platform presents four distinct tasks to the operator:
Monitoring: The monitoring task, in the upper left of the display, required participants to monitor for different types of events within the task’s three elements. The first element consists of four vertical, randomly oscillating scales in the lower half of the task window. Each scale updates periodically, and the participant monitors the scales to identify (by clicking on the scale) when any of them holds a steady “extreme” position at either the uppermost or lowermost end. The second element is a green light that periodically turns off; participants monitor it to ensure that it remains on, clicking it when it is detected to be off, which turns it back on. The third element, in the upper right corner, is a red light presented as an onset event; participants detect when the light turns on and click the red light button, which turns it off. In the current experiment, the monitoring task included events from all three elements in equal proportions.
Tracking: The tracking task in the upper center of the MATB display was a two-dimensional random-input compensatory tracking task. Participants attempted to keep a circular reticle’s smaller, inner circle positioned on the intersection of the tracking display’s x- and y-axes by providing inputs using a joystick. Tracking-task difficulty was manipulated by varying tracking bandwidth. The tracking task was scored by measuring error in pixel deviations between the reticle center and the center of the axes intersection (at 1 Hz).
Resource management: The resource management task in the lower center of the display represented a fuel management task aboard an aircraft. Participants attempted to keep the fuel levels of two depleting tanks (A and B) within a target range (seen in Figure 1) by controlling the flow of fuel with pumps that directed flow to and from supplemental tanks. Pumps could be turned on and off to start and stop flow, and they needed to be coordinated to keep tanks A and B within their threshold levels. Pumps siphoned fuel at various rates, listed on the right-hand side of the resource management display. Events in the resource management task were pump failures, which could occur in any of the pumps. When a pump fails, it turns red and cannot be turned on until 30 s have elapsed, forcing the participant to compensate through other pumps.
Communications: The communications task, located in the lower left of the MATB display, simulated a pilot responding to air traffic controller requests. Events were auditory messages that began with a call sign denoting the intended recipient and conveyed an instruction. In this experiment, each participant was assigned the call sign “NASA 504.” When a message was directed to that call sign, participants responded with the mouse in the lower half of the communications display: After hearing the instruction, they were to select one of four radios, change its frequency to the new value given in the auditory message, and click a final Enter button to complete the task. In general, the communications task takes about 15 s to complete. If a message used a different call sign, no action was required. The current experiment included both types of communications events.
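The scoring and event rules described for the tracking and resource management tasks can be made concrete with a small sketch. It assumes that each 1-Hz tracking sample is the (x, y) pixel offset of the reticle center from the axes intersection, combined as a Euclidean distance and averaged over the trial (this section does not say how deviations were aggregated). It also models the 30-s pump-failure lockout with a hypothetical Pump class; the flow and depletion rates are illustrative, not MATB II parameters.

```python
from dataclasses import dataclass
from math import hypot

FAILURE_LOCKOUT_S = 30.0  # a failed pump cannot be restarted for 30 s (from the task description)


def mean_tracking_error(samples):
    """samples: (x, y) pixel offsets of the reticle center, one per second (1 Hz).
    Returns the mean Euclidean deviation (an assumed summary statistic)."""
    deviations = [hypot(x, y) for x, y in samples]
    return sum(deviations) / len(deviations)


@dataclass
class Pump:
    rate: float                # hypothetical flow units per second
    on: bool = False
    failed_until: float = 0.0  # the pump is unusable before this time (s)

    def fail(self, t: float) -> None:
        """A failure switches the pump off and starts the lockout period."""
        self.on = False
        self.failed_until = t + FAILURE_LOCKOUT_S

    def try_turn_on(self, t: float) -> bool:
        """Turn the pump on unless it is still locked out after a failure."""
        if t < self.failed_until:
            return False
        self.on = True
        return True


def step_tank(level: float, feeding_pumps: list, depletion_rate: float, dt: float = 1.0) -> float:
    """Advance one tank's level by dt seconds: inflow from running pumps that
    feed it, minus the tank's own depletion (a simplified, assumed model)."""
    inflow = sum(p.rate for p in feeding_pumps if p.on)
    return level + (inflow - depletion_rate) * dt
```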
Procedure
Participants were introduced to the MATB II simulation through a series of instructional, self-paced slides adapted from Santiago-Espada, Myer, Latorella, and Comstock (2011), the developers of the MATB II simulation. In a between-participants manipulation, participants in one condition (equal priority) were told to perform all tasks as well as possible. In the other (tracking priority), participants were told to prioritize tracking over all of the other tasks while still performing them as well as possible (e.g., Gopher, Weil, & Siegel, 1989). Participants were instructed to perform the tasks using only their dominant hand; they were not allowed to use both hands or the keyboard to respond. Thus, participants had to switch between mouse and joystick when necessary. This critical instruction allowed us to examine task-switching behavior without the confounding influence of concurrent inputs.
Participants completed a training trial containing all of the elements of the MATB simulation used during later experimental test trials. Before beginning the test trials, the experimenter reminded participants of their group instruction (e.g., equal or tracking priority). Subsequently, participants performed three test trials of varying tracking-task difficulty (easy, difficult, and transition). The difficulty of one task in MATB (tracking) was manipulated within subjects by altering the update rate of the tracking task (i.e., changing bandwidth; Wickens & Hollands, 2000) while controlling task input rate. Easy and difficult trials were counterbalanced.
Multiple events in all four tasks were presented, randomly interspersed with single events, and participants attempted to respond to all task events. After the final (transition) test trial, to help assess attribute weights, we administered a survey that asked participants to make paired comparison ratings of which tasks were more difficult to perform, more interesting, and of higher priority. The order of comparisons was mixed across rating variables. All six task pairs were evaluated on each of the three attributes, and from these comparisons we derived a single measure of each task’s degree of STOM attractiveness on each attribute.
Results
A more complete set of results from Experiment 1 is presented in Gutzwiller et al. (2014). Here, we concentrate on those results that were most informative for the STOM modeling and examined in more detail in Experiment 2.
First, we examined the subjective paired comparison ratings on the STOM attributes, which were collected for each task relative to every other task. For each task, we then added these ratings together to obtain a score of overall task attractiveness based on the model, reversing the direction of the difficulty scores as the model prescribes. Scores could therefore range from −9 to +9: Negative values indicate an unattractive task, and +9 indicates a task rated most attractive in every comparison.
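The scoring rule can be sketched as follows for a single attribute. Because each of the four tasks takes part in three of the six pairwise comparisons, a −9 to +9 range implies ratings of −3 to +3 per comparison; that per-comparison scale is an inference, not something stated here. The sketch also assumes each rating is recorded from the first task’s perspective, so the same value counts against the second task.

```python
from itertools import combinations

TASKS = ["monitoring", "tracking", "resource_management", "communications"]


def attribute_scores(pair_ratings, reverse=False):
    """Sum paired-comparison ratings into one score per task for one attribute.

    pair_ratings maps (task_a, task_b) -> rating in [-3, 3]; a positive value
    means task_a was rated higher than task_b (assumed convention). Each task
    appears in three pairs, so its score lies in [-9, 9]. Set reverse=True for
    difficulty, whose direction is reversed so that higher = more attractive.
    """
    scores = {t: 0 for t in TASKS}
    for (a, b), rating in pair_ratings.items():
        if not -3 <= rating <= 3:
            raise ValueError(f"rating out of range for {(a, b)}: {rating}")
        scores[a] += rating
        scores[b] -= rating
    if reverse:
        scores = {t: -s for t, s in scores.items()}
    return scores


# Hypothetical extreme case: in every pair, the earlier-listed task is rated
# maximally higher (+3) than the later one.
extreme = {pair: 3 for pair in combinations(TASKS, 2)}
print(attribute_scores(extreme))
# {'monitoring': 9, 'tracking': 3, 'resource_management': -3, 'communications': -9}
```

Because every rating is added to one task and subtracted from the other, the four scores for an attribute always sum to zero.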
Table 1 presents the mean ratings on the three STOM attributes rated by participants for each of the four tasks listed in column 1. Because tracking was varied in both priority (between subjects) and difficulty (within a subject across two trials), we present two ratings for priority and difficulty, collapsed across the other.
Table 1. Attribute Ratings for Experiment 1
Note. The attribute ratings of all tasks/attributes represent the mean value across the two conditions (equal priority and tracking priority) except for the priority rating of the tracking task.
In Experiment 1, we examined several aspects of the voluntary switching behavior between tasks, and we report here those findings that are most informative to the STOM model.
Although participants rated tracking as far higher in priority when it was prioritized through instructions (0.5 vs. 6.4 in the upper left cell of Table 1), t(71) = −6.81, p < .001, the between-groups priority manipulation affected neither the frequency with which tracking was “switched to,” t(77) = 0.66, p = .51, nor the mean duration of time that participants “stayed on” tracking before switching to an alternative task, t(77) = −0.08, p = .94.
When tracking was more difficult (in the experimentally manipulated condition), it was switched to less frequently (mean switch count: easy, 48.3; difficult, 43.5), t(78) = 3.55, p < .01. However, once attention had been switched to the difficult tracking task, it remained there significantly longer than it did in the easy tracking condition (mean duration: easy, 518 s; difficult, 526 s), F(1, 75) = 11.53, p < .01.
The experiment included a limited number of “competition events” in which two alternative tasks arrived simultaneously while the tracking task was ongoing, and we examined which of the two was chosen. Most informative was the simultaneous arrival of a resource management event and a communications event. When these two tasks competed, the communications task was chosen about twice as often as the resource management task (48.2 vs. 24). A test of proportions reveals that this proportion (48 / 72 = .67) differs significantly (p < .05) from equality. When this choice preference is examined in the context of the task attributes shown in Table 1, we can infer that task difficulty plays a dominant role in making an alternative task less attractive, consistent with the original STOM model. We infer that the communications task is chosen because it is much easier, despite being rated lower in priority and, particularly, less interesting.
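The proportion test can be approximately reproduced with a one-sample z test against .5 (normal approximation), applied to the rounded counts of 48 communications choices out of 72 competitions. This section does not say which proportion test was used, so this is one plausible choice; an exact binomial test would be a reasonable alternative.

```python
from math import erfc, sqrt


def one_sample_proportion_z(successes: int, n: int, p0: float = 0.5):
    """Two-sided one-sample z test of an observed proportion against p0,
    using the normal approximation to the binomial."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)       # standard error under the null hypothesis
    z = (p_hat - p0) / se
    p_value = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return p_hat, z, p_value


p_hat, z, p = one_sample_proportion_z(48, 72)
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, two-sided p = {p:.4f}")
# p_hat = 0.67, z = 2.83, two-sided p = 0.0047
```

Under these assumptions the result (p ≈ .005) is consistent with the reported p < .05.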
Discussion
One message we derive from Experiment 1 concerning the individual STOM attributes is that increasing the difficulty of an alternative task makes it less attractive. This holds true whether the same task (tracking) is increased in difficulty (Analysis 2 in the previous section) or whether an easy and a difficult task are placed in competition (Analysis 3 in the previous section). Importantly, this finding is entirely congruent with the STOM meta-analysis of VTS studies reported by Wickens, Gutzwiller, et al. (2015), who derived the quantitative estimate of a significant 63% “easy task preference” from the collective data of the 11 studies that had manipulated difficulty within tasks. When an alternative task was made harder, it was selected less frequently.
A second message is that the more difficult task (here, the difficult version of tracking) appears to be “stayed on” longer once it has been initiated (i.e., as an OT). This form of “hysteresis” in task switching also appears to be consistent with the results of the meta-analyses of Wickens, Gutzwiller, et al. (2015) and of the interruption management literature (Wickens et al., 2013). Like a cave with a small entrance, a difficult task may be harder to “enter” initially (increasing resistance to switching to it), but once one is inside, it is also harder to “leave,” resulting in longer stays: The more difficult OT is “stickier.”
The third message is that although people clearly understood (and rated) the priority attribute, it appeared to have little influence on their switch preferences. In a study of high-workload aviation task switching, Raby and Wickens (1994) also found that although priority of cockpit tasks influenced how long pilots stayed on tasks, it did not appear to influence the frequency of switching to those tasks.
Experiment 2
All three messages derived from Experiment 1 help us to interpret the data of the more controlled Experiment 2, in which the STOM model was formally tested. Although the MATB tasks employed in the Gutzwiller et al. (2014) study were more relevant to the real world than those examined in many basic switching studies that used tasks like digit classification (e.g., Arrington & Logan, 2004), they still remained relatively abstract versions of actual astronaut tasks. Experiment 1 was also constrained by its reliance on volunteer participants who could commit only limited time to the experiment. In contrast, Experiment 2 examined the validity of STOM predictions with two far more realistic astronaut simulation environments and with well-paid participants, who received considerably more task training. In each of the two task simulations, features could be manipulated to vary its difficulty and hence help to resolve the uncertain role of this critical STOM attribute in task switching. The two task simulations employed were as follows:
A relatively realistic spacecraft environmental process-control simulation, called AutoCAMS (Manzey, Reichenbach, & Onnasch, 2012), in which the operator is responsible for managing the mixture of gases, such as oxygen and nitrogen, and fixing minor failures in the system with the assistance of an automated decision aid, called AFIRA (Automatic Fault Identification and Recovery Agent). In one scenario, this decision aid unexpectedly failed to appear, driving task difficulty upward, as participants suddenly needed to perform diagnosis and repair manually. Operator tasks included monitoring the process-control system, detecting faults, and repairing them.
A robotic-arm control task employing the software, displays, and kinematics of the realistic simulator used at NASA for generic robotics training. This simulation is BORIS (Basic Operational Robotics Instructional System; Sebok et al., 2013), in which our participants manipulated the 3-D trajectory of the robotic arm (Li, Wickens, Sarter, & Sebok, 2014; Wickens, Sebok, Li, Sarter, & Gacy, 2015). To make the task more relevant, participants were instructed to imagine an astronaut attached to the end of the arm, being moved about for extravehicular repairs. This task could be either highly automated (“easy” version; referred to as autopilot mode) or performed under full manual control (“hard” version; manual mode). Both versions required operator intervention to switch camera views or adjust the arm movement rate when traversing corners in the trajectory.
We examined the allocation of attention during eight conditions defined by normal versus failure operation of AutoCAMS, AFIRA-supported versus unsupported failure management, and manual versus autopilot operation of BORIS. This paper focuses on switching rather than on individual (or overall) task performance because the environment was one in which the two tasks were primarily performed sequentially (the context in which STOM is relevant) and because single-task performance of each has been well studied and reported elsewhere (for BORIS, in Li et al., 2014; Wickens, Sebok, et al., 2015; and for AutoCAMS, in Manzey et al., 2012; Wickens, Clegg, Vieane, & Sebok, 2015). Importantly, we measured attention allocation between the two tasks with two independent techniques: As described earlier, focus-of-attention allocation was measured both by control activity in each task and by a head tracker. Given the wide 60° visual angle separating the most relevant displays for the two tasks, we assumed that head orientation corresponded with the focus of visual attention. Our interest is in the ability of STOM attributes to predict the allocation of attention as measured by both techniques.
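Converting head-tracker samples into an attention-allocation percentage can be sketched as a nearest-display classification. The sketch assumes that head yaw is sampled at a fixed rate and that the two task displays sit at hypothetical yaw angles 60° apart (0° for BORIS, 60° for AutoCAMS); each sample is assigned to the nearer display. The actual processing pipeline and any classification thresholds are not described in this section.

```python
BORIS_YAW_DEG = 0.0      # hypothetical head yaw when facing the BORIS displays
AUTOCAMS_YAW_DEG = 60.0  # hypothetical yaw when facing AutoCAMS (60 deg apart, per the text)


def classify_yaw(yaw_deg: float) -> str:
    """Assign one head-yaw sample to the nearer task display (ties go to BORIS)."""
    if abs(yaw_deg - AUTOCAMS_YAW_DEG) < abs(yaw_deg - BORIS_YAW_DEG):
        return "AutoCAMS"
    return "BORIS"


def attention_allocation(yaw_samples):
    """Percentage of equally spaced samples classified to each task."""
    labels = [classify_yaw(y) for y in yaw_samples]
    n = len(labels)
    return {task: 100.0 * labels.count(task) / n for task in ("BORIS", "AutoCAMS")}


print(attention_allocation([0.0, 5.0, 55.0, 62.0, -3.0, 40.0]))
# {'BORIS': 50.0, 'AutoCAMS': 50.0}
```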
Finally, we note that although STOM can be applied to multiple (three or more) tasks, as it was in the MATB experiment, in Experiment 2 only two tasks were used, so there were no alternative tasks to choose between. Hence the STOM decision was not “which task to choose when a switch occurred,” but instead, at each iteration, the decision was whether or not to switch tasks. Thus, the important measure was the time on task. This measure has important implications for our model-fitting exercise described later.
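Because time on task is the key measure in the two-task case, it helps to see how dwell durations fall out of a sampled record of which task holds attention. The sketch assumes a sequence of task labels sampled at a fixed interval dt (for example, from a head-tracker classification or from control activity); the function names are hypothetical. Note that the first and last dwells of a trial are truncated by its start and end.

```python
from itertools import groupby


def dwells(labels, dt=1.0):
    """Collapse consecutive identical labels into (task, duration_s) dwells."""
    return [(task, sum(1 for _ in run) * dt) for task, run in groupby(labels)]


def switch_count(labels):
    """Number of switches between tasks = number of dwells minus one."""
    return max(len(dwells(labels)) - 1, 0)


def mean_dwell(labels, task, dt=1.0):
    """Mean time spent on `task` per visit before attention left it."""
    durations = [d for t, d in dwells(labels, dt) if t == task]
    return sum(durations) / len(durations) if durations else 0.0


record = ["BORIS"] * 3 + ["AutoCAMS"] * 2 + ["BORIS"] + ["AutoCAMS"] * 4
print(dwells(record, dt=0.5))
# [('BORIS', 1.5), ('AutoCAMS', 1.0), ('BORIS', 0.5), ('AutoCAMS', 2.0)]
print(switch_count(record), mean_dwell(record, "BORIS", 0.5), mean_dwell(record, "AutoCAMS", 0.5))
# 3 1.0 1.5
```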
Method
Participants
Fifty-six participants were recruited and paid $45.00 for their participation in the experiment, which was run in two sessions and lasted approximately 4 hr. All participants were engineering students or graduate students in psychology. The experimental sessions consisted of seven trials in which robotic-arm and process-control tasks were performed under different conditions, described next.
Robotic-arm operations
The participant was required to move, or supervise the movement of, a simulated robotic arm in a series of three-segment (staple-shaped) trajectories as shown in Figure 2: a vertical, single-axis movement up above an obstruction; a turn above the top of the obstruction; a horizontal diagonal movement across the top; another turn; and a vertical movement to descend to a point on the other side of the obstruction. When the trajectory had been completed, the movement was reversed and the pattern completed in reverse. This cycle continued until the AutoCAMS scenario (described later) was completed. For more detail, see Li et al. (2014).

Figure 2. The two panels of the Basic Operational Robotics Instructional System (BORIS) simulator. The graphical user interface is shown on the left, and the camera and window views of the robotic arm are shown on the right.
Movement was controlled by two joysticks. One joystick controlled x, y, and z movements in 3-D space, with a twist controlling up and down movement, and a second joystick controlled speed (fast or slow) and rotation of the arm and wrist. In the manual mode, the ideal staple trajectory was displayed as a three-dimensional line path to be followed, as shown in Figure 2. In the autopilot mode, this ideal trajectory was executed by an autopilot. In both conditions, the operator was responsible for (a) manually reducing speed as corners were approached, (b) ensuring that the arm trajectory avoided hazards, and (c) following guidance to select appropriate camera viewpoints of the workspace that were visible on two displays. Furthermore, in both conditions, participants were instructed to stop arm movement whenever attention was directed to the AutoCAMS task for more than a brief glance. In the autopilot condition, the participant pressed a pause button to stop the arm movement. In the manual condition, movement stopped when the participant’s hands were removed from the joysticks, as the spring loading returned the sticks to a neutral position.
AutoCAMS process control
Participants used AutoCAMS to monitor the fluctuating levels of process variables and to diagnose and repair occasional disturbances of the system, such as leaks or stuck valves. Both the accuracy of diagnosis and the total time to repair the fault were recorded. An important distinction in our modeling approach was between the allocation of attention before the failure (monitoring) and after it, until fault management was completed (the total repair time). For the first six trials (scenarios), this fault management process was supported by a decision aid called AFIRA. However, on the seventh scenario, with no warning, AFIRA failed to provide diagnosis and management support. Participants interacted with AutoCAMS through mouse clicks. More details are provided in Manzey et al. (2012) and Wickens, Clegg, et al. (2015).
The two tasks were configured in the three-screen layout shown in Figure 3. As depicted in Figure 2, the two screens on the right supported the BORIS task, with the rightmost screen providing the four camera views, channeled to two displays, needed to support all arm trajectory motion. The leftmost BORIS screen (the center “GUI” screen in Figure 3) provided primarily arm mode control information. The left screen was devoted to AutoCAMS. The total visual angle subtended by the three screens was 120°, and participants were asked to sit in a fixed chair location, with their backs against the chair and heads upright, in order to maintain this relatively constant angle.

The workspace layout.
An Xbox Kinect motion tracker was located above the center screen to track the allocation of visual attention to the two tasks (assessed here by head rotation). As shown in Figure 3, focus-of-attention allocation was dichotomized into attention toward AutoCAMS (0° to 40°) and toward BORIS (40° to 120°). Our head-tracking data confirmed that most head movements were directly between the AutoCAMS panel and the BORIS camera view (right panel); previous studies involving solely the BORIS task have shown that this four-panel camera view display receives approximately 90% of the operator’s visual attention, compared to 10% for the graphical user interface (Wickens, Sebok, et al., 2015).
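The angular dichotomy above amounts to a simple classification rule applied to each head-tracker sample. A minimal sketch, assuming yaw is measured in degrees from the left (AutoCAMS) edge of the 120° workspace and that a reading of exactly 40° counts as BORIS; neither convention is stated in the text, and the function name is hypothetical.

```python
from typing import Optional

# Boundaries from the text: 0°-40° -> AutoCAMS, 40°-120° -> BORIS.
AUTOCAMS_MAX_DEG = 40.0
WORKSPACE_MAX_DEG = 120.0


def classify_head_angle(angle_deg: Optional[float]) -> Optional[str]:
    """Map one head-yaw sample to the attended task.

    Assumption: yaw is measured from the left edge of the workspace, and the
    40° boundary is assigned to BORIS. Returns None for a missing sample or
    for a reading outside the 0°-120° workspace.
    """
    if angle_deg is None:
        return None  # tracker did not capture head orientation
    if 0.0 <= angle_deg < AUTOCAMS_MAX_DEG:
        return "AutoCAMS"
    if AUTOCAMS_MAX_DEG <= angle_deg <= WORKSPACE_MAX_DEG:
        return "BORIS"
    return None  # outside the workspace; treated as missing


print([classify_head_angle(a) for a in [12.0, 35.5, 41.0, 90.0, None, 118.0]])
# ['AutoCAMS', 'AutoCAMS', 'BORIS', 'BORIS', None, 'BORIS']
```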
Procedures and instructions
Participants were instructed on how to perform the AutoCAMS task and then the BORIS task. Each training session began with a series of PowerPoint slides. Participants then undertook several single-task trials, blocked by task, under close experimenter supervision and guidance to ensure that the tasks were performed correctly. Supervised training allowed us to assess when adequate practice had been given before moving on to the dual-task training phase. The amount of single-task practice varied somewhat from participant to participant (90 min to 2 hr), depending on how quickly each task was mastered. The single-task practice session was 2 hr in duration, with approximately half of the time allocated to each of the two tasks individually.
Single-task training was followed, on a separate day, by a series of seven dual-task trials. Each trial lasted approximately 6 min. During each trial, the AutoCAMS system ran normally and simply required monitoring until a “routine disturbance” (supported by AFIRA) occurred sometime between 1 and 3 min into the trial. Noticing the failure ended the monitoring phase and began a second, failure management phase that required the participant to diagnose and repair the fault; this phase lasted approximately 90 s (contingent upon the skill of the participant). Once the repair was completed, the remainder of the trial continued as another monitoring phase. The first five trials provided ample dual-task practice. Trial 6, containing experimental data, was identical to the first five. Trial 7, the second experimental trial, differed from all other trials in that the AFIRA decision aid was unexpectedly unavailable: The AFIRA window appeared but was blank, indicating the presence of a failure for which no diagnosis or management advice was available. We assumed that, to the extent that participants had come to rely on AFIRA during training and the first six trials to assist diagnosis and system repair, they would find themselves in an unexpectedly high-workload period in the failure phase of Trial 7 (Wickens, Clegg, et al., 2015). Trial 7 lasted 10 min, and the failure was introduced 3 min into the trial.
Participants were randomly assigned to either a manual or an autopilot BORIS condition. Regardless of assignment, all participants were instructed that both tasks were equally important and life critical. Thus, participants in the BORIS autopilot condition were explicitly reminded of the criticality of speed control, hazard monitoring, and camera selection, even though their attention was not required for actual arm control.
After the final scenario, participants were asked to provide the four attribute ratings (priority, interest, salience, difficulty) for each of the two tasks, each on a 5-point scale. In addition, they rated the difference in difficulty between the monitoring and failure management phases of AutoCAMS.
Design
A 2 × 2 × 2 mixed-factor design was employed, with BORIS difficulty (manual vs. autopilot) varied between subjects, and failure phase (monitoring vs. fault diagnosis and management) and failure type (routine [Trial 6] vs. AFIRA gone [Trial 7]) varied within subjects.
Results
Data were sampled at 1-s intervals. These samples recorded BORIS movement (from xyz arm displacement), BORIS camera and speed mode changes, AutoCAMS activity (mouse clicks), and the momentary allocation of visual attention between the two tasks as assessed by the head tracker, which provided the data on attention switches. For visual attention, we examined both the total number of attention switches and the mean percentage of time allocated to each of the tasks. This percentage-of-time-allocated variable was the criterion against which we validated STOM.
We could also infer the allocation of task attention from control activity: for AutoCAMS, from the number of mouse clicks and, for BORIS in the manual control condition, from the proportion of time that the arm was in motion. These assessments were made separately for the prefailure monitoring phase and the failure management phase of both Trials 6 and 7. Because these two AutoCAMS phases (prefailure monitoring and failure management), as well as the two types of trials (normal: Trial 6; unexpected AFIRA failure: Trial 7), differed in length, we divided both click counts and switch counts by duration to obtain rates (of clicking and attention switching).
Data from a small percentage of participants were discarded because the head tracker failed to provide sufficient attention allocation data. When more than 30% of the 1-s data samples from either AutoCAMS phase were missing because the camera failed to capture head orientation, all data for that participant on that trial (6 or 7) were discarded. No outliers were removed from the remaining data.
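The per-phase measures described here (percentage of time on AutoCAMS, and switch and click counts converted to rates by dividing by phase duration) can be sketched as follows. This is an illustrative sketch only: it assumes each 1-s sample has already been labeled with the attended task, that missing samples are dropped before the percentage and switch counts are computed, and that rates are normalized by the full phase duration; the function name and example data are hypothetical.

```python
from typing import Dict, List, Optional


def attention_measures(
    labels: List[Optional[str]],
    click_count: int,
    sample_period_s: float = 1.0,
) -> Dict[str, float]:
    """Percentage of time on AutoCAMS plus switch and click rates for one phase.

    labels: one task label per 1-s sample ('AutoCAMS' or 'BORIS'; None = missing).
    click_count: AutoCAMS mouse clicks recorded during the phase.
    Assumption: missing samples are dropped before computing the percentage and
    switches, while rates are divided by the full phase duration.
    """
    valid = [x for x in labels if x is not None]
    if not valid:
        raise ValueError("phase contains no valid head-tracker samples")
    pct_autocams = 100.0 * sum(x == "AutoCAMS" for x in valid) / len(valid)
    # A switch is any change of attended task between consecutive valid samples.
    switches = sum(1 for a, b in zip(valid, valid[1:]) if a != b)
    duration_s = len(labels) * sample_period_s
    return {
        "pct_autocams": pct_autocams,
        "switch_rate_per_s": switches / duration_s,
        "click_rate_per_s": click_count / duration_s,
    }


# Hypothetical 8-s phase: 3 of 7 valid samples on AutoCAMS, 2 switches, 4 clicks.
phase = ["BORIS", "BORIS", "AutoCAMS", "AutoCAMS", "AutoCAMS", None, "BORIS", "BORIS"]
print(attention_measures(phase, click_count=4))
```

Dividing by duration is what makes the shorter failure management phase comparable with the longer monitoring phases, and Trial 6 comparable with the longer Trial 7.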
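The exclusion rule can be stated as a short predicate. A minimal sketch, assuming each AutoCAMS phase is represented by its list of 1-s head-tracker labels with None marking a sample the camera missed; the function and phase names are hypothetical. Note that the rule discards a trial only when *more than* 30% of samples are missing, so exactly 30% is retained.

```python
from typing import Dict, List, Optional

MAX_MISSING_FRACTION = 0.30  # exclusion threshold stated in the text


def keep_trial(phases: Dict[str, List[Optional[str]]]) -> bool:
    """Return True if a participant's trial (6 or 7) is retained.

    phases maps each AutoCAMS phase name (e.g., 'monitoring', 'failure') to its
    1-s head-tracker labels, with None marking a missing sample. The whole trial
    is discarded if more than 30% of samples in EITHER phase are missing.
    """
    for labels in phases.values():
        if not labels:
            return False  # an empty phase carries no usable data
        missing = sum(x is None for x in labels) / len(labels)
        if missing > MAX_MISSING_FRACTION:
            return False
    return True


ok = {"monitoring": ["BORIS"] * 8 + [None] * 2, "failure": ["AutoCAMS"] * 7 + [None] * 3}
bad = {"monitoring": ["BORIS"] * 10, "failure": ["AutoCAMS"] * 6 + [None] * 4}
print(keep_trial(ok), keep_trial(bad))  # True False
```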
Attention allocation measures
Separate mixed-model ANOVAs were carried out on each trial (Trial 6: routine fault management; Trial 7: decision aid failure), because some data recording failures left fewer data points available for Trial 7.
Percentage of attention allocated to the AutoCAMS task (see Figure 4), the exact complement of percentage attention to the robotic-arm BORIS task, increased significantly during the more difficult failure management phase relative to the easier prefailure monitoring phase: Trial 6, F(1, 33) = 73.9, η2 = .68; Trial 7, F(1, 25) = 58.2, η2 = .70; both p < .01. Attention to AutoCAMS also increased significantly when the concurrent BORIS task was in its easier autopilot mode, relative to its more demanding manual mode: Trial 6, F(1, 33) = 15.5, η2 = .32; Trial 7, F(1, 25) = 19.0, η2 = .43; both p < .01. The two variables did not interact on either trial: Trial 6, F(1, 33) = 2.49, p > .10, η2 = .02; Trial 7, F(1, 25) = 0.09, p > .10, η2 < .01. Although the general pattern is similar on Trials 6 and 7, one difference is statistically and practically significant: For the autopilot group during fault management, a pairwise comparison revealed that focus-of-attention allocation to AutoCAMS was significantly greater on Trial 7 (86%), when no support was provided (difficult), than on Trial 6 (75%), when decision support was available (easier), t(8) = 3.93, p < .01.

Percentage of visual attention allocated to AutoCAMS performance across the failure phases of a trial (prefault monitoring vs. during-fault diagnosis and management). The dashed line represents data from the Basic Operational Robotics Instructional System (BORIS) autopilot subjects and the solid gray line from the BORIS manual subjects. The left graph shows data from Trial 6 (routine abnormality management), and the right graph is from Trial 7 (unexpected Automatic Fault Identification and Recovery Agent failure).
Control activity in AutoCAMS, measured in clicks per second (see Figure 5), showed an effect on focus-of-attention allocation similar to that found for visual attention. That is, a significantly higher click rate was observed during failure management than during the prefailure monitoring stage: Trial 6, F(1, 33) = 64.9, η2 = .66, p < .01; Trial 7, F(1, 24) = 199.6, η2 = .88, p < .01. A higher click rate in AutoCAMS when BORIS was in the autopilot mode than when it was manually controlled was observed for Trial 7 but not Trial 6: Trial 6, F(1, 33) = 1.8, η2 = .05, p = .19; Trial 7, F(1, 24) = 7.4, η2 = .24, p = .01. No interaction was observed between these factors: Trial 6, F(1, 33) = 0.2, η2 < .01, p > .10; Trial 7, F(1, 24) = 2.2, η2 = .01, p = .15. The data also showed a strong effect of trial in the failure management phase for the autopilot subjects, with a greater click rate on Trial 7, after AFIRA was unexpectedly removed (M = 0.22), than on Trial 6 (M = 0.16), t(8) = 3.32, p = .01.

Average operator activity in the AutoCAMS task across the failure phases of a trial (prefault monitoring versus during-fault diagnosis and management). The dashed line represents data from the Basic Operational Robotics Instructional System (BORIS) autopilot subjects and the solid gray line from the BORIS manual subjects. The left graph shows data from Trial 6 (routine fault management), and the right graph is from Trial 7 (unexpected Automatic Fault Identification and Recovery Agent failure).
Overall visual attention switching rate (number of switches per second) between the two tasks was also examined (see Figure 6). The rate of switching between tasks was much greater when BORIS was in its easier autopilot mode than when it was in manual mode: Trial 6, F(1, 33) = 31.4, η2 = .49, p < .01; Trial 7, F(1, 25) = 12.0, η2 = .32, p < .01. Switching was also more frequent during monitoring than during failure management: Trial 6, F(1, 33) = 71.1, η2 = .60, p < .01; Trial 7, F(1, 25) = 10.1, η2 = .27, p < .01. The interaction between the two factors was significant for Trial 6, F(1, 33) = 20.7, η2 = .17, p < .01, but not for Trial 7, F(1, 25) = 2.29, η2 = .06, p = .14. In short, switching declined when tasks became more difficult. Participants in the manual condition did not differ in their switch rate from Trial 6 (M = .01) to Trial 7 (M = .01), t(11) = 0.81, p > .10.

Average switch rate between the tasks across the failure phases of a trial (prefault monitoring vs. during-fault diagnosis and management). The dashed line represents data from the Basic Operational Robotics Instructional System (BORIS) autopilot subjects and the solid gray line from the BORIS manual subjects. The left graph shows data from Trial 6 (routine abnormality management), and the right graph is from Trial 7 (unexpected Automatic Fault Identification and Recovery Agent failure).
Performance measurement
Although the previously reported data showed a general equivalence in the pattern of effects between Trials 6 and 7, there were large differences in total repair time between Trial 6 (89 s) and Trial 7 (148 s), t(19) = 3.46, p < .01, reflecting the cost of the unexpected failure of the decision support. These differences were also reflected in a significant decline in diagnostic accuracy from Trial 6 (100% as subjects followed the correct AFIRA guidance) to 91% (manual group) and 80% (autopilot group) on Trial 7, F(1, 18) = 12.2, η2 = .39, p < .01. The difference between groups in unaided failure diagnostic accuracy (that is, Trial 7) was not significant, t(15.2) = 1.18, p > .10.
We also measured the glance rate during the prefailure monitoring period, with a glance defined as a visual fixation on AutoCAMS lasting 3 s or less in the absence of any mouse-click activity on AutoCAMS (counted as a glance whether or not the subject paused BORIS during the interval; for most glances, they did not). These data revealed significantly fewer glances in the more difficult manual mode of BORIS than in the autopilot mode for Trial 6 (manual, M = 1.3 per minute; autopilot, M = 2.2), t(33) = 2.0, p = .05, but not for Trial 7 (manual, M = 1.6; autopilot, M = 2.4), t(24) = 1.3, p = .21.
Our measure of control activity for BORIS in both manual and autopilot conditions was the rate of clicks for both speed changes and camera views; within the manual group only, we also measured the percentage of time in motion along the trajectory. BORIS click rate in autopilot conditions was observed to be higher on Trial 6 (M = 1.27 per minute) than on Trial 7 (M = 1.03 per minute), which contained the more demanding unaided AutoCAMS failure management, F(1, 19) = 4.95, η2 = .20, p = .04. Click rate was also found to be marginally higher when BORIS was autopilot controlled (M = 1.33) than when manually controlled (M = 0.97), F(1, 19) = 3.56, η2 = .16, p = .08, with the heavier demand of manual control reducing the degree of compliance with speed reduction and camera choice recommendations. The two variables did not interact, F(1, 19) = 0.57, η2 = .02, p > .10.
Time in motion (that is, the time during which the control joystick was deflected in either the x- or y-axis) for the manual group showed no difference between Trial 6 and Trial 7 and a small, nonsignificant trend toward more motion in the prefailure phase (54%) than in the failure management phase (46%). Interestingly, as with the visual attention data, these control activity data also suggest that about half the time was spent engaged in BORIS control, indicating that subjects adhered to the equal-priority instructions.
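The glance definition can be operationalized on the 1-s data as runs of consecutive AutoCAMS samples of at most 3 s containing no AutoCAMS click. A minimal sketch under that assumption; the per-sample alignment of labels and click counts and the function name are hypothetical, and the actual scoring procedure may have differed in detail.

```python
from itertools import groupby
from typing import List, Optional


def glance_rate_per_min(
    labels: List[Optional[str]],
    clicks: List[int],
    max_glance_s: float = 3.0,
    sample_period_s: float = 1.0,
) -> float:
    """Glances to AutoCAMS per minute of a prefailure monitoring period.

    A glance is a run of consecutive AutoCAMS samples lasting 3 s or less with
    no AutoCAMS click in any of its samples. labels and clicks are aligned
    per-sample lists (one entry per 1-s sample).
    """
    if len(labels) != len(clicks) or not labels:
        raise ValueError("labels and clicks must be non-empty and the same length")
    glances = 0
    for task, run in groupby(zip(labels, clicks), key=lambda pair: pair[0]):
        samples = list(run)
        if task != "AutoCAMS":
            continue
        short_enough = len(samples) * sample_period_s <= max_glance_s
        no_clicks = all(c == 0 for _, c in samples)
        if short_enough and no_clicks:
            glances += 1
    minutes = len(labels) * sample_period_s / 60.0
    return glances / minutes


# Hypothetical 12-s record: a 2-s look (glance), a 4-s look (too long),
# and a 1-s look with a click (interaction, not a glance).
labels = ["BORIS", "BORIS", "AutoCAMS", "AutoCAMS", "BORIS", "AutoCAMS",
          "AutoCAMS", "AutoCAMS", "AutoCAMS", "BORIS", "AutoCAMS", "BORIS"]
clicks = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(glance_rate_per_min(labels, clicks))  # 5.0 (1 glance in 0.2 min)
```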
Discussion
The results were much as anticipated, with large, statistically reliable effects: Participants paid more attention, both visual and cognitive (inferred from action rate), to AutoCAMS when BORIS was easier, when AutoCAMS was more difficult during failure management, and, for the BORIS autopilot group, when failure management was further unsupported by the AFIRA decision aid. Next, we apply the STOM model to the visual focus-of-attention allocation, because this is the variable that is equivalent between these otherwise heterogeneous tasks.
Modeling
STOM attributes ratings
Our analysis next focused on the extent to which the four rated attributes (interest, salience, difficulty, priority), in isolation and in combination, could predict switching behavior. The mean ratings for three of the attributes (interest, salience, and difficulty) are shown in Table 2 for AutoCAMS and Table 3 for BORIS. Priority is not shown in the tables because its inclusion did not improve model fit, but it was consistently rated as 3.7 for AutoCAMS and 2.2 for BORIS. The eight rows correspond to the eight data points in Figure 4. The final columns of the tables give the attractiveness score of each task, based on our final model, described later.
AutoCAMS Attribute Scores of Interest (I), Salience (S), and Difficulty (D) for Pre- and During-Failure Periods
Note. Total attractiveness of AutoCAMS across the eight conditions is shown. Standard errors shown in parentheses.
BORIS Attribute Scores of Interest (I), Salience (S), and Difficulty (D) for Pre- and During-Failure Periods
Note. BORIS = Basic Operational Robotics Instructional System. Total attractiveness of BORIS across the eight conditions is shown. Standard errors shown in parentheses.
When considering the ratings in Table 2, two important caveats should be noted. First, participants were asked to rate only difficulty separately for the two AutoCAMS phases. Second, subjects provided only one rating covering both Trial 6 (AFIRA on) and Trial 7 (AFIRA gone). However, on the basis of an analysis of subjective workload carried out on the corresponding trials in Wickens, Clegg, et al. (2015), we inferred that task difficulty in the failure management stage was approximately 50% higher on Trial 7 than on Trial 6.
Between the two tasks, the data indicate that BORIS was rated significantly lower in priority than AutoCAMS, t(53) = 6.40, p < .01, and significantly less salient, t(53) = 7.20, p < .01. When BORIS required manual control, it was judged significantly more interesting than under automatic control, t(24) = 3.50, p < .01, and more difficult, t(44.7) = 7.03, p < .01. AutoCAMS was rated more difficult during failure management than prefailure, and in the failure management condition (using the heuristic applied from the Wickens, Clegg, et al., 2015, study), AutoCAMS was rated more difficult on Trial 7 (AFIRA unexpectedly gone) than on Trial 6 (AFIRA available).
In Table 2, the differential attribute ratings of interest, priority, and salience for AutoCAMS between the prefailure (monitoring) phase and the during-failure (management) phase were based only on the ratings of 11 of our subjects, as the remaining subjects provided only a single rating of interest and salience for both phases of the AutoCAMS trial. We assumed that the ratings provided by these 11 were a random sample and hence typical of the remaining subjects. This assumption was supported by correlating the attributes that were rated by both groups, which yielded a correlation of .94. The correlation between the group of 11 and the full cohort in attention allocation percentage across the eight conditions was also .94.
Model fitting
Based on data from Experiment 1 showing that priority had no effect, and on preliminary examination of the current data (see Sebok, Wickens, Sargent, Clegg, & Jones, 2015, for details), we excluded priority from the STOM model. Furthermore, although difficulty was assumed to be a repeller for the alternative task in applications involving more than two tasks, as also found in Experiment 1, there is some ambiguity as to how this attribute behaves in a dual-task context. If the easier of the two tasks is not being performed (i.e., it is the alternative task), then it should attract attention more strongly than if it were difficult. Yet in this same situation, the harder task is, by definition, the OT, and Experiment 1 provided evidence for increased resistance to switching away from the harder task, whereas the two meta-analyses were neutral on this point (Wickens et al., 2013; Wickens, Gutzwiller, et al., 2015). Thus, it is not clear which of these forces will dominate when two tasks differing in difficulty compete for the allocation of attention. However, a quick review of the Experiment 2 data, evident in the allocation of attention shown in Figures 4 and 5, reveals that harder versions of AutoCAMS and easier versions of BORIS both lead to greater focus-of-attention allocation to AutoCAMS. Hence, in this dual-task case, we assigned difficulty a positive weighting as a STOM attractor: The more difficult the task, the longer it will be continued. Thus the model became
$$\text{Attractiveness} = I + S + D,$$
where I = interest, S = salience, and D = difficulty.
In the right-hand columns of Tables 2 and 3 are listed the net attractiveness values for the eight conditions, depicted in Figure 4, for each task.
In Table 4 we present the net attractiveness for AutoCAMS relative to BORIS for each of the eight conditions, based on the difference in the right-column model predictions of each task, as computed in Tables 2 and 3. These net attractiveness ratings are shown in combination with their corresponding percentage visual allocation of attention to AutoCAMS for that set of conditions, the same data depicted in Figure 4.
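To make the bookkeeping behind Tables 2 through 4 concrete, the following is a minimal sketch of how the I + S + D attractiveness of each task, and the relative attractiveness of AutoCAMS over BORIS, might be computed for one condition. The ratings are hypothetical placeholders, not the study's values; the 1.5 multiplier mirrors the approximately 50% higher Trial 7 failure-management difficulty inferred earlier in the text; and the class and function names are illustrative only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ratings:
    """Mean attribute ratings for one task in one condition (5-point scales)."""
    interest: float
    salience: float
    difficulty: float


def attractiveness(r: Ratings) -> float:
    # Experiment 2 model: equal weights, with difficulty treated as an attractor.
    return r.interest + r.salience + r.difficulty


def relative_attractiveness(autocams: Ratings, boris: Ratings) -> float:
    """Net attractiveness of AutoCAMS relative to BORIS for one condition."""
    return attractiveness(autocams) - attractiveness(boris)


# Hypothetical ratings for illustration only (NOT the values in Tables 2 and 3).
autocams_fm_trial6 = Ratings(interest=3.5, salience=4.0, difficulty=3.0)
# Trial 7: failure-management difficulty scaled up by ~50%, as inferred in the text.
autocams_fm_trial7 = replace(autocams_fm_trial6,
                             difficulty=1.5 * autocams_fm_trial6.difficulty)
boris_manual = Ratings(interest=4.0, salience=2.5, difficulty=3.5)

print(relative_attractiveness(autocams_fm_trial6, boris_manual))  # 0.5
print(relative_attractiveness(autocams_fm_trial7, boris_manual))  # 2.0
```

Repeating this for all eight conditions yields the eight relative-attractiveness values that are paired with the observed percentage allocation to AutoCAMS.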
Relative Attractiveness of AutoCAMS Compared to BORIS and Percentage Attention to AutoCAMS for Each of the Eight Conditions of Measurement
Note. BORIS = Basic Operational Robotics Instructional System
When these final predictions of the I + S + D model are correlated with the percentage allocation data, the scatter plot shown in Figure 7 yields a very high correlation of r = .979. (We note here that the correlation with AutoCAMS click rate, shown in Figure 5, was also high; r = .924.)
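The variance-accounted-for figure reported for this model follows directly from squaring the correlation: .979² ≈ .958, and for the click-rate correlation, .924² ≈ .854. A minimal sketch of the computation is shown below; `pearson_r` is a hypothetical helper (Python 3.10 and later also provide `statistics.correlation`), which would be applied to the eight relative-attractiveness values and the eight percentage-allocation values in Table 4.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Variance accounted for is the squared correlation.
print(round(0.979 ** 2, 3))  # 0.958 (visual attention allocation)
print(round(0.924 ** 2, 3))  # 0.854 (AutoCAMS click rate)
```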

Predicted attractiveness versus obtained percentage allocation to AutoCAMS (r = .979). Predicted points based on differential ratings of interest, salience, and difficulty for pre- and during-AutoCAMS-failure phases.
When the total “attractiveness points” offered were summed (i.e., their absolute values added; coincidentally, the sum was 10) and this value was divided by the average attractiveness over all conditions (a small 0.08 advantage for AutoCAMS), an overall preference for AutoCAMS was predicted. This predicted value is approximately 54%, which, although close, is slightly at odds with the observed visual attention allocation, which turned out to be a precise 50% even split and hence corresponded directly to our instructions to prioritize the tasks equally.
Discussion: Experiment 2 Modeling
Experiment 2 was successful in inducing a relatively balanced allocation of visual attention between the two tasks, which were designated in the instructions as equally critical (both high priority). This attention allocation, indexed by both head direction and AutoCAMS control activity (clicks), showed a pattern of qualitative effects across the eight conditions that was consistent with intuition and with the role of increased difficulty as a factor that attracted attention. More attention was paid to AutoCAMS (assessed by both visual attention in Figure 4 and the mindball measure of clicks in Figure 5) when the concurrent task (BORIS) was easier (autopilot controlled) and when subjects needed to engage in AutoCAMS failure management rather than just monitoring. The complementary effect was observed for BORIS clicks. Furthermore, during failure management for the BORIS autopilot group, still more attention was allocated when this cognitive process was no longer supported by the now-removed decision aid AFIRA (Trial 7). This differential effect was very strongly expressed in click activity (Figure 5). Assuming that subjects had become somewhat complacent on the six preceding AFIRA-supported trials (see Wickens, Clegg, et al., 2015), they presumably made several unnecessary clicks on Trial 7, perhaps diagnosing an incorrect failure or undertaking incorrect steps.
The switch frequency data (Figure 6) are also informative about differences in attention switching between conditions, particularly in revealing that overall switching decreased when the BORIS task was more difficult, suggesting, as revealed in prior research (Gutzwiller et al., 2014; Wickens, Gutzwiller, et al., 2015), that switching is a resource-limited activity.
Our STOM modeling effort predicted the quantitative differences in attention allocation between tasks across the conditions whose differences could be operationally defined in terms of either different periods of attention measurement (Figure 4) or separate assignment (by subjects) of attribute ratings (Tables 1, 2, and 3). Using the equal-weighting (Attractiveness = I + S + D) model, we accounted for approximately 95% of the variance in the allocation of visual attention.
It is noteworthy that we reversed the polarity of the difficulty influence from a repeller, as defined for the alternative task in the original meta-analysis, to an attractor here, and this reversal requires explanation. As noted in the Introduction, the meta-analytic data that supported the repeller status of difficulty emanated primarily from studies of the choice between playing easy and hard versions of the same game or of selecting among simple cognitive tasks, not from selections among the heterogeneous and complex task sets used in the current experiments, and these results were confirmed in Experiment 1. Also, in the original STOM model, the difficulty effect was found only for the alternative task and not for the OT. Indeed, Kool et al. (2010) reported that a more difficult OT had a longer “giving-up time”; that is, participants persisted longer with difficult tasks. Hence, the role of OT difficulty was ambiguous. In Experiment 2, it is clear that more difficult OTs are consistently more “sticky” in resisting switching away, no matter how difficulty was varied.
The absence of a priority influence is particularly intriguing given that this null effect was found in both experiments reported here and also replicates a prior finding in aircraft cockpit task-switching behavior (Raby & Wickens, 1994). It suggests that priority may be reflected more in how much effort is invested in a task while it is being performed than in how often it is switched to or how long it is stayed on. On the one hand, the null effect of priority observed here could reflect the fact that priority was not explicitly manipulated in Experiment 2; it was 50/50, and indeed, measures of visual attention and BORIS control activity were consistent with this mean allocation policy. On the other hand, participants clearly rated AutoCAMS higher in priority than BORIS (3.7 vs. 2.2), and there were substantial differences in perceived priority between subjects and conditions; yet in analyses reported elsewhere, including the priority factor in the STOM model did not improve model fit and actually decreased it somewhat (Sebok et al., 2013).
Our modeling here has focused almost exclusively on the focus of visual attention rather than on cognitive or task attention (as assessed through control activity). However, STOM is a model of task switching, not just visual switching. This exclusive focus on visual attention allocation arises in part because control activity in BORIS is harder to interpret across all conditions. Particularly in the autopilot condition, the arm often continued to move even in the absence of visual attention, and such continued movement sometimes occurred even in the manual condition, as subjects did not always adhere to the instruction to pause when attention went to AutoCAMS. The emphasis on visual attention is also consistent with the fact that in our simulation, both tasks are entirely visual and widely separated in visual angle; hence, where the operator is looking serves as a close proxy for what he or she is doing.
Finally, we note that in this particular case, when considerable supervision of AutoCAMS is required even when fault management is not engaged, glances to that task do represent the mindball as well as the eyeball, even though the former is not reflected in click rate.
Current status of STOM
The current status of the STOM model may be best represented by the evolution of the STOM equation: from I + S – D + P to its current version, I + S + D. This evolution is presented in Table 5, as the model has progressed from the original pair of meta-analyses (Wickens, Gutzwiller, et al., 2015; Wickens, Santamaria, et al., 2013) through the results of Experiment 1 to the current status from the results of Experiment 2. The emphasis in the table is on the change or consistency in the polarity of the attributes.
Evolution of Parameters of the STOM Model
Note. AT = alternative task; OT = ongoing task; STOM = strategic task overload management. The polarities of both interest and priority do not change as a function of whether the task is an OT or AT. The salience attribute is applied only to the AT. “–Polarity” means that higher values of the attribute make the task less attractive. “+Polarity” means that higher values make the task more attractive. “0 polarity” means that the task attribute value has no influence on attractiveness.
Table 5 indicates that, as described earlier, the STOM model has evolved somewhat in the AT difficulty and priority parameters. It was initially formulated on the basis of fairly basic laboratory game tasks in low-workload environments, and this evolution was required to accommodate data collected in higher-tempo, higher-workload environments with the operationally realistic tasks used here. In particular, in the original model, difficulty as an OT attribute was assigned neither a positive nor a negative polarity because of uncertainty in the data reviewed at that time. Now, with both experiments pointing to a positive polarity for an ongoing task, this ambiguity appears to be somewhat resolved, and we now argue that the positive polarity is more appropriate for the OT, creating a hysteresis effect with more difficult tasks: One may be less likely to choose them, but once initiated, they will be more switch resistant. Beyond improving model fit, the plausibility of such an assignment rests on the notion that more difficult tasks often induce greater engagement, and even cognitive tunneling, once they are initiated. Clearly, however, the difficulty issue remains less than fully resolved and awaits further research.
We note here that we exercised the analytic version of STOM (Wickens & Sebok, 2014), simply multiplying the weights, rather than the probabilistic discrete event simulation (DES) version, which runs the model multiple times with task choice probabilities proportional to the weights (Wickens, Gutzwiller, et al., 2015). With a sufficient number of fast-time simulation runs, the two versions converge on the same results. However, what the DES provides that the analytic equation model does not is the distribution of allocation times. The latter is critical because it provides a measure of the proportion of times that people are likely to stay on a given task longer than some critical safety-relevant limit, that is, the likelihood of cognitive tunneling. The DES model also allows the polarity of difficulty for a task to be reversed on the iteration in which it is chosen as an OT, a reversal revealed in the current research.
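The contrast between the analytic and DES versions can be sketched as follows, assuming, as the text describes, that on each iteration a task is chosen with probability proportional to its (positive) attractiveness weight. The weights, iteration count, and five-iteration "critical limit" are hypothetical; dwell times are counted in iterations rather than seconds; and the reversal of difficulty polarity when a task becomes the OT is not modeled here. This is an illustrative sketch, not the published STOM implementation.

```python
import random
from itertools import groupby
from typing import Dict, List, Tuple


def analytic_shares(weights: Dict[str, float]) -> Dict[str, float]:
    """Analytic version: expected share of iterations spent on each task."""
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


def simulate_des(
    weights: Dict[str, float], n_iterations: int, seed: int = 0
) -> Tuple[Dict[str, float], Dict[str, List[int]]]:
    """Probabilistic version: choose a task each iteration with probability
    proportional to its weight; return time shares and per-visit dwell times."""
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive to act as choice probabilities")
    rng = random.Random(seed)
    tasks = list(weights)
    choices = rng.choices(tasks, weights=[weights[t] for t in tasks], k=n_iterations)
    shares = {t: choices.count(t) / n_iterations for t in tasks}
    dwell: Dict[str, List[int]] = {t: [] for t in tasks}
    for task, run in groupby(choices):
        dwell[task].append(sum(1 for _ in run))  # consecutive iterations on task
    return shares, dwell


def p_exceeds(dwell_times: List[int], limit: int) -> float:
    """Proportion of visits lasting longer than `limit` iterations."""
    return sum(d > limit for d in dwell_times) / len(dwell_times)


weights = {"AutoCAMS": 12.0, "BORIS": 10.0}  # hypothetical attractiveness weights
shares, dwell = simulate_des(weights, n_iterations=100_000, seed=1)
print(analytic_shares(weights)["AutoCAMS"], shares["AutoCAMS"])
print(p_exceeds(dwell["AutoCAMS"], limit=5))  # share of visits beyond the limit
```

With many iterations the simulated shares approach the analytic w / Σw, while the dwell-time lists supply the distribution from which the probability of exceeding a limit can be estimated. In this memoryless sketch, dwell times are geometrically distributed. A version that adjusted weights depending on which task is ongoing, as the polarity reversal would require, would produce a different distribution, and this flexibility is what the DES version offers over the analytic one.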
One potential application of this work is its demonstration that a remote (not head-mounted) camera can support head tracking in future adaptive automation systems. The head-tracking system used in this experiment was unobtrusive, and it allowed us to identify, with reasonable accuracy, which system the participant was attending to at any point in time. Such a system might be applied in adaptive automation, in which the system infers operator behavior and understanding and offers additional support (increases the degree of automation) if it "perceives," or infers, that the operator needs additional help.
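The inference step in such an adaptive system can be sketched as follows: each head-yaw sample is assigned to the display on its side of a boundary angle, attention shares are accumulated, and an alert fires when a recent window of samples falls entirely on one display. The left/right layout, the 0° boundary, the window length, and the alert rule are hypothetical; the sketch does not describe the head tracker or software used in the experiment.

```python
from typing import Optional, Sequence


def classify_yaw(yaw_deg: float, boundary_deg: float = 0.0) -> str:
    """Map a head-yaw sample to a display (hypothetical layout: environmental left, arm right)."""
    return "environmental" if yaw_deg < boundary_deg else "robotic_arm"


def attention_share(yaw_samples: Sequence[float], boundary_deg: float = 0.0) -> dict:
    """Proportion of samples directed at each display."""
    counts = {"environmental": 0, "robotic_arm": 0}
    for yaw in yaw_samples:
        counts[classify_yaw(yaw, boundary_deg)] += 1
    n = len(yaw_samples)
    return {k: (v / n if n else 0.0) for k, v in counts.items()}


def tunneling_alert(yaw_samples: Sequence[float], window: int,
                    boundary_deg: float = 0.0) -> Optional[str]:
    """Return the display attended throughout the last `window` samples, else None."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = [classify_yaw(y, boundary_deg) for y in yaw_samples[-window:]]
    if len(recent) == window and len(set(recent)) == 1:
        return recent[0]  # hypothetical trigger for raising the degree of automation
    return None


samples = [-20.0, -18.5, 12.0, 15.0, 14.2, 16.8]
print(attention_share(samples))  # {'environmental': 0.3333333333333333, 'robotic_arm': 0.6666666666666666}
print(tunneling_alert(samples, window=4))  # robotic_arm
```

A deployed system would need smoothing and per-user calibration of the boundary; the sketch only shows that coarse head direction can drive such an inference when the displays are widely separated.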
Limitations
The primary limitation of the current modeling effort in Experiment 2 is that it did not employ true robotics experts or astronauts in training. Our decision to recruit a sample large enough to yield stable data for modeling precluded this approach. Now, with a stable model as a target, it would certainly be appropriate to replicate this work with a more skilled population.
A second limitation was the use of a head tracker as a proxy for visual attention allocation. As noted, we judged that the widely separated displays for the two tasks ensured a high correlation between head and eye direction. However, further work should be carried out with more precise eye trackers, as has been done in the single-task BORIS context by Wickens, Sebok, et al. (2015).
A third limitation is the absence, at this point, of a comparison of the STOM model predictions of the current data with the predictions made by the most appropriate alternative models of multitasking, particularly, threaded cognition (Salvucci & Taatgen, 2011).
Key Points
In Experiment 1, participants time-shared tracking, communications, resource management, and monitoring tasks, and analysis of switching between tasks revealed the roles of salience, difficulty, and interest in driving switches.
In Experiment 2, well-trained participants time-shared a realistic space environmental control task with robotic-arm manipulation. The former included periodic failures, some expected and some unexpected. The latter was manually controlled by half of the participants and supported by automated guidance for the other half.
Attention allocation between the two tasks was measured by a head tracker and also by the relative amount of control activity between them.
These attention allocation measures were strongly affected by the independent variables of robotic automation level, environmental control task phase (monitoring vs. fault management), and fault management support.
The attention allocation measures were predicted by the strategic task overload management (STOM) model of task-switching choice in overload, in which switches are based on relative task attractiveness, which is in turn based on rated task interest, difficulty, perceived priority, and salience. This model accounted for over 95% of the variance in visual attention allocation.
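The "variance accounted for" figure can be illustrated with a short helper, assuming it is the squared Pearson correlation between predicted and observed attention allocations across conditions. The choice of fit statistic is an assumption here, and the example values are illustrative, not data from the experiments.

```python
from typing import Sequence


def r_squared(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Pearson correlation between model predictions and observed values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)


# Illustrative values only, not data from the experiments:
print(r_squared([1, 2, 3], [2, 4, 6]))  # 1.0
print(r_squared([1, 2, 3], [1, 3, 2]))  # 0.25
```

Because r² measures linear association, it would not penalize a constant offset between predicted and observed percentages; a fit above 0.95 therefore indicates that the model orders and spaces the conditions well, not necessarily that it matches their absolute levels.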
Acknowledgements
This material is based upon work supported by NASA under Grant NNX12AE69G, technical monitors Dr. Brian Gore, Dr. Sandra Whitmire, and Dr. Jessica Marquez. We thank our NASA sponsors and many other NASA professionals who generously participated in interviews, demonstrations, and beta testing sessions. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of NASA. Robert Gutzwiller’s contributions were supported in part by a Department of Defense SMART scholarship through Space and Naval Warfare Systems Center Pacific. We also thank Tyler Scott for his help in data management for Experiment 1.
Christopher Dow Wickens is a professor emeritus of aviation and psychology at the University of Illinois and is currently a senior scientist at Alion Science and Technology, Boulder, Colorado, and professor of psychology at Colorado State University.
Robert S. Gutzwiller is a scientist with the Space and Naval Warfare Systems Center Pacific in San Diego. He received his PhD in cognitive psychology from Colorado State University in 2014.
Alex Vieane is a graduate student at Colorado State University. She received her BA in psychology in 2012 from California State University, Long Beach.
Benjamin A. Clegg is a professor of cognitive psychology at Colorado State University. He received his PhD in psychology in 1998 from the University of Oregon.
Angelia Sebok is a principal human factors engineer and program manager at Alion Science and Technology. She earned her MS degree in industrial and systems engineering from Virginia Tech in 1991.
Jess Janes is a senior at Colorado State University majoring in cognitive psychology.
