Abstract
Teaming permits cognitively complex work to be rapidly executed by multiple entities. As artificial agents (AAs) participate in increasingly complex cognitive work, they hold the promise of moving beyond tools to becoming effective members of human–agent teams. Coordination has been identified as the critical process that enables effective teams and is required to achieve the vision of tightly coupled teams of humans and AAs. This paper characterizes coordination on the axes of types, content, and cost. This characterization is grounded in the human and AA literature and is evaluated to extract design implications for human–agent teams. These design implications are the mechanisms, moderators, and models employed within human–agent teams, which illuminate potential AA design improvements to support coordination.
Introduction
Artificial intelligence systems, including those based on machine learning, have the potential to provide useful effort that can moderate human operator workload (Kaber & Endsley, 2004), enhance situation awareness (Gorman et al., 2006), and improve team performance outcomes (Kaber et al., 2001). However, the vision of systems including tightly coupled human and artificial intelligence that leverage their combined capabilities, as discussed by Licklider (1960) nearly 60 years ago, remains elusive. In fact, a significant proportion of research in human–agent teaming continues to explore the interaction between a human and an intelligent entity, rather than the integrated teams of humans and intelligent entities proposed in that landmark article. Recent performance increases and cost reductions in sensing, data transport, processing power, storage, and algorithm design (Brundage, 2016) are enabling artificial agents (AAs). The resulting AAs are capable of sensing their environment, applying this information to support decision making, and utilizing actuators to influence not only their environment but also other AAs and human teammates (Weiss, 2013).
The cognitive functions that AAs are addressing are increasingly abstract (Hare & Coghill, 2016) and dimensionally complex (Silver et al., 2017). However, general artificial intelligence is not considered imminent (Goertzel, 2014). Current AAs are assumed to be weak or narrow AAs (Kurzweil, 2005) that, while proficient in their specific task, lack generalized awareness and cognition regarding the world beyond their design. These narrow-focus AAs often address an individual subgoal within a hierarchical goal decomposition of a team’s activity (Miller et al., 2020). As each AA has a narrow focus, multi-agent systems are often comprised of heterogeneous AAs to support higher level goals (Weiss, 2013). The resulting systems of multiple AAs have large state and action spaces, allowing them to respond to a large number of environmental cues, even if individual AA capabilities remain narrow. However, these systems are limited to tasks that are stable over time, have clear and measurable goals, and can be characterized by a clear mapping of inputs to outputs (Brynjolfsson & Mitchell, 2017). Thus, successful deployment of AAs in most complex, real-world environments will require the AAs within a system to be teamed with one or more humans capable of innovating the application of these systems to solve more abstract challenges (Mercado et al., 2016; Rosenberg, 1982).
Following the design philosophy of adapting the design of AAs to support naturalistic human interaction, it is reasonable to design AAs to support team cognition. To facilitate such design, we must understand which aspects of team cognition (Mesmer-Magnus et al., 2017) are most important in human teams and should therefore be enabled in AAs. A critical feature of human team cognition is the use of communication to build the team cognition necessary to execute interdependent tasks (Salas et al., 2008). We understand that sharing the cognitive load to enable coordination with team members has a cost (Woods & Hollnagel, 2006). Therefore, it is incumbent upon us, as system designers, to clearly understand and design AAs to support appropriate coordination within human–agent teams. This includes permitting teammates to observe, predict, and when necessary, direct activities, as well as leverage common knowledge to control the costs of coordinated activity (Klein et al., 2004).
In this research, we assume that the coordination observed in human–human teams contributes to desired performance levels of human operators in complex situations. Taking this behavior as a design goal for interaction in human–agent teams, we focus on how to develop AAs to perform as a team member, as opposed to designing systems that force humans to adapt to the limitations of AAs. A necessary result of this perspective is the assumption that human–AA coordination can eventually synchronize action as effectively as human team members. This philosophy suggests that our goal is not to redesign human team member coordination. Instead, we assume that it is necessary to adapt AAs to support the available human coordination paradigm if we are to achieve high performing human–AA teams.
The goal of the current research is to review and define team coordination as it relates to human–agent teams. This paper begins by clearly placing human–agent teams in context. We define and characterize coordination in human–AA teams, including content, type, and cost. Analysis of the cost illustrates the importance of improving an AA’s ability to leverage implicit coordination to improve their capacity to act as a teammate with a human. We then discuss the means for designers to improve the coordination of human–AA teams.
Clarifying the Design Goal for Human–Agent Teams
Teams are required to accomplish any task of sufficient complexity or criticality that requires greater knowledge, skill, ability, work, or redundancy than a single operator can provide in the available timeframe (Cooke et al., 2013; Salas et al., 1992). In these environments, teams are formed with a sufficient number and diversity of teammates to provide adequate cognitive and physical capacity to overcome the complexity and criticality challenges. The team members not only bring diversified knowledge, skills, and abilities to the team, but also train to work together interdependently (Delise et al., 2010). The result is a team comprised of individuals with varied perspectives on the task, which are derived from each operator’s mental models of the task at hand as well as their mental models of their teammates’ ability to respond to task demands (Van den Bossche et al., 2011).
High-performing teams are thought to be comprised of diverse members who are committed to increasing their performance toward common outcomes. The team members collectively possess the skills necessary to address the task at hand, have the interpersonal skills (e.g., social sensitivity, emotional engagement, and communication patterns) necessary to perform as a team, and have the training to understand when to play specific roles within a team and the willingness to play them. At a minimum, these roles include creator, leader, and participant (Cheruvelil et al., 2014). As such, high-performing teams will select talented (Noe et al., 2011) and adaptable operators (Cox, 2017) from diverse disciplines (Kearney et al., 2009) and train them intensively (Delise et al., 2010) to focus on clear objectives (McComb et al., 1999). These team members take part in planning, execution, and feedback processes, often referred to as transition, action, and interpersonal processes (Marks et al., 2001), permitting information and plans to be exchanged during the transition and interpersonal processes rather than during more time-critical action processes.
The focus of high-performing teams often extends beyond the team members themselves to include examination and modification of policies and processes that impact their performance (Dickson et al., 2009), creation of domain-specific vocabulary and gestures (Woods & Hollnagel, 2006), as well as customization of supporting hardware and software systems (Cox & Szajnfarber, 2018). When employing supporting hardware and software, these teams often utilize operator creativity to take advantage of detailed control of the systems to extend the software and hardware functions beyond the designed system capacity with minimal design changes (Jacques & Strouble, 2010). As a result, they are able to adapt the system at a pace that outstrips the system development cycle (Cox & Szajnfarber, 2018).
Teams of AAs certainly can possess knowledge and skills that extend beyond those of the human operator, and thus these entities may be desirable members of future high-performing teams. However, the ability to coordinate individual adaptations toward a common goal, which is a capability that is often lacking in multi-agent systems, appears to be a key attribute for members of high-performing teams. Thus, it is important that we decide whether to view multi-agent systems as adaptable hardware and software systems to be modified by the human team members or whether we seek to develop true human–AA teams in which the AAs are capable of coordinating their behavior with other team members in response to the environment and the actions of teammates. The latter will require AAs that are capable of adapting to the needs of the human, as well as the environment, rapidly and reliably (Sycara & Lewis, 2004). While integrated user interfaces and agent transparency improve the operator’s knowledge of the agents, permitting improved coordination (Stowers et al., 2016), the agents typically lack insight into the operator’s understanding of the situation and task. Therefore, current AAs require explicit communication to coordinate (Klein et al., 2004) with their human teammates. Typically, this explicit coordination is designed to occur during the mission, and thus human–AA teams are unable to leverage the less explicit forms of communication and coordination applied in high-performing human–human teams to shift the cost of coordination from periods dominated by action processes to less time-critical periods that are dominated by transition and interpersonal processes (Chen et al., 2018). As a result, there are frequent references within the human–AA teaming literature to the need to design AAs to support implicit communication or coordination and intent inference (Espinosa et al., 2004; Riley, 1989). However, these terms are often imprecisely defined.
If we seek to design AAs for well-coordinated human–AA teams, we must understand coordination.
A Grounded Example
To support our discussion of coordination, it is useful to ground this discussion in a plausible scenario. Here we consider a notional human–AA team performing a surveillance mission using a remotely piloted aircraft (RPA). The team includes a human pilot, responsible for flying the aircraft. The pilot is supported by an autopilot AA responsible for controlling the aircraft, reducing the physical demands on the pilot and enabling them to focus on higher level activities. A human sensor operator (SO) is responsible for controlling the video sensors to perform surveillance. The SO is supported by a tracking AA, which is responsible for controlling a gimballed camera to keep a specific target in the sensor view. The team is tasked with tracking a high value target (HVT) who is initially in a house on the outskirts of a large city. The autopilot is flying the aircraft in a figure eight pattern several kilometers away from the house at the designated altitude, accounting for the wind. The tracking AA is keeping the door of the house in view as the aircraft maneuvers. The human crewmembers are monitoring other intelligence information and maintaining situation awareness of an approaching weather system, while watching for the HVT to leave the house.
Eventually, the HVT emerges, mounts a motorbike, and departs. A flurry of activity ensues: the sensor operator manually acquires the target and resets the tracking agent on the motorbike. The SO communicates to the pilot: “Target is moving down the highway.” At highway speeds, the SO would be hard pressed to keep the HVT in view were it not for the control precision of the tracking agent. The pilot projects the HVT’s likely destination based on intelligence information and direction of travel and decides how to position the aircraft in response. Shifting the flight pattern further up the highway, decreasing the altitude, and increasing airspeed, the pilot directs the autopilot to reposition the aircraft. The pilot communicates with the SO: “Don’t lose them. Switch to high contrast mode and zoom in to make sure we’ve got the right bike.” The SO responds: “Acknowledged. Be advised we should set up behind the target as they head into the city.”
At this point, the pilot realizes that the autopilot has flown in the direction opposite to the one intended because of the aircraft’s position relative to the new pattern. The autopilot determined that it did not have the control authority to turn toward the new pattern in time to meet the insertion point at the new airspeed. Therefore, it turned away to execute a loop and reach the insertion point correctly. To compensate for the wasted time, the pilot switches the autopilot to hold airspeed and altitude, while manually controlling the RPA through a hard turn back toward the target.
As the HVT heads into the urban canyon of the city, they approach an intersection. The SO zooms out to ensure they capture any direction the HVT turns, while the pilot positions the aircraft beside and slightly ahead of the intersection. As the HVT turns left, the SO zooms in and the pilot repositions the aircraft behind the HVT. The team has done this hundreds of times since they graduated training and they will see the mission through to the end.
Note that within this team, the AAs possess capabilities that extend the knowledge and skills of the human operators. The autopilot can maintain controlled flight, freeing the pilot to attend to higher level goals. Similarly, the tracking AA can adjust the gimbal more rapidly and accurately than the SO to maintain the HVT within the camera’s view. However, our goal is to understand the design of these AAs to support appropriate coordination with their human team members.
Types of Coordination Mechanisms
Salas et al. (2008) note that a critical feature of human teams is the communication employed to accomplish the team cognition necessary to execute interdependent tasks. We will thus refer to coordination as a cyclical communication process, verbal or nonverbal, which enables synchronized actions of teammates who are working on interdependent tasks. Based on this definition, while communication that supports coordination may occur within transition or interpersonal processes, coordination occurs during action processes. We begin by examining the types of coordination mechanisms to clarify the differences between coordination and communication.
Studies of the differences in, and development of, human team coordination have found that coordination behaviors can be identified as explicit or implicit (Strack et al., 2011; Entin & Serfaty, 1999). These types have been mapped orthogonally to other characterizations of coordination (Kolbe et al., 2013). Explicit coordination behaviors or mechanisms are identified by many different terms (Bolici et al., 2016; Butchibabu et al., 2016; Espinosa et al., 2004). For this discussion, explicit coordination refers to communication directly or solely focused on managing dependencies and synchronizing actions. Implicit coordination is described as dependency management without dedicated or purposeful communication regarding synchronization (Butchibabu et al., 2016). Viewed in contrast, explicit coordination involves communication for the single purpose of coordinating activity. Implicit coordination involves communication that is multipurpose, providing context, which can be interpreted to imply activities necessary for coordination. This does not imply that no communication is occurring for implicit coordination; rather, the team cognition is enabling existing communication and perception to be extended and used for coordination as well as its original purpose (Espinosa et al., 2004).
To clarify the distinction between explicit and implicit types of communication and coordination, we propose the following definitions.
Explicit communication: the purposeful exchange of a discrete message (e.g., sensor operator to pilot “Target is moving down the highway”).
Explicit coordination: explicit communication the primary purpose of which is synchronization of action in a team (e.g., pilot tells the sensor operator “Don’t lose them. Switch to high contrast mode and zoom in to make sure we’ve got the right bike.”).
Implicit communication: observable behaviors, other than explicit communication, that convey additional information (e.g., seeing the pilot take over manual control, the sensor operator begins increasing camera zoom to compensate for the increased distance of the HVT from the aircraft).
Implicit coordination: explicit or implicit communication the primary purpose of which is other than synchronization of action. Implicit coordination may involve implicit communication (e.g., pilot observing the SO suddenly move in response to increased attention as the HVT exits the house) or explicit communication (e.g., sensor operator saying “Target is moving down the highway”). Note that the pilot alters their attention and begins taking steps to reposition the aircraft in response to the sensor operator’s comments and actions in the situation.
The proposed relationship between implicit and explicit forms of communication and coordination is shown in Figure 1. Implicit coordination occurs in response to implicit, as well as explicit, communication. Explicit coordination is strictly the result of explicit communication.

Figure 1. Explicit versus implicit communication and coordination.
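The four definitions above can be read as a small decision rule. The sketch below is one possible encoding, not a method taken from the literature reviewed here: the names `Channel`, `Communication`, `primarily_for_synchronization`, and `classify` are hypothetical, and the rule assumes the communication has already been judged to contribute to synchronized action.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    EXPLICIT = "explicit"  # purposeful exchange of a discrete message
    IMPLICIT = "implicit"  # observable behavior other than a discrete message


class CoordinationType(Enum):
    EXPLICIT = "explicit coordination"
    IMPLICIT = "implicit coordination"


@dataclass(frozen=True)
class Communication:
    channel: Channel
    # Hypothetical flag: True when the primary purpose is synchronizing action.
    primarily_for_synchronization: bool
    description: str = ""


def classify(comm: Communication) -> CoordinationType:
    """Explicit coordination requires an explicit message whose primary
    purpose is synchronization; everything else that supports synchronized
    action is treated as implicit coordination."""
    if comm.channel is Channel.EXPLICIT and comm.primarily_for_synchronization:
        return CoordinationType.EXPLICIT
    return CoordinationType.IMPLICIT


# Examples drawn from the surveillance scenario.
pilot_command = Communication(
    Channel.EXPLICIT, True, "Switch to high contrast mode and zoom in"
)
so_report = Communication(
    Channel.EXPLICIT, False, "Target is moving down the highway"
)
manual_takeover = Communication(
    Channel.IMPLICIT, False, "Pilot takes over manual control"
)

for example in (pilot_command, so_report, manual_takeover):
    print(f"{example.description}: {classify(example).value}")
```

In this reading, the sensor operator's report is an explicit communication that nonetheless yields implicit coordination, because its primary purpose is to inform rather than to synchronize.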
Also important to this discussion is the fact that the communication can vary from abstract concepts to concrete data depending upon the desired level of control. This fact is recognized in system analysis models such as Rasmussen’s ends-means hierarchies (Rasmussen et al., 1994), natural language processing models (Tomai & Forbus, 2009), and Geddes’ operator intention model (Geddes, 1989). Therefore, while an explicit command pertaining to a high-level goal or abstraction layer provides explicit coordination, it also potentially implies numerous coordination activities. For example, the pilot’s explicit command to “Switch to high contrast mode and zoom in to make sure we’ve got the right bike” is a lower-level command than the command of “Don’t lose them.” The latter requires the SO to determine appropriate sensor settings to improve target custody, whereas the former provides the SO with the proper sensor settings. It might also be noted that to provide a command at a lower level of abstraction requires the sender have more knowledge of the situation than is required to provide a command at a higher level of abstraction.
Multiple studies have demonstrated that coordination shifts from highly explicit toward implicit as teams become familiar with each other and the task (Butchibabu et al., 2016; Espinosa et al., 2004; Mathieu et al., 2000; Rico et al., 2019). Thus, the knowledge gained through experience with explicit coordination events enables implicit coordination by improving each team member’s understanding of the team’s response to a set of conditions. The literature indicates that humans adapt their coordination type depending on their perception of the team’s cognition. If they observe that the team is managing interdependencies well and the individual teammates perceive the team behaving consistently with their expectations, implicit mechanisms are more prevalent. Conversely, if the team is acting in an unsynchronized manner or an individual teammate does not perceive that the team behavior is matching their expectations, the teammate will likely revert to explicit coordination mechanisms to improve management of interdependencies and refine team cognition. In the example, the autopilot does not behave in a coordinated manner when it turns away from the target. In response, the pilot explicitly commands the desired heading while leaving the autopilot in control of altitude and airspeed. Importantly, this knowledge of coordinating activity is applied by each team member to select a set of coordinated actions. This is accomplished without the sender issuing specific and highly detailed commands to each team member indicating the actions each team member should take to behave in a coordinated fashion. Coordinating activity is differentiated from general broadcast methods employed by many software agents that send their messages without regard to who the recipients are or what they need to synchronize their actions.
Beyond the type of coordination, the characterization of the messages employed to coordinate is important to examine. We consider the content of coordination as independent from the implicit or explicit type of the coordination; however, understanding what is coordinated among a team helps design better human–AA teams.
Content of Coordination
Several authors within the human factors, cognitive science, and artificial intelligence literature have addressed coordination. A summary of the categories of coordinating communication is provided in Table 1. Klein and colleagues lay out the precursors to coordination and describe the mechanics of what they term the “Choreography of Joint Activity” in which parties signal changes in the phase of activity using coordination devices (Klein & Bradshaw, 2005). They examine the mechanics of how humans execute the coordination cycle, but there is little discussion of the content of coordination, beyond the need for planning the phases and the need to signal changes in phase. Malone and Crowston (1994) developed coordination theory with a focus on the types of dependencies (e.g., resources, task assignment, simultaneity, etc.) that the team is managing. Their taxonomy addresses the foci of coordination and the extant processes particular to each of these foci. Within this body of work, Crowston (1991) develops a model of coordination activity using four language constructs, including information, requests, information requests, and proposed actions. Peterson and Bailey investigated air traffic controllers and developed a domain-specific taxonomy of topics as well as what they termed “communication formats.” These communication formats include question, answer, statement, command, and command answer (Peterson et al., 2001). Recent dynamic modeling research divided coordination into information, negotiation, and feedback, which was applied to the specific human–human team paradigm being studied (Gorman et al., 2010).
Table 1. Coordination Categories Described in the Literature
Within the agent design literature, speech act theory has provided the foundation for AA communication languages like the Knowledge Query and Manipulation Language (KQML) and the Foundation for Intelligent Physical Agents’ Agent Communication Language (FIPA-ACL; Vaniya et al., 2011). Vaniya et al. (2011) characterize KQML as having language constructs that include multi-response, response, generic informational, capability definition, and networking. The performatives of FIPA-ACL are characterized by Huget as passing information, requesting information, negotiating, performing actions, and error handling (Huget, 2014). These techniques are heavily tailored to their domains, which do not explicitly cover the human–AA team. However, these studies provide insight since they pertain to human–human teams working with machines or the communication between the AAs.
One can also discuss coordination in terms of speech act theory (Searle, 1969). However, the theory is designed to handle any kind of natural language communication. Coordination in a human–AA team may seldom be expressed in natural language due to the user preference for direct input human-machine interfaces (HMIs) over linguistic methods (Cox et al., 2008; Noyes & Starr, 2007). Therefore, it is necessary to characterize the content of coordination in a human–agent team.
Coordination is fundamentally a type of communication and so it can be expressed in the Shannon-Weaver model of communication (Shannon, 1948). Although it is recognized that communication is clearly cyclic between teammates influenced by their environment (Beer, 1995; Demir et al., 2020), applying the Shannon-Weaver model allows us to examine the categories of coordination that may be passed from the teammate who is behaving as a sender at any moment in time. This viewpoint is particularly useful when considering coordinating activities of a team leader, such as the pilot in our example. However, it applies to other team members as well. The temporal frame for a coordination event is the time required for the formulation of the communication in the sender, the transmission of the communication through a medium to the receiver, and the receiver’s comprehension of the communication. Drawing on the above sources for inspiration, some categories of coordination are intended to direct the receiving agent’s behavior while other categories are not. Further, some coordination categories are intended to be negotiable between the agents while others are not. Table 2 provides a proposed list of coordination content. The coordination content is arranged by whether it is posed to encourage negotiation and whether executable directions are provided by the sender.
Table 2. Coordination Content Structured by the Sender’s Expectations of the Receiver
Information is situated data that enhances the awareness of the receiver, including directing the receiver’s attention. An example is our SO’s statement that “Target is moving down the highway.” Note the sender expresses this information in a manner that is concisely comprehensible to the receiver. Information is not negotiable because the sender believes it to be true, to some confidence level. If the receiver does not believe the information to be true, they can respond with corrected information based on their beliefs. The sender does not expect the receiver to perform a specific action based on the information. However, it is expected that the receiver will react appropriately given the assumedly agreed upon information (Gorman et al., 2010). By its broad nature, information provides content that supports both explicit and implicit coordination.
Commands are non-negotiable directives that the sender expects the receiver to carry out as soon as possible. For example, our pilot’s statement “Don’t lose them. Switch to high contrast mode and zoom in to make sure we’ve got the right bike.” The receiver may acknowledge the command by providing an information response indicating acceptance, execution, or completion. Commands cover a spectrum of content from fully specifying the exact activities for execution, to stating a broad goal, allowing the receiver to formulate a plan and decide implementation details (Peterson et al., 2001). In practice, commands are more frequently explicit coordination.
Requests are negotiable directives consisting of actions or tasks geared to achieve goals. The sender expects the receiver to carry out the request, but the receiver may reject or delay the implementation of the request. An example is our SO’s statement: “Be advised we should set up behind the target as they head into the city.” The receiver is expected to acknowledge the request and an information response will be provided, signaling to the sender the receiver’s decision to act, or not act, on the request (Klein & Bradshaw, 2005). While the negotiability implies a preference for explicit coordination, mutual observability between team members can enable implicit coordination as the receiver’s actions can be interpreted as acceptance or rejection of the request.
The remaining three taxa are all negotiable and are not directive. As such the information within these taxa requires a common understanding between the sender and the receivers within the team. The first of these are plans.
Plans are future-looking sequences of actions or tasks geared to achieve goals. A team may develop a library of plans during the transition process that can occur prior to execution. The development of plans can reduce cognitive and communication load during the action phase. Alternatively, these plans may be developed dynamically during execution. These plans answer the questions of “what,” “when,” and occasionally “how” actions should be taken by a team member to achieve the agreed objective. A directive to enact a specific plan is command content that references a plan shared between the teammates; it is not plan content (Huget, 2014).
Responsibilities are negotiable assignments of authority and answer the question of “who” is to conduct each activity. It is possible to have multiple operators and AAs responsible for the same task or with the same authority, in which case they should be backing each other up (Cummings, 2014). Again, communication of responsibilities can occur during or outside the action phase.
Expectations are the manner and procedures that the sender desires the receiver to use during execution. When commands or plans do not fully specify “how” to execute a task, the receiver relies upon shared expectations to determine the timing and actions that will produce the desired execution.
These negotiable, nondirective coordination content taxa are essential for coordination. However, content within this category, contrary to items in the directive category, is not required to be exchanged during the action processes and thus does not explicitly appear within the grounding example. Instead, the items in these categories can be explicitly communicated during transitional phases and refined during interpersonal phases. Further, it is frequently information within this category that is likely to be encoded in team mental models, permitting it to transition from explicit to implicit coordination.
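The six content taxa described above can be summarized by the two properties used to arrange them: whether the content is negotiable and whether it is directive. The following sketch is an illustrative encoding assembled from the prose descriptions; the names `Content`, `Traits`, `TRAITS`, and `deferrable` are assumptions introduced here and do not appear in the source.

```python
from enum import Enum, auto
from typing import NamedTuple


class Content(Enum):
    INFORMATION = auto()
    COMMAND = auto()
    REQUEST = auto()
    PLAN = auto()
    RESPONSIBILITY = auto()
    EXPECTATION = auto()


class Traits(NamedTuple):
    negotiable: bool  # may the receiver push back or counter-propose?
    directive: bool   # does the sender expect a specific action?


# Assignments follow the prose description of each taxon.
TRAITS = {
    Content.INFORMATION: Traits(negotiable=False, directive=False),
    Content.COMMAND: Traits(negotiable=False, directive=True),
    Content.REQUEST: Traits(negotiable=True, directive=True),
    Content.PLAN: Traits(negotiable=True, directive=False),
    Content.RESPONSIBILITY: Traits(negotiable=True, directive=False),
    Content.EXPECTATION: Traits(negotiable=True, directive=False),
}


def deferrable(content: Content) -> bool:
    """Hypothetical helper: negotiable, nondirective content need not be
    exchanged during action processes and can be shifted to transition
    or interpersonal processes."""
    traits = TRAITS[content]
    return traits.negotiable and not traits.directive


deferrable_taxa = [c.name for c in Content if deferrable(c)]
print(deferrable_taxa)  # ['PLAN', 'RESPONSIBILITY', 'EXPECTATION']
```

A designer might use such a lookup to decide which exchanges an AA should support before a mission rather than during time-critical action.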
Notably, the categories of coordination shown in Table 2 are a subset of the categories of coordination shown in Table 1. Specifically, the categories feedback, command answer, capability-definition, performing action, and error handling are not present in Table 2. Each of these categories requires an expanded temporal frame that includes not only formulation, transmission, reception, and comprehension of the coordinating activity but also response formulation, transmission, reception, and comprehension. Expansion of the temporal frame in this manner captures the fact that coordination requires cyclic communication, but imposes the assumption that coordination is closed with a single feedback loop. Definition of coordination in one direction with taxa that are relevant to both the feed forward and feedback portions of the coordination loop permits coordination to be described with an indeterminate number of coordinating loops.
However, regardless of type or content, the organization, cognition, and action of synchronizing execution between team members requires cognitive resources. As system designers of human–AA teams, it is imperative that we address the cost of coordination in the systems we design.
Cost of Coordination in Teamwork
Several authors have addressed the cost of coordination in team cognition. Klinger and Klein (1999) suggest that the cost of coordination increases with the addition of team members such that the marginal value of adding additional team members decreases with each increase in team size. As this cost is readily apparent when viewing humans working in teams (Macmillan et al., 2004), it is important to understand this cost and the effect that including additional AAs within a team will have on coordination costs.
These coordination costs have been classified into four categories: synchronization overhead, the time one entity spends waiting for another entity to complete a prerequisite task before beginning its task; communication overhead, the effort required to manage a handoff; redirection overhead, the time spent following an out-of-date plan after a new plan is signaled but before all entities understand the change; and diagnosis overhead, the additional burden spent diagnosing a problem that occurs as a result of interrelated activity (Klein & Bradshaw, 2005; Schaefer et al., 2016). Klein and colleagues expand on these costs by listing the activities required to support coordination (Klein & Bradshaw, 2005). Among these activities are communication, monitoring, and feedback activities, some of which might be the responsibility of a frequent sender, such as a team leader, and some of which are clearly delegated to frequent receivers, that is, the team members. Each of these lists of costs builds upon the work of Clark and Brennan (1991), who provide a list of costs for constructing common ground. A review of this list clearly illustrates that some of these costs, such as formulation and production costs, are borne by the sender. Other costs, such as reception, understanding, and delay costs, are primarily borne by the receiver. Finally, many of the costs, including start-up, directing attention, asynchrony (e.g., interruptions), change, display, fault, and repair costs, are shared between the entities.
Coordination can occur across a spectrum from fully explicit (i.e., all coordination is explicit, and messages must be complete) to fully implicit (i.e., a sender and receiver conduct activities without explicit communication). Further, the cost of any given coordination exchange may vary significantly in magnitude for different entities within a team. As a result, we offer the construct in Table 3 to explore the cost of the communication required for coordination as a function of team member role (i.e., sender or receiver), communication type, and communication action. As shown, the cost to each team member is depicted in the two right columns of the table. The two major rows of the table indicate the extremes of coordination type, that is, fully implicit or fully explicit. Each of these two rows is further divided into three types of communication action, namely activity formulation, communication formulation/understanding, and communication production/reception. The activity formulation stage indicates which team member is responsible for interpreting the coordination content in the current context to create the sequence of activities the receiver is to conduct to maintain coordination. The following two rows represent the cognition necessary to support communication of these activities between the sender and receiver, as motivated by Clark and Brennan (1991).
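The four overhead categories above can be recorded as a simple data structure when instrumenting or simulating a team. The following is a minimal sketch, assuming that each overhead can be expressed in a common unit such as seconds; the class name `CoordinationOverhead`, its methods, and the numeric values are hypothetical and are not drawn from the cited literature.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CoordinationOverhead:
    """Overheads for one coordination exchange, in one common unit (assumed: seconds)."""

    synchronization: float  # waiting for another entity to finish a prerequisite task
    communication: float    # effort required to manage a handoff
    redirection: float      # time spent following an out-of-date plan
    diagnosis: float        # burden of diagnosing problems caused by interrelated activity

    def total(self) -> float:
        """Sum of the four overhead categories."""
        return sum(asdict(self).values())

    def dominant(self) -> str:
        """Name of the category that contributes the most overhead."""
        parts = asdict(self)
        return max(parts, key=parts.get)


# Hypothetical values for a single handoff, for illustration only.
handoff = CoordinationOverhead(
    synchronization=4.0, communication=1.5, redirection=0.0, diagnosis=0.5
)
print(handoff.total(), handoff.dominant())
```

Keeping the categories separate, rather than logging a single cost figure, lets a designer see which overhead a given mechanism actually reduces.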
Relative Cost of Communication Required for Coordination as a Function of Team Member Role, Communication Type, and Communication Action
These include the cost necessary to either formulate or understand the content of the communication and the costs necessary to produce or receive the communication. The values within the cells of this matrix indicate an estimate of the relative cost borne by either the sender or receiver. These costs include at least the consumption of cognitive resources by the sender or receiver as well as the time required to conduct each of the actions. Each of these costs results in the loss of potential to attend to alternative activities within the work domain. These relative costs, while likely continuous, are estimated as either full, partial, or none within the cells of this table. Additionally, no effort is made to differentiate the relative costs of the three communication actions.
In the case of fully explicit coordination, the sender must formulate all activities for each team member, formulate the communication of these activities, and produce this communication, for example, the pilot specifying the location, shape, altitude, and airspeed of the flight pattern for the autopilot. The receivers must receive this communication, understand the activities to be performed, and of course execute the activities. Importantly, at the extreme, fully explicit coordination requires the sender to formulate each team member’s sequence of activities at the lowest possible, that is, most detailed, level of control. Further, the communication must be formulated at extreme detail and transmitted, placing the full or largest possible burden on the sender. The receiver must receive the full set of communication placing the full burden on the receiver for this action but as the information is fully specified, the receiver does not have to spend significant resources to understand the communication. Instead, the receiver’s only requirement is to translate the communicated steps directly to actions, for example, the autopilot taking the most direct route to a correct flight pattern insertion, regardless of the impacts to other team members. Therefore, the receiver is not required to formulate any actions as would be required to translate commands at a high level of abstraction to actions at a lower level of abstraction.
Fully explicit coordination is exceedingly rare in human–human communication. In the example, when the pilot instructed the sensor operator to “Switch to high contrast mode and zoom in to make sure we’ve got the right bike,” the pilot was explicitly coordinating specific sensor settings. The pilot expects the sensor operator to understand this communication and to use their knowledge of the system implementation to formulate a set of activities, which includes setting the zoom to a level appropriate for the target size, maneuvering, and speed; characterizing the target to assess and maintain positive identification; and setting the sensor image parameters to achieve and maintain the positive identification. It is more common for practiced human–human teams to coordinate implicitly, with the sensor operator immediately beginning to adjust the sensor settings as the pilot begins taking steps, which signals the transition from the static surveillance phase to a more active moving target tracking phase.
As the coordination becomes more implicit, the sender can begin to formulate and communicate at a more abstract level, for example by communicating the use of a plan. At the fully implicit extreme, the sender only formulates and communicates information critical to coordination. During formulation, the sender can be less precise, reducing the burden for communication formulation, and the communication is abbreviated. In the example, when the sensor operator zooms out as the target approaches an intersection, the pilot, observing the sensor operator’s action, positions the aircraft to be ready if the target turns down a narrow side street without any explicit communication. In implicit coordination, the sender incurs a much smaller cost for formulating and communicating coordinating activities than during fully explicit coordination. As there is less information to receive, the receiver incurs only a part of the cost of reception when compared to the full cost they incurred during explicit communication. As the level of detail in coordinating communication is reduced, the receiver utilizes context information and knowledge from their mental model of the task and team cognition to create an understanding of the communication and its implications for actions. The receivers must then translate their understanding of the communication and environmental context to formulate their own activities. As such, the demands for understanding the communication and formulating activities increase for the receiver.
Returning to the earlier discussion of the lists of the costs associated with coordination from the literature, each cost list focuses primarily upon the costs associated with communication or actions taken by other teammates. However, Clark and Brennan (1991) include the cost of directing attention, which is associated with ensuring that teammates are aware of environmental cues used to trigger changes in future activities. It is rational to partially exclude the cost of perceiving environmental cues from the cost of coordination, as this cost is also borne by individuals interacting within an environment. However, plans or other coordinating information can include environmental events to trigger changes. Thus, the activity required to observe these environmental events also plays a significant role in coordination. It affects the coordination behavior of the team and in this framework is a critical step in understanding the communication. Therefore, it is rational that the costs associated with environmental awareness include understanding of the team member’s situation, including whether a team member is able to perceive the environmental cues important to coordination.
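The shift in burden described in the preceding paragraphs can be made concrete with a small lookup structure. The sketch below is an assumption-laden reading of the prose only: the cell values are inferred from the text (for example, "does not have to spend significant resources" is encoded as partial) and are not copied from Table 3. The names `Cost`, `RELATIVE_COST`, `burden`, and `heavier_role` are hypothetical, and summing ordinal levels is a crude aggregate, since the text makes no attempt to weight the three communication actions against each other.

```python
from enum import IntEnum


class Cost(IntEnum):
    """Ordinal cost levels used in the text: none, partial, or full."""
    NONE = 0
    PARTIAL = 1
    FULL = 2


# Cell values are an interpretation of the prose, not a copy of Table 3.
RELATIVE_COST = {
    "explicit": {
        "activity formulation": {"sender": Cost.FULL, "receiver": Cost.NONE},
        "communication formulation/understanding": {"sender": Cost.FULL, "receiver": Cost.PARTIAL},
        "communication production/reception": {"sender": Cost.FULL, "receiver": Cost.FULL},
    },
    "implicit": {
        "activity formulation": {"sender": Cost.PARTIAL, "receiver": Cost.FULL},
        "communication formulation/understanding": {"sender": Cost.PARTIAL, "receiver": Cost.FULL},
        "communication production/reception": {"sender": Cost.PARTIAL, "receiver": Cost.PARTIAL},
    },
}


def burden(coordination_type: str, role: str) -> int:
    """Unweighted sum of ordinal cost levels for one role (a crude aggregate)."""
    return sum(int(cell[role]) for cell in RELATIVE_COST[coordination_type].values())


def heavier_role(coordination_type: str) -> str:
    """Role carrying the larger aggregate burden for a coordination type."""
    sender = burden(coordination_type, "sender")
    receiver = burden(coordination_type, "receiver")
    return "sender" if sender >= receiver else "receiver"


for kind in ("explicit", "implicit"):
    print(kind, burden(kind, "sender"), burden(kind, "receiver"), heavier_role(kind))
```

Under these assumed values, the aggregate burden falls on the sender at the explicit extreme and on the receiver at the implicit extreme, which mirrors the qualitative argument in the text.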
Table 4 provides further exploration of the cost of coordination due to each team member’s situation. Similar to Table 3, this table is divided into four quadrants with columns representing the costs incurred by the sender and receiver and rows representing fully explicit and fully implicit coordination. Each quadrant is further divided into a two-by-two matrix.
Cost of Coordination Due to Understanding Team Member’s Situation as a Function of Team Member Role and Coordination Type
The columns of the matrix designate the costs incurred by each team member, which are associated with constructing their own understanding of the situation or constructing their understanding of their teammate’s (i.e., the receiver’s) situation. To coordinate effectively, there are elements of team cognition that involve the sender bearing a cost to understand the receiver’s understanding of the receiver’s situation, and the receiver’s understanding of the sender’s situation. The fact that a team member must not only form their own understanding of their and their teammate’s situation but also project their team member’s understanding of these situations may not be initially obvious. The need for the latter is discussed in an incident reviewed by Lee et al. (2015) in which a pilot, during a climb to a higher altitude, commanded the autopilot to descend to a lower altitude based on clearance from air traffic control. The autopilot, instead of descending, entered a vertical speed hold mode and continued the climb, as it was programmed to do in this specific situation. The pilot clearly understood their own situation and the receiver’s situation. However, the pilot failed to track the receiver’s understanding of its own situation until the aircraft violated the air traffic control clearance. This resulted in a miscommunication that was fortunately resolved without incident. However, this example illustrates not only the need for the sender to understand their own and the receiver’s situation, but also the value of the sender’s projection of the receiver’s understanding of its situation. While this example is specific to human–AA communication and might be attributed to poor AA design, similar communication errors occur in human–human communication when an individual fails to appreciate their team member’s understanding of the situation (Marstall et al., 2016).
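The three levels of understanding in the incident above (the sender's own situation, the receiver's situation, and the sender's projection of the receiver's understanding) can be kept distinct in a small record. This is a minimal sketch, assuming situations can be summarized as short labels; the class `SenderView`, the function `projection_gap`, and the label strings are hypothetical illustrations rather than details reported by Lee et al. (2015).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SenderView:
    """What a sender holds in mind when coordinating with one receiver."""

    own_situation: str             # sender's understanding of their own situation
    receiver_situation: str        # sender's understanding of the receiver's situation
    projected_receiver_view: str   # sender's projection of the receiver's own understanding


def projection_gap(sender: SenderView, receiver_actual_view: str) -> bool:
    """True when the sender's projection differs from the receiver's actual understanding."""
    return sender.projected_receiver_view != receiver_actual_view


# Hypothetical labels loosely modeled on the climb/descend incident.
pilot = SenderView(
    own_situation="cleared to descend",
    receiver_situation="autopilot engaged during climb",
    projected_receiver_view="descending to commanded altitude",
)
autopilot_actual_view = "vertical speed hold, continuing climb"

print(projection_gap(pilot, autopilot_actual_view))
```

In this sketch the first two fields can both be correct while the gap still exists, which is the distinction the incident is used to illustrate.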
As shown in Table 4, when operating in a fully explicit paradigm, the sender must remain aware of their own and their teammates’ situations to guide activity formulation as they must decide which team member is in the best situation to perform each required activity. Additionally, they must be aware of their teammate’s understanding of their own situation to understand if their teammate is likely to execute the actions once they are explicitly assigned. For our RPA example, the autopilot required fully explicit communication to fly the aircraft as desired. Since the aircraft position relative to the new flight pattern was insufficient for a turn toward the flight pattern it did the necessary thing and turned away to reach the new pattern correctly. The pilot, as the sender, did not have a clear understanding of the autopilot’s understanding, as the receiver, of the autopilot’s situation. In this situation, the receiver is not required to understand anything about the sender as they will be given exact instructions, for example, coordinates, shape, altitude, and airspeed. A human receiver is likely to maintain situation awareness, but this awareness is minimally required in the extreme where they will be fully instructed on how to behave.
As communication becomes more implicit, responsibility for situation awareness shifts to the receivers. As the receivers must now understand communication from the sender, they must be aware of the sender’s state (e.g., the pilot, as receiver, knowing the sensor operator is tracking a specific target that may choose to turn at the intersection) and their own situation (e.g., the pilot understanding how their positioning of the aircraft impacts the occlusion of the target in an urban canyon). Although this shift reduces the cognitive load on the sender when they are coordinating with a single team member, the reduction in load is larger when the sender is the team leader and the team includes multiple receivers.
To summarize the information in Tables 3 and 4, explicit coordination typically consumes predominantly the cognitive resources of the sender, who must formulate and communicate an explicit command or request to each team member (Woods & Hollnagel, 2006). Implicit coordination distributes cognitive demand to the receivers. The receivers must then incur the obvious perceptual channel load (verbal, auditory, haptic, or visual) to gather information that serves as cues to the sender’s situation as well as their own. Further, they must execute cognitive processes to abstract the available information, understand the utility as it relates to the context, and formulate the appropriate coordinating activity (Endsley & Kiris, 1995). Implicit coordination reduces the real-time communication load on the team by multi-purposing a given communication event from a sender to permit each teammate to select actions that will be coordinated with the actions of other teammates based on their understanding of each teammate’s situation and their assessment of each team member’s understanding of their current situation. Importantly, as teams develop the knowledge required to understand situations and the implications of these situations for coordinating activity, they often develop domain-specific terminology that encapsulates the description of system states with accepted plans (Woods & Hollnagel, 2006). The use of these terms then encapsulates or abstracts the situation and plans to a higher level, facilitating the use of less explicit communication.
The utility of explicit communication to improve team effectiveness can be found throughout the teaming literature. For example, a meta-analysis of 150 team communication studies found that team communication positively influences performance (Lacerenza et al., 2017). Explicit communication, and perhaps, more importantly, communication to support explicit coordination, becomes an issue in time-pressured environments where the pace of execution and volume of information restricts the time available to explicitly communicate coordinating information (Entin & Serfaty, 1999). In such environments, the highest performing teams demonstrate well-coordinated behavior with limited or even no communication by governing their actions from team cognition that allows implicit coordination such that each individual can anticipate the actions of their teammates (Burke et al., 2004; Lacerenza et al., 2017; Stout et al., 1999). For human–agent teams to improve their coordination during time-critical execution, it is important to examine the mechanisms, moderators, and models for implicit as well as explicit coordination.
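The effect of team size on the sender's load can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that explicit coordination costs the sender a fixed amount per receiver while an implicit cue is produced once and observed by every receiver; the linear form, the function names, and the numeric costs are all hypothetical and are not measurements from the literature.

```python
def sender_cost_explicit(n_receivers: int, cost_per_message: float) -> float:
    """Sender cost when a separate explicit message is formulated for each receiver."""
    if n_receivers < 0:
        raise ValueError("n_receivers must be non-negative")
    return n_receivers * cost_per_message


def sender_cost_implicit(n_receivers: int, cost_per_cue: float) -> float:
    """Sender cost when one observable cue serves all receivers (assumed constant)."""
    if n_receivers < 0:
        raise ValueError("n_receivers must be non-negative")
    return cost_per_cue if n_receivers > 0 else 0.0


def sender_saving(n_receivers: int, cost_per_message: float, cost_per_cue: float) -> float:
    """Reduction in sender cost obtained by moving from explicit to implicit coordination."""
    return (sender_cost_explicit(n_receivers, cost_per_message)
            - sender_cost_implicit(n_receivers, cost_per_cue))


# Hypothetical unit costs: an explicit message costs 3.0, an implicit cue costs 1.0.
for n in (1, 2, 4, 8):
    print(n, sender_saving(n, cost_per_message=3.0, cost_per_cue=1.0))
```

Under these assumptions the sender's saving grows with the number of receivers, while the costs of understanding and activity formulation move to each receiver and are not captured by this calculation.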
Designing Improved Coordination for Human–Agent Teams
Although efforts have been made to develop AAs that are capable of engaging in dynamic team interactions, it is recognized that further research is required in this domain (McKendrick et al., 2014; McNeese et al., 2018), and current systems rarely attempt to actively participate in teaming behaviors. Therefore, it is important for designers to understand means to improve the process of coordination in human–AA teams. Since coordination is a process performed by members of a team, it can be influenced by system design. We propose that future AA design must consider the three M’s associated with coordination as summarized in Table 5. These include coordination mechanisms, moderators of coordinating behavior, and models used to coordinate. Each of these concepts is discussed below with examples.
Mechanisms, Moderators, and Models to Improve Coordination in Human–Agent Teams
Mechanisms
The mechanisms of coordination are the tools used by the team to coordinate (Okhuysen & Bechky, 2009) and have been discussed previously in the literature and earlier in this paper. As discussed, preplanning and debriefing tools (Stout et al., 1999) support transition and interpersonal processes, which are important enablers of model construction to support coordination. Specifically, these tools should focus on permitting negotiable, nondirective coordinating content (i.e., plans, responsibilities, and expectations) of the mission to be encoded into the AAs or intelligent systems during transition processes and altered during debrief to facilitate implicit or abbreviated explicit communication, reducing the cost of coordination during the action processes.
Further, the literature discusses multiple mechanisms that permit the human to understand the information necessary to coordinate with an agent. These include common operating picture interfaces and shared information displays (Bolici et al., 2016), transparency-focused interfaces (Mercado et al., 2016), and standardized callouts (Miller, 2017; Stanton et al., 2019). Therefore, the design team should consider inclusion of similar mechanisms and potentially customization of these mechanisms during transition processes.
Although significant research has been conducted in building block technologies such as natural language understanding, gesture recognition, and human state estimation, the literature appears to provide limited discussion of technologies that provide mechanisms that aid the AAs in understanding the information necessary to coordinate with a human teammate. It is possible that the implementation of AAs to provide an interpretation of these human methods of communication may be useful in shifting communication from explicit toward implicit means, reducing the cost of communication between humans and intelligent systems.
In the context of human–AA teams, care must be taken in designing the mechanisms to support the coordination of multiple narrow-focused AAs with humans. As discussed in the cost section, explicit communication between a sender and receivers can place a significant burden on the sender, particularly if the system is composed of multiple receivers. Assuming the multiple AAs in an intelligent system serve as the receivers, the human sender can experience a significant coordination burden. Further, the narrow-focused AAs must be able to coordinate with each other and with the operators to avoid misunderstanding. The narrow AA may be capable of considering significantly more input information and accessing a larger number of alternatives regarding the narrow decisions it is capable of making than a human is capable of in a time-constrained environment. However, these AAs lack breadth. This mismatch in the depth and breadth between an individual AA and the human operator can result in misinterpretation of a coordination mechanism with the operator. This observation leads one to consider specialized AAs that serve to maintain a model of the operator and their likely beliefs to provide the mechanism for coordination with humans who are functioning within human–AA teams.
Moderators
There are many factors that may moderate the effectiveness of coordination behaviors and either enhance or degrade team performance. Examples from the human teaming literature include the ability and willingness of teammates to coordinate (Sukthankar et al., 2013), the flexibility of coordination mechanisms (Stachowski et al., 2009), and the reliability or resilience of teammates in common and uncertain situations (Wohleber et al., 2016).
As AAs lack a facility for inductive reasoning, these entities are likely to lack resilience, particularly when the environment is influenced by an intelligent adversary. This deficiency of AAs is likely to impose increased importance on the flexibility of coordination mechanisms, as it is likely the human may have to resort to the use of more explicit coordination under such situations to employ the AA appropriately. In the RPA example, when the autopilot turns the wrong way, the pilot explicitly commands the turn rate to achieve the appropriate heading. The failure of the autopilot to respond reliably led to an increase in explicit coordination, which consumed more pilot cognitive resources. This increased demand on the pilot can reduce the pilot’s ability to monitor and be aware of other situational considerations like weather.
It is also likely that the training program for operators will be required to facilitate the human side of these moderators and build more successful human–AA teams (Walliser et al., 2019). It was through leveraging shared training and experience that the pilot and sensor operator in the example executed a complex surveillance maneuver fluently using implicit coordination. This training will likely be necessary for human–AA teams as well. Real-world operations frequently push the human–AA team beyond the designer’s understanding, which will necessitate adapting coordination to succeed. This requirement also supports designing the mechanisms to adapt to changes in these moderators, particularly reliability or resilience.
Models
Finally, there are the internal models that the individual team members employ to manage their coordination behavior. Examples from the human teaming literature include mental models (Gervits et al., 2020; Rico et al., 2019), transactive memory systems (Mesmer-Magnus et al., 2017), checklists, and scripts (Geddes, 1997). These models may be of the goals, the work, the situation, or individual teammates, and must be updated dynamically. AAs are known to predict the outcome of their own behavior. However, based upon human–human teaming literature, it is only by employing a model to estimate a future state in response to the interaction of their own and anticipated human activities, that AAs can implicitly coordinate and reduce the cost of coordination to their human teammates. The pilot in the RPA example explicitly coordinates the high-level plan and expectations by saying “We can’t lose them. Switch to high contrast mode and zoom in to make sure we’ve got the right bike.” This synchronization of mental models between the crew enables the later surveillance at the intersection to be fully implicit coordination.
A specific subset of the teammate models found within the AA literature includes tracking the intent of an individual human teammate to better predict their actions and needs (Ahmad et al., 2016; Chen et al., 2016; Holtzen et al., 2016; Huber & Marvel, 2016; Kofler et al., 2015; McGhan et al., 2015; Periverzov & Ilieş, 2015; Vered et al., 2016). Based on the fact that the cost to the sender (e.g., human) is significantly higher when forced to employ explicit coordination with multiple receivers (e.g., AAs within a multi-agent architecture), we propose that the exploration of explicit intent models for coordination, particularly in high performing teams, may be a fruitful area of future research.
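One common way to track intent from observed actions is a Bayesian update over a small set of candidate intents. The sketch below illustrates that general idea only and is not claimed to be the method used in any of the works cited above; the intent labels, the likelihood values, and the function name `update_intent` are hypothetical.

```python
from typing import Dict


def update_intent(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes update: posterior(intent) is proportional to prior(intent) * P(observation | intent)."""
    unnormalized = {intent: prior[intent] * likelihood[intent] for intent in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every candidate intent")
    return {intent: value / total for intent, value in unnormalized.items()}


# Hypothetical belief an AA holds about a sensor operator's intent.
prior = {"keep_tracking_target": 0.5, "end_surveillance": 0.5}

# Assumed probability of observing a zoom-out action under each intent.
likelihood_zoom_out = {"keep_tracking_target": 0.8, "end_surveillance": 0.2}

posterior = update_intent(prior, likelihood_zoom_out)
most_likely = max(posterior, key=posterior.get)
print(most_likely, round(posterior[most_likely], 3))
```

An AA holding such a belief could choose its next action from the most likely intent without waiting for an explicit command, which is the kind of shift toward implicit coordination discussed in this section.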
Conclusion
If we are to achieve the cyberneticist’s vision of integrated human–agent teams (Licklider, 1960), it is important to design future multi-agent systems with AAs that can coordinate effectively with human team members. Grounding our design in the coordination performance observed in human–human teams, this research focused on design considerations for AAs that utilize analogous coordination methods. We have provided a framework for exploring and understanding coordination. By defining the types and classifying the content of coordination, we provide insight into the costs of coordination and illustrate how these costs are particularly high when a single human is required to explicitly coordinate with multiple AAs within an intelligent system. We further discuss how the transition of negotiable, nondirective coordination content from explicit to implicit means can reduce the cost to the human of these systems. As humans, we have been improving our coordination in teams for generations. With the advent of AAs that can sense and react to the environment and their teammates, system designers should leverage human coordination paradigms in AA design. We should consider the mechanisms, moderators, and models that are important for improving coordination in human–AA teams to reduce the burden these systems place upon the human operator.
Acknowledgment
The authors gratefully acknowledge the financial support of the Air Force Office of Scientific Research, Computational Cognition and Machine Intelligence Program. The views in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of the Air Force, Department of Defense, or the U.S. Government.
Author Biographies
Michael Schneider is a doctoral student at the Air Force Institute of Technology. He earned his Bachelor's in Mechanical Engineering from the University of Dayton in 2008 and a Master's in Systems Engineering from the Air Force Institute of Technology in 2011. As a civil service engineer in Air Force acquisition, he has been responsible for human factors and cognitive engineering for mobility and unmanned systems programs.
Michael Miller is an Associate Professor in the Department of Systems Engineering and Management. Dr. Miller earned his PhD in Industrial and Systems Engineering, Human Factors specialization, from Virginia Tech in 1993 and an MS from Ohio University in 1989, also in Industrial and Systems Engineering. Dr. Miller’s research interests include human systems modeling, human interaction with automation, and human-display integration. Dr. Miller has contributed to more than 100 issued U.S. patents on digital imaging and display systems, 30 peer-reviewed journal articles, and numerous conference proceedings.
David Jacques holds a PhD and an MS in Aeronautical Engineering from AFIT, and a BS in Mechanical Engineering from Lehigh University. He enjoys collaborative, interdisciplinary projects that tend to focus on system effectiveness and/or value. He has developed methods and algorithms for improving the multi-agent effectiveness of Small UAS through inter-vehicle cooperation and auto-routing algorithms. Presently, he is leading research in the area of effective multi-vehicle control for flexible reconnaissance and surveillance operations.
Gilbert Peterson is a Professor of Computer Science at the Air Force Institute of Technology, and Chair of the IFIP Working Group 11.9 Digital Forensics. Dr. Peterson received a BS degree in Architecture and an MS and PhD in Computer Science at the University of Texas at Arlington. He teaches and conducts research in digital forensics, statistical machine learning, and autonomous robots. He has over 90 peer-reviewed publications and six edited books.
Thomas Ford is a senior systems engineer with Integrity Applications Incorporated, Dayton, OH, and is Adjunct Assistant Professor of Systems Engineering at the Air Force Institute of Technology. He obtained his PhD in Systems Engineering from the Air Force Institute of Technology and recently retired from the United States Air Force with 22 years of engineering and management experience. His research interests are systems architecting, systems resiliency, and systems interoperability.
