Abstract
Background
Current team assessment instruments in healthcare tend to involve rater-based evaluations that are susceptible to well-known biases. Recent advances in technology include portable devices to measure team-based activities. Consequently, the possibility exists to move away from rater-based assessments of team function by identifying quantitative measures to replace them.
Aim
This article aims to provide potential approaches to developing quantitative measurement suites involving large amounts of data to address the challenges of assessment presented by the complex nature of teamwork.
Conclusion
By addressing construct, measurement, and context components, we provide a practical approach to developing a suite to capture quantitative measurements that, through incorporation of social network analysis and other aggregated values, aligns with the Team Strategies & Tools to Enhance Performance and Patient Safety™ (TeamSTEPPS™) dimensions for fostering teamwork.
Introduction
Black Knight: “Tis but a scratch.” King Arthur: “A scratch! Your arm’s off!” Black Knight: “No it isn’t.” King Arthur: “Then what’s that then?”
Monty Python and the Holy Grail (White et al., 2001).
The above satirical exchange, occurring after King Arthur has lopped off the left arm of the Black Knight during a sword fight, aims to highlight the absurdity of denial in the face of overwhelming evidence. For evaluators of performance, however, it is an excellent illustration of the importance of accurate measurement in assessment and the challenges in achieving it. Without valid, reliable tools, vastly different interpretations of the same action or performance can occur. Nonetheless, even an instrument with adequate validity and reliability is still prone to inaccuracies due to bias that can creep into scores. In observer-based assessment, such bias results from human interpretation and subjectivity. Other types of measurement might have calculations altered by inaccurate raw data. Such lack of accurate measurement compromises the ability to demonstrate learning and to evaluate training effectiveness.
Having a reliable, effective, efficient, and valid manner of assessment is especially important in measuring team performance in healthcare, since teams are a foundational component for caring for the patient. Team development, therefore, is essential to foster patient safety and quality care. To date, team performance measures have relied on observer-based instruments that are prone to the weaknesses previously described. Recent advances in technology present an opportunity to construct quantitative assessment tools of team performance independent of observer rating. Like all instruments, however, these quantitative measurements present their own set of challenges and weaknesses to overcome.
The aim of this article is to provide potential approaches to developing quantitative measurement suites involving large amounts of data to address the challenges of assessment presented by the complex nature of teamwork. It will present a methodical approach in developing a quantitative measurement suite using technology to assess teamwork in healthcare. It will first evaluate the complexities and challenges of assessing teamwork in healthcare. Next, it will describe two useful frameworks for developing assessment tools using quantitative data. Finally, it will provide a practical application using one of these frameworks to conceptualize a quantitative measurement suite to assess team function that incorporates the latest available unobtrusive technology.
Assessing Team Performance in Healthcare
The Complexity of Conceptualizing Teamwork
Part of the difficulty in assessing team performance in healthcare is that team function involves more than just the team alone. Instead, teamwork is a complex interplay of actions of individuals on the team, their reactions to other members, and the overall work context in which the team operates. Team function, therefore, is multi-tiered in nature, unfolding on micro- (i.e., individual), meso- (i.e., team), and macro- (i.e., organizational) levels. This dynamic process results in the emergence of team phenomena over time based on individuals’ actions, team member interplay, and outside influences on the team (Kozlowski & Chao, 2018).
The team phenomena emerging out of team dynamics consist of an interrelated set of team-level attitudinal (e.g., team cohesion and trust), behavioral (e.g., coordination, communication), and cognitive states (e.g., team learning, shared mental model). These so-called ABCs of teamwork combine to influence team performance (Bell et al., 2018). They arise from each individual’s actions, and they are interdependent with other team members’ activities (Salas et al., 2008). Collectively defined as non-technical skills (NTS), they result from the interpersonal, cognitive, and personal resource skills that each member contributes to the team (Agha et al., 2015).
The complex interplay of individuals, team members, and work context makes teamwork more than the sum of the individual actions of its members. At the team level, individual team members act autonomously according to their own internal motivations and rules of behavior. An action on one member’s part, however small, therefore, can have an outsized impact on team function. Additionally, as the team reacts to its past actions and environmental conditions, unpredictable and new team behaviors can arise. In this manner, the output of a team is greater than the cumulative sum of what its individual members could produce (Pype et al., 2018).
Teamwork is thus a multi-dimensional construct that is more than the sum of its component parts. This fact makes its assessment particularly challenging. In addition to capturing the nuances of overall teamwork, an instrument evaluating team performance must take into account the NTS forming their basis. Such assessment can involve rater-based evaluations or rater-independent quantitative measurements, each of which has its strengths and weaknesses, as subsequently described.
Rater-Based Team Assessment Instruments
To date, rater-based instruments have dominated as the tool of choice for assessing team performance. At least 80 different rater-based instruments evaluate NTS and teamwork in healthcare. They assess a wide range of healthcare specialties in a variety of clinical and simulated settings. These instruments universally measure communication as a construct. Leadership and situation awareness are other popular constructs (Higham et al., 2019).
Rater-based instruments can employ observers to evaluate team performance. Additionally, they can rely on team member self-assessment in determining team effectiveness. Regardless of the type of rater, these tools’ scales have a variety of formats: continuous, dichotomous, polydichotomous, or ordered categorical. Whatever the scale, formal rater-based tools require a standard way of delivering the assessment, a standard set of items, specified forms of combining the information from the items, and ways to standardize scores in order to ensure consistent and precise measurement. Tool developers attempt to satisfy these requirements in four ways: (1) providing a clear description of the attribute being evaluated, (2) having a scheme of numbers, (3) creating an operational tie between the number scheme and either the magnitude or the presence/absence of the attribute, and (4) having a standard way of assigning numbers on the scale during rating to reflect the operational tie to the attribute (Widaman, 2020).
Rater-based assessment tools lend themselves well to capturing the complex, multidimensional components of team interaction, since instrument developers can tailor each item to a specific attitude, behavior, or cognitive state. Additionally, self-assessment tools can reach a wide range of team members as surveys sent to target audiences (Kash et al., 2018). Furthermore, these instruments, when administered over time to the same individuals, have the potential to provide some insight into the dynamic nature of emergent team phenomena. Finally, rigorous rater training to foster a common understanding of scoring criteria can help provide a consistency in the rating process (Kondrasuk, 2011).
Several disadvantages counterbalance the utility of rater-based instruments. Observer-based instruments have limitations related to their breadth: rater number, team size, events evaluated, and time of observation. Self-assessments are prone to issues related to sampling: population size, representation, and response rate. When used to capture dynamic processes, they are disruptive to the phenomena they attempt to evaluate. Finally, both types of instruments might fail to measure adequately the desired behavior through flaws in construct (Kozlowski & Chao, 2018).
Observer-based tools are particularly prone to influence by the human element of the rater in the evaluation. Personal bias, attitudes, and values of raters may influence their impressions (Javidmehr & Ebrahimpour, 2015). Additionally, different behaviors and cognitive processes of raters during evaluation lead to variability in interpretation of the instrument scoring system (Villamane et al., 2017). As a result, the raters end up providing a score that does not reflect the performance based on its quality, influencing the validity of the interpretation and use of the ratings (Wind, 2019). Such bias also exists among team members completing self-assessment surveys (Kash et al., 2018). These tendencies of raters to either overestimate or underestimate performance in relation to the scoring rubric, collectively known as rater effects, take on manifold individual features. Combined, they are the metaphorical Achilles heel of rater-based evaluations.
Rater-Independent Measurements of Team Performance
Quantitative measurements of team performance that are free of human input as part of the evaluation have the advantage of removing the element of rater effects. Common quantifiable measurements used to assess team performance include time, distance, volume, and frequency counts. Determination of time duration is useful in evaluating process measures. For example, the calculation of the time it takes a patient to go to computed tomography scanning or to undergo endotracheal intubation during trauma resuscitation serves to evaluate a process reliant on team performance (Hoang et al., 2019; Welsch et al., 2018). Distance measures help assess positioning of team members, and, when combined with time, give a sense of motion dynamics (Ward et al., 2014). Volume and rate of speech assist with assessing communication characteristics (Peng et al., 2019). Finally, frequency counts help with evaluating outcome measures such as morbidity and mortality as well as process measures and team dynamics (Welsch et al., 2018). For example, frequency counts of types of communication may assist with the overall evaluation of communication within a team (Peng et al., 2019).
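To make the time and frequency measures described above concrete, the following is a minimal sketch in Python. The event log, the timestamps, and the communication categories are hypothetical and purely illustrative; they do not come from any of the cited studies.

```python
from collections import Counter
from datetime import datetime


def elapsed_minutes(start_iso, end_iso):
    """Return the duration between two ISO 8601 timestamps in minutes."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds() / 60.0


def count_by_type(utterances):
    """Count utterances per communication category.

    Each utterance is a (speaker, category) pair.
    """
    return Counter(category for _speaker, category in utterances)


# Hypothetical trauma resuscitation log (illustrative values only).
arrival = "2024-01-01T10:00:00"
intubation = "2024-01-01T10:07:30"
utterances = [
    ("nurse", "closed_loop"),
    ("surgeon", "directive"),
    ("nurse", "closed_loop"),
    ("anesthetist", "question"),
]

time_to_intubation = elapsed_minutes(arrival, intubation)
communication_counts = count_by_type(utterances)
print(time_to_intubation, dict(communication_counts))
```

In this sketch the duration serves as a process measure and the category counts as a simple descriptor of communication; how utterances are assigned to categories is a separate coding decision that the code does not address.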
A clear advantage of quantitative measurement of teamwork is its ability to measure emergent phenomena in an ongoing manner, often unobtrusively. A plethora of technologies, such as mobile badges, remote sensors, and audio/video recording equipment, now exist to evaluate dynamic interactions within organizations and between people (George et al., 2016). Among teams, the dynamism arises from changing behavioral processes as well as alterations in cognitive and affective properties over time (Luciano et al., 2018). Such temporal shifts can occur as part of development through a time span, as episodic, cyclic changes within an activity, or as a particular event disrupting the course of activities (Luciano et al., 2018). By measuring all this dynamic information, one can obtain an enormous amount of raw data. Such data collection of team performance occurs via one of three avenues: (1) behavioral action, (2) word use, and (3) physiological response (Luciano et al., 2018). Such a volume of data makes it possible to accrue a more representative amount of information and to measure directly the constituent components of a targeted research construct. This scope and granularity in data collection allow researchers to ask new questions, obtain better answers to existing ones, and shift team analysis to specific aspects of each individual team member (George et al., 2016; Luciano et al., 2018).
The benefit of the sheer quantity of this so-called big data becomes in itself a major challenge for researchers due to the three “Vs” associated with it: volume, velocity, and variety. The large amount of information, the speed with which one can collect it, and the diversity of sources from which it is gathered require application of principles from data science. The use of data science allows the development of models to capture, analyze, and visualize the trends within the big data through collection, storage, processing, analysis, and reporting (George et al., 2016).
Additionally, as with observer-based assessments, quantitative measurements have their own unique issues related to accurate collection and bias. Determining measurements using wearable badges designed to quantify interactions between team members, so-called sociometric badges, is an illustrative example. Kayhan et al. demonstrated that these badges may fail to synchronize, creating lags and discrepancies in measurement of the time of a particular activity, leading to erroneous calculations of constructs related to them (Kayhan et al., 2018). These discrepancies also can exist related to values of specific data measurements, such as speech volume and posture, leading to the over-detection or under-detection of one badge compared to another (Chaffin et al., 2017; Kayhan et al., 2018). As with unsynchronized internal timing, these measurement differences lead to miscalculation of behavior constructs that the badges quantify (Chaffin et al., 2017). The measurement of dynamic interactions using big data, therefore, is complex, especially when one is trying to determine values for more than static points within them.
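The synchronization problem can be illustrated with a short Python sketch. It assumes, purely for illustration, that each badge records a set of shared reference events (for example, a synchronization signal logged at known times) and that the badge clock differs from the reference clock by a constant offset. This is a simplification and is not the procedure used by Kayhan et al. or Chaffin et al.; real devices may also drift over time.

```python
from statistics import median


def estimate_offset(reference_times, badge_times):
    """Estimate a constant clock offset (badge minus reference), in seconds.

    Both lists hold timestamps of the same shared events, in the same order.
    The median is used so that a single bad reading has limited influence.
    """
    if not reference_times or len(reference_times) != len(badge_times):
        raise ValueError("need matching, non-empty lists of event times")
    return median(b - r for r, b in zip(reference_times, badge_times))


def align(badge_times, offset):
    """Shift badge timestamps onto the reference clock."""
    return [t - offset for t in badge_times]


# Hypothetical sync events (seconds); the badge clock runs 1.5 s ahead.
reference = [0.0, 10.0, 20.0]
badge = [1.5, 11.5, 21.5]

offset = estimate_offset(reference, badge)
print(offset, align(badge, offset))
```

Without such a correction, an event lasting a few seconds could be attributed to the wrong moment in the scenario, which is one way the erroneous construct calculations described above can arise.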
A final challenge related to using quantitative measurements, be they physical or observer-based, stems from the fact that they are only indirect measures of teamwork. They are not measuring the team processes themselves. Instead, they measure indicators of these constructs and serve as proxies of the actual team phenomena (Uher, 2020). Consequently, human interpretation of what constitutes an interaction is necessary. For example, determining the threshold distance between team members that qualifies as a significant interplay becomes relative. Such determinations are important, and they affect data analysis that, in itself, presents various approaches to interpreting quantitative measurements that can in turn influence findings (Kozlowski & Chao, 2018).
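The dependence of findings on the chosen threshold can be shown with a small Python sketch. The position tracks and the threshold values below are hypothetical; the point is only that the same raw data yield different interaction counts under different definitions of proximity.

```python
from math import hypot


def count_proximity_samples(track_a, track_b, threshold_m):
    """Count time samples in which two members are within threshold_m meters.

    track_a and track_b are equally sampled lists of (x, y) positions.
    """
    return sum(
        1
        for (ax, ay), (bx, by) in zip(track_a, track_b)
        if hypot(ax - bx, ay - by) <= threshold_m
    )


# Hypothetical positions (meters) for two team members over four samples.
track_a = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
track_b = [(0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (3.0, 0.0)]

for threshold in (1.0, 2.0):
    print(threshold, count_proximity_samples(track_a, track_b, threshold))
```

With a 1.0 m threshold two samples qualify as an interaction, while a 2.0 m threshold yields three, so any downstream measure built on these counts inherits the investigator's choice.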
Two Frameworks for Development of a Quantitative Measurement Suite of Teamwork in Healthcare
As delineated in the prior section, observer-based team assessment tools are popular instruments. Their disruptiveness in capturing dynamic team processes, however, limits their utility. Quantitative measurement suites using unobtrusive technology present an attractive alternative that enables ongoing, dynamic collection of data. The amount of such data is potentially enormous, given the continuous nature of its uptake. Approaching the development of such a measurement suite requires some sort of framework in order to address such issues.
The Construct, Measurement, Context Framework
Framework for Developing a Team Assessment Tool Using Quantitative Measurements.
Rational Approach to Developing Systems-Based Measures
Another useful framework for developing a quantitative measurement suite is Orvis et al.’s Rational Approach to Developing Systems-based Measures (RADSM; McCormack et al., 2018; Orvis et al., 2013). RADSM is a six-step process designed to create a measurement tool that draws on all available data within a system, obviating the need for human input or coding. Step 1 identifies the context and constructs of interest. Step 2 delineates the indicators of these constructs. Step 3 involves the selection of the data within the system related to the construct. Step 4 develops measurement indicators that operationalize the behavior indicators using available system data. Step 5 implements these measurements. Finally, Step 6 validates the measurements (McCormack et al., 2018; Orvis et al., 2013).
RADSM overlaps considerably with Luciano et al.’s framework. Step 1 simultaneously addresses both construct and context. In Step 2, further refinement of the construct occurs. Steps 3, 4, and 5 focus on measurement features. Finally, in Step 6, context again comes into play. Both frameworks strive to collect data free from immediate human input so that its aggregation remains unobtrusive and does not disrupt the dynamic processes that lead to emergent team phenomena.
Practical Application
The preceding section outlined two useful approaches to develop a quantitative measurement suite to assess team performance. This section will utilize Luciano et al.’s framework to provide a practical application for the reader (Figure 1).
The IPO Model of Team Effectiveness.
Theoretical Construct Elements
Although multiple constructs exist regarding how teams interact, the Team Strategies & Tools to Enhance Performance and Patient Safety™ (TeamSTEPPS™) framework is well-known and popular in healthcare (Figure 2) (Agency for Healthcare Research and Quality [AHRQ], 2019). This multi-dimensional construct envisions team structure arising from a combination of leadership, communication, situation monitoring, and mutual support. Team leadership, communication, and situation monitoring are all team processes involving behaviors (i.e., skills). Although they can be episodic in nature, they often arise from an event-based temporal frame when a disruption in normal team activity forces the team to adapt through their use. Mutual support is an emergent, affective state that develops over time as team members learn to trust one another and begin to view themselves as members of a team. These behaviors and this affect arise from the individual team members, and their presence can wax and wane within a team based on several variables such as its member composition, environmental stressors, and the duration of an event. They best manifest during high stress, dynamic situations such as the rapid, unexpected clinical deterioration of a patient.
The Input-Process-Output (IPO) Model and Team Development Interventions.
Measurement Features
Graphical Representations in Social Network Analysis.
Social network analysis (SNA) reveals the underlying structure of social networks, in particular, the manner and patterns in which the social entities interact through relationships (Cordeiro et al., 2018; Tassabum et al., 2018). SNA can focus on the role of vertices within a network, looking at a particular social entity’s influence or centrality, or it can investigate the overall network itself, determining its properties or underlying phenomena (Tassabum et al., 2018). Each approach involves using mathematical equations and representations of concepts to describe relationships within the social network (Cordeiro et al., 2018; Tassabum et al., 2018).
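As an illustration of the two levels of analysis, the Python sketch below computes one vertex-level measure (normalized degree centrality) and one network-level measure (density) for a small undirected network. The team roster and the ties between members are hypothetical, and these two measures are only simple examples of the many that SNA offers.

```python
def degree_centrality(members, edges):
    """Normalized degree centrality: share of other members each one is tied to."""
    neighbors = {m: set() for m in members}
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(members)
    if n < 2:
        return {m: 0.0 for m in members}
    return {m: len(neighbors[m]) / (n - 1) for m in members}


def density(members, edges):
    """Share of all possible undirected ties that are actually present."""
    n = len(members)
    possible = n * (n - 1) / 2
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return len(unique) / possible if possible else 0.0


# Hypothetical operating room team and observed ties.
members = ["surgeon", "nurse", "anesthetist", "student"]
edges = [
    ("surgeon", "nurse"),
    ("surgeon", "anesthetist"),
    ("surgeon", "student"),
    ("nurse", "anesthetist"),
]

print(degree_centrality(members, edges))
print(density(members, edges))
```

In this hypothetical network the surgeon is tied to every other member, giving a centrality of 1.0, while four of the six possible ties are present overall.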
SNA can enhance evaluation of the nuanced interactions between and within teams. Quantitative physical measures like time, distance, volume, and frequency can assist in evaluating aspects of teamwork such as communication and motion dynamics. Sensors are available to capture these physical values, including radio frequency identification (RFID) monitors, infrared sensors, audio and video recorders, accelerometers, counters, and timers (Rosen et al., 2015). Combined with SNA, these measurements form a sociometric database of a team’s intra- and inter-team interactions that can serve as an evaluation of its performance. These data can serve as an objective, events-based coding of team behaviors that can provide insight into visible and invisible team interactions related to dynamic team phenomena such as inter-member conflict or team adjustment of goals (Kolbe & Boos, 2019).
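One way to picture how sensor measurements might feed an SNA is sketched below in Python. It assumes, hypothetically, that audio processing has already produced records of who spoke to whom and for how long; identifying speaker and addressee is a non-trivial step that this sketch does not attempt. The records are aggregated into weighted, directed ties.

```python
from collections import defaultdict


def build_weighted_edges(records):
    """Sum speaking time (seconds) for each directed (speaker, addressee) pair."""
    weights = defaultdict(float)
    for speaker, addressee, seconds in records:
        weights[(speaker, addressee)] += seconds
    return dict(weights)


def out_strength(weights):
    """Total outgoing speaking time per member across all addressees."""
    totals = defaultdict(float)
    for (speaker, _addressee), seconds in weights.items():
        totals[speaker] += seconds
    return dict(totals)


# Hypothetical sensor-derived records: (speaker, addressee, duration in seconds).
records = [
    ("surgeon", "nurse", 4.0),
    ("surgeon", "nurse", 2.0),
    ("nurse", "surgeon", 1.0),
    ("surgeon", "anesthetist", 3.0),
]

edges = build_weighted_edges(records)
print(edges)
print(out_strength(edges))
```

The resulting weighted edge list is the kind of structure on which centrality or other network measures could then be computed.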
Investigators have already applied SNA to evaluate intra- and inter-team interactions in a wide variety of fields, including business (Carboni & Ehrlich, 2013), sports (Praca et al., 2019), and even healthcare (Sullivan, Saachti, Younis, & Harris, 2019). In sales teams, the core centrality of a member, that is, how centrally located within the network a person is, has an impact on individual performance that depends on the degree of team performance and longevity (Carboni & Ehrlich, 2013). An analysis of match status on players’ prominence and teams’ network properties in the knockout phase of the 2018 Fédération Internationale de Football Association (FIFA) World Cup revealed microstructure changes in player behaviors based on whether the team was winning or losing at the time (Praca et al., 2019). Finally, an SNA of influence and information spread among teams of medical doctor trainees revealed that influence remained limited to clinical-technical issues (Sullivan et al., 2019).
Aligning TeamSTEPPS™ Dimensions, SNA Elements, Quantitative Measurements, and Measurement Devices to Assess Team Interaction.
Contextual Considerations
This proposed quantitative measurement suite arises out of a desire to assess team performance in healthcare that is not dependent on rater evaluation. Creating such a suite will enable more granular evaluation of team interactions to help improve team performance, thereby enhancing patient care overall. Thus, development of this tool occurs within a research context of trying to improve patient outcomes by examining and enhancing team performance in healthcare. Individual study contexts would vary according to the environment chosen to use this instrument. For example, assessment of actual teamwork during patient care could occur by employing the measurement suite in the clinical setting, such as the trauma bay or the operating room. Alternatively, it could help evaluate the effectiveness of team training methods, such as simulation-based experiential immersion, by using it in a simulated setting. Finally, these quantitative measurements could assist in answering specific questions related to team dynamics in a controlled, experimental setting. Each study context permits the use of the suite within ethically appropriate research protocols.
Feasibility Testing/Tool Validation
Having created a measurement suite consisting of quantifiable components, the next step would involve beginning the process of its validation for various populations. One approach to this undertaking entails piloting the components of the measurement suite within a setting in which teams can react to the same clinical experience. For example, incorporating the RFID badges, video recording, and dosimeters that record decibels into a pre-existing simulation-based training activity would provide an opportunity to perform such a validation. At our institution, the Student Operating Room Team Training (SORTT) program is such a simulation-based activity (Paige et al., 2014). SORTT brings together senior medical students taking an anatomy elective, senior undergraduate nurses enrolled in a perioperative nursing elective, and nurse anesthesia students to learn and practice team-based competencies during a 2-hour simulated session in which student teams participate in two standardized resuscitation scenarios occurring within the operating room. In the first scenario, the student team must perform an exploratory laparotomy on a stab wound victim bleeding to death from an injured iliac vessel. The second scenario focuses on resuscitation of a patient suffering from local anesthetic systemic toxicity (LAST). These crisis scenarios serve as catalytic events designed to disrupt the activity of the team and bring team-based affects, behaviors, and cognition to bear on addressing the situation and treating the patient.
Incorporating the RFID badges, dosimeters, and audio/video recording into SORTT would serve to measure the necessary components of team activity to enable SNA computation and its aggregation with other quantitative measures to determine the completeness and accuracy of the construct related to team structure and performance. Additionally, using these devices and calculating these measures would help determine the feasibility of their incorporation into the measurement suite. Comparison of the results from the measurement suite with changes occurring in team performance using established rater-based instruments could then demonstrate convergent validity. The results from such a pilot would then inform the re-evaluation of the three major components of Luciano et al.’s framework for developing assessments of dynamic phenomena using big data (i.e., the theoretical construct for its basis, its measurement features, and the context within which one uses it). The measurement suite would then undergo revision as necessary. Through this iterative refinement, the quantitative measurement suite would undergo sequential tuning as it demonstrates validity for various populations of participants in evolving settings.
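The comparison for convergent validity could, for instance, take the form of a correlation between suite-derived values and rater-based scores across teams. The Python sketch below computes a Pearson correlation coefficient from first principles. The paired values are hypothetical, and the choice of Pearson's r is an assumption made for illustration; the appropriate statistic would depend on the scale and distribution of the actual data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-team values: a suite-derived communication measure and
# the score from an established rater-based instrument for the same teams.
suite_values = [0.42, 0.55, 0.61, 0.70, 0.78]
rater_scores = [3.1, 3.4, 3.9, 4.0, 4.6]

print(round(pearson_r(suite_values, rater_scores), 3))
```

A strong positive coefficient in such a pilot would be one piece of evidence that the two approaches capture related aspects of team performance.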
Conclusion
In measurement, the Black Knight’s reputed scratch cannot equate with King Arthur’s assessment of a severed arm. In the context of team interactions in healthcare, such disparate evaluations of the same action prevent meaningful interpretation of changes in behavior and the effectiveness of an intervention. Most teamwork assessment instruments used in healthcare today rely on rater-based scoring that is prone to well-known biases and can disrupt measurement of ongoing, emergent team phenomena.
Using a framework for harnessing big data to measure dynamic interactions, such as that proposed by Luciano et al. or the RADSM, one can develop a preliminary quantitative measurement suite. For example, unobtrusive technology such as RFID badges, dosimeters, and video recording can collect data for SNA to assess the TeamSTEPPS™ team structure dimensions of leadership, communication, situation monitoring, and mutual support. The utility of such frameworks lies in their scaffolding, since it allows developers to choose the measurement devices and teamwork constructs best suited for them and the particular context in which they want to evaluate teams. Thus, measurement devices might include sociometric badges in lieu of RFID tags. Other team dimensions might replace the ones from TeamSTEPPS™.
After the development of such a quantitative measurement suite, its validation can occur in a variety of settings, including simulation-based activities. During such validation, an assessment of the feasibility of using the measurement suite is possible. The development of more teamwork assessment tools incorporating quantitative measurements that capture dynamic and emergent team processes using these established frameworks has the potential to provide for more targeted identification of gaps in performance. Such assessments would allow for the economical use of valuable resources and time by individualizing training to address those gaps.
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Paige receives royalties from Oxford University Press and Springer Nature for three books relating to simulation or surgical education. He also is a consultant to Boston Scientific as a faculty instructor. Finally, he has received grant support from the Southern Group on Educational Affairs (SGEA) and the International Association of Medical Science Educators (IAMSE) as PI for teamwork research as well as from Avita Medical as a co-investigator for hernia research. Drs. Bonanno, Garbee, and Yu as well as Ms. Kerdolff are co-investigators on the SGEA and IAMSE grants. Drs. Bonanno and Garbee also receive royalties from Springer Nature for a book on simulation. Dr. Rogers is a co-investigator on the SGEA grant.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Drs. Paige, Bonanno, Garbee, and Yu as well as Ms. Kerdolff are investigators on an International Association of Medical Science Educators (IAMSE) grant applying principles related in this article.
