Abstract
Team research increasingly incorporates emergent states as an integral mediator between team inputs and outcomes. In conjunction with this, we have witnessed a proliferation and fragmentation of measurement techniques associated with emergent states. This inconsistency in measurement presents a problem for scientists and practitioners alike. For the scientist, it becomes difficult to better understand the nature and effects of various emergent states on team processes and outcomes. For the practitioner, it complicates the process of measurement development, selection, and implementation. To address these issues, we review the literature on emergent states focusing on various measurement strategies, to better unpack best practices. In so doing, we highlight existing research that suggests innovative solutions to the conceptual, methodological, and logistical problems that consistently plague emergent state research. Our aim is to enhance emergent state theory by applying psychometric principles to the measurement techniques associated with them.
The past 30 years have witnessed a surge in team research (Wuchty, Jones, & Uzzi, 2007). Accordingly, there is now a large body of evidence that points to the critical drivers of team performance (e.g., Bell, 2007; De Dreu & Weingart, 2003; Gully, Incalcaterra, Joshi, & Beaubien, 2002). While initial research focused on identifying antecedents to team performance (i.e., team, individual, and task characteristics), more recent research has begun to unpack the black box within input–mediator–output (IMO) models of team performance (Mathieu, Maynard, Rapp, & Gilson, 2008), with emergent states theorized to be a primary explanatory variable mediating the relationship between team inputs and outcomes. Over the past decade, emergent states—relatively dynamic, collective-level characteristics that “vary as a function of team context, inputs, processes, and outcomes” (Marks, Mathieu, & Zaccaro, 2001, p. 357)—have been consistently demonstrated to influence desirable team outcomes (e.g., Kozlowski & Ilgen, 2006; Mathieu et al., 2008; Rico, Sánchez-Manzanares, Gil, & Gibson, 2010). Given the importance of emergent states across many disciplines (e.g., organizational, sports, military), assessment of these constructs is paramount. However, emergent state measurement suffers from fragmented definitions, operationalizations, aggregation techniques, and disjointed methodologies (cf. Lewis & Herndon, 2011; Mohammed, Klimoski, & Rentsch, 2000). This article reviews common emergent state measurement practices, gauging adherence to psychometric best practices (e.g., Nunnally & Bernstein, 1994), which may have implications for the internal validity and generalizability of emergent state research. Issues addressed will relate to construct clarity, measure development, multilevel aggregation, and measurement over time. We will highlight instances where research has fallen short, and where it has made significant advancements.
Ultimately, this targeted conceptual review aims to advance emergent state research by providing practical and insightful guidelines to enhance researchers’ and practitioners’ ability to assess and address team emergent states.
Research Methodology
To document the emergent state measurement science, the authors conducted a multipronged search of several databases (i.e., PsycINFO, PsycARTICLES, Business Source Premier, Military and Government Collection, ERIC, MEDLINE, Human Resources Abstracts). We paired search terms relating to different emergent states (i.e., transactive memory, mental model, situation(al) awareness, shared cognition, team knowledge, psychological safety, cohesion, shared/group trust, collective/group efficacy), terms for measurement (i.e., measurement, assessment, monitoring), and terms indicating a collective level of analysis (i.e., team, collective, group, multilevel). To facilitate relevance and parsimony, we limited search results to those published from 2000 onward. This search strategy yielded 703 unique articles. We leveraged insights from subject matter experts in teams and organizational research to eliminate some articles and identify others we had missed. Articles that were deemed to not include relevant information (e.g., unclear explication of measurement practices, did not study the focal constructs, or used non-adult or individual-level samples) were removed. The remaining articles were reviewed by the authors and used as the basis for the claims made throughout this article. Finally, to further enhance relevance, we conducted targeted searches to identify studies that used innovative solutions to address problems within emergent state measurement practices. Ultimately, we reviewed 259 articles; due to the conceptual nature of the review, and for reasons of parsimony, only a subset of these articles is directly referenced in the article. We reference articles that engage in good examples of emergent state measurement practices, and draw special attention to work that offers innovative solutions to particularly troublesome measurement and research conundrums.
We review articles with an eye toward internal validity and generalizability (i.e., good methods and research) rather than focusing on specific metrics (e.g., correlation/regression coefficients, consistency/agreement indices)—as specific metrics may be suppressed or inflated depending on the nature of the methodological violation (Cooke, Gorman, & Winner, 2007; Nunnally & Bernstein, 1994). A full list of reviewed articles is available from the first author.
Issues in Emergent State Measurement
Clearly Defining the Construct
Lack of construct clarity was identified as a common problem. As Suddaby (2010) noted, “perhaps the most common definitional issue in manuscripts is that others simply fail to define their constructs” (p. 347). Conceptually ambiguous, multifaceted, cross-disciplinary constructs—as are common with emergent state research (Edwards, 2001; Hornsey, Dwyer, & Oei, 2007)—make systematic research difficult. Indeed, the inability to converge on consistent definitions has led some researchers to suggest that some constructs be abandoned altogether (Hornsey et al., 2007). Despite this, advancements have been made recently (e.g., Mohammed, Ferzandi, & Hamilton, 2010; Mohammed et al., 2000), but we see room for much improvement across all emergent states. Lack of definitional clarity has been identified as a significant impediment to emergent state measurement (Mohammed et al., 2010). To effectively measure constructs, all definitions/operationalizations must be enumerated. This is particularly germane for emergent states, because they were only fairly recently distinguished from team processes (Marks et al., 2001). Transactive memory theory, for instance, originated from social psychology and intimate dyadic relationships (Wegner, 1987); it has since focused on “the cognitive processes in groups, the factors that affect those processes, and the group performance outcomes that result” (Lewis & Herndon, 2011, p. 1254, emphasis added). Similarly, cohesion has been studied in sports teams (Carron, Widmeyer, & Brawley, 1985; Pain & Harwood, 2008), virtual teams (Huang, Kahai, & Jestice, 2010), military teams (Oliver, Harman, Hoover, Hayes, & Pandhi, 1999), and even political parties (Owens, 2003; Rice, 1925). However, more modern conceptualizations define cohesion as an affective attraction to the task and/or group, shifting from initial behavioral conceptualizations (e.g., degree to which political party members voted collectively, rather than individually).
Slight definitional shifts are important to understand when operationalizing and theorizing emergent states; beyond simply confusing researchers and practitioners, definitional nuances can be impactful in numerous ways. First, definitions may range in degree of specificity. For example, collective efficacy is often operationalized as beliefs regarding a collective’s ability to meet either an overarching goal (e.g., “defeat the enemy”; Chen, Gully, & Eden, 2001) or multiple specific goals (e.g., “communicate effectively,” “minimize unnecessary casualties”; Heuze, Raimbault, & Fontayne, 2006; Myers, Payment, & Feltz, 2004). Despite these differences (which have been shown to be empirically meaningful; see Stajkovic, Lee, & Nyberg, 2009), both are often considered the same construct. Without considering divergent emergent state definitions/operationalizations, researchers risk developing or selecting unsuitable measures. Second, related constructs may not be appropriately distinguished. For example, transactive memory systems (TMS), shared mental models (SMM), and team situation awareness (TSA) have been positioned under the umbrella of team cognition theory (Cooke, Gorman, Myers, & Duran, 2013; Cooke et al., 2007). In addition, group learning, strategic consensus, cross-understanding, and shared task understanding may also fall under this purview; however, the exact relationships between these myriad concepts remain unclear. Consider TMS compared with cross-understanding. Although theoretically and operationally distinct, Huber and Lewis (2010) admitted that they “are similar in that they are both composed in some way of members’ understandings” (p. 9). If a researcher uses superficial definitions of TMS—who knows what—these concepts would be very difficult to differentiate. Similar issues are apparent for non-cognitive emergent states.
For example, reviewing the literature on team intimacy and cohesion, Rosh, Offermann, and Van Diest (2012) noted that these have been often confused, merged, and used interchangeably. Ultimately, misappropriated definitions may yield flawed research and misinterpreted results, as previous reviews of cohesion (Hornsey et al., 2007) and TMS (Lewis & Herndon, 2011) have pointed out.
Nonetheless, progress has been made toward greater construct clarity in numerous emergent states. Meta-analysis facilitates greater construct clarity despite multiple different definitions. For example, while trust is typically defined in terms of its antecedents (Adams, Bruyn, & Chung-Yan, 2004; Schoorman, Mayer, & Davis, 1996), it has been operationalized with 3 (Rempel, Holmes, & Zanna, 1985), 4 (Schoorman et al., 1996), or even 10 (Butler, 1991) factors. Despite this variation, Colquitt, Scott, and LePine (2007) were able to meta-analyze the literature because researchers explicitly defined/operationalized trust, providing strong support for the validity of a 4-factor structure. However, meta-analyses are only as accurate as the data they integrate. Considering the recent uptick in meta-analytic research (e.g., Gully et al., 2002; Mesmer-Magnus & DeChurch, 2009; Stajkovic et al., 2009), this can be particularly problematic.
Generating Items and Collecting Data
Generating/selecting a measurement tool and collecting data are no easy tasks. However, these tasks can be made easier—at least more efficient—if several factors beyond simple construct clarity are taken into consideration. These factors largely relate to the contextual nature of emergent states, whose meaning can shift slightly depending on (a) the type of entity that the construct references, (b) the size of the collective, and (c) the type of collective task.
Construct referent
When developing emergent state measures, the referent should be clarified, because construct meaning can vary accordingly (e.g., by tasks, groups, people). For instance, cohesion and psychological safety within a long-standing group may mean qualitatively different things than in larger, more nebulous groups (Edmondson, 2004). Similarly, trust operates differently when referring to different trustees; research has found meaningfully different trust effects when accounting for the referent (Colquitt et al., 2007; Robertson, Gockel, & Brauner, 2013). In a similar manner, the referent should match the intended level of performance measurement. That is, when examining the relationship between collective efficacy and performance in an eight-person military squad, both collective efficacy and performance should be measured at the team level. This is evidenced by Gully et al.’s (2002) meta-analysis, which found stronger relationships between both team efficacy and potency and performance at the team level as opposed to the individual level. That is, when measuring emergent states as explicitly team level (with team as the referent), these perceptions tend to predict team-level performance more so than individual-level performance. Some researchers even go so far as to suggest that emergent states, specifically team cognition, should only be measured at the team level and, contrary to the majority of past research, in real time from team interactions (Cooke et al., 2013).
Collective size
Traditional guidance within the literature suggests that researchers should avoid common method bias by triangulating measurement. Although this is foundational science, adherence may be difficult depending on context, such as the size of the collective. Our review revealed variability in the formats available for measuring cognitive states. However, these measurement tools are typically time-consuming to develop and implement. For example, measurement complexity increases as the size of the collective to be measured increases. In larger collectives, methods that require more laborious data aggregation (e.g., establishing consensus via intraclass correlation [ICC], rwg, rwg(j)) are less practical. Moreover, the size of the collective can also affect data collection. Austin’s (2003) measure of TMS, for instance, is constructed on a matrix comprising team size and the number of skills identified. For example, a team of 4 members and 4 skills requires 16 ratings per member, whereas a team of 25 members and 4 skills requires 100 ratings per member. There is less variation and difficulty associated with non-cognitive emergent states; however, development of these measures can be time-intensive due to their task-specific nature. In short, measurement selection depends in great part on the size of the collective being measured, with easier to develop, distribute, complete, and assess methods (e.g., questionnaires, scenario-based measures) being best suited for both larger and smaller groups and more cumbersome methods (e.g., observations, card-sorting) being better suited for smaller groups. Difficulties associated with emergent state measurement have driven the development of innovative, non-obtrusive measures such as social network analysis (SNA), sociometric badges, vocal recognition, content analysis, and archival data analysis. 
For example, SNA has recently been used to measure SMM (Avnet & Weigel, 2013) and cohesion (Wise, 2014), while content analytic approaches have been used to measure cohesion (Gonzales, Hancock, & Pennebaker, 2010) and collective cognition (Clariana & Wallace, 2007).
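The rating burden described above for matrix-style TMS measures can be made concrete with a little arithmetic. The following is a minimal sketch, assuming (as the 4-member, 4-skill example implies) that each member rates every team member, including themselves, on every skill. The function names are hypothetical, and the team-wide total is an extrapolation for illustration rather than a figure from Austin (2003).

```python
def ratings_per_member(team_size: int, n_skills: int) -> int:
    """Ratings one member provides if every member is rated on every skill."""
    return team_size * n_skills


def total_ratings(team_size: int, n_skills: int) -> int:
    """Ratings collected across the whole team (assumption: all members respond)."""
    return team_size * ratings_per_member(team_size, n_skills)


for size in (4, 25):
    # 4 members, 4 skills  -> 16 ratings per member
    # 25 members, 4 skills -> 100 ratings per member
    print(size, ratings_per_member(size, 4), total_ratings(size, 4))
```

Because the per-member burden grows linearly with team size while the team-wide total grows quadratically, measures of this form become impractical quickly as collectives get larger.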
Differences in task
Beyond demonstrating alignment between referent and intended performance source, emergent states with significant task implications should be measured such that unique task components are reflected (cf. Bartram, 2005). Collective efficacy, for example, is often defined as collective belief that specific levels and components of task performance can be attained (Blecharz et al., 2014; Heuze et al., 2006; Stajkovic et al., 2009). Moreover, group cohesion partially depends on the extent to which group members are attracted to specific group tasks (Carron & Brawley, 2000; Mullen & Copper, 1994). Teams can develop mental models about specific subtasks within a larger performance environment (McComb, Kennedy, Perryman, Warner, & Letsky, 2010). Accordingly, team task analyses should be conducted prior to emergent state measurement (Mohammed et al., 2010), making emergent state measurement relatively context-specific, and not easily adaptable to other domains (or even different tasks in similar domains). However, there have been steps to mitigate this difficulty. Webber, Chen, Payne, Marsh, and Zaccaro (2000) developed and validated a measure of strategic team mental models that can be easily adapted to construct a generic measure of team mental models.
Team task may also guide appropriate measurement technique. For example, team tasks may be characterized by declarative, procedural, and/or strategic knowledge. Mohammed and colleagues (2010) have shown recently that certain SMM measurement techniques are more effective for assessing different types of knowledge content (Cannon-Bowers, Tannenbaum, Salas, & Volpe, 1995; Webber et al., 2000). In addition, DeChurch and Mesmer-Magnus (2010a) demonstrated that compositional (e.g., SMM) and compilational cognition (e.g., transactive memory) were differently predictive of performance depending on team task type and emergent state conceptualization. Relatedly, task interdependence affects how emergent states operate. For example, stronger performance relationships have been found between cohesion (Barrick, Bradley, Kristof-Brown, & Colbert, 2007) and team efficacy (Gully et al., 2002) when interdependence was high. Staples and Webster (2008) demonstrated a stronger trust–performance relationship when interdependence was low. DeChurch and Mesmer-Magnus (2010b) found that moderate and high task interdependence benefited compilational cognition (e.g., transactive memory), but the cognition–performance relationship was stronger for compositional cognition (e.g., SMM) when interdependence was moderate as opposed to high. Consequently, researchers and practitioners should pay close attention to the type of task(s) and the level of task interdependence under investigation because this can serve as a decision aid when selecting emergent state measures.
Aggregating Measures to the Collective
Having addressed issues of construct clarity and operationalization, the researcher must consider questions of aggregation and level of analysis. Assuming the construct is truly group level, the researcher must still determine how it should be conceptualized and analyzed at the group level. Various researchers (Chan, 1998; Chen, Bliese, & Mathieu, 2005; Kozlowski & Chao, 2012; Kozlowski & Klein, 2000) have discussed theoretical elements and best practices for measuring and modeling multilevel constructs. Although a complete review of the tenets of multilevel theory is beyond the scope of this article, a brief synopsis is appropriate. One of the most basic tenets is that multilevel constructs (of which emergent states are a class) must have similar manifestations at individual and collective levels of analysis, though this similarity may range from loosely metaphoric to essentially identical (Chen et al., 2005). Related to this is whether the emergent state construct should be conceptualized as shared or configural. Shared and configural properties both emerge from individual team members, though shared constructs are similarly perceived by all members while configural properties are unshared. Shared constructs are operationalized at the group level through mean or average levels, while configural conceptualizations are concerned with dispersion and structure (e.g., dispersion, patterns; Kozlowski & Klein, 2000). With these questions in mind, we focus this portion of our review on current and future issues in emergent state aggregation.
Shared conceptualizations
In our review, we noticed that the overwhelming majority of non-cognitive emergent states were conceptualized as shared. That is, individual perceptions were aggregated to the collective level using additive or mean models (and typically, checking for sharedness with rwg and/or ICC indices). This is common practice when measuring and aggregating the majority of non-cognitive emergent states such as trust (Burke, Sims, Lazzara, & Salas, 2007), collective efficacy (Stajkovic et al., 2009), psychological safety (Edmondson, 1999; May, Gilson, & Harter, 2004), and team climate (M. Baer & Frese, 2003; Bain, Mann, & Pirola-Merlo, 2001). Certain aspects of cognitive emergent states—such as team mental model (TMM) accuracy (Smith-Jentsch, Cannon-Bowers, Tannenbaum, & Salas, 2008) and TMS specialization, credibility, and coordination (Lewis, 2003)—have been assessed with shared/compositional models by averaging the degree to which an individual team member’s mental model overlaps with that of an expert’s. Overall, shared models are most appropriate when (a) units hold relatively homogeneous views on the construct of interest, (b) there are no substantial subgroups/faultlines, and (c) the construct of interest has little to no meaning at the dyadic level (Chan, 1998; Cole, Bedeian, Hirschfeld, & Vogel, 2011; Kozlowski & Klein, 2000). This emphasis on sharedness when modeling emergent states represents a continuing trend noted by researchers throughout at least the past decade (Chan, 1998; Cole et al., 2011; Klein & Kozlowski, 2000). Conceptual advances have been made in the past decade, as researchers and theoreticians discuss issues of temporality (Kozlowski & Chao, 2012), agreement indices (Lance, Butts, & Michels, 2006), and accounting for level and sharedness simultaneously (Cole et al., 2011), yet we did not see much research actually applying this work.
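As an illustration of the sharedness checks mentioned above, the sketch below computes ICC(1) from a one-way analysis of variance with team as the grouping factor. It assumes equal team sizes and uses the common form ICC(1) = (MSB − MSW) / (MSB + (k − 1) × MSW), where k is team size. The function name and the example data are hypothetical, and this is not presented as the procedure used in any of the reviewed studies.

```python
from statistics import mean


def icc1(teams):
    """ICC(1) for equally sized teams, from one-way ANOVA mean squares."""
    k = len(teams[0])
    if any(len(team) != k for team in teams):
        raise ValueError("this sketch assumes equal team sizes")
    n_teams = len(teams)
    grand_mean = mean([score for team in teams for score in team])
    team_means = [mean(team) for team in teams]

    # Between-team and within-team sums of squares
    ss_between = k * sum((m - grand_mean) ** 2 for m in team_means)
    ss_within = sum(
        (score - m) ** 2 for team, m in zip(teams, team_means) for score in team
    )

    ms_between = ss_between / (n_teams - 1)
    ms_within = ss_within / (n_teams * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Three hypothetical teams whose members largely agree with one another
cohesion = [[4, 4, 5], [2, 2, 1], [3, 3, 3]]
print(round(icc1(cohesion), 2))
```

Values near 1 indicate that most of the variance lies between teams. Values at or below 0 indicate that team membership explains little of the variance in individual responses, which would undercut a mean-based aggregate.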
When modeling shared emergent states, researchers typically justify aggregation by assessing within-team agreement. Recently, new agreement indices have been proposed, such as within-group agreement (awg, Brown & Hauenstein, 2005), absolute deviation (ADm, Cohen, Doveh, & Nahum-Shani, 2009), team-specific agreement (rrg, Biemann, Ellwart, & Rack, 2014), and group dissimilarity (Solanas, Manolov, Leiva, & Andres, 2013). One study even used Cohen’s kappa as an agreement index (Rau, 2006). Roberson, Sturman, and Simons (2007) compared multiple agreement indices, finding that each performed differently depending on the criterion. Despite the availability and diversity of multiple agreement indices, research typically simply justifies aggregation by appealing to rwg cutoffs. Indeed, this defaulting to arbitrary cutoffs continues to occur, despite recent advances in this very area that address some of the most problematic issues (Lance et al., 2006). Rather, the more appropriate practice would be to use significance testing to determine 95% critical values for setting agreement cutoffs (Dunlap, Burke, & Smith-Crowe, 2003; Lebreton, James, & Lindell, 2005). Despite this, of the 259 emergent state articles reviewed, only 1 (Wholey et al., 2011) mentioned using the critical value approach to determine cutoff scores. This is not to claim that the critical value approach has not made an impact—a quick Google Scholar search shows 135 references to the Dunlap and colleagues article; but the majority of empirical studies were on organizational climate (e.g., McKay, Avery, & Morris, 2009), team processes (e.g., Vecchio, Justin, & Pearce, 2010), or static team characteristics (Murphy, Cronin, & Tam, 2003; Van Mierlo, Rutte, Vermunt, Kompier, & Doorewaard, 2006).
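To make the contrast between fixed cutoffs and critical values concrete, the following sketch computes single-item rwg against a uniform null distribution, an average-deviation index, and a simulated 95% critical value. The simulation is only in the spirit of the critical value approach: it assumes random responding is uniform across response options, and the function names, the number of simulations, and the percentile rule are illustrative assumptions rather than the published procedure.

```python
import random
from statistics import mean, variance


def rwg(ratings, n_options):
    """Single-item r_wg: 1 minus observed variance over uniform-null variance."""
    null_variance = (n_options ** 2 - 1) / 12  # discrete uniform on 1..n_options
    return 1 - variance(ratings) / null_variance  # variance() uses n - 1


def ad_mean(ratings):
    """Average absolute deviation of ratings from the team mean."""
    center = mean(ratings)
    return mean([abs(x - center) for x in ratings])


def rwg_critical_value(team_size, n_options, n_sims=2000, level=0.95, seed=0):
    """Simulated r_wg value exceeded by only (1 - level) of random-response teams."""
    rng = random.Random(seed)
    simulated = sorted(
        rwg([rng.randint(1, n_options) for _ in range(team_size)], n_options)
        for _ in range(n_sims)
    )
    return simulated[min(int(level * n_sims), n_sims - 1)]


team = [4, 4, 5, 4, 5]  # hypothetical five-person team, 5-point scale
print(round(rwg(team, 5), 2))   # 0.85
print(round(ad_mean(team), 2))  # 0.48
# Compare observed agreement with a team-size-specific critical value
print(rwg(team, 5) > rwg_critical_value(team_size=5, n_options=5))
```

Note that rwg falls below 0 whenever observed variance exceeds the null variance, as in a polarized team such as [1, 1, 5, 5, 1, 5]. That pattern is a signal of potential subgroups rather than merely a failure to justify aggregation.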
Several articles used (or appeared to use) strictly additive models (i.e., aggregating to the collective with mean indices without referencing any check for agreement). Additive models are appropriate if within-group variance is irrelevant (Chan, 1998; Kozlowski & Klein, 2000); this may be the case in loosely interdependent collectives (Molleman, 2009; Saavedra, Earley, & Van Dyne, 1993; Steiner, 1972). Most emergent state research that does not report agreement studied loose collectives (e.g., neighborhoods) where there was no specific collaborative task (e.g., Sherrieb, Norris, & Galea, 2010; Tendulkar, Koenen, Dunn, Buka, & Subramanian, 2012). However, we also noticed several studies that failed to mention agreement indices even though they studied collectives with highly interdependent structures and tasks (e.g., non-profit boards, various therapy groups). It would be helpful, given that the common practice is to clearly explicate the rationale and method for aggregation, for future researchers to clearly point out why they chose to use an additive model, especially when the task is somewhat interdependent.
Configural conceptualizations
Configural conceptualizations are important to consider when the presence of subgroups or any other systematic variations in related constructs elicit non-normal (e.g., bimodal) distributions of the focal construct (cf. Alexandrov, Babakus, & Yavas, 2007; Cole et al., 2011; Murrell & Gaertner, 1992). Kozlowski and Klein (2000) noted that “compilation-based emergent processes are relatively little explored from a multilevel perspective” (p. 18). Nearly 15 years later, this is mostly still true. Notable exceptions include research on TMM similarity and TMS, which are consistently modeled configurally (e.g., Austin, 2003; Ellwart, Konradt, & Rack, 2014; Smith-Jentsch, Kraiger, Cannon-Bowers, & Salas, 2009; Swaab, Postmes, Neijens, Kiers, & Dumay, 2002). This is not entirely surprising, as TMM similarity and TMS dispersion/patterning are inherently meaningful, not just a prerequisite for aggregation (Chan, 1998; Mohammed et al., 2010). Beyond TMM similarity, other emergent states may be appropriately modeled through dispersion under certain conditions, though these seem to be less frequently researched (e.g., Goddard, 2001; Sorensen & Stanton, 2011). Emergent states such as cohesion, psychological safety, and trust seem to be underrepresented with configural conceptualizations, possibly because theory tends to label some emergent states as inherently compositional and others as compilational (Kozlowski & Chao, 2012). Indeed, it may be time for researchers to incorporate configural indices rather than simply discarding low agreement teams, as is fairly common practice (e.g., Aryee, Chen, & Budhwar, 2004; Rentsch & Klimoski, 2001; Susskind, Kacmar, & Borchgrevink, 2003). Unfortunately, discarding such teams comes with problems of its own (Carron et al., 2004; Cole et al., 2011).
By focusing only on teams that achieve a certain level of sharedness, researchers assume, for example, that trust does not exist in teams that lack this sharedness, even though it may be present within subgroups without there being a consistent level of generic team trust. In some studies we reviewed, researchers acknowledged low levels of agreement, but opted to keep all teams in, explaining that removing teams reduces power. This is preferable, but when significant disagreement exists, researchers should consider incorporating configural conceptualizations into their research models.
An anonymous reviewer noted that one reason for this underrepresentation may be that dispersion indices are more heavily impacted by missing data than are means. Newman and Sin (2009) suggested several strategies for correcting measures of within-team agreement when data are missing. We refer the reader to their research for an in-depth discussion and formulae for correcting for missing data. Cole and colleagues (2011) argued that researchers should begin including both mean and dispersion indices in multilevel models. We echo this sentiment (and refer the reader to their work for specifics on how to include both). Finally, we emphasize that a theoretically appropriate dispersion index should be selected (see above).
Shared or configural?
Despite the widespread use of shared/consensus aggregation techniques, researchers emphasize that this approach should not be utilized without a strong theoretical rationale (Burke et al., 2007; Chan, 1998; Dion, 2000). Deciding when to conceptualize an emergent state as shared or configural is a difficult task, and there do not seem to have been significant advances in the field (Burke et al., 2007; Chan, 1998; Dion, 2000; Gibson, Randel, & Earley, 2000). However, our review highlights factors—including task characteristics, team structure, and construct effects—that recent research has identified as being important for multilevel modeling. Gully and colleagues’ (2002) meta-analysis found that collective efficacy was more strongly related to performance when teams were more interdependent, suggesting that mean aggregation may be more appropriate when interdependence is higher. Conversely, in more configural interdependence structures, such as conjunctive tasks, the entire team may perform more poorly if just one team member performs poorly due to low perceived levels of an emergent state (Klein & Kozlowski, 2000; Saavedra et al., 1993). This would then mean that the minimum level of said construct would be its most meaningful collective index (maximum indices would, conversely, be relevant if the highest level of a variable is most relevant for performance; for example, Ng & Van Dyne, 2005). For further insight on incorporating these indices into multilevel models, see Harrison and Klein’s (2007) discussion of team disparity.
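The choice among mean, minimum, maximum, and dispersion indices discussed above can be sketched in a few lines. The mapping below is illustrative only. The model labels are hypothetical names, and the use of the population standard deviation as the dispersion index is an assumption made for the example rather than a recommendation drawn from the cited work.

```python
from statistics import mean, pstdev

# Hypothetical mapping from aggregation model to team-level index
AGGREGATORS = {
    "mean": mean,          # shared conceptualization
    "minimum": min,        # weakest member's perception matters most
    "maximum": max,        # strongest member's perception matters most
    "dispersion": pstdev,  # configural: spread of perceptions within the team
}


def team_index(scores, model):
    """Collapse individual scores to one team-level value under a given model."""
    return AGGREGATORS[model](scores)


# Hypothetical team in which one member reports very low collective efficacy
efficacy = [5, 5, 5, 1]
for model in AGGREGATORS:
    print(model, round(team_index(efficacy, model), 2))
```

For this team the mean is 4 while the minimum is 1, so the two indices would lead to very different conclusions about the team's standing on the construct. This is why the index should follow from theory about the task rather than from convention.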
Obviously, the nature of the emergent state construct itself determines how it should be aggregated. This is clear with TMS and SMM, which are primarily concerned with assessing the degree of sharedness (Austin, 2003; Mesmer-Magnus & DeChurch, 2009); as such, most reviewed studies that used configural indices assessed cognitive emergent states (Austin, 2003; Smith-Jentsch et al., 2009; Sorensen & Stanton, 2011; Swaab et al., 2002). However, certain emergent states may necessitate different aggregation techniques, depending on their operationalization. For example, Carron and colleagues (2004) studied cohesion using different rwg cutoffs, finding that more lenient cutoffs reduced the cohesion–performance relationship when operationalized more individually; when operationalized more collectively, stringent cutoff scores increased this relationship.
Complex conceptualizations
Ultimately, a major difficulty with multilevel research is that whether one chooses a shared/compositional or configural/compilational approach to modeling multilevel data, some information is lost at higher levels of analysis. To circumvent these issues, researchers have proposed several more complex solutions, including hierarchical linear modeling (HLM), network analysis, and consensus-dispersion models. HLM addresses several problems commonly present in multilevel research. It accounts for multicollinearity within aggregates, deals with heteroscedasticity due to uneven group numbers, can accommodate missing data at Level 1, and tests hypotheses at the aggregate level (Gill, 2003; Wright & Benson, 2011). There are some difficulties associated with HLM, such as requiring larger sample sizes, but this can be circumvented to an extent by sampling more groups with fewer individuals per group (Scherbaum & Ferreter, 2009; Woltman, Feldstain, MacKay, & Rocchi, 2012). This is particularly germane for field researchers, but may also be something to consider when designing and conducting multilevel studies. To determine whether HLM has been gaining in popularity since 2000, we conducted a targeted literature search using the same construct search terms as in our broader searches, but adding HLM and hierarchical linear, yielding 118 articles ranging in publication date from 1979 to 2014. There has been an exponential increase in published articles, with articles being published at rates of 0.7/year (1979-2002), 4.57/year (2003-2008), and 11.33/year (2009-2014). Recent research has used HLM techniques to assess the multilevel effects of collective efficacy (Bayazit & Mannix, 2003; Dithurbide, Sullivan, & Chow, 2009), cohesion (Cohen, Ben-Tura, & Vashdi, 2012; Fullagar & Egleston, 2008), psychological safety (Idris, Dollard, Coward, & Dormann, 2012), and transactive memory (Yuan, Carboni, & Ehrlich, 2014; Yuan, Fulk, Monge, & Contractor, 2010). 
Interestingly, the majority of articles identified as leveraging HLM studied collective efficacy or cohesion (i.e., other emergent states were either rare or absent). One limitation of HLM is that researchers are still left to determine whether to conceptualize group-level variables with either consensus or configural operationalizations, and the majority of HLM studies continue to justify aggregation with rwg indices (e.g., Bayazit & Mannix, 2003; Cohen et al., 2012; Idris et al., 2012).
Researchers have also argued for the adoption of network models when conducting multilevel research (Crawford & Lepine, 2012; Murase, Doty, Wax, DeChurch, & Contractor, 2012). Network models are compilational techniques for measuring team variables that assess interrelationships between all individuals in a given team (Crawford & Lepine, 2012; Murase et al., 2012). It should be noted that much research on social networks focuses on the network itself (i.e., general connectedness between individuals within a collective), and may even explore how the network influences a mean aggregated emergent state (e.g., Espinosa & Clark, 2014; Tirado, Hernando, & Aguaded, 2012; Zhong, Huang, Davison, Yang, & Chen, 2012). We see this as a missed opportunity for exploring the structural dynamics of the emergent state; indeed, network analysis techniques can be utilized to study emergent states by modeling individual-level constructs and assessing similarities within all dyadic connections (see Espinosa & Clark, 2014; Salmon et al., 2009; Walker et al., 2009). Social network analysis (SNA) enables researchers to measure “structures and systems that would be nearly impossible to describe without relational concepts,” and allows for “the testing of hypotheses about the networks’ structural properties” (Comu, Iorio, Taylor, & Dossick, 2013, p. 298). Espinosa and Clark (2014) illustrated the importance of SNA to modeling cognitive emergent states, noting that when “team knowledge constructs [are] more complex . . . simple averages provide an incomplete picture” (p. 333). Resick and colleagues (2010) showed that a network approach to modeling team cognition predicted performance better than did other metrics of team cognition.
In our review, we noted a recent uptick in articles using network analyses to study emergent states such as team trust (Lusher, Kremer, & Robins, 2014), cohesion (Tirado et al., 2012; Wise, 2014; Zaheer & Soda, 2009), affective climate (Yuan et al., 2014), team mental models (Avnet & Weigel, 2013; Dionne, Sayama, Hao, & Bush, 2010), TMS (Comu et al., 2013; Espinosa & Clark, 2014), and situational awareness (Sorensen & Stanton, 2011). Network operationalizations are most relevant when a construct may have meaningful intradyadic variance, such that the felt presence of a given emergent state may differ from dyad to dyad. Finally, although a full review of the nuances of SNA is outside the scope of this work, it is worth mentioning m-slices, a specific, little-used SNA technique (Rodríguez, Sicilia, Sánchez-Alonso, Lezcano, & García-Barriocanal, 2011). This technique can complement SNA by identifying clusters of related perceptions within a social network. Although Rodríguez and colleagues (2011) used m-slicing to identify interest areas in an e-learning environment, applying this technique to the measurement of emergent states such as cohesion and mental models is appealing.
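The dyadic logic described above, in which individual-level perceptions are compared across every pair of members, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of any study cited: the similarity measure (one minus the scaled absolute difference), the tie threshold, and the member names and ratings are all hypothetical.

```python
from itertools import combinations


def dyadic_similarity(ratings, scale_min, scale_max):
    """Map each unordered pair of members to a similarity score in [0, 1].

    ratings -- dict of member name -> rating of the emergent state
    """
    span = scale_max - scale_min
    return {
        (a, b): 1.0 - abs(ratings[a] - ratings[b]) / span
        for a, b in combinations(sorted(ratings), 2)
    }


def agreement_density(similarities, threshold):
    """Share of dyads whose similarity meets or exceeds the threshold."""
    ties = [pair for pair, score in similarities.items() if score >= threshold]
    return len(ties) / len(similarities)


# Hypothetical four-person team rating trust on a 1-5 scale.
trust = {"ana": 5, "ben": 5, "cai": 2, "dee": 1}
sims = dyadic_similarity(trust, scale_min=1, scale_max=5)
for pair, score in sims.items():
    print(pair, score)
print(agreement_density(sims, threshold=0.75))
```

In this example the team mean sits near the scale midpoint, yet the dyad-level scores show two internally similar pairs with little similarity between them, which is the kind of structure a simple average conceals.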
Finally, Cole and colleagues (2011) recently argued for the use of consensus-dispersion models. Although a full summary of this work is not appropriate here, they essentially outline a methodology for simultaneously modeling consensus (i.e., mean) and configural (i.e., dispersion) effects, while also accounting for multicollinearity between means and dispersion. Unfortunately, Cole and colleagues’ work does not yet appear to be widely cited: a Google Scholar search of articles citing this work returned only 15 hits, and only one of these dealt with an emergent state (trust). In that study, De Jong and Dirks (2012) found that mean trust, trust dispersion, and their interaction term all significantly predicted team performance.
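To make the three predictors in such a model concrete, the sketch below builds a team-level mean, a dispersion score, and their product term from member ratings. It is not Cole and colleagues' procedure: grand-mean centering before forming the product is a common general-purpose step swapped in here for illustration, and the function name and team data are hypothetical.

```python
from statistics import mean, stdev


def consensus_dispersion_terms(teams):
    """Build centered predictors for each team.

    teams -- dict of team id -> list of member ratings
    Returns a dict of team id -> (centered mean, centered SD, product term).
    """
    levels = {t: mean(r) for t, r in teams.items()}     # consensus component
    spreads = {t: stdev(r) for t, r in teams.items()}   # dispersion component
    grand_level = mean(list(levels.values()))
    grand_spread = mean(list(spreads.values()))

    terms = {}
    for t in teams:
        m = levels[t] - grand_level
        d = spreads[t] - grand_spread
        terms[t] = (m, d, m * d)                         # interaction of level and spread
    return terms


# Hypothetical trust ratings from three teams.
teams = {"t1": [4, 4, 4], "t2": [2, 3, 4], "t3": [1, 3, 5]}
for team, (m, d, md) in consensus_dispersion_terms(teams).items():
    print(team, round(m, 3), round(d, 3), round(md, 3))
```

The three values per team would then enter a team-level regression on performance; teams t2 and t3 share a mean but differ in dispersion, so only the second and third terms distinguish them.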
Incorporating Time Into Emergent State Models
Kennedy and McComb (2010) noted that “little is known about how the [team cognition] convergence process occurs in a team domain” (p. 340). Similar sentiments have been echoed by other researchers (Costas et al., 2013; Kozlowski & Chao, 2012; Roe, Gockel, & Meyer, 2012). We therefore conducted targeted literature searches for research exploring emergent states from a temporal or longitudinal perspective. To do this, we added the terms longitudinal, curve modeling, growth curve, and over time to our set of broader search terms; from these hits, we identified 44 additional articles that in some way discussed temporal issues in relation to emergent states. Our goal was partly to validate past researchers’ claims regarding the inadequacy of temporal research, but also to identify recent advances and to highlight where the field needs to go. Although there is certainly a lack of research on temporal issues and emergent states, we highlight a few key findings here relevant to emergent state research and measurement.
Convergence over time
Arthur, Bell, and Edwards (2007) found support for the hypothesis that within-team agreement on measures of collective efficacy should increase, especially when using referent-shift measures. Their argument, which applies to all emergent states, was that “continued interaction among team members provides a basis for which the team members can better estimate” (Arthur et al., 2007, p. 39) the presence of an emergent state. Growing convergence over time was also evidenced in other studies (Dunlop, Falk, & Beauchamp, 2013; Goncalo, Polman, & Maslach, 2010; Hommes et al., 2014; Kanawattanachai & Yoo, 2007; Lee, Zhang, & Yin, 2011). Accordingly, the general consensus in the literature seems to be that teams do trend toward agreement over time. However, Kozlowski, Ployhart, and Lim (2010, cited in Kozlowski & Chao, 2012) measured teams consistently (using experience sampling) over an 8-week period and found that some teams converged toward common cohesion perceptions, while others converged then diverged cyclically.
Timing of measurement
The general consensus from our reviewed articles is that it takes time for emergent states to develop and converge. Nonetheless, a plethora of lab studies exist that examine emergent states in ad hoc teams, and these short-lived teams continue to demonstrate acceptable agreement indices. Taken together, it seems as if emergent states might indeed exist, at least in some form, early in group life. Should emergent state researchers really worry about time if they can find decent convergence early on? Empirical and conceptual work may shed some light on this issue. Bradley, Baur, Banford, and Postlethwaite (2013) looked at cohesion in teams spanning 4 months, finding that later cohesion was more strongly linked to performance than was cohesion measured earlier. Siebold (2006), reviewing years of military cohesion research, noted that cohesion tends to be volatile (and down trending) early in group life, and only stabilizes much later. Kanawattanachai and Yoo (2007) found that it took several weeks for TMS to develop, but once it was developed, it was stable and was a strong predictor of team performance.
Conceptually, even when agreement is reached, it is intuitive that emergent states may be qualitatively different later on in the team’s life, even if numerical indicators remain constant. That is, moderate levels of cohesion likely mean quite different things in teams existing for 3 hours as opposed to 3 years. Indeed, Chiocchio and Essiembre (2009) argued that teams need to interact for at least 4 weeks before cohesion can truly emerge, meaning that studies that measure cohesion in ad hoc, short-term teams may not actually be assessing cohesion (despite convergence). Furthermore, the existence of swift versions of emergent states such as cohesion (Meyerson, Weick, & Kramer, 1996) and psychological safety (Dufresne, 2013) suggests that constructs measured early on in collective life may be qualitatively different from the same construct measured at a later period in team development. For example, Arthur and colleagues (2007) found that after accounting for interim performance, only initial measures of collective efficacy predicted final performance. In other words, pre-task collective efficacy was meaningfully different from collective efficacy in situ (which was essentially equivalent to teams’ actual ongoing performance).
Recommendation 7a: Account for the effects of time in emergent state research, understanding that teams tend to progress toward convergence, and that findings from a group in one phase may not generalize to groups in other phases.
New temporal constructs
Recently, DeRue, Hollenbeck, Ilgen, and Feltz (2010) have argued that another component of team-level conceptualization should be the trajectory of emergent states over time; that is, teams with similar means and dispersions of a construct might experience said construct in different ways, if one is moving toward greater convergence while the other experiences growing divergence. Li and Roe (2012) showed that incorporating trajectory indices into regression models adds significant predictive power. Quintane, Pattison, Robins, and Mol (2013) showed that time horizon may influence the nature (and appropriate measurement strategy) of cohesion. Specifically, they note that in teams with a more distal time horizon, closure and reciprocity (typical SNA indices associated with cohesion) more commonly occur, while in teams with a shorter time frame, adaptation processes are more prevalent. This suggests that time horizon, and perhaps perceived time horizon (see Molleman, 2009), may be an important construct to consider when theorizing about and measuring emergent states.
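One simple way to express a trajectory numerically is the slope of a team's mean, and of its within-team dispersion, across measurement waves. The sketch below uses an ordinary least squares slope for both; this is an assumed operationalization for illustration, not the specific index used by DeRue and colleagues or by Li and Roe, and the function names and ratings are hypothetical.

```python
from statistics import mean, stdev


def ols_slope(times, values):
    """Ordinary least squares slope of values regressed on times."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def trajectories(waves):
    """Summarize one team's change over time.

    waves -- list of (time, member ratings) tuples for a single team
    Returns (slope of team mean, slope of within-team SD).
    """
    times = [t for t, _ in waves]
    means = [mean(r) for _, r in waves]
    sds = [stdev(r) for _, r in waves]
    return ols_slope(times, means), ols_slope(times, sds)


# Two hypothetical teams that look identical at wave 2 (mean 3, SD 1).
converging = [(1, [1, 3, 5]), (2, [2, 3, 4]), (3, [3, 3, 3])]
diverging = [(1, [3, 3, 3]), (2, [2, 3, 4]), (3, [1, 3, 5])]
print(trajectories(converging))
print(trajectories(diverging))
```

A negative dispersion slope indicates movement toward convergence and a positive slope indicates growing divergence, so the two teams are separated by trajectory even though a single cross-sectional measurement at the middle wave would treat them as equivalent.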
Phases in emergent state development
Roe and colleagues (2012) reviewed team process research and concluded that researchers tend to acknowledge the importance of temporality while only examining differences between teams at different points in time. This is an important distinction from truly temporal research, which would study differences within teams across time. In our review, we also noticed that while longitudinal research is growing in frequency, it tends to focus on two time points that are rarely separated by more than a few months. This research design does little to tell us about the evolution of emergent states, and is typically more focused on whether a given psychological state at one time influences another variable at another time (e.g., Allen, Jones, & Sheffield, 2009; Blecharz et al., 2014; Chen et al., 2005; Hirak, Peng, Carmeli, & Schaubroeck, 2012). Even when research is conducted over the course of an extended period of time, the evolution of the emergent state tends not to be the focal point (e.g., Bradley et al., 2013; Brahm & Kunze, 2012; H. W. Chou, Lin, & Chou, 2012). These research designs inherently assume that the construct being measured at multiple points in time is qualitatively the same, which is problematic.
Various researchers have argued that emergent states develop in a phase/process manner (Langan-Fox, Anglim, & Wilson, 2004). Most process-based theories of emergence argue that phases involve the following: (a) Team members orienting themselves with each other; (b) gathering information about team members (e.g., roles, trustworthiness); and, finally, (c) compiling construct-relevant information after extended team interaction. These phases substantially overlap existing team development theories (Kozlowski, Watola, Jensen, Kim, & Botero, 2009; Tuckman & Jensen, 1977). Because teams and multilevel research typically discuss team development phases conceptually (if at all), we did not notice any articles that empirically showed state emergence across specific development phases. That notwithstanding, we offer a few thoughts on applying a development framework when thinking about measuring emergent states.
At early phases in team development, team members become oriented to each other and the task (Kozlowski et al., 2009); they attain a base level of interpersonal and task knowledge but are also often characterized by disagreements (Tuckman & Jensen, 1977). Social bonding, interpersonal learning, and task practice opportunities occur, which yield initial levels of social cohesion (Siebold, 2006) and team cognition (Kanawattanachai & Yoo, 2007). Convergence tends to be lower in earlier phases, making strict consensus/compositional models inappropriate. This means that accounting for dispersion (either through dispersion-consensus or configural conceptualizations) may be more important than at other phases.
As teams persist, tasks, roles, and overall team identity (e.g., norms) are clarified. Team communication/clarification processes yield a common cognitive framework for social and task interactions (Kozlowski et al., 2009). These processes also drive the development of other non-cognitive emergent states, such as psychological safety (e.g., Alavi & McCormick, 2008; Bradley et al., 2013; Brahm & Kunze, 2012; Hommes et al., 2014). Although means and dispersion indices will of course be important here, we suggest that trajectory (DeRue et al., 2010; Li & Roe, 2012) is particularly important during these phases. The literature seems to suggest that emergent states are either most predictive early (e.g., Arthur et al., 2007) or later in team life, before performance is measured (Goncalo et al., 2010; Salanova, Rodríguez-Sanchez, Schaufeli, & Cifre, 2014). Initial volatility in team perceptions would render trajectory indices likely unreliable, but trajectory assessed at middle phases could be used to predict final levels of the emergent state. This may be especially helpful in fast-paced teams where data collection is difficult immediately prior to task performance.
Our review indicates a consensus that emergent states typically move toward agreement over time, meaning that emergent state changes at later phases should be relatively small. Funk and Kulik (2012) highlighted key characteristics of late-stage groups including behavioral stability and an aversion toward mental model changes; they argued that studying the social networks within these teams is essential to diagnosing their performance. And although these teams may be less likely to change, when they focus on addressing sources of low performance (Kozlowski et al., 2009; Tuckman & Jensen, 1977), emergent states may change as a result. If this happens, emergent states that were initially stable may change and not immediately converge; in fact, perceptions may oscillate between convergence and divergence as mature teams work to arrive at sustainable solutions (e.g., Kozlowski et al., 2010).
Summary
Teams and teamwork are increasingly important in modern society; accordingly, research interest in measuring emergent states has grown considerably. Yet despite the importance of these variables for predicting and improving team performance, the extant emergent state literature remains somewhat nascent. Furthermore, from our review of the emergent state literature, we noted several problematic trends.
First, constructs are frequently either not defined sufficiently or defined inconsistently across different studies (Lewis & Herndon, 2011). This obfuscates trends that may be apparent across different streams of research. It complicates theory building, because cohesion or efficacy in one domain might not mean the same thing in another domain. We urge researchers to intentionally distinguish the exact nature of their emergent state of interest and resist the temptation to infer generalizability across research domains with divergent definitions of a given construct. To do this, we suggest that researchers also understand the evolution of the construct of interest over time. Because emergent state research is relatively recent, many constructs have fluid definitions. Building theory and making inferences across studies and situations can be problematic when a construct meant something different 50 years ago. Understanding the breadth (across research domains) and depth (over time) of the construct of interest should not only enable better integration of findings across studies but also facilitate more nuanced and insightful theory building and research design.
Second, and a related problem, is the observation that there is no clear criterion for developing appropriate item-specific operationalizations of different constructs (Lewis & Herndon, 2011). This issue is especially complicated by the fact that the meaning and impact of various emergent states can change somewhat depending on the referent, size of the group, and the team’s task. Regarding team task type, we noted that different emergent states are impacted differently by task interdependence. Although we did not notice clear trends for understanding how interdependence affects specific emergent states, we encourage researchers to pay attention to this potentially moderating factor.
Third, even when items are developed or selected correctly, researchers tend to be fairly limited in the ways in which they operationalize the constructs at different levels of analysis. We encourage researchers to more consistently use complex models to represent emergent states, accounting for both emergent state level (e.g., individual, collective) and method of aggregation (e.g., sharedness, dispersion, structure). It has been at least 15 years since researchers began highlighting the importance of multilevel modeling, and the complex ways that this can happen (e.g., Chan, 1998; Kozlowski & Klein, 2000). However, team-level and multilevel models rarely incorporate configural elements in their team-level models of emergent states (see Cole et al., 2011). We have presented several recent articles that we believe can and should continue to make a strong impact on the field (e.g., Carron et al., 2004; Cole et al., 2011; Dunlap et al., 2003). These works may help researchers better conceptualize levels of analysis in their theoretical and statistical models. Doing so will facilitate more accurate and insightful multilevel models, allowing researchers to generate and answer new and important research questions, which will be increasingly important as organizations look to different kinds of teams (e.g., distributed, cross-functional, multiteam systems) to achieve objectives.
Finally, research has yet to give consistent attention to the role of temporality and the dynamic emergence of various constructs. Recently, researchers have begun developing non-obtrusive methods for measuring some emergent states, which may facilitate more frequent and less cumbersome measurement. We recommend that researchers leverage these measurement advances to further research in all areas of emergent state measurement. Furthermore, to help address the role of time, we theoretically tie Kozlowski and colleagues’ (2009) team development phases to emergent state development to suggest a few ways in which these states may shift over time.
In an effort to synthesize the literature on emergent state measurement, we have provided recommendations to what we view as the central issues of the day. These recommendations are intended to act as guideposts for researchers and practitioners alike. Practically, many of these recommendations can act as standalone best practices that can and should be implemented immediately. Some of these recommendations are best practices that have already been acknowledged and developed elsewhere in seminal works on measurement and multilevel theory (e.g., Chan, 1998; Kozlowski & Klein, 2000; Nunnally & Bernstein, 1994). Nonetheless, our review highlights that some best practices are not being consistently followed. We point to some of these inconsistent practices to help narrow the gap between where we should be as a science and where we currently are. Specifically, better abiding by these best practices will increase construct clarity, facilitate research across domains, and strengthen the validity and generalizability of findings, among other benefits. From a theoretical standpoint, these recommendations are intended to stimulate debate on emergent state measurement and act as a jumping-off point for future critical analysis and research. More research is needed on the role of agreement across different emergent states (e.g., Carron et al., 2004), the nature of various types of swift emergent states (e.g., Dufresne, 2013; Meyerson et al., 1996), and the role of time in emergent state emergence. We also encourage researchers to continue developing and using innovative ways to unobtrusively assess various emergent states. As we seek to understand the development and performance of collectives in increasingly complex environments, these methodologies will become increasingly important.
The importance of understanding emergent states will only grow as we continue to rely on teamwork to accomplish societal and organizational goals; it is therefore essential that we not only better understand these states, but that we better understand how to measure them. This work represents one step toward the goal of continuing to improve the science of emergent state measurement.
Footnotes
This article is part of the special issue: 2014 Annual Review Issue, Small Group Research, Volume 45(6).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by an Army Research Institute grant (Contract No. W5J9CQ-11-D-0002, Task Order 11-10002) to Dr. Christina Curnow of ICF International.
