Abstract
Team faultlines—hypothetical dividing lines based on member attributes that split a team into relatively homogeneous subgroups—influence team processes across contexts, as recent meta-analytic findings show. We review the available faultline measures with regard to their properties and identify several limitations, including dealing with more than two subgroups. We thus propose a new cluster-based approach, average silhouette width (ASW), that identifies the number of subgroups and subgroup membership. We then compare the measures with 1,400 simulated teams with varying properties and investigate their factor structure and their behavior under missing values. We also investigate the predictive validity of the measures with data from real work teams. Results show that different measures respond to different team features in different ways but that most of them load on two correlated factors. Taken together, the ASW measure had the most favorable attributes and was the only measure that accurately determined subgroup membership in the presence of more than two subgroups. We discuss limitations and further research opportunities pertaining to faultline measures and provide software for calculating all investigated measures at http://www.group-faultlines.org.
The labor market has become more heterogeneous: Globalization, outsourcing, increased workforce mobility, demographic changes, an increased percentage of women in previously male-dominated jobs, and an increased internationalization of education are only some of the factors contributing to the workforce heterogeneity in all areas of employment in the 21st century. As a consequence, organizations have become more diverse (van Knippenberg & Schippers, 2007).
This development has raised the question as to whether team diversity, that is, the differences among team members that may lead to the perception of being different (van Knippenberg, De Dreu, & Homan, 2004), has an impact on team processes and outcomes. For more than five decades, organizational research has investigated whether team diversity brings about positive or negative effects, mainly with regard to performance-relevant outcomes, and how the diversity-performance relationship can be influenced (for reviews, see van Knippenberg & Schippers, 2007; Williams & O’Reilly, 1998).
This large body of research has not led to conclusive results, as several meta-analyses showed (S. Bell, Villado, Lukasik, Belau, & Briggs, 2011; Bowers, Pharmer, & Salas, 2000; Guillaume, Brodbeck, & Riketta, 2012; van Dijk, van Engen, & van Knippenberg, 2012; Webber & Donahue, 2001). The effects of diversity on team processes seem to be highly contextual (Joshi & Roh, 2009) and depend on several mediating and moderating processes. In other words, there seems to be no main effect of team diversity with regard to a specific type of diversity (e.g., gender diversity) on team outcomes.
As one way of overcoming the inconclusive findings of diversity research, researchers have proposed investigating simultaneously how the distribution of several diversity attributes within teams affects team-level outcomes. This idea is captured in the construct of team faultlines, which was first introduced by Lau and Murnighan (1998). In current research, faultlines are defined as “hypothetical dividing lines that split a team into relatively homogeneous subgroups based on the team members’ demographic alignment along multiple attributes” (Thatcher & Patel, 2011, p. 1119). In contrast to the main body of diversity research, faultlines thus capture the configuration of several diversity attributes simultaneously. Therefore, two teams with the same level of diversity with regard to individual attributes can differ in the extent to which these attributes split the team into hypothetical homogeneous subgroups (for an example, see “The Conceptual Underpinnings of Team Faultlines” below).
Faultline theory assumes that the different extents to which a team is split into (hypothetical) homogeneous subgroups are associated with different dynamics and outcomes. In other words, it assumes that it is not diversity per se that affects team processes and outcomes but the way diversity is aligned within the team. Recent meta-analytic evidence (Thatcher & Patel, 2011, 2012) revealed that strong faultlines usually have detrimental main effects on important team-level outcomes such as performance, conflict, and cohesion, even if they are not perceived by team members (Thatcher & Patel, 2012). Therefore, faultlines appear to be the first construct in diversity research that is associated with a consistent main effect on important team-level outcomes across contexts.
Faultline measures quantify the extent to which a given team is split into (hypothetical) homogeneous subgroups. To our knowledge, there are eight different ways of quantifying faultlines (Barkema & Shvyrkov, 2007; Bezrukova, Jehn, Zanutto, & Thatcher, 2009; Lawrence & Zyphur, 2011; Li & Hambrick, 2005; Shaw, 2004; Thatcher, Jehn, & Zanutto, 2003; Trezzini, 2008; van Knippenberg, Dawson, West, & Homan, 2011), all of which have certain disadvantages (Thatcher & Patel, 2012). This multitude of measures makes it difficult for researchers to choose the appropriate measure for their studies. Because the measures are calculated differently and have different numeric properties, they also make it difficult to compare findings across studies. For this reason, in their review of diversity faultlines, Thatcher and Patel (2012) call for studies that compare different measures with the same data set, a call we take up in this article. In doing so, we hope to contribute to the literature by providing a comprehensive overview of the available measures with both real and simulated data sets under different conditions. This overview and the discussion that follows may provide guidance for researchers as to which faultline measure is most appropriate for their research question.
We extend this contribution by comparing the available faultline measures not only computationally but also conceptually. After reviewing the faultline literature and before turning to the computational comparison, we propose several properties that a method for calculating faultlines should have, including the ability to calculate a diversity faultline measure for a team consisting of more than two homogeneous subgroups. Because our subsequent review reveals that none of the available measures satisfies all of these properties, the second contribution of this article is a new cluster-based faultline measure, which we evaluate and compare with existing measures. Finally, we provide a package for the free open-source statistical environment R (R Development Core Team, 2012) that can calculate all of the measures that we investigate, including the new multi-subgroup measure (package asw.cluster; see http://www.group-faultlines.org). To our knowledge, this package constitutes the first comprehensive (and free) software suite for calculating faultlines with a variety of measures.
The Conceptual Underpinnings of Team Faultlines
As an illustration of the faultline construct, picture an eight-member medical team in an operating room in an American hospital. There are three young Black scrub nurses, three older Asian surgeons, and two Caucasian middle-aged anesthesiologists. Compare this team to another medical team of the same size with a young Black, a middle-aged Asian, and an older Caucasian scrub nurse; a young Caucasian, a middle-aged Asian, and an older Black surgeon; and a young Black and an older Asian anesthesiologist. If we look at the diversity attributes functional background, ethnicity, and age separately, these two teams are equally diverse (three older, three young, and two middle-aged members; three nurses, three surgeons, and two anesthesiologists; three Black, three Asian, and two Caucasian members). However, these teams differ with regard to the distribution of these diversity attributes, that is, with regard to the hypothetical split of the team. The first team has three homogeneous hypothetical subgroups based on age, ethnicity, and functional background and is thus characterized by a strong faultline that splits these subgroups. In contrast, the second team is what is called a crosscut team, where the hypothetical subgroups created by functional background are less homogeneous than in the first team, because age and ethnicity vary within each functional background. Accordingly, a measure of faultline strength would return different values for these two teams, whereas measures of age diversity, functional diversity, ethnic diversity, and compound “overall” diversity would return identical values. This example illustrates that faultlines are not a compound or “overall” measure of diversity; such compound measures are difficult to interpret, and their use is therefore discouraged (Harrison & Klein, 2007).
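The contrast between the two teams can be checked mechanically. The Python sketch below encodes both teams as described, confirms that each attribute taken separately has the same distribution in both, and counts distinct attribute profiles as a crude, purely illustrative indicator of alignment. The function names are hypothetical, and the profile count is not one of the faultline measures reviewed in this article.

```python
from collections import Counter

# Each member: (functional background, ethnicity, age group), as described in the text.
aligned_team = (
    [("nurse", "Black", "young")] * 3
    + [("surgeon", "Asian", "older")] * 3
    + [("anesthesiologist", "Caucasian", "middle-aged")] * 2
)
crosscut_team = [
    ("nurse", "Black", "young"),
    ("nurse", "Asian", "middle-aged"),
    ("nurse", "Caucasian", "older"),
    ("surgeon", "Caucasian", "young"),
    ("surgeon", "Asian", "middle-aged"),
    ("surgeon", "Black", "older"),
    ("anesthesiologist", "Black", "young"),
    ("anesthesiologist", "Asian", "older"),
]

def marginal_counts(team):
    """Category frequencies for each attribute taken separately."""
    return [Counter(member[j] for member in team) for j in range(len(team[0]))]

def distinct_profiles(team):
    """Number of distinct combinations of all attributes (illustrative only)."""
    return len(set(team))

print(marginal_counts(aligned_team) == marginal_counts(crosscut_team))  # True
print(distinct_profiles(aligned_team), distinct_profiles(crosscut_team))  # 3 8
```

Single-attribute diversity indices see only the marginal counts, which are identical here; only a measure that looks at the joint configuration of attributes can separate the two teams.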
Instead, a measure of faultline strength carries a clear meaning with regard to the composition of the team—namely, the extent to which the attributes in a diverse team converge in such a way that the team is split into homogeneous (hypothetical) subgroups based on these attributes. Faultlines therefore require diversity as a precondition: In a homogeneous team, there can be no faultlines, but diverse teams can differ with regard to their faultline strength (e.g., diverse teams can be split or crosscut): “By their nature, faultlines become most likely in groups of moderate diversity” (Lau & Murnighan, 1998, p. 331), because subgroups cannot be homogeneous if all team members are different from each other. Accordingly, faultline strength predicts team outcomes above and beyond diversity measures (Lau & Murnighan, 2005).
Team faultlines are known to affect team-level dynamics and performance-related outcomes (Thatcher & Patel, 2011, 2012). Several theories help explain the processes underlying the effects of faultlines, among them self-categorization theory, optimal distinctiveness theory, and distance theories (Thatcher & Patel, 2011).
Self-categorization theory (e.g., Oakes, Turner, & Haslam, 1991; Turner, Hogg, Oakes, Reicher, & Wetherell, 1987) posits that the salience of social categorizations is contingent on their comparative fit, their normative fit, and their cognitive accessibility. Comparative fit refers to the extent to which observed similarities and differences between people or their actions are perceived as correlating with social categories (Turner et al., 1987). Thus, a person is more likely to perceive a target as different if the target belongs to a homogeneous (sub)group that is different from the perceiver’s (sub)group. For example, in a six-person team of three young female engineers from the development department and three older male mechanics from the production department, there is a higher probability that one of the women perceives one of the men as very different from herself than in a team in which these attributes are not perfectly aligned across subgroups. Thus, the stronger the faultline, the higher the probability of intergroup bias between the homogeneous subgroups within a team, which in turn can lead to negative affective reactions between subgroups, such as increased levels of conflict (van Knippenberg et al., 2004).
Optimal distinctiveness theory (e.g., Brewer, 1991) can also link the faultline construct to negative outcomes such as conflict. It posits that individuals are motivated to be both similar to others and distinct from them; subgroup members in groups with strong faultlines will thus be more motivated to distinguish themselves from members outside their subgroup, which can ultimately lead to conflict within the team (Halevy, 2008; Insko, Schopler, Hoyle, Dardis, & Graetz, 1990). Because conflict decreases a team’s social integration, it can also decrease team performance (Jehn, Northcraft, & Neale, 1999; van Knippenberg et al., 2004).
Distance theory posits that team members in one subgroup will experience psychological distance from the members of other subgroups (Brewer, Manzi, & Shaw, 1993). Furthermore, because members of different homogeneous subgroups differ in their backgrounds on several attributes at once, they may hold different values and different ideas about how to complete tasks (Brewer, 1991), regardless of whether team members perceive a split into subgroups. Team members are also likely to require higher levels of effort and motivation to exchange task-relevant information across a strong faultline (Meyer & Schermuly, 2012). Therefore, teams with strong faultlines are more likely to experience low levels of cohesion (Cronin, Bezrukova, Weingart, & Tinsley, 2011; Molleman, 2005) and low levels of communication across subgroup boundaries (Meyer, Shemla, & Schermuly, 2011).
Faultline research has differentiated between faultlines denoting a hypothetical dividing line (called dormant faultlines, see Bezrukova et al., 2009) and active faultlines; active faultlines describe a situation where the faultline is actually perceived by group members. This distinction “is similar to that made in the diversity literature between objective (e.g., actual) diversity and perceived diversity” (Thatcher & Patel, 2012, p. 982). Of note, “regardless of whether faultlines are dormant or active, they clearly have an impact on group outcomes” (Thatcher & Patel, 2012, p. 991). With few exceptions, faultline research has studied hypothetical or potential faultlines, and “researchers find that the presence of dormant faultlines has consequences even when faultlines are not activated” (Thatcher & Patel, 2012, p. 982). Apparently, relationships between faultlines and outcomes are stronger when the faultlines are activated than when they are dormant (Thatcher & Patel, 2012), but they do not vanish if they are dormant. In line with Thatcher and Patel (2012), we employ the term faultline for hypothetical dividing lines and use the term active faultline for faultlines that are perceived by team members.
Faultlines usually become active if a contextual trigger renders the team’s subgroup structure salient (Thatcher & Patel, 2012). Changes in the team structure through member entry, exit, or substitution are therefore one potential way of changing (Lau & Murnighan, 1998) or activating a team’s faultline over time: The behaviors of members of homogeneous demographic subgroups can polarize the group by aligning members’ behavior with the faultline structure (Mäs, Flache, Takács, & Jehn, 2012) and can trigger a faultline in this way. The introduction of new group members who share characteristics with several subgroups, called “crisscrossing actors,” can overcome polarization (Mäs et al., 2012). In research, active faultlines are either created by experimental manipulations in the laboratory or measured with questionnaire scales that elicit team members’ subjective perceptions of the team’s subgroup structure, as proposed by Jehn and Bezrukova (2010). Qualitative approaches that ask team members about their subjective experiences (e.g., Meyer et al., 2011) can also help to investigate active faultlines. Faultlines based on demographic attributes, in contrast, are quantified with measures that capture the extent to which these attributes are distributed as homogeneous hypothetical subgroups. Several such measures exist, and some involve complex calculations with many iterations. The following review focuses on these latter measures. Before we describe them, however, we formulate properties to which such faultline measures should adhere.
Desirable Properties of Faultline Measures
First, we posit that a method for examining faultlines should deliver a numeric quantification of the extent of the faultline. Only a numeric value of this kind can be employed for predicting the outcomes of faultlines. 1
Second, a numeric faultline measure should be sensitive to changes in the homogeneity of the subgroups that are present within a team. Less homogeneous subgroups should result in a different value than more homogeneous subgroups.
Third, Lau and Murnighan (1998) argued that an ideal faultline measure should work with numeric (e.g., age) and categorical (e.g., race or gender) demographic attributes. Because categorical attributes can be converted into numeric variables, this requirement can be expressed as a requirement to process numeric data.
Fourth, a viable method for determining faultlines should reveal the inner subgroup structure for a given team, that is, for a given faultline, it should determine which team member belongs to which hypothetical subgroup. Only such a determination of the subgroup structure with adequate accuracy allows researchers to compare the hypothetical dividing line constituting a faultline with other information regarding subgroup structure, for example, social networks or perceptions of active faultlines (Jehn & Bezrukova, 2010). In other words, only a member-to-subgroup association allows the validation of the team split that a faultline measure identifies.
Fifth, a given method should be able to compute faultline strengths on commonly available computer hardware within a reasonable time, even for larger groups and a large number of demographic attributes. Large and complex statistical models can take days or weeks to compute, and such computation times are sometimes necessary. However, we believe that an approach whose computation can take several millennia on current hardware is not feasible.
Diversity Faultline Strength for More Than Two Subgroups
Lau and Murnighan’s (1998) initial definition of faultlines as “hypothetical dividing lines that may split a group into subgroups based on one or more attributes” (p. 328) does not limit the number of subgroups in a given team. Lau and Murnighan (1998) argued that the analysis of cases with two subgroups is particularly interesting because this case is associated with particularly intense dynamics in groups. However, it is important to determine the correct number of subgroups in a given group because an increasing number of subgroups tends to reduce measures of diversity faultline strength (Shaw, 2004; Thatcher & Patel, 2012; Trezzini, 2008). Due to the potential differences in team dynamics in the presence of more than two subgroups (Carton & Cummings, 2012), we believe that it is important that a given method of determining faultlines can detect the correct number of subgroups with the strongest split between them. Furthermore, homogeneous splits into more than two subgroups can occur even in relatively small teams, as the example of the medical team described above illustrates. In larger groups of 9 or 12 individuals, it is even more plausible to observe more than two homogeneous subgroups.
Calculating the diversity faultline strength for the imaginary medical team with a method that is restricted to two subgroups would not represent the structure of the team appropriately and would underestimate the extent of the alignment. It can thus be misleading to apply a faultline measure with an a priori restriction to two subgroups to data where more than two subgroups are possible. We therefore propose that the sixth desirable property of faultline measures is the capability of identifying the number of homogeneous subgroups in a given team and of calculating the corresponding faultline measure. A researcher can then, if so desired, exclude teams with more than two homogeneous subgroups, but simply assuming that only two subgroups are present can lead to an inappropriate quantification of the actual group split.
In the following section, we briefly review the available faultline measures with regard to their adherence to these properties and, based on this review, propose a new one. As some features of the methods, such as their sensitivity to different levels of subgroup homogeneity, are difficult to assess by investigating the formulas, we subsequently compare the methods computationally with simulated and real team data.
Measures for Quantifying Diversity Faultline Strength
Thatcher’s Fau
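Thatcher et al. (2003) proposed a formula for diversity faultline strength that is suitable for numerical data and provides a member-to-subgroup association. For all possible splits $g = 1, 2, \ldots, S$ of a given group into two subgroups, Thatcher et al. calculated $\mathit{Fau}_g$ as the portion of the total variance explained by the subgroup membership according to Formula 1:

$$\mathit{Fau}_g = \frac{\sum_{j=1}^{p} \sum_{k=1}^{2} n_k^g \left(\bar{x}_{\cdot jk} - \bar{x}_{\cdot j \cdot}\right)^2}{\sum_{j=1}^{p} \sum_{k=1}^{2} \sum_{i=1}^{n_k^g} \left(x_{ijk} - \bar{x}_{\cdot j \cdot}\right)^2}, \quad g = 1, 2, \ldots, S \qquad (1)$$

Here, $p$ denotes the number of attributes in the data, $x_{ijk}$ the value of the $j$th attribute of the $i$th member of subgroup $k$, $\bar{x}_{\cdot jk}$ the mean of attribute $j$ in subgroup $k$, $\bar{x}_{\cdot j \cdot}$ the mean of attribute $j$ in the whole group, and $n_k^g$ the number of members in subgroup $k$ under split $g$. The faultline strength of the group is the largest $\mathit{Fau}_g$ across all splits.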
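The variance-ratio logic of Fau and the exhaustive search over two-subgroup splits can be sketched in Python. This is a minimal illustration, assuming numeric attributes that are already on comparable scales (no standardization is applied) and treating a completely homogeneous group as having a value of 0; the function names `fau` and `strongest_two_subgroup_split` are hypothetical and are not taken from the asw.cluster package.

```python
from typing import List, Sequence, Tuple

def fau(data: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Between-subgroup sum of squares divided by total sum of squares, pooled over attributes."""
    n, p = len(data), len(data[0])
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    total = sum((row[j] - grand[j]) ** 2 for row in data for j in range(p))
    if total == 0:
        return 0.0  # completely homogeneous group: nothing to explain (assumption)
    between = 0.0
    for k in set(labels):
        members = [row for row, lab in zip(data, labels) if lab == k]
        for j in range(p):
            mean_kj = sum(row[j] for row in members) / len(members)
            between += len(members) * (mean_kj - grand[j]) ** 2
    return between / total

def strongest_two_subgroup_split(data: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Evaluate all 2**(n-1) - 1 two-subgroup splits and keep the largest Fau_g."""
    n = len(data)
    best_value, best_labels = -1.0, []
    for mask in range(1, 2 ** (n - 1)):
        # Member 0 always stays in subgroup 0, so each split is enumerated exactly once.
        labels = [0] + [(mask >> (i - 1)) & 1 for i in range(1, n)]
        value = fau(data, labels)
        if value > best_value:
            best_value, best_labels = value, labels
    return best_value, best_labels

team = [[0, 0], [0, 0], [1, 1], [1, 1]]  # hypothetical toy team: two perfectly aligned pairs
print(strongest_two_subgroup_split(team))  # (1.0, [0, 0, 1, 1])
```

The returned label list is the member-to-subgroup association that accompanies the strongest split.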
Thatcher et al. (2003) limited their method to two subgroups, partly because, when the number of subgroups is unknown, the number of possible subgroup splits that must be evaluated equals the Bell number (E. Bell, 1934), a sequence that grows extremely fast. For example, a 20-person group can be partitioned in 51,724,158,235,372 different ways, and evaluating this many partitions would require about 27,000 years of computation time on a powerful present-day computer.
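The growth of the search space can be reproduced with a few lines of Python. The sketch computes Bell numbers with the Bell triangle and compares them with the number of splits into exactly two non-empty subgroups, which is the Stirling number $S(n, 2) = 2^{n-1} - 1$. The function names are hypothetical.

```python
def bell(n: int) -> int:
    """Number of ways to partition n team members into any number of non-empty subgroups."""
    row = [1]  # Bell triangle; the first entry of row n equals B(n)
    for _ in range(n):
        nxt = [row[-1]]
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]

def two_subgroup_splits(n: int) -> int:
    """Number of splits into exactly two non-empty subgroups, S(n, 2) = 2**(n-1) - 1."""
    return 2 ** (n - 1) - 1

for size in (8, 12, 20):
    print(size, bell(size), two_subgroup_splits(size))
# 8 4140 127
# 12 4213597 2047
# 20 51724158235372 524287
```

For a 20-person team, restricting the search to two subgroups reduces roughly 5 × 10^13 candidate partitions to about half a million.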
Furthermore, even if no computational limitations existed, this approach would not be suitable for identifying the number of subgroups, because for an arbitrary number of subgroups larger than two, the explained share of variance is maximized by the configuration in which all variance lies between subgroups, that is, in which every group member forms a subgroup of size one. As a possible solution for this issue, Thatcher et al. (2003) suggested that further research could use a clustering algorithm for identifying the number of subgroups and for reducing the calculation effort, which is in line with the latent class cluster analysis (LCCA) approach and our average silhouette width (ASW) approach, both of which are presented below.
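The argument can be made explicit with the standard decomposition of the total sum of squares for any partition into $K$ subgroups of sizes $n_1, \ldots, n_K$, where $x_{ijk}$ is the value of attribute $j$ for member $i$ of subgroup $k$, $\bar{x}_{\cdot jk}$ is the subgroup mean, and $\bar{x}_{\cdot j\cdot}$ is the group mean:

$$\sum_{j=1}^{p}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(x_{ijk}-\bar{x}_{\cdot j\cdot}\right)^2 = \sum_{j=1}^{p}\sum_{k=1}^{K} n_k\left(\bar{x}_{\cdot jk}-\bar{x}_{\cdot j\cdot}\right)^2 + \sum_{j=1}^{p}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\left(x_{ijk}-\bar{x}_{\cdot jk}\right)^2$$

If every member forms its own subgroup ($K = n$ and $n_k = 1$), each $x_{ijk}$ equals its subgroup mean $\bar{x}_{\cdot jk}$, so the within-subgroup term is 0 and the share of variance explained by subgroup membership equals 1 for any team with nonzero variance.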
In summary, the conceptual analysis of this measure shows that it meets at least three of the proposed properties: It delivers a numeric value of diversity faultline strength, it works with numeric values, and it delivers a member-to-subgroup association. However, it is not suitable if there are more than two subgroups.
Subgroup Strength: Gibson and Vermeulen (2003)
Gibson and Vermeulen (2003) provide a method for calculating subgroup strength, a concept closely related to faultlines. Strong subgroups exist if there is high variability in the extent to which attributes overlap in the dyads within a team. Thus, in some dyads, individuals are relatively similar, causing a large overlap with respect to the measured attributes, whereas in other dyads, members are relatively diverse, showing little or no overlap. In contrast, subgroups are weak if the overlaps across all possible dyads are of similar magnitude. The corresponding measure is calculated as
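The dyadic idea behind subgroup strength can be illustrated in Python. Because the exact formula is not reproduced here, the sketch is only a simplified stand-in: it counts, for every dyad, the attributes on which both members share a category and then takes the population standard deviation of these counts as a generic index of variability. It is not Gibson and Vermeulen's (2003) published measure, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import pstdev

def dyadic_overlaps(team):
    """For every dyad, the number of attributes on which the two members share a category."""
    return [sum(a == b for a, b in zip(m1, m2)) for m1, m2 in combinations(team, 2)]

def overlap_spread(team):
    """Illustrative stand-in (not Gibson & Vermeulen's formula): SD of dyadic overlaps."""
    return pstdev(dyadic_overlaps(team))

aligned = [(0, 0), (0, 0), (1, 1), (1, 1)]   # two internally identical pairs
crosscut = [(0, 0), (0, 1), (1, 0), (1, 1)]  # each category combination once
print(dyadic_overlaps(aligned))   # [2, 0, 0, 0, 0, 2]
print(dyadic_overlaps(crosscut))  # [1, 1, 0, 0, 1, 1]
print(overlap_spread(aligned) > overlap_spread(crosscut))  # True
```

In the aligned team, some dyads overlap completely and others not at all, which produces the high variability that the text associates with strong subgroups.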
In summary, the conceptual analysis of this measure shows that it delivers a numeric value of diversity faultline strength, it works with numeric values, and it is suited for more than two subgroups. However, it does not deliver a member-to-subgroup association.
Shaw’s FLS
Shaw (2004) measures the extent to which categorical attributes are aligned within subgroups and deviate between subgroups. Within the subset of group members sharing the same category of a given attribute, the alignment of the respective other attributes results in a measure that Shaw called internal alignment (IA). In addition, the cross-subgroup alignment index (CGAI) denotes the extent to which group members belonging to another subgroup (by falling in another category of the same given attribute) share the same category of the other attributes. As the values of IA and CGAI fall into the range from 0 to 1, the faultline measure FLS is then given as
Therefore, FLS is positively related to IA and negatively related to CGAI. This method works only with categorical data; continuous attributes need to be categorized. Shaw (2004) argued that converting numeric attributes such as age into categories such as young, middle-aged, and old is justified because humans tend to experience continuous variables in categories.
However, finding the appropriate number of categories and their boundaries can constitute a major source of unwanted side effects (e.g., Altman, 2005; Schellingerhout, Heymans, de Vet, Koes, & Verhagen, 2009), as the perceived categories may have fuzzy boundaries and depend on the context in which the attributes are measured. Furthermore, it might be difficult to find a reasonable categorization for tenure, cognitive abilities, or personality traits.
With FLS, subgroups are formed by group members who share the same category of a particular attribute. Thus, subgroups are defined with respect to a single attribute, and the number of subgroups per attribute is determined by the number of its categories. This perspective offers a very detailed view on relations between in-groups and out-groups and an in-depth examination of substructures within a team. However, this conceptualization of subgroups differs from the conceptualization at the core of the definition of faultlines—namely, that they are formed by simultaneous splits on multiple attributes. Therefore, FLS is unable to produce a single member-to-subgroup association with respect to all attributes involved.
In summary, Shaw’s method is unsuitable for numeric data and does not deliver a member-to-subgroup association. However, it is not restricted to two subgroups (although it cannot quantify the number of subgroups), delivers a numeric value, and is easy to compute, as the calculation consists of a well-defined number of simple steps, influenced only by the number of attributes and their categories. 2
Factional Faultlines: Li and Hambrick (2005)
The measure by Li and Hambrick (2005) is not intended for detecting the strongest faultline across several attributes for a given team. Instead, it is designed for a case where there is an a priori focus on faultlines with regard to a specific attribute of interest. This attribute constitutes the factions; hence, the measure is called factional faultline strength. In the Li and Hambrick study, factions referred to different nationalities in the context of a merger between an American and a Chinese company. The researchers split each team that they investigated into an American faction and a Chinese faction. They subsequently calculated the means of other numeric attributes such as age or tenure for each faction and quantified the difference between factions in a given team for each attribute with a measure that is similar to Cohen’s d,
Trezzini’s (2008) Index of Polarized Multi-Dimensional Diversity
Trezzini (2008) operationalized faultline strength as the degree of polarized multidimensional subgroup diversity for categorical attributes ($\mathrm{PMD}_{cat}$). This measure sums the $n \times n$ pairwise juxtapositions between all possible subgroups, which are defined by the $n$ possible combinations of attribute categories, for example, $n = 2 \times 2 \times 2 = 8$ for a set of three attributes with two categories each. $\mathrm{PMD}_{cat}$ is then calculated as
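The counting step behind this measure is simple to reproduce. The Python sketch below enumerates all combinations of attribute categories, each of which defines one potential subgroup, and derives the number of pairwise juxtapositions. The attribute labels and the function name are hypothetical, and the sketch does not compute $\mathrm{PMD}_{cat}$ itself.

```python
from itertools import product

def potential_subgroups(categories_per_attribute):
    """Every combination of attribute categories; each one defines a potential subgroup."""
    return list(product(*categories_per_attribute))

# Hypothetical example: three attributes with two categories each.
profiles = potential_subgroups([("female", "male"), ("junior", "senior"), ("sales", "engineering")])
n = len(profiles)
print(n, n * n)  # 8 64
```

Because $n$ is the product of the category counts, the number of juxtapositions grows quickly as attributes or categories are added.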
Faultline Distances: Bezrukova et al. (2009) and Zanutto, Bezrukova, and Jehn (2010)
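Both Bezrukova et al. (2009) and Zanutto et al. (2010) stated that a faultline measure should reflect not only the extent of attribute alignment across group members (i.e., the faultline strength) but also the distance between the emerging subgroups once the strongest faultline has been detected. In both articles, the authors thus proposed detecting the strongest faultline with Fau (Thatcher et al., 2003; see Formula 1) and subsequently multiplying this value by the Euclidean distance $D_e$ between the centroids of the two resulting subgroups (Formula 6):

$$D_e = \sqrt{\sum_{j=1}^{p} \left(\bar{x}_{\cdot j1} - \bar{x}_{\cdot j2}\right)^2} \qquad (6)$$

where $p$ is the number of attributes and $\bar{x}_{\cdot jk}$ is the mean of attribute $j$ in subgroup $k$.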
The statement that “faultline strength and faultline distance relate to a two-dimensional plane” (Thatcher & Patel, 2012, p. 998) shows that faultline strength and distance are regarded as orthogonal in faultline research. However, at least if faultline strength is operationalized with Fau, this is not the case. Both the between-subgroup variability, which forms the numerator of Fau (see Formula 1), and the Euclidean distance between the subgroup centroids (see Formula 6) contain the between-subgroup sum of squares. Therefore, an increase in the Euclidean distance between subgroup centroids will lead to an increase in between-subgroup variability and vice versa. Thus, faultline strength and distance are related, and multiplying them increases the influence of the between-subgroup sum of squares on the resulting measure. More specifically, multiplying Fau by $D_e$ carries over the range of $D_e$, which depends on the scale of the diversity attributes, so the resulting measure is no longer bounded by the interval [0, 1].
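The scale argument can be made concrete with a small Python sketch. It computes Fau and the centroid distance $D_e$ for a fixed two-subgroup split of a hypothetical toy team and then multiplies all attribute values by 10. Fau, as a ratio of sums of squares, stays the same, whereas $D_e$, and hence the product of the two, grows tenfold. The function names are hypothetical, and no attribute standardization is applied.

```python
import math

def centroid(points):
    """Attribute-wise mean of a list of members."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def fau(data, labels):
    """Share of the total sum of squares lying between subgroups (assumes nonzero variance)."""
    grand = centroid(data)
    total = sum((x - g) ** 2 for row in data for x, g in zip(row, grand))
    between = 0.0
    for k in set(labels):
        members = [row for row, lab in zip(data, labels) if lab == k]
        between += len(members) * sum((c - g) ** 2 for c, g in zip(centroid(members), grand))
    return between / total

def centroid_distance(data, labels):
    """Euclidean distance D_e between the centroids of subgroups 0 and 1."""
    g0 = [row for row, lab in zip(data, labels) if lab == 0]
    g1 = [row for row, lab in zip(data, labels) if lab == 1]
    return math.dist(centroid(g0), centroid(g1))

team = [[0, 0], [0, 0], [1, 1], [1, 1]]  # hypothetical toy data
labels = [0, 0, 1, 1]
for scale in (1, 10):
    scaled = [[scale * x for x in row] for row in team]
    f, d = fau(scaled, labels), centroid_distance(scaled, labels)
    print(scale, round(f, 3), round(d, 3), round(f * d, 3))
# 1 1.0 1.414 1.414
# 10 1.0 14.142 14.142
```

Rescaling an attribute (e.g., measuring tenure in months instead of years) therefore changes the product but not Fau.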
In summary, these considerations indicate that both Fau and De respond to changes in faultline distances. This challenges the assumption that faultline strength and distance are two separate constructs. Accounting for either faultline strength or distance might sufficiently cover all aspects that are essential for the measurement of faultline strengths, with the respective other potentially providing no or little additional information. We address this issue again below.
Multiple Linear Regressions: Van Knippenberg et al. (2011)
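Van Knippenberg et al. (2011) proposed a measure that calculates faultlines as the product of the amounts of variance in each attribute that are explained by the other attributes:

$$\prod_{j=1}^{p} R_j^2,$$

where $p$ is the number of attributes and $R_j^2$ is the proportion of variance in attribute $j$ explained by the remaining $p - 1$ attributes in a multiple linear regression.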
This approach does not require boundaries between separate subgroups. As a consequence, it does not reveal the subgroup structure of the data, that is, neither the number of subgroups nor a member-to-subgroup association. Furthermore, this approach has another limitation: As each attribute is entered into a regression as the dependent variable, it cannot compute a meaningful faultline strength for groups that are completely homogeneous on one attribute (e.g., all-male groups in a study on faultlines involving gender). In such cases, the measure returns a value of 0 for the faultline strength, even if the remaining attributes are aligned.
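A minimal pure-Python sketch of this regression-based idea follows. It assumes ordinary least squares with an intercept (implemented by centering the variables), returns 0 as soon as an attribute has no variance (mirroring the behavior described above), and raises an error when the predictors are perfectly collinear. The function names are hypothetical, and the code does not reproduce van Knippenberg et al.'s (2011) own implementation.

```python
def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; R^2 is not identified")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, predictors):
    """Share of variance in y explained by an OLS regression on the predictors."""
    n = len(y)
    yc = [v - sum(y) / n for v in y]
    xs = [[v - sum(col) / n for v in col] for col in predictors]
    xtx = [[sum(a * b for a, b in zip(xi, xj)) for xj in xs] for xi in xs]
    xty = [sum(a * b for a, b in zip(xi, yc)) for xi in xs]
    beta = _solve(xtx, xty)
    residuals = [yc[i] - sum(b * x[i] for b, x in zip(beta, xs)) for i in range(n)]
    return 1 - sum(r * r for r in residuals) / sum(v * v for v in yc)

def regression_faultline(data):
    """Product over attributes of R^2 from regressing each attribute on all others."""
    columns = [list(col) for col in zip(*data)]
    product = 1.0
    for j, y in enumerate(columns):
        if all(v == y[0] for v in y):
            return 0.0  # constant attribute: the measure is reported as 0 in this case
        product *= r_squared(y, columns[:j] + columns[j + 1:])
    return product

print(regression_faultline([[0, 0], [0, 0], [1, 1], [1, 1]]))  # 1.0
print(regression_faultline([[0, 0], [0, 1], [1, 0], [1, 1]]))  # 0.0
print(regression_faultline([[0, 0], [0, 1], [0, 0], [0, 1]]))  # 0.0 (first attribute is constant)
```

The third call shows the limitation discussed above: the constant first attribute drives the whole product to 0 regardless of the second attribute.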
Latent Class Clustering: Barkema and Shvyrkov (2007) and Lawrence and Zyphur (2011)
Barkema and Shvyrkov (2007) and Lawrence and Zyphur (2011) proposed LCCA, also referred to as latent class analysis (LCA), for identifying faultlines in a stepwise way. First, latent class solutions with different numbers of clusters are estimated for the data of a given team, where the clusters represent the subgroups. For example, if we assume that a subgroup cannot contain fewer than two members, we would estimate LCCA solutions with one to n/2 subgroups, where n denotes the team size. Out of these candidate solutions, the best-fitting one is subsequently identified by the lowest Bayesian information criterion (BIC) value (Lawrence & Zyphur, 2011). Each team member is then assigned to a subgroup based on the posterior probabilities for a given individual to belong to a certain class. As high posterior probabilities are likely in the case of homogeneous clusters, the homogeneity of the posterior probabilities of all group members, which is determined with the entropy measure, can be employed as a measure of faultline strength (Lawrence & Zyphur, 2011).
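The post-processing steps of this procedure can be sketched in Python, assuming the latent class models have already been fitted elsewhere and their BIC values and posterior class probabilities are available. The entropy index below uses the normalized form commonly reported by latent class software, 1 minus the summed entropy divided by n ln K; whether Lawrence and Zyphur (2011) used exactly this normalization is an assumption. All function names and input values are hypothetical.

```python
import math

def relative_entropy(posteriors):
    """Normalized entropy of class posteriors: 1 = perfectly crisp, 0 = maximally uncertain.

    posteriors[i][k] is the posterior probability that member i belongs to latent class k.
    """
    n, k = len(posteriors), len(posteriors[0])
    if k < 2:
        return 1.0  # a one-class solution involves no classification uncertainty (assumption)
    h = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1 - h / (n * math.log(k))

def modal_assignment(posteriors):
    """Member-to-subgroup association: each member goes to its most probable class."""
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]

def lowest_bic(bic_by_classes):
    """Number of classes whose (already fitted) model has the lowest BIC."""
    return min(bic_by_classes, key=bic_by_classes.get)

posteriors = [[0.95, 0.05], [0.90, 0.10], [0.10, 0.90], [0.05, 0.95]]  # hypothetical output
print(modal_assignment(posteriors))  # [0, 0, 1, 1]
print(relative_entropy(posteriors))
```

Crisp posteriors near 0 and 1 push the index toward 1, which the LCCA approach reads as a strong faultline.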
LCCA is a promising approach: It can identify more than two subgroups, and it delivers a member-to-subgroup association and a measure of the overall faultline strength. Apart from these advantages, LCCA can have some practical limitations if employed for faultline calculation. First, a team size below 30 can deliver unstable results and can fail to converge (Nylund, Asparouhov, & Muthén, 2007; see also Thatcher & Patel, 2012). Second, if subgroup membership is determined based on the posterior probabilities, a team member can be assigned to a subgroup other than the closest one because the member fits that subgroup’s distribution better. Third, LCCA has difficulties finding clusters in the case of binary categorical variables (such as gender) and/or within-class correlations (Muthén, 2004), both of which are not uncommon in team data.
We thus believe that LCCA has considerable potential for identifying faultlines with multiple subgroups but that it also has certain practical limitations, which stem from the fact that LCCA is a broad method that, in contrast to the other measures described above, was not specifically developed for faultlines. We therefore suggest another cluster-based approach to multiple subgroups that shares the benefits of LCCA without some of its computational issues. After introducing this measure in the following section, we compare all measures (except factional faultlines) using simulated and real team data.
Average Silhouette Width Faultline Clustering
In line with Thatcher et al. (2003), we propose the use of cluster analysis for detecting the subgroup split associated with a group’s strongest faultline for groups with more than two homogeneous subgroups. Cluster analysis groups objects (team members in this case) into clusters (subgroups in this case) according to their similarity, such that the clusters have maximum internal homogeneity and maximum between-cluster heterogeneity (Bortz, 2005). Similar to the LCCA approach, we propose a two-step clustering procedure: In a first step, we employ known cluster-analytic methods to identify a set of start configurations (i.e., a set of subgroups) for the clustering procedure for a given team. In a second step, we permute team members through each start configuration and employ a criterion, the maximum ASW, to identify the optimal solution.
For the first step, we propose using agglomerative cluster algorithms for preclustering the team data. These procedures (often referred to as hierarchical clustering) begin with maximum cluster separation, in which each object (i.e., team member) is placed in its own cluster (i.e., subgroup). These clusters are subsequently joined stepwise according to various strategies until all objects belong to the same cluster (e.g., Mojena, 1977). Of the available methods, the Ward algorithm (Ward, 1963) and the average linkage strategy appear optimal because both use all available information. For every pair of clusters, the Ward algorithm calculates a price, namely the increase in the error sum of squares over all group members that would result from joining the two clusters. The pair of clusters with the smallest price is then joined, which results in solutions with high cluster homogeneity. In its first steps, the Ward algorithm prefers creating small clusters in regions where objects are relatively close together. In subsequent steps, the process tends to balance the cluster sizes, which is a unique feature compared with all other agglomerative cluster methods (Bortz, 2005). Milligan’s (1981) Monte Carlo study identified the Ward algorithm as the best agglomerative cluster strategy if Euclidean distances are used as a measure of object similarity. However, in situations where subgroups are very unequal in size, the algorithm may fail to find the optimal solution, which is why we also employ the average linkage strategy. It joins the two clusters with the smallest average distance between all pairs of members across the two clusters (Bortz, 2005). By joining the nearest clusters regardless of their size, the method optimizes the separation of the remaining clusters.
Combining the results of these two methods yields a set of 2 × n configurations, each with a number of clusters from 1 to n, where n denotes the size of the group to be clustered (see the appendix for such a clustering of an example team).
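The preclustering step can be sketched in Python with SciPy’s `scipy.cluster.hierarchy.linkage`. This is a minimal sketch under stated assumptions, not the asw.cluster implementation: attributes are assumed to be numeric (e.g., gender coded 0/1) and standardized, distances are Euclidean, and the helper names `partitions_from_linkage` and `start_configurations` are hypothetical. The merges recorded in each linkage matrix are replayed so that every method contributes one partition per cluster count, from n clusters down to 1.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def partitions_from_linkage(Z, n):
    """Replay the merges of a SciPy linkage matrix and return the partition
    before the first merge (n clusters) and after every merge (n - 1, ..., 1)."""
    members = {i: [i] for i in range(n)}  # SciPy ids 0..n-1 are single members
    partitions = []

    def snapshot():
        # Relabel the currently active clusters as 0, 1, 2, ...
        labels = np.empty(n, dtype=int)
        for new_label, cid in enumerate(sorted(members)):
            labels[members[cid]] = new_label
        return labels

    partitions.append(snapshot())
    for step, row in enumerate(Z):
        a, b = int(row[0]), int(row[1])
        # Merge step `step` creates the cluster with id n + step.
        members[n + step] = members.pop(a) + members.pop(b)
        partitions.append(snapshot())
    return partitions


def start_configurations(X):
    """Ward and average-linkage partitions with n..1 clusters each (2 x n in total)."""
    X = np.asarray(X, dtype=float)
    configs = []
    for method in ("ward", "average"):
        Z = linkage(X, method=method, metric="euclidean")
        configs.extend(partitions_from_linkage(Z, len(X)))
    return configs
```

Replaying the merges directly, rather than cutting the tree at a height, guarantees that each cluster count from 1 to n appears exactly once per method, which gives the 2 × n start configurations described above.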
In the second step of the procedure, we propose to determine the team’s maximum ASW (Rousseeuw, 1987) value for selecting the optimal subgroup configuration from the set of 2 × n start configurations, as we outline in the following. This approach combines both cluster cohesion and separation (Tan, Steinbach, & Kumar, 2006) in detecting the most “natural” number of clusters k by maximizing the ASW across all cluster solutions with 2 ≤ k < n clusters.
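ASW is the average of all team members’ individual silhouette widths, which quantify how well a team member i fits into his or her own Cluster A in comparison to the nearest other Cluster B. Following Rousseeuw (1987), the individual silhouette width is given by

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},$$

where a(i) is the average distance between member i and all other members of Cluster A, and b(i) is the average distance between i and the members of Cluster B, the cluster other than A for which this average is smallest. Values of s(i) range from −1 to 1: Values close to 1 indicate that i fits his or her own subgroup much better than any other, whereas negative values indicate that i would fit better into Cluster B. For a member who forms a subgroup alone, s(i) is set to 0.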
To maximize the ASW value, we employ an incremental improvement method: For each of the obtained start configurations, we calculate ASW. Next, for each configuration, we try to increase ASW by temporarily moving each individual in turn into each “foreign” subgroup, calculating ASW after each move, and returning the person to his or her original subgroup. The move causing the highest increase in ASW is then made permanent. This procedure is repeated as long as improvements can be achieved. Of all solutions obtained in this way with 2 ≤ k < n clusters, we then choose the one with the highest ASW. Because the procedure is deterministic, a given data set always yields the same ASW faultline value. Starting the incremental improvement procedure from the optimized configurations obtained by the Ward and average-linkage algorithms reduces the risk of being trapped in a local optimum of the clustering criterion (ASW) instead of the desired global maximum, which is a general challenge in cluster analysis. To illustrate how ASW faultlines are calculated and how the member-to-subgroup association is determined, the appendix provides a stepwise description of the procedure for an example eight-person team.
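The search can be condensed into a short Python sketch. It is a hypothetical illustration, not the asw.cluster implementation: the function names `asw`, `improve`, and `asw_faultline` are invented for this example, distances are assumed to be Euclidean on numeric, standardized attributes, and the start configurations (for example, from the Ward and average-linkage preclustering described above) are assumed to be passed in as lists of subgroup labels. A member who forms a one-person subgroup receives s(i) = 0, following Rousseeuw’s (1987) convention.

```python
import numpy as np


def asw(X, labels):
    """Average silhouette width: mean of s(i) = (b - a) / max(a, b) over members."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # member forms a subgroup alone: s(i) = 0
        a = d[i, own].sum() / (own.sum() - 1)  # mean distance to own subgroup (d[i, i] = 0)
        b = min(d[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        if max(a, b) > 0:
            s[i] = (b - a) / max(a, b)
    return s.mean()


def improve(X, labels):
    """Greedy search: repeatedly apply the single reassignment that raises ASW most."""
    labels = list(labels)
    current = asw(X, labels)
    while True:
        best_move, best_value = None, current
        for i in range(len(labels)):
            original = labels[i]
            for g in sorted(set(labels) - {original}):
                labels[i] = g  # tentative move into a "foreign" subgroup
                if len(set(labels)) >= 2:  # a move may empty a subgroup; keep at least two
                    value = asw(X, labels)
                    if value > best_value + 1e-12:
                        best_move, best_value = (i, g), value
                labels[i] = original  # move the member back
        if best_move is None:
            return labels, current
        i, g = best_move
        labels[i] = g  # make the best move permanent
        current = best_value


def asw_faultline(X, start_configs):
    """Improve every start configuration with 2 <= k < n subgroups and keep the best."""
    best_labels, best_value = None, -np.inf
    for config in start_configs:
        if not 2 <= len(set(config)) < len(config):
            continue
        labels, value = improve(X, config)
        if value > best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value
```

Because every permanent move strictly increases ASW and a team has only finitely many partitions, the loop always terminates; the small tolerance guards against cycling on floating-point ties.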
In sum, ASW quantifies the quality of a group’s partitioning with reference to within-subgroup homogeneity, between-subgroup separation, and the optimal number of clusters. As these properties align closely with the aim of faultline detection, we believe that ASW is well suited for quantifying faultline strength and propose it as a faultline measure.
Conceptually, employing ASW as a measure of faultline strength is similar to using the entropy of posterior probabilities as a faultline measure in the LCCA approach. However, whereas LCCA employs different measures for identifying the most homogeneous cluster solution (BIC) and for quantifying faultline strength (entropy of posterior probabilities), the ASW approach elegantly uses the ASW value for both, as ASW is maximal at the optimal cluster solution.
This approach adheres to all of the propositions above without the potential practical issues of LCCA because, unlike LCCA, it is designed to work with small and large teams and with categorical and numeric data alike. To test ASW against the other measures and to investigate whether LCCA does have potential practical issues, we compare them in the following sections using simulated and real data sets. By also investigating behavior under missing data and predictive validity, we aim to provide a comprehensive overview of all faultline methods and hope to help researchers choose the appropriate measure for their research questions. All measures employed in the analysis, including LCCA and ASW, are included in the free R package “asw.cluster” (see http://www.group-faultlines.org).
Comparison of Faultline Measures With Team Data
A comprehensive summary of the measures mentioned above requires an assessment of their behavior under varying conditions. Therefore, we first investigate the influence of different team sizes, number of subgroups, and different levels of subgroup homogeneity on the measures with simulated data sets and investigate the effect of missing data. We then apply them to a data set of real work teams from different organizations, comparing their predictive validity by correlating them with relationship conflict, a construct that has been consistently linked to faultline strength in recent meta-analytic findings (Thatcher & Patel, 2011, 2012).
Simulated Data Sets
We compared the measures using simulated data sets of teams with known faultline characteristics. Specifically, we simulated 100 teams for each cell of a design spanning small (8-person) and large (16-person) teams, with faultlines based on three attributes (gender, age, and tenure, with the latter two correlated at .40) that split the team into either very homogeneous subgroups (10% within-subgroup variance) or more heterogeneous subgroups (25% within-subgroup variance). Teams were split into two, three, four, or five (16-person teams only) subgroups. Within the subgroups, all attributes were normally distributed around their respective center points.
To ensure comparability between the cells, the within-subgroup standard deviation was held at 0.1 (or 0.25 for more heterogeneous teams), regardless of the number of subgroups. The distance between the subgroups was adjusted accordingly to keep the ratio of the within-subgroup sums of squares to the between-subgroup sums of squares at the same value for all data sets. As stated above, the distance between subgroups, the within-subgroup to between-subgroup ratio of sums of squares, and the within-subgroup standard deviation are related. Thus, by holding the latter two constant for each team, the distance between homogeneous subgroups decreases as the number of subgroups increases. As a result, the scenarios differed only in the number of subgroups and, as a consequence of the constant variability ratio, in the average subgroup distance, while all other parameters remained constant.
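The logic of fixing the within-subgroup standard deviation and the sums-of-squares ratio, so that subgroup distance follows from them, can be sketched as below. This is an illustration under explicit assumptions, not the original simulation code: all attributes are treated as continuous (the binary coding of gender is ignored), subgroup centres are drawn at random, `ratio` is a free parameter because its exact value is not reported in this section, and `simulate_team` is a hypothetical name.

```python
import numpy as np


def simulate_team(n, k, sd, ratio, p=3, rng=None):
    """Hypothetical sketch of one simulated team: n members, k subgroups, p attributes.

    Within-subgroup deviations have standard deviation of roughly `sd` (slightly
    less after centring), and subgroup centres are scaled so that
    SS_within / SS_between equals `ratio` exactly.
    """
    rng = np.random.default_rng(rng)
    labels = np.arange(n) % k  # subgroup sizes differ by at most one
    centres = rng.normal(size=(k, p))  # random centre directions (assumption)
    noise = rng.normal(scale=sd, size=(n, p))
    for g in range(k):
        noise[labels == g] -= noise[labels == g].mean(axis=0)  # zero mean per subgroup
    between = centres[labels] - centres[labels].mean(axis=0)  # member-wise centre offsets
    ss_within = (noise ** 2).sum()
    ss_between = (between ** 2).sum()
    # With sd and the ratio fixed, the subgroup distance is whatever satisfies both.
    scale = np.sqrt(ss_within / (ratio * ss_between))
    return between * scale + noise, labels
```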
We implemented all of the above measures in a package called asw.cluster for the open-source statistical environment R (R Development Core Team, 2012) according to the formulas and descriptions provided by the original authors. Our implementation of the LCCA approach builds on the FlexMix package (Grün & Leisch, 2007, 2008; Leisch, 2004), which provides LCA in R.
With the simulated data sets, LCCA failed to converge in 12% of the homogeneous 8-person teams, 10.75% of the heterogeneous 8-person teams, 6.25% of the homogeneous 16-person teams, and 4.5% of the heterogeneous 16-person teams. This indicates that LCCA values cannot always be determined reliably for three attributes at these team sizes. Apparently, the smaller and more homogeneous the team, the more likely LCCA is to fail to converge.
Figure 1 shows the resulting average faultline values and their confidence intervals and reveals several aspects. First, the measures differ substantially within cells (i.e., within a certain team size with a certain number of subgroups). Second, all measures except LCCA are sensitive to changes in the homogeneity of the subgroups: They exhibit significantly larger values for homogeneous teams than for more heterogeneous teams, regardless of team size or number of subgroups. Thus, in our simulation, all measures except LCCA adhered to the second proposition, which demands sensitivity to changes in subgroup heterogeneity.

Figure 1. Comparison of multiple faultline measures with simulated data sets. Each bar represents 100 simulated groups; error bars are 95% CIs. Abbreviations: Fau = faultline strength (Thatcher et al., 2003); FLS = faultline strength (Shaw, 2004); PMDcat = Polarized Multi-Dimensional Diversity (Trezzini, 2008); Fau × De = faultline strength × faultline distance (Bezrukova et al., 2009); Fk = faultline strength based on multiple linear regressions (van Knippenberg et al., 2011); LCCA = latent class cluster analysis; ASW = average silhouette width; SG Strength = Subgroup Strength (Gibson & Vermeulen, 2003).
Third, all measures except LCCA decreased as the number of subgroups increased, but to different extents. As the distance between subgroups decreased with an increasing number of subgroups, this result demonstrates that all measures except LCCA are sensitive to changes in the distance between subgroups. For the measures based on sums of squares, this sensitivity arises because faultline distance and strength both depend on the between-subgroup sums of squares, as noted above.
Even for the measures that are not explicitly based on sums of squares, this decline can be explained: Shaw’s FLS is influenced by the fact that the more subgroups there are, the greater the number of categories necessary to accurately reflect the group structure. A higher number of categories decreases the probability of alignments within subgroups and leads to lower FLS values. PMDcat (Trezzini, 2008) also tends to yield lower values for a higher number of subgroups, although the average degree of intersubgroup disparity between the attribute combinations on which the subgroups are based increases with a higher number of categories. Because the decrease of the relative proportion of a given attribute combination (the
Fourth, with regard to the differing shapes of the decrease in faultline strength, the measures that are restricted to two-subgroup solutions, Fau and Fau × De, exhibit a larger decrease between splits into two and three subgroups than between three subgroups and above. Fk (van Knippenberg et al., 2011) and PMDcat also exhibit this characteristic drop between the two-subgroup and the three-subgroup solutions. The drop is noticeably smaller for ASW and Subgroup Strength, especially in larger groups, and does not exist for the LCCA-based faultline measure. A comparison with the ground truth of the simulation revealed that the trajectories of ASW and Fau were closest to it, whereas Fk and Fau × De deviated from it most strongly. As expected, Fau × De was also the measure whose values exceeded the upper bound of 1.
Of the measures included in the comparison, only two were able to identify the number of subgroups present in a faultline team and the corresponding member-to-subgroup association: ASW and LCCA. To evaluate this ability, we quantified their classification accuracy as the overlap between the simulated and the estimated subgroup membership matrices for each team. For a given team, the simulated subgroup matrix is a symmetric team member × team member matrix with a value of 1 in the cells for pairs of team members who are in the same homogeneous subgroup and 0 for pairs who are not. For homogeneous (heterogeneous) 8-person and 16-person teams, ASW achieved an average classification accuracy of 89% and 88% (85% and 86%), respectively, and LCCA achieved 53% and 51% (49% and 44%). A further simulation with 100,000 groups that compared true subgroup membership with random subgroup membership yielded an average random accuracy of 59%. Thus, LCCA performed worse than would be expected by chance.
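The accuracy measure can be sketched as follows, under one assumption the text leaves open: agreement is counted over each pair of distinct members once, excluding the diagonal, which is always 1. The helper names are hypothetical. Comparing co-membership matrices rather than raw labels makes the score independent of how the subgroups happen to be numbered.

```python
import numpy as np


def comembership(labels):
    """Symmetric member x member matrix: 1 if two members share a subgroup, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)


def classification_accuracy(true_labels, estimated_labels):
    """Share of member pairs on which the true and estimated co-membership matrices agree."""
    t = comembership(true_labels)
    e = comembership(estimated_labels)
    upper = np.triu_indices(len(t), k=1)  # each pair once; the diagonal is always 1
    return float((t[upper] == e[upper]).mean())


# Relabelled but identical partitions agree perfectly:
print(classification_accuracy([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```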
A visual inspection of the LCCA-based cluster solutions revealed the reason for this high error rate: Group members who belonged to visually distinct subgroups (i.e., clusters) were assigned to the same subgroup, apparently because they lay on the trajectory of a distribution that LCCA employed for determining the clusters. As LCCA was apparently able to find such distributions in all cases where it converged, this finding also explains its insensitivity to changes in subgroup heterogeneity (see Figure 1).
The Influence of Missing Values on Faultline Measures
Missing data have to be anticipated when collecting data from groups of individuals, and their impact on team-level constructs such as faultlines should be investigated (Maloney, Johnson, & Zellmer-Bruhn, 2010), especially since single-attribute team-level diversity indices such as the Blau index are strongly influenced by missing data (Allen, Stanley, Williams, & Ross, 2007). We therefore investigated whether this is also true of faultline measures, following a procedure similar to the one employed by Allen et al. (2007) and by Maloney et al. (2010).
Specifically, we took the 16-person teams and pooled the teams with different numbers of subgroups (2, 3, 4, 5) into one data set for homogeneous subgroups and one data set for heterogeneous subgroups, each comprising 400 simulated teams. We then removed team members randomly and recalculated the faultline values, whose averages are plotted in Figure 2. Consequently, the value at 0 removed team members represents the “true” measure for the complete team and serves as the reference point for the subsequent trajectory.
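The removal procedure for a single team can be sketched as follows. `missing_data_trajectory` is a hypothetical helper, `measure` stands for any faultline measure that accepts a list of member records, and the number of random repetitions per removal level is an assumption, as the text does not state it. Averaging these per-team trajectories over the 400 teams of a data set would give curves of the kind plotted in Figure 2.

```python
import random


def missing_data_trajectory(team, measure, max_removed, reps=100, seed=0):
    """Average value of `measure` after randomly removing 0..max_removed members.

    The entry at 0 removed members is the complete-team reference value.
    """
    rng = random.Random(seed)
    n = len(team)
    trajectory = []
    for m in range(max_removed + 1):
        values = []
        for _ in range(reps):
            kept = rng.sample(range(n), n - m)  # members that remain after removal
            values.append(measure([team[i] for i in kept]))
        trajectory.append(sum(values) / reps)
    return trajectory
```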

Figure 2. Behavior of faultline measures under missing data conditions (see text). Abbreviations: Fau = faultline strength (Thatcher et al., 2003); FLS = faultline strength (Shaw, 2004); PMDcat = Polarized Multi-Dimensional Diversity (Trezzini, 2008); Fau × De = faultline strength × faultline distance (Bezrukova et al., 2009); Fk = faultline strength based on multiple linear regressions (van Knippenberg et al., 2011); LCCA = latent class cluster analysis; ASW = average silhouette width; SG Strength = Subgroup Strength (Gibson & Vermeulen, 2003).
As Figure 2 shows, the influence of missing values on the various faultline measures is similar for homogeneous and heterogeneous faultline teams. In both instances, the trajectories of all measures remain fairly constant up to 40% missing values, with the exception of Fau × De and LCCA. Fau × De increases linearly with the amount of missing data from the onset, whereas LCCA exhibits no change at all because of its insensitivity to changes in subgroup heterogeneity (see above).
Of the remaining six measures, PMDcat, Fk, and Fau showed increases of similar magnitude from about 40% missing values onward. Thus, with an increasing number of missing values, these measures tend to overestimate faultline strength. The remaining three measures decreased from about 50% missing values onward, with Subgroup Strength (Gibson & Vermeulen, 2003) showing almost no decrease in the case of more heterogeneous teams. FLS and ASW followed almost parallel trajectories. Thus, with an increasing amount of missing values, FLS, ASW, and Subgroup Strength tend to underestimate the faultline strength of a given team, but they do so at a later onset and, in the case of Subgroup Strength, only to a small extent.
In sum, all faultline measures except Fau × De appear to be more robust against missing data than single-attribute measures of diversity, which are strongly affected from 20% missing cases onward (Allen et al., 2007). If, for a given research question, overestimating faultline strength is more problematic than underestimating it, researchers should consider using FLS, ASW, or Subgroup Strength, with Subgroup Strength being the most robust against missing data while still exhibiting sensitivity to changes in within-subgroup homogeneity, in contrast to LCCA.
Summing up so far, ASW is the only measure that is sensitive to changes in within-subgroup homogeneity, determines the number of subgroups with adequate accuracy, and exhibits robustness against missing data up to about 50% of missing values per team.
As a next step, we examine the factor structure of faultline measures before examining their predictive validity with real team data.
Factor Structure of Faultline Measures
Given that the measures exhibit different patterns when applied to these simulated data sets, and given their different properties, the question arises as to whether they measure the same underlying construct. We therefore conducted an exploratory factor analysis with oblique rotation on an aggregated data set containing all 1,400 simulated teams, after removing the teams with missing LCCA values. The resulting data set contained 1,266 simulated homogeneous and heterogeneous teams with their corresponding eight faultline measures.
The eigenvalues, parallel analysis, optimal coordinate analysis, and the scree test all suggested extracting two factors, with eigenvalues of 3.6 and 1.8, respectively. Together they explained 67% of the total variance, with the first factor accounting for 45% and the second for 22%. Table 1 shows the pattern matrix of the rotated solution.
Table 1. Pattern Matrix of the Rotated Oblique Factor Solution of Faultline Measures.
Note: N = 1,266 simulated teams. Factors 1 and 2 are correlated with r = .77. Loadings < |.10| are omitted. ASW = average silhouette width; Fau = faultline strength (Thatcher et al., 2003); Fau × De = faultline strength × faultline distance (Bezrukova et al., 2009); Fk = faultline strength based on multiple linear regressions (van Knippenberg et al., 2011); FLS = faultline strength (Shaw, 2004); LCCA = latent class cluster analysis; PMDcat = Polarized Multi-Dimensional Diversity (Trezzini, 2008).
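As a quick arithmetic check, with eight standardized measures the total variance equals 8, so eigenvalues of 3.6 and 1.8 correspond to 3.6/8 = 45% and 1.8/8 ≈ 22% of the variance, together about 67%. Horn’s parallel analysis, one of the retention criteria mentioned above, can be sketched in Python as follows. The sketch compares principal-component eigenvalues of the observed correlation matrix against the mean eigenvalues of uncorrelated normal data of the same size; this variant, the number of iterations, and the function name are assumptions, and the original analysis may have used different settings.

```python
import numpy as np


def parallel_analysis(data, n_iter=200, seed=0):
    """Horn's parallel analysis on principal-component eigenvalues of the correlation matrix.

    Returns the number of leading eigenvalues that exceed the mean eigenvalues
    obtained from uncorrelated normal data of the same shape.
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]  # descending
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_iter, p))
    for t in range(n_iter):
        noise = rng.standard_normal((n, p))
        simulated[t] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    threshold = simulated.mean(axis=0)
    n_factors = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first eigenvalue that random data match or exceed
        n_factors += 1
    return n_factors, observed, threshold
```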
The first factor consisted of five measures with factor loadings at or above .60: Fau (Thatcher et al., 2003), Subgroup Strength (Gibson & Vermeulen, 2003), Fau × De (Bezrukova et al., 2009; Zanutto et al., 2010), Fk (van Knippenberg et al., 2011), and ASW. The second factor comprised two measures, FLS (Shaw, 2004) and PMDcat (Trezzini, 2008). The LCCA measure did not load on either factor, as indicated by its uniqueness of 1.00. This was most probably due to its lack of variance (see Figure 1). The two factors were correlated at .77.
The two measures composing the second factor have two things in common: They require categorical attributes and conceptualize faultlines as an alignment of the categories of these variables within the team. Thus, we refer to FLS (Shaw, 2004) and PMDcat (Trezzini, 2008) as categorical internal alignment (CIA) measures.
The five measures with substantial loadings on the first factor all work with numeric data. With its loading of 1.00, Fau is the anchoring measure on this factor, and the loadings of the other measures can therefore be interpreted as the degree to which they are similar to Fau. Furthermore, all measures are based on a decomposition of intrateam variance: Fau, Fau × De , and ASW determine the amount of between-subgroup variability of the total variability. Fk quantifies the extent to which a given attribute’s variance can be explained by the variance of the other attributes. Subgroup Strength is based on the normed variance (i.e., the standard deviation) of attribute overlap between team members. We thus refer to Fau (Thatcher et al., 2003), Subgroup Strength (Gibson & Vermeulen, 2003), Fau × De (Bezrukova et al., 2009; Zanutto et al., 2010), Fk (van Knippenberg et al., 2011), and ASW as numeric variance-based (NVB) measures.
CIA and NVB measures appear to measure different constructs, but their high factor correlation shows that these constructs are related. Thus, overall, all faultline measures seem to tap into a similar construct, which gives credibility to the different conceptualizations and to their aggregation in meta-analyses (Thatcher & Patel, 2011). The fact that Fau × De (Bezrukova et al., 2009; Zanutto et al., 2010) loads on the same factor as Fau further underscores the previous assertion that the inclusion of faultline distance into the Fau measure does not add a different dimension or construct, as both Fau and De rely on the attributes’ sums of squares and are thus mathematically related.
At this point, we would like to emphasize that ASW measures the same underlying construct as Fau, the most frequently employed faultline measure (Thatcher & Patel, 2012), while being the only measure that combines three properties: It accurately determines numbers of subgroups beyond two, it is sensitive to changes in within-subgroup homogeneity, and it is robust against missing data.
After examining all of these properties under the ideal conditions of simulated data sets, we put the eight faultline measures to the test with a sample of real work teams.
Real Team Data
With this analysis, we compared the predictive validity of the faultline measures by correlating them with a construct that is known to be associated with faultlines. We chose relationship conflict (Jehn, 1995) for four reasons: First, meta-analytic evidence shows that (demographic) faultlines are significantly associated with relationship conflict (Thatcher & Patel, 2011). Second, relationship conflict is consistently regarded as detrimental for teams, whereas other forms of conflict, such as task and process conflict, can have positive effects on team functioning at certain periods in a team’s life cycle (Jehn & Mannix, 2001). Third, relationship conflict is significantly and negatively related to team performance and team member satisfaction (De Dreu & Weingart, 2003), which in turn are also negatively associated with faultline strength (Thatcher & Patel, 2011, 2012). Fourth, it can be measured in a standardized way across organizational contexts, whereas performance appraisals are usually company specific, which makes them difficult to compare in a sample with teams from several organizations, such as the one we employed.
Thus, based on these theoretical considerations and meta-analytic findings, we propose that a valid measure of faultline strength should be positively associated with team-level measures of relationship conflict.
We tested this assumption with a diverse data set of 404 employees (291 men [11 participants chose not to disclose their gender], average age = 31.64 years, SD = 5.16), nested in 59 real work teams. The teams were drawn from 11 companies in Germany and Switzerland in four industries: architecture (one organization, 6 teams), advertising (two organizations, 5 teams), technology (two organizations, 13 teams), and consulting (two organizations, 25 teams). All teams were project teams that worked on intellective tasks, such as project teams in consultancy, design teams in architecture, and R&E teams in technology. Average team size was 6.54 members (SD = 4.99). Average organizational tenure was 2.49 years (SD = 2.69), and average team tenure was 1.09 years (SD = 1.73). With regard to the nationality of team members, 192 were German, 90 were Swiss, and 51 were Indian. The remaining participants were from Albania, Argentina, Austria, Canada, Croatia, the Czech Republic, Great Britain, Hungary, Iceland, Ireland, Italy, Kosovo, Lithuania, Macedonia, Nepal, the Netherlands, the Philippines, Russia, Serbia, Slovakia, Spain, Sweden, Turkey, Ukraine, the United States, and Venezuela.
In each team, we elicited team members’ perceptions of relationship conflict using the three-item scale by Jehn (1995), for example, “How much emotional conflict is there among members in your work unit?” (α = .80). Aggregation of the relationship conflict measure to the team level was justified by within-team agreement, ICC(1) = .15, F(35, 297) = 2.60, p < .001, ICC(2) = .62.
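ICC(1) and ICC(2) are commonly derived from a one-way analysis of variance of the individual ratings by team. The following Python sketch uses the standard formulas ICC(1) = (MSB − MSW) / (MSB + (k − 1) MSW) and ICC(2) = (MSB − MSW) / MSB; using the average team size for k is a simplifying assumption for unequal team sizes, and `icc1_icc2` is a hypothetical name. Because ICC(2) = 1 − 1/F, the reported F = 2.60 implies ICC(2) = 1 − 1/2.60 ≈ .62, which matches the value given in the text.

```python
def icc1_icc2(groups):
    """ICC(1), ICC(2), and F from a one-way ANOVA of individual ratings nested in teams.

    `groups` holds one list of member ratings per team (hypothetical input format).
    """
    scores = [x for g in groups for x in g]
    n_total, n_teams = len(scores), len(groups)
    grand_mean = sum(scores) / n_total
    team_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, team_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, team_means) for x in g)
    ms_between = ss_between / (n_teams - 1)
    ms_within = ss_within / (n_total - n_teams)
    k = n_total / n_teams  # average team size; a simplification when sizes differ
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2, ms_between / ms_within


# ICC(2) implied by the reported F statistic:
print(round(1 - 1 / 2.60, 2))  # 0.62
```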
As meta-analytic evidence shows that team-level perceptions of relationship conflict are associated with demographic faultline strength (Thatcher & Patel, 2011), we calculated the above-mentioned faultline measures based on three attributes: age, gender, and nationality. Specifically, we calculated Fau (Thatcher et al., 2003), FLS (Shaw, 2004), Subgroup Strength (Gibson & Vermeulen, 2003), PMDcat (Trezzini, 2008), Fau × De (Bezrukova et al., 2009), LCCA (Lawrence & Zyphur, 2011), Fk (van Knippenberg et al., 2011), and ASW (see above). However, we were unable to use the LCCA values because the calculation failed to converge for every team; LCCA was therefore excluded from the analysis. Furthermore, an inspection of outliers (values more than three standard deviations above or below the mean) required the exclusion of four groups, all from different organizations. Table 2 shows the correlation matrix and the means and standard deviations for the final sample of 55 teams. As Table 2 shows, only two measures exhibited the expected positive association with relationship conflict: ASW, r = .30, p < .05 (two-tailed), and FLS, r = .27, p < .05 (two-tailed). Surprisingly, PMDcat exhibited a negative correlation with relationship conflict, r = –.32, p < .05 (two-tailed). Of note, the average number of subgroups identified by ASW was significantly larger than 2, M = 2.40, SD = 1.05, t(54) = 2.83, p < .01 (two-tailed). As expected, the number of subgroups was negatively correlated with Fau, because Fau always assumes two homogeneous subgroups and thus underestimates faultline strength in the presence of more than two subgroups (see above). As more than two homogeneous subgroups are more likely to occur in larger teams, the number of subgroups was also positively correlated with team size. The number of subgroups was also positively correlated with FLS, r = .49, p < .001, but negatively with PMDcat, r = –.29, p < .05.
Table 2. Means, Standard Deviations, and Bivariate Correlations of Measurement Variables (N = 55 Teams).
Note: ASW = average silhouette width; Fau = faultline strength (Thatcher et al., 2003); Fau × De = faultline strength × faultline distance (Bezrukova et al., 2009); Fk = faultline strength based on multiple linear regressions (van Knippenberg et al., 2011); FLS = faultline strength (Shaw, 2004); LCCA = latent class cluster analysis; PMDcat = Polarized Multi-Dimensional Diversity (Trezzini, 2008).
*p < .05. **p < .01. ***p < .001 (two-tailed).
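The one-sample t test reported for the number of subgroups can be recomputed from its summary statistics as a consistency check. With M = 2.40, SD = 1.05, and 55 teams, t = (2.40 − 2)/(1.05/√55) ≈ 2.83 with 54 degrees of freedom, which matches the text. The helper below is a hypothetical illustration.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """t statistic and degrees of freedom for testing mean == mu0 from summary statistics."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


t, df = one_sample_t(mean=2.40, sd=1.05, n=55, mu0=2)
print(f"t({df}) = {t:.2f}")  # t(54) = 2.83
```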
In sum, only one NVB measure, ASW, and one CIA measure, FLS, exhibited the expected predictive validity with relationship conflict. Interestingly, these were the only measures that were positively associated with the number of subgroups identified by ASW in the given sample. We thus see a measure’s sensitivity to the number of subgroups as one possible explanation for its predictive validity. We can only speculate as to why PMDcat was not positively related to the number of subgroups despite its conceptual similarity to FLS. Apparently, a specificity of the given data set was responsible for a negative association between PMDcat and group size, and PMDcat was the only measure that was significantly and negatively associated with ASW. We elaborate further on this issue in the discussion below.
Taken together, the previous analyses show that ASW is the only faultline measure for which all of the following hold: (a) It is sensitive to changes in the homogeneity of subgroups, (b) it determines the number of subgroups with above-chance accuracy, (c) it is robust against up to 50% of within-team missing values, (d) it measures the same underlying construct as the most frequently employed measure, and (e) it exhibits predictive validity. It also adheres to the propositions presented in the introduction. Table 3 summarizes the core results from the comparisons of the measures.
Table 3. Overview of Faultline Measures and Their Properties as Determined by Conceptual and Computational Comparisons.
Note: ASW = average silhouette width; CIA = categorical internal alignment; Fau = faultline strength (Thatcher et al., 2003); Fau × De = faultline strength × faultline distance (Bezrukova et al., 2009); Fk = faultline strength based on multiple linear regressions (van Knippenberg et al., 2011); FLS = faultline strength (Shaw, 2004); LCCA = latent class cluster analysis; NVB = numeric variance-based; PMDcat = Polarized Multi-Dimensional Diversity (Trezzini, 2008).
aConstructs were identified with factor analysis; see text.
bAs factional faultlines do not necessarily quantify the extent to which a group is split into homogeneous subgroups, they were excluded from the analysis (see text).
cLCCA appeared to be insensitive to changes in the subgroup homogeneity in the simulations and was thus insensitive to missing data.
Discussion
To the best of our knowledge, this study is the first to compare several available faultline measures using the same data sets. Specifically, we reviewed eight existing faultline measures and, owing to conceptual issues with these measures, proposed a ninth one, ASW, to overcome some of their limitations. After excluding one measure, factional faultlines (Li & Hambrick, 2005), from further analyses because of its conceptual distinction from the faultline construct, we compared the remaining eight measures with simulated data sets, investigating their behavior under different levels of heterogeneity, different team sizes, and different numbers of subgroups. We also compared their predictive validity with real team data by correlating them with a measure of team-level relationship conflict.
Taken as a whole, the results show that all measures have strengths and weaknesses to different extents. Fau (Thatcher et al., 2003), the measure employed most frequently in faultline research (Thatcher & Patel, 2012), exhibited a sharp drop in the presence of more than two subgroups because it is restricted to splits into two subgroups. Subgroup Strength (Gibson & Vermeulen, 2003) exhibited this drop to a lesser extent and proved to be very robust against missing data, but it cannot uncover team members' subgroup memberships. The same was true of Fk (van Knippenberg et al., 2011), which, however, turned out to be less robust against missing data. Fau × De (Bezrukova et al., 2009) exhibited characteristics similar to Fau and appeared to measure the same underlying construct, even though it was the only measure that explicitly incorporates the distance between subgroups into its value. However, this (Euclidean) distance is based on the same sum of squares that is already included in Fau, which therefore also responded to changes in faultline distance. The scale of Fau × De (and that of Subgroup Strength) also depended on the scale of the diversity attributes, whereas the other measures ranged between 0 and 1, independent of the scaling of the attributes. As Fau × De was also the measure least robust against missing data, we see little merit in preferring it to Fau. Fau, Subgroup Strength, Fau × De, and Fk measured the same underlying construct, which lends credibility to their meta-analytic aggregation. ASW also loaded on the same factor as these measures but exhibited features that went beyond those of the other measures of this construct: It identified the number of subgroups beyond two with high accuracy and was therefore, we assume, able to predict relationship conflict in a sample of teams that consisted of more than two subgroups on average.
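The link between Fau and the distance term, and Fau's drop beyond two subgroups, can be made concrete with a small sketch. It assumes numerically coded, already scaled attributes, enumerates every split of a team into two non-empty subgroups, and computes Fau as the largest ratio of between-subgroup to total sum of squares, following the definition of Thatcher et al. (2003). The returned distance is the Euclidean distance between the two subgroup centroids of the best split, which is assumed here to correspond to the distance component of Fau × De; the function names and toy teams are hypothetical.

```python
from itertools import combinations
from math import dist


def _centroid(team, members):
    """Mean attribute vector of the given team members."""
    p = len(team[0])
    return [sum(team[i][j] for i in members) / len(members) for j in range(p)]


def fau_with_distance(team):
    """Return (fau, distance, best_split) for a list of numeric attribute vectors.

    fau: maximum over all two-subgroup splits of between-subgroup SS / total SS.
    distance: Euclidean distance between the centroids of the best split
    (assumed here to match the distance term of Fau x De).
    """
    n, p = len(team), len(team[0])
    grand = _centroid(team, range(n))
    total_ss = sum((m[j] - grand[j]) ** 2 for m in team for j in range(p))
    if total_ss == 0:
        return 0.0, 0.0, None  # no variability, hence no faultline
    best = (-1.0, 0.0, None)
    # Keep member 0 in the first subgroup so each split is visited only once.
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            g1 = (0,) + rest
            g2 = tuple(i for i in range(n) if i not in g1)
            c1, c2 = _centroid(team, g1), _centroid(team, g2)
            between_ss = sum(
                len(g1) * (c1[j] - grand[j]) ** 2 + len(g2) * (c2[j] - grand[j]) ** 2
                for j in range(p)
            )
            fau = between_ss / total_ss
            if fau > best[0]:
                best = (fau, dist(c1, c2), (g1, g2))
    return best


# Two perfectly homogeneous subgroups: Fau reaches its maximum of 1.
two_groups = [[0, 0], [0, 0], [1, 1], [1, 1]]
# Three perfectly homogeneous subgroups: no two-way split captures all of them.
three_groups = [[0, 0], [0, 0], [1, 0], [1, 0], [0, 1], [0, 1]]

print(fau_with_distance(two_groups)[:2])   # Fau 1.0, distance about 1.414
print(fau_with_distance(three_groups)[0])  # approximately 0.625
```

Because the centroid distance and the between-subgroup sum of squares are built from the same centroids, Fau already moves with subgroup distance, which is consistent with the overlap between Fau and Fau × De described above.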
The only two measures in the analysis that work exclusively with categorical attributes, FLS and PMDcat, measured a construct that was related to, but distinct from, the construct underlying the other measures. This construct appears to mainly capture the within-team alignment of categorical variables, which is at the heart of both measures. However, only FLS exhibited predictive validity by correlating with relationship conflict in the real team data.
One possible explanation for the difference between PMDcat and FLS in this regard is that FLS includes the relation between the within-subgroup variability and the between-subgroup variability (see Formula 3), whereas PMDcat relies on the proportion of overlapping attribute combinations (see Formula 5). Thus, PMDcat is influenced by the correlations between attributes. In the case of more than two homogeneous subgroups, which were frequent in the employed data set, homogeneous and well-separable subgroups (forming strong faultlines) can be arranged with a correlation of zero between attributes, as we elaborate further below. Therefore, the specific characteristics of the data set, combined with the specific features of the measure, may have given rise to this rather surprising finding. Such issues could potentially be prevented by taking the correlation between attributes into account whenever more than two subgroups may be present, as we argue below.
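The claim that homogeneous subgroups can coexist with uncorrelated attributes can be checked with a minimal numeric example. It assumes two binary attributes and four perfectly homogeneous subgroups of two members each, one per attribute combination: the correlation between the attributes across the team is exactly zero, yet all variability lies between subgroups. With only two homogeneous subgroups that differ on both attributes, the correlation is necessarily ±1. The function names are illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def between_share(points, labels):
    """Share of the total sum of squares lying between subgroups (1 = fully homogeneous)."""
    p = len(points[0])
    grand = [sum(pt[j] for pt in points) / len(points) for j in range(p)]
    total = sum((pt[j] - grand[j]) ** 2 for pt in points for j in range(p))
    within = 0.0
    for g in set(labels):
        members = [pt for pt, lab in zip(points, labels) if lab == g]
        centre = [sum(pt[j] for pt in members) / len(members) for j in range(p)]
        within += sum((pt[j] - centre[j]) ** 2 for pt in members for j in range(p))
    return 1 - within / total


# Four homogeneous subgroups covering all combinations of two binary attributes.
four = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
four_labels = [0, 0, 1, 1, 2, 2, 3, 3]
# Two homogeneous subgroups that differ on both attributes.
two = [[0, 0]] * 4 + [[1, 1]] * 4
two_labels = [0] * 4 + [1] * 4

for pts, labs in ((four, four_labels), (two, two_labels)):
    xs, ys = [pt[0] for pt in pts], [pt[1] for pt in pts]
    print(round(pearson(xs, ys), 3), between_share(pts, labs))
# 0.0 1.0
# 1.0 1.0
```

A measure that relies on attribute correlations would rate the first team as having no alignment, although its subgroups are as homogeneous as those of the second team.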
A faultline measure based on the entropy of LCCA, as proposed by Lawrence and Zyphur (2011), proved to be somewhat problematic in the context of our analyses, despite its promising theoretical attributes. In the simulated data sets, it failed to compute a result for up to 12% of the teams under investigation. In the case of our real team data, it did not converge in a single instance. Furthermore, in the simulation, LCCA failed to adequately identify the subgroups that were present in the data and was not sensitive to changes in the heterogeneity of the subgroups. We believe that this is due to LCCA's difficulties in dealing with sample sizes below 30 (Nylund et al., 2007; see also Thatcher & Patel, 2012) and with homogeneous dichotomous variables (Muthén, 2004). Note that our analysis focused on rather small groups and does not speak to the value of LCCA-based faultlines computed for larger entities such as entire organizations, as Lawrence and Zyphur (2011) suggested.
The strong drop that most faultline measures exhibited between a split into two and a split into three or more subgroups means that ASW yields higher values for three-subgroup splits than most other measures in almost all cases. Combined with the finding that only ASW accurately determined the true number of subgroups, this means that other common measures tend to underestimate faultline strength in the case of more than two subgroups.
In sum, although other measures were more robust against missing data, ASW turned out to be the most versatile and accurate measure. This speaks in favor of its future adoption as a faultline measure, particularly in cases where multiple subgroups can be expected.
As noted in the introduction, teams are not static entities but change over time (Roe, Gockel, & Meyer, 2012), and the passing of time can affect a team's faultline structure (Mäs et al., 2012). Although we addressed the sensitivity of the measures to changes in team structure over time only indirectly, by investigating their responses to different levels of between-subgroup homogeneity, our results allow some conclusions regarding the measures' usefulness for studies on organizational and team dynamics. As we found that LCCA-based measures can lack sensitivity to different levels of subgroup homogeneity, they appear to be the least suitable for studying changes in faultlines over time. Those measures that do not provide a member-to-subgroup association (FLS, PMDcat, Subgroup Strength, and Fk) will exhibit change over time, but only in the single parameter that they deliver, the overall faultline strength. More subtle changes, such as changes in subgroup membership and in the relative sizes of subgroups, which could be associated with shifts in the power structure of a team (Thatcher & Patel, 2012), will go unnoticed if these measures are employed in dynamic scenarios. Fau and Fau × De can detect such changes, but only if the team is and remains restricted to two subgroups. If this cannot be taken for granted, ASW is the only measure that can model changes in the group structure over time with several indices: the overall ASW faultline strength, the number of subgroups, the member-to-subgroup association, the relative subgroup sizes, and even team members' individual silhouette widths, which can be obtained from the software (see http://www.group-faultlines.org). We therefore believe that, if applied repeatedly over time, ASW provides the highest informational resolution for modeling a team's dynamic structural changes, which is an important avenue for future research on faultlines (Thatcher & Patel, 2012).
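To illustrate why a single overall value can miss structural change, the sketch below computes, for a given partition, individual silhouette widths s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is a member's mean Euclidean distance to the rest of their own subgroup and b(i) the smallest mean distance to any other subgroup, together with their average, the number of subgroups, and the subgroup sizes. It assumes the partition is already known (in ASW it would come from the clustering step), treats members of singleton subgroups as having width 0, and uses hypothetical function names; it is not the code of the software mentioned above.

```python
from collections import Counter
from math import dist


def silhouette_widths(points, labels):
    """Individual silhouette widths for a given partition (labels is a list)."""
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("silhouette widths need at least two subgroups")
    widths = []
    for i, (x, own_label) in enumerate(zip(points, labels)):
        own = [points[j] for j in range(len(points)) if labels[j] == own_label and j != i]
        if not own:  # singleton subgroup: width defined as 0
            widths.append(0.0)
            continue
        a = sum(dist(x, y) for y in own) / len(own)
        b = min(
            sum(dist(x, points[j]) for j in range(len(points)) if labels[j] == g)
            / labels.count(g)
            for g in groups
            if g != own_label
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def subgroup_summary(points, labels):
    """Overall ASW, number of subgroups, subgroup sizes, and individual widths."""
    widths = silhouette_widths(points, labels)
    return {
        "asw": sum(widths) / len(widths),
        "n_subgroups": len(set(labels)),
        "sizes": dict(sorted(Counter(labels).items())),
        "widths": widths,
    }


# Time 1: two subgroups of three members each.
t1 = subgroup_summary([[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]], [0, 0, 0, 1, 1, 1])
# Time 2: member 2 has changed attributes and now belongs to subgroup 1.
t2 = subgroup_summary([[0, 0], [0, 0], [1, 1], [1, 1], [1, 1], [1, 1]], [0, 0, 1, 1, 1, 1])

print(t1["asw"], t1["sizes"])  # 1.0 {0: 3, 1: 3}
print(t2["asw"], t2["sizes"])  # 1.0 {0: 2, 1: 4}
```

The overall value is identical at both time points, while the subgroup sizes shift from 3:3 to 2:4; negative individual widths would additionally flag members who fit their assigned subgroup poorly.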
Limitations and Avenues for Future Research
Our analyses left aside two areas that we consider important for future research: the weighting and scaling of the attributes composing a faultline and the correlation between attributes. We take a brief look at both issues in the following.
The Weighting, Scaling, and Meaning of Demographic Attributes
With the exception of LCCA, all of the methods that we investigated computationally implicitly assume that all attributes comprising the faultline are of equal weight. Whether a difference in nationality, for example, should be weighted in the same way as a difference in gender is highly debatable from our perspective. Thus, in our view, how to weight the different attributes contributing to a faultline is an important open research question that our analysis did not address. Specifically, we did not fully investigate how sensitive the measures are to the scaling of numeric data: We assume that, like us, many faultline researchers circumvent this issue by employing the scaling that Bezrukova et al. (2009) proposed. In line with Thatcher et al. (2003), Bezrukova et al. suggested multiplying the dummy-coded values of nominal attributes by
Sensitivity to the weighting of attributes provides an opportunity for further research: If we knew that in a specific context a split into subgroups based on a specific attribute is more important than other attributes that are also under consideration, faultline measures could take this knowledge into account to give more weight to a specific attribute. Note that this is the same idea that is behind Li and Hambrick’s (2005) measure of factional faultlines.
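As a minimal sketch of the weighting idea, assuming a distance-based measure: each attribute's squared difference is multiplied by a context-specific weight before the Euclidean distance is taken. The weights, the dummy coding, and the function name are hypothetical; as noted below, empirical grounds for choosing such weights are still lacking.

```python
from math import sqrt


def weighted_euclidean(x, y, weights):
    """Euclidean distance in which attribute j contributes weights[j] * (x_j - y_j) ** 2."""
    if not len(x) == len(y) == len(weights):
        raise ValueError("x, y, and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical dummy-coded attributes: [gender, nationality]
a, b, c = [0, 0], [1, 0], [0, 1]  # b differs from a in gender, c in nationality

print(weighted_euclidean(a, b, [1, 1]), weighted_euclidean(a, c, [1, 1]))  # 1.0 1.0
print(weighted_euclidean(a, b, [1, 2]), weighted_euclidean(a, c, [1, 2]))  # 1.0 and about 1.414
```

Multiplying attribute j by the square root of its weight and then using ordinary Euclidean distances gives the same result, so such weights could also be applied as a preprocessing step for measures built on sums of squares.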
This issue is not just an empirical one but also a theoretical one. Expectation states theory (Ridgeway, 1987, 2003; Ridgeway & Berger, 1986) assumes that demographic attributes can be associated with stereotypes pertaining to competence and status (e.g., the stereotype that men are better at math and women are better at reading; see Chatman, Boisnier, Spataro, Anderson, & Berdahl, 2008). Furthermore, the distribution of demographic attributes in organizations conveys their meaning, and diversity should therefore be studied in the corresponding societal and organizational contexts (Joshi & Roh, 2009).
We believe that the same is true for faultlines. Dormant faultlines in large social systems create heterogeneity and even inequality, as sociological research on intersectionality (e.g., Weber, 2001) shows: In the United States, African American women face forms of discrimination in the labor market that differ from those faced by African Americans in general or by women in general (Browne & Misra, 2003). The meanings of such categories differ between contexts: Whereas age may be more closely related to status than race or education in one country, education may be the salient indicator in another.
Thus, the meaning of small group faultlines emerges in the context of the meaning of attributes in larger contexts, which influence the status and roles that individuals have in small groups and thus the meaning that faultlines acquire. As a consequence, subgroups can differ in status, and these differences can vary across contexts. By stating that subgroups can differ in their respective power, Thatcher and Patel (2012) highlighted a similar area for future research. Thus, in a particular context, status-relevant attributes may be more consequential for faultlines than others and could receive higher weights in the calculation of a faultline measure. Accordingly, the software for calculating ASW (see http://www.group-faultlines.org) allows such weights to be specified. However, data for a priori assignments of weights are missing, as we know little about the societal or organizational dynamics that influence which faultlines become salient and what meanings they acquire. Further theoretical work toward a multilevel theory of faultlines is therefore warranted, one that conceptualizes their meaning in small groups in the context of the larger social systems of which they are a part. Nevertheless, an investigation of power asymmetries between subgroups based on their relative size, as Thatcher and Patel (2012) called for, is already possible with ASW and its ability to display subgroup sizes and member-to-subgroup associations.
With further reference to the organizational and societal context of faultlines, members of informal and self-organizing groups may even select (sub)group membership in such a way that the addition of their attributes to the group increases faultline strength: The similarity-attraction paradigm (Byrne, 1971) proposes that individuals prefer others who are similar to themselves. Thus, given a free choice, individuals will join others who are similar to them, thereby increasing subgroup homogeneity. Therefore, the distinction between small informal groups and prescribed groups (e.g., in the laboratory or in the organization), where such membership choices are ruled out, may be another contextual factor to consider in future research on faultlines.
Implications of Correlated Attributes for More Than Two Subgroups
As stated in the second proposition, a faultline measure should reflect the homogeneity of the subgroups. In that context, we believe that the relationship between the correlations among attributes and the homogeneity of subgroups requires further examination. Whereas the correlation of attributes and the homogeneity of subgroups are necessarily linked in scenarios with only two subgroups, their association becomes more ambiguous as the number of subgroups increases. In cases where the number of attributes is lower than the number of subgroups, even highly homogeneous and well-separable subgroups (thus forming strong faultlines) can be arranged with a correlation of zero between attributes. Thus, in the case of multiple subgroups, correlations between attributes across the team are not a requirement for the formation of homogeneous subgroups.
Therefore, faultline strength can be measured in such a way that it does not rely on the correlation between attributes but only on the relation between the within-subgroup variability and the between-subgroup variability, as several measures do (Bezrukova et al., 2009; Lau & Murnighan, 1998; Shaw, 2004; Thatcher et al., 2003). In these instances, the main prerequisite for calculating faultline strength is a measure of the dissimilarity of the members within a team, for which Euclidean distances are used in the case of Fau and Fau × De. They are also used in ASW's Ward algorithm when determining the distances between clusters.
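For readers unfamiliar with Ward's criterion, the sketch below shows the agglomeration step in its simplest form, assuming numeric, already scaled attributes: starting from singletons, it repeatedly merges the two clusters whose union increases the within-cluster sum of squares the least, an increase equal to n_a n_b / (n_a + n_b) times the squared Euclidean distance between their centroids. It is a naive illustration with hypothetical names, not the implementation in the ASW software, and it returns a partition for a fixed number of subgroups k rather than determining k.

```python
from math import dist


def _centroid(points, members):
    """Mean attribute vector of the given cluster members."""
    p = len(points[0])
    return [sum(points[i][j] for i in members) / len(members) for j in range(p)]


def ward_partition(points, k):
    """Naive agglomerative Ward clustering of `points` into k clusters; returns labels."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ca, cb = _centroid(points, clusters[a]), _centroid(points, clusters[b])
                # Increase in the within-cluster sum of squares caused by merging a and b.
                cost = na * nb / (na + nb) * dist(ca, cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


team = [[0, 0], [0, 0.1], [5, 5], [5, 5.1], [10, 0], [10, 0.1]]
print(ward_partition(team, 3))  # [0, 0, 1, 1, 2, 2]
```

The merge cost depends only on Euclidean distances between centroids, which is why correlated attributes can be double-counted in this step, as the following paragraph explains.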
However, if two attributes are correlated, their contribution to the distance between team members will be overstated if the Euclidean distance is employed, because each attribute shares part of its variance with the other. Take the example of a team in which the attributes income and years of education are correlated at r = .90. Without acknowledging this correlation, we would regard Person A, with an income of $50,000 and 10 years of education, as more different from Person B, with an income of $70,000 and 18 years of education, than from Person C, with an income of $70,000 and 10 years of education. This can be counterintuitive, however, because we would expect Person B to have a higher income than Person A because of Person B's additional years of education. In other words, B's income follows the same principle as A's, and therefore they are not inherently dissimilar. Person C, however, has as many years of education as A but a much higher income. In that regard, Person C can be seen as more dissimilar from A than Person B. Thus, given the correlation between income and years of education, the income difference could be weighted more heavily when the difference in education is small, so that the distance between Persons A and C becomes larger than that between Persons A and B, even though the one-dimensional differences between Person A's and Person C's attributes are smaller.
In other words, if the correlation between income and years of education were taken into account in a faultline measure, it would group individuals whose attributes are in line with the correlation (e.g., employees with few years of education and low income and employees with many years of education and high income) into one subgroup, and other individuals (e.g., few years of education but high income) into another subgroup. We believe that such an option opens interesting paths for future faultline research. We thus incorporated an option in the ASW package that gives the researcher the choice to use Mahalanobis distances (Mahalanobis, 1936) instead of Euclidean distances. According to Mahalanobis, the distance between two vectors X and Y (see Formula 6) can alternatively be specified as $d_M(X, Y) = \sqrt{(X - Y)^{\top} S^{-1} (X - Y)}$, where $S$ is the covariance matrix of the attributes.
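The income and education example can be worked through numerically. The sketch below assumes standard deviations of $10,000 for income and 4 years for education (values not given in the text, chosen only for illustration) together with the stated correlation of r = .90, builds the 2 × 2 covariance matrix S, and compares standardized Euclidean distances with Mahalanobis distances, which weight the attribute differences by the inverse of S. Under these assumptions, Euclidean distances place Person B farther from Person A than Person C, whereas Mahalanobis distances reverse that order, as argued above. The function names are hypothetical.

```python
from math import sqrt


def covariance_2d(sd1, sd2, r):
    """2 x 2 covariance matrix from two standard deviations and a correlation."""
    return ((sd1 ** 2, r * sd1 * sd2), (r * sd1 * sd2, sd2 ** 2))


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between two-attribute vectors x and y."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    d = (x[0] - y[0], x[1] - y[1])
    return sqrt(sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2)))


def standardized_euclidean(x, y, sds):
    """Euclidean distance after dividing each attribute by its standard deviation."""
    return sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sds)))


# (income in USD, years of education); the standard deviations are assumptions.
A, B, C = (50_000, 10), (70_000, 18), (70_000, 10)
SDS = (10_000, 4)
cov = covariance_2d(*SDS, r=0.90)

print(standardized_euclidean(A, B, SDS), standardized_euclidean(A, C, SDS))  # about 2.83, 2.00
print(mahalanobis_2d(A, B, cov), mahalanobis_2d(A, C, cov))                  # about 2.05, 4.59
```

Setting r = 0 makes the Mahalanobis distance coincide with the standardized Euclidean distance, which is the equivalence for uncorrelated data noted in the next paragraph.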
Using Mahalanobis distances as the basis for computing faultline strengths can allow a more in-depth examination of group diversity because they account for correlations between attributes. The fact that Mahalanobis distances for uncorrelated data are equal to normalized Euclidean distances (e.g., Timm, 2002) may lead to the inappropriate conclusion that Euclidean distances could generally be replaced by Mahalanobis distances, regardless of whether correlations are present. However, Mahalanobis distances should not be applied without carefully investigating the prerequisites of their use, because the observed correlations between measured attributes are influenced by the study design, for example, by previous categorizations or dichotomizations of attributes or by correlated attributes that depend on variables not included in the data set (Hunter & Schmidt, 2004). For example, if people's yearly income depended on both their education and their number of vacation weeks per year, a data set containing income and years of education but not vacation weeks would separate into subgroups of different income/education categories whose correlations differ from the correlation of the same attributes across the entire group. We therefore suggest that Mahalanobis distances be applied only if the correlations of attributes within subgroups reflect their correlations in the overall data set. Still, we believe that the possibility of accounting for scenarios where the correlation between diversity attributes matters could lead to fruitful and fine-grained analyses of within-team subgroup structures.
Conclusion and Outlook
The calculation of ASW-based faultlines allows faultline research to be applied to contexts with larger groups where more than two subgroups are likely to form. Thus, this measure opens the faultline concept to various fields such as training evaluation, school and educational settings, and larger organizational units, such as boards of directors or even entire departments. It could also be employed in sociological research on intersectionality, where such methodologies are lacking (McCall, 2005). In such samples with larger clusters, the merits of LCCA should also be reassessed.
By implementing a faultline measure that has the potential to quantify the subgroup structure of work teams in the presence of multiple subgroups more reliably than before, we hope that ASW enables organizations to identify faultlines better and thereby to make the most of their work teams. For example, ASW could be employed in HR to prevent homogeneous subgroups that can pose a danger to team productivity (Thatcher & Patel, 2011). ASW-based faultline measures could also be employed to examine whether the processes in teams with multiple homogeneous subgroups differ from those in teams with two homogeneous subgroups, as was recently proposed by Carton and Cummings (2012).
For future research, we also propose to challenge the conceptualization of faultlines as a group-level measure. Quantifying the strength of faultlines at the group level often involves a trade-off: Some individuals are assigned to poorly fitting subgroups so that the aggregated faultline measure is maximized. If these individuals were asked about their view of faultlines in their group, they would presumably identify other subgroup boundaries than the ones implied by the aggregated and maximized faultline.
Thus, a promising avenue for further research is to calculate an individual-level faultline value from individual silhouette widths. A measure of this kind would quantify the similarity of an individual to his or her subgroup with regard to multiple attributes, that is, a multiattribute individual-level egocentric measure of separation (Harrison & Klein, 2007). Such individual faultlines for all group members would result in an adjacency matrix, containing the number of associations between the individuals in terms of being included in each other's in-group. Methods of social network analysis could then be employed to provide further insight into the structure of the group.
In summary, most of the previously employed faultline measures seem to be more robust against missing values than single-attribute team-level measures of diversity and tap into two related constructs, which speaks to the internal validity of research on faultlines. Among the measures that we investigated, ASW has the potential to overcome the limitations of the other methods with regard to the presence of multiple subgroups, while measuring the same construct as the most established measure. We thus hope that ASW can help more organizations and more researchers to investigate faultlines in new contexts in the interest of helping diverse teams to unlock their full potential.
Acknowledgments
We are very grateful to Kate Bezrukova, Jeremy Dawson, Ramón Rico, and Sherry Thatcher for their helpful comments on earlier versions of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
