Abstract
The Group Environment Questionnaire (GEQ) is a measure of group cohesion that has a long history of use in sports psychology and group research. However, researchers often fail to account for the hierarchical nature of group data in their analysis, leading to statistical aggregation biases. This study used multilevel confirmatory factor analysis to evaluate the factorial validity of the GEQ at the individual and group levels simultaneously, using a sample of 519 netball players from 56 New Zealand semi-elite and elite teams. Results supported a four-factor model, based on the four subscales of the GEQ, at each level. Factor loadings for the final multilevel model were stronger at the group level, compared with the individual level, suggesting that cohesion is a group-level construct. This study provides evidence for the multilevel factorial validity of the GEQ and suggests that group-level analysis and interpretation should be emphasized in future research.
The Group Environment Questionnaire (GEQ) is one of the most extensively used multidimensional measures of cohesion in sports and exercise psychology (Eys, Carron, Bray, & Brawley, 2007). As such, it is important that the GEQ has reliability and validity for use with sports teams and other groups. Although several studies (Carron, Widmeyer, & Brawley, 1985; Dyce & Cornell, 1996; Li & Harmer, 1996; Schutz, Eom, Smoll, & Smith, 1994) have looked at the factorial validity of the GEQ at the individual level, no studies to date have looked at the factorial validity of the GEQ from a multilevel framework—across group and individual levels simultaneously. Conceptually and from a research perspective, cohesion is viewed as a group-level construct (e.g., Carron & Ball, 1978). This study aims to provide confirmatory evidence of the multilevel factorial structure of the GEQ, using a method called multilevel confirmatory factor analysis (MCFA).
The GEQ
The GEQ is based on a conceptual model of cohesion, namely Carron et al.’s (1985) hierarchical model of cohesion in sports teams. The construct of cohesion is first divided into group and individual factors: group integration and individual attraction to group (Carron et al., 1985). These factors represent perceptions of the group from collective and personal levels, respectively. When in common, individual perceptions give rise to team-level constructs, such as the four factors posited by the GEQ. Each of these components is then divided into task and social dimensions, which describe general motivation toward achieving group objectives and developing social relationships. Group Integration–Social (GI-S) refers to a team member’s sense of group closeness, similarity, and bonding as a social unit; for example, “members of our team do not stick together outside of practices and games” (reverse-scored; Carron et al., 1985). Group Integration–Task (GI-T) refers to an individual’s beliefs about team closeness, similarity, and bonding around the group’s task; for example, “our team is united in trying to reach its goals for performance” (Carron et al., 1985). Individual Attractions to the Group–Social (ATG-S) refers to the team member’s impressions of social interactions and personal acceptance within the group; for example, “some of my best friends are on this team” (Carron et al., 1985). Individual Attractions to the Group–Task (ATG-T) refers to a group member’s feelings about personal involvement in relation to shared group goals and productivity; for example, “I do not like the style of play on this team” (reverse-scored; Carron et al., 1985). Thus, the 18 items of the GEQ form a four-factor model of cohesion (Brawley, Carron, & Widmeyer, 1987).
Reliability of the GEQ
Several studies have examined the psychometric properties of the GEQ, with variable findings (Carron, Brawley, & Widmeyer, 1998). More specifically, empirical research on the internal consistency of the GEQ has yielded mixed results (e.g., Li & Harmer, 1996; Prapavessis & Carron, 1996). The original Cronbach’s alpha values for the four cohesion subscales show moderate reliability: .64 (ATG-S), .75 (ATG-T), .76 (GI-S), and .70 (GI-T; Carron et al., 1998). While some studies have calculated similar or larger values (e.g., Carron & Ramsay, 1994; Li & Harmer, 1996), variable internal consistencies have also been reported on one or more GEQ subscales by other studies (e.g., Prapavessis & Carron, 1996, 1997; Westre & Weiss, 1991). For example, Westre and Weiss (1991) found moderate Cronbach’s alpha values particularly for the social scales: ATG-S and GI-S had lower values of .54 and .44, respectively, while ATG-T and GI-T had higher values of .68 and .66, respectively. A study by Eys et al. (2007) found larger Cronbach’s alpha values for three out of four subscales when using a positively worded version of the GEQ and proposed that the use of items that were positively and negatively worded may have reduced the GEQ’s internal consistency. A further explanation for variable internal consistency can be found in the dynamic and multidimensional nature of cohesion: All dimensions may not be salient for a group at a specific point in time, or across different types of groups (Carron et al., 1998).
Validity of the GEQ
A review by Carron et al. (1998) found substantial support for the validity of the GEQ. Content validity was confirmed in the development phase of the instrument and was supported by a broad literature search, use of participants as active agents in concept definitions, a conceptual basis for the scales, unbiased assessments, and intercorrelations between the items (Carron et al., 1985). Several studies have also investigated the criterion validity of the GEQ, that is, its convergent, concurrent, and predictive validity. Studies have investigated the concurrent validity of the GEQ by comparing it with the Sport Cohesion Questionnaire (Martens, Landers, & Loy, 1971), the Team Climate Questionnaire (Grand & Carron, 1982), and the Bass Inventory (Bass, 1962). In each of these three studies, the overwhelming majority of analyses supported concurrent validity of the GEQ (Brawley et al., 1987).
Several studies have also investigated the predictive validity of the GEQ for other theoretically linked variables (Carron et al., 1998). For example, cohesion was found to predict adherence behavior in exercise groups: the tendency for team members to continue their contribution, motivation, and participation in the group (Carron, Widmeyer, & Brawley, 1988). Cohesion in the GEQ has been found to predict dropout behaviors (Carron et al., 1988; Spink & Carron, 1993), attendance (Spink & Carron, 1994), absenteeism, and lateness (Spink & Carron, 1992) but not early departure behaviors (Spink & Carron, 1994). Carron et al.’s (1998) review of the validity of the GEQ also found empirical support for GEQ predictions of variables such as group size (e.g., Carron & Spink, 1995), leadership (e.g., Westre & Weiss, 1991), team building (e.g., Prapavessis, Carron, & Spink, 1996), role involvement (e.g., Grand & Carron, 1982), and collective efficacy (e.g., Paskevich, Brawley, Dorsch, & Widmeyer, 1995). However, predictive validity was not found for coordination or duration of membership (length of time as a member) variables (Carron et al., 1998). Overall, the GEQ shows considerable evidence of criterion validity (Brawley et al., 1987), indicating that this measure is psychometrically sound in its ability to provide consistent results across studies.
In contrast, research using factor analytic methods has been largely inconclusive for sports teams and other groups (Dion, 2000). Carron et al. (1998) found mixed results across four studies examining the factor structure of the GEQ. The four subscales were originally supported by exploratory factor analysis (Carron et al., 1985) and, then later, by a confirmatory factor analysis (CFA), using athletes from interuniversity sports competitions (Li & Harmer, 1996). However, limited evidence for the factorial validity of the GEQ was found in a study by Schutz et al. (1994), which used participants in individual and team sports at secondary school. Using CFA and post hoc exploratory analysis, Schutz et al. found that the initial four subscale model was a poor fit for their data. Similarly, studies by Hogg and Hains (1998), Sullivan, Short, and Cramer (2002), and (with a work-adapted version of the GEQ) Carless and De Paola (2000) failed to find adequate evidence for a four-factor structure. Furthermore, a study by Leeson and Fletcher (2005) found sufficient evidence for a two-factor model (task and social) but not for a four-factor model. The lack of consistent evidence for a four-factor structure may be in part a product of failing to consider the hierarchical structure of group data, as only single-level analyses have been used (Byrne, 2006). The original authors of the GEQ (Carron & Brawley, 2000; Carron et al., 1998) also suggested that researchers need to place greater consideration on the multidimensional and dynamic nature of cohesion. They argued that groups go through dramatic changes throughout their development and similarly go through changes in levels of cohesion (Tuckman & Jensen, 1977). Thus, all dimensions of cohesion are not concurrently or equally present at any given stage of a group’s development (Carron & Brawley, 2000). In addition, they cautioned that all dimensions of cohesion may not be present across all types of groups (Carron et al., 1998). 
For future CFAs, Carron et al. (1998) recommended one of two approaches: (a) a more longitudinal or extended approach including multiple assessments of a number of teams or (b) a cross-sectional approach where a broad sample of teams with differing membership qualities are studied over a broad period of development. A third approach, recommended by Byrne (2006), is multilevel modeling (MCFA), which incorporates individual and group dimensions of cohesion (e.g., players nested in teams).
In typical groups research, the individual data are often aggregated at the group level, omitting a great deal of individual variability, which can lead to spurious results (Moritz & Watson, 1998). Because players are nested within teams, individual responses to the GEQ are interdependent and subject to group influences (Carron et al., 1998). Both levels of the data therefore need to be tested simultaneously so that the interaction between the individual and the group is modeled correctly.
Studies investigating the factorial validity of the GEQ have largely used an individual level of analysis. An individual-level analysis assumes that members within groups do not share any common characteristics or goals (Heck, 2001) and therefore, may underestimate the significant effects that membership in a group may have on individual behaviors (Moritz & Watson, 1998). Furthermore, the assumption that all random errors are independent, normally distributed, and homoscedastic is violated—analysis at the individual level implies that there is no expected systematic influence of variables at the higher level (Heck, 2001), which is clearly not the case with data measuring cohesion in groups. Because of these assumptions and the single-level nature of an individual analysis, the construct of cohesion and its relationships at the individual level are over-generalized to include the group level (Moritz & Watson, 1998), although other authors have emphasized this when examining group-related issues in sports (Paskevich, Brawley, Dorsch, & Widmeyer, 1999; Spink, Wilson, & Odnokon, 2010).
In the group-level approach, individual responses within groups are aggregated to provide a mean for each group (Heck, 2001). Similar to individual analysis, group analysis can underestimate effects between levels: Group studies may not account for the way in which individuals influence their environment (Moritz & Watson, 1998). To satisfy the homogeneity assumption for this approach, an appropriate level of agreement within groups should be demonstrated (Gully, Devine, & Whitney, 1995). Kenny and Lavoie (1985) suggested using intraclass correlation coefficients (ICCs) to test for statistical nonindependence (i.e., group effects) and to determine whether individual- or group-level analysis is justified. If the ICC is not significant, analysis at the individual level is appropriate; if the ICC is significant and positive, a group-level effect is evident and analysis at the group level should follow (Kenny & Lavoie, 1985). The degree to which consensus among players exists can be reflected in the magnitude of the index of agreement, and research suggests that values ranging from .40 to .70 represent groupness when using the GEQ (Carron et al., 2003).
MCFA
Multilevel modeling allows the researcher to consider the individual and group levels of hierarchical data and to test the stability of the factor structure at both levels simultaneously. MCFA is merely an extension of CFA to include the various levels in the data. For example, MCFA simultaneously analyzes the individual-level and the group-level model.
To date, no known studies have used MCFA to test Carron et al.’s (1985) four-factor model for cohesion. Carron and colleagues (Brawley et al., 1987; Carron & Brawley, 2000) have argued that cohesion can be conceptualized as a construct that operates at individual and group levels, as reflected in their differentiation between an individual’s personal attraction to the group and overall perceptions of the group as a collective unit. In light of this, it seems important to test the validity of the factorial structure of the GEQ, not only at the individual and group levels separately but also at both levels concurrently. Carrying out an MCFA is a logical and necessary step forward as it will allow a more accurate examination of the factor structure of the GEQ and provide further evidence for or against the adequacy of the underlying theoretical model.
The importance of establishing the multilevel factorial validity of the GEQ through MCFA is also highlighted by use of a multilevel framework for analysis in GEQ studies (e.g., Shapcott, Carron, Greenlees, & Hakim, 2010; Spink, 2005). Spink (2005) used multilevel modeling to account for the nested structure of team data in an investigation of the relationship between task cohesion and team satisfaction. He found that task cohesion predicted 33% of the variance in task satisfaction at the individual level and 55% at the group level (Spink, 2005). Similarly, Shapcott et al. (2010) used a multilevel framework to examine predictors of team-referent attributions—such as cohesion, performance history, performance outcome, and level of agreement on attribution dimensions—at the individual and group levels. At the individual level, they found that ATG-S was a significant predictor of locus of causality, while locus of causality and performance outcome were significant predictors at the group level (Shapcott et al., 2010). These findings emphasize the importance of multilevel modeling through an illustration of the differences in construct relationships at the individual and group levels. Furthermore, the use of multilevel modeling of the GEQ in these studies underscores the importance of establishing factorial validity at the individual and group levels through MCFA so that these studies can be better informed and supported in their analyses.
As this is the first study to test the multilevel factorial structure of the GEQ, the two models to be tested were Model 1, the first-order, correlated, four-factor model, and Model 2, the multilevel, first-order, correlated four-factor model.
Method
Participants
The sample comprised two data sets and included 519 semi-elite to elite level players from 56 netball teams in New Zealand. Overall, the number of participants per team ranged from 6 to 12 (M = 9.58, SD = 1.33). The average age in the combined data set was 21.45 years (SD = 5.27) with a range of 14 to 47 years. Less than 5% of participants were older than 30 years, and therefore, older participants would have little impact on the results of this study. All athletes were female and engaged in competitive netball in New Zealand. Each player was tested prior to competing. The first data set comprised 318 netball players in 31 teams and was collected at national-level tournaments. The average age in this first sample was 20.34 years (SD = 4.13 years), and the average team size was 10.25 players (SD = .77). The second data set consisted of 201 players from 25 teams and was collected before the first game of the season in a regional league. The average age in the second sample was 23.08 years (SD = 6.17), and the average team size was 8.76 players (SD = 1.47). Netball is a ball sport, similar to basketball, which is played predominantly by females in a number of countries around the globe. Players are assigned certain positions that restrict their movements across different areas of the court, and the ball is moved up the court via passes to other players. Players cannot dribble or run with the ball, and the aim is to shoot the ball into the opposing goal more than the opponents. The combination of these two data sets provided an adequate sample size from which to conduct MCFA. This is not an uncommon approach when examining team-level variables (Carron et al., 2003).
Measure
The GEQ was administered as part of a larger study that involved additional measures. The items for each scale were presented in order, and all measures were counterbalanced across participants and teams. Participants were allowed 40 minutes to complete all measures and were assured of confidentiality of their responses. The GEQ consists of 18 items that can be divided into four subscales. These subscales are the ATG-S (5 items), the ATG-T (4 items), the GI-S (4 items), and the GI-T (5 items). The GI-S and GI-T measure an individual’s perceptions about group integration as a social unit and around group tasks, respectively. The ATG-S measures a participant’s interpersonal attraction to group social interactions, while the ATG-T measures feelings about personal involvement in relation to group productivity and objectives. Participants were asked to respond to each of the 18 items on a 9-point Likert-type scale (1 = strongly disagree, 9 = strongly agree). Reverse scoring was used for negatively worded items, and ratings were summed for each subscale, with higher scores indicative of greater cohesion.
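The scoring procedure described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the item-to-subscale mapping follows the note to Table 1, the reversal rule (1 becomes 9, 2 becomes 8, and so on) is the conventional one for a 9-point scale, and the set of reversed items is a caller-supplied assumption because the full list of negatively worded items is not given here.

```python
# Item-to-subscale mapping, as listed in the note to Table 1.
SUBSCALES = {
    "ATG-S": [1, 3, 5, 7, 9],
    "ATG-T": [2, 4, 6, 8],
    "GI-S": [11, 13, 15, 17],
    "GI-T": [10, 12, 14, 16, 18],
}

SCALE_MIN, SCALE_MAX = 1, 9  # 9-point Likert-type response scale


def reverse_score(rating):
    """Mirror a rating around the scale midpoint (1 <-> 9, 2 <-> 8, ...)."""
    return SCALE_MIN + SCALE_MAX - rating


def score_geq(responses, reversed_items):
    """Return summed subscale scores for one participant.

    responses: dict mapping item number (1-18) to a raw rating (1-9).
    reversed_items: set of negatively worded item numbers. This is an
        assumption supplied by the caller, not a list taken from the article.
    """
    scored = {
        item: reverse_score(rating) if item in reversed_items else rating
        for item, rating in responses.items()
    }
    return {
        name: sum(scored[item] for item in items)
        for name, items in SUBSCALES.items()
    }


# Hypothetical participant who rates every item 7; only item 13
# ("Our team members rarely party together") is reversed for illustration.
example = score_geq({item: 7 for item in range(1, 19)}, reversed_items={13})
```

In this hypothetical case the reversed item contributes 3 rather than 7 to GI-S, so the GI-S total is 24 while the other subscales are simple sums of the raw ratings.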
Data Analysis
Descriptive statistics
After negatively worded items on the GEQ were reverse-scored, means, standard deviations, skewness, and kurtosis statistics were calculated for each item of the GEQ (see Table 1). For each of the four subscales, means, standard deviations, and internal reliability statistics are reported in Table 2.
Table 1. Summary Descriptive Statistics for GEQ Items (N = 519).
Note. Individual Attractions to the Group–Social (ATG-S): Items 1, 3, 5, 7, 9; Individual Attractions to the Group–Task (ATG-T): Items 2, 4, 6, 8; Group Integration–Social (GI-S): Items 11, 13, 15, 17; Group Integration–Task (GI-T): Items 10, 12, 14, 16, 18. GEQ = Group Environment Questionnaire.
Table 2. Summary Descriptive and Reliability Statistics for GEQ Subscales (N = 519).
Note. Response scale: 9-point Likert-type scale (1 = strongly disagree, 9 = strongly agree). ATG-S = Individual Attractions to the Group–Social; ATG-T = Individual Attractions to the Group–Task; GI-S = Group Integration–Social; GI-T = Group Integration–Task; GEQ = Group Environment Questionnaire.
Analysis of variance
One-way ANOVAs were calculated for each of the four GEQ subscales to determine whether there was a significant amount of between-team variation. A significant F-value indicates that there is greater variance between teams in their responses compared with variation within team players and justifies aggregation of responses and analysis at the group level (Moritz & Watson, 1998).
ICCs were also calculated for each GEQ subscale to provide further information on the degree of systematic variance at the group level (Dyer, Hanges, & Hall, 2005). ICCs measure the proportion of between-group variance relative to total variance (Hox, 2002) and provide an indication of the level of homogeneity or agreement of responses within teams. ICC values range between 0 and 1, with larger values suggesting greater group effects within the data. Although there are no firmly established guidelines stating the minimum ICC value needed to justify multilevel analysis, most published MCFAs have reported ICCs greater than .10 (Dedrick & Greenbaum, 2011). However, multilevel analysis is generally seen to be appropriate when ICC values are greater than or equal to .05 (Dyer et al., 2005), as this provides evidence of sufficient group-level variance to justify multilevel modeling.
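As an illustration of how an ICC of this kind can be obtained from raw scores, the sketch below computes ICC(1) from the mean squares of a one-way ANOVA. The exact estimator used in this study is not stated, so the ICC(1) formula and the effective group size adjustment for unequal team sizes are assumptions, and the function name `icc1` and the example data are hypothetical.

```python
import numpy as np


def icc1(scores, groups):
    """ICC(1) from a one-way ANOVA with teams as the grouping factor.

    scores: 1-D sequence of individual subscale scores.
    groups: 1-D sequence of team labels, same length as scores.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_groups = scores.size, labels.size
    grand_mean = scores.mean()

    ss_between = 0.0
    ss_within = 0.0
    sizes = []
    for label in labels:
        team_scores = scores[groups == label]
        sizes.append(team_scores.size)
        ss_between += team_scores.size * (team_scores.mean() - grand_mean) ** 2
        ss_within += ((team_scores - team_scores.mean()) ** 2).sum()

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Effective group size; equals the common team size when teams are balanced.
    sizes = np.asarray(sizes, dtype=float)
    k = (n_total - (sizes ** 2).sum() / n_total) / (n_groups - 1)

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for three teams of three players each.
demo_icc = icc1([1, 2, 3, 4, 5, 6, 7, 8, 9], list("AAABBBCCC"))
```

For the hypothetical data the between-team mean square is 27 and the within-team mean square is 1, giving an ICC of 26/29, or about .90. Note that this estimator can fall slightly below 0 when teams differ less than chance would predict.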
CFA models
A CFA model was estimated using the 18 items of the GEQ (Model 1) and was conceptualized as four correlated factors (ATG-S, ATG-T, GI-S, and GI-T) based on the four-subscale structure of the GEQ noted by Carron et al. (1985, 1998) and Li and Harmer (1996).
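The size of this model can be checked with a short parameter count. The sketch assumes a simple structure (each item loads on one factor only), no correlated residuals, and one identification constraint per factor; these are assumptions about the specification rather than details reported by the authors. Under them, the count gives 129 degrees of freedom, which agrees with the chi-square degrees of freedom reported for Model 1 in the Results.

```python
def cfa_degrees_of_freedom(items_per_factor):
    """Degrees of freedom for a correlated-factors CFA with simple structure.

    Assumes each factor is identified by fixing its variance to 1 (fixing one
    loading per factor instead gives the same count) and that there are no
    cross-loadings or correlated residuals.
    """
    n_items = sum(items_per_factor)
    n_factors = len(items_per_factor)

    observed_moments = n_items * (n_items + 1) // 2  # unique variances/covariances
    loadings = n_items
    residual_variances = n_items
    factor_covariances = n_factors * (n_factors - 1) // 2

    free_parameters = loadings + residual_variances + factor_covariances
    return observed_moments - free_parameters


# ATG-S (5 items), ATG-T (4), GI-S (4), GI-T (5): 171 moments - 42 parameters.
df_model1 = cfa_degrees_of_freedom([5, 4, 4, 5])
```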
MCFA procedures
In MCFA, the total sample covariance matrix is partitioned into within-group and between-group components (Byrne, 2006). These covariance matrices are used to test the factor structure at the individual and group levels.
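The partitioning step can be illustrated with a short sketch. It computes a pooled within-group covariance matrix and a scaled between-group covariance matrix from raw item scores, which is one common way of describing the decomposition; it is not a description of how EQS carries out its estimation. The function name, the divisors, and the simulated data are assumptions made for illustration.

```python
import numpy as np


def partition_covariance(data, groups):
    """Split item covariation into within-team and between-team parts.

    data: (n_players, n_items) array of item scores.
    groups: length n_players sequence of team labels.
    Returns (pooled within-group covariance, scaled between-group covariance).
    """
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_items = data.shape
    n_groups = labels.size
    grand_mean = data.mean(axis=0)

    sscp_within = np.zeros((n_items, n_items))
    sscp_between = np.zeros((n_items, n_items))
    for label in labels:
        rows = data[groups == label]
        team_mean = rows.mean(axis=0)
        centered = rows - team_mean  # deviations of players from their team mean
        sscp_within += centered.T @ centered
        deviation = (team_mean - grand_mean)[:, None]  # team mean vs. grand mean
        sscp_between += rows.shape[0] * (deviation @ deviation.T)

    s_pw = sscp_within / (n_total - n_groups)
    s_b = sscp_between / (n_groups - 1)
    return s_pw, s_b


# Simulated data (not from the study): 6 teams of 5 players, 3 items,
# with a team-level component added to each player's scores.
rng = np.random.default_rng(0)
teams = np.repeat(np.arange(6), 5)
items = rng.normal(size=(30, 3)) + rng.normal(size=(6, 3))[teams]
s_pw, s_b = partition_covariance(items, teams)
```

The two sums of squares and cross-products add up exactly to the total, so no covariation is lost in the split; the within-group matrix then informs the individual-level model and the between-group matrix the team-level model.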
The multilevel model (Model 2, see Figure 1) was represented by identical four-factor CFA models at between- and within-group levels, with each of the four GEQ subscales representing a factor. This model was evaluated by determining, at each level, whether the factor structure was clearly defined and examining goodness-of-fit indices for evidence of an adequate model fit.

Figure 1. Model 2, a two-level, correlated four-factor model of the GEQ.
Model analyses were carried out using EQS version 6.1 (Bentler, 2007), which allows parameter estimation using the maximum likelihood (ML) analytical procedure. The ML approach calls for large sample sizes, particularly at the highest level of the hierarchical structure (Level 2), to ensure that there is enough variability at that level and that convergence of the model is possible. To gain better estimates of the standard errors, Maas and Hox (2004) suggest that at least 50 groups are needed. EQS handles all estimation by means of the EM algorithm and is one of the few current structural equation modeling (SEM) programs that can compute ML estimation with unbalanced group sizes (Byrne, 2006).
Evaluation of model fit
Several goodness-of-fit indices were used to evaluate and compare models. The chi-square statistic was reported, although not overly emphasized, due to a tendency to be inflated by the number of imposed restrictions and sample size, and associated problems with power (Marsh, Balla, & McDonald, 1988). The comparative fit index (CFI), an incremental fit index, and the root mean square error of approximation (RMSEA), an absolute fit index, were also used. The CFI has a range of 0 to 1, and values between .90 and .94 are typically seen to indicate a good fit, while values at or above .95 are seen to reflect an excellent fit to the data (McDonald & Marsh, 1990). For the RMSEA, a close fit is indicated by a score of .05 or less, a reasonable fit by a score of .08 or less, and a mediocre fit by values between .08 and .10 (Browne & Cudeck, 1993). Fit indices are more sensitive to misspecifications at the individual level and less sensitive to misspecifications at the group level (Hox, 2002). Thus, other key indicators such as standardized factor loadings and factor correlations were used to evaluate the robustness and soundness of models, in conjunction with the goodness-of-fit indices described above. It is worth remembering that fit indices are merely guidelines, and Bollen (1989) suggests that when testing models in an emerging field, lower fit might be acceptable. In the case of MCFA with the GEQ, this might be viewed as acceptable progress in testing the structure across the two levels of data (player and team), given that there are very few studies to establish foundations for comparison.
Results
Descriptive Statistics
Descriptive statistics for the GEQ items are summarized in Table 1. Item means ranged from 4.57 (SD = 1.99) for GEQ15 (“Our team would like to spend time together in the off season”) to 6.56 (SD = 2.31) for GEQ10 (“Our team is united in trying to reach its goals for performance”). Responses showed an approximately normal distribution with skewness statistics ranging from −0.97 to 0.10 and kurtosis values ranging from 1.57 to 0.20 (see Table 1). Overall, there was minimal evidence of any substantial univariate skewness or kurtosis. Furthermore, ML statistics are fairly robust to the effects of any outliers or departures from normality.
Summary statistics were also calculated for each GEQ subscale, as shown in Table 2. ATG-S (n = 5, M = 25.72, SD = 8.55) had a Cronbach’s alpha of .58, while ATG-T (n = 4, M = 22.53, SD = 10.74) had a Cronbach’s alpha of .91. The GI-S subscale (n = 4, M = 20.49, SD = 5.92) had a Cronbach’s alpha of .49, while the GI-T subscale (n = 5, M = 29.99, SD = 7.71) had a Cronbach’s alpha of .66. Overall, internal reliability estimates for the GEQ subscales were in the low to moderate range, with the exception of high reliability for ATG-T and were comparable with values found in other studies (e.g., Carron et al., 1998; Carron & Ramsay, 1994; Li & Harmer, 1996; Westre & Weiss, 1991).
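For reference, Cronbach's alpha for a subscale can be computed from the item variances and the variance of the summed score, as in the sketch below. This is the standard single-level formula; the function name and the tiny example data set are hypothetical, and the calculation ignores the nesting of players within teams.

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores: (n_respondents, n_items) array of reverse-scored item ratings.
    """
    x = np.asarray(item_scores, dtype=float)
    n_items = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # sample variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)


# Hypothetical two-item scale answered by three respondents whose ratings
# rise in step with each other, so the items are perfectly consistent.
demo_alpha = cronbach_alpha([[1, 2], [2, 3], [3, 4]])
```

In the hypothetical example each item has variance 1 and the summed score has variance 4, so alpha equals 2 × (1 − 2/4) = 1.0.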
Analysis of Variance
A one-way ANOVA was conducted to gain information to calculate the ICC values. As shown in Table 3, F values were significant at the .001 level for ATG-S, F(55, 463) = 9.02, p < .001; ATG-T, F(55, 463) = 28.79, p < .001; GI-S, F(55, 463) = 3.31, p < .001; and GI-T, F(55, 463) = 4.12, p < .001. The statistically significant F values indicate that responses on each of the four subscales differed significantly across teams and were relatively homogeneous within teams. In other words, team members agreed with each other in terms of their responses. This is further indicated by the ICC values. Table 3 shows the ICC values for ATG-S (.46), ATG-T (.75), GI-S (.20), and GI-T (.25). Values were substantially greater than .10, suggesting that there was sufficient between-group variability to justify multilevel analysis.
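The link between the reported F values and ICCs can be illustrated with the ICC(1) formula expressed in terms of F, namely (F − 1) / (F + k − 1), where k is the team size. The article does not state which formula was used, so this is an assumption, as is the use of the simple average team size (519 players / 56 teams) for k. Under those assumptions the sketch reproduces the four ICC values given in the text.

```python
N_PLAYERS, N_TEAMS = 519, 56
MEAN_TEAM_SIZE = N_PLAYERS / N_TEAMS  # assumed value of k, about 9.27

# F values reported in the text for each subscale, F(55, 463).
F_VALUES = {"ATG-S": 9.02, "ATG-T": 28.79, "GI-S": 3.31, "GI-T": 4.12}


def icc_from_f(f_value, k):
    """ICC(1) written in terms of the one-way ANOVA F ratio and group size k."""
    return (f_value - 1.0) / (f_value + k - 1.0)


icc_values = {
    name: round(icc_from_f(f_value, MEAN_TEAM_SIZE), 2)
    for name, f_value in F_VALUES.items()
}
```

The rounded results are .46 (ATG-S), .75 (ATG-T), .20 (GI-S), and .25 (GI-T).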
Table 3. One-Way ANOVA and Intraclass Correlations for GEQ Subscales.
Note. ATG-S = Individual Attractions to the Group–Social; ATG-T = Individual Attractions to the Group–Task; GI-S = Group Integration–Social; GI-T = Group Integration–Task; GEQ = Group Environment Questionnaire; ICC = intraclass correlation coefficient.
p < .001.
CFA Models
Review of the goodness-of-fit statistics in Table 4 shows a mediocre fit for Model 1. The chi-square value for Model 1 was significant, χ2(129) = 890.11, p < .001, and indicated room for improvement in fit. Similarly, the goodness-of-fit statistics suggested a mediocre fit (CFI = .81, RMSEA = .11). Cronbach’s alpha was .85 (see Table 4), indicating moderate internal reliability for Model 1. The standardized factor loadings, as shown in Table 5, ranged from .09 to .86, with a moderate average of .48. Interfactor correlations for Model 1 (see Table 6) ranged from .48 (between ATG-T and GI-S) to .92 (between ATG-S and ATG-T). While the fit is on the lower side, it serves as a basis on which to examine the multilevel structure of the measure and is consistent with Carron et al.’s (1985) conceptualization of the GEQ.
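The RMSEA for the single-level model can be recovered from the reported chi-square, its degrees of freedom, and the sample size. The sketch uses the common point-estimate formula with N − 1 in the denominator, which is an assumption about the convention used; with these inputs it returns approximately .107, consistent with the reported value of .11. It covers Model 1 only, and no assumption is made about how fit indices were computed for the multilevel models.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA for a single-level model.

    Uses sqrt(max(chi_square - df, 0) / (df * (n - 1))); some programs use n
    instead of n - 1, which changes the result only slightly at this sample size.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported for Model 1: chi-square(129) = 890.11, N = 519.
rmsea_model1 = rmsea(890.11, 129, 519)
```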
Table 4. Fit Indices and Internal Reliability for CFA and MCFA Models.
Note. GEQ = Group Environment Questionnaire; CFA = confirmatory factor analysis; MCFA = multilevel confirmatory factor analysis; CFI = comparative fit index; RMSEA = root mean square error of approximation; W = Within-group level; B = between-group level.
Table 5. Standardized Factor Loadings for CFA and MCFA Models.
Note. Factor loadings < .40 are in boldface. CFA = confirmatory factor analysis; MCFA = multilevel confirmatory factor analysis; GEQ = Group Environment Questionnaire.
Table 6. Factor Correlations for MCFA Models.
Note. All factor correlations were significant (p < .05). MCFA = multilevel confirmatory factor analysis.
MCFA
Review of the goodness-of-fit statistics (see Table 4) shows a notably better fit for Model 2 in comparison with Model 1 when analyzing both levels of the data. The chi-square value was large and significant, χ2(272) = 2,564.11, p < .001. The CFI (.85) and RMSEA (.08) suggest that the multilevel model had mediocre fit to the data. Although fit indices do not indicate an excellent fit, they represented a reasonable fit (Browne & Cudeck, 1993; McDonald & Marsh, 1990) and an improvement on the initial single-level CFA models. Cronbach’s alpha of internal reliability for Model 2 was moderate to high, especially at the between-groups level (W = .76, B = .93), as seen in Table 4.
Standardized factor loadings for Model 2 (see Table 5) were noticeably higher at the between-group level, ranging from −.53 to .99 with an average of .66, in comparison with the within-group level, where values ranged from .09 to .50 with an average of .36. Higher values at the between-group level indicate that the multilevel model has a superior fit at this level in comparison with the within-group level. This supports the use of multilevel analysis and the proposition that cohesion is a group-level construct. Interfactor correlations for Model 2 (see Table 6) were all significant (p < .05) and showed a range of .18 (between ATG-S and GI-T) to .78 (between ATG-S and GI-S) at the within-group level, and .91 (ATG-T and GI-S) to 1.00 (between ATG-S and GI-S, ATG-S and GI-T, and GI-S and GI-T) at the between-group level. Correlations between factors were also substantially higher at the between-group level, reflecting a group-level construct.
One item in Model 2, GEQ13 (“Our team members rarely party together”), was found to have a negative loading at the between-group level (B = −.53) and a positive loading at the individual level (W = .47). The mix of moderate positive and negative factor loadings indicates that although this item fits well at the individual level, it does not work well at the group level. However, this item does not stand out as a rogue item, as it does not display any extreme or abnormal characteristics: item mean, standard deviation, skewness, and kurtosis statistics are within the normal range of values (see Table 1). It is possible that this item may have been adversely affected by the ambiguity of the term party in New Zealand culture: Does it include social drinks, celebrations organized by a sports club, or independent social gatherings? The GEQ was designed in the context of North American culture and the negative factor loading at the group level may, in part, reflect different interpretations of the term party, within teams and across different teams. As a result of a negative factor loading at the group level, item GEQ13 was removed to create an improved multilevel model, Model 3, which was thus an exploratory post hoc analysis.
Goodness-of-fit indices for Model 3 (CFI = .85, RMSEA = .08) were identical to those found for Model 2 and similarly indicated a mediocre fit to the data (see Table 4). Cronbach’s alpha indicated high internal reliability with a modest improvement from Model 2 at the between-group level (α = .95) and a slight decline at the within-group level (α = .74). Standardized factor loadings (see Table 5) were low to moderate at the within-group level, ranging from .10 to .52 with an average of .35. At the between-group level, standardized factor loadings were substantially higher and had an average of .72 and a range of .16 to .91. This provides strong evidence for cohesion as a group-level construct. Standardized factor loadings were very similar to Model 2, with only a slight improvement in the average between-group value and a slight decrease for the average within-group value. All factor loadings in Model 3 were positive for between- and within-group levels, indicating that all remaining items fit (to varying degrees) at both levels. Interfactor correlations for Model 3 (see Table 6) were all significant (p < .05) and ranged from .18 (ATG-S and GI-T) to .99 (ATG-S and GI-S) at the within-group level and from .77 (ATG-S and GI-S) to .99 (ATG-S and ATG-T) at the between-group level.
Discussion
The present study assessed the multilevel structure of group cohesion, as measured by the GEQ. Although fit indices for the multilevel models (Models 2 and 3) did not indicate excellent fit to the data, they represented an improvement on the initial single-level CFA models and acceptable progress in this emerging field of research (Bollen, 1989).
An examination of ICC values for each GEQ subscale revealed a high proportion of variance at the group level, confirming the appropriateness of multilevel modeling. ICC values for ATG-S (.46) and ATG-T (.75) were particularly high in comparison with previous studies, where ATG-S values have generally ranged from .13 (Burke et al., 2005) to .19 (Senecal, Loughead, & Bloom, 2008) and ATG-T values have ranged from .17 (Carron, Bray, & Eys, 2002) to .33 (Patterson, Carron, & Loughead, 2005). The group integration scales, GI-S and GI-T, had ICC values (.20 and .25, respectively) that were typical of values found in previous literature: GI-S values have ranged from .16 (Heuze, Raimbault, & Fontayne, 2006) to .44 (Senecal et al., 2008), and GI-T values have ranged from .12 (Senecal et al., 2008) to .33 (Burke et al., 2005). It is possible that the notably high ICC values for the individual attractions to the group scales may, in part, reflect the nature of the sample. Traditionally, females are seen as more interpersonal and relational than males, and an all-female sample may have led to higher group consensus on personal involvement and attachment to the group as a social (ATG-S) and task-focused (ATG-T) unit. A future comparison study using an all-male sample could help to test this assumption.
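As a rough illustration of how an ICC of this kind can be obtained, the following Python sketch computes ICC(1) from a one-way random-effects ANOVA on scores grouped by team. The function name icc1, the input layout, and the adjustment for unequal team sizes are illustrative assumptions. The sketch is not a restatement of the estimator used in this study.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA (illustrative sketch).

    groups: list of teams, each a list of subscale scores (one per player).
    """
    n_groups = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]

    # Between-team and within-team sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2
                     for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Assumption: adjusted average team size, which equals the common
    # team size when all teams have the same number of players.
    k = (n_total - sum(n * n for n in sizes) / n_total) / (n_groups - 1)

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

For example, icc1([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) gives 26/29, or about .90, because almost all of the variance in these toy scores lies between teams.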
Internal reliability estimates for each of the four subscales ranged from low to high, although it is important to note that these values did not account for the grouping structure of the data. Our values were consistent with previous findings that have shown similar levels of internal reliability for the GEQ subscales (Carron et al., 1998) and typically lower values for the social scales ATG-S and GI-S (Westre & Weiss, 1991).
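To make the caveat about grouping structure concrete, the sketch below computes a conventional single-level Cronbach's alpha. It pools all respondents and ignores team membership, which is exactly the simplification noted above. The function name cronbach_alpha and the row-per-respondent layout are assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Single-level Cronbach's alpha (illustrative sketch).

    rows: list of respondents, each a list of item scores for one
    subscale. Team membership is ignored, so within-team and
    between-team variance are not separated.
    """
    k = len(rows[0])  # number of items in the subscale
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

With two perfectly correlated items, for instance cronbach_alpha([[1, 2], [2, 3], [3, 4]]), the function returns 1.0.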
Low but acceptable fit statistics for multilevel models provided evidence for a four-factor structure across within- and between-group components. One GEQ item in the initial multilevel model (Model 2) displayed a negative group-level factor loading and was removed from the final MCFA model (Model 3).
Taken together, these results provide support for the multilevel structure of the GEQ and its four subscales. The superior fit of the final multilevel model in comparison with the fit of the single-level CFA models provides compelling evidence that cohesion is indeed a group-level construct and that the hierarchical nature of the data needs to be taken into account when analyzing group data using the GEQ. Similarly, the higher factor loadings at the team level, compared with the player level, support the depiction of cohesion as a group-level construct and suggest that future research using the GEQ might consider emphasizing interpretation at the group level, rather than the individual level, when analyses at both levels are carried out.
Furthermore, these findings and recommendations are underscored by the strength of this study’s sample size. The sample, comprising 519 players from 56 netball teams, is large and is above the range of 30 to 50 group-level units recommended for MCFA by Dedrick and Greenbaum (2011). In addition, the use of two samples with participants ranging from semi-elite to elite adds to the generalizability of the results. This approach is in line with that of Carron et al. (2003), whose use of a range of different sample sets similarly added to the generalizability of their results.
A few potential limitations should be considered when interpreting the present study’s findings. Our sample of female netball players was very specific in nature. While a homogeneous sample can also be considered a strength as it reduces variability, it is important that future MCFA studies investigating cohesion include players of different genders and a number of different team sports, to increase the generalizability of findings. Future research could also investigate cohesion in other types of groups, for example work teams, to examine the contextual nature of the GEQ from a multilevel perspective.
More generally, it is important that researchers continue to use MCFA to investigate the multilevel structural validity of instruments similar to the GEQ. MCFA can be used to systematically test whether the structure of a construct differs across levels of analysis and can provide a more accurate factor analysis of hierarchically structured data. Such an approach leads to a more realistic understanding of the cross-level effects of individuals nested in teams, an understanding that is currently lacking in cohesion research.
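One common starting point for a multilevel factor analysis is to split the item covariance matrix into a pooled within-group part and a between-group part, so that a factor model can be examined at each level. The Python sketch below shows that decomposition under stated assumptions: the function name within_between_cov and the nested-list data layout are hypothetical, and the sketch does not reproduce the estimation procedure or software used in this study.

```python
def within_between_cov(groups):
    """Pooled within-group and scaled between-group covariance matrices.

    groups: list of teams; each team is a list of respondents; each
    respondent is a list of p item scores.
    Returns (s_pw, s_b) as p x p nested lists (illustrative sketch).
    """
    p = len(groups[0][0])
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)

    # Grand mean of each item across all respondents.
    grand = [sum(r[i] for g in groups for r in g) / n_total
             for i in range(p)]

    s_pw = [[0.0] * p for _ in range(p)]
    s_b = [[0.0] * p for _ in range(p)]

    for g in groups:
        means = [sum(r[i] for r in g) / len(g) for i in range(p)]
        for a in range(p):
            for b in range(p):
                # Team-mean deviations from the grand mean, weighted
                # by team size.
                s_b[a][b] += (len(g) * (means[a] - grand[a])
                              * (means[b] - grand[b]))
                # Respondent deviations from their own team mean.
                s_pw[a][b] += sum((r[a] - means[a]) * (r[b] - means[b])
                                  for r in g)

    s_pw = [[v / (n_total - n_groups) for v in row] for row in s_pw]
    s_b = [[v / (n_groups - 1) for v in row] for row in s_b]
    return s_pw, s_b
```

In this sketch, s_pw reflects only how players differ from their own team's mean, whereas s_b reflects how team means differ from one another. Comparing factor structures fitted to the two matrices is one way to ask whether a construct behaves differently at the individual and group levels.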
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
