Abstract
In working to understand the predictors of experiential learning in teams, researchers have focused on one variable more than any other—psychological safety. In virtually all of this work, psychological safety is viewed as a direct predictor of team learning and, through team learning, of team performance. We suggest that this work has overlooked the critical effect the nature of the task environment has on the capacity of psychological safety to have beneficial effects. To investigate this, we conduct a comprehensive meta-analysis of studies examining the relationships between psychological safety, team learning, and team performance. We find that psychological safety is more strongly associated with learning and performance in studies conducted in knowledge-intensive task settings, that is, settings that involve complexity, creativity, and sensemaking. The results of this study suggest that psychological safety may be insufficient to stimulate learning in groups where the task environment does not require learning.
In a world of rapidly evolving technologies and intense global competition, continuous learning is a critical organizational performance requirement. Moreover, the specialized and interdependent nature of modern work, combined with the limits of individual problem solving, suggests that much of this continuous learning must and should happen in the context of groups and teams (Paulus & Nijstad, 2003; Sawyer, 2008). Recognizing these two facts, researchers have devoted considerable attention to understanding the dynamics of learning in groups. At least two patterns emerge from a review of this literature. First, groups vary considerably in the extent to which they are engaged in the continuous learning process (Argote, 2012; Argote & Epple, 1990). Second, past attempts to account for this variance have focused on one variable more than any other: psychological safety, or “a shared belief held by members of the team that the team is safe for interpersonal risk taking” (Anderson & West, 1998; Edmondson, 1999, p. 354; Schein & Bennis, 1965).
The theoretical argument underlying this broad interest in psychological safety is fairly intuitive. Psychological safety creates a context in which team members feel safe to engage in the risky behaviors that promote experiential learning—experimentation, asking questions, and flagging/discussing errors—and these behaviors facilitate higher performance. In other words, psychological safety is presumed to enhance the expectancy of engaging in experiential learning behaviors by removing barriers of fear, uncertainty, and self-defensiveness that can impede those behaviors. Greater psychological safety in a team should, therefore, enhance both learning and performance.
While this argument is compelling and the preponderance of the empirical evidence seems to support it, our review of the literature left us with two concerns. First, there is considerable variability in findings across studies. Second, almost no attention has been paid to potential moderators that might explain that variability. Of the 20 empirical papers we found that hypothesize relationships between psychological safety and team learning, 17 position psychological safety as an unmoderated antecedent, 1 positions it as a moderated antecedent, and 2 position it as a moderator. The 19 empirical papers with hypotheses involving psychological safety and team performance are similarly consistent: 17 position psychological safety as an unmoderated antecedent, none propose it as a moderated antecedent, and 2 propose that it is a moderator. In sum, the extant empirical literature, including emerging work, has generally ignored the potential role of moderators.
This paper begins to address this oversight by highlighting one key moderator that can help to explain variance in past research on the relationship between psychological safety, team learning, and team performance: the nature of the task. More specifically, we will suggest that the magnitude of the relationship between psychological safety and team learning or performance is dependent on the knowledge intensity of team tasks. This proposition is grounded in contingency theories of organizational behavior (e.g., Lawrence & Lorsch, 1967) which suggest that a team’s social fabric becomes more critical for performance when tasks require greater social interaction and collective problem-solving, as with knowledge-intensive work (Gladstein, 1984; Stewart & Barrick, 2000; Tushman, 1978; Vashdi, Bamberger, & Erez, 2013). In such cases, social norms that facilitate productive social interaction and initiative-taking—like psychological safety—should become particularly important. It follows that psychological safety should become less important for tasks where problem-solving and information-sharing are less central to success. We examine this basic proposition by means of a comprehensive meta-analysis of extant research on the relationships between psychological safety, learning, and performance in teams, and explicitly account for the context in which these studies were conducted.
In pursuing this research agenda, we respond to past calls to contextualize research on learning (Edmondson, Dillon, & Roloff, 2007), teams (Cronin, Weingart, & Todorova, 2011), and management in general (Cronin & Bendersky, 2012). Contextualizing our research enables us to provide more guidance to both scholars and practitioners who wish to understand the range of settings within which a particular effect or prescription will—and, importantly, will not—hold. A key goal of this paper, therefore, is to examine whether the strong prescriptions about the positive and universal learning and performance benefits of psychological safety that have been advanced in the literature need to be tempered by acknowledging that those effects may be limited to teams that perform knowledge-intensive work.
The critical role of knowledge intensity
When Schein and Bennis (1965) first introduced psychological safety nearly 50 years ago, they suggested that it would help organizations with what they saw as the two most pressing business needs of the time: (a) improving employee knowledge utilization through interspecialist communication and (b) becoming more adaptive. They proposed that psychological safety would help with these interrelated issues because it would mitigate the primary reason that employees do not engage in these behaviors: fear of expressing different viewpoints.
Though their ideas were subsequently taught to many practitioners, psychological safety did not receive much attention from academics until it was incorporated into research on the experiential learning process. Experiential learning is an iterative cycle of acting/experimenting, reflecting on the consequences of past action in order to revise and update cognitive representations, and planning new experiments or courses of action. At the group level, these learning activities are intrinsically social in that group members must take one another’s past and anticipated future actions and intentions into account as they negotiate a shared understanding of how to achieve collective goals. Experiential learning at the group level therefore requires that group members openly acknowledge and evaluate past actions (including mistakes), express and discuss divergent views, speak up to correct errors or misinformation, and try new and unproven ideas in order to discover what works (for reviews, see Edmondson et al., 2007; Wilson, Goodman, & Cronin, 2007). These activities are interpersonally risky, since they open the door for criticism, judgment, sanction, and disapproval (Detert & Edmondson, 2011; Edmondson, 1999).
Recognizing these risks and vulnerabilities, West (Anderson & West, 1998; West, 1990) and Edmondson (Edmondson, 1999) revitalized the concept of psychological safety by conducting research that applied psychological safety to the interpersonal problems that are inherent in the experiential learning process. Their basic premise is that psychological safety reduces the perceived costs of engaging in risky learning activities like those described before by reassuring the actor that he or she will not be “hurt, embarrassed, or criticized” for doing so (Edmondson, 2003, p. 260). As a result, “the benefits of speaking up are likely to be given more weight” relative to the costs (Edmondson, 1999, p. 355). These studies ignited an impressive volume of research on the relationship between psychological safety, team learning, and team performance.
In pursuing this research agenda, however, researchers have tended to ignore the role of context, and have implicitly assumed that psychological safety will be universally beneficial for team functioning (Brueller & Carmeli, 2011; Nembhard & Edmondson, 2011; Schaubroeck, Lam, & Peng, 2011). But research on the contingent view of organizational behavior (Lawrence & Lorsch, 1967; Stewart & Barrick, 2000; Vashdi et al., 2013) provides a more nuanced view. This literature proposes that social norms are more important when teams work on knowledge-intensive tasks because such tasks require highly variable interpersonal activities like planning, discussing options, and ultimately coming to agreements (Goodman, 1986; Herold, 1978). As a result, they are more impacted by social norms because such shared understandings function as critical guideposts that help members navigate these exchanges.
This task-based contingency implies that psychological safety—a key social norm—should have more impact on team behaviors and performance when teams work on more knowledge-intensive tasks—that is, tasks that require applying, interpreting, and recombining members’ specialized knowledge (Gladstein, 1984; Stewart & Barrick, 2000). Moreover, it is precisely in such knowledge-intensive tasks where experiential learning—that is, experimenting and sharing perspectives, reflecting on past actions—becomes particularly important for group performance and, therefore, where psychological safety becomes such an important enabling condition. Put differently, psychological safety is unlikely to stimulate learning in settings where learning simply isn’t critical for team success. But in settings where learning is critical—knowledge-intensive tasks—psychological safety becomes an essential performance requirement.
It follows that some teams have little to gain from engaging in the experiential learning process because they do not need to process much information to figure out the best course of action (Gladstein, 1984; Stewart & Barrick, 2000; Tushman, 1978); they simply need to execute on others’ decisions (McGrath, 1984). When teams are working on cognitively simple, isolated, and routine tasks where the ends and means of production are clear, they gain little from the experiential learning process, and their members have little reason to engage in it. As a result, psychological safety’s reduction of learning-related costs should be of little consequence for team learning or performance.
In short, the effects of psychological safety on team learning and performance should be larger in teams that perform more knowledge-intensive tasks, which we define (following past work) as tasks that have three characteristics. First, they are more complex in that they require a variety of different skills (Baer, Oldham, & Cummings, 2003). Second, they require creativity or the generation of novel solutions and approaches (Torrance, 1966). And third, they require sensemaking or the capacity to organize and assign relative significance to ambiguous information (Weick, Sutcliffe, & Obstfeld, 2005). These three characteristics lie at the heart of past conceptualizations of what researchers mean when they say that particular work is knowledge-intensive (Cross & Cummings, 2004, p. 929; De Dreu & Weingart, 2003; Galbraith, 1973; Gardner, Staats, & Gino, 2012; Haas, 2010, p. 993; Tushman, 1978).
In sum, we propose that the relationship between psychological safety and both team learning and team performance will be contingent on the knowledge intensity of team tasks, defined in terms of task complexity, creativity requirements, and sensemaking requirements. In the following section, we test this proposition using a meta-analysis. This methodology is well suited for our purpose because it allows us to test our theory using a large sample of teams that are responsible for a variety of tasks, and to explicitly and directly examine whether documented relationships between psychological safety and both team learning and performance are contingent on task characteristics.
Data and methods
Identification and coding of studies
We used a variety of approaches to ensure that we included as many relevant studies as possible in our meta-analysis. First, we conducted keyword searches using relevant terms such as “psychological safety,” “participative safety,” “participation safety,” “safety climate,” and “intrateam trust” using a variety of databases (e.g., Business Source Complete, ISI Web of Knowledge, PsycInfo). Second, we reviewed all papers that cite those source papers where survey measures for psychological and participation safety were introduced (Anderson & West, 1998; Edmondson, 1999). Third, we searched the ProQuest Dissertation database and the last 10 years of Academy of Management (AOM) and Society for Industrial and Organizational Psychology (SIOP) conference proceedings to identify relevant, unpublished papers. Fourth, we contacted authors who had published two or more studies on psychological safety and asked for any relevant unpublished work. Fifth, we checked the reference lists from recent meta-analyses and review chapters that focus on psychological safety or innovation/learning (Frazier, Fainshmidt, Klinger, Vracheva, & Pezeshkan, 2012; Hammond, Neff, Farr, Schwall, & Zhao, 2011; Nembhard & Edmondson, 2011; Tumasjan, Strobel, Portele, & Welpe, 2012). We also posted a request to the listserv of the Organizational Behavior division of the Academy of Management. In total, 1,551 papers were identified.
Criteria for inclusion
We used inclusion criteria similar to those used in other team-level meta-analyses (Balkundi & Harrison, 2006; Hong, Liao, Hu, & Jiang, 2013). First, we dropped all theoretical or qualitative papers (651 of the 1,551 papers). Second, we included only studies of teams. Third, to prevent double counting, we included only studies with original data. Fourth, a study had to measure psychological safety and team learning and/or team performance. To qualify as a measure of psychological safety, a measure had to assess a shared belief among team members that the team climate permitted risk-taking and openness without fear of reprimand or reprisal (see Edmondson, 1999). A team learning measure had to explicitly assess one or more aspects of the experiential learning process—experimenting, reflecting on experience, and/or updating representations. And a team performance measure had to assess performance using direct efficiency/effectiveness measures or supervisor performance ratings; we did not include self-reported performance measures.
Using these criteria, eligible papers were identified by one of the authors. As a reliability check, an independent coder applied the inclusion criteria to 50 randomly selected papers (after removing qualitative and theoretical papers). The author and coder agreed on all but one paper; in that case, the coder confused regression and correlation results. In the end, we were left with a final sample of 51 studies (48 papers) that met our inclusion criteria. These studies included 17,944 people on 3,687 teams.
Extracting correlations
The effects of central interest are the correlations between psychological safety and team learning, psychological safety and team performance, and team learning and team performance. All of the studies in our sample either reported these associations as correlations or we were able to obtain correlations from a study author. We used two coders—one author and one research assistant—to independently identify and extract the needed information from each study. The two coders disagreed on two correlations (96% agreement). These two disagreements came from two studies in the same paper (Mathisen, Einarsen, Jørstad, & Brønnick, 2004). In both cases, one coder coded customer satisfaction as a performance measure but the other coder did not. The disagreement was resolved through discussion and the paper is included in our analysis.
Coding the task environment
In order to code the task environment for the teams in each study, we used two government-sponsored occupational databases: the Dictionary of Occupational Titles (DOT) and the Occupational Information Network (O*NET). Both databases have been used extensively in past academic work to quantify the characteristics of different jobs and work settings (DOT: Shalley, Gilson, & Blum, 2000; Tierney & Farmer, 2002; Xie & Johns, 1995; O*NET: Chiaburu & Harrison, 2008; Colquitt, Lepine, Zapata, & Wild, 2011).
The DOT and O*NET databases were established to describe a comprehensive listing of jobs thoroughly, in order to help people find jobs they will be qualified for and happy doing. In keeping with this mission, both databases contain a wide variety of information about jobs. The DOT quantifies the physical and mental skills required for a job and provides brief qualitative descriptions. O*NET quantifies required skills as well as other aspects of jobs, including the extent to which job holders must engage in various types of behaviors and activities and the types of knowledge they must have. Both databases have a similar structure in which each job is given a code and that code is linked to the descriptors. (See Cain and Treiman [1981] for more detailed information about the DOT and Peterson, Mumford, Borman, Richard, and Fleishman [1999] for more information about O*NET.)
Two independent research assistants linked the jobs performed by the team members in each study to the corresponding O*NET and DOT occupational classifications, thereby connecting each job to its task descriptors. Studies that were not conducted with people at their place of work were excluded from this coding. Since the match between tasks and codes is often imperfect and context descriptions in studies can be brief and vague, coders selected the three occupational classifications that most closely matched the tasks described in a given study. If a particular study sampled teams that performed different tasks, we assigned codes for each of these different tasks and then averaged them to generate a single measure for the study. In doing so, we used a weighted average when the necessary information was available. If there was not enough information in a study for the coders to confidently and accurately assign a code, we contacted the study authors for more information. Our coders were ultimately able to assign codes to all but two of the field studies. One study was excluded because the coders could not confidently assign a code given the available information and one was excluded because the jobs were not included in O*NET or DOT. There were no significant differences between included and excluded field studies in the magnitude of effect sizes (p > .10 in all ANOVAs). Interrater agreement was acceptable (Landis & Koch, 1977) in assigning DOT (Cohen’s kappa = .78) and O*NET codes (Cohen’s kappa = .76). After coding all papers, coders met to discuss and resolve discrepancies.
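As an illustration of the agreement statistic reported here, the following minimal Python sketch computes Cohen's kappa for two coders who each assign one occupational code per study. The code labels and the function name `cohens_kappa` are hypothetical and chosen only for illustration; this is not the software or the data the coders actually used.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two raters assigning nominal codes to the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both raters must code the same non-empty set of items")
    n = len(codes_a)
    # Observed agreement: share of items that received identical codes.
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over codes.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical occupational codes for six studies (illustration only).
rater_1 = ["job_A", "job_B", "job_A", "job_C", "job_B", "job_D"]
rater_2 = ["job_A", "job_B", "job_E", "job_C", "job_B", "job_D"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the raters agree on five of six studies, and kappa discounts that raw agreement by the agreement expected from the marginal code frequencies alone.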
Once each study was assigned DOT and O*NET codes, we used DOT and O*NET data to assess the extent to which the tasks performed by the teams in each study were knowledge intensive (creativity requirements, sensemaking requirements, and complexity). Creativity requirements were assessed using the two O*NET dimensions identified by Tierney and Farmer (2011) as particularly indicative of the creativity demands of a job: fluency and originality (see Torrance, 1966). These two dimensions were averaged to create a single measure of creativity requirements (α = .98). Normed scores (i.e., scores divided by the maximum possible score) ranged from .22 to .73 with a standard deviation of .09.
Sensemaking requirements were also assessed using two dimensions from O*NET: speed of closure and flexibility of closure. These dimensions measure the extent to which employees need to “quickly make sense of, combine, and organize information into meaningful patterns” and “identify or detect a known pattern … in other distracting material” (Cognitive Abilities, O*NET, 2014). A measure of sensemaking requirements was computed by averaging these two dimensions (α = .87) and normed scores ranged from .29 to .61 with a standard deviation of .06.
Complexity was assessed using job complexity measures from the DOT. Past research has frequently used job complexity measures from the DOT to measure the complexity of a job (Baer et al., 2003; Oldham & Cummings, 1996; Tierney & Farmer, 2002, 2011). Moreover, DOT complexity scores have demonstrated convergent validity with measures obtained via the Job Diagnostic Survey (Spector & Jex, 1991). Following Xie and Johns (1995), we used the three dimensions of general education development within the DOT to measure complexity: reasoning skills, mathematical skills, and language skills. We averaged these three dimensions to form one measure of complexity for the studies in our sample (α = .93). Normed scores ranged from .24 to .96 (M = 0.7, SD = .13).
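The scale construction described in the preceding paragraphs (averaging dimensions, reporting Cronbach's alpha, and norming by the maximum possible score) can be sketched as follows. The ratings, the maximum score of 6, and the function names are hypothetical assumptions made for illustration; they are not values drawn from the DOT or O*NET.

```python
from statistics import pvariance

def cronbach_alpha(dimensions):
    """Cronbach's alpha; `dimensions` holds one score list per dimension, aligned by job."""
    k = len(dimensions)
    if k < 2:
        raise ValueError("alpha needs at least two dimensions")
    # Composite score per job is the sum across dimensions.
    composite = [sum(scores) for scores in zip(*dimensions)]
    summed_item_variance = sum(pvariance(d) for d in dimensions)
    return k / (k - 1) * (1 - summed_item_variance / pvariance(composite))

def normed_score(raw_scores, max_possible):
    """Average the dimensions for one job, then divide by the maximum possible score."""
    return (sum(raw_scores) / len(raw_scores)) / max_possible

# Hypothetical ratings for four jobs on three complexity dimensions.
reasoning = [2, 3, 4, 5]
mathematics = [1, 3, 3, 5]
language = [2, 2, 4, 4]

alpha = cronbach_alpha([reasoning, mathematics, language])
first_job_normed = normed_score(
    [reasoning[0], mathematics[0], language[0]], max_possible=6  # assumed scale maximum
)
print(f"alpha = {alpha:.2f}, normed score of first job = {first_job_normed:.2f}")
```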
We used exploratory factor analysis to verify that the task environment measures described above captured distinct constructs, evaluating convergent and discriminant validity based on eigenvalues and factor loadings. Three factors had eigenvalues above the conventional cutoff of 1 (eigenvalues = 3.49, 1.86, 1.28; 95% variance explained), indicating that the scales formed three distinct factors. Items loaded on factors as predicted by our measurement theory (average predicted loading = .96, min. = .92, max. = .98) with minimal cross-loadings on other factors (average cross-loading = .11, min. = −.05, max. = .18). Correlations between aggregated task environment variables across all coded jobs (n = 154) were all positive and below .40: r(complexity, sensemaking) = .17, r(complexity, creativity) = .23, r(sensemaking, creativity) = .32.
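The reported share of variance explained can be checked with simple arithmetic. The sketch below assumes that the factor analysis was run on the correlation matrix of the seven dimensions described earlier (two for creativity, two for sensemaking, three for complexity), so that total variance equals the number of items. That assumption is ours and is not stated in the text.

```python
# Eigenvalues as reported in the text.
eigenvalues = [3.49, 1.86, 1.28]

# Assumption: seven standardized items (2 creativity + 2 sensemaking + 3 complexity),
# so the total variance to be explained is 7.
n_items = 2 + 2 + 3

# Retain factors whose eigenvalue exceeds the conventional cutoff of 1.
retained = [ev for ev in eigenvalues if ev > 1.0]
share_explained = sum(retained) / n_items

print(f"{len(retained)} factors retained, {share_explained:.1%} of variance explained")
```

Under that assumption, the three eigenvalues sum to 6.63 of a possible 7, which rounds to the 95% reported above.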
Meta-analytic procedures
We followed Hunter and Schmidt’s (2004) procedures for calculating the population correlation. Specifically, we individually corrected for all artifacts mentioned in Hunter and Schmidt (2004) when the information was present to do so. If reliability measures were not provided, we corrected the effect size based on the average reliabilities across the other studies (Lipsey & Wilson, 2001). We then calculated the average of the independent effect sizes by weighting each one by the number of groups in the study multiplied by the compound attenuation factor (Hunter & Schmidt, 2004). We used this random-effects technique because it allows for an analysis of parameter variability across studies.
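A minimal sketch of this weighting scheme is given below, assuming that the only artifact corrected is measurement unreliability in the two variables and that each study's weight is its number of teams multiplied by the squared attenuation factor, which is the usual Hunter and Schmidt formulation. The exact weighting the analysis applied may differ, and the input values are hypothetical.

```python
import math

def corrected_mean_correlation(studies):
    """Weighted mean of reliability-corrected correlations.

    `studies` is an iterable of (r, n_teams, reliability_x, reliability_y).
    """
    numerator = denominator = 0.0
    for r, n_teams, rel_x, rel_y in studies:
        attenuation = math.sqrt(rel_x * rel_y)   # compound attenuation factor
        r_corrected = r / attenuation            # disattenuated correlation
        weight = n_teams * attenuation ** 2      # assumed weight: N times A squared
        numerator += weight * r_corrected
        denominator += weight
    return numerator / denominator

# Hypothetical studies: (observed r, number of teams, alpha of x, alpha of y).
hypothetical_studies = [
    (0.45, 60, 0.82, 0.79),
    (0.30, 120, 0.88, 0.85),
    (0.52, 45, 0.80, 0.76),
]
rho_hat = corrected_mean_correlation(hypothetical_studies)
print(f"estimated rho = {rho_hat:.2f}")
```

With perfect reliabilities the attenuation factor is 1 and the estimate reduces to the ordinary sample-size-weighted mean correlation.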
We examined Q statistics (Hedges & Olkin, 1985) as well as the standard deviation of the meta-analytic correlations (SD ρ; Hunter & Schmidt, 2004) in order to evaluate the degree to which the correlations between safety, learning, and performance varied across studies. The Q statistic approximately follows a chi-square distribution and is used to test the homogeneity of the corrected effect sizes across studies. A significant Q statistic suggests that a given correlation varies across studies and that interaction effects are therefore likely (Lipsey & Wilson, 2001). Another rule of thumb is that moderators are likely present if less than 75% of the variance in the correlations is accounted for by artifacts (Hunter & Schmidt, 2004). Because interactions are central to our theory, we report both measures.
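Both heterogeneity checks can be sketched in a few lines. The version below is a simplified illustration: Q is computed on Fisher-transformed correlations, and the percentage of variance is attributed to sampling error only, whereas the analysis reported here also accounts for measurement error. The function names and input values are hypothetical.

```python
import math

def homogeneity_q(rs, ns):
    """Q statistic on Fisher z values; compare to chi-square with k - 1 df."""
    zs = [math.atanh(r) for r in rs]
    weights = [n - 3 for n in ns]  # inverse sampling variance of Fisher z
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(weights, zs))
    return q, len(rs) - 1

def percent_variance_sampling_error(rs, ns):
    """Share of observed variance in r expected from sampling error alone (capped at 100)."""
    total_n = sum(ns)
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_observed = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    if var_observed == 0:
        return 100.0
    var_error = (1 - r_bar ** 2) ** 2 / (total_n / len(ns) - 1)
    return min(100.0, 100 * var_error / var_observed)

# Hypothetical correlations and team counts from three studies.
rs, ns = [0.10, 0.60, 0.30], [100, 100, 100]
q, df = homogeneity_q(rs, ns)
pct = percent_variance_sampling_error(rs, ns)
print(f"Q = {q:.1f} on {df} df; {pct:.0f}% of variance due to sampling error")
```

In this toy case Q exceeds the .05 critical value for 2 degrees of freedom (5.99) and well under 75% of the variance is attributable to sampling error, so both rules would point toward moderators.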
We assessed the impact of the task on the relationships between psychological safety and team learning as well as psychological safety and team performance using weighted least squares (WLS) regression. WLS is used to examine how study characteristics (i.e., knowledge intensity of team tasks) relate to the magnitude of effect sizes across studies, that is, r(psychological safety, team learning) and r(psychological safety, team performance). It does so in this case by regressing the magnitude of the relationships between (a) psychological safety and learning and (b) psychological safety and performance on each of the knowledge-intensive task characteristics (see Hunter & Schmidt, 1990; Lipsey & Wilson, 2001). Given the small number of studies included in these regressions (N ≤ 27), we computed a separate WLS regression for each of our task variables.
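A single-predictor WLS regression of this kind can be sketched as follows. The sketch assumes that each study is weighted by its number of teams (the text does not specify the weights), and the data are hypothetical. With one predictor, the standardized coefficient equals the weighted correlation, so R² is its square, and the adjusted value follows from the usual small-sample formula.

```python
import math

def wls_single_predictor(x, y, weights):
    """Weighted least squares of y on one predictor x.

    Returns (slope, intercept, standardized_beta).
    """
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    syy = sum(w * (yi - y_bar) ** 2 for w, yi in zip(weights, y))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    beta = sxy / math.sqrt(sxx * syy)  # equals the weighted correlation
    return slope, intercept, beta

def adjusted_r2(r2, k_studies, n_predictors=1):
    """Adjusted R-squared for a regression across k studies."""
    return 1 - (1 - r2) * (k_studies - 1) / (k_studies - n_predictors - 1)

# Hypothetical study-level data: task complexity, effect size, and team counts.
complexity = [0.30, 0.50, 0.70, 0.90]
effect_size = [0.15, 0.35, 0.40, 0.60]
n_teams = [40, 60, 80, 50]
slope, intercept, beta = wls_single_predictor(complexity, effect_size, n_teams)

# Consistency check on a reported value: beta = .43, assuming k = 27 studies.
check = adjusted_r2(0.43 ** 2, k_studies=27)
print(f"beta = {beta:.2f}; adjusted R2 for beta = .43, k = 27: {check:.2f}")
```

Under the assumption of 27 studies, the check value matches the adjusted R² reported for creativity requirements in the safety and learning regression.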
We also examined the extent to which the effect of psychological safety on team learning and performance is contingent on knowledge-intensive task requirements by calculating the average effect sizes of studies conducted in environments that are high and low on those task characteristics. This allows for a more direct comparison of the effects of psychological safety on learning and performance in teams that score high versus low on knowledge-intensive task characteristics. For this median split, high and low values of each task environment variable were determined using population medians across all occupations in the DOT and O*NET databases. We then split our sample of studies into above-median (high) and below-median (low) groups for each task environment variable, and computed the average correlations between psychological safety and team learning as well as psychological safety and team performance across studies within each group. Lastly, we constructed confidence intervals for each group based on the observed within-group variance.
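The median-split comparison can be illustrated with the sketch below. It assumes a team-weighted mean correlation within each group and a normal-approximation interval built from the weighted within-group variance divided by the number of studies. The exact interval formula used in the analysis is not given in the text, so this is only one plausible reading, and all data are hypothetical.

```python
import math

def subgroup_summary(studies, population_median):
    """Split studies at the population median of a task variable.

    `studies` is a list of (task_score, r, n_teams). Returns a dict mapping
    "high"/"low" to (mean_r, ci_lower, ci_upper).
    """
    groups = {"high": [], "low": []}
    for task_score, r, n_teams in studies:
        label = "high" if task_score > population_median else "low"
        groups[label].append((r, n_teams))

    summary = {}
    for label, members in groups.items():
        if not members:
            continue
        total_n = sum(n for _, n in members)
        mean_r = sum(r * n for r, n in members) / total_n
        var_r = sum(n * (r - mean_r) ** 2 for r, n in members) / total_n
        half_width = 1.96 * math.sqrt(var_r / len(members))  # assumed CI formula
        summary[label] = (mean_r, mean_r - half_width, mean_r + half_width)
    return summary

# Hypothetical studies: (normed task score, safety-learning r, number of teams).
studies = [(0.7, 0.60, 50), (0.8, 0.50, 50), (0.3, 0.20, 40), (0.2, 0.10, 40)]
result = subgroup_summary(studies, population_median=0.5)
for label, (mean_r, lower, upper) in result.items():
    print(f"{label}: r = {mean_r:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```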
We compared the indirect effect of psychological safety on performance through learning using coefficients from meta-analytic path analysis models computed for high versus low values of each task environment variable (see Shadish, 1996; Viswesvaran & Ones, 1995). The correlations for this analysis were calculated in the same way as for the median splits described above. We used these correlations to compute the indirect path between psychological safety and team performance through team learning for high groups versus low groups, and compared these coefficients. Standard errors and 95% bias-corrected confidence intervals for each indirect effect were computed using 3,000 bootstrapped samples. The standard error for the difference between indirect effects was computed from the relevant bootstrapped standard errors. Following Viswesvaran and Ones (1995), we used the harmonic mean as the sample size in computing path estimates and set the means and standard deviations in our input matrices to 0 and 1, respectively.
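The point estimate of an indirect effect can be derived directly from a correlation matrix. The sketch below assumes a simple mediation model in which learning is regressed on safety and performance is regressed on both learning and safety, all standardized. For illustration it uses the pooled correlations and team counts reported in the Results rather than the subgroup values, and it omits the bootstrapped confidence intervals.

```python
from statistics import harmonic_mean

def indirect_effect(r_safety_learning, r_safety_performance, r_learning_performance):
    """Standardized indirect effect of safety on performance through learning."""
    # Path a: safety -> learning (simple standardized regression).
    a = r_safety_learning
    # Path b: learning -> performance, controlling for safety.
    b = (r_learning_performance - r_safety_learning * r_safety_performance) / (
        1 - r_safety_learning ** 2
    )
    return a * b

# Pooled corrected correlations reported in the Results (illustration only).
effect = indirect_effect(0.58, 0.32, 0.39)

# Harmonic mean of the team counts behind each correlation, used as the sample size.
path_n = harmonic_mean([2147, 2512, 972])

print(f"indirect effect = {effect:.3f}, harmonic mean N = {path_n:.0f}")
```

The harmonic mean is pulled toward the smallest cell (the 972 teams behind the learning and performance correlation), which makes it a conservative sample size for the path model.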
As noted before, the studies included in our meta-analysis used different but conceptually analogous measures of psychological safety, learning, and performance. We ran ANOVAs to verify that the magnitude of the focal relationships did not systematically vary with the measure used. We also investigated whether measurement reliability varied across measures. We found no significant differences in the magnitude of correlations (all p-values > .10) or the variability of Cronbach’s alpha across different measures (all p-values > .10).
The papers in our meta-analysis studied three types of teams: working professionals (42 studies), student project teams (six studies), and experimental teams (three studies). We ran ANOVAs to verify that the magnitude of the focal relationships did not vary systematically based on the type of team. We did not find any differences in the effect sizes in student compared to professional teams (p > .10 in both ANOVAs). However, we found that the magnitude of the relationship between psychological safety and learning in experiments was significantly smaller than in studies involving student project teams or working professionals (p < .01 in both ANOVAs). We were unable to compare the relationship between psychological safety and performance in laboratory teams compared to student and professional teams because only one experiment measured that relationship. The corrected correlation between psychological safety and performance for the 152 teams in that one study was −.01 (Woolley, Chabris, Pentland, Hashmi, & Malone, 2010). In short, few laboratory studies have examined the safety–learning or safety–performance relationship, and those that did found weak relationships. This may be due to the fact that studies of group dynamics in the laboratory often utilize simpler tasks that can be completed in shorter periods and that require no specialized knowledge. Our theory would predict that relationships between psychological safety and learning or performance will be weaker in groups that perform such tasks. And studies with weaker results are often not published.
Results
Table 1 summarizes the parameter estimates from our meta-analysis. The estimated corrected population correlation (ρ) based on the 2,147 teams in our meta-analysis for which the relationship between psychological safety and learning was examined was .58 (95% CI = [.18, .98]). The estimated corrected population correlation between psychological safety and performance based on the 2,512 teams in which this relationship was measured was .32 (95% CI = [.07, .57]). The corrected population correlation between learning and performance was also found to be positive and significant (across 972 teams, estimated ρ = .39; 95% CI = [.12, .66]). In short, we found generally positive and significant relationships between psychological safety, learning, and performance across these studies.
Table 1. Meta-analytic estimates of the relationships between psychological safety, team learning, and team performance.
Note. κ = the number of studies; N = the total number of teams; r = the sample-weighted mean correlation; ρ = the mean estimate of the corrected population correlation; SD ρ = the standard deviation of the mean estimate of the corrected population correlation; % SE = the percentage of variance attributable to sampling and measurement error; Q = the Q-statistic; 95% CI = the 95% confidence interval.
*p < .05, **p < .01, ***p < .001.
We also found that the strength of these relationships varied considerably across studies. We found large 95% confidence intervals for each of the average correlations with significant Q statistics in every case: safety–learning = 186 (p < .001), safety–performance = 65 (p < .001), and learning–performance = 45 (p < .001). Moreover, the percent of variance accounted for by artifacts (% SE) was well below 75% in all cases: safety–learning = 25%, safety–performance = 37%, and learning–performance = 28%. These results strongly suggest that although the relationships between psychological safety and both learning and performance may be generally positive, the magnitude of those relationships varies considerably across studies (Hunter & Schmidt, 2004; Lipsey & Wilson, 2001). This pattern of results implies that moderators are likely (Lipsey & Wilson, 2001).
Table 2 summarizes our weighted least squares (WLS) regression results in which we examine whether the variance just documented can be explained by differences in knowledge-intensive task characteristics across studies. Our results show that a substantial amount of the variance is explained by the extent to which the team was engaged in knowledge-intensive tasks. Specifically, the magnitude of the correlation between psychological safety and learning was positively and significantly predicted by (a) creativity requirements (β = .43, adj. R² = .15, p < .001), (b) sensemaking requirements (β = .53, adj. R² = .25, p < .001), and (c) complexity (β = .60, adj. R² = .34, p < .001). Moreover, the correlation between safety and performance was significantly predicted by (a) creativity requirements (β = .68, adj. R² = .43, p < .001), (b) sensemaking requirements (β = .38, adj. R² = .14, p < .05), and (c) complexity (β = .70, adj. R² = .47, p < .001). Because the sample size for each of these regressions is fairly small, we conducted outlier analyses to ensure that these effects are not being driven by extreme cases (see De Dreu & Weingart, 2003). No significant outliers were detected using the procedures outlined by Cohen, Cohen, West, and Aiken (2002) as well as Stevens (1984).
Table 2. Weighted least squares regression results.
Note. κ = number of studies; β = standardized regression coefficient; adjusted R² used for R².
*p < .05, **p < .01, ***p < .001.
The magnitude of the explained variance in these regressions is worth highlighting. For example, task complexity accounted for fully one third of the variance in the relationship between psychological safety and learning and just under half of the variance in the relationship between psychological safety and performance. Clearly, task knowledge intensity strongly affects the strength of the relationships between psychological safety and learning as well as performance.
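To make the quantities in these regressions concrete, the following sketch fits a single-predictor weighted least squares model of study effect sizes on a task characteristic score and reports the standardized coefficient and adjusted R². It is illustrative only: the data are invented, the weights are assumed to be team counts (this section does not state the weighting used), and `wls_simple` is a hypothetical helper.

```python
import math

# Hypothetical study-level inputs: (weight, moderator score, effect size r).
# Weights are assumed to be team counts; this is an illustrative choice.
rows = [(40, 2.1, 0.15), (55, 3.4, 0.30), (30, 4.0, 0.42), (80, 2.8, 0.22),
        (25, 4.6, 0.55), (60, 3.9, 0.35), (45, 1.8, 0.20)]

def wls_simple(rows):
    """Weighted least squares with one predictor and an intercept."""
    k = len(rows)
    w_sum = sum(w for w, _, _ in rows)
    # Weighted means of the predictor and the outcome.
    x_bar = sum(w * x for w, x, _ in rows) / w_sum
    y_bar = sum(w * y for w, _, y in rows) / w_sum
    # Weighted sums of squares and cross-products around those means.
    sxx = sum(w * (x - x_bar) ** 2 for w, x, _ in rows)
    syy = sum(w * (y - y_bar) ** 2 for w, _, y in rows)
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in rows)
    slope = sxy / sxx
    # With a single predictor, the standardized coefficient equals the weighted correlation.
    beta = sxy / math.sqrt(sxx * syy)
    r2 = beta ** 2
    # Adjusted R-squared for k studies and one predictor.
    adj_r2 = 1 - (1 - r2) * (k - 1) / (k - 2)
    return {"slope": slope, "beta": beta, "r2": r2, "adj_r2": adj_r2}

fit = wls_simple(rows)
print(fit)
```

Because adjusted R² penalizes small samples, it sits slightly below the squared standardized coefficient when the number of studies is modest.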
Table 3 summarizes the results of studies involving teams that scored above and below the median on knowledge-intensive task characteristics based on the population of DOT and O*NET jobs. Effect sizes for the relationship between psychological safety and learning were larger in studies where teams scored above the median on knowledge-intensive task characteristics, and the confidence intervals for above-median effect sizes did not overlap with intervals for below-median effect sizes. A similar pattern was found for the relationship between psychological safety and performance, except in the case of sensemaking requirements where the confidence intervals did overlap (the confidence interval for below-median studies was very wide).
Table 3. Effect sizes of focal relationships at high (above median) versus low (below median) levels of each knowledge-intensive task characteristic.
Note. κ = the number of studies; N = the total number of teams; r = the sample-weighted mean correlation; ρ = the mean estimate of the corrected population correlation; 95% CI = the 95% confidence interval.
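The median-split comparison can be sketched as below. This is a simplified illustration under stated assumptions: the study data and the cut-off value are invented, the confidence interval uses the standard deviation of observed correlations divided by the square root of the number of studies (one Hunter and Schmidt style convention), and no corrections for unreliability are applied, so the output corresponds to r rather than ρ. The helper names are hypothetical.

```python
import math

# Hypothetical inputs: (teams, correlation, moderator score). The cut-off stands in
# for a population median and is made up for illustration.
studies = [(40, 0.55, 4.2), (55, 0.48, 3.9), (30, 0.60, 4.8), (70, 0.52, 4.1),
           (80, 0.12, 2.1), (25, 0.20, 2.6), (60, 0.08, 1.9), (45, 0.15, 2.4)]
POPULATION_MEDIAN = 3.0

def subgroup_summary(group):
    """Sample-weighted mean correlation and an approximate 95% CI for one subgroup."""
    k = len(group)
    total_n = sum(n for n, _, _ in group)
    r_bar = sum(n * r for n, r, _ in group) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for n, r, _ in group) / total_n
    se = math.sqrt(var_obs / k)  # standard error of the mean correlation
    return {"k": k, "N": total_n, "r": r_bar,
            "ci": (r_bar - 1.96 * se, r_bar + 1.96 * se)}

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

high = subgroup_summary([s for s in studies if s[2] > POPULATION_MEDIAN])
low = subgroup_summary([s for s in studies if s[2] <= POPULATION_MEDIAN])
separated = not intervals_overlap(high["ci"], low["ci"])
print(high, low, separated)
```

A very wide interval in one subgroup, as reported for the below-median sensemaking studies, can produce overlap even when the point estimates differ.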
An examination of Table 3 also shows that one explanation for the strong and positive relationship between psychological safety and team learning in our meta-analysis is that researchers have tended to sample from task settings that are higher on knowledge intensity. Of those 27 field studies that examined the relationship between psychological safety and team learning, 22 (81%) scored above the general population median on creativity requirements, 18 (67%) scored above the median on sensemaking requirements, and 21 (78%) scored above the median on complexity. Given this sampling bias, we can assume that past research has reported larger correlations than would be observed in the general population.
Lastly, the indirect relationship between psychological safety and team performance through team learning shows a similar pattern to the direct relationships—it is stronger in task environments that score higher on creativity requirements, sensemaking requirements, and complexity. The coefficients for the indirect effect of safety on performance through learning at high versus low levels of each task environment variable are summarized in Figure 1. For creativity requirements, the indirect effect was .24 (95% CI = [.17, .30]) for above-median studies and .10 (95% CI = [.07, .13]) for below-median studies (effects differed at p < .001). For sensemaking requirements, the indirect effect was .37 (95% CI = [.32, .43]) for above-median studies and .07 (95% CI = [.05, .10]) for below-median studies and these effects differed at p < .001. And for complexity, the indirect effect was .24 (95% CI = [.19, .28]) for above-median studies and .10 (95% CI = [.07, .14]) for below-median studies (p < .001).

Figure 1. Indirect relationships between psychological safety and performance via learning at high (above median) versus low (below median) levels of each task environment variable. Significance of indirect effects based on 3,000 bootstrapped samples. *p < .05, **p < .01, ***p < .001.
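For readers unfamiliar with bootstrapped indirect effects, the sketch below shows the general logic: the indirect effect is the product of the safety-to-learning path and the learning-to-performance path (controlling for safety), and its confidence interval comes from resampling. The paper's bootstrap was run on meta-analytic estimates and its exact procedure is not described in this section, so this example substitutes simulated team-level data and a percentile bootstrap. All names and values are hypothetical.

```python
import random

def cross(xs, ys):
    """Sum of centered cross-products of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

def indirect_effect(rows):
    """Product of path a (safety -> learning) and path b (learning -> performance)."""
    s, l, p = zip(*rows)
    ss, ll, sl = cross(s, s), cross(l, l), cross(s, l)
    sp, lp = cross(s, p), cross(l, p)
    a = sl / ss                                     # slope of learning on safety
    b = (ss * lp - sl * sp) / (ss * ll - sl ** 2)   # slope of performance on learning, controlling for safety
    return a * b

def bootstrap_ci(rows, draws=3000, seed=1):
    """Percentile 95% CI from resampling teams with replacement."""
    rng = random.Random(seed)
    estimates = sorted(indirect_effect(rng.choices(rows, k=len(rows)))
                       for _ in range(draws))
    return estimates[int(0.025 * draws)], estimates[int(0.975 * draws) - 1]

# Hypothetical team-level data simulated with paths a = 0.5 and b = 0.4.
rng = random.Random(0)
rows = []
for _ in range(100):
    s = rng.gauss(0, 1)
    l = 0.5 * s + rng.gauss(0, 1)
    p = 0.4 * l + 0.1 * s + rng.gauss(0, 1)
    rows.append((s, l, p))

point = indirect_effect(rows)
ci_low, ci_high = bootstrap_ci(rows)
print(point, (ci_low, ci_high))
```

An interval that excludes zero is the usual criterion for a significant indirect effect, and comparing intervals across high and low subgroups parallels the contrast shown in the figure.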
In sum, the results of our meta-analysis strongly support the proposition that psychological safety’s impact on team learning and performance is contingent on the knowledge intensity of the task. The WLS analysis found that a sizeable portion of the variance in the focal relationships was predicted by knowledge-intensive task requirements. And the median splits reveal a similar pattern. Lastly, we also found that the indirect effect of psychological safety on performance through learning is lower in teams working on tasks that are low in knowledge intensity.
Discussion
The theory and results of this study provide a more accurate and nuanced picture of the consequences of psychological safety in groups, and underscore the importance of explicitly considering context and contextual moderators in groups research. Past studies examining the relationship between psychological safety and either team learning or team performance have paid very little attention to the role of the task environment. In fact, most studies in our meta-analysis provided scant detail about the tasks that the teams in the study were performing. The implicit assumption seems to be that teams are teams and that if safety relates to learning and performance in a health care team it will also relate to learning and performance in a manufacturing team. We show that the task environment of a team does matter, and that predictions and prescriptions that overlook the role of the task environment will therefore tend to over- or underestimate the importance of psychological safety for both learning and performance. Our paper is therefore a specific illustration of why the tendency among group and organization scholars to underemphasize the effect of context on relationships of core theoretical interest is problematic. To show that an effect exists is important. But to show that an effect exists without considering the contextual conditions under which it might or might not exist, or the environments in which it might be strengthened or weakened or even reversed, provides an incomplete picture that ultimately confuses rather than illuminates (as noted by Cronin & Bendersky, 2012; Cronin et al., 2011).
The theory and results presented here raise several important questions and point to several promising directions for future research. We call out several of those issues in the following paragraphs, and more fully explore the implications of this research for theory and practice.
Areas for future research
What is moderator and what is main effect?
We have argued that a group’s task environment is a critical but largely overlooked moderator of the safety–learning–performance relationship. But a strong argument could be made that, in fact, the task environment factors examined here—complexity, creativity requirements, and sensemaking requirements—are the main effects and that psychological safety is the moderator in this system of relationships. As noted before, psychological safety is traditionally viewed as a group climate factor that removes barriers to learning, risk-taking, and openness during group interaction. But to say that psychological safety removes barriers to learning is not the same as arguing that psychological safety stimulates learning. Indeed, early theorists viewed psychological safety more as an enabling factor, a factor that would enable learning to happen within a team when members were otherwise motivated to engage in learning. In other words, psychological safety is perhaps more appropriately viewed as the moderating variable in these relationships.
In contrast, elements of the task environment—factors like task complexity, creativity requirements, and sensemaking requirements—are perhaps better conceptualized as factors that stimulate learning within groups rather than as moderators of the safety–learning relationship. In groups that perform knowledge-intensive work, learning is not optional but is a core performance requirement—groups simply cannot succeed at knowledge-intensive work without some degree of reflection, experimentation, and discovery. We would therefore expect that members of such groups will be more strongly oriented toward learning than will members of groups that perform less knowledge-intensive work. And since psychological safety removes barriers to the behavioral expression of this orientation, it follows that the knowledge intensity of a group’s task will be more positively related to learning and performance in teams where psychological safety is high.
The results of the present meta-analysis are, of course, consistent with both the moderator and the main effect interpretations since we are only able to establish that an interaction effect exists with this meta-analytic research design. But the basic issue is fundamental since it suggests that psychological safety may have no positive effect on learning or performance in groups where members are not otherwise oriented toward learning—either because of task requirements or for some other reason (e.g., intrinsic motivation, learning incentives, etc.). We therefore encourage future researchers to examine this issue directly.
Ideally, this would be tested using controlled settings where levels of learning motivation and psychological safety can be manipulated. It is important to keep in mind, however, that psychological safety was weakly related to learning and performance in the few laboratory studies we found in our review. We posited that this result may be due to the fact that laboratory studies often utilize simpler and less knowledge-intensive tasks. Researchers may therefore need to use field experiments or design studies that increase or that deliberately vary the knowledge intensity of team tasks in order to fully explore the effect of psychological safety on team learning and performance dynamics. Methodological advancements in measuring the changing nature of team behaviors and norms (for a review, see Humphrey & Aime, 2014) may also help in allowing researchers to more precisely specify the role that motivation and psychological safety play in dynamic models of team learning and performance.
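The reason an interaction alone cannot distinguish the moderator from the main effect can be written out in a simple linear form. The notation here is an assumption introduced for illustration and does not appear in the original analysis: L is team learning, S is psychological safety, and K is the knowledge intensity of the task.

```latex
\begin{align}
L &= b_0 + b_1 S + b_2 K + b_3 (S \times K) + \varepsilon \\
\frac{\partial L}{\partial S} &= b_1 + b_3 K \\
\frac{\partial L}{\partial K} &= b_2 + b_3 S
\end{align}
```

The same coefficient, b_3, governs both how the effect of safety changes with knowledge intensity and how the effect of knowledge intensity changes with safety. A positive b_3 is therefore equally compatible with either reading, and only designs that manipulate learning motivation and safety separately can speak to which variable stimulates learning and which merely enables it.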
How much does context really matter?
As noted before, several scholars have encouraged researchers who study groups and organizations to pay more careful attention to context. And yet, many studies continue to give scant attention to the role that context might play in affecting the pattern of observed relationships. One possible explanation for this oversight is that scholars may agree that context matters, but question whether that effect is significant enough to warrant careful attention. The results of this study provide strong evidence that context not only matters, but that context can account for a very significant portion of the observed variance in the strength of a given relationship across studies. For example, we found that nearly half of the variability in the relationship between psychological safety and performance is explained by a single contextual factor, task complexity.
Changes in the effect of psychological safety over the experiential learning cycle
In our review of the literature, we also noticed that the team learning process is almost always measured as one set of co-occurring activities. This measurement is inconsistent with the initial conceptualizations (Dewey, 1938; Kolb, 1983) of experiential learning which describe it as a four-step process made up of experiencing, collecting information on experiences, reflecting on what happened, and coming up with a plan. Past studies have acknowledged that the efficacy of different conditions or interventions can vary based on where the attention of the group is focused at the time of the condition or intervention (Marks, Mathieu, & Zaccaro, 2001). For example, taking interpersonal risks may be less beneficial during the concrete experience phase because adherence to routines may be necessary (Gersick & Hackman, 1990). Thus, the impact that psychological safety has on team performance through learning may vary based on where a team is in the experiential learning process. Future research should explore this possibility.
Managerial implications
As noted before, the bulk of the scholarly research examining the relationship between psychological safety, learning, and performance views psychological safety as a direct predictor of learning or performance in teams. The implied prescription for a manager who wants to improve team learning and performance, then, is to increase psychological safety, perhaps by implementing those structures and practices that have been shown to affect safety in past research (e.g., structural clarity, Bunderson & Boumgarden, 2010; inclusive leadership, Nembhard & Edmondson, 2006). The results reported here make it clear that such.
Conclusion
In a complex and changing world, organizations must rely on their teams to learn and adapt. In order to help managers increase their teams’ engagement in the learning process, scholars have devoted considerable attention to understanding how managers can encourage their teams to learn. These efforts have often focused on psychological safety, with research most often examining a model in which psychological safety increases team learning by removing the fears that inhibit the interpersonally risky process of learning.
This paper contextualizes that model by suggesting that the degree to which psychological safety impacts team learning varies based on the extent to which teams must engage in variable interactions to complete their tasks. More specifically, we use a meta-analysis to test the proposition that the effect of psychological safety on learning and performance is bounded by the knowledge intensity of a team’s task. We find that over one third of the variance in the relationship between psychological safety and learning as well as nearly half of the variance in the relationship between psychological safety and performance is explained by just one aspect of knowledge-intensive work, complexity. In short, we find that context matters a great deal.
Given the general lack of attention that has been paid to context in the management literature (Cronin & Bendersky, 2012), the magnitude of these findings should remind researchers of the importance of contextualizing their theories and analyses. Doing this is not only important for building accurate models; it is also critical for giving practitioners useful advice. In order for management scholarship to help managers, it must do more than tell them what theories may work; it must also tell them when and where those theories should be applied.
Acknowledgements
We would like to thank Amy Edmondson, Zhike Lei, participants in the GOMERS workshop at Washington University in St. Louis, Maren Bunderson, and Alessandrina Freitas for their comments, feedback, and encouragement during the development of this paper. We would also like to thank Ana MarkdaSilva and Daniel Rosenbaum for assistance with data collection, and Stephen Humphrey and two anonymous reviewers for excellent feedback and guidance throughout the review process.
