Abstract
Bilinguals often outperform monolinguals on executive function tasks, including tasks that tap cognitive flexibility, conflict monitoring, and task-switching abilities. Some have suggested that bilinguals also have greater working memory capacity than comparable monolinguals, but evidence for this suggestion is mixed. We therefore conducted a comprehensive meta-analysis on the effects of bilingualism on working memory capacity. Results from 88 effect sizes, 27 independent studies, and 2,901 participants revealed a significant small to medium population effect size of 0.20 in favor of greater working memory capacity for bilinguals than monolinguals. This suggests that experience managing two languages that compete for selection results in greater working memory capacity over time. Moderator analyses revealed that larger effects were observed in children than in other age groups. Furthermore, whether the task was performed in the first (L1) or second (L2) language for bilinguals moderated the effect size of the bilingual advantage; this factor is often overlooked, and our results point to the importance of defining language variables that influence critical cognitive outcomes.
I Introduction
Bilingualism has been associated with the enhancement of multiple executive functions, including cognitive flexibility (Adi-Japha et al., 2010), efficiency (Blumenfeld and Marian, 2014), task-switching (Gold et al., 2013; Prior and MacWhinney, 2010), and conflict resolution (Donnelly, Brooks, and Homer, 2015). This is believed to be the result of lifelong experience managing multiple languages that compete for selection. Resolution of this competition requires higher-order executive control processes that may extend to enhance generalized executive functioning (Bialystok et al., 2012). Given that language processing is largely dependent on working memory and that there is a strong positive relationship between many higher-order executive functions and working memory (WM) capacity (Engle, 2002), it is logical to expect that bilinguals would also have greater WM capacity than monolinguals. Some research supports this prediction (e.g. Morales et al., 2013; Soliman, 2014) but other studies have found no effect (e.g. Namazi and Thordardottir, 2010; Ratiu and Azuma, 2015). Recently, such failures to replicate bilingual cognitive benefits have led some to argue that there is no bilingual advantage (Gasquoine, 2016; Paap et al., 2015), sparking widespread responses to these claims (Bak, 2015; Gold, 2015; Kousaie and Taler, 2015; Woumans and Duyck, 2015). Given the contentious state of the field, it is especially important to establish whether these effects are real. A meta-analysis of the current literature can show whether the effect exists and how strong it is. Here, we present a meta-analysis of performance differences between monolinguals and bilinguals on WM span tasks.
There are reasons to expect that bilinguals might have greater working memory capacity than monolinguals. Substantial evidence exists that both languages are activated in the bilingual brain even when only one language is being used (Martin et al., 2009; Spivey and Marian, 1999; Thierry and Wu, 2007; Timmer et al., 2014; Wu and Thierry, 2010, 2012; for review, see Kroll et al., 2012). Managing languages that compete for selection requires resources from WM (Thorn and Gathercole, 1999), and continual use of WM resources might lead to enhanced WM capacity over time in order to ensure efficiency of processing in the future. Some researchers claim that larger WM capacity actually represents greater ability to control attention within the WM store, rather than having a greater store per se (Engle, 2002). From this perspective as well, bilinguals should have greater WM capacity than monolinguals because numerous studies have shown that bilinguals are better at controlling attention (Bialystok, 1999; Bialystok et al., 2004; Costa et al., 2008; Hernandez et al., 2010; Soveri et al., 2011).
Recent models of WM share the view that WM is made up of multiple component processes that rely heavily on selective attention (Eriksson et al., 2015). Eriksson et al. (2015) propose that selective and sustained attention processes that operate on perceptual and long-term memory (LTM) information are integral parts of WM. Specifically, they propose a model in which attentional and other components (e.g. inhibition, prospective planning, updating) all contribute to ‘temporarily enhanced accessibility’. Learning a second language across the lifespan necessarily involves selective and sustained attention processes that operate on perceptual and LTM information. For example, preverbal infants who grow up in a bilingual environment show enhanced attention to the mouth of talking faces at an earlier age than infants growing up in a monolingual context; focus on the mouth is a strategy used by more mature individuals (e.g. adults) in order to make use of the redundancy of audiovisual speech cues (Pons et al., 2015). This shows that bilingual infants have more mature attentional control than monolinguals. Furthermore, in order for infants and children to make an association between an item currently being processed and one of the two languages being learned, the item needs to be rehearsed, while previously stored information from the two languages (that compete for selection) is retrieved from LTM. Lifelong practice with attentional control devoted to these rehearsal and retrieval processes should contribute to better WM for bilinguals than monolinguals.
A 2010 meta-analysis examining the relationship between bilingual status and executive function (EF) tasks revealed that bilinguals outperformed monolinguals on a range of EF tasks, including tasks that tapped WM processes (Adesope et al., 2010). However, only seven effect sizes from five independent studies were available for this meta-analysis and, since then, an upsurge of studies has been published on the relationship between bilingual status and WM capacity. In fact, 19 of the 27 studies included in the present meta-analysis were from 2010 onward. Furthermore, some of the studies included in the Adesope et al. (2010) meta-analysis used non-span tasks that only indirectly measured WM capacity (e.g. Bialystok, 2006). Given these limitations and the current conflicting results in the literature, a re-evaluation of the population effect size of WM capacity advantages for bilinguals over monolinguals (if present) is necessary.
The present study performed a comprehensive meta-analysis on the effects of bilingualism on working memory capacity to reveal an estimate of the population effect size. Furthermore, given the upsurge of new studies, we were able to examine age, the linguistic nature of the task, and the language in which the task was performed as potential moderating variables.
II Methods
1 Literature search
We first searched the PsycINFO database with the keywords ‘bilingual’ and ‘working memory’ or ‘short-term memory’. Following this, we performed a Google Scholar search for ‘bilingual working memory span’ and ‘bilingual short-term memory span’ for any articles that PsycINFO might have missed. Finally, we combed through the reference sections of the articles obtained to find any additional studies that we might have missed. The search included any relevant articles (see below) found up to 13 July 2015.
2 Inclusion criteria
Studies included in the meta-analysis conformed to the following set of criteria:
The study had a clear monolingual and a clear bilingual group. Studies that examined working memory performance across different levels of bilingualism (e.g. low proficiency vs. high proficiency bilinguals) were not included.
All participants were typically developing individuals without learning disabilities or cognitive impairments of any sort.
The study included a working memory span task (e.g. forward digit span, backward digit span, sentence span, operation span, etc.). Studies that indirectly manipulated working memory demands within a non-span task were not included.
The study had enough information to calculate an effect size. In cases where an effect size could not be calculated based on descriptive statistics (i.e. means and standard deviations), F- and t-values were converted to effect sizes using the conversion equations reported in Appendix 1.
Following these guidelines, a total of 88 effect sizes from 27 independent studies with a total sample involving 2,901 participants were obtained. All studies included in the meta-analysis are shown in Table 1 and identified by an asterisk in the references section.
Table 1. All studies included in the meta-analysis.
3 Main meta-analysis
Following the suggestion of Field and Gillett (2010; see also Rosenthal and DiMatteo, 2001), we chose the Pearson r correlation coefficient as our effect size measure for the analysis. This coefficient is understood by most psychologists and is a versatile measure of the strength of an experimental effect. Furthermore, the Pearson r correlation coefficient is easy to calculate from means and standard deviations, as well as from other statistics (e.g. F- and t-values; Field, 2005a, 2005b; Rosenthal, 1991; for conversion equations used in the present study, see Appendix 1).
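As an illustration of how r can be obtained from means and standard deviations, the following minimal sketch first computes Cohen's d with a pooled standard deviation and then converts d to r. This is one standard route and is not necessarily the exact procedure used in the present study; the function names and the example scores are hypothetical.

```python
import math


def cohens_d(mean_bi, sd_bi, n_bi, mean_mono, sd_mono, n_mono):
    """Standardized mean difference (bilingual minus monolingual), pooled SD."""
    pooled_var = ((n_bi - 1) * sd_bi**2 + (n_mono - 1) * sd_mono**2) / (
        n_bi + n_mono - 2
    )
    return (mean_bi - mean_mono) / math.sqrt(pooled_var)


def d_to_r(d, n_bi, n_mono):
    """Convert d to r; the correction term a equals 4 when group sizes are equal."""
    a = (n_bi + n_mono) ** 2 / (n_bi * n_mono)
    return d / math.sqrt(d**2 + a)


# Hypothetical span scores, not data from any included study.
d = cohens_d(6.0, 1.0, 30, 5.5, 1.0, 30)
r = d_to_r(d, 30, 30)
print(round(d, 3), round(r, 3))
```

A positive r here indicates higher scores for the bilingual group, matching the direction used for the effect sizes in this analysis.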
For the present meta-analysis, we employed the Hunter–Schmidt random-effects model (Hunter and Schmidt, 2004) on these effect sizes. Most researchers recommend using a random-effects model when one wishes to generalize findings beyond studies included in the analysis (i.e. to the general population; Field and Gillett, 2010; Hedges and Vevea, 1998; Overton, 1998). Fixed-effects models should only be used when there are reasons to expect that the population variance is homogeneous, which rarely occurs in reality, unless one is only interested in a subset of the population (e.g. using the same task with the same researcher). Furthermore, applying a fixed-effects model to the population can inflate type-I error rates from the usual 5% to anywhere from 11%–80% (Field, 2003; Hunter and Schmidt, 2000). Given that we wished to estimate the mean effect size of the general population, we opted for a random-effects model. To employ the Hunter–Schmidt random-effects analysis, we used the SPSS syntax files described by Field and Gillett (2010) and supplied by Field and Gillett (2009). We report the estimated population effect size (ρ), along with the 95% confidence intervals, sample correlation variance, sampling error variance, and the estimated variance in population correlations.
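The quantities reported from the Hunter–Schmidt analysis can be sketched as follows. This is a minimal illustration based on the standard bare-bones Hunter–Schmidt formulas (sample-size-weighted mean, observed variance, sampling error variance, and their difference); it is not the SPSS syntax used in the study. The function name and the example values are hypothetical, and the 95% interval is assumed to be the mean plus or minus 1.96 times the square root of the estimated population variance.

```python
import math


def hunter_schmidt(rs, ns):
    """Bare-bones Hunter-Schmidt estimates from correlations rs and sample sizes ns."""
    k = len(rs)
    total_n = sum(ns)
    # Sample-size-weighted mean correlation (estimate of rho).
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted variance of the observed correlations.
    var_r = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average sample size.
    var_e = (1 - mean_r**2) ** 2 / (total_n / k - 1)
    # Estimated variance in population correlations (floored at zero).
    var_rho = max(var_r - var_e, 0.0)
    half_width = 1.96 * math.sqrt(var_rho)
    # Chi-squared homogeneity statistic with k - 1 degrees of freedom.
    chi2 = sum((n - 1) * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / (
        1 - mean_r**2
    ) ** 2
    return {
        "rho": mean_r,
        "var_r": var_r,
        "var_e": var_e,
        "var_rho": var_rho,
        "interval": (mean_r - half_width, mean_r + half_width),
        "chi2": chi2,
        "df": k - 1,
    }


# Hypothetical effect sizes and sample sizes for three studies.
print(hunter_schmidt([-0.2, 0.2, 0.6], [40, 40, 40]))
```

Weighting by sample size means that larger studies pull the estimate of ρ more strongly than smaller ones.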
4 Sensitivity analysis
We conducted a sensitivity analysis to address the potential non-independence of effect sizes in our sample, given that many of the studies contributed multiple effect sizes (see Table 1). Following previous research (Orlitzky et al., 2003), mean effect sizes were created, resulting in one independent effect size per study. If the results of this analysis are similar to those of the comprehensive analysis, we can be more confident in our findings.
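The averaging step can be sketched as below. This is a minimal illustration that assumes an unweighted mean of the r values within each study; the study labels and values are hypothetical.

```python
from collections import defaultdict


def one_effect_per_study(effects):
    """Collapse (study_id, r) pairs into one mean r per study."""
    by_study = defaultdict(list)
    for study_id, r in effects:
        by_study[study_id].append(r)
    # Unweighted mean within each study (an assumption of this sketch).
    return {study: sum(rs) / len(rs) for study, rs in by_study.items()}


# Hypothetical effect sizes: study A contributes three, study B one.
effects = [("A", 0.10), ("A", 0.30), ("A", 0.20), ("B", -0.05)]
print(one_effect_per_study(effects))
```

Each study then enters the analysis once, so studies that report many tasks no longer carry more weight than studies that report a single task.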
5 Moderator analyses
We performed moderator analyses on the following variables:
Age group (children, young adults, older adults)
Task type (verbal vs. non-verbal)
Language in which the verbal tasks were performed
a Age group (children, young adults, older adults)
Previous work has shown that bilinguals and monolinguals are more likely to show executive function differences in children and older adults than in younger adults because the latter individuals are at peak cognitive performance (Bialystok et al., 2005). Thus, larger working memory capacity might not be as evident in young adults as it is in the other two groups. On the other hand, it is possible that a linear increase in working memory capacity will be revealed across the lifespan, given that experience managing two languages increases over time. Finally, it is possible that working memory demands are largest when people are first learning a new language and the greatest gains in working memory capacity might be expected in children, after which point they plateau.
b Task type (verbal vs. non-verbal)
Working memory capacity might vary as a function of task type. Bilingual performance disadvantages compared to monolinguals have been reported on numerous tasks that require language processing (Bialystok and Luk, 2012; Gollan et al., 2007; Ivanova and Costa, 2008). Thus, it is possible that verbal working memory tasks are masking the true working memory capacity of bilinguals (Bialystok et al., 2014).
c Language in which the verbal tasks were performed
Language in which the verbal tasks were performed: first language (L1) vs. second language (L2). All monolinguals necessarily have to perform WM span tasks in their L1, but bilinguals sometimes perform the tasks in their L1, sometimes in their L2, and sometimes in both. Performing WM span tasks in their L2 would disadvantage bilinguals on these tasks given that they are necessarily not as proficient in this language (compared to L1) and retrieval processes are slower. Thus, we explored whether these considerations moderated the results.
6 Publication bias
An issue with any meta-analysis is the fact that published studies are more likely to report significant findings than non-significant findings, and this can over-represent the estimated effect size of the population. Fortunately, there are some analyses that one can perform to instill confidence in the meta-analytic results. One of the most popular is Rosenthal’s (1979) fail-safe N, which is an estimate of the number of missing or unpublished non-significant studies that are required in order to render the estimated effect size non-significant. A general rule of thumb is that the estimated effect size is likely safe from publication bias if the fail-safe N exceeds 5k + 10, where k is the number of effect sizes included in the analysis. In the present study, the fail-safe N would need to exceed 470 to be safe from publication bias.
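The fail-safe N and the tolerance rule can be sketched as follows. This is a minimal illustration of Rosenthal's formula for a one-tailed alpha of .05, taking the standard normal deviate (Z) of each effect as input; how each Z is derived from r is not shown here, and the function names and example values are hypothetical.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: (sum of Z)^2 / z_alpha^2 - k."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha**2 - k


def tolerance_threshold(k):
    """Rule-of-thumb minimum, 5k + 10, for k effect sizes."""
    return 5 * k + 10


# Hypothetical Z values for ten effects.
z_scores = [2.0] * 10
print(fail_safe_n(z_scores), tolerance_threshold(len(z_scores)))
```

Note that for k = 88 the expression 5k + 10 evaluates to 450; the value of 470 given in the text would correspond to k = 92. The sketch does not attempt to resolve this difference.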
III Results
1 Population estimate
Table 1 provides descriptive statistics and mean effect sizes for all studies included in the analysis. Following Rosenthal’s (1995) advice, Table 2 shows stem-and-leaf plots for all effect sizes. A chi-squared test of homogeneity revealed considerable variation in effect sizes overall, χ²(1, N = 88) = 892.92, p < 0.001, which justifies the use of the random-effects model. The weighted mean population estimate was highly significant, ρ = 0.20, χ²(1, N = 88) = 799.65, p < 0.001; the 95% confidence interval was −0.253 (lower) to 0.653 (upper). The sample correlation variance was 0.06 and the sampling error variance was 0.01. Thus, the estimated population correlation variance was 0.05.
Table 2. Stem-and-leaf plot of all effect sizes.
2 Sensitivity analysis
A chi-squared test of homogeneity revealed considerable variation in effect sizes overall, χ²(1, N = 27) = 168.97, p < 0.001. The weighted mean population estimate was highly significant, ρ = 0.158, χ²(1, N = 27) = 132.04, p < 0.001; the 95% confidence interval was −0.205 (lower) to 0.521 (upper). The sample correlation variance was 0.043 and the sampling error variance was 0.009. Thus, the estimated population correlation variance was 0.034. These results are similar to those of the comprehensive meta-analysis; thus, we use the comprehensive list of effect sizes for the remaining analyses.
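The relationship between the reported variances and intervals can be illustrated with a short calculation. The sketch below assumes that the population variance is the sample correlation variance minus the sampling error variance, and that the 95% interval is ρ plus or minus 1.96 times the square root of that population variance. Because the inputs are the rounded values reported in the text, the bounds only approximate the reported ones.

```python
import math


def interval_from_variances(rho, var_sample, var_error, z=1.96):
    """Interval around rho using the estimated population variance."""
    var_rho = var_sample - var_error
    half_width = z * math.sqrt(var_rho)
    return rho - half_width, rho + half_width


# Rounded values as reported for the comprehensive and sensitivity analyses.
full = interval_from_variances(0.20, 0.06, 0.01)
sensitivity = interval_from_variances(0.158, 0.043, 0.009)
print([round(x, 3) for x in full], [round(x, 3) for x in sensitivity])
```

Under these assumptions, both intervals span zero because the estimated variance in population correlations is large relative to the mean effect.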
3 Publication bias
The fail-safe N was found to be 6,496, which is much larger than the suggested minimum of 470 (i.e. 5k + 10). This suggests that our population effect size estimate is likely safe from publication bias.
4 Moderator analyses
Table 3 shows stem-and-leaf plots of the effect sizes for each of the age (children, young adults, older adults) and task type (verbal, non-verbal) combinations. Using the moderator syntax script for meta-analyses provided by Field and Gillett (2009, 2010), age did not significantly moderate the overall population effect size, χ²(2, N = 88) = 2.94, p = 0.23, nor did type of task, χ²(1, N = 88) = 2.48, p = 0.115. However, because of their theoretical significance and our a priori predictions, we followed these analyses up with conventional ANOVAs on the weighted Pearson r values for the independent variables. The one-way ANOVA on age revealed a significant effect, F(2, 85) = 5.01, p = 0.01, but the ANOVA on type of task did not, F(1, 86) = 1.97, p = 0.16. The effect of age is explained by the finding that children (ρ = 0.25) showed larger effect sizes than young adults (ρ = 0.03; t(54 11 ) = 3.56, p = 0.001) and older adults (ρ = 0.08; t(271) = 2.32, p = 0.03). Younger and older adult effect sizes did not differ from each other (p > 0.4). However, caution should be taken when interpreting the older adult results because only 8 of the 88 effect sizes were from the older adult population and are therefore less reliable.
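The follow-up ANOVA can be sketched as a conventional one-way F test across groups of effect sizes. This is a minimal illustration; the weighting applied to the r values in the study is not specified here, so the sketch treats its inputs as already weighted, and the example values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, means)
    )
    # Variation of values around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, means) for x in group
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical effect sizes for children, young adults, and older adults.
children = [0.30, 0.25, 0.20, 0.35]
young_adults = [0.05, 0.00, 0.02, 0.08]
older_adults = [0.10, 0.06]
print(one_way_anova([children, young_adults, older_adults]))
```

With 88 effect sizes in three age groups, the degrees of freedom are 3 − 1 = 2 and 88 − 3 = 85, as in the reported F(2, 85).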
Table 3. Stem-and-leaf plots for effect sizes broken down by age group and task type.
The next analysis involved whether studies that used verbal tasks were performed in the L1 or the L2 for bilinguals. This analysis involved 46 effect sizes (some of which were previously collapsed into one effect size; see Table 1 footnotes) from 18 independent studies. The analysis revealed a significant moderating effect on the population effect size, χ²(1, N = 46) = 12.00, p = 0.001. This is explained by the finding that when bilinguals did the WM task in their L1 (ρ = 0.28; N = 35) the effect size was much larger than when they did the task in their L2 (ρ = −0.11; N = 11), which was in the opposite direction.
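Subgroup estimates of this kind can be sketched by computing a sample-size-weighted mean r separately for each level of the moderator. This is a minimal illustration that assumes the same weighting as the main analysis; the function name and example values are hypothetical and do not reproduce the study data.

```python
def weighted_mean_r_by_level(effects):
    """effects: iterable of (level, r, n); returns weighted mean r per level."""
    totals = {}
    for level, r, n in effects:
        weighted_sum, total_n = totals.get(level, (0.0, 0))
        totals[level] = (weighted_sum + n * r, total_n + n)
    return {level: ws / tn for level, (ws, tn) in totals.items()}


# Hypothetical effect sizes tagged by the language of the verbal task.
effects = [("L1", 0.30, 40), ("L1", 0.20, 60), ("L2", -0.10, 50)]
print(weighted_mean_r_by_level(effects))
```

A negative subgroup mean, as for the hypothetical L2 level here, indicates lower scores for bilinguals than for monolinguals.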
IV Discussion
We performed a comprehensive meta-analysis to examine the possibility that bilinguals have greater WM capacity than monolinguals because of their extensive experience managing two languages that compete for selection. Results revealed a robust small to medium population effect size of 0.20 showing greater working memory capacity for bilinguals than monolinguals. Our interpretation is that competition between languages places demands on WM capacity and that this results in greater WM capacity over time.
A recent study performed a meta-analysis on the influence of working memory capacity on second language comprehension and production scores amongst bilinguals (Linck et al., 2014). They found a population effect size estimate of 0.255 amongst 79 independent studies. They interpreted these findings as suggesting that WM capacity leads to greater proficiency in a second language amongst bilinguals. However, correlation does not equal causation, and it is equally likely that their data reflect that amongst bilinguals, greater second language proficiency leads to enhanced WM capacity. The latter possibility is in line with our interpretation of increased WM capacity for bilinguals over monolinguals. The present study can shed light on this potential bidirectional relationship given that we examined the difference between monolinguals and bilinguals, whereas Linck and colleagues only examined bilinguals. Most bilinguals do not choose to be bilingual; rather, they are raised in bilingual environments or become bilingual out of life necessity (Bialystok and Poarch, 2014). Therefore, it is not likely that bilinguals in the present study became bilingual because of greater WM capacity, nor is it likely that monolinguals in the present study became so because of smaller WM capacity. Differences in executive functions between bilinguals and monolinguals who are well matched on other environmental factors, such as socioeconomic status and intelligence, are therefore not likely due to genetic or biological factors, but rather to their linguistic experiences. We propose that second language experience has a positive effect on WM capacity, both between language groups (bilingual vs. monolingual; present study) and within language groups (low proficiency vs. high proficiency bilinguals; Linck et al., 2014).
The hypothesis that age would have an influence on the bilingual advantage for WM performance was partly supported. Specifically, the bilingual advantage appeared to be strongest in children. We hypothesized that age might moderate the effect of bilingual status on WM performance in three possible ways:
Largest effects would be seen in children and older adults because of ceiling performance amongst young adults (Bialystok et al., 2005);
A linear increase from youngest to oldest would be seen due to increasing practice managing two languages across the lifespan; or
Children would show the largest effects because of more cognitive demand when first learning a new language.
The present results support the third interpretation, with greatest group differences amongst children. Learning a new language is very cognitively demanding and, because of how plastic the brain is in children relative to other age groups (Johnston, 2009), group differences in neural reorganization are more likely to appear, leading to observations of greater WM capacity for bilinguals.
It is interesting to note that type of task did not moderate the relationship between bilingual status and WM performance. It was possible that the relationship between bilingual status and WM performance would be moderated by whether or not a verbal task was used, due to subpar performance by bilinguals on tasks that required language processing. The present results did not support this conclusion, and instead support the idea that bilinguals generally outperform monolinguals on WM span tasks, regardless of the linguistic nature of the task. However, it is also possible that the WM tasks that we included in our analyses are not clearly defined as ‘verbal’ or ‘non-verbal’, and that this masks any group differences that one might expect. For example, the digit span task requires holding digits in mind and is considered to be a verbal task, but the actual linguistic demands are minimal and may not be enough to differentiate this task from non-verbal span tasks. The reading span task, on the other hand, requires that participants read sentences and remember the last word of each sentence. This requires much more linguistic processing and it is possible that bilinguals are disadvantaged on such tasks because of their poorer language processing. The digit span and the reading span tasks are clearly quite different with respect to how much they tax language processing centers, yet both are classified as ‘verbal’. It is possible that task type only moderates the bilingual advantage effect when more extreme language processing is required. The present study did not have the required number of observations to perform such an analysis, but future studies are encouraged to examine this possibility.
Another linguistic variable that was examined was whether the verbal tasks were performed in the L1 or the L2. Results revealed that when bilinguals performed the task in their L1, they performed better than monolinguals, but when they performed the task in their L2, they performed worse than monolinguals. This is in line with psycholinguistic studies showing slower lexical access for L2 compared to L1 production (for a review, see Kroll and Gollan, 2014). The L2 linguistic disadvantage likely masks the advantage of enhanced attentional WM processing for bilinguals. This exemplifies the importance of ensuring that both groups are performing the task in their dominant language; collapsing across this information masks important group effects.
There are a number of additional linguistic factors to take into consideration in the future that were not possible to analyze here due to lack of available data. For example, given Linck et al.’s (2014) findings, proficiency in the L2 likely influences the effect size between monolinguals and bilinguals, but studies often treat bilingualism/monolingualism as a dichotomous variable rather than a continuous one, even though bilingualism is not a categorical variable (Luk and Bialystok, 2013). Furthermore, proficiency is not always defined in the same way when it is reported. For example, some studies use objective measures of proficiency, and others use subjective self-reports, making comparisons across studies problematic. Although we did not have the available data to perform the analyses, we would expect that WM performance for high proficiency bilinguals would be even more distinct from monolinguals than low proficiency bilinguals; collapsing across these groups likely masks important group effects. Another factor worth investigating is whether bilinguals are bilingual by necessity or by choice. This often coincides with whether someone is an early or a late second language learner. Again, we did not have the available data for such an analysis, but future studies are encouraged to explore this further.
There is a large literature on how bilinguals outperform monolinguals on a range of executive function tasks (for a review, see Bialystok et al., 2012), but until now, the effects of bilingualism on WM capacity were unclear. Results from the present meta-analysis indicate that WM capacity can be added to the list of executive function advantages that bilinguals have over monolinguals.
Appendix 1
Equations used to convert F and t values to Pearson r correlation coefficients.
| Conversion | Formula |
|---|---|
| t to r | $r = \sqrt{t^2 / (t^2 + n - 2)}$, where n = sample size |
| F to r | $r = \sqrt{F / (F + df_{\text{error}})}$, where df = degrees of freedom in the error term |
Note. Conversion to r is not possible when there are more than two groups being compared.
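The two conversions can be sketched in code as follows. This is a minimal illustration of the equations in this appendix; the function names and example values are hypothetical. Because the square root returns only the magnitude of r, the sign must be assigned separately from the direction of the group difference.

```python
import math


def t_to_r(t, n):
    """Magnitude of r from a t-value and total sample size n (two groups)."""
    return math.sqrt(t**2 / (t**2 + n - 2))


def f_to_r(f, df_error):
    """Magnitude of r from an F-value and its error degrees of freedom."""
    return math.sqrt(f / (f + df_error))


# Hypothetical two-group comparison with 30 participants in total.
print(round(t_to_r(2.0, 30), 3), round(f_to_r(4.0, 28), 3))
```

For a two-group comparison, F equals t squared and the error degrees of freedom equal n − 2, so the two functions return the same value.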
Declaration of Conflicting Interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
