Abstract
How do faces with socio-cultural identity affect bilingual language control? We approach this question by examining switch cost patterns and the reversed language dominance effect, which are suggested to reflect bilingual language control mechanisms, in the absence (i.e., baseline context) or presence of faces with socio-cultural identity (Asian or Caucasian). In separate blocks, the face matched (i.e., congruent context) or mismatched (i.e., incongruent context) the language to be spoken. In addition, cue preparation time was manipulated to be long (Experiment 1) or short (Experiment 2). In both experiments, a unique asymmetric switch cost with larger costs for L2 was observed in the congruent context as compared with the baseline and incongruent contexts. Furthermore, the reversed language dominance effect was not modulated across contexts. These results suggest a critical role of contextual faces in modulating local but not global language control. Thus, bilingual language control changes flexibly within an environment that includes faces with socio-cultural identity.
Keywords
Introduction
In a multicultural and multilingual society, bilinguals often have to switch between the languages they speak. The bilingual language control system that they use to produce words in the intended language has been studied extensively. It has been shown that bilingual language control adapts flexibly depending on the proficiency of the bilingual (Meuter & Allport, 1999) or the language context one is in (Timmer, Christoffels, & Costa, 2018; Timmer, Grundy, & Bialystok, 2017b). However, what is the influence of the multicultural faces around us on bilingual language control? In daily life, we interact with different people and we know which languages they speak. The identity of a familiar interlocutor can therefore prime a specific language (Woumans et al., 2015). For example, at a party, a Chinese–English bilingual may select English to communicate with interlocutors with Caucasian faces but select Chinese with interlocutors with Asian faces (Li, Yang, Scherf, & Li, 2013). In the current study, we set out to investigate whether bilingual language control flexibly adapts to socio-cultural contexts based on the race of the faces of interlocutors.
Bilingual language control
Previous studies have shown that both languages are activated in parallel during bilingual speech production (Costa, Santesteban, & Caño, 2005; Kroll, Bobb, & Wodniecka, 2006). Therefore, to speak in the intended language, bilinguals need a mechanism to control cross-lingual activation. This control process is called bilingual language control (for reviews, see Abutalebi & Green, 2007; Declerck & Philipp, 2015a). A common paradigm used to investigate the underlying mechanism of language control is the language-switching task (Chang, Xie, Li, Wang, & Liu, 2016; Meuter & Allport, 1999; Timmer, Grundy, & Bialystok, 2017a), which distinguishes two loci of control: local control as measured by the switch cost and global control as measured by the mixing cost (De Bruin, Samuel, & Duñabeitia, 2018; Prior & Gollan, 2013; Roychoudhuri, Prasad, & Mishra, 2016) or the reversed language dominance effect (also called global L1 slowing) (Christoffels, Firk, & Schiller, 2007; De Groot & Christoffels, 2006; Timmer et al., 2018). The mixing cost and the reversed language dominance effect are two different indexes of global control. While the mixing cost refers to the performance difference between single-language blocks and repeat trials of a mixed-language block, the reversed language dominance effect refers to overall faster responses in the second than in the first language in a mixed-language block. In the current study, we used the reversed language dominance effect because most previous language-switching studies used this index (see Table 1a and 1b of Bobb & Wodniecka, 2013).
The switch cost is the difference in naming latencies or accuracy between trials on which the response language of two subsequent trials is the same (i.e., repeat trial) or different (i.e., switch trial), and it is considered to reflect a local level of language control (Christoffels et al., 2007; De Groot & Christoffels, 2006). The switch cost is often asymmetrical (i.e., L2-L1 switch costs are larger than L1-L2 switch costs) in unbalanced bilinguals (Meuter & Allport, 1999; Philipp, Gade, & Koch, 2007), and more symmetrical in balanced bilinguals (Costa & Santesteban, 2004; Linck, Schwieter, & Sunderman, 2012). Several accounts have been proposed, of which the Inhibitory Control (IC) Model is one of the most influential (Green, 1998). This view suggests that when a picture is named in one language, that language receives activation. This increases the threshold for accessing words from the other language. On a switch trial, re-activating the other language causes a delay (Timmer et al., 2017a). Asymmetric switch costs are due to a larger threshold to re-activate the dominant (L1) than the non-dominant (L2) language (Green, 1998).
The reversed language dominance effect refers to slower naming in the dominant L1 than in the weaker L2 in a language-switching context (Costa & Santesteban, 2004; Costa, Santesteban, & Ivanova, 2006; Gollan & Ferreira, 2009; Gollan, Schotter, Gomez, Murillo, & Rayner, 2014; Timmer et al., 2018; Wu, Kang, Ma, Gao, & Guo, 2018; but see Prior & Gollan, 2013). This effect is suggested to come about from exerting global control over all L1 representations when bilinguals need to mix their two languages, which helps them to produce words efficiently in both languages. In the literature, this effect has been considered to reflect a global level of language control (e.g., Bobb & Wodniecka, 2013; Christoffels et al., 2007; Kroll et al., 2006).
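The three indexes described above can be summarised as differences between condition means. The notation below is a sketch introduced here for illustration and is not taken from the source; $\overline{RT}(\cdot)$ is assumed to denote the mean naming latency in the named condition, and the same differences can be taken over accuracy.

```latex
\begin{align*}
\mathrm{SC}_{L1} &= \overline{RT}(\text{L2-L1 switch}) - \overline{RT}(\text{L1 repeat}) \\
\mathrm{SC}_{L2} &= \overline{RT}(\text{L1-L2 switch}) - \overline{RT}(\text{L2 repeat}) \\
\text{Asymmetry} &= \mathrm{SC}_{L1} - \mathrm{SC}_{L2} \\
\text{Mixing cost} &= \overline{RT}(\text{repeat, mixed block}) - \overline{RT}(\text{single-language block}) \\
\text{Reversed dominance} &= \overline{RT}(\text{L1, mixed block}) - \overline{RT}(\text{L2, mixed block})
\end{align*}
```

In this notation, the typical pattern for unbalanced bilinguals corresponds to a positive asymmetry (a larger cost when switching into L1), and a reversed language dominance effect corresponds to a positive value on the last line.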
Language switching with contextual faces
The adaptive control hypothesis (Abutalebi & Green, 2016; Green & Abutalebi, 2013) postulates that bilingual language control processes are flexibly modified to the type of context a bilingual is in. The hypothesis is supported by recent studies showing that the linguistic processing context is a critical factor that modulates bilingual language control (Olson, 2015; Timmer et al., 2018; for a review, see Timmer et al., 2017b). For example, Timmer and colleagues (2018) showed that language context could modulate both local and global language control. Dutch–English bilinguals performed a cued language-switching task in two different language contexts: L1 context (83% pictures had to be named in Dutch) and L2 context (83% had to be named in English). During the L1 context, there was a symmetric switch cost and global slowing of the L1; however, during the L2 context, an asymmetric switch cost with larger cost for L2 and no global slowing down of the L1 was found. This suggests that bilingual language control flexibly changes depending on the linguistic context. Similarly, other linguistic contexts, such as sentence context (Declerck & Philipp, 2015b) or grammatical structures (Gollan & Goldrick, 2016), also modulate bilingual language control.
It seems that the linguistic context modulates bilingual language control, but what about the influence of non-linguistic factors on language control processes? There seems to be conflicting evidence with some studies finding effects of non-linguistic factors on bilingual language control mechanisms (Liu et al., 2018; Zhang, Morris, Cheng, & Yap, 2013). For example, Liu and colleagues (2018) asked non-proficient Chinese–English bilinguals to name pictures in a conflicting context (i.e., name the colour of the printed words switching between L1 and L2, while the colours of the words were incongruent with the printed words) or non-conflicting context (i.e., name the colour of the non-colour words). The language switch cost and its level of asymmetry became larger in the conflicting context compared with the non-conflicting context, indicating that the non-linguistic processing context modifies the workings of the bilingual language control system (Liu et al., 2018). Furthermore, others found that cultural images (e.g., Great Wall vs. Statue of Liberty) hindered fluency in the opposite language (Zhang et al., 2013). However, although Roychoudhuri and colleagues (2016) also found that culturally iconic images interfered with naming pictures in the opposite language, they did not find that these cultural images modulated specific control mechanisms, like the mixing and switch costs. Thus, it is still unclear whether or how the non-linguistic processing context could modulate bilingual language control.
A non-linguistic factor that has been shown to be important in bilingual language processing is the effect of socio-cultural faces (for a review, see Hartsuiker, 2015). The visual cue of a socio-cultural face can stimulate language selection, for example, selecting Chinese when seeing an Asian face (Li et al., 2013). It can also interfere with language production when the face and the language to be spoken do not match (Zhang et al., 2013). For example, when Chinese–English bilinguals named pictures that were presented together with faces of a matching socio-cultural identity (e.g., viewing a Caucasian face while speaking English), naming latencies were facilitated compared with a baseline context with no faces. An incongruent face–language context (e.g., viewing a Chinese face while speaking English) did not disrupt naming compared with the baseline (Li et al., 2013). However, when Chinese immigrants were instructed to engage in a simulated dialogue in English while viewing a Chinese instead of a Caucasian face, their fluency in English was reduced (Zhang et al., 2013). In addition, other studies found that when we know what language an interlocutor speaks, this familiar face modulates our language activation towards the language spoken by the interlocutor (Molnar, Ibáñez-Molina, & Carreiras, 2015; Woumans et al., 2015). To conclude, contextual faces facilitate the selection of lexical representations of the language that matches the identity (i.e., socio-cultural or familiarity) of the presented face and might interfere when they do not match. In the present study, we take a step further and investigate how socio-cultural faces influence bilingual language control. In the lab, arbitrary cues (usually coloured line cues) are used to investigate bilingual language control during switching tasks. However, in daily life, facial cues are present that seem to cue us towards speaking in a specific language. Therefore, we investigate whether contextual faces modify language control in bilinguals.
The present study
In the present study, we investigate the effect of contextual faces on both the switch cost (i.e., local language control) and the reversed language dominance effect (i.e., global language control). In Experiment 1, non-proficient Chinese–English bilinguals performed a language-switching task in which pictures were named in either Chinese (L1) or English (L2). This was done in three contexts: (1) the baseline context, during which no faces were presented; (2) the congruent context, during which a face that was congruent with the language to be spoken was presented together with the cue (e.g., naming a picture in Chinese while seeing an Asian face); and (3) the incongruent context (e.g., naming a picture in Chinese while seeing a Caucasian face). If contextual faces affect bilingual language control, different patterns of switch costs or different reversed language dominance effects could be observed across the three contexts. Specifically, previous work has shown that faces with socio-cultural identity facilitate speech production when they match the language, and that this facilitation is stronger for the dominant language than the non-dominant language (Li et al., 2013). The bilinguals in the present study were more familiar with faces with an Asian background than faces with a Caucasian background. This has been shown by the so-called “own-race effect”, which is reflected by stronger activation for own-race than other-race faces. Therefore, the dominant Asian cultural cue (face) is primed more strongly and integrated more quickly with the language to be spoken (Li et al., 2013; Mathur, Harada, & Chiao, 2011). Based on this, we predict that switching back to the dominant Chinese language (with familiar faces) will be easier than switching back to the non-dominant language (with unfamiliar faces). Thus, we expect that the pattern of switch costs in the congruent context will be symmetric or even asymmetric with larger switch costs towards L2, in contrast to the other two contexts.
Furthermore, we did not expect that the reversed dominance effect, a measure of global control, would be modulated by context. A previous study by Roychoudhuri and colleagues (2016) showed that cultural context did not modulate the size of the mixing cost (i.e., another index of global language control, see Prior & Gollan, 2013).
In Experiment 2, we applied the same design, except that the cue presentation time was reduced to prevent advance preparation of the language in which the picture was to be named. We expected to find the same contextual results as in Experiment 1.
Experiment 1
Participants
Thirty undergraduate students from the South China Normal University were paid to participate in the experiment, which had been approved by the ethical committee of the local authority. All participants were right-handed with normal or corrected-to-normal vision, and they signed a written informed consent form prior to their participation. The participants were non-English major students, and their mean age of acquisition (AoA) was 8.00 (±2.50) for English. The participants rated their proficiency level in L1 (Chinese) and L2 (English) for listening, speaking, reading, and writing on a 7-point Likert-type scale, with 7 indicating the highest level of proficiency and 1 indicating the lowest level of proficiency. Paired-samples t tests revealed a significant difference between the proficiency ratings in L1 and L2 for all four language skills (all ts > 6.99, all ps < .001), suggesting that the participants were unbalanced bilinguals with a higher proficiency level in L1 than in L2 (see Table 1).
Means (and SDs) of AoA and proficiency ratings in four language skills for both Chinese and English in both experiments.
AoA = age of acquisition.
Materials
The pictures were 70 black-and-white drawings selected from the database of Zhang and Yang (2003), of which 10 were used in the practice phase. Attributes such as familiarity, visual complexity, and image agreement were matched according to the Chinese and English norm data from Zhang and Yang (2003) and Snodgrass and Vanderwart (1980), respectively.
Task and procedure
We used a picture-naming task adapted from the task used by Li et al. (2013). Before the experiment, the participants familiarised themselves with the pictures and their corresponding names in both Chinese and English. During the experiment, each trial began with a red or blue frame for 500 ms, followed by a picture of a person’s face together with the frame for another 500 ms. Then, a picture of an object appeared in the middle of the coloured frame. Participants were instructed to name the picture as quickly as possible, in Chinese if the frame was red or in English if the frame was blue. The picture remained on the screen until a response was given or until 3,000 ms had passed. The next trial began after a blank screen had been presented for 500 ms (see Figure 1). The face of the person holding the picture frame was either Asian or Caucasian (half male and half female faces in each block), and participants were told nothing about the faces during the experiment.

The trial procedure for the congruent (left panel) and the incongruent (right panel) contexts in Experiment 1.
Participants named all pictures in three blocks: a baseline block, a congruent block, and an incongruent block. In the baseline block, the participants named a picture in Chinese or English without the presentation of faces. In the congruent block, the participants named a picture in Chinese while seeing an Asian face or named a picture in English while seeing a Caucasian face. By contrast, in the incongruent block, the participants named a picture in Chinese while seeing a Caucasian face or named a picture in English while seeing an Asian face. The block order was counterbalanced across participants. In all three blocks, the ratio of switch-to-repeat trials was 1:1. Each block consisted of 61 trials, of which the first was a filler trial, so there were 15 trials for each trial type (i.e., L1 repeat, L2 repeat, L2-L1 switch, and L1-L2 switch). Before the experimental blocks, there was a practice block of 10 trials. A video recording software named “EV Capture” recorded the progress of the experiment, including the verbal responses made by the participants, which were checked for accuracy after the experiment.
Therefore, the experiment conforms to a 3 (contexts: baseline vs. congruent vs. incongruent) × 2 (language: L1 vs. L2) × 2 (transition: repetition vs. switch) within-subjects design, with reaction time (RT) and accuracy as the dependent variables.
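The block structure described above (one filler trial followed by 60 trials, 15 of each trial type) can be illustrated with a short sketch. The source does not say how trial orders were generated, so the run-based construction below, the names `make_block` and `count_trial_types`, and the choice of L1 for the filler trial are all assumptions made for illustration.

```python
import random
from collections import Counter

OTHER = {"L1": "L2", "L2": "L1"}

def make_block(n_per_type=15, start="L1", seed=None):
    """Return a language sequence: 1 filler trial + 4 * n_per_type trials.

    Hypothetical construction: 2 * n_per_type switches split the block into
    alternating same-language runs; repeat trials are then scattered at
    random over the runs of the matching language.
    """
    rng = random.Random(seed)
    n_runs = 2 * n_per_type + 1  # e.g. 30 switches separate 31 runs
    run_langs = [start if i % 2 == 0 else OTHER[start] for i in range(n_runs)]
    extra = [0] * n_runs         # number of repeat trials added to each run
    for lang in ("L1", "L2"):
        runs = [i for i, run_lang in enumerate(run_langs) if run_lang == lang]
        for _ in range(n_per_type):
            extra[rng.choice(runs)] += 1
    seq = []
    for lang, n_rep in zip(run_langs, extra):
        seq.extend([lang] * (1 + n_rep))
    return seq

def count_trial_types(seq):
    """Classify every trial after the filler by (transition, response language)."""
    counts = Counter()
    for prev, cur in zip(seq, seq[1:]):
        kind = "repeat" if prev == cur else "switch"
        counts[(kind, cur)] += 1
    return counts

block = make_block(seed=1)
print(len(block), dict(count_trial_types(block)))
```

This construction guarantees the 1:1 switch-to-repeat ratio and equal cell sizes, but it does not limit the length of same-language runs and does not sample all valid orders with equal probability; an actual experiment script may well impose further constraints.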
Results
The first trial of each block and the error trials (trials named incorrectly, trials named in the incorrect language, or trials without any response) were excluded from the RT analyses, as were trials following an error trial. We also discarded trials with RTs more than 2.5 SDs below or above the mean (per condition) (Wang, Fan, Liu, & Cai, 2016). Based on these criteria, 8.6% of the data (ranging from 4.2% to 11.6% for different conditions) were excluded as error trials and 4.6% of the data (ranging from 1.7% to 7.6% for different conditions) were excluded as outlier trials. Furthermore, three participants were excluded from the analysis due to an error rate higher than 25% (Liu, Jiao, Sun, & Wang, 2016).
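The exclusion steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' script: the trial representation (dicts with the hypothetical keys `block_pos`, `correct`, `condition`, and `rt`) is invented here, and the sketch assumes that the mean and SD are computed over the trials passed in, since the source only says "per condition".

```python
from collections import defaultdict
from statistics import mean, stdev

def trim_trials(trials, sd_cutoff=2.5):
    """Apply the RT exclusion criteria to one participant's trials of one block.

    `trials` is a list of dicts in presentation order with the hypothetical
    keys 'block_pos' (0 = first trial of the block), 'correct' (bool),
    'condition' (any hashable label) and 'rt' (ms).
    """
    kept = []
    prev_error = False
    for t in trials:
        is_error = not t["correct"]
        # Drop the first trial of the block, errors, and post-error trials.
        if t["block_pos"] > 0 and not is_error and not prev_error:
            kept.append(t)
        prev_error = is_error

    # Group the remaining trials by condition.
    by_condition = defaultdict(list)
    for t in kept:
        by_condition[t["condition"]].append(t)

    result = []
    for cond_trials in by_condition.values():
        rts = [t["rt"] for t in cond_trials]
        if len(rts) < 2:  # an SD cannot be computed from a single trial
            result.extend(cond_trials)
            continue
        m, s = mean(rts), stdev(rts)
        # Keep trials within sd_cutoff SDs of the condition mean.
        result.extend(t for t in cond_trials if abs(t["rt"] - m) <= sd_cutoff * s)
    return result
```

The returned list is grouped by condition and no longer in presentation order, which is harmless for computing condition means but would matter for any analysis that depends on trial sequence.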
Analyses were conducted using mixed-effects models with crossed random effects for subjects and items using the lme4 package (Bates, Maechler, Bolker, & Walker, 2014) and the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2014) in the statistical software R (version 3.4.3). Mixed-effects models are preferable to analysis of variance (ANOVA) because they allow random effects of participants and items to be considered simultaneously, which makes the data modelling more appropriate and the results generalisable to other subjects and items. We fit a mixed-effects model to the RT data, with contexts (congruent vs. incongruent vs. baseline), language (L1 vs. L2), transition (repetition vs. switch), and their interactions as fixed effects. As random effects, we included by-participant and by-item random intercepts, by-participant random slopes for contexts and language, and by-item random slopes for transition. Random slopes for the other factors and for the interaction among the three within-subject factors were excluded from the fitted model because they did not improve the model fit (ps > .05) (see Hsu & Novick, 2016; Huang, Zheng, Meng, & Snedeker, 2013). For this model, the context variable was dummy coded so that the baseline context served as the reference level to which the other levels were compared. All other variables were coded using mean-centred contrast coding (i.e., repetition = –0.5, switch = 0.5; L1 = –0.5, L2 = 0.5), yielding tests of the main effects directly analogous to those obtained from an ANOVA. Because not all effects are estimated in a single model when a variable has three or more levels, we refitted the model twice, with the congruent context and the incongruent context each serving as the reference level.
As shown in Tables 2 and 3, the RT data revealed that the effects of the Transition term were significant in all three contexts (t = 4.60, p < .001 in baseline context; t = 3.95, p < .001 in congruent context; t = 5.46, p < .001 in incongruent context), indicating that response latencies were significantly longer in switching trials than in repetition trials. There was also a main effect of Context (t = 2.51, p = .018), indicating slower response latencies for the incongruent context than the baseline context. Moreover, the marginally significant Transition × Language parameter in the congruent context (t = 1.72, p = .086) indicated that L1-L2 switch costs were slightly larger than L2-L1 switch costs (i.e., asymmetric switch costs). In contrast, the Transition × Language parameters in the baseline context (t = –0.07, p = .946) and the incongruent context (t = 0.32, p = .753) were non-significant (i.e., symmetric switch costs). However, all the three-way interactions were non-significant (ps > .05), indicating that the differences in patterns of (a)symmetric switch costs among the three contexts did not reach significance (see Figure 2).
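To make the coding scheme concrete, the fixed-effects part of the model with the baseline context as the reference level can be written out as follows. The symbols ($\beta$, $C$, $T$, $L$) are notation introduced here for illustration and do not appear in the source; random effects are omitted for brevity.

```latex
\begin{align*}
RT ={}& \beta_0 + \beta_1 C_{\mathrm{con}} + \beta_2 C_{\mathrm{inc}}
        + \beta_3 T + \beta_4 L + \beta_5 TL \\
      & + \beta_6 C_{\mathrm{con}} T + \beta_7 C_{\mathrm{inc}} T
        + \beta_8 C_{\mathrm{con}} L + \beta_9 C_{\mathrm{inc}} L \\
      & + \beta_{10} C_{\mathrm{con}} TL + \beta_{11} C_{\mathrm{inc}} TL + \varepsilon, \\
& C_{\mathrm{con}}, C_{\mathrm{inc}} \in \{0, 1\}, \qquad T, L \in \{-0.5, +0.5\}.
\end{align*}

% In the baseline context (C_con = C_inc = 0), the predicted switch cost
% for a given language L is
\begin{align*}
RT(T = +0.5) - RT(T = -0.5) &= \beta_3 + \beta_5 L, \\
\text{switch cost into L1 } (L = -0.5) &= \beta_3 - \tfrac{1}{2}\beta_5, \\
\text{switch cost into L2 } (L = +0.5) &= \beta_3 + \tfrac{1}{2}\beta_5 .
\end{align*}
```

Under this reading, $\beta_3$ is the baseline switch cost averaged over the two languages, $\beta_4$ is the baseline L2 minus L1 difference averaged over transitions (negative when L1 is slower), and $\beta_5$ is the switch cost into L2 minus the switch cost into L1. The three-way terms $\beta_{10}$ and $\beta_{11}$ express how that asymmetry changes in the congruent and incongruent contexts relative to baseline. Refitting with another context as the reference level makes $\beta_3$, $\beta_4$, and $\beta_5$ describe that context instead, which is consistent with one Transition, Language, and Transition × Language test being reported per context.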
Mean RTs and accuracy for all three contexts in Experiment 1 (standard deviations in parentheses).
RT = reaction time.
Mixed-effects model for RTs in Experiment 1.
RT = reaction time; SE = standard error.
*p < 0.05, **p < 0.01, ***p < 0.001.

Switch costs (a) and reversed language dominance effect (b) in RTs (left panel) and accuracy (right panel) for all three contexts in Experiment 1. Error bars represent standard errors.
There were significant effects of the Language term, reflecting slower naming in L1 than in L2 (i.e., a reversed language dominance effect). The Language term did not interact with Context, indicating the same reversed language dominance effect in all three contexts (t = –2.57, p = .012 in baseline context; t = –3.11, p = .002 in congruent context; t = –3.82, p < .001 in incongruent context). This suggests that the reversed language dominance effect is not modulated by the facial contexts.
Similarly, a logistic mixed-effects model was fitted to the accuracy data, with the same fixed-effects structure as in the linear mixed-effects model for RT. However, for random effects, we included only by-participant random slopes for contexts and by-participant and by-item random intercepts. The results showed that the effects of the Transition term were significant in the congruent context (z = –2.95, p = .003) and the incongruent context (z = –3.96, p < .001), but not in the baseline context (z = –1.50, p = .134), suggesting that the participants performed worse in switching trials than in repetition trials only in the congruent and incongruent contexts (see Table 2). However, the Transition × Language interactions in all three contexts and the three-way interactions were non-significant (ps > .05, see Table 4 and Figure 2), indicating that the differences in patterns of (a)symmetric switch costs among the three contexts did not reach significance. Moreover, similar to the results for RTs, the Language term was statistically significant and did not interact with Context. Thus, the same reversed language dominance effect was found in all three contexts for accuracy as well (z = 2.32, p = .021 in baseline context; z = 2.67, p = .008 in congruent context; z = 2.64, p = .008 in incongruent context), which suggests that the reversed language dominance effect is not modulated by the facial contexts. It should be noted that accuracy data usually show fewer effects than RTs (e.g., Liu et al., 2018; Wu et al., 2018), as is the case in the current study. This is most likely due to ceiling effects, as participants were on average 95.5% correct.
Mixed-effects model for accuracy in Experiment 1.
SE = standard error.
*p < 0.05, **p < 0.01, ***p < 0.001.
Discussion
The results of Experiment 1 showed a symmetric switch cost in both the baseline and the incongruent contexts, but an asymmetric switch cost in the congruent context. This asymmetry in the congruent context reflected a larger switch cost for the L2 than the L1, which contrasts with previous studies that have shown larger costs for the L1 without the presence of pictures (Meuter & Allport, 1999; Philipp et al., 2007). Specifically, the switch cost for the L1 appears to be reduced from 85 ms at baseline to 7 ms in the congruent context. In other words, an Asian facial cue, with which the participants were familiar from their daily life, enhanced the activation of their first language after they had named in their second language. Thus, a face of a familiar race helps people to switch back into their dominant language. Previous studies have shown stronger facilitation of face race for the first language (i.e., “own-race effect”; see Li et al., 2013; Mathur et al., 2011). This suggests that when the socio-cultural identity of the faces matches the language to be spoken, local language control (i.e., the pattern of language switch cost) is adapted. However, this unique switch cost pattern in the congruent context was observed only in RTs and not in accuracy, which might be due to ceiling effects as participants had high accuracy. In addition, we observed the same reversed language dominance effect for both RTs and accuracy across the three contexts. This indicates that contextual faces did not affect global language control.
While the findings of Experiment 1 showed an asymmetric switch cost in the congruent context and symmetric switch costs in the other two contexts, the absence of a three-way interaction indicated that the differences in patterns of (a)symmetric switch costs among the three contexts did not reach significance. Numerically, however, the pattern clearly differs across the three contexts. One possible cause of the non-significant interactions is that participants saw the face cue for a full second before the target picture appeared (i.e., a long cue-stimulus interval). Therefore, participants had a long time to integrate the facial cues with the languages, which might have diminished the effect of facial context. Previous findings showed that preparation time can play a critical role in modulating the asymmetry of the language switch cost, with asymmetric switch costs at short preparation times and symmetric switch costs at longer preparation times (Ma, Li, & Guo, 2016; Verhoef, Roelofs, & Chwilla, 2009). Therefore, it would be important to manipulate the cue-stimulus interval and explore the effect of facial context with a shorter cue-stimulus interval.
Moreover, the symmetric switch cost observed here in the baseline and incongruent contexts has not always been found. Usually, asymmetric switch costs (i.e., larger for L1) are found for unbalanced bilinguals (Philipp et al., 2007; Schwieter & Sunderman, 2008), although others have demonstrated symmetric switch costs (for reviews, see Bobb & Wodniecka, 2013; Declerck & Philipp, 2015a). We suspected that this might have arisen from the long cue-stimulus interval (1,000 ms) in the current experiment.
Overall, to exclude the confounding effect of preparation time, we conducted Experiment 2 with a short cue-stimulus interval to further investigate the facial context effect.
Experiment 2
Participants
Thirty undergraduate students from the South China Normal University were paid to participate in the experiment. None had participated in Experiment 1. All participants signed the written informed consent form, and the study had previously been approved by the local authority. The participants were right-handed, had normal or corrected-to-normal vision, and had no reported psychological conditions. They were non-English major students, and all completed the self-rating questionnaire used in Experiment 1. The mean proficiency ratings of L1 and L2 in listening, speaking, reading, and writing are shown in Table 1. Paired-samples t tests revealed a significant difference between the proficiency ratings of L1 and L2 for all four skills (all ts > 5.11, all ps < .001), suggesting that the participants were unbalanced bilinguals with higher proficiency in L1 than in L2.
Materials
The materials used in Experiment 2 were the same as those used in Experiment 1.
Task and procedure
The task and procedure for Experiment 2 were identical to those for Experiment 1 except that each trial began with a fixation instead of a red or blue frame (see Figure 3).

The trial procedure for the congruent (left panel) and the incongruent (right panel) contexts in Experiment 2.
Results
We used outlier criteria and error definitions identical to those in Experiment 1, which resulted in the exclusion of 13.1% of the data (ranging from 6.1% to 14.8% for different conditions) for error trials and 3.4% of the data (ranging from 1.1% to 6.3% for different conditions) for outlier trials. Two participants were excluded from analysis for having an error rate higher than 25%. The mean RTs and accuracy are presented in Table 5.
Mean RTs and accuracy for three contexts in Experiment 2 (standard deviations in parentheses).
RT = reaction time.
Analyses were carried out using R 3.4.3 with the lme4 package (Bates et al., 2014) and the lmerTest package (Kuznetsova et al., 2014). We fit mixed-effects models that included contexts (congruent vs. incongruent vs. baseline), language (L1 vs. L2), transition (repetition vs. switch), and their interactions as fixed effects. The model also included random intercepts for subjects and items, by-participant random slopes for contexts, language, and transition, and by-item random slopes for transition. The other random slopes were excluded from further analyses because their inclusion did not improve overall fit (see Hsu & Novick, 2016; Huang et al., 2013). The coding of the three variables was the same as in Experiment 1.
As shown in Table 6, the RT data revealed that the effects of the Transition term were significant in all three contexts (t = 3.58, p < .001 in baseline context; t = 4.67, p < .001 in congruent context; t = 5.11, p < .001 in incongruent context), indicating that response latencies were significantly longer in switching trials than in repetition trials. There was also a main effect of Context (t = 4.67, p < .001), indicating slower response latencies for the incongruent than the congruent context. However, the reversed language dominance effect observed in Experiment 1 was not found, as reflected by the non-significant Language term (t = 0.22, p = .827 in baseline context; t = –0.82, p = .411 in congruent context; t = –0.80, p = .427 in incongruent context). Nor did the Language term interact with Context. Critically, the marginally significant Transition × Language parameter in the congruent context (t = 1.82, p = .069) indicated that L1-L2 switch costs were slightly larger than L2-L1 switch costs (i.e., asymmetric switch costs). In contrast, the Transition × Language parameters in the baseline context (t = –1.84, p = .066) and the incongruent context (t = –1.04, p = .301) indicated that L2-L1 switch costs were numerically larger than L1-L2 switch costs, although these differences did not reach significance. More importantly, the different patterns of asymmetry of switch costs in the congruent context as compared with the other two contexts were confirmed by the significant Congruent × Transition × Language parameter (t = 2.78, p = .005) and Incongruent × Transition × Language parameter (t = –2.20, p = .028). In addition, a further, non-significant Incongruent × Transition × Language parameter (t = 0.59, p = .557), obtained from the model refitted with a different reference level, indicated that the baseline and incongruent contexts showed similar switch cost patterns (Figure 4).
Mixed-effects model for RTs in Experiment 2.
RT = reaction time; SE = standard error.
*p < 0.05, **p < 0.01, ***p < 0.001.

Switch costs (a) and reversed language dominance effect (b) in RTs (left panel) and accuracy (right panel) for all three contexts in Experiment 2. Error bars represent standard errors.
A logistic mixed-effects model was also fitted to the accuracy data, with the same fixed-effects structure as in the linear mixed-effects model for RT. However, we included only by-participant and by-item random intercepts as random effects. Only the effect of Transition in the incongruent context was significant (z = –2.55, p = .011) (see Table 7). Neither the effects of Language (z = 0.85, p = .396 in baseline context; z = –1.16, p = .245 in congruent context; z = 1.06, p = .290 in incongruent context) nor the interactions (ps > .05) were significant. In line with Experiment 1 and previous studies (Liu et al., 2018; Wu et al., 2018), we found only a few significant results for accuracy, which is most likely due to ceiling effects, as participants were on average 93.3% correct.
Table 7. Mixed-effects model for accuracy in Experiment 2.
SE = standard error.
*p < 0.05, **p < 0.01, ***p < 0.001.
In addition, to directly illustrate how preparation time modulates language control in the different contexts, we fitted mixed-effects models that included experiment (Experiment 1 vs. Experiment 2), context (congruent vs. incongruent vs. baseline), language (L1 vs. L2), transition (repetition vs. switch), and their interactions as fixed effects. The models also included random intercepts for participants and items, by-participant random slopes for context, language, and transition, and by-item random slopes for transition. The other random slopes were excluded from further analyses because their inclusion did not improve overall fit (see Hsu & Novick, 2016; Huang et al., 2013).
The Experiment × Transition × Language effect was significant in the baseline context (t = –2.00, p = .046) but not in the incongruent context (t = –1.41, p = .158). This suggests that, compared with the longer preparation time in Experiment 1, the asymmetry of switch costs was larger with the shorter preparation time in Experiment 2 in the baseline context, and numerically larger in the incongruent context. However, the effect of Experiment × Transition × Language in the congruent context was not significant (t = –0.22, p = .823). Moreover, the effects of Experiment × Language were significant in all three contexts (t = 2.16, p = .033 in baseline context; t = 2.17, p = .032 in congruent context; t = 2.80, p = .006 in incongruent context), indicating that the reversed language dominance effects with the shorter preparation time in Experiment 2 differed significantly from those with the longer preparation time in Experiment 1.
Discussion
Three important results were obtained in Experiment 2. First, with the short cue-stimulus interval, we replicated the typical asymmetrical language switch cost (i.e., larger for the L1) in our baseline context, as reported by previous studies (Meuter & Allport, 1999; Philipp et al., 2007). Combined with the symmetrical switch costs observed with the long cue-stimulus interval in Experiment 1, this confirms the previous finding that preparation time plays a critical role in modulating language switch costs (Ma et al., 2016; Verhoef et al., 2009). Second, we replicated the contextual faces effect found in Experiment 1: the congruent context showed an asymmetry in switch costs that was reversed relative to the baseline and incongruent contexts. Third, the reversed language dominance effects observed in Experiment 1 were all absent in Experiment 2, which suggests that the reversed language dominance effect can be modulated by preparation time.
General discussion
We investigated whether language control in bilingual speakers is influenced by contextual faces. Unbalanced Chinese–English bilinguals performed a language-switching task in three different contexts: congruent (e.g., Asian face when naming in Chinese), incongruent (e.g., Asian face when naming in English), and baseline (i.e., no face). The results revealed that contextual faces affected local language control when the face identity matched the language (i.e., a change in the switch cost pattern), but did not affect global language control (i.e., the reversed language dominance effect). We also showed that preparation time affected both local and global language control.
The influence of contextual faces on language control
The main finding of the current study was the difference in switch costs across the three contexts in both experiments. First, a unique pattern of language switch costs was found in the congruent context as compared with the baseline context. In the congruent context, we found an asymmetrical switch cost, with a larger cost for the L2 than the L1, whereas the baseline context showed an asymmetric switch cost in the opposite direction in Experiment 2 (and a symmetric switch cost in Experiment 1). How can language control models accommodate this observation? The most likely explanation is that the activation thresholds for accessing the lexical representations of the two languages are altered by the presence of socio-cultural facial cues. Within a mixed-language context, the IC account has suggested that the stronger L1 needs to be inhibited more than the weaker L2. When speaking in your L2, lexical representations from the L1 are inhibited, making it harder to access L1 words. When switching back to the L1, this language needs to be re-activated, creating a delay in naming (Green, 1998). When an Asian face is presented on a switch trial towards Chinese, the lexical representations for Chinese are already activated by the face, making it easier to re-activate the strongly suppressed L1 and to pass the threshold for starting speech production. This reduces the switch cost for L1 compared with L2. Previous studies have shown that within a single-language context, faces with socio-cultural identity (Li et al., 2013) or familiar faces (Woumans et al., 2015) facilitated speech production when they matched the language to be spoken. This facilitation was stronger for the dominant first language than for the non-dominant second language (Li et al., 2013). Consistent with the “own-race effect,” our bilinguals are more familiar with faces of Asian background, and these faces facilitate switching back to their dominant language, Chinese (Mathur et al., 2011).
The no-face context is how language control is usually tested in the lab. However, in daily life, we are surrounded by faces that help us decide in which language to speak. Therefore, the actual language-switching control pattern in daily life might be more like that shown in our congruent context than the patterns previously reported in the literature. A reversed asymmetric switch cost has been found before in the literature when bilinguals were mainly speaking in their non-dominant second language (Timmer et al., 2018) or when bilinguals were given more preparation time so that both of their languages could be activated to the same degree (Ma et al., 2016).
Second, the switch cost pattern in the incongruent context was the same as in the baseline context. Specifically, both contexts showed a symmetrical switch cost in Experiment 1 and an asymmetrical switch cost (larger to L1) in Experiment 2. Although incongruent faces did not affect local control, we found overall slower responses in the incongruent context than in the baseline context in Experiment 1. This suggests that a face whose identity is incongruent with the language to be spoken has a general interference effect, that is, a slowdown in all conditions compared with the baseline. This is in line with previous studies showing that incongruent faces can impede speech production (Zhang et al., 2013). In a blocked-language condition, Li and colleagues (2013) did not find interference from incongruent faces. Our results therefore suggest that face interference most likely only takes place within the more difficult task of language switching, where the relationship between the cue/face and the language to be spoken must be re-evaluated on each trial.
Next, the reversed language dominance effect did not change across the different facial contexts in either Experiment 1 or Experiment 2. Therefore, the global level of language control was not adjusted based on socio-cultural face cues. It has been suggested that global slowing of L1 is adopted within a language-switching environment to facilitate efficient performance in both the stronger and the weaker language (Costa & Santesteban, 2004; Gollan & Ferreira, 2009). It seems that non-linguistic facial cues do not affect the relative activation of each language at a global level, but only modulate language activation on a trial-by-trial basis. This is consistent with the study by Roychoudhuri and colleagues (2016), in which cultural cues (i.e., one type of non-linguistic context) had no effect on mixing costs (i.e., another index of global language control). Whereas Roychoudhuri et al. (2016) found no effects of non-linguistic context (i.e., cultural cues) on either local or global language control, the current study shows that non-linguistic contextual faces influenced local but not global language control.
In summary, our findings support and extend the adaptive control hypothesis, in which contexts modulate bilingual language control (Green & Abutalebi, 2013). The adaptive control hypothesis proposes that the different interactional contexts of conversational exchanges place different levels of demand on the brain and cognitive systems, which adaptively alter language control. The present study (both Experiments 1 and 2) investigated the performance of bilinguals in different non-linguistic contexts and observed different patterns of language switch costs across contexts. We propose that, in addition to the linguistic context, the non-linguistic context should also be incorporated into the adaptive control hypothesis.
Modulation of language control by preparation time
Preparation time plays a critical role in modulating local language control (i.e., switch costs) (e.g., Declerck, Philipp, & Koch, 2013; Khateb, Shamshoum, & Prior, 2017; Mosca & Clahsen, 2015). For example, Verhoef et al. (2009) observed a symmetric switch cost with long preparation but an asymmetric switch cost with short preparation. The current study replicated this finding: a symmetric switch cost in Experiment 1 (1,000 ms preparation time) and an asymmetric switch cost in Experiment 2 (500 ms preparation time). A longer preparation time helps overcome the stronger inhibition on the L1 than L2, resulting in symmetrical switch costs for both languages (Ma et al., 2016). However, preparation effects on switch costs have not been observed in all studies. We argue that this might be because such effects are modulated by variables such as the language proficiency of the participants. For example, whereas Costa and Santesteban (2004) found no effects of preparation time when manipulating stimulus-onset asynchrony (SOA) with highly proficient bilinguals, we found such an effect with low-proficiency bilinguals. Moreover, we found that preparation time could also modulate global language control.
In the current study, the reversed language dominance effect observed in Experiment 1 with the longer preparation time was absent in Experiment 2 with the shorter preparation time. Although global slowing of L1 has often been found, some studies do not show it (Linck et al., 2012; Prior & Gollan, 2013) or only find relative global slowing of L1 in the form of mixing costs (De Bruin, Roelofs, Dijkstra, & FitzPatrick, 2014). In the current study, we see that with a longer preparation time, a global level of control is applied (i.e., reversed dominance effect), but with a shorter preparation time, the level of control becomes local on a trial-by-trial basis (i.e., asymmetric switch cost). Changes in the locus of control have also been found depending on whether bilinguals are mainly talking in their dominant language (global level of control) or in their non-dominant second language (local level of control) (Timmer et al., 2017b, 2018). Thus, the locus of bilingual language control appears to be flexibly adjusted depending on the amount of preparation time, although further research is necessary to examine how preparation time modulates global language control.
Implications for bilingual language control model
While existing evidence has clearly confirmed that the functioning of the bilingual language control system is influenced by language proficiency and the individual capacity for cognitive control (Green, 1998; Liu, Liang, Dunlap, Fan, & Chen, 2016; Liu, Liang, Zhang, Lu, & Chen, 2017), the more recent adaptive control hypothesis proposes that language control can also change depending on the processing context, such as socio-cultural faces (Green & Abutalebi, 2013). From this perspective, the current study shows that a non-linguistic context, namely contextual faces, affects local but not global bilingual language control. Moreover, combining our findings with recent evidence that linguistic contexts such as language context (e.g., Olson, 2015; Timmer et al., 2018) or sentence context (e.g., Declerck & Philipp, 2015b) can shape language control, we argue that both linguistic and non-linguistic contexts can flexibly change the workings of the language control system.
Conclusion
In sum, the present study revealed different patterns of language switch costs but similar reversed language dominance effects across contexts that varied in the socio-cultural identity of faces. This leads to the conclusion that contextual faces play a critical role in modulating local but not global bilingual language control.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This work was supported by the Foundation for innovation team in Guangdong Higher Education (2016WCXTD006), Funding for Key Laboratory for Social Sciences of Guangdong Province (2015WSY009), and the Guangdong Province Universities and colleges Pearl River Younger Scholar Funded Scheme (2016). Kalinka Timmer was supported by postdoctoral funding from the Dutch Organization for Scientific Research (NWO) with the Rubicon grant (446-14-006) and from the Ministerio de Economía, Industria y Competitividad (MINECO) in Spain with the Juan de la Cierva grant (IJCI-2016-28564).
