Abstract
It is well known that multilingual speakers’ nonnative productions are accented. Do these deviations from monolingual productions simply reflect the mislearning of nonnative sound categories, or can difficulties in processing speech sounds also contribute to a speaker’s accent? Such difficulties are predicted by interactive theories of production, which propose that nontarget representations, partially activated during lexical access, influence phonetic processing. We examined this possibility using language switching, a task that is well known to disrupt multilingual speech production. We found that these disruptions extend to the articulation of individual speech sounds. When native Spanish speakers are required to unexpectedly switch the language of production between Spanish and English, their speech becomes more accented than when they do not switch languages (particularly for cognate targets). These findings suggest that accents reflect not only difficulty in acquiring second-language speech sounds but also the influence of representations partially activated during on-line speech processing.
The nonnative productions of multilingual speakers are accented; relative to monolingual speakers, their productions deviate toward the phonetic properties of their first language. Such deviations reflect, in part, the mislearning of nonnative sound categories (Flege, 1987). Can difficulties in speech-sound processing also contribute to a speaker’s accent? We examined this possibility using language switching. Proficient multilingual speakers can readily switch between languages—producing language-appropriate words (e.g., Spanish casa vs. English house) and speech sounds (e.g., pronouncing the first vowel in taxi as /a/ vs. /æ/ when switching between Spanish and English). However, switching is challenging, and it can result in disruptions to speech production (Meuter & Allport, 1999). Do these disruptions extend to phonetic processing, causing a bilingual speaker’s native language to further contaminate nonnative productions?
Such effects are predicted by interactive theories of language production. Disruptions to lexical access decrease target activation or increase the activation of competitors. In interactive theories, these on-line reductions in relative target activation disrupt phonetic processing on a trial-by-trial basis—a prediction confirmed by studies of monolingual speakers (Goldrick & Chu, 2013; Kello, Plaut, & MacWhinney, 2000). Do disruptions due to language switching lead to similar effects? These theories predict that trial-to-trial variations in the degree to which native-language representations are activated should influence the degree to which nonnative productions are accented.
Previous work has shown that the degree to which native-language properties intrude into nonnative productions varies across contexts. Accents increase in mixed-language contexts (e.g., sentences with Greek and English words) compared with single-language contexts (e.g., entirely English sentences; Antoniou, Best, Tyler, & Kroos, 2011; Bullock, Toribio, González, & Dalola, 2006; Flege, 1991). However, it is unclear whether accents can also shift within mixed-language contexts. One study found that when mixed-language sentences are read aloud, accents increase at points where speakers plan to switch languages (Bullock et al., 2006); however, two other studies failed to show systematic effects (Grosjean & Miller, 1994; López, 2012).
In the study reported here, we extended this work by examining how trial-specific processing disruptions due to unplanned switching influence speakers’ accents (see also Olson, 2013). Native Spanish speakers named pictures with a colored frame indicating Spanish or English as the response language (adapted from Meuter & Allport, 1999). If switching modulates a speaker’s accent, we expected participants to show a greater accent on switch trials (in which the language of production differed from that on the previous trial) than on stay trials (in which the language of production was the same as on the previous trial).
We indexed accents using the contrast between voiced (/d/) and voiceless (/t/) sounds. Acoustically, these sounds are distinguished by voice-onset time (VOT; the time between the release of the consonant’s constriction and the onset of periodicity signaling modal vocal-fold vibration). In Spanish, vocal-fold vibration typically starts before the release of voiced stops’ constrictions; voiceless consonants are produced with a short positive lag between constriction release and vocal-fold vibration. In contrast, in English, the voiced versus voiceless contrast is realized by a short and long positive lag, respectively (Lisker & Abramson, 1964). The conflicting realizations of this contrast are reflected in native Spanish speakers’ English productions. These speakers deviate toward the phonetic properties of Spanish, producing shorter and more prevoiced VOTs than monolingual English speakers (Flege, 1991). We examined whether this effect is enhanced by the demands of unexpectedly switching languages.
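The three phonetic categories described above (prevoicing, short lag, and long lag) can be illustrated with a minimal sketch. The function name `classify_vot`, the 30-ms boundary between short and long lag, and the example VOT values are all assumptions chosen for illustration; the study does not report a numeric category boundary.

```python
def classify_vot(vot_ms: float, long_lag_boundary_ms: float = 30.0) -> str:
    """Label a stop consonant by its voice-onset time (VOT) in milliseconds.

    A negative VOT means vocal-fold vibration began before the release
    (prevoicing). The 30-ms default boundary is a hypothetical value used
    only for illustration, not a value taken from the study.
    """
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms < long_lag_boundary_ms:
        return "short lag"
    return "long lag"


# Prototypical realizations described in the text (VOT values are invented).
examples = {
    "Spanish /d/": -80.0,  # voicing starts before the release
    "Spanish /t/": 15.0,   # short positive lag
    "English /d/": 15.0,   # short positive lag
    "English /t/": 70.0,   # long positive lag
}
for label, vot in examples.items():
    print(label, "->", classify_vot(vot))
```

Under these assumptions, Spanish /t/ and English /d/ fall into the same short-lag category, which is why the two languages' realizations of the voicing contrast conflict.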
Method
Participants
Ten native Spanish speakers from Barcelona, Spain, participated. All speakers had some knowledge of Catalan (which has the same voicing contrast as Spanish; Recasens & Mira, 2013). These speakers began learning English in childhood (mean age of onset of English learning = 6.8 years, range = 3–11). Proficiency scores in the three languages (reported in Table 1, along with participants’ age) were obtained through a questionnaire filled out by the participants after the experiment (see also Runnqvist & Costa, 2012; Runnqvist, Strijkers, Alario, & Costa, 2012). Proficiency was scored on a 4-point scale (4 = native-speaker level, 3 = advanced level, 2 = medium level, and 1 = low level). The self-assessment index represents the average of the participants’ responses in four domains (speech comprehension, speech production, reading, and writing).
Table 1. Participants’ Age and Self-Rated Proficiency in Spanish, Catalan, and English (1 = Low Level, 4 = Native-Speaker Level)
Materials and procedure
The 32 target items were evenly split between words with voiced (e.g., /d/; door, dinero) and voiceless (e.g., /t/; tent, toro) initial consonants. Within each language, words were evenly split between cognate and noncognate targets and evenly distributed across four vowels (/ɛ/, /i/, /o/, and /a/ in Spanish; /ɛ/, /ɪ/, /oʊ/, and /æ/ in English; Table A1 in the Appendix provides a full list of stimuli). A set of 24 additional pictures served as fillers.
Each trial consisted of a picture that participants had to name in either Spanish or English (the response language was indicated by the color of the frame around the picture). Pictures were presented in short sequences that varied unpredictably in length from 5 to 14 trials. Each sequence contained at least one switch between languages. For experimental trials, the response language was either the same as (stay trials) or different from (switch trials) that on the previous trial. Additionally, the target name could have the same initial phoneme as the name on the previous trial (related trials) or a different one (unrelated trials). Each of the 16 targets for each language was presented twice in each of these four contexts (switch vs. stay × related vs. unrelated). This yielded a total of 128 trials in each language; these 256 experimental trials were embedded within a total of 832 trials.
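The trial counts in this design follow from simple arithmetic, which the sketch below reproduces. The variable names are hypothetical, and the label "non-experimental" for the remaining trials is an assumption; the text states only that the 256 experimental trials were embedded within 832 trials in total.

```python
from itertools import product

TARGETS_PER_LANGUAGE = 16
REPETITIONS_PER_CONTEXT = 2
TOTAL_TRIALS = 832  # reported total, including all non-experimental trials

languages = ["Spanish", "English"]
trial_types = ["stay", "switch"]
relatedness = ["related", "unrelated"]

# The four contexts: trial type crossed with phonological relatedness.
contexts = list(product(trial_types, relatedness))

trials_per_language = TARGETS_PER_LANGUAGE * REPETITIONS_PER_CONTEXT * len(contexts)
experimental_trials = trials_per_language * len(languages)
non_experimental_trials = TOTAL_TRIALS - experimental_trials

print(len(contexts))             # 4
print(trials_per_language)       # 128
print(experimental_trials)       # 256
print(non_experimental_trials)   # 576
```

With 10 participants, 256 experimental trials per participant gives the 2,560 trials that enter the acoustic analysis.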
Before the experiment proper, participants were familiarized with the picture names. Each picture was presented with its English and Spanish labels for 4 s. Participants then received both written and oral instructions and were asked to name each picture as rapidly and accurately as possible. Each trial consisted of the presentation of a blank screen (300 ms), a fixation mark (300 ms), another blank screen (300 ms), and a picture with a colored frame indicating the response language (maximum of 1,750 ms). Pictures disappeared from the computer screen once a response was detected. The experiment was administered on PCs running DMDX software (Forster & Forster, 2003), which also recorded the vocal responses.
Acoustic analysis
Trials with production errors (identified by a proficient Spanish-English speaker) and recording errors were excluded (6.8% of all trials; N = 2,560). Two coders naive to the purpose of the study performed the acoustic analysis. VOT was defined as the time from the burst to the onset of periodicity. Prevoicing was defined as cases in which periodicity began prior to the burst. Recordings were randomly assigned to the coders. To assess intercoder reliability, we selected 99 tokens from across several talkers. The coders’ VOT measurements on these tokens were strongly positively correlated, r(97) = .87.
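As a rough illustration of these definitions, the sketch below computes VOT from two annotated time points and correlates the measurements of two coders. The function names and the annotation times are invented for illustration; this is not the coders' actual procedure or data.

```python
import math


def vot_ms(burst_s: float, voicing_onset_s: float) -> float:
    """VOT in ms: onset of periodicity minus burst time (negative = prevoiced)."""
    return (voicing_onset_s - burst_s) * 1000.0


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of measurements."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_df(n_tokens: int) -> int:
    """Degrees of freedom conventionally reported with r, i.e. n - 2."""
    return n_tokens - 2


# Invented (burst, voicing onset) times in seconds from two hypothetical coders.
coder_a = [vot_ms(b, v) for b, v in [(0.100, 0.085), (0.200, 0.215), (0.150, 0.220)]]
coder_b = [vot_ms(b, v) for b, v in [(0.101, 0.083), (0.199, 0.218), (0.152, 0.219)]]

print(round(pearson_r(coder_a, coder_b), 3))
print(correlation_df(99))  # 97, matching the reported r(97)
```

The first invented token has a negative VOT because periodicity precedes the burst, which is the case the text labels prevoicing.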
Results
Linear mixed-effects models including the maximal (uncorrelated) random-effects structure were used to analyze the likelihood that voiced stops would be produced with a short-lag VOT and the VOT of voiceless stops. Significance of fixed effects was assessed using model comparison.
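Model comparison of this kind is commonly carried out as a likelihood-ratio test between nested models, and the χ2(1) statistics reported in this section are consistent with that approach, although the text does not spell it out. The sketch below, which assumes such a test, shows how a χ2 statistic with 1 degree of freedom maps onto a p value. The function names and the log-likelihood values are hypothetical.

```python
import math


def lrt_chi2(loglik_reduced: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > x) equals erfc(sqrt(x / 2)), so only the standard
    library is needed.
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical log-likelihoods for a model without and with one fixed effect.
statistic = lrt_chi2(loglik_reduced=-105.0, loglik_full=-100.0)
print(statistic, chi2_sf_df1(statistic))

# Two statistics reported in the Results, converted to p values.
for reported in (12.83, 3.75):
    print(reported, round(chi2_sf_df1(reported), 4))
```

Under this assumption, χ2(1) = 12.83 corresponds to p < .001 and χ2(1) = 3.75 to a p value just under .053, in line with the values reported for voiced consonants.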
For voiced consonants (Fig. 1a), participants successfully code-switched, producing more English-appropriate short-lag VOTs on English than on Spanish trials—β = 1.16, SE = 0.25; χ2(1) = 12.83, p < .001. The 95% confidence interval (CI) for the main effect of language on likelihood of a short-lag VOT was [14%, 22%]. Although there was no main effect of trial type, χ2(1) = 2.09, p < .15, trial type and language interacted, β = −0.77, SE = 0.24, χ2(1) = 3.75, p < .053. For English productions, trial type had a significant effect on the degree of accent; χ2(1) = 6.04, p < .02; specifically, productions were more accented on switch than on stay trials, 95% CI = [−6%, −14%]. However, no effect of trial type was observed for native-language Spanish productions, χ2(1) < 1.

Fig. 1. Phonetic realization of (a) voiced consonants and (b) voiceless consonants. For voiced consonants, the mean percentage of consonants realized with short-lag voice-onset time (prototypical English pronunciation) versus prevoicing (prototypical Spanish pronunciation) is shown as a function of the language of production and trial type. For voiceless consonants, the mean voice-onset time is shown as a function of the language of production and trial type. Spanish voiceless stops are prototypically realized with relatively short voice-onset times compared with English stops.
Similar results were found for voiceless consonants (Fig. 1b). Participants successfully code-switched, producing longer VOTs on English than on Spanish trials, β = 27.2, SE = 4.7, χ2(1) = 14.66, p < .001. The 95% confidence interval (CI) for the main effect of language on VOT was [24.3 ms, 34.2 ms]. Although there was no main effect of trial type, χ2(1) = 2.33, p < .15, trial type and language interacted, β = −4.1, SE = 1.5, χ2(1) = 5.84, p < .02. For English productions, trial type had a significant effect on the degree of accent, χ2(1) = 4.51, p < .04; specifically, productions were more accented on switch than on stay trials, 95% CI = [−3.7 ms, −8.6 ms]. However, no effect was observed in Spanish productions, χ2(1) < 1.
An additional set of regressions examined whether the critical interaction between trial type (stay vs. switch) and language (English vs. Spanish) was modulated by the phonological relatedness of the preceding trials or the cognate status of the target (both contrast-coded).
For voiced targets, the maximal, uncorrelated random-effects structure that converged included only random slopes for each main effect (no interactions). Neither three-way interaction reached significance, χ2s(1) < 1. For voiceless targets, the maximal, uncorrelated random-effects structure that converged included random slopes for each main effect, all two-way interactions, and the three-way interaction of cognate status, trial type, and language. The three-way interaction of trial type, language, and relatedness failed to reach significance, χ2(1) < 1. In contrast, cognate status significantly modulated the interaction of trial type and language, β = −7.2, SE = 2.9, χ2(1) = 5.69, p < .02. Follow-up regressions revealed a significant interaction of trial type and language for cognate targets, χ2(1) = 8.69, p < .005, but not for noncognate targets, χ2(1) < 1. As shown in Figure 2, participants’ English productions were more accented on switch than on stay trials for cognates, 95% CI = [−1.0 ms, −10.7 ms], but no significant effect was observed for noncognates, 95% CI = [1.8 ms, −4.9 ms].

Fig. 2. Mean voice-onset time for (a) cognate targets and (b) noncognate targets as a function of the language of production and trial type. Spanish voiceless stops are prototypically realized with relatively short voice-onset times compared with English stops.
Discussion
When native Spanish speakers were required to unexpectedly switch the language of production, their English productions became more accented than when they did not switch languages. This finding builds on previous work suggesting that placing speakers in a difficult production context can increase their accents (Gustafson, Engstler, & Goldrick, 2013; Howell & Dworzynski, 2001). The current findings go further, showing that speakers do not simply adopt a particular speech style in a difficult production context: within such a context, processing disruptions on specific trials increase speakers’ accents.
These findings are consistent with a recent cued-switching study (Olson, 2013), which showed that Spanish-English speakers in the United States have greater accents on voiceless stops on switch trials than on stay trials. Our results show that switching affects not only voiceless but also voiced sounds, providing clear evidence that this effect decreases the contrast between these two sound categories. Note that Olson (2013) showed that this effect was found in the dominant, but not the nondominant, language and occurred only when the production context was heavily biased toward the nondominant language (not in a balanced context). In contrast, in a balanced production context, we found effects in the nondominant language (which tends to be more affected by cross-language interactions across a variety of production contexts).
As noted in the introduction, studies of monolingual speakers have shown that trial-specific disruptions to lexical access can lead to disruptions to phonetic processing. To account for such effects, interactive theories have proposed that variations in the relative activation of target versus nontarget representations within lexical access can influence phonetic processing (Goldrick & Chu, 2013; Kello et al., 2000). Such theories provide a ready account of the effects we observed here. On switch trials, nontarget language representations were more active than on stay trials. Interactive mechanisms—for example, cascading activation from nonselected representations—allow these partially activated nontarget representations to influence phonetic processing. This causes productions to deviate away from the target language toward the nontarget language—increasing speakers’ accents.
This mechanism also accounts for the stronger effects for voiceless stops in cognate than in noncognate targets. A number of studies have suggested that native-language representations are activated during nonnative lexical access, particularly for cognate targets (Costa, Caramazza, & Sebastián-Gallés, 2000). The greater activation of nontarget language representations for cognates will cascade to phonetic processes, enhancing the degree to which phonetic properties of the nontarget language intrude during production. Amengual (2012) reports that, consistent with this account, when Spanish-English speakers read sentences aloud, Spanish voiceless stops are more accented in cognate than in noncognate words.
Although accents partially reflect the difficulties multilingual speakers have in successfully acquiring the sound system of a nonnative language, our results add to the body of work showing that difficulties in language processing also play an important role. These difficulties reflect general principles of the language-production system, shared by both monolingual and multilingual processing—interactive mechanisms that allow partially activated lexical and phonological representations to influence phonetic processing.
Appendix
Table A1. Stimuli Used in the Study
| Spanish stimulus | English stimulus |
|---|---|
| tenedor (fork) | tent |
| delantal (apron) | desk |
| tijeras (scissors) | tin |
| dinero (money) | dish |
| toro (bull) | tor |
| doce (twelve) | door |
| taza (cup) | tap |
| dado (dice) | dancer |
| teléfono (telephone) | telephone |
| dentista (dentist) | dentist |
| tinte (tint) | tint |
| diploma (diploma) | diploma |
| tornado (tornado) | tornado |
| dormitorio (dormitory) | dormitory |
| taxi (taxi) | taxi |
| dálmata (Dalmatian) | Dalmatian |
Note: English translations of Spanish words are given in parentheses.
Declaration of Conflicting Interests
The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.
Funding
This research was supported by Grants BCS0846147 from the U.S. National Science Foundation, PSI2011-23033 and CONSOLIDER-INGENIO2010 CSD2007-00048 from the Spanish government, and SGR 2009-1521 from the Catalan government.
