Abstract
Missing from the body of literature on contact-induced phonological influence are studies that examine language variation as it occurs in speech production among members of a speech community. This study uses a corpus of naturally occurring Spanish/English code-switched discourse to determine whether cross-language phonological effects are evident in the data. Specifically, 2629 tokens of word-initial /d/ in spontaneous interactions were analyzed to identify the linguistic factors that condition the variable reduction (unreduced [d], reduced [ð]/Ø) of /d-/ in Spanish words. Cognate words (doctor) were found to reduce significantly less often than non-cognate words (después ‘after’). However, a significant effect is also found for a novel, contextually informed measure that estimates the proportion of a word’s uses that occur in online contexts promoting reduction (Frequency in a Favorable Context): the greater a word’s prior exposure to such contexts, the greater the likelihood of reduced articulations. This work argues that the distinction between cognates and non-cognates in fact emerges from this cumulative effect of significantly different patterns of use in discourse. Considering speakers’ use of both English and Spanish, cognate /d/ words are used less often overall in contexts that promote reduction than non-cognate words. As a result of this diminished net exposure to reducing environments, per usage-based grammar, the lexical representations of cognate words have strengthened non-reduced exemplars ([d]). The distinct rates of variation for cognate words thus emerge from distinct usage patterns. This paper therefore advocates such a focus on usage patterns within naturally occurring speech for phonological analyses in contact linguistics.
Multidisciplinary approaches to cross-linguistic phonological influence
Research into bilingualism has been undertaken from a variety of perspectives (ethnographic, historical, social, psychological, and linguistic), each with its own methodologies. Indeed, it is in some ways the ‘eclectic methodology’ drawn from these various approaches that gives the field of contact linguistics its strength (Winford, 2003, p. 9). Over the last few decades there has been an ‘unprecedented upsurge’ of interest in linguistic analyses of bilingualism (Leikin, Schwartz, & Tobin, 2012, p. 1), including the psycholinguistics of bilingualism (Grosjean & Li, 2013), in which the quest to model how bilinguals process and store multiple languages takes center stage.
During this same period, usage-based approaches to linguistic analysis have been in the ascendancy (Backus, 2012). Within a usage-based approach to language (and, hence, to bilingualism), social interactions and general cognitive processes are argued to be responsible for the structure and knowledge of language(s). Beckner et al. (2009) note that recent ‘research in the cognitive sciences has demonstrated that patterns of use strongly affect how language is acquired, is used, and changes’ (pp. 1–2). These ‘patterns of use’ are the primary focus of the present work, together with the mechanism by which they can ‘affect’ the cognitive representations of language (Bybee, 2010, p. 12) and hence the linguistic outcomes (or ‘changes’) commonly noted in situations of language contact (e.g., convergence, interference, borrowing).
This work analyzes variable realizations of word-initial /d-/ in Spanish within an oral corpus of spontaneous Spanish/English code-switched discourse (described below). Realizations of word-initial /d/ provide apt data to explore cross-linguistic phonological influence because articulations differ across Spanish and English, with Spanish forms showing considerable variability in degree of reduction. Significant differences in /d-/ realizations between cognate and non-cognate words are found, with cognate words showing less reduction, which may be interpreted as evidence for cross-language phonological influence.
However, by employing an underutilized usage-based linguistic variable, FFC (Frequency in a Favorable Context), which estimates the lexicalized effects of patterns of use in discourse (cf. Bybee, 2002), this work argues that the cognate effect in these data emerges from the distinct usage patterns of the two word classes (cognate versus non-cognate). Non-cognate words appear more often in phonetic contexts that condition reduction and, unlike cognates, lack exemplar connections to non-reduced forms sharing phonological and semantic similarity (i.e., English cognates).
Usage patterns within naturally occurring speech have heretofore been neglected in phonological analyses within contact linguistics. Yet this work shows that quantifying the lexical effects of exposure to online (discourse) contexts promoting reduction significantly predicts variable realizations of words. A primary aim of this study, therefore, is to advocate for the inclusion of this more contextually informed frequency measure.
With few exceptions (Bullock & Toribio, 2009; Grosjean & Miller, 1994; Olson, 2012, 2013), articulation in code-switched production remains largely unexplored, and phonological analyses of spontaneous interactions are virtually absent from the literature. This lack of attention is noteworthy given that code-switching has been viewed as a possible cause of innovation and propagation of change in language contact (Backus, 2005, p. 316). The present variationist analysis of variable realizations of word-initial /d/ in Spanish/English code-switched discourse redresses this lacuna, adding much-needed corpus-based evidence to the pool of linguistic phenomena attributed to contact and contributing ‘comparative data from on-going (synchronic) contact situations’ with which to build and test theories of bilingual lexical representation and speech production (Backus, Doğruöz, & Heine, 2011, p. 750).
A variationist usage-based approach
A usage-based approach to bilingual data
A usage-based approach assumes that language is shaped by the way in which it is used. As Bybee (2010, p. 1) explains:
‘…the structural phenomena we observe in the grammar of natural languages can be derived from domain-general cognitive processes as they operate in multiple instances of language use. The processes to be considered are called into play in every instance of language use; it is the repetitive use of these processes that has an impact on the cognitive representation of language and thus on language as it is manifested overtly.’
For phonological processes, a certain amount of online reduction is expected in production owing to the neuromuscular demands of speaking (Raymond, Dautricourt, & Hume, 2006). This has the potential to affect lexical representation: when a word is realized with a reduced form, that token adds to the word’s exemplar cluster or cloud, either by increasing the number of reduced exemplars or by strengthening existing reduced exemplars (Bybee, 2012, p. 216).
These exemplar clouds do not exist in isolation, however; rather, they are conjectured to be embedded in a highly organized network of lexical connections based on phonological and/or semantic overlap (Bybee, 2001). It is through this network of lexical connections that word-specific patterns of articulation can spread to other lexical items (Bybee, 2002, p. 272). On the assumption that both the language-processing dynamics and the lexical representations of bilinguals are qualitatively similar to those of monolinguals, the theoretical and methodological advances of the usage-based approach to linguistics, albeit built largely on monolingual data, are applied equivalently to bilingual data here. In this vein, the bilingual is viewed as not fundamentally different from a monolingual (Grosjean, 1989), a view that is gaining traction (cf. Runnqvist, Strijkers, Alario, & Costa, 2012, p. 850). Just as knowledge and use of inflectionally and derivationally related forms within a language have been argued to affect pronunciation variants of related forms via networks of lexical associations (Brown, 2011; Bybee, 2001), this work argues that phonologically and semantically related forms across languages are open to comparable paradigmatic peer pressure. Bilingualism, then, is a special case of variable use, in which languages may be separated ideologically but not psycholinguistically (Franceschini, 2011; Kroll & Bialystok, 2013), and the organized network of lexical connections interleaves the distinct languages (Brown & Harper, 2009).
Figuring prominently in experiments investigating bilingual lexical representation are analyses of cognates, ‘translations with a similar meaning, phonology and orthography’ (de Groot, 1995, p. 167). Studies of the phonetic outputs for specific phones (e.g., /t/) show that while speakers are adept at avoiding massive interference from the other language (Costa, Santesteban, & Caño, 2005, p. 137), subtle encroachments of one language on another in phonetic space do occur. Such permeability may be most noticeable in cognates (e.g., Amengual, 2012; Jacobs, 2007; Torres Cacoullos & Ferreira, 2000). As Bybee (2001, p. 26) observes, ‘the strength of association between [lexical] items with identical or similar features may vary according to the number and nature of the features…’. In this way, cognates, which share phonological, semantic, and to some extent orthographic similarity, may be posited to establish stronger interlingual connections than non-cognates, which do not share such similarities. The prediction that follows is that word-initial /d/ will be realized more often as [d] (as opposed to [ð] or Ø) in cognates than in non-cognates.
Discourse context frequency
As has been noted, patterns of language use are likely to provide key evidence to help ‘shape future models of the multilingual lexicon’ (Goral, Levy, Obler, & Cohen, 2006, p. 244). A primary goal of this study, therefore, is to outline precise patterns of use in bilingual speech production. There is precedent for examining such ‘patterns of use’. For example, in an examination of variable rates of word-final /-t, -d/ deletion in English, Bybee (2002) proposes that the lower deletion rates of regular past tense forms stem from their discourse distribution: compared to other words, past tense forms occur significantly more often in pre-vocalic contexts (a non-reducing context), as in verb-particle combinations such as lived in or looked at. The lexical representation of past tense forms will thus contain fewer deleted forms overall, mirroring their contexts of use in discourse. These stored experiences, then, predict less subsequent deletion of past tense /-t, -d/ than of other /-t, -d/ words, even when the two are compared in identical contexts. The hypothesis, then, is that in addition to the online effect of phonetic context during articulation, there is a cumulative (lexicalized) effect of experience in specific discourse contexts that affects the pronunciation of a word (Bybee, 2002).
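The cumulative logic of this account can be made concrete with a toy calculation. The sketch below assumes hypothetical online deletion rates for final /-t, -d/ (0.5 before a consonant, 0.1 before a vowel; illustrative values, not estimates from any corpus) and a hypothetical helper, `expected_deleted_share`, which computes the share of deleted exemplars a word would be expected to store given how often it is used in each context. Although both words are subject to the same online rates, the word used more often pre-vocalically ends up storing fewer deleted exemplars.

```python
# Hypothetical online deletion rates by following context (illustrative only,
# not corpus estimates).
DELETION_RATE = {"pre-consonantal": 0.5, "pre-vocalic": 0.1}


def expected_deleted_share(context_share: dict[str, float]) -> float:
    """Expected proportion of deleted exemplars stored for a word, given the
    share of its uses in each following context (shares must sum to 1)."""
    if abs(sum(context_share.values()) - 1.0) > 1e-9:
        raise ValueError("context shares must sum to 1")
    return sum(share * DELETION_RATE[ctx] for ctx, share in context_share.items())


# A hypothetical past tense form often used pre-vocalically (e.g. in
# verb-particle combinations) versus another /-t,-d/ word used so less often.
past_tense = {"pre-vocalic": 0.6, "pre-consonantal": 0.4}
other_word = {"pre-vocalic": 0.2, "pre-consonantal": 0.8}

print(round(expected_deleted_share(past_tense), 2))  # 0.26
print(round(expected_deleted_share(other_word), 2))  # 0.42
```

In an exemplar model, this stored difference would then bias production even when both words appear in the same context, which is the lexicalized effect at issue.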
Hall, Cheng, and Carlson (2006) note that ‘what gives rise to the differences in language knowledge are the particular circumstances within which an individual experiences and uses language’ (p. 230). The approach taken in the current work embodies and substantiates this assertion by demonstrating a cumulative (lexicalized) effect of patterns of use, in particular, discourse context frequency, which has been shown to be a powerful factor in variation and change (Brown & Raymond, 2012; Raymond & Brown, 2012).
The dependent variable: Spanish /d-/ realizations
Spanish has two primary allophones of /d/: the voiced dental stop [d] and the voiced dental approximant or fricative [ð] (Barrutia & Schwegler, 1994, pp. 114–120). The stop [d] is prescriptively preferred after a pause, a nasal, or /l/ (a voiced lateral alveolar), as can be seen in example (1). The second principal allophone, [ð], is realized in all post-vocalic contexts and in post-consonantal contexts not described for [d], such as that found in example (2).
(1)
Sandra
el --
… humo del anis,
… es muy bueno para e
‘the --
… anise smoke,
… is very good for headaches.’
(2)
Susan
me preguntó si quería bailar y l
‘He asked me if I wanted to dance and I told him,’
The realizations and allophonic distribution of /d/ exhibit ‘extensive regional variability’ (Amastae, 1989, p. 170; Waltermire, 2010), as well as ‘a certain amount of variability within a given dialect or idiolect’ (Cole, Hualde, & Iskarous, 1999, p. 2). Articulations of /d/ vary considerably, ranging from ‘a complete stop to a vocalic glide’ (Cole et al., 1999, p. 2), to outright deletions [Ø], although such deleted segments may be stigmatized (Barrutia & Schwegler, 1994, p. 120). This high degree of variability is evident in the New Mexican data. As such, the phonological contexts illustrated in (1) and (2) are viewed in terms of probabilities of contexts favoring or disfavoring stop [d] versus reduced realizations such as [ð] or [Ø], as opposed to obligatory contexts for rule application (Cole et al., 1999, p. 2). Multivariate analysis is employed here to tease apart the possible independent effects of the many previously identified linguistic factors constraining the variation (stress, word frequency, etc.).
Whereas Spanish /d/ has a dental articulation, English /d/ is alveolar; more notably, the two languages differ in the number of allophones of /d/ in word-initial position. English /d/ has just one word-initial variant, a stop (Barrutia & Schwegler, 1994, p. 114), whereas in Spanish it is the fricative variant, not the stop, that is more common (Barrutia & Schwegler, 1994, p. 114; Teschner, 2000, p. 96). Thus, Spanish /d/ words can be articulated with a stop [d] or a fricative [ð] (more commonly [ð]), while an English word will have only the stop [d] (e.g., dentist [d], dentista [ð]/[d]). Consequently, if a Spanish speaker were to show interlingual phonological influence from English, we could predict reinforcement of the stop [d] articulation in /d/ tokens to the detriment of the fricative variant [ð].
Data and coding of linguistic factors
The data for this study were extracted from a corpus of naturally occurring speech, the New Mexico Spanish-English Bilingual (NMSEB) corpus (Torres Cacoullos & Travis, in preparation). The NMSEB corpus comprises recorded sociolinguistic interviews between bilingual Spanish-English New Mexicans and in-group members (for a detailed overview of the speakers and the corpus, see Torres Cacoullos & Travis, 2015, and Travis & Torres Cacoullos, 2013). All of the speakers use both Spanish and English regularly and naturally in their speech. This study is based on 18 recordings, representing approximately 180,000 words spoken by 20 speakers over 17.5 hours.
All instances of word-initial /d/ in Spanish were extracted, and tokens were coded auditorily as either occlusive ([d]) or reduced ([ð], Ø). Spirantization of /d/ in Spanish can generally be said to be ‘auditorily fairly obvious’ (Port & Leary, 2005, p. 953). Nevertheless, some tokens could not be coded as reduced or retained because of noise or overlapping speech, and these were excluded from analysis. Proper nouns were also excluded, as was the single token of non-pre-vocalic (preconsonantal) /d/ (droga). In addition, the 67 tokens of the lexical item donde ‘where’ were excluded, since the archaic form onde, with no initial /d/, is common in New Mexican Spanish (Bills & Vigil, 2008, p. 15). A total of 236 tokens were thus excluded from analysis.
Word types were counted separately, not conflated into paradigmatically related forms (e.g., digo, dije, dijera, etc., were counted individually and not simply as forms of decir). Although within a usage-based approach inflectionally and derivationally related words are presumed to form a network of phonological and semantic associations (Bybee, 2001, 2010), each word registers the effects of use individually (e.g., Gahl, 2008). A total of 2629 tokens of /d/ words, in 197 different word types, were thus analyzed.
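The exclusion criteria and the form-based type count described above can be sketched as a simple filter. The `Token` record, the orthographic test for a following vowel, and the toy tokens are hypothetical illustrations, not the procedure actually applied to the NMSEB transcripts; auditory coding is assumed to have been done beforehand.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Orthographic vowels; rising diphthongs (dieron, duele) also begin with one.
VOWELS = set("aeiouáéíóú")


@dataclass
class Token:
    word: str                 # word form as transcribed, e.g. "dije"
    coding: Optional[str]     # "stop", "reduced", or None if uninterpretable
    proper_noun: bool = False


def keep(tok: Token) -> bool:
    """Apply the exclusion criteria described in the text."""
    w = tok.word.lower()
    if tok.coding is None or tok.proper_noun:   # noise/overlap; proper nouns
        return False
    if w == "donde":                            # varies with archaic 'onde'
        return False
    # Keep only pre-vocalic word-initial /d/ (drops e.g. 'droga').
    return len(w) > 1 and w[0] == "d" and w[1] in VOWELS


# Toy tokens (hypothetical, for illustration only).
tokens = [
    Token("dije", "reduced"), Token("digo", "stop"), Token("dije", "stop"),
    Token("droga", "stop"), Token("donde", "reduced"),
    Token("doctor", None), Token("Diego", "stop", proper_noun=True),
]
kept = [t for t in tokens if keep(t)]
# Count word forms, not lemmas: 'dije' and 'digo' are two separate types.
type_counts = Counter(t.word.lower() for t in kept)
print(len(kept), len(type_counts))  # 3 2
```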
To verify that these bilinguals’ English follows the monolingual norm of stop realization of /d/, a sample of word-initial /d/ tokens in English (N = 1370) was also extracted from the first six interviews. These words exhibited minimal variation: 12 occurrences of reduced [ð], across three word types (didn’t, don’t, days).
A first finding was that, unlike studies suggesting an effect of proximity to the switch site on phonetic outputs in elicited bilingual speech production (e.g., Bullock & Toribio, 2009; Olson, 2012), the rate of reduced and unreduced forms of /d/ was not significantly affected by code-switching, as measured by two factors: distance from the switch (in number of words) and presence of a switch in the same Intonation Unit (IU). Nor was there a phonetic priming effect from the realization of the previous word-initial /d/, whether the previous word was Spanish or English. Code-switching and priming are thus not considered further in the current analysis. Each token was coded for a set of independent variables (predictors), as outlined below.
Contextual and lexical factors
Previous phonetic context: The preceding phonetic environment, as determined by the discourse context, was coded either as a context favoring stop articulation (a preceding pause when utterance-initial (see note 2), a preceding /n/, or a preceding /l/) or as a context favoring a reduced articulation of /d/ (all other post-consonantal and post-vocalic contexts).
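A minimal sketch of this coding, assuming each token is annotated with the final segment of the preceding word and that a pause is marked with a `#` symbol (both hypothetical annotation conventions):

```python
# Preceding segments coded as favoring the stop [d]: a pause (marked "#" in
# this hypothetical annotation), /n/, and /l/.
STOP_FAVORING = {"#", "n", "l"}


def code_preceding(prev_segment: str) -> str:
    """Return the preceding-context factor for a word-initial /d/ token,
    given the final segment of the preceding word or "#" for a pause."""
    if prev_segment in STOP_FAVORING:
        return "stop-favoring"
    # All other post-consonantal and post-vocalic contexts.
    return "reduction-favoring"


print(code_preceding("l"))  # e.g. el día -> stop-favoring
print(code_preceding("e"))  # e.g. me dijo -> reduction-favoring
```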
Following vowel type: Previous studies of voiced stop realizations in Spanish find that the quality of flanking vowels can significantly affect articulations of the consonants (Cole et al., 1999). Thus, we coded the class of the vowel immediately following the word-initial /d/: front (/i/, /e/) or non-front (/a/, /o/, /u/). For the multivariate analysis, the glide /j/ of rising diphthongs (dieron ‘they gave’) is grouped with the front vowels, and the glide /w/ (duele ‘it hurts’) with the non-front vowels.
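The following-vowel factor can be coded in the same way. This sketch assumes tokens carry a broad phonemic symbol for the segment after /d/, with `j` and `w` standing for the glides of rising diphthongs (a hypothetical input format):

```python
# Glides of rising diphthongs are grouped as described in the text.
FRONT = {"i", "e", "j"}           # /j/ as in dieron
NON_FRONT = {"a", "o", "u", "w"}  # /w/ as in duele


def code_following(segment: str) -> str:
    """Return the following-vowel factor for a word-initial /d/ token."""
    if segment in FRONT:
        return "front"
    if segment in NON_FRONT:
        return "non-front"
    raise ValueError(f"unexpected segment after /d/: {segment!r}")


print(code_following("j"), code_following("w"))  # front non-front
```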
Word frequency per million: To investigate whether lexical frequency is correlated with phonological reduction (cf. Bybee, 2012, pp. 214–215), the token frequency of each word type with word-initial /d/ was calculated in the oral portion of the Corpus del español online (Davies, 2002–; 5,113,249 words). Token frequencies vary widely across words (frequency per million of de ‘of, from’ = 46,472; of debías ‘you should’, <1). The effect of frequency has been argued to operate above a threshold at which cumulative experience with words can affect lexical representations; accordingly, the data were discretized into high- and low-frequency groups (Erker & Guy, 2012, p. 538). High-frequency words were arbitrarily set as those with a frequency per million greater than 100 (45 types), and low-frequency words as those with a frequency of less than 100 per million (152 types).
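The per-million normalization and the 100-per-million cut-off can be expressed as follows. The corpus size is the figure given above; the function names are hypothetical, and because the text does not say where a value of exactly 100 falls, this sketch assigns it to the low group (an assumption).

```python
# Size of the oral portion of the Corpus del español, as given in the text.
CORPUS_SIZE = 5_113_249


def per_million(raw_count: int, corpus_size: int = CORPUS_SIZE) -> float:
    """Normalize a raw corpus count to a frequency per million words."""
    return raw_count / corpus_size * 1_000_000


def frequency_group(freq_per_million: float, threshold: float = 100.0) -> str:
    """High frequency: more than 100 per million. A value of exactly 100 is
    placed in the low group here (an assumption; the text does not say)."""
    return "high" if freq_per_million > threshold else "low"


print(frequency_group(46_472))          # de: high
print(frequency_group(per_million(3)))  # a hypothetical rare word: low
```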
Stress: Previous work has determined that stress can play a role in realizations of /d/ (Eddington, 2011), with unstressed syllables typically exhibiting more reduced forms (and less articulatory energy) than stressed syllables. All tokens were coded for whether the syllable containing the word-initial /d/ carried lexical stress.
Cognate status: Coding the cognate status of Spanish words for this project was based on a subjective assessment of the degree of phonological, orthographic, and/or semantic overlap with their English equivalents. Although degree of ‘cognateness’ can be said to be highly gradient (Dijkstra, Miwa, Brummelhuis, Sappelli, & Baayen, 2010), the present analysis sorts words into two categories: cognates (phonologically and semantically similar) and non-cognates (dissimilar). Three native Spanish speakers with near-native English proficiency were asked to rate each of the 197 word types as cognate with English or not. Words judged phonologically and semantically similar by at least two of the three consultants were categorized as cognates (see the Appendix). These comprise 37 word types (N = 227) (e.g., decidieron ‘they/you plural decided’, doble ‘double’, diferente ‘different’, diciembre ‘December’, doctor ‘doctor’). The remaining 160 word types were classified as non-cognates (N = 2402) (e.g., dice ‘you (formal), s/he say(s)’, después ‘after’, dientes ‘teeth’, domingo ‘Sunday’). 3
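The two-out-of-three decision rule amounts to a majority vote over the consultants’ binary judgments. The ratings below are hypothetical and serve only to illustrate the rule:

```python
def is_cognate(judgments: list[bool], min_agree: int = 2) -> bool:
    """Classify a word as cognate when at least `min_agree` consultants judged
    it phonologically and semantically similar to its English equivalent."""
    return sum(judgments) >= min_agree


# Hypothetical judgments from three consultants.
print(is_cognate([True, True, False]))   # True
print(is_cognate([False, True, False]))  # False
```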
Frequency in a Favorable Context
The presence of a discourse context frequency effect has been tested for word-initial /s-/ reduction in New Mexican Spanish by Brown (2006) and Raymond and Brown (2012), who label the effect FFC. Reduction (aspiration [h], deletion Ø) of /s/ in this variety is favored by a preceding non-high vowel (/e, a, o/) and disfavored by a preceding high vowel, consonant, or pause. Words differ significantly in their frequency of use in this reducing discourse context [e.g., (la, una) señora versus (el, un) señor] and hence are presumed to differ in the number (or strength) of reduced exemplars stored in memory, which in turn predicts the likelihood of reduced articulations. Raymond and Brown (2012) establish that FFC significantly constrains synchronic variation in /s-/ realizations, independently of, and in addition to, other linguistic factors known to influence articulation (stress, phonetic context, word frequency).
Indeed, even phenomena traditionally given a language contact explanation, such as the case of [f] > [h] > Ø in Spanish of Latin FV- words (Menéndez Pidal, 1968, pp. 198–233; Penny, 1972, 1991), when reexamined with this more precise, contextually informed measurement of FFC, can be shown to be internally motivated. Brown and Raymond (2012) examine the diachronic change and show how the modern standard Spanish outcome of f- and h- words reflects usage patterns and cumulative exposure to reducing contexts (FFC). Forms more often preceded by a non-high vowel are more apt to have lost the initial consonant [FACTUS > hecho ‘done’ (with 53% of occurrences post non-high vowel) versus fecha ‘date’ (39% of uses)]. Further, FFC better predicts modern lexical distribution of FV- words than does transmission history (prestige borrowings or cultismos versus orally transmitted words).
Turning to /d/-initial words, the allophonic distribution discussed in the section The dependent variable: Spanish /d-/ realizations highlights the important role of online articulation effects stemming directly from the phonetic context in which a /d/ word is embedded. For instance, a preceding /l/ is said to promote stop articulations of /d/, presumably because the /l/ assimilates to the dental place of the following /d/, giving both phones the same place of articulation (Eddington, 2011, p. 15). These phonetic contexts [the non-reducing (#, /l/, /n/, /m/ ___) and the reducing environments], then, significantly predict the realization of /d/ during production. FFC measures the proportion of a word’s occurrences that fall in a phonetic context conditioning reduction.
Each word’s frequency of use in a reducing context is calculated in the oral portion of the online Corpus del español (Davies, 2002–). Reducing contexts include all contexts other than post-nasal, post-lateral, and post-pause. Post-pausal position is operationalized in the online corpus as tokens occurring immediately after punctuation (.,?!:;-¿¡). The word’s frequency per million in the reducing context is then divided by its overall frequency per million. The FFC calculation, then, is the number of tokens in a reducing context divided by the total number of tokens. The value is expressed as a percentage: 100 means that the lexical item always occurs in a context conducive to reduction, and 0 means that it never does.
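The FFC calculation can be sketched in a few lines. Because the per-million frequencies in the numerator and denominator come from the same corpus, their ratio equals the ratio of raw counts, so the sketch works with counts. The helper `is_reducing_context` is a hypothetical orthographic approximation: punctuation (or the absence of a preceding token) stands in for a pause, as in the text, while post-nasal and post-lateral contexts are approximated by the final letter of the preceding word (n, m, l), an assumption the text does not state. The first two usage lines recompute the values for domingo and diferencia from the counts in examples (3) and (4).

```python
# Characters treated as marking a pause, as listed in the text.
PAUSE_PUNCTUATION = set(".,?!:;-¿¡")
# Hypothetical orthographic stand-ins for post-nasal and post-lateral contexts.
NON_REDUCING_FINALS = {"n", "m", "l"}


def is_reducing_context(prev: str) -> bool:
    """True if a /d/-initial word preceded by `prev` (the previous orthographic
    token, or "" at the start of a text) is in a reducing context."""
    if not prev or prev[-1] in PAUSE_PUNCTUATION:
        return False  # post-pause: non-reducing
    return prev[-1].lower() not in NON_REDUCING_FINALS


def ffc(reducing: float, total: float) -> float:
    """FFC as a percentage: 100 * uses in a reducing context / all uses."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * reducing / total


def ffc_from_contexts(preceding_tokens: list[str]) -> float:
    """FFC computed directly from the token preceding each use of a word."""
    flags = [is_reducing_context(p) for p in preceding_tokens]
    return ffc(sum(flags), len(flags))


print(round(ffc(250, 250 + 258), 1))  # domingo: 49.2
print(round(ffc(619, 619 + 34), 1))   # diferencia: 94.8
print(ffc_from_contexts(["próximo", "un", ",", "cada"]))  # 50.0
```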
A comparison of the nouns domingo ‘Sunday’ and diferencia ‘difference’ illustrates this method. Examples listed in (3a) taken from Davies (2002–) are tokens of domingo used in contexts favorable to reduction of word-initial /d/, and examples listed in (3b) are uses of domingo in contexts unfavorable to reduction. Similarly, tokens of diferencia are used in contexts favorable to reduction (4a) and unfavorable to reduction (4b).
(3a) Domingo: Favorable context (N = 250)
El próxim
El prime
Libre completamente sábado
¿verdad?, cad
si hoy e
(3b) Domingo: Unfavorable context (N = 258)
En relación a
regresábamos e
di a luz u
no libré ningú
ir en la catedral el
(4a) Diferencia: Favorable context (N = 619)
sino que l
¿qu
Con un
no sentí mayo
no hay má
(4b) Diferencia: Unfavorable context (N = 34)
pero hay una suti
La gra
Y notaba
se manifiesta co
no sentí… la..
The FFC measure is thus the proportion of each word’s instances of use that occur in contexts favoring reduction, that is, each word’s frequency in a favorable environment. As examples (3) and (4) show, the FFC measure does not depend on specific preceding lexical items or specific phones, but on reducing environments generally. Table 1 summarizes the FFC values for each lexical item. The noun domingo is used in contexts that favor the approximant allophone ([ð]) only 49% of the time (250 of 508 tokens). Conversely, the noun diferencia is rarely found in a post-pause, post-/l/, or post-nasal context, giving it an FFC value of 95% (619 of 653 tokens). In almost every instance of its use, the phonological context in which diferencia is uttered predicts the approximant ([ð]) rather than the stop ([d]) articulation. FFC thus provides a way to measure the cumulative effect on the lexical representation of repeated utterances in such contexts.
Frequency in a Favorable Context (FFC) value for word-initial /d/ environments (Davies, 2002–).
This measure can be taken as an approximation of exemplar make-up based upon experience with a lexical item, and, importantly, this measure is distinct from the probability of occurring in a certain phonetic context (e.g., post-nasal, post-lateral, post-pause). This differentiates the FFC measurement from other measures of probability (e.g., Hume & Mailhot, 2013; Jurafsky, Bell, Gregory, & Raymond, 2001) in which the dependent variable (the predicted linguistic form) and the probabilistic measure predicting the linguistic outcome are derived from the same context. The FFC value of a lexical item remains constant whether it is being analyzed in a post-nasal environment, for instance, or a post-vocalic environment. 4
The reliability of an FFC value depends on the number of examples on which it is based. A full 98% of the FFC values were calculated from 10 or more examples in the Corpus del español; only 68 tokens belong to word types with fewer than 10 occurrences in the corpus. Only an additional 100 tokens (4% of the data) had FFC values based on 20 or fewer examples. Multivariate analyses excluding these tokens return the same results. For the multivariate analysis, the data were divided into high and low FFC groups: words used in contexts favoring reduction in 75% or more of their instances are labeled high FFC (140 types), and words occurring in such contexts less than 75% of the time are considered low FFC (58 types).
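The grouping and the reliability check described above can be sketched as two small predicates. The thresholds (75 for FFC, 10 corpus examples) are those given in the text; the function names are hypothetical, and the word list is illustrative, with domingo and diferencia using the FFC values and example counts reported for them above and a made-up rare word added to show the filter.

```python
def ffc_group(ffc_value: float, threshold: float = 75.0) -> str:
    """High FFC: used in reducing contexts in 75% or more of instances."""
    return "high" if ffc_value >= threshold else "low"


def is_reliable(n_corpus_examples: int, minimum: int = 10) -> bool:
    """True when an FFC value rests on at least `minimum` corpus examples."""
    return n_corpus_examples >= minimum


# word -> (FFC, number of corpus examples); the last entry is hypothetical.
words = {
    "domingo": (49.2, 508),
    "diferencia": (94.8, 653),
    "hypothetical_rare_word": (80.0, 6),
}
groups = {w: ffc_group(f) for w, (f, _) in words.items()}
# Sensitivity check: drop words whose FFC rests on too few examples.
reliable_words = [w for w, (_, n) in words.items() if is_reliable(n)]
print(groups)
print(reliable_words)  # ['domingo', 'diferencia']
```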
Bilingual Frequency in a Favorable Context
FFC has not previously been explored for bilingual data. For cognate words in these Spanish-English bilingual data, the Bilingual FFC calculation takes into account the effects of use stemming from English: instances of use of the cognate in English are entered into the FFC calculation (as outlined above) as instances of non-reducing contexts. English frequencies per million are taken from the online Corpus of Contemporary American English (COCA; Davies, 2008–). Formulated in this way, the FFC measure incorporates gradient frequency effects of English forms, as can be seen in Table 2.
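A minimal sketch of the Bilingual FFC, assuming (as described above) that English uses of a cognate simply enlarge the denominator as non-reducing instances. The per-million figures below are hypothetical, and the function name is illustrative:

```python
def bilingual_ffc(es_reducing_pm: float, es_total_pm: float,
                  en_total_pm: float = 0.0) -> float:
    """Bilingual FFC (%): Spanish reducing-context uses divided by all uses,
    where English uses of a cognate count as non-reducing. For non-cognates
    en_total_pm is 0 and this reduces to the monolingual FFC."""
    denominator = es_total_pm + en_total_pm
    if denominator <= 0:
        raise ValueError("total frequency must be positive")
    return 100 * es_reducing_pm / denominator


# Hypothetical per-million frequencies.
print(bilingual_ffc(60.0, 100.0))        # Spanish only: 60.0
print(bilingual_ffc(60.0, 100.0, 50.0))  # with English cognate uses: 40.0
```

Adding per-million rates from two different corpora treats them as directly comparable measures of exposure; that is a simplifying assumption of this sketch.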
Frequency per million of cognate and non-cognate words in reducing contexts (Davies, 2002-, 2008-).
FFC: Frequency in a Favorable Context.
This FFC calculation is a ‘stand-in’ for knowing exactly what experiences speakers have in production and perception. Ideally, we would know each speaker’s precise experiences with words and the make-up of their exemplar representations. Based on the effects known to operate during articulation, FFC attempts to estimate this experience probabilistically.
The following section presents the results of analyses conducted on the 2629 coded tokens extracted from the NMSEB corpus.
Results
In order to determine which linguistic factor groups significantly constrain the variable realizations of /d-/ in the NMSEB corpus, while simultaneously considering the independent contribution of each factor group, the data were submitted to multivariate analysis using Varbrul (Guy, 1993; Rand & Sankoff, 2001; Sankoff, 1988). The findings are summarized in Table 3. Of the six factor groups analyzed, four were selected as significant: the phonological contexts flanking the word-initial /d/ (preceding and following), cognate status, and FFC. Factor groups are listed in order of their magnitude of effect, as determined by their order of selection in the multivariate analysis and by the range in probability of the individual factors. These probabilities reflect the degree to which each factor favors (closer to 1) or disfavors (closer to 0) a reduced realization.
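Varbrul-family programs report each factor’s effect as a probability-like factor weight derived from a logistic model. One common way to relate the two, used for instance in Rbrul, is to fit the logistic regression with sum (deviation) contrasts and take the inverse logit of each factor’s coefficient. The sketch below shows only that conversion, with hypothetical coefficients and function names; the centering conventions of the Varbrul implementations cited above may differ, so it is not intended to reproduce Table 3.

```python
import math


def inverse_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def factor_weights(coefficients: dict[str, float]) -> dict[str, float]:
    """Convert the sum-coded (deviation) log-odds coefficients of one factor
    group into factor weights: > 0.5 favors the application value (here, a
    reduced realization), < 0.5 disfavors it."""
    if abs(sum(coefficients.values())) > 1e-9:
        raise ValueError("sum-coded coefficients of a group should sum to 0")
    return {level: inverse_logit(b) for level, b in coefficients.items()}


# Hypothetical coefficients for a two-level factor group.
weights = factor_weights({"favoring": 0.8, "disfavoring": -0.8})
print({k: round(v, 2) for k, v in weights.items()})
```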
Multivariate analysis of factors contributing to reduced articulations [ð], Ø (versus non-reduced forms [d]) in New Mexico Spanish-English Bilingual (NMSEB) corpus (non-significant factors within [ ]).
The factor group ‘preceding phonetic context’ has the greatest relative magnitude of effect on realizations of word-initial /d/ in Spanish. If the word-initial /d/ is preceded by a nasal, lateral, or a pause, reduction is highly disfavored (probability .16) and /d/ reduces at a rate of 25%. The factor group ‘following phonetic context’ also significantly constrains realizations of word-initial /d/. For words with a following front vowel (dinero ‘money’, demás ‘rest’), reduction rates are higher (67%) and reduced articulations are favored (probability .53).
In addition to the significant contributions of both preceding and following phonological context, the analysis also selected cognate status as significant. The two word categories (cognate, non-cognate) behave differently with regard to initial /d/ reduction. Non-cognate words are significantly more likely to reduce a word-initial /d/ (68%) than are cognate words (25%) (χ² = 167.26, p < .0001). Cognates, then, are more likely to be articulated with a stop [d] than with a reduced variant, in line with predictions regarding the potential influence of English. That is, even when other factors are brought under statistical control, cognates (e.g., depender ‘depend’) are less likely to reduce than non-cognates (e.g., doloroso ‘painful’).
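The χ² statistics reported here compare reduction rates in a 2 × 2 table (word class by reduced/unreduced). The text does not give the underlying cell counts or say whether a continuity correction was applied, so the sketch below is a generic Pearson test without correction, applied to hypothetical counts; it is not meant to reproduce the reported values. For one degree of freedom, the upper-tail p-value follows from the standard normal via the complementary error function.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]],
    e.g. rows = cognate / non-cognate, columns = reduced / unreduced."""
    observed = ((a, b), (c, d))
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: 10 of 30 reduced in one class, 20 of 30 in the other.
stat = chi_square_2x2(10, 20, 20, 10)
print(round(stat, 3), p_value_df1(stat))
```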
A model of cascaded interactivity between languages (e.g., Costa et al., 2005) might predict such a result. Yet even in the face of such positive evidence, disagreement persists over whether cognate effects “are due to differences in the way words are represented in the lexicon (e.g., shared morphemes for cognates), or rather are due to a more general property of the speech production system (e.g., cascade/interactivity dynamics)” (Costa et al., 2005, p. 99). The present study proposes another source of such effects, outlined below: significantly different usage patterns.
Unraveling the cognate effect: the role of discourse context frequency (FFC)
The pattern of variation in the data reflects lexicalized effects, as indicated by the significant contribution of FFC. As summarized in Table 3, FFC, not previously tested for word-initial /d/, significantly predicts realizations of word-initial /d/ in the NMSEB corpus, in line with usage-based research. The likelihood of word-initial /d/ reduction reflects the degree to which the lexical item is used overall in discourse contexts favoring reduction: words with a high FFC favor reduction of word-initial /d/, while words with a low FFC disfavor it. If a Spanish /d/ word has a high FFC (e.g., dije ‘I said’), reduction is favored, with a probability of .53; these words reduce at an overall rate of 71%. For /d/ words with a low FFC (e.g., día ‘day’), the probability of reduction is .30, indicating that reduction is strongly disfavored; these words reduce at an overall rate of 37%.
This innovative probabilistic measure (FFC) is found to significantly predict variable realizations of word-initial /d/ while controlling for multiple factors known to affect pronunciation, in support of research reporting significant effects of cumulative measures of discourse context frequency on phonological reduction (Brown & Raymond, 2012; Bybee, 2002; Raymond & Brown, 2012). The cumulative experience speakers have with words is manifest in the pronunciation variation apparent in natural speech production. Through repeated use in reducing (or non-reducing) contexts, the exemplar representation of words shifts in accordance to the usage patterns, an effect easily captured in the “detail-preserving episodic memory” characteristic of exemplar theory (Mendoza-Denton, 2004, pp. 443–444).
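The claim that a word's exemplar representation shifts with its usage can be illustrated with a toy exemplar model. This is a sketch under stated assumptions, not the study's model: each use blends a contextual pull (hypothetical values PULL_FAVORABLE and PULL_UNFAVORABLE) with the word's current share of reduced exemplars (weighted by a hypothetical LEXICAL_WEIGHT), and the produced token is stored back into the cloud. Expected counts are used instead of random draws so the result is deterministic, and FFC is expressed as a proportion (0 to 1) rather than a percentage.

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration (not estimated from the NMSEB corpus).
PULL_FAVORABLE = 0.9    # pull toward [ð]/Ø when the preceding context favors reduction
PULL_UNFAVORABLE = 0.1  # pull toward [ð]/Ø in other contexts
LEXICAL_WEIGHT = 0.5    # how strongly the word's stored exemplars shape production


@dataclass
class ExemplarCloud:
    reduced: float = 1.0    # expected number of stored reduced exemplars ([ð]/Ø)
    unreduced: float = 1.0  # expected number of stored non-reduced exemplars ([d])

    def share_reduced(self) -> float:
        return self.reduced / (self.reduced + self.unreduced)

    def p_reduce(self, context_pull: float) -> float:
        """Blend the online contextual pull with the word's accumulated history."""
        return (1 - LEXICAL_WEIGHT) * context_pull + LEXICAL_WEIGHT * self.share_reduced()

    def store(self, p_reduced: float) -> None:
        """Add one produced token to the cloud, split by its expected variant."""
        self.reduced += p_reduced
        self.unreduced += 1.0 - p_reduced


def cloud_after_use(uses: int, ffc: float) -> ExemplarCloud:
    """Expected cloud after `uses` tokens, a proportion `ffc` (0-1) of them in reducing contexts."""
    cloud = ExemplarCloud()
    average_pull = ffc * PULL_FAVORABLE + (1 - ffc) * PULL_UNFAVORABLE
    for _ in range(uses):
        cloud.store(cloud.p_reduce(average_pull))
    return cloud


high = cloud_after_use(200, ffc=0.8)
low = cloud_after_use(200, ffc=0.3)

# Same reducing context, different usage histories.
print(round(high.p_reduce(PULL_FAVORABLE), 2), round(low.p_reduce(PULL_FAVORABLE), 2))
```

In this toy setting, two words with different histories end up with different probabilities of reduction even in the same reducing context, which is the sense in which the effect is lexicalized.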
There is variability in the discourse patterns, nevertheless. For example, some cognates often occur in reducing environments (e.g., defender ‘to defend’, FFC = 89; disciplina ‘discipline’, FFC = 99), while others exhibit very infrequent use in discourse contexts favoring reduction (e.g., distrito ‘district’, FFC = 14, dólar ‘dollar’, FFC = 36). Low FFC cognates reduce significantly less often (12%) than high FFC cognates (43%) (p = 0.0000, X2 = 28.88289), as is summarized in the last column of Table 4. Logically, there is also variation in the non-cognate words with regard to discourse distribution patterns (e.g., déjame ‘let me/leave me’, FFC = 17, divertido ‘fun’, FFC = 91), with high FFC words exhibiting significantly more reduction (73%) than low FFC words (46%) (p = 0.0000, X2 = 103.2255). As a result, when modeling variation, FFC has the same direction of effect across these word categories—with higher FFC words reducing more readily.
Table 4. Percentage of /d-/ reduction for cognate and non-cognate words with high and low Frequency in a Favorable Context (FFC), with token Ns.
Table 4 also shows that a larger proportion of non-cognate tokens (84%, N = 2008/2402) than of cognate tokens (43%, N = 98/227) have a high FFC, close to double. On average, cognate /d/ words are used in Spanish less often in contexts that promote reduction (FFC = 62) than non-cognate words (FFC = 79). As a result of this diminished net exposure to reducing environments, per usage-based grammar, the lexical representations of cognate words have strengthened non-reduced exemplars ([d]). Conversely, non-cognate words, with greater overall use in phonetic contexts promoting reduction, have lexical representations with a greater number, or greater activation, of reduced exemplars ([ð]/Ø).
The analysis of the data set reveals FFC as a factor capable of predicting variation in /d-/ realizations. Variable realizations of words, in addition to reflecting probabilistic outcomes indicative of online contextual pressures during articulation, also reflect lexicalized effects indicative of cumulative experience with each word (significant FFC effects). However, the FFC values represented in Tables 3 and 4 reflect only discourse patterns in Spanish for cognate and non-cognate words and disregard potential effects of knowledge and use of English. If bilingual lexical representation allows for interactivity between cognates, an FFC calculation based exclusively on Spanish usage patterns may reflect the exemplar representations of cognates less accurately. An independent analysis of cognate tokens alone (N = 227) suggests this to be the case.
The left-hand side of Table 5 summarizes a separate variable rule analysis of the cognate tokens alone. Although the individual probabilities for high and low FFC tokens point in the same direction of effect as for the data set as a whole, FFC is not selected as a significant constraint on the variation of cognate tokens. Note, however, that the FFC value for each cognate, which estimates the proportion of reduced and non-reduced exemplars stored in memory, is based on usage patterns in only one language. It incorporates no information from the English lexical item that also forms part of speakers’ experience and that shares phonological and semantic ties with the Spanish word.
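The proportions in this paragraph, and the ‘close to double’ comparison, follow directly from the token counts reported in the text; the short check below uses only those counts and introduces no new data.

```python
# Token counts reported in the text: high-FFC tokens / all tokens in each category.
noncognate_high, noncognate_total = 2008, 2402
cognate_high, cognate_total = 98, 227

share_noncognate = noncognate_high / noncognate_total
share_cognate = cognate_high / cognate_total
ratio = share_noncognate / share_cognate

# Low-FFC token counts implied by the same figures.
noncognate_low = noncognate_total - noncognate_high
cognate_low = cognate_total - cognate_high

# The two categories together account for every token in the data set.
assert noncognate_total + cognate_total == 2629

print(f"non-cognate: {share_noncognate:.0%}, cognate: {share_cognate:.0%}, ratio: {ratio:.2f}")
print(f"low-FFC tokens: non-cognate {noncognate_low}, cognate {cognate_low}")
```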
Table 5. Independent multivariate analyses of factors contributing to reduced articulations [ð], Ø (versus non-reduced forms [d]) of cognates in the New Mexico Spanish-English Bilingual (NMSEB) corpus (non-significant factors within [ ]), with Frequency in a Favorable Context (FFC) based only on Spanish tokens of occurrence, and FFC including Spanish and English (Bilingual FFC).
Other factor groups included in the analysis (following vowel class, stress, frequency) were not significant.
How can the potential effect of English be operationalized or quantified in the usage-based framework adopted here? If the FFC value is replaced with the Bilingual FFC value, which incorporates the potential for paradigmatic peer pressure from English via the network of exemplar connections, the results of the variable rule analysis differ, as summarized on the right-hand side of Table 5. When the cumulative experience factor (FFC) incorporates a value reflecting the relative impact of English non-reduced forms on the exemplar representation in Spanish, and hence becomes a Bilingual FFC measure, this factor group is selected as a significant constraint on the variation of the cognate forms. High Bilingual FFC tokens strongly favor reduction, and low Bilingual FFC tokens disfavor it.
Future cognate analyses with more tokens are needed to confirm this result. Nevertheless, for the data as a whole, the results of the variable rule analysis are also improved by quantifying and incorporating experience with another language (in this case, English), which yields the best model for the data (as determined by log likelihood; see Guy, 1993, pp. 246–247). These results are summarized in Table 6. Again, preceding and following phonetic contexts significantly constrain variation of word-initial /d/. Importantly, however, cognate status makes no significant contribution to the model once the FFC factor is quantified so as to include the potential impact of English. The Bilingual FFC factor, which attempts to provide a more complete (bilingual) picture of the knowledge and use stored in speakers’ exemplar representations, significantly constrains variation: high FFC tokens favor reduction (69%), with a probability of .55, and low FFC tokens disfavor reduction (22%), with a probability of .18.
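The section does not give the formula for the Bilingual FFC. One plausible operationalization, offered purely as a hypothetical sketch, treats each English token of a cognate's counterpart (for example, English doctor) as an exposure to a non-reduced [d]: it adds to the word's total exposures but never to its favorable-context count. The function names and the `english_weight` parameter are hypothetical and are not taken from the study.

```python
def spanish_ffc(favorable_spanish: int, spanish_tokens: int) -> float:
    """Spanish-only FFC: percentage of the word's Spanish tokens in reducing contexts."""
    return 100 * favorable_spanish / spanish_tokens


def bilingual_ffc(favorable_spanish: int, spanish_tokens: int,
                  english_tokens: int, english_weight: float = 1.0) -> float:
    """Hypothetical Bilingual FFC.

    English tokens of the cognate partner are counted as exposures to a
    non-reduced [d], so they enter the denominator but never the numerator.
    `english_weight` is a hypothetical scaling of how strongly those English
    exemplars connect to the Spanish word's exemplar cloud.
    """
    exposures = spanish_tokens + english_weight * english_tokens
    return 100 * favorable_spanish / exposures


# Hypothetical counts: 18 of 20 Spanish tokens in reducing contexts, plus 20 English tokens.
print(spanish_ffc(18, 20), bilingual_ffc(18, 20, 20))
```

For non-cognates, `english_tokens` is 0 and the value reduces to the Spanish-only FFC, so under this assumption only cognates are affected, consistent with the idea that only cognates are tied to English exemplars.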
Table 6. Multivariate analysis of factors contributing to reduced articulations [ð], Ø (versus non-reduced forms [d]) in the New Mexico Spanish-English Bilingual (NMSEB) corpus (non-significant factors within [ ]), with Frequency in a Favorable Context (FFC) including experience with Spanish and English (Bilingual FFC).
The advantage of FFC over cognate status in predicting reduction may, in part, be an artifact of the measure itself. By combining several independent measures (word frequency together with the extra-lexical preceding phonological environment), the FFC factor captures more detail than the category of cognate status per se, and hence may model reduction more accurately (cf. Jurafsky et al., 2001, p. 233). However, this work is more than a methodological refinement: by suggesting that phonological cognate effects in speech production are lexicalized effects of patterns of use, it provides a theoretically consistent explanation of the source of the effect. Language patterns emerge through use. Cognates do not have a privileged role in the bilingual’s lexicon (Duñabeitia, Perea, & Carreiras, 2010); their distinct rates of variation emerge from distinct usage patterns.
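The model comparison described above relies on log likelihood (Guy, 1993). As a minimal sketch with hypothetical tokens and fitted probabilities (not values from Table 6), the Bernoulli log-likelihood of a binary outcome such as reduced versus unreduced /d/ can be computed as follows; the model whose log likelihood is closer to zero fits the observed tokens better.

```python
import math


def log_likelihood(outcomes, fitted):
    """Bernoulli log-likelihood of observed outcomes under a model's fitted probabilities.

    outcomes: 1 for a reduced token ([ð]/Ø), 0 for an unreduced token ([d]).
    fitted: the model's predicted probability of reduction for each token.
    """
    if len(outcomes) != len(fitted):
        raise ValueError("outcomes and fitted probabilities must align token by token")
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(outcomes, fitted))


# Hypothetical tokens and fitted values from two competing models.
observed = [1, 1, 0, 1, 0, 0]
model_a = [0.6, 0.6, 0.4, 0.6, 0.4, 0.4]  # separates reduced and unreduced tokens poorly
model_b = [0.8, 0.7, 0.3, 0.6, 0.2, 0.4]  # separates them better

ll_a = log_likelihood(observed, model_a)
ll_b = log_likelihood(observed, model_b)
print(f"LL(A) = {ll_a:.3f}, LL(B) = {ll_b:.3f}")
```

When models differ in their number of parameters, a raw log-likelihood difference should not be read off directly; for nested models, twice the difference can be compared against a chi-square distribution (a likelihood-ratio test).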
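The description of FFC as combining word frequency with the extra-lexical preceding environment can be made concrete. The sketch below assumes that FFC is the percentage of a word's corpus tokens whose preceding context favors reduction. Which contexts count as favorable (here only a preceding vowel, since Spanish /d/ is typically realized as a stop after a pause, nasal, or lateral), the input format, and the 50-point high/low cutoff are all assumptions for illustration, not the study's coding scheme.

```python
from collections import Counter

# Assumption (illustrative only): a preceding vowel is the context favoring reduction;
# pauses, nasals, and laterals, after which Spanish /d/ is typically a stop, are not.
FAVORABLE_PRECEDING = {"vowel"}


def ffc_by_word(tokens):
    """Map each word to its FFC: the percentage of its tokens in a favorable preceding context.

    `tokens` is an iterable of (word, preceding_context) pairs, one per corpus occurrence.
    """
    total = Counter()
    favorable = Counter()
    for word, preceding in tokens:
        total[word] += 1
        if preceding in FAVORABLE_PRECEDING:
            favorable[word] += 1
    return {word: 100 * favorable[word] / total[word] for word in total}


def ffc_class(value: float, cutoff: float = 50.0) -> str:
    """Binary high/low split; the cutoff is a hypothetical choice."""
    return "high" if value >= cutoff else "low"


# Hypothetical tokens (not NMSEB data).
tokens = [
    ("dije", "vowel"), ("dije", "vowel"), ("dije", "pause"), ("dije", "vowel"),
    ("día", "lateral"), ("día", "pause"), ("día", "vowel"), ("día", "nasal"),
]
scores = ffc_by_word(tokens)
print({word: (value, ffc_class(value)) for word, value in scores.items()})
```

Because the value depends on both how often a word occurs and where it occurs, two words with the same cognate status can receive very different FFC values, which is the extra detail the paragraph above attributes to the measure.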
Discussion
Missing from the body of literature on contact-induced phonological influence are studies that examine language variation as it occurs in speech production among members of a speech community. This study used a corpus of naturally occurring Spanish/English code-switched discourse to test the hypothesis of cross-linguistic phonological influence in the data and to add to our understanding of bilingual lexical representation. Code-switched data were ideal for examining this question because, although sound systems are known not to be impermeable to external influences (Sankoff, 2002, pp. 644–649), few studies have examined what effects, if any, alternating between languages has on variable phonological processes.
This study examined the variable realization of word-initial /d/ in Spanish. Influence from English was hypothesized to surface as increased use of the variant [d] in Spanish (as opposed to [ð] or Ø), owing to its similarity to (and support from) English. Analysis of variable rates of word-initial /d/ reduction revealed no code-switching effect; rather, words sharing phonological and semantic overlap with English (cognates) are significantly less likely to be realized with a reduced initial consonant. This is perhaps an ‘unsurprising’ result, given that phonological convergence is typically most evident precisely where two languages are most congruent (cf. Bullock & Gerfen, 2004). This result can be taken as evidence that languages are interconnected lexically. The interconnectivity, however, does not generalize to the phonemic level. That is, the cross-linguistic effects do not apply uniformly to all /d/ words (which would yield a generally decreased rate of reduction); rather, variation is lexically specific.
The significant differences between cognate and non-cognate words, however, are explained here from within a usage-based framework. Rather than arguing a priori that cognates have exceptional status in the lexicon (compared to non-cognate words), this work argues that the distinction between cognates and non-cognates emerges through the cumulative effect of significantly different patterns of use in discourse. Viewing bilingual language production in this way, as a specific case of variable use, predicts that knowledge and use of one language can have predictable effects on the knowledge and use of a bilingual’s other language.
The exemplars are organized into a network of connections relating forms that are phonologically and semantically similar (Bybee, 2001, p. 29). The lexical connections are argued to be gradient and to reflect the degree of form/meaning overlap between lexical entries. Words with a high degree of phonological and semantic similarity share stronger lexical connections than words lacking such similarities. These connections represent the form/meaning overlap from which morphology is emergent (Bybee, 1999, p. 224). For instance, Bybee (2001, pp. 152–153) highlights the lexical connections with an example of the participle suffix in Spanish /-ado/. Importantly, as Bybee notes for the participle, ‘instances of –ado are associated with one another and can have an effect on one another’. What is the nature of this influence? Rates of reduction of intervocalic /d/ in this suffix are higher than rates of /d/ reduction in the same intervocalic context outside of the participle, attributable to the higher token frequency of the /-ado/ suffix. Thus, a word with low token frequency (such as cenizar ‘to turn to ash’, with a frequency per million of .04), but with the /-ado/ suffix, will exhibit higher /d/ reduction than the lexical frequency would otherwise predict as a result of the lexical associations with the highly reducing participle form. The strong form/meaning overlap of morphemes, therefore, allows for the possibility of influence.
By the same mechanism, this predicts mutual lexical influence between other forms with strong phonological and semantic overlap. For instance, the masculine and feminine forms of the noun doctor in Spanish (doctor, doctora) would have strong lexical connections due to the high degree of similarity in form and meaning, as is illustrated in Figure 1. What does this predict for the bilingual lexicon? Precisely the type of cognate effect made apparent in the quantitative analysis of these data. In the lexicons of bilingual speakers, cognate pairs closely associate the Spanish word with the non-reduced form typical of English pronunciation for this phonological variable ([d]). Such associations bolster the strength of non-reduced exemplars in Spanish.

Figure 1. Bilingual lexical representation (cf. Bybee, 2001, pp. 22–25).
Cognates, then, reduce less often overall than non-cognates because of their different exposure to phonetic environments conditioning reduction. The cross-linguistic phonological influence from English to Spanish in this case is lexically specific. Bybee (2012) highlights the interrelatedness of general articulatory routines that are built up from lexically specific articulatory routines, and notes that ‘while individual words have specific routines associated with them, their use activates the more general routines as well’ (p. 218). Thus, the preponderance of stop [d] articulations in cognates affects not just the cognate’s exemplar cloud, but also the ‘exemplar cloud at the more general level of the articulatory routine’ (Bybee, 2012, p. 218). While such influence could predict a moderate slowing in the trajectory of /d/ lenition in Spanish (Penny, 1991, pp. 68–69) relative to non-contact varieties, there is no evidence in the current data for such a general change.
In sum, this study tests a relatively unexplored notion: that cumulative exposure to reducing phonetic contexts in discourse (FFC) shapes pronunciation. The novel application of this linguistic factor to the analysis of variation in bilingual speech reveals that usage patterns account for variation better than labels such as cognate and, in fact, explain the source of such effects in speech production. Word categories differ significantly in their usage patterns. Such patterns predict that for words used frequently in online contexts promoting reduction (i.e., non-cognates), the likelihood of producing a reduced form of the word increases. These reduced articulations increase the number (and/or strength) of the reduced exemplars stored for that word. Such patterns yield different strengths of reduced and non-reduced forms in exemplar clouds, averaged across the categories. Hence, the cognate effect is proposed to be a secondary effect of usage patterns, and even in situations of language contact, the primacy of language-internal sources of change is highlighted.
Appendix
Number of the three consultants (0–3) who judged each word to be a cognate with English (N = 2629 tokens).
| word | N | word | N | word | N | word | N | word | N | word | N |
|---|---|---|---|---|---|---|---|---|---|---|---|
| da | 0 | decirle | 0 | derecha | 0 | dibujo | 0 | dijo | 0 | domingos | 0 |
| daba | 0 | decirlo | 0 | derechita | 0 | dice | 0 | dile | 0 | dompeaba | 3 |
| dábamos | 0 | decirte | 0 | derecho | 0 | dicen | 0 | dime | 0 | dompeaban | 3 |
| daban | 0 | dedo | 0 | derrite | 0 | dices | 0 | dinerito | 0 | donden | 0 |
| dabas | 0 | dedos | 0 | desabrochaba | 0 | dicho | 0 | dinero | 0 | dondequiera | 0 |
| dado | 0 | defender | 3 | desague | 1 | dichos | 0 | dio | 0 | dormía | 0 |
| daga | 3 | defendiendo | 3 | desaparecer | 3 | diciembre | 3 | Dios | 0 | dormíamos | 0 |
| dale | 0 | deja | 0 | desapareces | 3 | diciendo | 0 | dirección | 3 | dormían | 0 |
| dame | 0 | dejaba | 0 | desaparecías | 2 | diciéndole | 0 | diremos | 0 | dormida | 0 |
| damos | 0 | dejaban | 0 | desaparecieron | 2 | diecinueve | 0 | dirían | 0 | dormido | 0 |
| dan | 0 | dejado | 0 | desbarataron | 0 | diecisiete | 0 | disciplina | 3 | dormir | 0 |
| dando | 0 | déjalo | 0 | descendidos | 3 | dientes | 1 | dispare | 1 | dos | 0 |
| dándole | 0 | déjame | 0 | desde | 0 | dentista | 3 | dispuesto | 0 | doscientas | 0 |
| dar | 0 | dejamos | 0 | deseabas | 2 | diera | 0 | distantemente | 2 | doy | 0 |
| daré | 0 | dejan | 0 | desfendía | 3 | dieran | 0 | diste | 0 | duele | 0 |
| darle | 0 | dejar | 0 | desierto | 3 | dieron | 0 | distrito | 3 | duelen | 0 |
| darles | 0 | dejarme | 0 | desmayar | 0 | diez | 0 | divertíanos | 0 | dulce | 0 |
| das | 0 | dejaron | 0 | desollábanos | 1 | diferencia | 3 | divertido | 0 | dulces | 0 |
| de | 0 | dejas | 0 | desollé | 1 | diferencias | 3 | divirtiendo | 0 | dura | 0 |
| dé | 0 | dejé | 0 | despachaba | 0 | diferente | 3 | divorciado | 3 | duramos | 0 |
| debajo | 0 | déjenme | 0 | despacio | 0 | diferentes | 3 | dizque | 0 | durante | 0 |
| debe | 0 | dejo | 0 | despeinada | 0 | difícil | 2 | doble | 3 | duraron | 0 |
| deben | 0 | dejó | 0 | despenada | 0 | diga | 0 | doce | 0 | duras | 0 |
| debía | 0 | del | 0 | despidiendo | 0 | dígame | 0 | docena | 3 | duraznitos | 0 |
| debían | 0 | delante | 0 | despidió | 0 | digan | 0 | doctor | 3 | durmiendo | 0 |
| debías | 0 | demás | 0 | después | 0 | digas | 0 | doctores | 3 | durmieron | 0 |
| decía | 0 | demasiado | 0 | destapaba | 1 | digo | 0 | dólar | 3 | duro | 0 |
| decíamos | 0 | demen | 0 | detener | 0 | dije | 0 | dólares | 3 | duró | 0 |
| decían | 0 | démenlo | 0 | detenida | 0 | dijera | 0 | dolía | 0 |  |  |
| decíanos | 0 | den | 0 | detrás | 0 | dijeran | 0 | dolían | 0 |  |  |
| decidieron | 3 | denme | 0 | detuve | 0 | dijeras | 0 | dolió | 0 |  |  |
| decimos | 0 | depende | 3 | di | 0 | dijeron | 0 | dolor | 0 |  |  |
| decir | 0 | depender | 3 | día | 2 | dijimos | 0 | dolorosos | 0 |  |  |
|  |  | deprimido | 3 | días | 2 | dijiste | 0 | domingo | 0 |  |  |
Acknowledgements
I am grateful for constructive criticism and feedback given to me on previous versions of this work by Rena Torres Cacoullos and Catherine Travis, as well as two anonymous reviewers. I would also like to thank Jonathan Steuck for assistance in data extraction and coding.
Funding
This work was partially supported by National Science Foundation (NSF) grant #1019112/1019122 awarded to Rena Torres Cacoullos and Catherine E. Travis to support the development of the NMSEB corpus.
