Abstract
Aims and objectives:
While previous research has shown that phonetic variation in language contact situations is affected by whether a word has a cognate in the contact language, this paper aims to show that such an effect is not uniform across cognates. According to the usage-based model, items in memory are organized according to similarity; thus, we anticipated that formally more similar cognates would show a stronger cognate effect.
Methodology:
This variationist sociophonetic study investigates the relationship between cognate similarity and phonetic realization. We examined this relationship in the bilingual community of Rivera, Uruguay, in which both Portuguese and Spanish are spoken with regularity. Specifically, we focused on intervocalic /d/, which in monolingual Spanish is realized as an approximant [ð̞] or phonetic zero, but in monolingual Brazilian Portuguese is produced as a stop [d] or, in most varieties, an affricate [ʤ] before [i].
Data and analysis:
We analyzed a corpus of sociolinguistic interviews of the Spanish spoken in Rivera. Acoustic measurements were taken from approximately 60 tokens each from 40 different speakers. Using a linear mixed-effects model, we examined the relationship between several predictors and the degree of constriction of intervocalic /d/.
Findings/conclusions:
While there is an overall frequency effect whereby more frequent words exhibit less constriction of intervocalic /d/, as both frequency and cognate similarity increase, more constriction of intervocalic /d/ obtains. Therefore, frequent cognates in Portuguese that have very similar forms affect the production of intervocalic /d/ more so than other cognates.
Originality:
No previous study has demonstrated that the cognate effect on phonetic variation in a situation of language contact is regulated by form similarity between cognate pairs.
Significance/implications:
The data support the usage-based model in that similar cognates have more lexical connections and can therefore show greater influence on phonetic realization than can cognates that share less phonetic material.
Introduction
A tacit assumption made by many people, including some linguists (e.g., Weinreich, 1968), is that linguistic influence in bilingual speech is unidirectional, with the native and/or dominant language influencing the non-native and/or non-dominant language. However, much evidence contradicts the idea that language influence comes only from the native and/or dominant language (e.g., Jarvis & Pavlenko, 2008). For example, it is well known among linguistic scholars that Spanish-English bilinguals in the southwestern United States demonstrate varying levels of interference from both languages in casual speech (Fought, 2003; Thomas, 2019). Bilingual speech varieties develop as a natural consequence of language contact and varying degrees of use of both languages. Bilingualism is not easily stated in terms of first language (L1) and second language (L2) since the first and/or dominant language is relative to the individual and varies widely among speakers. In fact, many bilinguals do not readily distinguish between L1 and L2 since they grew up speaking both languages. The degree to which each language exerts influence on a bilingual’s speech patterns, while perhaps shaped by one’s dominant, preferred language, is undoubtedly conditioned by multiple linguistic and social factors, and not by dominant language alone.
This is the case in Rivera, Uruguay, the largest city along Uruguay’s border with Brazil. Virtually all residents can be considered bilingual due to the long-standing contact between Spanish and Portuguese in the city. As with the U.S. Southwest, many bilinguals living in the city cannot easily distinguish between L1 and L2. Bonds with Brazil are very strong, having developed from transnational marriages, economic dependence, and constant contact between speakers of both languages along this open border.
Rivera and its sister city Santana do Livramento form one major metropolitan area. However, there are significant linguistic differences between the two, namely that few Brazilians living in Santana speak Spanish. The use of monolingual Portuguese in neighboring Santana has impacted community perceptions of the bilingual Portuguese spoken in Rivera, which is seen as impure and riddled with errors by Brazilians (and many Uruguayans, consequently) (see Waltermire, 2010, 2014). The stigma that Uruguayan Portuguese suffers is due partly to historical attempts to quash the use of Portuguese in Uruguayan schools (Elizaincín, 1992). As Carvalho (2016) observes, “Spanish was successfully imposed through obligatory public schooling and language policies, but Portuguese was maintained as a minority language” (p. 406). This, coupled with the fact that older speakers who live outside of the city itself tend to speak Portuguese the most, has led to the stereotype that Portuguese is a rural language spoken by older, typically not as well-educated Uruguayans (Carvalho, 2004; Waltermire, 2012).
The linguistic situation in Rivera reflects a dual nature of linguistic influence. Though speakers who are dominant in Portuguese seem to exhibit more influence from this language in their Spanish, many speakers in Rivera consider themselves dominant in Spanish, especially those of the youngest generation. Nevertheless, though perhaps not as robust as for older speakers, influence from Portuguese is also emblematic of the Spanish spoken by residents under the age of 25. The present research adds to the growing body of recent work (Engelhardt et al., 2018; Waltermire, 2008, 2010, 2014; Waltermire & Gradoville, 2020) that supports claims of Portuguese phonological influence on Border Uruguayan Spanish. The purpose of this study is to analyze the nature of phonetic influence from Portuguese on the Spanish spoken in Rivera, specifically the variable use of a stop articulation of intervocalic /d/, which is typically realized in Spanish as an approximant. We address the potential effects of the similarity of cognate form in particular as well as of social variables (age, sex, profession, Spanish use) and other linguistic variables (morphological status, cognate status, lexical frequency of word form, and lexical frequency of the nearest Portuguese cognate form).
Intervocalic /d/ in Spanish and Portuguese
Although Spanish and Portuguese are formally very similar (Harris & Vincent, 1988), their phonologies differ in several respects. One of the most notable differences is the articulation of intervocalic /d/. The realization of this consonant as a voiced dental stop (cada ‘each/every’ [ˈka.da]) is claimed to be obligatory in Brazilian Portuguese (Delgado Martins, 1988, pp. 71–72), except when followed by a high anterior vowel, where it may also be produced as a voiced alveopalatal affricate (idade ‘age’ [i.ˈda.di] or [i.ˈda.dʒi]; pedir ‘to ask’ [pe.ˈdi] or [pe.ˈdʒi]; de ‘of/from’ [ˈdi] or [ˈdʒi]; advogado ‘lawyer’ [a.di(ɪ).vo.ˈga.du] or [a.dʒi(ɪ).vo.ˈga.du]) due to palatalization resulting from the anticipatory fronting of the high anterior vowel. Though /d/ is realized as an affricate in these contexts in many dialects of Brazilian Portuguese (such as those of São Paulo, Rio de Janeiro, Minas Gerais, and Bahia), it is still realized as a stop in much of southern Brazil (Carvalho, 2004; Koch et al., 2002, pp. 107, 123). In every other linguistic context, however, even in non-southern varieties of Brazilian Portuguese, the realization of /d/ as a stop is standard.
Conversely, intervocalic /d/ is realized variably in nearly all modern dialects of Spanish as an approximant (todo ‘all/every(thing)’ [ˈto.ð̞o]) or a phonetic zero (lado ‘side’ [ˈla.øo]). The deletion of /d/ is restricted primarily to unstressed syllables, especially as part of the past participle morphemes -ado (first conjugation, as in hallado ‘found’) and -ido (second and third conjugations, as in sido ‘been’ and cumplido ‘accomplished/fulfilled’) (Bybee, 2001, pp. 148–153; D’Introno & Sosa, 1986). As such, approximants and phonetic zeroes for intervocalic /d/ are prototypical of Spanish.
A usage-based account of Spanish /d/ in contact with Portuguese
Portuguese and Spanish share a large portion of their vocabulary because they are such closely related languages. Ulsh (1971, p. x), for example, estimated that 85% of the Portuguese lexicon has cognates in Spanish, while Green (1988, p. 124) estimated 92% root cognacy between the two languages. Bilinguals in these two languages thus experience more overlap in material than bilinguals of unrelated languages. Such lexical overlap in situations of bilingualism has been known to influence the production of speech sounds in the language varieties in question. For example, Torres Cacoullos and Ferreira (2000) found in their study of New Mexican Spanish /b/ that low-frequency words with an English cognate containing the bilabial phoneme instead of the labiodental showed a much lower rate of labiodental production than any other group of words.
The usage-based model is uniquely suited to explaining cognate effects and their interaction with lexical frequency. The present study is concerned primarily with the first and third basic principles of the usage-based model as described by Bybee (2001, pp. 6–8) and their application to cognates in the speech of bilinguals, although other principles remain relevant to the discussion. Regarding the third principle, that “[c]ategorization is based on identity or similarity,” previous research (Bybee, 2001; Díaz-Campos & Gradoville, 2011) has argued that the high rate of /d/ reduction in -ado participles in Spanish is due to lexical connections between the stored word forms containing the morpheme, which has high type frequency in Spanish. Lexical connections need not be limited to individual morphemes in one language, however. In the bilingual mind, material from one language is not isolated from material from the other language. Since the usage-based model assumes that language is stored from actual instances of language use that a user encounters or uses, similar sounds between two languages have their own representations (e.g., /p/ and /k/ in English-Spanish bilingualism, which have longer voice onset times in English). As in the case of Spanish participles, categorization based on similarity is made possible by lexical connections in memory between lexical items in the two languages that share similar sounds. Figure 1 presents some examples of lexical connections involving /d/ in the Portuguese-Spanish contact situation. Portuguese lexical items have a rounded rectangle around them while Spanish lexical items have a regular rectangle. The allophones represented are idealizations and are subject to variation in the variety under study. Solid lines have been used for lexical connections where the sound is identical while dashed lines have been used where the sound differs in some way.

Figure 1. Lexical connections between similar sounds in Portuguese-Spanish bilingualism.
For cognates, defined as “pairs of words that are perceived as similar and are mutual translations” (Inkpen et al., 2005, p. 252) (see Section 2.3.1), connections between the lexical items in the two languages are based on shared semantic information in addition to shared phonetic structure. The degree to which cognate pairs overlap in phonetic form, however, can be conceptualized in a gradient manner. In Figure 2, we can see examples of lexical connections between Portuguese and Spanish that result from complete overlap (cada~cada ‘each’), partial overlap (moeda~moneda ‘coin, currency’), and no overlap (peço~pido ‘I ask for’). As we can see, the number of lexical connections is a function of the amount of overlap in the forms. Note that many lines are still dashed because the likely phonetic realization of the sound will differ to some extent between Portuguese and Spanish. From the perspective of the usage-based model, phonetically similar cognates should have a stronger effect on the realization of the lexical item in the other language. Thus, we would expect cada to show more of a cognate effect than moneda or, especially, pido, whose Portuguese equivalent lacks the /d/ that the Spanish form has.

Figure 2. Lexical connections between Portuguese and Spanish cognates.
The first principle, that “experience affects representation” (Bybee, 2001, p. 6), has received considerable attention due to the debate over frequency effects. Bybee (2006) defines three main frequency effects: the reducing effect, the conserving effect, and autonomy. Both the reducing effect and the conserving effect are predicted to play a role in the variation under study here. The reducing effect of high frequency refers to the phenomenon in which high-frequency words and sequences undergo reductive sound change prior to and to a greater extent than low-frequency words and sequences. Studies have shown the reducing effect to play a role in a wide variety of variable phonetic processes, such as English unstressed vowel reduction (Hooper, 1976), English t/d deletion (Bybee, 2002; Gregory et al., 1999), English word boundary palatalization (Bush, 2001), Spanish /s/ reduction (Brown, 2009; Brown et al., 2014; Brown & Torres Cacoullos, 2003; File-Muriel, 2009; File-Muriel & Brown, 2011; Minnick Fox, 2006), Spanish syllable-final /ɾ/ reduction (Díaz-Campos & Ruíz-Sánchez, 2008; Ruíz-Sánchez, 2007), Spanish intervocalic /ɾ/ reduction in para (Bedinghaus, 2013; Díaz-Campos et al., 2012), Portuguese para reduction to pra or pa (Gradoville, 2017; Huback, 2012), and several other examples cited by Phillips (2006).
The reducing effect of frequency is known to affect Spanish intervocalic /d/ (Bybee, 2001; Díaz-Campos & Gradoville, 2011; Eddington, 2011), the variable under study, although some researchers find frequency to have limited (Solon et al., 2018) to no (Bedinghaus & Sedo, 2014) effect on their data. Brown (2015), studying spirantization in word-initial /d/, found no significant frequency effect, but rather found that the frequency with which a word’s /d/ occurred in contexts favorable to spirantization influenced whether spirantization occurred when /d/ appeared in unfavorable contexts. Looking at Peninsular Spanish processes affecting word-final /d/, Hualde and Eager (2016) found no significant effect for lexical frequency in their sample.
In addition to the reducing effect of high frequency, high frequency of use is also known to have a conserving effect, where high-frequency items resist analogical leveling toward more productive patterns because of their strong lexical representation in memory (Bybee, 2006). Frequently accessed phonological variants are strengthened in a speaker’s stored memory, thereby making the access of these items more efficient. Although the conserving effect of frequency is most commonly associated with morphosyntax, phonetic processes such as noun stress pattern shifts of English diatone pairs (Phillips, 2006, pp. 34–39), glide deletion in Southern American English (Phillips, 2006, pp. 76–81), Middle English unrounding of /ø(ː)/ (Phillips, 2006, pp. 84–87), Argentine Spanish /ʒ/ devoicing (Díaz-Campos & Gradoville, 2011; Gradoville, forthcoming), consonant retention in French liaison (Bybee, 2001, ch. 7), and Caribbean coastal Colombian Spanish word-final /s/ retention (Brown et al., forthcoming) have also been found to be subject to the conserving effect of high frequency. The conserving effect plays a role in the variation examined in the present study in multiple ways. We first expect that speakers that use Spanish more often will, due to Spanish lexical items’ stronger representation in their memory, realize /d/ in accordance with pan-Hispanic norms, as approximants and/or deletions, with greater frequency than those who use Portuguese more often, who would be expected to use more stops due to the stronger representation of Portuguese lexical items in their memory. Furthermore, words with highly frequent, similarly formed Portuguese cognates (especially those that are more frequent than their Spanish counterparts) are expected to exhibit more stop-like productions of Spanish /d/ relative to the norm. 
Taken together, the usage-based model predicts that high-frequency words with intervocalic /d/ will show more reduction than low-frequency words, unless the Spanish word has a high-frequency Portuguese cognate that is very similar to the Spanish word.
Methodology
The primary objective of this study was to examine the effects of social and linguistic variables on intervocalic /d/ realization in Rivera Spanish. Building on previous research, our variationist analysis includes the examination of cognate-related variables given the number of lexical connections between Spanish and Portuguese. The inclusion of these variables permits evaluation of two key principles in the usage-based model (Bybee, 2001).
Corpus and data
Speech data came from Waltermire’s (2006) corpus of Portuguese-Spanish bilinguals from Rivera, Uruguay. The corpus comprises a total of 63 speakers (35 women, 28 men) representing three age groups (16–25, 26–50, and 51–78 years) and three occupational groups (non-professional, professional, and student). Speakers were recorded, either individually or in a group setting with other speakers, while participating in a sociolinguistic interview in Spanish, and recordings ranged between 30 minutes and 3 hours (for a total of 45 hours of casual conversation).
Data from 40 of the 63 speakers were selected for instrumental analysis in the present study. Because students demonstrated little to no evidence of language contact (see Waltermire & Gradoville, 2020), they were excluded from the present dataset. Additionally, speakers who did not self-report their use of Spanish were excluded because their data did not permit analysis of the Spanish use variable (see the section on social variables). Sixty tokens per speaker of intervocalic /d/ in word-medial and word-initial position were targeted for acoustic analysis (for a theoretical ceiling of 2,400 tokens); however, exclusions were made based on disfluencies or background noise, and some speakers did not produce 60 tokens. Individual tokens were identified auditorily in each recorded interview and marked in Praat (Boersma & Weenink, 2017) for acoustic analysis, as described in the following section.
Acoustic analysis and response variable
Intervocalic /d/ realization in Spanish has long been recognized as gradient in nature, although categorical symbolic representation based on impressionistic coding was once standard in order to represent it (D’Introno & Sosa, 1986). While some recent work on the variable has based categorical classification of the variable on observations of the spectrogram and the waveform (e.g., Rao, 2015), much recent work on /b d ɡ/ realization has applied instrumental acoustic measurements to model this inherently gradient phenomenon in a gradient manner (Engelhardt et al., 2018; Gilbert, 2019; Hualde et al., 2011; Waltermire & Gradoville, 2020, inter alia). The primary measurement used to this end is a consonant-vowel intensity ratio. Generally speaking, the more constriction (stop-like) a consonant has, the lower its intensity. The less constriction (deletion-like) a consonant has, the higher its intensity. Approximants, naturally, find themselves between the two extremes. Because the absolute intensity of recorded sound depends on several factors (recording level, speaker volume, speaker distance from microphone, etc.), it is crucially important to measure the consonant with respect to a nearby reference sound, which is usually an adjacent vowel.
To this end, individual tokens of intervocalic /d/ and the vowel following it were delimited in Praat TextGrids (Boersma & Weenink, 2017), as illustrated in Figure 3 for perseguidos, where /d/ generates a significant drop in intensity since it is produced as a stop, and for la diferencia, where /d/ has almost no effect on the intensity curve since it is almost completely elided. The delimitation procedure included two interval tiers to correspond to /d/ and its following vowel. The delimitation of /d/ included the minimum intensity associated with the consonant, while the delimitation of the following vowel included the maximum intensity of the vowel. When /d/ was articulated weakly, the location of the presumed consonant was sometimes arbitrary and so the researchers used perception to delimit its location. The vowel following /d/ was delimited starting at the end of the delimited /d/ and ending at the end of the vowel (or diphthong). The minimum and maximum intensities of /d/ and the following vowel, respectively, were extracted using a script for subsequent statistical analysis. The consonant-vowel intensity ratio was derived by subtracting the minimum intensity of /d/ from the maximum intensity of the vowel. Lower values of this ratio indicate weaker articulations whereas higher values indicate stronger articulations.
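The derivation of the response variable can be illustrated with a minimal sketch. It assumes that intensity has already been sampled (in dB) within the two delimited intervals; because decibels are logarithmic, the subtraction corresponds to a ratio of linear intensities. The function name and the sample values are hypothetical and are not taken from the study's extraction script.

```python
def cv_intensity_ratio(consonant_db, vowel_db):
    """Maximum intensity of the following vowel minus minimum intensity of /d/ (dB)."""
    if not consonant_db or not vowel_db:
        raise ValueError("both intervals need at least one intensity sample")
    return max(vowel_db) - min(consonant_db)


# Hypothetical intensity samples (dB), invented for illustration only.
stop_like = cv_intensity_ratio([52.0, 48.5, 50.1], [66.0, 68.2, 67.5])
near_elided = cv_intensity_ratio([66.9, 66.4, 67.0], [67.8, 68.0, 67.2])

# A larger difference indicates more constriction (a more stop-like /d/).
print(round(stop_like, 1), round(near_elided, 1))
```

Under these invented values the stop-like token yields a much larger difference than the nearly elided token, in line with the interpretation of the measure given above.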

Figure 3. Waveform, spectrogram, intensity curve, and Praat TextGrid of perseguidos ‘persecuted’ (left) and la diferencia ‘the difference’ (right).
Predictor variables
Within this section, the predictor variables used in this study are described. The cognate similarity variables have their own dedicated section since they are central to this study and, to our knowledge, have never been operationalized as variables in variationist sociolinguistic studies.
Cognate similarity variables
Several definitions for the term cognate exist. From a historical linguistics perspective, Kondrak (2001) defines cognates as “words in related languages that have developed from the same ancestor word” (p. 1). Inkpen et al. (2005) call these word pairs genetic cognates, and following their definition (i.e., “pairs of words that are perceived as similar and are mutual translations,” [p. 252]), todo—todo and pueden—podem are examples of cognate pairs for Spanish and Portuguese. In a similar vein, Simard et al. (1992) define cognates as “pairs of tokens of different languages which, usually due to a common etymology, share ‘obvious’ phonological or orthographic properties, as well as semantic properties, so that they are likely to be used as mutual translations” (p. 1073). These three definitions highlight an important consideration with respect to word pairs that are named cognates: definitions fall on a continuum from broad to narrow depending on how cognates will be examined in a specific field of study. For example, Inkpen et al. (2005) adopt a definition that is relatively broad in scope and relevant to the topic of second language learning, whereas Kondrak’s (2001) definition is more narrow to meet his goal of comparing the vocabularies of related languages from a historical point of view. The present study approached the study of cognate similarity with Inkpen et al. and Simard et al.’s conceptualization of cognate in mind. While this approach facilitates a quantitative examination of cognate similarity between Spanish and Portuguese word pairs, we do not assume that words from cognate pairs function equivalently or similarly across both languages.
We used four measures of cognate similarity to examine quantitatively the formal closeness of cognates in Spanish and Portuguese, as well as the influence of this variable on /d/ realization in Rivera Spanish. All cognate similarity measures were based on the orthographies of the cognate pairs since phonetic form is a moving target and phonological form depends on the analyst. The first cognate similarity measure, PREFIX, was calculated by dividing the length of the common prefix (or first shared part) from each word by the length of the longer string (Inkpen et al., 2005, adapted from Simard et al., 1992), as shown in (1).
(1) tradiciones ~ tradições
PREFIX = LENGTH_COMMON_PREFIX ÷ LENGTH_LONGER_WORD = 6 ÷ 11 = 0.55
The second, Dice’s similarity coefficient (hereafter DICE), was calculated by dividing twice the number of shared letter bigrams (i.e., two consecutive letters) by the total number of bigrams in both words (Adamson & Boreham, 1974), as shown in (2).
(2) devolvieron ~ devolveram
DICE = 2 × (SHARED_BIGRAMS ÷ (BIGRAMS_WORD1 + BIGRAMS_WORD2)) = 2 × (6 ÷ (10 + 9)) = 0.63
The third, LCSR (or longest common subsequence ratio), was computed by dividing the length of the longest common subsequence by the length of the longer string (Melamed, 1999), as shown in (3).
(3) adonde ~ aonde
LCSR = LENGTH_COMMON_SUBSEQUENCE ÷ LENGTH_LONGER_WORD = 5 ÷ 6 = 0.83
The fourth and final measure, NED (or normalized edit distance), was calculated by counting the minimum number of edits (e.g., substitutions, insertions, and deletions) needed to transform one word into another and then dividing this sum by the length of the longer string (Wagner & Fischer, 1974), as shown in (4).
(4) ayuden ~ ajudem
NED = COUNT_EDITS ÷ LENGTH_LONGER_WORD = 2 ÷ 6 = 0.33
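The four measures can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure actually used in the study: all function names are hypothetical, words are assumed to be non-empty, bigrams are counted as a multiset, and the diacritic folding in fold() is an assumption (it is the treatment under which the six shared initial letters of example (1) are obtained).

```python
import unicodedata
from collections import Counter


def fold(word):
    """Lowercase and strip diacritics (an assumption, not stated in the text)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_ratio(a, b):
    """PREFIX: length of the common prefix over the length of the longer word."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def dice(a, b):
    """DICE: twice the shared letter bigrams over the total bigrams in both words."""
    bigrams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bigrams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    shared = sum((bigrams_a & bigrams_b).values())
    total = sum(bigrams_a.values()) + sum(bigrams_b.values())
    return 2 * shared / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a, b):
    """LCSR: longest common subsequence over the length of the longer word."""
    return lcs_length(a, b) / max(len(a), len(b))


def edit_distance(a, b):
    """Minimum number of substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def ned(a, b):
    """NED: edit distance over the length of the longer word."""
    return edit_distance(a, b) / max(len(a), len(b))


print(prefix_ratio(fold("tradiciones"), fold("tradições")))
print(dice("devolvieron", "devolveram"))
print(lcsr("adonde", "aonde"))
print(ned("ayuden", "ajudem"))
```

Applying the functions to the word pairs in (1) through (4) gives a quick check on hand-calculated values. Whether diacritics should be folded before comparison is a design choice that affects PREFIX in particular.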
All measures produce values between 0 and 1, with values closer to 1 indicating greater orthographic similarity between the words of a pair for PREFIX, DICE, and LCSR, while values closer to 0 indicate the same for NED. These cognate similarity measures were chosen based on their relatively high rate of accuracy in identifying French and English cognates (between 90% and 94%; Inkpen et al., 2005). The cognate similarity measures required us to identify a single Portuguese cognate word form for each Spanish word form among the data. Word forms were used instead of lemmas since lexically-specific phonetic variation is known to be associated with forms (Brown et al., 2014). We chose the cognate form with the greatest degree of functional equivalence. For example, although the Spanish second-person singular verb form podés ‘you can’ has an etymologically equivalent vós form podeis in Portuguese and a semantically equivalent tu form podes, most varieties of Brazilian Portuguese use the etymological third-person singular form pode for second-person singular even if the tu pronoun is used. Consequently, Portuguese pode was deemed to be the cognate of Spanish podés for the purposes of measurement.
Other linguistic variables
Lexical frequency has been shown to condition the reduction and deletion of /d/ in high-frequency words like todo ‘all/every(thing)’ [ˈto.ðo] > [ˈto] and cada ‘each/every’ [ˈka.ða] > [ˈka] (Bybee, 2001, pp. 148–153). Multiple studies show that /d/ deletion in Spanish is favored in highly frequent words as well as in the past participle morphemes -ado and -ido (Bybee, 2001, pp. 148–153; Díaz-Campos & Gradoville, 2011; D’Introno & Sosa, 1986). These morphemes occur frequently in discourse as part of the Spanish perfect constructions, which consist of a conjugated form of the auxiliary verb haber ‘to be/have’ and the past participle, similar to English. Due to their high frequency, it is expected that /d/ in past participle morphemes will be realized with less constriction than /d/ elsewhere. Though reduction and deletion follow this pattern for monolingual varieties of Spanish, what happens in bilingual communities when speakers possess expanded phonological repertoires? With respect to Spanish-Portuguese bilinguals, such as those who participated in the current study, how is the realization of intervocalic /d/ impacted by the presence of stop articulations that characterize Portuguese?
The effect of language use on the realization of intervocalic /d/ may potentially rein in some of these changes as shown in Blas Arroyo (2006), Waltermire (2010), and Michnowicz (2011). Blas Arroyo (2006) found that Spanish-dominant speakers in Catalonia reduce intervocalic /d/ far more often than do Catalan-dominant speakers, while Waltermire (2010) showed a similar trend in the bilingual community of Rivera, Uruguay. Using self-reported frequencies of language use according to multiple interlocutors and domains (with overall percentages of use being averaged for each participant), Waltermire (2010) found that Spanish-dominant speakers delete /d/ to a far greater extent than do Portuguese-dominant speakers. Michnowicz (2011) showed that speakers whose parents were monolingual Yucatec Maya speakers were more likely to realize /d/ as a stop. In these varieties, language contact favors occurrence of stops and inhibits deletion rates, a force not at play in monolingual varieties. For this reason, we expect Spanish-dominant speakers in the current study to realize intervocalic /d/ with less constriction than Portuguese-dominant speakers, who we expect to realize /d/ with greater constriction, especially for highly frequent words in Portuguese. Token frequencies were determined using the Corpus del español and Corpus do português (Davies, 2001; Davies & Ferreira, 2006), online corpora from which frequencies per million words were extracted from the 1900s spoken data. In the case of Portuguese, we used only spoken data from Brazilian speakers. To be sure, this method of measuring lexical frequency is not ideal and potentially limiting as the speech represented in these corpora does not reflect the speech of the target bilingual community. If available, relevant bilingual corpora should be used to measure word frequency in bilingual speech data.
In addition to addressing individual word and cognate frequencies, we also introduced a new variable, word bias, that addressed the extent to which a Spanish word was used more or less often than its Portuguese cognate. Word bias was calculated using the formula in (5). This formula produced a positive number if the word was more biased toward Spanish or a negative number if the word was more biased toward Portuguese. Values close to zero indicate that the word is of approximately equal frequency in the two languages.
(5) WORD_BIAS = log(SPANISH_LEXICAL_FREQUENCY + 1) − log(PORTUGUESE_COGNATE_FREQUENCY + 1)
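The behavior of the word bias formula can be seen in a short sketch. The base of the logarithm is not specified in the text, so the natural logarithm is assumed here; the sign of the result does not depend on that choice. The function name and the frequencies (per million words) are hypothetical.

```python
import math


def word_bias(spanish_freq, portuguese_freq):
    """log(Spanish frequency + 1) minus log(Portuguese cognate frequency + 1)."""
    # Natural log is an assumption; adding 1 keeps zero frequencies defined.
    return math.log(spanish_freq + 1) - math.log(portuguese_freq + 1)


# Hypothetical frequencies per million words, invented for illustration only.
print(word_bias(250.0, 40.0))   # positive: biased toward Spanish
print(word_bias(40.0, 250.0))   # negative: biased toward Portuguese
print(word_bias(120.0, 120.0))  # zero: equal frequency in both languages
```

Adding 1 before taking the logarithm means that a word with no attested Portuguese cognate frequency still receives a finite, positive bias value.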
Social variables
Following previous sociolinguistic research using Waltermire’s corpus (2006, 2008, 2010), four social or extralinguistic variables were included for analysis in the present study: sex, age, profession, and Spanish use. The first three social variables are arguably the most traditional and have been included given that differences in sex, age, and profession are well documented in studies on language variation and change. Waltermire (2008) has shown that intervocalic /d/ is socially conditioned in Rivera Spanish; specifically, younger speakers produce fewer occlusive realizations of intervocalic /d/ than older speakers, and students delete intervocalic /d/ at a higher rate than professionals and non-professionals. As previously discussed, rate of Spanish use has also been shown to influence Rivera speakers’ production of intervocalic /d/, with speakers who reported less than 40% use of Spanish (as compared to Portuguese) producing stops two and five times more than speakers who reported between 40% and 80% and over 80% Spanish use, respectively (Waltermire, 2010). The present study examines these social variables alongside linguistic (including cognate-related) variables to better explain patterns of use in Rivera speakers’ production of intervocalic /d/.
Analysis
Linear mixed-effects models using the lme4 (Bates et al., 2015) and optimx (Nash & Varadhan, 2011) packages in R (R Core Team, 2019) were fit with several varying intercepts in order to account for hierarchical groupings in the data. Varying intercepts were included for the speaker of the token as well as for word and bigram (if the /d/ was word-initial). Additionally, four varying intercepts related to the vowel height and stress of the preceding and following vowels were included in the model, since they may impact the consonant-vowel intensity ratio in ways that are not relevant to the present study (Gradoville, 2011; Waltermire & Gradoville, 2020).
The maximal models included the following fixed effects as predictors of the consonant-vowel intensity ratio: (AGE × PROFESSION × SEX × SPANISH_USE) + (COGNATE_SIMILARITY × log(PORTUGUESE_COGNATE_FREQUENCY + 1) × SPANISH_USE) + MORPHOLOGICAL_STATUS + SPANISH_LEXICAL_FREQUENCY + WORD_BIAS
All four cognate similarity variables were tested separately, creating four sets of models. Interval predictors were z-scaled. Where there was no Portuguese cognate, the distribution mean was imputed for cognate similarity. Non-significant terms were discarded systematically, starting with the term with the highest p-value, until only significant main effects and interaction terms remained. The resulting Minimal Adequate Model is presented in the results.
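The preprocessing steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only (the original analysis was carried out in R), and the toy values, the function names, the order of imputation before scaling, and the use of the sample standard deviation are all assumptions rather than details reported in this study.

```python
import math
from statistics import mean, stdev


def impute_mean(values):
    """Replace None (word has no Portuguese cognate) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def z_scale(values):
    """Center on the mean and divide by the sample standard deviation (an assumption)."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]


# Hypothetical toy data, not taken from the corpus.
similarity = [1.0, 0.5, None, 0.75]   # None marks a word without a cognate
cognate_freq = [0, 9, 99, 999]        # hypothetical raw frequency counts

# Cognate similarity: impute the distribution mean, then z-scale.
similarity_z = z_scale(impute_mean(similarity))

# Portuguese cognate frequency: natural log of (frequency + 1), then z-scale.
log_freq_z = z_scale([math.log(f + 1) for f in cognate_freq])
```

In this sketch, a word without a cognate ends up with a scaled similarity of zero, so it sits at the center of the similarity distribution and contributes nothing to the interaction term.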
Results
It is important to first note that the four cognate similarity measures correlated very strongly with one another, as shown in the correlation plot in Figure 4. Moreover, the results of the various regression analyses were essentially the same regardless of which cognate similarity measure was used. Consequently, the inclusion of all the models would introduce unnecessary redundancy into the discussion of results. Therefore, we only present the model including PREFIX as the measure of cognate similarity because it resulted in the best model fit according to the Akaike Information Criterion (AIC) (PREFIX: 6218.430; LCSR: 6218.495; DICE: 6220.945; NED: 6221.012), although for practical purposes any of the four measures would have sufficed.
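The comparison of model fit can be made concrete by expressing each AIC value as a difference from the lowest one. The following sketch, in Python for illustration only, uses the four AIC values reported above; the variable names are hypothetical.

```python
# AIC values as reported in the text for the four cognate similarity measures.
aic = {"PREFIX": 6218.430, "LCSR": 6218.495, "DICE": 6220.945, "NED": 6221.012}

# The best-fitting model is the one with the lowest AIC.
best = min(aic, key=aic.get)

# Difference of each model's AIC from the best model's AIC.
delta_aic = {name: round(value - aic[best], 3) for name, value in aic.items()}

for name, delta in sorted(delta_aic.items(), key=lambda item: item[1]):
    print(f"{name}: delta AIC = {delta:.3f}")
```

On these values, LCSR, DICE, and NED differ from PREFIX by 0.065, 2.515, and 2.582 AIC units, respectively, which is consistent with the observation that any of the four measures would have sufficed.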

Correlation plot of the four measures of cognate similarity.
It is also important to note, prior to discussing the results, that R automatically dropped Spanish lexical frequency from the regression when Portuguese cognate frequency was included, due to the extremely strong correlation between the two variables. As a result, one variable (Portuguese cognate frequency) must carry the load of representing frequency in both languages.
Table 1 presents the fixed effects from the minimal adequate linear mixed-effects model. Since the response variable is the consonant-vowel intensity ratio, positive estimates are in the direction of more constriction while negative estimates are in the direction of less constriction. For each variable (and variable level), the estimate and associated p-value are presented along with the means and token counts for portions of the data.
Fixed effects of mixed-effects model of predictors of intervocalic /d/ constriction.
*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001.
We can see in Table 1 that there is a statistically significant interaction between Portuguese cognate frequency and cognate similarity. In addition, the linguistic variable word bias contributes to the variation at a statistically significant level along with the social predictors age, Spanish use, and speaker sex.
Lexical/cognate frequency and cognate similarity
Notably for the present study, the interaction between Portuguese cognate frequency and cognate similarity was significant. The significant main effect for Portuguese cognate frequency has a negative correlation with the consonant-vowel intensity ratio (−0.150590): As the word gets more frequent, there is on average less constriction of intervocalic /d/. Cognate similarity does not play a significant role as a main effect; however, its interaction with Portuguese cognate frequency does (estimate = 0.101839): Relative to the overall negative correlation between Portuguese cognate frequency and the ratio, as both frequency and similarity increase, there is more constriction. It is important to note that Spanish lexical frequency and Portuguese cognate frequency are not readily distinguishable from a statistical standpoint, so the combined effects of both must be borne in mind when interpreting these results.
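The way the interaction modifies the frequency effect can be sketched as a conditional slope: the frequency coefficient plus the interaction coefficient multiplied by scaled cognate similarity. The Python sketch below is illustrative only. It uses the two estimates quoted above, while the function names and the scaled similarity values are hypothetical, and it leaves aside every other term in the model.

```python
# Estimates quoted in the text (fixed effects on z-scaled predictors).
B_FREQ = -0.150590        # main effect of Portuguese cognate frequency
B_FREQ_X_SIM = 0.101839   # frequency x cognate similarity interaction


def frequency_slope(similarity_z):
    """Model-implied slope of the intensity ratio on scaled frequency
    at a given scaled cognate similarity (other terms held fixed)."""
    return B_FREQ + B_FREQ_X_SIM * similarity_z


def zero_slope_similarity():
    """Scaled similarity at which the model-implied frequency slope is zero."""
    return -B_FREQ / B_FREQ_X_SIM


# Hypothetical scaled similarity values, chosen only to show the direction of change.
for sim_z in (-1.0, 0.0, 1.0):
    print(f"similarity z = {sim_z:+.1f}: frequency slope = {frequency_slope(sim_z):+.6f}")
```

Under these two coefficients alone, the slope equals −0.150590 at mean similarity (z = 0) and becomes less negative as similarity increases, reaching zero at a scaled similarity of roughly 1.48. The sketch covers only these two fixed-effect terms and says nothing about the raw trends plotted in Figure 5.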
To understand what this interaction means for practical purposes, it was plotted in Figure 5 with regression lines representing the relationship between Portuguese cognate frequency and the consonant-vowel intensity ratio in three conditions: (1) when the Portuguese cognate was identical to the Spanish word, which was nearly half the time, (2) when PREFIX was in the 2nd or 3rd quartile because the cognate was similar to the Spanish word, and (3) when PREFIX was in the 1st quartile because the cognate was relatively different from the Spanish word. Figure 5 shows that, when the Portuguese cognate is dissimilar to the Spanish word, there is no apparent relationship between frequency and constriction. On the other hand, when the Portuguese cognate is similar or identical to the Spanish word, there is a positive correlation between frequency and constriction. The implications of this result will be addressed in the Discussion.

Trends of combined effect of Portuguese cognate frequency and cognate similarity.
Word bias
The word bias variable was designed to account for cases where the Spanish word that contains the token of /d/ and the Portuguese cognate differ markedly in terms of frequency, since this would impact their relative lexical strength in memory. This variable significantly predicts the consonant-vowel intensity ratio. The negative coefficient for word bias in Table 1 (−0.131470) indicates that, as a word becomes more biased toward Spanish, it is produced with less constriction on average than a word biased toward Portuguese. This effect exists in concert with the other variables included in the model.
Social predictors
In addition to the internal linguistic predictors, three social predictors significantly predict the realization of intervocalic /d/ in Rivera Spanish. First, as we can see in Table 1, age significantly predicts the consonant-vowel intensity ratio of /d/ (0.156493): older participants produce /d/ with more constriction than younger participants. We can observe this relationship in Figure 6, which is a scatter plot of age and the consonant-vowel intensity ratio. The participants above age 60 do not produce /d/ with a ratio of less than 6 dB on average, indicating relatively constricted productions. In contrast, some younger participants produce /d/ with an average ratio less than 4 dB, which is indicative of relatively less constriction.

Scatter plot of age effect.
A second significant social effect in Table 1 is that of Spanish use. The negative correlation between Spanish use and the consonant-vowel intensity ratio (−0.130963) indicates that as participants use more Spanish, they tend to produce intervocalic /d/ with less constriction. This effect is visualized in Figure 7, which is a scatter plot of the relationship. Although there is much variation, participants who use more Spanish strongly tend to have lower consonant-vowel intensity ratios on average.

Scatter plot of Spanish use effect.
The remaining significant social effect in Table 1 is that of speaker sex (Male = −0.282083): men, on average, produce intervocalic /d/ with less constriction than women. This relationship is represented in the violin and box plot in Figure 8: While there is considerable overlap between their interquartile ranges, women on average produce higher consonant-vowel intensity ratios, and their averages extend higher than those of men.

Violin and box plot of speaker sex effect.
Discussion
Following a variationist sociolinguistics approach, the present study sought to examine the effects of social and linguistic variables on intervocalic /d/ realization in Rivera Spanish. Regarding the social variables examined, age, Spanish use, and sex significantly influenced the intensity ratio of intervocalic /d/ realizations in speakers’ Spanish. Specifically, younger speakers, male speakers, and speakers who reported using more Spanish produced intervocalic /d/ with less constriction (i.e., a less stop-like articulation) than their respective counterparts. These results align with previous accounts of intervocalic voiced obstruents in Rivera Spanish (e.g., Engelhardt et al., 2018; Waltermire, 2010; Waltermire & Gradoville, 2020). Though these results are expected for younger speakers who speak mostly Spanish, it is not expected that men produce intervocalic /d/ with less constriction than women. Considering that this is the same group of participants as in Waltermire and Gradoville (2020), we can reasonably assume that this results from the fact that “speakers with the highest rate of stop-like productions are all women, [who have] a disproportionate effect in skewing the average consonant-vowel intensity ratios of the women as a social category” (p. 283).
Regarding the linguistic variables included for analysis, we found a significant main effect for word bias as well as a significant interaction between Portuguese cognate frequency and cognate similarity (based on PREFIX). Intervocalic /d/ realization in words biased toward Spanish had less constriction than in words biased toward Portuguese. This finding makes sense since, to the extent that Riverenses maintain separate phonological schemata, words that are biased toward one language or the other are used far more often in that language, and the center of the associated exemplar cluster is more likely the variant associated with the language in question. For example, although both languages have a word form lados ‘sides’, the word is far more frequent in Spanish and less constriction would consequently be predicted. A word that is more frequent in Portuguese, such as todo ‘all, every’, would be predicted to exhibit greater constriction. We indeed observe such behavior, as todo (mean intensity ratio = 4.241) has more constriction on average than lados (mean intensity ratio = 2.9848).
With respect to the interaction between frequency and cognate similarity, our findings connect Rivera Spanish to what is known about intervocalic /d/ in Spanish and add to what we know about bilingual phonological representation. Regarding the overall frequency effect, the main effect of frequency shows that high frequency results in significantly decreased constriction, consistent with findings from other varieties of Spanish (Bybee, 2001; Díaz-Campos & Gradoville, 2011; Eddington, 2011), thus providing additional support for the reducing effect of high frequency (Bybee, 2006). This frequency effect, however, is not uniform in this variety of Spanish. Specifically, we also found that when the Portuguese cognate was similar or identical to the Spanish word, the relationship shifted to a positive correlation between frequency and constriction. This indicates that, due to a combination of cognate similarity, which results in denser lexical connections between the Portuguese and Spanish words in memory, and the conserving effect of high frequency, the representations associated with Portuguese have greater impact on the productions of the Spanish words: these words carry the increased constriction associated with Brazilian Portuguese. These findings demonstrate how two of the main properties of the usage-based model, namely the effect of usage on linguistic structure and the categorization based on similarity, operate in a situation of bilingualism where the two languages share many cognates.
Our research further speaks to the nature of crosslinguistic phonetic influence in two languages that have notable lexical overlap. As pointed out by Amengual (2016), previous research has shown that cognate status does influence the acoustic realization of sound segments. Because 92.5% of our data come from words that have a cognate in Portuguese, it is difficult to address the cognate versus non-cognate effect, and, although we tested the effect at one point, there was no significant difference between cognates and non-cognates, loans, proper nouns, and false cognates. Our findings add to what we know about cognates, namely that the cognate effect on phonetic realization is not constant, but rather is modulated by the combined effect of word frequency and the similarity of cognate forms: Forms that are similar in the two languages are more likely to show a cognate effect, especially when they are of high frequency.
Although our findings add to what we know about crosslinguistic phonetic influence between two languages, much remains to be investigated. While we have addressed the Spanish of Rivera, a future study should address intervocalic /d/ realization in both the Portuguese and Spanish of this population in order to, among other things, determine the extent to which the crosslinguistic influence is symmetrical. Additionally, measures of word frequency in bilingual speakers should be based on bilingual language corpora when such corpora are available. Future research should examine different situations of contact in order to determine how the nature of the bilingualism influences this process. In the present study, speakers learn both languages from a young age, but different orders of acquisition may affect the cognate process. Furthermore, it is important to study different language pairings based on quantity and similarity of cognates to ascertain the extent to which these two properties influence the quantity of crosslinguistic phonetic influence in the situation in question.
Conclusion
The present study has demonstrated that a cognate effect on phonetic realization in language contact is strongest when the cognates are similar in form and the words in question are frequent. Furthermore, words that are more likely to be used in one language or the other are more likely to use a variant associated with the language in question due to the stronger representation in memory of that language’s version of the word. Our findings further coincide with previous studies on this variety in that the realization of intervocalic /d/ is affected by age and sex as well as the frequency with which the speaker uses the two languages.
The findings of this study further support the ability of the usage-based model to explain patterns of variation in linguistic production. They specifically expand upon support for its ability to model the interconnections between items in bi-/multilingual speakers’ linguistic systems. The density of such interconnections is, crucially, modulated by the degree to which the items are similar to one another, consistent with the predictions of the usage-based model.
Footnotes
Acknowledgements
We are grateful for feedback received on previous versions of this paper presented at the 12th Conference on Spanish in Contact with Other Languages and the 48th New Ways of Analyzing Variation conference as well as from two anonymous reviewers. All remaining errors are our own.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The field research for this study was supported by a 2002 Tinker Foundation Field Research Grant to the second author.
