Abstract
Translation equivalents (TEs) characterize the lexicon of bilinguals from the early stages of acquisition, as reported in studies involving English and other languages in which most cross-language synonyms are dissimilar in phonological form. This research explores the emergence of TEs in Spanish-Catalan bilinguals who are acquiring two languages with many cognate words and thus languages with many cross-language synonyms with identical or similar phonological forms. Expressive vocabulary was obtained in two 18-month-old groups (monolingual and bilingual, N = 24 each) through parental report using a bilingual questionnaire. Four different vocabulary size measures were computed in bilinguals, correcting for different types of phonological overlap in words across their two languages. Bilinguals were found comparable to monolinguals in every measure except for Total Vocabulary Size (Spanish + Catalan words) in which they outscored monolinguals due to the high number of form-identical cross-language elements in their expressive vocabularies. Form-similar and dissimilar TEs accounted for less than 2% of the words produced and were only present in infants with larger vocabularies. Results support the hypothesis that phonological form proximity between words across bilinguals' two languages facilitates early lexical acquisition.
It is generally accepted that young bilinguals begin to produce canonical babbling and first words at roughly the same age as monolinguals (Genesee & Nicoladis, 2007; Oller, Eilers, Urbano, & Cobo-Lewis, 1997). However, bilinguals are confronted with a more complex input than that of monolinguals, containing phono-lexical information from two distinct languages. It is no surprise, then, that their lexical development reflects specific properties of their dual input, and that differences may appear when bilinguals are compared to monolinguals in measures of lexical growth and vocabulary size. This is a recurrent topic in comparisons between monolingual and bilingual children, and the literature does not fully converge. Some research has found evidence of significant differences in standardized measures of expressive and receptive vocabulary sizes between monolingual and bilingual children, the latter usually showing smaller vocabularies in at least one of their native languages (Bialystok, Luk, Peets, & Yang, 2010; Oller, Pearson, & Cobo-Lewis, 2007; Pearson, Fernández, & Oller, 1993). Other studies, however, have failed to corroborate these vocabulary size differences (Águila, Ramon-Casas, Pons, & Bosch, 2005; De Houwer, Bornstein, & Putnick, 2013; Thordardottir, 2011).
Input factors in bilingual contexts certainly modulate lexical growth and can account for some of the non-convergent results just described (Hoff et al., 2012; Place & Hoff, 2011). But a critical factor that affects how bilinguals and monolinguals compare in lexical acquisition, and can contribute to the non-convergent data found in the literature, is related to the way vocabularies are measured.
When assessed in just one of their languages, young bilinguals tend to show lower scores than their monolingual peers, as their vocabulary is distributed in two languages and not all lexical items are shared (Bedore, Peña, García, & Cortez, 2005). Differences in expressive vocabulary found in single-language evaluation disappear when bilinguals’ words in both their languages are taken into account (Hoff et al., 2012; Junker & Stockman, 2002; Patterson & Pearson, 2004; Pearson & Fernández, 1994). Thus, measures of combined vocabularies are better able to capture bilinguals’ lexical knowledge. Two different measures are normally used: Total Vocabulary Size (TVS), which represents the sum of the words a child knows across their two languages (raw vocabulary score from each language), and Total Conceptual Vocabulary (TCV), which indicates the concepts the child has a word for regardless of the language used, that is, counting only once the concepts that are represented in both languages (Bedore et al., 2005; Core, Hoff, Rumiche, & Señor, 2013; Marchman, Fernald, & Hurtado, 2010). This “conceptual” score offers a more conservative assessment of lexical knowledge and is more likely to reveal similarities rather than differences between monolingual and bilingual early learners (Pearson, 1998). However, it does not adequately capture the complexity of dual lexical acquisition involving a gradual incorporation of cross-language synonyms, also referred to as Translation Equivalents (TEs). Treating the acquisition of TEs as a simple process, merely involving the addition of a second label to an already existing one in the child’s mental lexicon, ignores one crucial aspect of this process, namely, the cost of learning their phonological form (Core et al., 2013). Such learning is not straightforward and we argue that it can be modulated by phonological proximity factors between cross-language synonyms.
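As a minimal illustration of how the two combined measures differ, the sketch below computes TVS and TCV for a single hypothetical child. The word lists, the concept labels and the function names are invented for illustration and are not data or procedures from this study; the sketch also assumes at most one word per concept in each language.

```python
# Hypothetical productive vocabulary of one bilingual child, by language.
# Each word is mapped to the concept it labels (illustrative data only).
spanish_words = {"perro": "DOG", "agua": "WATER", "silla": "CHAIR"}
catalan_words = {"gos": "DOG", "aigua": "WATER", "poma": "APPLE"}


def total_vocabulary_size(lang_a, lang_b):
    """TVS: raw number of words produced, summed over both languages."""
    return len(lang_a) + len(lang_b)


def total_conceptual_vocabulary(lang_a, lang_b):
    """TCV: number of distinct concepts labelled in at least one language."""
    return len(set(lang_a.values()) | set(lang_b.values()))


tvs = total_vocabulary_size(spanish_words, catalan_words)
tcv = total_conceptual_vocabulary(spanish_words, catalan_words)
# Concepts labelled in both languages, i.e. translation-equivalent pairs.
translation_equivalents = tvs - tcv
print(tvs, tcv, translation_equivalents)
```

In this toy case the difference between TVS and TCV is simply the number of concepts named in both languages, which is why TCV says nothing about how phonologically close the two labels are.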
This is a central issue in this research specifically assessing expressive vocabulary and the emergence of translation equivalents in bilinguals acquiring a pair of phonologically close languages, Spanish and Catalan, with a high number of cognates, words similar in form and meaning. More specifically, the study aims at exploring the role of phonological word-form similarity across bilinguals’ two languages in early lexical acquisition and how it affects measures of bilinguals’ vocabulary size.
The early presence of TEs both in receptive (De Houwer, Bornstein, & De Coster, 2006) and expressive vocabularies of very young bilinguals has already been attested (Holowka, Brosseau-Lapré, & Petitto, 2002; Junker & Stockman, 2002; Pearson, Fernández, & Oller, 1995). Details from studies that have specifically analysed the production of TEs in young bilinguals acquiring different language pairs are offered in Table 1. In general, the studies confirm that TEs are acquired even before the 50-word milestone, representing between 20% and 30% of bilinguals’ expressive vocabulary between 18 and 24 months of age.
Table 1. Studies reporting quantitative data on the presence of translation equivalents (TEs) in the early lexicons of bilinguals acquiring different pairs of languages.
Note. LSQ: Langue des Signes Québecoise; CDI: MacArthur-Bates Communicative Development Inventory.
Variability in the number of TEs found in bilinguals’ lexicons can be the result of exposure factors, as is the case for vocabulary size, so more balanced bilinguals are expected to acquire a higher number of them (David & Wei, 2008). But it can also reflect specific properties of the input languages and be linked to the distribution of different types of TEs, closer or more distant in phonological form. Here, we are specifically interested in considering the role that different degrees of phonological similarity have in the early acquisition of TEs, favouring or hindering early lexical acquisition.
Evidence that phonological proximity can affect the number and type of TEs was reported by Schelletter (2002) from observations of a German-English bilingual child followed from 23 to 32 months of age. The study reports that the presence of TEs is significantly larger and far more balanced across the languages for form-similar nouns (as in “apple” – “Apfel,” sharing half or more of the sounds but sounding phonetically distinguishable) than for form-dissimilar nouns (as in “tree” – “Baum”). It also indicates that, on average, the TE of a noun occurs around 3 months after the first emergence of that noun in the other language context, this gap being smaller for form-similar elements. Form-concept mappings in both languages are thus facilitated by form similarity. It can be hypothesized that for very similar, almost identical cross-language elements, showing a maximum overlap in phonological features, acquisition would be easier as only one cross-language representation would be needed. Acquisition would also be facilitated because these words can actually double in frequency of exposure, being heard in both language contexts and always linked to the same concept, so the pairing would be strengthened. This facilitation effect might be extended to similar-sounding, not just identical, word pairs, if phonological differences are not perceived as contrastive and words are treated as identical or as variants of the same form. Some support for this alternative can be drawn from results in familiar word recognition tasks, although limited to a specific vowel contrast (Ramon-Casas, Swingley, Sebastián-Gallés, & Bosch, 2009). Finally, from the production side, children’s attempts favouring similar-sounding words over dissimilar ones are in line with early work showing that the ability to produce a word-form seems to drive acquisition of other similar word-forms in the language (see, for instance, Vihman, 1993).
All these arguments seem to suggest that for young bilinguals exposed to phonologically close languages such as Catalan and Spanish, with a common origin and sharing a fair amount of cognate words (around 75% of the words on the Spanish version of the MCDI are cognates), early lexical acquisition of words in both languages could be fostered.
As described in Table 1, studies analysing the emergence of TEs so far have involved rather different languages in terms of phonological proximity of cross-language synonyms. Data from Catalan and Spanish bilinguals can contribute to a better understanding of the central questions in this research: Does form similarity facilitate the acquisition of translation equivalents in these languages, and, if so, does the form similarity of translation equivalents influence the outcome of comparing bilingual to monolingual children’s vocabulary sizes? We have addressed these questions by identifying the number of form-identical, form-similar, and form-dissimilar TEs in 18-month-old bilinguals’ vocabularies, and comparing multiple ways of calculating their vocabulary sizes to the single lexical score of a matched group of monolingual children.
Method
Participants
A total of 48 infants, aged 18 months, divided into two language background groups and comparable in socio-economic status (middle SES, secondary and graduate levels of maternal and paternal education) participated in the study. Half of the participants were simultaneous bilinguals exposed from birth to both Catalan and Spanish by their parents (bilinguals: N = 24; 12 girls; M age = 18 months and 18 days, range 485–605 days). Their bilingual status was assessed through our language background questionnaire (Bosch & Sebastián-Gallés, 2001). Inclusion in this group was based on a distribution of exposure to both languages ranging from 50%–50% up to a maximum of 75%–25% (M Catalan exposure = 51.5%; SD = 15.9). The other half of the participants were growing up in monolingual homes, either Spanish or Catalan, and exposure to a second language, if present, was estimated to be always below 10% of the total amount of time, restricted to sporadic contacts with speakers outside the home. There were 9 Spanish and 15 Catalan monolinguals in this group (monolinguals: N = 24; 12 girls; M age = 18 months and 16 days, range 524–594 days). Monolingual and bilingual groups did not significantly differ in age (t < 1), but they clearly differed in language exposure, t(46) = 14.11; p = .0001.
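The inclusion criteria above can be written down as a simple rule. The following is only a sketch: the function name, the "excluded" label and the assumption that Catalan and Spanish exposure sum to 100% are ours, not part of the language background questionnaire.

```python
def classify_exposure(pct_catalan):
    """Assign a language-background group from % Catalan exposure (0-100).

    Thresholds follow the inclusion criteria described in the text; the
    function name and the "excluded" label are illustrative assumptions.
    Spanish exposure is assumed to be 100 - pct_catalan.
    """
    if not 0 <= pct_catalan <= 100:
        raise ValueError("exposure must be a percentage between 0 and 100")
    # Share of exposure to the less-heard of the two languages.
    minority = min(pct_catalan, 100 - pct_catalan)
    if minority >= 25:
        return "bilingual"    # between 50%-50% and 75%-25%
    if minority < 10:
        return "monolingual"  # second language below 10% of exposure
    return "excluded"         # falls between the two criteria


for pct in (50, 75, 95, 85):
    print(pct, classify_exposure(pct))
```

Children whose less-heard language accounts for between 10% and 25% of exposure fit neither group under these criteria, which is why the sketch returns a third label for them.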
Material
An ad hoc questionnaire designed to capture the spontaneous production of words in both of the languages in the child’s environment (adapted from a previous checklist from Águila et al., 2005) was used. The questionnaire has a total of 152 lexical items grouped into 14 different categories (onomatopoeic items, clothes, vehicles, people, household items, food, outdoor items, body parts, animals, toys, qualities, social routines, actions, and personal-social items). Words were structured in two columns, one for each language, the items following the same order of presentation so that TEs could be easily identified. Next to each word in each column there was a space to write down the form produced by the child. This was crucial to identify if the child produced just one form or both forms (TEs) perceived as distinct by the parents. Paired Spanish and Catalan items in the questionnaire were classified as form-identical, form-similar and form-dissimilar items according to specific phonological criteria (see Table 2 for a synthesis of the categorization criteria and phonetically transcribed examples for each category of TEs). The whole questionnaire had 26% of form-identical items, 37% of form-similar and 37% of form-dissimilar cross-language synonyms.
Table 2. Categories of translation equivalents based on different degrees of phonological overlap between Spanish and Catalan words. Criteria for categorization and examples for each category are offered, followed by the English translation.
Procedure and analyses
Parents from the infant lab database were contacted to collaborate in this project. Selection criteria were children’s age and a clear monolingual or bilingual linguistic status of the family. Families with a fairly balanced distribution between both languages at home, with each parent addressing the child in a different language from birth on, were targeted. Children growing up in monolingual families but exposed to a different language in day-care centres were not included in the study.
After a short preliminary interview to find out about the language(s) in the family and general information about their child to ensure that he or she was typically developing, parents were given clear instructions on how to fill in the questionnaire, emphasizing the fact that we were interested in spontaneous productions, not imitations. All families were given the same questionnaire. Crucially, in the case of bilingual participants, parents were required to individually fill in the column corresponding to the language in which they usually interacted with the child. For items that were marked as being produced in each language, they were asked to write down as closely as possible the form produced by the child.
Once completed, information from the questionnaire was extracted and double-checked by the authors before analyses were undertaken. First, single-language vocabulary size was obtained in monolinguals and bilinguals, and the Total Vocabulary Size measure (TVS, Spanish + Catalan) was computed in bilinguals. Words produced in both languages were identified and classified into three categories (identical, similar and dissimilar in form) following criteria described in Table 2. Finally, three additional vocabulary measures were obtained by successively subtracting form-identical, form-similar and form-dissimilar TEs from the TVS.
Results
First, results from the monolingual participants are offered. Mean total vocabulary in this group was 45.6 words (median 41.5; SD 30.4; range 9–131). Total vocabulary size for the Catalan subgroup (n = 15) was 43 words (median 38; SD 27.3; range 10–87), and TVS for the Spanish subgroup (n = 9) was 50 words (median 44; SD 31.6; range: 9–131). A Mann-Whitney non-parametric test revealed no significant differences between subgroups (z = −0.2; p = 0.81) so data were collapsed into a single monolingual group for comparison with the bilingual group data.
Bilinguals had a mean TVS of 75.4 words (median 61; SD 56.5; range 23–213). Their single-language vocabulary measures were: M 33.5 (median 25.5; SD 26.1; range 10–97) for Spanish, and 41.9 (median 24.5; SD 35.7; range 6–135) for Catalan. Computations on the number of TEs present in their vocabulary for each of the three categories yielded the following results: form-identical 18.4 (median 14.5; SD 9.9; range 8–39); form-similar 0.87 (median 0; SD 2.5; range 0–10), and form-dissimilar 0.46 (median 0; SD 1.05; range 0–4). Form-identical TEs represented 28% (range 19–35) of the TVS; form-similar TEs, 1% (range 0–12); and form-dissimilar TEs, 0.33% (range 0–2). Similar and dissimilar TEs together represented less than 2% of their TVS.
Information from categories of TEs present in bilinguals’ vocabulary was used to obtain different measures of vocabulary size. Specifically, three additional measures were computed: a) TVS minus form-identical TEs; b) TVS minus (identical + similar TEs); and c) TVS minus (identical + similar + dissimilar TEs). The latter is a measure equivalent to the standard TCV (conceptual vocabulary). Results were 55.7 (median 47; SD 46; range 15–172), 54.8 (median 43.4; SD 45; range 15–171) and 53.5 (median 43; SD 42; range 15–149), respectively (see Figure 1).
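The successive subtraction can be sketched for one child as follows. The function name, the dictionary keys and the example counts are hypothetical and chosen only to make the arithmetic visible; they are not values from the sample.

```python
def vocabulary_measures(spanish, catalan, identical, similar, dissimilar):
    """Return the four bilingual vocabulary measures for one child.

    spanish, catalan: number of words produced in each language.
    identical, similar, dissimilar: number of TE pairs of each type.
    """
    tvs = spanish + catalan
    minus_identical = tvs - identical
    minus_similar = minus_identical - similar
    # Removing all three TE types leaves one count per concept (TCV).
    minus_dissimilar = minus_similar - dissimilar
    return {
        "TVS": tvs,
        "TVS - identical": minus_identical,
        "TVS - (identical + similar)": minus_similar,
        "TVS - (identical + similar + dissimilar)": minus_dissimilar,
    }


# Hypothetical child: 30 Spanish words, 40 Catalan words,
# 15 form-identical, 2 form-similar and 1 form-dissimilar TE pairs.
measures = vocabulary_measures(30, 40, identical=15, similar=2, dissimilar=1)
for name, value in measures.items():
    print(f"{name}: {value}")
```

Each TE pair is subtracted once, so a concept named in both languages contributes two words to TVS but only one to the final, conceptual measure.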

Figure 1. Mean expressive vocabulary in 18-month-old monolingual (left side) and Catalan-Spanish bilingual (right side) toddlers based on measures of Total Vocabulary Size (TVS). For the bilingual group different measures are reported resulting from successively subtracting form-identical, form-similar and form-dissimilar Translation Equivalents (TEs) from their Total Vocabulary Size (Catalan plus Spanish vocabularies). Note. Error bars represent ±1 Standard Error.
A one-way ANOVA comparing the different measures of bilinguals’ vocabulary size revealed significant differences, F(1, 23) = 38.2; p = .0001; η² = 0.6. Post-hoc comparisons confirmed that TVS was significantly different from each of the other three measures, TVS − identical: t(23) = 8.2; p = .0001, d = 0.38; TVS − (identical + similar): t(23) = 8.3; p = .0001, d = 0.4; TVS − (identical + similar + dissimilar): t(23) = 7.3; p = .0001, d = 0.44, while none of the other comparisons reached significance (ps > 0.05).
Finally, comparisons between monolinguals’ and bilinguals’ vocabulary scores only revealed a significant between-group difference in TVS, t(46) = −2.2, p = .028; d = 0.65, but no significant differences in the remaining comparisons (all ts < 1) involving the alternative bilingual measures obtained by correcting for the presence of different types of TEs.
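The between-group TVS comparison can be approximately reproduced from the summary statistics reported above (monolinguals: M = 45.6, SD = 30.4; bilinguals: M = 75.4, SD = 56.5; n = 24 each). The sketch below assumes an equal-variance Student's t-test and a pooled-SD Cohen's d; the text does not state which variants were used, so this is a plausibility check rather than a reconstruction of the original analysis.

```python
import math


def pooled_sd(sd1, sd2, n1, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, m2, sd1, sd2, n1, n2):
    """Standardized mean difference (group 2 minus group 1)."""
    return (m2 - m1) / pooled_sd(sd1, sd2, n1, n2)


def student_t(m1, m2, sd1, sd2, n1, n2):
    """Equal-variance t statistic with n1 + n2 - 2 degrees of freedom."""
    se = pooled_sd(sd1, sd2, n1, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m2 - m1) / se


# Summary statistics taken from the Results section.
mono = {"mean": 45.6, "sd": 30.4, "n": 24}
bil = {"mean": 75.4, "sd": 56.5, "n": 24}

args = (mono["mean"], bil["mean"], mono["sd"], bil["sd"], mono["n"], bil["n"])
d = cohens_d(*args)
t = student_t(*args)
print(f"d = {d:.2f}, t({mono['n'] + bil['n'] - 2}) = {t:.2f}")
```

Under these assumptions the sketch gives d of roughly 0.66 and a t of roughly 2.28 in absolute value, in line with the reported d = 0.65 and t(46) = −2.2; the sign depends only on which group is subtracted from which.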
Due to the minimal presence of both similar and dissimilar TEs in bilinguals’ vocabulary, we further analysed whether their emergence could be related to the size of the lexicon. The bilingual group was divided into children who did and did not have similar and dissimilar TEs (n = 6, mean TVS 94 words; and n = 18, mean TVS 42 words, respectively). Results from the Mann-Whitney non-parametric test yielded a significant difference (z = −2.3; p = .02), confirming that first similar and dissimilar TEs could only be found in children with larger vocabularies (> 45 words).
Discussion
This research analysed early expressive vocabulary in 18-month-old infants growing up in monolingual and bilingual families exposed to phonologically close languages (Spanish and Catalan). Bilinguals significantly differed from monolinguals in total vocabulary size (TVS) when both Spanish and Catalan words were considered. When different types of cross-language synonyms were excluded from the measures, bilinguals no longer differed from monolinguals. Form-identical cross-language elements accounted for 28% of the words in bilinguals’ total lexicon (TVS), but form-similar and dissimilar TEs were very limited, accounting for less than 2% of the words produced and only present in infants with larger vocabularies.
Results reveal that Catalan-Spanish bilingual children start “easy,” favouring the production of almost identical cross-linguistic items which are useful to communicate in both languages and somehow facilitate increasing both lexical repertoires simultaneously at approximately the same rate. The minimal presence of both form-similar and form-dissimilar TEs in bilinguals’ vocabularies is in contrast with data from studies with bilinguals acquiring more distant languages, where it has been shown that they begin to incorporate cross-language synonyms even before their first 50 words (Deuchar & Quay, 2001; Holowka et al., 2002; Nicoladis & Secco, 2000). Our results thus indicate a rather different initial strategy in building a dual vocabulary when input languages share so many minimally different cross-language synonyms. This in turn has direct consequences on vocabulary size measures usually employed in monolingual-bilingual comparisons.
Core et al. (2013) reviewed three studies that reported different results in vocabulary measures. Only in Junker and Stockman’s (2002) work, on bilinguals acquiring relatively close languages (English and German), were TVS scores significantly higher than TCV and also higher than monolinguals’ total scores, a pattern not observed in Spanish-English or French-English bilinguals (Pearson et al., 1993; Thordardottir, Rothenberg, Rivard, & Naves, 2006). Although participants in Junker and Stockman’s study were older than bilinguals in our sample, their results parallel our current data. They support the hypothesis that phonological proximity of words across bilinguals’ two languages can facilitate lexical acquisition.
But what is the basis for this facilitation effect? As suggested in the introduction, high form-similarity (as in TEs merely differing in the quality of the vowel in the unstressed syllable) can result in children experiencing twice the exposure to the word-concept pairing, thus enhancing lexical learning. Moreover, most of these words are very simple, monosyllabic or disyllabic CVCV items, a factor that can also explain why they are present in early vocabularies. But what about non-identical but form-similar TEs? Why are they not frequent in the early vocabularies of bilinguals in our study? A few of these words (those differing in one vowel segment other than a schwa) might be perceived as almost identical if we take into account results from familiar word recognition studies where a specific vowel mispronunciation did not affect recognition (Ramon-Casas et al., 2009). But most of the words in this category involve more than a single segment category and this difference could be challenging in production at this early age, even if it were detected in perception (we assume that fine tuning to the sound properties of the input language/s is already in place at the age tested, as suggested in Bosch & Sebastián-Gallés, 2003; Sebastián-Gallés & Bosch, 2009; Werker, Byers-Heinlein & Fennell, 2009). Form-similar TEs reported in the questionnaire involved items differing in consonants and in syllable structure, changes that are possibly more salient to the child. Whether experience with form-similar cross-linguistic TEs, which involve minor changes in phonological form, could contribute to consolidating the specific phonological representations in each of the bilinguals’ two languages remains an open question.
Regarding form-dissimilar TEs, are they less frequent in the input? Do they refer to more complex concepts? Are they phonologically more difficult to produce? These words involve no phonological overlap so they cannot benefit from the facilitation effect produced by form similarity, but some of them correspond to frequent concepts (e.g., dog, chair) and their phonological form is not especially complex. Because only a few of them were reported, idiosyncratic factors, rather than phonological or frequency ones, may account for their early presence.
The current study has revealed some interesting aspects of early vocabulary building in bilinguals exposed to Catalan and Spanish, languages involving many form-similar cross-language synonyms. Results are preliminary and need to be taken with caution, as this research has certain limitations. First of all, the study was based on parental report, and no recordings were available to confirm the reliability of the forms reported by the parents. They might be biased to perceive differences in infants’ productions that perhaps were not actually present. But at the same time, as naïve listeners, they might not have detected subtle differences their infants already produced. Combining checklists and recordings could improve reported information and yield greater reliability. Second, vocabulary estimates offered in this research were not obtained from the standard MacArthur CDI, but from a different, much shorter tool, specifically created to assess bilinguals’ production of TEs (adapted from Águila et al., 2005). Vocabulary size measures might therefore deviate from the actual expressive vocabulary of the child. Last but not least, this research lacks a longitudinal perspective which would be informative about changes in the observed tendencies as vocabulary size increases. However, the main aim of the current study was to explore the emergence of different types of TEs in bilinguals simultaneously acquiring Spanish and Catalan, exposed to a relatively high number of form-identical cross-language synonyms, and to analyse the effects of this type of input on early lexical learning. Some preliminary answers have been offered, but other questions remain open to future research.
Acknowledgements
We are grateful to Eva Águila for early collaboration in the preparation of the questionnaire, and to the parents of the participants for the information provided.
Funding
This work was supported by a grant from the Spanish MINECO [PSI-2011-25376].
