Abstract
Objectives:
This study investigates (a) whether child heritage speakers produce more gender mismatches in Spanish (un piedra “a-masc. stone-fem.”) than monolingual children, (b) whether older child heritage speakers mismatch more than younger ones, and (c) the linguistic contexts in which mismatches occur.
Methodology:
3893 agreement forms were extracted from corpora of Spanish spoken by six monolingual children, ages 5–6 years, and three groups of US child heritage speakers: ten 5–6-year-olds, fifteen 7–8-year-olds, and twenty-one 9–11-year-olds.
Data and analysis:
Logistic regressions measured the impact of agreement form type, noun gender, noncanonical noun ending, and noun frequency on gender matching. One regression included 5–6-year-olds only (monolingual and heritage); the second included child heritage speakers only (5–11-year-olds).
Findings:
There were no significant differences between monolingual and heritage 5–6-year-olds; for these children, adjectives, direct object clitics, noncanonical nouns, and feminine nouns increased the likelihood of mismatches. Among the 5–11-year-old heritage speakers, direct object clitics referring to feminine nouns and noncanonical nouns increased the likelihood of mismatches. The 9–11-year-olds produced more gender mismatches referring to feminine nouns than the younger child heritage speakers, especially with direct object clitics.
Originality:
This corpus study provides evidence for high rates of gender matching and clarifies the contexts that increase the likelihood that children will mismatch.
Implications:
Gender matching remains an intact part of child heritage speakers’ Spanish grammars. The distribution of mismatches found provides evidence of a strong article–noun association and a weaker noun–direct object clitic association. The oldest child heritage speakers’ use of masculine clitic lo to refer to feminine nouns may reflect an association between English “it” and Spanish lo. More generally, the finding that mismatches tend to involve masculine forms referring to feminine nouns supports the idea that masculine is the default, unmarked form in Spanish.
Introduction
Approximately one-third of children under age eight in the United States speak a language other than English at home (Park et al., 2018). These children are often called heritage speakers, a term that refers to bilinguals who acquire a minority language at home (Valdés, 2005). There is abundant evidence that US Spanish-speaking child heritage speakers’ grammars differ from those of monolinguals (Montrul, 2016; Shin, 2018; Silva-Corvalán, 2014). Yet the nature of these children’s grammatical development in the heritage language is not fully understood. In a recent study of 1080 children who spoke Spanish at home and were acquiring English at school, Castilla-Earls et al. (2019) calculated the percentage of grammatical utterances produced in oral narratives. Upon entering school, the children produced an average of 95% grammatical utterances in Spanish. By the end of second grade, children who received instruction in both Spanish and English produced an average of 77% grammatical utterances in Spanish, and those who received instruction only in English produced an average of 69% grammatical utterances in Spanish. This large-scale study provides evidence of change in the heritage grammar after children enter school. Nevertheless, some studies show that heritage grammars plateau rather than change with age (Castilla-Earls et al., 2016; Cuza & Pérez-Tattam, 2016), others find that older heritage speakers outperform younger ones (Gathercole, 2002; Montrul & Potowski, 2007), and still others find that younger child heritage speakers outperform older ones (Cuza, 2016; Shin et al., 2019). Thus, the current study seeks to examine not only whether child heritage speakers differ from monolingual children, but also whether older and younger child heritage speakers differ from each other with respect to their Spanish grammars. To do so, we focus on grammatical gender in Spanish.
Grammatical gender is a fruitful topic for investigating heritage language development, as child heritage speakers have been shown to produce gender mismatches, like (1) where the masculine demonstrative esos refers to the feminine noun pesas, and (2) where the masculine adjective bonito refers to the feminine noun tapa.
(1) Pueden cortar esos(m) pesas(f) (Silva-Corvalán, 2014, p. 44). “Those weights can cut”
(2) Está muy bonito(m) la tapa(f) (Silva-Corvalán, 2014, p. 77). “It’s very pretty, the lid”
Research on grammatical gender indicates that child heritage speakers produce more gender mismatches than age-matched monolingual children in Spanish (Cuza & Pérez-Tattam, 2016; Gathercole, 2002; Montrul & Potowski, 2007), as well as in Russian (Mitrofanova et al., 2018; Rodina & Westergaard, 2017; Schwartz et al., 2015). Further, children who experience reduced exposure to the heritage language mismatch more than children who experience more exposure (Mitrofanova et al., 2018). With age, child heritage speakers of Spanish in the US tend to experience increased exposure to English and, concomitantly, decreased exposure to Spanish (Castilla-Earls et al., 2019; Shin et al., 2019). As such, we might expect older child heritage speakers to mismatch more than younger ones. Yet, as will be discussed below, the evidence for such a conclusion is mixed and may depend on the type of agreement form in question and the linguistic contexts in which gender agreement applies.
Indeed, there is evidence that, among both child and adult heritage speakers, gender mismatches occur more often in certain linguistic contexts. For example, mismatches are more common when referring to feminine nouns rather than masculine nouns in both heritage Spanish (Cuza & Pérez-Tattam, 2016; Montrul & Potowski, 2007; Shin et al., 2019) and heritage Arabic (Albirini et al., 2013), and with certain agreement forms, such as clitics and adjectives, in child heritage Spanish (Montrul & Potowski, 2007; Shin et al., 2019). Mismatches are also more common with nouns that do not follow a transparent and frequent morphophonological rule, such as nouns with infrequent gender-specific suffixes in Greek (Prentza et al., 2017), as well as nouns that lack gender-specific morphophonological cues in Dutch (Unsworth et al., 2011), French (Kupisch et al., 2002), and Russian (Schwartz et al., 2015). In these cases, heritage speakers tend to overgeneralize the more frequent gender marker to nouns whose gender is noncanonical or less transparent (Polinsky, 2018).
Most evidence for child heritage speakers’ gender mismatching is based on experimental studies. One reason for this is that in natural production, determiners, especially articles, are frequent, but adjectives and direct object clitics are scarcer. As child heritage speakers produce few mismatches with articles (Montrul & Potowski, 2007), experimental tasks eliciting adjectives and clitics are helpful. At the same time, it is also important to verify findings from experiments with studies of naturalistic speech, thereby increasing the ecological validity of the conclusions drawn (Torres Cacoullos & Travis, 2018). In the case of gender agreement, this requires a large data set that includes numerous tokens of not only determiners, but also other forms that participate in gender agreement. In the current study, we analyze corpora of previously transcribed conversations with children to investigate whether child heritage speakers produce more gender mismatches than monolinguals, and whether mismatches are more frequent among older child heritage speakers than younger ones. In addition, we examine whether mismatches are more frequent with clitics and adjectives than with articles, and when referring to feminine, noncanonical, and less frequent nouns.
Gender assignment and agreement in Spanish
Traditionally, Spanish is considered to have two grammatical genders, masculine and feminine (Alarcos Llorach, 1994). Masculine nouns ending in -o and feminine nouns ending in -a are considered canonical. There are few exceptions to these pairings (Teschner & Russell, 1984), and new nouns entering the language follow suit (Montrul et al., 2014). Nouns that do not end in -o or -a, as well as masculine nouns ending in -a and feminine nouns ending in -o, are considered noncanonical. Articles, adjectives, demonstratives, third-person subject and direct object pronouns, and participles agree in gender (and number) with the noun they reference or modify (Ambadiang, 1999). In (3), the indefinite singular article una, the adjective blanca “white,” and the direct object clitic la “it” are feminine because they agree with the feminine noun casa.
(3) Veo una(f) casa(f) blanca(f). ¿La(f) ves también? “I see a white house. Do you see it, too?”
Although traditional grammars posit two genders in Spanish, scholars have argued that masculine gender is the default or unmarked gender (Harris, 1991). Beatty-Martínez and Dussias (2019) present several lines of evidence supporting this view: the generic masculine is used to refer to groups that include both males and females; when words that are not typically gender-marked are used as nouns, they appear with masculine prenominals; reaction times during lexical decision tasks are shorter for masculine than for feminine words; and children tend to assign masculine gender to unknown nouns with ambiguous phonological cues. In addition, both monolingual and heritage adult speakers of Spanish accept feminine nouns with masculine adjectives in grammaticality judgments (Fuchs et al., 2015; Scontras et al., 2018), and when English-Spanish bilinguals insert English nouns into Spanish discourse, they tend to sidestep gender assignment by relying on the default masculine for accompanying determiners or modifiers (e.g. el(m) window) (Beatty-Martínez & Dussias, 2017; Otheguy & Lapidus, 2003; Valdés Kroff et al., 2017). 1
Child heritage speakers’ acquisition of Spanish gender agreement
Monolingual Spanish-speaking children produce gender mismatches in early development (Hernández Pina, 1984), but by around 3 years of age, mismatches between nouns and determiners and between nouns and adjectives are rare. There is some evidence that, like monolingual children, bilingual children acquiring Spanish in a context where Spanish is the dominant language acquire determiner–noun and noun–adjective gender agreement by age three. Fernández Fuertes et al. (2016) investigated the development of grammatical gender in a corpus of conversations with bilingual twins in Spain who were recorded between ages 1;01 and 5;09. Overall, the twins produced 84% target-like gender agreement with determiners and adjectives. There was a slight increase in mismatching between ages 1;01 and 3;04, which all but disappeared after age 3;05, suggesting that the twins’ development was on par with monolingual norms.
Studies of children in the US indicate divergences between child heritage speakers and monolinguals. Montrul and Potowski (2007) studied 38 children in Chicago and 29 monolingual children in Taxco, Mexico. The children completed a story-retelling task, but the resulting narratives included very few agreement forms other than determiners. Monolingual children and child heritage speakers alike produced high rates of gender-matched determiner–noun combinations. Differences between these children were more evident in the adjectives they produced during an elicited production task: whereas monolinguals produced no mismatches at all, child heritage speakers who had been acquiring Spanish and English since birth produced masculine adjectives referring to feminine nouns at a rate of 60%; those who began learning English after age four did so at a rate of 34%. Cuza and Pérez-Tattam’s (2016) study of 32 child heritage speakers of Spanish in the US, ages 5;0–10;8, and 19 monolingual Spanish children, ages 4;7–9;1, focused on noncanonical nouns like pie, tren, nariz (“foot, train, nose”). In response to an experimental task that elicited determiner–noun–adjective sequences, the child heritage speakers produced significantly more gender mismatches than the monolingual children. Interestingly, neither Montrul and Potowski nor Cuza and Pérez-Tattam found evidence suggesting that gender mismatching increases with age. Cuza and Pérez-Tattam suggest instead that child heritage speakers’ grammars might plateau, rather than change, during the school years.
Yet, other studies of grammatical gender do find differences between younger and older child heritage speakers. For example, in her longitudinal study of two sisters who had moved from Puerto Rico to the US at a young age, Anderson (1999) found that, concomitant with an increase in English use, the girls’ gender mismatches also increased. Further evidence for differences between older and younger child heritage speakers comes from research on direct object clitics. Shin et al. (2019) studied direct object clitic production among child heritage speakers of Spanish in the US, ages 3–5 and 6–8. The older children, who also used more English and had higher English vocabulary scores, gender mismatched more than the younger children, who used less English and had lower English vocabulary scores. In fact, almost half the 6–8-year-olds consistently produced masculine lo for masculine and feminine objects alike. Importantly, the opposite pattern is found in monolingual acquisition of Spanish: Young children show some evidence of gender mismatching with clitics, but mismatches decrease with age (Castilla & Pérez-Leroux, 2010; De la Mora, 2004; Domínguez, 2003). This suggests that older child heritage speakers’ increased reliance on lo is an outcome of their bilingualism and possible influence from English genderless “it.”
To summarize, previous research indicates that child heritage speakers of Spanish in the US produce gender mismatches in specific word classes such as clitics and adjectives when referring to feminine and noncanonical nouns. However, studies have yielded divergent findings regarding age, with some studies suggesting increased mismatching with age and others showing no such effect. Another important question to explore is whether gender mismatches are indeed more common with noncanonical and less frequent nouns, as predicted by research on adult heritage speakers and on child heritage speakers of Russian (Montrul et al., 2014; Schwartz et al., 2015). Although studies focusing specifically on noncanonical nouns show gender mismatching among child heritage speakers of Spanish (Cuza & Pérez-Tattam, 2016; Gathercole, 2002), Montrul and Potowski (2007) found that the mismatches produced by child heritage speakers in their oral narratives occurred with frequent nouns with canonical endings, such as un(m) niña(f) “a girl.” In addition, evidence of gender mismatching has been more robust in experimental tasks than in naturalistic production data (e.g. Montrul & Potowski, 2007). Further investigation of naturalistic data would increase the ecological validity of the conclusions drawn. As such, the current corpus study seeks to address the following research questions:
1) Do child heritage speakers mismatch more than monolingual children?
2) Do older child heritage speakers mismatch more than younger ones?
3) Do children mismatch more with clitics and adjectives than with articles?
4) Do children mismatch more when referring to feminine nouns than to masculine nouns?
5) Do children mismatch more when referring to noncanonical nouns than to canonical nouns?
6) Do children mismatch more when referring to infrequent nouns than to frequent nouns?
Participants
To investigate whether child heritage speakers produce more gender mismatches than monolingual children (RQ1), data from six 5–6-year-old monolingual children and ten 5–6-year-old child heritage speakers were analyzed (Table 1). The monolingual children’s data come from Shin’s corpus of monolingual children in Mexico, which consists of sociolinguistic interviews and picture book narration, as well as the BecaCESNo Corpus (Benedet et al., 2004, accessed from CHILDES [MacWhinney, 2000]), which consists of sociolinguistic interviews with children from Spain. Child heritage speakers’ data come from the Silva-Corvalán (1989) Bilingual Corpus. All children in Silva-Corvalán’s corpus had acquired both Spanish and English from birth, except for Johanna, who had acquired Spanish from birth and English between ages 3 and 4. English was generally used more frequently than Spanish in these children’s homes. The children were exposed to Spanish in school, although classes were taught in English (Silva-Corvalán, personal communication, October 2018). Additional child heritage speaker data come from Shin’s corpus of Spanish spoken in the northwestern part of the US, which consists of sociolinguistic interviews and picture book narration (Villa et al., 2014). The children in Shin’s corpus were all born in the US. Although the average age of the child heritage speakers is higher than the average age of the monolingual children, the difference is not significant [t = 0.58, p = 0.57].
Table 1. Participants, 5–6-year-old monolingual and child heritage speakers.
To investigate whether older child heritage speakers mismatch more than younger ones (RQ2), we compared three groups of child heritage speakers: 5–6-, 7–8-, and 9–11-year-olds. The 5–6-year-olds are the same ones described above (Table 1). The 7–11-year-olds’ data were obtained from the English-MiamiBiling Corpus (Pearson, 2002; available on CHILDES [MacWhinney, 2000]), which includes picture book narrations by children in 2nd and 5th grade in Miami schools. The children in 2nd grade were 7–8 years old at the time of the recording, while the 5th graders were 10–11 years old, except for one participant, who was 9;08. The English-MiamiBiling children included in the current study spoke mostly Spanish or Spanish and English equally at home, and were attending English immersion schools, except for one participant who attended a two-way bilingual school. 2 A total of 36 participants were included: 15 2nd graders, ages 7;03–8;07, and 21 5th graders, ages 9;08–11;04.
Data
Articles, adjectives, pronouns, direct object clitics, and demonstratives were extracted from the children’s transcripts if these forms clearly modified or referred to a noun in the discourse. Truncations like la pied-, so-called hermaphroditic nouns, which occur with masculine singular articles and feminine plural articles (e.g. el agua, las aguas) (Eddington & Hualde, 2008), and mixed-language noun phrases (e.g. la movie) were excluded. This process yielded a total of 4020 tokens of agreement forms. We excluded demonstratives (N = 126) for two reasons. First, there were two types of demonstratives: pronominal (N = 66) and prenominal (N = 60). Presumably, prenominal demonstratives pattern with articles; however, the relationship between articles and nouns on the one hand and demonstratives and nouns on the other with respect to gender agreement may differ. Second, all demonstratives matched their noun referent in gender; thus, there was no difference among the groups with respect to gender matching with demonstratives in the current data set. After excluding demonstratives, we were left with 3893 agreement forms distributed as follows: monolingual 5–6-year-olds: 815; child heritage speakers aged 5–6: 1289, aged 7–8: 773, aged 9–11: 1016.
All 3893 agreement forms were coded for the dependent variable, Gender match (i.e., whether or not the form matched the gender of the noun to which it referred), and for the following four linguistic predictor variables:
1) Word class (categorical): articles, adjectives, pronouns (third-person subjects and objects of prepositions), direct object clitics.
2) Noun gender (categorical): masculine, feminine.
3) Noun ending (categorical): Feminine nouns ending in -a and masculine nouns ending in -o were considered canonical. All others were coded as noncanonical.
4) Noun frequency (continuous): Frequency counts for each noun were calculated based on how many times the noun occurred in the CHILDES corpora used in the current study. There were 46 tokens from Shin’s corpora that referred to nouns that did not occur in the CHILDES corpora, such as hechicero “wizard” and salvavidas “life jacket.” These were assigned a frequency of 0. On the opposite end of the spectrum, niño “boy” occurred 4996 times in the CHILDES corpora, rendering it the noun referent with the highest frequency value in the current study.
For illustration, consider example (4), which includes two agreement forms: una “a” and chiquita “small.” Una is an article, chiquita is an adjective, and both are feminine and thus match the gender of the noun they refer to (cama “bed”). Both were also coded as canonical (cama is feminine and ends in -a) and were assigned a noun frequency of 310 because cama occurred 310 times in the CHILDES corpora.
(4) Una(f) cama(f) bien chiquita(f) “A(f) very small(f) bed(f)” (English-MiamiBiling, 2002:11232046)
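The coding scheme described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the authors' procedure: the `Token` fields, the `code_token` and `is_canonical` helpers, and the two-entry frequency dictionary are hypothetical (only the counts for cama and niño come from the text), and nouns are assumed to be supplied in their singular form so that the -o/-a test applies.

```python
from dataclasses import dataclass

# Hypothetical frequency table: the two values are those reported in the text
# (cama = 310, niño = 4996); a real table would list every noun in the corpora.
FREQ = {"cama": 310, "niño": 4996}


@dataclass
class Token:
    form: str         # the agreement form, e.g. "una"
    word_class: str   # "article", "adjective", "pronoun", or "clitic"
    form_gender: str  # "m" or "f": gender marked on the agreement form
    noun: str         # singular noun the form modifies or refers to (assumption)
    noun_gender: str  # "m" or "f": gender of that noun


def is_canonical(noun: str, gender: str) -> bool:
    """Canonical = masculine noun in -o or feminine noun in -a; all else noncanonical."""
    return (gender == "m" and noun.endswith("o")) or (
        gender == "f" and noun.endswith("a")
    )


def code_token(tok: Token, freq: dict) -> dict:
    """Return the dependent variable and the four predictors for one token."""
    return {
        "gender_match": tok.form_gender == tok.noun_gender,
        "word_class": tok.word_class,
        "noun_gender": tok.noun_gender,
        "noun_ending": (
            "canonical" if is_canonical(tok.noun, tok.noun_gender) else "noncanonical"
        ),
        # Nouns absent from the frequency source are assigned 0
        "noun_frequency": freq.get(tok.noun, 0),
    }


# Example (4): "Una cama bien chiquita"
una = code_token(Token("una", "article", "f", "cama", "f"), FREQ)
chiquita = code_token(Token("chiquita", "adjective", "f", "cama", "f"), FREQ)
print(una)
print(chiquita)
```

Applied to example (4), both una and chiquita come out as gender-matched and canonical, with a noun frequency of 310; a noun missing from the frequency dictionary, such as hechicero, receives 0.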
Results
Overall, the children produced few gender mismatches (mismatches/total: 5–6-year-old monolinguals: 14/815; 5–6-year-old heritage speakers: 36/1289; 7–8-year-old heritage speakers: 14/773; 9–11-year-old heritage speakers: 33/1016; see Figure 1). Overall gender matching rates did not differ significantly either between the monolingual and heritage 5–6-year-olds [χ²(1) = 2.49, p = 0.11] or among the three child heritage speaker age groups [χ²(2) = 3.53, p = 0.17].
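The two overall comparisons can be reproduced from the counts above. The sketch below assumes Pearson chi-square tests without a continuity correction, an assumption that yields values agreeing with those reported; the function name is illustrative, and p-values use the closed-form chi-square survival functions for one and two degrees of freedom, so no statistics library is needed.

```python
import math


def chi_square_test(mismatches, totals):
    """Pearson chi-square (no continuity correction) on a 2 x k table of
    mismatches vs. matches, one column per group. Returns (statistic, df, p).
    Closed-form p-values are used, so only df = 1 or df = 2 is supported.
    """
    n = sum(totals)
    overall_rate = sum(mismatches) / n
    stat = 0.0
    for mis, tot in zip(mismatches, totals):
        for observed, expected in (
            (mis, tot * overall_rate),              # mismatch cell
            (tot - mis, tot * (1 - overall_rate)),  # match cell
        ):
            stat += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # P(X > stat) for chi-square, df = 1
    elif df == 2:
        p = math.exp(-stat / 2)             # P(X > stat) for chi-square, df = 2
    else:
        raise ValueError("closed-form p-value only implemented for df = 1 or 2")
    return stat, df, p


# Monolingual vs. heritage 5-6-year-olds
print(chi_square_test([14, 36], [815, 1289]))
# Heritage 5-6-, 7-8-, and 9-11-year-olds
print(chi_square_test([36, 14, 33], [1289, 773, 1016]))
```

With these counts the statistics come out at about 2.49 (df = 1) and 3.53 (df = 2), in line with the values reported above.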

Figure 1. Percent gender matches, monolingual 5–6-year-olds and child heritage speakers, ages 5–6, 7–8, 9–11 years old.
To investigate whether the linguistic contexts in which the children produced their (albeit few) mismatches differ across groups, binary logistic regression analyses were performed with gender match vs. gender mismatch as the dependent variable and the four linguistic predictor variables: Noun gender, Noun ending, Word class, and Noun frequency.
Regression results: Monolingual and child heritage 5–6-year-olds
In addition to the four linguistic predictor variables, analyses of the 5–6-year-olds’ data included a variable called Child group, which compared monolinguals and heritage speakers, as well as the random factor Individual Child, which controlled for possible skewing by individual children. Non-significant variables were removed from the final model, which included all the predictor variables except two: Child group and Noun frequency. The reason for this is that the mixed-effects model, which included the random factor Individual Child, did not converge when either Child group or Noun frequency was included along with Noun gender, Noun ending, and Word class. When Child group was included in the mixed-effects model instead of other fixed factors, it still was not significant. Further, Child group was not significant in any fixed-effects models, i.e., models without the random factor Individual Child. 3 The lack of an effect of Child group was not surprising given the nearly identical rates for monolingual and child heritage 5–6-year-olds (Figure 1). We also ran a fixed-effects-only version of the model with Noun frequency, and, as in the mixed-effects model, Noun frequency was not significant [z = 0.65, p = 0.52]. All two-way interactions in both fixed- and mixed-effects models were tested, but none was significant. Mixed-effects models with three-way and four-way interactions did not converge, and while some fixed-effects models with these interactions did converge, no interactions were significant.
Table 2 presents the final model predicting the 5–6-year-olds’ gender mismatches. The application value of the dependent variable was set to gender mismatch; therefore, positive z scores represent increased likelihood of mismatches. Factors that significantly predict mismatches (p < 0.05) are in boldface, and reference levels are listed first for each predictor variable. The results show that gender mismatches were more likely when the agreement form was an adjective or a clitic than when it was an article, when the referent was a feminine noun, and when the noun was noncanonical.
Table 2. Mixed-effects binary logistic regression analysis predicting gender mismatches, monolingual and child heritage 5–6-year-olds.
Random factor = Individual child. AIC = 401.9.
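To make the coding convention concrete: when gender mismatch is the application value (coded 1), a predictor level with a higher mismatch rate than the reference level receives a positive coefficient and hence a positive z score. The sketch below fits a fixed-effects logistic regression with a single binary predictor by Newton-Raphson. It is an illustration only, on hypothetical counts invented for the example (2 mismatches in 200 articles, 6 in 50 clitics); it omits the other predictors and the random intercept for Individual child, and it is not meant to reproduce Table 2 or the software the authors used.

```python
import math


def fit_logit(x, y, iters=25):
    """Fixed-effects logistic regression with one predictor, fitted by Newton-Raphson.

    Models P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))), where y = 1 is the
    application value (a gender mismatch). Returns (b0, b1, se_b1, z_b1).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p           # score for b0
            g1 += (yi - p) * xi    # score for b1
            h00 += w               # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} * score, with the 2 x 2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)  # Var(b1) is the (1, 1) entry of I^{-1}
    return b0, b1, se_b1, b1 / se_b1


# Hypothetical toy data (not the study's): x = 0 for articles (reference level),
# x = 1 for clitics; y = 1 marks a gender mismatch.
x = [0] * 200 + [1] * 50
y = [1] * 2 + [0] * 198 + [1] * 6 + [0] * 44
b0, b1, se, z = fit_logit(x, y)
print(f"clitic estimate = {b1:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

With one binary predictor the fitted coefficient equals the log odds ratio, here ln 13.5 ≈ 2.60 with a standard error of 5/6, so z ≈ 3.12; the positive sign signals that clitics raise the likelihood of a mismatch relative to articles in these toy data.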
Regression results: Child heritage speakers, ages 5–6, 7–8, 9–11
In addition to the four linguistic predictor variables, regression analyses investigating child heritage speakers’ gender mismatches included a variable called Age group, which compared the three child heritage age groups: 5–6-year-olds, 7–8-year-olds, and 9–11-year-olds. In this case, the final model (Table 3) includes only fixed effects because mixed-effects models with Individual child as a random factor did not converge unless the model included no more than two fixed effects. The mixed-effects models with only two fixed effects did not fit the data as well as the fixed-effects-only model reported below, per Akaike Information Criterion (AIC) values (mixed effects with Noun ending and Word class: 687.2; fixed effects, Table 3: 639.58). Noun frequency was not significant in any model and was thus excluded. All two-way, three-way, and four-way interactions were tested. There were no significant three- or four-way interactions. With respect to two-way interactions, two criteria were used to decide which interactions to include in the final model. First, all non-significant interactions (p > 0.05) were discarded, leaving four significant two-way interactions: Noun gender*Word class, Noun gender*Age, Word class*Age, and Noun gender*Noun ending. Second, each of these interactions was entered in a model by itself as well as in models that included the other interactions, and the model with the lowest AIC value was selected as the final model (Table 3); it included Noun gender*Word class and Noun gender*Age. 4
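For reference, model comparison by AIC relies on the standard definition below, where k is the number of estimated parameters and \hat{L} is the maximized likelihood; lower values indicate a better balance between fit and complexity. The difference between the two candidate models reported above works out as follows:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\Delta\mathrm{AIC} = 687.2 - 639.58 = 47.62
```

Because the fixed-effects model has the lower value, it is the one retained.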
Table 3. Binary logistic regression analysis predicting gender mismatches, 5–6-, 7–8-, and 9–11-year-old child heritage speakers.
The results in Table 3 show that mismatches were more likely with clitics and with noncanonical nouns. The interaction between Noun gender and Word class indicates that clitics referring to feminine nouns increased the likelihood of gender mismatches. Finally, the interaction between Noun gender and Age group shows that mismatches with words referring to feminine nouns were especially likely among the 9–11-year-olds.
To further illuminate the interactions in Table 3, Figure 2 plots rates of gender mismatches for masculine and feminine nouns by word class, separately for each age group. The rates indicate that the older the child heritage speaker, the higher the rate of gender mismatching when referring to feminine nouns, and this is especially true with direct object clitics. In fact, the 9–11-year-olds used lo rather than la for 72% of feminine referents. These older children also produced a higher rate of masculine adjectives with feminine nouns (3/18, 16.7%), as well as a higher rate of masculine pronouns with feminine referents. The pronoun results, however, must be interpreted with caution, as the 9–11-year-olds produced only four pronouns referring to feminine referents.

Figure 2. Percent gender mismatches by Word class and Noun gender, child heritage speakers, three age groups.
Discussion
Whereas previous research on child heritage speakers’ gender agreement has been primarily experimental, this study analyzed naturalistic corpus data to investigate whether monolingual children and child heritage speakers of Spanish differ significantly in their production of gender agreement, whether gender mismatching is more frequent among older child heritage speakers than younger ones, and whether mismatches are more common in particular linguistic contexts. To do so, we analyzed 3893 agreement forms from previously transcribed corpora of Spanish child language. The corpora included monolingual 5–6-year-olds and three groups of child heritage speakers of Spanish in the US: 5–6-, 7–8-, and 9–11-year-olds.
Our first research question asked whether child heritage speakers would produce more mismatches than their monolingual peers. Based on our comparison of heritage and monolingual 5–6-year-olds, the answer was negative: both groups gender-matched at near-ceiling rates (Figure 1), and there were no significant differences between them (Table 2). This finding is unexpected given that previous research has found significant differences between monolingual children’s and child heritage speakers’ gender matching in experiments (e.g. Cuza & Pérez-Tattam, 2016; Montrul & Potowski, 2007) and oral narratives (Montrul & Potowski, 2007). However, there are several important differences between our study and Montrul and Potowski’s narrative study. First, whereas our comparison of monolinguals and child heritage speakers focuses on ages 5–6, the youngest children in Montrul and Potowski’s study were 6–8 years old. Second, the methodology differed: whereas we coded each agreement form, Montrul and Potowski coded each noun token for whether its determiner or adjective provided evidence of agreement. Finally, our study includes pronouns and clitics, whereas theirs does not. This last difference is important, as one of the reasons the monolingual 5–6-year-olds did not differ from the child heritage 5–6-year-olds is that both groups mismatched with clitics (monolinguals: 10/74, 13.5%; heritage: 15/190, 7.9%). In addition, one might wonder whether the child heritage 5–6-year-olds in our study were more Spanish-dominant than those in Montrul and Potowski’s study. Indeed, Montrul and Potowski (2007) found that children who had spoken both Spanish and English since birth mismatched more than children whose first language was Spanish and who began learning English after age four. Unfortunately, we do not have enough social data to examine the effect of age of exposure to English, nor can we investigate the impact of exposure to Spanish and English, language dominance, or proficiency.
However, the three 5–6-year-old child heritage speakers whom we know learned Spanish and English since birth (Michael, Brian, and Melina) mismatched very rarely (overall: 8/577, 1.4%; adjectives: 3/77, 3.9%; clitics: 1/87, 1.1%). In sum, it is possible that 5–6-year-old child heritage speakers who experience much less exposure to Spanish would mismatch in their natural production data at higher rates than the 5–6-year-olds in our study. Nevertheless, it is crucial to keep in mind the contexts in which such mismatches occur given that both monolingual and child heritage speakers are prone to mismatching with clitics.
Our second question was whether older child heritage speakers would produce more mismatches than younger ones due to the common experience of increased exposure to English and decreased exposure to Spanish with age among Spanish-speaking children in the US (Castilla-Earls et al., 2019; Shin et al., 2019). Although we did not find a main effect for age, there was a significant interaction between feminine nouns and the 9–11 age group. That is, the oldest child heritage speakers were more likely to mismatch with feminine nouns than the younger child heritage speakers. As will be discussed below, this finding is most prominent for direct object clitics, suggesting that the difference between the younger and older children is related to their use of lo to refer to masculine and feminine referents alike.
Our third question was whether mismatches would be more frequent with adjectives and clitics than with articles. The answer was affirmative for the monolingual and heritage 5–6-year-olds. Among the three child heritage speaker groups, clitics, especially those referring to feminine nouns, increased the likelihood of mismatching (Table 3), as in example (5), in which a child in Miami, age 10;9, produced the feminine noun rana “frog” with the feminine article una and then referred to it with the masculine clitic lo.
(5) El niño cogió una[F] rana[F] y lo[M] puso en una cosa de cristal. "The boy caught a frog and put it in a crystal thing." (English-MiamiBiling, 11231247)
The finding that mismatches are rare with articles and more common with clitics (and with adjectives for the 5–6-year-olds) provides evidence for a strong association between articles and nouns (Lew-Williams & Fernald, 2007; Montrul et al., 2014) and a weaker association between nouns and clitics or adjectives. This difference in strength of association may be attributable to various factors, including frequency and distance. First, Spanish nouns almost always occur with determiners (most typically articles), whereas adjectives and direct object clitics are much less frequent. In our data set, there were 2839 articles, 439 adjectives, and 385 clitics. The sheer frequency of article + noun combinations leads to rapid entrenchment of these combinations. In fact, children first learn these sequences as chunks and only later begin to parse them as separate words (Hernández Pina, 1984), a process that likely contributes to the firm entrenchment of the gender-matched combinations. A second factor that differentiates articles on the one hand from adjectives and clitics on the other is linear distance. In our data set, 97% of the articles were adjacent to the noun, whereas 54% of the adjectives and only 3% of the clitics were adjacent to the nouns they agreed with. Furthermore, there is preliminary evidence in our data that distance between nouns and adjectives increased gender mismatching. Although there was no main effect of adjectives in the analysis of the three child heritage speaker groups (Table 3), a follow-up analysis of these children's adjectives shows that the children mismatched more often when the adjective was not adjacent to the noun (13/200, 6.5%) than when it was adjacent (3/239, 1.3%) [χ²(1) = 8.53, p = 0.003].
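The adjacency comparison above can be checked from the reported counts alone. The sketch below assumes the reported statistic is an uncorrected Pearson chi-square on a 2×2 table (rows: nonadjacent vs. adjacent adjectives; columns: mismatched vs. matched); under that assumption the counts 13/200 and 3/239 reproduce χ²(1) ≈ 8.53, with p ≈ 0.0035. The function name `pearson_chi2_2x2` is illustrative, not part of the original analysis.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p) with df = 1.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With df = 1, chi2 is the square of a standard normal variable, so the
    # upper-tail probability is P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: nonadjacent vs. adjacent adjectives; columns: mismatched vs. matched.
chi2, p = pearson_chi2_2x2(13, 200 - 13, 3, 239 - 3)
print(f"chi2(1) = {chi2:.2f}, p = {p:.4f}")
```

Applying Yates' continuity correction would lower the statistic, so the close match to 8.53 suggests (but does not confirm) that the uncorrected test was used.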
Although clitics increased the likelihood of mismatches for all children, this was especially the case for the oldest child heritage speakers, who produced lo instead of la for 72% of the feminine referents (Figure 2). The older children’s increased use of lo is likely related to increased English proficiency and perhaps specifically to the influence of genderless “it.” Although the influence of English proficiency cannot be directly examined in the current study, Shin et al. (2019) found that higher rates of lo referring to feminine referents correlated with higher English vocabulary scores and higher levels of English language use in the home. Such a conclusion is bolstered by research showing that specific language pairs can facilitate or hinder the maintenance of grammatical gender in heritage speakers. For example, in Schwartz et al.’s (2015) study of Russian gender agreement among Russian child heritage speakers who were acquiring German, Hebrew, English, or Finnish, the Russian-German and Russian-Hebrew children outperformed the Russian-English and Russian-Finnish children. The authors concluded that the gender systems in German and Hebrew facilitated the maintenance of Russian gender, whereas the lack of gender in English and Finnish hindered it.
The results for noun gender address our fourth research question, which asked whether children would mismatch more with feminine than with masculine nouns. The answer was affirmative for monolingual and heritage 5–6-year-olds (Table 2). Among the three groups of child heritage speakers, the feminine noun effect was significant among the oldest children (Figure 2; Table 3). Why do gender mismatches tend to involve masculine agreement forms referring to feminine nouns? Returning to the idea that genderless "it" in English influences direct object clitic gender in Spanish, one might ask why child heritage speakers would associate masculine clitic lo rather than feminine clitic la with "it." In fact, one might predict that la would become the default form because it is more frequent owing to its polysemy (the feminine definite article is also la). Yet children's gender-mismatched clitics almost always involve using lo for feminine referents. A plausible explanation is that in Spanish the masculine gender is the default, unmarked form, while the feminine gender is the marked form, as evidenced, for example, in the use of the generic masculine to refer to groups of males and females (Beatty-Martínez & Dussias, 2019). Further support for this idea comes from the gender-mismatched articles in the current study. Although gender mismatching with articles was rare (N = 35), 71% of these 35 mismatched articles were indefinite or plural, as in un[M] piedra[F] "a stone" and los[M] abejas[F] "the bees." It is possible, then, that when children suspend their gender marking and rely on the default masculine, they are extending a more general process of avoiding gender marking when referring to a group rather than a specific individual. For the 5–6-year-olds, this process seems to apply across word types. For the oldest child heritage speakers, reliance on the masculine default is concentrated mostly in their clitics (Figure 2; Table 3).
Our fifth question was whether children would mismatch more with noncanonical nouns. The results were affirmative (Tables 2 and 3). Consider otro[M] parte[F] "other part," produced by a child heritage speaker, age 6;10. Because parte is feminine but ends in -e rather than -a, it is noncanonical. At the same time, nouns that end in -e tend to be masculine (Teschner & Russell, 1984). Thus, either the child overgeneralized a pattern she had learned (masculine gender for nouns that end in -e) or she relied on the default masculine. In either case, the finding that mismatching is more common with noncanonical nouns supports the conclusion that less frequent gender-agreement patterns and item-based exceptions to agreement patterns take longer to master (Montrul et al., 2014; Schwartz et al., 2015).
Our sixth question was whether children would mismatch more with infrequent nouns. To explore this question, we calculated noun frequency values based on frequency of occurrence in the CHILDES corpora that were included in the current study (see Table 1). There were no significant effects of frequency. When the data set is split at the median frequency value, the lower half (N = 1972) consists of tokens with frequency values between 0 and 434 and the upper half (N = 1921) consists of tokens with frequency values between 435 and 4996. The rate of gender mismatching was 2% in the lower half and 3% in the upper half; that is, less frequent nouns did not attract more mismatches. We suspect that the lack of frequency effects in our study is, in part, related to the fact that children tend to use the words they know well (Gershkoff-Stowe & Hahn, 2013). The CHILDES frequency values in the current study ranged from 0 to 4996. That is, children produced agreement forms referring to very frequent nouns like niño "boy" (N = 536) and perro "dog" (N = 403), which had CHILDES frequency values of 4996 and 3566, respectively, as well as much less frequent nouns like hechicero "wizard." Not surprisingly, however, noun referents in the current data set follow a Zipfian distribution, such that many nouns have low frequencies and few have high frequencies. For example, niño and perro, the two noun referents with the highest CHILDES frequency values, together comprised 939 tokens. In contrast, there were 55 different types of noun referents that had CHILDES frequency values of 2 or less; these accounted for 94 tokens in the current data set.
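The two halves above are unequal in size (1972 vs. 1921 tokens) because every token of a given noun carries the same CHILDES frequency value, so tokens tied at the cut-off cannot be divided between halves. The following sketch illustrates that procedure on a small hypothetical set of (frequency, mismatched) tokens; the toy data and the function name `median_split_rates` are assumptions for illustration, not the study's data or code, and it assumes tokens at the median go into the lower half (consistent with a lower half of 0–434 and an upper half beginning at 435).

```python
from statistics import median


def median_split_rates(tokens):
    """Split (frequency, mismatched) tokens at the median frequency.

    Tokens whose frequency equals the cut-off go into the lower half, so
    all tokens of the same noun stay together and the halves can differ
    in size. Returns (cutoff, n_lower, rate_lower, n_upper, rate_upper).
    """
    cutoff = median(freq for freq, _ in tokens)
    lower = [mismatched for freq, mismatched in tokens if freq <= cutoff]
    upper = [mismatched for freq, mismatched in tokens if freq > cutoff]

    def rate(flags):
        # Proportion of mismatched tokens; True counts as 1.
        return sum(flags) / len(flags) if flags else 0.0

    return cutoff, len(lower), rate(lower), len(upper), rate(upper)


# Hypothetical toy tokens: (CHILDES frequency of the noun, was it mismatched?)
toy_tokens = [
    (2, True), (2, False),
    (10, False), (10, False), (10, False),
    (500, False),
    (4996, True), (4996, False),
]
cutoff, n_low, r_low, n_high, r_high = median_split_rates(toy_tokens)
```

With these toy tokens, the three tokens tied at the median value of 10 all fall in the lower half, giving halves of 5 and 3 tokens rather than 4 and 4.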
In summary, although all analyses of noun frequency indicate that in the current study the children do not mismatch more often when they refer to infrequent nouns than when they refer to frequent nouns, experimental studies might prove more effective at revealing frequency effects by examining children’s gender matching with words they are less accustomed to using in their own speech.
Conclusion
Our corpus study suggests that Spanish gender matching is a pervasive and intact part of child heritage speakers' grammars. The one area of the grammar that seems susceptible to change among older child heritage speakers, who in general experience more exposure to English and less exposure to Spanish, is direct object clitic gender. Our study shows that the use of lo to refer to feminine and masculine referents alike is especially prominent among older child heritage speakers, which is consistent with Shin et al.'s (2019) experimental study of direct object clitics. Together the two studies suggest that as children become increasingly proficient in English, they may associate genderless "it" with lo, which would constitute crosslinguistic influence at the lexical level with repercussions for direct object clitic gender in Spanish. Such a conclusion, however, needs to be confirmed by longitudinal data.
The study also finds that mismatches mostly involve reference to feminine nouns, which supports the claim that Spanish has a default (masculine) gender and a marked (feminine) gender rather than two marked genders (Beatty-Martínez & Dussias, 2019). Finally, the finding that mismatches are more frequent with noncanonical nouns reinforces the idea that less frequent gender-agreement patterns and item-based exceptions take longer to master.
An important limitation of the current study is that, due to the reliance on multiple corpora, the child heritage speakers' exposure to Spanish and English, language dominance, and language proficiency could not be measured (cf. Fernández Fuertes & Liceras, 2018; Silva-Corvalán, 2014). This is an important shortcoming given that amount of exposure to the heritage language has been shown to significantly predict gender matching in other studies of child heritage speakers (e.g. Mitrofanova et al., 2018), and language dominance has been shown to predict gender matching in mixed DPs (e.g. la[F] party[F] rather than el[M] party[F] because fiesta[F] is feminine) (Liceras et al., 2016). Thus, while age tends to correlate with increased use of English and decreased use of Spanish in the US (Castilla-Earls et al., 2019; Shin et al., 2019), more direct measures of language exposure, dominance, and proficiency are needed to further understand child heritage speakers' command and use of grammatical gender across their development.
Footnotes
Acknowledgements
We are very grateful to the editor of the journal and two anonymous reviewers, who provided very valuable feedback. We also wish to thank Ana de Figueroa for assistance with coding two children’s data.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
