Abstract
The syntactic structure of sentences in which a new word appears may provide listeners with cues to that new word’s form class. In English, for example, a noun tends to follow a determiner (a/an/the), while a verb precedes the morphological inflection [ing]. The presence of these markers may assist children in identifying a word’s form class and thus in gleaning some information about its meaning. This study examined whether Mandarin, a language that has a relatively impoverished morphosyntactic system, offers reliable morphosyntactic cues to the noun–verb distinction in child-directed speech (CDS). Using the CHILDES Beijing corpora, Study 1 found that Mandarin CDS has reliable morphosyntactic markers to the noun–verb distinction. Study 2 examined the relationship between mothers’ use of a set of early-acquired nouns and verbs in the Beijing corpora and the age of acquisition (AoA) of these words. Results showed that the occurrence of the form class markers is a reliable predictor of the AoA for the early-acquired nouns and verbs.
Baldwin (1991) wrote that fully 50% of the utterances parents direct to children are not about things or events in the child’s immediate environment. This dramatic percentage raises the question of how children interpret nonlinguistic and linguistic information to infer the form class of novel words. The nativist approach has proposed the idea of innate grammatical categories (Chomsky, 1965), but as Fodor argued, one must still learn which particular words are nouns and verbs in one’s language (Fodor, 1975). At least in English (and many other languages, such as French; Valdman, 1976), there are reliable morphological markers (such as the presence of [ing] on verbs in English and the French suffix [er] for infinitive verbs) and determiners that precede nouns (such as the/a/an in English and un, an indefinite article, in French) in the language input that children appear to utilize (e.g., Brown, 1957; Shi, 2014). There are also distributional cues to form class. For example, in a frame consisting of two words co-occurring frequently with one word in between (e.g., She X to; put Y in), X tends to be a verb while Y is most probably a noun (Mintz, 2003). But what about a language like Mandarin that allows the omission of both subjects and objects, and requires neither determiners nor morphological inflections (Chao, 1968)? Here we explore whether there are morphosyntactic cues available in Mandarin child-directed speech (CDS) to which words are nouns and which are verbs, and if so, whether the use of these syntactic cues can predict the age of acquisition (AoA) of nouns and verbs.
By now, numerous studies (mostly studying English) have shown that children use syntactic cues to glean something of the meaning of new words – a process known as syntactic bootstrapping – in a range of categories, although the emphasis of syntactic bootstrapping research has been placed on the category of verb (e.g., Arunachalam & Waxman, 2010; Fisher, 1996; Gertner, Fisher, & Eisengart, 2006; Gleitman, 1990; Gleitman, Cassidy, Nappa, Papafragou, & Trueswell, 2005; Lidz, Gleitman, & Gleitman, 2003; Naigles, 1990, 1996; Naigles, Bavin, & Smith, 2005; Naigles & Kako, 1993; Yuan & Fisher, 2009; Yuan, Fisher, & Snedeker, 2012; but see Dittmar, Abbot-Smith, Lieven, & Tomasello, 2008). For example, the syntactic structure accompanying a verb guided the interpretation of a novel verb as either transitive or intransitive in 2-year-olds (Hirsh-Pasek, Golinkoff, & Naigles, 1996; Naigles, 1990; Yuan et al., 2012), and as a belief verb (e.g., believe, think) in 4-year-olds (Papafragou, Cassidy, & Gleitman, 2007). Verb frames also influenced the listener’s interpretation of perspective on the action pairs of give–receive and chase–flee in preschoolers (English – Fisher, Hall, Rakowitz, & Gleitman, 1994; Mandarin – Cheung, 1998) and the interpretation of the causativeness of familiar verbs (English – Naigles, Gleitman, & Gleitman, 1993; Naigles, Fowler, & Helm, 1992; French – Naigles & Lehrer, 2002; Kannada – Lidz et al., 2003). Furthermore, children can use syntactic information from overheard speech in the absence of accompanying events (Arunachalam & Waxman, 2010; Yuan & Fisher, 2009). This – and work with a blind child (Landau & Gleitman, 1985) – provides the strongest test of whether children can use syntactic bootstrapping to help identify verb meaning.
Can children use syntactic bootstrapping to distinguish between nouns and verbs? Research suggests that children are sensitive to the cues indicating the noun form class early on. For example, by 14 months of age, English-learning children appear to employ syntactic bootstrapping to distinguish nouns from adjectives (Booth & Waxman, 2009). In addition, 2-, 3-, and 5-year-olds could interpret a novel word as either a noun when hearing ‘This is a corp,’ versus a preposition when hearing ‘This is acorp my box,’ accompanied by a supporting scene (Fisher, Klingler, & Song, 2006; Landau & Stecker, 1990). Furthermore, children interpret a novel word as either a common or proper noun depending on the use of determiners (Gelman & Taylor, 1984; Jaswal & Markman, 2001; Katz, Baker, & Macnamara, 1974), as either a mass or count noun based on the presence of the word some, or a determiner (Soja, 1992). Another example comes from French. French child-directed speech (CDS) also has cues to the noun–verb distinction (Chemla, Mintz, Bernal, & Christophe, 2009), and 14-month-old French-learning infants use determiners to categorize novel words as nouns (Shi & Melançon, 2010). However, these studies did not examine children’s use of syntactic bootstrapping to distinguish between nouns and verbs.
In 1957, Brown conducted a pioneering study that demonstrated that English-speaking preschoolers mapped a novel word to different aspects of a picture depending on whether they heard ‘This is a zup,’ ‘This is some zup,’ or ‘This is zupping.’ Using the Intermodal Preferential Looking Paradigm (for a review, see Golinkoff, Ma, Song, & Hirsh-Pasek, 2013), Echols and Marti (2004) found that 18-month-olds mapped the novel word, gep, to either the action aspect of an event when hearing ‘It’s gepping,’ or the agent of the event when hearing ‘That is a gep.’ Research also showed that 24-month-old English learners mapped a novel word to either the object, when hearing ‘He is waving a larp,’ or the action, when hearing ‘He is larping a balloon’ while viewing an event (Waxman, Lidz, Braun, & Lavin, 2009). Thus, in a language that signals the form classes of nouns and verbs in reliable ways, young children seem to access and use those markers to decide whether a word is a noun or a verb.
A critical question about the role syntactic bootstrapping might play in distinguishing between nouns and verbs is whether it can apply cross-linguistically in a language that has a relatively impoverished morphosyntactic system (such as Mandarin). A corpus study of Mandarin CDS by Lee and Naigles (2005) found that Mandarin CDS has reliable cues to distinguish between transitive and intransitive verbs, and between motion and communication verbs. Furthermore, 2-year-old Mandarin learners can use syntactic bootstrapping to interpret the causativeness of familiar verbs (Lee & Naigles, 2008). A one-year longitudinal study on two children learning a southern Mandarin dialect, starting from 10 months of age, showed that they could use frequent sentence frames to form the category of verb and adjective, with some degree of success (Xiao, Cai, & Lee, 2006). However, it still remains unclear whether there is a relation between the use of these cues on particular verbs and the age at which those verbs were acquired. Nor did these studies evaluate whether there are reliable cues distinguishing verbs versus nouns in Mandarin CDS.
Although research on adult-directed written Mandarin indicates that function words and morphosyntactic markers distinguish nouns versus verbs (Redington et al., 1995), it is unclear whether Mandarin CDS also offers these cues to the noun and verb distinction. Oral language directed to children tends to be far less formal than written language and parents may engage in simplifications that alter the relationships between these function words and the categories with which they cluster. For example, it is grammatical to say Chi1 ma, a sentence consisting of only a verb (chi1 ‘eat’) and a question marker (ma), to mean ‘Do you want to eat it?’ 1 Research on parental reports of child speech production showed that by 24 months of age, roughly 50% of Mandarin-learning children produced one or more noun-specific and verb-specific markers with nouns and verbs respectively (Tardif, 2006). In addition, Tardif and Zhang (2003) examined the five most frequent nouns and the five most frequent verbs produced by 20- to 26-month-old Mandarin-learning children. The children appeared capable of producing several morphosyntactic markers for nouns and verbs appropriately. However, this study did not address whether those potential morphosyntactic markers were indeed reliable cues to the noun and verb distinction in Mandarin CDS, or whether the use of these cues in CDS could facilitate children’s word learning.
In sum, research suggests that Mandarin CDS has reliable cues to distinguish among sub-types of verbs (Lee & Naigles, 2005), and that adult-directed written Mandarin may have reliable morphosyntactic cues to distinguish nouns versus verbs (Redington et al., 1995). Mandarin-learning children can produce one or more morphosyntactic markers for nouns and verbs at around 24 months of age (Tardif, 2006; Tardif & Zhang, 2003). However, two major questions still remain unanswered. First, does Mandarin CDS have reliable morphosyntactic cues to the noun and verb distinction? This is a prerequisite for syntactic bootstrapping since without reliable cues to the noun and verb distinction such bootstrapping is less likely to occur. Second, if reliable markers exist, can the use of these cues predict the age of acquisition of nouns and verbs? Perhaps the use of those markers in CDS helps children learn a word’s part of speech, thereby contributing to the discovery of the word’s meaning. Thus, words that occur more often with appropriate form class markers may be acquired earlier than words that occur less often with those form class markers. This study addresses these two questions through an investigation of Mandarin CDS corpora and the age of acquisition (AoA) data of a set of early-acquired Mandarin nouns and verbs. It should be noted that although the AoA data do not allow us to determine if children understand the meanings or form classes or both of the early-acquired words, we are at least in a position to evaluate whether syntactic bootstrapping is in theory applicable to the acquisition of nouns and verbs in Mandarin.
How Mandarin cues the existence of nouns vs. verbs
In Mandarin, it is grammatical to refer to an object in a sentence without a determiner as in Xiao3ming2 you3 ping2guo3 ‘Xiao3ming2 has apple,’ 2 and to refer to an ongoing event without a morphological inflection on the verb as in chi1 ping2guo3 ‘eat apple,’ which can refer to an ongoing event of eating an apple. In addition, nouns and verbs may occur in the same position in a sentence, as Mandarin allows the omission of subjects and objects. For example, when asking a question meaning ‘Do you want an apple?’, one can omit ni2 ‘you’ and/or ping2guo3 ‘apple’; thus, the verb (yao4 ‘want’) can appear towards the sentential final position before the question marker (ma) as in Ni2 yao4 ma?, or at the sentential initial position as in Yao4 ping2guo3 ma?, or nearly in isolation only accompanied by the question marker as in Yao4 ma?
Because of these linguistic factors, perhaps Mandarin-speaking children’s learning of nouns and verbs lags behind that of their English-speaking counterparts. However, such a prediction is not consistent with the data. Notably, Mandarin-speaking children are not disadvantaged in noun or verb learning based on parental reports on the MacArthur Communicative Developmental Inventories (CDI) (Fenson et al., 1994; Tardif, Fletcher, Zhang, Liang, & Zuo, 2008), a widely used parent-report instrument. Furthermore, Mandarin-speaking children learn verbs even earlier and in greater number than English-speaking children based on CDI data (Ma, Golinkoff, Hirsh-Pasek, McDonough, & Tardif, 2009; Tardif et al., 2008). The relative verb learning advantage may be related to the fact that verbs are the only required element in Mandarin sentences that refer to motion or a change of state (Tardif, Shatz, & Naigles, 1997). Finally, Mandarin- and English-speaking children had comparable performance on learning novel nouns and verbs in an experimental test (Imai et al., 2008). However, Mandarin-speaking 5-year-olds interpreted a novel word as either a noun or a verb depending on whether it was preceded by ge (a classifier preceding nouns) or zai4 (a verb-marker indicating ongoing events) (Imai et al., 2008), but only when a one-second segment of object-holding from the beginning of each videotaped scene was removed. Since verbs can be bare in Mandarin, Mandarin-learning children may have to rely on cues other than morphosyntax – such as social cues (object holding or eye gaze) – to identify the form class of newly encountered words. Alternatively, perhaps the syntax–meaning correspondence in Mandarin CDS is reliable and the use of these form class markers can facilitate the learning of nouns and verbs.
Researchers have long argued that Mandarin CDS might have reliable morphosyntactic cues to distinguish between nouns and verbs (Chao, 1968; Li & Thompson, 1981; Tardif, 2006). In Mandarin, nouns can be preceded by determiners (e.g., zhe4 ‘this,’ na4 ‘that’) and by a possessive marker, de, which indicates the relationship of possession as in Xiao3ming2 de ping2guo3 ‘Xiao3ming2’s apple.’ In addition, when a noun is quantified by a numeral, the noun usually has to occur in a numeral–classifier–noun structure. For example, a phrase equivalent to three people should be expressed as san1 ge ren2 ‘three-classifier-people.’ Mandarin has a generic classifier, ge, which is the most common classifier, and multiple specific classifiers indicating the shape (tiao2 indicating long and slender objects; zhang1 indicating flat objects), function (liang4 indicating ground vehicle; sou1 indicating water vehicle), or other physical properties of the object. By contrast, verbs can be preceded by negation markers (e.g., mei2, bu4, bie2), be modified by postverbal aspect markers (e.g., zhe indicating an ongoing event; le indicating a completed event) to refer to different aspects of actions (Zhou, Crain, & Zhan, 2014), and be adjacent to verb constructions to refer to different states of action (e.g., you4 ‘again,’ dou1 ‘all,’ ye3 ‘also’). This study tests whether such cues distinguishing nouns vs. verbs are reliable in Mandarin CDS (Study 1), and if so, whether the use of these markers can predict the age of acquisition of early-acquired words (Study 2).
Study 1
Study 1 examines whether parents speaking in Mandarin CDS use different and reliable morphosyntactic cues with nouns vs. verbs. Mandarin-speaking adults were first asked to identify potential noun- and verb-markers from the most frequently used words in CDS. Next we examined whether the appearance of potential noun- and verb-markers was associated with nouns and verbs respectively in CDS. Here we focus on the cases where form class markers appeared immediately adjacent to nouns and verbs, because when markers are adjacent to words of another form class (e.g., the big zorp), children may determine the form class of the novel word zorp based on their understanding of intervening words (in this case, adjectives) rather than determiners. This could leave the function of form class markers unclear. Therefore, an examination of only the cases of immediate adjacency offers a strong test case to evaluate the functional significance of form class markers.
Participants, stimuli, and procedure
Based on previous research (Mintz, 2003; Redington et al., 1995), we first generated a list of the most frequently used words in Mandarin CDS for the identification of form class markers. The CHILDES Beijing corpora were used. They contain conversations between 10 children (8 boys; mean age = 22.8 months, SD = .66) and their adult caregivers (Tardif, 1996). 3 Of the 10 families (all monolingual Mandarin speakers living in Beijing), five children (4 boys, 1 girl) had parents whose education was at college level or higher, and the other five children (4 boys, 1 girl) had parents whose education was at high school level or lower. Adult caregivers included parents, grandparents, live-in nannies, aunts, and neighbors. The children and their families were video-recorded in their homes for one hour for between four and six visits while engaged in naturalistic interaction and doing activities they usually did at that time of the day. The activities included indoor and outdoor toy play, dressing, mealtimes, and social interchanges (see Tardif, 1996; Tardif et al., 1997). Caregivers’ utterances directed to children were transcribed for a total of 50,118 child-directed utterances.
We generated a list of the 200 most frequently used words based on mothers’ utterances in the CHILDES Beijing corpora. Since children were exposed to spoken rather than written language input in the corpora, the definition of word identity strictly followed the transcription of the CHILDES Beijing corpora, which is based on word pronunciation. Thus, a polyphonic word transcribed as having different pronunciations was counted as being two different words, such as zhe4 and zhei4 (two possible pronunciations for the determiner meaning ‘this’) and bu4 and bu2 (two possible pronunciations for the negation word meaning ‘not’). Similarly, homophonic words were treated as one word, such as jia1 that can be a noun ‘home’ and a verb ‘carrying/holding an object under one arm.’
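The frequency-list step can be illustrated with a minimal sketch. It assumes each utterance is already available as a string of space-separated transcribed word forms; the function name `top_words` and the toy utterances are hypothetical and are not drawn from the Beijing corpora. Because word identity follows the transcription, zhe4 and zhei4 are counted separately, while a homophonic form such as jia1 is counted once regardless of its form class.

```python
from collections import Counter


def top_words(utterances, n=200):
    """Return the n most frequent transcribed word forms with their counts."""
    counts = Counter()
    for utterance in utterances:
        # Word identity follows the transcription: different pronunciations
        # (zhe4 vs. zhei4) are different words; homophones share one entry.
        counts.update(utterance.split())
    return counts.most_common(n)


# Hypothetical toy utterances, for illustration only.
toy = ["zhei4 shi4 che1", "zhe4 ge che1", "bu2 yao4", "bu4 chi1", "yao4 ma"]
top = top_words(toy, n=3)
```

On real data the same function would be applied to the mothers' utterances only, with `n=200`.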
Then, three native speakers of Mandarin Chinese, all graduate students of linguistics, were recruited. Blind to the purpose of this study, using the list of the 200 most frequent words, they were asked to first identify potential form class markers, and then indicate the form class of other words based on the following instruction written in Chinese: Please identify words that tend to occur adjacent to nouns and verbs respectively. Indicate whether these words tend to precede or follow nouns or verbs. For example, zhe tends to follow a verb, indicating an ongoing event, while na4 ‘that’ tends to precede a noun. The identified words must meet two criteria: They cannot be either nouns or verbs themselves, and they tend to be immediately adjacent to either nouns or verbs. Then, for other words, please identify their form classes.
All three coders agreed that among the 200 most frequent words, there were 15 verb-markers and 11 noun-markers. Since five of the noun-markers (ge ‘classifier,’ zhei4ge ‘this-classifier,’ yi2ge ‘one-classifier,’ nei4ge ‘that-classifier,’ nei3ge ‘which-classifier’) were associated with the same marker (ge), and all of them were transcribed as /ge/, they were combined in data analysis. Thus, there were seven potential noun-markers, consisting of four determiners, zhe4 ‘this,’ zhei4 ‘this,’ na4 ‘that,’ and nei4 ‘that,’ one possessive marker – de, the generic classifier – ge, and a specific classifier – dianr3 meaning ‘a bit of.’ The 15 potential verb-markers consisted of two progressive aspect markers, one perfective aspect marker, two future aspect markers, two auxiliary words for action capability, four negation words, and four adverbs (see Table 1). Except for zhe (an aspect marker for an ongoing event) and le (an aspect marker for a completed event), which tend to follow words of the predicted form class (i.e., verbs), all other markers tend to precede words of the predicted form classes.
Table 1. Mandarin-speaking mothers’ use of form class markers in CDS according to the Beijing corpora (form class markers appearing at utterance boundaries were excluded).
Note: * These markers predict the form class of preceding words, while other markers predict that of following words.
Examples: X-de noun, referring to the fact that X possesses the object. For the marker de, the word that immediately follows de was coded as either a noun (as in X de shu1 ‘X’s book’), or a verb (as in fei1kuai4 de tiao4 ‘fast jumping’), or a word of another form class (as in tiao4 de kuai4 ‘jumping quickly’).
In addition, there were 18 function words irrelevant to the noun and verb distinction; 116 content words with unambiguous form classes, including 62 verbs, 26 nouns, 16 adjectives, nine adverbs, and three number words; 14 content words that received different form class indications across coders; five children’s names; and 21 filler words without well-defined form classes, including en, ya, and ao (similar to uh, er, um in spoken English), which have been thought to either indicate utterance boundaries or signal a certain message (e.g., emphasis or uncertainty of what is being said; Laserna, Seih, & Pennebaker, 2014).
The identification of a set of noun- and verb-markers by adult participants with linguistic backgrounds does not mean that these markers are used reliably in CDS. Thus, we coded the frequency with which nouns and verbs appeared with the nominated form class markers using transcripts from the CHILDES Beijing corpora. A native speaker of Mandarin Chinese, blind to the purpose of this study, was provided with the whole sentence where the form class marker occurred, and coded the words immediately preceding or following the identified potential form class markers as either verbs, nouns, or words of another form class. For example, for the marker de, which tends to be followed by nouns, the word that immediately follows de was coded as either a noun (the predicted form class: X de shu1 where shu1 is a noun meaning ‘book’), or a verb (the contrasting form class: fei1kuai4 de tiao4 ‘fast jumping,’ where tiao4 is a verb ‘jumping’), or a word of another form class (neither the predicted nor the contrasting form class: tiao4 de kuai4 ‘jumping quickly,’ where kuai4 is an adverb ‘quickly’). Words of another form class also included instances of de either (1) occurring at phrasal boundaries (in this case, the end of an utterance), making it impossible to predict the target form class of the following word, or (2) occurring with words of ambiguous form class, such as shang4 ‘up’ in tiao4 de shang4, which could be either an adverb ‘upward’ or a verb ‘going up.’ Following Packard (2000), we treated Mandarin compound words, consisting of two or more morphological constituents that refer to a new concept, as one word. For example, a noun consisting of a verbal morpheme and a nominal root (e.g., fei1ji1 ‘fly machine’ meaning ‘plane’) was coded as a noun. The Mandarin CDI data suggest that many of the early-acquired words in children’s vocabularies are compound words (Tardif et al., 2008). Another coder similarly coded all the data, yielding an inter-coder agreement of .98.
We analyzed the proportions of the markers occurring adjacent to words of (1) the predicted form class, e.g., de on nouns; (2) the contrasting form class, e.g., de on verbs; and (3) another form class, e.g., de on adjectives, words of ambiguous form classes, and appearing at the end of an utterance. Based on our definition of reliable form class markers, these markers should appear more often with words of the predicted form class than with words of either the contrasting or another form class.
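As a rough illustration of this proportion analysis, the sketch below tallies, for a single marker, how often the adjacent word was coded as the predicted class, the contrasting class, or another class. The function names and the toy codes are hypothetical. The per-marker comparison in `favors_predicted` is only a descriptive check; the reliability judgments reported in this study rest on t tests across markers, not on this function.

```python
from collections import Counter


def class_proportions(coded_labels, predicted, contrasting):
    """Proportions of a marker's tokens adjacent to each of the three classes."""
    counts = Counter()
    for label in coded_labels:
        if label == predicted:
            counts["predicted"] += 1
        elif label == contrasting:
            counts["contrasting"] += 1
        else:
            # Adjectives, ambiguous words, utterance boundaries, and so on.
            counts["other"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no coded tokens for this marker")
    return {k: counts[k] / total for k in ("predicted", "contrasting", "other")}


def favors_predicted(props):
    """Descriptive check: predicted class outnumbers both alternatives."""
    return (props["predicted"] > props["contrasting"]
            and props["predicted"] > props["other"])


# Hypothetical codes for five tokens of the noun-marker de.
de_props = class_proportions(
    ["noun", "noun", "verb", "adverb", "noun"],
    predicted="noun", contrasting="verb")
```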
Results and discussion
Do noun- and verb-markers predict the appearance of nouns and verbs in CDS?
We first conducted a repeated measures ANOVA, with noun- and verb-markers combined, to examine the proportions of markers occurring adjacent to words of the predicted form class, the contrasting form class, and another form class. A significant main effect emerged (F(2, 42) = 28.58, p < .001, ηp² = .58), suggesting that the markers occurred more often with words of the predicted form classes than with other words. Then, verb- and noun-markers were analyzed separately. A Bonferroni-adjusted significance level was used for multiple t test comparisons throughout this study. Planned paired sample t tests showed that verb-markers occurred significantly more often with verbs (M = .66, SD = .17) than with nouns (M = .04, SD = .08; t(14) = 10.59, p < .001, Cohen’s d = 2.73), or with words of another form class (M = .29, SD = .13; t(14) = 4.83, p < .001, Cohen’s d = 1.25). However, noun-markers occurred with nouns (M = .34, SD = .14) marginally significantly more often than with verbs (M = .17, SD = .10; t(6) = 2.68, p = .037, Cohen’s d = 1.82 [a significance cutoff level of .025 was used]), but similarly frequently when compared with words of another form class (M = .49, SD = .17; p = .24). Thus, based on our definition for reliable markers, verb-markers appear to be reliable markers but noun-markers do not.
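The arithmetic behind the adjusted cutoff and the verb-marker effect sizes can be sketched as follows. Two points are assumptions rather than statements from the text: the family-wise alpha of .05 split over two planned comparisons (which yields the .025 cutoff mentioned above), and the paired-design formula d = t / √n, which is one common convention among several. With n = 15 verb-markers, that formula gives values matching the two verb-marker effect sizes reported above.

```python
import math


def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-comparison significance cutoff under a Bonferroni adjustment."""
    return family_alpha / n_comparisons


def paired_d_from_t(t_value, n_pairs):
    """Cohen's d for a paired design as t / sqrt(n) (assumed convention)."""
    return t_value / math.sqrt(n_pairs)


alpha = bonferroni_alpha(0.05, 2)            # assumed family alpha of .05
d_verb_vs_noun = paired_d_from_t(10.59, 15)  # t(14) = 10.59, 15 verb-markers
d_verb_vs_other = paired_d_from_t(4.83, 15)  # t(14) = 4.83
```

Other estimators of d for paired data (for example, ones based on pooled standard deviations) give different values, so this sketch should not be read as the computation used for every effect size in the study.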
We then examined each noun- and verb-marker used in this study. Only one potential noun-marker, zhei4, did not occur with the predicted form class more often (15%; token frequency = 321) than with the contrasting form class (26%; token frequency = 580) or with another form class (58%; token frequency = 1250). Among the instances where zhei4 occurred with a verb (the contrasting form class), zhei4 was followed by a copula verb (shi4 ‘is’ – token frequency = 508), as in ‘This is …, ’ or was used as a noun phrase preceding a verb (token frequency = 72) as in ‘This runs fast.’ In both cases, the use of zhei4 is grammatical and acceptable in Mandarin. When zhei4 occurred with another form class, it was most often followed by a classifier (token frequency = 888), as in zhei4 ge che1 ‘this classifier car,’ to modify a noun, which is also a grammatical structure in Mandarin.
When are noun-markers reliable form class markers?
When noun-markers were coded as preceding words of another form class, they occurred at the end of an utterance 38% of the time (SD = 30%) including the cases when they preceded an utterance-final filler word (e.g., en, a, ya). Thus, in these instances, noun-markers’ failure to predict an immediately following noun was likely because of pro-drop in Mandarin or the speaker’s intention to convey certain communicative messages (e.g., emphasis or uncertainty of what is being said). With these cases excluded, planned paired sample t tests showed that noun-markers appeared with nouns (M = .44, SD = .14) significantly more often than with verbs (M = .21, SD = .10; t(6) = 2.96, p = .025, Cohen’s d = 1.92), but still similarly often with words of another form class (M = .35, SD = .12; p = .34). Only when zhei4 was excluded from data analysis did paired sample t tests show that noun-markers appeared with nouns (M = .49, SD = .07) more often than with verbs (M = .20, SD = .10; t(5) = 4.71, p = .005, Cohen’s d = 1.92) and with words of another form class (M = .31, SD = .07; t(5) = 4.91, p = .004, Cohen’s d = 2.00). Thus, noun-markers, as a group, were reliable form class markers only when (1) they preceded words with identifiable form classes (i.e., preceding neither an utterance boundary nor an utterance-final filler) and (2) zhei4 was excluded. By contrast, when verb-markers failed to predict the verb form class, only 8% (SD = 12%) of these cases were due to the markers’ appearance at utterance boundaries. This may be due to the fact that Mandarin allows the omission of verb arguments but not verbs. With these cases excluded, verb-markers still appeared with verbs (M = .68, SD = .16) more often than with nouns (M = .04, SD = .08; t(14) = 10.95, p < .001, Cohen’s d = 2.83) or words of another form class (M = .28, SD = .13; t(14) = 5.58, p < .001, Cohen’s d = 1.44).
Do noun- and verb-markers differ in their predictive strength?
We compared the predictive strength of noun- and verb-markers using the data that excluded (1) the cases where markers preceded either an utterance boundary or an utterance-final filler and (2) zhei4, because noun-markers were relatively more reliable form class cues when data were analyzed this way. If this follow-up analysis still shows that verb-markers have greater predictive strength than noun-markers, we can safely conclude that verb-markers can predict form class more reliably than noun-markers. Separate independent samples t tests compared the predictive strength of noun- and verb-markers. Results showed that compared with noun-markers, verb-markers were more likely to occur with words of the predicted form class (Levene’s test: p = .011; t(18.93) = 3.71, p = .001, Cohen’s d = 1.34), and less likely to occur with words of the contrasting form class (t(19) = 3.73, p = .001, Cohen’s d = 1.85). However, the proportion of markers appearing with words of another form class did not differ between noun- and verb-markers (p = .55).
Study 1 showed that Mandarin CDS has reliable markers indicating the noun and verb distinction, with the caveat that the noun-markers are reliable only when they precede words with identifiable form classes (i.e., preceding neither an utterance boundary nor an utterance-final filler) and when zhei4 was excluded. However, it is still unclear whether the use of form class markers can predict the age of acquisition of words.
Study 2
Study 2 examined whether the AoA of a set of words, acquired early by Mandarin-speaking children based on parental reports on the CDI, could be predicted by the frequency with which they appeared with the form class markers analyzed in Study 1. The logic is that words that appear highly frequently with markers may be easier for children to learn than words that are accompanied by these markers less frequently. While we are not investigating how much children know about the meanings of these words, using CDI at least allows us to investigate whether these words appearing frequently with markers tend to be acquired earlier than words appearing with markers less frequently.
If so, there are alternative explanations for why children might have these words in their vocabulary that go beyond the words’ use with these markers. To investigate whether the use of form class markers independently contributed to the variance of a word’s CDI AoA, we also analyzed the relation between a word’s CDI AoA and its token frequency, its imageability, its probability of appearing at the beginning or end of a sentence, and its being produced in isolation or as a reduplicated word, as Mandarin often allows reduplicating a word (e.g., kan4kan4 ‘look look’ meaning ‘to take a look,’ guo3guo3 ‘fruit fruit’ meaning ‘fruit’). Research has found that presenting words in isolation or at utterance boundaries might facilitate word segmentation and learning (e.g., Golinkoff & Alioto, 1995; Lew-Williams, Pelucchi, & Saffran, 2011; Yurovsky, Yu, & Smith, 2012). Furthermore, reduplicating a word increases the token frequency of the word, which may enhance its learnability (Axelsson & Horst, 2014; Bird, Franklin, & Howard, 2001).
The purpose of including imageability in the data analysis was to determine if the contribution that form class markers make to a word’s AoA variance could remain significant even when other factors that can predict a word’s AoA are included in the analysis. Imageability is defined as ‘the ease with which a word gives rise to a mental image’ (Bird et al., 2001; Paivio, Yuille, & Madigan, 1968). For example, the word apple arouses an image relatively easily and would thus be rated highly imageable. The word, idea, on the other hand, would be rated low in imageability. Imageability is related to semantic notions that a word embodies (Langacker, 1987). For example, the object noun cup refers to an entity with distinguishable boundaries (the top, the bottom and the handle of the cup), thus helping children to detect and segment it from the environment – an essential step for word learning. However, a less imageable word, such as idea, is not characterized as having discernible boundaries, which can complicate the process of detecting and identifying the referent of a word. Research has shown that highly imageable words tend to be learned earlier than less imageable words in both Indo-European languages (Bird et al., 2001; Gilhooly & Logie, 1980; Masterson & Druks, 1998) and Mandarin (Ma et al., 2009). In addition, high imageability also facilitates word reading, word association, picture naming, and word guessing performance in adult subjects (e.g., Gillette, Gleitman, Gleitman, & Lederer, 1999; Snedeker & Gleitman, 2004; Strain, Patterson, & Seidenberg, 1995).
Participants, stimuli, and procedure
Using the CHILDES Beijing corpora (Tardif, 1996), we examined how a set of early-acquired Mandarin words was used in Mandarin CDS. We used 45 nouns and 45 verbs, all of which were acquired between 16 and 25 months of age according to productive CDI AoA data (Tardif et al., 2008), consistent with the average age of the children of the CHILDES Beijing corpora. Based on their CDI AoA data, the 45 nouns and 45 verbs were divided into three AoA stages – early (16–18 months), middle (19–21 months), and late (22–25 months) – each of which had 15 nouns and 15 verbs, ensuring the AoA of the words was evenly distributed. For each month from 16 to 25 months, we aimed to use the five most frequent nouns and verbs (with unambiguous form class ratings) based on the CHILDES Beijing corpora. When there were not enough words available for a certain month, additional frequently used words from neighboring months were used (Tables 2 and 3).
Mandarin-speaking mothers’ use of the 45 early-acquired nouns in CDS according to the Beijing corpora.
Note: Imageability ratings were made on a seven-point scale (1 = not imageable at all; 7 = extremely imageable).
Mandarin-speaking mothers’ use of the 45 early-acquired verbs in CDS according to the Beijing corpora.
Note: Imageability ratings were made on a seven-point scale (1 = not imageable at all; 7 = extremely imageable).
A native Mandarin-speaking graduate student of linguistics (separate from those participating above), blind to the purpose of the study, was provided with the sentences in which these words occurred in the CDS. The student coded each occurrence of these words in CDS for whether (1) it occurred adjacent to the form class markers examined in Study 1; (2) it occurred at the beginning or the end of an utterance; (3) it was produced in isolation as a single-word utterance; or (4) it was produced as a reduplicated word. Although Study 1 showed that zhei4 did not occur with nouns more often than with verbs in CDS, it was still included as a potential noun-marker here as the coders in Study 1 unanimously rated it as indicative of the noun form class. Furthermore, our findings (e.g., the association between a noun’s AoA and the rate of occurring with noun-markers) remain largely unchanged when zhei4 is excluded. The entire data set was recoded independently by another coder yielding an inter-coder agreement of 1.00.
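The four coding criteria can be illustrated with a minimal Python sketch. The coding reported here was done by hand, so this is only an approximation under stated assumptions: utterances are already tokenized into tone-numbered pinyin strings, a marker counts as adjacent on either side of the word, a single-word utterance is coded as isolated rather than as utterance-initial or utterance-final, and a reduplicated form is written as one doubled token. The names `Coding` and `code_occurrence` and the one-item marker set are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Coding:
    adjacent_to_marker: bool
    utterance_initial: bool
    utterance_final: bool
    isolated: bool
    reduplicated: bool


def code_occurrence(tokens, index, target, markers):
    """Code the occurrence of `target` found at tokens[index].

    Assumptions (not from the study): adjacency is checked on both sides,
    and a reduplicated form appears as a single doubled token.
    """
    isolated = len(tokens) == 1
    # Immediate left and right neighbors, if they exist.
    neighbors = tokens[max(index - 1, 0):index] + tokens[index + 1:index + 2]
    return Coding(
        adjacent_to_marker=any(tok in markers for tok in neighbors),
        utterance_initial=(index == 0 and not isolated),
        utterance_final=(index == len(tokens) - 1 and not isolated),
        isolated=isolated,
        reduplicated=(tokens[index] == target + target),
    )


# Made-up utterance; zhei4 is the only marker named in this section.
example = code_occurrence(["kan4", "zhei4", "guo3guo3"], 2, "guo3", {"zhei4"})
print(example)
```

Summing each flag over all occurrences of a word and dividing by the word's token frequency would give the per-word proportions analyzed below.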
Imageability rating collection
Because no prior imageability ratings existed for the 90 nouns and verbs in Chinese, we collected imageability ratings.
Participants
Thirty Chinese undergraduates (half male) (mean age = 20.30 years; range: 18–22 years) were recruited at the University of Electronic Science and Technology of China. None of them was a language or linguistics major.
Stimuli and procedure
We collected imageability ratings using the same procedure and instructions as in Ma et al. (2009). The Chinese word sample contained 90 early-acquired words (45 nouns, 45 verbs – Tables 2 and 3) and 94 words (47 nouns, 47 verbs) from adults’ vocabularies. The adults’ words were among the 500 most frequently used Chinese words collected from an online corpus (Chinese Text Computing; http://lingua.mtsu.edu/chinese-computing) based on modern Chinese literary texts that originally appeared in print (Da, 2004). The adults’ words did not appear in the Chinese CDI and served as the baseline for the participants to give imageability ratings that could vary across words. Imageability ratings were made on a seven-point scale (1 = not imageable at all; 7 = extremely imageable), translated into Chinese. Words were presented in a semi-random order based on the conditions that neither early-acquired words nor adults’ words (and neither nouns nor verbs) appeared on more than four consecutive trials. In addition, the presentation order of words was counterbalanced across participants. A word’s imageability score was calculated as its average imageability rating across all participants. These scores are shown in Tables 2 and 3 for the early-acquired words.
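The ordering constraint and the scoring rule described above can be sketched in Python. This is an illustration only: the function names and the `(source, form_class)` trial representation are hypothetical, and the sketch checks a candidate order rather than reproducing how the actual presentation orders were generated.

```python
from itertools import groupby
from statistics import mean


def max_run(labels):
    """Length of the longest run of identical consecutive labels."""
    return max((len(list(group)) for _, group in groupby(labels)), default=0)


def order_is_valid(trials, limit=4):
    """Check the semi-random ordering constraint.

    `trials` is a list of (source, form_class) pairs, e.g. ("early", "noun").
    Neither the source nor the form class may repeat on more than
    `limit` consecutive trials.
    """
    sources = [source for source, _ in trials]
    classes = [form_class for _, form_class in trials]
    return max_run(sources) <= limit and max_run(classes) <= limit


def imageability_score(ratings):
    """A word's score is the mean of its 1-7 ratings across participants."""
    return mean(ratings)


# Hypothetical ratings from three participants for one word.
print(imageability_score([7, 6, 5]))
```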
Results and discussion
How often do the words occur with form class markers in CDS?
Independent samples t tests showed that verbs (M = 282.56, SD = 346.69) had a higher word token frequency than nouns (M = 60.78, SD = 86.22; t(88) = 4.16, p < .001, Cohen’s d = .88), likely due to the fact that Chinese is a pro-drop language that allows the omission of subjects and objects, which may increase the frequency of verb use. Furthermore, verbs (M = 74.82, SD = 99.98) had a higher marker token frequency than nouns (M = 12.49, SD = 21.70; t(88) = 4.09, p < .001, Cohen’s d = .86), likely due to the higher token frequency of verb use. The proportion with which a word occurred adjacent to its form class markers was calculated by dividing its marker token frequency by word token frequency. An independent samples t test showed that compared with nouns (M = .20, SD = .12), verbs (M = .26, SD = .19) were marginally more likely to occur with the corresponding form class markers (t(88) = 1.74, p = .086, Cohen’s d = .38).
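As a check on how these statistics relate to the reported means and standard deviations, the following Python sketch recomputes the effect size and t value for word token frequency from the summary statistics alone. It assumes Cohen's d was computed with a pooled standard deviation and that both groups contain 45 words; the source does not state the formula, so the pooled-SD form is an assumption, and the function names are illustrative.

```python
import math


def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d with a pooled SD, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


def t_equal_n(m1, sd1, m2, sd2, n):
    """Independent-samples t for two groups of n items each (df = 2n - 2)."""
    standard_error = math.sqrt((sd1 ** 2 + sd2 ** 2) / n)
    return (m1 - m2) / standard_error


# Word token frequency: verbs vs. nouns, 45 words per form class.
d = cohens_d(282.56, 346.69, 60.78, 86.22)
t = t_equal_n(282.56, 346.69, 60.78, 86.22, n=45)
print(round(d, 2), round(t, 2))  # 0.88 4.16
```

Under these assumptions the sketch reproduces the reported d = .88 and t(88) = 4.16. Because the inputs are rounded summary statistics, small discrepancies can arise for the other comparisons.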
Does the mothers’ use of form class markers differ according to their educational levels?
Research has shown that mothers’ production of CDS may differ according to their education (Hart & Risley, 1995). Of the 10 families in the corpora, five children had parents whose education was at the college level or higher, and the other five children (4 boys, 1 girl) had parents whose education was at the high school level or lower. Within each education group, we calculated a word’s token frequency, marker token frequency, and proportion of occurring adjacent to its form class markers for the 45 nouns and 45 verbs, respectively. Three separate two-way, 2 (education: more-educated, less-educated) × 2 (form class: noun vs. verb) ANOVAs analyzed these variables. Because three ANOVAs were conducted, an adjusted significance cutoff of .017 was used. A significant main effect of form class emerged in the analyses of word token frequency (F(1, 179) = 32.27, p < .01, ηp² = .16) and marker token frequency (F(1, 179) = 29.38, p < .01, ηp² = .14), respectively, indicating that verbs had a higher word token frequency and a higher marker token frequency than nouns. A marginally significant main effect of form class emerged in the analysis of words’ proportion of occurring adjacent to their form class markers (F(1, 179) = 4.93, p = .028, ηp² = .03 [a significance cutoff level of .017 was used]). However, neither the main effect of education (word token frequency: p = .17; marker token frequency: p = .26; words’ proportion of occurring adjacent to their form class markers: p = .51) nor the Education × Form Class interaction (word token frequency: p = .59; marker token frequency: p = .48; words’ proportion of occurring adjacent to their form class markers: p = .95) approached significance. Thus, words’ token frequency, marker token frequency, and proportion of occurring with their form class markers did not differ according to mothers’ educational level (Table 4).
However, caution should be taken in interpreting the generalizability of this finding because of the small sample size of mothers (n = 10) in the Beijing corpora.
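The adjusted cutoff of .017 used for the three ANOVAs above is consistent with dividing the conventional .05 level by the number of analyses. The source does not name the correction, so treating it as a Bonferroni-style adjustment is an assumption; the short Python sketch below only shows the arithmetic.

```python
# Assumption: the adjusted cutoff is a Bonferroni-style correction,
# i.e. the conventional alpha divided by the number of ANOVAs.
alpha = 0.05
n_anovas = 3  # word token frequency, marker token frequency, proportion with markers
cutoff = alpha / n_anovas

# Reported p value for the form class effect on the proportion measure.
p_form_class_proportion = 0.028

print(round(cutoff, 3))
print(p_form_class_proportion < cutoff)
```

On this reading, p = .028 falls below .05 but not below the adjusted cutoff, which is why the effect is described as only marginally significant.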
The more- and less-educated Mandarin-speaking mothers’ use of the 90 early-acquired words in CDS according to the Beijing corpora.
Are there other cues to nouns and verbs in CDS?
We divided a word’s token frequency of occurring at the beginning of an utterance, occurring at the end of an utterance, being produced in isolation, and being produced as a reduplicated word by its total token frequency, respectively. Then, four separate independent samples t tests examined whether nouns and verbs differed in these proportions. Three significant results emerged. First, verbs (M = .22, SD = .18) occurred at the beginning of an utterance more often than nouns (M = .07, SD = .06; t(88) = 5.73, p < .001, Cohen’s d = 1.12). Second, nouns (M = .27, SD = .13) occurred at the end of an utterance more often than verbs (M = .13, SD = .10; t(88) = 5.75, p < .001, Cohen’s d = 1.21). Third, nouns (M = .05, SD = .08) were produced in isolation more often than verbs (M = .02, SD = .04; t(88) = 2.53, p = .012, Cohen’s d = .47). Furthermore, verbs (M = .018, SD = .043) were marginally more likely to be reduplicated than nouns (M = .004, SD = .023; t(88) = 1.86, p = .07); however, we suggest caution in interpreting this result because neither verbs nor nouns were reduplicated often (less than 2%) in the Beijing corpora.
Is a word’s usage in CDS and imageability related to its CDI AoA?
Separate bivariate correlational analyses examined the relation between a word’s CDI AoA and its word token frequency, imageability, marker token frequency, and proportions of appearing adjacent to markers, at the beginning of an utterance, at the end of an utterance, in isolation, and as a reduplicated word. Results showed that CDI AoA was negatively correlated with word token frequency for nouns (r(45) = −.52, p < .001) and verbs (r(45) = −.60, p < .001), imageability ratings for nouns (r(45) = −.37, p = .012) and verbs (r(45) = −.42, p = .004), marker token frequency for nouns (r(45) = −.50, p = .001) and verbs (r(45) = −.56, p < .001), and proportions of words appearing with their markers for nouns (r(45) = −.33, p = .025) and verbs (r(45) = −.35, p = .017). These findings suggest that the more frequently words are used and appear with their markers, the earlier the words tend to be acquired by children. Furthermore, for verbs, CDI AoA was marginally, positively correlated with the proportion of words appearing at the beginning of an utterance (r(45) = .29, p = .055). This suggests that although Mandarin allows the omission of subjects, the presentation of verbs at the beginning of an utterance may be related to a later AoA of the verbs. Other than this marginally significant correlation, CDI AoA was unrelated to the proportion of words appearing at the beginning of an utterance (for nouns), or the proportions appearing at the end of an utterance, in isolation, or as a reduplicated word for both nouns and verbs. Given that words with high token frequency also tended to have high marker frequency (noun: r(45) = .92, p < .001; verb: r(45) = .78, p < .001), and that word token frequency was related to CDI AoA, we used words’ proportion of occurring adjacent to form class markers for further analyses.
Does the use of form class markers independently contribute to the variance of a word’s CDI AoA?
Separate hierarchical regression analyses were performed for nouns and verbs, with CDI AoA as the dependent variable and word token frequency entered in step 1, imageability entered in step 2, words’ proportion of appearing adjacent to form class markers entered in step 3, and proportion of appearing at the beginning of an utterance entered in step 4 (for verbs only). In step 1, word token frequency accounted for 25.4% of the CDI AoA variance for nouns and 34.4% of the CDI AoA variance for verbs (p’s < .001). In step 2, word token frequency and imageability together accounted for 32.6% of the CDI AoA variance for nouns and 46.6% for verbs (p’s < .001). In step 3, word token frequency, imageability, and the use of form class markers together accounted for 44.3% of the CDI AoA variance for nouns and 60.6% for verbs (p’s < .001). The use of noun-markers (ΔR² = .12, p = .003) and verb-markers (ΔR² = .14, p < .001) explained a significant increase in the CDI AoA variance, respectively. This finding suggests that the use of form class markers had predictive value beyond word token frequency alone for both nouns and verbs. Independent contributions were further evaluated through the interpretations of squared partial coefficients (pr²) (Tabachnick & Fidell, 2007). Word token frequency, imageability, and use of form class markers uniquely accounted for 19.5%, 11.4%, and 12.5% of the CDI AoA variance for nouns, and 28.8%, 16.4%, and 14.4% of the CDI AoA variance for verbs, respectively. Thus, the findings suggest that the use of form class markers facilitates the learning of nouns and verbs, above and beyond the frequency of the word and its imageability (see Table 5). In step 4, the verbs’ appearance at the beginning of an utterance did not independently contribute to the CDI AoA variance of verbs (p = .72).
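The step-wise increments follow directly from the cumulative variance figures reported for steps 1 to 3. The Python sketch below only performs that subtraction; it does not refit the regressions, and the variable and function names are illustrative.

```python
import math

# Cumulative proportion of CDI AoA variance explained after steps 1-3,
# as reported in the text.
cumulative_r2 = {
    "nouns": [0.254, 0.326, 0.443],
    "verbs": [0.344, 0.466, 0.606],
}
steps = ["word token frequency", "imageability", "form class markers"]


def r2_increments(cumulative):
    """Return the increase in R-squared contributed at each step."""
    increments = []
    previous = 0.0
    for r2 in cumulative:
        increments.append(round(r2 - previous, 3))
        previous = r2
    return increments


for form_class, values in cumulative_r2.items():
    for step, delta in zip(steps, r2_increments(values)):
        print(f"{form_class}: {step}: {delta:.3f}")
```

The step 3 increments, .117 for nouns and .140 for verbs, round to the reported changes of .12 and .14. These increments are not the same quantity as the unique variance estimates derived from the squared partial coefficients.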
Hierarchical multiple regression of words’ token frequency, imageability, and proportion occurring with form class markers in predicting CDI AoA of nouns and verbs.
*** p < .001; ** p < .01; * p < .05.
General discussion
This is a corpus study exploring the applicability of syntactic bootstrapping in distinguishing between nouns and verbs in Mandarin CDS. This study examined whether there are reliable morphosyntactic cues to distinguish between nouns and verbs in Mandarin CDS, and whether the use of these morphosyntactic cues in CDS can predict the age of acquisition of nouns and verbs. In Study 1, we first generated a list of the 200 most frequent words from the CHILDES Beijing corpora, and asked adult Mandarin speakers to identify potential noun- and verb-markers from this list. We then examined the occurrence of the identified form class markers in CDS and found that they tend to appear with words of the predicted form class. Furthermore, verb-markers seem to have stronger predictive strength than noun-markers. Study 2 examined a set of early-acquired nouns and verbs by noting how often they occurred with these form class markers in the CHILDES Beijing corpora. The relation between the occurrence of form class markers and words’ CDI AoA was analyzed. Results showed that the use of form class markers independently contributed to words’ CDI AoA variance. Thus, the current findings align with syntactic bootstrapping theory and suggest that morphosyntactic cues to the noun and verb distinction are available in Mandarin CDS. Syntactic bootstrapping is theoretically available as a tool for Chinese children to learn nouns and verbs.
How do infants identify form class markers in CDS?
Two types of information may be useful to children in identifying form class markers. First, children may identify form class markers according to the input frequency. Since languages tend to have only a limited number of function words, their frequency of occurrence is much higher than that of content words. For example, the 20 most frequent words in English CDS are mostly function words except for proper names (e.g., Mummy, Daddy) and non-referential interjections (e.g., look! – Hochmann, Endress, & Mehler, 2010). The high frequency of function words is also observed in Japanese, Italian, Mandarin, and Turkish (Italian and Japanese – Gervain, Nespor, Mazuka, Horie, & Mehler, 2008; Mandarin and Turkish – Shi, Morgan, & Allopenna, 1998). Based on the findings that 17-month-old children could use word frequency to identify potential function words (Hochmann et al., 2010), and that even 7-month-old infants were found to be sensitive to the frequency cue (Gervain et al., 2008), it is possible that the high token frequency of Mandarin form class markers can also help Mandarin-learning children identify these markers.
Second, infants may identify function words based on phonological properties (Cutler, 1993; Shi et al., 1998; Shi, Werker, & Morgan, 1999). Compared to content words, function words are often shorter in duration, simpler in phonological structure, and unstressed in prosody (e.g., Selkirk, 1996). Infants are highly sensitive to this prosodic difference, as even neonates can categorically discriminate function and content words based on prosodic salience (Shi et al., 1999). Although this early sensitivity does not necessarily mean an early understanding of the meaning of function words, the unique distributional and acoustic properties of function words may guide infants to attend to these items, thus leading to the understanding that function words differ from content words.
Why does the use of form class markers facilitate word acquisition?
There are three possible explanations, which are not mutually exclusive. First, form class markers may facilitate speech segmentation – one of the prerequisites for word learning (e.g., Jusczyk, 1999; Jusczyk & Aslin, 1995). High frequency words may act as ‘anchors’ that establish a reference point for analysis of the neighboring items (Bortfeld, Morgan, Golinkoff, & Rathbun, 2005; Valian & Coulson, 1988) and the identification of phrases (Gerken, Landau, & Remez, 1990; Mintz, 2003; Morgan, Meier, & Newport, 1987). Research has found that function words facilitated infants’ speech segmentation under experimental conditions (e.g., Höhle & Weissenborn, 2003; Shi, Cutler, Werker, & Cruickshank, 2006; Shi & Lepage, 2008). For example, in Shi and Lepage’s study (2008), 8-month-old French-learning infants were first familiarized with a determiner preceding a novel noun (e.g., des preuves) versus a nonsense syllable preceding another noun (e.g., kes sangles). Then, when infants heard the novel nouns in isolation at test, they preferred preuves over sangles, suggesting that the determiner des facilitated the segmentation of the adjacent word. A similar pattern of results was observed in 8- and 11-month-old English-learning infants (Shi et al., 2006). Furthermore, French-learning infants could use functional suffixes to segment lexical word roots at 11 months of age (Marquis & Shi, 2012), and infants could use function words to place the phrasal boundary according to their native language (e.g., function words tend to occur at the beginning in Italian and at the end in Japanese; Gervain et al., 2008). Form class markers may also facilitate word segmentation in Mandarin-learning children.
Second, form class markers may help young children distinguish between function words and content words – an important, initial step for learning nouns and verbs. Compared with content words, function words tend to be shorter, phonologically simpler, and unstressed, and usually have higher token frequencies in child speech input (e.g., Selkirk, 1996). This is also the case in Mandarin CDS (Shi et al., 1998). The acoustic properties and high input frequency may assist Mandarin-learning children in identifying content words in the speech stream even before they have separate semantic categories for nouns and verbs. It should be noted that the analysis of words’ AoA data does not allow us to determine whether children understand the meaning or the form classes of the early-acquired words. Nevertheless, the identification of content words is essential for word acquisition.
Third, Mandarin-learning children may use form class markers to distinguish between nouns and verbs in Mandarin. A recent study revealed emerging categorization of nouns and verbs in Mandarin-learning 12-month-olds based on the use of form class markers (Zhang, Shi, & Li, 2015), although this study did not speak to word learning per se. Furthermore, Mandarin-speaking 5-year-olds successfully interpreted a novel word as either a verb or a noun depending on the form class marker and with the aid of extra-linguistic cues (Imai et al., 2008). In another study, Mandarin-speaking 3- and 5-year-olds could distinguish between ongoing and completed events solely based on the use of aspect markers – zhe or le (Zhou et al., 2014). However, it should be noted that the acquisition of Mandarin classifiers (noun-markers) continues even beyond 6 years of age (e.g., Cheung, Barner, & Li, 2010). Thus, the developmental trajectory of the use of form class markers in Mandarin still needs further examination.
These explanations may not be mutually exclusive. As the Emergentist Coalition Model (ECM) states (Hollich, Hirsh-Pasek, & Golinkoff, 2000), children are surrounded by multiple types of inputs to language acquisition in the form of perceptual, social, and linguistic information. These inputs are differentially weighted over development such that children first rely on perceptual cues, then social cues, and finally linguistic cues in the service of word learning. Thus, these morphosyntactic cues may initially serve as perceptual cues for speech segmentation in infants and then gradually become linguistic cues to the distinction between form classes as children’s knowledge of grammatical categories develops. However, all three explanations are mere conjectures. Future studies should investigate Mandarin-learning infants’ use of form class markers in speech segmentation and older children’s capacity to use these markers to distinguish between novel nouns and verbs under experimental conditions.
Word acquisition is not error-free
Based on the current findings, word acquisition should be exceptionally difficult when words do not occur with appropriate form class markers. Although this study found that the form class markers can indicate the noun and verb distinction in Mandarin CDS, they are not infallible. In addition, among the set of early-acquired words tested, verbs and nouns occur with their corresponding form class markers only 26% and 20% of the time, respectively. Mandarin words therefore often occur in CDS without being accompanied by the correct form class markers. This raises an important question: How do children distinguish between nouns and verbs without form class cues?
There are three additional types of information Mandarin-learning children may use to distinguish between nouns and verbs. First, nouns and verbs have different distributional properties in Mandarin CDS. For example, we found that verbs tend to occur more often at the beginning of an utterance than nouns, perhaps due to the fact that Mandarin allows the omission of subjects; furthermore, nouns tend to appear more often at the end of an utterance and in isolation than verbs. These distributional cues can reliably distinguish between nouns and verbs. However, the finding that their occurrence is not related to words’ AoA suggests caution in interpreting the influence of these distributional cues on Mandarin word acquisition.
Second, there are additional syntactic and semantic bootstrapping tools for the noun and verb distinction. Lee and Naigles (2005), for example, found that, for Mandarin, a postverbal noun phrase (NP) is a reliable cue for a transitive verb, as 83% of the utterances with a postverbal NP occurred with transitive verbs. Thus, the fact that nouns and verbs tend to be neighboring words may make them reliable cues to the existence of the other class. For example, a novel word preceding a familiar noun is likely to be a verb, while a novel word following a familiar verb is likely to be a noun. Children may note these relationships. Furthermore, a novel word following an adjective is likely a noun, while a novel word preceding a prepositional phrase is likely a verb. With increasing word knowledge, noting these relationships may be an increasingly powerful bootstrapping tool facilitating word acquisition.
Third, Mandarin-speaking children may turn to extra-linguistic cues – like social cues – to identify the form class of novel words, as Imai et al. (2008) found. Removing a one-second segment of object-holding from the beginning of each videotaped scene allowed the 5-year-olds in their study to interpret the novel word as a verb, suggesting that Mandarin-learning children are highly sensitive to extra-linguistic cues in distinguishing nouns versus verbs.
Why do Mandarin children appear to have a verb learning advantage?
Based on CDI data, Chinese-learning children’s early vocabularies have a much higher proportion of verbs than English-learning children’s early vocabularies do (Tardif et al., 2008; Tardif et al., 1997). In addition, parental reports on the CDI reveal a considerable difference between the number of verbs learned by Chinese and English children. For example, at 16 months of age, only three of the 100 most frequent words in English are verbs, while a full 27 of the first 100 words for Chinese 16-month-olds are verbs (Tardif et al., 2008). Research has proposed several possible explanations for the Chinese children’s relative verb advantage. First, this may be related to the high frequency of verb use in Mandarin CDS. Indeed, Chinese-speaking caregivers produce both more verb types and tokens than English-speaking caregivers (Tardif et al., 1997), which should be related to the fact that Mandarin is a pro-drop language. Thus, with the omission of subjects and objects acceptable in Mandarin, the same amount of CDS production (in terms of words) should contain more verbs in Mandarin than in English. Second, early-acquired Chinese verbs tend to be highly imageable and refer to highly specific meanings (Ma et al., 2009). For example, while carry/hold in English describes a wide range of carrying/holding behavior, Chinese distinguishes between over 26 different kinds of carrying/holding, each with a distinct verb. High imageability and meaning specificity may be related to a less variable set of exemplars, which may help children find the similarities across different action exemplars and thus facilitate the action category formation process – a prerequisite for verb learning. Third, Mandarin is pragmatically biased towards verbs while English may be biased towards nouns. Tardif et al. (1997) observed that in answering questions, in some contexts where English allows nouns as answers, Chinese requires verbs. For example, to the question, ‘Do you want to drink some more juice?’, an English-learning child can answer more or juice, but their Mandarin-learning counterparts should answer yao4 ‘want’ or he1 ‘drink.’
This study suggests a new explanation for the high learnability of early verbs in Mandarin. The use of noun- and verb-markers may also contribute to the verb learning advantage as we found that (1) the token frequency of verb-markers is higher than that of noun-markers; (2) noun-markers reliably predict the noun form class only when those items occurring at utterance boundaries and with utterance-final fillers were removed and zhei4 was excluded; (3) verb-markers are more likely to occur with words of the predicted form class and less likely to occur with words of the contrasting form class than noun-markers; and (4) the proportion of the early-acquired verbs occurring with their corresponding form class markers is marginally higher than that of the early-acquired nouns. Thus, the reliability of verb-markers may provide Mandarin-speaking children with additional assistance in verb acquisition. As the verb is the center of the sentence, identifying it may assist children in analyzing the meaning of the other words in the sentence.
This study correlated the normative CDI AoA data and the speech production data of the CHILDES corpora. Since the two sets of data were collected from two different groups of caregivers, an important question then arises: Does verb use remain relatively stable across caregivers? Research on language acquisition often assumes that verb use is relatively stable across speakers. Thus, one can estimate a particular verb’s usage across a large corpus of speakers, and examine words’ AoA or syntactic diversity across different children’s speech input samples (e.g., Naigles & Hoff-Ginsberg, 1998). This study revealed a correlation between word token frequency in the CHILDES Beijing corpora and the normative AoA data. In addition, the association between input frequency and word learning has also been shown in research examining parent–child dyads (Hart, 1991; Lieven, 2010; Naigles & Hoff-Ginsberg, 1998; Theakston, Lieven, Pine, & Rowland, 2004). These findings suggest that verb use is relatively stable across caregivers. Nevertheless, future research should examine the stability of marker use across parents in the CHILDES Beijing corpora.
Conclusion
Using the CHILDES Beijing corpora, we found that Mandarin has reliable morphosyntactic markers distinguishing nouns versus verbs in CDS. In addition, the use of the form class markers independently contributed to the AoA variance of early-acquired nouns and verbs. Thus, this study supports syntactic bootstrapping theory in suggesting that there is enough information in how these parts of speech are presented to aid in their acquisition.
Footnotes
Author contributions
Weiyi Ma and Peng Zhou contributed equally to this work.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Weiyi Ma is supported by the University of Arkansas Startup Fund and the Provost’s Collaborative Research Grant. Peng Zhou is supported by Tsinghua University Initiative Scientific Research Program (2016THZWLJ14). Roberta Michnick Golinkoff and Kathy Hirsh-Pasek are funded by joint grants (NSF: SBR9615391; SBR-990-5832; 0642632; NIH: R01-HD15964; RO1HD050199; NICHHD: R01-HD19568; IES: R305A100215).
