Productive use of syntactic categories in typical young French children

Abstract

In this corpus study, it is asked whether young children speaking European French build their early syntax around grammatical or lexical words. Specifically, the study examines the relationship of grammatical and lexical words in three types of syntactic structures (determiner–noun, pronoun–verb and subject pronoun–verb). The corpus included 315 samples from children aged 24–48 months, a period of rapid growth in grammatical morphology and syntax. The results of a series of stepwise multiple regression analyses indicate that prepositions and auxiliaries explain the unique variance in determiner–noun and determiners and prepositions explain the unique variance in pronoun–verb and subject pronoun–verb combinations better than lexical categories. All these strong predictors support the view that grammatical words guide and facilitate syntactic knowledge. Early grammar is based not on a lexicon but on basic grammatical relationships that young children build gradually, making use of the formal distributional properties of their native language.

Keywords

Corpora, French, grammatical words, lexical words, syntactic categories, syntax

Introduction

The issue of whether young children build their early syntax around grammatical or lexical words from their target language is still debated. On the one hand, the construction of syntactic structures has been argued to be too complex and too idiosyncratic to be acquired by infants on the sole basis of the sentences they hear. In this view, syntactic acquisition relies on innate constraints (Fisher, 2002; Gleitman, 1990; Lidz, Waxman, & Freedman, 2003; Naigles, 1990, 2002). The child’s construction of early syntactic structures would be similar in kind to the adult’s and would become functional as he/she learns words to fill the syntactic categories. On the other hand, constructivists argue that children start without syntax. Their first utterances are limited to specific word strings, produced by rote. Infants construct lexical categories such as ‘noun’ and ‘verb’ and learn the specific syntactic computations of their mother tongue by generalizing from these fixed utterances, using their general learning capacities and social skills (Lieven, Behrens, Speares, & Tomasello, 2003; Tomasello, 2000; Tomasello & Abbot-Smith, 2002). This can only happen once a ‘critical mass of exemplars’ has been reached, at around 3 years of age. In French, this difference between the noun and the verb categories has been confirmed by Paradis, Crago, and Genesee (2006) who demonstrated that monolingual and bilingual French-speaking 3-year-olds produced more determiners than object pronouns, despite the fact that the forms of the two functional categories and the position with regard to nouns and verbs are exactly the same (e.g. ‘la porte’ (the door) vs ‘il/elle la porte’ (he/she brings it)). Thordardottir (2005) followed the lexical and syntactic development of Quebec French- and English-speaking children from age 1;9 to 3;11 and reported normative information about the development of grammatical inflections.
She looked at the type of errors produced by children and showed that fewer errors and omissions occurred in samples of French-speaking children than English-speaking children. She also demonstrated that whereas English children tended to develop larger vocabulary sizes, French-speaking children produced longer mean lengths of utterance (MLUs). The use of auxiliary and modal verbs appears later than noun grammaticalization (Bassano, 2000).

The productive use of grammatical words in early syntax has played a central role in this debate. For infants and toddlers, learning syntax involves simple and complex relationships like dependencies between words and their morphological properties as well as non-adjacent dependencies across phrase boundaries. They acquire syntax with a restricted set of utterances extracted from their language input (Ninio, 2011). They make use of the formal distributional properties of their native language and build a generalized knowledge of grammar very early on. Language input not only influences the generalization of construction schemas (Bybee, 2010) but also makes it easier to store the word forms and word sequences (Croft & Cruse, 2004).

Another reason related to this debate is the way that syntactic categories occur in a sentence. Although grammatical categories have traditionally been thought not to be cognitively processed by infants and toddlers, experiments in perception have demonstrated that infants are sensitive to early grammar (Gerken & McIntosh, 1993; Hallé, Durand, & de Boysson-Bardies, 2008; Shi & Gauthier, 2005; Shi, Werker, & Cutler, 2006). Young children have grammatical knowledge of their mother tongue by the end of their first year and already associate articles with nouns before the age of 18 months (Höhle, Weissenborn, Kiefer, Schulz, & Schmitz, 2004; Melançon & Shi, 2015; Shi & Melançon, 2010). Studies also show that they associate pronouns with verbs by the age of 18 months (Cauvet et al., 2014) and can exploit the syntactic context of a new word to infer its syntactic category by the age of 2 years (Bernal, Lidz, Millotte, & Christophe, 2007). Very early on, French children are able to distinguish appropriate grammatical categories – a definite article for nouns, a third person subject pronoun for verbs or even to recognize nonword forms produced in noun or verb syntactic contexts e.g. ‘la coupile va conduire’ (the nonword will drive) (Van Heugten & Shi, 2009) or ‘regarde! elle dase’ (Look! she nonword) (Bernal et al., 2007).

The syntactic structures that specify the relationships within and across grammatical words might be considered the building blocks of the sentence in which lexical words providing the basic content (car, table, dog, put, etc.) are inserted. Nouns frequently have a referential function whereas verbs have a predicative function referring to meanings related to sensory experiences, cognitive events or mental states. In contrast, given their highly predictable distribution, grammatical words might provide the formal architecture or the skeleton of the sentences (e.g. determiners, prepositions and pronouns).

The way in which young children easily succeed in extracting word form classes from the speech they hear in their daily experience and the role that grammatical words play in syntactic development are not fully understood. French is an excellent test case to explore this issue because grammatical words show a certain morphological richness. Bound morphemes are inflected for gender (masculine or feminine) and number (singular or plural). Most of them mark gender, number and person information (e.g. case information for pronouns), resulting in a wide variety of words within these classes. For instance, the four types of determiners and the eight types of pronouns reflect use in different syntactic contexts. The determiners include three demonstratives ‘ce’ (this), ‘cet/cette’ (this), ‘ces’ (those); four generalized determiners ‘quelque’ (any), ‘quelle’ (which), ‘quel’ (what), ‘chaque’ (each); 13 types of possessives ‘sa’ (her), ‘son’ (his), ‘ma’ (my), ‘mon’ (my), ‘leur’ (their), ‘ton/ta’ (your), ‘ses’ (their), ‘mes’ (my), ‘notre’, ‘nos’ (our), ‘votre’, ‘tes’, ‘vos’ (your); and six definite/indefinite articles ‘la, le, les, l’ ’ (the), ‘un, une’ (a). The eight types of pronouns include subject, object, relative, interrogative, reflexive, demonstrative, indefinite and a specific pronoun, ‘y’ (it). Young children speaking French might learn to encode formal grammatical agreements such as gender when they acquire determiner–noun combinations, e.g. ‘la voiture’ (the car) vs ‘le bébé’ (the baby), or might learn to encode case and tense markers when they acquire subject pronoun–verb combinations, e.g. ‘il va prendre sa voiture’ (he is going to take his car) vs ‘elle va prendre sa voiture’ (she is going to take her car). Agreement categories like gender, number and/or person are considered to be very easy for native French learners.
Between 24 and 48 months, most typically developing children speaking French have mastered all the basic morphological inflections, function words and syntactic structures of their native language. Distributional and correlational analyses between syntactic categories produced by children and adults suggest that children’s word combinations are adultlike (Parisse & Le Normand, 2000b, 2001). Young French children are even able to retrieve the meaning of homophonous words, which in adult language are either nouns or verbs, e.g. ‘une ferme/je ferme’ (a farm/I close) (Veneziano & Parisse, 2017).

In this corpus study, we ask whether young children speaking European French build their early syntax around grammatical or lexical words and which type of information guides their use of grammatical words. Specifically, we examine the relationship of grammatical words in three types of syntactic structures (determiner–noun, pronoun–verb and subject pronoun–verb) because these structures are considered to be the building blocks of early syntax. Given that all pronouns, including subject pronouns, tend to precede the verb and are related to morphological features for case, agreement, number and tense, subject pronoun–verb combinations are separated from all other pronoun–verb combinations in our analysis. The main reason is to examine the morphosyntactic strategies for encoding and decoding different types of pronouns in several syntactic contexts. It is expected that pronoun–verb and subject pronoun–verb will encode and decode word order and morphological features similarly.

The corpus included 315 samples from children aged 24–48 months, a period of rapid growth in grammatical morphology and syntax. We hypothesize that children make use of the formal distributional properties of their native language and build a generalized knowledge of grammar very early on. If early grammar is guided by high-frequency words and morphemes, by phonological saliency and by the regularities of the target language, we can predict that determiner–noun, pronoun–verb and subject pronoun–verb will be more strongly associated with grammatical words than with lexical words. If this is the case, then grammatical words and not lexical words should be considered the building blocks of early syntax.

Method

Participants

The participants in this study were 315 typically developing children (146 girls and 169 boys) ranging in age from 2 to 4 years. Participants were recruited from homes and nurseries in the Paris area, France. Inclusion criteria were (1) passing an auditory screening test, (2) scoring in the normal range on an age-appropriate nonverbal cognitive test (Symbolic Play Test; Lowe & Costello, 1976) and (3) being a French native speaker. The participants’ sociocultural level was also assessed using the classification developed by Desrosières, Goy, and Thévenot (1983), taking into account the family income, the father’s occupation and the mother’s level of education.

Data collection, recording, transcription and sampling procedure

Data were obtained using a 20-minute spontaneous speech sampling method. Each child was individually video-recorded with his or her mother and two female experimenters during the same play context: five figurines (two adult-sized, two child-sized and one baby), one dog, 11 pieces of furniture (two tables, four chairs, two armchairs and three beds) and three figurative objects (stairs with a mobile door, a garage with a sliding door, and a front door bell).

Orthographic transcriptions were made in accordance with the CHAT format (MacWhinney, 2000, https://childes.talkbank.org). After tagging all utterances with the French MOR program (Le Normand, Moreno-Torres, Parisse, & Dellatolas, 2013; Le Normand, Parisse, & Cohen, 2008; Parisse & Le Normand, 2000a), the type/token frequency list of syntactic categories was extracted from the corpus. A fra.cut script for KidEval, with output sent directly to spreadsheets, was used to compute the type/token frequencies for the three types of syntactic structures investigated (determiner–noun, pronoun–verb and subject pronoun–verb). To avoid any ambiguities about the definition of a category, the tagging quality of the present corpus was checked by hand and averaged 97%. The coding was checked by a research assistant using the Grevisse manual (Grevisse & Goosse, 2011) and discussed until complete agreement was reached. The trained raters were blind to group membership and the inter-rater agreement on the transcribed material was 95%.
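The distinction between token and type frequency used throughout the Results can be illustrated with a minimal Python sketch. It is not the MOR/KidEval pipeline used in this study; the category labels and the sample words below are hypothetical and serve only to show how tokens (all occurrences) and types (distinct word forms) are counted per syntactic category.

```python
from collections import defaultdict

def type_token_counts(tagged_words):
    """Return {category: (tokens, types)} for (category, word) pairs."""
    tokens = defaultdict(int)   # every occurrence counts as one token
    types = defaultdict(set)    # distinct word forms per category
    for category, word in tagged_words:
        tokens[category] += 1
        types[category].add(word)
    return {cat: (tokens[cat], len(types[cat])) for cat in tokens}

# Hypothetical tagged utterances (illustrative only, not corpus data):
# "la voiture" and "il met la voiture dans le garage"
sample = [
    ("det:art", "la"), ("n", "voiture"),
    ("pro:subj", "il"), ("v", "mettre"), ("det:art", "la"), ("n", "voiture"),
    ("prep", "dans"), ("det:art", "le"), ("n", "garage"),
]
counts = type_token_counts(sample)
```

In this toy sample, article determiners have three tokens but only two types (‘la’, ‘le’), which mirrors the pattern of high token frequency and low type frequency reported for grammatical categories.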

Results

The token frequency of the first 100 words listed in Table 1 revealed that all syntactic categories were used. We note that four grammatical categories were productively used with the highest frequencies: (1) subject pronouns, (2) auxiliaries ETRE (BE), (3) determiner articles and (4) demonstrative pronouns.

Table 1.

Token frequency of the first 100 words from the French corpus.

Rank	French word	English translation	Part of speech	Token frequency
1	il	he	subject pronoun	5676
2	être	to be	auxiliary	4986
3	la	the	article determiner	3639
4	ce	this	demonstrative pronoun	3401
5	là	there	locative adverb	3276
6	le	the	article determiner	2916
7	aller-PRES&3s	gonna	modal	2863
8	pas	not	negation adverb	2259
9	avoir	to have	auxiliary	2242
10	ça	that	demonstrative pronoun	2145
11	je	I	subject pronoun	1443
12	on	it	subject pronoun	1400
13	oh!	oh!	communicator	1286
14	y	it	specific pronoun	1271
15	dans	in	preposition	1259
16	elle	she	subject pronoun	1259
17	et	and	conjunction	1195
18	les	the	article determiner	1179
19	non!	no!	communicator	1159
20	un	a	article determiner	1158
21	de	of	preposition	1158
22	oui!	yes!	communicator	1068
23	à	to	preposition	998
24	voilà	there is	locative adverb	915
25	voiture	car	noun	898
26	tu	you	subject pronoun	880
27	une	a	article determiner	854
28	maman	mommy	noun	850
29	bébé	baby	noun	822
30	ah!	ah!	communicator	769
31	de&les	to the	preposition	737
32	se	himself	reflexive pronoun	722
33	le	the	article determiner	716
34	pour	for	preposition	707
35	où	where	interrogative pronoun	670
36	moi	I	strong pronoun	661
37	maison	house	noun	634
38	lit	bed	noun	590
39	mettre-INF	to put	verb	569
40	garage	garage	noun	548
41	papa	daddy	noun	537
42	fait-PP&m	done	past tense	505
43	être&PRES&3s	he is	copula	501
44	manger-INF	to eat	verb	499
45	ici	here	locative adverb	488
46	porte	door	noun	488
47	chien	dog	noun	486
48	aller&INF	to go	modal	469
49	petit	small	adjective	460
50	encore	more	adverb	457
51	table	table	noun	457
52	parce	because	conjunction	444
53	chaise	chair	noun	423
54	faire-INF	to get	modal	412
55	regarder-IMP&2s	look at!	verb	403
56	tomber-PP&m	fallen	past tense	392
57	bonhomme	people	noun	378
58	hein!	what!	communicator	372
59	dormir-PRES&3s	he/she sleeps	verb	370
60	qui?	who?	interrogative pronoun	364
61	mettre-PRES&3s	he puts	verb	361
62	que	that	relative pronoun	355
63	autre	other	strong pronoun	355
64	falloir&PRES&3s	he/she can	modal	354
65	aller&PRES&3p	they go	modal	346
66	dormir-INF	to sleep	verb	344
67	maintenant	now	adverb	342
68	y’en	there	pronoun	335
69	aller&PRES&1s	I go	modal	328
70	à	at	preposition	326
71	comme ça	like that	adverb	323
72	alors	so	adverb	318
73	ils	they	subject pronoun	317
74	attendre-IMP&2s	wait!	verb	316
75	dedans	into	locative	314
76	de&le	of the	preposition	313
77	fermer-INF	to close	verb	311
78	dodo!	night-night!	communicator	310
79	avec	with	preposition	307
80	poussette	stroller	noun	304
81	avoir&PRES&1s	I have	auxiliary	299
82	le	him	object pronoun	295
83	deux	two	numeral	291
84	lui	to him	indirect pronoun	289
85	aussi	too	adverb	287
86	pouvoir&PRES&3s	he can	modal	287
87	être&PRES&3p	they are	auxiliary	283
88	sa	her	possessive determiner	282
89	après	after	adverb	251
90	avoir&PRES&2s	you have	auxiliary	250
91	ouais!	yes!	communicator	247
92	que	that	adverb	246
93	ça-y-est!	is it!	communicator	244
94	mais	but	adverb	237
95	cheval	horse	noun	236
96	hop!	hop!	communicator	233
97	plus	more	quantifier	229
98	ben!	um!	communicator	227
99	aller&PRES&2p	you go	modal	224
100	escalier	staircase	noun	214

The following CHILDES coding for tense markers was used: -INF = infinitive form, -PP = past tense, -IMP&2s = second person singular imperative, -IMP&3s = third person singular imperative, &PRES&2s = second person singular present, &PRES&3s = third person singular present, &PRES&2p = second person plural present, &PRES&3p = third person plural present.

The description of the token/type frequency by syntactic categories with illustrative examples is presented in Table 2. Grammatical words included modals and auxiliaries, four types of determiners, eight types of pronouns and 30 types of prepositions. Lexical words included nouns, verbs, adjectives and three types of adverbs. Overall, 99,358 word-tokens were used as independent variables in this study.

Table 2.

Token/type frequency by syntactic categories (99,358 tokens/1739 types).

Syntactic categories	Tokens	Types	Examples
Nouns	18,005	995	lit (bed)
Subject-pronouns^b	11,100	9	je (I), tu (you), il (he), elle (she), nous (we), vous (you), ils, elles (they), on (one)
Article-determiners^a	10,462	6	le, la, les, l’ (the), un, une (a)
Finite verbs	9511	267	dort (sleep)
Auxiliaries^c	8510	25	a (has)
Prepositions	6510	32	dans (in)
Modals^c	6414	65	va (goes)
Demonstrative pronouns^b	5928	13	ça (that)
Locative adverbs^d	5524	26	là (there)
Adverbs^d	4873	85	maintenant (now)
Negative adverbs^d	2424	7	ne…pas (not)
Adjectives	1958	141	petit (small)
Strong pronouns^b	1646	15	moi (me), autre, toi, tout le monde, chacun, même, plusieurs …
Specific pronouns^b	1606	2	y’en (some), y’a
Interrogative pronouns^b	1358	7	où? (where?), qui? (who?), quoi? (what?), lequel?, laquelle?, lesquels? (which?)
Possessive determiners^a	964	14	son (his), sa (her), ses (their)
Relative pronouns^b	830	9	que (that), qui (who), quoi (that), où (where), est-ce que, lequel, laquelle, lesquels, lesquelles (what)
Object pronouns^b	828	7	me (me), te (you), le (him), la (her), nous (our), vous (you), les (them)
Reflexive pronouns^b	722	7	me (myself), te (yourself), lui (himself), se (itself), nous (ourselves), vous (yourselves), leur (themselves)
Generalized determiners^a	131	4	quelque (some), quelle, quel, chaque (each)
Demonstrative determiners^a	54	3	ce, cette, ces (this, those)
Total	99,358	1739

^a Determiners, ^b Pronouns, ^c Auxiliaries and modals, ^d Adverbs.

The description of token frequency by type of syntactic structure (determiner–noun, pronoun–verb and subject pronoun–verb) is reported in Table 3. Overall, 35,767 word-tokens were used as dependent variables in this study.

Table 3.

Token frequency by syntactic structure used in this corpus study (35,767 tokens).

Syntactic structures	Token	Examples
Pronoun–verb	16,321	qui roule (that moves)
Determiner–noun	9937	la voiture (the car)
Subject pronoun–verb	9509	elle met (she puts)
Total	35,767

To investigate the contribution of grammatical or lexical words in three types of syntactic structures (determiner–noun, pronoun–verb, subject pronoun–verb), a series of stepwise multiple regression analyses was performed with the R statistical package (http://www.R-project.org).

Predictors of determiner–noun combinations

Two categories of grammatical words (prepositions and auxiliaries) and three categories of lexical words (adjectives, adverbs and verbs) were entered in a first regression analysis as the independent variables to predict determiner–noun combinations. As can be seen in Table 4 and Figure 1, prepositions were found to be the best predictor (r = .89, p < .001) explaining 79% of the variance, F (1, 314) = 1157.3, p < .001. The final step in the regression confirms that prepositions and auxiliaries remain significant in the model even when the three categories of lexical words (adjectives, adverbs and verbs) were included. Adjectives, adverbs and verbs were no longer significant and only reach 83% of the variance. In order to control for the variance shared by the five predictors, the squared semi-partial correlation coefficient (‘partial r²’) for each predictor was computed, indicating that the two categories of grammatical words (prepositions and auxiliaries) explain the unique variance in determiner–noun better than the three lexical categories (partial r² = .212 and .130 respectively).

Table 4.

Stepwise multiple regression analysis predicting determiner–noun combinations.

Predictors	Model	Adjusted R²	β	SE (β)	t	Partial r²
Prepositions	1	.79	.887	.037	34.02***	.887***
Prepositions			.864	.046	26.82***	.699***
Adjectives	2	.79	.040	.147	1.24ns	.032ns
Prepositions			.740	.059	17.73***	.449***
Adjectives			.024	.144	.75ns	.019ns
Adverbs	3	.80	.175	.057	4.47***	.113***
Prepositions			.629	.065	13.71***	.334***
Adjectives			.010	.139	.33ns	.008ns
Adverbs			.086	.060	2.07ns	.050ns
Verbs	4	.81	.232	.052	5.05***	.123***
Prepositions			.472	.073	9.11***	.212***
Adjectives			−.005	.133	−.18ns	−.004ns
Adverbs			.045	.058	1.10ns	.026ns
Verbs			.131	.053	2.77ns	.064ns
Auxiliaries	5	.83	.317	.038	5.61***	.130***

ns p > .05, * p < .05, ** p < .01, *** p < .001.

Figure 1.

Scatter plot between prepositions and determiner–noun (r = .89).
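The logic of the squared semi-partial correlation can be sketched in Python under its standard definition: the unique contribution of a predictor is the drop in R² when that predictor is removed from the full model. This is only an illustration, not the R code used for the analyses; the function names and the per-sample counts below are hypothetical, and the sketch fits all predictors at once rather than stepwise.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)]
           for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def squared_semipartials(predictors, y):
    """Unique variance per predictor: R^2(full) minus R^2(full without it)."""
    full = r_squared(list(predictors.values()), y)
    return {
        name: full - r_squared(
            [col for other, col in predictors.items() if other != name], y)
        for name in predictors
    }

# Hypothetical per-sample token counts (illustrative only, not corpus data).
predictors = {
    "prepositions": [2, 5, 7, 10, 14, 18, 21, 25],
    "adjectives": [3, 1, 4, 2, 6, 3, 5, 4],
}
det_noun = [5, 9, 14, 17, 25, 30, 37, 41]
sr2 = squared_semipartials(predictors, det_noun)
```

In this toy data set the outcome is almost a linear function of the preposition counts, so removing prepositions costs far more R² than removing adjectives. This is the kind of contrast that the ‘partial r²’ column of Table 4 is meant to capture.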

Predictors of pronoun–verb combinations

Two categories of grammatical words (prepositions and determiners) and three categories of lexical words (nouns, adjectives and adverbs) were entered in a second regression analysis as the independent variables to predict pronoun–verb combinations. As can be seen in Table 5 and Figure 2, determiners were the strongest predictor (r = .86, p < .001), explaining 74% of the variance, F (1, 314) = 887.5, p < .001. The final step in the regression shows that determiners and prepositions remain significant, F (5, 314) = 199.7, p < .001, even when nouns, adjectives and adverbs were included in the model. Nouns, adjectives and adverbs were no longer significant and only reach 76% of the variance. The squared semi-partial correlation coefficient for each predictor indicates that the two categories of grammatical words (determiners and prepositions) explain the unique variance in pronoun–verb better than the three lexical categories (partial r² = .166 and .081 respectively).

Table 5.

Stepwise multiple regression analysis predicting pronoun–verb combinations.

Predictors	Model	Adjusted R²	β	SE (β)	t	Partial r²
Determiners	1	.74	.860	.018	29.79***	.860***
Determiners			.594	.053	7.06***	.200***
Nouns	2	.75	.282	.043	3.35***	.095***
Determiners			.616	.053	7.32***	.206***
Nouns			.202	.046	2.23ns	.063ns
Adjectives	3	.75	.089	.130	2.32ns	.065ns
Determiners			.590	.053	7.01***	.196***
Nouns			.142	.047	1.52ns	.043ns
Adjectives			.090	.129	2.37ns	.066ns
Adverbs	4	.75	.108	.048	2.42**	.068**
Determiners			.520	.054	6.01***	.166***
Nouns			.048	.050	.49ns	.014ns
Adjectives			.092	.127	2.45ns	.068ns
Adverbs			.082	.048	1.83ns	.051ns
Prepositions	5	.76	.198	.071	2.92**	.081**

ns p > .05, * p < .05, ** p < .01, *** p < .001.

Figure 2.

Scatter plot between determiners and pronoun–verb (r = .86).
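For a model with a single predictor, the proportion of variance explained is the square of the Pearson correlation, which is how r = .86 corresponds to 74% of the variance here (and r = .89 to 79% for the determiner–noun model). The short Python sketch below shows this relation; the `pearson_r` helper and the per-sample counts are hypothetical and are not taken from the corpus.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Values of r reported in the text; squaring gives the share of variance.
explained = {r: round(r ** 2, 2) for r in (0.89, 0.86)}

# Hypothetical per-sample counts (illustrative only, not corpus data).
determiners = [4, 9, 15, 22, 30, 41]
pronoun_verb = [6, 12, 17, 31, 38, 52]
r = pearson_r(determiners, pronoun_verb)
variance_explained = r ** 2
```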

Predictors of subject pronoun–verb combinations

The same set of grammatical words (prepositions and determiners) and the same set of lexical words (nouns, adjectives, adverbs) were entered in this third regression analysis as the independent variables to predict subject pronoun–verb combinations. As can be seen in Table 6 and Figure 3, determiners were again the strongest predictor (r = .86, p < .001) explaining 73% of the variance, F (1, 314) = 847.5, p < .001. The final step in the regression shows that determiners, prepositions and adverbs remain significant, F (5, 314) = 256.4, p < .001, even when nouns and adjectives were included in the model. Nouns and adjectives were no longer significant and only reach 80% of the variance. The squared semi-partial correlation coefficient for each predictor indicates that the two categories of grammatical words (determiners and prepositions) and one category of lexical word (adverb) explain the variance in subject pronoun–verb (partial r² = .142 for determiners, .136 for prepositions and .194 for adverbs), suggesting a relatively flexible word order for adverbs.

Table 6.

Stepwise multiple regression analysis predicting subject pronoun–verb combinations.

Predictors	Model	Adjusted R²	β	SE (β)	t	Partial r²
Determiners	1	.73	.855	.026	29.11***	.855***
Determiners			.639	.075	7.42***	.216***
Nouns	2	.73	.229	.061	2.66***	.077***
Determiners			.647	.076	7.45***	.217***
Nouns			.201	.066	2.16ns	.063ns
Adjectives	3	.73	.031	.186	.77ns	.023ns
Determiners			.560	.069	7.11***	.186***
Nouns			.001	.061	.01ns	.000ns
Adjectives			.035	.167	.99ns	.026ns
Adverbs	4	.78	.360	.062	8.59**	.225**
Determiners			.443	.068	5.65***	.142***
Nouns			−.157	.062	1.78ns	−.045ns
Adjectives			.038	.160	1.12ns	.028ns
Adverbs			.316	.061	7.73***	.194***
Prepositions	5	.80	.333	.090	5.42***	.136***

ns p > .05, * p < .05, ** p < .01, *** p < .001.

Figure 3.

Scatter plot between determiners and subject pronoun–verb (r = .85).

Discussion

In this corpus study, we asked whether young children speaking European French build their early syntax around grammatical or lexical words. Specifically, we examined the relationship of grammatical words in three types of syntactic structures (determiner–noun, pronoun–verb and subject pronoun–verb). The results indicate two strong independent predictors: prepositions explain 79% of the variance in determiner–noun and determiners explain 74% of the variance in pronoun–verb and 73% of the variance in personal subject pronoun–verb. These findings support the view that grammatical words guide and facilitate syntactic knowledge. Early grammar is not based on a lexicon but on basic grammatical relationships that children build gradually, making use of the formal distributional properties of their native language.

This first finding that prepositions predict determiner–noun provides evidence that prepositions are an important predictor to build grammatical relations in different syntactic contexts. The most frequent prepositions found in our corpus (6510 tokens and 32 types) relating to determiner–noun indicate local and global regularities involved in learning grammatical relationships. At the local level, i.e. in immediate syntactic contexts, the use of prepositions preceding the types of determiners exposes the child to the grammatical relationships of both case and gender such as ‘dans la’ (in the) 498 tokens, ‘dans le’ (in the) 348 tokens, ‘à la’ (to the) 191 tokens, and ‘de la’ (from the) 184 tokens. It is important to note that the feminine gender was more frequently used than the masculine gender in our corpus. This supports the salient property of gender agreements in young French learners. The speech sound ‘a’ is more phonologically salient than the speech sound ‘e’ (Tranel, 1987). These local regularities, in which prepositions belong to the same grammatical category, suggest that this simple indicator is sufficient to build a generalized knowledge of grammatical relationships. At the more global level, i.e. beyond two co-occurrences, the productive use of prepositions reflects more complex grammatical relationships, which are also highly predictable, e.g. ‘la voiture dans le garage’ (the car in the garage).

A second major finding of this corpus study is the role played by determiners (11,611 tokens and 26 types in our corpus) to build a generalized knowledge of grammatical relationships involving both pronoun–verb and subject pronoun–verb. This is not surprising because determiners and pronouns carry low meaning, which makes their grammatical relationships easier to learn, particularly demonstrative pronouns preceding multiple determiners such as ‘c’est le, c’est la, c’est un, c’est une’ (this is the, this is a), relative pronouns with a flexible word order such as ‘où il est? il est où?’ (where is it?) and specific pronouns with the expression ‘y’ preceding multiple determiners such as ‘y’a le, y’a la, y’a un, y’a une’ (it has the, it has a). These three frequent types of pronouns contribute to build multiple grammatical relationships with multiple determiners.

Two features of determiners could be advanced to explain this prediction. First, all determiners are independent words. Second, they are function words that are prosodically constrained very early on and whose distribution is very limited. Studies of French-speaking children suggest an important role for rhythmic preferences. Before the age of 2, French children prefer to insert fillers or ‘proto-articles’ in front of monosyllabic nouns in order to achieve an iambic rhythm of a weak syllable followed by a strong one, which is a dominant rhythm in the input (Bassano et al., 2013; Demuth & Tremblay, 2008; Veneziano & Sinclair, 2000). These ‘proto-articles’ provide important information of syntactic knowledge in the prenominal and preverbal position showing not only the distributional regularities of syntactic categories but also the increasing length in the phonological structure of children’s language (Veneziano, 2017). The evidence that young children learn syntax on the basis of grammatical words such as determiners and prepositions suggests that it is directly related to a constellation of phonological, morphological and syntactic properties of the native language.

Another reason for the strong relationship of determiners to pronoun–verb and subject pronoun–verb combinations is that they are linked to the development of the verbal system: all pronouns, including subject pronouns, encode and decode word order and obligatory markers for case, agreement, number and tense in a similar way. Children mark person correctly and use all pronouns, including subject pronouns, appropriately, especially first, second and third person markers.

These two findings are in line with other studies that emphasize the role of grammatical words and morphemes in syntactic structures. When young children put words together, their processing of syntactic structures is surprisingly robust. This depends on both the distributional regularities of syntactic categories and on the properties of the native language.

For example, in German, as in French, Szagun and Schramm (2018) showed that multiple determiners have morphosyntactic features which make them very easy to learn: they are extremely frequent and have a highly predictable distribution. Determiners in German are highly frequent and restricted in their distribution to placement within the context of a noun. Such distributional regularities are readily learned by children and constitute their early generalized syntactic knowledge. Both the input directed to the child and the child’s ability to process that input are likely to impact the child’s syntactic development (Morgan, Meier, & Newport, 1987; Ninio, 2011; Szagun & Schramm, 2018; Xanthos et al., 2011).

The two results of this corpus study also confirm our previous findings showing that the best predictors of syntactic development measured by global regularities like mean length of utterance (MLU) are grammatical words and not lexical words. Subject pronouns, determiners and prepositions were considered to be the three best predictors explaining more than 50% of the MLU variance and, when taken together, accounting for 73% of the MLU variance, sufficient to determine early syntax (Le Normand et al., 2013).

Although lexical words were productively used in the first 100 words of the corpus, e.g. ‘voiture’ (car) 898 tokens, ‘mettre’ (put) 569 tokens, ‘petit’ (small) 460 tokens, these words did not predict the three types of syntactic structures examined. These results provide evidence that syntax cannot be deduced from the lexical categories of nouns, verbs and adjectives but from grammatical categories. Young children exploit distributional regularities to build different syntactic structures, which in turn provide them with a basic cognitive architecture to master adult syntax. Grammatical words guide and facilitate the distributional regularities of the sentence into which lexical words are inserted. This suggests that young French children generalize the regularities of their native language on the basis of form rather than meaning.

The role of grammatical words found in this corpus study may depend on the particular language being explored, that is, on whether it is an analytic language with limited inflections or a synthetic language with rich inflections, derivations and compounds. It is plausible that Turkish with flexible word order, Cantonese with an extensive set of function words (classifiers, prepositions, particles, suffixes, negations, pronouns and aspect markers), or Russian with rich gender inflections but no determiners might give different patterns of results. Further cross-linguistic studies should be carried out to establish whether our findings generalize as a language-independent phenomenon. If our results are replicated, they should be explained by cross-linguistic accounts of syntactic development. Furthermore, our findings also confirm the value of multiple regression analyses for exploring syntactic development and open up the possibility of carrying out similar analyses on other syntactic structures with denser corpora.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

Bassano (2000). Early development of nouns and verbs in French: Exploring the interface between lexicon and grammar. Journal of Child Language, 27, 521–559.

Bassano, Korecky-Kröll, Maillochon, van Dijk, Laaha, van Geert, Dressler W. U. (2013). Prosody and animacy in the development of noun determiner use: A cross-linguistic approach. First Language, 33, 476–503.

Bernal, Lidz, Millotte, Christophe (2007). Syntax constrains the acquisition of verb meaning. Language Learning and Development, 3, 325–341. doi:10.1080/15475440701542609

Bybee J. L. (2010). Language, usage and cognition. Cambridge, UK: Cambridge University Press.

Cauvet, Limissuri, Millotte, Skoruppa, Cabrol, Christophe (2014). Function words constrain on-line recognition of verbs and nouns in French 18-month-olds. Language Learning and Development, 10, 1–18.

Croft, Cruse (2004). Cognitive linguistics. Cambridge, UK: Cambridge University Press.

Demuth, Tremblay (2008). Prosodically-conditioned variability in children’s production of French determiners. Journal of Child Language, 35, 99–127.

Desrosières, Goy, Thévenot (1983). L’identité sociale dans le travail statistique: la nouvelle nomenclature des professions et catégories socioprofessionnelles [Statistical social identity: the new classification of occupations and socioprofessional categories]. Economie et Statistique, 152, 55–81.

Fisher (2002). The role of abstract syntactic knowledge in language acquisition: A reply to Tomasello (2000). Cognition, 82, 259–278.

Gerken, McIntosh B. J. (1993). Interplay of function morphemes and prosody in early language. Developmental Psychology, 29, 448–457.

Gleitman L. R. (1990). The structural sources of verb meanings. Language Acquisition, 1, 3–55.

Grevisse, Goosse (2011). Le bon usage: grammaire française: 75 ans [Good practice in French grammar: 75th year] (15e édition). Bruxelles: [Paris]: De Boeck; Duculot.

Hallé P. A., Durand, de Boysson-Bardies (2008). Do 11-month-old French infants process articles? Language and Speech, 51, 23–44.

Höhle, Weissenborn, Kiefer, Schulz, Schmitz (2004). Functional elements in infants’ speech processing: The role of determiners in the syntactic categorization of lexical elements. Infancy, 5, 341–353.

Le Normand M. T., Moreno-Torres, Parisse, Dellatolas (2013). How do children acquire early grammar and build multiword utterances? A corpus study of French children aged 2 to 4. Child Development, 84, 647–661.

Le Normand M. T., Parisse, Cohen (2008). Lexical diversity and productivity in French preschoolers: Developmental, gender and sociocultural factors. Clinical Linguistics & Phonetics, 22, 47–58.

Lidz, Waxman, Freedman (2003). What infants know about syntax but couldn’t have learned: Experimental evidence for syntactic structure at 18 months. Cognition, 89, 295–303.

Lieven, Behrens, Speares, Tomasello (2003). Early syntactic creativity: A usage-based approach. Journal of Child Language, 30, 333–370.

Lowe, Costello A. J. (1976). Manual for the symbolic play test (Experimental edition). Windsor, UK: NFER.

MacWhinney (2000). The CHILDES Project: Tools for analyzing talk. Mahwah, NJ: Lawrence Erlbaum.

Melançon, Shi (2015). Representations of abstract grammatical feature agreement in young children. Journal of Child Language, 42, 1379–1393.

Morgan J. L., Meier R. P., Newport E. L. (1987). Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology, 19, 498–550.

Naigles L. R. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374.

Naigles L. R. (2002). Form is easy, meaning is hard: Resolving a paradox in early child language. Cognition, 86, 157–199.

Ninio (2011). Syntactic development, its input and output. Oxford, UK: Oxford University Press.

Paradis, Crago, Genesee (2006). Domain-general versus domain-specific accounts of specific language impairment: Evidence from bilingual children’s acquisition of object pronouns. Language Acquisition, 13, 33–62.

Parisse, Le Normand M. T. (2000a). Automatic disambiguation of morphosyntax in spoken language corpora. Behavior Research Methods, Instruments, & Computers, 32, 468–481.

Parisse, Le Normand M. T. (2000b). How children build their morphosyntax: The case of French. Journal of Child Language, 27, 287–292.

Parisse, Le Normand M. T. (2001). Local and global characteristics in the development of morphosyntax by French children. First Language, 21, 187–203.

Shi, Gauthier (2005). Recognition of function words in 8-month-old French-learning infants. The Journal of the Acoustical Society of America, 117, 2426–2427.

Shi, Melançon (2010). Syntactic categorization in French-learning infants. Infancy, 15, 517–533.

Shi, Werker J. F., Cutler (2006). Recognition and representation of function words in English-learning infants. Infancy, 10, 187–198.

Szagun, Schramm S. A. (2018). Lexically driven or early structure building? Constructing an early grammar in German child language. First Language. Advance online publication. doi:10.1177/0142723718761414

Thordardottir E. T. (2005). Early lexical and syntactic development in Quebec French and English: Implications for cross-linguistic and bilingual assessment. International Journal of Language & Communication Disorders, 40, 243–278.

Tomasello (2000). Do young children have adult syntactic competence? Cognition, 74, 209–253.

Tomasello, Abbot-Smith (2002). A tale of two theories: Response to Fisher. Cognition, 83, 207–214.

Tranel (1987). The sounds of French: An introduction. Cambridge, UK: Cambridge University Press.

Van Heugten, Shi (2009). French-learning toddlers use gender information on determiners during word recognition. Developmental Science, 12, 419–425.

Veneziano (2017). Noun and verb categories in acquisition: Evidence from fillers and inflectional morphology in French-acquiring children. In Vapnarsky, Veneziano (Eds.), Studies in language companion series (Vol. 182, pp. 381–411). Amsterdam, The Netherlands: John Benjamins.

Veneziano, Parisse (2017). Retrieving the meaning of words from noun and verb grammatical contexts: Interindividual variation in a comprehension study of 2- to 4-year-old French-acquiring children. In Hickmann, Veneziano, Jisa (Eds.), Sources of variations in first language acquisition: Languages, contexts, and learner (pp. 81–102). Amsterdam, The Netherlands: John Benjamins.

Veneziano, Sinclair (2000). The changing status of “Filler Syllables” on the way to grammatical morphemes. Journal of Child Language, 27, 461–500.

Xanthos, Laaha, Gillis, Stephany, Aksu-Koç, Christofidou, Dressler W. U. (2011). On the role of morphological richness in the early development of noun and verb inflection. First Language, 31, 461–479.