Abstract
This study explores the use of F0, intensity and duration in the production of two types of prominences in French: primary accent with duration as the main acoustic cue, and secondary accent with F0 and intensity as acoustic cues. These parameters were studied in 13 children using a cochlear implant (CI) and 17 children with normal hearing (NH), aged 5 to 10 years. Words were recorded in two different tasks, word-repetition and picture-naming, to compare repetition of an audio model with spontaneous production. NH children were able to produce both types of prominences with duration on the one hand and the combination of F0 and intensity on the other hand, similar to what is described in the literature in French-speaking adults. NH children have a more stable use of prominences than CI children, who demonstrate more variability across tasks, more even-timed duration patterns and less modulation of F0 and intensity at vowel and word level than their NH peers.
Introduction
The present study explores the ability of school-aged pre- and perilingually deaf children with cochlear implants (CI) and age-matched normally-hearing children (NH) to balance the acoustic cues to produce prominences in French.
Cochlear implants allow pre-, peri- and postlingually deaf children and adults to access audio information that cannot be provided with traditional hearing aids. Gains in speech perception with the implant translate into better oral communication (Niparko et al., 2010). However, even though a cochlear implant (partially) restores perception of environmental and speech sounds, difficulties in perceiving certain acoustic characteristics of speech sounds remain. These may entail difficulties in understanding speech and in producing intelligible speech (e.g. Blamey et al., 2001; Flipsen & Colvard, 2006; Habib, Waltzman, Tajudeen, & Svirsky, 2010; Khwaileh & Flipsen, 2010; Pisoni, 2005; Tye-Murray, Spencer, & Gilbert-Bedia, 1995). The speech production of children with a CI has been extensively studied in the last decades, but this research has some notable limitations. First, many studies have focused on segmental characteristics of speech, while suprasegmental characteristics have received considerably less attention. Second, most research focuses on the short-term gains in production after implantation, while the longer-term outcomes are not yet well understood. For instance, how do children with CI sound after five or more years of device use? Is their speech indistinguishable from that of children with normal hearing? A third limitation concerns the set of languages that have been studied. The literature has overwhelmingly explored prosody in English speakers, and far less in speakers of other languages. To our knowledge, there is, for example, only one study of the production of suprasegmental features in French-speaking CI children, which emphasized the difficulties that CI children face with prosodic organization (intonation and accentuation) at utterance level (Le Normand & Lacheret, 2010).
Here we address these limitations in the current literature by studying French-speaking CI children’s production of accentuation from approximately 2 up to 9 years postimplantation.
Accentuation in French
In contrast to most other Romance languages, French does not have lexical accent or word stress. Instead a distinction is made between primary and secondary accent. Primary accent is described in the literature as a ‘phrasal accent’, located on a phrase’s last syllable (e.g. Delattre, 1938, 1963; Di Cristo, 1999, 2000, 2011). It is realized as phrase-final syllable lengthening (e.g. Delattre, 1963; Di Cristo, 2011; Duběda & Keller, 2005; Lacheret-Dujour & Beaugendre, 1999; Vaissière, 1983). Duration is therefore considered as the main cue to primary accent, but further studies have shown that primary accent can also be marked with intensity in French (Duběda & Keller, 2005). F0 is also cited as an optional cue to primary accentuation: an optional F0 rise can also (although not systematically) occur on accented, lengthened syllables (Lacheret-Dujour & Beaugendre, 1999).
In addition to this primary, phrase-final accent which has a non-emphatic rhythmic, demarcative function, French also has an optional secondary, phrase-initial accent, which can have an emphatic function (Di Cristo, 1999, 2000, 2011; Lacheret-Dujour & Beaugendre, 1999). This secondary, phrase-initial accent is realized with an F0 rise and increase in intensity (Di Cristo, 2011; Duběda & Keller, 2005).
Emergence of accentuation patterns in young French-speaking children
Several studies have explored the emergence of prosodic abilities of typically developing French-speaking children. The prosodic characteristics of the ambient language emerge at a relatively young age. For instance, in a cross-linguistic study with Japanese and French 18-month-old infants, Hallé, De Boysson-Bardies, and Vihman (1991) found adult-like prosodic patterns. They found lengthening of the word-final syllable frequently combined with an F0 rise in disyllabic words for French-reared infants, but a different pattern for Japanese-reared infants (i.e. no lengthening and an F0 falling contour). In an acoustic study of the early production of final lengthening in 14- to 24-month-old infants, Konopczynski (1990) established an initial isochrony between non-final and final syllables in infants, which evolves into a lengthening contrast in the non-final versus the final syllable. This pattern appears to have stabilized at 24 months. The cross-linguistic study by Vihman, Nakai, and DePaolis (2006) with French-, Welsh- and US-English-speaking children draws similar conclusions for French-speaking infants. They show signs of final lengthening at the four-word stage (in contrast to US-English- and Welsh-speaking infants), and at the 25-word stage they have progressed towards adult-like, language-specific patterns of final lengthening, with less variability than the other two groups of children.
Perception of prosody in children with CI
CI users experience difficulties in perceiving fine-grained acoustic characteristics of prosody: the degraded signal provided by the implant does not transmit all acoustic details of F0 (Green, Faulkner, & Rosen, 2004; Moore, 2003; O’Halpin, 2010) and intensity (Drennan & Rubinstein, 2008; Meister, Landwehr, Pyschny, Wagner, & Walger, 2011; Moore, 2003) but durational properties of speech are well transmitted (Meister et al., 2011; O’Halpin, 2010). Exploring the production of two types of prominences in French thus appears relevant as they may reflect different levels of difficulty for CI users in perceiving durational properties of speech or the combination of F0 and intensity.
Studies of the perception of prosodic features in CI children have explored their ability to identify intonation contours used in questions or statements or to convey attitudes (e.g. Klieve & Jeanes, 2001; Most & Peled, 2007; Nakata, Trehub, & Kanda, 2012; Peng, Tomblin, & Turner, 2008), or to identify tones (Peng, Tomblin, Cheung, Lin, & Wang, 2004), word stress and intonation patterns (Segal, Houston, & Kishon-Rabin, 2015; Torppa et al., 2014). These studies show that CI children have persistent difficulties in perceiving fine-grained acoustic cues to prosody (F0, duration and intensity) at syllable, word or utterance levels.
Production of prosodic features in children with CI
As a consequence of the hampered perception of prosodic features in CI children, the production of these features is also shown to be challenging compared to NH peers. In an acoustic study of disyllabic babble and first words in samples of spontaneous speech production by children acquiring Dutch, Pettinato, De Clerck, Verhoeven, and Gillis (2017) analysed vowel F0, intensity and duration from the onset of babbling until children reached a cumulative vocabulary of 200 words. They showed that the trochaic pattern (marked by word-initial F0 and intensity peaks), which is the predominant stress pattern in the ambient language, emerged later in CI children than in NH children. CI children also demonstrated less prominent lexical stress. Furthermore, in a study using samples of spontaneous parent-child interactions, Hide, Gillis, and Govaerts (2007) showed a tendency for CI infants aged 9 to 20 months to produce prosodic prominence with less pitch variation (i.e. a smaller F0 range in vowels) in babbled disyllables than age-matched NH children, which might also be a consequence of the reduced perception of F0 and intensity cues with a cochlear implant.
Studies of older CI children’s prosody show that even after several years of implant experience, English-speaking CI children have difficulties with appropriate emphatic, word, phrasal and sentence stress marking. For instance, in Lenden and Flipsen (2007) subjective ratings on suprasegmental features of speech using the Prosody-Voice Screening Profile (Shriberg, Kwiatkowski, & Rasmussen, 1990) were studied. CI children showed lower prosodic accuracy. And in Carter, Dillon, and Pisoni (2002) lower accuracy was found in CI children’s non-word repetitions with regard to syllable number and stress location. CI children also appear to struggle with the production of lexical tones in Mandarin. Lexical tone is mastered with a delay in CI children as compared to NH peers (Lee, van Hasselt, & Tong, 2010). Lee et al. (2010) also show that implantation before the age of 4 is a predictor of higher accuracy in lexical tone production. At sentence level, two studies showed that English-speaking children with CI experienced difficulties in using prosody to produce distinctive sentence modalities: Chin et al.’s (2012) study using a sentence-repetition task showed that CI children (aged 6 to 10 years with 3 to 9 years of CI use) produce declarative modalities more accurately than interrogative modalities, and Peng et al.’s (2008) study, based on interactions prompting the two different sentence modalities, showed that CI children (aged 7 to 21 with 5 to 17 years of CI use) tend to produce less distinctive declarative and interrogative utterances and more inappropriate intonation contours when compared to NH children. Also the imitation of attitudes such as disappointment or surprise was found more challenging for 5- to 13-year-old Japanese-speaking CI children than for their NH peers (Nakata et al., 2012). 
Finally, Le Normand and Lacheret (2010) showed that CI children experience difficulties in rhythmic organization of their utterances and that late implantation after age 2;6 is a predictor of greater difficulty than earlier implantation. This longitudinal study (from age 2;6-7;2 until 9 years post-implantation) is the only study exploring prosodic features in French-speaking CI children, and it does not provide comparison to a NH control group.
The characteristics of French prosody make it possible to examine duration on the one hand and intensity and F0 on the other separately, since the three acoustic cues to prominence do not operate at the same level in French: duration is used for rhythmic organization (i.e. phrase-final lengthening signalling a phrase boundary) while intensity and F0 are used for emphatic accents (i.e. F0 and intensity peaks highlighting pragmatic or semantic information). Furthermore, since durational cues are better perceived than intensity and frequency cues with a cochlear implant, the question is whether the use of these cues by CI children reflects both the difference between the two types of prominences (duration vs the combination of F0 and intensity) and the difference in how well each cue is perceived with an implant. We therefore expect that CI children will have less difficulty producing primary prominences involving duration than secondary prominences involving both intensity and F0.
The ability of CI children to produce similar prominences to NH children has mostly been studied at sentence level and is considered to be an ability to convey appropriate semantic or pragmatic information (i.e. sentence modality, contrastive lexical stress, narrow focus, topic); in most studies of prosody, prominences are almost exclusively studied as correlates of syntactic, semantic or prosodic functions at sentence level, but not as the actual ability of CI children to combine pitch, rhythm and intensity at word level, when the accent does not have a lexical function in the language (i.e. French).
On the methodological side, studies of prosody in CI children mostly rely on subjective listeners’ assessments of words or sentences, but less so on objective measurements of the acoustic characteristics of speech (only Pettinato et al., 2017, and Hide et al., 2007 used acoustic measures of F0, intensity or duration to assess the accuracy of word stress in Dutch-speaking infants and young children). Our project is an acoustic study of the ability of children with CI to combine F0, intensity and duration in minimal phrasal units (i.e. words) to produce non-lexical prominences.
Aims of the study
First, characterizing prominences in CI vs NH children and the effects of chronological age on each acoustic cue will help us understand how later access to oral communication influences the way phonological representations of prominences are shaped during language acquisition, and whether all acoustic cues to prominence are mastered at the same time.
Second, exploring the effects of age at implantation will help us assess to what extent the critical period in language acquisition specifically delays or hinders the acquisition of prominences in children.
Finally, comparing the children’s production of words in repetition and picture-naming tasks will help us understand how the representations of accents attached to these words are resistant to outside influences and thus how stable they become in the course of language acquisition. On the one hand, productions recorded in a picture-naming task reflect the children’s underlying phonological representations (Stackhouse & Wells, 1993, p. 343) of certain features of speech, in this case the two types of accents in French, and on the other hand, a repetition task provides indications regarding the children’s ability to produce the target features of speech sounds accurately (Stackhouse & Wells, 1993, p. 342) with the accurate accents. More broadly, the use of the two tasks allows us to study how CI and NH children use internal feedback (i.e. how they monitor their own production of speech features) when producing the two types of prominences (Levelt, Roelofs, & Meyer, 1999).
Method
Participants
The data for this study are part of a larger project that explores the production of speech in school-aged CI children in comparison with NH children matched on chronological age (Grandon, 2016).
The participants for the present study are 30 children: 13 pre- and perilingually deaf children with cochlear implants (six girls and seven boys) and 17 normally-hearing children (nine girls and eight boys). All children are monolingual, native speakers of French who have been living in the Lyon-Grenoble area in France for several years prior to the recordings.
The French spoken in this area is a standard variety of French, very similar to standard Parisian French, both at the segmental and suprasegmental levels. All NH children were screened for hearing and language impairment. Each child gave oral consent and the parents gave written consent. The study was approved by the local ethical committee (CERNI N° 2014-11-18-54).
The chronological age of the CI children (n = 13) ranges from 6;6 to 10;7 (mean: 8;2, SD: 1;3 years) and the chronological age of the NH children (n = 17) ranges from 5;7 to 10;7 (mean: 7;7, SD: 1;4). A Welch t-test shows no significant between-group difference (CI vs NH) for chronological age: t(27.247) = −0.9894, p = 0.3312.
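As an illustration of the kind of test reported here, the following minimal sketch computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom, which are typically fractional, as in the value reported above. The function name `welch_t` and the ages in months are hypothetical and are not the study's data; the p-value, which requires the t distribution, is left out.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical chronological ages in months (NOT the study's data).
nh_ages = [67, 75, 82, 90, 96, 104, 127]
ci_ages = [78, 85, 93, 99, 110, 127]
t_value, df_value = welch_t(nh_ages, ci_ages)
print(f"t({df_value:.3f}) = {t_value:.4f}")
```

Unlike Student's t-test, this variant does not assume equal variances in the two groups, which is a reasonable default when group sizes differ.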
All CI children had severe-to-profound hearing loss before implantation. The age at diagnosis of deafness ranges from 0;7 to 3;4 (mean: 1;6, SD: 0;11). The age at implantation ranges from 1;6 to 6;6 (mean: 3;2, SD: 0;11) and the duration of implant experience ranges from 2;2 to 9;1 (mean: 5;3, SD: 2;3).
Procedure
Tasks
The children were recorded in two different tasks: a word-repetition task followed by a picture-naming task. In the word-repetition task, the children were asked to repeat adult audio models of disyllabic words produced in isolation and presented with a matching picture. In the subsequent picture-naming task, the pictures used in the word-repetition task were presented again and the children were asked to name each picture. This time there was no adult audio model. Each child was recorded twice in each task (two word-repetitions followed by two picture-naming productions after a small pause), except for one CI child, who was recorded only once in each task (all children could stop the recordings at their convenience). We recorded all target sounds in stressed word-initial CV sequences, to obtain both the rhythmic pattern of French (i.e. iambic pattern) marked by duration and an emphatic-like prominence marked by both F0 and intensity. The children were not explicitly instructed to stress the first syllable, but to repeat the word or to name the word associated with the picture. In the picture-naming task, they produced this prominence without any audio model, which means that they relied on their own representation of these primary and secondary accents.
Corpus/stimuli
The data for this study are a list of 16 disyllabic words: 14 words with a CVCV structure and 2 words with a CVCVC structure. A full list of the words is given in the Appendix. The consonants in word-initial and medial positions were either plosives /p, t, k, b, d, g/ or fricatives /f, s, ʃ/ and the consonant in word-final position was /ʁ/. The vowels were /i, e, u, o, a, ɔ̃, ɛ̃/. All words were names of objects, animals, etc. known by young children from 5 to 10 years.
Recordings
The recordings took place in quiet rooms, using a digital Marantz PMD-670 recorder (mono, sampling frequency 44 100 Hz, 16 bits), and an external AKG-C1000S microphone placed on a tripod, approximately 40 cm from the children’s mouths. Pictures and audio models of words were presented on a laptop, through the laptop’s loudspeakers, facing the children.
Analyses
Data selection
For this study, the maximum number of words was 32 per task and per child (two productions of 16 words in each task). All words which were not produced or could not be understood as the target words were excluded. In total, 757 words were included for CI children (mean 58.2 words per child) and 1000 words for NH children (mean 58.8 per child).
Data segmentation and annotation
Words were manually segmented and annotated in PRAAT (Boersma & Weenink, 2015). On a point tier the onsets and ends of V1 and V2 were marked. Vowel onsets were identified as the first peak following a stop burst or a friction noise and vowel ends as the last peak before F0, F2 and intensity stop or before a friction noise appeared (when followed by a fricative). All annotations were done by the first author. In order to validate these annotations, approximately 10% of the corpus (192 words) was reannotated by the second author. Times at V1 and V2 onsets and ends were then extracted for this subset of the corpus for both annotators and inter-annotator agreement of boundary marking was calculated by means of a Pearson’s correlation between time points of both annotators: a 0.99 (p < 0.001) correlation between annotations was found, which allows us to validate the marking of vowel boundaries in the entire corpus.
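The agreement measure described above can be sketched as follows. This is a minimal illustration with hypothetical helper names (`pearson_r`, `mean_abs_diff_ms`) and made-up boundary times, not the authors' script. The mean absolute difference is an addition of this sketch, not a measure reported in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def mean_abs_diff_ms(x, y):
    """Mean absolute difference between paired time points, in ms."""
    return 1000 * sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical boundary times in seconds (V1 onset, V1 end, V2 onset,
# V2 end) for the same word, as marked by two annotators.
annotator_1 = [0.112, 0.195, 0.301, 0.462]
annotator_2 = [0.115, 0.191, 0.305, 0.470]
print(pearson_r(annotator_1, annotator_2))
print(mean_abs_diff_ms(annotator_1, annotator_2))
```

A correlation across time points stays high even if one annotator places every boundary slightly later than the other, so a difference in milliseconds is a useful complement when checking boundary placement.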
Acoustical analyses
The three acoustic parameters, viz. vowel duration, intensity peak and F0, were automatically extracted by a PRAAT script. Duration was calculated as the difference between the vowel onset time and the vowel end time. Intensity was measured as the maximum intensity in dB for each vowel (settings for intensity measurements: range from 0 to 100 dB). F0 was measured in Hz with the autocorrelation method (settings for F0 measurements: 100 to 500 Hz): for each vowel, the mean F0 was measured, as well as the minimum and maximum F0, in order to study the amplitude of F0 modulation over each vowel.
In order to characterize the realization of the primary and secondary accents, we compared the duration, intensity, F0 and F0 range of the vowels of the first and second syllables. In order to control for inter-word and inter-subject variability, the data were normalized: instead of raw measurements of duration, intensity and F0, ratios and distances were computed, to compare their use in V1 and V2. For duration, the V1:V2 duration ratio was computed for each word. A ratio higher than 1 corresponds to a longer V1 and a ratio lower than 1 to a longer V2. For intensity, the V1:V2 ratio was calculated. A ratio higher than 1 corresponds to a more intense V1 and a ratio lower than 1 to a more intense V2. For F0 the distance in semi-tones between the two vowels was computed using the formula in equation (1):
$$ d_{\mathrm{ST}} = 12 \log_2\left(\frac{F0_{V2}}{F0_{V1}}\right) \qquad (1) $$
where $F0_{V1}$ and $F0_{V2}$ represent the mean F0 values of the first and the second vowel.
A negative distance corresponds to a lower V2 (i.e., a falling contour) and a positive distance to a higher V2 (i.e., a rising contour).
The amplitude of the F0 modulation of each vowel was computed as the F0 range in semi-tones using the formula in equation (2):
$$ r_{\mathrm{ST}} = 12 \log_2\left(\frac{F0_{max}}{F0_{min}}\right) \qquad (2) $$
where $F0_{max}$ and $F0_{min}$ represent the maximum and minimum F0 values of the vowel.
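The normalized measures described in this section can be sketched as below. The sketch assumes the standard semitone conversion, 12 times the base-2 logarithm of a frequency ratio, which yields the sign convention stated above (negative when V2 is lower than V1). The function names, dictionary keys and example values are hypothetical.

```python
import math


def semitone_distance(f0_from_hz, f0_to_hz):
    """Signed distance in semitones from one F0 value to another."""
    return 12 * math.log2(f0_to_hz / f0_from_hz)


def word_measures(v1, v2):
    """Compute the five per-word measures from raw vowel measurements.

    v1 and v2 are dicts with the (hypothetical) keys duration_s,
    intensity_db, f0_mean_hz, f0_min_hz and f0_max_hz.
    """
    return {
        # Ratio > 1: longer V1; ratio < 1: longer V2.
        "duration_ratio": v1["duration_s"] / v2["duration_s"],
        # Ratio > 1: more intense V1; ratio < 1: more intense V2.
        "intensity_ratio": v1["intensity_db"] / v2["intensity_db"],
        # Negative: V2 lower than V1 (falling); positive: rising.
        "f0_distance_st": semitone_distance(v1["f0_mean_hz"], v2["f0_mean_hz"]),
        # Amplitude of F0 modulation within each vowel.
        "v1_f0_range_st": semitone_distance(v1["f0_min_hz"], v1["f0_max_hz"]),
        "v2_f0_range_st": semitone_distance(v2["f0_min_hz"], v2["f0_max_hz"]),
    }


# Made-up measurements for one disyllabic word (NOT the study's data).
v1 = {"duration_s": 0.08, "intensity_db": 72.0,
      "f0_mean_hz": 280.0, "f0_min_hz": 260.0, "f0_max_hz": 300.0}
v2 = {"duration_s": 0.16, "intensity_db": 70.0,
      "f0_mean_hz": 250.0, "f0_min_hz": 240.0, "f0_max_hz": 262.0}
measures = word_measures(v1, v2)
print(measures)
```

Because each measure relates V1 to V2 within the same word, differences in speaking rate, loudness or pitch register between children and between words largely cancel out.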
For each measure of duration, intensity and F0, outliers were excluded by means of the Interquartile Rule (IQR), which sets minimum and maximum thresholds in the distribution of each measure to identify and exclude outliers. Applying the IQR resulted in 1724 words for intensity analyses (1.65% outliers were excluded), 1636 words for duration analyses (3.82% outliers were excluded), 1709 words for F0 distance analyses (2.51% outliers were excluded), 1722 words for F0 range analyses on V1 (1.77% outliers were excluded) and 1644 words for F0 range analyses on V2 (6.22% outliers were excluded). For each measure, means and standard deviation for each group in each task are provided in Table 1.
Table 1. Mean, standard deviation and number of data points for each measure (duration ratio, intensity ratio, F0 distance, V1 F0 range, V2 F0 range) per group and task.
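The interquartile rule used for outlier exclusion can be sketched as follows. The text does not specify the fence multiplier or the quartile method, so the common value of 1.5 and Python's "inclusive" quartile method are assumptions of this sketch. The function names and the example values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) thresholds of the interquartile rule.

    k = 1.5 is the conventional multiplier, assumed here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def exclude_outliers(values, k=1.5):
    """Keep only the values that lie within the IQR thresholds."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Made-up duration ratios (NOT the study's data); 2.90 is an outlier.
ratios = [0.41, 0.48, 0.52, 0.55, 0.60, 0.63, 0.70, 2.90]
kept = exclude_outliers(ratios)
print(len(ratios) - len(kept), "value(s) excluded")
```

Applying the rule separately to each measure, as described above, explains why the number of retained words differs between the duration, intensity and F0 analyses.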
Statistical analyses
All statistical analyses were run in R (R Development Core Team, 2012). Linear mixed-effect models were fitted, which make it possible to take into account factors with fixed effects and random effects. Variables of interest are: V1:V2 intensity ratio, V1:V2 duration ratio, V2–V1 F0 distance (in semi-tones) and V1 and V2 F0 ranges (in semi-tones). Fixed-effect factors are: group (CI vs NH), task (word-repetition vs picture-naming tasks), chronological age (for both groups) and hearing age and age at implantation (for CI children). Random-effect factors are: children and words. For each variable of interest, a first model was built with all individual factors and their two-way interactions, using the lme function (nlme package in R). Subsequently the relevant factors were selected using the stepAIC function (MASS package), which allows the selection of the model with the best fit. The best-fitting model reveals which factors might have a significant effect on the variable of interest. When an interaction was significant, multiple comparison tests were run with the lsmeans and multcomp packages in R.
Results
All figures in this section present the variables of interest (V1:V2 intensity ratio, V1:V2 duration ratio, V2–V1 F0 distance and V1 and V2 F0 ranges) for each group of children (CI and NH) with the values of the adult audio model (MODEL), in both tasks (word-repetition, picture-naming tasks). They also present the variables as a function of chronological age, for both groups of children.
Duration
Figure 1 and Table 1 indicate that for both groups, the duration ratio is lower than 1: the words are produced with a longer V2 than V1, which is consistent with the pattern of the adult audio model and with the description of primary accent in French, realized as a lengthening of the phrase-final syllable.

Figure 1. Effects of (A) hearing status (CI vs NH) and task (word-repetition vs picture-naming tasks) and of (B) chronological age on duration.
For the analyses of duration, the best-fitting model includes group, task, chronological age, the interaction between group and task and the interaction between group and chronological age. The intercept for the model is 0.412 (SE = 0.163, t = 2.534, p < 0.05). There is a significant effect of task (estimate E = 0.084, SE = 0.013, t = 6.574, p < 0.001) on the duration ratio, and an interaction between task and group (E = −0.086, SE = 0.017, t = −5.088, p < 0.001). Post-hoc tests of this interaction show a significant between-task difference for CI children (E = −0.084, SE = 0.013, z = −6.586, p < 0.001) but not for NH children (E = 0.002, SE = 0.011, z = 0.205, p = 0.974). This means that, for the CI children, the duration ratio is closer to 1 in the picture-naming task, i.e. phrase-final lengthening is more marked in the word-repetition task, whereas lengthening is similar in both tasks for the NH children. The fixed effect of group does not reach significance (E = 0.391, SE = 0.201, t = 1.94, p = 0.0629), meaning that both groups of children have a similar lengthening of the words’ second vowel. There is no effect of chronological age on the duration ratio, either alone (E = 0.002, SE = 0.002, t = 1.254, p = 0.2209) or in interaction with group (E = −0.004, SE = 0.002, t = −1.710, p = 0.0991).
The effects of age at implantation and hearing age on the duration ratio are analysed for the CI children, using a similar statistical method as for the first analyses with both groups. Chronological age is not included in this model, as it is overlapping with hearing age and age at implantation. For the analyses of duration in CI children, the best-fitting model includes task, hearing age, age at implantation and the interactions of task and hearing age, and task and age at implantation. The results indicate no effect of age at implantation alone (E = 0.0009, SE = 0.0017, t = 0.5580, p = 0.5891), or in interaction with the task (E = 0.0016, SE = 0.0009, t = 1.6232, p = 0.1051), and no effect of hearing age alone (E = 0.0022, SE = 0.0013, t = 1.7263, p = 0.1150) or in interaction with the task (E = −0.0012, SE = 0.0008, t = −1.6237, p = 0.1050).
Intensity
Figure 2 and Table 1 indicate that the mean intensity ratio is slightly above 1 for both groups in each task: V1 is produced with more intensity than V2. For the statistical analysis of intensity, the best-fitting model includes group, task, chronological age, and interactions between group and chronological age and between task and chronological age.

Figure 2. Effects of (A) hearing status (CI vs NH) and task (word-repetition vs picture-naming tasks) and of (B) chronological age on intensity.
The intercept of the model is 0.9652 (SE = 0.03337, t = 28.9278, p < 0.001). An effect of group (E = 0.0904, SE = 0.0404, t = 2.2400, p < 0.05) is found (i.e. the intensity accent is stronger for NH children when compared to CI children), as well as an effect of task (E = 0.06198, SE = 0.01387, t = 4.4712, p < 0.001): the ratio is higher when the audio model is provided. Further results of the mixed-effect model built with all the data indicate no effect of chronological age alone (E = 0.0007, SE = 0.0003, t = 1.9490, p = 0.0622) or in interaction with the group (E = −0.0005, SE = 0.0004, t = −1.9334, p = 0.0641), but an effect in interaction with task (E = −0.0005, SE = 0.0001, t = −3.4829, p < 0.001) on the intensity ratio.
To further explore the chronological age and task interaction, we built a separate model for each task. Figure 2 shows a slight decrease of the intensity ratio with chronological age in the word-repetition task: however, the corresponding statistical test shows that this effect is just above the threshold of significance (E = −0.0004, SE = 0.0002, p = 0.0772). There is no effect of chronological age on the intensity ratio in the picture-naming task (E = 0.00008, SE = 0.0002, t = −1.8347, p = 0.7293).
For the analyses of intensity in CI children, the best-fitting model includes task, hearing age and the task and hearing age interaction. Age at implantation is not selected as a variable that significantly contributes to a better fit of the model, thus having no effect on the realization of intensity in CI children. In addition, no significant effects of hearing age alone (E = 0.0003, SE = 0.0002, t = 1.1924, p = 0.2582) or in interaction with task (E = −0.0002, SE = 0.0001, t = −1.4611, p = 0.1446) are found.
F0 distance
Figure 3 and Table 1 indicate a negative F0 distance, which corresponds to the expected falling F0 pattern described in the literature about emphatic stress in French: stressed V1 is realized with a higher F0 than unstressed V2.

Figure 3. Effects of (A) hearing status (CI vs NH) and task (word-repetition vs picture-naming tasks) and of (B) chronological age on the F0 distance.
For the analyses of F0 distance, the best-fitting model includes group, task, chronological age, and the interactions of group and task, and of task and chronological age. The intercept of this model is 0.0315 (SE = 1.0760, t = −0.02930, p = 0.9766). The effect of group does not reach significance, although it is on the verge of it (E = −0.6978, SE = 0.3481, t = −2.005, p = 0.0551): larger groups of children and more data would help to explore this tendency and confirm this potential between-group difference. A significant effect of task alone is found on the F0 distance (E = −3.6935, SE = 0.4723, t = −7.8210, p < 0.001), but the interaction of task with group is not significant (E = 0.2740, SE = 0.1537, t = 1.7823, p = 0.0749): for both groups, the F0 difference is larger (in absolute value) when an audio model of the word is provided, indicating that V1 is highlighted over V2 when children try to replicate the adult’s stress pattern.
There is no effect of chronological age alone (E = −0.0103, SE = 0.0106, t = −0.9665, p = 0.3423), but its interaction with task has a significant effect on the F0 distance (E = 0.0272, SE = 0.0046, t = 5.8737, p < 0.001): for the NH children in the word-repetition task, the F0 values of V1 and V2 move closer together as the children grow older, which is similar to what was observed for intensity.
For the CI children, the best-fitting model included task, age at implantation, hearing age, and the interactions between task and age at implantation, and task and hearing age. Effects of age at implantation (E = −0.0354, SE = 0.0153, t = −2.3136, p < 0.05) and task (E = −2.6888, SE = 0.5762, t = −4.6664, p < 0.001) but not of hearing age (E = −0.0149, SE = 0.0120, t = −1.2437, p = 0.2420) are found on the F0 distance. Interactions between task and age at implantation (E = 0.0194, SE = 0.0070, t = 2.7600, p < 0.01) and between task and hearing age (E = 0.0150, SE = 0.0054, t = 2.7623, p < 0.01) also have significant effects on the F0 distance. The overall significant effect of age at implantation corresponds to a smaller F0 distance in absolute value in earlier-implanted children, who use an F0 pattern similar to that of the NH children. Further statistical analyses of the age and task interactions do not, however, show how age at implantation and hearing age influence the use of F0 in each task: including more children in this study could help us better understand these task and age interactions.
F0 amplitude (V1)
Figure 4 and Table 1 show that the F0 range for V1 is lower in CI children when compared to NH children, corresponding to a narrower amplitude of F0 modulation in V1. However, the F0 range for V1 does not seem to differ, whether an audio model is provided or not.

Figure 4. Effects of (A) hearing status (CI vs NH) and task (word-repetition vs picture-naming tasks) and of (B) chronological age on the V1 F0 range.
For the analyses of the F0 range over V1, the best-fitting model included group, chronological age and the group by chronological age interaction. Task was not retained in the model; we can therefore conclude that task has no significant effect on the F0 range for V1. The intercept for this model is 1.2820 (SE = 0.4084, t = 3.1386, p < 0.01).
Our results show a significant effect of group alone on the V1 F0 range (E = 1.3341, SE = 0.5058, t = 2.6375, p < 0.05) but not of chronological age alone (E = 0.0063, SE = 0.0041, t = 1.5288, p = 0.1384). However, we find a significant effect of the group and chronological age interaction (E = −0.0109, SE = 0.0052, t = −2.1072, p = 0.05): V1 F0 range decreases with chronological age for the NH children but increases for the CI children.
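The direction of the group by chronological age interaction can be read off the coefficients by computing one age slope per group. The sketch below does this with the values reported above. It assumes treatment coding with the CI group as the reference level and chronological age in months; these are assumptions of the illustration, chosen because they reproduce the directions described in the text. The names `age_slope` and `predicted_range` are hypothetical.

```python
# Coefficients reported above for the V1 F0-range model.
INTERCEPT = 1.2820
GROUP = 1.3341          # group effect (assumed: NH relative to CI)
AGE = 0.0063            # chronological age (assumed: slope in the CI group)
GROUP_BY_AGE = -0.0109  # group x chronological age interaction


def age_slope(is_nh: bool) -> float:
    """Change in V1 F0 range per unit of age for one group."""
    return AGE + (GROUP_BY_AGE if is_nh else 0.0)


def predicted_range(age_months: float, is_nh: bool) -> float:
    """Model-implied V1 F0 range, ignoring random effects."""
    base = INTERCEPT + (GROUP if is_nh else 0.0)
    return base + age_slope(is_nh) * age_months


for label, is_nh in (("CI", False), ("NH", True)):
    print(f"{label}: slope = {age_slope(is_nh):+.4f}, "
          f"60 months = {predicted_range(60, is_nh):.3f}, "
          f"120 months = {predicted_range(120, is_nh):.3f}")
```

With this coding the CI slope is positive and the NH slope is negative, which matches the pattern reported above: the V1 F0 range increases with chronological age in the CI children and decreases in the NH children.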
For the CI group alone, we find an effect of hearing age on the F0 range of V1 (E = 0.0102, SE = 0.0037, t = 2.7951, p < 0.05), corresponding to an increase of the F0 range with hearing age, as well as an effect of the interaction between hearing age and task (E = −0.0071, SE = 0.0031, t = −2.2607, p < 0.05). However, we find no significant effect of age at implantation, either alone (E = 0.0086, SE = 0.0047, t = 1.8426, p = 0.0952) or in interaction with task (E = −0.0065, SE = 0.0040, t = −1.6019, p = 0.1098), and no effect of task alone (E = 0.6104, SE = 0.3301, t = 1.8492, p = 0.0650).
F0 amplitude (V2)
Figure 5 and Table 1 show that the F0 range for V2 is lower in CI children than in NH children, corresponding to a narrower amplitude of F0 modulation in V2. However, the F0 range for V2 does not seem to differ according to whether or not an audio model is provided.

Figure 5. Effects of (A) hearing status (CI vs NH) and task (word-repetition vs picture-naming tasks) and of (B) chronological age on the V2 F0 range.
For the analyses of the F0 range over V2, the best-fitting model includes group, task, and the group and task interaction. We find a significant effect of group on F0 range for V2 (E = 0.7864, SE = 0.2117, t = 3.7115, p < 0.001), but no effects of task alone (E = 0.1380, SE = 0.0890, t = 1.5499, p = 0.1214) or in interaction with group (E = −0.2153, SE = 0.1192, t = −1.8068, p = 0.0710).
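As a reading aid for the statistics reported throughout this section, the sketch below recomputes the t-values of the V2 F0-range model from the reported estimates and standard errors. It assumes that E denotes the coefficient estimate and that t is the usual ratio of the estimate to its standard error; the helper `t_ratio` is hypothetical and is not part of the analysis pipeline used in this study.

```python
# (estimate, standard error, reported t) for the V2 F0-range model above.
REPORTED = {
    "group": (0.7864, 0.2117, 3.7115),
    "task": (0.1380, 0.0890, 1.5499),
    "group x task": (-0.2153, 0.1192, -1.8068),
}


def t_ratio(estimate: float, std_error: float) -> float:
    """t statistic as the ratio of an estimate to its standard error."""
    return estimate / std_error


for name, (estimate, std_error, reported_t) in REPORTED.items():
    recomputed = t_ratio(estimate, std_error)
    print(f"{name}: recomputed t = {recomputed:.4f}, "
          f"reported t = {reported_t:.4f}")
```

The recomputed ratios agree with the reported t-values to about two decimal places; the small residual differences are of the size expected when estimates and standard errors are rounded to four decimals before division.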
Our variable selection procedure led us to include only task in the statistical model for CI children, from which we conclude that neither age at implantation nor hearing age has an effect on the F0 range for V2.
Discussion
This study investigated the acquisition of the three acoustic cues to accentuation in French, namely duration, intensity and F0. It contributes to an understanding of how children balance the relative weight of these cues in both primary and secondary accentuation in French (i.e. duration being used mainly for primary accents, and intensity and F0 for secondary accents) to build stable phonological representations of prominences, and of how hearing impairment affects this acquisition. Because implants process durational cues better than intensity and F0 cues, primary accentuation was expected to be less affected by the use of a CI than secondary accentuation.
Realization of prominence patterns in NH children
This study first allows us to question how two prominence patterns coexist in a minimal phrasal unit (e.g. a word) in the production of NH children aged 5 to 10 years. It also helps us contribute to an understanding of forms and functions of primary and secondary accentuation in French.
As far as form is concerned, a primary accent is realized through lengthening of the phrase’s last vowel and a secondary accent is realized by a combination of higher F0 and higher intensity located on the phrase’s first vowel. These results are consistent with the description of French accentuation by adults in the literature (e.g. Delattre, 1938, 1963; Di Cristo, 1999, 2000, 2011; Lacheret-Dujour & Beaugendre, 1999).
NH children in our study produce primary and secondary accent patterns in the same way as French adults do, which means that they are able to use duration, intensity and F0 independently for different prosodic functions. Their durational patterns, for the realization of primary accents, do not evolve with age, whereas their use of intensity and F0 parameters of secondary accents does change with age. This could indicate that the functions carried by accents (i.e. primary accent is systematic and has a demarcative function whereas secondary accent is optional and exclusively bears semantic or pragmatic functions) are not mastered at the same time in the course of language development.
Stability in phonological representations of prominence marking
In the picture-naming task, which can be viewed as a realization of the children’s own stored phonological representation of stress (Stackhouse & Wells, 1993), NH children realize two different prominences corresponding to the expected primary and secondary accentuation in French (e.g. Di Cristo, 1999, 2000, 2011), and similar to the adult pattern (i.e. word-final lengthening and higher V1 in intensity and F0). The absence of an effect of chronological age in the picture-naming task indicates that stable phonological representations have been built for both prominences, not evolving during the age span of the children in this study.
In the word-repetition task, where an adult audio model is provided, the strategy of the youngest children in our study is to converge to the adult model as much as possible, but that of older children is to produce the prominences closer to those in the picture-naming task (corresponding to their own stored phonological representations) and further from the adult model. This is especially the case for the intensity and F0 distance (i.e. cues to emphatic-like stress) but not for duration (i.e. cue to primary stress), which is not influenced by chronological age.
The declining effect of convergence to the hyper-articulated adult production and the decreasing between-task difference for intensity and F0 cues could suggest that the older the children, the more independent from external input they become, relying more on their own representation of prominences and on internal feedback (Levelt et al., 1999). These findings also argue for two acquisition mechanisms of prominences. Final lengthening is crucial for French prosodic organization and is acquired first by NH children, which is consistent with the early acquisition of rhythmic features in French-reared infants (Hallé et al., 1991; Konopczynski, 1990; Vihman et al., 2006). Modulating F0 and intensity, in contrast, is less relevant at the word level when these cues are used to mark emphatic-like prominences, and is more variable in NH children, as shown by the effect of task on intensity ratio and F0 distance but not on duration.
Production of prominences in CI children
Like the NH children, the CI children produce the two expected prominences of French (e.g. Delattre, 1938, 1963; Di Cristo, 1999, 2000, 2011; Lacheret-Dujour & Beaugendre, 1999): primary accents through phrase-final lengthening, and secondary accents through higher intensity and F0 on the word’s first vowel. This indicates that CI children are also able to manipulate the three acoustic cues to prominence marking independently.
However, these two prominences are not affected similarly by the children’s hearing abilities: while CI children produce final lengthening similar to that of NH children, they use a less marked emphatic-like prominence (i.e. intensity ratio and F0 distance are lower in CI children), and V1 and V2 are produced with lower ranges (i.e. less pitch modulation), consistent with the findings in Hide et al. (2007) and Pettinato et al. (2017). These results give further support to the idea that CI children’s difficulties in production reflect the greater difficulties experienced in perceiving intensity and F0 cues with a CI, as compared to perceiving durational cues (Drennan & Rubinstein, 2008; Meister et al., 2011; Moore, 2003; O’Halpin, 2010).
Furthermore, we found limited effects of age at implantation in CI children: only the pitch distance is affected, with later-implanted CI children showing a higher pitch distance (i.e. a more marked secondary prominence). This might indicate that the critical period in language acquisition does not affect all acoustic cues to prosody in the same way: duration, intensity and the pitch amplitudes of both vowels are not affected by later implantation, in contrast to the pitch distance. More broadly, we could argue that the critical period in acquisition might influence primary (i.e. obligatory) and secondary (i.e. optional) accentuation differently.
Finally, both prominences in CI children differ from the adult model, but duration, intensity and F0 distance also all change with the task, with CI children leaning closer than NH children to the adult model when one is provided. This could mean that CI children are more affected by external influences in the repetition task than the NH children and that they use two different strategies to produce each prominence (i.e. depending on whether a model is provided or not). The absence of chronological age and hearing age effects for most cues in CI children (no effect of chronological age on most cues and effects of hearing age limited to F0 measures) contrasts with the effects of task on almost all cues. This could indicate that phonological representations of accents are stable for CI children, who are nevertheless still unable to monitor their production through internal feedback loops (Levelt et al., 1999) as well as the NH children do. Indeed, for CI children, final lengthening, intensity and F0 highlighting are more marked in the repetition task (i.e. when a model is provided) than in the picture-naming task, at any age.
Conclusion
This study contributes to the understanding of the use of prominence marking in French, both in NH children and in CI children, at late stages of phonological development. Consistent with the description of prominence marking in French-speaking adults in the literature, our results showed that the three phonetic cues to prominence patterns are used differently by CI children than by NH children, and they confirm previous results in the literature: most of the observed variation stems from CI children’s difficulties in processing the acoustic information necessary to produce prominences and, not surprisingly, duration cues are less affected than intensity and F0 by the children’s hearing status. Limited effects of chronological age, hearing age and age at implantation indicate a relative stability of prominence marking in 5- to 10-year-old children. However, these limited effects could be explained by the heterogeneity of the CI group, whose hearing ages range from 2;2 to 9;1 years, and should be further explored in a larger group. Effects of task raise interesting questions about the status of primary and secondary accentuation in French and about how children build separate phonological representations for these two prominences, which could be further explored with other types of accents used for linguistic and non-linguistic purposes.
Further work on longer units (multiple-word phrases and utterances) will help us get a closer look at this realization of accent patterns.
Appendix
List of words
| Word in French | Phonemic transcription | English equivalent |
|---|---|---|
| bateau | /bato/ | boat |
| bébé | /bebe/ | baby |
| bobo | /bobo/ | injury |
| bouton | /butɔ̃/ | button |
| bonbon | /bɔ̃bɔ̃/ | candy |
| chiffon | /ʃifɔ̃/ | cloth |
| couteau | /kuto/ | knife |
| dauphin | /dofɛ̃/ | dolphin |
| dessin | /desɛ̃/ | drawing |
| doudou | /dudu/ | blanket |
| gâteau | /gato/ | cake |
| guépard | /gepaʁ/ | cheetah |
| guitare | /gitaʁ/ | guitar |
| poupée | /pupe/ | doll |
| ticket | /tike/ | ticket |
| toupie | /tupi/ | whirligig |
Acknowledgements
We thank all the participating children and their parents, as well as the doctors and speech therapists who helped recruit the participants.
Funding
This work was supported by a doctoral grant (Région Rhône-Alpes – ARC2 grant) awarded to the first author.
