Abstract
This study investigates prosodic modulation in the spontaneous canonical babble of congenitally deaf infants with cochlear implants (CI) and normally hearing (NH) infants. Research has shown that the acoustic cues to prominence are less modulated in CI babble. However, acoustic measurements of individual cues to prominence give incomplete information about prosodic modulation. In the present study, raters are asked to judge prominence since they simultaneously take into account all prosodic cues. Disyllabic utterances produced by CI and NH infants were presented to naive adult raters who had to indicate the degree and direction of prosodic modulation between syllables on a visual analogue scale. The results show that the babble of infants with CI is rated as having less prosodic modulation. Moreover, segmentally more variegated babble is rated as having more prosodic modulation. Raters do not perceive the babble to be predominantly trochaic, which indicates that the predominant stress pattern of Dutch is not yet apparent in the children’s productions.
Introduction
In congenitally deaf infants, cochlear implantation (CI henceforth) in the prelexical period has been shown to have a positive effect on their speech and language acquisition (Dettman et al., 2016; Levine, Strother-Garcia, Golinkoff, & Hirsh-Pasek, 2016; Tomblin, Barker, Spencer, Zhang, & Gantz, 2005). However, the prosodic modulation of speech remains difficult even long after implantation. In Dutch prosodic modulation is achieved by manipulating three acoustic cues, viz. fundamental frequency (F0), intensity and duration (Lieberman, 1960). In perceptual terms this means that a prosodically prominent syllable in a word or prelexical utterance has higher pitch and is louder and longer. Cochlear implants are limited in transmitting the spectral and temporal information that is crucial for adequate perception of pitch (Green, Faulkner, & Rosen, 2004; Moore, 2003; O’Halpin, 2010) and intensity (Drennan & Rubinstein, 2008; Meister, Landwehr, Pyschny, Wagner, & Walger, 2011; Moore, 2003), leading to difficulties with prominence production.
For typical development there are some acoustic studies of prominence production showing that infants already start to manipulate F0, intensity and duration in canonical babble (Davis, MacNeilage, Matyear, & Powell, 2000; De Clerck, Pettinato, Verhoeven, & Gillis, 2017). This early production ability originates from infants’ perceptual sensitivity to the prosodic properties of the ambient language (Friederici, Friedrich, & Christophe, 2007; Jusczyk, Cutler, & Redanz, 1993; Ramus, 2002). CI infants’ speech perception is not only degraded but they also gain access to auditory input at a later chronological age. Because of this initially delayed access to the prosodic properties of their ambient language, their ability to manipulate prosodic cues is likely to be delayed as well. However, hardly any studies have focused on prelexical prominence production by early implanted infants (i.e. infants who received their CI before the age of 2). The present study therefore investigates prosodic modulation in the prelexical utterances of congenitally deaf infants who received a CI before the age of 2, compared to their typically developing peers.
Prosody production by CI users
Prosody production, and more specifically word stress production, is problematic for severe-to-profoundly deaf children. They modulate prominence inadequately, resulting in anomalous intonation due to a slower speech rate (Clement, 2004; van den Dikkenberg-Pot, Koopmans-van Beinum, & Clement, 1998) as well as excessive or monotone pitch production (Clement, 2004; Kent, Osberger, Netsell, & Hustedde, 1987). Although cochlear implantation substantially improves speech perception, school-aged CI children still have problems with particular aspects of speech modulation in production, such as prominence production. Lenden and Flipsen (2007) investigated the prosodic characteristics of conversational speech of 3- to 6-year-olds with CI (implanted before age 3) by means of the Prosody-Voice Screening Profile (Shriberg, Kwiatkowski, & Rasmussen, 1990): 83.1% of their utterances were classified as inappropriately stressed and 96.7% of these inappropriately stressed utterances were judged to have ‘excessive, equal or misplaced’ stress. Carter, Dillon, and Pisoni (2002) conducted a non-word repetition task in a group of 8- to 10-year-old English children with CI (mean age at implantation 3.3 years). They found that the accuracy of imitating the stress pattern of non-words was only 61%, but there was no normally hearing (NH henceforth) control group included in the study to compare the results with. In a study with 6- to 9-year-old Dutch children with CI (age at implantation before 2 years) a non-word repetition task was performed in which the participants had to imitate the stress pattern of disyllabic non-words (Hide, 2013). It was found that CI children had a lower percentage of correctly imitated stress patterns (86% for trochaic and 81% for iambic non-words) than an age-matched NH control group (95% for trochaic and 97% for iambic non-words). Moreover, the disyllabic imitations were classified into trochaic or iambic utterances by three raters. 
For the non-words that were unanimously judged as having stress on the first or the second syllable, the CI children showed less acoustic modulation than the NH children regarding pitch rises and pitch rise duration. All these studies show that school-aged CI users have difficulties with producing prosodic prominence.
Prosody production in prelexical speech
Although the production of word stress by school-aged CI users has already been investigated, studies of prosodic prominence production in prelexical speech are rare, even for typically developing infants. Davis et al. (2000) showed that prelexical NH English infants already have the necessary abilities to produce prominent syllables, but they do not yet use these in a consistent way. Raters indicated a clearly prominent syllable in less than half of the disyllabic babble. However, in those utterances with a prominent syllable F0, intensity and duration values were modulated in a way similar to adult speech. For all three cues the ratios between the more prominent and the less prominent syllables were comparable for infants and adults. One of our previous longitudinal studies followed nine NH infants acquiring Dutch from the onset of babbling until they reached a cumulative vocabulary of 200 words (De Clerck et al., 2017). The three cues to prosody were measured in disyllabic babble and early words. It was found that from babbling onwards infants differentiate acoustically between prominent and less prominent syllables. Intensity and F0 already tend to be slightly higher in the first syllables of infant babble, but the predominant trochaic pattern was well established in early word use. Words had more prosodic modulation than babble in terms of F0 and intensity.
Research on early prominence production in CI infants is even rarer than in NH infants. Therefore, an acoustic study was carried out in which the three cues to prosody were measured in the disyllabic babble and words of infants with CI and of a NH control group (Pettinato, De Clerck, Verhoeven, & Gillis, 2017). The results indicated that CI infants show poorer use of acoustic cues, especially of F0 and intensity. This discrepancy between CI and NH infants started from babbling onwards and the gap between the two groups became even wider when looking at early word productions. In acoustical terms, infants with CI showed less prosodic modulation from babbling onwards and the emergence of the first words did not boost prosodic modulation as was the case for NH infants.
Perceptual judgements to assess prominence production
The study of Pettinato et al. (2017) focused on the acoustic phonetics of individual prosodic cues in babble and words. Measuring F0, intensity and duration reveals subtle prosodic differences between CI and NH infants in the realisation of the isolated prosodic cues to prominence. However, a limitation of this method is that acoustically measured differences do not necessarily have a perceptual effect, meaning that raters are not necessarily able to detect these differences in prominence production. A body of research on covert contrasts in phonological development shows that significant differences between two categories of speech sounds can be measured acoustically even though these differences are not perceived as such by raters (Li, Edwards, & Beckman, 2009; Scobbie, Gibbon, Hardcastle, & Fletcher, 1996). Li et al. (2009), for instance, investigated the contrasts between voiceless sibilant fricatives in English and Japanese 2- to 3-year-olds. Both languages have a contrast between an alveolar fricative /s/ and a postalveolar fricative (English: /ʃ/, Japanese: /ɕ/). For four out of 22 English and two out of 21 Japanese children they found a significant acoustic difference between the two types of fricatives, although a trained native transcriber transcribed all productions as /s/ (for English) or as /ɕ/ (for Japanese). Therefore, a perceptual experiment is needed to complement the acoustic findings of Pettinato et al. (2017).
An additional motivation for a perceptual study is that the prosodic cues can be in trade-off relations in speech (Lieberman, 1960), e.g. in an utterance with a prominent first syllable the measurement of one cue (e.g. F0) could for instance be higher on the first syllable, while the others are not (e.g. intensity and duration). Consequently, the separate acoustic measurements give an ambiguous image of the realised stress pattern. Raters have been shown to make stress judgements by taking into account these trade-off relations between the different cues instead of decoding the signal into the three separate cues to prominence (Flege & Bohn, 1989; Fry, 1958). Perceptual judgements thus complement acoustic measurements because raters indicate which prosodic prominence pattern they hear by taking into account all cues at the same time.
The aim of the current study is to investigate prosodic modulation in the babble of CI and NH infants in a perceptual experiment in which naive adult raters indicate the degree and direction of prosodic differentiation in disyllabic babble on a visual analogue scale (VAS). A VAS is a psychometrical measurement tool consisting of a line with two opposite speech characteristics of a stimulus at the extremes of the scale (Munson, Schellinger, & Carlson, 2012). In our study those opposites are ‘a very prominent first syllable’ versus ‘a very prominent second syllable’.
Canonical babble in CI infants
The focus of this study is on prosodic modulation in canonical babble of CI and NH infants. Canonical babbling is considered to be an important milestone in the construction of adult-like syllables. Canonical babble consists of consonant–vowel sequences that can be reduplicated or variegated (Oller & Eilers, 1988; Stoel-Gammon & Otomo, 1986). In reduplicated babble the sequences follow each other repetitively (e.g. ‘didi’, ‘mamama’). In variegated babble infants produce different segments in the syllable sequences (e.g. ‘badigogu’, ‘kamobido’). Koopmans-van Beinum, Clement, and van den Dikkenberg-Pot (2001) propose a sensorimotor description of the prelexical speech of NH and deaf infants. They consider speech as a combination of phonation and articulation. In very early speech infants either phonate or articulate. Later on they combine phonation and articulation, resulting in the production of sequences of syllables, i.e. canonical babble. In their study Koopmans-van Beinum et al. (2001) show that deaf infants have difficulties combining phonation and articulation and thus to produce canonical babble. They argue that auditory perception and (auto-)feedback are requirements to coordinate articulatory movements and phonation of the airstream, which is essential for the production of canonical syllables.
Cochlear implantation at an early age improves speech perception and production of severely deaf infants, but studies on prelexical speech show that infants with CI continue to have difficulties with modulating the speech stream. When it comes to canonical babble, CI infants do produce more repetitive vocalisations after implantation and their reduplicated babble resembles that of younger infants with normal hearing in that the utterances consist of similar numbers of CV-repetitions (Fagan, 2015). However, Schauwers, Gillis, and Govaerts (2008) show that the intersyllabic structure of prelexical utterances remains less variegated in infants with CI: they produce more reduplicated and less variegated babble than their NH peers. NH and CI infants prefer vowel variegation over consonantal variegation and this preference is significantly stronger in the CI group. Moreover, consonant variegation in CI utterances was found to be predominantly simple (manner OR place change) rather than complex (manner AND place change) which characterised NH children’s consonant variegation. In sum, the canonical babble of CI infants is less varied or modulated at the segmental level than the babble of NH infants. Infants with CI are able to produce reduplicated babble, but they do not modulate the speech stream to the same extent as their NH peers in order to produce a variety of segments. This results in less variegated babbling.
In sum, it is observed that infants with CI have less modulated prosody in acoustical terms (Pettinato et al., 2017), and they have less modulation at the segmental level (Schauwers et al., 2008). Consequently, the question arises whether CI speech is characterised by a more general modulation issue that manifests itself both on the segmental (cf. Schauwers et al., 2008) and the suprasegmental level. This has not yet been investigated, but the present study explores the possible influence of two maturation effects on prosody production. First of all, it might be the case that the expansion of phonological capacities influences prosodic modulation, i.e. the phonological complexity of the babble might influence modulation. More concretely, reduplicated babble might be prosodically less modulated than segmentally more modulated variegated babble. Besides the expansion of phonological capacities, another maturation effect might impact prominence production. It might be the case that infants’ experience with producing canonical babble (i.e. their chronological age, counted from the onset of babbling) leads to more prosodic modulation and/or more frequent production of the predominant trochaic pattern.
Research questions
Two research questions are the focus of the present study:
1. Is the spontaneous canonical babble of congenitally deaf infants with CI perceived to be prosodically less modulated than the babble of their NH peers?
Given the initially delayed access to auditory input and the deprived perception of the prosodic cues to prominence, it is hypothesised that raters will perceive less modulated prosody production in the babble of infants with CI. Moreover, the acoustic study of Pettinato et al. (2017) has shown less modulation in pitch and intensity in CI babble. It is tested whether maturation effects (i.e. phonological abilities and/or babble experience) influence prosodic modulation.
2. Do listeners perceive the babble to be predominantly trochaic? Is there a difference between CI and NH infants in this respect?
The predominant stress pattern in Dutch disyllables is trochaic (Daelemans, Gillis, & Durieux, 1994). It is hypothesised that listeners will perceive more trochaic babble if infants produce the dominant prosodic characteristics of their ambient language from babbling onwards. It is tested whether maturation effects (i.e. phonological abilities and/or babble experience) influence prosodic modulation.
Method
Participants
This study was carried out in nine infants with a CI and nine NH infants from the CLiPS Child Language Corpus (CCLC), a collection of longitudinal audio-video data and transcriptions of 10 children with a CI and 40 NH Dutch children (Molemans, 2011; Van den Berg, 2012; Van Severen, 2012). One infant with CI was not selected for the present study because of too few recordings in the relevant age range. All children had been raised in monolingual homes acquiring Belgian Dutch.
The children with CI were recruited from an ENT unit of the St Augustinus hospital in Antwerp/Belgium in 2000–2001 (Schauwers, 2006). The participants had all been diagnosed with a profound congenital hearing loss through a neonatal hearing screening during the first weeks of life. No other health or developmental problems were apparent. The children received a multichannel Nucleus-24 CI (Cochlear Corp., Sydney, Australia). This device consists of 22 intra-cochlear electrodes, like more recent CIs. The data and results in the present study are still representative since the present-day electrodes are not significantly improved in transmitting spectro-temporal information.
All infants were implanted before 20 months, with age of implantation ranging from 5 to 19 months (M = 12 months; SD = 5 months). Before implantation the average hearing threshold, i.e. Pure Tone Average (PTA), was 113 dBHL and the range was 93–120 dBHL (SD = 9 dBHL). One year after implantation the PTA steadily decreased to 30–52 dBHL (M = 40 dBHL; SD = 7 dBHL). All recordings used in this study were made while the children were unilaterally implanted. The auditory characteristics are listed in Table 1.
Table 1. Auditory characteristics of the CI children.
Notes: PTA = Pure Tone Average at the age of 2; dBHL = decibel hearing level; CI = cochlear implant; ↓ = progressive hearing loss.
As a control group for this study, nine NH children were randomly selected from the corpus. The infants had been recruited from day-care centres, families known by the researchers and by advertisements. The typical development of these children had been established on the basis of parent report and a checklist of the attainment of communicative and motor milestones, largely based on the checklist developed by ‘Kind en Gezin’, the Flemish infant welfare centre (Molemans, 2011). Normal language development had been verified by means of the Dutch version of the CDI (‘N-CDI’, Zink & Lejaegere, 2001). The N-CDI was filled out by the parents of the NH children to estimate productive and receptive vocabulary development. The mean percentile for the infants included in this study was 37.9 (SD = 28.4; range = 5.5–94.5) at 1;0, 46.9 (SD = 23; range = 20–90) at 1;6 and 51.7 (SD = 29.5; range = 10–90) at 2;0.
For the present experiment babble data from both groups were selected from the onset of babbling until the children reached a cumulative vocabulary level of 200 words (cf. ‘Experimental stimuli’ below). This cut-off point was arbitrary but motivated by the fact that infants do not stop babbling after the onset of word use and because the selected time range provided enough data for every child included in this study. This procedure implies that the chronological age of each child at the beginning of data selection and at the end of data selection was (potentially) different. The onset of babbling was determined by a True Canonical Babbling Ratio (tCBR) of 0.15 (Molemans, Van den Berg, Van Severen, & Gillis, 2011; Oller & Eilers, 1988). The tCBR is the ratio of the syllables with ‘true consonants’ (i.e. all consonants except glottals [/h/, glottal stop] and glides [/w/, /j/]) over all syllables produced. The mean age of the CI children at the start of the recordings used in the present study was 18 months (SD = 5 months). The mean age at the cut-off point was 27 months (SD = 5 months). The mean age of the NH children at the onset of recordings was 8 months (SD = 2 months). The mean age at the cut-off point was 22 months (SD = 2 months). The ages of the individual children at the time of recording are given in Table 2.
Table 2. Recording information on the individual children.
Notes: CI = cochlear implanted infants; NH = normally hearing infants; Age start: age at the onset of babbling; Age end: age at a cumulative vocabulary of 200 words; SD = standard deviation.
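The tCBR criterion described above can be illustrated with a minimal sketch. It assumes that each syllable is a string with one character per segment, that the vowel set below (a hypothetical inventory, not taken from the corpus) is enough to tell vowels from consonants, and that the onset criterion is met when the ratio is at least 0.15; the source does not state whether the threshold is inclusive.

```python
# Segments that do not count as 'true consonants':
# glottals (/h/, glottal stop) and glides (/w/, /j/).
NON_TRUE_CONSONANTS = {"h", "ʔ", "w", "j"}

# Hypothetical vowel inventory, used only to separate vowels from consonants.
VOWELS = set("aeiouyəɛɔɪʏɑ")


def has_true_consonant(syllable: str) -> bool:
    """Return True if the syllable contains at least one true consonant."""
    return any(
        seg not in VOWELS and seg not in NON_TRUE_CONSONANTS
        for seg in syllable
    )


def tcbr(syllables: list[str]) -> float:
    """Ratio of syllables with a true consonant over all syllables."""
    if not syllables:
        raise ValueError("at least one syllable is required")
    return sum(has_true_consonant(s) for s in syllables) / len(syllables)


def babbling_onset_reached(syllables: list[str], threshold: float = 0.15) -> bool:
    """Assumed criterion: tCBR at or above the threshold."""
    return tcbr(syllables) >= threshold
```

For a session with the syllables `["ba", "a", "ha", "wa", "a", "a", "da", "a", "a", "a"]`, only "ba" and "da" contain a true consonant, so the ratio is 2/10 and the assumed criterion is met.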
Perceptual experiment
A perceptual experiment was set up in which naive adult raters indicated the degree and direction of prosodic differentiation in disyllabic babble on a VAS.
Adult raters
Thirty naive native Dutch-speaking adults (mean age 23 years, range 18–42 years) made the prominence judgements in the perceptual experiment. The raters were naive since they were not informed about the purpose of the study. Moreover, they were not familiar with CI and NH children’s speech. None of the raters reported problems with hearing or health, nor developmental disorders.
Experimental stimuli
For the present study, all eligible disyllabic utterances were selected from the CCLC. Each recording lasted 60–90 minutes and the portions of the recordings in which the child was vocally most active were selected for further processing. These selections were restricted to approximately 20 minutes. From the selections, disyllabic babbled utterances were selected only when they consisted of two vocalic phases minimally separated by a clear consonantal phase. Moreover, disyllabic babble utterances were included only if there was no concurrent speech or noise and if they were not produced with a creaky, breathy, excessively loud or whispery voice. A total of 524 disyllabic babble utterances (165 CI babble, 359 NH babble) were included as stimuli for the perceptual experiment (see Table 2).
Since maturation effects might influence the production of prosodic modulation, two maturation variables were investigated in this study. The first maturation variable is the infants’ babble experience. Since infants start babbling at different ages and since there is a large discrepancy between the chronological age at onset of babbling of the CI and NH infants, it is more meaningful to take into account the chronological age counted from the onset of babbling onwards. More specifically, the month in which the child started babbling was labelled as 0, the first month after the onset of babbling as 1, the second month as 2, etc. The second maturation variable is the phonological complexity of the babble. The canonical babble utterances were categorised into reduplicated and variegated babble. The prelexical utterances in the recordings of the CCLC are transcribed in broad phonemic categories (detailed information on the transcription is provided in Molemans, 2011; Van den Berg, 2012; Van Severen, 2012). These broad phonemic transcriptions were used to automatically categorise babble into variegated or reduplicated babble. When the same segmental content was repeated in the two syllables, the utterance was classified as reduplicated (e.g. ‘baba’), whereas in variegated babble the syllables differ at the segmental level by consonantal and/or vocalic variation (e.g. ‘baka’). This means that there were two intermediary steps: the sound files were first transcribed as part of a previous research project (Molemans, 2011; Van den Berg, 2012; Van Severen, 2012), and then these transcriptions were automatically coded as reduplicated or variegated babble. Of the disyllabic babble included in the present study, 44% was reduplicated babble (reduplicated CI babble: 45%, reduplicated NH babble: 43%) and 56% was variegated babble (variegated CI babble: 55%, variegated NH babble: 57%).
Table 2 contains an overview of the number of reduplicated and variegated babble utterances per child.
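The two maturation variables can be sketched in a few lines. This is an illustration only, not the coding script used for the corpus: it assumes that each disyllabic utterance is already split into two syllable strings in the broad phonemic transcription, and the function names are hypothetical.

```python
def babble_type(syllable_1: str, syllable_2: str) -> str:
    """Classify a disyllabic utterance by its segmental content.

    Identical syllables (e.g. 'ba' + 'ba') count as reduplicated; any
    consonantal and/or vocalic difference (e.g. 'ba' + 'ka') counts as
    variegated.
    """
    return "reduplicated" if syllable_1 == syllable_2 else "variegated"


def babble_experience(age_months: int, onset_age_months: int) -> int:
    """Months counted from the onset of babbling (onset month = 0)."""
    if age_months < onset_age_months:
        raise ValueError("recording precedes the onset of babbling")
    return age_months - onset_age_months
```

Under these assumptions, `babble_type("ba", "ka")` yields `"variegated"`, and a recording made at 10 months for a child who started babbling at 8 months has a babble experience of 2.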
As a reliability check, 20% of the automatically categorised stimuli (106 babble utterances: 53 NH and 53 CI utterances) were manually recategorised into reduplicated and variegated babble by the first author. The raw sound files were used to judge whether the babble had repeated segmental content or not. In 91 out of 106 utterances the categorisation was identical, resulting in an agreement of 86% and a Cohen’s kappa of 0.68.
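For readers who want to see how the two reliability figures relate, the sketch below computes raw agreement and Cohen’s kappa from two lists of category labels. The label lists in the usage note are toy data invented for illustration; the confusion matrix behind the reported kappa of 0.68 is not given in the text, so that value is not reproduced here.

```python
from collections import Counter


def observed_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Proportion of items that received the same label twice."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    p_o = observed_agreement(labels_a, labels_b)
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the marginal proportions per category.
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    if p_e == 1.0:
        return 1.0  # both codings used a single identical category
    return (p_o - p_e) / (1.0 - p_e)
```

With the toy codings `["R", "R", "V", "V"]` and `["R", "V", "V", "V"]`, observed agreement is 0.75, chance agreement is 0.5 and kappa is 0.5. The raw agreement reported above follows directly from the counts: 91 / 106 ≈ 0.86.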
Experimental design
In the rating task, the raters indicated the more prominent syllable. Moreover, by moving the slider on the VAS they indicated the relative degree of prominence of the syllable. Studies have shown that perceptual ratings on a VAS are informative for capturing the perceptual effects of fine phonetic detail in child speech (Julien & Munson, 2012; McAllister Byun, Harel, Halpin, & Szeredi, 2016; Munson et al., 2012). Munson and Carlson (2016) showed that a VAS is a useful method for experimental and observational studies of within-category speech sound perception as it allows gradual responses.
The VAS used in the present study was a sliding bar of which a screencast can be found in supplementary material 1 and a print screen of three different rating positions can be found in Figure 1. Two circles above the sliding bar served as a visualisation of the two syllables of each stimulus. The initial position of the slider was the midpoint of the VAS (Figure 1(a)). When sliding to the left, the left circle became larger while the right one became smaller, indicating a more prominent first syllable (Figure 1(b)). In order to indicate a more prominent second syllable, the slider was moved to the right, creating a larger right circle and a smaller left circle (Figure 1(c)). The position of the slider moved along a scale ranging from 0 (extreme left) to 100 (extreme right). Thus, a position in the range from 0 to 49 indicates a more prominent first syllable (the lower the number, the more prominent the first syllable). Point 50 indicates equally prominent syllables, and a position in the range from 51 to 100 indicates a more prominent second syllable (the higher the number, the more prominent the second syllable).

Figure 1. Print screens of three different rating positions on the sliding bar. (a) Equally prominent syllables. (b) Prominent first syllable. (c) Prominent second syllable.
Experimental procedure
Before the start of the experiment, the raters went through an information phase and a familiarisation phase. During the information phase, the purpose of the experiment was explained in a section with instructions (see supplementary material 2) and raters were told that they could replay an utterance two times (i.e. each utterance could be played maximally three times).
The familiarisation phase consisted of an example phase and a trial phase. During the example phase utterances with clear prominence relationships (i.e. clear equal prominence or a clearly prominent first or second syllable) were presented. The raters saw an animation of six clear examples of how the sliding bar was moved to the appropriate position. During the trial phase the raters had to indicate the prominence on similarly clear utterances. The task in the trial phase was the same as in the actual experiment, i.e. ‘indicate the prosodic modulation and stress pattern on the VAS’. The description and design of the example and trial utterances can be found in supplementary material 3. In the actual experiment 524 disyllabic babble utterances produced by CI and NH infants were presented in random order to the raters. The total duration of the experiment was approximately 60 minutes. For every stimulus we recorded the rating and how many times raters listened to the stimulus.
The experimental interface was created by means of Functional Reactive Programming (Czaplicki, 2012), a tool developed for creating responsive graphical interfaces. The programming language used to create the front end was Elm, and Haskell was used for the back end. The experiment was designed as an online application and ran on an iMac (2.9 GHz Intel Core i5, on a 21.5 inch screen) in a quiet room for all 30 participants. During the experiment the selected disyllabic babble utterances were presented one by one to the raters via SONY MDR-1R headphones. Every stimulus was preceded and followed by one second of surrounding silence.
Statistical approach
Generalised mixed models
To analyse the data, generalised linear mixed models (GLMM) (Baayen, 2008) were run in R (R Core Team, 2013) with the lme4 package (Bates, Maechler, Bolker, & Walker, 2014). GLMM is an appropriate tool to examine data that are hierarchically structured: the stimuli in the present study (N = 524) are embedded in different infants (N = 18) and all utterances are rated by different raters (N = 30). Moreover, GLMM is robust to missing data and different numbers of data points per participant. These models consist of two parts: a random and a fixed part. The random part takes into account the variation caused by the random effects, e.g. the variation between individuals. The fixed part consists of the independent variables that may have an effect on the dependent variables. In the present study the independent variables are participant group (CI or NH babble), utterance type (variegated or reduplicated babble) and babble experience (i.e. age counted from the onset of babbling).
The statistical models are constructed iteratively: first the random effects are added one by one and then the fixed effects are added one at a time. After adding a random or fixed effect the new model is compared to the previous model by means of likelihood ratio tests. Only the effects that improve the fit of the model are included. The best fitting model explains the largest amount of variance with the smallest set of predictors.
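The comparison step in this procedure can be sketched as follows. The sketch assumes that the log-likelihoods of the reduced and the extended model have already been obtained from the fitted models, that the added effect contributes exactly one parameter (so the test statistic is referred to a chi-square distribution with one degree of freedom), and that a 0.05 criterion is used; none of these details is specified in the text. For an added random effect the chi-square reference is only an approximation, because the variance is tested on the boundary of its parameter space.

```python
import math


def likelihood_ratio_test(loglik_reduced: float, loglik_extended: float) -> tuple[float, float]:
    """Likelihood ratio test for one added parameter (df = 1).

    Returns the test statistic and its p-value. For df = 1 the upper tail
    of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    statistic = max(2.0 * (loglik_extended - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def keep_effect(loglik_reduced: float, loglik_extended: float, alpha: float = 0.05) -> bool:
    """Keep the added effect only if it significantly improves the fit."""
    _, p_value = likelihood_ratio_test(loglik_reduced, loglik_extended)
    return p_value < alpha
```

With made-up log-likelihoods of -100.0 (reduced) and -98.0 (extended), the statistic is 4.0 and the p-value is about 0.046, so the effect would be retained under the assumed criterion.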
The dependent variables
The ratings on the VAS resulted in a continuous score from 0 to 100, representing the degree of prosodic modulation. A set of analyses with this continuous dependent variable was conducted (see supplementary material 4). However, in none of these analyses were the residuals normally distributed, meaning that an analysis of the continuous variable was not appropriate. As an alternative, the data were recoded into binomial dummy variables that are informative of the rating zone on the VAS instead of using the raw score (from 0 to 100) as dependent variable.
As shown in Figure 2, the collected ratings form a multimodal distribution: a distribution on the left and right side of the VAS and a peak at the midpoint of the VAS. The dummy variables that capture these rating zones were ‘trochee’, ‘iamb’ and ‘midpoint’. ‘Trochee’ (score: 0–49) represents the utterances that are rated as having a prominent first syllable, which is the predominant Dutch pattern. ‘Iamb’ (score: 51–100) represents the utterances that are rated as having a prominent second syllable. ‘Midpoint’ (score: 50) represents the utterances that are rated at the midpoint of the axis. The midpoint was also the default position of the sliding bar before being moved by the raters. When an utterance is rated at the midpoint this is an indication that there is no clear prosodic differentiation between syllables.

Figure 2. Distribution of all ratings per participant group.
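The recoding of raw VAS scores into rating zones can be sketched as below. The zone boundaries follow the description above; the function names are hypothetical, and scores strictly between 49 and 51 other than 50 are assigned to the nearest side, a case the text does not address.

```python
ZONES = ("trochee", "iamb", "midpoint")


def rating_zone(score: float) -> str:
    """Map a VAS score (0-100) to its rating zone."""
    if not 0 <= score <= 100:
        raise ValueError("VAS scores range from 0 to 100")
    if score < 50:
        return "trochee"   # more prominent first syllable
    if score > 50:
        return "iamb"      # more prominent second syllable
    return "midpoint"      # equally prominent syllables (slider default)


def dummy_variables(score: float) -> dict[str, int]:
    """One binomial (0/1) variable per zone, as input for logistic models."""
    zone = rating_zone(score)
    return {name: int(name == zone) for name in ZONES}
```

A rating of 50, for example, is coded as `{"trochee": 0, "iamb": 0, "midpoint": 1}`.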
The binomial dependent variables were analysed by means of logistic regression analyses in the form of GLMM. The estimates (E), standard errors (SE), t- and p-values of the fixed effects of the four best fitting models (i.e. one for every dependent variable) are reported in the results section. These values are computed in logits, but to facilitate the interpretation they are converted to the likelihood or probability (expressed in percentages) to be rated in this area of the VAS.
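The conversion from logits to percentages is the inverse logit (logistic) function. The sketch below shows this conversion; the helper that sums an intercept and effect estimates is a simplification that assumes the relevant predictors are dummy-coded as 1, and the coefficient values in the usage note are invented for illustration.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Inverse logit, written so that large |logit| values do not overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    exp_logit = math.exp(logit)
    return exp_logit / (1.0 + exp_logit)


def predicted_percentage(intercept: float, *effects: float) -> float:
    """Probability (in %) for the sum of an intercept and effect estimates."""
    return 100.0 * logit_to_probability(intercept + sum(effects))
```

A summed logit of 0 corresponds to 50%, and with an invented intercept of -1.0 and an invented group effect of 0.5 the predicted percentage is roughly 38%.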
The aim of the analyses was to investigate for every rating zone whether (1) CI utterances were more likely to be rated in the respective zone as compared to NH utterances and vice versa, (2) chronological age at onset of babbling has an impact and (3) whether reduplicated babble is more likely to be rated in a particular zone as compared to variegated babble and vice versa.
The first research question focuses on the prosodic modulation in the babble of CI and NH infants. Three analyses were performed to answer this research question. The first analysis looked at the probability of an utterance being rated at the midpoint of the VAS (score: 50). Since the midpoint of the VAS is considered to be the zone that represents the least prosodic modulation, this analysis aims to answer the question whether the babble of infants with CI is more likely to be rated in this zone. The analysis controls for a possible effect of babble experience (i.e. chronological age counted from the onset of babbling) and phonological complexity (i.e. the babble type: reduplicated versus variegated babble) on prosodic modulation.
The second analysis that investigates raters’ perception of prosodic modulation focused on those utterances that were indicated as having the most modulated prominence. This analysis aims to answer the question whether the utterances of the NH group are more likely to be rated as having a clearly prominent syllable than the utterances of infants with CI. Again age at onset of babbling and the babble type are added as predicting variables.
The utterances that are rated more towards the extremes of the VAS are considered to have the most prosodic modulation. However, looking at the extremes of the raw VAS (e.g. the stimuli rated in zone 0–20 and zone 80–100) would not be informative, since individual raters may well apply different strategies to indicate prominence on a VAS: some use the entire scale (from 0 to 100), but most do not and tend to centre their ratings in a smaller range (e.g. from 19 to 82). Therefore it is more informative to convert the ratings of every rater to z-scores. The z-scores are used to determine the extreme ratings per individual rater in a stepwise approach. First, the range of the ratings of each rater was determined: e.g. for rater 1 the leftmost rating was a z-score of −2.33, whereas the rightmost rating was a z-score of 3.01. Next, this range was divided into five equal parts in order to determine the 20% leftmost and 20% rightmost parts of the VAS for each individual rater: e.g. for rater 1 the extremes range from −2.33 to −1.26 (left) and from 1.94 to 3.01 (right). The utterances rated in the first and fifth part of the selected range were coded as being rated at the extremes of the scale: e.g. for rater 1 the left extreme included 45 utterances and the right extreme 24 utterances.
The dependent variable in the third analysis was the number of times an utterance was replayed by the raters. If the prosodic modulation of a stimulus is hard to judge, raters are expected to replay it more often than when prominence is clear.
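The per-rater procedure for finding extreme ratings can be sketched as follows. This is an illustration under stated assumptions: the text does not say whether the sample or the population standard deviation was used (the sample version is assumed here), nor how ratings falling exactly on a boundary were treated (they are counted as extreme here). All function names are hypothetical.

```python
from statistics import mean, stdev


def z_scores(ratings):
    """Standardise one rater's raw VAS ratings (sample SD assumed)."""
    m, sd = mean(ratings), stdev(ratings)
    if sd == 0:
        raise ValueError("rater gave identical ratings; z-scores undefined")
    return [(r - m) / sd for r in ratings]


def extreme_bounds(z_min, z_max, parts=5):
    """Split the rater's z-range into equal parts; return outer two intervals."""
    width = (z_max - z_min) / parts
    return (z_min, z_min + width), (z_max - width, z_max)


def code_extremes(ratings):
    """Return a 0/1 code per rating: 1 if it lies in the rater's outer fifths."""
    z = z_scores(ratings)
    (_, left_hi), (right_lo, _) = extreme_bounds(min(z), max(z))
    return [int(v <= left_hi or v >= right_lo) for v in z]


if __name__ == "__main__":
    # Rater 1 from the text: z-range from -2.33 to 3.01.
    left, right = extreme_bounds(-2.33, 3.01)
    print([round(b, 2) for b in left], [round(b, 2) for b in right])
```

For the z-range of rater 1 given in the text, the sketch reproduces the reported boundaries of −1.26 and 1.94.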
The second research question was whether listeners perceive the babble to be predominantly trochaic and whether there is a difference between CI and NH infants in this respect. This was investigated by two analyses. The first analysis looked at the probability that an utterance was rated on the left, trochaic side (score: 0–49) of the VAS. The second analysis looked at the probability that an utterance was rated on the right, iambic side (score: 51–100) of the VAS. Both analyses control for possible maturation effects by adding age at onset of babbling and utterance type to the models.
Results
Analyses of raters’ perception of prosodic modulation in babble of CI and NH infants
The first set of analyses aims to answer the first research question: is spontaneous canonical babble of congenitally deaf infants with CI perceived to be prosodically less modulated than the babble of their NH peers?
Midpoint
The first analysis looked at the probability of an utterance being rated at the midpoint of the VAS (score: 50). Since the midpoint of the VAS is considered to be the zone that represents the least prosodic modulation, this analysis aims to answer the question whether the babble of infants with CI is more likely to be rated in this zone. Moreover, it is investigated whether maturation variables (i.e. utterance type and babble experience) influence the likelihood of babble being rated with no modulation.
The best model with ‘midpoint’ as dependent variable had participant group and utterance type as fixed effects (supplementary material 5). Adding age at onset of babbling did not improve the fit of the model. The random part controlled for the variance of the different infants, utterances and raters. The intercept of the model was estimated at −3.222 logits or 4% (SE = 0.297, z = −10.833, p < 0.001), meaning that babble is significantly less likely to be rated at the midpoint than anywhere else on the VAS. This finding is not surprising since this variable contrasts a single point on the VAS with all other points. The finding indicates that raters were very likely to move the cursor when judging the prominence pattern of babble. Regarding the question whether there is a difference between CI and NH infants, the results show that CI utterances are significantly more likely to be rated at the midpoint: a reduplicated CI babble utterance has a probability of 3.5% of being rated at the midpoint, whereas the probability is only 2.6% for a reduplicated NH utterance (−3.222 − 0.401 logits; E = −0.401, SE = 0.188, z = −2.127, p = 0.033). Reduplicated babble of CI infants is thus judged as having less modulated prominence. The same holds for variegated babble: a variegated CI babble utterance has a probability of 2.2% (−3.222 − 0.527 logits; E = −0.527, SE = 0.136, z = −3.878, p < 0.001) of being rated at the midpoint, and a variegated NH utterance has a probability of 1.5% (−3.222 − 0.401 − 0.527 logits). All these differences are significant. These results indicate that babble of CI infants is significantly more likely to be judged as having no clearly prominent syllable. A similar result is found for the difference between variegated and reduplicated babble. In both infant groups reduplicated babble had a higher probability (i.e. 3.5% for CI and 2.6% for NH reduplicated babble) of being rated at the midpoint than segmentally variegated babble (i.e. 2.2% for CI and 1.5% for NH variegated babble). The age at onset of babbling did not improve the fit of the statistical model, meaning that it did not affect the probability of being rated as having no clearly stressed syllable.
It can be concluded that babble of CI children is more likely to be rated as having no clearly prominent syllable. Regarding the effect of the maturation variables it can be concluded that phonetic complexity influences prosodic modulation: reduplicated babble is more likely to be rated as having no clearly prominent syllable and thus as having less prosodic modulation, irrespective of the infant group or the age at onset of babbling at which they are produced.
Extremes
The best model with ‘extreme’ as dependent variable had participant group and utterance type as fixed effects (supplementary material 6). Again the random part controlled for the variance of the different infants, utterances and raters. The intercept of the model was estimated at −0.826 logits or 30% (SE = 0.116, z = −7.133, p < 0.001), indicating that babble is significantly less likely to be rated at the extremes than anywhere else on the VAS. Regarding the question whether there is a difference between CI and NH infants in prosodic modulation in babble, the results indeed show that NH utterances are significantly more likely to be rated at the extremes: there is a probability of 30% for reduplicated babble of a CI infant being rated at the extremes, whereas the probability that reduplicated NH babble is rated at the extremes is 36% (−0.826 + 0.255 logits; E = 0.255, SE = 0.129, z = 1.981, p = 0.048). When looking at variegated utterances it appears that variegated CI babble has a probability of 38% (−0.826 + 0.308 logits; E = 0.308, SE = 0.084, z = 3.667, p < 0.001) and that variegated NH babble has a probability of 43% (−0.826 + 0.255 + 0.308 logits) of being rated at the extremes. All these differences are significant. The age at onset of babbling has no significant effect on the probability of being rated at the extremes (E = 0.004, SE = 0.010, z = 0.426, p = 0.670).
It can be concluded that the canonical utterances of the NH infants are more likely to be rated at the extremes of the ratings compared to the CI infants’ utterances. This is an indication that the prosodic modulation within NH babble is more salient. This difference between groups is apparent in both variegated and reduplicated babble. Regarding the effect of the maturation variables it can be concluded that phonetic complexity influences prosodic modulation: variegated babble is more likely to be rated at the extremes than reduplicated utterances irrespective of the infant group or the age at onset of babbling at which they are produced.
Analysis of number of replays
A third analysis investigates how many times an utterance was replayed by the raters. The random effects again account for the variance caused by the different infants, raters and stimuli. The candidate fixed effects in this analysis were infant group, babble type and ‘extreme’ (i.e. whether or not the utterance is rated at one of the extremes). The fixed effects retained in the best fitting model were participant group and ‘extreme’. Adding babble type did not improve the fit of the model, meaning that there was no significant difference in the number of replays of reduplicated versus variegated babble. The results of this analysis are displayed in supplementary material 7.
The intercept of this statistical model was estimated at 0.423 (SE = 0.046, t = 9.156, p < 0.001), meaning that a CI utterance that is rated anywhere on the VAS except at the extremes is replayed on average 0.423 times. Utterances of the NH infants are replayed significantly less often than those of the infants with CI (E = −0.058, SE = 0.017, t = −3.349, p = 0.004), indicating that it was easier to judge the prominence of NH babble than of CI babble. The stimuli rated at the extremes were replayed significantly less often than those on the rest of the VAS (E = −0.126, SE = 0.009, t = −14.613, p < 0.001). The age at onset of babbling did not improve the fit of the statistical model, meaning that it did not affect the number of replays. These results support the findings of the VAS analyses, which showed that NH stimuli are rated as having clearer prominence: both NH utterances and utterances rated at the ends of the continuum are replayed less often and are thus easier to judge.
Analyses of raters’ perception of the trochaic pattern in babble of CI and NH infants
The following two analyses aim to answer the second research question: Do listeners perceive the predominant trochaic stress pattern in babble? Is there a difference between CI and NH infants in this respect?
Trochee
The first analysis looked at the probability that an utterance was rated on the left, trochaic side (score: 0–49) of the VAS. The random part of the best fitting model controlled for the variance explained by participant identity (i.e. the 18 infants), utterance identity (i.e. a unique number for each of the 524 babble utterances) and rater identity (i.e. the 30 adult raters). The only fixed effect required in the best fitting model was infant group (CI versus NH). Adding babble type (reduplicated versus variegated babble) and age at onset of babbling as fixed effects did not improve the fit of the model. An overview of the best fitting model with ‘trochee’ as dependent variable can be found in supplementary material 8.
The intercept of the model was estimated at 0.157 logits or a likelihood of 54% (SE = 0.268, z = 0.586, p = 0.558). This positive intercept indicates a higher probability for utterances to be rated as a trochee than in any other zone on the VAS. However, this effect is not significant. There is no significant difference between the ratings of the CI and NH babble (E = −0.041, SE = 0.345, z = −0.120, p = 0.904). These results show neither that babble is significantly more likely to be rated on the trochaic side of the VAS, nor that the NH infants are more likely to produce trochees than CI infants or vice versa. The babble type did not improve the fit of the statistical model, indicating that the segmental variation of the utterance does not lead to more ratings on the trochaic side of the VAS. Age at onset of babbling did not improve the fit of the model either, meaning that there is no age effect on the probability of being rated as a trochee. Regarding the second research question it can be concluded that the predominant trochaic pattern is not significantly more often perceived by raters, and this is the case for both infant groups.
Iamb
The aim of the second analysis was to see whether there is a difference between groups and babble types regarding the probability of being rated on the right side of the VAS (score: 51–100). The same fixed and random effects as in the analysis with trochee as dependent variable were added. The statistical model with the best fit (supplementary material 9) had participant identity, utterance identity and rater identity as random effects and participant group and utterance type as fixed effects. The intercept was estimated at −0.929 logits or a likelihood of 28% (SE = 0.320, z = −2.905, p = 0.004). This significant negative intercept indicates that utterances are less likely to be rated on the right side of the VAS than in all the other rating zones together. As in the analysis of trochees, there was no significant difference between the ratings of the CI and NH babble (E = 0.094, SE = 0.389, z = 0.243, p = 0.808), suggesting that there is no difference between groups in the probability that an utterance is rated as having a more prominent second syllable. The analysis showed that variegated babble is significantly more likely to occur on the iambic side of the VAS (E = 0.401, SE = 0.192, z = 2.088, p = 0.037). For the CI infants the likelihood of reduplicated babble occurring on the iambic side of the VAS is 28% (−0.929 logits), whereas it is 37% for variegated babble (−0.929 + 0.401 logits). For the NH infants the likelihood of reduplicated babble occurring on the iambic side of the VAS is 30% (−0.929 + 0.094 logits) and 39% for variegated babble (−0.929 + 0.094 + 0.401 logits). It can be concluded that variegated babble is more likely to be rated as having a more prominent second syllable than babble with a repetitive segmental content, irrespective of the children’s hearing status.
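The percentages for the four group-by-babble-type combinations follow from adding the relevant fixed-effect estimates to the intercept on the logit scale and then converting the sum to a probability. The sketch below does this for the ‘iamb’ model, using the estimates reported above and ignoring the random effects; the function and constant names are hypothetical.

```python
import math

# Fixed-effect estimates (logits) reported for the 'iamb' model.
INTERCEPT = -0.929    # reference cell: reduplicated babble of a CI infant
NH_EFFECT = 0.094     # added for NH infants
VARIEGATED_EFFECT = 0.401  # added for variegated babble


def iamb_probability(group: str, babble: str) -> float:
    """Predicted probability of a rating on the iambic side of the VAS."""
    if group not in ("CI", "NH") or babble not in ("reduplicated", "variegated"):
        raise ValueError("unknown group or babble type")
    logit = INTERCEPT
    if group == "NH":
        logit += NH_EFFECT
    if babble == "variegated":
        logit += VARIEGATED_EFFECT
    return 1.0 / (1.0 + math.exp(-logit))  # inverse logit


if __name__ == "__main__":
    for group in ("CI", "NH"):
        for babble in ("reduplicated", "variegated"):
            p = iamb_probability(group, babble)
            print(f"{group} {babble}: {p:.0%}")
```

Because the model contains no interaction term, the NH and variegated effects are simply additive on the logit scale.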
Discussion
This article examines the prosodic modulation in the babble of congenitally deaf infants with CI and their NH peers. Two previous studies of the infants included in the present study suggested that early implanted infants with CI may have difficulties with modulating speech from babbling onwards: one study showed that CI infants’ babble is segmentally less modulated (Schauwers et al., 2008) and the other study showed that CI babble is prosodically less modulated (Pettinato et al., 2017). Schauwers et al. (2008) showed that infants with CI produce less variegated and more reduplicated utterances than their NH peers. Moreover, there is less segmental variation in variegated babble of CI infants. Regarding the modulation of the acoustic cues to prosodic prominence, Pettinato et al. (2017) showed that infants with CI made smaller differentiation between the two syllables of their babble utterances in terms of pitch and intensity, whereas there was no significant difference in duration differentiation in babble. However, since the acoustic cues were measured separately in this study and since trade-off relations between the three prosodic cues are expected to occur (Lieberman, 1960), the current study aimed to take into account the relations between separate cues by presenting babble to raters in a perceptual rating task. This was done by means of a VAS on which naive adults had to indicate the degree and direction of differentiation between the two syllables of babble. The click location on the VAS was analysed in order to find an answer to the following research questions:
Is spontaneous canonical babble of congenitally deaf infants with CI perceived to be prosodically less modulated than the babble of their NH peers?
Do listeners perceive babble to be predominantly trochaic? Is there a difference between CI and NH infants in this respect?
Regarding the first research question, results of this study showed that babble of the CI infants was indeed perceived as less prosodically modulated. This was shown in the analyses of the rating position on the VAS and confirmed by the analysis of the number of replays. First of all, the analyses of the VAS ratings revealed that the babble of CI infants was more likely to be scored at the midpoint of the VAS: this is the default position of the slider which represents equally prominent syllables. Therefore, this result indicates that raters found it more difficult to decide which syllable was more prominent in CI babble than in NH babble. Second, the VAS ratings showed that CI babble was less likely to be scored towards the extremes of the rating scale. Utterances that are rated at the extremes are considered to have the most prosodic modulation: one of the two syllables is clearly prominent. The higher likelihood of CI babble being rated towards the centre of the VAS rather than towards the extremes of the scale is consistent with a perception of smaller prosodic modulation in the canonical babble of CI infants. Interestingly, the analysis of the number of replays as dependent variable has also shown that raters listened significantly more often to CI babble than to NH babble. This is also suggestive of the greater difficulty in judging which syllable is more prominent in CI utterances.
The result that adult raters perceived CI babble as being less modulated is consistent with the literature, which shows anomalous prominence production in school-aged children with cochlear implants (Carter et al., 2002; Hide, 2013; Lenden & Flipsen, 2007). Moreover, the results from the present perceptual experiment support the previous acoustic findings that pitch and intensity are less modulated in CI babble (Pettinato et al., 2017). The present study shows that gradient ratings on a VAS reveal subtle differences in prosodic modulation in infant babble and therefore reveal modulation differences between a typical group and a clinical group (CI group in this case). This finding confirms the studies that show the importance and validity of perceptual judgements (on a VAS) to track subtle phonetic differences (McAllister Byun et al., 2016; Munson & Payesteh, 2013; Munson et al., 2012).
The analyses also controlled for the influence of possible maturation effects on prosodic modulation in canonical babbling. The analyses showed that an infant’s experience with babbling did not boost prosodic modulation, i.e. babble produced shortly after the onset of babbling was not less modulated than babble produced later and vice versa. On the other hand, the infants’ phonological capacities did influence prosodic modulation. The analysis showed that phonologically more complex, variegated babble is significantly more likely to be rated towards the extremes of the VAS than towards the midpoint of the scale: this means that variegated babble has more prosodic modulation. No interaction effect between the infant groups and utterance types was found, which indicates that variegated babble is rated as being more modulated in both groups. Since infants with CI produce less variegated babble in comparison to NH infants (Schauwers et al., 2008), it is possible that the reduced prosodic modulation in the babble of CI infants is linked to reduced segmental variegation in their utterances. It should be noted that this study is the first to explore this link between segmental and suprasegmental modulation. Therefore the present result is interpreted as an exploratory finding that needs replication in a study that is designed to investigate this link.
A plausible explanation for the indirect link between the lack of modulation at the segmental and at the suprasegmental level is that (prelexical) infants with CI have difficulties modulating the speech stream, due to their reduced perception of the speech signal. Infants with CI perceive less variation in the speech signal due to the limited transmission of spectro-temporal information. The results of the present study suggest that this reduced perception of variation is also reflected in reduced production of variation, both at the segmental and the suprasegmental level. Moreover, the degraded perception of the speech stream leads to storage deficits in working memory (Nittrouer, Caldwell-Tarr, & Lowenstein, 2013). Children with more limited working memory in turn are less efficient in processing stress (Torppa et al., 2014). The results of the present study suggest that perception and processing problems in CI infants not only impact word stress discrimination but also word stress production. More specifically, a reasonable hypothesis is that the workload involved in producing two segmentally and/or prosodically equal syllables (i.e. reduplicated babble and equally prominent syllables) is lower than the workload required for producing modulated syllables (i.e. variegated babble and babble utterances with a prominent syllable). This hypothesis suggested by the results of the present study requires further experimental testing in order to figure out whether children with poorer working memory indeed have more modulation issues at the segmental as well as the suprasegmental level.
The second research question was whether listeners perceive the predominant trochaic pattern in the babble of infants and whether there is a difference between CI and NH infants in this respect. Studies on early prominence production often focus on the occurrence of characteristics of the ambient stress pattern (Davis et al., 2000; Hallé, De Boysson-Bardies, & Vihman, 1991). In the present article raters indicated whether the utterances were trochaic (i.e. the ambient Dutch pattern), iambic or equally stressed. The results indicate that naive adult raters did not show a trochaic or an iambic bias in their judgements. Moreover, the results did not show a difference between CI and NH babble in this analysis. Previous acoustic studies (De Clerck et al., 2017; Pettinato et al., 2017) showed that intensity and F0 tend to be higher in the first syllables of infant babble, thus suggesting a trochaic pattern. The predominant trochaic pattern became more dominant than the iambic pattern in early word use. However this tendency is not found in the present perceptual study. It might well be the case that raters find it substantially more difficult to judge the stress pattern of meaningless babble compared to meaningful first words. Although several studies have found ambient language effects in the babbling stage (De Boysson-Bardies, Hallé, Sagart, & Durant, 1989; De Boysson-Bardies & Vihman, 1991; Whalen, Levitt, & Wang, 1991), Engstrand, Williams, and Lacerda (2003) for instance have shown that the detection of ambient-language effects in the babble utterances of 18-month-olds becomes substantially easier for trained raters when the dataset includes possible words and imitations. Similarly, the previous study on NH infants has shown enhanced prosodic differentiation in words as compared to babble (De Clerck et al., 2017). The ambient trochaic pattern thus may only be detectable from word use onwards. 
Therefore it would be interesting to compare the present results to the results of a rating task in which early lexical utterances are judged.
In conclusion, the present study shows that perceptual judgements by naive raters reveal subtle prosodic differences between the canonical babble of congenitally deaf infants with CI and NH infants. The raters indicated less prosodic modulation in CI babble. Interestingly, segmentally more variegated utterances are rated as having more prosodic modulation. Listeners did not perceive a bias towards the predominant trochaic pattern.
Supplementary Material
Supplementary Material, supplemental_material_2_9 for Prosodic modulation in the babble of cochlear implanted and normally hearing infants: A perceptual study using a visual analogue scale by Ilke De Clerck, Michèle Pettinato, San Gillis, Jo Verhoeven and Steven Gillis in First Language
Footnotes
Acknowledgements
Our special thanks go to the families and infants who participated in the study and to K. Schauwers, I. Molemans, R. van den Berg and L. Van Severen for collecting the CLiPS Child Language Corpus. We would also like to thank the two anonymous reviewers and Associate Editor for their constructive comments and suggestions.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research of Ilke De Clerck was funded by a PhD Fellowship grant of the Research Foundation – Flanders (FWO). The research of Michèle Pettinato was funded by a BOF-DOCPRO grant (ID 28259) of the Research Council of the University of Antwerp.
Supplementary material
Supplementary material for this article is available online.
References
