Abstract
In this study, segmental and prosodic aspects of word repetition and non-word repetition in typically developing children aged four to six years were investigated. Focus was on developmental differences, and on how tonal word accent and word length affect segment production accuracy. Prosodically controlled words and non-words were repeated by 44 Swedish-speaking children. Repetition accuracy for both words and non-words increased with age, and was higher for words than non-words. Further, tonal word accents I and II provided different conditions for segment repetition in favor of accent II during both word repetition and non-word repetition for older children, but only during word repetition for younger children. This suggests age-dependent differences regarding how prosody is stored and integrated with segments. The findings have theoretical significance regarding the role of prosody in the perception, processing and production of phonological information. There are also clinical implications concerning the interpretation of repetition tasks and the potential use of prosody in speech and language intervention.
Keywords
Introduction
The task of repeating non-words has been used for a range of purposes in many different populations (Carter, Dillon, & Pisoni, 2002; Catts, Adlof, Hogan, & Ellis Weismer, 2005; Dollaghan & Campbell, 1998; Hakim & Ratner, 2004; McCarthy & Warrington, 1984; Yoss & Darley, 1974). Even though the question of what mechanisms underlie non-word repetition has not yet been fully answered, it has generated important knowledge of memory and language processes. Children’s ability to repeat non-words predicts language development (Adams & Gathercole, 2000; Gray, 2003; Sahlén, Reuterskiöld-Wagner, Nettelbladt, & Radeborg, 1999), and has potential as a clinical marker of language impairment (Dollaghan & Campbell, 1998). The aim of the present study is to examine how typically developing children aged four to six years perform on a prosodically balanced word and non-word repetition task, and to investigate how lexical prosody influences repetition on the segmental level. Manipulation of prosody in these kinds of tasks has the potential to generate information both about how prosody influences performance, and about how prosody interacts with other features of words and non-words, as well as with cognitive processes like short-term memory and lexical access.
Prosody
Echols (1993) proposed a framework where children rely on stress patterns when forming word representations. According to this framework, early word perception and production are limited by the lesser perceptual saliency of unstressed compared to stressed syllables. In line with this, Gerken’s (1991) metrical hypothesis predicts that young English-speaking children prefer a speech rhythm that follows a strong–weak pattern. Words and phrases not conforming to this pattern are consequently more prone to reduction, e.g. omission of unstressed syllables in words with an iambic (weak–strong) pattern (Gerken, 1994; McGregor & Johnson, 1997). However, the results of Vihman, DePaolis, and Davis (1998) suggest that the vulnerability of unstressed syllables in a certain stress pattern might be language dependent, as French children preferred an iambic pattern.
Prosodic properties have been found to interact with grammar. The metrical hypothesis provides a possible explanation for the omissions of unstressed grammatical markers in young typically developing children, and in children with language impairment (Hansson, Nettelbladt, & Leonard, 2003; Leonard & Bortolini, 1998). In a study by Demuth, Patrolia, Song, and Masapollo (2011), investigating the development of articles in young Spanish-speaking children, it was shown that the prosodic complexity of a language may also be predictive of children’s use of grammatical morphemes, with higher complexity, e.g. a high frequency of three- and four-syllable words, prompting earlier acquisition of articles in some prosodic contexts. Another example of the relation between prosody and grammar was presented by Bassano et al. (2013), who concluded that the preference for an iambic stress pattern in French-acquiring children might drive earlier development of determiner use together with monosyllabic nouns, compared to children acquiring Austrian German or Dutch.
Tonal word accents
In contrast to most other Germanic languages outside of Scandinavia, Swedish has tonal word accent (henceforth referred to as just ‘accent’, not to be confused with regional or foreign accent) as a phonologically distinctive feature, and all Swedish words have either accent I or accent II. Accent in this regard constitutes word level intonation, with different fundamental frequency patterns over the words for each of the accents (see Figure 1).

Figure 1. Spectrogram of productions of two Swedish words with accent I (left), stegen ‘the footsteps’, and accent II (right), stegen ‘the ladder’, of the regional dialect spoken by the children. The bottom panel shows the fundamental frequency pattern in more detail.
The accents are stress dependent, and occur in relation to the primary stressed syllable of a word (Riad, 2014). In Central Swedish, accent I is realized with a single-peak falling tone, while accent II is double peaked (Cruttenden, 1997). Accent I can occur in words of any length with stress on any syllable, whereas accent II can only occur in words with an unstressed syllable after the stressed syllable. Consequently, monosyllabic words, and words with primary stress on the ultimate syllable, can only take accent I (Riad, 2014). Accent is further partly predictable from the words’ morphological properties. For example, words with suffixes indicating definite article or present tense take accent I, while words with infinitive or plural endings bear accent II. Words with monosyllabic stems typically have accent I, and words with disyllabic stems have accent II. Compounds have accent II, with a few exceptions (Bruce, 2012).
The question whether the Swedish accents are specified for certain morphemes in the mental lexicon has generated three partly diverging theories. Both accents are lexically specified and chosen based on morphological and metric properties of the word, according to Bruce (1977). Others have claimed that accent II is the specified accent, while accent I is the default intonation pattern, chosen when there is no information specifically inducing accent II (Riad, 2014). Lahiri, Wetterlin, and Jönsson-Steiner (2005) instead claimed that accent I is specified by certain word properties, and that accent II is assigned to words by default.
Peters and Strömqvist (1996) showed that Swedish-speaking children acquire accent II before accent I, often before they are two years old, which might be related to greater acoustical and perceptual saliency of accent II. Engstrand, Williams, and Strömqvist (1991) and Engstrand, Williams, and Lacerda (2003) confirmed that production of distinct accent II patterns emerges around 18 months. According to Plunkett and Strömqvist (1992), mastery of the distinction between the accents is not achieved until the age of four.
Word and non-word repetition
Gathercole and colleagues (e.g. Gathercole & Adams, 1993; Gathercole & Baddeley, 1989, 1990; Gathercole, Willis, & Baddeley, 1991) have considered non-word repetition to be a relatively pure measure of phonological short-term memory, with lexical phonological knowledge influencing performance to some degree (Gathercole, 2006). Others, like Snowling, Chiat, and Hulme (1991), suggested that non-word repetition is mainly influenced by long-term lexical knowledge. Bowey (1996, 2001) and Metsala (1999) argue that the performance on non-word repetition tasks is related to lexical restructuring and increased phonological sensitivity.
How well non-words are repeated is influenced by a number of properties of the non-words themselves. In general, non-words that are longer, more phonologically complex, or in violation of the phonotactic rules of the speaker’s language are harder to repeat (Coady & Evans, 2008), as are non-words with low wordlikeness (Gathercole, 1995) and low phonotactic probability (Coady & Aslin, 2004).
Prosodic features have been shown to affect typically developing children’s non-word repetition ability in several languages (Archibald & Gathercole, 2007; Sahlén et al., 1999; Santos, Bueno, & Gathercole, 2006; Yuzawa & Saito, 2006; Yuzawa, Saito, Gathercole, Yuzawa, & Sekiguchi, 2011). Yuzawa and Saito (2006) found that two-syllable non-words pronounced with a high pitch accent were easier to repeat than non-words pronounced with a flat pitch accent, in Japanese children three to four, but not five to six, years of age. Similar results were obtained by Yuzawa et al. (2011), who also found effects of temporal structuring of non-words on repetition performance, with stimuli containing a short interval between morae being easier to repeat than no-interval non-words. The opposite effect of intervals between syllables was described by Archibald and Gathercole (2007), where English-speaking children aged seven to 13 years repeated non-words more accurately than sequences containing the same syllables with pauses in between. Using a non-word repetition task controlling for stress, Roy and Chiat (2004) found that two- to four-year-olds, with English as a first language, rarely omitted stressed syllables. Children two to three years old omitted twice as many syllables as children aged three to four years, but the impact of stress position on performance did not change with age. Unstressed syllables in pre-stressed positions were more frequently subject to omission, yielding different repetition scores for non-words of equal length, but with differing stress patterns. Similarly, Sahlén et al. (1999), in a study of word and non-word repetition in Swedish children with language impairment aged five to six years, investigated the effect of stress pattern on syllable omissions. Although whole syllable deletions were rather unusual, they were six times as frequent in pre-stressed as in post-stressed position.
Interestingly, there were no differences in syllable omissions between non-words and words (Sahlén et al., 1999). In Brazilian-Portuguese speaking children aged four to 10 years, Santos et al. (2006) found segmental errors to be more frequent in syllables after the stressed syllable. In a study of children with specific language impairment (SLI), aged 12 to 14 years, there was an effect of metrical complexity, i.e. stress pattern, on errors at the segmental level. Syllables outside of the trochaic foot, in particular, generated more sub-syllabic errors (Marshall, Ebbels, Harris, & van der Lely, 2002).
Comparing segment types, vowels are easier to repeat than consonants (Santos et al., 2006; Yuzawa & Saito, 2006). Generally, vowels are acoustically more prominent than consonants due to their greater duration and amplitude (Ladefoged & Disner, 2012). However, aside from perceptual salience emanating from acoustical properties, the Consonant Vowel (CV) hypothesis (Nespor, Peña, & Mehler, 2003) stipulates that vowels and consonants are functionally different in the processing of speech and language. Consonants mainly carry lexical information, while vowels concern grammatical relations and prosody. Most languages have few vowels and many consonants, which allows for more contrasts and richer lexical cues among consonants. However, this functional difference does not seem to be conditioned by consonants being considerably more numerous than vowels in a language (Hochmann, Benavides-Varela, Nespor, & Mehler, 2011), and has been observed in, for example, French and English (Nazzi, Floccia, Moquet, & Butler, 2009) and Italian (Toro, Nespor, Mehler, & Bonatti, 2008), languages that all have different consonant to vowel ratios. Swedish has quite an unusual distribution with its 18 consonants and 18 vowels (Riad, 2014).
As the focus of the present study is on phonological and prosodic production within the task of word and non-word repetition, some account of the perception, processing and production of spoken information is called for.
Non-word repetition ability has frequently been explained within the concept of working memory (Coady & Evans, 2008; Gathercole, 2006). Allowing for storage and processing of information simultaneously over a short period of time, working memory is a multi-component memory system that includes a central executive, a visuo-spatial sketchpad, a phonological loop and an episodic buffer (Baddeley, 2008, 2012). In the present study, focus is on phonological working memory. In a more specified model of the phonological loop, which is supposedly the most crucial part for repetition of auditorily presented stimuli, Vallar and Papagno (2002) suggest that heard information is phonologically analyzed and placed into the phonological store. From there, the information can be forwarded to a phonological output buffer, and then either articulated or subvocally rehearsed.
Another account of how words are analyzed and articulated is described by Levelt, Roelofs, and Meyer (1999). In their psycholinguistic speech production model, production of a word requires several consecutive steps. First, a lexical concept is activated, then a relevant lexical representation (lemma) is selected. Thereafter, phonological encoding and syllabification can take place: the word form is retrieved, which entails access to information about the morphological structure, metrical shape (e.g. number of syllables and stress placement) and segmental constituents (speech sounds) of the word. The segments must be syllabified, which refers to how individual segments are grouped into syllables. When number of syllables, stress placement and syllable structure have been determined, the segments can be assigned to appropriate places (Levelt, 1999; Levelt et al., 1999). Repetition does not require activation of lexical concepts or lemma selection, and instead starts at the step of phonological encoding through the process of auditory word perception, and syllabification is performed on the segment string that has been activated (Levelt & Indefrey, 2000).
One of the main differences between the psycholinguistic model of speech production proposed by Levelt and colleagues (Levelt, 1989, 1999; Levelt & Indefrey, 2000; Levelt et al., 1999), and the working memory construct by Baddeley (2008, 2012), is that the latter takes into account the potential integration of long-term memory knowledge with just-heard sound information. This could be used to understand effects of wordlikeness on non-word repetition (Frisch, Large, & Pisoni, 2000). What is not regarded in the multi-component working memory model is in what way individual sounds are connected and assembled for articulation, something that is described within Levelt’s framework. These approaches could make complementary contributions to the understanding of the processes underlying repetition of words and non-words.
Judging by previous research, repetition tasks seem somewhat ill-suited for identifying isolated processes involved in the complex chain of events occurring between hearing and articulation. Yet many studies have linked such tasks, in particular non-word repetition, to abilities important for language and cognition. This motivates detailed analyses of what influences repetition accuracy in different languages and populations. Also, the relative simplicity of the testing procedure makes it attractive as a clinical assessment tool. While effects of, for example, stimulus length and wordlikeness are established as important factors, the role of prosody has been investigated to a lesser degree. The prosodic properties of Swedish, with free stress placement and two discrete accents occurring in words with the same stress, enable comparisons of repetition accuracy based on minimal prosodic contrasts.
Aim
The aim of the present study is to investigate the ability to repeat words and non-words with special focus on how accent and word length affect segment production accuracy. Focus is on developmental aspects of both segmental and prosodic repetition, as well as on the relationship between prosody and segments. Two specific questions were addressed: (1) How does the ability to repeat segmental and prosodic properties of words and non-words develop between the ages of four to five and five to six years? It is expected that the older children will repeat words and non-words more accurately than the younger children, and that words will be repeated more accurately than non-words. Since the prosodic features of stress and accent should be relatively stable at the age of four, accuracy for repetition of prosody will presumably be higher than for segment repetition. (2) Do the prosodic properties accent and word length affect repetition accuracy at the segmental level? Longer words and non-words are predicted to give lower segment repetition accuracy. As accent II is the more salient intonation pattern, and is the accent acquired first by children with Swedish as a first language, this might give some advantage compared to accent I.
Method
Participants
In the present study, 44 monolingual Swedish-speaking children, with no history of hearing loss or deviant speech and language development, participated. All children were recruited from pre-schools, and spoke a Central Swedish dialect. In Sweden, children typically enter school the year they turn seven.
Fifty-two children agreed to participate in the study, but three were lost due to illness before testing had begun. Out of a total of 49 who were tested, five children were excluded because of hearing loss, suspected language impairment or because they were too young.
The 44 remaining participants were between 48 and 71 months of age (M = 59, SD = 7), and were divided into two age bands. The younger group consisted of 20 children, 15 girls and 5 boys, aged four to five years. The older group comprised 24 children, 17 girls and 7 boys, aged five to six years. Age ranges were 48–57 months in the younger group (M = 53, SD = 3), and 60–71 months in the older group (M = 65, SD = 3).
Tests of grammatical production (the grammar part of the Lund Test of Phonology and Grammar; Holmberg & Stenkvist, 1983), phonology (Hellqvist, 1995) and language comprehension (the Swedish Test of Language Comprehension; Hellqvist, 1989), as well as non-verbal IQ (Raven’s colored progressive matrices; Raven, Raven, & Court, 1998), indicated that all children included in the study performed within the normal range for their age.
Ethical approval from the Regional Ethical Review Board in Linköping (Dnr 2013/92-31), as well as parental consent for children’s participation, was obtained prior to inclusion in the study.
Materials
Word and non-word repetition tasks
The assessment materials used in the present study contain both words and non-words, matched pairwise for the prosodic features accent, stress pattern and number of syllables.
A list of 131 non-words conforming to Swedish phonotactics was compiled. Eight adult judges (four males and four females aged 18–54, M = 31.25) each made a binary judgment of every non-word as either sounding like a word or not sounding like a word. The total wordlikeness score for each non-word thus ranged from 0 to 8. The least wordlike non-words from each length group (1–5 syllables) were eligible for inclusion in the non-word repetition task used in the study. The words and non-words included in the repetition task are presented in Tables 1 and 2.
Table 1. Features of the words included in the repetition task.
S = main stressed syllable; W = unstressed syllable; C = consonant; V = vowel; . = syllable break.
Table 2. Features of the non-words included in the repetition task.
S = main stressed (strong) syllable; W = unstressed (weak) syllable; C = consonant; V = vowel; . = syllable break; WL = wordlikeness score from 0 (lowest) to 8 (highest).
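The wordlikeness scoring described above can be sketched as follows. This is only an illustration under stated assumptions: the item labels, the judgments, the number of items kept per length group and the alphabetical tie-breaking are all hypothetical, as the selection details are not specified in the text.

```python
from collections import defaultdict

N_JUDGES = 8  # eight adult judges, each giving a binary judgment


def wordlikeness_score(judgments):
    """Sum binary judgments (1 = sounds like a word) into a 0-8 score."""
    if len(judgments) != N_JUDGES:
        raise ValueError("expected one judgment per judge")
    return sum(1 for j in judgments if j)


def least_wordlike(candidates, n_per_length):
    """Pick the lowest-scoring items within each syllable-length group.

    candidates: iterable of (item, n_syllables, judgments).
    n_per_length: hypothetical number of items to keep per group.
    """
    by_length = defaultdict(list)
    for item, n_syllables, judgments in candidates:
        by_length[n_syllables].append((wordlikeness_score(judgments), item))
    selected = {}
    for n_syllables, scored in by_length.items():
        scored.sort()  # lowest score first; ties broken alphabetically (assumption)
        selected[n_syllables] = [item for _, item in scored[:n_per_length]]
    return selected


# Hypothetical candidates; labels stand in for actual non-words.
candidates = [
    ("item_a", 2, [1, 0, 0, 0, 0, 0, 0, 0]),
    ("item_b", 2, [1, 1, 1, 0, 1, 0, 1, 0]),
    ("item_c", 3, [0, 0, 0, 0, 0, 0, 0, 0]),
]
selected = least_wordlike(candidates, n_per_length=1)
```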
The word and non-word repetition tasks each consisted of 25 test items, 15 items with accent I and 10 items with accent II. The 25 words and 25 non-words had equal numbers of one-syllable (1 item), two-syllable (3 items), three-syllable (5 items), four-syllable (7 items) and five-syllable (9 items) items, with words and non-words matched for length, stress and accent. As far as possible, only words expected to be familiar to pre-school aged children were included. The words and non-words were of similar phonological complexity, with the words containing a total of 216 phonemes (95 vowels and 121 consonants) and the non-words a total of 212 phonemes (95 vowels and 117 consonants). They were not, however, identical with respect to syllable structure, e.g. consonant clusters.
The one-syllable items were stressed and had accent I, since those are the obligatory prosodic conditions for monosyllabic words. For the two- to five-syllable items, primary stress varied between all positions, and each stress condition had either accent I or II, when applicable (words with primary stress on the ultimate syllable can only take accent I).
Word and non-word item lists were audio-recorded by an adult female speaker with a Central Swedish regional accent.
Procedure
For each participant, testing was performed in a single session at the pre-school. Sessions lasted between 45 and 60 minutes. Language tests were administered first, followed by word and non-word repetition.
Words and non-words were presented through headphones, but due to technical problems or resistance to wearing the headphones, six children had the stimuli presented in free-field via loudspeakers. There were no indications that this affected performance in any way, and there was no obvious relation to age. In order to avoid potential discouraging effects, long and short items were mixed, so that there were no more than three consecutive four- to five-syllable items. Item order was identical for all participants. Responses were audio-recorded and transcribed by two persons trained in phonetic transcription. Inter-rater agreement (exact percentage agreement) calculated from 36% of the material was 98.9%.
Based on transcriptions, percentages of phonemes (PPC), consonants (PCC) and vowels (PVC) correct (Shriberg, Austin, Lewis, McSweeny, & Wilson, 1997; Shriberg & Kwiatkowski, 1982) were calculated for words and non-words respectively. Percentages were obtained using the formula: PCC/PVC/PPC = (number of correct sounds / number of correct plus incorrect sounds) × 100. Deletions, substitutions, distortions and metatheses of sounds were scored as incorrect. Allophonic and regional variants of sounds were scored as correct. Prosody was scored as percentages of correctly repeated accents, stress patterns and number of syllables. Also, PPC, PCC and PVC were calculated for each length (one to five syllables) and accent (I or II) condition. To be able to investigate the relationship between accent and length, PPC was calculated for two- to five-syllable word and non-word items within each accent condition. For the accent conditions (accent I or accent II), percentages of correct number of syllables, stress and PPC, PCC and PVC were computed.
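As a minimal sketch of the percentage-correct formula, the function below computes PCC, PVC and PPC from counts of correct and incorrect sounds. The counts are hypothetical; they are chosen only so that the totals match the 121 consonants and 95 vowels of the word list, and do not represent any child's data.

```python
def percent_correct(n_correct, n_incorrect):
    """Percentage of correctly repeated sounds among all scored sounds."""
    total = n_correct + n_incorrect
    if total == 0:
        raise ValueError("no sounds were scored")
    return 100.0 * n_correct / total


# Hypothetical counts for one child on the word list:
# 121 consonants and 95 vowels in total (216 phonemes).
consonants_correct, consonants_incorrect = 100, 21
vowels_correct, vowels_incorrect = 90, 5

pcc = percent_correct(consonants_correct, consonants_incorrect)
pvc = percent_correct(vowels_correct, vowels_incorrect)
# PPC pools consonants and vowels before dividing.
ppc = percent_correct(consonants_correct + vowels_correct,
                      consonants_incorrect + vowels_incorrect)
```

Note that PPC is computed over the pooled counts, so it is not simply the mean of PCC and PVC unless the numbers of consonants and vowels are equal.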
Statistics
Data were analyzed with IBM SPSS Statistics 22. The effects of age, word type, accent and segment type on segment repetition accuracy (PCC and PVC), the effects of age, word type, length and accent on phoneme repetition (PPC), as well as comparisons between repetition of segments and prosody, were investigated using multifactorial repeated-measures analysis of variance (ANOVA). Follow-up pairwise comparisons were calculated with Bonferroni correction for multiple comparisons. For the composite prosody measure, correlations between prosodic variables were calculated using Kendall’s tau-b. Descriptives are presented as percentages, but due to restricted range and variance of some of the data, calculations were performed on arcsine square-root transformed values.
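The arcsine square-root transformation mentioned above can be illustrated as follows. The sketch assumes the common form asin(sqrt(p)) applied to proportions between 0 and 1; some variants multiply the result by 2, and the text does not state which one was used in SPSS.

```python
import math


def arcsine_sqrt(percentage):
    """Arcsine square-root transform of a percentage (0-100), in radians.

    Assumes the variant asin(sqrt(p)) with p as a proportion.
    """
    p = percentage / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must lie between 0 and 100")
    return math.asin(math.sqrt(p))


# Scores near the ceiling are spread out more than scores near 50%.
step_mid = arcsine_sqrt(55) - arcsine_sqrt(50)
step_top = arcsine_sqrt(100) - arcsine_sqrt(95)
```

The transform stretches the scale near 0% and 100%, which is why it is often applied to percentage data with restricted range.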
One participant in the older group was identified as an outlier on PPC, PCC and PVC in non-words. Analyses were re-run without this participant, with little impact on the results.
Results
The two groups of children, referred to in this section as younger and older, are compared regarding three main aspects of word and non-word repetition, which are presented as follows. First the repetition of vowels and consonants in relation to accent is described, followed by length effects on phoneme repetition in relation to accent. Finally, repetition of prosody and phonemes are compared.
Vowel and consonant repetition for words and non-words in relation to age and accent
Children’s word and non-word repetition performance was examined using a mixed repeated-measures ANOVA with word type (word or non-word), accent (I or II) and segment type (vowel or consonant) as within-group factors, and age (younger or older) as a between-group factor. Outcome measures were the percentage of consonants correct (PCC) and percentage of vowels correct (PVC) in each condition (see Table 3).
Table 3. Repetition of consonants and vowels in words and non-words.
PCC = percentage consonants correct, PVC = percentage vowels correct. IQR = interquartile range.
There was a significant effect of age on repetition performance, F(1, 42) = 5.06, p = .030, partial η2 = .11, indicating better performance for older children. There was also a significant effect of whether words or non-words were repeated, F(1, 42) = 113.46, p < .001, partial η2 = .73, with non-words being harder to repeat accurately. Further, there was a significant main effect of accent, F(1, 42) = 40.85, p < .001, partial η2 = .49. The individual sounds of words and non-words with accent II were repeated more accurately. The main effect of the type of segment to be repeated was also significant, F(1, 42) = 116.48, p < .001, partial η2 = .80, showing that vowels were easier to repeat than consonants. There was no significant interaction between age and type of word.
A significant age × word type × accent interaction was found, F(1, 42) = 6.69, p = .013, partial η2 = .14. Pairwise comparisons showed that for the younger children, accent II was easier to repeat than accent I in words (p < .001), but not in non-words (p = .354), whereas for the older children the difference between accents in favor of accent II was present when repeating both words (p < .001) and non-words (p = .005).
The word type × accent × segment interaction was significant, F(1, 42) = 5.06, p = .030, partial η2 = .11. Pairwise comparisons revealed that in words, both consonants (p < .001) and vowels (p < .001) were easier to repeat in the accent II condition, whereas a difference between accent I and II was found for vowels (p = .003), but not consonants (p = .248), in non-words. Also, the difference between consonants and vowels was larger in non-words than in words (p < .01).
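For readers who want to relate the reported F values to the effect sizes, partial η2 can be recovered from an F statistic and its degrees of freedom through the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the word type and accent effects reported above; the function name is introduced here for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of word type: F(1, 42) = 113.46
eta_word_type = partial_eta_squared(113.46, 1, 42)
# Main effect of accent: F(1, 42) = 40.85
eta_accent = partial_eta_squared(40.85, 1, 42)
```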
Effects of length and accent on segment repetition
In order to examine the effect of word length and accent on phoneme repetition, a mixed repeated-measures ANOVA with word type (word or non-word), word length (two to five syllables) and accent (I or II) as within-group factors, and age (younger or older) as the between-groups factor, was performed. The one-syllable item was excluded since accent II is not possible in that length condition. The assumption of sphericity was not met for the length effect, χ2(5) = 24.50, p < .001, or the accent × length effect, χ2(5) = 11.40, p = .044, as indicated by Mauchly’s test. The degrees of freedom were adjusted with Greenhouse–Geisser correction for the length effect, and with Huynh–Feldt correction for the accent × length interaction. PPC in each of the conditions was the outcome measure (see Table 4).
Table 4. Repetition of phonemes in words and non-words divided by accent and length.
W = words, NW = non-words, IQR = interquartile range.
The analysis showed significant main effects of age, F(1, 42) = 7.58, p = .009, partial η2 = .153, word type, F(1, 42) = 186.40, p < .001, partial η2 = .82, length, F(2.26, 95.04) = 47.00, p < .001, partial η2 = .53, and accent, F(1, 42) = 38.89, p < .001, partial η2 = .48.
Also, there was a significant accent × length interaction, F(2.81, 118.02) = 8.14, p < .001, partial η2 = .16, showing that the effect of length differed between the accents. For items with accent I, pairwise comparisons indicated that two syllables were easier than three (p < .001), three and four did not differ (p = .928), and four were easier than five (p < .001). Comparing accent II items of different length, no difference was found between two and three syllables (p = 1.00), three syllables were easier than four (p = .001), and four- and five-syllable items did not differ (p = .108). No difference between the age bands regarding the effect of accent on length could be found, as indicated by the non-significant age × accent × length interaction, F(3, 126) = 1.64, p = .183, partial η2 = .04. Neither was the word type × accent × length interaction significant, F(3, 126) = 2.30, p = .080, partial η2 = .05. The effect of accent and word length on PPC is shown for both younger and older children combined in Figures 2 (words) and 3 (non-words).

Figure 2. Percentage of phonemes correct (PPC) in relation to accent and word length in words for all children.

Figure 3. Percentage of phonemes correct (PPC) in relation to accent and word length in non-words for all children.
There was a significant word type × length interaction, F(3, 126) = 3.56, p = .016, partial η2 = .08, indicating different effects of length on PPC between words and non-words. Follow-up pairwise comparisons revealed the following: For words, two-syllable items were easier than three-syllable items (p < .001), there was no difference between three- and four-syllable items (p = .122), and four-syllable items were easier than five-syllable ones (p < .001). For non-words, there was no difference between two- and three-syllable items (p = 1.00), three-syllable items were easier than four-syllable ones (p = .002), which in turn were easier than five-syllable items (p = .003).
Repetition of prosody in relation to segment repetition
In order to examine how well the children repeated the overall prosodic form of the words and non-words, a composite score was computed from the supra-segmental repetition measures using a unit-weighted method. This composite consisted of the mean values of correctly repeated primary stress and syllable number, which were significantly correlated for both real words, τ = .71, 95% BCa CI [.555, .817], p < .001, and non-words, τ = .65, 95% BCa CI [.396, .819], p < .001. As a ceiling effect was found for accent repetition, it was excluded from the composite score calculation. Descriptives for PPC and repetition of the prosodic features in words and non-words are shown in Table 5.
Table 5. Repetition of phonemes and prosodic features in words and non-words.
PPC = Percentage phonemes correct; Accent = percentage correctly repeated tonal word accents; Stress = percentage correctly repeated stress patterns; Length = percentage correctly repeated number of syllables; Composite = unit weighted composite score based on Stress and Length. IQR = interquartile range.
Note: Accent was excluded from the composite score calculation.
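The composite prosody score and the correlation between its two components can be sketched as below. The composite is taken as the plain mean of the Stress and Length percentages, as described in the text; the Kendall's tau-b function is a textbook implementation written for illustration, not the SPSS routine, and the example percentages are hypothetical.

```python
import math
from itertools import combinations


def unit_weighted_composite(stress_pct, length_pct):
    """Equal-weight mean of the stress and syllable-number percentages."""
    return (stress_pct + length_pct) / 2.0


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts for ties in either variable."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # pairs tied on both variables do not enter the formula
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical percentages for four children.
stress = [80.0, 88.0, 88.0, 96.0]
length = [84.0, 92.0, 96.0, 96.0]
composites = [unit_weighted_composite(s, l) for s, l in zip(stress, length)]
tau = kendall_tau_b(stress, length)
```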
Mixed repeated-measures ANOVA, with word type (word or non-word) and level (supra-segmental composite prosody or segmental PPC) as within-group factors, and age as the between-groups factor, revealed significant main effects of age, F(1, 42) = 11.80, p = .001, partial η2 = .22, word type, F(1, 42) = 36.62, p < .001, partial η2 = .47, and level, F(1, 42) = 12.77, p = .001, partial η2 = .23. The older children performed better than the younger, words were repeated more successfully compared to non-words, and prosody was easier to repeat than segments. However, the significant word type × level interaction, F(1, 42) = 23.35, p < .001, partial η2 = .36, revealed that prosodic features were easier to repeat than phonemes in non-words (p < .001), but not in words (p = .934). Prosodic accuracy did not differ between words and non-words (p = .078), whereas segments were easier in words (p < .001), as shown by pairwise comparisons. The age × word type × level interaction was non-significant, F(1, 42) = 1.17, p = .287, partial η2 = .03.
Discussion
The older children outperformed the younger on both word and non-word repetition, which was expected from earlier studies investigating age differences (Edwards & Lahey, 1998; Gathercole & Baddeley, 1989, 1990; Roy & Chiat, 2004; Stokes et al., 2006). Also, in line with previous research, words were, in general, repeated more easily than the non-words by children of both age bands (Casalini et al., 2007; Chiat & Roy, 2007; Dispaldro, Deevy, Altoé, Benelli, & Leonard, 2011; Roy & Chiat, 2004). Another expected finding was that vowels were repeated more accurately than consonants (Santos et al., 2006; Yuzawa & Saito, 2006).
In words, performance was quite high for segmental as well as for prosodic features, and phonemes were repeated as accurately as stress patterns and number of syllables. In non-words, segments stood out as being considerably harder to repeat than the prosodic features. Segment repetition was poorer in non-words than in words, while prosodic repetition did not differ between word conditions. This was hardly surprising considering that the prosody was the same for words and non-words. One conclusion that can be drawn is that familiarity with the segment sequence of a non-word does not appear to influence the repetition of the prosodic features of the same non-word. As will be pointed out below, however, the converse does not hold: prosodic features do seem to influence the repetition of segments.
In general, segments in words and non-words with accent II were easier to repeat than those in items with accent I. Although both accents’ intonational patterns were produced with ease by the four- to six-year-olds in the present study, the two accents seem not to have provided equal conditions for segment repetition. In terms of lexical specification of the accents, this finding might support the view that accent II is the default, less demanding, one (Lahiri et al., 2005), thus enabling more cognitive resources to be focused on repeating the segments correctly. The potential facilitative effect of accent for non-word repetition appears to develop with age, though. This is evidenced by the finding that the younger children displayed segment repetition in favor of accent II only when repeating words, while the advantage for accent II was found during both word and non-word repetition for the older children. Apparently, the younger children did not benefit more from the accent II intonation when stimuli were less familiar. Greater acoustical and perceptual saliency of accent II (Peters & Strömqvist, 1996) might account for the general trend for segment accuracy to be higher in accent II words and non-words. It does not, however, provide an explanation for the lack of difference between accents in the young children’s non-word repetition. One reason for this might be the way in which prosody is represented in long-term memory. The present findings could indicate that before the age of five, the segmental and prosodic features of a word are stored interdependently to a greater extent than after five years. Accent II, being the phonological default as well as the perceptually more prominent intonation pattern, might then give an advantage over accent I for correct repetition of segments in familiar words, while relative inability to generalize the beneficial properties of accent II to unknown segment sequences results in the difference being eradicated for non-words.
What drives this supposed developmental change in the organization of prosody and its relation to phonological, morphological and lexical information cannot be deduced from the present results. But similar to the way in which an expanding vocabulary causes a need for finer phonological representations at the segmental level (Metsala, 1999), it may also bring about changes in the level of detail prosodic representations have. Stress and tonal word accents are used to discriminate between words; they can be used to distinguish meaning in minimal pairs (Riad, 2014), and they help in classifying morphological information in the words we hear (see Bruce, 2012), enabling better processing of spoken language. Therefore, it seems reasonable to assume that an increasing number of words in the mental lexicon places more demand for precision on the prosodic representations.
In a working memory framework (Baddeley, 2012; Vallar & Papagno, 2002), accents may play a role by facilitating processing and storage to varying degrees. The present design does not allow conclusions to be drawn about exactly where in this process accent is important, but it is plausible that the phonological analysis is in some cases more or less facilitated by the respective accents. Accent II might enhance phonological analysis and encoding more than accent I, either by enabling more help from long-term memory representations (Gathercole, 1995), or by freeing up more short-term memory and other cognitive resources.
As pointed out by Gupta, Lipinski, Abbs, and Lin (2005), common psycholinguistic theories, such as Levelt’s model of speech production (Levelt, 1989, 1992; Levelt et al., 1999), do not fully explain how previously unknown words are processed. However, if non-words enter the model in the same way as words during repetition, as described by Levelt and Indefrey (2000), it can be hypothesized that the insertion of segments into the prosodic frame of both words and non-words is facilitated more by accent II for children five to six years of age, but that this beneficial effect is not as developed in younger children. Accent II might also facilitate activation of the non-word segment string, again only for words in children under the age of five years.
The advantage for accent II was evident for overall performance, but the effect of accent varied considerably between the different length conditions, and between individual items of the same length. Performance on three- and five-syllable items contributed the most to the difference between the accents, while the difference was much smaller for four-syllable items. This indicates that other factors, like differences in phonotactic probability and word frequency, influenced repetition performance beyond what could be explained by length or accent. Experiments that carefully control for such variables would be necessary in order to further elucidate the role of tonal word accents in relation to word length in the rapid processing of linguistic material.
The finding of the present study that the difference between vowels and consonants was larger for non-word than for word repetition may be related to the findings of Yuzawa and Saito (2006). In their study of non-word repetition in Japanese children, all non-words had an association value, describing how strongly they were associated with real words by adult judges. Comparing consonant and vowel repetition, a greater difference between the segment types was found for the non-words with low association values. In the present study, the real words can probably be expected to be more wordlike than the non-words (even if this is not as obvious for children as for adults). As such, they supposedly enabled the use of long-term memory knowledge to a greater extent (Gathercole, 2006).
Taken together, these results suggest that long-term memory knowledge affects consonants more than vowels. There are several possible explanations for this outcome. Vowels have higher acoustic energy and are therefore easier to perceive than consonants (Ladefoged & Disner, 2012). Consonants may be harder to perceive than vowels when cognitive resources are scarce, for instance during non-word repetition when there is less support from long-term memory representations. Alternatively, a functional difference between vowels and consonants, as stipulated by the CV hypothesis (Nespor et al., 2003), might imply that the relative absence of help from lexical representations affects consonants more, as their function is mainly to convey lexical information.
The word and non-word repetition materials used in the present study have some limitations that ought to be addressed. First, items were not matched for syllable structure, which means that differences in repetition accuracy between items might partly be explained by this. Second, we did not ensure that the real words used were in fact known to the children, leaving open the possibility that these words functioned as non-words. Third, there were more accent I items in the repetition tasks, since some stress patterns make accent I obligatory. As a result, the proportion of unstressed syllables in pre-stressed position was greater in accent I items (one-third) than in accent II items (one-fourth), probably increasing the relative risk of syllables being omitted (Roy & Chiat, 2004) in words and non-words with accent I. However, syllables were rarely omitted in the present study, and no significant difference between pre- and post-stressed syllables could be found.
Constructing non-word repetition tasks is in itself a somewhat unwieldy undertaking, and tasks are obviously constructed differently depending on what information is of interest. Ideally, many variables ought to be controlled, so that items are non-wordlike without violating phonotactic rules or being too difficult to repeat, while also considering degree of wordlikeness, phonotactic probability, syllable structure, prosodic features, and the sounds included. The wish to compare non-words to real words adds further complexity, as the non-words have to be matched with the real words without being too similar.
The present findings have some clinical implications. Children with language impairment, who have difficulties with, for example, phonology and lexicon, often receive treatment on the word level. If words with one particular accent facilitate learning, this ought to be taken into account when planning intervention. Segment accuracy is likely to be better in accent II words, motivating the choice of the more difficult accent I words in treatment of phonological problems at the segmental level. In contrast, if there are severe difficulties with segments, clinicians might also consider using accent II words, making production and perception as easy as possible. Further, some assessment tools include non-word repetition as a measure of phonological short-term memory: the longer the stimuli one is able to repeat, the better the short-term memory. As the word length effect seems to differ between prosodic conditions, it is important that non-word repetition tests control for prosodic features such as accent and stress, or that prosody is taken into account when test results are interpreted. Otherwise, there is a risk of prosodic difficulties being interpreted as problems with short-term memory.
Conclusion
The present study contributes to the knowledge of perception and production of phonology in typically developing children, within the task of repeating real words and non-words. The results point to prosody as one of several factors influencing performance. Repetition accuracy improves from the age of four to six years, and prosody is likely to play different roles in younger and older children. The relative importance of phonological working memory and phonological or prosodic knowledge seems to change with increasing age. Regarding the relation between prosody and segments, the accent of words and non-words provides different conditions for how well phonemes are repeated.
Footnotes
Acknowledgements
The authors wish to thank Petra Martikainen and Jasmine Andersson for their contribution to the study. A special thanks also to the children who participated.
Funding
This work was supported by grants from the Sven Jerring Foundation.
