Abstract
To acquire language and successfully communicate in multicultural and multilingual societies, children must learn to understand speakers with various accents and dialects. This study investigated adults’ and 5- to 8-year-old children’s perception of native- and nonnative-accented English sentences in noise. Participants’ phonological memory and phonological awareness were assessed to investigate factors associated with individual differences in word recognition. Although both adults and children performed less accurately with nonnative talkers than native talkers, children showed greater performance decrements. Further, phonological memory was more closely tied to perception of native talkers whereas phonological awareness was more closely related to perception of nonnative talkers. These results suggest that the ability to recognize words produced in unfamiliar accents continues to develop beyond the early school-age years. Additionally, the linguistic skills most related to word recognition in adverse listening conditions may differ depending on the source of the challenge (i.e., noise, talker, or a combination).
1 Introduction
Each time a listener encounters a spoken word it will differ from previously encountered instances. Adult listeners are typically highly successful at mapping these rapidly changing, highly variable acoustic signals onto words in their lexicons. However, substantial decrements in word recognition can occur for adults under adverse listening conditions stemming from environment-, talker- or listener-related factors (Mattys, Davis, Bradlow, & Scott, 2012). A particularly difficult–yet commonly occurring–perceptual challenge occurs when a listener encounters speech from a talker with an unfamiliar regional dialect or nonnative accent. In these cases, listeners must contend with word productions that differ from their previous experience in terms of vowels, consonants, or suprasegmental features (e.g., Clopper & Smiljanic, 2015; Clopper, Pisoni, & de Jong, 2005; Sereno, Lammers, & Jongman, 2016). Nonnative speech is especially demanding (Adank, Evans, Stuart-Smith, & Scott, 2009), partially due to substantial inter- and intra-talker variability (Hanulíková & Weber, 2012). Although adult listeners tend to have more difficulty understanding nonnative talkers than native talkers (Bent & Bradlow, 2003; Munro & Derwing, 1999), their ability to identify nonnative-accented words and sentences improves with short-term laboratory training (Bradlow & Bent, 2008; Clarke & Garrett, 2004; Sidaras, Alexander, & Nygaard, 2009).
Children must learn to recognize words spoken with novel accents to successfully communicate in multicultural and multilingual societies. Without this ability, receptive language development would be slowed and frequent errors in mapping input to words in the lexicon would occur. Similar to findings with adults, recent studies have demonstrated that infants and toddlers have more difficulty understanding words produced in unfamiliar accents than familiar ones (Barker & Turner, 2015; Best, Tyler, Gooding, Orlando, & Quann, 2009; Mulak, Best, Tyler, Kitamura, & Irwin, 2013; Schmale & Seidl, 2009; Schmale, Hollich, & Seidl, 2011), but can improve their perception with exposure and experience (Schmale, Cristia, & Seidl, 2012; Schmale, Cristia, Seidl, & Johnson, 2010; van Heugten & Johnson, 2014; van Heugten, Krieger, & Johnson, 2015; White & Aslin, 2011). However, studies with very young children are limited in that they cannot examine open-set word identification–the precise task that is necessary for real-world spoken communication.
The few studies examining older children’s open-set word identification with novel accents have found improvement between 4 and 7 years of age in the perception of isolated words produced in an unfamiliar native dialect or nonnative accent (Bent, 2014; Nathan, Wells, & Donlan, 1998). However, children’s performance was significantly poorer than that of adults, which suggests that the development of word recognition under a variety of adverse listening conditions continues at least into middle childhood (Bent, 2014). Previous open-set word identification studies with naturally produced dialects and accents were limited by only examining single-word stimuli (Bent, 2014; Nathan et al., 1998). To fully capture children’s abilities to understand speech produced in unfamiliar accents and dialects, the examination of longer utterances is necessary, particularly because real-world communication typically involves utterances longer than a single word. The only study to date that has investigated children’s perception of a novel accent with sentence-length materials is Newton and Ridgway (2016). In that study, 6- and 7-year-old children’s perception of sentences produced with a novel accent that included vowel alterations as produced by a phonetically trained speaker was compared to children’s perception of sentences produced in their home dialect. Artificially created accents of this type approximate vowel deviations found across regional dialects and nonnative accents, but allow the experimenter to tightly control how the “accent” deviates from native norms (Maye, Aslin, & Tanenhaus, 2008). When the sentences were produced with the novel accent, children required a more favorable signal-to-noise ratio than when they were produced with the familiar accent. Therefore, children had more difficulty understanding the novel accent in noise than their home dialect.
In contrast to the artificially created accent used in Newton and Ridgway (2016), naturally-produced longer utterances include additional deviations from native language norms including differences in rhythm, intonation, and stress, which listeners must accommodate (Munro, 1995; van Els & de Bot, 1987). Furthermore, memory demands increase as utterances lengthen, which may be particularly problematic because acoustic-phonetic deviations from typical pronunciations tend to slow children’s processing (Creel, 2012). Although longer utterances introduce these challenges, sentences and longer stretches of speech allow for the use of top-down, contextual information, which can facilitate children’s speech recognition in difficult listening circumstances (Elliott, 1979; Kalikow, Stevens, & Elliott, 1977). Therefore, to fully capture children’s perceptual abilities, it is important to examine their perception of unfamiliar accents with speech materials of various lengths and complexities.
In addition to developmental changes, there are significant individual differences in speech perception under adverse conditions, even for groups of young adult listeners with normal speech, language, and cognitive abilities (Gilbert, Tamati, & Pisoni, 2013; Tamati, Gilbert, & Pisoni, 2013). A substantial range of cognitive and linguistic variables has been investigated as possible sources underlying this variability, including vocabulary size, indexical processing, short-term and working memory, cognitive function, making perceptual wholes from fragments, familiar sound recognition, task switching, and selective attention (Adank & Janse, 2010; Benichov, Cox, Tun, & Wingfield, 2012; George et al., 2007; Janse & Adank, 2012; Kidd, Watson, & Gygi, 2007; Pichora-Fuller, Schneider, & Daneman, 1995; Tamati et al., 2013; van Rooij & Plomp, 1990; Watson, Qiu, Chamberlain, & Li, 1996; Zekveld, George, Kramer, Goverts, & Houtgast, 2007). However, the skills supporting perception of novel accents may only partially overlap with the skills needed to perceive speech in other types of adverse conditions, such as speech in noise. When perceiving familiar native-accented speech in noise, the target signal will be fairly well matched to the acoustic-phonetic representations of words in the lexicon; however, even native-accented speech includes substantial variability including both within- and across-talker differences (e.g., context, speaking style, affect and gender), which listeners must contend with. For speech in noise, listeners must tune into the target signals while suppressing competing sounds. In contrast, when presented with speech produced in a nonnative accent, listeners must resolve acoustic-phonetic mismatches that arise due to phoneme additions, deletions, and substitutions as well as suprasegmental errors common in nonnative speech. 
Some cognitive-linguistic skills–such as short-term memory and vocabulary knowledge–may be important for both perception of speech in noise (Pichora-Fuller et al., 1995; Tamati et al., 2013; van Rooij & Plomp, 1990) and for perception of speech that deviates from native language norms (Banks, Gowen, Munro, & Adank, 2015; Bent, 2014; Janse & Adank, 2012). However, other skills–such as task switching, selective attention, or inhibition–may be more closely related to perception of and adaptation to novel accents (Adank & Janse, 2010; Banks et al., 2015; Janse & Adank, 2012).
In this study, the developmental and individual differences approaches are combined. We investigate how school-aged children’s perception of nonnative-accented sentences differs from adults’ perception of such sentences. Further, we investigate how two components of phonological processing relate to both children’s and adults’ word identification abilities. Specifically, we test how phonological memory–the ability to remember auditory linguistic information–and phonological awareness–the metalinguistic ability related to thinking and talking about the sound structure of language–relate to nonnative- and native-accented word recognition. The first skill investigated, phonological memory, has been linked to adults’ speech perception abilities under adverse listening conditions, including perception of speech in noise and novel accents (Janse & Adank, 2012; Pichora-Fuller et al., 1995; Tamati et al., 2013; van Rooij & Plomp, 1990). In addition, phonological memory is related to the ability to learn the phonological forms of new words both in the first language (Baddeley, Gathercole, & Papagno, 1998; Gathercole, Hitch, Service, & Martin, 1997; Jarrold, Thorn, & Stephens, 2009) and second language (Hu, 2003) in children. Here, we extend these results to examine whether phonological memory predicts children’s ability to identify known words with unfamiliar pronunciations in noise. The ability to store auditory, phonological information in a short-term store may facilitate the comparison of deviant speech signals to words in the lexicon. The second skill investigated, phonological awareness, does not appear to be related to native-accented speech-in-noise perception for typically-developing children (Lewis, Hoover, Choi, & Stelmachowicz, 2010).
However, listeners who are better at consciously manipulating speech sounds (i.e., have better phonological awareness) may be more adept at resolving the mapping between unfamiliar acoustic-phonetic realizations of words and items in the lexicon. Therefore, we predict that phonological awareness will enhance the ability to decode nonnative-accented speech.
2 Method
2.1 Participants
Thirty-eight children (26 female) participated: 5-year-olds (n = 10), 6-year-olds (n = 9), 7-year-olds (n = 10), and 8-year-olds (n = 9). An additional four children were tested, but their data were not included due to hearing screening failure (n = 1), bilingual language background (n = 2), or failure to attend both testing sessions (n = 1). Forty-two adults (22 female) with an average age of 21 years (range = 18–23) were also tested. An additional 11 adults were tested, but their data were not included due to hearing screening failure (n = 3), computer error (n = 2), not meeting the age requirement for a standardized test (see below) (n = 3), trilingual language background (n = 1), or experimenter error (n = 2). All participants were monolingual American English speakers. Although some children had experience living in other parts of the U.S. (n = 8), most children had lived exclusively in Indiana (n = 26) or other neighboring Midwestern states including Ohio (n = 2) and Illinois (n = 2). Similarly, most adult participants grew up in Indiana (n = 30) or other Midwestern states including Michigan (n = 2), Ohio (n = 2), Illinois (n = 1), and Missouri (n = 1). The remaining adults grew up in other parts of the continental U.S.
All participants had normal hearing (as evidenced by passing a pure-tone hearing screening of 20 dB at 500, 1000, 2000 and 4000 Hz as well as 25 dB at 250 Hz). Children demonstrated developmentally appropriate vocabulary skills as shown by a standard score of 85 or higher on the Peabody Picture Vocabulary Test–4th edition (PPVT-4) (Dunn & Dunn, 2007) (average = 119; range = 97–150). Scores on the PPVT-4 are normally distributed standard scores in which the 50th percentile corresponds to a standard score of 100 with a standard deviation of 15. Therefore, the participants in our study had vocabulary scores that were average to very superior for their ages with a mean that was in a high average range. Children also had developmentally appropriate articulation skills based on their performance on the Goldman–Fristoe Test of Articulation–2nd edition (GFTA-2) (Goldman & Fristoe, 2000). The average standard score was 103 (range = 89–112) and all children scored in the 12th percentile or higher. In contrast to the PPVT-4, scores on the GFTA-2 are not normally distributed, but are positively skewed; standard scores and standard deviations cannot be interpreted in the same way as those for the assessments of abilities that are normally distributed.
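The relation between standard scores and percentiles on a normally distributed scale such as the PPVT-4 can be illustrated with a short sketch. The function name and defaults below are illustrative assumptions, not part of the test's scoring materials, and the conversion does not apply to the GFTA-2, whose scores are not normally distributed.

```python
import math

def standard_score_to_percentile(score, mean=100.0, sd=15.0):
    """Percentile rank of a standard score under a normal distribution.

    Hypothetical helper: assumes the scale is exactly normal with the
    given mean and standard deviation.
    """
    z = (score - mean) / sd
    # Normal cumulative distribution function expressed via the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A standard score of 100 sits at the 50th percentile; the inclusion
# cutoff of 85 is one standard deviation below the mean (roughly the
# 16th percentile).
print(standard_score_to_percentile(100))
print(round(standard_score_to_percentile(85), 1))
```

Under this assumption, the sample mean of 119 falls a little more than one standard deviation above the normative mean.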
Prior to participation, adult listeners completed an informed consent form and a language background questionnaire. For child participants, parents completed an informed consent form and a linguistic experience and exposure questionnaire. Children 7 years of age or older completed an assent form. On the linguistic experience and exposure questionnaire, parents reported their child’s exposure to various dialects and accents on a scale of 1–5 (1 = no exposure to the variety; 5 = frequent daily exposure). Average exposure ratings for the two accents used in the study–Spanish- and Korean-accented English–were 2.1 (range = 1–4) and 1.5 (range = 1–4), respectively. Adults noted no exposure (n = 37) or limited exposure (n = 11) to either of the nonnative accents included. Listeners were paid for their participation.
2.2 Materials
The stimuli consisted of 48 sentences used in the experimental trials and two practice sentences. These materials were adapted from the Audio-Visual Lexical Neighborhood Sentence Test–AV-LNST (Holt, Kirk, & Hay-McCutcheon, 2011). The AV-LNST was developed for use with children as young as 3 years of age. The sentences are composed of one- and two-syllable words that have been found in the speech of children between the ages of 3 and 5 years. The test will be referred to here as the LNST because only auditory versions of the sentences were used. The LNST is divided into 6 lists of 8 sentences. The sentence lists have equivalent intelligibility for adult listeners when produced by a native English speaker (Holt et al., 2011). Each sentence has three key words (e.g., The crazy turtle went away). Sentence recordings were taken from the Hoosier Database of Native and Non-native Speech for Children (Atagi & Bent, 2013; Bent, 2014).
Six talkers from three language backgrounds produced the sentences: two native talkers of American English with a midland dialect (the local dialect); two nonnative talkers of English with a first language of Spanish; and two nonnative talkers of English with a first language of Korean. Two nonnative accents were selected to assess perception of nonnative speech broadly, rather than the perception of a single nonnative accent. For residents of Bloomington, Indiana, Korean- and Spanish-accented English are likely two of the most commonly encountered nonnative accents. The talkers from each language background included one male and one female. Previously gathered intelligibility, foreign-accent strength, and comprehensibility scores for each talker are shown in Table 1. The intelligibility scores reflect percent of keywords accurately transcribed for all LNST sentences in quiet as evaluated by 10 monolingual adult listeners (Bent, 2010). Twenty-seven separate adult listeners rated the talkers’ foreign-accent strength and comprehensibility (Atagi & Bent, 2011). Foreign accent strength was rated on a scale of 1–9 (1 = no foreign accent; 9 = strong foreign accent) and comprehensibility was rated on a scale of 1–9 (1 = very easy to understand; 9 = very difficult to understand). The scores demonstrate that the nonnative talkers were less intelligible, less comprehensible, and more accented than the native talkers. These scores give baseline data on the intelligibility of the talkers under ideal listening conditions. Furthermore, the comprehensibility and foreign-accent scores provide a broad characterization of how the native and nonnative talkers differ on two scales that have been frequently used with nonnative speech (Munro & Derwing, 1995). Sentences were equated for root mean square amplitude.
Table 1. Intelligibility, comprehensibility, and foreign-accent strength ratings for talkers producing stimulus materials.
2.3 Procedure
Participants were tested individually in one session (adults) or two sessions (children). Participants were administered the Comprehensive Test of Phonological Processing (CTOPP) (Torgesen, Wagner, & Rashotte, 1999) and completed the sentence recognition test. The CTOPP has been normed for use with individuals between 5 and 24 years of age. The CTOPP includes six core tests that assess three phonological processing areas: phonological memory; phonological awareness; and rapid naming. The phonological memory component comprises a digit span task and a non-word repetition task. The phonological awareness test assesses participants’ elision (e.g., produce the word “plane” without saying /l/) and blending abilities (e.g., produce the word made up of these sounds: /p/ /l/ /eɪ/ /n/). Rapid naming was not included in the analyses because CTOPP implements different tests of rapid naming with 5- and 6-year-old children than with older children and adults. Therefore, the data on these subtests were not comparable for all child participants. The tests were administered and scored according to standardized protocols.
For the sentence recognition component, participants were tested individually in a sound-attenuated booth. Before the experimental trials, two practice trials produced by one native speaker with a midland dialect and one nonnative speaker not used in the experimental trials were presented. The nonnative speaker had a first language of Mandarin, a language background not employed in the experimental trials. Listeners were then presented with all 6 lists of sentences with a unique talker producing each list. Each list includes 8 sentences. Sentences were blocked by talker. The two lists (16 sentences) produced by the native speakers were always presented first. The order of the Korean- and Spanish-accented talkers and the female and male talker from each language background were counter-balanced across listeners. Sentences within a list were randomized for each listener. Sentences were embedded in a speech-shaped noise at a +3 dB signal-to-noise ratio. The signal-to-noise ratio was selected based on pilot testing, which indicated that most children’s performance would be at neither ceiling nor floor for the native or nonnative talkers. A random segment of noise was selected from a noise file so that there was 500 ms of noise before and after the sentence. Stimuli were played through a speaker (Yamaha MSP7 Studio Powered Monitor), which was 36 inches from the listener, at approximately 67 dB. A custom-written program in Python running on a Mac Mini was employed to present the stimuli. After each sentence was played, participants repeated the sentence they heard. They were encouraged to guess if they were unsure. No feedback was provided. An experimenter typed their response into a textbox on the computer screen. Responses were audio recorded for accuracy rechecking. For each child participant, a second research assistant, who was not present at the initial testing session, listened to the recording of the child’s responses. 
This second assistant noted trials in which she disagreed with the initial transcription. For trials in which there were discrepancies, the two transcribers met to resolve these disagreements. On average, the transcribers disagreed on 1% of keywords (range = 0–6% across children).
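The noise-embedding step described above (a random noise segment, 500 ms of leading and trailing noise, +3 dB signal-to-noise ratio) can be sketched as follows. This is a minimal illustration and not the custom presentation program used in the study: the function names are hypothetical, and it assumes the signal-to-noise ratio is defined on root mean square amplitudes computed over the whole sentence and the whole noise segment.

```python
import math
import random

def rms(samples):
    """Root mean square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db, sample_rate, pad_ms=500, rng=None):
    """Embed speech in a randomly chosen noise segment at snr_db.

    Returns (mixed, scaled_noise). Hypothetical helper; assumes the
    noise recording is longer than the padded sentence.
    """
    rng = rng or random.Random()
    pad = int(round(sample_rate * pad_ms / 1000))
    total = len(speech) + 2 * pad
    if len(noise) < total:
        raise ValueError("noise is shorter than the padded sentence")
    # Pick a random starting point in the noise file.
    start = rng.randrange(len(noise) - total + 1)
    segment = noise[start:start + total]
    # Scale the noise so that 20*log10(rms(speech)/rms(noise)) == snr_db.
    gain = rms(speech) / (rms(segment) * 10 ** (snr_db / 20))
    scaled = [n * gain for n in segment]
    mixed = list(scaled)
    # Add the sentence after the leading noise-only interval.
    for i, s in enumerate(speech):
        mixed[pad + i] += s
    return mixed, scaled

# Toy signals standing in for a recorded sentence and a noise file.
rng = random.Random(0)
sr = 1000
speech = [math.sin(2 * math.pi * 5 * i / sr) for i in range(1000)]
noise = [rng.gauss(0, 1) for _ in range(5000)]
mixed, scaled = mix_at_snr(speech, noise, 3.0, sr, rng=rng)
```

Because the sentences were equated for root mean square amplitude, a single noise gain per sentence of this kind would yield the same nominal signal-to-noise ratio across talkers.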
3 Results
The speech perception test was scored based on keyword accuracy. Words with added or deleted morphemes were counted as incorrect. Scores were averaged across the two talkers from each language background, giving one score for the native talkers, one for the Spanish-accented talkers, and one for the Korean-accented talkers. Scores were then converted to rationalized arcsine units (RAU) to facilitate meaningful statistical comparisons across the entire range of the scale (Studebaker, 1985). RAU scores range from -23 to 123 corresponding to 0 and 100 percent correct, respectively. The RAU transform converts proportions onto a scale that is linear and additive and also corrects for the relationship between means and variances. This transform is particularly important for scores that are on the ends of the scale (below 10% and above 90% correct).
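For readers unfamiliar with the transform, a minimal sketch of the rationalized arcsine formula from Studebaker (1985) is given here. The function name is an assumption, and the item count of 48 used in the example (2 talkers × 8 sentences × 3 keywords per accent) is inferred from the materials rather than stated as the unit on which the transform was applied.

```python
import math

def rau(correct, total):
    """Rationalized arcsine units for `correct` items out of `total`.

    Implements R = (146 / pi) * theta - 23, where theta is the sum of
    two arcsine terms (in radians) computed from the raw counts.
    """
    theta = (math.asin(math.sqrt(correct / (total + 1)))
             + math.asin(math.sqrt((correct + 1) / (total + 1))))
    return (146.0 / math.pi) * theta - 23.0

# Half of the items correct maps to the midpoint of the scale.
print(round(rau(24, 48), 6))  # 50.0
```

The scale is symmetric about 50, so the values for zero and all items correct always sum to 100, and scores near the ends of the range are stretched relative to raw percentages.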
Scores were analyzed with a repeated-measures analysis of variance (ANOVA) with one within-subjects factor (talker accent: native, Spanish, Korean) and one between-subjects factor (listener age: adult, child). Results showed main effects of talker accent and listener age, such that native talkers were more intelligible than nonnative talkers, F(2, 156) = 1462.08, p < 0.001, η2 = 0.97, and adults were more accurate than children, F(1, 78) = 162.49, p < 0.001, η2 = 0.68. There was also a significant two-way interaction between listener age and talker accent, F(2, 156) = 8.08, p < 0.001, η2 = 0.09.
The two-way interaction suggested that children’s performance declined more steeply in the presence of a nonnative accent than adults’ (Figure 1). The adults showed a decline of 28 percentage points (37 RAU) between the native- and Korean-accented conditions whereas children had a decline of 38 percentage points (40 RAU). The difference in accuracy between the native- and Spanish-accented talkers was 37 percentage points (45 RAU) for the adults and 52 percentage points (53 RAU) for the children. Viewed another way, the difference in performance between children and adults on the native condition was only 9 percentage points (15 RAU), whereas the performance gap between the two listener age groups was greater for the Korean-accented talkers (18 percentage points; 18 RAU) and the Spanish-accented talkers (23 percentage points; 22 RAU). To compare children’s and adults’ performance declines from the native to the nonnative talkers, difference scores were calculated on the percent correct values (i.e., native-accented minus Korean-accented and native-accented minus Spanish-accented). Two independent-samples t-tests of children’s versus adults’ difference scores indicated that children’s decline in native-to-nonnative accuracy was greater than adults’ for the Korean-accented talkers, t(78) = −6.47, p < 0.001, as well as for the Spanish-accented talkers, t(78) = −7.53, p < 0.001.
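The difference-score comparison can be sketched as follows. The data values are invented solely for illustration and are not the study's data; the function names are hypothetical, and the sketch assumes a pooled-variance (Student's) test with adults entered as the first group, which is consistent with the reported 78 degrees of freedom and negative t values but is not stated explicitly.

```python
import math
import statistics

def difference_scores(native_pct, nonnative_pct):
    """Per-listener decline: native minus nonnative percent correct."""
    return [n - a for n, a in zip(native_pct, nonnative_pct)]

def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance.

    Returns (t, df), where t is based on mean(group_a) - mean(group_b).
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, df

# Invented toy values (percent correct), three listeners per group.
adult_diff = difference_scores([95, 90, 100], [75, 60, 60])
child_diff = difference_scores([90, 85, 80], [60, 45, 30])
t, df = pooled_t(adult_diff, child_diff)
print(round(t, 3), df)  # -1.225 4
```

With 42 adults and 38 children, the same degrees-of-freedom formula gives 42 + 38 − 2 = 78.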

Figure 1. Keyword identification accuracy in rationalized arcsine units (RAU) for adults (left panel) and children (right panel) for the native (dotted gray), Korean (light gray), and Spanish (dark gray) accents. The box shows the 25th–75th percentiles with the line inside the box indicating the median. The whiskers indicate the 10th–90th percentiles with data falling below the 10th percentile and above the 90th percentile shown with the filled circles.
The child data were also analyzed separately to determine whether there was an effect of age within the group of child listeners. For this analysis, a repeated-measures ANOVA was conducted with one within-subjects factor (accent: native, Korean, Spanish) with age in months included as a covariate. Mirroring the ANOVA results with both age groups, there were significant main effects of accent, F(2, 76) = 18.8, p < 0.001, and age, F(1, 38) = 10.0, p = 0.003. However, there was not a significant interaction between age and accent (p = 0.44) suggesting that children’s keyword recognition improved on all three accents with increases in age.
In addition to the differences in keyword recognition accuracy between the adult and child age groups, children were also much more likely to provide non-word phonetic responses than adults. Non-word phonetic responses are phonetically related to the target word, but do not correspond to a real word in English. For example, for the sentence “many kids can learn to sing,” a child responded with “many kids candered the sing”. All examples of phonetic responses were of this type, in which only part of the response was a phonetic response embedded with other real words. Adults very rarely produced these non-word responses. For adults, less than 1% of trials for each accent type included a non-word response (native: 0.5%, Korean: 0.5%, Spanish: 0.2%). Only 6 of the 42 adults in the study produced a non-word response with those adults supplying non-word responses on 2 trials at most. Non-word responses for children were much more prevalent with 30 of the 38 children providing at least one non-word response. Averaged across children, non-word responses were observed in 2% of native-accented trials, 10% of Korean-accented trials, and 14% of Spanish-accented trials.
A large range of individual variability was observed on the word recognition task. Regression analyses assessed whether the variability in the sentence recognition task was related to the two phonological processing skills assessed (Table 2). Two stepwise linear regressions were conducted for each listener group, one for scores on the native talkers and one for scores on the nonnative talkers. The predictor variables were a phonological awareness score, which was a sum of the raw scores for the CTOPP blending and elision tasks, and a phonological memory score, which was a sum of the raw scores on the CTOPP digit span and non-word repetition tasks. Children’s scores for native sentences were significantly predicted by the phonological memory measure, R2 = 0.26, F(1, 37) = 12.83, p = 0.001. The addition of phonological awareness scores did not significantly improve the model. The regression for the adult scores on the native talkers was not significant, possibly due to a ceiling effect in this condition. Scores for nonnative-accented sentences were significantly predicted by the phonological awareness measure for both the children, R2 = 0.14, F(1, 37) = 6.04, p = 0.019, and the adults, R2 = 0.27, F(1, 41) = 15.04, p < 0.001. In both of these cases, the addition of phonological memory scores did not significantly improve the models.
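The single-predictor models reported above can be illustrated with a minimal sketch of the first step of such a regression. The data are invented toy values, the function names are hypothetical, and the residual degrees of freedom follow the textbook n − 2 convention, which may differ from how a given statistics package reports them; the full stepwise procedure with two candidate predictors is not reproduced.

```python
def composite(score_a, score_b):
    """Sum of two raw subtest scores (e.g., blending plus elision)."""
    return score_a + score_b

def simple_regression(x, y):
    """Ordinary least squares with one predictor.

    Returns (slope, intercept, r_squared, f_statistic, df_residual).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    df_residual = n - 2
    # With one predictor, F follows directly from R squared.
    f_statistic = r_squared / (1 - r_squared) * df_residual
    return slope, intercept, r_squared, f_statistic, df_residual

# Invented toy data: a predictor score and a recognition score.
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]
slope, intercept, r2, f, df_resid = simple_regression(predictor, outcome)
print(round(r2, 2), round(f, 2), df_resid)  # 0.6 4.5 3
```

In a stepwise procedure, the second predictor would be added only if it explained a significant share of the variance left over after this first step.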
Table 2. Average raw scores for four subtests of the Comprehensive Test of Phonological Processing (CTOPP); maximum scores on the CTOPP are 20 for the blending and elision tasks, 18 for the non-word repetition task, and 21 for the digit span task. Standard deviations are shown in parentheses.
4 Discussion
This study assessed children’s and adults’ abilities to recognize nonnative-accented words in sentences. All listeners demonstrated performance decrements for nonnative talkers compared to native talkers. However, children had more difficulty than adults overcoming the deviations from native norms found in nonnative speech. Previously, children’s perception of isolated words was compared to adults’ (Bent, 2014). Similar to the current study, there were significant effects of both talker accent (i.e., native talkers were more intelligible than nonnative talkers) and listener age group (i.e., adult word identification performance was more accurate than children’s performance) with isolated words. However, in Bent (2014), there was not a significant interaction between talker accent and listener age group; the two groups showed a similar performance decline for nonnative-accented words relative to native-accented ones. In contrast, in this study, there was a significant interaction between talker accent and listener age group; children exhibited a larger performance decrement for the nonnative-accented sentences than adults. Compared to words, sentences present additional acoustic-phonetic deviations from native norms that must be perceptually accommodated. Children appear to have greater difficulty overcoming the presence of a nonnative accent under these conditions.
Compared to isolated words, sentence-length utterances give listeners substantially more acoustic-phonetic information to hold in memory. With nonnative-accented sentences, listeners must hold this information in memory and then resolve imprecise matches between the acoustic or phonological traces and words in the lexicon. Children’s less developed cognitive abilities–including poorer selective auditory attention (Coch, Sanders, & Neville, 2005), complex working memory (Gathercole, 1999), and executive functions (Anderson, 2002)–may not allow them to fully conduct the cognitive manipulations needed to perceive nonnative-accented sentences while holding the information in memory.
Although sentences present additional cognitive and perceptual challenges, meaningful sentences allow listeners to take advantage of top-down processing. Adults may be more adept than children at utilizing sentence context to constrain possible interpretations of unclear bottom-up information. Once children have made a hypothesis about a word, they may be less likely than adults to revise their initial hypothesis based on additional information gathered from the sentence context. Studies with sentences that are designed to manipulate predictability would allow for the explicit assessment of whether children and adults differ in their use of sentence context with unfamiliar accents.
In addition to potential listener age differences in the use of sentence context, children are more likely than adults to assume that an unfamiliar pronunciation is simply a new, unknown word, rather than an imperfect match to a known word. In previous work with isolated words, children were observed to produce non-word phonetic responses (i.e., imitating the phonetic form of the word without matching it to a lexical item) when identifying words produced in nonnative accents and unfamiliar native dialects (Bent, 2014; Nathan et al., 1998) whereas adults rarely did so. In the current study, children also provided many more phonetic responses than adults, particularly for the nonnative-accented trials. Even in cases in which children have top-down information from the sentence context, they continue to supply non-word responses, particularly in cases in which the talker has an unfamiliar, nonnative accent. Due to children’s smaller lexicons and daily experience of learning new words, they may be more biased than adults towards assuming that a novel pronunciation represents a word that is unknown to them.
Adults’ greater linguistic experience may also help them to overcome deviations from native norms present in nonnative-accented speech. Although participants reported minimal exposure to the specific accents tested, adults typically have more experience with a wide range of idiolects, dialects, and accents than children. Even without exposure to a specific accent, experience with a broad range of accents and dialects may facilitate recognition of nonnative-accented words (Baese-Berk, Bradlow, & Wright, 2013; Schmale, Seidl, & Cristia, 2014). Knowledge of the range of possible instantiations for a single word may allow adults to more easily consider possibilities for a novel pronunciation of a known word than children who have many fewer exemplars to draw upon. Future assessments should attempt to capture children’s experience with both regional variation and nonnative accents to determine whether exposure in daily life to these types of inter-talker variations benefits perception of novel talkers with both familiar and unfamiliar accents.
In addition to developmental differences, large individual differences within groups were observed. To investigate linguistic skills related to these individual differences, children’s and adults’ phonological processing skills were assessed as possible predictor variables. The phonological processing skills that were most predictive of word recognition scores differed depending on whether talkers were native or nonnative. The phonological memory measure was more closely associated with perception of native-accented speech for children. In contrast, the phonological awareness measure was more closely associated with accurate perception of nonnative-accented sentences for both listener groups. The process of recognizing words that deviate substantially from previously experienced instantiations may require the consideration of alternative interpretations and best fits. Children and adults who are better able to consciously manipulate speech sounds (i.e., have greater phonological awareness) may also be more adept at the primarily subconscious process of mapping words with novel pronunciations onto items in the lexicon.
Previous work suggests that phonological awareness abilities are not related to perception of native-accented speech in noise (Lewis et al., 2010). Here, phonological memory was the strongest predictor for the perception of native-accented speech in noise for children. In cases in which the dialect of the talker and the listener are well aligned, the perceptual manipulations needed for interpreting nonnative speech may not be as heavily recruited. Although the phonological processing variables investigated were significant predictors of word recognition abilities (accounting for 14–27% of the variance across conditions), much individual variability remains unaccounted for. Explicit assessment of cognitive skills (e.g., working memory, task switching, selective attention, and executive functions) as well as linguistic skills and knowledge (e.g., vocabulary size, precision of phonological categories, and indexical processing) should be undertaken to account for more of the variance across listeners.
5 Conclusion
Spoken communication rarely occurs under ideal listening conditions. For example, elementary school classrooms frequently exhibit signal-to-noise ratios that result in speech perception difficulties for many young children (Bradley & Sato, 2008). Speech perception in challenging environmental conditions (e.g., in noise, reverberation, or with competing talkers) is more difficult for children than adults, with mature abilities emerging only in adolescence (Johnson, 2000; Neuman & Hochberg, 1983; Wightman & Kistler, 2005). This study demonstrated that, in addition to a speech-in-noise deficit, children have greater difficulty than adults overcoming the presence of a nonnative accent for words in sentences. Further, the phonological processing skill that best predicted accurate perception of nonnative-accented speech differed from the skill that best predicted perception of native-accented speech. Specifically, there may be a heavier reliance on phonological awareness during the perception of nonnative- than native-accented speech. These findings add to the literature on the cognitive and linguistic skills that contribute to individual differences in speech perception.
Acknowledgements
We would like to thank our research assistants (Marissa Ganeku, Steven Elmlinger, Jessica Copperman, Nancy Eastman, and Matti Toone) for their assistance in data collection and analysis, as well as Charles Brandt for writing the experimental software. Earlier versions of this work were presented at meetings of the Society for Research in Child Development, Seattle, Washington, April 2013, and the Psychonomic Society, Minneapolis, Minnesota, November 2012.
Funding
This work was supported by the National Institutes of Health (grant number R21DC010027).
