Abstract
This article considers the continuing debate in pronunciation instruction (PI) about whether segmental or suprasegmental features are more important in teaching English to speakers of other languages. While evidence has accumulated on both sides of the debate, the emergence of the notion of English as a Lingua Franca (ELF) further complicates the issue. This article provides a review of current research supporting the different views in the segmental/suprasegmental debate. The review highlights research evidence that examines either the impact of segmental and suprasegmental features on intelligibility or the effectiveness of teaching these features to improve intelligibility. A review of this line of research underlines the context-specific nature of the debate and a third view that blurs the boundary between segmentals and suprasegmentals.
Introduction
Against the backdrop of the paradigm shift from a form-based to a communicative approach in teaching English as a second/foreign language (ESL/EFL), pronunciation instruction (PI) has been increasingly integrated into communicative teaching, where its main aim is the intelligibility of learners’ utterances. The identification of pronunciation features that influence a speaker’s intelligibility has become a focus in an emerging body of research into PI in ESL and EFL contexts. Such features are mainly categorized as either segmental (individual sounds, e.g. vowels, consonants) or suprasegmental (extending over syllables, words, or phrases, e.g. stress, rhythm, intonation).
A longstanding debate concerns whether segmental or suprasegmental features should be taught as a priority in PI. Two opposing views emerge from this debate: some argue that suprasegmentals should be given priority in PI as they have a greater impact on intelligibility (e.g. Tanner and Landon, 2009), whereas others claim the opposite (e.g. Jenkins, 2002). Zielinski (2015), however, approaches this debate differently by challenging the assumption that segmental and suprasegmental features are independent entities. She argues that both types of feature are important and should not always be viewed separately. This review discusses the three views. First, it synthesizes the arguments and evidence supporting either segmentals or suprasegmentals as being more important than the other. It then focusses on Zielinski’s (2015: 409) claim that ‘the segmental/suprasegmental debate is based on a false dichotomy’. Finally, pedagogical and future research implications are discussed.
The Importance of Suprasegmentals
Traditionally, pronunciation materials or curriculums start from small segmental elements and move towards larger suprasegmental features. However, such a linear approach has been criticized for not attending to the bigger picture; as a result, learners may find it difficult to understand how the various elements fit together in utterances (Pennington and Rogerson-Revell, 2019). Thus, some advocate that suprasegmentals should be given priority in PI as they have a greater impact on intelligibility. Fraser (2001), for example, proposed that the order in which pronunciation features are addressed should be based on how they affect listener comprehension. She argues that native listeners rely on stress patterns much more than on segmental features and that incorrect stress patterns will render the speaker’s utterance unintelligible even if the speaker has appropriate consonant production. This view is embraced by Chela-Flores (2001) and Tanner and Landon (2009), who concur that suprasegmental aspects deserve to be given emphasis and priority as they have more impact on intelligibility and thus are more relevant to students’ immediate pronunciation needs.
Research has lent support to the importance of appropriate suprasegmental production for intelligibility and proficiency ratings (e.g. Hahn, 2004; Kang et al., 2010). Drawing on 26 speech samples obtained from the iBT TOEFL Practice Online test, Kang et al. (2010: 564) found that ‘suprasegmental features alone can collectively account for about 50% of the variance in proficiency and comprehensibility ratings’. Similarly, Isaacs and Trofimovich’s (2012) study with 40 ESL learners suggests that word stress was the most salient feature differentiating ESL speakers of different proficiency levels. Evidence has also accumulated indicating that suprasegmental-based PI may be more effective than segmental-based PI. Derwing et al. (1998) compared the instructional gains of three types of PI, namely, explicit segmental-based PI, suprasegmental-based PI and nonspecific PI (control group). Although both groups receiving explicit PI showed improvement in perceived accentedness and intelligibility in a controlled, read-aloud task, only the group instructed in suprasegmentals improved in intelligibility and fluency in a less-controlled narrative task. In a similar study documented by Gordon et al. (2013), only the group that received explicit instruction on suprasegmentals showed significant improvement in comprehensibility scores. Similar findings were also presented in Gordon and Darcy (2016), who reported that only the group instructed in suprasegmentals showed improvement in intelligibility. This may be because, as the researchers noted, suprasegmental-based PI involved communicative contexts, while segmental-based PI only focussed on the lexical level. That said, the researchers still suggest that suprasegmental instruction may be more effective in short-term PI interventions. Such studies, however, are relatively rare in the existing literature. Many studies only support the effectiveness of suprasegmental-based PI without addressing its effectiveness relative to segmental-based PI (e.g. Derwing et al., 1997; Derwing et al., 2014; Saito and Saito, 2017; Tanner and Landon, 2009).
The Importance of Segmentals
On the other side of the debate, many argue that segmental features have a greater impact on intelligibility and thus should be given emphasis in PI. Collins and Mees (2013), for example, identified seven pronunciation features that they consider the most important to intelligibility and listed them in the order in which they should be taught. The top five on the list were different segmental features such as vowel and consonant production accuracy, whereas the only two suprasegmental features, stress and intonation, were at the end of the list. In line with the list, research has highlighted the importance of accurate segmental production to intelligibility. Rogers and Dalby (2005) itemized a minimal-pairs probe list to examine the intelligibility of segmental elements produced by Mandarin-L1 students. The results show that 76% of the variance in speakers’ utterance intelligibility can be accounted for by seven phonemic category features. In a similar vein, Bent et al. (2007) found that accurate production of vowels, as well as consonants in the word-initial position, significantly correlates with intelligibility. Similar findings were reported in Saito’s (2011a) study with Japanese learners of English. Eight segmentals, /æ, f, v, θ, ð, w, l, ɹ/, were found to have a significant influence on native speakers’ (NSs) speech perception. Notwithstanding the paucity of evidence supporting the relative effectiveness of segmental-based PI over suprasegmental-based PI, abundant experimental research has highlighted the effectiveness of segmental-based PI. Studies examining the impact of segmental-based PI on intelligibility have demonstrated instructional gains in controlled measurement tasks (e.g. Derwing et al., 1998; Saito, 2011b; Saito and Lyster, 2012) as well as in free constructed responses (Saito and Lyster, 2012).
While previous discussions and studies almost exclusively focus on intelligibility for native listeners, Jenkins (2000) argues that PI needs to be geared towards intelligibility for non-native listeners since non-native speakers (NNSs) are now frequently interacting with other NNSs using English as a lingua franca (ELF). Jenkins (2000: 135) contends that segmentals are more important than suprasegmentals in NNS-NNS communication and that speakers’ adoption of some suprasegmental features such as reductions may even ‘obstruct intelligibility’. Drawing on field data collected in various classroom and social situations, Jenkins (2002) further points out that the most common communication breakdowns are attributable to the problematic pronunciation of segmental features, such as replacing the sound /f/ in failed with /p/. She thus proposed the Lingua Franca Core, a set of crucial pronunciation features that function as safeguards to mutual intelligibility among NNS interlocutors. The main core items were categorized into five groups, among which four were segmentals and only one had to do with suprasegmentals (i.e. the appropriate use of nuclear stress). It is noteworthy that suprasegmental features such as word stress and intonation, which are regarded as substantially important to intelligibility in previous studies (e.g. Hahn, 2004), were categorized as non-core features by Jenkins (2002).
Reexamining the Segmental/Suprasegmental Dichotomy
The segmental/suprasegmental debate rests on the premise that segmental and suprasegmental features are separate entities. This dichotomy is reified in previous research that viewed and measured the two separately and categorized pronunciation problems as either one or the other. Zielinski (2008), however, contends that this is a false dichotomy and that it can be difficult to categorize the problematic or non-target-like features as either segmental or suprasegmental. She points out the two-way nature of intelligibility by arguing that both the speaker and the listener have a role to play when the utterance is rendered unintelligible. From the speaker’s point of view, the focus is on how the words were pronounced and why they were produced that way, while from the perspective of the listener, the focus shifts towards what the listeners misheard and what misled them. A speaker’s pronunciation feature that reduces intelligibility can be categorized differently depending on the perspective from which the feature is analysed. An example from Zielinski (2015) is the categorization of a Mandarin L1 speaker’s pronunciation feature of epenthesis – adding an extra sound, usually a schwa, to the end of a word. Epenthesis is a common problem among Chinese learners of English because Mandarin does not have word-final consonant clusters. Notwithstanding the segmental nature of the extra vowel, the speaker’s addition of a vowel is attributed to the syllable structure constraints of her L1 Mandarin, which is a suprasegmental issue. The listeners, two NSs of Australian English, heard two words (just a/don’t) as they were misled by the change in the syllable pattern caused by the added vowel, which is also related to suprasegmentals.
Apart from the complexity of categorizing pronunciation deviations as either segmental or suprasegmental, Zielinski (2015) contends that the segmental/suprasegmental dichotomy ignores the possible interaction between the two as components of an integrated system. It may be this interaction that influences intelligibility. This view resonates with Weismer and Martin’s (1992: 83) argument that ‘modifications of segmental elements. . .may influence not only the perception of those particular segments but also the perception of the rhythmic structure of the utterances as a whole’. Hence, instead of debating whether it is more important to teach segmental or suprasegmental features, it may be more sensible to view both of them ‘as part of an integrated and interactive system where the production of one can influence the other’ (Zielinski, 2015: 402).
Discussion
The debate over segmentals and suprasegmentals appears to end in a stalemate among the different voices and seemingly contradictory evidence, but a closer look at the literature shows marked heterogeneity in research design across these studies. The studies reviewed in this article vary substantially in scale (n = 1–75), learners’ L1 background (Japanese, Spanish, Chinese, Korean and French being the most common) and research methods such as intervention duration (4 hours to 12 weeks) and target forms (see Table 1), all of which could affect the results and conclusions of a study.
Table 1. Characteristics of Studies Included in the Review.
Notes: n/aᵃ = Participants’ L1s are not reported in the study; n/aᵇ = Intervention duration does not apply because the study only identified pronunciation features that influence intelligibility with no PI intervention provided.
The variation in previous findings may be attributable to the diverse L1 backgrounds of participants – learners with different L1s face different opportunities and challenges stemming from language transfer. For example, as Mandarin has contrastive stress at the word level that is absent in Korean, Mandarin speakers can have an advantage over Korean speakers in stress processing of English words (Lin et al., 2014). Different intervention durations and target forms of the instruction may also lead to different results. Therefore, the interpretation of conclusions drawn from previous findings warrants caution.
The discussion would not be complete or fair without mention of the learning context. Learning context matters in ESL/EFL learning in that it shapes, or at least influences, learners’ beliefs and decision-making in their English learning trajectory. Notwithstanding the merits of Jenkins’s (2002) Lingua Franca Core in modifying pronunciation standards for ELF communication, there are foreseeable challenges in the promotion of the Lingua Franca Core in contexts where the exonormative NS model prevails. One of these contexts is mainland China, where English is taught as a foreign language following an exonormative NS model. The Lingua Franca Core is proposed to facilitate ELF interactions among speakers of different L1s, whereas English classrooms in mainland China usually consist of learners with the same L1 who barely have the opportunity to experience ELF communication. Regarding language attitudes, Chinese EFL learners generally still prefer the exonormative NS model (e.g. Fang and Ren, 2018; Kung and Wang, 2019). While it is important for English language teaching (ELT) practitioners to inform students of the growing trend of ELF, teachers should also support those who aspire to follow the exonormative NS model and to reduce non-target-like pronunciation features. In this sense, there seems to be no standard answer to whether it is more important to teach segmentals or suprasegmentals.
Challenging the segmental/suprasegmental dichotomy, Zielinski’s (2015) stand on this debate opens a new avenue for discussion. Segmentals and suprasegmentals are not implacable foes but constant companions. Both are integral parts of the pronunciation system that can interact positively with and build on each other. Notwithstanding the potential benefits of categorizing learners’ pronunciation features, non-target-like pronunciation features can be categorized differently depending on the perspective from which the analysis is made. The PI process would be unnecessarily complicated if every non-target-like feature had to be categorized as either segmental or suprasegmental. There can be a fine line between segmental and suprasegmental issues for ESL and EFL learners with different L1 pronunciation systems ingrained or imprinted since birth or even since before birth (i.e. in the womb, where the maternal voice becomes familiar to the fetus). Instead of viewing the acquisition of segmentals and suprasegmentals as fundamentally different processes, it would be more productive to take a holistic perspective.
Pedagogical and Future Research Implications
A number of implications can be derived from the discussion. First, since very few studies appear to show that one is more important than the other, and a meta-analysis of PI effectiveness showed a greater effect when both segmental and suprasegmental features are included in the instruction (Lee et al., 2015), teachers should awaken students to the importance of both. Second, it is important for teachers not to be constrained by the segmental/suprasegmental dichotomy but to be aware of and take advantage of the interactive relationship between the two. Teachers may help students to analyse their pronunciation features and conceptualize the interaction between segmental and suprasegmental features from the perspectives of both the speaker and the listener. Third, rather than uncritically adopting conclusions from previous research, ELT practitioners should familiarize themselves with learners’ L1 background to help them identify and tackle the features they find challenging to pronounce or discriminate. For instance, Korean learners usually need to attend to stress at the word level (Lin et al., 2014), while Chinese learners often need extra help to address the schwa epenthesis problem (Zielinski, 2015). Thus, teaching learners with different L1s may require a shift in the focus of instruction. Teachers may inform their teaching with research evidence and experiment with various PI approaches that have proved effective in previous research conducted with similar students and in similar contexts. For Japanese learners, for example, Saito’s (2011b) study demonstrated the effectiveness of explicit instruction (perception activities followed by production activities and feedback provision) focussed on eight English-specific segmentals that do not exist in Japanese, i.e. /æ, f, v, θ, ð, w, l, ɹ/. ELT practitioners are further advised to adapt PI designs, such as the duration of intervention and assessment criteria, to learners’ needs and profiles of pronunciation mastery.
The emergence of notions such as ELF reflects the changing demographics of English speakers in today’s world (e.g. McKay, 2018), which calls attention to intelligibility from the non-native listener’s perspective, especially given that the existing research is predominantly conducted with native listeners. To fill this gap, more research needs to be conducted with NNS listeners, not only to investigate the impact of segmentals/suprasegmentals on intelligibility but also to examine the effectiveness of segmental- and suprasegmental-based PI in facilitating ELF communication. This will shed more light on the role of segmentals and suprasegmentals in ESL/EFL learning and on what types of PI are suitable for different contexts and purposes.
Conclusion
The segmental/suprasegmental debate has been a topic of conversation for decades. This review suggests the debate may not be valid when taken out of context, since learners with different L1s and in different learning contexts may have different needs. Putting the segmental/suprasegmental dichotomy aside, no evidence appears to dispute that both types of feature are important and that addressing one may also improve aspects of intelligibility related to the other. While evidence for the advantages of both segmental- and suprasegmental-based PI is gaining momentum, the emergence of ELF brings a new dimension to the discussion. The existing research has almost exclusively had NSs as listeners to judge the intelligibility of NNSs’ speech, which does not reflect the rapidly growing trend of ELF. Considerations of ELT call for more research conducted with NNSs as listeners to accommodate the knowledge base of PI to the changing demographics of English users in today’s world.
