Abstract
Research on the perception of word stress suggests that speakers of languages with non-predictable or variable stress (e.g., English and Spanish) are more efficient than speakers of languages with fixed stress (e.g., French and Finnish) at distinguishing nonsense words contrasting in stress location. In addition, segmental and suprasegmental cues to word stress may also impact on the ability of speakers to perceive stress. European Portuguese (EP) is a language with variable stress and vowel reduction. Previous studies on EP have identified duration as the main cue for stress. In the present study, we investigated the perception of word stress in EP, both in nuclear (NP) and post-nuclear (PN) positions, by means of three experiments. Experiment 1 was an ABX discrimination task with stress and phoneme contrasts, without vowel reduction. Experiments 2 and 3 were sequence recall tasks with stress and phoneme contrasts, vowel reduction being added to the stress contrast only in experiment 3. Results showed significantly higher error rates in the stress contrast condition than in the phoneme contrast condition, when duration alone (PN), or duration and pitch accents (NP), are present in the stimuli (experiments 1 and 2). When vowel reduction is added, EP speakers are able to perceive stress contrasts (experiment 3). The results show that vowel reduction appears to be the most robust cue for stress in EP. In the absence of vowel quality cues, a stress “deafness” effect may emerge in a language with non-predictable stress that combines both suprasegmental and segmental information to signal word stress. These findings have implications for claims of a prosodic-based cross-linguistic perception of word stress in the absence of vowel quality, and for stress “deafness” as a consequence of a predictable stress grammar.
1 Introduction
In this paper we investigate the perception of stress in European Portuguese (EP). EP displays an uncommon combination of cues to signal word stress (longer duration in stressed syllables, vowel reduction in unstressed syllables, and low co-variation between stress and pitch accent, given that most stressed syllables are unaccented in multiword utterances). It is therefore an interesting test case for examining stress perception and for extending cross-linguistic research on the perception of word stress.
1.1 Stress perception across languages
Word stress is a prosodic dimension that varies across languages. In some languages, stress position varies within the word and this variation is lexically contrastive (e.g., English, Spanish, European Portuguese). In other languages, stress position is fixed (e.g., French and Finnish). Previous studies have shown that speakers of languages with non-predictable or variable stress are more efficient than speakers of languages with predictable or fixed stress at distinguishing nonsense words that vary only in stress position (Dupoux, Pallier, Sebastian, & Mehler, 1997; Dupoux, Peperkamp, & Sebastian-Galles, 2001; Peperkamp & Dupoux, 2002; Peperkamp, Vendelin, & Dupoux, 2010). However, the degree of stress “deafness” also varies across languages with fixed stress, depending on the phonological properties of the language (Dupoux et al., 2001). Previous results from cross-linguistic studies (Dupoux et al., 1997, 2001; Peperkamp & Dupoux, 2002; Peperkamp et al., 2010) have shown that four factors determine the degree of speakers’ stress “deafness”: (1) domain of stress; (2) lexical use of stress cues; (3) variability in the stress position; and (4) presence of lexical exceptions. It is suggested that in a language where phrases, and not words, are the domain for stress, speakers may disregard stress to identify words. For instance, in French, stress signals the phonological phrase but not the word. Hence, French speakers do not make use of stress to identify words. The use of stress cues (duration, pitch, intensity, vowel quality) in a given language to contrast lexical items may also impact on the degree of speakers’ stress “deafness.” For instance, in Finnish and English, variations in duration are used not only to signal stressed syllables (stressed syllables are longer than unstressed syllables), but also to contrast vowels (e.g., keeper/kipper, in English). In these languages, duration has a phonological role that is not restricted to stress. 
Therefore, speakers of a language in which duration is used for lexical contrasts should be better at distinguishing words contrasting in stress location if, in that language, stress is cued by duration. The variability in the distribution of stress in a given language may also predict the degree of speakers’ stress “deafness.” Whereas in Finnish and French, two languages with a high degree of stress “deafness,” stress is fixed (it is word-initial or it falls on the last syllable of the phonological phrase, respectively), in variable stress languages, like Spanish, stress “deafness” was not detected at all. In Spanish and in English, different stress locations may distinguish words (e.g., SÁbana-saBAna, in Spanish; SUBject-subJECT in English) and stress is therefore expected to be a property speakers need to be aware of in word recognition and identification. Finally, the fourth factor that influences stress “deafness” is the presence/absence of lexical exceptions to the general stress rule and the most frequent stress location in a language. In French, stress falls on the last syllable of a phonological phrase, without exception. In Polish, however, there are contexts where irregular (non-penultimate) stress can occur, although the language has fixed stress. Importantly, according to Peperkamp and Dupoux (2002) and Peperkamp et al. (2010), languages with non-predictable or variable stress, like Spanish, should not undergo stress “deafness.”
The correlates of stress in a particular language are also relevant for stress perception. The phonetic exponents of stress vary cross-linguistically and languages differ in the way acoustic parameters combine to cue word stress, showing various combinations of suprasegmental cues (duration, F0 and intensity) and sometimes also segmental cues (vowel quality). Spanish, a language with no phonological vowel reduction, uses suprasegmental cues (duration and F0), whereas Catalan and English, which have vowel reduction, use a more diverse set of acoustic cues (duration, overall intensity and spectral tilt – Campbell & Beckman, 1997; Fry, 1958; Ortega-Llebaria & Prieto, 2010; Ortega-Llebaria, Vanrell, & Prieto, 2010).
Languages with similar phonological properties may also display different perceptual outcomes because of distinct phonetic cues to stress. For instance, English and Dutch, two languages with vowel length contrasts and vowel reduction, vary in the acoustic correlates of word stress and in the frequency of reduced vowels (English has more words with reduced vowels than Dutch, especially in word-initial position). This may explain why vowel quality has a stronger effect in the perception of word stress in English than in Dutch as, indeed, segmental information related to vowel quality appears to have a facilitating effect in stress-match/-mismatch conditions, in English (Cooper, Cutler, & Wales, 2002; Sluijter & van Heuven, 1996). Although previous studies have demonstrated the influence of suprasegmental cues in stress perception in English, segmental cues like vowel reduction in unstressed syllables have been shown to play a role in lexical activation and spoken-word recognition, when competition for lexical selection is at stake (Braun, Lemhöfer, & Cutler, 2008; Cutler & Pasveer, 2006; Cutler, Wales, Cooper, & Janssen, 2007; van Donselaar, Koster, & Cutler, 2005). The results for English show that speakers do not discard either segmental or suprasegmental information when they are primed with a monosyllabic fragment (e.g., mus-) to two words that vary in stress (e.g., music, museum).
Recent investigation shows that the type of co-variation between word stress and pitch accent also impacts on stress perception. In intonation languages such as English, Spanish or EP, pitch accents mark prominent positions at the phrase level, whereas word stress marks prominent positions at the word level (Gussenhoven, 2004; Ladd, 2008). However, pitch accents are typically associated with stressed syllables, making them more salient via the addition of another level of prominence. High co-variation between word stress and pitch accent occurs when every stressed syllable tends to be pitch accented, whereas low co-variation obtains when stressed syllables tend to be accented almost always only in nuclear position. Pitch accent becomes a stronger cue for word stress if there is high co-variation between stress and pitch accent. The relation between stress and pitch accent patterns differently across languages, depending, for example, on the domain for pitch accent distribution of the language (Hellmuth, 2007). In Spanish nearly every word gets a pitch accent (Hualde, 2002), whereas in EP pitch accent distribution is sparse and thus many words are not pitch accented (Frota, 2002, 2014). Research on the perception of stress in accented (typically nuclear) and unaccented (typically post-nuclear) contexts in stress-accent languages has shown that pitch accents, however, are not necessary for stress perception. Speakers of Catalan, Dutch and English were able to discriminate words contrasting in stress location (final vs. penultimate), based on duration and overall intensity, in the absence of pitch accents (Fry, 1958; Ortega-Llebaria et al., 2010; Sluijter, van Heuven, & Pacilly, 1997; Turk & Sawusch, 1996). 
Although stress perception appears to be cued by varying sets of acoustic cues (duration and F0 in Spanish; duration and vowel reduction in Dutch and English; duration, overall intensity and vowel reduction in Catalan), there seems to be a general agreement that duration is a suprasegmental cue that influences the perception of word stress cross-linguistically. These results led to the claim that word stress perception is universally based on suprasegmental cues (namely duration), irrespective of languages’ segmental cues to stress (Ortega-Llebaria & Prieto, 2009, 2010; Ortega-Llebaria et al., 2010).
As already mentioned, the performance of speakers in stress perception, lexical activation, word recognition and short-term memory tasks depends on a number of language-specific phonological factors. However, methodological issues also appear to influence stress perception results. Although stress “deafness” was reported in general for speakers of fixed stress languages, like French, as opposed to the general ability to perceive stress by free stress language speakers, detailed analyses have shown some individual variability in French data. In Dupoux et al.’s (1997) study, participants listened to sequences of three non-words (ABX) differing only in stress position. Their task was to determine whether the third word (X) was equal to the first (A) or the second (B). The results showed that French speakers were “deaf” to stress, with an average error rate of 20%, whereas Spanish speakers discriminated stress successfully, with an average error rate of 4%. Despite the difference between the error rates for French and Spanish being highly significant, French speakers generally performed better than chance, and a detailed analysis showed that some of them scored as highly as a typical Spanish speaker, whereas some Spanish speakers scored as badly as a typical French speaker. These results were interpreted by Dupoux and colleagues as following from a traceable phonological representation of stress in some French speakers. Conversely, some Spanish speakers failed in processing and representing stress. The authors indicate that it is likely that some participants were discriminating stress contrasts based on acoustic information, and propose that a more abstract processing level can be accessed by means of memory loading. Dupoux and colleagues later ran a series of experiments in which they investigated stress perception in a sequence recall task (Dupoux et al., 2001; Peperkamp et al., 2010). 
In these studies, French and Spanish speakers had to recall non-words varying only in stress in a short-term memory task with phonetic variability. Different levels of memory load and processing demands were elicited. Participants were asked to recall the order in which non-words appeared in sequences of variable lengths (two to six non-words), with multiple tokens uttered by different speakers. Unlike the ABX task, sequence recall is highly demanding for memory and processing; furthermore, it is not a mere discrimination task, but rather an encoding task that provides access to an underlying representation. Results of cross-linguistic research on stress perception in a sequence recall task with phonetic variability confirm a stress “deafness” effect in French but not in Spanish speakers. In French, the error rate in the stress condition was 89% (Dupoux et al., 2001) and 78% (Peperkamp et al., 2010), compared with an error rate of 39% (Dupoux et al., 2001) and 47% (Peperkamp et al., 2010) in Spanish. In the phoneme condition, by contrast, the error rate for French speakers was 40% (Dupoux et al., 2001) and 34% (Peperkamp et al., 2010), whereas Spanish speakers had error rates of 27% (Dupoux et al., 2001) and 48% (Peperkamp et al., 2010).
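One simple way to read the figures above is to subtract the phoneme error rate from the stress error rate for each language and study. The following minimal sketch does only that arithmetic on the percentages quoted in the text; treating this difference as a summary of stress “deafness” is an assumption made for illustration, and the names `reported` and `stress_minus_phoneme` are hypothetical.

```python
# Error rates (%) in the sequence recall task, exactly as quoted in the text.
reported = {
    ("French", "Dupoux et al., 2001"): {"stress": 89, "phoneme": 40},
    ("French", "Peperkamp et al., 2010"): {"stress": 78, "phoneme": 34},
    ("Spanish", "Dupoux et al., 2001"): {"stress": 39, "phoneme": 27},
    ("Spanish", "Peperkamp et al., 2010"): {"stress": 47, "phoneme": 48},
}


def stress_minus_phoneme(rates):
    """Difference in percentage points between stress and phoneme error rates."""
    return rates["stress"] - rates["phoneme"]


# Illustrative summary only: a larger positive value means stress contrasts
# were recalled much less accurately than phoneme contrasts.
differences = {key: stress_minus_phoneme(rates) for key, rates in reported.items()}

for (language, study), diff in differences.items():
    print(f"{language:8s} {study:24s} {diff:+d} points")
```

On these figures the gap is 49 and 44 points for French, against 12 and -1 points for Spanish, which is the asymmetry the text describes.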
1.2 The perception of word stress in European Portuguese
In the present study, we investigated the perception of word stress by European Portuguese native speakers. The results found for fixed stress languages like French, and variable stress languages like Spanish, described above, will be compared with new data from EP. We will follow the same methodologies used in Dupoux et al. (1997, 2001), with the goal of obtaining comparable and reliable results related to the phonological encoding of stress in a language that presents a mix of prosodic properties not usually found together in Romance languages.
In EP word stress is variable, as it may fall within the last three syllables of the prosodic word, and it is lexically contrastive (e.g., bambo “lax” [ˈbɐ̃bu]/bambu “bamboo” [bɐ̃ˈbu]). Data from the FrePoP database (Frota, Vigário, Martins, & Cruz, 2010), containing 1,486,092 prosodic words from spoken and written language, show that 74% of words with two syllables or more are stressed on the penultimate syllable, 23% of words are stressed on the final syllable and 2% of words have stress on the antepenultimate syllable. Vowel reduction is a general phenomenon in unstressed position, both in pre- and post-tonic position (with few exceptions): /i, e, ɛ, a, o, ɔ, u/ are realized as [i, ɨ, ɐ, u] in unstressed positions, as shown in Table 1. Phonological /a/ and /e/ may surface in stressed position as [ɐ], before nasals and palatals, respectively (e.g., cama [ˈkɐmɐ] “bed,” telha [ˈtɐʎɐ] “tile”).
Table 1. Vowels in stressed and unstressed position in European Portuguese.
Duration was reported to be the main cue for word stress in the absence of vowel reduction (Andrade & Viana, 1989; Delgado-Martins, 1977, 1986). In a perception experiment (Delgado-Martins, 1977, 1986), Portuguese speakers were asked to identify the stressed syllable in the triplet explícito [ʃˈplisitu] “explicit” – explicito [ʃpliˈsitu] “I make explicit” – explicitou [ʃplisiˈto] “(s/he) made explicit.” The stimuli were manipulated for the duration and intensity of the stressed syllable, from each of the three original tokens. The results showed that duration, but not intensity, was the relevant cue for stress identification, especially if stress was on the antepenultimate or final syllable. A decrease in duration in the case of penultimate stress did not induce stress displacement, but if the duration of the antepenultimate or final stressed syllable was reduced, stress was perceived as being elsewhere. The results found in perception experiments were supported by studies on stress production, as acoustic analyses showed that stressed syllables are longer than unstressed syllables (Andrade & Viana, 1989). However, further research on the perception of stress in EP showed that Portuguese speakers perform better in identifying stress location in the presence of vowel reduction. Castelo (2005) has shown that full vowels were frequently identified as stressed, even when they were unstressed.
Finally, pitch has not been considered in the EP stress literature as a potential correlate of word stress. In EP, as in English, pitch accents signal phrase level prominence and are used to differentiate sentence types and pragmatic or discursive meanings (Frota, 2000, 2014). EP, however, is a language with low co-variation between stress and pitch accent due to a sparse pitch accent distribution: fewer than 20% of prosodic words internal to the intonational phrase carry a pitch accent (Vigário & Frota, 2003). In other words, not every stressed syllable gets a pitch accent, and stressed syllables tend to be accented almost always only when in the nuclear position. As in most other Romance languages, nuclear position is rightmost in EP in broad focus utterances (Frota, 2014; Ladd, 2008). In utterances with a narrow focus in early position, the element under focus becomes the nuclear element, and post-nuclear words before the final rightmost word are unaccented (Frota, 2000). For instance, in the sentence “A Maria comeu torradas com manteiga [Mary ate toast with butter],” the word “manteiga” is in nuclear position and necessarily gets a pitch accent (the nuclear falling accent H+L*), whereas the words “comeu” and “torradas,” which are not at the edges of the intonational phrase, are typically unaccented. In the sentence “A Maria COMEU torradas com manteiga [Mary ATE toast with butter],” with narrow focus on the verb “comeu,” the verb is necessarily pitch accented and the words “torradas com manteiga” (“toast with butter”) are in post-nuclear position. Due to the low co-variation between word stress and pitch accent, pitch variation seems not to be a robust cue for word stress in the language.
Given the properties just described, EP is an interesting test case to examine word stress perception. On the one hand, EP patterns with Spanish, Catalan and English against French in having variable stress. Due to the presence of variability in stress location, which may be lexically contrastive, stress perception in EP may pattern with Spanish, Catalan and English (Dupoux et al., 1997, 2001; Ortega-Llebaria, Prieto, & Vanrell, 2008; Ortega-Llebaria et al., 2010), and in particular, stress “deafness” is not expected. As in other languages (namely Spanish and Catalan), duration was reported to be the main cue in stress perception in EP, a language in which duration is not used contrastively (unlike English or Finnish). Assuming that speakers make use of lexical stress cues for stress identification, duration alone should be a strong cue for stress perception and be sufficient to enable stress perception. On the other hand, EP patterns with English and Catalan, and unlike Spanish, in the diversity of phonological cues used to signal stress. In particular, EP shows vowel reduction in unstressed positions. If stress perception is universally based on suprasegmental cues (namely duration) as suggested by Ortega-Llebaria et al. (2010), in the absence of vowel reduction it is expected that in EP, like in Catalan, prosodic cues will be enough to signal stress contrasts. Finally, EP differs from Spanish, Catalan and English in the low co-variation between stress and pitch accent. Since there is low co-variation between stress and pitch accent, the absence of pitch accent (in unaccented contexts such as post-nuclear position) is not expected to result in a stress “deafness” effect in Portuguese speakers (apart from a general decrease in performance that may be attributed to phonetic reduction effects that generally characterize post-nuclear position). 
However, if pitch accents are nevertheless a salient (though not necessary) cue as previous studies on other languages suggest (Ortega-Llebaria & Prieto, 2009, 2010; Ortega-Llebaria et al., 2008, 2010; Sluijter & van Heuven, 1996; Sluijter et al., 1997), some (perhaps small) effect may be expected. Also, if suprasegmental features are universal cues for stress perception, it is not expected that speakers of a language with duration as an acoustic correlate for stress show any substantial degree of stress “deafness,” and certainly not a degree equivalent to that described for fixed stress languages, even in the absence of vowel reduction. However, if suprasegmental cues are not enough for stress perception, segmental cues, namely vowel reduction, may impact on stress perception, facilitating the perception of stress and possibly preventing stress “deafness” from occurring. EP may thus contribute new data to the understanding of the role of suprasegmental cues alone in stress perception, as well as that of vowel quality cues, in accented (nuclear) and unaccented (post-nuclear) contexts.
In the following sections, we will present the results of a discrimination task (ABX – section 2) and an encoding task (sequence recall), with and without vowel reduction (sections 3 and 4, respectively), both in nuclear and post-nuclear positions. In section 5, we will discuss the implications of our results for the understanding of word stress perception, both in EP and cross-linguistically.
2 Experiment 1
Experiment 1 was an ABX discrimination task, following the same experimental procedure as in Dupoux et al.’s (1997) experiment 1. This will allow us to compare our findings with the previous results for French and Spanish reported in Dupoux et al. (1997). Unlike Dupoux and colleagues, we tested both nuclear and post-nuclear positions.
2.1 Method
2.1.1 Materials
We tested the perception of disyllabic and trisyllabic nonsense words that varied only in stress location, uttered in two conditions: nuclear position (citation form) and post-nuclear position (within the carrier sentence “A MARIA comeu [target word] com manteiga [MARY ate [target word] with butter],” with narrow focus on “Mary”). The words embedded in the carrier sentence in post-nuclear position were clipped from the context and heard in isolation by the participants, like their citation form counterparts. Fifteen pairs of nonsense words with penultimate and final stress and 11 trisyllabic words with antepenultimate, penultimate and final stress were constructed. Examples of nonsense words were [ˈmipu]/[miˈpu] and [ˈdɐmitu]/[dɐˈmitu]/[dɐmiˈtu]. In order to investigate the role of suprasegmental cues (pitch and duration) in stress perception, segmental cues (i.e., vowel reduction) were absent from the stimuli in this experiment. This was achieved through the use of high vowels ([i] and [u]), since they do not show vowel reduction (see Table 1), together with the vowel [ɐ] before a nasal or palatal consonant, given that in this context there is no vowel quality distinction between stressed and unstressed position (as described in section 1.2). Thus, all nonsense words were possible words in the language. A phonemic contrast was also tested as a control condition. The stimuli in the control condition varied consonants or vowels (e.g., [ˈdɛsu]/[ˈdɛtu], [ˈsiɾu]/[ˈseɾu]), while keeping the same stress pattern. As in the stress condition, stimuli in the control condition were uttered in nuclear and post-nuclear position. Three EP native speakers, one male and two female, produced the stimuli.
Acoustic analysis of the stimuli showed that duration was a cue to stress, both in nuclear (NP) and post-nuclear (PN) position (Table 2).
Table 2. Experiment 1: mean duration and standard deviation of stressed and unstressed syllables, in nuclear position (NP) and in post-nuclear position (PN), and results from a paired sample t-test.
A pitch fall in the stressed syllable, due to the H+L* nuclear falling accent that characterizes declarative intonation in EP, was an additional cue in nuclear position, as shown in Figure 1. In post-nuclear position, the target words exhibit a flat pitch contour.

Figure 1. Intonational contour for the nonsense word [paˈmusi] in nuclear position (left) and in post-nuclear position (right).
Pitch accent and duration cue stress in nuclear position (NP condition), whereas duration is the only cue to stress in post-nuclear position (PN condition).
2.1.2 Procedure
Participants listened to two words contrasting in adjacent stress position (AB), such as [ˈmipu]/[miˈpu], [ˈdɐmitu]/[dɐˈmitu] or [dɐˈmitu]/[dɐmiˈtu], or showing a segmental contrast. The third word, X, matched either A or B. In the trisyllabic stimuli, antepenultimate stress was paired with penultimate stress and penultimate stress was paired with final stress. The A and B words were always uttered by two female speakers and X by the male speaker, with order (AB/BA) counterbalanced within-subjects. The participants were instructed to decide whether X corresponded to A or B, in a foreign language. An interstimulus interval (ISI) of 500 ms separated the words within a trial. Trials were separated by a 1,000 ms interval and participants were given up to 4,000 ms to respond (as in Dupoux et al., 1997). The experiment included 208 trials (148 trials for the stress contrast – 60 trials for disyllables and 88 trials for trisyllables – and 60 trials for the phoneme contrast). In the stress contrast, the 60 trials for the disyllables resulted from the order of stress position within the word (penultimate/final and final/penultimate) and from the nature of X as having the same stress pattern as A or B (15 pairs × 2 × 2: order of stress position × X-identity). The 88 trials in the trisyllables resulted from the two types of stress contrast (antepenultimate/penultimate and penultimate/final), the order of stress position (antepenultimate/penultimate or penultimate/antepenultimate for one type of stress contrast, and penultimate/final or final/penultimate for the other type of contrast), and X-identity as A or B (11 pairs × 2 × 2 × 2: type of stress contrast × order of stress position × X-identity). The 60 trials for the phoneme contrast resulted from the order of word presentation and X-identity (15 × 2 × 2). Position (NP vs. PN) was a between-subject factor.
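The trial counts given above follow from fully crossing each item with the design factors. The sketch below reproduces that arithmetic; the factor labels and the helper `count_trials` are hypothetical names introduced for illustration, not part of the original experiment scripts.

```python
from itertools import product

# Factor levels taken from the procedure description.
ORDER = ("first/second", "second/first")      # order of the two members of a pair
X_IDENTITY = ("X = A", "X = B")               # which member X matches
CONTRAST_TYPE = ("antepenultimate/penultimate", "penultimate/final")


def count_trials(n_items, *factors):
    """Trials obtained when every item is crossed with every factor combination."""
    return n_items * len(list(product(*factors)))


disyllable_stress = count_trials(15, ORDER, X_IDENTITY)                   # 15 x 2 x 2
trisyllable_stress = count_trials(11, CONTRAST_TYPE, ORDER, X_IDENTITY)   # 11 x 2 x 2 x 2
phoneme = count_trials(15, ORDER, X_IDENTITY)                             # 15 x 2 x 2

stress_total = disyllable_stress + trisyllable_stress
total = stress_total + phoneme

print(disyllable_stress, trisyllable_stress, phoneme, stress_total, total)
```

The totals agree with the text: 60 + 88 = 148 stress trials, plus 60 phoneme trials, giving 208 trials overall.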
2.1.3 Participants
Thirty-two standard EP native speakers participated in the experiment (16 participants in the nuclear condition and another 16 participants in the post-nuclear condition). Participants were undergraduate students who received course credits for the experiment. The order of presentation of the stress contrast and the segmental contrast was counterbalanced across subjects. Participants’ responses and reaction times were recorded with SuperLab Pro v. 4.5. ANOVAs were run for two dependent variables: error rate and reaction times.
2.2 Results
The results for the error rates in the stress vs. phoneme contrast, in NP and PN, are presented in Figure 2.

Figure 2. Experiment 1: error rates for stress contrast and phoneme contrast. Error bars indicate the standard error of the mean.
The data were analyzed using a repeated measures ANOVA with contrast (stress vs. phoneme) as a within-subject factor and position (NP vs. PN) as a between-subject factor. The results show that the effect of contrast was significant, with higher error rates in the stress contrast condition than in the phoneme condition, F1(1, 30) = 189.43, p < .001, η2 = .86; F2(1, 196) = 72.27, p < .001, η2 = .27, and a significant effect of position (nuclear/post-nuclear) showed that PN generated significantly more errors overall, F1(1, 30) = 8, p < .01, η2 = .22; F2(1, 196) = 15.54, p < .001, η2 = .07. No significant interaction was found between contrast (stress vs. phoneme) and position (nuclear vs. post-nuclear), F1(1, 30) = 1.77, p = .19, η2 = .06; F2(1, 196) = 2.02, p = .16, η2 = .01, showing that the difference between the stress and phoneme contrasts did not differ between NP and PN.
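For readers who want to relate the F values to the effect sizes, the following sketch applies the standard conversion from an F statistic to partial eta squared. It assumes that the reported η2 values are partial eta squared, which the text does not state explicitly; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Effect of contrast (stress vs. phoneme) on error rates, as reported in the text.
by_subject = partial_eta_squared(189.43, 1, 30)   # F1(1, 30) = 189.43
by_item = partial_eta_squared(72.27, 1, 196)      # F2(1, 196) = 72.27

print(f"by subject: {by_subject:.2f}")
print(f"by item:    {by_item:.2f}")
```

Under that assumption the conversion gives .86 by subject and .27 by item for the contrast effect, matching the values reported above.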
The results for the reaction times in the stress vs. phoneme contrast, in NP and PN, are shown in Figure 3.

Figure 3. Experiment 1: reaction times for stress contrast and phoneme contrast. Error bars indicate the standard error of the mean.
Reaction times for the stress and phoneme contrasts differed significantly by item but not by subject, F1(1, 30) = 2.99, p = .09, η2 = .09; F2(1, 196) = 11.25, p < .01, η2 = .05. A borderline difference in reaction times between positions (NP vs. PN) was found by subject, alongside a significant difference by item, F1(1, 30) = 3.76, p = .06, η2 = .11; F2(1, 196) = 75.87, p < .001, η2 = .28. No significant interaction was found between contrast and position, F1(1, 30) < 1; F2(1, 196) < 1.
2.3 Discussion
Native EP subjects performed an ABX discrimination task based on stress contrasts, in nuclear (accented) and post-nuclear (unaccented) positions. In experiment 1, duration and pitch cued stress in nuclear position, whereas duration was the only cue to stress in unaccented positions. Nonsense words contrasting minimally in word stress were used, and nonsense words contrasting only in one phoneme acted as a control condition. Our results showed that subjects made significantly more errors in the stress contrast than in the phoneme contrast. Moreover, they showed faster reaction times in the phoneme contrast, especially in nuclear position (although the difference was not significant). The post-nuclear position made discrimination more difficult overall, and not only in the case of the stress contrast.
A comparison with the results in Dupoux et al. (1997) on French and Spanish shows that the mean error rate in EP in the stress condition is similar to the stress error rate reported for French (21% in EP vs. 19% in French), whereas the error rate in EP in the phoneme control condition is similar to the stress error rate reported for Spanish (5% in EP vs. 4% in Spanish). In addition, our results also differ from those obtained for Spanish and Catalan on the basis of word identification experiments, since both Spanish and Catalan subjects were able to perceive stress distinctions cued by intensity and duration (Ortega-Llebaria et al., 2008, 2010). Unlike Portuguese speakers, Spanish and Catalan speakers were able to detect stress contrasts based on duration and overall intensity in unaccented contexts, even when vowel reduction was absent from the stimuli (in the case of Catalan, a language with vowel reduction). The results from experiment 1 strongly suggest that EP subjects attend to suprasegmental cues to stress (pitch and/or duration) less than Spanish or Catalan speakers (Ortega-Llebaria et al., 2008, 2010), and, in the absence of vowel reduction, show a stress “deafness” effect similar to that reported for languages with predictable stress in their phonological grammar. However, as pointed out by Dupoux et al. (2001), the ABX task may not be the best method to assess the processing of stress at a more abstract level, as subjects may simply rely on the acoustic information and need not use phonological knowledge to perceive stress contrasts. The potential limitations of the ABX task led us to experiment 2.
3 Experiment 2
In experiment 2, we used a sequence recall task, following the same procedure used in Dupoux et al. (2001) (specifically, in their experiment 4), but with stimuli phonetically similar to the ones used in Peperkamp et al. (2010). Our aim was to test the perception of word stress in EP using a more robust method than the ABX discrimination task. A confirmation of the findings from experiment 1 would demonstrate that EP speakers show stress “deafness” in the absence of vowel reduction.
3.1 Method
3.1.1 Materials
Using a sequence recall task (Dupoux et al., 2001; Peperkamp et al., 2010), we tested the perception of stress by EP-speaking subjects in the absence of vowel reduction. Two disyllabic (CVCV) minimal pairs consisting of nonsense words were constructed: [ˈmupɐ]/[ˈmunɐ], with a phonemic contrast, and [ˈnumi]/[nuˈmi], with a stress contrast. The four nonsense words were uttered in nuclear (citation form) and post-nuclear position (within the carrier sentence “A MARIA comeu [target word] com manteiga [MARY ate [target word] with butter],” with narrow focus on “Mary”). As in experiment 1, the words in post-nuclear position were clipped from the context and heard in isolation by the participants, like their citation form counterparts. The use of high vowels in the stress contrast pair ensured that vowel reduction was absent from the stimuli. Two EP native speakers, one male and one female, produced 10 tokens of each nonsense word, six of which were used as stimuli (three per speaker). The inclusion of multiple tokens uttered by two different talkers added to the phonetic variability of the stimuli. A combination of memory load and phonetic variability was introduced, aiming to address the processing of stress at a more abstract level.
As in experiment 1, acoustic analysis of the stimuli showed that duration was a cue to stress, both in nuclear and in post-nuclear position (Table 3).
Experiment 2: mean duration and standard deviation of stressed and unstressed syllables, in nuclear position (NP) and in post-nuclear position (PN), and results from a paired sample t-test.
As in experiment 1, a pitch fall (H+L*) was a further cue to stress in nuclear position only, as illustrated in Figure 4.

Left to right: intonational contour for the nonsense words [ˈnumi]/[nuˈmi] in nuclear position, and [ˈnumi]/[nuˈmi] in post-nuclear position.
Also, as in experiment 1, pitch accent and duration cue stress in nuclear position (NP condition), whereas duration is the only cue to stress in post-nuclear position (PN condition).
3.1.2 Procedure
In experiment 2, participants were instructed to recall sequences of words of a foreign language. The experiment was divided into two parts. In the first part, participants were tested on a phoneme contrast. In the second part, they were tested on a stress contrast. Each part was sub-divided into two phases: the training phase and the test phase. In the training phase, participants had to associate the two nonsense words to the keys [1] and [2]. By pressing each one of these keys, they listened to the two words as many times as they wanted. Before the test phase there was a warm-up set of trials, consisting of four sequences of two of the newly learned words, presented with feedback. In the test phase, participants listened to 20 sequences composed of five tokens each, followed by the word “OK.” After listening to the word “OK,” participants would recall the order in which the two words had appeared in the five-token sequence (e.g., [ˈnumi]-[nuˈmi]-[nuˈmi]-[ˈnumi]-[nuˈmi]). Only responses that were a 100% correct transcription of the five-word sequence were coded as correct; all the others were coded as incorrect. Responses that were 100% incorrect were coded as reversals. As in Dupoux et al. (2001), participants with more reversals than correct responses in either the phonemic or the stress contrast condition were rejected.
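The coding scheme just described can be sketched in a few lines of Python. This is an illustrative sketch, not the scoring script used in the study: the function names, the representation of sequences as strings of key labels, and the choice to count reversals as errors when computing the error rate are all assumptions.

```python
from collections import Counter


def code_response(target, response):
    """Code one trial as 'correct', 'reversal', or 'incorrect'.

    target and response are strings of key labels, e.g. "12212".
    """
    if len(response) != len(target):
        return "incorrect"
    if all(r == t for r, t in zip(response, target)):
        return "correct"
    # A reversal is a response in which every token is the opposite word.
    if all(r != t for r, t in zip(response, target)):
        return "reversal"
    return "incorrect"


def summarize(trials):
    """Summarize one participant's trials for one contrast condition.

    trials is a list of (target, response) pairs.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    counts = Counter(code_response(t, r) for t, r in trials)
    n = len(trials)
    # Assumption: every response that is not fully correct is an error.
    error_rate = (n - counts["correct"]) / n
    # Rejection rule: more reversals than correct responses.
    exclude = counts["reversal"] > counts["correct"]
    return {
        "error_rate": error_rate,
        "reversals": counts["reversal"],
        "exclude": exclude,
    }
```

Under this sketch, the rejection rule would be applied once per contrast condition, and a participant would be dropped if either summary has `exclude` set.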
3.1.3 Participants
Twenty-four EP native speakers participated in the experiment (12 participants in the nuclear condition and another 12 participants in the post-nuclear condition). Participants were undergraduate students who received course credits for the experiment. Additional speakers were tested and excluded from the results due to the presence of too many reversals: two in the NP condition (one had too many reversals for the phoneme contrast and one for the stress contrast) and 10 in the PN condition (eight had too many reversals for the phoneme contrast and two for the stress contrast). Two other speakers were excluded for responding before the word “OK.” Participants’ responses were recorded with SuperLab Pro v. 4.5. The data were subjected to a repeated measures ANOVA with position (nuclear vs. post-nuclear) as a between-subject factor and type of contrast (stress vs. phoneme) as a within-subject factor.
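To make the structure of this design concrete, the following Python sketch computes the three F ratios of a 2 × 2 mixed design from per-participant error scores. It is not the software used for the analysis reported here; it assumes two equal-sized groups (as with the 12 + 12 participants above), and the function and variable names are hypothetical. With only two levels of the within-subject factor, the contrast and interaction effects can be obtained from each participant's stress minus phoneme difference, and the position effect from each participant's mean across the two contrasts.

```python
from statistics import mean


def mixed_anova_2x2(group_a, group_b):
    """F ratios for a 2 (between) x 2 (within) design with equal group sizes.

    Each group is a list of (stress_error, phoneme_error) pairs,
    one pair per participant. Every F has (1, N - 2) degrees of freedom.
    """
    if len(group_a) != len(group_b) or len(group_a) < 2:
        raise ValueError("sketch assumes two equal-sized groups of at least 2")
    n = len(group_a)
    df_error = 2 * n - 2

    def f_two_groups(xs, ys):
        # Squared independent-samples t statistic with pooled variance.
        mx, my = mean(xs), mean(ys)
        ss = sum((x - mx) ** 2 for x in xs) + sum((y - my) ** 2 for y in ys)
        pooled_var = ss / df_error
        return (mx - my) ** 2 / (pooled_var * 2 / n), pooled_var

    # Stress minus phoneme differences carry the within-subject effects.
    diff_a = [s - p for s, p in group_a]
    diff_b = [s - p for s, p in group_b]
    # Per-participant means carry the between-subject (position) effect.
    avg_a = [(s + p) / 2 for s, p in group_a]
    avg_b = [(s + p) / 2 for s, p in group_b]

    f_interaction, var_diff = f_two_groups(diff_a, diff_b)
    f_position, _ = f_two_groups(avg_a, avg_b)
    grand_diff = mean(diff_a + diff_b)
    f_contrast = 2 * n * grand_diff ** 2 / var_diff
    return {
        "contrast": f_contrast,
        "position": f_position,
        "interaction": f_interaction,
        "df": (1, df_error),
    }
```

In this formulation the interaction test is simply a comparison of the stress minus phoneme difference across the two positions, which is the quantity of interest when asking whether the unaccented context selectively impairs stress perception.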
3.2 Results
Error rates as a function of type of contrast are shown in Figure 5.

Experiment 2: error rates for stress contrast and phoneme contrast. Error bars indicate the standard error of the mean.
As in experiment 1, there was a significant effect of type of contrast, F(1, 22) = 66.93, p < .001, η² = .75, with more errors in the stress than in the phoneme contrast, and a marginally significant interaction between contrast and position, F(1, 22) = 3.76, p = .065, η² = .15, due to the fact that stress, but not phoneme, showed more errors in the PN position. There was no main effect of position, F(1, 22) < 1.
Also as in experiment 1, participants made significantly more errors in the stress contrast condition than in the phoneme contrast condition, both in nuclear and post-nuclear position. Contrary to the results found in experiment 1, where the stimuli in post-nuclear position were overall more difficult to discriminate, in experiment 2 only the stress contrast in post-nuclear position (but not the phoneme contrast) was more difficult to discriminate.
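As a reading aid, the effect sizes reported above are consistent with partial eta squared recovered from each F ratio and its degrees of freedom. The text does not name the variant of eta squared, so treating it as partial eta squared is an assumption, and the function name below is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the results above: F(1, 22) = 66.93 and F(1, 22) = 3.76.
contrast_effect = round(partial_eta_squared(66.93, 1, 22), 2)
interaction_effect = round(partial_eta_squared(3.76, 1, 22), 2)
```

Under this assumption, the two values round to .75 and .15, matching the figures reported for the effect of contrast and for the contrast by position interaction.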
3.3 Discussion
In experiment 2 we investigated stress perception in EP by means of a sequence recall task. In this experiment, duration and pitch cued stress in the NP condition, and duration was the only cue to stress in the PN condition. Participants made significantly more errors in the stress contrast than in the phoneme contrast, both in nuclear and in post-nuclear position. These results replicate the findings from experiment 1. In the absence of vowel reduction, EP speakers show a stress “deafness” effect similar to that reported for French (Dupoux et al., 2001). Therefore, suprasegmental cues (pitch accent and/or duration) are not used to perceive stress, unlike in Catalan (Ortega-Llebaria et al., 2010). However, unlike experiment 1, experiment 2 showed that in the unaccented context (PN), stress is harder to perceive, whereas that is not the case for the phoneme contrast. This suggests that the nuclear pitch accent may nevertheless function as a residual, weak cue to stress in EP.
The findings from both experiment 1 and experiment 2 demonstrated that EP speakers show a stress “deafness” effect similar to that found in speakers of languages with predictable stress (Dupoux et al., 2001; Peperkamp et al., 2010). In the absence of vowel reduction, and in accented contexts (nuclear position) where stress is cued by duration and pitch, EP speakers have difficulties in perceiving stress contrasts. Given these findings, we tested the impact of vowel quality cues on stress perception by means of a third experiment.
4 Experiment 3
In experiment 3, we used a sequence recall task (as in experiment 2), but we added vowel reduction to the stress contrast stimuli. Our goal here was to test the effect of vowel reduction in stress perception.
4.1 Method
4.1.1 Materials
Using a sequence recall task (Dupoux et al., 2001), we tested the perception of stress by EP speaking subjects in the presence of vowel quality cues; that is, vowel reduction. We used the same phonemic contrast that was part of experiment 2 ([ˈmupɐ]/[ˈmunɐ]), and a novel stress contrast with vowel reduction ([ˈnemi]/[nɨˈmi]). In EP, as described in Table 1, a stressed [e] reduces to [ɨ] in unstressed position. The four nonsense words were uttered in nuclear (citation form) and post-nuclear position (within the carrier sentence “A MARIA comeu [target word] com manteiga [MARY ate [target word] with butter],” with narrow focus on “Mary”). The words embedded in the carrier sentence in post-nuclear position were clipped from the context and heard in isolation by the participants, as their citation form counterparts. Two native speakers of EP, one male and one female, produced 10 tokens of each nonsense word, six of which were used as stimuli (three per speaker), adding to the phonetic variability of the stimuli.
Acoustic analysis of the stimuli showed that duration was a cue to stress, both in nuclear and in post-nuclear position (Table 4). As in experiments 1 and 2, a pitch fall (H+L*) was a further cue to stress in nuclear position only, as illustrated in Figure 6. Vowel reduction thus adds to the cues to stress in the NP condition (pitch and duration) and in the PN condition (duration).
Experiment 3: mean duration and standard deviation of stressed and unstressed syllables, in nuclear position (NP) and in post-nuclear position (PN), and results from a paired sample t-test.

Left to right: intonational contour for the nonsense words [ˈnemi]/[nɨˈmi] in nuclear position, and [ˈnemi]/[nɨˈmi] in post-nuclear position.
4.1.2 Procedure
The participants’ task in experiment 3 was similar to that in experiment 2.
4.1.3 Participants
Twenty-four EP native speakers participated in the experiment (12 participants in the nuclear condition and another 12 participants in the post-nuclear condition). Participants were undergraduate students who received course credits for the experiment. Additional speakers were tested and excluded from the results due to the presence of too many reversals: three in the NP condition (all with too many reversals for the phoneme contrast) and 16 in the PN condition (seven had too many reversals for the stress contrast, five for the phoneme contrast and four had too many reversals in both phoneme and stress). Four other speakers (two in the NP condition, two in the PN condition) were excluded for responding before the word “OK.” Participants’ responses were recorded with SuperLab Pro v. 4.5. The data were subjected to a repeated measures ANOVA with position (nuclear vs. post-nuclear) as a between-subject factor and type of contrast (stress vs. phoneme) as a within-subject factor.
4.2 Results
Error rates as a function of type of contrast are shown in Figure 7.

Experiment 3: error rates for stress contrast and phoneme contrast. Error bars indicate the standard error of the mean.
The results revealed a significant effect of type of contrast, F(1, 22) = 12.09, p < .01, η² = .36, and a significant interaction between type of contrast and position, F(1, 22) = 5.97, p < .05, η² = .21. There was no significant effect of position, F(1, 22) = 2.19, p = .15, η² = .09. In the NP condition, stress and phoneme contrasts show similar error rates. Paired t-tests carried out for NP and PN separately showed a significant difference between stress and phoneme in the PN position, t(11) = 3.68, p < .01, but not in the NP position, t(11) = .07, p = .4.
Results from experiment 2 and experiment 3 were compared. A repeated measures ANOVA with two between-subject factors of experiment (experiment 2 vs. experiment 3) and position (NP vs. PN), and a within-subject factor of type of contrast (stress vs. phoneme), was carried out.
The results revealed a significant effect of type of contrast, F(1, 44) = 67.78, p < .001, η² = .61, and significant interactions between contrast and experiment, F(1, 44) = 10.89, p < .01, η² = .2, and between contrast and position, F(1, 44) = 9.61, p < .01, η² = .18, the latter due to the different behavior of the stress contrast in NP and PN in experiment 3. The effect of position was marginally significant, F(1, 44) = 3.92, p = .054, η² = .0. The three-way interaction between contrast, position and experiment was not significant, F(1, 44) < 1.
For NP, an independent samples t-test revealed a significant difference between experiments 2 and 3 for stress, t(22) = 4.98, p < .001, but not for phoneme, t(22) = .04, p = .97. Similarly, for PN an independent samples t-test revealed a significant difference between experiments 2 and 3 for stress, t(22) = 2.22, p < .05, but not for phoneme, t(22) = .25, p = .81. Importantly, the stress contrast showed significantly lower error rates in experiment 3 relative to experiment 2, both in NP and PN.
4.3 Discussion
In experiment 3, we investigated stress perception in EP by means of a sequence recall task. In this experiment, we included vowel reduction in the stimuli and thus vowel quality was an additional cue to stress, together with duration and pitch in the case of the NP condition, and duration alone in the case of the PN condition.
The results demonstrated that vowel reduction is a relevant cue for stress perception. In experiment 3, unlike in experiment 2, stress and phoneme contrasts showed similar error rates and no stress “deafness” effect was found. The error rates reported for the stress and phoneme contrast in the NP condition in experiment 3 (respectively, 55% and 51%) were comparable to the error rates found for the phoneme contrast for Spanish in Dupoux et al. (2001) and Peperkamp et al. (2010) (56% and 48%, respectively). In non-nuclear position, however, the error rate for the stress contrast is still significantly higher than in nuclear position, suggesting that the presence of a pitch accent may nevertheless play some role in stress perception, although to a much smaller extent than vowel reduction. If pitch were totally irrelevant for stress perception, Portuguese speakers would have performed similarly in the nuclear and non-nuclear contexts.
The comparison between the results from experiments 2 and 3 furthermore confirmed the relevance of vowel reduction for stress perception in EP. Importantly, the stress contrast showed significantly lower error rates overall in experiment 3 relative to experiment 2, both in the accented and unaccented contexts (cf. Figures 5 and 7).
Finally, both in experiments 2 and 3 the unaccented context showed significantly higher error rates in the stress contrast condition (but not in the phoneme one) than the accented context, suggesting that the presence of a pitch accent may nevertheless play a weak, residual role in the perception of EP stress, despite the low co-variation between stress and pitch accent in the language.
To sum up, the results of experiment 3 indicate that, when vowel reduction is included in the stimuli, error rates generally decrease in the stress contrast condition, both in nuclear and in non-nuclear position. Vowel reduction thus seems to prevent the stress “deafness” effect found in experiments 1 and 2, when only suprasegmental information cued word stress.
5 General discussion
In this paper we investigated word stress perception in EP. Because it displays a particular combination of properties, namely variable stress, vowel reduction and duration as the main acoustic cue to stress, and low co-variation between stress and pitch accent, EP is an interesting language in which to examine word stress perception. The variable nature of stress in the language would predict no stress “deafness” effects (Dupoux et al., 1997, 2001; Peperkamp & Dupoux, 2002; Peperkamp et al., 2010). EP is thus expected to pattern with Spanish and not French. The fact that stress is cued by suprasegmental information, besides vowel quality, as in Catalan, would predict that suprasegmental cues (namely duration) should be enough to signal stress contrasts. Thus, EP data could be anticipated to support recent claims that word stress perception is universally based on suprasegmental cues, irrespective of languages’ segmental cues to stress (Ortega-Llebaria et al., 2010). In particular, since duration is an acoustic cue for stress in EP, it would be predicted that in the absence of segmental cues (vowel reduction), duration should be enough to enable the perception of stress. The unaccented context was not expected to have a major impairing effect on stress perception, given the low co-variation between stress and pitch accent in this language.
However, if suprasegmental cues, although present, are not enough for stress perception in a language like EP, a stress “deafness” effect might emerge in the absence of vowel quality cues to stress. In that case, EP would pattern differently from Spanish and Catalan, and would approximate fixed stress languages like French. Such findings would identify vowel reduction as the key correlate for stress in the language, and contradict the claims of prosodic-based cross-linguistic perception of word stress in the absence of vowel quality cues.
Using two different paradigms – an ABX discrimination task and a sequence recall task – we demonstrated that, when vowel reduction is absent from the stimuli, EP speakers show a stress “deafness” effect similar to that found in speakers of languages with predictable stress, such as French. In the absence of vowel reduction, and in accented contexts (nuclear position), EP speakers are much less efficient in perceiving stress contrasts. In Tables 5 and 6 we compare the error rates reported for French and Spanish (Dupoux et al., 1997, 2001; Peperkamp et al., 2010) with those found for EP, in a comparable condition (the accented context) and using similar methodologies.
Error rates for stress contrast (nuclear position only). Data from French and Spanish from Dupoux et al. (1997) – ABX, Dupoux et al. (2001) and Peperkamp et al. (2010) – sequence recall task.
Error rates for phoneme contrast (nuclear position only). Data from French and Spanish from Dupoux et al. (1997) – ABX, Dupoux et al. (2001) and Peperkamp et al. (2010) – sequence recall task.
Table 5 shows that EP patterns with French, not Spanish. The results from both the ABX task and the sequence recall task demonstrate a stress “deafness” effect in EP in the absence of vowel reduction similar to that previously found for French. The error rates in the stress contrast condition are higher for French and EP (19% and 21%, respectively, in the ABX task, and 89%/78% and 78%, respectively, in the sequence recall task) than for Spanish (4% in the ABX task, and 39%/47% in the sequence recall task). However, Table 6 shows that in the phoneme contrast condition French, Spanish and EP have similarly low error rates. The performance of EP subjects in the phoneme contrast – the control condition – shows that they are as efficient in performing the experimental tasks as French or Spanish subjects. These results bring evidence for a stress “deafness” effect in speakers of a language with variable stress, contra previous findings (Dupoux et al., 1997, 2001; Peperkamp & Dupoux, 2002; Peperkamp et al., 2010). Like Spanish, EP has variable stress. However, unlike Spanish, EP has reduced vowels in unstressed position. In the absence of vowel reduction, a stress “deafness” effect arises in Portuguese speakers’ perception. Furthermore, and contrary to Catalan (Ortega-Llebaria & Prieto, 2010; Ortega-Llebaria et al., 2010), a language with variable stress and vowel reduction, longer duration in stressed syllables is not enough to cue stress. In fact, EP speakers are less efficient at perceiving stress when only suprasegmental cues are present (when longer stressed syllables co-occur with a pitch accent, as in nuclear position, and when duration is the only cue to stress, as in post-nuclear position), but they are able to perceive stress contrasts when segmental cues are present.
The results found for EP stress perception can be related to results found for English in lexical activation and spoken-word recognition, in stress-match/mismatch conditions, where vowel reduction has been shown to have an effect. Although English speakers do not discard suprasegmental information in stress perception (Fry, 1958), they perform better in lexical selection tasks when vowel reduction, and not only duration and/or intensity, is present (Braun et al., 2008; Cutler & Pasveer, 2006; Cutler et al., 2007; van Donselaar et al., 2005).
Our findings strongly suggest that stress “deafness” is not specific to languages with fixed stress, but rather a perceptual inability that emerges when the relevant cues for stress are absent, or when there is a mismatch between the phonetic correlates and the speakers’ phonological representation of stress. Variable stress at the word level does not conform to the phonological representation of stress for French speakers, as stress in French is fixed and is not a property of words. When presented with stimuli that have variable stress, French speakers, unlike Spanish speakers, find it more difficult to perceive stress, as that pattern does not conform to their phonological grammar (Dupoux et al., 2001; Peperkamp & Dupoux, 2002; Peperkamp et al., 2010). Likewise, Portuguese speakers show a stress “deafness” effect in the absence of vowel reduction, and this may be due to the fact that vowel reduction is the most relevant cue for stress in their grammar (cf. Castelo, 2005). Although languages may be grouped into typological classes according to their stress properties, stress cues and stress rules are mostly language-specific and these specificities seem to play a role in determining the presence, absence or degree of speakers’ stress “deafness.”
The results from our three experiments on stress perception in EP, both in accented and unaccented positions, show that suprasegmental properties alone (not only duration alone, but also duration and pitch accent) are not enough to enable the perception of stress in a language that uses both suprasegmental and segmental (vowel quality) cues to stress (Castelo, 2005; Delgado-Martins, 1977, 1986). Therefore, the present findings do not support the claims of prosodic-based cross-linguistic perception of stress in the absence of vowel quality cues (Ortega-Llebaria et al., 2010). The results from our study, on the contrary, bring evidence of a language-specific basis for stress perception. In EP, contrary to Catalan, suprasegmental cues are not sufficient to enable stress perception: only when vowel reduction is present are Portuguese speakers able to perceive stress contrasts. Vowel reduction thus seems to be the key correlate for stress in EP, and its absence leads to a stress “deafness” effect in Portuguese speakers.
In sum, our findings suggest that stress “deafness” is not an effect specific to fixed stress languages, but rather a perceptual inability that emerges when the relevant cues for stress are absent, or when the correlates of stress do not conform with the speakers’ phonological representation of stress.
6 Conclusions
Cross-linguistic research has shown that speakers of languages with fixed stress are less efficient in perceiving stress contrasts than speakers of languages with variable stress. However, speakers of different languages vary in the degree of stress “deafness” effects, depending on the acoustic cues and the phonological properties of stress. Segmental cues (such as vowel reduction) and/or suprasegmental cues (such as duration, intensity and pitch) affect the way in which speakers perceive stress in their language. EP is an interesting case to investigate stress perception, given the unusual interaction of duration patterns and vowel reduction, as well as the low co-variation between stress and pitch accent.
In this paper, we conducted three experiments in which we tested stress perception with and without vowel reduction, both in accented and in unaccented contexts. Our findings demonstrate that, in the absence of vowel quality cues to word stress, a stress “deafness” effect emerges in a language with variable stress that combines both suprasegmental and segmental information to signal word stress. A segmental cue, namely vowel reduction, seems to be the most robust cue to signal word stress in EP. Hence, our results do not support claims of a universal prosodic-based perception of word stress but, instead, strongly point to a language-sensitive mechanism in word stress perception.
Funding
This work was supported by Grant EXCL/MHC-LIN/0688/2012 from the Foundation for Science and Technology (Portugal).
