Abstract
This paper describes a cross-linguistic production study of devoicing in European Portuguese (EP), Italian, and German. We recorded all stops and fricatives in four vowel contexts and two word positions and computed time-varying voicing patterns over the whole duration of each stop and fricative. Our results show that, in its devoicing behaviour, EP is more similar to German than to Italian. While Italian shows almost no devoicing of phonologically voiced consonants, both EP and German show strong and consistent devoicing throughout the consonant. Consonant position had no effect in EP and Italian but a significant effect in German. The height of the vowel context affected devoicing in German and EP. For EP, we showed that a more posterior place of articulation and a low vowel context lead to significantly more devoicing. The strong devoicing of all phonologically voiced stops and fricatives and the influence of vowel context are surprising new results. With respect to voicing maintenance, EP is more like German than like other Romance languages.
Keywords
1 Introduction
Extensive research has been dedicated to the analysis of stop voicing distinctions. However, direct cross-linguistic comparisons are rather sparse: little is known about how the contributions of acoustic parameters and auditory features to the voicing distinction differ across languages, in speech production, in speech perception, and in their interaction.
For stop consonants, it is generally agreed that voice onset time (VOT) is the dominant perceptual cue to the voicing contrast (in a broad phonological sense). In languages such as English and German, the contrast is based on a distinction between zero or short-lag VOT in /b d ɡ/ and long-lag VOT in /p t k/. In languages such as Spanish and Russian, on the other hand, the contrast is based on a distinction between negative VOT (also known as pre-voicing) in /b d ɡ/ and zero or short-lag VOT in /p t k/ (Keating, 1984).
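The three VOT categories can be illustrated with a small classifier. This is a sketch only: the 30 ms boundary between short-lag and long-lag VOT is an illustrative assumption, not a value taken from this study or from Keating (1984), and the names `classify_vot` and `CONTRAST_TYPES` are hypothetical.

```python
# Hypothetical illustration of the VOT categories described above.
SHORT_LAG_MAX_MS = 30.0  # assumed short-lag/long-lag boundary (illustrative only)

# Which VOT category each phonological series typically uses (after Keating, 1984).
CONTRAST_TYPES = {
    "English/German": {"/b d ɡ/": "short-lag", "/p t k/": "long-lag"},
    "Spanish/Russian": {"/b d ɡ/": "negative", "/p t k/": "short-lag"},
}


def classify_vot(vot_ms: float, short_lag_max_ms: float = SHORT_LAG_MAX_MS) -> str:
    """Return 'negative' (pre-voiced), 'short-lag' (including zero) or 'long-lag'."""
    if vot_ms < 0:
        return "negative"  # voicing starts before the release
    if vot_ms <= short_lag_max_ms:
        return "short-lag"  # voicing starts at or shortly after the release
    return "long-lag"  # long delay between release and voicing (aspiration)


if __name__ == "__main__":
    for vot in (-90.0, 0.0, 15.0, 70.0):
        print(vot, classify_vot(vot))
```

Under this scheme the same short-lag value signals /b d ɡ/ in English or German but /p t k/ in Spanish or Russian, which is why a VOT value cannot be interpreted without knowing the language.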
It is also generally agreed that other parameters are important for the stop voicing distinction. Results for English (Lisker & Abramson, 1964, 1967; Luce & Charles-Luce, 1985), German (Jessen, 1998), Korean (Brunner, Fuchs, & Perrier, 2011), Japanese (Kawahara, 2004), Catalan (Cuartero, 2002) and European Portuguese (EP) (Lousada, Jesus, & Hall, 2010; Viana, 1984) all show that closure duration and preceding vowel duration are strong cues to the voicing distinction. There are also further correlates of stop voicing, for example, f0 or the onset of the first formant (Jessen, 2001; Lisker, 1986).
Phonetic analyses of real EP speech data show that a substantial number of items have no discernible burst (Lousada et al., 2010). This is common in phonologically voiced stops but occasionally also occurs in voiceless stops. Without a burst, the perceptual system must rely on other acoustic cues to make the voicing distinction. This raises two questions, especially in the light of cross-linguistic perceptual research: which cues does the perceptual system use to achieve a robust voicing distinction, and how are the available cues weighted against each other? For vowel perception, it has been shown that the human perceptual system can weight cues against each other (i.e., apply cue trading) to achieve a robust perceptual outcome, and that this weighting differs across dialects and languages (Escudero, Benders, & Lipski, 2009; Morrison, 2005; Pape & Jesus, 2014). For the perception of obstruent voicing, this cue weighting is assumed to be highly language-dependent: while some languages rely mainly on strong cues like VOT, others may rely on voicing maintenance, closure duration or vowel duration instead. Thus, a cross-linguistic comparison of how cues are weighted for the obstruent voicing distinction has to take a number of acoustic parameters into account.
Our research focuses on the cross-linguistic importance of voicing maintenance, that is, voicing during the closure/constriction of phonologically voiced obstruents. We use the term voicing maintenance rather than negative VOT or pre-voicing because it allows for the absence of a burst, which is common in real speech. In the current paper we examine the phonetic realisation of voicing maintenance for stops and fricatives in a cross-linguistic context. Note that the phonetic realisation of voicing maintenance differs considerably from the phonological notion of voicing. For German, for example, Jessen (1998) showed that the ‘normal’ voicing status of a phonologically voiced stop is related to aspiration and duration cues rather than to the maintenance of voicing during stop closure. For other languages, such as Italian or Spanish (Shih, Möbius, & Narasimhan, 1999), the maintenance of voicing during stop closure is more important.
The literature on cross-linguistic comparison of voicing maintenance shows quite diverse results. Examining an extensive cross-linguistic corpus, Shih et al. (1999) showed that the percentage of devoicing of voiced stops was considerably higher for German than for Italian or (Mexican) Spanish. In the two Romance languages, voiced stops were almost never devoiced throughout the complete closure; for Spanish, however, it has to be taken into account that stops are reduced to approximants in intervocalic position, so strong voicing maintenance is not surprising there. Similarly, for other consonant positions (e.g., the devoicing-prone initial position), other cross-linguistic studies (Lisker & Abramson, 1967; Solé, 2011) show strong stop voicing for Romance but not for Germanic languages. The finding for Germanic languages is not surprising, since other features carry the contrast there (Jessen, 1998). Devoicing studies of phonologically voiced stops confirm that devoicing is rather common in German (Pape, Mooshammer, Hoole, & Fuchs, 2006). All these findings are in line with the classical phonological view that voicing maintenance or pre-voicing is prevalent in Romance languages (Caramazza & Yeni-Komshian, 1974; Castaneda, 1986; Quilis, 1981), in contrast to Germanic languages, which tend to devoice (Jessen, 1998; Pape et al., 2006).
However, for EP, another Romance language, research shows that the devoicing rate of voiced consonants is rather high (Jesus & Shadle, 2002, 2003; Lousada et al., 2010; Pinho, Jesus, & Barney, 2012). Since EP, like Spanish or Slavic languages, has unaspirated, short-lag VOT values for the voiceless stops /p t k/ (Lousada et al., 2010), this raises the question of how the voicing contrast is achieved in EP when the voicing cue is missing and other cues, such as aspiration, are not present.
Valid cross-linguistic comparisons of EP with Spanish and German based on these studies are highly problematic, since their conditions differed widely and their small samples showed high speaker variance. The aim of the current study is therefore to overcome these methodological issues and to investigate whether the discrepancy between expected and observed obstruent voicing in EP is an artefact of differences in methodology and protocol, or whether obstruent devoicing is in fact an important feature of EP.
For a valid cross-linguistic examination of voicing maintenance, it is crucial that all possible confounding factors are carefully matched. Previous studies have identified five main groups of factors influencing voicing (and thus devoicing):
Consonant class: Ohala (1983) stated that voiced fricatives tend to devoice easily. We might therefore expect a substantial difference between phonologically voiced stops and fricatives in our data, provided that stop duration does not exceed the threshold beyond which devoicing occurs for aerodynamic reasons.
Place of articulation: Ohala and Riordan (1979) found that velar stops are more prone to devoicing than more anterior places of articulation. The effect is caused by the reduced cavity volume behind the constriction, which limits the capacity for passive enlargement and therefore affects voicing maintenance. For English and Swedish, Keating, Linker, and Huffman (1983) found a similar result with earlier voicing termination for more posterior places of articulation.
Position and stress: Westbury and Keating (1985) showed that from an aerodynamic point of view, stops are more likely to be voiced in utterance medial positions, while utterance initial and final positions increase the probability of devoicing. Keating et al. (1983) showed that stress increased the duration of voicing during the closure for Swedish.
Context: For English, Ohala and Riordan (1979) observed that stops coarticulated with high vowels maintained voicing longer than those coarticulated with low vowels. They explained the results by the enlarged pharyngeal cavity for high vowels. For German, Pape et al. (2006) confirmed this vowel-dependency, with higher devoicing percentages for stops followed by any low vowel. This result was obtained even when consonant and vowel spanned a word boundary. With respect to contrasting phonemic contexts, Shih et al. (1999) showed that preceding voiced contexts (vowels and sonorants) resulted in less devoicing of voiced stops than voiceless contexts (stops and fricatives).
Consonant duration: In a vented valve experiment, Ohala and Sprouse (2003) showed that it was difficult to maintain voicing for longer than around 60 ms; that is, the longer the consonant, the more likely it is that voicing ceases during it.
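A deliberately crude way to see why consonant duration matters: if voicing could be sustained for a fixed time after constriction onset and then stopped, the voiced proportion of the consonant would shrink as the consonant lengthens. The sketch below assumes a hard cut-off at the approximate 60 ms figure mentioned above; real voicing decays gradually and depends on the other factors listed here, so this is an illustration, not an aerodynamic model. The function name is hypothetical.

```python
# Toy model: the 60 ms value is the approximate limit cited above,
# treated here (as an assumption) as a hard cut-off.
SUSTAIN_LIMIT_MS = 60.0


def voiced_fraction(consonant_ms: float, sustain_limit_ms: float = SUSTAIN_LIMIT_MS) -> float:
    """Proportion of a consonant that stays voiced if voicing stops
    exactly sustain_limit_ms after constriction onset."""
    if consonant_ms <= 0:
        raise ValueError("consonant duration must be positive")
    return min(1.0, sustain_limit_ms / consonant_ms)


if __name__ == "__main__":
    for duration in (50.0, 80.0, 120.0):
        print(f"{duration:.0f} ms -> {voiced_fraction(duration):.2f} voiced")
```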
Given this variety of factors influencing voicing, a valid cross-linguistic examination must closely match all of them. We therefore carefully controlled these factors when constructing our corpora, namely manner and place of articulation, position, and vowel context. Stress and accent were uniform across all languages, thus eliminating their possible influence.
Following the cross-linguistic comparisons, we will focus on the examination of EP to shed light on the differing devoicing behaviours in this understudied Romance language.
2 Method
2.1 European Portuguese corpus
We recorded CVCV (consonant, vowel, consonant, vowel) items in the frame sentence Diga [CVCV] outra vez ([ˈdiɡɐ… ˈowtɾɐ veʃ] – ‘Say [CVCV] again’). In each CVCV item the two consonants were identical and the two vowels were identical (e.g., tata, pipi, or koko); sentence stress was on the CVCV pseudoword, and lexical stress was on its first syllable. Because lexical stress in EP can fall on either syllable, stress had to be fixed to guarantee an identical pattern across speakers. The recordings comprised all EP stops and fricatives, that is, /p b t d k ɡ f v s z ʃ ʒ/, in the vowel contexts /i e o a/. This yielded two consonant positions for examination: an intervocalic word-initial and a word-medial consonant position (referred to below as initial and medial position), with identical vowels following both positions and a central vowel (the final vowel /ɐ/ of Diga is centralised) always preceding the pseudoword’s initial consonant. Each pseudoword in its frame sentence was repeated nine times in randomised order by six EP native speakers (all female), giving a total of 5184 items (6 speakers × 4 vowels × 12 consonants × 2 positions × 9 repetitions). Speakers read the complete phrases from a sheet of paper held in front of them without obstructing the path between microphone and lips. Items were prompted by their orthographic representation and, in cases of doubt, by phonetic transcriptions and examples. Speakers were instructed not to pause within the sentence (i.e., to produce Diga [CVCV] outra vez without a pause). To match speech rate, multiple items from a sample database were played to each speaker as a model to follow.
A trained phonetician who is a native speaker of EP was present throughout the recordings to ensure that no vowel reductions (frequent in EP), consonant reductions (in which case the item was immediately repeated) or deviations from the provided speech rate occurred. This ensured that the complete corpus was free from reduction and other lenition phenomena before pre-processing. The recordings were made in a soundproof room at the Speech, Language and Hearing Laboratory (SLHlab), University of Aveiro, Portugal. The acoustic signal was recorded using a Cirrus Research MK224 acoustic free-field microphone located one metre in front of the speaker’s mouth, and the electroglottograph (EGG) signal was recorded with a Glottal Enterprises EG2-PCX processor. The acoustic signal was pre-amplified (Cirrus Research MV 181A), then amplified and filtered by a Cirrus Research ZE 901B Preamplifier Power Supply, and finally recorded on the first channel of a Marantz PMD671 Solid State Recorder at a sampling frequency of 48 kHz and 16-bit resolution. The EGG signal was recorded directly on the second channel.
All speakers were from the same dialectal region (Dialetos Setentrionais, according to Segura, 2013) and had not spent extended periods in other regions of Portugal. All speakers were university educated, and their mean age was 25 years (standard deviation = 2 years).
2.2 Reference corpora: comparing EP to other Romance and Germanic languages
In order to make cross-linguistic devoicing comparisons, we recorded analogous corpora for German and Italian (i.e., one Germanic language and one additional Romance language). We chose Italian rather than Spanish (see Introduction) because Spanish stops are often produced as approximants, which would obscure a valid devoicing analysis. The recording setup was identical to that used for the EP corpus: MK224 microphone, MV 181A preamplifier, ZE 901B Preamplifier Power Supply, and Marantz PMD671 Solid State Recorder. The recordings were conducted in the soundproof rooms of the Centre for General Linguistics (ZAS) in Berlin (Germany) and the Istituto di Scienze e Tecnologie della Cognizione (ISTC) in Padova (Italy).
For German, we recorded analogous CVCV sequences in the frame sentence Sage [CVCV] ohnehin ([ˈzaːɡə… ˈoːnəhɪn] – ‘Say [CVCV] anyway’), with sentence stress on the CVCV pseudoword and lexical stress on its first syllable. Both the preceding (sage) and following (ohnehin) words of the frame sentence and the pseudoword’s syllable structure were chosen to be comparable with the original EP frame sentence. German distinguishes between a tense and a lax vowel set, which differ in vowel length and formant values. To match the German corpus as closely as possible to the EP corpus, we carried out a preliminary analysis of intrinsic durations and formant values, which showed that the EP vowels are more similar to the German lax vowels than to the tense vowels. We therefore recorded the lax vowel qualities for the German corpus (/ɪ ɛ ɔ a/) for the preceding vowel (C
To compare the EP corpus with another Romance language, we recorded a further analogous corpus with six Italian speakers (five male, one female) for all Italian stops and fricatives in the frame sentence Dite [CVCV] ogni dì ([ˈdite… ˈɔni di] – ‘Say [CVCV] every day’). For the four vowel contexts, we recorded, in addition to the high vowel /i/ and the low vowel /a/, the closed vowel varieties (è and ò) of the northern variety. All speakers originated from the Veneto region of north-eastern Italy, had university-level education, and had a mean age of 40 years (standard deviation = 13 years). Northern and central/southern Italian varieties differ in the voicing of the phoneme /z/ (in central/southern Italy this phoneme is devoiced). We therefore recorded only speakers from northern Italy, so that their voicing of /z/ could be compared with German and EP. The speakers were instructed to produce /s/ as in sano (healthy) and /z/ as in sbarbato (shaved), and the items were presented either as phonetic transcriptions or with examples until the speakers could reliably produce the desired CVCV items. Again, both the preceding (dite) and following (ogni dì) words of the frame sentence and the pseudoword’s syllable structure were chosen to be comparable with the original EP frame sentence. As for the German recordings, a speech-rate reference (multiple items recorded beforehand) was provided, within the limits of what is natural for Italian. A trained phonetician who is a native speaker of Italian, present during the recordings, verified that all items were realised correctly and matched the reference speech rate. The phoneme /ʒ/ was excluded from the corpus as it does not exist in Italian. This resulted in 4752 items in total: 6 speakers × 4 vowels × 11 consonants × 2 positions × 9 repetitions.
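The item counts of the EP and Italian corpora follow from a fully crossed design. The sketch below enumerates the design cells; each cell is one measured consonant token (every CVCV pseudoword contributes an initial and a medial token). The list and function names are hypothetical, and the vowel labels are placeholders (the Italian corpus used its own mid vowels), which does not affect the counts.

```python
from itertools import product

SPEAKERS = range(1, 7)            # six speakers per language
VOWELS = ["i", "e", "o", "a"]     # four vowel contexts (placeholder labels)
POSITIONS = ["initial", "medial"]
REPETITIONS = range(1, 10)        # nine repetitions per item

EP_CONSONANTS = ["p", "b", "t", "d", "k", "ɡ", "f", "v", "s", "z", "ʃ", "ʒ"]
IT_CONSONANTS = [c for c in EP_CONSONANTS if c != "ʒ"]  # /ʒ/ excluded for Italian


def design(consonants):
    """All (speaker, vowel, consonant, position, repetition) cells, i.e. measured consonant tokens."""
    return list(product(SPEAKERS, VOWELS, consonants, POSITIONS, REPETITIONS))


def pseudoword(consonant, vowel):
    """CVCV item with identical consonants and identical vowels, e.g. 'tata'."""
    return consonant + vowel + consonant + vowel


if __name__ == "__main__":
    print(len(design(EP_CONSONANTS)))  # -> 5184 (6 * 4 * 12 * 2 * 9)
    print(len(design(IT_CONSONANTS)))  # -> 4752 (6 * 4 * 11 * 2 * 9)
```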
2.3 Labelling and processing
Each of the EP, German and Italian CVCV items was manually labelled according to the following acoustic landmarks:
onset and offset of the neutral lead-in vowel (V0) before the pseudoword (i.e., the last vowel of the preceding word);
onset and offset of the first target vowel (V1) [C
onset of the second target vowel; offset of the second target vowel (V2) [CVC
All vowel onsets/offsets were regarded as the offsets/onsets of the neighbouring obstruents and were labelled by the presence or absence of a clear higher formant structure (i.e., taking into account both the second formant F2 and the third formant F3).
We determined the voicing status of each obstruent at 10 equidistant points throughout the complete consonant duration, as proposed by Shih et al. (1999). The first point is set to the beginning of the consonant constriction (and thus to the offset of the preceding vowel), and the 10th point to the onset of the following vowel. In other words, we sampled over the complete duration of the consonant, from the beginning of the constriction to the onset of the following vowel, so our measure is not restricted to the closure phase. For stops, this whole-consonant strategy has the advantage that it can handle /b d ɡ/ tokens with a missing burst (which often occur in EP, as described in the Introduction); a measure based only on the closure phase (i.e., from the beginning of closure to the burst release) is undefined whenever the burst is missing. A further advantage is that stops and fricatives can be measured in the same way.
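A minimal sketch of this sampling scheme, assuming that the voicing detector's output has already been converted into a list of voiced intervals (a hypothetical input format). Times are in milliseconds; point 1 is the constriction onset and point 10 the following vowel onset, so consecutive points are one ninth of the consonant duration apart. Function names are hypothetical.

```python
def landmark_times(onset_ms: float, offset_ms: float, n_points: int = 10) -> list[float]:
    """n_points equidistant times from constriction onset (point 1)
    to the onset of the following vowel (point n_points), both included."""
    if offset_ms <= onset_ms:
        raise ValueError("offset must follow onset")
    step = (offset_ms - onset_ms) / (n_points - 1)
    return [onset_ms + k * step for k in range(n_points)]


def voicing_profile(onset_ms, offset_ms, voiced_intervals, n_points=10):
    """Binary voicing status (1 = voiced) at each landmark.

    voiced_intervals: list of (start_ms, end_ms) spans judged voiced
    (hypothetical format, e.g. derived from a pitch track).
    """
    return [
        int(any(start <= t <= end for start, end in voiced_intervals))
        for t in landmark_times(onset_ms, offset_ms, n_points)
    ]


if __name__ == "__main__":
    # Hypothetical 90 ms consonant: voicing dies out after 25 ms and resumes at 85 ms.
    print(voicing_profile(0.0, 90.0, [(0.0, 25.0), (85.0, 200.0)]))
    # -> [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
```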
To find the most accurate automatic voicing decision algorithm for the acoustic signal, we extensively compared the algorithms available for our purposes. We applied them to real speech data, specifically an existing EP and Standard English corpus (Jesus & Jackson, 2008) and an existing German corpus (Pape et al., 2006), and checked their results against the voicing visible in the corresponding speech and EGG signals.
For our purposes, the most accurate automatic voicing detection algorithm was the autocorrelation (AC) pitch extraction algorithm of Praat version 5.2 (Boersma, 2001), with the settings voicing threshold = 0.55 and silence threshold = 0.1. Throughout the analyses, we manually checked the algorithm’s output against the audio and EGG signals and corrected the voicing status where necessary.
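To make the two thresholds concrete, the sketch below implements a simplified autocorrelation voicing decision per analysis frame. It is not Praat’s algorithm (Praat additionally windows each frame, corrects for the window’s autocorrelation, and chooses between voiced and unvoiced candidates through a path search); it only mirrors the roles of the two settings. A frame counts as silent if its peak amplitude is below silence_threshold times the signal’s global peak, and otherwise as voiced if its strongest normalised autocorrelation within the pitch range reaches voicing_threshold. The 75 to 600 Hz search range, the 40 ms frame and the function name are assumptions.

```python
import math
import random


def frame_is_voiced(frame, global_peak, fs, voicing_threshold=0.55,
                    silence_threshold=0.1, f0_min=75.0, f0_max=600.0):
    """Simplified per-frame voicing decision (illustration, NOT Praat's algorithm)."""
    # Silence check: local peak relative to the global peak of the whole signal.
    if max(abs(x) for x in frame) < silence_threshold * global_peak:
        return False
    n = len(frame)
    min_lag = max(1, int(fs / f0_max))
    max_lag = min(int(fs / f0_min), n - 1)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[: n - lag], frame[lag:]
        energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if energy > 0:
            # Normalised autocorrelation at this lag (1.0 = perfectly periodic).
            best = max(best, sum(x * y for x, y in zip(a, b)) / energy)
    return best >= voicing_threshold


if __name__ == "__main__":
    fs = 8000
    n = int(0.040 * fs)  # 40 ms frame (assumption)
    tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]  # periodic
    rng = random.Random(0)
    noise = [rng.uniform(-1, 1) for _ in range(n)]  # aperiodic
    silence = [0.0] * n
    print(frame_is_voiced(tone, 1.0, fs), frame_is_voiced(noise, 1.0, fs),
          frame_is_voiced(silence, 1.0, fs))
```

Raising the voicing threshold above Praat’s default (0.45) makes the decision more conservative, so weakly periodic frames, such as those with decaying voicing during a closure, are more readily labelled voiceless.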
2.4 Statistical analysis
For the statistical verification of the voicing patterns, we chose the acoustic midpoint of the obstruent as a fixed landmark. As described above, the obstruent onset (landmark one of the voicing profile) and offset (landmark 10) are voiced by definition, since they coincide with the offset and onset of the neighbouring vowels’ formant structure. The acoustic midpoint between these two endpoints (i.e., point six in our data) is widely accepted as a measurement point in stop and fricative analyses and is thus a good candidate for testing the significance of differences between the voicing profiles described above.
To analyse devoicing at the consonantal midpoint statistically, we ran a series of logit mixed-effects models (lmer, package lme4) in the R environment (R Development Core Team, 2010). These logit models assume a binomial distribution (generalized linear mixed models, GLMMs) and report z-scores. They are suited to modelling binary decisions (Baayen, 2008; Bates, Maechler, & Bolker, 2011), and our voicing profile data provide a binary voicing decision for each recorded item at each of the 10 consecutive landmarks.
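As a descriptive counterpart to these models, the sketch below extracts the binary outcome at landmark six from 10-point voicing profiles and computes the empirical log-odds of voicing per group, the scale on which a logit model’s fixed effects are expressed. It does not fit the GLMM: the random effects for speaker, vowel context and repetition are omitted, and the 0.5 correction (which keeps groups with 0 or n voiced tokens finite) is our own choice. Function names are hypothetical.

```python
import math
from collections import defaultdict

MIDPOINT = 6  # landmark six (1-based) serves as the acoustic midpoint


def midpoint_voicing(profile):
    """Binary voicing (1 = voiced) at landmark six of a 10-point profile."""
    if len(profile) != 10:
        raise ValueError("expected a 10-point voicing profile")
    return profile[MIDPOINT - 1]


def empirical_logit(voiced, n):
    """Log-odds of voicing with a 0.5 correction so that 0 or n voiced tokens stay finite."""
    return math.log((voiced + 0.5) / (n - voiced + 0.5))


def log_odds_by_group(tokens):
    """tokens: iterable of (group_label, profile) -> {group_label: empirical log-odds}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [voiced at midpoint, total]
    for group, profile in tokens:
        counts[group][0] += midpoint_voicing(profile)
        counts[group][1] += 1
    return {g: empirical_logit(k, n) for g, (k, n) in counts.items()}


if __name__ == "__main__":
    # Made-up profiles for illustration only.
    demo = [("voiced throughout", [1] * 10),
            ("devoiced mid", [1, 1, 0, 0, 0, 0, 0, 0, 0, 1])]
    print(log_odds_by_group(demo))
```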
3 Results
3.1 Temporal measures
Figure 1 shows, for all three languages and by consonant, the duration of the medial consonant (right panels) and of the preceding vowel (left panels). These two measures are often regarded as the most important acoustic cues to the voicing distinction (see Introduction). Consonant durations are very similar across languages, and voiceless consonants are considerably longer than their voiced counterparts. For preceding vowel duration there are clear language effects: Italian has the longest vowels (before both voiceless and voiced consonants) and German the shortest, which is not surprising given the lax quality of the recorded German vowels. EP durations lie between those of the other two languages. Further, in all three languages voiced consonants are preceded by longer vowels than voiceless consonants. In sum, all three languages show substantial differences in both medial consonant duration and preceding vowel duration between voiced and voiceless consonants. Initial consonant durations are not reported here, because the preceding word boundary increases the variability of the measured durations, so a comparison would not be meaningful.
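The means and ±1 standard errors plotted in Figure 1 can be computed along the following lines. This is a sketch under assumptions: the standard error is taken as the sample standard deviation divided by the square root of n over individual tokens (the text does not state whether tokens or speaker means were used), the function names are hypothetical, and the durations in the usage example are made-up values, not data from this study.

```python
import math
import statistics
from collections import defaultdict


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of a sequence of durations."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two values for a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def summarise_by_consonant(tokens):
    """tokens: iterable of (consonant, duration_ms) -> {consonant: (mean, se)}."""
    groups = defaultdict(list)
    for consonant, duration_ms in tokens:
        groups[consonant].append(duration_ms)
    return {c: mean_and_se(ds) for c, ds in groups.items()}


if __name__ == "__main__":
    # Made-up durations (ms), for illustration only.
    demo = [("t", 110.0), ("t", 120.0), ("t", 130.0),
            ("d", 70.0), ("d", 80.0), ("d", 90.0)]
    for consonant, (m, se) in summarise_by_consonant(demo).items():
        print(f"/{consonant}/: {m:.1f} ms ± {se:.1f}")
```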

Mean and ± 1 standard error of the first (preceding) target vowel C
3.2 Cross-linguistic comparisons of phonologically voiced obstruents
Since we are mainly interested in the time course of devoicing, we needed a precise measure of voicing status covering the whole obstruent, from the onset of the constriction to the onset of the following phoneme. Such a measure makes the temporal dynamics of devoicing clearly visible (see the voicing profiles in Figures 2–4) and thus allows us to examine where devoicing occurs during the consonant.

Voicing profiles of medial consonants for EP (top row of panels), German (second row) and Italian (bottom row). Shown on the x-axis are the percentages of the stop/fricative duration from the offset of the preceding vowel (0%) to the onset of the following vowel (100%). Profiles for stops are displayed on the left side (/p t k b d ɡ/) and profiles for fricatives (/f s ʃ v z ʒ/) on the right side. Each curve is the mean over all speakers of one language with all four vowel contexts, that is, each data point is the mean of 216 items (6 speakers × 4 vowel contexts × 9 repetitions). The postalveolar voiceless fricative /ʃ/ is represented by the symbol ‘S’, the voiced counterpart /ʒ/ is not plotted due to constraints in German and Italian.

Voicing profiles for the initial consonant position for EP (top two rows of panels), German (third row) and Italian (bottom row). Shown on the x-axis are the percentages of the stop/fricative duration from the offset of the preceding vowel (0%) to the onset of the following vowel (100%). Profiles for stops are displayed on the left side (/p t k b d ɡ/) and profiles for fricatives (/f s ʃ v z ʒ/) on the right side.

Voicing profiles for /b d ɡ/ in medial position, according to vowel height context. High vowel context (/iCi/) is on the left, and low vowel context (/aCa/) on the right for EP (top row of panels), German (second row) and Italian (bottom row). Shown on the x-axis are the percentages of the stop/fricative duration from the offset of the preceding vowel (0%) to the onset of the following vowel (100%).
Möbius and his colleagues (Möbius, 2004; Shih et al., 1999) introduced such a time-dependent measure with their voicing profiles. To obtain a voicing profile, the length-normalised consonant is sampled at 10 equidistant landmarks, and each of the 10 landmarks is assigned its own binary voicing status, with voicing decisions extracted from the acoustic signal. Thus, for each consonant a (normalised) voicing curve consisting of 10 equidistant landmarks is obtained, with the first landmark corresponding to the onset and the 10th landmark corresponding to the offset of the consonant (i.e., the beginning of the next phoneme). As described in the methods section, stress was held identical across all languages, so that this factor does not influence the resulting voicing maintenance.
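Averaging the binary profiles over tokens gives the curves plotted in Figures 2 to 4: at each landmark, the percentage of tokens that are voiced. A minimal sketch with hypothetical function names; the x-axis position of landmark k (counting from 0) is 100·k/9 % of the consonant duration, since the 10 landmarks span the whole consonant.

```python
def landmark_percent(n_points: int = 10) -> list[float]:
    """x-axis position (% of consonant duration) of each landmark."""
    return [100.0 * k / (n_points - 1) for k in range(n_points)]


def mean_profile(profiles) -> list[float]:
    """Percentage of tokens voiced at each landmark (profiles are 0/1 lists)."""
    profiles = list(profiles)
    if not profiles:
        raise ValueError("no profiles given")
    n_points = len(profiles[0])
    if any(len(p) != n_points for p in profiles):
        raise ValueError("all profiles must have the same number of landmarks")
    return [100.0 * sum(p[i] for p in profiles) / len(profiles) for i in range(n_points)]


if __name__ == "__main__":
    fully_voiced = [1] * 10
    devoiced_mid = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
    print(mean_profile([fully_voiced, devoiced_mid]))
    # -> [100.0, 100.0, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 100.0, 100.0]
```

In Figure 2, for example, each point pools 216 tokens (6 speakers × 4 vowel contexts × 9 repetitions) for one consonant and language.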
Figure 2 compares voicing profiles for the medial consonant position. As expected, voiceless consonants are clearly separated from their voiced counterparts (i.e., the two sets of three lines in each graph). For the phonologically voiced consonants, Figure 2 shows very clearly that the mean voicing profiles of German and EP are very similar and contrast strongly with the Italian voicing profiles. In German and EP, devoicing occurs very early in the consonant, mainly during the first third, and voicing resumes only at the consonant offset. Visual pairwise comparisons show no clear difference between stops and fricatives. In contrast to EP and German, Italian stops and fricatives consistently maintain voicing, with almost no devoicing throughout the consonant.
In sum, our plots show almost no devoicing for Italian and confirm the finding of Shih et al. (1999) of strong devoicing for German. Furthermore, our medial obstruent devoicing data clearly show that EP, the other Romance language examined, must be grouped with German rather than with Italian.
Comparing initial (Figure 3) and medial (Figure 2) consonants, only German shows a clear difference between the initial and medial intervocalic positions, with greater devoicing in initial position. This is in line with the literature: Ohala and Riordan (1979) expected more devoicing in initial than in medial position for aerodynamic reasons. Italian and EP show no difference between the two conditions; in Italian, all phonologically voiced consonants are in fact produced as voiced throughout the consonant.
The apparent difference in position effects between EP and German is unexpected: since the aerodynamic conditions responsible for positional devoicing should be more or less identical across languages, stronger devoicing in initial position would also be expected for EP. This suggests that the observed patterns are not determined purely by articulatory (or aerodynamic) conditions; rather, languages, possibly owing to their distinct phonological systems, ‘choose’ phonetically different realisations, or cues.
Figure 4 shows how the voicing profiles depend on the height of the following vowel. As described in the Introduction, aerodynamic conditions predict stronger devoicing for stops adjacent to low vowels than for stops adjacent to high vowels. The voicing profiles show that in EP and German, but not in Italian, devoicing differs with the height of the contextual vowels, being stronger in the low vowel context. The effect is more pronounced in German than in EP and occurs earlier during the consonantal closure. For both German and EP, this result is thus in line with Ohala and Riordan (1979) and Pape et al. (2006). The lack of vowel dependency in Italian is most likely due to the extremely high rate of voicing throughout the consonant closures: as devoicing is nearly absent for this language in our data, any possible influence of contextual vowel height is obscured.
3.3 Statistical validation at the obstruent midpoint
Logit mixed models (GLMMs) with a significance threshold of p < 0.05 were used to test for effects of language (EP, German and Italian), consonant position (initial and medial), manner of articulation (stop and fricative) and their interactions on devoicing at the midpoint of phonologically voiced obstruents (i.e., the statistical analysis was applied only to phonologically voiced, not to voiceless, consonants). Speaker, contextual vowel and repetition were random factors. Vowel context was treated as a random factor for two reasons: first, vowel identity and properties differ between the languages examined, so vowel context cannot serve as a valid predictive factor across languages; second, vowel height context affects only stops, so including it as a predictive factor for both stops and fricatives would be inappropriate.
The logit mixed models showed that voicing probability was significantly higher for Italian than for EP [z = −6.696, p < 0.001] or German [z = −7.86, p < 0.001], whereas EP and German did not differ significantly. Further, voicing probability was significantly higher for fricatives than for stops [z = −3.583, p < 0.001]. However, no significant difference was found between initial and medial consonant position.
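The z values reported here are Wald statistics of the kind shown in a binomial GLMM summary (an assumption about how they were obtained). Under a standard normal reference, the two-sided p-value is p = erfc(|z|/√2), which can be checked with Python’s standard library alone; the function name is hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a Wald z statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (1.96, -3.583, -6.696):
        print(f"z = {z:+.3f}: p = {two_sided_p(z):.2e}")
```

For z = 1.96 this gives p ≈ 0.05, and both z = −3.583 and z = −6.696 fall well below the 0.001 level.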
Moreover, the models showed significant interactions between (a) language and consonant position and (b) language and manner of articulation. For the former, EP and Italian show similar voicing probabilities in medial and initial position, while German shows substantially more devoicing in initial position [z = 5.291, p < 0.001], which explains the significant interaction (see Figure 5, left). The interaction between language and manner of articulation can be explained by the higher devoicing probability of stops compared with fricatives in EP [z = 2.605, p = 0.004] and German [z = 3.969, p < 0.001], but not in Italian (see Figure 5, right).

Significant interactions among the factors language, consonant position and manner of articulation. Shown are means ± 1 standard error of the voicing probability for the three examined languages EP, German and Italian (x-axis). The left-hand panel shows the difference in consonant position (initial vs. medial), the right-hand panel shows the difference in manner of articulation (stops vs. fricatives).
3.4 Results for European Portuguese
Our extensive corpus allows us to focus on obstruent devoicing in EP, which has seldom been studied in the past.
Our results indicate that EP consonant voicing patterns differ from those of Italian and are similar to those found for German. The Italian and (Mexican) Spanish voicing profiles in Shih et al. (1999) and our own Italian voicing profiles clearly show that EP voicing patterns do not resemble those of other Romance languages but behave like the voicing profiles of our German corpus. In the following, we therefore examine the devoicing characteristics of EP in greater detail, taking into account variability among native speakers and the influences of place and manner of articulation, consonant position and vowel height context.
3.4.1 Voicing profiles: speaker-dependency of voicing patterns
The following description of inter-speaker variability is rather limited owing to the small number of recorded speakers (six). Figure 6 shows EP’s voicing profiles for the medial consonant position. In line with the voicing profiles for all languages, all six EP speakers show a clear separation between voiced and voiceless consonants: for the voiceless consonants, voicing ceases abruptly within the first 20% of the obstruent duration and does not resume until the onset of the following vowel, while for the voiced consonants, voicing probability decreases more gradually during the first 40% of the obstruent, then plateaus, and rises again in the last third of the consonant.

Figure 6. Voicing profiles for European Portuguese medial stops (upper half) and fricatives (lower half). Each data point is the mean of 36 items (9 repetitions × 4 vowel contexts). Shown on the x-axis are the percentages of the stop/fricative duration from the offset of the preceding vowel (0%) to the onset of the following vowel (100%).
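The voicing profiles average a binary voicing decision over time-normalised obstruent durations, so that tokens of different lengths can be compared point by point. The sketch below shows one way to do this, assuming each token is already available as a sequence of per-frame voicing decisions (0 or 1) from some voicing detector. The function name, the nearest-frame sampling and the 11-point grid (0%, 10%, ..., 100%) are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def voicing_profile(tokens, n_points=11):
    """Mean voicing probability on a normalised time axis.

    tokens: iterable of 1-D sequences of per-frame voicing decisions (0 or 1),
        each spanning one obstruent from the offset of the preceding vowel
        to the onset of the following vowel. Tokens may differ in length.
    n_points: number of equally spaced sampling points from 0% to 100%.

    Returns (percent_axis, mean_profile) as NumPy arrays.
    """
    grid = np.linspace(0.0, 1.0, n_points)  # 0 = vowel offset, 1 = vowel onset
    sampled = []
    for frames in tokens:
        frames = np.asarray(frames, dtype=float)
        n = len(frames)
        # Nearest-frame sampling: map each grid point to a frame index,
        # clamping the 100% point to the last frame.
        idx = np.minimum((grid * n).astype(int), n - 1)
        sampled.append(frames[idx])
    # Averaging 0/1 decisions across tokens yields a voicing probability.
    return grid * 100.0, np.mean(sampled, axis=0)
```

Applied to the 36 tokens (9 repetitions × 4 vowel contexts) of one speaker and consonant, such a function yields one curve of the kind plotted per speaker in Figure 6. Nearest-frame sampling keeps each token's contribution binary; interpolating between frames would be an equally defensible choice.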
We found strong inter-speaker variation in the amount of devoicing (compare speaker 6 with speakers 1 and 3). Differences in devoicing across places of stop articulation show no consistent pattern across speakers. For one of the six speakers (speaker 5), there is a clear difference, with more and steeper devoicing for more posterior places of articulation. However, speakers 1 and 4 show weak evidence of the opposite trend, with less devoicing for velar stops than for the other places, so no overall pattern can be inferred. For the fricatives, there is no evidence of differences across places of articulation. With respect to manner of articulation, all speakers tend to show more voicing for fricatives than for stops, which runs counter to Ohala’s (1983) predictions.
In sum, the voicing profiles provide no evidence in EP for greater or steeper devoicing at more posterior places of articulation. In this respect, EP differs from both German (Pape et al., 2006) and English (Ohala & Riordan, 1979), where devoicing increases with more posterior place of articulation.
3.4.2 Voicing profiles: dependence on consonant position and manner of articulation
For both medial (see Figure 6) and initial (see Figure 7) consonant positions, the data for all our speakers show that all of EP’s phonologically voiced stops and fricatives undergo a substantial and stable amount of devoicing. Across speakers, devoicing of the phonologically voiced stops does not increase in initial position, contrary to what the literature on German (Pape et al., 2006) would lead one to expect; speaker 2 is the only exception. Comparing the influence of place across positions, we see stronger devoicing for velar stops in initial position, whereas this tendency is absent in medial position.

Figure 7. Voicing profiles for European Portuguese initial stops (upper half) and fricatives (lower half). Each data point is the mean of 36 items (9 repetitions × 4 vowel contexts). Shown on the x-axis are the percentages of the stop/fricative duration from the offset of the preceding vowel (0%) to the onset of the following vowel (100%).
3.4.3 Voicing profiles: vowel context effect for velar stops
The effect of vowel context on stop devoicing is expected to be strongest for velars: this posterior place of articulation has the smallest back cavity and thus most strongly favours devoicing, so any contextual effects should be most apparent for the velar stops. Figure 8 shows EP velar stop voicing profiles for the two vowel height contexts (high /i/ and low /a/), for both initial (upper panels) and medial position (lower panels).
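The link between cavity size and devoicing rests on the aerodynamic voicing constraint (Ohala, 1983): during closure, air flowing through the glottis raises oral pressure until the transglottal pressure drop is too small to sustain vocal-fold vibration. The sketch below is a deliberately simplified first-order (resistor-capacitor) model of this, assuming constant subglottal pressure, a fixed glottal resistance and a cavity compliance that is smaller for more posterior closures. All parameter values (8 cm H2O subglottal pressure, 2 cm H2O minimum pressure drop, the time constants) are illustrative assumptions, not measurements from this study, and the model ignores active cavity enlargement.

```python
import math


def voicing_duration_ms(tau_ms, p_sub=8.0, p_oral0=0.0, min_drop=2.0):
    """Time (ms) from closure onset until voicing is assumed to cease.

    Simplified RC model of a stop closure: with tau = R_glottis * C_cavity,
    the transglottal pressure drop decays as
        p_sub - p_oral(t) = (p_sub - p_oral0) * exp(-t / tau),
    and voicing is assumed to stop once the drop falls below min_drop.
    Pressures are in cm H2O; all defaults are illustrative assumptions.
    """
    initial_drop = p_sub - p_oral0
    if initial_drop <= min_drop:
        return 0.0  # too little pressure drop to sustain voicing at all
    return tau_ms * math.log(initial_drop / min_drop)


def voiced_fraction(tau_ms, closure_ms, **kwargs):
    """Share of a closure of length closure_ms that stays voiced in this model."""
    return min(1.0, voicing_duration_ms(tau_ms, **kwargs) / closure_ms)


# Hypothetical time constants: a larger (more compliant) cavity behind a
# labial closure gives a longer time constant than a velar closure.
for place, tau in [("labial", 40.0), ("velar", 20.0)]:
    print(place, round(voicing_duration_ms(tau), 1), round(voiced_fraction(tau, 80.0), 2))
```

In this model the time until voicing ceases scales linearly with the time constant, so halving the effective cavity compliance halves the voiced portion of the closure. The same function also shows why a longer closure leaves a smaller voiced share, which is relevant to the closure-duration argument in the Conclusions.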

Figure 8. Voicing profiles of EP velar stops in high (/i/ vowel context – dotted lines) and low vowel context (/a/ vowel context – solid lines). The upper panels show results for consonants in initial position, the lower panels for medial position. Shown on the x-axis are the percentages of the stop duration from the offset of the preceding vowel (0%) to the onset of the following vowel (100%).
As Figure 8 shows, for five of the six speakers, devoicing at both positions is substantially stronger when the velar stop is in a low vowel context. Speaker 5 is the exception, showing more devoicing in high vowel context in initial position, and partly also in medial position (specifically, during the first half of the consonant). Additionally, in initial position, speaker 2 shows no effect of vowel context. Thus, for five of the six speakers, the EP velar stop voicing patterns in medial position show that devoicing depends on vowel height context, as previously shown for German (Pape et al., 2006), English (Keating et al., 1983; Ohala & Riordan, 1979) and Swedish (Keating et al., 1983).
The next section examines the statistical validity of these voicing-profile observations, using measurements taken at the consonant midpoint.
3.4.4 Statistical validation at the obstruent midpoint
As in the cross-linguistic statistical validation, we fitted separate mixed-effects logit models to the EP data. Again, these are Generalized Linear Mixed Models (GLMMs) with a binomial distribution, whose fixed effects are tested with z statistics. With a significance threshold of p<0.05, we tested the effects of consonant position (initial and medial), vowel context (/i e o a/), place of articulation (/b d ɡ v z ʒ/), and their interactions on devoicing at the midpoint of phonologically voiced obstruents. As before, speaker and repetition were random factors. We ran one model for each manner of articulation, that is, one for stops and one for fricatives, because we wanted to compare within rather than across the stop and fricative classes.
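In a binomial logit mixed model of this kind, the probability that a token is voiced at the midpoint is the inverse logit of a linear predictor that sums the fixed effects for the token's condition and the random intercepts for its speaker and repetition. The sketch below only evaluates such a predictor. The coefficient values and offsets are invented for illustration, interaction terms are omitted, and the lme4-style formula in the comment is an assumed rendering of the model described above, not the authors' script.

```python
import math

# Assumed lme4-style rendering of the EP stop model (not the authors' script):
#   voiced ~ position * vowel * place + (1 | speaker) + (1 | repetition),
#   family = binomial(link = "logit")

# Hypothetical fixed effects on the logit scale, treatment-coded against a
# baseline of initial position, /a/ context and dental place. NOT fitted
# values; interaction terms are left out to keep the sketch short.
FIXED = {
    "intercept": -0.5,
    "position[medial]": 0.1,
    "vowel[i]": 1.2,
    "vowel[e]": 0.4,
    "vowel[o]": 0.9,
    "place[labial]": 0.2,
    "place[velar]": -0.8,
}


def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_voicing(position, vowel, place, speaker_offset=0.0, repetition_offset=0.0):
    """Predicted midpoint voicing probability for one condition, speaker and repetition."""
    eta = FIXED["intercept"] + speaker_offset + repetition_offset
    # Baseline levels have no coefficient and contribute 0 to the predictor.
    eta += FIXED.get(f"position[{position}]", 0.0)
    eta += FIXED.get(f"vowel[{vowel}]", 0.0)
    eta += FIXED.get(f"place[{place}]", 0.0)
    return inv_logit(eta)
```

For example, `predicted_voicing("initial", "a", "velar")` is lower than the dental baseline, and a positive `speaker_offset` shifts all of that speaker's predictions upward on the logit scale, which is how a random intercept absorbs inter-speaker differences in overall devoicing such as those seen in Figure 6.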
For EP fricatives, there was no significant effect of consonant position, vowel context or place of articulation.
For EP stops, vowel context had a significant effect, with higher voicing probability in the high vowel context /i/ than in the low vowel context /a/ [z=−3.143, p=0.002]. Further, stops in the back vowel context /o/ showed significantly higher voicing probability than in the low vowel context /a/ [z=−2.759, p=0.006]. None of the other vowel comparisons was significant. Consonant position (initial vs. medial) had no significant effect. For place of articulation, voicing probability was significantly lower for velars than for dentals [z=2.103, p=0.035].
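The p-values reported in this paragraph appear to correspond to two-sided Wald tests of the z statistics against the standard normal distribution. A minimal sketch of that conversion, using only the standard library (the function name is illustrative):

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a Wald z statistic under a standard normal null."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(z) / math.sqrt(2.0))


for label, z in [("/i/ vs /a/", -3.143), ("/o/ vs /a/", -2.759), ("velar vs dental", 2.103)]:
    print(f"{label}: z = {z:+.3f}, p = {two_sided_p(z):.3f}")
```

Rounded to three decimals, this reproduces the reported p=0.002, 0.006 and 0.035, so the sign of z only indicates the direction of the effect and does not change the p-value.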
Moreover, we found a significant interaction between vowel context and consonant position for the contrasts between /o/ and each of the other vowels (/a-o/, /e-o/, /i-o/): for the three vowel contexts /i e a/, devoicing was higher in initial position, whereas for /o/ it was higher in medial position. The data for the EP stops in the left panel of Figure 9 illustrate this contrasting devoicing behaviour across vowel contexts.

Figure 9. Voicing probability at the midpoint of EP phonologically voiced consonants across consonant position and vowel context: means ± 1 standard error. The left panel shows data for stops and the right panel for fricatives.
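Figure 9 summarises the binary voicing decision at the obstruent midpoint as a mean and standard error for each combination of consonant position and vowel context. The sketch below computes such a summary, assuming one (position, vowel, voiced) record per token and a standard error derived from the token-level sample standard deviation. This simple pooling ignores the speaker and repetition structure that the mixed models account for; the function name and example records are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def cell_means_se(records):
    """Mean and standard error of midpoint voicing per (position, vowel) cell.

    records: iterable of (position, vowel, voiced) tuples, voiced in {0, 1}.
    Returns {(position, vowel): (mean, se)}; se is NaN for single-token cells.
    """
    cells = defaultdict(list)
    for position, vowel, voiced in records:
        cells[(position, vowel)].append(float(voiced))
    summary = {}
    for key, values in cells.items():
        mean = statistics.mean(values)
        # Standard error of the mean from the sample standard deviation.
        if len(values) > 1:
            se = statistics.stdev(values) / math.sqrt(len(values))
        else:
            se = math.nan
        summary[key] = (mean, se)
    return summary


# Hypothetical records, for illustration only.
example = [("initial", "a", 0), ("initial", "a", 1), ("initial", "a", 1),
           ("initial", "a", 0), ("medial", "i", 1), ("medial", "i", 1)]
print(cell_means_se(example))
```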
In sum, the statistical analysis supports our observations on the EP voicing patterns. Stop devoicing was significantly higher in low than in high vowel context. There was no statistically significant difference between initial and medial consonant position. Further, velar stops showed more devoicing than dental stops. Consistent with the voicing profiles in Figures 6 and 7, Figure 10 shows that this increased devoicing for velar stops is present only in initial position, not in medial position.

Figure 10. Voicing probability at the midpoint for all European Portuguese consonants (phonologically voiced and voiceless) across initial and medial consonant position: means ± 1 standard error.
4 Discussion
The comparison of our data with the voicing profiles published by Shih et al. (1999) indicates that both our German and our Italian data are generally in accordance with the devoicing reported there. Our German voicing profiles do show stronger devoicing than reported by Shih et al. (1999), but their study addressed different research questions. As a result, it incorporated a large phoneme corpus without controlling for certain factors known to influence devoicing, such as position, vowel context or speaker. For example, including Spanish intervocalic stops is problematic when examining devoicing behaviour, since they are generally produced as approximants and are thus very unlikely to devoice. Concerns of this sort limit the usefulness of Shih et al.’s (1999) data for cross-linguistic comparison of different language families. To avoid such concerns for our EP corpus, we carefully controlled for reduction and lenition processes.
Our data confirm and extend previous findings of significantly higher devoicing of both stops and fricatives in German than in Italian (north-eastern variety). Additionally, the data provide strong evidence that voicing in EP behaves very differently from voicing in two other Romance languages (Spanish (Shih et al., 1999) and our Italian data). A comparison of the voicing profiles and statistical results for EP with those of Shih et al. (1999) and with our German and Italian data makes it evident that EP does not behave like the other two Romance languages with regard to devoicing of phonologically voiced consonants. This result, which differentiates EP from Italian on the one hand and groups it with German on the other, is supported by the mixed-model analysis of voicing status at the consonantal midpoint, which showed significant differences between Italian and EP and between Italian and German, but not between EP and German.
4.1 Differences between consonant classes: voicing and manner
For the three languages examined, the comparison of the voiced consonants with their voiceless counterparts confirms a consistent and stable difference between the two classes. Even when devoicing was very strong and extended throughout the entire closure or constriction, the voicing curves of the devoiced consonants did not become similar to those of their voiceless analogues: in all three languages, even the most strongly devoiced items maintain a consistent difference from their voiceless counterparts in both the amplitude and the shape of the voicing profiles.
All the phonologically voiced stops and fricatives examined in EP showed unexpectedly high devoicing at all places of articulation. For the voiced fricatives, this is in line with the high level of devoicing found for EP by Jesus and Shadle (2002, 2003) and Pinho et al. (2012), and it supports Ohala’s (1983) idea that voiced fricatives are prone to devoicing: voicing requires a sufficient transglottal pressure drop, whereas frication requires high oral pressure behind the constriction, so maintaining both at once is challenging. However, we found less devoicing for fricatives than for stops (see Figures 6 and 7), which runs counter to Ohala’s (1983) predictions.
4.2 Differences in position, vowel context, and place of articulation
For German, we found greater devoicing for consonants in intervocalic initial position than in intervocalic medial position. In contrast, apart from speaker 2, the EP speakers did not show this difference between intervocalic initial and medial stops, and the statistical analysis at the acoustic midpoint supported this finding: position was not significant for the EP data. This insensitivity to position in EP suggests that word-boundary effects are stronger in German than in EP, with higher devoicing in German intervocalic initial position (Pape et al., 2006).
We cannot compare our data with English, since the initial-position measures in Westbury and Keating (1985) were obtained from words produced in isolation, whereas our EP initial consonants, like those in the dataset of Pape et al. (2006), are intervocalic and preceded by a word boundary. Further, the comparisons for German initial and medial /s z/ and /ʒ/ are not very reliable, owing to the lack of /s/–/z/ contrast pairs and the restriction of /ʒ/ to loanwords.
Our analysis of the EP stop data shows that contrasts in vowel height context result in consistent differences between the corresponding voicing profiles. For EP, five of the six speakers consistently show this effect for velar stops in both initial and medial position. This was supported by the statistical analysis at the consonantal midpoint, where significant differences were obtained not only for the extreme /i-a/ contrast but also for the closer /i-e/ and /a-o/ contrasts. However, in medial position the /i-e/ and /a-o/ differences go in opposite directions (see Figure 9), so only the /i-a/ contrast is robust. Thus, our results for vowel context effects on EP consonant devoicing are in line with previous results for German (Pape et al., 2006) and English (Ohala & Riordan, 1979).
5 Conclusions
We have shown that EP, in contrast to Spanish (Shih et al., 1999) and Italian, exhibits strong devoicing throughout all phonologically voiced obstruents. Since all three languages belong to the Romance family, and voicing maintenance is expected to be more or less uniform across Romance languages, one would have expected more obstruent voicing in EP. Furthermore, typological differences between the languages make the strong devoicing in EP harder to account for: whereas in German the stops /p t k/ are usually aspirated (long-lag VOT), in EP they are unaspirated (short-lag VOT) (Lousada et al., 2010).
From a perceptual perspective, German can afford to give in to the natural aerodynamic tendency to devoice the phonologically voiced /b d ɡ/ (Ohala, 1983), because aspiration (among other cues) still distinguishes /b d ɡ/ from /p t k/. In EP, however, both /b d ɡ/ and /p t k/ are unaspirated and /b d ɡ/ are subject to devoicing; what, then, is left as the phonetic basis of the voiced–voiceless contrast? Apparently, EP fits neither the aspirated–unaspirated type of languages such as German or English, nor the strictly voiced–voiceless type such as Italian or the Slavic languages. In the following, we outline possible bases for the EP voicing distinction from a perceptual point of view; these remain speculative until they are tested in follow-up perceptual experiments (see the results for EP and Italian presented by Pape and Jesus (2014)).
First, EP may rely heavily on another phonetic correlate. Lousada et al. (2010) showed that /b d ɡ/ have shorter closure durations than /p t k/ in both word-medial and word-initial position. In German, on the other hand, /b d ɡ/ have shorter closure durations than /p t k/ word-medially but not word-initially (i.e., word-initially there is no reliable closure duration difference between the two stop series in German, whereas there is in EP). Closure duration may therefore be a more robust cue to the stop contrast in EP than in German. This may be connected to our finding that German, but not EP, shows more devoicing in initial than in medial position: if word-initial /b d ɡ/ closures are relatively longer in German than in EP, maintaining voicing throughout the closure would be more difficult in German than in EP.
A second possibility is that, given the frequent absence of a burst described for both voiced and voiceless EP stops, EP listeners have to rely on the voicing cue after all. Figure 2 and Figure 3 show that voiceless stops (and fricatives) are still more ‘radically’ voiceless than the voiced ones (i.e., voicing ceases faster for voiceless stops), and this gradient difference between partially devoiced and fully voiceless stops might be sufficient for listeners to extract a stable voicing distinction.
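One way to quantify the gradient difference invoked here, namely that voicing ceases faster in voiceless stops than in devoiced voiced stops, is the earliest normalised time at which a voicing profile drops below a criterion. The sketch below assumes profiles sampled on a shared percentage axis, as in the voicing-profile figures; the 0.5 criterion, the linear interpolation, the function name and the example profiles are hypothetical choices, not measurements from this study.

```python
def cessation_point(percent, profile, criterion=0.5):
    """Earliest time (% of obstruent duration) at which voicing probability
    falls below `criterion`, linearly interpolated between samples.
    Returns None if the profile never drops below the criterion."""
    if profile[0] < criterion:
        return percent[0]
    for i in range(1, len(profile)):
        p0, p1 = profile[i - 1], profile[i]
        if p1 < criterion:
            # p0 >= criterion > p1 here, so the denominator is positive.
            frac = (p0 - criterion) / (p0 - p1)
            return percent[i - 1] + frac * (percent[i] - percent[i - 1])
    return None


# Hypothetical profiles on a 0-50% stretch of the axis, for illustration only.
axis = [0, 10, 20, 30, 40, 50]
voiceless = [0.95, 0.40, 0.10, 0.05, 0.05, 0.05]
devoiced_voiced = [0.95, 0.85, 0.70, 0.55, 0.45, 0.45]
print(cessation_point(axis, voiceless), cessation_point(axis, devoiced_voiced))
```

With these invented profiles, the voiceless curve crosses the criterion at about 8% of the obstruent and the devoiced voiced curve at 35%, a difference in timing that a listener could in principle exploit even when both consonants are largely voiceless at the midpoint.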
From a speech production point of view, the unexpected results for EP could be explained by the rhythmic contrast between the languages examined. In contrast to other, syllable-timed Romance languages (e.g., Spanish and Italian), EP is normally reported to be a stress-timed language (Cruz-Ferreira, 1999; but see Frota and Vigário (2006) for evidence that EP shares properties of both stress-timed and syllable-timed languages), with reduction and neutralisation characteristics similar to those of other stress-timed languages such as German and English (Mairano & Romano, 2011). In these languages, segments between stresses tend to undergo substantial changes (e.g., vowel centralisation). Devoicing could then be seen as a by-product of the high-priority requirement to enhance and fully maintain certain (stressed) segments. In other words, maintaining voicing is costly in terms of speech economy, so voicing can easily be sacrificed to strengthen other features. This assumption would clearly differentiate the devoicing characteristics of the two groups of languages examined in this study, separating the stress-timed languages (EP and German) from the syllable-timed language (Italian). The hypothesis is that stress-timed languages, which have to squeeze a certain number of segments into each interstress interval and are therefore prone to reduction and devoicing, prefer a laryngeal contrast that is not based on voicing (but, e.g., on aspiration, as in German). Syllable-timed languages are not under this pressure, which allows them to maintain voicing (and voicing differences) as a distinctive acoustic cue.
Appendix
CVCV items used in recording the EP, German and Italian databases, with information on whether each target item is a real word in these languages.
| item | Real word in EP | Real word in German | Real word in Italian |
|---|---|---|---|
| /papa/ | yes (father) | yes (father) | yes (pope, father) |
| /pepe/ | yes (pepper) | ||
| /pipi/ | yes (vagina) | yes (pee) | yes (pee) |
| /popo/ | yes (buttocks) | ||
| /baba/ | yes (dribble/drool) | ||
| /bebe/ | yes (baby) | yes (baby) | |
| /bibi/ | |||
| /bobo/ | yes (fool) | ||
| /tata/ | yes (sister) | ||
| /tete/ | |||
| /titi/ | |||
| /toto/ | yes (game) | yes (whole) | |
| /dada/ | yes (name, art movement) | ||
| /dede/ | |||
| /didi/ | yes (name) | ||
| /dodo/ | |||
| /kaka/ | yes (poo) | yes (poo) | |
| /keke/ | |||
| /kiki/ | yes (name) | ||
| /koko/ | yes (coconut) | yes (name) | yes (coconut) |
| /ɡaɡa/ | yes (fool) | yes (fool) | yes (fool) |
| /ɡeɡe/ | |||
| /ɡiɡi/ | |||
| /ɡoɡo/ | |||
| /fafa/ | yes (lot) | ||
| /fefe/ | |||
| /fifi/ | |||
| /fofo/ | yes (soft) | ||
| /sasa/ | |||
| /sese/ | |||
| /sisi/ | yes (name) | ||
| /soso/ | |||
| /zaza/ | |||
| /zeze/ | |||
| /zizi/ | |||
| /zozo/ | |||
| /ʃaʃa/ | |||
| /ʃeʃe/ | |||
| /ʃiʃi/ | |||
| /ʃoʃo/ | |||
| /ʒaʒa/ | |||
| /ʒeʒe/ | |||
| /ʒiʒi/ | |||
| /ʒoʒo/ |
CVCV: consonant, vowel, consonant, vowel; EP: European Portuguese
Acknowledgements
We would like to thank the Istituto di Scienze e Tecnologie della Cognizione (ISTC) in Padova, especially Claudio Zmarich, for enabling us to record our Italian data there and to use the soundproof room. We would also like to thank the phonetics laboratory at the ZAS in Berlin, especially Jörg Dreyer, for letting us record our German data there. Further, we thank Susanne Fuchs and Caterina Petrone for help with the linear mixed models for binomial data. We also thank all our subjects for their participation in our experiments.
Funding
This work was partially funded by National Funds through FCT – Foundation for Science and Technology, in the context of the project PEst-OE/EEI/UI0127/2014 to IEETA, and the post-doctoral fellowship from FCT (Portugal) SFRH/BPD/48002/2008 to Daniel Pape.
