Abstract
This study examined whether the discrimination accuracy of nonnative vowels can be predicted from how listeners assimilate nonnative phones into their L1. Japanese listeners discriminated English /æ/ and /ʌ/ better than /ɑ/ and /ʌ/, although they categorized all three stimuli as Japanese /a/. Because the acoustic distance between stimuli was held identical, this result was attributed not to an acoustic difference but to a category-goodness difference. The goodness-of-fit to the Japanese /a/ phoneme differed between the English /æ/ and /ʌ/ stimuli but not between the English /ɑ/ and /ʌ/ stimuli, suggesting that two nonnative vowels are more difficult to discriminate when the category-goodness difference between them is smaller. In addition, this study examined the relationship between perceptual assimilation and the focalization effect. Focalization produces a directional asymmetry: detecting a change from a more-focal to a less-focal vowel is more difficult than detecting a change in the opposite direction. The results demonstrated that this directional asymmetry was observed only when listeners assimilated two nonnative phones into a single L1 phonemic category with no category-goodness difference between them.
Keywords
1 Introduction
1.1 Perceptual assimilation of nonnative vowels and their discriminability
The difficulty of identifying and discriminating speech sounds is not the same for all nonnative phones: some nonnative phones are discriminated more easily than others (Best et al., 2001; Best & McRoberts, 2003; Best & Tyler, 2007; Tyler et al., 2014). According to the Perceptual Assimilation Model (PAM; Best, 1994a, 1994b, 1995), listeners tend to assimilate unfamiliar nonnative phones into the most articulatorily similar native-language phonemes, and this assimilation pattern predicts how well they discriminate nonnative consonants (Best et al., 2001; Best & Strange, 1992; Guion et al., 2000; Takagi, 1993) and vowels (Escudero & Williams, 2011; Grimaldi et al., 2014; Lengeris, 2009; Mokari & Werner, 2017; Strange et al., 1998; Sun & van Heuven, 2007; Tyler et al., 2014). However, discriminability has also been attributed to the acoustic distance between stimuli (Lengeris, 2009; Strange et al., 1998) and to language-universal vowel perception biases, such as a bias favoring focal vowels (Masapollo et al., 2015, 2017; Polka & Bohn, 2003, 2011; Schwartz et al., 2005; Tyler et al., 2014). In this study, we controlled for acoustic distance and examined whether perceptual assimilation patterns could predict the discriminability of nonnative vowels. We also analyzed the effect of focalization on directional asymmetry in nonnative vowel discrimination to test the hypothesis that directional asymmetry occurs only for nonnative contrasts exhibiting a particular type of perceptual assimilation.
Addressing the perception of English vowels by Japanese listeners, Strange et al. (1998) demonstrated that the English vowels /æ/, /ʌ/, and /ɑ/ were most frequently assimilated into the Japanese vowel /a/, but that English /æ/ was judged a worse exemplar of Japanese /a/ than the other two vowels. Lengeris (2009) observed that Japanese listeners had difficulty discriminating two English vowels when both were equally good or equally poor exemplars of the single L1 phoneme into which they were assimilated. In PAM (Best, 1994a, 1994b, 1995), this assimilation pattern is called single-category assimilation. Lengeris (2009) also demonstrated that discrimination was relatively easy for Japanese listeners when one English vowel was a better exemplar of the L1 phoneme than the other, a pattern known as category-goodness difference assimilation (Best, 1994a, 1994b, 1995). Shafer et al. (2021) examined behavioral and neural discrimination of English /æ/, /ʌ/, and /ɑ/ by Spanish, Japanese, and Russian listeners and found that both tended to be influenced by the listeners’ L1 phonology. However, because these previous studies used natural recordings of English vowels, differences in discriminability could be attributed not only to perceptual assimilation but also to the acoustic distance between stimuli: the English /ɑ/ and /ʌ/ vowels are spectrally closer to each other than the English /æ/ and /ʌ/ vowels (e.g., Hillenbrand et al., 1995; Kent & Read, 2002).
1.2 Directional asymmetries
When testing the discriminability of vowel contrasts, directional asymmetries related to the language-universal vowel perception bias must be considered. According to the Natural Referent Vowel (NRV) framework (Polka & Bohn, 2003, 2011), detecting a change from a more peripheral vowel (produced with a more front/back and higher/lower tongue position) to a less peripheral vowel is more difficult than detecting a change in the opposite direction. This directional asymmetry was originally reported as a peripherality effect (Polka & Bohn, 2003) but was later reinterpreted as a focalization effect (Polka & Bohn, 2011; Schwartz et al., 2005). Producing more peripheral vowels requires more extreme articulations, which bring adjacent formants closer together; this formant convergence (i.e., spectral energy that is narrowly concentrated and enhanced in amplitude) makes more focal vowels easy to detect in a sequence of less focal vowels (Polka & Bohn, 2011; Schwartz et al., 2005). Polka et al. (2023) later demonstrated that directional asymmetry depends more on the focalization of vowels than on their peripherality. Polka and Bohn (2011) claimed that focal vowels act as perceptual anchors guiding the development of native vowel categories. Therefore, once native phonological categories are established, directional asymmetry in the perception of native vowels fades or disappears, whereas it persists in the perception of nonnative vowels.
A large body of research has tested the effects of focalization on directional asymmetry in nonnative vowel discrimination (Harnsberger, 2001; Levy, 2009; Masapollo et al., 2015, 2017), and directional asymmetry has even been shown for consonant discrimination (Bundgaard-Nielsen et al., 2015; Dar et al., 2018; Tsushima et al., 2003). However, to the best of our knowledge, few studies have tested the effect of L1–L2 phonetic differences on directional asymmetry in vowel discrimination (Masapollo et al., 2015, 2017; Shafer et al., 2021; Tyler et al., 2014). For example, Tyler et al. (2014) examined the relationship between directional asymmetry and PAM. They hypothesized that the directional asymmetry caused by the focalization effect in the NRV framework would be observed only when listeners relied on phonetic rather than phonological information, because no directional asymmetry is observed for native vowels. Tyler et al. (2014) predicted that if listeners assimilated two nonnative phones into two different native phonemes (i.e., two-category assimilation in PAM), or assimilated one of them but not the other (i.e., uncategorized–categorized assimilation in PAM), no directional asymmetry would be observed; discrimination accuracy was predicted to be at ceiling because listeners would rely on native phonological categories. By contrast, listeners rely on phonetic perception when discriminating two nonnative phones that are assimilated into a single native phoneme category (i.e., single-category assimilation and category-goodness difference assimilation), in which case directional asymmetries in discriminability are expected to emerge. However, Tyler et al. (2014) found significant directional asymmetry only for the single-category type, not for the category-goodness difference assimilation type.
Tyler (2021b) later stated that sensitivity to language-independent phonetic information may be observed when attention is drawn to lower-order information (e.g., at the acoustic-phonetic level of perception). This suggests that the language-universal bias in phonetic perception (i.e., the focalization effect) is observed only when listeners do not perceive a difference in phonetic goodness-of-fit to a native phoneme.
In addition, Shport (2019) examined the relationship between directional asymmetry and perceptual assimilation patterns. Unlike Tyler et al. (2014), Shport found directional asymmetry even for the two-category assimilation type and suggested that acoustic-phonetic similarity between vowels might predict directional asymmetry better than the NRV framework does. Our study held the acoustic distance between vowels identical, tested how the vowels were perceived by English and Japanese listeners, and examined whether the focalization effect was observed only for a nonnative vowel contrast of the single-category assimilation type and not for one of the category-goodness difference assimilation type.
1.3 Present study
Two experiments were conducted in this study. Experiment I examined how English and Japanese listeners categorized a resynthesized vowel continuum with a constant duration and measured goodness-of-fit ratings relative to their respective native vowels. Based on these results, three stimuli were selected for Experiment II. To test our hypotheses, the stimuli had to be identified as three separate English vowels (i.e., /æ/, /ʌ/, and /ɑ/) by English listeners but categorized as a single Japanese vowel, /a/, by Japanese listeners. Experiment II examined how accurately English and Japanese listeners discriminated the English /æ/-/ʌ/ and /ɑ/-/ʌ/ contrasts. Following the predictions of PAM (Best, 1994a, 1994b), if the /ɑ/-/ʌ/ pair fell into the single-category assimilation type and the /æ/-/ʌ/ pair into the category-goodness difference assimilation type, Japanese listeners were expected to find /ɑ/-/ʌ/ more difficult to discriminate than /æ/-/ʌ/. In contrast, English listeners were expected to discriminate both pairs equally well because /æ/, /ʌ/, and /ɑ/ are distinct phonemes in English. After the perceptual assimilation effect was confirmed, directional asymmetry was examined. For Japanese listeners, detecting a change from /ɑ/ to /ʌ/ was expected to be more difficult than detecting a change in the opposite direction because of the vowel focalization effect. However, no such asymmetry was predicted for the /æ/-/ʌ/ contrast because Japanese listeners were expected to perceive a difference in phonetic goodness-of-fit to Japanese /a/ between English /æ/ and /ʌ/.
2 Experiment I: identification test with goodness rating
2.1 Method
2.1.1 Participants
Table 1 presents the participant information. Participants were native monolingual speakers of American English or Japanese. All participants reported no history of speech or hearing impairment and had not lived outside their home country for more than 4 months. The Japanese listeners had received English education at school (Mean = 11.2 years, SD = 3.0), but none had lived in an English-speaking country for more than a month, which falls within the less-than-3-month criterion for the “less-experienced” group described in work on the PAM of Second Language Speech Learning (PAM-L2; Best & Tyler, 2007; Tyler, 2021a). Each participant’s parents were native speakers of the participant’s language, and the participants used their native language in daily life. This study was approved by the ethics review boards of Waseda University (Tokyo, Japan) and the University of Delaware (Newark, DE, USA). All participants provided informed consent.
Participant Information.
2.1.2 Stimuli
Using linear predictive coding (LPC) analysis and resynthesis in Praat (Boersma & Weenink, 2017), we created a stimulus continuum varying in F2 frequency. A neutral LPC residual from a female speaker was filtered with fixed formant values (F1 = 979 Hz, F3 = 2886 Hz, and F4 = 4151 Hz), and F2 was varied from 995 to 2698 Hz in 66 interpolated steps. All other formant frequencies, the fundamental frequency (F0), and the amplitude of the isolated vowel stimuli were held constant. F0 was set to 220 Hz, and intensity was normalized across the 66 stimuli by root-mean-square (RMS) amplitude in Praat. This process was carried out twice to generate two versions of the 66-stimulus continuum: long (308 ms) and short (138 ms). The long version was used for the identification test in Experiment I to facilitate the perception of isolated vowels, and the short version was used for the discrimination test in Experiment II. However, the longer duration may have influenced the English listeners’ perception.
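Assuming the 66 steps were spaced linearly in hertz (the text says only that the continuum was interpolated), the step size and the positions of the three stimuli later selected for Experiment II can be checked with a short Python sketch. Under that assumption the step is 26.2 Hz, so the 1493-, 1755-, and 2017-Hz stimuli would be steps 20, 30, and 40 (after rounding to whole hertz), ten steps or 262 Hz apart.

```python
import numpy as np

# Reported F2 endpoints (Hz) and number of continuum steps.
F2_MIN, F2_MAX, N_STEPS = 995.0, 2698.0, 66

# Assumption: linear spacing in Hz; np.linspace includes both endpoints.
f2 = np.linspace(F2_MIN, F2_MAX, N_STEPS)
step = f2[1] - f2[0]  # (2698 - 995) / 65 = 26.2 Hz

# F2 values (rounded to whole Hz in the text) of the stimuli used in Experiment II.
selected = {"ɑ": 1493, "ʌ": 1755, "æ": 2017}
positions = {}
for vowel, target in selected.items():
    idx = int(np.argmin(np.abs(f2 - target)))  # 0-based index of the nearest step
    positions[vowel] = idx + 1                 # 1-based step number
    print(f"/{vowel}/: step {idx + 1}, F2 = {f2[idx]:.1f} Hz")

print(f"step = {step:.1f} Hz; 10 steps = {10 * step:.0f} Hz")
```

If the continuum was instead interpolated on another scale (e.g., Bark or mel), the step numbers above would not hold.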
2.1.3 Procedure
Twenty-seven English listeners and 30 Japanese listeners completed the identification test with goodness ratings at two locations (Delaware, U.S.; Tokyo, Japan). In a soundproof booth, participants heard the 66-stimulus continuum (F2 = 995–2698 Hz) through headphones (Sennheiser HD 280 Pro). The stimuli were presented in random order, and participants identified each vowel. Before the experiment, participants completed practice trials with 10 randomly selected stimuli. English listeners saw five words on the screen (head, had, hud, hod, and hawed) and clicked the one containing the vowel they thought they had heard. Japanese listeners did the same with three moras written in hiragana (Japanese syllabary characters): “へ” /he/, “は” /ha/, and “ほ” /ho/. Participants then rated how well the sound matched the chosen vowel on a 7-point scale (7 = best, 1 = poorest). Each of the 66 stimuli was presented four times, for a total of 264 trials.
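The trial randomization described above can be sketched in Python. This is an illustration under assumptions, not the authors’ experiment script: it assumes that all 264 trials were fully randomized in a single block and that the 10 practice stimuli were drawn without replacement, and the function names are hypothetical.

```python
import random

N_STIMULI = 66  # continuum steps
N_REPS = 4      # presentations per stimulus

def identification_trials(seed=None):
    """Hypothetical helper: each stimulus (1..66) appears N_REPS times, in random order."""
    rng = random.Random(seed)
    trials = [stim for stim in range(1, N_STIMULI + 1) for _ in range(N_REPS)]
    rng.shuffle(trials)
    return trials

def practice_trials(n=10, seed=None):
    """Hypothetical helper: n distinct stimuli drawn at random for the practice block."""
    rng = random.Random(seed)
    return rng.sample(range(1, N_STIMULI + 1), n)

trials = identification_trials(seed=1)
print(len(trials))  # 264
```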
2.2 Results
Figure 1 shows the American English listeners’ vowel identification rates for each stimulus. Because English speakers in Delaware often do not distinguish /ɑ/ from /ɔ/ (Labov et al., 2005), /ɑ/ and /ɔ/ responses by English listeners are combined and reported as /ɑ/ below. The stimuli with F2 values of 2017, 1755, and 1493 Hz were most frequently identified by English listeners as three different English vowels: /æ/, /ʌ/, and /ɑ/, respectively. One could argue that the stimulus with an F2 value of 1755 Hz should be treated not as English /ʌ/ but as an “uncategorized” stimulus because of its lower categorization rate (43%; Figure 1). However, this may not be a problem for our predictions, because both the two-category and the uncategorized–categorized assimilation types predict high discrimination accuracy and no directional asymmetry (Tyler et al., 2014). Relative to typical productions of English /ʌ/, this stimulus had a high F1 and a long duration, which might have contributed to its lower categorization rate.

Scatterplots of English listeners’ identification rates for the 66-stimulus continuum varying only in F2, by English vowel category (/ɛ/, /æ/, /ʌ/, /ɑ/, and /ɔ/). A loess curve (span α = 0.5) is fitted to the data, with gray shading representing the 95% confidence interval. The vertical dashed lines mark the F2 frequencies of 2017, 1755, and 1493 Hz.
Figure 2 displays the English listeners’ goodness-of-fit ratings for the three stimuli with F2 values of 2017, 1755, and 1493 Hz as English /æ/, /ʌ/, and /ɑ/, respectively. Following Masapollo et al. (2017), trials on which a stimulus was not identified as the intended vowel (i.e., English /æ/, /ʌ/, or /ɑ/) were assigned a rating of 0. The goodness ratings for the three stimuli were analyzed with a cumulative link mixed-effects model with a logit link function, with stimulus as a fixed effect and by-participant random intercepts. The goodness-of-fit rating for English /ʌ/ was significantly lower than that for English /æ/, β = 0.76, SE = 0.26, z = 2.95, p < .01, suggesting that the 1755-Hz stimulus fit the English /ʌ/ category relatively poorly compared with how well the 2017-Hz stimulus fit English /æ/. The difference in rating between English /ɑ/ and /ʌ/ was marginally significant, β = 0.44, SE = 0.26, z = 1.66, p = .097, and no significant difference was found between English /æ/ and /ɑ/, β = −0.33, SE = 0.25, z = −1.29, p > .05. Although the 1755-Hz stimulus received a relatively low goodness rating, we labeled it English /ʌ/ because /ʌ/ was its most frequent response (43%). For ease of exposition, the three stimuli with F2 values of 2017, 1755, and 1493 Hz are hereafter referred to as the English /æ/, /ʌ/, and /ɑ/ stimuli, respectively.

Boxplots of goodness ratings for the three stimuli (F2 = 2017, 1755, and 1493 Hz) identified as the English vowels /æ/, /ʌ/, and /ɑ/, respectively, by English listeners. Dots show means, which include the 0 scores assigned for nonselection; lines in the boxes show medians. Because the 1755-Hz stimulus was identified as /ʌ/ on only 43% of trials, more than half of its scores were 0, so its median goodness score was 0.
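The zero-coding rule and its consequence for the medians in the goodness-rating boxplots can be illustrated with a small Python sketch. The responses below are invented for illustration only. The point is that when the intended vowel is chosen on fewer than half of the trials, more than half of the coded scores are 0, so the median is 0 while the mean stays above 0. The coded outcome analyzed by the cumulative link model therefore has eight ordered levels (0 to 7).

```python
from statistics import mean, median

def coded_goodness(chosen_vowel, rating, intended_vowel):
    """Goodness score for one trial: the 1-7 rating if the intended vowel was chosen, else 0."""
    if not 1 <= rating <= 7:
        raise ValueError("ratings are on a 1-7 scale")
    return rating if chosen_vowel == intended_vowel else 0

# Invented trials for one stimulus: (chosen vowel, 1-7 rating).
# /ʌ/ is chosen on 3 of 7 trials (about 43%).
trials = [("ʌ", 5), ("ʌ", 4), ("ʌ", 6), ("ɑ", 6), ("æ", 3), ("ɑ", 5), ("ɛ", 2)]
scores = [coded_goodness(v, r, "ʌ") for v, r in trials]

print(scores)                  # [5, 4, 6, 0, 0, 0, 0]
print(median(scores))          # 0
print(round(mean(scores), 2))  # 2.14
```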
Figure 3 shows the Japanese listeners’ vowel identification rates for each stimulus. All three stimuli identified as English /æ/, /ʌ/, and /ɑ/ by English listeners were most frequently identified as a single Japanese vowel, /a/. Although the Japanese /a/ identification rate for the English /æ/ stimulus was relatively low (49%), the loess-smoothed categorization curve exceeded 50% /a/ identification for all three stimuli. This 50% categorization criterion has been used in previous studies (Bundgaard-Nielsen et al., 2011; Faris et al., 2018).

Scatterplots of Japanese listeners’ identification rates for the 66-stimulus continuum varying only in F2, by Japanese vowel category (/a/, /e/, and /o/). A loess curve (span α = 0.5) is fitted to the data, with gray shading representing the 95% confidence interval. The vertical dashed lines mark the F2 frequencies of 2017, 1755, and 1493 Hz.
Figure 4 displays the goodness-of-fit ratings of the three stimuli as Japanese /a/. As in the analysis of the English listeners’ data, trials on which a stimulus was not identified as the intended vowel (i.e., Japanese /a/) were assigned a rating of 0. A cumulative link mixed-effects model was fitted with stimulus as a fixed effect and by-participant random intercepts. The English /æ/ stimulus was a significantly worse exemplar of Japanese /a/ than the /ʌ/ stimulus, β = −0.52, SE = 0.24, z = −2.12, p = .034, whereas the /ʌ/ and /ɑ/ stimuli did not differ significantly in goodness as Japanese /a/, β = 0.18, SE = 0.24, z = 0.74, p > .05. These results suggest that the English /æ/-/ʌ/ contrast falls into the category-goodness difference assimilation type and the English /ɑ/-/ʌ/ contrast into the single-category assimilation type.

Boxplots of goodness ratings for the three stimuli (F2 = 2017, 1755, and 1493 Hz) identified as the Japanese vowel /a/ by Japanese listeners. Dots show means, which include the 0 scores assigned for nonselection; lines in the boxes show medians. Because the 2017-Hz stimulus was identified as /a/ on only 49% of trials, more than half of its scores were 0, so its median goodness score was 0.
Note that when a maximal model including by-participant random slopes for stimulus was used for the goodness-of-fit analyses (Barr et al., 2013), the results differed from those described above. For Japanese listeners, the maximal model showed no significant difference in goodness rating between /æ/ and /ʌ/ or between /ɑ/ and /ʌ/, whereas the simpler model showed a significant difference between /æ/ and /ʌ/. This leaves open the possibility that Japanese listeners assimilated both contrasts as single-category pairs. Nevertheless, the discrepancy might reflect the loss of statistical power (i.e., an increased Type II error rate) associated with maximal models (Matuschek et al., 2017). Apart from this, the two models yielded no substantively different results. Following Tyler et al. (2014), who used t-tests to determine perceptual assimilation types, we used the simpler model with a fixed effect of stimulus and by-participant random intercepts.
Thus, for the Japanese participants, the English /æ/-/ʌ/ contrast was interpreted as the category-goodness difference assimilation type and the English /ɑ/-/ʌ/ contrast as the single-category assimilation type. Adjacent stimuli among those with F2 values of 2017, 1755, and 1493 Hz were separated by an identical F2 distance (262 Hz), and these three stimuli were selected for the discrimination test to examine our hypotheses.
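The decision logic used in Experiment I to assign assimilation types can be summarized in a Python sketch. It is a simplification under stated assumptions, not part of the original analysis: categorization applies the more-than-50% criterion to whatever identification rates are supplied (the study applied it to the loess-smoothed curve, not to raw rates), and whether two stimuli differ in goodness is passed in as a Boolean standing in for the mixed-model test. The function names are hypothetical.

```python
from typing import Dict, Optional

CRITERION = 0.5  # categorized if the modal L1 response exceeds 50%

def categorize(id_rates: Dict[str, float]) -> Optional[str]:
    """Return the modal L1 category if its identification rate exceeds the criterion, else None."""
    modal = max(id_rates, key=id_rates.get)
    return modal if id_rates[modal] > CRITERION else None

def pam_type(cat_a: Optional[str], cat_b: Optional[str], goodness_differs: bool) -> str:
    """Simplified PAM assimilation type for a pair of nonnative stimuli."""
    if cat_a is None and cat_b is None:
        return "uncategorized-uncategorized"
    if cat_a is None or cat_b is None:
        return "uncategorized-categorized"
    if cat_a != cat_b:
        return "two-category"
    return "category-goodness difference" if goodness_differs else "single-category"

# Usage with the outcomes reported for Japanese listeners (all three stimuli -> /a/).
print(pam_type("a", "a", goodness_differs=True))   # /æ/-/ʌ/: category-goodness difference
print(pam_type("a", "a", goodness_differs=False))  # /ɑ/-/ʌ/: single-category
```

Fed a raw rate of 49%, `categorize` would return `None`, which is why the choice between raw and smoothed rates matters for the /æ/ stimulus.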
3 Experiment II: auditory discrimination
3.1 Method
3.1.1 Participants
Of the participants in Experiment I, 27 English and 21 Japanese listeners took part in Experiment II.
3.1.2 Stimuli
Table 2 presents the acoustic information for the resynthesized stimuli used in the auditory discrimination test. The F2 values of the /æ/, /ʌ/, and /ɑ/ stimuli were 2017 Hz (13.07 Bark), 1755 Hz (12.14 Bark), and 1493 Hz (11.06 Bark), respectively. Compared with the /ʌ/ stimulus, the /æ/ stimulus showed a weak F2–F3 convergence and the /ɑ/ stimulus a weak F1–F2 convergence (Figure 5). The stimuli used for auditory discrimination were 138 ms long.
Acoustic Information for the Resynthesized Stimuli.
B1–B4 denote the bandwidths of F1–F4.

Wideband spectrograms (top) and the first three formant frequencies (bottom) of the three stimuli used for the discrimination test in Experiment II (F2 = 2017, 1755, and 1493 Hz).
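The Bark values reported for the three stimuli can be reproduced with Traunmüller’s (1990) formula, z = 26.81f/(1960 + f) − 0.53. That this is the conversion the authors used is an assumption, but it matches all three reported values to two decimals. The Python sketch below also shows that the equal 262-Hz F2 steps correspond to slightly unequal auditory steps: about 0.93 Bark for /æ/-/ʌ/ and about 1.07 Bark for /ɑ/-/ʌ/.

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) critical-band rate, without the low/high-end corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

f2_hz = {"æ": 2017, "ʌ": 1755, "ɑ": 1493}
f2_bark = {v: hz_to_bark(f) for v, f in f2_hz.items()}

for v, z in f2_bark.items():
    print(f"/{v}/: {f2_hz[v]} Hz = {z:.2f} Bark")

# The Hz distances are equal (262 Hz), but the Bark distances are not.
print(f"/æ/-/ʌ/: {f2_bark['æ'] - f2_bark['ʌ']:.2f} Bark")
print(f"/ɑ/-/ʌ/: {f2_bark['ʌ'] - f2_bark['ɑ']:.2f} Bark")
```

The omitted corrections apply only below about 2 Bark and above about 20.1 Bark, so they do not affect these F2 values.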
3.1.3 Procedure
Using these three stimuli, we conducted an oddball auditory discrimination test with 27 English and 21 Japanese listeners in a soundproof booth at each location (Delaware, U.S.; Tokyo, Japan). On each trial, three numbers were displayed on the screen, and participants heard three stimuli through headphones (Sennheiser HD 280 Pro) at a comfortable level. They were asked to click the number of the stimulus that sounded different from the other two. The inter-stimulus interval was 1.2 s. Two vowel contrasts (/æ/-/ʌ/ and /ɑ/-/ʌ/), two possible odd stimuli per contrast, and three possible positions of the odd stimulus yielded 12 unique trials. Each unique trial was presented six times, so each participant completed 72 discrimination trials. No feedback was given, and participants could not replay the stimuli.
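The factorial structure of the discrimination trials (2 contrasts × 2 odd stimuli × 3 positions = 12 unique trials; × 6 repetitions = 72 trials) can be enumerated in Python. The sketch assumes that the repetitions were fully randomized within a single session; the text does not say how trials were ordered, and the names are hypothetical.

```python
import random
from itertools import product

CONTRASTS = {"æ-ʌ": ("æ", "ʌ"), "ɑ-ʌ": ("ɑ", "ʌ")}
POSITIONS = (1, 2, 3)  # serial position of the odd stimulus in the triad
N_REPS = 6

def unique_trials():
    """All 12 unique oddball triads: contrast x odd stimulus x odd position."""
    trials = []
    for (name, pair), odd_idx, pos in product(CONTRASTS.items(), (0, 1), POSITIONS):
        odd, standard = pair[odd_idx], pair[1 - odd_idx]
        triad = [standard] * 3
        triad[pos - 1] = odd
        trials.append({"contrast": name, "odd": odd, "position": pos, "triad": tuple(triad)})
    return trials

def session(seed=None):
    """Hypothetical session: each unique trial repeated N_REPS times, in shuffled order."""
    rng = random.Random(seed)
    trials = [dict(t) for t in unique_trials() for _ in range(N_REPS)]
    rng.shuffle(trials)
    return trials

print(len(unique_trials()), len(session(seed=0)))  # 12 72
```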
3.2 Results
Figure 6 displays the auditory discrimination accuracies for the two vowel contrasts (/æ/-/ʌ/ and /ɑ/-/ʌ/) and the 95% confidence intervals of the correct-response probabilities predicted by a logistic mixed-effects model. The model was fitted to the correct/incorrect responses on all trials and was selected top-down (i.e., fixed and random effects that did not significantly improve model fit were removed) on the basis of the Akaike Information Criterion (AIC). The final model included fixed effects of language group (English, Japanese), vowel contrast (/æ/-/ʌ/, /ɑ/-/ʌ/), and their interaction, with orthogonal contrast coding for each fixed effect. The random effects were by-participant and by-token intercepts (tokens being the 12 unique trials) and by-participant random slopes for vowel contrast. Neither language group, β = −0.14, SE = 0.12, z = −1.16, p > .05, nor vowel contrast, β = 0.04, SE = 0.23, z = 0.19, p > .05, had a significant effect on discrimination accuracy. However, the interaction between language group and vowel contrast was significant, β = 0.25, SE = 0.12, z = 2.03, p = .043, indicating that the effect of vowel contrast on discrimination accuracy differed between the language groups. Although the F2 distance in hertz between /æ/ and /ʌ/ was the same as that between /ɑ/ and /ʌ/, Japanese listeners discriminated the /æ/-/ʌ/ contrast more accurately than the /ɑ/-/ʌ/ contrast, whereas English listeners discriminated both contrasts equally well.
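As a minimal illustration of what orthogonal contrast coding means for a 2 × 2 design such as language group × vowel contrast, the Python sketch below builds the predictor columns with hypothetical ±0.5 codes. The paper does not report the code values or which level received the positive sign, so the signs of the reported β values should not be read off this sketch. Across the four design cells, each column sums to zero and the three columns are mutually orthogonal, so each main effect is estimated at the average of the other factor rather than at a reference level.

```python
from itertools import product

import numpy as np

# Hypothetical +/-0.5 codes; the actual values and sign assignment are not reported.
LANGUAGE = {"English": -0.5, "Japanese": 0.5}
CONTRAST = {"ɑ-ʌ": -0.5, "æ-ʌ": 0.5}

# One row per design cell: [language, contrast, language x contrast].
X = np.array([
    [LANGUAGE[lang], CONTRAST[con], LANGUAGE[lang] * CONTRAST[con]]
    for lang, con in product(LANGUAGE, CONTRAST)
])

print(X.sum(axis=0))  # each column is centered (sums to 0)
print(X.T @ X)        # off-diagonal entries are 0: the predictors are orthogonal
```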

Boxplots (left) of discrimination accuracy for the /æ/-/ʌ/ and /ɑ/-/ʌ/ contrasts by English and Japanese listeners and the 95% confidence intervals (right) of correct response probabilities in discrimination predicted by the logistic mixed-effects model used for analyses.
Figure 7 displays the directional asymmetry in discrimination accuracy for the English /æ/-/ʌ/ and /ɑ/-/ʌ/ contrasts for English and Japanese listeners, together with the 95% confidence intervals of the correct-response probabilities predicted by a logistic mixed-effects model. To examine directional asymmetry, only trials with a single change direction (a more-focal to less-focal vowel change or a less-focal to more-focal vowel change) were analyzed. Trials in which the odd stimulus occupied the middle position were excluded because they contain two changes in opposite directions (e.g., /ʌ/–/ɑ/–/ʌ/) and therefore cannot be assigned to a single direction. The logistic mixed-effects model fitted to the remaining trials included fixed effects of language group (English, Japanese), vowel contrast (/æ/-/ʌ/, /ɑ/-/ʌ/), and stimulus direction (more-to-less focal vowel change, less-to-more focal vowel change); all three two-way interactions (language group × vowel contrast, vowel contrast × stimulus direction, and language group × stimulus direction); and the three-way interaction (language group × vowel contrast × stimulus direction). The random effects were by-participant intercepts, by-token intercepts, and by-participant random slopes for vowel contrast. Orthogonal contrast coding was used for each fixed effect.

Boxplots (left) of discrimination accuracy for the /æ/-/ʌ/ and /ɑ/-/ʌ/ contrasts by the stimulus change direction (more-to-less vs. less-to-more focal vowel change) for English and Japanese listeners and the 95% confidence intervals (right) of correct response probabilities in discrimination predicted by the logistic mixed-effects model used for analyses.
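The trial-to-direction mapping behind the directional-asymmetry analysis can be sketched in Python. Two assumptions are made explicit here: /æ/ and /ɑ/ are treated as the more-focal members of their contrasts (they show the F2–F3 and F1–F2 convergence, respectively, described in Section 3.1.2), and the direction of a trial is taken to be the direction of its single vowel change (odd stimulus first: change from odd to standard; odd stimulus last: change from standard to odd). With middle-position trials excluded, 8 of the 12 unique trials (48 of 72 trials per listener) remain, that is, 12 trials per direction per contrast.

```python
from collections import Counter
from itertools import product
from typing import Optional

# Assumption: the more-focal vowel of each contrast (see lead-in).
MORE_FOCAL = {"æ-ʌ": "æ", "ɑ-ʌ": "ɑ"}
PAIRS = {"æ-ʌ": ("æ", "ʌ"), "ɑ-ʌ": ("ɑ", "ʌ")}

def change_direction(contrast: str, odd: str, position: int) -> Optional[str]:
    """Direction of the single vowel change in an oddball triad, or None if ambiguous.

    Odd first (X Y Y): the change is odd -> standard.
    Odd last  (Y Y X): the change is standard -> odd.
    Odd in the middle (Y X Y): two opposite changes, so the trial is excluded.
    """
    if position == 2:
        return None
    if position not in (1, 3):
        raise ValueError("position must be 1, 2, or 3")
    odd_is_focal = odd == MORE_FOCAL[contrast]
    if position == 1:
        return "more-to-less" if odd_is_focal else "less-to-more"
    return "less-to-more" if odd_is_focal else "more-to-less"

# Count retained unique trials per (contrast, direction).
counts = Counter()
for contrast, odd_idx, pos in product(PAIRS, (0, 1), (1, 2, 3)):
    direction = change_direction(contrast, PAIRS[contrast][odd_idx], pos)
    if direction is not None:
        counts[(contrast, direction)] += 1
print(counts)  # 2 unique trials per cell; x 6 repetitions = 12 trials per cell
```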
The model showed a significant three-way interaction of language group, vowel contrast, and stimulus direction, β = 0.16, SE = 0.07, z = 2.25, p = .024, indicating that the directional asymmetries for the two vowel contrasts differed significantly between the English and Japanese groups. There was also a significant two-way interaction between language group and vowel contrast, β = 0.21, SE = 0.10, z = 2.01, p = .045, indicating that the difference in discrimination accuracy between the vowel contrasts was larger for Japanese listeners than for English listeners. No other effects were significant at α = .05.
For further analyses, a separate logistic mixed-effects model was formulated for each language group. Fixed effects were vowel contrast, stimulus direction, and their interaction, and random effects were by-participant and by-token intercepts. By-participant random slopes for vowel contrast were included in the mixed-effects model for English listeners but excluded from the model for Japanese listeners because they did not improve the model fit significantly (Barr et al., 2013; Matuschek et al., 2017). For English listeners, neither vowel contrast, β = −0.22, SE = 0.21, z = −1.02, p > .05, nor stimulus direction, β = −0.23, SE = 0.15, z = −1.56, p > .05, had a significant effect. The interaction between vowel contrast and stimulus direction was also not significant, β = 0.01, SE = 0.15, z = 0.07, p > .05. By contrast, for Japanese listeners, stimulus direction had a marginally significant effect, β = −0.33, SE = 0.18, z = −1.81, p = .071, but vowel contrast did not, β = 0.20, SE = 0.18, z = 1.10, p > .05. The interaction of vowel contrast and stimulus direction was also marginally significant, β = 0.31, SE = 0.18, z = 1.70, p = .088.
For Japanese listeners, a separate logistic mixed-effects model was formulated for each vowel contrast. The results demonstrated that the effect of stimulus direction (i.e., the directional asymmetry) was significant for the /ɑ/-/ʌ/ contrast, β = −0.64, SE = 0.29, z = −2.25, p = .025, but not for the /æ/-/ʌ/ contrast, β = −0.03, SE = 0.23, z = −0.12, p > .05. This suggests that detecting the vowel change from /ɑ/ to /ʌ/ was more difficult than detecting the vowel change from /ʌ/ to /ɑ/, while this directional asymmetry was not found for the /æ/-/ʌ/ contrast.
Thus, native listeners showed no directional asymmetry for either contrast, whereas nonnative listeners showed significant directional asymmetry for the contrast of the single-category assimilation type but not for the contrast of the category-goodness difference assimilation type.
4 Discussion
In this study, we examined the discriminability of the English /æ/-/ʌ/ and /ɑ/-/ʌ/ contrasts for English and Japanese listeners using resynthesized stimuli with controlled acoustic distance. We predicted that discriminating English /ɑ/ and /ʌ/ would be difficult for Japanese listeners because they would assimilate both vowels into a single Japanese /a/ category as equally good or poor exemplars (single-category assimilation). Japanese listeners were also expected to discriminate English /æ/ and /ʌ/ better because /æ/ would be a poorer exemplar of Japanese /a/ than /ʌ/ (category-goodness difference assimilation), as proposed in PAM (Best, 1994a, 1994b, 1995). By contrast, English listeners have three separate phonological categories for /æ/, /ʌ/, and /ɑ/ and were therefore expected to discriminate the two contrasts equally well. The results supported these predictions: the /ɑ/-/ʌ/ contrast was more difficult for Japanese listeners to discriminate than the /æ/-/ʌ/ contrast, whereas English listeners discriminated both contrasts equally well.
Discriminability is predicted by perceptual assimilation patterns, but it is also influenced by a language-universal perceptual bias (i.e., the focalization of vowels). Focal vowels are perceptually more salient than nonfocal vowels; therefore, detecting a less-to-more focal change is easier than detecting the opposite change. Polka and Bohn (2003, 2011) proposed that this directional asymmetry disappears for native contrasts but remains for nonnative contrasts because the language-universal vowel perception bias is attuned to listeners’ language experience. Following this claim, Tyler et al. (2014) hypothesized that directional asymmetry would be found for both the single-category and category-goodness difference assimilation types because, in both cases, listeners rely on phonetic perception rather than on their phonological categories. However, Tyler et al. (2014) found asymmetry only for single-category assimilation. Tyler (2021b) later stated that listeners preferentially attend to L1 phonological information over language-universal phonetic information, but that language-universal phonetic perception is not lost and may be observed when attention is drawn to lower-order information. This suggests that language-universal phonetic biases become apparent only when listeners perceive no category-goodness difference between the two nonnative phones.
We hypothesized that a universal vowel perception bias would be observed only if the two nonnative phones did not differ in their goodness-of-fit to a native phoneme. To examine the effect of perceptual assimilation on directional asymmetry, this study controlled both spectral and temporal cues between stimuli, which previous studies had not done. Our findings supported this hypothesis: directional asymmetry was significant for single-category assimilation but not for category-goodness difference assimilation.
These results are consistent with previous findings. In line with Masapollo et al. (2017), the NRV framework (Polka & Bohn, 2003, 2011) predicted the directional asymmetry shown by Japanese listeners perceiving English /æ/, /ʌ/, and /ɑ/ better than the Native Language Magnet model (NLM; Iverson & Kuhl, 1995; Kuhl, 1991; Kuhl et al., 1992) did. According to the NLM model, when two vowels are equally good or bad exemplars of a native vowel (e.g., nonnative vowels of the single-category assimilation type in PAM), they do not differ in perceptual salience, so no directional asymmetry in discrimination accuracy is expected. However, when two stimuli differ in their proximity to a prototype (e.g., nonnative vowels of the category-goodness assimilation type in PAM), the one closer to the prototype is more salient. Consequently, detecting a sound change from a more to a less prototypical vowel should be more difficult than detecting a change from a less to a more prototypical vowel. Under this model, directional asymmetry should have been larger for the /æ/-/ʌ/ contrast than for the /ɑ/-/ʌ/ contrast, because the former showed a larger difference in proximity to the best exemplar of Japanese /a/. However, the results matched the prediction of the NRV framework rather than that of the NLM model: directional asymmetry was observed only for the /ɑ/-/ʌ/ contrast.
A ceiling effect could be responsible for the lack of directional asymmetry for the /æ/-/ʌ/ contrast. Eight of the 21 Japanese participants responded correctly on every trial when discriminating the change from /ʌ/ to /æ/, and eight responded correctly on every trial in the reverse order, from /æ/ to /ʌ/ (four of these participants were at ceiling in both orders). Thus, discrimination accuracy was at ceiling for many participants in both directions of the /æ/-/ʌ/ contrast, which may have masked any directional asymmetry.
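The ceiling counts above can be tallied per presentation order as in the minimal Python sketch below. The function names `ceiling_counts` and `asymmetry`, the data layout, and the toy scores are hypothetical illustrations, not the study's analysis code or data. The sketch shows why ceiling performance hides asymmetry: a participant who is at 100% in both orders contributes an order difference of exactly zero.

```python
from typing import Dict, Tuple

# Hypothetical layout: participant ID -> (accuracy for ʌ→æ, accuracy for æ→ʌ).
Scores = Dict[str, Tuple[float, float]]


def ceiling_counts(scores: Scores, ceiling: float = 1.0) -> Tuple[int, int, int]:
    """Count participants at ceiling in the ʌ→æ order, the æ→ʌ order, and both."""
    fwd = {pid for pid, (a, _) in scores.items() if a >= ceiling}
    rev = {pid for pid, (_, b) in scores.items() if b >= ceiling}
    return len(fwd), len(rev), len(fwd & rev)


def asymmetry(scores: Scores) -> float:
    """Mean per-participant accuracy difference (ʌ→æ minus æ→ʌ)."""
    diffs = [a - b for a, b in scores.values()]
    return sum(diffs) / len(diffs)


# Toy scores for illustration only (NOT the study's data).
toy: Scores = {
    "P01": (1.0, 1.0),   # at ceiling in both orders: difference is 0
    "P02": (1.0, 0.9),
    "P03": (0.9, 1.0),
    "P04": (0.8, 0.85),
}

print(ceiling_counts(toy))  # (2, 2, 1)
print(asymmetry(toy))
```

With many participants at 1.0 in both orders, the mean difference is pulled toward zero regardless of any underlying perceptual bias, which is the masking effect described in the text.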
One limitation of this study concerns the phonological categorization of the stimuli. The identification score for the English vowel /ʌ/ was low (43%), suggesting that English listeners may have treated the stimulus as “uncategorized” rather than categorizing it as /ʌ/. The stimulus duration (308 ms) and relatively high F1 may have contributed to this low identification rate. If the stimulus with an F2 of 1755 Hz were “uncategorized” by English listeners, both the /æ/-/ʌ/ and /ɑ/-/ʌ/ contrasts would be of the uncategorized–categorized assimilation type for English listeners (Faris et al., 2018), for which discrimination accuracy is predicted to be high (Tyler et al., 2014). However, F2 was the only acoustic difference among the English /æ/, /ʌ/, and /ɑ/ stimuli in this study, and the resulting unnaturalness of the stimuli may account for the English listeners’ relatively lower discrimination accuracy.
In future studies, the Bark or Mel scale should be used instead of the Hertz scale when equalizing the acoustic distance between stimuli. In this study, the F2 distances were equalized in Hertz; on the Bark and Mel scales, the /æ/-/ʌ/ contrast therefore had a smaller F2 difference than the /ɑ/-/ʌ/ contrast, yet it was discriminated more accurately. Because the contrast with the smaller auditory distance was discriminated better, acoustic distance cannot account for this advantage, and the claims based on PAM and NRV remain supported.
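To illustrate why equal Hertz steps are not equal auditory steps, the minimal Python sketch below converts F2 values to Bark (using Traunmüller's 1990 approximation) and to Mel (using the widely used formula 2595 log10(1 + f/700)). Only the 1755 Hz F2 of the /ʌ/ stimulus comes from this study; the 300 Hz step and the resulting /æ/ and /ɑ/ values are hypothetical placeholders, and the study's actual stimulus values may differ.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Traunmüller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def hz_to_mel(f_hz: float) -> float:
    """Widely used mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


F2_WEDGE = 1755.0                  # F2 of the /ʌ/ stimulus reported in the text
STEP_HZ = 300.0                    # HYPOTHETICAL equal Hz step, not the study's value
f2_ae = F2_WEDGE + STEP_HZ         # front /æ/: higher F2 (hypothetical value)
f2_script_a = F2_WEDGE - STEP_HZ   # back /ɑ/: lower F2 (hypothetical value)

for name, conv in (("Bark", hz_to_bark), ("Mel", hz_to_mel)):
    # Same Hz distance on both sides of /ʌ/, converted to an auditory scale.
    d_ae_wedge = conv(f2_ae) - conv(F2_WEDGE)
    d_wedge_script_a = conv(F2_WEDGE) - conv(f2_script_a)
    print(f"{name}: /æ/-/ʌ/ = {d_ae_wedge:.3f}, /ɑ/-/ʌ/ = {d_wedge_script_a:.3f}")
```

Because both scales compress higher frequencies, the step above 1755 Hz (toward /æ/) comes out smaller than the equal step below it (toward /ɑ/), which is the pattern described in the text.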
In conclusion, this study demonstrates that PAM accurately predicts nonnative vowel discriminability and that the PAM assimilation type is related to the directional asymmetry attributed to vowel focalization. Directional asymmetry occurs for nonnative vowel contrasts of the single-category assimilation type but not for those of the category-goodness assimilation type, because a language-universal vowel perception bias is observed only when there is no perceived difference in goodness-of-fit to a native phoneme. This suggests that universal phonetic perception is tuned not only by listeners’ native phonological categories but also by allophonic variation within a native phoneme category. Future studies should examine how allophonic variants within a native phoneme category develop in memory and how they adjust language-universal phonetic perception.
Acknowledgements
The authors thank their research assistant, Ms. Midori Sato, who helped collect the data and recruit participants for the experiments, as well as the audiences at ICPhS 2019 and New Sounds 2019 for their useful comments.
Authors’ note
A portion of this work was published as “Effects of perceptual assimilation: The perception of English /æ/, /ʌ/, and /ɑ/ by Japanese speakers,” in S. Calhoun, P. Escudero, M. Tabain, & P. Warren (Eds.), Proceedings of the 19th International Congress of Phonetic Sciences, Melbourne, Australia 2019 (pp. 2344–2348).
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by JSPS KAKENHI (grant nos. 16K16884, 19K13169, and 22KK0195) and Waseda University Grants for Special Research Projects (grant nos. 2017K-221, 2018K-159, and 2019E-030).
