Abstract
This study investigated the role of different cognitive abilities—inhibitory control, attention control, phonological short-term memory (PSTM), and acoustic short-term memory (AM)—in second language (L2) vowel learning. The participants were 40 Azerbaijani learners of Standard Southern British English. Their perception of L2 vowels was tested through a perceptual discrimination task before and after five sessions of high-variability phonetic training. Inhibitory control was significantly correlated with gains from training in the discrimination of L2 vowel pairs. However, there were no significant correlations between attention control, AM, PSTM, and gains from training. These findings suggest the potential role of inhibitory control in L2 phonological learning. We suggest that inhibitory control facilitates the processing of L2 sounds by allowing learners to ignore the interfering information from L1 during training, leading to better L2 segmental learning.
1 Introduction
Studies have extensively investigated the difficulties that adult learners usually face in the production and perception of non-native sounds and have often observed individual differences between subjects. Previous research has identified several factors that affect the successful learning of non-native sounds. These well-documented factors include the relationship between the sound inventory of the first language (L1) and second language (L2) (Best, 1995; Flege, 1995; Kuhl, 2000), the age of L2 learning (Flege, Yeni-Komshian, & Liu, 1999; Munro, Flege, & MacKay, 1996), the length of exposure to L2 (Flege, Bohn, & Jang, 1997; Jia, Strange, Wu, Collado, & Guan, 2006), and the degree of ongoing L1 use (Flege, Frieda, & Nozawa, 1997; Flege & MacKay, 2004). However, even when these factors are controlled, the individual differences in the production and perception of L2 sounds remain large.
One possible reason for these individual differences is the existence of factors other than those mentioned above. Perceptual learning may be related to more general cognitive abilities (Goldstone, 1998). Extralinguistic factors such as working memory (Aliaga-García, Mora, & Cerviño-Povedano, 2010; Service, 1992), attention control (Darcy, Mora, & Daidone, 2014; Guion & Pederson, 2007; Safronova, 2016), inhibition (Darcy et al., 2014; Darcy, Mora, & Daidone, 2016), musical ability (Delogu, Lampis, & Belardinelli, 2010), and phonological short-term memory (PSTM) (Aliaga-García et al., 2010, 2011; Cerviño-Povedano & Mora, 2011; Darcy, Park, & Yang, 2015; O’Brien, Segalowitz, Freed, & Collentine, 2007; Safronova, 2016) are influential in the perception and production of L2 sounds. Kim and Hazan (2010) found that attentional switching, the ability to sort stimuli according to a particular dimension, frequency acuity, and the ability to associate two unrelated events are correlated with the ability to learn a novel phonetic contrast. Safronova (2016) studied the effects of PSTM, acoustic short-term memory (AM), and attention control on the L2 vowel perception of 45 adult Catalan-Spanish bilingual learners of English. She found that PSTM, AM, and attention control ability significantly contributed to explaining the variance in learners’ perception of L2 sounds. Safronova (2016) also found that attention control and AM were related to learners’ perception of cross-language phonetic distance.
Intensive phonetic training, even for short periods only, improves the perception of novel consonant or vowel contrasts (Iverson & Evans, 2009; Logan, Lively, & Pisoni, 1991), and its effects can be retained in the long term (Bradlow, Akahane-Yamada, Pisoni, & Tohkura, 1999). Between-subject variability has also been reported in the size of performance gains after laboratory training, ranging from very small to large improvements. For instance, Bradlow et al. (1999) found individual differences in the identification of the English /r/ and /l/ by Japanese learners before and after training. Perrachione, Lee, Ha, and Wong (2011) suggested that successfully learning a foreign-language phonological contrast for pitch depends on the interaction between individual differences in perceptual abilities and the design of the training paradigm. They reported that high-variability training enhanced learning only for individuals with strong perceptual abilities.
Studies on the effects of cognitive abilities on L2 phonological learning are limited. To the best of our knowledge, no study has explored the effects of inhibition, attention control, and AM on the size of gains from phonetic training. Some studies have explored the effects of different cognitive abilities on L2 phonological competence (Darcy et al., 2015; Safronova, 2016); however, the present study specifically aims to examine the effects of cognitive abilities on L2 phonological learning by administering phonetic training to the whole study population. We explore the correlates of individual variability in training results using a broad test battery of cognitive abilities, including a PSTM task, an AM task, the Stroop task, and a retrieval-induced forgetting (RIF) task.
1.1 Inhibitory skills
Inhibition is an important executive function that helps suppress irrelevant and interfering information and process the related information. The inhibitory control model (Green, 1998) proposes that the inhibitory control mechanism effectively limits attention only to the target language during language processing and production. It further proposes that the more dominant a language is, the more inhibition is needed to inhibit it. Switching from the less dominant language to the dominant language is slower than the other way around, presumably because of the greater inhibition that must be overcome (Meuter & Allport, 1999). Recent studies reported a direct association between inhibition and language selection during bilingual processing (Festman, Rodriguez-Fornells, & Münte, 2010; Gollan, Sandoval, & Salmon, 2011; Linck, Schwieter, & Sunderman, 2012).
Lev-Ari and Peperkamp (2013) found that individual differences in inhibitory skill influence the degree to which bilinguals’ L2 influences their L1. They showed that the poorer the bilinguals’ inhibitory skill was, the more their voice onset times (VOTs) in their L1 shift in the direction of those in the L2. They reported that late English–French bilinguals with lower inhibitory skill residing in France produced and perceived VOTs of voiceless stops in English in a more French-like manner. In addition, they found that higher exposure to L2 increased the importance of inhibitory skill to protect against the influence of L2 on L1.
Regarding the effects of inhibitory skills on L2 phonological processing, Darcy et al. (2016) studied the perception and production of L2 sounds among Spanish learners of L2 English and those with American English as their L1 learning L2 Spanish and found that higher inhibitory control is related to a lower error rate in segmental perception. Darcy et al. (2016) suggested that better inhibitory skills may help L2 speakers discriminate the L2 sound pairs that interfere with their L1 sounds. They proposed a potential role for inhibition in L2 phonological acquisition, as inhibition enhances the processing of phonologically relevant acoustic information in the L2 input, which leads to more accurate L2 phonological representations. Since inhibitory control has been reported to have a potential role in L2 phonological acquisition, we hypothesize that better inhibitory skills will lead to a higher gain from phonetic training in the discrimination of L2 vowels.
1.2 Attention control
Attention control is defined as the ability to switch attention between different dimensions relevant to a task (Posner & DiGirolamo, 2000). It also refers to an individual’s ability to efficiently shift attention among different sets of linguistic relationships (Talmy, 1996). The theory of constructive operators, a model of mental attention, makes a distinction between effortful and automatic inhibition (Johnson, Im-Bolter, & Pascual-Leone, 2003). RIF can be considered an automatic inhibition, while the Stroop effect (see section 2.4.4 and the last paragraph of this section) is an effortful inhibition.
Generally, attention control has been shown to play an important role in L2 learning (Segalowitz & Frenkiel-Fishman, 2005). The link between attention control and speech perception has been extensively studied (Astheimer, Berkes, & Bialystok, 2016; Lange, Rösler, & Röder, 2003; Sanders & Astheimer, 2008). Research has suggested that attentional resources help learners discern perceived phonetic distance between L2 and L1 sounds (Safronova, 2016). Guion and Pederson (2007) found that adult learners can better discern novel phonetic contrasts through the explicit directing of attention. Francis, Baldwin, and Nusbaum (2000) reported that training L2 learners to focus attention on the relevant acoustic cues signaling the differences between L2 sounds is related to successful L2 speech learning.
Von Kriegstein, Eger, Kleinschmidt, and Giraud (2003) proposed that to understand L2 speech, listeners may need to allocate their attention efficiently to several competing dimensions in speech. Researchers have also claimed that efficient attention control may be required for listeners to perceive phonetic cues that provide information for the categorization of sounds (Assmann & Summerfield, 1994; Gordon, Eberhardt, & Rueckl, 1993). Darcy et al. (2014) found that more efficient attention control was associated with more accurate perception and use of contrastive vocalic and consonantal features in the categorization of L2 speech stimuli. They suggested that participants with higher attentional control developed more accurate perceptual representations for the L2 sounds. Safronova (2016) measured individuals’ ability to rapidly and accurately shift their focus of attention between two acoustic speech-related dimensions such as voice quality (female vs. male) and segmental duration (long vs. short). She reported that this attention control made a significant contribution to learners’ degree of perceived phonetic distance between L2 and L1 sounds.
In this study, we use the Stroop task as a measure of attentional control. MacLeod (1992) referred to the Stroop paradigm as the “gold standard” for measuring the automatic influence of unattended information. In the basic Stroop task, participants are asked to focus on one component of a stimulus (e.g., ink color) while ignoring its other components (e.g., a color name). The conflict between the task-irrelevant, ignored dimension and the attended dimension normally causes an interference effect. Consequently, selectively attending to one dimension does not completely prevent the processing of the ignored dimension. Large interference effects are indicative of poor selective attention, as they reflect excessive processing of the task-irrelevant information. Higher Stroop interference effects may be associated with lower gains from phonetic training and vice versa. We assume that greater attention control between different acoustic cues of contrastive vowels during training will improve the learning of those pairs; this will be reflected in better discrimination after training. Overall, we hypothesize that better attention control will lead to higher performance gains from phonetic training.
1.3 Phonological short-term memory
PSTM is usually associated with the phonological loop involved in the temporary storage of verbal-acoustic information (Baddeley, 2003; Baddeley & Hitch, 1974). PSTM capacity varies among individuals, but on average, it can hold phoneme-/syllable-sized phonological units that are produced within two seconds (Baddeley, 1992). Different aspects of the role of PSTM in language acquisition have been studied, including its role in L1 and L2 development (Baddeley, Gathercole, & Papagno, 1998; French & O’Brien, 2008; Kormos & Sáfár, 2008; Masoura & Gathercole, 1999).
Greater PSTM capacity, as measured by a variety of nonword repetition or serial nonword recognition (SNWR) tasks, is assumed to be related to language learners’ greater aptitude in speech processing tasks, and it is consequently found to predict greater gains in several areas of linguistic competence. MacKay, Meador, and Flege (2001) studied native Italian speakers’ ability to identify English consonants in noise in word-initial and word-final positions; they found that PSTM scores were related to variance in the participants’ identification error rate for both word-final and word-initial consonants. O’Brien et al. (2007) investigated the relationship between PSTM and the oral fluency development of adult English-speaking learners of Spanish. They found that PSTM was significantly correlated with oral fluency gains obtained during the learners’ stay in the L2 country. Aliaga-García et al. (2011) explored whether Catalan-Spanish bilingual EFL (English as a Foreign Language) learners with higher PSTM would perceive the English monophthongs /i, ɪ, a, ʌ, ɑ/ with higher accuracy than those with lower PSTM capacity after receiving 10 sessions of high-variability phonetic training. They measured PSTM capacity at posttest and assigned the participants to high-PSTM and low-PSTM groups. They found that the high-PSTM group benefited more from phonetic training and argued that this supports the hypothesis that individual differences in PSTM affect perceptual phonological competence development in adulthood. These results suggest that larger PSTM is associated with L2 learners’ ability to accurately discriminate contrasting L2 vowels. In turn, a higher PSTM may be related to a higher ability to form L2 phonetic categories because unfamiliar acoustic cues are easier to distinguish. Therefore, we hypothesize that individuals with greater PSTM capacity will gain more from phonetic training.
1.4 Acoustic short-term memory
Several studies have suggested that the discrimination of sounds involves two different memory components: acoustic and phonological memory (e.g., Fujisaki & Kawashima, 1970; Pisoni, 1973; Tanaka & Nakamura, 2004). AM is an auditory sensory memory store that retains speech stimuli at a pre-categorical level prior to phonological encoding (Crowder & Morton, 1969). Joseph et al. (2015) demonstrated that AM determines accuracy in the perception of speech sounds. Researchers have suggested that AM plays a significant role in within-category vowel discrimination, whereas PSTM is involved in between-category discrimination (Cowan & Morse, 1986; Darwin & Baddeley, 1974). The ability to temporarily maintain larger amounts of acoustic information in short-term memory facilitates the accurate perception and production of L2 speech (Safronova & Mora, 2012; Tanaka & Nakamura, 2004). Safronova (2016) mentioned that greater memory capacity provides the potential for processing a larger amount of L2 speech input, leading to better detection of acoustic-phonetic differences between sounds.
Previous studies have suggested that AM capacities are also associated with L2 learners’ ability to accurately discriminate contrasting L2 vowels (e.g., Safronova, 2016). Larger AM may also facilitate the establishment of L2 phonetic categories. Individuals with higher AM can be assumed to be more sensitive to differences in acoustic information between perceptually similar L2 or close L2-L1 vowel pairs. Therefore, higher AM capacity may lead to more efficient learning of L2 vowels. We hypothesize that individuals with greater AM capacity will gain more from phonetic training.
1.5 Vowel systems of Azerbaijani and Southern British English
The present study explores the source of individual variability in L2 vowel learning among Azerbaijani learners of Southern British English. Azerbaijani has nine vowels—/æ/, /ɑ/, /o/, /e/, /œ/, /ɯ/, /u/, /i/, and /y/—with no length distinction (Ghaffarvand Mokari & Werner, 2017a). Southern British English, on the other hand, has 11 vowels: /æ/, /ɑ/, /ɒ/, /ɛ/, /ʌ/, /ɜ/, /ɔ/, /ʊ/, /u/, /ɪ/, and /i/ (Roach, 2004). These vowels are presented in cardinal vowel charts in Figure 1. Recent studies have investigated difficulties that Azerbaijani learners experience in learning English vowels based on the perceptual assimilation model (Best, 1995; Ghaffarvand Mokari & Werner, 2015, 2017b).

Figure 1. Cardinal vowel charts of British English (left; Roach, 2004) and Azerbaijani (right; Ghaffarvand Mokari & Werner, 2017a).
2 Materials and methods
2.1 Participants
In the present study, the participants were requested to fill out a background questionnaire. Eligible participants started receiving English instruction in middle school (a three-year level before high school) at the age of 12, were born and raised in Tabriz, and had not lived in any other country for more than six months. Potential participants passed a pure-tone audiometry screening at octave frequencies between 250 and 8000 Hz at 20 dB and reported not having any hearing impairment. Because L2 proficiency may affect the discrimination of L2 vowels, and in order to have a homogeneous group in terms of English proficiency, we used a standardized Cambridge Preliminary English Test (PET). The PET was carried out in the same room and at the same time of day until a suitable number of eligible participants was reached. The PET scores from listening comprehension (25 points), reading (25 points), and writing (25 points) were transformed into percentages. A total of 75 points equaled 100%. Eligible participants had to obtain a PET score of 45%–69%. Overall, 40 participants were selected based on the abovementioned inclusion criteria. Thirty of them (15 males and 15 females) served as the experimental group, with a mean age of 25.4 years (SD = 3.4, range = 21–31 years). Ten (five males and five females) served as the control group, with a mean age of 25.8 years (SD = 3.1). All the tests were carried out in the language laboratory of Islamic Azad University in Tabriz. Participants were verbally informed about the tests and signed informed consent forms before joining the study. All participants received a USB flash memory drive as an incentive for participation.
2.2 Pretest and posttest materials
L2 learners’ phonological competence in cross-linguistic speech perception research is normally assessed through identification and discrimination tasks (Beddor & Gottfried, 1995; Harnsberger, 2001). Iverson, Pinet and Evans (2012) found that discrimination was correlated with individual differences in L2 identification accuracy and that identification training had a significant main effect on the identification and discrimination of L2 vowels. However, they found that identification training improvements were much higher for identification than discrimination. Nevertheless, given the limited number of studies that have tested the effects of identification training on the discrimination of L2 sounds, we were interested to see if this type of training improves Azerbaijani learners’ discrimination of Standard Southern British English (SSBE) vowels. Ultimately, if there was any significant improvement, we aimed to explore if cognitive factors are related to the individuals’ amount of learning gain.
The participants performed a forced-choice discrimination task, which was implemented in the computer program Praat (Boersma & Weenink, 2016). The vowel stimuli used for the discrimination test were the same as those used by Pereira (2014). Pereira (2014) recorded four native speakers of Southern British English (two males and two females) reading a list of 61 randomized words containing English vowels. The recordings were done using a Bruel and Kjaer 2231 microphone connected to a digital audiotape recorder at a sampling rate of 48 kHz. The 11 SSBE vowels were naturally produced in /bVt/ and /hVd/ contexts. Three repetitions per word were included, and one of the repetitions was selected for each vowel in each context. We only used stimuli that were recorded in the /hVd/ context. This way, we had 44 recorded tokens (11 vowels in /hVd/ × 4 speakers) to use in the discrimination test. The speakers and stimuli used in the discrimination task were different from those in the training corpus of the present study (see section 2.3 for details about the training stimuli). The stimuli were combined to create L2 vowel contrasts /ɑ-ɔ/, /æ-ɛ/, /ɔ-ʊ/, /i-ɛ/, /i-ɪ/, /ɔ-ɒ/, /ʊ-u/, /ɑ-ɒ/, /ɑ-ʌ/, /æ-ʌ/, and /ʌ-ɒ/. Before using these stimuli in the discrimination test, all recorded tokens were normalized for their peak intensity.
The trials were presented in a random order, there was a 1000-ms interval between them, and there was a 1000-ms inter-stimulus interval in each trial. The participants were instructed to decide if the vowels in the two words they heard were identical or different by clicking on “same” or “different” on the screen. They were asked to focus only on the vowels and not on the differences between the speakers. To force the listeners to ignore the within-category phonetic differences, stimuli were produced by four different speakers. The “different” trials always had an odd item, which was a word belonging to a different vowel category. The “same” trials, on the other hand, were made from stimuli in the same vowel category. In total, each test had 256 trials (Appendix 1). The test took about 25–30 minutes to administer.
The sensitivity A’ score (Snodgrass, Levy-Berger & Haydon, 1985) was calculated for each vowel pair’s discrimination from pretest and posttest results based on the proportion of hits and false alarms for each contrast. The A’ score was calculated to reduce the possible effect of response bias. Hits were defined as the correct selection of the odd item out in change trials. False alarms were defined as the incorrect selection of an odd item out in no-change trials. An A’ score of 1.0 indicated perfect sensitivity to a vowel contrast (i.e., correct responses to all change and no-change trials), and an A’ score of 0.5 represented a theoretically defined chance level of response and a lack of sensitivity (Flege & MacKay, 2004; Snodgrass et al., 1985). From the 11 contrasts in the discrimination tests, we only included the contrasts /ʌ-ɒ/, /ɑ-ʌ/, /ʊ-u/, and /ɑ-ɒ/ in our analyses. These pairs were selected because their discrimination is reported to be very difficult (around chance level) for Azerbaijani learners of British English, whereas the other contrasts could be discriminated above the chance level, with A’ scores greater than 0.7 (Ghaffarvand Mokari & Werner, 2017b).
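As an illustration of how such a sensitivity score can be derived from hit and false-alarm proportions, the following minimal sketch uses the commonly cited nonparametric A’ formulation with its below-chance counterpart. The exact formula used in the study is not reproduced in this section, so the formula, the function name `a_prime`, and the example rates are assumptions made for illustration only.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity A' from hit and false-alarm proportions.

    Assumed formulation (not quoted from the study):
      H > F: 0.5 + (H - F)(1 + H - F) / (4 H (1 - F))
      H < F: 0.5 - (F - H)(1 + F - H) / (4 F (1 - H))
      H = F: 0.5 (chance level)
    """
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= fa_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5  # no sensitivity; also avoids division by zero at (0, 0) and (1, 1)
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Hypothetical rates for one contrast at pretest and posttest
pre = a_prime(0.55, 0.45)
post = a_prime(0.80, 0.20)
gain = post - pre  # training gain as the posttest minus pretest difference
```

Under this formulation, perfect performance (all hits, no false alarms) yields 1.0 and equal hit and false-alarm rates yield 0.5, which matches the interpretation of the two reference values given above.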
2.3 Phonetic training
Studies have found that adaptive (high-intensive) perceptual phonetic training significantly improves the identification and discrimination of English vowels among individuals with different first languages (Iverson & Evans, 2009; Lengeris, 2009). Therefore, we adopted the same training stimuli and procedure as used in the study of Iverson and Evans (2009). For the training task, Iverson and Evans (2009) had five SSBE speakers (two males and three females) record a total of 140 target words containing 14 vowels, arranged in four groups (/iː ɪ aɪ eɪ/, /uː aʊ ɜː/, /ɒ əʊ ɔː/, /ɛ æ a ʌ/), with each group consisting of 10 minimal pairs. The training included all the vowels used in the pretest and posttest alongside other vowels of Southern British English. The recordings were conducted two times for each speaker in an anechoic chamber at University College London with a sampling rate of 44.1 kHz and then downsampled to 11.025 kHz. In the present study, the training stimuli were not the same as the stimuli used in the pretest and posttest, as that could have led to an advantage in the discrimination of stimuli for which the participants were trained. Such an advantage would have interfered with our aim of testing the actual L2 vowel discrimination ability after training.
Participants completed five sessions of computer-based auditory training with feedback within two weeks, with a different speaker each day. In each session, participants responded to 225 trials (in 45–60 minutes); they heard an English word and chose one of the three or four candidates displayed on a computer screen. For instance, they heard slit and were asked whether it sounded like sleet, slit, slight, or slate. The stimulus was played before the response candidates were displayed. If a response word was unfamiliar, it was accompanied by a more common word that had the same vowel, like seed and sit. If the correct answer was given, “Correct!” was displayed on the screen, a cash register sound was played, and the target word was repeated once. If an incorrect answer was given, “Wrong” was displayed on the screen, two beeps were played, and both the target and the wrongly chosen word were repeated twice. To familiarize the trainees with the training procedure, a short 14-trial session was presented to them before starting the training. The participants could take a break in the middle of each training session.
The training was partly adaptive in that the first 70 trials were five random repetitions of 14 English vowels, the next 85 trials were based on the trainees’ errors, and the final 70 trials were five random repetitions of the 14 English vowels. In the adaptive part, the selection probability of a vowel was based on the misses and false alarms in the participants’ previous responses (Iverson & Evans, 2009). Therefore, the vowel that was detected as difficult for the participants would be repeated more than the easy ones. The selection of the stimulus words was done randomly for each vowel. For instance, if the trial was intended to have an /i/ stimulus, the computer program randomly chose one of the 10 minimal-pair stimulus words containing this vowel. Each of the 10 minimal-pair word sets was used once before the list was recycled. The percentage of correct responses was displayed at the end of each session.
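The session structure and the error-driven selection in the adaptive block can be sketched as follows. The text states only that the selection probability was based on misses and false alarms; the specific weighting used here (one plus the error count), the function name `pick_adaptive_vowel`, and the example error counts are hypothetical and are not taken from Iverson and Evans (2009).

```python
import random

# Session layout described in the text: 70 fixed + 85 adaptive + 70 fixed trials
SESSION_BLOCKS = {"fixed_start": 14 * 5, "adaptive": 85, "fixed_end": 14 * 5}


def pick_adaptive_vowel(errors, rng):
    """Pick the next vowel with probability proportional to 1 + its error count.

    errors maps each vowel to its running total of misses plus false alarms.
    The +1 keeps error-free vowels selectable (an assumption of this sketch).
    """
    vowels = list(errors)
    weights = [1 + errors[v] for v in vowels]
    return rng.choices(vowels, weights=weights, k=1)[0]


# Hypothetical running error counts for two vowels
rng = random.Random(0)
errors = {"i": 0, "ɪ": 9}
draws = [pick_adaptive_vowel(errors, rng) for _ in range(2000)]
```

With these hypothetical counts, the more error-prone vowel is drawn far more often, which is the qualitative behavior described for the adaptive block.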
2.4 Tests of cognitive abilities
The administration sequence of the cognitive ability tests was as follows: PSTM, AM, inhibitory skills, and attention control. The tests were administered in the same sequence for all participants.
2.4.1 Phonological short-term memory
PSTM was assessed through an SNWR task using the same method as that used by Cerviño-Povedano and Mora (2011). Participants had to determine whether two strings of nonwords were presented in the same or different order. The SNWR was chosen in this study because, as mentioned by O’Brien et al. (2007), serial recognition is less affected by lexicality (i.e., better recall with words than with nonwords) than is serial recall or repetition (other measures used for PSTM). O’Brien et al. (2007) suggested that serial recognition relies to a smaller extent on long-term lexical and phonological memory than serial recall does. Therefore, it provides a more appropriate test of phonological memory.
To design an Azerbaijani SNWR task, Azerbaijani nonwords were read aloud at normal speed several times by a female native Azerbaijani speaker and digitally recorded in a sound-treated room with an M-Audio USB Producer microphone. The recordings were digitized at a 44-kHz sampling rate. The best tokens of each nonword were selected, segmented, edited, and normalized for peak intensity (70 dB) to minimize differences in loudness among stimuli. The nonwords were all CVC syllables conforming to the phonotactic regularities of Azerbaijani (Appendix 2).
Each SNWR task consisted of 24 pairs of CVC nonword sequences of increasing length (five, six, and seven nonwords). Each test consisted of 144 monosyllabic CVC nonwords. The nonwords were separated by a 300-ms silence. Trials at each sequence length contained four same and four different nonword sequence pairs and were randomly presented with a 1000-ms delay upon response. A weighted score (out of 144) obtained by assigning different points to each correctly identified sequence according to its length was used as a measure of PSTM capacity (O’Brien et al., 2007). Correct responses at the sequence length of five were assigned a score of five, correct responses at the sequence length of six were assigned a score of six, and correct responses at the sequence length of seven were assigned a score of seven for a maximum weighted score of 144. Administration of the test took about 10 minutes.
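The weighted scoring can be illustrated with a short sketch in which each correct trial contributes a number of points equal to its sequence length. The data layout (a list of length and correctness pairs) and the function name `weighted_snwr_score` are assumptions made for illustration.

```python
def weighted_snwr_score(responses):
    """Sum the sequence lengths of all correctly judged trials.

    responses: iterable of (sequence_length, is_correct) pairs.
    """
    return sum(length for length, is_correct in responses if is_correct)


# Eight trials at each of the three sequence lengths, all answered correctly
perfect = [(length, True) for length in (5, 6, 7) for _ in range(8)]
max_score = weighted_snwr_score(perfect)  # 8 * (5 + 6 + 7) = 144
```

The maximum of 144 follows directly from eight trials at each of the lengths five, six, and seven.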
2.4.2 Acoustic short-term memory
AM was assessed using rotated speech stimuli with the same method used by Safronova and Mora (2012). AM was measured through an SNWR task using a rotated speech version of the 144 Azerbaijani nonword stimuli used in the PSTM task. Rotated speech was chosen because it is acoustically as complex as speech but cannot be phonologically encoded. The speech rotation technique by Scott, Rosen, Beaman, Davis and Wise (2009) was used to create the AM task stimuli. The technique involved low-pass filtering of the original speech stimuli at 4000 Hz and applying spectral inversion at 2000 Hz, after which the modified speech stimuli were low-pass filtered again at 3800 Hz. This acoustic manipulation produced unintelligible “alien-sounding” stimuli that preserved the acoustic complexity of normal speech (Safronova, 2016).
The AM task (four practice trials and 32 experimental trials) consisted of three-, four-, five-, and six-item pairs. Each trial consisted of two sequences of rotated speech nonwords that were either the same or different. The nonwords were separated by a 300-ms silence, and each sequence was randomly presented with a 1000-ms silence between them. Participants heard the sequences of rotated nonwords distributed in blocks of increasing length and indicated whether the sequences were the same or different. Eight trials (four same and four different) at each sequence length were randomly presented (Safronova & Mora, 2012). The score for AM was calculated as it was for PSTM (see section 2.4.1). The test administration took about 10 minutes.
2.4.3 Inhibitory skills
The RIF test by Jonker, Seli, and MacLeod (2013) was used to assess inhibitory skill. Although other materials have been shown to produce an RIF effect, for example personality characteristics (Macrae & MacLeod, 1999), visuospatial materials (Ciranni & Shimamura, 1999), and eyewitness memory scenes (Shaw, Bjork, & Handal, 1995), the test used here—involving the three key phases of study, retrieval practice, and a final test—is regarded as the standard procedure for investigating RIF (Anderson & Spellman, 1995; Anderson, Bjork & Bjork, 1994; Jonker et al., 2013).
Participants completed four phases: a study phase, an extra-study phase (i.e., practice), a distractor task, and a final test. The language of the test was changed to the participants’ native language. Participants studied stimuli from six categories, each with eight exemplars, resulting in 48 category-exemplar word pairs. These stimuli were used in all subsequent experiments. Stimuli were displayed in 20-pt Times New Roman font on a 17-in. monitor, and all responses were captured with E-Prime 2.5 software.
Participants saw the 48 category-exemplar word pairs on the computer screen. Each pair was displayed individually at the center of the screen in black font against a white background for 5 seconds. The category names were always displayed in uppercase font, and the exemplars were always displayed in lowercase font, with a dash separating them (e.g., “GHAZA–kabab” [FOOD–kebab]). The presentation order of the category-exemplar pairs was randomized throughout the study phase with the restriction that no two pairs from the same category appeared back to back.
Afterward, half of the exemplars from half of the categories were randomly selected (four exemplars from each of the three categories). To ensure that the participants were attending to and encoding the items presented, they were required to repeat the pairs aloud. Each pair was presented three times, with items from the same category separated by a minimum of one pair from another category. Then, participants completed a 5-min distractor task during which they produced the names of as many countries as they could (Macrae & Roseveare, 2002). A one-letter word stem was then shown along with its category name, and participants were given up to 10 seconds to give a response. All items from one category were tested together before items from another category. The strength of the RIF effect was measured by taking the difference between the RP (unpracticed items sharing category membership with practiced items) scores and NRP (baseline items with no category items practiced) scores. The test administration took about 30 minutes.
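The RIF index can be sketched as a difference between two recall proportions. The text does not state the direction of the subtraction; this sketch assumes baseline (NRP) recall minus recall of the unpracticed items from practiced categories, so that a positive value indicates forgetting. The function name, the item counts (12 unpracticed items from practiced categories and 24 baseline items, derived from the 6 × 8 design), and the recall figures are illustrative assumptions.

```python
def rif_effect(nrp_recalled, nrp_total, rp_unpracticed_recalled, rp_unpracticed_total):
    """Baseline recall proportion minus recall proportion for unpracticed
    items that share a category with practiced items.

    The direction of the difference is an assumption of this sketch.
    """
    nrp = nrp_recalled / nrp_total
    rp_unpracticed = rp_unpracticed_recalled / rp_unpracticed_total
    return nrp - rp_unpracticed


# Hypothetical participant: 18 of 24 baseline items, 6 of 12 unpracticed items recalled
effect = rif_effect(18, 24, 6, 12)  # 0.75 - 0.50
```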
2.4.4 Attention control
The Stroop paradigm has been referred to as the “gold standard” for measuring the automatic influence of unattended information (MacLeod, 1992). In this study, attention control was assessed through a Stroop test, which was the same as the task used by Unsworth and Spillers (2010). Both tests were displayed on a 17-in. computer monitor, and all responses were captured with E-Prime 2.5 software. In the Stroop test, participants were presented with a color name (which was modified according to the participants’ native language: “ghermez” [red], “sabz” [green], or “abi” [blue]) in one of three different font colors (red, green, or blue). The participants were asked to indicate the font color by pressing the corresponding key (red = 1, green = 2, and blue = 3). They were told to respond as quickly and accurately as possible. They were given 15 trials of response-mapping practice and six trials of practice with the real task. They had a total of 75 real trials. Of these trials, 67% were congruent, meaning the word and font color matched (e.g., red printed in red), and the other 33% were incongruent (e.g., red printed in green). The reaction time difference between incongruent and congruent trials (for correct responses) served as the index for attention competence. The test took about five minutes to complete.
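The interference index described above can be sketched as follows. The trial format (dictionaries with `congruent`, `correct`, and `rt_ms` keys) and the use of the mean rather than another summary of reaction time are assumptions made for illustration, and the trial values are hypothetical.

```python
from statistics import mean

def stroop_interference(trials):
    """Mean RT on correct incongruent trials minus mean RT on correct congruent trials.

    `trials` is an iterable of dicts with the keys 'congruent' (bool),
    'correct' (bool), and 'rt_ms' (float). Incorrect responses are excluded.
    """
    incongruent = [t["rt_ms"] for t in trials
                   if t["correct"] and not t["congruent"]]
    congruent = [t["rt_ms"] for t in trials
                 if t["correct"] and t["congruent"]]
    return mean(incongruent) - mean(congruent)

# Hypothetical trials (not study data); the error trial is ignored.
trials = [
    {"congruent": True, "correct": True, "rt_ms": 500.0},
    {"congruent": True, "correct": True, "rt_ms": 520.0},
    {"congruent": False, "correct": True, "rt_ms": 600.0},
    {"congruent": False, "correct": True, "rt_ms": 640.0},
    {"congruent": False, "correct": False, "rt_ms": 2000.0},
]
interference_ms = stroop_interference(trials)
```

A larger positive difference indicates more interference from the unattended word.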
3 Results
The experimental group’s English proficiency (PET) mean score was 57.42 ± 6.22 (range = 49.3–66.6), while the control group’s mean PET score was 57.16 ± 5.69 (range = 50.66–66.6). There was no significant difference between the PET scores of the experimental and control groups, t(38) = 0.117, p = 0.907. Pearson correlation analysis did not reveal a significant correlation between PET scores and discrimination pretest/posttest scores or between PET scores and discrimination gains (p > 0.05). Table 1 shows the means, standard deviations, and ranges of the measured cognitive abilities and discrimination scores in the pretest and posttest.
Mean and standard deviation for PSTM (weighted scores), AM (weighted scores), Stroop (milliseconds; RT incongruent – RT congruent), and RIF (RT difference between the RP− and NRP), discrimination A′ scores in pre- and post-tests and gains from training (n = 30).
PSTM = phonological short-term memory, AM = acoustic memory, RIF = retrieval-induced forgetting effect, Disc-pre = discrimination pre-test, Disc-post = discrimination post-test, and Disc-gains = gained scores in discrimination after training.
Results of the paired sample t-tests revealed a significant difference in discrimination scores between the pretest and posttest for the experimental group, t(58) = 8.76, p < 0.001, but not for the control group, t(18) = 0.336, p = 0.741, which indicates the effectiveness of the training task. Overall, the average score for the discrimination task improved by 17.9 ± 8 percent from the pretest for the experimental group.
Regarding the results of the RIF test, to provide an appropriate baseline for the RP− and RP+ items that occurred in different testing positions, we divided the NRP items in each category in half based on their testing positions. RP− items were compared to the first three NRP items tested (NRP1), whereas RP+ items were compared to the last three NRP items tested (NRP2; Jonker & MacLeod, 2012; Jonker, Seli, & MacLeod, 2015). Results of a repeated-measures analysis of variance revealed that recall for RP−, RP+, NRP1, and NRP2 items differed significantly, F(3, 87) = 27.3, p < 0.001. The first planned comparison revealed a significant benefit of extra-study practice for the practiced items: Using paired sample t-tests, we found that participants recalled more RP+ items than NRP2 items, t(58) = 5.0, p < 0.001. More importantly, the second comparison revealed a significant cost to the RP− items: participants recalled fewer RP− items than NRP1 items, t(58) = 3.9, p < 0.001. Inhibition theory predicts that RIF should be retrieval dependent, occurring only when the practice task involves the retrieval of RP+ items. Table 2 shows the means and standard deviations of the proportions of items recalled in RP+, RP−, NRP1, and NRP2.
Mean percentages and standard deviation of items recalled as a function of item type.
Consistent with this prediction, the RIF effect was present in this experiment where practice involved recalling RP+, which caused lower scores for RP− items. The mean proportions of correct recall during the final test are presented in Figure 2. All scores are expressed as a proportion of the total number of items in the category.

Mean proportions of exemplars recalled during the final test. The error bars represent one standard error of their respective means. RP+ = practiced items; RP− = unpracticed items sharing category membership with practiced items; NRP = baseline items with no category items practiced; NRP1 items, like RP− items, were from testing positions 1 to 4 of a baseline category; NRP2 items, like RP+ items, were from testing positions 5 to 8.
Regarding the Stroop test results, paired sample t-tests showed a significant difference between the RTs of incongruent and congruent items, t(58) = 3.40, p < 0.05. For PSTM scores, repeated-measures analysis of variance revealed a significant difference between scores with different item lengths, F(2, 87) = 12.7, p < 0.001. Post hoc Tukey comparisons showed a significant difference between scores of 5-item and 6-item sequences and between 5-item and 7-item sequences (both p < 0.001), but not between 6-item and 7-item sequences.
Table 3 shows the percentage scores of the AM and PSTM tests with different item lengths. Regarding AM scores, repeated-measure analysis of variance revealed a significant difference between scores with different item lengths, F(3, 116) = 7.23, p < 0.001. Post hoc Tukey comparisons showed a significant difference between the scores of 3-item sequences and 4-, 5-, and 6-item sequences, but not between the scores of 4-, 5-, and 6-item sequences.
Mean and standard deviation of percentage of correct identifications in AM and PSTM tests with different item lengths.
The results of the partial Pearson correlation analyses between gains from phonetic training and inhibition, Stroop, PSTM, and AM scores are presented in Table 4. As mentioned earlier, there was no significant relationship between English proficiency (PET) scores and discrimination scores. Nevertheless, given the potential effect of proficiency on L2 sound discrimination, the correlations were examined while controlling for PET scores. To correct for multiple comparisons, p-values were adjusted using the Benjamini-Hochberg procedure with a false discovery rate of 5% (Benjamini & Hochberg, 1995).
Partial correlations (Pearson) between discrimination results in pre- and post-tests, gains from training and cognitive ability scores, controlling for proficiency scores (PET).
Note. n = 30. *Benjamini-Hochberg adjusted p < 0.05, one-tailed. a = acoustic memory; b = phonological short-term memory.
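For readers unfamiliar with the Benjamini-Hochberg correction, the following is a minimal sketch of the step-up procedure. It is a generic implementation, not the software routine used in the analysis, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected at false discovery rate q.

    Step-up rule: sort the m p-values, find the largest rank k such that
    p_(k) <= (k / m) * q, and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    ranked = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(ranked, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for idx in ranked[:k_max]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for four correlations (not study data).
decisions = benjamini_hochberg([0.002, 0.030, 0.200, 0.600], q=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold.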
These correlations indicate that inhibition was significantly positively correlated with discrimination gains from phonetic training, r(27) = 0.486, p = 0.004. However, there was no significant correlation between inhibition and discrimination at pretest. Additionally, there were no significant correlations between discrimination scores and Stroop, PSTM, and AM scores (p > 0.05; Table 4).
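A first-order partial correlation of the kind reported here can be computed from the three pairwise Pearson correlations. The sketch below is a generic illustration under that standard formula; the function names and the data in the example are hypothetical. With one covariate, the degrees of freedom are n − 3, which gives 27 for n = 30.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_corr_df(n, n_covariates=1):
    """Degrees of freedom for a partial correlation: n - 2 - number of covariates."""
    return n - 2 - n_covariates

# Hypothetical scores (not study data): z is uncorrelated with both x and y,
# so controlling for it leaves the x-y correlation unchanged.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
z = [1, -1, -1, 1]
r_partial = partial_corr(x, y, z)
```

In the study's terms, x would correspond to a cognitive measure such as the inhibition score, y to discrimination gains, and z to PET scores.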
Further regression analyses revealed that inhibition was a significant predictor of discrimination gain (Table 5). We used a hierarchical regression analysis to control for the possible effect of proficiency scores. Perception gain was included as the dependent variable, and proficiency scores and RIF were included as predictors in the model. Inhibition explained about 49% of the total variance in discrimination gain (p = 0.008) while controlling for proficiency scores (PET).
Results of hierarchical regressions using inhibition and PSTM as predictor of discrimination gain, controlling for proficiency scores (PET).
4 Discussion
This study investigated the relationship between different cognitive abilities and adults’ L2 vowel learning with the aim of gaining a better understanding of individual differences in L2 speech learning.
4.1 Inhibitory skills
The results of the correlation analysis revealed a significant association between inhibition task scores and discrimination gains from phonetic training. This finding is in line with the results of Darcy et al. (2016), who reported that higher inhibitory control was related to a lower error rate in segmental perception.
Individual differences in the ability to inhibit the language not in use have consequences for language perception and production (Lev-Ari & Peperkamp, 2013). Friedman and Miyake (2004) categorized inhibitory control into three different groups, including “resistance to proactive interference,” under which the RIF task falls. They defined resistance to proactive interference as the ability to resist memory interference from information that was previously relevant to the task but has since become irrelevant. As Darcy et al. (2016) proposed, this type of inhibition might be highly relevant for category learning. It helps learners form accurate vowel categories while resisting interference from L1-specific memory traces during perception.
We suggest that the relationship between inhibition scores and greater accuracy in the discrimination of L2 vowels demonstrates that learners with higher inhibition scores may have used this ability during the training to support the learning of L2 segmental categories. Therefore, learners with stronger inhibitory control may have had an advantage during the training, enabling them to develop more accurate representations for L2 segments. A possible explanation for this advantage is the greater capacity to avoid L1 interference during L2 phonological processing and learning. Darcy et al. (2016) suggested that temporarily preventing the activation of an L1 system in phonological processing would allow learners to minimize L1 perceptual interference, which facilitates the development of more accurate L2 phonological representations. They concluded that inhibitory control could be an essential component of executive function facilitating L2 phonological acquisition.
We assume that higher inhibitory skills helped the participants avoid confusing the intended L2 vowels with L1 vowels during the phonetic training and made it easier to establish new categories for the new L2 vowels. Costa and Santesteban (2004) observed that language learners with low proficiency show asymmetries in language-switching tasks, whereas high-proficiency L2 learners no longer exhibit these asymmetries even when switching between their native language and a third language they do not know well. These asymmetries reflect asymmetrical inhibition in language processing. Costa and Santesteban (2004) interpreted this finding as indicating that high-proficiency bilinguals learn to control their languages in a manner that does not rely on inhibition and later use this as an alternative method of language control when learning additional languages. Based on their study, it can also be assumed that because our participants are low-proficiency learners, their inhibitory skill predicts the degree of influence of their L1 on L2 during the training.
In the study of Darcy et al. (2016), learners with stronger inhibition scores also made fewer errors in vowel perception; thus, the authors suggested that the learners’ inhibition measure might reflect their ability to inhibit the wrong response alternative in the ABX task for perception. In our study, since there was no significant correlation between inhibition and discrimination pretest scores, it can be assumed that inhibitory control is more related to training than to discrimination accuracy. It is also possible that the effects of other factors present prior to phonetic training could not be observed in the pretest phase, but when the training procedure was controlled, the effects of these cognitive abilities showed up.
Better inhibitory skills may help learners selectively activate one phonological system at a time (either L1 or L2), which lowers interference between them and enhances phonological processing and acquisition. For instance, deactivation of an L1 system during training on minimal pairs in the phonetic training task would minimize L1 perceptual interference and consequently improve learning of the L2 phonological system. Our findings are in line with the assumption of Darcy et al. (2016) that inhibitory control could be an essential component of executive function facilitating L2 phonological acquisition.
4.2 Phonological/acoustic short-term memory
Based on the results of this study, PSTM and AM capacity seem unrelated to the amount of gains from phonetic training in the discrimination of L2 vowels. Generally, the lack of association between PSTM and gains from training is contrary to the results of Aliaga-García et al. (2011), who found that high-variability phonetic training had an overall larger effect on participants with high PSTM than on those with low PSTM. The lack of a relationship between AM capacity and gains from training is in line with the results of Safronova and Mora (2012) and in contrast to those of Safronova (2016), who found a significant relationship between AM and discrimination scores.
The discrimination of different L2 sound pairs can be assumed to require different levels of cognitive demand. Therefore, one possible explanation for these contrary results is that these studies used different L2 pairs with different degrees of difficulty in their discrimination tasks. Isaacs and Trofimovich (2011) also did not find any relationship between PSTM and L2 speech judgment. Similar to this study, they employed an SNWR task to estimate the raters’ phonological memory capacity. Other tasks such as nonword repetition or recall tasks, despite their shortcomings, could yield a measure of phonological memory that would be associated with better gains from phonetic training. Several studies have found that the phonological loop plays a crucial role in the learning of new words by storing unfamiliar sound patterns while long-term representations are built, which suggests a direct link between short-term memory and long-term learning (Papagno & Vallar, 1995; Speciale, Ellis, & Bywater, 2004). Given these findings and the results of our study, another possibility is that a better PSTM is an advantage for storing sound patterns and learning vocabulary but does not play a significant role in L2 segmental learning.
Additionally, there was no significant relationship between PSTM and AM. Williamson, Baddeley, and Hitch (2010) argued that the acoustic and phonological stores are fundamentally different and thus do not overlap but share a common rehearsal mechanism. Our results also suggest that these two short-term capacities are different.
4.3 Attention control
The selective attention models often applied to speech perception are used to study the effects of training on the perception of an unfamiliar phonetic contrast (Goldstone, 1993; Nosofsky, 1986). According to these models, learning shifts attention toward dimensions that are relevant for classification and away from dimensions that are irrelevant. Attention control in this study is viewed as the ability to attend selectively to different aspects of acoustic information. We assumed that better selective attention control may facilitate L2 phonetic learning by helping learners shift their attention to dimensions relevant for the discrimination of L2 vowels and ignore irrelevant dimensions during the phonetic training. However, in the present study we found no significant relationship between attention control ability and gains from training.
Previous studies on L2 learners’ perception of English vowel contrasts have shown that better attention control may not contribute to an accurate perception of L2 phonological contrasts (Darcy et al., 2014, 2015; Safronova, 2016; Safronova & Mora, 2012). Safronova (2016) suggested that the ability to rapidly reallocate one’s focus of attention may not be involved in L2 learners’ accurate perception of English vowel contrasts, and that the ability to focus attention on particular acoustic cues may be a better predictor of L2 learners’ fast and accurate discrimination of L2 vowel contrasts. She argued that learners’ L2 vowel discrimination ability might be partly predicted by their lower ability to inhibit irrelevant acoustic cues. It should be noted that these studies have used attention switching paradigms as an index for attention control; however, the comparison of our results with those of the mentioned studies is difficult since we have used a Stroop task, which is a measure of selective attention/response inhibition rather than attention switching.
The nonsignificant relationship between attention control and gains from phonetic training in the present study could be attributed to our testing procedure. Our phonetic training procedure might not have been cognitively challenging, leaving participants little room to draw on their attention control (Isaacs & Trofimovich, 2011). Moreover, Trofimovich, Ammar, and Gatbonton (2007) suggested that the measure of attention control was a stronger predictor of participants’ performance when the cognitive demands of the task were elevated. In future investigations of the role of cognitive factors in the perceptual training of L2, it would be interesting to examine these claims further by modifying the level of cognitive demand in the training procedure.
Because the present study considered only the role of selective attention in L2 vowel learning, we suggest that future research examine how individual differences in other aspects of attention control relate to gains from phonetic training of L2 sounds.
5 Conclusion
Despite several findings on large individual differences in the learning of L2 sounds, the source of these differences is still unclear. This study explored the source of individual differences in cognitive abilities in a controlled situation through phonetic training to reduce the effects of other factors affecting long-term learning.
There were no significant correlations between PSTM, AM, attention control, and the amount of gains from training. However, higher inhibitory control was significantly correlated with higher gains in perceptual discrimination from phonetic training. We suggest that higher inhibitory control facilitates the processing of L2 sounds by allowing learners to ignore the interfering acoustic and phonological information from L1, leading to better L2 segmental learning. Studies exploring effective factors in the learning of L2 sounds may consider controlling for learners’ inhibitory skills.
Generally, it is important to consider the effects of psycholinguistic factors in speech perception/acquisition research. Ramus et al. (2010), in their “standard model” of phonological theory, provide a modified information-processing model of the speech system in which they emphasize the importance of considering various levels of processing and representation in experimental paradigms.
This study has some limitations; in particular, it had a relatively small number of participants and did not control for their motivation in L2 learning. Future research can involve larger sample sizes and control for motivation as a confounding factor. The relationship between cognitive abilities and learning L2 consonants and prosodic structures can also be explored in future studies.
Appendix 1
Below is the list of all 128 trials used in the discrimination task (all repeated twice in the task). The f1, f2, m1, and m2 are used for the tokens produced by the two native female and the two native male speakers. For instance, in the trial (/hɪd/f1,/hɪd/f2) listeners would hear the speaker f1’s /hɪd/ token followed by the speaker f2’s /hɪd/ token.
(/hɪd/f1,/hɪd/f2) (/hɪd/m1,/hɪd/m2) (/hɪd/f2,/hɪd/f1) (/hɪd/m2,/hɪd/m1)
(/hid/f1,/hid/f2) (/hid/m1,/hid/m2) (/hid/f2,/hid/f1) (/hid/m2,/hid/m1)
(/hɛd/f1,/hɛd/f2) (/hɛd/m1,/hɛd/m2) (/hɛd/f2,/hɛd/f1) (/hɛd/m2,/hɛd/m1)
(/hæd/f1,/hæd/f2) (/hæd/m1,/hæd/m2) (/hæd/f2,/hæd/f1) (/hæd/m2,/hæd/m1)
(/hʌd/f1,/hʌd/f2) (/hʌd/m1,/hʌd/m2) (/hʌd/f2,/hʌd/f1) (/hʌd/m2,/hʌd/m1)
(/hɒd/f1,/hɒd/f2) (/hɒd/m1,/hɒd/m2) (/hɒd/f2,/hɒd/f1) (/hɒd/m2,/hɒd/m1)
(/hɑd/f1,/hɑd/f2) (/hɑd/m1,/hɑd/m2) (/hɑd/f2,/hɑd/f1) (/hɑd/m2,/hɑd/m1)
(/hᴐd/f1,/hᴐd/f2) (/hᴐd/m1,/hᴐd/m2) (/hᴐd/f2,/hᴐd/f1) (/hᴐd/m2,/hᴐd/m1)
(/hʊd/f1,/hʊd/f2) (/hʊd/m1,/hʊd/m2) (/hʊd/f2,/hʊd/f1) (/hʊd/m2,/hʊd/m1)
(/hud/f1,/hud/f2) (/hud/m1,/hud/m2) (/hud/f2,/hud/f1) (/hud/m2,/hud/m1)
(/hɪd/f1,/hid/f2) (/hɪd/f2,/hid/f1) (/hɪd/m1,/hid/m2) (/hɪd/m2,/hid/m1)
(/hɪd/f2,/hid/f1) (/hɪd/f1,/hid/f2) (/hɪd/m2,/hid/m1) (/hɪd/m1,/hid/m2)
(/hɪd/f1,/hɛd/f2) (/hɪd/f2,/hɛd/f1) (/hɪd/m1,/hɛd/m2) (/hɪd/m2,/hɛd/m1)
(/hɪd/f2,/hɛd/f1) (/hɪd/f1,/hɛd/f2) (/hɪd/m2,/hɛd/m1) (/hɪd/m1,/hɛd/m2)
(/hɛd/f1,/hæd/f2) (/hɛd/f2,/hæd/f1) (/hɛd/m1,/hæd/m2) (/hæd/m2,/hɛd/m1)
(/hɛd/f2,/hæd/f1) (/hɛd/f1,/hæd/f2) (/hɛd/m2,/hæd/m1) (/hæd/m1,/hɛd/m2)
(/hæd/f1,/hʌd/f2) (/hæd/f2,/hʌd/f1) (/hæd/m1,/hʌd/m2) (/hæd/m2,/hʌd/m1)
(/hæd/f2,/hʌd/f1) (/hæd/f1,/hʌd/f2) (/hæd/m2,/hʌd/m1) (/hæd/m1,/hʌd/m2)
(/hʌd/f1,/hɒd/f2) (/hʌd/f2,/hɒd/f1) (/hʌd/m1,/hɒd/m2) (/hʌd/m2,/hɒd/m1)
(/hʌd/f2,/hɒd/f1) (/hʌd/f1,/hɒd/f2) (/hʌd/m2,/hɒd/m1) (/hʌd/m1,/hɒd/m2)
(/hɒd/f1,/hɑd/f2) (/hɒd/f2,/hɑd/f1) (/hɒd/m1,/hɑd/m2) (/hɒd/m2,/hɑd/m1)
(/hɒd/f2,/hɑd/f1) (/hɒd/f1,/hɑd/f2) (/hɒd/m2,/hɑd/m1) (/hɒd/m1,/hɑd/m2)
(/hɑd/f1,/hᴐd/f2) (/hɑd/f2,/hᴐd/f1) (/hɑd/m1,/hᴐd/m2) (/hɑd/m2,/hᴐd/m1)
(/hɑd/f2,/hᴐd/f1) (/hɑd/f1,/hᴐd/f2) (/hɑd/m2,/hᴐd/m1) (/hɑd/m1,/hᴐd/m2)
(/hᴐd/f1,/hɒd/f2) (/hᴐd/f2,/hɒd/f1) (/hᴐd/m1,/hɒd/m2) (/hᴐd/m2,/hɒd/m1)
(/hᴐd/f2,/hɒd/f1) (/hᴐd/f1,/hɒd/f2) (/hᴐd/m2,/hɒd/m1) (/hᴐd/m1,/hɒd/m2)
(/hʊd/f1,/hud/f2) (/hʊd/f2,/hud/f1) (/hʊd/m1,/hud/m2) (/hʊd/m2,/hud/m1)
(/hʊd/f2,/hud/f1) (/hʊd/f1,/hud/f2) (/hʊd/m2,/hud/m1) (/hʊd/m1,/hud/m2)
(/hᴐd/f2,/hʊd/f1) (/hᴐd/f1,/hʊd/f2) (/hᴐd/m2,/hʊd/m1) (/hᴐd/m1,/hʊd/m2)
(/hᴐd/f1,/hʊd/f2) (/hᴐd/f2,/hʊd/f1) (/hᴐd/m1,/hʊd/m2) (/hᴐd/m2,/hʊd/m1)
(/hʌd/f1,/hɑd/f2) (/hʌd/f2,/hɑd/f1) (/hʌd/m1,/hɑd/m2) (/hʌd/m2,/hɑd/m1)
(/hʌd/f2,/hɑd/f1) (/hʌd/f1,/hɑd/f2) (/hʌd/m2,/hɑd/m1) (/hʌd/m1,/hɑd/m2)
Appendix 2
The following list shows the “same” and “different” trials used in the Azerbaijani phonological short-term memory (PSTM) task. The items that occupy different positions in the two sequences are underlined.
(/d͡ʒod/ - /næʃ/ - /fœj/ - /bæz/) (/d͡ʒod/ - /næʃ/ - /fœj/ - /bæz/)
(/t͡ʃɑp/ - /t͡sef/ - /næl/ - /pil/) (/t͡ʃɑp/ - /t͡sef/ - /næl/ - /pil/)
(/d͡zɯm/ -
(/nɑʃ/ -
(/næl/ - /pit/ - /tem/ - /d͡ʒyv/ - /muf/) (/næl/ - /pit/ - /tem/ - /d͡ʒyv/ - /muf/)
(/bæl/ - /d͡ʒod/ - /nyl/ - /tɯv/ - /pus/) (/bæl/ - /d͡ʒod/ - /nyl/ - /tɯv/ - /pus/)
(/tod/ - /d͡zɯm/ - /bœd/ - /pɑn/ - /t͡sib/) (/tod/ - /d͡zɯm/ - /bœd/ - /pɑn/ - /t͡sib/)
(/puz/ - /bod/ - /vom/ - /næs/ - /sœm/) (/puz/ - /bod/ - /vom/ - /næs/ - /sœm/)
(/væl/ -
(/pɑz/ -
(/tid/ - /fum/ -
(/dɑn/ - /t͡syv/ -
(/d͡ʒɯt͡ʃ/ - /tub/ - /t͡sum/ - /dɑl/ - /liʃ/ - /t͡sem/) (/d͡ʒɯt͡ʃ/ - /tub/ - /t͡sum/ - /dɑl/ - /liʃ/ - /t͡sem/)
(/pæt/ - /mis/ - /vœl/ - /ner/ - /fuz/ - /sof/) (/pæt/ - /mis/ - /vœl/ - /ner/ - /fuz/ - /sof/)
(/bim/ - /nel/ - /lop/ - /dɑt͡ʃ/ - /tɯn/ - /d͡zun/) (/bim/ - /nel/ - /lop/ - /dɑt͡ʃ/ - /tɯn/ - /d͡zun/)
(/dæz/ - /t͡ʃit/ - /fœj/ - /neʃ/ - /t͡sod/ - /lur/) (/dæz/ - /t͡ʃit/ - /fœj/ - /neʃ/ - /t͡sod/ - /lur/)
(/t͡sæf/ - /tɯm/ -
(/dɯs/ - /vyn/ - /noz/ -
(/nɑʃ/ -
(/d͡zæp/ -
(/tær/ - /deb/ - /bɯm/ - /dœb/ - /med/ - /d͡ʒel/ - /t͡ʃɑp/) (/tær/ - /deb/ - /bɯm/ - /dœb/ - /med/ - /d͡ʒel/ - /t͡ʃɑp/)
(/mys/ - /tud/ - /lɯb/ - /pil/ - /t͡sœv/ - /sœd/ - /bop/) (/mys/ - /tud/ - /lɯb/ - /pil/ - /t͡sœv/ - /sœd/ - /bop/)
(/mɑb/ - /t͡sɯn/ - /t͡ʃez/ - /d͡zil/ - /d͡ʒœp/ - /tyn/ - /rɑl/) (/mɑb/ - /t͡sɯn/ - /t͡ʃez/ - /d͡zil/ - /d͡ʒœp/ - /tyn/ - /rɑl/)
(/mœj/ - /fɯn/ - /nœʃ/ - /d͡ʒom/ - /duv/ - /t͡sev/ - /t͡ʃib/) (/mœj/ - /fɯn/ - /nœʃ/ - /d͡ʒom/ - /duv/ - /t͡sev/ - /t͡ʃib/)
(/bæz/ - /t͡sœt/ - /fol/ -
(/bɑp/ - /luʃ/ - /tæz/ - /vɯl/ -
(/ræs/ -
(/vɯm/ - /nuz/ -
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
