Abstract
We investigated the “proximate unit” in Korean, that is, the initial phonological unit selected in speech production by Korean speakers. Previous studies have shown mixed evidence indicating either a phoneme-sized or a syllable-sized unit. We conducted two experiments in which participants named pictures while ignoring superimposed non-words. In English, for this task, when the picture (e.g.,
Phonological units in Korean speech production
In speaking, we transform our thoughts into sound waves which other speakers of the language perceive and understand. The act of speaking involves selection of the appropriate words from the mental lexicon (i.e., lexical selection) as well as assembling the sounds associated with these words for articulation. This article focuses on the latter process, called phonological encoding. Several theoretical models have been formulated describing in varying detail how the ability to speak might be organised in the mind (e.g., Caramazza, 1997; Dell, 1986; Levelt et al., 1999). Among these models, Levelt et al. (1999) is most explicit on how phonological encoding of words takes place. According to this model, two pieces of information are needed: (1) the metrical frame, specifying the number of syllables and the stress position, and (2) the phonological units, which are the initially selected units that will be used to fill the slots of the metrical frame. For example, uttering a word such as “panda” would involve a bi-syllabic metrical frame with stress on the first syllable (i.e., ω = ‘σσ), and the phonological units would be /p/₁, /æ/₂, /n/₃, /d/₄, and /ə/₅. These two pieces of information are then combined to form the bi-syllabic phonological word [‘pæn-də] with stress on the first syllable.
The Levelt et al. (1999) model was developed using experimental data from mainly Germanic languages such as Dutch, English, and German. However, it is becoming clear that the model details are not the same for all languages due to the numerous phonological, morphological, and orthographic differences between them. Indeed, a recent paper by Roelofs (2015) has already amended parts of the model to accommodate findings from other languages such as Mandarin Chinese and Japanese (e.g., Chinese: Chen et al., 2002; O’Séaghdha et al., 2010; Japanese: Kureta et al., 2006; Verdonschot et al., 2011). These findings have shown that the initially selected phonological unit during phonological encoding, the “proximate” unit (see O’Séaghdha et al., 2010), is not the same in all languages. Specifically, studies using different experimental paradigms such as implicit priming and masked priming have shown that, whereas the proximate unit is the phoneme (segment) in Dutch and English (and other European languages), it is the (atonal) syllable in Mandarin Chinese and the mora in Japanese 1 (for a comprehensive overview of these findings, see O’Séaghdha, 2015; Roelofs, 2015). The status for many other languages, however, is largely unexplored.
In this study, we focus on Korean. Korean is of special interest because of the numerous phonological, morphological, and orthographic differences between Korean and other languages: Dutch and English, on the one hand, and Mandarin Chinese and Japanese, on the other. These differences suggest a role for either phonemes (segments) or syllables as the potential proximate unit in Korean, as we discuss below.
First, the Korean syllable is far less complex than that of English and Dutch, suggesting a putative role for the syllable as a proximate unit (although the Korean syllable is more complex than that of Mandarin). A Korean syllable is maximally CGVC (C = consonant, G = glide, V = vowel), where only a single consonant is allowable in onset and coda position. This contrasts with English and Dutch, which allow as many as three consonants in syllable onsets (e.g., “sprain”) and four consonants in syllable codas (e.g., “texts”). Hence, the number of possible syllables in Korean is 1,832 (derived from the subtraction of the unattested syllables from all possible combinations from 19 consonant onsets, 21 vowels, and 7 coda consonants; Won, 1987), which is considerably smaller than the 12,000 possible syllables in Dutch (Schiller, 1998), but also much larger than the approximately 400 syllables in Mandarin Chinese when ignoring tones (and 1,200 for tonal syllables; O’Séaghdha et al., 2010). Furthermore, in contrast to Mandarin where resyllabification is non-existent, Korean allows numerous phonological processes to occur across the syllable boundary. For example, when the second syllable of a disyllabic word begins with a vowel, the coda of the first syllable is resyllabified to the onset of the second syllable, for example, /hak.u/ “buddy” → [ha.ku]. Another example is nasal assimilation, in which the coda of the first syllable is nasalised and assimilated with the nasal onset in the second syllable, for example, /hak.mun/ “learning” → [haŋ.mun]. 2
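The two cross-boundary processes just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: syllables are represented as hypothetical (onset, nucleus, coda) triples, the function name `surface_form` is invented here, and the stop-to-nasal mapping covers just the plain stop codas; it is not a full model of Korean phonology.

```python
# Each syllable is an (onset, nucleus, coda) triple; "" marks an empty slot.
# The representation and the helper name are illustrative assumptions.
NASALS = {"m", "n", "ŋ"}
# Assumed mapping from plain stop codas to their homorganic nasals.
STOP_TO_NASAL = {"p": "m", "t": "n", "k": "ŋ"}


def surface_form(syllables):
    """Apply resyllabification and nasal assimilation at each syllable boundary."""
    out = [list(s) for s in syllables]
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        if cur[2] and not nxt[0]:
            # Resyllabification: the coda fills the empty onset that follows.
            nxt[0], cur[2] = cur[2], ""
        elif cur[2] in STOP_TO_NASAL and nxt[0] in NASALS:
            # Nasal assimilation: a stop coda is nasalised before a nasal onset.
            cur[2] = STOP_TO_NASAL[cur[2]]
    return ".".join("".join(s) for s in out)


print(surface_form([("h", "a", "k"), ("", "u", "")]))    # ha.ku
print(surface_form([("h", "a", "k"), ("m", "u", "n")]))  # haŋ.mun
```

In both cases the underlying first syllable /hak/ no longer surfaces intact, which is why such items are useful for separating underlying from surface syllables.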
Second, the Korean orthographic system (Hangul) is different from all alphabetic scripts, where a letter corresponds to a phoneme, as in English and Dutch; Chinese “hanzi” (and Japanese kanji) characters represent (morpho-) syllables, and Japanese “kana” script represents moras. Hangul is an “alphabetic syllabary” in which both phonemes and syllables are explicitly represented in the writing system (e.g., Taylor, 1980; Taylor & Taylor, 2014). Hangul is a relatively transparent orthography where grapheme-to-phoneme correspondence is consistent, and thus, each phoneme is represented by a Hangul letter. These letters are, however, not linearly ordered as in the alphabetic writing system, but grouped into a square block, which corresponds to a syllable. The syllables are separated by a physical gap: For example, the printed word <한글> /han.kɨl/ is composed of two syllable blocks. The physical demarcation of syllable blocks allows readers to clearly distinguish the syllable boundary (e.g., unlike English, in which a word such as “music” is not written as mu-sic). Thus, Hangul maps graphemes onto phonemes just as English and Dutch do, but the composition of its graphemes is shaped into a square-like syllable-sized block, like a Chinese hanzi character.
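The block-plus-letter organisation of Hangul is mirrored in how precomposed Hangul syllables are laid out in Unicode, which makes the point easy to illustrate. The sketch below uses the standard arithmetic for the Hangul Syllables block (19 initial letters, 21 vowels, 28 final slots including "no final"); the function name `decompose` is a hypothetical helper, and the letters are orthographic units, not the phonological inventory discussed above.

```python
# Letter tables in Unicode order for precomposed Hangul syllables.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial letters
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowel letters
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 slots

S_BASE = 0xAC00  # code point of the first precomposed syllable block


def decompose(block):
    """Split one precomposed Hangul syllable block into its letters."""
    index = ord(block) - S_BASE
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"{block!r} is not a precomposed Hangul syllable")
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


for block in "한글":
    print(block, decompose(block))
# 한 ('ㅎ', 'ㅏ', 'ㄴ')
# 글 ('ㄱ', 'ㅡ', 'ㄹ')
```

Each printed block is one syllable, and each element of the returned triple is one letter, so both levels of representation are recoverable from the written form.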
The putative roles of segments and syllables as the proximate unit in Korean find empirical support in observations of natural language use, such as word games and speech errors. Some word games in Korean respect the syllable, whereas other word games respect the phoneme. For example, a Korean word-chain game played by children named kketmaliski requires players to utter a word which begins with the final part of the previously heard word. Here, the final part must be the syllable. For instance, when han.kuk “Korea” is presented, kuk.su “noodle” is valid, but ki.lin “giraffe” (onset overlap) or ku.lɨm “cloud” (onset plus a vowel overlap) is not a valid continuation. It is of interest to note that the syllable in question is the underlying syllable as represented in the Hangul orthography. For example, kuk.min “people” would also be a valid answer here, even though the coda of the first syllable is ultimately pronounced as a nasal as it is situated before a labial nasal (i.e., [kuŋ]), whereas a word with an underlying nasal in the coda of the first syllable (/kuŋ.li/ “deliberation”) is not a valid answer. We will return to this point in the “General Discussion” section. In contrast, Sohn (1987) describes another language game in which vowels of two consecutive syllables are switched around without affecting other parts of the syllable. For instance, the non-word ha.pok is derived from ho.pak “pumpkin” where only the two vowels in the first and the second syllables are switched without any change in the quality of the surrounding consonants, suggesting a role for the phoneme as a functional phonological unit.
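The validity rule of the word-chain game can be stated as a one-line check on underlying syllables. The following is a minimal sketch that assumes words are given as dot-separated underlying forms, as in the examples above; the function name `is_valid_continuation` is hypothetical.

```python
def is_valid_continuation(previous, candidate):
    """True if the candidate's first underlying syllable equals the
    last underlying syllable of the previous word."""
    return previous.split(".")[-1] == candidate.split(".")[0]


print(is_valid_continuation("han.kuk", "kuk.su"))   # True
print(is_valid_continuation("han.kuk", "ki.lin"))   # False (onset overlap only)
print(is_valid_continuation("han.kuk", "kuk.min"))  # True (surface [kuŋ] is irrelevant)
print(is_valid_continuation("han.kuk", "kuŋ.li"))   # False (underlying nasal coda)
```

Because the comparison is made on underlying forms, a surface match produced by nasal assimilation neither helps nor hurts, in line with the description of the game.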
Speech error data can also be informative as to the nature of the phonological unit in speech production. In English, most phonological errors involve a single segment or clusters, and errors involving whole syllables are not as common (Bock, 1991; Dell, 1995). In contrast, both syllable and segment errors are observed in Mandarin (Chen, 1993), and errors mostly adhere to morae in Japanese (Kubozono, 1989). J. I. Han et al. (2019) examined the errors in a large-scale corpus of spontaneous speech (i.e., the Seoul Naturalistic Speech Corpus; Yun et al., 2015), which contains audio-recorded interviews of 40 standard Korean speakers, and found that Korean speakers produced segmental as well as syllabic errors. As in the Germanic languages, Korean has more segmental than syllabic errors. However, considering the proportions of segmental and syllabic errors, Korean showed similar proportions of errors involving segments (46.1%) and syllables (39.5%), while there are few, if any, pure syllable errors in Indo-European languages (especially not in Germanic languages; see Bock, 1991; Dell, 1995). 3
Experimental investigations regarding the role of the segment and the syllable in Korean speech production are scarce. We are aware of just four such studies, and the evidence regarding the onset segment is mixed. Three of these studies employed the masked-priming read-aloud paradigm. Kim and Davis (2002) were the first to report data on Korean, using masked priming in combination with reading aloud and lexical decision. The targets were all monosyllabic words. There were five prime conditions differing in the amount of (orthographic and phonological) overlap, namely, an identity prime (<결> - <결>, both /kjʌl/ “grain”); an onset prime (<개> - <결>, /kæ/ “dog”, /kjʌl/); an “onset-plus” (CV) prime (<겨> - <결>, /kjʌ/ “bran”, /kjʌl/); a rime prime, wherein the vowel and coda of the prime (i.e., the rime) overlapped with the target (<멸> - <결>, /mjʌl/ “a kind of pepper plant”, /kjʌl/); and an unrelated prime serving as the control (e.g., <돈> /don/ “money”). In reading aloud, Kim and Davis (2002) did not obtain significant onset effects (7 ms; although this result approached significance, p = .06); however, they obtained a significant identity priming effect (17 ms) as well as a significant 19-ms “onset-plus” (i.e., CV) effect. The form-priming effect (i.e., facilitation due to an overlap in rime) was not significant (–4 ms; e.g., /mjʌl/ - /kjʌl/). The lexical decision task they administered showed—consistent with previous studies in European languages—only significant identity priming, suggesting that the beneficial effects of begin-related overlap in the read-aloud task originated during phonological encoding. In sum, theirs was the first to investigate the masked onset priming effect in Korean, and unlike the effect in European languages which is highly robust, the authors reported the absence of a statistically significant effect in reading aloud in Korean.
In contrast, Witzel et al. (2013) did find significant phoneme onset priming effects. They also employed a masked-priming read-aloud task but used bi-syllabic Korean non-words as targets instead. These non-words were all preceded by one of three disyllabic Hangul primes (also non-words), specifically: onset phoneme overlap (e.g., <페추> - <피토> /phε.chu/, /phi.tho/), CV overlap (e.g., <피추> - <피토> /phɪ.chu/, /phɪ.tho/), or an unrelated prime (e.g., <카추> - <피토> /kha.chu/, /phɪ.tho/). The results by Witzel et al. (2013) showed significantly faster reaction times (RTs) when primes and target shared the onset phoneme (9 ms) and the CV syllable (16 ms), and priming was significantly larger for CV primes compared with phoneme primes. Note that the first syllables of all of their primes and targets were open syllables (i.e., CV syllables), and hence it is unclear whether the greater benefit observed with the CV overlap was due to the greater amount of segmental overlap or due to the overlap in the syllable.
J. I. Han and Choi (2016) used the form preparation (implicit priming) paradigm with picture targets (i.e., the picture served as the prompt for the picture name to be produced) with disyllabic names (e.g., /ki.lin/, “giraffe”, /i.cha/, “train”). A significant form preparation effect was found for the syllable overlap (17 ms, p < .001) and a marginally significant effect for onset segment overlap (11 ms, p = .06). J. I. Han and Choi (2016) took the absence of a reliable onset effect in the form preparation paradigm (which is in sharp contrast with the robust effect found with Dutch and English) to argue that the proximate unit in Korean speech production is the syllable, rather than the phoneme.
More recently, J. I. Han and Verdonschot (2019) used two different tasks to investigate the phonological unit of Korean word production. In their first experiment, they used a masked-priming read-aloud task, with two-character non-word target stimuli written in Hangul. They were especially interested in the role of the syllable, and their experimental design included various syllable conditions, which will be described in more detail later. For now, the relevant findings are a significant 19-ms priming effect for the onset segment (“Same onset,” for example, <댄소> - <독가> /
As for Witzel et al.’s (2013) findings, these results could be interpreted in terms of the benefit due to segmental overlap, and in line with this, J. I. Han and Verdonschot (2019) concluded that “the onset segment and not the syllable is the initial (or proximate) phonological unit used in the segment-to-frame encoding process during speech planning in Korean” (p. 901). Importantly, their conclusion also took into account the findings from the additional conditions in their masked-priming read-aloud experiment (Experiment 1) referred to earlier. Specifically, in addition to the “Same syllable” (e.g., <독쇠> - <독가> /
This study further investigates the role of onset segments and syllables in Korean speech production, using the picture–word interference (hereafter PWI) paradigm. To our knowledge, the PWI task has not (yet) been used to study the proximate unit in Korean, but we believe it is important to extend the range of paradigms beyond the two main tasks used to date, namely, form preparation and masked priming. O’Séaghdha and Frazer (2014) have pointed out limitations with these tasks. Specifically, they suggested that some form preparation effects may involve memory cuing, which allows the shared component to more quickly retrieve the to-be-named target word, instead of (or in addition to) preparation of the shared component. Furthermore, the form preparation effect may be strategic, as attested by the observation that in J. I. Han and Choi’s (2016) study, the effect (for either the onset segment or syllable) was absent in the first block of trials and appeared only from the second block onwards. The possibility that the emergence of the form preparation effect (despite the alternative term implicit priming effect) depends on participants explicitly noticing the shared component makes it a less ecologically valid paradigm to study natural speech production. Of the masked-priming read-aloud paradigm, O’Séaghdha (2015) has commented that “masked primes could influence many word-production processes” (p. 13), not just phonological encoding. Indeed, a major alternative account of the masked onset priming effect attributes the facilitation in reading aloud due to the onset overlap between the prime and the target to the sub-lexical grapheme–phoneme mapping process (e.g., Forster & Davis, 1991), and both Kim and Davis (2002) and Witzel et al. (2013) interpreted their finding of a masked onset priming effect with Korean words written in Hangul in terms of this process, rather than phonological encoding.
Given these issues, we heed O’Séaghdha’s (2015) call for “more data using tasks that more fully engage speech production processes,” and consequently, we use the PWI task here.
The PWI task has been a mainstay of speech production research. The advantages of this task over the masked-priming read-aloud paradigm are twofold: first, the target is not a written word and hence it is less prone to the influence of orthography (see the results of Kinoshita & Mills, 2020; Kinoshita & Verdonschot, 2020, which suggest that the benefit of onset segment overlap in the PWI task is purely phonological, with no added benefit from orthographic overlap); second, the distractor is available for a longer period than masked primes (i.e., until the participant’s response), and hence there is a greater scope to observe effects of different sized units. 4 In these respects, the PWI task is similar to the Stroop colour-naming task, the task used previously to investigate the proximate unit in Japanese (Verdonschot & Kinoshita, 2018) and Korean (J. I. Han & Verdonschot, 2019). One advantage of the PWI task over the Stroop colour-naming task is the greater range of available to-be-named targets, because they are not limited to colour names. It is relevant to note in this context that J. I. Han and Verdonschot (2019) were unable to fully replicate the design of their masked-priming read-aloud experiment (Experiment 1) in the Stroop task (Experiment 2), because the first syllable of the to-be-named colours (blue, white, yellow) in Korean had to be an open syllable (CV syllable), and therefore the “Resyllabification” and “Coda change” conditions could not be included.
In our current study, we used as targets (presented as a picture) disyllabic words with an initial CVC syllable (e.g., /kuk.su/ “noodle”) and disyllabic words with an initial CV syllable (e.g., /ca.sʌk/ “magnet”), with disyllabic non-word distractors superimposed on the picture. For the former type of targets, we included all of the conditions used in J. I. Han and Verdonschot’s (2019) masked-priming read-aloud experiment, namely, the Same syllable (e.g., 국퐁 /kuk.phoŋ/), Same onset (e.g., 걸맴 /kʌl.mæm/), Resyllabification (e.g., 국억 /kuk.ʌk/ > [ku.kʌk]), and Coda change (e.g., 국눈 /kuk.nun/ > [kuŋ.nun]) conditions, and compared them with the unrelated Control (e.g., 산몹 /san.mop/). For the initial CV targets, the Resyllabification and Coda change conditions could not be included (for the reasons mentioned earlier), and hence, there were the Same syllable (e.g., 자홉 /ca.hop/), Same onset (e.g., 종팔 /coŋ.phal/), and the unrelated Control condition (e.g., 꼭찰 /k’ok.chal/). Importantly, in addition, for both types of targets, we included the Same CV condition to provide a further test of whether syllable overlap confers a greater benefit than CV overlap. The Same CV distractors differed from the to-be-named target in the syllable, such that for the initial CVC targets (e.g., /kuk.su/, “noodle”) the Same CV distractor contained an initial CV syllable (e.g., 구툴 /ku.thul/) and for the initial CV targets (e.g., /ca.sʌk/ “magnet”) the Same CV distractor contained an initial CVC syllable (e.g., 잠진 /cam.cin/). Note that unlike the “Resyllabification” and “Coda change” distractors, the syllable of the Same CV distractors (that differed from the target) was explicitly represented orthographically. Furthermore, for the initial CV targets (but not for the initial CVC targets), the amount of segmental overlap with the target was equated for the Same syllable distractors and the Same CV distractors.
Thus, for the CV targets, an advantage of the Same syllable condition relative to the Same CV condition would indicate a benefit due to the shared syllable and would provide the strongest test of the role of the syllable. Table 1 shows all distractor conditions and examples.
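The logic that distinguishes the distractor conditions can be made explicit with a small classifier. This is a sketch under stated assumptions, not the procedure used to construct the stimuli: syllables are hypothetical (onset, nucleus, coda) triples of the underlying form, the function name `classify` is invented here, and the condition labels are those used in the text.

```python
NASALS = {"m", "n", "ŋ"}
STOPS = {"p", "t", "k"}  # assumed set of codas that nasalise before a nasal


def classify(target_first, distractor):
    """Label a disyllabic distractor relative to the target's first syllable.

    target_first: (onset, nucleus, coda) of the target's initial syllable.
    distractor:   two (onset, nucleus, coda) triples in underlying form.
    """
    t_onset, t_nucleus, t_coda = target_first
    (d_onset, d_nucleus, d_coda), (next_onset, _, _) = distractor
    if (d_onset, d_nucleus) != (t_onset, t_nucleus):
        return "Same onset" if d_onset == t_onset else "Control"
    if d_coda != t_coda:
        return "Same CV"  # CV shared, but the syllable differs
    if d_coda and not next_onset:
        return "Resyllabification"  # coda surfaces as the next onset
    if d_coda in STOPS and next_onset in NASALS:
        return "Coda change"  # coda surfaces as a nasal
    return "Same syllable"


kuk = ("k", "u", "k")  # first syllable of /kuk.su/
print(classify(kuk, [("k", "u", "k"), ("ph", "o", "ŋ")]))  # Same syllable
print(classify(kuk, [("k", "u", "k"), ("", "ʌ", "k")]))    # Resyllabification
print(classify(kuk, [("k", "u", "k"), ("n", "u", "n")]))   # Coda change
print(classify(kuk, [("k", "u", ""), ("th", "u", "l")]))   # Same CV
print(classify(kuk, [("k", "ʌ", "l"), ("m", "æ", "m")]))   # Same onset
print(classify(kuk, [("s", "a", "n"), ("m", "o", "p")]))   # Control
```

For a CV target such as /ca.sʌk/, the coda is empty, so the Resyllabification and Coda change branches can never apply, which matches the reduced set of four conditions for those targets.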
Table 1. Mean naming latencies (in ms) in Experiment 1 (note: the initial syllable is a free morpheme).
RT: reaction time; CVC: consonant-vowel-consonant; CV: consonant-vowel.
For the CVC targets, the Whole syllable effect is confounded with the number of segments shared with the target as the Same syllable distractor but not the Same CV distractor shared the coda segment of the initial syllable with the target. For the CV targets, the Whole syllable effect represents a pure benefit of sharing the whole syllable, as the Same syllable distractor and the Same CV distractor contained the same initial CV segments.
In addition to the inclusion of the “Same CV” condition, the design of this study differs from J. I. Han and Verdonschot’s (2019) Experiment 1 in two respects. First, in that experiment, all of their targets contained an initial CVC syllable with the coda /k/ or /p/ (e.g., 독 /to
Experiment 1—free morpheme initial syllable targets
The first experiment investigated the role of onset segments and syllables in Korean speech production. Picture naming was used to optimally involve production processes, and superimposed non-words were used as distractors.
Method
Participants
Thirty students (11 males, age = 21 ± 2 years) from Konkuk University, Republic of Korea, all native Korean speakers, participated in the experiment and received monetary compensation.
Design
The experiment used the picture–word interference task, which had the distractor type (i.e., Same syllable, Resyllabification, Coda change, Same CV, Same onset, and Control) manipulated within participants. The dependent variable was the picture-naming response latency.
Materials
Stimuli consisted of eight pictures (black-and-white drawings) and 240 disyllabic Korean non-words. All pictures had disyllabic names, four pictures had names starting with a CVC syllable (e.g., /kuk/), and four had names starting with a CV syllable (e.g., /ca/), hereafter referred to as CVC targets and CV targets, respectively. In Experiment 1, all the targets contained an initial syllable which was a free morpheme, that is, in the target /kuk.su/ meaning “noodle,” the syllable /kuk/ means “soup.” Note that the initial syllable morpheme was not always semantically related to the whole word: for example, the target /pak.cwi/ meaning “bat” (animal) contains the initial syllable /pak/ which means “gourd.”
For the CVC targets, there were six distractor conditions: (1) Same syllable, (2) Resyllabification, (3) Coda change, (4) Same CV, (5) Same onset, and (6) Control; for the CV targets, there were four (instead of six) distractor conditions as CV targets do not have a coda: (1) Same syllable, (2) Same CV, (3) Same onset, and (4) Control. For each distractor condition, six disyllabic distractor words were devised (one for each block) totalling 4 CVC targets × 6 conditions × 6 blocks + 4 CV targets × 4 conditions × 6 blocks = 240 stimuli. All stimuli used in Experiment 1 can be found in Supplementary Material A.
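The stimulus counts follow directly from the design and can be checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those given in the text.

```python
cvc_conditions = ["Same syllable", "Resyllabification", "Coda change",
                  "Same CV", "Same onset", "Control"]
cv_conditions = ["Same syllable", "Same CV", "Same onset", "Control"]
targets_per_type = 4   # four CVC pictures and four CV pictures
blocks = 6
participants = 30

cvc_per_block = targets_per_type * len(cvc_conditions)  # 24
cv_per_block = targets_per_type * len(cv_conditions)    # 16
trials_per_block = cvc_per_block + cv_per_block         # 40

cvc_distractors = cvc_per_block * blocks                # 144
cv_distractors = cv_per_block * blocks                  # 96
total_stimuli = cvc_distractors + cv_distractors        # 240

print(trials_per_block, total_stimuli, total_stimuli * participants)
# 40 240 7200
```

The per-type totals (144 and 96) are the numbers of distractor items entered as random effects in the analyses, and 7,200 is the total number of trials across the 30 participants.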
Apparatus and procedure
Participants were tested individually, seated approximately 60 cm in front of a Samsung S24E450F monitor, upon which the stimuli were presented. Each participant completed 240 test trials, in six blocks. Each picture was named as many times as it had conditions in each block (i.e., 4 CVC targets × 6 conditions, and 4 CV targets × 4 conditions, giving 40 items per block). There were self-paced breaks between the blocks, and for each block two pseudorandomised lists were created (using the “mix” software; van Casteren & Davis, 2006), with the restriction that the same picture name, picture category, first character, or condition could not occur on consecutive trials. Half of the participants were assigned to one version and the other half to the other version. Six block orders were generated according to a Latin square design to avoid any block order effects. A practice block of eight trials, using stimuli that were not the test stimuli and with the target presented in the same format as in the test blocks, preceded the experiment proper. The task was a picture-naming task. Participants were instructed at the outset of the experiment that on each trial they would be presented with a black-and-white line drawing, and their task was to name the picture as fast and accurately as possible. Stimulus presentation and data collection were achieved through the use of E-prime 2.0 software (e.g., Spapé et al., 2019). Stimulus display was synchronised to the screen refresh rate (16.7 ms). Each trial started with the presentation of a fixation mark (+) for 750 ms at the centre of the screen. This was followed by the target picture, which also contained a superimposed distractor word. Targets were presented for a maximum of 3,000 ms or until the participant’s response. The experimenter then judged the accuracy of a trial by pressing 1 (correct), 2 (voicekey problem), or 3 (error) on a keyboard. After this, an empty screen of 250 ms was shown, which was again replaced by the fixation.
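One simple way to obtain six block orders that satisfy a Latin square is a cyclic rotation, sketched below. The text does not say which Latin square was used, so the cyclic construction, the function name `latin_square_orders`, and the participant-to-order assignment are all assumptions made for illustration.

```python
def latin_square_orders(n_blocks):
    """Return n_blocks orders in which every block occupies every
    serial position exactly once (cyclic Latin square)."""
    return [[(start + k) % n_blocks + 1 for k in range(n_blocks)]
            for start in range(n_blocks)]


orders = latin_square_orders(6)
for order in orders:
    print(order)
# [1, 2, 3, 4, 5, 6]
# [2, 3, 4, 5, 6, 1]
# ...

# Hypothetical assignment: participant p receives order p mod 6.
participant_orders = {p: orders[p % 6] for p in range(30)}
```

With 30 participants, this assignment gives each of the six orders to exactly five participants, so each block appears equally often in each serial position.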
Results
Naming latencies were analysed using linear mixed-effects (LME) modelling with subjects and items as crossed random factors (Baayen, 2008), using the packages lme4 (version 1.1-17; Bates et al., 2015), and lmerTest (version 3.0-1; Kuznetsova et al., 2018) implemented in R (version 3.5.1; 2018-07-02; R Core Team, 2018). CVC targets and CV targets were analysed separately. In the analysis of naming latencies, error trials were excluded, and the latencies were log-transformed to meet the distributional assumptions of LME. We initially tested LME models that included the subject random slope on the distractor type factor; however, as the models did not converge, all the models we report here included subject and item random intercepts.
RT
The mean correct naming latencies are shown in Table 1. Error rates were not analysed, as there were too few errors (78 out of 7,200 trials, or 1%).
CVC targets
The data shown in Table 1 indicate the pattern: (Same syllable = Resyllabification = Coda change) < Same CV < Same onset = Control. This pattern was confirmed by the statistical model with Distractor type as the fixed factor (referenced to the various Distractor conditions) and words (144) and subjects (30) as crossed random effect factors: logRT ~ Distractortype + (1| word) + (1| subj). Relative to the Control condition, all conditions were significantly faster (all |t| > 2.815, all p < .001) except the Same onset condition, t = 0.488, B = 0.0006257, SE = 0.001282, p = .62, that is, there was no onset effect, but a significant syllable effect. Relative to the Same syllable condition, the Resyllabification and Coda change conditions did not differ significantly (both |t| < 1, p > .649), but the Same CV condition was significantly slower, t = 5.976, B = 0.007649, SE = 0.00128, p < .001.
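For readers relating the reported statistics to one another, the t value of a fixed-effect coefficient in this kind of model summary is the estimate divided by its standard error. The short sketch below recomputes the two t values reported for the CVC targets from the reported B and SE; the helper name `wald_t` is hypothetical, and nothing here refits the model.

```python
def wald_t(estimate, std_error):
    """t statistic of a fixed-effect coefficient: estimate / standard error."""
    return estimate / std_error


# Same onset vs. Control (values as reported in the text)
print(round(wald_t(0.0006257, 0.001282), 3))  # 0.488
# Same CV vs. Same syllable (values as reported in the text)
print(round(wald_t(0.007649, 0.00128), 3))    # 5.976
```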
CV targets
For these targets, the data shown in Table 1 indicate the pattern: Same syllable < Same CV = Same onset = Control. As for the CVC targets, this pattern was confirmed by the statistical model with Distractor type as the fixed factor (referenced to the various Distractor conditions) and words (96) and subjects (30) as crossed random effect factors: logRT ~ Distractortype + (1| word) + (1| subj). Relative to the Control condition, all conditions (except the Same onset condition) were significantly faster (all |t| > 2.132, all p < .05). The Same onset condition was marginally faster than the Control condition, t = −1.96, B = −0.02736, SE = 0.01396, p = .053. Relative to the Same onset condition, the Same syllable condition was significantly faster, t = −3.785, B = −0.052797, SE = 0.014948, p < .001, but the Same CV condition was not, t = −0.173, B = −0.002411, SE = 0.013959, p = .863. This last pattern indicates that the benefit due to the syllable overlap beyond the onset overlap was not a segmental overlap effect, as both the Same syllable and Same CV distractors shared the first two segments with the target picture name.
Experiment 2—non-morphemic initial syllable targets
The design of Experiment 2 was identical to Experiment 1, except that the first syllable of picture targets was not a morpheme here. This is akin to disyllabic words in English such as “rabbit” or “campus” (where “rab” and “cam” are not morphemes). All stimuli used in Experiment 2 can be found in Supplementary Material B. A different group of participants from Experiment 1, all students of Konkuk University, took part in this experiment for a monetary reward (N = 30, 20 males, age = 21 ± 2 years).
RT
The mean correct naming latencies are shown in Table 2. Error rates were not analysed, as there were too few errors (77 out of 7,200 trials, ~1%).
Table 2. Mean naming latencies (in ms) in Experiment 2 (note: the initial syllable is not a morpheme).
RT: reaction time; CVC: consonant-vowel-consonant; CV: consonant-vowel
For the CVC targets, the Whole syllable effect is confounded with the number of segments shared with the target as the Same syllable distractor but not the Same CV distractor shared the coda segment of the initial syllable with the target. For the CV targets, the Whole syllable effect represents a pure benefit of sharing the whole syllable, as the Same syllable distractor and the Same CV distractor contained the same initial CV segments.
CVC targets
The data shown in Table 2 are similar to those of Experiment 1 and indicate the pattern: (Same syllable = Resyllabification = Coda change) < Same CV < Same onset = Control. This pattern was confirmed by the statistical model with Distractor type as the fixed factor (referenced to the various Distractor conditions) and words (144) and subjects (30) as crossed random effect factors: logRT ~ Distractortype + (1| word) + (1| subj). Relative to the Control condition, all conditions were significantly faster (all |t| > 4.72, all p < .001) except the Same onset condition, t = −1.585, B = −0.02354, SE = 0.01485, p = .115, that is, there was no onset effect, but a significant syllable effect was found. Relative to the Same syllable condition, the Resyllabification and Coda change conditions did not differ significantly (both |t| < 1, p > .73), but the Same CV condition was significantly slower, t = 2.93, B = 0.043382, SE = 0.014804, p < .01. Finally, referenced to the Same onset condition, the Same CV condition was significantly faster, t = −3.128, B = −0.04642, SE = 0.01484, p < .001.
CV targets
For the CV targets, the data shown in Table 2 are similar to the pattern observed in Experiment 1: Same syllable < Same CV = Same onset = Control. This pattern was confirmed by the statistical model with Distractor type as the fixed factor (referenced to the various Distractor conditions) and words (96) and subjects (30) as crossed random effect factors: logRT ~ Distractortype + (1| word) + (1| subj). Relative to the Control condition, the Same onset condition did not differ significantly, t = −0.223, B = −0.00238, SE = 0.01067, p = .824. Relative to the Same onset condition, the Same syllable condition was significantly faster, t = −7.086, B = −0.07562, SE = 0.01067, p < .001, but the Same CV condition was not, t = −1.356, B = −0.01447, SE = 0.01067, p = .178. As in Experiment 1, this last pattern indicates that the benefit due to the syllable overlap beyond the onset overlap was not a segmental overlap effect, because both the Same syllable and Same CV distractors shared the first two segments with the target picture name.
Bayes factor analysis
Given the unsettled empirical status of the onset effect in the Korean speech production literature, we calculated the Bayes factor for the effect in our Experiments 1 and 2 to quantify the strength of evidence for the effect. A Bayes factor is an odds ratio, with 1 indicating equal evidence for the two mutually exclusive hypotheses, generally odds of 3 or greater indicating “some evidence,” greater than 10 indicating “strong evidence,” and odds greater than 30 indicating “very strong evidence” (Dienes, 2014; Jeffreys, 1961) for one hypothesis over the other. For each experiment, the data for the CVC targets and CV targets were combined, and the Bayes factor against the Onset effect (Same onset vs. the Control) was calculated using the BayesFactor package (version 0.9.12-4.1; Morey & Rouder, 2018) with the model that contained only the subject and distractor as crossed random intercepts (i.e., the null model) as the numerator. For both Experiments 1 and 2, the Bayes factor was 7 in favour of the null model, indicating moderately strong evidence against the presence of an onset effect.
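The verbal categories quoted above can be written as a small lookup, which also makes explicit that a Bayes factor for the null is the reciprocal of the Bayes factor for the alternative. The function names and the "inconclusive" label for odds below 3 are assumptions added for illustration; the thresholds are the ones stated in the text.

```python
def evidence_label(bayes_factor):
    """Verbal label for odds favouring one hypothesis (thresholds from the text)."""
    if bayes_factor > 30:
        return "very strong evidence"
    if bayes_factor > 10:
        return "strong evidence"
    if bayes_factor >= 3:
        return "some evidence"
    return "inconclusive"  # assumed label; the text does not name this band


def invert(bayes_factor):
    """Convert odds for one hypothesis into odds for the competing one."""
    return 1.0 / bayes_factor


bf_null = 7  # reported odds in favour of the null model
print(evidence_label(bf_null))     # some evidence
print(round(invert(bf_null), 3))   # 0.143, the odds in favour of an onset effect
```

A value of 7 falls between the thresholds of 3 and 10, which is the band the text describes as moderately strong evidence for the null.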
General discussion
The present experiments used a picture–word interference task to investigate which phonological unit—segment (phoneme) or syllable—is initially used in Korean speech production. The pictures had disyllabic names (e.g., /kuk.su/, “noodle,” /ca.sʌk/ “magnet”), and the distractors were disyllabic non-words. The data patterns were consistent across two experiments and showed that relative to the unrelated control condition, (1) syllable overlap produced a large (about 50 ms or greater) naming benefit; (2) the benefit for onset overlap was small (generally less than 10 ms) and not statistically significant, with the Bayes factor indicating moderately strong evidence for the null effect; (3) for the initial CVC syllable targets (e.g., /kuk.su/), the Resyllabification and Coda change distractors which share the underlying syllable with the Same syllable distractor produced a naming benefit that was indistinguishable from the Same syllable distractor, and the benefits were significantly greater than those produced by the Same CV distractor; and (4) for the initial CV syllable targets, (e.g., /
It is relevant to mention in this context that clear evidence for the role of the syllable, distinct from segmental overlap, is absent in speech production studies of European languages. Previous experiments used the masked-priming read-aloud or picture-naming task, and the results have been mixed. Ferrand et al. (1996) reported the initial evidence of syllable priming in French, that is, target words were read aloud faster when preceded by a prime that shared the syllable with the target: BALCON (“balcony,” where the syllables are BAL and CON) was named faster when preceded by the prime bal than by ba, and BALADE (“ballad,” where the syllables are BA and LADE) was named faster when preceded by ba than by bal. However, later studies (e.g., Schiller, 1998, 1999, with Dutch and English, respectively; Brand et al., 2003 with French, using the exact same stimuli used by Ferrand et al., 1996) 5 failed to replicate this pattern, finding only evidence for segmental overlap (the prime bal facilitated the naming of BAL.CON and BA.LADE more so than the prime ba). The fact that we found a clear role for the syllable here, independent of segmental overlap, may be due to task differences (e.g., previous studies used masked priming or PWI) or possibly the nature of Korean orthography (i.e., Hangul). Recall that Hangul explicitly represents the syllable by grouping the constituent letters into a square block; this contrasts with multisyllabic words written in the Roman alphabet in which the syllable boundary is not physically marked (i.e., music is not written as mu-sic). For example, consistent with an important role of the syllabic spatial organisation of Hangul characters, C. H. Lee and Taft (2009) have reported that the cross-language difference in the transposed letter similarity effect (i.e., the high level of confusion with the original base word which is generated by non-words in which adjacent letters are transposed, e.g., NAKPIN) in English and Korean was eliminated when English stimuli were presented in a “hangul-like format” (i.e., in which syllables of disyllabic English stimuli and their constituents were presented in separate vertical columns, resembling Hangul). In any event, it would be of interest to examine in future endeavours whether the current findings of clear syllable effects in the PWI task can be replicated in French, with the distractors written in the Roman alphabet.
In the present PWI task, the Same syllable, Resyllabification and Coda change conditions were indistinguishable. This replicated the pattern observed in J. I. Han and Verdonschot’s (2019) masked-priming read-aloud experiment; thus, the absence of difference observed in their experiment was not because “participants may not have had sufficient time (due to the limited time the masked prime is available) to process the primes up to the point that re-syllabification or nasalisation could have been carried out” (J. I. Han & Verdonschot, 2019, p. 901). It is also important to note that the pattern of results was found regardless of whether the initial syllable of the picture targets was a free morpheme (Experiment 1) or a non-morphemic syllable (Experiment 2). This means that the result cannot be explained in terms of morphemic overlap.
One potential explanation may be the nature of Korean orthography (i.e., Hangul). As noted earlier, Hangul explicitly represents the syllable by grouping the constituent letters into a square block, and this makes the (underlying) syllable salient. In addition, it may be that written distractors themselves do not undergo phonological encoding, as the distractor itself would not have to be uttered, and hence it would not have been processed down to the level of phonetic planning. This means that for a distractor such as 국눈 /kuk.nun/ [kuŋ.nun], the first syllable 국 might have been converted through grapheme-to-phoneme conversion to /kuk/, and resyllabifications or coda changes such as /kuŋ/ are never realised. Therefore, the equal effects elicited by our Same syllable, Resyllabification, and Coda change conditions may have arisen because in all these cases the segments /kuk/ became available first. We should point out that our written distractors were all non-words, and it is currently unclear whether real words, such as 국민 /kuk.min/ [kuŋ.min] “people,” would have their coda change stored or not. Although we believe this to be unlikely (as resyllabification and coda change are phonological rules in Korean that apply to non-words as well as words), it is possible that with the use of real-word distractors the Resyllabification and Coda change conditions may show different effects compared with the Same syllable distractor. This needs to be investigated in future studies.
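The way Hangul packs the constituent letters into one syllable block can be made concrete with the standard Unicode arithmetic for precomposed Hangul syllables. The Python sketch below is illustrative only: the function and variable names are ours, and it says nothing about how readers actually process the script. It simply shows that the letters of a block are recoverable from the block itself and that the written block 국 is identical across the examples used here.

```python
# Compatibility jamo, in Unicode order, for the three slots of a syllable block.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

S_BASE = 0xAC00  # code point of the first precomposed syllable block


def decompose(block):
    """Split one precomposed Hangul block into (initial, vowel, final) letters."""
    index = ord(block) - S_BASE
    if not 0 <= index < len(LEADS) * len(VOWELS) * len(TAILS):
        raise ValueError(f"{block!r} is not a precomposed Hangul syllable")
    lead, rest = divmod(index, len(VOWELS) * len(TAILS))
    vowel, tail = divmod(rest, len(TAILS))
    return LEADS[lead], VOWELS[vowel], TAILS[tail]


# Target 국수 and two items that share its first written block.
for word in ["국수", "국눈", "국민"]:
    print(word, [decompose(ch) for ch in word])
```

Each block decomposes into an initial consonant, a vowel, and an optional final consonant, so the written form of 국 is the same regardless of how its coda would be pronounced before the following syllable.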
Incidentally, for CVC targets, half of all distractors shared the first grapheme with the target (e.g., 국 appears in three out of six distractor conditions as the first character for the target 국수 /kuk.su/ “noodle”). Therefore, one might consider whether effects for CVC targets (but not for CV targets, where this was the case in only 25%) are due to participants strategically using this relationship. However, a widely held notion is that effects in the Stroop/PWI task are unintentional and uncontrollable (i.e., effects appear even when they are detrimental to performance, e.g., Moors & De Houwer, 2006). In addition, J. I. Han and Verdonschot (2019) obtained the same result (i.e., same syllable = resyllabification = coda change) using masked priming in which participants did not consciously see the primes (and would not have been able to use a strategy), so we believe that a strategic effect is unlikely to be at play.
The absence of a statistically significant onset effect in the present picture–word interference task contrasts with the finding of a statistically significant onset effect by J. I. Han and Verdonschot (2019) using the phonological Stroop task. As the analysis method was slightly different in the two studies (Han and Verdonschot analysed the raw RTs and trimmed RTs slower than 700 ms), we reanalysed their data using the present analysis method, with log-transformed RT as the dependent variable. This did not change the outcome and again yielded a significant onset effect (t = −4.479, B = −0.033420, SE = 0.007461, p < .001). Moreover, the Bayes factor for Han and Verdonschot’s onset effect was 23, indicating strong support for the presence of the onset effect in their study. Thus, there is a true discrepancy between J. I. Han and Verdonschot’s (2019) Experiment 2 and the present experiments. This is all the more surprising given that the two studies used similar tasks, namely the phonological Stroop task and the PWI task, both of which involve to-be-named targets (i.e., colours and pictures) whose names must be retrieved conceptually, and that both studies used two-character Hangul non-word distractors.
A possible interpretation of the “discrepancy,” however, is that it is not genuine; in fact, there does not seem to be even a quantitative difference between the two studies. The onset effect observed here ranged from −9 to 15 ms (−9 ms for the CVC targets and 15 ms for the CV targets in Experiment 1; 14 ms for the CVC targets and −2 ms for the CV targets in Experiment 2), while the onset effect found in J. I. Han and Verdonschot’s (2019) phonological Stroop task using CV targets was 14 ms. Across the studies we reviewed in the “Introduction” section, the onset effect was also variable. Of the studies using the masked-priming read-aloud task, Kim and Davis (2002) found a non-significant 7 ms effect; Witzel et al. (2013) found a statistically significant 9 ms effect; and J. I. Han and Verdonschot (2019, Experiment 1) found a statistically significant 19 ms effect. J. I. Han and Choi (2016) used the implicit priming task and found a statistically non-significant 11 ms onset overlap effect, and they noted that the effect varied across blocks of trials. A reasonable summary of all the studies taken together would be that the onset effect in Korean speech production seems to be small and not robust.
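To make the size of these effects easy to compare, the values cited in this paragraph can be tabulated and summarised. The Python sketch below is purely descriptive: it assumes that positive values denote a naming benefit, the dictionary labels are ours, and the unweighted mean ignores differences in task, sample size, and variability, so it is not a meta-analytic estimate.

```python
from statistics import mean

# Onset overlap effects in ms as cited in the text (assumed: positive = benefit).
onset_effects_ms = {
    "Present Exp. 1, CVC targets": -9,
    "Present Exp. 1, CV targets": 15,
    "Present Exp. 2, CVC targets": 14,
    "Present Exp. 2, CV targets": -2,
    "Han & Verdonschot (2019), Stroop": 14,
    "Kim & Davis (2002)": 7,
    "Witzel et al. (2013)": 9,
    "Han & Verdonschot (2019), Exp. 1": 19,
    "Han & Choi (2016)": 11,
}

values = list(onset_effects_ms.values())
smallest, largest, average = min(values), max(values), mean(values)
print(f"range: {smallest} to {largest} ms; unweighted mean: {average:.1f} ms")
```

All of the cited values lie within a band of roughly 30 ms around zero, which is small relative to the syllable overlap benefit of about 50 ms or more reported for the present experiments.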
One last, and important, avenue we would like to pursue is to theorise on how phonological encoding takes place in Korean.6 Currently, comprehensive form encoding networks have been laid out for English/Dutch, Chinese, and Japanese (see Roelofs, 2015) but do not exist for Korean yet. In all these languages, the phonological encoding process consists of two parts (i.e., the “frame” and the “units”). Both the frame and units differ in the various languages, with the frame containing stress (English/Dutch), tonal (Chinese), or pitch accent (Japanese) information with the proximate unit being the phoneme (English/Dutch), syllable (Chinese) or mora (Japanese).
However, in contrast to these other languages, we would like to suggest that the Korean form network likely does not require any “frames” to be activated, as Korean does not have stress, tone, or pitch accent. Therefore, no such information needs to be reconciled with any phonological unit as a first step. Note that the absence/presence of frames in Korean phonological encoding has not been explicitly investigated yet (as far as we know), and additional experimentation is needed to verify this claim. Second, given our current findings, we propose the activation of proximate syllables followed by phonemic segments which, unlike Chinese, do not have their final positions assigned yet due to the presence of phonological processes such as resyllabification. Finally, syllabified motor programmes are created, which adhere to Korean phonological rules. Figure 1 illustrates the prospective form network for a non-resyllabified word such as 학생 /hak.sæŋ/ [hak.s’æŋ] “student” and a resyllabified word such as 학우 /hak.u/ [ha.ku] “classmate/buddy.” Note that /hak.u/ (a Sino-Korean word) does not initially activate the syllables /ha/ (and /ku/) as a proximate unit as /hak/ is a separately stored morpheme meaning “learning.” Given the available experimental evidence up to this point, we conclude that the initial stage of Korean phonological encoding reserves an important role for the syllable. It seems that the initial unit used in phonological encoding in Korean is different from Germanic languages such as English and Dutch, but also from Japanese, and potential similarities and differences to Chinese need to be investigated in future research.
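The step from underlying syllables to syllabified output can be illustrated with a toy rule system. The Python sketch below is a deliberately simplified illustration under our own assumptions, not an implementation of the proposed network: it covers only resyllabification before a vowel-initial syllable and nasalisation of an obstruent coda before a nasal onset, and it omits other processes such as the tensification seen in [hak.s’æŋ]. The function name and the (onset, vowel, coda) representation are hypothetical.

```python
# Obstruent codas and the nasals they change to before a nasal onset.
NASALISED = {"k": "ŋ", "t": "n", "p": "m"}
NASALS = {"n", "m"}


def surface_form(syllables):
    """Map underlying (onset, vowel, coda) syllables to a surface string."""
    out = [list(s) for s in syllables]
    for cur, nxt in zip(out, out[1:]):
        onset, coda = nxt[0], cur[2]
        if coda and not onset:
            # Resyllabification: the coda becomes the onset of a vowel-initial syllable.
            nxt[0], cur[2] = coda, ""
        elif coda in NASALISED and onset in NASALS:
            # Coda change: an obstruent coda nasalises before a nasal onset.
            cur[2] = NASALISED[coda]
    return ".".join("".join(s) for s in out)


print(surface_form([("h", "a", "k"), ("", "u", "")]))    # underlying /hak.u/
print(surface_form([("k", "u", "k"), ("n", "u", "n")]))  # underlying /kuk.nun/
```

In this sketch the underlying syllable /hak/ or /kuk/ is the starting point in every case, and the surface syllabification is derived only at the end, which mirrors the ordering proposed above.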

Figure 1. Proposed form encoding network for Korean.
Supplemental Material
Supplemental material, QJE-STD-19-438.R2-Supplementary_Material for The proximate unit in Korean speech production: Phoneme or syllable? by Rinus G Verdonschot, Jeong-Im Han and Sachiko Kinoshita in Quarterly Journal of Experimental Psychology
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: R.G.V.’s research is sponsored by a Grant-in-Aid (C) from the Japanese Society for the Promotion of Science (17K02748).
Notes
References
