Abstract
We investigated the influence of grapheme familiarity and native language grapheme–phoneme correspondences during second language lexical learning. Native English speakers learned Russian-like words via auditory presentations containing only familiar first language phones, pictured meanings, and exposure to either Cyrillic orthographic forms (Orthography condition) or the sequence <XXX> (No Orthography condition). Orthography participants saw three types of written forms: familiar-congruent (e.g., <KOM>-[kom]), familiar-incongruent (e.g., <PAT>-[rɑt]), and unfamiliar (e.g., <ФИЛ>-[fil]). At test, participants determined whether pictures and words matched according to what they saw during word learning. All participants performed near ceiling in all stimulus conditions, except for Orthography participants on words containing incongruent grapheme–phoneme correspondences. These results suggest that first language grapheme–phoneme correspondences can cause interference during second language phono-lexical acquisition. In addition, these results suggest that orthographic input effects are robust enough to interfere even when the input does not contain novel phones.
1 Introduction
Adult learners of a second language (L2) often have difficulty perceiving novel L2 phonological contrasts, limiting their ability to establish contrastive lexical representations of L2 words (e.g., Cutler, Weber, & Otake, 2006; Escudero & Wanrooij, 2010; Ota, Hartsuiker, & Haywood, 2009). However, learners are able to make use of available input to facilitate word learning; specifically, learners are able to exploit the availability of orthographic input (OI) to learn the phonological content of new words (Escudero, Hayes-Harb, & Mitterer, 2008; Showalter & Hayes-Harb, 2013). For example, Escudero et al. (2008) found that native Dutch speakers could establish contrastive lexical representations for the difficult-to-perceive English /æ/-/ε/ (e.g., pat-pet) vowel contrast when provided written forms that systematically manifest the distinction. Showalter and Hayes-Harb (2013) demonstrated that even unfamiliar OI can be exploited in this way. Roman segmental information with diacritic tone marks presented during pseudo-Mandarin word learning aided establishment of lexical representations compared with participants not exposed to tone marks (e.g., <gĭ> versus <gi>). In contrast with these findings, others have demonstrated either a hindrance or no effect of OI. For example, English learners of Mandarin in Bassetti (2006) were negatively affected by transfer of English orthography conventions during phoneme counting and phoneme segmentation tasks in Mandarin; they failed to count or produce vowels that were not represented in the Pinyin orthographic representations (e.g., /e/ in /guei/-<gui>). In Simon, Chambless, and Kickhöfel Alves (2010), OI had no apparent effect on participants’ ability to make inferences about the phonological forms of words in French (e.g., <dûge>-/dyʒ/, <douge>-/duʒ/, and <dige>-/diʒ/). 
As a result of these and similar findings reported in the literature, it is well established that OI can affect participants’ ability to make inferences about the phonological content of new L2 words. The present study is an attempt to better understand the factors that influence the impact of OI in L2 word learning.
2 Background
Recent research on the effects of OI has identified a number of variables that, in certain conditions, shape the inferences that learners draw about words’ forms, including phonological contrast difficulty (e.g., Escudero, 2015), instruction (e.g., Jackson, 2016), and word familiarity (Veivo & Järvikivi, 2013). In the present study, we focus in particular on two such variables: grapheme familiarity and congruence. Grapheme familiarity refers to the presence or absence of an L2 grapheme in the writing system of an individual’s native or first language (L1). For instance, an L1 English-L2 German learner will be familiar with the grapheme <t>, which occurs in both languages. The same learner, on the other hand, will need to learn the unfamiliar grapheme <ß> or the umlauted <ä>, which occur only in L2 German written forms. Congruence is a property of grapheme–phoneme correspondences (GPCs). A congruent GPC is one where the grapheme-to-phoneme mapping is the same in the L1 and the L2 (e.g., <n>-[n] for L1 English-L2 German); in this case, L2 learners do not need to learn new grapheme–phoneme mappings. Incongruent GPCs occur when a grapheme and phoneme are not mapped in the same way in the two languages (e.g., <w> maps to /w/ in English but to /v/ in German); in this case, L2 learners must acquire new mappings between graphemes and phonemes. Literate speakers of a language with an alphabetic orthography appear to transfer L1 GPCs to L2 word learning, which can lead to non-target-like knowledge of L2 words’ forms (see e.g., Bassetti, 2006). Few studies have directly investigated the effects of grapheme familiarity and congruence simultaneously, as well as their interaction with L1–L2 knowledge; however, each of the variables has been investigated separately.
In a study of the impact of grapheme familiarity, Showalter and Hayes-Harb (2015) investigated the effect of an entirely unfamiliar script on the acquisition of the phonological forms of L2 words. Native English speakers were exposed to a difficult-to-perceive contrast (velar-uvular /k-q/; see Al Mahmoud, 2013 for English speaker difficulty with this contrast) in L2 pseudo-Arabic. During a word-learning phase, participants saw either Arabic script or a meaningless sequence of letters (i.e., <ط ط ط ط>). Participants did not benefit from the Arabic script. Even in subsequent tasks designed to facilitate word learning and mediate effects of grapheme unfamiliarity (e.g., transliteration of Arabic script into the Roman alphabet), no effect of OI was observed. Showalter and Hayes-Harb concluded that any effect of OI may have been overshadowed by the difficult-to-perceive contrast, by the entirely unfamiliar script, or by a combination of these factors.
Other research has focused on effects of congruence of GPCs on L2 phonological acquisition in both perception (e.g., Escudero, 2015; Escudero, Simon, & Mulak, 2014; Hayes-Harb, Nicol, & Barker, 2010) and production (e.g., Rafat, 2016; Vokic, 2011). To investigate how L1 GPC knowledge may affect the inferences made about L2 words’ phonological forms, Hayes-Harb et al. (2010) presented native English speakers with spelled forms that contained only familiar letters. While the alphabet was familiar, some spelled forms contained incongruent GPCs, either “wrong letter” spellings (e.g., [fɑʃɑ] represented as <faza>) or “extra letter” spellings (e.g., [toɡεɡ] represented as <thogeg>), based on English phonological sequences. In support of their hypothesis, the authors observed that participants exposed to incongruent spellings accepted mispronunciations corresponding to incongruent spelled forms presented during word learning (e.g., [fɑzɑ]/[θoɡεɡ]), and performed less accurately at test than participants exposed to congruent forms. These results suggest that participants may have difficulty inhibiting knowledge of L1 GPCs when presented with incongruent GPCs in an L2.
Two recent studies manipulated both grapheme familiarity and congruence. Mathieu (2016) explored effects of varying grapheme familiarity in novel L2 lexical items. Native English speakers learned the Arabic /ħ-χ/ contrast via auditory forms, pictured meanings, and OI in one of four word-learning conditions: No Orthography (<XXX>), Arabic script (e.g., <ﺏﻭﺧ>), Cyrillic script (e.g., <xұб>), or Hybrid script (e.g., <жub>; first letter Cyrillic script and remainder Roman alphabet). It was predicted that as the graphemes became more unfamiliar, there would be more difficulty in acquiring the non-native contrast (Arabic as most difficult, followed by Cyrillic, then Hybrid). Unexpectedly, Mathieu observed no significant differences among the different script conditions. However, participants exposed to OI with incongruent GPCs, namely in the Cyrillic and Hybrid conditions, performed worse on the incongruent GPC items at test. That is, knowledge of L1 graphemes <h> and <x> and their corresponding mapping with the phones /h/ and /z/, /gz/, or /ks/, respectively, appeared to interfere with participants’ ability to accurately learn the /ħ-χ/ contrast. As with Showalter and Hayes-Harb (2015), it is unclear whether the results should be attributed to unfamiliarity of the graphemes and phonemes, or the fact that the contrast was too difficult for the English speakers to perceive and acquire.
Because the perceptual difficulty of a novel phonological contrast may obscure the contribution of script familiarity or congruence, Hayes-Harb and Cheng’s (2016) materials did not involve novel phonological contrasts; they taught native English speakers Mandarin with exposure to auditory forms and either Pinyin (Romanized Mandarin) or Zhuyin. Zhuyin is entirely unfamiliar to native English speakers, while Pinyin is written with familiar (Romanized) segments. Pinyin stimuli were additionally divided according to whether the GPC was congruent with English (e.g., <nai>-[nai]) or incongruent (e.g., <zai>-[tsai]; English [zai]). Auditory forms contained familiar and unfamiliar (e.g., /ɕ/) phones, but were not dependent on unfamiliar contrasts (i.e., as in previous studies with /k/-/q/; e.g., there were no /kai/-/qai/ pairs). Participants completed a word-learning phase (with either Pinyin or Zhuyin OI, auditory forms, and pictured meanings) and a criterion test assessing knowledge of the newly learned words. A final test assessed participants’ ability to remember the phonological forms, presented as matched (i.e., pictured zai and [tsai]) or auditory foils/mismatched (i.e., pictured zai and [zai]). Zhuyin participants required more word-learning cycles to reach criterion, but Pinyin participants had less accurate performance overall as a result of incongruent forms (e.g., correct <xiu>-[ɕiou], incongruent hear [ziou]). Therefore, while script unfamiliarity created an initial delay in word learning, congruence caused more difficulty at test.
Building on Mathieu (2016) and Hayes-Harb and Cheng (2016), in the present study we investigate the interaction of grapheme familiarity and congruence during the acquisition of a pseudo-Russian lexicon by naïve English learners. The Russian Cyrillic alphabet provides the opportunity for a more ecologically valid study in that the combination of native English speakers and Russian/Cyrillic allows for grapheme familiarity and congruence effects to be observed within a single writing system. In addition, we control for potential phonological confounds; all auditory forms contain only familiar L1 phones. The elimination of difficult-to-perceive contrasts means any observed performance differences should reflect OI effects. The present study was therefore designed to address the question: How do grapheme familiarity and congruence interact in the context of native English speakers learning Russian Cyrillic words?
3 Method
3.1 Participants
Participants were native English-speaking undergraduate or graduate students from the University of Utah, either paid or awarded extra credit for volunteering. Participants had no prior formal experience (i.e., instruction) with the Cyrillic alphabet, did not report an L2 that was Cyrillic based, were not heritage speakers of a Cyrillic-based language, and did not have a history of any speech, language, hearing, or motor/neurological disorders. Participants were randomly assigned to one of two word-learning conditions: No Orthography (n = 15) or Orthography (n = 15). The No Orthography condition participants had an average age of 21.4 years (range 18–31) and consisted of six females and nine males. The Orthography condition participants had an average age of 23.9 years (range 18–43) and consisted of nine females and six males.
3.2 Stimuli
As previously noted, the use of the Russian Cyrillic script and native speakers of English allows for the simultaneous manipulation of both familiarity and congruence. The Cyrillic alphabet is, like English, alphabetic and read from left to right. English is an opaque language with a deep orthography (for more information on depth see Frost & Katz, 1989), while Russian is relatively transparent or shallow. The stimuli included both Russian words and nonwords; this was a consequence of available graphemes, L1-L2 GPCs, and Russian phonotactic restrictions. Stimuli were chosen for one of three conditions (n = 4 words each): Unfamiliar grapheme (Unfam), Familiar grapheme-Congruent GPC (FamCong), and Familiar grapheme-Incongruent GPC (FamIncong). Unfamiliar stimuli included unfamiliar Cyrillic graphemes representing familiar phones (e.g., <Ф>-[f]). Familiar-Congruent stimuli included familiar graphemes, with GPCs being congruent between English and Russian (e.g., <К>-[k]). Familiar-Incongruent stimuli included familiar graphemes and phones, but the GPCs differed from English to Russian (e.g., <B>-English [b]/Russian [v]). A full list of stimuli is in Table 1.
Table 1. Russian stimuli.
Crucially, the words did not contain any unfamiliar phones or unfamiliar phonological contrasts. That is, unlike some previous studies which contained phones that would be noticeably novel to participants (e.g., Arabic consonants for native English speakers), we attempted to control for phone familiarity in order to isolate effects of OI. While not all of the phones were identical to English, they were intentionally selected for their similarity to English. The present study differs in this regard from previous studies (e.g., Hayes-Harb & Cheng, 2016; Mathieu, 2016), and thus minimizes the possibility of confounds between OI effects and perceptual difficulty. All words were of the form CVC, in keeping with the format of stimuli in previous literature. The monosyllabicity of the stimuli and the small set size (n = 12) were chosen based on what is known about participants’ abilities to learn words within an experimental hour. For the FamCong and FamIncong stimuli, the first consonant/letter was manipulated for familiarity and congruence and all other segments/letters in the words were both familiar and congruent. In the Unfam stimuli all letters were unfamiliar. Each word was associated with a “correct” auditory form and a “foil” form, with the foil forms reflecting L1 GPCs for FamIncong stimuli (e.g., <P>-/r/, English /p/) and differing in at least two features of articulation for Unfam and FamCong stimuli (e.g., <Д>-/d/, foil /k/ differing in voicing and place of articulation) so as not to be auditorily confusable or overlap with other words and/or GPC mappings.
Each of the 12 words was randomly associated with a real-object pictured meaning (obtained from the Bank of Standardized Stimuli; Brodeur, Dionne-Dostie, Montreuil, & Lepage, 2010); none of the assigned meanings corresponded to the words’ actual meanings in Russian.
The auditory words were produced by a 32-year-old female native Russian speaker from Novosibirsk, Russia. She was a graduate student at the University of Utah and had been residing in the United States for 4 months at the time of recording. The speaker was recorded producing each word three times and the second token of each word was selected for presentation in the study.
It is known that the auditory contrast /b/-/v/ is relatively confusable (see e.g., Ota et al., 2009; Experiment 1). To ensure that results were attributable to OI effects and not to any auditory confusability of the stimuli, the FamIncong test-foil pair /b/-/v/ was subjected to a categorization task (offline forced-choice identification). A separate group of native English speakers (n = 5) listened to all /b/ and /v/ stimuli produced by the Russian speaker and transcribed what they heard. Responses indicated that all items were identified as the intended forms (i.e., /b/ tokens perceived as /b/ and /v/ tokens perceived as /v/), with the exception of one /v/ production perceived as /b/ by one listener. This production was discarded and replaced by the third token produced by the speaker.
3.3 Procedure
The study employed the artificial lexicon design used in a number of previous studies (e.g., Hayes-Harb et al., 2010; Showalter & Hayes-Harb, 2013, 2015), implemented using DMDX software (Forster & Forster, 2003). The experiment involved three phases: word learning, a criterion test, and a final test. All participants sat in a sound-attenuated booth facing a computer screen and a keyboard, and heard auditory stimuli over headphones. During word learning, participants heard auditory forms of the words (e.g., [nom]) and saw their pictured meanings (e.g., a baseball). They also saw OI as either written representations in Cyrillic (Orthography Condition; e.g., <HOM>) or a meaningless sequence of letters in an effort to provide an equivalent amount of visual input (No Orthography Condition; i.e., <XXX>). All visual and auditory input was presented simultaneously. An example trial in each word-learning condition is provided in Table 2. Each of the 12 words was presented eight times, randomized within four blocks for each participant (each word two times per block; n = 96 presentations). Participants did not need to respond during this phase, but were instructed to learn the words and their meanings.
Table 2. Example word learning phase trials.
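The presentation schedule described above (12 words, each shown twice per block across four blocks, for 96 presentations) can be sketched in a few lines of Python. This is only an illustration of the design, not the DMDX script used in the study; the function name `build_learning_trials`, the placeholder word labels, and the fixed random seed are all assumptions introduced here.

```python
import random

def build_learning_trials(words, n_blocks=4, reps_per_block=2, seed=None):
    """Return a list of blocks, each a shuffled list of word presentations.

    Every word appears `reps_per_block` times in every block, and the
    order is randomized independently within each block.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = [w for w in words for _ in range(reps_per_block)]
        rng.shuffle(block)
        blocks.append(block)
    return blocks

# Hypothetical placeholder labels standing in for the 12 stimuli.
words = [f"word{i:02d}" for i in range(1, 13)]
blocks = build_learning_trials(words, seed=1)
total = sum(len(block) for block in blocks)
print(total)
```

Passing a different seed per participant would give each participant an independent randomization, which matches the per-participant randomization described in the procedure.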
Immediately following the word-learning phase participants completed the criterion test, in which they determined whether pictures and auditory forms matched. No OI was presented to either group during this phase. There were 12 matched items (e.g., baseball-[nom]) and 12 mismatched items (e.g., baseball-[zib]). Mismatched words did not involve incongruent GPCs, but were paired auditory forms and picture meanings from different conditions (e.g., see FamIncong picture glasses-[sot], hear FamCong [tam]; example in Table 3). Participants had three seconds to respond before the program counted the response as incorrect and proceeded to the next trial (participants were not provided feedback). To ensure that participants had generally learned the words’ phonological forms, a 90% criterion cutoff was required to pass to the final test phase. Participants completed as many word-learning cycles as needed to reach criterion.
Table 3. Example criterion test trials: matched and mismatched.
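The criterion check described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the text does not say whether the 90% cutoff was inclusive, so the sketch treats it as "at least 90% of the 24 trials", and it represents a timed-out trial as `None` (counted as incorrect, as in the procedure). The function name `passes_criterion` and the data layout are hypothetical.

```python
def passes_criterion(responses, cutoff=0.90):
    """Decide whether a participant advances to the final test.

    `responses` is a list of (is_match, answer) pairs, where `is_match`
    is True for a matched picture-word trial and `answer` is the
    participant's yes/no response, or None if the trial timed out.
    Timeouts are scored as incorrect.
    """
    correct = sum(
        1 for is_match, answer in responses
        if answer is not None and answer == is_match
    )
    return correct / len(responses) >= cutoff  # assumption: inclusive cutoff

# 24 trials: 22 correct plus 2 timeouts gives 22/24, which is above 0.90.
example_pass = [(True, True)] * 12 + [(False, False)] * 10 + [(False, None)] * 2
# 21 correct plus 3 errors gives 21/24, which is below 0.90.
example_fail = [(True, True)] * 12 + [(False, False)] * 9 + [(False, True)] * 3
print(passes_criterion(example_pass), passes_criterion(example_fail))
```

With 24 criterion trials, an inclusive 90% cutoff amounts to at least 22 correct responses.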
The test phase was identical to the criterion test phase, except that mismatched items paired picture meanings with foil auditory forms (Table 4).
Table 4. Example final test trials: matched and mismatched.
Once participants finished the test phase, they completed a language background questionnaire. The full experiment lasted approximately 30 minutes.
3.4 Analysis
3.4.1 Learning cycles
Participants in both conditions required between one and three word-learning cycles to advance to the test phase. An independent samples t-test was conducted, revealing no significant difference in mean number of word-learning cycles between the Orthography condition (mean = 1.6) and the No Orthography condition (mean = 1.2), t(28) = −0.167, p = 0.105; r = −0.292.
3.4.2 Mean proportion correct and d-prime
Figures 1 and 2 present mean proportion correct per stimulus condition during the test phase, for the matched and mismatched trials, respectively. Scores for both matched and mismatched items were near ceiling with the exception of FamIncong stimuli. Orthography participants’ proportion correct scores varied as a function of the familiarity and congruence variables (see Figure 2): accuracy was higher on Unfam and FamCong items (100% and 86%, respectively) than on FamIncong items (62%).

Figure 1. Mean proportion correct on matched items by word-learning condition. Error bars represent ±1 standard error.

Figure 2. Mean proportion correct on mismatched items by word-learning condition. Error bars represent ±1 standard error.
The proportion correct data were converted to d-prime (a measure of sensitivity to stimuli differences, factoring out bias), and these data were submitted to a two-factor mixed-design ANOVA with word-learning condition as the between-participants variable (two levels: Orthography and No Orthography) and stimulus condition as the within-participants variable (three levels: Unfam, FamIncong, FamCong). Figure 3 shows mean d-prime scores by stimulus and word-learning conditions. There was a significant main effect of stimulus condition, F(1, 28) = 6.215, p < 0.005; partial η² = 0.182, no significant effect of word-learning condition, F(1, 28) = 3.413, p = 0.075; partial η² = 0.109, and a significant interaction of stimulus condition and word-learning condition, F(1, 28) = 9.446, p < 0.005; partial η² = 0.252.

Figure 3. Mean d-prime of word-learning condition by stimulus condition. Error bars represent ±1 standard error.
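For readers unfamiliar with d-prime, the conversion from response counts can be sketched as below. Several details here are assumptions rather than the procedure actually used: a "hit" is taken to be a "match" response on a matched trial, a "false alarm" is a "match" response on a mismatched trial, and a log-linear correction (adding 0.5 to each count and 1 to each trial total) is applied so that ceiling performance does not yield infinite z-scores. The text does not state which correction, if any, was applied.

```python
from statistics import NormalDist

def d_prime(hits, n_matched, false_alarms, n_mismatched):
    """Compute d' = z(hit rate) - z(false-alarm rate).

    Uses a log-linear correction (an assumption, not taken from the
    study) so that rates of exactly 0 or 1 stay finite.
    """
    hit_rate = (hits + 0.5) / (n_matched + 1)
    fa_rate = (false_alarms + 0.5) / (n_mismatched + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Four matched and four mismatched trials per stimulus condition.
perfect = d_prime(hits=4, n_matched=4, false_alarms=0, n_mismatched=4)
chance = d_prime(hits=2, n_matched=4, false_alarms=2, n_mismatched=4)
print(round(perfect, 3), round(chance, 3))
```

Under this correction, a participant who accepts matched and foil forms equally often has a d-prime of zero, whatever their overall tendency to respond "match"; this is the sense in which d-prime factors out response bias.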
Following up on the significant interaction of stimulus condition and word-learning condition, we investigated the effect of word-learning condition on performance in each stimulus condition. There was no effect of word-learning condition on either Unfam stimuli, F(1, 28) = 1.944, p = 0.174, partial η² = 0.065, or FamCong stimuli, F(1, 28) = 0.759, p = 0.391, partial η² = 0.026. There was, however, an effect of word-learning condition on FamIncong stimuli, F(1, 28) = 10.558, p = 0.003, partial η² = 0.274, with No Orthography participants outperforming Orthography participants.
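The effect sizes reported for these follow-up comparisons can be recovered from the F statistics and their degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies that identity to the three reported F values; the function name `partial_eta_squared` is introduced here for illustration only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# F values as reported for the three follow-up comparisons, each F(1, 28).
reported = {"Unfam": 1.944, "FamCong": 0.759, "FamIncong": 10.558}
effect_sizes = {
    label: round(partial_eta_squared(f, 1, 28), 3)
    for label, f in reported.items()
}
print(effect_sizes)
# {'Unfam': 0.065, 'FamCong': 0.026, 'FamIncong': 0.274}
```

The computed values agree with the reported partial η² for each stimulus condition.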
Further analyzing the effect found with the FamIncong stimulus condition and word-learning conditions, proportion correct scores for each of the four FamIncong words were reviewed individually. Matched trial performance was near ceiling for all participants. Mismatched trial performance for the No Orthography participants was also at 100% for three of the words and 87% for <BAM>. Orthography participants performed least accurately on <PAT> (44%), with more accurate performance on <HOM> and <BAM> (both 63%), and greatest accuracy on <COT> (88%).
While the sequence <CO> does not map to [so] word initially, that <C> can map to [s] in other environments may have resulted in superior performance on this word. Participants may have used their knowledge of this mapping when they saw <CO>. It is not surprising that the other items have less accurate performance given the unlikely or impossible (English) mappings between the graphemes and target phones. The <B>-[v] stimulus was vetted for acoustic confusability and found not to be auditorily confusable. Had Orthography participants performed least accurately on this stimulus as the No Orthography condition participants did, it would be of greater concern. Overall, and as reflected in the statistics, it is evident that the incongruent stimuli caused considerable interference for the Orthography participants.
4 Discussion
In the present study we investigated the interaction of grapheme familiarity and congruence in the acquisition of the phonological forms of new words by native English speakers learning a Russian-like mini lexicon. Recall that the research question was: How do grapheme familiarity and congruence interact in the context of native English speakers learning Russian Cyrillic words? Unlike the entirely unfamiliar Arabic script in Showalter and Hayes-Harb (2015), novel graphemes and their GPCs appear to have been learnable within the experimental session in the present study. Incongruent written forms interfered with participants’ ability to make inferences about the phonological forms of words. This provides evidence that incongruent OI robustly affects learning of L2 words’ phonological forms.
The finding that OI can interfere with learners’ ability to make inferences about words’ phonological forms is consistent with previous studies (i.e., Hayes-Harb et al., 2010; Hayes-Harb & Cheng, 2016; Mathieu, 2016), and contributes to the accumulating evidence in the literature that incongruence poses a substantial challenge to learners. What is noteworthy here is that we observed a robust effect even when the auditory input was entirely familiar to learners (that is, it contained only familiar L1 phones). It is expected that unfamiliar OI or difficult-to-perceive phonological L2 contrasts cause difficulty for a learner; consider, for example, the difficulty that Japanese learners of English have with the /r/-/l/ distinction (see e.g., Cutler et al., 2006; Iverson et al., 2003). In the present study, however, phonological forms of words were misremembered due to interference from OI even in the absence of difficult-to-perceive contrasts. It thus appears that knowledge of L1 GPCs transfers to L2 acquisition and is not readily “unlearned” even when the input contains evidence of new GPCs.
It may be the case that the susceptibility of an L2 learner to interference from incongruent GPCs depends in part on the nature of the native language writing system with respect to orthographic depth, with learners from relatively shallow L1 orthographies experiencing more interference than those from relatively deep orthographies (see e.g., Escudero & Wanrooij, 2010 for discussion). It would be of interest to investigate the effects of orthographic depth (see e.g., Erdener & Burnham, 2005 or Frost & Katz, 1989 for more information) on inferences made about graphemes and phones in an L2. We do not examine this in the present study, but we acknowledge that the difference in orthographic depth between English (deep/opaque) and Russian (relatively shallow/transparent) might affect the manner in which a learner approaches L2 GPCs.
Given that most previous studies of OI in L2 acquisition have been conducted in laboratory settings with naïve subjects and an artificial lexicon paradigm (notable exceptions include Bassetti, 2006 and Young-Scholten, 2002), the effect of OI in actual L2 learners is still relatively understudied. Young-Scholten (2002) found that learners with more exposure to OI during L2 German acquisition did not produce word final devoicing for alternating pairs (e.g., <Tag>-[tɑk] but <Tage>-[tɑgə]) after 11 months of instruction. Written forms present voiced consonants (e.g., <Rad>), interfering with learner knowledge that final obstruents are devoiced (e.g., <Rad>-/rɑt/). Using an out-loud sentence reading task, Comer and Murphy-Lee (2004) found that native English speakers in a Russian language class tended to mispronounce unfamiliar and incongruent graphemes (e.g., <Ц> <B>) at 12 weeks of instruction. Future research should investigate whether OI’s contribution to learning word forms can be moderated by instruction, and which interventions are most effective. The results found here and in the existing laboratory-based literature with respect to OI have been quite robust; however, it may be the case that experimental settings artificially inflate these effects by either drawing undue attention to aspects of OI or trivializing aspects of language that would otherwise be embedded in a communicative context. Thus, studies with participants who are actual L2 learners are necessary for us to better understand the pedagogical implications of this line of research. Further, as noted by Escudero et al. (2014), L2 learners are typically exposed to OI in instructed settings. To the extent that actual L2 learners are found to experience difficulty associated with OI, research investigating the role that instruction may play in moderating the interfering effect of incongruence may be beneficial to the field of L2 pedagogy.
Acknowledgements
I gratefully acknowledge the contributions of Ala Simonchyk, Isabelle Darcy, Rachel Hayes-Harb, Shannon Barrios, and Taylor-Anne Barriuso. I also thank the members of the Sound to Word in Bilingual and Second Language Speech Perception audience, and the audience at the 8th Pronunciation in Second Language Learning and Teaching Conference for their feedback.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
