Abstract
Recent research indicates that knowledge of words’ spellings can influence knowledge of the phonological forms of second language (L2) words when the first and second languages use the same orthographic symbols. It is yet unknown whether learners can make similar use of unfamiliar orthographic symbols. In this study we investigate whether native English speakers use orthographic tone marks to help them associate lexical tone with new L2 words. Native English speakers with no knowledge of Mandarin were assigned to ‘Tone Marks’ or ‘No Tone Marks’ word learning groups. During a word learning phase, they learned to associate Mandarin nonwords varying in lexical tone with orthographic forms (written in pinyin with/without tone marks) and pictured ‘meanings’. In Experiment 1, participants were asked whether a picture associated with, for example, tone 1 matched an auditory form containing tone 2. Tone Marks participants outperformed No Tone Marks participants, suggesting that the availability of unfamiliar orthographic symbols helped them associate lexical tone with the new words. In Experiment 2, the test involved matching an orthographic representation and an auditory word. Tone Marks participants performed above chance, while No Tone Marks participants did not, indicating that Tone Marks participants learned the correspondences between auditory tones and tone marks to some extent. We conclude that the presence of a novel orthographic feature (in this case, tone marks) can support native English speakers’ ability to associate a novel phonological feature (in this case, lexical tone) with newly-learned lexical items.
I Introduction
One of the challenges of learning a second language (L2) is building a lexicon that effectively encodes the language’s phonological contrasts. It is well documented that second language learners exhibit difficulty perceiving and producing some L2 phonological contrasts; e.g. native speakers of Japanese typically exhibit difficulty distinguishing English words like read and lead due to difficulty with the English /r/–/l/ contrast (Cutler et al., 2006; Iverson et al., 2003; MacKain et al., 1981; Ota et al., 2009; Ueda et al., 2007). Several studies have, however, demonstrated that auditory perception training can result in improved performance with respect to novel L2 contrasts (e.g. Bradlow et al., 1997; Hazan et al., 2005; Wang et al., 1999). In addition, recent research has provided evidence that knowledge of the orthographic forms of words can influence learners’ memory for the phonological forms of second language words (Bassetti, 2006; Escudero and Wanrooij, 2010; Escudero et al., 2008; Hayes-Harb et al., 2010; Simon et al., 2010). For example, some recent studies have indicated that L2 learners may exhibit enhanced knowledge of the phonological forms of newly-learned words when they are provided with the spelled forms of the words than when they do not have access to spelled forms (Escudero et al., 2008; Hayes-Harb et al., 2010). These studies have primarily considered cases where the native and second language orthographies are similar (e.g. both employ the Roman orthography, though they may differ with respect to some grapheme–phoneme correspondences). In the present study we further investigate the utility of orthographic information in supporting learners’ knowledge of the phonological forms of L2 words, in this case, when the words are spelled using some unfamiliar orthographic symbols.
II Background
The influence of the availability of orthographic representations on the acquisition of novel second language phonological contrasts has received relatively little attention in the literature on second language speech, which has instead primarily focused on the role of auditory input in L2 speech learning. However, recent studies have found that learners’ memory for the phonological forms of words can be influenced by their knowledge of the spelled forms of the words. Several studies have provided evidence that orthographic forms can provide clues that help learners to discriminate L2 lexical items differentiated by novel phonological contrasts. For example, Escudero et al. (2008) demonstrated that the availability of orthographic representations of English words can help native speakers of Dutch create lexical representations for new words that differentiate the phonemes /æ/ and /ɛ/, while participants who had access to only the auditory forms of the new words did not. Participants were taught a set of auditory English nonwords containing /æ/ or /ɛ/ (e.g. [tændək] and [tɛnzə]). During a word learning phase, one group of participants was presented with each word’s auditory representation and an associated picture, while the other learner group was presented with the auditory representation, an associated picture, and an orthographic representation (e.g. <tandek> and <tenzer>, where the letters <a> and <e> correspond to the /æ/–/ɛ/ contrast). At test, the group who had seen the words’ orthographic representations demonstrated a pattern of lexical activation that indicated that they had established lexical representations that differentiated /æ/ and /ɛ/, while those who had not seen the words’ spelled forms did not. Thus the availability of spelled forms during the word learning phase supported native Dutch speakers’ ability to accurately remember the phonological forms of the new words.
It has also been demonstrated that the availability of orthographic representations can prevent learners from creating target-like lexical representations for new words. For example, Hayes-Harb et al. (2010) taught native English-speaking participants a set of words in an ‘unfamiliar language’. During a word learning phase, participants heard the English nonwords (e.g. [fɑʃə]), saw a picture representing the words’ meanings (e.g. a picture of an apple), and saw a written form. Participants in the control condition always saw the nonsense written form <xxxx>; participants in the congruent spelling group saw spellings for the auditory words that followed English spelling conventions (e.g. <fasha> for [fɑʃə]); and participants in the incongruent spelling group saw some spellings that did not follow English spelling conventions (e.g. <faza> for [fɑʃə]). At test, participants in the incongruent spelling group were more likely than participants in the other two groups to misremember the phonological forms of the newly-learned words in ways that reflected the incongruent spellings that they had encountered during the word learning phase. Thus the availability of orthographic forms for new words can hinder the development of target-like lexical representations when the grapheme–phoneme correspondences differ in the first and second languages.
Further evidence for the interaction of orthographic and phonological forms in second language word learning can be found in Bassetti’s (2006) work on native English speakers learning Mandarin. For the purpose of supporting both native and second language acquisition of written Mandarin, learners are often presented with Mandarin written in pinyin, a Romanization system for Mandarin. In one experiment, Bassetti (2006) asked native English-speaking learners of Mandarin to count the number of sounds in a syllable, and found that when the syllable contained a vowel not present in the orthographic representation (e.g. the /e/ in /guei/, which is spelled <gui>), participants counted fewer sounds than when the orthographic representation presented the vowel (e.g. the /e/ in /uei/, which is spelled <wei>). A second experiment, where the participants were asked to pronounce words, produced a similar result: if a vowel was present in the spelled form, participants pronounced it (e.g. the /e/ in /uei/, which is spelled <wei>). However, if the vowel was not present in the spelled form, it was not pronounced (e.g. the /e/ in /guei/, which is spelled <gui>). Bassetti (2006: 107) concluded that there is a ‘strong effect of pinyin orthographic conventions’ on learners’ phonological representations.
Studies have also demonstrated that orthographic representations can in some circumstances neither help nor hinder learners. For example, Simon et al. (2010) sought to determine whether nonnative vowel contrasts may be more accurately discriminated by learners when the learners are provided with orthographic representations that do not adhere to their native language’s grapheme–phoneme correspondences. They investigated native English speakers’ ability to discriminate between French /y/ and /u/. Experiment 1 involved two phases: a word learning phase and a test phase. In the word learning phase, learners were assigned to two different conditions: Sound Only or Sound–Spelling. Participants in both groups heard sets of three words, and were told that each word in the triplet contained a different vowel (e.g. [dyʒ]–[duʒ]–[diʒ]), and for each word saw its assigned picture (e.g. a ‘banana’ or ‘glasses’). Learners in the Sound–Spelling condition additionally saw the orthographic representations of the words (e.g. <dûge>–<douge>–<dige>). During the testing phase, learners performed an auditory AXB task, where they heard three new words and were required to determine whether the second word (X) they heard matched the first (A) or third (B) word (e.g. [styɡ]–[stuɡ]–[stiɡ]). The Sound–Spelling group performed more accurately on the AXB task than did the Sound Only group, but this difference was not significant, suggesting the availability of orthographic representations did not have a significant beneficial effect on learning the novel vowel contrasts. This experiment provides evidence that orthographic representations – especially when learners are required to create novel grapheme–phoneme correspondences – may neither help nor hinder a learner’s ability to distinguish words on the basis of novel L2 vowel contrasts.
The studies discussed thus far have all considered cases where the first and second languages use (nearly) identical sets of orthographic symbols, even if some of the grapheme–phoneme correspondences may differ. It is yet unknown whether L2 learners can use their more general knowledge that written forms can provide phonological information about words – independent of their knowledge of specific orthographic symbols – to support their ability to learn the phonological structure of second language words from unfamiliar orthographic symbols. The first goal of the present study is therefore to determine whether learners can use unfamiliar orthographic symbols to help them learn the phonological structure of L2 words.
In addition, the studies reviewed thus far have focused on the lexical representation of segmental contrasts, and it is unknown whether the influence of orthographic information on learners’ knowledge of the phonological forms of words is limited to words’ segmental structures. Thus the second goal of the present study is to determine whether the availability of orthographic information can influence learners’ knowledge of novel L2 suprasegmental contrasts (in this case, lexical tone, or pitch that conveys meaning; Ladefoged and Johnson, 2010). With these two goals in mind, Experiment 1 was designed to address the following research question: Do native English speakers benefit from the availability of orthographic tone symbols when learning to associate lexical tone with new L2 words?
If participants in Experiment 1 do benefit from the availability of orthographic tone symbols, it is possible that this benefit arises from having learned the correspondences between orthographic tone marks and auditory tones (e.g. learning that ‘x̆’ = low-falling-rising tone). Alternatively, it is possible that the mere presence of unfamiliar orthographic symbols cues participants to pay particular attention to the novel aspects of the auditory forms (i.e. the tones), leading them to enhanced knowledge of the words’ lexical tones. The third goal of the present study is to attempt to tease apart these two possibilities. If (a) participants in Experiment 1 demonstrate a benefit from the availability of orthographic tone marks during a word learning phase, but (b) an identically-trained group of participants is unable to match the orthographic tone marks to the auditory tones (in Experiment 2), we might infer that the ability to benefit from the availability of the orthographic tone marks is not predicated on having learned the specific correspondences between auditory tones and tone marks. This finding might suggest that the benefit derived from the availability of the orthographic tone marks, then, might be due to enhanced ‘noticing’ of unexpected elements in the auditory signal (for a discussion of the role of ‘noticing’ in second language acquisition, see, for example, Truscott, 1998).
III Experiment 1
Experiment 1 was designed to investigate whether second language learners can use unfamiliar orthographic tone marks to help them associate lexical tone with newly-learned words. The acquisition of Mandarin by native English speakers provides an ideal scenario for this research, as Mandarin is a tone language, and also can be written in pinyin, a Romanized writing system that presents segmental information using letters that are familiar to native English speakers, and diacritic marks to indicate lexical tone.
In Mandarin, lexical items are contrasted on the basis of both segmental and tonal information. For example, the sequence of segments /ma/ can be associated with four different lexical tones resulting in four different words:
high-level (tone 1), e.g. /ma-tone1/ ‘mother’;
high-rising (tone 2), e.g. /ma-tone2/ ‘hemp’;
low-falling-rising (tone 3), e.g. /ma-tone3/ ‘horse’; and
high-falling (tone 4), e.g. /ma-tone4/ ‘scold’. (Ladefoged and Johnson, 2010)
English does not have lexical tone contrasts, and the acquisition of Mandarin lexical tone by native English speakers is notoriously difficult (e.g. Liu et al., 2011; Wang et al., 1999). Thus native English speakers learning Mandarin must learn to associate both segments and tones with Mandarin lexical items in order to effectively contrast Mandarin words. As noted, pinyin is a Romanized version of Mandarin that includes the use of diacritic tone marks to represent the lexical tone contrasts; for example, the four Mandarin words above are written in pinyin as <mā>, <má>, <mǎ>, and <mà>, respectively.
1 Participants
Twenty-six adult native speakers of English were recruited from first-semester linguistics undergraduate courses at an American university and received course credit for their participation. A background questionnaire confirmed that no participants reported a hearing, language processing, speech, or neurological disorder, or experience learning Mandarin or any other tonal language; they were not asked to report musical training experience. The participants were randomly assigned to one of two word learning conditions, the Tone Marks condition or the No Tone Marks condition. The Tone Marks group consisted of three males and ten females (n = 13); 12 were in the age group 21–30 years and one was in the age group 31–40. The No Tone Marks group consisted of seven males and six females (n = 13); eight were in the age group 21–30 years, four in the age group 31–40, and one in the age group 41–50.
2 Stimuli
The auditory stimuli consisted of Mandarin nonword minimal quadruplets, made up of two segment sequences, [ɡi] and [fiɑn], each associated with the four tones, for a total of eight nonwords. Though the native English-speaking participants would have been unfamiliar with real Mandarin words, nonwords were used in order to avoid presenting the native Mandarin speaker who produced the stimuli with real words that may have differed in their lexical frequencies and/or neighborhood densities, which are factors known to affect pronunciation (see, for example, Wright, 1997). A male native speaker of Mandarin was presented with the written form of each of the eight nonwords in isolation four times and was asked to pronounce the word. The second production of each nonword was selected for presentation in the study, for a total of eight unique auditory stimuli.
Visual stimuli included the written forms of the nonwords and line drawings of non-objects. The written forms were presented to participants in pinyin (see the second column in Table 1). Because they are based on the Roman alphabet, pinyin letters are familiar to English speakers; however, in pinyin, lexical tone is indicated by diacritic marks above the letters. Therefore, the letters themselves were familiar to participants, while the diacritic marks were unfamiliar. The non-object line-drawing pictures are provided in the third column in Table 1. For the purpose of helping participants to develop a novel mini-lexicon, each of the eight nonwords was assigned a ‘meaning’ represented by a non-object picture.
Table 1. Information about the eight nonwords: their lexical tones, their orthographic forms, their associated pictures, and their auditory forms.
3 Procedure
The experiment comprised three phases: a word learning phase, a criterion test phase, and a final test phase. During the word learning phase, participants in the Tone Marks condition heard the auditory representation and saw both the picture and the orthographic representation with the diacritic tone mark (e.g. see <gī> and its associated picture; hear [ɡi-tone1]). Participants in the No Tone Marks condition heard the auditory form and saw both the picture and the orthographic representation without tone marks (e.g. see <gi> and its associated picture; hear [ɡi-tone1]). In both conditions, each word was presented twice per block, for a total of 16 items per block. The block was presented four times, in a different random order each time and for each participant. Participants were instructed to learn the words and their meanings as well as possible, and no response was required of them.
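As an illustration only, the presentation schedule of one word learning cycle can be sketched as follows. The word labels, the function name `learning_phase`, and the use of a seeded random number generator are assumptions made for this sketch; the original experiment software is not described in this level of detail.

```python
import random

# Eight nonwords: two segment sequences crossed with four tones.
# The string labels are hypothetical stand-ins for the actual stimuli.
WORDS = [f"{seg}-tone{t}" for seg in ("gi", "fian") for t in (1, 2, 3, 4)]


def learning_phase(words, reps_per_block=2, n_blocks=4, seed=None):
    """Return one word learning cycle as a list of independently shuffled blocks."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        block = list(words) * reps_per_block  # each word twice -> 16 items
        rng.shuffle(block)                    # new random order for every block
        blocks.append(block)
    return blocks


blocks = learning_phase(WORDS, seed=1)
# Number of times a given word is presented in one full cycle.
exposures_per_word = sum(block.count(WORDS[0]) for block in blocks)
```

Under these assumptions, one cycle contains four blocks of 16 items, so each nonword is presented eight times per cycle.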
During the criterion test phase, participants were tested on their memory of the meanings of the eight nonwords. In each criterion test trial, participants saw a non-object picture and heard a nonword (no written forms were presented). Each picture was presented four times: twice in a matched condition and twice in a mismatched condition. In the matched condition, the picture and the auditory form matched (e.g. see picture associated with [ɡi-tone1]; hear [ɡi-tone1]). In the mismatched condition, the picture and the auditory form did not match (e.g. see picture associated with [ɡi-tone1]; hear [fiɑn-tone1]). Each auditory word was associated with a different picture each time in the mismatched condition. Participants indicated whether they thought the picture and the auditory form were matched by pressing YES (matched) or NO (mismatched) keys on a keyboard. Participants had four seconds to respond before the response was considered incorrect and the test continued to the next trial. Critically, the criterion test only examined the discrimination between the segment sequences [ɡi] and [fiɑn], not the four lexical tones; in other words, sensitivity to the lexical tone contrasts was not necessary for participants to respond accurately during the criterion test. Participants needed to demonstrate 90% accuracy to continue to the final test. Participants repeated the word learning and criterion test loop as many times as necessary to reach the criterion.
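The scoring rule for the criterion test can be sketched as below. This is a minimal illustration, not the software used in the study: the function name `score_criterion_test` and the encoding of a timed-out trial as `None` are assumptions, while the 90% threshold and the treatment of timeouts as incorrect follow the description above.

```python
def score_criterion_test(is_match, responses, threshold=0.90):
    """Score one criterion test.

    is_match  -- list of bool, True if picture and auditory form matched
    responses -- list of True (YES), False (NO) or None (no response in 4 s)
    Returns (accuracy, passed).
    """
    if len(is_match) != len(responses):
        raise ValueError("one response slot is required per trial")
    # A timeout (None) never equals the correct answer, so it counts as an error.
    n_correct = sum(
        resp is not None and resp == truth
        for truth, resp in zip(is_match, responses)
    )
    accuracy = n_correct / len(is_match)
    return accuracy, accuracy >= threshold


# 8 pictures x 4 presentations (2 matched, 2 mismatched) = 32 trials.
truth = [True, True, False, False] * 8
```

With 32 trials, a participant could make at most three errors (including timeouts) and still reach the 90% criterion under this scoring.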
In the final test, participants were tested on their knowledge of the lexical tone contrasts. The final test was identical to the criterion test except that mismatched items in the final test involved only a difference in lexical tone (e.g. see picture associated with [ɡi-tone1]; hear [ɡi-tone2]). In this way, we were able to determine whether participants were able to accurately associate lexical tones with lexical items.
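The construction of the final test items can be sketched as follows. The text specifies only that mismatched items differed from the pictured word in lexical tone alone; which two tones were paired with each picture is not stated, so the random choice below, like the function name `final_test_trials`, is an assumption of this sketch.

```python
import random

SEGMENTS = ("gi", "fian")
TONES = (1, 2, 3, 4)


def final_test_trials(seed=None):
    """Build 32 trials as (picture_word, heard_word, is_match) tuples."""
    rng = random.Random(seed)
    trials = []
    for seg in SEGMENTS:
        for tone in TONES:
            picture = (seg, tone)
            # Two matched trials: the heard word is the pictured word.
            trials += [(picture, picture, True)] * 2
            # Two mismatched trials: same segments, different tone.
            # Assumption: the two foil tones are drawn at random.
            other_tones = [t for t in TONES if t != tone]
            for heard_tone in rng.sample(other_tones, 2):
                trials.append((picture, (seg, heard_tone), False))
    rng.shuffle(trials)
    return trials


trials = final_test_trials(seed=0)
```

Because every foil shares its segment sequence with the pictured word, a correct NO response requires sensitivity to the tone contrast itself.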
For all parts of the experiment, participants were seated at a computer in a sound-attenuated booth. Visual stimuli were presented on a computer screen, auditory stimuli were presented at a comfortable listening level over headphones, and participants registered responses by pressing keys on a computer keyboard.
4 Results
The mean number of word learning cycles required to reach the 90% criterion was 2.69 (SD = 1.25) for the Tone Marks group and 2.00 (SD = 1.08) for the No Tone Marks group. The group difference in the number of word learning cycles required was not significant (F(1,24) = 2.28, p = .144, ηp2 = .09).
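For readers who wish to check a two-group comparison of this kind from summary statistics alone, the following sketch computes a one-way between-participants F ratio and partial eta squared from group means, standard deviations, and sample sizes. The function name is illustrative, and because the inputs are the rounded values reported in the text, the output can be expected to agree with the reported statistics only to within rounding.

```python
def f_from_summary(m1, sd1, n1, m2, sd2, n2):
    """One-way ANOVA for two independent groups, from summary statistics."""
    df_within = n1 + n2 - 2
    # Pooled within-group variance (mean square error).
    ms_within = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_within
    # Between-group sum of squares; with two groups df = 1, so SS = MS.
    grand_mean = (n1 * m1 + n2 * m2) / (n1 + n2)
    ms_between = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
    f = ms_between / ms_within
    # Partial eta squared for a single-factor design with df_between = 1.
    eta_p2 = f / (f + df_within)
    return f, df_within, eta_p2


# Word learning cycles in Experiment 1: Tone Marks vs. No Tone Marks.
f, df, eta_p2 = f_from_summary(2.69, 1.25, 13, 2.00, 1.08, 13)
```

Applied to the rounded values above, the computation gives an F ratio close to the reported 2.28 on 1 and 24 degrees of freedom.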
Figure 1 presents the results of the final test by participant group and item type. An analysis of variance (ANOVA) with participant group as a between-participants variable (two levels: Tone Marks and No Tone Marks) and item type as a within-participants variable (two levels: Matched and Mismatched) revealed a significant effect of item type (F(1,24) = 43.22, p < .01, ηp2 = .64), with higher accuracy on matched than on mismatched items. There was also a significant main effect of participant group (F(1,24) = 5.36, p = .03, ηp2 = .18), with more accurate performance by participants in the Tone Marks than in the No Tone Marks condition. In addition, the interaction of item type and participant group was significant (F(1,24) = 7.96, p = .01, ηp2 = .25).

Figure 1. Experiment 1: Mean proportion correct by participant group and item type.
Following up on the significant interaction, there was no significant effect of participant group for matched items (F(1,24) = 1.09, p = .31, ηp2 = .04). However, the effect of participant group for mismatched items was significant (F(1,24) = 7.14, p = .01, ηp2 = .23), with participants in the Tone Marks group performing more accurately than those in the No Tone Marks group. A signal detection analysis using d-prime was also conducted in order to determine participants’ ability to detect the difference between matched and mismatched items. The benefit of this analysis is that it accommodates possible biases participants may have had to press the YES or the NO buttons. This analysis confirmed the proportion correct findings just reported, with higher d-prime scores (indicating higher detectability) for participants in the Tone Marks group (M = 2.17, SD = .84) than the No Tone Marks group (M = 1.425, SD = .99; F(1,24) = 4.30, p = .05, ηp2 = .15). These findings suggest that the availability of orthographic tone marks enhances learners’ ability to associate lexical tones with newly-learned words.
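A d-prime score of the kind used here can be computed as in the sketch below. Two details are assumptions of this sketch rather than reported facts: a YES response to a matched item is treated as a hit and a YES response to a mismatched item as a false alarm, and a log-linear correction (adding 0.5 to each cell) is applied so that perfect hit or false alarm rates do not yield infinite z-scores. The text does not state how such cases were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false alarm rate), with a log-linear correction."""
    # Assumed correction: add 0.5 to each count to avoid rates of 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# A participant who presses YES on all 32 trials is 50% correct overall,
# but shows no sensitivity to the matched/mismatched distinction.
always_yes = d_prime(hits=16, misses=0, false_alarms=16, correct_rejections=0)

# A participant who accepts most matched items and rejects most mismatched ones.
sensitive = d_prime(hits=14, misses=2, false_alarms=3, correct_rejections=13)
```

The first case illustrates why the measure is useful alongside proportion correct: a pure response bias yields a d-prime of zero regardless of which button is favoured.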
To investigate the possibility that the different tones posed different degrees of difficulty for participants (see Table 2), an ANOVA was conducted with participant group as a between-participants variable (two levels: Tone Marks and No Tone Marks), the tone of the pictured word as a within-participants variable (four levels: tones 1, 2, 3, and 4), and proportion correct on mismatched items as the dependent variable. There was no main effect of tone (F(3,72) = 2.10, p = .11, ηp2 = .08) and no interaction of tone and group (F(3,72) = 1.07, p = .37, ηp2 = .04). Thus, performance was not systematically different on items that probed sensitivity to the different tones.
Table 2. Experiment 1: Mean proportion correct (standard deviation) by participant group and tone of the pictured nonword.
Experiment 1 provides evidence that participants who were provided with unfamiliar orthographic tone marks during a word learning phase more accurately matched words distinguished by lexical tone to pictures at test, suggesting that their knowledge of the phonological forms of the new words was enhanced by the availability of orthographic information about lexical tone. However, we do not yet know why performance by participants in the Tone Marks group was more accurate than that of the No Tone Marks participants. One possibility is that participants learned the correspondences between auditory tones and tone marks, and then used this knowledge to help them remember which lexical items involved which lexical tones. Alternatively, the presence of the tone marks during the word learning phase may have simply cued them to notice the contrastive lexical tones in the auditory forms, leading them to more accurately associate lexical tones with lexical items, independent of knowledge of any correspondences between auditory tones and tone marks. The purpose of Experiment 2 is to attempt to tease apart these possibilities by examining the extent to which participants in the Tone Marks condition learned the correspondences between auditory tones and tone marks.
IV Experiment 2
1 Participants
Twenty-eight adult native speakers of English participated in Experiment 2. All participants met the same criteria as participants in Experiment 1, and they were randomly assigned to one of two word learning conditions: the Tone Marks condition or the No Tone Marks condition. Each of the two groups consisted of seven males and seven females (n = 14). All of the participants in the Tone Marks group were in the age group 18–30 years, and all of the participants in the No Tone Marks group except for one (in the age group 31–40) were in the age group 18–30.
2 Stimuli
The set of auditory and visual stimuli used in Experiment 2 was identical to those used in Experiment 1.
3 Procedure
As in Experiment 1, there were three phases: a word learning phase, a criterion test phase, and a final test phase. Both the word learning and the criterion test phases were identical to Experiment 1; the only difference between the procedures of Experiments 1 and 2 can be found in the final test phase. In Experiment 2, instead of asking participants to match pictures to auditory forms, participants were asked to match spelled forms to auditory forms: no pictures were presented during the final test in Experiment 2. For participants in both word learning conditions, the spelled forms presented at test were those with the tone marks (e.g. <fián>). If the Tone Marks group learned the correspondences between auditory tones and tone marks, they should be able to match the auditory words with their written forms. The No Tone Marks group should perform at chance.
4 Results
The mean number of word learning-criterion cycles required to reach the 90% criterion was 2.43 (SD = 1.09) for the Tone Marks group and 1.50 (SD = .52) for the No Tone Marks group. The group difference in the number of word learning cycles required was significant (F(1,26) = 8.29, p = .01, ηp2 = .24). The reason for this significant difference is unclear, given that Experiments 1 and 2 have identical word learning and criterion phases and that there was no analogous significant difference between groups in Experiment 1. The possible implications of this finding are addressed below.
Figure 2 presents the results of the final test by participant group and item type. An ANOVA with participant group as a between-participants variable (two levels: Tone Marks and No Tone Marks) and item type as a within-participants variable (two levels: Matched and Mismatched) revealed a significant effect of item type (F(1,26) = 16.35, p < .01, ηp2 = .39), with higher accuracy on matched than on mismatched items. There was also a significant main effect of participant group (F(1,26) = 5.51, p = .03, ηp2 = .18), with more accurate performance by participants in the Tone Marks than the No Tone Marks condition. The interaction of item type and participant group was not significant (F(1,26) = .39, p = .54, ηp2 = .02).

Figure 2. Experiment 2: Mean proportion correct by participant group and item type.
A signal detection analysis using d-prime was also conducted, and confirmed the proportion correct findings, with higher d-prime scores for participants in the Tone Marks group (M = 1.01, SD = 1.28) than the No Tone Marks group (M = .65, SD = .61; F(1,26) = 6.18, p = .02, ηp2 = .19). Because we are interested in determining whether performance by participants in the Tone Marks group in Experiment 1 may have been influenced by learning correspondences between auditory tones and tone marks, of particular interest is the extent to which participants exhibited an ability to match the auditory and orthographic forms; thus we examined whether the proportion correct scores, averaged across item condition, were above chance in each participant group. Performance by participants in the Tone Marks group was significantly above chance (at 0.65; t(13) = 2.91, p = .01), while performance by participants in the No Tone Marks group was not significantly different from chance (at 0.51; t(13) = .37, p = .72). Thus it appears that participants in the Tone Marks group did learn the correspondences between auditory tones and tone marks to some extent (though their performance, at 65% correct, was not highly accurate), while participants in the No Tone Marks group did not.
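The comparison against chance is a one-sample t-test of each group’s proportion correct scores against 0.5, the value expected from guessing on a YES/NO task. The sketch below shows the computation; the function name and the example scores are hypothetical and are not the study’s data, which are not reported at the level of individual participants.

```python
from math import sqrt
from statistics import mean, stdev


def t_against_chance(scores, chance=0.5):
    """One-sample t statistic and degrees of freedom for scores vs. chance."""
    n = len(scores)
    standard_error = stdev(scores) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(scores) - chance) / standard_error
    return t, n - 1


# Hypothetical proportion correct scores for four participants.
example_scores = [0.5, 0.6, 0.7, 0.8]
t, df = t_against_chance(example_scores)
```

With 14 participants per group, as in Experiment 2, the resulting statistic would be evaluated on 13 degrees of freedom.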
Recall that participants in the Tone Marks group required significantly more word learning-criterion cycles than participants in the No Tone Marks group: the difference in mean number of cycles between groups was approximately one cycle. Thus participants in the Tone Marks group were presented with each nonword (auditory form, picture, and spelled form) eight more times, on average, than participants in the No Tone Marks group. It is worth noting that despite the lack of significant difference between groups in the number of cycles required to pass the criterion test in Experiment 1, the descriptive pattern found in Experiment 1 is similar to that in Experiment 2, with participants in the Tone Marks group requiring more cycles, on average, than participants in the No Tone Marks group. It may be the case that participants in the Tone Marks group were engaged in trying to learn the associations between the pictures and the auditory tones, and/or learn the new correspondences between auditory tones and tone marks, leaving fewer resources available for learning the associations between the pictures and the auditory segment sequences (which is all that was necessary to perform accurately on the criterion test), with the result that they required more word learning-criterion test cycles. In contrast, participants in the No Tone Marks condition saw only familiar orthographic symbols during the word learning phase, which were compatible with English grapheme–phoneme correspondences, and to the extent that they did not notice the auditory tone differences among the words, the task of learning to associate the (segmental portions of the) auditory words with the pictures might have been relatively easier.
Of particular concern here is in fact not any difference in the number of word learning-criterion test cycles between Tone Marks and No Tone Marks participants in Experiment 2, but rather whether the Tone Marks participants in Experiments 1 and 2 differed in the number of cycles they required to meet the word learning criterion. An analysis using the Scheffé criterion for significance due to the difference in numbers of Tone Marks participants in the two experiments indicates that Tone Marks participants in Experiments 1 (n = 13, M = 2.69, SD = 1.25) and 2 (n = 14, M = 2.43, SD = 1.09) did not differ significantly in the number of word learning cycles they required. We can thus conclude that the Tone Marks participants in the two experiments had relatively similar amounts of exposure to the nonwords, and we might then infer that the Tone Marks participants in Experiment 1 also learned the correspondences between auditory tones and tone marks to some extent. While it is not possible to infer a causal relationship between learning these correspondences and being able to associate tones with lexical items, we can nonetheless conclude that the Tone Marks participants, who appeared to have learned to associate lexical tone with the lexical items to some extent, also developed limited knowledge of the correspondences between auditory tones and tone marks during the word learning phase.
As in Experiment 1, it may be the case that the different tones posed different amounts of difficulty for participants (see Table 3). To investigate this possibility, an additional ANOVA was conducted with participant group as a between-participants variable (two levels: Tone Marks and No Tone Marks), the tone of the written word as a within-participants variable (four levels: tones 1, 2, 3, and 4), and proportion correct on mismatched items as the dependent variable. There was no main effect of tone (F(3,78) = 2.16, p = .10, ηp2 = .08) and no interaction of tone and group (F(3,78) = 1.16, p = .33, ηp2 = .04). Thus performance was not systematically different on items that probed sensitivity to the different tones.
Table 3. Experiment 2: Mean proportion correct (and standard deviation) by participant group and tone of the pictured nonword.
V Discussion
The experiments presented here were designed to explore how orthographic representations influence learners’ knowledge of the phonological forms of newly-learned L2 words. The first goal of the present study was to determine whether learners can use unfamiliar orthographic symbols to support their ability to learn the phonological structure of L2 words, and the second goal was to determine whether the availability of orthographic information can influence learners’ ability to associate lexical tones with novel L2 words. Experiment 1 provided evidence that native English speakers can indeed use unfamiliar lexical tone diacritic marks to help them associate auditory lexical tone with meanings in a novel lexicon. While participants in the Tone Marks and the No Tone Marks word learning conditions did not differ significantly in the amount of exposure they required to learn the words, they did differ in their ability to determine that auditory forms and pictures were not matched when the auditory word contained a different lexical tone than did the pictured word, with participants in the Tone Marks group (73% correct) outperforming participants in the No Tone Marks group (47% correct). On the basis of this finding, we conclude that under some circumstances, second language learners may be able to use knowledge of newly-learned words’ orthographic representations to help them associate tones with lexical items. As noted above, however, on the basis of Experiment 1 alone, we have little information about why the Tone Marks participants experienced this benefit.
In order to benefit in this way from the availability of the tone marks, participants in the Tone Marks group may have learned the new correspondences between auditory tones and tone marks (that is, learned that, for example, <x̀> corresponds to [tone4]). In this scenario, participants might have created lexical representations that included this orthographic information, providing an additional source of information about the words’ phonological structures in memory, leading to more accurate performance at test. However, it is also possible that the mere presence of the unfamiliar orthographic symbols cued them to pay particular attention to the auditory forms of the words, making them more likely to notice that the auditory words had different tones and thus more likely to learn which lexical items were associated with which tones. The third goal of the present study was to attempt to tease apart these two possibilities, and was addressed in Experiment 2. In Experiment 2, a new group of participants learned the same Mandarin nonwords as in Experiment 1 and under identical conditions. They were then tested on their ability to match auditory words to their orthographic forms (with tone marks). The Tone Marks group’s above-chance performance on this task indicates that they did learn the correspondences between auditory tones and tone marks to some extent, and the at-chance performance by No Tone Marks participants seems to indicate that the tone marks are not ‘iconic’ or otherwise obviously associated with the auditory tones. We cannot draw any straightforward conclusions from Experiment 2 concerning the relationship between knowledge of which tones are associated with which words and knowledge of specific correspondences between auditory tones and tone marks. 
However, it is apparent that native English speakers can (a) partially learn a small set of Mandarin nonwords and their associated lexical tones, and (b) partially learn the correspondences between Mandarin lexical tones and pinyin tone marks during a relatively short (< 1 hour) exposure period.
Is it surprising that the native English speakers in the present study were able to use novel orthographic symbols to help them learn the lexical tones associated with new second language words? Simon et al. (2010) provided evidence that when novel phoneme contrasts are combined with novel orthographic–phonological correspondences (e.g. <dûge> for [dyʒ] or <douge> for [duʒ]), second language learners do not necessarily benefit from the availability of the spelled forms. Given that participants in the present study were presented with both novel auditory (lexical tone) contrasts and also novel orthographic marks (which entailed novel orthographic–phonological correspondences), the present finding may at first seem unexpected. However, as Simon et al. (2010) point out, the task of learning the novel grapheme–vowel correspondences in their study may have in fact been quite difficult for native English speakers, who are accustomed to the relatively opaque English mapping between graphemes and vowels, and who may not expect vowel letters to be very informative about vowel quality relative to speakers of languages with more transparent orthographies. On the other hand, participants in the present study did not need to ‘overcome’ any native language associations for the orthographic tone marks (except, perhaps, for the tone 2 and tone 4 diacritics, which are optionally included in the written forms of a small number of French borrowings, for example, résumé and voilà, though native English speakers are unlikely to have robust phonological correspondences for these marks). It may be the case that learning the orthographic–phonological correspondences for novel orthographic symbols is easier than learning new correspondences for familiar symbols, though this hypothesis must be systematically tested before such a conclusion can be drawn.
In contrast with the findings of some studies of Mandarin tone discrimination by native English speakers (e.g. Francis et al., 2008; Hao, 2012; So and Best, 2010), we found no significant effect of tone on performance in either of the two experiments. However, the present study was not designed to elicit tone differences, and the power of these analyses may have been too low to reveal any such differences.
In order to simulate the initial stage of second language acquisition and to control crucial details concerning both the makeup of the lexicon and characteristics of lexical items, and in keeping with previous studies in the relevant literature (e.g. Hayes-Harb et al., 2010; Simon et al., 2010), we taught participants a mini-lexicon in an unfamiliar language. However, the ability to interpret the results of novel lexicon studies as representative of the initial stage of second language acquisition relies on the assumption that participants treat the stimuli as new second language words. It must be acknowledged that studies such as the present one differ in many important ways from what we might call the ‘typical’ second language learning experience. For example, one way in which the present study differs from L2 learning outside of the laboratory is that the auditory stimuli were intentionally controlled so as not to exhibit the variability that is typical in second language learners’ auditory input. In a pilot study, which was otherwise identical to Experiment 1, both a male and a female native speaker of Mandarin produced the auditory stimuli. When the stimuli were produced by the two different speakers, participants in both participant groups performed at chance at test, indicating that they had not been able to learn the words’ lexical tones. When the variability associated with the second voice was removed, as in the present study, however, performance by participants in both groups improved, and differences between the two groups of participants emerged. Another way in which the present study may not be representative of L2 learning outside of the laboratory is that the pictures indicating the meanings of the novel lexical items were of non-objects. 
While exposure to novel names for familiar objects may be more typical in early second language learning, learners do in fact occasionally encounter second language words for which they know of no native language translations (e.g. lúcuma, a fruit native to Peru whose English translation ‘eggfruit’ is unfamiliar to many Americans).
On the other hand, we also took steps to recreate the experience of second language learning by the participants in this study. For example, in order to ensure that participants believed they were learning a new language, they were asked about their language background and then told that they would be learning some words in an unfamiliar language. In addition, the auditory stimuli were produced by a native speaker of Mandarin, and the productions thus differed from English not only in the presence of lexical tone, but also in the pronunciation of the segments. It is assumed that these aspects of the study contributed to encouraging participants to treat the stimuli as new second language words despite the limitations imposed by the controlled nature of the study.
One might expect unfamiliar orthographic symbols to pose some difficulty for learners, especially in the absence of explicit instruction concerning the meanings of the symbols, and we have found some evidence that they may have been confusing and/or distracting to participants. Recall that the Tone Marks participants in Experiment 2 required significantly more word learning cycles than did the No Tone Marks participants to pass the criterion test. However, this result was not robust across experiments, and therefore should be interpreted with caution. Perhaps more compelling is the beneficial effect that the availability of the orthographic tone marks had on participants’ ability to associate lexical tone with new words. It is worth noting that participants received no explicit instruction concerning the significance of the tone marks; nor, in fact, were they told to learn the words’ spellings: they were simply instructed to ‘learn the words and their meanings as well as possible’. It appears, then, that participants expected the orthographic forms to provide some kind of information about the phonological forms of the words, and over the course of a relatively short exposure period made sufficiently effective use of the unfamiliar orthographic symbols to improve their ability to associate tones with the novel lexical items. We wish to stress, however, that it is not possible to determine on the basis of this study whether the enhanced performance is due to an increase in noticing (thus leading to better memory), to more robust memory representations, or to something else entirely.
In sum, we have found some evidence for the utility of partially unfamiliar orthographic information in second language word learning; in this case, the availability of pinyin tone marks led to a greater ability to associate lexical tone with newly-learned Mandarin nonwords than when orthographic forms did not include tone marks. A next logical step may be to ask to what extent learners can make similar use of an entirely unfamiliar phonographic script (e.g. the Arabic script for native English speakers). In the context of Chinese language pedagogy, where the ultimate goal is for learners to acquire the system of Chinese characters, it may also be of interest to investigate whether the addition of tone marks to these characters can support their acquisition of lexical tone. In addition, we do not yet know whether the observed beneficial effects of tone marks in the present listening tasks would also appear in learners’ production performance. It is clear that more studies are needed to further illuminate the undoubtedly intricate relationship between the familiarity of orthographic symbols and the effect of the availability of spelled forms on second language word learning.
Acknowledgements
The authors are grateful to Speech Acquisition Lab members who helped with data collection, to the audience at SLRF 2011 who provided comments and suggestions about the study, and to Susanne Carroll for organizing this special issue of Second Language Research. We extend special thanks to Brian Cragun, without whom this project would not have happened.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
