Abstract
This study investigated the effect of teacher codeswitching on second language (L2) vocabulary acquisition during listening comprehension activities in a lexical Focus-on-Form context. To date there has been research on teacher beliefs about first language (L1) use, its functions and its distribution in the interaction, but little on its effect on aspects of learning. Previous research on intentional vocabulary teaching has shown it to be effective, but whether the lexical information provided to learners is more effective in L1 or L2 has been under-researched and, moreover, has only been investigated in a reading comprehension context. Eighty first-year students of English as an L2, in a Chinese university, were stratified by proficiency and randomly allocated to a codeswitching condition or to an English-only condition, and their performance in vocabulary tests compared to a control group of 37 students that did not receive any lexical Focus-on-Form treatment. Results confirm previous studies that lexical Focus-on-Form leads to better vocabulary learning than mere incidental exposure. Results also provide initial evidence that teacher codeswitching may be superior to the teacher providing L2-only information. Contrary to some theories of the mental lexicon, proficiency level did not clearly favour one condition against the other.
I Introduction and background
This study had three principal aims:
to explore the relative benefits of intentional vocabulary learning versus incidental vocabulary learning in a Focus-on-Form context (Long, 1991);
to contribute to the debate on the use of the first language (L1) in second language (L2) classrooms by providing empirical evidence of the effect of teacher codeswitching on student vocabulary learning; and
to arrive at a good compromise between highly controlled experimental and/or laboratory conditions of research and the naturally occurring L2 classroom.
1 Lexical Focus-on-Form
Intentional vocabulary learning that is contextualized in a communicative teaching activity is what Laufer and Girsai (2008) propose constitutes Lexical Focus-on-Form. Intentional learning is generally contrasted with incidental learning. In the case of vocabulary, if the learner is learning ‘incidentally’, his or her attention is focused predominantly (but not necessarily exclusively) on the message contained in a text or utterance rather than the form through which that message is being conveyed. If the learner is learning a word ‘intentionally’, the main focus of his or her attention is on the form–meaning relationships and properties of that word, with the presumption that it will be acquired to some degree.
Laufer and Girsai (2008) point out that Form-Focused Instruction was conceived and developed in the context of the acquisition of morpho-syntax, but argue that it can be extended to vocabulary as has been demonstrated in meaning negotiation studies by, for example, de la Fuente (2002).
Virtually all conceptualized, intentional L2 vocabulary teaching and learning has been researched in relation to written texts. We know of only one study (Toya, 1993) where it has been researched in relation to oral text, as in this study. Yet, Laufer and Girsai’s lexical re-conceptualization of Focus-on-Form can be applied both to uni-directional spoken texts (e.g. audio or video recordings) and to teacher–learner discourse. Here the pedagogical issue is the extent to which learning of vocabulary occurs when different levels of attentional resources are being drawn, by the teacher, to specific items in the input, either thanks to the sensitivity that a teacher has to the lexical difficulty of a text/discourse, or as the result of a communication breakdown (Ellis et al., 1994). Recently Macaro et al. (2009) have proposed the concept of ‘teacher as dictionary designer’, where the teacher, during interaction, provides the form–meaning connections that a dictionary offers but with the added advantage of the teacher having some knowledge of the students’ current vocabulary store. In the case of the bilingual teacher, he or she would be both a ‘monolingual and bilingual dictionary designer’.
2 Teacher codeswitching
Along with other researchers in the field (e.g. Ferguson, 2003; Üstünel & Seedhouse, 2005) we have adopted the term ‘teacher codeswitching’ rather than ‘teacher use of L1’ in order to draw a distinction between them. In the naturalistic environment, codeswitching is defined as the ‘alternation of two languages within a single discourse, sentence or constituent’ (Poplack, 2000, p. 224), and there is a wealth of research evidence (see Auer, 1998; Tay, 1989) that codeswitching, far from being a sign of linguistic deficit, is a sign of bilinguals having superior control of two or more languages compared to monolinguals. Naturalistic codeswitching, however, does not appear to be an ‘anything goes’ phenomenon but follows certain grammatical, lexical and social conventions, even though some speakers at times violate these conventions (see a recent review by Gardner-Chloros, 2009). As we argue below, ‘teacher use of L1’ appears to have no rules, conventions or limitations. ‘Codeswitching’ frameworks, therefore, can offer formal classrooms a hook upon which to hang principled rather than ad hoc L1 use. One such framework is Myers-Scotton’s Matrix Language Framework (1993; 2001). Here it is generally possible to identify a predominant language in the interaction – i.e. one that provides the morpho-syntactic structure – and an ‘embedded language’, i.e. one that occurs less frequently, is characterized by content words rather than function words, and is adopted for a number of purposes including the more effective communication of ideas through ‘marked’ (i.e. switched) lexical items (see, for example, Chung, 2006 where it is used for this purpose among family members).
To the possible objection that classroom codeswitching is incompatible, as a phenomenon, with naturalistic or ‘social’ codeswitching, we offer, albeit briefly, the following theoretical arguments. Language classrooms are social situations where speakers ‘share knowledge of communicative constraints and options’ and therefore can be ‘said to be members of the same speech community’ (Gumperz & Hymes, 1986, p. 17). Classroom discourse does not need to simulate naturalistic discourse in order for it to be authentic but is ‘authenticated’ by the participants in the discourse (Van Lier, 1996). It is true that in classroom discourse codeswitching occurs both for communication and for teaching/learning purposes; but that is its actual discourse function. Moreover, there is evidence that in naturalistic contexts where participants in the discourse have unequal proficiency in one of their two languages, some codeswitching occurs for the purpose of linguistic advancement as well as communication (David, 2004).
3 L2 exclusivity
The debate on whether the L1 of the learners should be used in the L2 classroom or whether teachers should use the L2 exclusively stretches back well over a century when alternative methods to a grammar–translation approach to language teaching were introduced, inter alia, by the Direct Method in the late 19th century (Richards & Rodgers, 1986). The debate appears to have subsided in the late 1970s and 1980s when explorations of the communicative approach were being undertaken theoretically (Canale & Swain, 1980), pedagogically (Widdowson, 1978), and empirically by the proponents of what was to become known as the interaction hypothesis (for a recent comprehensive account, see Mackey, 2007). The hypothesis with its attendant empirical research attempted to demonstrate that the L2 could not only be comprehended but could also be acquired through pre-modified input (Krashen, 1985), through interactionally-modified input (Long, 1981; Pica et al., 1987), through forced output (Ellis & He, 1999; Pica et al., 1989; Swain, 1995), and through negative input, such as feedback to learner error (Dekeyser, 1993; Lyster, 1998). Interestingly, whilst pedagogical commentators such as Widdowson (2003) have continued to reflect on the albeit limited role of the L1 in these communicative classrooms, researchers in the interaction hypothesis at best completely ignored its role (Mackey, 2007) and at worst considered it an aberration to the learning process (see, for example, Lyster, 1998 who coded learner use of L1 as an error). The reason for this may be that many of these studies have been in contexts in which to use the L1 is difficult because of the mixed background of the learners.
A re-evaluation of the L1’s contribution to communicative competence was developed in the 1990s, and it centred around a number of distinct areas. Researchers working from a socio-cultural perspective pointed to important functions that the L1 had in facilitating L2 interaction (Antón & DiCamilla, 1998; Brooks & Donato, 1994; Hancock, 1997). The L1 was identified as a tool with which the individual not only thought about language during use – the ‘inner voice’ for working out the task in question – but also the tool with which he or she progressed the task with others. The general claim, however, was that it facilitated classroom interaction, not acquisition.
This socio-cultural perspective began to run parallel, especially in the case of English as a global language, with the belief that to exclude the learners’ L1 was an imposition bordering on linguistic imperialism (Phillipson, 1992) or that it undermined the personal identity bestowed by the L1 (Canagarajah, 1995; Ferguson, 2003; Lin, 1996). These writings coincided with an increased questioning of the native speaker teacher’s predominance, not only as the learners’ linguistic model (Cook, 1999; Moussu & Llurda, 2008), but also as the best possible person to further the learner’s interlanguage development. Finally, there was a growing recognition that the L2 develops alongside, and interacts with, the already developed L1 rather than developing separately from it; that the process of learning an L2 is a process of becoming bilingual (Cook, 1992).
Scholars have investigated teacher use of the L1 by researching teachers’ beliefs regarding its inclusion in or exclusion from the L2 classroom (e.g. Kharma & Hajjaj, 1989) and have drawn two conclusions. First, the majority of teachers favoured its inclusion. Second, teachers could be classified into three fairly distinct groups:
those that believe that it is perfectly possible to exclude it completely, and hold that it is detrimental to allow it or use it (the ‘virtual position’; for this terminology, see Macaro, 2000, 2001; McMillan & Turnbull, 2009);
those that believe it is highly desirable to exclude it, but who are unable to do so (the ‘maximal position’); and
those who believe it should not be excluded, and that there are a number of principled uses of the L1 that enhance L2 learning (the ‘optimal position’).
The maximal position (for an example of teachers expressing this belief, see Liu et al., 2004) seems a-theoretical from a psycholinguistic perspective; a statement of inadequacy rather than a set of propositions that explain or predict language acquisition and skill development.
Researchers have attempted to identify the functions to which the L1 is put (Hosoda, 2000; Polio & Duff, 1994; Üstünel & Seedhouse, 2005). This research suggests that the functions are neither limited nor highly principled. For example, a number of studies reported that the L1 is used as a short-cut to learning because of the pressure of exams. Other functions reported are: contrasting L1 and L2 forms, providing metalinguistic cues, translating, giving L1 explanations of previously used L2 utterances, providing instructions for carrying out tasks, prompting L2 use, commenting on social events, eliciting learner participation, and classroom management. The list seems open-ended (Ferguson, 2009), and somewhat ad hoc. Moreover, some researchers argue that functions such as classroom management and content transmission are ‘useful’ (Hosoda, 2000, p. 71), or that they can be ‘beneficial’ (Storch & Wigglesworth, 2003, p. 761) but do not provide convincing definitions of ‘useful’ or ‘beneficial’.
A similar lack of guiding principle appears if one attempts to synthesize studies that have observed and measured proportions of L1 use in teacher–class interaction. Findings suggest a range of L1 use, within-study, from as narrow as 2%–5% (Kong & Zhang, 2005), 4%–12% (Macaro, 2001), 0%–18% (Rolin-Ianziti & Brownlie, 2002), 0%–60% (Levine, 2003), to as wide as 0%–90% (Duff & Polio, 1990). Clearly this is not an accumulation of research evidence that can inform practice. What seems to be happening is that we have both an international and a context-specific (or local) situation where apparently ‘anything goes’. But is that really the case? Or is it that researchers asking the quantitative question have not categorized sufficiently the language classrooms they have been observing? In a few studies, the general pedagogical approach is provided. For example, in Guo (2007) it is measured by the use of the ‘COLT’ (communicative orientation in language teaching) observation schedule (Fröhlich et al., 1985). In many other studies the reader is given no indication (e.g. Brooks-Lewis, 2009) whether the researcher has been measuring a classroom where the general pedagogical intent is the communication of meaning or one where the teacher is simply comparing the L2 with the L1 both grammatically and lexically. We see little value in measuring the amount of L1 use during ‘grammar–translation’ lessons.
In order to arrive at some notion of optimal L1 use, it seems essential that research provide a body of findings that document the effects of L1 use on L2 learning, a gap in the research literature we are not alone in identifying (see Carless, 2008; Ferguson, 2003, 2009; Levine, 2003). Kaneko (1992) addressed the question by relating teachers’ L1 use with what students claimed to have learnt and reported that the more L1 use, the less progress students made with pronunciation, but a mixture of L1 and L2 was beneficial for vocabulary and grammar acquisition. Qian et al. (2009) reported a gradual reduction in teacher codeswitching in primary schools over four grades, and claimed that this progression benefited the students; however, the evidence they provide for acquisition is largely anecdotal.
In sum, research on codeswitching in the L2 classroom needs to be conducted in a pedagogical context in which accessing and communicating meaning are primary objectives. At the same time we need to recognize that language classrooms are places where the aim is to acquire linguistic knowledge as well as to use it. Therefore a tension regarding both quantity of codeswitching and the functions to which it is put persists. This tension ultimately can only be resolved through empirical evidence of the relative effectiveness of codeswitching behaviour.
4 L2 vocabulary acquisition
An important quest of vocabulary research has been to establish how the L2 word is linked to the concept that the word represents (for a summary, see De Bot et al., 1995), the concept itself being assumed to be stored in long-term memory as a non-language-specific neural network (Blakemore & Frith, 2005; Libben, 2000). The vast majority of concepts, however, will have been experienced through the L1 and are therefore strongly linked to L1 lemmas. Studies investigating lexical development have therefore attempted to establish differences in the bilingual lexicon between learners at different levels of proficiency (Altarriba & Mathis, 1997; De Groot & Hoeks, 1995; Kroll & Stewart, 1994). Of interest for the purposes of the current study is whether access to L2 lexis is directly via the concept or via the L1 equivalent. Here evidence points to differences between lower and higher proficiency L2 speakers in that the former are likely to be more reliant on L1 equivalents to access L2 lexemes (Kroll et al., 2002; Lotto & De Groot, 1998). This tendency may be accentuated where the L2 being learnt is not cognate with the learners’ L1, as is the case of Chinese and English (see a series of studies by Jiang, 2000, 2002, 2004).
Incidental vocabulary learning, as defined earlier, has not met with high levels of success unless there is additional ‘intention to learn’ (Hulstijn, 1992; Mondria, 2003). Laufer (2003) argues that to build up an adequate L2 lexicon entirely through, for example, reading for pleasure, takes too long and, in her (Laufer, 2009) assessment of vocabulary acquisition research, demonstrates the superiority of various types of intentional learning over incidental learning. However, Lexical Focus-on-Form, in order to create the intention to learn, is not an easily defined notion in terms of time taken up. In particular, in the case of a listening comprehension activity (such as in the current study) we do not know what the costs and benefits of Lexical Focus-on-Form by the teacher are.
Researchers have turned their attention towards different types of intentional vocabulary learning in order to establish which is most effective. Thus the use of glosses in texts has received considerable researcher attention. This is where learners are engaged in examining written text, and where the intention is for them to learn vocabulary from the inclusion of written definitions, paraphrases, or other textual enhancements. Findings suggest (Hulstijn et al., 1996; Kim, 2006; Min, 2008) that vocabulary acquisition is furthered by such glosses or enhancements. However, these studies did not explore whether an L1 or an L2 gloss was more effective. Three studies that did so reported conflicting results. Jacobs et al. (1994) provided university students with glosses in either L1 or L2, but found no differences in the resulting vocabulary learning. In contrast, Laufer and Shmueli (1997) investigated different ways of presenting new vocabulary to students in high school classes both in terms of which language the item was glossed in, and the amount and type of context it was presented in. They found evidence of better learning via L1 glosses, but also in conditions where there was less surrounding text, suggesting that the amount of effort expended on attending to a gloss may be mediated by the demands of the comprehension task. In these studies, the proficiency level of the students (at least at an international level permitting direct comparisons) was not a variable, and therefore links with the above outlined theory are difficult to make. On the other hand, Miyasako (2002) found that groups given L2 glosses outperformed their L1 counterparts and that these results were mediated by proficiency level.
II Research questions
In the light of the above literature review and the context of the study, we adopted the following research questions:
To what extent is Lexical Focus-on-Form beneficial during a focus on meaning activity (such as listening comprehension) in terms of students’ receptive vocabulary learning?
Is students’ receptive vocabulary learning better facilitated by a teacher’s use of codeswitching or by providing L2-only information?
Do lower proficiency students benefit more than higher proficiency students from teacher codeswitching? In other words, is level of proficiency a covariate of the potential gains made from either condition?
III Method
We used an experimental design with randomization to learning conditions in ‘extracted’ (i.e. non-intact) classes, and with pretests, posttests and delayed tests. We controlled for teacher effect by having a single teacher (one of the authors) teach all conditions and we controlled for activity type by centring the vocabulary teaching episodes around one task type (listening comprehension).
1 Population and sample
The population for this study was first-year English-language majors in Chinese universities. We acknowledge that this is a vast population and that our sampling frame, one particular type of university in Shandong Province, may not allow the selected sample to accurately reflect the population. However, students enrolled in this university originated from many of China’s provinces, so the sampling frame did take into account important geographical and economic differences in that generally speaking students reach higher levels of English proficiency in urban environments than in rural ones. Enrolment was based on the national university entrance exam and we can report with some confidence that they were all from the middle tier of proficiency, thus reflecting at least the middle tier of the population in question.
In the university there were four classes in the first year, comprising 120 students approximately 19 years of age who began taking courses (September 2007) for 20 hours each week. Towards the end of September all students had the research project explained to them, and received an informed consent letter. Subsequently 117 students agreed to participate.
2 Allocation to groups
The 117 participants were first stratified into four proficiency levels and the stratified random allocation established three conditions (the extracted classes) as shown in Figure 1. As well as controlling for proficiency, stratified random allocation to three new groups was employed because each original class was being taught by different teachers, and we wished to reduce original teacher effect.

Figure 1. Stratified random allocation to the three conditions
Stratification was based on the combined scores of:
a general (nation-wide) English proficiency test involving reading, writing, vocabulary and grammar tests for which no reliability measures are available;
a listening comprehension test, devised by the researchers. The test was piloted to meet the level of the participants. A native speaker’s opinion was solicited on the authenticity and accuracy of the test. The test revealed an alpha reliability value of .75.
a vocabulary baseline test, which was devised by the researchers and was piloted. The test followed the same format as the posttests used to measure the effect of the intervention (see Appendix 1). The test revealed an alpha reliability value of .67.
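The stratified random allocation described above can be illustrated with a minimal sketch. It is not the procedure actually used in the study: the function name, the participant identifiers, the random scores and the equal-sized groups are all assumptions made for illustration, and only the figures of 117 participants, four strata and three groups come from the text.

```python
import random


def stratified_allocation(scores, n_strata=4, n_groups=3, seed=0):
    """Rank participants by combined score, cut the ranking into strata,
    shuffle each stratum, and deal its members across the groups."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    stratum_size = -(-len(ranked) // n_strata)  # ceiling division
    groups = [[] for _ in range(n_groups)]
    dealt = 0  # running count keeps group sizes balanced across strata
    for start in range(0, len(ranked), stratum_size):
        stratum = ranked[start:start + stratum_size]
        rng.shuffle(stratum)
        for i, participant in enumerate(stratum):
            groups[(dealt + i) % n_groups].append(participant)
        dealt += len(stratum)
    return groups


# Hypothetical combined baseline scores for 117 participants.
score_rng = random.Random(1)
scores = {f"S{i:03d}": score_rng.uniform(40, 90) for i in range(117)}
groups = stratified_allocation(scores)
print([len(g) for g in groups])
```

Because each stratum is shuffled before being dealt out, every group receives a similar share of each proficiency band, which is the property that stratification is meant to secure.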
Statistical analysis (the p value set at < .05) was performed on these three groups to establish that they were equal at baseline; descriptive statistics are reported in Table 1. Following tests for Homogeneity of Variances (Levene’s test: F = .884, df = 2, p > .05) and for normality of distributions (K-S test = .067, df = 114, p > .05), a one-way ANOVA revealed that there was no significant difference between the three groups in the combined scores, F(2,114) = .000; p = 1.000 ns, or in any of the three sets of scores: vocabulary pretest, F(2,114) = .336, p = .715 ns; listening comprehension test, F(2,114) = .082, p = .921 ns; general proficiency, F(2,114) = .142, p = .868 ns.
Table 1. Descriptive statistics for each baseline test and combined baseline tests for each condition
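For readers unfamiliar with the baseline check, the following is a minimal sketch of how a one-way ANOVA F ratio and its degrees of freedom are obtained. The function name and the toy scores are illustrative assumptions; the study's raw data are not reproduced here. With three groups and 117 participants the degrees of freedom are 2 and 114, as in the reported F(2,114).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy data only: three small groups with means 2, 3 and 4.
f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_ratio, df_b, df_w)
```

An F ratio close to zero, as reported for the combined baseline scores, indicates that the group means are almost identical relative to the spread of scores within groups.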
The three groups were then randomly assigned to three conditions: Non-codeswitching condition (NCS); codeswitching condition (CS); control condition (CONT). The key independent variable between the three conditions was as follows: CONT did not receive any vocabulary-related explanation or information. NCS only received information or explanation of vocabulary in L2. In the CS condition the teacher briefly switched to Chinese in relation to the target lexical item. Put differently, NCS and CS received Lexical Focus-on-Form, but CONT did not. NCS and CS received different types of Lexical Focus-on-Form.
3 Procedure
After establishing the baseline proficiency tests, the vocabulary pretest was administered. One week later the instructional intervention began, and this lasted for 6 weeks. Two weeks after the end of the instructional intervention and last posttest a delayed test was carried out; thus the study was carried out over a period of 9 weeks. For a diagrammatic explanation of the study procedure, see Appendix 2.
4 The instructional intervention
The instructional intervention, for all three conditions, was additional to the students’ normal courses and was provided by an experienced bilingual teacher of English. It comprised 1.5 hours per week and was centred on listening comprehension activities. As well as addressing the gap in the literature, we felt that teacher-fronted listening comprehension activities would provide good opportunities for initial focus on meaning, while at the same time providing a much more tightly controlled framework than, say, general question-and-answer interaction.
For each session the teacher played the recorded texts through once. Next, multiple-choice comprehension questions were handed to students to complete and immediately collected in by the teacher. The procedure so far was kept constant for all three conditions, and the objective of the instruction was entirely focused on the general meaning of the text. At this point the three conditions diverged.
For NCS and CS the teacher played the text again in segments. During this process, new vocabulary contained in the texts was focused on, whether as a result of student requests or not. The vocabulary was written on the board, thus isolating the word from the speech stream and providing the students with grapheme–phoneme information. The teacher then provided the NCS with information and explanation of the word in English and the CS with equivalents in Chinese. Note that for a number of English words there is no simple one-to-one relationship between English and Chinese, the latter being a morpheme-based rather than lexically-based language, and that what amounts to an L1 paraphrase was given (see Appendix 3). For the vast majority of lexical items it took longer for the teacher to put across the meaning via the NCS condition than the CS condition. For example, of the words classified as ‘unknown to the students’ (see target items below) only 4 took longer to put across in CS than NCS. As a further example, the word ‘salient’ took 78 seconds in the NCS condition and 44 seconds in the CS condition.
For CONT, after the listening comprehension questions were collected in, the text was played again, to mirror the other conditions, but the focus switched to a discussion of listening comprehension strategies. This was ecologically valid, as well as ethically justifiable, as all students were required later to take an exam that included listening comprehension.
All sessions were videotaped and, additionally, an assistant teacher sat in the classroom and commented on fidelity to conditions. All information on lexical items was ‘pre-scripted’ in that the lessons were carefully pre-planned, and the information to be given to the respective conditions was written down in advance and memorized by the teacher prior to the lesson.
Due to inevitable attendance fluctuation over such a long period, not all 117 students took part in every test, as is shown in Table 2, and in statistical analysis we therefore used the list-wise method. CONT were administered only the delayed test, not each posttest. This was in order to avoid sensitizing them to the aims of the study at subsequent instructional sequences. Moreover, for ethical reasons, in a Chinese context, it was inappropriate to test students on vocabulary not focused on during the instruction phases.
Table 2. Number of students in each group for each vocabulary test
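The list-wise method mentioned above excludes from an analysis any participant who lacks a score on one of the tests that the analysis requires. A minimal sketch follows; the function name, the record layout, the participant identifiers and the scores are hypothetical and serve only to illustrate the principle.

```python
def listwise_complete(records, required_tests):
    """Keep only participants who have a score on every required test."""
    return {
        participant: tests
        for participant, tests in records.items()
        if all(tests.get(name) is not None for name in required_tests)
    }


# Hypothetical records: scores are invented for illustration.
records = {
    "S001": {"pretest": 18, "posttest": 70, "delayed": 41},
    "S002": {"pretest": 20, "posttest": None, "delayed": 39},  # absent at posttest
    "S003": {"pretest": 15, "posttest": 66},                   # missed delayed test
}
complete = listwise_complete(records, ["pretest", "posttest", "delayed"])
print(sorted(complete))
```

The cost of this approach is a smaller sample for each analysis, which is why the number of students differs from test to test.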
5 Listening texts
The listening texts needed to be at a level that was appropriately challenging and contained target words that the students as a whole were unlikely to know. On the other hand, they should not contain so many unknown or low-frequency words as to make the texts impossible to comprehend. On the basis of some original listening passages taken from a widely used English listening and speaking textbook (Zheng, 2003), a specially tailored set of listening materials was therefore devised following piloting. The number of target words in each text was approximately 10, that is under 5% of the overall tokens. All except one instructional session comprised 3 separate texts, covering a wide range of topics (for an example, see Appendix 4).
6 Selection of target vocabulary
A methodological challenge was how to include a sufficient number of target words to make the tests rigorous while ensuring that they would be unknown to every student. In order to do so we would have had to insert extremely low-frequency words, rendering the listening texts inauthentic. We therefore decided not to exclude words that were shown, by the pretests, to be known to some of the students. The pretests identified 77 words that were not known by any of the students and 93 that were known by some. We justify this pedagogically in that no teacher is able to ascertain what every student knows in terms of every individual lexical item. This may, as one reviewer pointed out, seem like a large number of words to learn, even only receptively. However, there were a total of 17 listening passages over the six weeks, generally three for each intervention session, and each passage was about 257 words in length. Each passage therefore contained on average only about 4.5 words new to all students, and around 10 tested words in all. The target words included both concrete and abstract words and covered the full range of content word classes.
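The per-passage figures quoted above follow directly from the reported counts, as the short calculation below shows. The variable names are ours; the numbers are those given in the text, and the target-word density is derived from them.

```python
unknown_to_all = 77      # words not known by any student at pretest
known_to_some = 93       # words known by some students at pretest
passages = 17            # listening passages over the six weeks
words_per_passage = 257  # approximate passage length in tokens

total_targets = unknown_to_all + known_to_some
new_per_passage = unknown_to_all / passages
tested_per_passage = total_targets / passages
target_density = tested_per_passage / words_per_passage

print(total_targets)                  # total tested words
print(round(new_per_passage, 1))      # words new to all students, per passage
print(round(tested_per_passage, 1))   # tested words per passage
print(f"{target_density:.1%}")        # share of tokens that are target words
```

The resulting density is consistent with the statement that target words made up under 5% of the tokens in each text.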
7 Vocabulary tests
We piloted and then adopted a simple receptive vocabulary scale as shown in Appendix 5. The scale allowed respondents to provide answers either in Chinese or in English in order not to bias one instructional condition against the other. When we examined the completed tests we observed that virtually all the CS group provided answers in their L1. Half the NCS group also opted to provide answers in their L1, that is, in a manner not congruent with their instructional condition.
Appendix 2 shows graphically the relationship between the instructional intervention and the testing process. Although the delayed test was only two weeks after the last instructional session, it should be noted that it contained words that the students had been exposed to as long as 7 weeks earlier. Administration of all tests was supervised by the researcher and another teacher.
Further details of the study design can be found in Tian (2009) but, in brief, the following were subjected to piloting and revision prior to the main study:
Four of the listening passages were piloted to gauge comprehensibility level and clarity of recording.
The time required to administer each listening text was calculated.
The teaching plan (the instructional sequence) was examined for its feasibility.
As note-taking was not to be allowed, the procedure was piloted to ensure that the students would not react negatively.
The adapted vocabulary test was piloted for validity.
The information to be provided to the students regarding the target words was piloted to ensure it was understandable.
IV Results
Our first research question explored the extent to which Lexical Focus-on-Form was beneficial during a predominantly meaning-focused activity. Thus for this question NCS and CS scores were combined. Table 3 provides descriptive statistics for the pre, post, and delayed tests. As can be seen from Table 3, at posttest, the two groups receiving Lexical Focus-on-Form appeared to make very large gains. However, these gains dropped quite noticeably at delayed test. In order to examine these apparent within-group gains we first carried out a series of tests to ensure that parametric statistics could be applied. Histograms for both the vocabulary posttests and delayed test showed normal distribution curves. These were confirmed by Kolmogorov–Smirnov tests of normality (K-S = .078; df = 61; p = .200 and K-S = .064; df = 61; p = .200 respectively).
Table 3. Descriptive statistics comparing Lexical Focus-on-Form with meaning-only related instruction on vocabulary learning (posttests) and retention (delayed test) (percentages)
We then carried out a repeated measures ANOVA. As the assumption of sphericity had been violated (p < .05), the degrees of freedom were corrected using Greenhouse–Geisser estimates of sphericity. Results showed that there was a significant effect for time of testing, with a very large effect size: F(1.17, 69.9) = 826.3, p = 0.000*,
Table 4 Bonferroni multiple comparison results from the repeated measures analysis on NCS+CS
Note: *Adjustment for multiple comparisons: Bonferroni.
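The two corrections named in this part of the analysis, the Greenhouse–Geisser adjustment of the degrees of freedom and the Bonferroni adjustment for multiple comparisons, can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the function names and the six rows of scores are hypothetical, and the epsilon of 0.583 and the 61 learners used at the end are values assumed here only because they come close to the reported degrees of freedom (the article does not state epsilon).

```python
from statistics import mean


def covariance_matrix(data):
    """Sample covariance matrix of the columns of `data` (one row per learner)."""
    n, k = len(data), len(data[0])
    col_means = [mean(row[j] for row in data) for j in range(k)]
    return [
        [
            sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in data) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon from the double-centred covariance matrix."""
    s = covariance_matrix(data)
    k = len(s)
    row_means = [mean(row) for row in s]
    grand_mean = mean(row_means)
    centred = [
        [s[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centred[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centred for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def corrected_df(epsilon, n_learners, n_occasions):
    """Multiply both degrees of freedom of the time effect by epsilon."""
    df_effect = epsilon * (n_occasions - 1)
    df_error = df_effect * (n_learners - 1)
    return df_effect, df_error


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical pretest, posttest and delayed-test percentages for six learners.
scores = [
    [18, 70, 35],
    [20, 75, 30],
    [15, 68, 40],
    [22, 80, 33],
    [17, 72, 28],
    [19, 66, 37],
]
epsilon = greenhouse_geisser_epsilon(scores)
print(f"epsilon for the hypothetical scores = {epsilon:.3f}")

# Assumed values (not stated in the article): epsilon = 0.583, 61 learners, 3 occasions.
df_effect, df_error = corrected_df(0.583, 61, 3)
print(f"corrected df = ({df_effect:.2f}, {df_error:.1f})")
```

With three testing occasions epsilon lies between 0.5 and 1; the further it falls below 1, the more the degrees of freedom shrink and the more conservative the F test becomes.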
We then carried out an analysis of covariance (ANCOVA) in order to examine between-group differences at delayed test, using the pretest scores as the covariate and found a significant group difference F(1,95) = 28.07, p < .001*,
Research question 2 examined which type of Lexical Focus-on-Form was more beneficial during the interaction, one carried out exclusively in L2 or one introduced via teacher codeswitching to Chinese. Descriptive statistics for NCS and CS separately are also provided in Table 3. As we can see, both groups appeared to improve their vocabulary knowledge in the six combined posttests (66.9% and 78.3% respectively) as compared to pretest (both at approximately 18%), but both dropped at delayed test.
In order to establish whether there was a between-group difference, we carried out an ANCOVA. There was a significant group effect on posttest scores after controlling the effect of pretest scores, F(1,64) = 9.178, p = 0.01*,
Our third research question asked the extent to which proficiency made a difference to the students’ ability to benefit from the Lexical Focus-on-Form treatment. Recall that the students were classified into four general proficiency levels before stratified random allocation was made to conditions. We therefore first calculated the gain scores made by each of the two treatment groups in combined posttest. These are presented in Table 5.
Table 5 Proficiency levels and vocabulary gain scores in each treatment group (M with SD in parentheses)
We then performed an ANCOVA with the gain scores as the dependent variable, group as the independent variable and proficiency level as the covariate. We found that proficiency was not a significant covariate of the gain scores: F(1, 63) = 3.02, p = .08 ns. There was also no significant interaction between group and proficiency: F(2, 63) = 2.12, p = .15 ns.
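The logic of an analysis of covariance on gain scores can be sketched as a comparison of two regression models: one that predicts the gain from the covariate alone, and one that also allows each group its own intercept. The sketch below is a simplified illustration, not the authors' analysis. It assumes a single covariate, a common slope across groups and no interaction term; it treats proficiency level as a number; and all records and function names are hypothetical.

```python
from statistics import mean


def centred_sums(xs, ys):
    """Return (Sxx, Sxy, Syy): sums of squares and cross-products around the means."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def ancova_f(groups):
    """F test for the group effect after adjusting for one covariate.

    `groups` maps a group name to a list of (covariate, outcome) pairs.
    Assumes the covariate has the same slope in every group.
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    all_y = [y for pairs in groups.values() for _, y in pairs]

    # Reduced model: outcome predicted from the covariate only.
    sxx_t, sxy_t, syy_t = centred_sums(all_x, all_y)
    ss_reduced = syy_t - sxy_t ** 2 / sxx_t

    # Full model: covariate plus a separate intercept for each group.
    sxx_w = sxy_w = syy_w = 0.0
    for pairs in groups.values():
        sxx, sxy, syy = centred_sums([x for x, _ in pairs], [y for _, y in pairs])
        sxx_w, sxy_w, syy_w = sxx_w + sxx, sxy_w + sxy, syy_w + syy
    ss_full = syy_w - sxy_w ** 2 / sxx_w

    df_effect = len(groups) - 1
    df_error = len(all_x) - len(groups) - 1
    f_value = ((ss_reduced - ss_full) / df_effect) / (ss_full / df_error)
    return f_value, df_effect, df_error


# Hypothetical records: (group, proficiency level, pretest %, posttest %).
records = [
    ("NCS", 1, 20, 30), ("NCS", 2, 20, 50), ("NCS", 3, 20, 40),
    ("CS", 1, 20, 60), ("CS", 2, 20, 80), ("CS", 3, 20, 70),
]

# Gain score = posttest minus pretest, paired with the covariate.
groups = {}
for group, proficiency, pretest, posttest in records:
    groups.setdefault(group, []).append((proficiency, posttest - pretest))

f_value, df_effect, df_error = ancova_f(groups)
print(f"F({df_effect}, {df_error}) = {f_value:.2f}")
```

The same function applies when the pretest score is the covariate and the posttest or delayed-test score is the outcome; only the pairs passed in change. Testing a group-by-covariate interaction would need an additional model with separate slopes, which this sketch omits.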
We can now summarize the results as follows:
The students receiving Lexical Focus-on-Form instruction made substantial gains from pretest to posttests. These gains obtained consistently for each of the six posttests. Although the gains were substantially reduced at delayed test, they remained statistically significant compared to the pretest scores.
The students receiving Lexical Focus-on-Form instruction significantly outperformed those students that did not receive Lexical Focus-on-Form instruction. The effect size was high.
Among the students who received Lexical Focus-on-Form, students who received lexical information that contained L1 equivalents – as a result of the teacher codeswitching to Chinese – benefited more than students who received information in L2 only. However, this advantage was not sustained in the long term.
An influence of general proficiency on vocabulary learning via one instructional treatment or another was not confirmed.
V Discussion
This study investigated whether Lexical Focus-on-Form – that is, vocabulary learning contextualized by a listening activity – produced better vocabulary learning than no Lexical Focus-on-Form (incidental learning), and whether it was more effective for the teacher to codeswitch to L1 in order to put across the meaning of lexical items or to remain exclusively in L2.
The results suggest that Lexical Focus-on-Form, during (or at least closely associated with) a comprehension activity, is beneficial for vocabulary acquisition. The two treatment groups made significant gains over the control group in the long term. This supports findings of previous studies. For example, Mondria (2003) found that, in addition to inferring meaning from context, verification and memorization processes enhanced lexical acquisition. Thus, vocabulary learning appears to require attention to its form–meaning connections rather than only the general ideas contained in a text.
However, this study supports these previous findings in a relatively under-explored context of listening comprehension, which includes interaction between the teacher and the whole class, rather than reading comprehension, which usually denotes a student-centred activity. The implications of this are important. When Lexical Focus-on-Form is adopted in a reading task, there is the possibility that a student may go beyond the supporting information provided by the teacher by, for example, noting down the new word and looking it up in the dictionary in class (if permitted). Moreover, time dedicated to individual lexical items will vary considerably from student to student. In the case of a listening comprehension activity with teacher–class interaction, both these variables can be controlled much more effectively, although not totally eliminated.
Our results also show some limited advantage for codeswitching as opposed to exclusive use of the L2. This is one of the first pieces of research evidence to show a codeswitching effect in an interaction context and merits further investigation.
This finding should, moreover, be considered in the wider debate on L1 use in the L2 classroom. The kind of L1 use adopted in this study respected the constraints to be found in naturalistic codeswitching by limiting itself almost entirely to brief switches for content words, maintaining English as the predominant language (the vast majority being intra-sentential switches), and by not violating the grammars of either language. To that extent we believe that the study has made a contribution to establishing principles for optimal use. Moreover, the findings related to codeswitching are located in a clearly defined, communicative instructional context.
We now need to qualify these two results. First, the strong Lexical Focus-on-Form effect, regardless of treatment type, was not as strong in the long term. The number of lexical items recalled two weeks after the last posttest was under half the average posttest score. However, delayed test scores were still significantly higher than pretest scores. This supports Toya’s (1993) finding that explicit information associated with items in listening input results in better vocabulary learning in the short term but less so in the long term. The implication of this seems clear: a single exposure to a new word does not permit enough consolidation in the mental lexicon, and the word needs to be ‘frequently activated’ (Hulstijn, 2001, p. 286). Future researchers may wish to adopt the same design but adapt it so that target words are recycled over a number of sessions.
When viewed against the instructional procedure, the results seem to demonstrate that Lexical Focus-on-Form need not undermine the general communicative nature of the classroom interaction, as the amount of time dedicated by the teacher to it was relatively small. In other words, the whole lesson objective of listening comprehension was not abandoned in favour of carrying out lexis-based exercises. Time dedicated to Focus-on-Form in general is a pedagogical issue needing constant investigation. How much lesson time should be devoted to intentional vocabulary learning, when one considers all the other demands made on teachers and learners alike, is an important question. That codeswitching has some benefits for vocabulary learning therefore becomes significant when it is considered in relation to time taken up. Space has not allowed us a detailed analysis of the difference in time taken in the codeswitch as opposed to the English-only condition, but the information provided earlier (see method) and the examples provided in Appendix 5 clearly show that the English-only information took longer to communicate. Further evidence of this is in a linked report (Tian, 2009).
Our findings support the claim made by Ellis et al. (2001), who reported that teachers and students could navigate in and out of focusing on aspects of the code while still keeping the overall orientation of the message intact. Very importantly, they also provide some evidence for multicompetence theory (Cook, 1992) in that they signal no obvious negative effect of mixing codes during classroom interaction, and they support a finding reported by Guo (2007) with a similar population, where no deleterious effect of the teacher mixing codes was reported by the students. Lastly, the findings contribute to the bilingual teacher versus monolingual teacher debate in that the codeswitching condition could only have been offered by a bilingual teacher.
The second qualification relates to the proficiency level of the students. According to vocabulary acquisition theory cited earlier, the students in our sample with the lowest proficiency should, hypothetically, have benefited the most from lexical information in L1. Our results do not show this.
In order for this finding not to undermine the theoretical model, we could speculate that differences in proficiency levels were not large enough to speak to the theory and that the codeswitching treatment effect might only manifest itself in near-beginner learners. A second possibility is related to word type. Consider that most of the target words were low-frequency words. It is possible that higher proficiency students were still only able to learn these words via the codeswitch condition. Put differently, once learners get above a basic level of proficiency, the theoretical relationship may lie more between word type and instructional method than between general proficiency level and instructional method. This would fit better with Jiang’s (2002) hypothesis that even highly proficient L2 users will retrieve the meaning of L2 words according to their L1 semantic specification. Space in this article does not allow us to perform an analysis of word type and proficiency level. However, as we observed earlier regarding the way that students provided answers in the vocabulary tests, half of the students from the English-only group opted to provide Chinese equivalents rather than English synonyms or definitions. This suggests that, for a considerable number, despite the lexical information being presented to them in English, they recalled it in Chinese.
VI Limitations
We will now consider our third research aim against the limitations of the study as a whole. This study explored the research questions in an ecologically valid context, using easily recognizable pedagogical activities. However, it adopted a rigorous experimental design with stratified random allocation to conditions, highly controlled instruction in order to eliminate interactional confounding variables, and a single teacher in order to control for teacher variables.
Perhaps the major limitation of the study, brought about by our attempts to adhere to a ‘normal classroom context’, is that we were not able to administer immediate vocabulary posttests to the control group. Thus, it is theoretically possible that they learnt an equal number of words entirely incidentally whilst listening to the texts, then forgot what they had learnt and did significantly worse than the treatment groups at delayed test. We doubt this to be the case with such a high number of new words introduced, but it is a possibility. Nevertheless, for methodological reasons, we did not want to sensitize them to our outcome variable. Similarly, it is possible that the students in both CS and NCS conditions were sensitized to learning new words by the series of posttests, even though we inserted a number of additional words in each treatment session as distracters in order to minimize this eventuality.
Another limitation of the study, this time brought about by our attempts to adhere to an experimental approach, is that the students were deprived of the opportunity to write down the target words and review them outside class, and this runs counter to normal pedagogical practice. We have suggested ‘lexical-recycling’ to reduce the impact of this limitation in future research.
VII Conclusions
This study proposes that the issue of teacher L1 use in the L2 classroom has to be investigated within tight parameters. For the debate to be meaningful, the classroom context in which L1 use is investigated needs to be a broadly communicative one where the pedagogical intention is to communicate meaning through the target language, although at times some Focus-on-Form is required. The interaction between teacher and learners should demonstrate that the predominant language (the ‘matrix language’) is the L2, and that the participants in the discourse respect many of the conventions of codeswitching found in the naturalistic environment. In the case of the current study, the effect of codeswitching was measured as a tool for Lexical Focus-on-Form during a combined pedagogical activity: listening comprehension and vocabulary acquisition.
This study is one of a few to have shown a relationship between teacher codeswitching and learning outcomes. Our findings suggest some benefit for teacher codeswitching over remaining in L2 and thereby being compelled to put across form–meaning relationships via definitions and paraphrases. But the beneficial effect was not large, and therefore there is no implication here that teachers should constantly switch to L1 in order to provide the meanings of unfamiliar words. Further investigations need to be carried out regarding what types of words are most amenable to being put across solely in L2 and which require L1 equivalents, and this in relation to the cost–benefit of either approach. Research could also investigate, by using verbal report, the mental processes that different learners go through when being exposed to each of the two conditions. Moreover, future research could investigate whether certain concepts in a particular L1 exist in quite the same network of associations as they do in L2 (see the work of Jiang previously cited) and how teacher codeswitching might be more effective with these words than others. This brings us back to the notion of teacher as dictionary designer, mentioned earlier, a notion to which this study has contributed.
Acknowledgements
We are very grateful to the two anonymous Language Teaching Research reviewers who provided valuable advice for improving earlier drafts of this article.
Appendix 1 Vocabulary posttest 1
Appendix 2 Relationship between the instructional intervention and the testing process
Appendix 3 Examples of lexical information provided to the two treatment groups (NCS and CS)
Capriciously:
L2: somebody makes a sudden decision without thinking very much, does it capriciously
L1: 冲动的
Mechanic:
L2: somebody who is skilled at using or mending machines
L1: 机械工,机械师
Silhouette:
L2: the shape or shadow or figure of a person or something
L1: 轮廓,影子
Example of target word ‘Assume’ in interaction context:
NCS group
Recording: ‘any assumed right to smoke ends where a non-smoker’s nose begins’.
T: any assumed (writes ‘assumed’ on the board) right … ok assume means think that something is true or taken for granted
S: (silence)
T: so … assumed right to smoke ends … means the right stops … it means that he thinks that he has a right to smoke, but actually there is no right to smoke.
CS condition
Recording: ‘any assumed right to smoke ends where a non-smoker’s nose begins’.
T: any assumed (writes ‘assumed’ on the board) right to smoke
S: (echoing) assume
T: class, for assume we could say 假设 (tr: assume), 假定 (tr: assume), right?
S: yes
T: so assumed right to smoke ends … means the right stops
Examples where L1 information is an L1 paraphrase, because there is no equivalent concept to match in L1 with the L2 new word
Subject to: 使某人遭受什么罪
Niche: 适当的位置
Tangible: 可以触摸到的
Appendix 4 Example of listening text (from session 3) showing target words
Although the relaxed American style is well known, many new visitors think that it shows a “lack of respect”. This is especially true in the business world. Americans
Americans seem either totally
A visitor to the United States should, therefore, understand that being in a great hurry does not
Appendix 5 Vocabulary knowledge scale used in all vocabulary tests
Note: instructions provided in L1; for an example, see Appendix 1
