Abstract
This article examines the influence of different reading conditions (i.e. reading only and reading with first-language (L1) marginal glosses), number of word encounters (one, three, and seven) while reading, and combinations of these two variables on new word retention. This study considered a total of six possible combinations. Six groups of Chinese learners of English as a foreign language (EFL) (n = 240) were randomly selected and each assigned to a condition including 15 target lexical items. The treatment lasted for 5 weeks. One delayed test, containing four dimensions of vocabulary knowledge, was intended to measure learners’ retention of unknown words. The delayed test was administered 2 weeks after the experiment and was not disclosed to the learners in advance. The groups whose reading was accompanied by L1 marginal glosses scored significantly higher than the reading-only groups. The increased effectiveness of repeatedly encountering target lexical items was more pronounced in the reading experiment including L1 marginal glosses. The combination of L1 marginal glosses and seven encounters was found to be the most effective combination for lexical item retention. This study highlighted the effectiveness of repeatedly encountering target words and being provided with L1 marginal glosses to retain new words incidentally learned from reading. The conditions and relevant teaching implications are discussed in this study.
I Introduction
The acceleration of vocabulary growth is a vital component of learning English as a foreign language (EFL). A prominent and specialized field of research has focused for many years on the effects of either intentional or incidental word learning (Laufer, 2009). ‘Intentional word learning’ refers to a deliberate attempt to commit words to memory, often including the use of rote techniques such as memorizing word forms and meanings (Hulstijn, 2003). In contrast, ‘incidental word learning’ refers to the acquisition of a word or expression without a conscious intention to commit the element to memory, such as picking up an unknown word from listening to someone else using it or from reading it in a text (Hulstijn, 2013). Intentional and incidental word learning each contribute to incremental growth in vocabulary knowledge, but incidental learning is much slower than intentional learning. Research has shown that intentional word learning yields significantly better results than incidental word learning (de la Fuente, 2006; Lessard-Clouston, 2013; Sonbul & Schmitt, 2013; Teng, 2015). While words can be taught explicitly, the length of a teaching session typically allows for only a limited amount of explicit vocabulary instruction. In addition, although new words can be learned deliberately under the scope of a school curriculum, not all words learned intentionally are retained in memory over time. Such phenomena are considered barriers to vocabulary growth for non-native English speakers. Scholars have noted that native English-speaking adults understand an average of 22,000 to 32,000 words, whereas advanced second-language (L2) English learners possess an average receptive vocabulary size of 11,000 words (Nation, 2013). Non-native English speakers are unlikely to acquire a large quantity of words compared to native English speakers through intentional learning alone; many words can only be acquired incidentally via reading or other input conditions.
According to prior studies, vocabulary can be enriched incidentally through various learning activities such as reading (Pigada & Schmitt, 2006; Webb & Chang, 2015), listening (van Zeeland & Schmitt, 2013; Vidal, 2003), speaking (Newton, 2013), video viewing (Peters & Webb, 2018), writing (Folse, 2006), and translating (Hummel, 2010). So far, the majority of studies on incidental vocabulary learning have been conducted through reading (Webb & Nation, 2017). Learners can control their reading pace and reflect on unfamiliar words while having textual input at their disposal, making reading a specialized area of research in the incidental word-learning domain.
The main factors influencing incidental word learning during reading include repetition of new words (Webb, 2007), word-processing tasks (Laufer & Rozovski-Roitblat, 2015), elaboration of word processing (Eckerth & Tavakoli, 2012), dictionary use (Liu, Fan, & Paas, 2014), and glosses (Yoshii, 2014). Incidental learning of new words does occur while reading, but incrementally and in small quantities. Two main issues should be considered when determining whether reading is a good source of incidental word learning. First, context may hinder the inferred meanings of new words. The meaning of an unknown word may be obvious in some sentences but obscured in others; hence, incidental word learning is affected by the strength of contextual clues (Laufer, 1997; Teng, 2016; Webb, 2008). Second, encountering a new word only once does not aid in establishing a form–meaning link in a learner’s mental lexicon (Waring & Takaki, 2003). These two issues highlight the importance of marginal glosses and word exposure frequency for vocabulary growth. In terms of vocabulary learning, providing students with first-language (L1) translations of unknown words and increasing the frequency of word encounters may help learners to acquire new words (Webb & Nation, 2017). However, no studies have been conducted examining the combined effects of L1 marginal glosses and word exposure frequency in an English as a foreign language (EFL) context. The present study, focusing on university-level Chinese EFL students, explores the retention of new lexical items learned incidentally from reading by considering two variables – the frequency of exposure to target words and reading conditions (i.e. reading only or reading with L1 marginal glosses) – and combinations thereof.
II Literature review
1 Effects of word exposure frequency on incidental word learning
Previous studies on incidental word learning while reading demonstrated that the number of encounters with a new word may affect the learning and retention of vocabulary knowledge (McLeod & McDade, 2011; Pigada & Schmitt, 2006; Rott, 2007). Repeated encounters with target words have been shown to facilitate the learning of new words in small increments (Schmitt, 2010). For example, Waring and Takaki (2003) explored incidental learning of new words by having participants read a graded reader, A Little Princess, and recording the readers’ overall gains. Immediately after reading, the learners recognized the form of 15.3 out of 25 target words (61.2%); they recognized and recalled the meaning of 10.6 (42.4%) and 4.6 (18.4%) words, respectively. One-week-delayed tests indicated a slight decrease in their vocabulary knowledge as students recognized the meaning of 7.9 words (31.6%) and recalled the meaning of 1.9 words (7.6%); however, they managed to recognize the form of 11.1 words (44.4%). Students’ vocabulary knowledge declined significantly after 3 months; they were only able to recall the meaning of one word. The positive effects of word exposure frequency appeared when opportunities for learners to acquire frequently occurring words increased.
Webb (2007) explored the effects of repetition on learning using 10 dimensions of vocabulary knowledge. The occurrence frequencies of target words included one, three, seven, and 10 encounters, with results suggesting that all dimensions of vocabulary knowledge can be improved with an increasing number of encounters. This finding confirmed earlier work noting the positive cumulative effects of occurrence frequency on the incidental learning of new words (e.g. Horst, Cobb, & Meara, 1998). Webb also discovered that learners acquired the meanings of words more slowly than other aspects of vocabulary knowledge, consistent with previous results (Schmitt, 2010). In addition, Webb suggested that 10 encounters are needed to see a substantial improvement in learning vocabulary knowledge; he mentioned that over 20 encounters are required to develop complete understanding of a new word. Similar results were found in other work (e.g. Waring & Takaki, 2003).
Chen and Teng (2017) examined the learning of 15 words that occurred incidentally during reading. Four dimensions of vocabulary knowledge were measured, namely the recognition and recall of word form and meaning, respectively. A single encounter with new words resulted in respective 0%, 10%, 20%, and 18% improvements in retention as measured by these four dimensions of vocabulary knowledge. The results of five encounters indicated that recall of form increased to 4%, recall of meaning rose to 14%, recognition of form climbed to 34%, and recognition of word meaning was enhanced to 32%. The results of 10 encounters showed subsequent increases in word form recall (34%), meaning recall (38%), form recognition (54%), and meaning recognition (46%). However, encountering a new word 10 times did not necessarily guarantee that learners would develop complete vocabulary knowledge. In fact, Chen and Teng’s (2017) study found that incidental word learning was affected by the target word’s part of speech. For example, students learned verbs more easily than nouns and nouns more easily than adjectives (Zimmerman, 2009). In line with previous studies (Pellicer-Sánchez, 2017; Pellicer-Sánchez & Schmitt, 2010; Teng, 2014), word meaning recognition requires greater word occurrence frequency than recognition of the word form itself.
The aforementioned studies revealed similar findings. For instance, the number of times a word is encountered while reading affects the incidental learning of new words. Learners who encounter an unknown word more often may demonstrate significantly larger gains in vocabulary knowledge compared to learners who encounter an unknown word fewer times. However, results pertaining to the precise number of word encounters are inconsistent. Waring and Takaki (2003) recommended eight encounters for better retention of the meaning of new words. Webb (2007) found that, after 10 encounters with an unknown word, learners might be better able to recognize its spelling, meaning, part of speech, and other words associated with it. Pellicer-Sánchez (2017) identified no significant differences between two experimental groups (a four-repetition group and an eight-repetition group) on form recognition, meaning recall, meaning recognition, collocation recall, and collocation recognition. Incidental word learning is expected to vary given its relationship with other factors, such as learners’ lexical proficiency (Tekmen & Daloglu, 2006), word type and function in speech (Zimmerman, 2009), and the context surrounding a word (Webb, 2008). In an empirical study (Teng, 2016), frequency alone did not appear to determine the likelihood of acquisition; the effects of context may have a more pronounced influence on gaining knowledge of a word incidentally while reading. Thus, the present study aimed to address the question of whether marginal glosses, a means of helping learners grasp contextual clues surrounding a new word, can help learners compensate for the limited effects of word exposure frequency and strengthen knowledge of the form and meaning of unknown words.
2 Effects of marginal glosses on word learning
A marginal gloss is a notation used to help a learner grasp the meaning of reading materials by providing additional information. Glosses are generally presented in the page margins for reference, serving as mediators between the text and the reader. Glosses provide specific definitions for difficult words to help readers decode the text. Extant literature has attempted to measure the effects of marginal glosses on word learning. For example, Eckerth and Tavakoli (2012) developed three tasks to explore the effects of marginal glosses on learning word meaning. The first task was based on marginal glosses included as a part of the reading material. Results revealed that the presence of marginal glosses had a statistically significant and positive effect on the learning of a new word’s meaning; however, no significant effects were found in subsequent 2-week delayed tests. Cheng and Good (2009) and Miyasako (2002) reported similar results, concluding that marginal glosses have a positive effect on word learning.
Hulstijn, Hollander, and Greidanus (1996) also examined the effects of marginal glosses and the frequency of the occurrence of unknown words on incidental word learning. In their study, Dutch students learning French were divided into three groups and asked to read a short story in French. The three groups included students provided with L1 marginal glosses, students using dictionaries, and a control group of students who received no additional information while reading. In the text used in the experiment, each of the 16 target words appeared either once or three times. The findings revealed that additional encounters with the new words fostered incidental word learning when learners were provided with L1 marginal glosses. Incidental word learning was also enhanced by the use of dictionaries. However, because their study was limited to either one or three encounters, possible outcomes when learners experience additional word encounters remain unclear.
Yoshii (2014) examined the effects of glosses on incidental word learning during readings in which 20 target words were introduced in a 300-word text read by 39 Japanese tertiary-level EFL learners. The results showed that glossing may exhibit a significant positive effect on word learning because the rates for correctly identifying words were extremely high in both tests: 94% for the immediate test and 80% for the delayed test. This finding indicates that glossing may help learners establish initial connections between form and meaning to facilitate word learning. These results coincided with those of Lomicka (1998) and O’Donnell (2013), who noted that students with access to glossing demonstrated an improvement in the number of causal inferences.
Given the results of the aforementioned studies, glosses seem to direct readers’ attention to unfamiliar words and encourage processing of the words’ meanings during reading. However, learners tend to skip over glossed words, referring directly to the glossed definition (O’Donnell, 2013). L1 glosses, which have been assumed to have direct and clear links between words and meanings, do not always enable students to remember the meaning of new words in a difficult text (Cheng & Good, 2009). Therefore, the extent to which glosses support incidental word learning warrants further consideration. The various results of using glosses for incidental word learning could be attributed to other factors, including students’ proficiency (Yoshii, 2014), the level of text difficulty (Ko, 2005), and the frequency of new word exposure (Teng, 2016). As proposed by Hulstijn et al. (1996), L2 learners may benefit from the combined effects of the reoccurrence of unknown words and the provision of information concerning their meanings. To increase the likelihood of incidental vocabulary learning, educational scholarship should evaluate how a generally low rate of incidental word learning can be improved. Among the reviewed factors, word exposure frequency and glosses are the main features affecting incidental word learning. These two points are also characteristic of many real-life EFL reading situations: (1) the print medium remains the most common way in which EFL learners read and interact with texts; (2) the text contains several words with which EFL learners are unfamiliar; (3) EFL learners require additional information about new words; (4) EFL learners need more incidental exposure to new words. Thus, it is necessary to examine the combined effects of word occurrences and L1 marginal glosses on incidental word learning in an EFL reading context.
III The present study
This experiment involved tertiary-level first-year students, differentiating it from the methodology employed in previous research in several aspects. First, in the present study, the combined effects of L1 marginal glosses and word exposure frequency on incidental word learning were measured, contributing to the limited body of empirical literature that combines these two variables. The present study aimed to discover an optimal combination of L1 marginal glosses and word exposure frequency.
Second, although the combined effects of word exposure frequency and L1 marginal glosses were measured by Hulstijn et al. (1996), target words were repeated no more than three times, which may not have provided sufficient evidence to measure the effects of word exposure frequency. The present study extended previous findings by including words that occurred seven times. Finally, this study intended to contribute to an exploration of the cumulative effects of repeatedly encountering new words on incidental word learning by using actual English words rather than non-words as in previous studies, which may have limited authenticity of the findings.
Overall, the results of the present study will provide insight into incidental word learning while reading. The following research questions are addressed in the present study:
What is the main effect of L1 marginal gloss on incidental word retention?
What is the main effect of frequency of occurrence on incidental word retention?
Is there an interaction effect between frequency of occurrence and L1 marginal gloss on incidental word retention?
IV Method
1 Research design
Drawing on previous studies that focused on the number of exposures, task types, and their combinations (Laufer & Rozovski-Roitblat, 2011, 2015), this study adopted a 2 × 3 between-participants design (type of reading input × number of target word occurrences), resulting in six conditions (see Table 1). Reading conditions included reading with L1 marginal glosses (R+L1) and without L1 marginal glosses (R). Following from Webb (2007), the independent variable – word exposure frequency – was operationalized at three levels: one occurrence, three occurrences, and seven occurrences. In this mixed-design study, 240 participants were divided randomly and equally into six groups, with each condition including 40 students. All groups were exposed to the same set of 15 target words.
Table 1. Details on the results of the Vocabulary Levels Test (VLT) in each condition.
Note. Standard deviations are presented in parentheses. The maximum score for each frequency band is 30 points.
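The random and equal allocation of 240 participants to the six cells of the 2 × 3 design can be illustrated with a minimal sketch. The function name, the participant IDs, and the fixed seed are hypothetical and serve illustration only; the article does not report how the randomization was carried out.

```python
import random
from itertools import product


def assign_conditions(participant_ids, seed=0):
    """Randomly split participants equally across the 2 x 3 conditions."""
    reading = ["R", "R+L1"]   # reading only vs. reading with L1 marginal glosses
    encounters = [1, 3, 7]    # number of target-word occurrences
    conditions = list(product(reading, encounters))

    ids = list(participant_ids)
    if len(ids) % len(conditions) != 0:
        raise ValueError("participants cannot be split equally across conditions")

    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    rng.shuffle(ids)

    size = len(ids) // len(conditions)
    return {
        cond: ids[i * size:(i + 1) * size]
        for i, cond in enumerate(conditions)
    }


# Hypothetical participant IDs 1..240
groups = assign_conditions(range(1, 241))
```

Each key of `groups` is a (reading condition, number of encounters) pair, and each value holds the 40 participants assigned to that cell.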
2 Participants
A total of 240 participants (130 males and 110 females) were selected from 10 intact parallel classes held at a private university in China. Participants ranged in age from 18 to 20 years. All were native Chinese speakers and first-year, non-English-major students who had studied EFL for at least 6 years.
Originally, 380 participants were recruited, 240 of whom were included based on the criterion of possessing similar levels of vocabulary proficiency. Three months before the experiment, potential participants completed the Vocabulary Levels Test (VLT) developed by Schmitt, Schmitt, and Clapham (2001). Detailed test results are presented in Table 1. The 240 selected participants met the cut-off score of 26 out of 30 for the 2,000-word level (Schmitt et al., 2001), indicating mastery at this level. The analysis of variance (ANOVA) results revealed no significant differences between the conditions for each word level, suggesting the groups had a similar vocabulary size. Reliability indices (Cronbach’s alpha) for all Levels sections ranged from .91 to .93, indicating sound reliability.
3 Target words
Based on the researcher’s teaching experience, 200 difficult words were chosen. The learners in each of the six groups were instructed to review a checklist of the words and check off any unknown words. The learners reported 50 unknown words, from which 15 words were selected for inclusion in the reading passages (Table 2). Words included in the pilot test appeared in the reading passages. The words were chosen based on the following criteria: First, target words included the most common parts of speech found in natural texts: nouns, verbs, and adjectives (Webb, 2008). The selected word set consisted of five nouns, five verbs, and five adjectives. Second, target words did not vary in length (9–10 letters each). Finally, target words had only one L1 translation.
Table 2. A list of target words.
The checklist was provided in July, 2 months before the experiment. Participants in this EFL context were unlikely to encounter the target words in the interim, as the terms were uncommon in the students’ regular learning domain; the university was also on summer break. Participants were likely to forget the target words on the checklist, which enhanced the validity of post-test data.
4 Reading materials
Reading materials consisted of 30 passages selected from a story-reading book (Lv & Su, 2009). The average length of each passage was 1,100 words, and the main reading included 15 passages. The remaining 15 passages were used as distracter readings that attempted to divert the learners’ attention from the main reading.
Only one of the 15 target words was placed in each main reading passage, which was written in three versions. Version 1 contained one target word that occurred once, Version 2 contained one target word that occurred three times, and Version 3 contained one target word that occurred seven times. Participants in the ‘reading with L1 marginal glosses’ group encountered the target words once (Version 1), three times (Version 2), and seven times (Version 3). Participants in the ‘reading without L1 marginal glosses’ condition read the same texts without the L1 marginal glosses. The learning order of the 15 target words was identical for all participants. This step was necessary because learners’ working memory has been shown to yield different levels of acquisition depending on the order in which words are presented (Ellis & Sinclair, 1996). Each distracter reading was edited in the same way; the only difference was that the target words were not included in the distracter reading.
Students in both reading conditions were not allowed to use a dictionary or other learning aid. As proposed by Hulstijn et al. (1996), if dictionary consultations are permitted, then the retention of word meanings provided in the margin may be affected. One aim of the present study was to examine the effects of marginal glosses on word-learning retention. However, as repetition of the target words increased, the text would become less dense, easier, and thus potentially not comparable in terms of the level of difficulty. To address this issue, two experienced college English teachers were invited to edit the texts using words within the 2,000-word frequency level. As the two teachers were familiar with the students’ English proficiency, texts were revised to align with students’ skill level. This procedure sought to ensure the learners in the formal investigation would be motivated to read the article and would not become frustrated to the point of giving up upon encountering difficult words. A native English-speaking teacher was then invited to examine the passages and ensure the language flowed properly. In addition, the Flesch Reading Ease Score, a measure for assessing the difficulty of texts that had been used in previous studies (e.g. Crossley, Allen, & McNamara, 2011), was calculated for the 30 texts. Results revealed the average Flesch score to be 88.83, with the highest score (89.43) for Text 1 and the lowest score (87.75) for Text 5. Koda (2005) argued that text readability is based on features that operate at different levels of linguistic processing, including lexical, syntactic, semantic, and discourse features; however, readability formulas (e.g. the Flesch Reading Ease Score) are based on two linguistic features (i.e. lexical and syntactic). In the present study, word level was controlled in the texts, and the texts neither included technical terms, alternative forms of words (e.g. lemmas), or proper nouns nor required background knowledge for comprehension. Thus, the level of difficulty of the texts could be determined based on the Flesch scores, and the minor differences in the Flesch scores show that the edited texts used in the different conditions were broadly at the same level of difficulty.
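For readers unfamiliar with the measure, the Flesch Reading Ease Score discussed above can be sketched as follows. The article does not state the formula; the version below is the standard published one (206.835 − 1.015 × words per sentence − 84.6 × syllables per word). The counts in the example are hypothetical and are not taken from the study’s texts, and the sketch assumes that word, sentence, and syllable counts have already been obtained.

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Standard Flesch Reading Ease formula; higher scores mean easier text."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for a 1,100-word passage (not data from the study)
example_score = flesch_reading_ease(
    total_words=1100, total_sentences=110, total_syllables=1320
)
```

Because the formula draws only on sentence length and syllable counts, it reflects the lexical and syntactic levels noted above and leaves semantic and discourse features unmeasured.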
Ten reading comprehension questions were provided at the end of each passage. These questions did not directly test learners’ understanding of the target words; they were developed based on the recall of facts appearing directly in the text. Answers could typically be found in multiple places, thus requiring learners to search through the passage to locate the answers. The comprehension questions were developed for several reasons. First, these items were intended to engage the learners in reading comprehension. Second, the conditions aimed to facilitate incidental word learning, as learners’ attention was diverted from certain unknown words and focused on an understanding of the text as a whole. Finally, these questions were developed to ensure that no comprehension-related difficulties would affect gains in vocabulary learning.
The word frequency in each text was analysed and confirmed using the Lexical Frequency Profiling (LFP) program, available on Tom Cobb’s website (lextutor.ca). Results indicated that most non-target words appeared in the British National Corpus’ list of the 2,000 most frequently used words, accounting for at least 96%–98% coverage of each text and implying that the edited texts were highly comparable; unknown words comprised less than 4% of each passage. As demonstrated by the VLT results, participants achieved mastery of the 2,000 most frequently used words, concurring with previous findings in which only learners with an understanding of at least 95% of words used were able to read the texts easily and comfortably (Hu & Nation, 2000; Laufer, 2013).
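The coverage figures reported above can be understood as the share of running words in a text that belong to a given frequency list. The following is a simplified sketch of that calculation, not a reproduction of the LFP program: it matches lower-cased tokens against a plain word set rather than word families, and the function name, the sample sentence, and the small word list are hypothetical.

```python
import re


def lexical_coverage(text, known_words):
    """Percentage of running words (tokens) found in the known-word list."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    known = sum(1 for token in tokens if token in known_words)
    return 100.0 * known / len(tokens)


# Hypothetical stand-in for a high-frequency word list
high_frequency = {"the", "girl", "went", "on", "an", "with", "dog"}
sample = "The girl went on an adventure with the dog."
coverage = lexical_coverage(sample, high_frequency)
```

In this toy sentence, eight of the nine tokens are on the list, so a single unknown word already lowers coverage to roughly 89%, which illustrates why coverage of 95% or more requires long passages with very few unfamiliar items.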
5 L1 marginal glosses
In the EFL context, glosses generally clarify difficult words by using definitions or synonyms. L1 marginal glosses in the present study provided explanations of the target words in Chinese and were emphasized because of their prevalence when providing reading instruction to EFL students in China.
In the ‘reading with L1 marginal glosses’ group, target words, along with several difficult words, were highlighted in bold type, and the Chinese meaning was provided in the right-hand margin. Words appearing more than once were accompanied by glosses for only the first encounter; other difficult words were accompanied by glosses to distract readers from focusing on target words exclusively.
6 Measurement instruments
A post-test containing four dimensions of vocabulary knowledge was adapted from a previous study (Laufer & Rozovski-Roitblat, 2015) and was administered to all participants to measure vocabulary knowledge. The four dimensions were as follows:
Active recall: This dimension measures the retrieval of a target word for a given meaning; that is, it measures whether learners can supply the L2 target word to fit the meaning prompt. In this study, learners were required to write the target word in English based on its Chinese equivalent. The first letter of the word was provided. For example:
A _______ (the target word is ‘adventure’) 冒险
Passive recall: This is a translation test to measure whether learners can supply an L1 translation of the L2 target word. Participants were required to supply a meaning in Chinese for the original English word. For example:
Adventure_____
Active recognition: This is a meaning-matching test to measure whether learners can identify the correct target word among four options in L2; hence, the prompt is the L1 translation. For example:
冒险 A. Detriment B. Adventure C. Jubilation D. Jeopardy
Passive recognition: This is a meaning-matching test to measure whether learners can identify the correct translation of the L2 target word among four options. For example:
Adventure A. 灾害 B. 冒险 C. 欢庆 D. 猜忌
The test was administered 2 weeks after the text reading. In the interim, participants were not exposed to any of the targeted materials, and the target words were not introduced in their college English courses. As the participants were non-English majors in a Chinese-speaking setting, they were not likely to encounter the words outside of the reading exercise. Because the purpose of the study was to measure new word retention, no immediate post-test was administered; the only test was given 2 weeks after the reading session.
The four dimensions of vocabulary knowledge were measured during two different lessons on two separate days. On the first day, the learners were required to take the active recall test before the first lesson; they took the passive recall test after the second lesson. A time gap was maintained to accommodate regular lessons. Learners also completed the other two test dimensions, active and passive recognition, in the same manner the next day (active recognition first, followed by passive recognition). This procedure minimized any potential clues the preceding test might provide regarding the subsequent test, as learners could potentially identify clues from the passive recall test for the active recall test. The order in which target words were presented in each test dimension was different, and learners were not notified that a test would be administered at a later date. Learners’ demonstrated improvement in word learning as indicated by the test results was assumed to be the outcome of incidental word learning (Hulstijn, 2013).
7 Scoring system
Two experienced raters independently scored the responses for all test dimensions. The raters were not teaching the participants; thus, the risk of scoring bias was eliminated. However, the assistance of a third rater would be sought if the original two raters demonstrated scoring discrepancies. Participants who provided a correct target word based on its given Chinese meaning were awarded one point. When the word was misspelled (e.g. ‘adventre’ or ‘aventure’ instead of ‘adventure’), the response was considered incorrect. Participants who provided a semantically correct Chinese meaning in the passive recall test were granted one point. In addition, learners who selected one correct option out of four on the active and passive recognition tests received one point. In cases in which either no answer or an incorrect answer was provided, the learner scored zero points. The maximum score for each test dimension in each group was 15, as each group was tested on 15 target words. No disagreements between the two raters emerged when marking the test, and a third rater was therefore not needed.
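The dichotomous scoring rule described above can be sketched for the tests with a single fixed answer (active recall and the two recognition tests). Passive recall relied on raters judging whether a Chinese meaning was semantically correct and is not modelled here. The function name, the two-item answer key, and the case-insensitive comparison are assumptions made for illustration.

```python
def score_test(responses, answer_key):
    """Award one point per exact match; blanks and misspellings score zero."""
    score = 0
    for item, correct in answer_key.items():
        given = responses.get(item, "").strip().lower()  # missing answer -> ""
        if given == correct.lower():
            score += 1
    return score


# Hypothetical two-item answer key and one learner's responses
answer_key = {1: "adventure", 2: "jeopardy"}
responses = {1: "adventre", 2: "Jeopardy"}
learner_score = score_test(responses, answer_key)
```

With the full set of 15 target words in the answer key, the maximum returned score is 15, matching the maximum for each test dimension.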
8 Procedure
This study was conducted in three phases: a pre-study vocabulary checklist, a treatment session, and a post-study 2-week-delayed test. Six English teachers with about 6 years of experience teaching English were invited to participate. They were familiarized with the study procedure through a group meeting. Each teacher was randomly assigned to one group or condition. The teachers instructed the participants that they would be completing reading comprehension questions without having any supplementary resources at their disposal. The students participated in silent reading; the use of a dictionary was not allowed. Teachers were expected to monitor the experimental process and to distribute and collect reading and test materials.
The learners in each condition group read 30 passages, each of which was three pages long. The 10 comprehension questions were provided on a separate page. The teacher distributed each passage separately. The 15 distracter passages in each condition were completed over five lessons (D1 to D5); the 15 main passages containing the target words were completed in another five lessons (M6 to M10). The main reading was included in the last five lessons to reduce the time gap between the test and learners’ encounters with the target lexical items. Each lesson lasted 90 minutes. Participants were required to read the three texts in a single lesson, but they received the texts individually. Thirty minutes were allotted for reading one text and completing the corresponding questions. This duration was suggested by the teachers and piloted with 40 students with similar backgrounds and proficiency levels. The students reported that the reading time was reasonable. Participants attended two lessons per week, and the full experiment lasted for 5 weeks.
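The timing reported above follows from simple arithmetic, which the sketch below makes explicit. The function and parameter names are hypothetical; the default values are the figures given in this section.

```python
def reading_schedule(passages=30, passages_per_lesson=3,
                     minutes_per_passage=30, lessons_per_week=2):
    """Derive lesson count, lesson length, and duration from the design figures."""
    lessons, leftover = divmod(passages, passages_per_lesson)
    if leftover:
        raise ValueError("passages do not divide evenly into lessons")
    return {
        "lessons": lessons,
        "minutes_per_lesson": passages_per_lesson * minutes_per_passage,
        "weeks": lessons / lessons_per_week,
    }


plan = reading_schedule()
```

With the default figures, `plan` contains 10 lessons of 90 minutes each spread over 5 weeks, consistent with five distracter lessons (D1 to D5) followed by five main lessons (M6 to M10).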
Waring and Takaki (2003) observed that the meanings of items encountered fewer than eight times during reading were not remembered 3 months later. Following Laufer and Rozovski-Roitblat (2015), a 2-week delay was deemed reasonable for measuring learners’ ability to retain newly acquired words, as retention over this interval tends to be relatively stable. No time limit was set for the post-test. As argued by Nation and Webb (2011), time-on-task is a variable that exerts a large effect on vocabulary outcomes, and the effect may not be apparent when time is controlled. Therefore, participants were allowed to take as much time as needed to complete the tests.
9 Ethical issues
With approval from the College English Teaching department at the chosen university, participants earned extra course credit for taking part in this study. Voluntary consent forms were obtained from each participant, who was entitled to withdraw at any time without penalty. Participants were informed they would need to complete some readings and related exercises; however, the learners were not told they would be taking a test. The test was not disclosed so as to prevent learners from devoting additional effort to the test or becoming anxious about being assessed. Although an extensive amount of reading was required for the experiment, the amount was normal for these participants based on the requirements outlined in their English course syllabus.
10 Data analysis
The effects of L1 marginal gloss and word exposure frequency on incidental vocabulary learning, and the interaction between the two independent variables, were analysed with multivariate analysis of variance (MANOVA) in SPSS Statistics 25.0. The assumptions of normality and inter-correlation were met. First, the data were normally distributed (the p value was larger than .05 for the Shapiro-Wilk test, and the skewness and kurtosis values were .359 and .342 respectively) (George & Mallery, 2010). Second, the four post-test dimensions largely conformed to the MANOVA requirement of inter-correlations, which ranged from .35 to .45. The four dimensions of the post-test were analysed as dependent variables, and frequency of occurrence and gloss type were analysed as independent variables, to test for main effects and their interaction. Post-hoc pairwise comparisons were used to examine differences between groups. The significance level was set at .05.
V Results
Table 3 summarizes the descriptive statistics for the four tests. Within each word occurrence frequency condition, reading with L1 marginal glosses (R+L1) led to higher scores than reading without L1 marginal glosses (R). Additional encounters with the target words yielded better scores within each reading condition. Among the six groups, the group that read with L1 marginal glosses and encountered each word seven times achieved the best results on all four test parts (Active recall: 6.00; Passive recall: 7.03; Active recognition: 8.95; and Passive recognition: 12.63).
Table 3. Word retention scores per degree of word knowledge (maximum = 15).
Box’s Test of Equality of Covariance Matrices (Box’s M test) evaluates the null hypothesis that the observed covariance matrices of the dependent variables are equal across groups. The Box’s M statistic is transformed to an F statistic with df1 and df2 degrees of freedom (df1 = 50, df2 = 100497.306). The significance value of this test was less than .05, suggesting that the MANOVA assumption of homogeneity of variance-covariance matrices was violated. Pillai’s trace, rather than Wilks’ Lambda, was therefore used as the multivariate test statistic for the main and interaction effects (Cramer & Howitt, 2004). The results in Table 4 demonstrate a significant effect of L1 marginal gloss (V = .742, F (4, 231) = 165.814, p < .001, ηp2 = .742) and of word exposure frequency (V = .892, F (8, 464) = 46.719, p < .001, ηp2 = .446) on incidental word learning, as well as a significant interaction effect of L1 marginal gloss and word exposure frequency on incidental word learning (V = .195, F (8, 464) = 6.258, p < .001, ηp2 = .097).
Table 4. Results of multivariate tests.
Notes. Design: Gloss + Word exposure frequency + Gloss * Word exposure frequency. aExact statistic.
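The multivariate statistics reported above can be cross-checked with a short calculation. The sketch below is illustrative and not part of the original SPSS analysis. It uses the standard F approximation for Pillai’s trace and takes partial eta squared as V / s, which appears consistent with the reported values. The function name `pillai_summary` is hypothetical, and the error degrees of freedom (240 participants minus 6 groups) are an assumption based on the sample size given in the abstract.

```python
def pillai_summary(v, n_dependent, effect_df, error_df):
    """Return (df1, df2, approximate F, partial eta squared) for a Pillai's trace value v."""
    s = min(n_dependent, effect_df)
    m = (abs(n_dependent - effect_df) - 1) / 2
    n = (error_df - n_dependent - 1) / 2
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    f_approx = (v / (s - v)) * (df2 / df1)
    partial_eta_sq = v / s
    return df1, df2, f_approx, partial_eta_sq


N_PARTICIPANTS = 240  # from the abstract
N_GROUPS = 6          # two reading conditions x three frequency conditions
N_DEPENDENT = 4       # four vocabulary knowledge dimensions
ERROR_DF = N_PARTICIPANTS - N_GROUPS  # assumption: all 240 learners were analysed

# Pillai's trace (rounded, as reported in the text) and hypothesis df for each effect.
effects = {
    "gloss": (0.742, 1),
    "frequency": (0.892, 2),
    "interaction": (0.195, 2),
}
summary = {
    name: pillai_summary(v, N_DEPENDENT, effect_df, ERROR_DF)
    for name, (v, effect_df) in effects.items()
}
```

The recomputed degrees of freedom match the reported F (4, 231) and F (8, 464). The recomputed F values differ slightly from those in the text because V is entered here rounded to three decimal places.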
None of the Levene’s tests of equality of variances for the dependent variables was significant (p > .05), which strengthens the case for treating the multivariate test statistics as robust. The univariate results (‘tests of between-participants effects’) were then examined to determine the significance of the independent variables for each of the four vocabulary knowledge dimensions. The results in Table 5 show main effects of L1 marginal gloss (p < .001, ηp2 = .558) and word exposure frequency (p < .001, ηp2 = .845), and an interaction between them (p < .001, ηp2 = .145), on active recall. For passive recall, there were main effects of L1 marginal gloss (p < .001, ηp2 = .668) and word exposure frequency (p < .001, ηp2 = .857), and an interaction between them (p < .001, ηp2 = .171). For active recognition, there were main effects of L1 marginal gloss (p < .001, ηp2 = .693) and word exposure frequency (p < .001, ηp2 = .765), and an interaction between them (p < .01, ηp2 = .045). For passive recognition, there were main effects of L1 marginal gloss (p < .001, ηp2 = .625) and word exposure frequency (p < .001, ηp2 = .778), and an interaction between them (p < .05, ηp2 = .031).
Table 5. Results of between-participants effects.
Notes. a R Squared = .873 (Adjusted R Squared = .870). b R Squared = .892 (Adjusted R Squared = .889). c R Squared = .848 (Adjusted R Squared = .844). d R Squared = .839 (Adjusted R Squared = .835).
Figures 1–4 show the interaction effect of gloss and word exposure frequency on the four vocabulary knowledge dimensions. Figure 1 shows the active recall results. The first contrast for the interaction compares level 1 (one exposure) with level 2 (three exposures). This contrast is highly significant (Table 5): the gain in scores from one to three exposures was significantly larger for reading with L1 marginal glosses than for reading without them. Accordingly, the slope for reading with L1 marginal glosses is steeper than the slope for reading without L1 marginal glosses. The second contrast compares level 2 (three exposures) with level 3 (seven exposures). Again, this contrast is highly significant (Table 5): the gain in scores from three to seven exposures was significantly larger for reading with L1 marginal glosses than for reading without them, and the slope for reading with L1 marginal glosses is again steeper. In both contrasts, therefore, the benefit of additional exposures was more pronounced for reading with L1 marginal glosses than for reading without L1 marginal glosses. Figures 2–4 show similar patterns for passive recall, active recognition, and passive recognition, respectively.

Figure 1. The effect of gloss type and number of encounters: Active recall.
Figure 2. The effect of gloss type and number of encounters: Passive recall.
Figure 3. The effect of gloss type and number of encounters: Active recognition.
Figure 4. The effect of gloss type and number of encounters: Passive recognition.
The post-hoc pairwise comparisons of the reading conditions in each of the word occurrence frequency conditions are presented in Table 6. The results show that the R+L1 condition yielded significantly better results than the R condition, irrespective of the word frequency condition and the test type (p < .001). That is, when the number of encounters with the words was identical, reading with L1 marginal glosses led to significantly more retained words than reading without L1 marginal glosses.
Table 6. Post-hoc pairwise comparisons of reading conditions in each frequency band.
Notes. Based on estimated marginal means. *The mean difference is significant at the .05 level. aAdjustment for multiple comparisons: Bonferroni.
The post-hoc pairwise comparisons also show to what extent differences occurred in retaining new words between the three word occurrence frequency conditions – one, three, and seven – within each reading condition. The results listed in Table 7 show that a small increase in word encounters contributed to better new word retention in both reading conditions. The pattern that additional encounters with new words appeared to significantly increase new word retention was also consistent across dimensions of vocabulary knowledge (p < .001).
Table 7. Post-hoc pairwise comparisons of word exposure frequency in each reading condition.
Notes. Based on estimated marginal means. *The mean difference is significant at the .05 level. aAdjustment for multiple comparisons: Bonferroni.
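The Bonferroni adjustment named in the table notes can be sketched as follows. This is a generic illustration of the correction, in which each raw p value is multiplied by the number of comparisons and capped at 1. It does not reproduce the study's output: the function name `bonferroni_adjust` and the raw p values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p value by the number of comparisons, capping the result at 1."""
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]


# Three frequency levels (one, three, seven encounters) give three pairwise comparisons.
# The raw p values below are hypothetical and serve only to show the arithmetic.
raw_p = {"1 vs 3": 0.004, "1 vs 7": 0.0001, "3 vs 7": 0.02}
adjusted = dict(zip(raw_p, bonferroni_adjust(list(raw_p.values()))))

# A comparison counts as significant when its adjusted p value is below .05.
significant = {pair: p < 0.05 for pair, p in adjusted.items()}
```

In this hypothetical case the ‘3 vs 7’ comparison would be significant before adjustment (.02) but not after it (.06), which shows why the adjusted values are the ones to interpret.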
The post-hoc pairwise comparisons between the six experimental conditions (i.e. combinations of word occurrence frequency and reading conditions) are listed in Table 8. In the active recall test, learners who encountered words three times in the R condition did not retain the new words significantly better than those who encountered them only once in the R+L1 condition (p > .05). Likewise, learners who encountered words seven times in the R condition did not retain the words significantly better than those who encountered them three times in the R+L1 condition (p > .05). Seven encounters in the R+L1 condition yielded the best results in active recall. In passive recall, the combination of three encounters and the R condition was not more effective than the combination of one encounter and the R+L1 condition (p > .05). Relatedly, the combination of seven encounters and the R condition was not more effective than the combination of three encounters and the R+L1 condition (p > .05). Again, seven encounters in the R+L1 condition yielded the best results in passive recall. In terms of active recognition, the only exception was that learners who encountered words seven times in the R condition did not retain the words significantly better than those who encountered them three times in the R+L1 condition (p > .05). The combination yielding the best retention results was still seven encounters in the R+L1 condition. The results for passive recognition showed a pattern similar to that for active recall and passive recall: three encounters in the R condition were not more effective than one encounter in the R+L1 condition (p > .05), and seven encounters in the R condition were not more effective than three encounters in the R+L1 condition (p > .05). Again, seven encounters in the R+L1 condition yielded the best results in passive recognition.
Therefore, the evidence suggests that the combination of seven encounters and R+L1 yielded the best results, significantly better than any of the other five combinations (p < .001). This finding was consistent across all four dimensions of vocabulary knowledge.
Table 8. Post-hoc pairwise comparisons of gloss and ‘number of encounters’ combinations.
Notes. ***p < .001, **p < .01, *p < .05.
VI Discussion and conclusions
The present study explored the retention of 15 unknown words under two variables and their combinations. In total, six experimental combinations (two reading conditions × three word occurrence frequency conditions) were examined. The learners encountered 15 target words in each combination and were assessed through one post-test, administered 2 weeks after the reading treatment, which measured four dimensions of vocabulary knowledge. The discussion below is organized by research question.
1 Effects of L1 marginal glosses
The first research question explored the main effect of L1 marginal gloss on incidental word retention. The results showed that reading accompanied by L1 marginal glosses was more effective than reading without L1 marginal glosses, irrespective of the vocabulary knowledge dimension. The significant effect of L1 marginal glosses on retaining new words is consistent with previous studies (Hulstijn et al., 1996; Rott, 2007). For example, Hulstijn et al. (1996) demonstrated that learners’ retention performance in the ‘marginal glosses’ group for words that occurred three times was 2.6 out of 8 (32%); learners in the present study achieved 4.60 out of 15 (30.6%) in passive recall. This outcome is slightly lower than that reported by Hulstijn et al. (1996), presumably because the contextual hints provided with the test items in that study made the test less demanding and thus led to better performance. In the study by Hulstijn et al. (1996), learners were required to provide meanings of the target words. However, the words were not provided separately; they appeared in the context of a few lines taken from the original text. This test was similar to the passive recall test employed in the present study, in which learners were also asked to supply the meaning of the target words, except that here the words were presented in isolation. Although measuring vocabulary in context could arguably be a more valid method (Read, 2000), a demanding test that measures word learning in a decontextualized condition may provide a more accurate assessment (Webb, 2007). In addition, Hulstijn et al. (1996) focused on assessing only one aspect of vocabulary knowledge, which was unlikely to reveal the full extent of learning. Thus, evaluating different dimensions of vocabulary knowledge, as in the present study, may be more accurate.
In Rott’s (2007) study, three dimensions of vocabulary knowledge were measured independently: active recall, passive recall, and passive recognition. These were similar to those assessed in the present study; however, the retention performance for words occurring four times in the gloss-bolding condition in Rott’s (2007) work was 0.11 out of 24 (0.4%) for active recall, 1.57 out of 24 (6.5%) for passive recall, and 3.26 out of 24 (13.5%) for passive recognition. These results were lower than those for words occurring three times in the ‘reading supplemented with glosses’ condition in the present study, in which learners achieved 3.58 out of 15 (23.8%) for active recall, 4.60 out of 15 (30.6%) for passive recall, and 10.05 out of 15 (67%) for passive recognition. These differences could be due to the length of time before the post-test was administered. The post-test in Rott’s (2007) study was administered 4–6 weeks after treatment, whereas that in the present study was administered only 2 weeks after treatment. As suggested by Waring and Takaki (2003), learners exhibit significantly decreased performance in new word retention after a longer intervening period. Word retention requires the isolation of the lexical form from its context along with some type of elaboration of meaning or rehearsal in working memory (Hulstijn, 2001); as such, learners who have not been exposed to a target word for more than a month may easily forget its lexical form and unique meaning.
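The retention rates compared above are mean scores expressed as a percentage of the maximum score. The sketch below recomputes them from the means quoted in the text. The function name `retention_rate` is hypothetical, and rounding to one decimal place is an assumption: several percentages in the text (e.g. 23.8% for 3.58 out of 15) appear to be truncated rather than rounded, so the recomputed figures can differ by 0.1.

```python
def retention_rate(mean_score, max_score):
    """Express a mean test score as a percentage of the maximum possible score."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return 100 * mean_score / max_score


# (mean score, maximum score) pairs as quoted in the text.
quoted_means = {
    "Rott (2007), active recall": (0.11, 24),
    "Rott (2007), passive recall": (1.57, 24),
    "Rott (2007), passive recognition": (3.26, 24),
    "Present study, active recall": (3.58, 15),
    "Present study, passive recall": (4.60, 15),
    "Present study, passive recognition": (10.05, 15),
}
rates = {
    label: round(retention_rate(mean, maximum), 1)
    for label, (mean, maximum) in quoted_means.items()
}
```

Expressing both studies on the same percentage scale is what makes the comparison possible, since the two tests had different maximum scores (24 and 15).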
The role of glosses in word learning has been well documented in previous studies (Eckerth & Tavakoli, 2012; O’Donnell, 2013; Yoshii, 2014). Glosses have been found to help participants notice and retain more target items than learners in a control group who read without glosses. This pattern may emerge because glosses prompt learners to detect and comprehend unknown vocabulary items, thereby strengthening word encoding in the mental lexicon (Hulstijn et al., 1996; Teng, 2018a). A plausible explanation for the weaker effects of the reading-only condition is that this condition was less obtrusive and failed to direct readers’ attention to the target words or trigger the processing of word forms and meanings. In contrast, when an L1 translation was provided in the margin along with the text, readers could access the encoded meaning in the mental lexicon. This link may reinforce L2–L1 word association, which has been described as a crucial step in lexical development (e.g. Jiang, 2002). In this work, glosses provided in the margin alongside the core text seemed to help learners deduce meaning and thus establish a connection between the lexical form of a word and its meaning.
2 Effects of word exposure frequency
The second research question explored the main effect of word exposure frequency in retaining new words. Results showed that participants in this study demonstrated significant gains in new word retention when target words were encountered more often, conforming to the findings of previous studies (e.g. Chen & Teng, 2017; Reynolds, 2018; Waring & Takaki, 2003; Webb, 2007; Webb & Chang, 2015). However, judging from partial learning gains in the present study, estimates vary in terms of the number of encounters needed for incidental word learning or retention from reading. It is difficult to specify an exact number of required word encounters from the evidence gathered during the present study. Incidental learning or the retention of lexical items appears to be a complex and incremental process, such that learning and retention are quite difficult to accomplish in a single encounter (Webb & Nation, 2017).
The absence of external information in relation to the meaning of new words might lead learners to ignore unfamiliar words or misinterpret their meanings. In addition, the recurrence of unfamiliar words and a push toward attending to them might decrease participants’ interest, thereby reducing their enthusiasm for encountering new words. Hence, although repetition is crucial, it is difficult to determine the exact number of encounters necessary for successful incidental word learning. The learners also experienced difficulties with learning some words; the mean score for the active recall test for words occurring seven times in the condition of reading without gloss was only 3.95 out of 15, mainly because active recall is more difficult than the other three dimensions of vocabulary knowledge. In addition, other variables may interact with repetition. For example, context may have a strong impact on the number of repetitions needed to gain knowledge of a word, particularly in the active recall of word meaning. Teng (2016) suggested that the availability of fewer clues for inferring a word’s meaning might lead learners to ignore other unknown words in the surrounding context. According to Laufer and Rozovski-Roitblat (2015), tasks that require focusing on words lead to better incidental word learning compared to a reading-only condition. Fewer exposures in a task, where some attention has been drawn to the words, are more effective than more exposures in the reading-only conditions.
3 Interaction effect of L1 marginal glosses and word exposure frequency
The third research question explored a possible interaction effect between frequency of occurrence and L1 marginal gloss on incidental word retention. The results show that the frequency effect was more pronounced in the ‘reading plus L1 marginal gloss’ condition than in the ‘reading without L1 marginal glosses’ condition across the four dimensions of vocabulary knowledge. In other words, the words in the R group required more exposures to reach the level of learning of the R+L1 group. The extra saliency provided by L1 marginal glosses modulated the effect of frequency. In the R-only condition, in which no glosses provided extra saliency, frequency was not a greater determining factor than in the reading plus L1 marginal gloss condition. As Laufer and Rozovski-Roitblat (2011) contended, a focus on words may be more efficient than additional exposure during reading without any focus on words.
Post-hoc results showed that a combination of reading with L1 marginal glosses and seven encounters with a word can greatly enhance the retention of incidentally learned words. In this study, the active recall of new words (6.00 out of 15, 40%) seemed relatively low; however, the score in passive recognition was 12.63 out of 15 (84.2%), suggesting that learners retained a substantial degree of vocabulary knowledge, especially in the passive recognition of new words. This finding is encouraging given that this study was based only on incidental learning, with a post-test administered 2 weeks after the reading treatment. In a previous study (Pellicer-Sánchez & Schmitt, 2010), learners were found to recognize only 70% of spelling forms after encountering the words more than 28 times. Thus, the combination of reading with L1 marginal glosses and encountering words seven times in the present study yields promising results for incidental word retention.
To the best of the author’s knowledge, no studies on incidental word learning (Pellicer-Sánchez, 2017; Waring & Takaki, 2003), including the present one, have demonstrated any evidence that complete receptive and productive vocabulary knowledge can be achieved incidentally from reading. The present study provides sufficient evidence to suggest that the best word retention results can be achieved by encountering a word at least seven times in a text accompanied by L1 marginal glosses. However, in the conditions where the word meaning was not provided, learners had only a 26–30% chance of recalling the word after 2 weeks, despite encountering it seven times. If a word was encountered fewer than three times, the chance of active recall of new words dropped to below 10%. Even with 95% known-word coverage assured, learners acquired only a negligible number of words. Arguably, this resulted from learners’ low motivation or lack of interest in the reading topic, which may have disrupted the flow of comprehension and reduced the attention given to unknown words. Although 2 weeks seems a reasonable window after which to measure retention (Laufer & Rozovski-Roitblat, 2015), the time between treatment completion and post-test administration should be considered as potentially influencing the retention of unknown words. The findings should also be interpreted with care given that there was no control group, that meaning recognition pre-tests were not administered, and that the time between post-tests may also have affected learning outcomes.
4 Implications
A few pedagogical implications for EFL teaching and learning can be proposed based on these findings. First, marginal glosses that accompany reading texts are effective in removing potential difficulties with incidental noticing, learning, and retention of new words (Hulstijn et al., 1996). The current study indicates that L1 marginal glosses can alleviate the cognitive and mental challenges linked with the retention of new words. This pattern offers some practical implications for instructional designers and course practitioners. As reading is a primary source of vocabulary growth, any tool that can relieve the lexical burden and facilitate word learning for students should be considered (Teng, 2018b). Teachers and instructional designers should therefore strive to provide learners with easily accessible glosses.
Second, as repeated encounters with target words can have a significant effect on the retention of new lexical items, teachers should pay special attention to the number of times a new word occurs in a text (Teng, 2018c). If possible, important words should be included and highlighted several times. Reading alone is not enough to acquire full vocabulary knowledge.
Finally, if combining L1 marginal glosses with repeated word exposure leads to better retention, then such exercises should be incorporated into course and materials design. However, the apparently drastic decrease in retention over time should still be taken into account. The four dimensions of vocabulary knowledge appeared to be correlated but also distinct. Some of these word knowledge aspects, such as recognition of word meaning, may be relatively sensitive to increased word encounters and L1 marginal glosses; the more contextualized aspects, such as recall of word meaning, may be more difficult. Vocabulary acquisition seems to be incremental, as does mastering individual lexical items (Schmitt, 2010; Webb & Nation, 2017). The knowledge of different components of a word may be developed to varying degrees, a fact that should be taken into account when designing conditions for vocabulary learning.
5 Limitations and concluding remarks
This study had several limitations. First, it was difficult to control for out-of-class exposure to target words. A control group (who did not participate in the treatment sessions) may have been a solution to this problem. Second, only stories were included as reading materials; in future research, it would be beneficial to investigate whether the relevance of new words interacts with the text genre. In addition, whether learners were familiar with the genre was not explored. Incorporating learners’ topic preferences for passages may have an impact on retention rates. Third, the study did not include different gloss types, so subsequent research investigating the effects of different gloss types on new word retention is warranted. The small quantity of new words could have posed another limitation. Fourth, learners were required to review a checklist of the words and identify those that were unknown. This checklist was based on a form recognition test, and learners’ self-judgment may not be accurate. For example, definitively determining that learners were not able to recognize the meaning of a target word was challenging. In addition, not all post-tests were administered on the same day, and a 2-week gap occurred between the end of the treatment and the post-tests. Although participants were told not to look up words at home or talk to each other, uncontrolled behavior might have affected the results. Finally, as Webb (2008) suggested, some words may be easier to understand in informative contexts, while others may be comparatively difficult to grasp. The context factor was not controlled, suggesting that future studies should address the context in which words are introduced (Teng, 2016). Nevertheless, these limitations do not negate the findings of the present study, as it offers a valid experimental approach for measuring the efficacy of incorporating L1 marginal glosses and word occurrence on the retention of new words learned incidentally from reading.
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
