Abstract
The present study investigates the collocational profiles of (1) three series of graded textbooks for English as a foreign language (EFL) commonly used in Taiwan, (2) the written productions of EFL learners, and (3) the written productions of native speakers (NS) of English. These texts were examined against a purpose-built collocation list. Based on the British National Corpus (BNC), the list comprises 43,875 verb–noun collocations whose nodes were drawn from a prescribed wordlist (Jeng, Chang, Cheng, & Gu, 2002) of words to be learned on completion of secondary education in Taiwan. Findings show that, overall, the collocational density and diversity of the textbooks are comparable to those of NS essays. Nonetheless, the textbooks presented only small proportions of the collocations in the repertoire, and these collocations did not recur often enough for learners to consolidate their collocational knowledge. Compared with their NS counterparts, learners’ writing exhibited inordinately high collocational density and limited collocational diversity. This suggests that learners did need to construct utterances with collocations but were constrained by underdeveloped collocational knowledge. Implications for the learning and teaching of collocations and for materials design are discussed.
Keywords
I Introduction
Collocation is one type of formulaic language, alongside idioms, lexical phrases, sentence frames, and so on (Wray, 2002). Insights from corpus linguistics have revealed that formulaic language is ubiquitous, accounting for approximately one-third to one-half of any discourse (Erman & Warren, 2000; Foster, 2001). Given the ubiquity of collocations, learners cannot be said to have mastered a second or foreign language (SL/FL) without a developed sense of collocations, in recognition as well as in production.
From a pedagogical perspective, it is now widely accepted that knowing a word entails much more than knowing its form–meaning link. Word knowledge encompasses collocations, associations, and grammatical functions, among other aspects (see the word knowledge framework in Nation, 2001). Collocation is an integral part of word knowledge and is particularly associated with the productive use of words (Crossley, Salsbury, & McNamara, 2014; Schmitt, 2008). In addition to learning the form–meaning links of words, learners need to become familiar with their typical collocations.
Despite the importance of collocations from both linguistic and pedagogical perspectives, copious research has shown that in practice many FL learners are hampered by limited collocational knowledge, in both writing and speaking (Altenberg & Granger, 2001; Bahns & Eldaw, 1993; Granger, 1998; Hsu & Chiu, 2008; Nesselhauf, 2003; Wang & Shih, 2011). Compared with native speakers, FL learners have difficulty producing and recognizing collocations. Laufer and Waldman (2011) found that learners produced far fewer collocations regardless of their proficiency level, and that even advanced learners were not immune to interlingual miscollocations. Siyanova and Schmitt (2008) showed that L2 learners lagged behind their NS counterparts both in their intuitions about collocations and in their fluency with them.
Aware of the increasing pedagogical demand to develop learners’ collocational knowledge, some publishers of English language teaching (ELT) materials may highlight the collocations of the target words in their lexical syllabus. In many FL contexts the textbook is the major source of exposure to the target language (TL), and learners may have little exposure to the TL outside the classroom. It is thus important to examine this exposure and to consider whether the selected textbooks are conducive to developing collocational knowledge. Moreover, the collocational exemplars to which learners are exposed may indicate the extent of their collocational knowledge. For these reasons, the collocational profiles of ELT materials need to be examined.
Although collocation is an integral part of lexical knowledge and lexis constitutes the basis of pedagogic materials, little research has looked beyond individual words to explore the use of collocations in such materials. Studies on the treatment of vocabulary in ELT materials focus primarily on vocabulary size, frequency, level of difficulty, lexical density or text coverage (e.g. Chujo, 2004; Dodigovic, 2005; Matsuoka & Hirsh, 2010). Although sporadic attempts have been made to investigate collocations in pedagogic materials, they tend to focus on particular collocations. For instance, Gouverneur (2008) examined the phraseological patterns of two high-frequency verbs, make and take, in English for General Purposes (EGP) textbooks; Ward (2007) investigated the collocations of three words, gas, heat and liquid, in engineering textbooks across sub-disciplines; and Liu (2010) compared the treatment of the verb–noun (VN) collocations of three words, ability, work and trip, in collocation dictionaries and textbooks. Still other studies focus on how well textbook exercises provide opportunities to practice collocations (e.g. Boers, Demecheleer, Coxhead, & Webb, 2014; Meunier & Gouverneur, 2007). Few studies have hitherto examined the overall collocational profiles of textbooks, with the exception of Koprowski (2005) and Koya (2004).
In an investigation of the usefulness of lexical phrases (e.g. collocations, idioms, phrasal verbs) in mainstream ELT course books, Koprowski (2005) found that usefulness was compromised by the pedagogical need to be lexically comprehensive, resulting in the inclusion of less-than-useful collocations such as play judo and do weightlifting. Taking this line of inquiry a step further, Koya (2004) outlined general collocation use in ELT textbooks in Japan, focusing on verb–noun collocations. The target textbooks were examined against a self-compiled list of 120 collocations selected from collocation dictionaries. Findings showed that only a small proportion of the possible collocations were presented, and over half of these occurred only once throughout the textbooks. They were not recycled in a way that would allow learners to consolidate their knowledge of the target collocations.
In the researcher’s teaching context, Taiwan, ELT materials for the high school curriculum are designed on the basis of an English Reference Word List (Jeng, Chang, Cheng, & Gu, 2002) prescribed by the Ministry of Education. The wordlist (Jeng et al., 2002) consists of 6480 words to be learned on completion of the curriculum. As the pedagogic materials pivot on the prescribed words, the frequent collocations of these words deserve attention, and the words therefore provide a good starting point for investigating the coverage of collocations in the textbooks. In addition, whereas the wordlist (Jeng et al., 2002) provides a reference list of the prescribed words, it could be put to better use by complementing these discrete words with their frequent collocates. Hence, the present study set out to compile a list of the typical collocations of the prescribed words. This list was then used to examine the collocation use of the major ELT materials in Taiwan and of the written discourse of Taiwanese EFL learners. The following research questions were formulated to guide the study:
What are the collocational profiles of the major ELT materials in Taiwan? How do these materials compare with one another in terms of collocation use?
What are the collocational profiles of the written discourse of Taiwanese EFL learners? How do these learners compare with native speakers of English in terms of collocation use in the written discourse?
II The study
1 A top-down approach to building collocational profiles
Previous studies took a bottom-up approach to identifying collocations in texts, namely extracting all the word combinations therein and then verifying their collocability against collocation dictionaries or large corpora. For example, Nesselhauf (2005) and Laufer and Waldman (2011) extracted VN combinations from their datasets and checked them against collocation dictionaries. Siyanova and Schmitt (2008) used frequency and mutual information (MI) in the British National Corpus (BNC) to determine the collocability of the word combinations produced by learners. Word combinations identified in this way may not qualify as collocations in a statistical sense, or may be too uncommon in real language use. For example, to use a/the textbook is a frequent word combination but does not reach statistical significance in the BNC. Conversely, the collocation dictionary entry to prophesy disaster is rarely used and occurs only once in the BNC, albeit with a high MI score of 8.21. Collocation dictionaries may warrant collocability but do not necessarily reflect commonness in real language use. In contrast, a top-down approach entails compiling a prescribed list of statistically-verified collocations and then searching the target texts for occurrences of these collocations. The rationale for adopting a top-down approach in this study is twofold. First, the textbooks under study are based on prescribed words, whose collocations therefore merit investigation. Second, more stringent criteria for delineating collocations can be applied to ensure statistical collocability (as discussed further below).
2 A purpose-built collocation list
As collocations are common in a language, the prescribed words have a tremendous number of collocations. To keep the compilation of a collocation list manageable, only verb–noun (VN) collocations were considered in this study, for two reasons. First, the vast majority of the prescribed words are nouns (4567 of the 6480 words). Second, VN collocations have been shown to be more problematic for learners than other types of lexical collocations (Bahns, 1993; Chang, 1997; Nesselhauf, 2003; Yamashita & Jiang, 2010). Hence, this study limited the scope of investigation to the VN collocations derived from the 4567 nouns on the prescribed wordlist.
BNCweb was queried for the typical VN collocations of the prescribed nouns and their association statistics, because the BNC is representative of general English in use. The following procedure was performed to compile the collocation list:
Extracting the VN combinations of the prescribed nouns using the BNC: within a span of four words preceding and following (–4, –3, –2, –1, +1, +2, +3, +4) the node noun, all verbs were identified.
Identifying collocations from the extracted VN combinations: The extracted VN combinations were then subjected to MI and t-score computation to determine their statistical collocability. According to Hunston (2002), the MI score is ‘a measure of how strongly two words seem to associate in a corpus, based on the independent relative frequency of the two words’ (p. 72). This measure is useful in highlighting salient but sometimes infrequent word combinations: some words do not occur frequently in a corpus, but when they do occur, they tend to collocate with a particular word. For example, the word extenuating has only 20 occurrences in the BNC, but in 16 of them it collocates with the noun circumstances, hence its high MI score of 10.47. In addition to the MI score, the t-score was used as a second selection criterion. Based on a standard deviation measure, the t-score takes into account the corpus size as well as the absolute frequency of joint occurrences of the node and the collocate (see Hunston, 2002; Stubbs, 1995). Following Hunston (2002), only VN combinations that met both of the following criteria were accepted as statistically-verified collocations in this study. The first criterion was an MI score of 3 or above, so that the constituents are associated strongly enough that the choice of collocate is not mere chance. The second was a t-score of 2 or above, so that the collocation occurs frequently enough in real language use. These stringent criteria ensure that only statistically typical and frequent VN collocations are included in the list.
Lemmatizing the inflected forms of the VN collocations: Collocations may appear in inflected forms; for instance, write an essay, wrote essays and writing the essay all share the base form write essay. Therefore, the collocations identified above were lemmatized for inclusion in the collocation list. In total, 43,875 lemmatized VN collocations were identified within a span of ±4 words from the 4567 nodes. Table 1 shows an example: the VN collocations of the word essay and their association statistics.
Verb collocates of the word essay and corresponding mutual information (MI) and t-score.
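The three-step procedure above can be sketched in Python as follows. This is a minimal illustration, not BNCweb's implementation. It assumes the common span-adjusted expected frequency E = f(node) × f(collocate) × span / N, with MI = log2(O/E) and t = (O − E)/√O; BNCweb's exact computation may differ. The `Candidate` class, the `LEMMAS` table and all frequency counts are hypothetical toy values, not BNC figures. They were chosen to mirror the three cases discussed above: a strong pair, a rare pair with a high MI score, and a frequent pair with a low MI score.

```python
import math
from dataclasses import dataclass

SPAN = 8  # window positions: -4..-1 and +1..+4 around the node noun
MI_MIN, T_MIN = 3.0, 2.0  # thresholds following Hunston (2002)


@dataclass(frozen=True)
class Candidate:
    """A hypothetical verb-noun pair with toy corpus counts."""
    verb: str
    noun: str
    joint: int   # O: times the verb occurs within +-4 words of the noun
    f_verb: int  # corpus frequency of the verb
    f_noun: int  # corpus frequency of the noun


def expected(c: Candidate, corpus_size: int) -> float:
    """E: joint frequency expected by chance (span-adjusted; an assumption)."""
    return c.f_noun * c.f_verb * SPAN / corpus_size


def mi_score(c: Candidate, corpus_size: int) -> float:
    return math.log2(c.joint / expected(c, corpus_size))


def t_score(c: Candidate, corpus_size: int) -> float:
    return (c.joint - expected(c, corpus_size)) / math.sqrt(c.joint)


def is_collocation(c: Candidate, corpus_size: int) -> bool:
    """Both criteria must hold: MI >= 3 and t >= 2."""
    return mi_score(c, corpus_size) >= MI_MIN and t_score(c, corpus_size) >= T_MIN


# Hypothetical lemma table; a real pipeline would use the corpus's lemma annotation.
LEMMAS = {"wrote": "write", "writing": "write", "writes": "write", "essays": "essay"}


def lemmatize_pair(verb: str, noun: str) -> tuple:
    return LEMMAS.get(verb, verb), LEMMAS.get(noun, noun)


N = 1_000_000  # toy corpus size (not the BNC)
strong = Candidate("write", "essay", joint=20, f_verb=500, f_noun=100)
rare = Candidate("prophesy", "disaster", joint=1, f_verb=5, f_noun=100)
common = Candidate("use", "textbook", joint=100, f_verb=10_000, f_noun=1_000)

for c in (strong, rare, common):
    print(c.verb, c.noun, round(mi_score(c, N), 2), round(t_score(c, N), 2),
          is_collocation(c, N))
print(lemmatize_pair("wrote", "essays"))
```

With these toy counts, the rare pair passes the MI criterion (about 7.97) but fails the t-score criterion (about 1.0). The frequent pair reaches t = 2.0 but fails the MI criterion (about 0.32). Only the strong pair passes both, which is why the two thresholds are applied jointly.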
3 Texts under study: Textbooks, learner and NS written productions
Having compiled the collocation list, the target texts (ELT materials, learner essays and NS essays) were then examined for occurrences of the listed collocations. The three most common ELT textbook series used in senior high schools in Taiwan are English reader for senior high schools (Chen, 2012) published by Far East, English Reader (Chou, 2012) published by Lungteng, and Sanmin English (Che, 2010) published by Sanmin. Each series consists of six volumes designed for the six semesters of the high school English curriculum. For ease of identification, the series are hereafter referred to as Far East (FE), Lungteng (LT) and Sanmin (SM). These textbooks are geared towards lower-intermediate EFL learners who have reached CEF (Common European Framework) A2 level (Waystage) and aim to achieve B1 level (Threshold) on completion of the high school English curriculum. The texts in these 18 volumes constituted the textbook corpus for this study. All written texts were examined, including extended texts (e.g. reading passages) as well as instructions, exercises, and so on. Whereas some studies have examined only the collocations presented explicitly in vocabulary sections or exercises, this study also took into account those occurring elsewhere in the textbooks. This is because collocations embedded in texts can still be learned incidentally even when they are not highlighted (Webb, Newton, & Chang, 2013). Moreover, as the textbook is the major source of TL exposure for EFL learners, all the collocations therein need to be considered exhaustively, regardless of the sections in which they occur.
Learner written productions consisted of argumentative essays written by 692 undergraduate EFL learners in Taiwan, totaling 149,875 words. The essay was the written assignment for a Freshman English course at a university in Taiwan. The student writers were a homogeneous group of L1 Mandarin Chinese speakers whose English proficiency ranged from CEF A2 level (Waystage) to CEF C1 level (Effective operational proficiency). The required length of the essay was 200 words (±10%), that is, between 180 and 220 words (for the writing instructions, see Appendix 1). Spelling errors were corrected and the texts were annotated with part-of-speech (POS) tags.
The NS baseline data were drawn from the Louvain Corpus of Native English Essays (LOCNESS). This corpus comprises essays written by young adult L1 English speakers, including British A-level students and undergraduates, and American first-year undergraduates. The corpus totals 324,304 words, but this study sampled only the USARG subcorpus, which is roughly comparable to the learner dataset in size and genre. The USARG component comprises 149,574 words and the learner dataset 149,875 words, and both consist of argumentative essays. Although the learner and NS data were obtained from similar age groups, it must be conceded that EFL learners are at a disadvantage when making an argument in an FL. The learners’ writing topics were therefore purposefully confined to ones immediately relevant to their lives. In contrast to the narrow focus of the learner writing, the NS essays addressed a wide range of themes, such as controversy in the classroom or abortion.
All the written texts collected were part-of-speech tagged. The nouns in the texts were identified, along with the verbs co-occurring within a span of ±4 words. The extracted VN combinations were then lemmatized for a comparison with the collocation list, and those matching the listed collocations were identified and the number of occurrences tallied.
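The matching step can be sketched as below. The sketch assumes that each text has already been reduced to a sequence of (lemma, tag) pairs. The simplified tags `N` and `V`, the function name `match_collocations` and the toy sentence are hypothetical, and real POS tagsets are far richer. Each noun on the wordlist is treated as a node, and every verb within ±4 tokens is paired with it. Only pairs found on the collocation list are counted.

```python
from collections import Counter


def match_collocations(tagged, colloc_list, nodes, span=4):
    """Count listed (verb, noun) collocations in a lemmatized, tagged text.

    tagged      -- list of (lemma, tag) pairs; 'N' marks nouns, 'V' verbs
    colloc_list -- set of (verb, noun) lemma pairs from the collocation list
    nodes       -- set of prescribed nouns
    """
    hits = Counter()
    for i, (noun, tag) in enumerate(tagged):
        if tag != "N" or noun not in nodes:
            continue
        # window of +-span tokens around the node, excluding the node itself
        for j in range(max(0, i - span), min(len(tagged), i + span + 1)):
            verb, vtag = tagged[j]
            if j != i and vtag == "V" and (verb, noun) in colloc_list:
                hits[(verb, noun)] += 1
    return hits


# Toy sentence: "Scientists believe the brain can store memories."
tagged = [("scientist", "N"), ("believe", "V"), ("the", "D"), ("brain", "N"),
          ("can", "M"), ("store", "V"), ("memory", "N")]
# ("store", "scientist") is listed only to show the window limit:
# "store" is five tokens away from "scientist", so it is not counted.
colloc_list = {("believe", "scientist"), ("store", "memory"), ("store", "scientist")}
nodes = {"scientist", "brain", "memory"}
print(match_collocations(tagged, colloc_list, nodes))
```

Because only listed pairs are counted, verb–noun combinations that fall outside the list (here, any verb paired with brain) are ignored. This is the defining property of the top-down approach.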
4 Data analysis
The occurrences of collocation tokens and types in the target texts were tallied for analysis. Because the same collocation type may occur more than once in a text, each occurrence is counted as a separate collocation token. To illustrate, in the following excerpt the nouns included on the wordlist are underlined. The VN combinations within a span of ±4 words of these nouns are compared against the collocation list, yielding four matching collocation types: scientist believe, cell transform, memory store and information transfer. The type information transfer occurs twice, so it is counted as two collocation tokens. Altogether, four collocation types and five collocation tokens are identified in this excerpt. See Figure 1.

An excerpt from the Far East series.
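The token/type distinction in the Figure 1 example can be expressed as a simple tally. The list below re-encodes the five matches reported for the excerpt, with information transfer matched twice. The `tally` helper is a hypothetical illustration.

```python
from collections import Counter


def tally(matches):
    """Return (tokens, types): every match is a token; distinct pairs are types."""
    counts = Counter(matches)
    return sum(counts.values()), len(counts)


# The matches reported for the Figure 1 excerpt, in the order given in the text
matches = [("scientist", "believe"), ("cell", "transform"), ("memory", "store"),
           ("information", "transfer"), ("information", "transfer")]
tokens, types = tally(matches)
print(tokens, types)  # 5 4
```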
The collocational profile of a text was operationalized as the density and the diversity of the VN collocations used in it. Density was measured by (1) the number of collocation tokens and (2) the number of collocation tokens per 1000 words. Diversity was operationalized as (1) the number of collocation types and (2) the number of collocation types per 1000 words. These indicators were compared across textbook series (research question 1) and between the learner and NS datasets (research question 2) to understand how collocation use differed across texts. In addition, the textbooks were examined for the extent to which collocations were recycled throughout each series. This was done to determine whether the textbooks provide sufficient opportunities for learners to consolidate collocational knowledge.
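The standardized measures reduce to simple rates. The sketch below uses hypothetical helper names and applies the measures to the raw counts and word counts reported for the three textbook series in the Results. It also computes the share of the 43,875-item repertoire covered by each series’ collocation types. Rounding to two decimals reproduces the figures reported there.

```python
REPERTOIRE = 43_875  # size of the purpose-built collocation list


def per_1000(count: int, words: int) -> float:
    """Frequency standardized to occurrences per 1000 words of running text."""
    return count / words * 1000


def coverage_pct(types: int, repertoire: int = REPERTOIRE) -> float:
    """Percentage of the listed collocation types that occur in a text."""
    return types / repertoire * 100


# series: (collocation tokens, collocation types, word count)
series = {"FE": (7458, 3101, 270_981),
          "LT": (5680, 2737, 255_346),
          "SM": (6456, 3157, 304_421)}

for name, (tokens, types, words) in series.items():
    density = per_1000(tokens, words)    # tokens per 1000 words
    diversity = per_1000(types, words)   # types per 1000 words
    print(name, f"{density:.2f}", f"{diversity:.2f}", f"{coverage_pct(types):.2f}%")
```

Raw counts and per-1000-word rates can rank the series differently, because text length enters only the denominator of the standardized measures.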
III Results
1 Collocational profiles of the textbooks
The first research question concerns the collocational profiles of the ELT materials commonly used in the high school English curriculum in Taiwan. The density, diversity and repetition of the collocations occurring in the three major textbook series were examined. Table 2 shows the descriptive statistics of the collocations in the textbooks.
Frequencies of occurrences of the collocations in the textbooks.
Note: Standardized frequencies per 1000 words are shown in parentheses.
a Density
Far East has 7458 collocation tokens, Lungteng 5680 and Sanmin 6456. As the frequencies of occurrence of the collocations were not normally distributed, nonparametric tests were conducted to determine whether the series differed from one another in the raw number of collocation tokens. Friedman’s ANOVA confirmed a significant difference in the raw frequency of collocation tokens across the three series (χ2(2) = 81.19, p < .001). To identify where the differences lay, post-hoc Wilcoxon signed-rank tests were conducted on three pairwise comparisons (FE&LT, LT&SM, FE&SM). Bonferroni correction was applied: the critical value of .05 was divided by three (the number of tests conducted), so p < .0167 was accepted as significant. The results confirmed that all three series differed from one another in the raw frequency of collocation tokens (FE&LT: Z = −10.3, p < .001, r = −.03; LT&SM: Z = −7.02, p < .001, r = −.02; FE&SM: Z = −3.65, p < .001, r = −.01). In other words, Far East has significantly more collocation tokens than Sanmin, and Sanmin significantly more than Lungteng.
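The Bonferroni threshold and the effect sizes can be checked with a few lines of arithmetic. This sketch assumes the effect size is Rosenthal’s r = Z/√N, where N is the total number of paired observations: 2 × 43,875, or one value per listed collocation in each of the two series compared. That choice of N is an inference rather than something stated in the study, although it reproduces the reported r values to two decimals. The helper names are hypothetical.

```python
import math

N_LIST = 43_875  # collocations on the list, i.e. paired observations per comparison


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def effect_size_r(z: float, n_obs: int) -> float:
    """Rosenthal's r = Z / sqrt(N) for a Wilcoxon signed-rank test."""
    return z / math.sqrt(n_obs)


alpha_adj = bonferroni_alpha(0.05, 3)
print(f"adjusted alpha = {alpha_adj:.4f}")  # adjusted alpha = 0.0167

# Assumption: N = 2 * N_LIST observations per pairwise comparison
for pair, z in [("FE&LT", -10.3), ("LT&SM", -7.02), ("FE&SM", -3.65)]:
    print(pair, f"r = {effect_size_r(z, 2 * N_LIST):.2f}")
```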
The frequency distribution of collocation tokens reveals a different picture when considered in relation to the text length of the teaching materials. The word count of each series of textbooks is as follows: Far East 270,981 words, Lungteng 255,346 words, and Sanmin 304,421 words. Adjusted for text length, Far East presents 27.52 collocation tokens per 1000 words; Lungteng has 22.24 collocation tokens per 1000 words; and Sanmin uses 21.21 collocation tokens for every 1000 words.
Overall, Far East has the highest collocational density, regardless of which measurement method (i.e. raw frequency or standardized frequency) was employed. Sanmin has more collocation tokens than Lungteng in terms of raw frequency, suggesting that overall the users of the former would be exposed to a larger number of collocational exemplars over the course of the curriculum. Nonetheless, as the text length of the Lungteng series is markedly shorter than that of Sanmin, Lungteng users would encounter a denser distribution of collocation tokens than their Sanmin counterparts.
b Diversity
Among the repertoire of the 43,875 statistically-verified collocations, Far East presents 3101 VN collocation types, accounting for 7.07% of all the collocations on the list. Lungteng uses 2737 types, or 6.24% of the entire repertoire, and 3157 types are found in the Sanmin series, or 7.20% of all possible collocations. It is worth noting that calculating the proportions of possible collocation types does not presume that the entire repertoire of 43,875 collocations should be presented in these pedagogic materials. Given the limited coverage of the teaching materials and class time, it is unrealistic to expect all the collocation types to be included. Moreover, as Koprowski (2005) points out, ‘As hundreds of thousands of multi-word lexical items pervade the language, it is clear that not every pattern can be brought into a syllabus nor will all combinations be equally useful for learners’ (p. 323). Despite these unavoidable limitations, the disconcertingly low proportions of possible collocation types in the materials raise concerns over their adequacy in facilitating collocation learning. As the prescribed vocabulary forms the basis of these materials, its typical collocations should presumably constitute an important part of the lexical syllabus. Given the small range of collocation types presented in the materials, this aspect of lexical knowledge may not have been given due attention.
In general, all three series present rather small proportions of the VN collocation types in the repertoire. Inferential statistics provide a clearer picture of how the series compare with one another in the raw frequency of collocation types. Cochran’s Q test showed a significant difference in the raw number of collocation types among the three series (Q = 56.84, p < .001). Post-hoc McNemar tests were conducted to determine where, among the three pairwise comparisons, the difference(s) lay. Again, Bonferroni correction was applied, so p < .0167 was accepted as significant. The post-hoc tests confirmed that the number of collocation types presented in the Lungteng series was significantly lower than in both Far East and Sanmin (FE&LT: χ2(1) = 37.71, p < .001; LT&SM: χ2(1) = 47.99, p < .001). Far East and Sanmin did not differ significantly from each other (χ2(1) = .79, p = .37). In other words, as far as the raw frequency of collocation types is concerned, Far East and Sanmin contain a wider range of collocation types than Lungteng. However, Lungteng’s smaller range of collocations is again attributable to its shorter text length. Adjusted for text length, Lungteng presents 10.72 collocation types per 1000 words, lying between Far East (11.44) and Sanmin (10.37). Generally, learners would encounter more than 10 collocation types for every 1000 words they read in any of these textbooks. Far East again outperforms the other series in both the raw and the standardized frequency of collocation types, displaying the highest collocational diversity. When text length is taken into account, Lungteng no longer falls short of Sanmin in the variety of collocations occurring within a given length of text. However, if Lungteng is adopted as the teaching material for the three-year curriculum, learners will be exposed to significantly fewer collocation types overall.
Having a greater variety of collocations is generally favorable in an FL classroom. In teaching materials, however, collocational diversity may come at the expense of the pedagogical need to recycle the same collocations throughout a series: the more collocation types a series presents, the fewer times each type can be repeated. Admittedly, it is difficult, if not impossible, to strike a balance between collocational diversity and sufficient repetition. There is no universal standard for judging the adequacy of collocational diversity in teaching materials, because it hinges on local, context-specific pedagogical needs. In some cases learners need exposure to a range of varied collocations in order to pick up new ones not yet in their repertoire. In others, recycling is necessary to consolidate learners’ knowledge of previously encountered collocations.
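The sketch below shows how the two tests handle presence/absence data. It implements Cochran’s Q and McNemar’s test (without continuity correction) from their standard formulas. Chi-square p-values are computed from the closed forms available for 1 and 2 degrees of freedom. Each row of the hypothetical toy table records whether one listed collocation occurs (1) or not (0) in FE, LT and SM. The counts b and c passed to `mcnemar` are likewise invented. The last line shows that a McNemar statistic of .79 on 1 degree of freedom corresponds to p ≈ .37.

```python
import math


def cochran_q(rows):
    """Cochran's Q for k related binary samples (rows = items, columns = series)."""
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    grand = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2)
    denominator = k * grand - sum(r * r for r in row_totals)
    return numerator / denominator


def mcnemar(b, c):
    """McNemar's chi-square without continuity correction (1 df).

    b -- items present in series A but absent from series B; c -- the reverse.
    """
    return (b - c) ** 2 / (b + c)


def chi2_sf_df1(x):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(x / 2))


def chi2_sf_df2(x):
    """Upper-tail p-value of a chi-square statistic with 2 df."""
    return math.exp(-x / 2)


# Hypothetical presence (1) / absence (0) of five collocations in (FE, LT, SM)
toy = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]
q = cochran_q(toy)
print(q, round(chi2_sf_df2(q), 3))   # Q has k - 1 = 2 df here

print(mcnemar(30, 10))               # hypothetical b, c
print(round(chi2_sf_df1(0.79), 2))   # 0.37
```

Rows in which a collocation is absent from (or present in) all three series leave Q unchanged. The 43,875-row table therefore gives the same statistic as one restricted to collocations found in at least one, but not all, of the series.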
c Recycling collocations
A body of empirical evidence suggests that a learner needs to encounter a lexical item 5 to 16 times or more before acquiring it (Nation, 1990). The number of repetitions needed to learn a collocation also varies considerably, ranging from 5 to 15 (Peters, 2014; Webb et al., 2013), but this evidence attests to the necessity of multiple encounters in learning collocations. Despite this evidence, the frequency distribution of the collocations in the materials investigated does not seem to respond to the call for recycling collocations. Table 3 shows the repetitions of the collocations across series. In all series, over 90% of the collocations occurred fewer than five times throughout the textbooks (FE 92.71%, LT 94.78%, SM 95%). A closer look at the frequency distribution revealed that roughly 60% to 65% of the collocation types were not recycled at all. In Far East, 59.43% of the collocations occurred only once throughout the series, compared with 64.82% in Lungteng and 64.46% in Sanmin. Only a scant few collocations (5%–7.29%) were recycled as many times as suggested in the literature.
Number of repetitions of the collocations in the textbooks.
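The repetition figures in Table 3 are proportions of collocation types grouped by how often each type occurs. A minimal sketch with a hypothetical `repetition_profile` helper and invented frequencies shows the calculation. The threshold of five encounters follows the lower bound cited from the literature.

```python
from collections import Counter


def repetition_profile(type_freqs: Counter, threshold: int = 5) -> dict:
    """Percentages of collocation types by number of occurrences in a series."""
    n = len(type_freqs)
    once = sum(1 for f in type_freqs.values() if f == 1)
    below = sum(1 for f in type_freqs.values() if f < threshold)
    return {
        "once": 100 * once / n,                # never recycled
        "below_threshold": 100 * below / n,    # fewer than `threshold` encounters
        "at_or_above": 100 * (n - below) / n,  # recycled often enough by this criterion
    }


# Invented frequencies for ten collocation types in one textbook series
freqs = Counter({"spend time": 8, "read book": 5, "fill blank": 3, "attend class": 2,
                 "write essay": 1, "store memory": 1, "transfer information": 1,
                 "break silence": 1, "drink water": 1, "solve problem": 1})
print(repetition_profile(freqs))
```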
The 60 high-frequency collocations (> 5 occurrences) common to all three textbook series are shown in Appendix 2. Some of these collocations are embedded in instructions or exercises, such as fill (in the) blank and listen (to the) conversation. The remaining ones denote common daily activities, such as spend time and read a/the book. These high-frequency collocations hardly seem to need teaching, and yet they are the best opportunities the materials offer for consolidating collocational knowledge. As Sanaoui (1996) observes, a lexical syllabus is often not planned in its own right but is organized around other pedagogical considerations (e.g. communicative functions), as a by-product of the curriculum. The scenario may be even bleaker for collocations, which are often a by-product of the lexical syllabus, in the sense that they are presented along with target words rather than being a focus of teaching. In Brown’s (2011) investigation, collocations accounted for as little as 8% of the vocabulary exercises in textbooks, whereas word form and meaning received by far the most attention. Given that collocations are peripheral to the lexical syllabus, it is not surprising that they are not revisited in a systematic and principled way in the pedagogic materials.
Admittedly, it is unrealistic to expect ELT materials to present a sufficiently wide range of collocations and at the same time recycle them as many times as the literature suggests. Moreover, not all collocations need to be explicitly taught. Some are straightforward, such as to drink water. Others are congruent with the learner’s L1 and thus do not pose difficulty, such as to break the silence (打破沉默 da po chen mo) for L1 Mandarin Chinese speakers (for more on L1 influence on L2 collocation learning, see Wolter & Gyllstad, 2011; Yamashita & Jiang, 2010). Even though pedagogic materials cannot be expected to provide all the exposure necessary to learn collocations, the limited frequency distribution shown above is nonetheless a cause for concern. It therefore deserves serious attention on the part of materials writers and practitioners, who need to consider the adequacy of such materials as a vehicle for developing collocational knowledge.
2 Collocational profiles of the learner and NS productions
To gain insights into how learners avail themselves of the collocations they have at their disposal, their written productions were examined against the collocation list, with writing samples composed by NS equivalents as the baseline for comparisons. Table 4 shows the frequency distribution of the collocations in learner and NS essays.
Frequencies of occurrences of the collocations in learner and native speaker (NS) essays.
Note: CTTR: collocation type–token ratio; standardized frequencies per 1000 words are shown in parentheses.
Learner essays total 149,875 words and contain 4687 collocation tokens. The learner productions thus contain 31.27 collocation tokens per 1000 words, a noticeably higher collocational density than any of the textbooks (27.52 for FE, 22.24 for LT, 21.21 for SM). By contrast, only 2895 collocation tokens were identified in the NS productions, or 19.35 per 1000 words. The NS density may be lower than that of the textbooks because the two kinds of text serve different purposes. The NS essays present the writer’s argument on a certain issue. The textbooks, in contrast, serve the pedagogical purpose of providing more intensive exposure to the TL: they adapt reading texts and include vocabulary sections or exercises that recycle the target words and/or collocations, hence their relatively higher collocational density. The learner productions, on the other hand, contained more collocation tokens than all the textbooks and the NS essays. This suggests that learners did use, and/or needed to use, collocations in writing, and did so at a much higher density than their NS counterparts and the textbooks.
An even more striking difference between learner and NS writing was found in the number of collocation types. Learners used 639 collocation types, whereas as many as 1628 types were found in the NS essays. Standardized for text length, learners produced 4.26 collocation types per 1000 words, whereas NSs used 10.88. Furthermore, the opposing tendencies of the two datasets in the use of collocation tokens and types resulted in a considerable difference in the collocation type–token ratio (CTTR). The learners’ CTTR was 13.63%, whereas that of the NS group was a far more diverse 56.23%. Note that the type–token ratio here refers to collocation types and tokens rather than to word types and tokens as conventionally understood, so it is hereafter called CTTR to avoid confusion.
In general, the collocational diversity in the materials (collocation types per 1000 words: 11.44 for FE, 10.72 for LT, 10.37 for SM) approximated that in the NS essays (10.88). As the student writers had received exposure to English predominantly through one of these pedagogic materials, their collocation use might have been expected to be at least comparable to that of the materials, if not better. Yet, even though learners used more collocation tokens than NSs, those collocations varied very little. In particular, the collocation attend class, which appeared in the writing instructions, was used 1213 times; that is, about a quarter of the collocation tokens belonged to this single type. Admittedly, the narrow focus of the learners’ writing assignments may have contributed to some extent to the limited diversity of their collocations. However, since the collocations in the repertoire are rather generic rather than domain- or genre-specific, this narrow focus did not preclude learners from using a greater variety of collocations. The findings provide a clear account of learners’ collocation use in written discourse. Learners do need to use collocations to express meanings, as evidenced by the inordinate number of collocation tokens. However, their limited repertoire of collocations does not allow them to use alternative forms, as substantiated by the small range of collocation types and the disconcertingly low CTTR.
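CTTR is simply the number of collocation types divided by the number of collocation tokens. The sketch below applies it to the counts reported above; the `cttr` name is hypothetical. It also computes the share of learner tokens taken up by the single type attend class, whose 1213 occurrences are noted in the next paragraph. When one type dominates in this way, the learner ratio is pulled down sharply.

```python
def cttr(types: int, tokens: int) -> float:
    """Collocation type-token ratio, as a percentage."""
    return types / tokens * 100


datasets = {"learner": (639, 4687), "NS": (1628, 2895)}  # (types, tokens)
for name, (types, tokens) in datasets.items():
    print(name, f"CTTR = {cttr(types, tokens):.2f}%")

# Share of all learner collocation tokens accounted for by one type (attend class)
share = 1213 / 4687 * 100
print(f"attend class share = {share:.1f}%")
```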
IV Conclusions and discussion
Numerous wordlists currently available give an indication of the vocabulary breadth that a learner is expected to display at a particular proficiency level. While wordlists are valuable in this regard, further aspects that constitute the depth of the learner’s vocabulary knowledge are often assumed to take care of themselves once the form–meaning link is learned. This study is among the first attempts to address the depth of an existing wordlist (Jeng et al., 2002) by augmenting it with collocations. Furthermore, the study demonstrated how such an exhaustive, corpus-based collocation list could be exploited to examine the collocational profiles of pedagogic materials and learner and NS written productions.
Among the textbooks investigated, Far East has the highest collocational density and diversity. Lungteng comes second when text length is standardized, but Sanmin outnumbers Lungteng in raw frequency. In other words, Lungteng users would encounter a denser distribution of collocations, while Sanmin users would be exposed to a larger total number of collocations over the course of the curriculum. Overall, the textbooks exhibit collocational profiles comparable to those of NS productions, even though the types they contain still represent a very small proportion of the entire repertoire of 43,875 collocations. Given that the collocations in question are derived from the prescribed words, it is only sensible that the frequent collocates of these words be given the same importance as the words themselves. In this sense, the collocation repertoire may be underrepresented in the textbooks, and materials writers need to consider designing the lexical syllabus on the basis not only of the prescribed words but also of their typical collocations. Furthermore, the vast majority of the collocations in the textbooks were not recycled to the point where learning is likely to occur. Conversely, those that did recur sufficiently did not seem worth the investment of valuable class time. One could argue that repeating collocations may come at the expense of collocational diversity; nonetheless, as far as learning is concerned, recycling should not be compromised. The repetition of collocations therefore needs to be taken into account when designing the lexical syllabus, whether they are recycled through textbooks, workbooks or classroom activities. In addition, collocations with high pedagogical value should be preselected in the same way as target words and revisited in a principled manner throughout the curriculum to facilitate learning.
It would also be useful to demonstrate the typical usage of the target words along with their frequent collocations, ideally formatted in a way that draws learners' attention to these collocations (e.g. through highlighting or bold type) to foster noticing. At the same time, course books alone can never provide enough exposure and must be supplemented with substantial extensive listening and reading materials, so as to expose learners to a large number of collocations.
Learners' collocational profiles are characterized by an inordinate number of collocations with little variation, compared with the collocational profiles of the materials from which they learned English and those of NS essays. This finding is not entirely consistent with previous studies that have reported learners' underuse of collocations (e.g. Altenberg & Granger, 2001; Laufer & Waldman, 2011), but the distinction drawn here between collocation types and tokens makes it clear that it is the collocation types that learners underused, not the tokens. This in fact echoes Granger's (1998) finding that learners tend to stick with a limited number of 'safe' formulaic sequences that they feel confident using. Similarly, Chen and Baker (2010) found that learners' academic writing showed the smallest range of lexical phrases, as opposed to the widest range exhibited by published journal papers. It may well be that learners' poor collocational knowledge prevents them from using alternatives that would make their lexical profiles more diverse, despite their perceived need to construct utterances with collocations. They tend to 'cling to' a limited range of low-stakes collocations with which they are familiar. The finding thus has implications both for pedagogic materials, which should expand the spectrum of collocations, and, more importantly, for teachers, who should guide learners to venture beyond their 'comfort zone' and experiment with putting the collocations at their disposal to productive use.
The collocation list may serve as a useful reference tool for materials writers as well as teachers and learners. For ELT materials writers, the collocation list may complement collocation dictionaries with more up-to-date and accurate frequency information, as it indicates how common a target collocation is in real language use. For teachers, vocabulary needs to be taught in context along with its collocations to help learners turn passive lexical knowledge into active use. For example, the teacher may present a sample sentence such as Her views stand in stark contrast to those of her colleagues to illustrate how the target word contrast co-occurs with the verb collocate stand. Note that in the present study a verb collocate is counted if it occurs within a span of ±4 words of the node, so the noun is not necessarily the direct object of the verb. Collocations therefore need to be presented in context to demonstrate their typical usage.
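The ±4 span criterion can be illustrated with a small window search. The Python sketch below assumes a sentence already tokenized and part-of-speech tagged with simple hypothetical tags ("NOUN", "VERB", etc.); the function name `vn_candidates` is illustrative. It only lists raw verb–noun co-occurrence candidates within the window. It does not reproduce the study's BNC-based procedure, lemmatization, or any frequency or association thresholds used to build the collocation list.

```python
from __future__ import annotations


def vn_candidates(
    tagged: list[tuple[str, str]], nodes: set[str], span: int = 4
) -> list[tuple[str, str]]:
    """Return (verb, noun) pairs where a node noun has a verb within +/-span tokens.

    `tagged` is a list of (word, tag) pairs; the tag set is a hypothetical
    simplification. Only nouns in `nodes` are treated as nodes.
    """
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "NOUN" or word.lower() not in nodes:
            continue
        # Window of up to `span` tokens on each side, clipped to the sentence.
        lo = max(0, i - span)
        hi = min(len(tagged), i + span + 1)
        for j in range(lo, hi):
            if j != i and tagged[j][1] == "VERB":
                pairs.append((tagged[j][0].lower(), word.lower()))
    return pairs


sentence = [
    ("Her", "PRON"), ("views", "NOUN"), ("stand", "VERB"), ("in", "ADP"),
    ("stark", "ADJ"), ("contrast", "NOUN"), ("to", "ADP"), ("those", "PRON"),
    ("of", "ADP"), ("her", "PRON"), ("colleagues", "NOUN"),
]
print(vn_candidates(sentence, {"contrast"}))  # [('stand', 'contrast')]
```

Here stand is three tokens from contrast, so it falls inside the ±4 window even though contrast is not its direct object; with a span of 2 the pair would not be found.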
It needs to be acknowledged that, while the collocation list is a repository of VN collocations of the prescribed wordlist, its sheer size militates against direct application in the classroom. To make it more user-friendly, future research may refine the list with multiple indicators of the pedagogical usefulness of each collocation, such as frequency of occurrence, semantic transparency, congruency with the L1, number of collocates and position of the node. In addition, this study complemented Jeng et al.'s (2002) wordlist with VN collocations because VN collocations have been reported to pose the most difficulty for learners. It is hoped that further efforts will be made to expand the list with other types of lexical collocations (e.g. adjective–noun collocations), so as to make the collocation list more comprehensive.
Collocation is an integral part of lexical knowledge and particularly crucial in productive use of words, so knowing the range of typical collocations is as important as learning the lexical item itself. Words should no longer be viewed as discrete units to be learned or taught independently of their syntagmatic links. The collocation list may serve as a starting point to sensitize learners to the connections between lexical items and their frequent collocates, and hopefully over time learners will be able to avail themselves of the collocations listed as part of their vocabulary learning.
Footnotes
Appendix 1
Funding
This work was supported by the Ministry of Science and Technology (grant number NSC 00-2410-H-390-030-).
