Abstract
To facilitate second language learning, it has become increasingly popular to use a second language as the medium of instruction for content subjects for majority language students. Although numerous research studies have shown the advantages of such programs in North America and Europe, those investigating English as the Medium of Instruction (EMI) schools in Hong Kong have yielded inconclusive results. This meta-analysis is the first attempt to synthesize the research evidence on EMI education in Hong Kong since 1970. Based on 24 studies, this meta-analysis shows that students in EMI secondary schools were more proficient in the second language and performed better on measures of affective variables. Yet their learning in other content subjects suffered. The differences between the effectiveness of EMI education in Hong Kong and that of similar programs in other contexts are discussed, thereby illuminating second language acquisition theories and bilingual education.
Medium of instruction (MoI) refers to the language used when teaching nonlanguage academic/content subjects (e.g., mathematics, science, history). Students’ first language (L1) is normally the default MoI in schools, yet for various reasons, students’ second language (L2) may be adopted instead (Ho & Ho, 2004). Some examples include immersion programs in Canada, Content and Language Integrated Learning in Europe, and bilingual programs in the United States. In Hong Kong, a former British colony that reverted to China as a special administrative region in 1997, there is a strong preference for learning through the L2, English, and these programs are mainly implemented in English as the Medium of Instruction (EMI) secondary schools.
Owing to some complicated political, socioeconomic, and educational considerations, the MoI adopted in secondary schools in Hong Kong has always been a source of major debates. Unlike other contexts where similar programs are practiced, there seems to be no conclusive research evidence revealing the advantages and disadvantages of using English (L2) over Chinese (L1) as the MoI in Hong Kong. By synthesizing the research evidence of EMI education in Hong Kong over the past four decades, this meta-analysis sheds some light on MoI issues in Hong Kong. EMI education in Hong Kong is an interesting context for research as the two languages involved are typologically more distant than the languages involved in similar programs in Western contexts (Genesee, 2006). By comparing the effectiveness of using L2 as the MoI in Hong Kong with other educational contexts, this study can in turn illuminate broader questions regarding L2 acquisition and bilingual education.
The Development and Rationale for Using L2 as the MoI
Most people in the world learn at least one language other than their mother tongue (Tucker, 1999), and so the major issue of L2 acquisition is how to learn the language more effectively. In traditional L2 learning programs, the L2 is usually taught in isolation as a subject, which usually occupies several 30- to 40-minute periods per week in the school curriculum. The exact amount of time varies from one school to another and from one country to another, depending on the role of the L2 in a particular context.
Since the St. Lambert experiment in Canada, which was an initiative to help Anglophone children learn French more effectively (Lambert & Tucker, 1972), immersion programs seem to be an attractive alternative to traditional L2 learning programs. The basic principle is to teach some or all nonlanguage content subjects through the language that students are learning as an L2 (Lyster & Ballinger, 2011; Stoller, 2008). In this way, students are exposed to more L2 input in meaningful, communicative contexts, and hence they learn the L2 incidentally (Genesee, 2006; Snow, Met, & Genesee, 1989).
In the psycholinguistic strand, the effectiveness of such programs for L2 learning is supported by several L2 learning hypotheses. According to the comprehensible input hypothesis (Krashen, 1982), when learning content subjects through L2, learners are inevitably exposed to more comprehensible L2 input, which then triggers the innate language acquisition device (Chomsky, 1959). Learners also tend to produce more L2 through interacting with their teachers and peers, so they can obtain more interactionally modified input that better suits their needs, receive feedback on their language use, and test their own hypotheses about the language. These processes are likely to contribute to L2 learning according to the interaction hypothesis (Long, 1996) and output hypothesis (Swain, 1995). In the psychological strand, the use of L2 as the MoI is likely to motivate students to enhance their L2 proficiency, which significantly affects their academic achievement. It is also argued that students will be more interested in learning L2, as that is the medium through which they can understand the subject content (Gardner, 1985; Genesee, 1991; Snow et al., 1989). Based on these theoretical underpinnings, using L2 as the MoI provides a favorable environment for L2 learning.
Using L2 as MoI for Majority Language Versus Minority Language Students
However, the potential benefits of using L2 as the MoI have been challenged by the contrast between the success of using French as the MoI when teaching Anglophone children in the Canadian immersion programs (reviewed in more detail below) and the limitations of using EMI when teaching immigrant children in U.S. schools (Skutnabb-Kangas & Toukomaa, 1976, as cited in Cummins, 1979). Although the former resulted in additive bilingualism without decrements in academic achievement, the latter resulted in partial bilingualism, where students were proficient in neither L1 nor L2 and suffered negative effects on their academic performance (Cummins, 1979).
Such differential effects of using L2 as the MoI for majority and minority language students can be explained by the theoretical model of bilingual education proposed by Cummins (1979, 2000). In this model, educational outcomes (including cognitive, academic, linguistic, and affective) are a function of the interaction of background, students’ input, and education programs. The two hypotheses incorporated into this model, namely, the developmental interdependence hypothesis and the threshold hypothesis, provide significant insights into the necessary conditions for effective bilingual programs. The developmental interdependence hypothesis proposes that the level of L2 competence is partially a function of students’ L1 competence when intensive exposure to L2 begins (Cummins, 1979). The threshold hypothesis suggests that students have to attain a minimum level of L2 and L1 competence in order to enjoy the potential benefits that bilingualism may bring to their cognitive and academic functioning (Cummins, 1979).
Following these two hypotheses, when immigrant children (e.g., Spanish-speaking students) enter schools in Anglophone countries, they are immersed in an all-English learning environment from Day 1, although reaching the threshold level of English to fully benefit from such a learning environment may take 5 to 7 years (Collier, 1989). Meanwhile, their L1 is not supported in the school or the community. Although immigrant children gradually learn English in their daily life, such conversational language proficiency is different from academic proficiency, which is essential for academic development (Cummins, 2000). These students then suffer cognitive and academic disadvantages in schools. In contrast, when majority language students start learning through L2, their L1 is often well developed, since it is the home and community language. It is also likely that they have developed some proficiency in the L2, which has been taught as an isolated subject in school. Hence, those majority language students tend to enjoy the academic and cognitive benefits of bilingual programs adopting L2 as the MoI.
The focus of the current study is bilingual programs that use L2 as the MoI for majority language students (hereafter called L2-medium education). Some well-known examples include the immersion programs in Canada noted above and Content and Language Integrated Learning springing up in Europe since the 1990s, where English or other L2s are often adopted as the MoI for several content subjects in schools (Coyle, Hood, & Marsh, 2010). In these contexts, plenty of research studies have provided evidence that using L2 as the MoI for majority language students brought about different advantages.
Students learning through L2 outperformed their counterparts in traditional language learning programs in various L2 skills, without any adverse effects on their L1 development (e.g., Admiraal, Westhoff, & de Bot, 2006; Jiménez Catalán & Ruiz de Zarobe, 2009; Swain & Lapkin, 1982). When compared with native speakers at similar stages, students in L2-medium education could achieve native-like proficiency in receptive skills such as reading and listening, though they could only attain near-native proficiency in productive skills such as writing and speaking, and were not comparable with native speakers in terms of phonology, vocabulary, grammar, and communicative or sociolinguistic competence (Cummins & Swain, 1986; Genesee, 1987; Harley & King, 1989).
Most important, the more advanced L2 proficiency of students in L2-medium education was achieved at no cost to their academic achievement (e.g., Bergroth, 2006; Turnbull, Hart, & Lapkin, 2001). In other words, compared with their peers in traditional L2 learning programs, students in L2-medium education were not adversely affected in their learning of academic subjects, even though those subjects were learned through L2. In addition, researchers have found other benefits of L2-medium education, including such cognitive benefits as mental manipulation, flexibility, divergent thinking, creative thinking, and originality (Cummins, 1977; Jäppinen, 2005; Peal & Lambert, 1962).
Social and psychological benefits have also been observed, in the sense that students learning through L2 felt more confident when speaking to native speakers and developed more positive attitudes toward L2 (Lasagabaster & Sierra, 2009; Merisuo-Storm, 2006). Furthermore, it has been noted that L2-medium education was suitable for all students, irrespective of IQ, learning disabilities classification, and socioeconomic status (e.g., Genesee, 1987; Holobow, Genesee, & Lambert, 1991). There are some recent reviews of L2-medium education in different contexts, such as Lazaruk’s (2007) article on immersion programs in Canada and Pérez-Cañado’s (2012) synthesis of research on Content and Language Integrated Learning in Europe. However, the research evidence for using L2 as the MoI is not so conclusive in the Hong Kong educational context. The following section will briefly review the issues regarding MoI in Hong Kong and the debate to date.
Using L2 as MoI in Hong Kong
Policy Development
In Hong Kong, Chinese (spoken Cantonese and written Modern Standard Chinese) is the mother tongue of the majority of the population, although English is considered high status (Hoare & Kong, 2008; Tsui, 2004). Chinese is used as the MoI for primary education and students learn English as a separate subject. On the other hand, English is the MoI for tertiary education. It is in secondary schools where the Hong Kong government faces a constant dilemma when deciding which language, Chinese or English, is to be used as the MoI, given the interplay of political, socioeconomic, and educational factors (Lin, 2006; Tsang, 2004; Tsui & Tollefson, 2004).
Before the 1997 political handover, the Hong Kong government adopted a noninterventionist policy, under which schools were allowed to decide on their own MoI (Poon, 2010). At this time, over 90% of secondary schools claimed to be EMI schools (Falvey, 1998), where all subjects (except Chinese, Chinese History, and Chinese Literature) were taught in English. However, observations revealed that in many of these ostensible EMI schools, the use of mixed Chinese and English was prevalent (Johnson, 1983). In 1997, shortly after the political handover, the Education Department stated that all secondary schools should adopt Chinese as the MoI, except schools that had applied for and been granted an exemption. As a result, all except 114 secondary schools in Hong Kong (≈25%) became Chinese as the Medium of Instruction (CMI) schools. However, this policy aroused vigorous opposition in society (Falvey, 1998). In 2009, the government decided to “fine-tune” the mother tongue policy (Education Bureau, 2009) by allowing CMI schools that met certain requirements to have some approved classes. In those classes, schools can adopt a different MoI for different subjects, groups, or time periods, according to the needs and ability of the students and teachers. These changes in the government’s MoI policy have not resolved the controversy over the issue (Hoare & Kong, 2008).
Research Evidence to Date
Judging from the contextual factors, Hong Kong, Canada, and Europe should be rather similar, in the sense that students admitted to L2-medium education are self-selected, majority language students with a solid foundation in their L1. Considering such contextual similarities, So (1998) was convinced that many students in Hong Kong possessed “the necessary sociolinguistic capital” (p. 171) to enjoy the benefits of EMI education. Nevertheless, although the research evidence reviewed above shows the benefits of L2-medium education in North America and Europe, the effectiveness of EMI education implemented in Hong Kong is not so clear.
Since the 1970s, there have been quite a number of empirical studies evaluating EMI education in Hong Kong, and the conclusions drawn from these are confusing or intriguing. For example, K. K. Ho’s (1985) experimental study revealed that EMI students enjoyed no significant advantages in English proficiency over their CMI counterparts, and there were no significant differences in the learning of most content subjects between the two groups. Marsh, Hau, and Kong (2000, 2002), after following a large number of students for 5 years, suggested that EMI students benefited in Chinese and English proficiency but were disadvantaged in content subjects such as science, geography, and history.
Several researchers (e.g., Hoare & Kong, 2008; Yip, Tsang, & Cheung, 2003) have critically reviewed the previous studies and identified some potential factors that may have led to the discrepancies in the findings in previous studies. The first one is the time period when the studies were conducted. As Yip et al. (2003) suggested, studies done before the 1997 compulsory mother tongue policy bore the risk of investigating mixed-code schools, instead of EMI or CMI ones. The language use in classrooms after the compulsory mother tongue policy was under more stringent control of the government and so the types of schools in studies conducted after 1997 were more clearly identified.
Second, the age of students under investigation and the research design of the studies are potential moderating factors. Collier (1989) argued that immersion students normally needed 5 to 7 years to catch up with those studying in their L1, because immersion students needed a certain period of time to reach the threshold level of L2 proficiency to benefit from immersion education. Therefore, the age of students, which implies how many years they have been experiencing L2-medium education, may affect the findings of the studies. For instance, junior form students (Grades 7 to 9) in Hong Kong EMI schools have typically experienced 1 to 3 years of L2-medium education, whereas their senior form counterparts (Grades 10 to 13) should have received 4 to 7 years of L2-medium education. In a similar vein, research studies adopting longitudinal designs that follow students for a few years may be in a better position to capture the long-term effects of bilingual programs.
Third, the types of outcome measures may influence students’ performance. Different types of measurements were adopted in different studies—some adopted low-stakes tests, on which students were not encouraged to do revision in advance, and the test results would not affect students’ official academic results; some used high-stakes tests such as public examinations, on which students were highly motivated to do a great deal of revision. Students’ motivation and efforts associated with different types of outcome measures may therefore moderate the effects of MoI on achievement (Marsh et al., 2002). In addition, some researchers specially designed tests for their studies, whereas others simply adopted standardized tests or measures, such as public examinations and validated vocabulary tests. The nature and scope of those different types of outcome measures may also affect the evaluation of students’ performance in different bilingual programs.
Fourth, most studies evaluating the impact of MoI selected students from intact schools/groups. Such a design ran the risk of having confounding variables, one of which is students’ initial abilities before entering EMI or CMI secondary schools (which were normally measured in terms of students’ academic results by the end of primary education). Therefore, whether studies have controlled for prior differences in abilities may influence the findings of studies.
More systematic reviews of research studies on the MoI issue in Hong Kong were done by Lin and Man (2009) and Poon (2009). Lin and Man (2009) provided a general review of empirical research studies on MoI-related issues since the 1970s, whereas Poon (2009) reviewed English language education in Hong Kong over the past 25 years. However, both of these reviews are narrative in nature, and they mainly reported the findings of the available studies, without synthesizing them and drawing conclusions on the impact or effectiveness of EMI education in Hong Kong.
The Current Study
In the current study, we synthesize research studies examining the EMI education in Hong Kong over the past four decades (1970–2010), using the technique of meta-analysis, which is “a procedure for integrating the results of empirical research studies” (McGaw & Glass, 1980, p. 1). By doing so, we hope to come to a more definitive conclusion on the potential effectiveness of EMI education, which can then shed light on the long-lasting debate over the MoI issues in Hong Kong and inform policymakers on future policy formulation.
Furthermore, the synthesized results of studies examining EMI education in Hong Kong can be compared with those investigating L2-medium education in other contexts (e.g., Canada and Europe). Such a comparison will be particularly insightful for L2 acquisition and bilingual programs, since the languages involved in those contexts are considerably different in nature—in Hong Kong, the languages involved are Chinese and English, which are typologically distant and belong to distinct language families; in the Canadian and European contexts, the languages involved are very often English, French, German, and other European languages, which share common origins and are considered as more similar in typology (Genesee, 2006; Janson, 2012). A comparison of the research studies in those contexts may then reveal the potential moderating effect of language typology on the effectiveness of L2-medium education.
Research Questions
As reviewed above, the potential benefits of using L2 as the MoI include development in L2 proficiency without sacrificing L1 proficiency or academic achievement as well as benefits in affective variables such as learning motivation, strategy, interest, self-efficacy, and so on. Hence, to evaluate students’ learning in EMI education in Hong Kong, the three primary research questions of this meta-analysis are as follows. First, what is the difference in academic achievement between students studying in EMI and CMI education? In particular, we were interested in differences in students’ Chinese and English language proficiency and in students’ achievement in such content subjects as mathematics, science, history, and geography.
Second, are there differences in affective variables, including self-concept, motivation, learning strategies, and interest, between students studying in EMI and CMI education? Third, to further examine the variables that may moderate the differences between students studying in EMI and CMI education, we examined the moderating effect of the following variables on the differences between students studying in EMI and CMI education: (a) the time period of carrying out the study (nonintervention, mother tongue policy), (b) the research design (longitudinal, cross-sectional, experimental), (c) the age of students (junior forms, senior forms), (d) the type of outcome measures (standardized tests, self-designed tests), and (e) control over initial abilities (with control, without control).
Method
Locating the Studies
The literature search was conducted with the keywords medium of instruction or immersion and Hong Kong, with studies published between 1970 and 2010. The databases that were searched included ERIC, SCOPUS, Hong Kong Education Bibliographic Database, and Education Bureau Central Resources Centre Catalogue, and the initial search yielded 178 studies. These were then screened with several additional criteria. First, the study had to be empirical, rather than conceptual, theoretical, or a review. Second, the study had to compare students’ learning in EMI education with that of students in CMI education. Third, the study had to examine students’ learning in secondary schools, as the major dispute over MoI is at the secondary level (Tsang, 2004). Fourth, the study had to have independent student outcome measures, such as achievement, self-concept, or motivation. In other words, the outcomes could not be teachers’ or students’ reflections, attitudes, or opinions.
Supplementary hand searches of the reference lists of some studies were also conducted. To ensure that the criteria of including studies were reliable, a sample of 20 of the 178 studies was screened by the second author, and the decisions to include or discard showed 100% agreement. The final sample consisted of 31 studies published in English. Although there were 11 Chinese articles that matched the searching keywords, none was empirical. The 31 studies were coded by the first author according to the variables of interest in the current meta-analysis, including (a) the year of study, (b) research design, (c) types of experimental and comparison groups, (d) independent variable(s) apart from MoIs, (e) dependent variable(s), (f) age of student participants, (g) types of tests/measures used, and (h) features of statistical analysis. Again, to ensure the reliability of the coding, one third of the studies were given to the second author for recoding. By dividing the number of items that were coded in the same way between the two authors by the total number of coded items, the estimated interrater reliability was 87%. The discrepancies were discussed between the authors and the coding of the other studies was modified wherever necessary. The coding protocol is in the appendix (available online at http://RER.sagepub.com/supplemental).
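The interrater reliability described above is a simple percent agreement: the number of items coded identically divided by the total number of coded items. A minimal sketch of that calculation follows; the function name and the example codes are hypothetical and are not data from the study.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items that two coders coded identically (simple percent agreement)."""
    if len(coder_a) != len(coder_b):
        raise ValueError("Both coders must code the same number of items.")
    if not coder_a:
        raise ValueError("At least one coded item is required.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical codes for four items (illustration only).
first_coder = ["longitudinal", "junior", "standardized", "with control"]
second_coder = ["longitudinal", "junior", "standardized", "without control"]
print(percent_agreement(first_coder, second_coder))  # 0.75
```

Percent agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa would be a different statistic from the one reported here.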
Statistical Analyses
Computing the Effect Sizes
For 13 studies, which included experimental studies and studies that compared EMI and CMI students without any statistical control, we estimated their effect sizes with Cohen’s d, which was calculated by dividing the difference in the means of the experimental group and comparison group by the pooled standard deviation (Lipsey & Wilson, 2001). Among those studies, one experimental study (Education Department, 1985) presented the adjusted means and standard deviations after considering students’ pretest scores. We therefore used those adjusted figures in calculating Cohen’s d.
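As an illustration of the Cohen's d calculation described above, the following sketch computes the pooled standard deviation and the standardized mean difference for an EMI (experimental) and a CMI (comparison) group. The function names and input numbers are hypothetical. The variance formula is the commonly used large-sample approximation and is an assumption here, since the article does not state which variance estimator was used.

```python
import math


def pooled_sd(sd_e, n_e, sd_c, n_c):
    """Pooled standard deviation of the experimental and comparison groups."""
    return math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2))


def cohens_d(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Difference in means (experimental minus comparison) divided by the pooled SD."""
    return (mean_e - mean_c) / pooled_sd(sd_e, n_e, sd_c, n_c)


def d_variance(d, n_e, n_c):
    """Large-sample approximation of the sampling variance of d (assumed estimator)."""
    return (n_e + n_c) / (n_e * n_c) + d ** 2 / (2 * (n_e + n_c))


# Hypothetical summary statistics (illustration only).
d = cohens_d(mean_e=55.0, sd_e=10.0, n_e=50, mean_c=50.0, sd_c=10.0, n_c=50)
print(d)  # 0.5
```

A positive d in this convention means the EMI group scored higher than the CMI group.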
Two studies (Education Department, 1992; Lin & Morrison, 2010) compared EMI and CMI students with indicators other than means and standard deviations (e.g., success rate in public examination), so we adopted other formulas of effect size calculation (Wolf, 1986). Seven studies used hierarchical linear modeling to control for students’ socioeconomic background, initial abilities, and other factors. In those studies, the raw mean scores and standard deviations were normally not reported. Instead, those raw scores were all standardized so that the means became 0 and the standard deviations became 1. Hence, the simple formula for calculating Cohen’s d could not be applied and our preferred measure of effect size was the path coefficient of the variable, medium of instruction, in the hierarchical linear models. For three studies (C. C. Ho, 1989; Lau & Yuen, 2011; Lin & Morrison, 2010), several ways of comparing students’ academic achievement were reported; hence, more than one estimation of effect size was used.
During the process of computing effect sizes, 7 studies from the original pool of 31 were discarded because there was insufficient information for estimation of effect size or the data reported were also reported in another study. In the end, 24 studies were included in the final meta-analysis, 15 of which focused on academic achievement, 2 of which examined affective outcomes, and 7 of which measured both.
The basic principle of calculating effect sizes was to treat students studying in EMI schools as the experimental group and those in CMI schools as the comparison group. In some studies, there were some mixed-medium groups (e.g., mixed-medium, Chinese by subject, Chinese by class). These groups were excluded in the current analysis as the term, mixed, was used in different ways across studies, and including these groups would inevitably result in ambiguities. For some studies, the differences in students’ initial abilities were taken into account while they were not in the other studies. To avoid loss of information, we decided to include all the effect sizes first, and had one moderating variable, control over initial abilities, to determine its impact on the effect sizes.
Combining Effect Sizes
Each effect size was first multiplied by the inverse of its variance to yield the weighted effect size. Then the sum of all the weighted effect sizes was divided by the sum of the inverse variances (i.e., the weights) to generate the overall mean effect size (Lipsey & Wilson, 2001). For most studies, there was more than one effect size. This outcome occurred, for example, (a) when the experimental group (EMI students) was compared with the comparison group (CMI students) on different variables such as Chinese, English, mathematics, history, self-concept, motivation, and so on; (b) when students studying in CMI schools were further categorized into different subgroups according to their abilities (e.g., CMI-High group, CMI-Middle group, and CMI-Low group), and then comparisons were made between EMI students and each of those subgroups, respectively; or (c) when the development of students was traced for several years by testing the students at regular intervals (e.g., at the end of each grade).
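The inverse-variance weighting described above can be sketched as follows. The function name and the example values are hypothetical; the computation follows the two steps in the text (weight each effect size by the inverse of its variance, then divide the sum of weighted effect sizes by the sum of the weights).

```python
def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted mean effect size."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("Need one positive variance per effect size.")
    weights = [1.0 / v for v in variances]
    weighted_sum = sum(w * es for w, es in zip(weights, effects))
    return weighted_sum / sum(weights)


# Hypothetical effect sizes and variances (illustration only).
# Weights are 100 and 25, so the more precise estimate dominates the mean.
print(weighted_mean_effect([0.2, 0.6], [0.01, 0.04]))
```

With these inputs the weighted mean is approximately 0.28, closer to the more precise estimate (0.2) than the unweighted average of 0.4 would be.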
To capture the maximum amount of information without violating the assumption of independence of effects, we decided to adopt a shifting unit of analysis approach (Cooper, 1998; Cooper, Robinson, & Patall, 2006). In this approach, every effect size of a comparison in a study was first recorded. For example, students in one study were compared on their Chinese and mathematics achievement. Thus, two effect sizes were calculated for that study. When analyzing the difference in students’ overall academic achievement between EMI and CMI education, the two effect sizes in that study were averaged before entry into the analysis so that the sample only contributed one effect size (Borenstein, Hedges, Higgins, & Rothstein, 2009). On the other hand, when analyzing the difference in students’ achievement in different academic subjects, that study would contribute one effect size to the analysis of Chinese and mathematics achievement, respectively.
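The shifting unit of analysis can be sketched as below: each study contributes one averaged effect size to the overall analysis, but one effect size per subject to the subject-level analyses. The record layout, function names, and numbers are hypothetical. The simple unweighted averaging within a study is an assumption about how the within-study averaging was done.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (study, subject, effect size). Illustration only.
records = [
    ("Study A", "Chinese", 0.30),
    ("Study A", "mathematics", -0.10),
    ("Study B", "Chinese", 0.20),
]


def overall_units(records):
    """One effect size per study: average all of that study's effect sizes."""
    by_study = defaultdict(list)
    for study, _subject, es in records:
        by_study[study].append(es)
    return {study: mean(values) for study, values in by_study.items()}


def subject_units(records, subject):
    """One effect size per study for a single subject."""
    by_study = defaultdict(list)
    for study, subj, es in records:
        if subj == subject:
            by_study[study].append(es)
    return {study: mean(values) for study, values in by_study.items()}


print(overall_units(records))             # Study A is averaged across two subjects
print(subject_units(records, "Chinese"))  # Studies A and B each contribute once
```

In this way no study contributes more than one effect size to any single analysis, which is what preserves the independence assumption.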
Testing for Publication Bias
Both published and unpublished studies (e.g., master’s dissertations, government reports) were included in this meta-analysis. Yet publication bias could still exist. Therefore, we generated two funnel plots (one for academic achievement and the other for affective variables) to examine the distribution of effect sizes in relation to the studies’ sample sizes. These funnel plots are presented in Figures S1 and S2 (available online at http://RER.sagepub.com/supplemental). Our impression was that there was little publication bias, since quite a number of studies with small sample sizes were included, and most studies were distributed symmetrically about the mean effect size (Borenstein et al., 2009). Nevertheless, we noted that Salili and Lai (2003) and Ripple, Jaquish, Lee, and Salili (1984) were potential outliers for the analysis of academic achievement and affective variables, respectively. We ran the main analyses without the outliers and found very minor differences in the mean effect sizes (±.01) when compared with the analyses for all studies (see Tables S1 and S2 for the results of main analyses without the outliers, available online at http://RER.sagepub.com/supplemental). We therefore decided to report the synthesized results for all 24 studies in this article.
Addressing the Research Questions
In the primary research questions, the independent variable is MoI (i.e., EMI vs. CMI) and the dependent variables are the mean effect sizes of academic achievement and affective variables in the 24 studies. To provide a more fine-grained picture of the differences between EMI and CMI education, the mean effect sizes of six core subjects—namely Chinese, English, mathematics, science, history, and geography—were also computed. Similarly, for the affective variables, the mean effect sizes of self-concept, motivation, learning strategy, and learning interest were calculated. For the subcomponents of self-concept and learning interest, apart from the overall mean effect sizes, those for some major subjects (e.g., Chinese, English, mathematics, science) were also estimated. In a similar vein, the mean effect sizes of three major types of motivation and learning strategies, namely, surface, deep, achieving (Biggs, 1987), were calculated.
However, not all included studies interpreted or measured subcomponents in the same way. For instance, although Marsh’s (1989) Self-Description Questionnaire and Biggs’s (1987) Learning Process Questionnaire were adopted by quite a number of studies to measure students’ self-concept and learning motivation and strategy, respectively, some other measurements were used in other studies (e.g., the PISA questionnaires that estimated students’ learning interest in Ho & Man’s, 2007, study). Also, although learning strategies were generally categorized into surface, deep, and achieving in a majority of studies, different categories were found in some other studies (e.g., the addition of rote learning strategy in the report of Education Bureau, 2004; the categorization of strategies into rehearsal, organization, and elaboration, among others, in Salili & Lai’s, 2003, study). In view of these differences in measurements and categorization, we first estimated the overall mean effect sizes of those studies that included those subcomponents (i.e., self-efficacy, motivation, strategy, and interest) and then only included the mean effect sizes of those studies that had similar categorization of subjects or types of those subcomponents.
The direction of the mean effect sizes shows how students in EMI education performed on the measures of different dependent variables when compared with their CMI peers and the magnitude of the mean effect sizes reflects the strength of such differences. Cohen (1988) suggested an effect size of 0.2 was small, 0.5 moderate, and 0.8 or over large. Apart from estimating the mean effect sizes, the 95% confidence intervals of the mean effect sizes were calculated. If the confidence intervals do not contain zero, the null hypothesis of no difference between EMI and CMI students can be rejected (Cooper et al., 2006). Z tests were also conducted to infer the statistical significance of the results (Lipsey & Wilson, 2001).
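The confidence interval and Z test described above can be sketched as follows. This is a minimal fixed-effect illustration, assuming the standard error of the weighted mean is the square root of the reciprocal of the summed weights; the article does not spell out its formula, and the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def summarize_mean_effect(effects, variances, level=0.95):
    """Weighted mean effect size with its SE, confidence interval, Z, and two-tailed p."""
    weights = [1.0 / v for v in variances]
    mean_es = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    z = mean_es / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "mean": mean_es,
        "se": se,
        "ci": (mean_es - z_crit * se, mean_es + z_crit * se),
        "z": z,
        "p": p,
    }


# Hypothetical effect sizes and variances (illustration only).
result = summarize_mean_effect([0.2, 0.6], [0.01, 0.04])
excludes_zero = result["ci"][0] > 0 or result["ci"][1] < 0
print(result["mean"], result["ci"], excludes_zero)
```

When the interval excludes zero, the Z test at the matching alpha level leads to the same decision, so the two checks are complementary ways of reading one result.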
Next, the homogeneity of effect sizes was evaluated by computing Hedges’ Q statistics for academic achievement and affective variables (Card, 2012; Hedges & Olkin, 1985; Lipsey & Wilson, 2001). Significant heterogeneity would suggest that factors other than sampling error may be contributing to the variation in the effect sizes. To interpret such heterogeneity, two moderating tests, including (a) between-group heterogeneity and (b) one-way analysis of variance (ANOVA) on mean effect sizes, were conducted. The former test partitioned total heterogeneity (Q_Total) into between-group heterogeneity (Q_Between) and within-group heterogeneity (Q_Within; Card, 2012; Lipsey & Wilson, 2001). A significant Q_Between (with df = number of categorical groups − 1, p < .05) would indicate that effect sizes vary significantly across the categories of a particular moderating variable. The magnitude of the true variation of effect sizes was further estimated by T and I² (Borenstein et al., 2009).
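A minimal sketch of these heterogeneity statistics follows. It assumes inverse-variance (fixed-effect) weights and the method-of-moments estimator of the between-study variance described by Borenstein et al. (2009); the source does not state which estimator was used, and the function names are hypothetical.

```python
def heterogeneity(effects, variances):
    """Compute the weighted mean, Q, df, T, and I-squared (in percent).

    effects: study effect sizes; variances: their sampling variances.
    Assumes inverse-variance weights and a method-of-moments T-squared.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / sum_w
    # Q: weighted sum of squared deviations from the weighted mean.
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"mean": mean, "Q": q, "df": df, "T": tau2 ** 0.5, "I2": i2}


def partition_q(groups):
    """Split total Q into within-group and between-group parts.

    groups: dict mapping a category name to (effects, variances).
    Returns (q_total, q_within, q_between, df_between).
    """
    all_effects = [d for effects, _ in groups.values() for d in effects]
    all_variances = [v for _, variances in groups.values() for v in variances]
    q_total = heterogeneity(all_effects, all_variances)["Q"]
    q_within = sum(heterogeneity(e, v)["Q"] for e, v in groups.values())
    return q_total, q_within, q_total - q_within, len(groups) - 1
```

Q_Between would then be compared against a chi-square distribution with df = number of categorical groups − 1, as described above.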
The other test (i.e., one-way ANOVA) mainly evaluated the heterogeneity of effect sizes in terms of differences in magnitude and direction of mean effect sizes. A significant F statistic (or the Brown–Forsythe adjusted F in the case of heterogeneous variance across categorical groups) would indicate a significant mean difference across groups for that particular moderating variable. These two tests addressed the third research question, which asks whether the five variables moderate the differences in students’ learning between EMI and CMI education.
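The one-way ANOVA on effect sizes can be sketched as below, treating each study's effect size as one observation within its moderator category. This is an unweighted illustration with a hypothetical function name; it does not implement the Brown–Forsythe adjustment and returns only the F statistic and its degrees of freedom, not a p value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups.

    groups: list of lists, each holding the effect sizes of the studies
    that fall into one category of the moderating variable.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: observations around their group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F relative to the critical value for (df_between, df_within) would indicate that the mean effect size differs across the categories of the moderator.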
Results
A summary of the effect size data for each study is presented in Table S3 (available online at http://RER.sagepub.com/supplemental). In the tables of this section, as the EMI students were treated as the experimental group, positive mean effect sizes reveal that EMI students performed better than the CMI students on the measures of the dependent variables and vice versa.
Differences in Students’ Academic Achievement
This section addresses the first research question: What is the difference in academic achievement between students studying in EMI and CMI education? The mean effect sizes of the 22 studies that measured students’ academic achievement are presented in Table 1. Generally speaking, EMI students had lower overall academic achievement, with a small and significant negative effect size, M = −.28, SE = .01; Z = −33.0, p < .01, and the homogeneity test shows that these effect sizes varied significantly across studies, Q(21) = 1,240, p < .01; T = .32; I² = 98.31%.
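As an arithmetic check on the heterogeneity figures just reported, and assuming the usual definition of I² as the proportion of observed variation in excess of what sampling error alone would produce (Borenstein et al., 2009), the reported Q and its degrees of freedom reproduce the reported I²:

```latex
I^2 = \frac{Q - df}{Q} \times 100\%
    = \frac{1240 - 21}{1240} \times 100\%
    \approx 98.31\%
```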
Table 1. Mean effect sizes (ESs) of the differences in academic achievement between EMI and CMI students
Note. EMI = English as the medium of instruction; CMI = Chinese as the medium of instruction; CI = confidence interval.
**p < .01.
Differences in Students’ Chinese- and English-Language Proficiency
One major aim of EMI education or other similar kinds of L2-medium programs for majority language students is additive bilingualism, that is, the learning of L2 without sacrificing development in L1. Therefore, it is important to investigate students’ L1 (Chinese) and L2 (English) proficiency in EMI and CMI education. Table 1 shows that students in EMI education lagged slightly behind their CMI peers in Chinese, M = −.12, SE = .01; Z = −12.90, p < .01, but outperformed their CMI peers in English proficiency. The difference in students’ English achievement was significant, with an effect size approaching the moderate range, M = .42, SE = .01; Z = 41.13, p < .01.
Differences in Students’ Achievement in Other Content Subjects
While facilitating additive bilingualism or L2 learning, effective EMI education should not hamper students’ learning of other content subjects when compared with their counterparts studying through L1. In the studies included in this meta-analysis, four major content subjects were consistently found, namely, mathematics, science, history, and geography. Table 1 shows that there was no significant difference in students’ learning of mathematics, M = −.01, SE = .01; 95% CI [−.02, .01]; Z = −.86, p = .19. However, the performance of EMI students in science, history, and geography was significantly worse than that of the CMI group, with moderate to large effect sizes, M = −.55 to −.86.
Differences in Affective Variables
In addition to academic achievement, we compared EMI and CMI students on measures of self-concept, motivation, strategy, and interest. The mean effect sizes of all affective variables and their subcomponents are presented in Table 2. The synthesis shows that students in EMI schools generally had more positive affective outcomes, though the effect size was small, M = .08, SE = .01; Z = 8.97, p < .01. Again, the homogeneity test indicates that the effect sizes varied significantly across studies, Q(9) = 111, p < .01; T = .11; I² = 91.94%.
Table 2. Mean effect sizes (ESs) of the differences in affective variables between EMI and CMI students
Note. EMI = English as the medium of instruction; CMI = Chinese as the medium of instruction; CI = confidence interval.
*p < .05. **p < .01.
With regard to the affective subcomponents, EMI students had higher self-concept in English learning than their peers in CMI schools, M = .28, SE = .01; Z = 30.56, p < .01, but EMI students reported lower self-concept in science, M = −.08, SE = .01; Z = −6.71, p < .01. In addition, EMI students performed slightly better on measures of learning motivation, M = .02, SE = .01; Z = 2.18, p < .05, irrespective of the types of motivation. EMI students were also found to score higher on measures of learning strategies, M = .07, SE = .01; Z = 7.37, p < .01. Among the three kinds of strategy investigated in this meta-analysis, the mean effect size was the largest for the achieving strategy, M = .11, SE = .01; Z = 10.65, p < .01, which refers to a strategy that maximizes the cost-effectiveness of time and effort (i.e., study skills; Biggs, 1987). Finally, students in EMI schools showed slightly more interest in learning as a whole, M = .05, SE = .01; Z = 3.86, p < .01, though they were less interested in learning science, M = −.05, SE = .01; Z = −4.06, p < .01, when compared with their peers in CMI schools.
Moderating Effects
As previously reported, homogeneity tests show that the effect sizes of studies measuring academic achievement and affective variables varied significantly. This implies that some factors other than sampling error have contributed to the variance in the effect sizes. In this section, we examine the moderating effect of five variables on the differences between students studying in EMI and CMI education. As described in the Method section, Hedges’ Q tests with T and I² and one-way ANOVA were used. Some studies (e.g., Marsh et al., 2002; Salili & Lai, 2003; Siu et al., 1979) included students in both junior and senior forms, and hence the effect sizes for these two groups of students were entered separately when analyzing the students’ grades, resulting in more than 24 effect sizes. In addition, one study (C. C. Ho, 1989) included both standardized tests and self-designed tests and was not included in the analysis of type of outcome measures. Similarly, two studies (C. C. Ho, 1989; Lau & Yuen, 2011) compared EMI and CMI students partly with statistical control over students’ initial abilities and partly without. Thus, they were not included when examining the impact of the variable, control over initial abilities.
The results of the moderator analyses regarding academic achievement are presented in Tables 3 and 4, and results of the moderator analyses regarding affective variables are presented in Tables 5 and 6. The type of outcome measures is missing in Tables 5 and 6, because all the studies examining affective outcomes, except Salili and Lai (2003), used standardized tests and so we decided not to investigate the impact of that particular moderating variable.
Table 3. Homogeneity tests of effect sizes (academic achievement)
Note. ES = effect size; CI = confidence interval.
With control: including studies with statistical control and experimental studies.
p < .01.
Table 4. Results of one-way ANOVA for moderators (academic achievement)
Note. ES = effect size.
Significant Levene’s statistic for testing homogeneity of variance across groups was found (p < .05). Thus, F, df, and significance values with the Brown–Forsythe adjustment are reported.
With control: including studies with statistical control and experimental studies.
p < .05.
Table 5. Homogeneity tests of effect sizes (affective variables)
Note. ES = effect size; CI = confidence interval.
With control: including studies with statistical control and experimental studies.
p < .01.
Table 6. Results of one-way ANOVA for moderators (affective variables)
Note. ES = effect size.
With control: including studies with statistical control and experimental studies.
When academic achievement was considered, Table 3 shows that, according to Hedges’ Q, the effect sizes of the included studies were significantly heterogeneous for all the moderating variables, except for control over initial abilities. The magnitude of heterogeneity for each moderating variable was then evaluated. The resulting T fell in a range of .00 to .19, with type of outcome measures and research design yielding moderate Ts of .19 and .13, respectively. Their I² values of 95.67% and 90.81% also indicated that most of the observed variance in the effect sizes is real (Borenstein et al., 2009).
Table 4 contains the effect of the moderating variables based on the results of the one-way ANOVA. In academic achievement, most of the moderating variables (except grade of students) appeared to affect the magnitude and direction of the differences between EMI and CMI students. For instance, the studies conducted after the compulsory mother tongue policy showed that EMI students possessed better academic achievement, M = .16, SD = .81, whereas those conducted during the nonintervention period indicated the opposite, M = −.25, SD = .30. The studies using longitudinal designs indicated slightly better performance of EMI students, M = .02, SD = .74, whereas the experimental studies indicated that EMI students performed worse than CMI students, M = −.38, SD = .12. Such differences in the magnitude and direction of effect sizes also applied to type of outcome measures and control over initial abilities. However, Table 4 shows that there was no significant difference between or among groups for most variables, except type of outcome measures.
With regard to affective variables, such as strategy, interest, motivation, and self-efficacy, Hedges’ Q statistics in Table 5 show that the effect sizes of the included studies were significantly heterogeneous for all the moderating variables. However, the resulting T fell in a range of .00 to .07, reflecting low heterogeneity. Parallel to these findings, Table 6 indicates that most of the variables did not have any significant moderating effects on the differences in affective outcomes between EMI and CMI students. Despite this, the variable, implementation period, did seem to affect the magnitude and direction of the effect sizes of affective variables. In particular, EMI students were found to perform slightly worse on measures of affective variables in studies conducted during the nonintervention period, M = −.06, SD = .29, yet they performed better on those measures in studies carried out under the mother tongue policy, M = .16, SD = .09.
Discussion
This meta-analysis systematically combines the empirical evidence of 24 studies investigating the EMI education in Hong Kong. We acknowledge that a majority of the studies included were cross-sectional and correlational in nature and thus we cannot make any strong causal claims. In the following sections, we discuss some potential implications of the findings of this synthesis.
Academic Achievement
First, this meta-analysis shows that in the Hong Kong educational context, students in secondary schools that adopted English (L2) as the MoI performed better in their L2 proficiency, but not L1. This group of students also lagged behind in content subjects, especially in science, history, and geography. These results suggest that EMI education in Hong Kong may fail to achieve the goal of additive bilingualism. In addition, the findings imply that EMI students in Hong Kong may have sacrificed academic achievement for L2 proficiency. Further analyses indicate that four variables (implementation period, research design, type of outcome measures, and control over initial abilities) had some moderating effects on the magnitude and direction of the differences in students’ academic achievement between EMI and CMI education.
Although these variables were found to lead to small and significant heterogeneity of effect sizes across studies, only type of outcome measures was a significant moderator in the ANOVA tests. It was found that EMI students performed slightly better than their CMI peers in studies adopting standardized tests, yet the EMI group performed worse than the other group when self-designed tests were used. Such a significant moderating impact of the type of outcome measures can be attributed to the fact that self-designed tests were mainly used with experimental designs (e.g., Siu et al., 1979) and some smaller scale studies (e.g., K. K. Ho, 1985). These tests were usually more specifically designed to measure students’ learning on certain topics taught during the research period, which may have favored CMI students who learnt through their mother tongue.
Comparison with Other Contexts
As noted previously, research studies on L2-medium education in other educational contexts (e.g., Canada, Europe) provide fairly conclusive evidence of the positive impact of using L2 as the MoI on students’ language development without hindering academic achievement. Therefore, the findings of this meta-analysis reveal that the effectiveness of EMI education in Hong Kong appears inconsistent with findings in other educational contexts. There are several potential factors for this discrepancy.
The first factor is the proportion of students enrolling in L2-medium schools in Hong Kong. Owing to the general preference for English in Hong Kong, over 90% of students were studying in EMI schools before the mother tongue policy (Johnson, 1997). Even after the implementation of compulsory mother tongue education, there were still around 25% of students receiving EMI education, and this proportion is on the rise again after the fine-tuning policy since the 2010/2011 academic year (Kan, Lai, Kirkpatrick, & Law, 2011). Such a relatively large proportion of students studying in EMI schools in Hong Kong puts great pressure on schools (Hoare & Kong, 2008), because successful implementation of L2-medium education requires teachers with near-native proficiency in the L2 as well as effective pedagogical skills of teaching through the L2 (Navés, 2009; Tsui, 1992).
Closely related to the large proportion of students in EMI education is students’ L2 proficiency when L2-medium education starts. Hong Kong students are generally admitted to EMI schools in Grade 7, by which time they should have been learning English for at least 6 years in primary schools. However, Tsui (1992) argued that most students are not proficient enough in English to cope with the all-English curriculum, as the English they have learned in primary schools is mainly for communicative purposes, which is different from the academic language required to study content subjects through English (Cummins, 2000; Johnson & Swain, 1994). The Hong Kong government and secondary schools fail to help students bridge this proficiency gap in English (Hoare & Kong, 2008). Without reaching the threshold level of L2, Hong Kong students are very likely to suffer in academic achievement, according to the threshold hypothesis (Cummins, 1979, 2000).
However, students in other educational contexts may also face similar problems with L2 proficiency gaps. Why do the problems look more severe in Hong Kong? There may be two reasons. The first reason is the languages involved. In similar programs in Canada and Europe, the languages involved are likely to be European languages, such as English, French, German, Swedish, and Dutch (Pérez-Cañado, 2012). Tracing their origins and histories, English, German, Dutch, and Swedish all belong to Germanic languages, and many English words—especially academic vocabulary related to religion, science, medicine, and literature—were based on Latin, from which French developed (Janson, 2012). On the other hand, the languages involved in EMI education in Hong Kong are Chinese (Cantonese) and English, which are two distinct language systems, with different writing and spelling systems (Janson, 2012). Thus, it is reasonable to conclude that EMI students in Hong Kong may encounter greater difficulties in bridging the proficiency gap.
Second, the situation in Hong Kong may be related to teachers’ pedagogical skills in teaching effectively through L2. As mentioned previously, Hong Kong teachers were observed to rely heavily on switching to Chinese to help students better understand the lesson content, especially when they tried to cover the whole public exam syllabus, which is the same for all schools irrespective of the MoI. Teachers simply cannot afford to spend more time teaching English in content subject lessons nor can they provide support for students’ L2 development (Hoare, 2003; Hoare & Kong, 2008). This context may not be the most conducive to students’ language development or learning subject content through the L2.
Affective Outcomes
Though students studying in EMI education in Hong Kong appeared to suffer in academic achievement, they performed better in almost all measures of affective variables. In other words, EMI students’ self-concept, motivation, and interest were not undermined. Instead, they may rely on their higher initial abilities and employ different learning strategies (e.g., achieving strategy) to overcome the difficulties they encountered in EMI schools. Such psychological benefits of EMI education are likely due to the general social preference for English proficiency in Hong Kong society, so students admitted to EMI schools feel more privileged (Salili & Tsui, 2005).
These benefits became even more obvious in studies carried out during the mother tongue policy period, as the moderating variable implementation period was found to have affected the magnitude and direction of effect sizes measuring affective outcomes. This finding can be explained by the strong labeling effect of the compulsory mother tongue policy, since only 114 schools remained as EMI schools, which were then regarded as more prestigious. Thus, students in those schools may feel better about themselves, be more motivated and interested to learn, and be more aware of using different types of strategy to study. By synthesizing the results concerning both academic achievement and affective variables, this meta-analysis highlights the fact that when evaluating the effectiveness of EMI education in Hong Kong, we should focus not only on tangible learning outcomes but also on some less tangible variables, such as self-concept, learning motivation, and interest.
The differing findings on EMI education in Hong Kong and those L2-medium education programs in other educational contexts can yield some important implications for L2 acquisition. Backed up by psycholinguistic theories of L2 learning, using L2 as the MoI in some or all content subjects has become an appealing alternative to traditional L2 learning programs and has been practiced worldwide. However, this meta-analysis suggests that using L2 as the MoI does not guarantee successful L2 learning without sacrificing academic achievement. Rather, whether such programs can achieve their dual aims depends on many other factors, which may include the sociolinguistic contexts, the actual program implementation in schools, students’ language proficiency, teachers’ pedagogical practices, and the typological differences between the languages involved. These claims echo what Cummins (1979, 2000) proposed in his theoretical model: the outcomes of bilingual education are a function of the interaction of different variables. Hence, though L2-medium instruction is a global trend, there are challenges in making it an effective practice in different educational contexts, an aspect to which more attention should be paid.
Moderators
Several potential moderating variables have been examined in this meta-analysis. Though only a few of them were found to have a statistically significant effect on the differences between EMI and CMI students, a closer look into those moderating variables can perhaps illuminate both the MoI policy in Hong Kong and the broader methodological issues of conducting research on L2-medium education or other bilingual programs.
When the studies were categorized according to the period of implementation, it was found that while EMI students tended to lag behind in academic achievement during the nonintervention period, they outperformed the CMI students after the compulsory mother tongue policy was implemented. Bearing in mind that the mother tongue policy greatly reduced the number of EMI schools by stipulating the necessary conditions that EMI schools had to fulfill, the discrepancy in the magnitude of effect sizes of studies conducted in periods with different policies may then provide some evidence of the effective streaming effects of the compulsory mother tongue policy. That is, students with sufficient English language proficiency admitted to schools with the necessary resources and support are more likely to benefit from EMI education. Such a finding may shed light on the recent fine-tuning MoI policy, which has increased the number of EMI schools (or classes) in Hong Kong. The government and schools may need to carefully consider whether teachers are well supported in developing effective pedagogical skills and whether students are sufficiently assisted with their L2 proficiency.
In addition, EMI students performed slightly better than CMI students in longitudinal research studies, which traced the development of students throughout their secondary schooling (usually for 5 years). Perhaps students entering EMI schools need at least 5 to 7 years to reach the threshold level of L2 (Collier, 1989). Therefore, when longitudinal studies follow students over several years, the negative impact of EMI education gradually diminishes (Marsh et al., 2002). To evaluate the impact of any bilingual programs, it seems more reasonable to follow the development of students’ learning throughout the program, instead of simply taking snapshots of their learning.
Some studies controlled for students’ initial abilities (i.e., students’ abilities or achievement before they were admitted to EMI or CMI secondary schools), and some did not. However, students’ initial abilities are a potential confounding variable when comparing students’ learning in different types of education programs. The current meta-analysis showed that when initial abilities were not controlled, EMI students outperformed CMI students academically. Yet when students’ initial abilities were taken into account, the former group was not on par with the latter group. Such a contrast implies that studies evaluating the effectiveness of bilingual programs should consider this important variable.
Limitations
This meta-analysis is not without limitations. First, as illustrated above, there was considerable heterogeneity in the effect sizes across studies. The variance in effect sizes across the relatively small number of studies may explain why most of the moderating variables were found to be nonsignificant in the ANOVAs. Such heterogeneity may also reflect the variance in the quality of the studies included. Whereas some studies (e.g., Education Bureau, 2004, 2006; Marsh et al., 2000, 2002) were large-scale longitudinal studies tracing thousands of students throughout their secondary education, some were cross-sectional comparisons with fairly small samples of students (e.g., Lau & Yuen, 2011; Lo & Murphy, 2010). The number of experimental studies was also very limited, thereby raising concerns about the internal validity of some studies.
Second, even though this meta-analysis attempted to examine the effectiveness of EMI education in students’ learning by controlling some input variables, such as students’ prior abilities and socioeconomic status, there are still some variables that can hardly be captured or controlled by any studies. Examples of these variables include parental support and effects of self-selection into the programs. Third, this study focused on the learning outcomes, whereas teaching and learning processes (e.g., classroom interaction and classroom discourse) were not included. Teaching and learning processes in EMI education are also important for evaluating bilingual programs, through understanding what actually happens in lessons using an L2 as the MoI (Lo & Macaro, 2012; Yip, Coyle, & Tsang, 2007). Finally, this meta-analysis only compared EMI and CMI students and excluded mixed-code groups. However, as some researchers may argue (e.g., Lin, 2006; Macaro, 2009; Probyn, 2009), judicious and principled code-switching may actually be helpful and effective for some learners. Therefore, excluding mixed-code groups may have precluded a comprehensive picture of the impact of using different languages as the MoI. This is an area worth further investigation.
Recommendations and Conclusion
From the insights of this meta-analysis, we have generated a list of recommendations for future research on the effectiveness of L2-medium education or other bilingual programs:
Longitudinal investigation into students’ performance can better capture and trace the impact of different bilingual programs on students’ development
Large-scale studies can provide a more comprehensive picture, but individual factors such as initial academic abilities and socioeconomic status should be taken into account or controlled for
Validated standardized measures of students’ performance may be less sensitive to the effect of other confounds, such as sample size and period of intervention; if similar standardized measures are adopted across studies, the findings of different studies can be better interpreted, compared, and synthesized
The various groups of participants (e.g., EMI group, CMI group, mixed-MoI group) should be clearly defined, not only in terms of the written materials adopted but also with regard to the specific language that teachers use when delivering lessons
To the best of our knowledge, this study is the first to systematically synthesize research studies of EMI education in Hong Kong. This meta-analysis, therefore, helps provide more evidence about students’ learning in EMI education in Hong Kong. The results can in turn shed light on the MoI debate in Hong Kong society and the government’s formulation of MoI policies in the future. In addition, through discussing the differences in the effectiveness of L2-medium education in different educational contexts, this study may have implications for L2 learning and bilingual education programs more generally.
Authors
YUEN YI LO is an assistant professor at The University of Hong Kong, teaching on the Bachelor of Education, Postgraduate Diploma in Education, and Masters programs. She has a strong research interest in second language acquisition, language-in-education policy, and teacher development.
ERIC SIU CHUNG LO is a part-time lecturer at Hong Kong Baptist University, teaching research method courses on the Bachelor of Psychology program.