Abstract
This meta-analysis synthesized recent research on strategy instruction (SI) effectiveness to estimate SI effects and their moderators for two domains: second/foreign language and self-regulated learning. A total of 37 studies (47 independent samples) for the language domain and 16 studies (17 independent samples) for the self-regulated learning domain contributed effect sizes for this meta-analysis. Findings indicate that the overall effects of SI were large, 0.78 and 0.87, for language and self-regulated learning, respectively. A number of context (e.g., educational level, script differences), treatment (e.g., delivery agent), and methodology (e.g., pretest) characteristics were found to moderate SI effectiveness. Notably, the moderating effects varied by language versus self-regulated learning domains. The overall results identify SI as a viable instructional tool for second/foreign language classrooms, highlight more effective SI design features, and suggest a need for a greater emphasis on self-regulated learning in SI interventions and research.
In today’s increasingly mobile and globalized world, developing higher levels of proficiency in a language other than one’s own becomes ever more important to boost one’s “stability, employability, and prosperity” (British Council, 2013, p. 3). Studying variables that may enhance the learning of additional languages, then, becomes crucial for informing practice, policy, and theory affecting the quality and outcomes of language learning experiences. One such variable—identified as a powerful tool in additional language development (as well as in native language development; see Pressley, 2002)—has been referred to as language learning strategies (LLS).
LLS have been defined as actions learners use consciously to learn a new language more efficiently (Griffiths, 2007). Although LLS taxonomies vary (see a review of the topic in Barjesteh, Mukundan, & Vaseghi, 2014), most LLS researchers distinguish among three LLS categories: (a) cognitive strategies, “to do with the behaviors and mental processes of the learning” (e.g., keyword, rehearsal, note taking); (b) metacognitive strategies, “to do with awareness of the learning” (e.g., focusing attention, planning for learning); and (c) socioaffective strategies, to do with “interactions with others” and “personality traits” (e.g., asking for help, self-encouragement; Hassan et al., 2005, p. 1).
Higher use of spontaneous (non-instructed) LLS has been associated with higher proficiency both in second language (L2) and foreign language (FL) contexts (e.g., Ardasheva, 2016; Hu, Gu, Zhang, & Bai, 2009; Huang, Chern, & Lin, 2009; Nahavandi & Mukundan, 2014) and with higher performance on self-regulated learning measures (e.g., motivation: MacIntyre & Noels, 1996; self-efficacy: Magogwe & Oliver, 2007). In this study, we use the term L2 to refer to a language studied inside a country where it is commonly spoken as an official language, such as P-16/adult English-as-a-Second-Language (ESL) classes in the United States or French/English immersion classes in Canada, and we use the term FL to refer to a language studied inside a country where it is not commonly spoken, such as French classes in the United States or English classes in Japan. The results of extensive, qualitative reviews of LLS research conducted in both L2 and FL settings since the 1970s (Cohen & Macaro, 2007; McDonough, 1999) suggest that LLS can be successfully taught through what is known as strategy instruction (SI), the variable at the heart of the present study.
SI has been defined as “any intervention which focuses on the strategies to be regularly adopted and used by language learners to develop their proficiency, to improve particular task performance, or both” (Hassan et al., 2005, p. 1). Current SI interventions typically involve four steps: consciousness raising, modeling, guided practice, and evaluation/goal-setting (we discuss these steps later in the article). There is emergent evidence linking LLS not only with better language outcomes (e.g., Plonsky, 2011) but also with better academic performance in content areas (language arts, mathematics, science; Ardasheva, 2016; Ardasheva & Tretter, 2013; Chamot, 2007; Martínez-Álvarez, Bannan, & Peters-Burton, 2012; Montes, 2002). In other words, LLS are a potentially beneficial, malleable factor under control of educational systems.
Yet, evidence regarding the effectiveness of SI on language outcomes remains inconsistent. In their systematic review of SI literature, Hassan et al. (2005) found that out of 25 qualifying, experimental design investigations, 17 studies reported positive results, 6 studies reported mixed results, and 2 studies reported negative results. Overall, the authors concluded that evidence of SI effectiveness was the strongest for reading and writing, but less so for listening, speaking, overall proficiency, and vocabulary. In a subsequent meta-analysis of 61 studies with 95 unique samples, Plonsky (2011) found that SI had an overall positive, moderate effect on student outcomes and identified a number of study characteristics associated with SI effectiveness differentials. In contrast to Hassan et al. (2005), however, Plonsky (2011) reported that SI effectiveness was most pronounced for speaking and least pronounced for writing; the effects for reading and vocabulary were average (a similar effect for reading was reported in a narrower scope meta-analysis comparing the effectiveness of two interventions, SI and glosses, with reading serving as the only outcome examined; Taylor, 2014).
Notably, both Hassan et al. (2005) and Plonsky (2011) noted that the generalizability of their respective synthesis findings was undermined by the substantial between-study differences in structural features of SI treatments designed, particularly in early SI research, “based largely on convenience, intuition, and/or some level of idiosyncrasy” (Plonsky, 2011, p. 998). This is of particular concern given the notably greater standardization of intervention frameworks emerging in recent years due to recommendations for practice and research emanating from 30 years of SI research (Cohen & Macaro, 2007), the proliferation of pedagogical, how-to literature (e.g., Chamot, 2009; Rivera-Mills & Plonsky, 2007), and an emergent theoretical focus on self-regulated learning skills as an enabling mechanism of SI effectiveness (e.g., Grenfell & Macaro, 2007; Macaro, 2006). Such theory- and practice-driven changes in the SI field suggest a need to periodically reevaluate SI effectiveness across outcomes, considering the most recent trends and evidence.
The purpose of this meta-analysis, then, was twofold. The first main objective was to synthesize the most recent (2008–2014) SI research that emerged in response to recommendations for research and practice emanating from 30 years of SI research and the field’s call for a greater emphasis on self-regulated learning (see the discussion above). Relatedly, the second main objective, in contrast to the earlier meta-analysis by Plonsky (2011), was to estimate SI effects and their moderators separately for two domains, language and self-regulated learning. This decision was made in response to a shift from exclusively focusing on what is learned when a new language is acquired (the product or outcome of learning) to studying how a new language is learned (the process of learning) both in the field of L2 acquisition, in general (Griffiths, 2007; Oxford, 2011; see also Dörnyei, 2003, 2005) and in the field of SI, in particular (see the discussion above). In addition, this study explored a number of moderators not explored by previous meta-analytic research (e.g., language typology, technology- vs. instructor-delivered and researcher- vs. teacher-led SI intervention).
Strategy Instruction
Theoretical Underpinnings
In her seminal works, Rubin (1975, 1981) argued that observing “good language learners” (i.e., those identified as effective in acquiring a new language) and studying processes that help them become successful may inform theories of language processing. This information, in turn, could be taught to less successful language learners. The goals of SI, then, are to (a) increase learners’ awareness of the most effective methods of language learning (Cohen, Weaver, & Li, 1996; Dabarera, Renandya, & Zhang, 2014; De Silva, 2014; Hu et al., 2009) and (b) develop independent, self-regulated learners who actively participate in their own learning processes (Graham & Macaro, 2008; Oxford, 1999).
Focus on self-regulated learning, Zimmerman (1990) argued, is needed because the “initial optimism that teaching students various learning strategies would lead to improved self-regulated learning has cooled with mounting evidence that strategy use involves more than mere knowledge of a strategy” (p. 9; see also Dörnyei, 2003, 2005; Grenfell & Macaro, 2007). He further argued that to promote self-regulated learning, instruction should focus on supporting three component processes: (a) behavioral, the knowledge and use of learning strategies; (b) metacognitive, self-feedback regarding the effectiveness of learning and learning strategies and student responsiveness to such feedback; and (c) motivational, the interdependence between learning and motivational processes. Figure 1 summarizes these theoretically formulated relationships among SI, self-regulated learning, and achievement.

Figure 1. A model of strategy instruction effects on self-regulated learning and achievement
With regard to the behavioral component, much theoretical and empirical evidence supports the need for the learner to both know and use LLS. Cohen (1998) and Genesee, Lindholm-Leary, Saunders, and Christian (2005) observed that conscious awareness and use of LLS is characteristic of L2 development. This is because L2 learners, beginning to learn a new language at a more mature age, are more aware of language features they need to learn and can consciously draw on explicit LLS to enhance their learning. According to Macaro (2006), LLS do not simply make learning more efficient, but are “the raw material without which L2 learning cannot take place” (p. 332). Indeed, despite some inconsistencies (Nisbet, Tindall, & Arroyo, 2005; Takeuchi, 1993) and evidence to the contrary (Gardner, Tremblay, & Masgoret, 1997), a positive relationship between LLS and L2 proficiency has been documented in a number of studies (e.g., Lan & Oxford, 2003; Nahavandi & Mukundan, 2014; Peacock & Ho, 2003; see also Cohen & Macaro, 2007). In synthesizing findings from 12 studies, Oxford (1999), for example, reported that LLS use accounted for a substantial amount of variance in L2 proficiency, ranging from 21% (a study of Taiwanese students learning English in secondary and tertiary institutions) to 58% (a study of first-year English learners in a Japanese women’s college).
With regard to the metacognitive component, Zimmerman (1990) argued that learners need to engage a cyclic “self-oriented feedback loop” through which they monitor LLS effectiveness and respond to self-feedback in varied ways, “ranging from covert changes in self-perception to overt changes in behavior” (e.g., altering LLS use; p. 5). Many LLS researchers (e.g., Cohen, 1998; Hsiao & Oxford, 2002; Vandergrift & Tafaghodtari, 2010; Zenotz, 2012) argued that L2 success did not depend on the number or frequency of LLS use, but rather on the learner’s ability to select and orchestrate LLS that are most appropriate for completing a given learning task. After all, Grenfell and Macaro (2007) noted, if inappropriately used, any LLS may result in failure. Indeed, metacognitive knowledge, including knowledge of self, knowledge of task, knowledge of learning goals, and strategic competence, has been linked with higher L2 outcomes (Ardasheva, 2016; Ardasheva & Tretter, 2013; Graham & Macaro, 2008; Huang et al., 2009; Kolic-Vehovec & Bajsanski, 2007; Schoonen, Hulstijn, & Bossers, 1998; van Gelderen et al., 2004).
Finally, the motivational component of self-regulated learning relies on learners’ willingness to commit time, effort, and vigilance to initiate and regulate LLS (Zimmerman, 1990). In other words, Zimmerman (1990) contends, self-regulated learning is not only “self-determined in a metacognitive sense” (p. 6) but also self-motivated. With some inconsistencies (Takahashi, 2005; Vandergrift, 2005), research supported the existence of a direct (e.g., Ehrman & Oxford, 1995; Wang, 2008) or mediated (e.g., Ardasheva, 2016; Pae, 2008) relationship between motivation and L2 achievement. Examples of investigated individual difference characteristics mediating motivation include metacognitive strategies (Ardasheva, 2016), effort (Bernaus & Gardner, 2008), and intensity and self-confidence (Pae, 2008). Furthermore, studies found a robust association between motivation and LLS (e.g., MacIntyre & Noels, 1996; Schmidt & Watanabe, 2001; Vandergrift, 2005). Peacock and Ho (2003), for example, found that high LLS users reported being highly motivated and perceived learning an L2 as personally important and enjoyable. Other studies comparing high- versus low-motivation students found that the former group used LLS with greater frequency (Oxford & Nyikos, 1989) and reported knowing more LLS and tended to find LLS more effective and easier to use (MacIntyre & Noels, 1996). Importantly, research indicated that motivation may be stimulated by instructional environments that satisfy human needs for competence (the know-how), autonomy (self-initiation, self-regulation), and relatedness (“secure and satisfying relationships with others;” Deci, Vallerand, Pelletier, & Ryan, 1991, p. 327; see also Noels, 2001; Noels, Clément, & Pelletier, 1999; Wu, 2003). 
Taken together, these findings suggest that motivation is an important, manipulable-by-instruction component of self-regulated learning; its impact on L2 development may be direct or mediated by LLS and related individual difference characteristics.
Although, as noted above, a number of statistical modeling studies examined mediating effects of different aspects of self-regulated learning on L2 achievement, SI studies per se typically examine only the direct SI impacts on self-regulated learning and/or on L2 achievement, without considering the theoretically posited mediating effects. Following the typical SI study design, this meta-analysis will separately estimate SI effects and their moderators for two domains: language and self-regulated learning. Below, we first describe a typical SI study that exemplifies the common structures of SI studies included in this meta-analysis and then discuss variables that may moderate SI effectiveness depending on the operationalization of such common structures across individual studies.
A Typical SI Study Showcasing Common SI Structures
Macaro and Erler (2008) conducted a 14-month SI study in reading with a sample of young (11–12 year olds), beginner (Year 1) learners of French enrolled in six secondary schools in England. Six intact classrooms were matched on teachers’ years of experience and nonrandomly assigned to either treatment or control conditions (one control classroom dropped from the experiment before the end of the study contributing to an attrition rate of 30.2%; participants and nonparticipants did not differ in reading on pretest, suggesting that attrition was not an issue). Whereas 62 students in the treatment group received awareness raising of 12 strategies listed in a pre-intervention questionnaire plus six additional strategies, 54 students in control groups received regular instruction. Strategies targeted in the treatment group included cognitive (e.g., sounding out, using context clues, background knowledge) and socioaffective (e.g., asking for help, not giving up easily) strategies. Both treatment and control groups were taught by their regular teachers; treatment teachers used researcher-developed reading/SI materials, in addition to regular textbooks. SI lasted, on average, about 10 minutes per day with instructional procedures including the following steps: (a) awareness raising, (b) modelling of strategies, (c) scaffolded practice, (d) removal of scaffolding, and (e) evaluation. The latter component included individualized feedback by the teacher and pair and whole-group discussions. Importantly, “implicit in all these activities was the concurrent development of metacognition with regard to making decisions about clusters of strategies available, and evaluating strategies used” (Macaro & Erler, 2008, p. 105).
Participants were assessed pre- and postintervention on three researcher-developed measures: (a) a reading comprehension test, (b) a reading strategy use questionnaire, and (c) a French attitudes scale. The reading test—the posttest was administered 1 month after the intervention—included narrow (translation) and broad (idea unit identification) reading tasks. The complexity and length of L2 text increased from pre- to postintervention to reflect growing proficiency of the students; test responses were provided in the first language (L1; also referred to as the native language). To allow for results’ comparability, reading tests were scored as percent correct. Strategy use and attitudes measures were both self-report, with 3- and 5-point Likert-type scales, respectively. The reported reliabilities for all three measures were acceptable (.7 and above). Results indicated that SI (a) significantly improved comprehension with an effect size (ES)—corrected for small sample size—of 1.17, (b) “brought about changes in strategy use” (Macaro & Erler, 2008, p. 90; with some of the changes favoring control group), and (c) improved reading attitudes.
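A standardized mean difference "corrected for small sample size" is commonly computed as Hedges' g. The text does not state which formula Macaro and Erler (2008) used, so the following is only a minimal sketch under that assumption. The function name and all input values are hypothetical; the group sizes are borrowed from the study description above purely for illustration.

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_t + n_c - 2
    # Pooled standard deviation across treatment and control groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled  # uncorrected Cohen's d
    # Common approximation of the correction factor: J = 1 - 3 / (4*df - 1)
    j = 1 - 3 / (4 * df - 1)
    return j * d

# Hypothetical posttest means and SDs (percent correct), not the study's data
g = hedges_g(mean_t=60.0, mean_c=50.0, sd_t=10.0, sd_c=10.0, n_t=62, n_c=54)
print(round(g, 3))
```

Because the correction factor is always below 1, g is slightly smaller than the uncorrected d, and the difference shrinks as the samples grow.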
Yet, no benefits of SI were found in at least two other recently published L2 reading comprehension studies (Gladwin & Stepp-Greany, 2008; Takallou, 2011) qualifying for this meta-analysis; similar discrepancies in SI impacts were documented for other language (e.g., listening; Cross, 2009; Vandergrift & Tafaghodtari, 2010) and self-regulated learning (e.g., strategy use; Ranalli, 2013; Zenotz, 2012) outcomes. Cross (2009) and Vandergrift and Tafaghodtari (2010), for example, found no versus positive SI impacts, respectively. Notably, although both studies were similar in terms of SI instructional features and focused on adult FL learners, the two studies differed in terms of SI duration, participants’ proficiency levels, and L1–L2 linguistic proximity. These findings suggest that variations in common SI structures across individual studies may play a moderating role on SI effectiveness. We discuss such moderators next, including both known and novel (not explored by previous meta-analytic research) moderators.
Moderator Variables
Outcomes
Research on spontaneous strategy use linked LLS with higher L2 achievement across a broad range of language and self-regulated learning outcomes. Examples of the former include reading (Huang et al., 2009; Kolic-Vehovec & Bajsanski, 2007; Schoonen et al., 1998; van Gelderen et al., 2004), listening (Dreyer & Oxford, 1996; Peacock & Ho, 2003; Takeuchi, 1993; Vandergrift, Goh, Mareschal, & Tafaghodtari, 2006), speaking and writing (Peacock & Ho, 2003), overall proficiency (Nisbet et al., 2005; Takeuchi, 1993), and vocabulary and grammar knowledge (Dreyer & Oxford, 1996; Fraser, 1999; Peacock & Ho, 2003; Takeuchi, 1993). Examples of the latter include motivation (MacIntyre & Noels, 1996) and self-efficacy (Magogwe & Oliver, 2007).
Research on instructed strategy use, in turn, suggested that SI effectiveness varied depending on the targeted outcome, both for language and self-regulated learning domains. Hassan et al.’s (2005) systematic review, for example, found that SI “works for reading comprehension and writing skills, and [that] the research evidence for this is stronger than it is for listening, speaking and overall proficiency” (p. 6). Evidence regarding SI effectiveness for vocabulary, in turn, was judged as being weak. In line with Hassan et al.’s (2005) synthesis, Plonsky’s (2011) meta-analysis estimated that SI effects on overall proficiency and listening were negligible. In contrast to Hassan et al.’s (2005) work, however, Plonsky (2011) found that SI effectiveness was the strongest for speaking; the effects for reading and vocabulary were average and the effect for writing was the smallest. Furthermore, out of the two SI effects on self-regulated learning outcomes—namely, strategy use and attitudes—only that on strategy use was statistically significant and large in size; the effect on attitudes was estimated to be negligible.
Two reasons may account for such discrepancies: differences in the scope of work included in each synthesis (published over half a decade later, Plonsky’s [2011] study included eight more years’ worth of research) and differences in analytic approaches (meta-analytic techniques in Plonsky [2011] and team-assigned weights of evidence regarding study trustworthiness, methodological soundness, and relevance in Hassan et al. [2005]). In addition to these obvious reasons, the between-syntheses discrepancies in conclusions may be attributed to methodological advancements in the field itself, such as the differentiation between generic and task-specific LLS (e.g., speaking LLS, listening LLS; see Dörnyei, 2005; Hsiao & Oxford, 2002; Oxford, 2011; Oxford, Cho, Leung, & Kim, 2004). The latter trend, in particular, may explain the drop in the percentage of SI studies targeting overall proficiency over time. That is, among all synthesized studies targeting a specific language outcome, studies that focused on overall proficiency were 5 out of 27 in Hassan et al. (2005; see Table 4.1) and 4 out of 95 in Plonsky (2011), roughly 18% and 4% of all qualifying studies, respectively. (We discuss other advancements in the field later in the paper.) Taken together, these findings highlight language and self-regulated learning outcomes as an important moderator of SI effectiveness and suggest a need to periodically reevaluate SI effectiveness across outcomes, considering the most recent trends and evidence.
Furthermore, it is important to recognize that language outcomes, specifically, may be broadly categorized into language outcomes per se (speaking, listening, vocabulary, and grammar) and literacy outcomes (reading and writing). Although the two categories may somewhat overlap in adult L2 learning, this may not be the case for young learners who typically develop language skills first (for this reason, bilingualism does not necessarily imply biliteracy) and, as noted earlier, few SI interventions place an equal emphasis on both language and literacy skills. These considerations suggest that both learners’ age and proficiency (language and literacy skills) may moderate SI effectiveness, along with other learner and study characteristics discussed next.
Context
Researchers have long recognized that context-related variables may differentially affect the effectiveness of any intervention. Literature suggests that when it comes to SI interventions, such variables may include L2 versus FL setting, age, educational level, proficiency, and language typology.
L2 versus FL setting
SI effectiveness has been extensively tested in both FL (e.g., Graham & Macaro, 2008; Vandergrift & Tafaghodtari, 2010) and L2 (Gunning, 2011; Ranalli, 2013) contexts. Although it is difficult for individual studies to account for such setting differences, Plonsky’s (2011) meta-analysis did find that the SI effectiveness ES was almost two times larger in L2 than in FL contexts. A meta-analysis narrower in scope (see below) conducted by Taylor (2014), however, reported the opposite trend, with ESs favoring FL contexts. These two meta-analytic studies differed in two substantial ways.
First, the overall SI effectiveness (i.e., an overall, main ES) tested for FL–L2 setting moderator was operationalized differently in the two studies. That is, whereas in Plonsky (2011) the overall effect was synthesized across both language and self-regulated learning outcomes, in Taylor (2014), the overall effect integrated SI and glossing studies and was synthesized only for a single language outcome (reading). Second, whereas Plonsky (2011) considered two moderator levels (FL, L2), Taylor (2014) considered three moderator levels (ESL, EFL [English as a foreign language], FL). Such differences in construct conceptualization may have contributed to the above-mentioned discrepancy in research findings. To address this discrepancy for SI studies, we will ascertain the moderating effect of setting at three levels (ESL, EFL, FL) and we will do so separately for language and for self-regulated learning outcomes.
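A moderator analysis of this kind amounts to pooling effect sizes within each moderator level, separately by outcome domain. The sketch below uses simple inverse-variance (fixed-effect) weighting as an assumption; this section does not specify the pooling model, and a random-effects model would add a between-study variance term to each sample's variance. The function name and all effect sizes and variances are hypothetical, not data from the reviewed studies.

```python
from collections import defaultdict

def pooled_effects(samples):
    """Inverse-variance weighted mean ES per (domain, setting) subgroup.

    `samples` is an iterable of (domain, setting, effect_size, variance).
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # key -> [sum of w*es, sum of w]
    for domain, setting, es, var in samples:
        w = 1.0 / var  # more precise samples receive larger weights
        sums[(domain, setting)][0] += w * es
        sums[(domain, setting)][1] += w
    return {key: wes / w for key, (wes, w) in sums.items()}

# Hypothetical independent samples coded by domain and setting
samples = [
    ("language", "EFL", 0.9, 0.04),
    ("language", "EFL", 0.5, 0.04),
    ("language", "ESL", 0.6, 0.05),
    ("self-regulated learning", "FL", 0.8, 0.10),
]
result = pooled_effects(samples)
for key, es in sorted(result.items()):
    print(key, round(es, 2))
```

Keying the subgroups on both domain and setting keeps the language and self-regulated learning estimates from being averaged together, which is the distinction the paragraph above draws between the two earlier meta-analyses.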
Age and educational level
Age and educational level have been identified among variables directly related to “the choice, use, or evaluation” of LLS (Oxford & Leaver, 1996, p. 227; see also Hu et al., 2009). Magogwe and Oliver (2007), for example, found that learners preferred strategies of different complexity depending on their educational level: Whereas primary students favored social strategies, secondary (and above) students favored metacognitive strategies. Peacock and Ho (2003) found that older students used significantly more strategies than did younger students. These findings may be attributed to increasing cognitive maturity: with age, students develop increasingly sophisticated perceptions of self and academic tasks (Zimmerman, 1990).
Plonsky (2011), however, found a somewhat unexpected result when testing “whether greater (meta)cognitive capacity of adults offers an advantage over children” (p. 997). That is, the SI effect for younger learners (i.e., younger than 12 years, primary education) was 1.4 to 3.3 times larger than that for older and upper-educational–level students. As SI effectiveness research continues with different age groups, including children (e.g., Martínez-Álvarez et al., 2012), adolescents (e.g., Graham & Macaro, 2008), and adults (e.g., Soleimani, Zandiye, & Esmaeili, 2014), we will further investigate this issue by examining if age-related cognitive maturity may affect SI effectiveness differently, depending on the outcome—language versus self-regulated learning—domains.
Proficiency
With some notable exceptions (e.g., Hong-Nam & Leavell, 2006; Phillips, 1992), much research found a positive and linear relationship between LLS use and L2 proficiency (e.g., Hu et al., 2009; Nahavandi & Mukundan, 2014). Students with higher proficiency were found to use LLS more frequently (Dreyer & Oxford, 1996; Griffiths, 2007) and within an increasing range (Griffiths, 2007; Kaylani, 1996). Chesterfield and Chesterfield (1985), for example, found that increased levels of L2 competence seemed to imply the ability to use a larger range of increasingly sophisticated strategies progressing from receptive (cognitive) strategies to more interactive (socioaffective) strategies and to more self-regulatory (metacognitive) strategies by the end of their longitudinal study. Notably, this developmental pattern in LLS use documented in individual studies with “more strategy options becom[ing] available to the L2 learner as competency increases” (Taylor, 2014, p. 45) parallels, to some extent, findings indicating learners’ ability to make a greater use of SI at greater L2 proficiency levels. That is, although individual studies (e.g., De Silva, 2014; Jurkovic, 2010; Urlaub, 2012) continue to report discrepant results in terms of at what proficiency level learners benefit most from SI, Plonsky’s (2011) and Taylor’s (2014) recent meta-analyses suggested that, overall, SI is more beneficial for more advanced learners.
An important question remains, however: Do the relationships between learner proficiency and SI vary by the learning outcome—language versus self-regulated learning—domains? Based on previous empirical (Plonsky, 2011; Taylor, 2014) and theoretical (Zimmerman, 1990; see also Dörnyei, 2003, 2005; Oxford, 2011) research, one would expect that higher proficiency students would be greater SI beneficiaries and that the impacts for language and self-regulated learning outcomes within proficiency levels would be comparable in size. With some notable exceptions (confirming the latter but not the former hypothesis; De Silva, 2014), however, individual studies rarely control for proficiency and rarely do so for both outcomes, a shortcoming that would be easily addressed by the synthetic nature of the meta-analytic approach used in this study.
Language typology
Although the impacts of L1/L2 differences on SI effectiveness have not yet been closely examined, our interest in investigating this potentially moderating effect is grounded in research on transfer (also referred to as cross-linguistic influence; Kellerman, 1995), a phenomenon referring to transferring prior linguistic knowledge from L1 to L2. When L1 items are applied correctly to L2 contexts, transfer is said to be positive; negative transfer occurs when application of L1 forms disrupts performance in L2 (Saville-Troike, 2006). Although there have been different perspectives on the role of transfer in L2 acquisition (see discussions in Gass & Selinker, 2008; Kellerman, 1995; MacWhinney, 1992), there is evidence to suggest that L1/L2 differences may affect L2 acquisition (Navarra, Sebastián-Gallés, & Soto-Faraco, 2005; van Boxtel, Bongaerts, & Coppen, 2003).
Tao and Healy (1998), for example, found that English reading task performances of native speakers of Dutch and English were similar to each other but differed from the performances of native speakers of Japanese and Chinese. The similarity in Dutch and English speakers’ performance could be attributed to two language-related features not shared by the Japanese and Chinese languages. That is, the Dutch and English languages share the same script (Latin) and belong to the same language family (Indo-European), both features plausibly increasing opportunities for L1/L2 transfer due to shared orthographic and common grammatical features (the latter being a characteristic of genetically related languages; Fromkin, Rodman, & Hyams, 2007). On the other hand, whereas the Chinese language belongs to the Sino-Tibetan language family and uses the Han script, the Japanese language belongs to the Japonic language family and uses the Hiragana and Katakana scripts in addition to the Han script (see Ethnologue at https://www.ethnologue.com/statistics/family). To capture these two plausible language-related moderators in our study, we operationalize L1/L2 typological differences in two ways, namely, as L1/L2 belonging versus not belonging to the same language family and as L1/L2 sharing versus not sharing the same script.
Treatment
In an earlier systematic review of SI studies, Hassan et al. (2005) noted that between-study differences in structural features of SI treatments—such as the selection of strategy type (metacognitive, cognitive, or socioaffective strategies), strategy scope (single strategy or packaged together strategies), treatment duration (from up to 2 weeks to up to a school year), and instructional approach (awareness-raising vs. behavior-modeling; defined later in the article)—“limited the degree to which studies could be combined cumulatively” (p. 64) to assess the overall effectiveness of SI interventions. The authors called for greater standardization of intervention frameworks in the future and for more research comparing effectiveness of awareness-raising versus behavior-modeling approaches.
Subsequently, the use of meta-analytic techniques, as contrasted with Hassan et al.’s (2005) weight-of-evidence judgments, did allow Plonsky (2011) to quantify the extent to which differences in structural features of interventions, but not in the selection of an instructional approach, contributed to differences in SI effectiveness. In particular, Plonsky (2011) found that teaching fewer strategies (eight or fewer), targeting cognitive rather than metacognitive strategies, and providing longer interventions (over 2 weeks) were associated with larger SI impacts. Echoing Hassan et al. (2005), Plonsky (2011) noted the continued lack of SI standardization, which he attributed to a lack of comprehensive SI theory leaving “researchers and practitioners to design studies of SI based largely on convenience, intuition, and/or some level of idiosyncrasy” (p. 998).
It is worthwhile to note, however, that although Plonsky (2011) incorporated more recent research, both Plonsky’s (2011) and Hassan et al.’s (2005) reviews included an overlapping data subset of early (1980s and 1990s) studies. Due, in part, to recommendations for research and practice emanating from 30 years of SI research (Cohen & Macaro, 2007), the proliferation of pedagogical, how-to literature (e.g., Chamot, 2009; Rivera-Mills & Plonsky, 2007; see a discussion of SI models in Chamot, 2004), and an emergent theoretical focus on self-regulated learning—particularly on the metacognitive component of self-regulated learning—as an enabling mechanism of SI effectiveness (e.g., Grenfell & Macaro, 2007; Macaro, 2006), a notably greater standardization of intervention frameworks has gradually emerged in the past decade.
In particular, most recent SI interventions adopt an awareness-raising instructional model targeting task-specific strategy clusters—rather than single strategies—often across metacognitive, cognitive, and socioaffective strategy types (e.g., Dabarera et al., 2014; Lam, 2009; Macaro & Erler, 2008; Takallou, 2011; Vandergrift & Tafaghodtari, 2010). With some minor variations, the awareness-raising model typically includes the following steps: (a) consciousness raising—students reflect on learning and their current and potential strategies; (b) modeling—teachers introduce and model new strategies appropriate for the learning goal; (c) guided practice—students are given opportunities to practice strategies with gradual removal of teacher prompts to do so; and (d) evaluation and goal-setting—“students identify problem areas, select strategies that might help remedy them and evaluate their success” (Graham & Macaro, 2008, p. 753). Yet, a number of studies still employ a behavior-modeling approach including only modeling and practice (Kim, 2013; Morett, 2012; Urlaub, 2012).
Accordingly—beyond duration and strategy number moderators examined in Plonsky (2011)—we will explore whether the selection of SI approach (awareness-raising vs. behavior modeling) affects SI effectiveness. Other moderators possibly affecting SI effectiveness and not investigated in previous meta-analyses will include SI scope and delivery mode. With regard to the former, we will examine whether or not larger strategy selection scope (clustering strategies across metacognitive, cognitive, and socioaffective strategy types; e.g., Dabarera et al., 2014; Lam, 2009; Takallou, 2011) would be associated with greater gains than narrower strategy selection scope (only a single strategy type; e.g., Ding & Songphorn, 2012; Gladwin & Stepp-Greany, 2008). With regard to delivery mode, we will estimate whether technology-delivered instruction (an emergent trend in recent SI research; e.g., e-learning tools: Morett, 2012; Ranalli, 2013; video: Kim, 2013) would be associated with greater or lesser gains when compared with human-delivered instruction and whether greater familiarity with SI theories and procedures and a greater control over fidelity of implementation would benefit researcher-led (e.g., Baleghizadeh & Mortazavi, 2013; Bozorgian & Pillay, 2013; Chan, 2014; Dabarera et al., 2014) versus teacher-led (e.g., Grenfell & Harris, 2013; Gunning, 2011; Hu et al., 2009) interventions.
Methodological Features
Past meta-analytic research, across educational fields, has demonstrated that research design and implementation features (e.g., assignment to conditions) may serve as sources of variance in the estimated outcomes. Plonsky (2011), for example, found that although studies with stronger designs (pretest, random groups, reported reliability) consistently yielded larger effects than did studies with weaker designs (no pretest, nonrandom groups, unreported reliability), on average, all studies included in his meta-analysis yielded statistically detectable effects, regardless of the strengths or weaknesses of their designs. By contrast, Adesope, Lavin, Thompson, and Ungerleider (2010) reported that, for at least some of the examined outcomes, weaker design studies (reliability not reported) “produced a statistically detectable mean effect size” whereas stronger design studies (reliability reported) did not (p. 226). Thus, in addition to exploring substantively and practically meaningful moderators described in the literature review, we will also investigate methodological moderators. In particular, building on previous research, we will investigate the extent to which individual study quality markers (reliability reported: e.g., Adesope et al., 2010; Plonsky, 2011; and pretest and random assignment implemented: Plonsky, 2011) covary with study ESs.
Research Questions
There are three objectives beyond identifying the main effect of SI on language and self-regulated learning outcomes, specifically, identifying (a) L2 outcomes most affected by SI, (b) effectiveness differentials across contexts, and (c) structural features of strategy training practices associated with positive L2 outcomes. In addition, the proposed research will investigate ways in which methodological features of the studies relate to the observed outcomes. The following research questions will guide this study:
What is the overall effectiveness of SI in improving L2 outcomes? What L2 outcomes (reading, writing, listening, speaking, overall proficiency, vocabulary, grammar) are most positively affected by SI? What contextual (e.g., ESL/EFL/FL), treatment (e.g., SI delivery mode), language typology (e.g., L1/L2 script similarity), and research (e.g., random assignment) characteristics can moderate SI effectiveness on L2 outcomes?
What is the overall effectiveness of SI in improving self-regulated learning? What self-regulated learning outcomes (anxiety, self-efficacy, attitudes, strategy use, strategy effectiveness) are most positively affected by SI? What study characteristics (contextual, treatment, methodological) can moderate SI effectiveness on self-regulated learning?
Method
Data Sources
Literature Search Procedures
To ensure that a high proportion of eligible sources (both published and unpublished) are located, the search strategy included an electronic literature search as well as a number of complementary literature searches. Complementary literature searches served to locate literature not easily accessible through electronic databases (e.g., research reports, book chapters, dissertations, and theses). This was done to minimize the publication bias threat (i.e., underrepresentation of studies lacking statistically significant findings in published sources potentially leading to overestimating the effects of interest in meta-analytic studies; Lipsey & Wilson, 2001).
The electronic literature search included the following databases: (a) Linguistics and Language Behavior Abstracts, (b) ProQuest Dissertations and Theses, (c) ProQuest Research Library, (d) ERIC, (e) PsycINFO, (f) Sociological Abstracts, and (g) Web of Science. Basic search terms in three categories—intervention, outcome, methodological characteristics (e.g., LLS, L2, experiment)—and subject indices in each database were used to locate eligible studies. Individual databases use different keywords and search terms; thus, the basic search terms were tailored for each database to maximize the number of results and their relevance. An example of a library search strategy for ProQuest is: (strateg* NEAR/2 lang* OR learn* strateg* NEAR/2 lang* OR strateg* instruction NEAR/2 lang* OR strateg* train* NEAR/2 lang* OR learn* train* NEAR/2 lang* OR process-based teach* NEAR/2 lang* OR learn* skills NEAR/2 lang* OR learn* behave* NEAR/2 lang* OR self* learn* NEAR/2 lang* OR autonom* learn* NEAR/2 lang*) AND (ab(second language OR foreign language OR biling* OR immersion)) AND (ab(intervention OR treatment OR control OR comparison OR experiment OR effect OR impact OR outcome)). Complementary literature search procedures included manual search of the reference lists of qualifying studies and forward citations through Google Scholar of earlier syntheses.
Inclusion Criteria
To capitalize on a broader knowledge base and to reflect a large variability in language learners’ backgrounds, the present meta-analysis aimed to integrate studies conducted between 2008 and 2014 around the world in varied L1/L2 combination contexts. This decision is based on precedent in educational (August & Shanahan, 2006), applied linguistics (Spada & Tomita, 2010), and educational psychology (Adesope et al., 2010) research. To capture relevant studies on the effectiveness of SI, the following criteria for inclusion were developed:
(a) Studies must involve an explicit strategy-training intervention targeting a single or multiple learning strategies in one or multiple strategy categories (i.e., cognitive, metacognitive, and/or socioaffective).
(b) Studies must have been carried out in an L2/FL setting.
(c) Studies must be primary, quantitative investigations to allow for statistical data extraction.
(d) Studies must be experimental or quasi-experimental, between-groups designs, thus allowing for more valid inferences regarding instructional strategy effectiveness.
(e) Studies must be presented in English, French, Chinese, or Russian.
The language selection in “e” was limited to those languages which are spoken by the authors and in which research is typically published.
Data Coding
After running electronic and complementary literature searches, the titles and abstracts of identified studies were scanned using a Screening Guide that aligns with our inclusion criteria. The Screening Guide served as a basis for rater judgments regarding the likely relevance of the studies. If study relevance could not be determined from the readings of the titles and abstracts, full-length study reports were retrieved and reviewed. Final relevance decisions for all studies were based on the reading of the full texts.
The initial electronic and complementary literature searches yielded a total of 2,628 potentially relevant articles. Following a screening of titles and abstracts, 2,528 studies were removed. A full-text screening of the remaining 98 studies (2 studies could not be retrieved) was conducted independently by two authors; a third author was consulted in case of a disagreement. Reasons for excluding studies during the full-text screening process included irrelevant treatment (e.g., LLS were part of a larger scope intervention with no possibility of ascertaining the unique effects of LLS; n = 43), irrelevant design (e.g., two SI interventions were compared with no true control; results for language learners were reported in aggregate with those of native speaking participants; n = 15), and irrelevant outcomes (i.e., neither FL/L2 nor self-regulated learning outcomes were assessed; n = 1). The final sample of eligible studies across two research questions was 39: 37 studies (47 independent samples) for the first research question and 16 studies (17 independent samples) for the second research question (two studies were nonoverlapping, focusing exclusively on self-regulated learning outcomes).
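The screening counts reported above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the figures given in this paragraph; the variable names are illustrative and not part of the original screening protocol.

```python
# Counts reported in the screening paragraph.
identified = 2628
removed_at_title_abstract = 2528
not_retrievable = 2

# Reports that went through full-text screening.
full_text_screened = identified - removed_at_title_abstract - not_retrievable

# Full-text exclusions by reason, as reported.
excluded = {
    "irrelevant treatment": 43,
    "irrelevant design": 15,
    "irrelevant outcomes": 1,
}
eligible = full_text_screened - sum(excluded.values())

# Studies per research question; some studies contribute to both questions.
rq1_studies, rq2_studies = 37, 16
in_both = rq1_studies + rq2_studies - eligible
rq2_only = rq2_studies - in_both

print(full_text_screened, eligible, in_both, rq2_only)
```

Run as written, the tally is consistent with the 98 full-text reports, the 39 eligible studies, and the two studies that contribute only to the second research question.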
Next, studies meeting the inclusion criteria were coded by two independent raters using a pre-established coding protocol. The protocol was developed based on Valentine and Cooper’s (2003, 2008) recommendations for study quality assessment, piloted in a scoping review study, and refined by the research team. It included eight sections that captured information about the review process (e.g., reviewer and study identification), the study setting (e.g., educational level), sampling procedures (e.g., assignment to treatment), sample characteristics (e.g., age), treatment details (e.g., intervention duration), data collection (e.g., outcome measures), data analyses (e.g., unit of analysis), and the results (e.g., means, standard deviations). One of the outcomes of data extraction procedures is to generate a set of ESs to be synthesized in a meta-analysis. Another outcome is to code studies—that is, to generate a list of categorical codes along four dimensions: outcomes, learning contexts, structural features of SI treatments, and methodological characteristics of individual studies—for subsequent moderator analyses. After the initial training, the average interrater reliability on the coding protocol was about 92% with most disagreements falling under the treatment details category. All disagreements were resolved between the two coding authors, consulting with a third author as needed.
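The text does not name the index behind the 92% reliability figure. Assuming it is simple percent agreement (an assumption; a chance-corrected index such as Cohen's kappa would be computed differently), the calculation can be sketched as follows. The function name and the codes shown are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Share of coding decisions (in percent) on which two raters agree."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical setting codes assigned by two raters to five studies.
rater_a = ["EFL", "ESL", "FL", "EFL", "EFL"]
rater_b = ["EFL", "ESL", "EFL", "EFL", "EFL"]
print(percent_agreement(rater_a, rater_b))
```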
Data Analyses
Effect Size Estimation
The standardized mean difference (Cohen’s d) served as the measure of ES; its estimation for individual studies and integration across studies followed the common and best practices recommended by the Cochrane Handbook (Higgins & Green, 2011). For each individual study, a Cohen’s d index (Lipsey & Wilson, 2001) was computed initially. This index is a standardized measure of difference between two group means divided by the pooled standard deviation. Cohen’s d index was computed as
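$$d = \frac{\bar{X}_T - \bar{X}_C}{SD_{pooled}},$$
where
$$SD_{pooled} = \sqrt{\frac{(n_T - 1)\,SD_T^2 + (n_C - 1)\,SD_C^2}{n_T + n_C - 2}},$$
and where $\bar{X}_T$ and $\bar{X}_C$ are the treatment and control group means, $SD_T$ and $SD_C$ are the group standard deviations, and $n_T$ and $n_C$ are the group sample sizes.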
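As an illustration of the index described above (the difference between two group means divided by the pooled standard deviation), the following minimal sketch computes Cohen's d from summary statistics. The function names and the example values are hypothetical, and the sampling variance uses a common large-sample approximation that is an assumption here, since the text does not state which variance formula was applied.

```python
import math


def pooled_sd(n_t, sd_t, n_c, sd_c):
    """Pooled standard deviation of treatment and control groups."""
    return math.sqrt(
        ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    )


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference: (treatment mean - control mean) / pooled SD."""
    return (mean_t - mean_c) / pooled_sd(n_t, sd_t, n_c, sd_c)


def d_variance(d, n_t, n_c):
    """Approximate sampling variance of d (assumed formula, not from the source)."""
    return (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))


# Hypothetical posttest summary statistics for one study.
d = cohens_d(mean_t=75.0, sd_t=10.0, n_t=30, mean_c=67.0, sd_c=10.0, n_c=30)
print(round(d, 2), round(d_variance(d, 30, 30), 3))
```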
When studies reported more than one ES from the same sample (e.g., multiple measures of the same construct), we either selected the most relevant outcome (e.g., test rather than self-assessment) or used a weighted average ES (e.g., synthesizing results reported individually by LLS), so that each study only contributed one independent ES to the main analysis, and used the shifting unit of analysis approach (Cooper, 2010) when testing moderators. The shifting unit of analysis approach involves averaging effects when appropriate (e.g., averaging related language outcome effects when testing the overall SI effect), then splitting the effects when testing the dimension on which they differ. In this example, when asking whether program effects appear to be larger for one language outcome than for the other, a study with two or more language outcome measures contributed to both levels of that moderator.
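The shifting unit of analysis approach described above can be sketched in a few lines. This is an illustrative sketch only: the study labels and ES values are hypothetical, and effects are combined with a simple unweighted mean, whereas the text notes that a weighted average was used in some cases.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical effect sizes: (study, language outcome, d).
effects = [
    ("Study A", "reading", 0.60),
    ("Study A", "vocabulary", 1.00),
    ("Study B", "reading", 0.40),
    ("Study C", "listening", 0.90),
]


def overall_units(effects):
    """Main analysis: each study contributes one ES, averaged over its outcomes."""
    by_study = defaultdict(list)
    for study, _outcome, d in effects:
        by_study[study].append(d)
    return {study: mean(ds) for study, ds in by_study.items()}


def moderator_units(effects):
    """Outcome moderator: a study contributes one ES to each outcome level it measured."""
    by_level = defaultdict(lambda: defaultdict(list))
    for study, outcome, d in effects:
        by_level[outcome][study].append(d)
    return {
        outcome: {study: mean(ds) for study, ds in studies.items()}
        for outcome, studies in by_level.items()
    }


print(overall_units(effects))
print(moderator_units(effects))
```

In this hypothetical data set, Study A contributes a single averaged ES to the overall estimate but appears under both the reading and vocabulary levels when outcome type is tested as a moderator.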
Effect Size Integration and Moderator Analyses
Although interpretation of results could potentially be maximized by providing both the fixed- and random-effects models, a decision was made to use the random-effects model to compute the main language and self-regulated learning effects of explicit SI for two major reasons. First, unlike the fixed-effect model, the random-effects model assumes that the true ES varies from one study to the other (Hedges & Vevea, 1998). That is, the fixed-effect model assumes that all studies are estimating a single population parameter. This assumption implies that if all studies were infinitely large, they would all have the same ES. By extension, it also implies that methodological differences across studies do not matter. In contrast, the random-effects model assumes that studies are drawn from a distribution of ESs. This assumption implies that (a) differences in methods across studies do matter and (b) even if all studies were infinitely large, they would not all have the same ES. Thus, the random-effects model is more appropriate for our study because, even though recent SI research has seen a notably greater standardization of SI intervention frameworks, other variations in methods (e.g., duration, delivery mode) and samples (e.g., age, educational level) remain. Second, a random-effects model allows for broader generalization of results (Lazowski & Hulleman, 2015). Considering our goal to more broadly draw inferences beyond the studies included in this meta-analysis, we decided to use the random-effects model for the main effects.
As recommended by Borenstein et al. (2009), all moderator analyses were carried out using a mixed-effects model, which includes a random-effects model within subgroups. Within constructs, all individual ESs were integrated to compute a weighted average ES and its 95% confidence interval. Weighting of the contributing individual studies was done using the typical inverse variance weighting method. The heterogeneity of study effects was explored using a statistical significance test (Q statistics; a significant Q statistic indicates that the ESs vary more than would be expected given sampling error alone, suggesting that studies estimate different population parameters). To provide estimates of the amount (degree) of such heterogeneity, we also computed τ² and I². (These data are available on request.) All ES integration and moderator analyses were conducted using Comprehensive Meta-Analysis software.
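The quantities named above (inverse variance weights, the Q statistic, τ², I², and the random-effects weighted mean with its 95% confidence interval) can be illustrated with a minimal sketch. It assumes the common method-of-moments (DerSimonian-Laird) estimator of τ²; the text does not state which estimator was used, and this is not the Comprehensive Meta-Analysis implementation. The function name and input values are hypothetical.

```python
import math


def random_effects_summary(ds, variances):
    """Pool effect sizes under a random-effects model (DerSimonian-Laird tau^2)."""
    k = len(ds)
    w = [1.0 / v for v in variances]  # fixed-effect inverse variance weights
    sum_w = sum(w)
    fixed_mean = sum(wi * di for wi, di in zip(w, ds)) / sum_w

    # Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (di - fixed_mean) ** 2 for wi, di in zip(w, ds))
    df = k - 1

    # Method-of-moments estimate of the between-study variance, floored at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: percentage of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean_es = sum(wi * di for wi, di in zip(w_star, ds)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {
        "mean": mean_es,
        "ci95": (mean_es - 1.96 * se, mean_es + 1.96 * se),
        "Q": q,
        "df": df,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical study-level effect sizes and sampling variances.
print(random_effects_summary([0.30, 0.85, 1.20, 0.60], [0.04, 0.06, 0.09, 0.05]))
```

When τ² is estimated as zero, the random-effects weights reduce to the fixed-effect weights and the two models give the same pooled estimate.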
Missing Data
Missing data can undermine the interpretation of any meta-analysis. Publication bias arises because studies lacking statistically significant results are less likely to be published. Because published studies are easier to locate than unpublished studies, this can lead to a bias against the null hypothesis. There are no well-accepted statistical methods for dealing with this problem; thus, we used complementary literature searches to search for unpublished material (procedures to locate the highest number of eligible studies, both published and unpublished, are described earlier). In addition, to assess the likelihood of the presence and magnitude of publication bias, we carried out three different tests: Classic Fail-Safe N, Orwin’s Fail-Safe N, and Egger’s regression analysis (Borenstein et al., 2009).
Results
The first research question addressed the effectiveness of SI in improving L2/FL learning. The overall weighted mean (see Table 1) represents a large effect on Cohen’s (1988) scale. The second research question addressed the effectiveness of SI on self-regulated learning associated with L2/FL learning. The overall weighted mean for this learning domain also represents a large effect. These results indicate that, on average, the participants in the SI treatment groups scored approximately 0.8 and 0.9 standard deviations above those in control groups on language and self-regulated learning measures, respectively. Statistically significant homogeneity test results (Q statistics) indicated that the studies estimated different population parameters.
Table 1. SI effectiveness: Overall and by outcomes
Note. SI = strategy instruction; ES = effect size; K = number of comparisons; SR learning = self-regulated learning.
To assess the likelihood of the presence and magnitude of publication bias, we first conducted Egger’s regression test (Egger, Smith, Schneider, & Minder, 1997). With regard to the study sample for the first research question, the results indicated a symmetrical distribution of ESs around the weighted mean ES, suggesting that there was no publication bias (p > .05). Second, a “Classic fail-safe N” test was performed to determine the number of null effect studies needed to raise the p value associated with the mean effect above an arbitrary alpha level (α = .05). The Classic fail-safe N test revealed that 3,473 additional qualified studies would be required to nullify the overall ES found in this meta-analysis. Finally, we conducted a more stringent test (Orwin’s fail-safe N) to examine the number of studies needed to invalidate the results of this meta-analysis. Results showed that 533 missing null studies would be required to bring the mean ES found in this meta-analysis to a trivial level of .05. Similar results were found for the second research question sample (Classic fail-safe N = 718; Orwin’s fail-safe N = 176). The results of these fail-safe statistical tests show that the number of null or additional studies needed to nullify the overall ES found in this meta-analysis is larger than the 5k + 10 limit suggested by Rosenthal (1995). Hence, the meta-analytic results of this study are valid, not threatened by publication bias.
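The fail-safe statistics reported above follow standard formulas that can be sketched as below. This is an illustrative sketch rather than the Comprehensive Meta-Analysis implementation: the function names are hypothetical, the classic formula is shown in its combined-Z form with a one-tailed critical value of 1.645 as an assumed default, and Orwin's formula assumes that missing studies average an ES of zero unless another value is supplied.

```python
def classic_fail_safe_n(z_scores, z_alpha=1.645):
    """Null studies needed to raise the combined p value above alpha.

    z_scores: one Z statistic per included study.
    z_alpha: critical Z value (assumed default: one-tailed alpha = .05).
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k


def orwin_fail_safe_n(k, mean_es, trivial_es, missing_es=0.0):
    """Missing studies needed to pull the mean ES down to a trivial level."""
    return k * (mean_es - trivial_es) / (trivial_es - missing_es)


def tolerance_level(k):
    """The 5k + 10 benchmark against which a fail-safe N is compared."""
    return 5 * k + 10


# Benchmarks, assuming k is the number of independent samples (47 and 17).
print(tolerance_level(47), tolerance_level(17))
```

Under that assumption about k, the 5k + 10 benchmarks are 245 and 95, both well below the reported Classic fail-safe N values of 3,473 and 718.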
For individual outcomes, the largest effects were detected for vocabulary and reading, followed by listening, general proficiency, and speaking (language learning domain) and for strategy effectiveness, strategy use, and anxiety (self-regulated learning domain; see Table 1). Nontrivial effects were detected for writing, attitudes, and self-efficacy. Statistically detectable ESs were obtained irrespective of primary study designs’ strengths or weaknesses (i.e., whether reliability of instruments used was reported or not, the assignment was random or not, and pretests were administered or not; see Table 2). As elaborated below, a number of context, treatment, and methodological characteristics were found to have moderating influences on SI effectiveness for both language-learning and self-regulated learning domains. When there were three levels of the moderator and the omnibus moderator analysis indicated significant differences among the three levels, we conducted follow-up, pairwise comparisons to examine the source of the significant differences.
Table 2. SI effectiveness by subgroups
Note. SI = strategy instruction; ES = effect size; K = number of comparisons; ESL = English as a second language; EFL = English as a foreign language; FL = foreign language; N = number of participants; SR learning = self-regulated learning.
Because the average number of strategies taught across qualifying studies was eight, this number was set at the threshold separating studies with below and above the average number of strategies taught (a similar threshold was also set in Plonsky, 2011).
SI Effectiveness for Language
Six contextual variables—setting, age, educational level, proficiency, and two language typology variables—were assessed with respect to their relationship with the effectiveness of SI on language learning (see Table 2). Moderate to large effects were obtained for all subgroups with the smallest effect being associated with the ESL setting (ES = 0.58) and the largest effect being associated with elementary education level (ES = 1.31). The overall trends in the results show that larger effects on language learning were obtained in foreign language (EFL, FL) settings, for younger learners (both in terms of age and educational level), for learners above minimum proficiency, and for learners studying target languages belonging to the same family—but not necessarily having the same script—as their native languages.
In addition, six treatment-related variables were assessed (strategy scope, number of strategies, instructional procedures, duration, and two delivery variables). Medium to large effects were detected for all these subgroups with only two medium ES exceptions: SI interventions targeting more than eight strategies (ES = 0.55) and not focusing on metacognition (i.e., behavior modeling approach; ES = 0.57). The overall trends in the results show that larger effects on language learning were obtained for interventions that were shorter, focused on eight or fewer strategies, incorporated more than one strategy type, and used an awareness-raising approach. The effect for interventions targeting eight or fewer strategies (ES = 1.01) was significantly larger than that for interventions targeting more than eight strategies (ES = 0.55; Q = 6.08, p = .01).
Last, the three methodological moderators (reliability reported, random assignment, and pretest) were assessed. Medium to large effects were detected across different methodological features. The only medium-sized effect, associated with studies not using a pretest (ES = 0.53), was significantly smaller than that for studies using a pretest (ES = 0.72; Q = 4.69, p = .03).
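For two-level moderators such as those above, the between-subgroups Q statistic is referred to a chi-square distribution with 1 degree of freedom. The minimal sketch below shows how such a Q could be computed from subgroup summaries and converted to a p value. The function names and subgroup variances are hypothetical, and the sketch is not the Comprehensive Meta-Analysis implementation.

```python
import math


def q_between(subgroup_means, subgroup_variances):
    """Between-subgroups Q: weighted squared deviations of subgroup means
    from their inverse-variance weighted grand mean."""
    w = [1.0 / v for v in subgroup_variances]
    grand = sum(wi * mi for wi, mi in zip(w, subgroup_means)) / sum(w)
    return sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, subgroup_means))


def p_from_q_df1(q):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2.0))


# Q values reported for the strategy-number and pretest moderators.
for q in (6.08, 4.69):
    print(q, round(p_from_q_df1(q), 2))
```

Applied to the reported values, Q = 6.08 and Q = 4.69 round to p = .01 and p = .03, matching the text.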
SI Effectiveness for Self-Regulated Learning
With regard to self-regulation, the moderating effects for all contextual variables except language family were statistically significant (see Table 2). Prior to further discussion of the results for this learning domain, however, it is worthwhile to note that moderator analyses for educational level, duration, and procedures should be interpreted with caution as at least one of the comparison groups has a small sample size (i.e., a single study).
Overall, contrasting with results for language learning, to some extent, larger effects were obtained in ESL settings, for older learners (both in terms of age and educational level), for learners above minimum proficiency, and for learners studying target languages that differ from their native language to a greater extent (belonging to a different language family and not sharing the same script). A post hoc analysis conducted for a three-level settings moderator revealed that both the ESL and EFL effects were significantly larger than the FL effect (Q = 12.41, p < .001, and Q = 5.33, p < .05, respectively). There was no significant difference between the ESL and the EFL settings effects (Q = 2.45, p = .117). Another post hoc analysis for a three-level moderator revealed that the effect for university/adult learners was larger than those for middle/high school (Q = 8.19, p < .001) and elementary school (Q = 30.99, p < .001) learners. In addition, the effect for middle/high school learners was larger than that for elementary school learners (Q = 6.15, p < .05). Notably, the effect on self-regulated learning was significantly and substantively (by about 3.4 times) larger for older participants (12 years and older; ES = 0.98) than it was for younger participants (younger than 12 years; ES = 0.29; Q = 7.25; p = .01). Statistically and practically larger SI effects were found for learners studying languages whose scripts differ from, rather than resemble, those of their native languages (ESs of 1.12 and .26, respectively; Q = 9.83, p = .00).
For treatment-related variables, larger effects on self-regulated learning were obtained for interventions that were longer in duration, focused on a single strategy type, used an awareness-raising approach, were delivered by technology, and were researcher-led. Two of the moderating effects were statistically significant. Specifically, the effects were significantly larger for studies with SI focusing on only one strategy type (ES = 2.30) as contrasted with SI focusing on a combination of strategies (ES = 0.69; Q = 4.91; p = .03) and with SI being led by the researcher (ES = 0.97) as contrasted with SI being led by the teacher (ES = 0.31; Q = 9.75; p = .02).
For method, there was only one significant moderator effect associated with assignment to treatments. That is, studies employing random assignment designs reached a statistically larger ES (ES = 1.19) than did studies with nonrandom assignment designs (ES = 0.37; Q = 6.89; p = .01).
Correlations Among Moderators
Correlation analyses among statistically significant moderator results by domain and group indicated that for self-regulated learning, unsurprisingly, age and educational level positively correlated, r = .54, p < .05. Interestingly, there was a significant negative correlation between proficiency and script, r = −.67, p < .001. Script was coded as 0 for different and 1 for same; proficiency was coded as 1 for low and 2 for intermediate/high. No other correlations for statistically significant moderator results were found.
Discussion
The overall results of this meta-analysis indicate that SI works for improving both L2/FL learning and self-regulated learning. Notably, the overall effects were medium to large for language learning and self-regulated learning (see Table 1) and almost two times larger than the small to medium effect of 0.49 reported in Plonsky (2011). This finding may be attributed to greater emphasis on self-regulated learning—particularly on the metacognitive component of self-regulated learning—in more recent SI interventions (e.g., Dabarera et al., 2014; Lam, 2009; Vandergrift & Tafaghodtari, 2010; Zenotz, 2012) synthesized in this study. Indeed, whereas only about one fifth of primary studies identified by Plonsky’s (2011) meta-analysis examined SI impacts on self-regulated learning, almost half of the studies included in this meta-analysis targeted at least one of the three—metacognitive, motivational, or behavioral—dimensions of self-regulated learning (Zimmerman, 1990).
Thus, in addition to corroborating “empirical justification for integrating learner training programs into L2 curricula” (Plonsky, 2011, p. 1013), this study provides indirect empirical grounding for a theoretically postulated need for SI to be directed toward developing dimensions of self-regulated learning in students (Zimmerman, 1990; see also discussions in Graham & Macaro, 2008; Grenfell & Macaro, 2007). It is worthwhile to note, however, that the results for self-regulated learning are far from being conclusive, due to the still small number of studies focused on this outcome, necessitating further research in this area. In the following paragraphs, we discuss our findings in light of how SI effectiveness may be affected by contexts and structural features of the strategy training.
Moderation by Context
ESL, EFL, and FL Setting
As noted earlier, previous meta-analyses reported conflicting trends with regard to SI effectiveness across language instruction settings. That is, whereas Plonsky (2011) found that the overall SI effect—synthesized across language and self-regulated learning outcomes—was larger in L2 versus FL settings (the effects of 0.84 and 0.46, respectively), Taylor’s (2014) results favored FL settings for SI effectiveness on a language (reading) outcome (FL: 0.74; EFL: 0.54; and ESL: 0.51). Building on these two syntheses, we further break down the setting moderator analysis by outcome domains (language vs. self-regulated learning) and examine its effects at three (ESL, EFL, and FL) levels.
Our results show that, when it comes to language outcomes, there is a trend in means—similar to one reported in Taylor (2014)—favoring FL (FL: 0.79; EFL: 0.84) versus L2 (ESL: 0.58) settings. Also similar to Taylor (2014), we find that the differences among ESs are not statistically significant, suggesting that SI may be equally effective in improving language outcomes across L2/FL settings. To further increase our confidence in the validity of this encouraging finding, however, more primary research in L2 contexts is needed.
When it comes to self-regulated learning outcomes, our results are more in line with those reported in Plonsky (2011), with SI effects being significantly smaller in FL settings (ESs: 1.26; 0.93; 0.37 for ESL, EFL, and FL, respectively). Notably, the EFL effect is statistically similar to that associated with ESL and statistically different from that associated with FL context. This may be due to the ever-growing spread and prestige of English (Braine, 2005; British Council, 2013), increasingly associated with access to the social, cultural, and material resources (such as membership, education, employment; Bourdieu, 1986), and plausibly leading to increasing learners’ willingness to invest in their own identity (Norton, 2006) to gain such access. Although the smallest, the SI effect in FL contexts is nontrivial (approaching medium) in size, suggesting SI’s potential to improve—among other aspects of self-regulated learning—motivation, which is typically lower for learning FLs (Dörnyei, 2005), particularly in English-speaking countries (Grenfell & Harris, 2013; Macaro & Erler, 2008), and is of concern to FL educators (Grenfell & Harris, 2013).
Age and Educational Level
A trend in means, although with no statistically detectable mean differences, suggests that when it comes to language outcomes, younger learners tended to benefit from SI more. That is, the SI effect for younger learners was about 1.4 times larger than that for older learners (ESs: 1.05; 0.74); similarly, the SI effect for elementary students was about 2.0 and 1.7 times larger than those for secondary and postsecondary students (ESs: 1.31; 0.67; 0.79). However, when it comes to self-regulated learning outcomes, we see an opposite and statistically detectable trend favoring older students both with regard to age, below versus above 12 (ESs: 0.29; 0.98), and with regard to primary versus secondary and postsecondary educational levels (ESs: 0.19; 0.54; 1.18). As noted earlier, the results for self-regulated learning should be interpreted with caution due to the small number of studies focused on younger learners. The latter observation, however, is not surprising. Self-regulated behaviors, in general, follow the developmental trajectory: Older, more mature individuals are more likely to be in control of their learning behaviors and, therefore, are more likely to benefit more from SI with explicit focus on self-regulated learning. This developmental readiness may be a natural confound in how prevalent the incorporation of an emphasis on self-regulated learning is in SI interventions.
Overall, the “rich-gets-richer” hypothesis provides a plausible explanation for age and educational-level effects on both language and self-regulated learning domains. That is, the SI language disadvantage for older students may be attributed to inferior language learning abilities among older learners (i.e., limited plasticity in adult L2 learning [Chomsky, 1959] after about the age of 12 years [Long, 1990]). Other possible causes of younger learners’ language benefit, still within the rich-gets-richer hypothesis, include greater availability of simplified input, educational opportunities, and peer support at a younger age (Bialystok & Hakuta, 1999). Indeed, ample research (e.g., Norton, 2006; Rubenfeld, Sinclair, & Clément, 2007) has documented that adult L2 learning occurs in qualitatively different contexts as compared to child L2 learning (e.g., evening classes vs. structured curriculum at school) and with different purposes (e.g., leisure/hobby vs. academic preparation). These qualitative differences may underlie the developmental differences in L2 learning success. In turn, as noted earlier, the SI self-regulated learning benefit for older students may be attributed to superior cognitive abilities among older learners (e.g., more sophisticated understandings of tasks, academic self-perceptions, and abilities to monitor “differential effectiveness” of LLS to support learning; Zimmerman, 1990, p. 13).
Notably, although the SI language effect was statistically similar across age groups and educational levels, the self-regulated learning effect was significantly larger for older learners and learners in upper educational levels (p < .05). This suggests that superior cognitive abilities among older learners may, ultimately, compensate for plasticity-related constraints on their language learning ability while making older learners more likely to benefit from SI in terms of enhancing their self-regulated learning.
Proficiency
In terms of language outcomes, our results indicate that more proficient L2 learners tend to benefit more from SI. These findings are consistent with previous meta-analytic studies (Plonsky, 2011; Taylor, 2014) and can be attributed to what is known as the threshold hypothesis (Ardasheva, Tretter, & Kinny, 2012; Clarke, 1979, 1980; Cummins, 1979, 2000; Lee & Schallert, 1997; Schoonen et al., 1998). The hypothesis holds that a certain level of L2 competency (vocabulary, grammar) must be attained before learners are able to fully benefit in their L2 development from a broader range of (meta)cognitive skills such as LLS. However, it is worthwhile to note that although SI effects for language tended to be larger for more proficient than for less proficient students (ESs of 0.82 and 0.69, respectively), these effects were similar from both practical and statistical points of view (similar result patterns were reported in Taylor, 2014).
The lack of a statistically significant difference between proficiency groups may plausibly be attributed to the somewhat arbitrary nature of grouping learners by proficiency (Plonsky, 2011). That is, studies often identify proficiency levels in a relative rather than absolute sense by setting up within-group cutoff points (e.g., mean split, standard deviation distances from the mean), often based on researcher-developed measures (e.g., recall tasks, observation rubrics). To better understand the relationship between SI and proficiency levels, future individual studies need to strive, whenever possible, for greater standardization in identifying and reporting proficiency levels (e.g., reporting learners’ standings on such ubiquitous measures as the Test of English as a Foreign Language, placement tests, and K-12 standardized assessments) or, when standardized measures are not available, to provide more information on learners’ language history and context (e.g., years of prior L2 study, intensity of L2 study).
In terms of self-regulated learning outcomes, we see a similar trend benefiting more advanced learners, with the effect for higher proficiency students (1.06) being significantly (p = .01) and sizably (by 3.7 times) larger than that for lower proficiency students (0.29). However, there were some unexpected differences across learning domains in terms of ES magnitudes. That is, although self-regulated learning (1.06) and language (0.82) effects for higher proficiency students were comparable in magnitude, for lower proficiency students the self-regulated learning effect (0.29) was 2.4 times smaller than that for language (0.69). This appears to be inconsistent with SI assumptions. Namely, considering the theoretical postulate that more self-regulated behaviors would be associated with greater learning (Zimmerman, 1990; see also Dörnyei, 2003, 2005), one would expect that the ESs for self-regulated learning and language outcomes would be more similar in magnitude.
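The within-domain and cross-domain contrasts described above can be laid out side by side. This is a minimal arithmetic sketch under the assumption that simple ratios of the reported ESs are what the "times larger/smaller" figures refer to; the data layout and variable names are hypothetical.

```python
# ESs by (outcome domain, proficiency level), as reported in the text above.
es = {
    ("self_regulated", "higher"): 1.06,
    ("self_regulated", "lower"): 0.29,
    ("language", "higher"): 0.82,
    ("language", "lower"): 0.69,
}

# Within-domain contrast: higher vs. lower proficiency, self-regulated learning.
srl_high_vs_low = round(
    es[("self_regulated", "higher")] / es[("self_regulated", "lower")], 1
)

# Cross-domain contrast for lower proficiency students:
# language effect relative to self-regulated learning effect.
low_lang_vs_srl = round(
    es[("language", "lower")] / es[("self_regulated", "lower")], 1
)

# Cross-domain contrast for higher proficiency students, for comparison:
# self-regulated learning effect relative to language effect.
high_srl_vs_lang = round(
    es[("self_regulated", "higher")] / es[("language", "higher")], 1
)

print(srl_high_vs_low, low_lang_vs_srl, high_srl_vs_lang)
```

The much smaller cross-domain ratio for higher proficiency students than for lower proficiency students is the asymmetry the paragraph above describes as unexpected.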
This discrepancy in effect magnitudes across learning domains is far from conclusive, however, for a number of reasons. First, the number of reports examining SI effectiveness for self-regulated learning outcomes among low proficiency learners was too small (N = 4; only about one fourth of the total sample) to strongly support any conclusions. Second, a number of studies assessed SI effectiveness only for self-regulated learning, but not for language, outcomes (e.g., Baleghizadeh & Mortazavi, 2013; Kashef, Pandian, & Khameneh, 2014), potentially obscuring the comparison of effect magnitudes across learning domains.
Finally, to gauge SI impacts on self-regulated learning, most studies measured frequency of LLS use. As noted earlier, L2 success does not necessarily depend on the number or frequency but rather on the efficacy of LLS use (Cohen, 1998; Hsiao & Oxford, 2002; Vandergrift & Tafaghodtari, 2010; Zenotz, 2012). Yet, only two qualifying studies (Lam, 2009; Ranalli, 2013) measured LLS efficacy, using observation techniques and screenshots, respectively, to gauge learners’ ability to effectively select and orchestrate LLS targeted for instruction.
Taken together, these considerations suggest a need for additional research to better understand the relationship between SI and proficiency levels, specifically, more studies that would (a) gauge SI impacts on both language and self-regulated learning, particularly with lower proficiency students; (b) provide enough data for effect extraction; and (c) capture LLS efficacy.
Language Typology
With regard to language typology, our results suggest that SI effects on self-regulated learning are greater in contexts with greater L1/L2 distance (as suggested by trends in means), particularly when L1 and L2 do not share the same script (as indicated by statistically detectable mean differences). This may be due to learners’ seeing a greater value in and experiencing a greater need for applying conscious processes, including LLS, when learning a new language that is less similar to their own.
When it comes to language outcomes, the trend in the means suggests that SI may be more effective when languages are more similar grammatically (same family) but less similar orthographically (different scripts). This may be due to SI’s facilitating learners’ L1 to L2 transferability decision making (Kellerman, 1995) regarding common grammatical features and learners’ needing more conscious strategies when dealing with script differences. The effect differences, however, were not statistically significant. This suggests that, when affecting language outcomes, SI may be equally effective with both more and less similar L1/L2 combinations. Additional primary research with a more fine-grained operationalization of L1/L2 differences would shed more light on this issue.
Moderation by Treatment
The results regarding treatment characteristics highlight a number of variables affecting SI effectiveness. First, as expected, SI effects associated with the awareness-raising approach tended to be larger than those associated with the behavior-modeling approach. The lack of a statistically significant difference, however, is somewhat unexpected and will be discussed later in the article. Furthermore, similar to Plonsky (2011), our meta-analysis provides strong support for a less-is-more approach when it comes to deciding on the number of strategies to teach to improve learners’ language skills. Although focusing on a greater number of strategies appeared to benefit SI impacts on self-regulated learning to a greater extent, this advantage was not statistically significant. Rather, for self-regulated learning outcomes, a significant less-is-more benefit emerged in studies that focused on a single type of LLS. This may be attributed to more focused instruction and a closer conceptual or procedural similarity among strategies of the same type, making them easier for the learner to internalize.
Unlike Plonsky’s (2011) study, our results show a trend in means benefiting shorter interventions for improving language skills (but longer interventions for improving self-regulated learning skills). The lack of a statistically significant difference between shorter and longer intervention durations, however, suggests that both short-term (intensive) and longer term (incremental development over time) interventions may be equally beneficial for the learner and that the decision regarding treatment length may be best made by the teacher depending on learners’ needs. This finding, although far from conclusive and necessitating more longitudinal research with delayed posttests to gauge long-term impacts, alleviates some of the earlier concerns regarding SI’s taking valuable time away from direct language instruction (Bialystok & Hakuta, 1990, as cited in Plonsky, 2011; Rees-Miller, 1993) and provides support for current pedagogical recommendations for instruction balanced between language and (meta)cognitive skills (Chamot, 2009; Peregoy & Boyle, 2012).
With regard to SI delivery methods and agents, two interesting findings emerged. First, the statistically similar ESs for methods suggest that technology-delivered SI is not inferior to teacher-delivered SI. In fact, the trend in the means suggests greater benefits for the latter method, particularly when it comes to self-regulated learning outcomes. In turn, the use of technology to deliver SI has the potential to further alleviate the cost–benefit considerations noted earlier. That is, delivering SI entirely, or some aspects of it, on a computer (Ranalli, 2013; Urlaub, 2012) or using videos (Kim, 2013) has the potential to minimize demands on the teacher and on instructional time, as learners may access these resources independently and proceed at their own pace. Additionally, “online SI might also facilitate more rigorous and ecologically valid research designs” by allowing for more accurate collection of both product and process data (Ranalli, 2013, p. 76). These findings, however, need to be treated with caution, as they are based on only a few studies using technology, and there is clearly a need for more primary research in this area.
Second, the results indicate that, when it comes to language outcomes, researcher-led SI and teacher-led SI are equally effective, a nontrivial finding providing some support for ecological validity of SI research. However, the SI effect on self-regulated learning was significantly and sizably (by three times) smaller in teacher-led versus researcher-led interventions. This may be due to at least some of the educators’ coming from traditional, behavior-based teacher preparation programs. Educators with such preparation are likely to believe in the value of and be comfortable with providing LLS modeling and practice opportunities (basic components of the behavior-modeling approach), but may be less skilled with or willing to invest classroom time in consciousness-raising and effectiveness evaluations regarding LLS (two components essential for the awareness-raising model). This, in turn, is likely to lead to learners’ adopting LLS, but without either conscious awareness or attribution of their successes to LLS use.
Taken together, these considerations provide a plausible explanation for the lack of statistically significant differences between awareness-raising and behavior-modeling approaches noted earlier and highlight a need for further investigation of the issue in primary studies. From a practical perspective, these findings suggest a need for a greater emphasis on the awareness-raising model and on self-regulated learning and metacognition as underlying mechanisms of SI effectiveness in educator preparation and professional development programs as well as in SI curricular materials. From a research perspective, these findings highlight a need for further research on teachers’ implementing SI, in particular in terms of “what teachers know and how they use it” (Cohen & Griffiths, 2015, p. 426).
Another concern regarding SI delivery agent is that the number of studies in which the researcher (or the teacher/researcher) led SI was notably larger than that of studies with regular teachers’ leading SI (respectively, about 53% and 21% of studies for language and about 47% and 24% for self-regulated learning) and a substantial number of studies did not clearly state who delivered SI. To better ascertain ecological validity of SI—after all, teachers are the ones who typically deliver SI—more research on teacher-led SI interventions is needed.
Conclusions and Recommendations
Given a continued interest in SI among educators and researchers (Cohen & Griffiths, 2015), a growing decline in FL-learning motivation (Grenfell & Harris, 2013; Macaro & Erler, 2008), and a simultaneous need to improve L2/FL proficiency “to boost stability, employability, and prosperity” in a globalized world (British Council, 2013, p. 3), examining the extent to which SI may affect learners’ L2/FL learning and self-regulated learning is essential.
The overall results of this meta-analysis indicate that SI works for improving both learning domains. From a theoretical perspective, this study provides empirical grounding for a theoretically postulated need for SI to be directed toward developing dimensions of self-regulated learning (Zimmerman, 1990; see Grenfell & Macaro, 2007). From a practical point of view, this study offers several insights. In particular, this study (a) corroborates “empirical justification for integrating learner training programs into L2 curricula” (Plonsky, 2011, p. 1013); (b) provides guidelines for more effective SI designs by identifying student populations that are more prone to benefit from SI and treatment features that are associated with greater gains; and (c) suggests a need for a greater emphasis on self-regulated learning as an underlying mechanism of SI effectiveness in teacher preparation and professional development programs as well as in SI curricular materials.
Additional research is needed to further investigate the relationships among self-regulated learning, language learning, and SI. In particular, there is a need for primary studies that would instructionally incorporate and assess, in addition to language outcomes, all three dimensions (metacognitive, motivational, behavioral) of self-regulated learning and assess not only the direct effects of SI (SI on language learning and on self-regulated learning) but also its mediating effects (SI on language learning via self-regulated learning). When assessing the behavioral dimension of self-regulated learning, more primary research is needed to gauge SI impacts on the efficacy (e.g., Lam, 2009; Ranalli, 2013), rather than on the frequency, of LLS use. After all, L2 success does not depend on frequency of LLS use but rather on the learner’s ability to select and orchestrate LLS effectively (Macaro, 2006). Finally, more primary studies with teacher-led and technology-delivered interventions are needed to better ascertain the ecological validity of SI and to further explore the benefits of bringing technology to the field.
Authors
YULIYA ARDASHEVA is an assistant professor in ESL/bilingual education at Washington State University Tri-Cities, Department of Teaching and Learning, College of Education and Human Development, 2710 Crimson Way, Richland, WA 99354, USA; email:
ZHE WANG received his doctorate degree from Washington State University and is currently a post-doctoral researcher at Washington State University Pullman, Cleveland Hall 80, Pullman, WA 99163, USA; email:
OLUSOLA O. ADESOPE is an associate professor in educational psychology at Washington State University Pullman, 1155 College Avenue, Cleveland Hall 356, Pullman, WA 99164-2136; email:
JEFFREY C. VALENTINE is a professor in educational psychology at University of Louisville, 309 CEHD, Louisville, KY 40290; email:
