Abstract
Corpus analyses of learners’ dictionaries of English idioms have revealed that 11% to 35% of English figurative idioms show either alliteration (miss the mark) or assonance (get this show on the road), depending on the type considered. Because English multiword combinations, particularly idiomatic expressions, present a huge challenge even to advanced learners, techniques for helping learners come to grips with this part of the lexicon should be welcomed. A quasi-experiment was conducted to investigate whether interword phonological similarity (specifically, alliteration and assonance) facilitates the delayed recall of the forms of common second language (L2) English figurative idioms which were not known at pretest. Twenty-six advanced-level EFL learners learned significantly more phonologically similar, or ‘sound-repeating’, idioms than phonologically dissimilar control idioms after a treatment designed to raise awareness of phonological similarity and to direct learners’ attention toward occurrences of it. Learners in a comparison group (n = 24), who experienced no awareness raising or attention direction, recalled more non-sound-repeating control idioms than sound-repeating ones. We conclude that the presence of sound-repetition in idioms makes the forms of those idioms relatively easy to recall, but only when learners experience appropriate awareness raising and attention direction. It appears that the techniques of awareness raising and attention direction did not hinder learning of the control idioms.
I Introduction
In order to attain a high level of proficiency in a foreign language, learners need to familiarize themselves with very many formulaic sequences, a hard challenge for most post-childhood learners (Li & Schmitt, 2009; Qi & Ding, 2011). One especially troublesome type of second language (L2) formulaic sequence is that of idioms. Various means have been suggested whereby teachers and materials writers might help learners acquire L2 idioms with greater success (for reviews, see Boers & Lindstromberg, 2009; Boers, Lindstromberg, & Eyckmans, 2012). One proposal, directed particularly at the learning of idiom forms, is to raise learners’ awareness of the prevalence of patterns of interword phonological similarity, or sound repetition, in English idioms. This proposal is based on two assumptions. The first is that given appropriate pedagogical intervention, certain patterns of sound repetition render L2 phrasal expressions relatively memorable. The second is that these patterns of sound repetition are sufficiently common in the L2 phrasal lexicon for the first assumption to be of interest. In this article both assumptions will be addressed.
Idioms are so multifarious that it is common for researchers to regard the category as fuzzy (e.g. Nunberg, Sag, & Wasow, 1994; Zyzik, 2011). Although our proposals are not necessarily limited to any one type of idiom, in this article we focus on so-called figurative idioms, these being relatively fixed, typically informal, conventional expressions whose meanings arise at least in part from a trope such as metaphor, metonymy, hyperbole, and/or irony (Gibbs, 1994; Nunberg et al., 1994). In adopting this focus we follow other researchers of L2 idiom learning (such as Boers, Eyckmans, & Stengers, 2007; Grant, 2007; Zyzik, 2011).
II Sound repetition in English phraseology: Alliteration and assonance
The current study is concerned with two patterns of interword, intra-phrase sound repetition, alliteration and assonance. We define alliteration as the occurrence of the same consonant onset in two or more content words within a phrase (e.g. miss the mark) and assonance as the repetition of a vowel in a prominent syllable of two or more content words within a phrase, as in j
III Experimental evidence for positive effects of sound repetition in lexical recall
In recent decades most experimental investigations of the mental lexicon have taken place within a connectionist framework in which it is expected, for instance, that activation of a word in memory (the activation being brought about, for example, by the mention of a particular topic) may spread to other words. That is, these other words are also activated, whereby they too become more available for processing and for production. Within a connectionist framework it is expected that the activation of one word by another is especially likely to take place if the two words are semantically related and/or phonologically similar (e.g. Gupta & MacWhinney, 1997; Luce, Goldinger, Auer, & Vitevitch, 2000; Storkel & Morrisette, 2002). The connectionist view accounts well for the results of a vast number of investigations of effects of ‘phonological similarity’ (i.e. sound repetition) on the accessibility of words in memory. Some 20 years ago Rubin (1995) noted that hundreds of studies had been carried out to investigate phonological similarity effects on participants’ ability to learn lists of ‘paired associates’, that is, pairs of items, typically short first language (L1) words or invented L1-like nonwords (for an overview, see Rubin, 1995). In these studies the item pairings are stipulated by the researchers. Typically, participants go through several cycles of ‘practising’ (i.e. trying to memorize) the pairings and then being tested on them. A strong finding is that when participants are asked to recall practised word pairs in whatever order they like, pairs that show sound repetition are learned faster and recalled better than ones that do not. This holds true for rhyme (e.g. Bower & Bolton, 1969; Nelson & Garland, 1969), for alliteration+assonance (e.g.
A second extensive subliterature is concerned with learning lists of unpaired words or nonwords. When participants are asked to recall these items in any order, they tend to succeed better with ones that show alliteration and rhyme (Gupta, Lipinski, & Aktunc, 2005) as well as basic assonance (Watkins, Watkins, & Crowder, 1974), although assonance has been less studied.
Positive effects of phonological similarity have also been observed in studies of implicit priming, where observed response times are shorter when primes and response words alliterate than when they do not, with response times being shorter still when primes and response words both alliterate and assonate (for a summary, see Levelt, 1999).
An additional relevant stream of research concerns the extent to which relations of phonological similarity make primarily oral poetic texts relatively easy to recall. As might be expected, research has identified rhyme and alliteration as factors that facilitate the ability to recall traditional counting-out rhymes and passages of song lyrics: Provided that one of a pair or set of phonologically similar words is recalled, recall of the other word(s) is made more likely (Rubin, 1995).
We know of no studies in any of the research streams touched on above that directly address phonological similarity effects on the recollectability of conventionalized phrases in either L1 or L2. One probable reason for this research gap is the extreme difficulty – even in a laboratory setting – of controlling for nuisance semantic variables when the stimulus expressions are real phrases: After all, achieving approximate control is hard enough in the case of real single words. However, in practice-oriented L2 research a more exploratory approach may be called for than is normal in psycholinguistics. Accordingly, there have been a number of (quasi)experimental paired-design studies of the extent to which alliteration and assonance can help upper-intermediate and advanced learners recall or recognize short L2 English phrasal expressions that host these patterns better than similar phrases that do not (Boers, Eyckmans, & Lindstromberg, 2014; Boers, Lindstromberg, & Eyckmans, 2012, 2014a, 2014b; Lindstromberg & Boers, 2008a, 2008b). In these studies, the stimulus expressions have typically been strong collocations (e.g.
IV Aim and research questions
Our aim in this experiment was to investigate the relatively durable, rather than short-term, retention of the forms of previously unfamiliar L2 English idioms. The research questions were:
With respect to recollectability from semantic memory, do previously unfamiliar sound-repeating idioms have an inherent advantage over previously unfamiliar non-sound-repeating control idioms which comes to the fore even in the absence of pedagogical intervention (i.e. awareness raising and attention direction)?
If not, does an experimental treatment including brief awareness raising about patterns of sound-repetition in idioms and a simple attention direction task lead to higher recall rates of the forms of sound-repeating idioms than when participants are not made aware of the sound patterns in the target idioms?
If so, does this outcome occur at the expense of recall of the non-sound-repeating idioms in the set of idioms to be learned?
These research questions were operationalized by comparing the number of idioms that participants could not complete at pretest with the number that they could complete in a delayed posttest administered a week later. There was also an immediate posttest; but it plays a minor role in this report. For one thing, our interest did not lie with ephemeral form retention. For another (as will be seen), a pronounced ceiling effect is evident in these data, making inferential statistical analysis problematic.
In the experimental condition, the treatment included brief awareness raising about patterns of sound repetition in idioms, following which participants were set a marking task intended to direct their attention to instances of sound repetition among the set of stimulus idioms. For brevity, this condition is sometimes referred to below as the ‘AD’ (Attention Direction) condition, and the group that experienced it as the ‘AD group’. In the comparison condition, the treatment consisted simply in the participants being asked to study the targeted sound-repeating and non-sound-repeating idioms according to their own preferred personal learning strategies. Occasionally, when a brief name is necessary, the comparison group is referred to below as the ‘noAD group’.
V Method
1 Participants
The 50 participants who completed all stages of the experiment were Dutch-speaking students aged 19 to 21 majoring in English and an additional foreign language at the Department of Translation, Interpreting and Communication of Ghent University, Belgium. They formed two intact classes of 26 and 24 students, respectively, and their level of proficiency in English was estimated as B2 according to the Common European Framework of Reference for Languages (CEF), which corresponds to an IELTS (International English Language Testing System) score of 5 to 6.5. Each class was assigned to one of the treatments (AD group: n = 26; noAD group: n = 24).
2 Materials
The expressions targeted in this experiment were 26 L2 English idioms drawn from two learner’s dictionaries, as detailed further below. We screened out candidate idioms that include Dutch–English cognate content words or that refer to the same or a very similar scenario in English as in Dutch. For example, bury your head in the sand and throw the baby out with the bath water were rejected for one of these reasons. Of the stimulus idioms, 13 show sound repetition (as shortly to be described) and 13 ‘control’ idioms do not (see Table 1). With the exception of the delayed posttest, on every occasion that the students encountered the target idioms – always in a different random order – there were two filler idioms before the block of target idioms and another two at the end. This was to mitigate primacy and recency effects. These fillers (turn back the clock, read between the lines, reach for the stars, bridge the gap) were never presented in the same order twice. Of the sound-repeating idioms, six show alliteration but not assonance (e.g.
Table 1. The 26 targeted idioms.
Attempts were made to control for variables that are known to influence retention but which are extraneous to the variables of interest, those being, first, the presence (or absence) of sound repetition and, second, the presence (or absence) of awareness raising and attention direction. Table 2 summarizes how the two sets of idioms compare in terms of length, frequency, and imageability of meaning. The somewhat lesser average length of the control idioms was expected to favour their recall very slightly. The considerably greater average whole phrase and word frequencies of the sound-repeating idioms might be thought likely to favour their recall. However, it has been found that the amount of attention paid to a lexical item may correlate negatively with familiarity and, therefore, with frequency (see Tulving & Kroll, 1995). Moreover, as a reviewer pointed out, frequency matters most in contexts of incidental learning, which our study did not involve. Indeed, as will be seen, item frequencies appear to have played a negligible role in the outcomes we observed. Transparency of meaning is another variable of interest with respect to idiom learning. We judged that none of the targeted idioms was likely to be transparent to the participants to any great degree. However, we did not systematically assess their degree of transparency for the reason that all were to be introduced along with examples and glosses. It may be noteworthy here that the results of a study reported by Steinel, Hulstijn, and Steinel (2007) suggest that degree of semantic transparency (which they define in terms of the degree of overlap of literal and figurative meanings) is not a strong predictor of how well learners will recall the form of any given L2 idiom.
A variable we were much more concerned with is imageability of meaning (i.e. the degree to which a given idiom conveys a sensori-motoric image) since imageability has long been known to be a powerful facilitator of lexical recall (Paivio, Yuille, & Madigan, 1968) even in the case of L2 English idioms (Steinel, Hulstijn, & Steinel, 2007). To make the experiment a good test of whether sound repetition can have a positive mnemonic effect, it was essential that the imageability of the sound-repeating idioms be equal to or even (to be on the safe side) somewhat less than that of the control idioms. As we knew of no published imagery ratings for idioms, we showed 33 undergraduate L1 Dutch informants a list of candidate idioms and collected from them ratings of these idioms on a 9-point scale. Specifically, the informants were each given a version of the study sheet used in the experiment that showed 32 idioms rather than the eventual smaller number of 26, and it was on this sheet that the informants wrote their imageability ratings. They were also given an instruction sheet reading as follows:
Could you give your impression of the imageability of each of the following idioms? Specifically, how much does it suggest to you an image that is visual or which involves movement or physical contact? Please give a separate rating for each idiom on a scale of 1 (‘difficult’ to form an image) to 9 (‘extremely easy’ to form an image). (No half scores, please.)
Table 2. Comparison of the targeted sound-repeating (SR) idioms and the nonSR control idioms.
Notes. a Includes nouns, verbs, and adjectives. b This was counted as a noncontent word of one syllable. c As given by the online British National Corpus Simple Search facility: http://www.natcorp.ox.ac.uk/using/index.xml. d Lemma frequencies are as given by the Corpus of Contemporary American English (http://corpus.byu.edu/coca) (Davies, 2008–16) in September 2013.
Informants were asked to consider not only the idioms themselves but also the glosses and the examples of use shown on the study sheet. To minimize overall variation in the amount of attention paid to individual idioms, each informant was randomly assigned a different idiom as a starting point on the study sheet. It turned out that the 16 candidate phonologically similar idioms were rated as more imageable on average than the 16 non-similar ones. The desired balance was achieved by discarding the three highest rated phonologically similar idioms and the three lowest rated non-similar ones. Table 1 shows the final list of 26 idioms.
The targeted idioms were drawn from the Oxford idioms dictionary for learners of English (Parkinson, 2006) apart from two drawn from the online version of the Cambridge advanced learner’s dictionary and thesaurus (CALDT) (http://dictionary.cambridge.org/dictionary/british/). The glosses and examples given on the tests and on the study sheet that participants consulted during the study phase were drawn from Parkinson (2006) except where the definition or example in CALDT was clearly superior. For example, Parkinson defines miss the mark as ‘not succeed in achieving or guessing something’. Because this seemed vague to us, we chose the gloss in CALDT: ‘fail to achieve the result that was intended’. For this idiom we also used the example given by CALDT, since Parkinson does not provide one.
3 Procedure
In a pen-and-paper pretest administered by one of the authors, participants were asked to complete the 26 target idioms along with four fillers. Each of the 30 test items had the same format: A single noun was gapped in each targeted idiom, except for that noun’s first letter. With one exception, the gapped word was always the word following the article the (e.g.
Next, a study booklet was handed out. It consisted of a section for each idiom (including the four fillers). Each section presented (1) the complete canonical form of a targeted idiom, (2) a definition for it taken from a learner’s dictionary, and (3) an illustration of usage also taken from a learner’s dictionary; for example:
. Continue doing something until it has finished or been completed, even though it is difficult. ‘Very few of the trainees
. Fail to achieve the result that was intended. ‘Her speech
In the comparison condition participants were asked to study the idioms on these sheets. No mention was made of sound repetition, and students were given no task designed to cause them to pay more attention to phonological form than they would if left to their own devices. In the attention direction condition the researcher first briefly outlined a common rhetorical function of idioms − that of summing up one’s attitude to an event or situation (McCarthy, 1998, pp. 131–149) − and proposed that sound repetition, as in tear one’s hair [out], might have rhetorical impact over and above any impact of the idiom’s meaning alone. The participants were then asked to study the idioms and to mark the keywords in the idiom that share a sound with the boxed content word. The researcher recommended to participants that they repeat the target idioms subvocally in order to become more aware of any sound repetition. For both groups, time was called after 15 minutes, and these sheets were collected.
In both conditions the participants were told they would afterwards be tested on their knowledge of the form of each of the idioms. The first posttest, which followed immediately, consisted of test items in which all the content words of the idiom were gapped except for the first letter. The appropriate gloss was repeated from the study-sheet, but not the example of usage.
H____________ B____________ F____________ TO F____________. Have more important, interesting, or useful things to do.
This test was different from and more challenging than the pretest simply because we judged that at this stage of the experiment, a test similar to the pretest would be so easy that participants would be able to complete it while paying little attention to the precise wording of the idioms targeted, which would have run counter to our hope that this test would play a role in consolidating all participants’ learning of the targeted forms. Even so, a number of participants got maximum scores, as detailed further below.
When everyone had finished this test (after approximately 10 minutes) the test sheets were collected, and the instructor moved on to unrelated matters. A week later, the unannounced delayed post-test was given. This test had the same format as the pretest; that is, one content word in the targeted idiom was gapped except for its first letter, and after the idiom there was a gloss. As this was the final test to be given, no fillers were included. When everyone had finished the test (after approximately 7 minutes) the test sheets were collected.
VI Data analysis, and results
In our design the outcome variable, recall of forms, was measured by the raw scores on three tests: (1) a pretest of ability to produce targeted idioms when given a first letter cue for one omitted content word; (2) an immediate posttest where multiple content words were omitted except for their first letters; and (3) a one-week delayed posttest that was essentially identical to the pretest. Since we were out to measure the retention of the forms of previously unfamiliar L2 English idioms, the words supplied in the aforementioned test formats needed to be spelled correctly in order for an item to be scored as correct. The independent variable ‘presence/absence of awareness raising and attention direction’ defined the experimental condition (with awareness raising and attention direction) and the comparison condition (without awareness raising and attention direction). The other independent variable was ‘presence/absence of sound repetition’, which was nested under ‘Idioms’. Each participant was tested on his or her knowledge of the 26 targeted idioms, meaning that for each participant the test produced two key measures: specifically, one aggregated raw score (out of a possible 13) for the sound-repeating idioms, and another aggregated raw score (out of 13) for the control idioms. To measure learning from the pretest to a posttest, each participant’s relevant aggregated raw scores were converted into ‘gain scores’, which were derived from the raw scores according to the standard formula:
Table 3. Scores of the attention direction group (n = 26) for the sound-repeating (SR) idioms and the nonSR control idioms.
Table 4. Scores of the comparison group (n = 24) for the sound-repeating (SR) idioms and the nonSR control idioms.
Tables 3 and 4 give details for the immediate posttest only for completeness. Note, though, that the rather large number of maximum scores indicates the presence of a ceiling effect in these data. Despite this, the pretest-to-immediate-posttest and pretest-to-delayed-posttest learning trends were basically the same. This is shown by Figures 1 and 2, each of which includes one ladder plot for each of the four combinations of +/– attention direction and +/– sound repetition. In each ladder plot, a diagonal line extending from a pretest score to a delayed posttest score represents the learning trajectory of one participant. (There may be some super-positioning of lines despite random jittering of lines.) In other words, each ladder plot is a profile of gains and each profile in Figure 1 is similar to the corresponding profile in Figure 2. 1 Note that no idiom was known at the pretest by all participants. 2

Figure 1. Gain profiles for the attention direction (AD) condition (top row) and the no attention direction (noAD) condition (bottom row), and for the non-sound-repeating (noSR) control idioms (left column) and the sound-repeating (SR) idioms (right column).

Figure 2. Gain profiles for the attention direction (AD) condition (top row) and the no attention direction (noAD) condition (bottom row), and for the non-sound-repeating (noSR) control idioms (left column) and the sound-repeating (SR) idioms (right column).
Because it is controversial to use ANCOVA to analyse scores from intact groups of participants (e.g. Cribbie & Jamieson, 2004), we opted for gain score analysis. As a preliminary, we assessed all the relevant univariate and bivariate score sets using, respectively, the Anderson–Darling and the Kolmogorov–Smirnov tests of distributional normality (α = .20) and concluded overall that it was not safe to assume that our samples had been drawn from populations of normally distributed values. Accordingly, we decided to rely on robust significance tests based on the 10% trimmed mean (α = .05, two-sided), which is a common compromise between the mean and the median when nonnormality is a threat but outliers do not seem to be rife. Since Cohen’s d can be quite inaccurate when data are nonnormally distributed, for an estimator of effect size we settled on a version of d based on the 10% trimmed mean, dtr. (Wilcox, 2012, pp. 377–385). Given normality and equal variances dtr and d are the same; but dtr is more robust to distortion by outliers and other common violations of the assumptions underlying the use of d. Values of dtr were computed using Rand Wilcox’s R function ‘akp.effect’ (Wilcox, 2012, p. 385). 3 We tested the difference between participants’ pretest-to-delayed-posttest aggregated gain scores for the sound-repeating and the control idioms using Wilcox’s (2012, pp. 406–407) ‘yuendv2’ R function to run a robust version of the paired t-test. Key results are as follows:
The attention direction group: 10% trimmed MD = 1.36, 95% CI [0.07, 2.63], t(21) = 2.24, p = .036, dtr = .56. 4 This value of dtr would typically be thought to indicate an effect of medium size, although it happens to be above average for effects observed in educational research generally (Grissom & Kim, 2012, pp. 127–130).
The comparison group: 10% trimmed MD = −0.90, t(19) = −1.79, p = .089, dtr = −.38. 5 This value of dtr indicates a smallish medium effect in the opposite direction to that seen in the attention direction condition.
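The logic of a paired comparison based on the 10% trimmed mean can be sketched as follows. This is a minimal illustration, not Wilcox's 'yuendv2' function: it applies the simpler one-sample trimmed-mean t-test to per-participant difference scores, the function names are our own, the data are invented, and gain scores are assumed to be already computed. One point it does reproduce is the degrees of freedom: with 10% trimming, two values are dropped from each tail of both groups, leaving 26 − 4 − 1 = 21 and 24 − 4 − 1 = 19.

```python
import math

def trimmed_mean(values, prop=0.10):
    """Mean after dropping floor(prop * n) values from each tail."""
    xs = sorted(values)
    g = math.floor(prop * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)

def winsorized_variance(values, prop=0.10):
    """Sample variance after pulling each tail in to the nearest kept value."""
    xs = sorted(values)
    n = len(xs)
    g = math.floor(prop * n)
    wins = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    mean = sum(wins) / n
    return sum((v - mean) ** 2 for v in wins) / (n - 1)

def trimmed_t_paired(sr_gains, control_gains, prop=0.10):
    """Trimmed-mean t-test on difference scores (a simplification of the
    robust paired test used in the study). Returns (trimmed MD, t, df)."""
    diffs = [s - c for s, c in zip(sr_gains, control_gains)]
    n = len(diffs)
    g = math.floor(prop * n)
    tm = trimmed_mean(diffs, prop)
    se = math.sqrt(winsorized_variance(diffs, prop)) / ((1 - 2 * prop) * math.sqrt(n))
    return tm, tm / se, n - 2 * g - 1

# Invented gain scores for 26 hypothetical participants (not the study's data).
sr = [i % 5 + 3 for i in range(26)]
control = [(i * 3) % 4 + 2 for i in range(26)]
md, t, df = trimmed_t_paired(sr, control)
print(f"10% trimmed MD = {md:.2f}, t({df}) = {t:.2f}")
```

The p value would then be read from a t distribution with the returned degrees of freedom.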
Unlike ANCOVA, gain score analysis is subject to loss of statistical power in the presence of a substantial floor or ceiling effect in the posttest scores (Cribbie & Jamieson, 2004). For the delayed posttest, Table 3 (penultimate column) shows three maximum scores, all relating to the SR idioms. Table 4 shows three maximums for each type of idiom. However, a small proportion of maximum scores does not necessarily mean that there was a ceiling effect. A better diagnostic is based on the finding that a ceiling effect results in a decrease in variances from pretest to posttest (Cribbie & Jamieson, 2004). In Table 5 we see such a decrease only for the scores for the sound-repeating idioms in the attention direction condition. If these scores actually were influenced by a ceiling effect, then – because of lowered statistical power – we observed a significant positive effect of sound-repetition in the attention direction condition despite the ceiling effect not because of it. 6
Table 5. The variances of the pretest and delayed posttest scores for the sound-repeating (SR) and the control idioms in the attention direction (AD) and the comparison (noAD) conditions.
Notes. This is the only instance of decrease.
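The variance-based diagnostic described above amounts to a simple comparison, sketched here with invented scores. The function name and the data are hypothetical; only the idea of checking for a drop in variance from pretest to posttest comes from the text.

```python
from statistics import variance

def variance_change(pretest, posttest):
    """Return (pretest variance, posttest variance, decreased?).
    A decrease is the pattern a ceiling effect would be expected to produce."""
    v_pre = variance(pretest)
    v_post = variance(posttest)
    return v_pre, v_post, v_post < v_pre

# Invented raw scores out of 13 for four hypothetical participants.
pre = [2, 4, 6, 8]
post = [11, 12, 13, 13]  # scores bunching up near the maximum
v_pre, v_post, decreased = variance_change(pre, post)
print(f"pretest var = {v_pre:.2f}, posttest var = {v_post:.2f}, decrease: {decreased}")
```

A decrease on its own does not prove a ceiling effect; it only flags the score sets that deserve a closer look.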
Because the two groups of participants were not formed by random assignment, it cannot be presumed that all potentially confounding variables were under control. But a comparison of the scores of the two treatment groups may be of interest even though the results must be regarded as suggestive. We again focus on the two groups’ pretest-to-delayed-posttest gain scores (Tables 3 and 4). To make comparisons between the AD and the noAD conditions we used Wilcox’s (2012) ‘yuen’ R function for a two-independent sample test of 10% trimmed means. We found that the superiority of the per-participant gain scores for the SR idioms in the AD condition over the same idioms in the noAD condition was extremely close to being statistically significant: MD = 1.37, 10% trimmed MD = 1.49, CI [−0.01, 2.99], t(39.25) = 2.01, p = .051, dtr = 0.56. The corresponding statistics for the control idioms in the AD condition vs the noAD condition are: MD & 10% trimmed MD = −0.77, CI [–1.99, 0.45], t(40.00) = −1.28, p = .201, dtr = −0.36. Note the medium-sized positive estimate of effect associated with the first comparison and the considerably smaller negative estimate associated with the second comparison.
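For readers unfamiliar with the two-sample test of trimmed means, the computation can be sketched as below. This is our own plain-Python rendering of Yuen's statistic, not the code of Wilcox's 'yuen' function, and the data are invented. The degrees of freedom are of the Welch type, so they cannot exceed (h1 − 1) + (h2 − 1), where h is the number of values left after trimming; for groups of 26 and 24 with 10% trimming that ceiling is 21 + 19 = 40, which fits the reported values of 39.25 and 40.00.

```python
import math

def trimmed_mean(values, prop=0.10):
    xs = sorted(values)
    g = math.floor(prop * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)

def winsorized_variance(values, prop=0.10):
    xs = sorted(values)
    n = len(xs)
    g = math.floor(prop * n)
    wins = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    mean = sum(wins) / n
    return sum((v - mean) ** 2 for v in wins) / (n - 1)

def yuen_two_sample(x, y, prop=0.10):
    """Yuen's test statistic and Welch-type df for two independent groups."""
    def squared_se(values):
        n = len(values)
        h = n - 2 * math.floor(prop * n)  # values remaining after trimming
        return (n - 1) * winsorized_variance(values, prop) / (h * (h - 1)), h
    d1, h1 = squared_se(x)
    d2, h2 = squared_se(y)
    t = (trimmed_mean(x, prop) - trimmed_mean(y, prop)) / math.sqrt(d1 + d2)
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return t, df

# Invented gain scores for two hypothetical groups (not the study's data).
group_a = list(range(1, 11))
group_b = [v + 3 for v in group_a]
t, df = yuen_two_sample(group_a, group_b)
print(f"t({df:.2f}) = {t:.2f}")
```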
We have not yet mentioned how accurately participants in the attention direction condition marked the sound-repeating idioms as such or how well they refrained from marking control idioms. With 26 participants and 13 idioms in each of the two sets, 338 is the maximum possible number of correct marks. The number of marks given the sound-repeating idioms was 236 (69% of the total possible). Control idioms were wrongly marked as sound-repeating 27 times (8% of the maximum possible). If our finding of a positive mnemonic effect of awareness raising and attention direction reflects reality, it seems evident that this effect does not depend on learners identifying instances of sound repetition with a high degree of accuracy.
So far we have largely looked at scores matched to participants. But, as we have seen, the scores can also be matched to idioms or, more generally, (linguistic) ‘items’. Since there were two kinds of idioms in the experiment, one way of organizing each treatment group’s gain scores by items is to put them in two independent sets, with one set comprising the 13 gain scores for the sound-repeating idioms and the other set comprising the 13 gain scores for the control idioms. Because the effective sample size is then less than what it is in the by-participants format (i.e. 13 instead of 26 or 24), the statistical power that can be brought to bear is greatly reduced, meaning that in order to find p < α, the effect size must be larger. 7 In L2 research this by-items perspective is rarely taken into account along with the by-participants perspective, in contrast to practice in psycholinguistic research. The omission is unfortunate since only attention to both perspectives can provide a solid basis for generalizing findings not just to different learners but to different items as well. Accordingly, we analysed the by-items gain scores using an independent samples bootstrap t-test with 10,000 bootstrap simulations (Wilcox, 2012, p. 341). The key statistics for the attention direction group are: MD = 2.46, 10% trimmed MD = 2.55, CI [–1.14, 6.23], t = 1.37, p = .165, dtr = 0.55. 8 If we refer back to the results from the corresponding by-participants analysis, we see that the new p value is about five times as large whereas the observed effect size is almost exactly the same (0.56 vs. 0.55). The comparison group by-item data also show the same trend as the by-participants data (in particular, dtr = –.38 vs. dtr = –.37). Despite the encouraging stability of the estimates of effect size, we emphasize that the prospects of replicating our findings in a study using different idioms would be greater if our by-items and by-participants p values were in accord. 9
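The by-items logic, with two independent sets of 13 per-idiom gain scores, can be illustrated with a percentile bootstrap for the difference between 10% trimmed means. Note that this is a swapped-in, simpler technique than the bootstrap t-test used in the analysis above, and that the function names, the seed, and the per-idiom scores are all invented for illustration.

```python
import math
import random

def trimmed_mean(values, prop=0.10):
    xs = sorted(values)
    g = math.floor(prop * len(xs))
    kept = xs[g:len(xs) - g]
    return sum(kept) / len(kept)

def bootstrap_ci_trimmed_diff(x, y, prop=0.10, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for trimmed_mean(x) - trimmed_mean(y)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each item set with replacement, keeping its original size.
        bx = [rng.choice(x) for _ in x]
        by = [rng.choice(y) for _ in y]
        diffs.append(trimmed_mean(bx, prop) - trimmed_mean(by, prop))
    diffs.sort()
    cut = round(alpha / 2 * n_boot)
    return diffs[cut], diffs[n_boot - cut - 1]

# Invented per-idiom gain scores for 13 SR and 13 control idioms.
sr_items = [14, 9, 17, 12, 20, 8, 15, 11, 18, 13, 10, 16, 7]
control_items = [10, 12, 6, 14, 9, 11, 8, 13, 7, 15, 10, 9, 12]
low, high = bootstrap_ci_trimmed_diff(sr_items, control_items)
print(f"95% CI for the trimmed mean difference: [{low:.2f}, {high:.2f}]")
```

With only 13 items per set, intervals of this kind tend to be wide, which is the loss of power the paragraph above describes.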
Lastly, we return briefly to the variable of item frequency. If we refer again to Table 2, we can see that compared to the control idioms, the sound-repeating ones have greater median frequencies as whole idioms, in terms of content words, and in terms of words intended as responses in the pretest and delayed posttest. To get an idea of the roles of these three nuisance variables in our study we checked the association between pretest-to-delayed-posttest aggregated gain scores and each of these measures of frequency. We found that none of these measures accounts for more than 0.9% of the variation in gain scores. Moreover, the first order correlations are not all signed in the same direction.
VII Discussion
Let us now consider the three research questions in turn. Regarding research question 1, our data are not consistent with sound-repeating (phonologically similar) idioms being more memorable than the non-sound-repeating control idioms in the absence of pedagogical intervention since the comparison group remembered the sound-repeating idioms markedly less well than they did the control idioms. Research question 2 asks whether the forms of sound-repeating idioms can be rendered extra-memorable for learners by providing some awareness raising about patterns of phonological similarity (i.e. sound repetition) and by providing also a task that induces learners to look out for instances of phonological similarity in idioms that they encounter. Our results suggest that the answer to this question is ‘yes’. (In Figure 2, the ladder plot for the attention direction group, top right, runs higher than the plot for any other group.) Finally, research question 3 asks whether the awareness raising and attention direction experienced by the participants in the attention direction condition detracted from their learning of the control idioms. Because we did not form the two treatment groups by random assignment of participants, there is no firm basis for direct comparison of one treatment group with the other. However, at pretest these two groups of participants were alike in many respects. And, as can be seen in Tables 3 and 4, the groups’ mean pretest scores for both types of idiom are fairly similar, as are their gains in the case of the control idioms (137 for the AD group vs. 145 for the noAD group), which is not so for the sound-repeating ones (169 for the AD group vs. 123 for the noAD group). True, the comparison group’s mean (pretest-to-delayed-posttest) gain score for the control idioms is 5.8% higher than the attention direction group’s mean gain score for those idioms.
But if we refer to the two plots in the left column of Figure 2, we see that these plots show very similar gain profiles for the control idioms in both learning conditions; and both profiles are consistent with pretty good learning of these idioms over the period of the experiment. Figure 3 indicates perhaps even more clearly that the control idioms were learned well in both conditions. To approach research question 3 in a different way, recall that it was the attention direction group that made the greatest learning gains across both types of idiom altogether. Specifically, for this group there are 306 gain scores in total vs. 268 for the comparison group, which is, again, 14% more (306AD / 268noAD = 1.14). All in all, there is no evidence that the experimental treatment engendered a trade-off effect, let alone one large enough to be substantively important.
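The arithmetic behind these comparisons can be retraced with a short sketch. It uses only the four gain figures quoted above; the dictionary layout and the helper names are our own illustrative choices.

```python
# Gain figures as quoted in the Discussion.
# AD = attention direction group, noAD = comparison group.
gains = {
    "AD": {"control": 137, "sound_repeating": 169},
    "noAD": {"control": 145, "sound_repeating": 123},
}


def total_gain(group: str) -> int:
    """Sum of a group's gains over both types of idiom."""
    return sum(gains[group].values())


def pct_more(a: float, b: float) -> float:
    """By what percentage a exceeds b."""
    return 100.0 * (a / b - 1.0)


# Overall gain: attention direction group relative to comparison group.
overall = pct_more(total_gain("AD"), total_gain("noAD"))

# Control idioms only: comparison group relative to attention direction group.
control = pct_more(gains["noAD"]["control"], gains["AD"]["control"])

print(total_gain("AD"), total_gain("noAD"))
print(f"overall: {overall:.1f}% more for AD")
print(f"control idioms: {control:.1f}% more for noAD")
```

The totals come to 306 and 268. The comparison group's advantage of about 5.8% on the control idioms is therefore outweighed by the attention direction group's advantage of about 14% overall.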

These plots show for each condition how the variable ‘type of idiom’ relates to by-idiom test scores in a robust linear regression model.
VIII Conclusions and suggestions for further research
On the delayed posttest, participants in the attention direction condition recalled sound-repeating idioms (i.e. alliterative and/or assonant ones) better than the non-alliterative and non-assonant control idioms. In the comparison condition the direction of the trend was reversed, even though the stimulus idioms remained the same. The cause of this reversal is likely to have been the absence from the comparison condition of a pedagogical intervention focusing on patterns of sound repetition. The fact that the control idioms were better recalled than the sound-repeating ones in the condition where participants were left to study the stimulus idioms any way they wanted makes it all the more plausible that the outcome in the attention direction condition is directly attributable to the pedagogical intervention.
Overall, our findings are very much at odds with any possibility that alliteration and assonance may have a practically significant mnemonic effect in the absence of a pedagogical intervention such as that seen in the condition with awareness raising and attention direction. As to our third research question, whether a pedagogical focus on alliteration and assonance may adversely affect the learning of idioms that show no such pattern of sound repetition, we found no evidence of such a trade-off effect. A key fact in this regard is that when pretest-to-delayed-posttest gain scores for both types of idiom were counted together it was found that the attention direction group made the greater overall gain.
We have emphasized that our study focused on the retention of forms not meanings. As a reviewer pointed out, we did not investigate the possibility that direction of learners’ attention to forms, as in our experimental treatment, might reduce learners’ attention to, and acquisition of, the meanings of these forms. This is indeed a noteworthy limitation of the current study: Any (partial) replication of it would be particularly useful were it to cast light on whether such a trade-off occurs and, if so, how serious it is.
We have indicated also that the firmest possible basis for generalization from the findings of a study such as ours to situations involving other learners and other items comes from a statistical analysis that takes both by-participants and by-items perspectives into account and finds p < α for both. Because we found p < α only in the by-participants perspective, it is imperative that any replication of this study involve not just new participants but new idioms as well. That said, our results chime with results of other studies (which did involve different phrasal expressions) that indicate that it can be worthwhile for teachers to take a small amount of extra time in class to raise learners’ awareness of patterns of sound repetition in targeted phrasal expressions and to stage relevant attention direction exercises (e.g. Boers & Lindstromberg, 2005, study 3; Lindstromberg & Eyckmans, 2014).
It may be asked why the comparison group (noAD) participants remembered the control idioms markedly better than the sound-repeating ones, which was a result that we did not expect. In both Figure 1 and Figure 2 it can be seen that the lower-right plot (noAD, +SR) is different from all the others. Looking at Figure 2 we see three rather flat diagonal lines at the bottom of the score range. These indicate that three of the noAD participants who had very low pretest scores made especially poor gains on sound-repeating idioms at the delayed posttest: There is even one case of decline. In an attempt to discover why the control idioms showed superior memorability in the comparison condition we consulted the list of imagery ratings for 40,000 English lemmas compiled by Brysbaert, Warriner, and Kuperman (2014). From this list we were able to obtain ratings for 25 of the 26 content words that were gapped in the pretest and delayed posttest. The mean ratings for the two types of idioms are 4.68 (sound-repeating) and 4.61 (control). We aggregated the delayed-posttest scores of the two groups and then did the same with their gain scores. The Pearson’s correlations between these scores and imagery ratings are, respectively, r = .17 and –.01, neither of which suggests that differential imageability of the gapped words influenced our findings to any practically significant extent. While we are unable, then, to suggest reasons why particular idioms among the ones we used as stimuli might inherently be more or less memorable in the absence of attention direction, one clue may lie in the standard deviations associated with the whole-idiom imagery ratings that were mentioned further above. The mean SD for these ratings (which were based on a 1 to 9 scale) is 2.05. As might be expected, some of the ratings show a good deal of inter-rater variation. (The ratings for cook the books, SD = 2.95, show the most by some distance.)
To the extent that imagery enhances memory of forms, such variation could easily lead to unusual results for some idioms in a study not based on a very large sample of learners. Plainly, this is an additional limitation of the present study.
A further limitation of our study is that it did not address the question of whether a greater positive effect on memory for forms is likely to result from just awareness raising or from awareness raising along with attention direction. Nor did our study address the question of whether, or for how long afterwards, learners will autonomously profit from in-class awareness raising and attention direction with respect to sound repetition in idioms generally. Depending on what might be found out in this regard, it could be fruitful to carry out pairwise comparisons of the effectiveness of some of the techniques of awareness raising and attention direction that have been suggested in teachers’ resource books (e.g. Lindstromberg & Boers, 2008c). Suppose, for instance, that a class of students propose a list of L2 idioms that they would like to learn. Among other things, the teacher could ask the learners to form pairs or threes and sort the idioms into four groups: idioms that include alliteration, ones that include assonance (including rhyme), ones that include both, and ones that include neither; and, following this, the teacher could lead a plenary discussion of the groupings. In an alternative exercise, the class divides into A–B pairs; each A student gets a handout showing half the targeted expressions (all complete, i.e. not split in half), with each idiom in a short, representative context; each B student gets a handout showing the remaining expressions, each in a short context; on each handout the loci of sound repetitions have already been highlighted (e.g. underlined) by the teacher; students A and B dictate their phrases to each other; they then check each other’s writing and add in the highlighting of the sound repetitions; to finish, they discuss the sound repetitions and consult the teacher if they have any doubts. 
An experiment that took into consideration any difference in running time between two targeted exercise types could compare the effectiveness of these exercises as means of promoting productive knowledge of targeted expressions.
Acknowledgements
We are very grateful for helpful suggestions from the editor and the reviewers.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
