Abstract
A study was conducted to identify the scope and timing of maturational constraints in three linguistic domains within the same individuals, as well as the potential mediating roles of amount of second language (L2) exposure and language aptitude at different ages in different domains. Participants were 65 Chinese learners of Spanish and 12 native speaker controls. Results for three learner groups defined by age of onset (AO) – 3–6, 7–15, and 16–29 years – confirmed previous findings of windows of opportunity closing first for L2 phonology, then for lexis and collocation and, finally, in the mid-teens, for morphosyntax. All three age functions exhibited the discontinuities in the rate of decline with increasing AO associated with sensitive periods. Significant correlations were found between language aptitude, measured using the LLAMA test (Meara, 2005), and pronunciation scores, and between language aptitude and lexis and collocation scores, in the AO 16–29 group.
I Age differences and maturational constraints on second language acquisition
Age of first meaningful second language (L2) exposure, or age of onset (AO), is widely recognized as a robust predictor of success in second language acquisition (SLA). While older children and adults often proceed faster through early stages in the acquisition of L2 morphology and syntax – a rate advantage – the prognosis for level of ultimate L2 attainment generally deteriorates with increasing AO (Jia and Fuse, 2007; Krashen et al., 1979).
There is less agreement about the reasons for age effects. Variation in the quantity and/or quality of input to younger and older learners (e.g. Bialystok and Hakuta, 1999; Flege et al., 1995; Moyer, 2009), and differences in their affective profiles (e.g. Moyer, 2004) and cognitive maturity (e.g. Newport, 1990), have all been proposed, but in the opinion of most reviewers (e.g. DeKeyser and Larson-Hall, 2005; Hyltenstam and Abrahamsson, 2003; Long, 1990) have been found wanting. The input to which children are exposed is often richer and can involve a fuller range of functions than that experienced by many adults, some of whom may even live in what amounts to a first language (L1) linguistic ghetto. There is some experimental evidence, however, that input to children and adults does not differ significantly simply as a function of age (Scarcella and Higa, 1981). Moreover, restricted input cannot explain the high levels of achievement, but non-nativelike achievement, attained by the many non-native speakers (NNSs) who live in the L2 environment for decades, often married to native speakers (NSs) of the L2, who use the L2 both at work and in almost every aspect of their social lives, and who exhibit no attitudinal or motivational barriers to acquisition. Also, with the exception of a few short-term, that is rate, studies (for review, see DeKeyser and Larson-Hall, 2005: 96–97), length of residence (LOR) rarely correlates significantly with achievement in rule-based learning even before the effects of AO are removed, and less so once they are. The same is true for social-psychological variables (attitude, integrative orientation, etc.), whose impact on both phonology and morphosyntax is minimal or evaporates altogether once AO effects are removed through semi-partial correlations or stepwise regression (e.g. Flege et al., 1995; Flege et al., 1999; Jia, 1998; Oyama, 1976, 1978). 
Age effects are robust, even in situations where the quantity and quality of input available to learners are not at issue, and regardless of learners’ social-psychological profile.
To many researchers, an explanation of age effects as a function of biologically-based maturational constraints, including one or more critical or sensitive periods 1 for SLA, seems the likely alternative. Meisel, for example, considers ‘the maturational changes in the neural system the major causal factor for differences in linguistic knowledge observed in successive as compared to simultaneous language acquisition’ (2009: 8). However, such views remain controversial. (For reviews supportive of the existence of maturational constraints on SLA, see, for example, DeKeyser, 2012; DeKeyser and Larson-Hall, 2005; Hyltenstam and Abrahamsson, 2003; Long, 1990, 2007; Meisel, 2009, 2011; Newport, 2002. For studies and reviews disputing the existence of maturational constraints, see, for example, Bialystok and Miller, 1999; Birdsong, 2006, 2009; Birdsong and Molis, 2001; Herschensohn, 2007; Muñoz and Singleton, 2011).
Inverse correlations between AO and ultimate attainment, alone, constitute one form of evidence for age effects, but are insufficient to support the biological explanation. Evidence consistent with critical periods and, arguably, sensitive periods (SPs) for language learning, too, needs to include distributions marked by clear discontinuities in the rate of decline with increasing age. There will be a period of peak sensitivity, often considered to last from birth until age six where human language learning is concerned, but possibly of shorter duration (zero to age three or four), according to some findings (for example, Hyltenstam, 1992; Meisel, 2009; Morford and Mayberry, 2000). Eventual native-like attainment, especially for phonology, is not guaranteed for sequential bilinguals first exposed to the L2 within that period (Piske et al., 2001), even when exposure continues to be plentiful, precisely because of the heavier learning task the sequential bilingual child faces, in the form of two languages instead of one 2 . Still, eventual native-like attainment is more likely. There follows an offset, perhaps lasting five or six more years where the acquisition of native-like phonology, lexis and collocations is concerned, and until the mid-teens for grammar, during which progressively fewer learners will achieve native-like abilities, the success rate being marked by a statistically significant decline during this period (for reviews of findings, see DeKeyser and Larson-Hall, 2005; Hyltenstam and Abrahamsson, 2003). At the end of the offset(s), that is, after closure of the SP(s), a small minority of learners may achieve near-native abilities, and a tiny group may be able to pass for native on a few areas and/or tightly constrained tasks (e.g. Donaldson, 2011; Marinova-Todd, 2003; van Boxtel, 2005), but no-one will be able to achieve native-like abilities across the board. 
The decline in achievement among those with an AO after the close of an SP is noticeably more gradual, with variability due to other factors, such as motivation, the proportion of L1 and L2 exposure and use, and intensive training in pronunciation (Bongaerts, 1999: 155), and so will only be indirectly and weakly related to increasing age.
The initially high success rate for those exposed to the L2 early, preceding the relatively steep decline in the number of successful cases among those with an AO during the offset, followed by the flattening out of the data thereafter, results in what is referred to as a ‘stretched Z’ distribution (see Birdsong, 2005), with clear discontinuities at the end of the period of peak sensitivity and end of the offset (see Figure 1). As several reviews of the literature (e.g. DeKeyser, 2012) have shown, this is, indeed, the general pattern. The few apparently conflicting findings, for example those of a general decline in ability across the entire AO range, based on census data, are the result of design artifacts and/or methodological flaws in the studies concerned (DeKeyser and Larson-Hall, 2005; Long, 2005).

Figure 1. The stretched Z.
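The shape described above can be sketched as a piecewise-linear function of AO: flat during the period of peak sensitivity, steeply declining during the offset, and nearly flat thereafter. The following is only an illustration of the geometry, not a model fitted to any data. The function name and every parameter value (breakpoints at 6 and 15, the attainment levels, and the late slope) are assumptions chosen to draw the shape.

```python
def stretched_z(ao, peak_end=6.0, offset_end=15.0,
                peak_level=1.0, late_level=0.2, late_slope=-0.002):
    """Illustrative 'stretched Z' age function (all values hypothetical).

    Returns a notional attainment level for a given age of onset (AO).
    """
    if ao <= peak_end:
        # Period of peak sensitivity: no decline with AO.
        return peak_level
    if ao <= offset_end:
        # Offset: linear decline from peak_level down to late_level.
        frac = (ao - peak_end) / (offset_end - peak_end)
        return peak_level + frac * (late_level - peak_level)
    # After closure of the sensitive period: shallow further decline.
    return late_level + late_slope * (ao - offset_end)


def slope(f, a, b):
    """Average rate of change of f between AO a and AO b."""
    return (f(b) - f(a)) / (b - a)


if __name__ == "__main__":
    for label, a, b in [("peak", 0, 6), ("offset", 6, 15), ("late", 15, 29)]:
        print(label, round(slope(stretched_z, a, b), 4))
```

The two discontinuities in the rate of decline are the changes in slope at `peak_end` and `offset_end`; a general decline across the entire AO range would instead show a single constant slope.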
A maturational constraints claim that has stood up well to empirical testing is that there exist SPs for L2 pronunciation and morphosyntax closing around age 12 and in the mid-teens, respectively (Long, 1990). Specifically, native-like pronunciation of an L2 or dialect is most likely (not guaranteed) for those with an AO between 0 and 6 years; still possible, but decreasingly likely, with an AO occurring during the offset period from 6 to 12; and impossible for anyone with an AO later than 12. Native-like morphology and syntax are most likely (not guaranteed) for those with an AO between 0 and 6 years; still possible, but decreasingly likely, with an AO during the slightly longer offset period from 6 to the mid-teens (15, plus or minus two); and impossible for anyone with an AO later than that. Beyond age 16 or 17, the degree of grammatical accentedness will, again, depend on such factors as L1 and L2 exposure and use, language aptitude, motivation, and metalinguistic knowledge, and so will only be indirectly and weakly related to AO.
The position for lexis and collocational abilities is less clear, chiefly due to the scarcity of studies to date. However, such research findings as there are suggest that acquisition in this domain, too, is subject to maturational constraints, with a period of peak sensitivity from 0–6, followed by an offset closing between 6 and 12, possibly around age 9 (Hyltenstam, 1988, 1992; Lee, 1998; Munnich and Landau, 2010; Spadaro, 1996). While new lexical items and collocations can be, and clearly are, acquired by both native and non-native speakers throughout the life-span, the ability to do so may deteriorate as a result of a declining ability for incidental and, especially, instance learning as a function of increasing age (Hoyer and Lincourt, 1998). Among other things, late L2 learners appear to have problems mastering the limits on the extension of core lexical items, such as pass in ‘Bessie passed a relaxing month in Provence,’ but not in ‘You’ve got the money you demanded – now pass the hostage’ (Kellerman, 1978; Spadaro, 1996, forthcoming 2013). They are often unable to distinguish frozen and relatively productive idioms among other multi-word units, for example not knowing whether they can appear in both active and passive, declarative and interrogative or positive and negative forms, or tolerate changes in word order. Thus, they may accept items like ‘Whose eye is she the apple of?’ and ‘Dolores is the party’s life and soul’ (Spadaro, 1996, forthcoming 2013).
Evidence consistent with these claims has accumulated steadily over the past two decades (see e.g. Abrahamsson and Hyltenstam, 2009; Hyltenstam and Abrahamsson, 2003). In the methodologically most sophisticated, comprehensive study of ultimate L2 abilities as a function of AO to date, Abrahamsson and Hyltenstam (2009) obtained a pool of 195 Spanish/Swedish bilinguals with AOs ranging from 1 to 47 years, all of whom identified themselves as potentially native-like in their L2, Swedish. Brief recorded speech samples from those individuals, mixed with those obtained from 20 Swedish NSs, were rated by a panel of 10 NS judges. Of the 107 early learners (AO of 1–11) in the original pool, 62% were perceived by the judges to be NSs; of the 88 late learners (AO 12–47), only 6% (with AOs of 12–17) passed for native: perceived nativelikeness. The Swedish of a subset of 41 of the survivors, of whom 31 had an AO of 11 or earlier, and 10 an AO of 12–17, was then examined in great detail, using a four-hour battery of very demanding tests: scrutinized nativelikeness. The two groups were of comparable LOR in Sweden, length of L2 exposure, and L1 use. Based on their scrutinized, as opposed to their perceived, abilities, not a single learner with an AO > 12 was found to perform within the NS range, and only three of the 31 younger starters did so on all 10 measures. 3
In the cases of those very few adult starters who do manage to score within the NS range on some measures and/or who exhibit near-native abilities overall, there is growing interest in the claim that language aptitude is a key variable. Stimulated by a proposal first made by DeKeyser (2000), the idea is that aptitude is relevant for explicit learning, which is held either to replace the child’s implicit language learning capacity after a certain age, e.g. from age seven (Paradis, 2004) or the mid-teens (DeKeyser, 2000), or to supplement the older learner’s gradually declining, but never lost, capacity for incidental learning (Long, 2010). The purported role of language aptitude in mitigating (not overcoming) the effects of AO on ultimate attainment has motivated some recent studies of factors involved in the acquisition of very advanced abilities in naturalistic L2 environments, where, unlike in most classroom settings, near-native achievement is sometimes feasible.
II Language aptitude, AO, and ultimate attainment in the L2 environment
In contrast with the many studies of aptitude and rate of classroom language learning, relatively few have been conducted on the role of aptitude in combination with starting age, as a partial explanation for ultimate L2 attainment by long-term residents in the L2 environment, or for L1 retention among attriters (but see Bylund, 2009; Bylund et al., 2010). DeKeyser (2000) predicts that if SLA depends increasingly on explicit learning and domain-general mechanisms as learners grow older, then language aptitude, especially analytic ability, should be more important, and measurable, in samples of older learners, after closure of a SP. Conversely, the argument goes, aptitude should have little or no relevance for younger learners, for whom the acquisition process is supposedly purely implicit, involving domain-specific mechanisms. Also, implicit learning is supposedly unaffected by individual differences (Reber, 1993). Finally, DeKeyser (2000) claims, any post-SP starters who do manage to achieve near-native abilities will be found to have superior aptitude, specifically, verbal analytical ability. Note, however, that aptitude was found to be related to rate of children’s L1 development of various features, such as pronominalization and auxiliary development (Skehan, 1989). Note, further, that since many more children than adults achieve near-native abilities, there will typically be less variability in their proficiency scores, which will make it harder to show the effects of any moderator variable, including aptitude, and easier to show its effects in adult groups, where variable attainment is the norm.
To the best of our knowledge, there have only been three studies to date on the AO–aptitude interaction and ultimate L2 attainment by long-term residents in the target language environment. A fourth, by Harley and Hart (2002), is sometimes cited in this context, but since their participants’ study abroad experience lasted just three months, the results speak only to rate of acquisition. It is included in Table 1 for completeness. Findings have been mixed. Two studies, DeKeyser (2000) and DeKeyser et al. (2010), found statistically significant correlations between aptitude and ultimate grammatical attainment in older learners, but not in younger learners. DeKeyser (2000) measured aptitude with the Words-in-Sentences sub-test in the Modern Language Aptitude Test (MLAT; Carroll and Sapon, 1959) and DeKeyser et al. (2010) used a verbal aptitude test. Abrahamsson and Hyltenstam (2008), on the other hand, found an effect for aptitude in younger learners and suggested that language aptitude seemed to be necessary in adult near-native SLA and advantageous in child SLA.
Table 1. Studies on the AO–aptitude interaction and ultimate L2 attainment.
Notes: * p < .001; ** p < .05.
DeKeyser also obtained interesting differences in the results for different types of L2 structures as a function of AO. All structures that were clearly classifiable as ‘easy’, such as word order in declarative sentences (except for adverb placement), do-support in yes–no questions, and pronoun gender, showed no AO-related differences in either study. All structures clearly classifiable as ‘difficult’, on the other hand, such as articles, some plurals, and sub-categorization, showed strong AO-related effects in both studies. What makes the first group of structures easy and accounts for their insensitivity to AO, DeKeyser argues, is the perceptual saliency of errors involving them, which leads NNSs to notice the gap between their performance and NS norms. This makes the structures amenable to explicit learning by older learners if they have high enough verbal ability with which to compensate for their decreasing capacity to learn abstract patterns implicitly. Young learners, on the other hand, can learn both easy and hard structures regardless of verbal aptitude, because they can utilize their still intact capacity for implicit learning. Instead of just a sizeable negative correlation between AO and ultimate attainment, in what is the strongest claim in this area to date, DeKeyser concluded:
If the critical period hypothesis is constrained … to implicit learning mechanisms, then it appears that there is more than just a sizeable correlation: early age confers an absolute, not a statistical advantage–that is, there may very well be no exceptions to the age effect. Somewhere between the ages of 6–7 and 16–17, everybody loses the mental equipment required for the implicit induction of the abstract patterns underlying a human language, and the critical [as opposed to sensitive or optimal] period really deserves its name. (DeKeyser, 2000: 518)
As shown in Table 1, rather different results were obtained by Abrahamsson and Hyltenstam (2008). They found that some of the 31 early learners (AO 1–11) were able to score within the NS range on all tasks, but no-one with an AO above 8 could do so on all 10 measures of pronunciation, vocabulary and grammar. Some of the 11 late learners (AO 12+) were able to score within the NS range on some tasks (a maximum of seven), but none could do so on all of them. In the opposite pattern to that observed in the two DeKeyser studies, the correlation between aptitude and GJT scores in the younger group was statistically significant (r = .70, p < .001), whereas that in the older group (r = .53, p = .094), while positive, was not. Abrahamsson and Hyltenstam (2008: 498) note that these results may have been due to the larger n-size and wider range of aptitude scores in the younger group, and the smaller n-size and narrower range in the older group, whose members had above average aptitude. Paralleling DeKeyser’s findings, the four late starters who did score within the NS range on the GJT all had above average aptitude scores but, in contrast, so did 72% of the early learners. Abrahamsson and Hyltenstam concluded that aptitude could play a role in both child and adult near-native achievement.
Partly in an attempt to resolve these conflicting findings, we conducted the present study. The main goal was to provide a global picture of ultimate L2 attainment across tasks in three different language domains – phonology, lexis and collocations, and morphosyntax – within participants, something that, to the best of our knowledge, no study has attempted before. The second goal was to explore the roles of language aptitude and LOR in mitigating age effects in each of the three domains.
III The study
‘Scrutinized’, rather than ‘perceived’, L2 performance across productive and receptive tasks in phonology, morphology, syntax, lexis and collocation, is the new benchmark for ‘native-like’ L2 abilities. Following Abrahamsson and Hyltenstam (2009), the present study employed a multiple-task design to provide a global picture of ultimate L2 attainment within participants and to investigate any mitigating effects of aptitude and LOR across language domains.
Two main research questions guided the study:
Research question 1: Is there a relationship between AO and performance in different language domains that is consistent with the existence of multiple SPs?
Research question 2: Are language aptitude and/or LOR related to ultimate L2 attainment, depending on AO and/or language domain?
Motivated by the previous claims and research findings on SPs for phonology, morphosyntax, and lexis and collocation, the following hypotheses were tested:
Hypothesis 1: Pronunciation scores will be inversely correlated with AO among participants with an AO > 6.
Hypothesis 2: No participants with an AO > 12 will obtain pronunciation scores within the NS range.
Hypothesis 3: Lexis and collocation scores will be inversely correlated with AO among participants with an AO > 6.
Hypothesis 4: No participants with an AO > 12 will obtain lexis and collocation scores within the NS range.
Hypothesis 5: Morphosyntax scores will be inversely correlated with AO among participants with an AO > 6.
Hypothesis 6: No participants with an AO > 15 will obtain morphosyntax scores within the NS range.
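Hypotheses 2, 4, and 6 share one logical form: no learner above a given AO cutoff scores within the NS range. A minimal sketch of such a check follows. It assumes that 'within the NS range' means between the lowest and highest scores obtained by the NS controls, which this section does not state explicitly. The function names and all scores in the example are hypothetical.

```python
def ns_range(ns_scores):
    """Lowest and highest score among NS controls (assumed definition)."""
    return min(ns_scores), max(ns_scores)


def within_ns_range(score, ns_scores):
    """True if a learner's score falls inside the NS range, bounds included."""
    low, high = ns_range(ns_scores)
    return low <= score <= high


def counterexamples(learners, ns_scores, ao_cutoff):
    """Learners with AO above the cutoff who score within the NS range.

    learners: list of (ao, score) pairs.
    An empty result is consistent with the hypothesis; any entry refutes it.
    """
    return [(ao, s) for ao, s in learners
            if ao > ao_cutoff and within_ns_range(s, ns_scores)]


if __name__ == "__main__":
    # Invented scores, for illustration only.
    ns = [8.1, 8.6, 9.0]
    learners = [(4, 8.5), (10, 8.2), (14, 6.0), (20, 5.1)]
    print(counterexamples(learners, ns, ao_cutoff=12))
```

Because the hypotheses are categorical, a single counterexample is sufficient to reject them, whereas Hypotheses 1, 3, and 5 are correlational and are evaluated over the whole AO > 6 sample.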
Additional hypotheses addressed the roles of language aptitude and LOR. An alternative to the view that the capacity for implicit language learning ceases at age 6 or age 15, and one we favor, is that the capacity for implicit language learning, especially of item learning, gradually deteriorates with increasing age but never disappears, as evidenced by the finding that it serves well in adult performance of other complex cognitive tasks (for review, see Doughty, 2003; Hoyer and Lincourt, 1998). There is evidence, too, that the capacity for implicit learning of linguistically relevant regularities from statistical properties of the input persists in adulthood (see Thompson and Newport, 2007; Williams, 2009). What changes is the gradually increasing need for explicit learning to support the gradually declining capacity for implicit learning (Ellis, 2008; Long, 2010). This reasoning predicts that language aptitude will be less relevant in any AO group than previously proposed, but likely to become more relevant, if relevant at all, with later AO and in domains where acquisition involves item learning rather than rule learning. Rule-governed phonology and morphosyntax are largely acquired by age 6, with subsequent problems mostly concerning exceptions to rules, or rules like English dative-movement that are lexically conditioned. Lexical items and collocations are prime cases of item learning, and each continues to be acquired throughout the life span in both L1 and L2. Any effect for language aptitude, therefore, might be expected to show up for older starters in the learning of vocabulary and collocations.
Beyond an initial period during which basic phonology and morphosyntax are mastered, LOR, too, should only be relevant, if relevant at all, in the lexical and collocational domain, again due to the life-long duration of the learning process for vocabulary items and collocations. These considerations, together with the conflicting findings in previous research, motivated the following hypotheses:
Hypothesis 7: There will be no effect for LOR in the phonological domain.
Hypothesis 8: There will be an effect for LOR in the lexical and collocational domain.
Hypothesis 9: There will be no effect for LOR in the morphosyntactic domain.
Hypothesis 10: Language aptitude will not be related to scores in the phonological domain for learners of any AO.
Hypothesis 11: Language aptitude will be related to scores in the lexical and collocational domain for late learners (AO ≥ 16).
Hypothesis 12: Language aptitude will not be related to scores in the morphosyntactic domain for learners of any AO.
IV Method
1 Participants
Participants were 65 L1 speakers of Chinese, long-term residents of Spain, from a larger pool who, in response to published recruitment advertisements, had identified themselves as having a command of Spanish similar to that of a NS, and had then been screened into the study via a telephone interview. 4 Requirements, in addition to high Spanish proficiency, were an LOR of around 10 years, and no less than high-school education. The participants’ AO ranged from 3 to 29, and their LOR from 8 to 31 years (for a summary of demographic characteristics, see Table 2). The proficiency criterion was designed to make the study comparable with the previously discussed work in the area. An LOR of 10 years or more has emerged as an acceptable period on which to base assessments of stabilization or fossilization (Long, 2003), making its adoption reasonable in ultimate attainment studies, too.
Table 2. Participant demographic characteristics.
Note: Standard deviations appear in parentheses.
AO was operationalized as the beginning of a serious and sustained process of language acquisition as the result of migration or the commencement of a formal Spanish language program. AO, therefore, could differ from age of physical arrival in the country. In this study, when AO and age of arrival did not overlap, formal instruction took place in adulthood, after age 16. Therefore, age of first exposure as a result of immersion in the L2-speaking country, and age of first instruction still overlapped for the purposes of the current study, where adult L2 learners are defined as those with ages of onset of 16 and older. In this group, the number of years of instruction ranged between zero and four. A total of seven late L2 learners had received instruction for one year or less, four for two years, and seven for four years. There were also 11 L2 learners with AOs between 9 and 16 who had received formal language instruction upon arrival in the country. Instruction for these learners ranged between two months and three years.
AO could also differ from age of physical arrival in the country in the case of early L2 learners, albeit for a different reason. The earliest L2 learners in the present study arrived in the country between ages three and six or were born in Spain. In either case, they had been born to Chinese-speaking parents who had immigrated to the country as adults. As a result, even those early L2 learners who had been born in Spain had not been immersed in the L2 until a later age, usually at age three, in pre-school. Until that age, they were primarily exposed to Chinese and, therefore, can be considered sequential, not simultaneous, bilinguals.
Regarding the L2 learners’ career profiles, most of them worked in business-related jobs (n = 31) and had college education or were pursuing a degree at the time of the study. There were also engineers (n = 7), hostesses (n = 2), doctors (n = 2), lawyers (n = 7), interpreters (n = 5), a pilot, an architect, and a musician.
Twelve monolingual NSs of Spanish were used as controls. They were of a comparable age and educational level. They all had college degrees, or were pursuing a degree at the time of the study, in business, economics, engineering, marketing, education, or social work.
2 Instrumentation
Participants completed a battery of computer-based tests measuring ultimate attainment in L2 phonology, lexical and collocational abilities, and morphology and syntax. Each participant received a different randomized order of tests. Overall testing time was four hours on average, and individual tests lasted less than 25 minutes each. Testing sessions included a minimum of two breaks, with participants allowed to take as many as needed.
Aptitude was measured using the LLAMA test (Meara, 2005), the latest version of the Swansea Language Aptitude Test (LAT; Meara et al., 2003), largely based on the MLAT and consisting of four sub-tests: vocabulary learning, grammatical inference, sound–symbol correspondence, and sound recognition (see Granena, to appear, 2013a, for a detailed description and validation study of the LLAMA). The LLAMA is computer-based and independent of the languages spoken by test-takers. 5 It relies on picture stimuli and verbal materials adapted from a British-Columbian indigenous language and a Central-American language. The test takes approximately 25 minutes. With the exception of sound recognition, the LLAMA sub-tests include study phases lasting between two and five minutes. Instructions are provided verbally by the researcher. The score for each of the LLAMA sub-tests ranges between 0 and 100. Following previous studies on ultimate attainment that have also used the LLAMA or its earlier version, the LAT (Abrahamsson and Hyltenstam, 2008; Bylund et al., 2010; Granena, 2012, to appear 2013b), and for the sake of comparability, participants’ language aptitude was calculated as a composite of scores on the four sub-tests. The reliability coefficient for internal consistency (Cronbach’s alpha 6 ) for the test was .764 (k = 90).
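For readers unfamiliar with the internal-consistency coefficient reported above, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total scores. The sketch below implements that standard formula with the Python standard library. It assumes sample (n − 1) variances and a complete respondents-by-items score matrix. The function name and the toy data are illustrative and are not the LLAMA item scores.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a score matrix.

    rows: one list of k item scores per respondent (no missing values).
    Uses sample variances, an assumption about the convention applied.
    """
    k = len(rows[0])
    # Variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Variance of respondents' total scores.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Toy data: three respondents, three perfectly consistent items.
    print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
```

The same computation applies when the 'items' are raters rather than test questions, which is how the coefficient is used for the judges' accent ratings.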
a Phonology
In order to assess their pronunciation, participants were asked to read aloud a three-line paragraph that intentionally included sounds typically difficult for Chinese L1 speakers to pronounce, for example the r/l distinction, consonant clusters, and voiced plosives. The paragraph contained most of the segmental phonemes of Spanish, and, as recommended by Scovel (1988), sentences that could be read with normal intonation, so that they did not require any supra-segmental gymnastics:
Hace años me encantaba observar a los conductores del antiguo ferrocarril de vía estrecha. Me emocionaba verles girar la palanca de freno con precisión hasta que salían chispas de las ruedas. Era una auténtica maravilla. [Many years ago I used to enjoy looking at the drivers of the old narrow-rail train. I used to get excited when I was seeing them turn the brake with precision until sparks were coming out from the wheels. It was a real wonder.]
Speech samples were rated by a panel of 12 linguistically naive NSs (seven males and five females), who served as judges of participants’ degree of foreign accent. Linguistically naive judges have been shown to be able to judge degree of accent reliably, and they are less lenient than NSs with linguistic experience (Thompson, 1991). As in the study by Flege et al. (1999), a nine-point scale was used, recommended for having sufficient resolution and being able to discriminate between degrees of accent well. Such a scale allows for variability in the ratings and, therefore, creates fewer floor and ceiling effects than scales with fewer points. The scale had nine points anchored at ‘very strong foreign accent’ (1) and ‘no foreign accent’ (9). Each rater was asked to listen to each sample until the end and to determine the degree of foreign accent on the scale, which was displayed on the screen during the task. They were not given any instructions regarding how to use the scale. Judges were further told that the samples included a mix of NSs and NNSs of Spanish, but they were not informed about the proportion. Each judge rated each participant twice in two separate blocks. The order of the speech samples was randomized in every block and for each individual participant by the software used to administer the test battery (Superlab Pro). Two practice samples (a sentence read by two NSs) preceded the test.
In order to verify the inter-rater reliability of the 12 judges, Cronbach’s alpha (α) was computed. The results indicated acceptable agreement among the raters in each block (Cronbach’s α = .976 and .972, respectively). Spearman’s rho correlations, taking into account chance agreement, between all pairs of raters (n = 66) ranged between .76 and .93 (M = .85, SD = .05). Finally, intra-rater coefficients (Pearson correlations) for each rater were all significant and ranged between .71 and .94 (M = .87, SD = .07).
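The figure of n = 66 rater pairs follows from the 12 judges: 12 × 11 / 2 = 66 unordered pairs. A minimal sketch of how pairwise Spearman correlations between raters can be computed is given below, using only the standard library. The helper names and the example ratings are hypothetical. Ties receive average ranks, a common convention that is assumed here rather than taken from the study.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def pairwise_spearman(ratings):
    """ratings: dict mapping rater id to scores for the same samples in order."""
    return {(a, b): spearman(ratings[a], ratings[b])
            for a, b in combinations(sorted(ratings), 2)}


if __name__ == "__main__":
    print(len(list(combinations(range(12), 2))))  # number of rater pairs
    # Invented ratings on the nine-point scale, for illustration only.
    demo = {"r1": [9, 7, 3, 1], "r2": [8, 8, 2, 1], "r3": [9, 6, 4, 2]}
    print(pairwise_spearman(demo))
```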
b Lexis and collocations
Six measures were employed to assess participants’ knowledge of lexis and collocations. Collocations are semi-constructed phrases or prefabricated combinations (Pawley and Syder, 1983). As discussed by Spadaro (1996, 2013), several terms have been used to describe such combinations: ‘multi-word units’ (Crystal, 1975), ‘gambits’ (Keller, 1979), ‘conventionalized language forms’ (Yorio, 1980), ‘lexical phrases’ (Nattinger, 1980), and ‘lexicalized sentence stems’ (Pawley and Syder, 1983). These include sequences of words, such as idioms or fixed expressions, where meaning is derived, but also words that co-occur (e.g. noun + adjective, verb + prepositional phrase).
Based on those developed by Spadaro (1996), they tested knowledge of lexical units and phrases: idioms, set phrases, and word constellations. The tasks involved:
completing compound words orally (α = .920), for example cubre(cama) ‘bedspread’;
completing multi-word units orally (α = .970), for example más tarde o más (temprano) ‘sooner or later’, sano y (salvo) ‘safe and sound’, de mal en (peor) ‘from bad to worse’;
correcting written multi-word units (α = .969), for example recibir con los brazos
supplying the preposition of prepositional verbs (α = .903), for example Los pájaros se alimentan
judging whether words (α = .775) or combinations of words (α = .669) presented auditorily are possible (real) in Spanish, for example abrellaves (abrelatas) ‘can opener’, equipaje ligero ‘light luggage’, hacer ventaja (dar) ‘give an advantage’.
Morphology and syntax
Assessment in this domain involved five measures. The first was an auditory GJT (Cronbach’s α = .916) with 144 items focusing on seven target structures – gender agreement, object clitics, prepositions por/para, aspectual contrasts, unaccusative and unergative verbs, the subjunctive, and ser/estar – all known from previous work on L2 Spanish (e.g. Montrul, 2008) to be difficult for speakers of a non-Romance language. An auditory modality requiring online processing of stimuli was preferred as a receptive measure involving automatic use of L2 knowledge. This was considered a more accurate representation of the type of knowledge that would be available for the participants in spontaneous communication. Next came a picture-guided narrative (an oral retelling of a short clip from a Mr Bean video) used to calculate the percentage of error-free clauses, and two word-order preference tasks, one testing basic word order in sentences (Cronbach’s α = .765), the other, marked discourse-based word order in short communicative exchanges (Cronbach’s α = .943). Finally, there was a gender-assignment task (Cronbach’s α = .900), where participants had to assign gender to very rare (so-called ‘zero frequency’) words in Spanish. These were all words ending in -z, whose gender is established on the basis of a combination of phonetic and lexico-functional criteria, 7 as discussed by Teschner (1983) (for sample items, see Appendix).
3 Analysis
For the purpose of analysis, L2 learners were divided into three groups by AO: 3–6 (n = 20), 7–15 (n = 27), and 16–29 years (n = 18). Age six has often been suggested in the literature as a likely end-point for the peak period of sensitivity for L1 and L2 acquisition, and the mid-teens as marking the closure of the off-set period for an SP for morphology and syntax. In addition, as noted earlier, those were the ages hypothesized by Long (1990) to be relevant for the domains in question and that have proved consistent with findings in several previous studies of maturational constraints on SLA.
One-way analyses of variance (ANOVAs) comparing the three groups revealed significant differences for AO, LOR, and age at testing. Regarding AO, group was a significant factor (F(2, 62) = 243.029, p < .001) and all the groups differed from one another (p < .001), according to Scheffé post hoc tests. The average AO was 4.35 (SD = 1.18) in the 3–6 group, 11.96 (SD = 1.85) in the 7–15 group, and 19.39 (SD = 3.05) in the 16–29 group. As for LOR, group was also a significant factor (F(2, 62) = 33.290, p < .001), but, while the earliest group had a significantly higher LOR than the other two (p < .001), there were no differences between the AO 7–15 and AO 16–29 groups (p = .218), according to Scheffé post hoc tests. The average LOR was 19.10 years (SD = 4.08) in the 3–6 group, 12.07 years (SD = 3.80) in the 7–15 group, and 10.11 years (SD = 2.81) in the 16–29 group. Finally, group was also a significant factor for age at testing (F(2, 62) = 19.861, p < .001). The latest group was significantly older than the other two (p < .001), but there were no differences between the AO 3–6 and AO 7–15 groups (p = 1.000), according to Scheffé post hoc tests. The average age at testing was 24.15 (SD = 3.92) in the 3–6 group, 24.19 (SD = 3.97) in the 7–15 group, and 31.78 (SD = 5.28) in the 16–29 group.
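The F ratios above can be approximately recovered from the reported group sizes, means, and standard deviations alone. The following sketch shows the arithmetic of a one-way ANOVA computed from summary statistics; the function name is hypothetical, and small discrepancies from the published values are to be expected because the inputs are rounded.

```python
def anova_f_from_summary(groups):
    """One-way ANOVA F from (n, mean, sd) triples, one per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, rebuilt from each group's SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# (n, M, SD) for AO in the three groups, as reported in the text.
ao_groups = [(20, 4.35, 1.18), (27, 11.96, 1.85), (18, 19.39, 3.05)]
f_ao, df1, df2 = anova_f_from_summary(ao_groups)
print(f"F({df1}, {df2}) = {f_ao:.2f}")  # about 243, near the reported 243.029
```

Applying the same function to the LOR summary statistics gives a value close to the reported F(2, 62) = 33.290.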
The three AO groups did not differ regarding percentage of Spanish use at home (p = .283), percentage of daily Spanish use (p = .162), or hours of daily Spanish use (p = .385). The percentage of daily Spanish use was 65% in the AO 3–6 group, 55% in the AO 7–15 group, and 54% in the AO 16–29 group, suggesting that the three groups used Chinese between 35% and 46% of the time, on average. The three groups did not differ regarding degree of identification with Spanish culture, either, on a scale between 1 (very little) and 5 (a lot) (p = .941). In fact, the averages were very similar: AO 3–6 (M = 3.40, SD = .99), AO 7–15 (M = 3.35, SD = .80), AO 16–29 (M = 3.44, SD = 1.04).
Scores on the language tests were standardized to a scale ranging from 0 to 100 to allow comparisons across domains. Note that there was no chance level in any of the domains, since, except for pronunciation, they included a combination of productive and receptive tasks. Alpha was set at .05 for all analyses.
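The text does not spell out how scores were mapped onto the 0–100 scale. One common option, shown here purely as an assumption, is a linear (min-max) rescaling of each raw score relative to the end-points of its own scale; the function name and the 48-item task are hypothetical.

```python
def to_percent(score, scale_min, scale_max):
    """Linearly map a raw score on [scale_min, scale_max] to 0-100."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 100.0 * (score - scale_min) / (scale_max - scale_min)

print(to_percent(5, 1, 9))    # midpoint of the nine-point accent scale -> 50.0
print(to_percent(36, 0, 48))  # hypothetical 48-item task, 36 correct -> 75.0
```

A simple proportion of the maximum (score / scale_max × 100) would be an equally plausible reading; the two differ only for scales that do not start at zero, such as the accent ratings.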
V Results
1 AO × ultimate attainment
a Phonology
The average pronunciation ratings on the read-aloud task are shown in Table 3. Ratings were converted into percentages for comparability. They were normally distributed in each of the groups, according to one-sample Kolmogorov–Smirnov (K–S) tests (p > .05).
Table 3. Group percentage scores in phonology.
Hypotheses 1 and 2 predicted that pronunciation ratings would be inversely correlated with AO among L2 learners with an AO > 6 and that no learners with an AO > 12 would obtain pronunciation scores within the NS range. The scatterplot in Figure 2 displays scores as a function of AO. The horizontal dashed line indicates the lowest score in the NS group (i.e. NS range), while the vertical line shows the latest AO scoring within the NS range. As can be seen, the decline in pronunciation started very early and was already significant in the AO 3–6 group (r = −.54, p = .015). The decline continued in the AO 7–15 group, but was not as steep as in the AO 3–6 group (r = −.36, p = .067) and, starting from age 16, the AO ultimate attainment function flattened out (a weak, approaching-zero, negative correlation), producing a clear discontinuity (r = −.14, p = .566). No learner performed within the NS range in pronunciation with an AO later than 5.

Figure 2. Scores in phonology as a function of AO.
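The within-group analysis amounts to splitting learners into the three AO bands and computing a separate Pearson correlation between AO and score in each band. A minimal sketch follows, assuming AO is recorded in whole years; the helper names and the data points are invented for illustration and do not come from the study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ao_band(ao):
    """Assign an age of onset (whole years) to one of the three AO bands."""
    if 3 <= ao <= 6:
        return "3-6"
    if 7 <= ao <= 15:
        return "7-15"
    if 16 <= ao <= 29:
        return "16-29"
    raise ValueError(f"AO {ao} is outside the sampled range")

def r_by_band(records):
    """records: iterable of (ao, score). Returns {band: r(AO, score)}."""
    by_band = {}
    for ao, score in records:
        by_band.setdefault(ao_band(ao), []).append((ao, score))
    return {band: pearson_r(*zip(*pairs)) for band, pairs in by_band.items()}

# Hypothetical (AO, score) pairs, invented purely for illustration.
toy = [(3, 95), (4, 90), (6, 80), (7, 70), (10, 55), (15, 40),
       (16, 30), (20, 28), (29, 27)]
for band, r in r_by_band(toy).items():
    print(band, round(r, 2))
```

Comparing the coefficients across bands is what reveals a discontinuity: a steep negative slope in one band followed by a near-zero slope in the next.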
b Lexis and collocation
An overall score on lexis and collocation was computed for each participant by averaging percentage scores on the six lexical and collocational tasks (compound completion, multi-word unit completion, multi-word unit correction, prepositional verbs, word/non-word discrimination, and collocational judgment) (see Table 4). Scores in each of the groups were normally distributed, according to one-sample K–S tests (p > .05).
Table 4. Group percentage scores in lexis and collocation.
Hypotheses 3 and 4 predicted that lexis and collocational scores would be inversely correlated with AO among participants with an AO > 6, and that no participants with an AO > 12 would score within the NS range. The scatterplot in Figure 3 presents lexis and collocational scores as a function of AO. The decline in the lexical domain took place later than in the phonological domain, as shown by the significant negative correlation in the AO 7–15 group (r = −.59, p = .001), but non-significant correlation in the AO 3–6 group (r = −.22, p = .361). There was a clear discontinuity in the data starting at AO 16, but also a more gradual decline (r = −.44, p = .066) thereafter. While only very early starters in the AO 3–6 group scored within the NS range in pronunciation, two participants in the AO 7–15 group scored within the NS range in lexis and collocation, each with an AO of 9.

Figure 3. Scores in lexis and collocation as a function of AO.
c Morphology and syntax
An overall score for morphosyntax was computed by averaging the percentage scores on the five morphosyntactic tasks (GJT, error-free clauses in oral narration task, word order preference task, discourse-determined word order, and gender assignment) (see Table 5). Scores were normally distributed according to one-sample K–S tests (p > .05).
Table 5. Group percentage scores in morphology and syntax.
Hypotheses 5 and 6 predicted that morphosyntactic scores would be inversely correlated with AO among participants with an AO > 6, and that no participants with an AO > 15 would score within the NS range. Figure 4 shows a scatterplot of scores as a function of AO. As in the lexical domain, the decline in morphosyntax took place in the AO 7–15 group (r = −.43, p = .025), but it was less steep than in the lexical domain (−.43 vs −.59). Weak, approaching zero, negative correlations were observed in the other two AO groups (r = −.09, p = .687 and r = −.17, p = .498). The scatterplot also showed that one participant in the AO 7–15 group was able to score within NS range with an AO of 12, later than any individuals within the NS range in the phonological and lexical domains.

Figure 4. Scores in morphosyntax as a function of AO.
In addition to correlation coefficients, multiple linear regression analyses were performed to compare the slopes of the age-attainment function in each domain. A restricted one-slope model with AO as a single predictor was compared against a full model that included interaction terms between the predictor and dummy-coded AO group variables. If slope differences between age groups are substantial enough, the R2 change between the full and restricted model should be statistically significant. The results of the analyses showed that a regression model with two breakpoints provided a significantly better fit to the data than a regression model without breakpoints for each of the language domains: Lexis and collocation (F(2, 61) = 3.423, p = .039, R2 change = .04), morphosyntax (F(2, 61) = 3.191, p = .048, R2 change = .04), and phonology (F(2, 61) = 4.784, p = .012, R2 change = .05). However, the increase in variance accounted for, even if significant, was only around 5%. This could mean that the less complex (i.e. more parsimonious) model with no breakpoints is already a good enough fit to the data or, alternatively, that a larger sample size is needed to compensate for the loss of degrees of freedom and to minimize the risk of overfitting.
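The model comparison described here is a standard F test on the R2 change between nested regressions. The sketch below shows that computation; it assumes the restricted model estimates two coefficients (intercept and AO) and the full model four (adding two AO × group interaction terms), an assumption that yields the 2 and 61 degrees of freedom reported for the 65 learners. The R2 values passed in are hypothetical, since the text reports only the change.

```python
def r2_change_f(r2_restricted, r2_full, n_obs, k_restricted, k_full):
    """F test for the R^2 gain of a full model over a nested restricted one.

    k_* count the estimated coefficients, including the intercept.
    Returns (F, df_numerator, df_denominator).
    """
    df_num = k_full - k_restricted
    df_den = n_obs - k_full
    f_stat = ((r2_full - r2_restricted) / df_num) / ((1 - r2_full) / df_den)
    return f_stat, df_num, df_den

# Hypothetical R^2 values; only n = 65 and the model sizes follow the text.
f_stat, df_num, df_den = r2_change_f(0.60, 0.64, n_obs=65,
                                     k_restricted=2, k_full=4)
print(f"F({df_num}, {df_den}) = {f_stat:.3f}")
```

The formula makes the sample-size point concrete: with the denominator degrees of freedom fixed at 61, an R2 gain of about .04 clears the significance threshold only narrowly.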
The effects of AO on ultimate attainment were further assessed by means of a multivariate analysis of variance (MANOVA) with AO group as a between-participants factor (controls, AO 3–6, AO 7–15, and AO 16–29) and scores in the three domains as dependent variables. The multivariate effect was significant by AO group (F(3, 71) = 15.781, p < .001, partial η² = .392) 8 . Univariate tests further showed that groups differed in each of the domains: phonology (F(3, 71) = 70.160, p < .001, partial η² = .748), lexis and collocation (F(3, 71) = 46.265, p < .001, partial η² = .662), and morphology and syntax (F(3, 71) = 45.737, p < .001, partial η² = .659). According to Bonferroni-adjusted pair-wise comparisons, those learners who started the process of acquisition before age six performed significantly better than those in the AO 7–15 and AO 16–29 groups (p < .001), but did not differ from controls in phonology (p = .150), lexis and collocation (p = .425), or morphosyntax (p = .052). The AO 7–15 group also performed better than the AO 16–29 group in phonology and lexis and collocations (p < .001), as well as in morphosyntax (p = .044). Both groups, in turn, performed significantly lower than the controls in all three domains (p < .001).
To summarize, comparisons of L2 attainment in the three domains (phonology, lexis and collocation, and morphosyntax) (see Figure 5) revealed that the overall decline in performance as a function of AO was the steepest for phonology (r = −.81, p < .001), followed by lexis and collocation (r = −.79, p < .001) and morphosyntax (r = −.73, p < .001). By AO group, there were significant inverse correlations in the AO 3–6 group for phonology, and, in the AO 7–15 group, for lexis and collocation, and morphosyntax. There was also a mild decline (r = −.44) that approached significance in the AO 16–29 group for lexis and collocation. No L2 learner performed within the NS range with an AO later than 5 in pronunciation, later than 9 in lexis and collocation, and later than 12 in morphosyntax. Finally, AO had a definite and measurable effect on the level of ultimate attainment reached by L2 learners, as shown by between-participants evidence of differential attainment in the three language domains.

Figure 5. Summary of results for three language domains.
Multiple sensitive periods
In order to investigate whether the relationship between AO and performance in different language domains was consistent with the existence of multiple SPs, a repeated-measures ANOVA was conducted with the scores in the three language domains as a within-participants factor and AO group as a between-participants factor. The analysis yielded a significant two-way interaction between language domain and AO group (F(6, 140) = 17.828, p < .001, partial η² = .433) (see Figure 6). Controls and the earliest AO group scored highest on phonology, followed by lexis and collocation, and morphosyntax (phonology > lexis and collocation > morphosyntax), while the opposite pattern was observed in the two latest AO groups (morphosyntax > lexis and collocation > phonology). In addition, language domain was a significant main factor in the AO 7–15 group (F(2, 25) = 27.058, p < .001, partial η² = .684) and the AO 16–29 group (F(2, 16) = 107.589, p < .001, partial η² = .931). Bonferroni-adjusted pair-wise comparisons showed that the three language domains in these two AO groups were significantly different from one another (p < .001).

Figure 6. Scores by AO group and language domain.
LOR × language domain
Hypotheses 7 and 9 predicted no effects of LOR on attainment in phonology or morphosyntax. Hypothesis 8, on the other hand, predicted effects of LOR in lexis and collocation. Correlations were first computed between LOR and scores in each domain. Given that AO and LOR were strongly and significantly correlated in the sample (r = −.70, p < .001), indicating that participants with earlier AOs had longer LOR and participants with later AOs had shorter LOR, partial correlations were also computed in order to assess the independent contribution of LOR. By group, AO and LOR were significantly correlated in the AO 3–6 and AO 7–15 groups (r = −.58, p = .008 and r = −.42, p = .031), but not in the AO 16–29 group (r = −.109, p = .688). The overall correlations between LOR and scores in phonology, lexis and collocation, and morphosyntax were .62 (p < .001), .54 (p < .001), and .68 (p < .001), respectively. When the effect of AO was partialed out, the strength of the correlations with LOR fell, becoming non-significant in phonology (r = .14, p = .273) and morphosyntax (r = .06, p = .658). In the case of lexis and collocation, the correlation also fell, but remained significant (r = .28, p = .043). The reverse partial correlations, between AO and scores in each of the domains, with LOR partialed out, were all moderately strong and significant: −.66 (p < .001) between AO and scores in phonology, −.60 (p < .001) between AO and scores in lexis and collocation, and −.58 (p < .001) between AO and scores in morphosyntax.
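A first-order partial correlation can be computed directly from the three zero-order correlations involved. The sketch below applies the textbook formula to the phonology figures: the LOR–phonology and AO–LOR correlations given above, and the overall AO–phonology correlation of −.81 reported earlier. The function name is hypothetical, and because the published inputs are rounded to two decimals, the output can be expected to match the reported partial correlations only approximately.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Reported zero-order correlations (rounded to two decimals in the text).
r_lor_phon = 0.62   # LOR with phonology scores
r_ao_phon = -0.81   # AO with phonology scores (overall sample)
r_ao_lor = -0.70    # AO with LOR

# LOR-phonology correlation with AO partialed out (reported: .14).
print(round(partial_r(r_lor_phon, r_ao_lor, r_ao_phon), 2))  # 0.13

# AO-phonology correlation with LOR partialed out (reported: -.66).
print(round(partial_r(r_ao_phon, r_ao_lor, r_lor_phon), 2))  # -0.67
```

The asymmetry is the point of the analysis: removing AO leaves almost nothing of the LOR effect in phonology, whereas removing LOR leaves most of the AO effect intact.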
In the AO 3–6 group, the correlations between LOR and phonological, lexical and collocational, and morphosyntactic scores were all non-significant: .33 (p = .146), .41 (p = .071), and .01 (p = .995), respectively. When AO was partialed out, the correlations with LOR dropped in magnitude: .04 (p = .863), .33 (p = .156), and −.07 (p = .781). The reverse partial correlations, between AO and phonological, lexical and collocational, and morphosyntactic scores, with LOR partialed out, were −.44 (p = .057), .02 (p = .942), and −.12 (p = .624), respectively.
In the AO 7–15 group, the correlations between LOR and scores in each of the domains were also weak and non-significant: .05 (p = .817) between LOR and scores in phonology, .25 (p = .207) between LOR and scores in lexis and collocation, and .14 (p = .470) between LOR and scores in morphosyntax. When AO was partialed out, the correlations approached zero: −.07 (p = .740), .08 (p = .699), and .02 (p = .927), respectively. The reverse correlations with AO, partialing LOR out, remained significant for lexis and collocation (r = −.53, p = .006) and morphosyntax (r = −.41, p = .038), and non-significant for phonology (r = −.36, p = .070).
Finally, in the AO 16–29 group, the correlations between LOR and scores in phonology and morphosyntax were weak and non-significant, .04 (p = .880) and −.19 (p = .448), but the correlation between LOR and scores in lexis and collocation was moderate and significant (r = .53, p = .023).
In order to account for the LOR variable, a multivariate analysis of covariance was performed to assess the effects of AO, while controlling for the effects of LOR, in the three language domains. The results of the multivariate tests showed that AO group was a significant factor (F(2, 61) = 7.758, p < .001, partial η² = .283) and that LOR was a significant covariate (F(2, 61) = 3.205, p = .030, partial η² = .140). Bonferroni-adjusted pair-wise comparisons indicated that the covariate-adjusted means of the three AO groups were significantly different from one another. The AO 3–6 group performed significantly better than the AO 7–15 and AO 16–29 groups in phonology (p < .001 and p < .001), lexis and collocation (p = .009 and p < .001), and morphosyntax (p < .001 and p < .001). The AO 7–15 group also performed significantly better than the AO 16–29 group in phonology (p = .002), lexis and collocations (p = .005), and morphosyntax (p = .046).
To summarize, LOR was weakly and not significantly correlated with ultimate attainment in phonology and morphosyntax when AO was controlled for. However, LOR was significantly related to ultimate attainment in lexis and collocation, even after controlling for AO (r = .28, p = .043). This was a weaker correlation than the reverse between AO and lexis and collocation with LOR partialed out, which was −.60, p < .001. By group, LOR was significant in lexical and collocational attainment in the latest AO group, but controlling for LOR as a covariate did not affect the significant differences between the three AO groups.
Aptitude × ultimate attainment
The role of language aptitude in ultimate L2 attainment was investigated in each AO group and language domain. Hypotheses 10 and 12 predicted no relationship between aptitude and L2 attainment in phonology and morphosyntax for learners in any AO group. Hypothesis 11 predicted a relationship between aptitude and attainment in lexis and collocation for late learners.
Table 6. Average raw scores in language aptitude.
The average language aptitude scores in each of the groups are displayed in Table 6. The AO 3–6 group scored significantly higher than all the other groups, according to Tukey post hoc tests. According to the more conservative Scheffé, the AO 3–6 and AO 7–15 groups did not differ from each other (p = .065). There were no other significant group differences. These results could be interpreted in support of the cognitive benefits of early bilingualism, although the possibility exists that they were chance effects resulting from the relatively small cell size (n = 20).
Table 7. Correlations between language domain and language aptitude.
In order to assess the relationship between aptitude and ultimate attainment, correlations were computed between language aptitude and scores in each domain (see Table 7). The correlations between aptitude and scores in phonology, lexis and collocation, and morphosyntax were non-significant in the control group and in the AO 3–6 and AO 7–15 L2-learner groups. In the latest AO group, AO 16–29, the correlations with aptitude were significant for phonology and for lexis and collocation, but not for morphosyntax. Given that LOR also correlated significantly with scores in lexis and collocation in the AO 16–29 group, a partial correlation was computed between aptitude and scores in lexis and collocation, controlling for LOR. The correlation between aptitude and lexical and collocational scores remained significant with LOR partialed out (r = .53, p = .029), while the reverse partial correlation with LOR, controlling for aptitude, was weaker and did not reach significance (r = .44, p = .077).
In order to assess the role of aptitude in the AO 16–29 group further, scores of high- and low-aptitude individuals were compared according to a z-score distribution, where high aptitude = z-scores > .5, mid aptitude = −.5 < z-scores < .5, and low aptitude = z-scores < −.5. In phonology and lexis and collocation, high-aptitude learners (n = 6) scored 34.49 (SD = 8.75) and 52.63 (SD = 12.98), respectively, while low-aptitude learners (n = 5) scored 20.56 (SD = 2.51) and 33.87 (SD = 6.41). These differences were statistically significant (t(9) = 3.418, p = .008, and t(9) = 2.930, p = .017) 9 . In morphosyntax, high-aptitude learners scored 65.69 (SD = 6.08) and low-aptitude learners 61.87 (SD = 7.86). The difference was not significant (t(9) = .911, p = .386).
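The comparison can be retraced from the summary statistics alone. The sketch below bands z-scores with the cut-offs given above and computes a pooled-variance (Student's) t statistic, which is an assumption suggested by the reported 9 degrees of freedom; the function names are hypothetical. Under that assumption the reported means, SDs, and group sizes reproduce the published t values to within rounding.

```python
from math import sqrt

def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Student's t (equal variances assumed) from group summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def aptitude_band(z):
    """Band a z-score using the cut-offs in the text.

    The text leaves z = +/-.5 exactly unassigned; here it falls in 'mid'.
    """
    if z > 0.5:
        return "high"
    if z < -0.5:
        return "low"
    return "mid"

# High- (n = 6) vs low-aptitude (n = 5) learners: (M, SD) from the text.
for label, hi, lo in [
    ("phonology", (34.49, 8.75), (20.56, 2.51)),
    ("lexis and collocation", (52.63, 12.98), (33.87, 6.41)),
    ("morphosyntax", (65.69, 6.08), (61.87, 7.86)),
]:
    t, df = pooled_t(6, *hi, 5, *lo)
    print(f"{label}: t({df}) = {t:.2f}")
```

The large difference in SDs between the phonology groups (8.75 vs 2.51) suggests that an unequal-variance (Welch) test would be a reasonable robustness check, although the text gives no indication that one was run.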
As a follow-up to the results in the AO 16–29 group, the strength of relationships between the different aptitude sub-tests and scores in phonology and lexis and collocation was assessed. For phonology, the strongest correlations were with the aptitude sub-tests measuring sound–symbol correspondence (LLAMA E) (r = .41, p = .091) and grammatical inferencing (LLAMA F) (r = .36, p = .137). The correlations with vocabulary learning and sound recognition (LLAMA B and D) were .24 (p = .332) and .16 (p = .519), respectively. For lexis and collocation, the strongest correlations were with sound recognition (LLAMA D) (r = .46, p = .058) and sound–symbol correspondence (LLAMA E) (r = .36, p = .137). The correlations with vocabulary learning and grammatical inferencing (LLAMA B and F) were .26 (p = .307) and .25 (p = .318), respectively.
To summarize, scores in the control group and the two earliest AO groups (AO 3–6 and AO 7–15) were unaffected by aptitude. However, scores in the latest AO group (AO 16–29) showed a relationship with aptitude, depending on language domain. Scores in pronunciation and in lexis and collocation correlated with aptitude, while scores in morphosyntax did not. The two aptitude sub-tests that accounted for the significant correlation in lexis and collocation involved learning auditory stimuli, either in relation to an arbitrary symbol (sound–symbol correspondence) or as part of a longer list of words.
Discussion
The study addressed two major research questions:
Research question 1: Is there a relationship between AO and performance in different language domains that is consistent with the existence of multiple SPs?
Yes. The data showed a steeper decline in pronunciation overall, with the steepest declines in the AO 3–6 and 7–15 AO ranges, followed by a marked flattening over the rest of the AO range (16–29). The steepest decline in the morphosyntactic and lexical and collocational domains was in the AO 7–15 range, after which the rate of decline again visibly slowed, thus providing important evidence of clear discontinuities in all three domains required for SP claims to go through. This pattern of results confirmed Hypotheses 3 and 5, which predicted an inverse correlation between AO and performance among participants with an AO > 6. Hypothesis 1, however, was refuted, since the decline in phonology started earlier than AO 6 in speech production. This result suggests that peak sensitivity in phonology is likely to be in the AO 0–3 or 0–4 range, which this study did not include (i.e. balanced bilinguals, such as one parent-one language cases).
The latest AOs at which there was evidence of native-like attainment were 5 in phonology, 9 in lexis and collocation, and 12 in morphosyntax. This confirmed Hypotheses 2, 4, and 6, which predicted that no L2 learner would score within the NS range beyond AO 12 in phonology and lexis and collocation, and beyond AO 15 in morphosyntax. There was between-participants evidence of differential attainment according to AO group in all domains, and within-participants evidence of differential attainment × domain. These findings are consistent with the claimed existence of multiple SPs for different language domains (Long, 1990; Meisel, 2011; Schachter, 1996; Seliger, 1978), as shown by different AO-ultimate attainment functions.
In what has been the least researched of the three domains to date, the study provides further evidence of the existence of a separate SP for lexical and collocational ability, possibly closing after that for pronunciation and probably earlier than that for morphosyntax (Hyltenstam, 1992; Lee, 1998; Spadaro, 1996). The lexis and collocation domain was also sensitive to LOR, but the overall correlation with AO, partialing out LOR, was stronger than the reverse. In the 7–15 AO range, only the correlation with AO, partialing out LOR, was significant. Significant differences between AO groups remained after controlling for LOR. Again, the results show that the effect for AO is robust and more strongly related to ultimate attainment.
Research question 2: Are language aptitude and/or LOR related to ultimate L2 attainment, depending on AO and/or language domain?
Yes, but only to a limited extent. LOR was weakly and non-significantly correlated with ultimate attainment in phonology and morphosyntax when AO was controlled for. Therefore, Hypotheses 7 and 9, which predicted a lack of a LOR effect in those two domains, were confirmed. On the other hand, LOR was significantly related to ultimate attainment in lexis and collocation, even after controlling for AO, thus confirming Hypothesis 8. However, this was a weaker correlation than the reverse between AO and lexis and collocation, with LOR partialed out. In addition, significant differences between AO groups remained after controlling for LOR.
As AO, and age in general, increases, aptitude appears to become relevant in the domain of lexis and collocations. The results showed significant positive associations between aptitude and ultimate attainment in lexical and collocational ability, and – probably for reasons having to do with the manner in which it was tested in this study – between aptitude and pronunciation, only in the latest AO group (16–29) 10 . This contrasted with the findings in the grammatical domain, where there was no significant association between aptitude and ultimate morphosyntactic attainment in any group. These results confirmed Hypotheses 11 and 12, which predicted a relationship between aptitude and lexical and collocational scores, but no relationship between aptitude and morphosyntactic scores. These results further refuted Hypothesis 10, which predicted no relation between aptitude and scores in phonology.
As noted earlier, DeKeyser (2000) predicted that different levels of aptitude would be called upon, depending on age, due to the operation of qualitatively different learning mechanisms in learners with an AO before or after the mid-teens, and offered evidence to that effect in the form of a significant positive relationship between language aptitude and morphosyntactic attainment among late starters, but not early starters (DeKeyser, 2000; DeKeyser et al., 2010). The present findings and those of Granena (2013b) are different. There was no evidence of an effect for aptitude in morphosyntax from speakers of two typologically different L1s, Chinese and English respectively, and the same L2, Spanish. In the present study with NSs of Chinese, the correlation between aptitude and scores on an auditory GJT among late starters was not statistically significant (r = .13, p = .601), and such was the case in the study with NSs of English (Granena, 2013b), where the correlation with aptitude scores on an auditory GJT was also not statistically significant (r = .182, p = .336). Aptitude in the present study was related, however, both to the late (AO 16–29) learners’ L2 pronunciation and lexical and collocational abilities. How can these conflicting findings be reconciled, first, across the four studies, and second, across the three domains?
First, as noted earlier, since many more children than adults achieve near-native abilities, there will typically be less variability in their proficiency scores, that is, a ceiling effect, which will make it harder to show the influence of any moderator variable, including aptitude, and easier to show its effects in adult groups with the greater variability in ultimate attainment that is typical of late starters. That said, aptitude varies among children too. Use of a large enough sample to reflect that, in combination with a smaller adult sample whose members had above-average aptitude, as in Abrahamsson and Hyltenstam’s (2008) study, can modify the usual pattern. There was enough variability in their younger group’s GJT scores to find a relationship with aptitude, but not enough variability in terms of aptitude in the older group to do so. Further, the type and complexity of the tests used, and testing procedures, in the different studies may play a major role. DeKeyser’s (2000) GJT looks much easier than the one designed by Abrahamsson and Hyltenstam (2008). Are these studies correlating aptitude with ultimate attainment or with testing conditions, including complexity, such that the harder the test, the better participants with high aptitude will fare?
In addition to the type and complexity of measures employed, data-collection procedures may be important. Granena (2013b) has pointed out that the GJTs in the two DeKeyser studies shared features of off-line tests, since the procedure involved presenting each sentence twice, with a three-second interval between repetitions and a six-second interval between items. Abrahamsson and Hyltenstam’s participants heard each sentence only once and were given a maximum number of seconds to respond, but the results were reported in combination with those from an unspeeded written GJT, justified because scores on the aural and written GJTs correlated at .91 and there were no significant differences in mean scores between the two test modes. The processing demands participants faced in the Swedish study with the intentionally complex stimuli of its aural GJT (e.g. ‘Given that the economic upturn the country was approaching was very obvious, one understands the capitalists’ position regarding protectionist tolls’) may have taxed working memory, making a relationship between aptitude and outcomes more likely. Robinson (2005) has claimed that aptitude plays a larger role when individuals with strengths in processing speed and working memory are faced with parsing syntactically complex and informationally dense input. The complexity of the Swedish GJT items will have encouraged participants to pay close attention. Measures in both the DeKeyser studies and the Stockholm study would have allowed, or even required, (small ‘m’) monitoring and controlled use of L2 knowledge 11 .
This interpretation is supported by the results of the study by Granena (2013b) with 30 NSs of English, very advanced speakers of L2 Spanish. All were adult acquirers (mean age of acquisition 26, range 17–43), and long-time residents of Spain (mean LOR 22 years, range 7–39). There were 15 NS controls. Granena found that both groups performed significantly better on an unspeeded written GJT with an error correction component than on an aural GJT. In addition, there was a positive effect for aptitude, as measured by the LLAMA, on the written GJT, only in the L2 learner group but not in the NS group. High- and mid-aptitude L2 learners performed better than their low-aptitude counterparts on the written GJT, especially on ungrammatical items, but not on the auditory GJT. Granena concluded that aptitude, as measured by tests such as the MLAT or the LLAMA, loosely based on the MLAT, is important in morphosyntax when tasks allow time and encourage a focus on language form.
It is possible, therefore, that the aptitude measures in the two DeKeyser studies – the MLAT Words-in-Sentences sub-test (DeKeyser, 2000) and the verbal aptitude test comparable to the Scholastic Aptitude Test (SAT) in the United States (DeKeyser et al., 2010) – and the unspeeded auditory GJTs all tap the same underlying language analytic component of aptitude, again facilitating a positive correlation in older learners, but less so in younger learners. In contrast, in the present study, as in Granena (2013b), the GJT and the administration procedure were different. Sentences in our study were relatively short and simple (e.g. ‘The climate in southern European countries is Mediterranean’), and were only played once. Participants had to press a key as soon as they detected an error, whereupon the computer immediately presented the next sentence without a pause. This task, it is reasonable to believe, tapped participants’ implicit or automatized knowledge of the L2, whereas the GJTs in the other studies allowed, or even demanded, access to whatever conscious knowledge participants possessed, and allowed controlled production. It seems reasonable to suppose that relatively more speeded, on-line measures are more valid reflections of underlying linguistic competence.
Studies are clearly needed that compare L2 learners’ performance on two or more of the aptitude measures currently used in these studies, and the predictions of each with one or more outcome measures. The aptitude tests used in these studies differed, and as Abrahamsson has suggested, therein could lie the reason for the different results. It could simply be that the apparently conflicting results reflect researchers having assessed different dimensions of language aptitude.
If aptitude is an inherited, largely immutable trait, it is present from birth, although some improvement through training, especially of working memory, may be a possibility. We would therefore expect it to have an influence on language learning at all ages unless overwhelmed by more powerful age-related variables, such as the child’s capacity for implicit learning. However, if implicit language learning remains at least partially available to adults, as the evidence increasingly suggests, or is even the default learning process throughout the life-span, especially in L2 contexts – as claimed, for example, by Doughty (2003), Long (2010), and Ellis (2008) – we would not expect to see a major effect for aptitude in either children or adults, and possibly little or no effect, as observed in Granena (2013b) and the present study, except in the domain of lexis and collocations with older starters, provided abilities are measured appropriately. Indeed, on the basis of those last two studies, we tentatively conclude, contrary to DeKeyser, that adult naturalistic acquirers need not have a high level of language aptitude to reach near-native L2 abilities. If so, it would in turn suggest that adults are not as dependent on explicit learning as skill-building models propose. That said, note again that the lower variability in children’s ultimate attainment will usually make it harder for any moderator variable, including aptitude, to affect scores, and even if explicit learning does turn out to have a more significant role in adult than child language learning, its effect cannot be expected to show itself if preempted by characteristics of the measures and procedures employed in some studies.
Our interpretation of the different findings across domains is as follows. The memory component of aptitude (Robinson, 2002; Skehan, 2002) plays a role in the learning of lexis and collocations (e.g. Speciale et al., 2004), but not in morphosyntax. The learning of vocabulary items and collocations is a clear case of item-based learning and, as with NSs, of a process that continues throughout the life-span. Yet item-based learning is the very kind for which the implicit learning capacity declines in adults (Hoyer and Lincourt, 1998). This would predict a greater decline in the capacity for acquiring new lexis and collocations than morphosyntax. It is noteworthy that Flege et al. (1999) also found that LOR predicted lexically-based items, but not rules. Aptitude is compensating for AO effects in lexis and collocations, but not in morphosyntax because, barring low frequency and otherwise perceptually non-salient grammatical examples, rule-learning is completed (or is unlikely to occur at all) within the shorter period for which LOR is relevant, perhaps for somewhere between one and three years (Fathman, 1975) or as much as five years (Johnson and Newport, 1989; Munnich and Landau, 2010; Oyama, 1978; Patkowski, 1980).
This still does not account for the fact that aptitude was a relevant factor in pronunciation scores within the AO 16–29 group, too. Phonological rules, which are finite and even more limited than those for grammar, should also have been acquired (or remain unlikely ever to be acquired) within the first few months and years of L2 exposure. In the case of pronunciation, we do not think aptitude was at work in the acquisition process, but rather, was a factor in this study because of the kind of measure used to obtain speech samples, a (small ‘m’) monitorable, reading-aloud task. It is quite possible that those late L2 learners with higher (analytic/explicit) aptitude were more successful at monitoring their pronunciation while reading. One way of testing this explanation would be to ask a panel of NS judges to evaluate spontaneous speech samples obtained from the same participants, for example an online narration task while watching a video clip. If our explanation is correct, we should expect to see the correlation between aptitude and pronunciation in the AO 16–29 group disappear.
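The proposed check can be sketched computationally. The following is a minimal illustration only, assuming each participant in the AO 16–29 group has one aptitude score and one mean pronunciation rating per elicitation task. The function names (pearson_r, compare_tasks) and all score values are hypothetical placeholders, not procedures or data from this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def compare_tasks(aptitude, read_aloud, spontaneous):
    """Return the aptitude-pronunciation correlation for each elicitation task."""
    return {
        "read_aloud": pearson_r(aptitude, read_aloud),
        "spontaneous": pearson_r(aptitude, spontaneous),
    }


if __name__ == "__main__":
    # Invented placeholder scores for six imaginary participants;
    # they are NOT taken from the study and only show the call pattern.
    aptitude = [35, 42, 50, 58, 66, 71]
    read_aloud = [3.1, 3.4, 3.9, 4.2, 4.8, 5.0]
    spontaneous = [4.0, 3.2, 4.5, 3.6, 4.1, 3.8]
    for task, r in compare_tasks(aptitude, read_aloud, spontaneous).items():
        print(f"{task}: r = {r:.2f}")
```

If the monitoring account is correct, the coefficient for the spontaneous task should be close to zero while that for the reading-aloud task remains positive. With real data, a significance test for the difference between two dependent correlations would be needed before drawing that conclusion.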
Conclusions
To the best of our knowledge, this is the first study to provide evidence within the same individuals consistent with the existence of three consecutive SPs, for phonology, lexis and collocations, and morphology and syntax. The evidence from this study, plus findings from previous research by others, leads to the conclusion that there is an SP for phonology, its offset beginning at age six, and possibly earlier (in this study, no native-like L2 learners with an AO later than five, and larger mean differences between the groups), probably closing by age 12. There is an SP for lexis and collocations, its offset beginning around age six (in this study, no native-like L2 learners with an AO later than age nine, and larger mean differences between groups), probably closing between ages nine and 12, earlier than the SP for morphology and syntax. There is an SP for morphology and syntax (in this study, no native-like L2 learners with an AO later than 12, and larger mean differences between groups), its offset beginning at age six, and closing in the mid-teens.
Unlike phonology and grammar, lexical and collocational knowledge continues to develop throughout the life-span in both NSs and NNSs, but with explicit learning playing an increasingly important role, as the human capacity for implicit learning, especially for implicit item learning, gradually declines with age. It is for that reason that language aptitude can play a mitigating role, modifying the negative effects of increasing AO and age in general, in the lexical and collocational domain.
Appendix
Morphosyntactic tasks
1. GJT (k = 144). Target structures are underlined and the correct structure is given in brackets.

1.1. Noun–adjective gender agreement in predicative position (k = 17)
El precio de la carne es más caro que el precio del pescado. ‘The price of the meat is more
* El crecimiento de la población española ha sido ‘The growth of the Spanish population has been

1.2. Object clitics (k = 18)
Carlota siempre compra la ropa sin probárse ‘Carlota always buys clothes without trying
* La niña está cansada porque hice caminar ‘The girl is tired because I made

1.3. Prepositions por and para (k = 24)
Tendré el pan preparado ‘I will definitely have the bread ready
* Tu abuela se conserva muy bien ‘Your grandmother looks very well

1.4. Perfective and imperfective aspect contrasts (k = 24)
Carlos ‘Carlos
* A los cinco años, Silvia se ‘At age 5, Silvia

1.5. Unaccusative/unergative distinction (k = 14)
‘Once the house
* ‘Once the baby

1.6. Subjunctive mood (k = 24)
Los periodistas darán la noticia cuando lo ‘Journalists will break the news when the government
* El nuevo jugador organizará una fiesta cuando ‘The new player will throw a party when he

1.7. Verbs ser and estar (k = 23)
El apartamento de Jorge siempre ‘Jorge’s apartment
* La miel ‘Honey

2. Word order preference task (k = 30).
a) Maribel
b) Maribel

3. Discourse-determined word order preference task (k = 38).
¿Qué se ha quemado?
La cena se ha quemado por culpa de tu hermano.
Se ha quemado la cena por culpa de tu hermano. (preferred)
Por culpa de tu hermano, la cena se ha quemado.

4. Gender assignment (k = 25)
Masculine: alfiz, arraez, alaroz
Feminine: venadriz, tibiez, foz
Acknowledgements
The authors thank Niclas Abrahamsson, Emanuel Bylund, and Kenneth Hyltenstam for valuable suggestions and comments during the completion of this study, as well as Sunyoung Lee-Ellis and Ilina Stojanovska for their help with coding the data.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
