Abstract
In order to understand variability in second language (L2) acquisition, this study addressed how individual differences in cognitive abilities may contribute to development for learners in different contexts. Specifically, we report the results of two short-term longitudinal studies aimed at examining the role of cognitive abilities in accounting for changes in L2 behavioral performance and neurocognitive processing for learners in ‘at-home’ and ‘study-abroad’ settings. Learners completed cognitive assessments of declarative, procedural, and working memory abilities. Linguistic assessments aimed at determining behavioral sensitivity and online processing of L2 Spanish syntax were administered before and after a semester of study in either a traditional university classroom context (Experiment 1) or a study-abroad context (Experiment 2). At-home learners evidenced behavioral gains, with no detected predictive role for individual differences in cognitive abilities. Study-abroad learners evidenced behavioral gains and processing changes that were partially accounted for by procedural learning ability and working memory. Taken together, these results provide preliminary insight into how individual differences in cognitive abilities may contribute to behavioral and neural processing changes over time among learners in different natural contexts.
I Introduction
In the pursuit of explaining why second language (L2) learners attain different levels of success in L2 development, research has addressed questions related to a number of factors, including the role of context (for example, learning in study-abroad or at-home university contexts) and the role of individual differences in cognitive abilities (for example, declarative, procedural, and working memory abilities; e.g. DeKeyser, 1991; Sanz, 2014). Research has also begun to explore how these factors impact variability in L2 neurocognitive processing (e.g. Morgan-Short et al., 2012; Tanner et al., 2014). Current calls in the literature claim that our understanding of variability in L2 development is best advanced not by investigating each factor independently, but rather by exploring the interplay among a variety of factors (e.g. Collentine and Freed, 2004; Robinson, 2001; Sanz, 2005). Indeed, researchers are increasingly interested in examining how individual differences in cognitive abilities might play out differently in specific learning contexts (Sanz, 2014). The experiments reported here address this question by exploring the role of individual differences in cognitive abilities in at-home (Experiment 1) and study-abroad (Experiment 2) learning contexts for L2 behavioral development and neurocognitive processing. We motivate the experiments by reviewing L2 literature relevant to individual differences in memory, individual differences in neurocognitive processing, the role of context, and open issues in these areas that inform the current research design.
1 Review of literature
a Individual differences in L2: The role of memory
A number of individual difference factors have been examined in relation to variability in L2 success among learners, with an emerging focus on the role of two long-term memory systems: declarative memory, which underlies semantic and episodic memory (Tulving, 1993); and procedural memory, which underlies motor and cognitive skills and habit learning (e.g. Knowlton and Moody, 2008). Theoretical claims posit a role for declarative and procedural memory and knowledge in L2 development (e.g. DeKeyser, 2015; Paradis, 2009; Ullman, 2015; Ullman and Lovelett, 2018), generally suggesting a large role for declarative memory and knowledge at earlier stages of L2 learning and an increasing role for procedural memory and knowledge at later stages of L2 learning. Thus, one may predict that individuals who have greater ability to learn declaratively may show more success at earlier stages of L2 whereas individuals who have greater ability to learn procedurally may show more success at later stages of L2. Indeed, results of empirical research seem largely consistent with these predictions, with studies showing that declarative memory is related to success at early stages of development (for explicitly trained learners: Carpenter, 2008; Carpenter et al., 2009; for incidentally and implicitly trained learners: Hamrick, 2015; Morgan-Short et al., 2014), and that procedural memory is related to success at later stages of learning (Brill-Schuetz and Morgan-Short, 2014; Hamrick, 2015; Morgan-Short et al., 2014) as well as to increased neural activation for L2 (Morgan-Short et al., 2015a). Importantly, theoretical perspectives suggest that the relative role of declarative and procedural memory may be mediated by context of learning, where declarative memory might play a larger role for learners with primarily explicit L2 instruction and procedural memory might play a larger role for learners when exposure to L2 without explicit instruction is more prominent (Ullman, 2015).
These predictions are supported by results from an artificial language study that found an interaction between explicit and implicit training and high and low procedural memory ability, with high procedural memory learners performing best in an implicit training condition (Brill-Schuetz and Morgan-Short, 2014). Studies directly examining the role of declarative and procedural memory, however, have been limited to the acquisition of (semi-)artificial languages for learners trained under implicit or incidental conditions as assessed, primarily, by behavioral performance measures. Thus, it is critical to examine whether similar patterns of findings would be evidenced (1) for natural language, (2) across different contexts, and (3) for neurocognitive L2 processing measures. Furthermore, it may be worthwhile to consider declarative and procedural memory in conjunction with other cognitive abilities.
Another factor posited to play a role in L2 acquisition is a learner’s ability to simultaneously process and store information, that is, an individual’s working memory (WM; e.g. McDonald, 2006; Williams, 2012). Although not all researchers agree regarding the nature of the relationship between WM and L2, results of a recent, large-scale meta-analysis suggest a positive relationship between WM and L2 abilities (Linck et al., 2014). Importantly, there is evidence that the relationship between WM and L2 abilities may be mediated by factors such as learner proficiency (Linck et al., 2014), learning target (e.g. McDonald, 2006; Sagarra and Herschensohn, 2010), and learning condition or context (e.g. Tagarelli et al., 2015; Williams, 2012).
Regarding WM and learning context, contradictory predictions have been posited. It has been suggested that WM plays a larger role in immersion settings, where processing demands are arguably highest (McDonald, 2006; Sagarra and Herschensohn, 2010), and indeed empirical research has found a facilitative role for WM among learners with immersion experiences, but not for learners without immersion experiences (e.g. LaBrozzi, 2012; Sunderman and Kroll, 2009; Tokowicz et al., 2004). There is also evidence, however, that WM is predictive of learning in more explicit settings due to the requirement to retain metalinguistic information in memory while simultaneously comprehending and producing language (e.g. Linck and Weiss, 2015; Sagarra, 2008). Critically, the evidence of an interaction between WM and learning in different naturalistic settings comes from studies that have compared learners with or without previous immersion experiences, rather than from experimental analyses of L2 development that occurs over time (e.g. LaBrozzi, 2012; Sunderman and Kroll, 2009; Tokowicz et al., 2004). Although there is preliminary evidence to suggest that WM can account for L2 development among learners in a classroom setting (Linck and Weiss, 2015), additional longitudinal research is needed to understand this relationship. Also of interest is the relationship between WM and online neural processing changes, which has yet to be examined. The experiments reported here aim to build upon and extend previous research by assessing whether WM, considered in conjunction with declarative and procedural memory, can account for changes in L2 performance and processing for learners in different naturalistic settings.
b Individual differences in electrophysiological processing
One brain-based processing measure that can be used for this purpose is the event-related potential (ERP), which reflects online electrophysiological brain activity related to cognitive processing. Here we briefly present two common language-related ERP components that are relevant to the current study (for more in-depth reviews for native language (L1), see Kaan, 2007; Steinhauer and Connolly, 2008; Swaab et al., 2012; and for more in-depth reviews for L2, see Morgan-Short, 2014; Morgan-Short et al., 2015b; Steinhauer et al., 2009; van Hell and Tokowicz, 2010). One prominent ERP component in L1 is the ‘N400’, which is a negative deflection that is typically evident over centro-posterior scalp regions around 300–500 ms after the onset of a word (e.g. Kutas and Hillyard, 1980). The N400 is larger for words that are more difficult to process due to their anomalous meaning or due to a range of lexical characteristics, such as low frequency (Kutas and Federmeier, 2011). The N400 has been understood to represent lexical-semantic processing, but may also be thought of as representing more general memory-based processing, as it can be elicited by non-linguistic stimuli (Kuperberg, 2007; Osterhout et al., 2012; for further discussion, see Morgan-Short and Tanner, 2014). Another important ERP component in L1 is the ‘P600’, which is a positive deflection that is also typically distributed over posterior scalp regions but occurs around 600–900 ms after the onset of a word (e.g. Osterhout and Holcomb, 1992). This component is consistently elicited by violations of syntax and morphosyntax as well as by other grammatical properties, such as complexity (e.g. Kaan et al., 2000). The P600 has been interpreted as reflecting grammatical processing but has also been characterized as reflecting more general combinatorial processes (Kuperberg, 2007; Osterhout et al., 2012; for further discussion, see Morgan-Short and Tanner, 2014).
Although these ERP components are typically evident in L1 group averages, recent evidence suggests that there is considerable individual variation among speakers, even in regard to whether a speaker shows an N400 or a P600 response to grammatical processing (Tanner and van Hell, 2014).
In L2, the N400 component is reliably elicited for lexical processing and is also often found for grammatical processing among learners with lower levels of proficiency (e.g. Mueller et al., 2009; Tanner et al., 2013). At higher levels of proficiency and exposure, L2 learners often show P600s for syntactic violations (e.g. Bowden et al., 2013; Hahne and Friederici, 2001; Rossi et al., 2006). A shift from N400 to P600 with increased proficiency and exposure has been attested within individual learners in longitudinal studies of grammatical processing (e.g. McLaughlin et al., 2010; Morgan-Short et al., 2010, 2012), but, as in L1, there is evidence that ERP responses vary among L2 learners even when proficiency and experience are similar (e.g. Bond et al., 2011; Tanner et al., 2014). In order to better understand variability in processing, researchers have begun to examine individual-level processing signatures, which can reveal patterns not evidenced in group-level analyses (e.g. Tanner et al., 2014).
The factors that predict variability in ERP processing signatures remain a largely open question in the extant literature. Recent work has explored the relationship between aptitude and proficiency and P600 effect sizes (Bond et al., 2011; Nickols and Steinhauer, this issue) as well as between motivation, proficiency, and language background and the magnitude and the type of neural response (Tanner et al., 2014). The role of cognitive abilities in L2 processing has not yet been addressed.
c Contexts of L2 learning: At-home and study-abroad
A significant number of behavioral L2 studies have addressed the role of context of learning (e.g. study-abroad and at-home university learning) in the development of a variety of linguistic skills. In general, results indicate that study-abroad learners improve in measures of fluency and oral skills (e.g. Collentine and Freed, 2004; Segalowitz and Freed, 2004). However, conflicting results are prevalent in studies examining grammatical development and accuracy: There is evidence that study-abroad groups show gains in L2 accuracy and grammatical abilities (e.g. Grey et al., 2015; Isabelli, 2004) and that they are superior to comparison groups of at-home learners (e.g. Howard, 2001; Isabelli and Nishida, 2005). In other cases, at-home groups have shown gains that are equal or superior to those of study-abroad groups (Collentine, 2004; DeKeyser, 1991; Isabelli-Garcia, 2010). This research has primarily focused on accuracy for morphosyntax (e.g. Collentine, 2004; Howard, 2001; Isabelli and Nishida, 2005; Isabelli-Garcia, 2010), with limited focus on syntactic development (Isabelli, 2004). Moreover, despite discussion in this literature of the importance of studying individual differences in study-abroad and at-home contexts (e.g. Lafford, 2006), a limited number of studies have addressed the role of cognitive abilities in L2 development in these settings (e.g. Grey et al., 2015; Segalowitz et al., 2004). Furthermore, although several ERP studies have examined processing for learners who have studied abroad or have experienced immersion (for detailed discussion, see Morgan-Short et al., 2015b), researchers have yet to examine how exposure to natural language in different contexts affects the individual-level neurocognitive processing of an L2.
2 Motivation for the current study
Understanding how individual differences play out in different learning contexts, to influence both performance and processing, is critical to refining our understanding of L2 acquisition (Robinson, 2001). To this end, motivated by emerging results of previous research with artificial languages and post-immersion experience learners, the two experiments reported here provide a longitudinal examination of natural L2 development (Spanish) to explore the role of individual differences in declarative, procedural, and working memory in behavioral performance and processing changes. Experiment 1 examines learners in a classroom setting, the natural context that ‘explicit’ training conditions are generally designed to reflect; Experiment 2 utilizes the same design to examine learners in a study-abroad or immersion setting, the natural context that ‘implicit’ and ‘incidental’ training conditions are generally designed to reflect. We acknowledge that learners in naturalistic settings are undoubtedly exposed to a mixture of explicit and implicit training in their L2 learning and note that the two experiments are not designed to allow for a direct comparison between the at-home and study-abroad contexts. At the same time, we recognize the merit of exploring whether results of laboratory-based research are corroborated in natural settings in order to inform theoretical models. To this end, L2 Spanish syntax (phrase structure), the learning target in previous artificial or mini-artificial language research that has explored the relative contributions of different aspects of memory (e.g. Brill-Schuetz and Morgan-Short, 2014; Carpenter, 2008; Carpenter et al., 2009; Hamrick, 2015; Morgan-Short, Faretta-Stutenberg et al., 2014; Morgan-Short, Deng et al., 2015a), is used as a proxy for L2 development in order to build upon and facilitate comparison with previous work.
The present experiments address the following research questions in hopes of providing insight into how individual differences in these memory systems may predict behavioral and processing changes among learners in different, natural contexts:
Research question 1: Do individual differences in declarative, procedural, and working memory account for changes in L2 behavioral performance over the course of one semester?
We predict that a regression analysis with L2 behavioral change as the outcome variable and declarative, procedural, and working memory as predictor variables will show the following: Working memory will predict performance gains in both settings (e.g. Linck et al., 2014; Linck and Weiss, 2015); procedural memory will account for gains in performance in immersion settings (e.g. Brill-Schuetz and Morgan-Short, 2014); and declarative memory may play a role if learners remain at lower levels of proficiency (e.g. Hamrick, 2015; Morgan-Short et al., 2014).
Research question 2: Do individual differences in declarative, procedural and working memory account for changes in L2 neurocognitive processing over the course of one semester?
Because of the lack of previous research that directly informs this question, we tentatively predict that regression analyses with individual L2 processing indices as the outcome variables and declarative, procedural, and working memory as predictor variables will show the following: Working memory may play a role in processing strategies for learners in immersion settings (e.g. LaBrozzi, 2012); and procedural memory may predict increased neural activation (Morgan-Short et al., 2015a). No specific predictions are made regarding the role of declarative memory in L2 processing changes.
In order to address these research questions, the study employed a short-term longitudinal design, allowing for the assessment of behavioral and processing changes that occurred over the course of one semester of university-level language study. Three testing sessions were administered: (1) a cognitive session, in which declarative, procedural and working memory were assessed, as well as (2) a baseline language session and (3) a follow-up language session, in which L2 performance and processing were assessed at the beginning and end of the semester, respectively. Each element of the study is described below.
II Experiment 1: At home
1 Methods
a Participants
Participants included 29 native speakers of English studying Spanish as an L2 at the university level. During the semester of study, participants were enrolled in one to three Spanish content courses at a fifth-semester level or above (e.g. Spanish grammar review, introductory linguistics, literary analysis) at a large public university in the USA.
In order to qualify for the study, a participant’s experience with Spanish had to be classroom-based, with no previous substantial immersion experience and no substantial exposure to the language before the age of 12. Participants were healthy, right-handed (Oldfield, 1971), and had normal or corrected-to-normal vision. A total of 12 participants were excluded from analysis for the following reasons: Seven participants failed to complete all experimental sessions, one reported pre-study immersion experience, and four had excessive artifacts in electroencephalogram (EEG) data. In addition to experimental assessments (described below), a language history questionnaire and a measure of IQ were administered during the cognitive session, and a motivation questionnaire and two general measures of proficiency were administered during each language session. Relevant characteristics for participants included in analyses (N = 17; 13 female) are reported in Appendix 1. All participants provided written informed consent to participate at each session, and received monetary compensation for their time.
Cognitive tests
During the cognitive assessment session, participants completed two measures each of declarative and procedural learning ability, and three measures of WM ability. Cognitive assessments were administered in a pseudo-randomized, counter-balanced order across participants.
Measures of declarative learning ability
Following Morgan-Short et al. (2014), participants completed two independent measures of declarative learning ability: the Modern Language Aptitude Test, Part V – Paired Associates (MLAT-V; Carroll and Sapon, 1959) and the Continuous Visual Memory Task (CVMT; Trahan and Larrabee, 1988). For the MLAT-V, participants memorized a list of 24 pseudo-Kurdish words and their English translation equivalents and then completed a timed multiple-choice test in which they selected the Kurdish word that matched a given English word. The number correct on the multiple-choice test (maximum = 24) constituted the participant’s score. For the CVMT, participants viewed a series of 112 slides on a computer screen in which a total of 70 abstract figures were presented. Participants indicated whether each design was ‘new’ (had not been displayed previously) or ‘old’ (had been displayed previously); these responses were used to calculate a d’ score. For analyses, each participant’s composite score for declarative learning ability was calculated by averaging z-scores from these two tasks.
Measures of procedural learning ability
Following Brill-Schuetz and Morgan-Short (2014), participants completed two independent measures of procedural learning ability: the Alternating Serial Response Time Task (ASRT; Howard and Howard, 1997) and the dual-task version of the Weather Prediction Task (WPT; Foerde et al., 2006). For the ASRT, participants viewed a set of four empty circles on a computer screen that became targets when they appeared with a solid black fill one at a time. Participants were instructed to respond as quickly and accurately as possible to the target circles by pressing corresponding keyboard keys. Embedded in the target appearance sequence was a pattern that alternated with random trials. The difference in reaction time between pattern and random trials with correct responses was used in analyses. For the WPT, participants were presented with different combinations of four ‘tarot cards’ associated with different probabilities of ‘sunshine’ or ‘rain’ over a series of eight blocks. Participants were asked to predict the weather (‘sunshine’ or ‘rain’) as each set of tarot cards was presented while they simultaneously kept a count of high tones that were presented along with low tones during each block. Participant scores were based on weather prediction accuracy during the final dual-task block. For analyses, a composite score for procedural learning ability was calculated by averaging participants’ z-scores from the ASRT and WPT.
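The ASRT difference score described above can be illustrated with a minimal sketch. The trial layout (`trial_type`, `rt_ms`, `correct`) and the function name are hypothetical, and the direction of subtraction (random minus pattern, so that larger values indicate more sequence learning) is an assumption, since the text does not state it.

```python
from statistics import mean


def asrt_learning_score(trials):
    """Mean RT on correct random trials minus mean RT on correct pattern trials.

    `trials` is a list of (trial_type, rt_ms, correct) tuples, where
    trial_type is "pattern" or "random". This layout is hypothetical.
    """
    pattern = [rt for kind, rt, ok in trials if ok and kind == "pattern"]
    random_ = [rt for kind, rt, ok in trials if ok and kind == "random"]
    if not pattern or not random_:
        raise ValueError("need at least one correct trial of each type")
    # Assumed direction: positive values mean faster responses on pattern trials.
    return mean(random_) - mean(pattern)


trials = [
    ("pattern", 400.0, True),
    ("random", 450.0, True),
    ("pattern", 420.0, True),
    ("random", 470.0, True),
    ("random", 900.0, False),  # incorrect response, excluded from both means
]
print(asrt_learning_score(trials))  # 50.0
```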
Measures of working memory capacity
In order to obtain a measurement of WM ability, automated versions of three WM measures were employed: the Automated Operation Span Task (OSpan; Unsworth et al., 2005), the Automated Reading Span Task (RSpan; Unsworth et al., 2005), and the Automated Symmetry Span Task (SSpan; Unsworth et al., 2005). Each task represents a ‘complex span task’ that engages both processing and storage elements of WM (Unsworth et al., 2005).
For the OSpan task, participants solved math operations (processing) and kept track of letters presented after each math operation (storage) in order to recall them at the end of each of 15 sets of trials. The RSpan task is identical to the OSpan except that in place of the math operations, participants read sentences and judged whether or not they made sense (processing). For the SSpan, participants saw a matrix-based image and indicated whether it was symmetrical (processing). After participants provided the symmetry judgment, a smaller 4 × 4 matrix with one red cell appeared, and participants were instructed to remember the location of the red cell (storage) in order to recall the locations at the end of each of 12 sets of trials. For all three WM tasks, participants were instructed to work as quickly and accurately as possible; scores were based on perfect trials (correct for both processing and storage components). For analyses, a composite score for WM was calculated by averaging z-scores from the three tasks.
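The composite scores used for declarative, procedural, and working memory ability all follow the same recipe: standardize each task across participants, then average the z-scores within each participant. The sketch below illustrates that recipe with made-up scores. Using the sample standard deviation (n − 1) is an assumption, as the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize one task's scores across participants (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*task_scores):
    """Average z-scores across tasks, returning one composite per participant.

    Each argument is a list of raw scores for one task, ordered by participant.
    """
    z_by_task = [z_scores(scores) for scores in task_scores]
    return [mean(zs) for zs in zip(*z_by_task)]


# Hypothetical raw scores for three participants on three span tasks.
ospan = [10, 20, 30]
rspan = [1, 2, 3]
sspan = [5, 10, 15]
wm_composite = composite(ospan, rspan, sspan)
print(wm_composite)
```

Standardizing before averaging keeps a task with a wide raw-score range (for example, a reaction-time difference in ms) from dominating a task scored on a narrow range (for example, proportion correct).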
b Measure of L2 behavioral performance
A grammaticality judgment task (GJT) served as the experimental linguistic task to address the research questions. During the GJT, participants were directed to read Spanish sentences and to indicate whether each was acceptable (‘good’) or unacceptable (‘bad’) in Spanish. The experimental, phrase structure stimuli consisted of 60 sentences with syntactic violations and 60 matched correct control sentences, designed following a paradigm developed by Steinhauer and Drury (2012) that allows both the critical and the pre-critical material to be matched across the stimulus set (Luck, 2012). Violations were created from correct sentences containing a noun and an infinitive; in the violation, the noun and the infinitive are switched. Two sentence frames with each noun and infinitive pair were created so that each word appeared both as a correct and a violation critical word. Experimental stimuli sentences ranged from 7 to 13 words in length, with one or two words between the switched elements. Examples of experimental stimuli sentences are provided in Table 1.
Table 1. Grammaticality judgment task experimental stimuli.
Note. Bold typeface marks the critical word where violation becomes evident in each sentence. Event-related potentials were time-locked to the onset of the critical words. The word that constitutes the violation is indicated with an asterisk (*).
The GJT included an additional 480 sentences, representing four conditions not reported here (120 sentences per condition, half correct), yielding a total of 600 stimuli sentences. In order to ensure participant familiarity with the vocabulary, all words used in the stimuli sentences appeared in at least one of two introductory Spanish language textbooks (Dicho y hecho, 8th edition, 2008; Sol y viento, 2nd edition, 2009). Two stimuli lists were created using a Latin square design such that (1) only one version (violation or correct) of each sentence was included in each list, and (2) participants read and judged 300 different sentences during each session (half correct). Stimuli were divided into five blocks, containing 60 sentences each (half correct, balanced across conditions). Cronbach’s alphas for the GJT (both versions, at both testing sessions) ranged from .834 to .895, all falling at or above the median for instrument reliability in L2 research (Plonsky and Derrick, 2016).
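For readers less familiar with the reliability index reported above, the following sketch computes Cronbach’s alpha from a participants-by-items matrix of scored responses. The data are invented for illustration and the function name is hypothetical; the sketch shows the standard formula, not the authors’ analysis script.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of participants, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))           # one tuple per item
    item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 0/1 accuracy data: four participants, three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```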
Participant responses to each sentence in the GJT were recorded, and accuracy and d’ scores were calculated. Analyses were based on d’ scores, which account for potential response bias (e.g. Stanislaw and Todorov, 1999). Because we were interested in examining behavioral development, individuals’ change in performance on each task was calculated by subtracting baseline scores from follow-up scores (e.g. GJT d’_change = GJT d’_follow-up – GJT d’_baseline).
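A minimal sketch of the d’ and change-score computation follows. Two details are assumptions rather than reported procedure: a ‘hit’ is taken to be a ‘bad’ response to a violation sentence and a ‘false alarm’ a ‘bad’ response to a correct sentence, and a log-linear correction (adding 0.5 to each cell) is applied so that hit or false-alarm rates of exactly 0 or 1 do not produce infinite z-values. The response counts are invented.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction is applied (an assumption, not stated in the text).
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant (30 violation, 30 correct sentences).
d_baseline = d_prime(hits=18, misses=12, false_alarms=10, correct_rejections=20)
d_follow_up = d_prime(hits=25, misses=5, false_alarms=5, correct_rejections=25)

# Change score: follow-up minus baseline, as in the text.
d_change = d_follow_up - d_baseline
print(round(d_change, 2))
```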
c Measures of L2 neurocognitive processing
EEG data were collected while participants completed the GJT. Sentences were presented visually through EPrime (Psychology Software Tools, Inc.) one word at a time. Each word appeared in the center of the screen for 400 ms, with an additional 400 ms between each word. After the end of the sentence, the screen was blank for 1000 ms, followed by a question mark to prompt participants to provide their judgment (left click for ‘good’ and right click for ‘bad’). The question mark remained on the screen until the participant responded via mouse click (up to 5000 ms), and was followed by a blink prompt (####) that remained on the screen until the participant clicked to begin the next sentence. After the blink screen, the cycle repeated, beginning with presentation of a fixation cross (1000 ms) and then the next sentence. The 300 stimuli were presented over five blocks (approximately 10 to 12 minutes each), with a short break after each block.
Scalp EEG was continuously recorded in DC mode at a sampling rate of 512 Hz using ASA-lab (ANT) 4.7.9 software. Participants were fitted with a Waveguard Cap (ANT) comprising 32 Ag/AgCl electrodes placed according to the extended 10–20 system, as illustrated in Figure 1. The impedance for each electrode was reduced to below 5 kΩ, and impedances were monitored after each block to ensure that they were held below this threshold. Scalp electrodes were referenced online to the average of all electrodes. The signal was amplified by an AMP-TRF40AB Refa-8 amplifier with a gain of 22-bit. The vertical electrooculogram (VEOG) was recorded from electrodes above and below the right eye, and the horizontal electrooculogram (HEOG) was recorded from electrodes on the left and right temples.

Figure 1. (a) Electrode layout and electrodes included in group-level event-related-potential (ERP) analyses. Lateral electrodes include the four side columns enclosed in solid line; midline electrodes are enclosed in a dashed line. (b) Electrode layout and electrodes included in individual-level ERP analyses. Electrodes enclosed in the two rows represent the centro-parietal region of interest.
After recording, data processing and analyses were completed in MATLAB (version R2009a) using EEGLAB (version 9.0.7.6b) and ERPLAB (version 2.0.0.0) plug-ins. First, 1400 ms epochs were extracted from the continuous EEG (200 ms pre-critical word to 1200 ms post-critical word). All data were re-referenced offline to the mean of the left and right mastoids, then filtered using an IIR (infinite impulse response) Butterworth filter with a high pass of .10 Hz and a low pass of 20.0 Hz. In order to detect eye blinks and other artifacts, stepwise artifact rejections were performed on both EEG and EOG channels using a 40 µV threshold, a 10 ms step, and a 400 ms moving window. In order to reject epochs containing drift, an additional stepwise artifact rejection was performed on EEG channels using a 40 µV threshold over the entire 1400 ms time window (one 1400 ms step). Participants with greater than 25% rejection in either experimental condition (violation or correct) during either baseline or follow-up assessments were excluded from analyses. After these participants were excluded, artifacts led to rejections of less than 6% of experimental trials at either testing session (baseline: 5.44% correct, 5.38% violation; follow-up: 2.31% correct, 4.87% violation). Following previous research with L2 learners, all remaining trials, regardless of behavioral responses, were included in the main analyses (e.g. Bowden et al., 2013; Morgan-Short et al., 2010; Tanner et al., 2013).
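To make the logic of the stepwise rejection concrete, the sketch below flags an epoch when, within any moving window, the mean voltage of the second half of the window differs from the mean of the first half by more than the threshold. This is one common way of implementing a step-like test and is offered as an assumption about the general approach; it is not the toolbox’s code and may differ from the exact routine the authors ran. The function name, the simulated epochs, and the 100 Hz sampling rate are hypothetical and chosen for readability.

```python
from statistics import mean


def has_step_artifact(epoch_uv, srate_hz, threshold_uv=40.0,
                      window_ms=400, step_ms=10):
    """Return True if any moving window contains a step larger than threshold.

    Within each window, the mean of the first half is compared with the mean
    of the second half (an assumed step-like criterion).
    """
    win = max(2, round(window_ms * srate_hz / 1000))   # window length in samples
    step = max(1, round(step_ms * srate_hz / 1000))    # window shift in samples
    half = win // 2
    for start in range(0, len(epoch_uv) - win + 1, step):
        first = mean(epoch_uv[start:start + half])
        second = mean(epoch_uv[start + half:start + win])
        if abs(second - first) > threshold_uv:
            return True
    return False


SRATE = 100                                   # hypothetical; 1400 ms = 140 samples
clean = [0.0] * 140
blink = [0.0] * 70 + [100.0] * 70             # abrupt 100 µV step
drift = [1.0 * i for i in range(140)]         # slow linear drift, 1 µV per sample

print(has_step_artifact(clean, SRATE))        # False
print(has_step_artifact(blink, SRATE))        # True
# The drift passes the 400 ms test but fails the whole-epoch (1400 ms) test.
print(has_step_artifact(drift, SRATE))                                   # False
print(has_step_artifact(drift, SRATE, window_ms=1400, step_ms=1400))     # True
```

The last two calls show why the second, whole-epoch pass is useful: a slow drift changes too little within any 400 ms window to exceed 40 µV, yet accumulates well beyond that over the full 1400 ms.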
For ERP analysis, both group-level and individual-level analyses were conducted. ERPs time-locked to the onset of critical words were averaged off-line for each participant, at each electrode site, using a 200 ms pre-stimulus baseline. ERP components of interest were quantified by computer as mean voltage amplitudes within a time window of activity. Two time windows of interest, corresponding to the N400 and P600 effects, were selected based on previous research: 300–500 ms and 600–900 ms, respectively. Specific analysis procedures for group-level (grand average) and individual-level (participant average) ERP analyses are described below.
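The quantification step can be sketched as follows: subtract the mean of the 200 ms pre-stimulus baseline from the waveform, then average the voltage within each time window of interest. The simulated waveform, the 100 Hz sampling rate, and the function name are hypothetical; window boundaries are treated as start-inclusive and end-exclusive, which is an implementation choice rather than something the text specifies.

```python
from statistics import mean


def mean_amplitude(erp_uv, srate_hz, epoch_start_ms, win_start_ms, win_end_ms):
    """Baseline-corrected mean voltage in [win_start_ms, win_end_ms).

    `erp_uv` is one averaged waveform whose first sample is at epoch_start_ms
    (negative for a pre-stimulus interval); time 0 is critical-word onset.
    """
    def idx(t_ms):
        return round((t_ms - epoch_start_ms) * srate_hz / 1000)

    baseline = mean(erp_uv[idx(epoch_start_ms):idx(0)])
    window = erp_uv[idx(win_start_ms):idx(win_end_ms)]
    return mean(window) - baseline


# Simulated waveform: -200 to 1200 ms at 100 Hz (140 samples), 2 µV offset,
# a negativity in 300-500 ms and a positivity in 600-900 ms.
erp = [2.0] * 140
for i in range(50, 70):    # 300-500 ms
    erp[i] = -1.0
for i in range(80, 110):   # 600-900 ms
    erp[i] = 6.0

n400_window = mean_amplitude(erp, 100, -200, 300, 500)
p600_window = mean_amplitude(erp, 100, -200, 600, 900)
print(n400_window, p600_window)  # -3.0 4.0
```

In practice the function would be applied separately to the violation and correct waveforms, and the effect in each window taken as the difference between the two conditions.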
For group-level analyses, individual ERPs were calculated, averaged across participants, and entered into a group grand average for each session (baseline, follow-up) and time window (300–500 ms, 600–900 ms). Lateral and midline global ANOVAs were conducted for each time window. The lateral ANOVA was based on 20 electrode sites (F7, F4, F3, F8, FC5, FC3, FC1, FC4, T7, C3, C4, T8, CP5, CP1, CP2, CP6, P7, P4, P3, P8; see Figure 1a) and included the experimental factor Violation (violation, correct) and the distributional factors Hemisphere (left, right), Laterality (lateral, medial) and AnteriorPosterior (anterior, anterior-central, central, centro-posterior, posterior). The midline ANOVA was based on five electrode sites (FPz, Fz, Cz, Pz, Oz; see Figure 1a) and included the experimental factor Violation (violation, correct) and the distributional factor AnteriorPosterior (anterior, anterior-central, central, centro-posterior, posterior). Significant main effects and interactions (p < .05) from the global ANOVAs for each time window at each session are reported. For all repeated measures with greater than one degree of freedom, the Greenhouse–Geisser correction for inhomogeneity of variance was applied and corrected p-values are reported. Follow-up analyses for significant interactions consisted of Bonferroni-corrected pairwise comparisons.
For individual-level analyses, processing signatures were quantified using a response magnitude index (RMI) and a response dominance index (RDI; following Tanner et al., 2014). These indices were calculated for each participant for both sessions (baseline and follow-up) based on formulae reported in Tanner et al. (2014) using the absolute value of the N400 and P600 effects over the centro-posterior region of interest (ROI; C3, Cz, C4, P3, Pz, P4; see Figure 1b) in the 300–500 ms and 600–900 ms time windows. RMI provides a measure of the magnitude of an individual’s overall sensitivity to violations, where greater RMI values indicate larger neural responses to violations across both time windows, regardless of the type of response. RDI provides an index of an individual’s relative response dominance (negativity/N400 or positivity/P600), where a participant with relatively equal-sized N400 and P600 effects would have an RDI value near zero. The dependent variable of interest, an individual’s change in each processing metric, was calculated by subtracting the value for the metric at baseline from the value at follow-up (e.g. RMIchange = RMIfollow-up – RMIbaseline).
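The two indices and their change scores can be sketched as follows. The formulae from Tanner et al. (2014) are not reproduced in this article, so the definitions below are an assumption based on how these indices are commonly described: each effect is coded so that larger values mean a larger N400 or P600, RMI is the Euclidean length of the two effects, and RDI is their difference scaled by the square root of 2. The function names and the effect values are hypothetical.

```python
import math

def rmi(n400_effect, p600_effect):
    """Response magnitude index (assumed definition): overall size of the response."""
    return math.hypot(n400_effect, p600_effect)

def rdi(n400_effect, p600_effect):
    """Response dominance index (assumed definition).

    Negative values indicate negativity (N400) dominance, positive values
    indicate positivity (P600) dominance, and equal-sized effects give zero.
    """
    return (p600_effect - n400_effect) / math.sqrt(2)

# Hypothetical effects (µV) for one learner over the centro-posterior ROI.
# n400_effect: correct minus violation, 300-500 ms (assumed sign convention)
# p600_effect: violation minus correct, 600-900 ms (assumed sign convention)
baseline = {"n400_effect": 1.0, "p600_effect": 1.0}
follow_up = {"n400_effect": 3.0, "p600_effect": 4.0}

# Change score = value at follow-up minus value at baseline, as described in the text
rmi_change = rmi(**follow_up) - rmi(**baseline)
rdi_change = rdi(**follow_up) - rdi(**baseline)
```

Under these assumed definitions, the hypothetical learner's equal-sized baseline effects yield an RDI of zero, which matches the interpretation of RDI given above.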
2 Results
In this section, we first provide a statistical overview of behavioral performance and change. These results serve to inform the interpretation of the regression analysis tied to the first research question:
Research question 1: Do individual differences in declarative, procedural, and working memory account for changes in L2 behavioral performance over the course of one semester?
Next, we report group- and subgroup-level ERP patterns at each testing session as well as ERP change as revealed through individual-level indices. Again, this reporting serves to inform the interpretation of the regression analyses addressing our second research question:
Research question 2: Do individual differences in declarative, procedural and working memory account for changes in L2 neurocognitive processing over the course of one semester?
a L2 behavioral performance
We explored the GJT scores for at-home learners in order to gain an understanding of the performance changes that took place over the semester (see Table 2). At baseline testing, learner scores were descriptively low but were significantly above chance for phrase structure judgments (GJT d’: t(16) = 7.50, p < .001). From baseline to follow-up testing, at-home learners evidenced performance gains on the GJT of large effect (GJT d’: t(16) = 4.411, p < .001), although individual performance change varied from a loss of 6% to a gain of 22%.
At-home learners: Descriptive results for GJT.
Notes. a t(16) = 3.764, p = .002. *p < .05, **p < .01.
b Declarative, procedural and working memory and L2 performance
Our first research question asked whether change in at-home learners’ L2 performance could be accounted for by different aspects of memory. In order to capture the unique contribution of each aspect of memory (Tabachnick and Fidell, 2001), a standard, linear multiple regression was run where we regressed GJT d’ change score (the dependent variable) onto the declarative, procedural, and working memory composite scores (the predictor variables; for scores on the memory measures, their intercorrelations, and their correlations with GJT gains, see Appendix 2, Tables 11 and 12). Results from the regression (see Table 3) indicated that memory accounted for approximately 24% of variance, although the model was not statistically significant and none of the three memory measures (declarative, procedural, or working memory) was found to be a statistically significant predictor of performance gains.
At-home learners: Regression model examining declarative, procedural, and working memory and GJT performance change from baseline to follow-up: GJT d’ Change.
Notes. B = unstandardized regression coefficient; SEB = standard error of B; β = standardized regression coefficient.
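For readers less familiar with the procedure, a standard (simultaneous-entry) multiple regression of this kind can be sketched as below. This is an illustrative ordinary least squares implementation, not the authors' analysis script; the function name, the simulated predictor scores, and the simulated change scores are hypothetical, and the sample size of 17 is chosen only to mirror the degrees of freedom reported for this group.

```python
import numpy as np

def standard_multiple_regression(X, y):
    """Enter all predictors at once; return B, standardized beta, and R squared."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # add intercept column
    B, *_ = np.linalg.lstsq(design, y, rcond=None)  # unstandardized coefficients
    fitted = design @ B
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot               # proportion of variance explained
    # Standardized coefficients: rescale each slope by sd(predictor) / sd(outcome)
    beta = B[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return B, beta, r_squared

# Simulated data (hypothetical): columns stand for declarative, procedural, and WM composites
rng = np.random.default_rng(0)
n_learners = 17
memory = rng.normal(size=(n_learners, 3))
gjt_change = 0.3 * memory[:, 1] + rng.normal(scale=0.5, size=n_learners)

B, beta, r_squared = standard_multiple_regression(memory, gjt_change)
```

Because all three predictors are entered together, each coefficient reflects the contribution of one memory measure with the other two held constant, which is the sense of "unique contribution" used above.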
c L2 neurocognitive processing (ERPs)
Prior to examining at-home learners’ ERP data in regard to the specific research questions, we first explored the ERP effects evidenced at baseline and follow-up in order to gain an understanding of whether learners in this group evidenced typical language-related ERP effects (e.g. N400, P600). As a group at baseline, at-home learners evidenced a sustained negativity that was significant at anterior and anterior-central sites during the 300–500 ms time window, and at anterior sites during the 600–900 ms time window. At follow-up testing, this group of learners evidenced an early negativity that was marginally significant at posterior sites during the 300–500 ms time window. For the 600–900 ms time window at follow-up, learners evidenced a late positivity that was marginally significant at anterior and anterior-central sites. (For waveforms and voltage maps, see first row in Figure 2. For full statistical results, see the text and Table 14 provided in Appendix 3.) Thus, these group-level analyses revealed that at-home learners showed a negative effect over anterior sites in both time windows at baseline, but no clear effect at follow-up, although there were also both negative and positive trends in the data.

ERP waveforms and voltage maps for at-home learners at (a) baseline and (b) follow-up.
Given recent reports of evidence of different processing strategies in both L1 and L2 (Tanner and van Hell, 2014; Tanner et al., 2014), we considered whether the general lack of group effects could be due to the fact that the group average was obscuring clear ERP effects in subgroups of learners with different processing strategies, as opposed to general issues of lack of power or lack of language processing effects. Indeed, both at baseline and at follow-up there were negative relationships between the effects in the 300–500 ms and the 600–900 ms time windows (baseline: r(15) = −.329, p = .197; follow-up: r(15) = −.788, p < .001), suggesting that learners’ responses were either negativity- or positivity-dominant, particularly at follow-up testing. Thus, we explored whether significant ERP effects would emerge at baseline and at follow-up within learners who were grouped by whether they had a negative or positive RDI value at the relevant testing session. Note that evidence of a general negative or positive ERP effect within these subgroups would not be particularly informative, but effects with timing and distribution characteristics consistent with the N400 and/or P600 would provide evidence of language processing effects within these groups.
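The two steps described in this paragraph, correlating the effects across time windows and grouping learners by the sign of their RDI, can be sketched as follows. The sketch assumes that each effect is coded so that larger values indicate a larger N400 or P600; the function names and the example values are hypothetical and do not come from the study's data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two sets of per-learner effect sizes."""
    return float(np.corrcoef(x, y)[0, 1])

def split_by_dominance(rdi_values):
    """Return index arrays for negativity-dominant and positivity-dominant learners.

    Learners with an RDI of exactly zero fall in neither subgroup.
    """
    rdi_values = np.asarray(rdi_values, dtype=float)
    negativity = np.flatnonzero(rdi_values < 0)
    positivity = np.flatnonzero(rdi_values > 0)
    return negativity, positivity

# Hypothetical per-learner effects (µV) in the two time windows
effect_300_500 = np.array([3.0, 2.5, 0.5, 0.2, 2.0])
effect_600_900 = np.array([0.4, 0.6, 2.8, 3.1, 1.0])
r = pearson_r(effect_300_500, effect_600_900)

# Hypothetical RDI values for the same learners
rdi_values = np.array([-1.8, -1.3, 1.6, 2.1, -0.7])
neg_idx, pos_idx = split_by_dominance(rdi_values)
```

A negative correlation under this coding means that learners with larger effects in one window tend to show smaller effects in the other, which is the pattern taken above as a sign of negativity- or positivity-dominant responding.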
Indeed, at baseline, we found that within the subgroup of negativity-dominant learners (n = 11; see second group of rows in Figure 2 and Table 14), statistically significant negativities were present at both medial and lateral sites. Between 600–900 ms at baseline, significant negativities were evidenced at anterior-central, central and centro-posterior medial sites and at anterior and anterior-central lateral sites. At follow-up testing, between 300–500 ms, the subgroup of negativity-dominant learners (n = 11) showed a marginally significant negativity at medial sites and a significant negativity over lateral sites. Between 600–900 ms, these learners evidenced a marginally significant negativity at lateral sites. Thus, at both baseline and follow-up testing, the subgroups of negativity-dominant learners evidenced significant negative effects, with the timing and the distribution consistent with an N400 effect at follow-up testing.
For the subgroup of positivity-dominant learners at baseline testing (n = 6; see third group of rows in Figure 2 and Table 14), no statistically significant effects or interactions were evidenced in either time window. For positivity-dominant learners at follow-up (n = 6), no significant effects or interactions were evidenced between 300–500 ms. However, between 600–900 ms, this subgroup evidenced statistically significant positive effects along all anterior to posterior sites for both medial and lateral sites. Thus, at follow-up testing, a P600 constrained to the typical P600 time window was evidenced within the subgroup of positivity-dominant learners.
Overall, the full group of learners who studied Spanish over the course of a semester at their home university showed an anterior negative ERP effect at baseline and trends to both negative and positive ERP effects at follow-up. These overall group averages, however, seemed to obscure the fact that negativity-dominant learners drove the anterior negativity at baseline testing, with positivity-dominant learners not evidencing any ERP effect at that time. At follow-up testing, negativity-dominant learners showed a clear N400 effect and positivity-dominant learners showed a P600 effect.
d Declarative, procedural, and working memory and L2 processing
Our second research question asked whether change in at-home learners’ processing could be accounted for by different aspects of memory. Because group-level ERP effects do not allow us to examine individual differences, we adopted the indices from Tanner et al. (2014) to capture change in individuals’ processing response magnitude and type/dominance. Table 4 reveals descriptive changes among the at-home learners as a group, with a moderate-sized increase in the magnitude of their responses and a small shift to more positive processing responses, although neither of these changes reached statistical significance (RMI: t(16) = 1.669, p = .114; RDI: t(16) = 0.577, p = .572). Correlations between processing change metrics and cognitive factors are reported in Appendix 2, Table 12.
At-home learners: Processing changes in magnitude and dominance.
In order to capture the unique contribution of each aspect of memory, two standard, multiple regression analyses were run, for which either RMI-change or RDI-change (as the dependent variable) was regressed onto declarative and procedural learning ability and WM composite scores (the predictor variables; see Table 5). Results from the regression for RMI-change indicated that memory accounted for approximately 18% of the variance, and results from the regression for RDI-change indicated that memory accounted for approximately 13% of the variance. However, neither model was statistically significant and no unique predictors of processing changes were evidenced.
At-home learners: Regression model examining declarative, procedural, and working memory and processing change from baseline to follow-up.
Notes. RDI = response dominance index; RMI = response magnitude index.
3 Discussion
Results from Experiment 1 showed that intermediate Spanish learners in an at-home context made significant gains from the beginning to the end of a semester in regard to their ability to detect syntactic, phrase structure violations. In regard to learners’ neural processing of these violations, at-home learners as a group did not evidence language-related ERP processing effects at the beginning or at the end of the semester, nor did the overall size (RMI) or type (RDI) of neural signatures change at a statistically significant level. However, when learners were grouped by their ERP dominance type at each time point, significant N400 and P600 effects were evidenced at the end of the semester. Thus, overall, at-home learners evidenced improved performance on a judgment task as well as N400 and P600 effects at the end of a semester of intermediate-level Spanish study.
How do these results compare with previous research that motivated the current experiment? In regard to L2 performance, the only context-based study to examine gains for at-home learners on GJTs failed to find improvements (Isabelli-Garcia, 2010), although the target structure was grammatical gender agreement (which is different from that of the current study and is known to be a particularly difficult structure for L2 learners to master). In regard to L2 processing, previous research has shown progressions from no ERP effect to an N400 and/or from an N400 to a P600 longitudinally in learners of natural and artificial L2s (McLaughlin et al., 2010; Morgan-Short, Sanz et al., 2010; Morgan-Short, Steinhauer et al., 2012; Osterhout et al., 2006). As a group, the learners in the current study did not evidence this progression. The lack of group ERP effect at follow-up testing, however, is consistent with previous research: at the end of the semester, at-home learners were approximately 73% correct on detecting phrase structure violations; similarly, explicitly trained learners in Morgan-Short et al. (2012) did not evidence any group ERP effect when performing at approximately 77% correct on detecting phrase structure violations in an artificial language. Importantly though, the at-home learners’ group-averaged ERPs seemed to obscure ERP effects present within subgroups. At the end of the semester of at-home study, one subgroup of learners evidenced the development of an N400 effect, which suggests a reliance on meaning-based strategies for processing phrase structure, whereas another subgroup of learners evidenced the development of a P600 effect, which suggests a reliance on combinatorial processes. These results add converging evidence to (1) L2 studies that report N400s in response to syntactic violations (e.g. McLaughlin et al., 2010; Morgan-Short et al., 2010, 2012; Tanner et al., 2013), and (2) research that has found different language processing strategies may be obscured by group-level analyses (Tanner and van Hell, 2014; Tanner et al., 2014).
In regard to our research questions asking whether abilities in different memory systems account for changes in performance or processing among at-home learners, the results did not reveal a predictive role for individual differences in memory. In regard to WM, we predicted that WM would account for performance gains; this prediction was not supported. In contrast with the null results for the present study, Linck and Weiss (2015) found that WM was a marginally significant predictor of gains on a written measure of general proficiency from pre- to post-semester for classroom learners at earlier stages of experience. These contrasting results suggest that the role of WM for at-home learners may be more relevant for general proficiency changes, and for earlier stages of development, than for the specific linguistic structure and task utilized here. In terms of declarative and procedural learning ability, we predicted a role for declarative memory in behavioral change if learners remained at low proficiency (which they did not) and did not predict a role for procedural memory among at-home learners. Indeed, if explicit laboratory studies can be related to at-home contexts, these results corroborate those of Brill-Schuetz and Morgan-Short (2014), where no effect for procedural memory was found for explicitly trained learners of an artificial L2. In terms of processing, we predicted a role for procedural memory in processing changes, which was not supported. In general, the particular relationships found in previous research, based on learners of artificial L2s trained under implicit and incidental conditions, do not seem to be substantiated in this set of at-home learners of Spanish.
The disconnect between previous studies that have found a role for declarative and procedural memory and the current study, which did not find evidence of such a role, could be due to (1) differences in learner proficiency and experience, as learners in the current study may have moved through proficiency levels that were neither low nor high enough to expect strong reliance on declarative or procedural memory and were likely exposed to a mixture of explicit and implicit training, or (2) the fact that the present study examined a change in performance and processing whereas previous research examined performance at discrete points in time. Overall, even though performance gains and processing changes were evidenced in at-home learners, we were not able to account for this development with the cognitive factors examined here.
III Experiment 2: Study abroad
1 Methods
a Participants
Participants included 20 native speakers of English studying Spanish as an L2. During the semester of study, participants were enrolled in 12- to 15-week study-abroad programs in Spanish-speaking countries (Spain, Argentina, Dominican Republic), and completed four or five university-level courses taught in Spanish (e.g. intermediate/advanced Spanish grammar, linguistics, literary analysis, history, culture, and gastronomy; mean weekly classroom hours = 15.5, SD = 3.5). Participants completed the cognitive and baseline language sessions an average of 7.2 days (SD = 3.8 days) prior to departure for their study-abroad program and completed the follow-up language session an average of 14.4 days (SD = 10.7 days) after returning from study abroad. Screening procedures and participation criteria were the same as those used in Experiment 1. A total of seven participants were excluded from analysis for the following reasons: Five participants failed to complete all experimental sessions and two had excessive artifacts in EEG data, resulting in final analyses on data from 13 participants (all female; additional participant data reported in Appendix 1).
b Materials and procedures
All materials and procedures were identical to those utilized in Experiment 1. Artifacts in EEG data led to rejections of less than 8% of experimental trials from ERP analyses at either testing session (baseline: 6.08% correct, 7.64% violation; follow-up: 6.27% correct, 5.30% violation).
2 Results
In this section, as in Section II.2, we present statistical analyses that describe performance and processing at each session and their change over the semester. These results inform the interpretation of the regression analyses that are carried out to address our specific research questions.
a L2 behavioral performance
We first explored GJT scores in order to gain insight into performance changes that took place over the semester (see Table 6). At baseline testing, learners showed fairly low but statistically above chance performance for phrase structure judgments (GJT d’: t(12) = 3.452, p = .005). From baseline to follow-up testing, learners who studied abroad evidenced improved performance on phrase structure judgments of moderate-sized effect that reached statistical significance (GJT d’: t(12) = 2.392, p = .034), although individual performance change varied from a loss of 13% to a gain of 24%.
Study-abroad learners: Descriptive results for GJT.
Notes. a t(12) = 2.136, p = .054. ^p < .10, *p < .05.
b Declarative, procedural, and working memory and L2 performance
In order to address our first research question in regard to study-abroad learners, we examined whether the change in performance could be accounted for by different aspects of memory. A standard, multiple regression was conducted, for which the GJT d’ change score (the dependent variable) was regressed onto the declarative, procedural, and working memory composite scores (the predictor variables; for scores on the memory measures, their intercorrelations, and their correlations with GJT gains, see Appendix 2, Tables 11 and 13). Results from the regression (see Table 7) indicated that memory accounted for approximately 70% of the variance in GJT d’ change scores and that procedural learning ability was a statistically significant, unique predictor of performance gains.
Study-abroad learners: Regression model examining declarative, procedural, and working memory and GJT performance change from baseline to follow-up: GJT d’ change.
Notes. B = unstandardized regression coefficient; SEB = standard error of B; β = standardized regression coefficient. ** p < .01.
c L2 neurocognitive processing (ERPs)
Prior to examining study-abroad learners’ ERP data in regard to the specific research questions, we first explored the ERP effects evidenced at baseline and follow-up sessions in order to gain an understanding of whether learners in this group evidenced typical language-related ERP effects. As a group, at baseline, significant positivities were evidenced at anterior, anterior-central and central sites for the 300–500 ms time window. For the 600–900 ms time window at baseline, a significant positivity with a broad distribution was evidenced. At follow-up testing, for the 300–500 ms time window, study-abroad learners evidenced significant negativities at left-hemisphere medial sites as well as at right-hemisphere medial and lateral sites. For the 600–900 ms time window at follow-up, no significant effects or interactions were evidenced. (See first row in Figure 3 for waveforms and voltage maps. For full statistical results, see the text and Table 15 provided in Appendix 3.) Thus, these group-level analyses revealed that study-abroad learners showed an anterior positivity at baseline and an N400 effect with typical timing and distribution characteristics at follow-up testing. Following the analysis for Experiment 1, we explored whether these group averages were obscuring different processing strategies in subgroups of participants. As with the at-home learners, there were negative relationships between the effects in the 300–500 ms and 600–900 ms time windows (baseline: r(11) = –.622, p = .023; follow-up: r(11) = –.504, p = .079), suggesting that learners largely displayed either negativity- or positivity-dominant processing responses. Thus, we explored whether additional significant ERP effects would emerge within learners who were grouped by their dominance (RDI).

ERP waveforms and voltage maps for study-abroad learners at (a) baseline and (b) follow-up. On the waveforms, the black line represents processing of correct phrase structure stimuli and the red line represents processing of violation phrase structure stimuli. Voltage maps represent the value of the difference wave for the violation condition minus the correct condition. Note that negative voltage is plotted up.
For the subgroup of negativity-dominant learners at baseline (n = 3; see second group of rows in Figure 3 and Table 15), analyses revealed a marginally significant negativity at posterior sites, as well as a significant negativity at left, medial sites. Between 600–900 ms, these negativity-dominant learners evidenced a significant negativity constrained to left, medial, posterior sites. At follow-up testing, negativity-dominant learners (n = 8) evidenced significant negative effects at both medial and lateral sites. No statistically significant effects or interactions were evidenced between 600–900 ms.
For the subgroup of positivity-dominant learners at baseline (n = 10; see third group of rows in Figure 3 and Table 15), between 300–500 ms, a significant positive effect was distributed over anterior, anterior-central and central sites. Between 600–900 ms, a significant, broad positivity was evidenced. At follow-up testing, between 300–500 ms, no significant effects or interactions were evidenced for positivity-dominant learners (n = 5). However, between 600–900 ms, analyses revealed positive effects over medial and lateral sites.
Overall, the full group of learners who studied Spanish abroad over the course of a semester showed an anterior positivity at baseline and an N400 effect at follow-up testing. However, when analyses were conducted based on response dominance, negativity-dominant learners evidenced clear N400 effects at both baseline and at follow-up testing whereas positivity-dominant learners evidenced a more anterior positivity at baseline testing and a clear P600 effect at follow-up testing.
d Declarative, procedural, and working memory and L2 processing
Our second research question asked whether study-abroad learners’ change in processing could be accounted for by different aspects of memory. As in Experiment 1, we adopted the indices from Tanner et al. (2014) and explored change in the size (RMI) and type (RDI) of neural response. Table 8 reveals descriptive changes among the study-abroad learners as a group, with a moderate-sized increase in the magnitude of their responses, which did not reach statistical significance (RMI: t(12) = 1.216, p = .247), and a moderate-sized, statistically significant shift to more negative processing responses (RDI: t(12) = 2.294, p = .041). Correlations between these processing changes and cognitive factors are reported in Appendix 2, Table 13.
Study-abroad learners: Processing changes in magnitude and dominance.
Note. * p < .05.
To examine whether change in processing could be accounted for by different aspects of memory, two standard, multiple regression analyses were run in which either RMI-change or RDI-change (as the dependent variable) was regressed onto the declarative and procedural learning ability and WM composite scores (the predictor variables; see Table 9). Results from the regression for RMI-change indicated that memory accounted for approximately 62% of the variance and that both procedural learning ability and WM were statistically significant predictors of RMI-change. Results from the regression for RDI-change indicated that memory accounted for approximately 8% of the variance; neither declarative nor procedural learning ability nor WM were statistically significant predictors of RDI-change.
Study-abroad learners: Regression model examining declarative, procedural, and working memory and processing change from baseline to follow-up.
Notes. RDI = response dominance index; RMI = response magnitude index. B = unstandardized regression coefficient; SEB = standard error of B; β = standardized regression coefficient. * p < .05, ** p < .01.
3 Discussion
Results from Experiment 2 showed that intermediate Spanish learners in a study-abroad context made significant gains from the beginning to the end of a semester in regard to their ability to detect syntactic, phrase structure violations. In regard to the processing of phrase structure violations, these learners evidenced the development of a group-level N400 effect and a significant shift in RDI to a more negativity-dominant response. In addition, when learners were grouped by their ERP dominance type, N400s were evidenced before and at the end of the semester abroad, and a P600 was evidenced at the end of the semester. Thus, overall, study-abroad learners evidenced improved performance on a judgment task as well as the emergence of a subgroup-level N400 and P600 at the end of the semester.
How do the current results compare with previous study-abroad research? In regard to performance changes, the results differ from findings for grammatical gender agreement, where judgement task improvements were not observed after study abroad (Grey et al., 2015; Isabelli-Garcia, 2010). The current study’s results are consistent, however, with findings of improvements among study-abroad learners for word order and number agreement judgments (Grey et al., 2015). Thus, it appears that certain linguistic forms may be more likely to develop with study-abroad experience. In regard to ERP processing effects, the group-level analysis seems to suggest a progression from no clear, language-related effect before studying abroad to the development of an N400 effect, which is consistent with other longitudinal L2 ERP studies that have shown this pattern of progression (e.g. Morgan-Short et al., 2010). Interestingly, the specific ERP effects evidenced across the study-abroad learners at baseline or follow-up testing are consistent with ERP effects evidenced in previous studies for similar populations of learners in several ways: First, although its functional significance is unclear, the early anterior-central positivity evidenced in the current experiment at baseline is similar to (1) the effects found for syntactic violation processing among low proficiency, explicitly trained learners of an artificial language (Morgan-Short et al., 2012), as well as (2) the effect in L2 Spanish learners who were relatively matched with the present learner group at baseline in terms of L2 proficiency and experience (Bowden et al., 2013). 
Second, after a semester abroad, the present learner group exhibited an N400 effect when their performance was approximately 75% accurate on the phrase structure judgment task, which is parallel to the implicitly trained artificial L2 learners who evidenced an N400 effect when performance was approximately 71% accurate on a phrase structure judgment task (Morgan-Short et al., 2012). These parallel results suggest that learners in study abroad (and implicit laboratory) contexts may tend to rely on meaning-based strategies when processing L2 syntax.
Indeed, previous research has found evidence of more meaning-based communicative strategies for learners with study-abroad experience (Tokowicz et al., 2004). Specifically, in their L2 production study, Tokowicz, Michael and Kroll (2004) found that higher WM allowed learners with study-abroad experience to make more effective use of meaning-based communicative strategies. Considering that the N400 is posited to reflect semantic processing strategies (see Kuperberg, 2007; Kutas and Federmeier, 2011), if study-abroad experience pushes learners to employ meaning-based communicative strategies, we might expect learners who have just returned from immersion to make more use of semantic information in sentence processing, as reflected by the findings of a more negative RDI at follow-up testing and the emergence of an N400.
Although these group-level results are informative, it is also important to consider that these patterns did not seem to be representative of all learners. Indeed, when grouped by processing dominance, a subgroup of study-abroad learners evidenced a P600 effect at follow-up testing, which suggests reliance on combinatorial processing. The finding in this subgroup of learners is consistent with the P600 effect for phrase structure violations found previously for implicitly trained learners at higher levels of proficiency (e.g. 94% accuracy, Morgan-Short et al., 2012). Overall, even as we detect group-level results, we see individual variation in processing responses that provides further evidence that group-level analyses may obscure processing effects evidenced in subgroups (Tanner and van Hell, 2014; Tanner et al., 2014).
Our research questions asked whether individual differences in memory could account for performance and processing changes. For study-abroad learners, we predicted that WM would account for gains in behavioral performance and changes in processing. WM did not account for behavioral gains but did account for processing changes. Within this group of learners, WM predicted an increase in neural response magnitude from pre- to post-semester, providing converging evidence with previous eye-tracking research that found WM modulated processing strategies for learners with study-abroad experience (LaBrozzi, 2012).
In terms of declarative and procedural memory, we predicted a role for procedural memory in both behavioral and processing changes. Our results are consistent with and extend previous work: here we examine change in performance and processing, as opposed to performance and processing at high stages of proficiency. Although most of the learners in this group did not reach advanced stages of proficiency, and thus would not be expected to rely fully on procedural memory for L2 grammar, we did find that procedural learning ability accounted for gains in behavioral performance as well as an increase in the magnitude of learners’ ERP responses. This pattern of results is consistent with predictions made by declarative and procedural perspectives of L2 acquisition that components of L2 grammar should come to rely on the procedural memory system with increasing proficiency and exposure (e.g. DeKeyser, 2015; Paradis, 2009; Ullman, 2015; Ullman and Lovelett, 2018).
Overall, study-abroad learners evidenced significant performance gains and significant changes in processing dominance over the course of the semester of study. Procedural memory accounted for changes in performance and processing, and WM accounted for changes in processing.
IV General discussion
The two experiments reported here sought to explore how individual differences in declarative, procedural, and working memory account for changes in L2 behavioral performance and neural processing for learners in at-home and study-abroad contexts. The implications of the results for L2 issues related to context of learning, electrophysiological processing, and individual differences in cognitive abilities have been considered in the experiment-specific discussion sections (Sections II.3 and III.3). In this general discussion section, we consider the larger theoretical implications of the full set of results for perspectives about the role of memory in L2 and how that role may vary in different contexts.
First, we consider WM, which has been posited to play a role in L2 processing and development (e.g. Linck et al., 2014; McDonald, 2006; Williams, 2012). Results from the current study suggest a fairly constrained role of WM in development over one semester in intermediate-level learners of Spanish. The results indicated a role for WM for processing, but not behavioral, changes for learners in study-abroad, but not at-home, contexts. These findings bear out claims that factors such as context and linguistic structure likely mediate the effect of WM on L2 development (e.g. Linck et al., 2014). Specifically, the current experiments provide preliminary evidence that WM may predict processing changes in settings where input is less controlled (e.g. study abroad), which is in line with (1) theoretical claims that processing demands, and thus, the role of WM, are greater in immersion settings (e.g. McDonald, 2006; Sagarra and Herschensohn, 2010), as well as (2) previous research that has found WM to be relevant for learners with immersion experience, but not for those whose L2 experience is limited to traditional classroom settings (e.g. LaBrozzi, 2012; Sunderman and Kroll, 2009; Tokowicz et al., 2004). Because WM accounted for an increase in the magnitude of the processing response in study-abroad learners, we may tentatively conclude that WM facilitates processing regardless of the specific strategy, although it may be particularly important in meaning-based processing strategies (see Section III.3).
Second, we consider theoretical perspectives that posit a role for declarative and procedural memory and knowledge in L2 acquisition (e.g. DeKeyser, 2015; Paradis, 2009; Ullman, 2015). In regard to declarative memory, these perspectives predict that individuals who have a greater ability to learn declaratively may show more success at earlier stages of L2 and that this relationship might be enhanced when the context is more explicit (Ullman, 2015). When examining changes in performance and processing for intermediate-level learners, we did not find a role for declarative memory, which may not be surprising given the specific model predictions. Note, however, that because of the relatively low levels of behavioral performance at baseline testing, we might expect to find a relationship with declarative memory at that session. Indeed, post-hoc exploratory analyses revealed that at baseline testing, declarative memory was positively correlated with online processing (Experiment 1, RMI: r = .597, p = .011) and with behavioral performance (Experiment 2, GJT d’: r = .687, p = .009), which is in line with these specific theoretical predictions as well as with previous research (Carpenter et al., 2009; Hamrick, 2015; Morgan-Short et al., 2014; Morgan-Short et al., 2015b).
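As an illustration of how the significance of the baseline correlations above can be checked, the following minimal sketch converts a Pearson r into a t statistic with n − 2 degrees of freedom and a two-tailed p-value. It assumes that each correlation was computed on the full sample of its experiment (n = 17 for Experiment 1, n = 13 for Experiment 2) and that the reported p-values are two-tailed; neither is stated explicitly for these post-hoc analyses. The function names are hypothetical, and the p-value is obtained by numerically integrating the Student t density rather than by the procedure of any particular statistics package.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # Area outside [-|t|, |t|] is 1 minus twice the area on [0, |t|].
    return 1.0 - 2.0 * area_zero_to_t


def correlation_test(r, n):
    """Return (t, df, p) for a Pearson correlation r observed on n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df, two_tailed_p(t, df)


# Assumed sample sizes: full samples of Experiment 1 (17) and Experiment 2 (13).
for label, r, n in [("Exp. 1, RMI", 0.597, 17), ("Exp. 2, GJT d'", 0.687, 13)]:
    t, df, p = correlation_test(r, n)
    print(f"{label}: t({df}) = {t:.3f}, p = {p:.4f}")
```

Under these assumptions, the computed p-values should fall close to the values reported in the text.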
In regard to procedural memory, theoretical perspectives predict that individuals who have a greater ability to learn procedurally may show more success at later stages of L2 and that this relationship might be enhanced when the context is less explicit (Ullman, 2015). Behaviorally, we found that procedural memory predicted gains only for learners in the study-abroad context, despite similar behavioral development among at-home learners (improvement in GJT accuracy: at-home, 8%; study-abroad, 7%). This parallels the pattern of results found with an artificial language, where learners with high procedural memory experienced advantages when trained under implicit, but not explicit, conditions (Brill-Schuetz and Morgan-Short, 2014), and provides converging evidence that the role of procedural memory may be enhanced under less explicit, more exposure-based contexts.
With regard to accounting for processing changes, results indicated that procedural memory predicted an increase in overall neural response magnitude for learners in the study-abroad context. What might have led to this result? It could be that having better procedural memory, along with better WM, facilitated processing the L2 input provided in the study-abroad context, which, in turn, led to a larger processing response in this group at the end of the semester, regardless of whether learners fell into the N400 or P600 processing subgroups. Note that the experiments here utilize phrase structure as a proxy for L2 development; certainly, future research that examines the relationship between cognitive factors and L2 development with different linguistic structures will aid in developing a better understanding of exactly which aspects of linguistic competence are tapped by processing measures, as noted by Roberts et al. (this issue).
The results of the current study must be considered in light of its limitations, many of which are inherent to research in natural contexts. A primary limitation is that the sample sizes of 17 (Experiment 1) and 13 (Experiment 2) are certainly smaller than ideal, although samples of this size are not uncommon in studies of this kind (e.g. Bond et al., 2011; Tanner et al., 2013; Tanner et al., 2014; White et al., 2012). Despite this methodological constraint, we seem to have sufficient power to detect effects in these datasets (the emergence of group-level ERP effects in both experiments, as well as a predictive role for procedural and working memory in Experiment 2). Future research should aim to replicate and extend these findings with larger participant numbers. A second limitation is that learners in both experiments represent relatively diverse L2 experiences during the semester of study: Participants were enrolled in different Spanish classes in both experiments, and in a number of different semester-long study-abroad programs in Experiment 2. These differences likely imply a range of L2 exposure and use that may interact with cognitive abilities to influence L2 development. Although this diversity could arguably afford greater generalizability of these results to a wider range of ‘at-home’ and ‘study-abroad’ contexts as opposed to specific programs, future work may wish to measure and analyze the role of amount and type of L2 contact in these settings. Such work may also want to consider affective factors, such as motivation, as a variable that accounts for L2 performance and processing (see Tanner et al., 2014 for evidence regarding a role for motivation). The results of the current study suggest that future investigations along these lines will be informative to our understanding of L2 acquisition in study-abroad and at-home contexts.
V Conclusions
The two complementary experiments reported here build upon previous research by providing the first empirical assessment of individual differences in declarative, procedural, and working memory as predictors of L2 development in different contexts of learning. This research addresses gaps in the extant literature by examining both processing and performance development in naturalistic settings. Using a longitudinal design, we found that procedural memory accounted for changes in L2 performance and processing, and WM accounted for changes in L2 processing, but only for learners in a study-abroad context. Future research that utilizes a multidimensional approach informed by second language acquisition and cognitive neuroscience research will provide further insights into the relationships between external and internal factors in L2 development that will help to determine whether individual differences in memory are differentially predictive of L2 outcomes across learning contexts and to refine theories of L2 acquisition.
Footnotes
Appendix 1
Participant background information.
| Variable | At home | Study abroad |
|---|---|---|
| Age | 20.18 (1.63) | 20.54 (1.51) |
| Number of L1s | 1.35 (.49) | 1.31 (.48) |
| Number of L2s | 1.71 (.92) | 1.31 (.48) |
| Age of Exposure: Spanish | 13.06 (3.44) | 11.85 (2.48) |
| Years Formal Instruction: Spanish | 6.29 (2.46) | 7.04 (2.15) |
| Age of Exposure: First L2 | 11.44 (2.77) | 11.85 (2.48) |
| Years Formal Instruction: All L2s | 7.31 (2.25) | 7.27 (2.30) |
| IQ a | 101.59 (13.44) | 107.85 (12.53) |
| Motivation: Spanish b | 6.44 (.69) | 6.62 (.76) |
| EIT-Baseline c | 48.88 (22.96) | 46.08 (19.94) |
| EIT-Follow-up | 48.69 (22.64) | 66.69 (15.42) |
| DELE-Baseline d | 21.71 (5.37) | 20.15 (4.12) |
| DELE-Follow-up | 20.82 (4.42) | 26.85 (6.67) |
Notes. Although we are not comparing learners in the At-Home and Study-Abroad experiments, t tests were run to compare means for background factors as well as baseline linguistic assessments (EIT and DELE) in order to explore whether participants in these two experiments differed at baseline. The only significant difference evidenced between the groups was for Number of L2s: t(28) = −1.417, p = .037. a Composite IQ score from Kaufman and Kaufman, 2004. b Motivation on a range of 0–7, composite of responses to three questions to assess overall motivation to learn Spanish, questionnaire completed at the end of baseline language assessment session. c Elicited imitation task (adapted from Ortega et al., 1999); maximum score = 116. d Diplomas de español como lengua extranjera (Diplomas of Spanish as a Foreign Language) (Spanish Embassy, Washington, DC), written proficiency test; maximum score = 50.
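For readers who wish to see how a between-group t statistic of this kind can be recovered from the summary values in the table, the sketch below computes a pooled-variance (Student's) t from means, standard deviations, and group sizes. It rests on several assumptions not stated in the notes: that the values in parentheses are standard deviations, that the group sizes are 17 (at home) and 13 (study abroad), that the test was a pooled-variance test (the reported 28 degrees of freedom equal 17 + 13 − 2), and that the difference was taken as study abroad minus at home. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df


# Number of L2s: study abroad 1.31 (.48), n = 13; at home 1.71 (.92), n = 17.
# Group order (study abroad minus at home) is an assumption that fixes the sign.
t, df = pooled_t(1.31, 0.48, 13, 1.71, 0.92, 17)
print(f"t({df}) = {t:.3f}")
```

Because the table reports rounded means and standard deviations, a statistic recomputed this way can differ slightly in the final decimal places from one computed on the raw data.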
Appendix 2
Correlations and intercorrelations among dependent variables and predictor variables for study-abroad learners.
| Assessment | Declarative composite | Procedural composite | Working memory composite |
|---|---|---|---|
| GJT PS d’ change | −.502 | .808** | −.280 |
| RMI-change | .112 | .327 | .348 |
| RDI-change | .113 | −.156 | −.105 |
| Declarative composite | — | −.474 | .197 |
| Procedural composite | | — | −.539^ |
| Working Memory composite | | | — |
Notes. ^p < .10, **p < .01.
Appendix 3
In this appendix, we report all statistically significant main effects and interactions from the lateral, global ANOVAs, along with motivated follow-up, Bonferroni-corrected, pair-wise comparisons. Results from midline, global ANOVAs and their follow-up, Bonferroni-corrected, pair-wise comparisons are reported only when these analyses revealed effects that were not revealed by the lateral analysis. Note that the full statistical results from the lateral, global ANOVAs are available in Table 14 (at-home learners, Experiment 1) and Table 15 (study-abroad learners, Experiment 2).
Acknowledgements
We gratefully acknowledge the following people for their ideas, feedback, and suggestions, which have improved this research and manuscript: Darren Tanner, Jessica Williams, Susanne Rott, Luis Lopez, the members of the Cognition of Second Language Acquisition Laboratory at the University of Illinois at Chicago, the members of the Instructed Second Language Acquisition Laboratory at Northern Illinois University, and our reviewers.
Declaration of conflicting interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a Language Learning Dissertation Grant to MFS and a University of Illinois at Chicago Campus Research Board Pilot Grant to KMS.
