Abstract
In two experiments, this article investigates the predictive processing of gender agreement in adult second language (L2) acquisition. We test (1) whether instruction on lexical gender can lead to target predictive agreement processing and (2) how variability in lexical gender representations moderates L2 gender agreement processing. In a pretest–posttest design, Experiment 1 trained 34 intermediate first language (L1) English learners of German on gender assignment. After training, the L2 group showed predictive gender processing; yet, performance correlated with accuracy in gender assignment. Experiment 1 suggests that target knowledge of lexical gender in the L2 lexicon is a prerequisite for predictive use of gender agreement in L2 syntax: Non-target gender assignment would lead to partially erroneous gender prediction such that use of gender agreement is costly for the parser and therefore abandoned. To test this account, Experiment 2 investigated predictive processing in 42 German native speakers who have target-like gender assignment and agreement. In a between-group design, one group received target input and the other received filler items with non-target gender assignment. The latter group subsequently stopped using gender agreement predictively in all experimental trials. Hence, L2 problems with gender agreement can be emulated in native processing. Taken together, the experiments suggest that variability of lexical gender assignment affects processing of gender agreement in natives and non-natives. We interpret the findings in the context of current probabilistic theories of implicit learning and processing adaptation.
I Introduction
Grammatical gender ranks among the linguistic features that are the most difficult to learn in late second language (L2) acquisition, even though it is acquired with relative ease and rapidity in first language (L1) acquisition. Recent research on the real-time processing of gender in production and comprehension has shown that the degree of difficulty with gender is moderated by various factors, including L1 (Dussias et al., 2013; Foucart and Frenck-Mestre, 2011), proficiency (Gillon-Dowens et al., 2010; Hopp, 2013), lexical knowledge of gender classes (Lemhöfer et al., 2014), type of learning (e.g. Arnon and Ramscar, 2012; Davidson and Indefrey, 2009; Grüter et al., 2012) and lexical access (Hopp, 2013). These findings suggest that non-target-like gender processing in L2 learners is systematic and conditioned by a variety of factors in adult L2 processing.
Against this backdrop, the present study tests (1) under which conditions adult L2 speakers can come to process L2 grammatical gender in a native-like way, and (2) which aspects condition variability in the processing of grammatical gender in an L2. Our focus will be on intermediate adult L2 learners. Specifically, we explore how the learning of gender assignment, i.e. the classification of a noun in a gender class (e.g. German Tisch (‘table’) → masculine), affects the predictive processing of grammatical gender agreement, i.e. constructing a forward dependency relation between the gendered determiner and the noun (derMASC → TischMASC). In particular, such predictive processing of gender agreement has been shown to be associated with great difficulty for adult L2 speakers (e.g. Grüter et al., 2012; Hopp, 2013; Lew-Williams and Fernald, 2010), while it is highly automatic in (child) native speakers (e.g. Lew-Williams and Fernald, 2007).
Different from many previous psycholinguistic studies that consider the processing of grammatical gender in a sample of L2 learners at a particular point in time, this study focuses on effects of learning gender. Our focus on training effects in the predictive processing of grammatical gender agreement allows us to adjudicate between theoretical approaches, (1) that adult L2 learners have difficulty in representing new abstract grammatical (gender) features in their L2 (Representational Deficit Hypothesis; Hawkins and Chan, 1997; Tsimpli and Dimitrakopoulou, 2007); (2) that gender features are underspecified (e.g. Hawkins and Casillas, 2008; McCarthy, 2008); (3) that L2 learners fall short of mapping gender forms to features in real-time processing (Missing Surface Inflection Hypothesis; Prévost and White, 2000); (4) additionally, more recent proposals relate L2 problems in processing syntactic gender agreement to variability in lexical gender assignment (e.g. Grüter et al., 2012; Hopp, 2013; Montrul et al., 2014). In order to test this last set of proposals, we present two experiments.
In Experiment 1, we present a pretest–posttest training study on the predictive processing of grammatical gender in L2 German: Low-intermediate to advanced L1 English learners of German receive training on gender assignment to reduce their lexical variability with gender, and they are subsequently tested on predictive processing of gender agreement. Experiment 2 probes whether the specific non-target processing pattern of the adult L2 learners elicited in Experiment 1 can be replicated among native speakers under L2-like input conditions. In Experiment 2, we introduce gender assignment errors in the input to native speakers, i.e. we simulate L2-like variability in gender assignment in the input. Both experiments show effects of implicit learning in the predictive processing of gender agreement. In Experiment 1, greater accuracy in lexical gender assignment among the L2 learners is associated with increased grammatical prediction of gender agreement. In Experiment 2, we find that native speakers stop using grammatical gender predictively once the input becomes unreliable. In conjunction with previous findings in Grüter et al. (2012) and Hopp (2013), the present study suggests that non-target-likeness of gender agreement processing is mandated by differences in the lexical representations and processing of gender. We interpret these results as a consequence of adaptive error-driven learning and parsing.
This article is structured as follows. In Section II, we outline previous research on grammatical gender in L2 processing and develop the research questions. Section III presents Experiment 1, and we discuss Experiment 2 in Section IV. In Section V, we offer a general discussion and outline suggestions for future research.
II Grammatical gender in L2 processing
Grammatical gender is realized in many languages that classify nouns into different lexical gender classes (Corbett, 1991). In psycholinguistic models, lexical gender assignment is conceptualized as the linking of nouns to a gender node (e.g. Schriefers and Jescheniak, 1999) or the annotation of gender information at a noun’s lemma level (e.g. Carroll, 1989). On top of lexical gender assignment, gender is realized morphosyntactically by virtue of agreement between nouns and their dependents inside the noun phrase (e.g. determiners and adjectives) as well as in anaphoric pronoun reference within and across clauses. Many grammatical models hold that syntactic gender agreement is based on checking or matching relations between the noun’s gender feature and features on determiners and adjectives (e.g. Bernstein, 1993; Carstens, 2000). In language processing, speakers and listeners then compute feature-based agreement relations between syntactically related constituents (e.g. Franck et al., 2008).
Late L2 learners have difficulties in both lexical gender assignment and syntactic gender agreement. First, even very advanced L2ers exhibit non-target assignment of gender to nouns in production, in particular if their L1 does not have grammatical gender (Franceschina, 2005). For instance, advanced L1 English L2ers of Spanish only reach levels from 75% to 90% accuracy in gender assignment in elicited production (Alarcón, 2011; Bruhn de Garavito and White, 2003; Franceschina, 2005; Grüter et al., 2012; Montrul et al., 2008), even though Spanish affords transparent phonological cues for gender assignment in noun endings. In languages with more opaque gender systems, e.g. German and Hindi, gender assignment among late learners appears to be even more target-deviant or variable (Hopp, 2013), even though learners benefit from phonological and semantic regularities in gender assignment (e.g. Bobb et al., 2015; Bordag et al., 2006). Hence, L2 learners show partially incorrect gender assignment and unstable gender assignment. We refer to these as variable lexical gender assignment, which signifies sizeable deviations from the systematic and target-like assignment of lexical gender to nouns in the mental lexicon of L2 learners compared to native speakers.
For syntactic gender agreement, late L2 learners also show non-target-like processing, in particular for adjective–noun gender agreement. Reading-time and event-related potential (ERP) studies reveal that sensitivity to agreement violations correlates with the realization of grammatical gender marking in the L1 (e.g. Sabourin and Stowe, 2008; though see Foucart and Frenck-Mestre, 2011; Gillon-Dowens et al., 2010, 2011), proficiency (Gabriele et al., 2013; Sagarra and Herschensohn, 2010) as well as with working memory capacity (Keating, 2010; see also Foote, 2011). However, even intermediate learners distinguish between grammatical and ungrammatical gender agreement in automatic processing in native-like ways (Tokowicz and MacWhinney, 2005). Beyond agreement violations, late L2 learners have nevertheless been shown to be non-target-like in gender processing; in particular, they seem compromised in their ability to use gender agreement in predictive comprehension.
In sentence comprehension, computing predictive, i.e. forward, agreement relations between the determiner and the noun on the basis of gender marking on determiners can serve to narrow down the set of potential nouns that could follow and thus facilitate comprehension (e.g. Lew-Williams and Fernald, 2007). Indeed, native speakers and early bilinguals of French display shorter naming latencies of nouns if they are preceded by gender-informative vs. gender-neutral determiners (e.g. le/leur joli bateau ‘theMASC/theirAMBIGUOUS nice ship’). In contrast, even highly advanced late L1 English L2ers of French make no difference in naming speed according to preceding gender information on determiners (Guillelmon and Grosjean, 2001; see also Bobb et al., 2015) and adjectives (Scherag et al., 2004).
In visual-world eye-tracking studies, predictive effects of gender agreement can be studied in sentence contexts. In visual-world eye-tracking, listeners view displays containing different objects while they listen to spoken instruction to look at or manipulate one of the objects (Huettig et al., 2011). Concurrently, eye movements are measured. The analysis of fixation proportions to potential referents allows for insights into the ongoing reference resolution in sentence comprehension. Native adults as well as children as young as 28 months use gender on determiners to pre-activate noun labels of the objects that are presented, and they will show faster looking times to the target pictures and less competition from other nouns/objects in the display when gender information clearly indicates which noun will follow (e.g. le bouton ‘theMASC button’ vs. la bouteille ‘theFEM bottle’; Dahan et al., 2000; Lew-Williams and Fernald, 2007; van Heugten and Johnson, 2011). For adult L2 learners, however, predictive processing of gender obtains only partially. Adult learners whose L1 encodes grammatical gender can come to use gender marking predictively at intermediate proficiency levels (Dussias et al., 2013; Morales et al., 2015). In contrast, even more highly-proficient L1 English learners do not show benefits from informative gender marking on determiners to predict the following nouns by virtue of gender agreement (Grüter et al., 2012; Lew-Williams and Fernald, 2010; though see Dussias et al., 2013). In a partial replication of Lew-Williams and Fernald (2010), Grüter et al. (2012) found that, unlike native speakers, even near-native L1 English learners of Spanish did not demonstrate anticipatory processing of gender agreement for highly familiar Spanish nouns. Of note, the L2 group did evince native-like gender prediction for novel nouns whose labels and genders they first encountered in an auditory training phase of the experiment.
Based on this asymmetry and a prior training and simulation study by Arnon and Ramscar (2012), Grüter et al. map out a ‘lexical gender learning hypothesis’ of non-target-like gender processing. Arnon and Ramscar (2012) show that speakers learning an artificial language with determiner–noun gender agreement acquire noun gender more robustly if they are first presented with non-segmented input (i.e. determiner–noun sequences) than if they are first exposed to noun labels only and the non-segmented input later (see also Siegelman and Arnon, 2015). A computational model simulation yielded the same result. Arnon and Ramscar argue that this difference can serve to explain child–adult asymmetries in gender processing because children and adults acquire language in different environments that lead them to parse and represent gender at different levels of granularity.
Children are exposed to aural and non-segmented input, from which they need to extract words and match them with conceptual representations. For the isolation and interpretation of nouns, children thus first need to segment nouns from the speech stream of continuous determiner–noun sequences. Given that children are excellent at tracking statistical properties in the input (e.g. Saffran et al., 1996), children can rely on high co-occurrence probabilities between determiners and nouns to isolate noun units and ultimately map determiner forms to abstract grammatical nodes expressing gender and map the remaining units to nouns. As the co-occurrence association between determiners and nouns is strong initially, gender assignment will reliably be encoded and gender-marked forms can act as strong predictive cues for the associated nouns in processing.
In contrast, adults come to the task of L2 acquisition with prior metalinguistic knowledge about nouns and determiners as well as gender classes. In addition, they receive written input, which is visually pre-segmented by blanks between words. In consequence, adult learners can map nouns directly to a conceptual representation without computing co-occurrence relations to extract gender-marking determiners and nouns. Since adults therefore start learning on the basis of smaller linguistic units, i.e. noun labels, the gender class representations of nouns are likely to be weaker than in children and native speakers. Gender cues on the determiner will thus be less informative in sentence processing for L2 adults than for child learners as a direct consequence of the different learning environments in which the language was acquired.
Against this backdrop, the asymmetry between familiar and novel nouns Grüter et al. (2012) report in their experiment for late L2 learners may reflect the different ways in which the nouns were learnt. Since the novel nouns were first encountered auditorily embedded in full sentences containing determiner–noun sequences in the course of the experiment, the L2 learners arguably had to engage the same distributionally based segmentation strategies for separating determiners and nouns as child learners in order to learn the gender of the nouns. In consequence, the links between lexical gender nodes and the novel nouns were stronger and more reliable than for familiar nouns.
Building on these findings, Hopp (2013) directly tested for contingencies between lexical gender and predictive processing in L1 English late learners of German. In this study, the strength of lexical gender nodes was operationalized as the accuracy by which the correct gender form could be supplied in spoken production. For L1 English advanced to near-native adult learners of German, correct gender assignment in production correlated with predictive gender processing in comprehension. In addition, speed of lexical access moderated the size of gender prediction. The study showed, first, that native-like predictive gender processing is possible even for speakers whose L1 does not instantiate grammatical gender; yet, second and crucially, that target prediction is moderated by lexical variability.
Hopp frames the account for the link between lexical and syntactic variability in terms of recent psycholinguistic approaches to language acquisition and prediction. Across different frameworks, prediction has been argued to be a central component of language acquisition (e.g. Chang et al., 2006; Clark, 2013; Dell and Chang, 2014; MacDonald, 2013; Phillips and Ehrenhofer, 2015; Pickering and Garrod, 2013). By making predictions based on observations of the probability of linguistic events in the input (prediction-by-association) or based on the engagement of the listener’s own production system (prediction-by-simulation), listeners can test hypotheses about language against the input and thus acquire knowledge of the target language or adjust their assumptions about the target language (Huettig, 2015). If a prediction is not met in the input, prediction error allows for the adjustment of knowledge and leads to implicit learning by adaptation to the properties actually observed in the input. It has been argued that a rational and resource-constrained parser adapts its predictive beliefs in order to maximize the utility function of prediction: The parser matches the use of prediction to the statistics and the reliability of the input in order to reduce the level of surprisal or processing difficulty of the upcoming input (for discussion, see Kuperberg and Jaeger, 2015).
In language processing, the parser has been shown to rapidly adapt to prediction error in that prediction is adjusted, attenuated or even abandoned (DeLong et al., 2014; see also Fine and Jaeger, 2013; Fine et al., 2010, 2013). For instance, native readers adjust their parsing preferences according to the probabilities experienced in the experimental environment. In Fine et al. (2013), participants read sentences like ‘The experienced soldiers warned about the dangers {before the midnight raid/conducted the midnight raid}’, where the segment ‘warned about the dangers’ is ambiguous between a relative clause and a main clause continuation. Even though the main clause interpretation is more frequent and much preferred in general, after exposure to the relative clause variant in the course of the experiment, readers started showing shorter reading times for the latter. In turn, the originally preferred structure got demoted and became associated with longer reading times, i.e. a lower degree of prediction. Wells et al. (2009) report that experimentally induced preferences for otherwise low-frequency variants persevere in readers over longer time periods. Such findings from syntactic processing mirror findings from speech perception, where listeners rapidly adapt their prediction according to speaker identity and implicitly learn to adjust their expectations from previous prediction errors (e.g. Kraljic and Samuel, 2007; Kleinschmidt and Jaeger, 2015; for syntax, see Kamide, 2012).
In the context of gender agreement, Hopp (2013, 2015a) argued that the relation between knowledge of gender assignment and predictive use of gender agreement follows from similar implicit learning on the basis of prediction error. If gender assignment is variable or non-target-like in the L2 lexicon, then L2 learners will encounter frequent mismatches between their (subjective) gender assignment and the actual gender of nouns in the input. Specifically, lexical variability paves the way to frequent prediction errors, since the noun predicted to occur by the listener based on her non-target gender assignment does not occur in the (target) input. As a consequence, the parser likely adjusts prediction strength according to error-based implicit learning, such that gender agreement will not be used predictively by L2 learners due to variable lexical gender assignment.
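The logic of this error-driven account can be illustrated with a toy simulation. The following is a minimal sketch and not a model proposed in the cited work: it assumes a simple delta-rule update of a single cue strength, and the names `simulate_cue_strength`, `p_match` and `rate` as well as all numeric values are hypothetical choices made for illustration.

```python
import random

def simulate_cue_strength(p_match, n_trials=200, rate=0.1, seed=1):
    """Delta-rule update of the predictive strength of the gender cue.

    p_match: assumed probability that the noun predicted from the learner's
             own (subjective) gender assignment is the noun that occurs.
    Returns the cue strength (between 0 and 1) after n_trials exposures.
    """
    rng = random.Random(seed)
    strength = 0.5  # arbitrary starting value (assumption)
    for _ in range(n_trials):
        # 1.0 if the gender-based prediction is confirmed by the input, else 0.0
        outcome = 1.0 if rng.random() < p_match else 0.0
        # move the strength towards the outcome in proportion to the prediction error
        strength += rate * (outcome - strength)
    return strength

# Hypothetical learners with near-target vs. variable gender assignment
reliable = simulate_cue_strength(p_match=0.95)
variable = simulate_cue_strength(p_match=0.60)
```

Under these assumptions, the cue strength of the learner with variable gender assignment settles at a lower value, which is the qualitative pattern the account predicts; the exact values depend entirely on the illustrative parameters.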
Lexical variability in gender assignment has also been found to moderate gender agreement processing in other paradigms. In an ERP study on German–Dutch late bilinguals, Lemhöfer et al. (2014) find that the bilingual group does not demonstrate target-like sensitivity to gender agreement violations when trials are analysed according to the target gender system of Dutch. However, when the subjective gender assignment to the Dutch words by the participants as elicited in a separate written gender assignment task was taken as the baseline, the bilinguals evinced a native-like ERP response pattern. For translation recognition, English late learners of German also display target-like sensitivity to gender agreement violations only when their accuracy in gender assignment is taken into account (Bobb et al., 2015). Such studies further underscore that gender agreement processing is contingent on variability in lexical knowledge of gender assignment.
In this article, we directly test this contingency relation by manipulating lexical variability in gender assignment in that we provide instruction on German gender to intermediate L1 English learners of German. Previous behavioural and neurophysiological studies on instruction on gender in adult L2 acquisition have found that L2 learners of Spanish can come to acquire grammatical gender agreement to high levels through implicit as well as explicit training (De Jong, 2005; Morgan-Short et al., 2010). Similarly, Dutch learners of a miniature German grammar come to display target neurophysiological sensitivity to declension and gender violations even after minimal explicit or implicit training (Davidson and Indefrey, 2011; but see Davidson and Indefrey, 2009). These studies suggest that learners benefit from different types of instruction on a restricted set of nouns even in the course of several minutes and days in that they show emerging sensitivity to gender agreement (see also Jackson et al., 2015). Fully native-like gender processing may obtain after longer-term instruction (e.g. Gabriele et al., 2013). Against this backdrop, we designed a training study on a set of German nouns to address the following questions.
Can training on determiner–noun sequences lead to target predictive processing of grammatical gender in intermediate L2 learners (Experiment 1)?
Does lexical variability affect the predictive processing of grammatical gender agreement in adult L2 learners (Experiment 1)?
III Experiment 1
Experiment 1 is a training study in a pretest–posttest design of production and comprehension accuracy on gender in L2 German. In a visual-world eye-tracking study, we test whether participants use a gendered determiner to predict the following noun. The pretest and the posttest were modelled on Hopp (2013) and included (1) a picture description task in order to probe lexical gender assignment in production and (2) a visual-world eye-tracking task to investigate the predictive processing of gender agreement. Figure 1 gives an overview of the design of the study and the tasks used in the testing and training phases.

Figure 1. Experimental design of training study (tasks shown in italics).
1 Participants
Thirty-four L1 English late L2 learners of German took part in the experiment (M = 21.2 years, SD = 2.2; 18 females). At the time of testing, all participants were students in German programs at several universities in the USA. They had all started acquiring German after age 10 (M = 13.9 years, SD = 2.5). Prior to participating in the experiment, they completed a standardized 30-item written placement test (Goethe Institut, 2010). On average, they scored 17.7 points (SD = 4.4), which placed them into the upper-intermediate range. In addition, they were asked to assess their current use of German in percent. Their mean proportional weekly use of German amounted to 11.5% (SD = 9.5), which demonstrates that participants were strongly dominant in English. All participants had normal or corrected-to-normal vision and reported no history of learning difficulty or language deficits. Further information about the participants is provided in Table 1.
Table 1. Participant information: L2 participants (n = 34).
The study comprised a pretest, a treatment and a posttest. In the following, we first present the materials of the pretest and the posttest and then outline the treatment.
2 Pretest and posttest
a Materials
As in the study in Hopp (2013), we constructed stimuli containing a definite determiner, an adjective and a noun, in which gender is unambiguously realized on determiners only, with adjectives being the same across all three genders.
(1) Wo ist der/die/das gelbe [Noun]?
    Where is the MASC/FEM/NEUT yellow [Noun]?
Twenty items in total were created for the gender condition in example (1): five difference trials for each of the three gender forms (masculine, feminine and neuter) and five same trials. For each item, four-picture displays using coloured drawings or reduced photographs of picturable and easily identifiable inanimate objects were designed (Figure 2). All object labels had a frequency of at least 2,651 tokens in the 25-billion-word COSMAS II corpus of contemporary written German (M = 362,163; SD = 1,336,894; COSMAS II, 2008; date accessed 21 July 2015). The lexical items were chosen from word lists of German textbooks and online resources to make sure that most of the labels would be known to intermediate L1 English learners of German (see Appendix 1).

Figure 2. Display used in Experiment 1 for the difference condition (i.e. the three identically-coloured objects have different genders).
For the difference trials, one object with a clearly identifiable colour was the target; there were two identically coloured competitor objects that bore one of the other two genders each, and the fourth object in the display was a differently coloured distractor of a gender different from the target (Figure 2). All objects were similar in size and placed in identically-sized quadrants on the display. For the same trials, five ambiguous displays were designed along the same lines as Figure 2; however, the three colour-matched objects all had the same gender, and the differently coloured distractor had a different gender. The stimuli thus consisted of 15 difference trials, in which gender on the determiner was a potentially informative predictive cue for the target object by making it the only referent that is grammatically compatible with the determiner, and five same trials, in which the determiner and the adjective were grammatically compatible with three objects in the display. In addition, five items were added in which the differently coloured object was the target (filler condition). In this condition, the colour adjective was an unambiguous lexical cue to the referent. Moreover, five items contained a numeral (‘Where do you see two [A] [N]?’), which clearly identified the target (numeral condition). These latter two conditions enabled participants to predict the upcoming noun independently of gender information on the determiner. The resulting 30 displays were assigned to one of three lists which counter-balanced target noun and object position for the experimental items. Trials were presented in pseudorandomized order, such that no condition appeared twice in succession. In all, 80 nouns were used in the experiment, since some of the distractor and filler items were repeated.
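The trial composition and the ordering constraint can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the source does not say how the pseudorandomized orders were generated, and the function name `pseudorandomize` and the alternation strategy are hypothetical. Because difference trials make up half of the 30 displays, the sketch alternates them with the remaining trials, which yields only a subset of the orders that satisfy the constraint.

```python
import random

# Condition counts taken from the text: 15 difference, 5 same, 5 filler, 5 numeral.
COUNTS = {"difference": 15, "same": 5, "filler": 5, "numeral": 5}

def pseudorandomize(seed=None):
    """Return 30 (condition, item_index) trials with no condition twice in a row.

    Difference trials are alternated with the shuffled remaining trials, so two
    trials of the same condition can never be adjacent.
    """
    rng = random.Random(seed)
    difference = [("difference", i) for i in range(COUNTS["difference"])]
    others = [(cond, i) for cond, n in COUNTS.items()
              if cond != "difference" for i in range(n)]
    rng.shuffle(difference)
    rng.shuffle(others)
    # randomly decide which half opens the list
    if rng.random() < 0.5:
        first, second = difference, others
    else:
        first, second = others, difference
    order = []
    for a, b in zip(first, second):
        order.extend([a, b])
    return order

trials = pseudorandomize(seed=42)
```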
b Procedure
All sentences were recorded by a native male speaker of High German at a slow-to-moderate pace. The mean length of the determiner was 392 ms (range: 388–411 ms), 621 ms (range: 566–705 ms) for the adjectives and 592 ms (range: 535–763 ms) for the nouns. The onset of the adjective was aligned to occur 650 ms after determiner onset, and the onset of the nouns to 1,750 ms after the onset of the determiner.
Participants were informed that the experiment involved word learning in German. They were debriefed about the real nature of the experiment only after the posttest. Participants were tested individually in a quiet room at the university. In the first session, the participants completed the proficiency test and a language history questionnaire. In addition, they took part in the pretest. For the second session that involved the training and posttest, the participants returned to the lab approximately one week later (M = 7.4 days, SD = 0.9).
As in Hopp (2013), the pretest and posttest combined a production and a comprehension task. Participants sat in front of a 19-inch screen at a distance of approximately 70 cm. Participants received instructions and completed two practice trials before calibration began. The calibration aimed for an accuracy better than 0.5 degrees of visual angle and was repeated in the course of the experiment if necessary. Throughout the experiment, an SMI RED eye-tracker recorded gaze position at 60 Hz.
c Production task
In the production task that immediately preceded the comprehension task, participants saw still images of the four-picture displays and were asked to name the four objects in each display, including their colour (see Figure 2). This way, we could (1) ensure that participants recognized the objects and knew their labels and (2) assess which gender L2 participants assigned to the labels of the objects. In their descriptions of the objects, the participants used noun phrases containing determiners, colour adjectives and nouns. In total, participants named 80 different objects in the experimental condition (see Appendix 1). Participants’ responses were audio-recorded and transcribed. Responses were coded for gender on the basis of the gender form used on the determiner (der, die, das) for definite NPs as in example (1) or, for indefinite NPs, on the basis of the adjective ending (e.g. gelb-er, gelb-e, gelb-es ‘yellowMASC/FEM/NEUT’). Overall accuracy was coded as percentage of target responses.
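The coding rule for the production data can be written out as a short sketch. It assumes that each transcribed response has already been split into a determiner and an adjective; the function names `code_gender` and `accuracy`, the input format and the sample responses are hypothetical and only serve to make the rule explicit.

```python
# Gender forms of the definite determiner, as listed in the text
DEFINITE = {"der": "masc", "die": "fem", "das": "neut"}
# Adjective endings after an indefinite determiner; '-es' and '-er' are
# checked before '-e' because they also contain the letter 'e'
ADJ_ENDINGS = (("es", "neut"), ("er", "masc"), ("e", "fem"))

def code_gender(determiner, adjective):
    """Return the gender signalled by a response, or None if it cannot be coded."""
    det = determiner.lower()
    if det in DEFINITE:
        return DEFINITE[det]          # definite NP: code from the determiner
    for ending, gender in ADJ_ENDINGS:
        if adjective.lower().endswith(ending):
            return gender             # indefinite NP: code from the adjective ending
    return None

def accuracy(responses):
    """responses: list of (determiner, adjective, target_gender) triples.

    Returns the percentage of responses whose coded gender matches the target.
    """
    correct = sum(code_gender(d, a) == g for d, a, g in responses)
    return 100.0 * correct / len(responses)

# Invented sample: two target responses and one non-target response
sample = [("der", "gelbe", "masc"), ("ein", "gelbes", "neut"), ("ein", "gelber", "neut")]
score = accuracy(sample)
```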
d Comprehension task
After the production part, the screen changed immediately to show the same displays including a fixation cross. After a preview of 3,000 ms, a sound signal alerted participants to fixate on the cross. The participants’ gaze was directed to the fixation cross prior to each trial to avoid baseline effects of participants already looking at the target item before the onset of the critical noun phrase (Barr et al., 2011); 1,500 ms after the sound signal, the auditory presentation of the sentences began. In the analysis, gaze position after determiner onset was coded every 20 ms for 3,000 ms. Items with looks to the target region within the first 200 ms after determiner onset were discarded, because they could not reflect linguistically guided gaze shifts (e.g. Huettig et al., 2011). In all, this affected less than 1% of the data. In total, the experiment comprising the two tasks took about 20 minutes. Participants received a different list of the experiment in the pretest and the posttest, such that they would not encounter the same target nouns in the same trials or see the target objects in the same position as in the pretest.
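The gaze coding and the exclusion criterion described above can be illustrated with a small sketch. It assumes that each trial is available as a list of (time, region) samples with time in milliseconds relative to determiner onset; this data layout, the region labels and the function names are hypothetical, and treating 200 ms as an exclusive upper bound is an assumption.

```python
from collections import defaultdict

BIN_MS = 20        # coding interval from the text
WINDOW_MS = 3000   # analysis window after determiner onset
EARLY_MS = 200     # target looks earlier than this lead to exclusion

def keep_trial(samples):
    """True unless the trial has a look to the target within the first 200 ms."""
    return not any(0 <= t < EARLY_MS and region == "target" for t, region in samples)

def target_proportions(trials):
    """Proportion of samples on the target in each 20-ms bin, over retained trials."""
    looks = defaultdict(int)
    totals = defaultdict(int)
    for samples in trials:
        if not keep_trial(samples):
            continue  # discard: gaze shift cannot be linguistically guided
        for t, region in samples:
            if 0 <= t < WINDOW_MS:
                bin_start = (t // BIN_MS) * BIN_MS
                totals[bin_start] += 1
                looks[bin_start] += region == "target"
    return {b: looks[b] / totals[b] for b in sorted(totals)}

# Invented samples: trial_b is excluded because of its early look to the target
trial_a = [(0, "cross"), (20, "cross"), (400, "target")]
trial_b = [(0, "cross"), (100, "target"), (400, "target")]
trial_c = [(0, "cross"), (400, "competitor")]
props = target_proportions([trial_a, trial_b, trial_c])
```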
3 Training experiment
The training experiment comprised a computer-based training and a testing phase: In the training phase, all 80 nouns used in the pretest were presented to the participants. The pictures were the same as the ones used in the pretest. The procedure of the training experiment followed previous gender learning studies such as Arnon and Ramscar (2012), Davidson and Indefrey (2009, 2011) and Jackson et al. (2015). The 80 nouns were assigned to two blocks of 30 items each for the nouns in the experimental conditions and one block of 20 items for the nouns in the filler condition. In each block, a picture corresponding to each noun was presented to the participants for a duration of 4,000 ms each, with the determiner–noun sequence (e.g. der Käse ‘theNOM cheese’) written underneath the picture (see Figure 1). In addition, an audio-stimulus was played containing the determiner–noun sequence. The participants repeated the auditory stimulus out loud after they had heard it. Within each block, each noun/object was presented three times. Order of presentation was randomized, and no feedback was given.
In the testing phase, which immediately followed each training block, the objects were presented visually without any auditory or written information (Figure 1). Participants then had to say the determiner–noun sequence out loud. The experimenter logged their responses on the keyboard. If they had produced the correct noun and the target determiner, the participants received positive feedback (‘yes’), and the next item was presented. In cases when they had produced a non-target determiner, the participants received negative feedback (‘no’), and the item was tagged to reappear at the end of the block. The participants did not receive any correction. Items for which the participants had produced non-target determiners were presented until the participants produced the correct determiner. As a consequence, all participants reached 100% accuracy on grammatical gender assignment in production at the end of the testing blocks.
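The train-to-criterion logic of the testing phase can be sketched as a simple queue. This is a hedged illustration under stated assumptions: the function names, the `respond` callback standing in for the participant's spoken answer, and the shuffled initial order are hypothetical and not taken from the experiment software.

```python
import random
from collections import deque
from typing import Callable, Dict, Iterable, List


def run_testing_block(items: List[str],
                      respond: Callable[[str], bool],
                      seed: int = 0) -> Dict[str, int]:
    """Present items until each one has been answered correctly.

    respond(item) returns True if the participant produced the target
    determiner for the item. Returns the number of attempts per item.
    The loop only ends if every item is eventually answered correctly.
    """
    order = items[:]
    random.Random(seed).shuffle(order)  # assumption: randomized order
    queue = deque(order)
    attempts = {item: 0 for item in items}
    while queue:
        item = queue.popleft()
        attempts[item] += 1
        if not respond(item):
            # Negative feedback without correction: the item is tagged
            # to reappear at the end of the block.
            queue.append(item)
    return attempts


def make_learner(hard_items: Iterable[str]) -> Callable[[str], bool]:
    """Toy participant who misses each hard item once, then gets it right."""
    pending = set(hard_items)

    def respond(item: str) -> bool:
        if item in pending:
            pending.discard(item)
            return False
        return True

    return respond
```

Because an item leaves the queue only after a correct response, every participant necessarily ends the block with all items answered correctly, which is the sense in which accuracy was 100% at the end of the testing blocks.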
The order in which the two blocks containing the items from the experimental conditions were presented alternated between participants. Order within the blocks was randomized. The final block of 20 items comprised the objects used in the filler condition. Since the last block was identical for all participants, we could control for recency effects in training on the experimental items. In all, the training experiment took about 20 minutes for the participants. The posttest immediately followed the training study.
4 Results
a Production task
In order to test the extent to which the participants learned noun labels and gender assignment in the training, we first consider the production accuracy by the participants in the pretest and the posttest. Table 2 shows the accuracy scores for noun naming and gender assignment.
Accuracy in production in pretest and posttest: all L2 participants (n = 34).
In the pretest, there was large variability in accuracy of picture naming. In contrast, errors or failures in picture naming were very rare in the posttest, which indicates that participants had successfully learned the nouns during the training. For noun naming, a mixed logistic regression with participant and item intercepts as crossed random factors and Test as a fixed main effect showed a highly significant effect of Test, i.e. a difference in accuracy between pretest and posttest (β = 1.451, SE = 0.128, t = 11.349, p < .001).
For gender accuracy, we observe variability between the participants in both the pretest and the posttest. The mixed logistic regression on the number of gender mistakes in the pretest and the posttest also yielded a highly significant effect (β = 1.440, SE = 0.071, t = 20.342, p < .001), which shows that the training improved performance on gender assignment. However, variability remained in the posttest, demonstrating that the participants retained accuracy in gender assignment from the training and testing blocks to different degrees.
In a next step, we assessed the extent to which learner variables correlated with naming and gender assignment accuracy at each test. For noun naming in the pretest, the only learner variable that yielded a significant association was proficiency score (r = .508, p = .002), i.e. the higher the proficiency, the higher the naming accuracy. Neither length of exposure, length of residence nor current use of German returned any significant correlations (all rs < .17). For noun naming in the posttest, there were no significant associations with any learner variable (all rs < .2).
For gender assignment accuracy in the pretest, both Length of Exposure (r = –.406, p = .017) and proficiency score (r = .385, p = .025) correlated with accurate gender marking. In the posttest, only the association between proficiency and gender accuracy retained significance (r = .469, p = .005).
In sum, the learners demonstrated clear effects of learning from pretest to posttest in both vocabulary and gender assignment. Yet, learning effects were different between noun naming and gender assignment, in that gender assignment remained more variable even after training to criterion in the training session prior to the posttest.
b Comprehension task
In the comprehension task, we tested the extent to which training on lexical gender affects predictive use of gender agreement. We analysed the reaction time, i.e. the first fixation on the target picture after determiner onset. Following Hopp (2013) and Lemhöfer et al. (2014), we computed reaction times for the subset of trials based on the production accuracy of gender assignment by the L2ers, i.e. the subjective gender assignment of the participants. This was done to make sure we only used trials in which the genders assigned by the participants could act as an informative cue (difference trials) or would be uninformative (same trials). Specifically, we categorized a comprehension trial as (1) a difference trial if the L2er had assigned the target lexical gender to the target item and other (target) genders to the other items. Hence, these were trials in which gender marking on determiners would be informative as to the target item according to the learner’s subjective gender assignment. Across all experimental trials, 159 out of 496 difference trials (32%) met this criterion in the pretest and 375 out of 500 difference trials (75%) in the posttest. Further, we defined (2) same trials as trials in which the participant had assigned the target item and the other items the same gender. These amounted to 91 out of 170 same trials (54%) in the pretest and to 146 out of 170 same trials (86%) in the posttest. In the same trials, gender marking could not act as an unambiguous cue for the target object. All other trials were categorized as uninformative trials. In these cases, either the target gender cue was incongruous with the subjectively assigned gender and could not guide predictive looks to the target object, or, in difference trials, other objects were assigned the same gender as the target object. These items were excluded from the analysis.
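The categorization by subjective gender assignment can be made concrete with a short sketch. The function name, the argument layout and the gender codes ("m", "f", "n") are hypothetical; in particular, requiring that the competitors in same trials match the heard gender is an assumption about how "the same gender" was operationalized.

```python
from typing import Dict, List, Optional


def classify_trial(designed_type: str,
                   heard_gender: str,
                   target: str,
                   competitors: List[str],
                   subjective: Dict[str, Optional[str]]) -> str:
    """Classify a comprehension trial by the learner's own gender assignment.

    designed_type: "difference" or "same" (the condition as constructed).
    heard_gender:  gender marked on the determiner in the auditory stimulus.
    subjective:    noun -> gender the learner produced in the production
                   task (None or missing if no usable response).
    Returns "difference", "same" or "uninformative" (excluded).
    """
    # The cue is only usable if it matches the learner's gender for the target.
    if subjective.get(target) != heard_gender:
        return "uninformative"
    others = [subjective.get(noun) for noun in competitors]
    if designed_type == "difference" and all(
            g is not None and g != heard_gender for g in others):
        return "difference"
    if designed_type == "same" and all(g == heard_gender for g in others):
        return "same"
    return "uninformative"


# Toy learner who assigns feminine instead of masculine gender to "Apfel".
learner = {"Tisch": "m", "Lampe": "f", "Buch": "n", "Stuhl": "m", "Apfel": "f"}
```

For this toy learner, a difference trial with the target "Tisch" and the competitors "Lampe" and "Buch" stays in the analysis, whereas any trial with the target "Apfel" and a masculine determiner is excluded, because the cue conflicts with the learner's own assignment.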
Figure 3 graphs the time course of fixations in the pretest (grey lines) and the posttest (black lines). The graphs show that the line for difference trials hardly diverges from the line for the same trials in the pretest. In contrast, participants were considerably faster to fixate on the target object in the difference trials compared to the same trials in the posttest.

Time course of fixation proportions to target object in same and difference trials in pretest (grey lines) and posttest (black lines) (in ms from determiner onset).
We analysed the reaction times of looking towards the target region (Figure 4) in a mixed linear regression model with the fixed factors Type (same vs. difference) and Test (pretest vs. posttest). In addition, participant and item were crossed random factors. There were no significant main effects of Type (β = 216.28, SE = 152.39, t = 1.419, p = .156) or Test (β = 193.22, SE = 151.33, t = 1.277, p = .202), yet a highly significant interaction of Type and Test (β = 305.82, SE = 87.68, t = 3.488, p = .001). In pairwise comparisons for the pretest and the posttest, the learners showed no significant difference between difference and same trials in reaction times in the pretest (β = 72.16, SE = 68.29, t = 1.057, p = .294); in contrast, there was a highly significant difference in the posttest (β = 411.92, SE = 59.53, t = 6.919, p < .001).

Mean reaction times (in ms) in target region by condition (Same vs. Different) and test (Pre vs. Post).
Figure 4 shows the mean reaction times in each trial type for the pretest and the posttest. As can be seen in this figure, the mean difference between difference and same trials is 88 ms in the pretest, while it reaches 346 ms in the posttest.
In order to address the second research question about the relation between lexical and syntactic variability, we tested whether the difference between same and difference trials was affected by accuracy in lexical gender assignment. For the pretest, a mixed linear regression analysis with Type and lexical gender assignment accuracy (Gender Accuracy) in the production test as fixed factors and participant and item as crossed random factors yielded no effects of Type, Gender Accuracy or any interaction (all ts < 1). For the posttest, however, the same analysis returned significant main effects of Type (β = 608.65, SE = 75.49, t = 8.063, p < .001), Gender Accuracy (β = 29.49, SE = 10.41, t = 2.832, p = .005), and a significant interaction of Type and Gender Accuracy (β = 22.79, SE = 5.44, t = 4.190, p < .001). To identify the nature of the interaction, correlational analyses were performed between the mean predictive gender effect (i.e. the mean difference between same and difference trials) and the lexical gender assignment accuracy in the production test for each participant. For the pretest, there was no significant association between the predictive effect of grammatical gender and gender assignment accuracy (r = .110). For the posttest, however, the correlation between gender assignment and the predictive use of gender was highly significant (r = .534, p = .001), i.e. greater accuracy in lexical gender assignment was associated with larger predictive effects in processing gender agreement.
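The by-participant correlational analysis can be sketched in two steps: a predictive gender effect per participant (mean reaction time in same trials minus mean reaction time in difference trials), and a Pearson correlation of that effect with gender assignment accuracy. The function names are hypothetical, and the sketch is only meant to illustrate the computation, not to reproduce the reported values.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def predictive_effect(rts_same: Sequence[float],
                      rts_difference: Sequence[float]) -> float:
    """Mean RT in same trials minus mean RT in difference trials (ms).

    Positive values indicate faster target fixations when the determiner
    was informative, i.e. predictive use of gender.
    """
    return mean(rts_same) - mean(rts_difference)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

In use, one would collect one effect score and one accuracy score per participant and pass the two lists to `pearson_r`; a positive coefficient then corresponds to the pattern reported for the posttest.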
To illustrate the significant association between lexical gender assignment accuracy and the predictive effect in grammatical gender agreement, we separated the L2 participants into two near-equally sized groups on the basis of their overall accuracy in gender assignment in production in the posttest following the procedure in Hopp (2013): A group with variable lexical gender, i.e. variable or non-target-like gender assignment, that made at least seven gender mistakes (9%) for the 80 objects (mean accuracy on gender = 76%), and a group with consistent lexical gender, i.e. close to target gender assignment, that made at most six (7.5%) gender mistakes (mean accuracy = 97.5%). Table 3 lists the gender assignment accuracy for all participants and the two lexical gender groups in the pretest and in the posttest.
Production results by group: Gender assignment accuracy: Pretest vs. posttest as percentages (n = 34).
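The grouping criterion amounts to a cut-off on the number of posttest gender mistakes out of 80 objects. The following sketch restates it; the constant and function names are hypothetical, while the cut-off values come from the description above.

```python
N_OBJECTS = 80               # nouns tested in production (from the text)
MAX_MISTAKES_CONSISTENT = 6  # at most six mistakes: gender-consistent group


def gender_group(n_mistakes: int) -> str:
    """Assign a participant to a lexical gender group by posttest mistakes."""
    if not 0 <= n_mistakes <= N_OBJECTS:
        raise ValueError("number of mistakes must lie between 0 and 80")
    if n_mistakes <= MAX_MISTAKES_CONSISTENT:
        return "consistent"
    return "variable"


def error_rate(n_mistakes: int) -> float:
    """Proportion of the 80 objects with a non-target gender."""
    return n_mistakes / N_OBJECTS
```

Six mistakes correspond to an error rate of 7.5%, and seven mistakes to 8.75%, which is the value rounded to 9% in the group description.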
One-way ANOVAs were run to check whether the groups differed on any factors other than gender assignment accuracy in the posttest. For the biographical variables, the groups did not show any significant differences, except for proficiency, with the gender-consistent group scoring somewhat higher (19.5 vs. 16.2; F(1,32) = 5.614, p = .024). In the pretest, the gender-consistent group also had a significantly higher gender assignment accuracy than the gender-variable group (F(1,33) = 8.047, p = .008), but both groups showed variability in gender production accuracy in the pretest.
For the comprehension data, a linear mixed analysis with Test, Type and Group as fixed factors and participant and item as crossed random factors returned a significant three-way interaction between Test, Type and Group (β = 396.65, SE = 183.20, t = 2.165, p = .031). Subsequent pairwise comparisons of the mean reaction times by group showed no significant difference between difference and same trials in the pretest for either group (all ts < 1). For the posttest, the effect of Type was not significant for the gender-variable group (β = 121.25, SE = 76.40, t = 1.587, p = .116). In contrast, the gender-consistent group demonstrated a highly significant effect of Type (β = 604.79, SE = 79.36, t = 7.621, p < .001). Figures 5 and 6 graph the mean reaction times for the two groups in the pretest and in the posttest.

Mean reaction times (in ms) in target region by condition (Same vs Different) and test (Pre vs Post).

Mean reaction times (in ms) in target region by condition (Same vs Different) and test (Pre vs Post).
Figure 5 illustrates that the gender-variable group did not benefit from the training on lexical gender with respect to the predictive use of grammatical gender. Even though there is a numerical advantage for the difference vs. the same trials in the posttest, this advantage does not increase compared to the pretest. In contrast, the gender-consistent group came to use gender marking on the determiner as a predictive cue in the posttest (Figure 6).
In sum, the findings of Experiment 1 bear out that all participants showed variability in lexical gender assignment in the production task of the pretest. In the comprehension experiment in the pretest, the L2 participants did not use grammatical gender as a predictive cue for agreement processing, although only trials in which the subjective gender assignment of the participants could lead to grammatical prediction were used for analysis.
In the training part, the L2 participants received training on noun labels and the corresponding gender assignment until they performed to criterion in the testing blocks at the end of the training session. Despite identical training on lexical gender assignment before the posttest, participants continued to exhibit variability in gender assignment in production in the posttest to different degrees.
In the comprehension experiment of the posttest, the L2 group as a whole demonstrated predictive use of grammatical gender agreement. However, correlational analyses showed that predictive use of gender on determiners was moderated by individual differences in the accuracy of lexical gender assignment in production. Learners who had learned to consistently assign the target gender to nouns employed gender marking on determiners for predictive agreement. In contrast, L2 learners who continued to exhibit variability in gender assignment in production failed to make predictive use of the subjective gender marking in comprehension.
5 Discussion
Experiment 1 showed clear effects of training on gender assignment on the predictive use of grammatical gender agreement. After massed exposure to grammatical gender in form-focused training activities, low-intermediate to advanced L1 English learners attained online sensitivity in the predictive processing of grammatical agreement even after short periods of training. In these respects, the present findings complement studies on training effects on gender violations in artificial language learning (Arnon and Ramscar, 2012; Morgan-Short et al., 2010) or on a very restricted set of determiner–adjective–noun sets in German (Davidson and Indefrey, 2011). In addition, even though the training presented decontextualized examples in a lab setting, the present findings mirror those of classroom-based studies on explicit and implicit instruction of gender (De Jong, 2005; for review, see Alarcón, 2013) and the emerging effects of online sensitivity to gender agreement after more exposure in instructional settings (e.g. Alemán-Bañón et al., 2014; see also Tokowicz and MacWhinney, 2005). Hence, research question 1 can clearly be answered in the affirmative in that training on determiner–noun sequences leads to predictive gender processing in adult L2 learners.
The second question in Experiment 1 was whether individual differences in lexical gender assignment affect the predictive processing of gender. In the pretest, there was no association between predictive processing of grammatical gender and lexical gender assignment, largely because there was no evidence of predictive use of grammatical gender in the L2 participants. In the posttest, however, the predictive use of gender was strongly related to the amount of target-like assignment of gender in production, i.e. the extent to which the recently acquired or reinforced gender assignment in the training session had been robustly represented such that it could be called upon in production. In contrast, learners with variable or non-target gender assignment did not benefit from training as regards predictive use of gender agreement. For the present set of intermediate to advanced instructed learners of German, Experiment 1 hence replicates contingencies between lexical gender assignment and the use of grammatical gender in prediction found for immersed advanced to near-native L2 learners of German in Hopp (2013).
The present experiment manipulated knowledge of lexical gender in training and could show that larger gains in the posttest on lexical gender assignment led to the emergence of predictive gender agreement. This key finding bolsters the hypothesis that lexical and syntactic variability in gender are related. Importantly, the results go beyond the finding that learners demonstrate online sensitivity to subjective rather than objective gender (Lemhöfer et al., 2014); rather, the overall mastery of lexical gender seems a prerequisite to the use of gender as a predictive cue in agreement processing.
In this vein, the data from the L2 learners in Experiment 1 resonate with findings for child L1 learners that lexical knowledge, i.e. overall vocabulary size, is the best predictor for syntactic online prediction over and above other factors, e.g. age (Borovsky et al., 2012; Mani and Huettig, 2012). In addition, Experiment 1 is in line with research about adaptivity in implicit learning according to prediction error (Fine et al., 2013; Fine and Jaeger, 2013; Jaeger and Snider, 2013). L2 learners who successfully learned the target gender assignment for the nouns used in the experiment in the training session employ gender as a predictive cue in agreement processing, because the cue leads to facilitative processing of the upcoming noun, i.e. to a more efficient way of completing the experimental task. In other words, the L2 parser adapts to the task by recruiting the recently acquired or strengthened knowledge of lexical gender in online processing.
Conversely, the learners with more variable gender assignment do not use their non-target gender assignment for prediction, even for those (many) trials in which it would lead to facilitatory prediction. If learners consistently used their variable and non-target gender assignment for prediction, they would make erroneous predictions that entail costly reanalysis for the parser (Hopp, 2013). In terms of parsing efficiency, error-based implicit learning then serves to align predictions in order to reduce future prediction error; the result being that learners who continue to exhibit variable gender assignment do not use gender for predictive agreement overall. From a psycholinguistic perspective, then, the finding that variability in gender assignment in production is associated with variability in gender agreement in predictive processing receives an explanation in terms of implicit learning based on prediction success and error.
In Experiment 2, we investigate whether the observed relation between the gender assignment and gender agreement processing in Experiment 1 holds up in different contexts, or whether it was accidental in L2 learners. After all, gender accuracy may have acted as a proxy for other learner characteristics such as proficiency. Experiment 2 tests whether native speakers of German demonstrate a similar relation between lexical gender assignment accuracy and the predictive use of gender agreement. Unlike L2 learners, mature native speakers (1) are highly proficient in the target language, (2) have fully target-like, stable and readily accessible gender representations in their lexicons (dialectal variation aside), and (3) use gender for predictive agreement processing. In Experiment 2, we introduce gender assignment errors in the input to native speakers in a visual-world eye-tracking task. The introduction of non-target gender assignment in the input would lead to prediction error if native speakers continued to use their (target) gender knowledge for anticipation. In this respect, Experiment 2 simulates the situation of the L2 group with variable lexical gender in Experiment 1. If, as argued, prediction error drives the (non-)use of gender agreement, we would expect it to engender implicit adjustment, i.e. attenuation, of predictive gender processing also in native speakers. Hence, we probe whether L2-like performance can be induced in native speakers by varying the task for natives (e.g. Hopp, 2010; McDonald, 2006).
IV Experiment 2
In Experiment 2, we address the following research question:
Does the introduction of lexical variability in gender assignment lead to attenuation of predictive gender processing in native speakers?
Experiment 2 employed a between-group design. Participants were assigned to one of two groups, the No Error group and the Error group. The latter received gender assignment errors in the input. By comparing the two groups, we test for effects of non-target gender assignment in the input on predictive gender agreement processing in native speakers of German.
1 Participants
We recruited 42 native speakers of German (mean age: 22.6 yrs (sd = 3.2); 32 females) for the experiment. All participants were students at a German university at the time of testing and received 5 Euros for taking part in the experiment.
2 Materials
For Experiment 2, all items from Experiment 1 were used. To show that participants in both conditions use gender for prediction before encountering trials containing gender errors, we added more items to Experiment 2. In total, we constructed 30 difference items and 30 same items as in Experiment 1. In addition, 30 items were added in which the differently coloured object was the target (filler condition). In this condition, the adjective was an unambiguous lexical cue to the referent. The items were assigned to two blocks, comprising 45 items each (15 of each type). The 15 difference items from Experiment 1 were assigned to the second block, along with the five same items from Experiment 1, 10 additional same items and 15 filler items. This way, we ensured we could compare predictive processing for the same set of items in Experiment 1 and the second block of Experiment 2. In all, 160 nouns were used in the experiment.
We created three lists which counter-balanced target noun and object position for the experimental items. Each of the lists was shown in two conditions to two participant groups: a No-error condition, in which the gender assignment of the objects named was always target-like, and an Error condition. In the Error condition, the target objects of 11 filler items in the second block were assigned a non-target gender, i.e. participants would hear 11 incorrectly gender marked determiner–noun combinations. This proportion of non-target gender assignment (11/45, i.e. about 24%) corresponded approximately to the mean proportion of non-target gender assignment among the gender-variable group in the posttest in Experiment 1 (24%, given a mean gender accuracy of 76%). Importantly, none of the experimental items in the Error condition contained gender assignment mistakes, since non-target gender assignment was restricted to filler items.
3 Procedure
The stimuli were recorded by the same male native speaker of German as in Experiment 1. The pace of the recordings was faster than in Experiment 1, adjusting for the fact that only native speakers were tested. The mean duration was 198 ms (sd = 28 ms) for the determiners, 606 ms (sd = 53 ms) for the adjectives and 681 ms (sd = 139 ms) for the nouns. The onset of the noun occurred 1,050 ms after determiner onset. The items were pseudorandomized in each block. Participants were tested individually in a quiet room with the same eye-tracker as in Experiment 1.
4 Analysis and results
The data from one participant in the Error group had to be excluded due to track loss, which left 41 participants for analysis (18 in the No-error and 23 in the Error condition). Mean reaction times, i.e. the first fixation on the target picture after determiner onset, were computed as in Experiment 1. Since the gender mistakes accrued over the course of the second block, we decided to break down the results by quartiles. Table 4 presents the mean reaction times in the different and same conditions by quartiles in the experiment. It also displays the size of the predictive gender effect, i.e. the mean difference between the same and the different conditions.
Mean reaction times (in ms, from determiner onset, standard error in parentheses) by quartile by group.
Note. Δ indicates predictive gender effect (i.e. difference score).
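The quartile breakdown of the predictive gender effect (Δ) can be sketched as follows. How trials were assigned to quartiles is not spelled out here, so the sketch assumes quartiles of equal width over the presentation order; the function names and the tuple layout are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def quartile_of(position: int, n_trials: int) -> int:
    """Map a 0-based trial position to quartile 1-4 of the session.

    Assumption: quartiles split the presentation order into four
    (near-)equal parts.
    """
    if not 0 <= position < n_trials:
        raise ValueError("position outside the session")
    return (position * 4) // n_trials + 1


def effect_by_quartile(trials: Iterable[Tuple[int, str, float]],
                       n_trials: int) -> Dict[int, float]:
    """Predictive gender effect (same minus different, in ms) per quartile.

    trials holds (position, trial_type, rt_ms) tuples; trial types other
    than "same" and "different" (e.g. fillers) are ignored.
    """
    rts = defaultdict(lambda: {"same": [], "different": []})
    for position, trial_type, rt_ms in trials:
        if trial_type in ("same", "different"):
            rts[quartile_of(position, n_trials)][trial_type].append(rt_ms)
    return {
        quartile: mean(cells["same"]) - mean(cells["different"])
        for quartile, cells in sorted(rts.items())
        if cells["same"] and cells["different"]
    }
```

Running this separately for the Error and the No-error group gives one difference score per quartile and group, which is the quantity tracked across the experiment.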
The graph in Figure 7 illustrates the development of the predictive gender effect in the two groups across the quartiles. The graph shows how the predictive gender effect increases in both groups over time, yet declines in the Error group in the final quartile. In a mixed linear regression model with Type (different vs. same) and Group (Error vs. No-error) as fixed factors and participant and item as crossed random factors, there was a significant effect of Type (β = 143.40, SE = 27.87, t = 5.146, p < .001), but there was no main effect of Group or any interaction (all ts < 1). When adding the fixed factor Quartile (1–4), the three-way interaction of Type, Group and Quartile did not reach significance (β = 36.81, SE = 33.167, t = 1.110, p = .267). Given our specific hypothesis about possible effects of Group in the last time window, we computed separate analyses for each quartile. These results are given in Table 5.

Development of predictive gender effect (in ms, error bars show standard error) across quartiles.
Mixed regression analysis by quartile.
In Quartile 4, the interaction between Type and Group becomes marginally significant. Post-hoc pairwise comparisons in Quartile 4 show that the No-error group displays a highly significant effect of Type (β = 188.28, SE = 57.66, t = 3.265, p = .001), whereas the effect is not significant for the Error group (β = 66.15, SE = 45.99, t = 1.438, p = .151). These findings bear out that the inclusion of gender assignment errors in the filler items in the Error group affected the predictive effect of gender agreement in the experimental trials.
5 Discussion
Experiment 2 showed that native speakers robustly use gender for predictive agreement. As the time course of the effect in the No-error group demonstrated, the use of gender for predictive processing became enhanced throughout the experiment, as prediction proved to be successful and facilitated task completion. At the same time, the experiment yielded clear evidence that the introduction of non-target-like gender assignment in unrelated filler trials in the Error group engendered a sizeable attenuation of the predictive use of gender agreement in native processing in the final quartile.
The present results about the modulation of agreement processing complement previous findings in visual-world studies where gender errors increase competition in auditory comprehension (Van Heugten et al., 2012). Moreover, in an ERP study on gender agreement violations, Hanulíková et al. (2012) find that native Dutch listeners stop showing ERP components indexing sensitivity to gender agreement violations between determiners and nouns in the second half of the experiment, i.e. after they had encountered a large number (50%) of gender violations in the input (see also Hahne and Friederici, 1999 for effects of error probability on brain responses). In addition, when Hanulíková et al. (2012) had a speaker with a strong non-native accent read the stimuli, native Dutch listeners did not show any brain responses associated with ungrammaticality detection. These studies attest that attenuated sensitivity to gender agreement as a consequence of gender errors in the input is not specific to prediction. In conjunction, these findings demonstrate that experience from past encounters with speakers (i.e. native vs. non-native) and with the input (error probabilities) affects the use of gender information. Similar findings have been reported in work on adaptivity in syntactic priming and garden-path sentences, where previous exposure leads comprehenders to converge on the input patterns in sentence comprehension and production (Fine et al., 2013).
In the present experiment, attenuated use of gender in predictive processing was observed after only a handful of gender mistakes in the filler trials. Hence, the effects of exposure on morphosyntactic processing were rapid, which suggests that the parser is very flexible in adapting to the input in an attempt to facilitate online language processing in immediate environments. For garden-path sentences, Wells et al. (2009) show that adaptivity to the statistics in the input lasts beyond the experimental session and persists for at least a couple of days. Morphosyntactic adaptation therefore seems to be a form of implicit learning, rather than a short-lived consequence of priming or episodic processing (e.g. Kaschak and Glenberg, 2004; for discussion, see Fine et al., 2013). In Experiment 2, adaptation to the input properties resulted in native speakers becoming insensitive to gender cues for predictive agreement processing. In the final quartile, the native speakers in the Error group replicated the performance pattern of the gender-variable L2 group in Experiment 1.
V General discussion
In two experiments, we probed the relation between lexical gender assignment and the predictive processing of gender agreement in non-native and native German. In Experiment 1, we found (1) that intermediate L1 English late learners of German can come to show predictive processing of gender agreement after training on lexical gender assignment, and (2) that the accuracy in gender assignment moderates predictive gender agreement. These results add to previous findings in Grüter et al. (2012) and Hopp (2013) on more advanced L2 learners, and they suggest that lexical and syntactic variability in L2 gender processing are correlated across the proficiency spectrum. Following Hopp (2013), we interpreted this relation as reflecting implicit learning of the parser to adjust for prediction success and errors. L2 learners who acquire overall target lexical gender for nouns (used in the experiment) come to exploit this lexical knowledge for predictive agreement processing, because doing so reliably facilitates comprehension. In other words, once lexical gender assignment is target-like in the lexicon, predictive processing by gender agreement ensues. In contrast, with non-target-like gender assignment in a sizeable part of the paradigm, L2 learners do not even use gender in predictive processing for the items for which they assign the correct gender. Overall, using gender assignment for gender agreement prediction would incur prediction errors for the items with non-target subjective gender assignment, such that the parser adapts its predictive capacities according to experience (see also Fine et al., 2013). In the context of Experiment 1, such adaptation would lead to the non-use of (subjective) gender as a predictive cue.
Whereas such an account is compatible with the results, the correlation between lexical and syntactic variability in Experiment 1 does not imply a causal link. Despite the end-of-training test at which all L2ers performed to criterion on gender assignment, it is not certain whether the lexical and syntactic representations of gender were target-like or robust enough in the L2 participants, such that they could be used for predictive processing at that point in principle. In consequence, the failure of the gender-variable group to employ gender predictively may not have been the result of error-driven implicit learning but could mean that gender was too weakly represented in the L2 mental lexicon or that the choice of gender in L2 production does not equal the use of gender in (predictive) comprehension.
In Experiment 2, we therefore tested a group of native speakers of German, whose lexical and syntactic representations of gender were uncontroversially target-like and who make robust use of gender agreement in predictive processing. In the between-group blocked design of Experiment 2, native speakers stopped using gender predictively once gender had become unstable in irrelevant parts of the experiment, i.e. the filler trials. Importantly, attenuation in the predictive use of gender was not conditioned by actual prediction failure in the experimental trials, since prediction would always be correct in the experimental trials, i.e. continued prediction would lead to facilitation in comprehension. Yet, listeners appeared to adapt automatic processing routines based on the computations of probability estimates of gender reliability in the overall input. These findings are in line with previous research on effects of experience and adaptation in sentence comprehension, and they extend adaptation effects to gender processing (e.g. Fine et al., 2013; Jaeger and Snider, 2013; Kamide, 2012). With respect to morphosyntactic variability in L2 adult processing, the results from native speakers in Experiment 2 suggest that L2-like performance on predictive gender agreement can be emulated in native processing when the input conditions in the experiment begin to resemble the structure of gender representations in the L2 lexicon: Specifically, (subjective) gender becomes an unreliable cue because gender assignment is not always target-like. Such a cue has decreased predictive value in language processing since the forward transitional probability of the subjective, but non-target, gender representation in the learner corresponds to zero in the target language. Under these circumstances, native speaker performance parallels L2 behaviour in that prediction according to gender is suspended.
Previous research on L2 inflectional variability that sought to emulate L2-like performance in native speakers relied on task manipulations that put natives under increased processing pressure by speeding up stimulus presentation, restricting response times and/or degrading the stimulus (e.g. Hopp, 2010; Lopez-Prego and Gabriele, 2014; McDonald, 2006). Such measures served to recreate L2 capacity constraints in real-time language processing in native speakers. In contrast, the present study did not address capacity aspects of (non-)native processing; rather, Experiment 2 aimed to simulate the learning experience of adult L2 speakers, namely, the less robust, variable and partially non-target representation of gender assignment. Under these conditions, use of extant gender representations is attenuated in native speakers as a result of error-based implicit learning. Current probabilistic models of prediction (e.g. Kuperberg and Jaeger, 2015) hold that predictions are generated in multirepresentational hierarchical frameworks according to different types of information, including internal lexical representations (as for the L2 speakers in Experiment 1) and beliefs about external (speaker) lexical representations (as for the native speakers in Experiment 2). Once these representations become unstable, prediction is adjusted by listeners because of its reduced utility in language processing. In such frameworks, internal and external variability in lexical gender assignment can lead to the same adaptive consequences in predictive processing.
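One way to picture error-based adjustment of a cue is a simple delta-rule update, sketched below. This is a hedged illustration and not the model proposed by Kuperberg and Jaeger (2015) or any analysis from the present experiments: the function names, the learning rate of 0.1, the starting value and the trial sequences are all assumptions. The sketch tracks a single estimate of how reliable gender marking is across all trials, so that non-target fillers lower the estimate even though experimental trials are always valid.

```python
def update_reliability(reliability, cue_was_valid, learning_rate=0.1):
    """Move the reliability estimate towards the observed outcome (delta rule)."""
    outcome = 1.0 if cue_was_valid else 0.0
    return reliability + learning_rate * (outcome - reliability)

def run_block(trials, start=1.0):
    """Apply the update over a sequence of trials and return the final estimate."""
    reliability = start
    for cue_was_valid in trials:
        reliability = update_reliability(reliability, cue_was_valid)
    return reliability

# Hypothetical trial sequences (True = gender marking was target-like).
# Target group: experimental and filler trials are all target-like.
target_group = [True] * 20
# Error group: valid experimental trials alternate with non-target fillers.
error_group = [True, False] * 10

target_reliability = run_block(target_group)
error_reliability = run_block(error_group)
```

Under these assumptions, the estimate stays at its starting value for the target sequence and falls for the sequence containing non-target fillers, mirroring the qualitative pattern described for the two groups in Experiment 2.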
Nevertheless, one may wonder whether the provision of non-target input to native speakers is a suitable means of emulating unstable and non-target gender in non-native learners. We submit that there are many parallels. First, L2 speakers’ own output of gender in production, which then becomes (part of) the input, is often non-target-like. Second, adult L2 learners, especially instructed learners, are often exposed to non-target gender from their L2 peers and/or from non-native instructors. In these respects, the input conditions of the Error group in Experiment 2 bear some resemblance to L2 learner input. Importantly, such inconsistent input forms the basis (1) from which the L2 parser needs to extract grammatical regularities of the target language and (2) according to which the L2 parser computes form-based and rule-based processing strategies (for discussion, see Omaki and Lidz, 2015; Phillips and Ehrenhofer, 2015). We argue that the present findings show that the conditions under which the parser extracts information for acquisition from the input merit serious consideration.
According to the lexical gender learning hypothesis by Grüter et al. (2012) and Hopp (2013), these input conditions differ systematically between children and adults, as outlined in Section II. As a consequence, gender representations are less stable in that adults create weaker links between nouns and gender nodes (e.g. Gollan et al., 2008), which leads to gender assignment being partially erroneous or variable, such that gender will not be used as a predictive cue in agreement processing. This account captures the data on lexical and syntactic gender in Grüter et al. (2012) and Hopp (2013), the findings on learning grammatical gender in Experiment 1 for L2 learners, and the results on the un-learning of gender as a predictive cue in mature native speakers in Experiment 2. At the same time, the hypothesis does not posit any fundamental differences in the neurocognitive architecture between native and non-native processing and can account for variation in predictive gender processing in both native and non-native speakers.
In addition, the results have methodological and theoretical implications. First, the failure to obtain target (or native-like) behavioural or neurophysiological responses to gender agreement violations does not necessarily point to a lack of, or an impairment in, morphosyntactic representations (e.g. Hawkins and Casillas, 2008; Hawkins and Chan, 1997) or morphosyntactic processing routines (e.g. Jiang, 2007; Ullman, 2005). The present experiments show that short-term training can lead to the native-like online integration of gender representations in L2 learners in predictive processing. Conversely, minor changes in the statistics of the input can lead to L2-like insensitivity to gender marking in native processing. In conjunction, these findings strongly argue against fundamental neurocognitive differences between native and adult non-native speakers. In this respect, testing for adaptivity in the online integration of inflection seems to be a fruitful strategy for probing whether native–non-native differences are gradual and hence easily manipulable through training or instruction, or whether they remain resistant to changes in the input, indicating that they demarcate qualitative differences in the architecture and the processes in L1 and L2 sentence comprehension.
Second, the present findings also suggest that non-target performance in inflectional morphology does not reduce to learners failing to map inflectional forms onto syntactic features in online processing (e.g. Hopp, 2010; Prévost and White, 2000). Even for correct form assignment of gender, the L2 parser does not appear to recruit this information in comprehension. While it is true that subjective knowledge of gender assignment at the level of the individual lexical item is a prerequisite for the use of gender in agreement processing by (L2) learners (Lemhöfer et al., 2014), overall knowledge of lexical gender affects the efficacy of gender cues in language processing. For very advanced learners, the subjective knowledge of gender is close to the target system, such that gender can be recruited for predictive use in language processing. For less proficient learners, however, gaps or erroneous assignments in the lexicon will curtail the deployment of gender in agreement processing. Acquiring target gender in the lexicon presents a particular challenge for learners of languages that have a great degree of syncretism and that conflate case, gender, definiteness and number marking on determiners, such as German. Given equal levels of proficiency, processing of gender should be comparatively harder in, e.g., German than in Spanish or other languages that are more regular and transparent in gender marking.
In consequence, future research should investigate the extent to which the lexical gender learning account translates to other L1–L2 pairings, and, in particular, how L1 transfer in the lexical realization of gender, i.e. gender congruency (e.g. Bordag et al., 2006), and L1–L2 differences in the morphosyntactic realization of gender (e.g. Foucart and Frenck-Mestre, 2011), interact with predictive processing of gender agreement. Experiments along these lines are under way at present. In addition, it will be important to study how the account generalizes to other aspects of L2 morphosyntactic processing beyond gender. In any case, we believe that the current set of experiments highlights the need for research on L2 inflectional variability to consider how input and instruction interact with learning and processing mechanisms in adult L2 acquisition in order to understand why some aspects of the target language remain persistently hard for L2 learners.
In sum, two experiments about (un-)learning grammatical gender in German showed that variability in lexical gender assignment moderates predictive gender agreement processing. These results point to the interaction of lexical and syntactic features in L2 acquisition and L2 processing, and they underline that the nature of both lexical and syntactic representations should inform future comprehensive approaches to inflectional variability in L2 processing.
Appendix
Experimental items in Experiment 1.
| Targets: Masculine | Targets: Feminine | Targets: Neuter | Fillers |
|---|---|---|---|
| Käse ‘cheese’ | Rose ‘rose’ | Fenster ‘window’ | Glas ‘glass’ |
| Zahn ‘tooth’ | Karte ‘card’ | Messer ‘knife’ | Auge ‘eye’ |
| Mantel ‘coat’ | Karotte ‘carrot’ | Radio ‘radio’ | Trompete ‘trumpet’ |
| Hut ‘hat’ | Flasche ‘bottle’ | Glas ‘glass’ | Kissen ‘pillow’ |
| Apfel ‘apple’ | Gabel ‘fork’ | Schiff/Boot ‘ship/boat’ | Nase ‘nose’ |
| Salat ‘salad’ | Axt ‘axe’ | Haus ‘house’ | Brille ‘spectacles’ |
| Baum ‘tree’ | Lampe ‘lamp’ | Blatt ‘leaf’ | Ohr ‘ear’ |
| Schrank ‘cupboard’ | Banane ‘banana’ | Sofa ‘sofa’ | Buch ‘book’ |
| Schirm ‘umbrella’ | Wurst ‘sausage’ | Brot ‘bread’ | Tisch ‘table’ |
| Fernseher ‘television’ | Hose ‘trousers’ | Klavier ‘piano’ | Stift ‘pen’ |
| Stuhl ‘chair’ | Blume ‘flower’ | Ei ‘egg’ | Elefant ‘elephant’ |
| Schuh ‘shoe’ | Uhr ‘watch’ | Flugzeug ‘plane’ | Teufel ‘devil’ |
| Löffel ‘spoon’ | Tür ‘door’ | Fahrrad ‘bicycle’ | Tiger ‘tiger’ |
| Pullover ‘pullover’ | Tasse ‘cup’ | Hemd ‘shirt’ | Engel ‘angel’ |
| Schlüssel ‘key’ | Tablette ‘pill’ | Bett ‘bed’ | Junge ‘boy’ |
| Fuß ‘foot’ | Straße ‘street’ | Kreuz ‘cross’ | Hand ‘hand’ |
| Zug ‘train’ | Treppe ‘staircase’ | Telefon ‘telephone’ | Tafel ‘blackboard’ |
| Hammer ‘hammer’ | Pizza ‘pizza’ | Zelt ‘tent’ | Bus ‘bus’ |
| Computer ‘computer’ | Kirche ‘church’ | Bier ‘beer’ | Teller ‘plate’ |
| Koffer ‘suitcase’ | Ampel ‘traffic lights’ | Bild ‘picture’ | Ball ‘ball’ |
Acknowledgements
I would like to thank Carmen Lukoschek, Marlene Schulz and Nico Lindheimer for assistance with stimulus preparation and data collection for Experiment 2. I also thank members of the CLS colloquium at Penn State University, USA, the audiences at the International Conference on Multilingualism at McGill (2013) and at BUCLD 40, and Theres Grüter for helpful discussion of different aspects of the matters discussed in the article. Any remaining errors or inaccuracies are solely my own.
Declaration of conflicting interest
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
