Investigating auditory processing of syntactic gaps with L2 speakers using pupillometry

Abstract

According to the Shallow Structure Hypothesis (SSH), second language (L2) speakers, unlike native speakers, build shallow syntactic representations during sentence processing. To test the SSH, this study investigated the processing of syntactic movement in native speakers of English and in proficient late L2 speakers of English, using pupillometry to measure processing cost. Of particular interest were constructions in which movement resulted in an intermediate gap between clauses. Pupil diameter was recorded during auditory presentation of complex syntactic constructions. Two factors were manipulated: syntactic movement (present in some conditions and absent in others) and syntactic movement type (producing an intermediate gap or not). Grammaticality judgments revealed no differences between the two groups, suggesting that both were capable of comprehending these constructions. Pupil change slope measurements revealed a potential sensitivity to intermediate gaps only for native speakers; however, both native and late L2 speakers showed similar facilitation during processing of the second gap site. Acoustic analysis revealed potential acoustic cues that may have facilitated the processing of these constructions. These results suggest that, contrary to the predictions of the SSH, late L2 speakers are capable of constructing rich syntactic representations during the processing of intermediate gap constructions in spoken language.

Keywords

filler gap dependency, intermediate gap, L2 sentence processing, pupillometry, shallow structure hypothesis

I Introduction

Typical (canonical) English word order is subject–verb–object, with the first noun phrase (NP) in a construction being the subject of the verb and the second NP being the object. For example, in (1) below, the subject John occurs before the verb bought, which is followed by the object cookies.

(1) John bought cookies yesterday after school.

However, word order is not always enough to establish thematic roles (i.e. who is doing what to whom). For example, a sentence like (1) can be turned into a question by replacing the object with what, as in (2), or by moving what to the front of the question, as in (3).

(2) John bought what yesterday after school?

(3) What_i did John buy t_i yesterday after school?

Constructions like (3) are known as filler gap dependencies, and it is generally assumed that they are formed via movement operations (Chomsky, 1965): the filler (what_i) has moved from its position after the verb, leaving a gap with a trace (t_i) at the movement site. The trace is phonologically silent but syntactically relevant, as it is through the trace that grammatical properties are assigned to the filler what. The term filler gap dependency reflects the fact that correct interpretation of the filler's role depends on the gap site. Some theorists argue for a traceless theory of grammar in which the filler is reactivated upon reaching the subcategorizing verb; Pickering and Barry (1991) call this the Direct Association Hypothesis (DAH). Nevertheless, there is a growing body of evidence supporting a grammar that contains gaps (traces) in filler gap dependencies (e.g. Clahsen and Featherston, 1999; Hestvik et al., 2007; Lee, 2004).

Most research into the processing of filler gap dependencies has focused on the speaker’s native language (L1). However, recent studies have also considered how second language learners process these types of dependencies in their second language (L2). Given their complexity, filler gap dependencies are an ideal construction for testing theories of L2 language processing: by investigating the language processing of L2 learners, we can determine whether they are able to acquire the same complex linguistic skills as native speakers. In the current study, we examine how late proficient L2 learners of English and native speakers process a type of filler gap dependency that contains an intermediate gap. We use a novel psychophysiological measure, pupillometry, to test a theory of L2 processing known as the Shallow Structure Hypothesis. Previous research testing this hypothesis has relied on self-paced reading of written stimuli; in contrast, the current study investigates spoken language processing using pupillometry in order to gain new insight into L2 processing of spoken language. Before the empirical study is presented, we first outline the Shallow Structure Hypothesis (SSH). This is followed by a discussion of research comparing native and L2 processing of filler gap dependencies, focusing particularly on intermediate gap constructions. Finally, the role that the auditory modality may play during processing of filler gap dependencies will be briefly discussed.

1 Shallow Structure Hypothesis

In an attempt to form an empirically based model of L2 language processing, Clahsen and Felser (2006) reviewed a series of studies comparing adult native speakers, child native speakers, and L2 language learners. In terms of syntax, Clahsen and Felser suggested that L2 speakers rely more on pragmatic and lexical-semantic information for sentence processing, whereas native speakers rely on syntactic information. While L2 comprehension (in offline studies) appeared native-like, online processing diverged from that of native speakers, and these differences could not be explained by working memory restrictions, influence from the native language, incomplete acquisition of the L2 grammar, or slower processing speed.

In light of these observations, Clahsen and Felser (2006) put forward the Shallow Structure Hypothesis (SSH) to explain the grammatical processing of L2 learners. According to the SSH, unlike native speakers, L2 learners do not compute full syntactic representations during comprehension, but rather construct shallow syntactic representations. This in turn leads to greater reliance on non-syntactic information, such as lexical, pragmatic, and real-world knowledge, during comprehension in L2 learners’ non-native language. Many of the studies investigating the SSH (including the current study) have focused on a type of filler gap dependency that contains an intermediate gap.

2 Intermediate gaps

As noted earlier, a gap in a filler gap dependency involves an element moving from its canonical position and leaving a trace of itself; see (3) above. An intermediate gap is involved if the dependency spans across more than one clause. For example, in (4) the dependency of who_i spans across the clause (the consultant claimed) and hence an intermediate gap t’_i occurs at the boundary between this clause and the next (the new proposal had pleased (the manager)).

(4) The manager who_i the consultant claimed t’_i that the new proposal had pleased t_i will hire five workers tomorrow.

The longer the distance between a filler and its gap, the more resources are needed to process the filler gap dependency. When an intermediate gap is present, however, it breaks the filler gap dependency into two shorter dependencies (who_i to t’_i, and t’_i to t_i). This aids comprehension and reduces the load on working memory, thus facilitating the later integration of the filler with the gap (t_i). Gibson and Warren (1999) tested filler gap dependencies with an intermediate gap site (4) and compared them to sentences without intermediate gaps, like that shown in (5). In (4) the wh-element is extracted across a verb phrase (VP) (the consultant claimed), signaling a new cyclic domain (an intermediate gap is present), while in (5) it is extracted across a noun phrase (NP) (the consultant’s claim) and no intermediate gap is present. Additionally, the authors included baseline conditions without movement, (6) and (7).

(5) The manager who_i the consultant’s claim about the new proposal had pleased t_i will hire five workers tomorrow.

(6) The consultant claimed that the new proposal had pleased the manager who will hire five workers tomorrow.

(7) The consultant’s claim about the new proposal had pleased the manager who will hire five workers tomorrow.

Using a self-paced reading task, Gibson and Warren found that intermediate gap sites did indeed facilitate the integration of filler and gap at the second gap site (following pleased), as shown by shorter total reading times for this segment (had pleased in 4 compared to 5). At the intermediate gap site (e.g. claimed in 4) they found longer reading times than at the corresponding segment in the non-movement condition (6), suggesting that participants did reactivate the filler at this site.

Marinis et al. (2005) investigated how constructions with intermediate gap sites were processed by L2 learners of English, testing native speakers of Greek and German (languages that employ wh-movement) as well as native speakers of Chinese and Japanese (languages that do not employ wh-movement). Marinis et al. hypothesized that if L2 learners establish only shallow representations of syntactic structures (as proposed by Clahsen and Felser, 2006), they would not make use of intermediate gap sites. The authors used the same movement stimuli as Gibson and Warren (1999) but revised the non-movement stimuli so that the non-movement conditions had the same number of words between sentence onset and the embedded verb as the movement conditions, making comparisons easier, as seen in (8)–(11).

(8) Movement, verb phrase (filler gap, intermediate gap): The nurse who_i the doctor argued t’_i that the rude patient had angered t_i is refusing to work late.

(9) Movement, noun phrase (filler gap, no intermediate gap): The nurse who_i the doctor’s argument about the rude patient had angered t_i is refusing to work late.

(10) Non-movement, verb phrase: The nurse thought the doctor argued that the rude patient had angered the staff in the hospital.

(11) Non-movement, noun phrase: The nurse thought the doctor’s argument about the rude patient had angered the staff at the hospital.

Marinis et al. recorded reading times and comprehension accuracy using a non-cumulative moving-window procedure (Just et al., 1982). L1 and L2 speakers of English showed similarly high comprehension accuracy (no differences were found between any of the L2 groups and the L1 group). All groups also showed slower reading times at the gap in the movement conditions (following angered in 8 and 9), suggesting that the filler was being associated with the subcategorizing verb. However, while native speakers read more slowly at the intermediate gap site in the VP movement condition (following argued in 8) than at the corresponding segment in the non-movement condition (following argued in 10), the L2 speakers showed no such slowing. Importantly, at the final gap site (following angered in 8 and 9) the L1 group showed an interaction between phrase type and movement: the VP movement condition (which contains an intermediate gap) elicited faster reading times than the NP movement condition (which contains a gap but no intermediate gap), whereas the two non-movement conditions (with no filler gap dependencies) did not differ from each other and both elicited faster reading times than the movement conditions. The authors argued that the native speakers’ slower reading at the intermediate gap site in VP movement sentences was evidence of filler reactivation, which in turn led to faster reading at the subsequent gap site. The L2 speakers did show evidence of associating the filler with the subcategorizing verb, in the form of longer reading times at the final gap site for movement compared to non-movement conditions. However, they showed no reading time differences at the final gap site between the VP and NP movement conditions, and no interaction between phrase type and movement. This suggests that the intermediate gap in the VP movement conditions did not help the L2 learners integrate the filler with the gap (e.g. after angered in 8).

Overall, these results led Marinis et al. to argue that L2 learners, even those whose native language contains similar syntax, employ a lexically driven gap filling strategy (in which the filler is linked to the lexical subcategorizer). They do not employ a syntactically driven strategy in which the filler is linked to the gap and indirectly linked to the subcategorizer. L2 learners appeared not to be forming filler gap dependencies like native speakers. Rather, they seemed to be using a direct association in which they form a dependency between the filler and lexical subcategorizer (i.e. DAH, Pickering and Barry, 1991), in line with the SSH.

However, Dekydtspotter et al. (2006) raised concerns about the SSH, arguing that delayed or slowed processing in L2 speakers may make comparisons between native and L2 language processing problematic: L2 research focusing on critical segments may be capturing different moments in processing for L1 and L2 speakers. Dekydtspotter et al. (2006) re-analysed the data from the Marinis et al. (2005) study, examining the segment following the intermediate gap (that the rude patient in example 8) to see whether there were any delayed effects in the L2 groups. In this segment, they found slower reading times in the VP (intermediate gap) movement condition for the Japanese and German native speakers, but not for the Greek and Chinese native speakers. This suggests that at least the Japanese and German groups were reactivating the filler at the intermediate gap site as native speakers do, but that the reactivation was slightly delayed. Interestingly, Japanese does not have wh-movement (and hence no intermediate gaps), whereas German does, suggesting that the ability to use intermediate gaps is not dictated by the speaker’s native language.

Dekydtspotter et al. (2006) therefore argued that the SSH cannot explain the results of the Marinis et al. (2005) study, given that at least some of the L2 learners appear to compute intermediate gap sites, albeit with a delay. They also argued that the lack of evidence for delayed facilitation in the Chinese and Greek speakers did not by itself warrant the conclusion that these groups process shallowly. This example highlights that observed differences in reading times at a critical segment may not in fact reflect disparate processing systems in native speakers and L2 learners. It also highlights the need for further well-controlled studies exploring the underlying comprehension systems of L2 speakers.

More recently, Pliatsikas and Marinis (2013) investigated the role that previous naturalistic language exposure plays in L2 syntactic processing, using the intermediate gap stimuli from Marinis et al. (2005). A group of native English speakers and two groups of Greek L2 learners of English took part (one group had only classroom exposure, and the other had approximately 9 years of naturalistic English exposure; both groups had the same level of English proficiency). Only the learners who had acquired the L2 in a naturalistic setting showed evidence of making use of intermediate gaps, although their processing of the intermediate gap seemed to be delayed (as evidenced by increased reading times following the intermediate gap). Nevertheless, the intermediate gap ultimately facilitated the reintegration of the filler at the subcategorizing verb (as evidenced by shorter reading times at the final gap). These findings suggest that, with adequate naturalistic exposure, L2 learners of English are capable of native-like syntactic processing, contrary to the SSH.

To summarize, previous research has investigated the processing of filler gap dependencies with intermediate gap sites in native speakers and second language learners. The SSH is an influential theory which assumes that L2 learners build shallow representations, that is, they do not construct full syntactic representations during comprehension. This, in turn, causes an over-reliance on non-syntactic information (unlike native speakers). While some data support the SSH with respect to intermediate gap processing, several studies do not (e.g. Dekydtspotter et al., 2006; Pliatsikas and Marinis, 2013). Given these contradictions, it is important to test the SSH further by investigating the processing of intermediate gaps, in order to form a more accurate and comprehensive theory of second language acquisition and processing. Additionally, previous studies have used written language to test the SSH, and general claims about processing differences between native and non-native speakers should not rest solely on research into the processing of written language.

II Study aims

In the current study, pupillometry is employed to investigate the processing of intermediate gap constructions by native and L2 speakers of English, using the same stimuli as Marinis et al. (2005) but with auditory sentence presentation. The aim is to test the extent to which the SSH can be extended to the processing of spoken language, and thus to provide further insight into second language acquisition and processing. The previously outlined, and contradictory, research on L2 processing of intermediate gaps has all employed a self-paced reading paradigm. The current study instead presents a continuous speech signal while measuring pupil change, which is closer to ‘natural’ linguistic conditions than self-paced reading, with the aim of extending the test of the SSH to the modality of spoken language.

Pupillometry is a psychophysiological measure that involves recording change in pupil diameter in response to a stimulus. Change in pupil diameter is believed to reflect the resources expended during mental activity, with larger pupil size reflecting greater use of cognitive resources (e.g. Hess and Polt, 1960; Kahneman and Beatty, 1966). Pupillometry has been shown to be an effective method for measuring cognitive processing load in a variety of linguistic tasks, from simple word recall (e.g. Kahneman, 1973; Kahneman and Wright, 1971) to sentence processing (e.g. Fernandez, 2013; Just and Carpenter, 1993). Both Fernandez (2013) and Just and Carpenter (1993) showed that pupil size indexed the processing difficulty associated with forming filler gap dependencies. Here we extend this methodology to intermediate gap constructions, where it provides a psychophysiological measure of the difficulty associated with the filler gap dependencies in these constructions. Additionally, we combine pupillometry with aurally presented stimuli, which is more natural than self-paced reading (as far as we are aware, this is the only study of intermediate gaps to have used aurally presented stimuli). This provides a more natural and temporally sensitive measure of the phenomena in question, and therefore new insight into L2 processing that was not available in previous research.

We hypothesize that there will be an increased slope of pupil change at the intermediate gap site in the VP movement condition (following argued in Example 8: The nurse who_i the doctor argued t’_i that the rude patient had angered t_i is refusing to work late) compared to the corresponding segments in the two non-movement conditions and the NP movement condition. If participants are sensitive to the intermediate gap site, a steeper slope of pupil change is predicted at this site for the VP movement condition, as reactivation of the filler at the intermediate gap should increase processing load. This is the site in the VP movement condition where the dependency is broken up, and the pupil should therefore dilate, reflecting the increased processing load caused by reactivating the filler. No such reactivation is expected in the other conditions, given that there is no intermediate gap in the NP movement condition and there are no gaps in the non-movement conditions. If, as predicted by the SSH, L2 speakers are not sensitive to the intermediate gap, no differences should be seen between the NP and VP movement conditions for this group.

At the second critical segment (following angered), we hypothesize a larger decrease in the slope of pupil change for the movement conditions (which contain a filler gap dependency) than for the non-movement conditions. This is because processing load should be reduced once the parser associates the filler with the gap: the filler no longer has to be held in working memory (Fernandez, 2013; Just and Carpenter, 1993), so less pupil dilation is expected. Importantly, we hypothesize an increased slope of pupil change at this segment for the NP movement condition compared to the VP movement condition, given that, as stated above, the intermediate gap site in the VP condition should facilitate subsequent gap processing, leading to less dilation of the pupil. If the L2 group is not sensitive to the intermediate gap, as predicted by the SSH, there should be no differences between the two movement conditions.
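The hypotheses above are stated in terms of the slope of pupil change within a critical segment. The exact slope computation is not specified in this section, so the following is only a minimal Python sketch under an assumption: the slope is taken as an ordinary least-squares fit of pupil diameter against time over the segment window, with samples equally spaced at the 120 Hz rate of the Tobii T120 described under Apparatus. The helper `segment_slope` and the synthetic trace are hypothetical and do not come from the study.

```python
# A minimal sketch, NOT the study's documented pipeline: the slope of pupil
# change within one critical segment is estimated here by an ordinary
# least-squares fit of diameter on time (an assumption). Samples are assumed
# to be equally spaced at 120 Hz (the Tobii T120 sampling rate).

SAMPLE_RATE_HZ = 120


def segment_slope(pupil, onset_ms, offset_ms, rate_hz=SAMPLE_RATE_HZ):
    """Return the OLS slope (diameter units per second) of a pupil trace
    over the window [onset_ms, offset_ms), with pupil[0] at time 0 ms.

    `segment_slope` is a hypothetical helper written for illustration.
    """
    start = int(onset_ms * rate_hz / 1000)  # first sample in the window
    stop = int(offset_ms * rate_hz / 1000)  # one past the last sample
    ys = pupil[start:stop]
    if len(ys) < 2:
        raise ValueError("window must contain at least two samples")
    ts = [(start + i) / rate_hz for i in range(len(ys))]  # time in seconds
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    return cov / var


if __name__ == "__main__":
    # Synthetic 2 s trace (hypothetical values): flat for 1 s, then
    # dilating at 0.2 mm/s, as might follow filler reactivation.
    trace = [3.0 if i < 120 else 3.0 + 0.2 * (i - 120) / 120 for i in range(240)]
    print(segment_slope(trace, 0, 1000))     # flat part: 0.0
    print(segment_slope(trace, 1000, 2000))  # dilating part: about 0.2
```

Under this sketch, a steeper positive slope in the VP movement condition than in the other conditions for the segment following argued would correspond to the predicted reactivation effect. A real analysis would likely also need blink handling and baseline correction, which are omitted here.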

III Methods

1 Participants

a L2 Speakers

Thirty students recruited from the University of Potsdam in Germany participated for payment or as part of their undergraduate course requirements. All participants were native speakers of German and had normal or corrected-to-normal vision. All participants took the Oxford Online Placement Test to measure their English proficiency, and only those at a C1 or C2 level on the Common European Framework of Reference for Languages were included in the experiment (individuals who score at these levels are considered proficient users; Council of Europe, 2001). Of the 30 participants, eight did not meet the proficiency criterion; therefore 22 participants (17 female) were included in the analysis.

L2 speakers were selected to be late learners of English with limited naturalistic exposure to English, in order to test those L2 speakers who are most likely to make use of shallow processing (Clahsen and Felser, 2006; Pliatsikas and Marinis, 2013). The SSH predicts that while late L2 learners may exhibit high proficiency, they do not postulate syntactically defined gaps in the same way as native speakers (and early L2 learners) do. In the current study, participants were late learners of English with an average of 9 years of English education (SD 4.8 years). Only one of the 22 participants had spent significant time in an English-speaking country, for 10 months as an adult. This was the only participant with even limited adult naturalistic exposure to English; all other participants had no naturalistic English exposure.

b L1 Speakers

Fourteen participants (8 female) recruited from the University of Potsdam and the Berlin area participated for payment. All participants were native speakers of English and had normal or corrected-to-normal vision. Participants were asked to estimate the percentage of time they currently used English in their daily lives; the average was 87.14% (SD 14.19%). Ten of the participants used English 90% of the time or more, and no participant reported using English less than 60% of the time, suggesting that, despite living in Germany, most of the L1 participants did not use German frequently.

2 Stimuli

There were four sentence types, as seen in Table 1. The task consisted of 50 trials: four practice trials, 20 critical trials (five of each sentence type), and 26 fillers (eight grammatical and 18 ungrammatical). The sentences were recorded by a female native speaker of American English in the phonetics lab at the University of Potsdam at a sampling rate of 44,100 Hz and saved as .wav files. The recordings were randomly assigned to four versions and rotated in a Latin square design; the practice trials were the same across the four versions. Critical items were counterbalanced across participants so that each participant was presented with one condition of each item, and the order of the critical items was randomized across versions.
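The counterbalancing described above can be sketched as a simple Latin square rotation. The Python sketch below assumes that the condition for item i in version v is chosen as (i + v) mod 4, which is one standard way to build such a rotation and not necessarily the exact assignment used in the study; the condition labels, the function `build_version` and the seeding scheme are hypothetical, and practice and filler trials are left out.

```python
import random

# Hypothetical labels for the four critical conditions in Table 1.
CONDITIONS = ("VP movement", "NP movement", "VP non-movement", "NP non-movement")
N_ITEMS = 20    # critical items
N_VERSIONS = 4  # presentation versions (lists)


def build_version(version, seed=None):
    """Return (item, condition) pairs for one version, in random order.

    Conditions are rotated across items (Latin square), so each version
    contains every item exactly once and each condition N_ITEMS // 4 times,
    and across versions each item appears in every condition once.
    Practice and filler trials are not modelled here.
    """
    pairs = [(item, CONDITIONS[(item + version) % len(CONDITIONS)])
             for item in range(N_ITEMS)]
    random.Random(seed).shuffle(pairs)  # randomize critical item order
    return pairs


if __name__ == "__main__":
    for v in range(N_VERSIONS):
        print(v, build_version(v, seed=v)[:3])  # first three trials per version
```

With 20 items and four versions, each version in this sketch contains five items per condition, matching the five trials of each sentence type reported above, and no participant hears the same item in more than one condition.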

Table 1.

Example sentences for each of the four experimental conditions.

	Verb phrase	Noun phrase
Movement	The nurse who_i the doctor argued t’_i that the rude patient had angered t_i is refusing to work late.	The nurse who_i the doctor’s argument about the rude patient had angered t_i is refusing to work late.
Non-movement	The nurse thought the doctor argued that the rude patient had angered the staff in the hospital.	The nurse thought the doctor’s argument about the rude patient had angered the staff at the hospital.

The critical items were identical to those used by Marinis et al. (2005). The four critical conditions contrasted movement and phrase type (see Table 1). In the movement conditions, the first NP (the nurse) was followed by a relative clause beginning with a wh-pronoun, which was the object of the verb angered. In the VP movement condition there was an intermediate bridge verb (e.g. argued) between the filler (who_i) and the gap site (t_i); this verb type allows wh-movement, thus forming an intermediate gap (t’_i). This was not the case in the NP movement condition, so there was no intermediate gap site between who and the gap site following angered. The non-movement conditions were formed from the movement items by removing the filler gap dependency and adding an additional level of embedding, to avoid differences in structural complexity between the movement and non-movement conditions. Because the distance between the filler and the gap differed across the two movement conditions, the non-movement conditions mirrored this difference, allowing the confound to be controlled for in the analysis.

3 Acoustic properties of the stimuli

An inherent question concerning filler gap dependencies is how the parser identifies these phonologically silent gaps. Research has found that prosody, for example intonational phrase boundaries, plays an important role in the processing of syntactic ambiguity (e.g. Engelhardt et al., 2010; Ferreira, 1993; Nagel et al., 1994; Watson and Gibson, 2004). Nagel et al. (1994) suggest that during auditory processing, prosodic information aids in identifying the location of the gap. Prosody refers to the metrical (stress and duration) and intonational (pitch and fundamental frequency) components of speech (Ferreira, 1993). Nagel et al. compared spoken filler gap dependency constructions such as Which doctor_i did the supervisor call t_i to get help for his youngest daughter? with constructions without a gap after call, such as Which doctor_i did the supervisor call to get help for t_i during the crisis? This allowed the authors to compare the acoustic properties of the same verb, call, either followed or not followed by a gap. When the verb was immediately followed by a gap, they found an increase in both verb and pause duration compared to the no-gap condition. Nagel et al. argued that gaps may not be entirely phonologically silent, and that prosody plays an important role during gap identification.

Prosodic cues available in the auditory modality (but absent from the previous self-paced reading studies) may influence the processing of the items in this study. Such cues could signal gap locations to L2 speakers in a way that is not possible in the written modality, and thereby aid the processing of filler gap dependencies. While the prosodic properties of the items were not actively manipulated in the current study, a post hoc acoustic analysis of the sentence materials was performed to investigate whether there were prosodic differences between the movement and non-movement conditions that may have provided cues to the gap sites.

Previous studies have shown that intonational phrase boundaries aid syntactic processing, and that the durations of the verb and the following pause in a filler gap dependency may aid gap identification. Therefore, four regions of interest were compared between the VP movement and VP non-movement conditions: the duration of the verb argued in (12) and (13) below; the pause following argued, that is, at the intermediate gap in the movement condition (t’_i in 12) and at the corresponding point in the non-movement condition (# in 13); the duration of the verb angered; and the pause following angered (t_i in 12 and ## in 13). Additionally, a fifth region, the pause following the actual gap site, was compared between the two movement conditions (t_i in 12 and 14).

(12) Movement, VP (filler gap, intermediate gap): The nurse who_i the doctor argued t’_i that the rude patient had angered t_i is refusing to work late.

(13) Non-movement, VP: The nurse thought the doctor argued # that the rude patient had angered ## the staff in the hospital.

(14) Movement, NP (filler gap, no intermediate gap): The nurse who_i the doctor’s argument about the rude patient had angered t_i is refusing to work late.

The first region was chosen to investigate whether the length of the verb before the intermediate gap differed in a way that could signal the upcoming gap. The second region, the pause at the site of the intermediate gap, may provide a cue to the listener relative to the non-movement VP condition. The third and fourth regions were chosen to investigate whether the duration of the verb angered and of the pause following it differed significantly, which may have provided additional cues to the gap in the movement condition. The fifth region was chosen to test whether the duration of the pause at the final gap site differed significantly between the two conditions containing a filler gap dependency, which may also have provided additional cues. The five regions of interest were identified in each item in Praat (Boersma and Weenink, 2016) and their durations were measured. Durations were submitted to paired-samples t-tests in R (R Core Team, 2012).

Comparing the regions of interest in the two VP conditions tested whether there were acoustic differences between the movement and non-movement conditions that may provide cues to the listener during processing. There was no significant difference in the duration of the first verb (argued) between the non-movement condition (M: 498 ms, SD: 166.84 ms) and the movement VP condition (M: 546 ms, SD: 129.79 ms; t(19) = 1.578, p = 0.13). There was, however, a significant difference in duration between the site of the intermediate gap and the corresponding segment in the VP non-movement condition (t(19) = 2.45, p < 0.05), with the movement VP condition having a longer duration (M: 172 ms, SD: 89.62 ms) than the VP non-movement condition (M: 121 ms, SD: 63.60 ms). The duration of the second verb (angered) also differed significantly between the two conditions (t(19) = 10.89, p < 0.001), with the verb in the VP movement condition being longer (M: 558 ms, SD: 92.91 ms) than in the non-movement condition (M: 404 ms, SD: 87.77 ms). Finally, the pause following angered differed significantly (t(19) = 10.43, p < 0.001), with the VP movement condition (M: 265 ms, SD: 102.01 ms) having a longer pause than the non-movement condition (M: 25 ms, SD: 23.14 ms).

Comparing the pause following angered in the VP movement and NP movement conditions tested whether pause duration differed between the filler gap dependency condition with an intermediate gap (VP movement condition) and the condition without an intermediate gap (NP movement condition). There was a significant difference: the VP movement condition (M: 265 ms, SD: 102.01 ms) had a longer pause than the NP movement condition (M: 211 ms, SD: 98.66 ms; t(19) = −2.25, p < 0.05).

4 Apparatus

Stimulus presentation was programmed using Tobii Studio software. Pupil diameter was recorded with a Tobii T120, sampling at 120 Hz. Tracking was binocular but only the right eye was used for analysis. The eye tracker was built into a 17-inch thin film transistor liquid crystal color display monitor (1,280 × 1,024 pixel resolution). Participants sat approximately 600–650 mm away from the display and tracking was remote. Eyes were calibrated using a 9-point sequence.

5 Procedure

The participants were informed that a fixation point would appear on the screen for 2000 ms (to allow the pupil to adjust to screen luminance), after which the fixation point would turn into a fixation cross and a recorded sentence would be played; the fixation cross remained on the screen for the entire sentence and for an additional 2000 ms after recording offset. Participants were asked to focus on the fixation cross while it was on the screen, to attend to the auditorily presented sentences, and to try to avoid blinking during sentence presentation. After the fixation cross disappeared, a new screen appeared with a scale instructing participants to rate the grammatical acceptability of the sentence they had heard from 1 to 7 (1 = least grammatical, 7 = most grammatical). Ratings were entered by pressing the corresponding number key on the keyboard provided.

6 Analysis

The design was 2 × 2 × 2 (movement × phrase type × language). Movement refers to whether the item contained a moved constituent. Phrase type refers to whether an NP or a VP intervened between the filler and the gap (or the corresponding segment in the non-movement conditions). The VP movement condition was the one that allowed for an intermediate gap. Movement and phrase type were manipulated within subjects. Language refers to whether the participant was a native speaker of English or an L2 speaker of English (native speaker of German).

Blinks were filtered out by replacing the missing values with linear interpolation, using 50 ms before and after the missing values (e.g. Jackson and Sirois, 2009; Lemercier et al., 2014). Data were baseline-corrected using the 200 ms preceding each region of interest for each item. We used this local baseline (as opposed to the period before the presentation of the sentence) following the procedure of Just and Carpenter (1993). This baseline also captures more local effects of pupil change and eliminates residual pupillary differences, as well as potential differences between participants. Because the pupil takes 1.2 seconds on average to reach its maximum diameter (Just and Carpenter, 1993), pupil change was analysed across a 1.2 second window at two critical segments investigated in previous research (e.g. Marinis et al., 2005), as explained above: (1) from the onset of the word following the first verb in the VP conditions (that) and the corresponding word in the NP conditions (about), and (2) from the onset of the second verb (had angered) in all four conditions. Although it has been debated whether L2 learners process more slowly, given the relatively long time window analysed we anticipate that any processing time differences between the two groups fall within this 1.2 second window; we therefore do not analyse the following segment (see Dekydtspotter et al., 2006). The graphs below display smoothed pupil slopes for visual presentation only; the analyses were run on unsmoothed data, in line with 14 of the 22 studies reviewed by Lemercier et al. (2014), which did not smooth their data.
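The blink interpolation and local baseline correction described above can be sketched in Python as follows. This is not the authors' pipeline: it assumes the trace is a list of pupil samples at the 120 Hz rate of the Tobii T120, with blinks coded as `None`; it reads "using 50 ms before and after the missing values" as widening each missing run by 50 ms on both sides before bridging it linearly; and it assumes subtractive baseline correction. All function names are hypothetical.

```python
FS_HZ = 120  # Tobii T120 sampling rate reported in the Apparatus section


def ms_to_samples(ms, fs=FS_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * fs / 1000))


def interpolate_blinks(pupil, pad_ms=50, fs=FS_HZ):
    """Fill blink gaps (None) by linear interpolation.

    ASSUMPTION: each missing run is widened by pad_ms on both sides, and the
    widened run is bridged linearly between the nearest valid samples.
    """
    n = len(pupil)
    pad = ms_to_samples(pad_ms, fs)
    missing = [v is None for v in pupil]
    widened = list(missing)
    for i in range(n):
        if missing[i]:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                widened[j] = True

    out = list(pupil)
    i = 0
    while i < n:
        if not widened[i]:
            i += 1
            continue
        start = i
        while i < n and widened[i]:
            i += 1
        left, right = start - 1, i  # nearest samples outside the widened run
        if left >= 0 and right < n:
            span = right - left
            for k in range(start, i):
                frac = (k - left) / span
                out[k] = pupil[left] + frac * (pupil[right] - pupil[left])
        elif left >= 0 or right < n:
            # Run touches the start or end of the trace: hold the edge value.
            edge = pupil[left] if left >= 0 else pupil[right]
            for k in range(start, i):
                out[k] = edge
        # Otherwise the whole trace is missing and is left as None.
    return out


def baseline_correct(pupil, onset, baseline_ms=200, window_ms=1200, fs=FS_HZ):
    """Return the analysis window starting at sample `onset`, minus the mean
    of the baseline_ms immediately before it (ASSUMPTION: subtractive)."""
    nb = ms_to_samples(baseline_ms, fs)
    nw = ms_to_samples(window_ms, fs)
    if onset < nb or onset + nw > len(pupil):
        raise ValueError("trace too short for baseline plus analysis window")
    base = sum(pupil[onset - nb:onset]) / nb
    return [v - base for v in pupil[onset:onset + nw]]
```

At 120 Hz, the 50 ms padding is 6 samples, the 200 ms baseline 24 samples, and the 1.2 second analysis window 144 samples.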

Previous pupillometry research has used many different measures. Since the current study is the first to investigate processing of auditorily presented intermediate gap constructions, rather than selecting a single measure we employed three measures from previous pupillometry paradigms: pupil change slope, peak latency, and peak amplitude. Before analysis, data were screened for outliers; any data point three or more standard deviations from the mean of its condition was replaced with the mean of that condition. All trials were averaged within each condition, resulting in four vectors per participant (one for each condition). Each vector was submitted to a simple regression with pupil size as the dependent variable and time as the independent variable. The slope of pupil change over time (i.e. the unstandardized regression coefficient) was the main dependent variable in this study; this measure has been shown to reflect the processing effort associated with auditory presentation of garden path sentences (Engelhardt et al., 2010). We also calculated the peak amplitude and peak latency for each vector. Peak amplitude is the largest size the pupil reached during the critical time window and reflects the maximum processing load evoked by the task (e.g. Schmidtke, 2014; Zekveld et al., 2012). Peak latency is the time at which the peak amplitude was reached (e.g. Schmidtke, 2014; Zekveld et al., 2012); latency information may be informative concerning the delayed intermediate gap processing found for L2 speakers by Dekydtspotter et al. (2006).
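The three measures can be sketched for one participant and condition as below. This is a hypothetical Python illustration rather than the authors' code: it assumes baseline-corrected trials of equal length sampled at 120 Hz, applies the outlier rule to whatever set of values is passed in (the text does not specify whether "data point" means samples or trials), uses the sample standard deviation, and expresses time in seconds from window onset, so the slope is in pupil units per second.

```python
from statistics import mean, stdev


def replace_outliers(values, n_sd=3.0):
    """Replace values >= n_sd sample SDs from the condition mean with that mean."""
    m, sd = mean(values), stdev(values)
    if sd == 0:
        return list(values)
    return [m if abs(v - m) >= n_sd * sd else v for v in values]


def average_trials(trials):
    """Sample-wise mean of equal-length trials: one vector per condition."""
    return [mean(samples) for samples in zip(*trials)]


def pupil_measures(vector, fs=120):
    """Return (slope, peak_amplitude, peak_latency_s) for one condition vector.

    slope: unstandardized OLS coefficient of pupil size regressed on time.
    peak amplitude: largest value in the window.
    peak latency: time (s from window onset) at which that value is reached.
    """
    if len(vector) < 2:
        raise ValueError("need at least two samples")
    times = [i / fs for i in range(len(vector))]
    t_mean, y_mean = mean(times), mean(vector)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, vector))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    peak_idx = max(range(len(vector)), key=vector.__getitem__)
    return slope, vector[peak_idx], times[peak_idx]
```

Because `max` returns the first maximal element, ties in amplitude resolve to the earliest latency.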

Grammaticality ratings, pupil change slope (unstandardized regression coefficient), peak amplitude, and peak latency were each analysed with linear mixed-effects models in R (R Core Team, 2012) using lme4 (Bates et al., 2012), with p-values estimated using the lmerTest package (Kuznetsova et al., 2016). Given that we had clear hypotheses, the fixed effects were maximally specified as the full interaction of movement (movement/non-movement), phrase type (NP/VP), and language (native/L2). The random effect of participants was maximally specified, with random slopes for movement, phrase type, and language¹ (Barr et al., 2013). Below we report the t- and p-values of the models; for model estimates and standard errors, see Appendix 1.
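Reading the estimates in Appendix 1 requires knowing how the factors were coded. Assuming treatment (dummy) coding with L2, movement, and NP as reference levels, as the row labels in Appendix 1 suggest (the coding is not stated in the text), each cell's fixed-effects prediction is the sum of the coefficients whose factors are switched on, and lower-order coefficients describe effects at the reference levels of the other factors. The Python sketch below makes this explicit; the term names and helper functions are hypothetical.

```python
from itertools import product

# Hypothetical term names. ASSUMPTION: reference cell = L2, movement, NP,
# inferred from the row labels in Appendix 1.
TERMS = ("intercept", "L", "M", "P", "LxM", "LxP", "MxP", "LxMxP")


def cell_prediction(coef, native, non_movement, vp):
    """Fixed-effects prediction for one cell of a treatment-coded 2x2x2 design."""
    L, M, P = int(native), int(non_movement), int(vp)
    return (coef["intercept"]
            + coef["L"] * L + coef["M"] * M + coef["P"] * P
            + coef["LxM"] * L * M + coef["LxP"] * L * P + coef["MxP"] * M * P
            + coef["LxMxP"] * L * M * P)


def all_cells(coef):
    """Predictions for all eight (native, non_movement, vp) cells."""
    return {cell: cell_prediction(coef, *cell) for cell in product((0, 1), repeat=3)}


def movement_by_phrase_contrast(coef, native):
    """(VP - NP) in non-movement minus (VP - NP) in movement, within one group."""
    def cell(m, p):
        return cell_prediction(coef, native, m, p)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
```

For example, within a language group the movement × phrase type interaction contrast equals the MxP coefficient for the L2 group and MxP plus LxMxP for the native group.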

IV Results

1 Grammaticality Ratings

There were no main effects of language (t = 1.451, p = 0.155) or phrase type (t = 1.231, p = 0.220). There was a main effect of movement (t = 4.691, p < 0.001): non-movement conditions evoked higher grammaticality ratings than movement conditions (see Figure 1). There were no significant interactions (language by movement, t = −0.801, p = 0.426; language by phrase type, t = −0.667, p = 0.500; movement by phrase type, t = 1.229, p = 0.219; and language by movement by phrase type, t = −0.154, p = 0.878). This suggests that native English speakers and L2 speakers rated sentence grammaticality similarly, with no difference in their pattern across sentence types.

Figure 1.

Mean grammaticality judgment for Native and L2 speakers.

2 Pupil Change Slope: First Segment

Analysis of this segment investigated whether there was an effect of the intermediate gap on pupil change slope, compared to the other conditions without an intermediate gap.

In terms of pupil change slope at the first segment (see Figure 2), there were no significant main effects of language (t = −1.726, p = 0.093) or phrase type (t = −0.333, p = 0.740). There was a main effect of movement (t = −2.031, p < 0.050), with the movement conditions having a larger pupil change slope than the non-movement conditions.

Figure 2.

Smoothed pupil change slope from the onset of the first segment.

There were no two-way interactions between language and phrase type (t = 1.102, p = 0.277) or movement and phrase type (t = −0.881, p = 0.378), but there was a two-way interaction between language and movement (t = 2.086, p < 0.050). Additionally, there was a three-way interaction between language, movement, and phrase type (t = −3.805, p < 0.001). To interpret this interaction, pupil change slope at the first segment was submitted to separate linear mixed-effects models for the native and the L2 groups². For the native group there were no main effects of movement (t = 1.174, p = 0.261) or phrase type (t = 1.074, p = 0.302). However, there was an interaction between movement and phrase type (t = −5.661, p < 0.001), with the NP movement condition having a smaller pupil change slope than the other three conditions (VP movement, VP non-movement, and NP non-movement). For the L2 group there was a main effect of movement approaching significance, with movement structures having a larger pupil change slope than non-movement structures (t = −1.821, p = 0.080), but no effect of phrase type (t = −0.333, p = 0.742) and no interaction (movement by phrase type, t = −0.878, p = 0.380).

Peak amplitude showed a significant main effect of language (t = −2.077, p < 0.050) with the L2 group having a larger peak amplitude than the native group. There were no other main effects (movement t = −0.207, p = 0.836, phrase type t = 0.115, p = 0.908) nor interactions (language by movement t = 1.529, p = 0.130; language by phrase type, t = 0.870, p = 0.387; movement by phrase type, t = −0.496, p = 0.623; and language by movement by phrase type, t = −1.145, p = 0.260; see Figure 3).

Figure 3.

Peak amplitude at the first segment.

Peak latency showed no main effects (language t = −1.392, p = 0.170; movement t = −1.308, p = 0.195; phrase type t = 0.411, p = 0.682) nor interactions (language by movement t = 1.794, p = 0.070; language by phrase type, t = 0.866, p = 0.389; movement by phrase type, t = −0.301 p = 0.765; and language by movement by phrase type, t = −1.098, p = 0.279; see Figure 4). This indicates that the segment containing the intermediate gap and the corresponding segments (in the NP movement and the two non-movement conditions) did evoke measurable changes in terms of the pupil change slope and peak amplitude, but not peak latency.

Figure 4.

Peak latency at the first segment.

3 Pupil Change Slope: Second Segment

This segment allowed investigation of whether there was an effect of the gap site on pupil size for both the VP and NP movement conditions, compared to the other conditions that did not have a moved element (and therefore no gap site). Analysis of the pupil change slope at the second segment showed a main effect of phrase type (t = −3.355, p < 0.010) with the VP condition showing a smaller pupil change slope compared to the NP condition. There were no main effects of language (t = 0.732, p = 0.474) or movement (t = −0.019, p = 0.988) on pupil change slope; see Figure 5 below.

Figure 5.

Pupil change from the onset of the second segment: Smoothed pupil change slope across time.

There were no two-way interactions between language and movement (t = −0.260, p = 0.796) or language and phrase type (t = 0.889, p = 0.379), nor was there a three-way interaction between language, movement, and phrase type (t = −0.108, p = 0.913). Critically, however, there was a significant two-way interaction between movement and phrase type (t = 7.781, p < 0.001; see Figure 5). This interaction reflected no difference between the NP and VP items in the non-movement conditions, but a difference between the NP and VP items in the movement conditions, with a larger pupil change slope (increase) in the NP movement condition than in the VP movement condition.

At the second segment, there were no main effects on peak amplitude (see Figure 6; language t = −0.677, p = 0.502; movement t = −0.248, p = 0.805; phrase type t = −1.496, p = 0.139) nor interactions (language by movement t = 0.453, p = 0.652; language by phrase type, t = 0.538, p = 0.592; movement by phrase type, t = −0.233, p = 0.817; and language by movement by phrase type, t = 0.152, p = 0.880). Likewise, there were no main effects on peak latency (language t = 0.599, p = 0.551; movement t = 0.787, p = 0.433; phrase type t = −1.648, p = 0.104) nor interactions (language by movement t = −1.086, p = 0.280; language by phrase type, t = 0.347, p = 0.729; movement by phrase type, t = 0.094, p = 0.925; and language by movement by phrase type, t = 0.268, p = 0.790; see Figure 7).

Figure 6.

Peak amplitude at the second segment.

Figure 7.

Peak latency at the second segment.

V Discussion

Clahsen and Felser (2006) proposed the SSH, which states that late L2 speakers do not form full syntactic representations during sentence processing but instead over-rely on non-syntactic information when interpreting sentences. While some research has supported this account (e.g. Clahsen and Felser, 2006; Marinis et al., 2005), other research using intermediate gaps has challenged the claim that L2 speakers use shallow syntactic processing (e.g. Dekydtspotter et al., 2006; Pliatsikas and Marinis, 2013). This study further tested the SSH by investigating the processing of intermediate gap constructions by native speakers and L2 speakers of English with a novel methodology: auditory sentence presentation combined with pupillometry as a measure of processing cost. We suggest that auditory presentation offers a richer and more natural presentation of the stimuli than the self-paced reading used in previous research. Similarly, pupillometry provides a more sensitive and naturalistic measure of processing cost than response times, and offers new insight into the processing of intermediate gaps by late L2 speakers.

Our analysis focused on pupil change slope, peak amplitude, and peak latency during two critical time windows in 4 construction types; examples are reiterated in (15)–(18).

(15) Movement, verb phrase (filler gap, intermediate gap): The nurse who_i the doctor argued t′_i that the patient had angered t_i is refusing to work late.

(16) Movement, noun phrase (filler gap, no intermediate gap): The nurse who_i the doctor’s argument about the rude patient had angered t_i is refusing to work late.

(17) Non-movement, verb phrase: The nurse thought the doctor argued that the rude patient had angered the staff in the hospital.

(18) Non-movement, noun phrase: The nurse thought the doctor’s argument about the rude patient had angered the staff at the hospital.

Behaviorally, there were no differences in grammaticality ratings between the native and L2 speakers. For both language groups, non-movement conditions were rated higher than movement conditions, while phrase type had no reliable effect on ratings.

The two critical time windows for analysis of the pupil data were (1) the site of the intermediate gap in the VP movement condition (following argued in 15) and the corresponding segments in the other three conditions (following argument in 16 and 18, and argued in 17), and (2) the gap site in the movement conditions and the corresponding segments in the non-movement conditions (following angered).

At the first critical segment (the intermediate gap site after the verb, and corresponding segments), we hypothesized an increased slope of pupil change for the VP movement condition compared to the VP non-movement condition (given that in the VP movement condition the parser should be reactivating the filler at the intermediate gap site which should cause an increase in processing costs reflected by pupil size). The SSH predicts that L2 speakers should not be sensitive to intermediate gaps while native speakers should be. Consequently, only the native group should show sensitivity to the intermediate gap in the VP movement condition, while the L2 group should show no such sensitivity.

Pupil change slope at this segment revealed both similarities and differences between the two language groups during the processing of these constructions. Overall, movement constructions evoked larger pupil change slopes than non-movement constructions, although within the L2 group this effect only approached significance.

When the two language groups were analysed separately, the native group showed no difference in pupil change slope between the NP and VP non-movement conditions, but did show a difference between the two movement conditions, with the VP condition having a larger pupil change slope than the NP condition. This pattern was not exactly in line with our hypothesis: we predicted that the VP movement condition would show an increased slope compared to the other three conditions, whereas we found that the NP movement condition showed a decreased slope compared to the other three conditions. For the L2 group there was no such interaction, suggesting that the L2 group did not process the two movement conditions differently at this point, and thus providing no evidence for filler reactivation at this site.

At the second critical segment (the gap site and the corresponding segments), we hypothesized a larger decrease in slope (a less positive pupil change slope) for the movement conditions than the non-movement conditions indexing the release of the filler from working memory at the gap site. Additionally, we hypothesized an increased slope of pupil change for the NP movement condition compared to the VP movement condition, indexing a potential facilitation by the intermediate gap. The SSH predicted there would be no differences between the NP and VP movement conditions for the L2 speakers at the second critical segment, given that the L2 speakers would not have taken advantage of the previous intermediate gap due to the shallow syntactic processing.

When we looked at the interaction between movement and phrase type we observed a pattern in line with our predictions: the non-movement conditions showed no differences between the phrase types while there were differences in the movement conditions. There was a significantly greater increase in pupil diameter for the NP movement condition compared to the VP movement condition, suggesting that the processing of the final gap site in the VP condition required less cognitive effort than in the NP movement condition. The VP movement condition contained an intermediate gap, so this finding fits the results of previous research that has shown facilitation in the processing of the following gap site when it was preceded by an intermediate gap (Gibson and Warren, 2004; Marinis et al., 2005). In our study, pupil diameter appears to reflect this facilitation in processing the final (second) gap.

Unlike the results of previous research, both groups showed similar processing patterns at the second segment, suggesting no processing differences between the two groups: like native speakers, L2 speakers appear to use fewer processing resources in the VP movement condition than in the NP movement condition. This is striking and goes against the predictions of the SSH, which argues that L2 speakers underuse syntactic information during parsing; Marinis et al. (2005), for example, found no evidence that L2 speakers were making use of intermediate gaps. The results of the current study therefore have interesting implications for theories of second language acquisition and processing, and particularly for the SSH. Given the similar pattern displayed by both language groups at the second segment, the data in the current study contradict the SSH. Furthermore, the participants in the current study were late learners of English and, under the SSH, should be less likely than native speakers to postulate gaps. However, the data from the second segment show an apparent facilitation effect in the condition with an intermediate gap (VP movement), and there is no evidence that this effect differs between the native and L2 speakers.

In previous research, increased reading times at the first segment in the VP movement condition have been interpreted as evidence that the filler is reactivated at the intermediate gap. This reactivation is argued to facilitate formation of the filler gap dependency at the gap site (second segment) in the VP movement condition. Interestingly, we see exactly this pattern in pupil change slope across both segments for the native speakers, but only an effect at the second segment for the L2 speakers: for the L2 group we did not detect increased processing costs at the intermediate gap site (first segment), but still found facilitation at the gap at the second segment. The data reported here therefore paint an interesting picture. Data from the first segment are consistent with earlier research that found no evidence that L2 speakers make use of intermediate gaps (Marinis et al., 2005). Data from the second segment are consistent with research showing that L2 speakers can make use of intermediate gaps (e.g. Dekydtspotter et al., 2006; Pliatsikas and Marinis, 2013), here obtained with a novel methodology that indexes processing costs during auditory presentation. Dekydtspotter et al. (2006) found evidence that some L2 learners were indeed making use of intermediate gaps (in the VP movement condition), but also evidence of a processing delay in the segment following the intermediate gap. In line with Dekydtspotter et al. (2006) and Pliatsikas and Marinis (2013), it could be that we find no evidence for filler reactivation at the first segment for L2 speakers because reactivation is delayed. However, we found no difference in peak latency between the native speakers and the L2 learners, which does not support the assumption of a processing delay in the L2 learners.

Given that this study is, to the best of our knowledge, the first to investigate intermediate gaps using auditory presentation, the presentation mode may have played a role in our results. Acoustic cues that are absent in self-paced reading but present in speech (and that coincide with the gap sites) may facilitate deeper syntactic processing by L2 speakers; if so, previous research in favor of the SSH may have been confounded by the nature of the task and underestimated the abilities of the L2 parser. In the acoustic analysis of the first segment, the duration of the first verb (argued) did not differ between the two VP conditions, but the pause following the first verb (the site of the intermediate gap) was significantly longer in the VP movement condition than in the VP non-movement condition. At the second segment, the verb preceding the final gap was longer in the VP movement condition than in the VP non-movement condition. Given that the duration of the first verb did not differ while the duration of the second verb did, it may be the duration of the verb (or the coupling of verb duration and pause duration) that provided a cue to the final gap, but not to the intermediate gap, for the L2 speakers. However, we are unaware of any other research investigating the acoustic properties of spoken intermediate gap constructions, and more research on the nature of spoken intermediate gap constructions is required.

It is important to note that research has found that immersed L2 learners can have reduced access to their native language (e.g. Linck et al., 2009), so it could have been that the lack of differences between the L1 and L2 speakers here stemmed from the immersion of the L1 participants in German (causing reduced access to their L1 English). However, this seems unlikely for our L1 participants as the majority of the L1 speakers reported using German less than 10% of the time.

VI Limitations

Given that pupillometry is a relatively novel approach to investigating linguistic processing and that pupil data are inherently noisy, it is important to address some of the potential limitations of the current study, in the hope that future research addresses them. First, there is a lack of standardization among pupillometry studies in terms of data processing and pre-processing (Lemercier et al., 2014). In the current study, processing and pre-processing steps were based on previous research, but future research should attempt to standardize these approaches to allow comparisons across studies. Second, the current study had a limited number of items in each condition (5) and of participants in the monolingual group (n = 14). Future research should increase statistical power with greater numbers of items and participants. These points, coupled with the inherent variability of the pupil, warrant some caution when interpreting the data in the current study. Despite these potential limitations, the auditory presentation of items together with the continuous recording of pupil data provides new insights into native and L2 processing of complex syntactic structure, and holds promise for a more comprehensive understanding of L2 language processing.

VII Conclusions

The SSH argues that late L2 speakers do not form full syntactic representations but instead rely on non-syntactic information during comprehension, causing them to form shallow representations during processing (Clahsen and Felser, 2006). In this study, the SSH was tested by measuring pupil response, as an index of processing cost, while native and late L2 speakers of English listened to sentences containing filler gap dependencies with or without an intermediate gap (and corresponding non-movement control sentences).

First, it is important to note that, to our knowledge, all previous research on intermediate gaps has used reading tasks: this is the first study to investigate the processing of intermediate gaps using auditory presentation. Consequently, the differences between the results of this study and those of past studies (e.g. Clahsen and Felser, 2006; Dekydtspotter et al., 2006; Pliatsikas and Marinis, 2013) may reflect the different nature of processing in the two modalities and the benefit of acoustic cues during the processing of filler gap dependencies. We would argue that auditory presentation provides a more natural reflection of language processing and is more likely to reflect the automatic processes used by listeners. Further research to replicate and extend the findings presented here using converging methodologies is clearly warranted.

While the two language groups differed significantly at the first segment, at the second segment there was no interaction between language group and phrase type: consequently, we have no evidence that late L2 speakers underuse syntactic information. Indeed, we found evidence supporting the position that late L2 speakers do make use of the intermediate gap and form filler gap dependencies. This contrasts with the data from Marinis et al. (2005), but accords with a growing body of research showing that L2 speakers are capable of making use of intermediate gaps (e.g. Dekydtspotter et al., 2006; Pliatsikas and Marinis, 2013). Overall, it seems that the L2 parsing mechanism, even in late L2 speakers, is capable of more in-depth processing than previously argued, at least in the auditory modality. Theories of second language acquisition and processing must take this into account, and provide a more comprehensive model of the rich and complex processing of which late L2 speakers are capable.

Footnotes

Appendix

Appendix 1

Linear mixed effects models information.

Grammar rating
Model parameter	b	SE	t	p
(Intercept)	4.471	0.237	18.846	< 0.001
Language (Native)	0.449	0.309	1.451	0.155
Movement (Non-Movement)	1.185	0.256	4.619	< 0.001
Phrase type (VP)	0.271	0.220	1.231	0.220
Language (Native) × Movement (Non-Movement)	–0.265	0.331	–0.801	0.426
Language (Native) × Phrase type (VP)	–0.191	0.282	–0.677	0.500
Movement (Non-Movement) × Phrase type (VP)	–0.357	0.290	1.229	0.219
Language (native) × Movement (Non-Movement) × Phrase type (VP)	0.571	0.371	0.154	0.878
First segment pupil change slope
Model parameter	b	SE	t	p
(Intercept)	0.068	0.044	1.478	0.148
Language (Native)	–0.126	0.077	–1.726	0.093
Movement (Non-Movement)	–0.080	0.039	–2.031	0.049
Phrase type (VP)	–0.014	0.042	–0.333	0.740
Language (Native) × Movement (Non-Movement)	0.130	0.062	2.086	0.044
Language (Native) × Phrase type (VP)	0.073	0.066	1.102	0.277
Movement (Non-Movement) × Phrase type (VP)	–0.007	0.009	–0.881	0.378
Language (native) × Movement (Non-Movement) × Phrase type (VP)	–0.054	0.014	–3.805	< 0.001
First segment pupil slope split by Language (native)
Model parameter	b	SE	t	p
(Intercept)	–0.058	0.059	–0.985	0.342
Movement (Non-movement)	0.050	0.042	1.174	0.261
Phrase type (VP)	0.594	0.055	1.074	0.302
Movement (Non-Movement) × Phrase type (VP)	–0.062	0.011	–5.661	< 0.001
First segment pupil slope split by Language (L2)
Model parameter	b	SE	t	p
(Intercept)	0.687	0.047	1.442	0.164
Movement (Non-movement)	–0.080	0.044	–1.821	0.083
Phrase type (VP)	–0.014	0.421	–0.333	0.742
Movement (Non-Movement) × Phrase type (VP)	–0.007	0.009	–0.878	0.380
First segment peak latency
Model parameter	b	SE	t	p
(Intercept)	0.319	0.049	6.438	< 0.001
Language (Native)	–0.109	0.078	–1.392	0.170
Movement (Non-Movement)	–0.074	0.056	–1.308	0.195
Phrase type (VP)	0.024	0.060	0.411	0.682
Language (Native) × Movement (Non-Movement)	0.160	0.089	1.794	0.077
Language (Native) × Phrase type (VP)	0.082	0.094	0.866	0.389
Movement (Non-Movement) × Phrase type (VP)	0.021	0.071	0.301	0.765
Language (native) × Movement (Non-Movement) × Phrase type (VP)	–0.125	0.113	–1.098	0.279
First segment peak amplitude
Model parameter	b	SE	t	p
(Intercept)	43.523	5.249	8.290	< 0.001
Language (Native)	–17.238	8.300	–2.077	0.042
Movement (Non-Movement)	–1.428	6.914	–0.207	0.836
Phrase type (VP)	0.904	7.860	0.115	0.908
Language (Native) × Movement (Non-Movement)	16.714	10.932	1.529	0.130
Language (Native) × Phrase type (VP)	10.809	12.428	0.870	0.387
Movement (Non-Movement) × Phrase type (VP)	–4.666	9.416	–0.496	0.623
Language (native) × Movement (Non-Movement) × Phrase type (VP)	–17.047	14.888	–1.145	0.260
Second segment pupil change slope
Model parameter	b	SE	t	p
(Intercept)	0.013	0.035	0.380	0.707
Language (Native)	0.043	0.059	0.732	0.474
Movement (Non-Movement)	–0.001	0.040	–0.019	0.988
Phrase type (VP)	–0.119	0.355	–3.355	< 0.01
Language (Native) × Movement (Non-Movement)	–0.016	0.064	–0.260	0.796
Language (Native) × Phrase type (VP)	0.050	0.056	0.889	0.379
Movement (Non-Movement) × Phrase type (VP)	0.083	0.010	7.781	< 0.001
Language (native) × Movement (Non-Movement) × Phrase type (VP)	–0.001	0.016	–0.108	0.913
Second segment peak latency
Model parameter	b	SE	t	p
(Intercept)	0.331	0.060	5.494	< 0.001
Language (Native)	0.057	0.095	0.599	0.551
Movement (Non-Movement)	0.060	0.077	0.787	0.433
Phrase type (VP)	–0.127	0.077	–1.648	0.104
Language (Native) × Movement (Non-Movement)	–0.132	0.122	–1.086	0.280
Language (Native) × Phrase type (VP)	0.042	0.122	0.347	0.729
Movement (Non-Movement) × Phrase type (VP)	0.009	0.100	0.094	0.925
Language (native) × Movement (Non-Movement) × Phrase type (VP)	0.042	0.158	0.268	0.790
Second segment peak amplitude
Model parameter	b	SE	t	p
(Intercept)	48.143	5.675	8.483	< 0.001
Language (Native)	–6.071	8.973	–0.677	0.502
Movement (Non-Movement)	–1.857	7.475	–0.248	0.805
Phrase type (VP)	–11.095	7.417	–1.496	0.139
Language (Native) × Movement (Non-Movement)	5.357	11.819	0.453	0.652
Language (Native) × Phrase type (VP)	6.310	11.728	0.538	0.592
Movement (Non-Movement) × Phrase type (VP)	–2.381	10.203	–0.233	0.817
Language (native) × Movement (Non-Movement) × Phrase type (VP)	2.452	16.133	0.152	0.880

Note: Significant (p< 0.05) findings are indicated in bold.

Declaration of Conflicting Interest

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the European Commission through the Erasmus Mundus Joint Doctoral Programme IDEALAB under the Specific Grant Agreement 2012-1713/001-001 EMJD (Framework Partnership Agreement 2012–25). Lyndsey Nickels was supported by an Australian Research Council Future Fellowship (FT120100102).


References

Barr, Levy, Scheepers and Tily (2013) Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68: 255–78.

Bates, Maechler and Bolker (2012) lme4: Linear mixed-effects models using S4 classes. R package version 1.0.136.

Boersma and Weenink (2016) Praat: Doing phonetics by computer [computer program], Version 6.0.19.

Chomsky (1965) Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Clahsen and Featherston (1999) Antecedent priming at trace positions: Evidence from German scrambling. Journal of Psycholinguistic Research 28: 415–37.

Clahsen and Felser (2006) Grammatical processing in language learners. Applied Psycholinguistics 27: 3–42.

Council of Europe (2001) Common European framework of reference for languages. Cambridge: Cambridge University Press.

Dekydtspotter, Schwartz and Sprouse (2006) The comparative fallacy in L2 processing research. In: O’Brien, Shea and Archibald (eds) Proceedings of the 8th Generative Approaches to Second Language Acquisition Conference. Somerville, MA: Cascadilla Proceedings Project, 33–40.

Engelhardt, Ferreira and Patsenko (2010) Pupillometry reveals processing load during spoken language comprehension. Quarterly Journal of Experimental Psychology 63: 639–45.

Fernandez (2013) Pupillometry and Filler Gap Dependencies. Unpublished Master of Philosophy thesis, Northumbria University, Newcastle upon Tyne, UK.

Ferreira (1993) Creation of prosody during sentence production. Psychological Review 100: 233–53.

Gibson and Warren (1999) The psychological reality of intermediate linguistic structure in long-distance dependencies. Unpublished manuscript, MIT, Cambridge, MA, USA.

Hestvik, Maxfield, Schwartz and Shafer (2007) Brain responses to filled gaps. Brain and Language 100: 301–16.

Hess and Polt (1960) Pupil size as related to interest value of visual stimuli. Science 132: 349–50.

Jackson and Sirois (2009) Infant cognition: Going full factorial with pupil dilation. Developmental Science 12: 670–90.

Just and Carpenter (1993) The intensity dimension of thought: Pupillometric indices of sentence processing. Canadian Journal of Experimental Psychology 47: 310–39.

Just, Carpenter and Woolley (1982) Paradigms and processes in reading comprehension. Journal of Experimental Psychology: General 111: 228–38.

Kahneman (1973) Attention and effort. Englewood Cliffs, NJ: Prentice-Hall.

Kahneman and Beatty (1966) Pupil diameter and load on memory. Science 154: 1583–85.

Kahneman and Wright (1971) Changes of pupil size and rehearsal strategies in a short-term memory task. Quarterly Journal of Experimental Psychology 23: 187–96.

Kang, Rubin and Pickering (2010) Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English. The Modern Language Journal 94: 554–66.

Kuznetsova, Brockhoff and Bojesen (2016) lmerTest: Tests in linear mixed effects models. R package version 2.0-33.

Lee (2004) Another look at the role of empty categories in sentence processing (and grammar). Journal of Psycholinguistic Research 33: 51–73.

Lemercier, Guillot, Courcoux, Garrel, Baccino and Schlich (2014) Pupillometry of taste: Methodological guide – from acquisition to data processing – and toolbox for MATLAB. The Quantitative Methods for Psychology 10: 179–99.

Linck, Kroll and Sunderman (2009) Losing access to the native language while immersed in a second language: Evidence for the role of inhibition in second-language learning. Psychological Science 20: 1507–15.

Marinis, Roberts, Felser and Clahsen (2005) Gaps in second language sentence processing. Studies in Second Language Acquisition 27: 53–78.

Nagel, Shapiro and Nawy (1994) Prosody and the processing of filler-gap sentences. Journal of Psycholinguistic Research 23: 473–85.

Pickering and Barry (1991) Sentence processing without empty categories. Language and Cognitive Processes 6: 229–59.

Pliatsikas and Marinis (2013) Processing empty categories in a second language: When naturalistic exposure fills the (intermediate) gap. Bilingualism: Language and Cognition 16: 167–82.

R Core Team (2012) R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

Schmidtke (2014) Second language experience modulates word retrieval effort in bilinguals: Evidence from pupillometry. Frontiers in Psychology 5: 137.

Trofimovich and Baker (2006) Learning second language suprasegmentals: Effect of L2 experience on prosody and fluency characteristics of L2 speech. Studies in Second Language Acquisition 28: 1–30.

Watson and Gibson (2004) The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes 19: 713–55.

Zekveld, Kramer and Festen (2010) Pupil response as an indication of effortful listening: The influence of sentence intelligibility. Ear and Hearing 31: 480–90.