Abstract
Prosodic cues help to disambiguate incoming information in spoken language perception. In structurally ambiguous coordinate utterances, such as three-name sequences, the intended grouping is marked by three prosodic cues: F0-range, final lengthening, and pause. To indicate that the first two names are grouped together, speakers typically weaken the durational and tonal cues on the first name and strengthen them on the second name, compared with a structure without internal grouping. The current study uses a gating paradigm to test whether listeners can decide about the internal grouping of a coordinate structure by already exploiting prosodic information on the first name. Forty-eight stimuli were cut into seven parts (gates) and presented to naive participants (n = 45) successively (gate by gate) with increasing length of the utterance and amount of prosodic information. In a two-alternative forced-choice decision task, accuracy was above chance level after the second name. However, more than half of the participants could already reliably detect grouping patterns after the first name. These interindividual differences point toward the existence of different subgroups with diverging prosodic parsing strategies. Furthermore, listeners were sensitive to speaker-specific prosodic patterns. Depending on speaker-specific characteristics and individual parsing capacities, it seems possible, at least for a subgroup of listeners, to make predictions about the underlying grouping structure of coordinated name sequences based on early prosodic cues.
1 Introduction
Prosody, the modulation of pitch and rhythm, is a pivotal source of information in spoken language comprehension. As prosody accompanies a spoken utterance, it guides the listener along the syntactic structure (e.g., Steinhauer et al., 1999) and the speaker along their mental structure (e.g., Kraljic & Brennan, 2005; Speer et al., 2011; Wagner, 2005; Watson & Gibson, 2004). In more detail, speakers produce prosody for a variety of purposes, including syntactic, lexical, and pragmatic objectives, and thus convey content that is critical for understanding. On the side of the listener, these underlying purposes and meaningful cues must be exploited in prosodic parsing to identify constituents, structure, and meaning of the utterance (e.g., Clifton et al., 2002).
Prosodic cues demarcate junctures in an utterance, such as the beginning and end of a discourse segment (Swerts & Geluykens, 1993), newly introduced concepts in the discourse (Féry & Kügler, 2008), and internal chunks that group semantically and pragmatically related constituents, by means of prosodic boundaries (Kentner & Féry, 2013).
In this study, we focus on the latter, specifically on boundary-related prosodic markers in German three-name sequences coordinated by und (“and”) with and without internal grouping of the first two names (see examples (1) and (2) below). These structures are particularly suitable for the investigation of prosodic boundaries: first, linguistic characteristics of coordinated names can easily be controlled when designing the experimental stimuli. Second, coordinated name sequences are short and simple—and it has been demonstrated that prosodic cues appear to have larger implications for perception among listeners in shorter constituents than in longer ones (Clifton et al., 2006). In the following examples, the answer to the question Who is arriving at the station? may include an internal grouping of two of the persons or may be uttered without such an internal grouping. Example (2) contains no internal grouping and the three persons are possibly all arriving together. In contrast, example (1) contains an internal grouping of the first two constituents (indexed by the brackets), indicating that Moni and Lilli are arriving together whereas Manu is arriving separately.
Who is arriving at the station?
(1) [Moni und Lilli] und Manu
Internal grouping (bracket)
(2) Moni und Lilli und Manu
No internal grouping (no bracket)
Regarding the production of such prosodic boundaries, Kentner and Féry (2013) developed a model of syntax-prosody mapping, the Proximity/Similarity Model, that accounts for the relative strengths of these prosodic boundaries. Proximity predicts that a boundary between two names is weakened when they are grouped together in comparison to a structure without internal grouping. That is, the boundary between the first and the second names, Moni and Lilli, is predicted to be weaker in (1) than in (2). According to antiproximity, a prosodic boundary is strengthened between two names that are of different syntactic levels in comparison to an ungrouped structure. Consequently, the boundary after the second name, Lilli, is predicted to be stronger in (1) than in (2). This stronger prosodic boundary on the second name, indicating a grouping such as in (1), is realized in different languages, including German (Gollrad, 2013; Huttenlauch et al., 2021; Kentner & Féry, 2013; Peters et al., 2005; Petrone et al., 2017), by a longer duration of the final syllable of the second name (henceforth referred to as final lengthening), a higher rise of the fundamental frequency (F0) on the second name, and a pause right after the second name in comparison to an ungrouped structure. Thus, for coordinated name sequences as in (1), the boundary bearing the most salient cues to syntactic grouping is located at the end of the second name. In this article, we will call the prosodic cues around the second name late prosodic cues to boundaries, as opposed to early prosodic cues, which are located before the second name.
In the perception of coordinate structures, pitch, pause, and final lengthening are not equally relevant for ambiguity resolution. In a perception study manipulating late prosodic cues, Gollrad (2013) demonstrated that pitch alone is not sufficient for boundary perception, whereas jointly presented durational cues (pause, final lengthening) may facilitate the parsing process without a pitch cue being present (for similar results from ERP data, see Holzgrefe-Lang et al., 2016). According to Petrone et al. (2017), the pause cue triggers a more categorical shift in prosodic judgments than pitch and final lengthening. These studies focused on late prosodic cues in coordinated name structures on or after the second name such as in (1) compared with (2). However, boundaries are scaled relative to one another and boundary strength is determined not just locally but across the whole utterance (e.g., Clifton et al., 2006; Wagner, 2010).
As described above, the Proximity/Similarity Model predicts a weakening of the boundary on the first name in (1) versus (2). These early cues, that is cues on Name1, might already give hints to the grouping of the structure. Corresponding prosodic patterns, in line with proximity/antiproximity, were observed by Kentner and Féry (2013). Early prosodic cues have also been reported in the study by Huttenlauch et al. (2021), in which participants produced coordinate structures akin to examples (1) and (2) above: differences in cue usage were found not only on the second name (henceforth Name2) but also already on the first name (henceforth Name1). More precisely, the pitch was lower and final lengthening was shorter for Name1 in the bracket condition (example (1)) than in the no bracket condition (example (2)). Huttenlauch et al. (2021) concluded that speakers used early as well as late prosodic cues to distinguish between conditions.
Several studies have shown that listeners pay attention to more global prosodic features such as phrase length, speech rate, and speaker as well as language-specific prosodic patterns (e.g., Jun, 2003). Moreover, nonlocal/more distant F0 and durational cues have been shown to influence listeners’ perception of segmentation (or the grouping of segments into words, e.g., foot—notebook—worm/footnote—book—worm; Brown et al., 2011; Dilley & McAuley, 2008). In an ERP study on coordinate structures conducted by Li and Yang (2009), a closure positive shift, reflecting the perception of a prosodic boundary, was also elicited for earlier boundaries characterized by more subtle cues. This suggests that listeners are sensitive to early prosodic cues in coordinate utterances. It remains an open question whether such cues are already sufficient for disambiguation. Hence, the overarching research question of this study is: Can listeners exploit early, subtle prosodic cues such as the cues present on Name1 in coordinated name sequences to predict the internal grouping of the utterance?
With respect to the processing of different sentence types, that is questions versus statements, it has already been shown that listeners can make use of early prosodic cues for disambiguation (Face & d’Imperio, 2005 for Spanish; Petrone & Niebuhr, 2014 for German; van Heuven & Haan, 2002 for Dutch; Vion & Colas, 2006 for French). Thus, we predict that listeners may be able to use early cues for disambiguation in other linguistic contexts as well. We will make use of nonmanipulated productions of coordinate structures from the study by Huttenlauch et al. (2021) to investigate our research question. If early prosodic cues are sufficient to predict the underlying syntactic structure, future studies should pay considerably more attention to these cues in (perception) experiments on disambiguating prosody, as opposed to investigating the influence of the local, late boundary cues only. We employ a gating paradigm (see Section 2) to investigate how listeners make use of gradually increasing amounts of prosodic information.
Besides investigating the use of early cues for disambiguation in perception, we focus on interindividual differences in the production of prosodic boundaries in coordinate structures and their influence on perception. Several studies have observed variability in cue combinations and attunement of cues on the second constituent, with pause being the most stable cue that was produced (Gollrad, 2013; Huttenlauch et al., 2021; Kentner & Féry, 2013; Peters et al., 2005; Petrone et al., 2017). The general variability in cue usage found in these studies points toward a considerable interindividual range of degrees of freedom in cue realization.
Similar to production, variability can also be found in perception: Cangemi et al. (2015) investigated the production and perception of question-answer pairs with fictional name targets in different linguistic focus structures: broad, narrow, and contrastive focus. Answer sentences were structurally identical, with the respective focus structure being signaled (or disambiguated) by means of prosody. Listeners had to match the target sentences (recorded in a preceding production part with different participants) to one of the different focus conditions. They varied in their decoding of prosodic contrasts across speakers and, in addition, in their decoding of prosodic contrasts produced by particular speakers. The authors suggest an individual-specific network of phonological knowledge that leads to speaker- and listener-specific differences in the identification of prosodic contrasts. When it comes to online processing of prosody, Kim (2019) observed differences in prosodic cue usage for the perception of disambiguating boundaries over the course of listeners’ fixations to the target picture in a visual world paradigm: some listeners looked to the correct target earlier than others, depending on the available prosodic cues, which varied between conditions.
Individual differences in prosody perception were also found in studies on boundary and prominence annotations conducted by means of rapid prosody transcription (RPT): Cole et al. (2017) found a few “super annotators” in each of their participant groups, despite considerable differences in group characteristics (spoken dialect, lab-based vs. crowd-sourced settings). The concept of “super annotators” refers to a high agreement rate in prosodic marking within Tones and Break Indices (ToBI) annotations (Silverman et al., 1992) between a naive annotator and trained ToBI annotators. However, Bishop et al. (2020) point out that the existence of “super annotators” might be due to chance, as they only found “one or two” (p. 7) among 158 naive participants. In another RPT study, Roy et al. (2017) report subsets of participants who made use of global cues such as intensity, whereas others used these cues to a minor extent. Overall, effects differed substantially across the annotators, with trained annotators performing better than untrained ones. Among the untrained annotators in this study, only a subset seemed to use the same prosodic cues as the trained annotators did. The authors suggest that these individual differences are driven either by differences in sensitivity to a cue or by differences in the sensitivity to contextual factors that predict the occurrence of a prosodic boundary or a prominent feature. These assumptions are supported by Baumann and Winter (2018), who interpret their findings on prosodic prominence as some listeners paying more attention to pitch-related features and less to semantic-syntactic and lexical features, whereas others show the reverse pattern. These findings do not necessarily generalize to boundary perception but can be seen as another indication of different listener groups.
If major differences exist between listeners regarding prosody processing styles (e.g., Yu, 2013), cochlear responses to tonal aspects in the speech signal (Ladd, 2008), or communicative skills that are required for prosody perception (e.g., Jun & Bishop, 2015), it is important to capture these individual differences. When averaging over the entire group of participants, an important part of the information about prosodic processing is lost. Therefore, in this study, we will explore individual differences in the use of early prosodic cues for syntactic disambiguation.
2 Aims and hypotheses of the current study
This study aims to investigate listeners’ ability to exploit early prosodic cues for the detection of the intended internal grouping in German coordinated name sequences. For this purpose, we use the gating paradigm, a specific experimental setup introduced by Grosjean (1980). The gating paradigm is a method in which a whole stimulus is cut into several parts (gates) and the participants are presented with gated snippets of successively increasing length.
Gating studies have been used successfully for research on spoken word recognition. Specifically, in the domain of prosody and listeners’ predictions based on prosodic parsing, the gating paradigm has been used to gain insights into listeners’ exploitation of pitch accents (Cutler & Otake, 1999), the prediction of sentence length (Grosjean, 1983, 1996; O’Brien et al., 2013) and sentence continuation (Hughes & Szczepek Reed, 2011), and the intonation of questions (Petrone & Niebuhr, 2014), as well as for assessing speech segmentation among native listeners in comparison to second language learners (Field, 2008). It is noteworthy that Beach (1991) used a version of the gating paradigm (short vs. long sentence beginning conditions; stimulus example: Jay believed. . . vs. Jay believed the gossip. . ., p. 4) to show that listeners made use of prosody to distinguish direct-object from sentence-complement syntactic structures before the complete sentence information was available. In Allopenna et al. (1998), a gating paradigm was used along with eye tracking to investigate continuous prosodic mapping.
In the current study, we applied the gating paradigm to investigate listeners’ prosodic parsing of the coordinate stimuli produced by four of the participants in the study of Huttenlauch et al. (2021). To this end, the coordinated three-name sequences were cut into seven gates (g1–g7), where the first gate comprised the first syllable of Name1. With each gate, the subsequent syllable was added in a cumulative manner. Thus, g1 included the first syllable of Name1, g2 comprised the first and second syllables of Name1 (i.e., the complete first constituent), g3 comprised the first constituent and the conjunction, and so forth (a more detailed description is given in section 3.2).
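The cumulative logic of the gates can be illustrated with a minimal sketch. It works on orthographic syllables rather than audio, the function name `build_gates` and the syllabification are hypothetical, and it assumes that g7 adds the whole of Name3 so that seven gates cover the complete utterance.

```python
def build_gates(name1, name2, name3, conjunction="und"):
    """Return the seven cumulative gates for a three-name sequence.

    Each name is passed as a list of two syllables. Gates g1-g6 each add
    one syllable; g7 adds the complete third name (an assumption made
    here so that seven gates cover the whole utterance).
    """
    increments = [
        name1[0],               # g1: first syllable of Name1
        name1[1],               # g2: Name1 complete
        " " + conjunction,      # g3: Name1 plus conjunction
        " " + name2[0],         # g4: first syllable of Name2
        name2[1],               # g5: Name2 complete
        " " + conjunction,      # g6: second conjunction
        " " + "".join(name3),   # g7: whole utterance
    ]
    gates = []
    current = ""
    for piece in increments:
        current += piece
        gates.append(current)
    return gates


# Illustrative syllabification, not taken from the source material.
gates = build_gates(["Mo", "ni"], ["Lil", "li"], ["Ma", "nu"])
for number, gate in enumerate(gates, start=1):
    print(f"g{number}: {gate}")
```

In this sketch, g2 corresponds to the complete first constituent and g5 to the point after Name2, the two positions that matter most for the predictions below.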
After each gate, participants had to decide if the structure belonged to the condition with or without grouping. The following research questions (RQ) were addressed:
RQ 1: At which gate can listeners reliably predict the structure of German coordinated name sequences with or without internal grouping of the first two names?
Because the most salient prosodic cues in the grouping condition (bracket) occur at or after the final syllable of Name2 (Huttenlauch et al., 2021), listeners were predicted to reliably detect the internal grouping at g5 (i.e., after Name2). If cues that are located at earlier points in the utterance can already serve as reliable markers for grouping patterns, listeners’ decisions about internal grouping should already be above chance level in early gates (i.e., before Name2).
RQ 2: Are there individual differences among listeners with respect to prosodic parsing capacities?
As described above, previous evidence from perception experiments usually mirrors mechanisms that were found in production. Different individuals naturally exhibit some degree of variability in production (e.g., Huttenlauch et al., 2021) and variability has also been observed in perception (Cangemi et al., 2015). To our knowledge, listener variability has not been investigated in coordinate structures before. This second research question was rather exploratory and thus not tied to specific predictions.
3 Methods and procedures
3.1 Participants
A total number of 45 adults participated in this study (39 female, 6 male, mean age = 22.37, SD = 3.42, age range = 18–30 years). All of them were monolingual native speakers of German without self-reported neurologic or psychiatric symptoms, language impairments, and hearing or vision problems. All participants were students at the University of Potsdam and were recruited via an online participant database. They received course credits or monetary compensation for participation. Written informed consent was obtained from all participants prior to the study. They were naive to the purpose of the study. The procedure for this study was approved by the Ethics Committee of the University of Potsdam (approval number 72/2016).
3.2 Stimuli
3.2.1 Structure of the source material
The gated stimuli were based on nonmanipulated recordings of coordinate structures taken from a production study by Huttenlauch et al. (2021). We will refer to these original recordings of coordinate structures as the source material in the following. The source material appeared in two grouping conditions, one with the internal grouping of the first two names as in (1), and one without internal grouping as in (2). The same coordinate structures had been previously used in production and perception studies (e.g., Holzgrefe-Lang et al., 2016; Huttenlauch et al., 2021) and consisted of six items that all had the same structure: a sequence of three German names coordinated by und (“and”). All names were disyllabic, stressed on the penultimate syllable, and ended either in an /i/ (Moni, Lilli, Leni, Nelli, Mimmi, and Manni) in the position of Name1 and Name2 or in /u/ or /a/ (Manu, Nina, and Lola) as Name3. We controlled for frequency effects of adjacent names: The occurrence of all possible adjacent name combinations was nonfrequent in the dlexDB corpora (Heister et al., 2011) as well as in printed sources covering the years 1500 to 2021 accessed in an online-search using the Google Ngram Viewer (Lin et al., 2012).
Of all the 15 speakers analyzed by Huttenlauch et al. (2021), four speakers were selected on the basis of a perception check conducted in the same study. This perception check had been carried out to confirm that the internal grouping of constituents produced by the speakers following the instructions in the production experiment was congruent with the structure perceived by naive listeners (n = 31 in Huttenlauch et al., 2021). In contrast to the procedure of the current study, participants in the perception check listened to the complete productions. For each production, they were asked to identify the grouping condition (internal grouping vs. no internal grouping) and choose between two pictograms, one depicting two persons grouped together and the third person standing alone (as in Figure 5(a) in section 3.3, referring to internal grouping) and one with three persons grouped together (as in Figure 5(b) in section 3.3, referring to no internal grouping). In the analysis, the ratio of the number of congruent responses to the number of total responses (referred to as rating accuracy) was calculated. The 48 productions of the four speakers selected for the current study (6 name sequences × 2 conditions × 4 speakers) had achieved a slightly higher rating accuracy (mean per speaker > 98%) than the productions of the remaining 11 speakers (M: 94%). We interpreted high ratings as indicating that the intended structure could reliably be recovered by naive listeners when listening to the complete coordinate structure. The four selected speakers (speaker IDs 6, 10, 11, and 16) all identified as female and had a mean age of 24 years (SD: 4.24, range 21–30).
3.2.2 Creation of the gated stimuli
For the current study, the 48 recordings were each cut into seven parts (gates, g1–g7), yielding a total number of 336 gated stimuli. Ascending gate numbers represent longer utterance durations and an increasing amount of prosodic information (see Figures 1 and 2(a) and 2(b) showing the position of the gates in the utterance). At g7, the corresponding recording comprised the whole utterance (i.e., a complete coordinated three-name sequence). For the cutting procedure, the segment boundaries and pauses, as previously labeled by Huttenlauch et al. (2021) according to the criteria of Turk et al. (2006) in the software Praat (Boersma & Weenink, 2017), were used.

Figure 1. Example of the segmental material in each of the seven gates (g1–g7) that were cut from a complete coordinate three-name sequence (cf., g7).

Figure 2. (a) Oscillogram/spectrogram with F0 contour (solid line), names, and corresponding gate numbers for an example stimulus for the bracket condition. (b) Oscillogram/spectrogram with F0 contour (solid line), names, and corresponding gate numbers for an example stimulus for the no bracket condition.
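As a rough check on the stimulus counts in this section, the following sketch enumerates the design (6 name sequences × 2 conditions × 4 speakers, each cut into 7 gates). It also shows one way cumulative snippets could be cut from a recording, given the labeled end time of each gate. The helper `cut_cumulative` and its arguments are hypothetical; the actual cutting was based on the Praat labels described above.

```python
from itertools import product

N_SEQUENCES = 6                         # six name sequences (items)
CONDITIONS = ("bracket", "no bracket")
SPEAKERS = (6, 10, 11, 16)              # speaker IDs of the source material
N_GATES = 7

# One entry per source recording: (item, condition, speaker).
recordings = list(product(range(1, N_SEQUENCES + 1), CONDITIONS, SPEAKERS))

# One entry per gated stimulus: every recording at every gate.
gated_stimuli = [
    (item, condition, speaker, f"g{gate}")
    for item, condition, speaker in recordings
    for gate in range(1, N_GATES + 1)
]
print(len(recordings), len(gated_stimuli))


def cut_cumulative(samples, sample_rate, gate_end_times):
    """Return one snippet per gate, each starting at utterance onset.

    gate_end_times holds the end of each gate in seconds (hypothetical
    input; in the study these boundaries came from manual labels).
    """
    return [samples[: int(round(end * sample_rate))] for end in gate_end_times]
```

The enumeration yields 48 recordings and 336 gated stimuli, matching the numbers reported above.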
3.2.3 Descriptive visualization of speaker-specific cue use
In the following, we will provide a short description of the prosodic nature of the source material. We will mainly focus on the three prosodic cues that have commonly been investigated in previous studies, including Huttenlauch et al. (2021), as indicators of internal grouping: F0 movement, final lengthening, and pause after Name1 and Name2. F0 movement captures the distance between the F0-minimum and the F0-maximum in semitones separately on Name1 and Name2. Final lengthening gives the duration of the final vowel of a name relative to the duration of the whole name (in percent), again, separately for Name1 and Name2, and the variable pause contains the duration of a possible pause following Name1 and Name2 relative to the duration of the whole utterance (in percent). In the bracket version of the source material, the prosodic cues on Name1 are expected to be smaller as compared with the no bracket version, whereas on Name2, the prosodic cues are expected to be larger as compared with the no bracket version.
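The three measures can be written out as a minimal sketch. The function names and the example values are hypothetical, and the semitone formula (12 times the base-2 logarithm of the frequency ratio) is the standard conversion, assumed here rather than quoted from the source.

```python
import math


def f0_range_semitones(f0_min_hz, f0_max_hz):
    """Distance between F0 minimum and F0 maximum in semitones."""
    return 12 * math.log2(f0_max_hz / f0_min_hz)


def final_lengthening_percent(final_vowel_dur, name_dur):
    """Duration of the final vowel relative to the whole name, in percent."""
    return 100 * final_vowel_dur / name_dur


def pause_percent(pause_dur, utterance_dur):
    """Duration of a pause relative to the whole utterance, in percent."""
    return 100 * pause_dur / utterance_dur


# Made-up values for one name, for illustration only.
print(round(f0_range_semitones(180.0, 240.0), 2))
print(round(final_lengthening_percent(0.12, 0.40), 1))
print(round(pause_percent(0.15, 2.50), 1))
```

Because all three measures are ratios, they can be compared across speakers with different pitch ranges and speech rates.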
For all productions of the four speakers selected as source material for the gated stimuli for the current study, the two grouping conditions could reliably be differentiated by naive listeners (cf., 3.2.1). However, this does not rule out the possibility that there are interindividual differences due to speaker-specific use of prosodic cues. This was also confirmed by the analysis in Huttenlauch et al. (2021), which is why we describe the prosodic cues of the source material of the stimuli in a speaker-specific manner.
Figure 3 shows the distributions of the three cues (rows) in raincloud plots (Allen et al., 2019) separately for Name1 (left column) and Name2 (right column) and individually for the four speakers (y-axis). The pause after Name1 is not visualized as most productions lack a pause at this position. The figure depicts values for individual utterances (black dots for the bracket condition and gray dots for the no bracket condition) together with the density distribution and a box plot within a cue, condition, and speaker. Overall, black and gray dots as well as the density show a larger overlap within single cues and speakers on Name1 (especially for final lengthening) as compared with Name2. We interpret an overlap as an indication that the corresponding cue was not used distinctively between the bracket and the no bracket conditions. Thus, the final lengthening on Name1 was not used to clearly distinguish between conditions. In contrast, for F0 range on Name1, two speakers show less overlap (i.e., 6 and 11) than the other two. The figure suggests that the former two speakers systematically used F0 on Name1 to differentiate the bracket from the no bracket condition. On Name2, there are more cases where the two conditions show no overlap (e.g., F0-range in speakers 16, 11, and 6, pause in all speakers). In sum, on Name2, more cues diverge between conditions than on Name1. Nevertheless, as mentioned in the introduction, the analysis by Huttenlauch et al. (2021) also revealed reliable cues on the group level on Name1.

Figure 3. Distribution of the three prosodic cues F0-range (upper row), final lengthening (mid row), and pause (bottom row) on Name1 (left column) and on Name2 (right column) by condition (black: bracket, gray: no bracket) separated for speakers (y-axis).
The three described prosodic cues are relative measures that unfold over time or in relation to the surrounding speech material. This makes it difficult to determine a specific point in time where a cue is located or takes effect. The association of a prosodic cue with a specific gate is, thus, always a simplification. We associate g2 and g5 with the cues final lengthening and F0-range. In the case of F0 range, the F0-minima are largely located on g1 and g4, whereas g2 and g5 bear the end positions of the movement. It is, therefore, possible that there are already perceivable differences in the preceding gates. The production of a silent pause, however, is only perceivable in the presence of the following speech material (i.e., on g3 and g6). Thus, simplified, the gate corresponding to the boundary at the group edge on Name2 in the bracket condition is g5. Finally, it should be noted that additional cues may be present in the utterance which have, so far, not been investigated.
For an additional visualization, we extracted F0 values of the source material to be able to consider the F0 contour of the whole utterance in its continuous nature, using a customized Praat script that combines the procedures of Mausmooth (Cangemi, 2016) and ProsodyPro (Xu, 2013). Unreliable pitch points were removed manually before smoothing the pitch contour with a bandwidth of 10 Hz. After interpolation of pitch points, the contour was smoothed again with a bandwidth of 15 Hz following the procedure in Cangemi (2016). A total of 140 F0 values (10 per segment in the names and 10 per coordination) were extracted and converted into semitones (st) relative to 1 Hz following Hazan et al. (2016) to facilitate a comparison independent of pitch height. Intervals labeled as pauses were not considered. Figure 4 thus shows the smoothed F0 contours, plotted separately for each speaker and condition (bracket and no bracket productions on top of each other). As time on the x-axis is normalized and pauses are excluded, the figure does not contain information in the durational domain. The (normalized) time domains of the seven gates are given by vertical lines. Considering the black mean lines (solid for bracket and dashed for no bracket) and the shaded standard deviations in the time domains of g1 and g2, differences between speakers become apparent, because the two lines neatly overlap for speaker 16, but start to diverge on g2 for the other three speakers.

Figure 4. Time-normalized smoothed F0-contours for the bracket (solid) and the no bracket (dashed) condition, separated for speakers (panels).
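The conversion to semitones and the count of extracted F0 values can be sketched as follows. The conversion formula is the standard one and is assumed here. The count assumes four segments per name, which is consistent with the reported total of 140 values but is not stated explicitly in the text.

```python
import math


def hz_to_semitones(f0_hz, reference_hz=1.0):
    """Convert an F0 value in Hz to semitones relative to reference_hz."""
    return 12 * math.log2(f0_hz / reference_hz)


POINTS_PER_INTERVAL = 10   # F0 values extracted per segment or conjunction
SEGMENTS_PER_NAME = 4      # assumption: four segments in each name
N_NAMES = 3
N_CONJUNCTIONS = 2

total_points = POINTS_PER_INTERVAL * (SEGMENTS_PER_NAME * N_NAMES + N_CONJUNCTIONS)
print(total_points)

# An octave spans 12 semitones regardless of absolute pitch height.
print(hz_to_semitones(440.0) - hz_to_semitones(220.0))
```

On the semitone scale, equal frequency ratios map to equal distances, so contours from speakers with different pitch heights can be compared by their shape.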
3.3 Experimental procedure
The experiment took place in the Acoustics Laboratory at the University of Potsdam. Participants were tested one by one with a single session lasting about 60 min, of which the actual experiment took about 30 min. Participants were seated in a sound-attenuated booth in front of a flat-panel display with 1920 × 1200 resolution. They received the instructions in verbal and in written form and were given the opportunity to ask questions before and after the practice phase. The practice phase was run prior to the test phase and consisted of two gated utterances, one for each condition, thus 14 audio snippets in total. The stimuli presented in the practice phase had been produced by a different, randomly chosen speaker and had also been verified regarding the identifiability of the respective condition in the perception check in Huttenlauch et al. (2021).
In the practice and test phases, the gated audio stimuli were presented via an HSC 271 headset (produced by AKG Acoustics). Randomization of source stimuli but not gated stimuli (meaning that the ascending gates in each test item were always from the same individual uncut source stimulus) was implemented for all items by means of eight different randomization lists. Scripts were written and run in OpenSesame (Mathôt et al., 2012), version 3.3.6, logging all data that OpenSesame gathered during the experiment and selecting the variables relevant for analysis after the experiment while dropping redundant columns. OpenSesame was executed on a Dell laptop that was located outside the sound booth and connected to all technical devices used in the experiment via an Alesis io12 interface.
The experimental task was a forced-choice decision task with two alternatives in which participants had to assign each gated stimulus to one condition, no bracket or bracket. Answers were given via button press on a Cedrus RB-840 button box, using the left and right index fingers, after the stimulus onset. To avoid time delays while answering, participants were advised to place their fingers on the buttons again after each trial. Answers were supposed to be given as fast as possible. Thus, it was possible to give an answer before the end of the auditory stimulus presentation. One trial consisted of the auditory presentation of a gated stimulus while showing a fixation cross on the screen, followed by the visual presentation of two pictograms after 1,000 ms, each referring to one of the two conditions (see Figure 5(a) and 5(b)). In four of the randomization lists, the bracket option was located on the left side of the screen and the no bracket option on the right, whereas in the other four lists, the pictograms were switched. After each answer, participants had to rate their confidence in that answer on a seven-point scale using the respective number keys on a keyboard (1 corresponded to completely unsure, 4 corresponded to somewhat sure, 7 corresponded to completely sure). The confidence rating (CR) was followed by a blank screen that lasted for 2,000 ms before the start of the next trial.

Figure 5. (a) Pictogram (bracket condition) indicating button press in the experiment. (b) Pictogram (no bracket condition) indicating button press in the experiment.
3.4 Statistical analysis
All calculations were executed using the software RStudio, version 1.3.1056 (R Core Team, 2020). Visualizations were also generated in RStudio, using the package ggplot2 (Wickham, 2016), version 3.3.3. From the variables logged by OpenSesame during the experiment, the following variables were selected and used for analyses as outcome variables, predictor variables, or random effects: response accuracy (correct/incorrect), condition (no bracket/bracket), confidence rating (CR), speaker (6, 10, 11, 16), item, and participant. The data analysis was carried out in the frequentist framework, using generalized linear mixed models (GLMMs) for significance testing. A conservative alpha level of .05 was predefined. For model implementation, the function glmer from the package lme4 (Bates et al., 2014), version 1.1-26, was used. For all predictor variables that were used in the analyses outlined in the following, the significance of the predictor was evaluated in model comparisons using the anova function from the package car (Fox & Weisberg, 2018), version 3.0-10. Likewise, the best applicable model complexity was assessed, further taking into account the Akaike information criterion (Akaike, 1974) as well as the Bayesian information criterion (Schwarz, 1978). Predictor contrasts were tied to research questions and predetermined predictions, or to certain exploratory questions that were specified before running the model analysis, and were coded using the R package MASS (Venables & Ripley, 2013), version 7.3-53. Random effects were determined as proposed by Barr et al. (2013). The extracted model estimates were transformed from log odds to percentage proportions prior to interpretation. All reported results lie within the respective 95% confidence interval (CI).
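The transformation from log odds to percentage proportions corresponds to the inverse logit function. The following minimal sketch illustrates it. The analyses themselves were run in R, so Python is used here only for illustration, and the function name is hypothetical.

```python
import math


def log_odds_to_percent(log_odds):
    """Inverse logit, scaled to percent: 100 / (1 + exp(-log_odds))."""
    return 100 / (1 + math.exp(-log_odds))


# A log-odds estimate of 0 corresponds to chance level in a two-alternative task.
print(log_odds_to_percent(0.0))

# Odds of 3:1 (log odds = ln 3) correspond to a proportion of about 75%.
print(round(log_odds_to_percent(math.log(3)), 1))
```

Under this reading, a positive log-odds estimate for accuracy corresponds to a proportion above 50%, and a negative one to a proportion below 50%.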
3.4.1 Analysis of response accuracy
3.4.1.1 Chance range and significance tests
Using a binomial sign test, the accuracy score that indicates a robust performance above chance within one gate was calculated. This was done using the function binom.test from the R base package stats, with a one-sided test (alternative “greater”). For an additional reassurance of robustness, we checked whether a performance above chance was constant for successive gates within participants. The significance of observed differences between gates was calculated using a GLMM. Following the first research question, gate was included as a predictor coded with a sliding difference contrast. As a result, the linear model successively compared the levels of the factor gate against each other—g2 was compared to g1, g3 to g2, and so forth. The full model further comprised random effects of gate with correlating varying intercepts and slopes by participants and items. Condition was not included as a predictor as the predictions for the related research question were not specific to the no bracket or bracket condition.
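The logic of the one-sided binomial test can be sketched as follows. This is an illustrative Python sketch, not the binom.test call used in the study; the function names are hypothetical, and the number of trials n entering the test is an assumption that has to be supplied by the reader, since it is not stated in this section.

```python
from math import comb

def upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), i.e., the one-sided ("greater") p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def min_correct_above_chance(n: int, alpha: float = 0.05, p: float = 0.5) -> int:
    """Smallest number of correct responses out of n whose one-sided
    p-value falls below alpha under the chance probability p."""
    for k in range(n + 1):
        if upper_tail(n, k, p) < alpha:
            return k
    raise ValueError("no count reaches significance for this n and alpha")

# Hypothetical example with n = 20 trials (not the trial count of the study)
threshold = min_correct_above_chance(20)
threshold_proportion = threshold / 20
```

Dividing the returned count by n gives the accuracy score above which performance counts as above chance for that n.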
3.4.1.2 Post hoc ratings of response patterns and subgroup analysis
After data collection, the data were visualized and explored to get an overview of the distributions and to check for outliers and unexpected or interesting patterns. Two distinct response patterns were apparent when looking at the visualizations of the given responses per participant. To find out whether participants could potentially be grouped according to their response patterns, we asked six individuals with a background in experimental linguistics to match the visualizations per participant (as in Figure 6(a) and 6(b) in Section 4.2.3, but unsorted) to one of the proposed subgroups. The two recognizable subgroups were described to the raters in a neutral way that did not include any hypothesized background assumptions; thus, the descriptions define patterns resulting from response behaviors (pressing one or both buttons) and do not make any claims about response decisions (see below). There was also the opportunity to assign a participant to an alternative third group (Neither of the above (n)) if the response pattern did not fit either description.

Figure 6. (a) Scatterplots of response patterns of the participants assigned to the identification subgroup. The numbers refer to the participant IDs. (b) Scatterplots of response patterns of the participants assigned to the waiting subgroup. The numbers refer to the participant IDs.
The descriptions of the different response patterns given to the raters read as follows:
Group 1:
Participants in this group stuck to one button during the first gates for the vast majority of trials. This results in a response pattern with one condition mostly at an accuracy of 1 (i.e., correct) and the other at 0 (i.e., incorrect). At higher gate numbers, participants used both buttons and the overall accuracy increases.
Group 2:
Participants in this group used both buttons right from the beginning and throughout the experiment. This results in a response pattern with both conditions distributed across accuracy 1 and 0.
Group 3:
For participants in this group, it is impossible to deduce a certain response pattern. They neither have a visible tendency to fit in group 1 nor group 2.
Following the results of the rating, participants were categorized into subgroups.
To complement the percentage values of the agreement and assess rating reliability while accounting for the possibility of guessing, Fleiss' kappa was computed. Free-marginal kappa was chosen here because the raters were not restricted regarding their distributions of cases into categories (Randolph, 2005). The additional variable subgroup was added to the results data frame and was used as a predictor variable, with gate nested below it in the maximal GLMM. Random intercepts and slopes for gate and subgroup by items were also included. Because subgroups are linked to participants, no random intercepts and slopes by participants were included. The applied sum contrast compared each factor level of subgroup to the grand mean. Thus, this model tested for statistically significant differences between the subgroups within each gate.
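As a rough illustration of the free-marginal kappa described by Randolph (2005), the following Python sketch computes the statistic from a table of rating counts. The function names and the data layout are hypothetical; the sketch assumes that every case was rated by the same number of raters and that chance agreement equals one divided by the number of categories.

```python
def observed_agreement(ratings: list[list[int]]) -> float:
    """ratings[i][j] = number of raters who assigned case i to category j.
    Returns the proportion of agreeing rater pairs, averaged over cases."""
    n_cases = len(ratings)
    n_raters = sum(ratings[0])
    agreeing_pairs = sum(count * (count - 1) for row in ratings for count in row)
    return agreeing_pairs / (n_cases * n_raters * (n_raters - 1))

def kappa_from_agreement(p_observed: float, n_categories: int) -> float:
    """Free-marginal kappa: chance agreement is fixed at 1 / n_categories."""
    p_expected = 1.0 / n_categories
    return (p_observed - p_expected) / (1.0 - p_expected)

def free_marginal_kappa(ratings: list[list[int]], n_categories: int) -> float:
    return kappa_from_agreement(observed_agreement(ratings), n_categories)

# Hypothetical toy data: two cases, six raters, three categories
toy_kappa = free_marginal_kappa([[6, 0, 0], [3, 3, 0]], n_categories=3)
```

Under the assumption that the interrater agreement reported in Section 4.2.3 (76.89%) corresponds to the observed agreement in this formula, kappa_from_agreement(0.7689, 3) rounds to the reported value of .65.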
3.4.1.3 Exploratory analysis of accuracy by speaker
Different speakers naturally exhibit individual patterns of prosodic cue usage. To explore differences between speakers and the performance within each speaker for each gate, a GLMM was run using the speaker (i.e., the person who had produced the coordinate structure in Huttenlauch et al., 2021) as a sum-contrasted predictor variable nested below gate. This model compared factor levels of speaker with a reference level (speaker 6) and the gate-wise increase in performance by the speaker. The choice of a (in this case arbitrary) reference level was required to make a speaker-comparison possible. The full model also contained random effects for the speaker, with correlated varying intercepts and slopes by participants. Because items are linked to speakers, varying intercepts and slopes were only calculated by participants.
3.4.1.4 Familiarization effects
Experimental tasks are often different from natural processing—so is a forced-choice decision task with gated stimuli. To check for familiarization effects, a unique variable for familiarity was created, based on the possibility that participants might have undergone adaptation to the task or learning over the course of the experiment as indicated by increasing accuracy scores. For this, the very first ten (out of 192) coordinate structures each participant encountered (split into seven gates, thus equivalent to the first 70 trials) were categorized as unfamiliar. All following trials (n = 1,274) were categorized as familiar, in the sense of postfamiliarization. A GLMM was set up evaluating familiarity as a sum-contrasted predictor of accuracy, including varying intercepts and slopes for familiarity by participants. In addition, a model with familiarity nested below subgroup was compared with the model solely including familiarity. The full model included varying intercepts and slopes for subgroups by items and for familiarity by participants, including correlation parameters. Because the speaker that each participant encountered initially varied according to randomization lists, an interaction of familiarity and speaker was also tested.
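The familiarity coding and the resulting trial counts can be sketched as follows. This Python sketch is illustrative only; the variable and function names are hypothetical, and it assumes that all seven gates of a coordinate structure receive the familiarity label of that structure's position in the participant's presentation order.

```python
N_STRUCTURES = 192            # coordinate structures per participant
N_GATES = 7                   # gates per structure
N_UNFAMILIAR_STRUCTURES = 10  # structures treated as the familiarization phase

def familiarity_label(structure_position: int) -> str:
    """structure_position: 1-based order in which a participant
    encountered the coordinate structure."""
    if structure_position <= N_UNFAMILIAR_STRUCTURES:
        return "unfamiliar"
    return "familiar"

# One label per trial (structure x gate) for a single participant
labels = [
    familiarity_label(position)
    for position in range(1, N_STRUCTURES + 1)
    for _gate in range(N_GATES)
]

n_unfamiliar = labels.count("unfamiliar")
n_familiar = labels.count("familiar")
```

With these settings the counts match the figures given in the text: 70 unfamiliar trials and 1,274 familiar trials per participant.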
3.4.2 Analysis of confidence ratings
As another complementary analysis, CRs were analyzed using a GLMM with the potential predictors accuracy, gate, condition, speaker, and subgroup. The full model comprised a random structure with correlated varying intercepts and slopes for all significant predictors by participants and items.
4 Results
Two participants had to be excluded from the analysis due to a performance at chance level or below at g7 (where the whole utterance was presented) or accuracy scores of more than two standard deviations (SDs) below the group mean at g7. The observed performance of these two individuals indicates a lack of ability to identify internal grouping correctly after all prosodic information was given, or possibly a lack of motivation for correct task execution. The remaining 43 participants were included in the data analysis (37 female, 6 male, mean age = 22.14, SD = 2.83, age range = 18–30 years).
4.1 Descriptive statistics
Table 1 provides an overview of means and SDs by gate for accuracy and CRs. Accuracy increases with higher gates, whereas SDs decrease. CRs also increase (i.e., confidence in given responses increases) with higher gate numbers.
Table 1. Means and SDs of accuracy (proportion correct) and confidence ratings (1–7) by gate (across all 43 participants).
Accuracy above chance.
4.2 Statistical analyses of response accuracy
4.2.1 Response accuracy in relation to chance and additional check of robustness
The accuracy value at which performance was robustly considered above chance is .65, resulting in g3 being the gate where performance exceeds the chance range at the group level. About 65% of the participants already scored above chance that early (see Table 2). At g5, nearly all participants scored above chance.
Table 2. Number and percent of participants (n = 43) with accuracy above chance by gate.
The additional sanity check (robustness of an above-chance score in subsequent gates per participant) revealed a less robust performance within participants for the first gate than for all following gates. That is, at g2, two out of the nine participants who had scored above chance at g1 no longer showed performance above chance. As of g3, however, the performance of all participants who scored above chance was constant for subsequent gates.
4.2.2 Generalized linear mixed model
The variable gate was a significant predictor of accuracy (p < .0001) and was included in the GLMM. Because the full model did not converge, it was reduced. The most complex converging model was a zero correlation parameter (zcp) model with varying intercepts and slopes for gate by participants and items. Fixed effects are displayed in Table 3. Statistically significant differences between gates were found for g5 compared with g4, g6 compared with g5, and g7 compared with g6.
Table 3. Fixed effects of the model on accuracy by gate.
Note. A sliding difference contrast was used to successively compare adjacent factor levels. Estimates are presented as in the original model output (log odds) as well as in percentage increase from gate to gate. Statistically significant effects are marked in bold (p < .05).
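To make the sliding difference contrast concrete, the following Python sketch builds the corresponding contrast matrix for a factor with k levels. It is an illustration rather than the authors' code; the function name is hypothetical, and the layout is intended to mirror the matrix produced by contr.sdif in the R package MASS.

```python
from fractions import Fraction

def sliding_difference_contrast(k: int) -> list[list[Fraction]]:
    """Return a k x (k - 1) contrast matrix (rows = factor levels).
    With 1-based indices, column j encodes the comparison of level j + 1
    with level j: rows up to j get -(k - j)/k, rows after j get j/k."""
    matrix = []
    for level in range(1, k + 1):
        row = []
        for comparison in range(1, k):
            if level > comparison:
                row.append(Fraction(comparison, k))
            else:
                row.append(Fraction(-(k - comparison), k))
        matrix.append(row)
    return matrix

# Seven gates give six successive comparisons (g2 vs. g1, ..., g7 vs. g6)
gate_contrasts = sliding_difference_contrast(7)
```

Each column sums to zero, and moving from one level to the next changes exactly one column by 1, which is why each coefficient can be read as the difference between two adjacent gates.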
4.2.3 Ratings of answer patterns
Interrater agreement was 76.89% with a free-marginal kappa value of .65 (95% CI: .54, .77). Twenty-six participants were assigned to the identification pattern subgroup and 17 to the waiting pattern subgroup (see Figure 6(a) and 6(b)). None of the participants were assigned to the third option (neither of the above). It is noteworthy that all participants who already scored above chance at g1 (see Table 2) were rated as belonging to the identification pattern subgroup.
4.2.4 Subgroup analysis
There was strong evidence for subgroup as a predictor of accuracy (p < .0001). The maximal model including varying intercepts and slopes for gate and subgroup by items did not converge; thus, the results from a model with correlated varying intercepts but no slopes are reported (see Table 4). Figure 7 additionally shows accuracy per gate and subgroup.
Table 4. Fixed effects for the model including subgroup (identification pattern [i]/waiting pattern [w]) nested below gate as a predictor of accuracy.
Note. Estimates refer to the intercept and are presented in the original model output (log odds) as well as in the percent difference between the two subgroups. Significant effects are marked in bold (p < .05).

Figure 7. Boxplots of accuracy (in %) by gate and subgroup, identification pattern group (i)/waiting pattern group (w).
The model revealed that up to g5, accuracy was significantly higher for the identification pattern subgroup compared with the subgroup of participants classified as showing the waiting pattern. There were no significant differences in accuracy between the subgroups for g6 and g7.
4.2.5 Accuracy by speaker
There was no evidence for speaker as a single predictor of accuracy in a model comparison (p = .427). A model including speaker nested below gate attained a significantly better fit to the data (p < .0001). The full model including random slopes did not converge. The results were thus extracted from a model with correlated varying intercepts. For the productions of speakers 6 and 11, the model revealed a significant increase in listeners’ accuracy with increasing gates (values are given for speaker 6 and speaker 11, respectively): g2 compared with g1 (3.615%, p = .03; 5.581%, p = .0002), g3 to g2 (3.709%, p = .03; 4.663%, p = .004), g5 to g4 (9.859%, p < .0001; 8.642%, p < .0001), and g6 compared with g5 (12.019%, p < .0001; 0.452%, p = .002). For speakers 10 and 16, respectively, a significant increase in listeners’ accuracy was found for g5 compared with g4 (9.326%, p < .0001; 12.745%, p < .0001) as well as for g6 compared with g5 (10.838%, p < .0001; 7.571%, p < .0001).
Figure 8 complements the analysis results. For productions stemming from two speakers (speaker 6 and speaker 11), accuracy already improved significantly at early gates. Only later, that is, at higher gates, did listeners’ performance also increase for speaker 10 and speaker 16, with a more pronounced effect for speaker 16.

Figure 8. Boxplots for accuracy (in %) by gate and speakers (6, 10, 11, 16).
4.2.6 Familiarization effects
Familiarity was a significant predictor of accuracy in the model comparison (p < .0001). There was no evidence for an interaction between familiarity and speaker (p = .193). The GLMM including familiarity nested below subgroup fit the data significantly better than the model including familiarity as a single predictor (p < .0001). Results are reported from the full model. The effect of familiarity was significant (p = .002), with a 3.194% higher accuracy for familiar than for unfamiliar items. Participants with an identification pattern outperformed participants with a waiting pattern in both the familiarization phase, by 12.457%, and the postfamiliarization phase, by 13.702%.
4.3 Statistical analysis of confidence ratings
Except for condition (p = .361) and speaker (p = .361), all tested predictor variables were significant (p < .0001 for gate, p < .0001 for accuracy, p = .013 for subgroup) in the model comparison. Due to a convergence failure, the full model was reduced. The most complex model that converged included correlated varying intercepts by participants and items. The model predicted correct answers to be linked to higher CR scores than incorrect answers (4.371%, p < .0001). The effect of subgroup was also significant (p = .012), with participants of the identification pattern subgroup scoring 12.491% higher in CRs than participants of the waiting pattern subgroup. Comparisons between gates were highly significant for g2 compared with g1 (5.336%, p < .0001), g3 to g2 (4.529%, p < .0001), g4 to g3 (3.822%, p < .0001), g5 to g4 (5.485%, p < .0001), as well as for g6 compared with g5 (5.391%, p < .0001). Figure 9 shows the increasing proportion of high CRs in higher gates.

Figure 9. Bar plot of CRs (ratings from one to seven, in proportions) by gate.
4.4 Summary of results
This study investigated listeners’ ability to exploit early prosodic cues in coordinated three-name sequences to identify the internal grouping of the constituents. At the group level, accuracy exceeded the chance range at g3. Gate-wise comparisons of accuracy were significant for g5 compared to g4, g6 to g5, and g7 to g6. Ratings of the response patterns of participants revealed two subgroups: participants with a waiting pattern primarily stuck to one response button (i.e., one choice) during the first gates up to g5 (Name2), whereas participants with an identification pattern used both response buttons right from the beginning. Subgroup also was a significant predictor of accuracy: the identification pattern subgroup significantly outperformed the waiting pattern subgroup at all gates up to g5 (Name2). Similarly, the identification pattern subgroup already scored above chance at g2 (and the following gates), whereas the waiting pattern subgroup only exceeded the chance range at g5. Also, all participants who already scored above chance at the first gate (n = 9) belong to the identification pattern subgroup.
The experimental stimuli stemmed from four different speakers, and thus speaker was included as a predictor within each gate. For productions of speakers 6 and 11, accuracy already increased significantly at early gates, starting with g2 compared with g1. For speakers 10 and 16, an increase in accuracy was not statistically significant until later gates, starting with g5 as compared with g4. Although the visualizations of the speaker-specific cue use in the source material are not part of the statistical analyses, we relate our findings to their features for an easier interpretation. In Figure 3, the mean F0-range on Name1 does not overlap between bracket and no bracket conditions for speakers 6 and 11, but it does overlap for the remaining two speakers. The time-normalized F0 contours in Figure 4 show that the F0 contours of the bracket and the no bracket condition start to diverge on Name1 (g2) not only for speakers 6 and 11, but also for speaker 10, although not for speaker 16.
Regarding a possible familiarization, accuracy slightly increased across participants after the first 70 trials, but there was no interaction of familiarity and speaker. Again, the identification pattern subgroup outperformed the waiting pattern subgroup in both phases, the familiarization phase (i.e., the first 70 trials) and the postfamiliarization phase. CRs increased gradually across gates (see descriptive statistics in Table 1 as well as Figure 9), which was confirmed by the corresponding GLMM—all gate comparisons up to g6 were statistically significant. Confidence in correct trials was rated higher than confidence in incorrect trials. Furthermore, participants assigned to the identification pattern subgroup had higher confidence in their answers than participants with a waiting pattern. Speaker and condition (bracket/no bracket) did not significantly contribute to explaining variance in the model addressing CRs.
5 Discussion
This study was designed to gain insights into the role of early, scarcely investigated prosodic cues in the perception of coordinated three-name sequences with and without internal grouping of the first two names. “Early” refers to the location of the cues, namely on/after the first name (Name1), that is, before the most salient prosodic cues on/after the second name (Name2), which is at the group edge. The overarching question was whether these early cues can be used to predict the syntactic structure of the evolving utterance. More precisely, at which gate are listeners able to reliably distinguish between sequences with or without internal grouping (RQ 1)? A second aim was to explore variability in listeners’ respective parsing capacities (RQ 2). Stimuli consisted of three-name sequences that were cut into seven parts (gates, g1–g7) and that were presented to participants with successively increasing length and thus, an increasing amount of prosodic information. The analysis of response accuracy of a two-alternative forced choice decision task was complemented by an analysis of the individual CRs.
In general, the findings are in line with the prediction that listeners can reliably detect the internal grouping after Name2: at the related gate (g5), almost all participants’ performances (97.67%) exceeded the chance range. One participant did not score above chance until g6, possibly due to the cutting of the stimuli: the pause cue is only reliably perceivable in the following gate (g6) because silence at the end of g5 is indistinguishable from the end of the gated recording. The same holds for the pause cue that is present at g2 (Name1)—it will only be reliably perceivable at g3. Additional prosodic information may possibly be located at the coordinating conjunction und (English “and”), which is present at both g3 and g6.
We will now discuss our findings with respect to our research questions. Regarding RQ 1 (the processing of early prosodic cues), group-level performance was above chance at g3; thus, a reliable detection of internal grouping was already possible shortly after Name1. Gate 3 corresponds to the snippet containing the first name and the following coordinating und. It is therefore the part of the utterance where the Proximity/Similarity Model by Kentner and Féry (2013) predicts prosodic differentiation between structures with and without internal grouping and where differences in the use of prosodic cues have been observed in production studies (Huttenlauch et al., 2021; Kentner & Féry, 2013). The visual inspection of F0 movement in the source material matches this prediction for three out of four speakers: diverging F0 contours are observable as of the second syllable of Name1 (g2) for speakers 6, 11, and 10. Our results suggest that listeners in the current study were able to exploit these early cues for disambiguation. Furthermore, with respect to RQ 2, the study results indicate individual differences among listeners: at least 20% of the participants were able to make reliable decisions about the internal grouping even earlier than the group mean, namely already at the first two gates corresponding to Name1 (20.93% of the listeners at g1, 41.86% at g2; see Table 2), and more than half of the listeners made a reliable decision at the gate before Name2 (g3, 65.12%). Note that performance was constantly above chance for subsequent gates among these participants; hence, we consider the above-chance performance to be quite robust.
Overall, the observation of variability between listeners was underlined by the finding of two subgroups: 26 listeners were classified into an identification pattern subgroup and 17 into a waiting pattern subgroup through a rating of their response patterns. For the identification pattern subgroup, the clear above-chance performance at g2 and the fact that all participants who could already reliably judge the internal grouping at g1 belong to this subgroup indicate a high prosodic parsing capacity. However, for the waiting pattern subgroup, it is not clear whether their chance performance up to g5 is due to varying prosodic parsing capacities or varying strategies for task completion. The former assumption is supported by the finding of listener variability in prosodic parsing that has been observed in different experimental tasks and with different speech materials by Cangemi et al. (2015), Cole et al. (2017), and Roy et al. (2017). The analysis of CRs suggests that participants in the identification pattern subgroup were also more confident about their given answers than participants in the waiting pattern subgroup. This may be interpreted as an indication of enhanced parsing skills in the identification pattern subgroup and further supports the existence of clear differences between the individuals in the two subgroups. These differences are also supported by the fact that accuracy remained significantly higher for participants with an identification pattern than for the waiting pattern subgroup up to g5, where we find the late prosodic cues at the second syllable of Name2. Furthermore, the effect size for differences between subgroups per gate is the highest at g2, where early prosodic information related to Name1 is located. The statistically significant differences at early gates (g1, g2, g3) for the subgroups are presumably related to individual differences with respect to sensitivity to F0 cues. The visual inspection of F0 movement (see Figure 3 in Section 3) as produced by the speakers of our stimuli suggests the systematic use of F0 to distinguish between no bracket and bracket conditions. These individual differences are in line with the listener-specific attention to pitch-related features described by Baumann and Winter (2018) for prosodic prominence.
Now, we will discuss the exploratory analyses on speaker-related processing. Identifiability of internal grouping seems to depend not only on parsing capacities or internal strategies of the listener but also on the cues that are produced by the speaker. This assumption is based on the complementary analysis of accuracy by speaker: for two out of four speakers, the statistical models predict a significant improvement in accuracy already at early gates, before Name2. These findings are in line with previous findings on speaker-dependent accuracy in prosodic parsing tasks (Cangemi et al., 2015; Swerts & Geluykens, 1994). Figures 3 and 4, which show descriptive visualizations of the three prosodic cues investigated by Huttenlauch et al. (2021), reveal differences in cue use between the individual speakers that go along with the response behavior of the listeners. For speakers 6, 10, and 11 in Figure 4, the means and SDs of the smoothed time-normalized F0 contours of the bracket versus the no bracket condition diverge from each other on the second syllable of Name1 (i.e., the time domain of g2), whereas they completely overlap in the productions of speaker 16. For the former three speakers, the listeners’ mean accuracy was already above the chance range at g2, whereas it was below for the latter (cf. Figure 8). Although it needs to be clarified whether the difference on Name1 produced by speakers 6, 10, and 11 is audible, the observation provides an indication that F0 is used by listeners to predict the internal grouping structure. In contrast, a visual inspection of final lengthening on Name1 (Figure 3) does not reveal clear differences between conditions in any of the speakers. Of course, we are aware that caution is needed when drawing conclusions based on the visual inspection of graphs. Future research should therefore verify this observation statistically.
Interestingly, speaker-related differences in disambiguation are not mirrored by the analysis of participants’ confidence ratings at the first gates containing early cues; that is, participants were not more confident in their decisions on speakers 6, 10, or 11 than on speaker 16. Thus, listeners were probably not aware of the fact that certain speakers supplied obviously more “useful” cues than others.
Finally, we will discuss another exploratory analysis we ran to account for the rather artificial nature of the experimental task: we investigated the influence of familiarity with the task. The analysis revealed a mild improvement in participants’ performances after a familiarization phase of the first 70 trials. The superior performance of the identification pattern subgroup is present in both phases, suggesting that participants of this subgroup did not acquire their superior parsing capacities over the course of the experiment but already had them at the outset. At the same time, the waiting pattern subgroup did not benefit more than the identification pattern subgroup from the familiarization with the task. Overall, familiarization with the task did not seem to have a decisive influence on the skills that were used to solve it. As we could not find an interaction between familiarity and speaker, our results tend to support a syntagmatic process in which listeners identify prosodic features by means of variation in the local context, as opposed to changes perceived in relation to a speaker-specific prosodic space.
With respect to our overarching research question, predictions about the syntactic structure of the whole name sequence seem to be possible based on early prosodic cues on Name1, and about 65% of the listeners are sensitive to this early information. Listeners additionally face the difficulty of compensating for a high degree of individual speaker differences. Rapid integration of incoming prosodic information into the parsing process may be an especially rewarding effort in structures of higher complexity than coordinated name sequences.
In any case, the findings of this study underline the global nature of prosodic boundaries as they are already indicated by earlier cues in an utterance which can be effectively used by (at least some) listeners for syntactic parsing. Especially regarding F0 as a cue to internal grouping, it seems necessary to consider the whole time course over which it unfolds, as at least some speakers modulate F0 right at the beginning of the utterance to distinguish between bracket and no bracket conditions and it seems that some listeners use this information for disambiguation. Boundary phenomena, thus, should not be investigated solely as local phenomena, detached from the whole prosodic context, but in a more global manner.
For further investigation of the processing of early cues, a study using the visual world paradigm would be a valuable method. By using eye tracking, results from gated stimuli could be compared with those from ungated stimuli (as in Allopenna et al., 1998), to determine how cue exploitation in the gating paradigm compares to processing in a more natural setting. This would also make it possible to corroborate our findings on individual variability in listeners’ integration of prosodic markers for ambiguity resolution over the course of prosodic parsing. It would also be interesting to test whether our results can be replicated with a wider range of productions and/or productions from more natural settings. As demonstrated by Clifton et al. (2006), among listeners, prosodic cues appear to have larger implications for perception in shorter than in longer constituents. Furthermore, prosody was observed to be especially crucial in disambiguating utterances that differ in interpretation with respect to the intended grouping (Watson & Gibson, 2004). Moreover, it would be interesting to investigate the perception-production link and to see whether individuals who produce stronger early prosodic cues perform better at perceiving and exploiting these cues than individuals who do not produce clear early prosodic cues.
6 Conclusion
The results of this study strongly indicate variability among listeners regarding prosodic parsing: some listeners were already able to correctly predict at the first name whether it belongs to a three-name sequence with or without internal grouping of the first two names. This suggests that these listeners were sensitive to prosodic cue information that is located earlier in the name sequence than the prosodic cues at the end of the grouping (referred to as late cues on the second name). Other listeners were not able to correctly identify the prosodic pattern until the end of the second name. In addition to individual parsing capacities, listeners’ responses showed sensitivity to speaker-specific variability that matches the individual differences in prosodic cues observed for the speakers from whom the productions stemmed. The speakers whose productions received the highest accuracy scores at early gates show visible differences in F0 on the first name between grouping conditions. As we did not specifically analyze possible facilitation effects of specific prosodic cues for perception, statistical verification of this observation remains for future research. Overall, the data support the notion that prosodic marking of internal grouping is not a local phenomenon but rather unfolds globally over the course of an utterance, and that early prosodic cues provide meaningful information which can be exploited for ambiguity resolution, at least by a subset of listeners.
Supplemental Material
Supplemental material, sj-csv-3-las-10.1177_00238309221127374 for Individual Differences in Early Disambiguation of Prosodic Grouping by Marie Hansen, Clara Huttenlauch, Carola de Beer, Isabell Wartenburger and Sandra Hanne in Language and Speech
Supplemental material, sj-pdf-1-las-10.1177_00238309221127374 for Individual Differences in Early Disambiguation of Prosodic Grouping by Marie Hansen, Clara Huttenlauch, Carola de Beer, Isabell Wartenburger and Sandra Hanne in Language and Speech
Supplemental material, sj-rmd-2-las-10.1177_00238309221127374 for Individual Differences in Early Disambiguation of Prosodic Grouping by Marie Hansen, Clara Huttenlauch, Carola de Beer, Isabell Wartenburger and Sandra Hanne in Language and Speech
Footnotes
Acknowledgements
We are very grateful for the comments of two anonymous reviewers and the associate editor, which helped to improve the paper considerably.
Data availability statement
Data and code are available via SAGE.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project number 317633480—SFB 1287.
References
