Abstract
Conversation is the natural setting for language learning and use, and a key property of conversation is the smooth taking of turns. In adult conversations, delays between turns are minimal (typically 200 ms or less) because listeners display a striking ability to predict what their partner will say, and they formulate a response before their partner’s turn ends. Here, we tested how this ability to coordinate comprehension and production develops in preschool children. In an interactive paradigm, 106 children (ages 3–5 years) and 48 adults responded to questions that varied in predictability but were controlled for linguistic complexity. Using a novel distributional approach to data analysis, we found that when children can predict a question’s ending, they leave shorter gaps before responding, suggesting that they can optimize the timing of their conversational turns like adults do. In line with a recent ethological theory of turn taking, this early competency helps explain how conversational contexts support language development.
For adults, conversation is the most natural form of social interaction, and a key characteristic of conversation is the smooth taking of turns. Timely turn taking appears to be universal (Levinson, 2006): Across diverse languages and cultures, conversational partners often switch turns, rarely speak over one another, and typically leave only the smallest possible silent gap between turns (Stivers et al., 2009). The brevity of such silent gaps (often < 200 ms) is particularly striking because linguistic formulation takes time: Preparing to name a picture, for instance, takes at least 600 ms (Indefrey & Levelt, 2004). Thus, speakers must begin formulating their responses even while their partner is still talking, and this act of formulation must itself be based on a prediction about what their partner was going to say (Bögels, Magyari, & Levinson, 2015; Corps, Crossley, Gambi, & Pickering, 2018; Levinson, 2016; Magyari, Bastiaansen, de Ruiter, & Levinson, 2014; Riest, Jorschick, & de Ruiter, 2015; Sacks, Schegloff, & Jefferson, 1974; though see Sjerps & Meyer, 2015). In other words, the optimal timing of conversational turns depends on an ability to coordinate language comprehension (prediction) and production (formulation).
In recent work, Levinson (2006, 2016) argued that fluent turn taking may be “part of our ethology” (Levinson, 2016, p. 10); that is to say, it is an evolved adaptive behavior that facilitates communication and a skill on which the development of other linguistic and cognitive skills might depend (Clark, 2007; S. A. Gelman, 2009; Hirsh-Pasek et al., 2015; Zimmerman et al., 2009). Therefore, he proposed that the ability to engage in fluent turn taking, including the ability to coordinate prediction and formulation, should emerge early in development. And indeed, observational studies show that prelinguistic infants engage in proto-conversational interactions with their caregivers (Jaffe, Beebe, Feldstein, Crown, & Jasnow, 2001; Murray & Trevarthen, 1986) and that these interactions are well coordinated and well timed (Hilbrink, Gattis, & Levinson, 2015).
However, as children begin to interact using language, their fluency at turn taking declines, such that they leave longer gaps during linguistic turn taking than during previous nonlinguistic turn taking (Hilbrink et al., 2015). One recent corpus analysis found that the median gap left by toddlers and preschoolers was greater than 600 ms (i.e., hundreds of milliseconds longer than the typical gap left by adults; Casillas, Bobb, & Clark, 2016), whereas other work reported median gaps of greater than 1,000 ms, even in 5-year-olds (Garvey & Berninger, 1981; Stivers, Sidnell, & Bergen, 2018). This developmental trajectory—from well-timed nonlinguistic interactions to less fluent linguistic interactions—raises the possibility that preschoolers may in fact lack competence at flexibly coordinating linguistic prediction and formulation. For example, they may start formulating a response only when their partner is about to stop speaking (as Sjerps & Meyer, 2015, suggested for adults), which would also limit the overlap between comprehension and production and thus make switching between tasks less taxing (Zelazo et al., 2003).
In response to the evidence that children’s turn taking becomes less fluent over time, Levinson (2016) argued that the poor timing of children’s conversational turn taking is not indicative of an underlying lack of competence but instead results from “the challenge of cramming . . . complex linguistic material into brief turns” (p. 10; see also Casillas et al., 2016). Thus, children can flexibly coordinate prediction of what a conversational partner will say with early formulation of a response, but compared with adults, they find it hard to linguistically encode the ideas that they want to express, and thus they respond with a delay. Consistent with this, findings from eye-tracking studies suggest that children are skilled predictors. For instance, when 1-year-olds observe two people conversing, they anticipatorily gaze to the next speaker as the turn changes (Casillas & Frank, 2017; see also Keitel, Prinz, Friederici, von Hofsten, & Daum, 2013), and even 2-year-olds can predict upcoming words when listening to simple sentences (Borovsky, Elman, & Fernald, 2012; Gambi, Pickering, & Rabagliati, 2016; Mani & Huettig, 2012; Rabagliati, Gambi, & Pickering, 2015). But importantly, this experimental work assessed only how children process language while passively listening.
To directly test the key claim that children can coordinate prediction and formulation, we developed a paradigm in which children were active contributors to an experimentally controlled conversation, embedded in an interactive touch-screen maze game. Our design was inspired by the finding that adults leave shorter gaps after questions containing earlier informative material (e.g., “Which character, also called 007, appears in the famous movies?”) than questions containing later informative material (e.g., “Which character from the famous movies is also called 007?”), suggesting that they prepare a response as soon as possible (Bögels et al., 2015). We built on this demonstration to test whether children can coordinate prediction of a question with early preparation of their response.
In our paradigm, children conversed with an avatar (Peter Pan) as he played hide and seek with a parrot around a maze. When Peter Pan reached a fork in the maze, he encountered two familiar cartoon characters (e.g., Po and Boots), and the parrot hid behind one of them (see Fig. 1 for a visual depiction of each condition and Table 1 for an overview of the experimental design). Peter Pan then asked the participant where the parrot was hiding, to which the participant responded verbally. We expected children to leave shorter conversational gaps when the informative content (the name of one of the cartoon characters) arrived earlier in the question (e.g., “Is Po hiding the parrot?”) than when it arrived later (e.g., “Is the parrot behind Po?”).

Fig. 1. Visual representation of the four conditions. Characters are depicted schematically here, but participants saw actual depictions of the characters when playing the game.
Table 1. Experimental Design and Summary of Predictions
To account for how extraneous linguistic differences between the questions may affect response times, we created control mazes in which Peter Pan chased and asked about two animals (a parrot and a tiger; see Fig. 1). At each fork, the two animals hid behind different cartoon characters, and Peter Pan asked where one had hidden (e.g., “Is Po hiding the parrot?” or “Is Po hiding the tiger?”). Thus, because Peter Pan could ask about either animal, participants could not accurately predict what he would say, regardless of whether the relevant information came early or late in the question. Crucially, if participants could use predictions to begin formulating a response before a question ended, then we expected an interaction between the information structure of the question and the type of maze: In one-animal mazes, participants should be faster to respond after early than late questions, and this difference in response times should be greater than in two-animal mazes, in which prediction was not possible.
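The predicted interaction can be read as a difference of differences. The sketch below is illustrative only: the function names and the condition means are hypothetical and are not taken from the study data.

```python
def early_advantage(late_ms, early_ms):
    """Reduction in gap (ms) when the informative word arrives early."""
    return late_ms - early_ms


def interaction_contrast(one_animal, two_animal):
    """Early advantage in predictable (one-animal) mazes minus the
    early advantage in unpredictable (two-animal) mazes."""
    return (early_advantage(one_animal["late"], one_animal["early"])
            - early_advantage(two_animal["late"], two_animal["early"]))


# Hypothetical condition means in ms, for illustration only (not study data).
one_animal = {"early": 400.0, "late": 550.0}
two_animal = {"early": 500.0, "late": 560.0}

contrast = interaction_contrast(one_animal, two_animal)
# A positive contrast means the early advantage is larger when prediction
# is possible, which is the pattern the design is meant to detect.
print(contrast)
```

Because any purely linguistic difference between early and late questions is present in both maze types, it cancels in this contrast.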
To test for the presence of this interaction in adults and preschoolers, we adopted a novel distributional data-analysis method. Because the distribution of gaps in turn taking is heavily skewed to the right (especially for children), standard statistical comparisons between means are problematic, because skew can mask true differences in response times or induce spurious ones. We thus moved away from analyzing the mean and instead modeled how the full distribution of response times changed across conditions (Umlauf & Kneib, 2018). Our statistical model was based on the ex-Gaussian distribution, in which a Gaussian distribution and exponential distribution are convolved. This provides an excellent fit to right-skewed data: The mean of the Gaussian distribution captures shifts in modal response times, and the rate of the exponential distribution captures changes to the weight of the right tail (Balota & Yap, 2011). Developmental researchers have rarely used ex-Gaussian analyses because such models are typically fitted to individual participants and thus require large numbers of observations. Here, however, we built on recent statistical advances (Umlauf & Kneib, 2018) to fit a hierarchical, or multilevel, ex-Gaussian model, in which observations are partially pooled across participants (A. Gelman & Hill, 2007), resulting in a rich description of response dynamics during turn taking.
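For reference, the textbook form of the ex-Gaussian can be written as follows. This is a sketch under the assumption that µ and σ denote the mean and standard deviation of the Gaussian component and τ the mean (inverse rate) of the exponential component; the software used for fitting may parameterize the distribution differently. Φ is the standard normal cumulative distribution function.

```latex
X = G + E, \qquad G \sim \mathcal{N}(\mu, \sigma^{2}), \qquad E \sim \mathrm{Exponential}(1/\tau)

f(x \mid \mu, \sigma, \tau)
  = \frac{1}{\tau}
    \exp\!\left( \frac{\mu - x}{\tau} + \frac{\sigma^{2}}{2\tau^{2}} \right)
    \Phi\!\left( \frac{x - \mu}{\sigma} - \frac{\sigma}{\tau} \right)

\mathbb{E}[X] = \mu + \tau, \qquad \operatorname{Var}[X] = \sigma^{2} + \tau^{2}
```

Under this form, a change in τ moves the mean even when µ is unchanged, which is one reason comparisons of means can be hard to interpret for skewed data.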
Method
Participants
We recruited 70 younger children (32 female; age: M = 40.7 months, range = 36–47 months) and 50 older children (25 female; age: M = 62.6 months, range = 54–71 months). We planned to recruit only 5-year-old children in the older group, but because of recruitment difficulties, 11 children were younger than 5. To avoid confusion, we thus refer to the two groups as younger and older children, respectively. All were native English speakers, although it was reported that 8 children also heard a second language at home. Twelve younger children and 2 older children were excluded because they had a speech delay (n = 1), did not understand how to play the game (n = 1), or showed no interest in the game (n = 10) or because of experimenter error (n = 2). We also tested 48 English-speaking adults from the University of Edinburgh (30 female; age: M = 20.6 years, range = 18–35 years).
Sample sizes were based on unpublished work from our laboratory, which showed that lexical factors affected the timing of turn taking in a sample of 24 adults. That paradigm was within subjects, whereas the present paradigm was between subjects; therefore, we doubled the sample size. The sample sizes for children were larger to counter the fact that children did not always complete the full experiment. Data collection stopped when the first author (the experimenter) reached the end of her degree. At that point, we had collected roughly as many child trials (across the two age groups) as adult trials. All adult participants and child caregivers gave written consent prior to beginning the study, and children provided verbal assent. The procedure was approved by the ethics committee of the University of Edinburgh School of Philosophy, Psychology and Language Sciences.
Materials and procedure
The game was presented on an iPad and written using Swift (Version 1.2; Swift Core Team, 2015). Participants were told that the goal of the game was to help Peter Pan navigate a set of mazes while searching for either one or two animals and that they should answer his questions as quickly as possible. Participants completed four mazes (i.e., blocks of trials), and each maze contained 24 forks (i.e., trials). The number of animals (maze type) was varied between participants, and the identity of the animal used in one-animal mazes (parrot or tiger) was counterbalanced across participants. Information structure (early: “Is Po hiding the parrot?” vs. late: “Is the parrot behind Po?”) was manipulated within participants.
Between trials, participants moved Peter Pan around the maze using the touch screen. A trial began when the participant reached a fork in the maze, whereupon the game froze and a zoomed-in version of the fork appeared, with a cartoon character marking each direction. In two-animal mazes, the two animals that Peter Pan was looking for were each hiding behind one of the cartoon characters. In one-animal mazes, the animal that Peter Pan was looking for was hiding behind one of the cartoon characters, while the other cartoon character hid nothing (see Fig. 1). As soon as the zoomed-in version of the maze appeared, Peter Pan solicited the participant’s help (“Can you help me?”), and after 3.5 s, he asked the critical question (e.g., “Is the parrot behind Po?”). On half of the trials, Peter Pan’s question required an affirmative answer, and on the other half of the trials, it required a negative answer (e.g., “Is Boots hiding the parrot?” or “Is the parrot behind Boots?” in relation to the example shown in Fig. 1). We chose simple yes/no answers to minimize the complexity of the child’s response, on the assumption that this would maximize the power to detect concurrent prediction and preparation.
The experimental trials used 48 well-known cartoon characters (12 per maze). Because participants tend to answer earlier when questions are longer (Corps et al., 2018; De Ruiter, Mitterer, & Enfield, 2006), we controlled for question length (both the actual length and the length potentially predicted by the participant). On half of the trials, both characters had short names (one or two syllables; mean length = 608 ms), and on the other half, both characters had long names (three or more syllables; mean length = 1,037 ms); this was true for early and late questions. Parrot and tiger were chosen as animal names because they are length matched and have similar log frequencies (4.75 and 4.62, respectively) in the children’s television subsection of the SUBTLEX-UK database (Van Heuven, Mandera, Keuleers, & Brysbaert, 2014). Finally, overall differences in length between early and late questions were controlled for by the experimental design, as we compared across one- and two-animal mazes.
A female speaker of Scottish English recorded the 192 possible questions (1 per character per condition) in a quiet room, using slow, child-directed prosody. After recording was completed, a 300-ms pause was inserted before the last word to ensure that participants had sufficient time to predict the final word and use that prediction to prepare a response. The pause sounded natural and was in keeping with the slowness of the child-directed speech used in the recordings. However, although the inclusion of the pause should have reduced performance demands on participants, it also partially limited the generalizability of our findings to more naturalistic conversations.
Participants completed two practice trials before the study began. Before each block, they were shown each of the cartoon characters that they would see in that block; the characters were named by the experimenter and repeated out loud by the participants. The block began after the experimenter judged that the participant knew each character’s name. Participants’ spoken responses were recorded using the internal iPad microphone and were coded off-line.
Coding
Audio recordings were coded by the first and second authors and three trained research assistants using Praat software (Version 5.4.18; Boersma & Weenink, 2019). Response times were measured from question offset to the onset of the first speech sound. If the answer was preceded by a filled pause, we measured to the onset of the filled pause to ensure greater consistency (because the filled pause was often coarticulated with the answer); prespeech inhalations were excluded because these were not reliably picked up by the microphone. Responses were also transcribed to determine accuracy. Trials were excluded if participants failed to answer the question, if the trial’s question had to be repeated, or if there was too much background noise to allow coding. In total, 14.3% of trials were excluded (adults: 2.2%, older children: 15.3%, younger children: 25.5%).
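The measurement rule above can be sketched as a small function. The function and argument names are hypothetical, and timestamps are assumed to be in milliseconds from the start of the recording.

```python
def response_gap_ms(question_offset_ms, answer_onset_ms, filled_pause_onset_ms=None):
    """Gap from question offset to the first speech sound.

    If a filled pause precedes the answer, the filled pause counts as the
    first speech sound. Negative values indicate an overlap.
    """
    if filled_pause_onset_ms is None:
        first_speech_ms = answer_onset_ms
    else:
        first_speech_ms = min(filled_pause_onset_ms, answer_onset_ms)
    return first_speech_ms - question_offset_ms


# Hypothetical annotations, for illustration only.
plain_gap = response_gap_ms(2000.0, 2450.0)
gap_with_filled_pause = response_gap_ms(2000.0, 2450.0, filled_pause_onset_ms=2300.0)
overlap = response_gap_ms(2000.0, 1900.0)
print(plain_gap, gap_with_filled_pause, overlap)
```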
Data analysis
Conventionally, analyses of chronometric experiments evaluate whether different experimental conditions cause a difference in the mean response time and include the assumption that the distribution of response times across conditions has an identical shape. However, as Figure 2a makes clear, the shape of the distribution of response times can be markedly different between conditions. In this experiment, for example, the distribution of child response times was much more right skewed than the distribution of adult response times. When standard statistical analyses are used, these large differences in the distributions could potentially mask differences in mean response times or indeed induce spurious mean differences. Thus, rather than simply compare condition means, we analyzed how the distribution of response times varied across conditions.

Fig. 2. Results. The distribution of response times (a) is shown as a function of age group (adults vs. children), information structure (early vs. late), and maze type (one animal vs. two animals). The adult panels combined are based on 4,528 data points, whereas the child panels combined are based on 4,863 data points. Mean response time (b) is plotted as a function of maze type and information structure, separately for each age group. Error bars represent 95% by-participant confidence intervals bootstrapped over 1,000 samples. Mean response times were calculated after excluding data points greater than 1,500 ms, as in the Gaussian analyses.
To capture the shape of both the adult and the child distributions, we assumed that the response times followed an ex-Gaussian distribution (Balota & Yap, 2011), which is the convolution of a Gaussian distribution and an exponential distribution (and thus subsumes the Gaussian as a special case). The ex-Gaussian distribution is known to provide an excellent fit to empirical response time distributions, particularly because it can model their long right tail. The distribution has three parameters: µ, the location of the Gaussian component; σ, the spread (standard deviation) of the Gaussian component; and τ, the inverse rate of the exponential component that accounts for the thickness of the right tail. Importantly, these three parameters can be differentially affected by one and the same experimental manipulation (Balota & Yap, 2011), which reflects the fact that response times are the result of a combination of multiple processing steps as well as more unusual events (e.g., distractions, cognitive overload), which can cause highly delayed responding. For example, an effect on τ in the absence of an effect on µ could indicate that an experimental manipulation does not delay typical processing steps but does increase the degree of distraction. A distributional analysis should thus provide a much more complete picture of how prediction and formulation affect response times during turn taking.
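A short simulation may help show why the parameters are worth separating. The sketch below draws from two ex-Gaussian distributions that share µ and σ but differ in τ. All parameter values are illustrative assumptions, not estimates from the study, and the function name is hypothetical.

```python
import random
import statistics


def sample_ex_gaussian(mu, sigma, tau, n, seed=0):
    """Draw n values as Gaussian(mu, sigma) plus Exponential(mean tau)."""
    rng = random.Random(seed)
    # random.expovariate takes the rate, so the mean tau enters as 1 / tau.
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / tau) for _ in range(n)]


# Illustrative values in ms (assumptions): same location and spread,
# different weight in the right tail.
light_tail = sample_ex_gaussian(mu=400.0, sigma=80.0, tau=100.0, n=20000, seed=1)
heavy_tail = sample_ex_gaussian(mu=400.0, sigma=80.0, tau=600.0, n=20000, seed=1)

mean_light = statistics.mean(light_tail)
mean_heavy = statistics.mean(heavy_tail)

# The sample means approximate mu + tau, so they differ by roughly 500 ms
# even though the Gaussian location is identical in the two samples.
print(round(mean_light), round(mean_heavy))
```

A comparison of means alone would report a large difference between these two samples without showing that it comes entirely from the tail.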
Distributional analyses typically require many observations per participant to produce robust participant-level estimates (at least 50 per condition), more than could feasibly be provided by a 3-year-old. We overcame this limitation by building on recent advances in Bayesian statistics and distributional multilevel regression modeling (Bürkner, 2016; Carpenter et al., 2016), which allowed us to fit an ex-Gaussian model to the data in a multilevel fashion, using individual observations across all participants to leverage our by-participant estimates (A. Gelman & Hill, 2007). In particular, we tested how the parameters of the model—the Gaussian’s location µ and spread σ and the exponential rate τ—varied across the different conditions, while accounting for how the observations were hierarchically clustered within participants. This analysis is analogous to a linear mixed-effects regression, which models how the location (mean) of a distribution varies across conditions and participants, except that this analysis simultaneously models how all three parameters of the ex-Gaussian distribution vary across those predictors.
We used the same set of predictors to model each of the three parameters: fixed effects of maze type (i.e., one vs. two animals), information structure (early vs. late), and age (children vs. adults), along with their full set of interactions; we also included control predictors for the length of the question’s final word and for whether participants had to respond affirmatively or negatively. Factorial predictors were contrast coded (−0.5, 0.5), and continuous predictors were standardized. The random-effects structure included random intercepts for each participant and random slopes for information structure (which was the only within-subjects factor); the correlation between random slopes and intercepts was fixed to zero. To aid model convergence, we pooled the data for younger and older children (thus, age was a two-level factor: children vs. adults) but also explored potential developmental changes in follow-up analyses (see the Supplemental Material available online).
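The predictor coding described above can be sketched as follows. Which level of each factor receives −0.5 is not stated in the text, so the assignment here is an assumption, as are the function names and the trial values.

```python
import statistics


def contrast_code(level, first_level, second_level):
    """Map a two-level factor onto -0.5 / +0.5."""
    if level == first_level:
        return -0.5
    if level == second_level:
        return 0.5
    raise ValueError(f"unexpected level: {level!r}")


def standardize(values):
    """Center on the mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


# Hypothetical trials, for illustration only.
trials = [
    {"maze": "one", "structure": "early", "final_word_ms": 600.0},
    {"maze": "one", "structure": "late", "final_word_ms": 1000.0},
    {"maze": "two", "structure": "early", "final_word_ms": 620.0},
    {"maze": "two", "structure": "late", "final_word_ms": 1040.0},
]

lengths = standardize([trial["final_word_ms"] for trial in trials])
design = []
for trial, length_z in zip(trials, lengths):
    maze = contrast_code(trial["maze"], "one", "two")
    structure = contrast_code(trial["structure"], "early", "late")
    design.append({
        "maze": maze,
        "structure": structure,
        # The interaction term is the product of the two coded factors.
        "maze_x_structure": maze * structure,
        "final_word_z": length_z,
    })
print(design[0])
```

With ±0.5 coding, each main-effect coefficient is the difference between the two levels averaged over the other factors, which is what makes the interaction coefficient directly interpretable.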
Ex-Gaussian analyses were run using the brms package (Version 2.1.0; Bürkner, 2016) in the R programming environment (Version 3.3.3; R Core Team, 2017). We ran four chains per model, each for 2,000 iterations, with a warm-up period of 1,000 iterations and initial parameter values set to zero. The model converged with no divergent transitions (all R̂ values ≤ 1.01). For each parameter, we report an estimate (b), estimated error (EE), and the 95% credible interval (CrI). If zero lies outside the CrI, then we conclude that there is sufficient evidence to suggest that the estimate is different from zero. Note that σ and τ were fitted on the log scale.
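The decision rule for credible intervals can be sketched with simulated posterior draws. This is a minimal illustration, assuming that the estimate is the posterior mean, that EE is the posterior standard deviation, and that the interval is the central 95% quantile interval. The draws below are random numbers standing in for model output.

```python
import random
import statistics


def summarize_draws(draws):
    """Summarize posterior draws as estimate, error, and a 95% interval."""
    # quantiles(n=40) returns 39 cut points in 2.5% steps; the first and
    # last are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return {
        "estimate": statistics.mean(draws),
        "error": statistics.stdev(draws),
        "lower": lower,
        "upper": upper,
        "excludes_zero": lower > 0 or upper < 0,
    }


# Simulated stand-ins for 4,000 post-warm-up draws (4 chains x 1,000).
rng = random.Random(42)
effect_draws = [rng.gauss(-0.03, 0.01) for _ in range(4000)]
null_draws = [rng.gauss(0.0, 0.02) for _ in range(4000)]

effect_summary = summarize_draws(effect_draws)
null_summary = summarize_draws(null_draws)
print(effect_summary["excludes_zero"], null_summary["excludes_zero"])
```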
Given the novelty of this procedure, we also report a more standard analysis for comparison, using a linear mixed-effects model with a Gaussian link function to predict log-transformed response times (fitted using the lmer function of the lme4 package, Version 1.1-13, in the statistical platform R; Bates, Mächler, Bolker, & Walker, 2015). This had the same fixed-effects structure described above but also included random intercepts for items and by-item random slopes for information structure and maze type. We report a coefficient estimate (b), standard error (SE), and t value for each predictor but no p value, as it is unclear how to compute dfs for linear mixed-effects models with crossed random effects; instead, we report 95% confidence intervals (CIs) from the confint function (Wald method).
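A Wald interval is the estimate plus or minus a normal quantile times the standard error, and the t value is the estimate divided by the standard error. The sketch below applies this to the rounded interaction estimate reported in the Results (b = −0.064, SE = 0.020). Because the inputs are rounded, the recomputed values can differ slightly in the last digit from those reported. The function name is hypothetical.

```python
def wald_summary(estimate, std_error, z=1.96):
    """Return the t value and a Wald 95% interval for a coefficient."""
    half_width = z * std_error
    return {
        "t": estimate / std_error,
        "lower": estimate - half_width,
        "upper": estimate + half_width,
    }


interaction = wald_summary(-0.064, 0.020)
# The interval lies entirely below zero, in line with the reported result.
print(round(interaction["t"], 2),
      round(interaction["lower"], 3),
      round(interaction["upper"], 3))
```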
Prior to conducting ex-Gaussian analysis, we excluded trials on which participants provided incorrect answers (adults: 0.8%, older children: 4.2%, younger children: 17.9%) and trials that were outliers on the left side of the distribution, that is, very early anticipatory responses that were further than 1.5 standard deviations below the age-appropriate mean (< 1% of data points at each age group; this excluded roughly the same amount of data as applying a 2.5-SD cutoff to unskewed data). Five younger children were then excluded because they did not provide at least 20 data points after exclusions; no participants were excluded from the other two age groups. Disfluent but correct answers were included in the analyses to reduce data loss. Response times were then standardized, and a constant was added to all data points to avoid negative values because there was a small percentage (< 1%) of overlaps. In addition, for the linear model analyses, we excluded all response times longer than 1,500 ms to further reduce the skew of the distribution. All analyses were conducted in RStudio (Version 1.0.143; RStudio Team, 2017). Data and analysis scripts are available at https://osf.io/kcp9z/.
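The exclusion steps can be sketched as a small pipeline. This is not the authors' script (theirs is at the OSF link above). The field names are hypothetical, and the sketch assumes that the age-group mean and standard deviation are computed on correct trials only, which the text does not specify.

```python
import statistics


def apply_exclusions(trials, sd_cutoff=1.5, min_trials=20):
    """Drop incorrect trials, very early responses, and sparse participants."""
    correct = [t for t in trials if t["correct"]]

    # Lower bound per age group: mean minus sd_cutoff standard deviations.
    by_group = {}
    for t in correct:
        by_group.setdefault(t["age_group"], []).append(t["rt_ms"])
    floors = {
        group: statistics.mean(rts) - sd_cutoff * statistics.stdev(rts)
        for group, rts in by_group.items()
    }
    kept = [t for t in correct if t["rt_ms"] >= floors[t["age_group"]]]

    # Keep only participants with enough remaining trials.
    counts = {}
    for t in kept:
        counts[t["participant"]] = counts.get(t["participant"], 0) + 1
    return [t for t in kept if counts[t["participant"]] >= min_trials]


def make_trial(participant, rt_ms, correct=True):
    return {"participant": participant, "age_group": "adult",
            "rt_ms": rt_ms, "correct": correct}


# Hypothetical data: p1 has 24 ordinary trials, one very early response,
# and one incorrect answer; p2 has only 3 trials.
trials = ([make_trial("p1", 500.0) for _ in range(24)]
          + [make_trial("p1", -1500.0), make_trial("p1", 500.0, correct=False)]
          + [make_trial("p2", 500.0) for _ in range(3)])

kept = apply_exclusions(trials)
print(len(kept))
```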
Results
If participants prepare a response as they listen to a question, then we would expect an interaction between maze type and information structure: Response times should shift closer to zero when questions mention critical information early rather than late and the maze is predictable (one-animal maze) rather than unpredictable (two-animal maze). We thus tested for the presence of this interaction in both adults and children. For the distributional analysis, we assumed that this interaction would shift the location of the Gaussian component (an effect on the µ parameter), but also analyzed whether it would affect the spread of the Gaussian component (i.e., its standard deviation, or the σ parameter) as well as the rate of the exponential component (the τ parameter), capturing the thickness of the right tail of the distribution.
Figure 2a shows how the distribution of response times varied across the different age groups and conditions, whereas Figure 2b shows the variation in mean response time (for a breakdown of the child data into younger and older children, see Fig. S1 in the Supplemental Material). Our analysis first focused on the location of the Gaussian component of these distributions (µ; see Table 2). Overall, this was shifted earlier in time when the maze was predictable (maze type: b = 0.027, EE = 0.013, 95% CrI = [0.003, 0.053]) and also when the critical cartoon character’s name was mentioned earlier (i.e., when the final word in the sentence was “tiger” or “parrot”; information structure: b = 0.068, EE = 0.004, 95% CrI = [0.059, 0.077]). This latter main effect was unexpected; we suggest that it occurred because participants were faster to recognize the final word when it named one of the animals because those labels were repeated on every trial of the study.
Table 2. Parameters From the Ex-Gaussian Distributional Analysis Comparing Children With Adults
Note: R̂ is a measure of convergence of the algorithm (R̂ = 1 at convergence). EE = estimated error; CrI = credible interval.
Crucially, however, these findings were qualified by the predicted interaction: Information structure had a significantly larger effect on response times when the maze was predictable (one animal) than when the maze was unpredictable (two animal; b = −0.026, EE = 0.009, 95% CrI = [−0.043, −0.008]). Importantly, there was no further interaction between information structure, maze type, and age group (b = 0.005, EE = 0.018, 95% CrI = [−0.031, 0.039]); that is, both children and adults appeared to prepare a response ahead of time on the basis of their predictions about the content of a question. This interaction is perhaps best appreciated by examining Figure 2a and comparing the distributions of responses to early questions (solid line) and late questions (dashed line) in each of the left-hand panels (for predictable one-animal mazes) and the relative distributions of these responses in the corresponding right-hand panels (which show unpredictable two-animal mazes). Critically, for both adults and children, the response distributions from predictable one-animal mazes show considerably less overlap than the response distributions from unpredictable two-animal mazes, indicating how prediction allowed participants to formulate a response in advance and thus reduce response time. Follow-up analyses focused only on the child sample confirmed this result: Children as a whole showed the same interaction, and the size of the interaction did not vary across age groups, providing no strong evidence for major developmental differences in the ability to coordinate prediction and production (see Tables S1 and S2 in the Supplemental Material). Thus, our findings support a theory of turn-taking development in which the ability to use prediction to flexibly time formulation of a response is an early milestone.
These key findings were replicated in the nondistributional linear mixed-effects analysis. When adults were compared with children (see Table 3), the model revealed an interaction between information structure and maze type, b = −0.064, SE = 0.020, 95% CI = [−0.104, −0.024], t = −3.13, but no further interaction with age group, b = 0.031, SE = 0.030, 95% CI = [−0.028, 0.091], t = 1.03, and the same was true in the analyses that compared younger children with older children (see Tables S3 and S4 in the Supplemental Material). Thus, both distributional and traditional analyses confirm that, during conversation, children can prepare a response ahead of time on the basis of a prediction about what their partner will likely say next.
Table 3. Parameters From the Gaussian Linear Mixed-Effects Analysis Comparing Children With Adults
Note: CI = confidence interval.
Returning to the distributional analysis, we assessed how the different experimental manipulations affected the spread of the Gaussian component, as well as the rate of the exponential component (i.e., right skew), and whether those effects differed between adults and children. As expected on the basis of observational studies (Casillas et al., 2016) and inspection of Figure 2a, the spread and skew parameters were larger in children than in adults, indicating that children showed more variation around the modal response time (b = −0.222, EE = 0.086, 95% CrI = [−0.387, −0.048]) as well as a considerably thicker right tail (i.e., highly delayed responses; b = −1.187, EE = 0.096, 95% CrI = [−1.374, −1.001]).
In addition, across age groups, the spread and skew were also larger when the critical information came late (effect of information structure; spread: b = 0.298, EE = 0.057, 95% CrI = [0.190, 0.411]; skew: b = 0.152, EE = 0.036, 95% CrI = [0.080, 0.224]). When the maze was not predictable, response times showed larger skew (effect on the rate of the exponential; b = 0.258, EE = 0.095, 95% CrI = [0.072, 0.453]) and also a larger spread, but only in children (interaction between age group and maze type; b = −0.408, EE = 0.164, 95% CrI = [−0.736, −0.097]). However, and importantly, neither parameter showed a reliable interaction between information structure and maze type (spread: b = 0.036, EE = 0.112, 95% CrI = [−0.186, 0.256]; skew: b = 0.081, EE = 0.074, 95% CrI = [−0.065, 0.227]). This suggests that flexible predictive formulation of responses mainly shifts the location of the response time distribution but does not affect response variability or the likelihood of very delayed responding.
A notable feature of these latter analyses is that they highlight how attention to the distribution of response times can be more informative than attention to their mean. Our traditional analysis of the mean suggested that children’s turn taking was significantly slower than adults’ turn taking (a significant age-related decrease; see Fig. 2b and Table 3), but the distributional analysis showed that this was not simply because adults’ processing speed is greater. Rather, children and adults had similar modal response times (see Fig. 2a and Table 2), but children also showed significantly more variability in their response times and were also more prone to respond after very long delays (main effects of age group on both σ and τ parameters; see Fig. 2a and Table 2). This pattern suggests that the major developmental change in turn taking is not in the speed with which children coordinate comprehension and formulation but in how likely children are to become distracted or experience difficulties in switching between tasks. We will return to this issue in the Discussion section.
Finally, we assessed how response times were affected by our two control predictors: the length of the question’s final word and whether the participant had to answer yes or no. Consistent with prior work by De Ruiter et al. (2006), results showed that when the final word was longer, not only were response times shifted closer to zero, but the skew of those responses was also reduced (effects on the location, b = −0.023, EE = 0.001, 95% CrI = [−0.025, −0.020], and the rate of the exponential, b = −0.097, EE = 0.013, 95% CrI = [−0.124, −0.071]). This finding is important because it confirms that children do not typically wait until the question is over to begin formulation; rather, they start formulation as they listen to the question and thus respond earlier when the longer final word gives them more time to prepare. Interestingly, when the final word was longer, the spread of the Gaussian component tended to be somewhat larger (b = 0.107, EE = 0.022, 95% CrI = [0.063, 0.152]). This is consistent with other evidence showing that adults respond earlier to questions that are longer but that the timings of their answers are less precise, for example, resulting in more overlaps (Corps et al., 2018). Finally, when participants responded no, their response times were shifted later compared with when they responded yes (effect on the location parameter; b = 0.016, EE = 0.002, 95% CrI = [0.012, 0.021]), but there were no further effects on the spread or skew of the distribution. This latter finding is consistent with some corpus evidence (Stivers et al., 2009) and could be explained by the fact that rejections are dispreferred responses (Kendrick & Torreira, 2015).
Discussion
Observational analyses suggest that preschool children’s conversational turn taking is poorly timed, but our experiment reveals that preschoolers’ ill-timed responses mask a sophisticated ability to coordinate language comprehension and language production: They can generate predictions about what their conversational partners will say, use those predictions to prepare a response while still listening, and thus respond more quickly. In this way, children optimize the timing of their conversational turns.
Our findings contribute to a growing body of work highlighting the importance of turn taking for human culture and development. Most notably, they are consistent with the ethological approach to turn taking advocated by Levinson (2006, 2016), who argued for continuity in the mechanisms that allow young infants to engage in well-timed nonlinguistic turn taking and the mechanisms that preschoolers use when engaging in (less well-timed) linguistic turn taking (Hilbrink et al., 2015). Such continuity acts as a key argument in support of Levinson’s stronger claims: that turn taking represents a universal biological adaptation in humans (rather than, e.g., a cultural adaptation for facilitating smooth linguistic interaction) and that, over historical time, languages have evolved to ensure that conversational turn taking is as smooth as possible (e.g., by favoring grammatical or prosodic features that allow listeners to quickly identify whether a turn is a question or statement).
The present data also shed light on the striking recent discovery that children’s longitudinal language development is better predicted by the quality of their early conversational experiences than by more traditional measures, such as quantity of words heard (Hirsh-Pasek et al., 2015; Zimmerman et al., 2009). The particular contribution of our study is to specify one of the cognitive mechanisms, the flexible coordination of prediction and formulation, that would allow children to engage their caregivers in these higher quality conversational interactions. For instance, this mechanism allows children to generate conversational turns that both build informatively on the previous speaker (e.g., by responding to a request) and begin within a pragmatically appropriate time frame. Thus, children can actively promote caregiver engagement and maximize their own opportunities to learn. In turn, the timeliness of a child’s responses allows adults to monitor the child’s degree of understanding and thus provide appropriate feedback. In combination, these factors should act as scaffolding for children as they acquire new linguistic and world knowledge from conversations (see Clark, 2007; S. A. Gelman, 2009) and may also help foster a better understanding of the social uses of language (Dunn & Shatz, 1989).
One question raised by our study is why, if children can coordinate comprehension and production, they still often produce poorly timed conversational turns. Levinson (2016) suggested that children’s poor timing may reflect difficulties that they have in producing complex language (consistent with naturalistic observations; Casillas et al., 2016; Clark & Lindsey, 2015), which is partially consistent with the present data: Here, children needed to produce only very simple yes/no answers, and their modal response times were actually strikingly similar to adults’ response times. However, for us to fully understand how production difficulties constrain the development of conversational timing, it is important to test more complex responses as well as younger children. Moreover, even when children in our task produced simple utterances, they still often left long gaps (i.e., children’s response distributions had a larger right tail than adults’ response distributions), which suggests that factors other than linguistic planning difficulty may also contribute to the poor timing of children’s linguistic turn taking.
We thus propose that such poor timing may also arise from a more general factor, such as distractibility or skill at switching tasks (e.g., from comprehension to formulation; Zelazo et al., 2003). This would be consistent with previously reported associations between executive processes and the weight of the right tail of the response distribution (Schmiedek, Oberauer, Wilhelm, Süß, & Wittmann, 2007; though see Matzke & Wagenmakers, 2009). It also highlights an important message from the present research: Distributional data-analysis methods (Balota & Yap, 2011; Umlauf & Kneib, 2018), which characterize the shape of the response time curve, allowed us to capture children’s coordination skills while at the same time accounting for (and indeed modeling) the noise in their response times. This underscores the utility of these analytic methods for the interpretation of chronometric experiments.
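To make concrete what it means to characterize the shape of a response time distribution, the sketch below recovers ex-Gaussian parameters from the first three sample moments. This method-of-moments estimator is a simple stand-in chosen for illustration, not the distributional regression used in the present analyses. The function name and the simulated values are hypothetical.

```python
import random
import statistics


def ex_gaussian_moments(rts):
    """Estimate (mu, sigma, tau) from the sample mean, variance, and skewness.

    Uses the ex-Gaussian identities: mean = mu + tau,
    variance = sigma^2 + tau^2, third central moment = 2 * tau^3.
    """
    n = len(rts)
    mean = statistics.fmean(rts)
    var = statistics.pvariance(rts, mean)
    sd = var ** 0.5
    skew = sum((x - mean) ** 3 for x in rts) / (n * sd ** 3)
    if skew <= 0:
        raise ValueError("sample is not right-skewed; moment estimates are undefined")
    tau = sd * (skew / 2.0) ** (1.0 / 3.0)
    sigma_sq = var - tau ** 2
    if sigma_sq <= 0:
        raise ValueError("estimated tau exceeds the sample spread")
    return mean - tau, sigma_sq ** 0.5, tau


# Simulated response times (seconds) with hypothetical true parameters.
rng = random.Random(0)
sample = [rng.gauss(0.30, 0.20) + rng.expovariate(1.0 / 0.30) for _ in range(50000)]

mu_hat, sigma_hat, tau_hat = ex_gaussian_moments(sample)
print(round(mu_hat, 3), round(sigma_hat, 3), round(tau_hat, 3))
```

Moment-based estimates are sensitive to outliers and need large samples, which is one reason model-based approaches such as distributional regression are generally preferable for real chronometric data.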
Conclusion
Smooth conversational turn taking appears to be universal, despite the pressure that it places on understanding and using language (Levinson, 2016). Our findings suggest that preschoolers, who are still learning to use complex language, can nonetheless flexibly coordinate language comprehension and language production and, thus, optimize the timing of their conversational interactions. The early development of these skills suggests that an understanding of turn taking will shed light on language and cognitive development more broadly.
Supplemental Material
GambiOpenPracticesDisclosure: Supplemental material for Preschoolers Optimize the Timing of Their Conversational Turns Through Flexible Coordination of Language Comprehension and Production by Laura Lindsay, Chiara Gambi, and Hugh Rabagliati in Psychological Science
GambiSupplementalMaterial: Supplemental material for Preschoolers Optimize the Timing of Their Conversational Turns Through Flexible Coordination of Language Comprehension and Production by Laura Lindsay, Chiara Gambi, and Hugh Rabagliati in Psychological Science
Footnotes
Action Editor
Rebecca Treiman served as action editor for this article.
Author Contributions
L. Lindsay and C. Gambi are joint first authors of this article. H. Rabagliati and C. Gambi designed the study with input from L. Lindsay. L. Lindsay created the interactive maze game and collected the data. L. Lindsay coded the data, with help from C. Gambi. H. Rabagliati conducted the distributional regression analyses with input from C. Gambi. C. Gambi drafted the manuscript, and H. Rabagliati provided significant revisions. All the authors approved the final manuscript for submission.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Funding
This research was supported by the Carnegie Foundation, by an Economic and Social Research Council (ESRC) Studentship N71737J to L. Lindsay, by the Leverhulme Trust (RPG 2014-253) to H. Rabagliati and M. J. Pickering, and by an ESRC Future Research Leaders award (ES/L01064X/1) to H. Rabagliati.
Open Practices
All data have been made publicly available via the Open Science Framework and can be accessed at osf.io/kcp9z. The visual stimuli are available only on request from the corresponding author because the cartoon characters used are restricted by copyright. However, the audio stimuli are available at osf.io/kcp9z/. The design and analysis plans for the study were not preregistered. The complete Open Practices Disclosure for this article can be found at https://journals-sagepub-com.web.bisu.edu.cn/doi/suppl/10.1177/0956797618822802. This article has received the badge for Open Data. More information about the Open Practices badges can be found at
.
