Abstract
In conversation, speakers can mobilize a variety of prosodic cues to signal a switch in topics. This paper uses a mixed-methods approach combining Conversation Analysis and Instrumental Prosody to investigate the prosody of topic transition in American English, and analyzes the ways in which speakers can play on register level and on register span. A cluster of three prosodic parameters was found to be predictive of transitions: a higher maximum fundamental frequency (F0), a higher median F0 (key), and an expanded register span. Relative to speakers’ habitual profiles, the mobilization of such prosodic cues corresponds to a marked upgraded prosodic design. This finding is consistent with the general assumption that continuation constitutes the norm in conversation, and that departing from it, as in the case of a topic transition, requires a marked action and marked linguistic design. The disjunctive action of opening a new topic corresponds to the use of a marked prosodic cue.
1 Introduction
The present study focuses on the interactional action of initiating a topic transition—that is, when a speaker moves away from the current topic and operates a switch to a different one. I hypothesize that speakers mobilize variations in pitch register in two ways: they can play both on their register level and on their register span in order to signal the topic switch.
The present study considers that prosody is a part of grammar, and hypothesizes that specific prosodic cues can be used by participants to carry out an interactional action: “prosodic phenomena are not seen as accidental or aleatory, nor as automatic reflexes of cognitive and affective states. They are thought to have their own systematicity, but a systematicity which can only be accessed in a context-sensitive fashion” (Couper-Kuhlen, 2001a, p. 16).
Cruttenden (1997) listed three main prosodic parameters: length, loudness, and pitch—the latter often being considered the most crucial to intonation. More encompassing definitions include more prosodic parameters (Couper-Kuhlen, 2001a): pitch (high vs. low), loudness (loud vs. soft), speech rate (fast vs. slow), rhythm and tempo (spacing of beats), pause, and voice quality (breathy, creaky, etc.). As in countless other studies, priority here is given to pitch parameters, despite the well-established knowledge that other prosodic parameters, such as speech rate and intensity, play a crucial role in interaction—or even less studied phenomena, such as spectral richness and voice tenseness (D’Alessandro, 2006). However, this study integrates two aspects of pitch register that are rarely analyzed together. Some studies have identified discourse functions for register level variations, such as in the delivery of good versus bad news (Freese & Maynard, 1998) or the reason-for-the-call in radio phone-in programs (Couper-Kuhlen, 2001b), but register span is less frequently taken into account (Portes & Di Cristo, 2003).
Register concerns the phonetic realization of tonal targets, not on the horizontal time dimension, but on a vertical scale along the dimension of pitch (Ladd, 1998). Contrary to pitch accents, which concern individual (stressed) syllables, register changes can affect an entire unit, including unstressed syllables. Pitch register is defined here as “a participant’s local pitch span during an interactional sequence, turn or intonation phrase” (Szczepek Reed, 2011a). While voice range is conditioned by physiological and sociolinguistic factors, variation in pitch register is used to achieve various interactional functions.
Defined as being what a portion of the conversation is about (Berthoud & Mondada, 1995; Goutsos, 1997),
The present study proposes a mixed-methods approach to prosody in interaction, following Zellers and Post (2012), Zellers and Ogden (2014), and Sicoli, Stivers, Enfield, and Levinson (2015) among others. A corpus of spontaneous talk-in-interaction was analyzed from the dual perspective of qualitative and quantitative analysis. The qualitative analysis consisted of the use of sequential analysis as developed by the conversation-analytic methodology (e.g., Schegloff, 2007), as well as the manual and systematic coding of linguistic features commonly used in Corpus Linguistics (e.g., Glynn, 2014). This qualitative coding was then used to conduct statistical analysis, which consisted of using logistic regression to model the connection between interactional meaning and the prosodic variables that had been identified as relevant to the speakers. The qualitative and quantitative parts of this study shed light on one another, being complementary perspectives on the puzzle of the prosody of topic transition in interaction. Section 2 provides theoretical background on the issues of studying the prosody of talk-in-interaction. Section 3 details the corpus and methodology used. Section 4 presents the results, interpretation, and further implications, which are then discussed in Section 5.
2 Background
2.1 Prosodic analysis of conversation
Three of the main difficulties when working on prosody are 1) the interrelated nature of prosodic phenomena, 2) absolute and relative measures, and 3) gradience and categories.
The first issue is due to the fact that distinct prosodic cues can share properties, for example, the way they mobilize articulators for phonation. Yelling is an everyday illustration of this phenomenon: a sudden increase in loudness may also mechanically translate into a higher-pitched voice. It may thus be difficult to analyze loudness without taking into account its effect on pitch. Unfortunately, much remains to be understood about the perception of prosody (Vaissière, 2008). Besides, prosody can never be “switched off” when speaking: whatever it is they say, speakers necessarily deliver a turn with a specific pitch contour, register, loudness, tempo, voice quality, etc. This results in what can be termed the multifunctionality of prosodic cues, as one cue can have more than one function at the same time. An illustration of this can be seen in example (1). ALI is sharing an extended narrative about the first day at work of her new manager (“her” l.2). Several things went wrong that day, including ill-fitting pants (“they” l.1). The manager is new to Colorado, and did not expect that she would need extra time in the morning to wipe snow and ice off her car.
ANN initiates a transition l.10, about her own morning. Her transition may have been triggered by her mother’s mention of “windshield wipers” (l.4), as the topic she initiates is about the layer of ice that formed on her own car. ANN’s transition is delivered with a very high register level. This aspect of prosodic design has two functions here. Besides cueing the disjunction created by the topic transition, it also corresponds to an instance of competitive overlap. ANN starts her turn in overlap (“unhunh and that-” l.10) with ALI’s turn l.9 (“it only goes”). The overlap is temporarily resolved when ANN interrupts her turn, and ALI finishes hers without overlap (“downhill from there” l.9). As ALI finishes her turn, ANN immediately jumps in with a restart of her transition (“that (.) ice stuff was thick too ’cause I took the (.) blankets off my car this morning?”). Kurtic, Brown and Wells (2009) showed that high register level is characteristic of competitive overlap and stops when the competition for the turn is resolved. In (1), the elevated fundamental frequency (F0) is maintained even after the overlap is resolved, as it is also mobilized to cue the transition itself. It is worth noting that ANN even maintains her use of a high register level for the turn following her transition. The competitive overlap coinciding with her transition may be partly responsible for the maintained use of an elevated pitch, by preventing further interruptions—which is also congruent with the faster speech rate at which l.10 and 12 are delivered.
Consequently, it is a difficult task to determine what parameter—if any one parameter—is most relevant in context. The concept of
The second difficulty when studying prosody—that is, absolute versus relative measures—is connected to the tremendous variability pertaining to prosodic parameters. All linguistic phenomena are characterized by some degree of variability, such as sociolinguistic variants and idiolects. Prosodic phenomena in particular are even more sensitive to cross-speaker variability, as physiological differences along with sociolinguistic variables play a role. Individual differences such as sex, height, build, or size of the larynx affect phonation and the acoustic properties of speech. Consequently, relative measures are often more appropriate than absolute ones to the description of pitch phenomena (Ladd, 1998), especially when comparing the production of different speakers in different conversations. A high pitch is not “high” in isolation but by comparison to either the speaker’s voice or to prior talk—what Ladd (1998, p. 189) termed “normalizing” and “initializing” approaches respectively.
In an
A third source of difficulty for prosodic research concerns issues of gradience and categories. Many prosodic parameters (such as pitch, loudness, or speech rate) are physiological and acoustic parameters continuous in nature. In theory, a measure of F0 can fall anywhere on an axis going from 0 Hertz (no voicing) to +∞. In practice, only a limited part of this continuum is actually relevant. For example, a range going from 60 to 600 Hertz (Hz) is commonly used as a default window to analyze human prosody. Speakers can use a lower pitch in some contexts (e.g., creaky voice) or a higher one (e.g., laughter). F0 is a continuous acoustic parameter, but it does not necessarily follow that its perceptual correlate—pitch—is always best analyzed in terms of gradience. While many prominent frameworks consider that intonational meaning is discrete, Gussenhoven (1999) and Post, Nolan, Stamatakis, and Hudson (2009) have deplored that this has tended to remain an underlying hypothesis seldom tested. Until more perceptual and cognitive analyses are undertaken, it continues to be a mostly unresolved matter, whether meaning carried by intonation is discrete, gradient, or maybe a bit of both. In the meantime, this study hypothesizes that noticeable changes in register can cue a specific meaning in interaction. To decide what is “noticeable,” it is necessary to assume and determine a cut-off point—this is discussed further in Section 3.5. In an Interactional Linguistics perspective, grammatical and interactional meaning is taken to rely mostly on categories: speech is perceived as “high” or “low,” as “fast” or “slow,” or as “same” or “different.” The reader can be referred, for example, to Couper-Kuhlen (1993) on matched or mismatched tempo, or Couper-Kuhlen (2001b) on high onsets. 
From an interactional point of view, a prosodic design can be “downgraded” or “upgraded,” “marked” or “unmarked.” It is assumed that a speaker does not perceive a variation in register level as a “100 Hz” or even a “10 semitones (st)” shift upward, but rather as a “meaningful” and “significant” register shift, whatever this means in a particular environment. The delicate question when studying prosody is thus to determine what is a meaningful, relevant, or marked change.
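To make the contrast between Hertz and semitone values concrete, the following minimal sketch converts an F0 interval into semitones and octaves, using the standard definition of 12 semitones per octave. The function names and frequency values are hypothetical illustrations, not measurements from the corpus.

```python
import math


def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Size of the interval from f_low_hz up to f_high_hz, in octaves."""
    return math.log2(f_high_hz / f_low_hz)


def semitones_between(f_low_hz: float, f_high_hz: float) -> float:
    """Size of the same interval in semitones (12 semitones per octave)."""
    return 12.0 * octaves_between(f_low_hz, f_high_hz)


# Hypothetical values: the same 100 Hz rise starting from two different points.
rise_from_100 = semitones_between(100.0, 200.0)  # a full octave
rise_from_200 = semitones_between(200.0, 300.0)  # a smaller interval

print(round(rise_from_100, 2), round(rise_from_200, 2))
```

An identical shift in Hertz thus corresponds to a full octave when it starts at 100 Hz but to roughly 7 semitones when it starts at 200 Hz, which is one reason why relative measures lend themselves better to cross-speaker comparison.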
Frequency represents one answer to the question of gradience and categories. Speakers are highly sensitive to frequencies in language, and grammar can be seen as emerging from the repeated use and recognition of patterns (Bybee & Hopper, 2001). A recent trend sees the emergence of studies using instrumental techniques coupled with statistical analysis to analyze interactional phenomena such as competitive overlap (Kurtic, et al., 2009), evaluative questions (Sicoli, et al., 2015), prosodic orientation (De Looze, Scherer, Vaughan, & Campbell, 2014), or contrastive structures (Zellers & Ogden, 2014). In the present study, several continuous prosodic parameters (e.g., register span) were treated as categories based on the individual speaker’s profile, determined from statistical measures of dispersion. For example, the originally continuous variable of register span (in octaves) was operationalized as a binary variable (normal vs. expanded). This is hypothesized to map onto the marked-unmarked dimension: dispersion and frequency were used to determine what could be considered marked or unmarked register span for a specific speaker (see Section 3.5). In sum, frequency and dispersion can indicate thresholds, which in turn hint at categories of meaning.
2.2 Prosodic cues to topic structure
Signaling that a turn is a topic transition is part of a more encompassing task concerning every turn, which Szczepek Reed (2011b, p. 16) described as “the constant necessity for speakers to display each turn as either continuing a previously established interactional project, or as starting a new one.” Newness in discourse has long been associated with elevation in pitch—or at least some variation in pitch. While a sizeable body of research has investigated read speech (Hirschberg & Grosz, 1992; van Dijk, 1977; Wichmann, 2000; Zellers, 2011; Zellers & Post, 2012, inter alia), fewer studies have investigated the prosodic cues to new topics in spontaneous speech. Yule (1980) analyzed three prosodic cues that signal the onset and offset of topic units (which he called “major paratones” or “speaker’s topics”): onset height, register level, and pauses. The onset of a major paratone is delivered with a high onset, and can involve a high register level. The offset of the unit can mobilize a low register level and compressed register range. Nakajima and Allen (1993) presented evidence that new topics have a high register level as well as a higher F0 peak than the preceding utterance. Zellers (2013) found that contrastive structures initiating topic transition are characterized by an expanded span in the following turn (post-contrast). Hence, in the most general way, the literature suggests that topic transitions are characterized by some level of prosodic upgrade. However, even this very general statement can be undermined. Zellers and Ogden (2014) showed that speakers can avoid doing a prosodic upgrade to downplay moments of disjunction, including topic transitions.
Some characteristics of these studies call for the expansion and re-examination of their findings. Yule’s (1980) paper was a preliminary proposal to analyze the prosodic cues to discourse structure, and it rested on the qualitative analyses of a few hand-picked cases. Nakajima and Allen’s (1993) study also poses issues of generalization, as it relied on the analysis of one conversation between two speakers. To the best of my knowledge, Zellers (2013) is the only quantitative study that inquired into the prosodic cues to new topics in interaction, albeit on a small sample (36 tokens). With a mixed-methods approach combining Conversation Analysis, Experimental Prosody, and statistics, she found evidence of variation in register span in topic transitions. However, she focused on a very specific subtype of topic transitions: stepwise transitions using a contrastive structure as a pivot, with an initializing approach comparing the prosody of a turn to prior talk. The present study proposes to expand and reexamine these findings with a corpus of strictly spontaneous and interactional speech, controlled neither for topic structure nor for phonological structure, and which placed no restriction on the types of topic transitions taken into account.
3 Corpus and methods
3.1 Corpus
The corpus consists of spontaneous American English conversations audio-recorded as part of the Santa Barbara Corpus of Spoken American English (Du Bois et al., 2000–2005), collected across the United States in the 2000s and freely accessible online through the TalkBank database (talkbank.org). On the basis of external criteria such as sound quality, I selected six dyadic conversations and extracted 15 minutes from each, for a total of 1 hour and 30 minutes and 11 different speakers.
3.2 Minimal unit segmentation
The study was conducted with the turn-constructional unit (TCU) as its basic unit despite the existing segmentation of the Santa Barbara corpus into intonation units. The TCU is the basic component of turns-at-talk in the conversation-analytic framework (Clayman, 2013), and it is defined as a potentially complete turn, projected by a wide array of syntactic, prosodic and interactional cues (Selting, 2000). For example, a turn may not be considered over until the completion of a syntactic schema (e.g., an if-then structure or a transitive verb and its object) or an intonation contour (e.g., falling contour on the last item of a list). Very often, speakers add constituents to previous turns that could be complete on their own. To decide whether such increments were stand-alone TCUs or belonged to the previous TCU, I applied the semantic criteria described in Ford, Fox, and Thompson (2002), who made a distinction between “extension increments,” which they considered to be part of the turn they complete, and “unattached NPs,” increments representing a separate TCU. I also used the phonetic distinction that Local and Walker (2004) made between the phenomena of rush-throughs and abrupt-joins. Rush-throughs tend to have a continuative prosody that blends the post-TRP increment with the preceding unit. Abrupt-joins, on the other hand, are more disjunctive in their prosodic design. Following the guidelines presented in Selting (2000), Ford et al. (2002), and Local and Walker (2004), the corpus was discretely segmented into TCUs.
As previously argued in Zellers (2011) and Riou (2015), transition to a new topic is an interactional action and, as such, it is more reasonable to assume that it is implemented over the course of an interactional unit, such as the TCU, rather than a phonological one, such as the intonation unit. This is open to debate, as intonation units have also been associated with discourse segmentation (Du Bois, 2003) and information flow (Chafe, 1994).
3.3 Identification of topic transitions
Topic transitions were identified adopting the methodology previously presented in Riou (2015). Following a qualitative, sequential analysis of topic structure in the conversations, each of the 2606 TCUs of the corpus was coded as implementing Transition (n = 212) or Continuity (n = 2394).
When topic is discussed in Conversation Analysis-oriented studies, the focus tends to be on the different ways in which topic can arise in talk-in-interaction and, hence, on the traditional difference that Conversation Analysis makes between stepwise and disjunctive topic transitions (Holt & Drew, 2005; Jefferson, 1984; Maynard, 1980). I included the two types of topic transition for the present study. Stepwise topic transition—also called step-by-step topic shift and topic shading (Schegloff & Sacks, 1973)—is a gradual move to a new topic related to the topic already under discussion (Jefferson, 1984). The new topic can be for example a different aspect of the topic already under discussion. Disjunctive topic transition on the other hand represents a more abrupt change to a new topic. The new topic “does not emerge from [prior talk], it is not typically coherent with it, but constitutes a break from it” (Jefferson 1984, p. 194).
An inter-rater agreement procedure then checked the identification process on 33% of the corpus (2 conversations out of 6) and yielded a substantial agreement (Cohen’s kappa, κ = .73). To avoid concerns of circularity, the external coder was not informed of what specific linguistic cues were to be analyzed in the study. Having access to the transcription and the audio files, the external coder was provided with a basic definition of topic in terms of aboutness and instructed to incrementally decide whether each new TCU was “about the same topic as the previous TCU.” The whole corpus was then systematically coded in a spreadsheet comprising a wide range of qualitative variables pertaining either to the linguistic format of each TCU, or to its position and function in the interaction. Parallel to the coding scheme, I also annotated the topic structure of each conversation in a Praat textgrid.
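As an illustration of the agreement statistic mentioned above, the following minimal sketch computes Cohen's kappa for two coders who each assign one label per TCU. The toy labels are invented for the example and do not reproduce the codings of the study; the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same, non-empty set of units")
    n = len(coder_a)
    # Observed agreement: proportion of units given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over labels.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[lab] * freq_b[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Invented codings of ten TCUs (C = Continuity, T = Transition).
main_coder = ["C", "C", "C", "T", "C", "C", "T", "C", "C", "C"]
external_coder = ["C", "C", "C", "T", "C", "C", "C", "C", "T", "C"]
kappa = cohens_kappa(main_coder, external_coder)
print(round(kappa, 3))
```

In this toy case the coders agree on 8 of 10 units, but because Continuity is so frequent, chance agreement is already 0.68, so kappa is much lower than the raw agreement rate. This is why a chance-corrected measure is informative when one category dominates, as Continuity does in the corpus.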
3.4 Prosodic annotation
Not all the TCUs of the corpus were fit for instrumental prosodic analysis due to varying issues of speech overlap, sound quality, and F0 detection errors. Consequently, I selected a subset of 450 TCUs (175 Transitions; 275 Continuities) controlled for detection errors in Praat (Boersma & Weenink, 2012). Using each conversation’s long file in Praat, I manually extracted all the Transitions not characterized by any overlap or background noise. I then opened them individually in Praat, adjusted the pitch settings according to the speaker’s voice range, and checked for detection errors. In the case of Continuities, I only extracted a very small subset compared with the much larger set of Continuities in the corpus (n = 2394) to avoid an imbalance in sample size. As there was available material allowing me to restrict my choice, I only selected TCUs that met the following criteria:
No overlap
No background noise
No obvious detection error (octave jump, detection of F0 for voiceless segments, incoherent F0 values)
No reported speech
No backchannel
I avoided selecting TCUs containing reported speech because speakers often modify their prosody to mimic someone else’s voice (Couper-Kuhlen, 1996; Günthner, 1999), which would not be representative of a speaker’s habits when doing Continuity or Transition. I excluded backchannel signals (such as okay, right, unhunh) because they tend to be very short and would not be comparable to Transitions. In a dual effort to stratify the corpus for speaker and to avoid unbalanced sample size, not all of the eligible Continuities were included. Rather, I aimed to select 25 Continuities per speaker, evenly distributed in the conversation (beginning, middle, end). For some speakers, it was not possible to reach the target of 25 eligible Continuities, for example because they delivered many TCUs in overlap or corresponding to feedback. Table 1 summarizes how many Continuities were included per speaker.
Table 1. Inclusion of Continuities per speaker.
3.5 Variables
For an instrumental analysis of register, I focused on four indicators of the register level and register span of TCUs:
minimum F0 (minF0)
maximum F0 (maxF0)
median F0 (key)
register span (span)
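The four indicators listed above can be sketched as follows for a single TCU. This is an illustrative reconstruction under stated assumptions rather than the extraction procedure actually used: the F0 track is assumed to be a list of values in Hz in which unvoiced frames are stored as None or 0, and register span is assumed to be the base-2 logarithm of the ratio of maximum to minimum F0, which yields a value in octaves.

```python
import math
import statistics


def register_measures(f0_track_hz):
    """Return minF0, maxF0, key (median F0) and span (octaves) for one TCU."""
    # Assumption: unvoiced frames are encoded as None or 0 and are discarded.
    voiced = [f for f in f0_track_hz if f is not None and f > 0]
    if not voiced:
        raise ValueError("the TCU contains no voiced frames")
    min_f0 = min(voiced)
    max_f0 = max(voiced)
    return {
        "minF0": min_f0,
        "maxF0": max_f0,
        "key": statistics.median(voiced),
        # Assumption: span is measured as log2(max / min), i.e., in octaves.
        "span": math.log2(max_f0 / min_f0),
    }


# Hypothetical F0 track (Hz) with two unvoiced frames.
example_track = [None, 110.0, 125.0, 180.0, 220.0, 0.0, 150.0]
measures = register_measures(example_track)
print(measures)
```

In this hypothetical TCU, F0 ranges from 110 to 220 Hz, so the span is exactly one octave, and the key is the median of the five voiced frames.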
Using multivariate logistic regression, I analyzed the impact that topic structure (Continuity vs. Transition) has on these four variables.
To transpose individual measures of pitch to a more global analysis, I defined, for each variable and for each speaker, a threshold value above which I could consider that the TCU had a higher maxF0, higher minF0, higher key, or expanded span. My method in this respect is consonant with Sicoli et al. (2015), who analyzed initial pitch in questions to determine whether onset height is predictive of the action a question carries out. They considered that a question had “marked pitch” if its F0 value was in the top 10% of a speaker’s range. In a comparable manner, I used a statistical measure of dispersion to determine the thresholds defining maxF0, minF0, and key values as “higher,” or a span value as “expanded,” but focused on top 25% values rather than top 10% values. I used the third quartile (Q3) as a cut-off point for all four variables and for each speaker. As the values above Q3 correspond to the speaker’s top 25% values, this threshold ensures that any value above Q3 is likely to be qualitatively “higher” (or “expanded” in the case of span) and, as such, can be considered a rather marked value. For example, speaker FRE from example (3) reaches an expanded span at roughly 1 octave (0.962) for a TCU, but ALN from example (2) needs to use 1.416 octaves for the span of her TCU to be considered expanded. This system allows measures to remain individual, while making cross-speaker comparisons possible as well.
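The speaker-relative coding described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the script used for the study: the data layout, the speaker labels A and B, and the span values are invented, and the quartile is computed with the "inclusive" method of Python's statistics.quantiles, which may differ slightly from the quantile definition used in the original analysis.

```python
import statistics
from collections import defaultdict


def third_quartile(values):
    """Q3 of a list of values (assumption: 'inclusive' quantile method)."""
    return statistics.quantiles(values, n=4, method="inclusive")[2]


def code_marked(tcus, variable):
    """Flag each TCU whose value exceeds its own speaker's Q3 for `variable`."""
    by_speaker = defaultdict(list)
    for tcu in tcus:
        by_speaker[tcu["speaker"]].append(tcu[variable])
    thresholds = {spk: third_quartile(vals) for spk, vals in by_speaker.items()}
    flags = [tcu[variable] > thresholds[tcu["speaker"]] for tcu in tcus]
    return flags, thresholds


# Invented span values (octaves) for two hypothetical speakers.
spans_a = [0.5, 0.7, 0.9, 1.1, 1.5]
spans_b = [1.0, 1.2, 1.3, 1.5, 1.9]
tcus = [{"speaker": "A", "span": s} for s in spans_a]
tcus += [{"speaker": "B", "span": s} for s in spans_b]

expanded, q3_by_speaker = code_marked(tcus, "span")
print(q3_by_speaker)
print(expanded)
```

With these invented values, a span of 1.5 octaves counts as expanded for speaker A but not for speaker B, whose own threshold is higher. The binary coding therefore stays tied to each speaker's habitual profile while remaining comparable across speakers.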
As an illustration, Figure 1 shows the distribution of register span for all speakers in the subset of 450 TCUs selected for instrumental analysis. The barplot suggests that topic Transition tends to mobilize an expanded span much more so than Continuity, as 42% of Transitions are delivered with an expanded span (dark grey), while the register span mobilized for Continuity is much more evenly distributed (only 15% with expanded span). Operationalizing the variable “register span” involved a binary distinction between TCUs mobilizing an expanded span (i.e., > Q3) and TCUs not doing so (≤ Q3). The same method was used for the other three prosodic variables analyzed here—minF0, maxF0, and key.

Figure 1. Cross-speaker mobilization of register span according to topic status of TCU (N = 450 TCUs).
Converting continuous variables into categorical variables can be questionable from a statistical point of view, as it flattens the data and some information is lost. All that remains from micro-variations in the data are cruder binary categories: for example, whether a TCU has an expanded span or not, without indication of just how expanded it is. Nevertheless, doing so does present advantages. Using binary variables in Corpus Linguistics can be very useful as it corresponds to the way many grammatical phenomena are considered to function. Despite the fact that most prosodic phenomena are continuous in nature and display tremendous variability, grammar involves contrasts and categories. I chose a level of granularity sensitive to the difference between marked and unmarked prosodic design. Determining where the cut-off point would be (here, Q3) involves some degree of arbitrariness, but converting a continuous variable such as register span into a binary variable can be motivated from an interactional point of view. The present study rests on the assumption that speakers are more sensitive to a marked variation in prosody than to the exact value. Moreover, working on such categorical variables allows for an easier treatment of cross-speaker variability. Converting prosodic cues to categorical variables facilitates quantitative treatment and analysis, as it allows for multivariate logistic regression.
4 Results
4.1 High register level
High register level can be crucial to the interpretation of a TCU as a topic transition. In example (2), a TCU containing a relative clause could be interpreted as Continuity, but instead, its high register contributes to its interpretation as a Transition. ALN is describing her arrival at a party, and before she launches into the description of two guests (“paddlers” l.3), she embarks on a side-story about one of them (l.5), providing justification for her negative opinion of her.
ALN’s transition l.5 is delivered with a higher register level than her previous turn (l.3), which not only marks the move away from the current topic, but also projects a multi-turn unit (Selting, 2000). Without prosodic upgrade, ALN’s transition could have been more easily interpreted as continuation of the previous topic. Were it not for the higher register level cueing disjunction, the relative clause present in the transition (“one of which had a Halloween party”) could be interpreted as side information because of its prototypical function of adding syntactically dependent information. ALN thus mobilizes the syntactic and informational properties of relative clauses to signal a topic transition, and a high register level is crucial to this interpretation.
4.2 Expanded register span
The two dimensions of register—level and span—are connected. Variation in register level tends to involve upward shifts as speakers use the lower third of their voice range more and consequently have less potential for variation in their lower range. For that same reason, variations in register span also tend to translate into the mobilization of the upper part of a speaker’s voice range to find room for expansion. However, maintaining the distinction is analytically useful, as speakers can also modulate the two parameters independently. A TCU can be delivered with an expanded span without reaching a high register level (modulated, low contour), but another option can be to opt for a higher register level overall with a compressed span (high, flat contour). The topic transition presented in example (3) corresponds to the former scenario. FRE and RIC have been organizing a basketball game at the local YMCA, which is very close to where RIC works. This leads FRE to initiate a topic transition (l.10) about visiting RIC at work:
FRE’s transition l.10 (“you know I have been wanting to go visit you”) is characterized by an expanded register span, but it is not situated particularly high in his voice range. The transition is very modulated and spans over 10 semitones (Figure 2).

Figure 2. Pitch contour of the topic transition (l.10) in example (3).
By comparison, FRE’s preceding turn l.5, a TCU doing Continuity, spans over 7 semitones (Figure 3).

Figure 3. Pitch contour of the Continuity (l.5) in example (3).
4.3 Logistic regression
Logistic regression is a confirmatory statistical modeling technique that analyzes whether one or more variables can independently predict another variable. In the case in question, I wanted to see whether the use of four marked prosodic parameters is associated with topic structure (Transition vs. Continuity). If so, it would be a good indication that mobilizing these prosodic cues is connected to the action of cueing, and possibly perceiving, topic transition in interaction.
In the statistics program R (Version 3.0.2, R Development Core Team, 2013), three functions were used to perform a logistic regression:
- glm() (pre-installed),
- lrm() (rms package; Harrell, 2014): indicates the overall predictive strength of the model,
- glmer() (lme4 package; Bates, Maechler, Bolker, & Walker, 2014): used for mixed-effects models.
Multivariate logistic regression showed that three of the four prosodic cues are associated with topic structure (Table 2 reports the results of the model generated by the glm() function). A higher minimum F0 does not indicate either a Transition or Continuity. However, the other three variables are significantly associated with topic structure. When initiating a topic transition, speakers are 1.87 times more likely to use a higher maximum F0, 2.04 times more likely to use a higher key, and 2.63 times more likely to use an expanded register span.
Table 2. Fixed-effect logistic regression (GLM)—prosodic cues to topic transition.
N = 450; AIC = 555.66.
n.s. = not significant (p ≥ 0.05); * = significant (p < 0.05).
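For readers less familiar with logistic regression, the general form of such a model can be written out as follows. This is a generic sketch, not a reproduction of the fitted model: it assumes that topic status (Transition vs. Continuity) is the response, that the four binary prosodic variables are the predictors, and that the "times more likely" figures reported above are odds ratios. The symbols are introduced here for illustration only.

```latex
\operatorname{logit} P(\text{Transition})
  = \ln\frac{P(\text{Transition})}{1 - P(\text{Transition})}
  = \beta_0 + \beta_1 x_{\text{maxF0}} + \beta_2 x_{\text{minF0}}
    + \beta_3 x_{\text{key}} + \beta_4 x_{\text{span}},
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Under these assumptions, each \(x\) equals 1 when the TCU exceeds the speaker's Q3 for that variable and 0 otherwise, and an odds ratio of 2.63 for expanded span would correspond to a coefficient of about \(\ln 2.63 \approx 0.97\) on the logit scale (similarly, about 0.63 for 1.87 and about 0.71 for 2.04).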
In addition, the lrm() function provided the following model diagnostic indicators reported in Table 3:
Table 3. Fixed-effect logistic regression (LRM)—prosodic cues to topic transition.
Checking for multicollinearity ensures that the predictor variables are not strongly correlated with one another, which would inflate the results. The variance inflation factor (VIF) scores indicate that this is not the case in the model (Table 4).
Table 4. Variance inflation factor scores for the GLM model.
A score of 4 or higher would suggest multicollinearity. The VIF scores of the model do not even reach the more conservative threshold of 2.5, showing that no major correlation between the variables affects the model. Finally, the LRM model was cross-validated by bootstrapping with the validate() function: the c-statistic varied very little (0.668) after 500 bootstraps.
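As a reminder of what the VIF thresholds cited above mean, the standard definition of the variance inflation factor for a predictor \(j\) is given below, where \(R_j^2\) is the proportion of variance in predictor \(j\) explained by the other predictors. The notation is introduced here for illustration.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\qquad\Longrightarrow\qquad
\mathrm{VIF}_j = 4 \iff R_j^2 = 0.75,
\qquad
\mathrm{VIF}_j = 2.5 \iff R_j^2 = 0.60
```

A VIF below 2.5 thus indicates that less than 60% of the variance of a given predictor is accounted for by the other predictors in the model.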
Adding a random variable to account for the fact that each speaker was responsible for more than one observation translated into better predictive strength (see Appendix 1). A mixed-effect model generated with the glmer() function yielded a c-statistic of 0.726, slightly higher than the c-statistic obtained with the lrm() model (C = 0.688).
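The c-statistic reported for these models can be read as the proportion of Transition/Continuity pairs in which the model assigns the higher predicted probability to the Transition. The sketch below computes it by brute-force pair counting; the probabilities and outcomes are invented for the example, and the function is a hypothetical illustration, not the routine used by lrm() or validate().

```python
def c_statistic(probabilities, outcomes):
    """Concordance index: share of (positive, negative) pairs ranked correctly.

    Ties in predicted probability count as half a concordant pair.
    """
    positives = [p for p, y in zip(probabilities, outcomes) if y == 1]
    negatives = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


# Invented predictions: 1 = Transition, 0 = Continuity.
predicted = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
observed = [1, 1, 1, 0, 0, 0]
c_value = c_statistic(predicted, observed)
print(round(c_value, 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which gives a scale against which the values of 0.688 and 0.726 can be read.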
5 Discussion
5.1 Statistical modeling
Statistical modeling confirmed the qualitative analysis proposed by Yule (1980) as well as subsequent qualitative analyses, which suggested that a high register level is a cue to topic transition. However, it should be stressed that register level was not directly analyzed here, but rather looked at through the lens of two indirect parameters: maxF0 and key. These findings confirm and expand on the results presented in Nakajima and Allen (1993) for F0 peaks. The speakers’ baseline (minF0) is not significantly raised, which confirms frequent observations that register variations tend to occur in the upper voice range (Szczepek Reed, 2011a; Wichmann, 2000).
Relative to speakers’ habitual profiles, the mobilization of such prosodic cues corresponds to a marked upgraded prosodic design (top 25% values). This finding is consistent with the general assumption that continuation is habitually an unmarked action in conversation, and that departing from it, as in the case of a topic transition, requires a marked action and marked linguistic design. This is the argument found, for example, in Schegloff (1990) and Holt and Drew (2005) as to why stepwise (vs. disjunctive) transition is the norm in conversation: it fits prior talk and allows for a smoother topic flow. While it did not compare stepwise and disjunctive transitions, the present study suggests that the action of opening a new topic is treated by speakers as a disjunctive action, which is consequently associated with the use of a marked prosodic cue. One could see some degree of iconicity in such a mapping of interactional action and prosodic design.
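The idea of a cue being marked relative to a speaker's habitual profile can be sketched in Python as follows. This is only an illustration under stated assumptions: it treats a value as marked when it exceeds that speaker's own 75th percentile, the function name code_marked_values and the toy F0 values are hypothetical, and the study's actual normalization procedure may differ in detail.

```python
from statistics import quantiles


def code_marked_values(values_by_speaker):
    """Flag, per speaker, the values above that speaker's own upper quartile."""
    coded = {}
    for speaker, values in values_by_speaker.items():
        # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
        upper_quartile = quantiles(values, n=4, method="inclusive")[2]
        coded[speaker] = [value > upper_quartile for value in values]
    return coded


# Toy maximum F0 values in Hz, invented for illustration only.
toy_max_f0 = {
    "speaker_a": [180.0, 190.0, 200.0, 210.0, 260.0],
    "speaker_b": [95.0, 100.0, 105.0, 110.0, 150.0],
}
coded = code_marked_values(toy_max_f0)
```

In this toy example, 150 Hz counts as marked for speaker_b even though it lies below every value produced by speaker_a, which is the point of normalizing by speaker rather than using absolute thresholds.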
The main contribution of the present study is to show the relevance of pitch register variations for the global management of talk-in-interaction. Previous work has demonstrated that speakers can mobilize register variations for interactional actions at the local level. Zellers (2013) adopted an initializing approach and compared the prosody of turns initiating transition with that of their previous turns and following turns. She found that the turn initiating the transition does not have a more expanded span than the preceding turn, but that the turn following the transition—the one consolidating the switch of topics—has a more expanded span. Her study thus shed light on the local prosodic management of topic transition, focusing on stepwise transitions. Thanks to its normalizing approach, the present study suggests that speakers routinely mobilize a marked prosodic design for their topic transitions, that is, using their upper and/or more expanded register. Consequently, topic transitions tend to be prosodically distinct from all other turns-at-talk. Despite the variety of topic transitions included (stepwise, disjunctive), these results suggest that they still form a group of turns sharing a common interactional function. This also supports the idea that topic transition is an interactional activity of high relevance to speakers, and that specific linguistic strategies are allocated to its implementation.
5.2 Limitations and further research
The specific contribution of this study is to combine the analysis of several prosodic cues pertaining to different aspects of pitch register. The inclusion of additional prosodic cues such as onset height or speech rate could form the object of further research to complement our understanding not only of the contribution of individual prosodic cues but also of the way they intersect.
A perception study manipulating the variables analyzed here could inquire into the speakers’ perception of fine alterations of the F0, and determine which cues they rely on most crucially. It could also help untangle the distinct but intertwined dimensions of register level and register span—something that is very difficult to do in auditory and instrumental analysis of spontaneous interaction.
The results presented here suggest that speakers do use register variations to cue an interactional task. The question of whether they treat such variations as discrete or gradient remains open. One way to answer this question specifically with topic transition in mind could be to determine whether speakers treat a topic transition as more disjunctive or abrupt when register variations are the most striking. An imitation task such as the one imagined by Pierrehumbert and Steele (1989) and reported by Gussenhoven (1999) might provide answers on the matter.
The results of the present study concern American English, and it remains to be determined whether they hold for other languages. It may also be worth considering whether these findings apply to other varieties of English, for example for speakers who routinely use uptalk (Warren, 2016) or in varieties that favor extreme high-rise pitch movements, such as Australian English (Fletcher & Harrington, 2001), since pitch movements of large amplitude necessarily affect register span. Recent findings also suggest that some interactional mechanisms may be universals (see Levinson, 2006a and 2006b for his discussion of the “interaction engine”), and in this light it may be fruitful to ask whether register variations are routinely used by speakers across languages to cue global interactional management.
5.3 Interplay with verbal cues
One does not expect absolute pairings of intonational forms and linguistic functions, as prosodic cues are versatile and can be used for a variety of purposes in interaction. However, this study contributes to a growing body of research showing that prosody can be considered part of grammar, as it can be mobilized alongside verbal forms to cue specific interactional actions. Example (4) is a case in point, in which high register level is combined with a discourse marker to cue a topic transition. RIC has been telling FRE about his recent break-up with his partner Jeanie (“her” l.1), and how he still hopes they can get back together.
RIC uses a combination of two discourse markers (so and I mean) and high register level for his topic transition l.4. Interestingly, RIC mobilizes a high register level only at the beginning of his transition, and this moment of upward shift encompasses the two discourse markers. This combination of high register level and a chain of two discourse markers early in the transition helps signal that it is to be understood as a disjunction from previous turns. Without a high register level indicating a new beginning, or at least a boundary of some sort, RIC’s transition could have been interpreted as a continuation of the previous sequence—especially because of the discourse marker so, which can signal continuity. Indeed, it may very well be a form of conclusion wrapping things up on the topic of “getting back together.” High register level here functions as a signal that RIC is embarking on something new: a different part of his narrative—this time, about a specific telephone conversation with his ex-partner. After the transition, RIC reverts to a lower register level (l.5–7).
Prosodic cues are thus part of the protean repertoire available to speakers shaping on-going talk. This study was part of a larger research project investigating the role of verbal cues along with prosodic cues in topic transition. A subsequent publication will report on the interplay between register variations and verbal cues such as discourse markers and questions.
6 Conclusion
Prosody is one of the modalities that can be mobilized by speakers in the environment of a topic transition, and conversational participants display their orientation to prosody as a crucial cue to topic structure. In this paper, I demonstrated that two dimensions of register variations are routinely mobilized to transition to a new topic in American English conversation: those of register level and register span.
Statistical modeling confirmed that at least three prosodic variables are involved in the prosodic marking of topic transition: a higher maxF0, a higher key, and an expanded register span. A raised minF0 does not help predict either Transition or Continuity. However, the other three variables are significant predictors, with expanded register span being the strongest of the three. When initiating a topic transition, speakers are:
twice as likely to use a higher maximum F0
twice as likely to use a higher key
more than 2.5 times as likely to use an expanded register span
Relative to speakers’ habitual profiles, the mobilization of such prosodic cues corresponds to a marked upgraded prosodic design (top 25% values). This finding is consistent with the general assumption that continuation constitutes the norm in conversation, and that departing from it, as in the case of a topic transition, requires a marked action and marked linguistic design. Consequently, it can be argued that the disjunctive action of opening a new topic corresponds to the use of a marked prosodic cue.
Appendix 1. Transcription conventions
The transcription conventions used in this paper mostly correspond to the revised system devised by Gail Jefferson for Conversation Analysis (see for example Jefferson (2004) and Hepburn & Bolden (2013)), but with normalized orthography, following Szczepek Reed (2011a) and Thompson, Fox, & Couper-Kuhlen (2015), inter alia. Symbols transcribing prosody are inspired by Szczepek Reed (2011) for the bracket notations (< >), with an additional notation for register span inspired by Di Cristo et al. (2004). Each numbered line in the transcripts corresponds to a turn-constructional unit (TCU).
Appendix 2. Mixed-effect logistic regression model
Mixed-effect logistic regression—prosodic cues to topic transition.
| Predictor | Odds Ratio | [95% C.I.] | p-value | Group | Variance | S.E. |
|---|---|---|---|---|---|---|
| | | | | | 0.05 | 1.23 |
| Raised minF0 | 0.90 | [0.51 – 1.60] | 0.73 (n.s.) | | | |
| Raised maxF0 | 1.91 | [1.07 – 3.41] | 0.03 (*) | | | |
| Raised key | 2.04 | [1.20 – 3.50] | 0.009 (*) | | | |
| Expanded span | 2.69 | [1.49 – 4.87] | 0.001 (*) | | | |
N = 450; AIC = 556.7; C-statistic = 0.726.
C.I.: Confidence Interval; S.E.: Standard Error; n.s.: not significant (p ≥ 0.05); *: significant (p < 0.05).
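To make the odds ratios in the table easier to read, the following Python sketch shows how an odds ratio shifts a probability. The odds ratios are those reported in the table; the baseline probability of 0.5 and the function name shift_probability are assumptions introduced purely for illustration and do not come from the study.

```python
def shift_probability(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the resulting probability."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratios as reported for the significant predictors in the mixed-effect model.
reported_odds_ratios = {
    "raised maxF0": 1.91,
    "raised key": 2.04,
    "expanded span": 2.69,
}

# Hypothetical baseline: Transition and Continuity equally likely without the cue.
ASSUMED_BASELINE = 0.5

shifted = {
    name: shift_probability(ASSUMED_BASELINE, ratio)
    for name, ratio in reported_odds_ratios.items()
}
```

Under that assumed baseline, an odds ratio of 2.69 moves the probability from 0.5 to roughly 0.73. The shift would differ for another baseline, since odds ratios multiply odds rather than probabilities.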
Acknowledgements
I am very grateful to Elisabeth Delais-Roussarie, Eric Corre, and Dylan Glynn for their kind and most helpful comments and suggestions at various stages of the doctoral research from which this paper stems. I would also like to thank the participants who provided helpful comments on a previous version of this study when it was presented as a poster at the Phonetics and Phonology in Europe (PaPE) 2015 conference that took place at the University of Cambridge in June 2015. Finally, I am very grateful to two anonymous reviewers, whose constructive criticisms have enabled me to improve this paper greatly. Any shortcoming remains my own.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Scientific Council of the Sorbonne-Nouvelle Paris 3 through a doctoral grant awarded by the Doctoral School of English, German, and European Studies of the Sorbonne-Nouvelle Paris 3; additional support was provided by the Department of Linguistics and Doctoral School of Linguistics of Paris Diderot Paris 7 University.
