The dual status of filled pauses: Evidence from genre, proficiency and co-occurrence

Abstract

The present corpus study aims to contribute to the debate regarding the lexical or non-lexical status of filled pauses. Although they are commonly associated with hesitation, disfluency, and production difficulty, it has also been argued that they can serve more fluent communicative functions in discourse (e.g., turn-taking, stance-marking). Our work is grounded in a usage-based and discourse-functional approach to filled pauses, and we address this debate by examining the multiple characteristics of euh and eum in spoken French, as well as their co-occurrence with discourse markers. Combining quantitative and qualitative analyses, we analyze their distribution across different communication settings (prepared monologs vs. spontaneous conversations) and levels of language proficiency (native vs. non-native). Quantitative findings indicate differences in frequency, duration, position, and patterns of co-occurrence across corpora, and our qualitative analyses identify fine-grained differences, mainly two distinct patterns of distribution (initial position clustered with a discourse marker vs. medial position clustered with other hesitation markers), reflecting the different “fluent” and “disfluent” uses of filled pauses. We thus argue for a dual status of euh and eum based on formal, functional, and contextual features.

Keywords

filled pauses, pragmatic markers, proficiency, disfluency, usage-based

1 Introduction

Filled pauses, also commonly known as uh and um, have been studied extensively in different languages, such as French (e.g., Candea, 2000), English (e.g., Clark & Fox Tree, 2002), Japanese (e.g., Watanabe et al., 2008), Dutch (e.g., De Leeuw, 2007) or Portuguese (e.g., Moniz et al., 2007). Despite phonological variations across languages, filled pauses mainly consist of two contrasting variants, namely a central vowel uh and a nasal sound uhm (Clark & Fox Tree, 2002, p. 92). For all we know about filled pauses, there is still some debate regarding their status as lexical or non-lexical items (Clark & Fox Tree, 2002). Furthermore, if filled pauses are part of the lexicon, to which linguistic category do they belong? The present study addresses this issue by triangulating evidence from multiple data types in French (native vs. non-native; prepared vs. spontaneous speech) and multiple linguistic variables, including their co-occurrence with discourse markers (e.g., ben “well,” mais “but,” donc “so”) and their pragmatic functions.

Filled pauses are commonly associated with hesitation, as they are said to arise when speakers are uncertain (e.g., Smith & Clark, 1993) or when they have choices to make (Finlayson & Corley, 2012). However, evidence suggests that they can also mark discourse structure (Swerts, 1998) and manage turn-taking (Beňuš, 2009; Kjellmer, 2003). Given their ambivalent status, filled pauses have been described in radically different ways, which Clark and Fox Tree (2002) summarized as: (i) “filler-as-symptom,” (ii) “filler-as-signal” and (iii) “filler-as-word.” In the first view, filled pauses are considered as pure “symptoms” of production difficulties (e.g., Levelt, 1983). By contrast, advocates of the second position acknowledge that filled pauses can “signal” some linguistic functions but refrain from classifying them into standard linguistic categories. For instance, O’Connell and Kowal (2005) rule out the comparison between filled pauses and interjections by pointing out that, unlike the latter, the former do not express emotions. Similarly, Corley and Stewart (2008) further argue that there is no clear evidence that filled pauses are words with communicative meanings.

The third position (filler-as-word) is attracting growing support in corpus-based and experimental research. Clark and Fox Tree (2002) claim that filled pauses present the same characteristics as words with respect to phonology, prosody, syntax and semantics. More recently, Kirjavainen et al. (forthcoming) conducted perception experiments on filled pauses in constructions with specific lexical items and showed that participants recognized some constructions as more acceptable than others, in particular when the filled pause follows a given word (e.g., said um). The authors suggest that filled pauses could therefore be cognitively represented as grammatical clitics (see also Schneider, 2014). Tottie (2011, 2014, 2015, 2016, 2019) dedicated several studies to this issue and proposed the term planners (also used by Jucker, 2014), as their main function would be to signal the online production of utterances. She showed that the use of filled pauses is conditioned by several factors such as register, context, gender or social class. Tottie (2016) concludes that filled pauses should be treated on a par with pragmatic markers such as you know or well, with which they frequently co-occur (cf. also Swerts, 1998).

The present corpus study aims to contribute to this debate by examining the characteristics of filled pauses in French. In particular, we take up Tottie’s (2016) claim that filled pauses are similar to discourse markers by comparing their characteristics when they co-occur with a discourse marker and when they do not. In addition, we also compare the rate and features of filled pauses in native and non-native French, as well as in prepared and spontaneous settings. In doing so, we will triangulate evidence in support of a dual view of filled pauses, in keeping with a functional, usage-based approach to discourse.

We will first review previous studies on the use and distribution of filled pauses across contexts and speakers (Section 2). Our usage-based approach to filled pauses will be introduced in 2.3. In Section 3, we will present the corpora and method. Our analysis in Section 4 will then test whether the frequency, duration, form and position of filled pauses vary across contexts of use and speaker proficiency. We discuss the theoretical implications of our findings regarding the status of filled pauses in Section 5. Finally, we provide some conclusions and perspectives in Section 6.

2 Filled pauses vary with context

In this section, we first take stock of previous studies on the variation of filled pauses across communicative settings (2.1) and across native and non-native speakers (2.2). We then present our usage-based approach to filled pauses and discuss in particular their status with respect to discourse markers and other related phenomena (2.3).

2.1 Filled pauses across communicative settings

Several authors have conducted corpus-based studies in order to investigate the impact of the communicative context on the uses and functions of filled pauses in discourse. For example, Crible (2018) found that filled pauses frequently co-occur with discourse markers, especially in contexts of low interactivity where speakers produce long stretches of talk, such as lectures or political speeches. She further found that filled pauses were more frequent in face-to-face interviews compared to radio interviews, which may be due to the higher degree of preparation and professionalism in broadcast settings.

Others have also been concerned with the role of conversation topic and topic familiarity. For instance, Schachter et al. (1991) compared rates of filled pauses in academic lectures, and found higher rates in humanities compared to social and natural sciences, which the authors explain by the number of options that different disciplines offer to talk about their subject matter. Merlo and Mansur (2004) investigated the role of topic familiarity and found no difference in frequency between familiar and unfamiliar topics; however, they observed differences across types of information (e.g., attributes, spatial location, comments). Similarly, Bortfeld et al. (2001) conducted a large corpus study involving 48 pairs of speakers who discussed familiar and unfamiliar topics, to test whether disfluencies (such as restarts, repeats, and filled pauses) would increase with heavier planning demands. While most disfluencies were more frequent during the description of unfamiliar topics, the opposite tendency was found for filled pauses. They concluded that while turn-initial filled pauses could reflect a planning effort, it may not be because speakers were experiencing trouble, but because they were displaying their intention to take the turn, thus performing an interpersonal function. This supports the idea that filled pauses can be used as resources for discourse and turn-taking purposes (in line with Sacks et al., 1974; Schegloff, 2010). In addition, Michel et al. (2007) investigated the influence of task condition on fluency rates in L2 speech and found significant differences between monologic and dialogic situations. They compared dyadic phone conversations (dialog) and messages left on an answering machine (monolog). Results showed that the non-native speakers produced simpler sentences during dialogs, and in fact produced significantly fewer filled pauses than during monologs (which led to fewer errors). This positive effect of interactivity was not what the investigators of the study had initially predicted, and they suggested that it could be explained by the type of speaking task (a phone conversation). Speakers were perhaps more likely to help their partners by taking the turn when the latter were pausing (thus leading to shorter and more fluent utterances). During monologs, on the other hand, speakers can no longer rely on their conversational partner to yield the turn, which may compel them to produce longer disfluent utterances.

Furthermore, Duez (1982) compared the uses of filled and unfilled pauses in the speech of French politicians across three different contexts (political speech, political interviews, and casual interviews) in order to investigate their possible stylistic function. Her results showed that pauses were much more frequent in political and casual interviews than in political speeches but were strikingly longer in the latter. She suggested that the high rate of pauses in interviews may relate to speakers’ focusing on planning and production issues in spontaneous settings, while the long duration of pauses in prepared speeches may perform a stylistic function, namely, to emphasize what is being said.

These corpus studies have shown the impact of degree of preparation and formality on the distribution of filled pauses. Other factors, such as anxiety, may also come into play. Christenfeld and Creager (1996) investigated the relationship between filled pauses and anxiety in a production experiment with undergraduate students. They found significant differences between the low anxiety and high anxiety conditions, with an average of seven filled pauses per minute in the latter and four in the former. They concluded that the use of filled pauses was not necessarily a by-product of anxiety, but a sign that students were more self-conscious of their speech (cf. Broen & Siegel, 1972). The role of such self-monitoring can also explain Tottie’s (2014) corpus findings that showed a higher frequency of filled pauses in task-oriented contexts (deliberation, presentation of evidence), where there can be professional pressure and/or important outcomes at stake, than in casual conversation, where speakers might not be very self-conscious.

These studies paint a complex picture of multiple factors and multiple functions: although genre variation seems to suggest that filled pauses primarily reflect cognitive processes (cf. the “filler-as-symptom” view), some authors also propose discourse-functional interpretations, which are more compatible with the “filler-as-signal” (or “filler-as-word”) approach.

2.2 Filled pauses in native and non-native speech

Another key component regarding filled pause production is language proficiency. Numerous studies on filled pauses and hesitation phenomena have shown distributional differences between native and non-native speech. For example, Tavakoli (2011) found differences in position: non-native speakers tend to produce mid-clause filled pauses while native speakers produce them more frequently at discourse boundaries. Fehringer and Fry (2007) found significant differences in the number of hesitation phenomena produced by bilingual speakers of German and English, with higher rates in their second language. De Jong (2016) further showed that high-proficiency Dutch learners produced fewer pauses than low-proficiency ones. Similarly, Riazantseva (2001) found that Russian learners paused more frequently in their L2 than in their L1, and their pauses were also found to be significantly longer in their second language. This was also the case in Kahng’s (2014) study of Korean learners, who produced pauses that were almost twice as long as those produced by native English speakers. Hesitation and fluency phenomena have thus been shown to be key components of L2 proficiency and of the way non-native speakers differ from native speakers in their spoken productions.

Gilquin (2008) conducted a corpus study on hesitation markers and “smallwords” (e.g., kind of, well, I mean) produced by French learners of English and native English speakers in interviews. Her study showed that pauses were very frequent among both native speakers and learners, but that the latter produced them more frequently. One interesting finding is that, while French-speaking learners overused pauses (both filled and unfilled), they did not make use of the full range of smallwords. In fact, smallwords were heavily underused. She gives the example of like, which was very common in native English speech, but almost absent in French learner speech. She added that filled pauses were crucial to non-native speakers as a conversational strategy, as they can be used to signal production difficulties to their conversational partner, but also to keep the floor or to be more polite, functions that also exist in native use. This functional approach to hesitation phenomena, which lies at the core of most corpus-based studies, aims to support an ambivalent view of filled pauses and regards them as conversational tools.

Overall, using filled pauses in very high frequencies and/or in non-typical positions seems to be associated with low proficiency, but it has also been shown that they can be used to perform essential interactive functions. This ambivalence, also observed across communicative settings in native data (cf. Section 2.1), prompts us to adopt a functional, usage-based approach to the issue of the status of filled pauses, which we develop in the next section.

2.3 Filled pauses, discourse markers and (dis)fluency: a usage-based approach

Filled pauses have often been categorized as disfluency markers by psycholinguists (Ferreira & Bailey, 2004; Shriberg, 1994) as their main formal characteristic is to interrupt the production of utterances. But as described earlier, filled pauses can serve many other functions besides signaling an interruption in speech. This paper is grounded in a functionally ambivalent view of (dis)fluency (see Crible et al., 2019; Götz, 2013; Kosmala, in press; Pallaud et al., 2013) which no longer considers disfluency phenomena as a binary opposition between “fluency” and “disfluency” but rather adopts a usage-based approach in which fluencemes, such as filled pauses, have the potential to serve both fluent and disfluent functions, depending on the configuration and context of use. Thus, mid-utterance, lexical-search uses of filled pauses might relate more to disfluency, whereas turn-initial uses might contribute to the smoothness and flow of the interaction.

In this respect, filled pauses are similar to another pragmatic class of (dis)fluent devices, namely “discourse markers” (Schiffrin, 1987), which are also polyfunctional. Discourse markers are frequent expressions such as well, so or actually that serve to manage the discourse structure and the interaction, through a wide range of functions including marking discourse relations (e.g., consequence, contrast) but also expressing the speaker’s subjectivity (e.g., I mean) and addressing the hearer (e.g., you know) (see Maschler & Schiffrin, 2015 for a recent overview). Crible (2018) provided a detailed study of the many functions of discourse markers across genres, using their co-occurrence with other fluencemes (including filled pauses) as a criterion to evaluate the relative fluency of different discourse markers. For instance, she found that markers expressing the “reformulation,” “punctuation,” or “monitoring” functions tended to be rather disruptive and disfluent, whereas discourse markers performing a “sequential” function were more associated with discourse structure and fluency (see also Crible & Pascual, 2020).

The frequent clustering of discourse markers and filled pauses, as observed by Crible et al. (2017) in various genres of English and French, has been taken by Tottie (2016) as evidence in support of a unifying view of discourse markers and filled pauses as belonging to the same category of “planners” (her term). In her study, she looked at the co-occurrence of filled pauses with discourse markers such as well, you know, I mean, or like. She found that filled pauses and discourse markers can be used turn-initially to gain time while planning the upcoming turn. Her results showed that a majority of uhm tokens (70%) did not cluster with any discourse markers. However, she found a tendency for coordinating conjunctions such as and and but to co-occur with filled pauses (13% of all conjunctions), in line with Schneider (2014, p. 9) who pointed out that they often merged together to form chunks such as anduh and butuh. Tottie concludes that the shared functions and frequent co-occurrence between filled pauses and discourse markers vouch for a single category of “planners” that includes both types of elements.

Tottie’s (2016) argument, that repeated co-occurrence leads to joint categorization, is in line with the principles of Cognitive Grammar and usage-based linguistics (e.g., Glynn, 2010, p. 8). In the present study, we follow this line of reasoning and add a functional dimension to the analysis of co-occurrence: given the polyfunctionality of discourse markers, we propose to systematically disambiguate the meaning-in-context of discourse markers clustered with filled pauses, using Crible and Degand’s (2019) coding scheme in four discourse “domains,” which will be presented in the next section. The type of function expressed by the discourse marker is taken to be “passed onto” the adjacent filled pause, which will in turn shed more light onto its potential pragmatic nature.

Overall, the ambivalence of filled pauses and other fluencemes, with respect to their functions and relative degree of (dis)fluency, echoes the issue of their (lexical or other) status: different contexts of use may result in different categorizations depending on the function-in-context. In other words, whether or not filled pauses are words might not have a unique answer but rather requires a context-bound approach investigating recurrent form-function patterns, as previous studies have done for other discourse-pragmatic phenomena (e.g., Fischer, 2015; Fried & Östman, 2005). To meet this goal, the present study analyzes the frequency and characteristics of filled pauses across three variables: i) speech genre (prepared monolog vs. spontaneous dialog); ii) speaker nativeness (native vs. non-native French); and iii) co-occurrence (clustered with a discourse marker or not). This novel combination of variables is expected to further our understanding of the production of filled pauses.

In sum, we propose a quantitative-qualitative analysis that aims at reconciling the three views of filled pauses summarized by Clark and Fox Tree (2002), on the basis of systematic corpus analyses that are detailed in the following section.

3 Corpus and method

3.1 Data

Given the rarity of French spoken data in native and non-native speech that has been transcribed and coded for the purpose of (dis)fluency research,¹ we selected an available videotaped corpus of L1–L2 interactions of French and American speakers, based on an existing richly annotated sample (Kosmala, in press). In order to conduct further cross-genre analyses, we chose a second dataset with comparable features where two genres are included.

The first data source is the SITAF corpus (Horgues and Scheuer, 2015), which recorded French and American undergraduate students of approximately the same age (in their early twenties) studying at the same university. The students were part of a tandem program, which randomly assigned pairs of French and English-speaking students, who met once a month during the academic year. They were videotaped during semi-guided speaking tasks in which they were asked to discuss a given topic and agree on their level of agreement. These tasks lasted 2–3 minutes on average. The selected data for the present study includes six pairs of French and American students conversing in French.

The second corpus used for our study is the DisReg corpus (Kosmala, 2020), which comprises 12 French undergraduate students (aged 18–21) enrolled in a French literature class. They were recorded in two different communication settings, first in pairs during a casual conversation, and a second time individually during class presentations. The class presentation was a graded oral assignment which they prepared at home. They had their notes written on a piece of paper, and very often just read them aloud without spontaneously engaging with their audience (the classroom). For the conversation part, the assigned pair knew each other fairly well: they were either friends or classmates. They were given a few topics to talk about to start with (funny anecdote at university, last film seen on TV, etc.) but they were also free to talk about anything else, which they often did. The recordings were on average much longer than in the SITAF corpus (about 25–30 minutes on average) so we randomly selected 3–4 minutes from each recording of the DisReg corpus to match approximately the average size of the recordings found in SITAF.

Table 1 gives the corpus size in number of words and total duration, broken down by speaker group and genre in the two corpora under scrutiny.

Table 1.

Corpus size.

	SITAF corpus	DisReg corpus
Number of words	Native speakers: 2323; non-native speakers: 2253	Class presentations: 5609; conversations: 6981
Duration (min:s)	Native speakers: 15:16; non-native speakers: 11:30	Class presentations: 34:30; conversations: 31:30
Participants	Twelve participants (aged 18–21): six American speakers, six French speakers	Twelve participants (aged 18–21): French speakers

3.2 Method

Filled pauses were identified according to their phonological variants (mainly [ə(:)] and [ə(:)m]), designated as euh and eum.² They were annotated for duration in milliseconds and position in the intonation unit (defined as “a stretch of speech uttered under a single coherent intonation contour,” Du Bois et al., 2013, p. 47). Five positions were distinguished: initial (Example 1), medial (Example 2), final (i.e., when the unit is incomplete; Example 3), standalone (Example 4), and interrupted by another speaker (Example 5).

(1) (0.410) euh on a seulement toi³

(0.410) euh we only have you

(2) même pour la dissert (0.685) euh c’est les deux pires

even for the essay (0.685) euh these two are the worst

(3) (0.422) ils essayent de le sauver et pour euh –

bon pour Scapin c’est un peu différent

(0.422) they’re trying to save him and for euh—

well for Scapin it’s a little different

(4) une anecdote t’en as pas une?

eum (1.120)

ah c’est bon!

an anecdote don’t you have one?

eum (1.120)

ah I have one!

(5) <spk1> (0.465) le maître et euh –

<spk2> j’ai pas encore lu Tartuffe

<spk1> (0.465) the master and euh—

<spk2> I haven’t read Tartuffe yet

The immediate context of the filled pause was then analyzed to see whether the item formed a cluster with at least one other fluenceme. We included the following types of fluencemes in this analysis: unfilled pauses (400 ms minimum duration threshold, following Derwing et al., 2009 and Tavakoli, 2011), lengthenings (marked prolongations of phonemes), repetitions (non-semantic repetitions of a word or a segment), self-repairs (reformulations made by the speaker), false starts (self-interrupted and incomplete units), non-linguistic sounds (such as tongue clicks and inbreaths) and discourse markers. Following Crible et al. (2019, p. 22), we define discourse markers as “optional expressions with a procedural meaning and a discourse-structuring function,” which includes connectives (such as mais “but” or alors “so”) and other pragmatic particles such as ben or bon “well.” In our sample, the full list of discourse markers in a cluster with a filled pause is the following: alors “well/then,” après “after/but,” bah “well,” ben “well,” bon “well,” donc “so,” du coup “so,” en fait “actually,” enfin “I mean,” et “and,” mais “but,” même si “even if,” ou “or,” par exemple “for example,” parce que “because,” puis “then,” si “if,” voilà quoi “that’s it you know,” voyons “let’s see.”

Each filled pause instance was thus categorized as either isolated or clustered. In the SITAF corpus, we also specifically identified filled pauses that clustered with one (or more) discourse marker(s). Consider these examples:

(6) si on connait euh rien à l’Écosse ou quoi c’est pas grave?

if we don’t euh know anything about Scotland or anything is that okay?

(7) moi je pense que:e en général euh (0.925) si:i si:i si ton pote fa:ait une bêtise

I think tha:at in general euh (0.925) i:if i:if i:if if your friend does something bad

In (6), the speaker produced an isolated occurrence of euh, which does not co-occur with any other fluenceme. In (7), however, the filled pause co-occurs with an unfilled pause, and several repetitions and lengthenings of the discourse marker si. Example (7) thus shows a greater disruption in the speech flow, which is ultimately more “disfluent” than the isolated instance of the filled pause in (6).
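The isolated/clustered distinction illustrated in (6) and (7) can be approximated with a short Python sketch. This is not the annotation procedure used for the corpus: the `Token` structure, the label names, and the rule that only the immediately adjacent tokens are inspected are all assumptions made for illustration. Only the 400 ms threshold for unfilled pauses and the list of fluenceme types come from the description above.

```python
from dataclasses import dataclass

# Threshold for unfilled pauses, as stated in the text (400 ms).
MIN_UNFILLED_PAUSE_MS = 400

# Hypothetical label names for the fluenceme types listed in the text.
FLUENCEME_KINDS = {
    "unfilled_pause", "lengthening", "repetition", "self_repair",
    "false_start", "non_linguistic_sound", "discourse_marker",
}


@dataclass
class Token:
    form: str
    kind: str            # "word", "filled_pause", or one of FLUENCEME_KINDS
    duration_ms: int = 0  # only used for pauses in this sketch


def counts_as_fluenceme(token: Token) -> bool:
    """Unfilled pauses only count when they reach the duration threshold."""
    if token.kind == "unfilled_pause":
        return token.duration_ms >= MIN_UNFILLED_PAUSE_MS
    return token.kind in FLUENCEME_KINDS


def classify(tokens: list[Token], index: int) -> str:
    """Label the filled pause at `index` as 'isolated' or 'clustered'.

    Assumption: the 'immediate context' is the token directly before
    and the token directly after the filled pause.
    """
    neighbours = tokens[max(index - 1, 0):index] + tokens[index + 1:index + 2]
    clustered = any(counts_as_fluenceme(t) for t in neighbours)
    return "clustered" if clustered else "isolated"


# Simplified versions of Examples (6) and (7).
example_6 = [Token("connait", "word"), Token("euh", "filled_pause"),
             Token("rien", "word")]
example_7 = [Token("général", "word"), Token("euh", "filled_pause"),
             Token("(0.925)", "unfilled_pause", 925), Token("si:i", "lengthening")]

print(classify(example_6, 1))  # isolated
print(classify(example_7, 1))  # clustered
```

A wider context window would be needed to capture longer sequences such as the repeated si in (7); the one-token window is kept here only to make the rule easy to read.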

Lastly, filled pauses that clustered with a discourse marker were further analyzed with respect to i) the position of the marker at either the left, right or both sides of the filled pause, and ii) the function of the discourse marker. The latter was operationalized through Crible and Degand’s (2019) framework, where four discourse domains are distinguished:

ideational uses, where discourse markers connect facts (Example 8);

rhetorical uses, where discourse markers connect ideas, opinions or express the speaker’s subjectivity (Example 9);

sequential uses, where discourse markers signal major transitions (topics or turns) and regulate the flow of speech (Example 10);

interpersonal uses, where the markers are hearer-oriented for monitoring or (dis)agreement purposes (Example 11).

(8) si on vient à Paris et on on veut voir la tour Eiffel euh je ne sais pas Notre Dame voilà

alors euh c’est—

on on e:est on a une vision qui e:est un peu limitée

if you go to Paris and you want to see the Eiffel tower uh I don’t know Notre Dame

alors ‘then’ uh it’s –

you you are you have a vision that is a bit limited

(9) le sujet c’était euh (0.509) quand quelqu’un se se trompe leur petit ami ou quelque chose

c’est grave oui euh mais pas dans la même manière

the topic was uh when someone cheats on their boyfriend or something

it’s serious yes uh mais ‘but’ not in the same way

(10) parce que c’est mal de laisser da:ans croire qu’il a raison alors que pas du tout

donc euh imaginons on est dans le cadre d’une dispute

because it’s wrong to let someone believe they’re right when they’re really not

donc ‘so’ uh let’s imagine we’re in an argument

(11) alors un vrai ami doit prendre notre défense quoi qu’il arrive

bah oui euh ce sont des vrais amis

so a true friend must take our side no matter what

bah ‘well’ yes uh they are true friends

In (8), alors connects a condition (coming to see only the Eiffel tower) with its consequence (you only get a limited view of Paris). In (9), the speaker nuances his subjective evaluation of an immoral action (cheating). In (10), the speaker uses donc to mark the start of a scenario used as an example. Finally, in (11), the speaker emphasizes his agreement with the interlocutor by starting his turn with bah. The two authors performed this functional analysis in a double-blind way; we then compared our results and discussed disagreements until a consensus was reached for all cases. More details on this framework (inter-annotator agreement, operational definitions, etc.) can be found in Crible and Degand (2019).

3.3 Data analysis

A total of 664 occurrences of filled pauses were extracted from the DisReg corpus and 223 from SITAF, all coded for four variables (form, duration, position, isolated vs. clustered use). In addition, the instances from the SITAF corpus were annotated for the presence of a discourse marker. In this data, we found 69 cases of co-occurrence with a discourse marker, which were further coded for position and function of the marker.

To analyze this data, we ran log-likelihood tests to measure frequency differences across corpora (on http://ucrel.lancs.ac.uk/llwizard.html) and computed z-scores to assess the significance of differences between proportions (e.g., rate of isolated vs. clustered uses, on http://vassarstats.net/propdiff_ind.html). We further ran linear mixed-effect regression models on filled pause duration, using the lmer function of the {lme4} package (Bates et al., 2015) in R.
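For readers who wish to reproduce this kind of test without the online calculators, the two statistics can be sketched in Python as follows. The formulas are the standard two-corpus log-likelihood (G²) and the pooled two-proportion z-score; that the online tools implement exactly these variants is an assumption, and the function names are illustrative. The counts in the usage lines are those reported for the DisReg corpus in Section 4.1.

```python
import math


def log_likelihood(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """Two-corpus log-likelihood (G2) for a difference in frequency.

    freq_* are item counts, size_* are corpus sizes in words.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    # Expected counts if the item were equally frequent in both corpora.
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """z-score for the difference between two independent proportions,
    using the pooled standard error."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Filled pauses and word counts: presentations vs. conversations.
ll = log_likelihood(385, 5609, 281, 6981)
# Nasal variant eum out of all filled pauses: presentations vs. conversations.
z = two_proportion_z(88, 385, 22, 279)
print(f"LL = {ll:.2f}, z = {z:.2f}")
```

Under these assumptions, the printed values should land close to the figures reported in Section 4.1.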

4 Results

4.1 Filled pauses in class presentations versus casual conversations

In the DisReg corpus, which includes productions of French students in two different communication settings (prepared class presentations and spontaneous casual conversations), we extracted 664 instances of filled pauses, including 385 during presentations and 281 during conversations. On average, speakers produced 6.8 filled pauses per hundred words (phw) during presentations and 4.2 during conversations, which was found to be a statistically significant difference (LL = 47.02, p < .001). A linear mixed-effect model with genre as fixed effect and speaker as random effect further shows that filled pauses were also significantly longer during presentations (M = 415 ms, SD = 240 ms) than during conversations (M = 343 ms, SD = 341 ms) (β = –71.97, SE = 25.98, t = –2.770, p < .01). Large individual differences can be observed in Figure 1, which shows speakers’ rates (phw) in class presentations (prepared speech) and casual conversations (spontaneous speech). The letters in the speakers’ codes (A, B, C, etc.) correspond to a pairing of students, each coded 1 or 2.

Figure 1.

Individual rates of filled pauses per hundred words (DisReg).

Figure 1 shows stylistic differences across speakers. The majority of speakers (A1, A2, B1, B2, C1, C2 and E1) produce more filled pauses during presentations than during conversations. Conversely, other speakers (D1, D2, E2, F1) show no major difference in the rates of filled pauses across genres. Finally, speaker F2 shows the opposite behavior, with more filled pauses in the conversation than in the presentation.

Despite these individual differences, we found that filled pauses were significantly more frequent in monologs. This prevalence of filled pauses in presentations is surprising and goes against most previous studies that showed a higher frequency of filled pauses in dialog situations (e.g., Duez, 1982), with the exception of Michel et al. (2007), who found a higher rate of filled pauses during monologs than during dialogs. However, it should be noted that our analysis is also comparing prepared versus spontaneous speech. During their class presentations, the students were allowed to have their notes, and they were delivering an assignment that had already been prepared at home, so they were practically reading their notes. Therefore, the task was very much prepared, as opposed to a spontaneous conversational task where they did not know in advance what they were going to say. Although we expected that the higher planning pressure of spontaneous speech would lead to an increased rate of filled pauses, our results are more in line with the study of Michel et al. (2007), who suggested that filled pauses in monologs can be a by-product of linguistic complexity and longer utterances with no listener contributions.

In addition, anxiety and self-monitoring might further explain the pattern found in our data: the assignment that the students had to prepare counted for 40% of their overall grade, so this task may have caused great anxiety to the speakers. Speaking in front of an audience of peers may also be intimidating. Our findings thus corroborate Christenfeld and Creager’s (1996) and Broen and Siegel’s (1972) account that filled pauses increase when speakers are more self-conscious of their speech. All these conclusions regarding genre differences must be taken with a grain of salt, considering the relatively small number of speakers in our corpus (12), which limits its representativity.
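As an illustration of how a frequency comparison of this kind can be computed, the sketch below derives a rate per hundred words and a log-likelihood score for two subcorpora. It assumes that LL refers to the two-cell log-likelihood ratio commonly used in corpus linguistics, which this section does not state explicitly. The filled pause counts (385 and 279) come from the text, but the subcorpus word totals are not reported here, so the two word counts below are hypothetical placeholders and the result is not expected to reproduce the reported LL value.

```python
import math


def per_hundred_words(count, n_words):
    """Rate of an item per hundred words (phw)."""
    return 100.0 * count / n_words


def log_likelihood(count_a, words_a, count_b, words_b):
    """Two-cell log-likelihood ratio for a frequency difference between two corpora.

    Assumption: this is the variant that compares observed counts with the
    counts expected if both corpora shared a single overall rate.
    """
    total_count = count_a + count_b
    total_words = words_a + words_b
    expected_a = words_a * total_count / total_words
    expected_b = words_b * total_count / total_words
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


# Filled pause counts reported for DisReg.
FP_PRESENTATIONS = 385
FP_CONVERSATIONS = 279

# HYPOTHETICAL word totals: placeholders only, not taken from the study.
WORDS_PRESENTATIONS = 5600
WORDS_CONVERSATIONS = 6600

ll = log_likelihood(FP_PRESENTATIONS, WORDS_PRESENTATIONS,
                    FP_CONVERSATIONS, WORDS_CONVERSATIONS)
print(f"presentations: {per_hundred_words(FP_PRESENTATIONS, WORDS_PRESENTATIONS):.1f} phw")
print(f"conversations: {per_hundred_words(FP_CONVERSATIONS, WORDS_CONVERSATIONS):.1f} phw")
print(f"LL = {ll:.2f} (exceeds 3.84, the critical value for p < .05 with 1 df: {ll > 3.84})")
```

Note that the reported rates are described as speaker averages, so a score computed from pooled totals may differ from one based on per-speaker rates.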

Turning to our qualitative features, more instances of the nasal variant eum were found during presentations (23%, 88/385) than during conversations (8%, 22/279; z = 5.12, p < .001). Nasal filled pauses are typically associated with longer delays and major transitions or boundaries (Clark & Fox Tree, 2002). In addition, differences were also found in the position of the filled pauses: 40% (156/385) occurred in initial position in presentations against 24% (67/279; z = 4.44, p < .001) in conversations, while only 3% of instances (12/385) occurred in final position in presentations against 14% (41/279; z = –5.43, p < .001) in conversations. These findings on phonological variant and position reflect differences of the speech genre: a class presentation requires clear segmentation and discourse boundaries to help the audience follow, so it is not surprising to find more eum and more initial filled pauses than during conversations. The higher rate of final filled pauses in spontaneous interactions, on the other hand, might perform turn-yielding or interpersonal functions (e.g., inviting hearer inferences) and could thus reflect the tendency of dialog participants to rely more on joint productions. No significant differences were found between isolated and clustered instances of filled pauses in the two settings (64% vs. 59%; z = 1.315; p = .17). Let us consider the following examples, produced by the same speaker in the two different situations:

(12) (0.570) euh on a une multiplic- une euh multiplication pardon des destructions dans ce vers deux

euh bien plus riche que que le verbe initial

eum sous la forme d’une accumulation (il) y a une asyndète d’ailleurs

eum (0.680) alors on peut se demander si si ce deuxième vers enflammer débriser ruiner mettre en pièce est une extension du premier

(0.570) euh there is a multiplic- a euh multiplicity sorry of destructions in this second line

euh much richer than than the initial verb

eum in the form of an accumulation and there is an asyndeton as well

eum (0.680) so we can ask ourselves whether whether this second line to inflame to shatter to ruin to break into pieces is an extension of the first one

(13) euh il euh il se baladait avec des eum des euh je sais pas si tu vois les les le:es des gants de pieds

ah oui je vois très bien c’est celles avec les orteils

c’est des chaussures avec le:es euh voilà

avec les orteils

euh he euh he walked around with eum euh I don’t know if you see the the the:e five finger shoes

ah right I see it’s the ones with the toes

it’s shoes with the:e euh right

with the toes

In (12), the speaker produces four initial filled pauses (at the beginning of each intonational phrase), and each time, he is looking at his notes.⁴ The filled pauses may help the speaker segment his speech and structure upcoming discourse, which is similar to what Tottie (2014) and Swerts (1998) found in their data. However, his use of filled pauses is very repetitive (five occurrences in total in this short excerpt), which makes his speech prosodically disjointed and fragmented.

In (13), the filled pauses are used in a radically different way. They mostly occur in medial position (except for the first one) and in contexts of joint lexical search. In this excerpt, the first speaker is talking about an administrative employee who was wearing funny-looking shoes, but he is taking some time to retrieve the noun phrase. This is indicated by the production of two filled pauses and several word repetitions and lengthenings. But he is not only buying time to plan upcoming speech, he is also inviting his partner in this joint word search by addressing her (I don’t know if you see). Joint word search uses of filled pauses can also precede proper nouns, as in (14) below.

(14) ah mais y’a pa:as euh l’autre là comment il s’appelle euh Yvan Yvan Attal

Rod Paradot

Yvan Attal si Yvan Attal c’est le personnage principal

oh but isn’t he euh what’s his name euh Yvan Yvan Attal?

Rod Paradot

Yvan Attal yes Yvan Attal he’s the main character

Speaker 2 first proposes French actor Rod Paradot to help speaker 1 in her search, before coming up with the correct answer, Yvan Attal. According to Tottie (2016), such contexts of use are frequent for filled pauses, as name searching is a common problem among speakers.

In addition to word search, filled pauses in conversation also serve turn-taking functions. Tottie (2016) and others (e.g., Kjellmer, 2003; Schegloff, 2010) have pointed out the roles of filled pauses in turn-taking. In Example (15), the two final filled pauses (“mais euh,” “rôle euh”) could be used by speakers to yield their turn to their partner.

(15) e:et euh e:et Neil Schneider je l’ai pas encore vu mais euh (0.929)

parce-qu’il joue dedans aussi lui?

(0.531) hh euh ouais

(0.631) mais il a un rôle euh (0.924)

je sais pas je l’ai pas vu encore

a:and euh a:and Neil Schneider I haven’t seen it yet but euh (0.929)

so he’s also playing in the film?

(0.531) euh yeah

(0.631) but he has a role euh (0.924)

I don’t know I haven’t seen it yet

They are talking about a film that neither of them has seen, so they both have limited knowledge of it. Speaker 1 first states that she has not seen the film and finishes her turn with a filled pause accompanied by a silent pause. Speaker 2 asks two questions in return, the second of which also ends with a filled pause clustered with a silent pause. This cluster of pauses found in both speakers seems to serve a similar turn-yielding function.

These examples have shown functional and distributional differences of filled pauses across two genres with different degrees of preparation (written notes vs. spontaneous) and interactivity (monolog vs. dialog). Filled pauses seem to perform different communicative functions in face-to-face interactions (turn-taking, lexical search), whereas in presentations they tend to be used for segmentation purposes, which can sometimes be perceived as repetitive and disruptive, considering their high frequency in this genre. Both quantitative and qualitative differences were found in the data, which supports Tottie’s (2011) claim that filled pause use is highly contextual and determined by numerous factors. However, these findings also need to take into account the limited size of the data sample, which is smaller than the corpora in previous corpus-based studies. We shall also return to the issue of individual differences in Section 5. We now turn to the role of nativeness in the production of filled pauses.
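The proportion comparisons reported for DisReg in this section can be checked from the counts given in the text. The section does not name the test behind the z values, so the sketch below assumes a pooled two-proportion z-test; the function name and the dictionary labels are illustrative only. Under that assumption, the resulting z values come out close to those reported for the nasal variant, initial position and final position.

```python
import math


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Pooled two-proportion z statistic with a two-sided p-value.

    Assumption: a pooled standard error is used; the study does not
    specify which variant of the test was applied.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts from Section 4.1: (presentations, out of 385; conversations, out of 279).
comparisons = {
    "nasal variant eum": (88, 385, 22, 279),
    "initial position": (156, 385, 67, 279),
    "final position": (12, 385, 41, 279),
}

for label, counts in comparisons.items():
    z, p = two_proportion_z(*counts)
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.2g}")
```

A positive z indicates a higher proportion in presentations, and a negative z a higher proportion in conversations, which matches the direction of the differences described above.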

4.2 Filled pauses across native and non-native French

In the SITAF corpus, where we sampled native and non-native French data, we extracted 223 instances of filled pauses, including 103 produced by native French speakers and 120 by American learners. This amounts to a rate of 4.4 phw in native speech, and 5.3 in non-native speech, which is not a significant difference (LL = 1.87, p > .05). This result contrasts with the bulk of studies on learner speech, where higher rates of filled pauses were found in non-native and low-proficiency speakers (e.g., De Jong, 2016; Gilquin, 2008). But this lack of significance could also be due to the size of the data, which only sampled six speakers in each group, as well as individual variation, as reported in Figure 2. As the figure shows, there are many individual differences, reflecting speakers’ idiosyncrasies. While one speaker did not produce any filled pause at all (i.e., F13), another produced a very high rate (i.e., A02) compared with the average.

Figure 2.

Individual rates of filled pauses per hundred words (SITAF).

Filled pauses were, however, significantly longer in non-native French (M = 524 ms, SD = 222 ms) than in native French (M = 378 ms, SD = 200 ms), as confirmed by a linear mixed-effect model with speaker as random effect and language group as fixed effect (β = 146.36, SE = 28.43, t = 5.149, p < .001). This suggests that filled pause duration might be a more reliable index of language proficiency to distinguish between native and non-native speakers (cf. Riazantseva, 2001), at least as far as our sample is concerned.

Both speaker groups produced considerably more euh forms than the nasal variant eum, with a similar low rate of nasal pauses (around 15%). The position of filled pauses within utterances is also strikingly the same across corpora, with around 60% of medial uses, less than 30% of initial uses and 10% of final positions. Standalone and interrupted positions are exclusive to the learners, which may reflect the fragmented nature of non-native speech.

More differences arise when we turn to the rate of isolated vs. clustered filled pauses. The data shows a higher rate of isolated uses in native speakers (28% vs. 18%; z = 3.116, p < .05), which suggests that filled pauses in non-native speech are more frequently accompanied by other fluencemes. This frequent clustering tendency in learners can be seen as a sign of a higher degree of disfluency, following Crible (2018) who considered the length of fluenceme sequences as an indicator of higher disruption.⁵ Indeed, while isolated filled pauses might go unnoticed by the hearer or serve local structuring functions, they are more likely to be disruptive when clustered with other fluencemes such as unfilled pauses, repetitions and/or lengthenings. Consider the following example produced by a learner:

(16) mais c’est un peu:u euh –

si on a:a euh (0.660) eum (0.600) euh (1.177) euh –

alors il y a il y a pas un un contrat social

but it’s a little:e euh –

if we ha:ave euh (0.660) eum (0.600) euh (1.177) euh –

then there is no there is no social contract

The fluenceme sequence (in bold) is made of eight different fluencemes (a lengthening, three silent pauses, three filled pauses and a self-interruption). In fact, the speaker’s current utterance was so disfluent that he had to abandon it and start a new one. The filled and unfilled pauses are also very long (up to 1,177 ms for the last one).

On the other hand, sequences of pauses (filled and unfilled) and discourse markers are quite frequent in both learners and native speakers, and do not necessarily interrupt the flow of speech, as in (17) and (18).

(17) c’est grave oui euh mais pas dans la même manière de:e

it’s bad yeah euh but not in the same way tha:at

(18) donc euh imaginons on est dans le cadre d’une dispute

so euh imagine we’re in the middle of an argument

In (17), produced by an American speaker, the filled pause is combined with the discourse marker mais “but,” and in (18), produced by a French speaker, the filled pause is combined with the discourse marker donc “so.” In the former (17), the filled pause occurs in medial position, following an assessment (it’s bad) and a response particle oui “yeah.” This assessment is then refuted, as indicated by the use of the discourse marker mais “but.” In the latter (18), the sequence occurs at the beginning of the intonation unit and projects a new sequence of talk (imagine we’re in the middle of an argument), which is not disruptive at all. Neither fluenceme sequence really disrupts the flow of speech; rather, both build coherence between discourse segments.

In sum, the comparison between learners and native speakers of French in our sample suggests that filled pauses might be a good indicator of language proficiency, as they are longer, occur more often in atypical positions (interrupted and standalone) and are more often clustered with other fluencemes when they are produced by non-native speakers. Despite the small size of our sample, this result lends some support to the filler-as-symptom view of filled pauses, according to which filled pauses are a by-product of higher cognitive demands and planning effort. However, filled pauses are also quite frequent in native speech, and the observed qualitative differences are only relative, albeit significant. A more fine-grained and functional analysis of contextual uses is necessary to refine differences between speaker groups and between different types of filled pauses. To this end, we turn in the next section to the analysis of co-occurrence with discourse markers, extracted from the same native and non-native data.

4.3 Filled pauses and discourse markers

In the same SITAF corpus, we found 69 instances of filled pauses that were clustered with at least one discourse marker, and these are fairly equally distributed across native (37) and non-native speakers (32), although the rate is slightly (non-significantly) higher for the former (36% versus 27%). The average duration of filled pauses is similar with or without a discourse marker (433 ms versus 467 ms), including when we break it down by speaker group: 376 ms vs. 379 ms with or without a discourse marker for native speakers; 499 ms versus 534 ms for non-native speakers (not significant: t = 0.75, p = .23). Similarly, the non-nasal variant is the preferred form of the pause in both types of contexts for both speaker groups, with between 84% and 92% of euh.

However, the position of filled pauses in the utterance differs greatly: they are overwhelmingly medial without a discourse marker (73%), whereas more than half of the clustered cases are initial (57%), with only 28% in medial position. This distribution is shared by native and non-native speakers. This result first suggests that filled pauses might be attracted to the typical initial position of discourse markers, as observed by Crible et al. (2017). It also shows that filled pauses are used in (at least) two clearly distinct formal patterns (initial versus medial). This is illustrated in the following examples:

(19) après euh (0.465) peut-être que y’en a qui se réfugient (0.652) qui:i qui se cachent da:ns dans leur profil

however euh (0.465) maybe there are some people who hide (0.652) who:o who hide themselves behind their profile

(20) peut-être que:e pour les gens euh qui habitent pas près de:e toi

maybe fo:or the people euh that don’t live close to:o you

In (19), produced by a native speaker, the cluster of a discourse marker après “however,” a filled pause and an unfilled pause in initial position of the intonation unit serves a discourse-structuring function, introducing a different argument which contrasts with what the speaker was saying before (it is better to make friends in real life, although some people prefer to hide behind their social media profile). This change in position is reflected by the strong discourse boundary marked by the cluster of fluencemes. In (20), produced by a learner, the filled pause is isolated and in medial position between a relative clause and its antecedent, which constitutes a minor syntactic boundary. These two examples thus illustrate clearly different configurations: clustered use in initial position at a major discourse boundary (19) and isolated use in intonation-medial position at a minor syntactic boundary (20).

This dual pattern is further evidenced by the position of the discourse marker with respect to the filled pause: in native speech, the discourse marker mostly precedes the filled pause (26 out of 37), whereas in non-native speech, left and right positions are equally frequent (12 versus 14). Consider the following examples:

(21) et du coup euh Facebook ça me permet vraiment en fait de:e garder contact (0.577) avec des gens

and so euh Facebook really allows me to stay in touch (0.577) with people

(22) sur les choses qui sont les plus euh les plus intéressants

euh ou pas les plus intéressants mais les plus euh communs

on things that are the most euh the most interesting

euh or not the most interesting but the most euh common

Example (21) from a native speaker shows two co-occurring discourse markers preceding the filled pause, which is similar to Example (19). By contrast, in (22), the non-native speaker starts his intonation phrase with a filled pause followed by the reformulative discourse marker ou “or” in a context of repair. This order (filled pause + discourse marker) is relatively rare in native speech and seems to reflect online planning and repair processes rather than a more “intentional” discourse-structuring function.

The typical native pattern (discourse marker + filled pause) represents what some authors have called a “clitic” use of filled pauses (e.g., Kirjavainen et al., forthcoming), which even prompted Schneider (2014) to suggest anduh and butuh as joint spellings for these recurrent clusters in English. Indeed, through repeated joint exposure, discourse markers and filled pauses might become entrenched as one unit, forming a “complex” discourse marker, similar to and then, with a specific function (see Crible & Cuenca, 2019). By contrast, the higher relative frequency of the opposite pattern (e.g., euh ou) in non-native speech could suggest that learners first and foremost use filled pauses to stall while planning, while their proposition only starts later, with the discourse marker. We can also interpret this non-native pattern as reflecting the underuse of discourse markers (cf. Gilquin, 2008), whereby learners are not making use of discourse markers as a resource to plan upcoming speech while maintaining verbal fluency. They produce filled pauses in first position in a more symptom-like way, instead of integrating them as lexical components of a complex discourse-structuring device. Although some learners are also able to produce the native pattern (marker + pause), the different distribution across speaker groups is an interesting indicator of the dual, ambivalent status of filled pauses. These tentative conclusions should however be confirmed on a larger corpus.

Moving from form to function, we can further observe a difference between speaker groups: while the discourse markers produced by native speakers in a cluster with filled pauses are mainly sequential (i.e., perform a major discourse-structuring function such as turn-taking or topic shifting), the functional distribution of the markers produced by non-native speakers is more scattered across sequential, ideational and rhetorical uses. As a reminder, the latter two domains (ideational and rhetorical) consist of types of discourse relations (factual versus more subjective). More specifically, Figure 3 represents the functional domains of the discourse markers clustered with filled pauses. In the case of multiple discourse markers within a cluster, each item was counted separately.

Figure 3.

Functional domain of clustered discourse markers.

We can see that the main difference between speaker groups is the lower proportion of sequential markers (42% versus 60%) and the larger use of ideational markers (36% versus 7%) in non-native speakers, as in Example (23).

(23) tu leur parles sur Facebook

euh mais tu les vois pas

you talk to them on Facebook

euh but you don’t see them

Here the filled pause separates two short segments of a contrast (talking but not seeing). Such uses of filled pauses in the vicinity of ideational discourse relations are surprising: using a similar coding scheme on native English and French data, Crible (2018) found that filled pauses were mostly clustered with sequential discourse markers, as is the case for the native speakers in our data. Ideational markers typically connect shorter segments in local, almost syntactic relations (e.g., through subordinating or coordinating conjunctions). Therefore, the use of a filled pause in these contexts cannot be attributed to a major discourse-structuring function but rather corresponds to the “minor syntactic boundary” use already illustrated by Example (20) above.

In addition, qualitative examination of the data shows that many non-native uses of filled pauses occur in contexts of lexical search, where the speaker is clearly looking for words. Although such uses can sometimes be observed in native speakers, they are more typical of learners, as in the example below.

(24) oui ma:ais je sais pas parce-que quand euh (0.410) être a- empri- euh emprisonné

yes bu:ut I don’t know because when euh (0.410) to be i- impri- euh imprisoned

In this example, the speaker is experiencing trouble finding the right words in his target language, and this is shown in the use of filled pauses clustered with several other fluencemes (word truncations, self-repair and silent pauses). In this specific case, the filled pauses arise in a highly disfluent context and reflect the speaker’s difficulty in formulating his ideas. Such lexical-search uses of filled pauses never co-occur with discourse markers, which further attests to the difference in function (and possibly, in status) between the two devices—or at least between discourse markers and this particular “disfluent” use of filled pauses.

5 Discussion

Our corpus study has revealed quantitative and qualitative differences in filled pause use across various communication settings and contexts. The multiple formal and functional variables included in our analysis shed some light onto the interplay of factors impacting the use of filled pauses. Still, the issue of their lexical or non-lexical status remains a complex one. In this section, we discuss some methodological limitations and theoretical implications of our findings.

5.1 Beyond the role of preparation: What is cognitively demanding?

We started the analysis of the DisReg corpus with the assumption that the more spontaneous setting (i.e., free conversations) would be more cognitively demanding than the more prepared situation (i.e., class presentations with written notes), which would in turn result in a higher frequency of filled pauses in the former. This was not confirmed by the data, which instead showed more frequent, longer and more nasal filled pauses in prepared presentations than in spontaneous dialog. We have interpreted these findings in relation to rhythmic and stylistic differences between monolog and dialog (cf. Duez, 1982), as well as the turn-taking mechanics of dialogs as opposed to monologs, which leads to longer utterances in the latter (cf. Michel et al., 2007). We also mentioned the role of anxiety and self-consciousness in formal situations with higher stakes (graded assignment), which resonates with several previous studies (Broen & Siegel, 1972; Christenfeld & Creager, 1996; Tottie, 2014).

Although these additional factors (style and anxiety) may well explain our findings, they still prompt us to question the assumed difference in cognitive effort between prepared and spontaneous speech. Both situations in the corpus have their own source of difficulty: on the one hand, class presentations can be stressful, the topic can be quite specialized and unfamiliar, and the notes can be only partial, without full sentences; on the other hand, conversations require speakers to improvise and speak in a timely manner, to attend to production and comprehension at the same time, and to care for the hearers’ needs. In other words, the two genres in our corpus do not simply differ in their degree of preparation, but they each involve a range of contextual factors that may have a different impact on various features of speech, including filled pause use.

Therefore, the high frequency of filled pauses in prepared presentations does not necessarily question the link between filled pauses and cognitive effort (Bortfeld et al., 2001), but rather challenges the assumed higher demands of conversations. Every communicative situation comes with its own set of challenges, so that taking genre as a single measure of cognitive effort is too restrictive. With respect to our research question on the status of filled pauses, we found no support for either the filler-as-symptom or filler-as-word position on the basis of frequency differences alone, and the effect of genre seems to be more qualitative (see Section 5.3 below).

However, this study did not look at the combined effects of genre and proficiency, but rather treated them individually in separate corpus analyses. Crossing genre and proficiency in a single dataset could lead to different results. For instance, the results from Michel et al. (2007) on L2 speech are consistent with our genre differences in L1 speakers, but to our knowledge, no previous study has investigated both genre and speaker group on the same corpus. More work should therefore be conducted on the effect of combined variables impacting the use of filled pauses.

5.2 The weight of individual differences

A related issue is the reliance on a binary corpus design (prepared vs. spontaneous; native vs. non-native speakers) to draw general conclusions on the use of filled pauses when the reality is much more idiosyncratic and partly escapes such generalizations. We found evidence for strong individual differences in both corpora under investigation (cf. Figures 1 and 2). For example, one participant from the DisReg corpus produced 5.3 filled pauses phw (30 cases) during the conversation task, while he only produced 0.9 phw (eight cases) during his presentation (cf. Table 1). This speaker stands in sharp contrast with the average frequency of filled pauses, which is significantly higher in presentations (4.2 vs. 6.8 per hundred words in conversations and presentations, respectively). Similarly, a native French speaker from the SITAF corpus produced 5.9 filled pauses phw (45 cases), while an American speaker produced only 1.7 (six cases) in his second language.

Such variation, not only between but also within groups and genres, highlights the diversity of “umming behavior” (Tottie, 2014, p. 11) among speakers. It might also explain why our findings diverge from most L2 studies, which consistently showed higher rates of filled pauses in non-native and low-proficiency speakers (De Jong, 2016; Gilquin, 2008; Tavakoli, 2011). The non-significant frequency difference between native and non-native speech in our data might be due to the individual differences mentioned above, in addition to the heterogeneous proficiency levels of the learners, some of whom might have produced filled pauses at a more native-like rate.⁶ In line with our discussion of genre in the previous section, we thus suggest that frequency differences between binary subcorpora might overlook a more complex picture of factors, including idiosyncratic preferences, that impact filled pause use. Such considerations therefore limit the representativity and generalizability of our data, as is often the case in (small-scale) corpus studies.

5.3 Benefits and drawbacks of qualitative analysis

Besides the (limited) frequency differences discussed above, the most interesting findings of our study lie in the more qualitative and functional variables in both corpora, especially position and function. Looking at the position of filled pauses with respect to the intonation unit, they were more often initial in presentations and medial or final in conversations (DisReg corpus). We further noticed an attraction of filled pauses to the initial slot when they co-occurred with a discourse marker (SITAF corpus). These positional preferences were also reflected in the functional range of filled pauses: mainly used for segmentation or turn-taking purposes in initial position, where they often co-occur with sequential discourse markers, they can also be used on their own (i.e., isolated) in medial position in hesitation contexts, or in final position for joint word search and turn-yielding. We were able to link these tendencies to genre features and speaker proficiency, thus identifying two broad uses: 1) initial filled pauses used to segment speech and mark major boundaries by proficient speakers and/or in monologs, and 2) medial filled pauses used for planning and word search, by less proficient speakers and/or in conversations.

While convincing, these results should nevertheless be taken with a grain of salt: with qualitative variables, every methodological decision has substantial consequences. For instance, in this study, we used the intonation phrase as the unit of reference for the analysis of filled pause position, in line with conventions of Discourse Transcription (Du Bois et al., 1993). As a result, the segmented units do not always map onto clausal boundaries, and some units are quite short (e.g., one noun phrase). Some filled pauses that were annotated here as initial with respect to prosody would thus have been considered medial with respect to syntax. A different segmentation unit would therefore probably result in different positional patterns, which is especially important as position was found to be a key feature of filled pause use.

Moreover, qualitative variables only provide relative differences between patterns of use (e.g., 40% vs. 24% of initial position in presentations and conversations) and were tested on a rather small corpus size, due to the high workload associated with such fine-grained analyses. These limitations constrain the generalizability of our findings, despite the statistical significance of the observed differences.

Last but not least, qualitative analyses, and in particular functional ones (e.g., discourse marker functions), rely on the linguist’s subjective interpretation of particular contexts. As such, they may somewhat differ from analyst to analyst, and can reflect top-down expectations instead of actual speaker intentions (if any). Although the functional annotation of discourse markers was double-blind, discourse marker function is a notoriously difficult object of study (e.g., Spooren & Degand, 2010). Pragmatic devices and, to a perhaps greater extent, filled pauses display a very vague “meaning” (if any meaning at all), and their functional interpretation is highly contextual and potentially framework dependent. It is a fact of linguistic life that the most interesting variables are often the most complex to investigate and, although the potential subjectivity of our functional analysis should be borne in mind, this study nonetheless provides solid grounds for shedding some light on the debate regarding the status of filled pauses, as developed in the next section.

5.4 Corpora, categories and constructions: back to the status of filled pauses

The goal of this study was to triangulate evidence from multiple corpora and variables that would support either the lexical or non-lexical status of filled pauses, especially with respect to their categorization by Tottie (2016) as “planners” similar to discourse markers. This triangulation implied the analysis of two different corpora, where the same annotation procedure was applied. The results of our analyses show that some variables do not play the same role in both corpora: for instance, the frequency difference was significant in DisReg but not in SITAF, whereas duration differed significantly in both corpora; similarly, the rate of isolated vs. clustered uses was significantly different in SITAF but not in DisReg. What these apparent discrepancies suggest is not that the analyses or the corpora are unreliable, but rather that, as already mentioned above, different contextual factors have a different impact on filled pause use.

We believe that this combination of corpora helps us contribute to, and hopefully settle, the debate regarding the status of filled pauses. Previous contributions to this issue tended to adopt a monolithic answer, looking for the unique best way to categorize filled pauses based on their function in speech either as symptoms, signals or words (Clark & Fox Tree, 2002). By contrast, our present usage-based approach prompted us to look for recurrent form-function patterns, in which the linguistic and extra-linguistic context has a crucial role in determining the function of filled pauses. With this approach, we found two main patterns of use or “constructions”:

filled pauses used in initial position, often following a sequential discourse marker (e.g., donc euh “so uh”), to signal major boundaries (cf. Swerts, 1998), thus forming a rather “fluent” construction similar to the filler-as-signal view;

filled pauses used in medial position in contexts of lexical search, typically without a discourse marker but often clustered with other fluencemes such as unfilled pauses, lengthenings or repetitions, thus forming a rather “disfluent” construction similar to the filler-as-symptom view.

The first pattern tends to be preferred in presentations and by native speakers, while the second is more common in conversations and among non-native speakers. We take this converging evidence from genre and proficiency as convincing support in favor of our dual view of filled pauses. Such ambivalence cannot be reduced to a single label or category, as most authors have previously attempted to do. In a recent study, Tottie (2019) showed differences between the written variant (ehm) of filled pauses, as used in journalistic prose, and the spoken one (uhm): the former can be used intentionally by writers as a stance marker, while the latter is said to be used unintentionally for speech planning. She further argues for a dual, modality-based categorization of filled pauses, either as “inserts” in speech or as stance adverbs in writing. This position refines Tottie’s previous claims but still somewhat fails to acknowledge the duality of filled pauses within the spoken modality, as we have demonstrated.

Lastly, although filled pauses and discourse markers can cluster, we found strong differences in their use (especially their position), which argue against a joint categorization of the two types of devices, even though in some contexts of use they may become more and more entrenched as a complex unit (cf. Schneider, 2014 on anduh and butuh). Although some filled pauses perform functions similar to those of discourse markers, they are, in our view, more similar to prosody or co-speech gestures: their use can be learned (cf. Kirjavainen et al., forthcoming), they can be functional, but they are still widely different from “words” because of their extreme mobility and the difficulty of pinning down their meaning. Still, clear form-function patterns emerged from our corpus analysis, and filled pauses might enter more fixed constructions in the future through repeated use.

6 Conclusion

The aim of the present study was to discuss the lexical or non-lexical status of filled pauses by triangulating evidence from multiple data types and multiple linguistic variables. We analyzed quantitative and qualitative features of filled pauses across different communication settings (prepared monologs vs. spontaneous conversations) and levels of language proficiency (native vs. non-native) in French. We found differences in frequency, duration, position and patterns of co-occurrence, which showed that filled pause use is determined by several linguistic and extra-linguistic factors, in line with Tottie (2011, 2014, 2016). However, these results should be interpreted in light of the limited size of our data sample, which may not be representative of different speaking styles and idiosyncratic preferences. Our qualitative analyses identified fine-grained differences and illustrated the different “fluent” and “disfluent” uses of filled pauses, which occur in two main patterns: in initial position and often clustered with a discourse marker to signal major boundaries (filler-as-signal), vs. in medial position, isolated or clustered with hesitation markers in contexts of planning and word search (filler-as-symptom). We therefore argue for a dual status of filled pauses in speech based on formal and functional features. This usage-based approach reconciles previous monolithic accounts and situates filled pauses on a continuum between prosody (with its segmentation and stylistic functions) and discourse markers, with which they show both similarities and differences.

The limitations of the present corpus study call for further investigation of patterns of filled pause use in larger datasets and in more genres and languages. An immediate avenue of research is to replicate the analysis of discourse marker clusters in the DisReg corpus, to compare their distribution and functions across genres. Finally, combinations of corpus-based and experimental methods (as in Kirjavainen et al., forthcoming) would further our understanding of the way filled pauses are produced and perceived.

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Loulou Kosmala

References

Bates, Mächler, Bolker, & Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

Beňuš (2009). Variability and stability in collaborative dialogues: Turn-taking and filled pauses. Tenth Annual Conference of the International Speech Communication Association, Brighton, United Kingdom. http://www.isca-speech.org/archive/interspeech_2009/i09_0796.html

Bortfeld, Leon, S. D., Bloom, J. E., Schober, M. F., & Brennan, S. E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44(2), 123–147. https://doi.org/10.1177/00238309010440020101

Broen, P. A., & Siegel, G. M. (1972). Variations in normal speech disfluencies. Language and Speech, 15(3), 219–231. https://doi.org/10.1177/002383097201500302

Candea (2000). Contribution à l’étude des pauses silencieuses et des phénomènes dits « d’hésitation » en français oral spontané. Étude sur un corpus de récit en classe de français. PhD Thesis. Université Paris III-Sorbonne Nouvelle.

Christenfeld & Creager (1996). Anxiety, alcohol, aphasia, and ums. Journal of Personality and Social Psychology, 70(3), 451. https://doi.org/10.1037/0022-3514.70.3.451

Clark & Fox Tree, J. E. (2002). Using uh and um in spontaneous speaking. Cognition, 84(1), 73–111. https://doi.org/10.1016/S0010-0277(02)00017-3

Corley & Stewart, O. W. (2008). Hesitation disfluencies in spontaneous speech: The meaning of um. Language and Linguistics Compass, 2(4), 589–602. https://doi.org/10.1111/j.1749-818X.2008.00068.x

Crible (2018). Discourse markers and (dis)fluency: Forms and functions across languages and registers. John Benjamins Publishing.

Crible & Degand (2019). Domains and functions: A two-dimensional account of discourse markers. Discours, 24, 3–35. https://doi.org/10.4000/discours.9997

Crible, Degand, & Gilquin (2017). The clustering of discourse markers and filled pauses. Languages in Contrast, 17(1), 69–95. https://doi.org/10.1075/lic.17.1.04cri

Crible, Dumont, Grosman, & Notarrigo (2019). (Dis)fluency across spoken and signed languages: Application of an interoperable annotation scheme. In Degand, Gilquin, & Simon, A. C. (Eds.), Fluency and disfluency across languages and language varieties (Corpora and Language in Use-Proceedings 4). Presses universitaires de Louvain.

Crible & Pascual (2020). Combinations of discourse markers with repairs and repetitions in English, French and Spanish. Journal of Pragmatics, 156, 54–67. https://doi.org/10.1016/j.pragma.2019.05.002

Cuenca, M. J., & Crible (2019). Co-occurrence of discourse markers in English: From juxtaposition to composition. Journal of Pragmatics, 140, 171–184. https://doi.org/10.1016/j.pragma.2018.12.001

De Jong (2016). Predicting pauses in L1 and L2 speech: The effects of utterance boundaries and word frequency. International Review of Applied Linguistics in Language Teaching, 54(2), 113–132. https://doi.org/10.1515/iral-2016-9993

De Leeuw (2007). Hesitation markers in English, German, and Dutch. Journal of Germanic Linguistics, 19(2), 85–114. https://doi.org/10.1017/S1470542707000049

Derwing, T. M., Munro, M. J., Thomson, R. I., & Rossiter, M. J. (2009). The relationship between L1 fluency and L2 fluency development. Studies in Second Language Acquisition, 3(4), 533–557.

Du Bois, J. W., Schuetze-Coburn, Cumming, & Paolino (2013). Talking data: Transcription and coding in discourse research, Edwards & Lampert, M. D. (Eds.). Psychology Press.

Duez (1982). Silent and non-silent pauses in three speech styles. Language and Speech, 25(1), 11–28. https://doi.org/10.1177/002383098202500102

Fehringer & Fry (2007). Hesitation phenomena in the language production of bilingual speakers: The role of working memory. Folia Linguistica: Acta Societatis Linguisticae Europaeae, 41(1–2), 37–72. https://doi.org/10.1515/flin.41.1-2.37

Ferreira & Bailey, K. G. D. (2004). Disfluencies and human language comprehension. Trends in Cognitive Sciences, 8(5), 231–237. https://doi.org/10.1016/j.tics.2004.03.011

Finlayson, I. R., & Corley (2012). Disfluency in dialogue: An intentional signal from the speaker? Psychonomic Bulletin & Review, 19(5), 921–928. https://doi.org/10.3758/s13423-012-0279-x

Fischer (2015). Situation in grammar or in frames? Evidence from the so-called baby talk register. Constructions and Frames, 7(2), 258–288. https://doi.org/10.1075/cf.7.2.04fis

Fried & Östman, J.-O. (2005). Construction grammar and spoken language: The case of pragmatic particles. Journal of Pragmatics, 37(11), 1752–1778. https://doi.org/10.1016/j.pragma.2005.03.013

García-Amaya & Lang (2020). Filled pauses are susceptible to cross-language phonetic influence: Evidence from Afrikaans-Spanish bilinguals. Studies in Second Language Acquisition, 42(5), 1–29. https://doi.org/10.1017/S0272263120000169

Gilquin (2008). Hesitation markers among EFL learners: Pragmatic deficiency or difference. In Romero-Trillo (Ed.), Pragmatics and corpus linguistics: A mutualistic entente (pp. 119–149). De Gruyter Mouton.

Glynn (2010). Synonymy, lexical fields, and grammatical constructions. A study in usage-based cognitive semantics. Cognitive Foundations of Linguistic Usage-patterns: Empirical Studies. Advance online publication. https://doi.org/10.13140/RG.2.1.1079.9524

Götz (2013). Fluency in Native and Nonnative English Speech. John Benjamins Publishing.

Horgues & Scheuer (2015). Why some things are better done in tandem. In Investigating English Pronunciation (pp. 47–82). Springer. http://link.springer.com/chapter/10.1057/9781137509437_3

Horgues & Scheuer (2018). L’exploitation d’un corpus d’interactions en tandem anglais/français pour mieux comprendre les enjeux de la rétroaction corrective entre pairs. Alterstice: Revue Internationale de la Recherche Interculturelle, 8(1), 63–81. https://doi.org/10.7202/1052609ar

Jehoul (2019). Filled pauses from a multimodal perspective. On the interplay of speech and eye gaze. PhD Thesis. University of Leuven.

Jucker, A. H. (2014). Uh and um as planners in the corpus of historical American English. In Taavitsainen, Kytö, Claridge, & Smith (Eds.), Developments in English: Expanding Electronic Evidence (pp. 162–177). Cambridge University Press.

Kahng (2014). Exploring utterance and cognitive fluency of L1 and L2 English speakers: Temporal measures and stimulated recall. Language Learning, 64(4), 809–854. https://doi.org/10.1111/lang.12084

Kjellmer (2003). Hesitation. In defence of er and erm. English Studies, 84(2), 170–198. https://doi.org/10.1076/enst.84.2.170.14903

Kirjavainen, Crible, & Beeching (forthcoming). Do filled pauses behave like linguistic items? Investigating the effect of exposure on the representation of um.

Kosmala (in press). On the specificities of L1 and L2: (Dis)Fluencies and the interactional multimodal strategies of L2 speakers in tandem interactions. Journal of Monolingual and Bilingual Speech.

Kosmala (2020). Euh le saviez-vous ? Le rôle des (dis)fluences en contexte interactionnel: étude exploratoire et qualitative. SHS Web of Conferences, 78, 01018. https://doi.org/10.1051/shsconf/20207801018

Kosmala & Morgenstern (2019). Should ‘uh’ and ‘um’ be categorized as markers of disfluency? The use of fillers in a challenging conversational context. In Degand, Gilquin, & Simon, A. C. (Eds.), Fluency and disfluency across languages and language varieties (Corpora and Language in Use-Proceedings 4). Presses universitaires de Louvain.

Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14(1), 41–104. https://doi.org/10.1016/0010-0277(83)90026-4

Maschler & Schiffrin (2015). Discourse markers: Language, meaning, and context. In Tannen, Hamilton, H. E., & Schiffrin (Eds.), The Handbook of Discourse Analysis, Vol. 2 (pp. 189–221). John Wiley and Sons.

Merlo & Mansur, L. L. (2004). Descriptive discourse: Topic familiarity and disfluencies. Journal of Communication Disorders, 37(6), 489–503. https://doi.org/10.1016/j.jcomdis.2004.03.002

Michel, M. C., Kuiken, & Vedder (2007). The influence of complexity in monologic versus dialogic tasks in Dutch L2. International Review of Applied Linguistics in Language Teaching, 45(3), 241–259. https://doi.org/10.1515/iral.2007.011

Moniz, Mata, A. I., & Viana, M. C. (2007). On filled-pauses and prolongations in European Portuguese. Eighth Annual Conference of the International Speech Communication Association (pp. 2647–2648). http://www.isca-speech.org/archive/interspeech_2007/i07_2645.html

O’Connell, D. C., & Kowal (2005). Uh and um revisited: Are they interjections for signaling delay? Journal of Psycholinguistic Research, 34(6), 555–576. https://doi.org/10.1007/s10936-005-9164-3

Pallaud, Rauzy, & Blache (2013). Auto-interruptions et disfluences en français parlé dans quatre corpus du CID. TIPA. Travaux interdisciplinaires sur la parole et le langage, 29, 1–24. https://doi.org/10.4000/tipa.995

Riazantseva (2001). Second language proficiency and pausing: A study of Russian speakers of English. Studies in Second Language Acquisition, 23(4), 497–526. https://doi.org/10.2307/44486959

Sacks, Jefferson, & Schegloff, E. A. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696–735. https://doi.org/10.1016/B978-0-12-623550-0.50008-2

Schachter, Christenfeld, Ravina, & Bilous (1991). Speech disfluency and the structure of knowledge. Journal of Personality and Social Psychology, 60(3), 362. https://doi.org/10.1037/0022-3514.60.3.362

Schegloff, E. A. (2010). Some other “uh(m)”s. Discourse Processes, 47(2), 130–174. https://doi.org/10.1080/01638530903223380

Schiffrin (1987). Discourse markers. Cambridge University Press.

Schneider (2014). Frequency, hesitations and chunks. A usage-based study of chunking in English. PhD Thesis. Albert-Ludwigs-Universität.

Shriberg, E. E. (1994). Preliminaries to a theory of speech disfluencies. University of California.

Smith, V. L., & Clark, H. H. (1993). On the course of answering questions. Journal of Memory and Language, 32(1), 25–38. https://doi.org/10.1006/jmla.1993.1002

Spooren & Degand (2010). Coding coherence relations: Reliability and validity. Corpus Linguistics and Linguistic Theory, 6(2), 241–266. https://doi.org/10.1515/cllt.2010.009

Swerts (1998). Filled pauses as markers of discourse structure. Journal of Pragmatics, 30(4), 485–496. https://doi.org/10.1016/S0378-2166(98)00014-9

Tavakoli (2011). Pausing patterns: Differences between L2 learners and native speakers. ELT Journal, 65(1), 71–79. https://doi.org/10.1093/elt/ccq020

Tottie (2011). Uh and um as sociolinguistic markers in British English. International Journal of Corpus Linguistics, 16(2), 173–197. https://doi.org/10.1075/ijcl.16.2.02tot

Tottie (2014). On the use of uh and um in American English. Functions of Language, 21(1), 6–29. https://doi.org/10.1075/fol.21.1.02tot

Tottie (2015). Uh and um in British and American English: Are they words? Evidence from co-occurrence with pauses. In Dion, Lapierre, & Cacoullos, R. T. (Eds.), Linguistic variation: Confronting fact and theory (pp. 38–55). Routledge.

Tottie (2016). Planning what to say: Uh and um among the pragmatic markers. In Kaltenböck, Keizer, & Lohmann (Eds.), Outside the clause. Form and function of extra-clausal constituents (pp. 97–122). John Benjamins.

Tottie (2019). From pause to word: Uh, um and er in written American English. English Language & Linguistics, 23(1), 105–130. https://doi.org/10.1017/S1360674317000314

Watanabe, Hirose, Den, & Minematsu (2008). Filled pauses as cues to the complexity of upcoming phrases for native and non-native listeners. Speech Communication, 50(2), 81–94. https://doi.org/10.1016/j.specom.2007.06.002