Constraints on speech rate: A heritage-language perspective

Abstract

Research questions:

Polinsky argues that speech rate in heritage languages is highly correlated with proficiency level. In sociolinguistics studies, speech rate in monolingual speakers is found to be conditioned by social factors. What occurs when both proficiency and social factors vary? Is speech rate a valid measure of proficiency?

Methodology:

We use two automated methods of measuring articulation rate (syllables per second), cross-referenced to improve accuracy: an orthographic vowel count and an acoustic measure of amplitude changes from syllable nucleus to periphery.

Data and analysis:

Across 51 speakers, each recorded in an hour-long conversation in Heritage Italian, Russian, Ukrainian, or Homeland Italian, we calculate speech rate in more than 10,000 clauses.

Findings:

Linear regression analyses reveal that articulation rate correlates with generation (since immigration) and age, but, surprisingly, not with ethnic orientation, sex or language. Age and generation are partly collinear in our sample, and models with generation fit better than those with age. We also find that articulation rate does not predict performance on sociolinguistic variables (voice onset time for stops, subject pronoun presence) in heritage varieties.

Originality:

This study compares two ways of calculating articulation rate automatically, examining whether speech rate is a viable stand-in for proficiency when social factors and proficiency vary independently. We resolve several obstacles to using articulation rate as a stand-in for more labor-intensive proficiency measures in spontaneous speech data.

Implications:

These findings suggest that speech rate is a valid proxy for heritage language proficiency. The factor with the strongest effect is generation since immigration (indicating the dominant language in the speaker’s childhood community). The effects of the social factors are complex but must be considered in order to interpret the proficiency measure accurately.

Keywords

Speech rate articulation rate heritage languages proficiency prosody subject pronoun presence

Introduction

Linguistic proficiency is understood to reflect the amount and type of input received, so valid and efficient measures of proficiency are essential in heritage language studies. For heritage languages, Polinsky (2008, 2012) observes that speech rate might well be “one of the most sensitive measures reflecting the level of proficiency” (2008, p. 54; see also de Jong & Wempe, 2009, p. 385; Kagan & Friedman, 2003).¹ Polinsky (2008, p. 53) shows that heritage speakers of Russian who have the full system of three genders speak their heritage language (HL) nearly as quickly as monolingual Russian speakers; however, heritage speakers whose gender system is simplified to a binary contrast speak Russian more slowly. More generally, Aalberse et al. (2019, p. 187) note that “HL-speakers tend to speak more slowly than non-HL speakers.” Irizari van Suchtelen (2016, pp. 116–118) found that speech rate differences between directly adjacent generational groups of heritage Spanish speakers in the Netherlands are not significant, but that homeland speakers do speak significantly more quickly than second generation speakers. Polinsky (2012) notes that there are several advantages of using speech rate as a proxy for proficiency: it “does not rely on literacy skills” (p. 106) and is “a very simple measure” (ibid.). However, she notes that it is a measure that requires further empirical investigation and verification, which we offer here.

The development of such a measure is necessary for reliably comparing proficiency in varieties where the norms of the “homeland” or “hearth” variety may not be appropriate benchmarks, that is, when a heritage variety is en route to endonormativity (Schneider, 2008), as has been shown for some heritage varieties (Nagy, 2016). While fluency “constitutes an essential criterion in assessing L2 performance and proficiency” (Kahng, 2014, p. 810), measuring proficiency in a heritage language is a thorny issue, for two related reasons. First, the grammars of heritage languages have not been described thoroughly enough to allow for the sort of proficiency testing that is based on linguistic (grammatical or lexical) knowledge that is available for standardized languages. Second, like all languages, heritage languages evolve over time, perhaps even faster than standardized varieties as a result of intense contact with the community’s dominant language. Thus, any test of linguistic knowledge of a heritage language risks actually testing knowledge of a related standardized language or a potentially outdated version of the heritage variety. Vocabulary size or richness (as measured by counting unique words in an excerpt of elicited or spontaneous speech) has been proposed as a measure of proficiency that circumvents the problem of standardization. However, cross-linguistic comparisons of this measure are difficult, as noted by Tidball and Treffers-Daller (2007), because of typological differences. Additionally, type-token ratios decline with the length of the text (Richards, 1987, p. 204), which necessitates, for spontaneous speech, calculations of the ideal length of text to examine. Furthermore, de Jong and Bosker (2013, p. 19) do not report particularly strong correlations between vocabulary size and their proficiency measures (which have to do with pauses). Our goal, in this chapter, is to understand the range of variation in speech rate in spontaneous speech across multiple generations of heritage language speakers in order to assess whether speech rate acts in ways consistent with a proficiency measure, as a first step toward evaluating the claims of Polinsky (2008). As Kahng (2014, p. 837) notes, L1 and L2 speakers’ fluency differs. Heritage language speakers are neither exactly like L1 nor exactly like L2 speakers, and this makes it critical to conduct research directly with this population.

Data are from the Heritage Language Variation and Change (HLVC) project in Toronto (described in Nagy, 2011). We use samples from three generations of speakers – first-generation immigrants, who speak these languages natively; second- and third-generation, that is, children and grandchildren of immigrants; and homeland speakers. Our use of “generation” is defined in Section 4.1. We consider three languages in order to ascertain whether speech rate operates comparably across languages, and we look for correlates among social factors including responses to an ethnic orientation questionnaire (EOQ). We thus lay the groundwork for future research in which speech rate may act as a proxy for proficiency and be considered as a predictor of other linguistic (e.g. phonological, morphological) patterns. If we find correlations between speech rate and linguistic (e.g. phonetic or syntactic) variation or other measures of proficiency, such as amount of pausing, it will suggest that speech rate reflects acquisition/loss/attrition differences among speakers. On the other hand, correlations between speech rate and social factors (e.g. age, sex, ethnic orientation) suggest that identity-marking is responsible for speech rate variation.

This article thus responds to Polinsky’s call for further investigation by examining the role of speech rate in several heritage languages spoken in Toronto, Canada. We begin by comparing literature on the prosodic variable speech rate as used for a proficiency measure and as a sociolinguistic variable (among monolinguals), then turn to means of measuring speech rate. We report on which factors correlate to speech rate in Toronto heritage speakers and show that, in our data, speech rate does not correlate to sociolinguistic variables expected to show effects of language contact with English.

Speech rate and L2 variation

In L2 research, speech rate has often been found to correlate with fluency or proficiency (see, e.g. Daller et al., 2011; de Jong et al., 2015; Ginther et al., 2010; Kahng, 2014; Kormos & Dénes, 2004; O’Brian et al., 2007; Riggenbach, 1991). For this reason, rate of speech is sometimes used as a proxy for these (see, e.g. Kagan & Friedman, 2003, p. 540) insofar as there may be “a higher speech rate in more advanced learners” (Daller et al., 2011, p. 218). The length and frequency of pauses may also correlate with proficiency (see Chambers, 1997; de Jong & Bosker, 2013; Ginther et al., 2010; Kahng, 2014; Wood, 2001; see also Derwing, 1990), even though that is best considered separately from articulation rate.

Age, sex, and education level “have traditionally been assumed to be relevant for investigations of language attrition” (Schmid & Dusseldorp, 2010, p. 128). As we will see, the data examined here present no evidence of attrition, so we will not discuss this concept further.

Speech rate and L1 variation

In recent years, there has been a surge of interest in speech rate as a sociolinguistic variable. Like conventional phonological and morphosyntactic variables, speech rate can vary according to a number of social and linguistic factors (Jacewicz et al., 2009, pp. 233–234). How much influence geography can have on speech rate is disputed. Some researchers find significant distinctions between dialect regions: for instance, Jacewicz et al. (2009) compared two regions of the United States; for comparisons of regions in Belgium and the Netherlands, see Verhoeven et al. (2004) and Quené (2008). However, some scholars do not find geographic distinctions (see, e.g. Hewlett & Rendall, 1998, who compared regions in Scotland). There is more agreement for other social variables, especially sex. Men tend to speak faster than women, and this appears to be a fairly robust effect in both English and Dutch: Jacewicz et al. (2009) report that their male participants in two American states speak more quickly than their female counterparts; Whiteside (1996, p. 38) finds the same for two dialects in northern England, though she cautions that her sample-size is low and her data come from a reading task. Verhoeven et al. (2004, p. 306) report that Dutch-speaking men spoke 6% faster than Dutch-speaking women; Biemans (2000, Chapter 9), manipulating Dutch speech stimuli artificially, finds a relationship between increased speech rate and perceived masculinity. Age effects on speech rate are also reported: younger adults generally speak faster than older ones (Jacewicz et al., 2009; Verhoeven et al., 2004; Yuan et al., 2006), though the reasons for this may be mostly physiological (Ramig, 1983).

The extent to which speech rate varies across languages has not been probed as much as language-internal factors; the use of data across different languages in the present study raises the question of whether a difference in average speech rate between them can be anticipated. The literature on the matter appears to engender some dispute. On one hand, Roach (1998) surveys cross-linguistic research and concludes that any discrepancies uncovered might well be attributed to social factors alone. That is, speech rate cross-linguistically might be expected to vary from a common baseline as a result of purely cultural distinctions. In opposition are Pellegrino et al. (2011), who find a negative correlation between the “information density” of the language in question and the average speech rate. This suggests that different languages might have different baseline levels in terms of average speech rate (measured in phonological units). Daller et al. (2011, p. 234) conclude their study of proficiency in L2 Turkish and L2 German by calling for subsequent investigation into “automated measures” of fluency and proficiency, preferably across a range of languages.

Methods

In addition to answering the questions raised in the preceding two sections about whether speech rate corresponds to fluency and/or to social factors such as age and sex in spontaneous heritage language speech, we are also reporting on efforts to automate as much as possible the analysis processes that can be used with spontaneous speech. This will increase the potential of analyzing large quantities of naturally occurring speech, while minimizing the loss of accuracy, in order to understand best how language functions in its natural context. We describe in this section the corpus used, several possible methods of measuring speech rate, data reduction methods to optimize outcomes, the multiple regression analysis to which we subject our measurements to determine which factors are the best predictors of speech rate, and tests of correlation with other linguistic performance. All procedures were performed in compliance with local laws and University of Toronto guidelines. The Office of Research Ethics has approved them (REB #24041).

Data

We focus on three heritage languages spoken in Toronto: Russian and Ukrainian, which are closely related to each other, and Italian, which is at a greater taxonomic distance. This allows us to see the extent to which typological differences need to be considered in developing methodology for comparing speakers’ speech rates. To control for dialectal variation, all speakers representing each language have family backgrounds from the same homeland city or region. Russian speakers (or their ancestors) come from Moscow or St. Petersburg, Ukrainians from Lviv, and Italians from Calabria.

We sample speakers across three generations. “Generation” is defined based on immigration history, not age. First generation speakers were born and raised in the homeland until at least age 18 and have subsequently spent 20 or more years in Toronto. Second generation speakers were born in Toronto, or arrived before age 6, and their parents qualify as first generation. Third generation speakers are those whose parents qualify as second generation. In each generation, we sample across age groups as much as possible. In the data examined here, we have speakers in all three generations between the ages of 45 and 50. Our analysis (presented below) shows that the model including Generation is better than that with Age (comparison of models is discussed below), so that is our focus throughout the article.

We also compare a sample of homeland speakers from Calabria. This allows us to distinguish effects of limited input on speech rate from other sources of HL variation, as these Homeland speakers and our Generation 1 all grew up in Calabria. As shown in Table 1, we have between 7 and 19 speakers representing each generation, for a total of 51 speakers. This includes 12 Heritage Italian, 12 Russian, and 16 Ukrainian speakers, as well as 11 Homeland Italian speakers. This sample also allows us to test our methods on non-heritage (monolingual) speakers; among them proficiency is constant, so we expect any observed variation to be accounted for by social factors rather than proficiency or quantity/quality of input.

Table 1.

Speaker sample.

	Italian		Russian		Ukrainian		Total
	Male	Female	Male	Female	Male	Female	Total
Homeland	7	4	–	–	–	–	11
Generation 1	3	1	3	3	3	1	14
Generation 2	2	4	2	4	2	5	19
Generation 3	1	1	0	0	2	3	7
Total	23		12		16		51

Our spontaneous conversational speech data are from a corpus of sociolinguistic interviews (for details on data collection methods, see Labov, 1984), each about one hour in length, conducted in speakers’ heritage languages. In addition to conversing on various topics of personal interest for approximately an hour, speakers were asked the items in the EOQ. Their oral responses were later coded on a three-point scale to allow for quantitative comparison. For each scale, we assign 0 points to speakers whose answer is oriented toward English use/preference (with various questions exploring various contexts), 2 points to speakers whose answers reflect an orientation toward their heritage language most/all the time, and 1 point for mixed answers. The questionnaire is posted at http://projects.chass.utoronto.ca/ngn/pdf/HLVC/short_questionnaire_English.pdf, and additional methodological details are available in Nagy (2011). Data from all speakers in the corpus whose transcriptions are complete and who responded to the EOQ are included.² The speaker sample for this study is shown in Table 1, and individual median speech rate and EOQ scores are in Appendix A.

We looked for effects of ethnic orientation (EO) on speech rate along two paths, considering both the mean score of all 37 questions and the scores related to the subgroup of questions about language use. First, we looked for significant correlations between speech rate and EOQ scores, and then looked for individual predictive effects of EOQ scores in multiple regression analysis. For the latter process, we first identified subsets of EOQ scores that correlated with each other. Only two pairs were too strongly and significantly correlated to run in the same model at once: EO Average and Language Use (Pearson’s r: 0.89, p < 0.01) and Language Choices and Cultural Environment (Pearson’s r: 0.90, p < 0.01). No models included both highly correlated items. Individuals’ scores for EO Average and Language Use are listed in Appendix A. Speakers’ individual EO Average scores are calculated by taking the mean of all their responses to the EOQ. Their Language Use scores are calculated by taking the mean of their responses to the five questions about language use in Section B of the EOQ.

Measuring speech rate

There is little consensus on how speech rate is best calculated (Roach, 1998, pp. 151–152; Thomas, 2011, p. 188), and none of the established methods is without flaws. The number of words per unit of time can be measured (Daller et al., 2011; O’Brian et al., 2007; Polinsky, 2008), which seems straightforward. However, there are several problems with this approach, especially for a study that examines more than one language. First, the very definition of “word” is contentious, at least for some languages (Haspelmath, 2011; Roach, 1998, p. 151). Second, average word-length varies across languages. Third, depending on the properties of the language(s) under study, orthographic word-length may not provide a good indication of spoken duration.

Measuring the number of syllables per unit of time is “likely to be a much more reliable way of comparing different languages” when it comes to speech rate (Roach, 1998, p. 152), and this method is commonly used (Hewlett & Rendall 1998, p. 65; Kahng, 2014; Verhoeven et al., 2004). Again, though, there are cross-linguistic differences, this time in syllable structure. If speech rate is measured according to syllables per second, languages with relatively long or numerous consonant clusters will register as slower than languages with preferences for strict consonant-vowel (CV) syllable templates. De Jong et al. (2015) also make this point, specifically about comparing English and Turkish. A third option is to measure the number of phonemes per second; however, this is cumbersome and time-consuming, especially with large amounts of spontaneous speech data.

We measure speech rate in syllables per second. We use two automated methods to identify the number of vowels per second, using each to check the other. Because neither method is foolproof on its own (as discussed in section Stringency criteria), we use the measurements only from cases where the two methods produce comparable results. The first method counts the number of orthographic vowels in the transcript. The second method counts amplitude peaks that mark syllable nuclei in the acoustic signal.

Segmenting the sound files and running the vowel-counting script

The spontaneous speech from sociolinguistic interviews had been previously transcribed by native speakers of each language, using ELAN (Brugman & Russel, 2004) to create annotation segments that are time-aligned to the audio recording (.wav file). Each participant in the conversation is transcribed on a separate tier.³ The three languages have all been transcribed using the Roman alphabet – Italian in its standard orthography, and Russian and Ukrainian using Comrie and Corbett’s (2002, pp. 827, 832–833) transliteration system. Transcribers were instructed to segment the file into clauses. In as far as a clause is a recursive unit in language, transcribers varied to some extent in the length and complexity of units transcribed in one annotation segment (see the range of examples in Table 2). Such units are similar to the “analysis of speech unit,” “a single speaker’s utterance consisting of an independent clause, or a subclausal unit, together with any subordinate clause(s) associated with either,” used in de Jong et al. (2015, p. 229). For ease of explication, we refer to these annotation segments as clauses.

Table 2.

Excerpt of output from syllable-counting script, comparing the two methods of counting syllables per second.

File name(Speaker_Task_Timestamp)	Articulation rate	Ortho. syll. count	Peaks syll. count	Difference	# pauses	Duration	Phonation time	Clause
IXM47A_IV_6275	3.60	5	4	1	0	1.11	1.11	“si secondo me”¹
IXM52A_IV_1774	4.69	11	10	1	0	2.13	2.13	“mia mamma abita a San Gregorio”
IXM52A_IV_788	2.66	6	5	1	0	1.88	1.88	“eravamo sei”
I1F71A_IV_23593	3.87	12	12	0	0	3.10	3.10	“è na parrocchia abbastanza brava, si, no?”
I1F71A_IV_35	3.62	10	10	0	1	3.15	2.77	“Ma tu vu nome mio prima proprio?”

The sample clauses may be translated as: ‘if, according to me,’ ‘my mother lives in San Gregorio,’ ‘we were six,’ ‘it’s a pretty good church, no?’ and ‘but you really want to name my cousin?’

Because this long-running project is dependent on (often short-term) student labor, these transcripts were produced by 23 different transcribers who edited each other’s work. It is not possible to produce useful measures of inter-transcriber reliability because in one case (Russian), the vast majority of the transcripts were produced by one transcriber while in the others, six or seven transcribers worked together in various combinations. Corrections for comparability were made in the editing stage.

To prepare the time-aligned transcription and audio files for processing, we used Lennes’s (2002) script in Praat (Boersma & Weenink, 2015) to split both the audio .wav files and the transcription Praat TextGrid files (exported from ELAN .eaf files) into segments corresponding to each clause. The script was adapted to label the output .txt and .wav files according to their start times in milliseconds and create small text files containing the transcriptions of each clause.⁴ This yielded hundreds of short .wav and .txt files for each speaker.

The amplitude-peak counting method is based on a Praat script written by de Jong and Wempe (2009) (for discussion, see Daller et al., 2011; de Jong & Wempe, 2009), modified by Hugo Quené et al. (2010), and then further modified by Maksym Shkvorets to implement the orthographic vowel-counting method. The de Jong and Wempe (2009) script creates a number of numerical measures of speech rate. We use “articulation rate,” which excludes silent portions in the recording when calculating rate (see, e.g. Daller et al., 2011, p. 219). Pauses might reflect planning as well as sociolinguistic variability (Thomas, 2011, p. 188); in fact, the use of pauses and hesitancy can be fairly idiosyncratic (Jacewicz et al., 2009, p. 235). Pauses (defined as silences of > 0.3 seconds) are thus excluded from the duration measured to calculate speech rate (articulation rate) here. This 0.3 second threshold is an intermediary value of the range of thresholds used in previous research: 0.2 s (Daller et al., 2011) to 0.4 s (O’Brian et al., 2007). It falls in the range that produces the strongest correlations between speech rate and vocabulary size in a study by de Jong and Bosker (2013). That article provides additional support for using such a threshold.

This method produces an output file that contains 10,832 clauses. For each, the outcome of two methods of counting syllables, duration of the clause, number of pauses, duration of the clause and phonation time of the clause (the duration including pauses), a time-marker corresponding to the original ELAN .eaf file, and a speaker identifier are provided. See the (simplified) sample output in Table 2.

We treat each clause as a single token of the dependent variable speech rate and subject the set to multiple regression analysis. “Clauses,” as noted above, are defined by the annotation units created by the transcriber. See examples in Table 2. Each clause is treated as an individual token, both for the process of pruning the data and in the multiple regression analysis. Given the limitations of using natural speech data from a corpus of sociolinguistic interviews rather than lab-collected samples, we are unable to control more stringently for length; this is a necessary sacrifice associated with conversational speech data. However, since no individual utterance is smaller than a full clause, differences across utterances in terms of the weight assigned to the syllables in an utterance are small.

From this data set, we aimed simultaneously to maximize the usable data and minimize the number of erroneous measurements produced by our automated measurement techniques. We therefore systematically excluded several types of clauses, as follows.

Turn-length strongly affects speech rate (cf. Quené, 2008, p. 1104; Thomas, 2011, p. 188). The ending of a turn or utterance tends to be prolonged relative to the rest, which means that a shorter utterance (i.e. one where the ending represents a greater proportion) will exhibit a lower speech rate. We therefore omit short clauses (defined in section Stringency criteria). We manually excluded clauses containing English words as we are interested in the rate of speaking the heritage language, rather than English. Excluding all clauses containing words with English phonology means that all code-switches, and likely some borrowings, are excluded.

Stringency criteria

As noted above, we also needed to exclude clauses for which the acoustic and orthographic vowel counts disagreed sharply. The orthographic method picks up vowels from text in the annotations that is not transcribed speech, such as “(laugh)”, “[interruption]” or other transcriber comments. Orthography also misrepresents phonology in several ways. For example, <e> the letters and sometimes signifies specific pronunciations of a preceding <c> or <g> and other times correspond to vowel phonemes. Syllables that contain diphthongs will be miscounted as two syllables. To mitigate the latter, our script counted as only one syllable the following sequences in Italian: <ia>, <ie>, <io>, <iu>, <uo>, <ua>, <ue>. This, overall, should be more accurate than counting each of these sequences as two syllables, although they do sometimes represent that in Italian (cf. giornale [ʤornále] ‘newspaper’ vs. biologia [bioloʤía] ‘biology’).

As for the acoustic method, it is subject to errors relating to background noise. The script will be thrown off by other participants’ speech occurring during an annotation segment for the main speaker,⁵ by someone bumping the microphone, or by someone setting a coffee cup down on the table. All of these cause errors because their larger amplitude causes the script to ignore the subtler amplitude differences that actually reflect individual syllables produced by the speaker. Short, unstressed syllables may be missed in any case because they have relatively small changes in amplitude (de Jong & Wempe, 2009, p. 388). Syllables that are lost to complete phonetic reduction (see, e.g. Ernestus & Warner, 2011) will also not be picked up by the acoustic method of measuring syllables per second. While this might seem counterproductive, our goal is to measure clauses that are fully articulated and examine the variation in how quickly they are spoken. If anything, phonetic reduction is conflated with faster speech rates; disregarding tokens for which the acoustic vowel count does not agree with the orthographic count will control for this and enable us to be confident that what we are measuring is indeed speech rate variation, rather than differing degrees of phonetic reduction on top of that. Whether there are effects of generation, sex, and the like on phonetic reduction is a separate question, and one that we leave to future research.

In both directions, then, excluding the cases of disparity between the acoustic and orthographic counts served as a filter against problems with each method. Thus, we need to determine two thresholds: (1) how long the clauses should be in order to provide reliable measurements of speech rate; and (2) how much discrepancy between our two measurements is tolerable. The histograms in Figure 1 represent the distribution of the difference between the orthographic vowel count and the acoustic count (orthographic minus acoustic) for our full data sample (10,832 clauses). We see that there are quite a few tokens with discrepancies between the two measures of up to 10 syllables. For each language the data distribution is centered close to zero but with a slight positive bias. This reflects the tendency for the orthographic script to find more vowels than the acoustic one did (in any given clause).

Figure 1.

Histogram of differences between orthographic vowels and acoustic vowels found by the two methods.

We first consider the issue of minimal allowable clause length. As utterance-final lengthening causes more distortion in shorter clauses, we investigated the trade-off in amount of data available for analysis versus changes in speaker-rate rankings with three thresholds related to clause length. If we were to retain only very long clauses, where clause-final lengthening would have minimal effect, we would be working with less data. On the other hand, if we include the shorter clauses also, those will contain syllables disproportionately affected by clause-final lengthening (Quené, 2008, p. 1104; Thomas, 2011, p. 188). We compared rankings based on all clauses versus when we excluded only the clauses with fewer than five syllables versus when we excluded only those with fewer than ten syllables, using Pearson’s r. The correlations were stronger in the case of the former (as a comparison between the top and bottom values in Table 3 indicates). This is because fewer exclusions mean larger data sets, which support statistical comparison more robustly. This comparison indicates that we optimize our results by retaining all clauses with five or more syllables. In order to be retained, a clause needed to have at least five syllables according to each of the two counting methods. Worthy of note is that most of the clauses in the study were longer than five syllables to begin with; excluding the short clauses cut only 533 of the original 10,832 clauses. This approach maximizes the amount of data included in our analysis while minimizing the number of clauses that may be too short to represent speech rate reliably (in that they rank the speakers differently compared to when all available data are used).

Table 3.

Comparison of speaker rankings by articulation rate with different thresholds for data pruning (values are Pearson’s r for each measure compared to the ranking resulting from using only data where both measures produce the identical count, p<0.00001 in all comparisons).

Minimum length threshold	Difference in # of syllables between two measures	Pearson’s r
Excluded clauses of <5 syllables	±3	0.87
	±2	0.84
	±1	0.86
Excluded clauses of <10 syllables	±3	0.70
	±2	0.68
	±1	0.55

Next, we consider the issue of maximal allowable discrepancy in syllable count between the acoustic and orthographic methods. We tested four levels of stringency by comparing a difference of 0 between the two measurement methods for each token (our baseline) to differences of ±1 syllable, ±2 syllables, and ±3 syllables. For this purpose, we ranked the speakers according to median articulation rate. We then calculated Pearson correlations of the ranking of speakers’ median speech rates between the exact match data set (clauses with no difference in the counts produced by the two methods) and the other three less stringently selected data sets. This allows us to see where a change in methodology results in a re-ranking of speakers according to median articulation rate. Table 3 shows the (strong and significant) correlation of speaker rankings by each stringency level to their ranking when we use only our most exacting criteria (that is, the two methods produce exactly the same number of syllables/clause). Using the median rate for each speaker minimizes the effect of outliers caused by errors in the measurement method as well as aberrant behavior in production.

The best correlation is with exclusion of all clauses where the difference in the outcomes of the two measures is greater than three. Therefore, we include all clauses where the difference in the number of syllables counted by the two methods was three or less, and the clause contained more than five syllables, maximizing the number of clauses available to analyze. The reduction in data produced by these comparisons is shown in Table 4. Other methods of excluding less-than-ideal clauses are possible, but the critical issue of determining what is “too much difference” between methods will remain. We have addressed it by considering the effect of the different measures on the ranking of the speakers by articulation rate.

Table 4.

Number of tokens (clauses) produced and used per language (⩾ 5 syllables, ⩽ 3 syllables difference in counts produced by the two methods).

Language	Clauses transcribed	Clauses used	Clauses used for EO analyses
Homeland Italian	800	466	n/a
Heritage Italian	2,309	1,172	1,038
Heritage Russian	4,078	2,106	n/a
Heritage Ukrainian	3,645	1,887	1,777
Total	10,832	5,631	2,815

After the exclusions described in the previous section, our data collection yielded 5,631 tokens across three languages and four generations (including the homeland speakers of Italian as a separate “generation” group; see Table 4). The speakers vary according to language, generation, age, and sex, as well as their responses to the EOQ regarding their language use, attitudes, and cultural practices. The sample is not balanced for these factors as it was not collected in an experimental paradigm. Therefore, we analyze our data in the variationist framework, using linear regression analyses. As elegantly illustrated by Schmid and Dusseldorp (2010), only analyses that consider the effects of a wide range of factors (their “multifactorial web”) can accurately capture the contribution of each.

We used Articulation Rate (phonation rate with pauses longer than 0.3 seconds expurgated) as a continuous dependent variable, and test models with Language, Generation, Age, and Sex, plus an interaction of Language and Generation as fixed effects and Speaker as a random intercept. Duration of the clause is also included as a factor, since we expect longer clauses to have faster rates. Analyses are conducted using Rbrul (Johnson, 2009) within the R programming environment (R Core Team, 2014). Models are compared for best fit by comparing the (corrected) Akaike’s Information Criterion (AICc). The Akaike information criterion is used to compare the fit of various models to a data sample. AICc is a version that has a correction for small sample sizes. Significant factors are indicated in the model by values of p < 0.05. Significant contribution to the model is confirmed by running step-up/step-down models which determine which factors significantly improve the model when included, following standard variationist sociolinguistic practice.

Additionally, to address the fact that we do not have recordings in the speakers’ other language (English), we also constructed an additional mixed effects model using a data set consisting of only the Homeland and Generation 1 speakers but otherwise the same as the model in the previous paragraph. This allows us to consider the Homeland speakers as a control group.

Based on findings in the experimental literature (cf. Polinsky, 2008), we predict faster speech rates for first-generation speakers (who grew up in their homeland prior to moving to Toronto) than for second- and third-generation speakers (for whom English provided some of their input during the acquisition process as they were born in Toronto or arrived there before the age of six). We also predict more fine-grained differences if we look at specific aspects of language use and ethnic orientation that differ among speakers even within the same generation. We therefore next constructed models that included EOQ as well as the factors found significant above. In order to test for effects of EOQ, we focused on the speakers who answered at least half of the EOQ questions; these were found only among the Heritage Italian and Ukrainian speakers. Therefore, the number of tokens included in the models including EOQ was reduced to 2,815 (see Table 4) and the number of speakers to 24. These models necessarily exclude Generation as a factor because Generation is highly collinear with ethnic orientation. We compare models including two factors: the overall average EOQ score (combining responses related to questions about ethnic identity, language use, language preference, cultural environment, cultural choices, and discrimination) versus the average score for only the Language Use-related responses.

Finally, to investigate the relationship of speech rate and structural changes in heritage varieties, we make an exploratory analysis of the relationship between Articulation Rate and a few sociolinguistic variables that have been previously investigated (voice onset time or VOT, null subjects) and with pause rate, using Pearson correlation tests.

Results

The model that best fits our data shows significant main effects of clause Duration (measured as number of syllable peaks) and Generation, as Table 5 shows.⁶ The generational effect shows first-generation immigrants and homeland speakers speaking their languages more quickly, and second- and third-generation speaking more slowly. In the mixed effects model comparing only the Homeland and Generation 1 speakers, Generation did not emerge as significant, suggesting that no attrition has occurred in the Generation 1 speakers (who have lived in an English-dominant environment for at least 20 years). In tandem with the fact that the methods of participant selection and data collection were identical, it also suggests that it is valid to use the Homeland sample as a comparative norm.

Table 5.

Mixed-effects linear regression model with Articulation Rate as the dependent variable and Speaker as a random effect (N =5,631; Intercept 3.61, mean 4.20, AICc = 11786.2; deviance = 11846.3; Factors tested but found non-significant: Language, Language * Generation, Sex).

Predictor	Coefficient	# Tokens	Mean
# syll. peaks/sec (p < 0.01)	+1.03
Generation (p < 0.00001) Generation 1	0.28	2,701	4.38
Homeland	0.07	466	4.26
Generation 2	−0.03	2,074	4.05
Generation 3	−0.32	390	3.82
Sex (p = 0.07, n.s.)
Female	0.07	3,461	4.23
Male	−0.07	2,170	4.16
Language (p = 0.63, n.s.)
Italian	0.05	1,638	4.28
Ukrainian	−0.02	1,887	4.06
Russian	−0.03	2,106	4.27

Age and Generation cannot be run in the same model because of partial collinearity. While Age is significant in a model without Generation, models with Generation in place of Age fit the data better, according to comparisons of AICc as well as deviance.

There is no effect of Sex. This is unexpected given the findings of Jacewicz et al. (2009) for American English and Verhoeven et al. (2004) for Dutch, whereby men speak significantly faster than women. This is in spite of the fact that both earlier studies also use articulation rate as the dependent variable (i.e. speech rate with the pauses removed) and spontaneous speech data (though for Jacewicz et al., 2009, this is only part of the study). Thus, in the dataset as a whole, neither Age nor Sex model the speech rate variation accurately. Similarly, neither Age nor Sex has a significant effect on speech rate in a model of just the homeland Italian data that, like the model for all the data, includes Duration (as number of acoustic peaks), as a significant correlate of Articulation Rate. Future work could address the extent to which effects of Sex on Articulation Rate are language-specific.

Individual variability was captured by the inclusion of Speaker as a random effect in the model. This provides us with the mean rate for each speaker, showing the range of inter-speaker variation, as well as preventing any speakers with extreme rates from over-influencing the coefficient of the groups they are in. Random intercepts for the individuals are reported in the Appendix.

The Duration of the clauses has a significant effect: the longer the clause (measured in terms of number of articulatory peaks), the faster the articulation rate. While this is not a new finding (cf. Quené, 2008, p. 1104; Thomas, 2011, p. 188), including it in the model improves the fit and mitigates imbalances in average clause length between groups of speakers. This measure of the length of clause in terms of number of acoustic peaks provided a better-fitting model than either measuring by number of orthographic vowels or in seconds.

There was no main effect of Language, as seen in Table 5. The interaction factor of Language * Generation would have captured any differences between heritage and homeland speakers, but it did not emerge as significant and so is not shown here.

Next, we constructed models with overall Ethnic Orientation (EO) scores and Language Use scores respectively, also including clause Duration as a fixed effect and Speaker as a random effect. While overall EO score did not emerge as significant, Language Use did. However, there is not a linear relationship between this measure and Articulation Rate, as illustrated in Figure 2. Language Use is significant only when Generation is excluded from the model. That is, Language Use is significant only in so far as Generation 2 speakers self-report using their heritage language less than Generation 1 speakers, but within either generation there is no significant correlation between Articulation Rate and Language Use, as is evident in Figure 2. However, the model with Generation instead of Language Use better fits the data. Because of collinearity, that is, the fact that Generation 1 speakers generally report using their heritage language more than subsequent generations (see dark-filled shapes in Figure 2), Language Use and Generation cannot be included in the same model.

Figure 2.

Mean Articulation Rate by Language Use. Higher values indicate more (self-reported) use of the heritage language (N = 2,815).

Finally, to compare the effects on Articulation Rate between heritage and monolingual speakers, we analyzed the homeland Italian data alone. Interestingly, given findings reported previously for the effects of social factors (see Section 3), neither Age nor Sex has a significant effect on Articulation Rate among these homeland speakers. This may be the result of the size of the available data set – small considering the amount of variation inherent to speech rate – but it is worth emphasizing that we found the same lack of social factor effects in the monolingual homeland speakers as the larger data set including the heritage speakers: no effect of Age or Sex. (Factors like Language Use and Ethnic Orientation cannot be applied in the monolingual homeland context.) The similarity in patterns for the heritage and the homeland speakers suggests that the amount of input, for adults, does not affect speech rate.

Discussion

Our results are thus a mixed bag, summarized in Table 6. Articulation Rate varies according to Generation and to the highly correlated (self-reported) Language Use. Partially resembling Irizari van Suchtelen’s (2016) report on heritage Spanish speakers, however, there is no significant difference between the speech rate of the adjacent groups of Homeland and Generation 1 speakers, though there are significant differences between Generations 1 and 2 and between Generations 2 and 3.

Table 6.

Predicted and observed correlates to proficiency.

Hypothesized source of correlation	Factor	Observed?
Factors predicted to reflect fluency	Generation	yes
	(Self-reported) Language Use	(yes)
	Ethnic Orientation (EO)	no
Factors predicted to reflect sociolinguistic identity	Sex	no
Factors predicted to reflect sociolinguistic identity	Age	(yes)

Articulation Rate does not vary according to two social factors that typically condition sociolinguistic variables: Sex and Age. Age is marked parenthetically in Table 6 because, while having a significant effect, it seems to be an epiphenomenon of Generation (since immigration), which better models the data. Language Use is marked parenthetically because the effect is not a straightforward monotonic one of the type that would support using Articulation Rate as a measure of proficiency and it is, unsurprisingly, collinear with Generation. These findings do not map tidily onto either of the most straightforward interpretations. If speech rate were associated with proficiency, we would expect monotonically-related effects of Generation, Language Use, and EOQ. On the other hand, if it were acting as a sociolinguistic variable as it is claimed to do in the speech rate studies of monolinguals cited above, Sex and Age should condition the variation. However, as summarized in Table 6, these data show that only Generation and Language Use (but not overall EO) are significant predictors of Articulation Rate. Schmid and Dusseldorp (2010, pp. 130, 152) also note the weakness of empirical corroboration of the “basic prediction” that the less one uses a language the slower retrieval from memory, and thus speech, will be.

Also surprising is the lack of a Language effect. Although this is exactly what Roach (1998) predicted, it is unexpected for three reasons: (1) Italian has a very different syllable template from the Slavic languages; (2) inter-language differences in speech rate have been reported (Pellegrino et al., 2011); and (3) vowel counting scripts do not necessarily work in identical ways on different languages (Pellegrino et al., 2011; see also de Jong & Wempe, 2009, p. 390; Verhoeven et al., 2004, p. 298).

But perhaps the most surprising finding, if we assume that speech rate can serve as a proxy for proficiency, is a set of subsequent results that directly contradict Polinsky’s report on the correlation between speech rate and gender marking (see Nagy, 2015, for discussion of why experimental and corpus-based heritage language data may lead to different results). Considerations of space preclude us from developing complete analyses of sociolinguistic variation in this article, but we have calculated correlations between speakers’ median speech rate and their performance with two previously examined sociolinguistic variables, voice onset time (VOT) and variable null subject presence (prodrop). These two variables are “conflicts sites” (Poplack & Meechan, 1998) in that they are areas of difference between English and the heritage languages studied here: all three have short-lag VOT for voiceless stops while English has long-lag. Similarly, all three heritage languages allow frequent null subjects (50–80% of clauses) while English has virtually categorically overt subjects. Thus, we might expect interference from English among speakers who are less fluent in their heritage language.

However, Table 7 shows the lack of significant correlation to articulation rate in all cases. Correlations of speech rate to average VOT and to percentage of finite clauses with null subjects (by speaker) are calculated as Pearson’s product-moment correlations in R. For methods of analysis for word-initial voiceless stops /p, t, k/, see Nagy and Kochetov (2013); for null subjects, see Nagy (2015). In the case of VOT, a strong negative correlation would mean that people who speak more quickly have shorter VOTs (more like homeland Italian and Russian phonetic patterns) while those who speak more slowly have longer-lag patterns indicative of English interference in production. Similarly, strong positive correlations with variable prodrop patterns would indicate that people who speak more quickly have higher or more homeland-like null subject rates, compared to those who speak more slowly and, under influence from English, produce fewer null subjects in their heritage variety. However, no p-values are significant and no correlations are strong.

Table 7.

Correlations (Pearson’s r) between speech rate and two sociolinguistic variables.

	VOT			Null subjects
Speaker group	# Speakers	r	p-value	# Speakers	r	p-value
Heritage Italian	8	0.33	0.42	8	−0.13	0.77
Heritage Russian	5	−0.05	0.93	6	−0.55	0.26
All	13	−0.41	0.10	14	−0.50	0.06

We find the same lack of evidence that speech rate measures proficiency when we compare the number of pauses per second to Articulation Rate. We make this comparison within each Heritage group, as well as across the full set of Italian (homeland and heritage) speakers, and across all speakers, as shown in Table 8. While we might expect that speakers with more limited input might exhibit both slower speech and more frequent pauses, there is just one strong and significant correlation. Given that we tested 11 correlations in this section, this one significant effect may be interpreted as an unsurprising false positive. While it may also be a real effect, it would be hard to account for why Heritage Russian speakers who speak more quickly also have more pauses. While the number of pauses was shown to correlate (negatively) with vocabulary size for L2 speakers by de Jong and Bosker (2013, p. 19) and both were considered valid measures of fluency, we do not find much relationship between speech rate and number of pauses in our data - which is surprising if both measures are understood to respond to increasingly limited input.

Table 8.

Pearson correlations between speech rate and number of pauses (> 0.3 seconds) per second.

Speaker group	# Speakers	Pearson’s r	p-value
Heritage & Homeland Italian	23	−0.10	0.63
Heritage Italian	12	−0.12	0.72
Heritage Russian	12	0.74	0.01*
Heritage Ukrainian	16	−0.27	0.32
All	51	−0.07	0.62

Conclusion

We posed and addressed the question of whether speech rate would show signs of being a stand-in for – or at least a correlate of – proficiency in three heritage languages of Toronto. We found effects of Generation (or of Language Use), as anticipated. Based on a body of sociolinguistic studies showing effects of age and sex among monolinguals (see Section Speech rate and L2 variation), we expected Age and Sex effects. Sex effects were not found. While Age effects were found, they match the data less well than Generation, with which Age is collinear. Our results suggest that using speech rate as a proxy for proficiency – at least in the context of heritage languages – is valid. That said, the factor with the largest effect on Articulation Rate among the heritage speakers is Generation, and the generations are most distinguished by which was the dominant language in the speaker’s childhood. Having established a method to examine speech rate efficiently in spontaneous speech from a range of speaker types, we briefly examined the relationship between speech rate and two linguistic structural patterns. Here, we are not able to report any correlation to speech rate of the choice between more English-like variants versus variants preferred in the homeland varieties (Table 7). Further analysis is required in order to better understand the relationship of articulation rate to proficiency, and the role of fluency in sociolinguistically variable patterns of heritage language speakers.

Footnotes

Appendix

Appendix A.

Individual average speech rate measures (syllables/second) ranked from fastest to slowest, Fifty-one speakers, measurements based on a total of 5,593 tokens. Mean 4.16 vowels/second, standard deviation 0.36. For convenience, each speaker’s average Ethnic Orientation (EO) scores (if available) is also listed in this table, but these values were not considered in the model from which these intercepts are taken.

Speaker	Intercept	N (clauses)	Median speech rate (syllables/sec)	EO average score	Language use score
R1F55B	0.43	126	4.91	N/A	N/A
IXM47A	0.27	10	4.80	N/A	N/A
IXM35A	0.38	19	4.80	N/A	N/A
IXF51A	−0.22	9	4.67	N/A	N/A
R1M62A	0.27	166	4.64	N/A	N/A
I2F44A	0.36	86	4.59	1.03	1.50
R1M47A	0.32	205	4.58	N/A	N/A
I1M62A	0.08	153	4.52	1.30	1.50
IXM52A	0.26	11	4.52	N/A	N/A
I1M61B	0.01	179	4.46	1.33	2.00
IXF43A	−0.23	12	4.45	N/A	N/A
I1F71A	−0.15	205	4.44	1.68	2.00
IXF22A	0.22	201	4.41	N/A	N/A
U1M46A	0.10	152	4.40	1.25	1.25
U1F85A	0.00	302	4.37	1.49	2.00
I2F57A	0.22	125	4.35	0.95	1.00
I2M53A	0.27	21	4.35	N/A	N/A
R2F50A	0.20	200	4.33	N/A	N/A
R2F20A	0.21	93	4.30	N/A	N/A
R1M62D	−0.16	130	4.28	N/A	N/A
R1F47A	−0.17	445	4.27	N/A	N/A
U2F44A	0.07	156	4.26	0.84	1.00
IXM14B	0.20	12	4.23	N/A	N/A
R2F31A	0.08	65	4.23	N/A	N/A
U3F26A	0.27	45	4.22	0.9	0.50
I2F53A	−0.08	54	4.21	1.00	1.00
R1F81A	−0.40	298	4.17	N/A	N/A
U1M51A	0.11	110	4.15	N/A	N/A
I1M75A	−0.17	103	4.15	1.14	2.00
U2F21A	0.13	99	4.12	0.82	0.00
I2F45A	−0.17	67	4.08	1.50	2.00
IXF35A	−0.13	161	4.05	N/A	N/A
U3F50A	0.03	68	3.98	0.75	0.00
R2F53A	−0.10	151	3.98	N/A	N/A
U2F60A	−0.11	159	3.97	0.77	0.00
U3M19A	0.16	31	3.94	0.88	0.50
I2M52A	−0.07	67	3.93	0.90	0.00
U1M85A	−0.27	127	3.91	1.53	2.00
U2F14A	−0.04	69	3.91	1.48	0.50
U2M57A	0.01	193	3.89	0.89	0.00
U2F20A	−0.22	119	3.81	0.82	0.00
U2M60A	−0.09	123	3.76	1.11	0.00
I3F21A	−0.14	74	3.76	N/A	N/A
IXM19B	−0.08	6	3.76	N/A	N/A
U3F19A	−0.07	72	3.75	0.87	0.75
IXM19A	−0.21	11	3.70	N/A	N/A
U3M21A	−0.08	62	3.66	1.26	1.75
R2M30A	−0.26	169	3.60	N/A	N/A
R2M50A	−0.43	58	3.54	N/A	N/A
IXM18A	−0.45	14	3.51	N/A	N/A
I3M22B	−0.16	38	3.45	N/A	N/A

Acknowledgements

We are extremely grateful to the many students and research assistants who collected, transcribed, analyzed and interpreted the data used in this study. They are listed at http://projects.chass.utoronto.ca/ngn/HLVC/3_2_active_ra.php and . We also thank the editors and reviewers for helpful suggestions for clarifying our findings.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Social Science and Humanities Research Council of Canada [grant numbers 410-2009-2330, 435-2016-1430]; a Shevchenko Foundation Heritage Program grant and the Canadian Institute for Ukrainian Studies.

ORCID iD

Marisa Brook

Notes

Author biography

Naomi Nagy is a professor of Linguistics at the University of Toronto. Her area is sociolinguistics, with particular interest in understanding language contact through variationist sociolinguistic approaches. She directs the Heritage Language Variation and Change () Project. She has published recently in Asia-Pacific Language Variation, International Journal of Bilingualism, International Journal of the Sociology of Language, Journal of Phonetics, Journal of Sociolinguistics, Language Documentation and Conservation, Language Learning, Language Variation and Change and Linguistic Approaches to Bilingualism.

Marisa Brook is an assistant professor at the University of Toronto. Her research is based in variationist sociolinguistics, especially with respect to diachrony and/or language contact.

References

Aalberse

Backus

Muysken

(2019). Heritage languages: A language contact approach. John Benjamins.

Biemans

(2000). Gender variation in voice quality (PhD thesis). Katholieke University Nijmegen, Nijmegen, Netherlands.

Boersma

Weenink

(2015). PRAAT: Doing phonetics by computer [Computer software]. Version 5.4.20. http://www.praat.org/

Brugman

Russel

(2004). Annotating multimedia/multi-modal resources with ELAN. In Lino

M. T.

Xavier

M. F.

Ferreira

Costa

Silva

(Eds.), Proceedings of LREC 2004: Fourth international conference on language resources and evaluation. ELAN (software). http://tla.mpi.nl/tools/tla-tools/elan/

Chambers

(1997). What do we mean by fluency? System, 25(4), 535–544.

Comrie

Corbett

(2002). The Slavonic languages. Routledge.

Daller

M. H.

Yıldız

de Jong

N. H.

Kan

Başbaĝi

(2011). Language dominance in Turkish-German bilinguals: Methodological aspects of measurements in structurally different languages. International Journal of Bilingualism, 15(2), 215–236.

de Jong

N. H.

Bosker

(2013). Choosing a threshold for silent pauses to measure second language fluency. In Eklund

(Ed.), Proceedings of disfluency in spontaneous speech (DiSS) 2013 (pp. 17–20). KTH Royal Institute of Technology.

de Jong

N. H.

Groenhout

Schoonen

Hulstijn

J. H.

(2015). Second language fluency: Speaking style or proficiency? Correcting measures of second language fluency for first language behavior. Applied Psycholinguistics, 36(2), 223–243.

10.

de Jong

N. H.

Wempe

(2009). Praat script to detect syllable nuclei and measure speech rate automatically. Behavior Research Methods, 41(2), 385–390.

11.

Derwing

T. M.

(1990). Speech rate is no simple matter. Studies in Second Language Acquisition, 12(3), 303–313.

12.

Ernestus

Warner

(2011). An introduction to reduced pronunciation variants. Journal of Phonetics, 39(3), 253–260.

13.

Ginther

Dimova

Yang

(2010). Conceptual and empirical relationships between temporal measures of fluency and oral English proficiency with implications for automated scoring. Language Testing, 27(3), 379–399.

14.

Haspelmath

(2011). The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica, 45(1), 31–80.

15.

Hewlett

Rendall

(1998). Rural versus urban accent as an influence on the rate of speech. Journal of the International Phonetic Association, 28(1–2), 63–71.

16.

Irizari van Suchtelen

(2016). Spanish as a heritage language in the Netherlands: A cognitive linguistic exploration (PhD thesis). Katholieke University Nijmegen, Nijmegen, The Netherlands.

17.

Jacewicz

Fox

R. A.

O’Neill

Salmons

(2009). Articulation rate across dialect, age, and gender. Language Variation and Change, 21(2), 233–256.

18.

Johnson

D. E.

(2009). Getting off the GoldVarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass, 3(1), 359–383.

19.

Kagan

Friedman

(2003). Using the OPI to place heritage speakers of Russian. Foreign Language Annals, 36(4), 536–545.

20.

Kahng

(2014). Exploring utterance and cognitive fluency of L1 and L2 English speakers: Temporal measures and stimulated recall. Language Learning, 64(4), 809–854.

21.

Kormos

Dénes

(2004). Exploring measures and perceptions of fluency in the speech of second language learners. System, 32(2), 145–164.

22.

Labov

(1984). Field methods of the project on linguistic change and variation. In Baugh

Sherzer

(Eds.), Language in use: Readings in sociolinguistics (pp. 22–66). Prentice Hall.

23.

Lennes

(2002). Script to save labeled intervals to wav sound files. Script for Praat. www.helsinki.fi/~lennes/praat-scripts/public/save_labeled_intervals_to_wav_sound_files.praat

24.

Nagy

(2011). A multilingual corpus to explore geographic variation. Rassegna Italiana di Linguistica Applicata [Italian Review of Applied Linguistics], 43(1–2), 65–84.

25.

Nagy

(2015). A sociolinguistic view of null subjects and VOT in Toronto heritage languages. Lingua, 164(Part B), 309–327.

26.

Nagy

(2016). Heritage languages as new dialects. In Côté

M.-H.

Knooihuizen

Nerbonne

(Eds.), The future of dialects: Selected papers from Methods in Dialectology XV (pp. 15–34). Language Science.

27.

Nagy

Kochetov

(2013). VOT across the generations: A cross-linguistic study of contact-induced change. In Siemund

Gogolin

Schulz

Davydova

(Eds.), Multilingualism and language contact in urban areas: Acquisition - development - teaching - communication (pp. 19–38). John Benjamins.

28.

O’Brian

Segalowitz

Freed

Collentine

(2007). Phonological memory predicts second language oral fluency gains in adults. Studies in Second Language Acquisition, 29(4), 557–581.

29.

Pellegrino

Coupé

Marsico

(2011). A cross-language perspective on speech information rate. Language, 87(3), 539–558.

30.

Polinsky

(2008). Gender under incomplete acquisition. Heritage Language Journal, 6(1), 40–71.

31.

Polinsky

(2012). What linguistics can learn from heritage languages: Research questions and research methods. Conference on formal approaches to heritage languages. University of Massachusetts. http://www.umass.edu/larc/events/fahl2012/files/fahl2012_polinsky.pdf

32.

Poplack

Meechan

(1998). Introduction: How languages fit together in codemixing. International Journal of Bilingualism, 2(2), 127–138.

33.

Quené

(2008). Multilevel modeling of between-speaker and within-speaker variation in spontaneous speech tempo. Journal of the Acoustical Society of America, 123(2), 1104–1113.

34.

Quené

Persoon

de Jong

N. H.

(2010). Praat script syllable nuclei v2. https://sites.google.com/site/speechrate/Home/praat-script-syllable-nuclei-v2

35.

R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.r-project.org.

36.

Ramig

(1983). Effects of physiological aging on speaking and reading rates. Journal of Communication Disorders, 16(3), 217–226.

37.

Richards

(1987). Type-token ratios: What do they really tell us? Journal of Child Language, 14(2), 201–209.

38.

Riggenbach

(1991). Toward an understanding of fluency: A microanalysis of nonnative speaker conversations. Discourse Processes, 14(4), 423–441.

39.

Roach

(1998). Myth 18: Some languages are spoken more quickly than others. In Bauer

Trudgill

(Eds.), Language myths (pp. 150–158). Penguin.

40.

Schmid

Dusseldorp

(2010). Quantitative analyses in a multivariate study of language attrition: The impact of extralinguistic factors. Second Language Research, 26(1), 125–160.

41.

Schneider

E. W.

(2008). Postcolonial English: Varieties around the world. Cambridge University Press.

42.

Thomas

E. R.

(2011). Sociophonetics: An introduction. Palgrave Macmillan.

43.

Tidball

Treffers-Daller

(2007). Exploring measures of vocabulary richness in semi-spontaneous French speech: A quest for the Holy Grail? In Daller

Milton

Treffers-Daller

(Eds.), Modelling and assessing vocabulary knowledge (pp. 133–149). Cambridge University Press.

44.

Verhoeven

De Pauw

Kloots

(2004). Speech rate in a pluricentric language: A comparison between Dutch in Belgium and the Netherlands. Language and Speech, 47(3), 297–308.

45.

Whiteside

S. P.

(1996). Temporal-based acoustic-phonetic patterns in read speech: Some evidence for speaker sex differences. Journal of the International Phonetic Association, 26(1), 23–40.

46.

Wood

(2001). In search of fluency: What is it and how can we teach it? The Canadian Modern Language Review/La revue canadienne des langues vivantes, 57(4), 573–589.

47.

Yuan

Cieri

Liberman

(2006). Towards an integrated understanding of speaking rate in conversation. Paper presented at the 2006 International Conference on Spoken Language Processing (Interspeech), Pittsburgh, PA. http://papers.ldc.upenn.edu/Interspeech2006/Interspeech_2006_Speech_Rate_Paper.pdf