Abstract
This study presents two experiments aimed at investigating tune-to-text alignment and pitch scaling in Lifou French, a variety spoken by bilingual speakers of French and Drehu. Descriptions of New Caledonian French have focussed on language use of European descendants or the variety spoken in the urban region, neglecting emergent varieties spoken by the indigenous population in rural areas, like the island Lifou. Due to the reduced inventory of pitch accents, dialectal variation in French intonation has proved to be difficult to detect, which has led to the assumption that French has a relatively homogeneous intonation system across its varieties. This study shows that fine-grained phonetic differences in speaking tempo and at the level of tonal alignment as well as in the scaling of AP-final peaks can be attributed to dialectal variation.
1 Introduction
Kanaky, 1 more commonly known as New Caledonia, is a sui generis 2 collectivity of France situated in the South Pacific, about 16,000 km from mainland France. Lifou is a small island belonging to New Caledonia that counts no more than 9500 inhabitants; most of them are Kanak and speakers of Drehu, an Oceanic language (ISEE, 2014). From the 1950s, a mass education system aligned with the Metropolitan French syllabus was established (Vernaudon, 2015), and although this system only fully reached Lifou during the last few decades, 3 today there are almost no monolingual Drehu speakers and children are raised bilingually. New Caledonia is a linguistically diverse region with a relatively large number of linguistic communities speaking 28 Kanak languages, an additional ten migrant languages from Asia, Polynesia, and Europe, and French, the lingua franca of the region (Dotte et al., 2017; Vernaudon, 2015).
Linguistic contact between languages from the Loyalty Islands and French was noted but only considered a marginal phenomenon since, in the 1950s to 1960s, the majority of islanders reportedly spoke their own Kanak languages (Hollyman, 1971; Tryon, 1963). Thereafter, descriptions of New Caledonian French focussed on the language use of European descendants (Hollyman, 1979; Pauleau, 1988). On the other side of the spectrum, studies have dealt with either French-based pidgin or creole languages (Hollyman, 1964; Kihm, 1995), neglecting emergent French varieties spoken by the indigenous population in rural regions, such as the Loyalty Islands. However, it has been suggested that there is more granularity between the opposites of Standard French and French-based creole (Ehrhart, 2016), and recent acoustic studies on French in Noumea indicate that there is a considerable degree of heterogeneity, at least in the representation of vowels (Lewis, 2015, 2019). Arguably, New Caledonia’s linguistic diversity, together with increased social mobility and greater access to the French school system (Vernaudon, 2015), has led to a series of contact situations between French and local Kanak languages, as well as immigrant languages. Although studies claim that some urban varieties associated with linguistic contact situations are stigmatised (Colombel-Teuira et al., 2017; Fillol, 2016), one ethnographic study on linguistic attitudes of young urban Oceanic speakers suggests that these varieties of French can be a source of pride (Barnèche, 2005). Taking into account the latter observations and previous studies on minoritised populations (Labov, 1986; Mendoza-Denton, 2014), it is conceivable that phonetic traits of French contact varieties can be carriers of covert prestige. However, little is known about the specific phonetic characteristics of these varieties, especially their prosodic traits.
Prosodic studies in Lifou are of particular interest considering the rather recent but now well-established contact situation between the two languages of bilinguals. French is a Romance language whose origins are found in Europe, and Drehu is an Oceanic language from the Southern Melanesian linkage (Crowley et al., 2011). Current accounts of French intonational phonology analyze it as a language with phrasal prominence marking (Jun & Fougeron, 2002); by contrast, Drehu had been impressionistically described as having lexical stress (Lenormand, 1954; Tryon, 1968). However, recent acoustic investigations of the intonational phonology of Drehu indicate that there is phrasal prominence marking, with a low tone demarcating the left edge and a high tone the right edge of the prosodic word (Torres et al., 2018b; Torres & Fletcher, 2020). Additionally, a first investigation of tonal properties of the Accentual Phrase in Lifou French claims that tonal targets in this variety are the same as those of Metropolitan French (Torres et al., 2018a). The term Metropolitan French here refers to French spoken in mainland France and encompasses speakers from Paris and Southern France, whose speech was analyzed in the studies we replicate (D’Imperio & Michelas, 2014; Welby, 2006). In the following, two experiments aimed at investigating the intonational structure of the Accentual Phrase (AP), the intermediate phrase (ip), and the Intonation Phrase (IP) will be presented. Our aim is to describe tonal alignment in the AP and scaling processes related to the ip, and to determine regional variation on a prosodic level.
1.1 Intonational phonology
A number of studies have found that intonation represents a characteristic regional marker that indicates differences between related linguistic varieties. These discrepancies can be reflected in the realisation of tonal alignment, composition of pitch accents or more complex intonation tunes. In Spanish, a language spoken in a number of countries by rather large populations, intonation has been claimed to be one of the most distinctive dialectal markers (Hualde & Prieto, 2015). Intonation contours like the Mexican declarative circumflex contour or the long fall in Argentinian Spanish (Kaisse, 2001) are some examples. Similarly, in languages with smaller numbers of speakers, regionally bound differences in intonation have been attested. For example, it was found that Northern and Algherese varieties of Catalan display different pitch accent realisations compared to other varieties. Moreover, we might find differences in intonational phonology that are rooted in the diversity of intonational typology. Unlike its closely related neighbor languages Spanish and Catalan, French did not preserve the lexical stress pattern originally found in Latin. Within Autosegmental Metrical phonology (AM) it is well established that French represents a language with phrasal prominence which is marked with a pitch accent at the right edge (Delais-Roussarie et al., 2015; Jun & Fougeron, 2000, 2002; Post, 2000). Although it is generally accepted that there are several dialectal varieties of French, compared to other Romance languages, it seems that French shows less dialectal intonational variation (Delais-Roussarie et al., 2015). It has been speculated this is due to the historically long-lasting and relatively high standardisation of the language, or could be as well just a side effect of a preference to study the Metropolitan variety in the school system. 
Alternatively, French intonational typology, which crucially differs from other Romance languages but resembles more that of languages such as Bengali, Georgian, Turkish, or Korean, with strong edge marking intonational events (Jun, 2014; Jun & Jiang, 2019), might be the reason why regional variation has not been immediately apparent. A study on Yanbian and Seoul Korean (Jun & Jiang, 2019) showed dialectal variation that originates in the realisation of prosodic phrasing. While the two varieties seem not to differ much in their intonational phonologies and the AP is described as the smallest prosodic unit for both Yanbian and Seoul Korean, prosodic phrasing phenomena are realised differently. Prosodic phrasing differs in its phonetic realisation and its phonological function. While the AP accounts for the dual function of marking prominence and syntactic structure in Yanbian Korean, the intermediate phrase (ip) is the prosodic unit that has the same dual function in Seoul Korean. In view of this finding, it can be hypothesised that French regional varieties will not differ regarding the composition of pitch accents but rather in their phonetic encoding in relation to phrasing.
1.2 French prosody
1.2.1 The Accentual Phrase
There is general agreement that in French the utterance can be divided into smaller units which have been termed differently: Intonème mineur (Rossi, 1985), Intonation Group (Mertens, 1987), Prosodic Word (Vaissière, 1991), Rhythmic Unit (Di Cristo, 1998), Phonological Phrase (Post, 2000), or Accentual Phrase (Jun & Fougeron, 2000; Welby, 2006)—the term employed in this study.
Within AM, the Accentual Phrase (AP) represents the lowest tonally marked prosodic constituent in French (Jun & Fougeron, 2002; Welby, 2006). The AP consists of one or more content words, optionally preceded by one or more function words, and can contain up to seven syllables (Pasdeloup, 1992).
Jun and Fougeron (2002) proposed the notation /LHiLH*/ as the underlying tonal pattern of the canonical AP. To avoid confusion between the two low tones, we will use /L1HiL2H*/ in this description, following a similar notation by Welby and Loevenbruck (2006). In French, the position of stress is fixed at the word level and its realisation relies upon the location of a word within a phrase. An obligatory phrase final rise (L2H*) is associated to the metrically strongest and last full syllable of the phrase, whereas the optional, non-accentual rise can occur phrase initially (L1Hi) (Jun & Fougeron, 2002; Welby, 2006). As exemplified in Figure 1, tonal targets within the AP can be undershot and apart from the canonical pattern in (a), five other tonal patterns have been identified. Moreover, for shorter APs (of three or fewer syllables) a more common pattern is /L1H*/ (Jun & Fougeron, 2000; Welby, 2006).

Figure 1. Six predicted surface realisations of the AP from Jun and Fougeron (2002). The notation in (a) represents the canonical pattern and the remaining five notations are alternative realisations. Undershot tones appear in parentheses.
As for the canonical pattern, the tone Hi corresponds to the optional phrase-initial prominence, and H* represents the phrase-final prominence, which is also stronger in pitch and duration. According to Welby (2006), the initial L1Hi sequence represents an edge tone which is structurally different from the final L2H* rise, which is a pitch accent. The initial L1 tone is associated with the left edge of the constituent boundary of the AP, while Hi is variable and not linked to any particular syllable within it. The final rise (L2H*) marks the right boundary of the AP and has a double association: while it marks the right edge, its peak H* is at the same time associated with the stressed syllable. The L2H* tone is considered a pitch accent because part of the tone is associated with a stressed syllable at the phrasal level. It should be noted that there is agreement on the definition of pitch accent for French as being different from that used for Germanic languages, where it is associated with a lexically stressed syllable (Jun & Fougeron, 2002; Welby, 2006). Additionally, the realisation of the L2 tone is more variable due to its lack of association with any specific syllable. The L2 tone can occur on the penultimate syllable of the AP, that is, on the syllable immediately preceding the H*, but also on the final syllable together with H*.
Although variability has been attested in tune-to-text alignment, the intonational structure of the AP seems to be fairly stable across varieties. A study on tonal alignment in Vaudois French 4 (Sertling-Miller, 2007) found that relative to the Metropolitan variety, the phonological specifications of tonal targets of the AP were not dramatically different. However, Sertling-Miller (2007) also notes that Vaudois speakers articulate APs and IPs at a slower rate. This finding is further confirmed by Schwab and Avanzi (2015), who report evidence for speaking rate differences with Swiss speakers from Neuchâtel and Nyon showing a slower tempo than speakers from Paris and Lyon. It is thus conceivable that gradient variation in alignment along with durational differences could contribute to the perceived dialectal variation.
1.2.2 The intermediate phrase (ip) and Intonation Phrase (IP)
Current descriptions of French intonational phonology hold that the Intonation Phrase (IP) is marked by a major continuation rise or a major final fall, and thus by a phrase-final tone (H% or L%), which can optionally be followed by a pause (Jun & Fougeron, 2000, 2002).
Although the IP represents a rather uncontroversial prosodic level, there has been an ongoing theoretical debate about the existence of the intermediate phrase (ip) as a further level between the AP and the IP. While Post (2000) clearly opposed an ip level, Jun and Fougeron (2000) initially argued for it but did not maintain this view later on (Jun & Fougeron, 2002). More recent accounts examine the internal structure of the IP and find evidence for the ip level (D’Imperio & Michelas, 2010, 2014; Michelas & D’Imperio, 2012). These studies emphasise the relationship between syntactic structure and intonation and suggest that the demarcation of syntactic constituents and prosodic levels can be phonetically linked. They postulate that the double association of the phrase-final rise (L2H*) makes it difficult to determine the source of variability in F0 and duration at the right edge of prosodic constituents, since accentuation and phrasal boundaries always coincide in French (Michelas & D’Imperio, 2012). Conversely, a prosodic hierarchical analysis can account for these discrepancies.
D’Imperio and Michelas (2014) find phonetic cues which under certain circumstances are stronger at the right edge of prosodic constituents and suggest there are two prosodic levels that can be distinguished: the AP and the ip. The acoustic cues associated with the right edge would be stronger at the ip than at the AP boundary. More precisely, an investigation of the right edge of subsequent APs found that pitch-scaling effects were related to the internal structuring of the IP. Figure 2 visualises the modulation of F0 in an utterance consisting of subsequent short APs. To investigate scaling, the height of right edge peaks was compared relative to the utterance initial peak which reportedly sets the F0 reference level within the IP. It is shown that in declarative utterances the syntactic break between a complex noun phrase (NP) and a verb phrase (VP) triggers complete pitch reset, meaning that recursive downstep of subsequent AP-final LH* rises is blocked. In other words, it is found that within a larger intermediate phrase, AP-final syllables are produced with lower F0 values when in non-final position (declination) but register shift and a complete reset of F0 is found when immediately preceding an ip right boundary. This means that F0 at the right boundary of the ip is scaled to the level of the IP initial peak which sets the reference level in the utterance. Additionally, they find greater vowel lengthening within ip-final syllables than in non-final ones.

Figure 2. Schematic representation of intermediate phrase boundary marking through complete pitch reset in Metropolitan French. The black line represents the reference pitch level for the first phrase. The dashed line represents the reference pitch level for the second phrase. Every LH* corresponds to one AP and Hpb marks the post-boundary high tone. Adapted from D’Imperio and Michelas (2014).
An exploratory study of Lifou French finds internal restructuring within the IP which seems to be related to a demarcation of the ip in this variety (Torres et al., 2019). The left panel in Figure 3 shows downstep blocking in Lifou French and the right panel complete pitch reset in Metropolitan French. Contrary to observations on the Metropolitan variety, scaling processes in Lifou French suggest that there is downstep blocking across the entire ip, meaning that peaks maintain the F0 level across this prosodic constituent. However, this study only included a small number of analyzed utterances, and it would be of interest to test whether this trend is consistent when evaluating more data from a larger number of participants. Overall, these studies show gradient differences in the phonetic cues marking two distinct boundaries and argue in favour of the existence of the ip as a further prosodic level in Metropolitan and Lifou French. In the following, we examine whether the previously reported patterns for the ip in Lifou French can be confirmed and whether the insertion of a pause after APs conditions a higher prosodic level, namely the IP.

Figure 3. Schematic representation of intermediate phrase boundary marking through (left) downstep blocking in Lifou French and (right) complete pitch reset in Metropolitan French. The black dashed line represents the reference pitch level for the first phrase and the grey dashed line represents the level of downstepped peaks. Every peak (H*) corresponds to one AP and Hpb marks the post-boundary high tone.
2 Experiment 1
This experiment investigates the accentual pattern of the AP, and more specifically the properties of tonal alignment to segmental landmarks in Lifou French. The aim is to describe the tonal structure of this constituent, especially with regard to tune-to-text alignment, and so to test whether, as stated in Torres et al. (2018a), the AP in Lifou French shares the same tonal properties as that of the Metropolitan variety. Given the difficulty of tracing dialectal differences in French intonation (Delais-Roussarie et al., 2015), which might be only small and gradient between varieties (Sertling-Miller, 2007), a controlled laboratory phonology experiment appears to be the most appropriate approach. This is because Lifou French has been largely undocumented and there is no literature we can draw on and, more importantly, because the differences are expected to be fine-grained and therefore more noticeable and measurable within controlled speech.
2.1 Hypotheses
One goal of this study is to examine the properties of tonal alignment in Lifou French in comparison to Metropolitan French in order to evaluate whether intonational phonology in Lifou French shows evidence of regional variation. Therefore, the following eight hypotheses will be tested:
H1 CANONICAL ACCENT PATTERN HYPOTHESIS: Similarly to the Metropolitan variety, it is expected that the most frequent pattern will be LHiLH* followed by LH*.
H2 PEAK HEIGHT HYPOTHESIS: For LHiLH* patterns the scaling of Hi and H* is different in that H* represents the stronger peak in terms of fundamental frequency.
H3 EARLY L ASSOCIATION HYPOTHESIS: The tonal alignment of the L1 tone of the initial rise is edge seeking, meaning that L1 will be situated at the edge between the monosyllabic function word and the following content word.
H4 VARIANT EARLY HI SEGMENTAL ANCHORING HYPOTHESIS: Tonal alignment of the early peak is variable meaning that the peak of Hi is dependent on time constraints and difficult to predict based on a segmental landmark in the AP.
H5 VARIANT EARLY RISE CONSTANT SLOPE HYPOTHESIS: The time of the rise excursion of L1Hi is variant and it does not reliably predict the F0 excursion.
H6 VARIANT LATE L ASSOCIATED TONE HYPOTHESIS: Tonal alignment of the L2 tone of the final rise is variable, meaning that it is dependent on time constraints and difficult to predict based on a segmental landmark in the AP.
H7 LATE H ASSOCIATED TONE HYPOTHESIS: The position of the tone can be described with respect to the duration of the last full syllable of the AP.
H8 VARIANT LATE RISE CONSTANT SLOPE HYPOTHESIS: The time of the rise excursion of L2H* is variant and it does not reliably predict variability in F0 excursion.
2.2 Methods
2.2.1 Participants
Recordings were made during a field trip to Lifou in 2017 where five female speakers (age 29–47) participated. The participants responded to a linguistic questionnaire adapted from the Bilingual Language Profile and administered online (Gertken et al., 2014). All participants reported they were permanent residents in Lifou, they acquired French and Drehu during childhood (starting at no later than 7 years with either language), they were schooled in French, and had varying degrees of school instruction in Drehu (0–10 years). The degree of education varied from finishing primary school for the eldest, to the equivalent of a bachelor’s degree (French licence) for a 36-year-old participant. Additionally, all participants work in the local community in professions that require them to speak in both languages (e.g., librarian, secretary).
2.2.2 Materials
Materials for this experiment were taken from Welby (2003), who investigated tonal alignment in Metropolitan French. A number of utterances were chosen because they ensured comparable data regarding the position of both rises and possible variation in tonal patterns. Elicitation materials consisted of a set of carrier phrases, with 17 target words consisting of 2 (5x), 3 (8x) or 4 (4x) syllables. All target words had only sonorant or voiced consonants and were preceded by 1 or 2 monosyllabic function words. The position of the target phrase was manipulated and target tokens (here in bold) were inserted in sentence-initial or sentence-medial positions, as illustrated in examples 1 and 3. Appendix A contains all utterances that served as stimuli. 5
(1)
“The minimum will be calculated by Manon.”
(2)
“And the minimum will be calculated by Manon.”
(3) Le maximum,
“The maximum, the minimum, and the standard deviations will be calculated by Manon.”
(4) Le maximum
“The maximum and the minimum will be calculated by Manon.”
2.2.3 Procedure
The first author, who is a fluent speaker of French, recruited all participants, gave the necessary instructions, and conducted the experiment. The recordings were carried out in a quiet room of the local library or of the community center in which the participants worked. Speakers were recorded in individual sessions, at a sampling rate of 48 kHz and 16-bit depth, using a Zoom H6 Handy recorder and a head-mounted microphone. The materials were provided printed on paper and participants had time to read the utterances and familiarise themselves with them prior to recording.
2.3 Data analysis
The recorded sound files were manually transcribed and then force aligned in WebMAUS, using a grapheme to phoneme conversion, with a parameter model based on SAMPA (Kisler et al., 2017). After the forced alignment, TextGrids with three tiers were obtained, from which two were kept, one for the orthographic word and a second for phones. Subsequently, in three additional tiers, target APs, position, and tones were identified. Tones and syllable boundaries of target tokens were marked manually. Figure 4 shows the points that were labeled for tones. Since automatic forced alignment is not optimised for Lifou French, manual correction was required. All utterances were visually inspected in Praat 6.0.48 (Boersma & Weenink, 2017) and the segmentation of phones in target tokens was corrected when necessary. During the correction of segmental alignment, special attention was paid to the setting of phone boundaries. A boundary was set between vowels and nasals, laterals or approximants at the point where sudden changes in both amplitude and formant structure occurred. In case the change in formant structure was gradual, the segment boundaries were marked at the midpoint of the transition from vowel to liquid or approximant. For obstruents, the start of closure was marked as the onset and the start of high amplitude periodicity was marked as the onset of the next vowel (Harrington, 2010). Pauses that were inserted after the target tokens were also identified and marked.

Figure 4. F0 schematisation of the accentual phrase and labels used for marking the data. Taken from Welby (2003).
A hierarchical database was constructed using the EMU Speech Database Management System (Winkelmann et al., 2017a). The durational characteristics of tones and values for F0 were queried and analyzed using the emuR package in R (Winkelmann et al., 2017b; R Core Team, 2017). Due to misspellings, disfluencies within the target APs or pitch track errors, 11% of the data had to be discarded. This study reports results from 302 APs.
The experiment included two positions for the target tokens, sentence initial (si) and sentence medial (sm). The segmental points chosen to measure the alignment of tones were determined through statistical analyses explained in 2.3.1. To investigate tonal alignment of H*, only tokens which ended on CV or CVC-sonorant syllables were considered. Measurements of F0 were taken for the tonal targets L1 and Hi of the initial rise, as well as the low tone in the late rise L2 and the tonal target H*. To further examine the two rises, following Welby and Loevenbruck (2006), pitch excursion was measured as the difference of F0 in Hz between the L and the following H tone. Rise time was calculated as the time difference between the two tones of the (a) initial L1Hi and (b) final rise L2H*. The slope was calculated with the formula $m = \frac{y_b - y_a}{x_b - x_a}$, where $y_a$ is the F0 value of L1, $y_b$ is the F0 value of Hi, $x_a$ is the time of L1, and $x_b$ is the time of Hi, from Welby (2006), also employed by Fougeron and Jun (1998) under the term “average velocity.” In other words, this formula divides the F0 excursion size of a rise by its rise time.
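The three rise measures described above can be sketched in a few lines of Python. This is only an illustration, not the authors' emuR workflow: the `TonalTarget` container, the function name, and the example values are hypothetical, and times are assumed to be in milliseconds so that the slope comes out in Hz per ms.

```python
from dataclasses import dataclass


@dataclass
class TonalTarget:
    """One labelled tone (hypothetical container): time in ms, F0 in Hz."""
    time_ms: float
    f0_hz: float


def rise_measures(low: TonalTarget, high: TonalTarget) -> dict:
    """Return F0 excursion (Hz), rise time (ms) and slope (Hz/ms) of an L-H rise."""
    excursion = high.f0_hz - low.f0_hz      # y_b - y_a
    rise_time = high.time_ms - low.time_ms  # x_b - x_a
    if rise_time <= 0:
        raise ValueError("the H tone must follow the L tone in time")
    return {
        "excursion_hz": excursion,
        "rise_time_ms": rise_time,
        "slope_hz_per_ms": excursion / rise_time,  # excursion divided by rise time
    }


# Invented values for illustration only; they are not taken from the corpus.
l1 = TonalTarget(time_ms=100.0, f0_hz=200.0)
hi = TonalTarget(time_ms=300.0, f0_hz=230.0)
measures = rise_measures(l1, hi)
```

The same function applies to the late rise by passing L2 and H* instead of L1 and Hi.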
2.3.1 Statistical analyses
The Wilcoxon Signed-Rank test was used to determine whether the F0 peaks of Hi and H* are significantly different in the LHiLH* pattern. This method is a non-parametric statistical hypothesis test with which matched data from repeated observations of the same subject can be analyzed. This means that it can be determined whether the corresponding data distributions are identical without assuming them to follow a normal distribution. To examine which variables were the most reliable predictors of the alignment of tonal targets (L1, Hi, L2, and H*), stepwise regression analyses were performed, together with a cross-validation technique, similar to Welby (2006). We used 80% of the data as training data. The first step consisted of establishing the dependent variable which would account for most of the variance in the data, then the contribution of individual variables was tested. This method allows us to determine whether adding more independent variables significantly improves a model. Predictor variables were either continuous or binary, in the latter case, a dummy variable with two levels was coded (e.g., one and two, or yes and no). Only independent variables that were found to be statistically significant were retained in the model. To determine how accurately the model predicts the response we used the RMSE (root mean square error) and SI (scatter index) as measures of goodness of fit. The SI is a normalised value which is calculated by dividing the RMSE by the mean of the measured data. An SI value lower than one means the estimations are acceptable. To examine how much of the variability in F0 excursion size is predicted by rise time and investigate the slope, regression analyses and Pearson’s correlation tests were performed. P-values ⩽ 0.05 are considered significant. The software R (R Core Team, 2017) and the statistics package lme4 (Bates et al., 2015) were used to carry out statistical analyses.
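The validation step can be illustrated with a minimal Python sketch. It assumes a single continuous predictor, a random 80/20 split, and synthetic data; it does not reproduce the stepwise selection or the R code used in the study, and all names (`fit_line`, `rmse`, `scatter_index`) are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def scatter_index(observed, predicted):
    """RMSE normalised by the mean of the observed (measured) values."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


# Synthetic data (assumption): latency grows with duration, plus noise.
rng = random.Random(1)
durations = [rng.uniform(80, 200) for _ in range(100)]
latencies = [20 + 0.5 * d + rng.gauss(0, 10) for d in durations]

# 80% of the tokens train the model, the remaining 20% validate it.
indices = list(range(len(durations)))
rng.shuffle(indices)
train, test = indices[:80], indices[80:]

intercept, slope = fit_line([durations[i] for i in train],
                            [latencies[i] for i in train])
predicted = [intercept + slope * durations[i] for i in test]
observed = [latencies[i] for i in test]
si = scatter_index(observed, predicted)
```

Because the SI divides by the mean of the measured values, it is only meaningful when that mean is positive, as is the case for durations and latencies measured from a preceding landmark.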
2.4 Results
2.4.1 Accent patterns
Table 1 shows a summary with the percentage of tonal patterns found in Lifou French and Metropolitan French (Welby, 2003). These results are evidence in favour of the CANONICAL ACCENT PATTERN HYPOTHESIS which predicted that LHiLH* would be the most frequent pattern found, followed by L1H* as second most frequent. One possible reason for the higher percentage of LHiLH* patterns in the Lifou data could be the additionally inserted tokens with three and four syllables which were not part of the study by Welby (2003) and led to eight more possible APs with four to six syllables. Measurements taken for the early and late peaks show that Hi is lower (mean 226 Hz) than H* (mean 245 Hz). The Wilcoxon Signed-Rank test reveals this difference is significant (V = 4526, p < .0001). This result supports the PEAK HEIGHT HYPOTHESIS which predicted peak values for Hi and H* would be similar to those in the Metropolitan variety.
Table 1. Realisation of tonal patterns in Metropolitan French (Welby, 2003) and Lifou French.
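How a V statistic of the kind reported for the Hi versus H* comparison is obtained can be sketched as follows. The sketch assumes the convention in which V is the sum of the ranks of the positive paired differences, with tied absolute differences sharing their average rank and zero differences dropped; the direction of subtraction (H* minus Hi), the function name, and the example values are assumptions, and no p-value is computed.

```python
def signed_rank_v(first, second):
    """Sum of ranks of positive paired differences (first - second)."""
    diffs = [a - b for a, b in zip(first, second) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Find the run of tied absolute differences starting at pos.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        mean_rank = (pos + end) / 2 + 1  # average of 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)


# Invented peak values in Hz for four APs; not taken from the corpus.
h_star = [245.0, 250.0, 240.0, 238.0]
h_i = [226.0, 231.0, 242.0, 230.0]
v = signed_rank_v(h_star, h_i)
```

In this toy example the differences are 19, 19, -2 and 8 Hz, so the positive ranks are 3.5, 3.5 and 2, giving V = 9.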
2.4.2 The initial rise
A detailed inspection of tonal alignment of the initial and final rise reveals further similarities. In our corpus, all APs were realised with an L1 tone which straddled the function word and content word boundary. All of these tones were included in the following analyses (N = 302). Figure 5 shows that the initial low is mostly aligned at the boundary between function and content word; note that there are almost no differences in the median values between the two positions in the experiment. Two regression models tested whether the alignment of L1 was better described with respect to the left edge of the AP or the left edge of the function word immediately preceding the content word. Model A had LATENCY OF L1 FROM LEFT-EDGE OF AP as dependent variable and DURATION OF AP as independent variable. The adjusted R² was 0.28, meaning that the model accounted for 28% of the variance in the data. Model B had LATENCY FROM LEFT-EDGE OF LAST FUNCTION WORD as dependent variable, DURATION OF LAST FUNCTION WORD as independent variable, and FUNCTION WORD (one, two) as a further variable. The adjusted R² value of this model was 0.48, meaning that it accounts for 48% of the variance in the data, of which 47% can be attributed to the duration of the last function word. The independent variable FUNCTION WORD did not reach statistical significance (p = .1) and was therefore not retained in the model. The test data validated the model with an SI value of 0.24, which is lower than 1 and therefore acceptable. Table 3 summarises the coefficient values obtained for independent variables in the selected models as well as the RMSE and SI values that validate the same models. Results for L1 are in favour of the EARLY L ASSOCIATION HYPOTHESIS, which predicts that the L1 tone is associated with the edge between the last function word and the first content word syllable of the AP.

Figure 5. Density plot of the early L1 tone in all patterns (L1HiLH*, L1H*, L1LH*, L1Hi, L1HiL*); dotted lines indicate the median according to position. A positive value in ms indicates the tone was realised after the function word and content word boundary. Realisation at 0 ms indicates that the tone was realised precisely at the boundary between the function word and the content word.
A fourth hypothesis predicts that the early peak in L1Hi is variant, and therefore not anchored to a particular syllable, and that it cannot be reliably predicted based on a segmental landmark. Figure 6 shows that the alignment of Hi (in patterns LHiLH*, LHiH*, LHi, and LHiL*) moves between syllables, falling on the first syllable of content words in 54% of cases and on the second in 46% (N = 209). Three models (C, D, E) with two different dependent variables were used to determine whether the tone seeks to align with respect to the left edge of the word-initial syllable or of the AP. Two models (C, D) used LATENCY OF Hi FROM LEFT-EDGE OF WORD INITIAL SYLLABLE as the dependent variable. Model C used DURATION OF CONTENT WORD INITIAL SYLLABLE as independent variable. The adjusted R² of model C was 0.25, indicating that it only explained 25% of the variance in the data. Model D used DURATION OF CONTENT WORD as independent variable and provided a more reliable but still not satisfactory result, with an adjusted R² value of 0.33, showing that it accounts for 33% of the variance in the data. Since the peak in Hi does not seem to be anchored at a fixed distance from the onset of the content word, we proceed to examine whether the AP is a better predictor. Model E used LATENCY OF Hi FROM LEFT-EDGE OF AP as dependent variable, together with DURATION OF AP, FUNCTION WORD (one, two), and the dummy variable TRISYLLABLE (yes, no) as independent variables. Model E accounted for 53% of the variance in the data, of which 43% can be attributed to the duration of the AP. The independent variable FUNCTION WORD (p < .0001) and the dummy variable TRISYLLABLE (p = .02) were statistically significant and therefore retained. Other dummy variables did not improve the model. The test data validated model E with an SI value of 0.19. This indicates that the position of Hi is more likely to depend on the duration of the AP in ms than on a fixed segmental landmark.
Additionally, the results indicate that the number of syllables in the AP also has an influence with an alignment of Hi occurring approx. 58 ms later when two function words precede the content word. Considering that the Hi peak is realised on the first and second syllables, and that the duration of the AP is the best predictor of its placement, it seems plausible to conclude that the VARIANT EARLY HI SEGMENTAL ANCHORING HYPOTHESIS is applicable.
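The model comparison above rests on three quantities: the adjusted R2 obtained from the training data, and the RMSE and SI obtained from the test data. The following Python sketch illustrates how such values can be computed for a model with a single predictor. The data are invented for illustration, the function names are hypothetical, and the definition of SI as the RMSE divided by the mean of the observed values is an assumption, since the measure is not defined in this section.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R2: penalises R2 for the number of predictors in the model."""
    my = mean(y)
    ss_res = sum((yi - hi) ** 2 for yi, hi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    n = len(y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - n_predictors - 1)


def rmse(y, y_hat):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(mean([(yi - hi) ** 2 for yi, hi in zip(y, y_hat)]))


def scatter_index(y, y_hat):
    """ASSUMED definition: RMSE normalised by the mean observed value."""
    return rmse(y, y_hat) / mean(y)


# Hypothetical data: AP duration (ms) and latency of Hi from the AP left edge (ms).
train_dur = [400.0, 450.0, 500.0, 550.0, 600.0, 650.0]
train_lat = [150.0, 170.0, 165.0, 200.0, 210.0, 230.0]
test_dur = [420.0, 520.0, 620.0]
test_lat = [160.0, 185.0, 215.0]

intercept, slope = fit_line(train_dur, train_lat)
train_pred = [intercept + slope * d for d in train_dur]
test_pred = [intercept + slope * d for d in test_dur]

print("adjusted R2 (training):", round(adjusted_r2(train_lat, train_pred, 1), 3))
print("RMSE (test, ms):", round(rmse(test_lat, test_pred), 1))
print("SI (test):", round(scatter_index(test_lat, test_pred), 3))
```

Fitting on one subset and scoring on another mirrors the training and test split described in the text; with several predictors, as in model E, a multiple regression routine would replace `fit_line`.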

Density plot of the early Hi tone from the LHiLH*, LHiH*, and LHiL* patterns, dotted lines indicate the median. A positive value in ms indicates the tone was realised after the first and second syllable boundary of the content word. Realisation at 0 ms indicates that the tone was realised precisely at the boundary between the two syllables.
An investigation of the slope in initial rises was conducted to examine whether this factor appears to be variant, as predicted in H5. Table 2 shows the mean slopes and standard deviations for speakers of Lifou and Metropolitan French (Welby, 2006). To assess whether speakers seek to achieve a constant slope, a regression analysis was performed. The aim was to establish how much of the variability in F0 excursion size could be predicted by rise time. If the slope is constant, longer rise times should be correlated with larger F0 excursions. Results show that rise time did not predict the F0 excursion reliably, since the regression analysis only accounted for 10% of the variance. Additionally, only a low correlation (r = 0.33, df = 196) between F0 excursion and rise time could be established. Regarding the rise time of L1Hi, Welby (2006) reports a fair amount of variation, ranging from under 50 to nearly 300 ms. In Lifou we likewise find variation, ranging from 70 to 391 ms, which indicates that the rise time in Lifou French is slower. The data appear to favour the VARIANT EARLY RISE CONSTANT SLOPE HYPOTHESIS.
Mean slopes for the early and late rise in Lifou and Metropolitan French (Welby, 2006). Standard deviations are given in parentheses.
Coefficients of selected models obtained using the training data and values for goodness of fit (RMSE and SI) obtained using the test data.
2.4.3 The final rise
In H6 the prediction is that the tonal alignment of L2 is variant and the tone is not anchored to any specific segmental landmark. L2 tones found in the patterns LHiL2H* and LL2H* are included in the following analyses (N = 204). As shown in Figure 7, the tone is mostly realised in a time frame close to the left edge of the final syllable. It should be noted that the L2 tone is aligned not only with the final syllable (74%) but also with a preceding syllable (26%). Welby (2006) did not find a single reliable measure for the alignment of L2 and states that for some speakers in her study the latency from the left edge of the AP-final syllable was a better measure, and for others the latency from the right edge. Models F, G, and H used three different dependent variables and sought to determine which segmental landmark best predicts the alignment of L2. Model F used LATENCY OF L2 FROM RIGHT EDGE OF FINAL SYLLABLE as dependent variable and DURATION OF FINAL SYLLABLE as independent variable. The adjusted R2 shows that the model accounts for 32% of the variance in the data. Model G used LATENCY OF L2 FROM LEFT EDGE OF FINAL SYLLABLE as dependent variable and DURATION OF FINAL SYLLABLE as independent variable. This model only accounts for 24% of the variance in the data. Model H used LATENCY OF L2 FROM RIGHT EDGE OF THE WORD INITIAL SYLLABLE as dependent variable and CONTENT WORD DURATION as independent variable plus the two dummy variables DISYLLABLE (yes, no) and TRISYLLABLE (yes, no), to account for the number of syllables in the word. The adjusted R2 shows that this model accounts for 68% of the variance, of which 49% can be attributed to the duration of the content word. The predictor variables disyllable (p < .0001) and trisyllable (p < .0001) were statistically highly significant. The test data validated model H with an SI value of 0.3.
Since the dummy variable TETRASYLLABLE did not improve the model we find that the realisation of the L2 in short content words (disyllables and trisyllables) depends on the duration of the word whereas it is frequent in words with four or more syllables which show longer duration values anyway. This result supports the VARIANT LATE L ASSOCIATED TONE HYPOTHESIS.

Density plots of the latency of late L2 tone from the LHiL2H* and LL2H* patterns. A positive value in ms indicates the tone was realised past the right edge of the initial syllable of the content word. Realisation at 0 ms indicates that the tone was realised precisely at the boundary between the first and second syllables.
The timing of the peak of the late rise is compared for the patterns LHiLH*, LLH*, and LH* (N = 260). Figure 8 shows that the H* peak is affected by whether or not there are tones other than L1 in the AP: it is realised earlier in the syllable when only L1 precedes H*. It also shows that the insertion of a pause causes the peak to be realised later. Two models (J, K) investigated the alignment of H*. Model J had LATENCY OF H* FROM RIGHT EDGE OF FINAL SYLLABLE as dependent variable and DURATION OF FINAL SYLLABLE as independent variable. The adjusted R2 of this model was only 0.002, meaning that it accounted for less than 1% of the variance. Model K had LATENCY OF H* FROM LEFT EDGE OF FINAL SYLLABLE as dependent variable and DURATION OF FINAL SYLLABLE together with PAUSE (yes, no) and RISE (only, final) as independent variables. The dummy variable rise was created by grouping rises from LHiL2H* and LL2H* into the category final, and rises in L1H* into the category only. The adjusted R2 shows that the model explains 75% of the variance, of which 70% can be attributed to the duration of the final syllable. Since the independent variables pause (p = .009) and rise (p = .05) reached statistical significance, they were kept in the model, which was validated with an SI value of 0.17 in the test data. Results reveal that the peak was realised earlier when no pause followed the token. For the LH* pattern, the peak was realised, on average, at 181 ms from the left edge of the final syllable when no pause followed and at 281 ms in pre-pausal items. In contrast to the Metropolitan experiment, in the present data there are no cases of peaks being realised past the offset of the last syllable of the AP, although such cases represented only a very small proportion of the Metropolitan data (3%). These results suggest that the LATE H ASSOCIATED TONE HYPOTHESIS can be confirmed and that the final peak is aligned to the final syllable.
For H8, the slope was calculated to investigate whether it is constant and whether the F0 excursion can be predicted by the rise time. Mean values and standard deviations are given in Table 2. Here too the rise time shows variation, ranging from 77 to 697 ms. A regression analysis between F0 excursion and rise time shows that the model only accounts for 23% of the variance. The correlation test between the same variables shows a correlation of 0.48, which is moderate and higher than that of the initial rise. The VARIANT LATE RISE CONSTANT SLOPE HYPOTHESIS appears plausible; however, it seems the final rise is less variable than the initial rise.
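As a plausibility check on the two slope regressions, recall that in a linear regression with a single predictor the proportion of variance explained equals the squared correlation coefficient. The sketch below applies this to the correlations reported for the initial rise (r = 0.33, df = 196) and the final rise (r = 0.48). It assumes that each regression had rise time as its only predictor and that the number of observations is df + 2; the function names are illustrative only.

```python
def r_squared_from_r(r):
    """Variance explained by a one-predictor linear regression."""
    return r * r


def adjusted_from_r(r, n, n_predictors=1):
    """Adjusted R2 derived from a correlation coefficient and sample size."""
    r2 = r_squared_from_r(r)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Initial rise: r = 0.33 with df = 196, hence n = 198 (assumption: df = n - 2).
initial_r2 = r_squared_from_r(0.33)
initial_adj = adjusted_from_r(0.33, n=198)

# Final rise: r = 0.48; the sample size is not given, so only R2 is computed.
final_r2 = r_squared_from_r(0.48)

print(f"initial rise: R2 = {initial_r2:.3f}, adjusted R2 = {initial_adj:.3f}")
print(f"final rise:   R2 = {final_r2:.3f}")
```

Under these assumptions the squared correlations come out close to the 10% and 23% of variance reported for the two regressions.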

Density plots of the late H* tone. The left panel shows the contrast between tokens with a final rise (LHiLH* and LLH*) and tokens with only one rise (LH*). The right panel shows H* realised prior to a pause (+Pause) and with no following pause (-Pause); dotted lines indicate the median. A negative value in ms indicates the tone was realised before the right boundary of the content word. Realisation at 0 ms indicates that the tone was realised precisely at the end of the final syllable.
2.4.4 Speaking rate
Since differences in rise time could be established, it is conceivable that participants in Lifou might display a slower speaking rate than those in the study by Welby (2006). Speaking rate was calculated by taking the duration of target APs in ms and dividing it by the number of phones uttered. The mean phone duration was used to arrive at an approximate number of phones uttered per second. We could establish that speakers in Lifou produced on average 10.3 to 12.5 phones per second, which is fewer than speakers in the Metropolitan study produced at normal speech rate (11.5 to 14.2). This suggests that speaking rate is slower in Lifou French.
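The speaking rate calculation described here can be sketched in a few lines of Python. The AP durations and phone counts below are invented for illustration, the function names are hypothetical, and averaging the per-AP phone durations before converting to a rate is one possible reading of the procedure, not necessarily the exact pooling used in the study.

```python
from statistics import mean


def mean_phone_duration_ms(aps):
    """aps: list of (ap_duration_ms, n_phones) pairs, one per target AP."""
    return mean([duration / n_phones for duration, n_phones in aps])


def phones_per_second(aps):
    """Convert the mean phone duration (ms) into phones per second."""
    return 1000.0 / mean_phone_duration_ms(aps)


# Hypothetical target APs of one speaker: (duration in ms, number of phones).
example_aps = [(540.0, 6), (630.0, 7), (480.0, 6)]

print("mean phone duration (ms):", round(mean_phone_duration_ms(example_aps), 1))
print("phones per second:", round(phones_per_second(example_aps), 1))
```

The same function applies to the syllable-based rate if syllable counts are passed in place of phone counts.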
2.5 Summary
This first experiment sought to provide a more detailed investigation of the intonational phonology of Lifou French. The aim was to test whether, as suggested by Torres et al. (2018a), the AP shares the same tonal properties as those reported for the Metropolitan variety. Results indicate that the same tonal targets are found in both varieties and that these show comparable tune-to-text alignment. It was established that in Lifou French the initial L1 tone is edge seeking and occurs close to the boundary between the function word and the following content word. Meanwhile the Hi (of L1Hi) represents a variant tone, which is most reliably predicted by the duration of the AP and which can be associated with the first or second syllable of the content word of the AP. Rise time was not a reliable predictor of excursion size for either the initial or the final rise, which is in agreement with previous observations in the Metropolitan variety. However, our data suggest that the rise time is slower in Lifou French, which could contribute to a perceived difference between the two varieties and be an indicator of regional variation. Further inspection of speaking rate suggests that Lifou French speakers produce on average fewer phones per second, which is in line with the slower rise time. One interesting, although perhaps marginal, observation is that no H* tone was realised past the offset of the final syllable. We also noted that the insertion of a pause coincided with a later placement of the peak. Finally, the majority of L2 tones (namely 74%) were found in the final syllable, which suggests a preference for a realisation of L2H* within the last syllable of the AP. We could establish that the duration of the content word and the number of syllables (two or three) were the best predictors for the realisation of L2.
These results suggest tonal targets in French are stable across varieties and fine differences in their implementation such as rise time and speaking rate are likely to have an influence on perceived rhythmic differences.
3 Experiment 2
Experiment 1 showed that the tonal realisation in Lifou French is very close to that of Metropolitan French and allows us to predict that short APs will be produced with a H* peak at the right edge. In experiment 2 we are interested in evaluating whether or not the intermediate phrase (ip) constitutes a further level in the prosodic hierarchy in Lifou French. Recent investigations of Metropolitan speakers in southern France find evidence for an ip level (D’Imperio & Michelas, 2014; Michelas & D’Imperio, 2012). It is argued that differences at the right boundary of constituents, which are related to a rescaling of F0 and pre-boundary lengthening, are involved in the demarcation of the ip in contrast to the AP. Similarly, an exploratory study on Lifou French finds evidence for internal restructuring of scaling processes within the IP (Torres et al., 2019). Experiment 2 seeks to investigate whether further evidence in favour of the ip is found in Lifou French and, if so, to determine what role F0 plays in demarcating the right boundary at this level. We predict that if there is a further prosodic level between the AP and the IP, the right boundary of the ip should be phonetically distinguishable from boundaries at a lower and a higher level. Since speech rate can constrain prosodic phrasing, leading for example to modifications in the shape of F0 or a lowering of peaks (Fougeron & Jun, 1998), we additionally tested the effect of speech rate on tonal scaling. This allows us to identify how robust the insertion of an ip boundary is across rates. Experiment 2 was carried out in two phases during two separate field work trips to Lifou in 2018. In both cases the experiment aimed at investigating phrasing in Lifou French; the structure of the experiment and the type of materials remained the same, while the stimuli presented to the participants differed between the two sessions.
All phrases used as stimuli during the first and the second elicitation are listed in Appendix B.
3.1 Hypotheses
Based on findings from experiment 1, it can be predicted that APs, like the ones used in this experiment, will be produced with the patterns previously discussed, for example, L1(HiL2)H*. Since we are interested in scaling processes, our analyses will focus on the peak height of the H* tone of APs. Taking into consideration previous observations made on Metropolitan and Lifou French (D’Imperio & Michelas, 2014; Torres et al., 2019), the following four hypotheses will be discussed:
H1 COMPLETE RESET HYPOTHESIS: In French, it could be established that a major syntactic break between an NP and a VP triggers complete reset of the (LH*) rising tone of the AP immediately placed at this major break. Therefore, it is hypothesised that the pitch level of a LH* rising accent of the right most peak of a complex NP is scaled at the level or close to that of the utterance initial peak (D’Imperio & Michelas, 2014). In this case, the prediction for the ratio value for the two peaks should be = 1.
H2 DOWNSTEP BLOCKING HYPOTHESIS: Another variation in pitch scaling that has been associated with a demarcation of a further prosodic level in French is downstep blocking of F0. Evidence was found that the F0 level of peaks within the ip was held relatively constant. More precisely, it was found that the pitch level of a LH* rising accent immediately placed at the NP|VP break is scaled to the same level as the peak preceding it within the ip (Torres et al., 2019). In this case, we predict that the right most peak within a complex NP will be scaled to the same level as the preceding peak. This means the ratio value for peaks at the NP|VP break and the preceding AP should be = 1.
H3 DOWNSTEP HYPOTHESIS: According to observations in French, in a sequence of APs [[AP][AP][AP]]IP where no major syntactic break or focus is marked, continuous downstep of F0 should be found throughout the whole IP. This means the pitch level of a peak at an AP boundary is downstepped relative to the peak of the preceding AP. In agreement with previous observations on Lifou French, we hypothesise that APs at a syntactic break will not show this behavior but rather that APs within the ip will show downstep. This means that peaks placed within a verb or noun phrase should be clearly downstepped relative to a preceding AP. In this case, the ratio value between the peaks within the VP and NP peaks should be < 1.
H4 PRE-PAUSAL PITCH RESET: As in the Metropolitan variety, it is expected that an H tone which is accompanied by the insertion of a pause will be related to the demarcation of the IP. It is hypothesised that this H% will be scaled higher than IP-internal peaks.
3.2 Methods
3.2.1 Participants
In total 21 participants, 12 female and nine male teenage (14–20 years old) bilingual speakers of French and Drehu, took part in the experiment. Participants responded to an adapted version of the sociolinguistic questionnaire BLP (Birdsong et al., 2012). They reported having used both languages on a daily basis while growing up and living in a bilingual French/Drehu-speaking household. Only speakers who had always had New Caledonia as their main place of residence and were exposed to Drehu were included in the study. 6 Two participants reported having passive knowledge of Iaai and Nengone, two Kanak languages from the neighboring islands Ouvéa and Maré. All participants in the present experiment are enrolled at a local Lycée in Lifou and have gone through the French Metropolitan education system that has been implemented in New Caledonia.
3.2.2 Materials
This experiment used a set of written sentences as stimuli that were adapted to the regional context of Lifou. Elicitation materials consisted of 44 utterances (20 in the first and 24 in the second elicitation) that were separated into a 2 or 3 AP-condition. In the 2 AP-condition a noun phrase that was made up of two NPs was followed by a verb phrase, while in the 3 AP-condition a larger NP made up of three NPs was followed by a verb phrase. This was done in order to test whether the syntactic break between the NP and the VP would have an effect on scaling processes. The syntactic structure and corresponding intonational make up of phrases in the 2 and 3 AP-conditions are here explained:
The target tokens were APs that in most cases contained three syllables (apart from three tokens that had 2 or 4 syllables). All target APs ended in a CV syllable whose vowel was preceded by a voiced consonant (/b/ /d/ /g/ /l/ /m/ /n/ /ʁ/ /z/). Materials for the first data collection were adapted from D’Imperio and Michelas (2014) and checked for comprehensibility prior to elicitation. Examples 5 and 6 show sentences from both conditions, whereby AP1 stands for first AP, AP2 for second, AP.Fs for final short (2 AP-condition), AP.Fl for final long (3 AP-condition) and Hpb for post boundary H*. All utterances used as stimuli are listed in Appendix B.
(5) [La mamie]AP.1 [de Rémy]AP.Fs [demandait]Hpb Bruno.
Remy’s grandma asked for Bruno.
(6) [La mamie]AP.1 [des amis]AP.2 [de Rémy]AP.Fl [demandait]Hpb Bruno.
Remy’s friends’ grandma asked for Bruno.
3.2.3 Procedure
Recordings were carried out in a quiet office at Lifou’s high school. Participants were recorded in individual sessions, at a sampling rate of 44.1 or 48 kHz 7 and 16-bit depth, using a Zoom H6 Handy recorder and a head-mounted microphone. The first author carried out the experiment, gave the necessary instructions to participants and responded to any questions they had. Participants were instructed to first read aloud at a self-selected normal speech rate and then, after a short break, to read again at fast speech rate. They had time to familiarise themselves with the task and read the material before being recorded. Stimuli were visually presented in slides on a PC and the order in which the utterances appeared was randomised. For the elicitation at normal speech rate each utterance appeared alone on one slide, with no line breaks. For the recording at fast speech rate the order of the utterances was randomised again and the phrases appeared in sets of four or five on each slide.
3.3 Data analysis
The same procedures as in section 2.3 were followed. As exemplified in Figure 9, the target APs, position, and H tones were manually annotated (Jun & Fougeron, 2002). A hierarchical database was constructed using the EMU Speech Database Management System (Winkelmann et al., 2017a). It included tiers for the tones, syllables, words, and target AP position.

Waveform and F0 trace of the utterance La mamie des amis de Rémy demandait Bruno “Remy’s friends’ grandma was asking for Bruno.”
Values for the H* tones of the target APs were extracted in emuR (Boersma & Weenink, 2017; R Core Team, 2017; Winkelmann et al., 2017b). To provide a psycho-acoustic analysis, the frequency measured in Hz was converted into semitones. The semitone conversion uses a logarithm to the base 2^(1/12), which is equivalent to 12 times the base-2 logarithm (Nolan, 2003). It is expected that male and female speakers will have different pitch ranges due to differences in the size of the vocal tract. However, it is expected that scaling processes will show the same trends across all speakers. To examine the relationship between peak height and position in the utterance, ratio values of H* peak means were calculated in semitones. A ratio of one between two peaks indicates that neither peak is downstepped relative to the other; if the ratio is greater than one (> 1), the first peak of the comparison is set to a higher level than the second.
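The conversion and the ratio measure can be illustrated with a short Python sketch. The reference frequency of 1 Hz, the function names, and the example F0 values are assumptions made for illustration; the reference actually used in the study is not stated in this section.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Semitones relative to ref_hz (ASSUMED reference: 1 Hz)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def peak_ratio(first_peak_st, second_peak_st):
    """Ratio of two peak heights in semitones; > 1 means the first is higher."""
    return first_peak_st / second_peak_st


# A logarithm to the base 2^(1/12) gives the same value as 12 * log2.
same_value = math.log(220.0, 2 ** (1 / 12))
print(round(same_value, 4), round(hz_to_semitones(220.0), 4))

# Hypothetical H* peaks: utterance-initial AP1 and a later, slightly lower peak.
ap1_st = hz_to_semitones(220.0)
later_st = hz_to_semitones(210.0)
print("ratio later/AP1:", round(peak_ratio(later_st, ap1_st), 3))
```

Because semitone values depend on the reference frequency, a ratio of two semitone values changes with the reference as well, whereas the difference between them does not; ratios are therefore only comparable when the same reference is used throughout.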
3.3.1 Statistical analyses
Paired Wilcoxon tests were performed to compare the speaking rate for all participants. Linear mixed effects models were used to investigate the ratio values measured for each condition and whether (a) the position of the AP, (b) the insertion of a pause, (c) the change in speech rate, and (d) the sex of participants have an influence on the height of peaks. Statistical analyses were carried out in R (R Core Team, 2017) with the help of the packages lme4 and emmeans (Bates et al., 2015; Lenth & Herve, 2019). Automatic backward model selection was employed to arrive at a final model and obtain significance values. Additionally, estimated marginal means were obtained for factor comparisons and p-values were adjusted with the Tukey method. Random effects included random intercepts for speakers and words, as well as by-speaker random slopes for the effect of speech rate.
3.4 Results
3.4.1 Speaking rate
Participants were instructed to read the stimuli at a self-selected normal and fast speech rate. Since the utterances used are very similar to those in D’Imperio and Michelas (2014), the two studies can be compared relatively well. Note that speakers in Lifou are younger (14–20 years old) than participants in the Metropolitan experiment (24–34 years old) and also that more participants were recorded in Lifou, namely 21 compared to nine. The average number of syllables produced per second was calculated using the duration values of target APs in ms divided by the number of syllables in the same AP. The mean syllable duration was used to arrive at an estimate of syllables produced per second. Paired Wilcoxon tests were employed to examine whether participants had succeeded in increasing their speed between rates for this measure. Table 4 summarises the average number of syllables produced per second at the two speech rates and provides the results of the statistical analyses performed for all speakers. Similar to observations in experiment 1, participants in Lifou show a slower speaking rate. The slowest speaker (FI) produced 5.1 syllables per second and the fastest (BR) 8.2 at fast speech rate. In the study by D’Imperio and Michelas (2014), the slowest speaker produced 7.2 and the fastest 9.8 syllables per second at fast speech rate. Note that of the 21 participants, three speakers failed to increase the number of syllables produced per target AP (AL, FI, IQ). Although most speakers managed to increase their speaking rate, on average they did so by 14.9%, while the Metropolitan speakers increased their rate by 28.4%.
Average number of syllables per second for normal and fast speech rate in target Accentual Phrases. The increment from normal to fast is given as a percentage; paired Wilcoxon tests were used for statistical comparison, and p-values are provided.
3.4.2 F0 peaks
Ratio values for relevant peak comparisons were calculated using semitone measures. Values were obtained by speaker, rate, and condition. Table 5 provides the mean values obtained for ratio comparisons in the 2 AP and 3 AP-conditions. The mean ratio between AP.Fs and AP1 measured for female and male speakers (= 0.97) shows that the AP.Fs peak is realised at a level close to the utterance initial H* in the 2 AP-condition. A comparison between AP.Fl and AP1 (= 0.95) shows that also in this case the peak associated with the syntactic break is close to the utterance initial H* in the 3 AP-condition. These ratio values show that peaks at the NP|VP syntactic break are set to a level close to that of the utterance initial peak in AP1. Although the F0 level is not set at the exact same level, these results could suggest that a moderate version of the Complete reset hypothesis could apply. However, this might not be the most suitable analysis for the observed scaling.
Calculated ratio values of right edge H* peaks of APs in two conditions.
Another mechanism linked to the restructuring of prosodic levels in French is downstep blocking. Figure 10 shows the peak height in semitones for the right most peaks of the NP and the preceding peaks in the 2 AP and 3 AP-condition. This graph suggests that peaks at the NP|VP break are closely scaled to preceding peaks within the same NP. To evaluate whether the peaks at the NP|VP break are scaled to the level of their respectively preceding peaks the ratio values between the right most peak of the complex-NP boundary and the preceding APs were measured.

Peak values for H* in semitones of female speakers for APs occurring within the NP and at the NP|VP break in two conditions. The graph includes tokens from the two elicited speech rates.
Table 5 shows that for this comparison, in the 2 AP-condition, the ratio is 0.97, while in the 3 AP-condition it is 0.96 and 0.99. In order to test the downstep blocking hypothesis, ratio values of peaks were fitted to a linear mixed effects model (N = 2800). The initial model had ratio (AP.Fs/AP1, AP.Fs/Hpb, AP.Fl/AP1, AP.Fl/AP2, AP.Fl/Hpb), rate (normal, fast), and condition (2 AP, 3 AP) as fixed factors together with speaker as random intercept and by-speaker random slopes for the effect of rate. The backward model selection retained the fixed factors ratio and condition together with the random factor and slopes. Table 6 provides the results for the Tukey corrected factor comparisons. The statistical analyses reveal that values for peaks at the syntactic break and the utterance initial H* (AP.Fs/AP1 and AP.Fl/AP1) are not significantly different, which suggests that a comparable level of H* at the NP|VP break is found for both conditions. However, note that in the 3 AP-condition the difference between the ratio values of AP.Fl/AP1 and AP.Fl/AP2 is statistically significant, which does not support the downstep blocking hypothesis. This difference can be expected if downstep blocking does not apply across the entire NP. Instead, downstep blocking could be specifically targeting one peak, namely the rightmost one in the NP.
Results of Tukey corrected factor comparisons between ratio values.
If the ratio values across conditions are compared for peaks at the syntactic NP|VP break and the H* immediately preceding them (AP.Fs/AP1 and AP.Fl/AP2) we have a more robust result, confirming that these peaks do not differ significantly. These results suggest that downstep blocking applies to the peak at the syntactic break (AP.Fs and AP.Fl) but not across the entire ip. Additionally, if there was a continuous downstep trend across the utterance, ratio values between AP.Fs/AP1 and AP.Fl/AP2 should have been different since AP.Fl occurs later in the utterance.
Figure 11 summarises the ratio values calculated for the 2 AP and 3 AP-conditions at two speech rates. It shows that the ratio values between the peak at the NP|VP break and the subsequent peak in the VP are higher which means that peaks in AP.Fs and AP.Fl are set to a considerably higher level than peaks in Hpb. The Tukey corrected factor comparisons show that differences of ratio values in AP.Fl/Hpb and AP.Fs/Hpb are not statistically significant. Again, if there was a continuous downstep trend across utterances, we would expect ratio values between AP.Fl/Hpb and AP.Fs/Hpb to be significantly different, since Hpb occurs later in the 3 AP-condition. Taken together, results from statistical analyses suggest that peaks located at the NP|VP break are scaled to the level of the immediately preceding peak in the ip and are affected by downstep blocking whereas peaks following thereafter (Hpb) show downstep.

Ratio values of H* peaks at two speech rates, normal and fast. To the left is the 2 AP and to the right the 3 AP-condition.
Figure 12 shows a comparison between peaks of APs at the major syntactic break in the 2 AP and 3 AP-conditions. Downstep was predicted across peaks of subsequent APs in cases where no ip boundary marking would be triggered through the NP|VP break.

Mean peak values for H* in semitones of male speakers in two subsequent APs within an IP. The first break shows the peaks in the 2 AP-condition while the second break shows peaks in the 3 AP-condition. The graph includes tokens from the two elicited speech rates.
To examine whether there is downstep after the syntactic break, the ratio values were measured between AP.Fs and the peak in the following AP, namely Hpb. Here, the mean ratio value is 1.1. A similar result is obtained between AP.Fl and the peak in the following AP, Hpb, for which the ratio is 1.06. To statistically investigate the H* of APs, a linear mixed effects model was used. It had position (AP1, AP.Fs, Hpb and AP1, AP2, AP.Fl, Hpb), speech rate (normal, fast), sex (male, female), and condition (2 AP, 3 AP) as fixed factors, together with speaker and token word as random factors and by-speaker random slopes for the effect of speech rate (N = 3373). This model only included peaks which were not preceded by a pause. The backward model selection retained the fixed factors position and sex as well as the random effects.
Results of the Tukey corrected factor comparisons for the 2 AP and 3 AP-condition are summarised in Table 7. The statistically significant difference between peaks at and after the NP|VP break indicates that the downstep hypothesis applies to peaks realised after the complex-NP. Further evidence comes from the results shown in Table 6, which indicate that in the 2 AP-condition the ratio values of AP.Fs/Hpb and AP.Fs/AP1 are significantly different. Similarly, in the 3 AP-condition the differences between the ratio values of AP.Fl/Hpb and AP.Fl/AP1, as well as between AP.Fl/Hpb and AP.Fl/AP2, are statistically significant. Taken together, the results discussed for Tables 7 and 6 provide supporting evidence for the downstep hypothesis, which was predicted to apply to peaks that follow the syntactic break (Hpb) and to peaks that occur within the phrase (AP2 relative to AP1 in the 3 AP-condition).
Results of Tukey corrected factor comparisons.
Finally, we were interested in investigating whether the insertion of a pause, the speaking rate, and the sex of the speakers had an effect on peak height. A third linear mixed effects model was employed to examine the H* peaks in the corpus, this time including pre-pausal items. The model had pause (yes, no), position (AP1, AP.Fs, Hpb and AP1, AP2, AP.Fl, Hpb), speech rate (normal, fast), and sex (male, female) as fixed factors, plus speaker and token word as random factors and by-speaker random slopes for the effect of speech rate (N = 3556). Table 8 summarises the results for the three investigated factors. Although the effect is of small magnitude, results show that the insertion of a pause after APs causes the tones to be significantly higher than other peaks. The raising of F0 in this particular case can be associated with the demarcation of a higher prosodic level, namely the IP. Further, we could not establish a significant difference in pitch height across speech rates, which indicates that speakers seek to reach the same tonal height despite the adjustment in rate. Finally, as expected, we find a significant difference between the peaks of female and male speakers, with males having a lower range. As predicted, the trends in scaling processes are stable for male and female speakers despite differences in F0 range.
Results of linear mixed effects model.
3.5 Summary
Experiment 2 aimed at investigating the right boundary of APs to evaluate whether there are phonetic cues distinguishing between three different prosodic levels. First, the speaking rate in syllables per second was calculated to measure the time used articulating syllables in target APs. A comparison of speaking rate values suggests that Lifou French speakers have a slower rate than Metropolitan speakers, which is in agreement with our observations in experiment 1. Based on previous findings (Torres et al., 2019), it was hypothesised that prosodic phrasing in Lifou French would differ in its phonetic realisation in comparison to the Metropolitan variety. It could be established that downstep blocking plays a role in the demarcation of an AP right boundary found at a major syntactic break between a noun phrase and a verb phrase. Moreover, downstep blocking was found to affect the peak at the NP|VP break and not all peaks in the noun phrase. This indicates that downstep blocking applies to the AP associated with the ip boundary. Additional evidence supporting pitch scaling effects is found at the right boundary of APs that occur within the noun or verb phrase. In these cases it was found that there is a small degree of downstep and that peaks were lowered. This is in line with previous findings (D’Imperio & Michelas, 2014) claiming that the insertion of an ip boundary is associated with a H-tone which demarcates this prosodic level. However, we do not find that the peak at the ip boundary is reset relative to the level of the ip-initial peak but rather that continuous downstep is blocked. Moreover, we were interested in investigating whether pre-pausal peaks would be influenced by the insertion of a pause and found evidence that the peak height is indeed significantly affected. This shows that the insertion of a pause causes the H* to be higher, which can be associated with the demarcation of a higher prosodic level, namely the IP.
4 Discussion and Conclusion
Results from two experiments showed that Lifou French shares basic elements with the intonational phonology of Metropolitan French and that these elements are subject to fine-grained phonetic variation in the implementation of rise time, speaking rate, and scaling of F0. Basic elements of French intonational phonology refer here to the tonal structure of the AP, the tonal targets it includes, and how they are aligned in relation to segmental landmarks. The AP is the lowest tonally marked constituent in Lifou French as well as in Metropolitan French and can be canonically described with the notation /LHiLH*/. No striking differences were found in the tune-to-text alignment of the initial and late rise, which is as variable in Lifou French as it is in the Metropolitan variety. Although a slight preference to realise the final rise on the last syllable of the AP was observed for Lifou French, there was still some degree of variability in the data. The examination of tonal alignment in Metropolitan French led to the proposal of the anchorage region (Welby & Loevenbruck, 2006), a concept that also holds for the variety studied here and is preferable to predictions based on the strict segmental anchoring hypothesis for the study of French varieties. As described by Welby (2006), we established that the L1 in the initial rise and the H* tone of the late rise are tied to a region of the AP and not to one specific segment. More precisely, the L1 is an edge-seeking tone that is most frequently realised at the boundary between a function word and a following content word of the AP. The H* tone is consistently aligned with the last full syllable of the AP and does not cross over to a subsequent AP, in contrast to what has been noted for the Metropolitan variety. A similar stability of these tones in relation to syllabic landmarks has been noted for Vaudois French (Sertling-Miller, 2007).
The results presented here, as well as the study on Vaudois French, support the idea that French intonational phonology and its tonal primitives are stable across different varieties. We established that the best predictor of the variable tones Hi and L2 is the duration of the AP or content word, which suggests that these tones are inserted for rhythmic reasons but are not further tied to segmental landmarks. Additionally, Hi and L2 can be undershot, as known from other French varieties.
French speakers seem aware of dialectal variation and often refer to noticeable “melodic” differences, which deserve to be addressed. As previously observed for Swiss French (Schwab & Avanzi, 2015), results from experiment 1 suggest that Lifou French speakers show a slower speaking rate than Metropolitan speakers, and this observation was further confirmed in experiment 2. Considering that French pitch accents differ from Germanic pitch accents in that they have different prosodic functions and do not show the same variability in their makeup but remain constant, we can expect to find differences in the phonetic implementation of tonal targets. More precisely, it is conceivable that differences in tempo cause a slower rise time between a low and a high tonal target, leading to the perception of a divergent intonation.
The aim of experiment 2 was to investigate the right boundary marking of prosodic constituents and evaluate how they relate to the French prosodic hierarchy. We established that the insertion of a pause causes pre-pausal peaks to be higher, which can be associated with the demarcation of the IP. Our results also demonstrate that there is a restructuring of peaks associated with the intermediate phrase level. Although evidence is found for this prosodic level, intonationally the ip in Lifou French is marked differently than in the Metropolitan variety. First, there is less variation between peaks in Lifou French, and second, we do not observe complete pitch reset. Instead, the demarcation of the ip is realised through downstep blocking at the ip boundary, which hinders the otherwise expected recursive downstep. Further evidence is found in the speech stream following the ip boundary, which is set to a lower register and clearly downstepped relative to the preceding pitch level. Moreover, APs within the same noun or verb phrase that do not coincide with the ip boundary are affected by downstep. In terms of variability, the observed scaling could signal a different rhythmicity to Metropolitan French listeners, who could expect more pronounced tonal movements. This is because in the Metropolitan variety the ip boundary is marked through reset, which involves a higher pitch level of the peak at the ip boundary when compared to its preceding peak. Since in Lifou French these two peaks are scaled to the same level, this could be perceived as a difference in the scaling of F0 that is then associated with a divergent rhythmicity.
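The three scaling relations discussed above (reset, downstep blocking, and downstep) can be made concrete with a small sketch that compares two successive AP-final peaks. It assumes peaks measured in Hz and converts their ratio to semitones; the function names and the 0.5-semitone tolerance are hypothetical choices for illustration only and are not the criterion used in this study, which relied on linear mixed effects models.

```python
import math


def semitone_diff(f0_prev_hz: float, f0_next_hz: float) -> float:
    """Height of the later peak relative to the earlier one, in semitones."""
    return 12.0 * math.log2(f0_next_hz / f0_prev_hz)


def classify_scaling(f0_prev_hz: float, f0_next_hz: float,
                     tolerance_st: float = 0.5) -> str:
    """Label the scaling relation between two successive peaks.

    tolerance_st is an assumed threshold, not a value from the study.
    """
    diff = semitone_diff(f0_prev_hz, f0_next_hz)
    if diff > tolerance_st:
        return "reset"             # later peak clearly higher
    if diff < -tolerance_st:
        return "downstep"          # later peak clearly lower
    return "downstep blocked"      # peaks scaled to about the same level


# Hypothetical peak pair at an ip boundary, both at 200 Hz
label = classify_scaling(200.0, 200.0)
```

Under these assumptions, a pair of peaks at the same level is labelled as downstep blocking, which corresponds to the pattern described for the ip boundary in Lifou French, whereas a clearly higher second peak corresponds to the reset described for the Metropolitan variety.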
As reported for Metropolitan speakers from southern France (D’Imperio & Michelas, 2014), we found evidence that the pitch height of peaks is stable across speech rates. This was not observed for speakers from Paris, who produced lower F0 peaks at a fast speech rate (Fougeron & Jun, 1998), although the latter study did not investigate the ip systematically.
Our results confirm an initial hypothesis predicting regional variation in relation to prosodic phrasing. Moreover, we find evidence suggesting that Lifou French speakers show a slower tempo than Metropolitan speakers. Arguably, the attested scaling of F0 and the tempo described for Lifou French jointly mark the characteristic intonation of this variety. This study adds to our knowledge of dialectal variation and shows that, even for a language like French, intonation represents a regional marker. Since the bitonal LH* French pitch accent does not vary in its composition as a tune, only an acoustic investigation of subsequent peaks within the Intonation Phrase has revealed that phonetic realisation is the source of moderate variation. It would be of interest to test perceptually how strongly Lifou and Metropolitan French speakers perceive the ip boundary described here.
This study is part of a larger project investigating prosody in Lifou French and Drehu. Findings on the intonational structure of Drehu suggest that there is phrasal prominence marking which resembles that of French intonational phonology. Evidence was found for a low tone demarcating the left edge and a high tone at the right edge of the prosodic word in Drehu (Torres & Fletcher, 2020). There are two possible scenarios regarding these similarities: first, the rather recent but strong influence of French could be having an effect on Drehu intonational phonology, leading to convergence; second, there is reason to assume that the original impressionistic description of lexical stress in Drehu was not correct. Further investigation is required to validate either hypothesis. It would be of interest to examine the prosodic realisations of the bilinguals in more detail and test, for example, whether there are differences between the two languages in register level or in the perception of prominence.
Authors’ Note
Catalina Torres, Janet Fletcher and Gillian Wigglesworth are also affiliated with the ARC Centre of Excellence for the Dynamics of Language.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was conducted with support from the ARC Centre of Excellence for the Dynamics of Language (Project ID: CE140100041).
