The interactive relationship between prosody and respiration of computer BIOPAC systems: Shanghai dialect and Mandarin

Abstract

This paper uses the computer BIOPAC Systems tool to analyze the relationship between prosody and respiration in fable reading in Shanghai dialect and Mandarin. The conclusions are as follows: 1) the mean values of the respiratory parameters are positively related to the level of the respiratory unit, and the respiration curve of Shanghai dialect, which shows smaller ups and downs, differs from that of Mandarin; 2) breath reset is related to silent segments: wherever a breath reset occurs there must be a silent segment, while the opposite does not hold; 3) in the flexible literary style of the fable, proficiency in the text can significantly increase the complexity of the respiration curve, producing special features such as “breath-holding” pronunciation.

Keywords

Computer BIOPAC systems, respiration, Shanghai dialect, Mandarin, reading fable

1. Introduction

Lieberman’s [1] “Breath-Group theory” pointed out that the “Marked Breath-Group” is closely related to speech activities. B. Conrad and P. Schönle [2] analyzed the differences in breathing during four verbal tasks (spontaneous speech, reading, serial speech, and arithmetic) under three speech states: speaking aloud, articulating subvocally, and quiet performance by trying to exclusively “think” the tasks. They concluded that there is a breathing continuum from calm breathing to verbal breathing. Researchers at Peking University [3] were among the first to combine respiratory physiological signals with prosody research. In recent years, scholars have discussed the relationship between prosody and respiration from the perspectives of different styles or corpora [4, 5, 6, 10], different language conditions and language tasks [7, 11], and different national languages and dialects [8, 9, 12, 13, 14].

In this paper, the BIOPAC multi-channel physiological instrument is used to collect the speech signals and the synchronized breathing signals of Shanghai dialect and Mandarin during the reading of a fable. Statistical analysis is carried out with R (3.03) as a preliminary study of the interaction between the two.

2. Research methods

2.1 Speakers and experimental materials

In the experiment, 40 speakers were selected (20 for Shanghai dialect and 20 for Mandarin). The speakers were all university students, healthy, fluent in expression, and moderate in speaking speed. The text is the fable “The North Wind and the Sun”. Owing to differences in dialect grammar, expression habits, and other factors, the Shanghai dialect and Mandarin texts differ slightly in length: the Mandarin text has 186 syllables, and the Shanghai dialect text has 171 syllables.

Figure 1.

AcqKnowledge 4.2 four-channel signal acquisition example.

2.2 Data collection

The experimental acquisition equipment includes a multi-channel physiology instrument (BIOPAC Systems MP150), a laptop computer, a sound card (Avid Mbox1.0), and a microphone (AKG). The software is AcqKnowledge 4.2, the supporting software of the multi-channel physiology instrument. AcqKnowledge is interactive acquisition and analysis software that can view, measure, analyze, and transform data in real time. Pull-down menus and dialog boxes are used to perform data collection, triggering, and analysis, and to record and analyze physiological, behavioral, and subjective response data. Its analysis automation tools save processing time in routine analyses. It can record and label data files, annotate graphs, and export multiple file formats, including AcqKnowledge graphics, Excel, MATLAB, and text. To facilitate future research, the experiment synchronously collects four channels of signals (see Fig. 1): the first channel (Ch1) is the voice signal collected by the microphone; the second channel (Ch2) is the vocal signal collected by an electroglottograph (EGG); the third channel (Ch3) is the thoracic breathing signal (collected by TSD201); the fourth channel (Ch4) is the abdominal breathing signal. Unless otherwise specified, the research in this paper is limited to the voice signal and the thoracic respiration signal.

The sampling frequency of the voice signal, as displayed in Praat, is 44100 Hz, and the sampling frequency of the respiration signal is 2500 Hz. After acquisition, the four-channel data are saved in .acq format. The audio signal is then converted and saved in .wav format.

2.3 Parameter setting

In terms of prosodic units, this paper refers to Zheng Qiuyu’s [15] “Hierarchical Organizational Structure of Discourse Prosody (HPG)” and divides prosodic units, from large to small, into prosodic groups (PG), intonation phrases (IPh), prosodic phrases (PPh), and prosodic words (PW).

In terms of respiratory parameters, the main data include [16, 17]: inspiratory duration, inspiratory amplitude, inspiratory slope, expiratory duration, expiratory amplitude, expiratory slope, respiratory valley and respiratory peak.

3. Data extraction and analysis

3.1 Experiment of auditory discrimination of prosodic boundary

In order to determine the prosodic boundaries, this study first conducted an auditory discrimination experiment. The considerations behind the selection of speakers are as follows. Every speaker has their own tone and habits, yet the brain can quickly make sense of these variations. While analyzing which vowels and consonants have changed, which words have been used, and how these words are formed into phrases and sentences, the brain must also keep up with and interpret these tonal changes, all within milliseconds. According to the currently prevailing classification, modern Chinese is divided into seven major dialect groups: Mandarin, Cantonese, Wu, Hakka, Min, Xiang, and Gan.

The listening materials are recordings of the fable “The North Wind and the Sun” read in Shanghai dialect (20 recordings) and in Mandarin (20 recordings). The subjects are 10 students with a Shanghai dialect background and a northern dialect background. The subjects are healthy and have normal hearing.

Experimental steps: 1. Distribute the corpus text with consistent word spacing and no punctuation, and let the subjects familiarize themselves with it. 2. Play a recording that matches the subjects’ native language background. The recording is played four times. On the first playback, the listener marks the longest perceived pauses as “////”. On the second playback, the listener marks the next-longest perceived pauses between the first-pass marks as “///”. On the third playback, the listener marks the next-longest perceived pauses between the second-pass marks as “//”. On the fourth playback, the listener marks the next-longest perceived pauses between the third-pass marks as “/”.

Each native subject identified 20 recordings, and a total of 200 Shanghai dialect and Mandarin results were obtained. The number of prosodic units marked on the first pass by each native subject is then counted, the arithmetic mean of these counts is calculated, and the result is rounded to an integer. The final prosodic unit division scheme is determined on this basis.
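The averaging step can be sketched in Python as follows. This is only an illustration: the listener counts are hypothetical, the function name is invented for the example, and rounding half up is an assumption, since the text says only that the mean is taken as an integer.

```python
import math


def consensus_unit_count(counts):
    """Arithmetic mean of per-listener unit counts, rounded half up.

    The rounding rule is an assumption made for this sketch.
    """
    if not counts:
        raise ValueError("at least one listener count is required")
    mean = sum(counts) / len(counts)
    return math.floor(mean + 0.5)


# Hypothetical first-pass ("////") counts from 10 listeners for one recording.
first_pass_counts = [9, 8, 9, 10, 9, 9, 8, 9, 10, 9]
print(consensus_unit_count(first_pass_counts))
```

Python's built-in `round` rounds halves to the nearest even integer, which is why the sketch uses `math.floor(mean + 0.5)` to make the rule explicit.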

The labeling of prosodic boundaries is done in Praat (see Fig. 2); the prosodic unit data at all levels is extracted through the Praat script (see Table 1 for an example).

Table 1
Example of saving results of prosodic unit data for a Mandarin speaker (PM1)

Speaker	Prosody level	Marker	Start time (s)	End time (s)	Duration (s)
PM1	Prosodic Sentence (PG)	1A	18.53677	22.42181	3.885
PM1	Intonation Phrase (IPh)	1B1	18.53677	19.11233	0.576
PM1	Prosodic Phrase (PPh)	1C2	19.51702	20.60069	1.084

Table 2

Example of respiratory data for a Shanghainese speaker (SM1)

Parameter	Beginning valley		Peak		End valley
	Time	Amplitude	Time	Amplitude	Time	Amplitude
Period 1	37.401	-7.375	38.343	6.182	41.024	-1.196
Period 2	41.024	-1.196	41.462	-0.432	42.990	-7.722
Period 3	42.990	-7.722	45.410	0.965	46.434	-4.067

Figure 2.

Praat prosodic level annotation example. (Notation: A stands for prosodic group (PG), B for intonation phrase (IPh), C for prosodic phrase (PPh), D for prosodic word (PW), and S for silent segment. The number before the letter indicates the position of the sentence in the discourse, and the number after the letter indicates the position of the prosodic unit in the sentence. For example, 1B2 indicates the second intonation phrase in the first sentence; 1BS1 indicates the first silent segment between intonation phrases in the first sentence.)
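The label notation can be decoded with a small parser. The following Python sketch is an illustration only: the function name and the returned field names are invented here, and treating the trailing number as optional is an assumption based on the marker 1A in Table 1.

```python
import re

# Letter codes from the notation description of Fig. 2.
LEVELS = {"A": "PG", "B": "IPh", "C": "PPh", "D": "PW"}

# sentence number, level letter, optional S (silent segment), optional position
LABEL_RE = re.compile(r"^(\d+)([ABCD])(S?)(\d*)$")


def parse_label(label):
    """Split a label such as '1B2' or '1BS1' into its components."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    sentence, level, silent, position = match.groups()
    return {
        "sentence": int(sentence),
        "level": LEVELS[level],
        "silent": silent == "S",
        "position": int(position) if position else None,
    }


for example in ("1A", "1B2", "1BS1"):
    print(example, parse_label(example))
```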

3.2 Labeling and extraction of respiratory parameters

The absolute value of the respiration curve has obvious individual differences. In this paper, the method of respiration degree [18] is used to normalize the data.

Normalization processing formula: $H=(P-V_{\min})/(P_{\max}-V_{\min})$

In the formula, $H$ represents the degree of respiration at point $P$, $P$ represents the amplitude of the respiration curve at this point, $V_{\min}$ represents the minimum valley value of the entire respiratory curve of a subject, and $P_{\max}$ represents the maximum peak value of the entire respiratory curve of the subject. The $H$ value reflects the position of the amplitude at point $P$ as a proportion of the full amplitude range of the respiratory curve. The larger the $H$ value, the greater the degree of breathing and the greater the breathing amplitude, and vice versa. Based on the normalized respiration amplitude ($H$ value), respiratory units were divided into respiratory groups, with a peak-to-valley difference of more than 0.5; respiratory segments, with a peak-to-valley difference of 0.2–0.5; and respiratory sections, with a peak-to-valley difference of less than 0.2.
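The normalization and the three-way grading can be sketched in Python as follows. The function names are invented for illustration, the smallest class is called "respiratory section" as in Tables 4 and 5, and assigning the boundary values 0.2 and 0.5 to the middle class is an assumption, since the text does not say which class they belong to.

```python
def respiration_degree(p, v_min, p_max):
    """H = (P - V_min) / (P_max - V_min) for one point of the curve."""
    if p_max <= v_min:
        raise ValueError("p_max must be greater than v_min")
    return (p - v_min) / (p_max - v_min)


def classify_unit(peak_h, valley_h):
    """Grade a breathing unit by its normalized peak-to-valley difference."""
    diff = abs(peak_h - valley_h)
    if diff > 0.5:
        return "respiratory group"
    if diff >= 0.2:  # boundary handling is an assumption of this sketch
        return "respiratory segment"
    return "respiratory section"


# Hypothetical raw amplitudes for one subject.
v_min, p_max = -8.0, 9.0
valley = respiration_degree(-7.0, v_min, p_max)
peak = respiration_degree(6.0, v_min, p_max)
print(round(valley, 3), round(peak, 3), classify_unit(peak, valley))
```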

AcqKnowledge 4.2 was used to extract the corresponding parameters of the respiratory signal, and R (3.03) was used for statistical analysis. The data examples are shown in Tables 2 and 3.

Table 3
Example of respiratory data for a Shanghainese speaker (SM1)

Parameter	Inspiratory duration	Expiratory duration	Inspiratory amplitude	Expiratory amplitude	Inspiratory slope	Expiratory slope
Period 1	0.943	2.681	0.779	0.424	0.826	0.158
Period 2	0.438	1.528	0.044	0.419	0.100	0.274
Period 3	2.419	1.024	0.499	0.289	0.206	0.282
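The parameters in Table 3 appear to follow from the valley and peak values in Table 2. The Python sketch below illustrates one reading that reproduces them, under three assumptions that the paper does not state explicitly: amplitude is the peak-to-valley difference divided by $P_{\max}-V_{\min}$, slope is amplitude divided by duration, and the range $P_{\max}-V_{\min}$ for speaker SM1 is about 17.40 (a value back-calculated here from the two tables, not reported by the authors).

```python
# Rows of Table 2: (start valley time, start valley amplitude,
#                   peak time, peak amplitude,
#                   end valley time, end valley amplitude)
PERIODS = [
    (37.401, -7.375, 38.343, 6.182, 41.024, -1.196),
    (41.024, -1.196, 41.462, -0.432, 42.990, -7.722),
    (42.990, -7.722, 45.410, 0.965, 46.434, -4.067),
]

# Assumed P_max - V_min for this speaker (back-calculated, not from the paper).
CURVE_RANGE = 17.40


def derive_parameters(period, curve_range=CURVE_RANGE):
    """Derive durations, normalized amplitudes and slopes for one period."""
    t_start, v_start, t_peak, p_peak, t_end, v_end = period
    insp_duration = t_peak - t_start
    exp_duration = t_end - t_peak
    insp_amplitude = (p_peak - v_start) / curve_range
    exp_amplitude = (p_peak - v_end) / curve_range
    return {
        "insp_duration": insp_duration,
        "exp_duration": exp_duration,
        "insp_amplitude": insp_amplitude,
        "exp_amplitude": exp_amplitude,
        "insp_slope": insp_amplitude / insp_duration,
        "exp_slope": exp_amplitude / exp_duration,
    }


for number, period in enumerate(PERIODS, start=1):
    row = derive_parameters(period)
    print(number, {name: round(value, 3) for name, value in row.items()})
```

Under these assumptions the derived values agree with Table 3 to within about 0.002, which suggests, but does not prove, that this is how the parameters were computed.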

Table 4

Mean comparison of Shanghainese and Mandarin respiratory units

Speaker	Inspiratory curve			Expiratory curve
	Respiratory group	Respiratory segment	Respiratory section	Respiratory group	Respiratory segment	Respiratory section
Shanghainese speaker	4	11	6	4	12	5
Mandarin speaker	5	9	8	6	10	6

Figure 3.

Example of breathing curve [7]. (Explanation: Fig. 3 is a two-dimensional plot of the respiration curve; the horizontal axis represents time, and the vertical axis represents the voltage change caused by respiration. Ti represents the duration of inspiration, Te the duration of expiration, Ci the amplitude of inspiration, Ce the amplitude of expiration, V the respiratory valley, and P the respiratory peak.)

4. Research results

4.1 Respiratory parameters and respiratory units

4.1.1 Number of breathing units

The average number of respiratory units was computed for the 20 Shanghainese and the 20 Mandarin speakers, and the results are shown in Table 4.

There are considerable differences between Shanghai dialect and Mandarin in terms of vocabulary, grammar, and pronunciation. The comparison found that, for both the expiratory and the inspiratory curves of Shanghainese and Mandarin, the most frequent unit is the respiratory segment, which is determined by the relatively stable tidal volume of human respiration (see the note below). In terms of the number of respiratory groups and respiratory sections, Shanghai dialect speakers have fewer units than Mandarin speakers. To a certain extent, this indicates that the distribution of respiratory units across levels is relatively concentrated in Shanghai dialect and that the level of the respiratory units varies relatively little, so that the breathing curve fluctuates less than that of Mandarin.

Note: When breathing calmly, each breath of a healthy adult lasts about 6.4 seconds, and the volume of gas inhaled and exhaled each time is about 350–600 ml. This gas volume is called the tidal volume.

4.1.2 Inspiratory parameters and respiratory units

The inspiratory parameter data of the 20 Shanghainese and 20 Mandarin speakers were screened, outliers were removed, and a normal distribution test was performed. The results are shown in Table 5.

Table 5
Comparison of Shanghainese and Mandarin inspiratory parameters

Inspiratory parameter	Speaker	Respiratory group		Respiratory segment		Respiratory section
		M	SD	M	SD	M	SD
Inspiratory duration (s)	Shanghai dialect	0.99	0.36	0.82	0.24	0.53	0.14
	Mandarin	0.97	0.15	0.77	0.18	0.53	0.11
Inspiratory amplitude ( $H$ value)	Shanghai dialect	0.65	0.12	0.35	0.08	0.10	0.04
	Mandarin	0.77	0.14	0.32	0.09	0.11	0.06
Inspiratory slope	Shanghai dialect	0.75	0.21	0.47	0.11	0.19	0.08
	Mandarin	0.79	0.17	0.43	0.11	0.20	0.09

In general, for both Shanghai dialect and Mandarin, the mean of each parameter is largest for the respiratory group, followed by the respiratory segment, and smallest for the respiratory section. In terms of dispersion, the standard deviations of inspiratory duration are relatively large at every unit level, indicating that the ranges of the three levels overlap to some extent; those of the inspiratory slope come second; and those of the inspiratory amplitude are relatively small, indicating that the data are relatively concentrated, with essentially no overlap between levels and better stability.

The differences are as follows. According to the means, the inspiratory duration of Shanghai dialect speakers is on the whole longer than that of Mandarin speakers: it is longer before the respiratory group (0.99 s vs. 0.97 s) and the respiratory segment (0.82 s vs. 0.77 s), and the same before the respiratory section (0.53 s). The inspiratory amplitude of Shanghai dialect speakers is slightly smaller than that of Mandarin speakers before the respiratory group and respiratory section and slightly larger before the respiratory segment, indicating that the inspiratory curve of Shanghai dialect speakers fluctuates within a relatively small range. The inspiratory slope of Shanghai dialect speakers is likewise slightly smaller before the respiratory group and respiratory section and slightly larger before the respiratory segment, indicating that the inspiratory curve of Shanghai dialect speakers rises to its peak relatively gently.

According to the standard deviations, for inspiratory duration the values for respiratory groups, segments, and sections in Shanghai dialect are all larger than in Mandarin, at 0.14 or above, which indicates that the inspiratory duration data of Shanghai dialect speakers are relatively scattered and that the inspiratory durations of the three levels of breathing unit overlap to some extent. The variation of inspiratory duration is generally restricted by the length of the expression content and the complexity of the verbal task. For inspiratory amplitude ($H$ value), the standard deviations of Shanghai dialect speakers are generally smaller than those of Mandarin speakers, at 0.12 or below, indicating that the inspiratory amplitudes of Shanghai dialect speakers are relatively concentrated at all three levels of respiratory unit. For inspiratory slope, the standard deviations of Shanghai dialect and Mandarin are roughly the same across the three levels. The analysis-of-variance results for Shanghai dialect are shown in Table 6.

Table 6

Analysis of variance of inspiratory parameters in Shanghai dialect

All levels of respiratory units and inspiratory parameters	Df1	Df2	$F$ value	$p$ value
Respiratory units and inspiratory amplitudes at all levels ($H$ value)	1	83	494.1	2e-16 ***
Respiratory units and inspiratory duration at all levels (s)	1	83	21.6	1.25e-05 ***
Respiratory units and inspiratory slopes at all levels	1	83	131.7	2e-16 ***

The parameters of each group in Shanghainese and Mandarin generally follow a normal distribution; that is, for each level of the factor, the observations can be treated as a simple random sample from a normal distribution. On this basis, analysis of variance (F test) was performed on the two data sets with R (3.03). The results are shown in Tables 6 and 7.

Table 7

Analysis of variance of inspiratory parameters in Mandarin

All levels of respiratory units and inspiratory parameters	Df1	Df2	$F$ value	$p$ value
Respiratory units and inspiratory amplitudes at all levels ($H$ value)	1	82	422.0	2e-16 ***
Respiratory units and inspiratory duration at all levels (s)	1	82	46.6	1.4e-09 ***
Respiratory units and inspiratory slopes at all levels	1	82	274.2	2e-16 ***

For Shanghai dialect, the inspiratory amplitude ($H$ value) gives $F(1, 83) = 494.1$, $p < 0.001$; the inspiratory duration gives $F(1, 83) = 21.6$, $p < 0.001$; and the inspiratory slope gives $F(1, 83) = 131.7$, $p < 0.001$.

For Mandarin, the inspiratory amplitude ($H$ value) gives $F(1, 82) = 422.0$, $p < 0.001$; the inspiratory duration gives $F(1, 82) = 46.6$, $p < 0.001$; and the inspiratory slope gives $F(1, 82) = 274.2$, $p < 0.001$. The results show that, for both Shanghai dialect and Mandarin, the level of the respiratory unit has a significant effect on the inspiratory parameter values, and each inspiratory parameter is positively correlated with the size of the respiratory unit: the higher the level of the respiratory unit, the larger the value of each inspiratory parameter, and vice versa.
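The paper does not state the model behind these tests. A numerator degree of freedom of 1 is consistent with entering the unit level as a single numeric predictor (for example section = 1, segment = 2, group = 3), in which case the test is the F test of a simple linear regression with $n - 2$ residual degrees of freedom. The following Python sketch illustrates that reading on hypothetical data; it is an assumption about the analysis, written in Python rather than R, and not the authors' script.

```python
def regression_f(x, y):
    """Slope and F statistic of a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need equal length and at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    ss_regression = slope * sxy          # variation explained by the line
    ss_residual = syy - ss_regression    # variation left unexplained
    df1, df2 = 1, n - 2
    if ss_residual <= 0:
        return slope, float("inf"), df1, df2
    f_value = (ss_regression / df1) / (ss_residual / df2)
    return slope, f_value, df1, df2


# Hypothetical H-value amplitudes for units coded 1 (section) to 3 (group).
levels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
amplitudes = [0.10, 0.12, 0.08, 0.33, 0.36, 0.35, 0.62, 0.66, 0.70]
slope, f_value, df1, df2 = regression_f(levels, amplitudes)
print(f"slope={slope:.3f}, F({df1}, {df2})={f_value:.1f}")
```

With 85 observations this setup gives 83 residual degrees of freedom, matching the Df2 reported for Shanghai dialect; a positive slope corresponds to the positive relation described in the text.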

4.1.3 Expiratory parameters and respiratory units

The expiratory parameter data were tested for normality, and the expiratory duration, expiratory amplitude, and expiratory slope were all found to be normally distributed. In general, the characteristics of the expiratory parameters are largely consistent with those of the inspiratory parameters.

The differences are as follows. According to the means, the exhalation duration of Shanghai dialect speakers before the respiratory group is longer than that of Mandarin speakers, while the durations before the respiratory segment and respiratory section do not differ much. The exhalation amplitude of Shanghai dialect speakers is larger than that of Mandarin speakers before the respiratory group and respiratory segment, and the same before the respiratory section. The expiratory slope of Shanghai dialect speakers before the respiratory group is smaller than that of Mandarin speakers and roughly the same before the respiratory segment and respiratory section, indicating that the exhalation curve of Shanghai dialect speakers is relatively flat when reading the fable.

According to the standard deviations, both Shanghai dialect and Mandarin show relatively large standard deviations of exhalation duration for the respiratory group, respiratory segment, and respiratory section, at 0.30 or above; the data are relatively scattered, and the durations of the three levels overlap to some extent. For expiratory amplitude ($H$ value), the standard deviations are relatively small, mostly at or below 0.13 (the Mandarin respiratory group, at 0.17, is the exception), and the data are relatively concentrated. For expiratory slope, the differences between Shanghai dialect and Mandarin for the respiratory group, respiratory segment, and respiratory section are within 0.04 (inclusive), which is relatively stable. The results are shown in Table 8.

Table 8
Comparison of exhalation parameters between Shanghainese and Mandarin

Expiratory parameter	Speaker	Respiratory group		Respiratory segment		Respiratory section
		M	SD	M	SD	M	SD
Expiratory duration (s)	Shanghai dialect	2.51	0.83	1.75	0.76	0.78	0.30
	Mandarin	2.11	0.92	1.83	0.60	0.92	0.60
Expiratory amplitude ( $H$ value)	Shanghai dialect	0.64	0.13	0.36	0.09	0.12	0.05
	Mandarin	0.57	0.17	0.32	0.07	0.12	0.05
Expiratory slope	Shanghai dialect	0.28	0.10	0.24	0.09	0.15	0.07
	Mandarin	0.31	0.14	0.20	0.06	0.17	0.03

Analysis of variance (F test) was performed on the expiratory parameter data of Shanghai dialect and Mandarin, using the same method as for the inspiratory parameters. The results showed that the level of the respiratory unit had a significant effect on the expiratory parameter values: in both Shanghai dialect and Mandarin, the expiratory parameter values are positively correlated with the size of the respiratory unit.

4.2 Correspondence comparison between prosody level and respiratory unit

It is meaningless to segment a speech stream solely from the phonetic perspective. The value of phonetic segmentation lies in marking segments that relate to meaning (expressing the grammatical meaning of a sentence). The functions of prosodic segmentation can therefore be classified according to the grammatical meanings it expresses. The mean prosody and breathing data of the Shanghai dialect speakers show that one reading of the text contains 21 breathing cycles, corresponding to 9 prosodic sentences, 24 intonation phrases, 47 prosodic phrases, and 63 prosodic words. At the prosodic sentence level, the breathing process includes 7 respiratory groups, 18 respiratory segments, and 11 respiratory sections: 3 groups, 6 segments, and 5 sections during inhalation, and 4 groups, 12 segments, and 6 sections during exhalation.

“Prosodic features”, also known as “supraphonetic features” or “suprasegmental features”, are a phonological structure of language, closely related to syntactic, textual, information, and other linguistic structures. They can be divided into three main aspects, namely intonation, temporal distribution, and stress, which are realized through suprasegmental features. The suprasegmental features include pitch, intensity, and temporal characteristics, which are carried by phonemes or groups of phonemes. Prosody is a typical feature of human natural language, and many of its characteristics, such as pitch lowering, stress, and pauses, are shared across languages. The mean prosody and breathing data of the Mandarin speakers show that one reading of the text contains 21 breathing cycles, corresponding to 9 prosodic sentences, 21 intonation phrases, 45 prosodic phrases, and 63 prosodic words. At the prosodic sentence level, the breathing process includes 8 respiratory groups, 14 respiratory segments, and 12 respiratory sections: 2 groups, 4 segments, and 7 sections during inhalation, and 6 groups, 10 segments, and 5 sections during exhalation.

Regarding the correspondence between prosodic units and respiratory units in Shanghainese and Mandarin, the commonality is that respiratory groups generally appear at the beginning and end of higher-level prosodic units such as prosodic sentences and intonation phrases. Over the whole breathing cycle, prosodic sentences and intonation phrases are most closely related to respiratory groups, although respiratory groups may also appear at the transitions between words. The position of the respiratory segment is relatively flexible, and the respiratory section mostly appears at prosodic phrases and prosodic words.

4.3 Silent segment and inhalation parameters

Because the pauses between prosodic phrases are short, new breathing units are rarely generated there. This paper therefore compares the silent segments and inhalation parameters only between prosodic sentences and between intonation phrases in Shanghainese and Mandarin. The results (see Table 9) show that, in both Shanghainese and Mandarin, the silence duration, inspiratory duration, inspiratory amplitude, and inspiratory slope between prosodic sentences are all larger than those between intonation phrases. In Shanghai dialect, the silence duration and the inspiratory duration between prosodic sentences and between intonation phrases are generally smaller than in Mandarin, whereas the inspiratory amplitude and inspiratory slope are generally larger.

Table 9
Statistics of silent segments and inspiratory parameters between Shanghainese and Mandarin prosodic units

Parameter	Speaker	Prosodic sentence		Intonation phrase
		M	SD	M	SD
Silence period duration (s)	Shanghai dialect	0.69	0.13	0.41	0.12
	Mandarin	0.79	0.13	0.44	0.14
Inspiratory duration (s)	Shanghai dialect	0.80	0.23	0.58	0.11
	Mandarin	0.84	0.25	0.61	0.14
Inspiratory amplitude ( $H$ value)	Shanghai dialect	0.46	0.16	0.30	0.19
	Mandarin	0.44	0.23	0.24	0.17
Inspiratory slope	Shanghai dialect	0.58	0.17	0.41	0.22
	Mandarin	0.51	0.20	0.37	0.20

Regarding the relationship between breath reset and silent segments, wherever there is a breath reset there must be a silent segment, but a silent segment is not necessarily accompanied by a breath reset [19, 20]. In Fig. 4, the Shanghai dialect utterance (in transliteration) “When the North Wind and the Sun get together to argue about who’s the best, they can’t tell” contains an obvious silent segment in the voice signal channel, but the respiratory signal channel shows no corresponding breath reset.
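This one-way relation can be checked on annotated data by testing interval overlap. The Python sketch below is an illustration with hypothetical time intervals in seconds; the function names are invented, and using overlap of annotated intervals as the criterion for "a silent segment at a reset" is an assumption of the sketch.

```python
def overlaps(a, b):
    """True if the half-open intervals a = (start, end) and b intersect."""
    return a[0] < b[1] and b[0] < a[1]


def resets_without_silence(resets, silences):
    """Breath resets that overlap no silent segment (expected to be empty)."""
    return [r for r in resets if not any(overlaps(r, s) for s in silences)]


def silences_without_reset(resets, silences):
    """Silent segments with no breath reset (allowed by the one-way relation)."""
    return [s for s in silences if not any(overlaps(s, r) for r in resets)]


# Hypothetical annotations for one recording.
resets = [(1.0, 1.6), (5.0, 5.5)]
silences = [(0.9, 1.7), (3.0, 3.3), (4.9, 5.6)]
print(resets_without_silence(resets, silences))
print(silences_without_reset(resets, silences))
```

An empty first list together with a non-empty second list is the pattern described in the text: every reset falls in a silent segment, while some silent segments, like the one in Fig. 4, contain no reset.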

Figure 4.

Shanghai dialect voice breathing signal comparison chart I.

In read-aloud texts, the inhalation curve mostly corresponds to silent segments, but there are exceptions. In Fig. 5, the speech signal “on the road” (/lu lã/) circled at ⟁ corresponds to the rising segment of the breathing curve, indicating that the speaker produces this word while inhaling. The voice signal “thick” (//) circled at ⟂ corresponds to a flat breathing curve, which is a case of “breath-holding”. Some scholars believe that breath-holding can be regarded as a special kind of exhalation (Feng Shi et al., 2009).

(The figure shows the transliterated text of the Shanghai dialect material; the corresponding Mandarin text is to the effect that the two could not tell who was superior.)

Figure 5.

Shanghai dialect speech breathing signal comparison chart II.

(Transliterated text of the Shanghai dialect material in the figure: “There is a morning light, and a man walks by on the road (/lu lã/), wearing a thick coat.” Mandarin text, to the effect: “At this time, someone was walking along the road, wearing a heavy coat.”)

Pauses in the speech flow during reading tasks are not only due to physiological needs but are also affected by other factors. An analysis of the breathing signal in Fig. 5 suggests that the Shanghai dialect speaker of this recording is far more proficient in the text than the other speakers. This allows freer control of the text and flexible correspondences between prosodic units and breathing units, such as pronunciation during inhalation and “breath-holding” pronunciation.

B. Conrad and P. Schönle [2] held that the subject’s proficiency in the text has a certain influence on the breathing curve. Their article summarized the possible breathing cycles and types of pauses in a speech pause (see Fig. 6) and pointed out that pause type 1 (1a, 1b) accounts for 80% of pauses. Pause type 1a belongs to pronunciation during the inspiratory part.

In panel I of Fig. 6, on the first reading (curve A), the expiratory curve at A ⟀ shows a steep drop after a new inspiratory segment. On the second reading (curve B), the expiratory curve continues immediately after the steep drop (B ⟀), and the entire expiratory segment becomes longer. On the third reading (curve C), the expiratory curve drops gently with no obvious steep drop; instead, a supplementary breath is taken at 4.5 s, and speech is produced during the inhalation segment.

Figure 6.

The possible breathing cycles and types of pauses in a speech pause.

I: The breathing and speech signal graphs of a subject reading the same text aloud three times (A–C) in a row, with the text content above the speech graph. The text in A, B, and C is to the effect: “The story of a stupid donkey carrying salt. A donkey carried salt across the river and walked to the middle of the river. The river water did not pass its salt, and its buttocks felt very comfortable, soaked in the cool river water. salt until that …”. The dashed lines indicate the beginning and end of the three breathing cycles. II: Schematic diagram of possible breathing cycles during a speech pause.

5. Conclusion

Taking the reading of a fable in Shanghai dialect and Mandarin as an example, this paper gives a preliminary discussion of the interaction between discourse prosody and respiratory rhythm. For both Shanghai dialect and Mandarin, the mean value of the respiratory parameters is positively correlated with the size of the respiratory unit. There is a one-way implication between breath reset and silent segment: where there is a breath reset there must be a silent segment, but where there is a silent segment there is not necessarily a breath reset. In addition, proficiency in the text clearly increases the complexity of the breathing curve, giving rise to special breathing characteristics such as “breath-holding” pronunciation. This paper investigates the prosodic characteristics of Shanghai dialect and Mandarin from the perspective of the physiological mechanism of breathing, which opens a possibility for multimodal synthesis of dialects and offers a new method for the study of acoustic prosody.

Acknowledgments

This study was supported by the Scientific Research Incubation Program of Ningbo University of Technology (2022): Research on the Creative Transformation and Innovative Development of Chinese Dialects from the Perspective of Media Communication (No. 2022TS22).

References

1. Lieberman, Some acoustic and physiologic correlates of the breath group, Journal of the Acoustical Society of America 39 (1966), 12–18. doi: 10.1121/1.1942683.

2. Conrad and Schönle, Speech and respiration, Archives of Psychiatry and Neurological Sciences 226 (1979), 251–268. doi: 10.1007/BF00342238.

3. Tan J.J., Y.H. and Kong J.P., Breathing-reset when reading literature in Mandarin, Journal of Tsinghua University (Natural Science Edition) 48 (2008), 613–620. doi: 10.16511/j.cnki.qhdxxb.2008.s1.002.

4. Wang W.Y., Research on Breathing and Rhythm under Chinese English Learners’ Reading Task, University of Yanbian Press, 2020.

5. Ding Y.B., Research on the Breathing Rhythm of Huaer Folk Songs, Northwest University for Nationalities Press, 2018.

6. Yang, A Study on the Features of Chest and Abdominal Breathing Between Reciting and Chanting Chinese Poetry, Journal of Chinese Linguistics 43 (2015), 399–410. doi: P20181204001-201501-201812060010-201812060010-399-410.

7. Zhang J.Y., Shi and Bai X.J., Preliminary analysis of respiratory between the states of narration and reading, Nankai Linguistics 19 (2012), 56–63.

8. Y.H., Zhang J.S., Wang S.W. and Yu H.Z., Acoustic analysis of breathing signals in Tibetan news reading, Journal of Northwest University for Nationalities (Natural Science Edition) 31 (2010), 17–21. doi: 10.14084/j.cnki.cn62-1188/n.2010.02.019.

9. Yin J.D. and Kong J.P., Preliminary Study on the Relationship between Korean Breathing Rhythm and Intonation Group, in: Proceedings of the 9th National Conference on Human-Machine Speech Communication, 2007, pp. 315–320.

10. Tan J.J., Research on Breath Reset in Different Styles of Mandarin Chinese, Journal of Tsinghua University (Natural Science Edition) 4 (2008), 613–620. doi: 10.16511/j.cnki.qhdxxb.2008.s1.002.

11. Slifka, Respiratory Constraints on Speech Production at Prosodic Boundaries, Dissertation of Harvard-MIT Press, 2000.

12. Steinhauer, Electrophysiological correlates of prosody and punctuation, Brain and Language 86 (2003), 142–164. doi: 10.1016/S0093-934X(02)00542-4.

13. J.M. and Lv S.N., A discussion on the prosodic characteristics of the expressive intonation, Chinese Language and Literature 6 (2011), 540–549.

14. W.J., Wang X.Q. and Yang Y.F., Closure Positive Shifts Evoked by Different Prosodic Boundaries in Chinese Sentences, Advances in Cognitive Neurodynamics, 2008, 505–509.

15. Zheng Q.Y., Text rhythm and upper information-also on the research methods and discoveries of phonetics, Language and Linguistics 9 (2008), 659–719.

16. Xiong Z.Y., An acoustic study of the boundary features of prosodic units, Applied Linguistics 2 (2003), 116–121.

17. A.J., Acoustic representation of prosodic features in Mandarin dialogue, Chinese Language and Literature 6 (2002), 525–535.

18. Shi, Bai X.J., Zhang J.Y. and Zhu Z.H., A preliminary analysis of the rhythm of breathing discourse, Nankai Phonetic Annual Report 2 (2009), 1–11.

19. Yuan and Li A.J., The Pause in Expressive Speech, in: National Conference on Man-Machine Speech Communication, 2007, p. 7.

20. Yin Z.G., Study on the Rhythm of Reading Aloud in Mandarin Chinese, Ph.D. Dissertation of Chinese Academy of Social Sciences, 2011.
