Abstract
This paper uses the BIOPAC Systems physiological recording tool to analyze the relationship between prosody and respiration in fable reading in Shanghai dialect and Mandarin. The conclusions are as follows: 1) the mean values of the respiratory parameters are positively related to the size of the respiratory unit, and the respiration curve of Shanghai dialect, which shows small rises and falls, differs from the Mandarin curve; 2) respiration reset is related to silent segments: wherever a respiration reset occurs there must be a silent segment, while the converse does not hold; 3) in the flexible literary style of the fable, proficiency in the text can significantly increase the complexity of the respiration curve, producing special features such as “breath-holding” pronunciation.
Introduction
Lieberman’s [1] “Breath-Group theory” pointed out that the “Marked Breath-Group” is closely related to speech activities. B. Conrad and P. Schönle [2] analyzed the differences in breathing during four verbal tasks (spontaneous speech, reading, serial speech, and arithmetic) under three speech states: speaking aloud, articulating subvocally, and quiet performance by trying to exclusively “think” the tasks. They concluded that there is a breathing continuum from calm breathing to verbal breathing. Researchers at Peking University [3] were among the first to combine respiratory physiological signals with prosody research. In recent years, scholars have discussed the relationship between prosody and respiration from the perspectives of different styles or corpora [4, 5, 6, 10], different language conditions and language tasks [7, 11], and different national languages and dialects [8, 9, 12, 13, 14].
In this paper, the BIOPAC multi-channel physiological instrument is used to collect speech signals and synchronized breathing signals from speakers of Shanghai dialect and Mandarin reading a fable aloud. Statistical analysis is carried out in R (3.03) as a preliminary study of the interaction between the two.
Research methods
Speakers and experimental materials
In the experiment, 40 speakers (20 of Shanghai dialect and 20 of Mandarin) were selected. The speakers were all university students who were healthy, fluent in expression, and spoke at a moderate rate. The text is the fable “The North Wind and the Sun”. Because of differences in dialect grammar, expression habits, and other factors, the text lengths for Shanghai dialect and Mandarin differ slightly. The Mandarin text has a total of 186 syllables, and the Shanghai dialect text has a total of 171 syllables.
AcqKnowledge 4.2 four-channel signal acquisition example.
The experimental acquisition equipment includes a multi-channel physiology instrument (BIOPAC Systems MP150), a laptop computer, a sound card (Avid Mbox1.0), and a microphone (AKG). The software is AcqKnowledge 4.2, the supporting software of the multi-channel physiology instrument. AcqKnowledge is interactive acquisition and analysis software that can view, measure, analyze, and transform data in real time. Pull-down menus and dialog boxes are used to perform data collection, triggering, and analysis, and to record and analyze physiological, behavioral, and subjective response data. Its analysis automation tools save processing time on routine analyses. It can record and label data files, annotate graphs, and export multiple file formats, including AcqKnowledge graphics, Excel, MATLAB, and text. To facilitate future research, the experiment synchronously collects four channels of signals (see Fig. 1): the first channel (Ch1) is the voice signal collected by the microphone; the second channel (Ch2) is the vocal signal collected by electroglottography (EGG); the third channel (Ch3) is the thoracic breathing signal (collected by TSD201); the fourth channel (Ch4) is the abdominal breathing signal. Unless otherwise specified, the research in this paper is limited to the voice signal and the thoracic respiration signal.
The voice signal is sampled at 44,100 Hz (as shown in Praat), and the respiration signal is sampled at 2,500 Hz. After acquisition, the four-channel data are saved in .acq format, and the audio signal is then converted and saved in .wav format.
Parameter setting
In terms of prosodic units, this paper follows Zheng Qiuyu’s [15] “Hierarchical Organizational Structure of Discourse Prosody (HPG)” and divides prosodic units, from large to small, into prosodic groups (PG), intonation phrases (IPh), prosodic phrases (PPh), and prosodic words (PW).
In terms of respiratory parameters, the main data include [16, 17]: inspiratory duration, inspiratory amplitude, inspiratory slope, expiratory duration, expiratory amplitude, expiratory slope, respiratory valley and respiratory peak.
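As a rough illustration of how these parameters relate to one another, the sketch below derives them from three landmark points of a single breathing cycle: a valley, the following peak, and the next valley. It is not the AcqKnowledge procedure used in this study. The class and function names are hypothetical, and defining slope as amplitude divided by duration is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class BreathCycle:
    """Landmarks of one breathing cycle (times in seconds, levels in signal units)."""
    t_valley: float        # time of the valley where inspiration starts
    v_valley: float        # signal level at that valley
    t_peak: float          # time of the following peak
    v_peak: float          # signal level at the peak
    t_next_valley: float   # time of the valley where expiration ends
    v_next_valley: float   # signal level at that valley


def respiratory_parameters(cycle: BreathCycle) -> dict:
    """Derive duration, amplitude and slope for both phases of one cycle."""
    ti = cycle.t_peak - cycle.t_valley          # inspiratory duration
    te = cycle.t_next_valley - cycle.t_peak     # expiratory duration
    if ti <= 0 or te <= 0:
        raise ValueError("landmarks must be ordered valley < peak < next valley")
    ci = cycle.v_peak - cycle.v_valley          # inspiratory amplitude
    ce = cycle.v_peak - cycle.v_next_valley     # expiratory amplitude
    return {
        "inspiratory_duration": ti,
        "inspiratory_amplitude": ci,
        "inspiratory_slope": ci / ti,           # assumed definition: amplitude / duration
        "expiratory_duration": te,
        "expiratory_amplitude": ce,
        "expiratory_slope": ce / te,            # assumed definition: amplitude / duration
        "respiratory_valley": cycle.v_valley,
        "respiratory_peak": cycle.v_peak,
    }
```

With made-up landmarks such as `BreathCycle(0.0, 0.2, 0.5, 0.8, 3.5, 0.1)`, the function describes a short, steep inhalation followed by a long, gentle exhalation, which is the typical shape of speech breathing.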
Data extraction and analysis
Experiment of auditory discrimination of prosodic boundary
To determine the prosodic boundaries, this study first conducted an auditory discrimination experiment. The considerations behind the selection of speakers are as follows. Every speaker has their own tone and habits, yet the brain can quickly follow these variations. Moreover, while analyzing which vowels and consonants have changed, which words have been used, and how these words are combined into phrases and sentences, the brain must also keep up with and interpret these tonal changes, all within milliseconds. According to the commonly used classification, modern Chinese can be divided into seven major dialect groups: Mandarin, Cantonese, Wu, Hakka, Min, Xiang, and Gan.
The listening materials are recordings of the fable “The North Wind and the Sun” read in Shanghai dialect (20 recordings) and in Mandarin (20 recordings). The subjects are 10 students with Shanghai dialect and northern dialect backgrounds. The subjects are healthy and have normal hearing.
Experimental steps: 1. Distribute the corpus text with uniform word spacing and no punctuation, and let the subjects become familiar with it. 2. Play a recording in the subject’s native variety four times. On the first playback, the listener marks the longest perceived pauses as “////”. On the second playback, between the marks from the first pass, the listener marks the next-longest perceived pauses as “///”. On the third playback, between the marks from the second pass, the listener marks the next-longest perceived pauses as “//”. On the fourth playback, between the marks from the third pass, the listener marks the next-longest perceived pauses as “/”.
Each native subject labeled 20 recordings, giving a total of 200 results for Shanghai dialect and Mandarin. The number of prosodic units marked by each native subject on the first playback is then counted, the arithmetic mean of these counts is calculated, and the result is taken as an integer. The final prosodic unit division scheme is determined in this way.
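The averaging step can be sketched as follows. The text does not say how the mean is converted to an integer, so rounding half up is an assumption here, and the function name is hypothetical.

```python
import math


def consensus_unit_count(counts_per_listener):
    """Arithmetic mean of the listeners' unit counts, taken as an integer.

    Assumption: the mean is rounded half up; the rounding rule is not
    stated in the text.
    """
    if not counts_per_listener:
        raise ValueError("at least one listener count is required")
    mean_count = sum(counts_per_listener) / len(counts_per_listener)
    return math.floor(mean_count + 0.5)
```

For example, if two listeners mark 8 and 9 units in the same recording, the mean of 8.5 is taken as 9 under this rounding rule.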
The labeling of prosodic boundaries is done in Praat (see Fig. 2); the prosodic unit data at all levels is extracted through the Praat script (see Table 1 for an example).
Example of saving results of prosodic unit data for a Mandarin speaker (PM1)
Example of respiratory data for a Shanghainese speaker (SM1)
Praat prosodic level annotation example. (Notation: A stands for prosodic group (PG), B for intonation phrase (IPh), C for prosodic phrase (PPh), D for prosodic word (PW), and S for silent segment. The number before the letter indicates the position of the sentence in the discourse, and the number after the letter indicates the position of the prosodic unit in the sentence. For example, 1B2 indicates the second intonation phrase in the first sentence; 1BS1 indicates the first silent segment between intonation phrases in the first sentence.)
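The label scheme in this caption is regular enough to be decoded mechanically. The sketch below is an illustration only and is not the Praat script used in the study; the names `ProsodicLabel` and `parse_label` are hypothetical, and labels are assumed to consist of exactly a sentence number, one level letter, an optional S, and a position number.

```python
import re
from typing import NamedTuple

# Level letters as described in the annotation scheme.
LEVELS = {
    "A": "prosodic group (PG)",
    "B": "intonation phrase (IPh)",
    "C": "prosodic phrase (PPh)",
    "D": "prosodic word (PW)",
}

# sentence number, level letter, optional S for silent segment, position number
_LABEL = re.compile(r"(\d+)([ABCD])(S?)(\d+)")


class ProsodicLabel(NamedTuple):
    sentence: int      # position of the sentence in the discourse
    level: str         # prosodic level the label refers to
    is_silence: bool   # True if the label marks a silent segment
    position: int      # position of the unit (or silence) in the sentence


def parse_label(label: str) -> ProsodicLabel:
    """Decode a label such as '1B2' or '1BS1'."""
    match = _LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized prosodic label: {label!r}")
    sentence, letter, silence, position = match.groups()
    return ProsodicLabel(int(sentence), LEVELS[letter], silence == "S", int(position))
```

Here `parse_label("1B2")` yields the second intonation phrase of the first sentence, and `parse_label("1BS1")` yields the first silent segment between intonation phrases in the first sentence, matching the two examples in the caption.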
The absolute value of the respiration curve has obvious individual differences. In this paper, the method of respiration degree [18] is used to normalize the data.
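As an illustration only, the sketch below assumes that the normalization is a min-max rescaling of each speaker's respiration curve onto the range 0 to 1. This is a common way to remove individual differences in absolute level, but it is an assumption and may differ from the exact respiration degree formula of [18]. The function name is hypothetical.

```python
def respiration_degree(samples):
    """Rescale one speaker's respiration samples to the range [0, 1].

    Assumption: min-max rescaling over the speaker's own minimum and
    maximum; the exact definition in [18] may differ.
    """
    lowest, highest = min(samples), max(samples)
    if highest == lowest:
        raise ValueError("a flat signal cannot be normalized")
    span = highest - lowest
    return [(value - lowest) / span for value in samples]
```

After such rescaling, amplitudes from speakers with very different absolute signal levels can be compared on the same scale.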
Normalization processing formula:
In the formula,
AcqKnowledge 4.2 was used to extract the corresponding parameters of the respiratory signal, and R (3.03) was used for statistical analysis. The data examples are shown in Tables 2 and 3.
Example of respiratory data for a Shanghainese speaker (SM1)
Mean comparison of Shanghainese and Mandarin respiratory units
Example of breathing curve [7]. (Explanation: Fig. 3 is a two-dimensional plot of the respiration curve; the horizontal axis represents time, and the vertical axis represents the voltage change caused by respiration. Ti represents the duration of inspiration, Te the duration of expiration, Ci the amplitude of inspiration, Ce the amplitude of expiration, V the respiratory valley, and P the respiratory peak.)
Respiratory parameters and respiratory units
Number of breathing units
Statistics were computed on the average number of respiratory units of the 20 Shanghainese and 20 Mandarin speakers respectively, and the results are shown in Table 4.
There are considerable differences between Shanghai dialect and Mandarin in vocabulary, grammar, and pronunciation. The comparison shows that, in both Shanghainese and Mandarin, the modes of the expiratory and inspiratory curves are concentrated in the respiratory segment, which is determined by the relatively stable tidal volume (volume per breath) of human respiration. In terms of the number of respiratory groups and respiratory sections, Shanghai dialect speakers have fewer respiratory units than Mandarin speakers. To a certain extent, this indicates that the distribution of respiratory units of different levels in Shanghai dialect is relatively concentrated, that the levels of respiratory units vary relatively little, and that the breathing curve fluctuates less than in Mandarin.
Note: During calm breathing, one breath of a healthy adult takes about 6.4 seconds, and the volume of gas inhaled and exhaled each time is about 350–600 ml. This gas volume is called the tidal volume (volume per breath).
Inspiratory parameters and respiratory units
The inspiratory parameter data of the 20 Shanghainese and 20 Mandarin speakers were screened, outliers were removed, and a normality test was performed. The results are shown in Table 5.
Comparison of Shanghainese and Mandarin inspiratory parameters
In general, for both Shanghai dialect and Mandarin, the mean value of each parameter is largest for the respiratory group, followed by the respiratory segment, and smallest for the respiratory section. In terms of how concentrated the data are, the standard deviation of inspiratory duration is relatively large for every respiratory unit, indicating that the ranges of the data overlap to some extent; the inspiratory slope comes second; the standard deviation of inspiratory amplitude is relatively small, indicating that the data are relatively concentrated, that the ranges barely overlap, and that this parameter is more stable.
The difference lies in that, according to the mean statistics, in terms of inhalation duration, the overall average duration of each breathing unit of Shanghai dialect speakers is longer than that of Mandarin. Among them, the inhalation time before the respiratory group and respiratory segment is significantly longer than that in Mandarin, and the duration before the respiratory section is the same. In terms of inspiratory amplitude, the inspiratory amplitude of Shanghai dialect speakers is slightly smaller than that of Mandarin before the respiratory group and respiratory section, and slightly larger before the respiratory segment, indicating that the fluctuation range of the inspiratory curve of Shanghai dialect speakers is relatively small. In terms of inspiratory slope, the inspiratory slope of Shanghai dialect speakers is slightly smaller than that of Mandarin before the respiratory group and respiratory section, and slightly larger before the respiratory segment, indicating that the inspiratory curve of Shanghai dialect speakers reaches the peak relatively gently.
According to the standard deviation statistics, in terms of inspiratory duration, the standard deviations of respiratory groups, respiratory segments, and respiratory sections in Shanghai dialect are larger than those in Mandarin, at 0.14 or above, which indicates that the inspiratory duration data of Shanghai dialect speakers are relatively scattered and that the inspiratory durations of the three levels of breathing units overlap to some extent. The variation of inspiratory duration is generally restricted by the length of the expression content and the complexity of the verbal task. In terms of inspiratory amplitude (
Statistical table of analysis of variance of inspiratory parameters in Shanghai dialect
Each group of parameters in Shanghainese and Mandarin generally follows a normal distribution; that is, for each level of the factor, the observed values come from a simple random sample of a normal distribution. On this basis, analysis of variance (F test) was performed on the data of the two varieties in R (3.03). The results are shown in Table 7.
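For readers unfamiliar with the F test, the sketch below computes the one-way analysis of variance F statistic from first principles, with the level of the respiratory unit as the factor. The study itself ran the test in R; this Python version is only an illustration, the function name is hypothetical, and the p-value lookup is omitted.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    `groups` holds one list of observations per factor level, for example
    one list each for respiratory group, segment, and section.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

A large F value means that the differences between the levels are large relative to the variation inside each level, which is the sense in which the level of the respiratory unit has a significant effect on a parameter.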
Statistical table of variance analysis of inhalation parameters in Mandarin Chinese
Statistics of Shanghai dialect found that in terms of inspiratory amplitude (
The statistics of Mandarin Chinese found that in terms of inspiratory amplitude (
The data of expiratory parameters were tested for normal distribution, and statistics showed that the expiratory duration, expiratory amplitude, and expiratory slope were all normally distributed. In general, the characteristics of expiratory parameters are more consistent with those of inspiratory parameters.
The difference is that, according to the mean statistics, in terms of exhalation duration, the exhalation time of Shanghai dialect speakers before the respiratory group is longer than that of Mandarin speakers, while the durations before the respiratory segment and respiratory section are not much different. In terms of exhalation amplitude, the exhalation amplitude of Shanghai dialect speakers is larger than that of Mandarin speakers before the respiratory group and respiratory segment, and the amplitude before the respiratory section is the same. In terms of expiratory slope, the slope of Shanghai dialect speakers before the respiratory group is smaller than that of Mandarin speakers, and it is roughly the same before the respiratory segment and the respiratory section, indicating that the exhalation curve of Shanghai dialect speakers is relatively flat when reading the fable.
According to the standard deviation statistics, in terms of exhalation duration, both Shanghai dialect and Mandarin have relatively large standard deviations for the respiratory group, respiratory segment, and respiratory section, at 0.30 or above; the data are relatively scattered, and the durations overlap to some extent. In terms of expiratory amplitude (
Comparison of exhalation parameters between Shanghainese and Mandarin
Analysis of variance (F test) was performed on the expiratory parameter data of Shanghai dialect and Mandarin, using the same method as for the inspiratory parameters. The results showed that the level of the respiratory unit had a significant effect on expiratory parameter values: in both Shanghai dialect and Mandarin, the expiratory parameter values are positively correlated with the size of the respiratory unit.
It is meaningless to segment a speech stream solely from the phonetic perspective. The value of phonetic segmentation must be reflected by marking segments related to meaning (expressing the grammatical meaning of a sentence). Therefore, the functions of prosodic segmentation can be classified according to the different grammatical meanings it expresses. Statistics on the mean prosody and breathing data for each Shanghai dialect speaker show that in a reading corpus there are 21 breathing cycles, corresponding to 9 prosodic sentences, 24 intonation phrases, 47 prosodic phrases, and 63 prosodic words. During the breathing process, the prosodic sentence level includes 7 respiratory groups, 18 respiratory segments, and 11 respiratory sections. Of these, during inhalation the prosodic sentence level contains 3 groups, 6 segments, and 5 sections; during exhalation it contains 4 groups, 12 segments, and 6 sections.
“Prosodic features”, also known as “supraphonetic features” or “suprasegmental features”, are a phonological structure of language, closely related to syntactic and textual structures, information structures, and other linguistic structures. Prosodic features can be divided into three main aspects: intonation, temporal distribution, and stress, which are realized through suprasegmental features. The suprasegmental features include pitch, intensity, and temporal characteristics, which are carried by phonemes or groups of phonemes. Prosody is a typical feature of human natural language, with many characteristics shared across languages, such as pitch lowering, stress, and pause. Statistics on the mean prosody and breathing data for each Mandarin speaker show that in a reading corpus there are 21 breathing cycles, corresponding to 9 prosodic sentences, 21 intonation phrases, 45 prosodic phrases, and 63 prosodic words. During the breathing process, the prosodic sentence level consists of 8 respiratory groups, 14 respiratory segments, and 12 respiratory sections. Of these, during inhalation the prosodic sentence level contains 2 groups, 4 segments, and 7 sections; during exhalation it contains 6 groups, 10 segments, and 5 sections.
Regarding the correspondence between prosodic units and respiratory units in Shanghainese and Mandarin, the commonality is that respiratory groups generally appear at the beginning and end of higher-level prosodic units such as prosodic sentences and intonation phrases. In the whole breathing cycle, prosodic sentences and intonation phrases are most closely related to breathing groups; there is also the possibility of respiratory groups appearing at the turn of words. The position of the respiratory segment is relatively flexible, and the respiratory section mostly appears at the prosodic phrases and prosodic words.
Silent segment and inhalation parameters
Because the pauses between prosodic phrases are short, new breathing units are rarely generated there. This paper therefore compares silent segments and inhalation parameters only between prosodic sentences and between intonation phrases in Shanghainese and Mandarin. The results (see Table 9) show that the silent segment duration, inspiratory duration, inspiratory amplitude, and inspiratory slope between prosodic sentences are all larger than those between intonation phrases, in both Shanghainese and Mandarin. In Shanghai dialect, the silent segment duration and the inspiratory duration between prosodic sentences and between intonation phrases are generally smaller than those in Mandarin, while the inspiratory amplitude and inspiratory slope are generally larger than those in Mandarin.
Statistics of silent segments and inspiratory parameters between Shanghainese and Mandarin prosodic units
Regarding the relationship between breath reset and silent segments, there must be a silent segment wherever there is a breath reset, but there is not necessarily a breath reset wherever there is a silent segment [19, 20]. In Fig. 4, the Shanghai dialect utterance (transliteration) “When the North Wind and the Sun get together to argue about who’s the best, they can’t tell” has an obvious silent segment in the voice signal channel, but the respiratory signal channel shows no corresponding respiration reset.
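This one-way implication can be checked on annotated data with a short script. The sketch below is an illustration under simplifying assumptions: each breath reset is represented by a single time point, each silent segment by a (start, end) pair in seconds, and both function names are hypothetical.

```python
def resets_without_silence(reset_times, silent_segments):
    """Breath resets that do not fall inside any silent segment.

    An empty result is consistent with the claim that every breath reset
    is accompanied by a silent segment.
    """
    return [
        t for t in reset_times
        if not any(start <= t <= end for start, end in silent_segments)
    ]


def silences_without_reset(reset_times, silent_segments):
    """Silent segments that contain no breath reset.

    A non-empty result is allowed: silence does not imply a reset.
    """
    return [
        (start, end) for start, end in silent_segments
        if not any(start <= t <= end for t in reset_times)
    ]
```

With made-up silent segments at `(1.0, 1.4)`, `(3.2, 3.5)`, and `(6.0, 6.6)` and resets at 1.2 s and 6.3 s, the first function returns an empty list and the second returns the segment `(3.2, 3.5)`, the same pattern as the silent segment without a reset described for Fig. 4.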
Shanghai dialect voice breathing signal comparison chart I.
In read-aloud texts, the inspiratory curve mostly corresponds to silent segments, but there are exceptions. In Fig. 5, the speech signal “on the road” (/lu lã/) circled at ⟁ corresponds to a rising segment of the breathing curve, indicating that the speaker produced this word while inhaling. The speech signal “thick” (//) circled at ⟂ corresponds to a flat stretch of the breathing curve, which is a case of “breath-holding”. Some scholars believe that breath-holding can be regarded as a special kind of exhalation (Feng Shi et al., 2009).
(Transliterated content of the Shanghai dialect material in the figure; the corresponding Mandarin text means that the two cannot tell who is the better.)
Shanghai dialect speech breathing signal comparison chart II.
(Transliterated content of the Shanghai dialect text in the figure: One day, a man walks by on the road (lu lang), wearing a thick coat. Content of the Mandarin text: At this time, a traveler comes along the road. He is wearing a heavy coat.)
Pauses in the speech flow during reading tasks are not only due to physiological needs but are also affected by other factors. An analysis of the breathing signal shown in Fig. 5 suggests that the Shanghai dialect speakers of this corpus are far more proficient in the text than other speakers. This allows them to control the text more freely, giving flexible correspondences between prosodic units and breathing units, such as inspiratory pronunciation and “breath-holding” pronunciation.
B. Conrad and P. Schönle [2] believed that the subject’s proficiency in the text would have a certain influence on the breathing curve. Their article summarized the possible breathing cycles and types of pauses in a speech pause (see Fig. 6) and pointed out that pause type 1 (1a, 1b) accounts for 80% of pause types. Pause type 1a clearly belongs to pronunciation during the inspiratory part.
In panel I of Fig. 6, on the first reading (curve A), the expiratory curve at A ⟀ drops steeply after a new inspiratory segment; on the second reading (curve B), the expiratory curve continues immediately after the steep drop (B ⟀), and the entire expiratory segment becomes longer; on the third reading (curve C), the expiratory curve falls gently with no obvious steep drop afterwards; instead, a supplementary breath is taken at 4.5 s, and speech is produced during that inspiratory segment.
The possible breathing cycles and types of pauses in a speech pause.
Figure 6: The breathing and speech signal graphs of a subject reading the same text aloud three times (A–C) in a row, with the text content above the speech graph. A, B, C text content to the effect: “The story of a stupid donkey carrying salt. A donkey carried salt across the river and walked to the middle of the river. The river water did not pass its salt, and its buttocks felt very comfortable, soaked in the cool river water. salt until that …”. The dashed lines indicate the beginning and end of the three breathing cycles. II: Schematic diagram of possible breathing cycles during a speech pause.
Taking fable reading in Shanghai dialect and Mandarin as an example, this paper gives a preliminary discussion of the interaction between discourse prosody and respiratory rhythm. It shows that, in both Shanghai dialect and Mandarin, the mean value of the respiratory parameters is positively correlated with the size of the respiratory unit. There is a one-way implication between breath reset and silent segment: where there is a breath reset, there must be a silent segment, but where there is a silent segment, there is not necessarily a breath reset. In addition, proficiency in the text clearly increases the complexity of the breathing curve, producing special breathing characteristics such as “breath-holding” pronunciation. This paper investigates the prosodic characteristics of Shanghai dialect and Mandarin from the perspective of the physiological mechanism of breathing, which opens a possibility for multimodal synthesis of dialects and offers a new method for the study of acoustic prosody.
Acknowledgments
This study was supported by the Scientific Research Incubation Program of Ningbo University of Technology (2022): Research on the Creative Transformation and Innovative Development of Chinese Dialects from the Perspective of Media Communication (No. 2022TS22).
