Abstract
Objectives:
Constitutions are Traditional Chinese Medicine syndromes that are used to classify symptoms. The present study sought to identify objective acoustic features for eight commonly occurring abnormal constitutions.
Methods:
Speech signals were recorded from 281 subjects, each of whom uttered a 1-second vowel sound, /a/. For each constitution, acoustic parameters were compared between the low-score and high-score groups.
Results:
Subjects in the high-score groups for Yin-deficiency, Qi-deficiency, Phlegm-wet, Blood-stasis, and Qi-depression showed lower acoustic intensities than subjects in the corresponding low-score groups (all p<0.05). Subjects in the high-score groups of Qi-deficiency and Blood-stasis exhibited higher maximum pitches and higher minimum pitches than subjects in the low-score groups (all p<0.01). The average number of zero-crossings was lower in the high-score groups of Qi-deficiency and Blood-stasis than in the low-score groups for both constitutions (p<0.05). Subjects in the high-score group of special diathesis demonstrated higher low-spectral-energy ratios than subjects in the low-score group (p<0.05), and subjects in the high-score group of Blood-stasis had higher middle spectral energy ratios than subjects in the low-score group (p<0.05). In contrast, the middle spectral energy ratio in the high-score group of special diathesis was lower than in its corresponding low-score group (p<0.05). The high spectral energy ratios were lower in the high-score groups for Yin-deficiency and Blood-stasis (both p<0.05) than in the low-score groups.
Conclusions:
The present study identified acoustic features for constitutions and established objective methods for constitutional diagnosis. These acoustic features can potentially be applied in the expert system of Traditional Chinese Medicine for the diagnosis of constitutions in the general population.
Introduction
Since the questionnaire is the main tool for measuring constitutions, diagnostic results depend on subjective feelings. To develop objective methods of constitutional characterization, many studies have sought to connect biologic signals, including pulse waves, heart rate variability, and tongue images, to the constitutions. 7–9 Nevertheless, studies that apply modern speech analysis methods to TCM auscultation are very limited. One of the few such studies reported that acoustic parameters in patients with chronic rhinosinusitis and the Phlegm constitution differed from those in patients without the Phlegm constitution. 10 Another study found that, by analyzing four novel acoustic parameters (the average number of zero-crossings, variation in peaks and valleys, variation in formant frequencies, and high or low spectral energy ratios), patients with Qi-deficiency and Yin-deficiency can be differentiated from those with the normal constitution. 11 Since all of these previous studies involved subjects with various diseases, it is not known whether their results apply to the general population. Moreover, because previous studies focused on just two or three constitutions, the relationship between acoustic parameters and the other constitutions remains uninvestigated.
The present study sought to identify acoustic features for all eight abnormal constitutions. In addition to novel acoustic parameters and spectral energy distributions, traditional parameters (including intensity and pitch, both of which are commonly used in the practice of TCM) were also analyzed. 12 Acoustic parameters for the high-score group and low-score group of each constitution were compared to find the distinguishing features of the constitutions.
Materials and Methods
Subjects
Subjects were recruited through an advertisement at the China Medical University Hospital (CMUH) between February and December 2011. All subjects received a full explanation of the study and provided written informed consent. Subjects were included if they were more than 20 years of age and had not taken any medications in the past 3 months (to avoid medication effects). Subjects were excluded if they were pregnant or suffering from acute diseases or acute pain. The study was approved by the Institutional Review Board of CMUH.
Speech signal measurement
Subject speech signals were recorded after a 20-minute rest in a quiet room with echo suppression in the Department of Chinese Medicine of CMUH. A demonstration audio was played for the subjects, who were then asked to pronounce a sustained /a/ vowel sound for 1 second at their usual speech volume, 5 cm from a 90-degree unidirectional stereo condenser microphone (SONY ECM-MS907, Japan). The speech signals were then digitized using a sound blaster (Model no. SB1090, Creative Labs, Singapore) at a 44.1 kHz sampling rate with 16-bit resolution. Speech signals were acquired and analyzed using a data acquisition system developed under the LabVIEW (National Instruments Corporation, TX, USA) environment. 13
The acoustic parameters were measured and calculated as described in the previously cited study. 11 Briefly, to measure the average number of zero-crossings, the input utterance /a/ sound was first divided evenly into eight segments. Durations of 100 ms from the first peak of the second, fifth, and seventh segments were then extracted. The resultant portions were denoted by S1(i), S2(i), and S3(i), with 0≤i≤999. The number of zero-crossings, ZCj, for segment j was
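The extraction steps described above can be sketched as follows. This is a minimal illustration, not the authors' LabVIEW implementation: the zero-crossing rule (a sign change between consecutive nonzero samples), the peak rule (first positive local maximum), and the names `count_zero_crossings`, `first_peak_index`, and `average_zero_crossings` are all assumptions. The portion length defaults to 1000 samples to match the stated index range 0≤i≤999.

```python
from typing import Sequence


def count_zero_crossings(samples: Sequence[float]) -> int:
    """Count sign changes between consecutive nonzero samples (assumed rule)."""
    crossings = 0
    previous_sign = 0
    for value in samples:
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # an exact zero carries no sign information
        if previous_sign != 0 and sign != previous_sign:
            crossings += 1
        previous_sign = sign
    return crossings


def first_peak_index(samples: Sequence[float]) -> int:
    """Index of the first positive local maximum; 0 if none (assumed peak rule)."""
    for i in range(1, len(samples) - 1):
        if samples[i] > 0 and samples[i - 1] < samples[i] >= samples[i + 1]:
            return i
    return 0


def average_zero_crossings(
    signal: Sequence[float],
    n_segments: int = 8,
    picks: Sequence[int] = (2, 5, 7),  # second, fifth, and seventh segments
    portion_len: int = 1000,           # hypothetical: matches 0 <= i <= 999
) -> float:
    """Average the zero-crossing counts of the portions taken from the picked segments."""
    seg_len = len(signal) // n_segments
    counts = []
    for k in picks:
        start = (k - 1) * seg_len
        segment = signal[start:start + seg_len]
        offset = start + first_peak_index(segment)
        portion = signal[offset:offset + portion_len]
        counts.append(count_zero_crossings(portion))
    return sum(counts) / len(counts)
```

Calling `average_zero_crossings(samples)` on a list of amplitude values returns the mean count over the three portions; the portion length and peak rule would need to be adjusted to match the actual acquisition settings.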
Measurement of constitutional score and low-score/high-score grouping for abnormal constitutions
The constitutional scores were measured using the Nine-Constitution Scale, which has test–retest reliability scores for the nine constitutions ranging from 0.77 to 0.90 and internal consistency (Cronbach's α) ranging from 0.72 to 0.82. 6 The Nine-Constitution Scale is a self-reported questionnaire comprising 6 to 8 items for each constitution (Supplementary Table S1; Supplementary material is available online at
Statistical analysis
Statistical analyses were performed using the SPSS 18.0 statistical software package. Individual variables were examined by percentage, mean, and standard error of the mean (SEM). Differences in acoustic parameters between the high-score group and low-score group for each constitution were analyzed via independent t-test. A two-tailed p-value <0.05 was considered to be statistically significant.
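As an illustration of the comparison described above, the sketch below computes a mean with its SEM and the independent-samples t statistic. It is not the SPSS procedure itself: it assumes equal variances (Student's pooled-variance form), whereas the source does not say whether the equal-variance or unequal-variance result was reported. The function names `mean_and_sem` and `independent_t` are hypothetical. The two-tailed p-value would then be read from a t distribution with the returned degrees of freedom, which the Python standard library does not provide.

```python
import math
from statistics import mean, stdev, variance
from typing import Sequence, Tuple


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def independent_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df
```

For example, `independent_t(high_score_intensity, low_score_intensity)` (two hypothetical lists of per-subject intensities) gives a negative t when the high-score group has the lower mean.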
Results
Subject characteristics
A total of 281 participants were recruited for the study. They had a mean age of 38.0±0.8 years (range, 20–70 years), a mean height of 162.2±0.5 cm (range, 145–183 cm), a mean weight of 59.2±0.7 kg (range, 39–102 kg), and a mean body mass index of 22.4±0.2 kg/m² (range, 15.8–38.4 kg/m²). Current chronic diseases were reported by 5.7% of the subjects, and 23.5% reported surgical histories. The demographic characteristics of the enrolled subjects, including gender, education level, employment status, current diseases, and surgical history, are summarized in Table 1. Among the eight abnormal constitutions, Qi-deficiency had the highest mean score and special diathesis had the lowest mean score in these subjects. The mean score for each constitution and the number of subjects in the low-score group and high-score group for the eight abnormal constitutions are shown in Table 2.
Data were represented by the mean±standard error of the mean.
Differences in traditional acoustic parameters between low-score and high-score subjects in the abnormal constitutions
Acoustic parameters for the low-score group and high-score group of each abnormal constitution were compared. In terms of traditional acoustic parameters, subjects in the high-score groups of Yin-deficiency (p=0.044), Qi-deficiency (p=0.030), Phlegm-wet (p=0.046), Blood-stasis (p=0.001), and Qi-depression (p<0.001) had lower acoustic intensities than subjects in the corresponding low-score groups (Fig. 1A). Maximum pitches in the high-score groups of Qi-deficiency (p=0.005) and Blood-stasis (p<0.001) were higher than in the low-score groups of both constitutions (Fig. 1B). Minimum pitches were also higher in the high-score groups of Qi-deficiency (p=0.006) and Blood-stasis (p=0.002) than in the low-score groups of these constitutions (Fig. 1C).

Figure 1. Differences in traditional acoustic parameters between low-score groups and high-score groups of the abnormal constitutions.
Differences in time-domain acoustic parameters between low-score and high-score subjects in the eight abnormal constitutions
Waveforms of the whole 1-second /a/ sounds and their middle segments of 5000 points are shown in Figure 2A and B, respectively. On gross comparison, time-domain waveforms for the normal constitution and the abnormal constitutions differed in density and amplitude. Subjects in the high-score groups of Qi-deficiency (p=0.009) and Blood-stasis (p=0.023) had lower average numbers of zero-crossings than subjects in the corresponding low-score groups (Fig. 3A). Neither variation in peaks and valleys nor variation in formant frequencies differed significantly between the high-score and low-score groups of any constitution (all p>0.05) (Fig. 3B and C).

Figure 2. Representative speech waveforms of /a/ vowel sound for normal and abnormal constitutions.

Figure 3. Differences in time-domain acoustic parameters between low-score groups and high-score groups of the abnormal constitutions.
Differences in spectral energy distribution between low-score and high-score subjects of abnormal constitutions
The formant spectra for the normal and abnormal constitutions are shown in Figure 4. Subjects in the high-score group for special diathesis had a higher ratio of low-frequency energy (<800 Hz) than subjects in the low-score group (p=0.018) (Fig. 5A). The ratio of middle-frequency energy (800–3000 Hz) was higher in subjects with high scores for Blood-stasis than in subjects with low scores for Blood-stasis (p=0.030). In contrast, subjects in the high-score group for special diathesis had a lower ratio of middle-frequency energy than subjects in the corresponding low-score group (p=0.010) (Fig. 5B). The ratio of high-frequency energy (>3000 Hz) was lower in the high-score groups of Yin-deficiency (p=0.002) and Blood-stasis (p=0.016) than in the corresponding low-score groups (Fig. 5C).
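The three band ratios can be illustrated with the sketch below, which splits spectral energy at the 800 Hz and 3000 Hz limits given above. It rests on several assumptions not stated in the source: each ratio is taken as band energy divided by total energy (excluding the DC component), energy is the squared magnitude of the discrete Fourier transform, and frequencies exactly at 800 Hz or 3000 Hz are assigned to the middle band. The function name `band_energy_ratios` is hypothetical. A naive DFT is used so the example needs only the standard library; for full-length recordings an FFT routine would be the practical choice.

```python
import cmath
import math
from typing import Dict, Sequence


def band_energy_ratios(
    signal: Sequence[float],
    sample_rate: float,
    low_edge: float = 800.0,    # upper limit of the low band (Hz)
    high_edge: float = 3000.0,  # lower limit of the high band (Hz)
) -> Dict[str, float]:
    """Fraction of spectral energy in the low, middle, and high bands."""
    n = len(signal)
    energy = {"low": 0.0, "middle": 0.0, "high": 0.0}
    for k in range(1, n // 2 + 1):  # positive-frequency bins, DC skipped
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)
        )
        freq = k * sample_rate / n
        power = abs(coeff) ** 2
        if freq < low_edge:
            energy["low"] += power
        elif freq <= high_edge:
            energy["middle"] += power
        else:
            energy["high"] += power
    total = sum(energy.values())
    if total == 0:
        raise ValueError("signal has no spectral energy above DC")
    return {band: value / total for band, value in energy.items()}
```

Because the three ratios sum to 1, a rise in one band necessarily comes with a fall in at least one other, which is how shifts such as "from middle frequencies to low frequencies" show up in this representation.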

Figure 4. Representative spectra of /a/ vowel sound for the normal and abnormal constitutions. H, harmonious constitution (signals measured from subjects for whom all the abnormal constitutional scores were <40); YaD, Yang-deficiency; YiD, Yin-deficiency; QDf, Qi-deficiency; PW, Phlegm-wetness; WH, Wet-Heat; BS, Blood-stasis; SD, special diathesis; QDp, Qi-depression.

Figure 5. Differences in frequency energy ratios between low-score groups and high-score groups of the eight abnormal constitutions.
Discussion
To develop objective diagnostic methods for constitutional identification, this study attempted to identify acoustic features for the eight abnormal constitutions. Differences in acoustic parameters were found between the low-score group and high-score group for six of the abnormal constitutions, namely, Yin-deficiency, Qi-deficiency, Phlegm-wetness, Blood-stasis, special diathesis, and Qi-depression. Of the eight abnormal constitutions, Blood-stasis showed the largest number of acoustic characteristics that differed between subjects with and without the constitution, indicating that Blood-stasis may be the constitution best suited to identification by acoustic parameters. The acoustic parameters that can characterize Blood-stasis included low intensity, high maximum and minimum pitch, low average number of zero-crossings, high energy ratio in middle frequencies, and low energy ratio in high frequencies. Subjects with Qi-deficiency had the second highest number of characteristics that may distinguish them from subjects without Qi-deficiency. Those characteristics included low intensity, high maximum and minimum pitch, and low average number of zero-crossings. Subjects with Yin-deficiency had low intensity and low energy ratios in high frequencies, and those with special diathesis had high energy ratios in low frequencies but low energy ratios in middle frequencies. The Phlegm-wetness and Qi-depression constitutions could be characterized only by low intensity. Subjects with Yang-deficiency and Wet-Heat could not be differentiated from subjects without these two constitutions by any of the examined acoustic parameters.
Human speech is formed by a series of pressure waves; the airstream that powers speech is generated in the lungs and then modulated by structures within the vocal tract. The vocal folds in the larynx produce large pressure disturbances. The air waves are also altered when passing through the supraglottal vocal cavities, including the pharynx, oral cavity, and nasal cavity, whose shapes are controlled by articulators including the tongue, lips, soft palate, and mandible. 14 In addition, emotions such as anger, anxiety, joy, sadness, fear, disgust, despair, and surprise also alter speech utterances in humans. 15 The present study demonstrated that different constitutions can be characterized by different acoustic parameters, which might derive from changes in the aforementioned utterance structures and influencing factors.
Intensity is influenced by lung volume and mental status. With decreased end-inspiratory lung volume, the subglottal pressure, peak-to-peak flow amplitude, and glottal leakage tend to decrease, 16 leading to decreased speech intensity. 17 High-score groups for Yin-deficiency, Qi-deficiency, Phlegm-wet, Blood-stasis, and Qi-depression exhibited low speech intensity, physically implying that the end-inspiratory lung volume might be decreased in subjects with these abnormal constitutions. This observation might explain the weak voice in subjects with Qi-deficiency, and chest discomfort in subjects with Phlegm-wet, Blood-stasis, and Qi-depression. 18 On the other hand, since speech produced by sad individuals has lower intensity than speech associated with happiness or neutral emotion, 19 it is speculated that apart from organic abnormality, depression and anxiety might also be the cause of low-intensity speech in subjects with Qi-depression, which is characterized by both negative mental states and emotional fragility. 20
Vocal pitch is related to the vocal folds, which physiologically behave like a damped harmonic oscillator during phonation. 21 The higher the longitudinal tension, the longer the length, 22 and the drier the mucosa of the vocal folds, the higher the pitch that can be generated. 23 Moreover, patients in pain tend to raise the pitch of their vocalizations higher than normal. 24 The data in this study revealed that subjects in the high-score groups for Qi-deficiency and Blood-stasis had both higher maximum and higher minimum pitches than subjects in the corresponding low-score groups, implying that subjects with high scores for these two constitutions might have the aforementioned changes in their vocal folds. Moreover, since Blood-stasis is characterized by pain, it is speculated that the elevated vocal pitches of these subjects might also have been due to physical discomfort.
A zero-crossing is an instantaneous point at which there is no voltage present. A high number of zero-crossings indicates a high overall frequency of waveforms. 25 In the present study, subjects with Qi-deficiency and Blood-stasis had lower average numbers of zero-crossings, suggesting that the waveforms of subjects with high scores for Qi-deficiency and Blood-stasis were looser than those of subjects with low scores for these two constitutions. The present study is consistent with a previous finding that in patients with autoimmune diseases, vocalizations of those subjects with Qi-deficiency exhibit a lower average number of zero-crossings. 11 The present study extended this result to the general population and identified Blood-stasis as another constitution having a low average number of zero-crossings.
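The link between zero-crossings and frequency can be made concrete with a textbook idealization (not a formula taken from the study): a pure sinusoid of frequency f crosses zero twice per period, so over an analysis window of duration T the zero-crossing count ZC satisfies approximately

```latex
\mathrm{ZC} \approx 2 f T
\quad\Longleftrightarrow\quad
f \approx \frac{\mathrm{ZC}}{2T}
```

For example, 20 zero-crossings in a 100 ms window would correspond to a dominant frequency of roughly 100 Hz. Real vowel waveforms contain many harmonics, so the count reflects the overall frequency content rather than the fundamental alone.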
Several factors affect the distribution of spectral energy in acoustic signals. When the voice is breathy, low-frequency signal components tend to dominate the waveform and the high-frequency energy ratio is relatively decreased. 26 It has also been reported that the ratio of high-frequency energy (2000–4000 Hz) to low-frequency energy (0–1000 Hz) increases when a vowel with high nasalance is uttered. 27 The spectral energy in high frequencies also increases with short vocal folds. 28 Moreover, in a hoarse voice, the energy of noise around 1500 Hz increases, which might lead to a decrease in the ratios in the high and low frequencies. 29 This study revealed that in subjects with high scores for special diathesis, there was a spectral energy shift from middle frequencies to low frequencies, implying low nasalance and greater breathiness during utterances produced by such subjects. This low nasalance and greater breathiness might be due to nasal symptoms caused by allergic rhinitis, such as nasal obstruction and rhinorrhea. The present study also revealed an energy shift from high frequencies to middle frequencies in the high-score group for Blood-stasis, implying noise caused by hoarseness around the middle frequencies for these subjects. On the other hand, while the decrease in high-frequency ratios for subjects in the high-score group for Yin-deficiency could speculatively be attributed to low nasalance or longer vocal folds, the real cause of this change in formant energy has yet to be definitively identified.
Although the present study has yielded findings that connect acoustic parameters to the various constitutions, it has limitations. First, TCM practitioners did not examine the subjects to verify that the syndromes identified by the constitution scale matched those found by practitioners in routine clinical practice; future studies could examine this. Second, since the structures and functions of the subjects' vocal tracts and their mental states were not examined, the causes of the acoustic parameter changes associated with the constitutions can only be speculated about. Further studies are needed to verify the causes of these acoustic changes. Third, although this study identified acoustic features for six constitutions, two constitutions (Yang-deficiency and Wet-Heat) could not be distinguished by acoustic parameters. Parameters related to the other diagnostic methods in TCM, including pulse diagnosis and tongue diagnosis, should be taken into account for the objective diagnosis of these two constitutions.
Conclusions
In conclusion, the present study identified acoustic features for abnormal constitutions that are commonly used in TCM. These acoustic features might help not only in the objective diagnosis of constitutions but also in understanding the pathogenesis of TCM constitutions. The data from this study can also potentially be applied in the TCM expert system for the diagnosis of constitutions in the general population.
Footnotes
Acknowledgments
This study was commissioned by the Committee on Chinese Medicine and Pharmacy, Department of Health (CCMP100-RD-025); however, the contents of the study in no way represent the opinion of the committee.
Disclosure Statement
No competing financial interests exist.
References
