Abstract
A two stage recognition method that combines multiple kinds of features is proposed to overcome the limitation of relying on a single kind of feature in lung sound recognition. The method combines the improved Welch power spectrum, Mel frequency cepstrum coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) based on the wavelet decomposition. In the first stage, pneumonia samples and asthma samples are grouped into one abnormal category, and a two-class classifier based on random forests is trained to separate the normal samples from the abnormal samples. In the second stage, a second classifier based on random forests is trained to distinguish pneumonia from asthma among the samples classified as abnormal in the first stage. To further improve the accuracy, a multi granularity cycle segmentation method based on the short time zero crossing rate is presented, which segments lung sounds more reliably. Experimental results show that the proposed method greatly improves the recognition accuracy, especially for pneumonia and asthma.
Keywords
Introduction
Pulmonary auscultation, in which lung diseases are diagnosed by listening to lung sounds, is a common method to prevent and diagnose pulmonary diseases. Lung sounds are the sound signals produced by gas exchange in the respiratory system. They are produced during the whole respiratory process and contain rich physiological and pathological characteristics. Due to factors such as air pollution, smoking and aging of the population, the incidence of chronic obstructive pulmonary disease, asthma, lung cancer and other diseases has clearly increased at home and abroad. The traditional diagnosis of lung sounds is restricted by various conditions. The electronic stethoscope can help to improve the effect of lung sound diagnosis: the lung sound data it collects can be stored in a computer, and the lung health status of patients can be predicted by analyzing the time-frequency features of the lung sounds with software or algorithms.
Lung sounds are nonstationary random signals. To diagnose lung diseases with computers, we need to extract features from lung sounds. Many methods are available, such as time domain analysis, Fourier transform, short time Fourier transform, spectral analysis, wavelet analysis and so on. Welch power spectrum and bispectrum are commonly used spectral analysis methods. In [1], normal lung sounds and three kinds of abnormal lung sounds are recognized by extracting the statistical characteristics of the Welch power spectrum. The method in [2] is based on the bispectrum and extracts the spectral peak, the spectral peak interval and the slice energy. It can extract different features from various kinds of lung sounds and is easy to implement.
Unlike traditional analysis methods, the Hilbert Huang transform is suitable for analyzing nonlinear and nonstationary signals and is free from the constraints of linearity and stationarity [3]. In [4], crackles are effectively recognized by extracting the time and frequency distribution characteristics of the peaks of the lung sounds after the Hilbert Huang transform. Because the fractional Hilbert transform is highly sensitive to abnormal components of signals, different fractional Hilbert transforms are applied to rhonchi signals in [5], and the results are used to build a correlation function as features for detecting rhonchi. In [6], an improved Hilbert Huang transform is utilized to extract the envelope information of the respiratory sound under anesthesia, which can reflect the patient's tide and detect the breathing status of patients more accurately. In [7], three kinds of features based on the Hilbert Huang transform are proposed, namely, the instantaneous envelope of the intrinsic mode function (IMF), the instantaneous envelope of the first four layers of IMF and the maximum of the edge spectrum, and better results are acquired.
The features of lung sounds can also be extracted by linear predictive cepstrum coefficients (LPCC) [8] and Mel frequency cepstrum coefficients (MFCC) [9, 10]. A method combining LPCC with the wavelet decomposition is put forward in [11]. In that method, the wavelet coefficients are first obtained by the wavelet decomposition. Then the linear predictive cepstrum coefficients are computed by applying the LPCC algorithm to the wavelet coefficients. Finally, the feature vectors are extracted from the cepstrum coefficients, which can effectively recognize the polyphonic lung sounds and the sharp lung sounds. In [12], the features of lung sounds are extracted by MFCC and their effectiveness is validated with two classifiers. MFCC features are extracted from single channel lung sound signals in [13, 14]. In [13], Hjorth parameters, mean, standard deviation, skewness, kurtosis and entropy are derived from MFCC, and better results are obtained. In [15], a method based on the VDA-GMM model is presented to check MFCC features and HMM is used to classify lung sounds, which is suitable for recognizing COPD. In [16], statistical features are extracted from the Mel frequency cepstrum coefficients of lung sounds. Results show that the method outperforms commonly used wavelet-based features as well as standard cepstral coefficients including MFCCs.
The wavelet decomposition has been widely applied in extracting features from lung sounds. In [17], the wavelet transform is used to decompose the lung sounds into 7 levels, and statistical features extracted from the third level to the seventh level are used as the feature vectors. Lung sounds are then classified by the support vector machine and the neural network on these feature vectors. In [18], features are extracted from the wavelet coefficients by MFCC and non-Gaussian power to detect cough and burst sounds in lung sounds, which is taken as a basis for judging children’s pneumonia. In [19], the wavelet packet decomposition is used to get the energy of lung sounds in different frequency ranges, which is taken as features to recognize four kinds of lung sounds: normal, tracheitis, pneumonia and asthma. Wavelet transforms and neural networks are used to classify asthmatic breath sounds in [20].
After appropriate features are extracted, many methods can be used to recognize lung sounds, such as support vector machine (SVM), K nearest neighbor (KNN), naive Bayes (NB), Fisher discriminant, artificial neural network (ANN), convolutional neural network (CNN) and so on. Crackles and wheezing are effectively identified through the wavelet decomposition and BP neural network in [21]. BP neural network is combined with the genetic algorithm to classify lung sounds in [1]. The linear parameterization of multi-channel lung sound information and a neural network are used to classify the lung sounds in [22]. Applications of machine learning in lung disease classification, such as logistic regression and decision trees, are investigated in [23]. Multilayer perceptron is used to identify lung sounds in [24]. A two-layer pattern recognition system architecture is proposed to detect asthma wheezing in recorded children’s respiratory sounds with SVM in [25]. A spectral subband based feature extraction scheme that works with artificial neural network and support vector machine classifiers is proposed for the multichannel signal in [26]. Convolutional neural network is used to classify respiratory audio in [27, 28]. The random forest algorithm was proposed by Breiman in 2001 [29]. In this method, the decision tree is the base classifier and the bagging algorithm is used to create many decision trees. The feature subset of the training set is randomly selected by the random subspace algorithm. Finally, the result is determined by the votes of the decision trees.
In many lung sound recognition methods, a single kind of feature is extracted to classify lung sounds. However, it is difficult to fully describe lung sound information with a single kind of feature because lung sounds are complex and nonstationary signals. Meanwhile, the cycle segmentation of lung sounds plays an important role in effectively extracting features. In this paper, we propose a recognition method that combines Welch power spectrum, MFCC and LPCC based on the wavelet decomposition, and the random forest method is used to train the classification model. In addition, a multi granularity cycle segmentation method based on the short time zero crossing rate is proposed to improve the recognition accuracy, and a two stage classifier is designed, which gives better classification results.
A multi granularity cycle segmentation method of lung sounds based on short time zero crossing rate
De-noising of lung sounds
Although the electronic stethoscope can remove much of the noise, the recordings unavoidably contain some ambient noise as well as other noise such as muscle friction sounds, gastrointestinal peristalsis sounds and heart sounds. The frequency band of the heart sounds is 5∼600Hz, which highly overlaps the low frequency part of the lung sounds. Therefore, it is impossible to remove heart sounds by simple filtering without damaging lung sounds. In the paper, a hybrid de-noising technique is used. A raw lung sound signal is shown in Fig. 1. At first, the low frequency noise is removed by a high-pass filter. The result is shown in Fig. 2. Then the heart sounds are removed by the wavelet threshold method. The resulting clean lung sound is shown in Fig. 3.

A raw lung sound.

The lung sound after removing the low frequency noise.

The lung sound after removing the heart sounds.
Although the lung sound signal is a nonstationary signal, it is a periodic signal in which one respiratory process forms a cycle. Correctly segmenting the cycles of lung sounds is the key to effectively extracting features. Here, we use the short time zero crossing rate to segment the lung sounds. The short time zero crossing rate represents how often the signal in a frame crosses the horizontal axis. For a continuous signal, a zero crossing occurs when the waveform passes through the time axis. For a discrete signal, a zero crossing occurs when adjacent sampling values change sign. Let x(n) be a lung sound, y_i(n) be the ith frame and L be the frame length. The short time zero crossing rate is as follows:

$$Z(i) = \frac{1}{2} \sum_{n=1}^{L-1} \left| \mathrm{sgn}[y_i(n)] - \mathrm{sgn}[y_i(n-1)] \right|$$

where sgn[•] is the sign function, namely

$$\mathrm{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

The instance of cycle segmentation of lung sounds. (a) A raw lung sound, (b) The length of each frame is 1 second, (c) The length of each frame is 0.5 second, (d) The length of each frame is 0.25 second, (e) Segmentation results of the lung sound in (a).

Cycle segmentation results of the lung sound.
Although the lung sound is a periodic signal, the cycles are not of equal length and the boundaries between some cycles are not obvious. Some method is therefore needed to determine the cycles. To segment lung sounds quickly and accurately, we adopt a multi granularity method. The de-noised signal is first segmented into frames of one second each. The zero crossing rate is calculated and a wave trough is found, for example the point a in Fig. 4 (b). Then the frame length is shortened to 0.5 second and the zero crossing rate is computed again. The wave trough b nearest to the point a is obtained. In the same way, the wave trough c nearest to the point b is acquired by shortening the frame length to 0.25 second. The exact location of the wave trough a is then determined through b and c. This final location of the trough is taken as the boundary of the cycle. The steps are shown in Fig. 4.
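The coarse-to-fine search can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the use of non-overlapping frames, the choice of the frame centre as the trough position and the tie-breaking rule for flat troughs are all assumptions.

```python
def sign(value):
    """Sign function of the zero crossing rate: +1 for value >= 0, otherwise -1."""
    return 1 if value >= 0 else -1


def zero_crossing_rate(frame):
    """Number of sign changes between adjacent samples of one frame."""
    return sum(abs(sign(frame[n]) - sign(frame[n - 1]))
               for n in range(1, len(frame))) / 2


def frame_zcr(signal, fs, frame_seconds):
    """Return (start_sample, zcr) for consecutive non-overlapping frames."""
    length = int(fs * frame_seconds)
    return [(start, zero_crossing_rate(signal[start:start + length]))
            for start in range(0, len(signal) - length + 1, length)]


def local_troughs(values):
    """Indices that are strictly below the previous value and not above the next."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] <= values[i + 1]]


def refine_trough(signal, fs, granularities=(1.0, 0.5, 0.25)):
    """Locate a trough with long frames, then move to the nearest trough
    found at each finer granularity. Returns a sample index or None."""
    position = None
    for seconds in granularities:
        frames = frame_zcr(signal, fs, seconds)
        half = int(fs * seconds) // 2
        centres = [frames[i][0] + half
                   for i in local_troughs([z for _, z in frames])]
        if not centres:
            continue  # no trough at this granularity: keep the previous estimate
        if position is None:
            position = centres[0]  # first trough at the coarsest granularity
        else:
            position = min(centres, key=lambda c: abs(c - position))
    return position
```

With a toy signal that alternates in sign during the first and third seconds and is constant during the second, the estimate moves from the centre of the quiet one-second frame towards the start of the quiet region as the frames get shorter.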
When the lung sound is segmented with the above method, the cycles may contain some errors because the lung sound signal includes many complex components. Therefore, the cycles are post-processed after the segmentation according to real human respiration.
A cycle is deleted if it is too short or too long, and a cycle is divided into two cycles if its length is from 5 seconds to 8 seconds. During processing, the signal at both ends of the recording is removed. The segmentation results of the lung sound in Fig. 1 are shown in Fig. 5. Each box in Fig. 5 represents a cycle.
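The post-processing rule can be sketched as below. The text does not state what counts as too short or too long, so `min_len` and `keep_max` are hypothetical thresholds; `keep_max = 4.0` is chosen only because the text later says that no cycle exceeds 4 seconds. Splitting a long cycle at its midpoint is also an assumption.

```python
def clean_cycles(boundaries, min_len=1.0, keep_max=4.0,
                 split_from=5.0, split_to=8.0):
    """Turn sorted boundary times (seconds) into plausible (start, end) cycles.

    Signal before the first boundary and after the last one is dropped
    implicitly, because only intervals between boundaries are considered.
    """
    cycles = []
    for start, end in zip(boundaries, boundaries[1:]):
        length = end - start
        if min_len <= length <= keep_max:
            cycles.append((start, end))          # plausible single cycle
        elif split_from <= length <= split_to:
            middle = start + length / 2          # assumed: split at the midpoint
            cycles.extend([(start, middle), (middle, end)])
        # anything else is treated as too short or too long and discarded
    return cycles
```

For example, boundaries at 0.5, 3.5, 9.5, 10.0, 14.5 and 24.5 seconds give one kept cycle, one cycle split into two halves, and three discarded intervals.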
After segmenting the lung sounds with the above method, no cycle exceeds 4 seconds. We get 287 normal lung sound cycles, 245 pneumonia sound cycles and 170 asthma sound cycles from 36 normal lung sounds, 33 pneumonia sounds and 25 asthma sounds.
Feature extraction based on Welch power spectrum
The power spectrum represents the signal power per unit frequency, which describes the distribution of signal power in the frequency domain. The periodogram method [30] is a common method of spectral estimation. X(w) is obtained by directly performing the Fourier transform on the signal x(k), k = 1, 2, …, N. The power spectrum estimation is as follows:

$$\hat{P}(w) = \frac{1}{N} \left| X(w) \right|^2$$
Although the periodogram method is simple, the fluctuation of the spectral lines is intensified when N is too large, and the resolution is poor when N is too small. The averaged periodogram method effectively alleviates the problem. The signal is first divided into several frames and the periodogram method is executed on each frame. Then their mean is calculated as the power spectrum estimation of the signal. The Welch power spectrum further improves the averaged periodogram method by allowing partial overlap between frames when the signal is divided, which increases the number of frames and improves the estimation performance. In addition, it applies a window (Hamming window or Hanning window) to each frame, which reduces the spectral distortion caused by the rectangular window.
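The Welch procedure described above (overlapping frames, a window on each frame, averaged periodograms) can be sketched with the standard library alone. The frame length, the overlap and the omission of window power normalization are simplifying assumptions, and the naive DFT is used only to keep the example self-contained.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def periodogram(frame):
    """|DFT|^2 / N for bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def welch_psd(signal, frame_len, overlap):
    """Average the periodograms of overlapping, Hamming-windowed frames."""
    step = frame_len - overlap
    window = hamming(frame_len)
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = signal[start:start + frame_len]
        spectra.append(periodogram([x * w for x, w in zip(frame, window)]))
    return [sum(column) / len(spectra) for column in zip(*spectra)]
```

For a sinusoid with exactly 8 periods per 64-sample frame, the averaged spectrum peaks at bin 8. In practice an FFT-based routine would replace the naive DFT.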
The Welch power spectrum is not directly suitable for classification and recognition because its dimension is high. To this end, we summarize the power spectrum with statistics. We first divide the power spectrum into several frames, namely, X = {X_1, X_2, …, X_n}. Then Y_i = {y_i1, y_i2, y_i3} is obtained by analyzing each frame X_i, where y_i1, y_i2 and y_i3 are three statistics of the frame.
Feature extraction based on MFCC
MFCC simulates the auditory characteristics of human ears and has better recognition and anti-noise ability. Human hearing perception is approximately linear in frequency below 1000Hz and logarithmic in frequency above 1000Hz. The transformation formula between the actual frequency f and the Mel frequency is

$$F_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
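The Mel conversion and its inverse can be written directly from the formula. The function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Mel frequency for a frequency in Hz: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)
```

With these constants, 1000Hz maps to approximately 1000 Mel, which is the anchor point of the scale.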
A number of band-pass filters {H_m(k)} are set up in the frequency range of lung sounds, where 0 ≤ m < M and M represents the number of filters. Each filter is a triangular filter with central frequency f(m), and the filters have equal bandwidth on the Mel frequency scale. The transfer function of each band-pass filter is as follows:

$$H_m(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\ 0, & k > f(m+1) \end{cases}$$
1) Do preprocessing. A window is applied to the lung sounds to reduce leakage in the frequency domain. After the lung sound x(n) is multiplied by a Hamming window, it becomes x(m).
2) Run fast Fourier transform. To study the lung sounds in the frequency domain, we perform the fast Fourier transform on the windowed lung sounds, which gives X(k).
3) Calculate the spectral line energy. The energy of each spectral line is calculated from the results of FFT:

$$E(k) = \left| X(k) \right|^2$$

4) Compute the energy passing Mel filters. The energy spectrum is passed through the Mel filters and the energy in each filter is computed with the following formula:

$$S(m) = \sum_{k=0}^{N-1} E(k) H_m(k), \quad 0 \le m < M$$

5) Calculate DCT cepstrum. The FFT cepstrum of x(n) is

$$\hat{x}(n) = \mathrm{FT}^{-1}\left[ \ln X(k) \right]$$

According to the discrete cosine transform, DCT of the lung sound x(n) is as follows:

$$X(k) = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} C(k)\, x(n) \cos\left[ \frac{\pi (2n+1) k}{2N} \right], \quad C(k) = \begin{cases} \sqrt{2}/2, & k = 0 \\ 1, & k = 1, 2, \ldots, N-1 \end{cases}$$

From the definition of the FFT cepstrum, the cepstrum of FFT is the inverse transformation of FFT after calculating the logarithm of X(k). The cepstrum of DCT is similar: it is computed by taking the logarithm of the energy S(m) of the Mel filters obtained in step 4) and then applying DCT:

$$\mathrm{mfcc}(n) = \sqrt{\frac{2}{M}} \sum_{m=0}^{M-1} \log[S(m)] \cos\left[ \frac{\pi n (2m+1)}{2M} \right]$$
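Steps 3) to 5) can be sketched as follows, starting from a spectral line energy vector. This is an illustration under assumptions: the filter edges are mapped to FFT bins by flooring, the number of filters and the FFT size in the usage note are arbitrary, and a small floor is applied before the logarithm to avoid log(0).

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters H_m(k), equally spaced on the Mel scale.

    Returns n_filters rows, each with n_fft // 2 + 1 weights in [0, 1].
    """
    top = hz_to_mel(fs / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [min(n_fft // 2, int(math.floor(n_fft * f / fs))) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):              # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right + 1):         # falling edge, peak at centre
            row[k] = (right - k) / (right - centre) if right > centre else 1.0
        bank.append(row)
    return bank


def mfcc_from_energy(energy, bank, n_ceps=12):
    """Log filter energies followed by a DCT, for n = 1..n_ceps."""
    m_count = len(bank)
    log_s = [math.log(max(sum(e * h for e, h in zip(energy, row)), 1e-12))
             for row in bank]
    scale = math.sqrt(2.0 / m_count)
    return [scale * sum(log_s[m] * math.cos(math.pi * n * (2 * m + 1) / (2 * m_count))
                        for m in range(m_count))
            for n in range(1, n_ceps + 1)]
```

For instance, `mel_filterbank(24, 256, 4000)` builds 24 filters for a 256-point FFT at the 4000Hz sampling frequency used in the experiments, and `mfcc_from_energy` then returns 12 coefficients per frame.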
Feature extraction based on LPCC and the wavelet decomposition
The wavelet decomposition is a method combining time domain and frequency domain, which is especially suitable for analyzing and processing nonstationary random signals. The wavelet analysis of the lung sounds filters the signals with a series of successive low-pass and band-pass filters. Each filter yields wavelet coefficients with different frequency components, so the different frequency components of the lung sounds can be separated. The wavelet coefficients themselves are not suitable for analyzing lung sounds because their dimension is too high. Here, we adopt the method proposed in [11], which combines LPCC with the wavelet decomposition and is more robust.
When combining LPCC with the wavelet decomposition, the wavelet coefficients are first obtained by the wavelet decomposition. Then the linear prediction cepstrum coefficients are computed on the wavelet coefficients and taken as feature vectors.
The wavelet function and the number of decomposition layers are determined first. Here we select the db5 wavelet after comparing different wavelet functions, and the signal is decomposed into 5 layers. d1 ∼ d5 are the high frequency parts of each layer and the approximation coefficient a5 is the low frequency part of the fifth layer. According to the frequency decomposition characteristics of the wavelet decomposition, their frequency bands are respectively 1000 ∼ 2000Hz, 500 ∼ 1000Hz, 250 ∼ 500Hz, 125 ∼ 250Hz, 63 ∼ 125Hz and 0 ∼ 63Hz. The steps are as follows. The frequency range of the original signal is 0 ∼ 2000Hz. The first layer decomposition yields the low frequency wavelet coefficient a1 and the high frequency wavelet coefficient d1, whose bands are respectively 0 ∼ 1000Hz and 1000 ∼ 2000Hz. The second layer decomposes the low frequency wavelet coefficient a1 of the previous layer and yields a2 and d2, whose bands are respectively 0 ∼ 500Hz and 500 ∼ 1000Hz. The third layer decomposes a2 and yields a3 and d3, whose bands are respectively 0 ∼ 250Hz and 250 ∼ 500Hz. The fourth layer decomposes a3 and yields a4 and d4, whose bands are respectively 0 ∼ 125Hz and 125 ∼ 250Hz. The fifth layer decomposes a4 and yields a5 and d5, whose bands are respectively 0 ∼ 63Hz and 63 ∼ 125Hz.
The relationship between the wavelet coefficients and the frequency bands of each layer is shown in Table 1.
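Relationship between the wavelet coefficients and the frequency band
Layer    Low frequency coefficient    High frequency coefficient
1        a1: 0 ∼ 1000Hz               d1: 1000 ∼ 2000Hz
2        a2: 0 ∼ 500Hz                d2: 500 ∼ 1000Hz
3        a3: 0 ∼ 250Hz                d3: 250 ∼ 500Hz
4        a4: 0 ∼ 125Hz                d4: 125 ∼ 250Hz
5        a5: 0 ∼ 63Hz                 d5: 63 ∼ 125Hz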
In Table 1, a_i and d_i respectively denote the low frequency wavelet coefficient and the high frequency wavelet coefficient of the ith layer.
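The band edges follow from repeatedly halving the Nyquist frequency, which can be checked with a few lines. The function name is illustrative, and the exact value 62.5Hz is what the text rounds to 63Hz.

```python
def dwt_bands(fs, levels):
    """Nominal frequency bands (Hz) of d1..dN and aN for a dyadic decomposition."""
    bands = {}
    high = fs / 2.0                      # Nyquist frequency of the original signal
    for i in range(1, levels + 1):
        low = high / 2.0
        bands[f"d{i}"] = (low, high)     # detail coefficients keep the upper half
        high = low                       # approximation is decomposed further
    bands[f"a{levels}"] = (0.0, high)
    return bands
```

Calling `dwt_bands(4000, 5)` reproduces the bands listed above for the 4000Hz sampling frequency. These are nominal bands; real wavelet filters such as db5 overlap somewhat at the edges.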
The high frequency wavelet coefficients d_i of the 2nd to 5th layers are arranged by frequency band from low to high, which covers the bandwidth of the whole signal. The signal can be represented with 4 wavelet coefficients.
The different frequency components of the signals are separated through the wavelet decomposition, and each frequency component corresponds to a wavelet coefficient. The characteristics of the different frequency components of the lung sounds can therefore be obtained by extracting features from the wavelet coefficients. The feature L = [l1, l2, l3, l4] is acquired by extracting LPCC from the wavelet coefficients of each layer. Here, each l_i contains 12 attributes.
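A minimal sketch of computing LPCC from one coefficient sequence is given below. It uses the autocorrelation method with the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion. The prediction order of 12 is an assumption made to match the 12 attributes per layer, and the wavelet coefficients are assumed to have been computed elsewhere.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a_1..a_p with x[n] ~ sum_k a_k x[n-k].

    Returns (coefficients, residual_energy).
    """
    a = [0.0] * (order + 1)              # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= (1.0 - k * k)
    return a[1:], error


def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n of the all-pole model 1 / (1 - sum a_k z^-k)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def lpcc(coefficients, order=12):
    """LPCC of one wavelet coefficient sequence (order is an assumed setting)."""
    a, _ = levinson_durbin(autocorrelation(coefficients, order), order)
    return lpc_to_cepstrum(a, order)
```

Applying `lpcc` to each of the four selected wavelet coefficient sequences would give four vectors of 12 values, which matches the shape of L = [l1, l2, l3, l4].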
Feature combination
Each kind of feature has a different sensitivity to different classes of lung sounds, which is also confirmed by the experiments in section 5. It is difficult to fully describe the information of the lung sounds with a single kind of feature. To improve the recognition accuracy, three kinds of features are combined to describe the lung sounds. In experiments, we extract 36 Welch power spectrum features W = [w1, w2, …, w36], 12 MFCC features M = [m1, m2, …, m12] and 48 LPCC features based on the wavelet decomposition L = [l1, l2, l3, l4], where each element includes 12 attributes. The three kinds of features are combined by series connection, namely, all attributes of each kind of feature are taken as attributes of the combined feature. Thus we get a feature vector F = [W, M, L], which contains 96 attributes. The lung sounds can be described more adequately by combining these features, which improves the accuracy.
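Series connection is plain concatenation, as the short sketch below shows. The function name and the shape checks are illustrative additions.

```python
def combine_features(welch, mfcc, lpcc_wd):
    """Concatenate W (36 values), M (12 values) and L (4 x 12 values) into F."""
    if len(welch) != 36 or len(mfcc) != 12:
        raise ValueError("expected 36 Welch and 12 MFCC attributes")
    if len(lpcc_wd) != 4 or any(len(layer) != 12 for layer in lpcc_wd):
        raise ValueError("expected 4 LPCC vectors of 12 attributes each")
    flattened = [value for layer in lpcc_wd for value in layer]
    return list(welch) + list(mfcc) + flattened
```

The result has 36 + 12 + 48 = 96 attributes, ordered as F = [W, M, L].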
Two stage lung sound recognition model based on random forests
Lung sound recognition model based on random forests
To diagnose lung diseases from lung sounds, a recognition model based on random forests is proposed, which is shown in Fig. 6. In the model, the low frequency noise and the heart sounds in the lung sounds are first removed. Then the multi granularity cycle segmentation method based on the short time zero crossing rate is used to divide the lung sounds into cycles. Next, Welch power spectrum features, MFCC features and LPCC features based on the wavelet decomposition are extracted. Finally, the three kinds of features are combined by series connection and the category of the lung sounds is recognized with random forests.
Results from the above model are shown in Table 2. The total accuracy is 77.71%. However, the recognition accuracy of asthma is relatively low.

Lung sound recognition model based on random forests.
Results from the model based on random forests (%)
In Table 2, the accuracy of asthma is relatively low, which lowers the total accuracy. Into which classes are the asthma samples misclassified? The answer can be found in the confusion matrix of the classification results in Table 2, which is shown in Table 3.
Confusion matrix
From the confusion matrix, we can see that most of the misclassified asthma samples (72.5%) are classified as pneumonia. We therefore compare the similarity between pneumonia and asthma in the frequency domain. The time domain signals are converted into the frequency domain by the fast Fourier transform. The frequency-amplitude spectra of the three kinds of lung sounds are shown in Fig. 7.
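The share of misclassified samples of one class that fall into another class can be read from a confusion matrix as sketched below. The helper names are illustrative and the labels in the usage note are toy values, not the data of Table 3.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def misclassified_share(matrix, classes, true_class, into_class):
    """Fraction of the misclassified samples of true_class predicted as into_class."""
    row = matrix[classes.index(true_class)]
    wrong = sum(row) - row[classes.index(true_class)]
    return row[classes.index(into_class)] / wrong if wrong else 0.0
```

For example, if 3 of 4 toy asthma samples are misclassified and 2 of those are predicted as pneumonia, `misclassified_share(..., "asthma", "pneumonia")` returns 2/3.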

Frequency domain diagram of three kinds of lung sound signals (a) Asthma, (b) Pneumonia, (c) Normal.
By comparing the spectra of the lung sounds, we find that the spectrum energy of both pneumonia and asthma is concentrated in 0 ∼ 400Hz, while the spectrum energy of normal samples is distinct from that of the other two classes. Therefore, to improve the accuracy, pneumonia and asthma can be treated as one large class and the normal lung sounds as another. When we recognize the lung sounds, the samples are first classified into two categories: normal and abnormal. Then the abnormal samples are further divided into pneumonia and asthma. The two stage recognition model is shown in Fig. 8.

Two stage lung sound recognition model.
In the model, samples are first divided into normal lung sounds and abnormal lung sounds. The first stage random forest model is constructed from the training set and the class labels. Then the samples recognized as abnormal by the first model are fed into the second random forest model, which identifies them as pneumonia or asthma. Experimental results in section 5 show that the recognition accuracy is effectively improved with the two classification models, especially the accuracy of asthma.
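The control flow of the two stage model can be sketched independently of the classifier. In the sketch below the two classifiers are passed in as plain callables, so a trained random forest (or any other model) could be plugged in; the label strings and function names are assumptions.

```python
NORMAL, ABNORMAL = "normal", "abnormal"


def make_stage_training_sets(samples, labels):
    """Stage 1 sees normal vs abnormal; stage 2 sees only pneumonia and asthma."""
    stage1 = [(s, NORMAL if y == NORMAL else ABNORMAL)
              for s, y in zip(samples, labels)]
    stage2 = [(s, y) for s, y in zip(samples, labels) if y != NORMAL]
    return stage1, stage2


def two_stage_predict(sample, stage1_classifier, stage2_classifier):
    """Return 'normal', or the disease predicted by the second stage."""
    if stage1_classifier(sample) == NORMAL:
        return NORMAL
    return stage2_classifier(sample)
```

Note that a normal sample wrongly sent to the second stage can only be labelled pneumonia or asthma, so first stage errors on normal samples propagate to the final result.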
Experimental data
The data were collected by professional doctors using a 3M 3200 electronic stethoscope. The normal lung sounds were collected from volunteers and the abnormal lung sounds were collected from patients with specific diseases in the hospital. The frequency band of lung sounds is 100 ∼ 2000Hz and the sampling frequency is 4000Hz. We collected 30 seconds of lung sound from each individual under natural respiration. 36 typical normal lung sounds, 33 typical pneumonia sounds and 25 typical asthma sounds were selected from the data, which formed a lung sound data set consisting of 94 samples.
In experiments, the noise and the heart sounds in the lung sounds are first removed by the techniques introduced in subsection 2.1. Then the lung sounds are segmented by the multi granularity cycle segmentation based on the short time zero crossing rate. We acquire 285 normal lung sound cycles, 245 pneumonia sound cycles and 170 asthma sound cycles. Finally, three kinds of features are extracted for each cycle and the category of the lung sound is identified.
Experiments include 4 parts: recognition based on a single kind of feature, recognition on combined features, comparison with other classifiers and comparison with other methods. In experiments, 5 fold cross validation is adopted, where four-fifths of the samples are taken as the training set and one-fifth as the testing set. The mean over the 5 folds is reported.
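A 5 fold split of the cycle indices can be generated as below. The shuffling seed and the function name are assumptions, and the text does not say whether the split is stratified or grouped by recording, so neither is done here.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)       # fixed seed for repeatability
    folds = [order[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

With the 700 cycles listed above (285 + 245 + 170), each fold holds 140 test cycles and 560 training cycles, and the reported accuracy is the mean over the five test folds.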
In experiments, the improved Welch power spectrum is written as IWPS and LPCC based on the wavelet decomposition is written as LPCC-WD.
Recognition based on single kind of features
To compare the effects of different kinds of features on the classification results, IWPS, MFCC and LPCC-WD are extracted with the methods introduced in section 3, and the classification effect of each single kind of feature is tested. At first, each kind of feature is normalized. The features are directly classified into three categories by random forests. The results are shown in Table 4. Next the two stage model is used to classify the samples into three classes. The results from the first stage model are shown in Table 5. The abnormal samples recognized in the first stage are classified by the second stage model. The results are shown in Table 6.
Accuracy of directly classifying three classes (%)
Accuracy of two classes (%)
Accuracy of the two stage model (%)
From the experimental results, the accuracy of the direct classification by random forests on IWPS features is slightly higher than that of the two stage model, but the two stage model improves the accuracy of pneumonia and asthma. On the other two kinds of features, the accuracy of direct classification by random forests is slightly lower than that of the two stage model, and the two stage model again improves the accuracy of pneumonia and asthma. In the two stage model, the recognition accuracy of the abnormal samples in the first stage is higher than that of the normal samples, and the recognition accuracy of asthma is improved by classifying the abnormal samples with the second model. In general, however, the accuracy of asthma is still lower than that of normal samples and pneumonia samples.
Meanwhile, the accuracy of pneumonia from the two stage model is the highest on MFCC features, and the accuracy of asthma from the two stage model is the highest on IWPS features. This shows that each kind of feature has a different sensitivity to different lung sound classes.
Recognition on combination features
Because a single kind of feature cannot fully describe the information of lung sounds, we test the effect of different combinations of features on the recognition results. At first, each pair of feature kinds is combined and the combined features are classified with the two stage model. Then all three kinds of features are combined and classified with the two stage model. The results are shown in Table 7. The results from the first model in the classification process are shown in Table 8. In every combination, features are combined by series connection.
Results from combining different kind of features (%)
Results of the normal and the abnormal (%)
Experimental results show that the accuracy of the normal lung sound samples is improved by combining various kinds of features. The accuracy of pneumonia from combined features is close to that from a single kind of feature. The accuracy of asthma is greatly improved by combining IWPS features with MFCC features. The total accuracy is improved both by combining IWPS features with MFCC features and by combining all three kinds of features.
From the results of the first stage, the total accuracy of combining three kinds of features is the highest, and so is the accuracy of the abnormal class. Meanwhile, the recognition accuracy of asthma is effectively improved by combining IWPS with MFCC. Therefore, in the two stage model, three kinds of features are combined for the first random forest model, and IWPS and MFCC are combined for the second random forest model to classify the abnormal samples recognized in the first stage. The results are shown in Table 9. The comparison of the various feature combinations is given in Fig. 9.
Results from combining three kinds of features in the first stage and combining two kinds of features in the second stage (%)

Comparison of various feature combining means.
Experimental results show that different kinds of features have different sensitivity to different classes of lung sounds. The accuracy can be improved by adopting a different feature combination for each of the two stages. In the two stage model, better results are gained by combining three kinds of features in the first stage and combining IWPS with MFCC in the second stage. The accuracy of asthma is greatly improved while the accuracy of the normal and pneumonia classes is also improved.
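Selecting the stage-specific features from one combined vector can be sketched as follows, assuming the 96 attributes are stored in the order IWPS (36), MFCC (12), LPCC-WD (48). The function name and the ordering are assumptions.

```python
N_IWPS, N_MFCC, N_LPCC_WD = 36, 12, 48


def stage_features(combined, stage):
    """Stage 1 uses all three kinds of features; stage 2 uses IWPS and MFCC only."""
    if len(combined) != N_IWPS + N_MFCC + N_LPCC_WD:
        raise ValueError("expected a 96-attribute combined feature vector")
    if stage == 1:
        return list(combined)
    if stage == 2:
        return list(combined[:N_IWPS + N_MFCC])
    raise ValueError("stage must be 1 or 2")
```

The first stage classifier is then trained on 96 attributes and the second stage classifier on the first 48.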
Comparison with other classifiers
In this section, we compare the proposed model with other classifiers on the same data set. The other classifiers are random forests, KNN, SVM and BP network. In experiments, the two stage model combines three kinds of features in the first stage and two kinds of features in the second stage, while the other four classifiers use the combination of three kinds of features. The results are shown in Table 10.
Comparison with other classifiers (%)
From the results, the accuracy of the normal samples from the proposed model is slightly lower than that from random forests, KNN and SVM. However, the total accuracy and the accuracy of asthma and pneumonia are much higher than those from the other four classifiers. Experimental results show that the two stage model greatly improves the recognition accuracy of lung sounds.
Comparison with other methods
To further verify the effectiveness of the proposed model, we compare it with other methods on the same data set. The methods are respectively proposed in [1, 11, 16, 18, 19, 27, 28]. These methods respectively extract Welch power spectrum [1], LPCC based on the wavelet decomposition [11], statistical features of MFCC [16], wavelet features [18] and wavelet packet features [19] to recognize the lung sounds. Convolutional neural network is used to classify lung sounds in [27, 28]. Results are given in Table 11.
Comparison with other methods (%)
Experiments show that the results from the proposed method are clearly superior to those from the other methods, and the recognition accuracy of asthma and pneumonia is greatly improved. Among the compared methods, CNN gives the highest accuracy, including for asthma, but its accuracy is still lower than that of the proposed method. This is mainly due to the insufficient sample size of lung sounds, which leads to over-fitting while training the network model.
Conclusion
Lung sounds can be used to diagnose lung diseases, which plays an important role in the prevention and treatment of lung diseases. Different kinds of features have different sensitivity to different classes of lung sounds. To improve the recognition accuracy, we combine various kinds of features to classify lung sounds. The lung sound recognition process includes two stages. In the first stage, the lung sounds are divided into normal samples and abnormal samples by combining the improved Welch power spectrum, MFCC and LPCC based on the wavelet decomposition. In the second stage, the abnormal samples are identified as pneumonia or asthma by combining the improved Welch power spectrum with MFCC. The proposed model improves the accuracy, especially the accuracy of asthma and pneumonia. Next we will collect more lung sound samples and further improve the recognition accuracy of lung sounds, and we will apply the two stage model to assist real lung disease diagnosis.
Acknowledgments
This work was supported in part by a grant from NSF of Hebei province of China (No. F2017202145).
