Abstract
Least Mean Square (LMS) and Normalized-Least Mean Square (NLMS) algorithms are very popular and frequently used algorithms for noise cancellation in speech. But selecting the step size for updating the weight of adaptive filter is the big issue in LMS and NLMS algorithms. So as to meet disagreeing requirements of quick convergence and less MSE, step size needs to be correctly controlled. Along with step size, length of adaptive filter also plays major role in the effective noise cancellation. These two factors greatly affect the performance of the ANC. To get the best possible solution, a variety of trials of filter length and step size are required. The main motivation behind the development of proposed High Performance Self Tuning (HPST) adaptive filter algorithm is to adaptively determine the step size. The selection of length of adaptive filter is based on the distance between two microphones in the ANC system. The proposed algorithm works very well, as shown in the experiments which are carried out on NOIZEUS speech corpus as well as actually recorded noisy speech signals. Results indicate that proposed algorithm is superior to referred algorithms in terms of Mean- Square- Error (MSE), Peak- Signal to Noise ratio (PSNR), convergence time and complexity.
Introduction
In Acoustic Noise Cancellation (ANC) adaptive filters are widely used. Many algorithms were proposed in the context of the ANC, but the workhorses are LMS and NLMS algorithms because of easy implementation and reasonable computationalcomplexity.
The step size μ and filter length M greatly affect the performance of the LMS and NLMS algorithms. A smaller step size μ results into more convergence time, less MSE. A large value of μ causes the algorithm to diverge which degrades the performance of adaptive filters. Hence selection of step size μ is required to be done so as to have a tradeoff between convergence time and MSE which is a difficult task. Lots of trials of the simulation are required to reach optimum step size.
Another practical issue is the selection of tap length of the filter. With the increase in filter length, convergence time of the filter and MSE increase. Hence filter of shorter length should be chosen. But to model real system long filters are required. Unfortunately, selecting number of taps of filter are largely a matter of experience and trial and error.
Selection of step size of the algorithm and length of adaptive filters, to get best possible noise cancellation is difficult in LMS and NLMS algorithms and needs various trials of the simulation.
Though these algorithms are widely used for the noise cancellation purpose, researchers tend to improve the performance of the ANC by enhancing and modifying the algorithm. Researchers have proposed many variants of variable step size algorithms in the past decades [2–12] in which several parameters are required to be tuned for better performance.
The tap length of a filter is very important parameter in deciding the performance of the ANC. Literature shows that shorter filters converge quickly than longer ones. But to model real systems, longer filters are necessary [9]. But with the increase in length of the filter, convergence time of filter increases. Christian Schuldt et al. [9] designed an algorithm in which length of a filter is adaptively determined. The author has analyzed the effect of too short and too long filters.
Recently a work has been done in the area of selection of filter tap length [7] in which author has proposed an FSSFL-NLMS algorithm which makes use of the ratio of the energy spectral density of noisy speech and noise for selecting a step size. It is observed that the results are improved for shorter lengths. In this algorithm value of step size was fixed and was decided based on experimentation.
A lot of work has been done in the development of adaptive filter algorithms for improvement of performance of the ANC. Many algorithms are based on tuning of different parameters which are not easy to control in real time. Hence it is required to design an algorithm which runs without requiring additional parameters.
This paper suggests a non parametric algorithm in which, filter length as well as selection of step size is done adaptively and with improvement in all performance parameters. The algorithm is tested for all speech sentences in NOIZEUS database. These sentences are added with eight real world noises with 0 dB, 5 dB, 10 dB, 15 dB SNRs. The algorithm is also tested for actually recorded noisy signals for three different noises.
Experimentation and results show that proposed HPST algorithm performs very well than LMS, NLMS and FSSFL-NLMS algorithms. The quality of recovered speech is evaluated by parameter metrics such PSNR, MSE, convergence time, misadjustment and computational complexity.
A proposed ANC system with two microphones is shown in Fig. 1 where d (n) is the noisy speech signal and forms primary input to ANC. It is represented as d(n) = s(n) + x’(n) and x(n) is the noise signal which forms the reference input to ANC.
The traditional ANC system in the literature consists of only reference signal as an input to the adaptive filter which is noise signal. Hence the weight updating of adaptive filter is dependent on energy of noise signal.
In the proposed ANC system, additional input is given to the adaptive filter which is noisy signal d(n) and is shown by dotted line. Proposed algorithm uses this d(n) signal to find PSNR. This ratio is used to choose the step size for updating the weights.
The following assumptions are made for the analysis of a system. There exists a high correlation between noise signal x (n) and noise present in desired speech signal x’(n) and no correlation with speech part of the signal s (n). Two microphones are isolated to avoid the speech being added to noise reference. Noise signal r (n) is fed to the filter input directly. Hence r (n) = x (n).
Adaptive algorithms
LMS algorithm [1]
Consider a transversal filter with the input X (n) with filter length M at the sampling point n.
X (n) is a column vector, subscript n is used as time index. And weight vector W (n) is
Filter output at time instant n is y (n) and given by
Error signal e (n) is given by
Substituting Equations (3) in (4) yields
Thus the cost function MSE is used to optimize the filter design and denoted by J
An estimate of gradient vector ∇J is obtained as
Substituting estimate of the above equation for gradient vector ∇J (n) in the steepest descent algorithm described by
e (n) = d (n) - W T (n) X (n), we get a new recursive relation for updating the tap vector
Equation (8) is the weight update equation for LMS filter.
The weight update equation for LMS filter is modified as
In order to overcome the problem of selection filter length (M) and the step size (μ) in NLMS algorithm, the author [7] has proposed FSSFL-NLMS algorithm. The initial value of μ is 0.01 and M = 32. Both are constant. Then in subsequent iterations step size is modified as explained by algorithm.
The algorithm uses the ratio R of ESD of noisy speech signal and the reference noise signal of length equal to filter length (M). The value of this ratio is used to update the weight vector.
Parameters:
M = filter length
μ= step size
E1=ESD of noise signal
Initialization:
w (0)= 0, M = 32, ɛ= 0.01
Computation: For n = 1, 2, 3, ... M
α= 20
For n = M+1, ...
α= 1
Value of α is chosen 20 for first M iterations considering that MSE is more during initial convergence and then after it reduces. Then for remaining iterations α is chosen to be 1.
In FSSFL-NLMS algorithm, filter length M = 32 and step size μ= 0.01. These values were chosen based on the experimentation. But the aim here is to design the algorithm which adaptively determines step size and need not be specified by the designer. Filter length is chosen based on the distance between two microphones as explained in Section 3 and it is constant throughout the experimentation. Difficult task of selection of step size and filter tap length in NLMS algorithm, is overcome by a proposed HPST algorithm which adaptively tunes itself to the appropriate step size. The HPST algorithm is summarized below. The variables that are used to calculate the step size formula of HPST algorithm are α and β. Here α represents the standard deviation of noise vector X of length M and β represents the sum of SNR values noisy signal d(n) and noise signal x(n) of length M during each iteration. Selection of filter length M is explained in the Section 3.
Parameters:
M = filter length
μ= step size (computed by the algorithm)
Initialization:
w (0) = 0
Computation:
For n = 1, 2, 3, ...
For i = 1 to M
End for
End for
Here x (n) is noise signal. snr (i) is the instantaneous ratio of noisy signal d (i) and noise signal x (i) at the ith instant of time in dB. snr (n) is average of all instantaneous snr values of length M.
Approximately first 1000 samples are required for adaptive filter to tune itself to the changing input. If we use 8KHz sampling frequency, then the time for around 1000 samples is 125 ms which is negligible as compared with the total signal length. At the beginning, i.e., during the silence period, only the noise is present. Hence, to calculate the step size standard deviation of noise signal is considered. It is observed from Equation (16), that initial value of the step size up to 1000 samples are quite higher than remaining samples. After that when noisy speech occurs, the step size is calculated by taking the ratio of noisy speech and noise as expressed in Equation (16). The proposed algorithm calculates the Step size μ on its own using this snr (n) i.e. algorithm tunes itself to the appropriate step size. The designer need not have to specify the step size value which is a difficult task.
Ideally, when the filter begins to converge, step size should be large so that weight vector w (n) moves speedily towards the optimum solution. Then, as the filter converges towards the optimum solution, step size should be decreased so that excess MSE is decreased. The proposed algorithm works exactly as in the ideal case. From Equation (16) we see that μ (n) is adjusted automatically and is totally dependent on noisy signal d (n) and noise signal x (n).
We define the weight error vector
Such that C (n) represents the errors in the estimates of the optimum coefficients at time n. If x (n) and d (n) are jointly WSS then
The steepest descent algorithm becomes
Subtracting W
opt
from both sides
In terms of weight error vector Equation (20) becomes
To further simplify these matrix equations, R can be described by eigenvalue decomposition as
Λ= a diagonal matrix containing eigenvaluesof R.
Through simple manipulations of Equation (18), we can express weight vector W (n) as
The following assumptions are made for the analysis of the system. Noise signal r (n) is fed to the filter input directly. Hence r (n) = x (n). There exists a high correlation between noise signal x (n) and noise present in desired speech signal x’ n) and no correlation with speech part of the signal s (n).
Here for the analysis, microphones in mobile handset are considered where the distance between reference and primary microphone is 10 cms. Speed of sound in air is 343.2 m/s. Hence time required for noise to travel from reference microphone to primary microphone is calculated directly using, time (T) = distance/speed. A sampling frequency of speech and noise signal is 8 KHz. Thus the delay in terms of samples from one to other microphone in time (T) is calculated. For 10 cms of distance between two microphones, the delay is about 2 samples. In addition to delay in samples, 10% fall in the noise intensity (leakage) at the primary microphone isconsidered.
Thus the relation between x (n) and x’ (n) is given by
Coherence of noise go on reducing with increasing separation between microphones which hampers the effective noise removal [11].
In this section, we present our experimentation to evaluate the performance of HPST adaptive filter algorithm and reference algorithms. Subjective as well as objective tests were performed to check the quality of recovered speech at the ANC output. LMS, NLMS and FSSFL-NLMS algorithms are used as reference algorithms and are implemented according to reference papers.
Objective quality tests
Convergence time
Speed of convergence decreases as filter length M increases. There are two reasons for this behavior. First is the ratio of largest to smallest eigenvalues λ
max
/λmin increases as filter length increases. As can be seen from Equation (28), large ratio leads to a larger disparity in the values of (1 - μλ
k
) for different k and due to which it becomes difficult to choose a good step size to obtain faster convergence. Second as the length of the filter increases, step size decreases and in turn convergence increases as shown by Equation (30), where σx2 is noise signalvariance.
This parameter indicates closeness of algorithm to the optimum value of mean square error and indicated by m.
where λ is eigenvalue of autocorrelation matrix R.
Where τ
mse
is average time constant for the algorithm.
Equations (31) and (33) indicate that misadjustment increases linearly with filter length M and step size μ.
Computational complexity of the algorithm is decided by a number of mathematical operations such as addition, subtraction, multiplication and division required in one iteration of weight update and is represented as O (M) where M is filter length. As filter length increases, complexity of algorithm increases as well as convergence time increases.
Peak signal to noise ratio (PSNR)
This metric is used to measure the quality of reconstructed speech. Higher PSNR indicates that the reconstruction is of higher quality.
MSE indicates the difference between reconstructed output and clean speech signal.
A listening test with 25 persons was conducted for subjective evaluation. Two randomly chosen noisy speech sentences from NOIZEUS database were used. All these noisy speeches were enhanced by the ANC system using different algorithms and were used for the listening test. The subjects were asked to grade the recovered speech signals on a quality scale between 0 and 5, 5 being the best score i.e. Mean Opinion Score (MOS). Grading was based on identification of speech quality and intelligibility.
Simulation
Database
NOIZEUS speech corpus is used for the evaluation of algorithms. It contains 30 speech sentences spoken by three female and three male members. The speech sentences produced are added with eight different real world noises at four different SNRs 0 dB, 5 dB, 10 dB and 15 dB. Thus 960 noisy speech signals and 30 clean signals are used for experimentation. Using Noisy and clean speech signal, noise x(n) is generated. Thus 960 noise signals are made available for experimentation. From x (n), x’(n) is generated as explained in Section 3.
The algorithms are also evaluated by actually recorded noisy signals with 8 KHz of sampling and 16 bit quantization. Three noisy environments are considered e.g. temple, fish market and oil industry. 5 female and 5 male speakers read out the sentences. Recording is done at two different SNRs 0 dB and 10 dB. Thus 60 noisy sentences and 60 noise signals are used. The recording was carried out in.wav format.
Results
In this section performance of proposed HPST adaptive filter algorithm is shown and compared with those, achieved by state of the art adaptive algorithms for the ANC. For simulation, a workstation with OS Windows 10 on Core i5 processor, 2.2 GHz speed and 12 GB RAM is used. The simulation is done using MATLAB.
In order to check the validity of the proposed algorithm, first experimentation is carried out for the various filter lengths (M = 2, 4, 8, 16, 32, 64, 128 and 256) and performance of the ANC is studied. As the distance between two microphones increases, delay in terms of samples increases. Figure 2 shows curves for different delays (2, 4, 8, 16, 32 and 64 samples) for Airport 0 dB noise signal for speech sentence SP 01. For a specific delay of x samples, it is observed that the filter length should be 2X for maximum PSNR. This means that for the delay 2, filter length M should be 4, for the delay of 4 filter length M should be 8 and so on. It is also observed that PSNR is maximum for the delay 2 (i.e. M = 4) as compared to other higher delays. This can be justified by Equation (30 and 33) which indicate that as adaptive filter length increases, step size decreases and in turn convergence time increases. Also misadjustment increases with the increase in filter length. Hence the performance of ANC system is better for shorter filter lengths as compared to longer ones.
When an ANC system is designed, it is assumed that the distance between two microphones is known and fixed. Accordingly the filter length is chosen which is kept constant throughout our experimentation. Here M = 4 is chosen as delay between two microphones is 2 samples.
Table 1 lists the performance of four different algorithms for noisy speech (SP-01) signal contaminated by three different noises i.e. Babble, Car and Airport. Speech sentence SP-01 is “The birch canoe slid on the smooth planks”.
Experimentation is done for all types of noise with 0 dB, 5 dB, 10 dB and 15 dB SNR. Due to space constraints results of only three types of noise are shown. Concurring to the results, the proposed HPST algorithm outperforms other three and in many cases the improvement is significant. The results demonstrate that the performance is consistent in removing the noise in speech sentences made by different genders.
Table 1 shows a PSNR comparison for only one sentence in database out of 30 sentences. Figure 3shows an average PSNR comparison for LMS, NLMS, FSSFL-NLMS and HPST algorithms for four different, i.e. Airport, Babble, Car and Train noises for 0 dB SNR. In NOIZEUS database each noise is added to all 30 sentences. Average of PSNRs of all 30 recovered speech sentences is calculated and used for comparison. Due to space constraints remaining results are not presented here.
For LMS μ= 0.2, M = 4; NLMS μ= 0.01, M = 4 and FSSFL-NLMS algorithm μ= 0.02, M = 32 are used for the simulation. But HPST algorithm finds a step size adaptively on its own using a self tuning feature. Figure 3 shows simulation for various delays. It is observed that PSNR is maximum for the delay 2 than other values.
It can be noticed from Fig. 3 that proposed HPST algorithm is clearly outperforming other algorithms in terms of PSNR.
Figure 4 shows comparison of recovered and original speech signal for NOIZEUS database and Fig. 5 shows comparison for actually recorded signals. Red color represents recovered speech signal and green color represents original clean speechsignal.
For LMS algorithm μ= 0.2, M = 4; NLMS algorithm μ= 0.01, M = 4 and FSSFL-NLMS algorithm μ= 0.02, M = 32 are chosen after various trials of simulation. On the contrary, in HPST algorithm, step size μ is calculated by the algorithm itself and need no experimentation to find the optimum value.
It is clearly seen from Figs. 4 and 5 that initial error represented by exponentially decreasing red color patch, is minimum for HPST algorithm for NOIZEUS database as well as recorded database. Also, it can be noticed that both original and recovered signals are almost identical. Hence PSNR is improved to a great extent and is highest amongst three algorithms.
MSE plots from Fig. 6 indicate that, MSE of HPST algorithm is significantly less than other three algorithms. It should be noted that these MSE plots for LMS, NLMS and FSSFL-NLMS algorithms are derived for a specific value of μ and M at which we get superior performance. These values are decided after trials of experimentation.
For LMS algorithm μ= 0.2, M = 4; NLMS algorithm μ= 0.01, M = 4 and FSSFL-NLMS algorithm μ= 0.02, M = 32 are selected. But it should be highlighted that MSE for HPST algorithm exhibits minimum MSE with the advantage that this algorithm do not require any step size value μ and filter length M to be specified by the designer.
Table 2 shows comparison of four algorithms for Temple and Fish market noisy signals. These are actually recorded signals in the noisy environments of Temple, Fish market and Oil Industry at 0 dB and 10 dB SNR levels. It is observed that PSNR is improved significantly in HPST algorithm than LMS, NLMS and FSSFL-NLMS algorithm. Misadjustment is the excess MSE present in the filter output above min MSE. Misadjustment is minimum in HPST algorithm which indicate that the algorithm is closer to optimum solution than other three. Convergence time taken by HPST algorithm is minimum amongst all four. It is expressed in terms of iterations.
Table 3 summarizes the computational complexity of four algorithms. It is observed that the computational complexity of HPST algorithm is slightly more than NLMS algorithm, but with the additional advantage of self tuning and very good performance.
Table 4 shows Mean Opinion Score of three different sentences for SP-03 Station 5 dB, SP-05 Train 10 dB and SP-10 Airport 15 dB to evaluate the quality of recovered speech and is quite good as can be seen from their values.
Conclusion
To achieve a proper compromise between rapid convergence and low misadjustment, selection of correct step size is very important. This is one of the challenging situations in ANC. In this paper, a new HPST adaptive filter algorithm for noise removal is proposed. The major advantage of this algorithm is that, it is a self tuning algorithm and does not require initial step size value and filter length from the designer. It provides a solution to the difficult task of selection of these values as in LMS and NLMS algorithms. From rigorous experimental analysis and testing of NOIZEUS database and actual recorded noisy data, we conclude that HPST algorithm exhibits superior performance than reference algorithms in terms of PSNR, misadjustment and convergence time with almost same computational complexity. The proposed algorithm can also be used for other databases and all real time applications of ANC.
