Optimal preconditioned regularization of least mean squares algorithm for robust online learning

Abstract

Despite its low computational cost and steady-state behavior, the least mean squares (LMS) algorithm has two well-known drawbacks: a slow rate of convergence and unstable behaviour when the autocorrelation matrix of the input signals is ill-conditioned. Several modified algorithms offer better convergence speed; however, most of them are expensive in computational cost and time, and they sometimes deviate from the optimal Wiener solution, which biases the solution of the online estimation problem. In this paper, the inverse Cholesky factor of the input autocorrelation matrix is optimized to pre-whiten the input signals and improve the robustness of the LMS algorithm. Furthermore, to obtain an unbiased solution, the mean squares deviation (MSD) is minimized by improving convergence in misalignment. This is done by regularizing the step-size adaptively in each iteration, which leads to a highly efficient optimal preconditioned regularized LMS (OPRLMS) algorithm with adaptive step-size. The OPRLMS algorithm is compared with other LMS-based algorithms on unknown system identification and on noise cancelation from an ECG signal; in both cases the results favor the proposed algorithm over the other LMS variants.

Keywords

Optimal Cholesky factor, regularization, variable step-size, preconditioning

1 Introduction

The least mean squares (LMS) algorithm is an iterative solver of the online estimation problem that adaptively minimises the mean squares error (MSE) to generate a steady-state solution of the problem. The conventional LMS algorithm does not perform well when the eigenvalue spread (condition number) of the input signal’s autocorrelation matrix is high [1, 2]. A family of preconditioned LMS algorithms based on Newton’s method offers fast convergence, but at high computational cost [3–5]. Moreover, recent studies have shown that these algorithms may have poor misalignment and hence deviate from the optimal solution [6]. Variable step-size methods offer a good remedy for obtaining an unbiased LMS solution [7], but such methods may slow down the convergence of the algorithm [8].

The deterministic approach, on the other hand, offers direct methods for the least squares (LS) solution of the problem. The recursive least-squares (RLS) algorithms are fast deterministic methods; however, this family of algorithms suffers from numerical instabilities and high computational cost [9]. The numerical instability is overcome by QRD-RLS based techniques, which employ stable orthogonalization tools of numerical linear algebra to find the Cholesky factor or inverse Cholesky factor of the input data correlation matrix, and use it to obtain a direct solution of the adaptive filtering problem [10, 11]. A notable feature of these direct methods is the Toeplitz structure of the input signal’s correlation matrix, which can be exploited to regularize the LMS algorithm. Regularization methods play an important role in improving the performance of iterative solvers of inverse problems [12]. Recently, S. Cipolla et al. [13] have shown that regularized preconditioners for Toeplitz matrices can be optimized at a very low cost. In this paper, the inverse QRD-RLS technique is optimized for known statistics of input signals to design a regularization matrix that improves the characteristics of the input signals. The Toeplitz structure of the input signal’s correlation matrix makes it easy to regularize preconditioned LMS algorithms optimally. The input tap-vector is regularized by the optimal inverse Cholesky factor of the autocorrelation matrix of a stationary stochastic process, such as an autoregressive (AR) process. The regularized input signals have low eigenvalue spread, which ensures fast convergence of the proposed method with nominal additional computational complexity. Furthermore, since the regularization matrix is optimized from the inverse Cholesky factor, the proposed optimal preconditioned regularized LMS (OPRLMS) algorithm has numerical stability similar to that of the inverse QRD-RLS algorithm.
Additionally, to obtain an unbiased solution without compromising robustness, computational cost or mean squares deviation, an adaptive step-size is used in the development of the OPRLMS algorithm. The convergence criterion of the LMS algorithm sets an optimal value of the step-size parameter that gives the minimum mean squares deviation, which is a measure of convergence in misalignment. However, the step-size parameter is bounded by the spectral power of the input signals, and its optimal value cannot be determined in advance because the statistics of the input signals are not always known a priori. This problem is controlled to a great extent by the use of a variable step-size or, more precisely, an adaptive step-size [8]. The major focus of this paper is on stationary stochastic environments, and therefore the adaptive step-size of the proposed OPRLMS algorithm is realized through an optimal power of the input signal. It is adaptive in the sense that its value is optimized according to the statistics of the input signals. The proposed algorithm has linear complexity, like the conventional LMS algorithm, but is much faster and exhibits the minimum mean square deviation from the optimal solution when compared with other modifications of the LMS algorithm. These characteristics make it highly suitable for applications with highly correlated input signals of known statistics.

This paper is organized as follows. A brief summary of the conventional LMS algorithm is given in section 2. In section 3, a Cholesky factor based preconditioning technique is described, and the main idea of optimizing it to form an optimal preconditioned regularization is presented in section 3.1. The highly efficient optimal preconditioned regularized LMS (OPRLMS) algorithm is developed in section 4. The experimental results of section 5 compare the performance of the proposed algorithm in unknown system identification and in PLI noise cancelation from a real ECG signal. Both applications favor the proposed algorithm over the rest, and indicate that it is highly efficient for applications with stationary environments.

2 A review of the LMS Algorithm

The least mean squares (LMS) algorithm is an adaptive filtering method for solving the online estimation problem that is modeled by $A_{n} x_{n} \approx b_{n},$ (1) where A_n is an n × N (n ⪢ N) data matrix of rank r ≤ N, formed by a sequence of input signals $\{θ (n)\}_{n = 1}^{\infty}$ . The vector $b_{n} \in R^{n}$ consists of desired signals, and $x_{n} = [x_{n} (0), \dots, x_{n} (N - 1)]^{T} \in R^{N}$ is the tap-weight vector of length N. The objective is to predict the desired signal s (n) with an online learning algorithm at time n, by the estimated output signal $y (n) = \sum_{i = 0}^{N - 1} x_{n} (i) θ (n - i) = x_{n}^{T} Θ_{n} = Θ_{n}^{T} x_{n},$ (2) where $Θ_{n} = {[\begin{matrix} θ (n) & θ (n - 1) & \dots & θ (n - N + 1) \end{matrix}]}^{T},$ (3) is the input vector at instant n, so that the error incurred is given by $e (n) = s (n) - y (n) .$ (4) This process requires an adaptive algorithm that updates the filter tap-weight vector x_n recursively as each new signal comes in. The LMS algorithm does so by the update equation $x_{n + 1} = x_{n} + 2 μ e (n) Θ_{n},$ (5) where μ is the step-size parameter, which controls the convergence speed and stability of the algorithm. The autocorrelation matrix $Φ = E \{Θ_{n} Θ_{n}^{T}\}$ statistically determines the spectral bounds on μ that control the convergence behavior of the algorithm. For each n, the characteristics of the input signal vector Θ_n determine the deviation of the filter tap weights x_n from the optimal Wiener solution x_o, and therefore this deviation can be taken as a function of the input signal’s autocorrelation matrix Φ. Past studies [1, 14] show that the power of the input signal appears along the diagonal of Φ; the eigenvalue decomposition of Φ is therefore a very useful indicator of the properties of the input signal, and hence of the performance of the algorithm. These eigenvalues can be used to analyze the stability of LMS based algorithms.
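The recursion (2)-(5) can be illustrated with a minimal NumPy sketch. The function name `lms_identify`, the test system `x_true`, the white input and the value μ = 0.01 are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def lms_identify(theta, s, N, mu):
    """Run the LMS recursion (5) on input theta and desired signal s."""
    x = np.zeros(N)                               # tap-weight vector x_n
    errors = np.zeros(len(theta))
    for n in range(N - 1, len(theta)):
        # Input vector [theta(n), ..., theta(n-N+1)] as in (3)
        Theta_n = theta[n - N + 1:n + 1][::-1]
        y = x @ Theta_n                           # output, eq. (2)
        e = s[n] - y                              # error, eq. (4)
        x = x + 2 * mu * e * Theta_n              # update, eq. (5)
        errors[n] = e
    return x, errors

rng = np.random.default_rng(0)
x_true = np.array([0.5, -0.3, 0.2, 0.1])          # hypothetical unknown system
theta = rng.standard_normal(5000)                 # white input, so Phi is close to I
s = np.convolve(theta, x_true)[:len(theta)]       # noise-free desired signal
x_hat, _ = lms_identify(theta, s, N=4, mu=0.01)
```

With a white, noise-free input the estimate approaches `x_true`; for correlated inputs the same code converges more slowly, which is the behaviour analyzed next.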

Let ϒ = Diag (γ₁, γ₂, …, γ_N) be the diagonal matrix having the eigenvalues γ₁ ≤ … ≤ γ_N of Φ as its diagonal entries. Then the corresponding eigenvectors h₁, h₂, …, h_N form the columns of an orthogonal matrix H such that an eigenvalue decomposition of Φ is $Φ = H ϒ H^{T} .$ (6) Furthermore, since Φ^-1 = H ϒ^-1 H^T, the condition number (or eigenvalue spread) χ_Φ of Φ is $χ_{Φ} = \frac{γ_{\max}}{γ_{\min}} .$ (7) Hence the eigenvalue spread of the correlation matrix Φ depends on its maximum and minimum eigenvalues only. The eigenvalue spread of Φ, given by (7), determines the convergence speed of the algorithm, and the algorithm exhibits its fastest convergence for χ_Φ ≈ 1. If the misalignment m_n is defined in terms of the weight error vector as $m_{n} = x_{n} - x_{o},$ (8) then the adaptive tap weights x_n, generated from an appropriate initial estimate, will settle in a close neighborhood of the optimal Wiener solution x_o if the statistical mean of m_n converges to zero and its variance is stable. Equations (5) and (8) yield $E \{m_{n + 1}\} = (I - 2 μ Φ) E \{m_{n}\} .$ (9) This shows that E {m_n} follows a geometric progression, which converges to zero if the convergence parameter μ satisfies the condition $μ < \frac{1}{2 γ_{j}}; 1 \leq j \leq N .$ (10) It yields $μ < \frac{1}{2 γ_{\max}},$ (11) where γ_max is the largest eigenvalue of Φ. Since the value of γ_max depends on the characteristics of the input signal, and is large for highly correlated inputs, the stability of the algorithm depends on choosing a μ that satisfies (11). The stability of the LMS algorithm determines its adaptation speed, and an appropriate value of μ is required to make the algorithm convergent. The analysis of [1] has shown that although the LMS algorithm is robust and computationally linear, with O (N) complexity, its rate of convergence is highly sensitive to the eigenvalue spread of the input signal’s autocorrelation matrix. This drawback has led researchers to look for faster algorithms that are less dependent on the correlation properties of the input signals.
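As a rough numerical illustration of (7) and (11), the sketch below computes the eigenvalue spread and the step-size bound for an example correlation matrix. The matrix with entries 0.9^|i-j|, the size N = 6 and the helper name `spread_and_step_bound` are assumptions chosen for the example.

```python
import numpy as np

def spread_and_step_bound(Phi):
    """Return the eigenvalue spread (7) and the step-size bound (11)."""
    gammas = np.linalg.eigvalsh(Phi)       # real eigenvalues, ascending order
    chi = gammas[-1] / gammas[0]           # gamma_max / gamma_min
    mu_max = 1.0 / (2.0 * gammas[-1])      # bound on mu from (11)
    return chi, mu_max

# Hypothetical highly correlated input: Phi[i, j] = 0.9 ** |i - j|
alpha, N = 0.9, 6
idx = np.arange(N)
Phi = alpha ** np.abs(idx[:, None] - idx[None, :])

chi, mu_max = spread_and_step_bound(Phi)

# For any mu below the bound, I - 2*mu*Phi in (9) is a contraction
mu = 0.9 * mu_max
rho = np.max(np.abs(np.linalg.eigvalsh(np.eye(N) - 2 * mu * Phi)))
```

The slowest mode of (9) contracts by a factor 1 - 2μγ_min per iteration, so a large spread forces slow convergence even at the largest admissible μ.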

The LMS-Newton algorithm is an ideally preconditioned algorithm that is able to yield the optimal Wiener solution in a single step in exact arithmetic [2]. In this algorithm the input vector Θ_n, in the correction term 2 μ e (n) Θ_n of (5), is preconditioned by the inverse Φ^-1 of the input signal’s autocorrelation matrix Φ. The modified update equation is: $x_{n + 1} = x_{n} + 2 μ e (n) Φ^{- 1} Θ_{n} .$ (12) However, LMS-Newton is not suitable for practical purposes, because the exact autocorrelation matrix and its inverse are unavailable. Nevertheless, a regularized LMS-Newton method can be designed by approximating the correlation matrix Φ by Φ_n in (12),

$x_{n + 1} = x_{n} + 2 μ e (n) Φ_{n}^{- 1} Θ_{n} .$ (13) The normalized LMS (NLMS) algorithm is a regularized algorithm of this type [14]. The correction term of the NLMS algorithm, at instant n + 1, is preconditioned by the spectral power of the input vector Θ_n at instant n. Since $Θ_{n} Θ_{n}^{T}$ has rank one, it has at most one nonzero eigenvalue, which is given by $∥ Θ_{n} ∥_{2}^{2} = Θ_{n}^{T} Θ_{n}$ and is in fact the spectral power of Θ_n. This power can be very close to zero for numerically small values of the input signals, in which case the algorithm may become unstable. To avoid division by zero, it is good practice to choose $(ψ I + Θ_{n} Θ_{n}^{T})^{- 1}$ as a regularized inverse of $Θ_{n} Θ_{n}^{T}$ , where ψ ≈ 0 is a small positive constant. The update equation of the NLMS algorithm is given by:

$x_{n + 1} = x_{n} + μ e (n) (ψ I + Θ_{n} Θ_{n}^{T})^{- 1} Θ_{n} .$ (14) NLMS has fast convergence speed, with preconditioned input vector ${\bar{u}}_{n} = (ψ I + Θ_{n} Θ_{n}^{T})^{- 1} Θ_{n}$ .
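The matrix inverse in (14) never has to be formed. Because $(ψ I + Θ_{n} Θ_{n}^{T}) Θ_{n} = (ψ + Θ_{n}^{T} Θ_{n}) Θ_{n}$, the preconditioned vector reduces to Θ_n divided by the scalar ψ + Θ_n^T Θ_n. The sketch below checks this identity numerically; the function name `nlms_direction` and the values of ψ and Θ_n are illustrative assumptions.

```python
import numpy as np

def nlms_direction(Theta_n, psi=1e-6):
    """Preconditioned input of (14) computed in O(N) via the scalar form."""
    return Theta_n / (psi + Theta_n @ Theta_n)

rng = np.random.default_rng(1)
Theta_n = rng.standard_normal(5)       # hypothetical input vector
psi = 1e-3                             # hypothetical regularization constant

# Explicit form: solve (psi*I + Theta Theta^T) u = Theta
explicit = np.linalg.solve(psi * np.eye(5) + np.outer(Theta_n, Theta_n), Theta_n)
fast = nlms_direction(Theta_n, psi)
```

Both routes give the same vector, which is why NLMS is usually described as LMS with a step-size normalized by the input power.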

3 The optimal preconditioned regularization

If the data matrix A_n is ill-conditioned, then iterative methods do not converge to the optimal Wiener solution fast enough. One reason is the high eigenvalue spread of the input signal’s autocorrelation matrix, and the remedy is an efficient preconditioner. Consider the normal equation associated with the adaptive LS problem of the form (1): $A_{n}^{T} A_{n} x_{n} = A_{n}^{T} b_{n} .$ (15) Since Φ is a symmetric and positive definite matrix, a Cholesky factorization Φ = C^T C exists for an upper triangular matrix C, and its approximation C_n is a Cholesky factor of $Φ_{n} = A_{n}^{T} A_{n}$ . Then $C_{n}^{- T}$ can be used as a preconditioner for the normal equation (15) such that $\begin{matrix} (C_{n}^{- T} A_{n}^{T} A_{n} C_{n}^{- 1}) C_{n} x_{n} = C_{n}^{- T} A_{n}^{T} b_{n}, \end{matrix}$ or

${\hat{Φ}}_{n} {\hat{x}}_{n} = {\hat{p}}_{n},$ (16) where $\begin{matrix} {\hat{Φ}}_{n} = C_{n}^{- T} A_{n}^{T} A_{n} C_{n}^{- 1} = (A_{n} C_{n}^{- 1})^{T} (A_{n} C_{n}^{- 1}), \end{matrix}$ $\begin{matrix} {\hat{x}}_{n} = C_{n} x_{n}, \end{matrix}$ $\begin{matrix} {\hat{p}}_{n} = C_{n}^{- T} A_{n}^{T} b_{n} = (A_{n} C_{n}^{- 1})^{T} b_{n} . \end{matrix}$ The following observation, adapted from [15], shows the ability of a factorization preconditioner $C_{n}^{- T}$ to reduce the eigenvalue spread of the input signal’s correlation matrix.

Observation 3.1. For all n, eigenvalue spread of ${\hat{Φ}}_{n}$ remains smaller than or equal to the eigenvalue spread of Φ_n.

Proof. Since Φ_n = H ϒ H^T, an EVD for the preconditioned correlation matrix can be ${\hat{Φ}}_{n} = \hat{H} \hat{ϒ} {\hat{H}}^{T}$ , where $\hat{ϒ} = Diag ({\hat{γ}}_{1}, {\hat{γ}}_{2}, \dots, {\hat{γ}}_{N})$ is the diagonal matrix of eigenvalues of ${\hat{Φ}}_{n}$ . With ${\hat{Φ}}_{n} = C_{n}^{- T} Φ_{n} C_{n}^{- 1}$ , $\begin{matrix} \hat{ϒ} & = {\hat{H}}^{T} {\hat{Φ}}_{n} \hat{H} = {\hat{H}}^{T} C_{n}^{- T} Φ_{n} C_{n}^{- 1} \hat{H} \\ & = ({\hat{H}}^{T} C_{n}^{- T} H) ϒ ({\hat{H}}^{T} C_{n}^{- T} H)^{T} = G ϒ G^{T}, \end{matrix}$ where $G = {\hat{H}}^{T} C_{n}^{- T} H$ . If c_jk is the (j, k)th element of G, then the jth eigenvalue of ${\hat{Φ}}_{n}$ is ${\hat{γ}}_{j} = \sum_{k = 1}^{N} c_{jk}^{2} γ_{k}; j = 1, \dots, N$ . Furthermore, since the eigenvalues of both Φ_n and ${\hat{Φ}}_{n}$ are real, $\begin{matrix} 0 \leq γ_{\min} \sum_{k = 1}^{N} c_{jk}^{2} \leq {\hat{γ}}_{\min} \leq {\hat{γ}}_{\max} \leq γ_{\max} \sum_{k = 1}^{N} c_{jk}^{2} . \end{matrix}$ Hence $\frac{{\hat{γ}}_{\max}}{{\hat{γ}}_{\min}} \leq \frac{γ_{\max}}{γ_{\min}}$ .□

3.1 Main idea

The optimal preconditioner (OP) of the proposed regularized method is realized by exploiting a priori knowledge of the statistics of the input signals. Since a Cholesky factorization exists for every symmetric positive definite matrix, and the autocorrelation matrix Φ is symmetric and positive definite (or semi-definite in some rare cases), assume that a Cholesky factorization of Φ is possible. Then its Cholesky factor can be approximated by the Cholesky factor C_n of an approximated autocorrelation matrix Φ_n. The main idea is presented in the following theorem.

Theorem 3.2. If $\hat{Φ} = E [{\hat{u}}_{n} {\hat{u}}_{n}^{T}]$ denotes the autocorrelation matrix of preconditioned input ${\hat{u}}_{n}$ , then $\hat{Φ}$ is asymptotically equivalent to the identity matrix.

Proof. For $Φ_{n} = C_{n}^{T} C_{n}$ , (13) can be written as $\begin{matrix} x_{n + 1} = x_{n} + 2 μ e (n) C_{n}^{- 1} C_{n}^{- T} Θ_{n} . \end{matrix}$ Multiplication by C_n gives

$C_{n} x_{n + 1} = C_{n} x_{n} + 2 μ e (n) C_{n}^{- T} Θ_{n},$ (17) or,

${\hat{x}}_{n + 1} = {\hat{x}}_{n} + 2 μ e (n) {\hat{u}}_{n},$ (18) The last two equations yield the preconditioned input vector of LMS-Newton algorithm as

${\hat{u}}_{n} = C_{n}^{- T} Θ_{n} .$ (19) Using the value of ${\hat{u}}_{n}$ from equation (19) in $\hat{Φ} = E [{\hat{u}}_{n} {\hat{u}}_{n}^{T}]$ , we have $\begin{matrix} \hat{Φ} = E [{\hat{u}}_{n} {\hat{u}}_{n}^{T}] = C_{n}^{- T} E [Θ_{n} Θ_{n}^{T}] C_{n}^{- 1} = C_{n}^{- T} Φ C_{n}^{- 1} . \end{matrix}$ As n → ∞, Φ approaches its optimal value Φ_∞, so that $\begin{matrix} ∥ C_{n}^{- T} Φ C_{n}^{- 1} - I ∥ \to ∥ C_{\infty}^{- T} Φ_{\infty} C_{\infty}^{- 1} - I ∥ \approx 0, \end{matrix}$ $\Rightarrow C_{n}^{- T} Φ C_{n}^{- 1} \to I$ , which proves the asymptotic equivalence of $\hat{Φ}$ to the identity matrix.□

Thus fast convergence can be achieved for input vectors whose autocorrelation matrix Φ_∞ is close to $Φ_{n} = C_{n}^{T} C_{n}$ , in which case the eigenvalue spread of the matrix $C_{n}^{- T} Φ_{\infty} C_{n}^{- 1}$ is close to 1. However, computing the exact Cholesky factor is expensive, which makes it a poor choice of preconditioner. Since the regularized inverse of the approximated correlation matrix in the NLMS algorithm turns out to be a scalar matrix, it is possible to obtain an optimal Cholesky factor of the approximated correlation matrix by using a fixed upper triangular matrix C_∞ that is a fairly good approximation of the Cholesky factor of Φ for all n greater than a large positive integer M. From a priori knowledge of the correlation properties of the input signals, it is possible to choose an appropriate Cholesky factor C_∞ such that direct computation of a sparse $C_{\infty}^{- 1}$ is not expensive.

3.2 Formation of optimal preconditioner (OP)

To understand the concept of optimal preconditioner, a first order autoregressive (AR) input process θ (n) = a₁θ (n-1) + ν (n) of the type described in [16] is considered.

Here a₁ is a real number, and ν (n) is a white noise process of zero mean. The autocorrelation matrix T_N of a stochastic input process can be formed from the autocorrelation function of the AR process, r_n (j) = E [θ (n) θ (n-j)] for j = 0, 1, …, N-1, such that $T_{N} = (\begin{matrix} r_{n} (0) & r_{n} (1) & \dots & r_{n} (N - 1) \\ r_{n} (1) & r_{n} (0) & \dots & r_{n} (N - 2) \\ ⋮ & ⋮ & ⋱ & ⋮ \\ r_{n} (N - 1) & r_{n} (N - 2) & \dots & r_{n} (0) \end{matrix}) .$ (20) If the variance of the noise process ν (n) is one, then the AR process θ (n) can be taken as white noise passed through an all-pole filter, whose transfer function is:

$H (z) = \frac{\sqrt{1 - α_{n}^{2}}}{1 - α_{n} z^{- 1}},$ (21) where |α_n| < 1, α_n is the pole of the filter, and controls the spectral power of the input signals. Since spectral power of input signals is measured by the eigenvalue spread of autocorrelation matrix, therefore α_n can be measured directly from the autocorrelation function r (j) as

$r_{n} (j) = E [θ (n) θ (n - j)] = c_{1} α_{n}^{j} .$ (22) Here c₁ is a nonzero constant, and determines the power of the input process. Once an appropriate value of the correlation parameter α_n is computed from (22) for an incoming process θ (n), its autocorrelation matrix can be computed from (20) as $T_{N} (α_{n}) = (\begin{matrix} 1 & α_{n} & α_{n}^{2} & \dots & α_{n}^{N - 1} \\ α_{n} & 1 & α_{n} & \dots & α_{n}^{N - 2} \\ α_{n}^{2} & α_{n} & 1 & \dots & α_{n}^{N - 3} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ α_{n}^{N - 1} & α_{n}^{N - 2} & α_{n}^{N - 3} & \dots & 1 \end{matrix}) .$ (23) It is a symmetric Toeplitz matrix having Cholesky factor C_N (α_n), and inverse Cholesky factor C_N (α_n) ^-1 as: $C_{N} (α_{n}) = (\begin{matrix} 1 & α_{n} & α_{n}^{2} & ... & α_{n}^{N - 1} \\ 0 & \sqrt{1 - α_{n}^{2}} & α_{n} \sqrt{1 - α_{n}^{2}} & ... & α_{n}^{N - 2} \sqrt{1 - α_{n}^{2}} \\ 0 & 0 & \sqrt{1 - α_{n}^{2}} & ... & α_{n}^{N - 3} \sqrt{1 - α_{n}^{2}} \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & 0 & ... & \sqrt{1 - α_{n}^{2}} \end{matrix})$

$C_{N} {(α_{n})}^{- 1} = (\begin{matrix} 1 & \frac{- α_{n}}{\sqrt{1 - α_{n}^{2}}} & 0 & ... & 0 \\ 0 & \frac{1}{\sqrt{1 - α_{n}^{2}}} & \frac{- α_{n}}{\sqrt{1 - α_{n}^{2}}} & ... & 0 \\ ⋮ & ⋮ & ⋱ & ⋱ & ⋮ \\ 0 & 0 & ... & \frac{1}{\sqrt{1 - α_{n}^{2}}} & \frac{- α_{n}}{\sqrt{1 - α_{n}^{2}}} \\ 0 & 0 & ... & 0 & \frac{1}{\sqrt{1 - α_{n}^{2}}} \end{matrix})$ (24) Clearly C_N (α_n) ^-1 is very sparse, with nonzero entries on the main diagonal and the first superdiagonal only. Furthermore, its entries are independent of N and depend on α_n only.
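The closed forms in (23) and (24) can be checked numerically. The sketch below builds T_N(α), obtains its upper triangular Cholesky factor from NumPy, and compares it with the bidiagonal inverse of (24). The helper names and the values α = 0.6, N = 5 are assumptions for illustration.

```python
import numpy as np

def ar1_autocorrelation(alpha, N):
    """Toeplitz matrix T_N(alpha) of (23) with entries alpha ** |i - j|."""
    idx = np.arange(N)
    return alpha ** np.abs(idx[:, None] - idx[None, :])

def ar1_inverse_cholesky(alpha, N):
    """Upper bidiagonal inverse Cholesky factor of (24)."""
    s = np.sqrt(1.0 - alpha ** 2)
    C_inv = np.zeros((N, N))
    C_inv[0, 0] = 1.0
    for k in range(1, N):
        C_inv[k, k] = 1.0 / s            # main diagonal
        C_inv[k - 1, k] = -alpha / s     # first superdiagonal
    return C_inv

alpha, N = 0.6, 5
T = ar1_autocorrelation(alpha, N)
# np.linalg.cholesky returns lower L with T = L L^T, so C = L^T satisfies T = C^T C
C = np.linalg.cholesky(T).T
C_inv = ar1_inverse_cholesky(alpha, N)
whitened = C_inv.T @ T @ C_inv           # C^{-T} T C^{-1}
```

Here `whitened` equals the identity up to rounding, which is the finite-N version of the statement in Theorem 3.2.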

Theorem 3.3. The inverse Cholesky factor C_N (α_n) ^-1 of T_N (α_n) converges asymptotically to C_∞ (α_n) ^-1.

Proof. The pattern of C_N (α_n) ^-1 in (24) yields that for N→ ∞, C_N (α_n) ^-1 ≈ C_∞ (α_n) ^-1.□

Observation 3.4. As n → ∞, α_n → α_∞, and C_∞ (α_n) ^-1 yields an approximation $C_{\infty} (α_{\infty})^{- 1} = C_{\infty}^{- 1}$ of the inverse Cholesky factor of the input signal’s autocorrelation matrix.

As the value of α_n increases, the ability of $C_{\infty}^{- T}$ to decorrelate the input signals increases, as shown in Fig. 1. The percentage reduction in the eigenvalue spread grows with α_n, which in turn ensures that the condition number of the preconditioned input autocorrelation matrix remains close to 1.

Fig. 1

Percentage of reduction in the eigenvalue spread of input autocorrelation matrix corresponding to parameter α_n.
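The trend described for Fig. 1 can be reproduced qualitatively with the sketch below. It assumes the reduction is measured as 100 (χ - χ̂)/χ, where χ and χ̂ are the eigenvalue spreads of T_N(α) and of the preconditioned matrix; this definition, the filter length N = 6 and the grid of α values are assumptions and may differ from what was used to produce the figure.

```python
import numpy as np

def spread(M):
    """Eigenvalue spread gamma_max / gamma_min of a symmetric matrix."""
    g = np.linalg.eigvalsh(M)
    return g[-1] / g[0]

N = 6
idx = np.arange(N)
reductions, spreads_after = [], []
for alpha in (0.1, 0.3, 0.5, 0.7, 0.9):
    T = alpha ** np.abs(idx[:, None] - idx[None, :])   # AR(1) autocorrelation
    C_inv = np.linalg.inv(np.linalg.cholesky(T).T)     # inverse Cholesky factor
    T_hat = C_inv.T @ T @ C_inv                        # preconditioned matrix
    chi, chi_hat = spread(T), spread(T_hat)
    spreads_after.append(chi_hat)
    reductions.append(100.0 * (chi - chi_hat) / chi)   # percentage reduction
```

Because the preconditioner here matches α exactly, the spread after preconditioning stays at 1 while the spread before grows with α, so the percentage reduction increases with α.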

3.3 Computational complexity of OP

From the above analysis, it is reasonable to design an optimal preconditioner $C_{\infty}^{- T}$ using a set S = {s₀, s₁, s₂, . . . , s_t} of at most t ≤ N scalars, such that

$C_{\infty}^{- T} Φ C_{\infty}^{- 1} \to I,$ (25) as n gets very large (larger than a large positive number M). For a first order AR process, direct computation of $C_{\infty}^{- T}$ from (24) requires t = 2, with s₀ = 1, $s_{1} = \frac{1}{\sqrt{1 - α_{n}^{2}}}$ , $s_{2} = \frac{- α_{n}}{\sqrt{1 - α_{n}^{2}}}$ . Therefore choosing $C_{\infty}^{- T}$ as the preconditioner for iterative solvers of the adaptive filtering problem is quite economical: it has linear computational complexity and, when α_n matches the input statistics, reduces the eigenvalue spread of the input signal’s correlation matrix to 1.
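Since $C_{\infty}^{- T}$ is lower bidiagonal, applying it to an input vector needs only the three scalars s₀, s₁, s₂. A minimal sketch follows; the function name `apply_op` and the test values are assumptions, and the dense solve is included only to confirm that the O(N) formula gives the same result.

```python
import numpy as np

def apply_op(Theta_n, alpha):
    """Compute C^{-T} Theta_n in O(N) using s0 = 1, s1 and s2 from (24)."""
    s1 = 1.0 / np.sqrt(1.0 - alpha ** 2)
    s2 = -alpha * s1
    u_hat = np.empty(len(Theta_n))
    u_hat[0] = Theta_n[0]                                # s0 = 1
    u_hat[1:] = s1 * Theta_n[1:] + s2 * Theta_n[:-1]     # diagonal + subdiagonal
    return u_hat

alpha, N = 0.7, 6
idx = np.arange(N)
T = alpha ** np.abs(idx[:, None] - idx[None, :])
C = np.linalg.cholesky(T).T                              # T = C^T C
Theta_n = np.random.default_rng(2).standard_normal(N)    # hypothetical input vector

dense = np.linalg.solve(C.T, Theta_n)                    # C^{-T} Theta_n, O(N^2) or worse
fast = apply_op(Theta_n, alpha)                          # same vector in O(N)
```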

4 The optimal preconditioned regularization based OPRLMS Algorithm

This section applies the optimal preconditioner $C_{\infty}^{- T}$ to the least mean squares algorithm for online learning, and then adds a regularized step-size technique to develop the optimal preconditioned regularized LMS (OPRLMS) algorithm.

Using $C_{\infty}^{- T}$ in (19) to transform the input vector Θ_n into the preconditioned input gives $\begin{matrix} {\hat{u}}_{n} = C_{\infty}^{- T} Θ_{n} . \end{matrix}$ Similarly, (18) gives the preconditioned equation,

$C_{\infty} x_{n + 1} = C_{\infty} x_{n} + 2 μ e (n) C_{\infty}^{- T} Θ_{n},$ (26) with error signal $e (n) = s (n) - (C_{\infty} x_{n})^{T} (C_{\infty}^{- T} Θ_{n}) = s (n) - x_{n}^{T} Θ_{n} .$

Convergence Criteria. Rewriting equation (26) as

$\begin{matrix} x_{n + 1} & = x_{n} + 2 μ e (n) C_{\infty}^{- 1} C_{\infty}^{- T} Θ_{n} \\ & = x_{n} + 2 μ e (n) (C_{\infty}^{T} C_{\infty})^{- 1} Θ_{n}, \end{matrix}$ (27) where $(C_{\infty}^{T} C_{\infty})^{- 1}$ is an optimal inverse of T_∞ (α_n) and is associated with the correlation matrix ${\hat{Φ}}_{n}$ such that $\begin{matrix} {\hat{Φ}}_{n} & = E [{\hat{u}}_{n} {\hat{u}}_{n}^{T}] \\ & = C_{\infty}^{- T} E [Θ_{n} Θ_{n}^{T}] C_{\infty}^{- 1} \approx C_{\infty}^{- T} Φ_{n} C_{\infty}^{- 1} . \end{matrix}$ The following definition is straightforward.

Definition 4.1. The preconditioned misalignment is defined as the translation of the preconditioned tap-weight vector ${\hat{x}}_{n}$ , with respect to the shifted origin ${\hat{x}}_{o}$ , i.e., ${\hat{m}}_{n} = {\hat{x}}_{n} - {\hat{x}}_{o}$ .

With ${\hat{x}}_{o} = C_{\infty} x_{o}$ as the preconditioned optimal weight vector, an obvious observation is

Observation 4.1. $\begin{matrix} {\hat{m}}_{n} & = {\hat{x}}_{n} - {\hat{x}}_{o} \\ & = C_{\infty} x_{n} - C_{\infty} x_{o} \\ & = C_{\infty} m_{n} \end{matrix}$

Solving (27) for misalignment, and taking expectation $\begin{matrix} E [{\hat{m}}_{n + 1}] & = (I - 2 μ {\hat{Φ}}_{n}) E [{\hat{m}}_{n}] \\ & = (I - 2 μ C_{\infty}^{- T} Φ_{n} C_{\infty}^{- 1}) E [{\hat{m}}_{n}] \end{matrix}$

It shows that $E [{\hat{m}}_{n}]$ forms a geometric progression, whose convergence to 0 is subject to the spectral radius $∥ 2 μ C_{\infty}^{- T} Φ_{n} C_{\infty}^{- 1} - I ∥ < 1 .$ (28) As n→ ∞, $C_{\infty}^{- T} Φ_{n} C_{\infty}^{- 1} \to I$ , so that (28) yields $0 < μ < 1 .$ (29)

4.1 OPRLMS Algorithm

The review of the LMS algorithm in §2 and the above discussion show that the step-size μ governs the convergence speed of LMS based algorithms. Since it is bounded by the eigenvalue spread of the input signal, it is sensitive to changes in correlation as each new signal comes in. This is the reason for the slow convergence of the LMS algorithm. In the NLMS algorithm, the step-size changes adaptively with the incoming signals, and the algorithm has a different step-size in each update. However, the NLMS algorithm suffers from bad misadjustment, especially when the step-size gets large. A small step-size must be used to obtain small misadjustment and convergence in misalignment. It appears that a time dependent regularization of the step-size may adapt better to changes in the characteristics of the input signals.

Since the spectral power $Θ_{n}^{T} Θ_{n}$ of the input vector Θ_n increases with an increase in eigenvalue spread, the proposed regularization aims to keep this power at 1. The step-size is modified adaptively as:

$μ_{n} = \frac{μ_{n - 1}}{n ε_{1} + 1},$ (30) where ε₁ is a small constant, n is the time instant, and μ_n-1 is the step-size at the previous instant. With this time dependent step-size, the update equation is $C_{\infty} x_{n + 1} = C_{\infty} x_{n} + 2 μ_{n} e (n) C_{\infty}^{- T} Θ_{n} .$ (31) Algorithm 1 summarizes the OPRLMS algorithm. It has linear complexity with respect to the filter length, because the number of multiplications for one complete iteration is 6N. Furthermore, the simulations will show that it is easy to implement in different signal processing applications and is not time consuming.

Algorithm 1 The OPRLMS Algorithm.

Initializations: x_init = 0 ; ${\hat{x}}_{init} = 0;$ 0 < μ_init < 1

Update:

1: for n = 1, 2, … do

2: $α_{n} = \frac{E \{θ (n) θ (n - 1)\}}{c_{1}}$

3: $μ_{n} = \frac{μ_{n - 1}}{n ε_{1} + 1}$

4: ${\hat{u}}_{n} = C_{\infty}^{- T} Θ_{n}$

5: $y (n) = x_{n}^{T} Θ_{n} = {\hat{x}}_{n}^{T} {\hat{u}}_{n}$

6: e (n) = s (n) - y (n)

7: ${\hat{x}}_{n + 1} = {\hat{x}}_{n} + 2 μ_{n} e (n) {\hat{u}}_{n}$

8: $x_{n + 1} = C_{\infty}^{- 1} {\hat{x}}_{n + 1}$

9: end for
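Algorithm 1 can be sketched in NumPy as follows. Several choices here are assumptions rather than details confirmed by the text: α is treated as known and fixed instead of being re-estimated in step 2, ε₁ is read as 10⁻⁴ times α, the input is a synthetic unit-variance AR(1) process, the system `x_o` is random and noise-free, and the function name `oprlms` is hypothetical.

```python
import numpy as np

def oprlms(theta, s, N, alpha, mu_init=0.05, eps1=None):
    """Sketch of Algorithm 1 with a fixed, known correlation parameter alpha."""
    if eps1 is None:
        eps1 = 1e-4 * alpha                  # assumed reading of the Section 5.1 setting
    s1 = 1.0 / np.sqrt(1.0 - alpha ** 2)
    s2 = -alpha * s1
    x_hat = np.zeros(N)                      # preconditioned weights C_inf x_n
    mu = mu_init
    for n in range(N - 1, len(theta)):
        Theta_n = theta[n - N + 1:n + 1][::-1]
        mu = mu / (n * eps1 + 1.0)                           # step 3
        u_hat = np.empty(N)                                  # step 4: C^{-T} Theta_n
        u_hat[0] = Theta_n[0]
        u_hat[1:] = s1 * Theta_n[1:] + s2 * Theta_n[:-1]
        e = s[n] - x_hat @ u_hat                             # steps 5 and 6
        x_hat = x_hat + 2.0 * mu * e * u_hat                 # step 7
    # Step 8: x = C^{-1} x_hat, using the upper bidiagonal structure of (24)
    x = s1 * x_hat
    x[0] = x_hat[0]
    x[:-1] += s2 * x_hat[1:]
    return x

rng = np.random.default_rng(3)
alpha, N, n_iter = 0.6, 6, 1200
nu = rng.standard_normal(n_iter)
theta = np.zeros(n_iter)
for n in range(1, n_iter):                   # unit-variance AR(1) input
    theta[n] = alpha * theta[n - 1] + np.sqrt(1 - alpha ** 2) * nu[n]
x_o = rng.standard_normal(N)                 # hypothetical unknown system
s = np.convolve(theta, x_o)[:n_iter]         # noise-free desired signal
x_est = oprlms(theta, s, N, alpha)
misalignment_db = 10 * np.log10(np.sum((x_est - x_o) ** 2) / np.sum(x_o ** 2))
```

The sketch recovers x only once, after the loop, because the recursion itself runs entirely on the preconditioned weights; recovering x at every iteration as in step 8 costs a further O(N) per step.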

Convergence Criteria. Since μ_init < 1, step 3 of Algorithm 1 yields that ∀n

$μ_{n} < 1 .$ (32)
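Unrolling the recursion of step 3 makes the bound (32) explicit. The derivation below assumes μ₀ = μ_init and ε₁ > 0; the exponential approximation is only a first order estimate, valid while n ε₁ is small.

```latex
\mu_n \;=\; \frac{\mu_{\mathrm{init}}}{\prod_{k=1}^{n} (1 + k\,\varepsilon_1)}
\;\le\; \mu_{\mathrm{init}} \;<\; 1,
\qquad
\ln\frac{\mu_n}{\mu_{\mathrm{init}}}
\;=\; -\sum_{k=1}^{n} \ln(1 + k\,\varepsilon_1)
\;\approx\; -\,\frac{\varepsilon_1\, n\,(n+1)}{2}
\quad \text{for } n\,\varepsilon_1 \ll 1 .
```

The step-size therefore decreases monotonically, and the size of ε₁ controls how many iterations take place before it becomes negligible.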

Resemblance with NLMS Algorithm. From the above discussion, as n → ∞, T_∞ (α_n) → T_∞ (α_∞). Since it is an approximation of Φ_∞, it can be verified that the maximum power of the input signal is clustered around the diagonal of $(C_{\infty}^{T} C_{\infty})^{- 1}$ , which shows a close resemblance of the OPRLMS algorithm to the NLMS algorithm. An additional characteristic of the proposed method is its low computational cost.

5 Experimental results

In order to verify analytical results, and examine the performance of proposed OPRLMS algorithm, simulations are performed for unknown system identification and noise cancelation from real ECG signals.

5.1 Unknown system identification

For this application an autoregressive process of unit variance is passed through a coloring filter with frequency response given by (21). The correlation parameter 0 ≤ α_n < 1 is a measure of the correlation characteristics of the incoming signal at instant n, as shown in Fig. 1. It is required to identify a system whose exact output s (n) is given by

$s (n) = S_{n} Θ_{n} + △ s (n),$ (33) where $S_{n} \in R^{N}$ , and △s (n) is an output noise with SNR ≈ 30dB. The performance measure is convergence in normalized misalignment $10 \log_{10} (\frac{∥ x_{n} - x_{o} ∥_{2}^{2}}{∥ x_{o} ∥_{2}^{2}})$ , and the convergence behavior of the proposed OPRLMS algorithm is compared with that of the NLMS and TDLMS algorithms.

Choice of μ_init. The initial estimate of the step size plays an important role in parameter estimation when the iterations start. Setting ε₁ = 10^-4α_n, Fig. 2 shows the results of three simulations corresponding to μ_init = 0.05, 0.1 & 0.15. These results show that the value of μ_n decreases adaptively as time n increases. Fig. 3 shows that the larger the value of μ_init, the higher the initial normalized misalignment. It is therefore convenient to choose a smaller value of μ_init to obtain better convergence in normalized misalignment. The results of Fig. 3 are obtained by an ensemble average of 500 independent runs.

Fig. 2

Adaptive change in μ_n with successive iterations for α = 0.6059931.

Fig. 3

Learning curve of normalized misalignment [dB] for OPRLMS algorithm with N = 6 and 0.5 < α < 0.75.

Comparison. Simulation results of this section contain comparison of OPRLMS algorithm with conventional LMS algorithm, NLMS algorithm and TDLMS algorithm. Filter length for this simulation is N = 6, and all the results are obtained by taking average of 500 independent runs.

The experimental results are obtained after performing 1200 iterations. The first result is obtained for normalized misalignment, and this comparison is shown in Fig. 4(a). A mean value α_n = α is used for this experiment, and this value lies in the interval (0.5, 0.75) for all n.

Another observation made during this experiment is the mean squares error (MSE), whose learning curve is shown in Fig. 4(b). This observation shows the comparative convergence behavior for the first 200 iterations. Although the MSE behavior improves only slightly with the OPRLMS algorithm, the misalignment behavior shows a clear preference for the proposed algorithm over the rest.

Fig. 4

OPRLMS and conventional algorithms with N = 6 and 0.5 < α < 0.75.

5.2 Noise cancelation

This application presents the cancelation of powerline interference (PLI) from real ECG signals. ECG signals include valuable clinical information, but this information is frequently corrupted by various kinds of noise present in the recordings. For the simulations, real ECG signals are obtained from the MIT-BIH database of physionet.com [17]. The MIT-BIH database contains 48 half-hour waveforms of two-channel ambulatory ECG recordings, which were obtained from 47 patients, including 25 men aged 32-89 years, and women aged 23-89 years. The recordings are digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range, and have different heartbeat frequencies depending upon the ages of the patients [18]. MIT-BIH signal ECG 101, from a 75-year-old female, is used for the performance study of the adaptive filtering algorithms. The performance of these algorithms is analyzed in terms of their efficiency in denoising the ECG signal, and their computation time. Since medical technicians are interested in the accuracy of detecting events of the ECG waveform, a minimum norm (MN) analysis is provided to examine the efficiency of an adaptive algorithm in identifying the clean signal from the noisy one.

Cancelation of PLI Noise. PLI is an environmental noise that is often recorded along with the ECG signal and interferes with exact diagnosis. However, it can be canceled by an efficient noise cancelation technique. Such noise is generated by changes in the alternating current (AC) caused by a poor connection of the recording machine. Owing to its AC characteristics, PLI is harmonic, and therefore, at a sampling frequency of 360Hz, PLI appears at frequencies of 60Hz, 120Hz, 180Hz, etc. For experimental purposes, PLI noise is simulated in Matlab using the formula $PLI (n) = A_{PL} \sin (2 π f_{PL} n),$ (34) where A_PL is the amplitude and f_PL is the frequency of the PLI noise, which may be higher than or equal to 40Hz. A 60Hz PLI noise is used to generate a harmonic PLI noise for the simulation results. The harmonic nature of PLI noise makes the OP based preconditioned algorithm a suitable choice for PLI removal from real ECG.
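The interference model (34) can be sketched in NumPy as follows (the paper's simulation uses Matlab). The amplitude `A_PL = 0.1` is a hypothetical value, and the argument of the sine is taken to be time in seconds sampled at 360 Hz over 10 seconds.

```python
import numpy as np

fs = 360.0                          # sampling frequency of the MIT-BIH recordings
t = np.arange(3600) / fs            # 10 seconds of sample times
A_PL = 0.1                          # hypothetical amplitude
f_PL = 60.0                         # powerline frequency in Hz

pli = A_PL * np.sin(2 * np.pi * f_PL * t)   # eq. (34)
samples_per_cycle = int(fs / f_PL)          # 6 samples per 60 Hz period
```

In a noise cancelation experiment this sequence would be added to the clean ECG samples to form the contaminated signal.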

Fig. 5

Clean and noise contaminated waveforms of ECG 101.

This simulation compares the performance of the LMS, NLMS and OPRLMS algorithms in terms of their efficiency in noise cancelation. A common performance measure in signal processing is SNR; however, this tool is not effective for ECG signals, because medical technicians and scientists are generally interested in accurately detecting the clean ECG waveform from the noisy one in the minimum possible time. For this reason, the performance analysis of this section computes the processing time taken by each algorithm to cancel the PLI noise. Then the Euclidean 2-norm of the difference between the denoised ECG and the clean ECG signal is computed, to observe the mean square deviation of the algorithm. Furthermore, the denoised waveforms are recorded to observe the efficiency of each algorithm in canceling PLI noise from the ECG. All simulations are performed for a filter length N = 5 and a duration of 10 seconds, with n = 0 : 0.002778 : 10.

The real ECG 101, obtained from MIT-BIH, is considered as the clean signal, and its waveform is shown in Fig. 5(a), while the PLI contaminated ECG 101 is shown in Fig. 5(b). The waveforms of the denoised ECG signals, shown in Fig. 6, are obtained by applying the LMS, NLMS and OPRLMS algorithms. An overall analysis of the results of Table 1 and the denoised waveforms of Fig. 6 shows a preference for the OPRLMS algorithm over the rest. The computation time of the OPRLMS algorithm (0.07177 s) is slightly larger than that of the LMS algorithm (0.06337 s), but smaller than that of the NLMS algorithm (0.08771 s). On the other hand, the 2-norm of the deviation is smallest for the OPRLMS algorithm (2.12722, against 2.96603 for LMS and 2.79772 for NLMS). These results indicate the high efficiency of the OPRLMS algorithm at the nominal additional cost of computing the optimal preconditioner and the adaptive step-size. The time cost of these computations is lower than that of the NLMS algorithm, while the denoising accuracy is better than that of both the LMS and NLMS algorithms.

Fig. 6

PLI denoised ECG waveforms for N = 5.

Table 1

Processing time of each algorithm for noise cancelation, and norm of the deviation of denoised ECG from clean ECG 101.

Algorithm	Time (in sec.)	Norm of Deviation
LMS	0.06337	2.96603
NLMS	0.08771	2.79772
OPRLMS	0.07177	2.12722
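To make the comparison in Table 1 concrete, the relative differences between the algorithms can be computed directly from the tabulated values. The short Python sketch below does only this arithmetic; the helper name `relative_change` is introduced for the example.

```python
# Values copied from Table 1: (time in seconds, norm of deviation).
table1 = {
    "LMS": (0.06337, 2.96603),
    "NLMS": (0.08771, 2.79772),
    "OPRLMS": (0.07177, 2.12722),
}


def relative_change(new, old):
    """Percentage change of `new` with respect to `old`."""
    return 100.0 * (new - old) / old


time_op, norm_op = table1["OPRLMS"]
changes = {}
for name in ("LMS", "NLMS"):
    time_ref, norm_ref = table1[name]
    changes[name] = (
        relative_change(time_op, time_ref),   # OPRLMS time vs. this algorithm
        relative_change(norm_op, norm_ref),   # OPRLMS norm vs. this algorithm
    )
    print(f"OPRLMS vs {name}: time {changes[name][0]:+.1f}%, "
          f"norm {changes[name][1]:+.1f}%")
```

For the tabulated values this gives a processing time roughly 13% above LMS and 18% below NLMS, with the norm of deviation reduced by roughly 28% and 24% respectively.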

6 Conclusion

In this paper, the need for preconditioning in regularization techniques for online learning is highlighted. A factorization preconditioner framework for pre-whitening input data signals is shown to regularize the eigenvalue spread of the input autocorrelation matrix by means of the inverse Cholesky factor. Representing the LMS-Newton algorithm as a factorization preconditioned algorithm then motivated the idea of using the inverse Cholesky factor of the correlation matrix to precondition the LMS algorithm. Afterwards, reformulating the NLMS algorithm first as a diagonally preconditioned algorithm and then as a step-preconditioned algorithm underlined the importance of preconditioning. These realizations, together with the efficiency of optimal preconditioned regularization, have produced an efficient OPRLMS algorithm with fast and unbiased convergence behavior. The optimal preconditioner presented in this paper is formed by optimizing the inverse Cholesky factor of the inverse QRD-RLS algorithm for known statistics of the input signals. The proposed regularization technique is developed at nominal computational cost and achieves a fast convergence rate for the adaptive LMS filter without a large increase in complexity. Convergence analysis and experimental results support the preference of the newly proposed OPRLMS algorithm over the existing preconditioned algorithms. These results show the high efficiency of the OPRLMS algorithm in the applications of unknown system identification and cancelation of PLI noise from real ECG signals.

Acknowledgment

The authors would like to acknowledge the financial support by Universiti Sains Malaysia through Research University Grant (RUI) (acc. No. 1001/PMATHS/8011040).

References

1. Farhang-Boroujeny, Adaptive filters: theory and applications, 2013.

2. Haykin S.S. and Widrow, Least-mean-square adaptive filters. Wiley Online Library, 2003.

3. de Campos M.L. and Antoniou, A new quasi-newton adaptive filtering algorithm, Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on 44(11) (1997), 924–934.

4. Diniz P.S.R., Adaptive Filtering: Algorithms and Practical Implementation. Springer Verlag, 2008.

5. Bhotto M.Z.A. and Antoniou, Improved quasi-newton adaptive-filtering algorithm, Circuits and Systems I: Regular Papers, IEEE Transactions on 57(8) (2010), 2109–2118.

6. Javed and Ahmad N.A., Performance study of LMS based adaptive algorithms for unknown system identification, in SKSM21. AIP Conference Proceedings, In Press.

7. Mathews V.J. and Xie, A stochastic gradient adaptive filter with gradient adaptive step size, Signal Processing, IEEE Transactions on 41(6) (1993), 2075–2087.

8. Ang W.-P. and Farhang-Boroujeny, A new class of gradient adaptive step-size LMS algorithms, IEEE Transactions on Signal Processing 49(4) (2001), 805–810.

9. Pierce D.J. and Plemmons R.J., Tracking the condition number for RLS in signal processing, Mathematics of Control, Signals and Systems 5(1) (1992), 23–39.

10. Alexander S.T. and Ghirnikar A.L., A method for recursive least squares filtering based upon an inverse QR decomposition, IEEE Transactions on Signal Processing 41(1) (1993), 20–30.

11. Apolinário J.A., QRD-RLS adaptive filtering, 2009.

12. Benning and Burger, Modern regularization methods for inverse problems, Acta Numerica 27 (2018), 1–111.

13. Cipolla, Di Fiore, Durastante and Zellini, Regularizing properties of a class of matrices including the optimal and the superoptimal preconditioners, Numerical Linear Algebra with Applications 26(2) (2019), e2225.

14. Haykin, Adaptive Filter Theory, 2nd ed. Prentice Hall, 1991.

15. Erdol and Basbug, Wavelet transform based adaptive filters: analysis and new results, Signal Processing, IEEE Transactions on 44(9) (1996), 2163–2171.

16. Zhao, Man, Khoo and Wu H.R., Stability and convergence analysis of transform-domain LMS adaptive filters with second-order autoregressive process, Signal Processing, IEEE Transactions on 57(1) (2009), 119–130.

17. Moody G.B. and Mark R.G., MIT-BIH arrhythmia database (2000). [Online]. Available: http://www.physionet.org/physiobank/database/mitdb/

18. Moody G.B. and Mark R.G., The impact of the MIT-BIH arrhythmia database, Engineering in Medicine and Biology Magazine, IEEE 20(3) (2001), 45–50.