Why Daubechies wavelets are so successful

Abstract

In many applications, including analysis of seismic signals, Daubechies wavelets perform much better than other families of wavelets. In this paper, we provide a possible theoretical explanation for the empirical success of Daubechies wavelets. Specifically, we show that these wavelets are optimal with respect to any optimality criterion that satisfies the natural properties of scale- and shift-invariance.

Keywords

Daubechies wavelets, seismology, invariance

1 Formulation of the problem

Need for 1-D wavelets. The values of most physical quantities q change with time t: q = q (t). In some cases, e.g., in celestial mechanics, we know the general shape of this dependence, i.e., we know a function f (t, c₁, …, c_n) such that the actual dynamics q (t) is described by the expression q (t) = f (t, c₁, …, c_n) for some values of the parameters c₁, …, c_n. In such cases:

first, we use the values q (t₁), …, q (t_k) observed during some observation period to determine the values of these parameters, i.e., to solve the system of equations $q (t_{i}) = f (t_{i}, c_{1}, \dots, c_{n}), i = 1, \dots, k,$ (1) with the unknowns c_i;

then, we use the resulting values c_i to predict the future values q (t) of the quantity q as q (t) = f (t, c₁, …, c_n).

Sometimes, the dependence on the parameters c_i is non-linear –so this system of equations is not easy to solve. However, if this is how q depends on time, there is nothing we can do about it.

In many other cases, however, we do not know the shape of the dependence. In such cases, it is also desirable to come up with a general formula f (t, c₁, …, c_n) – with not too many parameters c_i – that would adequately describe the dynamics of the quantity of interest. In such situations, it is reasonable to select a family for which the corresponding system of equations (1) is the easiest to solve, i.e., is a system of linear equations. For this purpose, we select a family for which the dependence f (t, c₁, …, c_n) is linear in terms of the unknowns, i.e., for which $f (t, c_{1}, \dots, c_{n}) = f_{0} (t) + c_{1} \cdot f_{1} (t) + \dots + c_{n} \cdot f_{n} (t)$ (2) for some functions f_i (t).

Such representations are indeed actively used in data processing. For example, for a smooth dependence q (t), it is reasonable to approximate it by a polynomial –i.e., by the sum of the first few terms of its Taylor expansion. In this case, the functions f_i (t) are monomials t^j corresponding to non-negative integers j. For a periodic process with a known period T, we can use sines and cosines sin(j · ω · t) and cos(j · ω · t) for non-negative integers j, where ω = 2π/T, etc.
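To illustrate why a model that is linear in its parameters is convenient, the following minimal sketch fits the coefficients c_i of a polynomial model of the form (2) by linear least squares and then uses them for prediction. The synthetic data, the cubic degree, and the helper name `fit_linear_model` are assumptions made only for this example.

```python
import numpy as np

def fit_linear_model(ts, qs, basis):
    """Least-squares coefficients c_i for q(t) ~ sum_i c_i * f_i(t)."""
    # Each column of the design matrix is one basis function sampled at the observation times.
    design = np.column_stack([f(ts) for f in basis])
    coeffs, *_ = np.linalg.lstsq(design, qs, rcond=None)
    return coeffs

# Hypothetical basis: monomials 1, t, t^2, t^3 (with f_0(t) = 0 in formula (2)).
basis = [lambda t, j=j: t**j for j in range(4)]

# Synthetic observations generated from known coefficients, for illustration only.
true_coeffs = np.array([1.0, -2.0, 0.5, 0.25])
ts = np.linspace(0.0, 2.0, 20)
qs = sum(c * f(ts) for c, f in zip(true_coeffs, basis))

coeffs = fit_linear_model(ts, qs, basis)

# Prediction at a future moment uses the same linear formula.
t_future = 2.5
q_future = sum(c * f(t_future) for c, f in zip(coeffs, basis))
print(coeffs, q_future)
```

Because the unknowns enter linearly, the whole fit reduces to one call to a linear solver; no iterative non-linear optimization is needed.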

However, many physical processes, e.g., seismic processes, are neither smooth nor periodic: they consist of time-localized bursts of activity. To describe such processes, it makes sense to use similarly time-localized functions f_i (t). Such functions are known as wavelets; see, e.g., [5, 14].

From generic wavelets to Daubechies wavelets. One of the computational advantages of Fourier series –i.e., of representing the desired dependence as a linear combination of sines and cosines –is that all the functions f_i (t) used in this approximation can be obtained from each other by scaling and shift, i.e., they all have the form f_i (t) = f₀ (a_i · t + b_i) for some values a_i and b_i, where $f_{0} (t) \overset{def}{=} sin (ω \cdot t)$ .

It turns out that we can select wavelets that satisfy a similar property –namely, we select the basic function φ (t) known as the mother wavelet, and take the functions f_i (t) of the form φ (2^j · t - ℓ), where j ≥ 0 and ℓ are integers. (There are also similar functions generated by another related function, known as the father wavelet.) For the resulting functions f_i (t) to be efficient in representing and processing data, the mother wavelet must satisfy a certain linear functional equation.
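The construction of a family from a single function by scaling and shift can be sketched as follows. As an assumption for illustration, the sketch uses the Haar function as the mother wavelet (the text above does not fix a specific φ); the helper names `haar`, `family_member`, and `inner` are hypothetical.

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    plus = np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
    minus = np.where((t >= 0.5) & (t < 1), 1.0, 0.0)
    return plus - minus

def family_member(mother, j, shift):
    """Return the function t -> mother(2**j * t - shift)."""
    return lambda t: mother(2**j * t - shift)

# Midpoints of a fine dyadic grid on [0, 1); for these step functions
# the Riemann sum over the midpoints equals the integral.
n = 1024
ts = (np.arange(n) + 0.5) / n

def inner(f, g):
    """Approximate integral of f(t) * g(t) over [0, 1)."""
    return float(np.sum(f(ts) * g(ts)) / n)

f00 = family_member(haar, 0, 0)
f10 = family_member(haar, 1, 0)
f11 = family_member(haar, 1, 1)
print(inner(f00, f10), inner(f10, f11), inner(f10, f10))
```

In this example, different members of the family are mutually orthogonal, which is one reason why coefficients with respect to such a family are easy to compute.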

This functional equation has many different solutions. Empirically:

wavelets corresponding to some solutions work better, while

wavelets corresponding to other solutions of the functional equation do not work so well.

To select a single solution – and thus, to fix a family of wavelets – we need to impose additional restrictions on the function φ (t). To make computations easier, and to preserve linearity of the corresponding system of equations, it makes sense to impose restrictions which are linear in terms of φ (t). A general such linear restriction has the form

$\int c_{m} (t) \cdot φ (t) dt = b_{m}, m = 1, \dots, M,$ (3) for some functions c_m (t) and some values b_m.

Once we know one solution φ₀ (t) to the system (3) of linear equations, we can have an even simpler system of linear equations for the difference $Δ (t) \overset{def}{=} φ (t) - φ_{0} (t)$ : $\int c_{m} (t) \cdot Δ (t) dt = 0, m = 1, \dots, M .$ (4)

Of course, once the equality ∫c (t) · Δ (t) dt = 0 holds for all M functions c₁ (t), …, c_M (t), the same equality holds for all possible linear combinations $c (t) = s_{1} \cdot c_{1} (t) + \dots + s_{M} \cdot c_{M} (t)$ of these functions, i.e., for the whole M-dimensional linear space L of functions generated by the functions c_m (t). From this viewpoint, the condition (4) can be described as requiring that $\int c (t) \cdot Δ (t) dt = 0$ for all functions c (t) from an M-dimensional linear space of functions. Thus, the selection of a specific wavelet family means selecting an appropriate linear space L.

Ingrid Daubechies proposed to use $c_{m} (t) = t^{m - 1}$ , i.e., equivalently, to take, as L, the linear space of all polynomials $c (t) = a_{0} + a_{1} \cdot t + \dots + a_{M - 1} \cdot t^{M - 1}$ of order less than M; see, e.g., [6]. The resulting wavelets are known as Daubechies wavelets.
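The polynomial condition can be checked numerically in its standard discrete form. The sketch below is an illustration, not part of this paper's argument: it takes the well-known closed-form coefficients of the 4-tap Daubechies filter (the case M = 2) and verifies that the associated wavelet filter annihilates the monomials k⁰ and k¹ but not k². The names `h`, `g`, and `discrete_moment` are chosen here for illustration.

```python
import numpy as np

s3 = np.sqrt(3.0)
# Scaling (low-pass) filter of the 4-tap Daubechies wavelet, normalized so that sum(h**2) == 1.
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))
# Wavelet (high-pass) filter via the usual alternating flip: g[k] = (-1)**k * h[N - 1 - k].
g = np.array([(-1) ** k * h[len(h) - 1 - k] for k in range(len(h))])

def discrete_moment(filt, m):
    """Sum over k of k**m * filt[k], a discrete analogue of the integral of t**m times the wavelet."""
    k = np.arange(len(filt))
    return float(np.sum(k**m * filt))

moments = [discrete_moment(g, m) for m in range(3)]
print(moments)
```

The first two moments vanish up to rounding error, while the third does not, which matches the requirement that the condition holds exactly for polynomials of order less than M = 2.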

Empirical fact. Daubechies wavelets perform very well in many practical applications: in civil engineering (see, e.g., [10]), in power systems engineering (see, e.g., [4]), in biomedical engineering (see, e.g., [11]), and in signal and image processing in general; see, e.g., [7, 9].

In particular, in many problems related to processing seismic signals, Daubechies wavelets work very well, much better than many other wavelet families; see, e.g., [1–3, 16] and references therein.

What we do in this paper. In this paper, we provide a possible theoretical explanation for the empirical success of Daubechies wavelets.

Specifically, we show that these wavelets are optimal with respect to any optimality criterion that satisfies the natural properties of scale- and shift-invariance.

The structure of the paper is as follows. In Section 2, we analyze the problem and, as a result of this analysis, formulate this problem in precise mathematical terms. Section 3 contains the main result –an explanation of why Daubechies wavelets are optimal –and the proof of this result.

2 Analysis of the problem

We need to select an M-dimensional linear space. As we have mentioned earlier, selecting a family of wavelets is equivalent to selecting an M-dimensional linear space L of functions. In these terms, the question is: What is the optimal selection of an M-dimensional linear space of functions?

We will only consider smooth (differentiable) functions c (t). In wavelet analysis, the corresponding functions c (t) are differentiable. In view of this, in this paper, we will also limit ourselves to the case when all the functions c (t) from the linear space L are differentiable.

Comment. This differentiability requirement makes sense: e.g., it is known that every continuous function c (t) can be approximated, with any given accuracy, by smooth functions. Since from the practical viewpoint, a very small difference is not noticeable, it thus makes sense to assume that all the functions c (t) are differentiable.

However, a reader should be warned that it is not possible to follow this argument too far: Actually, some wavelets are not smooth. Even Daubechies wavelets of higher order M, while smooth, are not infinitely differentiable: if we differentiate the corresponding mother wavelet again and again, we will eventually reach a function which is not differentiable at some points.

What does “optimal” mean. Usually, when we say that an alternative A_opt is optimal, it means that:

there is a numerical characteristic F (A) describing the imperfection of different alternatives, and

the alternative A_opt has the smallest value of this characteristic.

For example, for different wavelet families A, we can take, as F (A), the mean squared accuracy with which the use of the first few wavelets from this family approximates signals from the given set of signals.

However, this is not the only way to describe optimality. In the above example, we may have several different families with the same smallest possible value of the mean squared accuracy. In such a case, we can use this non-uniqueness to minimize some other characteristic G (A): e.g., the average computation time needed to get the corresponding approximation. We then say that the alternative A is better than or of the same quality as an alternative B (we will denote it by A ≤ B) if:

either F (A) < F (B),

or F (A) = F (B) and G (A) ≤ G (B).
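The two-level comparison above can be sketched in code as a lexicographic comparison of the pairs (F, G). The class `Alternative`, the function `better_or_equal`, and all numerical values below are hypothetical and serve only as an illustration.

```python
from typing import NamedTuple

class Alternative(NamedTuple):
    name: str
    error: float      # plays the role of F(A)
    runtime: float    # plays the role of G(A)

def better_or_equal(a: Alternative, b: Alternative) -> bool:
    """A <= B: either F(A) < F(B), or F(A) == F(B) and G(A) <= G(B)."""
    # Python compares tuples lexicographically, which is exactly this rule.
    return (a.error, a.runtime) <= (b.error, b.runtime)

# Hypothetical values, for illustration only.
candidates = [
    Alternative("family_1", error=0.10, runtime=3.0),
    Alternative("family_2", error=0.10, runtime=2.0),
    Alternative("family_3", error=0.20, runtime=1.0),
]

# An alternative is optimal if it is better than or equal to every candidate.
optimal = [a for a in candidates if all(better_or_equal(a, b) for b in candidates)]
print([a.name for a in optimal])
```

Here the first two families tie on F, and the tie is broken by G, so exactly one alternative remains.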

If this additional numerical criterion does not lead to a unique selection of an alternative, we can minimize something else, etc., until we reach the final optimality criterion –for which there is exactly one optimal alternative.

No matter how complex our comparison, in all these cases, we have a relation A ≤ B between two alternatives describing that A is better than or of the same quality as B.

Of course, each alternative has the same quality as itself, i.e., A ≤ A, and if A ≤ B and B ≤ C, then A ≤ C. Thus, we arrive at the following definition.

Definition 1. Let $\mathcal{A}$ be a set. Its elements will be called alternatives.

By an optimality criterion, we mean a binary relation ≤ on this set which satisfies the following two properties:

for every $A \in \mathcal{A}$ , we have A ≤ A (reflexivity), and

for all $A, B, C \in \mathcal{A}$ , if A ≤ B and B ≤ C, then A ≤ C (transitivity).

An alternative A_opt is called optimal with respect to the optimality criterion ≤ if we have A_opt ≤ A for all $A \in \mathcal{A}$ .

An optimality criterion ≤ is called final if for this criterion, there exists exactly one optimal alternative.

Natural invariance properties. We are interested in describing how a quantity changes with time. We describe this dependence in numerical terms, as a dependence q (t) of the numerical value of the quantity q on the numerical value of time t.

However, the numerical value of time depends on the selection of the measuring unit and on the selection of the starting point. If we replace the original unit with a new one which is a times smaller –e.g., consider seconds instead of minutes –then all numerical values of time are multiplied by a. The corresponding linear transformation t ↦ a · t is known as scaling.

Similarly, if we replace the original starting point for measuring time with a new starting point which is t₀ moments earlier, then this value t₀ will be added to all numerical values of time. The corresponding linear transformation t ↦ t + t₀ is known as shift.

In general, if we change both the unit and the starting point, we replace t with a · t + t₀ – i.e., we get a linear transformation.

The numerical values change, but the physical process remains the same. From this viewpoint, it is reasonable to require that the relative quality of two different methods should not change if we simply change the unit and/or the starting point. In terms of linear spaces –that describe different wavelet families – we thus arrive at the following definition.

Definition 2.

By a linear transformation, we mean a function T (t) = a · t + t₀ for some values a and t₀.

For each linear transformation T and each function e (t), by the result T (e) of applying T to e we mean a function e (T (t)).

For each M-dimensional linear space L of smooth functions, by the result T (L) of applying T to L we mean the linear space formed by the functions T (e) for e ∈ L.

We say that the optimality criterion ≤ on the set $\mathcal{L}$ of all M-dimensional linear spaces of smooth functions is invariant if for every two spaces L₁ and L₂ and for every linear transformation T, L₁ ≤ L₂ implies that T (L₁) ≤ T (L₂).
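To make the operation T (e) from Definition 2 concrete, the following sketch applies a linear transformation to a polynomial represented by its coefficients and shows that the result is again a polynomial of the same order. The function name `apply_linear_transformation` and the sample values are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def apply_linear_transformation(coeffs, a, t0):
    """Coefficients (lowest degree first) of e(a*t + t0), computed by Horner's scheme."""
    inner = np.array([t0, a], dtype=float)   # the polynomial a*t + t0
    result = np.zeros(1)
    for c in reversed(list(coeffs)):
        result = P.polyadd(P.polymul(result, inner), [c])
    return result

M = 3
# Hypothetical element of a space L: e(t) = 1 + 2t + 3t^2, of order less than M.
e = [1.0, 2.0, 3.0]
transformed = apply_linear_transformation(e, a=2.0, t0=1.0)
print(transformed)
```

In this example, e (2t + 1) = 6 + 16t + 12t², so the transformed function stays in the space of polynomials of order less than M.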

Now, we can formulate our main result.

3 Main result

Proposition. For every final invariant optimality criterion on the set of all M-dimensional linear spaces of smooth functions, all elements of the optimal family L_opt are polynomials of order less than M.

Comments.

Thus, we have indeed proven that the linear space corresponding to Daubechies wavelets is optimal – and thus, in this sense, Daubechies wavelets are optimal.

The following proof follows ideas first described in [13].

Proof. Let us first prove that the optimal family L_opt is itself invariant, i.e., that T (L_opt) = L_opt.

Indeed, the fact that L_opt is optimal means that L_opt ≤ L for all families L, in particular, for all families of the type $T^{-1} (L)$ , where $T^{-1}$ is the inverse transformation. So, $L_{opt} \le T^{-1} (L)$ for each L. By using invariance of the optimality criterion, we conclude that T (L_opt) ≤ L for every L, i.e., that the linear space T (L_opt) is also optimal. However, the optimality criterion ≤ is final, which means that there is only one optimal space, so indeed, T (L_opt) = L_opt.

Let us now select any basis e₁ (t), …, e_M (t) in the optimal linear space. Invariance of the linear space L_opt means, in particular, that for each i and for each t₀, the shifted function e_i (t + t₀) also belongs to this linear space, i.e., that $e_{i} (t + t_{0}) = \sum_{j = 1}^{M} C_{ij} (t_{0}) \cdot e_{j} (t)$ (5) for some coefficients C_ij depending on t₀. If we select M different moments of time t₁, …, t_M, we get a system of M linear equations to determine these coefficients C_ij (t₀) in terms of the functions e_j: $e_{i} (t_{1} + t_{0}) = \sum_{j = 1}^{M} C_{ij} (t_{0}) \cdot e_{j} (t_{1}); \dots e_{i} (t_{M} + t_{0}) = \sum_{j = 1}^{M} C_{ij} (t_{0}) \cdot e_{j} (t_{M}) .$ (6) In general, the solution of a system of linear equations is a linear combination of the left-hand sides. The left-hand sides e_i (t_k + t₀), k = 1, …, M are differentiable functions of t₀, thus, all the coefficients C_ij (t₀) are also differentiable. So, all the functions in the formula (5) are differentiable. Thus, we can differentiate both sides with respect to t₀, and get $e_{i}^{'} (t + t_{0}) = \sum_{j = 1}^{M} C_{ij}^{'} (t_{0}) \cdot e_{j} (t) .$ (7) In particular, for t₀ = 0, we get $e_{i}^{'} (t) = \sum_{j = 1}^{M} c_{ij} \cdot e_{j} (t),$ (8) where we denoted $c_{ij} \overset{def}{=} C_{ij}^{'} (0)$ . So, for M functions e₁ (t), …, e_M (t), we have a system of M linear differential equations with constant coefficients.

It is known that a general solution to such a system is a linear combination of expressions of the type t^k · exp(α · t), where:

the value α is an eigenvalue of the matrix ∥c_ij∥, and

the value k is a non-negative integer which is smaller than the multiplicity of this eigenvalue.
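As a small worked illustration (an example chosen here, not part of the derivation), consider M = 2 and the basis e₁ (t) = 1, e₂ (t) = t. In this case, the formula (5) and the matrix of derivatives take the following form:

```latex
e_{1}(t + t_{0}) = 1 \cdot e_{1}(t) + 0 \cdot e_{2}(t), \qquad
e_{2}(t + t_{0}) = t_{0} \cdot e_{1}(t) + 1 \cdot e_{2}(t),

\| C_{ij}(t_{0}) \| = \begin{pmatrix} 1 & 0 \\ t_{0} & 1 \end{pmatrix}, \qquad
\| c_{ij} \| = \| C_{ij}^{'}(0) \| = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.
```

The system (8) then reads e₁′ (t) = 0 and e₂′ (t) = e₁ (t). The only eigenvalue of the matrix is α = 0, with multiplicity 2, so the corresponding expressions t^k · exp(α · t) with k = 0 and k = 1 are exactly 1 and t.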

Similarly, another consequence of invariance is that for every i and for every a, the function e_i (a · t) also belongs to the optimal space L_opt, i.e., that $e_{i} (a \cdot t) = \sum_{j = 1}^{M} C_{ij} (a) \cdot e_{j} (t)$ (9) for some coefficients C_ij depending on a. If we select M different moments of time t₁, …, t_M, we get a system of M linear equations to determine these coefficients in terms of e_j: $e_{i} (a \cdot t_{1}) = \sum_{j = 1}^{M} C_{ij} (a) \cdot e_{j} (t_{1}); \dots e_{i} (a \cdot t_{M}) = \sum_{j = 1}^{M} C_{ij} (a) \cdot e_{j} (t_{M}) .$ (10) In general, the solution of a system of linear equations is a linear combination of the left-hand sides. The left-hand sides e_i (a · t_k) are differentiable functions of a, thus, all the coefficients C_ij (a) are also differentiable. So, all the functions in the formula (9) are differentiable. Thus, we can differentiate both sides with respect to a, and get $t \cdot e_{i}^{'} (a \cdot t) = \sum_{j = 1}^{M} C_{ij}^{'} (a) \cdot e_{j} (t) .$ (11) In particular, for a = 1, we get $t \cdot e_{i}^{'} (t) = \sum_{j = 1}^{M} c_{ij} \cdot e_{j} (t),$ (12) where we denoted $c_{ij} \overset{def}{=} C_{ij}^{'} (1)$ . Let us introduce an auxiliary variable $x \overset{def}{=} \ln (t)$ , so that t = exp(x) and dx = dt/t. Then, $t \cdot \frac{d e_{i}}{dt} = \frac{d e_{i}}{dt / t} = \frac{d e_{i}}{dx},$ so the formula (12) takes the form $\frac{d E_{i} (x)}{dx} = \sum_{j = 1}^{M} c_{ij} \cdot E_{j} (x),$ (13) where we denoted $E_{i} (x) \overset{def}{=} e_{i} (\exp (x))$ . So, for M functions E₁ (x), …, E_M (x), we also have a system of M linear differential equations with constant coefficients, and thus, each of these functions is a linear combination of the expressions of the type $x^{k} \cdot \exp (α \cdot x) .$ So, each function e_i (t) = E_i (ln(t)) is a linear combination of functions $(\ln t)^{k} \cdot \exp (α \cdot \ln (t)) = (\ln t)^{k} \cdot t^{α} .$ (14)

One can check that the only way to have a function representable both as a linear combination of the expressions (14) and as a linear combination of expressions t^k · exp(α · t) is when, in the formula (14), we have k = 0 and α is a non-negative integer. So, each function e_i (t) is a linear combination of monomials t^α with non-negative integer exponents α, i.e., a polynomial.
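The interplay of the two invariance requirements can be probed numerically. The following rough sketch, under the stated assumptions, measures how far the shifted and the scaled basis functions are from the span of the original basis on a sample grid. The helper names, the grid, and the values of the shift and the scale are arbitrary choices made for illustration.

```python
import numpy as np

def distance_to_span(basis, f, ts):
    """Largest sampled deviation between f and its least-squares fit by the basis."""
    design = np.column_stack([e(ts) for e in basis])
    target = f(ts)
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.max(np.abs(design @ coeffs - target)))

def invariance_residuals(basis, ts, shift, scale):
    """Return (shift residual, scale residual); both are near 0 for an invariant space."""
    shift_res = max(distance_to_span(basis, lambda t, e=e: e(t + shift), ts) for e in basis)
    scale_res = max(distance_to_span(basis, lambda t, e=e: e(scale * t), ts) for e in basis)
    return shift_res, scale_res

ts = np.linspace(0.5, 3.0, 50)   # positive sample times, so that sqrt is defined
spaces = {
    "polynomials {1, t}": [lambda t: np.ones_like(t), lambda t: t],
    "exponential {exp(t)}": [np.exp],
    "power {sqrt(t)}": [np.sqrt],
}
for name, basis in spaces.items():
    print(name, invariance_residuals(basis, ts, shift=0.7, scale=1.9))
```

In this experiment, the exponential space passes only the shift test and the fractional-power space passes only the scale test, while the polynomial space passes both, in line with the conclusion above.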

To complete the proof, let us show that all polynomials can only have degree <M.

Indeed, suppose that the optimal linear space L_opt contains a polynomial of degree d, i.e., a function $e^{(0)} (t) = a_{0} \cdot t^{d} + a_{1} \cdot t^{d - 1} + \dots,$ with a₀ ≠ 0. The optimal linear space is invariant with respect to shift, so for each h, the function e⁽⁰⁾ (t + h) also belongs to this space. Since L_opt is a linear space, it also contains any linear combination of the two functions e⁽⁰⁾ (t) and e⁽⁰⁾ (t + h), in particular, for h = 1, their difference $e^{(1)} (t) \overset{def}{=} e^{(0)} (t + 1) - e^{(0)} (t) .$ (15) One can check that this difference is a polynomial $e^{(1)} (t) = d \cdot a_{0} \cdot t^{d - 1} + \dots$ (16) of degree exactly d - 1, since for d ≥ 1 its leading coefficient d · a₀ is non-zero. By applying this difference again and again, we get a polynomial $e^{(2)} (t) = e^{(1)} (t + 1) - e^{(1)} (t)$ of degree d - 2, etc., all the way to a polynomial $e^{(d)} (t) = e^{(d - 1)} (t + 1) - e^{(d - 1)} (t)$ of degree 0, i.e., to a non-zero constant.

These d + 1 polynomials e⁽⁰⁾ (t), …, e^(d) (t) are all linearly independent: indeed, each linear combination $c_{i_{1}} \cdot e^{(i_{1})} (t) + \dots + c_{i_{k}} \cdot e^{(i_{k})} (t)$ for some i₁ < i₂ < … and all $c_{i_{j}} \neq 0$ starts with a non-zero term proportional to $t^{d - i_{1}}$ and thus, cannot be identically 0.

According to linear algebra, in an M-dimensional space, we can have no more than M linearly independent elements, so here we have d + 1 ≤ M, thus d ≤ M - 1, hence indeed d < M.
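The degree-counting argument can be illustrated by a short computation. The sketch below, with a hypothetical sample polynomial and the helper name `forward_difference` chosen for illustration, builds the chain e⁽⁰⁾, …, e^(d) of repeated differences and checks that the resulting d + 1 polynomials are linearly independent.

```python
import numpy as np
from math import comb

def forward_difference(coeffs):
    """Coefficients (lowest degree first) of e(t + 1) - e(t)."""
    d = len(coeffs) - 1
    # Binomial expansion of (t + 1)**i gives the coefficients of e(t + 1).
    shifted = [sum(coeffs[i] * comb(i, j) for i in range(j, d + 1)) for j in range(d + 1)]
    return [s - c for s, c in zip(shifted, coeffs)]

# Hypothetical polynomial of degree d = 3: e(t) = 2t + t^3.
e0 = [0, 2, 0, 1]
d = len(e0) - 1

chain = [e0]
for _ in range(d):
    chain.append(forward_difference(chain[-1]))

# Rows of this matrix are the coefficient vectors of e^(0), ..., e^(d).
rank = np.linalg.matrix_rank(np.array(chain, dtype=float))
print(chain, rank)
```

Each difference lowers the degree by exactly one, so the coefficient matrix is triangular up to reordering and has full rank d + 1; a space containing this polynomial must therefore have dimension at least d + 1.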

The proposition is proven.

Acknowledgments

This work was supported in part by the National Science Foundation grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science), and HRD-1834620 and HRD-2034030 (CAHSI Includes), and by the AT&T Fellowship in Information Technology.

It was also supported by the program of the development of the Scientific-Educational Mathematical Center of Volga Federal District No. 075-02-2020-1478.

The authors are thankful to the anonymous referees for valuable suggestions.

References

1. Adhikari, Dahal, Karki, Mishra R.K., Dahal R.K., Sasmal and Klausner, Application of wavelet for seismic wave analysis in Kathmandu Valley after the Gorkha earthquake, Nepal, Geoenvironmental Disasters 7 (2020), 2.

2. Al-Hashmi, Rawlins and Vernon F.L., A wavelet transform method to detect P and S-phases in three component seismic data, Open Journal of Earthquake Research 2 (2013), 1–20.

3. Botella, Rosa-Herranz, Giner J.J., Molina and Galiana-Merino J.J., A real-time earthquake detector with prefiltering by wavelets, Computers & Geosciences 29(7) (2003), 911–919.

4. Brito N.S.D., Souza B.A. and Pires F.A.C., Daubechies wavelets in quality of electrical power, Proceedings of the 8th International Conference on Harmonics and Quality of Power, Athens, Greece, 1 (1998), 511–515.

5. Burrus, Gopinath, Guo, Introduction to Wavelets and Wavelet Transforms: A Primer, Prentice Hall, Upper Saddle River, New Jersey (1997).

6. Daubechies, Ten Lectures on Wavelets, SIAM Publishers, Philadelphia, Pennsylvania (1992).

7. Ding, Cao, Application of Daubechies wavelet transform in the estimation of standard deviation of white noise, Proceedings of the Second International Conference on Digital Manufacturing & Automation, Zhangjiajie, China (2011), 212–215.

8. Joevivek, Chandrasekarm V.J., Chandrasekar, Jayangondaperumal, Evaluation of optimal wavelet filters for seismic wave analysis, Himalayan Geology 37(2) (2016), 176–189.

9. Lina J.M., Complex Daubechies wavelets: filters design and applications, Proceedings of the 1st International Congress of the International Society for Analysis, its Applications and Computation ISAAC’97, Newark, Delaware (1997).

10. …, Xue, Yang and He, A study of the construction and application of a Daubechies wavelet-based beam element, Finite Elements in Analysis and Design 39(10) (2003), 965–975.

11. Mahmoodabadi S.Z., Ahmadian, Abolhasani M.D., ECG feature extraction using Daubechies wavelets, Proceedings of the 5th IASTED International Conference on Visualization, Imaging, and Image Processing, Benidorm, Spain (2005), 343–348.

12. Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, Academic Press, Burlington, Massachusetts (2009).

13. Nguyen H.T., Kreinovich, Applications of Continuous Mathematics to Computer Science, Kluwer, Dordrecht (1997).

14. Percival D.B., Walden A.T., Wavelet Methods for Time Series Analysis, Cambridge University Press (2006).

15. Rowe A.C.H. and Abbott P.C., Daubechies wavelets and Mathematica, Computers in Physics 9(6) (1995), 635–648.

16. Shan, Ma and Yang, Comparisons of wavelets, contourlets and curvelets in seismic denoising, Journal of Applied Geophysics 69(2) (2009), 103–115.