Abstract
In many applications, including analysis of seismic signals, Daubechies wavelets perform much better than other families of wavelets. In this paper, we provide a possible theoretical explanation for the empirical success of Daubechies wavelets. Specifically, we show that these wavelets are optimal with respect to any optimality criterion that satisfies the natural properties of scale- and shift-invariance.
Formulation of the problem
first, we use the values q (t1), …, q (t
k
) observed during some observation period to determine the values of these parameters, i.e., to solve the system of equations
then, we use the resulting values c
i
to predict the to predict the future values q (t) of the quantity q as q (t) = f (t, c1, …, c
n
).
Sometimes, the dependence on the parameters c
i
is non-linear –so this system of equations is not easy to solve. However, if this is how q depends on time, there is nothing we can do about it.
In many other cases, however, we do not know the shape of the dependence. In such cases, it is also desirable to come up with a general formula f (t, c1, …, c
n
) –with not-too-many parameters c
i
–that would adequately describe the dynamics of the quantity of interest. In such situations, it is reasonable to select a family for which the corresponding system of equations (1) is the easiest to solve –i.e., is a system of linear equations. For this purpose, we select a family for which the dependence f (t, c1, …, c
n
) is linear in terms of the unknowns, i.e., for which
Such representations are indeed actively used in data processing. For example, for a smooth dependence q (t), it is reasonable to approximate it by a polynomial –i.e., by the sum of the first few terms of its Taylor expansion. In this case, the functions f i (t) are monomials t j corresponding to non-negative integers j. For a periodic process with a known period T, we can use sines and cosines sin(j · ω · t) and cos(j · ω · t) for non-negative integers j, where ω = 2π/T, etc.
However, many physical processes –e.g., seismic processes –are neither smooth nor periodical: they consist of time-localized bursts of activity. To describe such processes, it makes sense to use similarly time-localized functions f i (t). Such functions are known as wavelets; see, e.g., [5, 14].
It turns out that we can select wavelets that satisfy a similar property –namely, we select the basic function φ (t) known as the mother wavelet, and take the functions f i (t) of the form φ (2 j · t - ℓ), where j ≥ 0 and ℓ are integers. (There are also similar functions generated by another related function, known as the father wavelet.) For the resulting functions f i (t) to be efficient in representing and processing data, the mother wavelet must satisfy a certain linear functional equation.
This functional equation has many different solutions. Empirically: wavelets corresponding to some solutions work better, while wavelet corresponding to other solutions of the functional equations do not work so well.
To select a single solution –and thus, to fix a family of wavelets –we need to impose additional restrictions on the function φ (t). To make computations easier –and to preserve linearity of the corresponding system of equation –it makes sense to impose restrictions which are linear in terms of φ (t). A general such linear restriction has the form
Once we know one solution φ0 (t) to the system (3) of linear equations, we can have an even simpler system of linear equations for the difference
Of course, once the equality ∫c (t) · Δ (t) dt = 0 holds for all M functions c1 (t), …, c
M
(t), the same equality holds for all possible linear combinations
Ingrid Daubechies proposed to use c
m
(t) = tm-1, i.e., equivalently, to take, as L, the linear space of all polynomials
In particular, in many problems related to processing seismic signals, Daubechies wavelets work very well, much better than many other wavelet families; see, e.g., [1–3, 16] and references therein.
Specifically, we show that these wavelets are optimal with respect to any optimality criterion that satisfies the natural properties of scale- and shift-invariance.
The structure of the paper is as follows. In Section 2, we analyze the problem and, as a result of this analysis, formulate this problem in precise mathematical terms. Section 3 contains the main result –an explanation of why Daubechies wavelets are optimal –and the proof of this result.
Comment. This diffentiability requirement makes sense: e.g., it is known that every continuous function c (t) can be approximated, with any given accuracy, by smooth functions. Since from the practical viewpoint, a very small difference is not noticeable, it thus makes sense to assume that all the functions c (t) are differentiable.
However, a reader should be warned that it is not possible to follow this argument too far: Actually, some wavelets are not smooth. Even Daubechies wavelets of higher order M, while smooth, are not infinitely differentiable: if we differentiate the corresponding mother wavelet again and again, we will eventually reach a function which is not differentiable at some points.
there is a numerical characteristic F (A) describing the imperfection of different alternatives, and the alternative Aopt has the smallest value of this characteristic.
For example, for different wavelet families A, we can take, as F (A), the mean squared accuracy with which the use of the first few wavelets from this family approximates signals from the given set of signals.
However, this is not the only way to describe optimality. In the above example, we may have several different families with the same smallest possible value of the mean squared accuracy. In such a case, we can use this non-uniqueness to minimize some other characteristic G (A): e.g., the average computation time needs to get the corresponding approximation. We then say that the alternative A is better or of the same quality as an alternative B –we will denote it by A ≤ B –if: either F (A) < F (B), or F (A) = F (B) and G (A) ≤ G (B).
If this additional numerical criterion does not lead to a unique selection of an alternative, we can minimize something else, etc., until we reach the final optimality criterion –for which there is exactly one optimal alternative.
No matter how complex our comparison, in all these cases, we have a relation A ≤ B between the two alternative describing that A is better or of the same quality as B.
Of course, each alternative has the same quality as itself A ≤ A, and if A ≤ B and B ≤ C, then A ≤ C. Thus, we arrive at the following definition.
By an optimality criterion, we mean a binary relation ≤ on this set which satisfies the following two properties:
for every
for all An alternative Aopt is called optimal with respect to the optimal criterion ≤ if we have Aopt ≤ A for all An optimality criterion ≤ is called final if for this criterion, there exists exactly one optimal alternative.
However, the numerical value of time depends on the selection of the measuring unit and on the selection of the starting point. If we replace the original unit with a new one which is a times smaller –e.g., consider seconds instead of minutes –then all numerical values of time are multiplied by a. The corresponding linear transformation t ↦ a · t is known as scaling.
Similarly, if we replace the original starting point for measuring time with a new starting point which is t0 moments earlier, then this value t0 will be added to all numerical values of time. The corresponding linear transformation t ↦ t + t0 is known as shift.
In general, if we change both the unit and the starting point, we replace t with a · t + t0 – i.e., we get a linear transformation.
The numerical values change, but the physical process remains the same. From this viewpoint, it is reasonable to require that the relative quality of two different methods should not change if we simply change the unit and/or the starting point. In terms of linear spaces –that describe different wavelet families – we thus arrive at the following definition.
By a linear transformation, we mean a function T (t) = a · t + t0 for some values a and t0. For each linear transformation T and each function e (t), by the result T (e) of applying T to e we mean a function e (T (t)). For each M-dimensional linear space L of smooth functions, by the result T (L) of applying T to L we mean the linear space formed by the functions T (e) for e ∈ L. We say that the optimality criterion ≤ on the set
Now, we can formulate our main result.
Main result
Comments.
Thus, we have indeed proven that the linear space corresponding to Daubechies wavelets is optimal –and thus, so, in this sense, Daubechies wavelets are optimal. The following proof follows ideas first described in [13].
Indeed, the fact that Lopt is optimal means that Lopt ≤ L for all families L, in particular, for all families of the type T-1 (L), where T-1 is the inverse transformation. So, Lopt ≤ T-1 (L) for each L. By using invariance of the optimality criterion, we conclude that T (Lopt) ≤ L for every L, i.e., that the linear space T (Lopt) is also optimal. However, the optimality criterion ≤ is final, which means that there is only one optimal space, so indeed, T (Lopt) = Lopt.
Let us now select any basis e1 (t), …, e
M
(t) in the optimal linear space. Invariance of the linear space Lopt means, in particular, that for each i and for each t0, the shifted function e
i
(t + t0) also belongs to this linear space, i.e., that
It is known that a general solution to such a system is a linear combination of expressions of the type t
k
· exp(α · t), where: the value α is an eigenvalue of the matrix ∥c
ij
∥, and the value k is an non-negative integer which is smaller that the multiplicity of this eigenvalue.
Similarly, another consequence of invariance is that for every i and for every a, the function e
i
(a · t) also belongs to the optimal space Lopt, i.e., that i.e., that
One can check that the only way to have a function representable both as a linear combination of these expressions (14) and a linear combination of expressions t k · exp(α · t) is when in the formula (14), we have k = 0 and α must be an integer. So, each function e i (t) is a linear combination of monomials t k –i.e., a polynomial.
To complete the proof, let us show that all polynomials can only have degree <M.
Indeed, suppose that the optimal linear space Lopt contains a polynomial of degree d, i.e., a function
These d + 1 polynomials e(0) (t), …, e(d) (t) are all linearly independent: indeed, each linear combination
According to linear algebra, in an M-dimensional space, we can have no more than M linearly independent elements, so here we have d + 1 ≤ M, thus d ≤ M - 1, hence indeed d < M.
The proposition is proven.
Footnotes
Acknowledgments
This work was supported in part by the National Science Foundation grants 1623190 (A Model of Change for Preparing a New Generation for Professional Practice in Computer Science), and HRD-1834620 and HRD-2034030 (CAHSI Includes), and by the AT&T Fellowship in Information Technology.
It was also supported by the program of the development of the Scientific-Educational Mathematical Center of Volga Federal District No. 075-02-2020-1478.
The authors are thankful to the anonymous referees for valuable suggestions.
