We study the effect of additive noise to the inversion of FIOs associated to a diffeomorphic canonical relation. We use the microlocal defect measures to measure the power spectrum of the noise in the phase space and analyze how that power spectrum is transformed under the inversion. In general, white noise, for example, is mapped to noise depending on the position and on the direction. In particular, we compute the standard deviation, locally, of the noise added to the inversion as a function of the standard deviation of the noise added to the data. As an example, we study the Radon transform in the plane in parallel and fan-beam coordinates, and present numerical examples.
The purpose of this work is to study how noise in discrete measurements affects the reconstruction in linear inverse problems
where A is a Fourier Integral Operator (FIO). Examples are the Radon transform and the geodesic X-ray transforms in two dimensions, at least, thermoacoustic tomography, and the linearization of some non-linear inverse problems like boundary and lens rigidity. We assume that A is associated with a local diffeomorphism (which condition can be relaxed to the clean intersection condition in principle), and elliptic. Then a parametrix exists, which we will denote by , also an FIO of the same type. One can regard the problem as mapping noise by FIOs, rather than under by their inverses but we keep the former point of view.
We want to emphasize that we are not trying to remove noise. That would be only possible with a priori, say statistical information about f, but this is not the goal of this work. On the other hand, understanding well the structure of the noise under the action of the inverse would allow for better understanding of what part of f (in phase space) is most affected by noise and would hopefully allow for more efficient noise reduction.
We study additive noise first. Such noise is typically created by noisy detectors which add certain constant (but usually low) noise to the signal or by background noise. In Section 7 we study examples of non-additive noise: multiplicative noise, Poisson noise as an example of modulation noise, and noise appearing in CT scan. In case of additive noise, we are given the noisy data , where (a function) is the noise. Then we are trying to solve
instead. The right-hand side (r.h.s.) may not be in the range of A so a solution may not even exist. What is often done is to apply the adjoint (assuming some Hilbert structure)
which automatically cuts the part of perpendicular to the range of A, and then invert , assuming that A is injective in the first place. If not, we invert on the range of . This can be viewed also as the least squares approximation, and it is what the Landweber iteration does, for example. So the inversion is
where is the so described least-squares reconstruction of f without noise. Of course, we could do a different “inversion”. One way to do it to choose a different Hilbert structure. What is described above is very common however and it is known as the Moore–Penrose inverse. We do not have to assume that the inversion is the Moore–Penrose inverse; it could be any parametrix of A, and the Moore–Penrose inverse is such a parametrix under the assumptions we made on A.
With the above considerations in mind, we can think of the added noise as
where, as above, is a parametrix, and is well-defined even if is not in the range of A. This is also the so described solution of
We can drop the and the notation now and just study (1.1) with g not necessarily in the range of A, i.e., g is the noise now.
The example we will use in this paper is the Radon transform in
where is the Euclidean line measure. It is written in “parallel geometry” coordinates. We study this example in more detail in Section 5; and in Section 6, we will study the same problem for the Radon transform in fan-beam coordinates. It is known that is an FIO of order with a canonical relation a graph of a local diffeomorphism (1-to-2). The most popular inversion formula is the “filtered back projection”
where is the transpose in distribution sense; and its versions with adding an additional filter. We view (1.6) as a unfiltered inversion and that with an additional filter, see (5.10), as a filtered one. Now, one can define a norm in the g space by . Then and (1.6) takes the form . Formula (1.6) is used all the time with noisy data not in the range of . In addition, we have . Therefore, the relation can be recast as , which is exactly (1.3).
On the other hand, we may assume that the natural space for g is . Then ; then the inversion is
This inversion formula is equivalent to (1.6). Note that the inverse is a version of (1.3) again.
Assume that in discrete measurements, the added noise consists of random variables with a known autocorrelation. The simplest case is independent identically distributed (i.i.d.) random variables at each “pixel” (white noise). The distribution could be Gaussian, uniform, etc. We convert the discrete measurements to a function on a “continuous”, space, i.e., locally a function on . Then we invert the data by applying a parametrix, as in (1.4). The discretization rate is assumed to be proportional to a small parameter and we are interested in the asymptotic properties as . Our main goal is a characterization of the induced noise after the inversion.
The novelties of our approach are the following. First, we view discretization and the inverse process – interpolation from a given discretization, as the step size tends to 0, in the semiclassical setting, where the small parameter is proportional to the step size. This point of view was proposed in the first author’s paper [17]. This allows us to use tools from semiclassical analysis to estimate the sharp sampling rate of , knowing the band limit of f, characterize aliasing artifacts during inversion if is undersampled, give a sharp limit of the resolution, etc. In this paper, we assume that we do not undersample .
The second novelty is moving the analysis of the spectral character of the noise to the phase space; roughly speaking, instead of localizing in the dual variable ξ only, to localize in both the spatial one x, and ξ. In the applied literature, there are two main ways to characterize noise: through its standard deviation (which assigns just one number) and through its power spectral density (or power spectrum). The latter is , where f is the noise, as a function of the frequency ξ. Knowing that, we can recover the standard deviation as well, by Parseval’s identity. Even though not always explicitly stated, when the noise is not expected to be homogeneous (translation invariant), one can localize in the base variable x by taking the modulus squared of the windowed Fourier transform with some . We propose going one step further: consider the power spectrum in the phase space of points x and (co)directions ξ. With the presence of the small parameter h, the natural framework is the semiclassical analysis again. The semiclassical version of localizing both in space and momentum is to localize near some in the x space with a smooth cutoff of size and then take the Fourier transform with ξ replaced by , see [19] for a discussion. The natural candidate of the power spectrum in the phase space then would be the so-called semiclassical defect measure which, roughly speaking, measures the spectral content of in the phase space. We call that measure power spectrum as well.
The third novelty is looking at the noise in ergodic sense, which we also call “spatial”, i.e., the noise in one measurement. There are two ways to look at the statistical properties of the noise. First, one might be interested in the expected value of the noise pointwise as we keep repeating the same experiment over and over again (in our context, if we have a series of noisy data sets and do an inversion for each one of them). We call this “temporal” view, and the analysis of the temporal properties is easier. In applications, we have one such experiment however. Our goal is to analyze the statistics of the noise in the inversion for a single experiment, as the sampling rate gets smaller and smaller, hence the term “spatial”. In statistics, an estimate with a single experiment is possible when the variables are i.i.d., and we rely on the ergodic properties of the sequence.
We start with analysis of discrete white noise. The flatness of its spectrum in temporal sense, see (8.6), is well-known, which justifies its name. In spatial (ergodic) sense, this is true only in a certain averaged sense, see Theorem 8.1. For the white noise interpolated to a “continuous” function, we show that the defect measure is flat as well in Theorem 4.1. In Theorem 4.2 we study the spectrum of more general, correlated noise.
Next, we study propagation of noise under FIOs (or simply A) of the mentioned type. With the semiclassical view of noise and is power spectrum, the analysis of the power spectrum of the result is reduced to the mapping property of a (semiclassical) defect measure under a (classical) FIO. The answer is given by the Egorov’s theorem with some extra care of the zero section. Then the tools described above would allow us the characterize the spectrum of the resulted noise in the reconstruction. We want to emphasize that even if we start with white noise g, which has a flat spectrum, the noise is not homogeneous in general – its power spectrum depends on the position x and the codirection ξ. In particular, its standard deviation may change from a neighborhood of one point to another.
As we mentioned already, our analysis is not restricted to (additive) white noise only, see also Section 7 for non-additive noise. We can have data with added non-white noise as well, as long as its power density in the general sense we consider it, is well defined. It could be pink, blue noise, if can be anisotropic noise, varying from point to point, or even noise corresponding to a non-absolutely continuous defect measure. For example, we may have the Radon transform with added noise depending on one of those two variables only, then the associated measure would be singular. Theorems 4.1 and 4.3 still apply and describe the power density of the noise in the reconstruction.
Instead of developing the general abstract theory further, we present its application to the inversion of the Radon transform in the plane. In “parallel geometry”, we show that the spectral density of the added noise is independent of the position x and proportional to up to the Nyquist limit (and the spectral power, which is the square of the density, is proportional to ). In “fan-beam coordinates”, the noise depends on the position x, on proportional to again but depends on the direction of ξ (relative to x) as well. We present many numerical simulations.
Noise is a major concern in the applied inverse problems and has been considered in the literature; nevertheless, we are not aware of directly related works. We will mention only a few more theoretical works about noise and inverse problems. Reconstruction of Riemannian manifolds with noisy data has been studied in [5]. Using noise a source for a reconstruction has been studied in [1,2,7,8].
The structure of the paper is as follows. In Section 2, we recall some basic facts about semiclassical analysis, needed for our exposition. We also study the relation between classical and semiclassical FIOs. In Section 3, we summarize and develop further some of the results in [17] about sampling in the semiclassical limit. In Theorem 4.1 in Section 4, we prove that the power spectral density of white noise is uniform, by computing its microlocal defect measure. We also show that more general noise satisfying some assumptions, has a well defined microlocal defect measure as well. Then we apply Egorov’s theorem to describe how that measure transforms under FIOs associated with a canonical diffeomorphism. Sections 5 and 6 are devoted to an application of the theory to the Radon transform on the plane in parallel and to fan-beam coordinates. We present many numerical examples as well. Multiplicative noise and other type of noise are analyzed in Section 7. Finally, in Section 8, we analyze discrete white noise without converting it to noise of a continuous variable. We show that it has flat spectrum on average.
Preliminaries on semiclassical analysis
We recall some basic facts from semiclassical analysis. For more details, we refer to [3,12,19]. Before that, a few words about the notation. All norms are in unless indicated otherwise; also . We denote by the Schwartz class; and is the space of the compactly supported distributions. For a linear operator A, is the transpose in distribution sense, while is the -adjoint.
Semiclassical wave front set
The semiclassical Fourier transform in of a function depending also on is given by
Its inverse is . We recall the definition of the semiclassical wave front set of a tempered h-depended distribution first. In this definition, can be arbitrary but in semiclassical analysis, is a “small” parameter and we are interested in the behavior of functions and operators as h gets smaller and smaller. Those functions are h-dependent and we use the notation or or just f. The Sobolev spaces are the semiclassical ones defined by the norm
Then an h-dependent family is said to be h-tempered (or just tempered) if for some s and N. All functions in this paper are assumed tempered even if we do not say so. The semiclassical wave front set of a tempered family is the complement of those for which there exists a function ϕ so that , and
in (or in any other “reasonable” space, which does not change the notion). The semiclassical wave front set naturally lies in but it is not conical as in the classical case. Elements of the zero section can be in .
Sjöstrand proposed essentially adding the classical wave front set to by considering the latter in , where the second space (the unit cosphere bundle) represents as a conic set, i.e., each with ξ unit is identified with the ray , . Their points are viewed as “infinite” ones describing the behavior as along different directions. An infinite point does not belong to the so extended if we have
with ϕ as above.
We define the symbol class of symbols in as the smooth functions on , depending also on h, satisfying the symbol estimates
for x in any compact set K, see, e.g., [6]. In fact, we are going to work with symbols supported in a fixed compact set in the ξ variable, so the behavior in ξ above does not matter; one may also work with the symbol class , see [12,19] where is defined as (2.2) with . Given , we write with
where the integral has to be understood as an oscillatory one. This is the standard quantization; sometimes it is convenient to work with the Weyl one , where is replaced by in (2.3). Then real symbols correspond to symmetric operators, in particular. Negligible operators are those with norms in any pair of Sobolev spaces.
Semiclassically band limited functions
In [19], it is said that a tempered is localized in phase space, if there exists so that
All functions in this paper will be of this type.
It is convenient to introduce the notation for the semiclassical frequency set of f.
For each tempered localized in phase space, set
In other words, is the projection of to the second variable, i.e.,
where . If (which is always closed) is bounded and therefore compact, then is compact.
We say that is semiclassically band limited (in ), if (i) is contained in an h-independent compact set, (ii) f is tempered, and (iii) there exists a compact set , so that for every open , we have
for every .
We showed in [17] that is semiclassical band limited if and only if it is localized in phase space and if and only of is finite (no points of the type (2.1)) and compact.
In applications, we take to be with some or the ball .
An example of semiclassically band limited functions can be obtained by taking any and convolving if with with . Then is semiclassically band limited with .
Classical ΨDOs as semiclassical ΨDOs
In the applications we have in mind, we deal with classical ΨDOs and FIOs and want to treat them as semiclassical ones. The negligible operators in the classical calculus are the smoothing ones. We showed in [17] that for every and for every smoothing K, we have . Next, every classical ΨDO of order m can be written as an oscillatory integral of the kind (2.3) with and a symbol vanishing for , plus a smoothing operator. Then formally, that oscillatory integral is an h-ΨDO with symbol . Then we can replace in (2.2) by to obtain an equivalent estimate, and
On the support of the symbol, we have , therefore the factor is not uniformly bounded near when . On the other hand, it is uniformly bounded when with . This allowed us in [17], for every , to split into an h-ΨDO with symbol with some cut-off function plus an operator mapping semiclassically band limited functions into functions with semiclassical wave front set in an neighborhood of the zero section. We show below that we can do the same thing for FIOs associated with canonical diffeomorphisms.
Let A be a properly supported FIO with a canonical relation which is a graph of a homogeneous canonical transformation. Then up to a smoothing operator, A is of the form
see [9, Section 25.3], with a a classical symbol and a phase homogeneous in η of odder 1, satisfying , for . The smoothing “error” can still be written in this form with (a ΨDO) and an amplitude of order , so the arguments below apply to it as well. Let have support in , on , and fix . Then , where
Under the assumptions above,
The operatoris an h-FIO with a (semiclassical) canonical relation the same as the (classical) one of A. Moreover, for every semiclassically band limited f with, we have.
For everywith support in some compact set independent of h, satisfyingfor some N, we havewith some.
Since A is properly supported, and f is either outside some fixed compact set in (a), or vanishes in (b), we can assume that the x–support of a is compact and independent of h as well. It follows from (2.2) that if a is a classical symbol of order m, then is a semiclassical one of order for . Therefore our claim (a) is true for and , which is true on the support of the symbol of . Hence is a semiclassical symbol of order . The second part of (a) is immediate.
To prove (b), multiply by and apply :
For the phase we have . By the homogeneity of ϕ, for , and , we have . Then a stationary phase argument implies for such ξ. This proves (b). □
Semiclassical defect measures
Given with , one can show that there exists a sequence so that the limit
exists for every symbol , see [12,19], and defines a Borel measure called a semiclassical defect measure associated to f. That measure may not be unique. Note that is invariantly defined on . On the other hand, its definition (2.7) depends on the choice of the measure (respectively the coordinates) used to define the space there. We can use every quantization of p in (2.7), for example the Weyl one which guarantees that (2.7) is real when p is real-valued.
When f is semiclassically band limited, is compact, hence has compact support as well, and
This in particular implies that our assumption guarantees that is asymptotically constant as . In fact, some authors require , see [12].
Sampling in the semiclassical limit
Sampling semiclassically band limited functions
We recall some results in [17] first. The classical Nyquist–Shannon sampling theorem says that a function with a Fourier transform supported in the box can be uniquely and stably recovered from its samples , as long as . More precisely, we have
where we adopt the “engineering” definition of the sinc function
Moreover,
where is the norm, see, e.g., [14] or [4].
The proof is based on viewing the samples as the (inverse) Fourier coefficients of , extended as -periodic function. We reproduce the proof below in the semiclassical case.
In [17], we formulated this, and related results in the semiclassical setting. One of those theorems is the following. Recall that is defined in Definition 2.1.
Letbe semiclassically band limited withwith some. Letbe supported in, andfor. If, thenand
One could think of as somewhat better versions of the sinc function: they decay faster if we choose to be smooth. We can do this because (which is compact) is assumed to be included in the interior of the closed . In case of an equality, we must take .
Assume now for simplicity that all and are equal to some B and s, respectively. We can always choose a linear transformation to get back to (3.2) or even more general sampling grids, and the dual one for the dual variables. Set . Then (3.2) and (3.3) take the form
and
The proof of Theorem 3.1 is based on the following observation. Since is supported in up to an error, and , we have
where is the periodic extension of with period in each 1D variable. Multiply this by to get
If is the interpolating function in (3.4), then
Take to complete the proof. The full details can be found in [17]. Also, χ does not need to be of product type, as shown there.
In the limit case , which is not allowed by the theorem since is compact, we have , where stands for the characteristic function of . Then χ is a product of sinc functions, see (3.1). We will use the notation
Then (3.7) takes the form
The functions form an orthogonal system, and
is an orthonormal basis in the subspace . For future reference, we want to mention that for every integer, if is defined by
then
instead. Then
is an orthonormal system in but not a basis. To make it a basis, notice that s was replaced by and k was replaced by there. Allowing the original k to run over all integer points, i.e., replacing k by there would complete (3.12) to a basis.
Constructing a semiclassically band limited function from a discrete sequence
The next question is how to associate a semiclassically band limited function to a set of numbers , , which we view as its samples. Without the band limited requirement, this can be done in infinitely many ways, of course, by various ways to interpolate between the samples. On the other hand, if we fix the band limit B, then for , such a function, if exists, would be oversampled, and those samples can be shown to be dependent. We can only hope that this problem always has a solution when . The next proposition, proven in [17], shows that it can be done when with a sinc interpolation.
Letbe both open. Fix. For, letbe the set of those k for which. Then for every collection of complex numbers,withtempered, there exists a semiclassically band limitedwithso that.
One such choice is given bywhereis equal to 1 near. Moreover, (
3.5
) holds.
With as above, we have and
compare to (3.6) and (3.9). Then . Let ψ be as in the theorem. Then has the required properties. □
By Theorem 3.1, (3.13) is the only such representation when is restricted to Ω, up to an error, if we want to keep the band limit B to be the sharp one .
The expansion (3.13) has the usual downsides associated with the presence of the sinc functions there – they decay too slowly at infinity allowing the influence of each term to extend too far. When (strictly), we can have the localized interpolation functions of Theorem 3.1 in principle. The situation is different than that in Theorem 3.1 though. The functions in (3.2) do not necessarily satisfy , where stands for the Kronecker symbol. In the case under consideration, they have to (up to an error). Also, when , the corresponding function would be undersampled rather than oversampled. Next, in interpolations like these, the desire is to make it as smooth as possible.
One way to enforce is to replace the sinc function in (3.12) with itself, multiplied by some with , , i.e., to put a product of for each point . Then has support larger than which corresponds to a band limit greater than , compare with (3.14). One can also have a rapidly decreasing instead of a one, and the resulting error by replacing it with a suitable a one can be estimated easily.
Lanczos-3 interpolation and other convolution based interpolations
One practical and approximate realization of the idea above is the Lanczos-3 interpolation. It is part of the family of the Lanczos-k interpolations with the number 3 below replaced by an integer k. In it, the functions in (3.2) are taken to be
where H is the Heaviside function, and x stands for each coordinate function . Its Fourier transform is not of compact support but decays like with a small leading term; and it is very small outside , see Fig. 1; as opposed to which Fourier transform is supported on . The kernel is easy to compute numerically, has a small support, and preserves the property because for k an integer. So for all practical purposes, choosing χ in (3.13) to be Lan3, provides an interpolation with a band limit no greater than and even ; 1.5 to 2 times that of (3.13), see Fig. 1.
If the samples in (3.15) with χ a Lanczos-3 kernel, are those of a function with a band limit (the Nyquist limit for that step size), then the reconstruction will leave frequencies below mostly unchanged, and will attenuate and alias those between and B as in Fig. 1. The resulting aliasing will be “small” because the amplitude is “small” away from (in Fig. 1, ). The created will have an essential band limit larger than B, as explained above, and this is true even if the samples are arbitrary.
The Lanczos-3 kernel Lan3 and its Fourier transform. The sinc kernel and its Fourier transform are shown as dashed lines.
The property of the Lanczos-3 kernel to be almost 1 in can be used to practical interpolations with an explicit kernel with small support approximating well enough in (3.2). For this, it is enough to oversample twice or even 1.5 times only in each coordinate and use the Lanczos-3 kernel. We use this technique in the numerical computations later. This way, we work with a very well localized kernel rather than with the sinc one.
The Lanczos-3 interpolation belongs to the family of the convolution based interpolations of the type
with various compactly supported kernels χ. It is easy to see that this is the case when the interpolation is translation invariant, has a finite domain of influence, and is a linear operator. The simplest examples are the nearest neighbor ( are characteristic functions of boxes in ) and the linear interpolation. Some of the higher order ones are the third order cubic Catmull–Rom spline and a fourth order cubic spline proposed by Keyes, see [11,13]. Without going into detail, we will mention that those two are very similar to Lanczos-2 and Lanczos-3, respectively with the Keyes one being a bit more smoothing that Lanczos-3. The Fourier transforms related to the cubic interpolations and the Lanczos-3 ones decay fast enough to be well approximated with compactly supported ones. Then we have the following.
Let be such that . Let be defined by (3.12). Then form an orthonormal system, see Remark 3.1. For
we have
Multiply (3.11) by , we get
where we used (3.7), valid for as in (3.15) with every χ as in the proposition. Therefore, . Hence it is easily seen that , where we recall that is defined by (3.16). Combining this with (3.17), we complete the proof. □
Noisy samples
Let us say we restore a semiclassically band limited function from noisy samples. Assume oversampling, i.e., (strictly). Without noise, we would use the formula (3.4) where are the samples which we call here . In other words, we would take as in (3.15) with χ so that
which we take to be in the Schwartz class; and we can do this since we can choose to be in .
If we do the same thing with the noisy samples, the added noise will be given by (3.15) again. Then would be true for the noise free samples since a priori, has a band limit B. This would not be true for the noisy samples, in general, because they are not necessary samples of such a function. In fact, one of the goals of the current contribution is to tackle this issue.
As in the proof of Proposition 3.2, note that (3.15) and the sinc reconstruction (3.13) are closely connected: one can get the former from the latter by applying the convolution operator to it.
Delta type of expansion
We can view the convolution based interpolation (3.15) as a convolved delta type of expansion in which χ is formally replaced by the Dirac delta. Indeed, start with
then with . On the Fourier side, we have (3.14) without the cutoff function there.
Noise and defect measures
Microlocal defect measures as a generalization of power density
We start this section by specifying the kind of white noise considered in the sequel, see also Section 6.
For every, the noise is modeled by a familyof independent and identically distributed (i.i.d.) real valued random variables defined on the same probability space. The random variableshave zero expected values and a common finite variance. For our computations we also make the following technical assumption on the common higher moments: there exists a constantsuch that
The variables model the noise at each cell/pixel , with the relative step fixed, and a small parameter. In Hypothesis 4.1 we allow to depend on h, but h will often be omitted for notational sake. In the numerical examples later, we use either normally distributed or uniformly distributed ones. For a fixed bounded domain Ω, the number of sampling points in it (we called that index set in Proposition 3.1) is . For each , only that many ’s will be used eventually; therefore, we have a triangular array of random variables , , .
As explained in the introduction, there are two types of statistical properties we are interested in. First, what we call “temporal” mean, variance, etc., are the moments of each as a random variable. They are determined by the process which creates them and in practical applications correspond to repeated experiments, hence the term “temporal”. We use the notation , , etc., for the expectation. The second, and the more interesting kind of properties are for a single experiment as , i.e., when the number of grows. The mean is just the mean of those finitely many numbers, and the variance is the mean of their squares. We call them empirical spatial mean and variance, using the notation VAR for the latter and STD for the spatial standard deviation. Limit theorems for averaged random quantities with certain invariances are called sometimes ergodic properties; we view them as “spatial” ones, interpreting as samples of some function in space. By the strong law of large numbers, the mean of of ’s converges to zero almost surely, and its spatial variance converges to almost surely, as . Below, we define similar quantities for continuous function-valued random variables.
Our terminology could be confusing since for random processes, that is families of real-valued random variables, t is naturally interpreted as a time parameter. However, in our case the parameter (denoted as x) is a spatial variable and has to be considered as a random field.
With Hypothesis 4.1 in hand, we think of each discrete noise as identified with a function as in (3.15) with some without necessarily assuming (3.19) for now. Clearly, , which is a temporal characteristic. We now state a lemma for the spatial mean and variance of .
Letbe a noise satisfying Hypothesis
4.1
, and define the functionaccording to (
3.15
). Then-almost surely we haveAs far as the spatial variance ofis concerned, we get the following-almost sure limit,
We will only prove (4.3), the proof of (4.2) being similar. To this aim, starting from (3.15) and using the fact that is an orthogonal system we get
where . Plugging (4.4) into the definition (4.3) of , we obtain
Taking limits in (4.5) now amounts to applying an almost sure limit theorem for the triangular array . This is ensured by the relation and classical theorems on strong law of large numbers for triangular arrays (see e.g [10, Corollary on p. 378]), as soon as the random variables satisfy Hypothesis 4.1. The proof of our claim (4.3) is now easily achieved. □
By (4.3), is bounded almost surely, therefore it almost surely has a microlocal defect measure (possibly not unique) associated to it. In this paper, we consider every such semiclassical defect measures , defined in Section 2.5, as a spectral density of . In Theorem 4.1 below however, we show that the limit is unique and it holds for every sequence in the case we consider.
One can see that makes sense as the variance density in the phase space. In fact for a domain Ω, the quantity
corresponds formally to p being the characteristic function of Ω, divided by , which would correspond to the usual variance definition if existed. We are not claiming that the latter limit exists however but when f is white noise, the defect measure exists as a limit in mean square sense, as we prove in Theorem 4.1 below. The superscript 0 in (4.6) is a reminder that this is a quantity in the limit . We want to emphasize that is just defined by (4.6) and (4.8) below for any f for which exists and it is not necessarily connected to any random f. When f is random (noise), is related to it as in Theorem 4.1 below. We define the standard deviation as the square root of the variance (with or without the superscript 0).
Assume now
with some continuous . Then taking the limit as Ω converges to a point, we set
Hence can be viewed as the asymptotic variance density of the noise at x.
A remark about the Wigner function
In this section, we will relate the Wigner function to the defect measures at a heuristic level. For a noise satisfying Hypothesis 4.1, we set
where is the Wigner function, see [2],
Note that is h-dependent and not a measure in general since it may take negative values. However, the existence theorem of defect measures says that there exits at least one sequence for which converges to some . Moreover, we have
In [2], de Verdière considers random vector fields , , and defines their auto-correlation by
Then he defines the power spectrum of f by
This lifts the notion of power spectrum to the phase space but the limit is not taken.
Following the steps of the forthcoming Theorem 4.1 and using crucially the fact that , we let the patient reader check that
where Q is defined by (4.19). Thanks to (4.9), this leads to the expected value of the Wigner function up to an error in a weak sense; and eventually, it could lead to the expected value of the defect measure, if we can take limits as in any reasonable probabilistic sense. There are several difficulties with this approach. We have to treat and estimate the remainder as a measure applied to p; different subsequences could converge to different defect measures for a fixed while the expected value applies to all such sequences, etc. The latter is the important reason we do not pursue this approach. In addition, the Wigner function method characterizes the power spectrum of the noise after repeated experiments (in temporal sense), while we want to study a single one (in ergodic sense).
The defect measure of white noise
Let , have values in R. As before, is a bounded domain. In the theorem below, given , we associate a semiclassically band limited function to by (3.15). This uses terms of the sequence . We allow to depend on h. Then we get a triangular array of random variables.
The following theorem is the main technical result of this paper.
Assume thatis a noise satisfying Hypothesis
4.1
, withmoments only. Namely the random variables,take values inRand are created by a white noise process with varianceand a bounded fourth moment.
Letbe the associated distribution given by (
3.20
) with some fixed. Then for every,where
Letbe the associated function given by (
3.15
) with some fixedand withnot necessarily satisfying (
3.19
). Then for every,where
Notice first that the l.h.s. of (4.12) is well-defined in distribution sense since the Schwartz kernel of , see (4.33), is Schwartz class. Let be such that . Then can be replaced by as in Section 3.5; which is (3.15). Therefore, we need to prove (b) only.
We start with the easier case when (3.19) is satisfied (with ). This corresponds to the practical situation of restoring an oversampled function with white noise added, and the theorem studies how the noise is added to the result.
Recall that the functions were defined in (3.8) and that form an orthonormal basis in the space , as mentioned earlier. The interpolation function χ satisfies by (3.19), therefore,
Since , we have
where, as before, , , and
with
We shall prove in Lemma 4.2 that . Our aim in (4.29) is to prove that in the sense we have
We now split the proof of (4.20) in several steps.
Step 1: A decomposition: Split the summation in (4.17) over elements on the diagonal and away from it:
where
Furthermore, according to (4.29) below we have
Thus owing to the fact that , we can recast (4.21) as
where the term is defined by
We are now reduced to prove that both and in (4.24) converge to 0 in .
Step 2: Analysis of: Observe that the random variables are independent, have zero expectation and a finite variance under our fourth moment assumptions. Then . Moreover, invoking the forthcoming inequality (4.28) and the fact that , we get
Therefore, converges to 0 as , in the sense.
Step 3: Analysis of: The random variables , , have expected values zero and variance . Next, and are not independent unless neither nor are equal to k or l but they are uncorrelated. Indeed, we only need to check that when, say and even then, because all have expectation zero. Therefore some elementary considerations, together with (4.28), reveal that
Therefore, in mean square sense.
Summarizing our considerations so far, the proof of the case when (3.19) holds is easily achieved by plugging (4.25) and (4.26) into (4.24).
Step 4: Dropping the assumption (
3.19
). Let m be such that . Let be as in (3.12). Then (4.16) takes the form, see also (3.18),
The necessary modifications of the proof above in this case are as follows. For the deterministic term featuring in (4.22) we have the same formula but now,
The set is an orthonormal system in but not a basis, see Remark 3.1. The missing elements are those with fractional indices in . Then there are many “gaps” in the sum compared to the one with a basis, giving us a trace as in Lemma 4.2. On the other hand, the extra factor in (4.27) allow us to think of each term as an approximation of all terms in a box around k of size one, which would add the missing terms. The error is (multiplied by the constant ), by (4.31). Since has points, this introduces an error, thus (4.14) is preserved. □
The following lemma was used in the proof above. Below, stands for the Hilbert–Schmidt norm.
Fordefined by (
4.18
), we havewhere q is the complete symbol of the h-ΨDO Q in (
4.19
). Next,whereis the inverse Fourier transform of p w.r.t. ξ
Inequality (4.28) follows directly from the fact that is bounded uniformly in h, see, e.g., [19, Theorem 4.21]. If we add the basis elements of to the terms in (4.18), we will get zero contribution, so we consider it done. Then the first equality in (4.29) follows by the definition of a trace. The second part follows from [3, Ch. 9].
To prove (4.30), write
see also the proof of [16, Theorem VI.23]. This proves the first part of (4.30). For the second part, notice that by [3, Ch. 9] again, the Hilbert–Schmidt norm of a classical ΨDO is given by
We can turn R into a classical ΨDO by setting formally to get
Combining this with (4.32), we complete the proof of (4.30) as well.
Finally, the Schwartz kernel of is given by
and is in the Schwartz class. Then
Make the change of variables , ; then , to get (4.31). □
(a) The presence of the parameter s in (4.15) is to be expected. The random sequence is not related to any distance scale, while is the distance between two adjacent points on the sampling grid after we associate to . Then s reflects the choice of that scale.
(b) For every x, we have, see (4.6),
in mean square sense, see also (4.10). In particular, if χ is a product of sinc functions, we get , i.e., has the same variance as that of , in a limit. If , then in one dimension. In dimension n, we have a product of such ’s, then the factor would be instead, therefore, . Note that there is no dependence on s here. For the linear interpolation, , therefore, . All those equalities are mean square limits in the sense of the theorem.
(c) If we are interested in the expected value of the variance in repeated experiments, the equivalent of (4.34) is easy to get. We can think of as a linear operator, say Ψ, applied to , i.e., . Then
where the latter norm is the Hilbert–Schmidt one. Then the equivalent of (4.34) can be derived from this formula. That requires repeated experiments however.
(d) The variance (4.6) is like the l.h.s. of (4.14) with p being the characteristic function of Ω divided by its volume. The theorem requires p to be smooth though, so we may think of (4.6) as an approximation of with (independent of ξ) approximating that normalized characteristic function.
(e) Theorem 4.1 says that the noisy in (3.15) converges in weak sense to .
(f) We can assume that the noise is not homogeneous, for example that are replaced by with some smooth ζ. This case can be handled as explained in Section 7.1, where and the problem with described there does not exist in this case. This would introduce the extra factor in (4.15). In principle, one can consider noise inhomogeneous in phase space, i.e., ζ being a suitably sampled ΨDO or an h-ΨDO.
In Fig. 2, we present an one dimensional numerical example. In Sections 5 and 6 we show two-dimensional ones. We take a discrete with components, upsize it to a 200 point grid with the Lanczos3 algorithm, and plot , where the hat stands for the Discrete Fourier Transform, then the same quantity computed as a square root of averaged over and experiments, for frequencies in . This illustrates (4.11). The limiting profile looks very close to the profile in Fig. 1, right, as expected from our Remark (e) above. At the right hand side of the plot, it is not as close to zero as the profile in Fig. 3 because of the error in (4.11); here only. The plot on the right is essentially the expected value of the Wigner function .
Plot of for , with averaged over 1, , and experiments.
In Fig. 3, the setup is as above but we show the smoothing effect of averaging the power spectrum within a single experiment, illustrating relation (4.14). To this aim we consider with , , and components. The frequency interval is divided into 25 subintervals and averaged there, similarly to Fig. 16. The plot on the left is very close to the plot of the modulus of the Fourier transform of the Lanczos3 filter in Fig. 1.
Plot of with a single experiment, for , with averaging over 25 subintervals.
Microlocal defect measure of more general noise
We consider more general noise now. First, we assume that the random variables might be correlated with the neighboring ones; and second, we assume that this correlation might be position dependent. Since the position of would be at , this more general noise would be assumed to satisfy the following.
For every, the noise is modeled by a familyof real valued random variables defined on the same probability spacewith zero expected values. They are all assumed to satisfy (
4.1
) with a uniform bound. For the autocorrelationwe assumewhere,,, is smooth in x, and supported in a bounded set w.r.t. both variables.
Note that we are no longer requiring, in particular, to have the same variance. They are not identically distributed, in general.
Let
be the inverse Fourier series of β with respect to the m variable. This is essentially the Wigner distribution related to the auto-correlation, in the limit . Since , we must have for all . Then (4.36) is just a cosine series, and in particular real. The theorem above shows that it is in fact non-negative.
The generalization of Theorem 4.1 to this case is the following.
Assume thatis a noise satisfying Hypothesis
4.2
, withmoments only. Letbe the associated function given by (
3.15
) with some fixedand withnot necessarily satisfying (
3.19
). Then (
4.14
) remains true with
We follow the proof of Theorem 4.1. We replace the diagonal Δ in it by , where M is so that for . The off–Δ terms do not contribute to the limit (4.14) as above. For the rest, we estimate their contribution for every fixed m, and then sum up the results. The analog of now, depending on m, is
where
The analysis of is similar: the random variables have zero expectation, thus . They have a uniformly bounded variance. To estimate , notice that only terms in the expansion would have a non-zero expectation; and by (4.28), again. It remains to compute the β term in (4.38).
Recall the definition (4.18) of . With there, an easy calculation shows that for any h-ΨDO , with (which is a symbol as well, notice that there is no h in the phase). The principal symbol of that is just . Then the β term in (4.38) takes the form
By the properties of , recall (3.8) and (3.9), replacing above with would result in an error in each term, and a total error . Considering this done, and moving the β factor to the right, we get a quadratic form with q multiplied by :
where .
So far, m was fixed. Summing over m (the number of those terms is ), we get to the situation of the proof of Theorem 4.1 with q replaced by
The theorem then follows as in the proof of Theorem 4.1. □
Spectral density under an FIO
We want to find out how a spectral density transforms under an action of a classical FIO of order m. It is easier to answer this question for semiclassical FIOs since the defect measures are a semiclassical object, and we will reduce the classical case to the semiclassical one.
Let A be a classical FIO of order m onwith a homogeneous principal symbol associated with a canonical relation which is a graph of a local diffeomorphism κ (called the canonical transformation of A). Letbe semiclassically band limited and uniformly bounded in. Then for every defect measuregiven as the limit (
2.7
) for some, the defect measureassociated to the same sequenceexists as well and it satisfieswhere b is the (classical) principal symbol of.
By (2.7),
Since we need to find away from the zero section, it is enough to assume that near .
If for a moment we ignore the need to cut near , then we can think of A as in (2.5) as an h-FIO with symbol for . Then by the semiclassical Egorov’s theorem [12, Theorem 5.5.5], which an analog of the classical one, (Theorem 25.3.5 in [9]), we would get
where Q is an h-ΨDO with a principal symbol , with b the (classical) principal symbol of and κ is the canonical transformation of A. Note that the canonical relations of A and its semiclassical version after the change are the same.
To deal with the fact that we have a classical FIO and a semiclassical ΨDO, we apply Theorem 2.1. Let be as in (2.6). For , the remainder would contribute an error to (4.39) if we replace A there by because near . Therefore, we can consider this done. Then is an h-FIO, see (2.6) with symbol supported where . On the support, , and there, is homogeneous for ; therefore
Then we can apply the semiclassical version of Egorov’s theorem [12, Theorem 5.5.5]. For that, we need to compare the principal symbol of the h-ΨDO to that of the classical ΨDO and see how the cutoff near the zero section affects that.
The principal symbol of is given by
where is the projection on the fist variable, and is a smooth Jacobian, homogeneous of order zero w.r.t. ξ, depending on the phase function only. For we have ; therefore
This is the principal symbol of as a classical ΨDO as well without the factor . Therefore, the limit of (4.40), as , would be
as long as for . Make the change of variables , and using the fact that κ is symplectic, in particular an isometry, we would get
when for , where is the pull-back under . Since is arbitrary, this holds when . Then
So far A was microlocalized near pair of points, where κ is a (global) diffeomorphism. Since it is only a local one, we can do the same for each branch, and add the results. Then b would be the principal symbol of with all branches combined, as stated. □
The proof also implies that if and are h-ΨDOs, then
where b still denotes the principal symbol of .
Take (i.e., a multiplication) with r smooth. Then, up to , equality (3.15) for takes a similar form but now are replaced by . This is an example of non-homogeneous noise, depending on the position, for which Theorem 4.1 applies but then the measure is as in (4.42).
Let R be a convolution with with some . This is an h-ΨDO with symbol , therefore we get the factor in (4.42). An elementary computation shows that, up to , is obtained from . Those are correlated (in general) random variables. They model sensors with cross-talk. Then Theorem 4.1 applies with the measure is as in (4.42).
Both examples are covered by Theorem 4.2 as well if you think of as f but generated by correlated noise .
Back to the inverse problem
We return to the inverse problem (1.1) now. Let A be an FIO as in Theorem 4.3, and elliptic. More precisely, let be a bounded domain, and let be another such domain so that the canonical transformation κ of A maps into . By a compactness argument, if A is defined first as , then the range of κ projected to its base variable is a bounded set, thus such an exists. Outside , the image of A is smooth. The measurement g, supposedly equal to for some but corrupted by noise, is a function defined in . Then (1.1) is microlocally solvable: (we do not have problems with g not being in the range because is a parametrix) and we are in the situation above with A replaced by . The added noise is given by (1.4). Dropping the subscript “noise” as we already did, we assume that g is given first as discrete noise and then converted to a semiclassically band limited function g as in (3.15). Then
We have not defined what noise is but we can think of this as noise because it is a linear combination of with random coefficients. It has zero mean in the sense of (4.2). Then
where κ is the canonical transformation of A and b is the principal symbol of . By Egorov’s theorem again applied to the operator , the principal symbol of it is that of multiplied by . Therefore, is the principal symbol of .
The defect measure (4.43) then describes the power spectrum of the noise in the reconstruction away from the zero section. We cannot expect to get an estimate near the zero section in this case since A may not be even injective. For example, the interior region of interest problem for the Radon transform in the plane has no unique solution and the practical solution is a parametrix. Then every element in the kernel would be smooth and could be considered as noise with zero frequency.
Next theorem is a direct consequence of (4.43). The operator Q is needed to cut the zero section, and R is a filter which we may want to apply to the data, see also next section. Below, stands for the principal symbol of Q.
Let A be as above, and elliptic, and letbe semiclassically band limited with, uniformly bounded in. Ifis any h-ΨDO inwith an h-independent symbol, and ifis a similar h-ΨDO in Ω withnear the zero section, thenfor every g (called there f) as in Theorem
4.3
.
By Remark 4.2 about Theorem 4.3 and (4.6),
Make the change of variables , where are the variables in the phase space of g, using the fact that κ is symplectic, and therefore an isometry, to get the second equality of the theorem. □
A typical use of this theorem is to take q to cut off smoothly a small neighborhood of the zero section. Then, for g being white noise, for example, the effect of that on the r.h.s. would be small. Then if we formally take , hence , we get a good approximation of the variance of the noise in the reconstruction away from the zero frequency noise, by Theorem 4.1. The operator R plays a role of a filter before the inversion.
We want to emphasize that g in Theorem 4.4 does not need to be white noise; we just need a well-defined , which is the case for noise satisfying Hypothesis 4.2, by Theorem 4.2.
In some situations, like in the next two sections, the requirement near the zero section can be removed, and the whole operator Q can be removed (replaced by ). Assume that the filter r is compactly supported in the dual variable. Since we deal with semiclassically band limited g, we can always assume that. Assume that is absolutely continuous near the zero section. In the case of the Radon transform in parallel geometry in the next section, for example, with g being white noise, that measure is , so this assumption is satisfied. Then the first integral in (4.44) has a limit when q (a priori vanishing near ) tends to 1, and that limit is given by the same formula with . Then the l.h.s. has the same limit, too„because we just defined it by that equality, see (4.6). A similar remark applies to the second integral.
The Radon transform in “parallel geometry”
We apply the theory to the Radon transform now. We study the parallel geometry parameterization first, where each (directed) line is parameterized by its signed distance p to the origin, and its normal ω, see (1.5). For
we choose the natural measures ; and the standard measure for p. Based on that, we a define the microlocal defect measure of . If we restrict p to , corresponding to Radon transforms of functions supported in , since φ naturally belongs to (modulo ) (call that Ω), then
The Radon transform is an FIO of order with a canonical relation given by the union of canonical relations corresponding to the canonical transformations
The ranges of intersect in the zero section only, and in particular, on the range of . Next, each branch is a local diffeomorphism. Indeed, is given by
It is well defined for but if we want x in the image to be in , we need to require ; therefore are well defined away from the zero section. Then is associated with , which is a local diffeomorphism as well. What prevents it from being global is that it is 2-to-1, i.e., and in particular, it is not injective.
The unfiltered inversion
The symbol of is , where is the dual of p. Applying the canonical transformation, we get . We could have obtained this as the principal (and full) symbol of . Therefore, by (4.43),
The fact that κ is 1-to-2 presents some subtlety here, already accounted for in the proof of Theorem 4.3. Microlocally, one can express as ; then each has normal operator with principal symbols one half of that ; then we apply (4.43), and the combined result would be still the principal symbol of .
Let us say that we have f supported in with a certain semiclassical band limit . We take its Radon transform . Here, f is not discretized, we can think of as the physical X-ray transform. The assumption on the band limit will be satisfied if the X-rays are not really ideal lines but have some thickness. Then we sample densely enough to satisfy the Nyquist requirements and add noise to it. The noise will have higher frequencies than those coming from f if is oversampled. When we invert , we will get higher frequencies for f as well that do not originally belong to the set where the frequency set of f lies. We can apply a filter, cutting them to . Note that this is a filter not affecting f, that is why we think of those as a unfiltered inversion. One way to do this is to restrict to before applying in (1.6).
More precisely, let and
Then the range of the frequency sets of all such f’s (the projection of the semiclassical wave front set on the fiber variable) of is the double cone
included in the box , see Fig. 4 and [17] for more details. The set (5.5) is the “worst scenario case” over all points . For , the opening of the cone is much smaller: . We refer to [17] and Fig. 3 there. This describes the range of κ. Therefore, some portion of the noise will not propagate back to the reconstructed f.
We assume that we sample at a rate smaller than the Nyquist requirement for the box . Moreover, we assume an interpolation kernel χ in (3.15) (with f replaced by g) is chosen so that in a neighborhood of . As we explained in the introduction, we assume that the data is (white) noise, since the problem is linear. Then the power spectrum of the noise (more precisely, the Wigner function) converges in mean sense to a defect measure that is absolutely continuous by Theorem 4.1, i.e., it has the form of the kind (4.7) on , with as in (4.15). Then on , we have , and
This is “blue noise”. Here and below, all equalities about the statistics of f are in the limit sense of Theorem 4.1, see (4.6) and (4.8). An important observation is that there is no x dependence in this case. The dependence on ξ is rotationally invariant. This is not the case with the Radon transform in fan-bean coordinates as we will see below.
The frequency set of .
By (5.2),
The two variances are equal because is independent of the position.
Assume that the sampling rates of g are based on and which take their sharp values not to allow undersampling: , , where B is the band limit of f as in (5.4). Then
Note that this is actually the sharp lower bound of the variation when the oversampling becomes asymptotically sharp sampling but it is not achievable in our theory; this would require a sinc interpolation while we need a rapidly decreasing kernel.
For the variance of , we have, see Theorem 4.4 and Remark 4.3,
We get the following theorem.
(unfiltered inversion).
Under the assumptions above, in particular assuming that g is white noise, and no undersampling, we haveIfis sampled sharply, then
Recall that we defined , see (4.6), and similarly, , as integral of the defect measure. The implication of this theorem is that when we have g created by a white noise process, then for every with near the origin, converges in mean square sense to a quantity (see (5.13)), which itself converges to the r.h.s. of (5.8), respectively (5.9), when . In other words, the cutoff near is removable at the expense of taking a double limit: first , then (in sense).
The filtered inversion
The Radon transform is inverted often with a low-pass filter before applying in (1.6), i.e.,
where ν is an even function decaying away from the origin. Assuming a band limit for the p variable, determined by the sampling rate , for example, one popular filter is the Hann filter:
and otherwise. Another commonly used filter is the cosine one
They are plotted in Fig. 5.
The Hann and the cosine filters with .
There are many other filters (windows) used in signal processing and imaging. We assume that ν is continuous and supported in . If the shape of the filter is fixed, say Hann, then with some fixed supported in , see, e.g., (5.11). Then (5.3) takes the form
where is the filtered inversion, defined as the operator applied to g in (5.10). Then the equivalent to (5.6) is
Taking as before, similarly to (5.7) we get the following analog of (5.7)
where
We proved the following.
(filtered inversion).
Under the assumptions above, in particular assuming white noise and no undersampling, with a filter, we haveIfis sampled sharply, then
If there is no filter (), we have , which explains the appearance of the factor in the definition of . For the Hann filter, , then . For the cosine filter, . In (5.19) below, the constant would be approximately 0.07676 for the Hann filter and 0.11327 for the cosine one.
Numerical experiments
We use MATLAB and the built in radon and iradon routines to compute and invert numerically the Radon transform in the plane. The default angular step is one degree but it can be changed. Assume that f is given on an lattice. Then by default, radon computes on a lattice, with rounded; the actual formula is . Then iradon inverts the data to the original grid (with N replaced by or which does not matter in view of our asymptotic setup).
As we showed in [17], this choice of the discretization of is suboptimal for ; we need to compute on an lattice with , at least, and some oversampling would be beneficial, see Fig. 6. With most test images, the (dominating) frequencies are well below the Nyquist limit, that is why most of the time the inversion is satisfactory. When we add, say white noise, the Nyquist limit is reached, and the inversion with iradon will alias some of those frequencies.
The sampling sets of f, and the reconstructed f with sharp sampling requirements.
Discretization
Let us say we have f on an grid. We think of that as discrete samples of f originally defined on, say, . This we have the steps , . Assume for a moment that we apply the classical sampling theory (no small parameter h) in a formal way at this point. Then those steps have to be , respectively at most, where are the band limits in the variable. Then we get , as the least upper bounds of the band limits of f. For the band limit of , we have , and the maximum is achieved at the vertices of the box . Note that the disk contains more frequencies than can be properly sampled on the grid; the extra ones lie outside that inscribed box.
We can connect the classical sampling theory to the semiclassical one as follows. Denote for a moment the semiclassical quantities with tildes over them. Let , , with , fixed. The steps s (, etc.) are equal to the semiclassical relative steps but since in our sampling theorems the absolute steps are , this means that the absolute steps are multiplied by h. Then our analysis holds as , i.e., as , (keeping the ratio constant) and the steps going to zero at a rate . This is the usual setup in numerical analysis where , i.e., the step is h.
For each such f we define the norm as
This is consistent with formula (16) in [17] and approximates the norm of a continuous function on that box with samples . Then
is the standard deviation of f when the mean of f is zero.
We will apply this to both f defined on for some , and to on .
Assume that g is a discrete representation of a function on sampled on an lattice. Assume g is obtained by a white noise process (with zero mean) and variance . Then a slight extension of Lemma 4.1 shows that almost surely.
The sampling steps are , ; hence to avoid aliasing, we need , .
Let f, to which will be applied, represent a discretization of a function on , and assume that it is sampled on an lattice. Then, similarly, the sharp band limit in each variable is .
As we showed in [17], and it follows easily from (5.5), to avoid aliasing, we need
This inequality, as well as the inequalities and the equalities below are meant in asymptotic sense, i.e., one should multiply, say the r.h.s. in this case by , as . Note that (5.16) follows from viewing f as supported in , i.e., , with frequency set in . As we mentioned above, that ball contains more frequencies than those in its inscribed square. For every in the range of with f as above, after an inversion we get f, of course, and then the frequencies will fall inside the inscribed square . If we take g to be “noise”, not in the range of , then by the mapping property of , see [17], formula (51), the frequency set of will generically fill the disk . If we want to avoid aliasing (without applying a filter), we would need to reconstruct on an grid or better. On the other hand, for all practical purposes, we would want to apply a filter.
Therefore, the discrete version of (5.15), including a filter now, is
where the formula has the same asymptotic and probabilistic meaning as explained after Theorem 5.1.
Assume now that we sample sharply, i.e., we have equalities in (5.16). Then , and we get
Therefore,
We can make the following conclusions from (5.17), (5.18) and (5.19).
With a sharp sampling rate, the noise ratio, measured as its standard deviation relative to that of g, increases as . This is understandable since we are allowing for higher frequencies, and is of order . At the same time, we can handle f with higher frequencies because N is proportional to the Nyquist bound.
The noise ratio, for a fixed N, is minimized when we sample sharply.
In many applications, increasing and decreases the size of the detectors, and then the discrete samples are scaled down by constants times and . If the added noise is expressed in units relative to that, then the quotient in (5.17) would be proportional to , i.e., the noise ratio increases with and . This is known in engineering.
Left: where and g is white noise. Right: radial profile of from the center to one of the sides (but not all the way along the diagonal to a vertex).
Defaultiradoninversion. First we present an inversion with the default one degree angular step. We choose , by default and is chosen by radon as an approximation to . We choose g to be normally distributed (Gaussian) noise with standard deviation one. Then we invert it with iradon. A plot of the modulus of the Fourier transform of the inversion f is shown in Fig. 7. We chose to plot here and below rather than for clarity. With an exact inversion, as , we should be seeing a density plot of square root of (5.6), i.e., , filling the whole square. We see is that the density increases in the radial variable from the center but at some point starts to decease until it visibly becomes zero when is slightly larger than a half of the side, and it is radially symmetric. This behavior can be explained by the following. The default choice (rounded) of actually lowers the Nyquist limit of the reconstructed f to of its original value. Without that, the boundary of the disk in Fig. 7 would be the circumscribed circle of that square but with that choice, it is the inscribed one. The gradual decrease close the border can be explained by an effectively low pass filter when inverting . Our numerical experiments below at much higher resolutions for confirm that.
A similar experiment with a uniformly distributed noise g in a symmetric interval around the origin produces virtually the same plot of , not shown. In both cases, the values of f look normally distributed.
High precision inversion. We present numerical inversions with a proper discretization. We want to model adding noise to discrete measurements of the “continuous” ; inverted with high precision; i.e., by upsampling first the discrete data several times to mimic inversion in the “continuous domain”. We do the following.
The function f is assumed to be defined on and sampled on an lattice.
We compute a high accuracy on a lattice, where , . To do that, we perform the computations on a finer grid.
We add noise to the so-computed .
We invert the noisy data by upsampling it first. The reconstructed is either left sampled on a finer grid or downsampled to the original one.
We give more details below. To do (ii), we upsample f on an lattice with Lanczos-3 with some . Typical m’s we use are and . Then we compute with radon on a lattice which we view as on sampled uniformly in each variable. The parameter m represents the degree of oversampling: corresponds to the sharp lower bound for proper sampling. Since computing involves interpolation of f for computing the line integrals (we use the option ‘spline’ in radon), such an oversampling allows us to reduce the errors in such interpolation compared to the sinc inversion. Then we downsample the computed to a lower resolution (without interpolation; we take every m-th value in each row and column). This simulates a high precision computed on the grid. To do (iii), we add noise.
In (iv), we invert on that lattice. We could resize to a different (but high enough resolution) before that but the results do not look much different. The resulting is computed on an lattice, which is viewed as f on sampled uniformly. If needed, that f could be resampled to an lattice but since it does not contain frequencies higher than the Nyquist limit corresponding to N, this is not needed for computing the standard deviation, for example.
We want to emphasize that it is possible to do (close to) ideal upsampling, say from to with and which preserves the band limits and by using the Fourier transform. On the other hand, this is not what is usually done. When we use Lanczos-3, for example, the interpolation kernel is the inverse Fourier transform of a smoothened version of , see Fig. 1, which is close to be equal to one in at least as explained in Section 3.3. On the other hand, Theorem 3.1 in [17] requires some oversampling, and an interpolation kernel to be the Fourier transform of a function similar to that in Fig. 1, equal to one on the (smaller) frequency band. Therefore if we choose we are in this regime.
To do experiments with noise only, we take in (1.2). Then steps (i) and (ii) are trivial, since . So our starting point is (iii), where we take g to be generated by either a normally or a uniformly distributed noise, on an grid. We upsample by a factor of m, i.e., to an grid an do the inversion there. We take in our experiments.
Non-filtered inversion. We test (5.8) now. To this end, we take g to be either Gaussian or uniformly distributed noise with zero mean on an grid as in (ii), with equalities there, i.e., , . Then we cut the Fourier transform of the result sharply to -th of the frequency box corresponding to the original resolution , ; denote this by , and apply to it without changing the grid size. This procedure provides more precise computation than just inverting the noise because it avoids the smoothing which happens in the part we cut off. If we had of a non-zero f polluted with noise, we would have upsized the data m times in each dimension first, and then would have performed that procedure.
Noise experiments
Noise ratio with Gaussian noise. Theoretical ratio:
Top: we choose , to be white noise; then looks like in Fig. 7. Bottom: and , i.e., the Fourier transform of the reconstruction after the frequency cut-off of the noise.
Since we effectively multiply both and by m, by (5.17), we see that (5.19) can be written in terms of the noise ratio as
We take g first to be a Gaussian noise with several choices of N and m; doing five experiments for each choice. The results are in Table 1 and in Fig. 8, we illustrate the inversion with . Similar experiments with a uniformly distributed noise with mean zero generate similar numbers, not shown.
Filtered inversion. We perform similar experiments with the Hann and the cosine filter. Since the Hann filter is very small near the band limit B, see Fig. 5, the smoothing effect of the interpolation used by iradon, see Fig. 7, plays a negligible role. Modeling that smoothening by the Lanczos-3 profile, for example, see Fig. 1, by introducing an extra factor in (5.14) shows an error of less than in . Then even with , we get a result close to the theoretical one, which is approximately 0.07676 for the Hann filter and 0.11327 for the cosine one, as we computed above. For , for example, we get for Hann and for cosine, where the smoothing effect of iradon is a bit less compensated for. The numbers for normally and uniformly distributed noise are very close. For the cosine filter, we plot (instead of for clarity), the computed radial profile of , and its theoretical one in Fig. 9 below. The radial profile is computed as averaged over 25 concentric rings. In this case, is proportional to the microlocal defect measure of f at any fixed x (it does not depend on x). The Hann filter behaves similarly, with the computed radial profile of very close to its theoretical one .
Cosine filter. Left: where and g is white noise. Center: the computed radial profile of from the center to one of the sides. Right: the theoretical profile .
Percentage of added noise
In many numerical simulations, we add noise to the data, as a percentage of a certain norm of the data, and measure the percentage of the noise in the reconstruction. This is especially interesting in (mildly or not) ill-posed problems.
There is a lot of flexibility in choosing those norms. Let us say that we choose the norm for f and the norm for . Then the left inverse is not bounded in those spaces but on semiclassically bounded functions (which are smooth), it is; we refer to [17] for semiclassical estimates.
Let be the noise added to , see (1.2). Its percentage is given my (converted to percentage). We are interested in , where is the noise in the reconstruction. We have
The coefficient K is the multiplier which relates the two percentages. Its first factor is proportional to the noise ratio we studied earlier since the norms are proportional to the standard deviations. The second one depends on f. To analyze it, write
where the convergence is in the sense of Theorem (4.1). Then we integrate over the semiclassical wave front. By (4.41),
We used again the fact that κ is an isometry. This works for general operators but for we actually know that . We can write , intertwine with , to get the formula above as an exact one, not just a limit. Therefore, (5.21) yields
Since the noise ratio is independent of f, we see that K would be large if, roughly speaking, f is low frequency. Most conventional images (with ) have a very large zero frequency relative to the rest of the spectrum and the second quotient in (5.22) does not vary much. When , we have and functions the variation of this quotient is higher. Then we do not need to isolate the zero section.
In Fig. 10 we demonstrate this effect. We choose and the dimensions of the grid for is chosen with equalities in (5.16), see also Fig. 6. We add the same amount of normally distributed noise, of , to . We measure different percentages of added noise to the reconstructed f depending on the frequency distribution of f, i.e., on the ratio in (5.22). Images with mostly lower frequencies suffer from noise more. On the other hand, given the a priori knowledge of their frequency band, that noise can be filtered out, unless we are looking for small high frequency detail in an overly lower frequency image. We chose non-negative f’s in that figure only. Numerical experiments with f of mean value zero show lower added noise on a few examples. If in Fig. 10(c) we allow random positive and negative amplitudes as well, for example (not shown), the added noise in (g) drops to . It is worth mentioning that with many conventional images, the values we are getting are close. In fact, statistically, such images share similar power spectra distributions [18].
Top: four different choices of , . Bottom: f reconstructed with noise added to . The numbers show the added noise to f, and .
Therefore, measuring the sensitivity of a particular inversion to noise this way can be quite misleading. The added noise to the image depends on the noise ratio (5.20) which in turn depends on the grid chosen to discretize ; and also depends on the choice of the test image.
The Radon transform in the plane in fan-beam coordinates
in fan-beam coordinates
We parametrize by the so-called fan-beam coordinates. Recall (5.1). Each line is represented by an initial point on the boundary of , where f is supported, and by an initial direction making angle β with the radial line through the same point, see Fig. 11. It is straightforward to see that this direction is given by . Then the lines through are given by
The canonical relation is the union of the graphs of , see [17], given by
Then are isomorphic under the symmetry mentioned above lifted to the tangent bundle
The inverses are given by
In particular, we recover the well known fact that κ is 1-to-2, as in the previous case.
The fan-beam coordinates.
Set , where
We have . Then . To compute , write
Since , we have . Therefore,
The factor in the middle of the r.h.s. is a multiplication operator, and applying Egorov’s theorem (one can actually do it even directly and without a remainder), one gets for the principal symbols, at least,
The equivalent to (5.3) then is
Therefore, the noise spectral distribution depends on x now, and it depends on the direction of ξ relative to x. For x, fixed, it is maximized when , and minimized when .
Sampling
As above, if is sampled on an grid, we have . As before, set . Then we consider f having in with . The image of this product under the canonical map, projected to the dual variable has the following smallest box containing it: , see [17]. The means taking at least , i.e., samples over the intervals indicated in (6.1). Compared to (5.16), this requires π times the number of samples, which makes it a less efficient sampling geometry, as shown in [17].
In Fig. 12, we present a numerical experiment to validate (6.3). We take g to be Gaussian noise and invert it with . Then we crop a small rectangle in the top left corner and take the modulus of its Fourier Transform. Then x is close to and the small black elongated oval in the center has a major axis along the same vector, formula (6.3) predicts.
Spectral density of the noise in f with the Hann filter. Left: measured in the top left corner. Right: theoretical profile (6.3) at that corner.
Noise ratio
We study the noise ratio with a filtered inversion. In ifanbeam in MATLAB, for example, is converted to parallel coordinates and the filter is applied after that. By (6.2), the filter , with ν even, takes the form , where B is the band limit of . The inversion operator then is which equals modulo lower order operators by Egorov’s theorem. We get, similarly to (5.12) that (6.3) modifies as
Assume that g is oversampled (related to B, see [17] for the sampling requirements), and it is white noise. Then the variance at a point, see (2.8) is given by
compare with (5.13). The integral is of elliptic type and varies between , when , and 4 when . To connect this to (5.13), the integrand in (5.13) there corresponds to formally; and then we get (5.13). Taking a square root, we see that the standard deviation would be higher in the center, the same as in the parallel geometry case, and will decrease slightly to about at , which corresponds to the four corners of the square in our numerical simulations.
Similarly to (5.7) and (5.13), we integrate over x in the inscribed disk in and divide by its area π to get the variation in that disk. Then and
This is within 6-7% of the parallel geometry variance, and about difference for the standard deviation.
Non-additive noise
In this section we discuss some types of non-additive noise. The exposition here will be more sketchy, we will point out how to fit those cases into the general framework we developed but will not go into detail.
Multiplicative noise
Assume the data is subject to a multiplicative noise. This can happen if the detectors are not perfectly calibrated and each one reports a signal somewhat larger or smaller than it should be (non-uniform response). In imaging systems, photo response non-uniformity (PRNU) is an example of such noise. A generic way to model the kind of multiplicative noise we have in mind is the following: consider a sequence of discrete noise samples , where
and is the white noise considered in Hypothesis 4.1. Then we set
where are the discrete noise samples, playing the role of above, and g is the noise-free continuous signal. We will compare the noise defined by (7.2), i.e., by the first formula in (7.1) (the second one can be treated similarly and one has to take into account that is not necessarily centered), to a noise of the form
We have
Since
with , we get
That factor h allows us to estimate, using Proposition 3.2, the error when replacing in (7.2) by . We would get an error. The problem here is that we want to apply this to , all dependent on h, and in general, grows like . This cancels the decay above. If we oversample a lot, the error will be “small”. Also, in regions with far away from the Nyquist limit, that term will be small. If we ignore it for a moment, the noise added to is given by (7.3). It is white noise as above but multiplied by . The defect measure of the noise added to the data then is like in (4.15) with the additional factor .
One important case which allows to overcome the difficulty above is when , where with (a Friedrichs mollifier) with . This corresponds to averaged measurements of an h-independent function . We refer to [17] for the sampling theory for such measurements. Then , and assuming (either h-independent or uniformly bounded there in h), we have (rather than ), which is the estimate we needed in the previous paragraph. Then the machinery we developed works and we need to multiply the noise measure by the additional factor in (4.15), i.e., we get there
under the assumption in (7.2). Then (5.13) takes the form
This shows that the standard deviation of the noise at x depends in particular on the line integrals of f along lines thorough x. Line integrals with large values would create stronger noise at x.
Modeling noise in CT scan
In CT scan tomography, what is measured is the attenuation along each ray. If is the initial intensity, and I is the one after the ray crosses the object, then the measurement is , by the Beer-Lambert law. Assuming an additive noise , we measure . If we invert this the same way as if there were no noise (which may not be the best strategy), we would get
Obviously, increasing will decrease the effect of the added noise but in many applications this is not desirable and/or the noise level may depend on . We take , i.e., is the added noise relative to . Then
Therefore,
If the noise is small enough, we can pass to a linearization to get
This is the multiplicative noise model above with replaced by . In (7.4), for example, the factor would be replaced by .
Modeling Poisson noise
In SPECT, we measure the attenuated X-ray transform but the particle count at each detector is low. In this case, the predominant noise is of Poisson type: the number of particles at each detector is randomized by a Poisson distribution with probability of taking value k being , where is the expected value at that detector, see [15, Section 4.5]. Both the expected value and the variance of P equals λ. Then the particle count at each detector equals , where w is a random variable with zero expected value and variance 1. Note that the probability distribution of w depends on λ and approximates a Gaussian one when and they are independent. Assuming locally averaged measurements, as above, we would get added noise when the units for are the number of particles; and α times that in general with some . Note that are not identically distributed (but Theorem 4.1 still applies) and are well approximated by Gaussian distributions when is not very small. The microlocal measure then would have the factor (we assume , thus ). This is similar to multiplicative noise, where the factor was proportional to .
Shepp–Logan with the Hann filter and with (a) multiplicative noise; (b) CT type of noise; (c) Poisson noise.
Three disks with the Hann filter and with (a) multiplicative noise; (b) CT type of noise; (c) Poisson noise.
Numerical examples
We present numerical simulations with the three types of non-additive noise in Figs 13 and 14. The phantoms are the Shepp–Logan one and three disks of different size and intensity, not shown there, both phantoms having ranges between 0 and 1. They are both rendered on a grid discretizing the square . Their Radon transforms are computed with 1,884 angular steps and 600 steps in the p variable covering the diagonal of the square. To simulate multiplicative noise, we choose the variance of in (7.2) to be 0.2. To simulate CT noise, we use the non-linear model (7.5) (rather than the linearization (7.6)) with . In the Poisson noise case, each value of is randomized as follows: ; it is worth noticing that ranges from 0 to 0.51 in the Shepp–Logan case and to 0.56 in the disks case. We chose the noise parameters so that the noise would be of similar strength, visibly, in all three cases, and the distribution is Gaussian. Hann filter is applied to the inversion.
Note that the noise has different character compared to Fig. 10(h), for example, and one can see individual lines (more precisely, line segments) in it. The multiplicative and the Poisson noise characters are somewhat similar; while the CT noise in the middle looks different. Our analysis shows that in the latter case, the standard deviation of the added noise in the linearization regime has range from to about times , see (7.6), while in the other two cases, the range is from 0 to a certain positive constant, which allows for almost zero noise locally before inversion. For this reason, individual lines are harder to distinguish in the CT case.
Discrete noise and its power spectrum
In this section, we analyze discrete white noise directly, without converting it to a continuous function. Here, is a random vector on an grid which we denoted by before. We will denote by the discrete delta function on . In Section 8.1, we follow mainly [15, Chapter 12], where f is a random variable depending on a (continuous) variable t; but most of it adapts to the discrete setting easily. We do a temporal analysis of the power spectrum for each fixed (discrete) frequency, with N fixed. We show that the spectrum of white noise is flat in the sense of expected value over repeated experiments, and we consider more general noise. On the other hand, for each experiment, the spectrum is quite, well, noisy and does not appear to smoothen as numerically. In the second part, we study the ergodic properties of the power spectrum, with a single experiment, as . We show in Theorem 8.1 that the power spectrum is flat on average. That theorem is an analog of Theorem 4.1.
We want to emphasize that depends on N, so we have a “triangular” array of random variables depending on the random outcome and increasing their size with N.
Temporal analysis
The discrete analog of the Fourier transform is the Discrete Fourier Transform (DFT) described below. It lives naturally on the discrete torus with period N. This shows that any time the DFT is used for spectral analysis, the original f is actually regarded as the restriction of a periodic function on a fundamental domain. We consider , and we denote ; with each element , a random variable in the same probability space. First, N will be fixed but eventually, we will take . We denote by the vector defined by , i.e., this is the multiplication of the functions of a discrete argument. Similarly, is the vector with components , while is the norm of f.
We define the (unitary) Discrete Fourier Transform (DFT) by
Its inverse is the adjoint one
Parseval’s equality takes the form
for complex-valued f and g, where the dot-product is the natural one in . In particular, is unitary. There is a natural (circular) convolution defined, and we have
Next, we have
For each f with random entries, as above, define the auto-correlation
The auto-covariance is defined as the auto-correlation of the centered f, i.e., of , and it is easy to see that
this terms comes from 1D processes, where x is the time.
if is a function of only:
where, with some abuse of notation, we used the same notation ACor on the right. A process f is called white noise if
Then we must have
with . We always assume that white noise has a zero mean. The process is wide-sense stationary (WSS) if it is stationary and its mean is constant. Then white noise is WSS if σ is constant. Note that WSS does not mean that are independent from each other but if they are independent, they are uncorrelated, i.e., (8.4) holds.
Let be the DFT of the auto-correlation of f, see (8.2), with respect to :
Then
In case of white noise satisfying (8.5), we have , thus we recover Theorem 11.2 in [15]:
This shows that even when f is not stationary, is stationary with auto-correlation . If , then each has standard deviation σ, and , i.e.,
In particular, for all m, which shows a flat (expectation of a) spectrum. By Theorem 11.3 in [15], if f is real and Gaussian, then the covariance of and equals , as we also show below. In particular, if in (8.5), we get covariance . Therefore, they are correlated when and (because is even) with standard deviation for each Fourier coefficient except for the zeroth one when it is . In fact, we do not need f to be Gaussian to have the same conclusion on asymptotic sense. We assume f real from now on.
Let f be real valued white noise with a finite fourth moment called. Then
We have
The only non-zero expectation terms are those with two (equal or not) pairs of equal indices. Assume first that , . Then we have two cases for the expectation term above:
If , the expectation term equals .
Whenever , this expectation term is the fourth moment .
The latter number of terms is , while the former is . Therefore, this set of indices contributes
to the sum.
Consider the terms with , but with to exclude the previous case. Then the corresponding sum is
We performed the change , above, used (8.1) and compensated for the added terms corresponding to in the second sum which are missing from the first one.
Finally, when , but , the dot product in the phase function becomes and the same argument gives us
The analysis of those three cases completes the proof. □
If f in Proposition
8.1
is normal, then the last term in (
8.7
) vanishes.
The proof follows from the well know fact that for normal distributions.
The results in Proposition 8.1 can be interpreted as follows. Up to an error , we get auto-covariance if and when (symmetry, because f is real), and if . If we stay in a fundamental domain of the type then the symmetry becomes .
Ergodic analysis. Flatness of the power spectrum on average
Let α be a locally Riemann integrable function on , periodic of period 1 in each variable. Assume that f is real valued. We are interested in the following linear functional
This is a discrete analog of (4.14) with p there depending on the dual variable only. It is a weighted (not normalized) average of the power spectrum. What we do there is effectively rescale the spectrum from the integer points in (and then extended by periodicity) to the ones with fractional components of the kind , forming a dense set in asymptotically. In statistics, this is done routinely in the study of peridograms, and is replaced by a continuous variable.
Assume a white noise process (8.5) with . Then , as by (8.6), where the integration is taken over the continuous torus in with period one.
The random variables have zero expectation, correlation given by Proposition 8.1, and variance . Write
The first term is a Riemannian sum. The second term has zero expectation and variance
The error terms come from the cross terms which are products of and , by Proposition 8.1. There are of them. Therefore, we proved the following.
Letbe a white noise process on(depending on N), with varianceand a finite fourth momentum. Then for every Riemann integrable function α onwe havewhere the integral is taken over the torus inwith period one.
Therefore, the measure converges weakly to in mean square sense. In particular, if we take α to be the characteristic function of, say a box U in , then the average of the power spectrum on U tends to in mean square sense.
More general noise
We assume now that the random variables depend on h, have zero mean and have uniformly bounded fourth momenta but are not necessarily independent or equally distributed. If we assume (4.35), then the power spectrum is expressed in Theorem 4.2. One special but important case is when the auto-correlation is space independent (stationary, see (8.3)), then β in (4.35) is independent of x and we have
with some . Then (8.8) takes the form
where is as in (4.36). In other words, the limit measure is .
Numerical examples
We illustrate the temporal behavior of the spectrum first. In Fig. 15, we take a random normally distributed vector f with and variance . The power spectrum is plotted next to it. As we can see, it looks flat on average with mean value close to one but the variation is substantial. On the right we plot the histogram of the (spatial) standard deviation over experiments; it appears to have mean 1. We recall that is the square root of
We have not proved a limit for it though. That would require estimating the auto-correlation of the summands above similarly to Proposition 8.1.
Left: a random normally distributed vector, , . Center: plot of for indices from 0 to 100 ( is an even function with period 200). Right: the histogram of over experiments; it appears centered around 1.
Next, we illustrate the spatial (ergodic) behavior of the power spectrum. The averaged power spectrum for a normally distributed vector is shown in Fig. 16. We divide the interval into 25 subintervals and average in each one of them. We take and . As we can see, the averaged spectrum gets flatter and flatter. This illustrates Theorem 8.1.
Plot of the averaged for and .
If we keep N fixed but average over many experiments, the spectrum gets flatter as well numerically, as (8.6) suggests.
Footnotes
Acknowledgements
The authors would like to thank Kiril Datchev for his advice and Magda Peligrad for making us aware of the reference []. P.S. partially supported by the National Science Foundation under grant DMS-1900475. S.T. partially supported by the National Science Foundation under grant DMS-1952966.
Y.C.De Verdière, A semi-classical calculus of correlations, Comptes Rendus Geoscience343(8–9) (2011), 496–501. doi:10.1016/j.crte.2011.03.002.
3.
M.Dimassi and J.Sjöstrand, Spectral Asymptotics in the Semi-Classical Limit, London Mathematical Society Lecture Note Series, Vol. 268, Cambridge University Press, Cambridge, 1999.
4.
C.L.Epstein, Introduction to the Mathematics of Medical Imaging, 2nd edn, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
5.
C.Fefferman, S.Ivanov, M.Lassas and H.Narayanan, Reconstruction of a Riemannian manifold from noisy intrinsic distances, SIAM J. Math. Data Sci.2(3) (2020), 770–808. doi:10.1137/19M126829X.
6.
C.Gérard, Asymptotique des pôles de la matrice de scattering pour deux obstacles strictement convexes, Mém. Soc. Math. France (N. S.)31 (1988), 1–146.
7.
T.Helin, M.Lassas and L.Oksanen, Inverse problem for the wave equation with a white noise source, Comm. Math. Phys.332(3) (2014), 933–953. doi:10.1007/s00220-014-2115-9.
8.
T.Helin, M.Lassas, L.Oksanen and T.Saksala, Correlation based passive imaging with a white noise source, J. Math. Pures Appl.9(116) (2018), 132–160. doi:10.1016/j.matpur.2018.05.001.
9.
L.Hörmander, The Analysis of Linear Partial Differential Operators. IV, Vol. 275, Springer-Verlag, Berlin, 1985, Fourier integral operators.
10.
T.-C.Hu and R.L.Taylor, On the strong law for arrays and for the bootstrap mean and variance, Internat. J. Math. Math. Sci.20(2) (1997), 375–382. doi:10.1155/S0161171297000483.
11.
R.Keys, Cubic convolution interpolation for digital image processing, IEEE Transactions on Acoustics, Speech, and Signal Processing29(6) (1981), 1153–1160. doi:10.1109/TASSP.1981.1163711.
12.
A.Martinez, An Introduction to Semiclassical and Microlocal Analysis, Universitext, Springer-Verlag, New York, 2002.
13.
E.Meijering and M.Unser, A note on cubic convolution interpolation, IEEE Transactions on Image processing12(4) (2003), 477–479. doi:10.1109/TIP.2003.811493.
14.
F.Natterer, The Mathematics of Computerized Tomography, B. G. Teubner, Stuttgart, 1986.
15.
A.Papoulis and S.U.Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn, Tata McGraw-Hill Education, 2002.
16.
M.Reed and B.Simon, Methods of Modern Mathematical Physics. III, Academic Press [Harcourt Brace Jovanovich, Publishers], New York–London, 1979, Scattering theory.
17.
P.Stefanov, Semiclassical sampling and discretization of certain linear inverse problems, SIAM J. Math. Anal.52(6) (2020), 5554–5597. doi:10.1137/19M123868X.
18.
A.Van der Schaaf and J.van Hateren, Modelling the power spectra of natural images: Statistics and information, Vision Research36(17) (1996), 2759–2770. doi:10.1016/0042-6989(96)00002-8.
19.
M.Zworski, Semiclassical Analysis, Graduate Studies in Mathematics, Vol. 138, American Mathematical Society, Providence, RI, 2012.