Sampling linear inverse problems with noise

Abstract

We study the effect of additive noise on the inversion of Fourier integral operators (FIOs) associated with a diffeomorphic canonical relation. We use microlocal defect measures to describe the power spectrum of the noise in phase space and analyze how that power spectrum is transformed under the inversion. In general, white noise, for example, is mapped to noise that depends on both the position and the direction. In particular, we compute the local standard deviation of the noise in the reconstruction as a function of the standard deviation of the noise added to the data. As an example, we study the Radon transform in the plane in parallel and fan-beam coordinates, and present numerical examples.

Keywords

Noise; Fourier Integral Operator; microlocal; inverse problem

1. Introduction

The purpose of this work is to study how noise in discrete measurements affects the reconstruction in linear inverse problems
$$A f = g, \tag{1.1}$$
where A is a Fourier Integral Operator (FIO). Examples are the Radon transform, the geodesic X-ray transform (at least in two dimensions), thermoacoustic tomography, and the linearization of some non-linear inverse problems such as boundary and lens rigidity. We assume that A is elliptic and associated with a local diffeomorphism (a condition that can, in principle, be relaxed to the clean intersection condition). Then a parametrix exists, which we denote by $A^{-1}$; it is also an FIO of the same type. Since $A^{-1}$ is itself such an FIO, one could regard the problem as that of mapping noise by FIOs rather than by their inverses, but we keep the inverse-problem point of view.

We want to emphasize that we are not trying to remove noise. That would only be possible with a priori (say, statistical) information about f, which is not the goal of this work. On the other hand, a good understanding of the structure of the noise under the action of the inverse would show which part of f (in phase space) is most affected by noise, and would hopefully allow for more efficient noise reduction.

We study additive noise first. Such noise is typically created by noisy detectors, which add a certain constant (but usually low) noise to the signal, or by background noise. In Section 7 we study examples of non-additive noise: multiplicative noise, Poisson noise as an example of modulation noise, and noise appearing in CT scans. In the case of additive noise, we are given the noisy data $g + g_{noise}$, where the function $g_{noise}$ is the noise. Then we are trying to solve
$$A f = g + g_{noise} \tag{1.2}$$
instead. The right-hand side (r.h.s.) may not be in the range of A, so a solution may not even exist. What is often done is to apply the adjoint (assuming some Hilbert structure),
$$A^{*} A f = A^{*} g + A^{*} g_{noise},$$
which automatically cuts off the part of $g_{noise}$ perpendicular to the range of A, and then to invert $A^{*} A$, assuming that A is injective in the first place. If not, we invert $A^{*} A$ on the range of $A^{*}$. This can also be viewed as the least-squares approximation, and it is what the Landweber iteration does, for example. So the inversion is
$$f_{recovered} = (A^{*} A)^{-1} A^{*} g + (A^{*} A)^{-1} A^{*} g_{noise} = f_{0} + (A^{*} A)^{-1} A^{*} g_{noise}, \tag{1.3}$$
where $f_{0}$ is the least-squares reconstruction of f without noise described above. Of course, we could do a different “inversion”; one way to do so is to choose a different Hilbert structure. The procedure described above is very common, however, and is known as the Moore–Penrose inverse. We do not have to assume that the inversion is the Moore–Penrose inverse; it could be any parametrix of A, and the Moore–Penrose inverse is such a parametrix under the assumptions we made on A.
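
As a finite-dimensional illustration of (1.3), the sketch below assumes a hypothetical random matrix A with full column rank in place of an FIO. It checks that $(A^{*}A)^{-1}A^{*}$ coincides with NumPy's pseudo-inverse, that the reconstruction splits into the noiseless part plus $(A^{*}A)^{-1}A^{*} g_{noise}$, and that the part of the noise orthogonal to the range of A is annihilated. It is not meant to model the continuous problem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy forward operator: 40 measurements, 10 unknowns, full column rank.
A = rng.standard_normal((40, 10))
f_true = rng.standard_normal(10)
g = A @ f_true                              # exact data, in the range of A
g_noise = 0.1 * rng.standard_normal(40)     # additive noise, generally not in Ran(A)

# Least-squares inverse (A^* A)^{-1} A^*; for real matrices A^* = A^T.
A_ls = np.linalg.solve(A.T @ A, A.T)

f_recovered = A_ls @ (g + g_noise)
f_noise = A_ls @ g_noise                    # noise in the reconstruction, cf. (1.3)-(1.4)

# Component of the noise orthogonal to Ran(A).
P_range = A @ A_ls                          # orthogonal projector onto Ran(A)
g_perp = g_noise - P_range @ g_noise

print(np.allclose(f_recovered, f_true + f_noise))   # True
print(np.allclose(A_ls, np.linalg.pinv(A)))         # True: the Moore-Penrose inverse
print(np.linalg.norm(A_ls @ g_perp))                # round-off level
```

The explicit normal-equations formula is used only to mirror (1.3); for ill-conditioned matrices an SVD-based solver such as `np.linalg.pinv` or `np.linalg.lstsq` is the safer choice.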

With the above considerations in mind, we can think of the added noise as
$$f_{noise} := A^{-1} g_{noise}, \tag{1.4}$$
where, as above, $A^{-1}$ is a parametrix; $A^{-1} g_{noise}$ is well defined even if $g_{noise}$ is not in the range of A. It is also the solution, in the sense described above, of
$$A f_{noise} = g_{noise}.$$
We can now drop the $f_{noise}$ and $g_{noise}$ notation and just study (1.1) with g not necessarily in the range of A, i.e., g is now the noise.

Example 1.
The example we will use in this paper is the Radon transform $R$ in $\mathbb{R}^{2}$,
$$R f(\omega, p) = \int_{x \cdot \omega = p} f(x)\, d\ell, \quad p \in \mathbb{R},\ \omega \in S^{1}, \tag{1.5}$$
where $d\ell$ is the Euclidean line measure. It is written in “parallel geometry” coordinates. We study this example in more detail in Section 5, and in Section 6 we study the same problem for the Radon transform in fan-beam coordinates. It is known that $R$ is an FIO of order $-1/2$ whose canonical relation is the graph of a local diffeomorphism (1-to-2). The most popular inversion formula is the “filtered back projection”
$$f = \frac{1}{4\pi} R' |D_{p}| g, \quad g = R f, \tag{1.6}$$
where $R'$ is the transpose in the distribution sense, together with its versions with an additional filter. We view (1.6) as an unfiltered inversion, and the one with an additional filter, see (5.10), as a filtered one. Now, one can define a norm in the g space by $\| |D_{p}|^{1/2} g \|_{L^{2}(\mathbb{R} \times S^{1})}$. Then $R^{*} = R' |D_{p}|$ and (1.6) takes the form $f = (4\pi)^{-1} R^{*} g$. Formula (1.6) is used all the time with noisy data not in the range of $R$. In addition, we have $R^{*} R = 4\pi\, \mathrm{Id}$. Therefore, the relation $f = (4\pi)^{-1} R^{*} g$ can be recast as $f = (R^{*} R)^{-1} R^{*} g$, which is exactly (1.3).
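
One way to verify $R^{*} = R'|D_{p}|$ is sketched below, assuming that the norm above comes from the inner product $(g_1, g_2)_{H} := (|D_{p}| g_1, g_2)_{L^{2}(\mathbb{R} \times S^{1})}$, that $|D_{p}|$ is symmetric on $L^{2}$, and that the $L^{2}$-adjoint of R equals $R'$ because R has a real kernel:
$$(R f, g)_{H} = (|D_{p}| R f, g)_{L^{2}(\mathbb{R} \times S^{1})} = (R f, |D_{p}| g)_{L^{2}(\mathbb{R} \times S^{1})} = (f, R' |D_{p}| g)_{L^{2}(\mathbb{R}^{2})}.$$
Hence $R^{*} = R'|D_{p}|$, and (1.6), which holds for every f, is precisely the identity $R^{*} R = R'|D_{p}| R = 4\pi\, \mathrm{Id}$.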

On the other hand, we may assume that the natural space for g is $L^{2}(\mathbb{R} \times S^{1})$. Then $R^{*} = R'$ and $R^{*} R = 4\pi |D|^{-1}$, so the inversion is
$$f = \frac{1}{4\pi} |D| R' g, \quad g = R f.$$
This inversion formula is equivalent to (1.6). Note that this inverse is again a version of (1.3).

Assume that, in discrete measurements, the added noise consists of random variables with a known autocorrelation. The simplest case is independent identically distributed (i.i.d.) random variables at each “pixel” (white noise). The distribution could be Gaussian, uniform, etc. We convert the discrete measurements to a function on a “continuous” space, i.e., locally a function on $\mathbb{R}^{n}$. Then we invert the data by applying a parametrix, as in (1.4). The sampling step is assumed to be proportional to a small parameter $h > 0$, and we are interested in the asymptotic properties as $h \to 0$. Our main goal is a characterization of the induced noise $f_{noise}$ after the inversion.

The novelties of our approach are the following. First, we view discretization and the inverse process (interpolation from a given discretization), as the step size tends to 0, in the semiclassical setting, where the small parameter $h > 0$ is proportional to the step size. This point of view was proposed in the first author’s paper [17]. It allows us to use tools from semiclassical analysis to estimate the sharp sampling rate of $A f$ knowing the band limit of f, to characterize aliasing artifacts in the inversion when $A f$ is undersampled, to give a sharp limit of the resolution, etc. In this paper, we assume that $A f$ is not undersampled.

The second novelty is moving the analysis of the spectral character of the noise to the phase space; roughly speaking, instead of localizing in the dual variable ξ only, we localize in both the spatial variable x and ξ. In the applied literature, there are two main ways to characterize noise: through its standard deviation (which assigns just one number) and through its power spectral density (or power spectrum). The latter is $|\hat{f}(\xi)|^{2}$ as a function of the frequency ξ, where f is the noise. Knowing it, we can recover the standard deviation as well, by Parseval’s identity. Even though this is not always stated explicitly, when the noise is not expected to be homogeneous (translation invariant), one can localize in the base variable x by taking the squared modulus of the windowed Fourier transform, $|\widehat{\phi f}(\xi)|^{2}$, with some $\phi \in C_{0}^{\infty}$. We propose going one step further: consider the power spectrum in the phase space of points x and (co)directions ξ. In the presence of the small parameter h, the natural framework is again semiclassical analysis. The semiclassical version of localizing both in space and momentum is to localize near some $x_{0}$ in the x space with a smooth cutoff of size $h^{1/2}$ and then take the Fourier transform with ξ replaced by $\xi/h$; see [19] for a discussion. The natural candidate for the power spectrum in the phase space is then the so-called semiclassical defect measure $d\mu(x, \xi)$, which, roughly speaking, measures the spectral content of $f = f_{h}(x)$ in the phase space. We call that measure the power spectrum as well.
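
A minimal discrete illustration of these descriptions, assuming i.i.d. Gaussian samples as a stand-in for the noise and the periodogram $|\mathrm{DFT}|^{2}/N$ as a discrete stand-in for $|\hat f(\xi)|^{2}$. The Gaussian window is a hypothetical choice of ϕ; the semiclassical (phase-space) localization itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 4096, 0.5
noise = sigma * rng.standard_normal(N)      # discrete white noise, i.i.d. N(0, sigma^2)

# Periodogram: discrete analogue of the power spectrum |hat f(xi)|^2.
power = np.abs(np.fft.fft(noise)) ** 2 / N

# Parseval: the mean of the power spectrum equals the mean square of the samples,
# i.e. the (root-mean-square) standard deviation of zero-mean noise.
std_from_spectrum = np.sqrt(power.mean())
print(std_from_spectrum, np.sqrt(np.mean(noise ** 2)))   # equal up to round-off

# Flat spectrum: low- and high-frequency bands carry about the same power, ~ sigma^2.
low = power[1:N // 4].mean()
high = power[N // 4:N // 2].mean()

# Localizing in x with a smooth window phi before transforming, as in |hat(phi f)|^2.
x = np.arange(N)
window = np.exp(-((x - N / 2) / 200.0) ** 2)
local_power = np.abs(np.fft.fft(window * noise)) ** 2 / np.sum(window ** 2)
print(low, high, local_power.mean())        # all close to sigma**2 = 0.25
```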

The third novelty is looking at the noise in the ergodic sense, which we also call “spatial”, i.e., the noise in one measurement. There are two ways to look at the statistical properties of the noise. First, one might be interested in the expected value of the noise pointwise as we keep repeating the same experiment over and over again (in our context, if we have a series of noisy data sets and do an inversion for each one of them). We call this the “temporal” view; the analysis of the temporal properties is easier. In applications, however, we typically have only one such experiment. Our goal is to analyze the statistics of the noise in the inversion for a single experiment as the sampling step gets smaller and smaller, hence the term “spatial”. In statistics, an estimate from a single experiment is possible when the variables are i.i.d., relying on the ergodic properties of the sequence.
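
The two views can be compared on a hypothetical stack of M repeated experiments with N pixels each, assuming i.i.d. Gaussian noise. In that case the temporal estimate (one pixel, many experiments) and the spatial one (one experiment, many pixels) agree; after the inversion, however, the noise is in general no longer homogeneous, so spatial statistics have to be taken locally.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, sigma = 2000, 2000, 0.3
# Hypothetical data: M repeated experiments (rows), N pixels each (columns).
noise = sigma * rng.standard_normal((M, N))

temporal = noise[:, 0].std()    # one pixel across repeated experiments
spatial = noise[0, :].std()     # one experiment across all pixels (ergodic estimate)
print(temporal, spatial)        # both close to sigma = 0.3
```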

We start with the analysis of discrete white noise. The flatness of its spectrum in the temporal sense, see (8.6), is well known and justifies its name. In the spatial (ergodic) sense, this is true only in a certain averaged sense, see Theorem 8.1. For white noise interpolated to a “continuous” function, we show in Theorem 4.1 that the defect measure $d\mu$ is flat as well. In Theorem 4.2 we study the spectrum of more general, correlated noise.

Next, we study the propagation of noise under FIOs $A^{-1}$ (or simply A) of the type mentioned above. With the semiclassical view of noise and its power spectrum, the analysis of the power spectrum of $A^{-1} g$ reduces to the mapping properties of a (semiclassical) defect measure under a (classical) FIO. The answer is given by Egorov’s theorem, with some extra care near the zero section. The tools described above then allow us to characterize the spectrum of the resulting noise in the reconstruction. We want to emphasize that even if we start with white noise g, which has a flat spectrum, the noise $A^{-1} g$ is not homogeneous in general: its power spectrum depends on the position x and the codirection ξ. In particular, its standard deviation may change from a neighborhood of one point to another.

As we mentioned already, our analysis is not restricted to (additive) white noise only; see also Section 7 for non-additive noise. We can have data with added non-white noise as well, as long as its power density, in the general sense we consider, is well defined. It could be pink or blue noise; it can be anisotropic noise varying from point to point, or even noise corresponding to a defect measure that is not absolutely continuous. For example, we may have the Radon transform $R f(\omega, p)$ with added noise depending on only one of those two variables; then the associated measure would be singular. Theorems 4.1 and 4.3 still apply and describe the power density of the noise in the reconstruction.

Instead of developing the general abstract theory further, we present its application to the inversion of the Radon transform in the plane. In “parallel geometry”, we show that the spectral density of the added noise is independent of the position x and proportional to $|\xi|^{1/2}$ up to the Nyquist limit (and the spectral power, which is the square of the density, is proportional to $|\xi|$). In “fan-beam coordinates”, the noise depends on the position x; its dependence on $|\xi|$ is again proportional to $|\xi|^{1/2}$, but it also depends on the direction of ξ (relative to x). We present many numerical simulations.

Noise is a major concern in applied inverse problems and has been considered in the literature; nevertheless, we are not aware of directly related works. We mention only a few more theoretical works about noise and inverse problems. Reconstruction of Riemannian manifolds from noisy data has been studied in [5]. Using noise as a source for a reconstruction has been studied in [1,2,7,8].

The structure of the paper is as follows. In Section 2, we recall some basic facts about semiclassical analysis needed for our exposition. We also study the relation between classical and semiclassical FIOs. In Section 3, we summarize and further develop some of the results in [17] about sampling in the semiclassical limit. In Theorem 4.1 in Section 4, we prove that the power spectral density of white noise is uniform by computing its microlocal defect measure. We also show that more general noise satisfying some assumptions has a well-defined microlocal defect measure as well. Then we apply Egorov’s theorem to describe how that measure transforms under FIOs associated with a canonical diffeomorphism. Sections 5 and 6 are devoted to applications of the theory to the Radon transform in the plane in parallel and in fan-beam coordinates. We present many numerical examples as well. Multiplicative noise and other types of noise are analyzed in Section 7. Finally, in Section 8, we analyze discrete white noise without converting it to noise of a continuous variable, and show that it has a flat spectrum on average.

2. Preliminaries on semiclassical analysis

We recall some basic facts from semiclassical analysis; for more details, we refer to [3,12,19]. First, a few words about the notation. All norms $\|\cdot\|$ are in $L^{2}$ unless indicated otherwise; also $\langle\xi\rangle := (1 + |\xi|^{2})^{1/2}$. We denote by $S$ the Schwartz class and by $E'$ the space of compactly supported distributions. For a linear operator A, $A'$ is the transpose in the distribution sense, while $A^{*}$ is the $L^{2}$-adjoint.

2.1. Semiclassical wave front set

The semiclassical Fourier transform $F_{h} f$ in $\mathbb{R}^{n}$ of a function depending also on $h > 0$ is given by
$$F_{h} f(\xi) = \int e^{-i x \cdot \xi / h} f(x)\, dx.$$
Its inverse is $(2\pi h)^{-n} F_{h}^{*}$. We first recall the definition of the semiclassical wave front set of a tempered h-dependent distribution. In this definition, $h > 0$ can be arbitrary, but in semiclassical analysis $h \in (0, h_{0})$ is a “small” parameter and we are interested in the behavior of functions and operators as h gets smaller and smaller. Those functions are h-dependent, and we use the notation $f_{h}$, or $f_{h}(x)$, or just f. The Sobolev spaces are the semiclassical ones, defined by the norm
$$\|f\|_{H_{h}^{s}}^{2} = (2\pi h)^{-n} \int \langle\xi\rangle^{2s} |F_{h} f(\xi)|^{2}\, d\xi.$$
An h-dependent family $f_{h} \in S'$ is said to be h-tempered (or just tempered) if $\|f_{h}\|_{H_{h}^{s}} = O(h^{-N})$ for some s and N. All functions in this paper are assumed tempered even if we do not say so. The semiclassical wave front set $\mathrm{WF}_{h}(f)$ of a tempered family $f_{h}$ is the complement of the set of those $(x_{0}, \xi_{0}) \in \mathbb{R}^{2n}$ for which there exists a $C_{0}^{\infty}$ function ϕ with $\phi(x_{0}) \neq 0$ such that
$$F_{h}(\phi f_{h}) = O(h^{\infty}) \quad \text{for } \xi \text{ in a neighborhood of } \xi_{0}$$
in $L^{\infty}$ (or in any other “reasonable” space, which does not change the notion). The semiclassical wave front set naturally lies in $T^{*}\mathbb{R}^{n}$, but it is not conical as in the classical case. Elements of the zero section can be in $\mathrm{WF}_{h}(f)$.
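
As a check of the normalization of $F_{h}$, the sketch below compares a Riemann-sum approximation with the closed form for the hypothetical one-dimensional test family $f_{h}(x) = e^{i x \xi_{0}/h} e^{-x^{2}/2}$, for which $F_{h} f_{h}(\xi) = \sqrt{2\pi}\, e^{-(\xi - \xi_{0})^{2}/(2h^{2})}$. As h decreases, $F_{h} f_{h}$ concentrates in an $O(h)$ neighborhood of $\xi_{0}$, consistent with $\mathrm{WF}_{h}(f_{h}) = \mathbb{R} \times \{\xi_{0}\}$. The helper name `semiclassical_ft` is hypothetical.

```python
import numpy as np

def semiclassical_ft(f_vals, x, xi, h):
    """Riemann-sum approximation of F_h f(xi) = int exp(-i x xi / h) f(x) dx on a uniform grid."""
    dx = x[1] - x[0]
    return np.exp(-1j * np.outer(xi, x) / h) @ f_vals * dx

x = np.linspace(-12.0, 12.0, 4801)      # f decays like exp(-x^2/2), so truncation is harmless
xi = np.linspace(-1.0, 3.0, 201)
xi0 = 1.0
errors = []
for h in (0.2, 0.1, 0.05):
    f_h = np.exp(1j * x * xi0 / h) * np.exp(-x ** 2 / 2)
    exact = np.sqrt(2 * np.pi) * np.exp(-(xi - xi0) ** 2 / (2 * h ** 2))
    errors.append(np.max(np.abs(semiclassical_ft(f_h, x, xi, h) - exact)))
print(errors)   # tiny for every h: the Riemann sum of a well-resolved Gaussian is spectrally accurate
```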

Sjöstrand proposed essentially adding the classical wave front set to $\mathrm{WF}_{h}$ by considering the latter in $T^{*}\mathbb{R}^{n} \cup S^{*}\mathbb{R}^{n}$, where the second space (the unit cosphere bundle) represents $T^{*}\mathbb{R}^{n}$ as a conic set, i.e., each $(x, \xi)$ with ξ unit is identified with the ray $(x, s\xi)$, $s > 0$. Its points are viewed as “infinite” ones, describing the behavior as $\xi \to \infty$ along different directions. An infinite point $(x_{0}, \xi_{0})$ does not belong to the so extended $\mathrm{WF}_{h}(f)$ if
$$F_{h}(\phi f_{h}) = O(h^{\infty} \langle\xi\rangle^{-\infty}) \quad \text{for } \xi \text{ in a conical neighborhood of } \xi_{0}, \tag{2.1}$$
with ϕ as above.

2.2. Semiclassical pseudo-differential operators (h-ΨDOs)

We define the symbol class $S^{m,k}$ of symbols in $\mathbb{R}^{n}$ as the smooth functions $p(x, \xi)$ on $\mathbb{R}^{2n}$, depending also on h, satisfying the symbol estimates
$$|\partial_{x}^{\alpha} \partial_{\xi}^{\beta} p(x, \xi)| \le C_{\alpha, \beta, K}\, h^{k} \langle\xi\rangle^{m - |\beta|} \tag{2.2}$$
for x in any compact set K; see, e.g., [6]. In fact, we are going to work with symbols supported in a fixed compact set in the ξ variable, so the behavior in ξ above does not matter; one may also work with the symbol class $h^{k} S^{m}(1)$, see [12,19], where $S^{m}(1)$ is defined as in (2.2) with $k = m = 0$. Given $p \in S^{m}$, we write $P = P_{h} = p(x, hD)$ with
$$P f(x) = (2\pi h)^{-n} \iint e^{i (x - y) \cdot \xi / h} p(x, \xi) f(y)\, dy\, d\xi, \tag{2.3}$$
where the integral has to be understood as an oscillatory one. This is the standard quantization; sometimes it is convenient to work with the Weyl one, $p^{w}(x, hD)$, where $p(x, \xi)$ is replaced by $p((x + y)/2, \xi)$ in (2.3). In particular, real symbols then correspond to symmetric operators. Negligible operators are those with $O(h^{\infty})$ norms in any pair of Sobolev spaces.
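
For symbols that do not depend on x, the change of variables $\xi = h\eta$ reduces (2.3) to a Fourier multiplier, $p(hD) f = \mathcal{F}^{-1}[p(h\eta) \hat f(\eta)]$. The sketch below applies it with the FFT on a periodic grid (an assumption that is harmless for rapidly decaying f) and checks that the symbol $p(\xi) = \xi^{2}$ gives $(hD)^{2} = -h^{2}\, d^{2}/dx^{2}$. The helper name `apply_multiplier` is hypothetical, and x-dependent symbols would require the full double integral in (2.3).

```python
import numpy as np

def apply_multiplier(p, f_vals, dx, h):
    """Apply p(hD), for a symbol p = p(xi) independent of x, to samples on a periodic grid."""
    eta = 2 * np.pi * np.fft.fftfreq(f_vals.size, d=dx)   # classical (angular) frequencies
    return np.fft.ifft(p(h * eta) * np.fft.fft(f_vals))   # semiclassical variable xi = h * eta

x = np.linspace(-20.0, 20.0, 2048, endpoint=False)
dx = x[1] - x[0]
h = 0.1
f = np.exp(-x ** 2 / 2)

Pf = apply_multiplier(lambda xi: xi ** 2, f, dx, h)    # should equal -h^2 f''
expected = h ** 2 * (1 - x ** 2) * np.exp(-x ** 2 / 2)
print(np.max(np.abs(Pf - expected)))                   # round-off level
```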

2.3. Semiclassically band limited functions

In [19], a tempered $f_{h}$ is said to be localized in phase space if there exists $p \in C_{0}^{\infty}(\mathbb{R}^{2n})$ such that
$$(\mathrm{Id} - p(x, hD)) f_{h} = O(h^{\infty}) \quad \text{in } S(\mathbb{R}^{n}).$$
All functions in this paper will be of this type.

It is convenient to introduce the notation $Σ_{h} (f)$ for the semiclassical frequency set of f.

Definition 2.1.
For each tempered $f_{h}$ localized in phase space, set
$$\Sigma_{h}(f) = \{\xi;\ \exists\, x \text{ such that } (x, \xi) \in \mathrm{WF}_{h}(f)\}.$$

In other words, $\Sigma_{h}$ is the projection of $\mathrm{WF}_{h}(f)$ to the second variable, i.e.,
$$\Sigma_{h}(f) = \pi_{2}(\mathrm{WF}_{h}(f)),$$
where $\pi_{2}(x, \xi) = \xi$. If $\mathrm{WF}_{h}(f)$ (which is always closed) is bounded and therefore compact, then $\Sigma_{h}(f)$ is compact.

In [17], we gave the following definition.
Definition 2.2.
We say that $f_{h} \in C_{0}^{\infty}(\mathbb{R}^{n})$ is semiclassically band limited (in $\mathcal{B}$) if (i) $\operatorname{supp} f_{h}$ is contained in an h-independent compact set, (ii) f is tempered, and (iii) there exists a compact set $\mathcal{B} \subset \mathbb{R}^{n}$ such that for every open $U \supset \mathcal{B}$ and every $N > 0$ we have
$$|F_{h} f(\xi)| \le C_{N} h^{N} \langle\xi\rangle^{-N} \quad \text{for } \xi \notin U. \tag{2.4}$$

We showed in [17] that $f_{h}$ is semiclassically band limited if and only if it is localized in phase space, and if and only if $\mathrm{WF}_{h}(f)$ is finite (contains no infinite points of the type (2.1)) and compact.

In applications, we take $\mathcal{B}$ to be $[-B, B]^{n}$ with some $B > 0$, or the ball $|\xi| \le B$.

An example of semiclassically band limited functions can be obtained by taking any $f \in E'(\mathbb{R}^{n})$ and convolving it with $\phi_{h} = h^{-n} \phi(\cdot / h)$, where $\hat{\phi} \in C_{0}^{\infty}$. Then $\phi_{h} * f$ is semiclassically band limited with $\mathcal{B} = \operatorname{supp} \hat{\phi}$.

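
The frequency part of this claim follows from a one-line computation, using $\widehat{\phi_{h}}(\eta) = \hat{\phi}(h\eta)$ and $F_{h} g(\xi) = \hat{g}(\xi/h)$:
$$F_{h}(\phi_{h} * f)(\xi) = \widehat{\phi_{h}}(\xi/h)\, \hat{f}(\xi/h) = \hat{\phi}(\xi)\, \hat{f}(\xi/h).$$
This vanishes for $\xi \notin \operatorname{supp} \hat{\phi}$. Since $f \in E'$, there are C and N with $|\hat{f}(\eta)| \le C \langle\eta\rangle^{N}$, and $\langle\xi/h\rangle \le h^{-1}\langle\xi\rangle$ for $0 < h \le 1$; together with the compactness of $\operatorname{supp}\hat{\phi}$, this gives an $O(h^{-N - n/2})$ bound on every $H_{h}^{s}$ norm, i.e., temperedness.
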
2.4. Classical ΨDOs as semiclassical ΨDOs

In the applications we have in mind, we deal with classical ΨDOs and FIOs and want to treat them as semiclassical ones. The negligible operators in the classical calculus are the smoothing ones. We showed in [17] that for every $f \in E'(\mathbb{R}^{n})$ and every smoothing operator K, we have $\mathrm{WF}_{h}(K f) \subset \mathbb{R}^{n} \times \{0\}$. Next, every classical ΨDO of order m can be written as an oscillatory integral of the kind (2.3) with $h = 1$ and a symbol $a(x, \xi)$ vanishing for $|\xi| \le 1$, plus a smoothing operator. Formally, that oscillatory integral is then an h-ΨDO with symbol $a(x, \xi/h)$. We can replace $\langle\xi\rangle$ in (2.2) by $|\xi|$ to obtain an equivalent estimate, and
$$|\partial_{x}^{\alpha} \partial_{\xi}^{\beta} a(x, \xi/h)| \le C_{\alpha, \beta, K}\, h^{-|\beta|} |\xi/h|^{m - |\beta|} = C_{\alpha, \beta, K}\, h^{-m} |\xi|^{m - |\beta|}.$$
On the support of the symbol, we have $|\xi| \ge h$; therefore the factor $|\xi|^{m - |\beta|}$ is not uniformly bounded near $\xi = 0$ when $m < |\beta|$. On the other hand, it is uniformly bounded when $|\xi| \ge \varepsilon$ with $\varepsilon > h$. This allowed us in [17], for every $\varepsilon > 0$, to split $a(x, D)$ into an h-ΨDO with symbol $a(x, \xi/h)(1 - \chi(\xi/\varepsilon))$, with some cut-off function $\chi \in C_{0}^{\infty}$, plus an operator mapping semiclassically band limited functions into functions with semiclassical wave front set in an $O(\varepsilon)$ neighborhood of the zero section. We show below that we can do the same for FIOs associated with canonical diffeomorphisms.

Let A be a properly supported FIO whose canonical relation is the graph of a homogeneous canonical transformation. Then, up to a smoothing operator, A is of the form
$$A f(x) = (2\pi)^{-n} \iint e^{i (\phi(x, \eta) - y \cdot \eta)} a(x, \eta) f(y)\, dy\, d\eta, \tag{2.5}$$
see [9, Section 25.3], with a a classical symbol and a phase $\phi(x, \eta)$ homogeneous of order 1 in η, satisfying $\det \phi_{x\eta} \neq 0$ and $\phi_{x} \neq 0$ for $\eta \neq 0$. The smoothing “error” can still be written in this form with $\phi(x, \eta) = x \cdot \eta$ (a ΨDO) and an amplitude of order $-\infty$, so the arguments below apply to it as well. Let $\psi \in C_{0}^{\infty}$ have support in $B(0, 2)$, with $\psi = 1$ on $B(0, 1)$, and fix $\varepsilon > 0$. Then $A = A_{h, \varepsilon} + R_{h, \varepsilon}$, where
$$\begin{aligned} A_{h, \varepsilon} f(x) &= [A(\mathrm{Id} - \psi(hD/\varepsilon)) f](x) \\ &= (2\pi h)^{-n} \iint e^{i (\phi(x, \eta) - y \cdot \eta)/h} a(x, \eta/h)(1 - \psi(\eta/\varepsilon)) f(y)\, dy\, d\eta, \\ R_{h, \varepsilon} f(x) &= [A \psi(hD/\varepsilon) f](x) = (2\pi h)^{-n} \iint e^{i (\phi(x, \eta) - y \cdot \eta)/h} a(x, \eta/h) \psi(\eta/\varepsilon) f(y)\, dy\, d\eta. \end{aligned} \tag{2.6}$$

Theorem 2.1.
Under the assumptions above,
(a) The operator $A_{h, \varepsilon}$ is an h-FIO whose (semiclassical) canonical relation is the same as the (classical) one of A. Moreover, for every semiclassically band limited f with $\mathrm{WF}_{h}(f) \cap (\mathbb{R}^{n} \times B(0, 2\varepsilon)) = \emptyset$, we have $A f = A_{h, \varepsilon} f + O(h^{\infty})$.

(b) For every $f_{h} \in E'(\mathbb{R}^{n})$ with support in some compact set independent of h and satisfying $|F_{h} f_{h}| \le C h^{-N}$ for some N, we have $\mathrm{WF}_{h}(R_{h, \varepsilon} f) \subset \mathbb{R}^{n} \times B(0, C\varepsilon)$ with some $C > 0$.

Proof.
Since A is properly supported, and f is either $O(h^{\infty})$ outside some fixed compact set in (a) or vanishes there in (b), we can assume that the x-support of a is compact and independent of h as well. It follows from (2.2) that if a is a classical symbol of order m, then $a(x, \eta/h)$ is a semiclassical one of order $(m, -m)$ for $|\eta|/h > 1$. On the support of the symbol $a(x, \eta/h)(1 - \psi(\eta/\varepsilon))$ of $A_{h, \varepsilon}$ we have $|\eta| \ge \varepsilon$, since $\psi = 1$ on $B(0, 1)$; hence $|\eta|/h > 1$ there for $0 < h < \varepsilon$, and that symbol belongs to $S^{m, -m}$ uniformly in h. This gives the first part of (a). The second part of (a) is immediate.

To prove (b), multiply $R_{h, \varepsilon} f$ by $\rho \in C_{0}^{\infty}$ and apply $F_{h}$:
$$F_{h}(\rho R_{h, \varepsilon} f)(\xi) = (2\pi h)^{-n} \iint e^{i (\phi(x, \eta) - x \cdot \xi)/h} \rho(x) a(x, \eta/h) \psi(\eta/\varepsilon) F_{h} f(\eta)\, d\eta\, dx.$$
For the phase $\Phi := \phi(x, \eta) - x \cdot \xi$ we have $\Phi_{x} = \phi_{x}(x, \eta) - \xi$. By the homogeneity of ϕ, for $|\eta| \le 2\varepsilon$ and $|\xi| > C\varepsilon$ we have $\Phi_{x} \neq 0$. Then a non-stationary phase argument (integration by parts in x) implies $F_{h}(\rho R_{h, \varepsilon} f)(\xi) = O(h^{\infty})$ for such ξ. This proves (b). □

2.5. Semiclassical defect measures

Given $f_{h}$ with $\|f_{h}\| \le C$, one can show that there exists a sequence $h_{j} \to 0$ such that the limit
$$\lim_{h = h_{j} \to 0+} (p(x, hD) f_{h}, f_{h})_{L^{2}} = \int p(x, \xi)\, d\mu_{f}(x, \xi) \tag{2.7}$$
exists for every symbol $p \in C_{0}^{\infty}$, see [12,19]; it defines a Borel measure $d\mu_{f}(x, \xi) \ge 0$, called a semiclassical defect measure associated with f. That measure may not be unique. Note that $d\mu_{f}$ is invariantly defined on $T^{*}\mathbb{R}^{n}$. On the other hand, its definition (2.7) depends on the choice of the measure (respectively, the coordinates) used to define the $L^{2}$ space there. We can use any quantization of p in (2.7), for example the Weyl one, $p^{w}(x, hD)$, which guarantees that (2.7) is real when p is real-valued.

When f is semiclassically band limited, $\mathrm{WF}_{h}(f)$ is compact, hence $d\mu_{f}$ has compact support as well, and
$$\|f_{h_{j}}\|_{L^{2}}^{2} = \int d\mu_{f} + o(1). \tag{2.8}$$
In particular, under our assumptions $\|f_{h_{j}}\|_{L^{2}}^{2}$ converges to the total mass $\int d\mu_{f}$ as $h_{j} \to 0$. In fact, some authors require $\|f_{h}\| = 1$, see [12].
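
A standard example, included only as an illustration: the one-dimensional coherent state $f_{h}(x) = (\pi h)^{-1/4} e^{i x \xi_{0}/h} e^{-(x - x_{0})^{2}/(2h)}$ has $\|f_{h}\| = 1$ and defect measure $\delta_{(x_{0}, \xi_{0})}$. For a symbol $p = p(\xi)$, the pairing in (2.7) is the average of p against a Gaussian in ξ with mean $\xi_{0}$ and variance $h/2$. For convenience we use the hypothetical symbol $p(\xi) = e^{-\xi^{2}}$, which depends only on ξ (harmless here since $f_{h}$ is localized in x); the pairing then equals $(1 + h)^{-1/2} e^{-\xi_{0}^{2}/(1 + h)} \to p(\xi_{0})$. The sketch checks this with an FFT multiplier on a periodic grid, which is an assumption of the discretization.

```python
import numpy as np

def apply_multiplier(p, f_vals, dx, h):
    """Apply p(hD) for a symbol p = p(xi) independent of x (periodic-grid FFT approximation)."""
    eta = 2 * np.pi * np.fft.fftfreq(f_vals.size, d=dx)
    return np.fft.ifft(p(h * eta) * np.fft.fft(f_vals))

x = np.linspace(-10.0, 10.0, 2 ** 14, endpoint=False)
dx = x[1] - x[0]
x0, xi0 = 0.5, 0.7
p = lambda xi: np.exp(-xi ** 2)              # hypothetical test symbol

results = []
for h in (0.2, 0.05, 0.01):
    f = (np.pi * h) ** -0.25 * np.exp(1j * x * xi0 / h - (x - x0) ** 2 / (2 * h))
    norm2 = np.vdot(f, f).real * dx                                # ||f_h||^2, equal to 1
    pairing = np.vdot(f, apply_multiplier(p, f, dx, h)).real * dx  # (p(hD) f_h, f_h)
    exact = (1 + h) ** -0.5 * np.exp(-xi0 ** 2 / (1 + h))
    results.append((h, norm2, pairing, exact))
    print(h, norm2, pairing, exact)
# As h -> 0 the pairing tends to p(xi0) = exp(-xi0**2): the mass concentrates at (x0, xi0).
```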

3. Sampling in the semiclassical limit

3.1. Sampling semiclassically band limited functions

We first recall some results in [17]. The classical Nyquist–Shannon sampling theorem says that a function $f \in L^{2}(\mathbb{R}^{n})$ whose Fourier transform $\hat{f}$ is supported in the box $[-B, B]^{n}$ can be uniquely and stably recovered from its samples $f(sk)$, $k \in \mathbb{Z}^{n}$, as long as $0 < s \le \pi/B$. More precisely, we have
$$f(x) = \sum_{k \in \mathbb{Z}^{n}} f(sk) \chi_{k}(x), \qquad \chi_{k}(x) := \prod_{j=1}^{n} \operatorname{sinc}\Big(\frac{1}{s}(x_{j} - s k_{j})\Big), \tag{3.1}$$
where we adopt the “engineering” definition of the sinc function,
$$\operatorname{sinc}(x) = \sin(\pi x)/(\pi x).$$
Moreover,
$$\|f\|^{2} = s^{n} \sum_{k \in \mathbb{Z}^{n}} |f(sk)|^{2},$$
where $\|\cdot\|$ is the $L^{2}$ norm; see, e.g., [14] or [4].
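
A quick numerical check of (3.1) and of the norm identity in one dimension, assuming the test function $f(x) = \operatorname{sinc}^{2}(x)$, whose Fourier transform is supported in $[-2\pi, 2\pi]$; thus $B = 2\pi$ and $s = 1/2$ is the critical step. NumPy's `np.sinc` uses the same engineering normalization $\sin(\pi x)/(\pi x)$. The infinite sums are truncated at $|k| \le K$, which is the only source of error here.

```python
import numpy as np

f = lambda x: np.sinc(x) ** 2        # Fourier transform supported in [-2 pi, 2 pi]
s, K = 0.5, 10_000                    # s = pi / B with B = 2 pi (critical sampling)
k = np.arange(-K, K + 1)
samples = f(s * k)

# Norm identity: ||f||^2 = s * sum |f(s k)|^2, where ||f||^2 = int sinc^4 = 2/3.
norm2_from_samples = s * np.sum(samples ** 2)

# Interpolation formula (3.1) at a few off-grid points.
x_test = np.array([0.3, 1.37, -2.71])
reconstructed = np.sinc((x_test[:, None] - s * k[None, :]) / s) @ samples
print(norm2_from_samples, np.max(np.abs(reconstructed - f(x_test))))
```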

The proof is based on viewing the samples $f(sk)$ as the (inverse) Fourier coefficients of $\hat{f}$, extended as a $2\pi/s$-periodic function. We reproduce the proof below in the semiclassical case.

In [17], we formulated this and related results in the semiclassical setting. One of those theorems is the following. Recall that $\Sigma_{h}(f)$ is defined in Definition 2.1.

Theorem 3.1.
Let $f_{h}$ be semiclassically band limited with $\Sigma_{h}(f) \subset \prod_{j} (-B_{j}, B_{j})$ for some $B_{j} > 0$. Let $\hat{\chi}_{j} \in L^{\infty}(\mathbb{R})$ be supported in $[-\pi, \pi]$, with $\hat{\chi}_{j}(\pi \xi_{j}/B_{j}) = 1$ for $\xi \in \Sigma_{h}(f)$. If $0 < s_{j} \le \pi/B_{j}$, then
$$f_{h}(x) = \sum_{k \in \mathbb{Z}^{n}} f_{h}(s_{1} h k_{1}, \dots, s_{n} h k_{n}) \prod_{j} \chi_{j}\Big(\frac{1}{s_{j} h}(x_{j} - s_{j} h k_{j})\Big) + O_{S}(h^{\infty}) \|f\| \tag{3.2}$$
and
$$\|f_{h}\|^{2} = s_{1} \cdots s_{n} h^{n} \sum_{k \in \mathbb{Z}^{n}} |f_{h}(s_{1} h k_{1}, \dots, s_{n} h k_{n})|^{2} + O(h^{\infty}) \|f\|^{2}. \tag{3.3}$$

One can think of $\chi_{j}$ as somewhat better versions of the sinc function: they decay faster if we choose $\hat{\chi}_{j}$ to be smooth. We can do this because $\Sigma_{h}(f)$ (which is compact) is assumed to be included in the interior of $\prod_{j} [-B_{j}, B_{j}]$. In the case of equality, we must take $\chi_{j}(x) = \operatorname{sinc}(x)$.
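
To illustrate this remark, the sketch below builds a $C^{\infty}$ cutoff $\hat{\chi}$ equal to 1 on $[-0.8\pi, 0.8\pi]$ and supported in $[-\pi, \pi]$, computes χ by numerical quadrature, and checks (3.4) in one dimension. It assumes the hypothetical test family $f_{h}(x) = \operatorname{sinc}^{2}(x/h)$, for which $F_{h} f_{h}$ is supported in $[-2\pi, 2\pi]$ (we ignore the compact-support requirement of Definition 2.2 for simplicity), and the step $s = 0.4$, so that $\hat{\chi}(s\xi) = 1$ on $\Sigma_{h}(f)$ and $\pi/s > 2\pi$. The sum over k is truncated at $|k| \le K$; the fast decay of χ keeps the truncation error small. All helper names are hypothetical.

```python
import numpy as np

def smooth_step(t):
    """C^infinity function equal to 0 for t <= 0 and to 1 for t >= 1."""
    t = np.clip(t, 0.0, 1.0)
    a = np.where(t > 0, np.exp(-1.0 / np.maximum(t, 1e-300)), 0.0)
    b = np.where(t < 1, np.exp(-1.0 / np.maximum(1.0 - t, 1e-300)), 0.0)
    return a / (a + b)

def chi_hat(eta, inner=0.8 * np.pi):
    """Smooth cutoff: equal to 1 on [-inner, inner] and supported in [-pi, pi]."""
    return smooth_step((np.pi - np.abs(eta)) / (np.pi - inner))

def chi(y, n_quad=2001):
    """chi(y) = (2 pi)^{-1} * integral over [-pi, pi] of chi_hat(eta) exp(i y eta).

    Riemann sum on a uniform grid; chi_hat is even, so chi is real and cos suffices.
    """
    y = np.asarray(y, dtype=float)
    eta = np.linspace(-np.pi, np.pi, n_quad)
    d_eta = eta[1] - eta[0]
    vals = np.cos(np.outer(y.ravel(), eta)) @ chi_hat(eta) * d_eta / (2 * np.pi)
    return vals.reshape(y.shape)

h, s, K = 0.1, 0.4, 200
f = lambda x: np.sinc(x / h) ** 2            # F_h f is supported in [-2 pi, 2 pi]
k = np.arange(-K, K + 1)
x_test = np.linspace(-0.3, 0.3, 7)

# (3.4) in 1D: f_h(x) = sum_k f_h(s h k) chi((x - s h k) / (s h)), truncated at |k| <= K.
weights = chi((x_test[:, None] - s * h * k[None, :]) / (s * h))
reconstructed = weights @ f(s * h * k)
print(np.max(np.abs(reconstructed - f(x_test))))   # small: no aliasing since 2 pi < pi / s
```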

Assume now, for simplicity, that all $B_{j}$ and $s_{j}$ are equal to some B and s, respectively. We can always choose a linear transformation $y = W x$ to get back to (3.2) or to even more general sampling grids, and the dual one $\xi = W^{*} \eta$ for the dual variables. Set $\chi(x) = \chi_{1}(x_{1}) \cdots \chi_{n}(x_{n})$. Then (3.2) and (3.3) take the form
$$f_{h}(x) = \sum_{k \in \mathbb{Z}^{n}} f_{h}(s h k) \chi_{k}(x) + O_{S}(h^{\infty}) \|f\|, \qquad \chi_{k}(x) := \chi\Big(\frac{1}{s h}(x - s h k)\Big), \tag{3.4}$$
and
$$\|f_{h}\|^{2} = (s h)^{n} \sum_{k \in \mathbb{Z}^{n}} |f_{h}(s h k)|^{2} + O(h^{\infty}) \|f\|^{2}. \tag{3.5}$$

The proof of Theorem 3.1 is based on the following observation. Since $F_{h} f$ is supported in $\prod (-B_{j}, B_{j})$ up to an $O(h^{\infty} |\xi|^{-\infty})$ error, and $\pi/s > B_{j}$, we have
$$(F_{h} f)_{ext}(\xi) = (s h)^{n} \sum_{k} f(s h k) e^{-i s k \cdot \xi} + O_{S}(h^{\infty}), \tag{3.6}$$
where $(F_{h} f)_{ext}(\xi)$ is the periodic extension of $F_{h} f(\xi)$ with period $2\pi/s$ in each 1D variable. Multiply this by $\hat{\chi}(s\xi)$ to get
$$F_{h} f(\xi) = (s h)^{n} \hat{\chi}(s\xi) \sum_{k} f(s h k) e^{-i s \xi \cdot k} + O_{S}(h^{\infty}).$$
If $\chi_{k}$ is the interpolating function in (3.4), then
$$F_{h} \chi_{k}(\xi) = (s h)^{n} \hat{\chi}(s\xi) e^{-i s \xi \cdot k}. \tag{3.7}$$
Taking $F_{h}^{-1}$ completes the proof. The full details can be found in [17]. Also, χ does not need to be of product type, as shown there.

Remark 3.1.
In the limit case $\Sigma_{h}(f) = (-B, B)^{n}$, which is not allowed by the theorem since $\Sigma_{h}(f)$ is compact, we have $\hat{\chi} = 1_{[-\pi, \pi]^{n}}$, where $1_{[-\pi, \pi]^{n}}$ stands for the characteristic function of $[-\pi, \pi]^{n}$. Then χ is a product of sinc functions, see (3.1). We will use the notation
$$\operatorname{sinc}_{k}(x) := \prod_{j=1}^{n} \operatorname{sinc}\Big(\frac{1}{s h}(x_{j} - s h k_{j})\Big). \tag{3.8}$$
Then (3.7) takes the form
$$F_{h} \operatorname{sinc}_{k}(\xi) = (s h)^{n} 1_{[-\pi/s, \pi/s]^{n}}(\xi) e^{-i s k \cdot \xi}. \tag{3.9}$$
The functions $\operatorname{sinc}_{k}$ form an orthogonal system, and
$$\phi_{k} := (s h)^{-n/2} \operatorname{sinc}_{k} \tag{3.10}$$
is an orthonormal basis in the subspace $1_{[-\pi/s, \pi/s]^{n}}(hD) L^{2}(\mathbb{R}^{n})$. For future reference, we mention that for every integer $m \ge 2$, if $\operatorname{sinc}_{k}^{(m)}$ is defined by
$$F_{h} \operatorname{sinc}_{k}^{(m)}(\xi) = (s h/m)^{n} 1_{[-m\pi/s, m\pi/s]^{n}}(\xi) e^{-i s k \cdot \xi}, \tag{3.11}$$
then
$$\operatorname{sinc}_{k}^{(m)}(x) = \prod_{j} \operatorname{sinc}\Big(\frac{m}{s h}(x_{j} - s h k_{j})\Big) \tag{3.12}$$
instead. Then $\phi_{k}^{(m)} := (s h/m)^{-n/2} \operatorname{sinc}_{k}^{(m)}$ is an orthonormal system in $1_{[-m\pi/s, m\pi/s]^{n}}(hD) L^{2}(\mathbb{R}^{n})$, but not a basis. To make it a basis, notice that s was replaced by $s/m$ and k by $m k$ there. Allowing the original k to run over all integer points, i.e., replacing k by $k/m$ there, would complete (3.12) to a basis.
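
The orthonormality of the $\phi_{k}$ can be read off (3.9) with the semiclassical Parseval identity $(u, v) = (2\pi h)^{-n} (F_{h} u, F_{h} v)$, which follows from the $H_{h}^{0}$ norm of Section 2.1 by polarization:
$$(\operatorname{sinc}_{k}, \operatorname{sinc}_{l}) = (2\pi h)^{-n} (s h)^{2n} \int_{[-\pi/s, \pi/s]^{n}} e^{-i s (k - l) \cdot \xi}\, d\xi = (2\pi h)^{-n} (s h)^{2n} \Big(\frac{2\pi}{s}\Big)^{n} \delta_{k l} = (s h)^{n} \delta_{k l},$$
since, for each j with $k_{j} \neq l_{j}$, the interval $[-\pi/s, \pi/s]$ contains exactly $|k_{j} - l_{j}|$ full periods of $e^{-i s (k_{j} - l_{j}) \xi_{j}}$. Dividing by $(s h)^{n}$ gives $(\phi_{k}, \phi_{l}) = \delta_{k l}$.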

3.2. Constructing a semiclassically band limited function from a discrete sequence

The next question is how to associate a semiclassically band limited function with a set of numbers $f_{k}$, $k \in \mathbb{Z}^{n}$, which we view as its samples. Without the band limited requirement, this can of course be done in infinitely many ways, by various ways to interpolate between the samples. On the other hand, if we fix the band limit B, then for $B < \pi/s$ such a function, if it exists, would be oversampled, and those samples can be shown to be dependent. We can only hope that this problem always has a solution when $B \ge \pi/s$. The next proposition, proven in [17], shows that it can be done when $B = \pi/s$ with a sinc interpolation.

Proposition 3.1.
Let $\Omega \Subset \Omega_{1}$ be both open. Fix $s > 0$. For $0 < h \ll 1$, let $K(h) \subset \mathbb{Z}^{n}$ be the set of those k for which $s h k \in \Omega$. Then for every collection of complex numbers $\{f_{k, h}\}$, $k \in K(h)$, with $\sum_{k} |f_{k, h}|^{2}$ tempered, there exists a semiclassically band limited $f_{h}$ with $\mathrm{WF}_{h}(f) \subset \Omega_{1} \times [-\pi/s, \pi/s]^{n}$ such that $f(s h k) = f_{k, h}$.

One such choice is given by $\begin{matrix} (3.13) & {\tilde{f}}_{h} = ψ f_{h}, \quad \text{with} \quad f_{h} (x) = \sum_{k \in K (h)} f_{k, h} {sinc}_{k} (x), \end{matrix}$ where $ψ \in C_{0}^{\infty} (Ω_{1})$ is equal to 1 near $\bar{Ω}$ . Moreover, (3.5) holds.
Proof.
With $f_{h}$ as above, we have $f_{h} (s h k) = f_{k, h}$ and $\begin{matrix} (3.14) & F_{h} f_{h} (ξ) = 1_{{[- π / s, π / s]}^{n}} (ξ) {(s h)}^{n} \sum_{k \in K (h)} f_{k, h} e^{- i s k \cdot ξ}, \end{matrix}$ compare with (3.6) and (3.9). Then ${WF}_{h} (f_{h}) \subset R^{n} \times {[- π / s, π / s]}^{n}$ . Let ψ be as in the proposition. Then ${\tilde{f}}_{h} = ψ f_{h}$ has the required properties. □
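Here is a minimal one-dimensional sketch of the sinc interpolation (3.13), without the cutoff ψ. It uses the hypothetical values s = 1 and h = 0.05, and indexes the samples as k = 0, …, 39. It checks that the interpolant reproduces the samples, $f_{h} (s h k) = f_{k, h}$ , up to round-off.

```python
import numpy as np

def sinc_interpolate(samples, s, h, x):
    """f_h(x) = sum_k f_{k,h} sinc((x - s h k) / (s h)), k = 0, ..., len(samples) - 1.

    One-dimensional version of (3.13) without the cutoff psi; np.sinc(t) = sin(pi t)/(pi t).
    """
    step = s * h
    k = np.arange(len(samples))
    # Rows: evaluation points; columns: sample indices.
    return np.sinc((x[:, None] - step * k[None, :]) / step) @ samples

rng = np.random.default_rng(0)
s, h = 1.0, 0.05                           # hypothetical values
f_k = rng.standard_normal(40)              # the given numbers f_{k,h}
grid = s * h * np.arange(40)               # sampling points s h k
error = np.max(np.abs(sinc_interpolate(f_k, s, h, grid) - f_k))
print(error)                               # round-off level: the samples are reproduced

# Values of the interpolant halfway between the samples.
midpoints = grid[:-1] + s * h / 2
print(sinc_interpolate(f_k, s, h, midpoints)[:5])
```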

By Theorem 3.1, (3.13) is the only such representation of $f_{h}$ on Ω, up to an $O (h^{\infty})$ error, if we want to keep the sharp band limit $B = π / s$ .

The expansion (3.13) has the usual drawbacks of sinc functions: they decay too slowly at infinity, so the influence of each term extends too far. When $B > π / s$ (strictly), we can in principle use the localized interpolation functions $χ_{k}$ of Theorem 3.1. The situation differs from that in Theorem 3.1, though. The functions $χ_{j}$ in (3.2) do not necessarily satisfy $χ_{k_{1}} (s h k_{2}) = δ_{k_{1}, k_{2}}$ , where $δ_{k_{1}, k_{2}}$ stands for the Kronecker symbol. In the case under consideration, they have to satisfy it (up to an $O (h^{\infty})$ error). Also, when $B > π / s$ , the corresponding function $f_{h}$ would be undersampled rather than oversampled. Finally, in interpolations like these, one would like the interpolant to be as smooth as possible.

One way to enforce $f_{h} (s h k) = f_{k, h}$ is to multiply the sinc function in (3.12) by some $ϕ \in S$ with $ϕ (0) = 1$ and $\hat{ϕ} \in C_{0}^{\infty}$ . In other words, we use the product of the factors $sinc (x_{j}) ϕ (x_{j})$ over the coordinates $x_{j}$ . Since sinc vanishes at the nonzero integers and $ϕ (0) = 1$ , the new kernel still equals $δ_{k, 0}$ at the integer points k. Its Fourier transform $F (ϕ sinc) = {(2 π)}^{- 1} 1_{[- π, π]} * \hat{ϕ}$ has support larger than $[- π, π]$ , which corresponds to a band limit greater than $π / s$ , compare with (3.14). One can also use a rapidly decreasing $\hat{ϕ}$ instead of a $C_{0}^{\infty}$ one. The error caused by replacing it with a suitable $C_{0}^{\infty}$ one can be estimated easily.
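To illustrate the windowing idea, here is a one-dimensional sketch with a Gaussian window $ϕ (x) = e^{- x^{2} / (2 w^{2})}$ and the arbitrary width w = 3. For this window $\hat{ϕ}$ is rapidly decreasing rather than compactly supported, which the text allows. The sketch checks two things:

- the windowed kernel still equals $δ_{k, 0}$ at the integers;
- its Fourier transform no longer vanishes just beyond π.

For this window, the exact value of the Fourier transform at ξ = π is 1/2. At ξ = 1.2π it equals the standard normal tail probability beyond 0.2πw ≈ 1.885, which is about 0.03.

```python
import numpy as np

w = 3.0   # hypothetical window width

def windowed_sinc(x):
    """sinc(x) * phi(x) with the Gaussian window phi(x) = exp(-x^2 / (2 w^2)), phi(0) = 1."""
    return np.sinc(x) * np.exp(-x**2 / (2 * w**2))

# Interpolation property: the kernel is the Kronecker delta at the integers.
ints = np.arange(-10, 11)
kernel_at_ints = windowed_sinc(ints.astype(float))
print(np.round(kernel_at_ints, 12))

# F(xi) = int e^{-i x xi} kernel(x) dx; the kernel is even, so this is a cosine integral.
dx = 1e-3
x = np.arange(-60.0, 60.0, dx) + dx / 2     # the Gaussian factor is negligible beyond |x| = 60
def fourier(xi):
    return np.sum(windowed_sinc(x) * np.cos(xi * x)) * dx

F_pi, F_12 = fourier(np.pi), fourier(1.2 * np.pi)
print(F_pi, F_12)    # exact values: 1/2 and P(Z > 0.2 pi w), about 0.03
```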
3.3. Lanczos-3 interpolation and other convolution based interpolations

One practical, approximate realization of the idea above is the Lanczos-3 interpolation. It belongs to the family of Lanczos-k interpolations, obtained by replacing the number 3 below with an integer k. In it, the functions $χ_{j}$ in (3.2) are taken to be $\begin{matrix} Lan3 (x) : = H (3 - | x |) sinc (x) sinc (x / 3), \end{matrix}$ where H is the Heaviside function, and x stands for each coordinate function $x_{i}$ . Its Fourier transform is not compactly supported but decays like $O (| ξ |^{- 2})$ with a small leading term. It is also very small outside $| ξ | ⩽ 2 π$ , see Fig. 1. By contrast, the Fourier transform of $sinc (x)$ is supported in $| ξ | ⩽ π$ . The kernel $Lan3 (x)$ is easy to compute numerically and has small support. It also preserves the property $f (s h k) = f_{k, h}$ , because $Lan3 (k) = δ_{k, 0}$ for integer k. So, for all practical purposes, choosing χ in (3.15) to be Lan3 gives an interpolation with a band limit no greater than $B = 2 π / s$ , and even $B = 1.5 π / s$ . This is 1.5 to 2 times the band limit of (3.13), see Fig. 1.
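Here is a one-dimensional sketch of the Lanczos-3 kernel. Its Fourier transform $\hat{χ} (ξ) = \int e^{- i x ξ} χ (x) d x$ is approximated by a midpoint sum with an arbitrarily chosen step. The script checks $Lan3 (k) = δ_{k, 0}$ and prints $\hat{χ}$ at a few frequencies, which should match the description of Fig. 1:

- close to 1 at ξ = 0 and ξ = π/2;
- about 1/2 at ξ = π;
- small at ξ = 1.5π and ξ = 2π.

```python
import numpy as np

def lan3(x):
    """Lanczos-3 kernel H(3 - |x|) sinc(x) sinc(x/3); np.sinc(x) = sin(pi x)/(pi x)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 3, np.sinc(x) * np.sinc(x / 3), 0.0)

# Lan3(k) = delta_{k,0} for integer k.
print(np.round(lan3(np.arange(-4, 5)), 12))

# hat(Lan3)(xi) = int e^{-i x xi} Lan3(x) dx = int_{-3}^{3} Lan3(x) cos(x xi) dx (Lan3 is even).
dx = 1e-4
x = np.arange(-3.0, 3.0, dx) + dx / 2
def lan3_hat(xi):
    return np.sum(lan3(x) * np.cos(xi * x)) * dx

for c in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(f"xi = {c:.1f} pi:  hat(Lan3) = {lan3_hat(c * np.pi):+.4f}")
```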

Suppose the samples in (3.15), with χ a Lanczos-3 kernel, are those of a function with band limit $B = π / s$ (the Nyquist limit for that step size). Then the reconstruction leaves frequencies below $B / 2$ mostly unchanged, and attenuates and aliases those between $B / 2$ and B, as in Fig. 1. The resulting aliasing is “small” because the amplitude is “small” away from $| ξ | ⩽ B$ (in Fig. 1, $B = π$ ). As explained above, the resulting $f_{h}$ has an essential band limit larger than B, and this is true even if the samples are arbitrary.

Fig. 1.

The Lanczos-3 kernel Lan3 and its Fourier transform. The sinc kernel and its Fourier transform are shown as dashed lines.

The Fourier transform of the Lanczos-3 kernel is almost 1 on $[- π / 2, π / 2]$ . This property can be used for practical interpolations with an explicit kernel of small support that approximates the $χ_{j}$ in (3.2) well enough. For this, it is enough to oversample by a factor of 2, or even only 1.5, in each coordinate and to use the Lanczos-3 kernel. We use this technique in the numerical computations later. This way, we work with a very well localized kernel rather than with the sinc one.

The Lanczos-3 interpolation belongs to the family of convolution based interpolations of the type $\begin{matrix} (3.15) & f_{h} (x) = \sum_{k \in K (h)} f_{k, h} χ_{k} (x), \quad χ_{k} (x) : = χ (\frac{1}{s h} (x - s h k)), \end{matrix}$ with various compactly supported kernels χ. It is easy to see that an interpolation is of this type whenever it is linear, translation invariant, and has a finite domain of influence. The simplest examples are the nearest neighbor interpolation ( $χ_{k}$ are characteristic functions of boxes in $R^{n}$ ) and the linear interpolation. Higher order examples include the third order cubic Catmull–Rom spline and a fourth order cubic spline proposed by Keys, see [11,13]. Without going into detail, we mention that these two are very similar to Lanczos-2 and Lanczos-3, respectively, with the Keys spline being slightly more smoothing than Lanczos-3. The Fourier transforms $\hat{χ}$ of the cubic and Lanczos-3 kernels decay fast enough to be well approximated by compactly supported ones. We then have the following.

Proposition 3.2.

Let $\hat{χ} \in L_{comp}^{\infty}$ , and let $m ⩾ 1$ be an integer such that $supp \hat{χ} \subset {[- m π, m π]}^{n}$ . Then for $f_{h}$ given by (3.15), we have $\begin{matrix} ‖ f_{h} ‖^{2} ⩽ C {(s h)}^{n} \sum_{k \in K (h)} | f_{k, h} |^{2}, \quad C : = m^{n} ‖ \hat{χ} ‖_{L^{\infty}}^{2} . \end{matrix}$

Proof.

Let $1 ⩽ m \in Z$ be such that $supp \hat{χ} \subset {[- m π, m π]}^{n}$ . Let ${sinc}_{k}^{(m)}$ be defined by (3.12). Then the functions $ϕ_{k}^{(m)} = {(s h / m)}^{- n / 2} {sinc}_{k}^{(m)}$ form an orthonormal system, see Remark 3.1. For $\begin{matrix} (3.16) & g_{h} (x) : = \sum_{k \in K (h)} f_{k, h} {sinc}_{k}^{(m)} (x) \end{matrix}$ we have $\begin{matrix} (3.17) & ‖ g_{h} ‖^{2} = {(s h / m)}^{n} \sum_{k \in K (h)} | f_{k, h} |^{2} . \end{matrix}$ Multiplying (3.11) by $\hat{χ} (s ξ)$ , we get $\begin{matrix} (3.18) & \hat{χ} (s h D) {sinc}_{k}^{(m)} = m^{- n} χ_{k}, \end{matrix}$ where we used (3.7), which is valid for $χ_{k}$ as in (3.15) with every χ as in the proposition. Therefore, $f_{h} = m^{n} \hat{χ} (s h D) g_{h}$ . The operator norm of $\hat{χ} (s h D)$ is at most $‖ \hat{χ} ‖_{L^{\infty}}$ , hence $‖ f_{h} ‖ ⩽ m^{n} ‖ \hat{χ} ‖_{L^{\infty}} ‖ g_{h} ‖$ , where $g_{h}$ is defined by (3.16). Combining this with (3.17), we complete the proof. □

3.4. Noisy samples

Suppose now that we restore a semiclassically band limited function from noisy samples. Assume oversampling, i.e., $B < π / s$ (strictly). Without noise, we would use the formula (3.4), where $f_{h} (\cdot)$ are the samples, which we denote here by $f_{k, h}$ . In other words, we would take $f_{h}$ as in (3.15) with χ such that $\begin{matrix} (3.19) & \hat{χ} = 1 \text{ on } {[- s B, s B]}^{n}, \quad supp \hat{χ} \subset {(- π, π)}^{n}, \end{matrix}$ and with χ in the Schwartz class. This is possible since we can choose $\hat{χ} \in C_{0}^{\infty}$ .

If we do the same with the noisy samples, the noise added to the reconstruction is again given by (3.15). For the noise-free samples, $f_{h} (s h k) = f_{k, h}$ holds since, a priori, $f_{h}$ has band limit B. For the noisy samples, this fails in general, because they are not necessarily samples of such a function. In fact, one of the goals of the current contribution is to tackle this issue.

As in the proof of Proposition 3.2, note that (3.15) and the sinc reconstruction (3.13) are closely connected: one can get the former from the latter by applying the convolution operator $\hat{χ} (s h D)$ to it.

3.5. Delta type of expansion

We can view the convolution based interpolation (3.15) as a convolved expansion of delta type, in which χ is formally replaced by the Dirac delta. Indeed, let $\begin{matrix} (3.20) & f_{h}^{δ} (x) : = {(s h)}^{n} \sum_{k \in K (h)} f_{k, h} δ (x - s h k); \end{matrix}$ then $f_{h} = χ_{h} * f_{h}^{δ}$ with $χ_{h} (x) = {(s h)}^{- n} χ (x / (s h))$ . On the Fourier side, $F_{h} f_{h}^{δ}$ is given by the right-hand side of (3.14) without the cutoff factor $1_{{[- π / s, π / s]}^{n}}$ .
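The Fourier-side description can be checked numerically. By (3.20), $F_{h} f_{h}^{δ} (ξ) = {(s h)}^{n} \sum_{k} f_{k, h} e^{- i s k \cdot ξ}$ , and convolution with $χ_{h}$ multiplies this by $\hat{χ} (s ξ)$ . The sketch below is a one-dimensional example with the Lanczos-3 kernel, the hypothetical values s = 1 and h = 0.25, and 12 random samples. It compares a quadrature of $F_{h} f_{h} (ξ) = \int e^{- i x ξ / h} f_{h} (x) d x$ with $\hat{χ} (s ξ) (s h) \sum_{k} f_{k, h} e^{- i s k ξ}$ .

```python
import numpy as np

def lan3(x):
    """Lanczos-3 kernel H(3 - |x|) sinc(x) sinc(x/3); np.sinc(x) = sin(pi x)/(pi x)."""
    return np.where(np.abs(x) < 3, np.sinc(x) * np.sinc(x / 3), 0.0)

rng = np.random.default_rng(1)
s, h = 1.0, 0.25                  # hypothetical values; grid step s*h
step = s * h
k = np.arange(12)
f_k = rng.standard_normal(12)

dx = 1e-4
x = np.arange(-1.0, 4.0, dx) + dx / 2                          # contains supp f_h
f_h = lan3((x[:, None] - step * k[None, :]) / step) @ f_k      # (3.15) with chi = Lan3

u = np.arange(-3.0, 3.0, dx) + dx / 2
def chi_hat(omega):
    """hat(chi)(omega) = int e^{-i u omega} Lan3(u) du (real, since Lan3 is even)."""
    return np.sum(lan3(u) * np.cos(omega * u)) * dx

diffs = []
for xi in [0.3, 1.0, 2.5]:
    lhs = np.sum(np.exp(-1j * x * xi / h) * f_h) * dx          # F_h f_h(xi) by quadrature
    rhs = chi_hat(s * xi) * step * np.sum(f_k * np.exp(-1j * s * k * xi))
    diffs.append(abs(lhs - rhs))
print(diffs)
```

The differences should be at the level of the quadrature error.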

4. Noise and defect measures

4.1. Microlocal defect measures as a generalization of power density

We start this section by specifying the kind of white noise considered in the sequel, see also Section 6.

Hypothesis 4.1.
For every $h > 0$ , the noise is modeled by a family $\{ f_{k, h}; k \in Z^{n} \}$ of independent and identically distributed (i.i.d.) real valued random variables defined on the same probability space $(X, F, P)$ . The random variables $f_{k, h}$ have zero expected values and a common finite variance $σ^{2}$ . For our computations, we also make the following technical assumption on the higher moments: there exists a constant $δ > 0$ such that $\begin{matrix} (4.1) & E (f_{k, h}^{4} {(log (1 + | f_{k, h} |))}^{1 + δ}) < \infty . \end{matrix}$

The variables $f_{k, h}$ model the noise at each cell/pixel $x_{k} = s h k$ , with the relative step $s > 0$ fixed and $h > 0$ a small parameter. In Hypothesis 4.1 we allow $f_{k}$ to depend on h, but h will often be omitted for notational simplicity. In the numerical examples later, we use either normally or uniformly distributed $f_{k}$ . For a fixed bounded domain Ω, the number of sampling points $x_{k} = s h k$ in it is $| Ω | {(s h)}^{- n} (1 + o (1))$ ; we called that index set $K (h)$ in Proposition 3.1. For each $h > 0$ , only that many $f_{k, h}$ ’s will eventually be used. Therefore, we have a triangular array of random variables $f_{k, h}$ , $h > 0$ , $k \in K (h)$ .

As explained in the introduction, we are interested in two types of statistical properties.

The first type consists of what we call the “temporal” mean, variance, etc.: the moments of each $f_{k}$ as a random variable. They are determined by the process that creates the $f_{k}$ and, in practical applications, correspond to repeated experiments, hence the term “temporal”. We use the notation $E (f_{k})$ , $E (f_{k}^{2})$ , etc., for the expectation.

The second, more interesting, type concerns a single experiment as $h \to 0$ , i.e., as the number $N \sim h^{- n}$ of the $f_{k}$ grows. The mean is then just the mean of those finitely many numbers, and the variance is the mean of their squares. We call them the empirical spatial mean and variance, and use the notation VAR for the latter and STD for the spatial standard deviation. Limit theorems for averaged random quantities with certain invariances are sometimes called ergodic properties; we view them as “spatial” ones, interpreting the $f_{k}$ as samples of some function in space. By the strong law of large numbers, as $N \to \infty$ , the mean of the $N \sim h^{- n}$ variables $f_{k}$ converges to zero almost surely, and their spatial variance converges to $σ^{2}$ almost surely.

Below, we define similar quantities for continuous function-valued random variables.
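A small simulation illustrates the “spatial” statistics of a single experiment. The samples are i.i.d. with σ = 1, either normal or uniform on $[- \sqrt{3} σ, \sqrt{3} σ]$ (which has variance σ²). As N grows, the empirical mean tends to 0 and the mean of the squares tends to σ². The seed and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0
results = {}
for N in [10**2, 10**4, 10**6]:
    samples = {
        "normal": sigma * rng.standard_normal(N),
        # Uniform on [-sqrt(3) sigma, sqrt(3) sigma] has variance sigma^2.
        "uniform": rng.uniform(-np.sqrt(3) * sigma, np.sqrt(3) * sigma, N),
    }
    for name, f in samples.items():
        mean = f.mean()           # empirical spatial mean
        var = np.mean(f**2)       # empirical spatial variance: the mean of the squares
        results[(N, name)] = (mean, var)
        print(f"N={N:>7}  {name:>7}:  MEAN={mean:+.4f}  VAR={var:.4f}")
```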

Our terminology could be confusing, since for random processes, that is, families $\{ f (t) \}_{t \in R}$ of real-valued random variables, t is naturally interpreted as a time parameter. In our case, however, the parameter (denoted by x) is a spatial variable, and $f$ has to be considered as a random field.

With Hypothesis 4.1 in hand, we identify each discrete noise with a function $f_{h}$ as in (3.15), for some $\hat{χ} \in C_{0}^{\infty}$ , without necessarily assuming (3.19) for now. Clearly, $E (f_{h}) = 0$ , which is a temporal characteristic. We now state a lemma on the spatial mean and variance of $f_{h}$ .
Lemma 4.1.
Let $\{ f_{k, h}; k \in Z^{n} \}$ be a noise satisfying Hypothesis 4.1, and define the function $f_{h}$ according to (3.15). Then, $P$ -almost surely, $\begin{matrix} (4.2) & MEAN (f_{h}) : = \frac{1}{| Ω |} \int_{Ω} f_{h} d x \to 0 \quad \text{as } h \to 0 . \end{matrix}$ For the spatial variance of $f_{h}$ , we get the following $P$ -almost sure limit: $\begin{matrix} (4.3) & {VAR}_{Ω} (f_{h}) : = \frac{1}{| Ω |} \int_{Ω} f_{h}^{2} d x \to c_{χ} σ^{2} \quad \text{as } h \to 0, \end{matrix}$ where $c_{χ} : = ‖ χ ‖^{2}$ . In particular, $c_{χ} = 1$ when χ is a product of sinc functions.
Proof.
We will only prove (4.3), since the proof of (4.2) is similar. Start from (3.15) and assume that $\{ χ_{k}; k \in K (h) \}$ is an orthogonal system, as is the case for the sinc kernel (3.8). Then $\begin{matrix} (4.4) & ‖ f_{h} ‖^{2} = \sum_{k \in K (h)} f_{k, h}^{2} ‖ χ_{k} ‖^{2} = c_{χ} {(s h)}^{n} \sum_{k \in K (h)} f_{k, h}^{2}, \end{matrix}$ where $c_{χ} = ‖ χ ‖^{2}$ . Plugging (4.4) into the definition (4.3) of ${VAR}_{Ω} (f_{h})$ , we obtain $\begin{matrix} (4.5) & {VAR}_{Ω} (f_{h}) = c_{χ} \frac{{(s h)}^{n}}{| Ω |} \sum_{k \in K (h)} f_{k, h}^{2} . \end{matrix}$ Taking limits in (4.5) now amounts to applying an almost sure limit theorem to the triangular array $\{ f_{k, h}^{2}; k \in K (h), h > 0 \}$ . Such a theorem is available because $Card (K (h)) = | Ω | {(s h)}^{- n} (1 + o (1))$ and the random variables $f_{k, h}$ satisfy Hypothesis 4.1, so the classical strong laws of large numbers for triangular arrays apply (see, e.g., [10, Corollary on p. 378]). This completes the proof of (4.3). □
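Here is a Monte Carlo sketch of Lemma 4.1 in one dimension. It assumes units with $s h = 1$ , σ = 1, and the Lanczos-3 kernel. It evaluates $f_{h} = \sum_{k} f_{k} χ_{k}$ at r = 8 points per cell, via zero insertion and convolution, and compares the spatial mean of $f_{h}^{2}$ with $‖ χ ‖^{2} σ^{2} ≈ 0.888 σ^{2}$ . This is the constant $c_{χ} σ^{2}$ appearing in (4.5); see also Remark 4.1(b). The integer translates of Lan3 are not exactly orthogonal, but in this experiment the cross terms $f_{k} f_{l}$ , $k \neq l$ , average out.

```python
import numpy as np

def lan3(x):
    """Lanczos-3 kernel H(3 - |x|) sinc(x) sinc(x/3); np.sinc(x) = sin(pi x)/(pi x)."""
    return np.where(np.abs(x) < 3, np.sinc(x) * np.sinc(x / 3), 0.0)

rng = np.random.default_rng(3)
sigma, N, r = 1.0, 100_000, 8     # r: evaluation points per grid cell (an arbitrary choice)
f_k = sigma * rng.standard_normal(N)

# f_h(x) = sum_k f_k Lan3(x - k) at x = j / r (units with s*h = 1): insert r - 1 zeros
# between the samples and convolve with the kernel values Lan3(t / r), |t| <= 3r.
upsampled = np.zeros(N * r)
upsampled[::r] = f_k
taps = lan3(np.arange(-3 * r, 3 * r + 1) / r)
f_h = np.convolve(upsampled, taps, mode="same")

interior = f_h[6 * r : -6 * r]        # drop a few cells at each end (boundary effects)
var_omega = np.mean(interior**2)      # Riemann sum for (1/|Omega|) int_Omega f_h^2 dx
print(var_omega, 0.888 * sigma**2)    # compare with c_chi sigma^2, c_chi = ||Lan3||^2
```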

By (4.3), $f_{h}$ is almost surely bounded in $L^{2}$ . Therefore, it almost surely has an associated microlocal defect measure, possibly not unique. In this paper, we regard every such semiclassical defect measure $d μ_{f} (x, ξ) ⩾ 0$ , defined in Section 2.5, as a spectral density of $f_{h}$ . In Theorem 4.1 below, however, we show that in the case we consider, the limit is unique and holds along every sequence $h \to 0$ .

One can see that $d μ_{f}$ makes sense as a variance density in the phase space. Indeed, for a domain Ω, consider the quantity $\begin{matrix} (4.6) & {VAR}_{Ω}^{0} (f) : = \frac{1}{| Ω |} \iint_{T^{*} Ω} d μ_{f} . \end{matrix}$ It corresponds formally to the limit of $(p^{w} (x, h D) f_{h}, f_{h})$ with p the characteristic function of Ω divided by $| Ω |$ . This would be the usual variance if ${lim}_{h \to 0} f_{h}$ existed. We do not claim that the latter limit exists. However, when f is white noise, the defect measure exists as a limit in the mean square sense, as we prove in Theorem 4.1 below. The superscript 0 in (4.6) is a reminder that this is a quantity in the limit $h \to 0$ . We emphasize that ${VAR}^{0}$ is simply defined by (4.6) and (4.8) below for any f for which $d μ_{f}$ exists, and it is not necessarily connected to any random f. When f is random (noise), ${VAR}^{0} (f)$ is related to it as in Theorem 4.1 below. We define the standard deviation $STD (f)$ as the square root of the variance $VAR (f)$ (with or without the superscript 0).

Assume now $\begin{matrix} (4.7) & d μ_{f} = γ_{f} d x d ξ \end{matrix}$ with some continuous $γ_{f} ⩾ 0$ . Then, letting Ω shrink to the point x, we set $\begin{matrix} (4.8) & {VAR}_{x}^{0} (f) : = \int γ_{f} (x, ξ) d ξ . \end{matrix}$ Hence ${VAR}_{x}^{0} (f)$ can be viewed as the asymptotic variance density of the noise at x.
4.2. A remark about the Wigner function

In this section, we relate the Wigner function to the defect measures at a heuristic level. For a noise $f$ satisfying Hypothesis 4.1, we have $\begin{matrix} (4.9) & (p^{w} (x, h D) f_{h}, f_{h}) = \int p (x, ξ) W_{f}^{h} (x, ξ) d x d ξ, \end{matrix}$ where $W_{f}^{h}$ is the Wigner function, see [2], $\begin{matrix} W_{f}^{h} (x, ξ) = {(2 π h)}^{- n} \int e^{- i z \cdot ξ / h} f_{h} (x + z / 2) {\bar{f}}_{h} (x - z / 2) d z . \end{matrix}$ Note that $W_{f}^{h} d x d ξ$ depends on h and is not a measure in general, since it may take negative values. However, the existence theorem for defect measures says that there exists at least one sequence $h_{j} \to 0$ along which $W_{f}^{h_{j}}$ converges to some $d μ$ . Moreover, we have $\begin{matrix} (4.10) & \int W_{f}^{h} (x, ξ) d ξ = {| f_{h} (x) |}^{2}, \quad \int W_{f}^{h} (x, ξ) d x = {(2 π h)}^{- n} {| F_{h} f_{h} (ξ) |}^{2} . \end{matrix}$ In [2], de Verdière considers random vector fields $f (x)$ , $x \in R^{n}$ , and defines their auto-correlation by $\begin{matrix} {ACor}_{f} (x, y) = E (f (x) \bar{f} (y)) . \end{matrix}$ He then defines the power spectrum of f by $\begin{matrix} P_{h} (x, ξ) = E (W_{f}^{h} (x, ξ)) . \end{matrix}$ This lifts the notion of power spectrum to the phase space, but the limit $h \to 0$ is not taken.

Following the steps of the forthcoming Theorem 4.1, and crucially using the fact that $E (f_{k} f_{l}) = σ^{2} δ_{k, l}$ , the patient reader can check that $\begin{matrix} (4.11) & \begin{matrix} E (p^{w} (x, h D) f_{h}, f_{h}) & = {(s h)}^{n} σ^{2} tr (Q (h)) \\ & = \frac{s^{n} σ^{2}}{{(2 π)}^{n}} (\iint {| \hat{χ} (s ξ) |}^{2} p (x, ξ) d x d ξ + O (h)), \end{matrix} \end{matrix}$ where Q is defined by (4.19). By (4.9), this gives the expected value of the Wigner function $W_{f}^{h}$ up to an $O (h)$ error, in the weak sense. Eventually, it could lead to the expected value of the defect measure, if we could take limits as $h \to 0$ in some reasonable probabilistic sense. This approach has several difficulties:

- we would have to treat and estimate the remainder as a measure applied to p;
- for a fixed realization $f_{k}$ , different subsequences $h_{j}$ could converge to different defect measures, while the expected value applies to all such sequences.

The latter is the main reason we do not pursue this approach. In addition, the Wigner function method characterizes the power spectrum of the noise after repeated experiments (in the temporal sense), while we want to study a single experiment (in the ergodic sense).

4.3. The defect measure of white noise

Let $f_{k}$ , $k \in Z^{n}$ , take values in R. As before, $Ω \subset R^{n}$ is a bounded domain. In the theorem below, given $h > 0$ , we associate a semiclassically band limited function $f_{h}$ with $\{ f_{k} \}$ by (3.15). This uses $| Ω | {(s h)}^{- n} (1 + o (1))$ terms of the sequence $f_{k}$ . We allow $\{ f_{k} \}$ to depend on h, which yields a triangular array of random variables.

The following theorem is the main technical result of this paper.

Theorem 4.1.
Assume that $\{ f_{k, h}; k \in Z^{n} \}$ is a noise satisfying Hypothesis 4.1 with $L^{4}$ moments only. That is, the random variables $f_{k}$ , $k \in Z^{n}$ , take values in R and are created by a white noise process with variance $σ^{2} > 0$ and a bounded fourth moment.
(a) Let $f_{h}^{δ}$ be the associated distribution given by (3.20) with some fixed $s > 0$ . Then for every $p \in C_{0}^{\infty} (T^{*} Ω)$ , $\begin{matrix} (4.12) & {(p^{w} (x, h D) f_{h}^{δ}, f_{h}^{δ})}_{L^{2}} ⟶ \int p (x, ξ) d μ_{f^{δ}} (x, ξ) \quad \text{as } h \to 0+ \text{ in the mean square sense}, \end{matrix}$ where $\begin{matrix} (4.13) & d μ_{f^{δ}} (x, ξ) = σ^{2} s^{n} \frac{d x d ξ}{{(2 π)}^{n}} . \end{matrix}$

(b) Let $f_{h}$ be the associated function given by (3.15) with some fixed $s > 0$ and with $\hat{χ} \in C_{0}^{\infty}$ not necessarily satisfying (3.19). Then for every $p \in C_{0}^{\infty} (T^{*} Ω)$ , $\begin{matrix} (4.14) & {(p^{w} (x, h D) f_{h}, f_{h})}_{L^{2}} ⟶ \int p (x, ξ) d μ_{f} (x, ξ) \quad \text{as } h \to 0+ \text{ in the mean square sense}, \end{matrix}$ where $\begin{matrix} (4.15) & d μ_{f} (x, ξ) = σ^{2} s^{n} {| \hat{χ} (s ξ) |}^{2} \frac{d x d ξ}{{(2 π)}^{n}} . \end{matrix}$

Proof.
Notice first that the left-hand side of (4.12) is well defined in the distribution sense, since the Schwartz kernel of $p^{w} (x, h D)$ , see (4.33), is in the Schwartz class. Let $\hat{χ} \in C_{0}^{\infty}$ be such that $p (x, ξ) \hat{χ} (s ξ) = p (x, ξ)$ . Then $f_{h}^{δ}$ can be replaced by $χ_{h} * f_{h}^{δ}$ as in Section 3.5, which is (3.15). Therefore, we only need to prove (b).

We start with the easier case in which (3.19) is satisfied (with $B < π / s$ ). This corresponds to the practical situation of restoring an oversampled function with added white noise, and the theorem describes how the noise propagates to the result.

Recall that the functions ${sinc}_{k}$ were defined in (3.8), and that $ϕ_{k} = {(s h)}^{- n / 2} {sinc}_{k}$ form an orthonormal basis of the space $1_{{[- π / s, π / s]}^{n}} (h D) L^{2} (R^{n})$ , as mentioned earlier. By (3.19), the interpolation function χ satisfies $\hat{χ} 1_{{[- π, π]}^{n}} = \hat{χ}$ . Therefore, $\begin{matrix} (4.16) & χ_{k} = \hat{χ} (s h D) {sinc}_{k} = {(s h)}^{n / 2} \hat{χ} (s h D) ϕ_{k} . \end{matrix}$ Since $Ω \supset {supp}_{x} p$ , we have $\begin{matrix} (4.17) & (p^{w} (x, h D) f_{h}, f_{h}) = \sum_{k, l \in K {(h)}^{2}} f_{k} f_{l} (p^{w} (x, h D) χ_{k}, χ_{l}) = \sum_{k, l \in K {(h)}^{2}} p_{k l} f_{k} f_{l} . \end{matrix}$ Here, as before, $K (h) = \{ k \in Z^{n} : s h k \in Ω \}$ and $K^{2} = K \times K$ , and $\begin{matrix} (4.18) & p_{k l} : = (p^{w} (x, h D) χ_{k}, χ_{l}) = {(s h)}^{n} (Q ϕ_{k}, ϕ_{l}), \end{matrix}$ with $\begin{matrix} (4.19) & Q (h) : = \bar{\hat{χ}} (s h D) p^{w} (x, h D) \hat{χ} (s h D) . \end{matrix}$ We shall prove in Lemma 4.2 that $| p_{k l} | ⩽ C {(s h)}^{n}$ . Our aim is to prove that, in the $L^{2} (X)$ sense, $\begin{matrix} (4.20) & lim_{h \to 0} {(p^{w} (x, h D) f_{h}, f_{h})}_{L^{2}} = \int p (x, ξ) d μ_{f} (x, ξ) = \frac{s^{n}}{{(2 π)}^{n}} σ^{2} \int p (x, ξ) {| \hat{χ} (s ξ) |}^{2} d x d ξ . \end{matrix}$ We split the proof of (4.20) into several steps.

Step 1: A decomposition. Split the summation in (4.17) into the terms $(k, l)$ on the diagonal $Δ : = \{ k = l \}$ and those away from it: $\begin{matrix} (4.21) & {(p^{w} (x, h D) f_{h}, f_{h})}_{L^{2}} = W_{1} + W_{2}, \end{matrix}$ where $\begin{matrix} (4.22) & W_{1} : = \sum_{k \in K (h)} p_{k k} f_{k}^{2}, \quad W_{2} : = \sum_{k, l \in K {(h)}^{2} ∖ Δ} p_{k l} f_{k} f_{l} . \end{matrix}$ Furthermore, according to (4.29) below, we have $\begin{matrix} (4.23) & \sum_{k \in K (h)} p_{k k} = \frac{s^{n}}{{(2 π)}^{n}} \int q (x, ξ) d x d ξ . \end{matrix}$ Thus, since $q = {| \hat{χ} (s ξ) |}^{2} p + O (h)$ , we can recast (4.21) as $\begin{matrix} (4.24) & {(p^{w} (x, h D) f_{h}, f_{h})}_{L^{2}} - \int p (x, ξ) d μ_{f} (x, ξ) = W_{1, 0} + W_{2} + O (h), \end{matrix}$ where the term $W_{1, 0}$ is defined by $\begin{matrix} W_{1, 0} = \sum_{k \in K (h)} (f_{k}^{2} - σ^{2}) p_{k k} . \end{matrix}$ It remains to prove that both $W_{1, 0}$ and $W_{2}$ in (4.24) converge to 0 in $L^{2} (X)$ .

Step 2: Analysis of $W_{1, 0}$ . The random variables $f_{k}^{2} - σ^{2}$ are independent and have zero expectation. Under our fourth moment assumption, they also have finite variance ${\tilde{σ}}^{2} = E (f_{k}^{4}) - σ^{4}$ . Hence $E (W_{1, 0}) = 0$ . Moreover, by inequality (4.28) below and the fact that $Card (K (h)) ⩽ c | Ω | {(s h)}^{- n}$ , we get $\begin{matrix} (4.25) & E (W_{1, 0}^{2}) = \sum_{k \in K (h)} {\tilde{σ}}^{2} p_{k k}^{2} ⩽ C^{2} {(s h)}^{2 n} {\tilde{σ}}^{2} c | Ω | {(s h)}^{- n} ⩽ C^{'} h^{n} . \end{matrix}$ Therefore, $W_{1, 0}$ converges to 0 in $L^{2} (X)$ as $h \to 0$ .

Step 3: Analysis of $W_{2}$ . The random variables $f_{k} f_{l}$ , $k \neq l$ , have zero expectation and variance $σ^{4}$ . Moreover, $f_{k} f_{l}$ and $f_{k^{'}} f_{l^{'}}$ are independent when $\{ k^{'}, l^{'} \} \cap \{ k, l \} = \emptyset$ . Otherwise they need not be independent, but they are still uncorrelated unless $\{ k^{'}, l^{'} \} = \{ k, l \}$ . Indeed, it suffices to check the case $k = k^{'}$ , $l \neq l^{'}$ , and then $E ((f_{k} f_{l}) (f_{k} f_{l^{'}})) = E (f_{k}^{2}) E (f_{l}) E (f_{l^{'}}) = 0$ , because all $f_{k}$ have zero expectation. The terms $(k, l)$ and $(l, k)$ contain the same product $f_{k} f_{l}$ ; grouping them and using $| a + b |^{2} ⩽ 2 | a |^{2} + 2 | b |^{2}$ together with (4.30), we get $\begin{matrix} (4.26) & E (| W_{2} |^{2}) ⩽ 2 σ^{4} \sum_{k, l \in K {(h)}^{2} ∖ Δ} | p_{k l} |^{2} ⩽ C h^{n} . \end{matrix}$ Therefore, $W_{2} \to 0$ in the mean square sense.

Summarizing our considerations so far, the proof of the case when (3.19) holds is easily achieved by plugging (4.25) and (4.26) into (4.24).

Step 4: Dropping the assumption (3.19). Let m be such that $supp \hat{χ} \subset {(- m π, m π)}^{n}$ , and let ${sinc}_{k}^{(m)}$ be as in (3.12). Then (4.16) takes the form, see also (3.18), $\begin{matrix} (4.27) & χ_{k} = m^{n} \hat{χ} (s h D) {sinc}_{k}^{(m)} = m^{n} {(s h / m)}^{n / 2} \hat{χ} (s h D) ϕ_{k}^{(m)} = {(s h m)}^{n / 2} \hat{χ} (s h D) ϕ_{k}^{(m)} . \end{matrix}$ The proof above needs the following modifications in this case. The deterministic coefficients in (4.22) are given by the same formula, but now $\begin{matrix} p_{k l} : = (p^{w} (x, h D) χ_{k}, χ_{l}) = {(s h m)}^{n} (Q ϕ_{k}^{(m)}, ϕ_{l}^{(m)}) . \end{matrix}$ The set $\{ ϕ_{k}^{(m)} \}$ is an orthonormal system in $1_{{[- m π / s, m π / s]}^{n}} (h D) L^{2} (R^{n})$ but not a basis, see Remark 3.1; the missing elements are those with fractional indices in $Z^{n} / m$ . Therefore, compared with the trace in Lemma 4.2, which uses a basis, the sum $\sum_{k} p_{k k}$ has many “gaps”. On the other hand, the extra factor $m^{n}$ in (4.27) allows us to regard each term $m^{n} p_{k k}$ as an approximation of the sum of all $m^{n}$ terms in a box of size one around k, which accounts for the missing terms. By (4.31), the error is $O (h^{n + 1})$ (multiplied by the constant $m^{n}$ ). Since $K (h) / m$ has $O ({(m / h)}^{n})$ points, this introduces an $O (h)$ error, and thus (4.14) is preserved. □

The following lemma was used in the proof above. Below, $‖ \cdot ‖_{HS}$ stands for the Hilbert–Schmidt norm.
Lemma 4.2.
For $p_{k l}$ defined by (4.18), we have $\begin{array}{ll} (4.28) & | p_{k l} | ⩽ C {(s h)}^{n}, \\ (4.29) & \sum_{k} p_{k k} = {(s h)}^{n} tr Q = \frac{s^{n}}{{(2 π)}^{n}} \int q (x, ξ) d x d ξ, \\ (4.30) & \sum_{k, l} | p_{k l} |^{2} = {(s h)}^{2 n} ‖ Q ‖_{HS}^{2} = \frac{s^{2 n} h^{n}}{{(2 π)}^{n}} \int {| q (x, ξ) |}^{2} d x d ξ, \end{array}$ where q is the complete symbol of the h-ΨDO Q in (4.19). Next, $\begin{matrix} (4.31) & \begin{matrix} p_{k l} & = (p^{w} (x, h D) χ_{k}, χ_{l}) \\ & = s^{2 n} h^{n} \iint \overset{ˇ}{p} (\frac{s h}{2} (x + y + k + l), s (x - y + k - l)) χ (x) χ (y) d x d y, \end{matrix} \end{matrix}$ where $\overset{ˇ}{p}$ is the inverse Fourier transform of p with respect to ξ.
Proof.
Inequality (4.28) follows directly from (4.18) and the fact that the operator norm of $p^{w} (x, h D)$ , and hence that of $Q (h)$ , is bounded uniformly in h, see, e.g., [19, Theorem 4.21]. By (3.19), $\hat{χ} (s h D)$ vanishes on $(Id - 1_{{[- π / s, π / s]}^{n}} (h D)) L^{2} (R^{n})$ , and hence so does Q. Therefore, adding an orthonormal basis of that space to the functions $ϕ_{k}$ in (4.18) changes nothing. The first equality in (4.29) then follows from the definition of the trace. The second equality follows from [3, Ch. 9].

To prove (4.30), write $\begin{matrix} (4.32) & ‖ Q ‖_{HS}^{2} = tr (Q^{*} Q) = \sum_{k} ‖ Q ϕ_{k} ‖^{2} = \sum_{k, l} {| (Q ϕ_{k}, ϕ_{l}) |}^{2} = {(s h)}^{- 2 n} \sum_{k, l} | p_{k l} |^{2}, \end{matrix}$ see also the proof of [16, Theorem VI.23]. This proves the first equality in (4.30). For the second one, notice that, by [3, Ch. 9] again, the Hilbert–Schmidt norm of a classical ΨDO $R : = r (x, D)$ is given by $\begin{matrix} ‖ R ‖_{HS}^{2} = \frac{1}{{(2 π)}^{n}} \int {| r (x, ξ) |}^{2} d x d ξ . \end{matrix}$ We can regard $Q (h)$ as a classical ΨDO by formally setting $r (x, ξ) = q (x, h ξ)$ . The change of variables $h ξ \mapsto ξ$ then gives $\begin{matrix} {‖ Q (h) ‖}_{HS}^{2} = \frac{1}{{(2 π h)}^{n}} \int {| q (x, ξ) |}^{2} d x d ξ . \end{matrix}$ Combining this with (4.32) completes the proof of (4.30).

Finally, the Schwartz kernel of $p^{w} (x, h D)$ is given by $\begin{matrix} (4.33) & h^{- n} \overset{ˇ}{p} ((x + y) / 2, (x - y) / h), \end{matrix}$ and $\overset{ˇ}{p}$ is in the Schwartz class. Then $\begin{matrix} (p^{w} (x, h D) χ_{k}, χ_{l}) = h^{- n} \iint \overset{ˇ}{p} (\frac{x + y}{2}, \frac{x - y}{h}) χ (\frac{1}{s h} (x - s h k)) χ (\frac{1}{s h} (y - s h l)) d x d y . \end{matrix}$ The change of variables $\tilde{x} = (x - s h k) / (s h)$ , $\tilde{y} = (y - s h l) / (s h)$ , i.e., $x = s h (\tilde{x} + k)$ , $y = s h (\tilde{y} + l)$ , yields (4.31). □
Remark 4.1.
(a) The presence of the parameter s in (4.15) is to be expected. The random sequence $f_{k}$ is not related to any distance scale, while $s h$ is the distance between two adjacent points on the sampling grid after we associate $f_{k}$ to $f_{h}$ . Then s reflects the choice of that scale.

(b) For every x, we have, see (4.8), $\begin{matrix} (4.34) & {VAR}_{x}^{0} (f) = \int γ_{f} (x, ξ) d ξ = \frac{σ^{2}}{{(2 π)}^{n}} ‖ \hat{χ} ‖^{2} = σ^{2} ‖ χ ‖^{2}, \end{matrix}$ in the mean square sense, see also (4.10). In particular, if χ is a product of sinc functions, we get $σ^{2}$ . That is, in the limit, $f_{h}$ has the same variance as the $f_{k}$ . If $χ = Lan3$ , then $‖ χ ‖^{2} \approx 0.888$ in one dimension. In dimension n, χ is a product of such factors $χ (x_{j})$ , so the factor becomes $‖ χ ‖^{2 n}$ , with $‖ χ ‖$ the one-dimensional norm. Therefore, ${STD}_{x}^{0} (f) \approx {0.94}^{n} σ$ . Note that there is no dependence on s here. For the linear interpolation, $‖ χ ‖^{2} = 2 / 3$ , therefore ${STD}_{x}^{0} (f) = {(2 / 3)}^{n / 2} σ \approx {0.816}^{n} σ$ . All these equalities are mean square limits in the sense of the theorem.

(c) If we are interested in the expected value of the variance in repeated experiments, the equivalent of (4.34) is easy to get. We can think of $f_{h}$ as a linear operator, say Ψ, applied to $f = \{ f_{k} \}$ , i.e., $f_{h} = Ψ f$ . Then, since $E (f_{k} f_{l}) = σ^{2} δ_{k, l}$ , $\begin{matrix} E (‖ f_{h} ‖^{2}) = E (Ψ^{*} Ψ f, f) = σ^{2} tr (Ψ^{*} Ψ) = σ^{2} ‖ Ψ ‖_{HS}^{2}, \end{matrix}$ where the latter norm is the Hilbert–Schmidt one. The equivalent of (4.34) can be derived from this formula. That requires repeated experiments, however.

(d) The variance (4.6) is like the l.h.s. of (4.14) with p being the characteristic function of Ω divided by its volume. The theorem requires p to be smooth though, so we may think of (4.6) as an approximation of $(p f_{h}, f_{h})$ with $p \in C_{0}^{\infty} (Ω)$ (independent of ξ) approximating that normalized characteristic function.

(e) Theorem 4.1 says that the noisy $| {\hat{f}}_{h} |^{2}$ in (3.15) converges in weak sense to $\frac{s^{2 n} h^{n}}{{(2 π)}^{n}} σ^{2} | \hat{χ} (s ξ) |^{2}$ .

(f) We can also allow the noise to be inhomogeneous, for example by replacing $f_{k, h}$ with $ζ (s h k) f_{k, h}$ for some smooth ζ. This case can be handled as explained in Section 7.1, with $g = ζ$ ; the problem with $\nabla g$ described there does not arise here. This introduces the extra factor $| ζ (x) |^{2}$ in (4.15). In principle, one can also consider noise that is inhomogeneous in phase space, with ζ being a suitably sampled ΨDO or h-ΨDO.
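The constants in Remark 4.1(b) and the trace identity in Remark 4.1(c) can be checked with a short script. The sketch below is one-dimensional and assumes units with $s h = 1$ . It does two things:

- it computes $‖ χ ‖^{2}$ for the Lanczos-3 and linear (hat) kernels by a midpoint sum;
- for a hypothetical Lanczos-3 upsampling matrix Ψ (from N = 50 samples to 2N points) and σ = 2, it compares a Monte Carlo estimate of $E (‖ Ψ f ‖^{2})$ with $σ^{2} ‖ Ψ ‖_{HS}^{2}$ .

```python
import numpy as np

def lan3(x):
    """Lanczos-3 kernel; np.sinc(x) = sin(pi x) / (pi x)."""
    return np.where(np.abs(x) < 3, np.sinc(x) * np.sinc(x / 3), 0.0)

# (b): ||chi||^2 in one dimension by a midpoint sum over the supports.
dx = 1e-5
x = np.arange(-3.0, 3.0, dx) + dx / 2
norm2_lan3 = np.sum(lan3(x) ** 2) * dx
norm2_linear = np.sum(np.maximum(1 - np.abs(x), 0.0) ** 2) * dx
for n in [1, 2, 3]:
    # STD_x^0(f) / sigma = ||chi||^n for a tensor-product kernel in dimension n.
    print(n, norm2_lan3 ** (n / 2), norm2_linear ** (n / 2))

# (c): E ||Psi f||^2 = sigma^2 ||Psi||_HS^2 for i.i.d. f with variance sigma^2.
N, sigma, M = 50, 2.0, 20_000
Psi = lan3(np.arange(2 * N)[:, None] / 2 - np.arange(N)[None, :])  # Psi[j, k] = Lan3(j/2 - k)
rng = np.random.default_rng(4)
f = sigma * rng.standard_normal((M, N))            # M independent experiments
monte_carlo = np.mean(np.sum((f @ Psi.T) ** 2, axis=1))
exact = sigma**2 * np.sum(Psi**2)                  # squared Hilbert-Schmidt (Frobenius) norm
print(monte_carlo, exact)
```

The first column should reproduce the factors $≈ 0.94^{n}$ and $≈ 0.816^{n}$ quoted in (b). The two numbers in the last line should agree to within the Monte Carlo error.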

In Fig. 2, we present a one-dimensional numerical example; two-dimensional ones are shown in Sections 5 and 6. We take a discrete $f$ with $N = 100$ components, upsample it to a 200 point grid with the Lanczos-3 algorithm, and plot $| \hat{f} |$ , where the hat stands for the Discrete Fourier Transform. We also plot the same quantity computed as the square root of $| \hat{f} |^{2}$ averaged over $10^{2}$ and $10^{5}$ experiments. All curves are shown for frequencies in $[0, 100]$ . This illustrates (4.11). The limiting profile looks very close to the profile in Fig. 1, right, as expected from Remark 4.1(e) above. At the right end of the plot, it is not as close to zero as the profile in Fig. 3, because of the $O (1 / N)$ error in (4.11); here N is only 100. The plot on the right is essentially the expected value of the Wigner function $W_{f}^{h}$ .

Fig. 2.
Plot of $| \hat{f} |$ for $N = 100$ , with $| \hat{f} |^{2}$ averaged over 1, $10^{2}$ , and $10^{5}$ experiments.
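Here is a sketch of the experiment behind Fig. 2. The following assumptions are ours:

- units with $s h = 1$ and σ = 1;
- Lanczos-3 upsampling from N = 100 to 2N points, written as a matrix;
- NumPy's `rfft` as the Discrete Fourier Transform;
- $10^{4}$ rather than $10^{5}$ experiments, to keep the run short.

Away from boundary effects, dividing the averaged profile by $2 \sqrt{N}$ approximates $| \hat{χ} (2 π j / N) |$ at frequency bin j, which is the profile of Fig. 1.

```python
import numpy as np

def lan3(x):
    """Lanczos-3 kernel H(3 - |x|) sinc(x) sinc(x/3); np.sinc(x) = sin(pi x)/(pi x)."""
    return np.where(np.abs(x) < 3, np.sinc(x) * np.sinc(x / 3), 0.0)

N = 100
# Psi[j, k] = Lan3(j/2 - k): maps N samples to the values of f_h at x = j/2 (units s*h = 1).
Psi = lan3(np.arange(2 * N)[:, None] / 2 - np.arange(N)[None, :])

def averaged_profile(experiments, rng):
    """Square root of |DFT|^2 of the upsampled noise, averaged over independent experiments."""
    f = rng.standard_normal((experiments, N))           # sigma = 1
    power = np.abs(np.fft.rfft(f @ Psi.T, axis=1)) ** 2  # frequency bins j = 0, ..., N
    return np.sqrt(power.mean(axis=0))

rng = np.random.default_rng(5)
profiles = {M: averaged_profile(M, rng) for M in [1, 100, 10_000]}

# Normalized profile: approximately |hat Lan3(2 pi j / N)| away from boundary effects.
normalized = profiles[10_000] / (2 * np.sqrt(N))
low = normalized[1:11].mean()     # low frequencies, where |hat Lan3| is close to 1
high = normalized[-10:].mean()    # frequencies near 2 pi, where |hat Lan3| is small
print(low, high)
```

The profile for a single experiment is very noisy. Averaging over more experiments smooths it toward the expected value, in line with (4.11).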

In Fig. 3, the setup is as above, but we show the smoothing effect of averaging the power spectrum within a single experiment, which illustrates (4.14). To this end, we consider $f$ with $N = 10^{2}$ , $10^{4}$ , and $10^{6}$ components. The frequency interval is divided into 25 subintervals, and the spectrum is averaged over each of them, similarly to Fig. 16. The plot on the left is very close to the plot of the modulus of the Fourier transform of the Lanczos-3 filter in Fig. 1.

Fig. 3.
Plot of $| \hat{f} |$ with a single experiment, for $N = 10^{2}, 10^{4}, 10^{6}$ , with averaging over 25 subintervals.
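Here is a sketch of the single-experiment smoothing behind Fig. 3, under the same assumptions as above: $s h = 1$ , σ = 1, Lanczos-3 upsampling to 2N points (implemented here by zero insertion and convolution), and NumPy's `rfft`. The power spectrum of one realization is averaged over 25 frequency subintervals of [0, N] and normalized so that it approximates $| \hat{χ} |$ . The relative spread over the first 8 bins, where $| \hat{χ} | ≈ 1$ , decreases as N grows. This reflects the mean square convergence in (4.14).

```python
import numpy as np

def lan3(x):
    """Lanczos-3 kernel H(3 - |x|) sinc(x) sinc(x/3); np.sinc(x) = sin(pi x)/(pi x)."""
    return np.where(np.abs(x) < 3, np.sinc(x) * np.sinc(x / 3), 0.0)

def upsample_lan3(f):
    """Values of sum_k f_k Lan3(x - k) at x = 0, 1/2, 1, ... (2x upsampling, units s*h = 1)."""
    up = np.zeros(2 * len(f))
    up[::2] = f                                  # zero insertion
    taps = lan3(np.arange(-6, 7) / 2)            # Lan3 at half-integers in [-3, 3]
    return np.convolve(up, taps, mode="same")

def binned_profile(N, rng, bins=25):
    """One experiment: |DFT|^2 averaged over `bins` subintervals of [0, N), normalized by 4N."""
    power = np.abs(np.fft.rfft(upsample_lan3(rng.standard_normal(N)))) ** 2
    chunks = np.array_split(power[:N], bins)
    return np.sqrt(np.array([c.mean() for c in chunks]) / (4 * N))

rng = np.random.default_rng(6)
spread = {}
for N in [10**2, 10**4, 10**6]:
    profile = binned_profile(N, rng)
    flat = profile[:8]             # frequencies below about 0.64 pi, where |hat Lan3| is close to 1
    spread[N] = flat.std() / flat.mean()
    print(N, np.round(profile[:10], 3), round(spread[N], 4))
```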
4.4. Microlocal defect measure of more general noise

We now consider more general noise. First, we allow the random variables $f_{k,h}$ to be correlated with neighboring ones; second, we allow this correlation to depend on the position. Since $f_{k,h}$ is located at $x_k = shk$, this more general noise is assumed to satisfy the following hypothesis.

Hypothesis 4.2.
For every $h > 0$, the noise is modeled by a family $\{f_{k,h};\ k \in \mathbb{Z}^n\}$ of real-valued random variables defined on the same probability space $(X, \mathcal{F}, P)$ with zero expected values. They are all assumed to satisfy (4.1) with a uniform bound. For the autocorrelation $\operatorname{ACor}(f_{k,h}, f_{k+m,h})$, we assume
$$\operatorname{ACor}(f_{k,h}, f_{k+m,h}) = β(skh, m), \tag{4.35}$$
where $β(x, m)$, $x \in \mathbb{R}^n$, $m \in \mathbb{Z}^n$, is smooth in $x$ and supported in a bounded set with respect to both variables.

In particular, we no longer require the $f_{k,h}$ to have the same variance; in general, they are not identically distributed.

Let
$$\check{β}(x, ξ) = \sum_m e^{i m\cdot ξ} β(x, m) \tag{4.36}$$
be the inverse Fourier series of $β$ with respect to the $m$ variable, so that $\check{β}(x, sξ) = \sum_m e^{ism\cdot ξ} β(x, m)$ is the function appearing in (4.37) below. In the limit $h \to 0$, this is essentially the Wigner distribution related to the autocorrelation. Since $β(s(k+m)h, -m) = β(skh, m)$, letting $h \to 0$ and using the smoothness of $β$ in $x$, we must have $β(x, m) = β(x, -m)$ for all $(x, m)$. Then (4.36) is just a cosine series, and in particular real. Theorem 4.2 below shows that it is in fact non-negative.
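As a concrete check of the symmetry and non-negativity just stated, consider a one-dimensional, position-independent example (an assumption made only for illustration): moving-average noise $f_k = \sum_j c_j w_{k-j}$ built from i.i.d. $w_k$ with variance $σ^2$ and hypothetical weights $c_j$. Its autocorrelation is $β(m) = σ^2 \sum_j c_j c_{j+m}$, which is even in $m$, and the series $\sum_m β(m) e^{imθ}$ (for $θ = sξ$ this is the sum $\sum_m e^{ism\cdot ξ} β(x, m)$ appearing in the proof of Theorem 4.2) equals $σ^2 |\sum_j c_j e^{ijθ}|^2$, so it is real and non-negative. The sketch below verifies this numerically.

```python
import math


def moving_average_autocorrelation(c, sigma=1.0):
    """beta(m) = E(f_k f_{k+m}) = sigma^2 sum_j c_j c_{j+m} for the noise
    f_k = sum_j c_j w_{k-j}, with w i.i.d., mean zero, variance sigma^2."""
    J = len(c)
    return {m: sigma ** 2 * sum(c[j] * c[j + m] for j in range(J) if 0 <= j + m < J)
            for m in range(-(J - 1), J)}


def beta_check(beta, theta):
    """sum_m beta(m) e^{i m theta}; since beta(-m) = beta(m) this is a real cosine series."""
    return sum(b * math.cos(m * theta) for m, b in beta.items())


def transfer_power(c, theta, sigma=1.0):
    """sigma^2 |sum_j c_j e^{i j theta}|^2, which is manifestly non-negative."""
    re = sum(cj * math.cos(j * theta) for j, cj in enumerate(c))
    im = sum(cj * math.sin(j * theta) for j, cj in enumerate(c))
    return sigma ** 2 * (re * re + im * im)


c = [1.0, 0.5, -0.25]          # hypothetical weights
beta = moving_average_autocorrelation(c)
print(beta)
print(min(beta_check(beta, 2 * math.pi * k / 100) for k in range(100)))
```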

The generalization of Theorem 4.1 to this case is the following.

Theorem 4.2.
Assume that $\{f_{k,h};\ k \in \mathbb{Z}^n\}$ is a noise satisfying Hypothesis 4.2, with $L^4$ moments only. Let $f_h$ be the associated function given by (3.15) with some fixed $s > 0$ and with $\hat{χ} \in C_0^\infty$ not necessarily satisfying (3.19). Then (4.14) remains true with
$$dμ_f(x, ξ) = \frac{s^n}{(2π)^n}\, \check{β}(x, sξ)\, |\hat{χ}(sξ)|^2\, dx\, dξ. \tag{4.37}$$
Proof.
We follow the proof of Theorem 4.1. We replace the diagonal $Δ$ there by $Δ = \{(k, l);\ |k - l| \le M\}$, where $M$ is such that $β(\cdot, m) = 0$ for $|m| > M$. As before, the off-$Δ$ terms do not contribute to the limit (4.14). For the remaining terms, we estimate their contribution for every fixed $m$ and then sum up the results. The analog of $W_1$, now depending on $m$, is
$$W_1 = \sum_{k \in K(h)} p_{k,k+m} f_k f_{k+m} = \sum_{k \in K(h)} β(skh, m)\, p_{k,k+m} + W_{1,0}, \tag{4.38}$$
where
$$W_{1,0} = \sum_{k \in K(h)} \big(f_k f_{k+m} - β(skh, m)\big)\, p_{k,k+m}.$$
The analysis of $W_{1,0}$ is similar to that in the proof of Theorem 4.1: the random variables $f_k f_{k+m} - β(skh, m)$ have zero expectation, thus $E(W_{1,0}) = 0$, and they have uniformly bounded variances. To estimate $E(W_{1,0}^2)$, notice that only $O(m^2 h^{-n})$ terms in the expansion have a non-zero expectation; by (4.28), $E(W_{1,0}^2) = O(h^n)$ again. It remains to compute the $β$ term in (4.38).

Recall the definition (4.18) of $p_{kl}$. With $l = k + m$ there, an easy calculation shows that $Qϕ_{k+m} = q_m(x, hD)ϕ_k$ for any h-ΨDO $Q = q(x, hD)$, with $q_m(x, ξ) = e^{ism\cdot ξ} q(x + shm, ξ)$; this is a symbol as well, since there is no $h$ in the phase. Its principal symbol is just $e^{ism\cdot ξ} q(x, ξ)$. Then the $β$ term in (4.38) takes the form
$$(sh)^n \sum_{k \in K(h)} \big(β(skh, m)ϕ_k,\ q_m(x, hD)ϕ_k\big).$$
By the properties of $ϕ_k$, recall (3.8) and (3.9), replacing $β(skh, m)$ above with $β(x, m)$ results in an $O(sh)$ error in each term and a total error $O(h)$. After this replacement, moving the $β$ factor to the right, we get a quadratic form with $q$ multiplied by $β(x, m)$:
$$(sh)^n \sum_{k \in K(h)} \big(ϕ_k,\ \tilde{q}_m(x, hD)ϕ_k\big),$$
where $\tilde{q}_m(x, ξ) = e^{ism\cdot ξ} β(x, m) q(x, ξ)$.

So far, $m$ was fixed. Summing over $m$ (there are $2M + 1$ such terms), we arrive at the situation of the proof of Theorem 4.1 with $q$ replaced by
$$\sum_m e^{ism\cdot ξ} β(x, m) q(x, ξ) = \check{β}(x, sξ)\, q(x, ξ).$$
The theorem then follows as in the proof of Theorem 4.1. □

4.5. Spectral density under an FIO

We want to find out how a spectral density transforms under the action of a classical FIO of order $m$. This question is easier to answer for semiclassical FIOs, since defect measures are a semiclassical object; we will reduce the classical case to the semiclassical one.

Theorem 4.3.
Let $A$ be a classical FIO of order $m$ on $\mathbb{R}^n$ with a homogeneous principal symbol, associated with a canonical relation which is the graph of a local diffeomorphism $κ$ (called the canonical transformation of $A$). Let $f = f_h$ be semiclassically band limited and uniformly bounded in $L^2$. Then for every defect measure $dμ_f$ given as the limit (2.7) for some sequence $h = h_j \to 0$, the defect measure $dμ_{h^m Af}$ associated with the same sequence $h_j$ exists as well, and it satisfies
$$dμ_{h^m Af} = (κ^{-1})^*(b\, dμ_f) \quad \text{on } T^*\mathbb{R}^n \setminus 0,$$
where $b$ is the (classical) principal symbol of $A^*A$.
Proof.
By (2.7),
$$\begin{aligned} \int p(x, ξ)\, dμ_{h^m Af} &= \lim_{h = h_j \to 0} \big(p(x, hD)\, h^m Af_h,\ h^m Af_h\big)_{L^2} \\ &= \lim_{h = h_j \to 0} \big(h^m A^* p(x, hD)\, h^m Af_h,\ f_h\big)_{L^2}. \end{aligned} \tag{4.39}$$
Since we need to find $dμ_{h^m Af}$ away from the zero section, it is enough to assume that $p = 0$ near $ξ = 0$.

If for a moment we ignore the need to cut off near $ξ = 0$, then we can think of $A$ as in (2.5) as an h-FIO with symbol $a(x, ξ/h) = h^{-m} a(x, ξ)$ for $|ξ| \gg 1$. Then by the semiclassical Egorov theorem [12, Theorem 5.5.5], which is an analog of the classical one (Theorem 25.3.5 in [9]), we would get
$$\big(h^m A^* p(x, hD)\, h^m Af_h,\ f_h\big)_{L^2} = (Qf_h, f_h)_{L^2}, \tag{4.40}$$
where $Q$ is an h-ΨDO with principal symbol $b\,(p \circ κ)$, $b$ is the (classical) principal symbol of $A^*A$, and $κ$ is the canonical transformation of $A$. Note that the canonical relations of $A$ and of its semiclassical version after the change $ξ \mapsto ξ/h$ are the same.

To deal with the fact that we have a classical FIO and a semiclassical ΨDO, we apply Theorem 2.1. Let $A = A_{h,ε} + R_{h,ε}$ be as in (2.6). For $ε \ll 1$, the remainder $R_{h,ε}$ contributes an $O(h^\infty)$ error to (4.39) if we replace $A$ there by $A_{h,ε}$, because $p = 0$ near $ξ = 0$; we may therefore make this replacement. Then $A_{h,ε}$ is an h-FIO, see (2.6), with symbol $\tilde{a} := a(x, η/h)(1 - ψ(η/ε)) \in h^{-m} S^0$ supported where $|η| \ge ε$. On the support, $|η|/h \ge ε/h$, and there $a(x, η/h)$ is homogeneous for $h \ll 1$; therefore $\tilde{a} = h^{-m} a(x, η)(1 - ψ(η/ε))$.

Then we can apply the semiclassical version of Egorov's theorem [12, Theorem 5.5.5]. For that, we need to compare the principal symbol of the h-ΨDO $A_{h,ε}^* A_{h,ε}$ with that of the classical ΨDO $A^*A$, and see how the cutoff $(1 - ψ(η/ε))$ near the zero section affects it.

The principal symbol of $A_{h,ε}^* A_{h,ε}$ is given by
$$c(x, ξ, h) = \big|\tilde{a}\big(π_1 \circ κ(x, ξ), ξ, h\big)\big|^2 J(x, ξ),$$
where $π_1$ is the projection on the first variable, and $J > 0$ is a smooth Jacobian, homogeneous of order zero in $ξ$, depending on the phase function only. For $|ξ| > 2ε$, we have $\tilde{a} = h^{-m} a(x, η)$; therefore
$$c(x, ξ, h) = h^{-2m} \big|a\big(π_1 \circ κ(x, ξ), ξ\big)\big|^2 J(x, ξ), \quad |ξ| \ge 2ε.$$
Without the factor $h^{-2m}$, this is also the principal symbol of $A^*A$ as a classical ΨDO. Therefore, the limit of (4.40), as $h = h_j \to 0$, is
$$\int b\,(p \circ κ)(x, ξ)\, dμ_f$$
as long as $p = 0$ for $|ξ| \le 2ε$. Making the change of variables $κ(x, ξ) = (y, η)$ and using the fact that $κ$ is symplectic, in particular volume preserving, we get
$$\int p(x, ξ)\, dμ_{h^m Af} = \int p(x, ξ)\, (κ^{-1})^*(b\, dμ_f)$$
when $p = 0$ for $|ξ| \le 2ε$, where $(κ^{-1})^*$ is the pull-back under $κ^{-1}$. Since $ε > 0$ is arbitrary, this holds whenever $0 \notin \operatorname{supp}_ξ p$. Then
$$dμ_{h^m Af} = (κ^{-1})^*(b\, dμ_f). \tag{4.41}$$
So far, $A$ was microlocalized near a pair of points where $κ$ is a (global) diffeomorphism. Since $κ$ is only a local diffeomorphism, we can do the same for each branch and add the results. Then $b$ is the principal symbol of $A^*A$ with all branches combined, as stated. □

Note, in particular, that if (4.7) holds, then $(κ^{-1})^*(γ_f\, dx\, dξ) = (γ_f \circ κ^{-1})\, dy\, dη$, because $κ$ preserves the volume.
Remark 4.2.
The proof also implies that if $Q = q(x, hD)$ and $R = r(x, hD)$ are h-ΨDOs, then
$$dμ_{h^m QARf} = |q|^2 (κ^{-1})^*\big(b\, |r|^2 dμ_f\big) \quad \text{on } T^*\mathbb{R}^n \setminus 0, \tag{4.42}$$
where $b$ still denotes the principal symbol of $A^*A$.
Example 2.
Take $R = r(x)$ (i.e., a multiplication operator) with $r$ smooth. Then, up to $O(h)$, equality (3.15) for $rf_h$ takes a similar form, but with $f_{k,h}$ replaced by $r(shk)f_{k,h}$. This is an example of non-homogeneous noise, depending on the position: Theorem 4.1 applies to $f_h$, and the measure of $rf_h$ is then given by (4.42).
Example 3.
Let $R$ be the convolution with $h^{-n}ψ(x/h)$ for some $ψ \in C_0^\infty$. This is an h-ΨDO with symbol $\check{ψ}(ξ)$; therefore we get the factor $|r|^2 = |\check{ψ}(ξ)|^2$ in (4.42). An elementary computation shows that, up to $O(h)$, $Rf_h$ is obtained from $\tilde{f}_{k,h} = \sum_m ψ(s(k - m)) f_{m,h}$. These are, in general, correlated random variables; they model sensors with cross-talk. Theorem 4.1 then applies, with the measure given by (4.42).

Both examples are also covered by Theorem 4.2 if we regard $Rf$ as generated by the noise $r(shk)f_{k,h}$, respectively by the correlated noise $\tilde{f}_{k,h}$.
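The cross-talk model of Example 3 can be simulated in a discrete, one-dimensional sketch. The assumptions are ours: periodic indices (to avoid boundary effects), hypothetical weights $c_j$ in place of the samples $ψ(s(k - m))$, and Gaussian noise. Averaging the periodogram of $g_k = \sum_j c_j f_{k-j}$ over many experiments should reproduce $σ^2|\sum_j c_j e^{-ijθ}|^2$, which plays the role of the factor $|\check{ψ}|^2$ in (4.42) and of $\check{β}$ in Theorem 4.2.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT: returns sum_k x_k exp(-2 pi i j k / n); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def crosstalk_power(c, n=128, experiments=300, sigma=1.0, seed=2):
    """Averaged periodogram |g_hat|^2 / n of g_k = sum_j c_j f_{k-j} (indices mod n),
    where f is i.i.d. N(0, sigma^2) noise; n must be a power of 2."""
    rng = random.Random(seed)
    acc = [0.0] * n
    for _ in range(experiments):
        f = [rng.gauss(0.0, sigma) for _ in range(n)]
        g = [sum(cj * f[(k - j) % n] for j, cj in enumerate(c)) for k in range(n)]
        for freq, z in enumerate(fft(g)):
            acc[freq] += abs(z) ** 2 / n
    return [a / experiments for a in acc]


def predicted_power(c, n=128, sigma=1.0):
    """sigma^2 |sum_j c_j e^{-2 pi i j freq / n}|^2 at the DFT frequencies."""
    return [sigma ** 2 * abs(sum(cj * cmath.exp(-2j * math.pi * j * freq / n)
                                 for j, cj in enumerate(c))) ** 2
            for freq in range(n)]


c = [1.0, 0.5, -0.25]          # hypothetical cross-talk weights
ratios = [e / p for e, p in zip(crosstalk_power(c), predicted_power(c))]
print(round(sum(ratios) / len(ratios), 3))
```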
4.6. Back to the inverse problem

We now return to the inverse problem (1.1). Let $A$ be an FIO as in Theorem 4.3, and elliptic. More precisely, let $Ω \subset \mathbb{R}^n$ be a bounded domain, and let $Ω'$ be another such domain so that the canonical transformation $κ$ of $A$ maps $T^*Ω$ into $T^*Ω'$. By a compactness argument, if $A$ is first defined as $A: \mathcal{E}'(Ω) \to \mathcal{E}'(\mathbb{R}^n)$, then the range of $κ$ projected to its base variable is a bounded set, so such an $Ω'$ exists. Outside $\bar{Ω}'$, the image of $A$ is smooth. The measurement $g$, supposedly equal to $Af$ for some $f \in \mathcal{E}'(Ω)$ but corrupted by noise, is a function defined in $Ω'$. Then (1.1) is microlocally solvable: $f = A^{-1}g$ (there is no problem with $g$ not being in the range because $A^{-1}$ is a parametrix), and we are in the situation above with $A$ replaced by $A^{-1}$. The added noise is given by (1.4). Dropping the subscript “noise” as before, we assume that $g$ is given first as discrete noise $\{g_k\}$ and then converted to a semiclassically band limited function $g$ as in (3.15). Then
$$A^{-1}g = \sum_{shk \in Ω'} g_k A^{-1}χ_k.$$
We have not defined noise formally, but we can regard this as noise because it is a linear combination of $\{A^{-1}χ_k\}$ with random coefficients. It has zero mean in the sense of (4.2). Then
$$dμ_{h^{-m}A^{-1}g} = κ^*\big(b^{-1} dμ_g\big) \quad \text{on } T^*\mathbb{R}^n \setminus 0, \tag{4.43}$$
where $κ$ is the canonical transformation of $A$ and $b$ is the principal symbol of $AA^*$. Applying Egorov's theorem again, now to the operator $A^*(AA^*)A = (A^*A)^2$, we see that its principal symbol is that of $A^*A$ multiplied by $b \circ κ$. Therefore, $b \circ κ$ is the principal symbol of $A^*A$.

The defect measure (4.43) then describes the power spectrum of the noise in the reconstruction away from the zero section $ξ = 0$. We cannot expect an estimate near the zero section in this case, since $A$ may not even be injective. For example, the interior region-of-interest problem for the Radon transform in the plane has no unique solution, and the practical solution is a parametrix. Then every element of the kernel is smooth and could be considered as noise with zero frequency.

The next theorem is a direct consequence of (4.43). The operator $Q$ is needed to cut off the zero section, and $R$ is a filter which we may want to apply to the data; see also the next section. Below, $σ_p(Q)$ stands for the principal symbol of $Q$.

Theorem 4.4.
Let $A$ be as above, and elliptic, and let $g = g_h$ be semiclassically band limited with $\mathrm{WF}_h(g) \subset T^*Ω'$, uniformly bounded in $L^2(Ω')$. If $R = r(x, hD)$ is any h-ΨDO in $Ω'$ with an h-independent symbol, and if $Q = q(x, hD)$ is a similar h-ΨDO in $Ω$ with $q = 0$ near the zero section, then
$$\begin{aligned} \mathrm{VAR}_Ω^0\big(Qh^{-m}A^{-1}Rg\big) &= \frac{1}{|Ω|}\int_{T^*Ω} |q|^2\, σ_p(A^*A)^{-1}\, κ^*\big(|r|^2 dμ_g\big) \\ &= \frac{1}{|Ω|}\int_{T^*Ω'} |q \circ κ^{-1}|^2\, σ_p(AA^*)^{-1}\, |r|^2\, dμ_g \end{aligned} \tag{4.44}$$
for every $g$ (called $f$ there) as in Theorem 4.3.
Proof.
By Remark 4.2 about Theorem 4.3 and (4.6),
$$\mathrm{VAR}_Ω^0\big(Qh^{-m}A^{-1}Rg\big) = \frac{1}{|Ω|}\int_{T^*Ω} |q|^2\, κ^*\big(b^{-1}|r|^2 dμ_g\big).$$
Since $κ^*(b^{-1}\,\cdot\,) = (b \circ κ)^{-1}κ^*(\,\cdot\,)$ and $b \circ κ = σ_p(A^*A)$, this is the first equality of the theorem. Making the change of variables $(y, η) = κ(x, ξ)$, where $(y, η)$ are the variables in the phase space of $g$, and using the fact that $κ$ is symplectic, and therefore volume preserving, we get the second one. □

A typical use of this theorem is to take $q$ that smoothly cuts off a small neighborhood of the zero section. Then, for $g$ being white noise, for example, the effect of that cutoff on the right-hand side is small. If we then formally take $q = 1$, hence $Q = \mathrm{Id}$, we get, by Theorem 4.1, a good approximation of the variance of the noise in the reconstruction, excluding the zero-frequency noise. The operator $R$ plays the role of a filter applied before the inversion.

We want to emphasize that $g$ in Theorem 4.4 does not need to be white noise; we just need a well-defined $dμ_g$, which is the case for noise satisfying Hypothesis 4.2, by Theorem 4.2.

Remark 4.3.
In some situations, like those in the next two sections, the requirement $q = 0$ near the zero section can be dropped, and the operator $Q$ can be removed altogether (replaced by $\mathrm{Id}$). Assume that the filter $r$ is compactly supported in the dual variable; since we deal with semiclassically band limited $g$, we can always assume that. Assume also that $σ_p(A^*A)^{-1}κ^* dμ_g$ is absolutely continuous near the zero section. In the case of the Radon transform in parallel geometry in the next section, for example, with $g$ being white noise, that measure is $C|ξ|\,dx\,dξ$, so this assumption is satisfied. Then the first integral in (4.44) has a limit when $q$ (a priori vanishing near $ξ = 0$) tends to 1, and that limit is given by the same formula with $q = 1$. The left-hand side then has the same limit too, because it is defined by that integral, see (4.6). A similar remark applies to the second integral.

5. The Radon transform in “parallel geometry”

We now apply the theory to the Radon transform. We study the parallel geometry parameterization first, where each (directed) line is parameterized by its signed distance $p$ to the origin and its normal $ω$, see (1.5). For
$$ω(φ) = (\cos φ, \sin φ), \tag{5.1}$$
we choose the natural measure $dφ$, and the standard measure $dp$ for $p$. Based on that, we define the microlocal defect measure $dμ_g(φ, p, \hat{φ}, \hat{p})$ of $g = g_h(φ, p)$. We restrict $p$ to $|p| \le R$, corresponding to Radon transforms of functions supported in $B(0, R)$; since $φ$ naturally belongs to $|φ| \le π$ (modulo $2π$), we call the resulting set $Ω$. Then
$$\mathrm{VAR}_Ω^0(g) = \frac{1}{4πR}\iint_Ω dμ_g(φ, p, \hat{φ}, \hat{p}). \tag{5.2}$$
The Radon transform is an FIO of order $-1/2$ with a canonical relation given by the union of the canonical relations corresponding to the canonical transformations
$$κ_\pm: (x, ξ) \longmapsto \Big(\underbrace{\arg(\pm ξ)}_{φ},\ \underbrace{\pm x\cdot ξ/|ξ|}_{p},\ \underbrace{-x\cdot ξ^\perp}_{\hat{φ}},\ \underbrace{\pm|ξ|}_{\hat{p}}\Big).$$
The ranges of $κ_\pm$ intersect in the zero section only; in particular, $\pm\hat{p} \ge 0$ on the range of $κ_\pm$. Next, each branch is a local diffeomorphism. Indeed, $(x, ξ) = κ_\pm^{-1}(φ, p, \hat{φ}, \hat{p})$ is given by
$$x = pω(φ) - (\hat{φ}/\hat{p})\,ω^\perp(φ), \qquad ξ = \hat{p}\,ω(φ).$$
It is well defined for $\hat{p} \ne 0$, but if we want $x$ in the image to satisfy $|x| < R$, we need to require $p^2 + (\hat{φ}/\hat{p})^2 < R^2$; therefore $κ_\pm^{-1}$ are well defined away from the zero section. Then $R^{-1}$ is associated with $κ^{-1}$, which is a local diffeomorphism as well. What prevents it from being a global one is that it is 2-to-1; in particular, it is not injective.
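The formulas for $κ_\pm$ and $κ_\pm^{-1}$ can be checked numerically. The sketch below assumes $ξ^\perp = (-ξ_2, ξ_1)$ and $ω^\perp(φ) = (-\sin φ, \cos φ)$; this convention is not stated in the text but is the one for which the inverse formula above reproduces $(x, ξ)$. Both branches map the same $(x, ξ)$ to points with opposite signs of $\hat{p}$, and $κ_\pm^{-1}$ sends both back to $(x, ξ)$, which illustrates why $κ^{-1}$ is 2-to-1. The identity $p^2 + (\hat{φ}/\hat{p})^2 = |x|^2$ behind the condition $|x| < R$ is checked as well. The function names are hypothetical.

```python
import math


def kappa(x, xi, sign):
    """kappa_+ (sign=+1) or kappa_- (sign=-1): (x, xi) -> (phi, p, phi_hat, p_hat),
    with xi_perp = (-xi_2, xi_1) (rotation by +pi/2; an assumed convention)."""
    norm = math.hypot(xi[0], xi[1])
    phi = math.atan2(sign * xi[1], sign * xi[0])            # arg(sign * xi)
    p = sign * (x[0] * xi[0] + x[1] * xi[1]) / norm
    phi_hat = -(x[0] * -xi[1] + x[1] * xi[0])               # -x . xi_perp
    p_hat = sign * norm
    return phi, p, phi_hat, p_hat


def kappa_inv(phi, p, phi_hat, p_hat):
    """x = p omega(phi) - (phi_hat / p_hat) omega_perp(phi), xi = p_hat omega(phi)."""
    om = (math.cos(phi), math.sin(phi))
    om_perp = (-math.sin(phi), math.cos(phi))
    t = phi_hat / p_hat
    x = (p * om[0] - t * om_perp[0], p * om[1] - t * om_perp[1])
    xi = (p_hat * om[0], p_hat * om[1])
    return x, xi


x, xi = (0.3, -0.2), (1.5, 2.0)
for s in (1, -1):
    image = kappa(x, xi, s)
    print(s, [round(v, 6) for v in image], kappa_inv(*image))
```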

5.1. The unfiltered inversion

The symbol of $RR^*$ is $b = 4π|\hat{p}|^{-1}$, where $\hat{p}$ is the dual variable of $p$. Applying the canonical transformation, we get $b \circ κ = 4π/|ξ|$; we could also have obtained this as the principal (and full) symbol $4π/|ξ|$ of $R^*R$. Therefore, by (4.43),
$$dμ_{h^{1/2}R^{-1}g}(x, ξ) = \frac{|ξ|}{4π}\, κ^* dμ_g \quad \text{on } T^*\mathbb{R}^2 \setminus 0. \tag{5.3}$$
The fact that $κ$ is 1-to-2 introduces a subtlety here, already accounted for in the proof of Theorem 4.3. Microlocally, one can write $R = R_+ + R_-$; then each normal operator $R_\pm^* R_\pm$ has principal symbol equal to one half of that of $R^*R$. We then apply (4.43) to each, and the combined result still involves the principal symbol of $R^*R$.

Let us say that $f$ is supported in $B(0, R)$ with a certain semiclassical band limit $B \ge |ξ|$. We take its Radon transform $Rf$. Here $f$ is not discretized; we can think of $Rf$ as the physical X-ray transform. The band limit assumption is satisfied if the X-rays are not ideal lines but have some thickness. Then we sample $Rf$ densely enough to satisfy the Nyquist requirements and add noise to it. If $Rf$ is oversampled, the noise will have higher frequencies than those coming from $f$. When we invert the noisy data, the reconstruction will also contain frequencies outside the frequency set of $f$. We can apply a filter cutting them off to $|ξ| \le B$. Since this filter does not affect $f$, we still call this an unfiltered inversion. One way to do this is to restrict $\hat{p}$ to $|\hat{p}| \le B$ before applying $R'$ in (1.6).

More precisely, let $\operatorname{supp} f \subset B(0, R)$ and
$$\mathrm{WF}_h(f) \subset \{(x, ξ);\ |x| \le R,\ |ξ| \le B\}. \tag{5.4}$$
Then the union of the frequency sets $Σ(Rf)$ over all such $f$ (the frequency set being the projection of the semiclassical wave front set of $Rf$ onto the fiber variable) is the double cone
$$\{(\hat{φ}, \hat{p});\ |\hat{φ}| \le R|\hat{p}|,\ |\hat{p}| \le B\}, \tag{5.5}$$
included in the box $\mathcal{B} := \{|\hat{φ}| \le RB,\ |\hat{p}| \le B\}$; see Fig. 4 and [17] for more details. The set (5.5) is the worst-case scenario over all points $(φ, p)$. As $|p|$ approaches $R$, the opening of the cone becomes much smaller: $|\hat{φ}| \le |\hat{p}|\sqrt{R^2 - p^2}$; we refer to [17] and Fig. 3 there. This describes the range of $κ$. Therefore, some portion of the noise will not propagate back to the reconstructed $f$.

We assume that we sample $g = Rf$ at a rate higher than the Nyquist rate for the box $\mathcal{B}$. Moreover, we assume that the interpolation kernel $χ$ in (3.15) (with $f$ replaced by $g$) is chosen so that $\hat{χ} = 1$ in a neighborhood of $\mathcal{B}$. As explained in the introduction, since the problem is linear, we assume that the data are (white) noise. Then the power spectrum of the noise (more precisely, the Wigner function) converges in the mean sense to a defect measure $dμ_g$ that is absolutely continuous by Theorem 4.1, i.e., it has the form $dμ_g = γ_g\,dx\,dξ$ of the kind (4.7) on $\mathcal{B}$, with $γ_g$ as in (4.15). On $\mathcal{B}$, we have $γ_g = s^n σ^2/(2π)^n =: γ^\sharp$, and
$$γ_{h^{1/2}R^{-1}g}(x, ξ) = \frac{|ξ|}{4π}\, γ^\sharp \quad \text{for } |ξ| \le B. \tag{5.6}$$
This is “blue noise”. Here and below, all equalities about the statistics of $f$ are understood in the limit sense of Theorem 4.1, see (4.6) and (4.8). An important observation is that there is no $x$ dependence in this case, and the dependence on $ξ$ is radial. This is not the case for the Radon transform in fan-beam coordinates, as we will see below.
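To quantify the remark after (5.5) that some portion of the noise does not propagate back to $f$ (a small computation of our own, not taken from [17]): if the spectral density of $g$ is flat on the box $|\hat{φ}| \le RB$, $|\hat{p}| \le B$, as in the white-noise setting above, the share of the noise power at $(φ, p)$ that lies in the range of $κ$ is the area fraction of the cone $|\hat{φ}| \le |\hat{p}|\sqrt{R^2 - p^2}$ in that box. Elementary integration gives $\sqrt{R^2 - p^2}/(2R)$: one half at $p = 0$, and $π/8 \approx 0.39$ after averaging over $|p| \le R$. The Monte Carlo sketch below (hypothetical names, arbitrary $R = B = 1$) checks these values.

```python
import math
import random


def cone_fraction(R=1.0, B=1.0, samples=100_000, p=None, seed=3):
    """Monte Carlo estimate of the fraction of the box |phi_hat| <= R B, |p_hat| <= B
    lying in the cone |phi_hat| <= |p_hat| sqrt(R^2 - p^2).

    If p is None, p is drawn uniformly from [-R, R], i.e. the fraction is averaged over p.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        pp = rng.uniform(-R, R) if p is None else p
        phi_hat = rng.uniform(-R * B, R * B)
        p_hat = rng.uniform(-B, B)
        if abs(phi_hat) <= abs(p_hat) * math.sqrt(R * R - pp * pp):
            hits += 1
    return hits / samples


print(cone_fraction(p=0.0), 1 / 2)      # worst case p = 0
print(cone_fraction(), math.pi / 8)     # averaged over |p| <= R
```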

Fig. 4.

The frequency set of $R f$ .

By (5.2),
$$\mathrm{VAR}_Ω^0(g) = \mathrm{VAR}_{p,φ}^0(g) = 4B_φB_p\,γ^\sharp, \quad \forall (φ, p) \in S^1 \times [-R, R].$$
The two variances are equal because $γ_g$ is independent of the position.

Assume that the sampling rates of $g$ are based on $B_φ$ and $B_p$, and that these take their sharp values that do not allow undersampling: $B_p = B$, $B_φ = RB$, where $B$ is the band limit of $f$ as in (5.4). Then
$$\mathrm{VAR}_{p,φ}^0(g) = 4RB^2γ^\sharp.$$
Note that this is actually the sharp lower bound of the variance, approached when the oversampling becomes asymptotically sharp sampling; the bound itself is not achievable within our theory, since it would require sinc interpolation, while we need a rapidly decreasing kernel.

For the variance of $f = R^{-1}g$, we have, see Theorem 4.4 and Remark 4.3,
$$\begin{aligned} \mathrm{VAR}_x^0\big(h^{1/2}R^{-1}g\big) &= \int_{|ξ| \le B} γ_{R^{-1}g}(x, ξ)\, dξ \\ &= \frac{1}{4π}γ^\sharp \int_{|ξ| < B} |ξ|\, dξ = \frac{1}{4π}γ^\sharp\, 2π\int_0^B ρ^2\, dρ \\ &= \frac{B^3γ^\sharp}{6}. \end{aligned} \tag{5.7}$$
We get the following theorem.

Theorem 5.1 (unfiltered inversion).

Under the assumptions above, in particular assuming that $g$ is white noise and that there is no undersampling, we have
$$\mathrm{STD}^0(R^{-1}g) = \frac{B^{3/2}}{\sqrt{24B_φB_p h}}\, \mathrm{STD}^0(g). \tag{5.8}$$
If $g = Rf$ is sampled sharply, then
$$\mathrm{STD}^0(R^{-1}g) = \Big(\frac{B}{24Rh}\Big)^{1/2} \mathrm{STD}^0(g). \tag{5.9}$$
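The chain (5.6), (5.7), (5.8) can be checked numerically: the radial integral in (5.7) is evaluated by the midpoint rule, divided by $h$ times the data variance $4B_φB_pγ^\sharp$ obtained from (5.2), and compared with the closed forms (5.8) and (5.9). The values of $B$, $R$, $h$ below are arbitrary, $γ^\sharp$ cancels, and the function names are hypothetical.

```python
import math


def var_reconstruction(B, gamma_sharp=1.0, n=400):
    """(5.7) by quadrature: (1/(4 pi)) gamma_sharp * int_{|xi|<B} |xi| d xi
    = (gamma_sharp / 2) * int_0^B rho^2 d rho, using the midpoint rule in rho."""
    d = B / n
    radial = d * sum(((k + 0.5) * d) ** 2 for k in range(n))
    return gamma_sharp * 2 * math.pi * radial / (4 * math.pi)


def var_data(B_phi, B_p, gamma_sharp=1.0):
    """VAR^0(g) = 4 B_phi B_p gamma_sharp for white noise that is flat on the box."""
    return 4 * B_phi * B_p * gamma_sharp


def std_ratio(B, B_phi, B_p, h):
    """Factor in (5.8): STD^0(R^{-1} g) / STD^0(g)."""
    return B ** 1.5 / math.sqrt(24 * B_phi * B_p * h)


# Hypothetical values; sharp sampling B_phi = R B, B_p = B as in (5.9).
B, R, h = 2.0, 1.5, 0.01
quad = math.sqrt(var_reconstruction(B) / (h * var_data(R * B, B)))
print(quad, std_ratio(B, R * B, B, h), math.sqrt(B / (24 * R * h)))
```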

Recall that we defined $\mathrm{VAR}^0$, see (4.6), and similarly $\mathrm{STD}^0$, as integrals of the defect measure. The implication of this theorem is that when $g$ is created by a white noise process, then for every $Q = q(hD)$ with $q = 0$ near the origin, $\mathrm{STD}^0(h^{1/2}QR^{-1}g)$ converges in the mean square sense to a quantity (see (5.13)) which itself converges to the right-hand side of (5.8), respectively (5.9), as $q \to 1$. In other words, the cutoff near $ξ = 0$ is removable at the expense of taking a double limit: first $h \to 0$, then $q \to 1$ (in the $L^1$ sense).

5.2. The filtered inversion

The Radon transform is often inverted with a low-pass filter applied before $R'$ in (1.6), i.e.,
$$f = \frac{1}{4π}R'\,ν(D_p)\,|D_p|\,g, \tag{5.10}$$
where $ν$ is an even function decaying away from the origin. Assuming a band limit $B_p$ for the $p$ variable, determined, for example, by the sampling rate $s_p$, one popular filter is the Hann filter:
$$ν_{\mathrm{Hann}}(\hat{p}) = \frac12\Big(1 + \cos\frac{π\hat{p}}{B_p}\Big) = \cos^2\frac{π\hat{p}}{2B_p}, \quad |\hat{p}| \le B_p, \tag{5.11}$$
and $ν_{\mathrm{Hann}}(\hat{p}) = 0$ otherwise. Another commonly used filter is the cosine one:
$$ν_{\mathrm{cosine}}(\hat{p}) = \cos\frac{π\hat{p}}{2B_p}, \quad |\hat{p}| \le B_p.$$
They are plotted in Fig. 5.

Fig. 5.

The Hann and the cosine filters with $B = 1$ .

There are many other filters (windows) used in signal processing and imaging. We assume that $ν$ is continuous and supported in $|\hat{p}| \le B_p$. If the shape of the filter is fixed, say Hann, then $ν(t) = ν_0(|t|/B_p)$ with some fixed $ν_0$ supported in $[0, 1]$, see, e.g., (5.11). Then (5.3) takes the form
$$γ_{h^{1/2}R_ν^{-1}g}(x, ξ) = \frac{|ξ|\,ν_0^2(|ξ|/B_p)}{4π}\, γ_g \circ κ(x, ξ), \tag{5.12}$$
where $R_ν^{-1} = R^{-1}ν(D_p)$ is the filtered inversion, i.e., the operator applied to $g$ in (5.10). The analog of (5.6) is then
$$γ_{h^{1/2}R_ν^{-1}g}(x, ξ) = \frac{|ξ|\,ν_0^2(|ξ|/B_p)}{4π}\, γ^\sharp.$$
Taking $B_p = B$ as before, we get the following analog of (5.7):
$$\begin{aligned} \mathrm{VAR}_x^0\big(h^{1/2}R_ν^{-1}g\big) &= \int_{|ξ| \le B} γ_{R_ν^{-1}g}(x, ξ)\, dξ \\ &= \frac{1}{4π}γ^\sharp\int_{|ξ| < B} |ξ|\,ν_0^2(|ξ|/B)\, dξ = \frac{1}{4π}γ^\sharp\, 2π\int_0^B ρ^2 ν_0^2(ρ/B)\, dρ \\ &= \frac16 B^3γ^\sharp c_ν, \end{aligned} \tag{5.13}$$
where
$$c_ν := 3\int_0^1 ρ^2 ν_0^2(ρ)\, dρ. \tag{5.14}$$
We have proved the following.

Theorem 5.2 (filtered inversion).

Under the assumptions above, in particular assuming white noise and no undersampling, with a filter $ν_0(|D_p|/B)$, we have
$$\mathrm{STD}^0(R_ν^{-1}g) = \frac{B^{3/2}\sqrt{c_ν}}{\sqrt{24B_φB_p h}}\, \mathrm{STD}^0(g). \tag{5.15}$$
If $Rf$ is sampled sharply, then
$$\mathrm{STD}^0(R_ν^{-1}g) = \Big(\frac{Bc_ν}{24Rh}\Big)^{1/2} \mathrm{STD}^0(g).$$

If there is no filter ($ν_0 = 1$), we have $c_ν = 1$, which explains the normalizing factor 3 in the definition of $c_ν$. For the Hann filter, $c_ν = 3/8 - 45/(16π^2) \approx 0.0900$; then $\sqrt{c_ν} \approx 0.3000$. For the cosine filter, $\sqrt{c_ν} \approx 0.4427$. In (5.19) below, the constant $0.2558\sqrt{c_ν}$ would be approximately 0.07676 for the Hann filter and 0.11327 for the cosine one.
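The constants quoted above can be reproduced by evaluating (5.14) with the midpoint rule for $ν_0 = 1$, the Hann filter and the cosine filter (with $B_p = 1$). The sketch is ours and the function names are hypothetical; $0.2558$ is $\sqrt{π/48}$, the constant in (5.19), rounded.

```python
import math


def hann(t):
    """nu_0 for the Hann filter (5.11) with B_p = 1."""
    return math.cos(math.pi * t / 2) ** 2 if abs(t) <= 1 else 0.0


def cosine(t):
    """nu_0 for the cosine filter with B_p = 1."""
    return math.cos(math.pi * t / 2) if abs(t) <= 1 else 0.0


def c_nu(nu0, n=2000):
    """Midpoint rule for (5.14): c_nu = 3 * int_0^1 rho^2 nu0(rho)^2 d rho."""
    d = 1.0 / n
    return 3 * d * sum(((k + 0.5) * d) ** 2 * nu0((k + 0.5) * d) ** 2 for k in range(n))


for name, nu0 in (("none", lambda t: 1.0), ("Hann", hann), ("cosine", cosine)):
    c = c_nu(nu0)
    # 0.2558 approximates sqrt(pi / 48), the constant of (5.19).
    print(name, round(c, 4), round(math.sqrt(c), 4), round(0.2558 * math.sqrt(c), 5))
```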

5.3. Numerical experiments

We use MATLAB and its built-in radon and iradon routines to compute and numerically invert the Radon transform in the plane. The default angular step is one degree, but it can be changed. Assume that $f$ is given on an $N \times N$ lattice. Then, by default, radon computes $Rf(φ, p)$ on a $360 \times N\sqrt2$ lattice, with $N\sqrt2$ rounded; the actual formula is $2\,\mathrm{ceil}\big(\sqrt2\,(N - \mathrm{floor}((N - 1)/2) - 1)\big) + 3$. Then iradon inverts the data to the original grid (with $N$ replaced by $N + 1$ or $N + 2$, which does not matter in view of our asymptotic setup).
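The default number of offsets quoted above can be evaluated directly. The sketch only reproduces the arithmetic of that formula (it is not MATLAB code, and the function name is hypothetical). For $N = 601$ it gives $N_p = 853$, the value used in the default iradon experiment below, which is about $1/\sqrt2$ of the requirement $N_p = 2N$ from [17].

```python
import math


def radon_num_offsets(n):
    """Number of p samples for an n x n image: 2*ceil(sqrt(2)*(n - floor((n-1)/2) - 1)) + 3."""
    return 2 * math.ceil(math.sqrt(2) * (n - (n - 1) // 2 - 1)) + 3


for n in (100, 601):
    n_p = radon_num_offsets(n)
    # Ratio to the requirement N_p = 2N; close to 1/sqrt(2) for large n.
    print(n, n_p, round(n_p / (2 * n), 4))
```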

As we showed in [17], this choice of the discretization of $Rf$ is suboptimal for $N \gg 1$; we need to compute $Rf$ on an $N_φ \times N_p$ lattice with $N_p = 2N$, $N_φ = 2πN$ at least, and some oversampling would be beneficial, see Fig. 6. With most test images, the (dominating) frequencies are well below the Nyquist limit, which is why the inversion is satisfactory most of the time. When we add noise, say white noise, the Nyquist limit is reached, and the inversion with iradon aliases some of those frequencies.

Fig. 6.

The sampling sets of f, $R f$ and the reconstructed f with sharp sampling requirements.

5.4. Discretization

Let us say we have $f$ on an $M \times N$ grid. We think of it as discrete samples of a function $f$ originally defined on, say, $[-a, a] \times [-b, b]$. Then the steps are $s_{x_1} = 2a/M$, $s_{x_2} = 2b/N$. Assume for a moment that we apply the classical sampling theory (no small parameter $h$) formally at this point. Then those steps have to be at most $π/B_{x_1}$ and $π/B_{x_2}$, respectively, where $B_{x_j}$ are the band limits in the $x_j$ variables. We get $B_{x_1} = Mπ/(2a)$, $B_{x_2} = Nπ/(2b)$ as the least upper bounds of the band limits of $f$. For the band limit of $|ξ|$, we have $B = (B_{x_1}^2 + B_{x_2}^2)^{1/2}$, and the maximum is attained at the vertices of the box $[-B_{x_1}, B_{x_1}] \times [-B_{x_2}, B_{x_2}]$. Note that the disk $|ξ| \le B$ contains more frequencies than can be properly sampled on the $M \times N$ grid; the extra ones lie outside the box inscribed in it.
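A small sketch of these formal band limits (names and sample values hypothetical): it computes $B_{x_j} = π/s_{x_j}$ and $B$, and, for a square frequency box, the share $2/π \approx 0.64$ of the disk $|ξ| \le B$ covered by the inscribed box, which quantifies the extra frequencies mentioned above.

```python
import math


def band_limits(M, N, a, b):
    """Sharp band limits for [-a, a] x [-b, b] sampled on an M x N grid:
    B_x1 = pi / s_x1, B_x2 = pi / s_x2 with s_x1 = 2a/M, s_x2 = 2b/N, and B = |(B_x1, B_x2)|."""
    B_x1 = math.pi / (2 * a / M)
    B_x2 = math.pi / (2 * b / N)
    return B_x1, B_x2, math.hypot(B_x1, B_x2)


B_x1, B_x2, B = band_limits(200, 100, 1.0, 0.5)
# Share of the disk |xi| <= B covered by the box [-B_x1, B_x1] x [-B_x2, B_x2].
print(B_x1, B_x2, B, 4 * B_x1 * B_x2 / (math.pi * B ** 2))
```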

We can connect the classical sampling theory to the semiclassical one as follows. Denote for a moment the semiclassical quantities with tildes. Let $M = \tilde{M}/h$, $N = \tilde{N}/h$, with $\tilde{M}$, $\tilde{N}$ fixed. Then the classical steps $s$ ($s_{x_1}$, etc.) equal $\tilde{s}h$, where $\tilde{s}$ ($\tilde{s}_{x_1} = 2a/\tilde{M}$, etc.) are the semiclassical relative steps; that is, they coincide with the absolute steps $\tilde{s}h$ in our sampling theorems. Then our analysis holds as $h \to 0$, i.e., as $M \to \infty$, $N \to \infty$ (keeping the ratio constant), with the steps going to zero at a rate $\sim h$. This is the usual setup in numerical analysis, where $\tilde{s} = 1$, i.e., the step is $h$.

For each such $f$, we define the $L^2$ norm as
$$\|f\|^2 = \frac{4ab}{MN}\sum_{i=1}^M\sum_{j=1}^N |f_{ij}|^2.$$
This is consistent with formula (16) in [17] and approximates the $L^2$ norm of a continuous function on that box with samples $f_{ij}$. Then
$$\mathrm{STD}(f) = \Big(\frac{1}{MN}\sum_{i=1}^M\sum_{j=1}^N |f_{ij}|^2\Big)^{1/2} = \frac{\|f\|}{2\sqrt{ab}}$$
is the standard deviation of $f$ when the mean of $f$ is zero.

We will apply this both to $f$ defined on $[-a, a]^2$ for some $a > 0$ and to $Rf$ on $[-π, π] \times [-R, R]$.

Assume that $g$ is a discrete representation of a function on $[-π, π] \times [-R, R]$ sampled on an $N_φ \times N_p$ lattice, and that $g$ is obtained by a white noise process with zero mean and variance $σ^2$. Then a slight extension of Lemma 4.1 shows that $\mathrm{VAR}(g) \to σ^2$ almost surely.

The sampling steps are $s_{φ} = 2 π / N_{φ}$ , $s_{p} = 2 R / N_{p}$ ; hence to avoid aliasing, we need $B_{φ} ⩽ N_{φ} / 2$ , $B_{p} ⩽ π N_{p} / (2 R)$ .

Let f, to which $R$ will be applied, represent a discretization of a function on ${[- a, a]}^{2}$ , and assume that it is sampled on an $N \times N$ lattice. Then, similarly, the sharp band limit in each variable is $B_{x_{1}} = B_{x_{2}} = π N / (2 a)$ .

As we showed in [17], and as follows easily from (5.5), to avoid aliasing we need
$$N_p \ge 2N, \qquad N_φ \ge 2πN. \tag{5.16}$$
This inequality, as well as the inequalities and equalities below, is meant in the asymptotic sense, i.e., one should multiply, say, the right-hand side in this case by $(1 + o(1))$ as $N \to \infty$. Note that (5.16) follows from viewing $f$ as supported in $B(0, \sqrt2 a)$, i.e., $R = \sqrt2 a$, with frequency set in $|ξ| \le B := \sqrt2 B_{x_1}$. As we mentioned above, that disk contains more frequencies than its inscribed square. For every $g = Rf$ in the range of $R$ with $f$ as above, the inversion returns $f$, of course, and its frequencies fall inside the inscribed square $[-B_{x_1}, B_{x_1}]^2$. If we take $g$ to be “noise”, not in the range of $R$, then by the mapping properties of $κ^{-1}$, see [17, formula (51)], the frequency set of $R^{-1}g$ will generically fill the disk $|ξ| \le B = \sqrt2 B_{x_1}$. To avoid aliasing without applying a filter, we would need to reconstruct $f = R^{-1}g$ on an $N\sqrt2 \times N\sqrt2$ grid or finer. In practice, however, one would apply a filter.

Therefore, the discrete version of (5.15), including a filter, is
$$\mathrm{VAR}(R_ν^{-1}g) = \frac{π^2N^3c_ν}{12a^2N_φN_p}\,\mathrm{VAR}(g), \tag{5.17}$$
where the formula has the same asymptotic and probabilistic meaning as explained after Theorem 5.1.

Assume now that we sample sharply, i.e., we have equalities in (5.16). Then $N_{p} = 2 N$ , $N_{φ} = 2 π N$ and we get $\begin{matrix} (5.18) & VAR (R_{ν}^{- 1} g) = \frac{π N c_{ν}}{48 a^{2}} VAR (g) . \end{matrix}$ Therefore, $\begin{matrix} (5.19) & STD (R_{ν}^{- 1} g) \approx 0.2558 \sqrt{c_{ν}} \frac{\sqrt{N}}{a} STD (g) . \end{matrix}$
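Formulas (5.17) to (5.19) reduce to simple arithmetic once $c_{ν}$ is known. A minimal sketch (the helper name `noise_std_ratio` is ours), which also shows that the constant in (5.19) is $\sqrt{π / 48}$:

```python
import math

def noise_std_ratio(N, a, N_phi, N_p, c_nu=1.0):
    """STD(R_nu^{-1} g) / STD(g) according to (5.17); valid asymptotically in N."""
    var_ratio = math.pi**2 * N**3 * c_nu / (12 * a**2 * N_phi * N_p)
    return math.sqrt(var_ratio)

N, a = 300, 1.0
sharp = noise_std_ratio(N, a, 2 * math.pi * N, 2 * N)   # equalities in (5.16)
coef = sharp / (math.sqrt(N) / a)                        # the constant in (5.19)
print(coef, math.sqrt(math.pi / 48))
```

Doubling both $N_{φ}$ and $N_{p}$ halves the ratio, while quadrupling N with sharp sampling doubles it, consistent with the $\sqrt{N}$ growth discussed next.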

We can make the following conclusions from (5.17), (5.18) and (5.19).

With a sharp sampling rate, the noise ratio, measured as its standard deviation relative to that of g, increases as $\sqrt{N}$ . This is understandable since we are allowing for higher frequencies, and $R^{- 1}$ is of order $1 / 2$ . At the same time, we can handle f with higher frequencies because N is proportional to the Nyquist bound.

The noise ratio, for a fixed N, is minimized when we sample sharply.

In many applications, increasing $N_{φ}$ and $N_{p}$ decreases the size of the detectors, and then the discrete samples $g_{i j}$ are scaled down by factors proportional to $N_{φ}$ and $N_{p}$ . If the added noise is expressed in units relative to that, then the quotient in (5.17) would be proportional to $N^{3} N_{φ} N_{p}$ , i.e., the noise ratio increases with $N_{φ}$ and $N_{p}$ . This is known in engineering.

Fig. 7.

Left: $| \hat{f} |$ where $f = R^{- 1} g$ and g is white noise. Right: radial profile of $| \hat{f} |^{2}$ from the center to one of the sides (but not all the way along the diagonal to a vertex).

Default iradon inversion. First we present an inversion with the default one-degree angular step. We choose $N = 601$ , $N_{φ} = 360$ by default, and $N_{p} = 853$ is chosen by radon as an approximation to $601 \sqrt{2}$ . We choose g to be normally distributed (Gaussian) noise with standard deviation one. Then we invert it with iradon. A plot of the modulus $| \hat{f} |$ of the Fourier transform $\hat{f}$ of the inversion f is shown in Fig. 7. Here and below, we plot $| \hat{f} |$ rather than $| \hat{f} |^{2}$ for clarity. With an exact inversion, as $N \to \infty$ , we should see a density plot of the square root of (5.6), i.e., $c | ξ |^{1 / 2}$ , filling the whole square. What we see instead is that the density increases in the radial variable $| ξ |$ from the center, but at some point it starts to decrease until it visibly becomes zero when $| ξ |$ is slightly larger than half of the side; moreover, it is radially symmetric. This behavior can be explained as follows. The default choice $N_{p} = N \sqrt{2}$ (rounded) actually lowers the Nyquist limit of the reconstructed f to $1 / \sqrt{2}$ of its original value. Without that, the boundary of the disk in Fig. 7 would be the circumscribed circle of that square; with that choice, it is the inscribed one. The gradual decrease close to the border can be explained by an effective low-pass filter applied when inverting $R$ . Our numerical experiments below, with much higher resolutions for $R f$ , confirm that.

A similar experiment with a uniformly distributed noise g in a symmetric interval around the origin produces virtually the same plot of $| \hat{f} |$ , not shown. In both cases, the values of f look normally distributed.

High precision inversion. We present numerical inversions with a proper discretization. We want to model adding noise to discrete measurements of the “continuous” $R f$ , followed by a high precision inversion, i.e., one in which the discrete data is first upsampled several times to mimic inversion in the “continuous domain”. We do the following.

(i) The function f is assumed to be defined on ${[- 1, 1]}^{2}$ and sampled on an $N \times N$ lattice.

(ii) We compute a high accuracy $R f$ on an $N_{φ} \times N_{p}$ lattice, where $N_{φ} ⩾ 2 π N$ , $N_{p} ⩾ 2 N$ . To do that, we perform the computations on a finer grid.

(iii) We add noise to the so-computed $R f$ .

(iv) We invert the noisy data by upsampling it first. The reconstructed $f_{noisy}$ is either left sampled on a finer grid or downsampled to the original $N \times N$ one.

We give more details below. To do (ii), we upsample f to an $m N \times m N$ lattice with Lanczos-3, with some $m ⩾ 1$ ; typical values we use are $m = 2$ and $m = 3$ . Then we compute $R f (φ, p)$ with radon on a $2 π m N \times 2 m N$ lattice, which we view as $R f (φ, p)$ on $[- π, π] \times [- \sqrt{2}, \sqrt{2}]$ sampled uniformly in each variable. The parameter m represents the degree of oversampling: $m = 1$ corresponds to the sharp lower bound for proper sampling. Since computing $R f$ involves interpolation of f for computing the line integrals (we use the option ‘spline’ in radon), such oversampling reduces the interpolation errors relative to ideal (sinc) interpolation. Then we downsample the computed $R f$ to a lower resolution $N_{φ} \times N_{p}$ without interpolation: we take every m-th value in each row and column. This simulates a high precision $R f$ computed on the $N_{φ} \times N_{p}$ grid. To do (iii), we add noise.

In (iv), we invert $R$ on that lattice. We could resize the data to a different (but high enough) resolution before that, but the results do not look much different. The resulting $f = R^{- 1} g$ is computed on an $m N \times m N$ lattice, which is viewed as f on ${[- 1, 1]}^{2}$ sampled uniformly. If needed, that f could be resampled to an $N \times N$ lattice, but since it does not contain frequencies higher than the Nyquist limit $B = N π / 2$ corresponding to N, this is not needed for computing the standard deviation, for example.

We want to emphasize that it is possible to do (close to) ideal upsampling, say from $N_{x} \times N_{y}$ to ${\tilde{N}}_{x} \times {\tilde{N}}_{y}$ with ${\tilde{N}}_{x} > N_{x}$ and ${\tilde{N}}_{y} > N_{y}$ , which preserves the band limits $B_{x_{1}} = N_{x} π / 2$ and $B_{x_{2}} = N_{y} π / 2$ , by using the Fourier transform. This, however, is not what is usually done. When we use Lanczos-3, for example, the interpolation kernel is the inverse Fourier transform of a smoothed version of $ν_{[- 1, 1]}$ , see Fig. 1, which is close to one at least in $[- 0.5, 0.5]$ , as explained in Section 3.3. On the other hand, Theorem 3.1 in [17] requires some oversampling and an interpolation kernel which is the Fourier transform of a function similar to that in Fig. 1, equal to one on the (smaller) frequency band. Therefore, if we choose $m ⩾ 2$ , we are in this regime.
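The (close to) ideal Fourier upsampling mentioned above can be sketched by zero-padding the DFT, which leaves the band limits unchanged; the decimation in step (ii), taking every m-th value, is then plain slicing. This is a minimal NumPy sketch, assuming periodic data on a uniform grid over $[- 1, 1)$ in each variable; it is not the Lanczos-3 resampling used above, and the function names are illustrative.

```python
import numpy as np

def fourier_upsample_axis(x, M, axis):
    """Trigonometric (band-limit preserving) interpolation from N to M > N samples along `axis`."""
    N = x.shape[axis]
    if M <= N:
        raise ValueError("M must exceed the current number of samples")
    X = np.moveaxis(np.fft.fft(x, axis=axis), axis, 0)
    Y = np.zeros((M,) + X.shape[1:], dtype=complex)
    k = (N - 1) // 2               # highest strictly positive frequency index
    Y[:k + 1] = X[:k + 1]          # DC and positive frequencies
    Y[M - k:] = X[N - k:]          # negative frequencies (empty slices if k == 0)
    if N % 2 == 0:                 # split the Nyquist bin so real data stays real
        Y[N // 2] = X[N // 2] / 2
        Y[M - N // 2] = X[N // 2] / 2
    y = np.moveaxis(np.fft.ifft(Y, axis=0) * (M / N), 0, axis)
    return y.real if np.isrealobj(x) else y

def fourier_upsample_2d(f, m):
    """Upsample an N1 x N2 array to m*N1 x m*N2 by zero-padding its DFT."""
    up = fourier_upsample_axis(f, m * f.shape[0], 0)
    return fourier_upsample_axis(up, m * f.shape[1], 1)

# Usage: a band-limited test function sampled on [-1, 1)^2.
N, m = 32, 3
x = -1 + 2 * np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing="ij")
f = np.cos(3 * np.pi * X) * np.sin(5 * np.pi * Y)
up = fourier_upsample_2d(f, m)     # sampled on an mN x mN grid
back = up[::m, ::m]                # every m-th value in each row and column
```

For a band-limited f, the upsampled array agrees with f evaluated on the finer grid up to rounding, and the decimation recovers the original samples.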

To do experiments with noise only, we take $f = 0$ in (1.2). Then steps (i) and (ii) are trivial, since $R f = 0$ . So our starting point is (iii), where we take g to be generated by either normally or uniformly distributed noise on an $N_{φ} \times N_{p}$ grid. We upsample by a factor of m, i.e., to an $m N_{φ} \times m N_{p}$ grid, and do the inversion there. We take $m = 2, 3, 4, 5$ in our experiments.

Non-filtered inversion. We now test (5.8). To this end, we take g to be either Gaussian or uniformly distributed noise with zero mean on an $N_{φ} \times N_{p}$ grid as in (ii), with equalities there, i.e., $N_{φ} = 2 π N$ , $N_{p} = 2 N$ . Then we cut the Fourier transform of the result sharply to $1 / m$ -th of the frequency box corresponding to the original resolution $N_{φ} \times N_{p}$ , $m = 2, 3, \dots$ ; we denote this by $ν_{m} (D) g_{m}$ and apply $R^{- 1}$ to it without changing the grid size. This procedure provides a more precise computation than just inverting the noise, because it avoids the smoothing which happens in the part we cut off. If we had $R f$ of a non-zero f polluted with noise, we would first upsize the data m times in each dimension and then perform that procedure.
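The sharp cut-off $ν_{m} (D)$ can be sketched as a mask on the 2D DFT that keeps the central $1 / m$ -th of the frequency box in each variable. This is an illustrative NumPy sketch only: the inversion $R^{- 1}$ is not included, the name `sharp_cutoff` is ours, and the strict inequality (which drops the Nyquist bin) is our choice. For white noise, the fraction of the variance that survives should match the fraction of retained frequencies, up to sampling fluctuations.

```python
import numpy as np

def sharp_cutoff(g, m):
    """Keep only frequencies with |k_j| < 1/(2m) cycles per sample in each variable."""
    k1 = np.fft.fftfreq(g.shape[0])          # frequencies in [-1/2, 1/2)
    k2 = np.fft.fftfreq(g.shape[1])
    mask = (np.abs(k1)[:, None] < 0.5 / m) & (np.abs(k2)[None, :] < 0.5 / m)
    # The mask is symmetric under k -> -k, so a real g gives a real result.
    filtered = np.fft.ifft2(np.fft.fft2(g) * mask)
    return filtered.real, mask

rng = np.random.default_rng(0)
g = rng.standard_normal((512, 512))          # discrete white Gaussian noise
g_cut, mask = sharp_cutoff(g, 2)
print(g_cut.var() / g.var(), mask.mean())    # both close to 1/4
```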

Table 1

Noise experiments

Noise ratio with Gaussian noise. Theoretical ratio: $0.2558$

	$N = 100$	$N = 200$	$N = 300$
$m = 1$	$0.2224 \pm 0.61 \%$	$0.2223 \pm 0.32 \%$	$0.2226 \pm 0.16 \%$
$m = 2$	$0.2552 \pm 0.38 \%$	$0.2572 \pm 0.25 \%$	$0.2578 \pm 0.17 \%$
$m = 3$	$0.2569 \pm 0.70 \%$	$0.2584 \pm 0.41 \%$	$0.2591 \pm 0.07 \%$

Fig. 8.

Top: we choose $g_{m}$ , $m = 2$ , to be white noise; then $| \widehat{R^{- 1} g_{m}} |$ looks like that in Fig. 7. Bottom: $ν_{m} {\hat{g}}_{m}$ and $| \widehat{R^{- 1} ν_{m} g_{m}} |$ , i.e., the Fourier transform of the reconstruction after the frequency cut-off of the noise.

Since we effectively multiply both $N_{φ}$ and $N_{p}$ by m, by (5.17), we see that (5.19) can be written in terms of the noise ratio as $\begin{matrix} (5.20) & \text{Noise ratio} : = \frac{m \, STD (R^{- 1} g)}{\frac{\sqrt{N}}{a} STD (g)} \approx 0.2558. \end{matrix}$ We first take g to be Gaussian noise, with several choices of N and m, doing five experiments for each choice. The results are in Table 1; in Fig. 8, we illustrate the inversion with $m = 2$ . Similar experiments with uniformly distributed noise with mean zero generate similar numbers, not shown.

Filtered inversion. We perform similar experiments with the Hann and the cosine filters. Since the Hann filter is very small near the band limit B, see Fig. 5, the smoothing effect of the interpolation used by iradon, see Fig. 7, plays a negligible role. Modeling that smoothing by the Lanczos-3 profile, for example, see Fig. 1, by introducing an extra factor in (5.14), shows an error of less than 1% in $\sqrt{c_{ν}}$ . Then even with $m = 1$ , we get a result close to the theoretical one, which is approximately 0.07676 for the Hann filter and 0.11327 for the cosine one, as we computed above. For $N = 300$ , for example, we get $0.0767 \pm 0.11 \%$ for Hann and $0.1105 \pm 0.21 \%$ for cosine, where the smoothing effect of iradon is compensated for a bit less. The numbers for normally and uniformly distributed noise are very close. For the cosine filter, we plot in Fig. 9 $| \hat{f} |$ (instead of $| \hat{f} |^{2}$ , for clarity), the computed radial profile of $| \hat{f} |^{2}$ , and its theoretical one $ρ ν_{cosine}^{2} (ρ) = ρ {cos}^{2} (π ρ / 2)$ . The radial profile is computed as $| \hat{f} |^{2}$ averaged over 25 concentric rings. In this case, $| \hat{f} (ξ) |^{2}$ is proportional to the microlocal defect measure of f at any fixed x (the measure does not depend on x). The Hann filter behaves similarly, with the computed radial profile of $| \hat{f} |^{2}$ very close to its theoretical one $ρ {cos}^{4} (π ρ / 2)$ .
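The theoretical values 0.07676 and 0.11327 can be reproduced numerically. The sketch below assumes $c_{ν} = 3 \int_{0}^{1} ρ^{2} ν_{0}^{2} (ρ) d ρ$ , which is consistent with the variance formula in Section 6.3 and gives $c_{ν} = 1$ without a filter, and takes $ν_{0} (ρ) = cos (π ρ / 2)$ for the cosine filter and $ν_{0} (ρ) = {cos}^{2} (π ρ / 2)$ for Hann, matching the theoretical profiles quoted above. The integral is evaluated by the midpoint rule; the names are illustrative.

```python
import numpy as np

def c_nu(nu0, n=200_000):
    """c_nu = 3 * int_0^1 rho^2 nu0(rho)^2 d rho, by the midpoint rule."""
    rho = (np.arange(n) + 0.5) / n
    return 3.0 * np.mean(rho**2 * nu0(rho) ** 2)

filters = {
    "none": lambda r: np.ones_like(r),
    "cosine": lambda r: np.cos(np.pi * r / 2),
    "hann": lambda r: np.cos(np.pi * r / 2) ** 2,
}
base = np.sqrt(np.pi / 48)          # the constant 0.2558 in (5.19)
ratios = {name: base * np.sqrt(c_nu(nu0)) for name, nu0 in filters.items()}
print(ratios)
```

In closed form, $c_{ν} = 1 / 2 - 3 / π^{2}$ for the cosine filter and $c_{ν} = 3 / 8 - 3 / π^{2} + 3 / (16 π^{2})$ for Hann.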

Fig. 9.

Cosine filter. Left: $| \hat{f} |$ where $f = R^{- 1} ν_{cosine} g$ and g is white noise. Center: the computed radial profile of $| \hat{f} |^{2}$ from the center to one of the sides. Right: the theoretical profile $ρ ν_{cosine}^{2} (ρ) = ρ {cos}^{2} (π ρ / 2)$ .

5.5. Percentage of added noise

In many numerical simulations, we add noise to the data, as a percentage of a certain norm of the data, and measure the percentage of the noise in the reconstruction. This is especially interesting in (mildly or not) ill-posed problems.

There is a lot of flexibility in choosing those norms. Let us say that we choose the $L^{2} (B (0, R))$ norm for f and the $L^{2} (S^{1} \times (- R, R))$ norm for $R f$ . Then the left inverse $R^{- 1}$ is not bounded in those spaces, but it is bounded on semiclassically bounded functions (which are smooth); we refer to [17] for semiclassical estimates.

Let $g_{noise}$ be the noise added to $g = R f$ , see (1.2). Its percentage is given by $‖ g_{noise} ‖ / ‖ R f ‖$ (converted to percentage). We are interested in $‖ f_{noise} ‖ / ‖ f ‖$ , where $f_{noise} = R^{- 1} g_{noise}$ is the noise in the reconstruction. We have $\begin{matrix} (5.21) & \frac{‖ f_{noise} ‖}{‖ f ‖} = K \frac{‖ g_{noise} ‖}{‖ R f ‖}, \quad K : = \frac{‖ f_{noise} ‖}{‖ g_{noise} ‖} \cdot \frac{‖ R f ‖}{‖ f ‖} . \end{matrix}$ The coefficient K is the multiplier which relates the two percentages. Its first factor is proportional to the noise ratio we studied earlier, since the $L^{2}$ norms are proportional to the standard deviations. The second one depends on f. To analyze it, write $\begin{matrix} ‖ R f ‖^{2} ⟶ \int γ_{R f} d φ d p d \hat{φ} d \hat{p}, \end{matrix}$ where the convergence is in the sense of Theorem 4.1. Then we integrate over the semiclassical wave front. By (4.41), $\begin{matrix} ‖ R f ‖^{2} & ⟶ \int (b γ_{f}) \circ κ^{- 1} d φ d p d \hat{φ} d \hat{p} = 4 π \int \frac{γ_{f} \circ κ^{- 1}}{| \hat{p} |} d φ d p d \hat{φ} d \hat{p} \\ = 4 π \int \frac{γ_{f} (x, ξ)}{| ξ |} d x d ξ = 4 π {‖ | D |^{- 1 / 2} f ‖}^{2} . \end{matrix}$

We used again the fact that κ is an isometry. This works for general operators, but for $R$ we actually know that $4 π ‖ f ‖^{2} = ‖ | D_{p} |^{1 / 2} R f ‖^{2}$ . We can write $\tilde{f} = | D |^{1 / 2} f$ and intertwine $| D |^{1 / 2}$ with $| D_{p} |^{1 / 2}$ to get the formula above as an exact one, not just a limit. Therefore, since $‖ R f ‖ = \sqrt{4 π} ‖ | D |^{- 1 / 2} f ‖$ , (5.21) yields $\begin{matrix} (5.22) & K = \frac{‖ f_{noise} ‖}{‖ g_{noise} ‖} \cdot \frac{\sqrt{4 π} ‖ | D |^{- 1 / 2} f ‖}{‖ f ‖} . \end{matrix}$ Since the noise ratio is independent of f, we see that K would be large if, roughly speaking, f is low frequency. Most conventional images (with $f ⩾ 0$ ) have a very large zero frequency component $\hat{f} (0)$ relative to the rest of the spectrum, and the second quotient in (5.22) does not vary much. When $\int f (x) d x = 0$ , we have $\hat{f} (0) = 0$ , and for such functions the variation of this quotient is higher. Then we do not need to isolate the zero section.
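The f-dependent factor in (5.22) can be estimated with an FFT. The sketch below is a minimal illustration under two assumptions: f is treated as periodic on $[- a, a)^{2}$ (so the DFT stands in for the Fourier transform), and f has mean zero, because $| ξ |^{- 1 / 2}$ is singular at $ξ = 0$ on the discrete torus. It then uses $‖ R f ‖ = \sqrt{4 π} ‖ | D |^{- 1 / 2} f ‖$ from above; the function names are hypothetical.

```python
import numpy as np

def inv_sqrt_D_ratio(f, a=1.0, tol=1e-10):
    """|| |D|^{-1/2} f || / || f || for a mean-zero f sampled on [-a, a)^2 (periodic model)."""
    N1, N2 = f.shape
    xi1 = 2 * np.pi * np.fft.fftfreq(N1, d=2 * a / N1)   # angular frequencies
    xi2 = 2 * np.pi * np.fft.fftfreq(N2, d=2 * a / N2)
    abs_xi = np.hypot(xi1[:, None], xi2[None, :])
    power = np.abs(np.fft.fft2(f)) ** 2
    if power[0, 0] > tol * power.sum():
        raise ValueError("f must have (numerically) zero mean")
    weight = np.zeros_like(abs_xi)
    nonzero = abs_xi > 0
    weight[nonzero] = 1.0 / abs_xi[nonzero]
    # Parseval: the DFT normalization cancels in the quotient.
    return np.sqrt(np.sum(weight * power) / np.sum(power))

def rf_over_f(f, a=1.0):
    """||Rf|| / ||f|| = sqrt(4 pi) * || |D|^{-1/2} f || / || f ||."""
    return np.sqrt(4 * np.pi) * inv_sqrt_D_ratio(f, a)

N = 64
x = -1 + 2 * np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing="ij")
f_low = np.cos(2 * np.pi * X)      # |xi| = 2*pi
f_high = np.cos(8 * np.pi * X)     # |xi| = 8*pi
r_low, r_high = inv_sqrt_D_ratio(f_low), inv_sqrt_D_ratio(f_high)
```

For a single frequency the ratio is $| ξ |^{- 1 / 2}$ , so the low-frequency example gives the larger factor K, as stated above.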

In Fig. 10 we demonstrate this effect. We choose $N = 300$ , and the dimensions of the grid for $R f$ are chosen with equalities in (5.16), see also Fig. 6. We add the same amount of normally distributed noise, 20% of $‖ R f ‖$ , to $R f$ . We measure different percentages of added noise in the reconstructed f, depending on the frequency distribution of f, i.e., on the ratio in (5.22). Images with mostly lower frequencies suffer more from noise. On the other hand, given a priori knowledge of their frequency band, that noise can be filtered out, unless we are looking for small high frequency details in a predominantly low frequency image. We chose non-negative f’s in that figure only. Numerical experiments with f of mean value zero show lower added noise in a few examples. If in Fig. 10(c) we allow random positive and negative amplitudes as well, for example (not shown), the added noise in (g) drops to 41%. It is worth mentioning that with many conventional images, the values we get are close. In fact, statistically, such images share similar power spectrum distributions [18].

Fig. 10.

Top: four different choices of $f ⩾ 0$ , $N = 300$ . Bottom: f reconstructed with 20% noise added to $R f$ . The numbers show the added noise to f, and $‖ R f ‖ / ‖ f ‖$ .

Therefore, measuring the sensitivity of a particular inversion to noise this way can be quite misleading. The added noise to the image depends on the noise ratio (5.20) which in turn depends on the grid chosen to discretize $R f$ ; and also depends on the choice of the test image.

6. The Radon transform $R$ in the plane in fan-beam coordinates

6.1. $R$ in fan-beam coordinates

We parametrize $R$ by the so-called fan-beam coordinates. Recall (5.1). Each line is represented by an initial point $R ω (α)$ on the boundary of $B (0, R)$ , where f is supported, and by an initial direction making angle β with the radial line through the same point, see Fig. 11. It is straightforward to see that this direction is given by $ω (α + β)$ . Then the lines through $B (0, R)$ are given by $\begin{matrix} (6.1) & x \cdot ω (α + β - π / 2) = R sin β, α \in [- π, π], β \in [- π / 2, π / 2] . \end{matrix}$

The canonical relation is the union of the graphs of $κ_{\pm}$ , see [17], given by $\begin{matrix} β = \pm {sin}^{- 1} \frac{x \cdot ξ}{R | ξ |}, \quad α = arg ξ - β \pm \frac{π}{2}, \quad \hat{α} = x \cdot ξ^{⊥}, \quad \hat{β} = \pm | ξ | \sqrt{R^{2} - {(x \cdot ξ / | ξ |)}^{2}} + \hat{α} . \end{matrix}$ Then $κ_{+}$ and $κ_{-}$ are isomorphic under the symmetry mentioned above, lifted to the cotangent bundle: $\begin{matrix} (α, β, \hat{α}, \hat{β}) ⟼ (α + 2 β - π, - β, \hat{α}, 2 \hat{α} - \hat{β}) . \end{matrix}$ The inverses $κ_{\pm}^{- 1}$ are given by $\begin{matrix} (6.2) & x = R sin β ω (α + β - π / 2) - \frac{\hat{α}}{\hat{β} - \hat{α}} R cos β ω (α + β), \quad ξ = \frac{\hat{β} - \hat{α}}{R cos β} ω (α + β - π / 2) . \end{matrix}$ In particular, we recover the well-known fact that κ is 1-to-2, as in the previous case.

Fig. 11.

The fan-beam coordinates.

Set $(φ, p) = Φ (α, β)$ , where $\begin{matrix} φ = α + β - π / 2, p = R sin β . \end{matrix}$ We have $det d Φ = R cos β$ . Then $R_{FB} = R \circ Φ$ . To compute $R_{FB}^{*} R_{FB}$ , write $\begin{matrix} (R_{FB}^{*} R_{FB} f, f) = \int {| R_{FB} f (α, β) |}^{2} d α d β = \int {| R f (φ, p) |}^{2} \frac{1}{cos β} d φ d p . \end{matrix}$ Since $sin β = p / R$ , we have $cos β = \sqrt{1 - p^{2} / R^{2}}$ . Therefore, $\begin{matrix} R_{FB}^{*} R_{FB} = R^{*} {(1 - p^{2} / R^{2})}^{- 1 / 2} R . \end{matrix}$ The factor in the middle of the r.h.s. is a multiplication operator, and applying Egorov’s theorem (one can actually do it even directly and without a remainder), one gets for the principal symbols, at least, $\begin{matrix} σ_{p} (R_{FB}^{*} R_{FB}) = {(1 - \frac{{(x \cdot ξ)}^{2}}{R^{2} | ξ |^{2}})}^{- 1 / 2} σ_{p} (R^{*} R) = \frac{4 π}{| ξ |} {(1 - \frac{{(x \cdot ξ)}^{2}}{R^{2} | ξ |^{2}})}^{- 1 / 2} . \end{matrix}$ The equivalent to (5.3) then is $\begin{matrix} (6.3) & γ_{h^{1 / 2} R_{FB}^{- 1} g} (x, ξ) = \frac{| ξ |}{4 π} {(1 - \frac{{(x \cdot ξ)}^{2}}{R^{2} | ξ |^{2}})}^{1 / 2} γ_{g} \circ κ_{FB} (x, ξ), ξ \neq 0 . \end{matrix}$ Therefore, the noise spectral distribution depends on x now, and it depends on the direction of ξ relative to x. For x, $| ξ |$ fixed, it is maximized when $ξ ⊥ x$ , and minimized when $ξ ∥ x$ .
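The change of variables Φ and the direction-dependent factor in (6.3) can be checked numerically. The sketch below is illustrative only (all names are ours): it verifies that the line with fan-beam parameters (α, β) passes through $R ω (α)$ with direction $ω (α + β)$ , approximates $det d Φ$ by central differences, and evaluates $(1 - (x \cdot ξ)^{2} / (R^{2} | ξ |^{2}))^{1 / 2}$ at the point $x_{0} = 0.8 (- 1, 1)$ used in Fig. 12.

```python
import numpy as np

def omega(t):
    """Unit vector omega(t) = (cos t, sin t)."""
    return np.array([np.cos(t), np.sin(t)])

def fan_to_parallel(alpha, beta, R):
    """(phi, p) = Phi(alpha, beta) with phi = alpha + beta - pi/2 and p = R sin(beta)."""
    return alpha + beta - np.pi / 2, R * np.sin(beta)

def jacobian_det(alpha, beta, R, eps=1e-6):
    """Central-difference approximation of det dPhi (expected: R cos(beta))."""
    def d(component, da, db):
        plus = fan_to_parallel(alpha + da, beta + db, R)[component]
        minus = fan_to_parallel(alpha - da, beta - db, R)[component]
        return (plus - minus) / (2 * eps)
    return d(0, eps, 0) * d(1, 0, eps) - d(0, 0, eps) * d(1, eps, 0)

def direction_factor(x, xi, R):
    """The factor (1 - (x.xi)^2 / (R^2 |xi|^2))^{1/2} appearing in (6.3)."""
    c = np.dot(x, xi) / (R * np.linalg.norm(xi))
    return np.sqrt(1.0 - c**2)

R = np.sqrt(2.0)
alpha, beta = 0.3, 0.4
phi, p = fan_to_parallel(alpha, beta, R)
start = R * omega(alpha)                 # initial point on the circle |x| = R
x0 = 0.8 * np.array([-1.0, 1.0])
```

At $x_{0}$ the factor equals 1 for $ξ ⊥ x_{0}$ and $\sqrt{1 - | x_{0} |^{2} / R^{2}} = 0.6$ for $ξ ∥ x_{0}$ .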

6.2. Sampling

As above, if $supp f \subset {[- 1, 1]}^{2}$ is sampled on an $N \times N$ grid, we have $B_{x_{1}} = B_{x_{2}} = N π / 2$ . As before, set $B = \sqrt{2} B_{x_{1}} = N π / \sqrt{2}$ . Then we consider f having ${WF}_{h} (f)$ in $B (0, R) \times B (0, B)$ with $R = \sqrt{2}$ . The smallest box containing the image of this product under the canonical map, projected to the dual variables $(\hat{α}, \hat{β})$ , is $[- R B, R B] \times [- 2 R B, 2 R B]$ , see [17]. This means taking at least $2 R B \times 2 R B$ , i.e., $2 N π \times 2 N π$ samples over the intervals indicated in (6.1). Compared to (5.16), this requires π times as many samples, which makes it a less efficient sampling geometry, as shown in [17].
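The sample counts above follow from the rule that a band limit b on an interval of length L requires at least $L b / π$ samples. A minimal sketch of this arithmetic, comparing with the sharp parallel-geometry count $2 π N \times 2 N$ from (5.16) (the function name is ours):

```python
import math

def sample_counts(N):
    """Asymptotic minimal sample counts for f on [-1, 1]^2 sampled on an N x N grid."""
    R = math.sqrt(2)
    B = N * math.pi / math.sqrt(2)               # band limit of |xi|
    # A band limit b on an interval of length L needs L * b / pi samples.
    n_alpha = 2 * math.pi * (R * B) / math.pi    # alpha in [-pi, pi], band R*B
    n_beta = math.pi * (2 * R * B) / math.pi     # beta in [-pi/2, pi/2], band 2*R*B
    parallel = (2 * math.pi * N) * (2 * N)       # equalities in (5.16)
    return n_alpha, n_beta, n_alpha * n_beta / parallel

n_alpha, n_beta, ratio = sample_counts(300)
print(n_alpha, n_beta, ratio)    # about 2*pi*300, 2*pi*300 and pi
```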

In Fig. 12, we present a numerical experiment to validate (6.3). We take g to be Gaussian noise and invert it with $R_{FB}^{- 1}$ . Then we crop a small rectangle in the top left corner and take the modulus of its Fourier transform. There, x is close to $x_{0} = 0.8 (- 1, 1)$ , and the small black elongated oval in the center has its major axis along the same vector, as formula (6.3) predicts.

Fig. 12.

Spectral density of the noise in f with the Hann filter. Left: measured in the top left corner. Right: theoretical profile (6.3) at that corner.

6.3. Noise ratio

We study the noise ratio with a filtered inversion. In ifanbeam in MATLAB, for example, $R_{FB} f$ is converted to parallel coordinates and the filter is applied after that. By (6.2), the filter $ν (\hat{p})$ , with ν even, takes the form $F : = ν (| \hat{β} - \hat{α} | / (R B cos β))$ , where B is the band limit of $| ξ |$ . The inversion operator is then $R_{FB, ν}^{- 1} = R_{FB}^{- 1} F$ , which equals ${(R_{FB, ν}^{*} R_{FB, ν})}^{- 1} ν (| D |) R_{FB}^{*}$ modulo lower order operators, by Egorov’s theorem. Similarly to (5.12), (6.3) is modified to $\begin{matrix} γ_{h^{1 / 2} R_{FB, ν}^{- 1} g} (x, ξ) = \frac{| ξ |}{4 π} {(1 - \frac{{(x \cdot ξ)}^{2}}{R^{2} | ξ |^{2}})}^{1 / 2} ν_{0}^{2} (| ξ | / B) γ_{g} \circ κ_{FB} (x, ξ), ξ \neq 0 . \end{matrix}$

Assume that g is oversampled (relative to B; see [17] for the sampling requirements) and that it is white noise. Then the variance at a point, see (2.8), is given by $\begin{matrix} {VAR}_{x} (h^{1 / 2} f) & = γ^{♯} \int_{| ξ | ⩽ B} \frac{| ξ |}{4 π} {(1 - \frac{{(x \cdot ξ)}^{2}}{R^{2} | ξ |^{2}})}^{1 / 2} ν_{0}^{2} (| ξ | / B) d ξ \\ & = \frac{γ^{♯} B^{3}}{4 π} \int_{0}^{2 π} \int_{0}^{1} {(1 - \frac{| x |^{2}}{R^{2}} {cos}^{2} θ)}^{1 / 2} ρ^{2} ν_{0}^{2} (ρ) d ρ d θ \\ & = \frac{γ^{♯} B^{3}}{6} \frac{c_{ν}}{2 π} \int_{0}^{2 π} {(1 - \frac{| x |^{2}}{R^{2}} {cos}^{2} θ)}^{1 / 2} d θ, \end{matrix}$ compare with (5.13). The integral is of elliptic type and varies between $2 π$ , when $| x | = 0$ , and 4, when $| x | = R$ . Formally setting $| x | = 0$ in the integrand recovers (5.13). Taking a square root, we see that the standard deviation is highest at the center, where it is the same as in the parallel geometry case, and decreases slightly, to about 80% of that value, at $| x | = R$ , which corresponds to the four corners of the square in our numerical simulations.

Similarly to (5.7) and (5.13), we integrate over x in the inscribed disk $| x | ⩽ 1$ in ${[- 1, 1]}^{2}$ and divide by its area π to get the variance in that disk. Then $R = \sqrt{2}$ and $\begin{matrix} {VAR}_{B (0, 1)} (f) \approx \frac{2.93}{π} \frac{γ^{♯} B^{3}}{6 h} c_{ν} \approx 0.9328 \frac{γ^{♯} B^{3}}{6 h} c_{ν} . \end{matrix}$ This is within 6 to 7% of the parallel geometry variance, and the standard deviations differ by about 3%.
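The constants 2π, 4, 80%, 2.93 and 0.9328 quoted above can be reproduced by straightforward quadrature. The sketch below uses the midpoint rule in r and θ (our choice; any accurate quadrature would do), with $R = \sqrt{2}$ ; the function names are illustrative.

```python
import numpy as np

def angular_integral(r, R, n_theta=4096):
    """A(r) = int_0^{2 pi} (1 - (r/R)^2 cos^2 theta)^{1/2} d theta, by the midpoint rule."""
    theta = (np.arange(n_theta) + 0.5) * 2 * np.pi / n_theta
    integrand = np.sqrt(np.clip(1 - (r / R) ** 2 * np.cos(theta) ** 2, 0, None))
    return 2 * np.pi * integrand.mean()

def disk_average_factor(R, n_r=2000):
    """Return int_0^1 r A(r) dr and its quotient by pi (the disk average of A / (2 pi))."""
    r = (np.arange(n_r) + 0.5) / n_r
    A = np.array([angular_integral(ri, R) for ri in r])
    integral = np.mean(r * A)
    return integral, integral / np.pi

R = np.sqrt(2.0)
integral, factor = disk_average_factor(R)
std_drop = np.sqrt(angular_integral(R, R) / (2 * np.pi))   # STD at |x| = R relative to the center
print(integral, factor, std_drop)    # about 2.93, 0.9328 and 0.80
```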

7. Non-additive noise

In this section we discuss some types of non-additive noise. The exposition here will be more sketchy: we will point out how to fit those cases into the general framework we developed, but will not go into detail.

7.1. Multiplicative noise

Assume the data $g = A f$ is subject to multiplicative noise. This can happen if the detectors are not perfectly calibrated and each one reports a signal somewhat larger or smaller than it should (non-uniform response). In imaging systems, photo response non-uniformity (PRNU) is an example of such noise. A generic way to model the kind of multiplicative noise we have in mind is the following: consider a sequence of discrete noise samples $\{ w_{k, h}; h > 0, k \in K (h) \}$ , where $\begin{matrix} (7.1) & w_{k, h} = 1 + f_{k, h} \quad \text{or} \quad w_{k, h} = exp (f_{k, h}), \end{matrix}$ and $\{ f_{k, h}; h > 0, k \in K (h) \}$ is the white noise considered in Hypothesis 4.1. Then we set $\begin{matrix} (7.2) & g_{noise} (x) = \sum_{k \in K (h)} w_{k, h} g (s h k) χ_{k} (x), \quad χ_{k} (x) = χ (\frac{1}{s h} (x - s h k)), \end{matrix}$ where $w_{k, h}$ are the discrete noise samples, playing the role of $w_{noise}$ above, and g is the noise-free continuous signal. We will compare $g_{noise}$ defined by (7.2) with the first choice in (7.1) (the second one can be treated similarly, taking into account that $w_{k, h}$ is not necessarily centered) to a noisy signal of the form $\begin{matrix} (7.3) & {\tilde{g}}_{noise} (x) : = \sum_{k \in K (h)} w_{k, h} g (x) χ_{k} (x) = g (x) \sum_{k \in K (h)} w_{k, h} χ_{k} (x) . \end{matrix}$ We have $\begin{matrix} {\tilde{g}}_{noise} (x) - g_{noise} (x) = \sum_{k \in K (h)} w_{k, h} (g (x) - g (s h k)) χ_{k} (x) . \end{matrix}$ Since $\begin{matrix} | g (x) - g (s h k) | ⩽ C | x - s h k |, \end{matrix}$ with $C ⩽ C_{0} ‖ \nabla g ‖_{L^{\infty}}$ , we get $\begin{matrix} | (g (x) - g (s h k)) χ_{k} (x) | ⩽ C s h \max_{y} (| y | | χ (y) |) ⩽ C^{'} h . \end{matrix}$ That factor h allows us to estimate, using Proposition 3.2, the error when replacing $g (s h k)$ in (7.2) by $g (x)$ : we get an $O (h)$ error.
The problem here is that we want to apply this to $g = A f$ , all dependent on h, and in general, $\nabla A f$ grows like $h^{- 1} | A f |$ . This cancels the decay above. If we oversample a lot, the error will be “small”. Also, in regions with ${WF}_{h} (A f)$ far away from the Nyquist limit, that term will be small. If we ignore it for a moment, the noisy data is ${\tilde{g}}_{noise}$ given by (7.3), i.e., the noise is white noise as above, multiplied by $g = A f$ . The defect measure of the noise added to the data is then as in (4.15), with the additional factor $| A f (x) |^{2}$ .
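At the level of the samples $g (s h k)$ (ignoring the functions $χ_{k}$ ), the multiplicative model (7.1) and (7.2) is easy to simulate. The minimal sketch below, with an illustrative σ and helper name of our choosing, shows that the local standard deviation of the added noise is proportional to $| g |$ and vanishes where g does.

```python
import numpy as np

def multiplicative_noise(g_samples, sigma, rng, model="linear"):
    """Return w * g at the sample points, with w = 1 + f or w = exp(f), f white noise."""
    f = sigma * rng.standard_normal(g_samples.shape)   # discrete white noise f_{k,h}
    w = 1.0 + f if model == "linear" else np.exp(f)
    return w * g_samples

rng = np.random.default_rng(0)
n = 100_000
g = np.concatenate([np.full(n, 0.5), np.full(n, 2.0), np.zeros(n)])
noise = multiplicative_noise(g, 0.1, rng) - g           # noise added to the data
std_low, std_high, std_zero = (noise[i * n:(i + 1) * n].std() for i in range(3))
print(std_low, std_high, std_zero)    # about 0.05, 0.2 and exactly 0
```

With `model="exp"`, the weights have mean $e^{σ^{2} / 2}$ rather than 1, which is the non-centering mentioned after (7.2).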

One important case which allows one to overcome the difficulty above is when $g = ψ_{h} * g_{0}$ , where $ψ_{h} (x) = h^{- n} ψ (x / h)$ , with $\int ψ = 1$ and $\hat{ψ} \in C_{0}^{\infty}$ (a Friedrichs mollifier). This corresponds to averaged measurements of an h-independent function $g_{0}$ . We refer to [17] for the sampling theory for such measurements. Then $\nabla g = ψ_{h} * \nabla g_{0}$ , and assuming $g_{0} \in C^{1}$ (either h-independent or bounded in $C^{1}$ uniformly in h), we have $| \nabla g | ⩽ C$ (rather than $C / h$ ), which is the estimate we needed in the previous paragraph. Then the machinery we developed works, and we need to multiply the noise measure in (4.15) by the additional factor $| g_{0} (x) |^{2}$ , i.e., we get there $\begin{matrix} d μ_{g_{noise}} (x, ξ) = \frac{s^{n}}{{(2 π)}^{n}} σ^{2} {| g_{0} (x) |}^{2} {| \hat{ψ} (ξ) \hat{χ} (s ξ) |}^{2} d x d ξ \end{matrix}$ under the assumption $g = ψ_{h} * g_{0}$ in (7.2). Then (5.13) takes the form $\begin{matrix} (7.4) & \begin{matrix} {VAR}_{x}^{0} (h^{1 / 2} {R_{χ}}^{- 1} g_{noise}) & = \frac{1}{4 π} γ^{♯} \int_{| ξ | < B} {| R f (x \cdot ξ / | ξ |, arg (ξ)) |}^{2} | ξ | {| ν_{0} (| ξ | / B) \hat{ψ} (ξ) |}^{2} d ξ \\ = \frac{1}{4 π} γ^{♯} \int_{0}^{2 π} \int_{0}^{B} {| R f (x \cdot θ, θ) |}^{2} ρ^{2} {| \hat{ψ} (ρ θ) |}^{2} ν_{0}^{2} (ρ / B) d ρ d θ . \end{matrix} \end{matrix}$ This shows that the standard deviation of the noise at x depends, in particular, on the line integrals of f along lines through x. Line integrals with large values create stronger noise at x.

7.2. Modeling noise in CT scan

In CT scan tomography, what is measured is the attenuation along each ray. If $I_{0}$ is the initial intensity, and I is the one after the ray crosses the object, then the measurement is $I = exp (- R f) I_{0}$ , by the Beer-Lambert law. Assuming an additive noise $g_{noise}$ , we measure $I_{noisy} = exp (- R f) I_{0} + g_{noise}$ . If we invert this the same way as if there were no noise (which may not be the best strategy), we would get $\begin{matrix} f_{noisy} = - R^{- 1} log (I_{noisy} / I_{0}) = - R^{- 1} log (exp (- R f) + g_{noise} / I_{0}) . \end{matrix}$ Obviously, increasing $I_{0}$ will decrease the effect of the added noise but in many applications this is not desirable and/or the noise level may depend on $I_{0}$ . We take $I_{0} = 1$ , i.e., $g_{noise}$ is the added noise relative to $I_{0}$ . Then $\begin{matrix} - log (exp (- R f) + g_{noise}) & = - log (exp (- R f) (1 + exp (R f) g_{noise})) \\ = R f - log (1 + exp (R f) g_{noise}) . \end{matrix}$ Therefore, $\begin{matrix} (7.5) & f_{noisy} = f - R^{- 1} log (1 + exp (R f) g_{noise}) \end{matrix}$ If the noise is small enough, we can pass to a linearization to get $\begin{matrix} (7.6) & f_{noisy} \approx f - R^{- 1} (exp (R f) g_{noise}) . \end{matrix}$ This is the multiplicative noise model above with $g w_{noise}$ replaced by $e^{g} w_{noise}$ . In (7.4), for example, the factor $| R f |^{2}$ would be replaced by $exp (2 R f)$ .
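The data-level part of (7.5) and its linearization can be compared directly, before $R^{- 1}$ is applied. A minimal sketch with $I_{0} = 1$ ; the noise level and the range of $R f$ are illustrative choices, and the function names are ours.

```python
import numpy as np

def ct_line_integrals(Rf, g_noise, I0=1.0):
    """-log(I_noisy / I0) with I_noisy = exp(-Rf) * I0 + g_noise (Beer-Lambert plus additive noise)."""
    I_noisy = np.exp(-Rf) * I0 + g_noise
    # Not guarded: for large noise I_noisy may be <= 0, and the log is then undefined.
    return -np.log(I_noisy / I0)

def linearized_line_integrals(Rf, g_noise):
    """Data-level linearization behind (7.6), with I0 = 1: Rf - exp(Rf) * g_noise."""
    return Rf - np.exp(Rf) * g_noise

rng = np.random.default_rng(1)
Rf = np.linspace(0.0, 0.5, 101)
small = 1e-4 * rng.standard_normal(Rf.shape)
exact = ct_line_integrals(Rf, small)
lin = linearized_line_integrals(Rf, small)
gain = np.exp(Rf)     # local amplification of STD(g_noise), from 1 to e^{0.5}
```

The discrepancy between the two is of second order in the noise, about $(e^{R f} g_{noise})^{2} / 2$ .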

7.3. Modeling Poisson noise

In SPECT, we measure the attenuated X-ray transform, but the particle count at each detector is low. In this case, the predominant noise is of Poisson type: the number of particles at each detector is randomized by a Poisson distribution, with probability of taking the value k being $P (k, λ) = e^{- λ} λ^{k} / k!$ , where $λ ⩾ 0$ is the expected value at that detector, see [15, Section 4.5]. Both the expected value and the variance of P equal λ. Then the particle count at each detector equals $λ + w \sqrt{λ}$ , where w is a random variable with zero expected value and variance 1. Note that the probability distribution of w depends on λ and approximates a Gaussian one when $λ ≫ 1$ ; the variables w at different detectors are independent. Assuming locally averaged measurements, as above, we would get added noise $w_{k, h} | ψ_{h} * R f |^{1 / 2}$ when the units for $R f$ are the number of particles, and a constant multiple of that in general. Note that $w_{k, h}$ are not identically distributed (but Theorem 4.1 still applies) and are well approximated by Gaussian distributions when $R f$ is not very small. The microlocal measure then would have the factor $α R f$ with some $α > 0$ (we assume $f ⩾ 0$ , thus $R f ⩾ 0$ ). This is similar to multiplicative noise, where the factor was proportional to $| R f |^{2}$ .
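The Poisson model can be simulated with NumPy in the same way as the MATLAB expression poissrnd(80 * Rf) / 80 used in Section 7.4. In the sketch below (the helper name is ours), the data keep the mean $R f$ , and the added noise has variance $R f / 80$ , i.e., the factor $α R f$ with $α = 1 / 80$ in these units; its standard deviation therefore grows like $\sqrt{R f}$ .

```python
import numpy as np

def poisson_data(Rf, scale, rng):
    """Poisson counts with mean scale * Rf, converted back to the units of Rf."""
    return rng.poisson(scale * Rf) / scale

rng = np.random.default_rng(2)
n, scale = 200_000, 80
samples = {lam: poisson_data(np.full(n, lam), scale, rng) for lam in (0.125, 0.5)}
s = samples[0.5]
print(s.mean(), s.var(), 0.5 / scale)   # mean about 0.5, variance about 0.5/80
```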

Fig. 13.

Shepp–Logan with the Hann filter and with (a) multiplicative noise; (b) CT type of noise; (c) Poisson noise.

Fig. 14.

Three disks with the Hann filter and with (a) multiplicative noise; (b) CT type of noise; (c) Poisson noise.

7.4. Numerical examples

We present numerical simulations with the three types of non-additive noise in Figs 13 and 14. The phantoms are the Shepp–Logan one and three disks of different size and intensity, not shown there, both phantoms having ranges between 0 and 1. They are both rendered on a $300 \times 300$ grid discretizing the square ${[- 1, 1]}^{2}$ . Their Radon transforms are computed with 1,884 angular steps and 600 steps in the p variable covering the diagonal of the square. To simulate multiplicative noise, we choose the variance of $w$ in (7.2) to be 0.2. To simulate CT noise, we use the non-linear model (7.5) (rather than the linearization (7.6)) with $VAR (g_{noise}) = 0.03$ . In the Poisson noise case, each value of $R f$ is randomized as follows: $poissrnd (80 * R f) / 80$ ; note that $R f$ ranges from 0 to 0.51 in the Shepp–Logan case and to 0.56 in the disks case. We chose the noise parameters so that the noise would be of similar visible strength in all three cases, and the distribution is Gaussian. The Hann filter is applied in the inversion.

Note that the noise has a different character compared to Fig. 10(h), for example, and one can see individual lines (more precisely, line segments) in it. The multiplicative and the Poisson noise are somewhat similar in character, while the CT noise in the middle looks different. Our analysis shows that in the latter case, the standard deviation of the added noise in the linearization regime ranges from $e^{0} = 1$ to about $e^{0.5} \approx 1.65$ times $STD (g_{noise})$ , see (7.6), while in the other two cases, the range is from 0 to a certain positive constant, which allows for almost zero noise locally before inversion. For this reason, individual lines are harder to distinguish in the CT case.

8. Discrete noise and its power spectrum

In this section, we analyze discrete white noise directly, without converting it to a continuous function. Here, $f (k)$ is a random vector on an $N \times \dots \times N$ grid which we denoted by $f_{k}$ before. We will denote by $δ (k)$ the discrete delta function on $Z^{n}$ . In Section 8.1, we follow mainly [15, Chapter 12], where f is a random variable depending on a (continuous) variable t; but most of it adapts to the discrete setting easily. We do a temporal analysis of the power spectrum for each fixed (discrete) frequency, with N fixed. We show that the spectrum of white noise is flat in the sense of expected value over repeated experiments, and we consider more general noise. On the other hand, for each experiment, the spectrum is quite, well, noisy and does not appear to smoothen as $N \to \infty$ numerically. In the second part, we study the ergodic properties of the power spectrum, with a single experiment, as $N \to \infty$ . We show in Theorem 8.1 that the power spectrum is flat on average. That theorem is an analog of Theorem 4.1.

We want to emphasize that $f = \{ f (k) \}$ depends on N, so we have a “triangular” array of random variables, depending on the random outcome, whose size increases with N.

8.1. Temporal analysis

The discrete analog of the Fourier transform is the Discrete Fourier Transform (DFT) described below. It lives naturally on the discrete torus $T_{N}^{n} = Z^{n} / N Z^{n}$ with period N. This means that whenever the DFT is used for spectral analysis, the original f is actually regarded as the restriction of a periodic function to a fundamental domain. We consider $f : T_{N}^{n} \to C$ , and we write $f = \{ f (k) \}$ , with each element $f (k)$ , $k = (k_{1}, \dots, k_{n}) \in T_{N}^{n}$ , a random variable in the same probability space. First, N will be fixed, but eventually we will take $N ≫ 1$ . We denote by $f g$ the vector defined by $(f g) (k) = f (k) g (k)$ , i.e., the pointwise product of functions of a discrete argument. Similarly, $| f |$ is the vector with components $| f (k) |$ , while $‖ f ‖$ is the norm of f.

We define the (unitary) Discrete Fourier Transform (DFT) $\hat{f} = Ff$ by
$$\hat{f}(k^{*}) = \frac{1}{N^{n/2}} \sum_{k \in T_{N}^{n}} f(k)\, e^{-2πi k \cdot k^{*}/N}, \quad k^{*} \in T_{N}^{n}.$$
Its inverse is its adjoint:
$$f(k) = \frac{1}{N^{n/2}} \sum_{k^{*} \in T_{N}^{n}} \hat{f}(k^{*})\, e^{2πi k \cdot k^{*}/N}, \quad k \in T_{N}^{n}.$$
Parseval's equality takes the form
$$f \cdot g = \hat{f} \cdot \hat{g}$$
for complex-valued f and g, where the dot product is the natural one in $\mathbb{C}^{N^{n}}$. In particular, F is unitary. There is a natural (circular) convolution $f * g$, and we have
$$F(f * g) = N^{n/2} \hat{f} \hat{g}, \qquad F(fg) = N^{-n/2} \hat{f} * \hat{g}.$$
Next, we have
$$Fδ = N^{-n/2}, \qquad F(1) = N^{n/2} δ. \tag{8.1}$$
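The identities above can be checked numerically. The following is a minimal NumPy sketch, assuming that numpy's `norm="ortho"` FFT matches the unitary DFT defined above (it applies the same $N^{-n/2}$ factor in both directions). The helper names `dft` and `circular_convolution` are ours, chosen for illustration; the convolution is written as an explicit sum over shifts for clarity rather than speed.

```python
import numpy as np

def dft(f):
    """Unitary DFT on the discrete torus T_N^n: norm="ortho" gives the N^{-n/2} factor."""
    return np.fft.fftn(f, norm="ortho")

def circular_convolution(f, g):
    """(f * g)(k) = sum_m f(m) g(k - m), with indices taken modulo N in every direction."""
    axes = tuple(range(f.ndim))
    out = np.zeros(f.shape, dtype=complex)
    for m in np.ndindex(f.shape):
        # np.roll by m gives the array k -> g(k - m mod N)
        out += f[m] * np.roll(g, shift=m, axis=axes)
    return out

rng = np.random.default_rng(0)
N, n = 8, 2
shape = (N,) * n
f = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
g = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
f_hat, g_hat = dft(f), dft(g)

# Parseval: f . g = f^ . g^, with the dot product sum_k f(k) conj(g(k))
parseval_ok = np.isclose(np.vdot(g, f), np.vdot(g_hat, f_hat))
# The inverse is the adjoint: conjugated exponentials, same N^{-n/2} factor
inverse_ok = np.allclose(np.fft.ifftn(f_hat, norm="ortho"), f)
# Convolution identities: F(f*g) = N^{n/2} f^ g^,  F(fg) = N^{-n/2} f^ * g^
conv_ok = np.allclose(dft(circular_convolution(f, g)), N ** (n / 2) * f_hat * g_hat)
prod_ok = np.allclose(dft(f * g), N ** (-n / 2) * circular_convolution(f_hat, g_hat))
# (8.1): F(delta) = N^{-n/2} (a constant),  F(1) = N^{n/2} delta
delta = np.zeros(shape)
delta[(0,) * n] = 1.0
delta_ok = (np.allclose(dft(delta), N ** (-n / 2))
            and np.allclose(dft(np.ones(shape)), N ** (n / 2) * delta))
```

All five flags should be true; changing `n` or `N` tests the identities in other dimensions.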

For each f with random entries as above, define the auto-correlation
$$\mathrm{ACor}_{f}(m, k) = E\{f(m) \bar{f}(k)\}. \tag{8.2}$$
The auto-covariance is defined as the auto-correlation of the centered f, i.e., of $f - E(f)$, and it is easy to see that
$$\mathrm{ACov}_{f}(m, k) = \mathrm{ACor}_{f}(m, k) - E\{f(m)\} \overline{E\{f(k)\}}.$$

The process f is called stationary¹ if $\mathrm{ACor}_{f}(m, k)$ is a function of $m - k$ only:
$$\mathrm{ACor}_{f}(m, k) = \mathrm{ACor}_{f}(m - k), \tag{8.3}$$
where, with some abuse of notation, we use the same symbol $\mathrm{ACor}_{f}$ on the right.

¹ This term comes from 1D processes, where x is the time.

A process f is called white noise if
$$\mathrm{ACor}_{f}(m, k) = 0 \quad \text{for } m \neq k. \tag{8.4}$$
Then we must have
$$\mathrm{ACor}_{f}(m, k) = σ^{2}(m)\, δ(m - k) \tag{8.5}$$
with $σ^{2}(m) = \mathrm{VAR}(f(m)) \geqslant 0$. We always assume that white noise has zero mean. The process is wide-sense stationary (WSS) if it is stationary and its mean is constant. Then white noise is WSS if σ is constant. Note that WSS does not imply that the $f(m)$ are independent of each other; on the other hand, if they are independent (and have zero mean), they are uncorrelated, i.e., (8.4) holds.

Let $Γ(m^{*}, k^{*})$ be the DFT of the auto-correlation of f, see (8.2), with respect to $(m, k)$:
$$Γ(m^{*}, k^{*}) = F(\mathrm{ACor}_{f})(m^{*}, k^{*}).$$
Then
$$E\{\hat{f}(m^{*}) \overline{\hat{f}(k^{*})}\} = E\, \frac{1}{N^{n}} \sum_{m, k} f(m) \bar{f}(k)\, e^{-2πi (m \cdot m^{*} - k \cdot k^{*})/N} = Γ(m^{*}, -k^{*}).$$

In the case of white noise satisfying (8.5), we have $Γ(m^{*}, k^{*}) = N^{-n/2} \widehat{σ^{2}}(m^{*} + k^{*})$; thus we recover Theorem 11.2 in [15]:
$$E\{\hat{f}(m^{*} + k^{*}) \overline{\hat{f}(m^{*})}\} = N^{-n/2} \widehat{σ^{2}}(k^{*}), \quad \forall m^{*}.$$
This shows that even when f is not stationary, $\hat{f}$ is stationary, with auto-correlation $N^{-n/2} \widehat{σ^{2}}$. If $σ = \text{const}$, then each $f(m)$ has standard deviation σ, and $N^{-n/2} \widehat{σ^{2}} = σ^{2} δ$ by (8.1), i.e.,
$$E\{\hat{f}(k^{*}) \overline{\hat{f}(m^{*})}\} = σ^{2} δ(k^{*} - m^{*}). \tag{8.6}$$
In particular, $E\{|\hat{f}(m^{*})|^{2}\} = σ^{2}$ for all $m^{*}$, which shows a flat (expectation of the) spectrum. By Theorem 11.3 in [15], if f is real and Gaussian, then the covariance of $|\hat{f}(m^{*})|^{2}$ and $|\hat{f}(k^{*})|^{2}$ equals $N^{-n} |\widehat{σ^{2}}(m^{*} + k^{*})|^{2} + N^{-n} |\widehat{σ^{2}}(m^{*} - k^{*})|^{2}$, as we also show below in the case of constant σ. In particular, if $σ = \text{const}$ in (8.5), we get covariance $σ^{4}(δ(m^{*} + k^{*}) + δ(m^{*} - k^{*}))$. Therefore, $|\hat{f}(m^{*})|^{2}$ and $|\hat{f}(k^{*})|^{2}$ are correlated only when $k^{*} = m^{*}$ or $k^{*} = -m^{*}$ (the latter because $|\hat{f}|^{2}$ is even for real f), and the variance of $|\hat{f}(k^{*})|^{2}$ is $σ^{4}$ for each Fourier coefficient, except for the zeroth one (more generally, when $2k^{*} = 0$ in $T_{N}^{n}$), where it is $2σ^{4}$. In fact, we do not need f to be Gaussian to reach the same conclusion in an asymptotic sense. We assume f real from now on.
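For white noise the covariance matrix of f is diagonal, so the covariance of $\hat{f}$ is exactly $F\,\mathrm{diag}(σ^{2})\,F^{*}$, and the formula above can be checked without any sampling. A minimal NumPy sketch in dimension $n = 1$; the variance profile `sigma2` is a hypothetical choice made only to have non-stationary white noise.

```python
import numpy as np

N = 16
m = np.arange(N)
# Hypothetical position-dependent variance sigma^2(m): non-stationary white noise
sigma2 = 1.0 + 0.5 * np.cos(2 * np.pi * m / N) ** 2

# Unitary DFT matrix: F[k*, k] = N^{-1/2} exp(-2 pi i k k* / N)
F = np.fft.fft(np.eye(N), axis=0, norm="ortho")

# White noise: E{f f^H} = diag(sigma^2), hence E{f^ f^H} = F diag(sigma^2) F^H exactly
cov_hat = F @ np.diag(sigma2) @ F.conj().T

# Prediction (n = 1): E{f^(a) conj f^(b)} = N^{-1/2} * DFT(sigma^2)(a - b)
sigma2_hat = np.fft.fft(sigma2, norm="ortho")
lag = np.subtract.outer(m, m) % N          # a - b modulo N
stationary_ok = np.allclose(cov_hat, N ** -0.5 * sigma2_hat[lag])

# Constant sigma^2 = 2: the covariance collapses to 2 * identity, i.e. (8.6)
cov_const = F @ np.diag(np.full(N, 2.0)) @ F.conj().T
flat_ok = np.allclose(cov_const, 2.0 * np.eye(N))
```

The first check shows that $\hat{f}$ is stationary even though f is not; the second is the flat spectrum (8.6).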

Proposition 8.1.

Let f be real-valued white noise with independent entries, constant variance $σ^{2}$, and a finite fourth moment $μ_{4} = E\{f(k)^{4}\}$, the same for all k. Then
$$\mathrm{ACov}\{|\hat{f}(k^{*})|^{2}, |\hat{f}(m^{*})|^{2}\} = σ^{4} δ(k^{*} - m^{*}) + σ^{4} δ(k^{*} + m^{*}) + \frac{μ_{4} - 3σ^{4}}{N^{n}}. \tag{8.7}$$

Proof.

We have
$$\begin{aligned} &\mathrm{ACov}\{|\hat{f}(k^{*})|^{2}, |\hat{f}(m^{*})|^{2}\} \\ &\quad = \frac{1}{N^{2n}} \sum_{m_{1}, m_{2}, k_{1}, k_{2}} E\{f(k_{1}) f(k_{2}) f(m_{1}) f(m_{2})\}\, e^{-2πi ((k_{1} - k_{2}) \cdot k^{*} + (m_{1} - m_{2}) \cdot m^{*})/N} - σ^{4}. \end{aligned}$$
The only non-zero expectation terms are those with two (equal or not) pairs of equal indices. Assume first that $k_{1} = k_{2}$, $m_{1} = m_{2}$; then the phase vanishes, and we have two cases for the expectation term above:

If $k_{1} \neq m_{1}$, the expectation term equals $σ^{4}$.

Whenever $k_{1} = k_{2} = m_{1} = m_{2}$, this expectation term is the fourth moment $μ_{4}$.

The number of terms of the latter kind is $N^{n}$, while the number of terms of the former kind is $N^{2n} - N^{n}$. Therefore, this set of indices contributes
$$\Big(1 - \frac{1}{N^{n}}\Big) σ^{4} + \frac{1}{N^{n}} μ_{4}$$
to the sum.

Consider the terms with $k_{1} = m_{1}$, $k_{2} = m_{2}$, but with $k_{1} \neq k_{2}$ to exclude the previous case. Then the corresponding sum is
$$\begin{aligned} \frac{σ^{4}}{N^{2n}} \sum_{k_{1} = m_{1} \neq k_{2} = m_{2}} e^{-2πi (k_{1} - k_{2}) \cdot (k^{*} + m^{*})/N} &= \frac{σ^{4}}{N^{2n}} \sum_{k, m} e^{-2πi k \cdot (k^{*} + m^{*})/N} - \frac{σ^{4}}{N^{n}} \\ &= σ^{4} δ(k^{*} + m^{*}) - \frac{σ^{4}}{N^{n}}. \end{aligned}$$
We performed the change of variables $k = k_{1} - k_{2}$, $m = k_{2}$ above, used (8.1), and compensated for the added terms corresponding to $k = 0$ in the second sum, which are missing from the first one.

Finally, when $k_{1} = m_{2}$, $k_{2} = m_{1}$ but $k_{1} \neq m_{1}$, the dot product in the phase function becomes $(k_{1} - m_{1}) \cdot (k^{*} - m^{*})$, and the same argument gives us
$$\begin{aligned} \frac{σ^{4}}{N^{2n}} \sum_{k_{1} \neq m_{1}} e^{-2πi (k_{1} - m_{1}) \cdot (k^{*} - m^{*})/N} &= \frac{σ^{4}}{N^{2n}} \sum_{k \neq 0} \sum_{m} e^{-2πi k \cdot (k^{*} - m^{*})/N} \\ &= σ^{4} δ(k^{*} - m^{*}) - \frac{σ^{4}}{N^{n}}. \end{aligned}$$
Adding the contributions of those three cases and subtracting $σ^{4}$ yields (8.7), which completes the proof. □

Corollary 8.1.

If f in Proposition 8.1 is normal, then the last term in (8.7) vanishes.

The proof follows from the well-known fact that $μ_{4} = 3σ^{4}$ for normal distributions.

Remark 8.1.

The results in Proposition 8.1 can be interpreted as follows. Up to an error $O(N^{-n})$, we get auto-covariance $σ^{4}$ if $k^{*} = m^{*} \neq 0$ and if $k^{*} = -m^{*} \neq 0$ (by symmetry, because f is real), and $2σ^{4}$ if $k^{*} = m^{*} = 0$. If we stay in a fundamental domain of the type $k_{j} \in \{0, 1, \dots, N - 1\}$, then the symmetry becomes $|\hat{f}(k^{*})|^{2} = |\hat{f}(N - k_{1}^{*}, \dots, N - k_{n}^{*})|^{2}$, with N identified with 0.
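Since (8.7) is an exact identity for independent entries, it can be verified without sampling by enumerating every outcome of a discrete distribution for a small N. A minimal NumPy sketch in dimension $n = 1$; the two test distributions (Rademacher $\pm 1$, for which $μ_{4} - 3σ^{4} = -2$, and a three-point law on $\{-1, 0, 1\}$) are our own illustrative choices, picked so that the $O(N^{-n})$ correction is visible. The last lines check the symmetry in Remark 8.1.

```python
import itertools
import numpy as np

def exact_power_cov(values, probs, N):
    """Exact covariance matrix of |f^(k*)|^2, k* in T_N (n = 1), for i.i.d. entries f(k)
    taking `values` with probabilities `probs`, by enumerating all len(values)**N outcomes."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    mean_power = np.zeros(N)
    second_moment = np.zeros((N, N))
    for idx in itertools.product(range(len(values)), repeat=N):
        idx = list(idx)
        weight = np.prod(probs[idx])                       # probability of this outcome
        power = np.abs(np.fft.fft(values[idx], norm="ortho")) ** 2
        mean_power += weight * power
        second_moment += weight * np.outer(power, power)
    return second_moment - np.outer(mean_power, mean_power)

def predicted_power_cov(sigma2, mu4, N):
    """Right-hand side of (8.7) for n = 1."""
    k = np.arange(N)
    delta_minus = (np.subtract.outer(k, k) % N == 0).astype(float)   # delta(k* - m*)
    delta_plus = (np.add.outer(k, k) % N == 0).astype(float)         # delta(k* + m*)
    return sigma2**2 * (delta_minus + delta_plus) + (mu4 - 3 * sigma2**2) / N

# Rademacher entries: sigma^2 = 1, mu_4 = 1
rad_ok = np.allclose(exact_power_cov([-1, 1], [0.5, 0.5], 8),
                     predicted_power_cov(1.0, 1.0, 8))
# Three-point law {-1, 0, 1} with probabilities 1/4, 1/2, 1/4: sigma^2 = 1/2, mu_4 = 1/2
three_ok = np.allclose(exact_power_cov([-1, 0, 1], [0.25, 0.5, 0.25], 6),
                       predicted_power_cov(0.5, 0.5, 6))

# Remark 8.1: for real f, |f^(k*)|^2 = |f^(N - k*)|^2, with N identified with 0
f = np.random.default_rng(1).standard_normal(8)
p = np.abs(np.fft.fft(f, norm="ortho")) ** 2
symmetry_ok = np.allclose(p, p[(-np.arange(8)) % 8])
```

For Rademacher noise, $\sum_{k^{*}} |\hat{f}(k^{*})|^{2} = N$ for every outcome, which is why the correction term is negative: each row of the covariance matrix must sum to zero.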

8.2. Ergodic analysis. Flatness of the power spectrum on average

Let α be a locally Riemann integrable function on $\mathbb{R}^{n}$, periodic with period 1 in each variable. Assume that f is real valued. We are interested in the following linear functional:
$$μ_{N}(α) := \frac{1}{N^{n}} \sum_{k^{*} \in T_{N}^{n}} α(k^{*}/N)\, |\hat{f}(k^{*})|^{2}.$$
This is a discrete analog of (4.14) with p there depending on the dual variable only. It is a weighted (not normalized) average of the power spectrum. What we do here is effectively rescale the spectrum from the integer points in $[0, N - 1]^{n}$ (extended by periodicity) to the points with fractional components of the form $k^{*}/N$, which become dense in $[0, 1]^{n}$ as $N \to \infty$. In statistics, this is done routinely in the study of periodograms, where $k^{*}/N$ is replaced by a continuous variable.
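The functional $μ_{N}(α)$ is straightforward to evaluate with an FFT. A minimal NumPy sketch, assuming f is stored as an array of shape $(N, \dots, N)$ and that α is passed as a Python callable of n coordinate arrays, evaluated on the fundamental domain $[0, 1)^{n}$; the name `mu_N` is ours. Two deterministic sanity checks follow: for $f \equiv 1$ we have $\hat{f} = N^{n/2} δ$ by (8.1), so $μ_{N}(α) = α(0)$, and for $α \equiv 1$ Parseval gives $μ_{N}(1) = N^{-n} \|f\|^{2}$.

```python
import numpy as np

def mu_N(alpha, f):
    """mu_N(alpha) = N^{-n} sum_{k* in T_N^n} alpha(k*/N) |f^(k*)|^2 for f of shape (N,)*n.

    `alpha` is a callable of n coordinate arrays on [0, 1)^n, assumed periodic with
    period 1 in each variable, so its values on the fundamental domain suffice."""
    N, n = f.shape[0], f.ndim
    power = np.abs(np.fft.fftn(f, norm="ortho")) ** 2
    # Rescaled frequencies k*/N on the fundamental domain [0, 1)^n
    xi = np.meshgrid(*([np.arange(N) / N] * n), indexing="ij")
    return np.sum(alpha(*xi) * power) / N ** n

def alpha(x, y):
    """An illustrative smooth periodic weight in n = 2 with alpha(0, 0) = 3."""
    return 2 + np.cos(2 * np.pi * x)

N = 32
const_value = mu_N(alpha, np.ones((N, N)))                   # equals alpha(0, 0) = 3
f = np.random.default_rng(0).standard_normal((N, N))
parseval_value = mu_N(lambda x, y: np.ones_like(x), f)       # equals N^{-2} ||f||^2
```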

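Assume a white noise process (8.5) with $σ = \text{const}$. Then, by (8.6), $E(μ_{N}(α)) = σ^{2} N^{-n} \sum_{k^{*} \in T_{N}^{n}} α(k^{*}/N)$ is $σ^{2}$ times a Riemann sum, so $E(μ_{N}(α)) \to σ^{2} \int α(ξ)\, dξ$ as $N \to \infty$, where the integration is over the continuous torus in $\mathbb{R}^{n}$ with period one.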

The random variables $|\hat{f}(k^{*})|^{2} - σ^{2}$ have zero expectation and covariances given by Proposition 8.1; in particular, their variance is $σ^{4} + O(N^{-n})$, except when $2k^{*} = 0$. Write
$$μ_{N}(α) = σ^{2} \frac{1}{N^{n}} \sum_{k^{*} \in T_{N}^{n}} α(k^{*}/N) + \frac{1}{N^{n}} \sum_{k^{*} \in T_{N}^{n}} α(k^{*}/N) \big(|\hat{f}(k^{*})|^{2} - σ^{2}\big).$$
The first term is a Riemann sum. The second term has zero expectation and variance
$$\frac{σ^{4}}{N^{2n}} \sum_{k^{*} \in T_{N}^{n}} \Big(|α(k^{*}/N)|^{2} + α(k^{*}/N)\, \bar{α}(-k^{*}/N)\Big) + O\Big(\frac{1}{N^{n}}\Big) \leqslant \frac{C}{N^{n}}.$$
The error term comes from the cross terms, which are products of values of α and the term $(μ_{4} - 3σ^{4})/N^{n}$ in (8.7). There are $N^{2n}$ of them, each multiplied by $N^{-2n}$, so their total contribution is $O(N^{-n})$. Therefore, we have proved the following.

Theorem 8.1.
Let $f(k)$ be a white noise process on $T_{N}^{n}$ (depending on N) as in Proposition 8.1, with variance $σ^{2}$ and a finite fourth moment. Then for every Riemann integrable function α on the torus $\mathbb{R}^{n}/\mathbb{Z}^{n}$, we have
$$μ_{N}(α) \longrightarrow σ^{2} \int α(ξ)\, dξ \quad \text{in the mean square sense}, \tag{8.8}$$
where the integral is taken over the torus in $\mathbb{R}^{n}$ with period one.

Therefore, the measure $N^{-n} \sum_{m^{*} \in T_{N}^{n}} |\hat{f}(m^{*})|^{2}\, δ(ξ - m^{*}/N)$ converges weakly to $σ^{2}\, dξ$ in the mean square sense. In particular, if we take α to be the characteristic function of, say, a box U in the unit torus, then the average of the power spectrum over the grid points in U tends to $σ^{2}$ in the mean square sense.
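Theorem 8.1 can be checked by simulation. A minimal NumPy sketch, under our own illustrative assumptions: dimension $n = 1$, Gaussian white noise (so $μ_{4} = 3σ^{4}$), and α the indicator of $[0, a)$ with $0 < a < 1/2$, extended periodically. For this α, the product $α(k^{*}/N) \bar{α}(-k^{*}/N)$ is non-zero only at $k^{*} = 0$, so (8.7) gives exactly $\mathrm{VAR}\, μ_{N}(α) = σ^{4}(\lceil aN \rceil + 1)/N^{2}$, while the Riemann sum $E\, μ_{N}(α) = σ^{2} \lceil aN \rceil / N$ differs from $σ^{2} a$ by less than $σ^{2}/N$. The mean square error should therefore decay like $σ^{4} a / N$. The function names are hypothetical.

```python
import numpy as np

def mu_N_box(f, a):
    """mu_N(alpha) for n = 1 and alpha = indicator of [0, a), extended with period 1."""
    N = f.size
    power = np.abs(np.fft.fft(f, norm="ortho")) ** 2
    return power[np.arange(N) / N < a].sum() / N

def empirical_mse(N, a, sigma=1.0, trials=400, seed=0):
    """Monte Carlo estimate of E|mu_N(alpha) - sigma^2 * a|^2 for Gaussian white noise."""
    rng = np.random.default_rng(seed)
    errors = [mu_N_box(sigma * rng.standard_normal(N), a) - sigma**2 * a
              for _ in range(trials)]
    return float(np.mean(np.square(errors)))

def predicted_mse(N, a, sigma=1.0):
    """Exact MSE for Gaussian noise and 0 < a < 1/2: variance from (8.7) plus squared bias."""
    M = np.ceil(a * N)                     # number of grid points k*/N in [0, a)
    variance = sigma**4 * (M + 1) / N**2   # the extra 1 comes from k* = 0, where k* = -k*
    bias = sigma**2 * (M / N - a)          # Riemann sum minus the integral
    return float(variance + bias**2)

a = 0.25
# For each N: (empirical MSE, predicted MSE); both decrease roughly like a / N
results = {N: (empirical_mse(N, a), predicted_mse(N, a)) for N in (10**2, 10**3, 10**4)}
```

The ratio of the two entries of each pair should stay close to 1, while both decrease by about a factor of 10 each time N is multiplied by 10.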
8.3. More general noise

We assume now that the random variables $f_{h}(k)$ depend on h, have zero mean, and have uniformly bounded fourth moments, but are not necessarily independent or identically distributed. If we assume (4.35), then the power spectrum is given by Theorem 4.2. One special but important case is when the auto-correlation is space independent (stationary, see (8.3)); then β in (4.35) is independent of x, and we have
$$\mathrm{ACor}(f_{h}(k), f_{h}(k + m)) = β(m) \tag{8.9}$$
with some $β(m)$. Then (8.8) takes the form
$$μ_{N}(α) \longrightarrow \int \check{β}(ξ)\, α(ξ)\, dξ \quad \text{in the mean square sense},$$
where $\check{β}$ is as in (4.36). In other words, the limit measure is $\check{β}(ξ)\, dξ$.
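As an illustration of the stationary case, consider a moving-average process, which we choose only as an example: $f(k) = (w(k) + w(k+1))/\sqrt{2}$ with $w$ i.i.d. standard normal and indices taken modulo N (we take $h = 1/N$). Then $β(0) = 1$, $β(\pm 1) = 1/2$, and $β(m) = 0$ otherwise. We assume here that $\check{β}$ is the Fourier series $\check{β}(ξ) = \sum_{m} β(m) e^{-2πi m ξ} = 1 + \cos(2πξ)$; since this β is even, the sign convention in the exponent does not matter, but the normalization in (4.36) may differ. A minimal NumPy sketch with α the indicator of $[0, 1/4)$:

```python
import numpy as np

def ma1_noise(N, rng):
    """Hypothetical stationary, correlated noise: f(k) = (w(k) + w(k+1)) / sqrt(2), indices
    mod N, w i.i.d. N(0, 1). Auto-correlation: beta(0) = 1, beta(+-1) = 1/2, else 0."""
    w = rng.standard_normal(N)
    return (w + np.roll(w, -1)) / np.sqrt(2)      # np.roll(w, -1)[k] = w(k + 1)

def beta_check(xi):
    """Assumed convention: beta_check(xi) = sum_m beta(m) exp(-2 pi i m xi) = 1 + cos(2 pi xi)."""
    return 1 + np.cos(2 * np.pi * xi)

rng = np.random.default_rng(0)
N, a = 10**5, 0.25
f = ma1_noise(N, rng)
lag1 = np.mean(f * np.roll(f, -1))                # empirical beta(1), close to 1/2
power = np.abs(np.fft.fft(f, norm="ortho")) ** 2
xi = np.arange(N) / N
mu = power[xi < a].sum() / N                      # mu_N(alpha), alpha = indicator of [0, a)
riemann = beta_check(xi[xi < a]).sum() / N        # E mu_N(alpha) for this process
limit = a + np.sin(2 * np.pi * a) / (2 * np.pi)   # int_0^a beta_check(xi) d xi
```

Here $\hat{f}(k^{*}) = \hat{w}(k^{*})(1 + e^{2πi k^{*}/N})/\sqrt{2}$, so $E\{|\hat{f}(k^{*})|^{2}\} = \check{β}(k^{*}/N)$ exactly, and `mu` approaches `limit` rather than the flat value $a$.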

8.4. Numerical examples

We first illustrate the temporal behavior of the spectrum. In Fig. 15, we take a random normally distributed vector f with $N = 200$ and variance $σ^{2} = 1$. Its power spectrum is plotted next to it. As we can see, it looks flat on average, with mean value close to one, but the variation is substantial. On the right, we plot the histogram of the (spatial) standard deviation $\mathrm{STD}(|\hat{f}|^{2})$ over $1{,}000$ experiments; it appears to have mean 1. We recall that $\mathrm{STD}(|\hat{f}|^{2})$ is the square root of
$$\mathrm{VAR}(|\hat{f}|^{2}) = \frac{1}{N^{n}} \sum_{k^{*} \in T_{N}^{n}} \big(|\hat{f}(k^{*})|^{2} - σ^{2}\big)^{2}.$$
We have not proved a limit for it, though. That would require estimating the auto-correlation of the summands above, similarly to Proposition 8.1.

Fig. 15.

Left: a random normally distributed vector, $N = 200$, $σ^{2} = 1$. Center: plot of $|\hat{f}|^{2}$ for indices from 0 to 100 ($|\hat{f}|^{2}$ is an even function with period 200). Right: the histogram of $\mathrm{STD}(|\hat{f}|^{2})$ over $100{,}000$ experiments; it appears centered around 1.
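The experiment behind Fig. 15 can be sketched as follows. This is a minimal NumPy sketch, not the code used for the figure; the number of repetitions `n_experiments` is an illustrative choice, and plotting is left out (a histogram of `stds` gives the right panel).

```python
import numpy as np

def spectral_std(f, sigma2=1.0):
    """STD(|f^|^2): the square root of N^{-n} sum_{k*} (|f^(k*)|^2 - sigma^2)^2."""
    power = np.abs(np.fft.fftn(f, norm="ortho")) ** 2
    return float(np.sqrt(np.mean((power - sigma2) ** 2)))

rng = np.random.default_rng(0)
N, n_experiments = 200, 1000          # n_experiments: an illustrative choice
f = rng.standard_normal(N)            # left panel: one experiment with sigma^2 = 1
power = np.abs(np.fft.fft(f, norm="ortho")) ** 2   # center panel: plot power[: N // 2 + 1]
stds = np.array([spectral_std(rng.standard_normal(N)) for _ in range(n_experiments)])
# right panel: a histogram of `stds`, e.g. with matplotlib.pyplot.hist(stds, bins=50)
```

For Gaussian noise, $|\hat{f}(k^{*})|^{2}$ has variance $σ^{4}$ except at $k^{*} = 0$ and $k^{*} = N/2$, so $E\{\mathrm{VAR}(|\hat{f}|^{2})\} = 1 + 2/N$ here, consistent with the histogram being centered near 1.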

Next, we illustrate the spatial (ergodic) behavior of the power spectrum. The averaged power spectrum for a normally distributed vector is shown in Fig. 16. We divide the interval $[0, N / 2]$ into 25 subintervals and average in each one of them. We take $N = 10^{2}, 10^{3}, 10^{4}$ and $N = 10^{5}$ . As we can see, the averaged spectrum gets flatter and flatter. This illustrates Theorem 8.1.

Fig. 16.

Plot of the averaged $| \hat{f} |^{2}$ for $N = 10^{2}, 10^{3}, 10^{4}$ and $N = 10^{5}$ .

If we keep N fixed but average over many experiments, the spectrum gets flatter as well numerically, as (8.6) suggests.
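Both kinds of averaging in Fig. 16 and in the last paragraph can be sketched as follows. This is a minimal NumPy sketch with hypothetical helper names; splitting the frequencies $0, \dots, N/2$ into 25 nearly equal consecutive groups with `np.array_split` is our reading of "25 subintervals". The dictionary `max_dev` records, for each N, the largest deviation of the band averages from the flat level $σ^{2} = 1$.

```python
import numpy as np

def band_averaged_spectrum(f, n_bands=25):
    """Average |f^|^2 over n_bands consecutive, nearly equal groups of frequencies 0, ..., N/2."""
    N = f.size
    power = np.abs(np.fft.fft(f, norm="ortho")) ** 2
    half = power[: N // 2 + 1]                 # |f^|^2 is even, so [0, N/2] suffices
    return np.array([band.mean() for band in np.array_split(half, n_bands)])

def ensemble_averaged_spectrum(N, n_experiments, rng):
    """Average |f^|^2 over independent experiments at fixed N; by (8.6) it tends to sigma^2 = 1."""
    total = np.zeros(N)
    for _ in range(n_experiments):
        total += np.abs(np.fft.fft(rng.standard_normal(N), norm="ortho")) ** 2
    return total / n_experiments

rng = np.random.default_rng(0)
# Spatial (ergodic) averaging: one experiment per N, largest deviation from 1
max_dev = {N: float(np.max(np.abs(band_averaged_spectrum(rng.standard_normal(N)) - 1.0)))
           for N in (10**2, 10**3, 10**4, 10**5)}
# Temporal averaging: N fixed, many experiments
ens = ensemble_averaged_spectrum(64, 2000, rng)
```

Each band contains about $N/50$ frequencies, so the deviations in `max_dev` shrink roughly like $N^{-1/2}$, as Theorem 8.1 predicts.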


Acknowledgements

The authors would like to thank Kiril Datchev for his advice and Magda Peligrad for making us aware of the reference []. P.S. partially supported by the National Science Foundation under grant DMS-1900475. S.T. partially supported by the National Science Foundation under grant DMS-1952966.

References

1. Y.C. De Verdière, Semiclassical analysis and passive imaging, Nonlinearity 22(6) (2009), R45. doi:10.1088/0951-7715/22/6/R01.

2. Y.C. De Verdière, A semi-classical calculus of correlations, Comptes Rendus Geoscience 343(8–9) (2011), 496–501. doi:10.1016/j.crte.2011.03.002.

3. Dimassi and Sjöstrand, Spectral Asymptotics in the Semi-Classical Limit, London Mathematical Society Lecture Note Series, Vol. 268, Cambridge University Press, Cambridge, 1999.

4. C.L. Epstein, Introduction to the Mathematics of Medical Imaging, 2nd edn, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.

5. Fefferman, Ivanov, Lassas and Narayanan, Reconstruction of a Riemannian manifold from noisy intrinsic distances, SIAM J. Math. Data Sci. 2(3) (2020), 770–808. doi:10.1137/19M126829X.

6. Gérard, Asymptotique des pôles de la matrice de scattering pour deux obstacles strictement convexes, Mém. Soc. Math. France (N. S.) 31 (1988), 1–146.

7. Helin, Lassas and Oksanen, Inverse problem for the wave equation with a white noise source, Comm. Math. Phys. 332(3) (2014), 933–953. doi:10.1007/s00220-014-2115-9.

8. Helin, Lassas, Oksanen and Saksala, Correlation based passive imaging with a white noise source, J. Math. Pures Appl. (9) 116 (2018), 132–160. doi:10.1016/j.matpur.2018.05.001.

9. Hörmander, The Analysis of Linear Partial Differential Operators. IV, Vol. 275, Springer-Verlag, Berlin, 1985, Fourier integral operators.

10. T.-C. Hu and R.L. Taylor, On the strong law for arrays and for the bootstrap mean and variance, Internat. J. Math. Math. Sci. 20(2) (1997), 375–382. doi:10.1155/S0161171297000483.

11. Keys, Cubic convolution interpolation for digital image processing, IEEE Transactions on Acoustics, Speech, and Signal Processing 29(6) (1981), 1153–1160. doi:10.1109/TASSP.1981.1163711.

12. Martinez, An Introduction to Semiclassical and Microlocal Analysis, Universitext, Springer-Verlag, New York, 2002.

13. Meijering and Unser, A note on cubic convolution interpolation, IEEE Transactions on Image Processing 12(4) (2003), 477–479. doi:10.1109/TIP.2003.811493.

14. Natterer, The Mathematics of Computerized Tomography, B. G. Teubner, Stuttgart, 1986.

15. Papoulis and S.U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th edn, Tata McGraw-Hill Education, 2002.

16. Reed and Simon, Methods of Modern Mathematical Physics. III, Academic Press [Harcourt Brace Jovanovich, Publishers], New York–London, 1979, Scattering theory.

17. Stefanov, Semiclassical sampling and discretization of certain linear inverse problems, SIAM J. Math. Anal. 52(6) (2020), 5554–5597. doi:10.1137/19M123868X.

18. Van der Schaaf and van Hateren, Modelling the power spectra of natural images: Statistics and information, Vision Research 36(17) (1996), 2759–2770. doi:10.1016/0042-6989(96)00002-8.

19. Zworski, Semiclassical Analysis, Graduate Studies in Mathematics, Vol. 138, American Mathematical Society, Providence, RI, 2012.