Locally Learning Biomedical Data Using Diffusion Frames

Abstract

Diffusion geometry techniques are useful to classify patterns and visualize high-dimensional datasets. Building upon ideas from diffusion geometry, we outline our mathematical foundations for learning a function on high-dimension biomedical data in a local fashion from training data. Our approach is based on a localized summation kernel, and we verify the computational performance by means of exact approximation rates. After these theoretical results, we apply our scheme to learn early disease stages in standard and new biomedical datasets.

1. Introduction

As personalized medicine expands, increasingly detailed biomedical data must be integrated to better understand normal function and evolution of multifactorial chronic disease in clinical trials and individual treatment decisions. The complexity of molecular, cellular, and tissue interactions, however, is a fundamental barrier to extracting the complicated relationships that underlie human physiology.

A recent idea, originating in computational harmonic analysis, is to let the data speak for itself. In this approach, one deals typically with high-dimensional, unstructured data. In theoretical analysis, one assumes that the data represents a sample from some unknown low-dimensional manifold embedded in a high-dimensional ambient Euclidean space. The objective is then to understand the geometry of this manifold. Thus, statistical techniques have been devised to estimate the dimension of this manifold (Costa and Hero, 2004). A simulation of Brownian motion is expected to reveal the relative neighborhoods of different data points, as well as provide local coordinate systems for the manifold (Jones et al., 2010; Lafon, 2004). See the special issue (Chui and Donoho, 2006) for an introduction to these ideas. Related analysis of graphs is discussed in Pesenson and Pesenson (2010).

In many practical applications, one needs to go beyond an understanding of the manifold and answer queries based on the data. These queries can be modeled mathematically as functions in the (unknown) manifold. This function may be known to us on few training points, and we aim to accurately predict the value of the function of items that are not yet observed. Models for this have been developed as eigenmaps/diffusion maps (Coifman et al., 2005a), multiscale approaches (Coifman et al., 2005b; Gavish et al., 2010; Nadler and Galun, 2007), and nonlinear dimension reduction (Belkin and Niyogi, 2003; Chui and Wang, 2010; Roweis and Saul, 2000; Saito, 2008; Tenenbaum et al., 2000; Wu et al., 2010).

Necessarily, any such model cannot be expected to yield perfect reproduction of the actual target function. The subject of approximation theory deals with the intrinsic errors inherent in constructing different kinds of models for the target function. In traditional scenarios, the accuracy of approximation is closely related to the smoothness of the function. Because of this history, many experts in approximation theory nowadays consider the accuracy of approximation itself to be a measurement of smoothness. This viewpoint is particularly useful in our setting, where the manifold is unknown and, therefore, it is impossible to define the smoothness in a classical manner.

The last named author and his collaborators have developed approximation theory tools applicable in the current context in a series of papers (Filbir and Mhaskar, 2010; Filbir and Mhaskar, 2011; Maggioni and Mhaskar, 2008; Mhaskar, 2010; Mhaskar, 2011). A particularly interesting aspect of this theory is a definition of pointwise smoothness of the target function. The research has also enabled us to devise specific algorithms, extending the theory developed for the understanding of geometry, with the property that the rate of convergence of these algorithms in neighborhoods of different points completely characterize local smoothness properties of the target function at those points.

Such local smoothness ideas are particularly useful in biomedicine, as disease progression underlies natural variations, medication leads to abrupt changes in disease progression, and environmental factors vary quickly, so that the query function might not be globally smooth. While late disease stages underlie large variations, the transition from healthy to early pathology can be smooth, leading to query functions that are locally smooth within such early disease transitions.

In this article, we shall review and further develop the salient features of diffusion geometry and approximation theory needed to “learn locally” from the acquired data. In contrast to common clustering methods used in biomedicine, we explicitly use that the clusters represent disease stages, i.e., are ordered quantitatively in a progressing fashion. Thus, some unspecific ordering is a-priori fed into the clustering process and is specified in the final result. Moreover, instead of interpolating on the training data, which usually leads to instabilities with large training sets, we allow our algorithm to correct misclassified training points.

The outline is as follows. In Section 2, we briefly discuss two approaches to reconstructing the query function from training data. In Section 3, we present our local learning approach. The numerical implementation is discussed in Section 4. In Section 5, we outline our scheme for the special case in which the manifold is the sphere. We apply our methods to analyze several biomedical datasets in Section 6.

2. Approximation on Unknown Manifolds

Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} be the underlying but unknown compact manifold, endowed with a probability measure μ. The training data \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal C} = \{y_i \}_{i = 1}^M \subset {\mathbb X}$$ \end{document} yield pairs of the form \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ (y_j, z_j) \}_{j = 1}^M$$ \end{document} , with z_j ≈ f (y_j) for the as yet unknown function f. This defines f only on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal C}$$ \end{document} . The objective is to extend this function to the entire manifold \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} , including the data already observed and the data not yet available. In deterministic analysis, we do not deal with the noise explicitly, although some probabilistic estimates can be given in addition to deterministic guarantees that account for the noise (Gia and Mhaskar, 2008a).

There are two common approaches to solving this problem. In the first approach, one finds the extension \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f_{\cal C}$$ \end{document} as a solution of some minimization/regularization problem, for example, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} f_{\cal C} = \arg \ \min_{g \in {\cal F}} \{ \mid \mid {\bf g} - {\bf z} \mid \mid + \delta \mid \mid g \mid \mid_{\cal F} \} , \tag{1} \end{align*} \end{document}

where δ is a balancing parameter, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal F}$$ \end{document} is a suitable class of functions to choose the modeling function g from, g denotes the vector \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$(g (y_1), \ldots, g (y_M)),$$ \end{document} z denotes the vector \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$(z_1, \ldots, z_M), \mid \mid \cdot \mid \mid$$ \end{document} is some norm on, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb R}^M$$ \end{document} , and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\mid \mid \cdot \mid \mid_{\cal F}$$ \end{document} is a penalty functional, commonly the norm on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal F}$$ \end{document} . We will call Equation (1) the optimization approach.

The other approach is to imagine a priori that there is an underlying unknown function f on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} , so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f (y_i) = z_i, i = 1, \ldots, M$$ \end{document} . We then seek a model \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P \in {\cal F}$$ \end{document} so that for some performance guarantee \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\epsilon_{\cal C}$$ \end{document} , \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \mid \mid\, f - P \mid \mid_{\infty} \leq \epsilon_{\cal C} \mid \mid\, f \mid \mid_{\cal F}, \tag{2} \end{align*} \end{document}

The optimization approach has the advantage that the penalty functional may be chosen to reflect some domain-specific knowledge about the target function. Also, even if one does not expect any function underlying the phenomenon, one gets some smooth model to work with. On the other hand, when we do not know with absolute certainty any physical model that underlies the data, and are seeking a function on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} as a model anyway, then it is more natural to assume that there is an underlying function from which the data is sampled, even though the function itself is yet unknown, so that the approximation approach seems more natural.

We noted some comparisons between the two approaches. First, we the optimization approach does not necessarily imply any performance guarantees of Equation (2). Moreover, the value of the regularization functional depends upon the data. There are no bounds to how large this value might get as more and more data are introduced. Finally, there are common computational issues such as local minima, convergence of the algorithms, and convergence of the minimizers \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f_{\cal C}$$ \end{document} as the data becomes dense on the manifold. All of these issues are completely avoided in the approximation approach.

We will demonstrate below that in the approximation approach, we can construct a linear operator with mathematical performance guarantees of Equation (2). We do not need to solve any minimization problem, so that all the computational issues mentioned above are avoided altogether. Moreover, we can design this operator in a manner that its performance guarantees are automatically better on regions of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} where the target function is “smoother.” This does not involve a careful detection of edges and partitioning of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} .

3. Local Approximation of the Query Function

3.1. Local smoothness classes

Before we can define local smoothness of a function \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f:{\mathbb X} \rightarrow {\rm R}$$ \end{document} , we must specify a few more objects. Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} be the eigenfunctions of the Laplace-Beltrami operator Δ on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} , and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \ell^2_k \}_{k = 0}^\infty$$ \end{document} the associated eigenvalues, ordered in a nondecreasing way with ℓ₀ = 0 and ℓ_k → ∞ as k → ∞ . The space of diffusion polynomials up to degree N is Π_N := span{φ_k : ℓ_k < N}. Further technical details are discussed in Appendix A. Note that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} was replaced by a more general orthonormal basis for L₂( \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} ,μ) in Filbir and Mhaskar (2010, 2011).

The object of interest in approximation theory is the degree of approximation of the target function: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} E_{N} (f) = \min_{P \in \Pi_N} \ \parallel\, f - P \ \parallel_\infty. \tag{3} \end{align*} \end{document}

Equation (3) measures the best error achievable if one wishes to use Π_N as the model for f, and wishes the error to be small at each point of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} . It turns out that the rate at which the quantity E_N(f) decreases to 0 as N → ∞ is closely related to the smoothness of f. Thus, if f is smooth enough so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta^r f \in {\mathscr C} ({\mathbb X})$$ \end{document} for some integer r ≥ 1, then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} E_{N} (f) \le \frac {c} {N^{2r}} \ \parallel \Delta^r f \ \parallel_\infty, \end{align*} \end{document}

where c > 0 is a constant. Here, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathscr C} ({\mathbb X})$$ \end{document} denotes the continuous functions on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} endowed with the supremum norm. In practice, it is quite common to find an approximation of f by ad hoc means. This gives rise to the question that if one finds E_N(f) ↓ 0 at a certain rate, is it because f inherits a certain smoothness? Thus, the correct notion of smoothness required to answer this question is defined in terms of a regularization functional (known in some branches of analysis as the K-functional). If r ≥ 1 is an integer, this functional is defined by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} K_{r} (f, \delta) = \inf \{ \ \parallel\, f - g \ \parallel_\infty + \delta^{2r} \ \parallel \Delta^r g \ \parallel_\infty \} , \qquad \delta > 0, \end{align*} \end{document}

where the infimum is taken over all g for which \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\ \parallel \Delta^r g \ \parallel_\infty < \infty$$ \end{document} . In the terminology introduced before, this measures the approximation of f by smooth functions g while controlling the growth of Δ^rg with the help of the regularization parameter δ. It is known (Maggioni and Mhaskar, 2008) that if s > 0 and r > s/2 is any integer, then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} E_N (f) = {\cal O} (N^{- s}) \ {\rm as} \ N \rightarrow \infty \ {\rm if \ and \ only \ if} \ K_r (f, \delta) = {\cal O} \ (\delta^s) \ {\rm as} \ \delta \rightarrow 0. \end{align*} \end{document}

Standard approximation theory arguments show that if γ < s/2 is an integer, and β = s − 2γ, then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} E_{N} (f) = {\cal O} (N^{- s}) \ {\rm if \ and \ only \ if} \ \Delta^\gamma f \in {\mathscr C} ({\mathbb X}) \ {\rm and} \ K_r (\Delta^\gamma f, \delta) = {\cal O} (\delta^{\beta}) . \end{align*} \end{document}

In particular, the choice of r > s/2 is not critical except that the constants involved in the \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal O}$$ \end{document} relations will depend upon the choice of r. This leads us to define the (global) smoothness class W^s directly in terms of the quantities E_N(f) as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} W^{s} ({\mathbb X}) : = \{f \in {\mathscr C} ({\mathbb X}) : E_{N} (f) = {\cal O} (N^{- s}) \} , \end{align*} \end{document}

endowed with the norm \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\ \parallel \cdot \parallel_{W^s}: = \ \parallel \cdot \parallel_\infty + \ \parallel (N^sE_{N} (f))_N \parallel_{\infty}$$ \end{document} . We will think of s as the parameter measuring the smoothness of the function f.

In the context of data-defined manifolds, we do not explicitly know any formulas for the local coordinate charts. Therefore, it is not easy to define the notion of derivatives. Nevertheless, we can redefine the notion of an infinitely differentiable function on our unknown manifold as membership in every smoothness class W^s \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$({\mathbb X})$$ \end{document} . Thus, the set of all infinitely differentiable functions is denoted by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathscr C}^\infty ({\mathbb X}) = \bigcap\nolimits_{s > 0} W^{s} ({\mathbb X})$$ \end{document} . If \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$x_0 \in {\mathbb X}$$ \end{document} , we will define the local smoothness of f at x₀ by the natural windowing construction. Thus, we say that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f \in W^{s}_{x_0} ({\mathbb X})$$ \end{document} if there exists a neighborhood U of x₀ such that for every \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi \in {\mathscr C}^\infty ({\mathbb X})$$ \end{document} , supported on U, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi f \in W^{s} ({\mathbb X})$$ \end{document} .

3.2. Localized summation kernels

We can clearly expand any \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f \in L_2 ({\mathbb X}, \mu)$$ \end{document} in the orthonormal basis \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f = \sum\nolimits_{k = 0}^\infty \langle f, \varphi_k \rangle \varphi_k.$$ \end{document} To motivate our scheme presented later, we manipulate this expression without caring about convergence and interchanging limits, so that we derive \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} f (x) = \sum_{k = 0}^\infty \int_{{\mathbb X}} f (y) \varphi_k (y) d \mu (y) \varphi_k (x) = \int_{{\mathbb X}} f (y) \Phi (x, y) d \mu (y), \tag{4} \end{align*} \end{document}

where we formally use \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Phi (x, y) = \sum\nolimits_{k = 0}^\infty \varphi_k (y) \varphi_k (x)$$ \end{document} . This representation requires knowledge of f on the entire manifold \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} . To reconstruct f from the data only, we must replace the integral with a finite sum over data points and localize the kernel Φ so that f (x) is determined by its values in a neighborhood around x. In this section, we shall make these ideas mathematically precise and construct a linear operator based on the data to derive \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P \in \Pi_N$$ \end{document} that essentially minimizes Equation (3).

First, we define \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \Phi_N (x, y) : = \sum_{k = 0}^\infty h \bigg (\frac {\ell_k} {N} \bigg) \varphi_k (x) \varphi_k (y), \quad {\rm for \ all} \ x, y \in {\mathbb X}, \tag {5} \end{align*} \end{document}

where h : \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb R}$$ \end{document} → [0, 1] satisfies h(t) = 1 if |t| ≤ 1/2 and h(t) = 0 if |t| ≥ 1, see Figure 1 for a typical function h and Appendix A for a few more technical conditions. Due to the “cut-off” function h, Equation (5) is a finite sum, and Φ_N, as a function of only one variable x or y, is contained in Π_N. Under the technical assumptions collected in Appendix A, the estimate \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} |\Phi_N (x, y)| {\lesssim} \frac {N^{\alpha}} {\max (1, (N \rho (x, y))^S}, \tag {6} \end{align*} \end{document}

FIG. 1.

Filter function: A typical filter function h that can be used in Equation (5).

holds, where α is the dimension of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} , ρ the metric on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} , and S any integer (Filbir and Mhaskar, 2010; Maggioni and Mhaskar, 2008; Mhaskar, 2011). Here means that there is an absolute positive constant on the right-hand side such that the inequality holds. This localization property (6), very much like multi-scale approaches with wavelets, enables local analysis of functions on the manifold.

If f ≈ P, then the we also have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f \Phi_N (x, \cdot) \approx P \Phi_N (x, \cdot)$$ \end{document} . Naturally, we would expect that the product of two polynomials is again a polynomial of a larger degree, so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P \Phi_N (x, \cdot) \in \Pi_{aN}$$ \end{document} , for some fixed a > 0. Here, we only need the weaker product assumptions. (Appendix A), implying that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P \Phi_N (x, \cdot)$$ \end{document} can be sufficiently well approximated with polynomials in Π_aN. To replace the integral in Equation (4) with a finite sum, we need a quadrature formula that is exact for polynomials up to degree aN: If our data \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal C} = \{y_i \}_{i = 1}^M$$ \end{document} are sufficiently dense in \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} (Appendix B), then there are quadrature weights \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \omega_i \}_{i = 1}^M$$ \end{document} such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \int_{{\mathbb X}} P (x) d \mu (x) = \sum_{j = 1}^M \omega_j P (x_j), \quad {\rm for \ all} \ P \in \Pi_{aN}. \end{align*} \end{document}

Since φ₀ ≡ 1, these weights can be computed by solving the linear system of equations \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \sum_{j = 1}^M \omega_j \varphi_k (x_j) = \delta_{k, 0}, \quad{\rm for \ all} \ k = 0, 1, 2, \ldots, aN. \end{align*} \end{document}

We refer to Appendix B and Filbir and Mhaskar (2010) for details on the existence of such quadrature weights with respect to the training data.

We now replace the right-hand side of Equation (4) with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \sigma_N (f, x) : = \sum_{j = 1}^M \omega_jf (y_j) \Phi_N (x, y_j) . \tag{7} \end{align*} \end{document}

To study the asymptotics for N → ∞ , we call for a sequence of training sets \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal C}_N$$ \end{document} that induce quadrature formulas of strength aN. We shall verify in Appendix C that, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f \in W^s_{y_0} ({\mathbb X})$$ \end{document} , there is δ > 0, such that, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \sup_{x \in {B}_\delta (y_0)} \mid\, f (x) - \sigma_{N} (f, x) \mid \lesssim N^{- s}, \tag{8} \end{align*} \end{document}

4. Specifying φ_K Numerically

To design a numerically feasible algorithm, we still need to compute a suitable orthonormal basis \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$L_2 ({\mathbb X}, \mu)$$ \end{document} , while the manifold \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} is not explicitly known to us. In a typical situation, we may have little training data and a much larger but finite collection \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M$$ \end{document} of data lying on the manifold. These data shall be used to approximate an orthonormal basis \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} .

The eigenfunctions of the Laplace-Beltrami operator \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{\mathbb X} = {\rm div} \nabla$$ \end{document} on the manifold form an orthonormal basis that satisfies the technical assumptions needed in Appendix A. In order to compute them numerically, we build the graph Laplacian from a finite set of points on the manifold as follows: By using data points \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M \subset{\mathbb X}$$ \end{document} , the standard heat kernel \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$k_\varepsilon (x, y) = e^{- \ \parallel x - y \ \parallel^2 / 2 \varepsilon}, \varepsilon > 0$$ \end{document} , induces the weight matrix \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal W}^{\varepsilon}_M$$ \end{document} defined by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal W}^{\varepsilon}_{M;i, j} = k_\varepsilon (y_i, y_j)$$ \end{document} . We build the diagonal matrix \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$D^{\varepsilon}_{M;i, i} = \sum\nolimits_{j = 1}^M {\cal W}^{\varepsilon}_{M;i, j}$$ \end{document} and define the (unnormalized) graph Laplacian as \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$L^{\varepsilon}_M = {\cal W}^{\varepsilon}_M - D^{\varepsilon}_M$$ \end{document} . For M → ∞ and ɛ → 0, the eigenvalues and interpolations of the eigenvectors of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$L_M^\epsilon$$ \end{document} converge toward the “interesting” eigenvalues and eigenfunctions of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{\mathbb X}$$ \end{document} when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M$$ \end{document} are uniformly distributed on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} . See Appendix D for the details and technical assumptions. If \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M$$ \end{document} are distributed according to the density p, then the graph Laplacian approximates the elliptic Schrödinger-type operator \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta + \frac {\Delta p} {p} $$ \end{document} (Coifman et al., 2005b; Nadler et al., 2006), whose eigenfunctions also form an orthonormal basis for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$L_2 ({\mathbb X}, \mu)$$ \end{document} .

5. Localized Kernels on the Sphere

The data of one of our applications lie on the sphere S^d⁻¹, so that we specify our approach for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X} = S^{d - 1}$$ \end{document} (Gia and Mhaskar, 2006; Gia and Mhaskar, 2008b). For \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f:S^{d - 1} \rightarrow {\rm R}, \ {\rm let} \ \check{f} (x) : = f (x / \ \parallel x \ \parallel)$$ \end{document} be its extension to \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb R}^d \backslash \{0 \} $$ \end{document} . The Laplace-Beltrami operator \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{S^{d - 1}} \ {\rm is} \ \Delta_{S^{d - 1}} f = (\Delta_{\rm R^d} \check{f})_{\mid S^{d - 1}}$$ \end{document} , and the set of spherical harmonics \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document} is formed by the homogeneous harmonic polynomials p on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb R}^d$$ \end{document} of degree k restricted onto the sphere S^d⁻¹. In other words, p is a polynomial whose monomials have all the same total degree k and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{{\mathbb R}^d}p = 0.$$ \end{document} The eigenspaces of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{S^{d - 1}}$$ \end{document} are \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document} with eigenvalues k(k + d − 2), respectively, and the dimension of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document} is m_k := \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${k + d - 1 \choose k} - {k + d - 3 \choose k - 2}$$ \end{document} . If we choose an orthonormal basis \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_{k, j} \}_{j = 1}^{m_k}$$ \end{document} for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document} , then \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sum\nolimits_{j = 1}^{m_k} \varphi_{k, j} (x) \varphi_{k, j} (y)$$ \end{document} does not depend on the choice of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_{k, j} \}_{j = 1}^{m_k}$$ \end{document} and coincides with \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P_k (\langle x, y \rangle)$$ \end{document} , where P_k is the Gegenbauer polynomial of degree k and parameter d/2 − 1, sec. Appendix E.1 and Stein and Weiss (1971). Therefore, we can explicitly compute \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \Phi_N (x, y) = \sum_{k = 0}^N h \bigg (\frac {\sqrt {k (k + d - 2)}} {N} \bigg) P_k (\langle x, y \rangle) . \end{align*} \end{document}

To derive σ_N(f,x) in Equation (7), we still need to determine the quadrature weights \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \omega_j \}_{j = 1}^n$$ \end{document} , which we shall properly describe in Appendix E.2. Here, we only present a heuristic approach. Since \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{P_k (\langle x, \cdot \rangle) \}_{x \in S^{d - 1}}$$ \end{document} generates \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document} , we aim to solve, for R as large as possible, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \sum_{j = 1}^n \omega_j P_k (\langle x, x_j \rangle) = \delta_{k, 0}, \qquad {\rm for \ all} \ k = 0, \ldots, R, \end{align*} \end{document}

and m_k = dim \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$({\cal H}^d_k)$$ \end{document} random choices of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$x \in S^{d - 1}$$ \end{document} . This leads to a linear system of equations whose solution is reasonably close to exact weights in practice, but only for small parameters d, n, and R. If any of these parameters is not small, then the problem becomes numerically unstable, and we need to follow the approach presented in Appendix E.

If the manifold is known, as in the case of the sphere, many types of basis functions can be constructed explicitly. For instance, spherical wavelets were considered in Antoine and Vandergheynst (1995), Dahlke et al. (1995, 2004), and Starck et al. (2006), which can capture multi-scale structure. However, replacing the eigenfunctions \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} of the Laplace-Beltrami operator would require checking in each case if all the conditions in Appendix A are satisfied. Therefore, we shall not follow this path and, here, exclusively use the eigenfunctions of the Laplace-Beltrami operator.

6. Applications

We use the developed approximation scheme to cluster biomedical data into disease stages \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{C_l \}_{l = 0}^L$$ \end{document} . Therefore, the classes (disease stages) have a natural ordering, and we assign a meaningful number c_l to cluster C_l such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$0 \leq c_0 < c_1 < \ldots < c_L$$ \end{document} . Hence, |c_i −c_j| represents the distance between C_i and C_j. The query function f is defined on the training data by f (x_i) = c_l if and only if \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$x_i \in C_l$$ \end{document} . The approximand σ_N(f,x) induces a nearest neighbor classification on the entire dataset by the proximity of σ_N(f,x) to any of the numbers \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$c_0, \ldots, c_L$$ \end{document} .

We first compare our proposed approach to widely used classification methods on two standard biomedical datasets. After this verification, we use our scheme to analyze multispectral retinal images of age-related macular degeneration (AMD) patients. All eye-related data were collected by our collaborators at the National Eye Clinic at the National Institutes of Health (Bethesda, NIH; Maryland).

6.1. Standard biomedical datasets

6.1.1. Cleveland Heart Disease Database: learning disease stages

The Cleveland Heart Disease Database (CHDD) (Detrano et al., 1989) contains 297 patterns with 13 attributes and is grouped into five progressive heart disease stages (values 0,1,2,3,4), where 0 corresponds to normal heart conditions. We removed six patterns due to missing values. Due to homeostasis and its failure in progressed disease, we expect the query function f to be smoother on normal heart conditions and early disease stages, while later stages may form a more heterogeneous group. We compare our method to support vector machines (SVM), in which clusters are derived through sequential binary clustering. Clusters are evaluated by means of binary false-positive or false-negatives for each disease stage. Indeed, our proposed method recovers f consistently better than SVM for the values 0, 1, and 2 when dealing with few training points (Table 1). As expected, our kernel performs poorly on the stages 3 and 4, implying the lack of smoothness of f within these progressed stages. The transition from 0 to 2 seems to be steered by a smoother process, which is reflected by a smoother query function yielding better results than SVM methods.

Table 1.

Sensitivity Analysis for Cleveland Heart Disease Database

Stage 0	SVM linear	SVM Gaussian	Local kernel
CHDD, 40	73%	71%	79%
CHDD, 100	78%	80%	83%
CHDD, 200	93%	95%	92%
Stage 1
CHDD, 40	69%	68%	73%
CHDD, 100	73%	75%	80%
CHDD, 200	88%	92%	85%
Stage 2
CHDD, 40	64%	62%	71%
CHDD, 100	68%	71%	75%
CHDD, 200	86%	85%	81%
Stage 3
CHDD, 40	61%	60%	54%
CHDD, 100	65%	66%	59%
CHDD, 200	78%	79%	69%
Stage 4
CHDD, 40	57%	55%	52%
CHDD, 100	63%	59%	54%
CHDD, 200	72%	69%	63%

Sensitivity (one minus false negative rate) for disease stages 0 to 4 using the dataset CHDD in Section 6.1.1. with 40, 100, and 200 training points, averaged over 50 instances for each method. Our local kernel method performs better for few training data than SVM on disease stages 0, 1, and 2, in which we expect the query function to be relatively smooth. CHDD, Cleveland Heart Disease Database.

6.1.2. Wisconsin Breast Cancer Database: data completion

After removing missing values, the Wisconsin Breast Cancer Database (WBCD; original) (Wolberg and Mangasarian, 1990) contains 683 patterns with 9 attributes. We aim to predict quantitative attributes. In fact, we randomly select 200, 300, and 400 training points to learn the attribute “clump thickness” (ranging from 1 to 10) and aim to predict its values on the remaining data. We call it a hit when the prediction is within a radius of 1, i.e., if the measured size was 3, the predictions in the interval [2, 4] are counted as a hit. The excellent performance of our proposed method by means of sensitivity (number of hits divided by the cluster size) for few training data is shown in Table 2.

Table 2.

Sensitivity Analysis for the Wisconsin Breast Cancer Database

	SVM linear	SVM Gaussian	Local kernel
WBCD, 100	74%	72%	80%
WBCD, 200	76%	79%	84%
WBCD, 300	92%	93%	90%

Sensitivity analysis for the data set WBCD in Section 6.1.2, averaged over clump thickness and 50 instances for each method. Our local kernel method yields high sensitivity compared to other methods when there are only few training data. WBCD, Wisconsin Breast Cancer Database.

6.2. Age-related macular degeneration

Age-related macular degeneration is the most common cause of blindness among the elderly population in the western world (Chew et al., 2009; Coleman et al., 2008; Krishnadev et al., 2010). Aging of the human retina is universally associated with microscopic changes within the retinal pigment epithelium (RPE), including increased number and volume of fluorescent lipofuscin granules (Meyers et al., 2004). In a majority of Americans over the age of 60, the earliest clinical signs of RPE dysfunction are observed in color fundus photographs as drusen—bright highly reflective extracellular deposits between the RPE and its basement membrane. Macular drusen increase in number and size with advancing age in epidemiological studies and larger, irregular-shaped, perifoveal drusen (“soft”) are considered to confer the greatest risk for progression to advanced AMD. Through many years of large-scale studies of the natural history of AMD and controlled prevention trials, clinical observations of fundus photographs suggest that people with soft (larger than 150 microns and irregularly shaped) drusen are at high risk for progressing to advanced AMD. Currently, pathologists in reading centers classify drusen based on size and shape (Bird et al., 1995) in reflection color fundus images. There is demand for automated analysis tools that allow for quantitative prediction of disease progression.

6.2.1. Retinal multi-spectral imaging

The retina is a multilayer neural tissue, uniquely suited for noninvasive optical imaging with high resolution. The first author, together with his collaborators at NIH, developed noninvasive multispectral fluorescence imaging of the human retina by adding selected interference filter sets to standard fundus cameras, allowing the monitoring of early changes within the RPE via the fluorescent lipofuscin granules (Dobrosotskaya et al., 2010, 2011; Ehler et al., 2010, 2011a, 2011b; Kainerstorfer et al., 2010a, 2010b, 2011). If F(Λ,λ) denotes the measured autofluorescence, where Λ is the excitation and λ the emission wavelength, then the Beer-Lambert law for the double-path yields \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} F (\Lambda, \lambda) = I (\Lambda) \Phi (\Lambda, \lambda) e^{- (D (\Lambda) + D (\lambda))}, \end{align*} \end{document}

where D is the integrated absorbance (optical density) of the tissue the light travels through, Φ(Λ,λ) is the fluorescence efficiency of lipofuscin, and I(Λ) the radiant power of the excitation light. For each of the six patients, four excitation filters with two emission filters and trifold imaging lead to 24 images (400 × 400 pixels) that are aligned by applying the commercial software i2k Align. We de-noise in z-direction by principal component analysis, keeping five eigenvectors capturing more than 98% of the dataset's variance. Spatial noise is reduced by averaging each pixel with its eight neighbors. Since the pathological changes are related to fractional changes of the autofluorescence, we can normalize the 160,000 vectors to lie on the sphere S⁴.

The principal component analysis and normalization also helps to compare pixels across patients. The drusen classification scheme presented in van Leeuwen et al. (2003; see also Bird et al., 1995) leads to 8 progressing stages. We partition them into four classes. An expert grader labeled few spatial regions, which led to 2,000 training pixels in three patients that were each labeled with one of the four partitions. Thus, we have 6,000 training vectors total, and pixels in a single patient can have distinct labels. We shall learn locally the pixel classification into drusen classes. Note that our method relies on the multispectral components of drusen in autofluorescence images rather than the spatial shape that is seen in reflection color fundus images.

After the learning step, we mark a region of interest (ROI) in six patients (including the three patients used for learning). To classify pixels from the ROI into the four partitions, we merge the ROI pixels of the patient under consideration with the ROI pixels of the three learned patients to form the data on the manifold. The labeled regions are the training data, and since the pixel vectors lie on the sphere, we can follow the approach in Section 5 to derive σ_N(f, x) (Fig. 2). The majority of the pixels within the ROI would define the drusen class of the patient. It should be mentioned that the size of the ROI influences the classification scheme, and we obtained good results with the center in the fovea and extending to 5 degrees, consistent with common image analysis in ophthalmology.

FIG. 2.

Drusen classification. The foveal class outnumbers the other classes in (d) and determines the patient's class. (a, left) Drusen in color fundus image; (a, right) Drusen in cross-section of a monkey retina. (b) Two spectral images of pre-advanced AMD patient. (c) Two spectral images of advanced AMD patient. (d, left) Squares show class centers; (d, right) four classes, where dark blue is not assigned to any class.

Next, we explore the influence of the size of the training data. We only use a fraction of the pixels that were originally labeled by our grader. The individual pixels are selected by a random sampling. Figure 3 shows that we require a critical number of training pixels and from there on, we obtain stable classification results.

FIG. 3.

The increments of the training pixels are in steps of 200. The y-axis counts the correct classifications. Note that we simply removed random pixels from the entire set of 3 × 2000 training pixels, so that each smaller training set is contained in the larger one. We repeated this process over 20 runs to avoid random anomalies, which led to the black curve. The most typical curve in a single run is depicted in red. For few training data, we only classify one single patient correctly. Increasing the size of the training pixels enables consistently correct classifications of all but one patient (consistently the same one). It seems that the critical number of training pixels is between 600 and 800 pixels.

7. Conclusions

To facilitate the analysis of complex biomedical data, we developed the mathematical foundations for a numerical algorithm that enables global and local data analysis integratively. After we validated our approach on two standard datasets, we aimed to classify and predict disease progression in AMD patients. Drusen were classified in multispectral retinal image sets that enabled quantitative measurements of advanced pathological changes. Clearly, our new mathematical approach to identifying new components controlling AMD progression is preliminary and needs further validation to claim its usefulness in general terms. We anticipate improvements through synergies of multiple analysis schemes with multi modal data so that our proposed analysis could be part of an iterative process.

8. Appendix

Appendix A. Technical assumptions

For the sake of readability, we list simple conditions used in Section 3 that can be further weakened (Filbir and Mhaskar, 2010; Maggioni and Mhaskar, 2008). The manifold \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} is supposed to be a smooth, compact, and connected Riemannian manifold without boundary. We point out that the following technical conditions are satisfied when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_k \}_{k = 0}^\infty$$ \end{document} are the eigenfunctions of the Laplace-Beltrami operator on \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}, \{ \ell^2_k \}_{k = 0}^\infty$$ \end{document} its eigenvalues, and μ the standard Riemannian probability measure (see Filbir and Mhaskar, 2010, 2011; Maggioni and Mhaskar, 2008; Sikora, 2004 for details).

A.1. Assumption on the volume of balls

If α > 0 is the dimension of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} , then we assume that μ(B_r(x))⪅r^α, for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$x \in {\mathbb X}$$ \end{document} and r > 0.

A.2. Assumption on products of polynomials

There is a ≥ 2 such that, for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f, g \in \Pi_N$$ \end{document} , their product fg is contained in Π_aN. In fact, we only need the weaker condition saying that, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \epsilon_N: = \sup_{\ell_j, \ell_k \leq N} {\rm dist} (\varphi_j \varphi_k, \Pi_{aN})_\infty, \end{align*} \end{document}

we have \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$N^c \epsilon_N \rightarrow 0$$ \end{document} as N → ∞ , for every c > 0.

A.3. Assumption on the growth of the heat kernel

let |∂^γK_t(x, y)|⪅ t^−α/^2−γ exp(−cρ(x, y)²/t),for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$t \in (0, 1 ], x, y \in {\mathbb X}$$ \end{document} , where c is a constant, γ = 0, 1, and ∂ is a differential operator of first order.

A.4. Assumption on the filter

Let: \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb R}_{\ge 0} \rightarrow {\mathbb R}$$ \end{document} be infinitely often differentiable and a nonincreasing function such that h(t) = 1 if t ≤ 1/2 and h(t) = 0 if t ≥ 1. For instance, we can choose \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} h (x) = \begin{cases} 1, \qquad \qquad \qquad \qquad \qquad x \leq 1 / 2, \\ \exp (\frac {(x - \frac {1} {2})^2 (2x^2 - 2x - 1)} {x^2 (x - 1)^2}), \quad 1 / 2 \leq x \leq 1, \\ 0, \qquad \qquad \qquad \qquad \qquad 1 \leq x, \end{cases} \end{align*} \end{document}

see Figure 1.

Appendix B. Existence of quadrature weights

For training data \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal C} = \{x_j \}_{j = 1}^n \subset {\mathbb X}$$ \end{document} , let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\delta_{{\cal C}} : = \sup\nolimits_{x \in {\mathbb X}} {\rm dist} \ \rho (x, {\cal C})$$ \end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$q_{{\cal C}}: = \min\nolimits_{i \neq j} \rho (x_i, x_j)$$ \end{document} . If \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} is well covered by \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal C}$$ \end{document} , i.e., \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\delta_{{\cal C}} \leq 2 q_{{\cal C}} \leq 2 \delta_{{\cal C}}$$ \end{document} , then there exists a constant c with the following property: for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$N \leq c / \delta_{{\cal C}}$$ \end{document} , there are positive numbers \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \omega_j \}_{j = 1}^n$$ \end{document} , such that the cubature formula \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\int_{\mathbb X} P (x) d \mu (x) = \sum\nolimits_{j = 1}^n \omega_j P (x_j)$$ \end{document} holds, for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$P \in \Pi_N$$ \end{document} (Filbir and Mhaskar, 2010). Note that the weights satisfy \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sum\nolimits_{j = 1}^n \omega_j = 1$$ \end{document} and hence are bounded since φ₀ ≡ 1.

Appendix C. Proof of local approximation

Given \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$f \in W^s_{y_0} ({\mathbb X})$$ \end{document} , pick δ > 0 such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi f \in W^s ({\mathbb X})$$ \end{document} , for any \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi \in {\cal C}^\infty ({\mathbb X})$$ \end{document} , supported on B_3δ(y₀) and equals one on B_3δ/2(y₀). We can choose ϕ such that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\phi ({\mathbb X}) \subset [ 0, 1 ]$$ \end{document} . To verify Equation (8), we assume \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$N \geq \frac {2} {\delta} $$ \end{document} and estimate, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \sup_{x \in B_\delta (y_0)} \mid\, f (x) - \sigma_N (f, x) \mid & \leq \sup_{x \in B_\delta (y_0)} \mid\, f (x) - \sigma_N (\phi f, x) \mid + \sup_{x \in B_\delta (y_0)} \mid \sigma_N (\phi f - f) \mid \\ & \leq \sup_{x \in B_\delta (y_0)} \mid \phi (x) f (x) - \sigma_N (\phi f, x) \mid + \sup_{x \in B_\delta (y_0)} \mid \sigma_N (\phi f - f, x) \mid \\ & \lesssim N^{- s} + \sup_{x \in B_\delta (y_0)} \mid \sigma_N (\phi f - f, x) \mid, \end{align*} \end{document}

where we have used global approximation results from Mhaskar (2010), applied to the globally smooth function ϕf. In the remainder of this section, we shall estimate \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sup\nolimits_{x \in B_\delta (y_0)} \mid \sigma_N (\phi f - f, x) \mid \lesssim N^{- s}$$ \end{document} .

To do so, we use the localization property of Φ_N, i.e., for S > max(1, α + s), \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \mid \Phi_N (x, y) \mid \frac {N^\alpha} {\max (1, (N \rho (x, y))^S} \end{align*} \end{document}

(Filbir and Mhaskar, 2010; Maggioni and Mhaskar, 2008; Mhaskar; 2011). This kernel localization yields, for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$x \in B_\delta (y_0)$$ \end{document} , \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \mid \sigma_N (\phi f - f, x) \mid \lesssim \sum_{j = 0}^n \omega_j (1 - \phi (x_j)) f (x_j) \frac {N^\alpha} {\max (1, (N \rho (x, x_j))^S} . \end{align*} \end{document}

We split the sum into two parts, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal I}_1 = \{j:x_j \not \in B_{3 \delta / 2} (y_0) \} $$ \end{document} and the remaining indices, on which the summands vanish. Thus, we obtain \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \mid \sigma_N (\phi f - f, x) \mid \lesssim \sum_{j \in {\cal I}_1} \omega_j (1 - \phi (x_j)) f (x_j) \frac {N^\alpha} {\max (1, (N \rho (x, x_j))^S} . \end{align*} \end{document}

Since \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$N \geq \frac {2} {\delta} $$ \end{document} and ρ(x, x_j) ≥ δ/2 on the range that we consider, we obtain \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \mid \sigma_N (\phi f - f, x) \mid \lesssim \sum_{j \in {\cal I}_1} \omega_j (1 - \phi (x_j)) f (x_j) \frac {N^{\alpha - S}} {\rho (x, x_j)^S} . \end{align*} \end{document}

The ball B_δ/₂(x) is contained in B_3δ/2(y₀) so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal I}_2 = \{j: x_j \not \in B_{\delta / 2} (x) \} $$ \end{document} yields \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \mid \sigma_N (\phi f - f, x) \mid & \lesssim \sum_{j \in {\cal I}_2} \omega_j (1 - \phi (x_j)) f (x_j) \frac {N^{\alpha - S}} {\rho (x, x_j)^S} \\ & \lesssim \sup_{j \in J_2} (f (x_j)) N^{\alpha - S} \sum_{j \in {\cal I}_2} \frac {\omega_j} {\rho (x, x_j)^S} \\ & \lesssim N^{\alpha - S} \sum_{j \in {\cal I}_2} \frac {\omega_j} {\rho (x, x_j)^S}, \end{align*} \end{document}

where we have used that f is bounded. To finalize the proof, we can apply the crude estimate \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sum\nolimits_{j \in {\cal I}_2} \frac {\omega_j} {\rho (x, x_j)^S} \leq (\frac {2} {\delta})^S$$ \end{document} , which follows from \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sum\nolimits_{j = 1}^n \omega_j = 1$$ \end{document} . The relation α − S ≤ − s implies the desired inequality \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sup\nolimits_{x \in B_\delta (y_0)} \mid \sigma_N (\phi f - f, x) \mid \lesssim N^{- s}$$ \end{document} , where the constant may depend on f, y₀, δ, α, and s.

Appendix D. Computing eigenfunctions

The graph Laplacian, as an operator on smooth functions, is \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} ({\cal L}^\varepsilon_M f) (x) = \frac {1} {M} \bigg (\sum_{j = 1}^M k_\varepsilon (x, y_j) f (y_j) - f (x) \sum_{j = 1}^M k_\varepsilon (x, y_j) \bigg) \end{align*} \end{document}

and, when \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M$$ \end{document} is uniformly distributed, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} converges almost surely toward \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{{\mathbb X}}$$ \end{document} as M tends to infinity and ɛ to zero (Lafon, 2004; Nadler et al., 2006; Singer, 2006). Let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\lambda_{M, i}^\varepsilon$$ \end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$g_{M, i}^\varepsilon$$ \end{document} be the ith eigenvalue and eigenfunction of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$(2 \pi \varepsilon)^{1 - \alpha / 2}{\cal L}^{\varepsilon}_M$$ \end{document} , respectively, where α is again the dimension of the underlying manifold \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\mathbb X}$$ \end{document} . If λ_i and φ_i are the ith eigenvalue and eigenfunction of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{{\mathbb X}}$$ \end{document} , then according to Belkin and Niyogi (2006), there exists a sequence ɛ_M → 0 such that, in probability, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} \lim_{M \rightarrow \infty} \mid \lambda_{M, i}^{\varepsilon_M} - \lambda_i \mid = 0, \qquad \lim_{M \rightarrow \infty} \ \parallel g_{M, i}^{\varepsilon_M} - \varphi_i \ \parallel_{L_2} = 0. \end{align*} \end{document}

In order to compute \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$g_{M, i}^\varepsilon$$ \end{document} , we need to relate the spectrum of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} with the spectrum of the matrix \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} introduced in Section 4. It was shown in von Luxburg et al. (2008) that, for fixed ɛ and M, the “interesting” eigenvalues and eigenvectors of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} are in a one-to-one relationship with the “interesting” eigenvalues and eigenfunctions of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} , respectively. In fact, if g is any eigenfunction of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} with eigenvalue λ, then the sampling \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$v = (g (y_1), \ldots, g_(y_M))^\top$$ \end{document} is an eigenvector of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} with eigenvalue Mλ. If λ is in the discrete spectrum of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} , then g is of the form \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} g (x) = \frac {\sum_{i = 1}^M k_\varepsilon (x, y_i) v_i} {\sum_{j = 1}^M k_\varepsilon (x, y_j) - M \lambda} . \tag {9} \end{align*} \end{document}

Conversely, if v is an eigenvector of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} with eigenvalue Mλ such that λ is not in the essential spectrum of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} , then g defined by Equation (9) is an eigenfunction of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} with eigenvalue λ. Note that then g interpolates v, i.e., \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$(g (y_1), \ldots, g_(y_M))^\top = v$$ \end{document} . Thus, we diagonalize \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} for large M and small ɛ, and interpolate via Equation (9), which yields an approximation of the eigenfunctions of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta_{{\mathbb X}}$$ \end{document} .

Note that the relation between eigenvectors of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M$$ \end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M f$$ \end{document} described above are independent on the distribution of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M$$ \end{document} . This is useful if \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{y_i \}_{i = 1}^M$$ \end{document} are not uniformly distributed but distributed according to the density p. In this case, \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal L}^{\varepsilon}_M f$$ \end{document} approximates the Fokker-Planck operator \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\Delta + \frac {\Delta p} {p} $$ \end{document} (Coifman et al., 2005b; Nadler et al., 2006).

Appendix E. Polynomials needed for the example on the sphere

E.1. Gegenbauer polynomials

The Gegenbauer polynomials \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} P^{(s)}_k (x) = \sum_{\ell = 0}^{\lfloor k / 2 \rfloor} (- 1)^\ell \frac {\Gamma (k - \ell + s)} {\Gamma (s) \ell! (k - 2 \ell) !} (2x)^{k - 2 \ell}, \end{align*} \end{document}

are orthogonal polynomials on the interval [−1, 1] with respect to the weight function (1 − x²)^s−1/2, where Γ is the usual Gamma function.

E.2. Orthonormal basis for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document}

In the following, we shall present an explicit basis \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \varphi_{k, i} \}_{i = 1}^{m_k}$$ \end{document} for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_k$$ \end{document} such that we can solve \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\sum\nolimits_{j = 1}^n \omega_j \varphi_{k, i} (x_j) = \delta_{k, 0}$$ \end{document} , for all \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$i = 1, \ldots, m_k$$ \end{document} and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$k = 0, \ldots, R$$ \end{document} , so that \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$N = \sum\nolimits_{k = 0}^R m_k$$ \end{document} .

For \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$m, s \in {\mathbb N}$$ \end{document} , and d > 1, let \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} G^s_m (x_{(d)}) : = \sum_{i = 0}^{\lfloor m / 2 \rfloor} (- 1)^i \frac {\mid x_{(d - 1)} \mid^{2i} x_d^{m - 2i} / (m - 2i) !} {(2, 2)_i (d - 1 + 2s, 2)_i}, \end{align*} \end{document}

where \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$(a, b)_i = a (a + b) \cdots (a + (i - 1) b)$$ \end{document} with the convention (a, b)₀ = 1 and \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$x_{(d)} = (x_1, \ldots, x_d)$$ \end{document} . The collection of polynomials \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} \begin{align*} G_\nu (x_{(d)}) : = G^{\nu_2}_{\nu_1 - \nu_2} (x_{(d)}) \cdots G^{\nu_d}_{\nu_{d - 1} - \nu_d} (x_{(d - 2)}), \end{align*} \end{document}

for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\nu \in {\rm N}^d, \nu_1 \geq \nu_2 \geq \cdots \geq \nu_d = 0, 1$$ \end{document} , forms an orthonormal basis for \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_{\nu_1}$$ \end{document} (Karachik, 1998). This basis in hand, we can find the weights \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $$\{ \omega_j \}_{j = 1}^n$$ \end{document} that are needed to define σ_N(f, x). For quadratures on S², see Filbir and Themistoclakis (2008), and for other bases of \documentclass{aastex}\usepackage{amsbsy}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{bm}\usepackage{mathrsfs}\usepackage{pifont}\usepackage{stmaryrd}\usepackage{textcomp}\usepackage{portland, xspace}\usepackage{amsmath, amsxtra}\pagestyle{empty}\DeclareMathSizes{10}{9}{7}{6}\begin{document} $${\cal H}^d_{k}$$ \end{document} , see, for instance, Dunkl and Xu (2001). Note that the polynomial expression of G_ν(x_(d)) can be computed using computer algebra software. Evaluating these polynomials at our data points lead to round-off errors that can be controlled applying standard three-term relations of orthogonal polynomials.

Footnotes

Acknowledgments

M.E. was supported by the NIH/DFG Research Career Transition Awards Program (EH 405/1-1/575910). The research of F.F. was partially funded by Deutsche Forschungsgemeinschaft grant FI 883/3-1. H.N.M. was supported, in part, by grant DMS-0908037 from the National Science Foundation and grant W911NF-09-1-0465 from the U.S. Army Research Office.

Author Disclosure Statement

No competing financial interests exist.

References

Antoine

J.P.

, Vandergheynst

1995. Wavelets on the n-sphere and other manifolds. J. Math. Phys., 7:1013–1104.

Belkin

, Niyogi

2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural. Comput., 15:1373–1396.

Belkin

, Niyogi

2006. Convergence of laplacian eigenmaps, 129–136. Schölkopf

, Platt

J.C.

, Hoffman

NIPS, MIT Press: Cambridge, MA.

Bird

A.C.

, Bressler

N.M.

, Bressler

S.B.

et al. 1995. An international classification and grading system for age-related maculopathy and age-related macular degeneration. The international ARM epidemiological study group. Surv. Ophthalmol., 39:367–374.

Chew

E.Y.

, Lindblad

A.S.

, Clemons

et al. 2009. Summary results and recommendations from the age-related eye disease study. Arch. Ophthalmol., 127:1678–1679.

Chui

C.K.

, Donoho

D.L.

2006. Special issue: Diffusion maps and wavelets. Appl. Comput. Harmon. Anal., 21.

Chui

C.K.

, Wang

J.Z.

2010. Randomized anisotropic transform for nonlinear dimensionality reduction. International J. on Geomathematic., 1:25–50.

Coifman

R.R.

, Lafon

, Lee

A.B.

et al. 2005b. Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part I: Diffusion maps. Proc. Nat. Acad. Sci., 102:7426–7431.

Coifman

R.R.

, Lafon

, Lee

A.B.

et al. 2005a. Geometric diffusions as a tool for harmonic analysis and structure definition of data. Part II: Multiscale methods. Proc. Nat. Acad. Sci., 102:7432–7438.

10.

Coleman

H.R.

, Chan

, Ferris

F.L.

et al. 2008. Age-related macular degeneration. Lancet., 372:1835–1845.

11.

Costa

J.A.

, Hero

A.O.

2004. Geodesic entropic graphs for dimension and entropy estimation in manifold learning. IEEE Transactions on Signal Processing, 52:2210–2221.

12.

Dahlke

, Dahmen

, Weinreich

et al. 1995. Multiresolution analysis and wavelets on S² and S³. Numer. Funct. Anal. Optim, 16:19–41.

13.

Dahlke

, Steidl

, Teschke

2004. Coorbit spaces and banach frames on homogeneous spaces with applications to the sphere. Adv. Comput. Math., 21:147–180.

14.

Detrano

, Janosi

, Steinbrunn

et al. 1989. International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology., 64:304–310.

15.

Dobrosotskaya

, Ehler

, King

E.J.

et al. 2010. Sparse representation and variational methods in retinal image processing. Intern. Fed. for Medical & Biological Engineering, Springer Proceedings Series. 26th Southern Biomedical Engineering Conference, 361–364.

16.

Dobrosotskaya

, Ehler

, King

E.J.

et al. 2011. Modeling of the rhodopsin bleaching with variational analysis of retinal images. Proceedings of SPIE, 7962:79624N.

17.

Dunkl

C.F.

, Xu

2001. Orthogonal Polynomials of Several Variables, Number 81. Encyclopedia of Mathematics and its Applications. Cambridge Univ. Press: New York.

18.

Ehler

, Dobrosotskaya

, King

E.J.

et al. 2011a. Modeling photo-bleaching kinetics to map local variations in rod rhodopsin density. SPIE Medical Imaging, Computer-Aided Diagnosis, 79633R.

19.

Ehler

, Kainerstorfer

, Cunningham

et al. 2011b. Extended correction model for optical imaging. IEEE International Conference on Computational Advances in Bio and Medical Sciences, 93–98.

20.

Ehler

, Majumdar

, King

E.J.

et al. 2010. High-resolution autofluorescence imaging for mapping molecular processes within the human retina. Intern. Fed. for Medical & Biological Engineering, Springer Proceedings Series. 26th Southern Biomedical Engineering Conference, 344–347.

21.

Filbir

, Mhaskar

H.N.

2010. A quadrature formula for diffusion polynomials corresponding to a generalized heat kernel. J. Fourier Anal. Appl., 16:629–657.

22.

Filbir

, Mhaskar

H.N.

2011. Marcinkiewicz–Zygmund measures on manifolds. Journal of Complexity, 27:568–596.

23.

Filbir

, Themistoclakis

2008. Polynomial approximation on the sphere using scattered data. Math. Nachr., 281:650–668.

24.

Gavish

, Nadler

, Coifman

R.R.

2010. Multiscale wavelets on trees, graphs and high dimensional data: Theory and applications to semi supervised learning. ICML, 367–374.

25.

Gia

Q.T.L.

, Mhaskar

H.N.

2006. Polynomial operators and local approximation of solutions of eudo-differential operators on the sphere. Numerische Mathematik., 103:299–322.

26.

Gia

Q.T.L.

, Mhaskar

H.N.

2008a. Localized linear polynomial operators and quadrature formulas on the sphere. SIAM J. Numer. Anal., 47,1:440–466.

27.

Gia

Q.T.L.

, Mhaskar

H.N.

2008b. Quadrature formulas and localized linear polynomial operators on the sphere. SIAM J. Numer. Anal., 47:440–466.

28.

Jones

P.W.

, Maggioni

, Schul

2010. Universal local parametrizations via heat kernels and eigenfunctions of the laplacian. Ann. Acad. Sci. Fenn. Math., 35:131–174.

29.

Kainerstorfer

, Amyot

, Ehler

et al. 2010a. Direct curvature correction for non-contact imaging modalities – applied to multi-spectral imaging. J. Biomed. Opt., 15:046013.

30.

Kainerstorfer

, Ehler

, Amyot

et al. 2010b. Principal component model of multi spectral data for near real-time skin chromophore mapping. J. Biomed. Opt., 15:046007.

31.

Kainerstorfer

, Riley

J.D.

, Ehler

et al. 2011. Quantitative principal component model for skin chromophore mapping using multi spectral images and spatial priors. Biomedical Optics Express., 2:1040–1058.

32.

Karachik

V.V.

1998. On one set of orthogonal harmonic polynomials. Proc. Amer. Math. Soc., 126:3513–3519.

33.

Krishnadev

, Meleth

A.D.

, Chew

E.Y.

2010. Nutritional supplements for age-related macular degeneration. Curr. Opin. Ophthalmol., 21:184–9.

34.

Lafon

2004. Diffusion maps and geometric harmonics [Ph.D. thesis] Yale University: New Haven, CT.

35.

Maggioni

, Mhaskar

H.N.

2008. Diffusion polynomial frames on metric measure spaces. Appl. Comput. Harmon. Anal., 24:329–353.

36.

Meyers

S.M.

, Ostrovsky

M.A.

, Bonner

R.F.

2004. A model of spectral filtering to reduce photochemical damage in age-related macular degeneration. Trans. Am. Ophthalmol. Soc., 102:83–93discussion 93–5.

37.

Mhaskar

H.N.

2010. Eignets for function approximation. Appl. Comput. Harmon. Anal., 29:63–87.

38.

Mhaskar

H.N.

2011. A generalized diffusion frame for parsimonious representation of functions on data defined manifolds. Neural Networks., 24:345–359.

39.

Nadler

, Galun

2007. Fundamental limitations of spectral clustering. Neural Information Processing systems., 19:1017–1024.

40.

Nadler

, Lafon

, Coifman

R.R.

et al. 2006. Diffusion maps, spectral clustering and eigenfunctions of Fokker–Planck operators. Weiss

, Schölkopf

, Platt

Adv. Neural Inform. Process. Syst., 18. MIT Press: Cambridge, MA.

41.

Pesenson

I.Z.

, Pesenson

M.Z.

2010. Sampling, filtering and sparse approximations on combinatorial graphs. J. Fourier Anal. Appl., 16:921–942.

42.

Roweis

S.T.

, Saul

L.K.

2000. Nonlinear dimensionality reduction by locally linear embedding. Science., 290:2323–2326.

43.

Saito

2008. Data analysis and representation on a general domain using eigenfunctions of laplacian. Appl. Comput. Harmon. Anal., 25:68–97.

44.

Sikora

2004. Riesz transform, gaussian bounds and the method of wave equation. Math. Z., 247:643–662.

45.

Singer

2006. From graph to manifold Laplacian: The convergence rate. Appl. Comput. Harmon. Anal., 21:128–134.

46.

Starck

J.L.

, Moudden

, Abrial

et al. 2006. Wavelets, ridgelets and curvelets on the sphere. Astronomy & Astrophysics., 446:1191–U62.

47.

Stein

, Weiss

1971. Introduction to Fourier Analysis on Euclidean Spaces Princeton University Press. Princeton, N.J.

48.

Tenenbaum

J.B.

, de Silva

, Langford

J.C.

2000. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319–23.

49.

van Leeuwen

, Klaver

C.C.

, Vingerling

J.R.

et al. 2003. The risk and natural course of age-related maculopathy. Arch. Ophthalmol., 121:519–526.

50.

von Luxburg

, Belkin

, Bousquet

2008. Consistency of spectral clustering. Ann. Stat., 36:555–586.

51.

Wolberg

W.H.

, Mangasarian

O.L.

1990. Multisurface method of pattern separation applied to breast cytology diagnosis. Proceedings of the National Academy of Sciences, 87:9193–9196.

52.

, Guinney

, Maggioni

et al. 2010. Learning gradients: predictive models that infer geometry and dependence. Journal of Machine Learning Research, 11:2175–2198.

Locally Learning Biomedical Data Using Diffusion Frames

Abstract

Abstract

1. Introduction

2. Approximation on Unknown Manifolds

3. Local Approximation of the Query Function

3.1. Local smoothness classes

3.2. Localized summation kernels

4. Specifying φK Numerically

5. Localized Kernels on the Sphere

6. Applications

6.1. Standard biomedical datasets

6.1.1. Cleveland Heart Disease Database: learning disease stages

6.1.2. Wisconsin Breast Cancer Database: data completion

6.2. Age-related macular degeneration

6.2.1. Retinal multi-spectral imaging

7. Conclusions

8. Appendix

Appendix A. Technical assumptions

A.1. Assumption on the volume of balls

A.2. Assumption on products of polynomials

A.3. Assumption on the growth of the heat kernel

A.4. Assumption on the filter

Appendix B. Existence of quadrature weights

Appendix C. Proof of local approximation

Appendix D. Computing eigenfunctions

Appendix E. Polynomials needed for the example on the sphere

E.1. Gegenbauer polynomials

Footnotes

Acknowledgments

Author Disclosure Statement

References

4. Specifying φ_K Numerically