Abstract
We congratulate the authors on their excellent work, which provides a clear overview of the large and now mature field of regression models for functional data. Here we complement their discussion by indicating some directions for further research that we deem particularly important.
Introduction
Functional data analysis is nowadays one of the most active areas of research in statistics, and regression models are of course fundamental analytic tools for any statistician. This explains the strong interest in the development of regression models for functional data; the past few years have seen an impressive number of contributions in this field by several well-known authors. Greven and Scheipl give an overview of this vast and mature field, presenting the various models proposed in the literature under a unifying framework, describing various estimation techniques and discussing critical issues, as well as highlighting aspects that still need further investigation. Their work is very instructive and timely, and we warmly congratulate the authors on it.
We here would like to contribute to the presentation of Greven and Scheipl by complementing it on some crucial points. Some of these points, such as registration of the functional data (Section 2) and inference within the regression model (Section 3), have already been touched upon by the authors, but we would like to stress them further. Other aspects are instead additional to their discussion; in particular, we will consider the possible dependence among the functional data (Section 4), the use of depth measures within the functional regression model (Section 5), the case where L2 is not the appropriate functional space (Section 6), the case where the functional data are surfaces or spatial fields possibly observed over complex domains (Section 7), and finally the inclusion of prior knowledge via the regularizing terms (Section 8). Our contribution is driven by specific interests of our research group in applied statistics at MOX, Department of Mathematics, Politecnico di Milano.
Misalignment of functional data
As the authors correctly point out in their work, one critical aspect is the possible misalignment of the functional data. In fact, if not appropriately taken into account, the misalignment may act as a confounding factor in the data analysis. The literature on registration of functional data has grown significantly over recent years; see, for instance, the review by Marron et al. (2015) and the Special Issue of the Electronic Journal of Statistics on ‘Statistics of Time Warpings and Phase Variations’ (see Marron et al. (2014) and subsequent papers), which collects more than 30 papers on this topic, centred around the analysis of four benchmark datasets.
Registration of the functional data is most of the time carried out as a preprocessing step, and subsequent analyses are performed without further consideration of the registration of the data, often discarding the phase variation, although this might contain important information for the problem under analysis. We agree with the authors that, whenever possible, the registration of the data should be integrated with the data analysis to avoid losing any information. Some examples in this respect are offered by Kneip and Ramsay (2008), who combine registration and principal components analysis; Sangalli et al. (2010, 2014), who jointly perform registration and clustering of functional data; and Hadjipantelis et al. (2014, 2015), who integrate registration within a regression analysis. Much still remains to be developed on this topic.
Well-posedness of the regression model and inference
A key point to be aware of when fitting functional regression models concerns the inferential properties of the parameter estimators. In particular, parameter estimation involves solving a possibly ill-posed problem. A standard approach to estimation and inference on regression parameters is explicitly based on functional principal components analysis (FPCA) and, consequently, on a spectral decomposition in terms of eigenvalues and eigenfunctions. Although FPCA and analogous projection methods are often effective and straightforward to apply to functional data, the choice of the number of basis components remains somewhat subjective and is not always properly justified. Moreover, dimension reduction methods per se do not ensure proper estimation of the regression parameters. In fact, the classical procedures do not automatically ensure unbiasedness of the estimators, either of the true functional coefficient or of its projection on the corresponding subspace identified by the chosen basis; see, for example, Ghiglietti et al. (2015). We believe these aspects should be further investigated.
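The sensitivity to the truncation level can be illustrated with a minimal sketch of scalar-on-function regression via truncated FPCA. This is only an illustration under simplifying assumptions (curves observed without noise on a common equispaced grid, integrals approximated by Riemann sums); the function name `fpc_regression` and its arguments are hypothetical and do not correspond to any of the cited methods.

```python
import numpy as np

def fpc_regression(X, y, n_components, dt):
    """Scalar-on-function regression via truncated FPCA (illustrative sketch).

    X: (n, p) array of curves sampled on a common equispaced grid with step dt.
    y: (n,) array of scalar responses.
    Returns the estimated intercept and the coefficient function on the grid.
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    # Discretized covariance operator; its eigenvectors approximate eigenfunctions.
    cov = (Xc.T @ Xc) / X.shape[0] * dt
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    phi = eigvecs[:, order] / np.sqrt(dt)  # eigenfunctions with unit L2 norm
    scores = Xc @ phi * dt                 # principal component scores
    # Ordinary least squares of the centred response on the scores.
    coef, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
    beta = phi @ coef
    intercept = y.mean() - np.sum(x_mean * beta) * dt
    return intercept, beta
```

Calling the function with different values of `n_components` on the same data generally returns different coefficient functions: whatever part of the true coefficient lies outside the span of the retained eigenfunctions is simply not estimated.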
Another important aspect is hypothesis testing on functional regressors. In this respect, besides the more classical global approach to null hypothesis testing (see, for example, Horváth and Kokoszka (2012)), an interesting novel line of investigation concerns local tests that aim at detecting the specific portions of the functional domain where the effect of each regressor is possibly significant, while controlling the family-wise error rate on the entire functional domain. A recent example is given in Abramowicz et al. (2014), who employ the interval testing procedure proposed in Pini and Vantini (2016) within a functional ANOVA model.
Dependent functional data
Functional data may be dependent on one another, either temporally or spatially. If present, the dependence should be appropriately modelled in the regression setting.
In the context of spatially dependent functional data with covariates, approaches based on universal kriging (Caballero et al. (2013)) or kriging with external drift (Ignaccolo et al. (2014)) may be useful, possibly with the estimation of the drift via generalized least squares (Menafoglio et al. (2013); Menafoglio et al. (2014)). Another attempt to analyze spatially dependent functional data with scalar (space–time varying) covariates is explored in Bernardi et al. (2016), who propose a regression model with differential regularizations.
In the context of time-dependent functional data, autoregressive models are appropriate; see, for example, Horváth and Kokoszka (2012) and references therein, as well as the multivariate functional autoregressive model with scalar covariates in Canale and Vantini (2016).
Use of depth measures within functional regression models
When dealing with functional or high-dimensional data, parametric assumptions on the stochastic processes that generate the data are in general very restrictive and difficult to check. Many non-parametric tools have therefore been proposed recently to visualize, describe and analyze functional data, and these can be profitably included within regression models. In particular, the notion of depth measure for functional data was introduced in Lopez-Pintado and Romo (2007) and Lopez-Pintado and Romo (2009), and generalized to multivariate functional data in Ieva and Paganoni (2013), Claeskens et al. (2014) and Lopez-Pintado et al. (2014). Depth measures consider the functional datum as a whole, without projecting it onto a finite dimensional space; they provide a useful ranking of the sampled curves and allow non-parametric rank tests to compare the distributions of different samples. As shown, for instance, in Tarabelloni et al. (2015), these measures can be usefully employed to define a suitable scalar covariate to be used in (generalized) functional regression models.
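As an illustration of how a depth-based ranking can be computed, the following minimal sketch evaluates a modified band depth with bands formed by pairs of curves, in the spirit of Lopez-Pintado and Romo (2009). It assumes univariate curves sampled on a common equispaced grid, so that the proportion of grid points approximates the proportion of the domain; the function name is hypothetical, and the brute-force loop over pairs is chosen for transparency rather than efficiency. It does not reproduce the specific covariate constructed in the works cited above.

```python
import numpy as np
from itertools import combinations

def modified_band_depth(X):
    """Modified band depth (bands from pairs of curves) of each row of X.

    X: (n, p) array of n curves sampled on a common grid of p points.
    Returns an (n,) array; larger values indicate more central curves.
    """
    n, _ = X.shape
    depth = np.zeros(n)
    for i, j in combinations(range(n), 2):
        lower = np.minimum(X[i], X[j])
        upper = np.maximum(X[i], X[j])
        # Proportion of grid points at which each curve lies inside the band.
        inside = (X >= lower) & (X <= upper)
        depth += inside.mean(axis=1)
    return depth / (n * (n - 1) / 2)
```

The resulting vector of depths is a scalar summary per curve, which could then enter a regression model as a covariate, for instance to flag atypical curves.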
When L2 is not the appropriate functional space
Almost all functional data analyses assume that the data, or rather their functional estimates, belong to an L2 space. In some cases, however, the functional data are constrained: for instance, they may be monotone functions, positive functions or density functions, and hence live in suitable subspaces of L2 or in different functional spaces. In these cases, assuming that the functional estimates of the data live in L2 is inappropriate and may lead to meaningless results. Some promising first works in this respect are offered by Chen and Müller (2012) and Hron et al. (2016), who discuss innovative FPCA techniques. Menafoglio et al. (2014) propose a universal kriging approach for spatially dependent density functions. The development of regression models for functional data living in functional spaces other than L2 is certainly a very interesting line of investigation.
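A small numerical sketch may help to see why treating densities as ordinary elements of L2 can give meaningless results. It uses a centred log-ratio type transformation, one device found in the literature on density-valued data, purely as an illustration; we make no claim that it reproduces the methods of the works cited above. The function names are hypothetical, and the densities are assumed strictly positive on a common equispaced grid.

```python
import numpy as np

def clr(density):
    """Centred log-ratio transform of a strictly positive density on a grid."""
    log_d = np.log(density)
    return log_d - log_d.mean()

def clr_inverse(g, dt):
    """Map a function on the grid back to a density integrating to one."""
    d = np.exp(g)
    return d / (np.sum(d) * dt)
```

For two densities `d1` and `d2`, a linear extrapolation such as `2 * d1 - d2` computed pointwise can take negative values and is then no longer a density, whereas `clr_inverse(2 * clr(d1) - clr(d2), dt)` is positive and integrates to one by construction.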
When the functional data are surfaces or spatial fields, possibly observed over complex domains
As also mentioned by the authors, a standard approach to handling functional data that are surfaces or spatial fields, such as images, is to represent them on a basis obtained as a tensor product of univariate bases, or by bidimensional kernels or wavelets. These techniques are naturally defined on tensorized domains and do not deal efficiently with data defined on more complex domains, when the shape of the domain is important for the phenomenon under study. Think, for instance, of neuroimaging data; to handle these data appropriately, it would certainly be desirable to comply with the formidably complex shape of the brain, instead of using techniques that inevitably smooth across its border. The same applies to most medical imaging data and, more generally, to many applications in the life sciences, as well as in physics, the geosciences and engineering.
Sangalli et al. (2013) propose regression models with differential regularization that make use of expansions in finite element bases. These bases make it possible to describe and efficiently handle domains with very complicated geometries, such as strong concavities and internal holes, as well as curved domains. Building on these regression models, Lila et al. (2016) propose an FPCA technique for signals observed over bidimensional manifolds and apply it to the study of neuroimaging data measuring neural activity in the cerebral cortex, the highly folded thin sheet of neuronal tissue that constitutes the outermost part of the brain and where most neural activity is focused. The use of finite elements and other advanced bases will play a crucial role in the analysis of high-dimensional functional data, when tensor product bases and other standard multidimensional bases are inappropriate.
Including prior knowledge via the regularizing terms
In the literature concerning functional regression models, simple differential regularizations are commonly used. However, when prior knowledge about the problem under study is available and can be formalized via a differential model governing the behaviour of the phenomenon, this knowledge can be profitably included in the regularizing terms. Ramsay and Silverman (2005, Chapter 21) and Azzimonti et al. (2015) explore this idea, which can be advantageously used in functional regression models.
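A toy one-dimensional sketch of this idea follows. It contrasts the usual penalty on the second derivative with a penalty on a differential operator encoding prior knowledge, here the assumption (ours, for illustration only) that the phenomenon is approximately harmonic with known frequency `omega`, so that f'' + omega^2 f should be close to zero. The function name, the finite-difference discretization and the choice of operator are hypothetical and are not taken from the cited works.

```python
import numpy as np

def penalized_fit(y, dt, lam, omega=0.0):
    """Discrete penalized least squares with penalty on L f = f'' + omega^2 f.

    Minimizes ||y - f||^2 + lam * dt * ||L f||^2 over f on an equispaced grid.
    omega = 0 gives the usual second-derivative (roughness) penalty.
    """
    p = y.size
    # Second-difference matrix approximating f'' at the interior grid points.
    D2 = (np.eye(p - 2, p) - 2 * np.eye(p - 2, p, k=1)
          + np.eye(p - 2, p, k=2)) / dt**2
    # Discretized differential operator, evaluated at the interior points.
    L = D2 + omega**2 * np.eye(p - 2, p, k=1)
    # Normal equations of the penalized criterion.
    return np.linalg.solve(np.eye(p) + lam * dt * (L.T @ L), y)
```

With data that follow the assumed dynamics, the informed penalty leaves the signal essentially untouched even for fairly large `lam`, while the plain roughness penalty with the same `lam` shrinks it towards a straight line.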
