Abstract
This discussion provides our reaction to the article by Greven and Scheipl. It contains an overview of their article and a description of the many areas of research that remain open and could benefit from further methodological and computational development.
The contribution
We would like to congratulate the authors on a timely and well-written article that covers an important topic. The authors provide an excellent overview of the existing literature and introduce the concepts necessary to understand, differentiate and apply the various methods proposed in functional regression. Given that functional data analysis (FDA) has been around for more than 50 years, one naturally wonders why a new article on this topic is necessary, what new concepts it introduces and how it could impact the practice of Statistics. We are left with a complex and subtle problem: describing what the article achieves and what it does not.
Against our skeptical instincts, we first describe the novelty and impact of the article. Most importantly, the article provides a complete, credible, state-of-the-art methodology for functional semi-parametric regression using well-tested software. Indeed, after reading this article, an informed reader could easily implement or use the tools described and improve or compare them to other methods. We find that extremely refreshing in a scientific environment where impact should be measured by how available and used the methods are and not by inconsequential, repetitive, and overly complex mathematical theorems. The approach the authors describe is one of the most impactful in practice because they focused on a particular combination of Statistical methodologies that work well together and have been thoroughly tested. This has not been achieved by chance, but by a clever combination of mature, refined and generalizable regression methods.
To be specific, there are three main ideas that support the practical machinery described in the article. The first idea is to use the revolutionary non-parametric regression methods introduced in the landmark articles of O'Sullivan (1986) and Eilers and Marx (1996) to smooth model coefficients using penalized splines. The practical advantage of penalized splines is due to their flexibility, their ease of use and the development of automatic estimation of tuning parameters. For these reasons, penalized splines are very popular and well tested in scientific applications. With few exceptions (Eilers and Marx (2002); Eilers and Marx (2003); Reiss and Ogden (2007)), non-parametric regression and FDA have developed in parallel, which has limited the percolation of new ideas from non-parametric smoothing into the FDA literature. We outline this link in the simplest case of functional regression when a subject-specific functional predictor X_i(t) is embedded in the linear predictor of an outcome as:
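In its standard scalar-on-function form, this linear predictor and its penalized spline approximation can be sketched as follows. The notation is an assumption made for illustration: \alpha is an intercept, B_k(t) are spline basis functions, t_j and \delta_j are quadrature points and weights on the domain \mathcal{T}, and P is a penalty matrix with smoothing parameter \lambda.

```latex
\begin{align*}
\eta_i &= \alpha + \int_{\mathcal{T}} X_i(t)\,\beta(t)\,dt,
\qquad \beta(t) = \sum_{k=1}^{K} \beta_k B_k(t), \\
\int_{\mathcal{T}} X_i(t)\,\beta(t)\,dt
&\approx \sum_{k=1}^{K} \beta_k \Bigl\{ \sum_{j=1}^{J} \delta_j\, X_i(t_j)\, B_k(t_j) \Bigr\}
= \sum_{k=1}^{K} \beta_k Z_{ik}, \\
\text{penalty:}\quad & \lambda\, \boldsymbol{\beta}^{\top} P\, \boldsymbol{\beta}.
\end{align*}
```

Under this sketch the functional term reduces to an ordinary regression on the derived covariates Z_{ik}, with smoothness of \beta(t) controlled by the quadratic penalty on the spline coefficients.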
The second idea is that semi-parametric models that include smoothing penalties can be viewed as generalized linear mixed effects models (GLMMs) where the penalized parameters are treated as random effects in a corresponding mixed effects model; see, for example, the monographs of Ruppert et al. (2003) and Wood (2006) for an in-depth treatment. In a Bayesian context, this is equivalent to assuming a shrinkage prior on the model parameters (e.g., spline coefficients) with a distribution induced by the functional form of the penalty. The most widely used shrinkage priors are simple zero-mean multivariate normal distributions, which are specifically designed for smoothing and are computationally fast. These multivariate normal shrinkage priors correspond to quadratic penalties on the model parameters.
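The correspondence between a quadratic penalty and a zero-mean normal prior can be checked numerically. The following is a minimal sketch for a Gaussian outcome, not the implementation used by any particular package: the Gaussian bump basis, the ridge-stabilized second-difference penalty, and all function names are assumptions chosen so that the example needs only NumPy.

```python
import numpy as np

def gaussian_basis(t, centers, width):
    """Evaluate Gaussian bump basis functions at t (illustrative stand-in for splines)."""
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)

def second_difference_penalty(k, ridge=1e-2):
    """Second-difference penalty matrix; the small ridge makes it invertible (assumption)."""
    D = np.diff(np.eye(k), n=2, axis=0)
    return D.T @ D + ridge * np.eye(k)

def penalized_fit(B, y, P, lam):
    """Minimize ||y - B beta||^2 + lam * beta' P beta."""
    return np.linalg.solve(B.T @ B + lam * P, B.T @ y)

def posterior_mean(B, y, P, lam, sigma2=1.0):
    """Posterior mean of beta when beta ~ N(0, (sigma2 / lam) P^{-1}) and
    y | beta ~ N(B beta, sigma2 I), written in its mixed-model (BLUP) form."""
    prior_cov = (sigma2 / lam) * np.linalg.inv(P)
    marginal_cov = B @ prior_cov @ B.T + sigma2 * np.eye(len(y))
    return prior_cov @ B.T @ np.linalg.solve(marginal_cov, y)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 60)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

B = gaussian_basis(t, centers=np.linspace(0.0, 1.0, 12), width=0.1)
P = second_difference_penalty(B.shape[1])
lam = 5.0

beta_penalized = penalized_fit(B, y, P, lam)
beta_bayes = posterior_mean(B, y, P, lam)
```

The two coefficient vectors agree up to numerical error, which is the sense in which the smoothing parameter plays the role of a variance ratio in the mixed effects formulation.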
An important consequence of this idea is that smoothing components can be seamlessly integrated with other covariates, various random effects structures, as well as Gaussian or non-Gaussian outcomes. Thus, complex functional regression models can be fit using standard software, such as the excellent
The third idea is that functional data with complex sampling structures (multilevel, longitudinal) can be accommodated using projections on lower-dimensional spaces spanned by either fixed bases or bases estimated from the data. The low-dimensional basis coefficients can then be modeled using standard mixed effects models, which fits in naturally with the mixed effects framework for functional regression and complex outcome sampling.
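As an illustration of projecting onto a basis estimated from the data, the sketch below computes principal component scores for curves observed on a common grid using a plain singular value decomposition. It assumes dense, regularly sampled curves and applies no smoothing; the function names and the simulated data are hypothetical.

```python
import numpy as np

def fpca_scores(curves, n_components):
    """Project curves (n_subjects x n_gridpoints) onto their leading principal components."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                 # orthonormal rows, one per component
    scores = centered @ basis.T               # low-dimensional coefficients per subject
    explained = singular_values[:n_components] ** 2 / np.sum(singular_values ** 2)
    return mean_curve, basis, scores, explained

def reconstruct(mean_curve, basis, scores):
    """Map the low-dimensional scores back to curves on the original grid."""
    return mean_curve + scores @ basis

# Simulated curves driven by two latent components plus measurement noise.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
true_scores = rng.normal(size=(30, 2)) * np.array([2.0, 0.5])
shapes = np.vstack([np.sin(2 * np.pi * grid), np.cos(2 * np.pi * grid)])
curves = true_scores @ shapes + rng.normal(scale=0.05, size=(30, grid.size))

mean_curve, basis, scores, explained = fpca_scores(curves, n_components=2)
approx = reconstruct(mean_curve, basis, scores)
```

The rows of `scores` are the subject-level coefficients that would then enter a standard mixed effects model in place of the full curves.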
There are several other important contributions of the article including: (a) providing a general formula for functional regression (equation 2.1); (b) describing extensions to additive predictors and non-linear functional effects (end of Section 2.2); (c) proposing the component-wise gradient boosting approach to estimation as a general method for fitting the models (Section 4.2); (d) mentioning the intriguing use of
What remains to be done
Given the exceptional breadth of the article, one should naturally ask whether research in FDA should still be under intense methodological development. While many methodological problems have been solved, we believe that the size, complexity and variety of new applications have led, in fact, to a deficit in methodological development. Indeed, in our experience, every new scientific application we work on requires a non-trivial level of methodological development. Below we identify large classes of problems that require careful and rapid development.
Not all methods described in the article generalize directly to high-dimensional functional data. For example, the spectral power of electroencephalographic data measured in 5-second intervals for eight hours during sleep contains 5760 observations per subject (Crainiceanu et al. (2009); Di et al. (2009)) and modern data sets, such as the Sleep Heart Health Study (Quan et al. (1997)), contain thousands of subjects. Activity data measured using accelerometers typically contain 1440 ‘activity counts’ for each day for several days and hundreds or thousands of individuals. Examples of such studies include the National Health and Nutrition Examination Survey (NHANES) (Troiano et al. (2008); Koster et al. (2012)) and the Baltimore Longitudinal Study of Aging (BLSA) (Stone and Norris (1966); Schrack et al. (2014)). In these cases, conducting standard functional regression is possible and even fast using the
At the other extreme, functional data can be sampled sparsely, where only a few observations are available for every function. In cases when little is known about the underlying data-generating mechanism or when the data structure cannot be represented well by standard linear mixed effects models, it is reasonable to use functional approaches (James et al. (2000); Yao et al. (2005); Di et al. (2014)). While, the package
We have found the discussion about the Bayesian analysis of functional data particularly interesting as, in our experience, Bayesian software has lagged behind frequentist software for FDA. The authors correctly point out the literature, which suggests that
Another area of interest is the joint modeling of functional and time-to-event data (Tsiatis and Davidian (2004); Tseng et al. (2005)). For example, an Intensive Care Unit (ICU) study focused on the association between daily measures of subject-specific Sequential Organ Failure Assessment (SOFA) scores and two outcomes: in-hospital mortality and physical impairment at hospital discharge among survivors (Gellar et al. (2014); Gellar et al. (2015)). In this study, one is interested in multiple questions: (a) what is the association between SOFA history in the ICU and physical impairment at hospital discharge among survivors? (b) what patterns of SOFA scores are associated with death in the ICU? and (c) what is the probability of survival of a person who is alive in the ICU after a specific number of days given their covariates and SOFA history? This type of data can be densely or sparsely sampled and contains functional observations with unequal domains (e.g., subjects who are alive at discharge have SOFA histories of different lengths because they were in the ICU for different lengths of time) and censoring (e.g., discharge from the hospital can be viewed as censoring for death). It seems reasonable to extend the framework described by the authors to address such problems, which are increasingly common in applications. For example, the penalized function-on-function regression (pffr) introduced by Ivanescu et al. (2015) and Scheipl et al. (2015) could be adapted to the case of dynamic functional prediction. This would be useful to predict the entire future in-ICU SOFA score trajectory of a patient at every time point when they are alive.
The authors have described principal component decompositions, which we agree should be a first-line approach, especially in cases when data are high dimensional. However, data can have heterogeneous non-Gaussian marginal distributions that are not well fit by PCA. For example, functional data can have skewed or heavy-tailed marginal distributions, which suggests that additional information may be available. An approach to modeling the entire distribution of the data is presented by Staicu et al. (2012), who suggested transforming the data first to ensure Gaussian marginal distributions and then conducting standard FDA.
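To make the idea of Gaussianizing the marginals concrete, the sketch below applies a rank-based normal scores transform separately at each grid point. This is a simple stand-in chosen for illustration and is not claimed to be the transformation used by Staicu et al. (2012); the function names and the simulated lognormal data are assumptions.

```python
from statistics import NormalDist

import numpy as np

def normal_scores(values):
    """Map a 1-D sample to approximately standard normal values via its ranks."""
    values = np.asarray(values, dtype=float)
    ranks = values.argsort().argsort() + 1        # ranks 1..n (ties broken by position)
    probs = ranks / (len(values) + 1)             # strictly inside (0, 1)
    inverse_cdf = NormalDist().inv_cdf
    return np.array([inverse_cdf(p) for p in probs])

def gaussianize_pointwise(curves):
    """Apply the normal scores transform separately at each grid point (column)."""
    columns = [normal_scores(curves[:, j]) for j in range(curves.shape[1])]
    return np.column_stack(columns)

# Simulated skewed functional data: 200 subjects observed at 10 grid points.
rng = np.random.default_rng(2)
skewed = rng.lognormal(sigma=1.0, size=(200, 10))
gaussianized = gaussianize_pointwise(skewed)
```

After the transform, each grid point has a symmetric, approximately normal marginal distribution while the ordering of subjects at that point is unchanged, so standard FDA tools such as PCA can be applied to `gaussianized`.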
An area of research that is currently under rapid methodological development is the modeling and analysis of populations of spatio-temporal processes. An example of such data is provided by studies that collect task or resting state functional Magnetic Resonance Imaging (fMRI); for a comprehensive review of fMRI see Lindquist (2008). Subject-specific spatio-temporal data can be represented as rectangular arrays, where one dimension represents space and the other represents time. Such objects are often massive, with a typical fMRI scan containing 100,000 voxels measured at 200 time points, or 20 million entries. Models for such data need to accommodate its intrinsic complexity and size. While there is a growing body of literature on this topic (Spencer et al. (2001); Dien et al. (2003); Smilde et al. (2005); Allen et al. (2014); Huang et al. (2016a)), much more is required to establish a coherent, computationally feasible Statistical framework.
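The size of such arrays can be worked out directly. The sketch below reproduces the count of 20 million entries per scan and converts it to storage, assuming 8-byte floating point values; the study size of 500 subjects is a hypothetical number used only to show how quickly the total grows.

```python
VOXELS = 100_000          # voxels per scan, as in the text
TIME_POINTS = 200         # time points per scan, as in the text
BYTES_PER_VALUE = 8       # assumption: values stored as float64

def scan_entries(voxels=VOXELS, time_points=TIME_POINTS):
    """Number of entries in one subject's space-by-time array."""
    return voxels * time_points

def storage_gb(n_subjects, bytes_per_value=BYTES_PER_VALUE):
    """Storage in gigabytes (10^9 bytes) for n_subjects scans held in memory."""
    return n_subjects * scan_entries() * bytes_per_value / 1e9

entries = scan_entries()          # 20 million entries per scan
one_scan_gb = storage_gb(1)       # 0.16 GB for a single scan
study_gb = storage_gb(500)        # 80 GB for a hypothetical 500-subject study
```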
In spite of the important advances in FDA, the current state-of-the-art is to extract summary statistics such as the mean, maximum or maximum location and use these summaries to predict outcomes. The reason is that extracting summary statistics is simpler, more intuitive and often beats or competes well with FDA approaches. Moreover, in practice, it is hard to convince a collaborator to switch to a less intuitive approach that requires more technical expertise without providing evidence that the new approach is better. If
The last area of research that may benefit from the framework described by Greven and Scheipl is functional regression with a large number of functional predictors that may have complex sampling structures. An example of such data comes from neurophysiological experiments designed to study the effect of stroke on motion integrity. In the experiment, all participants make 22 reaching motions with both their dominant and non-dominant hands to each of eight targets, for a total of 352 motions for each subject (Goldsmith and Kitago (2015); Kitago et al. (2015)). A fundamental question is how to quantify the association between these motions and a scalar (or multivariate) health outcome. In this example, the number of functional predictors quickly explodes and one needs either to select functional predictors (Gertheiss et al. (2013); \cite Chen16) or to identify ways of combining them into single index structures (Li et al. (2010); Jiang and Wang (2011); Ma (2016)).
Conclusions
Unarguably,
Far from being finished, FDA is flourishing because of the incredible diversity of new problems that are generated by scientific applications. Increasingly, it becomes necessary for Statisticians to dive deeply into the subject matter, understand and enjoy the intricacies of real data analysis and keep pace with technological development. We tried to present several different directions of research inspired by important actual scientific problems. Indeed, there is nothing sadder than an alleged state-of-the-art methodological approach applied to a 30-year-old data set that was over-analyzed and that provides results that nobody in the real world cares about.
The authors of the article are perfect representatives of the new wave of Statisticians who combine solid methodological training with exceptional computational skills and a good sense for what is important. This is the harder way of doing science, but it is the right way.
