Discussion of the paper ‘A general framework for functional regression modelling’

Abstract

This discussion provides our reaction to the article by Greven and Scheipl. It contains an overview of their article and a description of the many areas of research that remain open and could benefit from further methodological and computational development.

Keywords

penalized; functional regression; high-dimensional data

1 The contribution

We would like to congratulate the authors on a timely and well-written article that covers an important topic. The authors provide an excellent overview of the existing literature and introduce the concepts necessary to understand, differentiate and apply the various methods proposed in functional regression. Given that functional data analysis (FDA) has been around for more than 50 years, one naturally wonders why a new article on this topic is necessary, what new concepts it introduces and how it could impact the practice of Statistics. We are left with a complex and subtle problem: describing what the article achieves and what it does not.

Against our skeptical instincts, we first describe the novelty and impact of the article. Most importantly, the article provides a complete, credible, state-of-the-art methodology for functional semi-parametric regression using well-tested software. Indeed, after reading this article, an informed reader could easily implement or use the tools described and improve or compare them to other methods. We find that extremely refreshing in a scientific environment where impact should be measured by how available and used the methods are and not by inconsequential, repetitive, and overly complex mathematical theorems. The approach the authors describe is one of the most impactful in practice because they focused on a particular combination of Statistical methodologies that work well together and have been thoroughly tested. This has not been achieved by chance, but by a clever combination of mature, refined and generalizable regression methods.

To be specific, there are three main ideas that support the practical machinery described in the article. The first idea is to use the revolutionary non-parametric regression methods introduced in the landmark articles of O'Sullivan (1986) and Eilers and Marx (1996) to smooth model coefficients using penalized splines. The practical advantage of penalized splines is due to their flexibility, ease of use and the development of automatic estimation of tuning parameters. For these reasons, penalized splines are very popular and well tested in scientific applications. With few exceptions (Eilers and Marx (2002); Eilers and Marx (2003); Reiss and Ogden (2007)), non-parametric regression and FDA have developed in parallel, which has limited the percolation of new ideas from non-parametric smoothing into the FDA literature. We outline this link in the simplest case of functional regression, when a subject-specific functional predictor $X_{i} (s)$ is embedded in the linear predictor of an outcome as:

\int_{S} X_{i} (s) β (s) ds .

If

β (s) = \sum_{k = 1}^{K} B_{k} (s) θ_{k} ,

where K is the number of bases, then

\int_{S} X_{i} (s) β (s) ds = \sum_{k = 1}^{K} a_{ik} θ_{k} ,

where $a_{ik} = \int_{S} X_{i} (s) B_{k} (s) ds$ are known constants. When one uses penalized splines, the $B_{k} (\cdot)$, $k = 1, \dots, K$, are K B-spline functions, where K is relatively large. This formula reduces functional regression to basic regression, while problems associated with over-fitting are resolved by inducing a penalty on the coefficients $θ_{k}$, $k = 1, \dots, K$. In the case of penalized splines, this could be a first- or second-order difference penalty, but other types of bases or penalties could be used as well. The difference from penalized spline regression is that the regressors $a_{ik} = \int_{S} X_{i} (s) B_{k} (s) ds$ are not the same as the B-spline regressors; instead they are obtained as the inner product between the functional predictor $X_{i} (s)$ and the B-spline basis functions $B_{k} (s)$. Since this is a well-studied example of regression, models can be expanded to include scalar covariates, additional functional covariates, functional and non-functional interactions and so on. This provides a methodological platform that can be adjusted to the complexity of the problem while using well-tested software.
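As an illustration of this reduction, the following is a minimal sketch in plain Python, not the implementation used by the authors or by any R package. It rests on several assumptions of ours: the domain is S = [0, 1], the curves are observed on a common equally spaced grid, the basis consists of degree-1 B-splines (hat functions) on equally spaced knots, the integrals $a_{ik}$ are approximated by a midpoint Riemann sum, and the smoothing parameter `lam` is fixed rather than estimated. All function names are hypothetical.

```python
import math
import random


def hat_basis(s, k, K):
    """Degree-1 B-spline (hat function) number k of K, equally spaced knots on [0, 1]."""
    h = 1.0 / (K - 1)
    return max(0.0, 1.0 - abs(s - k * h) / h)


def design_matrix(curves, grid, K):
    """a_ik = int X_i(s) B_k(s) ds, approximated by a Riemann sum on the grid."""
    ds = grid[1] - grid[0]
    return [[sum(x * hat_basis(s, k, K) for x, s in zip(curve, grid)) * ds
             for k in range(K)] for curve in curves]


def second_diff_penalty(K):
    """P = D'D, where D is the (K-2) x K second-order difference matrix."""
    P = [[0.0] * K for _ in range(K)]
    for r in range(K - 2):
        d = {r: 1.0, r + 1: -2.0, r + 2: 1.0}
        for i, di in d.items():
            for j, dj in d.items():
                P[i][j] += di * dj
    return P


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x


def fit_penalized(A, y, lam):
    """Minimize ||y - A theta||^2 + lam * theta' P theta over theta."""
    K = len(A[0])
    P = second_diff_penalty(K)
    M = [[sum(a[i] * a[j] for a in A) + lam * P[i][j] for j in range(K)]
         for i in range(K)]
    b = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(K)]
    return solve(M, b)


# Simulated, noise-free toy data (an assumption made purely for illustration).
rng = random.Random(1)
K, n, m = 8, 40, 200
grid = [(j + 0.5) / m for j in range(m)]
curves = []
for _ in range(n):
    f, p, c = rng.uniform(0.5, 3.0), rng.uniform(0.0, 2.0 * math.pi), rng.gauss(0.0, 1.0)
    curves.append([math.sin(2.0 * math.pi * f * s + p) + c * s for s in grid])

theta_true = [1.0 + 2.0 * k / (K - 1) for k in range(K)]  # beta(s) = 1 + 2s
A = design_matrix(curves, grid, K)
y = [sum(a * t for a, t in zip(row, theta_true)) for row in A]
theta_hat = fit_penalized(A, y, lam=1.0)
```

In this toy setting the coefficient function is linear, so the second-order difference penalty vanishes at the true coefficients and the penalized fit should reproduce them up to rounding error. With noisy outcomes the choice of `lam` matters, and in practice it would be estimated, which is one motivation for the mixed effects view discussed next.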

The second idea is that semi-parametric models that include smoothing penalties can be viewed as generalized linear mixed effects models (GLMMs), where the penalized parameters are treated as random effects in a corresponding mixed effects model; see, for example, the monographs by Ruppert et al. (2003) and Wood (2006) for an in-depth treatment. In a Bayesian context, this is equivalent to assuming a shrinkage prior on the model parameters (e.g., spline coefficients) with a distribution induced by the functional form of the penalty. The most widely used shrinkage priors are simple zero-mean multivariate normal distributions, which are specifically designed for smoothing and are computationally fast. These multivariate normal shrinkage priors correspond to quadratic penalties on the model parameters.
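For the Gaussian case, this correspondence can be written out as a short worked equation. The notation below is ours and is introduced only for illustration: A is the n × K matrix with entries $a_{ik}$, P is a K × K quadratic penalty matrix (for example, the cross-product of a difference matrix), and $σ_{ε}^{2}$ and $σ_{θ}^{2}$ are the error and random effects variances.

```latex
% Penalized least-squares criterion for θ = (θ_1, ..., θ_K)^T:
\min_{θ} \; \| y - A θ \|^{2} + λ \, θ^{T} P θ
\quad \Longrightarrow \quad
\hat{θ} = (A^{T} A + λ P)^{-1} A^{T} y .

% Mixed effects reading: the same \hat{θ} is the posterior mode (and the BLUP) under
y \mid θ \sim N(A θ, σ_{ε}^{2} I_{n}), \qquad
p(θ) \propto \exp\{ - θ^{T} P θ / (2 σ_{θ}^{2}) \}, \qquad
λ = σ_{ε}^{2} / σ_{θ}^{2} .
```

The smoothing parameter thus becomes a ratio of variance components, which can be estimated by (restricted) maximum likelihood. When P is rank deficient, as with difference penalties, the prior is improper on the null space of P, and those directions are usually treated as unpenalized fixed effects.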

An important consequence of this idea is that smoothing components can be seamlessly integrated with other covariates, various random effects structures, as well as Gaussian or non-Gaussian outcomes. Thus, complex functional regression models can be fit using standard software, such as the excellent mgcv package (Wood (2000); Wood (2003); Wood (2004); Wood (2011)) in R (R Core Team (2014)). This allows for functional regression estimation, inference and prediction to be conducted in the mixed effects inferential framework. Translating a large class of functional regression models into GLMMs provides a new and powerful approach for FDA. It is our experience that in the simplest case of scalar on function regression, most reasonable approaches (e.g., penalized splines, local polynomials, kernel smoothing) provide similar results. However, when one is interested in more complex modelling, penalized splines embedded in the GLMM framework work much better.

The third idea is that functional data with complex sampling structures (multilevel, longitudinal) can be accommodated using projections on lower dimensional spaces spanned by either fixed bases or bases estimated from the data. The low-dimensional basis coefficients can then be modeled using standard mixed effects models, which fits in naturally with the mixed effects framework for functional regression and complex outcome sampling.
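A basis estimated from the data is typically obtained by principal component analysis of the observed curves. The sketch below is a hedged illustration under assumptions of ours, not the authors' procedure: densely observed, noise-free curves on a common grid over [0, 1], a single component, and power iteration in place of a full eigendecomposition. The function names are hypothetical. The resulting scores are the low-dimensional coefficients that would then enter a mixed effects model.

```python
import math
import random


def center(curves):
    """Subtract the pointwise sample mean function from every curve."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    return [[c[j] - mean[j] for j in range(m)] for c in curves], mean


def leading_component(centered, ds, iters=50):
    """Leading eigenfunction of the sample covariance operator by power iteration.

    The operator is applied implicitly as v -> (1/n) * sum_i <X_i, v> X_i,
    so the m x m covariance matrix is never formed.
    """
    n, m = len(centered), len(centered[0])
    v = [1.0 + 0.01 * j for j in range(m)]  # arbitrary starting function
    for _ in range(iters):
        inner = [sum(x * vj for x, vj in zip(c, v)) * ds for c in centered]
        w = [sum(s * c[j] for s, c in zip(inner, centered)) / n for j in range(m)]
        norm = math.sqrt(sum(wj * wj for wj in w) * ds)
        v = [wj / norm for wj in w]  # unit L2 norm on the grid
    return v


def scores_on(centered, phi, ds):
    """Project each centered curve on phi: score_i = int (X_i - mean)(s) phi(s) ds."""
    return [sum(x * p for x, p in zip(c, phi)) * ds for c in centered]


# Simulated rank-one toy data: X_i(s) = mu(s) + xi_i * phi(s).
rng = random.Random(7)
n, m = 30, 100
ds = 1.0 / m
grid = [(j + 0.5) * ds for j in range(m)]
phi_true = [math.sqrt(2.0) * math.sin(2.0 * math.pi * s) for s in grid]
xi = [rng.gauss(0.0, 2.0) for _ in range(n)]
curves = [[math.cos(s) + x * p for s, p in zip(grid, phi_true)] for x in xi]

centered, mean_fn = center(curves)
phi_hat = leading_component(centered, ds)
scores = scores_on(centered, phi_hat, ds)
```

Eigenfunctions are identified only up to sign, so `phi_hat` may equal minus `phi_true`, with the scores flipped accordingly. For multilevel or longitudinal designs, one would estimate separate components for each level of the hierarchy, which this single-level sketch does not attempt.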

There are several other important contributions of the article, including: (a) providing a general formula for functional regression (equation 2.1); (b) describing extensions to additive predictors and non-linear functional effects (end of Section 2.2); (c) proposing the component-wise gradient boosting approach to estimation as a general method for fitting the models (Section 4.2); (d) mentioning the intriguing use of JAGS (Plummer (2003)) to conduct Bayesian inference (Section 4.4); (e) pointing out important problems related to identifiability of parameters (Sections 5.1 and 5.2); (f) introducing regression for a wide variety of outcomes and loss functions; (g) providing useful insights into the computational complexity of various approaches (Section 5.3) and (h) conducting very interesting analyses of the multiple sclerosis data.

2 What remains to be done

Given the exceptional breadth of the article, one should naturally ask whether research in FDA should still be under intense methodological development. While many methodological problems have been solved, we believe that the size, complexity and variety of new applications have led, in fact, to a deficit in methodological development. Indeed, in our experience, every new scientific application we work on requires a non-trivial level of methodological development. Below we identify large classes of problems that require careful and rapid development.

Not all methods described in the article generalize directly to high-dimensional functional data. For example, the spectral power of electroencephalographic data measured in 5-second intervals for eight hours during sleep contains 5760 observations per subject (Crainiceanu et al. (2009); Di et al. (2009)), and modern data sets, such as the Sleep Heart Health Study (Quan et al. (1997)), contain thousands of subjects. Activity data measured using accelerometers typically contain 1440 ‘activity counts’ for each day, for several days and for hundreds or thousands of individuals. Examples of such studies include the National Health and Nutrition Examination Survey (NHANES) (Troiano et al. (2008); Koster et al. (2012)) and the Baltimore Longitudinal Study of Aging (BLSA) (Stone and Norris (1966); Schrack et al. (2014)). In these cases, conducting standard functional regression is possible and even fast using the refund package (Huang et al. (2016b)) in R, but including functional random effects, as proposed by the authors, runs into computational problems. Dimensionality reduction solutions, including principal component analysis (PCA), need to be explored and assessed in these situations. As dimensions of the data increase even further to millions of observations in imaging or genomics, it becomes clear that a better integration between dimensionality reduction techniques and inferential approaches is necessary.

At the other extreme, functional data can be sampled sparsely, where only a few observations are available for every function. In cases when little is known about the underlying data-generating mechanism or when the data structure cannot be represented well by standard linear mixed effects models, it is reasonable to use functional approaches (James et al. (2000); Yao et al. (2005); Di et al. (2014)). While the package refund has a powerful suite of functions dedicated to FDA with sparse and/or missing observations, it would be useful to know the current limits of the associated methods and software. More generally, it would be interesting to see a comparison of FDA software in terms of computation time, type of problems that can be addressed, generalizability, modularity, statistical testing, inferential performance and ease of use. It would be very helpful to have a table that provides a cross-tabulation of different capabilities by published software.

We have found the discussion about the Bayesian analysis of functional data particularly interesting as, in our experience, Bayesian software has lagged behind frequentist software for FDA. The authors correctly point out the literature, which suggests that BayesX (Adler et al. (2013); Belitz et al. (2013)) or JAGS (Plummer (2003)) combined with the mixed effects framework presented by the authors is particularly low-hanging fruit. Such a combination would take advantage of the Bayesian inferential machinery and may provide ideas for scaling up computations.

Another area of interest is the joint modeling of functional and time-to-event data (Tsiatis and Davidian (2004); Tseng et al. (2005)). For example, an Intensive Care Unit (ICU) study focused on the association between daily measures of subject-specific Sequential Organ Failure Assessment (SOFA) scores and two outcomes: in-hospital mortality and physical impairment at hospital discharge among survivors (Gellar et al. (2014); Gellar et al. (2015)). In this study, one is interested in multiple questions: (a) what is the association between SOFA history in the ICU and physical impairment at hospital discharge among survivors? (b) what patterns of SOFA scores are associated with death in the ICU? and (c) what is the probability of survival of a person who is alive in the ICU after a specific number of days given their covariates and SOFA history? This type of data can be densely or sparsely sampled and contains functional observations with unequal domains (e.g., subjects who are alive at discharge have a different length of SOFA history because they were in the ICU for different lengths of time) and censoring (e.g., discharge from the hospital can be viewed as censoring for death). It seems reasonable to extend the framework described by the authors to address such problems, which are increasingly common in applications. For example, the penalized function-on-function regression (pffr) introduced by Ivanescu et al. (2015) and Scheipl et al. (2015) could be adapted to the case of dynamic functional prediction. This would be useful to predict the entire future in-ICU SOFA score trajectory of a patient at every time point when they are alive.

The authors have described principal component decompositions, which we agree should be a first-line approach, especially in cases when data are high dimensional. However, data can have heterogeneous non-Gaussian marginal distributions that are not well fit by PCA. For example, functional data can have skewed or heavy-tailed marginal distributions, which suggests that additional information may be available. An approach to modeling the entire distribution of the data is presented by Staicu et al. (2012), who suggested first transforming the data to ensure Gaussian marginal distributions and then conducting standard FDA.

An area of research that is currently under rapid methodological development is the modeling and analysis of populations of spatio-temporal processes. An example of such data is provided by studies that collect task or resting state functional Magnetic Resonance Imaging (fMRI); for a comprehensive review of fMRI see Lindquist (2008). Subject-specific spatio-temporal data can be represented as rectangular arrays, where one dimension represents space and the other represents time. Such objects are often massive, with a typical fMRI scan containing 100 000 voxels measured at 200 time points, or 20 million entries. Models for such data need to accommodate their intrinsic complexity and size. While there is a growing body of literature on this topic (Spencer et al. (2001); Dien et al. (2003); Smilde et al. (2005); Allen et al. (2014); Huang et al. (2016a)), much more is required to establish a coherent, computationally feasible Statistical framework.

In spite of the important advances in FDA, the current state of the art is to extract summary statistics such as the mean, maximum or maximum location and use these summaries to predict outcomes. The reason is that extracting summary statistics is simpler, more intuitive and often beats or competes well with FDA approaches. Moreover, in practice, it is hard to convince a collaborator to switch to a less intuitive approach that requires more technical expertise without providing evidence that the new approach is better. If $X_{i} (s)$ is a functional predictor and $Z_{il}$, $l = 1, \dots, L$, are scalar summaries of the function $X_{i} (s)$, then the simplest linear predictor models that need to be considered are:

\sum_{l = 1}^{L} Z_{il} γ_{l} + \int_{S} X_{i} (s) β (s) ds .

Thus, it is important to quantify the improvement in the model likelihood by the addition of the functional predictor over and above the functional summaries and, possibly, other covariates. Some initial approaches to testing in FDA exist (Zhang and Chen (2007); Swihart et al. (2014); Staicu et al. (2015); Kong et al. (2016)), but there is a need to deploy and validate testing within existing software platforms. From a practical perspective, the best use of non-parametric approaches for fitting $\int_{S} X_{i} (s) β (s) ds$ may be to suggest simple parametric models, identify predictive functional features or quantify the remaining information in the functional component.

The last area of research that may benefit from the framework described by Greven and Scheipl is functional regression with a large number of functional predictors that may have complex sampling structures. An example of such data comes from neurophysiological experiments designed to study the effect of stroke on motion integrity. In the experiment, all participants make 22 reaching motions with both their dominant and non-dominant hands to each of eight targets, for a total of 352 motions for each subject (Goldsmith and Kitago (2015); Kitago et al. (2015)). A fundamental question is how to quantify the association between these motions and a scalar (or multivariate) health outcome. In this example, the number of functional predictors quickly explodes and one needs either to select functional predictors (Gertheiss et al. (2013); Chen et al. (2016)) or to identify ways of combining them into single index structures (Li et al. (2010); Jiang and Wang (2011); Ma (2016)).

3 Conclusions

Unarguably, R is the most important contribution that Statistics has provided to society over the past 20 years. The ability to compute and interact intelligently with the computer is at the very core of how data scientists conduct their business. For better or worse, students who sit in a presentation and think that they learned something new want to ‘fire up’ their computer and check whether the methods are implementable and useful. Thus, ‘methods that do not compute’ are quickly abandoned while ‘methods that do compute’ are adopted. We contend that Statistical methods should be defined as ‘methods that do compute’. This is one of the main reasons why the authors should be congratulated. Indeed, the article is accompanied by an excellent supplement, which provides explicit ways to conduct computing in R together with a user-friendly software package. A more tutorial-like approach, where methods are interwoven with examples in R, may have been even more useful and impactful.

Far from being finished, FDA is flourishing because of the incredible diversity of new problems that are generated by scientific applications. Increasingly, it becomes necessary for Statisticians to dive deeply into the subject matter, understand and enjoy the intricacies of real data analysis and keep pace with technological development. We tried to present several different directions of research inspired by important actual scientific problems. Indeed, there is nothing sadder than an alleged state-of-the-art methodological approach applied to a 30-year-old data set that was over-analyzed and that provides results that nobody in the real world cares about.

The authors of the article are perfect representatives of the new wave of Statisticians who combine solid methodological training with exceptional computational skills and a good sense for what is important. This is the harder way of doing science, but it is the right way.

References

Adler, Kneib, Lang, Umlauf and Zeileis (2013) BayesXsrc: R package distribution of the BayesX C++ sources. R package version 2.1-2. URL http://CRAN.R-project.org/package=BayesXsrc (last accessed 6 January 2017).

Allen, Grosenick and Taylor (2014) A generalized least squares matrix decomposition. Journal of the American Statistical Association, 109, 145–59.

Belitz, Brezger, Kneib, Lang and Umlauf (2013) BayesX: Software for Bayesian inference in structured additive regression models. Version 2.1. URL http://www.BayesX.org/ (last accessed 6 January 2017).

Chen, Goldsmith and Ogden (2016) Variable selection in function-on-scalar regression. Stat, 5, 88–101.

Crainiceanu, Caffo and Punjabi (2009) Nonparametric signal extraction and measurement error in the analysis of electroencephalographic data. Journal of the American Statistical Association, 104, 541–55.

Di, Crainiceanu, Caffo and Punjabi (2009) Multilevel functional principal component analysis. The Annals of Applied Statistics, 3, 458–88.

Di, Crainiceanu and Jank (2014) Multilevel sparse functional principal component analysis. Stat, 3, 126–43. doi: 10.1002/sta4.50. URL https://dx.doi.org/10.1002/sta4.50 (last accessed 6 January 2017).

Dien, Spencer and Donchin (2003) Localization of the event-related potential novelty response as defined by principal components analysis. Cognitive Brain Research, 17, 637–50.

Eilers and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.

Eilers and Marx (2002) Generalized linear additive smooth structures. Journal of Computational and Graphical Statistics, 11, 758–83.

Eilers and Marx (2003) Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemometrics and Intelligent Laboratory Systems, 66, 159–74.

Gellar, Colantuoni, Needham and Crainiceanu (2014) Variable-domain functional regression for modeling ICU data. Journal of the American Statistical Association, 109, 1425–39.

Gellar, Colantuoni, Needham and Crainiceanu (2015) Cox regression models with functional covariates for survival data. Statistical Modelling, 15, 256–78.

Gertheiss, Maity and Staicu (2013) Variable selection in generalized functional linear model. Stat, 2, 86–101.

Goldsmith and Kitago (2015) Assessing systematic effects of stroke on motor control using hierarchical function-on-scalar regression. Journal of the Royal Statistical Society: Series C, 65, 215–36.

Huang, Reiss, Xiao, Zipunnikov, Lindquist and Crainiceanu (2016a) Two-way principal component analysis for matrix-variate data, with an application to functional magnetic resonance imaging data. Biostatistics. Available at https://www.ncbi.nlm.nih.gov/pubmed/27578805

Huang, Scheipl, Goldsmith, Gellar, Harezlak, McLean, Swihart, Xiao, Crainiceanu and Reiss (2016b) refund: Regression with functional data. R package. URL https://CRAN.R-project.org/package=refund

Ivanescu, Staicu, Scheipl and Greven (2015) Penalized function-on-function regression. Computational Statistics, 30, 539–68.

James, Hastie and Sugar (2000) Principal component models for sparse functional data. Biometrika, 87, 587–602.

Jiang and Wang (2011) Functional single index models for longitudinal data. The Annals of Statistics, 39, 362–88.

Kitago, Goldsmith, Harran, Kane, Berard, Huang, Ryan, Mazzoni, Krakauer and Huang (2015) Robotic therapy for chronic stroke: general recovery of impairment or improved task-specific skill? Journal of Neurophysiology, 114, 1885–94.

Kong, Staicu and Maity (2016) Classical testing in functional linear models. Journal of Nonparametric Statistics, 28, 813–38.

Koster, Caserotti, Patel, Matthews, Berrigan and Van Domelen (2012) Association of sedentary time with mortality independent of moderate to vigorous physical activity. PLoS ONE, 7(6): e37696. doi: 10.1371/journal.pone.0037696.

Li, Wang and Carroll (2010) Generalized functional linear models with semiparametric single-index interactions. Journal of the American Statistical Association, 105, 621–33.

Lindquist (2008) The statistical analysis of fMRI data. Statistical Science, 23, 439–64.

Ma (2016) Estimation and inference in functional single-index models. Annals of the Institute of Statistical Mathematics, 68, 181–208.

O'Sullivan (1986) A statistical perspective on ill-posed inverse problems. Statistical Science, 1, 502–18.

Plummer (2003) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In Hornik, Leisch and Zeileis (eds) Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), Vienna. Available at https://www.r-project.org/conferences/DSC-2003/Drafts/Plummer.pdf

Quan, Howard, Iber, Kiley, Nieto, O'Connor, Rapoport, Redline, Robbins, Samet and Wahl (1997) The Sleep Heart Health Study: design, rationale, and methods. Sleep, 20, 1077–85.

R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/ (last accessed 6 January 2017).

Reiss and Ogden (2007) Functional principal component regression and functional partial least squares. Journal of the American Statistical Association, 102, 984–96.

Ruppert, Wand and Carroll (2003) Semiparametric regression. Cambridge: Cambridge University Press.

Scheipl, Staicu and Greven (2015) Functional additive mixed models. Journal of Computational and Graphical Statistics, 24, 477–501.

Schrack, Zipunnikov, Goldsmith, Bai, Simonsick, Crainiceanu and Ferrucci (2014) Assessing the physical cliff: detailed quantification of age-related differences in daily patterns of physical activity. Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 69, 973–79.

Smilde, Jansen, Lamers, Van Der Greef and Timmerman (2005) ANOVA-simultaneous component analysis (ASCA): a new tool for analyzing designed metabolomics data. Bioinformatics, 21, 3043–48.

Spencer, Dien and Donchin (2001) Spatiotemporal analysis of the late ERP responses to deviant stimuli. Psychophysiology, 38, 343–58.

Staicu, Crainiceanu, Reich and Ruppert (2012) Modeling functional data with spatially heterogeneous shape characteristics. Biometrics, 68, 331–43.

Staicu, Serban and Carroll (2015) Significance tests for functional data with complex dependence structure. Journal of Statistical Planning and Inference, 156, 1–13.

Stone and Norris (1966) Activities and attitudes of participants in the Baltimore longitudinal study. Journal of Gerontology, 21, 575–80.

Swihart, Goldsmith and Crainiceanu (2014) Restricted likelihood ratio tests for functional effects in the functional linear model. Technometrics, 56, 483–93.

Troiano, Berrigan, Dodd, Mâsse, Tilert and McDowell (2008) Physical activity in the United States measured by accelerometer. Medicine & Science in Sports & Exercise, 40, 181–88.

Tseng, Hsieh and Wang (2005) Joint modelling of accelerated failure time and longitudinal data. Biometrika, 92, 587–603.

Tsiatis and Davidian (2004) Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica, 14, 809–34.

Wood (2000) Modelling and smoothing parameter estimation with multiple quadratic penalties. Journal of the Royal Statistical Society (B), 62, 413–28.

Wood (2003) Thin-plate regression splines. Journal of the Royal Statistical Society (B), 65, 95–114.

Wood (2004) Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99, 673–86.

Wood (2006) Generalized additive models: An introduction with R. Boca Raton, FL: Chapman & Hall/CRC.

Wood (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73, 3–36.

Yao, Müller and Wang (2005) Functional data analysis for sparse longitudinal data. Journal of the American Statistical Association, 100, 577–90.

Zhang and Chen (2007) Statistical inferences for functional data. The Annals of Statistics, 35, 1052–79.