Multiple smoothing parameters selection in additive regression quantiles

Abstract

We propose an iterative algorithm to select the smoothing parameters in additive quantile regression, wherein the functional forms of the covariate effects are unspecified and expressed via B-spline bases with difference penalties on the spline coefficients. The proposed algorithm relies on viewing the penalized coefficients as random effects from the symmetric Laplace distribution, and it turns out to be very efficient and particularly attractive with multiple smooth terms. Through simulations we compare our proposal with some alternative approaches, including the traditional ones based on minimization of the Schwarz Information Criterion. A real-data analysis is presented to illustrate the method in practice.

Keywords

additive quantile regression; P-splines; Schall algorithm; flexible modelling; semiparametric quantile regression

1 Introduction

Quantile regression (QR) is nowadays a well-established framework in observational studies when interest is in modelling the quantiles of the response variable as a function of one or more covariates (Austin et al., 2005; Li et al., 2010). Implementation of QR was first discussed by Koenker and Bassett (1978), and since then many tutorials and books have appeared in the literature, the most notable being the one by Koenker (2005). Some advantages of QR with respect to the more usual mean regression include robustness to possible outliers and influential observations, and the ability to provide a complete picture of the conditional distribution of the response; see Austin et al. (2005) and Waldmann (2018) for a gentle introduction and discussion.

The linear QR model belongs to the mainstream of statistical methodology now, and the availability of specialized software makes it as simple to use as the familiar mean regression model. See the quantreg package in R, the quantreg procedure in SAS and the qreg command in Stata, all of them focusing mainly on linear QR. However, when linearity is questionable, alternative and more flexible approaches should be considered. Nonparametric or flexible modelling in QR is crucial in data analysis with important and noteworthy applications in many fields, the most famous one being growth charts (Cole and Green, 1992; Wei et al., 2006; Li et al., 2010; Muggeo et al., 2013). Here the anthropometric variable of main interest, for example, weight or height, is regressed on age, and a flexible relationship has to be fitted to obtain ‘reference’ values; the ultimate goal could be to identify observations out of such reference intervals. Unfortunately, flexible modelling of nonlinear effects within QR appears to be somewhat limited, especially when multiple smooth, yet unspecified, relationships have to be included in the same regression equation. In this context, the gap with the corresponding counterparts for mean regression, such as the generalized additive models (Wood, 2006), is somewhat substantial. A possible reason limiting the widespread usage of additive QR is the lack of efficient algorithms able to fit additive QR with automatic choice of the smoothing parameters.

Nonparametric smoothing in QR has been discussed using different techniques, such as kernels (Xiang, 1996; Yu and Jones, 1998; Liu et al., 2019), smoothing splines from both a frequentist and a Bayesian perspective (Thompson et al., 2010) and low-rank splines with or without penalty (Wei et al., 2006; He and Shi, 1994; Ng and Maechler, 2007). Unpenalized splines strongly depend on the number and positions of the knots, and therefore they are not recommended, especially when there are regions of the covariate range with few observations. Penalized splines deal effectively with such “unlucky” data configurations: Bosch et al. (1995) express the problem in terms of cubic smoothing splines with an associated quadratic penalty, Koenker et al. (1994) also use smoothing splines, but with a total variation penalty on the first derivative, and Ng and Maechler (2007) use splines with $L_{1}$ or $L_{\infty}$ roughness measures. Bollaerts et al. (2006) employ B-splines with an $L_{1}$ penalty on the first- or second-order differences of the coefficients. All the aforementioned papers proposing the penalized approach rely on a smoothing parameter which has to be selected to produce the final fit. Methods to select the smoothing parameter include the traditional cross-validation (CV; Bollaerts et al., 2006) and the Schwarz Information Criterion (SIC; Koenker et al., 1994; Koenker, 2011). Additional, less-known approaches to select the smoothing parameter in QR are the generalized approximate CV, understood as an approximation of the generalized comparative Kullback–Leibler distance for quantile smoothing splines (Yuan, 2006), and the relatively new criterion based on the $L$-curve originally proposed by Hansen (1992), discussed by Frasso and Eilers (2015), and employed in nonparametric QR by Andriyana et al. (2014). All these methods are far from efficient because they typically rely on grid search: one fixes a pre-specified grid of smoothing parameter values and fits the model for every candidate value in the grid. Then, the final fit is selected according to the best value of the criterion. There are two possible pitfalls with this approach: the selected smoothing parameter value may depend on the number and position of the candidate values and, more importantly, the computational burden becomes particularly heavy when the regression equation involves multiple additive components, with the consequent multidimensional grid of smoothing parameters. To avoid grid search, one could alternatively employ numerical optimization methods to optimize the objective as a function of the smoothing parameters only, for instance via the simulated annealing option of optim in R, as in Koenker (2011). However, the computational load remains an issue.
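The cost of grid search with several smoothing parameters can be illustrated with a minimal Python sketch. The function `grid_search` and its `criterion` argument are hypothetical names introduced here: `criterion(lambdas)` stands for "fit the model at these smoothing parameters and return the value of the selection criterion (for example the SIC)".

```python
from itertools import product


def grid_search(criterion, grids):
    """Evaluate `criterion` on the full Cartesian grid of smoothing parameters.

    `grids` holds one list of candidate values per smooth term.
    Returns the best combination and the number of model fits required.
    """
    best, best_value, n_fits = None, float("inf"), 0
    for lambdas in product(*grids):
        value = criterion(lambdas)  # one full model fit per grid point
        n_fits += 1
        if value < best_value:
            best, best_value = lambdas, value
    return best, n_fits


if __name__ == "__main__":
    # Toy criterion (not a real model fit), minimized at lambda = 10 for every term.
    toy = lambda lams: sum((lam - 10.0) ** 2 for lam in lams)
    for n_terms in (1, 2, 3):
        grids = [[0.1, 1.0, 10.0, 100.0]] * n_terms
        print(n_terms, grid_search(toy, grids))
```

With $G$ candidate values per term and $K$ smooth terms, the loop performs $G^{K}$ fits, and the returned optimum can only be one of the pre-specified candidates.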

To the best of our knowledge there are some R packages implementing nonparametric QR. Without the presumption of being exhaustive, we cite some options. The well-known quantreg package (Koenker, 2016) includes the function rqss() which allows multiple smooth terms via qss, but it does not include automatic selection of the smoothing parameters. The R package cobs (Ng and Maechler, 2016) has a built-in function to perform automatic selection of smoothing parameter via the SIC, but it works just with a single covariate. For additive QR models, we mention the R packages mboost (Hothorn et al., 2018; Mayr and Hofner, 2018) for a boosting-based approach and the recent qgam (Fasiolo et al., 2018) relying on the extended log- $F$ distribution for the response. Both packages make use of P-splines, but not of $L_{1}$ penalty.

The goal of this article is to set up an efficient algorithm to fit additive QR models with $L_{1}$ penalties where multiple smooth terms are included with corresponding smoothing parameters to be estimated. The approach is similar to Geraci and Bottai (2007), in that some coefficients are viewed as random effects coming from a Laplace distribution. However, unlike Geraci and Bottai (2007), we deal with smoothing methods and end up with a rather plain and efficient algorithm without performing Gibbs sampling to solve intractable integrals. We presented our idea in a statistical workshop (Torretta et al., 2015); here we provide details, justification of the approach and report results from extensive simulation studies.

The rest of the article is structured as follows. In Section 2, we briefly review the P-spline framework for smoothing in QR; in Section 3, we describe the proposed algorithm in detail; and in Section 4, we present results from some simulation experiments. Section 5 is devoted to the analysis of a real dataset, and finally Section 6 reports conclusions and some discussions.

2 P-spline quantile regression framework

Let $Y$ be the quantitative response variable, and $Q_{Y} (τ | z, x)$ the $τ$th quantile of the distribution of $Y$ conditional on covariates $z_{1}, z_{2}, \dots, z_{p}$ and $x_{1}, x_{2}, \dots, x_{K}$. The $x$s are quantitative and understood to affect the response quantile through flexible relationships; the $z$s are further covariates, possibly categorical, entering the model linearly. The additive QR equation can be expressed via

Q_{Y_{i}} (τ | z_{i}, x_{i}) = z_{i}^{T} β_{τ} + s_{1 τ} (x_{1 i}) + \dots + s_{K τ} (x_{Ki}),

(2.1)

where $β_{τ}$ quantifies the linear effect of covariates $z$, and the $\{s_{k τ} (x_{k})\}_{k = 1, \dots, K}$ are smooth but otherwise unspecified functions. Following Bollaerts et al. (2006), Ng and Maechler (2007) and Muggeo et al. (2013), among others, we use a low-rank B-spline basis as smoother, thus $s_{k τ} (x_{ki}) = \sum_{j = 1}^{J_{k}} b_{k τ j} B_{kj} (x_{ki})$, where $B_{kj} (x_{ki})$ is the $j$th spline function of the basis $B_{k} (x_{ki}) = (B_{k 1} (x_{ki}), B_{k 2} (x_{ki}), \dots, B_{k J_{k}} (x_{ki}))$ evaluated at $x_{ki}$, and the $b_{k τ j}$s are the $J_{k}$ spline coefficients of the vector $b_{k τ}$ (Eilers and Marx, 1996). $J_{k}$ is usually taken as a large number to build a generous basis guaranteeing potentially flexible fits, for instance $J_{k} = \min \{40, n / 4\}$, as suggested by Ruppert et al. (2003). When the regression equation involves multiple smooth terms as in (2.1), each basis has to be made identifiable. Among the different approaches, we employ the sum-to-zero constraints such that the estimated smooth curve sums to zero over the observed covariate space (Wood, 2006). Hereafter, we assume each basis is identifiable and the model intercept is included among the linear parameters $β_{τ}$.
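To make the basis expansion concrete, the following Python sketch evaluates one row of a B-spline basis and the corresponding smooth term. It assumes equally spaced knots, as is usual for P-splines, and the Cox-de Boor recursion; the function names `bspline_basis` and `smooth_term` are hypothetical, and the sum-to-zero identifiability constraint is deliberately left out.

```python
def bspline_basis(x, lo, hi, n_basis, degree=3):
    """Values of the n_basis B-spline functions at x, equally spaced knots on [lo, hi]."""
    n_seg = n_basis - degree                      # number of inner intervals
    h = (hi - lo) / n_seg
    knots = [lo + (i - degree) * h for i in range(n_basis + degree + 1)]
    xx = min(max(x, lo), hi)                      # clamp x into [lo, hi]
    # Index of the knot interval containing xx (x == hi goes to the last one).
    seg = min(int((xx - lo) / h), n_seg - 1) + degree
    B = [1.0 if i == seg else 0.0 for i in range(len(knots) - 1)]
    for q in range(1, degree + 1):                # Cox-de Boor recursion
        B = [
            (xx - knots[i]) / (knots[i + q] - knots[i]) * B[i]
            + (knots[i + q + 1] - xx) / (knots[i + q + 1] - knots[i + 1]) * B[i + 1]
            for i in range(len(B) - 1)
        ]
    return B


def smooth_term(x, coefs, lo, hi, degree=3):
    """s(x) = sum_j b_j B_j(x) for a coefficient vector `coefs`."""
    row = bspline_basis(x, lo, hi, len(coefs), degree)
    return sum(b * v for b, v in zip(coefs, row))


if __name__ == "__main__":
    n = 400
    J = min(40, n // 4)                           # generous basis size, as in the text
    row = bspline_basis(0.37, 0.0, 1.0, J)
    print(len(row), sum(row))                     # basis functions sum to one on [lo, hi]
```

Only `degree + 1` entries of each row are non-zero, which is what makes the low-rank basis cheap to handle even when $J_{k}$ is large.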

Denoting the response quantile in (2.1) by $Q_{i}$ , the penalized objective function to be minimized can be written as

L_{λ} (β, b_{1}, \dots, b_{K}) = \sum_{i = 1}^{n} ρ_{τ} (y_{i} - Q_{i}) + \sum_{k = 1}^{K} \{λ_{k τ} \sum_{j = 1}^{J_{k} - d_{k}} | Δ^{d_{k}} b_{k τ} |_{j}\},

(2.2)

where $ρ_{τ} (u) = u (τ - I (u < 0))$ is the check function and the $K$ penalties $λ_{k τ} \sum_{j = 1}^{J_{k} - d_{k}} | Δ^{d_{k}} b_{k τ} |_{j}$ control the wiggliness of the corresponding fitted curves ${\hat{s}}_{k τ} (\cdot)$ by shrinking the relevant differences of the spline coefficients. More specifically, $Δ^{d_{k}}$ is the $d_{k}$-order difference operator applied to the spline coefficients vector $b_{k τ}$; for instance, if $d_{k} = 1$, the penalty is $\sum_{j = 1}^{J_{k} - d_{k}} | Δ^{d_{k}} b_{k τ} |_{j} = | b_{k τ 1} - b_{k τ 2} | + | b_{k τ 2} - b_{k τ 3} | + \dots + | b_{k τ J_{k} - 1} - b_{k τ J_{k}} |$. $λ_{k τ}$ is the positive smoothing parameter regulating under- or over-smoothing of the fitted curve. As in mean regression, the fitted curve ${\hat{s}}_{k τ} (\cdot)$ approaches a polynomial of degree $d_{k} - 1$ as $λ_{k τ}$ gets larger, while $λ_{k τ} = 0$ indicates no penalization, resulting in a potentially wiggly curve.
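As a small illustration of objective (2.2), the sketch below evaluates the check-loss fidelity term and the $L_{1}$ difference penalties for given fitted quantiles and spline coefficients. The function names are hypothetical and the code only evaluates the objective; it does not minimize it.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def diff(coefs, order):
    """Apply the difference operator `order` times to a list of coefficients."""
    out = list(coefs)
    for _ in range(order):
        out = [out[j + 1] - out[j] for j in range(len(out) - 1)]
    return out


def penalized_objective(y, fitted, coefs_by_term, lambdas, orders, tau):
    """Objective (2.2): check-loss fidelity plus one L1 difference penalty per smooth term."""
    fidelity = sum(check_loss(yi - qi, tau) for yi, qi in zip(y, fitted))
    penalty = sum(
        lam * sum(abs(v) for v in diff(b, d))
        for lam, b, d in zip(lambdas, coefs_by_term, orders)
    )
    return fidelity + penalty


if __name__ == "__main__":
    # One smooth term with coefficients 1, 4, 9, 16 and second-order differences (2, 2).
    value = penalized_objective(
        y=[1.0, 0.0], fitted=[0.0, 1.0],
        coefs_by_term=[[1.0, 4.0, 9.0, 16.0]], lambdas=[0.5], orders=[2], tau=0.25,
    )
    print(value)  # 0.25 + 0.75 + 0.5 * 4 = 3.0
```

The sign convention of the differences (here $b_{j + 1} - b_{j}$) is immaterial because only absolute values enter the penalty.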

For a fixed smoothing parameter, minimization of (2.2) is a relatively simple task since standard linear programming techniques may be used (He and Ng, 1999; Bollaerts et al., 2006; Koenker, 2011). Unfortunately, the smoothing parameter is not fixed in practice and, as briefly discussed in the Introduction, all of the current methods working with $L_{1}$ penalties select it via grid search or numerical optimization. This is sustainable for a single $λ$, but it can become time-consuming and possibly infeasible with additive models where multiple smoothing parameters $λ_{k}$ have to be estimated. For instance, to alleviate the computational burden, Bang and Jhun (2012) and Andriyana et al. (2014) propose to search for a unique $λ$, and compute the term-specific smoothing parameters via the ‘ad hoc’ adjustment $λ_{k} = λ / \max \{| {\hat{b}}_{jk} |\}_{j = 1, \dots, J_{k}}$ or $λ_{k} = λ / \mathrm{sdev} \{| {\hat{b}}_{jk} |\}_{j = 1, \dots, J_{k}}$, where the ${\hat{b}}_{jk}$s are estimates from a preliminary unpenalized fit. However, such an approach can clearly produce non-optimal solutions. Alternatively, two recent approaches to smoothing in QR deal with an $L_{2}$ penalty, via boosting (Fenske et al., 2013; Mayr and Hofner, 2018) or by means of a smooth approximation of the check loss (Fasiolo et al., 2018). In the next section, we discuss an iterative algorithm to select the $λ_{k}$s in an additive QR (2.2) with $L_{1}$ objective and penalties.
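For reference, one standard way to cast (2.2) at fixed smoothing parameters as a linear program is sketched below. The notation is introduced here only for illustration: $θ$ stacks $β_{τ}$ and the $b_{k τ}$, $X$ is the design matrix collecting the linear covariates and the B-spline bases, $D_{k}^{d_{k}}$ is the $d_{k}$-order difference matrix, and the non-negative vectors $u^{+}, u^{-}, v_{k}^{+}, v_{k}^{-}$ are auxiliary variables holding the positive and negative parts of the residuals and of the coefficient differences.

```latex
\begin{aligned}
\min_{θ,\, u^{+},\, u^{-},\, v_{k}^{+},\, v_{k}^{-}} \quad
  & τ \, 1^{T} u^{+} + (1 - τ) \, 1^{T} u^{-}
    + \sum_{k = 1}^{K} λ_{k τ} \, 1^{T} (v_{k}^{+} + v_{k}^{-}) \\
\text{subject to} \quad
  & y - X θ = u^{+} - u^{-}, \\
  & D_{k}^{d_{k}} b_{k τ} = v_{k}^{+} - v_{k}^{-}, \quad k = 1, \dots, K, \\
  & u^{+},\, u^{-},\, v_{k}^{+},\, v_{k}^{-} \geq 0 .
\end{aligned}
```

At the optimum, $τ u_{i}^{+} + (1 - τ) u_{i}^{-} = ρ_{τ} (y_{i} - Q_{i})$ and $v_{kj}^{+} + v_{kj}^{-} = | Δ^{d_{k}} b_{k τ} |_{j}$, so the linear objective coincides with (2.2).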

3 The proposal: An iterative algorithm for smoothing parameter selection in QR

To estimate the multiple lambda parameters of objective (2.2), we propose to use an iterative algorithm which is sometimes referred to in the literature as the ‘Schall algorithm’. However, the approach was discussed by Fellner (1986) for robust estimation of linear mixed models based on equations of Harville (1977) and extended to generalized mixed models by Schall (1991). The algorithm has also been employed for expectile smoothing by Schnabel and Eilers (2009). The underlying idea of such a ‘Harville–Fellner–Schall’ algorithm exploits the link between penalized smoothing methods and random effects models: the penalized coefficients are viewed as random effects having relevant variance parameters, see Currie and Durban (2002) and Wand (2003) for details. Viewing the penalized coefficients as random effects from a known distribution allows us to estimate the smoothing parameter as the ratio of the error variance to the variance of the ‘random effects’, namely of the penalized coefficients. We apply a similar idea to estimate the smoothing parameters in nonparametric QR via minimization of (2.2). We first outline the algorithm and postpone its discussion to the following subsections.

The proposed algorithm for additive QR (2.1) at fixed $τ$ , with $p$ linear parameters and $K$ smooth terms, is summarized as follows.

1. Fix a (small) value for all smoothing parameters $λ_{k τ}^{(0)}, k = 1, \dots, K$;

2. Fit the QR (2.1) by minimizing the objective (2.2) at the fixed $λ_{k τ}^{(0)}$ such that parameter estimates ${\hat{b}}_{k τ}$ and fitted quantiles ${\hat{Q}}_{i}$ are obtained;

3. Compute ${\hat{ϕ}}_{τ} = \frac{\sum_{i} ρ_{τ} (y_{i} - {\hat{Q}}_{i})}{n - p - \sum_{k} γ_{k} {edf}_{k}}$ and ${\hat{ψ}}_{k τ} = \frac{\sum_{j} | Δ^{d_{k}} {\hat{b}}_{k τ} |_{j}}{γ_{k} {edf}_{k}}$, $k = 1, \dots, K$, where the ${edf}_{k}$ are the term-specific degrees of freedom to be defined later and $γ_{k} \geq 1$ is a fixed factor to further penalize for the complexity;

4. Compute ${\hat{λ}}_{k τ} = \frac{{\hat{ϕ}}_{τ}}{{\hat{ψ}}_{k τ}}, k = 1, \dots, K$;

5. Set ${\hat{λ}}_{k τ} \to λ_{k τ}^{(0)}$ $(k = 1, \dots, K)$ and repeat Steps 2 to 4 until convergence.

Convergence in Step 5 is assessed through relative changes in the smoothing parameter values. Namely, when the absolute variation of each $λ_{k}$ between the current and the previous iteration is less than a fixed tolerance, the algorithm stops and returns the QR model fitted at the last $λ$ values.
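The five steps can be sketched in Python as follows. This is an illustrative skeleton under stated assumptions, not a reference implementation: `fit` is a hypothetical user-supplied routine that minimizes (2.2) at the given smoothing parameters and returns the sum of check losses, the per-term sums of absolute coefficient differences and the per-term degrees of freedom; the starting value, tolerance and iteration cap are arbitrary choices.

```python
def hfs_update(loss_sum, abs_diff_sums, edfs, n, p, gammas):
    """Steps 3 and 4: scale estimates and updated smoothing parameters."""
    phi = loss_sum / (n - p - sum(g * e for g, e in zip(gammas, edfs)))
    psis = [s / (g * e) for s, g, e in zip(abs_diff_sums, gammas, edfs)]
    # A term with all-zero differences (psi == 0) would need separate handling.
    return phi, [phi / psi for psi in psis]


def hfs_select(fit, n, p, gammas, lambda0=0.1, tol=1e-4, max_iter=50):
    """Steps 1 to 5, with convergence judged on relative changes of the lambdas."""
    lambdas = [lambda0] * len(gammas)                 # Step 1
    for _ in range(max_iter):
        loss_sum, abs_diff_sums, edfs = fit(lambdas)  # Step 2 (hypothetical fitter)
        _, new = hfs_update(loss_sum, abs_diff_sums, edfs, n, p, gammas)
        change = max(abs(a - b) / abs(b) for a, b in zip(new, lambdas))
        lambdas = new                                 # Step 5
        if change < tol:
            break
    return lambdas


if __name__ == "__main__":
    # Toy stand-in for a model fit: returns the same summaries at every call.
    toy_fit = lambda lambdas: (10.0, [2.0], [4.0])
    print(hfs_select(toy_fit, n=100, p=1, gammas=[2.0]))
```

Each iteration costs a single model fit regardless of the number of smooth terms, which is where the saving over a multidimensional grid comes from.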

3.1 A theoretical justification of the iterative algorithm

Here we illustrate the rationale behind the proposed algorithm. For the sake of simplicity, we consider just one set of coefficients being penalized with a single tuning parameter $λ$ , but extension to multiple $λ$ is straightforward.

It is known that the objective function in QR may be obtained assuming an asymmetric Laplace ($AL$) distribution for the response conditional distribution (Yu and Moyeed, 2001; Geraci and Bottai, 2007). Formally, $Y_{i} | b \sim AL (Q_{i}, ϕ, τ)$ with density function $f (y_{i} | b; Q_{i}, ϕ, τ) = \frac{τ (1 - τ)}{ϕ} \exp \{- \frac{ρ_{τ} (y_{i} - Q_{i})}{ϕ}\}$, where $ρ_{τ} (u) = u (τ - I (u < 0))$ is the check function as in (2.2). $Q_{i}$ is the location parameter, $ϕ > 0$ is the scale parameter and $τ \in (0, 1)$ is the skewness parameter, assumed known and fixed, see Yu and Zhang (2005). Let the location parameters $Q_{i}$ depend on the values $x_{ij}$ via the equation $Q_{i} = \sum_{j = 1}^{J} b_{j} x_{ij}$, and let the coefficients $(b_{1}, \dots, b_{J})$ be assumed independent random effects from a zero-mean symmetric Laplace distribution with scale parameter $ψ$, whose joint density is $f (b_{1}, \dots, b_{J}; ψ) = (2 ψ)^{- J} \exp \{- \sum_{j} | b_{j} | / ψ\}$.

Hence, given the aforementioned assumptions and $n$ independent observations, the joint density $f (y_{1}, \dots, y_{n}, b_{1}, \dots, b_{J})$ representing the likelihood is

{\{\frac{τ (1 - τ)}{ϕ}\}}^{n} (2 ψ)^{- J} exp \{- \frac{\sum_{i} ρ_{τ} (y_{i} - Q_{i})}{ϕ} - \frac{\sum_{j} | b_{j} |}{ψ}\} .

Using the simple re-parameterization $ψ = \frac{ϕ}{λ}$ and taking the log yields the log likelihood, apart from a constant,

ℓ = - n log ϕ - J log ϕ + J log λ - \frac{\sum_{i} ρ_{τ} (y_{i} - Q_{i})}{ϕ} - \frac{λ \sum_{j} | b_{j} |}{ϕ} .

(3.1)

If $ϕ$ and $λ$ are considered as nuisance parameters, then maximization of (3.1) with respect to the $b_{j}$s justifies minimization of (2.2). Actually, (2.2) is more general as it includes multiple $λ_{k}$ and penalties on the differenced coefficients; however, the rationale is clearly the same.

The partial derivatives of (3.1) are easily obtained

\begin{matrix} \frac{\partial ℓ}{\partial λ} & = & \frac{J}{λ} - \frac{\sum_{j} | b_{j} |}{ϕ} \\ \frac{\partial ℓ}{\partial ϕ} & = & \frac{1}{ϕ^{2}} \{- ϕ (n + J) + \sum_{i} ρ_{τ} (y_{i} - Q_{i}) + λ \sum_{j} | b_{j} |\} . \end{matrix}

The root of $\frac{\partial ℓ}{\partial λ} = 0$ is $\hat{λ} = \frac{ϕ}{\sum_{j} | b_{j} | / J}$ , while by the estimating equation $\frac{\partial ℓ}{\partial ϕ} = 0$ , after plugging in $\hat{λ}$ , we get $\hat{ϕ} = \sum_{i} ρ_{τ} (y_{i} - Q_{i}) / n$ . Namely the maximum likelihood (ML) estimate of scale parameters ratio $λ$ is

\hat{λ} = \frac{\sum_{i} ρ_{τ} (y_{i} - {\hat{Q}}_{i}) / n}{\sum_{j} | {\hat{b}}_{j} | / J} = \frac{\hat{ϕ}}{\hat{ψ}},

(3.2)

which justifies Step 4 in the aforementioned algorithm, apart from the denominators $n$ and $J$ . The rationale of replacing them with the effective degrees of freedom and details about their computation are discussed in the next subsections.
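The closed-form roots in (3.2) can be checked numerically. The sketch below codes the log likelihood (3.1) and the two estimates, treating the sum of check losses and the sum of absolute coefficients as given numbers; the function names and the input values are illustrative assumptions.

```python
import math


def loglik(phi, lam, loss_sum, abs_b_sum, n, J):
    """Log likelihood (3.1), up to an additive constant."""
    return (-(n + J) * math.log(phi) + J * math.log(lam)
            - loss_sum / phi - lam * abs_b_sum / phi)


def ml_estimates(loss_sum, abs_b_sum, n, J):
    """Roots of the two score equations: phi_hat and lambda_hat = phi_hat / psi_hat."""
    phi_hat = loss_sum / n
    psi_hat = abs_b_sum / J
    return phi_hat, phi_hat / psi_hat


def scores(phi, lam, loss_sum, abs_b_sum, n, J):
    """Partial derivatives of (3.1) with respect to lambda and phi."""
    d_lam = J / lam - abs_b_sum / phi
    d_phi = (-phi * (n + J) + loss_sum + lam * abs_b_sum) / phi ** 2
    return d_lam, d_phi


if __name__ == "__main__":
    args = dict(loss_sum=12.0, abs_b_sum=3.0, n=60, J=10)  # made-up summaries
    phi_hat, lam_hat = ml_estimates(**args)
    print(phi_hat, lam_hat, scores(phi_hat, lam_hat, **args))
```

Both scores vanish at the returned pair, and small perturbations of either parameter lower the value of `loglik`, consistent with the pair being the joint maximizer for fixed $b_{j}$s.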

3.2 Remarks

As illustrated above, parameter estimation is justified via maximization of the joint likelihood depending on fixed and random, that is, penalized, parameters. It should be stressed that, unlike Geraci and Bottai (2007), no attempt is made to integrate out the random effects, and thus the objective to be optimized is the joint, rather than the marginal, likelihood. It is worth noting that in the usual mixed models framework, the variance ratio expression, that is, the counterpart of (3.2) for mean regression, typically comes from maximization of the marginal likelihood. However, as discussed by McCulloch (1997), it can be motivated under different likelihoods, including the marginal, penalized and joint likelihood of fixed and random parameters, the last being the one taken in this article for smoothing in QR.

The scale parameter estimate $\hat{ψ}$ in (3.2) depends on the numerator $\sum_{j} | {\hat{b}}_{j} |$ which, in turn, relies on independence of the random effects as shown in Subsection 3.1. Actually, the coefficients of a B-spline basis with a difference penalty are not associated with independent random effects, but it is straightforward to transform the B-spline functions in order to obtain independence of the $b_{j}$s (Currie and Durban, 2002). As an alternative (Eilers and Marx, 2010), it is also possible to keep the B-spline basis and to use the differenced coefficients, $\sum_{j = 1}^{J - d} | Δ^{d} b |_{j}$, as in Step 3 of the above algorithm.

As illustrated by (3.2) in the previous section, $\hat{λ}$ is the joint ML estimate under the Laplace assumptions for the $y_{i}$s and $b_{j}$s. In linear mixed models, a restricted ML (REML)-like approach is well known to attenuate the bias of the variance parameter estimators and to greatly improve their performance (e.g., Fellner, 1986; Schall, 1991; Wand, 2003; Wood, 2006). Therefore, we speculate that a similar behaviour applies in QR. Unfortunately, a restricted version of the log likelihood (3.1) is not straightforward to obtain, and it has not been derived formally. However, a natural and intuitive choice to grant such an adjustment is to consider the equivalent degrees of freedom in the estimation of $\hat{ψ}$ and $\hat{ϕ}$, namely replacing $J$ and $n$, respectively, by the effective and the residual degrees of freedom. Preliminary simulation experiments showing the better performance of the REML-like approach with respect to the ML approach have also emphasized a general tendency to undersmooth, namely to produce fitted curves that are too wiggly. A practical modification to fix this issue is to further penalize model complexity, as done for the generalized CV score (see Wood, 2006): the idea is to increase the amount that each ${edf}_{k}$ counts. We undertake a similar approach, and therefore in Step 3 of the algorithm we use REML-like estimates for the scale parameters wherein each ${edf}_{k}$ is increased by a factor of $γ_{k} > 1$.

The proposed algorithm applies straightforwardly to so-called varying coefficient models, that is, regression equations also involving interactions between a smooth and a linear term, such as $z_{i} s_{τ} (x_{i})$, see Andriyana et al. (2014). Detailed discussion of varying coefficients is beyond the scope of this article, but it is worth noting that to include varying coefficient terms, it suffices to modify the B-spline basis such that each $n$-dimensional column is multiplied element-wise by the $n$-dimensional covariate vector. Hence, the $i$th row of the B-spline basis related to the varying coefficient term can be written as $\tilde{B} (x_{i}) = (z_{i} B_{1} (x_{i}), z_{i} B_{2} (x_{i}), \dots, z_{i} B_{J} (x_{i}))^{T}$, and the difference penalty on the coefficients applies unchanged.
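The basis modification for a varying coefficient term amounts to a row-wise rescaling, as the following minimal sketch shows. The function name `varying_coefficient_rows` is hypothetical, and the basis rows are assumed to have been computed already by any B-spline routine.

```python
def varying_coefficient_rows(z, basis_rows):
    """Multiply the i-th basis row B(x_i) by the covariate value z_i."""
    return [[zi * bij for bij in row] for zi, row in zip(z, basis_rows)]


if __name__ == "__main__":
    basis_rows = [[0.25, 0.75], [1.0, 0.0]]   # two observations, two basis functions
    print(varying_coefficient_rows([2.0, 0.5], basis_rows))
```

Because only the design matrix changes, the difference penalty and the smoothing parameter updates are left untouched.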

3.3 Quantifying the effective model dimension

Step 3 of the aforementioned algorithm requires quantifying the term-specific degrees of freedom ${edf}_{k}$. To the best of our knowledge, there is no consensus on how to quantify them for additive QR. Some criteria aimed at selecting the amount of smoothing, such as the CV (Bollaerts et al., 2006) or the $L$-curve (Andriyana et al., 2014), do not need term-specific degrees of freedom, while other criteria, such as the Akaike or the Schwarz Information Criteria, just require the overall effective model dimension. For instance, in Koenker (2011), the multiple smoothing parameters in additive QR are selected by

{SIC}_{λ} = log [n^{- 1} \sum_{i}^{n} ρ_{τ} (y_{i} - {\hat{Q}}_{i})] + 0.5 n^{- 1} edf log (n),

(3.3)

where only the total $edf$ is computed, by counting the null residuals; see also Koenker (2005) for a general discussion, and Li and Zhu (2008) for a rigorous proof on computing the model complexity via the number of interpolated points. In a REML-like framework, the term-specific ${edf}_{k}$ should clearly depend on the selected $λ_{k}$, namely ${edf}_{k} = J_{k}$ when $λ_{k} = 0$, and ${edf}_{k} \to d_{k} - 1$ if $λ_{k} \to \infty$. Two approaches could be undertaken to compute ${edf}_{k}$ at intermediate values of $λ_{k}$.

The $L_{1}$ penalty leads to null estimates of some basis coefficients involved in the penalty term. Thus, a natural way to quantify the complexity of the fitted curve is via the corresponding non-zero penalized estimates. More specifically, for the smooth term $k$ expressed by a $J_{k}$-rank basis and $J_{k} - d_{k}$ penalized coefficient differences, we count the number of non-zero difference estimates and the number of underlying $d_{k} - 1$ unpenalized coefficients; formally ${edf}_{k} = \# \{Δ^{d_{k}} {\hat{b}}_{k} \neq 0\} + (d_{k} - 1)$. For instance, if we set $d_{k} = 2$ and $λ_{k}$ gets very large, we get all zero estimated differences, leading to ${edf}_{k} = d_{k} - 1 = 1$, which corresponds to the resulting linear fit.
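The counting rule can be sketched as follows. The function name `edf_nonzero` and the numerical tolerance `tol` are assumptions of this example: the tolerance stands in for the fact that a linear programming solver may return differences that are zero only up to floating-point error.

```python
def edf_nonzero(coefs, order, tol=1e-8):
    """Count non-zero `order`-th differences of `coefs`, plus (order - 1)."""
    d = list(coefs)
    for _ in range(order):
        d = [d[j + 1] - d[j] for j in range(len(d) - 1)]
    return sum(1 for v in d if abs(v) > tol) + (order - 1)


if __name__ == "__main__":
    print(edf_nonzero([0.0, 1.0, 2.0, 3.0, 4.0], order=2))       # linear coefficients
    print(edf_nonzero([0.0, 1.0, 2.0, 4.0, 6.0, 6.0], order=2))  # two slope changes
```

In the first call all second-order differences vanish and the count reduces to $d_{k} - 1 = 1$; in the second call two differences are non-zero, giving a value of 3.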

The second viable approach relies on a smooth approximation of the $L_{1}$ norm in both the fidelity and penalty terms. The goal is to build an approximate hat matrix in order to define the ${edf}_{k}$s accordingly. Among the several smooth approximations which could be used (e.g., Muggeo et al., 2012), we use the simple identity $| u | = u^{2} / \sqrt{u^{2}}$ (Schnabel and Eilers, 2013). However, while Schnabel and Eilers (2013) use it to estimate multiple quantile curves with P-splines via iterative least squares, we exploit the smooth approximation only to build the hat matrix. Thus, given parameter estimates obtained via optimization of the $L_{1}$-norm objective (2.2), residuals $e_{i} = y_{i} - {\hat{Q}}_{i}$ and weights $w_{τ i} = τ - I (e_{i} < 0)$ arranged into the matrix $W_{τ} = diag (w_{τ 1}, \dots, w_{τ n})$, we note the parameter estimates can be expressed via $(\hat{β}, {\hat{b}}_{λ})^{T} = (X^{T} W_{τ} X + P_{λ})^{- 1} X^{T} W_{τ} y$, where $X$ is the design matrix including linear covariates and the B-spline basis functions for the smooth terms and $P_{λ}$ is a block diagonal matrix including the zeroes for unpenalized coefficients and the penalty matrices relevant to the different smooth terms. For instance, the block relevant to the $k$th smooth term would be $λ_{k} D_{k}^{d_{k} T} V_{k} D_{k}^{d_{k}}$, where $D_{k}^{d_{k}}$ is the $d_{k}$-order difference matrix and $V_{k}$ includes the reciprocals of the same differenced coefficients such that the quadratic form penalty equals the original $L_{1}$-norm penalty, namely $b_{k}^{T} D_{k}^{d_{k} T} V_{k} D_{k}^{d_{k}} b_{k} = \sum_{j} \frac{(Δ^{d_{k}} b_{k})_{j}^{2}}{\sqrt{(Δ^{d_{k}} b_{k})_{j}^{2}}} = \sum_{j} | Δ^{d_{k}} b_{k} |_{j}$.

The least squares formulation allows us to define the hat matrix, from which the model degrees of freedom may be obtained. More specifically, the elements on the main diagonal of $(X^{T} W_{τ} X + P_{λ})^{- 1} X^{T} W_{τ} X$ represent the degrees of freedom of each coefficient associated with the corresponding column of the design matrix. By summing the $J_{k}$ elements corresponding to the coefficients $b_{k 1}, \dots, b_{k J_{k}}$ relevant to the $k$th B-spline basis, the term-specific ${edf}_{k}$ are obtained. Of course, the trace quantifies the overall model $edf$.

4 Simulation studies

We assess the performance of the proposed approach via some simulation experiments. We follow some settings employed in Koenker (2011), but we also consider additional scenarios. Data were generated according to $y_{i} = g (x_{i}) + σ (x_{i}) e_{i}$, where the covariate $x_{i} \sim U (0, 1)$ and the signal $g (x_{i})$ is defined according to four different functions: linear $0.2 + 0.4 x_{i}$, logarithm $\log (x_{i})$, sinusoidal $\sin (2 π x_{i})$ and ‘square root sinusoidal’ indicated by $g_{0} (x_{i}) = \sqrt{x_{i} (1 - x_{i})} \sin ((2 π (1 + 2^{- 7 / 5})) / (x_{i} + 2^{- 7 / 5}))$. The scale function is either constant, $σ (x_{i}) = 0.2$, or depending on the covariate itself, that is, $σ (x_{i}) = 0.2 (1 + x_{i})$. The errors $e_{i}$ are iid from four different distributions, Gaussian, $χ_{3}^{2}$, $t_{3}$ and $t_{1}$, shifted such that the $τ$th quantile equals zero. The sample size is $n = 400$ and five percentiles $τ \in \{0.10, 0.25, 0.50, 0.75, 0.90\}$ are considered. To provide an idea of the signal to noise ratio in the considered scenarios, Figure 1 portrays a set of simulated data (at $τ = 0.5$) according to the different signal functions and error distributions.

Figure 1:

Some simulated data according to the signal curves and error distributions with constant scale function. The continuous line represents the true median signal
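The data-generating process described above can be sketched in Python as follows. The sketch covers only the Gaussian and $t_{1}$ (Cauchy) errors, whose quantile functions are available in closed form or in the standard library; the $χ_{3}^{2}$ and $t_{3}$ cases would need a numerical quantile routine and are omitted. All function names and the seed are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

C = 2 ** (-7 / 5)
SIGNALS = {
    "linear": lambda x: 0.2 + 0.4 * x,
    "log": lambda x: math.log(x),
    "sin": lambda x: math.sin(2 * math.pi * x),
    "g0": lambda x: math.sqrt(x * (1 - x)) * math.sin(2 * math.pi * (1 + C) / (x + C)),
}


def error_quantile(dist, tau):
    """tau-th quantile of the (unshifted) error distribution."""
    if dist == "gaussian":
        return NormalDist().inv_cdf(tau)
    if dist == "t1":
        return math.tan(math.pi * (tau - 0.5))
    raise ValueError("only 'gaussian' and 't1' are covered in this sketch")


def draw_error(dist, rng):
    """One draw from the (unshifted) error distribution."""
    if dist == "gaussian":
        return rng.gauss(0.0, 1.0)
    return math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF draw from t_1


def simulate(signal, dist, tau, n=400, hetero=False, seed=0):
    """Return n pairs (x, y) whose true tau-th conditional quantile is g(x)."""
    rng = random.Random(seed)
    g, shift = SIGNALS[signal], error_quantile(dist, tau)
    data = []
    for _ in range(n):
        x = 1.0 - rng.random()                       # uniform on (0, 1], avoids log(0)
        sigma = 0.2 * (1 + x) if hetero else 0.2
        e = draw_error(dist, rng) - shift            # tau-th quantile of e is zero
        data.append((x, g(x) + sigma * e))
    return data


if __name__ == "__main__":
    sample = simulate("sin", "gaussian", tau=0.9)
    below = sum(y <= SIGNALS["sin"](x) for x, y in sample) / len(sample)
    print(len(sample), below)  # the share below the true curve should be near tau
```

Shifting the errors, rather than the signal, keeps $g (x)$ as the target curve at every $τ$, so the same true function can be used when computing the error measures.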

In the manuscript, we contrast three competitors: (a) the P-spline smoother (using $\min \{40, n / 4\}$ cubic B-spline functions with difference penalty of order $d = 3$) with $λ$ selected by the proposed Harville–Fellner–Schall algorithm using the extra penalty factor $γ = 2$ in Step 3 (‘pspline+hfs’); (b) the same P-spline smoother with $λ$ selected by the ${SIC}_{λ}$ reported in (3.3) (‘pspline+sic’); (c) the quantile smoothing splines with a total variation penalty and $λ$ again selected by minimization of the ${SIC}_{λ}$ (‘sspline+sic’). Differences of the proposed ‘pspline+hfs’ with respect to other possible approaches, such as CV and the so-called $L$-curve criterion, are reported in the Supplementary Material. Both CV and the $L$-curve have been employed in the literature to select just a single, and not multiple, smoothing parameter in QR. Extending, say, tenfold CV to multiple smoothing parameters is prohibitive in practice due to its computational load, and the $L$-curve for bivariate smoothing has been proposed in linear regression only (Frasso and Eilers, 2013); further research is needed to assess its applicability in additive QR (2.1). Therefore, neither CV nor the $L$-curve can be considered an effective competitor in additive QR, and their discussion and comparative assessment are reported in the Supplementary Material, which also includes some comparisons with the boosting approach.

Figure 2:

Constant scale function $σ (x) = 0.2$ . Contrasting three competitors for smoothing parameter selection in terms of MISE (on log scale) by different error distributions and signals: ‘sspline+sic’ (light grey box), ‘pspline+sic’ (medium grey box) and ‘pspline+hfs’ (dark grey box)

Figure 3:

Non-constant scale function $σ (x) = 0.2 (1 + x)$ . Contrasting three competitors for smoothing parameter selection in terms of MISE (on log scale) by different error distributions and signals: ‘sspline+sic’ (light grey box), ‘pspline+sic’ (medium grey box) and ‘pspline+hfs’ (dark grey box)

The three aforementioned main approaches are contrasted in terms of the Mean Integrated Square Error (MISE), defined as the mean of squared differences between the true quantile curve and the corresponding fitted curve across 500 replicates, and also the Mean Integrated Absolute Error (MIAE), similar to the MISE but involving the absolute differences rather than the squares. Here we report results for the MISE values, while the MIAE results are reported in the Supplementary Material.
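A minimal sketch of the two error measures follows. It assumes that the true and fitted curves are evaluated on a common grid of covariate values and that the integral is approximated by the average over that grid; these choices, like the function names, are assumptions of the example rather than details stated in the text.

```python
def integrated_errors(true_curve, fitted_curve):
    """Average squared and absolute differences for a single replicate."""
    m = len(true_curve)
    squared = sum((t - f) ** 2 for t, f in zip(true_curve, fitted_curve)) / m
    absolute = sum(abs(t - f) for t, f in zip(true_curve, fitted_curve)) / m
    return squared, absolute


def mise_miae(true_curve, fitted_curves):
    """Average the per-replicate errors across all replicates."""
    errors = [integrated_errors(true_curve, f) for f in fitted_curves]
    r = len(errors)
    return sum(e[0] for e in errors) / r, sum(e[1] for e in errors) / r


if __name__ == "__main__":
    truth = [0.0, 0.0, 0.0, 0.0]
    fits = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]  # two toy replicates
    print(mise_miae(truth, fits))  # (1.5, 1.0)
```

The toy replicates have the same absolute error but different squared errors, which illustrates why the two measures can rank methods differently under heavy-tailed errors.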

Figures 2 and 3 report the comparisons in terms of MISE: for all the error distributions considered and for every signal (linear, logarithmic, sinusoidal and sqrt+sinusoidal ‘$g_{0} (\cdot)$’), each panel reports the boxplots of the log MISE values across the replicates for the five $τ$ values. The two figures refer to the homoscedastic and heteroscedastic scenarios, respectively.

Overall, no important difference emerges across the scenarios. Occasionally, some criterion appears to perform slightly better or worse than the others: for instance, ‘sspline+sic’ returns higher MISE values at higher quantiles with the $\log (x)$ signal and Gaussian errors; the proposed ‘pspline+hfs’ exhibits slightly lower mean square errors with the linear signal at middle percentiles (0.25, 0.50, 0.75), but at the same middle percentiles it shows higher values when the signal is $\log (x)$ or $g_{0} (x)$ and the errors follow the Cauchy distribution. However, no systematic pattern emerges, and no difference between the homoscedastic and heteroscedastic cases is noteworthy. Hence, while performances in terms of statistical efficiency are substantially the same across the scenarios, the gain in computational efficiency is not negligible; more details are postponed to Section 4.1.

The same simulation scenarios were run with an additional linear covariate to assess possible impacts on the sampling distribution of the linear coefficient estimators. No difference was observed among the MISE values and the sampling distributions, and results are not shown for brevity.

We now consider the more interesting scenario of additive models, namely multiple smooth relationships in the QR equation. To illustrate, we consider the true quantile function $1 + 2 \cos (x_{1 i}) + \sin (2 π x_{2 i})$ with independent uniform covariates $x_{1 i} \sim U (- 4, 4)$ and $x_{2 i} \sim U (0, 1)$, sample size $n = 400$, the four aforementioned error distributions and five quantile curves. Figure 4 shows the log MISE for the aforementioned competitors ‘sspline+sic’, ‘pspline+sic’ and the proposed ‘pspline+hfs’.

Figure 4:

Contrasting the three competitors (in terms of log(MISE)) for smoothing parameter selection with 2 additive terms by different error distributions and quantile curves: “sspline+sic” (light grey box), “pspline+sic” (medium grey box) and “pspline+hfs” (dark grey box). Right panels: constant scale function; left panels: non-constant scale function. The true signal is $1 + 2 cos (x_{1 i}) + sin (2 π x_{2 i})$

Unlike the single covariate case, here some differences are noteworthy. For the homoscedastic scenario, the traditional ‘sspline+sic’ performs poorly most of the time, noticeably worse than the others with Gaussian and Cauchy errors, especially at extreme quantiles ( $τ = 0.1$ and $0.90$ ). On the other hand, ‘pspline+sic’ and ‘pspline+hfs’ behave about the same, with ‘pspline+hfs’ exhibiting slightly but consistently lower MISE values at middle quantiles ( $τ = 0.25, 0.50, 0.75$ ), with somewhat remarkable differences in the $t_{1}$ -distribution case. With heteroscedastic errors, the patterns are even more pronounced, especially with asymmetric errors, where the MISE values from ‘sspline+sic’ are far higher than the others.

Differences in terms of MISE exhibit approximately the same patterns and are reported in the supplementary material which also shows comparisons relevant to smoothing parameter selection in varying coefficient models.

4.1 Computational issues

As previously discussed, computational efficiency is a major feature of our proposal. The benefit with respect to crude grid-search approaches could be somewhat expected, especially when multiple smoothing parameters have to be estimated: in fact, grid-search approaches require multidimensional evaluation grids, leading to quite taxing procedures since a large number of fits have to be obtained to pick the lambda values optimizing the selected criterion, such as the SIC or the L-curve. CV is even heavier since, for each candidate lambda value in the grid, the model has to be fitted and tested several times on training/testing sub-datasets.

Table 1:
Computational times to fit additive quantile regression with 1 or 2 smooth terms for $n = 1 000$ observations. Entries refer to averages (on 10 fits) of the elapsed components obtained via the system.time() function in R. Times also include building the B-spline basis and the penalty matrices. Computations run on R 3.6.1 on Windows, Intel i7-8700 CPU 3.20 GHz, RAM 16 GB.

N. smooth terms	Criterion	Method	Execution time (sec.)
1	sspline+sic	grid-search ( $15$ )	0.12
1	sspline+sic	grid-search ( $30$ )	0.21
1	pspline+sic	grid-search ( $15$ )	0.12
1	pspline+sic	grid-search ( $30$ )	0.22
1	pspline+hfs	iterative	0.07
2	sspline+sic	grid-search ( $15 \times 15$ )	57.3
2	sspline+sic	grid-search ( $30 \times 30$ )	228.7
2	pspline+sic	grid-search ( $15 \times 15$ )	8.61
2	pspline+sic	grid-search ( $30 \times 30$ )	33.1
2	pspline+hfs	iterative	0.33

To gain a rough assessment of the computational load, Table 1 reports the execution time of the aforementioned approaches when fitting additive regression quantiles with 1 or 2 smooth terms at $τ = 0.5$ for $n = 1 000$ observations. For the ‘grid-search’ rows, the numbers in parentheses refer to the number of lambda values to be evaluated to seek the optimum: we consider 15 or 30 values, with the former leading, as would be expected, to lower execution times but higher mean square errors.

While for single smooth terms all times are within reasonable ranges, differences become notable with 2 smooth terms: grid searches show quite large times in general, but sspline+sic exhibits the largest running times, probably due to the higher number of parameters to be estimated. The proposed pspline+hfs presents times far lower than the competitors, since it is based on an iterative procedure and typically fewer than 20 iterations are required to reach convergence. It is worth stressing that such large differences are expected to grow as the number of smoothing parameters increases, making our proposal quite attractive with multiple additive terms.

For the sake of completeness, we mention that the SIC could be minimized via numerical procedures rather than grid searches: for instance, by means of the Nelder–Mead algorithm, which does not require gradient evaluation, or quasi-Newton methods, which compute the gradient numerically. However, we have experienced a somewhat strong influence of the supplied starting values on the final results, making the numerical procedures substantially unreliable. One could try to run the algorithm using different starting values, or rely on different algorithms such as simulated annealing, but at the cost of increasing the computational burden.
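The arithmetic behind these remarks can be made explicit with a short Python sketch. The timings are copied from the rows of Table 1 with 2 smooth terms; the helper names `n_fits` and `speedup` are introduced only for this illustration.

```python
# Execution times (seconds) copied from Table 1, two smooth terms.
times = {
    ("sspline+sic", 15): 57.3,
    ("sspline+sic", 30): 228.7,
    ("pspline+sic", 15): 8.61,
    ("pspline+sic", 30): 33.1,
}
hfs_time = 0.33  # pspline+hfs, iterative


def n_fits(values_per_lambda, n_terms):
    """Number of model fits in a full grid search over n_terms lambdas."""
    return values_per_lambda ** n_terms


def speedup(criterion, values_per_lambda):
    """How many times slower the grid search is than the iterative fit."""
    return times[(criterion, values_per_lambda)] / hfs_time


for (criterion, k), t in times.items():
    print(f"{criterion:12s} {k}x{k}: {n_fits(k, 2):4d} fits, "
          f"{speedup(criterion, k):6.1f} times slower than pspline+hfs")
```

Moving from the $15 \times 15$ to the $30 \times 30$ grid quadruples the number of fits (from 225 to 900), and the reported times grow by a factor close to four for both criteria. With three smooth terms, a grid of 15 values per lambda would already require 3 375 fits.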

5 Application: Modelling standing long jump in children

We apply the proposed algorithm to analyse data concerning the sport performance in children. Physical fitness is a powerful marker of the health condition in childhood and adolescence; for instance, fitness is negatively associated with cardiovascular risk factors for chronic disease, high blood pressure and total fatness (Kraemer and Häkkinen, 2008).

As an alternative to laboratory methods, physical fitness can be measured via on-site fitness tests, which are easy to administer, especially when the study population involves schoolchildren. Among the different sport performance outcomes, the standing long jump (SLJ) test represents a widespread yet practical, time-efficient and cheap method for assessing muscular fitness in children and adolescents. The child stands at a line marked on the ground with the feet slightly apart and jumps: the SLJ performance is the horizontal distance jumped. SLJ, sometimes also known as standing broad jump, is commonly used to assess explosive leg power, but it is also understood to be a proxy for muscular strength tests of the lower body. SLJ depends mainly on leg length, but it is also influenced by neuromuscular maturation, as SLJ requires more coordination of movements and technique, for instance the so-called take-off angle (Saint-Maurice et al., 2015).

Data analysed here refer to a wide survey carried out in the years 2011–2013 to assess sport abilities and fitness in schoolchildren in Sicily. Data have been kindly provided by the Department of ‘Scienze Psicologiche, Pedagogiche e della Formazione’, University of Palermo. Measurements were taken by previously trained experts and cover performance on several physical tests along with some anthropometric measurements collected in the major Sicilian cities in the three-year period. Here we present results relevant to $n = 488$ schoolchildren in Palermo collected in 2011; further results relevant to the whole of Sicily will be presented elsewhere.

Besides the response SLJ, the main covariate is the child's weight. The substantive research question is how weight could affect the SLJ performance, namely, in statistical terms, the SLJ-weight relationship, which is expected to be nonlinear. However, weight is strongly correlated with age and height, which are understood to affect the response as well, possibly in a nonlinear manner. Thus, an additive QR flexibly accounting for the three covariates simultaneously appears to be the most appropriate model, namely

$Q_{slj} (τ) = β_{0} + β_{1} gender + s_{1 τ} (age) + s_{2 τ} (weight) + s_{3 τ} (height)$ ,

at fixed $τ$ . Multiple quantile curves could be useful to obtain the so-called growth charts, sometimes referred to as ‘normative reference values’ in sport medicine (Saint-Maurice et al., 2015; Sandercock et al., 2016). However, focusing only on upper quantiles would be useful for talent selection in school age. Thus, we set $τ = 0.95$ in the above regression equation.

We fit the additive QR model using the aforementioned algorithm; the smooth functions are expressed as identifiable cubic B-spline bases and the number of basis functions was set using the empirical rule of $\min \{40, n / 4\}$ (Ruppert et al., 2003). A third-order difference penalty was used in the penalty term.
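To make the two choices above concrete, the following Python sketch computes the number of basis functions from the empirical rule and builds a third-order difference matrix acting on the spline coefficients. The function names, the integer rounding of $n / 4$ and the plain-list matrix representation are assumptions made for this illustration; it does not reproduce the authors' R code.

```python
def n_basis(n, cap=40):
    """Empirical rule min{40, n/4} for the number of basis functions."""
    return min(cap, n // 4)  # integer division is an assumption of this sketch


def diff_matrix(k, order=3):
    """Return the (k - order) x k matrix of order-th differences as lists."""
    rows = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(order):
        # Differencing consecutive rows raises the difference order by one.
        rows = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]
    return rows


def l1_penalty(coef, order=3):
    """Sum of absolute order-th differences of the spline coefficients."""
    d = diff_matrix(len(coef), order)
    return sum(abs(sum(w * c for w, c in zip(row, coef))) for row in d)
```

With $n = 488$ the rule gives `n_basis(488) == 40`. Each row of the third-order matrix carries the weights $(-1, 3, -3, 1)$ , so a coefficient sequence that follows a quadratic in its index incurs zero penalty; this is consistent with a heavily penalized smooth term approaching a polynomial of degree 2.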

Figure 5 portrays the fitted curves (centred due to identifiability constraints) along with partial residuals, that is, the fitted quantile values plus the residuals from the full model. To quantify uncertainty, pointwise confidence intervals have been computed by means of the covariance matrix of the estimates obtained via bootstrap case resampling.
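The case-resampling idea can be sketched in a few lines of Python. This is only a schematic stand-in: the `estimator` argument plays the role of refitting the additive QR model and evaluating one fitted value, whereas here a plain sample quantile is used; the number of resamples, the seed and all function names are assumptions of the example.

```python
import random
from statistics import NormalDist, stdev


def sample_quantile(values, tau):
    """Simple order-statistic estimate of the tau-th quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[index]


def bootstrap_ci(cases, estimator, n_boot=200, level=0.95, seed=1):
    """Normal-approximation interval using a bootstrap standard error.

    `cases` are resampled with replacement (case resampling) and `estimator`
    is re-evaluated on each resample; here it stands in for refitting the
    additive QR model and evaluating one fitted value.
    """
    rng = random.Random(seed)
    n = len(cases)
    replicates = []
    for _ in range(n_boot):
        resample = [cases[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    se = stdev(replicates)  # bootstrap standard error of the estimate
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    estimate = estimator(cases)
    return estimate - z * se, estimate, estimate + z * se
```

For a vector of responses `y`, the call `bootstrap_ci(y, lambda v: sample_quantile(v, 0.95))` returns the lower limit, the point estimate and the upper limit. In the additive model the same recipe would be applied to the whole coefficient vector, giving a covariance matrix rather than a single standard error.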

Figure 5:

The smooth effect of age, weight and height on the quantile curve $τ = 0.95$ of standing long jump. In each panel, shaded areas portray the 95% pointwise confidence intervals based on bootstrap resampling and dots represent the partial residuals

For the age term, the smoothing parameter estimate is quite large, and thus the relevant relationship with the response corresponds to a polynomial of degree 2; on the other hand, the estimated lambdas for the weight and height terms are moderate, resulting in flexible relationships with 3.01 and 6.98 degrees of freedom, respectively. Controlling for age and height, the weight effect on the SLJ is worth discussing: up to 35–40 kg, there is almost no influence on the performance, but afterwards there is an important negative effect until about 75 kg, when the relationship stabilizes. Finally, the height effect shows different phases, with steeper slopes at very low ( $< 110$ cm) and at middle (130–150 cm) heights.

We have discussed estimation at a single specified $τ$ . However, if the aim is to estimate multiple quantile curves at different $τ$ values, the proposed algorithm could be applied several times at different $τ$ values with the noncrossing constraints as discussed in Muggeo et al. (2013).

6 Conclusions

We have proposed an iterative algorithm for selecting multiple smoothing parameters in additive QR models with $L_{1}$ penalties. The idea exploits the link between smoothing and mixed modelling, a connection which is well established in mean regression but appears not to have been fully exploited in QR. The iterative nature of the proposed algorithm makes it very attractive in the presence of several lambdas, where multidimensional grid search or derivative-free numerical optimization demands extensive computations. While the algorithm relies on assuming proper Laplace distributions for the responses and the random effects with constant scale parameters, simulations have shown that our approach performs well even when the constant-scale assumption is violated and observations exhibit heteroscedasticity. This is quite remarkable from a practical perspective, since QR turns out to be very useful especially in heteroscedastic scenarios.

In addition to far better computational efficiency, simulation experiments have shown that most of the time our proposal exhibits good statistical performance as compared to the canonical approaches of smoothing splines or P-splines with the tuning parameter selected by the Schwarz Information Criterion. The proposed approach returned slightly higher mean squared errors only in a few scenarios at the lowest quantile ( $τ = 0.1$ ). Along with the algorithm itself, we have also discussed a strategy to quantify the equivalent degrees of freedom of the fitted model and also of each smooth term. In additive QR, quantification of the equivalent degrees of freedom for each smooth term does not appear to have been discussed previously.

We have focused on additive models, that is, multiple ‘univariate’ smooth terms, but generalization to bivariate smooths is a straightforward extension to which our proposal could apply. Modelling multiple quantile curves to produce growth charts for reference values can be carried out by applying the proposed algorithm at different probability values, namely with $τ$ -specific smoothing parameter values. As an alternative option, a single smoothing parameter common to all quantile curves could be used: selecting and using such a common smoothing parameter across the $τ$ values represents a noteworthy issue to be investigated.

Also, the presented algorithm could be employed in random effects QR for longitudinal data, where a few proposals have been discussed (Koenker, 2004; Geraci and Bottai, 2007; Lamarche, 2010): comparisons with Geraci and Bottai (2007), who use an explicit marginal Laplace log-likelihood and a Gibbs sampler, appear particularly noteworthy.

Yet another possible extension of the proposed algorithm concerns modelling quantiles from discrete distributions, for instance to study student performance at university via the number of credits gained after the first academic year (Grilli et al., 2016): here QR is applied several times to the randomly jittered response, and if the smoothing parameter has to be estimated on each perturbed dataset, the proposed approach appears quite useful due to its computational efficiency.
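The jittering step can be sketched as follows in Python. The sketch replaces the QR fit with a plain sample quantile and averages the estimates over the jittered copies; both the averaging and the use of uniform noise on $[0, 1)$ are assumptions of this illustration, as are the function names.

```python
import random


def sample_quantile(values, tau):
    """Simple order-statistic estimate of the tau-th quantile."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[index]


def jittered_quantile(counts, tau, n_jitter=50, seed=1):
    """Average the tau-th quantile over several jittered copies of `counts`.

    Each copy adds independent U[0, 1) noise to the counts so the response
    becomes continuous; averaging over copies is an assumption of this sketch.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_jitter):
        jittered = [c + rng.random() for c in counts]
        estimates.append(sample_quantile(jittered, tau))
    return sum(estimates) / n_jitter
```

Since one fit is needed for every jittered copy, any smoothing parameter selection carried out within each fit is repeated `n_jitter` times, which is where a fast iterative selection would pay off.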

R code to implement the methods presented in this article is currently available from the corresponding author and will be also shipped in due time with the R package quantregGrowth. Detailed comparisons among the R packages to fit additive regression quantiles represent a noteworthy point to be investigated in a separate paper.

Supplementary materials

Supplementary materials for this article, including comparisons with other approaches along with further simulation results, R code and data discussed in Section 5, are available from http://www.statmod.org/smij/archive.html

Acknowledgments

Data illustrated in Section 5 have been kindly provided by Professor M. Bellafiore, ‘Dipartimento di Scienze Psicologiche, Pedagogiche e della Formazione’, University of Palermo. We would like to thank the Editor, Professor Arnošt Komárek, the Associate Editor and the referees for their suggestions and careful reading of the manuscript, which led to a substantial improvement of the article.

Declaration of conflicting interests

Funding

The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This research has been supported by the Italian Ministerial grant PRIN 2017 ‘From high school to job placement: Micro-data life course analysis of university student mobility and its impact on the Italian North-South divide.’, n. 2017HBTK5P.

References

Andriyana, Gijbels and Verhasselt (2014) P-splines quantile regression estimation in varying coefficients models. Test, 23, 153–94.

Austin, Tu, Daly and Alter (2005) The use of quantile regression in health care research: A case study examining gender differences in the timeliness of thrombolytic therapy. Statistics in Medicine, 24, 791–816.

Bang and Jhun (2012) Simultaneous estimation and factor selection in quantile regression via adaptive sup-norm regularization. Computational Statistics and Data Analysis, 56, 813–26.

Bollaerts, Eilers and Aerts (2006) Quantile regression with monotonicity restrictions using P-splines and the L1-norm. Statistical Modelling, 6, 189–207.

Bosch, Ye and Woodworth (1995) A convenient algorithm for quantile regression with smoothing splines. Computational Statistics and Data Analysis, 19, 613–30.

Cole and Green (1992) Smoothing reference centile curves: The LMS method and penalized likelihood. Statistics in Medicine, 11, 1305–19.

Currie and Durban (2002) Flexible smoothing with P-splines: A unified approach. Statistical Modelling, 4, 333–49.

Eilers and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.

Eilers and Marx (2010) Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 637–53.

Fasiolo, Goude, Nedellec and Wood (2018) Fast calibrated additive quantile regression. URL https://arxiv.org/abs/1707.03307.

Fellner (1986) Robust estimation of variance components. Technometrics, 28, 51–60.

Fenske, Fahrmeir, Hothorn, Rzehak and Höhle (2013) Boosting structured additive quantile regression for longitudinal childhood obesity data. The International Journal of Biostatistics, 9, 1–18.

Frasso and Eilers (2013) L-surface and V-valley for multi-dimensional smoothing parameter selection. In Proceedings of the 28th International Workshop on Statistical Modelling, edited by Muggeo, Capursi, Boscaino and Lovison, pages 151–56. Statistical Modelling Society, Amsterdam.

Frasso and Eilers (2015) L- and V-curves for optimal smoothing. Statistical Modelling, 15, 91–111.

Geraci and Bottai (2007) Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics, 8, 140–54.

Grilli, Rampichini and Varriale (2016) Statistical modelling of gained university credits to evaluate the role of pre-enrolment assessment tests: An approach based on quantile regression for counts. Statistical Modelling, 16, 47–66.

Hansen (1992) Analysis of discrete ill-posed problems by means of the L-curve. SIAM Review, 34, 561–80.

Harville (1977) Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72, 320–38.

He and Ng (1999) COBS: Qualitatively constrained smoothing via linear programming. Computational Statistics, 14, 315–38.

He and Shi (1994) Convergence rate of B-spline estimators of nonparametric conditional quantile functions. Journal of Nonparametric Statistics, 3, 299–308.

Hothorn, Buehlmann, Kneib, Schmid and Hofner (2018) mboost: Model-based boosting. R package version 2.9-1. URL https://CRAN.R-project.org/package=mboost (last accessed 28 May 2020).

Koenker (2004) Quantile regression for longitudinal data. Journal of Multivariate Analysis, 91, 74–89.

Koenker (2005) Quantile Regression. Cambridge: Cambridge University Press.

Koenker (2011) Additive models for quantile regression: Model selection and confidence bandaids. Brazilian Journal of Probability and Statistics, 25, 239–62.

Koenker (2016) quantreg: Quantile regression. R package version 5.29. URL https://CRAN.R-project.org/package=quantreg.

Koenker and Bassett (1978) Regression quantiles. Econometrica, 46, 33–50.

Koenker, Ng and Portnoy (1994) Quantile smoothing splines. Biometrika, 81, 673–80.

Kraemer and Häkkinen (2008) Handbook of Sports Medicine and Science, Strength Training for Sport. Hoboken, NJ: Wiley.

Lamarche (2010) Robust penalized quantile regression estimation for panel data. Journal of Econometrics, 157, 396–408.

Li, Graubard and Korn (2010) Application of nonparametric quantile regression to body mass index percentile curves from survey data. Statistics in Medicine, 29, 558–72.

Li and Zhu (2008) The L1 norm quantile regression. Journal of Computational and Graphical Statistics, 17, 163–85.

Liu, Yu, Xu and Tang (2019) Improved local quantile regression. Statistical Modelling, 19, 501–23.

Mayr and Hofner (2018) Boosting for statistical modelling: A non-technical introduction. Statistical Modelling, 18, 365–84.

McCulloch (1997) Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162–70.

Muggeo, Sciandra and Augugliaro (2012) Quantile regression via iterative least squares computations. Journal of Statistical Computation and Simulation, 82, 1557–69.

Muggeo, Sciandra, Tomasello and Calvo (2013) Estimating growth charts via nonparametric quantile regression: A practical framework with application in ecology. Environmental and Ecological Statistics, 20, 519–31.

Ng and Maechler (2007) A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7, 315–28.

Ng and Maechler (2016) COBS: Constrained B-splines (Sparse matrix based). R package version 1.3-1. URL https://CRAN.R-project.org/package=cobs (last accessed on 28 May 2020).

Ruppert, Wand and Carroll (2003) Semiparametric regression. Cambridge: Cambridge University Press.

Saint-Maurice, Laurson, Kaj and Csanyi (2015) Establishing normative reference values for standing broad jump among Hungarian youth. Research Quarterly for Exercise and Sport, 86, S37–44.

Sandercock, Voss, Cohen, Taylor and Stasinopoulos (2016) Centile curves and normative values for the twenty metre shuttle-run test in English schoolchildren. Journal of Sports Sciences, 30, 679–87.

Schall (1991) Estimation in generalized linear models with random effects. Biometrika, 78, 719–27.

Schnabel and Eilers (2009) Optimal expectile smoothing. Computational Statistics & Data Analysis, 53, 4168–77.

Schnabel and Eilers (2013) Simultaneous estimation of quantile curves using quantile sheets. AStA Advances in Statistical Analysis, 97, 77–87.

Thompson, Cai, Moyeed, Reeve and Stander (2010) Bayesian nonparametric quantile regression using splines. Computational Statistics and Data Analysis, 54, 1138–50.

Torretta, Muggeo and Eilers (2015) P-spline quantile regression with a mixed model algorithm. In Proceedings of the 30th International Workshop on Statistical Modelling, edited by Friedl and Wagner, pages 372–76. Statistical Modelling Society, Amsterdam.

Waldmann (2018) Quantile regression: A short story on how and why. Statistical Modelling, 18, 203–18.

Wand (2003) Smoothing and mixed models. Computational Statistics, 18, 223–49.

Wei, Pere, Koenker and He (2006) Quantile regression methods for reference growth charts. Statistics in Medicine, 25, 1369–82.

Wood (2006) Generalized Additive Models: An Introduction with R. London: Chapman & Hall.

Xiang (1996) A kernel estimator of a conditional quantile. Journal of Multivariate Analysis, 59, 206–16.

Yu and Jones (1998) Local linear quantile regression. Journal of the American Statistical Association, 93, 228–37.

Yu and Zhang (2005) A three-parameter asymmetric Laplace distribution and its extension. Communications in Statistics: Theory and Methods, 34, 1867–79.

Yu and Moyeed (2001) Bayesian quantile regression. Statistics & Probability Letters, 54, 437–47.

Yuan (2006) GACV for quantile smoothing splines. Computational Statistics & Data Analysis, 50, 813–29.
