Joint modelling of non-crossing additive quantile regression via constrained B-spline varying coefficients

Abstract

We present a unified framework able to fit the entire quantile process, namely to estimate simultaneously multiple non-crossing quantile curves. The framework relies on assuming each regression parameter varies smoothly across the percentile direction according to B-splines whose coefficients obey proper restrictions. Multiple linear and penalized smooth terms are allowed and the corresponding tuning parameters are estimated efficiently as part of the model fitting. Monotonicity and concavity constraints on the smoothed relationships are also easily accounted for in the framework. Simulation results provide evidence our proposal exhibits good statistical performance with respect to competitors while guaranteeing the non-crossing property and modest computational load. Analyses on a real dataset related to vocabulary size growth are presented to illustrate the model capability in practice.

Keywords

growth charts, non-crossing quantile, monotone B-spline, P-splines

1 Introduction

Before going into the article, as the first author, I would like to write a few words in memory of Prof. Brian Marx. I owe a lot to Brian, since I started attending the International Workshop on Statistical Modelling (IWSM) in 1999. Brian, just as he used to do with all new participants at the meeting, was always hospitable and friendly at all subsequent workshops, contributing greatly to my continued participation in the conference. Brian was also my biggest supporter when I became chair of the Statistical Modelling Society in 2017. A preliminary version of this article was presented at IWSM 2019 in Guimaraes, Brian’s last meeting: some of the improvements were inspired by discussions with him during coffee breaks. Brian, you have left an unbridgeable void in the IWSM family, but your work in promoting statistical modelling and welcoming and engaging young students will continue. Thank you Brian for all your work. We miss you a lot!

Quantile regression (qr) (Koenker and Bassett, 1978; Koenker, 2005; Kneib, 2013) allows one to estimate the effect of one or more covariates on a specified quantile of the conditional response distribution. Estimation of a single quantile curve does not pose any particular complication, but sometimes the focus is on estimating and managing multiple quantile curves: typical examples include the so-called ‘growth charts’ or ‘reference values’ where, for the specific problem at hand, the units have to be classified as ‘normal’, ‘over-’ or ‘under-’ by taking the fitted quantiles as reference values. The most popular growth charts are those for the growth of height, weight and other anthropometric measures in children (Borghi et al., 2006). Some less known successful applications of qr concern Posidonia oceanica seagrass (Tomasello et al., 2016), forced expiratory volume (Kim et al., 2018), sport performance (Sandercock et al., 2016), and vocabulary development (Frank et al., 2017). However, a well-known problem when dealing jointly with multiple quantiles is that of so-called crossing curves. Namely, any two estimated quantile curves, even quite close ones, should not cross each other along the covariate range, in order to preserve the corresponding inequality in the population. Unfortunately, crossing quantile curves can occur frequently, especially in small or moderate samples with sparse data and a nonlinear relationship to be estimated. Such a problem can lead to unpleasant consequences when the fitted model is used for prediction or classification (e.g., Wei et al., 2006).

Several authors have proposed solutions to fix the crossing issue in quantile regression. He (1997) discusses a location-scale shift model that avoids crossing while maintaining sufficient flexibility; Shim et al. (2009) propose to estimate location and scale functions simultaneously using doubly penalised kernel machines to achieve non-crossing; Chernozhukov et al. (2009) use the Lorentz inequalities and their generalizations; Bondell et al. (2010) set proper inequality constraints at the observed covariate values; Muggeo et al. (2013) achieve non-crossing by fitting the quantile regressions sequentially, with inequality constraints to prevent crossing between two consecutive curves.

Schnabel and Eilers (2013) and Frumento and Bottai (2016) take a different approach by imposing a global structure for the quantile curves, rather than estimating them individually. In other words, they estimate the entire quantile process, and the curves at specific percentile levels can be ‘predicted’ afterwards. Schnabel and Eilers (2013) use the tensor product of B-splines to model the ‘surface’ defined over the (necessarily continuous) covariate by percentiles plane. Such a surface, or ‘sheet’ as the authors call it, is fitted via iterative weighted least squares with two penalties, one on the covariate direction and one on the percentile direction, which only reduces and does not eliminate the chance of crossing. Frumento and Bottai (2016) exploit a similar perspective: they assume each covariate effect with respect to the percentile follows a polynomial or any specified parametric function, and they fit the model by minimizing an integrated loss function via a Newton-like algorithm. Their proposal has been extended to variable selection problems (Sottile et al., 2020) and nonlinear relationships (Bottai and Cilluffo, 2020). However, as in the aforementioned quantile sheet, this approach discourages but does not eliminate crossing, and further post-fit adjustments are required to enforce a non-crossing fit (Sottile and Frumento, 2023).

Quite a different strategy to obtain non-crossing curves is represented by the lms method (Cole and Green, 1992), which is probably the oldest among the different approaches and the most widespread in some fields, especially in anthropometry. Here a sub-regression model is fitted for each parameter of a specified response distribution, traditionally the Box-Cox-Normal or t distributions, and non-crossing quantile curves are obtained straightforwardly via the quantile function of the assumed model (Stasinopoulos and Rigby, 2007; Stasinopoulos et al., 2017). The key point is the appropriate selection of the response distribution: several alternatives could be considered, notably the Box-Cox-power-exponential, which has four parameters and is the most flexible distribution, being able to account for any type of kurtosis. Several growth charts for the World Health Organization have been produced by means of the lms method within the gamlss framework (Borghi et al., 2006).

Of course, there are plenty of proposals relying on the Bayesian paradigm, such as Das and Ghosal (2018) and Yang and Tokdar (2017). While the Bayesian approach is able to provide satisfactory results, it requires specification of the priors and therefore it may not be favoured by practitioners. We do not discuss the Bayesian approaches in this article.

In this work we take an approach similar to Schnabel and Eilers (2013) and Frumento and Bottai (2016) by focussing on the entire quantile process. Namely, we aim to estimate jointly multiple non-crossing quantile curves depending on several linear and smooth terms via penalized splines with automatic and efficient ‘selection’ of the tuning parameters.

The rest of the article is organized as follows. In Section 2 we describe our proposal in detail, and in Section 3, we show results from simulation studies. Section 4 is devoted to the real data analysis and finally, Section 5 includes discussion and conclusions.

2 Methodology

Let $Q (τ | x)$ be the quantile at percentile $τ (0 < τ < 1)$ of the continuous response conditional distribution $Y | x$ . For ease of notation, we first assume there exists a single numeric covariate $x$ affecting the quantile via a flexible but otherwise unspecified function $h (\cdot)$ , possibly $τ$ specific, namely $Q (τ | x) = h (x, τ)$ . The function $h (\cdot)$ is assumed to be smooth and therefore we use $B$ -splines (Eilers and Marx, 1996, 2010, 2021) to approximate it. Namely, given $B_{1} (x_{i}), \dots, B_{p} (x_{i})$ basis functions we write

Q (τ | x_{i}) = h (x_{i}, τ) = \sum_{j}^{p} β_{j} (τ) B_{j} (x_{i}),

(2.1)

where $p$ is the basis dimension depending on its fixed and equispaced knots and polynomial degree, and the $β_{j}$ ’s are the spline coefficients to be estimated by minimization of the so-called check function $L (β (τ)) = \sum_{i} ρ_{τ} (y_{i} - B (x_{i})^{T} β (τ)) = w {(τ)}^{T} |y - B β (τ)|$ where $y = {(y_{1}, \dots, y_{n})}^{T}$ is the observed response, and $w (τ)$ is the usual weight vector whose $i$ th element is $τ$ if $y_{i} \geq B {(x_{i})}^{T} β (τ)$ and $1 - τ$ otherwise. $β (τ) \in ℝ^{p}$ is the regression coefficient vector at the probability $τ$ , $B {(x_{i})}^{T}$ is the $i$ th row of the B-spline basis $B \in M_{n \times p}$ ; hereafter $M_{\cdot \times \cdot}$ will indicate a matrix with dimension in the subscript. $\hat{β} (τ)$ is obtained efficiently by means of any linear programming algorithm (Koenker, 2005, chap. 6).
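The equivalence between the two forms of the check-function objective can be verified numerically. The following is a minimal sketch in plain Python; the function names and the toy numbers are illustrative assumptions, and the list `fitted` simply stands in for the values $B β (τ)$ .

```python
def check_loss(residual, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def weighted_abs_loss(y, fitted, tau):
    """The same objective written as w(tau)^T |y - fitted|."""
    total = 0.0
    for yi, fi in zip(y, fitted):
        # Weight is tau when the observation lies on or above the fit.
        weight = tau if yi >= fi else 1.0 - tau
        total += weight * abs(yi - fi)
    return total


# Toy data (hypothetical): four observations and their fitted quantiles.
y = [1.0, 2.5, 0.3, 4.0]
fitted = [1.5, 2.0, 0.3, 3.0]
tau = 0.25

loss_sum = sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))
loss_weighted = weighted_abs_loss(y, fitted, tau)
print(loss_sum, loss_weighted)
```

Both expressions return the same value, which is why the objective can be handled as a weighted $L_{1}$ problem.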

Since our focus is the entire quantile process, we aim to estimate simultaneously quantile curves at percentiles $τ_{1}, \dots, τ_{K}$ . Let $Q (τ_{k} | x) = B β (τ_{k})$ be the qr model (2.1) written in matrix notation for the $n$ observations and percentile $τ_{k}$ . For all $K$ equations, the right-hand sides can be written in compact form as $(I_{K} \otimes B) β$ , where $I_{K}$ is the $K$ -dimensional identity matrix and the whole ${(β {(τ_{1})}^{T}, \dots, β {(τ_{K})}^{T})}^{T} = β \in ℝ^{K p}$ collects the coefficients by probability level. Hence the objective to be minimized can be written $L (β) = w^{T} |(1_{K} \otimes y) - (I_{K} \otimes B) β|$ where the weight vector $w \in ℝ^{n K}$ includes all the $τ_{k}$ specific weights, $w = {(w {(τ_{1})}^{T}, \dots, w {(τ_{K})}^{T})}^{T}$ . Again, as in the single quantile case, linear programming algorithms can be exploited, but optimization of $L (β)$ with no constraints leads to no advantage with respect to fitting the qr curves separately. To exploit joint estimation, we first assume each coefficient in the regression equation follows a smooth pattern across the percentiles $τ_{1}, \dots, τ_{K}$ . Namely for the generic $β (τ)$ we write $β (τ) = \sum_{r}^{q} θ_{r} C_{r} (τ; d e g_{τ})$ where the $C_{r} (τ; d e g_{τ})$ ’s are the basis functions evaluated in $τ$ of the B-spline of degree $d e g_{τ}$ with equally spaced knots in (0, 1), and the $θ_{r}$ ’s are $q$ coefficients. The values of the $j$ th coefficient at $τ_{1}, \dots, τ_{K}$ are written as $β_{j} (τ_{1}, \dots, τ_{K}) = {\overset{˘}{β}}_{j} = C θ_{j}$ , where $θ_{j} \in ℝ^{q}$ and $C \in M_{K \times q}$ .
Assuming each coefficient $j$ has its own $τ$ -varying pattern, we express the $K p$ beta coefficients in terms of the $p q$ theta parameters via $\overset{˘}{β} = (I_{p} \otimes C) θ$ , where $I_{p}$ is the $p$ -dimensional identity matrix, ${(θ_{1}^{T}, \dots, θ_{p}^{T})}^{T} = θ \in ℝ^{p q}$ , and ${({\overset{˘}{β}}_{1}^{T}, \dots, {\overset{˘}{β}}_{p}^{T})}^{T} = \overset{˘}{β} \in ℝ^{K p}$ . $\overset{˘}{β}$ has the same entries as $β$ but in different positions: $β$ collects the coefficients by probability values, while $\overset{˘}{β}$ collects them by the column index $j$ of $B$ . Thus, there exists a suitable permutation matrix $P \in M_{K p \times K p}$ such that $β = P \overset{˘}{β}$ , and hence we can write

β = P (I_{p} \otimes C) θ .

(2.2)

Section A.1 of Supplementary Material includes a simple example about $P$ . The resulting objective depending on $θ$ via parametrization (2.2) is

L (θ) = w^{T} |(1_{K} \otimes y) - (I_{K} \otimes B) P (I_{p} \otimes C) θ| .

(2.3)

Note we now estimate $p q$ parameters, regardless of the number $K$ of selected probability values. Clearly $K$ increases the computational burden as the working sample is actually $n K$ . However its influence on $\hat{θ}$ , in terms of statistical performance, appears to be negligible, provided at least ten values are taken: simulations in Supplementary Material bear this out and we anticipate $K = 11$ will be used in practice.
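The role of the permutation matrix $P$ in (2.2) can be illustrated with a small sketch. It is written in plain Python under the assumption that $\overset{˘}{β}$ is ordered by basis column and then by percentile, while $β$ is ordered by percentile and then by basis column; the helper names and the numeric codes used for the entries are purely illustrative.

```python
def permutation_matrix(K, p):
    """Return P such that beta = P @ beta_breve.

    beta_breve is ordered by basis column j, then percentile k;
    beta is ordered by percentile k, then basis column j.
    """
    size = K * p
    P = [[0] * size for _ in range(size)]
    for k in range(K):
        for j in range(p):
            P[k * p + j][j * K + k] = 1
    return P


def matvec(A, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


K, p = 3, 2
# Entry 10 * j + k encodes beta_j(tau_k), with j = 1, 2 and k = 1, 2, 3.
beta_breve = [11, 12, 13, 21, 22, 23]
P = permutation_matrix(K, p)
beta = matvec(P, beta_breve)
print(beta)
```

Each row and column of `P` contains exactly one 1, so the product only reorders the entries and groups them by percentile.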

2.1 Non-crossing constraints

Non-crossing is obtained when the inequality $Q (τ_{k} | x_{i}) \geq Q (τ_{k - 1} | x_{i})$ holds for each covariate pattern $i = 1, \dots, n$ and for any pair of contiguous quantile curves such that $τ_{k} \geq τ_{k - 1}$ . Equivalently, but somewhat more formally, we can write the non-crossing constraints via the first derivatives

\frac{\partial Q (τ; x_{i})}{\partial τ} \geq 0 \quad \forall x_{i}, \; i = 1, \dots, n .

(2.4)

Expressing the constraints via the first derivatives allows us to manage them more clearly, as B-splines of degree $d e g$ have exact derivatives, continuous up to order $d e g - 1$ (Eilers and Marx, 2021, page 20). Given $β (τ) = \sum_{r} θ_{r} C_{r} (τ; d e g_{τ})$ , up to a positive constant depending on the knot distance, we have $\partial β (τ) / \partial τ \propto \sum_{r} {Δ^{(1)} θ}_{r} C_{r} (τ; d e g_{τ} - 1)$ where the ${Δ^{(1)} θ}_{r}$ ’s are the first order differences of the $θ$ ’s, and the $C_{r} (\cdot; d e g_{τ} - 1)$ ’s refer to the B-basis of degree $d e g_{τ} - 1$ . Clearly, constraints (2.4) depend on $\partial β (τ) / \partial τ$ and, in turn, on the non-negativity of the ${Δ^{(1)} θ}_{r}$ ’s. Hence for the generic ${\overset{˘}{β}}_{j} = β_{j} (τ_{1}, \dots, τ_{K}) = C θ_{j}$ , the inequalities to be fulfilled are $Δ^{(1)} θ_{j} \geq 0$ where $Δ^{(1)} \in M_{(q - 1) \times q}$ is the first order difference matrix. Thus, joint estimation of the quantile process with non-crossing constraints is obtained via

min L (θ) subject to (I_{p} \otimes Δ^{(1)}) θ \geq 0,

(2.5)

where $L (θ)$ is given in (2.3). Rather than including a full B-spline basis, one could alternatively include the model intercept and set some constraints to make the basis identifiable. Following Wood (2006) we remove one basis function, the $p$ th say, and centre the remaining ones, that is, $(B_{j} (x_{i}) - {\bar{B}}_{j}), j = 1, \dots, p - 1$ , where the ${\bar{B}}_{j}$ ’s are the column means. The centred basis functions are no longer guaranteed to be positive, but appropriate inequality constraints to attain non-crossing are easy to set. The regression equation involving the model intercept and the centred basis functions reads as

Q (τ ∣ x_{i}) = β_{0} (τ) + \sum_{j}^{p - 1} β_{j} (τ) (B_{j} (x_{i}) - {\bar{B}}_{j}) = β_{0} (τ) - \sum_{j}^{p - 1} β_{j} (τ) {\bar{B}}_{j} + \sum_{j}^{p - 1} β_{j} (τ) B_{j} (x_{i}),

which makes clear that, in addition to the inequalities on the $p - 1$ basis coefficients, we also need to assume $\partial β_{0} (τ) / \partial τ - \sum_{j} \{\partial β_{j} (τ) / \partial τ\} {\bar{B}}_{j} \geq 0$ . As discussed above, derivatives of the regression coefficients lead to first order differences of the $θ$ ’s, thus when the smooth term is expressed via centred basis the constrained objective (2.5) is replaced by

min L (θ) subject to (\begin{bmatrix} 1 & - {\bar{B}}_{1} & \dots & - {\bar{B}}_{p - 1} \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix} \otimes Δ^{(1)}) θ \geq 0.

(2.6)

Notice that making the bases identifiable turns out to be crucial when several smooth terms are included in the quantile regression equation, see section 2.4.
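The link between non-negative first differences of the $θ$ ’s and a non-decreasing coefficient path $β (τ)$ can be checked numerically. The sketch below is an illustration in plain Python, not the authors' implementation: it evaluates a cubic B-spline basis on equally spaced knots through the Cox-de Boor recursion, and the knot layout, the number of segments and the values in `theta` are assumptions chosen for the example.

```python
def bspline_basis(x, knots, degree):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    # Degree-0 functions: indicators of the half-open knot intervals.
    values = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
              for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        values = [
            (x - knots[i]) / (knots[i + d] - knots[i]) * values[i]
            + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * values[i + 1]
            for i in range(len(knots) - d - 1)
        ]
    return values


degree, n_segments = 3, 4
step = 1.0 / n_segments
# Equally spaced knots extended beyond (0, 1), as is usual for P-splines.
knots = [(i - degree) * step for i in range(n_segments + 2 * degree + 1)]

# q = n_segments + degree = 7 coefficients with non-negative first differences.
theta = [0.0, 0.1, 0.1, 0.5, 0.9, 1.0, 1.6]

taus = [k / 20 for k in range(1, 20)]
beta_path = [
    sum(t * c for t, c in zip(theta, bspline_basis(tau, knots, degree)))
    for tau in taus
]
is_monotone = all(b1 >= b0 - 1e-12 for b0, b1 in zip(beta_path, beta_path[1:]))
print(is_monotone)
```

Because the derivative of the spline is a non-negative combination of lower-degree basis functions, the evaluated path never decreases, whatever grid of percentiles is used.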

2.2 Optimal smoothing: P-splines

Optimization of the constrained objective (2.5) or (2.6) prevents crossing curves when a B-spline basis is employed with no additional constraint on its coefficients. In the spirit of $P$ -splines, we can add a difference penalty on the adjacent coefficients related to the same curve in order to improve the estimator performance. For instance, at the probability level $τ_{k}$ the penalty would be $λ {‖D^{(d)} β (τ_{k})‖}_{1}$ where ${‖a‖}_{1}$ is the $L_{1}$ norm of $a$ , and $D^{(d)} \in M_{(p - d) \times p}$ is the matrix which forms differences of order $d$ , typically $d = 2$ or $3$ . However, as joint estimation is carried out, the overall penalty should account for the wiggliness of all $K$ quantile curves, namely

P_{λ} (β) = λ \sum_{k}^{K} {‖D^{(d_{k})} β (τ_{k})‖}_{1} = λ {‖(I_{K} \otimes D^{(d)}) β‖}_{1}

(2.7)

= λ {‖(I_{K} \otimes D^{(d)}) P (I_{p} \otimes C) θ‖}_{1} = P_{λ} (θ),

(2.8)

where the last compact form assumes that the same difference order $d_{k} = d$ has been used for all the $K$ quantile curves. Therefore the objective to be optimized in (2.5) or (2.6) is $L_{λ} (θ) = L (θ) + P_{λ} (θ)$ which, at fixed $λ$ , is minimized via fitting the ‘unpenalized’ qr with the following augmented response and design matrix

\left[\begin{array}{c} 1_{K} \otimes y \\ \hline 0 \end{array}\right] \quad \text{and} \quad \left[\begin{array}{c} (I_{K} \otimes B) P (I_{p} \otimes C) \\ \hline (I_{K} \otimes λ D^{(d)}) P (I_{p} \otimes C) \end{array}\right],

(2.9)

where $0 \in ℝ^{K (p - d)}$ ; see Ng and Maechler (2007) for computational details.
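As an illustration of the penalty in (2.7), the following sketch builds the difference matrix $D^{(d)}$ and evaluates the overall $L_{1}$ roughness for a set of quantile-specific coefficient vectors. It is written in plain Python with hypothetical helper names and toy coefficients, and it does not reproduce the augmented-data fitting itself.

```python
def difference_matrix(p, d):
    """Matrix D^(d) of shape (p - d) x p forming d-th order differences."""
    D = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(d):
        # Differencing consecutive rows raises the difference order by one.
        D = [[nxt - cur for cur, nxt in zip(D[i], D[i + 1])]
             for i in range(len(D) - 1)]
    return D


def l1_roughness(beta_by_quantile, d, lam):
    """lam * sum_k ||D^(d) beta(tau_k)||_1 over all quantile curves."""
    p = len(beta_by_quantile[0])
    D = difference_matrix(p, d)
    return lam * sum(
        abs(sum(w * b for w, b in zip(row, beta)))
        for beta in beta_by_quantile
        for row in D
    )


# Toy coefficients (hypothetical): one linear and one quadratic sequence.
beta_by_quantile = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 4.0, 9.0]]
D2 = difference_matrix(4, 2)
penalty = l1_roughness(beta_by_quantile, d=2, lam=0.5)
print(D2, penalty)
```

With second-order differences the linear sequence contributes nothing to the penalty, which reflects the fact that a penalty of order $d$ leaves polynomial trends of degree $d - 1$ in the coefficients unpenalized.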

The traditional approach to tune the smoothing parameter $λ \geq 0$ is via the Schwarz Information Criterion, SIC (Koenker et al., 1994; Koenker, 2011), $SIC (λ) = \sum_{k} \log [\sum_{i} ρ_{k} (e_{i k}) / n] + e d f \log (n) / (2 n)$ , where the $e_{i k}$ ’s are the residuals from the fitted curve at percentile $τ_{k}$ and $e d f$ represents the model equivalent degrees of freedom discussed in the next subsection. Note that the SIC formula includes the sum of the log values of the $K$ check functions, $\sum_{k} \log \sum_{i} ρ_{k} (e_{i k})$ , rather than the log of the full objective $\log [\sum_{k} \sum_{i} ρ_{k} (e_{i k})]$ ; Bondell et al. (2010) recommend the former.
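A direct transcription of the SIC formula may help to fix ideas. The sketch below is in plain Python and assumes the residuals for each percentile and a value of $e d f$ are already available; the function names and the toy residuals are illustrative only.

```python
import math


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sic(residuals_by_quantile, taus, edf):
    """Sum over quantiles of the log mean check loss, plus the edf penalty."""
    n = len(residuals_by_quantile[0])
    fidelity = sum(
        math.log(sum(check_loss(e, tau) for e in residuals) / n)
        for residuals, tau in zip(residuals_by_quantile, taus)
    )
    return fidelity + edf * math.log(n) / (2 * n)


# Toy residuals (hypothetical) for K = 2 percentiles and n = 4 observations.
taus = [0.25, 0.75]
residuals_by_quantile = [[1.0, -1.0, 2.0, -2.0], [-1.0, 1.0, -2.0, 2.0]]
sic_value = sic(residuals_by_quantile, taus, edf=3.0)
print(sic_value)
```

In practice this quantity would be evaluated over a grid of $λ$ values, each requiring a new model fit, and the minimizer retained.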

As an alternative to SIC to select $λ$ , we generalize the approach recently proposed by Muggeo et al. (2021) for a single quantile curve. The algorithm exploits the link between penalized and random effect models by viewing the penalized coefficients as random effects from a symmetric Laplace distribution, and the response variables from an asymmetric Laplace distribution. This leads to expressing the smoothing parameter as the ratio of the two scale parameter estimates, namely

λ = \frac{\sum_{k}^{K} \sum_{i}^{n} ρ_{k} (e_{i k}) / (n - e d f)}{\sum_{k}^{K} \sum_{j}^{p - d} | Δ^{d} \hat{β} {(τ_{k})}_{j} | / e d f} = \frac{L (\hat{θ}) / (n - e d f)}{{‖(I_{K} \otimes D^{(d)}) P (I_{p} \otimes C) \hat{θ}‖}_{1} / e d f} .

(2.10)

The numerator is just the overall unpenalized fidelity $L (\hat{θ})$ , while the denominator is the whole roughness of the $\hat{β}$ ’s expressed in terms of the $\hat{θ}$ ’s divided by their corresponding effective degrees of freedom ( $e d f$ ), possibly minus 1 if the unpenalized intercept is included. The algorithm starts by fixing a small value for $λ$ , and then updates it via the aforementioned ratio, where the computation of $e d f$ is discussed in the next section. The algorithm is quite efficient and usually converges in tens of iterations. We refer to it as the Harville-Fellner-Schall, HFS algorithm (Muggeo et al., 2021; Eilers and Marx, 2021, page 44), and it can be thought of as the analogue of the Schall algorithm in mean regression. It is worth stressing that (2.10) has been derived under the aforementioned Laplace assumptions, but its use is much wider and is substantially unaffected by the true response distributions, as underlined by the simulation study later.
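A single update of the smoothing parameter according to (2.10) can be sketched as follows. This is a hedged illustration in plain Python: it assumes the residuals, the current coefficients at each percentile and the current $e d f$ are given, it ignores the optional ‘minus 1’ adjustment for the intercept, and all names and numbers are hypothetical.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def differences(values, order):
    """Differences of the given order of a sequence."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values


def hfs_update(residuals_by_quantile, taus, beta_by_quantile, d, edf):
    """One HFS step: ratio of the two scale estimates in (2.10)."""
    n = len(residuals_by_quantile[0])
    fidelity = sum(
        check_loss(e, tau)
        for residuals, tau in zip(residuals_by_quantile, taus)
        for e in residuals
    )
    roughness = sum(
        abs(delta) for beta in beta_by_quantile for delta in differences(beta, d)
    )
    return (fidelity / (n - edf)) / (roughness / edf)


taus = [0.25, 0.75]
residuals_by_quantile = [[1.0, -1.0, 2.0, -2.0], [-1.0, 1.0, -2.0, 2.0]]
beta_by_quantile = [[0.0, 1.0, 4.0, 9.0], [0.0, 2.0, 4.0, 7.0]]
new_lambda = hfs_update(residuals_by_quantile, taus, beta_by_quantile, d=2, edf=2.0)
print(new_lambda)
```

In the full algorithm this step would alternate with refitting the model at the updated $λ$ and recomputing $e d f$ until the value stabilizes.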

2.3 Model complexity: approximate degrees of freedom

Quantifying the model complexity is a crucial step in model selection. Both the SIC and the HFS step in (2.10) need some measure of $e d f$ . The proposed joint qr uses $p q$ parameters, therefore an appropriate measure of model complexity should return that value when no penalty is included, regardless of the number of quantiles $K$ involved in the estimation. $e d f$ in qr are traditionally computed as the number of zero residuals (Koenker, 2005; Bondell et al., 2010; Koenker, 2011), but that approach would not be meaningful in our framework, since the number of interpolated points would increase when $K$ gets larger, making $e d f$ not independent of the number of percentiles used for estimation.

To quantify the $e d f$ we use the trace of the approximate hat matrix (Muggeo et al., 2021), which is the standard approach in the usual generalized linear models. The hat matrix is not defined for qr, but by exploiting the simple identity $|a| = a^{2} / |\tilde{a}|$ , where $\tilde{a}$ is known and approximates $a$ , we succeed in building the hat matrix for qr. To illustrate the idea we consider a penalized median regression with simplified notation: design matrix $Z$ , coefficients $b$ , linear predictor $Z b$ , residuals $e (b)$ , and penalty based on the differences $D b$ . The objective is ${‖e (b)‖}_{1} + λ {‖D b‖}_{1}$ and can be written in a weighted least squares form $e {(b)}^{T} W_{1} e (b) + λ b^{T} D^{T} W_{2} D b$ where $W_{1} = diag (| e (b) + ϵ |^{- 1})$ and $W_{2} = diag (| D b + ϵ |^{- 1})$ , wherein $ϵ$ is a very small constant preventing zero values. Schnabel and Eilers (2013) exploit this identity for parameter estimation, but we use it just for computing the degrees of freedom: in fact such a least squares formulation straightforwardly yields the hat matrix ${(Z^{T} W_{1} Z + λ D^{T} W_{2} D)}^{- 1} Z^{T} W_{1} Z$ whose leading diagonal elements are used to compute the model $e d f$ . The weighted least squares form also allows us to account for inequality constraints such as the non-crossing or the shape constraints. Following Bollaerts et al. (2006) and Muggeo and Ferrara (2008) we consider in the objective an asymmetric penalty taking a huge value, $ω = 10^{5}$ say, only when the inequality constraints have been activated. For instance, if the constraint $D^{(1)} b \geq 0$ is set, the asymmetric penalty is $ω I (D^{(1)} b = 0)$ . Hence the hat matrix is ${(Z^{T} W_{1} Z + λ D^{T} W_{2} D + ω D^{(1) T} W_{3} D^{(1)})}^{- 1} Z^{T} W_{1} Z$ where $W_{3} = diag (I (D^{(1)} b = 0))$ .
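The trace-based measure of model complexity can be sketched with NumPy for the simplified penalized median regression described above. This is an illustration under stated assumptions rather than the authors' code: the weights use $|\cdot| + ϵ$ instead of $|\cdot + ϵ|$ for numerical safety, the weighted cross-product $Z^{T} W_{1} Z$ is used on the right so that the trace equals the number of coefficients when $λ = 0$ , the constraint term with $ω$ is omitted, and the data are simulated only to make the example run.

```python
import numpy as np


def approx_edf(Z, b, y, lam, d=2, eps=1e-6):
    """Trace of (Z'W1 Z + lam D'W2 D)^{-1} Z'W1 Z at the coefficients b."""
    resid = y - Z @ b
    # d-th order difference matrix of shape (p - d) x p.
    D = np.diff(np.eye(Z.shape[1]), n=d, axis=0)
    # Weights from |a| = a^2 / |a~|; eps guards against zero denominators.
    W1 = np.diag(1.0 / (np.abs(resid) + eps))
    W2 = np.diag(1.0 / (np.abs(D @ b) + eps))
    A = Z.T @ W1 @ Z
    hat = np.linalg.solve(A + lam * D.T @ W2 @ D, A)
    return float(np.trace(hat))


# Simulated toy data (hypothetical): n = 30 observations, p = 5 columns.
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 5))
b = rng.normal(size=5)
y = Z @ b + rng.normal(size=30)

edf_unpenalized = approx_edf(Z, b, y, lam=0.0)
edf_penalized = approx_edf(Z, b, y, lam=10.0)
print(edf_unpenalized, edf_penalized)
```

Without penalty the trace recovers the number of coefficients, and it shrinks as $λ$ grows, which is the behaviour required of the complexity measure discussed above.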

2.4 Extending the model: multiple linear and smooth terms

We write the qr equation with $H$ linear terms $Q (τ; z_{i}) = α_{0} (τ) + α_{1} (τ) z_{1 i} + \dots + α_{H} (τ) z_{H i}$ . The design matrix is $B_{0} \in M_{n \times (H + 1)}$ which replaces $B$ in (2.3). To set up the non-crossing constraints, without loss of generality we assume $\min_{i} (z_{h i}) = 0$ and $\max_{i} (z_{h i}) = m_{h}$ for each covariate $h = 1, \dots, H$ . Due to linearity, the inequalities can be set just at the extremes $0$ and $m_{h}$ of each covariate range, namely

\frac{\partial α_{0} (τ)}{\partial τ} \geq 0 \quad \text{and} \quad \{\frac{\partial α_{0} (τ)}{\partial τ} + m_{h} \frac{\partial α_{h} (τ)}{\partial τ}\} \geq 0 \quad \forall h = 1, \dots, H .

Again, such $H + 1$ inequalities are reflected on the $θ$ ’s, therefore the non-crossing constraints are

(\begin{bmatrix} 1 & 0 & \dots & 0 \\ 1 & m_{1} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \dots & m_{H} \end{bmatrix} \otimes Δ^{(1)}) θ \geq 0.

(2.11)

When, in addition to the $H$ linear terms, several smooth terms, $S$ say, are also included, the overall design matrix is $X = [B_{0} | B_{1} |\dots| B_{S}]$ where $B_{0} \in M_{n \times (H + 1)}$ includes the unpenalized linear terms, and the different B-spline bases are $B_{s} \in M_{n \times p_{s}}, s = 1, \dots, S$ . By ‘joining’ constraints in (2.6) and (2.11), and by indicating with ${\bar{B}}_{1}^{T}, \dots, {\bar{B}}_{S}^{T}$ the row vectors of the means of basis functions for each smooth term, the overall non-crossing constraints are

(\begin{bmatrix} 1 & 0 & \dots & 0 & - {\bar{B}}_{1}^{T} & \dots & - {\bar{B}}_{S}^{T} \\ 1 & m_{1} & \dots & 0 & - {\bar{B}}_{1}^{T} & \dots & - {\bar{B}}_{S}^{T} \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 1 & 0 & \dots & m_{H} & - {\bar{B}}_{1}^{T} & \dots & - {\bar{B}}_{S}^{T} \\ 0 & 0 & \dots & 0 & I_{p_{1}} & \dots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 0 & 0 & \dots & I_{p_{S}} \end{bmatrix} \otimes Δ^{(1)}) θ \geq 0 .

(2.12)

If penalties on the basis coefficients are included, model estimation at fixed $λ_{1}, \dots, λ_{S}$ is carried out by augmenting properly the response vector and the design matrix as in (2.9): $B$ is replaced by the overall design matrix $X$ defined above, and in the bottom block $λ D^{(d)}$ is replaced by $D_{λ} = blockdiag (0, λ_{1} D_{1}^{(d_{1})}, \dots, λ_{S} D_{S}^{(d_{S})})$ . Clearly the $λ_{1}, \dots, λ_{S}$ are estimated by means of the HFS step (2.10) involving the appropriate roughness measures and degrees of freedom for each smooth term $s = 1, 2, \dots, S$ , that is,

λ_{s} = \frac{\sum_{k}^{K} \sum_{i}^{n} ρ_{k} (e_{i k}) / (n - H - 1 - \sum_{s} e d f_{s})}{\sum_{k}^{K} \sum_{j}^{p_{s} - d_{s}} | Δ^{d_{s}} \hat{β} {(τ_{k})}_{j s} | / e d f_{s}},

(2.13)

where $d_{s}$ is the difference order employed to penalize the coefficients of the $s$ th smooth having $e d f_{s}$ degrees of freedom. It is worth stressing that when the model includes several smooth terms, the SIC should be optimized with respect to $λ_{1}, \dots, λ_{S}$ (Koenker, 2011). However, multidimensional numeric optimization could result in a very heavy computational burden with questionable final results, possibly depending on starting values. Alternatively, a multidimensional grid search could be performed at the cost of increased computational time, which is probably unfeasible from a practical viewpoint. In contrast, the HFS algorithm is very efficient and largely insensitive to the number of $λ_{s}$ ’s to estimate.

2.5 Standard errors

To obtain the variance matrix of the estimates, $cov (\hat{θ})$ , we use the standard asymptotic theory (Koenker, 2005, page 77; Koenker, 2011) relying on the sandwich formula reported in (2.14), formed by the ‘bread’ (the outer matrices) and the ‘meat’ (the matrix in the middle). The ‘meat’ is the gradient variance, which could also be called the Information by borrowing terminology from mean regression. Let $X \in M_{n \times p}$ be the overall design matrix defined in the previous section for the single quantile curve with $p = 1 + H + \sum_{s = 1}^{S} p_{s}$ columns, $\tilde{X} = (I_{K} \otimes X) P (I_{p} \otimes C) \in M_{n K \times p q}$ the extended version corresponding to the $K$ quantile curves, and $\tilde{τ} = {(τ_{1}, \dots, τ_{K})}^{T} \otimes 1_{n} \in ℝ^{K n}$ . The gradient is $\tilde{U} = {\tilde{X}}^{T} (\tilde{τ} - 1 (\tilde{e} < 0)) \in ℝ^{p q}$ , where $\tilde{e} = {(e_{1}^{T}, \dots, e_{K}^{T})}^{T} \in ℝ^{K n}$ is composed of the residual vectors referring to the $K$ regression equations. The meat is $cov (\tilde{U}) = {\tilde{X}}^{T} (T \otimes I_{n}) \tilde{X} \in M_{p q \times p q}$ , where $T = {[\min (τ_{k}, τ_{k'}) - τ_{k} τ_{k'}]}_{k, k'} \in M_{K \times K}$ . The ‘bread’ depends on the estimated densities ${\hat{f}}_{i k} = {\{\partial \hat{Q} (τ | x_{i}) / \partial τ |_{τ_{k}}\}}^{- 1}$ evaluated at every $τ_{k}$ and for each observation $i$ , and on the augmented design matrix including the penalty matrices as in (2.9). More specifically, let $\tilde{X}$ be the aforementioned design matrix, and ${\tilde{D}}_{λ} = (I_{K} \otimes D_{λ}) P (I_{p} \otimes C)$ the whole penalty matrix where $D_{λ}$ is defined in the previous section. To account for non-crossing constraints as discussed at the end of section 2.3, we write (2.12) as $R θ \geq 0$ where $R \in M_{p (q - 1) \times p q}$ , and $I (R \hat{θ} = 0) \in ℝ^{p (q - 1)}$ is the binary vector indicating whether each constraint is active.
Thus we build ${\tilde{R}}_{ω} = ω diag (I (R \hat{θ} = 0)) R$ , with $ω$ taking a large value as in section 2.3, and the whole augmented matrix is ${\tilde{X}}_{λ} = {[{\tilde{X}}^{T} |{\tilde{D}}_{λ}^{T}| {\tilde{R}}_{ω}^{T}]}^{T}$ . Moreover, let $diag (\hat{f})$ be the diagonal matrix having on the main diagonal the aforementioned ${\hat{f}}_{i k}$ ’s and ones in the remaining slots corresponding to the penalty terms (Koenker, 2011). Hence, the sandwich formula leads to

cov (\hat{θ}) \approx {({\tilde{X}}_{λ}^{T} diag (\hat{f}) {\tilde{X}}_{λ})}^{- 1} cov (\tilde{U}) {({\tilde{X}}_{λ}^{T} diag (\hat{f}) {\tilde{X}}_{λ})}^{- 1} .

(2.14)

Notice it is unnecessary to estimate the sparsity function or to rely on numerical approximations such as ‘Hendricks-Koenker’ or ‘Powell’ (Koenker, 2005, pages 79–80) used when a single qr is fitted. Indeed formula (2.14), via the ${\hat{f}}_{i k}$ ’s, exploits the exact B-spline derivatives discussed in section 2.1. Starting from $cov (\hat{θ})$ , and using (2.2), $cov (\hat{β})$ is obtained straightforwardly, and standard errors for the estimated quantile curves are computed accordingly.
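The matrix $T$ entering the ‘meat’ of the sandwich formula depends only on the chosen percentiles and is easy to build. The following plain Python sketch, with a hypothetical function name, computes it for three quartile levels.

```python
def quantile_covariance(taus):
    """Matrix T with entries min(tau_k, tau_k') - tau_k * tau_k'."""
    return [[min(a, b) - a * b for b in taus] for a in taus]


T = quantile_covariance([0.25, 0.5, 0.75])
for row in T:
    print(row)
```

The diagonal entries reduce to $τ_{k} (1 - τ_{k})$ , the familiar variance term for a single quantile, while the off-diagonal entries carry the dependence between gradients at different percentiles that joint estimation has to account for.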

3 Simulation study

We ran extensive simulations to assess how our proposed framework performs empirically. The performance is expressed by the MISE (mean integrated square error) defined as average (over the covariate space and the replicates) of the squared differences between true and fitted quantiles. In the first batch of simulations, we aim to assess how the quantile estimates depend on the number $K$ of percentiles involved in the estimation. Details are reported in Supplementary Material, but results suggest that as $K$ gets larger, the increasing computational load (the working sample size is actually $K n$ ), is not accompanied by a substantial reduction in the MISE, with very minor gains when more quantiles are involved in the estimation. Based on such results, and additional experiments not shown here, we conclude $K = 11$ or $15$ suffices to fit the model adequately while maintaining an acceptable computational burden.

In the second batch of simulations, we aim to compare our proposal with some competitors when the relationship is either linear or nonlinear. Results for the linear case are reported in section C.1 of Supplementary Material. For the nonlinear case, data have been simulated according to $y_{1 i} = g_{1} (x_{i}) + e_{i}$ (scenario 1, homoscedastic) and $y_{2 i} = g_{2} (x_{i}) + g_{1} (x_{i}) e_{i}$ (scenario 2, heteroscedastic), where $g_{1} (x_{i}) = 0.5 + 2 x_{i} + \sin (2 π x_{i} - .5)$ , $g_{2} (x_{i}) = 3 x_{i}$ and $x_{i} \sim U (0, 1)$ . The errors $e_{i}$ ’s are iid from $N (0, 1)$ , $χ_{3}^{2}$ , $t_{3}$ , and again $N (0, 1)$ where, at each replicate, a randomly selected 10% of the simulated observations have been contaminated by adding to the covariate and response values 1/3 of the observed range of the $x_{i}$ ’s or of the $y_{i}$ ’s respectively. We consider two alternative strategies to deal with nonlinearity: via B-splines without or with penalty. The unpenalized approach is illustrated in section D.2 of Supplementary Material, where we contrast our proposal with some competitors including Bondell et al. (2010), Schnabel and Eilers (2013), and Frumento and Bottai (2016). Results emphasize that our approach performs reasonably well, exhibiting the lowest MISEs most of the time; see section C.3 of Supplementary Material for details. For the penalized approach we discuss results coming from $P$ -spline based fits, whereby we consider only lms (Stasinopoulos et al., 2017), which appears to be the only competitor that fits the entire quantile process and accepts ‘automatic’ smoothing parameter selection in the current implementation. For our ‘biqr’ (B-spline integrated quantile regression) proposal, we examine the selection of $λ$ via the SIC or the HFS algorithm, while we run lms with its default settings. Table 1 reports the results.
In the Gaussian case, not surprisingly, lms (based on the mean estimator) performs somewhat better than biqr, especially with heteroscedastic data and at middle percentiles. For other error distributions results are mixed, with the proposed biqr performing rather well when the dataset includes a small portion of outliers.
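To make the simulation design and the performance measure concrete, the sketch below codes the two signals, the implied true quantiles under $N (0, 1)$ errors, and a MISE function averaging squared errors over a covariate grid and over replicates. It is an illustrative reconstruction in plain Python, not the code used for Table 1: the grid, the function names and the two artificial ‘fits’ are assumptions, and only the Gaussian error case is covered.

```python
import math
from statistics import NormalDist


def g1(x):
    return 0.5 + 2.0 * x + math.sin(2.0 * math.pi * x - 0.5)


def g2(x):
    return 3.0 * x


def true_quantile(x, tau, scenario):
    """True conditional quantile under N(0, 1) errors."""
    z = NormalDist().inv_cdf(tau)
    if scenario == 1:
        return g1(x) + z          # homoscedastic: location shift only
    return g2(x) + g1(x) * z      # heteroscedastic: valid because g1 > 0 on [0, 1]


def mise(fitted_replicates, x_grid, tau, scenario):
    """Average squared error over the grid and over the replicates."""
    squared = [
        (fit(x) - true_quantile(x, tau, scenario)) ** 2
        for fit in fitted_replicates
        for x in x_grid
    ]
    return sum(squared) / len(squared)


x_grid = [k / 50 for k in range(51)]
tau = 0.9


def fit_too_high(x):
    # Artificial 'fit' sitting 0.1 above the truth (for illustration only).
    return true_quantile(x, tau, 1) + 0.1


def fit_too_low(x):
    # Artificial 'fit' sitting 0.1 below the truth (for illustration only).
    return true_quantile(x, tau, 1) - 0.1


example_mise = mise([fit_too_high, fit_too_low], x_grid, tau, scenario=1)
print(example_mise)  # approximately 0.01 by construction
```

In an actual study each element of `fitted_replicates` would be the quantile curve estimated from one simulated dataset, and the same function would be evaluated at every percentile reported in the table.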

Table 1

MISE for the P-splines based estimated quantile curves at specified percentiles by signals and error distributions. Bold means the lowest value within any column for each error distribution

	Scenario 1							Scenario 2
	0.10	0.25	0.50	0.75	0.90	0.95	0.99	0.10	0.25	0.50	0.75	0.90	0.95	0.99
$N (0, 1)$
biqr (SIC)	0.352	0.295	0.281	0.294	0.353	0.402	0.617	0.539	0.447	0.428	0.447	0.542	0.639	1.027
biqr (HFS)	0.286	0.277	0.287	0.348	0.403	0.481	0.651	0.508	0.409	0.389	0.412	0.518	0.626	1.116
LMS	0.290	0.251	0.228	0.251	0.295	0.340	0.525	0.419	0.272	0.192	0.276	0.428	0.568	1.013
$χ_{3}^{2}$
biqr (SIC)	0.352	0.439	0.614	0.887	1.345	1.612	2.811	0.488	0.643	0.943	1.385	2.168	2.712	4.939
biqr (HFS)	0.323	0.418	0.587	0.875	1.326	1.593	3.222	0.438	0.612	0.921	1.389	2.121	2.647	5.517
LMS	0.459	0.457	0.426	0.673	1.134	1.469	2.410	0.747	0.706	0.777	1.417	2.410	3.118	5.135
$t_{3}$
biqr (SIC)	0.691	0.405	0.330	0.396	0.666	0.834	1.662	1.093	0.629	0.500	0.603	1.049	1.344	3.007
biqr (HFS)	0.671	0.405	0.324	0.394	0.648	0.792	1.940	1.022	0.600	0.451	0.581	1.009	1.305	3.351
LMS	0.553	0.407	0.328	0.406	0.594	0.891	2.398	0.785	0.445	0.253	0.437	0.786	1.345	3.417
$N (0, 1)$ + 10% outliers
biqr (SIC)	0.360	0.311	0.324	0.416	0.650	0.735	0.765	1.042	0.787	0.595	0.512	0.715	0.846	1.270
biqr (HFS)	0.352	0.303	0.321	0.408	0.640	0.716	0.734	1.010	0.764	0.578	0.493	0.704	0.840	1.257
LMS	0.544	0.515	0.507	0.529	0.608	0.719	1.068	0.861	0.625	0.533	0.649	0.904	1.102	1.650
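
To make the entries of Table 1 concrete, the sketch below shows one way a MISE could be computed for the Gaussian scenarios. It assumes that the integrated squared error is taken over $x \in (0, 1)$ with uniform weight and then averaged over replicates; the function names (`true_quantile`, `ise`, `mise`) and the grid size are illustrative choices, and the definition adopted for the table may differ in its details.

```python
import math
from statistics import NormalDist

def g1(x):
    return 0.5 + 2.0 * x + math.sin(2.0 * math.pi * x - 0.5)

def g2(x):
    return 3.0 * x

def true_quantile(x, tau, scenario):
    """True conditional quantile at level tau under N(0, 1) errors."""
    z = NormalDist().inv_cdf(tau)
    if scenario == 1:
        return g1(x) + z
    # Scenario 2: the scale g1(x) is positive on [0, 1], so the
    # error quantile is simply rescaled.
    return g2(x) + g1(x) * z

def ise(fitted, tau, scenario, n_grid=200):
    """Integrated squared error over (0, 1), midpoint rule (assumed weighting)."""
    xs = [(k + 0.5) / n_grid for k in range(n_grid)]
    return sum((fitted(x) - true_quantile(x, tau, scenario)) ** 2 for x in xs) / n_grid

def mise(fitted_curves, tau, scenario):
    """Average the integrated squared error over the replicate fits."""
    return sum(ise(f, tau, scenario) for f in fitted_curves) / len(fitted_curves)
```

Here each element of `fitted_curves` is any callable returning the estimated quantile at a covariate value, one per simulation replicate.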

4 Application

We apply the proposed B-spline integrated quantile regression to children's vocabulary development. The Wordbank data (http://wordbank.stanford.edu, Frank et al. (2017)) form a structured open database of developmental vocabulary data from different countries across the world. Wordbank contains data from 84,138 children and 94,451 administrations of the CDI, the 'Communicative Development Inventories', a standardized parent-report tool widely used to collect data through families. By means of the CDI form, parents are asked to indicate whether their child either 'understands' (comprehension) or 'understands and says' (production) each of around 400–700 words. The whole dataset covers 38 languages worldwide, but here we focus on the English (American) language production dataset, with $n = 6414$. The final goal is to estimate how the 'vocabulary learning curve' changes with age at different percentile levels while accounting for the gender effect. The quantile regression equation used is $Q_{prod} (τ | age, sex) = β_{0} (τ) + \sum_{j = 1} β_{j} (τ) B_{j} (age) + α (τ) sex$ with $0 < τ < 1$, where the $B_{j} (\cdot)$'s are the basis functions of an identifiable and centred cubic B-spline, and each regression coefficient is expressed via a rank 9 cubic B-spline $β_{j} (τ) = \sum_{r} θ_{r} C_{r} (τ)$. Estimation is performed with non-crossing and monotonicity constraints, and with a penalty on the third-order differences of the spline coefficients, with the smoothing parameter selected by the HFS algorithm.
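
As an illustration of how a single $τ$-varying coefficient $β_{j} (τ) = \sum_{r} θ_{r} C_{r} (τ)$ can be evaluated at any percentile, the sketch below builds a cubic B-spline basis with 9 functions by the Cox-de Boor recursion. The equally spaced knots on $[0, 1]$ and the names `uniform_knots`, `bspline_basis` and `coef_function` are assumptions made for this example; the knot placement actually used may differ.

```python
def uniform_knots(n_basis, degree):
    # Equally spaced knots covering [0, 1], extended by `degree` intervals
    # on each side (assumed layout).
    n_intervals = n_basis - degree
    return [(i - degree) / n_intervals for i in range(n_basis + degree + 1)]

def bspline_basis(t, knots, degree):
    """Values at t of all B-spline basis functions (Cox-de Boor recursion)."""
    # Degree 0: indicators of the half-open knot intervals.
    b = [1.0 if knots[i] <= t < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - d - 1):
            left = (t - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            right = ((knots[i + d + 1] - t)
                     / (knots[i + d + 1] - knots[i + 1]) * b[i + 1])
            nb.append(left + right)
        b = nb
    return b

def coef_function(tau, theta, degree=3):
    """Evaluate beta(tau) = sum_r theta_r * C_r(tau)."""
    knots = uniform_knots(len(theta), degree)
    basis = bspline_basis(tau, knots, degree)
    return sum(th * c for th, c in zip(theta, basis))

# Nine illustrative coefficients (not estimates from the Wordbank data).
theta = [0.0, 0.1, 0.1, 0.4, 0.9, 1.0, 1.5, 1.5, 2.0]
beta_at_065 = coef_function(0.65, theta)
```

Because the coefficient is a function of $τ$, it can be evaluated at percentiles that were not used in fitting. A standard B-spline property is also visible here: non-decreasing coefficients $θ_{r}$ yield a non-decreasing $β (τ)$, which conveys the flavour of how restrictions on spline coefficients can enforce ordering across percentiles, although the exact restrictions of the framework are those given in the methodological sections.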

Figure 1 summarizes the model fits: we report the estimated quantile curves for males and females (panels A and B) and the $τ$-varying sex effect with the corresponding 95% pointwise confidence intervals (panel C). Panel D portrays a few pairs of quantile curves for males and females; the largest differences occur at $τ = 0.65$, where the sex coefficient $\hat{α}$ reaches its largest effect.

Figure 1

Panels A and B: Some estimated quantile curves for males (A) and females (B). Panel C: $τ$-varying effect of the linear coefficient of sex with corresponding 95% pointwise confidence intervals. The points are the estimates from qr models fitted separately, with the three bars representing the 95% confidence intervals at the three $τ$ values $0.10, 0.65, 0.90$. Panel D: Three quantile curves for males (continuous lines) and females (dashed lines): the largest distance between corresponding quantile curves is at about $τ = 0.65$, where the sex coefficient attains its largest effect (in absolute value).

5 Conclusion

We have presented a unified framework to estimate the quantile process, that is, multiple quantile curves simultaneously under non-crossing constraints; while $K$ probability values are used to fit the model, 'predictions' at any percentile value are straightforward to obtain by simply evaluating the B-splines at the 'new' percentiles. Multiple linear and smooth terms are allowed, and the relevant smoothing parameters can be estimated iteratively as part of model fitting. The objective (2.3), minimized subject to proper inequality constraints and possible penalties, can be considered as a discrete approximation of the integral $\int_{0}^{1} L (β (τ)) d τ$, which is the loss function of Frumento and Bottai (2016). In that sense, our approach is closely related to theirs, but our proposal also ensures non-crossing by construction and admits penalized splines, which have better statistical performance. The other closely related framework is lms, which does ensure non-crossing, allows penalized splines and appears to perform slightly better under some simulation scenarios. However, as described in the Introduction, in the lms framework the covariate effect is specified via the regression equations of the model parameters (mean, variance, and kurtosis, for instance). There is no explicit linear quantile regression equation, and thus the linear covariate effect at a specified percentile cannot be quantified straightforwardly; in fact, no linear term is usually included in the lms model. The lms framework allows ready diagnostic analysis: for instance, worm plots based on residuals can 'validate' the fitted model using several tools and from different perspectives.
However, in the proposed framework it is also possible to carry out a goodness-of-fit test following the idea of Frumento and Bottai (2016): let ${\hat{Q}}_{i}^{- 1} (\cdot)$ be the conditional distribution function for unit $i$ having covariate pattern $x_{i}$, and $π_{i} = {\hat{Q}}_{i}^{- 1} (y_{i}; x_{i}, \hat{θ})$ the relevant probability value. If the model fits the data well, the ${\hat{Q}}_{i}^{- 1} (\cdot)$'s should be reasonably close to the true distributions of the $y_{i}$'s. Therefore, by the probability integral transform, we should expect $π_{i} \sim Unif (0, 1)$, and the Kolmogorov-Smirnov test, say, can be applied to check this. The approach described in this article will be implemented in the next release of the R package quantregGrowth.
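
The goodness-of-fit idea can be sketched as follows. Since the fitted quantile function of each unit is non-decreasing in $τ$ under the non-crossing constraints, $π_{i}$ can be obtained by numerically inverting it at $y_{i}$, and the resulting values compared with the uniform distribution. The function names (`pit_value`, `ks_uniform`, `ks_pvalue`) and the bisection-based inversion are assumptions of this example and not part of quantregGrowth; the p-value relies on the asymptotic Kolmogorov distribution and ignores that the parameters have been estimated, so it should be read as approximate.

```python
import math

def pit_value(y, quantile_fn, tol=1e-8):
    """Invert a non-decreasing quantile function tau -> Q(tau) at y by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quantile_fn(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ks_uniform(p):
    """One-sample Kolmogorov-Smirnov statistic against Unif(0, 1)."""
    p = sorted(p)
    n = len(p)
    d_plus = max((i + 1) / n - v for i, v in enumerate(p))
    d_minus = max(v - i / n for i, v in enumerate(p))
    return max(d_plus, d_minus)

def ks_pvalue(d, n, terms=100):
    """Asymptotic p-value: 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n d^2)."""
    lam = math.sqrt(n) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is essentially 1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))
```

In use, `quantile_fn` for unit $i$ would be the map from $τ$ to the fitted conditional quantile at covariate pattern $x_{i}$; observations falling below or above all fitted quantiles get $π_{i}$ close to 0 or 1, respectively.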

The online Supplementary Material includes further discussion and simulation results on the proposed framework, including the fitting of discrete data following the discussion of Frumento and Salvati (2021).

Supplementary materials

Supplementary materials for this article are available online.

Supplemental Material for Joint modelling of non-crossing additive quantile regression via constrained B-spline varying coefficients by Vito M.R. Muggeo, Gianluca Sottile, Giovanna Cilluffo, in Statistical Modelling

Acknowledgment

We would like to thank the Editor who handled the manuscript, Paul Eilers, for his careful reading and the referees whose suggestions greatly improved the article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors gratefully acknowledge financial support from the University of Palermo (FFR2022-23).

References

Bollaerts, Eilers and Aerts (2006) Quantile regression with monotonicity restrictions using P-splines and the L1-norm. Statistical Modelling, 6, 189–207.

Bondell, Reich and Wang (2010) Non-crossing quantile regression curve estimation. Biometrika, 97, 825–838.

Borghi, de Onis, Garza, Van den Broeck, Frongillo, Grummer-Strawn, Van Buuren, Pan, Molinari, Martorell, Onyango, Martines and WHO Multicentre Growth Reference Study Group (2006) Construction of the World Health Organization child growth standards: Selection of methods for attained growth curves. Statistics in Medicine, 25, 247–265.

Bottai and Cilluffo (2020) Nonlinear parametric quantile models. Statistical Methods in Medical Research, 29, 3757–3769.

Chernozhukov, Fernandez-Val and Galichon (2009) Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96, 559–575.

Cole and Green (1992) Smoothing reference centile curves: The LMS method and penalized likelihood. Statistics in Medicine, 11, 1305–1319.

Das and Ghosal (2018) Bayesian non-parametric simultaneous quantile regression for complete and grid data. Computational Statistics & Data Analysis, 127, 172–186.

Eilers and Marx (2021) Practical smoothing: The joys of P-splines. Cambridge University Press.

Eilers and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.

Eilers and Marx (2010) Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 637–653.

Frank, Braginsky, Yurovsky and Marchman (2017) Wordbank: An open repository for developmental vocabulary data. Journal of Child Language, 44, 677–694.

Frumento and Salvati (2021) Parametric modeling of quantile regression coefficient functions with count data. Statistical Methods & Applications, 30, 1237–1258.

Frumento and Bottai (2016) Parametric modeling of quantile regression coefficient functions. Biometrics, 72, 74–84.

He (1997) Quantile curves without crossing. The American Statistician, 51, 186–192.

Kim S-O, Corey, Stephenson and Strug (2018) Reference percentiles of FEV1 for the Canadian cystic fibrosis population: Comparisons across time and countries. Thorax, 73, 446–450.

Kneib (2013) Beyond mean regression. Statistical Modelling, 13, 275–303.

Koenker (2005) Quantile regression. Cambridge University Press.

Koenker (2011) Additive models for quantile regression: Model selection and confidence bandaids. Brazilian Journal of Probability and Statistics, 25, 239–262.

Koenker and Bassett (1978) Regression quantiles. Econometrica, 46, 33–50.

Koenker, Ng and Portnoy (1994) Quantile smoothing splines. Biometrika, 81, 673–680.

Muggeo VMR and Ferrara (2008) Fitting generalized linear models with unspecified link function: A P-spline approach. Computational Statistics & Data Analysis, 52, 2529–2537.

Muggeo VMR, Sciandra, Tomasello and Calvo (2013) Estimating growth charts via nonparametric quantile regression: A practical framework with application in ecology. Environmental and Ecological Statistics, 20, 519–531.

Muggeo VMR, Torretta, Eilers PHC, Sciandra and Attanasio (2021) Multiple smoothing parameters selection in additive regression quantiles. Statistical Modelling, 21, 428–448.

Ng and Maechler (2007) A fast and efficient implementation of qualitatively constrained quantile smoothing splines. Statistical Modelling, 7, 315–328.

Sandercock, Voss, Cohen, Taylor and Stasinopoulos (2016) Centile curves and normative values for the twenty metre shuttle-run test in English schoolchildren. Journal of Sports Sciences, 30, 679–687.

Schnabel and Eilers (2013) Simultaneous estimation of quantile curves using quantile sheets. AStA Advances in Statistical Analysis, 97, 77–87.

Shim, Hwang and Seok (2009) Non-crossing quantile regression via doubly penalized kernel machine. Computational Statistics, 24, 83–94.

Sottile and Frumento (2023) Parametric estimation of non-crossing quantile functions. Statistical Modelling, 23, 173–195.

Sottile, Frumento, Chiodi and Bottai (2020) A penalized approach to covariate selection through quantile regression coefficient models. Statistical Modelling, 20, 369–385.

Stasinopoulos and Rigby (2007) Generalized additive models for location scale and shape (GAMLSS) in R. Journal of Statistical Software, 23, 1–46.

Stasinopoulos, Rigby, Heller, Voudouris and De Bastiani (2017) Flexible regression and smoothing: Using GAMLSS in R. Chapman and Hall/CRC, Boca Raton.

Tomasello, Sciandra, Muggeo, Pirrotta, Maida and Calvo (2016) Reference growth charts for Posidonia oceanica seagrass: An effective tool for assessing growth performance by age and depth. Ecological Indicators, 69, 50–58.

Wei, Pere, Koenker and He (2006) Quantile regression methods for reference growth charts. Statistics in Medicine, 25, 1369–1382.

Wood (2006) Generalized additive models: An introduction with R. Chapman & Hall.

Yang and Tokdar (2017) Joint estimation of quantile planes over arbitrary predictor spaces. Journal of the American Statistical Association, 112, 1107–1120.
