Expectile and quantile regression

Abstract

Recent interest in modern regression modelling has focused on extending available (mean) regression models by describing more general properties of the response distribution. An alternative approach is quantile regression, where regression effects on the conditional quantile function of the response are assumed. While quantile regression can be seen as a generalization of median regression, expectile regression, as an alternative, is a generalized form of mean regression.

Generally, quantiles provide a natural interpretation even beyond the 0.5 quantile, the median. A comparable simple interpretation is not available for expectiles beyond the 0.5 expectile, the mean. Nonetheless, expectiles have some interesting properties, some of which are discussed in this article. We contrast the two approaches and show how to get quantiles from a fine grid of expectiles. We compare such quantiles from expectiles with direct quantile estimates regarding efficiency. We also look at regression problems where both quantile and expectile curves have the undesirable property that neighbouring curves may cross each other. We propose a modified method to estimate non-crossing expectile curves based on splines. In an application, we look at the expected shortfall, a risk measure used in finance, which requires both expectiles and quantiles for estimation and which can be calculated easily with the proposed methods in the article.

Keywords

Expected shortfall, least asymmetrically weighted squares, non-crossing, penalized splines, semiparametric

1 Introduction

Quantile regression allows the estimation of the effect of covariates on the distribution of a response variable. The idea has been suggested by (Koenker and Bassett, 1978) and is well elaborated with numerous extensions in (Koenker, 2005). The underlying regression model for the $α$ -quantile with $α \in (0, 1)$ is specified as

y_{i} = q_{i, α} + ε_{i, α}, i = 1, \dots, n

with $y_{i}$ as response variable, $i = 1, \dots, n$, and $q_{i, α}$ as the $α$-quantile, which may depend on covariates $x_{i}$, e.g., through the linear model $q_{i, α} = β_{0 α}^{(q)} + x_{i} β_{1 α}^{(q)}$. Unlike classical regression, where a zero mean is assumed for the residuals, in quantile regression one postulates that the $α$-quantile of the residuals $ε_{i, α}$ is zero, i.e., $P (ε_{i, α} \leq 0) = α$. Estimates for $q_{i, α}$ are obtainable through the minimizer of the weighted $L_{1}$ sum

\sum_{i = 1}^{n} w_{i, α} |y_{i} - q_{i, α}|,

(1.1)

where

w_{i, α} = \begin{cases} 1 - α, & \text{for } y_{i} < q_{i, α}, \\ α, & \text{for } y_{i} \geq q_{i, α} \end{cases}

are asymmetric weights. Numerically, (1.1) can be minimized by linear programming, see, e.g., (Koenker, 2005).
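
To make (1.1) concrete, the following minimal Python sketch evaluates the weighted $L_{1}$ sum and finds a minimizer for a small made-up sample without covariates. It is an illustration only: instead of linear programming it searches over the observed values (in the covariate-free case a minimizer of (1.1) is always attained at an observation), and the function names and the data are hypothetical.

```python
def check_loss(q, ys, alpha):
    """Weighted L1 sum (1.1): weight 1 - alpha below q, alpha at or above q."""
    return sum(((1 - alpha) if y < q else alpha) * abs(y - q) for y in ys)


def sample_quantile(ys, alpha):
    """Minimize (1.1) by brute-force search over the observed values."""
    return min(sorted(ys), key=lambda q: check_loss(q, ys, alpha))


# Hypothetical toy sample (odd size, so the median is unique).
ys = [2.1, 0.3, 5.7, 1.8, 4.4, 3.0, 0.9]
median = sample_quantile(ys, 0.5)
q80 = sample_quantile(ys, 0.8)
print(median, q80)
```

For regression problems with covariates this search is not available, which is where the linear programming formulation becomes necessary.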

As an alternative to the quantile regression, (Aigner et al., 1976) and (Newey and Powell, 1987) proposed to replace the $L_{1}$ distance in (1.1) by a quadratic $L_{2}$ term leading to the asymmetric least squares

\sum_{i = 1}^{n} w_{i, α} {(y_{i} - m_{i, α})}^{2}

(1.2)

where the minimizer ${\hat{m}}_{i, α}$, say, is called the (estimated) expectile. The underlying regression model now is

y_{i} = m_{i, α} + ε_{i, α}, i = 1, \dots, n

under the assumption that the $α$-expectile of the error terms $ε_{i, α}$ is zero. The expectile $m_{i, α}$ may again depend on covariates, e.g., through the linear expectile model $m_{i, α} = β_{0 α}^{(m)} + x_{i} β_{1 α}^{(m)}$. Expectile estimation is thereby a special form of M-quantile estimation (Breckling and Chambers, 1988), and expectile regression has seen increasing interest in recent years (Schnabel and Eilers, 2009b; Pratesi et al., 2009; Sobotka and Kneib, 2012; Guo and Härdle, 2013). An overview of estimation procedures that address more features of the data than its centre (including semiparametric expectile and quantile regression) can be found in (Kneib, 2013).
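
In the covariate-free case, a minimizer of (1.2) is a weighted mean whose weights depend on the minimizer itself, which suggests a simple fixed-point iteration. The sketch below assumes this iteratively reweighted scheme; the function name, the tolerance and the toy data are illustrative choices and are not taken from any package.

```python
def sample_expectile(ys, alpha, tol=1e-12, max_iter=1000):
    """Minimize the asymmetric least squares criterion (1.2) without covariates."""
    m = sum(ys) / len(ys)  # start at the mean, i.e., the 0.5-expectile
    for _ in range(max_iter):
        # asymmetric weights evaluated at the current value of m
        ws = [(1 - alpha) if y < m else alpha for y in ys]
        m_new = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


# Hypothetical toy sample.
ys = [2.1, 0.3, 5.7, 1.8, 4.4, 3.0, 0.9]
e10 = sample_expectile(ys, 0.1)
e50 = sample_expectile(ys, 0.5)
e90 = sample_expectile(ys, 0.9)
print(e10, e50, e90)
```

For $α = 0.5$ all weights are equal and the iteration returns the arithmetic mean, which reflects the view of expectiles as a generalization of the mean.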

In this article, we contrast quantile and expectile regression and propose some extensions to expectile estimation to link it to quantiles. A comparison of the two routines might remind us of the story of David and Goliath just by comparing the number of citations: about 1850 for (Koenker and Bassett, 1978) referring to quantiles and about 100 for (Newey and Powell, 1987), as of November 2013. Quantiles are certainly more dominant in the literature due to the fact that expectiles lack an intuitive interpretation while quantiles are just the inverse of the distribution function. Numerically, as seen by comparing (1.1) to (1.2), quantiles ‘live’ in the $L_{1}$ world while expectiles are rooted in the $L_{2}$ world. This by itself has several consequences. Quantiles need linear programming for estimation while expectiles are fitted using quadratic optimization. Beyond all discrepancies between quantiles and expectiles, it is important to note that both are related in various ways. (Jones, 1994) shows that expectiles are in fact quantiles of a distribution function uniquely related to the distribution of $y$ . (Yao and Tong, 1996) give a similar result by showing that there exists a unique bijective function $h : (0, 1) \to (0, 1)$ such that $q_{α} = m_{h (α)}$ , where $h (.)$ is defined through

h (α) = \frac{- α q_{α} + G (q_{α})}{- m_{0.5} + 2 G (q_{α}) + (1 - 2 α) q_{α}}

(1.3)

with $G (q) = \int_{- \infty}^{q} y \, dF (y)$ as the partial moment function and $F (y)$ as the cumulative distribution function of $y$ (see also De Rossi and Harvey, 2009). Note that $m_{0.5} = E (y) = G (\infty)$. In this article, we will make use of relation (1.3) and relate quantile estimates ${\hat{q}}_{α}$ to expectile-based quantile estimates ${\hat{m}}_{\hat{h} (α)}$, where $\hat{h} (.)$ is an estimated version of $h (.)$ in (1.3). One of the key findings of the article is that the estimates ${\hat{m}}_{\hat{h} (α)}$ are numerically more demanding than quantile estimates, but, as simulations show, they serve as quantile estimates which can be even more efficient than the empirical quantile ${\hat{q}}_{α}$ itself.
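
As a worked instance of (1.3), the transfer function can be evaluated in closed form for the standard normal distribution, where $m_{0.5} = 0$ and the partial moment is $G (q) = - φ (q)$ with $φ$ the standard normal density. The following Python sketch relies on this special case only; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(0.0, 1.0)


def partial_moment_std_normal(q):
    """G(q) = int_{-inf}^{q} y dF(y); for N(0, 1) this equals -phi(q)."""
    return -STD_NORMAL.pdf(q)


def transfer_h(alpha):
    """Relation (1.3) for N(0, 1): expectile level h(alpha) with q_alpha = m_{h(alpha)}."""
    q = STD_NORMAL.inv_cdf(alpha)
    g = partial_moment_std_normal(q)
    mean = 0.0  # m_{0.5} = E(y) for the standard normal
    return (-alpha * q + g) / (-mean + 2 * g + (1 - 2 * alpha) * q)


for a in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(a, transfer_h(a))
```

For $α = 0.01$ this gives an expectile level of roughly 0.00145, so the 1% quantile of the standard normal distribution corresponds to a far more extreme expectile level.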

In quantile regression, a numerical problem in applications is that of so-called crossing quantile functions. These occur if, for estimated quantiles, one gets ${\hat{q}}_{α} (x) > {\hat{q}}_{α^{'}} (x)$ for $α < α^{'}$ for some value $x$ (in the observed range of the covariate), where ${\hat{q}}_{α} (x) = {\hat{β}}_{0 α}^{(q)} + x {\hat{β}}_{1 α}^{(q)}$. Several methods, algorithms and model constraints have been proposed to circumvent the problem. (Bondell et al., 2010) made use of linear programming. They also gave a good overview of earlier proposals, including (Koenker, 1984), (He, 1997), (Wu and Liu, 2009) and (Neocleous and Portnoy, 2008). (Chernozhukov et al., 2010) rearranged the fitted (linear) curves into a set of non-crossing curves, whereas (Dette and Scheder, 2011) used marginal integration techniques and monotone rearrangements. (Schnabel and Eilers, 2013b) proposed the so-called quantile sheets, where crossings are circumvented by penalization. The problem of crossing curves occurs in principle in the same way for expectile regression. We demonstrate with simulations that crossing expectiles occur less frequently. This implies that less attention is needed to avoid crossing expectiles compared to crossing quantiles.

Quantile regression, as well as expectile regression, can be extended to non-parametric functional estimation. For quantile estimation, (Koenker et al., 1994) proposed spline-based estimation. (Bollaerts et al., 2006) made use of penalized B-splines with an $L_{1}$ penalty. Recently, (Reiss and Huang, 2012) suggested quantile estimation based on penalized iterative least squares (see also Yuan, 2006). For expectiles, smooth estimation has been pursued by, e.g., (Pratesi et al., 2009) and (Schnabel and Eilers, 2009b). Other semi- or non-parametric extensions of quantile regression which allow for flexibility are varying coefficient models. Here smoothness is achieved by letting the quantile depend not only on (unknown) constant coefficients, but also on (smooth) coefficient functions (see, e.g., Honda, 2004; Noh et al., 2012 or Andriyana et al., 2014). The idea of smoothing can be extended by assuming that a ‘set’ of $α$-quantile curves depends smoothly on both the covariate and $α$. Using B-splines, this easily allows us to incorporate non-crossing conditions, as in (Bondell et al., 2010) or (Muggeo et al., 2013), for quantile estimation. The estimation of quantile sheets is proposed in (Schnabel and Eilers, 2013b). (Schnabel and Eilers, 2009a) give a description of non-crossing smooth expectile curves (see also Eilers, 2013). Both approaches are based on penalized spline smoothing. Our approach to estimating expectile sheets is very similar to an extension formulated in (Schnabel and Eilers, 2009a), who propose the estimation of expectile sheets using a tensor product of B-splines combined with a penalty in the direction of the asymmetry parameter and the covariate. It differs in that we use a linear B-spline in the direction of the asymmetry parameter and thereby avoid the use and choice of a smoothing parameter in that direction.

Expectiles might not gain as much popularity as quantiles, but we think they deserve their niche. For example, (Aigner et al., 1976) construct expectiles to estimate production frontiers and give an additional argument for using expectiles by stating that expectile regression is a way to treat asymmetric consequences, as it places different weights on positive and negative residuals. There are other fields which call for expectiles as well, for example the field of risk measures for financial assets. (Ziegel, 2013) argues for the use of expectiles as a risk measure as they have desirable properties. Another frequently used risk measure is the ‘expected shortfall’, which requires the calculation of both quantiles and expectiles. The expected shortfall (ES) is a trimmed mean, that is, the mean of a random variable conditional on its value being above (or below) a certain quantile. The ES can be written as a function of both the quantile and the expectile for a level $α$. Estimation of the ES has recently been proposed by (Leorato et al., 2012) by employing the integrated (conditional) fitted quantile regression function (see also Wang and Zhou, 2010). We extend an idea of (Taylor, 2008) and use the fitted quantiles and expectiles for the estimation of the ES. This connection becomes extremely useful for calculating the ES, as it depends both on expectiles and their corresponding quantiles (as described by Taylor, 2008).

The article is organized as follows. In Section 2, we compare and contrast quantiles and expectiles, both theoretically and based on simulation. In Section 3, we look at quantile and expectile regression before Section 4 provides extensions and examples. Section 5 concludes the contest of David and Goliath.

2 Expectiles and quantiles

2.1 Quantiles from expectiles

Quantiles as well as expectiles uniquely define a distribution function. Let $F (y)$ denote the continuous distribution function of a univariate random variable $Y$, which for the sake of simplicity is for now assumed not to depend on any covariates $x$. The distribution is uniquely defined by the quantile function $q_{α} = q (α) = F^{- 1} (α)$ for $α \in (0, 1)$ or by the expectile function $m_{α} = m (α)$ for $α \in (0, 1)$. First we show how to numerically derive the quantile function $q (α)$ from the expectile function $m (α)$. In other words, we demonstrate how the transfer function $h (.)$ in (1.3) can be derived numerically, which in practice allows us to calculate the quantile function from a fitted expectile function $\hat{m} (α)$, say. Note first that expectiles are defined through

m_{α} = \frac{(1 - α) G (m_{α}) + α (m_{0.5} - G (m_{α}))}{(1 - α) F (m_{α}) + α (1 - F (m_{α}))}

(2.1)

which needs to be solved numerically for $F (m_{α})$. Let, therefore, $0 < α_{1} < α_{2} < \dots < α_{L} < 1$ be a dense set of knots covering the $(0, 1)$ interval. In the following we denote ${\hat{m}}_{l} = {\hat{m}}_{α_{l}}$ to simplify the notation. In principle, a fine grid of expectiles is all we need to estimate the distribution function or quantiles. If the original data are still at hand, one can set ${\hat{m}}_{0} = \min \{y_{i}, i = 1, \dots, n\} - c_{0}$ and likewise ${\hat{m}}_{L + 1} = \max \{y_{i}, i = 1, \dots, n\} + c_{L + 1}$, where $c_{0}$ and $c_{L + 1}$ serve as tuning parameters. In our simulations in Section 2.2 and in the examples in Sections 2.4 and 4.2, we set ${\hat{m}}_{0} = {\hat{m}}_{1} + ({\hat{m}}_{1} - {\hat{m}}_{2})$ and ${\hat{m}}_{L + 1} = {\hat{m}}_{L} + ({\hat{m}}_{L} - {\hat{m}}_{L - 1})$. If one chooses $α_{1}$ to be close to zero (and analogously $α_{L}$ to be close to 1), then there is obviously just a small difference between the minimal expectile and the minimal observed value of the data.
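
Relation (2.1) also holds for the empirical distribution of a sample, which offers a quick numerical check. The sketch below is an illustration under stated assumptions: it computes a sample expectile by a fixed-point iteration, plugs the empirical versions of $F$, $G$ and $m_{0.5}$ into the right-hand side of (2.1) and compares both sides. Function names and data are hypothetical, and strict inequalities are used for the empirical $F$ and $G$ to match the weights in the iteration.

```python
def sample_expectile(ys, alpha, tol=1e-12, max_iter=1000):
    """Covariate-free minimizer of (1.2) via iterated weighted means."""
    m = sum(ys) / len(ys)
    for _ in range(max_iter):
        ws = [(1 - alpha) if y < m else alpha for y in ys]
        m_new = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def rhs_of_2_1(ys, m, alpha):
    """Right-hand side of (2.1) with empirical F, G and mean plugged in."""
    n = len(ys)
    f_m = sum(y < m for y in ys) / n          # empirical F(m)
    g_m = sum(y for y in ys if y < m) / n     # empirical partial moment G(m)
    mean = sum(ys) / n                        # m_{0.5}
    num = (1 - alpha) * g_m + alpha * (mean - g_m)
    den = (1 - alpha) * f_m + alpha * (1 - f_m)
    return num / den


ys = [2.1, 0.3, 5.7, 1.8, 4.4, 3.0, 0.9]  # hypothetical toy sample
alpha = 0.1
m = sample_expectile(ys, alpha)
gap = abs(rhs_of_2_1(ys, m, alpha) - m)
print(m, gap)
```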

We now solve (2.1) for ${\hat{m}}_{l}$ , $l = 1, \dots, L,$ and denote the resulting estimator of the cumulative distribution function with ${\hat{F}}_{m} (.)$ . To obtain ${\hat{F}}_{m} (.)$ , we estimate the distribution function at the estimated expectiles ${\hat{m}}_{l}$ through

{\hat{F}}_{l} : = {\hat{F}}_{m} ({\hat{m}}_{l}) = \sum_{j = 1}^{l} {\hat{γ}}_{j}

(2.2)

for non-negative steps ${\hat{γ}}_{j} \geq 0, j = 1, \dots, L$, and ${\hat{γ}}_{L + 1} = 1 - \sum_{l = 1}^{L} {\hat{γ}}_{l} \geq 0$. Making use of linear interpolation between adjacent values of $\hat{F} (.)$ leads to ${\hat{G}}_{l} = \hat{G} ({\hat{m}}_{l}) = \sum_{j = 1}^{l} {\hat{c}}_{j} {\hat{γ}}_{j}$ with ${\hat{c}}_{j} = ({\hat{m}}_{j} + {\hat{m}}_{j - 1}) / 2$, the midpoint of the interval over which the mass ${\hat{γ}}_{j}$ is spread uniformly, and ${\hat{G}}_{L + 1} = {\hat{m}}_{0.5}$ as (linear) constraint. This setting now allows us to calculate $\hat{γ} = ({\hat{γ}}_{1}, \dots, {\hat{γ}}_{L})$ from a set of estimated expectiles. Details are given in the Appendix.

Defining the linear interpolation ${\hat{F}}_{m} (y) = \sum_{j = 1}^{l} {\hat{γ}}_{j} + {\hat{γ}}_{l + 1} (y - {\hat{m}}_{l}) / ({\hat{m}}_{l + 1} - {\hat{m}}_{l})$ for $y \in [{\hat{m}}_{l}, {\hat{m}}_{l + 1})$ allows us to invert ${\hat{F}}_{m} (.)$ to obtain quantile estimates based on estimated expectiles. We define these as

{\hat{m}}_{\hat{h} (α)} = {\hat{F}}_{m}^{- 1} (α) .

Note that with the definition of ${\hat{m}}_{\hat{h} (α)}$ for $α \in (0, 1)$, we get an explicit estimate $\hat{h} (.)$ as a by-product. This is derived by interpolating $α_{l}$ and ${\hat{F}}_{m} ({\hat{m}}_{l})$, which defines ${\hat{h}}^{- 1} (.)$, and by taking the inverse we get $\hat{h} (.)$.

We need ${\hat{γ}}_{l} \geq 0$, which must be fulfilled since ${\hat{m}}_{l} \geq {\hat{m}}_{l - 1}$. Numerical inaccuracy may yield negative values for ${\hat{γ}}_{l}$, in particular for $α_{l}$ close to 0 or 1. Estimation under the linear constraint $\hat{γ} \geq 0$ circumvents the problem. Moreover, the estimation can become numerically unstable, which is easily remedied by imposing a small penalty on the calculated values $\hat{γ}$. In fact, defining the density corresponding to $\hat{F} (\cdot)$ as $\hat{f} (\cdot)$ with $\hat{f} (y) = {\hat{γ}}_{l + 1} / ({\hat{m}}_{l + 1} - {\hat{m}}_{l})$ for $y \in [{\hat{m}}_{l}, {\hat{m}}_{l + 1})$, we want $\hat{f} (\cdot)$ to be ‘smooth’. In other words, $\hat{f} (y) - \hat{f} (y + h)$ should be small for small $h$. Given that $\hat{f} (\cdot)$ is a step function, this translates to imposing the penalty

λ_{pen} \sum_{l = 1}^{L - 1} {(\frac{{\hat{γ}}_{l}}{{\hat{m}}_{l} - {\hat{m}}_{l - 1}} - \frac{{\hat{γ}}_{l + 1}}{{\hat{m}}_{l + 1} - {\hat{m}}_{l}})}^{2} .

(2.3)
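
The penalty (2.3) is the sum of squared differences between the heights of neighbouring steps of the fitted density. A minimal Python sketch of this quantity, with hypothetical argument names and made-up inputs, could look as follows; it only evaluates the penalty and does not perform the constrained estimation described in the Appendix.

```python
def roughness_penalty(gammas, ms, lam_pen):
    """Evaluate (2.3).

    gammas: steps [gamma_1, ..., gamma_L]
    ms:     expectiles [m_0, m_1, ..., m_L] (one element more than gammas)
    """
    # step heights gamma_l / (m_l - m_{l-1}) for l = 1, ..., L
    heights = [gammas[l - 1] / (ms[l] - ms[l - 1]) for l in range(1, len(gammas) + 1)]
    # squared differences of neighbouring heights, l = 1, ..., L - 1
    return lam_pen * sum((heights[l] - heights[l + 1]) ** 2 for l in range(len(heights) - 1))


flat = roughness_penalty([0.25, 0.25, 0.25], [0.0, 1.0, 2.0, 3.0], lam_pen=2.0)
bumpy = roughness_penalty([0.2, 0.5, 0.3], [0.0, 1.0, 2.0, 3.0], lam_pen=2.0)
print(flat, bumpy)
```

A constant step height gives a penalty of zero, so the penalty shrinks the fit towards a locally uniform density rather than towards zero mass.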

Details are provided in the Appendix. Note that the calculation of quantiles from expectiles is somewhat numerically demanding. Alternative approaches to estimate quantiles from expectiles are described in (Efron, 1991) and (Schnabel and Eilers, 2013a). (Schnabel and Eilers, 2013a) propose to estimate non-crossing expectile curves using a so-called expectile bundle. Within the expectile bundle, crossing curves are prevented but, as a location-scale model is assumed, one also loses flexibility. They also describe how to estimate the density (and, therefore, quantiles) from a set of expectiles by using penalized least squares. The approach by (Efron, 1991) is a more naive way to estimate quantiles on the basis of expectiles: he proposed to estimate a large number of expectiles and to count the number of observations lying below each expectile, and he calls the resulting estimates percentiles. (Taylor, 2008) also uses this method to estimate quantiles from expectiles in order to calculate the ES. The method proposed in (Efron, 1991) clearly has the advantage that it is simple and easy to perform, but, as one can imagine, it is not very efficient. Especially for extreme values of $α$, our method is to be preferred as it leads to more precise estimates.

Conclusion: From a set of expectiles we can numerically obtain the quantile function. The method can also be applied in the regression scenario by conditioning on the explanatory variable as will be demonstrated in the article later.
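
For comparison, the simple counting approach attributed to (Efron, 1991) above can be sketched in a few lines of Python. This is the naive alternative and not the penalized procedure proposed in this section; the function names, the grid and the toy data are assumptions made for illustration.

```python
def sample_expectile(ys, alpha, tol=1e-12, max_iter=1000):
    """Covariate-free minimizer of (1.2) via iterated weighted means."""
    m = sum(ys) / len(ys)
    for _ in range(max_iter):
        ws = [(1 - alpha) if y < m else alpha for y in ys]
        m_new = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def counting_percentiles(ys, alphas):
    """For each expectile on the grid, the share of observations at or below it."""
    n = len(ys)
    pairs = []
    for a in alphas:
        m = sample_expectile(ys, a)
        pairs.append((m, sum(y <= m for y in ys) / n))
    return pairs


def quantile_by_counting(ys, alpha, alphas):
    """Smallest grid expectile whose counted share reaches alpha."""
    for m, share in counting_percentiles(ys, alphas):
        if share >= alpha:
            return m
    return max(ys)


ys = list(range(1, 100))                  # hypothetical data: 1, 2, ..., 99
grid = [k / 100 for k in range(1, 100)]   # expectile levels 0.01, ..., 0.99
shares = [s for _, s in counting_percentiles(ys, grid)]
median_estimate = quantile_by_counting(ys, 0.5, grid)
print(median_estimate)
```

Because the counted shares can only take the values $k / n$, the resulting quantile estimate is a step function of $α$, which illustrates the loss of precision mentioned above.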

2.2 Empirical evaluation

Evidently, the resulting fitted distribution function ${\hat{F}}_{m} (.)$ is continuous but has $L + 1$ non-differentiable edges. In principle, one can choose $L$ as large as $n$, but this may require heavy and numerically unstable calculations. In our experience, a sequence $0.0001, 0.001, 0.01, 0.02, \dots, 0.98, 0.99, 0.999, 0.9999$ is usually sufficient for deriving the quantiles in the range between 1% and 99%, but for large sample sizes, it may be sensible to choose $L$ proportional to $n$.
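
The default grid of asymmetry levels mentioned above can be written down directly. The following lines are a small sketch in Python (the variable names are hypothetical) and show that this grid contains $L = 103$ levels.

```python
# inner levels 0.01, 0.02, ..., 0.99 plus two extreme levels on each side
inner = [k / 100 for k in range(1, 100)]
alphas = [0.0001, 0.001] + inner + [0.999, 0.9999]
L = len(alphas)
print(L)
```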

The procedure now allows us to derive quantiles from expectiles, and the question arises how they perform in terms of efficiency. We, therefore, run a small simulation study where we estimate a number of expectiles slightly smaller than $n$ (for $n = 499$, we estimated 459 expectiles; for $n = 199$, we set $L = 159$). We simulate (a) from the standard normal distribution, (b) from the Chi-squared distribution (df = 2) and (c) from the t-distribution (df = 3) with sample sizes $n = 199$ and $n = 499$, and each simulation is replicated 1000 times. We use odd sample sizes to guarantee unique quantiles, e.g., for $α = 0.5$. We compare our quantiles from expectiles ${\hat{m}}_{\hat{h} (α)}$ with ordinary quantiles for $α = 0.01, 0.02, 0.05, \dots, 0.95, 0.98, 0.99$. The calculation of quantiles from expectiles is part of the R-package ‘expectreg’ (like all R-packages, available from cran.r-project.org). Quantiles ${\hat{q}}_{α}$ are calculated using the function rq from the R-package ‘quantreg’ by (Koenker, 2013b). We also look at smooth quantiles denoted by ${\hat{q}}_{α}^{smooth}$ and calculated using the method proposed by (Jones, 1992). For a moderate value of $L$, numerical instability does not seem to be a problem in the estimation of quantiles from expectiles. In total, the simulation includes $6 \times 1000$ calculations of quantiles from expectiles, a procedure which failed in none of the 6000 cases. Penalty (2.3) not only leads to a smooth distribution function and, therefore, to smooth quantile estimates, but also improves the numerical stability of the calculations. For $n = 499$ and 459 expectiles, the implemented function needs around seven seconds to calculate ${\hat{F}}_{m} (.)$ (using one core of an ordinary computer).

In Figure 1, we show for one (randomly chosen) sample of each distribution with $n = 499$ the fitted transfer function $h (.)$ .

Figure 1:

Source: Authors' own (prepared with GNU R).

The true function is provided for comparison and apparently the fit looks acceptable. The function $h$ itself is of secondary interest for this simulation study, but we see that $h (.)$ in fact can be estimated. Moreover, we will need the transfer function later in the example of Section 4.3 where we estimate the ES.

In Figure 2, we visualize the results of our simulations. Here we compare ${\hat{m}}_{\hat{h} (α)}$ with ${\hat{q}}_{α}$ (solid lines) and, to make a fair comparison, with ${\hat{q}}_{α}^{smooth}$ (dotted lines). As the results for $n = 199$ and $n = 499$ are very similar, Figure 2 concentrates on $n = 199$. The first plot of Figure 2 shows the simulation-based relative root mean squared error (RMSE) of the estimated quantiles for the standard normal distribution. Results for the Chi-squared distribution can be found in the second plot, and findings for the Student t-distribution are visualized in the third plot. Values above the unit line are in favour of expectiles, and Figure 2 shows a surprisingly satisfactory performance of ${\hat{m}}_{\hat{h} (α)}$. We notice the gain in efficiency for the two symmetric distributions and inner quantiles: the RMSE for quantiles from expectiles is 5–10% lower than the RMSE for smooth quantiles. Not surprisingly, the difference between quantiles and smooth quantiles becomes stronger when looking at extreme quantiles. This is also mirrored in the relative RMSE, as we see that for quantiles reflecting extreme observations the smoothing leads to an improvement. Generally, the expectile-based quantile estimators ${\hat{m}}_{\hat{h} (α)}$ behave soundly and are (for most values of $α$) more efficient than the direct quantile estimates ${\hat{q}}_{α}$. This holds in terms of relative RMSE as well as in terms of relative mean absolute error, which is not reported here as the results were quite similar.

Figure 2:

Source: Authors' own (prepared with GNU R).

Conclusion: All in all, we see that the calculation of quantiles from a set of expectiles is sensible also in terms of efficiency. The numerical burden is, of course, not negligible.

2.3 Expectiles and quantiles in the tail

As can be seen from Figure 1, we have for small values of $α$ that $h (α) ≪ α$ and accordingly, for $α$ close to 1, $(1 - h (α)) ≪ (1 - α)$, unless the distribution is heavy-tailed. For instance, the $α = 0.01$ quantile of the standard normal distribution corresponds to the $h (α) = : \tilde{α} = 0.0014524$ expectile. This raises the question whether and how well extreme expectiles can be estimated. To tackle this question formally, we look at expectiles and quantiles in the tail of the distribution by setting

α = λ / n (or α = 1 - λ / n)

(2.4)

for some $λ \geq 1$. Moreover, we assume that the tails have a reasonable interpretation in that the second-order moment of the underlying distribution is finite. The expectile estimate ${\hat{m}}_{\tilde{α}}$ is defined as a minimizer of (1.2) for $\tilde{α} = h (α)$ and we get

{\hat{m}}_{\tilde{α}} = {(\sum_{i = 1}^{n} {\hat{w}}_{i, \tilde{α}})}^{- 1} (\sum_{i = 1}^{n} {\hat{w}}_{i, \tilde{α}} Y_{i}),

(2.5)

where ${\hat{w}}_{i, \tilde{α}} = 1 - \tilde{α}$ for $Y_{i} < {\hat{m}}_{\tilde{α}}$ and ${\hat{w}}_{i, \tilde{α}} = \tilde{α}$ for $Y_{i} \geq {\hat{m}}_{\tilde{α}}$. Note that (2.5) is not an analytic definition, since the iterated weights ${\hat{w}}_{i, \tilde{α}}$ depend on the fitted value ${\hat{m}}_{\tilde{α}}$. We simplify (2.5) by replacing the ‘fitted’ weights by their ‘true’ weights $w_{i, \tilde{α}}$, defined through $w_{i, \tilde{α}} = 1 - \tilde{α}$ for $Y_{i} < m_{\tilde{α}}$ and $w_{i, \tilde{α}} = \tilde{α}$ otherwise. This allows us to approximate (2.5) by

{\hat{m}}_{\tilde{α}} \approx {(\sum_{i = 1}^{n} w_{i, \tilde{α}})}^{- 1} (\sum_{i = 1}^{n} w_{i, \tilde{α}} Y_{i}) .

(2.6)

(2.6)

Note that, as shown in the Appendix, we have $\tilde{α} ≪ α$ for $α$ close to 0 and $(1 - \tilde{α}) ≪ (1 - α)$ for $α$ close to 1. Therefore, we find $w_{α} : = E (w_{i, \tilde{α}}) = (1 - \tilde{α}) α + \tilde{α} (1 - α) \approx α$, so that with (2.4), i.e., $n w_{α} \approx λ$, we may approximate (2.6) through expansion by

{\hat{m}}_{\tilde{α}} - m_{\tilde{α}} \approx λ^{- 1} \sum_{i = 1}^{n} w_{i, \tilde{α}} (Y_{i} - m_{\tilde{α}}) -

(2.7)

λ^{- 2} \sum_{i = 1}^{n} (w_{i, \tilde{α}} - w_{α}) \sum_{j = 1}^{n} w_{j, \tilde{α}} (Y_{j} - m_{\tilde{α}}) + \dots .

(2.8)

The first component in (2.7) has mean zero and variance

V_{\tilde{α}} : = λ^{- 2} [{(1 - \tilde{α})}^{2} \{H (q_{α}) - 2 G (q_{α}) q_{α} + q_{α}^{2} α\} + {\tilde{α}}^{2} \{(H (\infty) - H (q_{α})) - 2 (m_{0.5} - G (q_{α})) q_{α} + (1 - α) q_{α}^{2}\}] .

(2.9)

Here $H (q) = \int_{- \infty}^{q} y^{2} dF (y)$ denotes the second-order partial moment function, in analogy to $G (q)$. Note that with the assumption of finite second-order moments, we have $\int_{- \infty}^{q_{α}} y^{2} f (y) dy < \infty$, which implies that $f (y) = o ({| y |}^{- 3})$ for $y \to - \infty$. This in turn yields that $α q_{α}^{2} = o (1)$ and $G (q_{α}) q_{α} = o (1)$, so that overall $V_{\tilde{α}} = o (1)$.

Looking at the second component in (2.8), we find that its mean equals $λ^{- 2} \tilde{α} (q_{α} - m_{0.5})$ while its variance is of order $O ({\tilde{α}}^{2}) O (V_{\tilde{α}})$. Arguing that $\tilde{α} ≪ α$, see the Appendix, we can conclude that the second term in (2.8) is of negligible asymptotic order for $α = λ / n \to 0$. The same holds by simple calculation for the remaining components not explicitly listed in (2.7) and (2.8). Hence, we may approximate the distribution of ${\hat{m}}_{\tilde{α}} - m_{\tilde{α}}$ through

{\hat{m}}_{\tilde{α}} - m_{\tilde{α}} \approx λ^{- 1} \sum_{i = 1}^{n} w_{i, \tilde{α}} (Y_{i} - m_{\tilde{α}}) .

(2.10)

In particular, with (2.10), we get the (asymptotic) unbiasedness $E ({\hat{m}}_{\tilde{α}}) = m_{\tilde{α}}$. One may even derive asymptotic normality from (2.10) by showing that higher order moments vanish. We can, therefore, conclude that even for extreme expectiles, i.e., for $\tilde{α} = h (α)$ or $1 - \tilde{α} = 1 - h (α)$ very small, respectively, we achieve asymptotic unbiasedness and normality.

Figure 3:

Source: Authors' own (prepared with GNU R).

We now pose the same question to quantiles, i.e., what can be said asymptotically about quantile estimation in the tails of the distribution. Following (Koenker and Bassett, 1978) and (Koenker, 2005, pp. 71–72), we can derive the distribution of the quantile estimate ${\hat{q}}_{α}$ as follows. Let $g_{α} (q) = \frac{1}{n} \sum_{i = 1}^{n} 1 \{Y_{i} \leq q\} - α$ with $1 \{.\}$ as an indicator function; then ${\hat{q}}_{α}$ is defined through $g_{α} ({\hat{q}}_{α}) \geq 0$ and $g_{α} ({\hat{q}}_{α} - δ) < 0$ for all $δ > 0$. Hence

P ({\hat{q}}_{α} - q_{α} \leq δ) = P (\sum_{i = 1}^{n} 1 \{Y_{i} \leq q_{α} + δ\} \geq n α) = 1 - P (Z_{δ} < n α),

(2.11)

where $Z_{δ}$ is a binomial random variable, $Z_{δ} \sim Bin (n, F (q_{α} + δ))$. Note that $F (q_{α} + δ) \approx α + f (q_{α}) δ$, and with (2.4) we have that the distribution of $Z_{δ}$ (for small $δ$) converges to a Poisson distribution. As a consequence, for extreme quantiles, we do not achieve asymptotic normality and, therefore, unbiasedness is not guaranteed. We can easily calculate the limit of $P ({\hat{q}}_{α} \leq q_{α})$, which equals $1 - P (Z \leq λ)$ for $Z \sim Poisson (λ)$. For instance, for $λ = 1$, this equals 0.26, which mirrors the skewness of the distribution of extreme quantiles. Note, of course, that we may use extreme value theory to derive the asymptotic distribution of ${\hat{q}}_{α}$.
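
The value 0.26 quoted above for $λ = 1$ can be reproduced with a few lines of Python. The sketch evaluates the stated limit $1 - P (Z \leq λ)$ for $Z \sim Poisson (λ)$ and, as an additional check that is not part of the original argument, the corresponding binomial probability for $n = 1000$ and success probability $λ / n$; the function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_cdf(k, lam):
    """P(Z <= k) for Z ~ Poisson(lam)."""
    return sum(exp(-lam) * lam ** j / factorial(j) for j in range(k + 1))


def binomial_cdf(k, n, p):
    """P(Z <= k) for Z ~ Bin(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


lam, n = 1, 1000
limit_value = 1 - poisson_cdf(lam, lam)            # Poisson limit
finite_n_value = 1 - binomial_cdf(lam, n, lam / n)  # binomial with alpha = lam / n
print(round(limit_value, 4), round(finite_n_value, 4))
```

Both numbers agree to about three decimal places, which indicates that the Poisson limit is already a close approximation for $n = 1000$.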

We run a small simulation to study the performance of tail expectile and tail quantile estimation. We simulate data and look at the distribution of extreme quantiles. In Figure 3 we show the distribution of ${\hat{q}}_{α} - q_{α}$ (solid line) and ${\hat{m}}_{\tilde{α}} - m_{\tilde{α}}$ (dashed line) for a sample size of $n = 1000$ . We look at the $α = 0.999$ (top row) and the $α = 0.99$ (bottom row) quantiles and the corresponding expectile. We simulate from (a) a normal distribution (left column), (b) a Chi-squared (with two degrees of freedom, middle column) and (c) a t-distribution (with three degrees of freedom, right-hand side column). The vertical line indicates the mean value of ${\hat{q}}_{α} - q_{α}$ (solid line) and ${\hat{m}}_{\tilde{α}} - m_{\tilde{α}}$ (dashed line) which should be zero to indicate unbiasedness. There is apparently a bias occurring for quantile estimation for $α$ close to 1 (or close to zero).

Conclusion: Overall, we may conclude that expectile estimates behave stably even for very small or very large values of $α$. This is, of course, important to know if one uses a sequence including even extreme expectiles to estimate quantiles, as suggested in the previous subsection.

2.4 Example

To illustrate expectiles and the conversion of expectiles to quantiles, we give a short example. We apply our methods to data collected in 2012 in Munich to construct the Munich rent index. The full data set consists of 3080 observations, i.e., rented apartments in Munich, Germany, and we here analyze the variable giving the net rent per square metre $(m^{2})$ for each apartment. For illustration, we restrict our attention to apartments between 45 $m^{2}$ and 55 $m^{2}$ and examine the net rent per $m^{2}$ for the resulting 421 apartments of that size in the data set.

First we calculate a fine grid of sample expectiles and quantiles for the variable net rent per $m^{2}$ and plot them in Figure 4 on the left. The estimated expectiles naturally form a smooth curve, while the estimated quantiles show some variability. In a next step, we use the set of expectiles to calculate quantiles from expectiles and plot these against the empirical quantiles (see the right part of Figure 4). When zooming into the figure, one notices that the estimated quantiles for the inner range of $α \in (0, 1)$ nearly coincide with their empirical counterparts, whereas for extreme values of $α$ there is some minor fluctuation around the identity line. This behaviour is also supported by our simulation results in Section 2.2. All in all, we see the applicability of the approaches. The example will be further discussed in Section 4.2 by regressing the rent on the floor space.

Figure 4:

Source: Authors' own (prepared with GNU R).

3 Quantile and expectile regression

3.1 The problem of crossing quantiles and expectiles

So far we have considered the simple scenario with no explanatory variables involved. We extend this now to quantile and expectile regression. To do so, we assume a continuous covariate $x$ and define the quantile and expectile regression functions through $q_{α} (x) = β_{0 α}^{(q)} + x β_{1 α}^{(q)}$ and $m_{α} (x) = β_{0 α}^{(m)} + x β_{1 α}^{(m)}$ . Estimation of $q_{α} (x)$ and $m_{α} (x)$ is carried out using the weighted $L_{1}$ sum (1.1) and the corresponding $L_{2}$ version (1.2), respectively, with $q_{i, α} = q_{α} (x_{i})$ and $m_{i, α} = m_{α} (x_{i})$ .

A central problem occurring in quantile and expectile regression is the crossing of fitted functions. For $0 < α < α^{'} < 1$, by definition, we have $q_{α} (x) < q_{α^{'}} (x)$ and $m_{α} (x) < m_{α^{'}} (x)$ for all $x$. This inequality can, however, be violated for some (observed) $x$ values in the fitted functions, which is called the crossing quantile or expectile problem. Several remedies have been proposed to circumvent the problem, some of which will be used in the next section. Before turning to that, we want to explore empirically in simulations how frequently one is faced with crossing quantile and expectile functions. We run a small simulation study and count the number of crossings between neighbouring fitted expectiles and quantiles. To this end, we select the set $α \in \{0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.8, 0.9, 0.95, 0.98, 0.99\}$ and simulate data from the following simple linear regression setting

y = 4 + 3 x + ε .

(3.1)

The covariate $x$ is drawn from a uniform $U (- 1, 1)$ distribution, and the random error is added from either (a) a normal $N (0, 1.5^{2})$, (b) a Chi-squared ($df = 2$) or (c) a t-distribution ($df = 3$). Data sets are generated with sample sizes of $n = 49, 199, 499$ and for each combination of settings 1000 replications are created. A data set is then analyzed by computing the set of $α$-quantiles using the R-package ‘quantreg’. For expectiles, we compute the resulting $\tilde{α} = h (α)$ expectiles using the R-package ‘expectreg’. Function $h (.)$ is computed separately for each error distribution according to (1.3), which makes the estimates comparable. For each data set and every generated covariate value, all neighbouring pairs of $α$ and $h (α)$ are checked for crossing regression lines within the range of observed covariates.
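The crossing check for linear fits can be sketched as follows. This is only an illustration under the assumption that each fitted curve is summarized by an (intercept, slope) pair; the helper names `lines_cross` and `count_crossing_pairs` and the example coefficients are hypothetical. Because the gap between two straight lines is itself linear in $x$, it suffices to inspect the two ends of the observed covariate range.

```python
def lines_cross(lower, upper, x_min, x_max):
    """Return True if the 'lower' line exceeds the 'upper' line somewhere on
    [x_min, x_max]. Each line is an (intercept, slope) pair."""
    def gap(x):
        return (upper[0] + upper[1] * x) - (lower[0] + lower[1] * x)
    # The gap is linear in x, so its minimum over the interval sits at an endpoint.
    return min(gap(x_min), gap(x_max)) < 0.0


def count_crossing_pairs(fits, x_values):
    """Count neighbouring pairs of fitted lines that cross within the observed
    covariate range. 'fits' must be ordered by increasing alpha."""
    x_min, x_max = min(x_values), max(x_values)
    return sum(lines_cross(lo, hi, x_min, x_max)
               for lo, hi in zip(fits, fits[1:]))


# Three hypothetical fitted lines for increasing alpha on x in [-1, 1].
fits = [(3.0, 3.0), (3.5, 2.0), (5.0, 3.0)]
n_crossings = count_crossing_pairs(fits, [-1.0, -0.2, 0.4, 1.0])
```

In this example only the first pair crosses (near $x = 0.5$), so one crossing is counted.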

Table 1: Number of crossings between two neighbouring expectiles/quantiles in the linear model (3.1) from 1000 data sets, starting with $α = 0.01$. Crossing counts are given for all 10 pairs of expectiles or quantiles, respectively, sample sizes of $n = 49, 199, 499$ and the three error distributions for $ε$, as defined previously. Quantiles smaller than $1 / n$ are omitted and indicated as $*$ in the table.

Expectiles
$ε \sim$	$N (0, 1.5^{2})$			$χ^{2} (2)$			$t (3)$
Pair of $h (α)$ levels, given by $α$ / $n$	49	199	499	49	199	499	49	199	499
0.01–0.02	$*$	21	5	$*$	56	1	$*$	49	48
0.02–0.05	54	2	0	207	3	0	22	35	24
0.05–0.1	8	0	0	44	0	0	7	18	3
0.1–0.2	1	0	0	4	0	0	5	5	2
0.2–0.5	0	0	0	0	0	0	1	0	0
0.5–0.8	0	0	0	1	0	0	0	0	0
0.8–0.9	0	0	0	5	0	0	10	3	1
0.9–0.95	8	0	0	19	0	0	14	11	2
0.95–0.98	48	2	0	42	0	0	23	27	23
0.98–0.99	$*$	27	3	$*$	27	1	$*$	53	45
Quantiles
$ε \sim$	$N (0, 1.5^{2})$			$χ^{2} (2)$			$t (3)$
Pair of $α$ levels / $n$	49	199	499	49	199	499	49	199	499
0.01–0.02	$*$	564	210	$*$	593	226	$*$	612	296
0.02–0.05	714	123	10	758	143	16	679	208	27
0.05–0.1	443	46	0	439	46	0	377	78	3
0.1–0.2	144	6	0	167	3	0	156	2	0
0.2–0.5	5	0	0	5	0	0	3	0	0
0.5–0.8	3	0	0	13	0	0	4	0	0
0.8–0.9	168	2	0	154	5	0	166	7	0
0.9–0.95	433	37	1	432	40	1	411	78	3
0.95–0.98	680	130	12	718	175	18	704	214	37
0.98–0.99	$*$	593	213	$*$	599	199	$*$	647	267

Source: Authors' own (prepared with GNU R).

The resulting number of crossings within the 1000 replications is summarized in Table 1. Not surprisingly, crossings occur mainly in the tails of the distribution and become less frequent with increasing sample size. However, the numbers show that there are generally fewer crossings of expectiles, in particular, within the central $90\%$ of the distribution, while for quantile regression we obtain a large proportion of crossings for small samples even in the inner part of the distribution, i.e., between the 0.2 and 0.8 quantiles.

Conclusion: We may conclude from the simulation that expectile estimates seem less vulnerable to crossing problems than quantile estimates.

3.2 Non-crossing spline-based estimation

Several remedies have been suggested to circumvent or correct for crossing quantiles, with references given in the introduction. Here, we take up the idea of Bondell et al. (2010), who fit non-crossing quantiles using linear programming, and generalize it towards non-crossing spline-based expectile estimation. To do so, we first present spline-based quantile and expectile estimation by replacing $q_{α} (x)$ and $m_{α} (x)$ with bivariate functions

q (α, x) and m (α, x),

(3.2)

where both functions are smooth (or just linear) in the direction of $α$ and smooth (or just linear) in the direction of $x$. The bivariate functions may be called quantile sheets or expectile sheets. The setting (3.2) transfers the estimation exercise to bivariate smoothing, as proposed in Schnabel and Eilers (2013b) and Schnabel and Eilers (2009a). We replace (or approximate) $q (α, x)$ through

q (α, x) = [B^{(1)} (α) \otimes B^{(2)} (x)] u,

(3.3)

where $B^{(1)} (α)$ is a (linear) B-spline basis set up on knots $0 < α_{1} < \dots < α_{L} < 1$, $B^{(2)} (x)$ is a B-spline basis built upon some knots $τ_{1} < \dots < τ_{K}$ covering the range of observed values of $x$, and $u$ is the vector of coefficients. If $q (α, x)$ is assumed to be linear in $x$, one may take $B^{(2)} (x)$ as a linear B-spline basis and set $K = 2$ in this case. Let $l = 1, \dots, L$ be the indices of columns of $B^{(1)} (α)$, and $k = 1, \dots, K$ the indices of columns of $B^{(2)} (x)$. Vector $u$ may then be indexed by $u_{lk}$ for $l = 1, \dots, L$, $k = 1, \dots, K$, and we let $u_{l .} = {(u_{l 1}, \dots, u_{l K})}^{T}$. Non-crossing quantiles are now guaranteed by linear constraints on the parameter vector of the form

B^{(2)} (x) (u_{l .} - u_{l + 1 .}) \leq 0 for l = 1, \dots, L - 1

(3.4)

for all $x$ in the observed range of the covariates. If $B^{(2)} (x)$ is a linear B-spline basis, this simplifies to $u_{l k} \leq u_{l + 1 k}$ for $l = 1, \dots, L - 1$ and $k = 1, \dots, K$. In general, we can formulate (3.4) as a linear constraint by inserting for $x$ the observed values $x_{i}, i = 1, \dots, n$.
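For the linear case with $K = 2$, the simplified constraint can be illustrated by a small sketch. It assumes a linear B-spline basis on two knots whose two basis functions are nonnegative and sum to one; the helper names and the coefficient values are hypothetical. Since the basis functions are nonnegative, ordered coefficients imply ordered curves over the whole knot range.

```python
def linear_basis(x, tau1, tau2):
    """Linear B-spline basis on two knots (K = 2), evaluated at x."""
    t = (x - tau1) / (tau2 - tau1)
    return [1.0 - t, t]


def curve_value(u_row, x, tau1, tau2):
    """Evaluate B2(x) u_l. for the coefficients u_l. of one alpha-knot."""
    return sum(b * c for b, c in zip(linear_basis(x, tau1, tau2), u_row))


def satisfies_noncrossing(u):
    """Check u[l][k] <= u[l + 1][k] for all neighbouring rows l and columns k."""
    return all(a <= b
               for row, nxt in zip(u, u[1:])
               for a, b in zip(row, nxt))


# Hypothetical coefficients u[l][k] for L = 3 alpha-knots and K = 2.
u_ok = [[1.0, 7.0], [2.0, 7.5], [2.5, 9.0]]
u_bad = [[1.0, 7.0], [2.0, 6.5]]  # second column decreases: curves cross

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
ordered_everywhere = all(
    curve_value(lo, x, -1.0, 1.0) <= curve_value(hi, x, -1.0, 1.0)
    for lo, hi in zip(u_ok, u_ok[1:])
    for x in grid
)
```

With `u_bad`, the constraint fails and the two curves indeed cross before the right boundary $x = 1$.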

We can now fit $q (α, x)$ by replacing (1.1) with its multiple version

\sum_{l = 1}^{L} \sum_{i = 1}^{n} w_{i, α_{l}} |y_{i} - B^{(1)} (α_{l}) \otimes B^{(2)} (x_{i}) u|

(3.5)

which is minimized with respect to $u$ subject to the linear constraints (3.4) using linear programming. Alternatively, one may work with iterated weighted least squares by using the fact that $| y - q_{α} | = {(\sqrt{{(y - q_{α})}^{2}})}^{- 1} {(y - q_{α})}^{2}$. Schnabel and Eilers (2013b) change the weight from $w_{i, α}$ to $w_{i, α} {(\sqrt{{(y - q_{α})}^{2}})}^{- 1}$ and apply iterated weighted least squares to fit the function $q (α, x)$.

Replacing the $L_{1}$ distance in (3.5) by the $L_{2}$ distance

\sum_{l = 1}^{L} \sum_{i = 1}^{n} w_{i, α_{l}} {(y_{i} - B^{(1)} (α_{l}) \otimes B^{(2)} (x_{i}) v)}^{2}

(3.6)

gives a weighted least-squares criterion which allows us to estimate the expectile sheet $m (α, x) = [B^{(1)} (α) \otimes B^{(2)} (x)] v$, where again the linear constraints (3.4), now applied to $v$, need to be fulfilled. Estimation can be carried out by iterative quadratic programming.
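The reweighting idea that turns the $L_{1}$ criterion into iterated weighted least squares can be sketched for the simplest case of a single sample quantile without covariates. This is an illustrative sketch only, not the implementation of Schnabel and Eilers (2013b): the function name `irls_quantile` is hypothetical, the weights are assumed to be $α$ above and $1 - α$ below the current value, and the small constant `eps` that guards against division by zero is an added assumption.

```python
def irls_quantile(y, alpha, n_iter=200, eps=1e-8):
    """Approximate the alpha-quantile of a sample by iterated weighted least
    squares, using |r| = r^2 / |r| with |r| taken from the previous iterate."""
    q = sum(y) / len(y)  # start from the mean
    for _ in range(n_iter):
        # Asymmetric weight divided by the current absolute residual.
        w = [(alpha if yi > q else 1.0 - alpha) / max(abs(yi - q), eps)
             for yi in y]
        # Weighted least squares for a constant is the weighted mean.
        q = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return q


sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_estimate = irls_quantile(sample, 0.5)
```

For this sample the iterations move from the mean (22) towards the median (3), which shows how the reweighting removes the influence of the outlying value.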

Figure 5:

Source: Authors' own (prepared with GNU R).

In Figure 5, we show an example for the simulation model (3.1) of the previous subsection. We plotted the resulting quantile and expectile sheets under the assumption of normally distributed residuals. We use a linear B-spline basis for $x$ with $K = 2$. For every value of $x$, we have increasing (or nondecreasing) functions $q (α, x)$ and $m (α, x)$ in $α$. A simple visual impression shows that the quantile sheet is more variable compared to the fitted expectile sheet. Note that we can now calculate for each value of $x$ a set of expectiles ${\hat{m}}_{α_{l}} (x) = \hat{m} (α_{l}, x)$, which allows us to apply the results of Section 2 to derive quantiles based on expectiles. The code for calculating the linear non-crossing quantile curves by Bondell et al. (2010) is available from the homepage of Howard Bondell (see http://www4.stat.ncsu.edu/~bondell/Software/NoCross/NoCrossQuant.R , last date of access 10 January 2014). The programme for fitting non-crossing expectiles is part of the R package ‘expectreg’ by Sobotka et al. (2013).

Conclusion: For both expectiles and quantiles, we can fit sheets guaranteeing non-crossing functions. Overall, the expectile sheet provides a smoother surface than the quantile sheet, in particular, in the direction of $α$.

4 Extensions and examples

4.1 Penalized smooth expectile sheets

Returning to the expectile sheet $m (α, x)$, we may assume that $m (α, x)$ is smooth in $x$, but without any parametric (linear) assumption. This can be fitted with a B-spline basis as in Section 3.2, but now with $K$ being large. In order to control for a smooth and numerically stable fit, one may impose a penalty on the coefficients in the style of penalized spline regression (see Ruppert et al., 2003, 2009). In other words, we supplement (3.6) by the quadratic penalty

λ_{v} v^{T} D^{T} D v,

where $D^{T} D$ is an appropriately chosen penalty matrix, and $λ_{v}$ is the smoothing parameter, which is chosen in a data-driven manner. We give an example in the next subsection. This was first proposed by Schnabel and Eilers (2009a) with an additional penalization in the direction of $α$. For a specific value of $α$, this has been proposed in Sobotka et al. (2012) for expectile smoothing or in Bollaerts et al. (2006) for quantile smoothing, where the latter use a different penalization. The smoothing parameter $λ_{v}$ can be chosen by asymmetric cross-validation or the Schall algorithm for mixed models as described in Schnabel and Eilers (2009b).
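One common choice for $D$ in penalized spline regression is a difference matrix acting on neighbouring coefficients. The sketch below assumes such a difference penalty for the coefficients of a single $α$-knot; the text above only requires an "appropriately chosen" penalty matrix, so this particular $D$, the function names and the default order are illustrative assumptions.

```python
def difference_matrix(n_coef, order=2):
    """Return the (n_coef - order) x n_coef matrix D of order-th differences."""
    d = [[1.0 if i == j else 0.0 for j in range(n_coef)]
         for i in range(n_coef)]
    for _ in range(order):
        # Differencing neighbouring rows raises the difference order by one.
        d = [[b - a for a, b in zip(row, nxt)] for row, nxt in zip(d, d[1:])]
    return d


def penalty(v, lam, order=2):
    """Evaluate lam * v' D' D v, computed as lam * ||D v||^2."""
    d = difference_matrix(len(v), order)
    dv = [sum(dij * vj for dij, vj in zip(row, v)) for row in d]
    return lam * sum(e * e for e in dv)


d2 = difference_matrix(4, order=2)
linear_penalty = penalty([1.0, 2.0, 3.0, 4.0], lam=5.0)          # linear coefficients
rough_penalty = penalty([0.0, 0.0, 1.0, 0.0], lam=2.0, order=1)  # a single spike
```

A second-order difference penalty leaves linear coefficient sequences unpenalized, so a large $λ_{v}$ shrinks the fit towards a straight line in $x$ rather than towards a constant.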

Figure 6:

Source: Authors' own (prepared with GNU R).

4.2 Rent index of Munich

To see how the method performs in practice, we again take a look at the Munich rental data from Section 2.4. As a reminder, the data consist of 3080 observations, i.e., rented apartments in Munich, Germany. We consider two variables in our example: net rent per $m^{2}$ as response and living space measured in $m^{2}$ as covariate. First, we perform both a non-crossing and non-parametric expectile regression as described in Sections 3.2 and 4.1, respectively. Our underlying model is given by the expectile sheet $m (α, \text{living space})$. The smoothing parameter $λ_{v}$ was chosen automatically by using the Schall algorithm (see Schnabel and Eilers, 2009b). The sheet resulting from the estimation procedure is shown in Figure 6. As one can see, the dependency between the two variables is clearly non-linear. The amount of smoothing done for the expectile sheet seems appropriate. The sheet serves as a basis to calculate quantile estimates for the values $x = 25, 30, \dots, 155$ and $α = 0.01, 0.02, 0.05, 0.10, 0.20, 0.50, 0.80, 0.90, 0.95, 0.98, 0.99$. We apply the algorithm as described in Section 2 and obtain the mid-panel of Figure 6. The calculated values for the quantiles are indicated by points which are connected by lines. All in all, the quantiles from expectiles seem to behave well. We can see that there is a decrease in net rent per square metre as the apartment size grows. This continues for apartments up to size 100 $m^{2}$, but then net rent remains, more or less, constant. A nice feature of our conversion is that non-crossing of quantiles is guaranteed.

As an alternative to the expectile-based analysis, we apply smooth non-crossing spline-based quantile fitting as described in Muggeo et al. (2013) and implemented in the R-package ‘quantregGrowth’. Here, we choose cubic B-splines with a penalization through second-order differences. Muggeo et al. (2013) guarantee non-crossing of quantile curves by imposing inequality constraints on the spline coefficients. As the R-package allows us to estimate growth charts, there is also the possibility to enforce monotonicity in the direction of the dependent variable. The resulting fit is shown in the right part of Figure 6. All in all, spline-based quantiles exhibit similar features as the quantiles from expectiles, although the quantile smoothing spline, due to its $L_{1}$ nature, is angled. For the non-crossing quantile smoothing, we decided to pick a smoothing parameter which would result in a smoothness comparable to the amount of smoothing mirrored in the second panel of Figure 6.

4.3 Expected shortfall

Investment risks are frequently measured using the expected shortfall (ES), a stochastic risk measure, which for the lower tail is defined as $ES (α) = E (Y | Y < q_{α})$ for a continuous random variable $Y$ with $α$-quantile $q_{α}$. It measures the expectation given that the random variable does not exceed a fixed value and is often applied to financial time series. A naïve estimate would calculate the mean beyond a previously estimated quantile and would, therefore, be rather inefficient. Taylor (2008) presents a possibility to estimate the ES using expectiles and their connection to quantiles.

Note that the $\tilde{α}$ -expectile is implicitly defined through arg min $E (w_{i, \tilde{α}} {(y_{i} - m)}^{2})$ so that the expectile satisfies

\frac{1 - 2 \tilde{α}}{\tilde{α}} E [(Y - m_{\tilde{α}}) I (Y < m_{\tilde{α}})] = m_{\tilde{α}} - E (Y),

(4.1)

where, as above, $\tilde{α} = h (α)$. That is, the expectile $m_{\tilde{α}}$ is determined by the expectation of the random variable $Y$ conditional on $Y < m_{\tilde{α}}$. Rewriting (4.1) and using the fact $F (m_{\tilde{α}}) = α$ leads to

{ES}_{low} (α) : = E (Y | Y < q_{α}) = (1 + \frac{\tilde{α}}{(1 - 2 \tilde{α}) α}) m_{\tilde{α}} - \frac{\tilde{α}}{(1 - 2 \tilde{α}) α} m_{0.5}

(4.2)

for the lower tail of $F$. Depending on whether the random variable describes a win or a loss, we define the ES for the upper tail as ${ES}_{up} (α) = E (Y | Y > m_{1 - \tilde{α}})$. In order to determine the appropriate $\tilde{α}$ for a given $α$, Taylor (2008) estimates a dense set of expectiles and then constructs an empirical distribution function on the basis of the expectile curves. Here, we make use of the results derived in Section 2.1 and estimate the distribution function ${\hat{F}}_{m} (.)$ from expectiles. As introduced in Section 2.1, we estimate a dense set of expectiles (i.e., we set $α_{l} = 0.0005, 0.001, 0.005, 0.01, 0.02, \dots, 0.98, 0.99, 0.995, 0.999, 0.9995$) and compute the cumulative distribution function at each observed covariate value. The estimated distribution allows us to determine the $\tilde{α}$ value for a given $α$-quantile. We then calculate the ES explicitly for certain values of $α$, e.g., $α = 0.01, 0.05, 0.95, 0.99$.
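Relation (4.2) can be checked numerically on an empirical distribution. The sketch below is a simplified illustration, not the estimation procedure of this section: it takes a threshold $m$ that plays the role of both the $α$-quantile and the $\tilde{α}$-expectile, derives $\tilde{α}$ from the first-order condition of the asymmetric least-squares criterion (assuming weight $\tilde{α}$ above and $1 - \tilde{α}$ below $m$), and compares (4.2) with the direct tail mean. All function names are hypothetical.

```python
def expectile_level(sample, m):
    """Level tau for which m is the tau-expectile of the empirical distribution,
    from (1 - tau) * E[(m - Y)_+] = tau * E[(Y - m)_+]."""
    below = sum(m - y for y in sample if y < m)
    above = sum(y - m for y in sample if y > m)
    return below / (below + above)


def expected_shortfall_low(sample, m):
    """Lower-tail ES via (4.2); requires m below the sample mean (tau < 0.5)
    and at least one observation below m (alpha > 0)."""
    n = len(sample)
    alpha = sum(1 for y in sample if y < m) / n  # F(m) = alpha
    tau = expectile_level(sample, m)             # tilde alpha
    mean = sum(sample) / n                       # m_0.5
    c = tau / ((1.0 - 2.0 * tau) * alpha)
    return (1.0 + c) * m - c * mean


sample = [float(v) for v in range(1, 11)]  # 1, 2, ..., 10
threshold = 3.5                            # 30% of the sample lies below it
es_formula = expected_shortfall_low(sample, threshold)
tail = [y for y in sample if y < threshold]
es_direct = sum(tail) / len(tail)          # mean of 1, 2, 3
```

Both routes give the same value here because (4.2) is an exact identity once $m_{\tilde{α}} = q_{α}$; in the application the efficiency gain comes from estimating $m_{\tilde{α}}$ and $\tilde{α}$ from smooth expectile fits rather than from a few tail observations.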

We apply the idea and estimate the ES for a time series of daily yields of the French stock index CAC40 in the time period between 1991 and 1998. All in all, there are 1860 observations/trading days and we take time $t$ as the covariate influencing the ES. For estimation, we, therefore, construct the expectile sheet $m (α, t)$. As basis in $t$, we use a cubic B-spline basis with 20 inner knots to account for the variability in time. However, in order to give a risk prediction for the next observations, we place equidistant knots from $\min (t)$ to $\max (t) + 0.02 (\max (t) - \min (t))$. That way, we get an estimated risk for the upcoming day(s), i.e., we pursue out-of-range prediction. To achieve smoothness in time, we add a penalty of first-order differences $λ_{v} v^{T} D^{T} D v$ to (3.6), where $D v$ has rows $v_{l k} - v_{l, k - 1}$ for $l = 1, \dots, L$ and $k = 2, \dots, K$. The optimal smoothing parameter $λ_{v}$ is chosen via asymmetric cross-validation; see Sobotka et al. (2012) for a more extensive description. Next, we apply the algorithm presented in Section 2 to all observed covariate values and also to the added time points beyond the data. This delivers the estimated $α$-quantile and its corresponding $\tilde{α}$-expectile. In turn, we are able to estimate the expected shortfall (4.2) for each point in time. This is done sequentially, that is, observation by observation, to gather information about the changes in the distribution.

The result of the estimation is presented in Figure 7 (left part). For comparison, we also fit the ES based on the empirical distribution as suggested by Taylor (2008) (right part of Figure 7). As can be seen, the volatility of the data is captured by the curves of the ES, for gains as well as for losses. A generalization over the range of time can also be observed. When using the empirical distribution function, on the other hand, especially the curves for $α = 0.05, 0.95$ tend towards overfitting. This is particularly visible for the time of low volatility around day 1300. The small amount of prediction incorporated by the splines turns out to be just a linear extension of the last fits. For accurate predictions, one should aim to combine conditional autoregressive expectiles (CARE, Kuan et al., 2009), which are able to account for the autocorrelation in the data, with the methods introduced in this article. Still, the example shows that an improvement in ES estimation is possible when using the efficient distribution estimation introduced in Section 2.1.

Figure 7:

Source: Authors' own (prepared with GNU R).

5 Discussion

In this article, we looked at quantiles, as Goliath, and expectiles, as David, and explored how their connection can be used in practice. An algorithm was presented to estimate quantiles from a (fine) grid of expectiles. We examined properties of extreme quantiles and expectiles and discussed the crossing issue of quantile and expectile regression. Since crossing of neighbouring curves is an issue for both, we proposed a modified method to circumvent this problem. All methods regarding expectiles which were described in detail in this article can be found in the R-package ‘expectreg’.

Returning to the comparison of expectiles and quantiles with David and Goliath, the contest remains undecided. There is no final fight, and research on both ends continues. It is certainly true that quantiles are dominant in the literature, but we wanted to show that expectiles are an interesting alternative to quantiles and that their combined use is helpful, in particular, for the estimation of the ES. We also demonstrated the use of quantile and expectile sheets as smooth variants of quantile and expectile regression, respectively. This accommodates quite naturally the constraints of non-crossing quantile and expectile curves, and the latter allows for smooth expectile regression based on implemented software, as mentioned above. Also, expectile regression can now be performed without losing interpretability, since quantiles can be estimated from expectiles.

All in all, we hope to have convinced the reader that expectiles do not immediately ‘belong in the spittoon’, as Koenker (2013a) provocatively postulates. We think that expectiles provide an interesting and worthwhile alternative to the well-established quantile regression.

Acknowledgments

Financial support from the German Research Foundation (DFG) grant KA 1188/7-1 is gratefully acknowledged. We thank two anonymous reviewers for constructive remarks which led to an improved version of this article.

Relation of $α$ and $\tilde{α}$ for $α \to 0$

We assume that $y$ has finite second moments. Then, with (1.3),

\frac{α}{\tilde{α}} = \frac{- α m_{0.5} + 2 α G (q_{α}) + α (1 - 2 α) q_{α}}{- α q_{α} + G (q_{α})} .

(A.1)

Since the numerator and denominator both tend to zero for $α \to 0$, we apply the rule of de l'Hospital. Observing that $α q_{α} = o (1)$ for $α \to 0$, we get

\lim_{α \to 0} \frac{α}{\tilde{α}} = \lim_{α \to 0} - \frac{f (q_{α}) (q_{α} - m_{0.5}) + α}{α} > 1 .

(A.2)

Hence $\tilde{α} < α$ for $α \to 0$. Note that since $f (q_{α}) = o ({| q_{α} |}^{- 3})$, which follows from the existence of second-order moments, we find again that numerator and denominator of (A.2) converge to zero for $α \to 0$. Assuming now that $f (q_{α})$ is proportional to ${| q_{α} |}^{- 3}$ for $α \to 0$, which is required to guarantee finite second-order moments, we get, again with the rule of de l'Hospital applied to (A.2), that $\lim_{α \to 0} α / \tilde{α} = const \geq 1$, while if $f (q_{α})$ is proportional to ${| q_{α} |}^{- (3 + δ)}$ for some $δ > 0$, we get with the same arguments that $\lim_{α \to 0} α / \tilde{α} = \infty$.

Estimation of ${\hat{F}}_{m} (.)$

Let $0 < α_{1} < \dots < α_{L} < 1$ be a dense set of knots covering $(0, 1)$ and containing $0.5$, and denote by $l_{0}$ the index with $α_{l_{0}} = 0.5$. First note that (2.1) for $α_{l_{0}}$ gives redundant information, as it states that $m_{0.5} = m_{0.5}$. That is to say, we need an additional constraint. This is found by observing that

m_{0.5} = \int_{- \infty}^{\infty} y dF (y) = \frac{m_{L} + m_{L + 1}}{2} + \sum_{l = 1}^{L} (c_{l} - c_{L + 1}) γ_{l}

(A.3)

with the approximation of $F (.)$ from Section 2.1. Remembering the definition of the expectiles (2.1), we define function $g_{l} (.)$ by

g_{l} (γ_{l}, \dots, γ_{1}) = m_{l} - \frac{(1 - α_{l}) G_{l} (γ_{l}, \dots, γ_{1}) + α_{l} (m_{0.5} - G_{l} (γ_{l}, \dots, γ_{1}))}{(1 - α_{l}) F_{l} (γ_{l}, \dots, γ_{1}) + α_{l} (1 - F_{l} (γ_{l}, \dots, γ_{1}))} \quad \text{for } l = 1, \dots, L .

(A.4)

We now need $γ_{1}, \dots, γ_{L}$ such that $g_{l} \equiv 0$, which in principle can be seen as a root finding problem. We implemented a version where we minimize the sum of squares of $g_{l} (γ)$ under certain restrictions. We face the minimization problem

\min_{γ_{1}, \dots, γ_{L}} S (γ_{1}, \dots, γ_{L}) = \min_{γ_{1}, \dots, γ_{L}} \sum_{l = 1}^{L} g_{l}^{2} (γ_{l}, \dots, γ_{1})

(A.5)

under the constraints that $γ_{l} \geq 0$ and $\sum_{l = 1}^{L} γ_{l} \leq 1$, which is solved by Newton's method in optimization and also implemented in the R-package ‘expectreg’ by Sobotka et al. (2013). Penalty parameter $λ_{pen}$, which ensures numerical stability and smoothness of the distribution function, may be set equal to the squared empirical variance of the data from which the expectiles are estimated. In our simulations, we set $λ_{pen}$ equal to five times the squared empirical variance of the data (for each of the three distributions considered and for both sample sizes $n = 199$ and $n = 499$).
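The structure of the residual functions in (A.4) and the criterion in (A.5) can be illustrated with a simplified stand-in. The sketch below replaces the approximation of $F (.)$ from Section 2.1 by a plain discrete distribution with given support points and probability masses, which is an assumption made only for illustration; it evaluates the residuals and their sum of squares but does not perform the constrained minimization, and the function names are hypothetical.

```python
def g_residual(m_l, alpha_l, points, masses):
    """Residual of the expectile identity for a discrete distribution:
    m_l minus the asymmetrically weighted mean implied by the masses."""
    mean = sum(p * y for y, p in zip(points, masses))           # m_0.5
    F = sum(p for y, p in zip(points, masses) if y <= m_l)      # F(m_l)
    G = sum(p * y for y, p in zip(points, masses) if y <= m_l)  # partial moment
    num = (1.0 - alpha_l) * G + alpha_l * (mean - G)
    den = (1.0 - alpha_l) * F + alpha_l * (1.0 - F)
    return m_l - num / den


def sum_of_squares(expectiles, alphas, points, masses):
    """Criterion in the spirit of (A.5): sum of squared residuals over all knots."""
    return sum(g_residual(m, a, points, masses) ** 2
               for m, a in zip(expectiles, alphas))


# For equal masses on {0, 1}, the alpha-expectile equals alpha itself.
alphas = [0.25, 0.5, 0.75]
expectiles = [0.25, 0.5, 0.75]
s_true = sum_of_squares(expectiles, alphas, [0.0, 1.0], [0.5, 0.5])
s_wrong = sum_of_squares(expectiles, alphas, [0.0, 1.0], [0.9, 0.1])
```

The criterion vanishes for the masses that generated the expectiles and is strictly positive for the mismatched masses, which is what makes it usable as an objective for recovering the distribution.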

References

Aigner, Amemiya and Poirier (1976) On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review, 17(2), 377–96.

Andriyana, Gijbels and Verhasselt (2014) P-splines quantile regression estimation in varying coefficient models. TEST, 23(1), 153–94.

Bollaerts, Eilers and Aerts (2006) Quantile regression with monotonicity restrictions using p-splines and the l1-norm. Statistical Modelling, 6(3), 189–207.

Bondell, Reich and Wang (2010) Non-crossing quantile regression curve estimation. Biometrika, 97, 825–38.

Breckling and Chambers (1988) M-quantiles. Biometrika, 75, 761–71.

Chernozhukov, Fernández-Val and Galichon (2010) Quantile and probability curves without crossing. Econometrica, 78(3), 1093–125.

Rossi G and Harvey (2009) Quantiles, expectiles and splines. Journal of Econometrics, Nonparametric and Robust Methods in Econometrics (Special issue), 152(2), 179–85.

Dette and Scheder (2011) Estimation of additive quantile regression. Annals of the Institute of Statistical Mathematics, 63(2), 245–65.

Efron (1991) Regression percentiles using asymmetric squared error loss. Statistica Sinica, 1, 93–125.

Eilers PHC (2013) Discussion: The beauty of expectiles. Statistical Modelling, 13(4), 317–22.

Guo and Härdle (2013) Simultaneous confidence bands for expectile functions. AStA – Advances in Statistical Analysis, 96(4), 517–41.

(1997) Quantile curves without crossing. The American Statistician, 51(2), 186–92.

Honda (2004) Quantile regression in varying coefficient models. Journal of Statistical Planning and Inference, 121(1), 113–25.

Jones (1992) Estimating densities, quantiles, quantile densities and density quantiles. Annals of the Institute of Statistical Mathematics, 44(4), 721–27.

Jones (1994) Expectiles and M-quantiles are quantiles. Statistics & Probability Letters, 20(2), 149–53.

Kneib (2013) Beyond mean regression (with discussion and rejoinder). Statistical Modelling, 13(4), 275–303.

Koenker (1984) A note on l-estimates for linear models. Statistics and Probability Letters, 2(6), 323–25.

Koenker (2005) Quantile regression, econometric society monographs. Cambridge: Cambridge University Press.

Koenker (2013a) Discussion of ‘beyond mean regression’ by T. Kneib. Statistical Modelling, 13(4), 323–33.

Koenker (2013b) Quantreg: quantile regression. R package version 4.97.

Koenker and Bassett (1978) Regression quantiles. Econometrica, 46(1), 33–50.

Koenker and Portnoy (1994) Quantile smoothing splines. Biometrika, 81(4), 673–80.

Kuan, Yeh and Hsu (2009) Assessing value at risk with care, the conditional autoregressive expectile models. Journal of Econometrics, 150(2), 261–70.

Leorato, Peracchi and Tanase (2012) Asymptotically efficient estimation of the conditional expected shortfall. Computational Statistics and Data Analysis, 56(4), 768–84.

Muggeo, Sciandra, Tomasello and Calvo (2013) Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology. Environmental and Ecological Statistics, 20, 519–31.

Neocleous and Portnoy (2008) On monotonicity of regression quantile functions. Statistics and Probability Letters, 78(10), 1226–29.

Newey and Powell (1987) Asymmetric least squares estimation and testing. Econometrica, 55(4), 819–47.

Noh, Chung and Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electronic Journal of Statistics, 6, 1220–38.

Pratesi, Ranalli and Salvati (2009) Nonparametric M-quantile regression using penalised splines. Journal of Nonparametric Statistics, 21(3), 287–304.

Reiss and Huang (2012) Smoothness selection for penalized quantile regression splines. International Journal of Biostatistics, 8(1).

Ruppert, Wand and Carroll (2003) Semiparametric Regression. Cambridge: Cambridge University Press.

Ruppert, Wand and Carroll (2009) Semiparametric regression during 2003–2007. Electronic Journal of Statistics, 3, 1193–256.

Schnabel and Eilers PHC (2009a, July 20–24) Non-crossing smooth expectile curves. In JG Booth, ed. Proceedings of the 24th International Workshop on Statistical Modelling. Ithaca, NY, USA, 330–36.

Schnabel and Eilers PHC (2009b) Optimal expectile smoothing. Computational Statistics and Data Analysis, 53(12), 4168–77.

Schnabel and Eilers PHC (2013a) A location-scale model for non-crossing expectile curves. Stat, 2(1), 171–83.

Schnabel and Eilers PHC (2013b) Simultaneous estimation of quantile curves using quantile sheets. Advances in Statistical Analysis, 97(1), 77–87.

Sobotka, Kauermann, Schulze Waltrup and Kneib (2012) On confidence intervals for semiparametric expectile regression. Statistics and Computing, 23(2), 135–48.

Sobotka and Kneib (2012) Geoadditive expectile regression. Computational Statistics and Data Analysis, 56(4), 755–67.

Sobotka, Schnabel and Schulze Waltrup L (2013) Expectreg: expectile and quantile regression. With contributions from P. Eilers, T. Kneib and G. Kauermann, R package version 0.39.

Taylor (2008) Estimating value at risk and expected shortfall using expectiles. Journal of Financial Econometrics, 6(2), 231–52.

Wang and Zhou (2010) Estimation of the retransformed conditional mean in health care cost studies. Biometrika, 97, 147–58.

Liu (2009) Stepwise multiple quantile regression estimation using non-crossing constraints. Statistics and Its Interface, 2, 299–310.

Yao and Tong (1996) Asymmetric least squares regression estimation: A nonparametric approach. Journal of Nonparametric Statistics, 6(2–3), 273–92.

Yuan (2006) GACV for quantile smoothing splines. Computational Statistics and Data Analysis, 50, 813–29.

Ziegel (2013) Coherence and elicitability. arXiv:1303.1690v2.
