On the use of the Box-Cox transformation in censored and truncated regression models

Abstract

In this paper we revisit some issues related to the use of the Box-Cox transformation in censored and truncated regression models, which have been overlooked by the econometric and statistical literature. We first analyze the shape of the density function of the random variable which, once rescaled by a Box-Cox transformation, leads to a normal random variable. Then, we identify the value ranges of the Box-Cox scale parameter for which a regular expectation of the derived random variable does not exist. This result calls for an extension of the concept of expectation that can be computed regardless of the value of the scale parameter. For this purpose, we extend the concept of mean of a rescaled series of observations to the case of a random variable. Finally, we estimate censored and truncated Box-Cox standard Tobit models to determine the range of the scale parameter most relevant for empirical demand analyses. These estimates highlight significant deviations from the assumption of normality of the dependent variable towards highly right-skewed and leptokurtic distributions with no expectation.

Keywords

Box-Cox normal variable; censored and truncated Box-Cox Tobit models; generalized expectation of a Box-Cox normal variable; empirical analyses of censored and truncated samples of household expenditures surveys

1. Introduction

In specifying parametric censored or truncated models, econometricians make extensive use of the normality assumption to model random disturbances. This assumption makes it easy to derive the analytical form of the model likelihood function, so that the model parameters can be estimated consistently and efficiently by maximum likelihood. Unfortunately, contrary to the classical regression model, whose estimation is robust to deviations of the disturbances from normality, the maximum likelihood estimation of censored or truncated regression models becomes inconsistent when the disturbances are not normal.

To deal with this potential specification error, two options are open: to devise robust estimation methods that provide consistent estimates even in the presence of deviations from the normality assumption, or to rescale the non-normal dependent variable by a transformation that turns its distribution into a normal one. In particular, to turn skewed distributions into symmetric normal-like distributions, the Box-Cox (1964) monotonic function with a free scale parameter has been extensively applied in regression analysis of strictly positive dependent variables, as it provides a flexible family of scale transformations including the linear, logarithmic and inverse scales as special cases. To a lesser extent, the Box-Cox transformation has also been used in censored or truncated regression models, notably by Poirier (1978), Lankford and Wyckoff (1991), Yen (1993), Jones and Yen (2000), and Chaze (2005). In particular, to account for negative values of the dependent variable in Box-Cox Tobit models, Lankford and Wyckoff (1991) and Chaze (2005) use a two-parameter Box-Cox function, which shifts the lower truncation bound of the dependent variable to a negative value by adding a translation parameter to this variable in the Box-Cox function.

Empirical evidence provided by Box-Cox regression estimates seems to show that scale parameter estimates range from somewhat below 0 to somewhat above 1 (Davidson & MacKinnon, 2004). But such estimates, based on classical regression assumptions, may not be relevant for Box-Cox censored and truncated regression models. On the other hand, for these models such empirical evidence, based on individual survey data, is still not well documented. This issue deserves investigation to see to what extent scale parameter estimates fall between $-1$ and 0 since, for these values, the distribution of the dependent variable is highly skewed and leptokurtic, leading to an infinite expectation. This result has been overlooked by the econometric and statistical literature, as it was only mentioned, with no formal proof, in a footnote of Poirier and Melino (1978). It calls for reconsidering the theoretical magnitudes conventionally used for forecasting and marginal effect measurement, which are meaningless when computed numerically from a divergent integral formula, as done in Taylor (1986), Lankford and Wyckoff (1991) and Chaze (2005), in particular.

For the sake of clarity, it is worth mentioning that the Box-Cox transformation has been extensively used in many fields of econometric and statistical analysis. Notably, it has been used to specify families of flexible functional forms for modeling the responses of dependent variables to changes in their explanatory variables, the flexibility of the functional form being required by the lack of a priori information provided by theory (economic or other) on its shape. An early example of this procedure in applied econometrics is the use of a generalized Box-Cox cost function for modeling the interrelated demands for factors of production of an industry by Berndt and Khaled (1979), later extended by Tishler and Lipovetsky (1996, 2000), who incorporated a more traditional constant elasticity of substitution cost function. This approach leads to a system of factor demand or cost share equations where only the explanatory variables are rescaled according to a one-parameter Box-Cox transformation. Hence, the empirical implementation of this model may be carried out using a classical system of nonlinear seemingly unrelated regressions. Less known is the use of the Box-Cox transformation to specify families of flexible statistical models for discrete response dependent variables, as proposed by Lipovetsky and Conklin (2000) to generalize the logistic and algebraic binary response models. From this perspective, our paper deals with a two-parameter Box-Cox generalization of the censored and truncated standard Tobit model as defined in chapter 10.2 of Amemiya (1985).

On this basis, this paper first carefully analyzes (Section 2) the shape of what we shall call a Box-Cox normal distribution, namely the distribution of a random variable whose transformation by a Box-Cox function leads to a normal-like distribution. Our analysis complements and deepens that of Freeman and Morrades (2006), who call the Box-Cox normal distribution the power-normal distribution. We prefer our denomination to avoid confusion with a special case of the beta distribution called “power-function distribution” (Johnson & Kotz, 1970). Then (Section 3), we identify the value ranges of the Box-Cox scale parameter for which the derived distribution does not have a regular expectation. To tackle the forecasting and marginal effect measurement problems within a Box-Cox normal regression model in a unified framework (with respect to the scale parameter values), we extend (Section 4) the concept of expectation by relating it to the transformation to normality used. This generalization of the expectation of a random variable exists for every scale parameter value of the Box-Cox transformation. Finally, we present (Section 5) econometric estimates of censored and truncated Box-Cox standard Tobit models, based on family expenditure survey data covering a large range of goods, to determine the range of the Box-Cox scale parameter most relevant for empirical demand analyses. These estimates highlight significant deviations from the assumption of normality of the dependent variable towards highly right-skewed and leptokurtic distributions with no expectation. This calls for developing alternatives to the concept of expectation to tackle forecasting and marginal effect measurement issues.

Figure 1.

The Box-Cox transformation for different values of $\lambda$ and $\alpha=0.3$ .

2. The typical shapes of a Box-Cox normal distribution

The use of transformations to rescale a non-normal variable into a normal one goes back to Edgeworth, who termed it the Method of Translation (Johnson, 1949). Despite its dominant role in probability and theoretical statistics, the normal distribution cannot provide an adequate representation of the vast majority of empirical distributions encountered in the real world. The most frequent departure from normality in the representation of empirical data is skewness. The intent of the Box-Cox transformation is precisely to find, within a family of transformations, the one which best translates a skewed distribution into a normal-like distribution. The effectiveness of the Box-Cox transformation relies on the wide range of transformations it includes, which allows symmetrizing almost any degree of skewness, from the mildest to the heaviest, by selecting the value of a single scale parameter. Surprisingly, we found no study in the econometric and statistical literature analyzing the shape of a Box-Cox normal distribution over the whole value range of its scale parameter, despite the importance of this knowledge for assessing the empirical relevance of a scale parameter estimate. The purpose of this section is to fill this gap.

Using the general two parameter transformation proposed by Box and Cox (1964):

$\displaystyle T(y;\alpha,\lambda)=\begin{cases}{\displaystyle\frac{(y+\alpha)^{\lambda}-1}{\lambda}}&\text{if }\lambda\neq 0\\ \log(y+\alpha)&\text{if }\lambda=0\end{cases},$ (1)

we define a Box-Cox normal variable as the random variable $y$ leading to a normal random variable $x$ , with expected value $\mu$ and standard deviation $\sigma$ , once transformed by Eq. (1), namely such that:

$\displaystyle T(y;\alpha,\lambda)=x=\mu+\sigma z,$ (2)

where $z$ stands for a standard normal variable. Parameter $\alpha$ is termed a location parameter as it sets a lower bound, equal to $-\alpha$ , to the domain where the transformation Eq. (1) holds. Parameter $\lambda$ is termed a scale parameter as it characterizes the degree of rescaling implemented by the Box-Cox transformation on variable $y$ . This transformation is strictly increasing and linear for $\lambda=1$ , strictly convex for $\lambda>1$ , and strictly concave for $\lambda<1$ , with a vertical left-asymptote at $y=-\alpha$ when $\lambda\leqslant 0$ , and a ceiling asymptote at $x=-1/\lambda$ when $\lambda<0$ , as shown by Fig. 1. Note also, that convexity increases with $\lambda$ towards the limiting case of an upward right angle curve at $x=0$ and $y=-\alpha+1$ , reached when $\lambda\to+\infty$ . Conversely, concavity increases towards the limiting case of a downward right angle curve at $x=0$ and $y=-\alpha+1$ when $\lambda$ decreases towards $-\infty$ .

The inverse Box-Cox transformation is written as:

$\displaystyle T^{-1}(x;\alpha,\lambda)=\begin{cases}(1+\lambda x)^{1/\lambda}-\alpha&\text{if }\lambda\neq 0\\ e^{x}-\alpha&\text{if }\lambda=0\end{cases}.$ (3)

As shown by Fig. 2, the reciprocal of the Box-Cox function is strictly increasing and linear for $\lambda=1$ , but strictly concave for $\lambda>1$ , and strictly convex for $\lambda<1$ , with a floor asymptote at $y=-\alpha$ when $\lambda\leqslant 0$ , and a vertical right-asymptote at $x=-1/\lambda$ when $\lambda<0$ . Concavity increases with $\lambda$ to reach the limiting case of a downward right angle curve at $x=0$ and $y=-\alpha+1$ when $\lambda\to+\infty$ . Conversely, convexity increases towards the limiting case of an upward right angle curve at $x=0$ and $y=-\alpha+1$ when $\lambda$ decreases towards $-\infty$ .
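As an illustration, Eqs. (1) and (3) can be sketched in a few lines of Python. The function names `box_cox` and `inv_box_cox` are ours and purely illustrative; the domain checks reflect the bounds discussed in the text and exclude the boundary points themselves.

```python
import math

def box_cox(y, alpha, lam):
    """Two-parameter Box-Cox transformation T(y; alpha, lambda) of Eq. (1)."""
    if y + alpha <= 0:
        raise ValueError("T is only defined for y > -alpha")
    if lam == 0:
        return math.log(y + alpha)
    return ((y + alpha) ** lam - 1.0) / lam

def inv_box_cox(x, alpha, lam):
    """Inverse transformation T^{-1}(x; alpha, lambda) of Eq. (3)."""
    if lam == 0:
        return math.exp(x) - alpha
    base = 1.0 + lam * x
    if base <= 0:
        raise ValueError("x lies outside the domain of the inverse")
    return base ** (1.0 / lam) - alpha

# Round trip y -> x -> y for the scale parameters used in Figs 1 and 2
# (alpha = 0.3 as in the figures; y = 2.5 is an arbitrary test point).
for lam in (-0.7, 0.0, 0.7, 1.0, 2.0):
    x = box_cox(2.5, 0.3, lam)
    print(lam, x, inv_box_cox(x, 0.3, lam))
```

For $\lambda<0$ the transformed values printed by the loop stay below the ceiling $-1/\lambda$, in line with the asymptote described above.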

The inverse Box-Cox function is defined over an unbounded range of real values only when $\lambda=0$, while for $\lambda\neq 0$ its domain is bounded below by $-1/\lambda$ when $\lambda>0$, and bounded above by $-1/\lambda$ when $\lambda<0$. Therefore, to be compatible with this domain of definition, the normal variable $x$ must be left-truncated at $-1/\lambda$ when $\lambda>0$, and right-truncated at $-1/\lambda$ when $\lambda<0$. This implies that the standard normal variable $z$ in Eq. (2) must be replaced by a standard normal variable truncated at the boundaries of the interval $[B_{1},B_{2}]$, which can be written in general terms as:

$\displaystyle B_{1}=\begin{cases}-{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}&\text{if }\lambda>0\\ -\infty&\text{if }\lambda\leqslant 0\end{cases},$ (4)

and

$\displaystyle B_{2}=\begin{cases}+\infty&\text{if }\lambda\geqslant 0\\ -{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}&\text{if }\lambda<0\end{cases}.$ (5)
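The truncation bounds of Eqs. (4) and (5), together with the probability mass $\Phi(B_{2})-\Phi(B_{1})$ retained by the truncation, can be computed as in the following sketch. The helper names are hypothetical, and the normal distribution function is obtained from the error function of the Python standard library.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncation_bounds(mu, sigma, lam):
    """Bounds (B1, B2) of Eqs. (4)-(5) for the standardized variable z."""
    if lam > 0:
        return -(mu + 1.0 / lam) / sigma, math.inf
    if lam < 0:
        return -math.inf, -(mu + 1.0 / lam) / sigma
    return -math.inf, math.inf

def normalizing_mass(mu, sigma, lam):
    """Probability mass Phi(B2) - Phi(B1) kept by the truncation."""
    b1, b2 = truncation_bounds(mu, sigma, lam)
    return norm_cdf(b2) - norm_cdf(b1)

# Standard case mu = 0, sigma = 1 for the scale parameters of Figs 3-5.
for lam in (2.0, 0.7, -0.7):
    print(lam, truncation_bounds(0.0, 1.0, lam), normalizing_mass(0.0, 1.0, lam))
```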

Figure 2.

The inverse Box-Cox transformation for different values of $\lambda$ and $\alpha=0.3$ .

Figure 3.

Box-Cox normal density function generated by a Box-Cox transformation with $\lambda=2$ .

To analyze the profile of the probability density function of a Box-Cox normal variable, we perform the change of variable:

$\displaystyle z_{[B_{1},B_{2}]}=\frac{T(y;\alpha,\lambda)-\mu}{\sigma}$ (6)

where $z_{[B_{1},B_{2}]}$ denotes the standard normal variable truncated at the boundaries of the interval $[B_{1},B_{2}]$, whose density function is written as ${\phi(z_{[B_{1},B_{2}]})}/\Pi$, with $\phi(z)$ standing for the density function of the standard normal variable $z$, and

$\displaystyle\Pi=\Phi(B_{2})-\Phi(B_{1})=\begin{cases}1-\Phi\left({-{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}}\right)=\Phi\left({{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}}\right)&\text{if }\lambda>0\\ 1&\text{if }\lambda=0\\ \Phi\left({-{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}}\right)&\text{if }\lambda<0\end{cases},$ (7)

with $\Phi(z)$ the distribution function of $z$ . Given the Jacobian of this transformation:

$\displaystyle\frac{T^{\prime}(y;\alpha,\lambda)}{\sigma}=\frac{(y+\alpha)^{\lambda-1}}{\sigma},$ (8)

the density function of $y$ may be written as:

$\displaystyle f(y)=\frac{(y+\alpha)^{\lambda-1}\phi\left({{\displaystyle\frac{T(y;\alpha,\lambda)-\mu}{\sigma}}}\right)}{\sigma\Pi}.$ (9)
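A simple way to check Eq. (9) numerically is to compare the integral of $f(y)$ over a finite interval with the same probability computed directly on the normal scale, that is $[\Phi(z_{b})-\Phi(z_{a})]/\Pi$ with $z_{a}$ and $z_{b}$ the images of the interval bounds under Eq. (6). The sketch below does this with a composite Simpson rule; all function names and parameter values are illustrative assumptions, not part of the original analysis.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transform(y, alpha, lam):
    """Box-Cox transformation of Eq. (1)."""
    s = y + alpha
    return math.log(s) if lam == 0 else (s ** lam - 1.0) / lam

def mass(lam, mu, sigma):
    """Normalizing constant Pi of Eq. (7)."""
    if lam == 0:
        return 1.0
    c = (mu + 1.0 / lam) / sigma
    return Phi(c) if lam > 0 else Phi(-c)

def pdf(y, alpha, lam, mu, sigma):
    """Density f(y) of Eq. (9), for y > -alpha."""
    z = (transform(y, alpha, lam) - mu) / sigma
    return (y + alpha) ** (lam - 1.0) * phi(z) / (sigma * mass(lam, mu, sigma))

def prob_between(a, b, alpha, lam, mu, sigma):
    """P(a < y < b) computed on the normal scale."""
    za = (transform(a, alpha, lam) - mu) / sigma
    zb = (transform(b, alpha, lam) - mu) / sigma
    return (Phi(zb) - Phi(za)) / mass(lam, mu, sigma)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

for lam in (-0.7, 0.0, 0.7, 2.0):
    area = simpson(lambda y: pdf(y, 0.3, lam, 0.2, 1.5), 0.5, 2.0)
    print(lam, area, prob_between(0.5, 2.0, 0.3, lam, 0.2, 1.5))
```

The interval $[0.5,2]$ is deliberately kept away from $y=-\alpha$, where the density may be unbounded, so that the quadrature remains accurate for every value of $\lambda$.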

The way the Box-Cox transformation generates the profile of the density function of $y$ from that of $z_{[B_{1},B_{2}]}$ can be illustrated by considering the “standard” case where $\alpha=0$, $\mu=0$ and $\sigma=1$ (the more general case of transformation Eq. (2) is considered later). To this end, we use the diagrams of Figs 3–5, inspired by Johnson (1949), which represent, for some notable values of $\lambda$, the distortion that the inverse Box-Cox transformation of the $z_{[B_{1},B_{2}]}$-scale induces on the distribution of $y$. The shaded columns under the two density functions represent the equal probabilities of observing values of $z_{[B_{1},B_{2}]}$ and $y$ within the corresponding small intervals ${\rm d}z_{[B_{1},B_{2}]}$ and ${\rm d}y$, related by ${\rm d}z_{[B_{1},B_{2}]}=J(y){\rm d}y$, with $J(y)=y^{\lambda-1}$ the Jacobian of the Box-Cox transformation. Accordingly, the contraction of ${\rm d}y$ due to the Box-Cox transformation, which implies a local amplification of the density function of $y$, is greater where $J(y)$ is large than where it is small. More precisely, $J(y)$ magnifies or reduces the value of the density function of $z_{[B_{1},B_{2}]}$ at the point $z_{[B_{1},B_{2}]}={(y^{\lambda}-1)}/\lambda$, depending on whether $J(y)>1$ or $J(y)<1$, respectively.

Figure 4.

Box-Cox normal density function generated by a Box-Cox transformation with $\lambda=0.7$ .

Figure 5.

Box-Cox normal density function generated by a Box-Cox transformation with $\lambda=-0.7$ .

Therefore, according to the values of $\lambda$ , many qualitatively different shapes of the density function of $y$ may be evidenced.

•

When $\lambda=1$ , $J(y)=1$ whatever the value of $y$ . Hence, the shape of the density function of $y=1+z_{[B_{1},B_{2}]}$ is the same as that of $z_{[B_{1},B_{2}]}$ , namely a left truncated standard normal distribution at $y=0$ with a mode at $y=1$ .

•

When $\lambda>1$, $J(y)$ is an unbounded increasing function of $y$ taking value 0 at $y=0$ and value 1 at $y=1$, while the density function of $z_{[B_{1},B_{2}]}$ at $y=0$ remains finite. Hence, the density function of $y$ starts from zero and its mode shifts beyond $y=1$, to the value:

$\displaystyle\text{Mode}(y)=\left({\frac{1}{2}+\sqrt{\frac{1}{4}+\lambda(\lambda-1)}}\right)^{1/\lambda}=\lambda^{1/\lambda},$ (10)

which is the positive solution of the following quadratic equation (with respect to the unknown $y^{\lambda}$) obtained by equating to zero the derivative of the density function of $y$:

$\displaystyle(y^{\lambda})^{2}-y^{\lambda}-\lambda(\lambda-1)=0.$ (11)

This mode is an increasing function of $\lambda$, up to a maximum reached at $\lambda=e$; then it decreases towards the limit value of $1$ when $\lambda\to+\infty$. Indeed, as illustrated by Fig. 3, as $\lambda$ increases from $\lambda=1$, the left truncation point of $z_{[B_{1},B_{2}]}$ increases towards 0, as does the concentration of the probability density of $z_{[B_{1},B_{2}]}$ around 0. Simultaneously, the concavity of the inverse Box-Cox transformation increases, from the linear case up to the case of a downward square curve with an angular point at $z=0$ and $y=1$, when $\lambda\to+\infty$. This generates an increased concentration and symmetry of the density function of $y$ around its mode until the collapse of the entire probability mass at $y=1$, when $\lambda\to+\infty$.

•

When $0<\lambda<1$, $J(y)$ is a decreasing function of $y$, from $+\infty$ at $y=0$ to 0 when $y\to+\infty$, taking value 1 at $y=1$, while the density function of $z_{[B_{1},B_{2}]}$ at $y=0$ remains finite. Therefore, contrary to the previous case, the density function of $y$ becomes unbounded at $y=0$. Indeed, as illustrated by Fig. 4, the convexity of the inverse Box-Cox transformation increases, from the linear case up to the exponential case, as $\lambda$ decreases from $\lambda=1$ to $\lambda=0+$. This generates a concentration of the density function of $y$ towards 0, for $y<1$, and a thickening of the density function towards $+\infty$, for $y>1$. Despite this overall reverse J-shaped profile, the density function still has a local mode preceded by a local antimode, due to the decrease towards $-\infty$ of the left truncation point of $z_{[B_{1},B_{2}]}$, when $\lambda\to 0+$. Indeed, for $0<\lambda<1$ Eq. (11) has two positive roots, namely:

$\displaystyle y=\left({\frac{1}{2}-\sqrt{\frac{1}{4}+\lambda(\lambda-1)}}\right)^{1/\lambda}=(1-\lambda)^{1/\lambda}\quad\text{and}\quad y=\left({\frac{1}{2}+\sqrt{\frac{1}{4}+\lambda(\lambda-1)}}\right)^{1/\lambda}=\lambda^{1/\lambda},$ (12)

which correspond to a local minimum and a local maximum of the density function of $y$. For $0<\lambda<1/2$, the local antimode is given by $\lambda^{1/\lambda}$ and the local mode by $(1-\lambda)^{1/\lambda}$, as $\lambda^{1/\lambda}<(1-\lambda)^{1/\lambda}\Leftrightarrow\lambda<1/2$. The interpretation of roots Eq. (12) is reversed when $1/2<\lambda<1$. For $\lambda=1/2$, these two roots coalesce into a single one, corresponding to a horizontal inflexion point.

•

When $\lambda=0$ , the density of $y$ is that of the well-known lognormal distribution, taking value 0 at $y=0$ , which corresponds to the limit of the local antimode $\lambda^{1/\lambda}$ when $\lambda\to 0+$ , and with a mode at $\lim_{\lambda\to 0+}(1-\lambda)^{1/\lambda}=1/e$ . The value of the standard lognormal density function at the origin can be easily computed as the limit, for $y\to 0+$ , of the lognormal density function written as:

$\displaystyle f(y)=(1/{\sqrt{2\pi}})\exp\{-(1/2)\ln^{2}y-\ln y\}=(1/{\sqrt{2\pi}})\exp\{-(1/2)[(\ln y+1)^{2}-1]\}.$

•

Finally, when $\lambda<0$, $J(y)$ is still a decreasing function of $y$, taking an infinite value at $y=0$, but this time the density function of $z_{[B_{1},B_{2}]}$ at $y=0$ tends to 0, leading to an indeterminate product of the form $+\infty\times 0$. To resolve this indeterminacy, we apply L’Hospital’s rule to the density of $y$ rewritten as the ratio of $J(y)$ to $\Pi/{\phi({(y^{\lambda}-1)}/\lambda)}$. Computing the ratio of the derivatives of these two terms leads to:

$\displaystyle\lim_{y\to 0+}f(y)=\lim_{y\to 0+}\frac{{J}^{\prime}(y)}{\left[{\Pi\bigg/{\phi\left({{\displaystyle\frac{y^{\lambda}-1}{\lambda}}}\right)}}\right]^{\prime}}=\frac{\lambda(\lambda-1)}{\Pi}\lim_{y\to 0+}\left[{\frac{1}{y(y^{\lambda}-1)}\phi\left({\frac{y^{\lambda}-1}{\lambda}}\right)}\right]=0,$ (13)

where the limit is zero because, for $\lambda<0$, $y^{\lambda}\to+\infty$ when $y\to 0+$, so that the normal density factor vanishes faster than any power of $y$.

Hence, for negative values of $\lambda$ the density function of $y$ is unimodal right skewed, with a mode given by the positive solution Eq. (10) of Eq. (11). Written as a function of the absolute value of $\lambda$ , $|\lambda|$ , this mode may be expressed as:

$\displaystyle\text{Mode}(y)=\left({\frac{1}{2}+\sqrt{\frac{1}{4}+|\lambda|(|\lambda|+1)}}\right)^{{-1}/{|\lambda|}}=(1+|\lambda|)^{{-1}/{|\lambda|}},$ (14)

and turns out to be an increasing function of $|\lambda|$, from $\lim_{|\lambda|\to 0+}(1+|\lambda|)^{{-1}/{|\lambda|}}=1/e$, up to $1$ when $|\lambda|\to+\infty$. Furthermore, as the Jacobian’s values for $0<y<1$ increase with $|\lambda|$ towards $+\infty$ and those for $y>1$ decrease with $|\lambda|$ towards $0$, the concentration of the density function around its mode increases with the value of $|\lambda|$, until the collapse of the entire probability mass at $y=1$, when $|\lambda|\to+\infty$, where the Jacobian is equal to $1$. Indeed, as illustrated by Fig. 5, as $\lambda$ decreases from $\lambda=0$, the right truncation point of $z_{[B_{1},B_{2}]}$ decreases towards 0, while the concentration of the probability density of $z_{[B_{1},B_{2}]}$ increases around 0. Simultaneously, the convexity of the inverse Box-Cox transformation increases, from the exponential case up to the case of an upward square curve with an angular point at $z=0$ and $y=1$, when $\lambda\to-\infty$. This generates an increased concentration and asymmetry of the density function of $y$ around its mode until the collapse of the entire probability mass at $y=1$, when $\lambda\to-\infty$.
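The stationary points discussed in the cases above can be recovered numerically from Eq. (11). The following sketch, written for the standard case $\alpha=0$, $\mu=0$, $\sigma=1$ and $\lambda\neq 0$, solves the quadratic in $y^{\lambda}$ and keeps its positive roots; it relies on the identity $1/4+\lambda(\lambda-1)=(\lambda-1/2)^{2}$, so that the roots in $y^{\lambda}$ are $\lambda$ and $1-\lambda$. Function names are ours.

```python
import math

def stationary_points(lam):
    """Positive roots in y of Eq. (11), standard case, lam != 0, in ascending order."""
    disc = max(0.0, 0.25 + lam * (lam - 1.0))  # equals (lam - 1/2)**2
    roots_u = (0.5 - math.sqrt(disc), 0.5 + math.sqrt(disc))
    return sorted(u ** (1.0 / lam) for u in roots_u if u > 0)

def log_density_slope(y, lam):
    """Derivative of log f(y) in the standard case; zero at a stationary point."""
    return (lam - 1.0) / y - (y ** lam - 1.0) * y ** (lam - 1.0) / lam

# One mode for lam > 1 or lam < 0; an antimode followed by a mode for 0 < lam < 1.
for lam in (2.0, 0.7, 0.3, -0.7):
    print(lam, stationary_points(lam))
```

For $\lambda=1/2$ the function returns the same value twice, which corresponds to the horizontal inflexion point at $y=1/4$.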

In summary:

•

For $\lambda\geqslant 1$, the density function of $y$ is that of a unimodal left-skewed distribution whose concentration and symmetry around $y=1$ increase with $\lambda$, from the case of a $N(1,1)$ distribution left-truncated at 0, when $\lambda=1$, to the limiting case of a degenerate distribution at $y=1$ when $\lambda\to+\infty$.

•

For $0\leqslant\lambda<1$, the density function of $y$ is reverse J-shaped, with a local antimode followed by a local mode, coalescing into an inflexion point with zero derivative when $\lambda=1/2$, and tending to 0 and $1/e$, respectively, when $\lambda\to 0+$.

•

For $\lambda<0$, the density function of $y$ is that of a unimodal right-skewed distribution whose concentration and skewness around $y=1$ increase, when $\lambda$ decreases, towards the limiting case of a degenerate distribution at $y=1$ when $\lambda\to-\infty$.

Figures 6–8 depict all these situations.

Figure 6.

Shapes of the standard Box-Cox normal density function for $\lambda\geqslant 1$ .

Figure 7.

Shapes of the standard Box-Cox normal density function for $0\leqslant\lambda\leqslant 1$ .

Figure 8.

Shapes of the standard Box-Cox normal density function for $\lambda\leqslant 0$ .

In the general case of the Box-Cox transformation Eq. (2), where the density function of $y$ is written as:

$\displaystyle f(y)=\frac{(y+\alpha)^{\lambda-1}\phi\left({{\displaystyle\frac{(y+\alpha)^{\lambda}-(1+\lambda\mu)}{\lambda\sigma}}}\right)}{\sigma\Pi},$ (15)

equating to zero the derivative of this function leads to the following quadratic equation with respect to the unknown $(y+\alpha)^{\lambda}$ :

$\displaystyle[(y+\alpha)^{\lambda}]^{2}-(1+\lambda\mu)(y+\alpha)^{\lambda}-\lambda(\lambda-1)\sigma^{2}=0,$ (16)

and accordingly to the following two roots with respect to $y$ :

$\displaystyle y=-\alpha+\left({\frac{1+\lambda\mu}{2}\pm\sqrt{\left({\frac{1+\lambda\mu}{2}}\right)^{2}+\lambda(\lambda-1)\sigma^{2}}}\right)^{1/\lambda}.$ (17)

Contrary to the possible roots of Eq. (11), which are all real numbers, the roots of Eq. (16) may be imaginary numbers, meaning that the density function Eq. (15) may not have a modal value (but may have an inflexion point). This happens when the discriminant of Eq. (16), $\Delta=[{(1+\lambda\mu)}/2]^{2}+\lambda(\lambda-1)\sigma^{2}$, is negative, which is only possible for $0<\lambda<1$ and $\mu<2\sigma\sqrt{{(1-\lambda)}/\lambda}-(1/\lambda)$. Conversely, when $\mu>2\sigma\sqrt{{(1-\lambda)}/\lambda}-(1/\lambda)$, the roots of Eq. (17) are real and distinct, which corresponds to the presence of a local mode preceded by a local antimode. These coalesce into a horizontal inflexion point, at $y=-\alpha+[{(1+\lambda\mu)}/2]^{1/\lambda}$, when $\mu=2\sigma\sqrt{{(1-\lambda)}/\lambda}-(1/\lambda)$.

For $\lambda>1$ or $\lambda<0$ , $\sqrt{\Delta}>{(1+\lambda\mu)}/2$ implying that only one of the two real roots of Eq. (16) is positive. Consequently, the density function Eq. (15) has a single global mode at $y=-\alpha+[{(1+\lambda\mu)}/2+\sqrt{\Delta}]^{1/\lambda}$ .
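The case distinction based on the discriminant $\Delta$ of Eq. (16) can be sketched as follows for $\lambda\neq 0$. The function names and the parameter values in the usage lines are illustrative assumptions; the threshold on $\mu$ is the one stated above for $0<\lambda<1$.

```python
import math

def general_stationary_points(alpha, lam, mu, sigma):
    """Real roots in y of Eq. (17), in ascending order (empty if Delta < 0)."""
    c = 1.0 + lam * mu
    delta = (c / 2.0) ** 2 + lam * (lam - 1.0) * sigma ** 2
    if delta < 0:
        return []  # imaginary roots: no modal value
    roots_u = (c / 2.0 - math.sqrt(delta), c / 2.0 + math.sqrt(delta))
    # Only positive values of (y + alpha)**lam correspond to a real y.
    return sorted(-alpha + u ** (1.0 / lam) for u in roots_u if u > 0)

def mu_threshold(lam, sigma):
    """Value of mu at which Delta = 0 when 0 < lam < 1."""
    return 2.0 * sigma * math.sqrt((1.0 - lam) / lam) - 1.0 / lam

print(mu_threshold(0.5, 1.0))                            # threshold on mu
print(general_stationary_points(0.3, 0.5, -0.5, 1.0))    # mu below the threshold
print(general_stationary_points(0.3, 0.5, 1.0, 1.0))     # antimode, then mode
print(general_stationary_points(0.3, -0.7, 0.2, 1.5))    # single global mode
```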

3. The expectation of a Box-Cox normal distribution

Expectation is a key concept of probabilistic regression models, as it represents the causal model by which a random dependent variable is related to a set of explanatory variables. Therefore, before using these models, we must ensure the existence of the expected value of the dependent variable, conditional on the vector of explanatory variables. In this section, we prove that an expectation for the Box-Cox normal distribution exists only for some scale parameter ranges of the Box-Cox transformation.

The mathematical expectation of the Box-Cox normal distribution Eq. (9) is defined by the following improper Riemann integral: $E(y)=\int_{-\alpha}^{+\infty}{yf(y){\rm d}y}$. Recall that a Riemann integral is said to be improper if it does not comply with the two basic assumptions of a Riemann integral $\int_{a}^{b}{f(x){\rm d}x}$, namely that the integration interval $[a,b]$ is finite and the integrated function $f(x)$ is bounded over this interval.

Under the change of variable Eq. (2), this integral can be reformulated as follows, according to the values of the scale parameter $\lambda$ .

•
For $\lambda>0$ :

$\displaystyle E(y)=\lim_{b\to+\infty}\int_{-1/\lambda}^{b}{(1+\lambda x)^{1/\lambda}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}-\alpha.$ (18)
•
For $\lambda=0$ :

$\displaystyle E(y)=\left[{\lim_{a\to-\infty}\int_{a}^{0}{e^{x}\phi\left({\frac{x-\mu}{\sigma}}\right)}\frac{{\rm d}x}{\sigma}+\lim_{b\to+\infty}\int_{0}^{b}{e^{x}\phi\left({\frac{x-\mu}{\sigma}}\right)}\frac{{\rm d}x}{\sigma}}\right]-\alpha.$ (19)
•
For $\lambda<0$ :

$\displaystyle E(y)=\left[{\lim_{a\to-\infty}\int_{a}^{0}{(1+\lambda x)^{1/\lambda}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({-\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}+\lim_{\varepsilon\to 0+}\int_{0}^{-(1/\lambda)-\varepsilon}{(1+\lambda x)^{1/\lambda}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({-\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}}\right]-\alpha.$ (20)

All these expectations are defined as limits of Riemann integrals, either because the integration interval is not finite, which is the case for all three expectations, or because the integrated function is not bounded over the integration interval, which is the case for integral Eq. (20) at $x=-1/\lambda$.

The existence of improper Riemann integrals in expectation Eq. (19) is trivial, because it is the well known expectation of a lognormal random variable, which can be easily computed using the change of variable $x=\mu+\sigma z$ , leading to:

$\displaystyle\int_{-\infty}^{+\infty}{e^{x}\phi\left({\frac{x-\mu}{\sigma}}\right)}\frac{{\rm d}x}{\sigma}=\int_{-\infty}^{+\infty}{e^{\mu+\sigma z}\phi(z)}{\rm d}z=e^{\mu+(1/2)\sigma^{2}}.$ (21)

To prove the existence of improper Riemann integral in expectation Eq. (18), we use the property of dominance of function $(1+\lambda x)^{1/\lambda}$ by $e^{x}$ , when $\lambda>0$ , illustrated by Fig. 2: $(1+\lambda x)^{1/\lambda}\leqslant e^{x}$ for every $x\geqslant-1/\lambda$ . According to this property, we can write the inequality:

$\displaystyle\int_{-1/\lambda}^{+\infty}{(1+\lambda x)^{1/\lambda}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}\leqslant\int_{-\infty}^{+\infty}{e^{x}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}=\frac{e^{\mu+(1/2)\sigma^{2}}}{\Phi\left({\frac{\mu+(1/\lambda)}{\sigma}}\right)},$ (22)

which proves the existence of the left term of the inequality by the finite positive value of its right term.
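The bound of Eq. (22) can be illustrated numerically. The sketch below evaluates the integral of Eq. (18) for $\lambda>0$ with a composite Simpson rule, truncating the upper integration limit at $\mu+12\sigma$ (an assumption made for numerical convenience, beyond which the integrand is negligible), and compares it with the lognormal bound. Function names and default settings are illustrative.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expectation_positive_lambda(alpha, lam, mu, sigma, n=20000, width=12.0):
    """Numerical value of Eq. (18) for lam > 0 (composite Simpson rule)."""
    lo, hi = -1.0 / lam, mu + width * sigma
    pi_mass = Phi((mu + 1.0 / lam) / sigma)

    def integrand(x):
        base = max(0.0, 1.0 + lam * x)  # guard against rounding just below zero
        return base ** (1.0 / lam) * phi((x - mu) / sigma) / (sigma * pi_mass)

    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0 - alpha

def lognormal_bound(lam, mu, sigma):
    """Right-hand side of Eq. (22)."""
    return math.exp(mu + 0.5 * sigma ** 2) / Phi((mu + 1.0 / lam) / sigma)

for lam in (0.3, 0.7, 1.0, 2.0):
    e_y = expectation_positive_lambda(0.0, lam, 0.0, 1.0)
    print(lam, e_y, lognormal_bound(lam, 0.0, 1.0))
```

For $\lambda=1$ the computed value can be checked against the mean of a $N(1,1)$ distribution left-truncated at 0, namely $1+\phi(1)/\Phi(1)$.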

To analyze the existence of improper Riemann integrals in expectation Eq. (20), we first observe that for $\lambda<0$ , the inverse Box-Cox function is dominated by $1$ for all $x\leqslant 0$ , as it is a positive strictly increasing function, taking value $1$ when $x=0$ . Therefore, for the first improper Riemann integral of Eq. (20) we can write the inequality:

$\displaystyle\int_{-\infty}^{0}{(1+\lambda x)^{1/\lambda}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({-\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}<\int_{-\infty}^{0}{\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({-\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}=\frac{\Phi\left({-\frac{\mu}{\sigma}}\right)}{\Phi\left({-\frac{\mu+(1/\lambda)}{\sigma}}\right)},$ (23)

which proves the existence of its left term by the finite positive value of its right term.

For the second improper Riemann integral of Eq. (20), we write two similar inequalities to lower and upper bound the integral value:

$\displaystyle K_{\rm L}\int_{0}^{-1/\lambda}{(1+\lambda x)^{1/\lambda}{\rm d}x}<\int_{0}^{-1/\lambda}{(1+\lambda x)^{1/\lambda}\frac{\phi\left({\frac{x-\mu}{\sigma}}\right)}{\Phi\left({-\frac{\mu+(1/\lambda)}{\sigma}}\right)}}\frac{{\rm d}x}{\sigma}<K_{\rm U}\int_{0}^{-1/\lambda}{(1+\lambda x)^{1/\lambda}{\rm d}x},$ (24)

where:

$\displaystyle K_{\rm L}=\frac{\mathop{\operatorname{Min}}\limits_{0\leqslant x\leqslant-1/\lambda}\phi\left({{\displaystyle\frac{x-\mu}{\sigma}}}\right)}{\Phi\left({-{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}}\right)\sigma}\quad\text{and}\quad K_{\rm U}=\frac{\mathop{\operatorname{Max}}\limits_{0\leqslant x\leqslant-1/\lambda}\phi\left({{\displaystyle\frac{x-\mu}{\sigma}}}\right)}{\Phi\left({-{\displaystyle\frac{\mu+(1/\lambda)}{\sigma}}}\right)\sigma}.$ (25)

Therefore, the second improper Riemann integral of Eq. (20) converges if its upper bound converges, and diverges if its lower bound diverges. Now, for $\lambda\neq-1$, the computation of these bounds gives:

$\displaystyle\int_{0}^{-1/\lambda}(1+\lambda x)^{1/\lambda}{\rm d}x=\lim\limits_{\varepsilon\to 0+}\int_{0}^{-(1/\lambda)-\varepsilon}{(1+\lambda x)^{1/\lambda}{\rm d}x}=\lim\limits_{\varepsilon\to 0+}\left[{\frac{(1+\lambda x)^{{(\lambda+1)}/\lambda}}{\lambda+1}}\right]_{0}^{-(1/\lambda)-\varepsilon}=\frac{(-\lambda)^{{(\lambda+1)}/\lambda}\lim\limits_{\varepsilon\to 0+}\varepsilon^{{(\lambda+1)}/\lambda}-1}{\lambda+1}=\begin{cases}+\infty&\text{when }-1<\lambda<0\\ -1/(\lambda+1)&\text{when }\lambda<-1\end{cases}$ (26)

and for $\lambda=-1$ :

$\displaystyle\int_{0}^{1}{\frac{{\rm d}x}{1-x}}=\lim\limits_{\varepsilon\to 0+}\int_{0}^{1-\varepsilon}{\frac{{\rm d}x}{1-x}}=\lim\limits_{\varepsilon\to 0+}[{-\log(1-x)}]_{0}^{1-\varepsilon}=-\lim\limits_{\varepsilon\to 0+}\log\varepsilon=+\infty.$ (27)

Therefore, we conclude that expectation Eq. (20) does not exist for $-1\leqslant\lambda<0$ .
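
The two regimes of Eqs. (26) and (27) can be checked numerically. The following Python sketch is only an illustration and not part of the original derivation: the function name `bound_integral` and the cut-off values of $\varepsilon$ are assumptions introduced here. It evaluates the closed form of the bounding integral truncated at $-(1/\lambda)-\varepsilon$ and lets $\varepsilon$ shrink.

```python
import math


def bound_integral(lam: float, eps: float) -> float:
    """Integral of (1 + lam*x)**(1/lam) over [0, -1/lam - eps], for lam < 0.

    Uses the antiderivative of Eq. (26); at the upper limit 1 + lam*x = -lam*eps.
    The case lam = -1 uses the logarithmic antiderivative of Eq. (27).
    """
    if lam >= 0.0:
        raise ValueError("this bound is only considered for lam < 0")
    if lam == -1.0:
        return -math.log(eps)
    power = (lam + 1.0) / lam
    return ((-lam * eps) ** power - 1.0) / (lam + 1.0)


if __name__ == "__main__":
    for lam in (-2.0, -1.0, -0.5):
        values = [bound_integral(lam, eps) for eps in (1e-3, 1e-6, 1e-9)]
        print(lam, values)
```

For $\lambda=-2$ the printed values settle near $-1/(\lambda+1)=1$ , whereas for $\lambda=-1$ and $\lambda=-0.5$ they keep growing as $\varepsilon$ decreases, in line with the conclusion above.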

4. The $\bm{T}$ -expectation of a Box-Cox normal distribution

An intuitive alternative to the expectation of the distribution of $y$ , computable for every value of the scale parameter of the Box-Cox transformation, is the median of $y$ , as suggested by Carroll and Ruppert (1981). In this paper we explore another alternative to the expectation of a Box-Cox normal variable when the value of the scale parameter prevents its calculation, by relying on the concept of $T$ -mean of a series of $n$ observations of a variable $y$ , where $T$ denotes a monotonic transformation of $y$ . More precisely, a $T$ -mean is defined as the value of $y$ , denoted by $\bar{y}_{T}$ , whose image under the transformation $T$ equals the arithmetic mean of the $n$ observations of $y$ rescaled by the same transformation. This leads to the following computational formula:

$\displaystyle\bar{y}_{T}=T^{-1}\left(\frac{1}{n}\sum\limits_{i=1}^{n}T(y_{i})\right),$ (28)

with $T^{-1}$ the inverse of the $T$ transformation. This formula generalizes the usual means: it yields the arithmetic mean when $T(y)=y$ ; the geometric mean when $T(y)=\ln y$ ; the harmonic mean when $T(y)=1/y$ ; the quadratic mean when $T(y)=y^{2}$ ; and so on.
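
As an illustration of Eq. (28), the following minimal Python sketch computes the $T$ -mean of a small sample for the four transformations just listed. The helper name `t_mean` and the sample values are assumptions introduced for this example; the observations are taken positive so that every transformation is monotonic on the sample.

```python
import math
from typing import Callable, Sequence


def t_mean(ys: Sequence[float],
           transform: Callable[[float], float],
           inverse: Callable[[float], float]) -> float:
    """T-mean of Eq. (28): inverse transform of the mean of the transformed data."""
    return inverse(sum(transform(y) for y in ys) / len(ys))


ys = [1.0, 2.0, 4.0]  # hypothetical positive observations

arithmetic = t_mean(ys, lambda y: y, lambda x: x)            # T(y) = y
geometric = t_mean(ys, math.log, math.exp)                   # T(y) = ln y
harmonic = t_mean(ys, lambda y: 1.0 / y, lambda x: 1.0 / x)  # T(y) = 1/y
quadratic = t_mean(ys, lambda y: y * y, math.sqrt)           # T(y) = y^2

if __name__ == "__main__":
    print(harmonic, geometric, arithmetic, quadratic)
```

On this sample the four means are ordered as harmonic, geometric, arithmetic, quadratic, which is the usual ordering for positive data.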

A natural extension of this concept to the case of a random variable $y$ consists in replacing, in this formula, the arithmetic mean of the transformed observations of $y$ by the expectation of the random variable $y$ rescaled by the $T$ transformation. This extension provides a generalization of $E(y)$ that exists for a Box-Cox normal distribution regardless of the value of the scale parameter $\lambda$ . We shall name it a $T$ -expectation and denote it by $E_{T}(y)$ .

The $T$ -expectation of the Box-Cox normal distribution is defined by the following integral:

$\displaystyle E_{T}(y)=T^{-1}\left(E(T(y))\right)=T^{-1}\left(\int_{-\alpha}^{+\infty}T(y)f(y)\,{\rm d}y\right),$ (29)

with $T(y)$ defined by the family of functions Eq. (1) and $T^{-1}(x)$ by the family of functions Eq. (3). Under the change of variable Eq. (2), $E(T(y))$ can be reformulated and computed as follows, according to the values of the scale parameter $\lambda$ .

• For $\lambda>0$ :

$\displaystyle E(T(y))=\int_{-\frac{\mu+(1/\lambda)}{\sigma}}^{+\infty}(\mu+\sigma z)\frac{\phi(z)}{\Phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}\,{\rm d}z=\mu+\frac{\sigma}{\Phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}\int_{-\frac{\mu+(1/\lambda)}{\sigma}}^{+\infty}z\phi(z)\,{\rm d}z=\mu+\frac{\sigma}{\Phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}\int_{-\infty}^{\frac{\mu+(1/\lambda)}{\sigma}}\phi^{\prime}(z)\,{\rm d}z=\mu+\sigma\frac{\phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}{\Phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}.$ (30)

• For $\lambda=0$ :

$\displaystyle E(T(y))=\int_{-\infty}^{+\infty}(\mu+\sigma z)\phi(z)\,{\rm d}z=\mu+\sigma\int_{-\infty}^{+\infty}z\phi(z)\,{\rm d}z=\mu.$ (31)

• For $\lambda<0$ :

$\displaystyle E(T(y))=\int_{-\infty}^{-\frac{\mu+(1/\lambda)}{\sigma}}(\mu+\sigma z)\frac{\phi(z)}{\Phi\left(-\frac{\mu+(1/\lambda)}{\sigma}\right)}\,{\rm d}z=\mu-\frac{\sigma}{\Phi\left(-\frac{\mu+(1/\lambda)}{\sigma}\right)}\int_{-\infty}^{-\frac{\mu+(1/\lambda)}{\sigma}}\phi^{\prime}(z)\,{\rm d}z=\mu-\sigma\frac{\phi\left(-\frac{\mu+(1/\lambda)}{\sigma}\right)}{\Phi\left(-\frac{\mu+(1/\lambda)}{\sigma}\right)}.$ (32)

By inserting these results into the inverse Box-Cox transformation Eq. (3), we finally obtain the following expressions of the $T$ -expectation.

• For $\lambda>0$ :

$\displaystyle E_{T}(y)=\left\{1+\lambda\left[\mu+\sigma\frac{\phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}{\Phi\left(\frac{\mu+(1/\lambda)}{\sigma}\right)}\right]\right\}^{1/\lambda}-\alpha.$ (33)

• For $\lambda=0$ :

$\displaystyle E_{T}(y)=e^{\mu}-\alpha.$ (34)

• For $\lambda<0$ :

$\displaystyle E_{T}(y)=\left\{1+\lambda\left[\mu-\sigma\frac{\phi\left(-\frac{\mu+(1/\lambda)}{\sigma}\right)}{\Phi\left(-\frac{\mu+(1/\lambda)}{\sigma}\right)}\right]\right\}^{1/\lambda}-\alpha.$ (35)
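
The closed forms (30) to (35) are straightforward to evaluate. The sketch below is one possible Python transcription, given under stated assumptions: the function names `expected_transformed` and `t_expectation`, and the default `alpha=0.0`, are introduced here for illustration, and the standard normal density and distribution function are taken from `statistics.NormalDist` of the Python standard library.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal: _STD.pdf is phi, _STD.cdf is Phi


def expected_transformed(mu: float, sigma: float, lam: float) -> float:
    """E(T(y)) according to Eqs. (30), (31) and (32)."""
    if lam == 0.0:
        return mu
    a = (mu + 1.0 / lam) / sigma
    if lam > 0.0:
        return mu + sigma * _STD.pdf(a) / _STD.cdf(a)
    return mu - sigma * _STD.pdf(-a) / _STD.cdf(-a)


def t_expectation(mu: float, sigma: float, lam: float, alpha: float = 0.0) -> float:
    """T-expectation according to Eqs. (33), (34) and (35)."""
    m = expected_transformed(mu, sigma, lam)
    if lam == 0.0:
        return math.exp(m) - alpha
    # 1 + lam*m is positive: m is the mean of a normal variable truncated
    # to the side of -1/lam on which 1 + lam*x > 0.
    return (1.0 + lam * m) ** (1.0 / lam) - alpha


if __name__ == "__main__":
    for lam in (-0.5, -0.1, 0.0, 0.1, 0.5):
        print(lam, t_expectation(mu=0.3, sigma=0.5, lam=lam))
```

Unlike the regular expectation, the value returned is finite for every $\lambda$ , including the range $-1\leqslant\lambda<0$ . In floating-point arithmetic the ratio $\phi/\Phi$ can lose accuracy when its argument lies far in the lower tail, so a production implementation would need a more careful evaluation there.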

5. Empirical evidence with a Box-Cox standard Tobit model

In order to determine empirical estimates of the Box-Cox scale parameter, we use two surveys conducted by the Bureau of Labor Statistics of the U.S. Department of Labor. The first survey, called “Diary Survey”, collects data from 10,351 households on frequently purchased goods, especially food, on a two-week basis. The second survey, called “Interview Survey”, collects data from 25,813 households on all expenditures, on a quarterly basis. The micro-data files are publicly available on the website of the Bureau of Labor Statistics, and may be downloaded and used without permission.

For each item of expenditure listed in Tables 1 and 2, we specify a Box-Cox standard Tobit model of the form:

$\displaystyle T(y^{\ast};\alpha,\lambda)={\beta}^{\prime}x+\sigma z_{[B_{1},B_{2}]}\quad\text{with}\quad y=\begin{cases}y^{\ast}&\text{if }y^{\ast}>0\\ 0&\text{if }y^{\ast}\leqslant 0\end{cases},$ (36)

where $y$ represents the observed level of the dependent variable, $y^{\ast}$ the latent level of the dependent variable, $x$ a column-vector of explanatory variables, and $\beta$ a column-vector of impact coefficients of the explanatory variables on the continuous latent dependent variable.

The dependent variable is defined as the household expenditure divided by the number of the household’s consumption units, computed according to the OECD definition, namely: 1 unit for the first adult aged 18 years or over; 0.7 units for each other adult aged 18 years or over; 0.5 units for each person aged under 18 years. Following a suggestion due to Box and Cox (1982), we rescale this measurement of the dependent variable by dividing it by its sample geometric mean, in order to have a null sample mean of $T(y;\alpha,\lambda)$ for $\lambda=0$ . So defined, the dependent variable and the location parameter $\alpha$ are pure numbers, namely dimensionless magnitudes without an explicit unit of measurement. Indeed, the ratio of a household expenditure per consumption unit ( $Y$ ) to its sample geometric mean ( $\bar{Y}_{G}$ ) can be written as a product of ratios of magnitudes of the same kind, namely:

$\displaystyle y=Y/\bar{Y}_{G}=\prod\limits_{i=1}^{n_{+}}(Y/Y_{i})^{1/n_{+}},$

with $Y_{i},i=1,\ldots,n_{+}$ the sample nonzero observations of $Y$ .
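
The construction of the dependent variable can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the household composition and the expenditure values are hypothetical, and the geometric mean is taken over the nonzero observations only, as in the formula above.

```python
import math
from typing import List, Sequence


def consumption_units(n_adults: int, n_children: int) -> float:
    """Consumption units on the scale quoted in the text:
    1 for the first adult, 0.7 for each other adult, 0.5 for each person under 18."""
    if n_adults < 1:
        raise ValueError("at least one adult is assumed in this sketch")
    return 1.0 + 0.7 * (n_adults - 1) + 0.5 * n_children


def rescale_by_geometric_mean(expenditures: Sequence[float]) -> List[float]:
    """Divide every observation by the geometric mean of the nonzero ones."""
    positive = [v for v in expenditures if v > 0.0]
    geometric_mean = math.exp(sum(math.log(v) for v in positive) / len(positive))
    return [v / geometric_mean for v in expenditures]


if __name__ == "__main__":
    units = consumption_units(n_adults=2, n_children=2)
    per_unit = [v / units for v in (0.0, 270.0, 1080.0)]  # hypothetical expenditures
    print(rescale_by_geometric_mean(per_unit))
```

After rescaling, the logarithms of the positive observations sum to zero, which is the null sample mean of the transformed variable for $\lambda=0$ (with $\alpha=0$ ) mentioned above, and censored observations remain equal to zero.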

As explanatory variables we selected:

• The logarithm of the household’s net income per consumption unit, and its square;

• The age of the household’s reference person;

• The number of years of education of the household’s reference person;

• A binary indicator of the gender of the household’s reference person (1 $=$ female, 0 otherwise);

• A binary indicator of a Hispanic household reference person (1 $=$ Hispanic, 0 otherwise);

• A binary indicator of a black household reference person (1 $=$ black, 0 otherwise);

• A binary indicator of the household living in an SMSA (1 $=$ yes, 0 $=$ no).

Table 1

Estimates of Box-Cox standard Tobit model with samples of censored observations

| Survey | Expenditure item | Censored observations (%) | Log-Tobit model: $\alpha$ estimate | Log-Tobit model: $\alpha$ $t$ -value | Log-Tobit model: score test of $\lambda=0$ | Box-Cox Tobit model: $\lambda$ estimate | Box-Cox Tobit model: $\lambda$ $t$ -value | Box-Cox Tobit model: $\alpha$ estimate | Box-Cox Tobit model: $\alpha$ $t$ -value |
|---|---|---|---|---|---|---|---|---|---|
| Diary survey, sample size $=$ 10351 | Cereal | 46 | 3.73 | 20.28 | $-6.09$ | | | | |
| | Bakeprod | 23 | 1.55 | 24.15 | $-8.16$ | | | | |
| | Beef | 64 | 3.47 | 20.69 | $-7.52$ | | | | |
| | Pork | 63 | 4.49 | 13.58 | $-12.33$ | | | | |
| | Othmeat | 64 | 5.03 | 13.90 | $-10.48$ | | | | |
| | Poultry | 63 | 5.37 | 15.98 | $-9.94$ | | | | |
| | Seafood | 74 | 3.38 | 12.59 | $-8.89$ | | | | |
| | Eggs | 63 | 23.42 | 5.31 | $-10.89$ | | | | |
| | Milkprod | 38 | 5.70 | 11.81 | $-18.97$ | | | | |
| | Othdairy | 37 | 2.95 | 17.99 | $-12.81$ | | | | |
| | Frshfrut | 35 | 1.96 | 23.04 | $-5.28$ | | | | |
| | Frshveg | 34 | 1.94 | 31.50 | $-4.98$ | | | | |
| | Procfrut | 57 | 5.70 | 10.91 | $-11.53$ | | | | |
| | Procveg | 53 | 3.16 | 15.97 | $-7.04$ | | | | |
| | Sweets | 53 | 2.01 | 19.97 | $-11.26$ | | | | |
| | Nonalbev | 32 | 1.84 | 22.92 | $-9.21$ | | | | |
| | Oils | 61 | 5.93 | 16.20 | $-7.52$ | | | | |
| | Miscfood | 19 | 1.14 | 27.09 | $-6.17$ | $-1.06$ | $-3.65$ | 3.94 | 4.79 |
| | Foodaway | 21 | 1.05 | 24.64 | $-2.58$ | $-0.17$ | $-2$ | 1.46 | 6.44 |
| | Alcbev | 72 | 2.40 | 13.86 | $-4.83$ | | | | |
| | Smoksupp | 87 | 2.76 | 8.62 | $-3.54$ | $-0.63$ | $-0.83$ | 6.07 | 1.46 |
| | Petfood | 80 | 2.39 | 12.49 | $-5.73$ | | | | |
| | Persprod | 61 | 1.45 | 17.20 | $-7.58$ | | | | |
| | Persserv | 89 | 5.38 | 6.83 | $-5.49$ | | | | |
| | Drugsupp | 72 | 1.06 | 16.61 | $-5.74$ | | | | |
| | Houskeep | 41 | 1.24 | 23.51 | $-7.60$ | | | | |
| Interview survey, sample size $=$ 25813 | Food | 1 | 0.23 | 33.63 | $-6.94$ | $-0.15$ | $-5.00$ | 0.36 | 12.17 |
| | Alcool | 63 | 1.56 | 25.22 | $-6.33$ | $-0.31$ | $-2.30$ | 2.57 | 5.58 |
| | Housing | 0 | 0.11 | 26.12 | $-6.34$ | $-0.10$ | $-6.43$ | 0.18 | 13.21 |
| | Apparel | 40 | 0.90 | 38.90 | $-7.81$ | $-0.31$ | $-6.69$ | 1.69 | 12.56 |
| | Transport | 6 | 0.17 | 42.67 | $-37.57$ | $-1.32$ | $-20.94$ | 1.44 | 31.35 |
| | Health | 23 | 0.67 | 42.68 | $-10.82$ | $-0.37$ | $-7.68$ | 1.42 | 13.15 |
| | Entertainment | 14 | 0.45 | 45.48 | $-20.1$ | | | | |
| | Perscare | 41 | 1.19 | 33.15 | $-13.81$ | $-0.57$ | $-4.25$ | 2.71 | 7.03 |
| | Reading | 80 | 1.58 | 18.06 | $-6.54$ | $-0.36$ | $-1.82$ | 2.79 | 4.00 |
| | Education | 92 | 0.58 | 12.22 | 9.95 | 0.11 | 3.52 | 0.41 | 9.84 |
| | Tobacco | 83 | 4.61 | 14.83 | $-5.64$ | | | | |
| | Miscexp | 70 | 0.58 | 27.18 | $-7.52$ | $-0.16$ | $-4.72$ | 0.92 | 11.38 |
| | Cashcont | 54 | 0.58 | 34.86 | $-4.68$ | $-0.10$ | $-4.57$ | 0.79 | 14.87 |
| | Insurance | 20 | 0.61 | 51.04 | $-16.68$ | | | | |
| | Shows | 66 | 0.98 | 27.01 | $-6.82$ | $-0.23$ | $-3.57$ | 1.57 | 8.57 |
| | Foodaway | 20 | 0.81 | 41.46 | $-16.84$ | $-0.72$ | $-6.28$ | 2.31 | 9.00 |
| | Vacations | 80 | 1.30 | 18.90 | 1.78 | 0.07 | 1.21 | 1.09 | 6.01 |

Table 2

Estimates of Box-Cox standard Tobit model with samples of truncated observations

| Survey | Expenditure item | Number of positive observations | One-parameter Box-Cox: $\lambda$ estimate | One-parameter Box-Cox: $\lambda$ $t$ -value | One-parameter Box-Cox: score test of $\alpha=0$ | Two-parameter Box-Cox: $\lambda$ estimate | Two-parameter Box-Cox: $\lambda$ $t$ -value | Two-parameter Box-Cox: $\alpha$ estimate | Two-parameter Box-Cox: $\alpha$ $t$ -value |
|---|---|---|---|---|---|---|---|---|---|
| Diary survey, sample size $=$ 10351 | Cereal | 5562 | 0.05 | 3.85 | $-1.09$ | | | | |
| | Bakeprod | 7945 | 0.07 | 6.19 | $-7.85$ | | | | |
| | Beef | 3724 | $-0.13$ | $-10.15$ | 1.77 | $-0.21$ | $-5.76$ | 0.06 | 2.29 |
| | Pork | 3844 | $-0.08$ | $-4.73$ | 1.01 | $-0.11$ | $-2.57$ | 0.02 | 0.84 |
| | Othmeat | 3751 | $-0.01$ | $-0.73$ | 2.57 | $-0.11$ | $-2.23$ | 0.08 | 1.84 |
| | Poultry | 3845 | $-0.10$ | $-6.50$ | 1.79 | $-0.18$ | $-3.84$ | 0.06 | 1.77 |
| | Seafood | 2657 | $-0.03$ | $-2.25$ | 1.53 | $-0.10$ | $-2.59$ | 0.05 | 1.79 |
| | Eggs | 3824 | $-0.12$ | $-6.48$ | 2.44 | $-0.33$ | $-3.63$ | 0.18 | 2.36 |
| | Milkprod | 6450 | $-0.07$ | $-5.66$ | 1.84 | $-0.16$ | $-3.73$ | 0.07 | 2.70 |
| | Othdairy | 6553 | 0.04 | 2.85 | $-5.04$ | | | | |
| | Frshfrut | 6752 | 0.12 | 12.39 | $-5.80$ | | | | |
| | Frshveg | 6856 | 0.09 | 13.75 | 0.89 | 0.07 | 8.01 | 0.01 | 1.39 |
| | Procfrut | 4461 | 0.07 | 4.80 | 1.09 | 0.04 | 1.18 | 0.02 | 0.92 |
| | Procveg | 4816 | 0.05 | 3.65 | 0.04 | 0.05 | 2.27 | 0 | 0.04 |
| | Sweets | 4912 | $-0.06$ | $-5.12$ | 0.62 | $-0.08$ | $-2.88$ | 0.01 | 0.50 |
| | Nonalbev | 7055 | 0.09 | 8.73 | $-5.27$ | | | | |
| | Oils | 4006 | 0 | 0.16 | 1.12 | $-0.05$ | $-1.57$ | 0.04 | 1.91 |
| | Miscfood | 8426 | 0.14 | 16.28 | $-7.08$ | | | | |
| | Foodaway | 8154 | 0.18 | 20.83 | $-19.73$ | | | | |
| | Alcbev | 2917 | 0.06 | 4.56 | 1.60 | 0.02 | 0.79 | 0.02 | 1.23 |
| | Smoksupp | 1393 | 0 | $-0.16$ | $-6.24$ | | | | |
| | Petfood | 2049 | 0 | 0.01 | 2.28 | $-0.08$ | $-2.13$ | 0.05 | 1.89 |
| | Persprod | 4045 | $-0.03$ | $-2.32$ | $-2.70$ | | | | |
| | Persserv | 1142 | $-0.03$ | $-1.17$ | 1.82 | $-0.10$ | $-1.46$ | 0.04 | 1.12 |
| | Drugsupp | 2913 | $-0.06$ | $-5.08$ | 0.71 | $-0.07$ | $-3.55$ | 0.01 | 0.55 |
| | Houskeep | 6141 | 0.08 | 12.46 | 3.74 | $-0.02$ | $-0.94$ | 0.07 | 3.51 |
| Interview survey, sample size $=$ 25813 | Food | 25664 | 0.13 | 25.34 | 4.08 | 0.06 | 4.04 | 0.05 | 4.47 |
| | Alcool | 9510 | 0.07 | 10.00 | $-0.55$ | | | | |
| | Housing | 25753 | 0.10 | 30.22 | 3.10 | $-0.06$ | $-3.88$ | 0.12 | 8.35 |
| | Apparel | 15394 | 0.08 | 18.90 | $-3.33$ | | | | |
| | Transport | 24334 | $-0.02$ | $-7.01$ | 9.95 | $-1.34$ | $-21.01$ | 1.41 | 25.45 |
| | Health | 19940 | 0.16 | 44.09 | $-5.53$ | | | | |
| | Entertainment | 22224 | 0.05 | 15.19 | 6.71 | $-0.18$ | $-9.85$ | 0.21 | 9.38 |
| | Perscare | 15140 | 0.02 | 3.53 | $-6.11$ | | | | |
| | Reading | 5235 | 0.03 | 3.04 | 1.42 | 0 | 0.25 | 0.01 | 1.17 |
| | Education | 2110 | 0 | $-0.26$ | $-7.29$ | | | | |
| | Tobacco | 4509 | 0.12 | 11.66 | $-0.13$ | | | | |
| | Miscexp | 7727 | $-0.04$ | $-7.36$ | 0.23 | $-0.04$ | $-5.44$ | 0 | 0.16 |
| | Cashcont | 11848 | 0.02 | 4.80 | 0.04 | 0.02 | 3.53 | 0 | 0.02 |
| | Insurance | 20608 | 0.13 | 64.36 | 4.03 | | | | |
| | Shows | 8679 | 0 | 0.18 | 1.41 | $-0.01$ | $-0.71$ | 0 | 1.11 |
| | Foodaway | 20647 | 0.07 | 14.63 | 3.15 | 0.03 | 3.05 | 0.02 | 3.23 |
| | Vacations | 5131 | 0.09 | 11.24 | 0.47 | 0.07 | 4.37 | 0.01 | 1.25 |

To estimate the parameters of model Eq. (36) by the maximum likelihood method, we need to derive the probability distribution of an observation of the dependent variable $y$ .

If such an observation is censored, and therefore can take either the value $y=0$ or a value $y>0$ , its distribution is a discrete-continuous mixture, which assigns a probability mass $P(y=0)=P(y^{\ast}\leqslant 0)$ to $y=0$ and a density function $f(y)=f_{y^{\ast}}(y)$ to any $y>0$ , with: $P(y=0)+\int_{0}^{\infty}{f(y){\rm d}y}=1$ . Note that $P(y=0)>0$ only for $\alpha>0$ .

Now, the density function $f(y)$ is written as Eq. (9) with $\mu={\beta}^{\prime}x$ , while the computation of $P(y=0)=1-\int_{0}^{\infty}{f(y){\rm d}y}$ can be performed using the change of variable:

$\displaystyle z=\frac{T(y;\alpha,\lambda)-{\beta}^{\prime}x}{\sigma}$ (37)

leading to:

$\displaystyle{\rm d}z=\frac{(y+\alpha)^{\lambda-1}}{\sigma}\,{\rm d}y\;\Rightarrow\;f(y)\,{\rm d}y=\frac{\phi(z)\,{\rm d}z}{\Pi},$ (38)

with $\Pi$ defined by Eq. (7),

$\displaystyle y=0\Leftrightarrow z=\frac{T(0;\alpha,\lambda)-{\beta}^{\prime}x}{\sigma},\quad y=\infty\Leftrightarrow z=\begin{cases}\infty&\text{for }\lambda\geqslant 0\\ -\frac{{\beta}^{\prime}x+(1/\lambda)}{\sigma}&\text{for }\lambda<0\end{cases}$ (39)

$\displaystyle\int_{0}^{\infty}f(y)\,{\rm d}y=\frac{1}{\Pi}\begin{cases}\int_{\frac{T(0;\alpha,\lambda)-{\beta}^{\prime}x}{\sigma}}^{\infty}\phi(z)\,{\rm d}z&\text{for }\lambda\geqslant 0\\ \int_{\frac{T(0;\alpha,\lambda)-{\beta}^{\prime}x}{\sigma}}^{-\frac{{\beta}^{\prime}x+(1/\lambda)}{\sigma}}\phi(z)\,{\rm d}z&\text{for }\lambda<0\end{cases}=\frac{1}{\Pi}\begin{cases}1-\Phi\left(\frac{T(0;\alpha,\lambda)-{\beta}^{\prime}x}{\sigma}\right)&\text{for }\lambda\geqslant 0\\ \Phi\left(-\frac{{\beta}^{\prime}x+(1/\lambda)}{\sigma}\right)-\Phi\left(\frac{T(0;\alpha,\lambda)-{\beta}^{\prime}x}{\sigma}\right)&\text{for }\lambda<0\end{cases}$ (40)

and finally to:

$\displaystyle P(y=0)=1-\int_{0}^{\infty}f(y)\,{\rm d}y=\begin{cases}\dfrac{\Phi\left(\frac{[(\alpha^{\lambda}-1)/\lambda]-{\beta}^{\prime}x}{\sigma}\right)-\Phi\left(-\frac{{\beta}^{\prime}x+(1/\lambda)}{\sigma}\right)}{\Phi\left(\frac{{\beta}^{\prime}x+(1/\lambda)}{\sigma}\right)}&\text{for }\lambda>0\\ \Phi\left(\frac{\ln\alpha-{\beta}^{\prime}x}{\sigma}\right)&\text{for }\lambda=0\\ \dfrac{\Phi\left(\frac{[(\alpha^{\lambda}-1)/\lambda]-{\beta}^{\prime}x}{\sigma}\right)}{\Phi\left(-\frac{{\beta}^{\prime}x+(1/\lambda)}{\sigma}\right)}&\text{for }\lambda<0\end{cases}$ (41)

Note also that $\lim_{\alpha\to 0+}P(y=0)=0$ .
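
The censoring probability of Eq. (41) can be sketched in Python as follows. The names `box_cox` and `prob_zero` are assumptions introduced here, `xb` stands for ${\beta}^{\prime}x$ , and the form of the transformation, $T(y;\alpha,\lambda)=[(y+\alpha)^{\lambda}-1]/\lambda$ with $\ln(y+\alpha)$ at $\lambda=0$ , is inferred from the terms $T(0;\alpha,\lambda)$ appearing in Eq. (41) rather than quoted from Eq. (1).

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def box_cox(y: float, alpha: float, lam: float) -> float:
    """Assumed form of T(y; alpha, lam); requires y + alpha > 0."""
    if lam == 0.0:
        return math.log(y + alpha)
    return ((y + alpha) ** lam - 1.0) / lam


def prob_zero(xb: float, sigma: float, alpha: float, lam: float) -> float:
    """P(y = 0) according to Eq. (41), for alpha > 0."""
    z0 = (box_cox(0.0, alpha, lam) - xb) / sigma
    if lam == 0.0:
        return PHI(z0)
    a = (xb + 1.0 / lam) / sigma
    if lam > 0.0:
        return (PHI(z0) - PHI(-a)) / PHI(a)
    return PHI(z0) / PHI(-a)


if __name__ == "__main__":
    for alpha in (1.0, 1e-2, 1e-4, 1e-8):
        print(alpha, [prob_zero(0.5, 1.0, alpha, lam) for lam in (-0.5, 0.0, 0.5)])
```

Running the loop shows the probability of a censored observation shrinking towards zero as $\alpha$ approaches zero from above, for negative, null and positive values of $\lambda$ alike.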

From these results we derive the following expression of the log-likelihood for a random sample of $n$ observations of a censored dependent variable $y$ :

$\displaystyle\ln L=\sum\limits_{i=1}^{n}\ln L_{i}=\sum\limits_{i|y_{i}=0}\ln P(y_{i}=0)+\sum\limits_{i|y_{i}>0}\ln f(y_{i}).$ (42)

If the observation of $y$ is truncated, only values $y>0$ can be observed. Thus, the distribution of $y$ is that of a continuous random variable with density function given by:

$\displaystyle f_{y>0}(y)=\frac{f(y)}{1-P(y=0)}\quad\text{for}\quad y>0,$ (43)

with $1-P(y=0)=\int_{0}^{\infty}{f(y){\rm d}y}$ given by Eq. (40). Thus, the log-likelihood for a random sample of $n$ observations of a truncated dependent variable $y$ is written as:

$\displaystyle\ln L=\sum\limits_{i=1}^{n}{({\ln f(y_{i})-\ln[1-P(y_{i}=0)]})}.$ (44)
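
The two log-likelihoods (42) and (44) can be written compactly in Python. The sketch below is an illustration under stated assumptions and not the estimation code used for the tables: all function names are introduced here, `xbs` holds the values of ${\beta}^{\prime}x_{i}$ , the density is obtained from Eq. (38) as $f(y)=\phi(z)(y+\alpha)^{\lambda-1}/(\sigma\Pi)$ , and the normalising constant $\Pi$ is inferred from Eqs. (40) and (41) rather than quoted from Eq. (7). A location parameter $\alpha>0$ is assumed.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: _N.pdf is phi, _N.cdf is Phi


def box_cox(y, alpha, lam):
    """Assumed T(y; alpha, lam): ((y + alpha)**lam - 1)/lam, log(y + alpha) at lam = 0."""
    if lam == 0.0:
        return math.log(y + alpha)
    return ((y + alpha) ** lam - 1.0) / lam


def norm_const(xb, sigma, lam):
    """Normalising constant Pi, as inferred from Eqs. (40)-(41)."""
    if lam == 0.0:
        return 1.0
    a = (xb + 1.0 / lam) / sigma
    return _N.cdf(a) if lam > 0.0 else _N.cdf(-a)


def density(y, xb, sigma, alpha, lam):
    """f(y) for y > 0, from the change of variable of Eqs. (37)-(38)."""
    z = (box_cox(y, alpha, lam) - xb) / sigma
    return _N.pdf(z) * (y + alpha) ** (lam - 1.0) / (sigma * norm_const(xb, sigma, lam))


def prob_positive(xb, sigma, alpha, lam):
    """1 - P(y = 0), Eq. (40)."""
    z0 = (box_cox(0.0, alpha, lam) - xb) / sigma
    upper = 1.0 if lam >= 0.0 else _N.cdf(-(xb + 1.0 / lam) / sigma)
    return (upper - _N.cdf(z0)) / norm_const(xb, sigma, lam)


def loglik_censored(ys, xbs, sigma, alpha, lam):
    """Eq. (42): zeros contribute ln P(y = 0), positive values contribute ln f(y)."""
    total = 0.0
    for y, xb in zip(ys, xbs):
        if y > 0.0:
            total += math.log(density(y, xb, sigma, alpha, lam))
        else:
            total += math.log(1.0 - prob_positive(xb, sigma, alpha, lam))
    return total


def loglik_truncated(ys, xbs, sigma, alpha, lam):
    """Eq. (44): every observation is positive and is renormalised by 1 - P(y = 0)."""
    return sum(
        math.log(density(y, xb, sigma, alpha, lam))
        - math.log(prob_positive(xb, sigma, alpha, lam))
        for y, xb in zip(ys, xbs)
    )
```

In practice these functions would be passed, with a negative sign, to a numerical optimiser over $\beta$ , $\sigma$ , $\lambda$ and $\alpha$ ; that step is left out here.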

Table 1 presents the results of estimating the Box-Cox standard Tobit model Eq. (36) with the full sample of censored observations (10,351 observations for the “Diary Survey”, and 25,813 observations for the “Interview Survey”). The first part of the table presents the estimation of a lognormal Tobit model of the dependent variable. To test this specification, we report the score test statistic for the assumption $\lambda=0$ , which is asymptotically distributed as a standard normal random variable. The reported results show that a highly significant positive estimate of the location parameter has been obtained for all the expenditure items of both surveys, which is consistent with the moderate to very high rate of censored observations in the survey data. According to the values of the score statistics, the maintained assumption of log normality of the dependent variables is significantly rejected against a more skewed distribution characterized by negative values of parameter $\lambda$ , with two exceptions: education and vacations, whose score statistics are positive (for vacations the statistic, 1.78, is not significant at the 5% level). The second part of Table 1 displays the joint maximum likelihood estimation of both parameters $\lambda$ and $\alpha$ , with the corresponding t-statistics. For all items for which such a joint estimate has been obtained, the results confirm the sign of parameter $\lambda$ indicated by the score test. On the other hand, a strong negative correlation between these two parameter estimators significantly reduces their statistical significance as well as the power of the score test for the assumption $\lambda=0$ .
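
Since the score statistic is asymptotically standard normal under $\lambda=0$ , a two-sided p-value can be attached to each entry of the score column of Table 1. The short Python sketch below is only an illustration: the helper name `two_sided_p_value` is an assumption, and the three statistics are copied from Table 1.

```python
import math


def two_sided_p_value(score: float) -> float:
    """Two-sided p-value of a standard normal statistic, 2 * (1 - Phi(|score|)).

    Written with the complementary error function to stay accurate in the tails.
    """
    return math.erfc(abs(score) / math.sqrt(2.0))


if __name__ == "__main__":
    # Score statistics of lambda = 0 taken from Table 1.
    for item, score in [("Cereal", -6.09), ("Education", 9.95), ("Vacations", 1.78)]:
        print(item, score, two_sided_p_value(score))
```

The statistics for cereal and education lead to a clear rejection of $\lambda=0$ , whereas the p-value for vacations stays above 0.05.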

Common sense suggests that these disappointing results stem from an inadequate consideration of the actual mechanisms generating the censored observations. Indeed, the standard Tobit model attributes the censoring of the expenditure on a given good or service to a negative desired level of consumption, resulting from an economically rational decision of the consumer based on his preferences, resources and the prices of goods. While such a mechanism may be relevant to explain the censoring of expenditures for luxury goods, as is the case for some of the expenditure items of the “Interview Survey” (notably “shows” and “vacations”), it is hardly a credible explanation for the censoring of the expenditure items of the “Diary Survey”, whose specificity and survey period duration call for other censoring mechanisms, notably the good selection mechanism of Cragg (1971) and the infrequent purchasing mechanism of Deaton and Irish (1984). Adding these mechanisms to the standard Tobit model alters the likelihood function of a censored sample but not that of a truncated sample, when the added censoring mechanisms are independent of the standard Tobit relation. Indeed, suppose we add to the Box-Cox standard relation Eq. (36) two new censoring mechanisms modelled by two latent variables $y_{1}^{*}$ and $y_{2}^{*}$ jointly independent of $y^{*}$ , with $y=\begin{cases}y^{*}&\text{if }y^{*}>0,y_{1}^{*}>0,y_{2}^{*}>0\\ 0&\text{otherwise}\end{cases}$ . For this model the probability of a censored observation is given by $P(y=0)=1-P(y^{*}>0)P(y_{1}^{*}>0,y_{2}^{*}>0)$ . As the joint density function of $y^{*}$ , $y_{1}^{*}$ and $y_{2}^{*}$ is written as $f(y^{*},y_{1}^{*},y_{2}^{*})=f(y^{*})f(y_{1}^{*},y_{2}^{*})$ , the density function of a positive observation of $y$ may be expressed as follows:

$\displaystyle f(y|y>0)=f(y)\int_{0}^{\infty}\int_{0}^{\infty}f(y_{1}^{*},y_{2}^{*})\,{\rm d}y_{2}^{*}\,{\rm d}y_{1}^{*}=f(y)P(y_{1}^{*}>0,y_{2}^{*}>0).$

For a truncated observation of $y$ , this leads to the following density function:

$\displaystyle f_{y>0}(y)=\frac{f(y|y>0)}{1-P(y=0)}=\frac{f(y)P(y_{1}^{*}>0,y_{2}^{*}>0)}{P(y^{*}>0)P(y_{1}^{*}>0,y_{2}^{*}>0)}=\frac{f(y)}{P(y^{*}>0)},$

which corresponds to the truncated density function of the Box-Cox standard Tobit model Eq. (43).

For this reason we can expect more relevant results in estimating parameters $\lambda$ and $\alpha$ with the truncated samples of observations. These estimates are shown in Table 2. Its first part displays the maximum likelihood estimates of the scale parameter $\lambda$ , with the corresponding t-statistics, obtained by assuming $\alpha=0$ , together with the score test statistic for this assumption, used as an indicator of the direction of search for a free estimate of $\alpha$ , namely $\alpha>0$ for a positive value of the score statistic, and $\alpha<0$ for a negative value. The second part of the table displays the joint maximum likelihood estimates of parameters $\lambda$ and $\alpha$ , with the corresponding t-statistics, for all the expenditure items for which a positive estimate of the location parameter is expected, with two exceptions (transport and insurance of the “Interview Survey”). From these estimates we can conclude that negative estimates of parameter $\lambda$ are prevalent compared to positive estimates. For the expenditure items of the “Diary Survey”, 13 estimates of this parameter are negative, ranging between $-$ 0.33 and $-$ 0.02, while only 4 estimates are positive, ranging between 0.02 and 0.07. For the “Interview Survey”, 5 estimates are negative, ranging between $-$ 1.34 and $-$ 0.01, 4 estimates are positive, taking values between 0.02 and 0.07, and one estimate is zero.
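
The counts quoted in this paragraph can be reproduced from the joint estimates of $\lambda$ reported in the second part of Table 2. The following Python sketch is a simple tabulation; the dictionary and function names are assumptions, and the values are copied from the table.

```python
# Joint maximum likelihood estimates of lambda, second part of Table 2.
diary = {
    "Beef": -0.21, "Pork": -0.11, "Othmeat": -0.11, "Poultry": -0.18,
    "Seafood": -0.10, "Eggs": -0.33, "Milkprod": -0.16, "Frshveg": 0.07,
    "Procfrut": 0.04, "Procveg": 0.05, "Sweets": -0.08, "Oils": -0.05,
    "Alcbev": 0.02, "Petfood": -0.08, "Persserv": -0.10, "Drugsupp": -0.07,
    "Houskeep": -0.02,
}
interview = {
    "Food": 0.06, "Housing": -0.06, "Transport": -1.34, "Entertainment": -0.18,
    "Reading": 0.0, "Miscexp": -0.04, "Cashcont": 0.02, "Shows": -0.01,
    "Foodaway": 0.03, "Vacations": 0.07,
}


def sign_counts(estimates):
    """Return the numbers of negative, positive and zero estimates."""
    values = list(estimates.values())
    negative = sum(1 for v in values if v < 0)
    positive = sum(1 for v in values if v > 0)
    return negative, positive, len(values) - negative - positive


if __name__ == "__main__":
    print("Diary:", sign_counts(diary), min(diary.values()), max(diary.values()))
    print("Interview:", sign_counts(interview), min(interview.values()), max(interview.values()))
```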

6. Conclusion

The empirical results presented in Section 5 suggest that, in analyzing family expenditures for consumption goods and services at a detailed or semi-aggregate level with Box-Cox Tobit models, we must expect to face significant departures from the assumption of normality of the dependent variable towards highly right-skewed and leptokurtic probability distributions with no expectation. The nonexistence of the expectation of the dependent variable calls for reconsidering the theoretical magnitudes conventionally used for forecasting and for measuring marginal effects. To this end, the concept of T-expectation presented in Section 4 deserves to be tested empirically and analyzed theoretically. Our research is continuing in this direction.

Acknowledgments

We thank the MASA Co-Editor-in-Chief, Prof. Stan Lipovetsky, for suggesting that we benchmark our findings against earlier applications of the Box-Cox transformation in econometrics and in statistics.

References

1. Amemiya (1985). Advanced econometrics. Harvard University Press, Cambridge (MA).

2. Berndt, E. R., & Khaled (1979). Parametric productivity measurement and choice among flexible functional forms. Journal of Political Economy, 87(6), 1220-1245.

3. Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), 26(2), 211-252.

4. Box, G. E. P., & Cox, D. R. (1982). An analysis of transformations revisited, rebutted. Journal of the American Statistical Association, 77(377), 209-210.

5. Carroll, R. J., & Ruppert (1981). On prediction and the power transformation family. Biometrika, 68(3), 609-615.

6. Chaze (2005). Assessing household health expenditure with Box-Cox censoring models. Health Economics, 14(9), 893-907.

7. Cragg (1971). Some statistical models for limited dependent variables with applications for the demand for durable goods. Econometrica, 39(5), 829-844.

8. Davidson, & MacKinnon, J. G. (2004). Econometric theory and methods. Oxford University Press.

9. Deaton, & Irish (1984). A statistical model for zero expenditures in household budgets. Journal of Public Economics, 23, 59-80.

10. Freeman, & Modarres (2006). Inverse Box-Cox: The power-normal distribution. Statistics & Probability Letters, 76(8), 764-772.

11. Johnson, & Kotz (1995). Continuous univariate distributions – 2. Houghton Mifflin Company.

12. Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36(1-2), 149-176.

13. Jones, A. M., & Yen, S. T. (1994). A Box-Cox double hurdle model. The Manchester School, 68(2), 203-221.

14. Lankford, R. H., & Wyckoff (1991). Modeling charitable giving using a Box-Cox standard Tobit model. The Review of Economics and Statistics, 73(3), 460-470.

15. Lipovetsky, & Conklin (2000). Box-Cox generalization of logistic and algebraic binary response models. International Journal of Operations and Quantitative Management, 6, 276-285.

16. Poirier, D. J. (1978). The use of the Box-Cox transformation in limited dependent variable models. Journal of the American Statistical Association, 73(362), 284-287.

17. Poirier, D. J., & Melino (1978). A note on the interpretation of regression coefficients within a class of truncated distributions. Econometrica, 46(5), 1207-1209.

18. Taylor (1986). The retransformed mean after a fitted power transformation. Journal of the American Statistical Association, 81(393), 114-118.

19. Tishler, & Lipovetsky (1997). The flexible CES-GBC family of cost functions: Derivation and application. The Review of Economics and Statistics, 79(4), 638-646.

20. Tishler, & Lipovetsky (2000). A globally concave, monotone and flexible cost function: Derivation and application. Applied Stochastic Models in Business and Industry, 16(4), 279-296.

21. Yen, S. T. (1993). Working wives and food away from home: The Box-Cox double hurdle model. American Journal of Agricultural Economics, 75(4), 884-895.