Abstract
This article addresses the issue of building regression models for bounded responses, which are robust in the presence of outliers. To this end, a new distribution on (0,1) and a regression model based on it are proposed and some properties are derived. The distribution is a mixture of two beta components. One of them, showing a higher variance (variance inflated) is expected to capture outliers. Within a Bayesian approach, an extensive robustness study is performed to compare the new model with three competing ones present in the literature. A broad range of inferential tools are considered, aimed at measuring the influence of various outlier patterns from diverse perspectives. It emerges that the new model displays a better performance in terms of stability of regression coefficients’ posterior distributions and of regression curves under all outlier patterns. Moreover, it exhibits an adequate behaviour under all considered settings, unlike the other models.
Keywords
Introduction
In many disciplines, the issue of modelling a bounded continuous response variable (e.g., a rate or proportion) as a function of covariates is a relevant one (Kieschnick and McCullough, 2003), some examples being geology, medicine, economics, psychology, environmetrics, psephology, etc. Clearly, linear regression models are unsuitable for these data. To properly address this issue, the beta regression (BReg) model has been proposed and extensively studied (Ferrari and Cribari-Neto, 2004; Paolino, 2001; Smithson and Verkuilen, 2006; Espinheira et al., 2008a,b; Ferrari et al., 2011; Pereira et al., 2013). Moreover, Ospina et al. (2006), Ferrari and Pinheiro (2011) and Bayer and Cribari-Neto (2013) have addressed small sample likelihood asymptotic issues, whereas Cribari-Neto and Souza (2012), Bayer and Cribari-Neto (2017), Simas et al. (2010) and Canterle and Bayer (2017) have addressed testing, model selection, and alternative regression functions. Inflated beta models which account for the presence of mass at 0 and 1 response values are dealt with in Ospina and Ferrari (2010, 2012) and Pereira and Cribari-Neto (2014). For a thorough review of the whole subject, see Liu and Eugenio (2016)
In the basic beta framework, the mean response is related to a linear predictor through a link function. A precision parameter is also present, which may be modelled as a function of covariates too. Despite its variety of shapes, the beta distribution fails to model a wide range of phenomena (Bayes et al., 2012; García et al., 2011) including the heavy tails phenomenon, which is strictly related to the presence of outliers, a critical problem in regression analysis (Rousseeuw and Leroy, 2005). Outliers are observations which are statistically unreasonable with respect to a given probability model (Barnett and Lewis, 1995) and can be interpreted as generated by an alternative random mechanism (Box and Tiao, 1968). Indeed, several authors highlighted the lack of robustness to outliers connected with standard inferential procedures for the BReg model. In particular, Espinheira et al. (2008a,b) adopting a classical approach, propose ways of identifying outliers or influential observations so as to remove them before applying maximum likelihood inference. Instead, Ghosh (2017), again under a classical framework, suggests a different estimation method aimed at producing more stable results.
Bayes et al. (2012) adopt a third and radically different approach by proposing, under a Bayesian setting, a new more sophisticated model specifically devised to account for outliers. Here the aim is not only to achieve point estimate robustness but also to model both the main data body and outliers. Bayes’ model is based on the beta rectangular (BR) distribution (Hahn, 2008), which is a mixture of a uniform and a beta distribution, where the uniform component is expected to capture outliers. A simulation study proved its greater robustness with respect to the BReg model.
To achieve a greater general flexibility, a different mixture regression model can be found in Migliorati et al. (2018), where the response is a flexible beta (FB), that is, a mixture of two betas with common variance and different means (mean-shift type model). This proposal is capable of taking heavy tails, asymmetry, and bimodality into account. Although not specifically meant to address robustness, the flexible beta regression (FBReg) is shown to behave satisfactorily in the presence of some outlier patterns.
In the spirit of Bayes’ work, the main objective of the present article is to introduce and study, within a Bayesian approach, a new mixture regression model for bounded responses capable of handling outliers. To this end, a new beta mixture distribution is proposed and employed for the response variable.
The motivation behind this new regression model comes from some limitations of the two existing mixture ones. In particular, the mean-shift structure of the FBReg model is consistent with the presence of only all upper or all lower outliers, which may not be suitable in some contexts. The beta rectangular regression (BRReg) model does not share this difficulty, but it models the outliers through a fixed component, the uniform, implying a given mean and variance, which may lead to a poor fit.
A natural and simple way to overcome these possible drawbacks is to consider for the outliers a component which has a variable centre, equal to the main data component mean, and a larger (and not fixed) variance. The resulting distribution, called variance inflated beta (VIB), is thus a mixture of two betas with the same mean but different variances.
To understand the potential of the proposed regression model, we perform a comprehensive and detailed robustness comparison of all regression models, aimed at providing an evaluation of the strength of various types of inferential effects produced under different outlier scenarios. This will also allow us to provide practical guidelines for selecting the most appropriate model under various contexts.
In particular, we contaminate a simulated dataset with three different types of outlier patterns, representative of most important cases. In each pattern, outliers are introduced with variable degrees of departure from the rest of the data. To this end, we implement a perturbation operation for the observations specifically designed for constrained data. We then assess outlier influence from a variety of different perspectives. Specifically, we consider their impact on marginal and joint parameter posteriors, fitting measures suitably built to cope with mixture models (widely applicable information criterion WAIC), residuals and predictive intervals. We also propose a measure of the outlier impact on the whole regression curve, as, in some contexts, this gives further complementary information with respect to the regression coefficients’ behaviour.
Differently from the other models, the variance inflated beta regression (VIBReg) will be shown to have a satisfactory performance under almost all the examined conditions.
The rest of the article is organized as follows. The VIB distribution and VIBReg model are introduced in Section 2, where some properties (e.g., identifiability and likelihood boundedness) are derived as well. Details concerning the Bayesian estimation procedure, including a highly efficient Hamiltonian Monte Carlo (HMC) algorithm, are provided in Section 3. Section 4 is devoted to the robustness analysis of the four regression models. An application to a well-known dataset is illustrated in Section 5, whereas Section 6 draws some final conclusions and a discussion of main results.
A new variance inflated beta model for bounded continuous variables
Introduction
In the regression models available in the literature (as well as in the new variance inflated one), the overall mean of the response variable
where
The basic model is the beta regression one BReg(
Estimation issues within the BReg model have been dealt with by maximum likelihood (Ferrari and Cribari-Neto, 2004) and through Bayesian inference (Branscum et al., 2007).
Then, Bayes et al. (2012) proposed the BRReg(
Recently, Migliorati et al. (2018) proposed the FBReg(
Here the parameters
Our motivation to devise a new regression model, which produces robust inferences in presence of outliers, has originated from the attempt to remove some possible limitations of the BRReg and the FBReg models. The very same structure of the FBReg model favours the presence of only upper or lower outliers, which may not be suitable in all contexts. On the other hand, the BRReg forces the mixture component which captures the outliers to have a fixed mean and variance (
A natural and simple way to remove these limitations is to consider for the outliers a component which has the same centre as the main data component, but which displays a larger (and variable) variance. The resulting distribution (hereafter, VIB
where
is the beta density in the mean-precision parametrization.
The parameter
VIB raw and central moments of any order (as well as the moment generating function) can be easily and explicitly derived. This is because, thanks to the special beta mixture, moments and moment generating function can be written as mixtures of the corresponding quantities of the two beta components, which are available in explicit form. For example, the moment generating function is:
where
where
In particular, the variance of the VIB distribution is equal to:
where
Note that the sign of the skewness is determined by μ,
The kurtosis coefficient is equal to:
where
The VIB kurtosis formula is involved and for space constraints we do not perform a full study here. Rather, we limit ourselves to show VIB kurtosis greater flexibility than the beta one, in particular as long as high values are concerned. The beta kurtosis, which is given by
Indeed, the VIB distribution extends the shapes of the beta and BR ones mainly in terms of heavy tails. More precisely, any unimodal beta distribution necessarily displays zero limits at zero and at one. Positive and finite limits can be obtained, within the beta distribution, only with monotone (or uniform) densities. Furthermore, any unimodal BR has both limits positive and equal.
On the contrary, the VIB distribution allows to have unimodal densities with one or both positive limits (see Figures 1a and 1c). Moreover, the VIB can also model the presence of extremely low (close to zero) and/or extremely high (i.e., close to one) values, maintaining a unimodal shape for the central data body (see Figure 1b), unlike both the beta and the BR.
Some examples of VIB distributions
The VIBReg is obtained by letting the ith response variable to be VIB distributed as
g being a suitable link function.
It is worthwhile to highlight the variance behaviour entailed by the model. By expression (2.5),
The VIBReg model shows other relevant properties such as strong identifiability. The latter means that two elements of a parametric family are equal if and only if the corresponding parameters are identical. Typically, general mixture models (including mixtures of arbitrary beta distributions) do not possess this property, often displaying invariance under permutation of the components instead. This may have negative consequences both on the interpretation of parameters (and therefore on the elicitation of priors), and on computational grounds, where, for example, labelling problems may arise in the estimation process. Under completely general conditions, the VIBReg does not share these difficulties, as shown by the following proposition (see 7 for the proof).
Consider a vector of independent response variables
where
Another typical issue within mixture models is likelihood unboundedness. Possible unbounded likelihood points may generate computational difficulties and are generally of no inferential relevance, being often caused by an overfitting problem which can occur, for example, when a component of the mixture is entirely dedicated to a single observation. The VIBReg likelihood is not bounded, but very simple and mild restrictions on its parameter space can be adopted to avoid unbounded points.
In Appendix B, we give the complete proof of Proposition 2. Note that the constrained parametric spaces implied by (i) and (ii) can be viewed as restrictions of the one corresponding to case (iii), which is the more general. These constrained spaces have the following implications: case (i) implies a lower bound on the variance of the two mixture components, and a bound on their ratio, case (ii) implies a bound on the variance ratio only, case (iii) implies a bound on neither of them. Indeed, it is easy to see that, by choosing sufficiently low values of
The two aforementioned propositions apply, in particular, to an i.i.d. sample from the VIB distribution, being this a special case of the VIBReg model, namely the case with a single (fictitious) constant covariate.
For comparison, recall that the FBReg model is both identifiable and with bounded likelihood, whereas the BRReg possesses neither property (see (Migliorati et al., 2018)).
The likelihood of the VIBReg model for independent observations
where
We decided to adopt a Bayesian approach, which makes it easy to cope with complex models with many parameters and, moreover, with small sample problems affecting maximum likelihood estimates. In order to apply Markov chain Monte Carlo (MCMC) techniques, a recent solution is the adaptive HMC algorithm (Duane et al., 1987). The latter is a generalization of the Metropolis algorithm, which combines MCMC and deterministic simulation methods. This is achieved through augmentation of the posterior density by an independent distribution of an auxiliary variable, introduced to enable faster movements of the algorithm through the parameter space. Typically, this leads to greater efficiency with respect to Gibbs sampler or Metropolis techniques, by strongly reducing autocorrelation among draws (Brooks et al., 2011). One of the main drawbacks of HMC samplers, when applied to mixture models, is that they are only defined on continuous distributions. Indeed, in any mixture model it is necessary to introduce n-dimensional vectors of latent indicators for mixture components. Therefore, a partition of the parameter space must be introduced, and HMC updates for continuous parameters and Metropolis updates for discrete ones must be alternated. Finally, the discrete parameters can be summed out of the model.
Clearly, a complete specification of the Bayesian model requires the elicitation of the parameters’ prior distributions. We assumed a priori independence (a usual choice when no prior information is available) and adopted flat priors. More precisely, for the regression parameters
We implemented the HMC algorithm resorting to the Stan program (Stan Development Team, 2016; Gelman et al., 2014) and to its default No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014).
In this Section, we perform an extensive simulation study to assess the robustness of the proposed VIBReg model in presence of various configurations and various degrees of intensity of outliers and to compare it with competing models. Robustness will be measured through a broad range of tools taking different perspectives into account. In particular, the impact of outliers shall be assessed by considering marginal and global posteriors of regression parameters, measures of distance between regression curves, fitting measures, as well as predictive intervals and residuals.
To contaminate a subset of response values with the purpose of generating artificial outliers, we apply the perturbation operation on the simplex, which can be viewed as the analogous of addition onto real space (Pawlowsky-Glahn et al., 2015). Indeed, simply shifting the observed responses by a given range of quantities may generate values outside the interval
In general terms, the perturbation of vector
where 𝒞 is the closure operation being:
where
In a 2-dimensional simplex, the perturbing vector can be set equal to
The initial simulation framework considers a BReg model with a quantitative covariate x. Specifically, we simulate
Each baseline dataset has been further contaminated according to three patterns of perturbation as follows: Increase by a perturbing factor Increase by a perturbing factor Increase by a perturbing factor
For each pattern of perturbation, different degrees of intensity have been evaluated by letting
Therefore, all the graphs reported further, which help to illustrate the various effects of outliers, have
The scatter plots for

The simulated scenarios have been defined following a long-established scheme (beginning with Peña and Guttman [1993], amongst other) of contamination of a baseline case. Here, the scenarios we devised are comparable to the ones in Bayes et al. (2012), the main differences being the introduction of the perturbation operation which replaces the addition and the removal of their Scenario II (i.e., decrease responses for high x values), which is symmetric to our Scenario I and gives similar results.
Under each of the four regression models, the parameters have been estimated by HMC through the Stan software, as described in Section 3, and the convergence to the equilibrium distribution has been checked by means of diagnostic tests for stationarity (Geweke and Heidel diagnostics) and autocorrelation (Raftery diagnostic) (Mengersen et al., 1999).
A first evaluation of the robustness of the four models can be obtained by analysing the variations induced by the outliers in the posterior distribution of each regression parameter. To this end, we plot the MC posterior mean and 95% credible interval (CI) as a function of
Scenario I: Posterior mean (solid line) and 95% credible interval (dashed line) of
(top panel) and
(bottom panel) for each regression model as
varies
Scenario I: Posterior mean (solid line) and 95% credible interval (dashed line) of
(top panel) and
(bottom panel) for each regression model as
varies
In all scenarios, the regression parameters of the BReg seem to be highly affected by increasing values of
Furthermore, it should be noted that under the BRReg model, the intercept tends to move downwards (below the true value) even when only upper outliers are present.
This apparently odd (and in contrast with the other models) performance can be explained by recalling that the BRReg, by modelling the outliers with a uniform, assigns a 0.5 mean to them, which has the effect of always shrinking the overall mean towards such a value regardless the data pattern.
In Scenario II, the VIBReg and the BRReg models behave very well even for high degrees of perturbation.
Scenario II: Posterior mean (solid line) and 95% credible interval (dashed line) of
(top panel) and
(bottom panel) for each regression model as
varies
Scenario III: Posterior mean (solid line) and 95% credible interval (dashed line) of
(top panel) and
(bottom panel) for each regression model as
varies
A second assessment of the outliers impact is measured in terms of modification of the joint, instead of the marginal, parameter posteriors. In particular, we focus on the regression coefficients, being typically the most relevant parameters of the model. We, therefore, compute the Kullback–Leibler (KL) divergence between the joint posterior referred to the unperturbed dataset,
KL divergence measure for VIBReg (solid line), FBReg (dashed line), BRReg (dotted line) and BReg (dotdash line)
KL divergence measure for VIBReg (solid line), FBReg (dashed line), BRReg (dotted line) and BReg (dotdash line)
The inspection of the KL divergence over the three simulated scenarios (Figure 6) confirms the better general robustness of the VIBReg model, whose distance curves either lie below or are comparable to the others. The VIBReg is the only model with non-(rapidly) increasing curves in all three scenarios. Interestingly, the proposed model is remarkably stable for extreme levels of perturbation, in all scenarios, unlike the other models.
A careful investigation of the plots in Section 4.2 reveals that in some cases,
Therefore, it seems relevant to directly evaluate the global impact of the outliers on regression curves. To this end, we consider the
where the integral is over the range of observed x-values,
Curve distances for VIBReg (solid line), FBReg (dashed line), BRReg (dotted line) and BReg (dotdash line)
The results reported in Figure 7 are largely consistent with the ones relative to the joint posterior KL distance, with an even more evident better VIB performance.
The main difference lies in a general worse behaviour of the BRReg model, especially in Scenario III where the BRReg is outperformed even by the BReg for moderate perturbation. This may be due to the balancing effect of the estimates of the regression parameters which is weaker for the BRReg.
The aim of this Section is to consider the effects of outliers on the fitting ability of the models. A well-established way of comparing models is through the value of proper fitting criteria, which typically mediate between goodness of fit and complexity. Different measures have been proposed within a Bayesian framework, the most popular being the deviance information criterion (DIC) (Spiegelhalter et al., 2002) and the Bayesian counterparts of the well-known Akaike information criterion (AIC) (Akaike, 1998) and Bayesian information criterion (BIC) (Schwarz, 1978) criteria, namely expected AIC (EAIC) and expected BIC (EBIC) (Brooks, 2002). DIC is defined as:
where
The other two criteria take the form:
where
These criteria are known to have some problems, especially in mixture models contexts. In particular, DIC is not defined for non-regular statistical models and can produce negative estimates of
Recently, a new criterion called WAIC has been proposed (Watanabe, 2010). This is a fully Bayesian measure in that it uses the entire posterior distribution (instead of point estimates only), the main consequences being invariance to reparameterizations and possibility of computation also for non-regular models (Gelman et al., 2014; Vehtari et al., 2017).
WAIC evaluates the goodness of fit of a model through the log pointwise predictive density (
where
The model complexity adjustment is essential here, as the
The first penalization follows a logic similar to that used for DIC and it can be estimated by:
The second is based on the posterior variance of the log predictive density, which can be estimated as:
where
In both cases WAIC is given by −2 times the difference between
In our simulation study, both WAIC criteria have been computed, but only the first is reported, say WAIC1 (see penalization 4.3), since they provide completely consistent conclusions. In particular, Figure 8 shows the WAIC1 values as δ varies.
Mean values of WAIC1 criterion for VIBReg (solid line), FBReg (dashed line), BRReg (dotted line) and BReg (dotdash line) models as
varies in the three scenarios
As a general pattern, for low values of
For higher
To obtain further insight into the performances of the four models in the various outlier scenarios, we analyse residuals and predictive intervals based on a given sample with
We examine Bayesian residuals (Gelman et al., 2014), defined as:
where

BReg, FBReg and VIBReg exhibit a similar outlier detection capability. Roughly speaking, they recognize third out of the fourth introduced outliers in the first two scenarios, and all of them in the third one (see Figure 9). In addition, all these models identify as an anomalous response the non-perturbed value with a high negative residual. This seems a reasonable outcome in the light of Figure 2. Note that the anomalous non-perturbed value is the rightmost among the lowest ones, occurring where the predictive mean is close to one and, consequently, the variance is very low. On the contrary, BRReg only detects the two most extreme lower outliers introduced in Scenario III. This is because the uniform component of the BR, which models outliers, induces a particularly high response variability. Indeed, the estimated response variance under the BRReg is always (in each scenario and for any covariate value) at least three times as large as the variance under the other three models. See Figure 10, where the ratios of the BRReg, VIBReg and BReg variances to the FBReg one are reported. From this figure, it also emerges that the other models display a similar variance.
Estimated response variance ratio with respect to FBReg in the three scenarios. VIBReg (solid line), BRReg (dotted line) and BReg (dotdash line)
Finally, we simulate draws from the posterior predictive distribution for each model (see Figures 16, 17 and 18 in 16). The FBReg provides the smallest and best centred intervals in the first two scenarios, whereas the BReg always has the poorest performance. The BRReg almost always displays very large intervals for the outliers (often nearly covering the whole (0,1) interval), thus being poorly informative. Again, this is caused by its uniform component. The VIBReg shows a slightly worse performance than the FBReg in the first two scenarios and a comparable behaviour in the third one.
It is useful to perform an overall comparison of the regression models under the various scenarios and different perspectives considered in the simulation in order to obtain useful indications on which model is preferable in different contexts.
Let us first compare the models in terms of residuals, as here results are quite clear-cut and somewhat special. Indeed, the BRReg is the only model unable to identify nearly all the outliers in all scenarios. This is because the uniform component of such model greatly increases the response variability, thereby producing a masking effect of the outliers. On the contrary, the VIBReg and FBReg manage to accurately identify anomalous observations, and exhibit a considerable robustness at the same time.
Let us now compare the models excluding the effects on residuals.
In all scenarios, the VIBReg has better or comparable performances than all other models in terms of robustness of marginal and joint posteriors as well as regression curves. Very importantly, it is the only model always showing a great stability when extreme outliers are present. The FBReg prevails in terms of predictive intervals and fitting ability in the first two scenarios. The VIBReg is the best performing model in Scenario III with respect to all considered effects. In particular, it displays a substantial larger stability than the other models in terms of regression coefficients and curve.
In no occasion, the BRReg dominates the other models (nor the BReg) resulting particularly unsatisfactory with respect to stability of the regression curve, residuals and predictive intervals.
Application to Australian Institute of Sport data
With the purpose of comparing the four competing models, we provide an application to the Australian Institute of sport dataset, which is included in the R library sn. This dataset has been analysed by Bayes et al. (2012) to show BRReg robustness. It consists of 37 rowing athletes for whom the body fat percentage (Bfat, response variable), the lean body mass (lbm, a quantitative covariate) and gender (encoded as 1 if female and 0 if male) have been collected.
Regression model with one covariate
In all four models, a regression model for the mean is defined as follows:
where
Here, the presence of two lower outliers (filled-in circles corresponding to subjects 16
The estimation of vector of parameters, both on the entire dataset and after removing the outliers, has been dealt with through HMC algorithm (see Section 3). Table 1 reports the parameter posterior means as well as the fitting measure WAIC1 (see Section 4.4).
Posterior parameter means and fitting measure WAIC1 for the four models under the regression model with one covariate
The best fit is provided by the FBReg model both on the entire dataset and after removing the outliers. The regression coefficient estimates are similar among models in the case with no outliers. In terms of single regression coefficients, the BReg results the most affected by outliers, whereas the BRReg is the most stable. However, the overall impact of outliers on the regression curves leads to different conclusions. To see this, in Figure 11, we report the fitted curves for the four models. As they are virtually overlapping when outliers are absent, in this case, we plot only one curve to preserve readability.

Comparing the curves with and without outliers, one can see that the VIBReg and the FBReg exhibit quite a smaller variation than the other two models. Indeed, this is confirmed by the curve distances (see formula 4.2), which take the values 0.264 (VIBReg), 0.406 (FBReg), 0.709 (BRReg) and 0.906 (BReg). Interestingly, the regression curve of the BRReg is shifted upwards, that is, in the direction opposite to the outliers, unlike the other models. This behaviour, as already noticed, is due to the uniform component which drives the overall regression curve towards 0.5.
An analysis of the residuals, of the estimated response variances, and of the predictive intervals (see Figures 12, 13 and 14) confirms the conclusions drawn in Section 4.5.
Standardized residuals for each subject and for each model on the entire dataset
Estimated response variance ratio with respect to FBReg. VIBReg (solid line), BRReg (dotted line) and BReg (dotdash line)
Posterior predictive means (cross symbols) and 90% predictive intervals for each subject and for each model (observed values in circles, filled-in for outliers) on the entire dataset
Specifically, only the BRReg model residuals do not identify the two outliers. This is because its response variance is at least 20 times larger than the variances of the other models (see Figure 13). In terms of predictive intervals, only the BReg fails to account for the outliers. Moreover, the FBReg shows the smallest length of the intervals corresponding to outliers, while this length is larger for the VIBReg (approximately equal to 0.5) and especially for the BRReg (approximately equal to 1).
The analysis on the Sport dataset has been further enhanced by evaluating the impact of gender on the response variable. To this purpose, the regression function for the mean, in all four competing models, is set as follows:
where
To better explore the flexibility provided by our new model, we additionally regressed onto covariates the parameter
This is consistent with data pattern which displays a different shape for male and female athletes (see top left panel of Figure 15). When accounting for gender, a third outlier clearly shows up belonging to the female group (lowest filled squared point).
Furthermore, one of the two previously identified outliers (squared empty point in Figure 15) in the new setting becomes a very extreme one. For this reason, we decided to conduct our analysis by considering the following three cases: (1) entire dataset, (2) dataset without the extreme male outlier, (3) dataset without all three outliers.

As in the one covariate example, we report the posterior means of the parameters as well as the fitting measure WAIC1 of the various models (see Table 2). It is worth noting that the BRReg model always provides the worst fit, whereas the VIBReg outperforms all the other models when outliers are present.
Parameter estimates and WAIC1 under the regression models with three covariates
A graphical representation of the fitted regression curves for the four models can be found in Figure 15. The VIBReg, FBReg and BReg regression curves are basically overlapping in all scenarios. Furthermore, no model seems to be stable when the extreme male outlier is introduced (case 1). Indeed, this extreme point causes a substantial flattening effect for the male group and is never recognized as an outlier. However, fairly satisfactory results are obtained when such a point is excluded. It should also be noted that the regression curve of the BRReg model is still affected by the shifting upwards phenomenon, already observed in one covariate example. This may explain its poor fit.
Thus, although the VIBReg is not more stable than the other models in this particular example, its special structure does allow for a better fitting in presence of outliers.
It appears also of interest to examine how the additional flexibility, introduced by letting the parameter
Posterior estimates of inflation parameter
and precision parameters separately for males and females
Even in this case, we have computed residuals and posterior predictive intervals. We decided not to report them, because they display the same characteristics already highlighted in the one covariate example.
In this article, we introduced a new regression model for proportions aimed at handling outlier observations, and we compared it with the BReg model and two other mixture regression models for bounded responses. All three mixture models share a common beta mixture structure with two components: an arbitrary beta component is intended to account for the main body of data and the other one is devoted to capture outlying observations. This second component is expected to better account for the increased variability and to make the parameter estimates and regression curves less affected by outliers. In particular, the second component is taken to be a uniform distribution in the BRReg, a beta distribution with the same variance as the main component but a shifted mean in the FBReg, and a beta distribution with the same mean but an inflated variance in the VIBReg model.
Indeed, the results of our robustness study show that, in many respects, mixture models are by far more stable than the (BReg) one.
More specifically, in all scenarios, the VIBReg has better or comparable performances than all other models in terms of robustness of estimates as well as regression curves, with specially outstanding results when extreme outliers or both upper and lower outliers are present. The FBReg prevails in terms of predictive intervals and fitting ability in the first two scenarios, with the VIBReg being just slightly worse. It follows that the VIBReg, unlike the other models, appears a satisfactory choice under all examined conditions.
From a computational perspective, the use of Stan software, instead of the more accessible to practitioners WinBUGS, made the application of the estimation algorithms quite efficient in terms both of runtimes and of convergence characteristics. This is particularly relevant, since, within mixture models, low convergence rates or even convergence failures often occur.
Although in the article, we have considered response variables with values in
Finally, another relevant extension of the VIBReg can be achieved by letting some other model parameters (besides the mean) to depend on covariates. An example in this direction has been given in the application of Section 5.2, where the variance inflation parameter is considered. In addition, although the model implies a natural form of heteroscedasticity, as discussed in Section 2.3, it may be useful in some settings to model the variability (i.e., the parameter ϕ) as a function of covariates as shown, for example, by Smithson and Verkuilen (2006). This can be achieved within the VIBReg without special difficulties.
Supplementary materials
Supplementary materials for this article are available at http://www.statmod.org/smij/archive.html.
Acknowledgements
We are grateful to the referees and to the editor for their constructive comments, which helped to improve the article.
Appendices
Appendix A Proof of Proposition 1 (Identifiability of the VIBReg model)
Proof. First we show that the VIB distribution is identifiable. Let
where
and it is a continuous function on
It follows that
we obtain
Let us now focus on the VIBReg identifiability. Obviously,
Appendix B Proof of Proposition 2 (Boundedness of the VIBReg model likelihood)
Proof. Let us first study some asymptotic properties of the log-likelihood
By Lemma 1 in the Appendix of Migliorati et al. (2017), a necessary condition for
where
is the KL divergence between the vector of probabilities
Let us now consider the log-likelihood
where
Therefore, it is enough to prove likelihood boundedness for case (iii). A necessary condition for
thus converging to zero, as
It follows that
On the other hand, if there exists an observation
As, under condition (iii),
Appendix C Predictive intervals
Scenario I: Posterior predictive means (cross symbols) and 90% predictive intervals for each subject (outlying observations are filled-in circles)
Scenario II: Posterior predictive means (cross symbols) and 90% predictive intervals for each subject (outlying observations are filled-in circles)
Scenario III: Posterior predictive means (cross symbols) and 90% predictive intervals for each subject (outlying observations are filled-in circles)
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
Research partially financially supported by the Italian Ministry of University and Research, Grants F.A. from the University of Milano-Bicocca.
