Abstract
Doubly bounded continuous data are common in the social and behavioral sciences. Examples include judged probabilities, confidence ratings, derived proportions such as percent time on task, and bounded scale scores. Dependent variables of this kind are often difficult to analyze using normal theory models because their distributions may be quite poorly modeled by the normal distribution. The authors extend the beta-distributed generalized linear model (GLM) proposed in Smithson and Verkuilen (2006) to discrete and continuous mixtures of beta distributions, which enables modeling dependent data structures commonly found in real settings. The authors discuss estimation using both deterministic marginal maximum likelihood and stochastic Markov chain Monte Carlo (MCMC) methods. The results are illustrated using three data sets from cognitive psychology experiments.
1. Introduction
Continuous dependent variables with scales bounded at both ends are fairly common throughout the social and behavioral sciences. Examples from psychology are the percentage of time attending one stimulus versus another as in habituation studies, subjective probability or confidence ratings common in cognitive research, or total scores on a symptom questionnaire applied in a community setting. In behavioral economics examples include most utility scales (usually assumed to be bounded at both ends) and proportional measures such as allocations to investments in a portfolio and leverage (the ratio of debt to assets plus debt). These continuous doubly bounded dependent variables frequently present special problems for modelers. Chief among these are uncorrectable skew and heteroscedasticity, both often arising from the fact that as the mean response moves toward either scale boundary, the variance tends to decrease and skew tends to increase. In short, in a bounded response space, the moments of the distribution are inextricably linked. Furthermore, the closer one is to a boundary the stronger this linkage is.
Several authors (Kieschnick & McCullogh, 2003; Paolino, 2001; Smithson & Verkuilen, 2006) have presented arguments that traditional methods such as regression-linearizing and variance-stabilizing transformations or robust estimators are not always appropriate for modeling this kind of limited dependent variable. They propose an alternative in the form of using the beta distribution to model such variables. The beta distribution is a well-known member of the exponential family of distributions for which the regularity conditions that help ensure maximum likelihood estimates exist and are well defined. Nevertheless, the literature on generalized linear models (GLMs) for beta-distributed dependent variables (i.e., beta regression) is sparse. In the otherwise exhaustive Handbook of Beta Distributions, there is no mention of beta regression (Gupta & Nadarajah, 2004), although there is a sizeable parallel literature on beta-binomial regression (an early instance is Crowder, 1978), suited to modeling proportions of counts where the beta distribution is used as a hierarchically specified random effect to address overdispersion.
An early example of beta regression is Brehm and Gates's (1993) model of police compliance with supervision. However, Paolino (2001) was the first to employ the mean-precision parameterization of the beta distribution that greatly simplifies interpretation. Apparently independently, Ferrari and Cribari-Neto (2004) derived a similar beta regression model that recently has been implemented in the SAS GLIMMIX procedure (SAS, 2008), and likewise Kieschnick and McCullogh (2003) compared the performance of a beta regression model for proportions, as employed in economics and finance research, with several alternatives and concluded that it is often the best option. Noël and Dauvier (2007) and Noël (2008) have presented unfolding and dominance item-response models for continuous doubly bounded scale items based on the beta distribution.
Some of the above-mentioned versions of beta regression treat the precision parameter as a nuisance. Recently Smithson and Verkuilen (2006) followed and extended Paolino’s model by explicitly modeling precision (and therefore dispersion) as well as the mean response, utilizing the extended GLM framework for joint modeling of means and dispersions described in Chapter 10 of McCullagh and Nelder (1989) and developed by Smyth (1989) for random variables in the exponential family of distributions. There are good reasons for regarding the modeling of dispersion as important in its own right. We concur with Carroll’s (2003) observation that ignoring variance structure or treating it as simply a nuisance not only can lead to inefficient estimation but to misleading conclusions. Smithson and Verkuilen (2006) presented real-world examples of this and the present article includes additional illustrations.
Likelihood maximization has been the dominant approach in estimating beta regression models (e.g., Paolino, 2001; Smithson & Verkuilen, 2006), with Vasconcellos and Cribari-Neto (2005) and Ospina, Cribari-Neto, and Vasconcellos (2006) proposing bias-correcting adjustments. Epsinheira, Ferrari, and Cribari-Neto (2008a, 2008b) explore alternative weighted standardized residuals and influence diagnostics for the Ferrari-Cribari-Neto beta GLM. Buckley (2002) implemented the Paolino model in a Bayesian Markov chain Monte Carlo (MCMC) procedure, and Smithson and Verkuilen (2006) also did so with their version.
A number of gaps remain in our knowledge about beta regression models. Chief among them is the absence of methods for handling dependent observations. In this article, we develop and explore generalized linear mixed models (GLMMs) and related dependent-observation models for beta-distributed dependent variables. Justifications for GLMMs as a way of handling dependencies are well known. The dependencies among observations may arise naturally from effects such as autocorrelation over time, within-subject correlations in repeated measures experiments, or clumping due to within-cluster homogeneity. We begin by reprising the independent-observations beta regression model and then developing an overall framework for beta GLMMs, including both mixed and mixture distribution models. A simulation example is presented, comparing maximum likelihood and Bayesian MCMC estimations, along with a discussion of variance partition issues and goodness of fit. Three examples of applications to real data are provided.
2. Regression With the Beta Distribution
2.1. Independent Observations
This section summarizes a longer discussion found in Smithson and Verkuilen (2006). Let
To form a regression model, consider two design matrices
In summary, beta regression assumes
Let
2.2. Estimation in the Independence Case
The beta forms a two-parameter exponential family and, for models lacking dispersion regressors that are free of the usual trouble spots for any GLM such as improperly scaled design matrices or collinearity, is generally quite numerically tractable. The estimation theory for the independent case has been examined in great detail by Ospina et al. (2006), who note that beta regression for independent observations satisfies all usual regularity conditions for maximum likelihood estimation (MLE). Estimates of location parameters are very close to unbiased even for modest samples while estimates for the precision parameter tend to be somewhat optimistic, that is, too large, with the bias increasing as the true precision increases. They propose a correction for this bias, which we do not pursue here. In our experience, behavioral science data tend to have relatively lower values of the precision parameter and are less biased as a consequence. With a dispersion submodel, estimation is trickier and better starting values for the parameters are required. We have found the best procedure is to get starting values by fitting a reasonable location-only model before considering dispersion covariates.
MLE proceeds through numerical maximization of the log likelihood using a Newton or quasi-Newton method with standard errors for the parameters coming from the inverse of the final Hessian, that is, observed information, or using Fisher scoring and expected information. In general, Bayesian estimation seems to be a bit more forgiving than MLE in the analyses we have conducted because the priors serve to smooth the likelihood. Unsurprisingly, given that both are likelihood based, where MLE has problems, in particular, when
2.2.1. Handling Boundary Observations
One important problem with data on a bounded interval is the presence of exact boundary observations, that is, 0 or 1. These observations occur in real data but are impossible given the sample space of (0, 1). For instance, if the response scale is, say, a discrete response scale on {0, 1,…, 100} that is being analyzed as a continuous score transformed into a proportion, exact values of 0 or 100 can and do occur. Even in a slider response with extremely high resolution, it is not unlikely that responses will pile up a bit at the boundary, causing a probably unimportant but notable deviation from the model. Points on the interior such as the midpoint may also occur more often than is entirely consistent with the model.
Provided that boundary observations do not represent qualitatively different responses but instead are reasonably assumed to be the result of finite precision of measurement, it is reasonable to proceed with analysis by “cheating” the observations slightly away from the boundary, via either of two methods. The first rescales all observations by a small amount:
We advise use of sensitivity analysis to ensure that estimates and inference are not notably affected by the choice of
2.3. The Mixed Model
It is somewhat unfortunate that the usual terminology distinguishes between mixed and mixture models as, conceptually, they are essentially the same. In both cases, one or more missing independent variables constant within a cluster are posited to account for dependence among observations in a cluster. Upon conditioning on these variables and observed explanatory variables, the observations are assumed independent. Beta regression in the independent case states that, conditional on the regressors
We follow the basic scheme given in McCulloch, Searle, and Neuhaus (2009, chap. 8), which we take as a basic reference. Pinheiro and Chao (2006) provide a very useful summary of the estimation theory for mixed location models. Much less is known about estimation of dispersion mixtures or location/dispersion mixtures. The discussion found in Johnson (2003), where a mixed location/dispersion model was considered for discrete ordinal variables is helpful. Proactive Monte Carlo simulation of models that are likely to be used in a given study may be useful if one is in doubt about estimation properties (Steiger, 2001). To do this, simply generate replicated data sets using a random number generator and estimate the desired model on these replications.
Let
An important special case is the finite mixture model, which asserts that, conditional on regressors
2.4. Remarks on Estimation
We now offer some remarks on the two estimation strategies used in this article in the context of mixed and mixture models. First, we discuss issues that are common to both methods and then remark on aspects that apply to each individually.
Estimation of a GLMM is markedly more complex than in the independent observations case. In particular, conditions guaranteeing global concavity of the log likelihood do not hold here, and it is crucial to get good starting values. Because numerical optimization algorithms are only locally convergent, having good starting values greatly speeds estimation. Similarly, for MCMC, starting too far from the region of high posterior density will slow convergence. We have found that estimating a sequence of simpler models and feeding previous parameter estimates into later estimations greatly facilitates the fitting process. For instance, a fixed effects model with no random components should provide decent starting values for the fixed effects in a mixed model and a simplified random effects only model often provides a good guess for the mixing components. Even providing the correct signs of the coefficients will help convergence. The other area where estimation can go awry for the beta is in the matter of the log-gamma function and its derivatives near 0. These functions will underflow for argument values of approximately
MML estimation involves using a double iteration to fit the models, alternating between a quadrature step (using Gaussian quadrature) to integrate out the random effects and an optimization step to update the fixed effect parameters of the marginal likelihood. This process is computationally intensive. It is incumbent on the user to make a careful choice of the number of quadrature nodes and to have good starting values for model parameters. Adaptive Gaussian quadrature is slower (often much slower) but more accurate and requires fewer quadrature nodes. Ensuring that the solution is not sensitive to the number of quadrature nodes by manually varying that number is highly advisable. In our experience, beta regression seems to require more quadrature nodes than, say, logistic regression, which is perhaps unsurprising because the likelihood function is more complex than for the Bernoulli. The only other major area of trouble is when the maximum of the log-likelihood function is near a degeneracy. In this case, the Hessian will be poorly conditioned and optimization will usually fail (MCMC usually exhibits poor mixing in this case as well). Developing a regularization strategy for MML, for example, by adding a small number of pseudo-observations to shrink the likelihood toward a relevant null model and smooth the log likelihood, would constitute a useful research topic.
MCMC estimation generally performs well, both in successfully estimating complex models and in accuracy and interval coverage. Our conclusions regarding the strengths and weaknesses of this approach are largely in line with Browne and Draper’s (2006) comparisons with likelihood and quasi-likelihood approaches. For medium to large random effects and moderate to large samples, Bayesian MCMC estimates and interval coverage seem to be reasonably well behaved. In particular, MCMC interval coverage fidelity appears superior to that provided by asymptotic Gaussian confidence intervals. MCMC estimates also are less biased than the quasi-likelihood ones (especially regarding variance components). Furthermore, in our experience MCMC often can successfully estimate complex models where likelihood-based approaches fail or are intractable, particularly when the number of random effects is larger than 3.
However, MCMC interval coverage fidelity is not well studied for small-to-medium samples in non-Gaussian random effects or hierarchical models. Browne and Draper (2006) found that undercoverage in logistic random-effects models could be corrected for by using appropriate posterior summary measures, but as they point out this is not a satisfactory solution. Likewise, MCMC may not perform as well for models in which random effects are small, but in general neither does any other known method. Finally, large data sets, models with many parameters, or models that yield full conditional distributions that are not log-concave can cause MCMC to be time-consuming, although this is likely to become less of an issue as computing power increases.
It has been widely noted that the choice of priors for variance parameters in random effects models can be problematic. We recommend avoiding the use of improper priors and evaluating the effects of choosing alternative “noninformative” priors even when they are proper, for reasons summarized by Congdon (2003, p. 21). Of course it may be reasonable to use an informative prior for substantive and/or technical reasons. See Chen, Ibrahim, Shao, and Weiss (2003) for recent investigations regarding informative priors for GLMMs. In the analyses reported here, we adopted two commonly recommended priors for variance parameters: A uniform prior for
2.5. Goodness of Fit and Model Checking
A full explication of model evaluation, model checking, and model comparison for beta GLMMs is beyond the scope of this article. These topics raise some open questions in the GLMM literature generally. We briefly review the available tools for evaluating and comparing beta GLMs and indicate unresolved issues. We also address some of these issues in the examples.
Global model comparison may be achieved using well-known measures such as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), and Deviance Information Criterion (DIC). These measures are all essentially attempts to estimate relative Kullback-Liebler divergence based on the optimized log likelihood and a penalty term representing model complexity. For MML, comparative measures of fit based on the likelihood (e.g., AIC and BIC) and the log-likelihood chi-square test for comparing nested models all are applicable. For Bayesian modelers, the posterior log likelihoods and DIC may be used. These measures are all subject to the usual caveats that apply in other contexts and we emphasize that they are best used in a comparative fashion. One particular warning that applies to users of beta regression is that all integration constants need to be included in the computation of these measures when they are used to compare across distributions (Pawitan, 2001).
Model evaluation and checking involve two related issues: How well the model “predicts” the data and whether there are unduly influential observations (e.g., outliers). The first issue may be dealt with via simulations from the posterior predictive density. Typically the match between the replicated and actual data is evaluated by comparing their cumulative relative frequencies. A similar approach can be used in MML as each point predicts a density, conditional on the regressors and parameter estimates. The main idea is generating replicated data by predicting
Lack of a deviance or an appropriate residual is an undesirable feature of the beta GLMM. Clearly, the usual notion of a standardized residual is not adequate even for beta GLMs, because these models typically are heteroscedastic. Ferrari and Cribari-Neto (2004) proposed a deviance residual for the beta GLM, but Epsinheira et al. (2008a) found this residual to be inaccurate, particularly when the location submodel estimates are close to 0 or 1. Epsinheira et al. propose and evaluate two residuals based on the Fisher scoring algorithm. Epsinheira et al. (2008b) develop a Cook-like distance measure of the influence of deleted observations on parameter estimates. The Bayesian tools for influence diagnostics include the conditional predictive ordinate (CPO; Weiss & Cho, 1998), which is readily applicable to beta GLMMs and probably is the most well-established option for these models. This method is essentially an approximation to the jackknife and, more broadly, deletion of suspect observations (or blocks of them) provides a useful means to diagnose models. Longford (2001) proposes a similar approach in a likelihood context, using the parametric bootstrap to simulate the distribution of residuals at various levels in a multilevel model. An algorithm such as Atkinson’s forward search seems promising in this context (Atkinson, 1994). The principal disadvantage of approaches such as the CPO, parametric bootstrap, or forward search is, of course, computational intensity. Ntzoufras (2009) provides CPO code in WinBUGS.
3. A Simulation Study
To demonstrate the basic properties of mixed beta regression, we have performed parameter recovery simulation studies, one of whose results we show here. The simulated data and code are available from the authors. The study in question had 100 replications of
Results of the 4 × 1 Simulation Study
Note: MML = marginal maximum likelihood.
This model is quite similar to several commonly employed in the behavioral sciences, including unidimensional item response theory models or those for repeated measures experiments. We fit the model in many different ways but report the results from two runs, one using MML estimation and the other using Bayesian MCMC estimation. Given good starting values, MML convergence is rapid, each replication taking under 30 seconds to run on a 2.4 GHz Intel Core 2 Duo with 2 GB of RAM running Windows XP Pro using SAS 9.1.3 NLMIXED (SAS Institute, 2006). Simulation studies with MCMC are slower, but each run took no more than 5 minutes to converge running winBUGS 1.4.3 (Spiegelhalter, Thomas, Best, & Lunn, 2004) on a similar machine.
As can be seen in the table, the parameters are nearly unbiased even for this relatively modest sample, though of course inside the link they are more biased (Pawitan, 2001). Further examination shows that all parameters both inside the link and as linear predictors have reasonably symmetric distributions; location parameters are better approximated by a Gaussian than dispersion parameters but the skew exhibited for the latter is quite mild. The only parameter that exhibits any sort of downward bias and nonnormality is
3.1. Partitioning the Variance
One common issue in GLMMs involves partitioning the variance into that due to different levels in the model. For instance, in a two-level model with a random interecept, what proportion of the variation in the dependent variable is attributable to fixed effects, unexplained Level 1 variability and the Level 2 variability accounted for by the random intercept? This was discussed quite thoroughly by Goldstein and colleagues (Browne, Subramanian, Jones, & Goldstein, 2005; Goldstein, Browne, & Rasbash, 2002) in the context of mixed binary regression. In particular, in a linear mixed model, the question is much more straightforward to answer than in a nonlinear model such as mixed beta regression. Mixed beta regression suffers the same problems as mixed binary regression regarding the impact of nonlinearity. In particular, because the variance is a function of the mean, it is not possible to compute straightforward quantities such as the intraclass correlation frequently used in a mixed model context to measure the proportion of dependence at the cluster level.
We have adapted the methods they proposed, which we illustrate here. The approximation method, as explained in Goldstein et al. (2002), uses a first-order Taylor expansion of the link function at the mean of the distribution of the appropriate random effects. Browne et al. (2005) extend their approach to mixed models with more than two levels. We adapt their approach for the logistic link by incorporating the dispersion effect parameter
We start with an estimated location submodel
In a two-level model, following Goldstein et al.’s (2002) first-order Taylor expansion of the location submodel, our adaptation of their formulation yields the following approximate estimate of the Level 2 variance,
We start with an estimated location submodel. Generate a large number (say, Select one or more appropriate combinations of values for the covariates Compute the variance Compute The approximate VPC is
The approximation method is illustrated here; a similar demonstration of the simulation method is available from the authors. In our sample of 100 observations from our repeated-measures simulation, the sample means are
Approximation and Simulation Method VPC Results
4. Applications
We provide three real examples of mixed or mixture beta regressions using data from several cognitive experiments. The first example shows a mixture model applied to judged probabilities elicited in an experiment by Gurr (2009). It demonstrates the performance of a finite mixture of betas in a relatively simple setting. The second example is a more complicated reanalysis of judged probability data from Fox and Rottensreich (2006). This example illustrates the fact that modeling the dispersion structure often leads to different conclusions than modeling location alone, more in line with theoretical expectations. The third example considers numerical confidence ratings taken from Roy and Liersch (2009). This model includes both location and dispersion submodel random effects.
4.1. Partition Effects on Probability Judgments by a Finite Mixture Model
On grounds of insufficient reason, a probability of
Gurr (2009) investigated partition priming effects by setting up an ambiguous sample space and introducing an experimental manipulation to influence perceptions of this space. One hundred fifty-five undergraduate students at The Australian National University (108 females, 43 males, and 4 unspecified) were recruited for his study. Their ages ranged from 17 to 43 years (
The resultant sample space is ambiguous because the number of positions available in Geoff’s firm and the number of potential applicants to those positions are not specified. The hypothesis associated with this part of the study was that participants would be more likely to return “ignorance” priors (i.e., assigning equal probabilities to all three targets) under the no-information condition. In particular, it was expected that the no-information condition would elicit more ignorance priors comprising probabilities of 1/2 for each target, by virtue of participants thinking in terms of a two-alternative sample space for each target.
Participants also were randomly assigned to one of two priming conditions, one asking them which applicant was most likely to get a position in Geoff’s firm (the comparison prime) before they were asked for their probability assignments and the other not asking this question. The major hypothesis for this aspect of the experiment was that the comparison prime would make participants more likely to make their probabilities additive (i.e., summing to 1 across the three targets) because it would cue them to think that there might be only one position available. The provision of three targets enables a distinction between judges expressing ignorance by assigning probabilities of 1/2 to all three targets and who express ignorance by assigning 1/3 to all targets, whereas using only two targets would conflate additivity with ignorance prior assignments.
Initially, probability assignments in the two tasks were analysed separately using mixture beta-regression models via MLE methods, using SAS 9.2 and IBM SPSS Statistics 18 (IBM SPSS results are reported here). No-effects models were determined by comparing model fit for a single-distribution, two-component, and three-component mixture models. All two-component mixture models fixed one component distribution’s mean either at 1/2 or 1/3 while freeing the other mean to be estimated, and the three-component model fixed two component means at 1/2 and 1/3 while freeing the third component mean to be estimated. The fixed-means component distributions were modeled by
Beginning with the no-information condition task, both two-component mixture models (one with a component mean fixed at 1/2 and the other fixed at 1/3) outperform the single-distribution model, difference in log-likelihood chi-squares:

Component distributions for prime by information conditions.
A mixed three-component mixture model was estimated in WinBUGs 1.4.3 to assess the joint effects of the prime and the no-information versus information conditions. This model is limited to a random intercept in the location submodel, due to there being only two data points per participant. The location submodel is
We used a two-chain model with a 10,000 iteration burn-in and estimates based on the subsequent 10,000 iterations. The parameter estimates, standard errors, and 95% credible intervals are shown in Table 3
. The priming effect on relative composition is similar in both conditions, as can be seen from the credible intervals for
Beta-Regression Mixture Models for Gurr (2009) Study
The estimated and observed proportions are shown in Table 4 . There is a slight tendency to underestimate the proportions of additive responses in the no-prime condition and to overestimate them in the prime condition for the information-condition task, but otherwise the composition structure is recovered well. Of the 310 mean component classification scores produced by the MCMC run, 288 (92.9%) of these were within 0.1 of their correct classification, so the model correctly assigned most cases to a component distribution.
Recovery of Mixture Composition Structure
4.2. Dispersion Modeling to Address Response Style in Judged Probabilities
See, Fox, and Rottenstreich (2006) presented a series of studies of the influence that the number of possibilities (the state-space partition) has on judged probabilities. In their Study 1, participants observed a set of randomly ordered restaurant receipts for various meals on various days by a person named “Joe.” The receipts contained two pieces of information: day of the week (Sunday, Monday, …, Saturday) and meal category (breakfast, lunch, and dinner). Participants were told they would be asked to judge the likelihood of the events they had observed. See et al.’s primary hypothesis was that subjective probability estimates would be biased toward the ignorance prior suggested by the target attribute’s partition: meal of the day (three alternatives, therefore an ignorance prior of 1/3) or day of the week (1/7). Following the learning phase, 267 participants were asked to estimate the likelihoods of all 10 attributes (meal categories and days of the week). One peculiarity of their data is the substantial number of exact boundary observations, particularly concentrated in certain participants. These values needed to be adjusted slightly. In our analysis here, we replaced exact 0 values (12 in all) with 0.001 and exact 1 (5 in all) with 0.999. The exact 1 responses are particularly important because these are highly unlikely responses given the stochastic specification of the model.
To assess the relative influence of the partition and the information provided during the learning phase, See et al. conducted separate regression analyses for each participant. Their model was derived from Fox and Rottenstreich’s (2003) ignorance prior model, which can be written as follows:
One obvious weakness with this analysis is the fact that each participant only provides 10 observations, which means each individual-level regression only has six degrees of freedom. We reanalyzed their data using a beta GLMM approach. All models shown here were fit using WinBUGs 1.4.3. We ran two-chain models with a burn-in of 5,000 iterations and an additional 10,000 iterations for estimation purposes. (MML was also attempted. The final Hessian was singular but coefficient estimates were very close to those from MCMC.) In these models,
Table 4 shows the posterior mean
Other models are possible. One obvious alternative is a model that incorporates a random intercept in the dispersion submodel instead of one in the location submodel. Including random intercepts in both submodels resulted in models that failed to converge in WinBUGS. However, a model with a random intercept in the dispersion submodel alone converged well. Moreover, it considerably outperformed all of the models presented thus far (posterior mean
What, then, does the dispersion random effect mean here? If one examines the raw data as well as
See et al. (2006) also collected confidence judgements and knowledge judgments to measure how knowledgable their subjects were. The motivation behind this was that more knowledgeable subjects might be less susceptible to priming effects. A subject-level knowledgeability covariate could be incorporated into a more elaborate model, with a cross-level knowledgeability-by-partition interaction term to ascertain whether greater knowledgeability lessens the partition effect.
As a comparison and more in line with the original analysis, we also attempted to fit logit-normal mixed models by transforming the dependent variable and assuming it is normal. These models exhibited substantial sensitivity to the choice of
4.3. Location and Scale Mixing: Numerical Confidence Ratings
The data used for this example come from Roy and Liersch (2009), who considered the “better than average” effect, also known as the Lake Wobegon Effect, the well-known finding that more than 50% of subjects are willing to rate themselves as being better than average, impossible in a symmetric distribution (e.g., Alicke & Govorun, 2005). The participants are 35 female undergraduates at a large Midwestern university (an additional 20 male participants were dropped from the analysis here to avoid dealing with a substantial gender by skill interaction). They provided numerical confidence ratings (on a scale of 0–100) of their ability to perform ten skills: riding a bike (bke), dancing (dnc), driving a car (drv), sensing emotion (emo), karate (kte), performing magic tricks (mag), musical performance (mus), speaking in public (spk), playing “ball” sports (spt), and tying a shoe (sho).
These skills are expected to have substantially varying difficulties in the study population—for instance, karate and magic are both expected to be difficult while tying shoes is expected to be easy. The experimenters hoped to see if participants would rate themselves as being worse than average on difficult skills and better on easy skills, which would imply substantially skewed responses, both left and right skewed, as difficulty varied. Of course, it would not be surprising for there to be substantial differences in response style in these data, in location, dispersion, or both, much like what was observed for the judged probabilities in the previous example. A location difference indicates a systematic anchor point of “generalized perceived self-competency” for a given participant while a dispersion difference indicates how spread a given participant’s responses are around that. Box plots of the ratings are shown in Figure 2 .

Roy and Liersch (2009) confidence ratings for 35 female participants.
We rescaled the response to lie inside the unit interval by letting
We consider four different models for these data. The first model fits only fixed location and dispersion effects for skills with no random effects. The second model considers a location random effect only. The third fits a dispersion random effect only. The fourth considers random effects for both location and dispersion, allowing these to be correlated. The parameter estimates shown here are from the MML estimation but MCMC estimates are very similar. As there are many model parameters, we consider only the random effects and fit statistics for the moment. Examination of the reproduced means for Model 4 shows that it almost exactly recovers the raw data means for each skill (as indeed do the others). However, as can be seen from the BIC statistics, Models 3 and 4 are quite soundly preferred to the other two, again suggesting substantial differences among subjects in scale usage. However, Model 4 has some modest support over Model 3 (and indeed would have more if
All is not well with Model 4, however. To evaluate it in more detail, we computed the Pearson correlation between
Posterior Mean -2log L for Three Models
Random Effects Parameter Estimates and Fit Statistics for the Confidence Rating Data
Note: BIC = Bayesian Information Criterion.
To assess the effect of these outliers on the random effects estimates, we refit Model 4 omitting karate and music. In this case, the fit statistics are no longer comparable, but parameter estimates should be similar. The fixed effects are very similar but given the specification of the model this is to be expected. Examining
5. Discussion and Conclusion
The principal focus of this article has been a GLMM framework for dependent random variables with conditional beta distributions. In particular, we have shown that for subjective rating data, a random effects dispersion model is highly useful in modeling individual heterogeneity in response style. Response style is a pervasive problem with self-report data such as numerical confidence ratings, judged probabilities, sliders, and so on, all of which are used quite commonly by psychologists and other behavioral scientists. While in many cases it is simply a nuisance behavior, in other cases, response style heterogeneity might be an important study outcome in its own right, for instance, when group differences are expected as in attitude polarization (Dolnicar & Grun, 2007). We have seen in two examples how partition-priming effects on probability judgments are more accurately described by response-style heterogeneity than by a shift in the mean response.
This framework may be extended and developed in several ways. We briefly discuss the following starting points for such developments: Alternative estimation techniques, model diagnostics, special mixture models, alternative link functions, and alternative distributions.
Beginning with estimation methods, at least three major approaches remain to be explored. This article dealt with MML and Bayesian MCMC techniques but omitted restricted maximum likelihood (REML), penalized quasi-likelihood (PQL), and generalized least squares (GLS) approaches. These techniques were neglected primarily for lack of software tools with which to investigate them. As mentioned earlier, there is a limited implementation of PQL estimation in the SAS GLIMMIX procedure. Investigations of REML and GLS await software developments.
We now turn to extensions of the GLMM itself. Some applications require “
More general extensions of the GLMM may be obtained by considering alternative link functions and distributions. As indicated earlier, the logit is not the only twice-differentiable link function that transforms the unit interval to the real line. Little is known about how these alternative link functions perform in comparison with the logit.
Likewise, several alternatives to the beta distribution have been proposed. Kumaraswamy (1980) provided a two-parameter (
A more popular alternative is Johnson’s
Fourth, the logit-normal and logit-logistic distributions are members of the otherwise unexplored symmetric family that utilizes an invertible transformation
Finally, Cox (1996) and Papke and Wooldridge (1996) apply a quasi-likelihood approach to modeling a variable on the unit interval. The quasi-likelihood approach specifies the first and second moments of the conditional distribution as functions of the mean but does not specify the full distribution; in this sense it is a second-order analog of maximum likelihood. This approach is useful when the relationship between the mean and variance is not captured by a standard distribution. Kieschnick and McCullogh (2003) compare a special case of Papke and Wooldridge’s model with their beta regression model on two data sets and find that both work reasonably well and give similar results. A strategy using a semiparametric model based on the Dirichlet process might also be a useful strategy to consider (McAuliffe, Blei, & Jordan, 2006).
Summing up, this article has extended beta regression to deal with dependent observations in a general way, along the lines of the multilevel modeling literature. The resulting GLMM has been shown to be effective in a simulation study and three real-world applications and, as indicated earlier, it has applicability in a wide range of disciplines where doubly bounded constructs are commonplace. We also have demonstrated the practicality of MLL and MCMC estimation of this GLMM under a variety of conditions. Code for the examples in this article and additional resources are available at http://dl.dropbox.com/u/1857674/betareg/betareg.html. As should be apparent from this section, there is considerable potential for extensions and developments in modeling doubly bounded dependent variables, but the state of the art there is rapidly approaching a richness and sophistication comparable to the models available for other kinds of variables.
